Training SlowFast

Training code

https://github.com/wojiazaiyugang/SlowFast

Training environment

docker pull wojiazaiyugang/slow-fast
After entering the Docker container, go to the SlowFast folder and install:
python setup.py build develop

Dataset preparation

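Follow the dataset preparation guide in the original repository: https://github.com/facebookresearch/SlowFast/blob/master/slowfast/datasets/DATASET.md
There are two dataset formats. One is the Kinetics type for action recognition, that is, video classification. The other is the AVA type for spatio-temporal action detection. For now only the Kinetics format is used.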

Kinetics

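Dataset directory structure. Use scripts/datasets/generate_slowfast_label.py to generate the annotation files. Note that the paths in the annotation files include the directory structure.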

root@senseport-2080ti:/home/senseport0/Workspace/SlowFast/data# tree -L 1 basketball_action
basketball_action
├── basketball_action
├── test.csv
├── train.csv
└── val.csv

1 directory, 3 files

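Prepare the training, validation, and test videos. There seem to be no particular requirements on the videos; any resolution appears to work. Then create three annotation files, train.csv, val.csv, and test.csv (the names shown in the directory listing above), with class ids starting from 0.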

basketball_action/974d9f4b787128a20635ce87a863456d.mp4 48
basketball_action/d0251d4fb3d90a95785961c2774cff78.mp4 48
basketball_action/1b14f68770db020a46b1d01d991d2259.mp4 48
basketball_action/fee540175564faf22ad6f0aa6bcb6109.mp4 62
basketball_action/0f42ac483cb18d92c979862a86caf5e3.mp4 62
basketball_action/b73b59c5b6f1a6f90c40dc5100716ebb.mp4 62
basketball_action/e16dcb985520412299a15dce48f56173.mp4 62
basketball_action/bfd3faaba72148df48b8729ae8d4eee5.mp4 49
basketball_action/1922b2a51c28c2d3f137e7b9f326006c.mp4 49
...

Each line contains a video path and its class id.

Configuration file

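Add the corresponding configuration files under configs. Each folder corresponds to a dataset and holds that dataset's configuration files.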

root@senseport-2080ti:/home/senseport0/Workspace/SlowFast# tree -L 2 configs/
configs/
├── AVA
│   ├── c2
│   ├── SLOW_8x8_R50_SHORT.yaml
│   └── SLOWFAST_32x2_R50_SHORT.yaml
├── BasketballAction
│   ├── SLOWFAST_4x16_R50.yaml
│   └── SLOWFAST_8x8_R50.yaml
├── Charades
│   ├── pytorchvideo
│   ├── SLOWFAST_16x8_R50_multigrid.yaml
│   └── SLOWFAST_16x8_R50.yaml
├── ImageNet
│   ├── MVIT_B_16_CONV.yaml
│   └── RES_R50.yaml
├── Kinetics
│   ├── c2
│   ├── C2D_8x8_R50_IN1K.yaml
│   ├── C2D_8x8_R50.yaml
│   ├── C2D_NLN_8x8_R50_IN1K.yaml
│   ├── C2D_NLN_8x8_R50.yaml
│   ├── I3D_8x8_R101.yaml
│   ├── I3D_8x8_R50_IN1K.yaml
│   ├── I3D_8x8_R50.yaml
│   ├── I3D_NLN_8x8_R101.yaml
│   ├── I3D_NLN_8x8_R50_IN1K.yaml
│   ├── I3D_NLN_8x8_R50.yaml
│   ├── MVIT_B_16x4_CONV.yaml
│   ├── MVIT_B_32x3_CONV.yaml
│   ├── pytorchvideo
│   ├── SLOW_4x16_R50.yaml
│   ├── SLOW_8x8_R50.yaml
│   ├── SLOWFAST_4x16_R50.yaml
│   ├── SLOWFAST_8x8_R50_stepwise_multigrid.yaml
│   ├── SLOWFAST_8x8_R50_stepwise.yaml
│   ├── SLOWFAST_8x8_R50.yaml
│   ├── SLOWFAST_NLN_4x16_R50.yaml
│   ├── SLOWFAST_NLN_8x8_R50.yaml
│   ├── SLOW_NLN_4x16_R50.yaml
│   ├── SLOW_NLN_8x8_R50.yaml
│   ├── X3D_L.yaml
│   ├── X3D_M.yaml
│   ├── X3D_S.yaml
│   └── X3D_XS.yaml
├── Kth
│   └── kth.yaml
└── SSv2
    ├── pytorchvideo
    ├── SLOWFAST_16x8_R50_multigrid.yaml
    └── SLOWFAST_16x8_R50.yaml

12 directories, 37 files

For the configuration file configs/BasketballAction/SLOWFAST_4x16_R50.yaml:

TRAIN:
  ENABLE: True # whether to run training
  DATASET: kinetics
  BATCH_SIZE: 8
  EVAL_PERIOD: 10
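  CHECKPOINT_PERIOD: 1 # save a checkpoint every this many epochs
  CHECKPOINT_EPOCH_RESET: True # when resuming, whether to reset the epoch counter; set to True for incremental training, otherwise the epoch stored in the checkpoint is restored
  AUTO_RESUME: False # whether to resume automatically; if True, the latest checkpoint in the checkpoints folder is located and training resumes from it
#  CHECKPOINT_FILE_PATH: # checkpoint to resume from manually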
DATA:
  NUM_FRAMES: 32 # number of frames in one clip for the fast pathway
  SAMPLING_RATE: 2 # sampling rate for the fast pathway; NUM_FRAMES * SAMPLING_RATE = 32 * 2 = 64, so one inference spans 64 raw frames
  TRAIN_JITTER_SCALES: [256, 320]
  TRAIN_CROP_SIZE: 224
  TEST_CROP_SIZE: 256
  INPUT_CHANNEL_NUM: [3, 3]
  PATH_PREFIX: /workspace/SlowFast/data/basketball_action/ # data path prefix; joined with the path in the annotation file to form the full path, i.e. /workspace/SlowFast/data/basketball_action/basketball_action/xxx.mp4
  PATH_TO_DATA_DIR: /workspace/SlowFast/data/basketball_action/ # directory containing the annotation files, i.e. where train.csv, test.csv, and val.csv are looked up
SLOWFAST:
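  ALPHA: 8 # frame sampling ratio between the slow and fast pathways; NUM_FRAMES=32 means a fast clip has 32 frames, so a slow clip has 32/8 = 4 frames, which is the 4 in this config's 4x16 name; the other number is 64/4 = 16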
  BETA_INV: 8
  FUSION_CONV_CHANNEL_RATIO: 2
  FUSION_KERNEL_SZ: 5
RESNET:
  ZERO_INIT_FINAL_BN: True
  WIDTH_PER_GROUP: 64
  NUM_GROUPS: 1
  DEPTH: 50
  TRANS_FUNC: bottleneck_transform
  STRIDE_1X1: False
  NUM_BLOCK_TEMP_KERNEL: [[3, 3], [4, 4], [6, 6], [3, 3]]
  SPATIAL_STRIDES: [[1, 1], [2, 2], [2, 2], [2, 2]]
  SPATIAL_DILATIONS: [[1, 1], [1, 1], [1, 1], [1, 1]]
NONLOCAL:
  LOCATION: [[[], []], [[], []], [[], []], [[], []]]
  GROUP: [[1, 1], [1, 1], [1, 1], [1, 1]]
  INSTANTIATION: dot_product
BN:
  USE_PRECISE_STATS: True
  NUM_BATCHES_PRECISE: 200
SOLVER:
  BASE_LR: 0.1
  LR_POLICY: cosine
  MAX_EPOCH: 196
  MOMENTUM: 0.9
  WEIGHT_DECAY: 1e-4
  WARMUP_EPOCHS: 34.0
  WARMUP_START_LR: 0.01
  OPTIMIZING_METHOD: sgd
MODEL:
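  NUM_CLASSES: 66 # number of classes. There should be 18 classes here, but 48 was added to every class id during labeling, so this is set to 66 to avoid an error. This value must not be smaller than 5, or the code raises an error; with only two classes, for example, 5 can be written here as long as the annotation files contain none of the last three classes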
  ARCH: slowfast
  MODEL_NAME: SlowFast
  LOSS_FUNC: cross_entropy
  DROPOUT_RATE: 0.5
TEST:
  ENABLE: False # whether to run testing
  DATASET: kinetics
  BATCH_SIZE: 64
DATA_LOADER:
  NUM_WORKERS: 8
  PIN_MEMORY: True
NUM_GPUS: 1 # set to 1
NUM_SHARDS: 1
RNG_SEED: 0
OUTPUT_DIR: .
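
The relation between NUM_FRAMES, SAMPLING_RATE, and ALPHA described in the comments can be checked with a short calculation. This is an illustrative sketch only; `pathway_layout` is a hypothetical helper, not part of the SlowFast code base, and it assumes NUM_FRAMES is divisible by ALPHA.

```python
def pathway_layout(num_frames, sampling_rate, alpha):
    """Derive per-pathway frame counts from the DATA and SLOWFAST settings."""
    if num_frames % alpha != 0:
        raise ValueError("this sketch assumes NUM_FRAMES is divisible by ALPHA")
    raw_span = num_frames * sampling_rate   # raw video frames covered by one clip
    slow_frames = num_frames // alpha       # frames fed to the slow pathway
    slow_stride = raw_span // slow_frames   # raw-frame spacing of the slow pathway
    return {
        "fast_frames": num_frames,
        "fast_stride": sampling_rate,
        "slow_frames": slow_frames,
        "slow_stride": slow_stride,
        "raw_span": raw_span,
    }


# Values taken from the configuration file above.
layout = pathway_layout(num_frames=32, sampling_rate=2, alpha=8)
print(layout)
```

For the values in this file the slow pathway gets 4 frames with a stride of 16, which matches the 4x16 in the config name.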

Training from scratch

Comment out CHECKPOINT_FILE_PATH.

python tools/run_net.py --cfg configs/BasketballAction/SLOWFAST_4x16_R50.yaml

Resuming training

Use the same command as for training. In the configuration file, set CHECKPOINT_PERIOD to 1, set CHECKPOINT_EPOCH_RESET to True, and set CHECKPOINT_FILE_PATH to the weights file to resume from.
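
As an alternative to editing the YAML file, upstream SlowFast's run_net.py accepts trailing KEY VALUE pairs that override config entries; whether this fork keeps that behaviour is an assumption. The sketch below only builds the argument list. `build_resume_command` is a hypothetical helper and the checkpoint path is a placeholder.

```python
import shlex


def build_resume_command(cfg_path, checkpoint_path):
    """Build a run_net.py command that resumes from a given checkpoint.

    Assumes run_net.py accepts trailing KEY VALUE config overrides.
    """
    overrides = {
        "TRAIN.CHECKPOINT_PERIOD": "1",
        "TRAIN.CHECKPOINT_EPOCH_RESET": "True",
        "TRAIN.CHECKPOINT_FILE_PATH": checkpoint_path,
    }
    cmd = ["python", "tools/run_net.py", "--cfg", cfg_path]
    for key, value in overrides.items():
        cmd += [key, value]
    return cmd


cmd = build_resume_command(
    "configs/BasketballAction/SLOWFAST_4x16_R50.yaml",
    "checkpoints/checkpoint_epoch_00190.pyth",  # placeholder path
)
print(shlex.join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` from the SlowFast folder.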

Testing

Use the same command as for training. In the configuration file, turn off ENABLE under TRAIN and turn on ENABLE under TEST. CHECKPOINT_FILE_PATH under TRAIN is the model to test.

Exporting to ONNX

python tools/export_model_to_onnx.py --cfg configs/ShootAction/SLOWFAST_4x16_R50.yaml --checkpoint checkpoints/checkpoint_epoch_00190.pyth --save test.onnx
Note: in the script, change the 32 and 4 in
fast_pathway = torch.randn(1, 3, 32, 256, 256)
slow_pathway = torch.randn(1, 3, 4, 256, 256)
to the number of frames required by the fast and slow pathways, respectively.
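
The two frame counts can be derived from the config values rather than edited by hand. A minimal sketch, assuming the slow pathway receives NUM_FRAMES / ALPHA frames as described in the config comments; `dummy_input_shapes` is a hypothetical helper and the 256 spatial size is taken from the snippet above.

```python
def dummy_input_shapes(num_frames, alpha, size=256, batch=1, channels=3):
    """Return (slow_shape, fast_shape) as (N, C, T, H, W) tuples."""
    fast_shape = (batch, channels, num_frames, size, size)
    slow_shape = (batch, channels, num_frames // alpha, size, size)
    return slow_shape, fast_shape


# NUM_FRAMES=32 and ALPHA=8 are the values from the configuration file.
slow_shape, fast_shape = dummy_input_shapes(num_frames=32, alpha=8)
print(slow_shape, fast_shape)
# With PyTorch available, the tensors could then be created as:
#   slow_pathway = torch.randn(*slow_shape)
#   fast_pathway = torch.randn(*fast_shape)
```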
