Training SlowFast
Training code
https://github.com/wojiazaiyugang/SlowFast
Training environment
docker pull wojiazaiyugang/slow-fast
python setup.py build develop
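Dataset preparation
Follow the dataset preparation guide in the original repository: https://github.com/facebookresearch/SlowFast/blob/master/slowfast/datasets/DATASET.md. Two dataset formats are supported. One is Kinetics-style action recognition, i.e. video classification. The other is AVA-style spatio-temporal action detection. Only the Kinetics format is used for now.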
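Kinetics
Dataset directory layout. Use scripts/datasets/generate_slowfast_label.py to generate the annotation files. Note that the paths in the annotation files include the video subdirectory.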
root@senseport-2080ti:/home/senseport0/Workspace/SlowFast/data# tree -L 1 basketball_action
basketball_action
├── basketball_action
├── test.csv
├── train.csv
└── val.csv
1 directory, 3 files
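Prepare the training, validation and test videos. There seem to be no particular requirements on the videos, and any resolution works. Then create the three annotation files train.csv, val.csv and test.csv, with class ids starting from 0.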
basketball_action/974d9f4b787128a20635ce87a863456d.mp4 48
basketball_action/d0251d4fb3d90a95785961c2774cff78.mp4 48
basketball_action/1b14f68770db020a46b1d01d991d2259.mp4 48
basketball_action/fee540175564faf22ad6f0aa6bcb6109.mp4 62
basketball_action/0f42ac483cb18d92c979862a86caf5e3.mp4 62
basketball_action/b73b59c5b6f1a6f90c40dc5100716ebb.mp4 62
basketball_action/e16dcb985520412299a15dce48f56173.mp4 62
basketball_action/bfd3faaba72148df48b8729ae8d4eee5.mp4 49
basketball_action/1922b2a51c28c2d3f137e7b9f326006c.mp4 49
...
Each line contains a video path and its class id, separated by a space.
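The annotation format above can be checked with a few lines of Python. This is a minimal sketch, not the logic of generate_slowfast_label.py. The helper names `parse_annotation_line` and `required_num_classes` are hypothetical, and a single space is assumed as the separator, as in the sample lines. It parses each line and reports the smallest class count that covers every label id, which is the lower bound for NUM_CLASSES in the configuration file.

```python
def parse_annotation_line(line, separator=" "):
    """Split one 'relative/path.mp4 label' line into (path, int label)."""
    # rsplit from the right so that a path containing spaces still parses.
    path, label = line.rstrip("\n").rsplit(separator, 1)
    return path, int(label)


def required_num_classes(lines):
    """Smallest class count that covers every label id (max id + 1)."""
    labels = [parse_annotation_line(line)[1] for line in lines if line.strip()]
    return max(labels) + 1


# Three lines copied from the sample annotation file above.
sample = [
    "basketball_action/974d9f4b787128a20635ce87a863456d.mp4 48",
    "basketball_action/fee540175564faf22ad6f0aa6bcb6109.mp4 62",
    "basketball_action/bfd3faaba72148df48b8729ae8d4eee5.mp4 49",
]

print(parse_annotation_line(sample[0]))
print(required_num_classes(sample))  # 63 for these three lines
```

Running the same check over the full train.csv, val.csv and test.csv shows whether any label id would fall outside the configured number of classes.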
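Configuration files
Add the corresponding configuration files under configs. Each subfolder is a dataset and contains that dataset's configuration files.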
root@senseport-2080ti:/home/senseport0/Workspace/SlowFast# tree -L 2 configs/
configs/
├── AVA
│   ├── c2
│   ├── SLOW_8x8_R50_SHORT.yaml
│   └── SLOWFAST_32x2_R50_SHORT.yaml
├── BasketballAction
│   ├── SLOWFAST_4x16_R50.yaml
│   └── SLOWFAST_8x8_R50.yaml
├── Charades
│   ├── pytorchvideo
│   ├── SLOWFAST_16x8_R50_multigrid.yaml
│   └── SLOWFAST_16x8_R50.yaml
├── ImageNet
│   ├── MVIT_B_16_CONV.yaml
│   └── RES_R50.yaml
├── Kinetics
│   ├── c2
│   ├── C2D_8x8_R50_IN1K.yaml
│   ├── C2D_8x8_R50.yaml
│   ├── C2D_NLN_8x8_R50_IN1K.yaml
│   ├── C2D_NLN_8x8_R50.yaml
│   ├── I3D_8x8_R101.yaml
│   ├── I3D_8x8_R50_IN1K.yaml
│   ├── I3D_8x8_R50.yaml
│   ├── I3D_NLN_8x8_R101.yaml
│   ├── I3D_NLN_8x8_R50_IN1K.yaml
│   ├── I3D_NLN_8x8_R50.yaml
│   ├── MVIT_B_16x4_CONV.yaml
│   ├── MVIT_B_32x3_CONV.yaml
│   ├── pytorchvideo
│   ├── SLOW_4x16_R50.yaml
│   ├── SLOW_8x8_R50.yaml
│   ├── SLOWFAST_4x16_R50.yaml
│   ├── SLOWFAST_8x8_R50_stepwise_multigrid.yaml
│   ├── SLOWFAST_8x8_R50_stepwise.yaml
│   ├── SLOWFAST_8x8_R50.yaml
│   ├── SLOWFAST_NLN_4x16_R50.yaml
│   ├── SLOWFAST_NLN_8x8_R50.yaml
│   ├── SLOW_NLN_4x16_R50.yaml
│   ├── SLOW_NLN_8x8_R50.yaml
│   ├── X3D_L.yaml
│   ├── X3D_M.yaml
│   ├── X3D_S.yaml
│   └── X3D_XS.yaml
├── Kth
│   └── kth.yaml
└── SSv2
    ├── pytorchvideo
    ├── SLOWFAST_16x8_R50_multigrid.yaml
    └── SLOWFAST_16x8_R50.yaml
12 directories, 37 files
The configuration file configs/BasketballAction/SLOWFAST_4x16_R50.yaml looks like this:
TRAIN:
  ENABLE: True # whether to run training
  DATASET: kinetics
  BATCH_SIZE: 8
  EVAL_PERIOD: 10
  CHECKPOINT_PERIOD: 1 # save a checkpoint every this many epochs
  CHECKPOINT_EPOCH_RESET: True # whether to reset the epoch counter when loading a checkpoint. Set this to True for incremental training; otherwise the epoch stored in the checkpoint is restored
  AUTO_RESUME: False # whether to resume automatically. If True, the newest checkpoint in the checkpoints folder is loaded and training continues from it
  # CHECKPOINT_FILE_PATH: # checkpoint to resume from manually
DATA:
  NUM_FRAMES: 32 # number of frames in one clip of the fast pathway
  SAMPLING_RATE: 2 # sampling stride of the fast pathway. NUM_FRAMES * SAMPLING_RATE = 32 * 2 = 64, so one inference covers 64 raw frames
  TRAIN_JITTER_SCALES: [256, 320]
  TRAIN_CROP_SIZE: 224
  TEST_CROP_SIZE: 256
  INPUT_CHANNEL_NUM: [3, 3]
  PATH_PREFIX: /workspace/SlowFast/data/basketball_action/ # data path prefix. It is joined with the path in the annotation file to form the full video path, i.e. /workspace/SlowFast/data/basketball_action/basketball_action/xxx.mp4
  PATH_TO_DATA_DIR: /workspace/SlowFast/data/basketball_action/ # annotation directory, i.e. where train.csv, test.csv and val.csv are looked up
SLOWFAST:
  ALPHA: 8 # frame-rate ratio between the fast and slow pathways. With NUM_FRAMES=32 a fast clip has 32 frames, so a slow clip has 32/8 = 4 frames, which is the 4 in the 4x16 of this file's name. The other number is 64/4 = 16
  BETA_INV: 8
  FUSION_CONV_CHANNEL_RATIO: 2
  FUSION_KERNEL_SZ: 5
RESNET:
  ZERO_INIT_FINAL_BN: True
  WIDTH_PER_GROUP: 64
  NUM_GROUPS: 1
  DEPTH: 50
  TRANS_FUNC: bottleneck_transform
  STRIDE_1X1: False
  NUM_BLOCK_TEMP_KERNEL: [[3, 3], [4, 4], [6, 6], [3, 3]]
  SPATIAL_STRIDES: [[1, 1], [2, 2], [2, 2], [2, 2]]
  SPATIAL_DILATIONS: [[1, 1], [1, 1], [1, 1], [1, 1]]
NONLOCAL:
  LOCATION: [[[], []], [[], []], [[], []], [[], []]]
  GROUP: [[1, 1], [1, 1], [1, 1], [1, 1]]
  INSTANTIATION: dot_product
BN:
  USE_PRECISE_STATS: True
  NUM_BATCHES_PRECISE: 200
SOLVER:
  BASE_LR: 0.1
  LR_POLICY: cosine
  MAX_EPOCH: 196
  MOMENTUM: 0.9
  WEIGHT_DECAY: 1e-4
  WARMUP_EPOCHS: 34.0
  WARMUP_START_LR: 0.01
  OPTIMIZING_METHOD: sgd
MODEL:
  NUM_CLASSES: 66 # number of classes. There are really 18 classes, but every class id was offset by 48 during labeling, so this is set to 66 to avoid an error. The value must not be smaller than 5 or the code raises an error. With only two classes, for example, 5 still works as long as the annotation files never use the last three ids
  ARCH: slowfast
  MODEL_NAME: SlowFast
  LOSS_FUNC: cross_entropy
  DROPOUT_RATE: 0.5
TEST:
  ENABLE: False # whether to run testing
  DATASET: kinetics
  BATCH_SIZE: 64
DATA_LOADER:
  NUM_WORKERS: 8
  PIN_MEMORY: True
NUM_GPUS: 1 # set to 1
NUM_SHARDS: 1
RNG_SEED: 0
OUTPUT_DIR: .
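The frame arithmetic behind NUM_FRAMES, SAMPLING_RATE and ALPHA can be written out as a small function. This is an illustrative sketch only. The function name `slowfast_clip_geometry` and the returned dictionary keys are hypothetical and are not part of the SlowFast code base.

```python
def slowfast_clip_geometry(num_frames, sampling_rate, alpha):
    """Derive the slow-pathway clip length and stride from the fast-pathway settings."""
    if num_frames % alpha != 0:
        raise ValueError("NUM_FRAMES must be divisible by ALPHA")
    # Raw frames covered by one clip: 32 * 2 = 64 for this config.
    raw_span = num_frames * sampling_rate
    # The slow pathway keeps one frame out of every ALPHA fast frames: 32 / 8 = 4.
    slow_frames = num_frames // alpha
    # Stride of the slow pathway measured in raw frames: 64 / 4 = 16.
    slow_stride = raw_span // slow_frames
    return {
        "raw_span": raw_span,
        "slow_frames": slow_frames,
        "slow_stride": slow_stride,
    }


geometry = slowfast_clip_geometry(num_frames=32, sampling_rate=2, alpha=8)
print(geometry)
print("{slow_frames}x{slow_stride}".format(**geometry))  # 4x16
```

With the values from this configuration file the result is 4x16, matching the file name SLOWFAST_4x16_R50.yaml.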
Training from scratch
Comment out CHECKPOINT_FILE_PATH, then run:
python tools/run_net.py --cfg configs/BasketballAction/SLOWFAST_4x16_R50.yaml
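Resuming training
Use the same command as for training. In the configuration file, set CHECKPOINT_PERIOD to 1 (a checkpoint is then saved every epoch), CHECKPOINT_EPOCH_RESET to True, and CHECKPOINT_FILE_PATH to the weight file to resume from.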
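When picking the file to put into CHECKPOINT_FILE_PATH, a small helper can locate the newest checkpoint. This is a sketch under assumptions. The helper name `latest_checkpoint` is hypothetical, and the file name pattern checkpoint_epoch_XXXXX.pyth is inferred from the checkpoint name that appears in the ONNX export command in this document. Zero-padded epoch numbers make a plain string sort equivalent to sorting by epoch.

```python
import os
import tempfile


def latest_checkpoint(checkpoint_dir):
    """Return the path of the newest checkpoint_epoch_*.pyth file, or None."""
    names = sorted(
        name
        for name in os.listdir(checkpoint_dir)
        if name.startswith("checkpoint_epoch_") and name.endswith(".pyth")
    )
    if not names:
        return None
    # Zero-padded epoch numbers sort lexicographically in epoch order.
    return os.path.join(checkpoint_dir, names[-1])


# Demonstration on a throwaway directory holding empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for epoch in (1, 10, 190):
        open(os.path.join(tmp, "checkpoint_epoch_{:05d}.pyth".format(epoch)), "w").close()
    newest = latest_checkpoint(tmp)

print(os.path.basename(newest))  # checkpoint_epoch_00190.pyth
```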
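Testing
Use the same command as for training. In the configuration file, set ENABLE under TRAIN to False and ENABLE under TEST to True. CHECKPOINT_FILE_PATH under TRAIN is the model to test.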
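Instead of editing the YAML file for every run, the switches can also be passed on the command line. The sketch below only builds the command as a list. It assumes that tools/run_net.py in this fork still accepts trailing KEY VALUE overrides after --cfg, as the upstream facebookresearch/SlowFast script does, and that has not been verified against the fork. The function name `build_run_net_command` is hypothetical, and the checkpoint path is a placeholder borrowed from the export command in this document.

```python
import shlex


def build_run_net_command(cfg_path, overrides=None):
    """Build a tools/run_net.py command with optional KEY VALUE config overrides."""
    command = ["python", "tools/run_net.py", "--cfg", cfg_path]
    for key, value in (overrides or {}).items():
        # Assumption: overrides are appended as alternating KEY VALUE arguments.
        command += [key, str(value)]
    return command


test_command = build_run_net_command(
    "configs/BasketballAction/SLOWFAST_4x16_R50.yaml",
    {
        "TRAIN.ENABLE": False,
        "TEST.ENABLE": True,
        # Placeholder path; substitute the checkpoint that should be tested.
        "TRAIN.CHECKPOINT_FILE_PATH": "checkpoints/checkpoint_epoch_00190.pyth",
    },
)
print(" ".join(shlex.quote(part) for part in test_command))
```

The resulting list can be handed to subprocess.run, or the printed line can be pasted into a shell.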
Exporting to ONNX
python tools/export_model_to_onnx.py --cfg configs/ShootAction/SLOWFAST_4x16_R50.yaml --checkpoint checkpoints/checkpoint_epoch_00190.pyth --save test.onnx
fast_pathway = torch.randn(1, 3, 32, 256, 256)
slow_pathway = torch.randn(1, 3, 4, 256, 256)
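The two dummy tensors above match the configuration values NUM_FRAMES = 32, ALPHA = 8 and TEST_CROP_SIZE = 256. The sketch below derives both shapes from those values so that they can be recomputed when the configuration changes. The function name `onnx_dummy_input_shapes` is hypothetical. How the export script consumes the two tensors, and in which order, is not shown here and should be checked in tools/export_model_to_onnx.py.

```python
def onnx_dummy_input_shapes(num_frames, alpha, crop_size, batch_size=1, channels=3):
    """Return (slow_shape, fast_shape) as (N, C, T, H, W) tuples."""
    if num_frames % alpha != 0:
        raise ValueError("NUM_FRAMES must be divisible by ALPHA")
    # The fast pathway sees every sampled frame of the clip.
    fast_shape = (batch_size, channels, num_frames, crop_size, crop_size)
    # The slow pathway sees ALPHA times fewer frames.
    slow_shape = (batch_size, channels, num_frames // alpha, crop_size, crop_size)
    return slow_shape, fast_shape


slow_shape, fast_shape = onnx_dummy_input_shapes(num_frames=32, alpha=8, crop_size=256)
print(slow_shape)  # (1, 3, 4, 256, 256)
print(fast_shape)  # (1, 3, 32, 256, 256)
# Each tuple can be unpacked into torch.randn(*shape) to create the dummy tensor.
```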