3D-CNN for Facial Micro- and Macro-Expression Spotting on Long Video Sequences Using Temporal Oriented Reference Frame
Proceedings of the 30th ACM International Conference on Multimedia, 2022
Facial expression spotting is the preliminary step for micro- and macro-expression analysis. The task of reliably spotting such expressions in video sequences is currently unsolved. Current best systems depend upon optical flow methods to extract regional motion features before categorising that motion into a specific class of facial movement. Optical flow is susceptible to drift error, which introduces a serious problem for motions with long-term dependencies, such as high frame-rate macro-expressions. We propose a purely deep learning solution which, rather than tracking frame-differential motion, compares each frame with two temporally local reference frames via a convolutional model. Reference frames are sampled according to the calculated micro- and macro-expression durations. Using the MEGC2021 baseline protocol with leave-one-subject-out evaluation, we show that our solution performs better on the high frame-rate (200 fps) SAMM Long Videos dataset (SAMM-LV) than on the low frame-rate (30 fps) CAS(ME)2 dataset. We introduce a new unseen dataset for the MEGC2022 challenge (MEGC2022-testSet) and achieve an F1-score of 0.1531 as a baseline result.
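The reference-frame mechanism can be illustrated with a minimal sketch. The snippet below assumes PyTorch, grayscale input frames, and illustrative offsets of 6 and 60 frames for the micro- and macro-expression temporal scales; the network `TripletSpotter3D`, the helper names, and all hyperparameters are hypothetical stand-ins, not the configuration reported by the authors.

```python
# Hypothetical sketch of the reference-frame idea described in the abstract:
# pair every frame with two temporally local reference frames, one at a
# micro-expression-scale offset and one at a macro-expression-scale offset,
# and score the stack with a small 3D convolutional network.
import torch
import torch.nn as nn


class TripletSpotter3D(nn.Module):
    """Tiny 3D-CNN that scores (micro reference, current frame, macro reference) stacks."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 8, kernel_size=(3, 3, 3), padding=(0, 1, 1)),  # depth 3 -> 1
            nn.ReLU(),
            nn.AdaptiveAvgPool3d((1, 4, 4)),
        )
        self.head = nn.Linear(8 * 4 * 4, 1)  # expression-vs-neutral score per frame

    def forward(self, clip):  # clip: (B, 1, 3, H, W)
        x = self.features(clip).flatten(1)
        return self.head(x).squeeze(-1)


def sample_reference_indices(t, micro_offset, macro_offset):
    """Pick two temporally local reference frames for frame t (offsets are assumptions)."""
    return max(t - micro_offset, 0), max(t - macro_offset, 0)


def spot_scores(frames, model, micro_offset=6, macro_offset=60):
    """frames: (T, H, W) grayscale tensor; returns one spotting score per frame."""
    scores = []
    for t in range(frames.shape[0]):
        r_mi, r_ma = sample_reference_indices(t, micro_offset, macro_offset)
        clip = torch.stack([frames[r_mi], frames[t], frames[r_ma]])  # (3, H, W)
        clip = clip.unsqueeze(0).unsqueeze(0)                        # (1, 1, 3, H, W)
        with torch.no_grad():
            scores.append(model(clip).item())
    return scores


if __name__ == "__main__":
    model = TripletSpotter3D().eval()
    video = torch.rand(200, 64, 64)  # stand-in for one second of a 200 fps clip
    print(spot_scores(video, model)[:5])
```

In such a scheme, the per-frame scores would still need to be thresholded and grouped into candidate intervals to produce the spotted expressions evaluated by the F1-score; that post-processing step is omitted here.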