Abstract
Motor behaviour analysis is essential to biomedical research and clinical diagnostics, as it provides a non-invasive strategy for identifying motor impairment and tracking how it changes in response to interventions. State-of-the-art instrumented movement analysis is time- and cost-intensive because it requires the placement of physical or virtual markers. Beyond the effort of marking keypoints or creating the annotations needed to train or fine-tune a detector, users must know the behaviour of interest beforehand to provide meaningful keypoints. Here, we introduce unsupervised behaviour analysis and magnification (uBAM), an automatic deep learning algorithm for analysing behaviour by discovering and magnifying deviations. A central aspect is the unsupervised learning of posture and behaviour representations, which enables an objective comparison of movement. Besides discovering and quantifying deviations in behaviour, we also propose a generative model for visually magnifying subtle behaviour differences directly in a video, without a detour via keypoints or annotations. Essential for this magnification of deviations, even across different individuals, is a disentangling of appearance and behaviour. Evaluations on rodents and human patients with neurological diseases demonstrate the wide applicability of our approach. Moreover, combining optogenetic stimulation with our unsupervised behaviour analysis shows its suitability as a non-invasive diagnostic tool that correlates function to brain plasticity.
Data availability
The rat data can be downloaded at https://hci.iwr.uni-heidelberg.de/compvis_files/Rats.zip. The optogenetics data can be downloaded at https://hci.iwr.uni-heidelberg.de/compvis_files/Optogenetics.zip. The mice data can be downloaded at https://hci.iwr.uni-heidelberg.de/compvis_files/Mice.zip. The human dataset cannot be publicly released because of privacy issues (please contact the authors if needed).
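For convenience, the public archives listed above can be fetched programmatically. The following is a minimal Python sketch using only the standard library; the archive URLs are taken verbatim from the statement above, while the local directory layout ('data/') is an arbitrary choice and not part of the released code.

```python
# Minimal sketch for fetching the public datasets listed above.
# The URLs come from the data availability statement; the "data/" target is our own choice.
import urllib.request
import zipfile
from pathlib import Path

DATASETS = {
    "rats": "https://hci.iwr.uni-heidelberg.de/compvis_files/Rats.zip",
    "optogenetics": "https://hci.iwr.uni-heidelberg.de/compvis_files/Optogenetics.zip",
    "mice": "https://hci.iwr.uni-heidelberg.de/compvis_files/Mice.zip",
}

def download_and_extract(name: str, url: str, root: Path = Path("data")) -> Path:
    """Download one archive (if not already present) and unpack it into root/name."""
    root.mkdir(parents=True, exist_ok=True)
    archive = root / f"{name}.zip"
    if not archive.exists():
        urllib.request.urlretrieve(url, archive)
    target = root / name
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
    return target

if __name__ == "__main__":
    for name, url in DATASETS.items():
        print("extracted to", download_and_extract(name, url))
```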
Code availability
The code for training and evaluating our models is publicly available on GitHub at the following address: https://github.com/utabuechler/uBAM (ref. 59).
References
Berman, G. J. Measuring behavior across scales. BMC Biol. 16, 23 (2018).
Filli, L. et al. Profiling walking dysfunction in multiple sclerosis: characterisation, classification and progression over time. Sci. Rep. 8, 4984 (2018).
Vargas-Irwin, C. E. et al. Decoding complete reach and grasp actions from local primary motor cortex populations. J. Neurosci. 30, 9659–9669 (2010).
Loper, M. M., Mahmood, N. & Black, M. J. MoSh: motion and shape capture from sparse markers. ACM Trans. Graph. 33, 220:1–220:13 (2014).
Huang, Y. et al. Deep inertial poser: learning to reconstruct human pose from sparse inertial measurements in real time. ACM Trans. Graph. 37, 185:1–185:15 (2018).
Robie, A. A., Seagraves, K. M., Egnor, S. R. & Branson, K. Machine vision methods for analyzing social interactions. J. Exp. Biol. 220, 25–34 (2017).
Dell, A. I. et al. Automated image-based tracking and its application in ecology. Trends Ecol. Evol. 29, 417–428 (2014).
Peters, S. M. et al. Novel approach to automatically classify rat social behavior using a video tracking system. J. Neurosci. Methods 268, 163–170 (2016).
Arac, A., Zhao, P., Dobkin, B. H., Carmichael, S. T. & Golshani, P. DeepBehavior: a deep learning toolbox for automated analysis of animal and human behavior imaging data. Front. Syst. Neurosci. 13, 20 (2019).
Graving, J. M. et al. DeepPoseKit, a software toolkit for fast and robust animal pose estimation using deep learning. eLife 8, e47994 (2019).
Pereira, T. D. et al. Fast animal pose estimation using deep neural networks. Nat. Methods 16, 117–125 (2019).
Mathis, A. et al. DeepLabCut: markerless pose estimation of user-defined body parts with deep learning. Nat. Neurosci. 21, 1281–1289 (2018).
Simon, T., Joo, H., Matthews, I. & Sheikh, Y. Hand keypoint detection in single images using multiview bootstrapping. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 1145–1153 (IEEE, 2017).
Nath, T. et al. Using DeepLabCut for 3D markerless pose estimation across species and behaviors. Nat. Protoc. 14, 2152–2176 (2019).
Mathis, M. W. & Mathis, A. Deep learning tools for the measurement of animal behavior in neuroscience. Curr. Opin. Neurobiol. 60, 1–11 (2020).
Mu, J., Qiu, W., Hager, G. D. & Yuille, A. L. Learning from synthetic animals. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 12386–12395 (IEEE, 2020).
Li, S. et al. Deformation-aware unpaired image translation for pose estimation on laboratory animals. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 13158–13168 (IEEE, 2020).
Sanakoyeu, A., Khalidov, V., McCarthy, M. S., Vedaldi, A. & Neverova, N. Transferring dense pose to proximal animal classes. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 5233–5242 (IEEE, 2020).
Kocabas, M., Athanasiou, N. & Black, M. J. VIBE: video inference for human body pose and shape estimation. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 5253–5263 (IEEE, 2020).
Loper, M., Mahmood, N., Romero, J., Pons-Moll, G. & Black, M. J. SMPL: a skinned multi-person linear model. ACM Trans. Graph. 34, 248:1–248:16 (2015).
Zuffi, S., Kanazawa, A., Berger-Wolf, T. & Black, M. J. Three-D Safari: learning to estimate zebra pose, shape and texture from images ‘in the wild’. In Proc. IEEE/CVF International Conference on Computer Vision 5359–5368 (IEEE, 2019).
Habermann, M., Xu, W., Zollhofer, M., Pons-Moll, G. & Theobalt, C. DeepCap: monocular human performance capture using weak supervision. In Proc IEEE/CVF Conference on Computer Vision and Pattern Recognition 5052–5063 (IEEE, 2020).
Batty, E. et al. BehaveNet: nonlinear embedding and Bayesian neural decoding of behavioral videos. In Advances in Neural Information Processing Systems 15680–15691 (NIPS, 2019).
Ryait, H. et al. Data-driven analyses of motor impairments in animal models of neurological disorders. PLoS Biol. 17, 1–30 (2019).
Kabra, M., Robie, A. A., Rivera-Alba, M., Branson, S. & Branson, K. JAABA: interactive machine learning for automatic annotation of animal behavior. Nat. Methods 10, 64–67 (2012).
Brattoli, B., Büchler, U., Wahl, A. S., Schwab, M. E. & Ommer, B. LSTM self-supervision for detailed behavior analysis. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 3747–3756 (IEEE, 2017).
Büchler, U., Brattoli, B. & Ommer, B. Improving spatiotemporal self-supervision by deep reinforcement learning. In Proc. European Conference on Computer Vision 770–776 (Springer, 2018).
Noroozi, M. & Favaro, P. Unsupervised learning of visual representations by solving jigsaw puzzles. In Proc. European Conference on Computer Vision 69–84 (Springer, 2016).
Lee, H. Y., Huang, J. B., Singh, M. K. & Yang, M. H. Unsupervised representation learning by sorting sequences. In Proc. IEEE International Conference on Computer Vision 667–676 (IEEE, 2017).
Oh, T. H. et al. Learning-based video motion magnification. In Proc. European Conference on Computer Vision 633–648 (Springer, 2018).
Liu, C., Torralba, A., Freeman, W. T., Durand, F. & Adelson, E. H. Motion magnification. ACM Trans. Graph. 24, 519–526 (2005).
Wu, H. Y. et al. Eulerian video magnification for revealing subtle changes in the world. ACM Trans. Graph. 31, 65 (2012).
Elgharib, M., Hefeeda, M., Durand, F. & Freeman, W. T. Video magnification in presence of large motions. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 4119–4127 (IEEE, 2015).
Wadhwa, N., Rubinstein, M., Durand, F. & Freeman, W. T. Phase-based video motion processing. ACM Trans. Graph. 32, 80 (2013).
Wadhwa, N., Rubinstein, M., Durand, F. & Freeman, W. T. Riesz pyramids for fast phase-based video magnification. In Proc. International Conference on Computational Photography 1–10 (IEEE, 2014).
Zhang, Y., Pintea, S. L. & Van Gemert, J. C. Video acceleration magnification. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 529–537 (IEEE, 2017).
Tulyakov, S. et al. Self-adaptive matrix completion for heart rate estimation from face videos under realistic conditions. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 2396–2404 (IEEE, 2016).
Dekel, T., Michaeli, T., Irani, M. & Freeman, W. T. Revealing and modifying non-local variations in a single image. ACM Trans. Graph. 34, 227 (2015).
Wadhwa, N., Dekel, T., Wei, D., Durand, F. & Freeman, W. T. Deviation magnification: revealing departures from ideal geometries. ACM Trans. Graph. 34, 226 (2015).
Kingma, D. P. & Welling, M. Auto-encoding variational Bayes. In Proc. 2nd International Conference on Learning Representations (ICLR, 2014).
Goodfellow, I. et al. Generative adversarial nets. In Proc. Advances in Neural Information Processing Systems Vol. 27, 2672–2680 (NIPS, 2014).
Esser, P., Sutter, E. & Ommer, B. A variational U-Net for conditional appearance and shape generation. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 8857–8866 (IEEE, 2018).
Goodman, A. D. et al. Sustained-release oral fampridine in multiple sclerosis: a randomised, double-blind, controlled trial. Lancet 373, 732–738 (2009).
Zörner, B. et al. Prolonged-release fampridine in multiple sclerosis: improved ambulation effected by changes in walking pattern. Mult. Scler. 22, 1463–1475 (2016).
Schniepp, R. et al. Walking assessment after lumbar puncture in normal-pressure hydrocephalus: a delayed improvement over 3 days. J. Neurosurg. 126, 148–157 (2017).
Tran, D. et al. A closer look at spatiotemporal convolutions for action recognition. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 6450–6459 (IEEE, 2018).
van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).
Lafferty, C. K. & Britt, J. P. Off-target influences of arch-mediated axon terminal inhibition on network activity and behavior. Front. Neural Circuits 14, 10 (2020).
Miao, C. et al. Hippocampal remapping after partial inactivation of the medial entorhinal cortex. Neuron 88, 590–603 (2015).
Carta, I., Chen, C. H., Schott, A. L., Dorizan, S. & Khodakhah, K. Cerebellar modulation of the reward circuitry and social behavior. Science 363, eaav0581 (2019).
Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet classification with deep convolutional neural networks. In Proc. Advances in Neural Information Processing Systems 1097–1105 (NIPS, 2012).
Hinton, G. E. & Salakhutdinov, R. R. Reducing the dimensionality of data with neural networks. Science 313, 504–507 (2006).
Johnson, J., Alahi, A. & Fei-Fei, L. Perceptual losses for real-time style transfer and super-resolution. In Proc. European Conference on Computer Vision 694–711 (Springer, 2016).
Alaverdashvili, M. & Whishaw, I. Q. A behavioral method for identifying recovery and compensation: hand use in a preclinical stroke model using the single pellet reaching task. Neurosci. Biobehav. Rev. 37, 950–967 (2013).
Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Fisher, R. A. The use of multiple measurements in taxonomic problems. Ann. Eugenics 7, 179–188 (1936).
Wahl, A. S. et al. Optogenetically stimulating intact rat corticospinal tract post-stroke restores motor control through regionalized functional circuit formation. Nat. Commun. 8, 1187 (2017).
Cortes, C. & Vapnik, V. Support-vector networks. Mach. Learn. 20, 273–297 (1995).
Brattoli, B., Buechler, U. & Ommer, B. Source code of uBAM: first release (version v.1.0) (2020); https://github.com/utabuechler/uBAM. https://doi.org/10.5281/zenodo.4304070
Acknowledgements
This work was supported in part by German Research Foundation (DFG) projects 371923335 and 421703927 to B.O., as well as by the Branco Weiss Fellowship Society in Science and Swiss National Science Foundation grant no. 192678 to A.-S.W.
Author information
Contributions
B.B., U.B. and B.O. developed uBAM. B.B. and U.B. implemented and evaluated the framework and M.D. and P.R. the VAE. A.-S.W., L.F. and F.H. conducted the biomedical experiments and validated the results. B.B., U.B. and B.O. prepared the figures with input from A.-S.W. All authors contributed to writing the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information Nature Machine Intelligence thanks Ahmet Arac, Sven Dickinson and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Qualitative comparison with the state-of-the-art in motion magnification.
To compare our results with Oh et al.(30), we show five clips from different impaired subjects before and after magnification for both methods. First, we re-synthesise the healthy reference behaviour to change its appearance to that of the impaired subject, so that differences in posture can be studied directly (first row; see Methods). The second row is the query impaired sequence. The third and fourth rows show the magnified frames produced by the method of Oh et al.(30) and by our approach, respectively. The magnified results, highlighted by magenta markers, show that the method of Oh et al. corrupts the subject's appearance, while our method emphasises the differences in posture without altering the appearance. (Details in Supplementary).
Extended Data Fig. 2 Quantitative comparison with the state-of-the-art in motion magnification.
a: Mean-squared difference (white = 0) between the original query frame and its magnification, for our method and the approach proposed by Oh et al.(30). For impaired subjects, our method modifies only the leg posture, while healthy subjects are left unaltered; Oh et al.(30) mostly changes the background and alters impaired and healthy subjects indiscriminately. b: Fraction of frames with an important deviation from the healthy reference behaviour, measured for each subject and video sequence; the distribution of these scores is plotted. c: Mean and standard deviation of the deviation scores per cohort and approach. (Details in Supplementary).
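As a pointer to how the quantities in panels a and b can be reproduced on one's own data, the sketch below illustrates a per-pixel squared-difference map between a frame and its magnified counterpart, and the fraction of frames in a sequence whose mean difference exceeds a threshold. This is a minimal illustration with assumed numpy inputs and a hypothetical threshold, not the authors' evaluation code.

```python
# Illustration of the two quantities in this figure (assumed inputs, not the
# authors' evaluation code): frames are numpy arrays of shape (H, W, 3) in [0, 1].
import numpy as np

def magnification_difference(original: np.ndarray, magnified: np.ndarray) -> np.ndarray:
    """Panel a: per-pixel squared difference between a frame and its magnification,
    averaged over colour channels (white = 0 in the figure)."""
    assert original.shape == magnified.shape
    return ((original - magnified) ** 2).mean(axis=-1)

def fraction_deviating_frames(original_seq, magnified_seq, threshold=1e-3):
    """Panel b: fraction of frames in one sequence whose mean difference exceeds
    a threshold (the threshold value here is a hypothetical choice)."""
    scores = [magnification_difference(o, m).mean()
              for o, m in zip(original_seq, magnified_seq)]
    return float(np.mean([s > threshold for s in scores]))
```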
Extended Data Fig. 3 Abnormal postures before and after magnification.
We show that our magnification supports spotting abnormal postures by applying a generic classifier to our behaviour-magnified frames. This doubles the number of detected abnormal postures without introducing a substantial number of false positives. In particular, we use a one-class linear SVM on ImageNet features, trained only on one group (that is, healthy), and predict abnormalities on healthy and impaired subjects before and after magnification. The ratio of abnormalities is unaltered within the healthy cohort (~2%), while it doubles in the impaired cohort (from 5.7% to 11.7%), showing that our magnification method detects and magnifies small deviations without artificially introducing abnormalities. (Details in Supplementary).
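A minimal sketch of this abnormality test, assuming precomputed ImageNet-CNN features as numpy arrays; the one-class SVM here is scikit-learn's OneClassSVM with a linear kernel, and the nu setting is a hypothetical choice rather than the value used in the paper.

```python
# Sketch of the abnormality test (assumptions: features are precomputed
# ImageNet-CNN descriptors; nu is a hypothetical setting).
import numpy as np
from sklearn.svm import OneClassSVM

def abnormality_ratio(healthy_train_feats: np.ndarray,
                      query_feats: np.ndarray,
                      nu: float = 0.02) -> float:
    """Fit a one-class linear SVM on healthy frames only and return the fraction
    of query frames flagged as abnormal (prediction == -1)."""
    clf = OneClassSVM(kernel="linear", nu=nu).fit(healthy_train_feats)
    return float((clf.predict(query_feats) == -1).mean())

# Example: compare abnormality_ratio(healthy_feats, impaired_feats) with
# abnormality_ratio(healthy_feats, impaired_feats_magnified).
```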
Extended Data Fig. 4 Qualitative evaluation of our posture encoding on the rat grasping dataset.
Projection from our posture encoding to a 2D embedding of 1,000 randomly chosen postures using t-SNE. Similar postures are located close to each other, and the grasping action can be reconstructed by following the circle clockwise (best viewed by zooming in on the digital version of this figure). (Details in Supplementary).
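The embedding itself can be reproduced along the following lines; posture_codes stands for the (N, d) output of the posture encoder and is assumed to be precomputed, and the sample size of 1,000 matches the caption.

```python
# Sketch of the 2D embedding (posture_codes is an assumed, precomputed (N, d) array).
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

def plot_posture_embedding(posture_codes: np.ndarray, n_samples: int = 1000, seed: int = 0):
    """Randomly subsample posture encodings, project them with t-SNE and scatter them."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(posture_codes), size=min(n_samples, len(posture_codes)), replace=False)
    xy = TSNE(n_components=2, random_state=seed).fit_transform(posture_codes[idx])
    plt.scatter(xy[:, 0], xy[:, 1], s=5)
    plt.title("t-SNE of posture encodings")
    plt.show()
```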
Extended Data Fig. 5 Comparison with PCA of posture encoding.
a: A single video clip projected onto the two most important factors of variation using PCA directly on the RGB input (left) and on our representation (right). Consecutive frames are connected by straight lines colourised according to the time within the video; every fourth frame, the original frame is plotted. PCA is able to sort the frames over time automatically, with each cycle overlapping the previous one. Our representation separates different postures more clearly, as reflected by the circular shape of the embedding. b: Same as a, but including more videos, with each colour representing a different subject. In this case, PCA is strongly biased towards the subject appearance: it separates subjects and does not allow behaviour to be compared across them. c: We reduce the appearance bias by normalising each video with its mean appearance. The result still shows subject separation and no similarity of posture across subjects. d: Using our posture representation and applying PCA on Eπ instead of directly on video frames shows no subject bias, and only similar postures are near each other in the 2D space. (Details in Supplementary).
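For reference, the projections compared here amount to running PCA on different inputs. The sketch below, with assumed arrays frames, video_ids and posture_codes (the Eπ encodings), illustrates those three variants and is not the authors' plotting code.

```python
# Sketch of the three PCA variants compared in this figure (inputs are assumed:
# frames is (N, H, W, 3), video_ids is (N,), posture_codes are the E_pi encodings).
import numpy as np
from sklearn.decomposition import PCA

def pca_2d(features: np.ndarray) -> np.ndarray:
    """Project flattened features onto the two leading principal components."""
    return PCA(n_components=2).fit_transform(features.reshape(len(features), -1))

def pca_on_rgb(frames: np.ndarray) -> np.ndarray:
    """Panels a/b: PCA directly on the raw RGB frames."""
    return pca_2d(frames.astype(np.float32))

def pca_on_mean_normalised(frames: np.ndarray, video_ids: np.ndarray) -> np.ndarray:
    """Panel c: subtract the per-video mean appearance before PCA."""
    normed = frames.astype(np.float32).copy()
    for v in np.unique(video_ids):
        normed[video_ids == v] -= normed[video_ids == v].mean(axis=0)
    return pca_2d(normed)

def pca_on_posture(posture_codes: np.ndarray) -> np.ndarray:
    """Panel d: PCA on the posture encoding instead of on pixels."""
    return pca_2d(posture_codes)
```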
Extended Data Fig. 6 Disentanglement comparison with simple baseline.
We transfer the posture of one subject (rows) to others with a different appearance (columns). a: A baseline model that uses the average of the video frames as the appearance; this appearance is subtracted from each frame to extract the posture. b: Disentanglement using our custom VAE to extract posture and appearance. Checking for consistency in posture along a row and for similarity in appearance along a column shows that disentanglement is a hard problem: a pixel-based representation cannot solve the task, while our model produces more detailed and realistic images. (Details in Supplementary).
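The baseline of panel a is simple enough to state in a few lines: the per-video mean frame serves as the appearance and the residual of each frame as the posture, so posture transfer just adds a residual to another subject's mean frame. The sketch below assumes frames as float arrays in [0, 1] and is, of course, the baseline rather than the VAE-based disentanglement.

```python
# Pixel-space baseline of panel a (not the VAE): the mean frame is the appearance,
# the residual is the posture. Frames are assumed to be float arrays in [0, 1].
import numpy as np

def mean_appearance(frames: np.ndarray) -> np.ndarray:
    """Appearance of one video: the average of its frames, (N, H, W, 3) -> (H, W, 3)."""
    return frames.mean(axis=0)

def transfer_posture_baseline(source_frame: np.ndarray,
                              source_frames: np.ndarray,
                              target_frames: np.ndarray) -> np.ndarray:
    """Add the source frame's residual (its 'posture') to the target subject's appearance."""
    posture_residual = source_frame - mean_appearance(source_frames)
    return np.clip(mean_appearance(target_frames) + posture_residual, 0.0, 1.0)
```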
Extended Data Fig. 7 DeepLabCut trainset size.
We train DLC models on a growing number of training samples. The models are evaluated as described in Fig. 2 of the main manuscript. Note the limited gain in performance despite the amount of annotation increasing by more than an order of magnitude. (Details in Supplementary).
Extended Data Fig. 8 Comparison with R3D.
Besides JAABA and DLC, we also compare our method with R3D, another non-parametric model that is very popular for video classification. We extract R3D features and evaluate the representation using the same protocol as for our method. Our model is better suited to behaviour analysis. More information regarding the evaluation protocol can be found in the Methods section of the main manuscript. (Details in Supplementary).
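The R3D features referred to here can be obtained, for example, from torchvision's pretrained R3D-18 by dropping its classification head; the snippet below is a sketch of that feature extraction (preprocessing and the downstream evaluation protocol are omitted), not the exact pipeline used in the paper.

```python
# Sketch of R3D feature extraction with torchvision's pretrained R3D-18
# (preprocessing and the downstream evaluation are omitted).
import torch
from torchvision.models.video import r3d_18

model = r3d_18(weights="DEFAULT")   # older torchvision versions use pretrained=True
model.fc = torch.nn.Identity()      # drop the Kinetics classifier, keep 512-d clip features
model.eval()

@torch.no_grad()
def r3d_features(clips: torch.Tensor) -> torch.Tensor:
    """clips: (B, 3, T, H, W) float tensor, already resized and normalised -> (B, 512)."""
    return model(clips)
```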
Extended Data Fig. 9 Keypoint regression.
We show qualitative results for the keypoint regression from our posture representation and for the end-to-end inferred keypoints of DLC. This experiment was computed on 14 keypoints; for clarity, we show only 6: the wrist (yellow), the start of the first finger (purple) and the tip of each finger. The ground-truth location is shown with a circle and the detection inferred by the model with a cross. Even though our representation was not trained on keypoint detection, for some frames we recover keypoints as well as, or even better than, DLC, which was trained end-to-end on the task. We study the gap in performance in more detail in the Supplementary Information (Supplementary Fig. 3).
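A possible form of such a regression, shown purely for illustration: a linear (here ridge) regressor maps the posture encoding to the 28 coordinates of the 14 keypoints, and a mean pixel error compares predictions with ground truth. The regressor type, its regularisation and the variable names are our assumptions, not the paper's exact setup.

```python
# Illustrative keypoint regression (regressor type and names are our assumptions):
# a ridge regressor maps (N, d) posture encodings to the 14 x 2 keypoint coordinates.
import numpy as np
from sklearn.linear_model import Ridge

def fit_keypoint_regressor(train_codes: np.ndarray, train_keypoints: np.ndarray) -> Ridge:
    """train_keypoints has shape (N, 14, 2); it is flattened to 28 regression targets."""
    return Ridge(alpha=1.0).fit(train_codes, train_keypoints.reshape(len(train_codes), -1))

def predict_keypoints(regressor: Ridge, codes: np.ndarray) -> np.ndarray:
    return regressor.predict(codes).reshape(len(codes), 14, 2)

def mean_pixel_error(pred: np.ndarray, gt: np.ndarray) -> float:
    """Average Euclidean distance between predicted and ground-truth keypoints."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())
```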
Extended Data Fig. 10 Typical high/low scoring grasps with optogenetics.
Given the classifier that produced Fig. 5b, we score all testing sequences from the same animal and show two typical sequences with high and low classification scores. A positive score indicates that the sequence was predicted as light-on, a negative score that it was predicted as light-off. Both sequences are correctly classified, as indicated by the ground truth (‘GT’) and classifier score (‘SVM-Score’). The sequence on the left shows a missed grasp, consistent with light-on inhibitory behaviour, while the same animal performs a successful grasp in the light-off sequence on the right. Note that the classifier cannot see the optical fibre, since this area was cropped out before the frames were passed to the classifier. (Details in Supplementary).
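For orientation, the kind of scoring described here can be sketched with a linear SVM whose signed decision value plays the role of the 'SVM-Score'; the features, labels and regularisation constant below are placeholders, not the trained classifier behind Fig. 5b.

```python
# Sketch of light-on/light-off scoring with a linear SVM; features, labels and C
# are placeholders, not the classifier behind Fig. 5b.
import numpy as np
from sklearn.svm import LinearSVC

def score_sequences(train_feats: np.ndarray, train_labels: np.ndarray,
                    test_feats: np.ndarray) -> np.ndarray:
    """train_labels: 1 for light-on, 0 for light-off. Returns signed decision values:
    positive -> predicted light-on, negative -> predicted light-off."""
    clf = LinearSVC(C=1.0).fit(train_feats, train_labels)
    return clf.decision_function(test_feats)
```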
Supplementary information
Supplementary Information
Supplementary Figs. 1–3, Tables 1–6 and Discussion.
About this article
Cite this article
Brattoli, B., Büchler, U., Dorkenwald, M. et al. Unsupervised behaviour analysis and magnification (uBAM) using deep learning. Nat Mach Intell 3, 495–506 (2021). https://doi.org/10.1038/s42256-021-00326-x
This article is cited by
- ONIX: a unified open-source platform for multimodal neural recording and perturbation during naturalistic behavior. Nature Methods (2024)
- SUBTLE: An Unsupervised Platform with Temporal Link Embedding that Maps Animal Behavior. International Journal of Computer Vision (2024)
- EXPLORE: a novel deep learning-based analysis method for exploration behaviour in object recognition tests. Scientific Reports (2023)