Abstract
In this paper, we demonstrate a system that automates the recording of video lectures in classrooms. Using dedicated hardware (lecturer- and audience-facing cameras and microphone arrays), we record the lecture from multiple points of view. Person detection and tracking, together with recognition of different human actions, are used to digitally zoom in on the lecturer and to alternate focus between the lecturer and the slides or the blackboard. Sound source localization, together with face detection and tracking, is used to detect questions from the audience, to digitally zoom in on the audience member asking the question, and to improve the quality of the sound recording. Finally, an automatic video editing system switches naturally between the different video streams and composes a compelling end product. We demonstrate the working system in two classrooms, over two 2-hour lectures given by two lecturers.
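To make the sound source localization step more concrete, the following is a minimal sketch of one common technique for a single microphone pair: generalized cross-correlation with phase transform (GCC-PHAT), followed by a far-field conversion from delay to arrival angle. The abstract does not state which localization method the system uses, so this is an illustration under stated assumptions, not the system's implementation. The function names `gcc_phat_delay` and `delay_to_angle`, the sampling rate, and the microphone spacing are all hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def gcc_phat_delay(sig, ref, fs, max_tau=None):
    """Estimate the delay (in seconds) of `sig` relative to `ref` with GCC-PHAT.

    Hypothetical helper for illustration; a positive result means that
    `sig` arrives later than `ref`.
    """
    n = len(sig) + len(ref)  # zero-pad so the correlation is linear, not circular
    sig_f = np.fft.rfft(sig, n=n)
    ref_f = np.fft.rfft(ref, n=n)
    cross = sig_f * np.conj(ref_f)
    cross /= np.abs(cross) + 1e-12  # PHAT weighting: keep phase, drop magnitude
    cc = np.fft.irfft(cross, n=n)

    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    # Reorder so that index 0 corresponds to lag -max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(cc)) - max_shift
    return shift / fs


def delay_to_angle(tau, mic_distance):
    """Convert a delay to an arrival angle (radians) under a far-field assumption."""
    ratio = np.clip(SPEED_OF_SOUND * tau / mic_distance, -1.0, 1.0)
    return float(np.arcsin(ratio))


if __name__ == "__main__":
    fs = 16000                      # assumed sampling rate in Hz
    rng = np.random.default_rng(0)
    ref = rng.standard_normal(1024)
    delay_samples = 5
    sig = np.concatenate((np.zeros(delay_samples), ref[:-delay_samples]))
    tau = gcc_phat_delay(sig, ref, fs)
    angle = delay_to_angle(tau, mic_distance=0.2)  # assumed 20 cm spacing
    print(f"delay: {tau * fs:.0f} samples, angle: {np.degrees(angle):.1f} degrees")
```

In a setup like the one described, an angle estimated this way could be used to choose which region of the audience-facing video to zoom in on, with face detection then refining the crop. That coupling is a plausible reading of the abstract, not a documented detail.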
This work is supported by the Cametron Project grant.
Excluding the corresponding author, authors are listed in alphabetical order.
Notes
- 1. Seminar recordings: https://youtu.be/DalAafs38TU; Matthew recordings: https://youtu.be/p3ZeFfj238g
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Hulens, D. et al. (2018). The CAMETRON Lecture Recording System: High Quality Video Recording and Editing with Minimal Human Supervision. In: Schoeffmann, K., et al. MultiMedia Modeling. MMM 2018. Lecture Notes in Computer Science, vol 10704. Springer, Cham. https://doi.org/10.1007/978-3-319-73603-7_42
DOI: https://doi.org/10.1007/978-3-319-73603-7_42
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73602-0
Online ISBN: 978-3-319-73603-7
eBook Packages: Computer Science, Computer Science (R0)