DOI: 10.1145/319463.319484

Multimodal people ID for a multimedia meeting browser

Published: 30 October 1999

Abstract

A meeting browser is a system that allows users to review a multimedia meeting record using a variety of indexing methods. Identification of meeting participants is essential for creating such a multimedia meeting record. Moreover, knowing who is speaking can improve speech recognition and the indexing of meeting transcripts. In this paper, we present an approach that identifies meeting participants by fusing multimodal inputs. We use face ID, speaker ID, color appearance ID, and sound source directional ID to identify and track meeting participants. After describing the different modules in detail, we discuss a framework for combining the information sources. Integration of the multimodal people ID into the multimedia meeting browser is in its preliminary stage.
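
The abstract does not say which rule the framework uses to combine the four information sources. As a rough illustration only, the sketch below shows one common late-fusion option, a weighted product rule over per-modality identity scores. The function name `fuse_scores`, the modality keys, the weights, and the example scores are all hypothetical and are not taken from the paper.

```python
import math


def fuse_scores(modality_scores, weights=None, eps=1e-6):
    """Fuse per-modality identity scores with a weighted product rule.

    modality_scores: dict mapping a modality name to a dict of
        {person: score in (0, 1]}. All names here are hypothetical.
    weights: optional dict mapping a modality name to a reliability weight.
    Returns a dict {person: fused probability} that sums to 1.
    """
    if weights is None:
        weights = {name: 1.0 for name in modality_scores}

    # Candidate set: every person mentioned by at least one modality.
    people = sorted({p for scores in modality_scores.values() for p in scores})

    # Sum weighted log-scores; a missing score falls back to a small floor.
    log_fused = {}
    for person in people:
        total = 0.0
        for name, scores in modality_scores.items():
            score = max(scores.get(person, eps), eps)
            total += weights[name] * math.log(score)
        log_fused[person] = total

    # Normalize in a numerically stable way (subtract the maximum first).
    peak = max(log_fused.values())
    unnormalized = {p: math.exp(v - peak) for p, v in log_fused.items()}
    z = sum(unnormalized.values())
    return {p: v / z for p, v in unnormalized.items()}


# Made-up scores for two hypothetical participants.
scores = {
    "face_id": {"alice": 0.6, "bob": 0.4},
    "speaker_id": {"alice": 0.7, "bob": 0.3},
    "color_appearance_id": {"alice": 0.5, "bob": 0.5},
    "sound_direction_id": {"alice": 0.8, "bob": 0.2},
}
fused = fuse_scores(scores)
best = max(fused, key=fused.get)
print(best, round(fused[best], 3))
```

A product rule treats the modalities as conditionally independent, which is a simplifying assumption; lowering the weight of an unreliable modality (for example, face ID when no frontal face is visible) reduces its influence on the fused result.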

Published In

MULTIMEDIA '99: Proceedings of the seventh ACM international conference on Multimedia (Part 1)
October 1999
516 pages
ISBN: 1581131518
DOI: 10.1145/319463
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data fusion
  2. meeting browser
  3. multimedia
  4. multimodal
  5. people identification

Qualifiers

  • Article

Conference

MM99: ACM Multimedia 1999
October 30 - November 5, 1999
Orlando, Florida, USA

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Cited By

  • (2024) ConVoiFilter: A Case Study of Doing Cocktail Party Speech Recognition. 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW), pp. 565-569. DOI: 10.1109/ICASSPW62465.2024.10626098. Online publication date: 14-Apr-2024
  • (2013) People reidentification in surveillance and forensics. ACM Computing Surveys 46:2, pp. 1-37. DOI: 10.1145/2543581.2543596. Online publication date: 27-Dec-2013
  • (2013) User Interface Patterns for Multimodal Interaction. Transactions on Pattern Languages of Programming III, pp. 111-167. DOI: 10.1007/978-3-642-38676-3_4. Online publication date: 2013
  • (2010) Smart meeting systems. ACM Computing Surveys 42:2, pp. 1-20. DOI: 10.1145/1667062.1667065. Online publication date: 5-Mar-2010
  • (2009) Communicative gestures in coreference identification in multiparty meetings. Proceedings of the 2009 international conference on Multimodal interfaces, pp. 211-218. DOI: 10.1145/1647314.1647352. Online publication date: 2-Nov-2009
  • (2009) Multimodal identity tracking in a smart room. Personal and Ubiquitous Computing 13:1, pp. 25-31. DOI: 10.1007/s00779-007-0175-y. Online publication date: 1-Jan-2009
  • (2008) Probabilistic integration of sparse audio-visual cues for identity tracking. Proceedings of the 16th ACM international conference on Multimedia, pp. 151-158. DOI: 10.1145/1459359.1459380. Online publication date: 26-Oct-2008
  • (2008) As go the feet... Proceedings of the 10th international conference on Multimodal interfaces, pp. 97-104. DOI: 10.1145/1452392.1452412. Online publication date: 20-Oct-2008
  • (2007) Towards smart meeting. Proceedings of the 9th international conference on Multimodal interfaces, pp. 86-93. DOI: 10.1145/1322192.1322210. Online publication date: 12-Nov-2007
  • (2007) Audio-visual multi-person tracking and identification for smart environments. Proceedings of the 15th ACM international conference on Multimedia, pp. 661-670. DOI: 10.1145/1291233.1291388. Online publication date: 29-Sep-2007
