David F. Harwath
Person information
- affiliation (PhD 2018): Massachusetts Institute of Technology, Cambridge, USA
2020 – today
- 2024
- [c58] Jordan Voas, David Harwath, Raymond Mooney: Multimodal Contextualized Semantic Parsing from Speech. ACL (1) 2024: 7354-7369
- [c57] Puyuan Peng, Po-Yao Huang, Shang-Wen Li, Abdelrahman Mohamed, David Harwath: VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild. ACL (1) 2024: 12442-12462
- [c56] Changan Chen, Kumar Ashutosh, Rohit Girdhar, David Harwath, Kristen Grauman: SoundingActions: Learning How Actions Sound from Narrated Egocentric Videos. CVPR 2024: 27242-27252
- [c55] Anuj Diwan, Anirudh Srinivasan, David Harwath, Eunsol Choi: Textless Speech-to-Speech Translation With Limited Parallel Data. EMNLP (Findings) 2024: 16208-16224
- [c54] Hsuan-Fu Wang, Yi-Jen Shih, Heng-Jui Chang, Layne Berry, Puyuan Peng, Hung-Yi Lee, Hsin-Min Wang, David Harwath: SpeechCLIP+: Self-Supervised Multi-Task Representation Learning for Speech via CLIP and Speech-Image Data. ICASSP Workshops 2024: 465-469
- [c53] Hung-Chieh Fang, Nai-Xuan Ye, Yi-Jen Shih, Puyuan Peng, Hsuan-Fu Wang, Layne Berry, Hung-Yi Lee, David Harwath: Integrating Self-Supervised Speech Model with Pseudo Word-Level Targets from Visually-Grounded Speech Model. ICASSP Workshops 2024: 645-649
- [c52] Yuan Tseng, Layne Berry, Yiting Chen, I-Hsiang Chiu, Hsuan-Hao Lin, Max Liu, Puyuan Peng, Yi-Jen Shih, Hung-Yu Wang, Haibin Wu, Po-Yao Huang, Chun-Mao Lai, Shang-Wen Li, David Harwath, Yu Tsao, Abdelrahman Mohamed, Chi-Luen Feng, Hung-Yi Lee: AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models. ICASSP 2024: 6890-6894
- [c51] Zhisheng Zheng, Puyuan Peng, Ziyang Ma, Xie Chen, Eunsol Choi, David Harwath: BAT: Learning to Reason about Spatial Sounds with Large Language Models. ICML 2024
- [i47] Zhisheng Zheng, Puyuan Peng, Ziyang Ma, Xie Chen, Eunsol Choi, David Harwath: BAT: Learning to Reason about Spatial Sounds with Large Language Models. CoRR abs/2402.01591 (2024)
- [i46] Hung-Chieh Fang, Nai-Xuan Ye, Yi-Jen Shih, Puyuan Peng, Hsuan-Fu Wang, Layne Berry, Hung-yi Lee, David Harwath: Integrating Self-supervised Speech Model with Pseudo Word-level Targets from Visually-grounded Speech Model. CoRR abs/2402.05819 (2024)
- [i45] Hsuan-Fu Wang, Yi-Jen Shih, Heng-Jui Chang, Layne Berry, Puyuan Peng, Hung-yi Lee, Hsin-Min Wang, David Harwath: SpeechCLIP+: Self-supervised multi-task representation learning for speech via CLIP and speech-image data. CoRR abs/2402.06959 (2024)
- [i44] Puyuan Peng, Po-Yao Huang, Daniel Li, Abdelrahman Mohamed, David Harwath: VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild. CoRR abs/2403.16973 (2024)
- [i43] Changan Chen, Kumar Ashutosh, Rohit Girdhar, David Harwath, Kristen Grauman: SoundingActions: Learning How Actions Sound from Narrated Egocentric Videos. CoRR abs/2404.05206 (2024)
- [i42] Jordan Voas, Raymond Mooney, David Harwath: Multimodal Contextualized Semantic Parsing from Speech. CoRR abs/2406.06438 (2024)
- [i41] Changan Chen, Puyuan Peng, Ami Baid, Zihui Xue, Wei-Ning Hsu, David Harwath, Kristen Grauman: Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos. CoRR abs/2406.09272 (2024)
- [i40] Yi-Jen Shih, David Harwath: Interface Design for Self-Supervised Speech Models. CoRR abs/2406.12209 (2024)
- [i39] Yi-Jen Shih, Zoi Gkalitsiou, Alexandros G. Dimakis, David Harwath: Self-supervised Speech Models for Word-Level Stuttered Speech Detection. CoRR abs/2409.10704 (2024)
- [i38] Alan Baade, Puyuan Peng, David Harwath: SyllableLM: Learning Coarse Semantic Units for Speech Language Models. CoRR abs/2410.04029 (2024)
- 2023
- [c50] Anuj Diwan, Eunsol Choi, David Harwath: When to Use Efficient Self Attention? Profiling Text, Speech and Image Transformer Variants. ACL (2) 2023: 1639-1650
- [c49] Cheng-I Jeff Lai, Freda Shi, Puyuan Peng, Yoon Kim, Kevin Gimpel, Shiyu Chang, Yung-Sung Chuang, Saurabhchand Bhati, David D. Cox, David Harwath, Yang Zhang, Karen Livescu, James R. Glass: Audio-Visual Neural Syntax Acquisition. ASRU 2023: 1-8
- [c48] Layne Berry, Yi-Jen Shih, Hsuan-Fu Wang, Heng-Jui Chang, Hung-Yi Lee, David Harwath: M-SpeechCLIP: Leveraging Large-Scale, Pre-Trained Models for Multilingual Speech to Image Retrieval. ICASSP 2023: 1-5
- [c47] Changan Chen, Wei Sun, David Harwath, Kristen Grauman: Learning Audio-Visual Dereverberation. ICASSP 2023: 1-5
- [c46] Anuj Diwan, Ching-Feng Yeh, Wei-Ning Hsu, Paden Tomasello, Eunsol Choi, David Harwath, Abdelrahman Mohamed: Continual Learning for On-Device Speech Recognition Using Disentangled Conformers. ICASSP 2023: 1-5
- [c45] Reem Gody, David Harwath: Unsupervised Fine-Tuning Data Selection for ASR Using Self-Supervised Speech Models. ICASSP 2023: 1-5
- [c44] Dawei Liang, Zifan Xu, Yinuo Chen, Rebecca Adaimi, David Harwath, Edison Thomaz: A Dataset for Foreground Speech Analysis With Smartwatches In Everyday Home Environments. ICASSP Workshops 2023: 1-5
- [c43] Andrew Rouditchenko, Yung-Sung Chuang, Nina Shvetsova, Samuel Thomas, Rogério Feris, Brian Kingsbury, Leonid Karlinsky, David Harwath, Hilde Kuehne, James R. Glass: C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval. ICASSP 2023: 1-5
- [c42] Yuan Gong, Andrew Rouditchenko, Alexander H. Liu, David Harwath, Leonid Karlinsky, Hilde Kuehne, James R. Glass: Contrastive Audio-Visual Masked Autoencoder. ICLR 2023
- [c41] Puyuan Peng, Shang-Wen Li, Okko Räsänen, Abdelrahman Mohamed, David Harwath: Syllable Discovery and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model. INTERSPEECH 2023: 391-395
- [c40] Puyuan Peng, Brian Yan, Shinji Watanabe, David Harwath: Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization. INTERSPEECH 2023: 396-400
- [c39] Andrew Rouditchenko, Sameer Khurana, Samuel Thomas, Rogério Feris, Leonid Karlinsky, Hilde Kuehne, David Harwath, Brian Kingsbury, James R. Glass: Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-Training for Adaptation to Unseen Languages. INTERSPEECH 2023: 2268-2272
- [c38] Chiori Hori, Puyuan Peng, David Harwath, Xinyu Liu, Kei Ota, Siddarth Jain, Radu Corcodel, Devesh K. Jha, Diego Romeres, Jonathan Le Roux: Style-transfer based Speech and Audio-visual Scene Understanding for Robot Action Sequence Acquisition from Videos. INTERSPEECH 2023: 4663-4667
- [c37] Xixi Hu, Senthil Purushwalkam, David Harwath, Kristen Grauman: Learning to Map Efficiently by Active Echolocation. IROS 2023: 1505-1510
- [c36] Debadatta Dash, Paul Ferrari, Abbas Babajani-Feremi, David Harwath, Amir Borna, Jun Wang: Subject Generalization in Classifying Imagined and Spoken Speech with MEG. NER 2023: 1-4
- [i37] Puyuan Peng, Brian Yan, Shinji Watanabe, David Harwath: Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization. CoRR abs/2305.11095 (2023)
- [i36] Puyuan Peng, Shang-Wen Li, Okko Räsänen, Abdelrahman Mohamed, David Harwath: Syllable Discovery and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model. CoRR abs/2305.11435 (2023)
- [i35] Andrew Rouditchenko, Sameer Khurana, Samuel Thomas, Rogério Feris, Leonid Karlinsky, Hilde Kuehne, David Harwath, Brian Kingsbury, James R. Glass: Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-Training for Adaptation to Unseen Languages. CoRR abs/2305.12606 (2023)
- [i34] Anuj Diwan, Anirudh Srinivasan, David Harwath, Eunsol Choi: Unit-based Speech-to-Speech Translation Without Parallel Data. CoRR abs/2305.15405 (2023)
- [i33] Anuj Diwan, Eunsol Choi, David Harwath: When to Use Efficient Self Attention? Profiling Text, Speech and Image Transformer Variants. CoRR abs/2306.08667 (2023)
- [i32] Chiori Hori, Puyuan Peng, David Harwath, Xinyu Liu, Kei Ota, Siddarth Jain, Radu Corcodel, Devesh K. Jha, Diego Romeres, Jonathan Le Roux: Style-transfer based Speech and Audio-visual Scene Understanding for Robot Action Sequence Acquisition from Videos. CoRR abs/2306.15644 (2023)
- [i31] Yuan Tseng, Layne Berry, Yi-Ting Chen, I-Hsiang Chiu, Hsuan-Hao Lin, Max Liu, Puyuan Peng, Yi-Jen Shih, Hung-Yu Wang, Haibin Wu, Po-Yao Huang, Chun-Mao Lai, Shang-Wen Li, David Harwath, Yu Tsao, Shinji Watanabe, Abdelrahman Mohamed, Chi-Luen Feng, Hung-yi Lee: AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models. CoRR abs/2309.10787 (2023)
- [i30] Cheng-I Jeff Lai, Freda Shi, Puyuan Peng, Yoon Kim, Kevin Gimpel, Shiyu Chang, Yung-Sung Chuang, Saurabhchand Bhati, David D. Cox, David Harwath, Yang Zhang, Karen Livescu, James R. Glass: Audio-Visual Neural Syntax Acquisition. CoRR abs/2310.07654 (2023)
- 2022
- [c35] Nina Shvetsova, Brian Chen, Andrew Rouditchenko, Samuel Thomas, Brian Kingsbury, Rogério Feris, David Harwath, James R. Glass, Hilde Kuehne: Everything at Once - Multi-modal Fusion Transformer for Video Retrieval. CVPR 2022: 19988-19997
- [c34] Anuj Diwan, Layne Berry, Eunsol Choi, David Harwath, Kyle Mahowald: Why is Winoground Hard? Investigating Failures in Visuolinguistic Compositionality. EMNLP 2022: 2236-2250
- [c33] Puyuan Peng, David Harwath: Fast-Slow Transformer for Visually Grounding Speech. ICASSP 2022: 7727-7731
- [c32] David Xu, David Harwath: Adversarial Input Ablation for Audio-Visual Learning. ICASSP 2022: 7742-7746
- [c31] Tyler Miller, David Harwath: Exploring Few-Shot Fine-Tuning Strategies for Models of Visually Grounded Speech. INTERSPEECH 2022: 1416-1420
- [c30] Alan Baade, Puyuan Peng, David Harwath: MAE-AST: Masked Autoencoding Audio Spectrogram Transformer. INTERSPEECH 2022: 2438-2442
- [c29] Puyuan Peng, David Harwath: Word Discovery in Visually Grounded, Self-Supervised Speech Models. INTERSPEECH 2022: 2823-2827
- [c28] Christopher Song, David Harwath, Tuka Alhanai, James R. Glass: Speak: A Toolkit Using Amazon Mechanical Turk to Collect and Validate Speech Audio Recordings. LREC 2022: 7253-7258
- [c27] Yi-Jen Shih, Hsuan-Fu Wang, Heng-Jui Chang, Layne Berry, Hung-yi Lee, David Harwath: SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model. SLT 2022: 715-722
- [c26] Luke Strgar, David Harwath: Phoneme Segmentation Using Self-Supervised Speech Models. SLT 2022: 1067-1073
- [i29] Puyuan Peng, David Harwath: Self-Supervised Representation Learning for Speech Using Visual Grounding and Masked Language Modeling. CoRR abs/2202.03543 (2022)
- [i28] Dawei Liang, Zifan Xu, Yinuo Chen, Rebecca Adaimi, David Harwath, Edison Thomaz: Automated detection of foreground speech with wearable sensing in everyday home environments: A transfer learning approach. CoRR abs/2203.11294 (2022)
- [i27] Puyuan Peng, David Harwath: Word Discovery in Visually Grounded, Self-Supervised Speech Models. CoRR abs/2203.15081 (2022)
- [i26] Alan Baade, Puyuan Peng, David Harwath: MAE-AST: Masked Autoencoding Audio Spectrogram Transformer. CoRR abs/2203.16691 (2022)
- [i25] Yi-Jen Shih, Hsuan-Fu Wang, Heng-Jui Chang, Layne Berry, Hung-yi Lee, David Harwath: SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model. CoRR abs/2210.00705 (2022)
- [i24] Andrew Rouditchenko, Yung-Sung Chuang, Nina Shvetsova, Samuel Thomas, Rogério Feris, Brian Kingsbury, Leonid Karlinsky, David Harwath, Hilde Kuehne, James R. Glass: C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval. CoRR abs/2210.03625 (2022)
- [i23] Yuan Gong, Andrew Rouditchenko, Alexander H. Liu, David Harwath, Leonid Karlinsky, Hilde Kuehne, James R. Glass: Contrastive Audio-Visual Masked Autoencoder. CoRR abs/2210.07839 (2022)
- [i22] Anuj Diwan, Layne Berry, Eunsol Choi, David Harwath, Kyle Mahowald: Why is Winoground Hard? Investigating Failures in Visuolinguistic Compositionality. CoRR abs/2211.00768 (2022)
- [i21] Layne Berry, Yi-Jen Shih, Hsuan-Fu Wang, Heng-Jui Chang, Hung-yi Lee, David Harwath: M-SpeechCLIP: Leveraging Large-Scale, Pre-Trained Models for Multilingual Speech to Image Retrieval. CoRR abs/2211.01180 (2022)
- [i20] Luke Strgar, David Harwath: Phoneme Segmentation Using Self-Supervised Speech Models. CoRR abs/2211.01461 (2022)
- [i19] Anuj Diwan, Ching-Feng Yeh, Wei-Ning Hsu, Paden Tomasello, Eunsol Choi, David Harwath, Abdelrahman Mohamed: Continual Learning for On-Device Speech Recognition using Disentangled Conformers. CoRR abs/2212.01393 (2022)
- [i18] Reem Gody, David Harwath: Unsupervised Fine-Tuning Data Selection for ASR Using Self-Supervised Speech Models. CoRR abs/2212.01661 (2022)
- 2021
- [c25] Wei-Ning Hsu, David Harwath, Tyler Miller, Christopher Song, James R. Glass: Text-Free Image-to-Speech Synthesis Using Learned Segmental Units. ACL/IJCNLP (1) 2021: 5284-5300
- [c24] Mathew Monfort, SouYoung Jin, Alexander H. Liu, David Harwath, Rogério Feris, James R. Glass, Aude Oliva: Spoken Moments: Learning Joint Audio-Visual Representations From Video Descriptions. CVPR 2021: 14871-14881
- [c23] Brian Chen, Andrew Rouditchenko, Kevin Duarte, Hilde Kuehne, Samuel Thomas, Angie W. Boggust, Rameswar Panda, Brian Kingsbury, Rogério Feris, David Harwath, James R. Glass, Michael Picheny, Shih-Fu Chang: Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos. ICCV 2021: 7992-8001
- [c22] Andrew Rouditchenko, Angie W. Boggust, David Harwath, Brian Chen, Dhiraj Joshi, Samuel Thomas, Kartik Audhkhasi, Hilde Kuehne, Rameswar Panda, Rogério Schmidt Feris, Brian Kingsbury, Michael Picheny, Antonio Torralba, James R. Glass: AVLnet: Learning Audio-Visual Language Representations from Instructional Videos. Interspeech 2021: 1584-1588
- [c21] Andrew Rouditchenko, Angie W. Boggust, David Harwath, Samuel Thomas, Hilde Kuehne, Brian Chen, Rameswar Panda, Rogério Feris, Brian Kingsbury, Michael Picheny, James R. Glass: Cascaded Multilingual Audio-Visual Learning from Videos. Interspeech 2021: 3006-3010
- [i17] Brian Chen, Andrew Rouditchenko, Kevin Duarte, Hilde Kuehne, Samuel Thomas, Angie W. Boggust, Rameswar Panda, Brian Kingsbury, Rogério Schmidt Feris, David Harwath, James R. Glass, Michael Picheny, Shih-Fu Chang: Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos. CoRR abs/2104.12671 (2021)
- [i16] Mathew Monfort, SouYoung Jin, Alexander H. Liu, David Harwath, Rogério Feris, James R. Glass, Aude Oliva: Spoken Moments: Learning Joint Audio-Visual Representations from Video Descriptions. CoRR abs/2105.04489 (2021)
- [i15] Changan Chen, Wei Sun, David Harwath, Kristen Grauman: Learning Audio-Visual Dereverberation. CoRR abs/2106.07732 (2021)
- [i14] Puyuan Peng, David Harwath: Fast-Slow Transformer for Visually Grounding Speech. CoRR abs/2109.08186 (2021)
- [i13] Andrew Rouditchenko, Angie W. Boggust, David Harwath, Samuel Thomas, Hilde Kuehne, Brian Chen, Rameswar Panda, Rogério Feris, Brian Kingsbury, Michael Picheny, James R. Glass: Cascaded Multilingual Audio-Visual Learning from Videos. CoRR abs/2111.04823 (2021)
- [i12] Kevin Duarte, Brian Chen, Nina Shvetsova, Andrew Rouditchenko, Samuel Thomas, Alexander H. Liu, David Harwath, James R. Glass, Hilde Kuehne, Mubarak Shah: Routing with Self-Attention for Multimodal Capsule Networks. CoRR abs/2112.00775 (2021)
- [i11] Nina Shvetsova, Brian Chen, Andrew Rouditchenko, Samuel Thomas, Brian Kingsbury, Rogério Feris, David Harwath, James R. Glass, Hilde Kuehne: Everything at Once - Multi-modal Fusion Transformer for Video Retrieval. CoRR abs/2112.04446 (2021)
- 2020
- [j2] David Harwath, Adrià Recasens, Dídac Surís, Galen Chuang, Antonio Torralba, James R. Glass: Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input. Int. J. Comput. Vis. 128(3): 620-641 (2020)
- [c20] Yasunori Ohishi, Akisato Kimura, Takahito Kawanishi, Kunio Kashino, David Harwath, James R. Glass: Trilingual Semantic Embeddings of Visually Grounded Speech with Self-Attention Mechanisms. ICASSP 2020: 4352-4356
- [c19] David Harwath, Wei-Ning Hsu, James R. Glass: Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech. ICLR 2020
- [c18] Yasunori Ohishi, Akisato Kimura, Takahito Kawanishi, Kunio Kashino, David Harwath, James R. Glass: Pair Expansion for Learning Multilingual Semantic Embeddings Using Disjoint Visually-Grounded Speech Audio Datasets. INTERSPEECH 2020: 1486-1490
- [i10] Andrew Rouditchenko, Angie W. Boggust, David Harwath, Dhiraj Joshi, Samuel Thomas, Kartik Audhkhasi, Rogério Feris, Brian Kingsbury, Michael Picheny, Antonio Torralba, James R. Glass: AVLnet: Learning Audio-Visual Language Representations from Instructional Videos. CoRR abs/2006.09199 (2020)
- [i9] Wei-Ning Hsu, David Harwath, Christopher Song, James R. Glass: Text-Free Image-to-Speech Synthesis Using Learned Segmental Units. CoRR abs/2012.15454 (2020)
2010 – 2019
- 2019
- [c17] Angie W. Boggust, Kartik Audhkhasi, Dhiraj Joshi, David Harwath, Samuel Thomas, Rogério Schmidt Feris, Danny Gutfreund, Yang Zhang, Antonio Torralba, Michael Picheny, James R. Glass: Grounding Spoken Words in Unlabeled Video. CVPR Workshops 2019: 29-32
- [c16] Didac Suris, Adrià Recasens, David Bau, David Harwath, James R. Glass, Antonio Torralba: Learning Words by Drawing Images. CVPR 2019: 2029-2038
- [c15] David Harwath, James R. Glass: Towards Visually Grounded Sub-word Speech Unit Discovery. ICASSP 2019: 3017-3021
- [c14] Emmanuel Azuh, David Harwath, James R. Glass: Towards Bilingual Lexicon Discovery From Visually Grounded Speech Audio. INTERSPEECH 2019: 276-280
- [c13] Wei-Ning Hsu, David Harwath, James R. Glass: Transfer Learning from Audio-Visual Grounding to Speech Recognition. INTERSPEECH 2019: 3242-3246
- [i8] David Harwath, James R. Glass: Towards Visually Grounded Sub-Word Speech Unit Discovery. CoRR abs/1902.08213 (2019)
- [i7] Wei-Ning Hsu, David F. Harwath, James R. Glass: Transfer Learning from Audio-Visual Grounding to Speech Recognition. CoRR abs/1907.04355 (2019)
- [i6] David Harwath, Wei-Ning Hsu, James R. Glass: Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech. CoRR abs/1911.09602 (2019)
- 2018
- [b1] David Frank Harwath: Learning spoken language through vision. Massachusetts Institute of Technology, Cambridge, USA, 2018
- [c12] David Harwath, Adrià Recasens, Dídac Surís, Galen Chuang, Antonio Torralba, James R. Glass: Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input. ECCV (6) 2018: 659-677
- [c11] David Harwath, Galen Chuang, James R. Glass: Vision as an Interlingua: Learning Multilingual Semantic Embeddings of Untranscribed Speech. ICASSP 2018: 4969-4973
- [i5] David Harwath, Adrià Recasens, Dídac Surís, Galen Chuang, Antonio Torralba, James R. Glass: Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input. CoRR abs/1804.01452 (2018)
- [i4] David F. Harwath, Galen Chuang, James R. Glass: Vision as an Interlingua: Learning Multilingual Semantic Embeddings of Untranscribed Speech. CoRR abs/1804.03052 (2018)
- 2017
- [c10] David Harwath, James R. Glass: Learning Word-Like Units from Joint Audio-Visual Analysis. ACL (1) 2017: 506-517
- [c9] Kenneth Leidal, David Harwath, James R. Glass: Learning modality-invariant representations for speech and images. ASRU 2017: 424-429
- [i3] David F. Harwath, James R. Glass: Learning Word-Like Units from Joint Audio-Visual Analysis. CoRR abs/1701.07481 (2017)
- [i2] Kenneth Leidal, David Harwath, James R. Glass: Learning Modality-Invariant Representations for Speech and Images. CoRR abs/1712.03897 (2017)
- 2016
- [j1] Stephen H. Shum, David F. Harwath, Najim Dehak, James R. Glass: On the Use of Acoustic Unit Discovery for Language Recognition. IEEE ACM Trans. Audio Speech Lang. Process. 24(9): 1665-1676 (2016)
- [c8] David F. Harwath, Antonio Torralba, James R. Glass: Unsupervised Learning of Spoken Language with Visual Context. NIPS 2016: 1858-1866
- [c7] Felix Sun, David F. Harwath, James R. Glass: Look, listen, and decode: Multimodal speech recognition with images. SLT 2016: 573-578
- 2015
- [c6] David F. Harwath, James R. Glass: Deep multimodal semantic embeddings for speech and images. ASRU 2015: 237-244
- [i1] David F. Harwath, James R. Glass: Deep Multimodal Semantic Embeddings for Speech and Images. CoRR abs/1511.03690 (2015)
- 2014
- [c5] David Harwath, Alexander Gruenstein, Ian McGraw: Choosing useful word alternates for automatic speech recognition correction interfaces. INTERSPEECH 2014: 949-953
- [c4] David F. Harwath, James R. Glass: Speech recognition without a lexicon - bridging the gap between graphemic and phonetic systems. INTERSPEECH 2014: 2655-2659
- 2013
- [c3] Aren Jansen, Emmanuel Dupoux, Sharon Goldwater, Mark Johnson, Sanjeev Khudanpur, Kenneth Church, Naomi Feldman, Hynek Hermansky, Florian Metze, Richard C. Rose, Mike Seltzer, Pascal Clark, Ian McGraw, Balakrishnan Varadarajan, Erin Bennett, Benjamin Börschinger, Justin T. Chiu, Ewan Dunbar, Abdellah Fourtassi, David Harwath, Chia-ying Lee, Keith D. Levin, Atta Norouzian, Vijayaditya Peddinti, Rachael Richardson, Thomas Schatz, Samuel Thomas: A summary of the 2012 JHU CLSP workshop on zero resource speech technologies and models of early language acquisition. ICASSP 2013: 8111-8115
- [c2] David F. Harwath, Timothy J. Hazen, James R. Glass: Zero resource spoken audio corpus analysis. ICASSP 2013: 8555-8559
- 2012
- [c1] David F. Harwath, Timothy J. Hazen: Topic identification based extrinsic evaluation of summarization techniques applied to conversational speech. ICASSP 2012: 5073-5076