David F. Harwath
Person information
- affiliation (PhD 2018): Massachusetts Institute of Technology, Cambridge, USA
2020 – today
- 2024
- [c58] Jordan Voas, David Harwath, Raymond Mooney: Multimodal Contextualized Semantic Parsing from Speech. ACL (1) 2024: 7354-7369
- [c57] Puyuan Peng, Po-Yao Huang, Shang-Wen Li, Abdelrahman Mohamed, David Harwath: VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild. ACL (1) 2024: 12442-12462
- [c56] Changan Chen, Kumar Ashutosh, Rohit Girdhar, David Harwath, Kristen Grauman: SoundingActions: Learning How Actions Sound from Narrated Egocentric Videos. CVPR 2024: 27242-27252
- [c55] Anuj Diwan, Anirudh Srinivasan, David Harwath, Eunsol Choi: Textless Speech-to-Speech Translation With Limited Parallel Data. EMNLP (Findings) 2024: 16208-16224
- [c54] Hsuan-Fu Wang, Yi-Jen Shih, Heng-Jui Chang, Layne Berry, Puyuan Peng, Hung-Yi Lee, Hsin-Min Wang, David Harwath: SpeechCLIP+: Self-Supervised Multi-Task Representation Learning for Speech via CLIP and Speech-Image Data. ICASSP Workshops 2024: 465-469
- [c53] Hung-Chieh Fang, Nai-Xuan Ye, Yi-Jen Shih, Puyuan Peng, Hsuan-Fu Wang, Layne Berry, Hung-Yi Lee, David Harwath: Integrating Self-Supervised Speech Model with Pseudo Word-Level Targets from Visually-Grounded Speech Model. ICASSP Workshops 2024: 645-649
- [c52] Yuan Tseng, Layne Berry, Yiting Chen, I-Hsiang Chiu, Hsuan-Hao Lin, Max Liu, Puyuan Peng, Yi-Jen Shih, Hung-Yu Wang, Haibin Wu, Po-Yao Huang, Chun-Mao Lai, Shang-Wen Li, David Harwath, Yu Tsao, Abdelrahman Mohamed, Chi-Luen Feng, Hung-Yi Lee: AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models. ICASSP 2024: 6890-6894
- [c51] Zhisheng Zheng, Puyuan Peng, Ziyang Ma, Xie Chen, Eunsol Choi, David Harwath: BAT: Learning to Reason about Spatial Sounds with Large Language Models. ICML 2024
- [i47] Zhisheng Zheng, Puyuan Peng, Ziyang Ma, Xie Chen, Eunsol Choi, David Harwath: BAT: Learning to Reason about Spatial Sounds with Large Language Models. CoRR abs/2402.01591 (2024)
- [i46] Hung-Chieh Fang, Nai-Xuan Ye, Yi-Jen Shih, Puyuan Peng, Hsuan-Fu Wang, Layne Berry, Hung-yi Lee, David Harwath: Integrating Self-supervised Speech Model with Pseudo Word-level Targets from Visually-grounded Speech Model. CoRR abs/2402.05819 (2024)
- [i45] Hsuan-Fu Wang, Yi-Jen Shih, Heng-Jui Chang, Layne Berry, Puyuan Peng, Hung-yi Lee, Hsin-Min Wang, David Harwath: SpeechCLIP+: Self-supervised multi-task representation learning for speech via CLIP and speech-image data. CoRR abs/2402.06959 (2024)
- [i44] Puyuan Peng, Po-Yao Huang, Daniel Li, Abdelrahman Mohamed, David Harwath: VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild. CoRR abs/2403.16973 (2024)
- [i43] Changan Chen, Kumar Ashutosh, Rohit Girdhar, David Harwath, Kristen Grauman: SoundingActions: Learning How Actions Sound from Narrated Egocentric Videos. CoRR abs/2404.05206 (2024)
- [i42] Jordan Voas, Raymond Mooney, David Harwath: Multimodal Contextualized Semantic Parsing from Speech. CoRR abs/2406.06438 (2024)
- [i41] Changan Chen, Puyuan Peng, Ami Baid, Zihui Xue, Wei-Ning Hsu, David Harwath, Kristen Grauman: Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos. CoRR abs/2406.09272 (2024)
- [i40] Yi-Jen Shih, David Harwath: Interface Design for Self-Supervised Speech Models. CoRR abs/2406.12209 (2024)
- [i39] Yi-Jen Shih, Zoi Gkalitsiou, Alexandros G. Dimakis, David Harwath: Self-supervised Speech Models for Word-Level Stuttered Speech Detection. CoRR abs/2409.10704 (2024)
- [i38] Alan Baade, Puyuan Peng, David Harwath: SyllableLM: Learning Coarse Semantic Units for Speech Language Models. CoRR abs/2410.04029 (2024)
- 2023
- [c50] Anuj Diwan, Eunsol Choi, David Harwath: When to Use Efficient Self Attention? Profiling Text, Speech and Image Transformer Variants. ACL (2) 2023: 1639-1650
- [c49] Cheng-I Jeff Lai, Freda Shi, Puyuan Peng, Yoon Kim, Kevin Gimpel, Shiyu Chang, Yung-Sung Chuang, Saurabhchand Bhati, David D. Cox, David Harwath, Yang Zhang, Karen Livescu, James R. Glass: Audio-Visual Neural Syntax Acquisition. ASRU 2023: 1-8
- [c48] Layne Berry, Yi-Jen Shih, Hsuan-Fu Wang, Heng-Jui Chang, Hung-Yi Lee, David Harwath: M-SpeechCLIP: Leveraging Large-Scale, Pre-Trained Models for Multilingual Speech to Image Retrieval. ICASSP 2023: 1-5
- [c47] Changan Chen, Wei Sun, David Harwath, Kristen Grauman: Learning Audio-Visual Dereverberation. ICASSP 2023: 1-5
- [c46] Anuj Diwan, Ching-Feng Yeh, Wei-Ning Hsu, Paden Tomasello, Eunsol Choi, David Harwath, Abdelrahman Mohamed: Continual Learning for On-Device Speech Recognition Using Disentangled Conformers. ICASSP 2023: 1-5
- [c45] Reem Gody, David Harwath: Unsupervised Fine-Tuning Data Selection for ASR Using Self-Supervised Speech Models. ICASSP 2023: 1-5
- [c44] Dawei Liang, Zifan Xu, Yinuo Chen, Rebecca Adaimi, David Harwath, Edison Thomaz: A Dataset for Foreground Speech Analysis With Smartwatches In Everyday Home Environments. ICASSP Workshops 2023: 1-5
- [c43] Andrew Rouditchenko, Yung-Sung Chuang, Nina Shvetsova, Samuel Thomas, Rogério Feris, Brian Kingsbury, Leonid Karlinsky, David Harwath, Hilde Kuehne, James R. Glass: C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval. ICASSP 2023: 1-5
- [c42] Yuan Gong, Andrew Rouditchenko, Alexander H. Liu, David Harwath, Leonid Karlinsky, Hilde Kuehne, James R. Glass: Contrastive Audio-Visual Masked Autoencoder. ICLR 2023
- [c41] Puyuan Peng, Shang-Wen Li, Okko Räsänen, Abdelrahman Mohamed, David Harwath: Syllable Discovery and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model. INTERSPEECH 2023: 391-395
- [c40] Puyuan Peng, Brian Yan, Shinji Watanabe, David Harwath: Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization. INTERSPEECH 2023: 396-400
- [c39] Andrew Rouditchenko, Sameer Khurana, Samuel Thomas, Rogério Feris, Leonid Karlinsky, Hilde Kuehne, David Harwath, Brian Kingsbury, James R. Glass: Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-Training for Adaptation to Unseen Languages. INTERSPEECH 2023: 2268-2272
- [c38] Chiori Hori, Puyuan Peng, David Harwath, Xinyu Liu, Kei Ota, Siddarth Jain, Radu Corcodel, Devesh K. Jha, Diego Romeres, Jonathan Le Roux: Style-transfer based Speech and Audio-visual Scene Understanding for Robot Action Sequence Acquisition from Videos. INTERSPEECH 2023: 4663-4667
- [c37] Xixi Hu, Senthil Purushwalkam, David Harwath, Kristen Grauman: Learning to Map Efficiently by Active Echolocation. IROS 2023: 1505-1510
- [c36] Debadatta Dash, Paul Ferrari, Abbas Babajani-Feremi, David Harwath, Amir Borna, Jun Wang: Subject Generalization in Classifying Imagined and Spoken Speech with MEG. NER 2023: 1-4
- [i37] Puyuan Peng, Brian Yan, Shinji Watanabe, David Harwath: Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization. CoRR abs/2305.11095 (2023)
- [i36] Puyuan Peng, Shang-Wen Li, Okko Räsänen, Abdelrahman Mohamed, David Harwath: Syllable Discovery and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model. CoRR abs/2305.11435 (2023)
- [i35] Andrew Rouditchenko, Sameer Khurana, Samuel Thomas, Rogério Feris, Leonid Karlinsky, Hilde Kuehne, David Harwath, Brian Kingsbury, James R. Glass: Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-Training for Adaptation to Unseen Languages. CoRR abs/2305.12606 (2023)
- [i34] Anuj Diwan, Anirudh Srinivasan, David Harwath, Eunsol Choi: Unit-based Speech-to-Speech Translation Without Parallel Data. CoRR abs/2305.15405 (2023)
- [i33] Anuj Diwan, Eunsol Choi, David Harwath: When to Use Efficient Self Attention? Profiling Text, Speech and Image Transformer Variants. CoRR abs/2306.08667 (2023)
- [i32] Chiori Hori, Puyuan Peng, David Harwath, Xinyu Liu, Kei Ota, Siddarth Jain, Radu Corcodel, Devesh K. Jha, Diego Romeres, Jonathan Le Roux: Style-transfer based Speech and Audio-visual Scene Understanding for Robot Action Sequence Acquisition from Videos. CoRR abs/2306.15644 (2023)
- [i31] Yuan Tseng, Layne Berry, Yi-Ting Chen, I-Hsiang Chiu, Hsuan-Hao Lin, Max Liu, Puyuan Peng, Yi-Jen Shih, Hung-Yu Wang, Haibin Wu, Po-Yao Huang, Chun-Mao Lai, Shang-Wen Li, David Harwath, Yu Tsao, Shinji Watanabe, Abdelrahman Mohamed, Chi-Luen Feng, Hung-yi Lee: AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models. CoRR abs/2309.10787 (2023)
- [i30] Cheng-I Jeff Lai, Freda Shi, Puyuan Peng, Yoon Kim, Kevin Gimpel, Shiyu Chang, Yung-Sung Chuang, Saurabhchand Bhati, David D. Cox, David Harwath, Yang Zhang, Karen Livescu, James R. Glass: Audio-Visual Neural Syntax Acquisition. CoRR abs/2310.07654 (2023)
- 2022
- [c35] Nina Shvetsova, Brian Chen, Andrew Rouditchenko, Samuel Thomas, Brian Kingsbury, Rogério Feris, David Harwath, James R. Glass, Hilde Kuehne: Everything at Once - Multi-modal Fusion Transformer for Video Retrieval. CVPR 2022: 19988-19997
- [c34] Anuj Diwan, Layne Berry, Eunsol Choi, David Harwath, Kyle Mahowald: Why is Winoground Hard? Investigating Failures in Visuolinguistic Compositionality. EMNLP 2022: 2236-2250
- [c33] Puyuan Peng, David Harwath: Fast-Slow Transformer for Visually Grounding Speech. ICASSP 2022: 7727-7731
- [c32] David Xu, David Harwath: Adversarial Input Ablation for Audio-Visual Learning. ICASSP 2022: 7742-7746
- [c31] Tyler Miller, David Harwath: Exploring Few-Shot Fine-Tuning Strategies for Models of Visually Grounded Speech. INTERSPEECH 2022: 1416-1420
- [c30] Alan Baade, Puyuan Peng, David Harwath: MAE-AST: Masked Autoencoding Audio Spectrogram Transformer. INTERSPEECH 2022: 2438-2442
- [c29] Puyuan Peng, David Harwath: Word Discovery in Visually Grounded, Self-Supervised Speech Models. INTERSPEECH 2022: 2823-2827
- [c28] Christopher Song, David Harwath, Tuka Alhanai, James R. Glass: Speak: A Toolkit Using Amazon Mechanical Turk to Collect and Validate Speech Audio Recordings. LREC 2022: 7253-7258
- [c27] Yi-Jen Shih, Hsuan-Fu Wang, Heng-Jui Chang, Layne Berry, Hung-yi Lee, David Harwath: SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model. SLT 2022: 715-722
- [c26] Luke Strgar, David Harwath: Phoneme Segmentation Using Self-Supervised Speech Models. SLT 2022: 1067-1073
- [i29] Puyuan Peng, David Harwath: Self-Supervised Representation Learning for Speech Using Visual Grounding and Masked Language Modeling. CoRR abs/2202.03543 (2022)
- [i28] Dawei Liang, Zifan Xu, Yinuo Chen, Rebecca Adaimi, David Harwath, Edison Thomaz: Automated detection of foreground speech with wearable sensing in everyday home environments: A transfer learning approach. CoRR abs/2203.11294 (2022)
- [i27] Puyuan Peng, David Harwath: Word Discovery in Visually Grounded, Self-Supervised Speech Models. CoRR abs/2203.15081 (2022)
- [i26] Alan Baade, Puyuan Peng, David Harwath: MAE-AST: Masked Autoencoding Audio Spectrogram Transformer. CoRR abs/2203.16691 (2022)
- [i25] Yi-Jen Shih, Hsuan-Fu Wang, Heng-Jui Chang, Layne Berry, Hung-yi Lee, David Harwath: SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model. CoRR abs/2210.00705 (2022)
- [i24] Andrew Rouditchenko, Yung-Sung Chuang, Nina Shvetsova, Samuel Thomas, Rogério Feris, Brian Kingsbury, Leonid Karlinsky, David Harwath, Hilde Kuehne, James R. Glass: C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval. CoRR abs/2210.03625 (2022)
- [i23] Yuan Gong, Andrew Rouditchenko, Alexander H. Liu, David Harwath, Leonid Karlinsky, Hilde Kuehne, James R. Glass: Contrastive Audio-Visual Masked Autoencoder. CoRR abs/2210.07839 (2022)
- [i22] Anuj Diwan, Layne Berry, Eunsol Choi, David Harwath, Kyle Mahowald: Why is Winoground Hard? Investigating Failures in Visuolinguistic Compositionality. CoRR abs/2211.00768 (2022)
- [i21] Layne Berry, Yi-Jen Shih, Hsuan-Fu Wang, Heng-Jui Chang, Hung-yi Lee, David Harwath: M-SpeechCLIP: Leveraging Large-Scale, Pre-Trained Models for Multilingual Speech to Image Retrieval. CoRR abs/2211.01180 (2022)
- [i20] Luke Strgar, David Harwath: Phoneme Segmentation Using Self-Supervised Speech Models. CoRR abs/2211.01461 (2022)
- [i19] Anuj Diwan, Ching-Feng Yeh, Wei-Ning Hsu, Paden Tomasello, Eunsol Choi, David Harwath, Abdelrahman Mohamed: Continual Learning for On-Device Speech Recognition using Disentangled Conformers. CoRR abs/2212.01393 (2022)
- [i18] Reem Gody, David Harwath: Unsupervised Fine-Tuning Data Selection for ASR Using Self-Supervised Speech Models. CoRR abs/2212.01661 (2022)
- 2021
- [c25] Wei-Ning Hsu, David Harwath, Tyler Miller, Christopher Song, James R. Glass: Text-Free Image-to-Speech Synthesis Using Learned Segmental Units. ACL/IJCNLP (1) 2021: 5284-5300
- [c24] Mathew Monfort, SouYoung Jin, Alexander H. Liu, David Harwath, Rogério Feris, James R. Glass, Aude Oliva: Spoken Moments: Learning Joint Audio-Visual Representations From Video Descriptions. CVPR 2021: 14871-14881
- [c23] Brian Chen, Andrew Rouditchenko, Kevin Duarte, Hilde Kuehne, Samuel Thomas, Angie W. Boggust, Rameswar Panda, Brian Kingsbury, Rogério Feris, David Harwath, James R. Glass, Michael Picheny, Shih-Fu Chang: Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos. ICCV 2021: 7992-8001
- [c22] Andrew Rouditchenko, Angie W. Boggust, David Harwath, Brian Chen, Dhiraj Joshi, Samuel Thomas, Kartik Audhkhasi, Hilde Kuehne, Rameswar Panda, Rogério Schmidt Feris, Brian Kingsbury, Michael Picheny, Antonio Torralba, James R. Glass: AVLnet: Learning Audio-Visual Language Representations from Instructional Videos. Interspeech 2021: 1584-1588
- [c21] Andrew Rouditchenko, Angie W. Boggust, David Harwath, Samuel Thomas, Hilde Kuehne, Brian Chen, Rameswar Panda, Rogério Feris, Brian Kingsbury, Michael Picheny, James R. Glass: Cascaded Multilingual Audio-Visual Learning from Videos. Interspeech 2021: 3006-3010
- [i17] Brian Chen, Andrew Rouditchenko, Kevin Duarte, Hilde Kuehne, Samuel Thomas, Angie W. Boggust, Rameswar Panda, Brian Kingsbury, Rogério Schmidt Feris, David Harwath, James R. Glass, Michael Picheny, Shih-Fu Chang: Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos. CoRR abs/2104.12671 (2021)
- [i16] Mathew Monfort, SouYoung Jin, Alexander H. Liu, David Harwath, Rogério Feris, James R. Glass, Aude Oliva: Spoken Moments: Learning Joint Audio-Visual Representations from Video Descriptions. CoRR abs/2105.04489 (2021)
- [i15] Changan Chen, Wei Sun, David Harwath, Kristen Grauman: Learning Audio-Visual Dereverberation. CoRR abs/2106.07732 (2021)
- [i14] Puyuan Peng, David Harwath: Fast-Slow Transformer for Visually Grounding Speech. CoRR abs/2109.08186 (2021)
- [i13] Andrew Rouditchenko, Angie W. Boggust, David Harwath, Samuel Thomas, Hilde Kuehne, Brian Chen, Rameswar Panda, Rogério Feris, Brian Kingsbury, Michael Picheny, James R. Glass: Cascaded Multilingual Audio-Visual Learning from Videos. CoRR abs/2111.04823 (2021)
- [i12] Kevin Duarte, Brian Chen, Nina Shvetsova, Andrew Rouditchenko, Samuel Thomas, Alexander H. Liu, David Harwath, James R. Glass, Hilde Kuehne, Mubarak Shah: Routing with Self-Attention for Multimodal Capsule Networks. CoRR abs/2112.00775 (2021)
- [i11] Nina Shvetsova, Brian Chen, Andrew Rouditchenko, Samuel Thomas, Brian Kingsbury, Rogério Feris, David Harwath, James R. Glass, Hilde Kuehne: Everything at Once - Multi-modal Fusion Transformer for Video Retrieval. CoRR abs/2112.04446 (2021)
- 2020
- [j2] David Harwath, Adrià Recasens, Dídac Surís, Galen Chuang, Antonio Torralba, James R. Glass: Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input. Int. J. Comput. Vis. 128(3): 620-641 (2020)
- [c20] Yasunori Ohishi, Akisato Kimura, Takahito Kawanishi, Kunio Kashino, David Harwath, James R. Glass: Trilingual Semantic Embeddings of Visually Grounded Speech with Self-Attention Mechanisms. ICASSP 2020: 4352-4356
- [c19] David Harwath, Wei-Ning Hsu, James R. Glass: Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech. ICLR 2020
- [c18] Yasunori Ohishi, Akisato Kimura, Takahito Kawanishi, Kunio Kashino, David Harwath, James R. Glass: Pair Expansion for Learning Multilingual Semantic Embeddings Using Disjoint Visually-Grounded Speech Audio Datasets. INTERSPEECH 2020: 1486-1490
- [i10] Andrew Rouditchenko, Angie W. Boggust, David Harwath, Dhiraj Joshi, Samuel Thomas, Kartik Audhkhasi, Rogério Feris, Brian Kingsbury, Michael Picheny, Antonio Torralba, James R. Glass: AVLnet: Learning Audio-Visual Language Representations from Instructional Videos. CoRR abs/2006.09199 (2020)
- [i9] Wei-Ning Hsu, David Harwath, Christopher Song, James R. Glass: Text-Free Image-to-Speech Synthesis Using Learned Segmental Units. CoRR abs/2012.15454 (2020)
2010 – 2019
- 2019
- [c17] Angie W. Boggust, Kartik Audhkhasi, Dhiraj Joshi, David Harwath, Samuel Thomas, Rogério Schmidt Feris, Danny Gutfreund, Yang Zhang, Antonio Torralba, Michael Picheny, James R. Glass: Grounding Spoken Words in Unlabeled Video. CVPR Workshops 2019: 29-32
- [c16] Didac Suris, Adrià Recasens, David Bau, David Harwath, James R. Glass, Antonio Torralba: Learning Words by Drawing Images. CVPR 2019: 2029-2038
- [c15] David Harwath, James R. Glass: Towards Visually Grounded Sub-word Speech Unit Discovery. ICASSP 2019: 3017-3021
- [c14] Emmanuel Azuh, David Harwath, James R. Glass: Towards Bilingual Lexicon Discovery From Visually Grounded Speech Audio. INTERSPEECH 2019: 276-280
- [c13] Wei-Ning Hsu, David Harwath, James R. Glass: Transfer Learning from Audio-Visual Grounding to Speech Recognition. INTERSPEECH 2019: 3242-3246
- [i8] David Harwath, James R. Glass: Towards Visually Grounded Sub-Word Speech Unit Discovery. CoRR abs/1902.08213 (2019)
- [i7] Wei-Ning Hsu, David F. Harwath, James R. Glass: Transfer Learning from Audio-Visual Grounding to Speech Recognition. CoRR abs/1907.04355 (2019)
- [i6] David Harwath, Wei-Ning Hsu, James R. Glass: Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech. CoRR abs/1911.09602 (2019)
- 2018
- [b1] David Frank Harwath: Learning spoken language through vision. Massachusetts Institute of Technology, Cambridge, USA, 2018
- [c12] David Harwath, Adrià Recasens, Dídac Surís, Galen Chuang, Antonio Torralba, James R. Glass: Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input. ECCV (6) 2018: 659-677
- [c11] David Harwath, Galen Chuang, James R. Glass: Vision as an Interlingua: Learning Multilingual Semantic Embeddings of Untranscribed Speech. ICASSP 2018: 4969-4973
- [i5] David Harwath, Adrià Recasens, Dídac Surís, Galen Chuang, Antonio Torralba, James R. Glass: Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input. CoRR abs/1804.01452 (2018)
- [i4] David F. Harwath, Galen Chuang, James R. Glass: Vision as an Interlingua: Learning Multilingual Semantic Embeddings of Untranscribed Speech. CoRR abs/1804.03052 (2018)
- 2017
- [c10] David Harwath, James R. Glass: Learning Word-Like Units from Joint Audio-Visual Analysis. ACL (1) 2017: 506-517
- [c9] Kenneth Leidal, David Harwath, James R. Glass: Learning modality-invariant representations for speech and images. ASRU 2017: 424-429
- [i3] David F. Harwath, James R. Glass: Learning Word-Like Units from Joint Audio-Visual Analysis. CoRR abs/1701.07481 (2017)
- [i2] Kenneth Leidal, David Harwath, James R. Glass: Learning Modality-Invariant Representations for Speech and Images. CoRR abs/1712.03897 (2017)
- 2016
- [j1] Stephen H. Shum, David F. Harwath, Najim Dehak, James R. Glass: On the Use of Acoustic Unit Discovery for Language Recognition. IEEE ACM Trans. Audio Speech Lang. Process. 24(9): 1665-1676 (2016)
- [c8] David F. Harwath, Antonio Torralba, James R. Glass: Unsupervised Learning of Spoken Language with Visual Context. NIPS 2016: 1858-1866
- [c7] Felix Sun, David F. Harwath, James R. Glass: Look, listen, and decode: Multimodal speech recognition with images. SLT 2016: 573-578
- 2015
- [c6] David F. Harwath, James R. Glass: Deep multimodal semantic embeddings for speech and images. ASRU 2015: 237-244
- [i1] David F. Harwath, James R. Glass: Deep Multimodal Semantic Embeddings for Speech and Images. CoRR abs/1511.03690 (2015)
- 2014
- [c5] David Harwath, Alexander Gruenstein, Ian McGraw: Choosing useful word alternates for automatic speech recognition correction interfaces. INTERSPEECH 2014: 949-953
- [c4] David F. Harwath, James R. Glass: Speech recognition without a lexicon - bridging the gap between graphemic and phonetic systems. INTERSPEECH 2014: 2655-2659
- 2013
- [c3] Aren Jansen, Emmanuel Dupoux, Sharon Goldwater, Mark Johnson, Sanjeev Khudanpur, Kenneth Church, Naomi Feldman, Hynek Hermansky, Florian Metze, Richard C. Rose, Mike Seltzer, Pascal Clark, Ian McGraw, Balakrishnan Varadarajan, Erin Bennett, Benjamin Börschinger, Justin T. Chiu, Ewan Dunbar, Abdellah Fourtassi, David Harwath, Chia-ying Lee, Keith D. Levin, Atta Norouzian, Vijayaditya Peddinti, Rachael Richardson, Thomas Schatz, Samuel Thomas: A summary of the 2012 JHU CLSP workshop on zero resource speech technologies and models of early language acquisition. ICASSP 2013: 8111-8115
- [c2] David F. Harwath, Timothy J. Hazen, James R. Glass: Zero resource spoken audio corpus analysis. ICASSP 2013: 8555-8559
- 2012
- [c1] David F. Harwath, Timothy J. Hazen: Topic identification based extrinsic evaluation of summarization techniques applied to conversational speech. ICASSP 2012: 5073-5076