SLT 2021: Shenzhen, China
- IEEE Spoken Language Technology Workshop, SLT 2021, Shenzhen, China, January 19-22, 2021. IEEE 2021, ISBN 978-1-7281-7066-4
- Mohan Li, Catalin Zorila, Rama Doddipatla: Transformer-Based Online Speech Recognition with Decoder-end Adaptive Computation Steps. 1-7
- Ching-Feng Yeh, Yongqiang Wang, Yangyang Shi, Chunyang Wu, Frank Zhang, Julian Chan, Michael L. Seltzer: Streaming Attention-Based Models with Augmented Memory for End-To-End Speech Recognition. 8-14
- Xiong Wang, Zhuoyuan Yao, Xian Shi, Lei Xie: Cascade RNN-Transducer: Syllable Based Streaming On-Device Mandarin Speech Recognition with a Syllable-To-Character Converter. 15-21
- Emiru Tsunoo, Yosuke Kashiwagi, Shinji Watanabe: Streaming Transformer Asr With Blockwise Synchronous Beam Search. 22-29
- Jinhwan Park, Chanwoo Kim, Wonyong Sung: Convolution-Based Attention Model With Positional Encoding For Streaming Speech Recognition On Embedded Devices. 30-37
- George Sterpu, Christian Saam, Naomi Harte: Learning to Count Words in Fluent Speech Enables Online Speech Recognition. 38-45
- Xiaohui Zhang, Frank Zhang, Chunxi Liu, Kjell Schubert, Julian Chan, Pradyot Prakash, Jun Liu, Ching-Feng Yeh, Fuchun Peng, Yatharth Saraf, Geoffrey Zweig: Benchmarking LF-MMI, CTC And RNN-T Criteria For Streaming ASR. 46-51
- Jay Mahadeokar, Yuan Shangguan, Duc Le, Gil Keren, Hang Su, Thong Le, Ching-Feng Yeh, Christian Fuegen, Michael L. Seltzer: Alignment Restricted Streaming Recurrent Neural Network Transducer. 52-59
- Huahuan Zheng, Keyu An, Zhijian Ou: Efficient Neural Architecture Search for End-to-End Speech Recognition Via Straight-Through Gradients. 60-67
- Ke Hu, Ruoming Pang, Tara N. Sainath, Trevor Strohman: Transformer Based Deliberation for Two-Pass Speech Recognition. 68-74
- Haoneng Luo, Shiliang Zhang, Ming Lei, Lei Xie: Simplified Self-Attention for Transformer-Based end-to-end Speech Recognition. 75-81
- Jian Luo, Jianzong Wang, Ning Cheng, Guilin Jiang, Jing Xiao: Multi-Quartznet: Multi-Resolution Convolution for Speech Recognition with Multi-Layer Feature Fusion. 82-88
- Shucong Zhang, Erfan Loweimi, Peter Bell, Steve Renals: On The Usefulness of Self-Attention for Automatic Speech Recognition with Transformers. 89-96
- Thomas Pellegrini, Romain Zimmer, Timothée Masquelier: Low-Activity Supervised Convolutional Spiking Neural Networks Applied to Speech Commands Recognition. 97-103
- Yuxiang Kong, Jian Wu, Quandong Wang, Peng Gao, Weiji Zhuang, Yujun Wang, Lei Xie: Multi-Channel Automatic Speech Recognition Using Deep Complex Unet. 104-110
- Kiran Praveen, Abhishek Pandey, Deepak Kumar, Shakti Prasad Rath, Sandip Shriram Bapat: Dynamically Weighted Ensemble Models for Automatic Speech Recognition. 111-116
- Kazuhiro Nakadai, Yosuke Fukumoto, Ryu Takeda: Investigation of Node Pruning Criteria for Neural Networks Model Compression with Non-Linear Function and Non-Uniform Network Topology. 117-124
- Wei-Ning Hsu, Ann Lee, Gabriel Synnaeve, Awni Y. Hannun: Semi-Supervised end-to-end Speech Recognition via Local Prior Matching. 125-132
- Jaesung Huh, Minjae Lee, Heesoo Heo, Seongkyu Mun, Joon Son Chung: Metric Learning for Keyword Spotting. 133-140
- Alexandru-Lucian Georgescu, Cristian Manolache, Dan Oneata, Horia Cucu, Corneliu Burileanu: Data-Filtering Methods for Self-Training of Automatic Speech Recognition Systems. 141-147
- Prakhar Swarup, Debmalya Chakrabarty, Ashtosh Sapru, Hitesh Tulsiani, Harish Arsikere, Sri Garimella: Efficient Large Scale Semi-Supervised Learning for CTC Based Acoustic Models. 148-155
- Morgane Rivière, Emmanuel Dupoux: Towards Unsupervised Learning of Speech Features in the Wild. 156-163
- Bowen Shi, Shane Settle, Karen Livescu: Whole-Word Segmental Speech Recognition with Acoustic Word Embeddings. 164-171
- Chunxi Liu, Frank Zhang, Duc Le, Suyoun Kim, Yatharth Saraf, Geoffrey Zweig: Improving RNN Transducer Based ASR with Auxiliary Tasks. 172-179
- Songjun Cao, Yike Zhang, Xiaobing Feng, Long Ma: Improving Speech Recognition Accuracy of Local POI Using Geographical Models. 180-185
- Heng-Jui Chang, Alexander H. Liu, Hung-yi Lee, Lin-Shan Lee: End-to-End Whispered Speech Recognition with Frequency-Weighted Approaches and Pseudo Whisper Pre-training. 186-193
- Chenpeng Du, Hao Li, Yizhou Lu, Lan Wang, Yanmin Qian: Data Augmentation for end-to-end Code-Switching Speech Recognition. 194-200
- Bin Wu, Sakriani Sakti, Satoshi Nakamura: Incorporating Discriminative DPGMM Posteriorgrams for Low-Resource ASR. 201-208
- Xinwei Li, Yuanyuan Zhang, Xiaodan Zhuang, Daben Liu: Frame-Level Specaugment for Deep Convolutional Neural Networks in Hybrid ASR Systems. 209-214
- Eugene Kharitonov, Morgane Rivière, Gabriel Synnaeve, Lior Wolf, Pierre-Emmanuel Mazaré, Matthijs Douze, Emmanuel Dupoux: Data Augmenting Contrastive Learning of Speech Representations in the Time Domain. 215-222
- Ashutosh Pandey, Chunxi Liu, Yun Wang, Yatharth Saraf: Dual Application of Speech Enhancement for Automatic Speech Recognition. 223-228
- Ruizhi Li, Gregory Sell, Hynek Hermansky: Two-Stage Augmentation and Adaptive CTC Fusion for Improved Robustness of Multi-Stream end-to-end ASR. 229-235
- Shota Horiguchi, Yusuke Fujita, Kenji Nagamatsu: Block-Online Guided Source Separation. 236-242
- Zhong Meng, Sarangarajan Parthasarathy, Eric Sun, Yashesh Gaur, Naoyuki Kanda, Liang Lu, Xie Chen, Rui Zhao, Jinyu Li, Yifan Gong: Internal Language Model Estimation for Domain-Adaptive End-to-End Speech Recognition. 243-250
- Duc Le, Gil Keren, Julian Chan, Jay Mahadeokar, Christian Fuegen, Michael L. Seltzer: Deep Shallow Fusion for RNN-T Personalization. 251-257
- Dan Oneata, Alexandru Caranica, Adriana Stan, Horia Cucu: An Evaluation of Word-Level Confidence Estimation for End-to-End Automatic Speech Recognition. 258-265
- Shih-Hsuan Chiu, Berlin Chen: Innovative Bert-Based Reranking Language Models for Speech Recognition. 266-271
- Bipasha Sen, Aditya Agarwal, Mirishkar Sai Ganesh, Anil Kumar Vuppala: Reed: An Approach Towards Quickly Bootstrapping Multilingual Acoustic Models. 272-279
- Minguang Song, Yunxin Zhao, Shaojun Wang, Mei Han: Word Similarity Based Label Smoothing in Rnnlm Training for ASR. 280-285
- Seong Min Kye, Joon Son Chung, Hoirin Kim: Supervised Attention for Speaker Recognition. 286-293
- Seong Min Kye, Yoohwan Kwon, Joon Son Chung: Cross Attentive Pooling for Speaker Verification. 294-300
- Tianyan Zhou, Yong Zhao, Jian Wu: ResNeXt and Res2Net Structures for Speaker Verification. 301-307
- Danwei Cai, Ming Li: Embedding Aggregation for Far-Field Speaker Verification with Distributed Microphone Arrays. 308-315
- Yiling Huang, Yutian Chen, Jason Pelecanos, Quan Wang: Synth2Aug: Cross-Domain Speaker Recognition with TTS Synthesized Speech. 316-322
- Md. Sahidullah, Achintya Kumar Sarkar, Ville Vestman, Xuechen Liu, Romain Serizel, Tomi Kinnunen, Zheng-Hua Tan, Emmanuel Vincent: UIAI System for Short-Duration Speaker Verification Challenge 2020. 323-329
- Zheng Li, Miao Zhao, Lin Li, Qingyang Hong: Multi-Feature Learning with Canonical Correlation Analysis Constraint for Text-Independent Speaker Verification. 330-337
- Hrishikesh Rao, Kedar Phatak, Elie Khoury: Improving Speaker Recognition with Quality Indicators. 338-343
- Po-Han Chi, Pei-Hung Chung, Tsung-Han Wu, Chun-Cheng Hsieh, Yen-Hao Chen, Shang-Wen Li, Hung-yi Lee: Audio Albert: A Lite Bert for Self-Supervised Learning of Audio Representation. 344-350
- Bo-Hao Su, Chi-Chun Lee: A Conditional Cycle Emotion Gan for Cross Corpus Speech Emotion Recognition. 351-357
- Michael Neumann, Ngoc Thang Vu: Investigations on audiovisual emotion recognition in noisy conditions. 358-364
- Patrick Meyer, Ziyi Xu, Tim Fingscheidt: Improving Convolutional Recurrent Neural Networks for Speech Emotion Recognition. 365-372
- Manon Macary, Marie Tahon, Yannick Estève, Anthony Rousseau: On the Use of Self-Supervised Pre-Trained Acoustic and Linguistic Features for Continuous Speech Emotion Recognition. 373-380
- Aparna Khare, Srinivas Parthasarathy, Shiva Sundaram: Self-Supervised Learning with Cross-Modal Transformers for Emotion Recognition. 381-388
- Shi-wook Lee: Domain Generalization with Triplet Network for Cross-Corpus Speech Emotion Recognition. 389-396
- Alice Baird, Shahin Amiriparian, Manuel Milling, Björn W. Schuller: Emotion Recognition in Public Speaking Scenarios Utilising An LSTM-RNN Approach with Attention. 397-402
- Haohan Guo, Shaofei Zhang, Frank K. Soong, Lei He, Lei Xie: Conversational End-to-End TTS for Voice Agents. 403-409
- Liangqi Liu, Jiankun Hu, Zhiyong Wu, Song Yang, Songfan Yang, Jia Jia, Helen Meng: Controllable Emphatic Speech Synthesis based on Forward Attention for Expressive Speech Synthesis. 410-414
- Kun Zhou, Berrak Sisman, Haizhou Li: Vaw-Gan For Disentanglement And Recomposition Of Emotional Elements In Speech. 415-422
- Yi Lei, Shan Yang, Lei Xie: Fine-Grained Emotion Strength Transfer, Control and Prediction for Emotional Speech Synthesis. 423-430
- Slava Shechtman, Raul Fernandez, David Haws: Supervised and unsupervised approaches for controlling narrow lexical focus in sequence-to-sequence speech synthesis. 431-437
- Aolan Sun, Jianzong Wang, Ning Cheng, Huayi Peng, Zhen Zeng, Lingwei Kong, Jing Xiao: GraphPB: Graphical Representations of Prosody Boundary in Speech Synthesis. 438-445
- Chung-Ming Chien, Hung-yi Lee: Hierarchical Prosody Modeling for Non-Autoregressive Speech Synthesis. 446-453
- Qiong Hu, Tobias Bleisch, Petko Petkov, Tuomo Raitio, Erik Marchi, Varun Lakshminarasimhan: Whispered and Lombard Neural Speech Synthesis. 454-461
- Yeunju Choi, Youngmoon Jung, Hoirin Kim: Neural MOS Prediction for Synthesized Speech Using Multi-Task Learning with Spoofing Detection and Spoofing Type Classification. 462-469
- Eunwoo Song, Ryuichi Yamamoto, Min-Jae Hwang, Jin-Seob Kim, Ohsung Kwon, Jae-Min Kim: Improved Parallel Wavegan Vocoder with Perceptually Weighted Spectrogram Loss. 470-476
- Yang Ai, Haoyu Li, Xin Wang, Junichi Yamagishi, Zhen-Hua Ling: Denoising-and-Dereverberation Hierarchical Neural Vocoder for Robust Waveform Generation. 477-484
- Zhen Zeng, Jianzong Wang, Ning Cheng, Jing Xiao: MelGlow: Efficient Waveform Generative Network Based On Location-Variable Convolution. 485-491
- Geng Yang, Shan Yang, Kai Liu, Peng Fang, Wei Chen, Lei Xie: Multi-Band Melgan: Faster Waveform Generation For High-Quality Text-To-Speech. 492-498
- Song Li, Beibei Ouyang, Lin Li, Qingyang Hong: Lightspeech: Lightweight Non-Autoregressive Multi-Speaker Text-To-Speech. 499-506
- Hongqiang Du, Xiaohai Tian, Lei Xie, Haizhou Li: Optimizing Voice Conversion Network with Cycle Consistency Loss of Speaker Identity. 507-513
- Tzu-hsien Huang, Jheng-Hao Lin, Hung-yi Lee: How Far Are We from Robust Voice Conversion: A Survey. 514-521
- Heyang Xue, Shan Yang, Yi Lei, Lei Xie, Xiulin Li: Learn2Sing: Target Speaker Singing Voice Synthesis by Learning from a Singing Teacher. 522-529
- Hayato Shibata, Mingxin Zhang, Takahiro Shinozaki: Unsupervised Acoustic-to-Articulatory Inversion Neural Network Learning Based on Deterministic Policy Gradient. 530-537
- Tianxiang Chen, Elie Khoury: Spoofprint: A New Paradigm for Spoofing Attacks Detection. 538-543
- Yang Gao, Jiachen Lian, Bhiksha Raj, Rita Singh: Detection and Evaluation of Human and Machine Generated Speech in Spoofing Attacks on Automatic Speaker Verification Systems. 544-551
- Chien-yu Huang, Yist Y. Lin, Hung-yi Lee, Lin-Shan Lee: Defending Your Voice: Adversarial Attack on Voice Conversion. 552-559
- Hiroto Kai, Shinnosuke Takamichi, Sayaka Shiota, Hitoshi Kiya: Lightweight Voice Anonymization Based on Data-Driven Optimization of Cascaded Voice Modification Modules. 560-566
- Youngki Kwon, Hee Soo Heo, Jaesung Huh, Bong-Jin Lee, Joon Son Chung: Look Who's Not Talking. 567-573
- Qiujia Li, Florian L. Kreyssig, Chao Zhang, Philip C. Woodland: Discriminative Neural Clustering for Speaker Diarisation. 574-581
- Desh Raj, Zili Huang, Sanjeev Khudanpur: Multi-Class Spectral Clustering with Overlaps for Speaker Diarization. 582-589
- Suchitra Krishnamachari, Manoj Kumar, So Hyun Kim, Catherine Lord, Shrikanth Narayanan: Developing Neural Representations for Robust Child-Adult Diarization. 590-597
- You Jin Kim, Hee Soo Heo, Soo-Whan Chung, Bong-Jin Lee: End-To-End Lip Synchronisation Based on Pattern Classification. 598-605
- Jian Luo, Jianzong Wang, Ning Cheng, Guilin Jiang, Jing Xiao: End-To-End Silent Speech Recognition with Acoustic Sensing. 606-612
- Timothy Israel Santos, Andrew Abel, Nick Wilson, Yan Xu: Speaker-Independent Visual Speech Recognition with the Inception V3 Model. 613-620
- Shahram Ghorbani, Yashesh Gaur, Yu Shi, Jinyu Li: Listen, Look and Deliberate: Visual Context-Aware Speech Recognition Using Pre-Trained Text-Video Representations. 621-628
- Mao Saeki, Yoichi Matsuyama, Satoshi Kobashikawa, Tetsuji Ogawa, Tetsunori Kobayashi: Analysis of Multimodal Features for Speaking Proficiency Scoring in an Interview Dialogue. 629-635
- Srinivas Parthasarathy, Shiva Sundaram: Detecting Expressions with Multimodal Transformers. 636-643
- Muralikrishna H, Shikha Gupta, Dileep Aroor Dinesh, Padmanabhan Rajan: Noise-Robust Spoken Language Identification Using Language Relevance Factor Based Embedding. 644-651
- Jörgen Valk, Tanel Alumäe: VOXLINGUA107: A Dataset for Spoken Language Recognition. 652-658
- Xiaosu Tong, Che-Wei Huang, Sri Harish Mallidi, Shaun Joseph, Sonal Pareek, Chander Chandak, Ariya Rastrow, Roland Maas: Streaming ResLSTM with Causal Mean Aggregation for Device-Directed Utterance Detection. 659-664
- Fang Kang, Feiran Yang, Jun Yang: Real-Time Independent Vector Analysis with a Deep-Learning-Based Source Model. 665-669
- Amit Meghanani, Chandran Savithri Anoop, A. G. Ramakrishnan: An Exploration of Log-Mel Spectrogram and MFCC Features for Alzheimer's Dementia Recognition from Spontaneous Speech. 670-677
- Su Ji Park, Alan Rozet: Film Quality Prediction Using Acoustic, Prosodic and Lexical Cues. 678-684
- Yulan Feng, Alan W. Black, Maxine Eskénazi: Towards Automatic Route Description Unification in Spoken Dialog Systems. 685-692
- Subash Khanal, Michael T. Johnson, Narjes Bozorg: Articulatory Comparison of L1 and L2 Speech for Mispronunciation Diagnosis. 693-697
- Yang Shen, Ayano Yasukagawa, Daisuke Saito, Nobuaki Minematsu, Kazuya Saito: Optimized Prediction of Fluency of L2 English Based on Interpretable Network Using Quantity of Phonation and Quality of Pronunciation. 698-704
- Xinhao Wang, Keelan Evanini, Yao Qian, Matthew Mulholland: Automated Scoring of Spontaneous Speech from Young Learners of English Using Transformers. 705-712
- Binghuai Lin, Liyuan Wang, Hongwei Ding, Xiaoli Feng: Improving L2 English Rhythm Evaluation with Automatic Sentence Stress Detection. 713-719
- Protima Nomo Sudro, Rohan Kumar Das, Rohit Sinha, S. R. Mahadeva Prasanna: Enhancing the Intelligibility of Cleft Lip and Palate Speech Using Cycle-Consistent Adversarial Networks. 720-727
- Ram C. M. C. Shekar, Chelzy Belitz, John H. L. Hansen: Development of CNN-Based Cochlear Implant and Normal Hearing Sound Recognition Models Using Natural and Auralized Environmental Audio. 728-733
- Haoyu Li, Yang Ai, Junichi Yamagishi: Enhancing Low-Quality Voice Recordings Using Disentangled Channel Factor and Neural Waveform Model. 734-741
- Ying Shi, Haolin Chen, Zhiyuan Tang, Lantian Li, Dong Wang, Jiqing Han: Can We Trust Deep Speech Prior? 742-749
- Yanpei Shi, Thomas Hain: Contextual Joint Factor Acoustic Embeddings. 750-757
- Yanpei Shi, Thomas Hain: Supervised Speaker Embedding De-Mixing in Two-Speaker Environment. 758-765
- Jianming Liu, Meng Yu, Yong Xu, Chao Weng, Shi-Xiong Zhang, Lianwu Chen, Dong Yu: Neural Mask based Multi-channel Convolutional Beamforming for Joint Dereverberation, Echo Cancellation and Denoising. 766-770
- Aditya Jayasimha, Periyasamy Paramasivam: Personalizing Speech Start Point and End Point Detection in ASR Systems from Speaker Embeddings. 771-777
- Hiroshi Sato, Tsubasa Ochiai, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani, Shoko Araki: Multimodal Attention Fusion for Target Speaker Extraction. 778-784
- Chenda Li, Jing Shi, Wangyou Zhang, Aswin Shanmugam Subramanian, Xuankai Chang, Naoyuki Kamo, Moto Hira, Tomoki Hayashi, Christoph Böddeker, Zhuo Chen, Shinji Watanabe: ESPnet-SE: End-To-End Speech Enhancement and Separation Toolkit Designed for ASR Integration. 785-792
- Catalin Zorila, Mohan Li, Rama Doddipatla: An Investigation into the Multi-channel Time Domain Speaker Extraction Network. 793-800
- Max W. Y. Lam, Jun Wang, Dan Su, Dong Yu: Effective Low-Cost Time-Domain Audio Separation Using Globally Attentive Locally Recurrent Networks. 801-808
- Naoyuki Kanda, Xuankai Chang, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka: Investigation of End-to-End Speaker-Attributed ASR for Continuous Multi-Talker Recordings. 809-816
- Zhaoheng Ni, Yong Xu, Meng Yu, Bo Wu, Shi-Xiong Zhang, Dong Yu, Michael I. Mandel: WPD++: An Improved Neural Beamformer for Simultaneous Speech Separation and Dereverberation. 817-824
- Yi Luo, Cong Han, Nima Mesgarani: Distortion-Controlled Training for end-to-end Reverberant Speech Separation with Auxiliary Autoencoding Loss. 825-832
- Xiaofei Wang, Naoyuki Kanda, Yashesh Gaur, Zhuo Chen, Zhong Meng, Takuya Yoshioka: Exploring End-to-End Multi-Channel ASR with Bias Information for Meeting Transcription. 833-840
- Yawen Xue, Shota Horiguchi, Yusuke Fujita, Shinji Watanabe, Paola García, Kenji Nagamatsu: Online End-To-End Neural Diarization with Speaker-Tracing Buffer. 841-848
- Yuki Takashima, Yusuke Fujita, Shinji Watanabe, Shota Horiguchi, Paola García, Kenji Nagamatsu: End-to-End Speaker Diarization Conditioned on Speech Activity and Overlap Detection. 849-856
- Yihui Fu, Jian Wu, Yanxin Hu, Mengtao Xing, Lei Xie: DESNet: A Multi-Channel Network for Simultaneous Speech Dereverberation, Enhancement and Separation. 857-864
- Chenda Li, Yi Luo, Cong Han, Jinyu Li, Takuya Yoshioka, Tianyan Zhou, Marc Delcroix, Keisuke Kinoshita, Christoph Böddeker, Yanmin Qian, Shinji Watanabe, Zhuo Chen: Dual-Path RNN for Long Recording Speech Separation. 865-872
- Chung-Cheng Chiu, Arun Narayanan, Wei Han, Rohit Prabhavalkar, Yu Zhang, Navdeep Jaitly, Ruoming Pang, Tara N. Sainath, Patrick Nguyen, Liangliang Cao, Yonghui Wu: RNN-T Models Fail to Generalize to Out-of-Domain Audio: Causes and Solutions. 873-880
- Desh Raj, Leibny Paola García-Perera, Zili Huang, Shinji Watanabe, Daniel Povey, Andreas Stolcke, Sanjeev Khudanpur: DOVER-Lap: A Method for Combining Overlap-Aware Diarization Outputs. 881-888
- Katerina Zmolíková, Marc Delcroix, Lukás Burget, Tomohiro Nakatani, Jan Honza Cernocký: Integration of Variational Autoencoder and Spatial Clustering for Adaptive Multi-Channel Neural Speech Separation. 889-896
- Desh Raj, Pavel Denisov, Zhuo Chen, Hakan Erdogan, Zili Huang, Maokui He, Shinji Watanabe, Jun Du, Takuya Yoshioka, Yi Luo, Naoyuki Kanda, Jinyu Li, Scott Wisdom, John R. Hershey: Integration of Speech Separation, Diarization, and Recognition for Multi-Speaker Meetings: System Description, Comparison, and Analysis. 897-904
- Zhong-Qiu Wang, Hakan Erdogan, Scott Wisdom, Kevin W. Wilson, Desh Raj, Shinji Watanabe, Zhuo Chen, John R. Hershey: Sequential Multi-Frame Neural Beamforming for Speech Separation and Enhancement. 905-911
- Li Chai, Jun Du, Diyuan Liu, Yanhui Tu, Chin-Hui Lee: Acoustic Modeling for Multi-Array Conversational Speech Recognition in the Chime-6 Challenge. 912-918
- Christiaan Jacobs, Yevgen Matusevych, Herman Kamper: Acoustic Word Embeddings for Zero-Resource Languages Using Self-Supervised Contrastive Learning and Multilingual Adaptation. 919-926
- Lisa van Staden, Herman Kamper: A Comparison of Self-Supervised Speech Representations As Input Features For Unsupervised Acoustic Word Embeddings. 927-934
- Yushi Hu, Shane Settle, Karen Livescu: Acoustic Span Embeddings for Multilingual Query-by-Example Search. 935-942
- Merve Ünlü, Ebru Arisoy: Uncertainty-Aware Representations for Spoken Question Answering. 943-949
- Parnia Bahar, Tobias Bieschke, Ralf Schlüter, Hermann Ney: Tight Integrated End-to-End Training for Cascaded Speech Translation. 950-957
- Takatomo Kano, Sakriani Sakti, Satoshi Nakamura: Transformer-Based Direct Speech-To-Speech Translation with Transcoder. 958-965
- Manoj Kumar, Varun Kumar, Hadrien Glaude, Cyprien de Lichy, Aman Alok, Rahul Gupta: Protoda: Efficient Transfer Learning for Few-Shot Intent Classification. 966-972
- Grégory Senay, Badr Youbi Idrissi, Marine Haziza: VirAAL: Virtual Adversarial Active Learning for NLU. 973-980
- Mahdi Namazifar, Gökhan Tür, Dilek Hakkani-Tür: Warped Language Models for Noise Robust Language Understanding. 981-988
- Prashanth Gurunath Shivakumar, Naveen Kumar, Panayiotis G. Georgiou, Shrikanth Narayanan: RNN Based Incremental Online Spoken Language Understanding. 989-996
- Pu Wang, Hugo Van hamme: A Light Transformer For Speech-To-Intent Applications. 997-1003
- Shang-Wen Li, Jason Krone, Shuyan Dong, Yi Zhang, Yaser Al-Onaizan: Meta Learning to Classify Intent and Slot Labels with Noisy Few Shot Examples. 1004-1011
- Ryo Masumura, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Shota Orihashi: Large-Context Conversational Representation Learning: Self-Supervised Learning For Conversational Documents. 1012-1019
- Zhengyu Zhou, In Gyu Choi, Yongliang He, Vikas Yadav, Chin-Hui Lee: Using Paralinguistic Information to Disambiguate User Intentions for Distinguishing Phrase Structure and Sarcasm in Spoken Dialog Systems. 1020-1027
- Ting-Yun Chang, Yang Liu, Karthik Gopalakrishnan, Behnam Hedayatnia, Pei Zhou, Dilek Hakkani-Tür: Go Beyond Plain Fine-Tuning: Improving Pretrained Models for Social Commonsense. 1028-1035
- Zexin Lu, Jing Li, Yingyi Zhang, Haisong Zhang: Getting Your Conversation on Track: Estimation of Residual Life for Conversations. 1036-1043
- Hiroaki Takatsu, Mayu Okuda, Yoichi Matsuyama, Hiroshi Honda, Shinya Fujie, Tetsunori Kobayashi: Personalized Extractive Summarization for a News Dialogue System. 1044-1051
- Tomek Rutowski, Elizabeth Shriberg, Amir Harati, Yang Lu, Ricardo Oliveira, Piotr Chlebek: Cross-Demographic Portability of Deep NLP-Based Depression Models. 1052-1057
- Huan-Yu Chen, Yun-Shao Lin, Chi-Chun Lee: Through the Words of Viewers: Using Comment-Content Entangled Network for Humor Impression Recognition. 1058-1064
- Parnia Bahar, Christopher Brix, Hermann Ney: Two-Way Neural Machine Translation: A Proof of Concept for Bidirectional Translation Modeling Using a Two-Dimensional Grid. 1065-1070
- Maya Epps, Juan Uribe, Mandy Korpusik: A New Dataset for Natural Language Understanding of Exercise Logs in a Food and Fitness Spoken Dialogue System. 1071-1078
- Chiara Semenzin, Lisa Hamrick, Amanda Seidl, Bridgette Kelleher, Alejandrina Cristià: Towards Large-Scale Data Annotation of Audio from Wearables: Validating Zooniverse Annotations of Infant Vocalization Types. 1079-1085
- Marco Marini, Mauro Viganò, Massimo Corbo, Marina Zettin, Gloria Simoncini, Bruno Fattori, Clelia D'Anna, Massimiliano Donati, Luca Fanucci: IDEA: An Italian Dysarthric Speech Database. 1086-1093
- Delowar Hossain, Yoshinao Sato: Efficient corpus design for wake-word detection. 1094-1100
- Yihui Fu, Zhuoyuan Yao, Weipeng He, Jian Wu, Xiong Wang, Zhanheng Yang, Shimin Zhang, Lei Xie, Dongyan Huang, Hui Bu, Petr Motlícek, Jean-Marc Odobez: IEEE SLT 2021 Alpha-Mini Speech Challenge: Open Datasets, Tracks, Rules and Baselines. 1101-1108
- Manuel Sam Ribeiro, Jennifer Sanger, Jing-Xuan Zhang, Aciel Eshky, Alan Wrench, Korin Richmond, Steve Renals: Tal: A Synchronised Multi-Speaker Corpus of Ultrasound Tongue Imaging, Audio, and Lip Videos. 1109-1116
- Fan Yu, Zhuoyuan Yao, Xiong Wang, Keyu An, Lei Xie, Zhijian Ou, Bo Liu, Xiulin Li, Guanqiong Miao: The SLT 2021 Children Speech Recognition Challenge: Open Datasets, Rules and Baselines. 1117-1123