SLT 2018: Athens, Greece
- 2018 IEEE Spoken Language Technology Workshop, SLT 2018, Athens, Greece, December 18-21, 2018. IEEE 2018, ISBN 978-1-5386-4334-1
- Hoon Chung, Sung Joo Lee, Jeon Gue Park:
High-Degree Feature for Deep Neural Network Based Acoustic Model. 1-5
- Maximilian Strake, Pascal Behr, Timo Lohrenz, Tim Fingscheidt:
Densenet Blstm for Acoustic Modeling in Robust ASR. 6-12
- Siddharth Sehgal, Stuart P. Cunningham, Phil D. Green:
Phase-Based Feature Representations for Improving Recognition of Dysarthric Speech. 13-20
- Takafumi Moriya, Hiroki Kanagawa, Kiyoaki Matsui, Takaaki Fukutomi, Yusuke Shinohara, Yoshikazu Yamaguchi, Manabu Okamoto, Yushi Aono:
Efficient Building Strategy with Knowledge Distillation for Small-Footprint Acoustic Models. 21-28
- Shahram Ghorbani, Ahmet Emin Bulut, John H. L. Hansen:
Advancing Multi-Accented Lstm-CTC Speech Recognition Using a Domain Specific Student-Teacher Learning Paradigm. 29-35
- Badr Abdullah, Irina Illina, Dominique Fohr:
Dynamic Extension of ASR Lexicon Using Wikipedia Data. 36-42
- Hossein Hadian, Daniel Povey, Hossein Sameti, Jan Trmal, Sanjeev Khudanpur:
Improving LF-MMI Using Unconstrained Supervisions for ASR. 43-47
- Hao Tang, James R. Glass:
On Training Recurrent Networks with Truncated Backpropagation Through time in Speech Recognition. 48-55
- Davis Liang, Zhiheng Huang, Zachary C. Lipton:
Learning Noise-Invariant Representations for Robust Speech Recognition. 56-63
- Chunlei Zhang, Chengzhu Yu, Chao Weng, Jia Cui, Dong Yu:
An Exploration of Directly Using Word as Acoustic Modeling Unit for Speech Recognition. 64-69
- Bin Wang, Zhijian Ou:
Improved Training Of Neural Trans-Dimensional Random field Language Models with Dynamic Noise-Contrastive Estimation. 70-76
- Sheng Li, Xugang Lu, Ryoichi Takashima, Peng Shen, Tatsuya Kawahara, Hisashi Kawai:
Improving Very Deep Time-Delay Neural Network With Vertical-Attention For Effectively Training CTC-Based ASR Systems. 77-83
- Rutuja Ubale, Yao Qian, Keelan Evanini:
Exploring End-To-End Attention-Based Neural Networks For Native Language Identification. 84-91
- Bhusan Chettri, Saumitra Mishra, Bob L. Sturm, Emmanouil Benetos:
Analysing The Predictions Of a CNN-Based Replay Spoofing Detection System. 92-97
- Xiaoxiao Miao, Ian McLoughlin, Shengyu Yao, Yonghong Yan:
Improved Conditional Generative Adversarial Net Classification For Spoken Language Recognition. 98-104
- Suwon Shon, Wei-Ning Hsu, James R. Glass:
Unsupervised Representation Learning of Speech for Dialect Identification. 105-111
- Seunghyun Yoon, Seokhyun Byun, Kyomin Jung:
Multimodal Speech Emotion Recognition Using Audio and Text. 112-118
- Gábor Gosztolya, Róbert Busa-Fekete:
Posterior Calibration for Multi-Class Paralinguistic Classification. 119-125
- Gaetan Ramet, Philip N. Garner, Michael Baeriswyl, Alexandros Lazaridis:
Context-Aware Attention Mechanism for Speech Emotion Recognition. 126-131
- Bekir Bakar, Cemal Hanilçi:
An Experimental Study on Audio Replay Attack Detection Using Deep Neural Networks. 132-138
- Zeynab Raeesy, Kellen Gillespie, Chengyuan Ma, Thomas Drugman, Jiacheng Gu, Roland Maas, Ariya Rastrow, Björn Hoffmeister:
LSTM-Based Whisper Detection. 139-144
- Bowen Shi, Aurora Martinez Del Rio, Jonathan Keane, Jonathan Michaux, Diane Brentari, Greg Shakhnarovich, Karen Livescu:
American Sign Language Fingerspelling Recognition in the Wild. 145-152
- Takenori Yoshimura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda:
WaveNet-Based Zero-Delay Lossless Speech Coding. 153-158
- Rosanna Turrisi, Raffaele Tavarone, Leonardo Badino:
Improving Generalization of Vocal Tract Feature Reconstruction: From Augmented Acoustic Inversion to Articulatory Feature Reconstruction without Articulatory Data. 159-166
- Sasan Asadiabadi, Engin Erzin:
A Deep Learning Approach for Data Driven Vocal Tract Area Function Estimation. 167-173
- Mikaela Grace, Meysam Bastani, Eugene Weinstein:
Occam's Adaptation: A Comparison of Interpolation of Bases Adaptation Methods for Multi-Dialect Acoustic Modeling with LSTMS. 174-181
- Sahoko Nakayama, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura:
Speech Chain for Semi-Supervised Learning of Japanese-English Code-Switching ASR and TTS. 182-189
- Rahhal Errattahi, Salil Deena, Asmaa El Hannani, Hassan Ouahmane, Thomas Hain:
Improving ASR Error Detection with RNNLM Adaptation. 190-196
- Minguang Song, Yunxin Zhao, Shaojun Wang:
Multi-Objective Multi-Task Learning on RNNLM for Speech Recognition. 197-203
- Anton Ragni, Qiujia Li, Mark J. F. Gales, Yongqiang Wang:
Confidence Estimation and Deletion Prediction Using Bidirectional Recurrent Neural Networks. 204-211
- Hirofumi Inaguma, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara:
Improving OOV Detection and Resolution with External Language Models in Acoustic-to-Word ASR. 212-218
- Yasufumi Moriya, Gareth J. F. Jones:
LSTM Language Model Adaptation with Images and Titles for Multimedia Automatic Speech Recognition. 219-226
- Matti Varjokallio, Sami Virpioja, Mikko Kurimo:
First-Pass Techniques for Very Large Vocabulary Speech Recognition of Morphologically Rich Languages. 227-234
- Joanna Rownicka, Peter Bell, Steve Renals:
Analyzing Deep CNN-Based Utterance Embeddings for Acoustic Model Adaptation. 235-241
- György Kovács, László Tóth, Gábor Gosztolya:
Multi-Band Processing With Gabor Filters and Time Delay Neural Nets for Noise Robust Speech Recognition. 242-249
- Vimal Manohar, Pegah Ghahremani, Daniel Povey, Sanjeev Khudanpur:
A Teacher-Student Learning Approach for Unsupervised Domain Adaptation of Sequence-Trained ASR Models. 250-257
- Siddharth Dalmia, Xinjian Li, Florian Metze, Alan W. Black:
Domain Robust Feature Extraction for Rapid Low Resource ASR Development. 258-265
- Hirokazu Kameoka, Takuhiro Kaneko, Kou Tanaka, Nobukatsu Hojo:
StarGAN-VC: non-parallel many-to-many Voice Conversion Using Star Generative Adversarial Networks. 266-273
- Cheng-chieh Yeh, Po-Chun Hsu, Ju-Chieh Chou, Hung-yi Lee, Lin-Shan Lee:
Rhythm-Flexible Voice Conversion Without Parallel Data Using Cycle-GAN Over Phoneme Posteriorgram Sequences. 274-281
- Berrak Sisman, Mingyang Zhang, Sakriani Sakti, Haizhou Li, Satoshi Nakamura:
Adaptive Wavenet Vocoder for Residual Compensation in GAN-Based Voice Conversion. 282-289
- Zvi Kons, Slava Shechtman, Alexander Sorin, Ron Hoory, Carmel Rabinovitz, Edmilson Da Silva Morais:
Neural TTS Voice Conversion. 290-296
- Patrick Lumban Tobing, Tomoki Hayashi, Yi-Chiao Wu, Kazuhiro Kobayashi, Tomoki Toda:
An Evaluation of Deep Spectral Mappings and WaveNet Vocoder for Voice Conversion. 297-303
- Takuma Okamoto, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai:
Improving FFTNet Vocoder with Noise Shaping and Subband Approaches. 304-311
- Raul Fernandez, Andrew Rosenberg:
Comparing Prosodic Frameworks: Investigating the Acoustic-Symbolic Relationship in ToBI and RaP. 312-318
- Fang-Yu Kuo, Sandesh Aryal, Gilles Degottex, Sam Kang, Pierre Lanchantin, Iris Ouyang:
Data Selection for Improving Naturalness of TTS Voices Trained on Small Found Corpuses. 319-324
- Thomas Merritt, Bartosz Putrycz, Adam Nadolski, Tianjun Ye, Daniel Korzekwa, Wiktor Dolecki, Thomas Drugman, Viacheslav Klimkov, Alexis Moinet, Andrew Breen, Rafal Kuklinski, Nikko Strom, Roberto Barra-Chicote:
Comprehensive Evaluation of Statistical Speech Waveform Synthesis. 325-331
- Cassia Valentini-Botinhao, Oliver Watts, Felipe Espic, Simon King:
Examplar-Based Speechwaveform Generation for Text-To-Speech. 332-338
- Anna Björk Nikulásdóttir, Jón Guðnason, Eiríkur Rögnvaldsson:
An Icelandic Pronunciation Dictionary for TTS. 339-345
- Sajad Shirali-Shahreza, Gerald Penn:
MOS Naturalness and the Quest for Human-Like Speech. 346-352
- Jia Cui, Chao Weng, Guangsen Wang, Jun Wang, Peidong Wang, Chengzhu Yu, Dan Su, Dong Yu:
Improving Attention-Based End-to-End ASR Systems with Sequence-Based Loss Functions. 353-360
- Jennifer Drexler, James R. Glass:
Combining End-to-End and Adversarial Training for Low-Resource Speech Recognition. 361-368
- Shubham Toshniwal, Anjuli Kannan, Chung-Cheng Chiu, Yonghui Wu, Tara N. Sainath, Karen Livescu:
A Comparison of Techniques for Language Model Integration in Encoder-Decoder Speech Recognition. 369-375
- Yoonho Boo, Jinhwan Park, Lukas Lee, Wonyong Sung:
On-Device End-to-end Speech Recognition with Multi-Step Parallel Rnns. 376-381
- Lahiru Samarakoon, Brian Mak, Albert Y. S. Lam:
Domain Adaptation of End-to-end Speech Recognition in Low-Resource Settings. 382-388
- Takaaki Hori, Jaejin Cho, Shinji Watanabe:
End-to-end Speech Recognition With Word-Based Rnn Language Models. 389-396
- Shruti Palaskar, Florian Metze:
Acoustic-to-Word Recognition with Sequence-to-Sequence Models. 397-404
- Tzu-Hsuan Ting, Chia-Ping Chen:
Combining De-noising Auto-encoder and Recurrent Neural Networks in End-to-End Automatic Speech Recognition for Noise Robustness. 405-410
- Gakuto Kurata, Kartik Audhkhasi:
Improved Knowledge Distillation from Bi-Directional to Uni-Directional LSTM CTC for End-to-End Speech Recognition. 411-417
- Golan Pundak, Tara N. Sainath, Rohit Prabhavalkar, Anjuli Kannan, Ding Zhao:
Deep Context: End-to-end Contextual Speech Recognition. 418-425
- Tomoki Hayashi, Shinji Watanabe, Yu Zhang, Tomoki Toda, Takaaki Hori, Ramón Fernandez Astudillo, Kazuya Takeda:
Back-Translation-Style Data Augmentation for end-to-end ASR. 426-433
- Suyoun Kim, Florian Metze:
Dialog-Context Aware end-to-end Speech Recognition. 434-440
- Arun Narayanan, Ananya Misra, Khe Chai Sim, Golan Pundak, Anshuman Tripathi, Mohamed Elfeky, Parisa Haghani, Trevor Strohman, Michiel Bacchiani:
Toward Domain-Invariant Speech Recognition via Large Scale Training. 441-447
- Jesse Emond, Bhuvana Ramabhadran, Brian Roark, Pedro J. Moreno, Min Ma:
Transliteration Based Approaches to Improve Code-Switched Speech Recognition Performance. 448-455
- Jinyu Li, Liang Lu, Changliang Liu, Yifan Gong:
Exploring Layer Trajectory LSTM with Depth Processing Units and Attention. 456-462
- Wenjie Li, Yu Zhang, Pengyuan Zhang, Fengpei Ge:
Multichannel ASR with Knowledge Distillation and Generalized Cross Correlation Feature. 463-469
- Heiko Holz, Maria Chinkina, Laura Vetter:
Optimizing the Quality of Synthetically Generated Pseudowords for the Task of Minimal-Pair Distinction. 470-476
- Masato Mimura, Sei Ueno, Hirofumi Inaguma, Shinsuke Sakai, Tatsuya Kawahara:
Leveraging Sequence-to-Sequence Speech Synthesis for Enhancing Acoustic-to-Word Speech Recognition. 477-484
- Ramon Sanabria, Florian Metze:
Hierarchical Multitask Learning With CTC. 485-490
- Alexis Thual, Corentin Dancette, Julien Karadayi, Juan Benjumea, Emmanuel Dupoux:
A K-Nearest Neighbours Approach To Unsupervised Spoken Term Discovery. 491-497
- Timo Lohrenz, Wei Li, Tim Fingscheidt:
A New Timit Benchmark for Context-Independent Phone Recognition Using Turbo Fusion. 498-505
- Tom Bagby, Kanishka Rao, Khe Chai Sim:
Efficient Implementation of Recurrent Neural Network Transducer in Tensorflow. 506-512
- Stavros Petridis, Themos Stafylakis, Pingchuan Ma, Georgios Tzimiropoulos, Maja Pantic:
Audio-Visual Speech Recognition with a Hybrid CTC/Attention Architecture. 513-520
- Jaejin Cho, Murali Karthick Baskar, Ruizhi Li, Matthew Wiesner, Sri Harish Mallidi, Nelson Yalta, Martin Karafiát, Shinji Watanabe, Takaaki Hori:
Multilingual Sequence-to-Sequence Speech Recognition: Architecture, Transfer Learning, and Language Modeling. 521-527
- Yusuke Kida, Dung T. Tran, Motoi Omachi, Toru Taniguchi, Yuya Fujita:
Speaker Selective Beamformer with Keyword Mask Estimation. 528-534
- Tobias Menne, Ralf Schlüter, Hermann Ney:
Speaker Adapted Beamforming for Multi-Channel Automatic Speech Recognition. 535-541
- Ke Li, Jinyu Li, Yong Zhao, Kshitiz Kumar, Yifan Gong:
Speaker Adaptation for End-to-End CTC Models. 542-549
- Peter Plantinga, Deblin Bagchi, Eric Fosler-Lussier:
An Exploration of Mimic Architectures for Residual Network Based Spectral Mapping. 550-557
- Zhuo Chen, Xiong Xiao, Takuya Yoshioka, Hakan Erdogan, Jinyu Li, Yifan Gong:
Multi-Channel Overlapped Speech Recognition with Location Guided Speech Extraction Network. 558-565
- Yi-Te Hsu, Yu-Chen Lin, Szu-Wei Fu, Yu Tsao, Tei-Wei Kuo:
A Study on Speech Enhancement Using Exponent-Only Floating Point Quantized Neural Network (EOFP-QNN). 566-573
- Hiroshi Seki, Kazumasa Yamamoto, Tomoyosi Akiba, Seiichi Nakagawa:
Rapid Speaker Adaptation of Neural Network Based Filterbank Layer for Automatic Speech Recognition. 574-580
- Pranay Dighe, Afsaneh Asaei, Hervé Bourlard:
Far-Field ASR Using Low-Rank and Sparse Soft Targets from Parallel Data. 581-587
- Alexandros Koumparoulis, Gerasimos Potamianos:
Deep View2View Mapping for View-Invariant Lipreading. 588-594
- Daisy Stanton, Yuxuan Wang, R. J. Skerry-Ryan:
Predicting Expressive Speaking Style from Text in End-To-End Speech Synthesis. 595-602
- Gilles Degottex, Mark J. F. Gales:
A Spectrally Weighted Mixture of Least Square Error and Wasserstein Discriminator Loss for Generative SPSS. 603-609
- Hieu-Thi Luong, Junichi Yamagishi:
Scaling and Bias Codes for Modeling Speaker-Adaptive DNN-Based Speech Synthesis Systems. 610-617
- Qingyun Dou, Moquan Wan, Gilles Degottex, Zhiyi Ma, Mark J. F. Gales:
Hierarchical RNNs for Waveform-Level Speech Synthesis. 618-625
- Viacheslav Klimkov, Alexis Moinet, Adam Nadolski, Thomas Drugman:
Parameter Generation Algorithms for Text-To-Speech Synthesis with Recurrent Neural Networks. 626-631
- Kou Tanaka, Takuhiro Kaneko, Nobukatsu Hojo, Hirokazu Kameoka:
Synthetic-to-Natural Speech Waveform Conversion Using Cycle-Consistent Adversarial Networks. 632-639
- Da-Rong Liu, Chi-Yu Yang, Szu-Lin Wu, Hung-yi Lee:
Improving Unsupervised Style Transfer in end-to-end Speech Synthesis with end-to-end Speech Recognition. 640-647
- Andros Tjandra, Sakriani Sakti, Satoshi Nakamura:
Multi-Scale Alignment and Contextual History for Attention Mechanism in Sequence-to-Sequence Model. 648-655
- Chunxi Liu, Matthew Wiesner, Shinji Watanabe, Craig Harman, Jan Trmal, Najim Dehak, Sanjeev Khudanpur:
Low-Resource Contextual Topic Identification on Speech. 656-663
- Issac Alphonso, Nick Kibre, Tasos Anastasakos:
Ranking Approach to Compact Text Representation for Personal Digital Assistants. 664-669
- Chengwei Su, Rahul Gupta, Shankar Ananthakrishnan, Spyros Matsoukas:
A Re-Ranker Scheme For Integrating Large Scale NLU Models. 670-676
- Sanchit Agarwal, Rahul Goel, Tagyoung Chung, Abhishek Sethi, Arindam Mandal, Spyros Matsoukas:
Parsing Coordination For Spoken Language Understanding. 677-684
- Titouan Parcollet, Mohamed Morchid, Georges Linarès, Renato De Mori:
Quaternion Convolutional Neural Networks For Theme Identification Of Telephone Conversations. 685-691
- Sahar Ghannay, Antoine Caubrière, Yannick Estève, Nathalie Camelin, Edwin Simonnet, Antoine Laurent, Emmanuel Morin:
End-To-End Named Entity And Semantic Concept Extraction From Speech. 692-699
- Quoc Truong Do, Sakriani Sakti, Satoshi Nakamura:
Toward Multi-Features Emphasis Speech Translation: Assessment of Human Emphasis Production and Perception with Speech and Text Clues. 700-706
- John Chen, Rashmi Prasad, Svetlana Stoyanchev, Ethan Selfridge, Srinivas Bangalore, Michael Johnston:
Corpus and Annotation Towards NLU for Customer Ordering Dialogs. 707-713
- Jihwan Lee, Dongchan Kim, Ruhi Sarikaya, Young-Bum Kim:
Coupled Representation Learning for Domains, Intents and Slots in Spoken Language Understanding. 714-719
- Parisa Haghani, Arun Narayanan, Michiel Bacchiani, Galen Chuang, Neeraj Gaur, Pedro J. Moreno, Rohit Prabhavalkar, Zhongdi Qu, Austin Waters:
From Audio to Semantics: Approaches to End-to-End Spoken Language Understanding. 720-726
- Ryan Price, Bhargav Srinivas Ch, Surbhi Singhal, Srinivas Bangalore:
Investigating the Downstream Impact of Grapheme-Based Acoustic Modeling on Spoken Utterance Classification. 727-734
- Chih-Wen Goo, Yun-Nung Chen:
Abstractive Dialogue Summarization with Sentence-Gated Modeling Optimized by Dialogue Acts. 735-742
- Fenxiao Chen, Bin Wang, C.-C. Jay Kuo:
Graph-Based Deep-Tree Recursive Neural Network (DTRNN) for Text Classification. 743-749
- Dogan Dalva, Ümit Güz, Hakan Gürkan:
Extension of Conventional Co-Training Learning Strategies to Three-View and Committee-Based Learning Strategies for Effective Automatic Sentence Segmentation. 750-755
- Lyan Verwimp, Joris Pelemans, Hugo Van hamme, Patrick Wambacq:
Information-Weighted Neural Cache Language Models for ASR. 756-762
- Ryu Takeda, Kazunori Komatani, Alexander I. Rudnicky:
Word Segmentation From Phoneme Sequences Based On Pitman-Yor Semi-Markov Model Exploiting Subword Information. 763-770
- Lasguido Nio, Koji Murakami:
Intelligence Is Asking The Right Question: A Study On Japanese Question Generation. 771-778
- Shang-Yu Su, Yun-Nung Chen:
Investigating Linguistic Pattern Ordering In Hierarchical Natural Language Generation. 779-786
- Alexey Tikhonov, Ivan P. Yamshchikov:
Guess who? Multilingual Approach For The Automated Generation Of Author-Stylized Poetry. 787-794
- Lutfi Kerem Senel, Ihsan Utlu, Veysel Yücesoy, Aykut Koç, Tolga Çukur:
Generating Semantic Similarity Atlas for Natural Languages. 795-799
- Nevan Wichers, Dilek Hakkani-Tür, Jindong Chen:
Resolving Referring Expressions in Images with Labeled Elements. 800-806
- Sri Harsha Dumpala, Imran A. Sheikh, Rupayan Chakraborty, Sunil Kumar Kopparapu:
Sentiment Classification on Erroneous ASR Transcripts: A Multi View Learning Approach. 807-814
- Rasool Fakoor, Amanjit Kainth, Siamak Shakeri, Christopher Winestock, Abdel-rahman Mohamed, Ruhi Sarikaya:
Direct Optimization of F-Measure for Retrieval-Based Personal Question Answering. 815-822
- Rui Zhao, Volker Tresp:
Efficient Dialog Policy Learning via Positive Memory Retention. 823-830
- Nigel G. Ward, Diego Aguirre, Gerardo Cervantes, Olac Fuentes:
Turn-Taking Predictions across Languages and Genres Using an LSTM Recurrent Neural Network. 831-837
- Athanasios Lykartsis, Margarita Kotti, Alexandros Papangelis, Yannis Stylianou:
Prediction of Dialogue Success with Spectral and Rhythm Acoustic Features Using DNNS and SVMS. 838-845
- Junki Ohmura, Maxine Eskénazi:
Context-Aware Dialog Re-Ranking for Task-Oriented Dialog Systems. 846-853
- Yuka Kobayashi, Takami Yoshida, Kenji Iwata, Hiroshi Fujimura, Masami Akamine:
Out-of-Domain Slot Value Detection for Spoken Dialogue Systems with Context Information. 854-861
- Sungjin Lee:
Accumulating Conversational Skills Using Continual Learning. 862-867
- Rui Zhao, Volker Tresp:
Learning Goal-Oriented Visual Dialog via Tempered Policy Gradient. 868-875
- Nurul Lubis, Sakriani Sakti, Koichiro Yoshino, Satoshi Nakamura:
Optimizing Neural Response Generator with Emotional Impact Information. 876-883
- Mandy Korpusik, James R. Glass:
Convolutional Neural Networks for Dialogue State Tracking without Pre-Trained Word Vectors or Semantic Dictionaries. 884-891
- Chandra Khatri, Rahul Goel, Behnam Hedayatnia, Angeliki Metanillou, Anushree Venkatesh, Raefer Gabriel, Arindam Mandal:
Contextual Topic Modeling For Dialog Systems. 892-899
- Izzeddin Gur, Dilek Hakkani-Tür, Gökhan Tür, Pararth Shah:
User Modeling for Task Oriented Dialogues. 900-906
- Ankur Gandhe, Ariya Rastrow, Björn Hoffmeister:
Scalable Language Model Adaptation for Spoken Dialogue Systems. 907-912
- Xinhao Wang, Binod Gyawali, James V. Bruno, Hillary R. Molloy, Keelan Evanini, Klaus Zechner:
Discourse Modeling of Non-Native Spontaneous Speech Using the Rhetorical Structure Theory Framework. 913-920
- Elizabeth Salesky, Susanne Burger, Jan Niehues, Alex Waibel:
Towards Fluent Translations From Disfluent Speech. 921-926
- Ali Janalizadeh Choobbasti, Mohammad Erfan Gholamian, Amir Vaheb, Saeid Safavi:
JSpeech: A Multi-Lingual Conversational Speech Corpus. 927-933
- Katsuya Yokoyama, Hiroaki Takatsu, Hiroshi Honda, Shinya Fujie, Tetsunori Kobayashi:
Investigation of Users' Short Responses in Actual Conversation System and Automatic Recognition of their Intentions. 934-940
- Yi-Chen Chen, Sung-Feng Huang, Chia-Hao Shen, Hung-yi Lee, Lin-Shan Lee:
Phonetic-and-Semantic Embedding of Spoken words with Applications in Spoken Content Retrieval. 941-948
- Chia-Hsuan Lee, Shang-Ming Wang, Huan-Cheng Chang, Hung-yi Lee:
ODSQA: Open-Domain Spoken Question Answering Dataset. 949-956
- Marco Del Vecchio, Andrey Malinin, Mark J. F. Gales:
Improved Auto-Marking Confidence for Spoken Language Assessment. 957-963
- Anastassia Loukina, Nitin Madnani, Beata Beigman Klebanov, Abhinav Misra, Georgi Angelov, Ognjen Todic:
Evaluating on-device ASR on Field Recordings from an Interactive Reading Companion. 964-970
- Suguru Kabashima, Yusuke Inoue, Daisuke Saito, Nobuaki Minematsu:
DNN-Based Scoring of Language Learners' Proficiency Using Learners' Shadowings and Native Listeners' Responsive Shadowings. 971-978
- Yao Qian, Rutuja Ubale, Matthew Mulholland, Keelan Evanini, Xinhao Wang:
A Prompt-Aware Neural Network Approach to Content-Based Scoring of Non-Native Spontaneous Speech. 979-986
- M. Joana Correia, Bhiksha Raj, Isabel Trancoso:
Querying Depression Vlogs. 987-993
- Yu Wang, Jeremy Heng Meng Wong, Mark J. F. Gales, Kate M. Knill, Anton Ragni:
Sequence Teacher-Student Training of Acoustic Models for Automatic Free Speaking Language Assessment. 994-1000
- Mohamed Adel, Mohamed Afify, Akram Gaballah, Magda B. Fayek:
Text-Independent Speaker Verification Based on Deep Neural Networks and Segmental Dynamic Time Warping. 1001-1006
- Suwon Shon, Hao Tang, James R. Glass:
Frame-Level Speaker Embeddings for Text-Independent Speaker Recognition and Analysis of End-to-End Model. 1007-1013
- Longting Xu, Rohan Kumar Das, Emre Yilmaz, Jichen Yang, Haizhou Li:
Generative X-Vectors for Text-Independent Speaker Verification. 1014-1020
- Mirco Ravanelli, Yoshua Bengio:
Speaker Recognition from Raw Waveform with SincNet. 1021-1028
- Ivan Himawan, Md. Hafizur Rahman, Sridha Sridharan, Clinton Fookes, Ahilan Kanagasundaram:
Investigating Deep Neural Networks for Speaker Diarization in the DIHARD Challenge. 1029-1035
- Nikolaos Flemotomos, Zhuohao Chen, David C. Atkins, Shrikanth S. Narayanan:
Role Annotated Speech Recognition for Conversational Interactions. 1036-1043
- Raymond W. M. Ng, Xuechen Liu, Pawel Swietojanski:
Teacher-Student Training for Text-Independent Speaker Recognition. 1044-1051
- Qiongqiong Wang, Koji Okabe, Kong Aik Lee, Hitoshi Yamamoto, Takafumi Koshinaka:
Attention Mechanism in Speaker Recognition: What Does it Learn in Deep Speaker Embedding? 1052-1059
- Finnian Kelly, John H. L. Hansen:
Detection and Calibration of Whisper for Speaker Recognition. 1060-1065
- Tanel Alumäe:
Training Speaker Recognition Models with Recording-Level Labels. 1066-1072
- Narumitsu Ikeda, Yoshinao Sato, Hirokazu Takahashi:
Short Utterance Speaker Recognition by Reservoir with Self-Organized Mapping. 1073-1077