Haohan Guo

2020 – today

2024
[c9]
persistent URL: https://dblp.org/rec/conf/icassp/LuWGL0M24
Hui Lu, Xixin Wu, Haohan Guo, Songxiang Liu, Zhiyong Wu, Helen Meng:
Unifying One-Shot Voice Conversion and Cloning with Disentangled Speech Representations. ICASSP 2024: 11141-11145
[c8]
persistent URL: https://dblp.org/rec/conf/icassp/0002MCGWLM24
Jiawen Kang, Lingwei Meng, Mingyu Cui, Haohan Guo, Xixin Wu, Xunying Liu, Helen Meng:
Cross-Speaker Encoding Network for Multi-Talker Speech Recognition. ICASSP 2024: 11986-11990
[c7]
persistent URL: https://dblp.org/rec/conf/icml/YangT0HLGCSZ0ZW24
Dongchao Yang, Jinchuan Tian, Xu Tan, Rongjie Huang, Songxiang Liu, Haohan Guo, Xuankai Chang, Jiatong Shi, Sheng Zhao, Jiang Bian, Zhou Zhao, Xixin Wu, Helen M. Meng:
UniAudio: Towards Universal Audio Generation with Large Language Models. ICML 2024
[i19]
persistent URL: https://dblp.org/rec/journals/corr/abs-2401-04152
Jiawen Kang, Lingwei Meng, Mingyu Cui, Haohan Guo, Xixin Wu, Xunying Liu, Helen Meng:
Cross-Speaker Encoding Network for Multi-Talker Speech Recognition. CoRR abs/2401.04152 (2024)
[i18]
persistent URL: https://dblp.org/rec/journals/corr/abs-2402-08093
Mateusz Lajszczak, Guillermo Cámbara, Yang Li, Fatih Beyhan, Arent van Korlaar, Fan Yang, Arnaud Joly, Álvaro Martín-Cortinas, Ammar Abbas, Adam Michalski, Alexis Moinet, Sri Karlapati, Ewa Muszynska, Haohan Guo, Bartosz Putrycz, Soledad López Gambino, Kayeon Yoo, Elena Sokolova, Thomas Drugman:
BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data. CoRR abs/2402.08093 (2024)
[i17]
persistent URL: https://dblp.org/rec/journals/corr/abs-2406-02328
Dongchao Yang, Dingdong Wang, Haohan Guo, Xueyuan Chen, Xixin Wu, Helen Meng:
SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar Latent Transformer Diffusion Models. CoRR abs/2406.02328 (2024)
[i16]
persistent URL: https://dblp.org/rec/journals/corr/abs-2406-02940
Haohan Guo, Fenglong Xie, Dongchao Yang, Hui Lu, Xixin Wu, Helen Meng:
Addressing Index Collapse of Large-Codebook Speech Tokenizer with Dual-Decoding Product-Quantized Variational Auto-Encoder. CoRR abs/2406.02940 (2024)
[i15]
persistent URL: https://dblp.org/rec/journals/corr/abs-2406-10056
Dongchao Yang, Haohan Guo, Yuanyuan Wang, Rongjie Huang, Xiang Li, Xu Tan, Xixin Wu, Helen Meng:
UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot Audio Task Learner. CoRR abs/2406.10056 (2024)
[i14]
persistent URL: https://dblp.org/rec/journals/corr/abs-2408-13893
Dongchao Yang, Rongjie Huang, Yuanyuan Wang, Haohan Guo, Dading Chong, Songxiang Liu, Xixin Wu, Helen Meng:
SimpleSpeech 2: Towards Simple and Efficient Text-to-Speech with Flow-based Scalar Latent Transformer Diffusion Models. CoRR abs/2408.13893 (2024)
[i13]
persistent URL: https://dblp.org/rec/journals/corr/abs-2409-00933
Haohan Guo, Fenglong Xie, Kun Xie, Dongchao Yang, Dake Guo, Xixin Wu, Helen Meng:
SoCodec: A Semantic-Ordered Multi-Stream Speech Codec for Efficient Language Model Based Text-to-Speech Synthesis. CoRR abs/2409.00933 (2024)
[i12]
persistent URL: https://dblp.org/rec/journals/corr/abs-2409-03283
Haohan Guo, Kun Liu, Feiyu Shen, Yi-Chen Wu, Feng-Long Xie, Kun Xie, Kaituo Xu:
FireRedTTS: A Foundation Text-To-Speech Framework for Industry-Level Generative Speech Applications. CoRR abs/2409.03283 (2024)
[i11]
persistent URL: https://dblp.org/rec/journals/corr/abs-2409-11630
Haohan Guo, Fenglong Xie, Dongchao Yang, Xixin Wu, Helen Meng:
Speaking from Coarse to Fine: Improving Neural Codec Language Model via Multi-Scale Speech Coding and Generation. CoRR abs/2409.11630 (2024)
2023
[j1]
persistent URL: https://dblp.org/rec/journals/taslp/GuoXWSM23
Haohan Guo, Fenglong Xie, Xixin Wu, Frank K. Soong, Helen Meng:
MSMC-TTS: Multi-Stage Multi-Codebook VQ-VAE Based Neural TTS. IEEE ACM Trans. Audio Speech Lang. Process. 31: 1811-1824 (2023)
[i10]
persistent URL: https://dblp.org/rec/journals/corr/abs-2309-00126
Haohan Guo, Fenglong Xie, Jiawen Kang, Yujia Xiao, Xixin Wu, Helen Meng:
QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning. CoRR abs/2309.00126 (2023)
2022
[c6]
persistent URL: https://dblp.org/rec/conf/icassp/GuoZML22
Haohan Guo, Zhiping Zhou, Fanbo Meng, Kai Liu:
Improving Adversarial Waveform Generation Based Singing Voice Conversion with Harmonic Signals. ICASSP 2022: 6657-6661
[c5]
persistent URL: https://dblp.org/rec/conf/interspeech/GuoLWM22
Haohan Guo, Hui Lu, Xixin Wu, Helen Meng:
A Multi-Scale Time-Frequency Spectrogram Discriminator for GAN-based Non-Autoregressive TTS. INTERSPEECH 2022: 1566-1570
[c4]
persistent URL: https://dblp.org/rec/conf/interspeech/GuoXSWM22
Haohan Guo, Feng-Long Xie, Frank K. Soong, Xixin Wu, Helen Meng:
A Multi-Stage Multi-Codebook VQ-VAE Approach to High-Performance Neural TTS. INTERSPEECH 2022: 1611-1615
[i9]
persistent URL: https://dblp.org/rec/journals/corr/abs-2201-10130
Haohan Guo, Zhiping Zhou, Fanbo Meng, Kai Liu:
Improving Adversarial Waveform Generation based Singing Voice Conversion with Harmonic Signals. CoRR abs/2201.10130 (2022)
[i8]
persistent URL: https://dblp.org/rec/journals/corr/abs-2203-01080
Haohan Guo, Hui Lu, Xixin Wu, Helen Meng:
A Multi-Scale Time-Frequency Spectrogram Discriminator for GAN-based Non-Autoregressive TTS. CoRR abs/2203.01080 (2022)
[i7]
persistent URL: https://dblp.org/rec/journals/corr/abs-2209-10887
Haohan Guo, Feng-Long Xie, Frank K. Soong, Xixin Wu, Helen Meng:
A Multi-Stage Multi-Codebook VQ-VAE Approach to High-Performance Neural TTS. CoRR abs/2209.10887 (2022)
[i6]
persistent URL: https://dblp.org/rec/journals/corr/abs-2210-15131
Haohan Guo, Fenglong Xie, Xixin Wu, Hui Lu, Helen Meng:
Towards High-Quality Neural TTS for Low-Resource Languages by Learning Compact Speech Representations. CoRR abs/2210.15131 (2022)
2021
[c3]
persistent URL: https://dblp.org/rec/conf/slt/GuoZSHX21
Haohan Guo, Shaofei Zhang, Frank K. Soong, Lei He, Lei Xie:
Conversational End-to-End TTS for Voice Agents. SLT 2021: 403-409
2020
[i5]
persistent URL: https://dblp.org/rec/journals/corr/abs-2005-10438
Haohan Guo, Shaofei Zhang, Frank K. Soong, Lei He, Lei Xie:
Conversational End-to-End TTS for Voice Agent. CoRR abs/2005.10438 (2020)
[i4]
persistent URL: https://dblp.org/rec/journals/corr/abs-2012-01837
Haohan Guo, Heng Lu, Na Hu, Chunlei Zhang, Shan Yang, Lei Xie, Dan Su, Dong Yu:
Phonetic Posteriorgrams based Many-to-Many Singing Voice Conversion via Adversarial Training. CoRR abs/2012.01837 (2020)

2010 – 2019

2019
[c2]
persistent URL: https://dblp.org/rec/conf/interspeech/GuoSHX19
Haohan Guo, Frank K. Soong, Lei He, Lei Xie:
A New GAN-Based End-to-End TTS Training Algorithm. INTERSPEECH 2019: 1288-1292
[c1]
persistent URL: https://dblp.org/rec/conf/interspeech/GuoSHX19a
Haohan Guo, Frank K. Soong, Lei He, Lei Xie:
Exploiting Syntactic Features in a Parsed Tree to Improve End-to-End TTS. INTERSPEECH 2019: 4460-4464
[i3]
persistent URL: https://dblp.org/rec/journals/corr/abs-1901-00707
Huaiping Ming, Lei He, Haohan Guo, Frank K. Soong:
Feature reinforcement with word embedding and parsing information in neural TTS. CoRR abs/1901.00707 (2019)
[i2]
persistent URL: https://dblp.org/rec/journals/corr/abs-1904-04764
Haohan Guo, Frank K. Soong, Lei He, Lei Xie:
Exploiting Syntactic Features in a Parsed Tree to Improve End-to-End TTS. CoRR abs/1904.04764 (2019)
[i1]
persistent URL: https://dblp.org/rec/journals/corr/abs-1904-04775
Haohan Guo, Frank K. Soong, Lei He, Lei Xie:
A New GAN-based End-to-End TTS Training Algorithm. CoRR abs/1904.04775 (2019)
