Google Scholar

User profiles for Zexu Pan

Pan Zexu

Alibaba; MERL; National University of Singapore

Verified email at u.nus.edu

Cited by 682

Selective listening by synchronizing speech with lips

Z Pan, R Tao, C Xu, H Li - IEEE/ACM Transactions on Audio …, 2022 - ieeexplore.ieee.org

A speaker extraction algorithm seeks to extract the speech of a target speaker from a multi-talker
speech mixture when given a cue that represents the target speaker, such as a pre-…

Cited by 48

Multi-modal attention for speech emotion recognition

Z Pan, Z Luo, J Yang, H Li - arXiv preprint arXiv:2009.04107, 2020 - arxiv.org

Emotion represents an essential aspect of human speech that is manifested in speech
prosody. Speech, visual, and textual cues are complementary in human communication. In this …

Cited by 90

MuSE: Multi-modal target speaker extraction with visual cues

Z Pan, R Tao, C Xu, H Li - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org

Speaker extraction algorithm relies on the speech sample from the target speaker as the
reference point to focus its attention. Such a reference speech is typically pre-recorded. On the …

Cited by 54

Is someone speaking? Exploring long-term temporal features for audio-visual active speaker detection

R Tao, Z Pan, RK Das, X Qian, MZ Shou… - Proceedings of the 29th …, 2021 - dl.acm.org

Active speaker detection (ASD) seeks to detect who is speaking in a visual scene of one or
more speakers. The successful ASD depends on accurate interpretation of short-term and …

Cited by 194

USEV: Universal speaker extraction with visual cue

Z Pan, M Ge, H Li - IEEE/ACM Transactions on Audio, Speech …, 2022 - ieeexplore.ieee.org

A speaker extraction algorithm seeks to extract the target speaker's speech from a multi-talker
speech mixture. The prior studies focus mostly on speaker extraction from a highly …

Cited by 49

NeuroHeed: Neuro-steered speaker extraction using EEG signals

Z Pan, M Borsdorf, S Cai, T Schultz… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org

Humans possess the remarkable ability to selectively attend to a single speaker amidst
competing voices and background noise, known as selective auditory attention. Recent studies …

Cited by 16

Speech separation with pretrained frontend to minimize domain mismatch

W Wang, Z Pan, X Li, S Wang… - IEEE/ACM Transactions on …, 2024 - ieeexplore.ieee.org

Speech separation seeks to separate individual speech signals from a speech mixture.
Typically, most separation models are trained on synthetic data due to the unavailability of target …

Cited by 3

Speaker extraction with co-speech gestures cue

Z Pan, X Qian, H Li - IEEE Signal Processing Letters, 2022 - ieeexplore.ieee.org

Speaker extraction seeks to extract the clean speech of a target speaker from a multi-talker
mixture speech. There have been studies to use a pre-recorded speech sample or face …

Cited by 28

Scenario-Aware Audio-Visual TF-GridNet for Target Speech Extraction

Z Pan, G Wichern, Y Masuyama… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org

Target speech extraction aims to extract, based on a given conditioning cue, a target
speech signal that is corrupted by interfering sources, such as noise or competing speakers. …

Cited by 5

PARIS: Pseudo-AutoRegressIve siamese training for online speech separation

Z Pan, G Wichern, FG Germain, K Saijo, J Le Roux - Proc. Interspeech, 2024 - merl.com

While offline speech separation models have made significant advances, the streaming
regime remains less explored and is typically limited to causal modifications of existing offline …

Cited by 3