User profiles for Zexu Pan

Pan Zexu

Alibaba; MERL; National University of Singapore
Verified email at u.nus.edu
Cited by 682

Selective listening by synchronizing speech with lips

Z Pan, R Tao, C Xu, H Li - IEEE/ACM Transactions on Audio …, 2022 - ieeexplore.ieee.org
A speaker extraction algorithm seeks to extract the speech of a target speaker from a multi-talker
speech mixture when given a cue that represents the target speaker, such as a pre-…

Multi-modal attention for speech emotion recognition

Z Pan, Z Luo, J Yang, H Li - arXiv preprint arXiv:2009.04107, 2020 - arxiv.org
Emotion represents an essential aspect of human speech that is manifested in speech
prosody. Speech, visual, and textual cues are complementary in human communication. In this …

MuSE: Multi-modal target speaker extraction with visual cues

Z Pan, R Tao, C Xu, H Li - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org
Speaker extraction algorithm relies on the speech sample from the target speaker as the
reference point to focus its attention. Such a reference speech is typically pre-recorded. On the …

Is someone speaking? Exploring long-term temporal features for audio-visual active speaker detection

R Tao, Z Pan, RK Das, X Qian, MZ Shou… - Proceedings of the 29th …, 2021 - dl.acm.org
Active speaker detection (ASD) seeks to detect who is speaking in a visual scene of one or
more speakers. The successful ASD depends on accurate interpretation of short-term and …

USEV: Universal speaker extraction with visual cue

Z Pan, M Ge, H Li - IEEE/ACM Transactions on Audio, Speech …, 2022 - ieeexplore.ieee.org
A speaker extraction algorithm seeks to extract the target speaker's speech from a multi-talker
speech mixture. The prior studies focus mostly on speaker extraction from a highly …

NeuroHeed: Neuro-steered speaker extraction using EEG signals

Z Pan, M Borsdorf, S Cai, T Schultz… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org
Humans possess the remarkable ability to selectively attend to a single speaker amidst
competing voices and background noise, known as selective auditory attention. Recent studies …

Speech separation with pretrained frontend to minimize domain mismatch

W Wang, Z Pan, X Li, S Wang… - IEEE/ACM Transactions on …, 2024 - ieeexplore.ieee.org
Speech separation seeks to separate individual speech signals from a speech mixture.
Typically, most separation models are trained on synthetic data due to the unavailability of target …

Speaker extraction with co-speech gestures cue

Z Pan, X Qian, H Li - IEEE Signal Processing Letters, 2022 - ieeexplore.ieee.org
Speaker extraction seeks to extract the clean speech of a target speaker from a multi-talker
mixture speech. There have been studies to use a pre-recorded speech sample or face …

Scenario-Aware Audio-Visual TF-GridNet for Target Speech Extraction

Z Pan, G Wichern, Y Masuyama… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
Target speech extraction aims to extract, based on a given conditioning cue, a target
speech signal that is corrupted by interfering sources, such as noise or competing speakers. …

PARIS: Pseudo-AutoRegressIve siamese training for online speech separation

Z Pan, G Wichern, FG Germain, K Saijo, J Le Roux - Proc. Interspeech, 2024 - merl.com
While offline speech separation models have made significant advances, the streaming
regime remains less explored and is typically limited to causal modifications of existing offline …