User profiles for Zexu Pan
Pan Zexu, Alibaba; MERL; National University of Singapore. Verified email at u.nus.edu. Cited by 682
Selective listening by synchronizing speech with lips
A speaker extraction algorithm seeks to extract the speech of a target speaker from a multi-talker
speech mixture when given a cue that represents the target speaker, such as a pre-…
Multi-modal attention for speech emotion recognition
Emotion represents an essential aspect of human speech that is manifested in speech
prosody. Speech, visual, and textual cues are complementary in human communication. In this …
MuSE: Multi-modal target speaker extraction with visual cues
A speaker extraction algorithm relies on the speech sample from the target speaker as the
reference point to focus its attention. Such a reference speech is typically pre-recorded. On the …
Is someone speaking? Exploring long-term temporal features for audio-visual active speaker detection
Active speaker detection (ASD) seeks to detect who is speaking in a visual scene of one or
more speakers. Successful ASD depends on accurate interpretation of short-term and …
USEV: Universal speaker extraction with visual cue
A speaker extraction algorithm seeks to extract the target speaker's speech from a multi-talker
speech mixture. Prior studies focus mostly on speaker extraction from a highly …
NeuroHeed: Neuro-steered speaker extraction using EEG signals
Humans possess the remarkable ability to selectively attend to a single speaker amidst
competing voices and background noise, known as selective auditory attention. Recent studies …
Speech separation with pretrained frontend to minimize domain mismatch
Speech separation seeks to separate individual speech signals from a speech mixture.
Most separation models are typically trained on synthetic data due to the unavailability of target …
Speaker extraction with co-speech gestures cue
Speaker extraction seeks to extract the clean speech of a target speaker from a multi-talker
speech mixture. Previous studies have used a pre-recorded speech sample or face …
Scenario-Aware Audio-Visual TF-GridNet for Target Speech Extraction
Target speech extraction aims to extract, based on a given conditioning cue, a target
speech signal that is corrupted by interfering sources, such as noise or competing speakers. …
PARIS: Pseudo-AutoRegressIve siamese training for online speech separation
While offline speech separation models have made significant advances, the streaming
regime remains less explored and is typically limited to causal modifications of existing offline …