Cited By
View all- Yu SYang C(2024)MAVAR-SE: Multi-scale Audio-Visual Association Representation Network for End-to-End Speaker ExtractionMultiMedia Modeling10.1007/978-3-031-53308-2_17(227-238)Online publication date: 29-Jan-2024
- Mo SMorgado PKrause ABrunskill ECho KEngelhardt BSabato SScarlett J(2023)A unified audio-visual learning framework for localization, separation, and recognitionProceedings of the 40th International Conference on Machine Learning10.5555/3618408.3619449(25006-25017)Online publication date: 23-Jul-2023
- Hsu YBai M(2023)Learning-based robust speaker counting and separation with the aid of spatial coherenceEURASIP Journal on Audio, Speech, and Music Processing10.1186/s13636-023-00298-32023:1Online publication date: 20-Sep-2023
- Show More Cited By