Di Hu 0001
Person information
- affiliation: Baidu Research, Big Data Laboratory, Beijing, China
- affiliation: Renmin University of China, Gaoling School of Artificial Intelligence, Beijing, China
- affiliation (former): Northwestern Polytechnical University, School of Computer Science and Engineering, OPTIMAL, Xi'an, China
Other persons with the same name
- Di Hu — disambiguation page
- Di Hu 0002 — Jiangsu Center for Collaborative Innovation in Geographical Information Resource Development and Application, Nanjing, China (and 1 more)
- Di Hu 0003 — Wuhan Polytechnic University, School of Math and Computer, China
- Di Hu 0004 — Beihang University, School of Astronautics, Beijing, China
- Di Hu 0005 — Southeast University, School of Instrument Science and Engineering, Nanjing, China (and 1 more)
- Di Hu 0006 — Huazhong University of Science and Technology, School of Energy and Power Engineering, Wuhan, China
2020 – today
- 2024
- [j8]Xingjian Li, Di Hu, Xuhong Li, Haoyi Xiong, Cheng-Zhong Xu, Dejing Dou:
Towards accurate knowledge transfer via target-awareness representation disentanglement. Mach. Learn. 113(2): 699-723 (2024)
- [j7]Zequn Yang, Han Zhang, Yake Wei, Zheng Wang, Feiping Nie, Di Hu:
Geometric-inspired graph-based Incomplete Multi-view Clustering. Pattern Recognit. 147: 110082 (2024)
- [c34]Yaoting Wang, Weisong Liu, Guangyao Li, Jian Ding, Di Hu, Xi Li:
Prompting Segmentation with Sound Is Generalizable Audio-Visual Source Localizer. AAAI 2024: 5669-5677
- [c33]Yake Wei, Ruoxuan Feng, Zihe Wang, Di Hu:
Enhancing Multimodal Cooperation via Sample-Level Modality Valuation. CVPR 2024: 27328-27337
- [c32]Yake Wei, Siwei Li, Ruoxuan Feng, Di Hu:
Diagnosing and Re-learning for Balanced Multimodal Learning. ECCV (64) 2024: 71-86
- [c31]Zequn Yang, Yake Wei, Ce Liang, Di Hu:
Quantifying and Enhancing Multi-modal Robustness with Modality Preference. ICLR 2024
- [c30]Yake Wei, Di Hu:
MMPareto: Boosting Multimodal Learning with Innocent Unimodal Assistance. ICML 2024
- [c29]Wenke Xia, Dong Wang, Xincheng Pang, Zhigang Wang, Bin Zhao, Di Hu, Xuelong Li:
Kinematic-aware Prompting for Generalizable Articulated Object Manipulation with LLMs. ICRA 2024: 2073-2080
- [c28]Guangyao Li, Henghui Du, Di Hu:
Boosting Audio Visual Question Answering via Key Semantic-Aware Cues. ACM Multimedia 2024: 5997-6005
- [c27]Peiwen Sun, Honggang Zhang, Di Hu:
Unveiling and Mitigating Bias in Audio Visual Segmentation. ACM Multimedia 2024: 7259-7268
- [i48]Zequn Yang, Yake Wei, Ce Liang, Di Hu:
Quantifying and Enhancing Multi-modal Robustness with Modality Preference. CoRR abs/2402.06244 (2024)
- [i47]Qingyang Zhang, Yake Wei, Zongbo Han, Huazhu Fu, Xi Peng, Cheng Deng, Qinghua Hu, Cai Xu, Jie Wen, Di Hu, Changqing Zhang:
Multimodal Fusion on Low-quality Data: A Comprehensive Survey. CoRR abs/2404.18947 (2024)
- [i46]Yake Wei, Di Hu:
MMPareto: Boosting Multimodal Learning with Innocent Unimodal Assistance. CoRR abs/2405.17730 (2024)
- [i45]Jia Zeng, Qingwen Bu, Bangjun Wang, Wenke Xia, Li Chen, Hao Dong, Haoming Song, Dong Wang, Di Hu, Ping Luo, Heming Cui, Bin Zhao, Xuelong Li, Yu Qiao, Hongyang Li:
Learning Manipulation by Predicting Interaction. CoRR abs/2406.00439 (2024)
- [i44]Yutao Zhu, Kun Zhou, Kelong Mao, Wentong Chen, Yiding Sun, Zhipeng Chen, Qian Cao, Yihan Wu, Yushuo Chen, Feng Wang, Lei Zhang, Junyi Li, Xiaolei Wang, Lei Wang, Beichen Zhang, Zican Dong, Xiaoxue Cheng, Yuhan Chen, Xinyu Tang, Yupeng Hou, Qiangqiang Ren, Xincheng Pang, Shufang Xie, Wayne Xin Zhao, Zhicheng Dou, Jiaxin Mao, Yankai Lin, Ruihua Song, Jun Xu, Xu Chen, Rui Yan, Zhewei Wei, Di Hu, Wenbing Huang, Ze-Feng Gao, Yueguo Chen, Weizheng Lu, Ji-Rong Wen:
YuLan: An Open-source Large Language Model. CoRR abs/2406.19853 (2024)
- [i43]Yake Wei, Siwei Li, Ruoxuan Feng, Di Hu:
Diagnosing and Re-learning for Balanced Multimodal Learning. CoRR abs/2407.09705 (2024)
- [i42]Yaoting Wang, Peiwen Sun, Yuanchao Li, Honggang Zhang, Di Hu:
Can Textual Semantics Mitigate Sounding Object Segmentation Preference? CoRR abs/2407.10947 (2024)
- [i41]Yaoting Wang, Peiwen Sun, Dongzhan Zhou, Guangyao Li, Honggang Zhang, Di Hu:
Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes. CoRR abs/2407.10957 (2024)
- [i40]Peiwen Sun, Honggang Zhang, Di Hu:
Unveiling and Mitigating Bias in Audio Visual Segmentation. CoRR abs/2407.16638 (2024)
- [i39]Jie Chen, Zhipeng Chen, Jiapeng Wang, Kun Zhou, Yutao Zhu, Jinhao Jiang, Yingqian Min, Wayne Xin Zhao, Zhicheng Dou, Jiaxin Mao, Yankai Lin, Ruihua Song, Jun Xu, Xu Chen, Rui Yan, Zhewei Wei, Di Hu, Wenbing Huang, Ji-Rong Wen:
Towards Effective and Efficient Continual Pre-training of Large Language Models. CoRR abs/2407.18743 (2024)
- [i38]Ruoxuan Feng, Di Hu, Wenke Ma, Xuelong Li:
Play to the Score: Stage-Guided Dynamic Multi-Sensory Fusion for Robotic Manipulation. CoRR abs/2408.01366 (2024)
- [i37]Jingxian Lu, Wenke Xia, Dong Wang, Zhigang Wang, Bin Zhao, Di Hu, Xuelong Li:
KOI: Accelerating Online Imitation Learning via Hybrid Key-state Guidance. CoRR abs/2408.02912 (2024)
- [i36]Xincheng Pang, Wenke Xia, Zhigang Wang, Bin Zhao, Di Hu, Dong Wang, Xuelong Li:
Depth Helps: Improving Pre-trained RGB-based Policy with Depth Information Injection. CoRR abs/2408.05107 (2024)
- 2023
- [j6]Konrad Heidler, Lichao Mou, Di Hu, Pu Jin, Guangyao Li, Chuang Gan, Ji-Rong Wen, Xiao Xiang Zhu:
Self-supervised audiovisual representation learning for remote sensing data. Int. J. Appl. Earth Obs. Geoinformation 116: 103130 (2023)
- [j5]Di Hu, Zheng Wang, Feiping Nie, Rong Wang, Xuelong Li:
Self-Supervised Learning for Heterogeneous Audiovisual Scene Analysis. IEEE Trans. Multim. 25: 3534-3545 (2023)
- [c26]Ruize Xu, Ruoxuan Feng, Shi-Xiong Zhang, Di Hu:
MMCosine: Multi-Modal Cosine Loss Towards Balanced Audio-Visual Fine-Grained Learning. ICASSP 2023: 1-5
- [c25]Andong Deng, Xingjian Li, Di Hu, Tianyang Wang, Haoyi Xiong, Cheng-Zhong Xu:
Towards Inadequately Pre-trained Models in Transfer Learning. ICCV 2023: 19340-19351
- [c24]Guangyao Li, Yixin Xu, Di Hu:
Multi-Scale Attention for Audio Question Answering. INTERSPEECH 2023: 3442-3446
- [c23]Hongpeng Lin, Ludan Ruan, Wenke Xia, Peiyu Liu, Jingyuan Wen, Yixin Xu, Di Hu, Ruihua Song, Wayne Xin Zhao, Qin Jin, Zhiwu Lu:
TikTalk: A Video-Based Dialogue Dataset for Multi-Modal Chitchat in Real World. ACM Multimedia 2023: 1303-1313
- [c22]Guangyao Li, Wenxuan Hou, Di Hu:
Progressive Spatio-temporal Perception for Audio-Visual Question Answering. ACM Multimedia 2023: 7808-7816
- [c21]Xinchi Zhou, Dongzhan Zhou, Wanli Ouyang, Hang Zhou, Di Hu:
SeCo: Separating Unknown Musical Visual Sounds with Consistency Guidance. WACV 2023: 5157-5166
- [c20]Xinchi Zhou, Dongzhan Zhou, Di Hu, Hang Zhou, Wanli Ouyang:
Exploiting Visual Context Semantics for Sound Source Localization. WACV 2023: 5188-5197
- [i35]Hongpeng Lin, Ludan Ruan, Wenke Xia, Peiyu Liu, Jingyuan Wen, Yixin Xu, Di Hu, Ruihua Song, Wayne Xin Zhao, Qin Jin, Zhiwu Lu:
TikTalk: A Multi-Modal Dialogue Dataset for Real-World Chitchat. CoRR abs/2301.05880 (2023)
- [i34]Ruoxuan Feng, Wenke Xia, Di Hu:
Revisiting Pre-training in Audio-Visual Learning. CoRR abs/2302.03533 (2023)
- [i33]Wenke Xia, Xu Zhao, Xincheng Pang, Changqing Zhang, Di Hu:
Balanced Audiovisual Dataset for Imbalance Analysis. CoRR abs/2302.10912 (2023)
- [i32]Ruize Xu, Ruoxuan Feng, Shi-Xiong Zhang, Di Hu:
MMCosine: Multi-Modal Cosine Loss Towards Balanced Audio-Visual Fine-Grained Learning. CoRR abs/2303.05338 (2023)
- [i31]Wenke Xia, Xingjian Li, Andong Deng, Haoyi Xiong, Dejing Dou, Di Hu:
Robust Cross-Modal Knowledge Distillation for Unconstrained Videos. CoRR abs/2304.07775 (2023)
- [i30]Guangyao Li, Yixin Xu, Di Hu:
Multi-Scale Attention for Audio Question Answering. CoRR abs/2305.17993 (2023)
- [i29]Wenxuan Hou, Guangyao Li, Yapeng Tian, Di Hu:
Towards Long Form Audio-visual Video Understanding. CoRR abs/2306.09431 (2023)
- [i28]Guangyao Li, Wenxuan Hou, Di Hu:
Progressive Spatio-temporal Perception for Audio-Visual Question Answering. CoRR abs/2308.05421 (2023)
- [i27]Yake Wei, Ruoxuan Feng, Zihe Wang, Di Hu:
Enhancing Multi-modal Cooperation via Fine-grained Modality Valuation. CoRR abs/2309.06255 (2023)
- [i26]Yaoting Wang, Weisong Liu, Guangyao Li, Jian Ding, Di Hu, Xi Li:
Prompting Segmentation with Sound is Generalizable Audio-Visual Source Localizer. CoRR abs/2309.07929 (2023)
- [i25]Wenke Xia, Dong Wang, Xincheng Pang, Zhigang Wang, Bin Zhao, Di Hu:
Kinematic-aware Prompting for Generalizable Articulated Object Manipulation with LLMs. CoRR abs/2311.02847 (2023)
- 2022
- [j4]Di Hu, Yake Wei, Rui Qian, Weiyao Lin, Ruihua Song, Ji-Rong Wen:
Class-Aware Sounding Objects Localization via Audiovisual Correspondence. IEEE Trans. Pattern Anal. Mach. Intell. 44(12): 9844-9859 (2022)
- [c19]Xian Liu, Rui Qian, Hang Zhou, Di Hu, Weiyao Lin, Ziwei Liu, Bolei Zhou, Xiaowei Zhou:
Visual Sound Localization in the Wild by Cross-Modal Interference Erasing. AAAI 2022: 1801-1809
- [c18]Dongzhan Zhou, Xinchi Zhou, Di Hu, Hang Zhou, Lei Bai, Ziwei Liu, Wanli Ouyang:
SepFusion: Finding Optimal Fusion Structures for Visual Sound Separation. AAAI 2022: 3544-3552
- [c17]Xiaokang Peng, Yake Wei, Andong Deng, Dong Wang, Di Hu:
Balanced Multimodal Learning via On-the-fly Gradient Modulation. CVPR 2022: 8228-8237
- [c16]Guangyao Li, Yake Wei, Yapeng Tian, Chenliang Xu, Ji-Rong Wen, Di Hu:
Learning to Answer Questions in Dynamic Audio-Visual Scenarios. CVPR 2022: 19086-19096
- [c15]Yingzi Fan, Longfei Han, Yue Zhang, Lechao Cheng, Chen Xia, Di Hu:
Dual Domain-Adversarial Learning for Audio-Visual Saliency Prediction. HCMA@MM 2022: 15-23
- [i24]Xian Liu, Rui Qian, Hang Zhou, Di Hu, Weiyao Lin, Ziwei Liu, Bolei Zhou, Xiaowei Zhou:
Visual Sound Localization in the Wild by Cross-Modal Interference Erasing. CoRR abs/2202.06406 (2022)
- [i23]Andong Deng, Xingjian Li, Zhibing Li, Di Hu, Chengzhong Xu, Dejing Dou:
Inadequately Pre-trained Models are Better Feature Extractors. CoRR abs/2203.04668 (2022)
- [i22]Xinchi Zhou, Dongzhan Zhou, Wanli Ouyang, Hang Zhou, Ziwei Liu, Di Hu:
SeCo: Separating Unknown Musical Visual Sounds with Consistency Guidance. CoRR abs/2203.13535 (2022)
- [i21]Guangyao Li, Yake Wei, Yapeng Tian, Chenliang Xu, Ji-Rong Wen, Di Hu:
Learning to Answer Questions in Dynamic Audio-Visual Scenarios. CoRR abs/2203.14072 (2022)
- [i20]Xiaokang Peng, Yake Wei, Andong Deng, Dong Wang, Di Hu:
Balanced Multimodal Learning via On-the-fly Gradient Modulation. CoRR abs/2203.15332 (2022)
- [i19]Yingzi Fan, Longfei Han, Yue Zhang, Lechao Cheng, Chen Xia, Di Hu:
Dual Domain-Adversarial Learning for Audio-Visual Saliency Prediction. CoRR abs/2208.05220 (2022)
- [i18]Yake Wei, Di Hu, Yapeng Tian, Xuelong Li:
Learning in Audio-visual Context: A Review, Analysis, and New Perspective. CoRR abs/2208.09579 (2022)
- 2021
- [j3]Sijia Yang, Haoyi Xiong, Di Hu, Kaibo Xu, Licheng Wang, Peizhen Zhu, Zeyi Sun:
Generalising combinatorial discriminant analysis through conditioning truncated Rayleigh flow. Knowl. Inf. Syst. 63(8): 2189-2208 (2021)
- [c14]Dong Wang, Di Hu, Xingjian Li, Dejing Dou:
Temporal Relational Modeling with Self-Supervision for Action Segmentation. AAAI 2021: 2729-2737
- [c13]Yapeng Tian, Di Hu, Chenliang Xu:
Cyclic Co-Learning of Sounding Object Visual Grounding and Sound Separation. CVPR 2021: 2745-2754
- [c12]Zechen Bai, Zhigang Wang, Jian Wang, Di Hu, Errui Ding:
Unsupervised Multi-Source Domain Adaptation for Person Re-Identification. CVPR 2021: 12914-12923
- [i17]Yapeng Tian, Di Hu, Chenliang Xu:
Cyclic Co-Learning of Sounding Object Visual Grounding and Sound Separation. CoRR abs/2104.02026 (2021)
- [i16]Zechen Bai, Zhigang Wang, Jian Wang, Di Hu, Errui Ding:
Unsupervised Multi-Source Domain Adaptation for Person Re-Identification. CoRR abs/2104.12961 (2021)
- [i15]Konrad Heidler, Lichao Mou, Di Hu, Pu Jin, Guangyao Li, Chuang Gan, Ji-Rong Wen, Xiao Xiang Zhu:
Self-supervised Audiovisual Representation Learning for Remote Sensing Data. CoRR abs/2108.00688 (2021)
- [i14]Di Hu, Yake Wei, Rui Qian, Weiyao Lin, Ruihua Song, Ji-Rong Wen:
Class-aware Sounding Objects Localization via Audiovisual Correspondence. CoRR abs/2112.11749 (2021)
- 2020
- [c11]Di Hu, Xuhong Li, Lichao Mou, Pu Jin, Dong Chen, Liping Jing, Xiaoxiang Zhu, Dejing Dou:
Cross-Task Transfer for Geotagged Audiovisual Aerial Scene Recognition. ECCV (24) 2020: 68-84
- [c10]Rui Qian, Di Hu, Heinrich Dinkel, Mengyue Wu, Ning Xu, Weiyao Lin:
Multiple Sound Sources Localization from Coarse to Fine. ECCV (20) 2020: 292-308
- [c9]Di Hu, Rui Qian, Minyue Jiang, Xiao Tan, Shilei Wen, Errui Ding, Weiyao Lin, Dejing Dou:
Discriminative Sounding Objects Localization via Self-supervised Audiovisual Matching. NeurIPS 2020
- [i13]Di Hu, Zheng Wang, Haoyi Xiong, Dong Wang, Feiping Nie, Dejing Dou:
Curriculum Audiovisual Learning. CoRR abs/2001.09414 (2020)
- [i12]Di Hu, Lichao Mou, Qingzhong Wang, Junyu Gao, Yuansheng Hua, Dejing Dou, Xiao Xiang Zhu:
Ambient Sound Helps: Audiovisual Crowd Counting in Extreme Conditions. CoRR abs/2005.07097 (2020)
- [i11]Di Hu, Xuhong Li, Lichao Mou, Pu Jin, Dong Chen, Liping Jing, Xiaoxiang Zhu, Dejing Dou:
Cross-Task Transfer for Multimodal Aerial Scene Recognition. CoRR abs/2005.08449 (2020)
- [i10]Rui Qian, Di Hu, Heinrich Dinkel, Mengyue Wu, Ning Xu, Weiyao Lin:
Multiple Sound Sources Localization from Coarse to Fine. CoRR abs/2007.06355 (2020)
- [i9]Di Hu, Rui Qian, Minyue Jiang, Xiao Tan, Shilei Wen, Errui Ding, Weiyao Lin, Dejing Dou:
Discriminative Sounding Objects Localization via Self-supervised Audiovisual Matching. CoRR abs/2010.05466 (2020)
- [i8]Xingjian Li, Di Hu, Xuhong Li, Haoyi Xiong, Zhi Ye, Zhipeng Wang, Chengzhong Xu, Dejing Dou:
Towards Accurate Knowledge Transfer via Target-awareness Representation Disentanglement. CoRR abs/2010.08532 (2020)
- [i7]Dong Wang, Di Hu, Xingjian Li, Dejing Dou:
Temporal Relational Modeling with Self-Supervision for Action Segmentation. CoRR abs/2012.07508 (2020)
2010 – 2019
- 2019
- [j2]Di Hu, Feiping Nie, Xuelong Li:
Discrete Spectral Hashing for Efficient Similarity Retrieval. IEEE Trans. Image Process. 28(3): 1080-1091 (2019)
- [j1]Di Hu, Feiping Nie, Xuelong Li:
Deep Binary Reconstruction for Cross-Modal Hashing. IEEE Trans. Multim. 21(4): 973-985 (2019)
- [c8]Di Hu, Dong Wang, Xuelong Li, Feiping Nie, Qi Wang:
Listen to the Image. CVPR 2019: 7972-7981
- [c7]Di Hu, Feiping Nie, Xuelong Li:
Deep Multimodal Clustering for Unsupervised Audiovisual Learning. CVPR 2019: 9248-9257
- [c6]Di Hu, Chengze Wang, Feiping Nie, Xuelong Li:
Dense Multimodal Fusion for Hierarchically Joint Representation. ICASSP 2019: 3941-3945
- [i6]Di Hu, Dong Wang, Xuelong Li, Feiping Nie, Qi Wang:
Listen to the Image. CoRR abs/1904.09115 (2019)
- 2018
- [i5]Di Hu, Feiping Nie, Xuelong Li:
Deep Co-Clustering for Unsupervised Audiovisual Learning. CoRR abs/1807.03094 (2018)
- [i4]Di Hu, Feiping Nie, Xuelong Li:
Deep LDA Hashing. CoRR abs/1810.03402 (2018)
- [i3]Di Hu, Feiping Nie, Xuelong Li:
Dense Multimodal Fusion for Hierarchically Joint Representation. CoRR abs/1810.03414 (2018)
- 2017
- [c5]Xuelong Li, Di Hu, Feiping Nie:
Large Graph Hashing with Spectral Rotation. AAAI 2017: 2203-2209
- [c4]Xuelong Li, Di Hu, Xiaoqiang Lu:
Image2song: Song Retrieval via Bridging Image Content and Lyric Words. ICCV 2017: 5650-5659
- [c3]Xuelong Li, Di Hu, Feiping Nie:
Deep Binary Reconstruction for Cross-modal Hashing. ACM Multimedia 2017: 1398-1406
- [i2]Xuelong Li, Di Hu, Feiping Nie:
Deep Binary Reconstruction for Cross-modal Hashing. CoRR abs/1708.05127 (2017)
- [i1]Xuelong Li, Di Hu, Xiaoqiang Lu:
Image2song: Song Retrieval via Bridging Image Content and Lyric Words. CoRR abs/1708.05851 (2017)
- 2016
- [c2]Di Hu, Xuelong Li, Xiaoqiang Lu:
Temporal Multimodal Learning in Audiovisual Speech Recognition. CVPR 2016: 3574-3582
- [c1]Di Hu, Xiaoqiang Lu, Xuelong Li:
Multimodal Learning via Exploring Deep Semantic Similarity. ACM Multimedia 2016: 342-346
last updated on 2024-11-22 19:47 CET by the dblp team
all metadata released as open data under CC0 1.0 license