Google Scholar

User profiles for Songyang Zhang

Songyang Zhang - Verified email at pjlab.org.cn - Cited by 5986

Songyang Zhang - Verified email at amazon.com - Cited by 4181

Song-Yang Zhang - Verified email at mcgill.ca - Cited by 2847

Make-A-Video: Text-to-video generation without text-video data

…, A Polyak, T Hayes, X Yin, J An, S Zhang… - arXiv preprint arXiv …, 2022 - arxiv.org

We propose Make-A-Video -- an approach for directly translating the tremendous recent
progress in Text-to-Image (T2I) generation to Text-to-Video (T2V). Our intuition is simple: learn …

Cited by 1585

Distribution alignment: A unified framework for long-tail visual recognition

S Zhang, Z Li, S Yan, X He… - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com

Despite the success of the deep neural networks, it remains challenging to effectively build a
system for long-tail visual recognition tasks. To address this problem, we first investigate the …

Cited by 385

Suppression of mitochondrial ROS by prohibitin drives glioblastoma progression and therapeutic resistance

H Huang, S Zhang, Y Li, Z Liu, L Mi, Y Cai… - Nature …, 2021 - nature.com

Low levels of reactive oxygen species (ROS) are crucial for maintaining cancer stem cells (CSCs)
and their ability to resist therapy, but the ROS regulatory mechanisms in CSCs …

Cited by 125

On geometric features for skeleton-based action recognition using multilayer LSTM networks

S Zhang, X Liu, J Xiao - 2017 IEEE winter conference on …, 2017 - ieeexplore.ieee.org

RNN-based approaches have achieved outstanding performance on action recognition with
skeleton inputs. Currently these methods limit their inputs to coordinates of joints and …

Cited by 382

Expanding language-image pretrained models for general video recognition

B Ni, H Peng, M Chen, S Zhang, G Meng, J Fu… - European conference on …, 2022 - Springer

Contrastive language-image pretraining has shown great success in learning visual-textual
joint representation from web-scale data, demonstrating remarkable “zero-shot” …

Cited by 411

Gut microbiota and intestinal FXR mediate the clinical benefits of metformin

…, B Chen, S Zhang, C Yun, G Lian, X Zhang, H Zhang… - Nature medicine, 2018 - nature.com

The anti-hyperglycemic effect of metformin is believed to be caused by its direct action on
signaling processes in hepatocytes, leading to lower hepatic gluconeogenesis. Recently, …

Cited by 945

MMBench: Is your multi-modal model an all-around player?

Y Liu, H Duan, Y Zhang, B Li, S Zhang, W Zhao… - European conference on …, 2024 - Springer

Large vision-language models (VLMs) have recently achieved remarkable progress,
exhibiting impressive multimodal perception and reasoning abilities. However, effectively …

Cited by 1235

Learning 2D temporal adjacent networks for moment localization with natural language

S Zhang, H Peng, J Fu, J Luo - Proceedings of the AAAI conference on …, 2020 - ojs.aaai.org

We address the problem of retrieving a specific moment from an untrimmed video by a
query sentence. This is a challenging problem because a target moment may take place in …

Cited by 565

Part-aware prototype network for few-shot semantic segmentation

Y Liu, X Zhang, S Zhang, X He - European conference on computer vision, 2020 - Springer

Few-shot semantic segmentation aims to learn to segment new object classes with only a
few annotated examples, which has a wide range of real-world applications. Most existing …

Cited by 423

InternLM-XComposer2: Mastering free-form text-image composition and comprehension in vision-language large model

…, X Wei, S Zhang, H Duan, M Cao, W Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org

We introduce InternLM-XComposer2, a cutting-edge vision-language model excelling in free-form
text-image composition and comprehension. This model goes beyond conventional …

Cited by 317
