User profiles for Songyang Zhang
Make-A-Video: Text-to-video generation without text-video data
We propose Make-A-Video -- an approach for directly translating the tremendous recent
progress in Text-to-Image (T2I) generation to Text-to-Video (T2V). Our intuition is simple: learn …
Distribution alignment: A unified framework for long-tail visual recognition
Despite the success of the deep neural networks, it remains challenging to effectively build a
system for long-tail visual recognition tasks. To address this problem, we first investigate the …
Suppression of mitochondrial ROS by prohibitin drives glioblastoma progression and therapeutic resistance
H Huang, S Zhang, Y Li, Z Liu, L Mi, Y Cai… - Nature …, 2021 - nature.com
Low levels of reactive oxygen species (ROS) are crucial for maintaining cancer stem cells (CSCs)
and their ability to resist therapy, but the ROS regulatory mechanisms in CSCs …
On geometric features for skeleton-based action recognition using multilayer LSTM networks
RNN-based approaches have achieved outstanding performance on action recognition with
skeleton inputs. Currently these methods limit their inputs to coordinates of joints and …
Expanding language-image pretrained models for general video recognition
Contrastive language-image pretraining has shown great success in learning visual-textual
joint representation from web-scale data, demonstrating remarkable “zero-shot” …
Gut microbiota and intestinal FXR mediate the clinical benefits of metformin
The anti-hyperglycemic effect of metformin is believed to be caused by its direct action on
signaling processes in hepatocytes, leading to lower hepatic gluconeogenesis. Recently, …
MMBench: Is your multi-modal model an all-around player?
Large vision-language models (VLMs) have recently achieved remarkable progress,
exhibiting impressive multimodal perception and reasoning abilities. However, effectively …
Learning 2D temporal adjacent networks for moment localization with natural language
We address the problem of retrieving a specific moment from an untrimmed video by a
query sentence. This is a challenging problem because a target moment may take place in …
Part-aware prototype network for few-shot semantic segmentation
Few-shot semantic segmentation aims to learn to segment new object classes with only a
few annotated examples, which has a wide range of real-world applications. Most existing …
InternLM-XComposer2: Mastering free-form text-image composition and comprehension in vision-language large model
We introduce InternLM-XComposer2, a cutting-edge vision-language model excelling in free-form
text-image composition and comprehension. This model goes beyond conventional …