Famous Vision Language Models and Their Architectures
A comprehensive collection and survey of vision-language model papers and their GitHub repositories. Continuously updated.
A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, llama-3.2-vision, qwen-vl, qwen2-vl, phi3-v, and others.
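As a rough illustration of what such finetuning typically involves (not this repository's actual entry point), the sketch below attaches LoRA adapters to an HF-converted llava-1.5 checkpoint with transformers and peft; the model id, target modules, and hyperparameters are assumptions.

```python
# Hypothetical LoRA finetuning setup for llava-1.5; the actual codebase's
# scripts, flags, and defaults may differ.
import torch
from transformers import LlavaForConditionalGeneration, AutoProcessor
from peft import LoraConfig, get_peft_model

model_id = "llava-hf/llava-1.5-7b-hf"  # assumption: HF-converted checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16
)

# Attach LoRA adapters to the attention projections of the language model.
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # assumption: common choice for LLaVA-style models
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only the adapter weights are trainable
```

From here, the wrapped model can be passed to a standard Trainer loop over image-text conversation data.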
Reinforcement Learning of Vision Language Models with Self Visual Perception Reward
ComfyUI-QwenVL is a custom node that integrates the Qwen-VL series, including Qwen2.5-VL and the latest Qwen3-VL, to enable advanced multimodal AI for text generation, image understanding, and video analysis.
Mark web pages for use with vision-language models
🛠️ Build and train multimodal models easily with LLaVA-OneVision 1.5, an open framework designed for seamless integration of vision and language tasks.
Qwen-VL base model for use with Autodistill.
Creates text from video and audio using Qwen-VL and Whisper.
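A rough sketch of that idea (not the repository's actual code): Whisper transcribes the speech track, and a Qwen-VL caption of sampled frames, for example via the Qwen2.5-VL sketch under the next entry, supplies the visual side; the two are then merged. The file name and model size are assumptions.

```python
# Audio half of a video-to-text pipeline using openai-whisper.
import whisper  # pip install openai-whisper; requires ffmpeg on PATH

def transcribe(video_path: str) -> str:
    # Whisper accepts video files directly and extracts the audio via ffmpeg.
    model = whisper.load_model("base")
    return model.transcribe(video_path)["text"]

if __name__ == "__main__":
    speech = transcribe("talk.mp4")       # assumption: any local video with an audio track
    visuals = "<Qwen-VL frame captions>"  # placeholder for the vision half
    print(f"Speech: {speech}\nVisuals: {visuals}")
```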
Generate vivid, human-like captions for portrait images using the Qwen2.5-VL-7B model. Outputs dense descriptions covering emotion, posture, clothing, and environment.
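One way such captioning can be wired up, sketched with Qwen2.5-VL-7B-Instruct through Hugging Face transformers (a recent version with Qwen2.5-VL support is assumed); the image path and prompt wording are assumptions, and the repository's own prompt and pipeline may differ.

```python
# Hedged sketch of dense portrait captioning with Qwen2.5-VL-7B-Instruct.
import torch
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info  # pip install qwen-vl-utils

model_id = "Qwen/Qwen2.5-VL-7B-Instruct"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "portrait.jpg"},  # assumption: a local portrait photo
        {"type": "text", "text": "Describe the person's emotion, posture, clothing, and environment in detail."},
    ],
}]

# Build the chat prompt and collect the image inputs referenced in the messages.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
images, videos = process_vision_info(messages)
inputs = processor(text=[text], images=images, videos=videos, return_tensors="pt").to(model.device)

out = model.generate(**inputs, max_new_tokens=256)
# Strip the prompt tokens before decoding so only the generated caption remains.
caption = processor.batch_decode(out[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)[0]
print(caption)
```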
A computer vision system for automated analysis of index cards from a collection of coin forgeries using Qwen2.5-VL vision-language model. Developed for the imagines nummorum project.