vlm

Here are 88 public repositories matching this topic...

sgl-project / sglang

SGLang is a fast serving framework for large language models and vision language models.

cuda inference pytorch transformer moe llama vlm llm llm-serving llava llama2 llama3 llama3-1

Updated Nov 25, 2024
Python

NexaAI / nexa-sdk

Nexa SDK is a comprehensive toolkit for supporting ONNX and GGML models. It supports text generation, image generation, vision-language models (VLM), automatic speech recognition (ASR), and text-to-speech (TTS) capabilities.

sdk transformers tts language-model whisper asr vlm sdk-python edge-computing on-device-ml on-device-ai llm stable-diffusion

Updated Nov 25, 2024
Python

BAAI-Agents / Cradle

The Cradle framework is a first attempt at General Computer Control (GCC). Cradle helps agents ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curation, in a standardized general environment with minimal requirements.

ai gcc multimodality vlm cradle computer-control lmm grounding ai-agent large-language-models llm generative-ai vision-language-model ai-agents-framework general-computer-control personoid foundation-agent

Updated Nov 7, 2024
Python

QiuYannnn / Local-File-Organizer

An AI-powered file management tool that ensures privacy by organizing local texts and images. Using Llama3.2 3B and Llava v1.6 models with the Nexa SDK, it intuitively scans, restructures, and organizes files for quick, seamless access and easy retrieval.

vlm file-organizer on-device-ai llm llama3

Updated Oct 21, 2024
Python

xlang-ai / OSWorld

[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

agent cli benchmark natural-language-processing gui reinforcement-learning artificial-intelligence code-generation language-model vlm rpa multimodal llm large-action-model

Updated Nov 25, 2024
Python

heshengtao / comfyui_LLM_party

LLM Agent Framework in ComfyUI. Includes Omost, GPT-sovits, ChatTTS, GOT-OCR2.0, and FLUX prompt nodes, plus access to Feishu and Discord, and adapts to all LLMs with OpenAI/Gemini-like interfaces, such as o1, ollama, grok, qwen, GLM, deepseek, moonshot, and doubao. Adapted to local LLMs, VLMs, and GGUF models such as llama-3.2. Linkage neo4j KG, graphRAG / RAG / html 2 img

Updated Nov 24, 2024
Python

BAAI-DCAI / Bunny

A family of lightweight multimodal models.

english chinese vlm gpt-4 chatgpt mllm multimodal-large-language-models

Updated Nov 18, 2024
Python

mbzuai-oryx / GeoChat

[CVPR 2024 🔥] GeoChat, the first grounded Large Vision Language Model for Remote Sensing

remote-sensing vlm

Updated Jul 25, 2024
Python

gokayfem / ComfyUI_VLM_nodes

Custom ComfyUI nodes for Vision Language Models, Large Language Models, Image to Music, Text to Music, Consistent and Random Creative Prompt Generation

image-captioning nodes vlm custom-nodes img2text llm mllm llava comfyui siglip phi15 joytag img2sfx

Updated Nov 6, 2024
Python

niuzaisheng / ScreenAgent

ScreenAgent: A Computer Control Agent Driven by Visual Language Large Model (IJCAI-24)

agent ai vlm llm

Updated Nov 25, 2024
Python

modelscope / evalscope

A streamlined and customizable framework for efficient large model evaluation and performance benchmarking

performance evaluation vlm rag llm

Updated Nov 25, 2024
Python

baaivision / EVE

[NeurIPS'24 Spotlight] EVE: Encoder-Free Vision-Language Models

clip vlm instruction-following large-language-models llm mllm multimodal-large-language-models vision-language-models encoder-free-vlm

Updated Oct 2, 2024
Python

TIGER-AI-Lab / Mantis

Official code for Paper "Mantis: Multi-Image Instruction Tuning" (TMLR2024)

language video vision mantis vlm multimodal lmm fuyu mllm llava-llama3 multi-image-understanding

Updated Nov 20, 2024
Python

camUrban / PteraSoftware

Ptera Software is a fast, easy-to-use, and open-source software package for analyzing flapping-wing flight.

Updated Nov 15, 2024
Python

mbodiai / embodied-agents

Seamlessly integrate state-of-the-art transformer models into robotics stacks

robotics artificial-intelligence transformer agents diffusion vlm multimodal embodied embodied-agent large-language-models llm generative-ai vision-language-model embodied-agents mbodi mbodiai

Updated Nov 19, 2024
Python

RobotecAI / rai

RAI is a multi-vendor agent framework for robotics, utilizing Langchain and ROS 2 tools to perform complex actions, defined scenarios, free interface execution, log summaries, voice interaction and more.

ai robotics ros2 vlm multimodal embodied-artificial-intelligence embodied-agent embodied-ai o3de llm generative-ai ai-agents-framework embodied-agents robotec

Updated Nov 25, 2024
Python

LostXine / LLaRA

LLaRA: Large Language and Robotics Assistant

robotics behavioral-cloning vlm self-supervised-learning instruction-tuning llava

Updated Oct 2, 2024
Python

fpgaminer / joycaption

JoyCaption is an image captioning Visual Language Model (VLM) being built from the ground up as a free, open, and uncensored model for the community to use in training Diffusion models.

vlm captioning joycaption

Updated Oct 12, 2024
Python

wisdomikezogwo / quilt1m

[NeurIPS 2023 Oral] Quilt-1M: One Million Image-Text Pairs for Histopathology.

vlm medical-dataset multimodal-datasets histopathology clip-model

Updated Jan 18, 2024
Python

baaivision / DenseFusion

DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception

vlm image-descriptions visual-perception mllm multimodal-large-language-models vision-language-models

Updated Sep 27, 2024
Python
