vlm
Here are 88 public repositories matching this topic...
Nexa SDK is a comprehensive toolkit for ONNX and GGML models. It supports text generation, image generation, vision-language models (VLM), automatic speech recognition (ASR), and text-to-speech (TTS).
Updated Nov 25, 2024 - Python
The Cradle framework is a first attempt at General Computer Control (GCC). Cradle supports agents to ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curation, in a standardized general environment with minimal requirements.
Updated Nov 7, 2024 - Python
An AI-powered file management tool that ensures privacy by organizing local texts and images. Using Llama3.2 3B and Llava v1.6 models with the Nexa SDK, it intuitively scans, restructures, and organizes files for quick, seamless access and easy retrieval.
Updated Oct 21, 2024 - Python
[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Updated Nov 25, 2024 - Python
LLM Agent Framework in ComfyUI. Includes Omost, GPT-sovits, ChatTTS, GOT-OCR2.0, and FLUX prompt nodes, plus access to Feishu and Discord, and adapts to all LLMs with interfaces similar to OpenAI/Gemini, such as o1, ollama, grok, qwen, GLM, deepseek, moonshot, and doubao. Adapted to local LLMs, VLMs, and GGUF models such as llama-3.2. Linkage with neo4j KG, graphRAG / RAG / html 2 img.
Updated Nov 24, 2024 - Python
[CVPR 2024 🔥] GeoChat, the first grounded Large Vision Language Model for Remote Sensing
Updated Jul 25, 2024 - Python
A streamlined and customizable framework for efficient large model evaluation and performance benchmarking
Updated Nov 25, 2024 - Python
[NeurIPS'24 Spotlight] EVE: Encoder-Free Vision-Language Models
Updated Oct 2, 2024 - Python
Official code for the paper "Mantis: Multi-Image Instruction Tuning" (TMLR 2024)
Updated Nov 20, 2024 - Python
Ptera Software is a fast, easy-to-use, and open-source software package for analyzing flapping-wing flight.
Updated Nov 15, 2024 - Python
Seamlessly integrate state-of-the-art transformer models into robotics stacks
Updated Nov 19, 2024 - Python
RAI is a multi-vendor agent framework for robotics, utilizing Langchain and ROS 2 tools to perform complex actions, defined scenarios, free interface execution, log summaries, voice interaction and more.
Updated Nov 25, 2024 - Python
LLaRA: Large Language and Robotics Assistant
Updated Oct 2, 2024 - Python
JoyCaption is an image captioning Visual Language Model (VLM) being built from the ground up as a free, open, and uncensored model for the community to use in training Diffusion models.
Updated Oct 12, 2024 - Python
[NeurIPS 2023 Oral] Quilt-1M: One Million Image-Text Pairs for Histopathology.
Updated Jan 18, 2024 - Python
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
Updated Sep 27, 2024 - Python