Stars
A Lightweight Face Recognition and Facial Attribute Analysis (Age, Gender, Emotion and Race) Library for Python
GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks
VLM driven tool that processes surveillance videos, extracts frames, and generates insightful annotations using a fine-tuned Florence-2 Vision-Language Model. Includes a Gradio-based interface for …
Surveillance Perspective Human Action Recognition Dataset: 7759 Videos from 14 Action Classes, aggregated from multiple sources, all cropped spatio-temporally and filmed from a surveillance-camera …
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek-R1, Qwen3, Gemma 3, TTS 2x faster with 70% less VRAM.
Recipes for shrinking, optimizing, and customizing cutting-edge vision models. 💜
Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model
GPT4V-level open-source multi-modal model based on Llama3-8B
VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)
✨✨Latest Advances on Multimodal Large Language Models
VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.
Analyze videos using LLMs, Computer Vision and Automatic Speech Recognition
Python Computer Vision & Video Analytics Framework With Batteries Included
⛹️ Pytorch ReID: A tiny, friendly, strong PyTorch implementation of a person re-id / vehicle re-id baseline. Tutorial 👉 https://github.com/layumi/Person_reID_baseline_pytorch/tree/master/tutorial
Ready-to-use SRT / WebRTC / RTSP / RTMP / LL-HLS / MPEG-TS / RTP media server and media proxy that lets you read, publish, proxy, record, and play back video and audio streams.
A lightweight web application for remotely viewing images from a remote computer through a web browser. 🖼️
Implementation of the paper "Vision Language Model for Interpretable and Fine-grained Detection of Safety Compliance in Diverse Workplaces"
Amazon Kinesis Video Streams WebRTC SDK is for developers to install and customize real-time communication between devices and enable secure streaming of video and audio to Kinesis Video Streams.
superglue (YC W25) builds integrations and tools from natural language. Get production-grade tools for long tail and enterprise systems.
[NeurIPS 2023] Global Structure-Aware Diffusion Process for Low-Light Image Enhancement
DINO-X: The World's Top-Performing Vision Model for Open-World Object Detection and Understanding
🤖 Autonomous agent framework for Elixir. Built for distributed, autonomous behavior and dynamic workflows.
Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards.
Implementation of the paper "Follow the Rules: Reasoning for Video Anomaly Detection with Large Language Model"
🤖 Chat with your SQL database 📊. Accurate Text-to-SQL Generation via LLMs using RAG 🔄.