Zipstack/unstract: No-code LLM Platform to launch APIs and ETL Pipelines to structure unstructured documents

Unstract

The Data Layer for your Agentic Workflows—Automate Document-based workflows with close to 100% accuracy!

🤖 Prompt Studio

Prompt Studio is a purpose-built environment that supercharges your schema definition efforts. Compare outputs from different LLMs side-by-side and keep tabs on costs while you develop generic prompts that work across wide-ranging document variations. And when you're ready, launch extraction APIs with a single click.

🔌 Integrations that suit your environment

Once you've used Prompt Studio to define your schema, Unstract makes it easy to integrate into your existing workflows. Simply choose the integration type that best fits your environment:

| Integration Type | Description | Best For | Documentation |
|---|---|---|---|
| 🖥️ MCP Servers | Run Unstract as an MCP Server to provide structured data extraction to Agents or LLMs in your ecosystem. | Developers building Agentic/LLM apps/tools that speak MCP. | Unstract MCP Server Docs |
| 🌐 API Deployments | Turn any document into JSON with an API call. Deploy any Prompt Studio project as a REST API endpoint with a single click. | Teams needing programmatic access in apps, services, or custom tooling. | API Deployment Docs |
| ⚙️ ETL Pipelines | Embed Unstract directly into your ETL jobs to transform unstructured data before loading it into your warehouse / database. | Engineering and Data engineering teams that need to batch process documents into clean JSON. | ETL Pipelines Docs |
| 🧩 n8n Nodes | Use Unstract as ready-made nodes in n8n workflows for drag-and-drop automation. | Low-code users and ops teams automating workflows. | Unstract n8n Nodes Docs |

☁️ Getting Started (Cloud / Enterprise)

The easy-peasy way to try Unstract is to sign up for a 14-day free trial. Give Unstract a spin now!

Unstract Cloud also comes with some really awesome features that give serious accuracy boosts to agentic/LLM-powered document-centric workflows in the enterprise.

| Feature | Description | Documentation |
|---|---|---|
| 🧪 LLMChallenge | Uses two Large Language Models to ensure trustworthy output. You either get the right response or no response at all. | Docs |
| SinglePass Extraction | Reduces LLM token usage by up to 8x, dramatically cutting costs. | Docs |
| 📉 SummarizedExtraction | Reduces LLM token usage by up to 6x, saving costs while keeping accuracy. | Docs |
| 👀 Human-In-The-Loop | Side-by-side comparison of extracted value and source document, with highlighting for human review and tweaking. | Docs |
| 🔐 SSO Support | Enterprise-ready authentication options for seamless onboarding and off-boarding. | Docs |
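
To picture the "right response or no response at all" behavior described for LLMChallenge, here is a minimal Python sketch of a simple agreement check between two models. This is not Unstract's implementation, which the README does not describe; the function `agreed_answer` and its `extractor`, `challenger`, and `normalize` parameters are hypothetical names used only for illustration.

```python
from typing import Callable, Optional


def agreed_answer(
    prompt: str,
    extractor: Callable[[str], str],
    challenger: Callable[[str], str],
    normalize: Callable[[str], str] = lambda s: s.strip().lower(),
) -> Optional[str]:
    """Return the extractor's answer only if the challenger agrees with it.

    `extractor` and `challenger` stand in for calls to two different LLMs
    (hypothetical callables, not part of any Unstract API).
    """
    first = extractor(prompt)
    second = challenger(prompt)
    # Compare after light normalization so whitespace/case do not matter.
    if normalize(first) == normalize(second):
        return first
    # Disagreement: return nothing rather than a possibly wrong value.
    return None
```

Returning `None` on disagreement lets a caller route the field to human review instead of storing an unverified value.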

⏩ Quick Start Guide

Unstract comes well documented. You can get introduced to the basics of Unstract and learn how to connect various systems like LLMs, Vector Databases, Embedding Models and Text Extractors to it. The easiest way to get your feet wet is to go through our Quick Start Guide, where you do some prompt engineering in Prompt Studio and launch an API to structure varied credit card statements!

🚀 Getting started (self-hosted)

System Requirements

  • 8GB RAM (minimum)

Prerequisites

  • Linux or macOS (Intel or M-series)
  • Docker
  • Docker Compose (if you need to install it separately)
  • Git
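
As a rough preflight check for the prerequisites above, the following Python sketch reports anything that looks missing. It is an illustration only and not part of Unstract: the helper `missing_prerequisites` is a hypothetical name, and it only checks the operating system and whether `docker` and `git` are on the PATH. It does not verify Docker Compose or the 8GB RAM minimum.

```python
import platform
import shutil

REQUIRED_TOOLS = ("docker", "git")
# platform.system() reports "Darwin" on macOS.
SUPPORTED_SYSTEMS = ("Linux", "Darwin")


def missing_prerequisites(which=shutil.which, system=None):
    """Return a list of problems found; an empty list means all checks passed."""
    system = system or platform.system()
    problems = []
    if system not in SUPPORTED_SYSTEMS:
        problems.append(f"unsupported OS: {system}")
    for tool in REQUIRED_TOOLS:
        # shutil.which returns None when the executable is not on PATH.
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    issues = missing_prerequisites()
    print("OK" if not issues else "\n".join(issues))
```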

Next, either download a release or clone this repo and do the following:

./run-platform.sh
✅ Now visit http://frontend.unstract.localhost in your browser
✅ Use username and password unstract to log in

That's all there is to it!

Follow these steps to change the default username and password. See user guide for more details on managing the platform.

Another really quick way to experience Unstract is by signing up for our hosted version. It comes with a 14-day free trial!

📄 Supported File Types

Unstract supports a wide range of file formats for document processing:

| Category | Format | Description |
|---|---|---|
| Word Processing | DOCX | Microsoft Word Open XML |
| | DOC | Microsoft Word |
| | ODT | OpenDocument Text |
| Presentation | PPTX | Microsoft PowerPoint Open XML |
| | PPT | Microsoft PowerPoint |
| | ODP | OpenDocument Presentation |
| Spreadsheet | XLSX | Microsoft Excel Open XML |
| | XLS | Microsoft Excel |
| | ODS | OpenDocument Spreadsheet |
| Document & Text | PDF | Portable Document Format |
| | TXT | Plain Text |
| | CSV | Comma-Separated Values |
| | JSON | JavaScript Object Notation |
| Image | BMP | Bitmap Image |
| | GIF | Graphics Interchange Format |
| | JPEG | Joint Photographic Experts Group |
| | JPG | Joint Photographic Experts Group |
| | PNG | Portable Network Graphics |
| | TIF | Tagged Image File Format |
| | TIFF | Tagged Image File Format |
| | WEBP | Web Picture Format |
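
If you want to filter files before sending them to Unstract, a small Python sketch like the one below can check extensions against the formats listed above. The helper `is_supported` is a hypothetical name, and the check looks at the file extension only; how Unstract itself detects file types is not stated here.

```python
from pathlib import Path

# Extensions taken from the supported file types table above.
SUPPORTED_EXTENSIONS = {
    "docx", "doc", "odt",
    "pptx", "ppt", "odp",
    "xlsx", "xls", "ods",
    "pdf", "txt", "csv", "json",
    "bmp", "gif", "jpeg", "jpg", "png", "tif", "tiff", "webp",
}


def is_supported(filename: str) -> bool:
    """Return True if the file's extension appears in the supported list."""
    suffix = Path(filename).suffix.lower().lstrip(".")
    return suffix in SUPPORTED_EXTENSIONS
```

For example, `is_supported("statement.PDF")` is true because the comparison is case-insensitive, while a file with no extension is rejected.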

🤝 Ecosystem support

LLM Providers

Provider Status
OpenAI ✅ Working
Google VertexAI, Gemini Pro ✅ Working
Azure OpenAI ✅ Working
Anthropic ✅ Working
Ollama ✅ Working
Bedrock ✅ Working
Google PaLM ✅ Working
Anyscale ✅ Working
Mistral AI ✅ Working

Vector Databases

Provider Status
Qdrant ✅ Working
Weaviate ✅ Working
Pinecone ✅ Working
PostgreSQL ✅ Working
Milvus ✅ Working

Embeddings

Provider Status
OpenAI ✅ Working
Azure OpenAI ✅ Working
Google PaLM ✅ Working
Ollama ✅ Working
VertexAI ✅ Working
Bedrock ✅ Working

Text Extractors

Provider Status
Unstract LLMWhisperer V2 ✅ Working
Unstructured.io Community ✅ Working
Unstructured.io Enterprise ✅ Working
LlamaIndex Parse ✅ Working

ETL Sources

Provider Status
AWS S3 ✅ Working
MinIO ✅ Working
Google Cloud Storage ✅ Working
Azure Cloud Storage ✅ Working
Google Drive ✅ Working
Dropbox ✅ Working
SFTP ✅ Working

ETL Destinations

Provider Status
Snowflake ✅ Working
Amazon Redshift ✅ Working
Google BigQuery ✅ Working
PostgreSQL ✅ Working
MySQL ✅ Working
MariaDB ✅ Working
Microsoft SQL Server ✅ Working
Oracle ✅ Working

🙌 Contributing

Contributions are welcome! Please see CONTRIBUTING.md for further details to get started easily.

👋 Join the LLM-powered automation community

🚨 Backup encryption key

Copy the value of the ENCRYPTION_KEY setting from either the backend/.env or the platform-service/.env file to a secure location.

Adapter credentials are encrypted by the platform using this key. Its loss or change will make all existing adapters inaccessible!
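
One possible way to script this backup is sketched below in Python. It assumes the .env file uses plain `KEY=VALUE` lines; the helper names `read_env_value` and `backup_encryption_key` and the backup destination are hypothetical, so adapt them to your own secret-storage practice.

```python
from pathlib import Path


def read_env_value(env_path, key):
    """Return the value of `key` from a KEY=VALUE style .env file, or None."""
    for raw in Path(env_path).read_text().splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values containing "=" stay intact.
        name, _, value = line.partition("=")
        if name.strip() == key:
            return value.strip().strip('"').strip("'")
    return None


def backup_encryption_key(env_path, backup_path):
    """Copy ENCRYPTION_KEY from the .env file into a separate backup file."""
    value = read_env_value(env_path, "ENCRYPTION_KEY")
    if not value:
        raise ValueError(f"ENCRYPTION_KEY not found in {env_path}")
    backup = Path(backup_path)
    backup.write_text(value + "\n")
    # Restrict the backup to owner read/write.
    backup.chmod(0o600)
    return backup
```

A call such as `backup_encryption_key("backend/.env", "/path/to/secure/location/unstract_encryption_key.txt")` would write the key to the given file; the destination path here is a placeholder.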

📊 A note on analytics

In full disclosure, Unstract integrates Posthog to track usage analytics. You can inspect the relevant code here; we collect the minimum possible metrics. Posthog can be disabled if desired by setting REACT_APP_ENABLE_POSTHOG to false in the frontend's .env file.
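
The setting can be edited by hand; if you prefer to script it, the Python sketch below updates or appends a key in a `KEY=VALUE` style .env file. The helper `set_env_value` is a hypothetical name, and the path `frontend/.env` in the usage note is an assumption about where the frontend's .env file lives in your checkout.

```python
from pathlib import Path


def set_env_value(env_path, key, value):
    """Set key=value in a .env file, replacing an existing entry or appending."""
    path = Path(env_path)
    lines = path.read_text().splitlines() if path.exists() else []
    updated = False
    for i, line in enumerate(lines):
        # Compare the text before the first "=" with the key we want to set.
        if line.partition("=")[0].strip() == key:
            lines[i] = f"{key}={value}"
            updated = True
    if not updated:
        lines.append(f"{key}={value}")
    path.write_text("\n".join(lines) + "\n")
```

For example, `set_env_value("frontend/.env", "REACT_APP_ENABLE_POSTHOG", "false")`. The frontend may need to be restarted or rebuilt for the change to take effect.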
