A Flask-based web application for querying document embeddings using semantic search. The application provides a modern UI built with Jinja templates and Bootstrap for intuitive document search.
## Features

- 🔍 Semantic Search: Query documents using natural language
- 📊 Similarity Scoring: Results ranked by semantic similarity
- 📝 Query History: Track all user queries with timestamps
- 🔄 Live Ingestion: Trigger document processing from the UI
- 🎨 Modern UI: Responsive design with Bootstrap and Font Awesome
- 🐳 Docker Ready: Complete containerized setup
## Architecture

- Backend: Flask application with PostgreSQL + pgvector
- Frontend: Jinja2 templates with Bootstrap 5
- Embeddings: OpenAI text-embedding-ada-002
- Database: PostgreSQL with vector similarity search
- Chunking: RecursiveCharacterTextSplitter with 20% overlap
## Prerequisites

- Docker and Docker Compose
- OpenAI API key
## Quick Start

1. Set your OpenAI API key:

   ```bash
   export OPENAI_API_KEY="your-api-key-here"
   ```

2. Add PDF documents to the `./docs` directory.

3. Start the services:

   ```bash
   docker-compose up -d
   ```

4. Access the application:
   - Query Interface: http://localhost:50505
   - Query History: http://localhost:50505/history
## Testing

Run the test script to verify functionality:

```bash
python test_query_app.py
```
## API Endpoints

### Query endpoint (POST)

Submit a semantic search query.

Request:

```json
{
  "query": "What is machine learning?"
}
```

Response:

```json
{
  "query_id": 123,
  "query_text": "What is machine learning?",
  "results": [
    {
      "doc_id": "document_name",
      "chunk_index": 0,
      "content": "Machine learning is...",
      "metadata": {
        "source": "/path/to/document.pdf",
        "page": 1,
        "doc_id": "document_name",
        "chunk_index": 0
      },
      "similarity_score": 0.85
    }
  ],
  "total_results": 5
}
```
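The response above is plain JSON and can be consumed directly. A minimal sketch of picking the top match from a parsed response (the `best_chunk` helper and the truncated sample data are illustrative, not part of the app):

```python
def best_chunk(response):
    """Return the result with the highest similarity_score, or None."""
    results = response.get("results", [])
    if not results:
        return None
    return max(results, key=lambda r: r["similarity_score"])

# Sample data shaped like the response documented above (abbreviated).
sample = {
    "query_id": 123,
    "query_text": "What is machine learning?",
    "results": [
        {"doc_id": "document_name", "chunk_index": 0,
         "content": "Machine learning is...", "similarity_score": 0.85},
        {"doc_id": "document_name", "chunk_index": 3,
         "content": "Another chunk...", "similarity_score": 0.42},
    ],
    "total_results": 2,
}

print(best_chunk(sample)["content"])  # highest-scoring chunk's text
```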
### Ingestion endpoint (POST)

Trigger the document ingestion process.

Response:

```json
{
  "message": "Successfully processed 2 files: doc1, doc2\nTotal chunks stored: 45"
}
```
### History endpoint (GET)

Retrieve query history.

Response: HTML page with the query history.
## Database Schema

### Documents table

- `id`: Primary key
- `doc_id`: Document identifier
- `chunk_index`: Chunk position in document
- `content`: Text content
- `metadata`: JSON metadata
- `embedding`: Vector embedding (1536 dimensions, matching text-embedding-ada-002)

### User queries table

- `id`: Primary key
- `query_text`: User query text
- `query_embedding`: Query vector embedding
- `created_at`: Timestamp
- `user_ip`: User IP address
- `session_id`: Session identifier
## Configuration

### Environment variables

- `OPENAI_API_KEY`: Required for embedding generation
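Since a missing key is the most common setup error (see Troubleshooting below), it is worth checking for it early. A minimal fail-fast sketch; the `require_openai_key` helper is illustrative, and query_app.py may handle this differently:

```python
import os

def require_openai_key():
    """Return the OpenAI API key, or raise a clear error if it is unset."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY not set")
    return key

# Call once at startup so embedding requests never fail late:
# api_key = require_openai_key()
```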
### Database connection

- Host: `postgres` (Docker service name)
- Port: `5432`
- Database: `crewai_db`
- User: `postgres`
- Password: `postgres`
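The settings above can be assembled into a standard PostgreSQL connection URL. A sketch, assuming the docker-compose defaults listed in this README (the `database_url` helper is ours):

```python
# Defaults from the docker-compose setup described above.
DB_SETTINGS = {
    "host": "postgres",   # Docker service name; use "localhost" outside Docker
    "port": 5432,
    "dbname": "crewai_db",
    "user": "postgres",
    "password": "postgres",
}

def database_url(cfg):
    """Build a psycopg/SQLAlchemy-style PostgreSQL connection URL."""
    return "postgresql://{user}:{password}@{host}:{port}/{dbname}".format(**cfg)

print(database_url(DB_SETTINGS))
# postgresql://postgres:postgres@postgres:5432/crewai_db
```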
### Chunking settings

- Chunk Size: 1000 characters
- Chunk Overlap: 200 characters (20% overlap)
- Splitter: RecursiveCharacterTextSplitter
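The overlap scheme can be illustrated with a simplified splitter. This is only a sketch of the 1000/200 windowing idea; the real app uses LangChain's RecursiveCharacterTextSplitter, which additionally splits on separators like paragraphs and sentences:

```python
def split_with_overlap(text, chunk_size=1000, overlap=200):
    """Slice text into fixed-size chunks, each sharing `overlap` chars
    with the previous chunk (illustration only)."""
    step = chunk_size - overlap  # each chunk starts (chunk_size - overlap) later
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - overlap, 1), step)]

# With toy sizes: 4-char chunks, 2-char overlap.
print(split_with_overlap("abcdefghij", chunk_size=4, overlap=2))
```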
## Local Development

1. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

2. Start PostgreSQL:

   ```bash
   docker-compose up postgres -d
   ```

3. Run the Flask app:

   ```bash
   python query_app.py
   ```
## Project Structure

```
tender/
├── query_app.py           # Main Flask application
├── templates/
│   ├── index.html         # Main query interface
│   └── history.html       # Query history page
├── migrations/
│   ├── init_01.sql        # Documents table
│   └── init_02.sql        # User queries table
├── docker-compose.yml     # Docker services
├── Dockerfile.query       # Query app Dockerfile
├── requirements.txt       # Python dependencies
└── test_query_app.py      # Test script
```
## Troubleshooting

1. "OPENAI_API_KEY not set"
   - Ensure the environment variable is set
   - Check that docker-compose.yml includes the environment variable

2. "No relevant documents found"
   - Run ingestion first using the "Trigger Ingestion" button
   - Ensure PDF files are in the `./docs` directory

3. Database connection errors
   - Verify the PostgreSQL container is running: `docker-compose ps`
   - Check the database logs: `docker-compose logs postgres`

4. Port conflicts
   - Ensure port 50505 is available
   - Modify the port mapping in docker-compose.yml if needed
## Logs

```bash
# View all logs
docker-compose logs

# View specific service logs
docker-compose logs query-app
docker-compose logs postgres
```
## Performance Notes

- Vector similarity search uses cosine distance
- Results are limited to the top 5 matches by default
- Embeddings are cached in the database
- Query history is limited to the 50 most recent queries
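For reference, the `similarity_score` in the query response relates to cosine distance as `similarity = 1 - distance` (pgvector's `<=>` operator returns cosine distance). A pure-Python sketch of the underlying computation, using short vectors for illustration rather than the app's 1536-dimensional embeddings:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Identical direction -> 1.0; orthogonal vectors -> 0.0.
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```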
## Security Notes

- User IP addresses are logged for analytics
- Session IDs are generated for tracking
- No authentication is implemented (add as needed)
- API endpoints are not rate-limited (consider adding)