A web scraper using Crawl4AI to find real estate developers who play golf, with contact information extraction (email, phone) and a Streamlit dashboard for data visualization. This project implements a FAISS-based vector knowledge graph for storing and querying entities and relationships.

Install the package:

    pip install -e .

Run the scraper:

    python -m construction_scraper scrape

Launch the dashboard:

    python -m construction_scraper dashboard

Project structure:

- construction_scraper/: Main package
  - core/: Core data models and structures
  - knowledge_graph/: FAISS-based vector knowledge graph implementation
  - scrapers/: Web scrapers for collecting data
  - utils/: Utility functions and helpers
  - web/: Web dashboard for data visualization
- tests/: Test files
- docs/: Documentation
- scripts/: Utility scripts
- data/: Data directory (created automatically)

Advanced Data Collection:
- Uses Crawl4AI for efficient, LLM-friendly web crawling
- Automatically follows relevant links to discover more information
- Identifies real estate developers with golf connections
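
Crawl4AI handles fetching and rendering. The separate decision about which discovered links are worth following can be illustrated on its own. The following is a minimal, standard-library sketch that assumes relevance is judged by keywords in a link's URL or anchor text. `RELEVANT_KEYWORDS`, `LinkCollector`, and `relevant_links` are hypothetical names. They are not part of Crawl4AI or of this package, and the project's actual criteria may differ.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical keyword list; the real relevance criteria are an assumption here.
RELEVANT_KEYWORDS = {"golf", "developer", "development", "real estate", "about", "team", "contact"}


class LinkCollector(HTMLParser):
    """Collects (href, anchor text) pairs from <a> tags in an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[tuple[str, str]] = []
        self._href: str | None = None
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only buffer text while inside an <a> that has an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())  # collapse whitespace
            self.links.append((self._href, text))
            self._href = None


def relevant_links(base_url: str, html: str) -> list[str]:
    """Return de-duplicated absolute URLs whose href or anchor text mentions a keyword."""
    parser = LinkCollector()
    parser.feed(html)
    found: list[str] = []
    for href, text in parser.links:
        haystack = f"{href} {text}".lower()
        if any(keyword in haystack for keyword in RELEVANT_KEYWORDS):
            url = urljoin(base_url, href)  # resolve relative links like /about
            if url not in found:
                found.append(url)
    return found
```

For example, on https://example.com/ a link `<a href="/golf-events">Golf Events</a>` is kept and `<a href="/privacy">Privacy</a>` is dropped. Keyword matching is cheap but crude; it trades recall for a crawl that stays focused on developer and golf pages.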

Contact Information Extraction:
- Extracts email addresses using regex pattern matching
- Identifies phone numbers in various formats
- Links contacts to specific developer profiles
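
The regex step can be sketched as follows. This is a minimal example that assumes simple patterns for email addresses and North American phone numbers. The patterns and the `DeveloperProfile` record are hypothetical and are not the project's actual code. Robust parsing is considerably more involved: the patterns do not cover every valid address, and a library such as `phonenumbers` handles international formats.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Simplified email pattern; it does not cover every RFC 5322 address.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Optional "+1"/"1" prefix, then 3-3-4 digits separated by spaces, dots, or dashes,
# e.g. (555) 123-4567, 555.123.4567, +1 555 123 4567.
PHONE_RE = re.compile(r"(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")


@dataclass
class DeveloperProfile:
    """Hypothetical profile record that extracted contacts are linked to."""
    name: str
    source_url: str
    emails: list[str] = field(default_factory=list)
    phones: list[str] = field(default_factory=list)


def normalize_phone(raw: str) -> str:
    """Strip formatting and keep the last 10 digits so variants de-duplicate."""
    digits = re.sub(r"\D", "", raw)
    return digits[-10:]


def extract_contacts(profile: DeveloperProfile, text: str) -> DeveloperProfile:
    """Find emails and phone numbers in page text and attach them to the profile."""
    for email in EMAIL_RE.findall(text):
        normalized = email.lower()  # lower-casing is a de-duplication simplification
        if normalized not in profile.emails:
            profile.emails.append(normalized)
    for raw in PHONE_RE.findall(text):
        phone = normalize_phone(raw)
        if phone not in profile.phones:
            profile.phones.append(phone)
    return profile
```

Normalizing before de-duplication means "(555) 123-4567" and "555.123.4567" are stored once, as the same digit string.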

FAISS Vector Knowledge Graph:
- Stores entities and relationships in a structured knowledge graph
- Enables semantic similarity search using FAISS vector embeddings
- Maintains persistent data storage using SQLite
- Performs network analysis to identify key influencers
- Visualizes connections between developers, golf entities, and companies
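
The storage, search, and analysis ideas above can be sketched without external dependencies. This is a minimal example under several assumptions. FAISS is swapped out for a brute-force scan over L2-normalized vectors, which is the exact inner-product (cosine) search that `faiss.IndexFlatIP` performs. Embedding vectors are supplied by the caller instead of an embedding model. Degree centrality stands in for whatever network measure the project actually uses to identify influencers. `MiniKnowledgeGraph` and its methods are hypothetical.

```python
from __future__ import annotations

import json
import math
import sqlite3


class MiniKnowledgeGraph:
    """Hypothetical stand-in for a FAISS + SQLite knowledge graph.

    Entities and relationships persist in SQLite; similarity search is a
    brute-force scan over normalized vectors (exact cosine similarity).
    """

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS entities ("
            "name TEXT PRIMARY KEY, kind TEXT NOT NULL, vector TEXT NOT NULL)"
        )
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS relations ("
            "source TEXT NOT NULL, relation TEXT NOT NULL, target TEXT NOT NULL)"
        )

    @staticmethod
    def _normalize(vector: list[float]) -> list[float]:
        norm = math.sqrt(sum(x * x for x in vector)) or 1.0
        return [x / norm for x in vector]

    def add_entity(self, name: str, kind: str, vector: list[float]) -> None:
        # Assumption: the caller provides the embedding; it is stored as JSON.
        self.db.execute(
            "INSERT OR REPLACE INTO entities VALUES (?, ?, ?)",
            (name, kind, json.dumps(self._normalize(vector))),
        )
        self.db.commit()

    def add_relation(self, source: str, relation: str, target: str) -> None:
        self.db.execute("INSERT INTO relations VALUES (?, ?, ?)", (source, relation, target))
        self.db.commit()

    def search(self, query: list[float], k: int = 3) -> list[tuple[str, float]]:
        """Return the k entities with the highest cosine similarity to query."""
        q = self._normalize(query)
        scored = [
            (name, sum(a * b for a, b in zip(q, json.loads(vec))))
            for name, vec in self.db.execute("SELECT name, vector FROM entities")
        ]
        return sorted(scored, key=lambda item: item[1], reverse=True)[:k]

    def degree_centrality(self) -> dict[str, float]:
        """Fraction of the other entities each entity is directly connected to."""
        neighbors: dict[str, set[str]] = {
            row[0]: set() for row in self.db.execute("SELECT name FROM entities")
        }
        for source, target in self.db.execute("SELECT source, target FROM relations"):
            neighbors.setdefault(source, set()).add(target)
            neighbors.setdefault(target, set()).add(source)
        n = len(neighbors)
        return {name: (len(adj) / (n - 1) if n > 1 else 0.0) for name, adj in neighbors.items()}
```

A developer linked to both a company and a golf club scores higher than either entity alone. Passing a file path instead of ":memory:" keeps the graph across runs.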

See the docs/ directory for detailed documentation, including:

- DEVELOPMENT.md: Detailed architecture and implementation notes

License: MIT