# f*ck: Universal Columnar Data Merging Tool
f*ck is a Rust-based data merging engine that lets you combine, clean, and transform messy tabular data through an intuitive DSL and visual interface.

f*ck stands for "fields combined with columnar keys": the core concept of merging data fields across multiple sources using columnar key relationships and intelligent merge policies.
## Features

- **Smart Joins**: Dynamic column mapping between different data sources
- **Aggregation Policies**: Sum, Count, Average, Min, Max, FirstMatch
- **Primary Key Logic**: OR/AND logic for complex key relationships
- **Lazy Evaluation**: Powered by Polars for efficient processing
- **Incremental Computation**: Salsa-based caching for performance
- **Multi-Modal**: CLI, Daemon+RPC, and WASM support
- **Visual DSL**: JSON-based query language
## Installation

```bash
git clone https://github.com/your-repo/f-ck
cd f-ck
cargo build --release
```
## Quick Start

1. Prepare your data sources (CSV, TSV, XLSX, SQLite)
2. Create a query plan (JSON DSL)
3. Execute the merge
```bash
# Preview results
./target/release/f-ck --query query.json --output result.csv --preview

# Write to file
./target/release/f-ck --query query.json --output result.csv
```
## Example

**customers.csv**

```csv
id,name,email
1,John Doe,john@example.com
2,Jane Smith,jane@example.com
3,Bob Johnson,bob@example.com
```

**orders.csv**

```csv
customer_id,order_total,product
1,99.99,Widget A
2,149.50,Widget B
1,25.00,Widget C
```
**query.json**

```json
{
  "sources": [
    {
      "id": "customers",
      "path": "customers.csv",
      "format": "csv"
    },
    {
      "id": "orders",
      "path": "orders.csv",
      "format": "csv"
    }
  ],
  "destination_schema": [
    {"name": "customer_id", "data_type": "Int64"},
    {"name": "customer_name", "data_type": "String"},
    {"name": "email", "data_type": "String"},
    {"name": "total_spent", "data_type": "Float64"}
  ],
  "primary_keys": {
    "logic": "or",
    "keys": ["customer_id"]
  },
  "mappings": [
    {
      "destination_field": "customer_id",
      "policy": {"type": "firstMatch", "priority": ["customers"]},
      "source_fields": [
        {"id": "cust_id", "source_file_id": "customers", "column_name": "id"},
        {"id": "order_cust_id", "source_file_id": "orders", "column_name": "customer_id"}
      ]
    },
    {
      "destination_field": "customer_name",
      "policy": {"type": "firstMatch", "priority": ["customers"]},
      "source_fields": [
        {"id": "name", "source_file_id": "customers", "column_name": "name"}
      ]
    },
    {
      "destination_field": "email",
      "policy": {"type": "firstMatch", "priority": ["customers"]},
      "source_fields": [
        {"id": "email", "source_file_id": "customers", "column_name": "email"}
      ]
    },
    {
      "destination_field": "total_spent",
      "policy": {"type": "sum"},
      "source_fields": [
        {"id": "order_total", "source_file_id": "orders", "column_name": "order_total"}
      ]
    }
  ]
}
```
Running the merge produces **result.csv**:

```csv
customer_id,customer_name,email,total_spent
1,John Doe,john@example.com,124.99
2,Jane Smith,jane@example.com,149.50
3,Bob Johnson,bob@example.com,0.0
```
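The `primary_keys` block above uses `or` logic with a single key. It also accepts `and` logic for composite keys, where rows must agree on every listed key before they are merged. A minimal sketch, assuming a hypothetical second key column named `region` (not part of the example data):

```json
"primary_keys": {
  "logic": "and",
  "keys": ["customer_id", "region"]
}
```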
## Merge Policies

| Policy | Description | Use Case |
|---|---|---|
| FirstMatch | Take first non-null value | Contact info, names |
| Sum | Add all values | Order totals, quantities |
| Count | Count non-null entries | Number of transactions |
| Average | Mean of all values | Average order size |
| Min | Minimum value | Earliest date, lowest price |
| Max | Maximum value | Latest date, highest price |
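A policy attaches to each destination field inside a mapping. For example, counting each customer's orders could look like the sketch below; the `order_count` field is hypothetical, and the lowercase `count` type string is assumed by analogy with `sum` and `firstMatch` in the query plan above:

```json
{
  "destination_field": "order_count",
  "policy": {"type": "count"},
  "source_fields": [
    {"id": "order_total", "source_file_id": "orders", "column_name": "order_total"}
  ]
}
```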
## CLI Reference

```
f-ck [OPTIONS]

OPTIONS:
    -q, --query <FILE>      JSON file containing the query plan [required]
    -o, --output <FILE>     Output file path [required]
    -f, --format <FORMAT>   Output format: csv, tsv, xlsx, sqlite [default: csv]
    -p, --preview           Preview results without writing to file
    -l, --limit <N>         Limit preview to N rows
    -h, --help              Print help information
    -V, --version           Print version information
```
## Architecture

```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│  Data Sources   │    │    Query DSL    │    │     Output      │
│                 │    │                 │    │                 │
│ • CSV/TSV       │───▶│ • Field Maps    │───▶│ • CSV/TSV       │
│ • XLSX          │    │ • Join Logic    │    │ • XLSX          │
│ • SQLite        │    │ • Merge Policy  │    │ • SQLite        │
└─────────────────┘    └─────────────────┘    └─────────────────┘
```
- **DSL Engine**: JSON-based query planning and validation
- **Data Reader**: Multi-format input with Polars lazy evaluation
- **Join Engine**: Dynamic column mapping and transitive closure
- **Aggregation Engine**: Group-by operations with merge policies
- **Output Writer**: Multi-format export with streaming
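Conceptually, the Data Reader, Join Engine, and Aggregation Engine compose into a single lazy Polars pipeline. The sketch below shows what the customers/orders example would look like written directly against Polars; this illustrates the query shape only, not f*ck's actual internals, and Polars API details (e.g. `group_by` vs the older `groupby`) vary between crate versions:

```rust
use polars::prelude::*;

// Hand-rolled equivalent of the example query plan: lazy scan, left join,
// group by the primary key, then apply the `sum` merge policy.
fn merge_example() -> PolarsResult<DataFrame> {
    let customers = LazyCsvReader::new("customers.csv").finish()?;
    let orders = LazyCsvReader::new("orders.csv").finish()?;

    customers
        // Left join keeps customers with no orders (Bob in the example).
        .join(
            orders,
            [col("id")],
            [col("customer_id")],
            JoinArgs::new(JoinType::Left),
        )
        .group_by([col("id"), col("name"), col("email")])
        // Summing an all-null group yields 0.0, matching the example output.
        .agg([col("order_total").sum().alias("total_spent")])
        // Nothing is read or computed until collect().
        .collect()
}
```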
## Roadmap

- Basic CSV join functionality
- DSL query planning
- Aggregation policies (sum, count, etc.)
- CLI interface
- Salsa incremental computation
- WASM compilation support
- Transitive closure joins
- Type detection heuristics
- Web-based visual interface
- Real-time preview system
- Data lineage tracking
- Recipe sharing
## Contributing

1. Fork the repository
2. Create a feature branch: `git checkout -b feature/amazing-feature`
3. Commit changes: `git commit -m 'Add amazing feature'`
4. Push to the branch: `git push origin feature/amazing-feature`
5. Open a Pull Request
## Development

```bash
# Build and test
cargo build
cargo test

# Run with sample data
cargo run -- --query test_data/test_query.json --output result.csv --preview

# Check WASM compatibility (currently limited)
cargo check --target wasm32-unknown-unknown --lib
```
## License

This project is licensed under the MIT License; see the LICENSE file for details.
## Why the Name?

The name represents both the frustration of working with messy data and the satisfaction of finally getting it clean. f*ck is about taking control of your data and making it work for you.

*"f\*ck around and find out... how clean your data can be."*