# f*ck: Universal Columnar Data Merging Tool
f*ck is a Rust-based data merging engine that lets you combine, clean, and transform messy tabular data through an intuitive DSL and visual interface.

f*ck stands for "fields combined with columnar keys": the core concept of merging data fields across multiple sources using columnar key relationships and intelligent merge policies.
## Features

- **Smart Joins**: Dynamic column mapping between different data sources
- **Aggregation Policies**: Sum, Count, Average, Min, Max, FirstMatch
- **Primary Key Logic**: OR/AND logic for complex key relationships
- **Lazy Evaluation**: Powered by Polars for efficient processing
- **Incremental Computation**: Salsa-based caching for performance
- **Multi-Modal**: CLI, Daemon+RPC, and WASM support
- **Visual DSL**: JSON-based query language
## Installation

```bash
git clone https://github.com/your-repo/f-ck
cd f-ck
cargo build --release
```
## Quick Start

1. Prepare your data sources (CSV, TSV, XLSX, SQLite)
2. Create a query plan (JSON DSL)
3. Execute the merge
```bash
# Preview results
./target/release/f-ck --query query.json --output result.csv --preview

# Write to file
./target/release/f-ck --query query.json --output result.csv
```
## Example

**customers.csv**

```csv
id,name,email
1,John Doe,john@example.com
2,Jane Smith,jane@example.com
3,Bob Johnson,bob@example.com
```

**orders.csv**

```csv
customer_id,order_total,product
1,99.99,Widget A
2,149.50,Widget B
1,25.00,Widget C
```
**query.json**

```json
{
  "sources": [
    {
      "id": "customers",
      "path": "customers.csv",
      "format": "csv"
    },
    {
      "id": "orders",
      "path": "orders.csv",
      "format": "csv"
    }
  ],
  "destination_schema": [
    {"name": "customer_id", "data_type": "Int64"},
    {"name": "customer_name", "data_type": "String"},
    {"name": "email", "data_type": "String"},
    {"name": "total_spent", "data_type": "Float64"}
  ],
  "primary_keys": {
    "logic": "or",
    "keys": ["customer_id"]
  },
  "mappings": [
    {
      "destination_field": "customer_id",
      "policy": {"type": "firstMatch", "priority": ["customers"]},
      "source_fields": [
        {"id": "cust_id", "source_file_id": "customers", "column_name": "id"},
        {"id": "order_cust_id", "source_file_id": "orders", "column_name": "customer_id"}
      ]
    },
    {
      "destination_field": "customer_name",
      "policy": {"type": "firstMatch", "priority": ["customers"]},
      "source_fields": [
        {"id": "name", "source_file_id": "customers", "column_name": "name"}
      ]
    },
    {
      "destination_field": "email",
      "policy": {"type": "firstMatch", "priority": ["customers"]},
      "source_fields": [
        {"id": "email", "source_file_id": "customers", "column_name": "email"}
      ]
    },
    {
      "destination_field": "total_spent",
      "policy": {"type": "sum"},
      "source_fields": [
        {"id": "order_total", "source_file_id": "orders", "column_name": "order_total"}
      ]
    }
  ]
}
```
Running the merge produces **result.csv**:

```csv
customer_id,customer_name,email,total_spent
1,John Doe,john@example.com,124.99
2,Jane Smith,jane@example.com,149.50
3,Bob Johnson,bob@example.com,0.0
```
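The `primary_keys` block above uses `or` logic with a single key. It also accepts `and` logic for composite keys, where rows must agree on every listed key before they are merged. A minimal sketch, assuming a hypothetical second key column named `region` (not part of the example data):

```json
"primary_keys": {
  "logic": "and",
  "keys": ["customer_id", "region"]
}
```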
## Merge Policies

| Policy | Description | Use Case |
|---|---|---|
| FirstMatch | Take first non-null value | Contact info, names |
| Sum | Add all values | Order totals, quantities |
| Count | Count non-null entries | Number of transactions |
| Average | Mean of all values | Average order size |
| Min | Minimum value | Earliest date, lowest price |
| Max | Maximum value | Latest date, highest price |
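A policy attaches to each destination field inside a mapping. For example, counting each customer's orders could look like the sketch below; the `order_count` field is hypothetical, and the lowercase `count` type string is assumed by analogy with `sum` and `firstMatch` in the query plan above:

```json
{
  "destination_field": "order_count",
  "policy": {"type": "count"},
  "source_fields": [
    {"id": "order_total", "source_file_id": "orders", "column_name": "order_total"}
  ]
}
```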
## CLI Reference

```
f-ck [OPTIONS]

OPTIONS:
    -q, --query <FILE>      JSON file containing the query plan [required]
    -o, --output <FILE>     Output file path [required]
    -f, --format <FORMAT>   Output format: csv, tsv, xlsx, sqlite [default: csv]
    -p, --preview           Preview results without writing to file
    -l, --limit <N>         Limit preview to N rows
    -h, --help              Print help information
    -V, --version           Print version information
```
## Architecture

```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│  Data Sources   │    │    Query DSL    │    │     Output      │
│                 │    │                 │    │                 │
│ • CSV/TSV       │───▶│ • Field Maps    │───▶│ • CSV/TSV       │
│ • XLSX          │    │ • Join Logic    │    │ • XLSX          │
│ • SQLite        │    │ • Merge Policy  │    │ • SQLite        │
└─────────────────┘    └─────────────────┘    └─────────────────┘
```
- **DSL Engine**: JSON-based query planning and validation
- **Data Reader**: Multi-format input with Polars lazy evaluation
- **Join Engine**: Dynamic column mapping and transitive closure
- **Aggregation Engine**: Group-by operations with merge policies
- **Output Writer**: Multi-format export with streaming
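Conceptually, the Data Reader, Join Engine, and Aggregation Engine compose into a single lazy Polars pipeline. The sketch below shows what the customers/orders example would look like written directly against Polars; this illustrates the query shape only, not f*ck's actual internals, and Polars API details (e.g. `group_by` vs the older `groupby`) vary between crate versions:

```rust
use polars::prelude::*;

// Hand-rolled equivalent of the example query plan: lazy scan, left join,
// group by the primary key, then apply the `sum` merge policy.
fn merge_example() -> PolarsResult<DataFrame> {
    let customers = LazyCsvReader::new("customers.csv").finish()?;
    let orders = LazyCsvReader::new("orders.csv").finish()?;

    customers
        // Left join keeps customers with no orders (Bob in the example).
        .join(
            orders,
            [col("id")],
            [col("customer_id")],
            JoinArgs::new(JoinType::Left),
        )
        .group_by([col("id"), col("name"), col("email")])
        // Summing an all-null group yields 0.0, matching the example output.
        .agg([col("order_total").sum().alias("total_spent")])
        // Nothing is read or computed until collect().
        .collect()
}
```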
## Roadmap

- Basic CSV join functionality
- DSL query planning
- Aggregation policies (sum, count, etc.)
- CLI interface
- Salsa incremental computation
- WASM compilation support
- Transitive closure joins
- Type detection heuristics
- Web-based visual interface
- Real-time preview system
- Data lineage tracking
- Recipe sharing
## Contributing

1. Fork the repository
2. Create a feature branch: `git checkout -b feature/amazing-feature`
3. Commit changes: `git commit -m 'Add amazing feature'`
4. Push to the branch: `git push origin feature/amazing-feature`
5. Open a Pull Request
## Development

```bash
# Build and test
cargo build
cargo test

# Run with sample data
cargo run -- --query test_data/test_query.json --output result.csv --preview

# Check WASM compatibility (currently limited)
cargo check --target wasm32-unknown-unknown --lib
```
## License

This project is licensed under the MIT License; see the LICENSE file for details.
## Why the Name?

The name represents both the frustration of working with messy data and the satisfaction of finally getting it clean. f*ck is about taking control of your data and making it work for you.

*"f\*ck around and find out... how clean your data can be."*