DagsHub client libraries
Adapter for dbt that executes dbt pipelines on Apache Flink
A Python library for machine-learning and feedback loops on streaming data
Apache Flink (Pyflink) and Related Projects
Udacity Data Streaming Nanodegree Program
A library for data streaming and augmentation
This project implements a real-time data pipeline using Apache Kafka, Python's psutil library for metric collection, and SQL Server for data storage. The pipeline collects metrics data from the local computer, processes it through Kafka brokers, and loads it into a SQL Server database. Additionally, a real-time dashboard is created using Power BI.
A Federated Learning Method for Real-time Emotion State Classification from Multi-modal Streaming
A simple, time-tested family of random hash functions in Python, based on CRC32 and xxHash, affine transformations, and the Mersenne Twister. 🎲
Final project for the IYKRA Data Fellowship 8 Program: an end-to-end banking campaign pipeline built on a lambda architecture (providing access to both batch and stream processing).
FastFlight is a high-performance data transfer framework using Apache Arrow Flight for efficient, modular, and pluggable data streaming with optional FastAPI integration for HTTP-based access.
A lightweight, fast data streaming library for Raspberry Pi, written in Python.
Hands-on demo for querying Kafka streams using SQL with Trino and data integration with PostgreSQL.
Showcases real-time data replication from RDS (MariaDB) to Kinesis using AWS DMS on LocalStack. Implements both full-load and Change Data Capture (CDC) tasks to stream database changes for analytics.
Reference implementation of the Affirmative Sampling algorithm by Jérémie Lumbroso and Conrado Martínez (2022). 🍀
A lightweight, polyglot stream-processing library for use as a data backplane, message relay, or pipeline subsystem.
Streams simulated events from a music application to a data lake (AWS S3) and then to a warehouse (AWS Redshift), using Kafka and Spark.
EHR pipeline that simulates MIMIC-IV patient data streams, performs advanced feature engineering and clinical severity scoring using machine learning (Random Forest Classifier), and prepares structured outputs for scalable downstream analytics
Design data streaming architecture and API for a real-life application called the Step Trending Electronic Data Interface (STEDI). It is a working application used to assess fall risk for seniors. When a senior takes a test, they are scored using an index which reflects the likelihood of falling, and potentially sustaining an injury in the cours…
Demonstrates a best practice implementation for using an AWS Lambda function to deploy a Flink Job Graph to Confluent Cloud for Apache Flink.