StorageTapper - a scalable, real-time MySQL change data streaming, logical backup, and logical replication service.
Hdfs - API and command line interface for HDFS.
Smart Open - utils for streaming large files (S3, HDFS, gzip, bz2...).
Dcos Commons - the DC/OS SDK is a collection of tools, libraries, and documentation for easy integration of technologies such as Kafka, Cassandra, HDFS, Spark, and TensorFlow with DC/OS.
SeaweedFS - a fast distributed storage system for blobs, objects, files, and data lakes, scaling to billions of files. The blob store offers O(1) disk seeks and cloud tiering; the filer supports Cloud Drive, cross-DC active-active replication, Kubernetes, POSIX FUSE mounts, the S3 API, an S3 gateway, Hadoop, WebDAV, encryption, and erasure coding.
Wradlib - a Python package for weather radar data processing.
Hsuntzu - HDFS compression/decompression and archiving (tar, zip, snappy, gzip) codec utilities for Hadoop and Spark.
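The point of Smart Open is that one `open()` call streams local or remote files, with transparent (de)compression inferred from the extension. A minimal sketch, assuming the `smart_open` package is installed; the gzip file is created locally just for the demo, but an `s3://` or `hdfs://` URI would use the same call:

```python
# Sketch of smart_open usage: the same open() streams local files,
# S3/HDFS URIs, and compressed files transparently.
import gzip
import os
import tempfile

from smart_open import open as sopen

# Create a local gzip file to stand in for a large remote object.
path = os.path.join(tempfile.mkdtemp(), "lines.txt.gz")
with gzip.open(path, "wt") as f:
    f.write("first line\nsecond line\n")

# smart_open infers gzip compression from the .gz extension and
# streams the decompressed text line by line.
with sopen(path) as f:
    lines = [line.strip() for line in f]

print(lines)
```

For a remote object the path would simply change, e.g. `sopen("s3://bucket/key.txt.gz")`, with credentials picked up from the environment.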
ElasticCTR - an enterprise-grade, Kubernetes-based open-source recommender-system solution built on PaddlePaddle (飞桨). It combines high-accuracy CTR models honed in Baidu's business scenarios, the large-scale distributed training capability of the open-source PaddlePaddle framework, and an industrial-grade elastic scheduling service for sparse parameters, letting users deploy a recommender system on Kubernetes with one click. It features high performance, industrial-grade deployment, and an end-to-end experience, and as an open-source suite it supports deep secondary development.
Dynamometer - a tool for scale and performance testing of HDFS, with a specific focus on the NameNode.
Hdfs Shell - an HDFS manipulation tool that works with the functions integrated in Hadoop DFS.
Ibis - a pandas-like deferred expression system with first-class SQL support.
Bigdata File Viewer - a cross-platform (Windows, macOS, Linux) desktop application to view common big data binary formats such as Parquet, ORC, and AVRO. Supports the local file system, HDFS, AWS S3, Azure Blob Storage, etc.
Camus - mirror of LinkedIn's Camus, a Kafka-to-HDFS pipeline.
Tiledb Py - the Python interface to the TileDB storage manager.
Rumble - ⛈️ Rumble 1.11.0 "Banyan Tree" 🌳 for Apache Spark. Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...). No install required (just a jar to download). Declarative machine learning and more.
Tiledb - the universal storage engine.
Hdfs - a native Go client for HDFS.
Pucket - a bucketing and partitioning system for Parquet.
Bigdata Interview - 🎯🌟 A collection of big data interview questions gathered from around the web, together with the author's own answer summaries. Currently covers the Hadoop, Hive, Spark, Flink, HBase, Kafka, and ZooKeeper frameworks.
Cluster Pack - a library on top of either pex or conda-pack to make your Python code easily available on a cluster.
Hadoop For Geoevent - an ArcGIS GeoEvent Server sample Hadoop connector for storing GeoEvents in HDFS.
Sparta - real-time analytics and data pipelines based on Spark Streaming.
Devops Python Tools - 80+ DevOps and data CLI tools: AWS, GCP, GCF Python Cloud Functions, log anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark data converters and validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr, etc.
JuiceFS - a distributed POSIX file system built on top of Redis and S3.
bigkube - Minikube for big data with Scala and Spark.
bigdata-fun - a complete (distributed) big data stack, running in containers.
leaflet heatmap - a simple visualization of Huzhou call data. Since the data volume is assumed too large to render a heatmap directly in the browser, the rendering step is moved offline: the data is processed in parallel with Apache Spark, the heatmap is then also drawn with Apache Spark, and leaflet.js loads an OpenStreetMap layer plus the heatmap layer for good interactivity. In the current implementation the parallel rendering is slower than a single machine, possibly because Apache Spark is not well suited to this kind of computation or because the algorithm is poorly designed. The Spark heatmap rendering and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git.
fastdata-cluster - Fast Data Cluster (Apache Cassandra, Kafka, Spark, Flink, YARN, and HDFS with Vagrant and VirtualBox).
ros hadoop - a Hadoop splittable InputFormat for ROS; process rosbag files with Hadoop, Spark, and other HDFS-compatible systems.
fsbrowser - a fast desktop client for the Hadoop Distributed File System.
wasp - WASP is a framework for building complex real-time big data applications. It relies on a Kappa/Lambda-style architecture, mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze them through complex pipelines, this is the framework for you.
skein - a tool and library for easily deploying applications on Apache YARN.
starlake - a Spark-based on-premise and cloud ELT/ETL framework for batch and stream processing.
ucz-dfs - a distributed file system written in Rust.