Apache Spark
Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
There are 8,338 public repositories matching this topic.
🧙 Build, run, and manage data pipelines for integrating and transforming data.
Updated Jun 13, 2024 - Python
Apache Spark Connect Client for Rust
Updated Jun 13, 2024 - Rust
Quill for Scala 3
Updated Jun 13, 2024 - Scala
Know your data better! Datavines is a next-gen data observability platform that supports metadata management and data quality.
Updated Jun 13, 2024 - Java
Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
Updated Jun 13, 2024 - Java
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
Updated Jun 13, 2024 - Scala
Prime Number Generator using PySpark
Updated Jun 13, 2024 - Python
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
Updated Jun 13, 2024 - Scala
🏆 Spark4You Design patterns
Updated Jun 13, 2024 - Shell
Implementation of "Colossal Trajectory Mining: A Unifying Approach to Mine Behavioral Mobility Patterns." Expert Systems with Applications.
Updated Jun 13, 2024 - Jupyter Notebook
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Updated Jun 13, 2024 - Jupyter Notebook
This construct builds some elements for you to quickly launch an EMR Serverless application. After submitting the EMR Serverless job, you can also launch an EMR notebook via a cluster template to check the outcome of the EMR Serverless application.
Updated Jun 13, 2024 - TypeScript
Extensible Python SDK for developing Flyte tasks and workflows. Simple to get started with, easy to learn, and highly extensible.
Updated Jun 13, 2024 - Python
Created by Matei Zaharia
Released May 26, 2014
- Followers: 417
- Repository: apache/spark
- Website: spark.apache.org
- Wikipedia