This project provides a fully automated, one-click experience to create Cloud and Kubernetes environments for running Data Analytics workloads like Apache Spark.

Introduction

This project provides a fully automated, one-click tool to create a Data Analytics platform in a Cloud and Kubernetes environment:

  1. A single script to deploy a full-stack data platform: Kafka, Hive Metastore, Spark, and a data ingestion job.

  2. A Spark API Gateway to run the Spark platform as a service.

  3. An extensible design that supports customization and the deployment of new services.

Use Cases

Deploy Spark as a Service on EKS

Use a command like punch install SparkOnEks to get a ready-to-use Spark service within minutes. That single command will do the following automatically:

  1. Create an AWS EKS cluster and set up required IAM roles
  2. Deploy Nginx Ingress Controller and a Load Balancer
  3. Deploy Spark Operator and a REST API Gateway to accept application submission
  4. Deploy Spark History Server
  5. Enable the Cluster Autoscaler

When the punch command finishes, the Spark service is ready to use. You can use curl or the command-line tool (sparkcli) to submit Spark applications.
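For illustration, a submission through the gateway might look roughly like the sketch below. The host name, endpoint path, and JSON fields here are assumptions made for the example, not the exact API; see the User Guide for the real request format.

  # Hypothetical submission of the SparkPi example through the REST API gateway
  curl -k -X POST https://<your-gateway-host>/sparkapi/v1/submissions \
    -H "Content-Type: application/json" \
    -d '{
          "mainClass": "org.apache.spark.examples.SparkPi",
          "mainApplicationFile": "local:///opt/spark/examples/jars/spark-examples_2.12-3.2.1.jar",
          "sparkVersion": "3.2.1"
        }'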

Deploy a Data Ingestion Platform

punch also supports chaining multiple install commands to deploy a complex data platform.

For example, we could create a single script file with multiple commands:

punch install Eks -> punch install KafkaBridge -> punch install HiveMetastore -> punch install SparkOnEks
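As a sketch, such a script can simply run the commands one after another (the component names are taken from the chain above; add or remove steps as needed):

  #!/bin/bash
  # Deploy the data ingestion platform step by step; stop on the first failure
  set -e

  punch install Eks
  punch install KafkaBridge
  punch install HiveMetastore
  punch install SparkOnEks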

The script will deploy a data ingestion platform with all the components in the green area of the following diagram:

(Diagram: Data Ingestion Platform)

After the platform is deployed, users can send data to the REST API. The data then flows into Kafka and is automatically ingested into AWS S3 by the Spark Streaming application. Users can write further Spark applications to query the Hive table or compute metrics/insights from it.
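For illustration only, sending an event could look like the following; the endpoint path and payload schema are assumptions and will differ in a real KafkaBridge deployment:

  # Hypothetical event post to the ingestion REST API
  curl -X POST https://<your-ingestion-endpoint>/topics/<your-topic> \
    -H "Content-Type: application/json" \
    -d '{"eventTime": "2022-01-01T00:00:00Z", "value": "hello"}'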

How to build (on MacBook)

The following command will create a dist folder and a dist.zip file for Punch.

make release

Go to the dist folder and check the User Guide to see how to run the punch command.
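A typical end-to-end build flow looks roughly like this (assuming Go and make are installed):

  # Clone the repository and build the release artifacts
  git clone https://github.com/datapunchorg/punch.git
  cd punch
  make release
  # The built artifacts and the User Guide are under dist
  cd dist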

Quick Start - Run Spark with Minikube

You can build this project (make release), use punch to deploy Spark on Minikube, and run a Spark application for a quick try.

For example, a single command like punch install SparkOnEks --env withMinikube=true deploys a runnable Spark environment on Minikube.
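A rough local run, assuming Minikube is already installed and the punch binary built above is on your PATH, might look like:

  # Start a local cluster (resource sizes are illustrative) and deploy Spark
  minikube start --memory 8g --cpus 4
  punch install SparkOnEks --env withMinikube=true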

See Quick Start Guide - Run Spark with Minikube for details.

User Guide - Run Spark on AWS EKS

Again, a single command like punch install SparkOnEks deploys a runnable Spark environment on EKS.

See the "Run punch on AWS" section of the User Guide for more details.

Quick Start - Create EKS Cluster

You can build this project (make release) and use punch to create an AWS EKS cluster and play with it.
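As in the chained example above, creating the cluster itself is a single command:

  # Create an AWS EKS cluster with the default settings
  punch install Eks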

See Quick Start Guide - Create EKS for details.


Thanks to JetBrains for their support with great development tools and licenses.
