Churro - ETL for Kubernetes

churro is a cloud-native Extract-Transform-Load (ETL) application designed to build, scale, and manage data pipeline applications.

What is churro?

churro is an application that processes input files and streams, extracting content and loading it into a database of your choice, all running on Kubernetes. Today, churro supports processing of JSON, XML, XLSX, and CSV files, along with JSON API streams. End users create churro pipelines to process data; pipelines can be created using the churro web app or directly via a churro pipeline custom resource (YAML).

Design

Some key aspects of the churro design:

  • churro is designed from the start to run within a Kubernetes cluster.
  • churro uses a microservice architecture to scale ETL processing.
  • churro defines extension points that allow customized processing to be performed per customer requirements.
  • churro is written in Go.
  • churro currently supports persisting ingested data into CockroachDB, SingleStore, and MySQL databases.
  • churro implements a Kubernetes operator to handle GitOps-style provisioning of churro pipeline resources, including the pipeline database.
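To give a flavor of the GitOps-style provisioning mentioned above, a pipeline could be declared as a custom resource and applied with kubectl. The field names below are purely illustrative assumptions, not the actual churro CRD schema; consult the churro documentation for the real pipeline custom resource definition:

```yaml
# Hypothetical churro pipeline custom resource.
# Field names here are illustrative assumptions only.
apiVersion: churro.example.com/v1alpha1
kind: Pipeline
metadata:
  name: orders-pipeline
spec:
  source:
    type: csv          # e.g. csv, json, xml, xlsx, json-api
  database:
    type: cockroachdb  # or singlestore, mysql
```

The operator would watch for resources like this and provision the pipeline microservices and the pipeline database to match the declared state.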

For more details on the churro design, check out the documentation at the churro GitHub pages.

Docs

Detailed documentation is found at the churro GitHub pages; additional content such as blogs can be found at the churrodata.com web site.

Images

Today, churro container images are found on DockerHub here. These images are multi-arch manifest images that include builds for both amd64 and arm64 architectures.

churro Cloud

Inquiries about churro Cloud can be directed to [email protected]. In the near future, users will be able to provision a churro instance on the cloud of their choice, with billing and management handled by churrodata.com.

Starting with churro

People generally start with churro by creating a Kubernetes cluster and then deploying churro to that cluster. Installation documentation is found on the churro GitHub pages.

Contributing

Since churro is open source, you can view the source code and make contributions such as pull requests on our github.com site. We encourage users to log any issues they find on our GitHub issues site.

Support

churro enterprise support and services are provided by churrodata.com.

Owner

churrodata works on churro, the Kubernetes-based file/API ETL processor.
Similar Resources

Kanzi is a modern, modular, expendable and efficient lossless data compressor implemented in Go.

Dec 22, 2022

Dev Lake is the one-stop solution that integrates, analyzes, and visualizes software development data throughout the software development life cycle (SDLC) for engineering teams.

Dec 30, 2022

ClickHouse Data Synchromesh: data syncing in Go for ClickHouse, based on go-zero.

Jan 1, 2023

sq is a command line tool that provides jq-style access to structured data sources such as SQL databases, or document formats like CSV or Excel.

Jan 1, 2023

Machine is a library for creating data workflows. These workflows can be either very concise or quite complex, even allowing for cycles for flows that need retry or self-healing mechanisms.

Dec 26, 2022

bqwriter is a Go package to stream data into Google BigQuery concurrently using InsertAll() or the BQ Storage API.

Dec 16, 2022

Gleam is a fast, efficient, and scalable distributed map/reduce system with DAG execution, in memory or on disk, written in pure Go, running standalone or distributed.

Jan 5, 2023

Gonum is a set of numeric libraries for the Go programming language. It contains libraries for matrices, statistics, optimization, and more.

Dec 29, 2022
Comments
  • bump mysql operator up to 0.5.0 and add mysql password support

    In the UI, when you specify a MySQL database for a pipeline, you can now specify the MySQL password. This PR also bumps the MySQL operator up to the bitpoke version 0.5.0, which supports k8s 1.22.

  • change go.mod to use golang 1.16 and relocate templates so they can be embedded into churro-operator

    Remove the template volume from churro-operator and instead look for templates dynamically; if none are found, use the embedded templates.

  • add Sheetname as a configuration within the pipeline CR, web console …

    This PR adds a 'sheetname' field to the pipeline CR and updates the UI to allow users to specify the xlsx sheet name; if nothing is specified, 'Sheet1' is the default. When an xlsx extractsource is created, the sheetname is carried forward for that extract processing.

xyr [WIP] is a very lightweight, simple and powerful data ETL platform that helps you to query available data sources using SQL.

Dec 2, 2022

Baker is a high performance, composable and extendable data-processing pipeline for the big data era. It shines at converting, processing, extracting or storing records (structured data), applying whatever transformation between input and output through easy-to-write filters.

Dec 14, 2022

Benthos is declarative streaming ETL for mundane tasks, written in Go: a high performance and resilient stream processor, able to connect various sources and sinks in a range of brokering patterns and perform hydration, enrichments, transformations and filters on payloads.

Dec 29, 2022

Veneur is a distributed, fault-tolerant pipeline for observability data.

Dec 25, 2022

Prometheus Common Data Exporter can parse JSON, XML, YAML or other format data from various sources (such as HTTP response messages, local files, TCP response messages and UDP response messages) into Prometheus metric data.

May 18, 2022

Dud is a lightweight tool for versioning data alongside source code and building data pipelines.

Jan 1, 2023

CUE is an open source data constraint language which aims to simplify tasks involving defining and using data.

Jan 1, 2023

Simple CRUD application using CockroachDB and Go.

Feb 20, 2022

Heka (DEPRECATED): data collection and processing made easy. This project is deprecated; see the linked email for more details.

Nov 30, 2022

Kapacitor is an open source framework for processing, monitoring, and alerting on time series data.

Dec 24, 2022