Churro - ETL for Kubernetes

churro is a cloud-native Extract-Transform-Load (ETL) application designed to build, scale, and manage data pipeline applications.

What is churro?

churro is an application that processes input files and streams, extracting content and loading it into a database of your choice, all running on Kubernetes. Today, churro supports processing of JSON, XML, XLSX, and CSV files, along with JSON API streams. End users create churro pipelines to process data; pipelines can be created using the churro web app or directly via a churro pipeline custom resource (YAML).

Design

Some key aspects of the churro design:

  • churro is designed from the start to run within a Kubernetes cluster.
  • churro uses a microservice architecture to scale ETL processing.
  • churro defines extension points that allow customized processing to be performed per customer requirements.
  • churro is written in Go.
  • churro currently supports persisting ingested data into CockroachDB, SingleStore, and MySQL databases.
  • churro implements a Kubernetes operator to handle GitOps-style provisioning of churro pipeline resources, including the pipeline database.
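To give a flavor of the GitOps-style provisioning mentioned above, a pipeline could be declared as a custom resource and applied with kubectl. The field names below are purely illustrative assumptions, not the actual churro CRD schema; consult the churro documentation for the real pipeline custom resource definition:

```yaml
# Hypothetical churro pipeline custom resource.
# Field names here are illustrative assumptions only.
apiVersion: churro.example.com/v1alpha1
kind: Pipeline
metadata:
  name: orders-pipeline
spec:
  source:
    type: csv          # e.g. csv, json, xml, xlsx, json-api
  database:
    type: cockroachdb  # or singlestore, mysql
```

The operator would watch for resources like this and provision the pipeline microservices and the pipeline database to match the declared state.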

For more details on the churro design, check out the documentation at the churro GitHub pages.

Docs

Detailed documentation is found at the churro GitHub pages; additional content such as blogs can be found at the churrodata.com web site.

Images

Today, churro container images are found on DockerHub here. These images are multi-arch manifest images that include builds for both amd64 and arm64 architectures.

churro Cloud

Inquiries about churro Cloud can be directed to [email protected]. In the near future, users will be able to provision a churro instance on the cloud of their choice, with billing and management handled by churrodata.com.

Starting with churro

People generally start with churro by creating a Kubernetes cluster and then deploying churro to that cluster. Installation documentation is found on the churro GitHub pages.

Contributing

Since churro is open source, you can view the source code and make contributions such as pull requests on our github.com site. We encourage users to log any issues they find on our GitHub issues site.

Support

churro enterprise support and services are provided by churrodata.com.

Owner

churrodata works on churro, the Kubernetes-based file/API ETL processor.
Similar Resources

Kanzi is a modern, modular, expendable and efficient lossless data compressor implemented in Go.

Dec 22, 2022

Dev Lake is the one-stop solution that integrates, analyzes, and visualizes software development data throughout the software development life cycle (SDLC) for engineering teams.

Dec 30, 2022

ClickHouse Data Synchromesh: data syncing in Go for ClickHouse, based on go-zero.

Jan 1, 2023

sq is a command line tool that provides jq-style access to structured data sources such as SQL databases, or document formats like CSV or Excel.

Jan 1, 2023

Machine is a library for creating data workflows. These workflows can be either very concise or quite complex, even allowing for cycles for flows that need retry or self-healing mechanisms.

Dec 26, 2022

bqwriter is a Go package to stream data into Google BigQuery concurrently using InsertAll() or the BQ Storage API.

Dec 16, 2022

Gleam is a fast, efficient, and scalable distributed map/reduce system with DAG execution, in memory or on disk, written in pure Go, running standalone or distributed.

Jan 5, 2023

Gonum is a set of numeric libraries for the Go programming language. It contains libraries for matrices, statistics, optimization, and more.

Dec 29, 2022
Comments
  • bump mysql operator up to 0.5.0 and add mysql password support

    In the UI, when you specify a MySQL database for a pipeline, you can now specify the MySQL password. This PR also bumps the MySQL operator up to the bitpoke version 0.5.0, which supports k8s 1.22.

  • change go.mod to use golang 1.16 and relocate templates so they can be embedded into churro-operator

    Remove the template volume from churro-operator and instead look for templates dynamically; if none are found, use the embedded templates.

  • add Sheetname as a configuration within the pipeline CR, web console …

    This PR adds a 'sheetname' field to the pipeline CR and updates the UI to allow users to specify the xlsx sheet name; if nothing is specified, 'Sheet1' is the default. When an xlsx extractsource is created, the sheetname is carried forward for that extract processing.

xyr [WIP] is a very lightweight, simple and powerful data ETL platform that helps you to query available data sources using SQL.

Dec 2, 2022

Baker is a high performance, composable and extendable data-processing pipeline for the big data era. It shines at converting, processing, extracting or storing records (structured data), applying whatever transformation between input and output through easy-to-write filters.

Dec 14, 2022

Benthos is declarative streaming ETL for mundane tasks, written in Go: a high performance and resilient stream processor, able to connect various sources and sinks in a range of brokering patterns and perform hydration, enrichments, transformations and filters on payloads.

Dec 29, 2022

Veneur is a distributed, fault-tolerant pipeline for observability data.

Dec 25, 2022

Prometheus Common Data Exporter can parse JSON, XML, YAML or other format data from various sources (such as HTTP response messages, local files, TCP response messages and UDP response messages) into Prometheus metric data.

May 18, 2022

Dud is a lightweight tool for versioning data alongside source code and building data pipelines.

Jan 1, 2023

CUE is an open source data constraint language which aims to simplify tasks involving defining and using data.

Jan 1, 2023

Simple CRUD application using CockroachDB and Go.

Feb 20, 2022

Heka (DEPRECATED): data collection and processing made easy. This project is deprecated; see the linked email for more details.

Nov 30, 2022

Kapacitor is an open source framework for processing, monitoring, and alerting on time series data.

Dec 24, 2022