A library for performing data pipeline / ETL tasks in Go.

Last update: Jan 19, 2022

Comments: 0

Ratchet

A library for performing data pipeline / ETL tasks in Go.

The Go programming language's simplicity, execution speed, and concurrency support make it a great choice for building data pipeline systems that can perform custom ETL (Extract, Transform, Load) tasks. Ratchet is a library that is written 100% in Go, and let's you easily build custom data pipelines by writing your own Go code.

Ratchet provides a set of built-in, useful data processors, while also providing an interface to implement your own. Conceptually, data processors are organized into stages, and those stages are run within a pipeline. It looks something like:

Each data processor is receiving, processing, and then sending data to the next stage in the pipeline. All data processors are running in their own goroutine, so all processing is happening concurrently. Go channels are connecting each stage of processing, so the syntax for sending data will be intuitive for anyone familiar with Go. All data being sent and received is JSON, which provides for a nice balance of flexibility and consistency.

Getting Started

Check out the full Godoc reference:
Get Ratchet: go get github.com/dailyburn/ratchet

Ratchet comes with vendored dependencies so it can work out of the box if you use go1.6 (or go1.5 with the GO15VENDOREXPERIMENT environment variable set to 1).

However, if you prefer to vendor your own dependencies then you should move the dependencies out of ratchet's vendor/ folder and into your own. Read the vendor/vendor.json file to get a list of its dependencies and their versions. Ratchet works with the vendor-spec, so it will also work with the govendor dependency manager. After you have copied the dependencies into your project's vendor.json, you can download them into your project's vendor folder--along with ratchet--by running:
```
  govendor sync
  govendor add github.com/dailyburn/ratchet
  govendor add github.com/dailyburn/ratchet/data
  govendor add github.com/dailyburn/ratchet/logger
  govendor add github.com/dailyburn/ratchet/processors
  govendor add github.com/dailyburn/ratchet/util
```

While not necessary, it may be helpful to understand some of the pipeline concepts used within Ratchet's internals: https://blog.golang.org/pipelines

Why would I use this?

Ratchet could be used anytime you need to perform some type of custom ETL. At DailyBurn we use Ratchet mainly to handle extracting data from our application databases, transforming it into reporting-oriented formats, and then loading it into our dedicated reporting databases.

Another good use-case is when you have data stored in disparate locations that can't be easily tied together. For example, if you have some CSV data stored on S3, some related data in a SQL database, and want to combine them into a final CSV or SQL output.

In general, Ratchet tends to solve the type of data-related tasks that you end up writing a bunch of custom and difficult to maintain scripts to accomplish.

Similar Resources

Kanzi is a modern, modular, expendable and efficient lossless data compressor implemented in Go.

kanzi Kanzi is a modern, modular, expendable and efficient lossless data compressor implemented in Go. modern: state-of-the-art algorithms are impleme

Dec 22, 2022

Data syncing in golang for ClickHouse.

ClickHouse Data Synchromesh Data syncing in golang for ClickHouse. based on go-zero ARCH A typical data warehouse architecture design of data sync Aut

Jan 1, 2023

sq is a command line tool that provides jq-style access to structured data sources such as SQL databases, or document formats like CSV or Excel.

sq: swiss-army knife for data sq is a command line tool that provides jq-style access to structured data sources such as SQL databases, or document fo

Jan 1, 2023

Stream data into Google BigQuery concurrently using InsertAll() or BQ Storage.

bqwriter A Go package to write data into Google BigQuery concurrently with a high throughput. By default the InsertAll() API is used (REST API under t

Dec 16, 2022

Dev Lake is the one-stop solution that integrates, analyzes, and visualizes software development data

Dev Lake is the one-stop solution that integrates, analyzes, and visualizes software development data throughout the software development life cycle (SDLC) for engineering teams.

Dec 30, 2022

churro is a cloud-native Extract-Transform-Load (ETL) application designed to build, scale, and manage data pipeline applications.

Churro - ETL for Kubernetes churro is a cloud-native Extract-Transform-Load (ETL) application designed to build, scale, and manage data pipeline appli

Mar 10, 2022

go.pipeline is a utility library that imitates unix pipeline. It simplifies chaining unix commands (and other stuff) in Go.

go.pipeline go.pipeline is a utility library that imitates unix pipeline. It simplifies chaining unix commands (and other stuff) in Go. Installation g

May 8, 2022

Declarative streaming ETL for mundane tasks, written in Go

Benthos is a high performance and resilient stream processor, able to connect various sources and sinks in a range of brokering patterns and perform h

Dec 28, 2022

Declarative streaming ETL for mundane tasks, written in Go

Benthos is a high performance and resilient stream processor, able to connect various sources and sinks in a range of brokering patterns and perform hydration, enrichments, transformations and filters on payloads.

Dec 29, 2022

Realize is the #1 Golang Task Runner which enhance your workflow by automating the most common tasks and using the best performing Golang live reloading.

#1 Golang live reload and task runner Content - ⭐️ Top Features - 💃🏻 Get started - 📄 Config sample - 📚 Commands List - 🛠 Support and Suggestions

Jan 6, 2023

Realize is the #1 Golang Task Runner which enhance your workflow by automating the most common tasks and using the best performing Golang live reloading.

#1 Golang live reload and task runner Content - ⭐️ Top Features - 💃🏻 Get started - 📄 Config sample - 📚 Commands List - 🛠 Support and Suggestions

Dec 31, 2022

xyr is a very lightweight, simple and powerful data ETL platform that helps you to query available data sources using SQL.

xyr [WIP] xyr is a very lightweight, simple and powerful data ETL platform that helps you to query available data sources using SQL. Supported Drivers

Dec 2, 2022

omniparser: a native Golang ETL streaming parser and transform library for CSV, JSON, XML, EDI, text, etc.

omniparser Omniparser is a native Golang ETL parser that ingests input data of various formats (CSV, txt, fixed length/width, XML, EDI/X12/EDIFACT, JS

Jan 4, 2023

Baker is a high performance, composable and extendable data-processing pipeline for the big data era

Baker is a high performance, composable and extendable data-processing pipeline for the big data era. It shines at converting, processing, extracting or storing records (structured data), applying whatever transformation between input and output through easy-to-write filters.

Dec 14, 2022

A library for performing data pipeline / ETL tasks in Go.

Ratchet

A library for performing data pipeline / ETL tasks in Go.

Getting Started

Why would I use this?

Owner

Daily Burn

Similar Resources

Kanzi is a modern, modular, expendable and efficient lossless data compressor implemented in Go.

Data syncing in golang for ClickHouse.

sq is a command line tool that provides jq-style access to structured data sources such as SQL databases, or document formats like CSV or Excel.

Stream data into Google BigQuery concurrently using InsertAll() or BQ Storage.

Dev Lake is the one-stop solution that integrates, analyzes, and visualizes software development data

churro is a cloud-native Extract-Transform-Load (ETL) application designed to build, scale, and manage data pipeline applications.

go.pipeline is a utility library that imitates unix pipeline. It simplifies chaining unix commands (and other stuff) in Go.

Declarative streaming ETL for mundane tasks, written in Go

Declarative streaming ETL for mundane tasks, written in Go

Realize is the #1 Golang Task Runner which enhance your workflow by automating the most common tasks and using the best performing Golang live reloading.

Realize is the #1 Golang Task Runner which enhance your workflow by automating the most common tasks and using the best performing Golang live reloading.

xyr is a very lightweight, simple and powerful data ETL platform that helps you to query available data sources using SQL.

omniparser: a native Golang ETL streaming parser and transform library for CSV, JSON, XML, EDI, text, etc.

Baker is a high performance, composable and extendable data-processing pipeline for the big data era

Package tasks is an easy to use in-process scheduler for recurring tasks in Go

Delay-tasks - A delayed tasks implementation for golang

A Go library for performing Unicode Text Segmentation as described in Unicode Standard Annex #29

A library for performing OAuth Device flow and Web application flow in Go client apps.

A distributed, fault-tolerant pipeline for observability data

Related tags

Declarative streaming ETL for mundane tasks, written in Go

xyr is a very lightweight, simple and powerful data ETL platform that helps you to query available data sources using SQL.

Baker is a high performance, composable and extendable data-processing pipeline for the big data era

A distributed, fault-tolerant pipeline for observability data

CUE is an open source data constraint language which aims to simplify tasks involving defining and using data.

Prometheus Common Data Exporter can parse JSON, XML, yaml or other format data from various sources (such as HTTP response message, local file, TCP response message and UDP response message) into Prometheus metric data.

Dud is a lightweight tool for versioning data alongside source code and building data pipelines.

Machine is a library for creating data workflows.

DEPRECATED: Data collection and processing made easy.

Open source framework for processing, monitoring, and alerting on time series data