Stream data into Google BigQuery concurrently using InsertAll() or BQ Storage.


A Go package to write data into Google BigQuery concurrently with high throughput. By default the InsertAll() API is used (REST API under the hood), but you can configure it to use the Storage Write API (gRPC under the hood) instead.

The InsertAll API is easier to configure and works pretty much out of the box without any configuration. Nonetheless, the Storage API is recommended, as it is faster and comes at a lower cost. The latter does, however, require a bit more configuration on your side, including a Proto schema file. See the Storage example below on how to do so (TODO).

import "github.com/OTA-Insight/bqwriter"

To install the package on your system, do not clone the repo. Instead:

  1. Change to your project directory:
cd /path/to/my/project
  2. Get the package using the official Go tooling, which will also add it to your go.mod file for you:
go get github.com/OTA-Insight/bqwriter
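
Once installed, creating a streamer and writing rows can look like the following minimal sketch. This uses the default InsertAll-based client; the project and table identifiers are hypothetical, and the exact `Write`/`Close` signatures should be checked against the GoDoc:

```go
package main

import (
	"context"
	"log"

	"github.com/OTA-Insight/bqwriter"
)

func main() {
	ctx := context.Background()

	// Create a streamer with the default (InsertAll-based) configuration;
	// a nil config is assumed here to fall back to sane defaults.
	streamer, err := bqwriter.NewStreamer(
		ctx,
		"my-gcp-project", // hypothetical project ID
		"my-dataset",     // hypothetical dataset ID
		"my-table",       // hypothetical table ID
		nil,              // *bqwriter.StreamerConfig: nil means defaults
	)
	if err != nil {
		log.Fatal(err)
	}
	defer streamer.Close()

	// Write a single row; the InsertAll client accepts any value the
	// BigQuery REST API can encode, such as a map or a struct.
	if err := streamer.Write(map[string]interface{}{
		"name":  "example",
		"value": 42,
	}); err != nil {
		log.Fatal(err)
	}
}
```

Note that this talks to real BigQuery infrastructure using Application Default Credentials, so it cannot run offline.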

NOTE: This package is under development, and may occasionally make backwards-incompatible changes.

Go Versions Supported

We currently support Go versions 1.13 and newer.

Authorization

The streamer client will use Google Application Default Credentials to authorize its calls to the API endpoints. This allows your application to run in many environments without requiring explicit configuration.

Please open an issue should you require more advanced forms of authorization. The issue should come with an example, a clear statement of intent, and motivation on why this is a useful contribution to this package. Even if you wish to implement the patch yourself, it is nonetheless best to create an issue first, so that we can all be aligned on the specifics. Good communication is key here.

It was a deliberate choice not to support these advanced authorization methods for now: the package authors didn't have a need for them, and leaving them out kept the API as simple and small as possible. Some advanced authorization setups are nonetheless still possible.

To conclude: we currently do not support advanced forms of authorization, but we're open to including support for them if there is sufficient interest.

Contributing

Contributions are welcome. Please, see the CONTRIBUTING document for details.

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms. See Contributor Code of Conduct for more information.

Comments
  • [Proposal]: Connect to local BigQuery emulator

    [Proposal]: Connect to local BigQuery emulator

    Contact Details

    [email protected]

    Summary of your proposal

    To change the default connection URL, you can pass option.ClientOption parameters to bigquery.NewClient.

    To keep the impact as small as possible and not change today's behavior, the option.ClientOption values ("google.golang.org/api/option") can be added as variadic parameters to the methods that end up calling bigquery.NewClient.

    e.g.: func NewStreamer(ctx context.Context, projectID, dataSetID, tableID string, cfg *StreamerConfig, opts ...option.ClientOption) (*Streamer, error)

    from where they can be passed on to the underlying clients (storage, batch and insertall), e.g.:

    client, err := storage.NewClient(
        projectID, dataSetID, tableID,
        encoder, protobufDescriptor,
        logger,
        opts...,
    )

    and then pass them on to the Google BigQuery client: writer, err := managedwriter.NewClient(ctx, projectID, opts...)

    To use in test code:

    bqWriter, err := bqwriter.NewStreamer(
        ctx, projectId, datasetId, tableId,
        &bqwriter.StreamerConfig{
            ...
        },
        option.WithEndpoint("localhost:9050"),
        option.WithoutAuthentication(),
    )
    if err != nil {
        panic(err)
    }

    Motivation for your proposal

    The motivation for this change: to use https://github.com/goccy/bigquery-emulator as part of the integration tests during development and in the deploy pipeline.

    bigquery-emulator is a locally running BigQuery emulator that can easily run during testing. For this to work, though, local connections need to be supported.

    Alternatives for your proposal

    The workaround used today is a fork of the bqwriter master, where the OptionFn parameter is passed to the relevant methods.

    Alternatively, consider whether bqwriter should have its own OptionFn pattern to handle optional parameters passed in on creation.

    Version

    0.4.1 (Latest)

    What platform are you mostly using or planning to use our software on?

    Linux

    Code of Conduct

    • [X] I agree to follow this project's Code of Conduct
  • [Proposal]: use upstream bigquery/storage/managedwriter package instead of our forked version

    [Proposal]: use upstream bigquery/storage/managedwriter package instead of our forked version

    Contact Details

    No response

    Summary of your proposal

    Use the upstream bigquery/storage/managedwriter package instead of our forked version. One of the issues that ideally is resolved prior to doing so is https://github.com/googleapis/google-cloud-go/issues/5094.

    Motivation for your proposal

    The main motivation is that we have less code to maintain ourselves.

    Alternatives for your proposal

    The alternative is to not use a managedWriter at all, in which case we're most likely going to reinvent the wheel. The only other obvious option is to continue using our fork, but that means we'll also need to maintain it ourselves, and do so mostly alone.

    Version

    0.3.1 (Latest)

    What platform are you mostly using or planning to use our software on?

    MacOS

    Code of Conduct

    • [X] I agree to follow this project's Code of Conduct
  • [Proposal]: support batch loading of data

    [Proposal]: support batch loading of data

    Contact Details

    No response

    Summary of your proposal

    We currently support the insertAll API and soon also the storage API. What we do not yet support is batch loading as documented in https://cloud.google.com/bigquery/docs/batch-loading-data. They give an example of a single file, but we could do it for any number of files as well as any reader in general.

    Need to investigate the specifics, but it does look like it is still in scope.
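
    For reference, batch loading with the plain cloud.google.com/go/bigquery client looks roughly like the sketch below (loading a CSV file from GCS; the project, bucket, dataset and table names are hypothetical). This is the kind of flow bqwriter could wrap:

    ```go
    package main

    import (
    	"context"
    	"log"

    	"cloud.google.com/go/bigquery"
    )

    func main() {
    	ctx := context.Background()

    	client, err := bigquery.NewClient(ctx, "my-gcp-project") // hypothetical project ID
    	if err != nil {
    		log.Fatal(err)
    	}
    	defer client.Close()

    	// Describe the source data: a CSV file in Google Cloud Storage.
    	gcsRef := bigquery.NewGCSReference("gs://my-bucket/data.csv") // hypothetical URI
    	gcsRef.SourceFormat = bigquery.CSV
    	gcsRef.SkipLeadingRows = 1 // skip the header row

    	// Configure and run the load job against the target table.
    	loader := client.Dataset("my-dataset").Table("my-table").LoaderFrom(gcsRef)
    	loader.WriteDisposition = bigquery.WriteAppend

    	job, err := loader.Run(ctx)
    	if err != nil {
    		log.Fatal(err)
    	}
    	status, err := job.Wait(ctx)
    	if err != nil {
    		log.Fatal(err)
    	}
    	if status.Err() != nil {
    		log.Fatal(status.Err())
    	}
    }
    ```

    As with streaming, this requires real GCP credentials and infrastructure, so it is not runnable offline.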

    Motivation for your proposal

    It's a different kind of use case for BQWriter: still writing data into BQ, but for an entirely different purpose. For those purposes batch loading is better suited, as timing isn't as critical, and with that comes a reduction in cost.

    Alternatives for your proposal

    Do not support it and explicitly document so instead.

    Version

    0.3.1 (Latest)

    What platform are you mostly using or planning to use our software on?

    MacOS

    Code of Conduct

    • [X] I agree to follow this project's Code of Conduct
  • Support connecting to local BigQuery emulator

    Support connecting to local BigQuery emulator

    Related issues

    A link to each issue (be it a proposal or bug) which this PR aims to resolve. Prior to starting a PR an issue is desired in order to ensure we're all aligned prior to you putting any of your valuable time into this project.

    Closes https://github.com/OTA-Insight/bqwriter/issues/10

    Description

    A few sentences describing the overall goals of the pull request's commits.

    ...

    Important remarks

    A summary/extract of the most important remarks perhaps already discussed in your description above.

    • ...

    Todos

    • [ ] Tests
    • [ ] Documentation
    • [ ] Self-Review
    • [ ] ...

    Impacted Areas in this Golang package:

    List general components of the OTA-Insight/bqwriter Golang package that this PR will affect:

    • ...

    Code of Conduct

    By submitting this pull request (PR), I agree to follow this project's Code of Conduct.

  • update-v0.6.21

    update-v0.6.21

    Upgraded Dependencies:

    • golang.org/x/net: v0.0.0-20220526153639-5463443f8c37 => v0.0.0-20220607020251-c690dde0001d
    • golang.org/x/sync: v0.0.0-20220513210516-0976fa681c29 => v0.0.0-20220601150217-0de741cfad7f
    • google.golang.org/api: v0.81.0 => v0.82.0
    • google.golang.org/genproto: v0.0.0-20220527130721-00d5c0f3be58 => v0.0.0-20220607140733-d738665f6195
    • google.golang.org/grpc: v1.46.2 => v1.47.0
  • Update version v0.6.17

    Update version v0.6.17

    cmd: go get -u && go mod tidy

    Updated Dependencies:

    • update google.golang.org/api to v0.77.0 (was v0.75.0);

    Updated Indirect Dependencies:

    • update golang.org/x/sys, golang.org/x/net, and google.golang.org/genproto to latest (no semver);

    Added Indirect Dependencies:

    • add github.com/google/go-cmp v0.5.8
  • Fix typo

    Fix typo

    Related issues

    /

    Description

    There was a typo in the error message

    Important remarks

    /

    Todos

    • [ ] Tests
    • [ ] Documentation
    • [ ] Self-Review
    • [ ] ...

    Impacted Areas in this Golang package:

    List general components of the OTA-Insight/bqwriter Golang package that this PR will affect:

    • ...

    Code of Conduct

    By submitting this pull request (PR), I agree to follow this project's Code of Conduct.

  • add initial benchmark code (insertAll works, storage fails)

    add initial benchmark code (insertAll works, storage fails)

    Related issues

    N/A

    Description

    A few sentences describing the overall goals of the pull request's commits.

    Be able to have benchmarks which function as end-to-end tests against real production infrastructure, and which also give some insight, be it basic, into some of the different clients and setups possible.

    Important remarks

    N/A

    Todos

    • [ ] Tests
    • [ ] Documentation
    • [ ] Self-Review

    Impacted Areas in this Golang package:

    List general components of the OTA-Insight/bqwriter Golang package that this PR will affect:

    • new benchmark package;
    • fix bugs here and there (std logger, storage API)

    Code of Conduct

    By submitting this pull request (PR), I agree to follow this project's Code of Conduct.

  • Add batch client

    Add batch client

    Related issues

    A link to each issue (be it a proposal or bug) which this PR aims to resolve. Prior to starting a PR an issue is desired in order to ensure we're all aligned prior to you putting any of your valuable time into this project.

    https://github.com/OTA-Insight/bqwriter/issues/2

    Description

    A few sentences describing the overall goals of the pull request's commits.

    Create a new client that supports batch uploading described here: https://cloud.google.com/bigquery/docs/batch-loading-data

    Currently we support these formats

    • CSV
    • JSON
    • Avro
    • Parquet
    • ORC

    Important remarks

    A summary/extract of the most important remarks perhaps already discussed in your description above.

    Adds support for batch uploading of data.

    Todos

    • [x] Tests
    • [x] Documentation
    • [x] Self-Review

    Impacted Areas in this Golang package:

    List general components of the OTA-Insight/bqwriter Golang package that this PR will affect:

    • streamer.go
    • bigquery/batch

    Code of Conduct

    By submitting this pull request (PR), I agree to follow this project's Code of Conduct.

  • initial storage API support (alpha)

    initial storage API support (alpha)

    Related issues

    N/A

    Description

    Adds storage API support. Very basic for now, and only for the DefaultStream.

    For production use it is also not yet recommended, until it is further tested and streamlined internally.

    Important remarks

    Best to use the InsertAll API for production until the Storage API has been tested and streamlined further.

    Code of Conduct

    By submitting this pull request (PR), I agree to follow this project's Code of Conduct.

  • align special files according to GH's name conventions

    align special files according to GH's name conventions

    Related issues

    A link to each issue (be it a proposal or bug) which this PR aims to resolve. Prior to starting a PR an issue is desired in order to ensure we're all aligned prior to you putting any of your valuable time into this project.

    N/A

    Description

    A few sentences describing the overall goals of the pull request's commits.

    Align the special files used by GitHub or as defined by its implicit conventions in order to better align with its ecosystem as a whole.

    Important remarks

    A summary/extract of the most important remarks perhaps already discussed in your description above.

    Todos

    • [ ] Documentation
    • [ ] Self-Review

    Impacted Areas in this Golang package:

    List general components of the OTA-Insight/bqwriter Golang package that this PR will affect:

    N/A
