Build platforms that flexibly mix SQL, batch, and stream processing paradigms

Overview

Gazette makes it easy to build platforms that flexibly mix SQL, batch, and millisecond-latency stream processing paradigms. It enables teams, applications, and analysts to work from a common catalog of data in the way that's most convenient to them. Gazette's core abstraction is a "journal" -- a streaming append log that's represented using regular files in a BLOB store (e.g., S3).

The magic of this representation is that journals are simultaneously a low-latency data stream and a collection of immutable, organized files in cloud storage (aka, a data lake) -- a collection which can be directly integrated into familiar processing tools and SQL engines.
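
To make the dual nature concrete, here is a minimal, self-contained Go sketch of the journal model -- an offset-addressed append log whose content lives in immutable, contiguous fragments (the same files that land in cloud storage). The types and methods are illustrative only, not Gazette's actual client API.

    package main

    import "fmt"

    // Fragment models an immutable, contiguous chunk of journal content covering
    // byte offsets [Begin, End). In Gazette these are the regular files persisted
    // to a BLOB store such as S3.
    type Fragment struct {
        Begin, End int64
        Content    []byte
    }

    // Journal models an append-only log: writers append at the head, and readers
    // address content by byte offset.
    type Journal struct {
        Fragments []Fragment
        Head      int64 // Next offset to be written.
    }

    // Append adds data at the journal head and returns the offset it was written at.
    func (j *Journal) Append(data []byte) int64 {
        begin := j.Head
        j.Fragments = append(j.Fragments, Fragment{Begin: begin, End: begin + int64(len(data)), Content: data})
        j.Head = begin + int64(len(data))
        return begin
    }

    // ReadAt returns the fragment which covers the given offset, if any.
    func (j *Journal) ReadAt(offset int64) (Fragment, bool) {
        for _, f := range j.Fragments {
            if offset >= f.Begin && offset < f.End {
                return f, true
            }
        }
        return Fragment{}, false
    }

    func main() {
        var j Journal
        j.Append([]byte("first record\n"))
        off := j.Append([]byte("second record\n"))

        if f, ok := j.ReadAt(off); ok {
            fmt.Printf("offset %d is covered by fragment [%d, %d)\n", off, f.Begin, f.End)
        }
    }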

Atop the journal broker service, Gazette offers a powerful consumers framework for building streaming applications in Go. Gazette has served production use cases for nearly five years, with deployments scaled to millions of streamed records per second.

Where to Start

Comments
  • Add max-txn-size flag to gazctl

    Connects #154 Connects #159

    Please review commit by commit. First commit: there is an outstanding ticket (#154) to create a loopback server for testing against a consumer. This implements a basic loopback server in the model of the broker loopback server. It's a first-pass implementation, aimed at solving the problems encountered while writing tests for the second commit. As this testing tool is used more, I expect it will be expanded upon.

    Second commit: Add a max transaction size flag to all of the apply/edit commands in gazctl.

  • WriteService: remove "stop-writes" disk monitor and semantics

    Stop-writes attempts to monitor local disk, and if the disk fills beyond a threshold will block all attempts to write to Gazette (which requires local spooling) until disk usage has dropped.

    This behavior has bitten us more than it's helped, because stop-writes often results in ensemble deadlocks which can only be resolved by letting writes proceed. It also often impacts (sometimes important) services which are innocent bystanders and not the primary culprits of disk usage.

    Alternatives include monitoring at the application level, implementing quotas on disks assigned to applications, and likely other approaches.
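
    As a reference point, application-level monitoring of the kind suggested above can be as simple as periodically sampling filesystem usage for the spool directory and alerting on it, rather than blocking writes. A rough, Linux-only sketch (the directory and threshold below are hypothetical):

        package main

        import (
            "fmt"
            "log"
            "syscall"
        )

        // diskUsage returns the used fraction of the filesystem holding path.
        func diskUsage(path string) (float64, error) {
            var stat syscall.Statfs_t
            if err := syscall.Statfs(path, &stat); err != nil {
                return 0, err
            }
            total := float64(stat.Blocks) * float64(stat.Bsize)
            avail := float64(stat.Bavail) * float64(stat.Bsize)
            return (total - avail) / total, nil
        }

        func main() {
            // Hypothetical spool directory and alerting threshold.
            const spoolDir, threshold = "/var/tmp/gazette", 0.85

            usage, err := diskUsage(spoolDir)
            if err != nil {
                log.Fatal(err)
            }
            if usage > threshold {
                fmt.Printf("disk %.0f%% full: alert the application owner rather than stopping writes\n", usage*100)
            }
        }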


  • gazctl: journals prune-forks

    As an operator, I want to be able to list and/or prune fragments in the backing stores which correspond to "dead end" forks in history and represent some time range or offset range.

    These forks commonly occur when Gazette pods are deleted, and they can create a lot of clutter in cloud storage. An operator may want to explicitly clean these up after an on-call issue, even though they have not exceeded the configured retention.

    • This functionality could be hooked into gazctl journals prune so that it automatically removes forked fragments, avoiding the caveat given in its description.
  • Use Kubernetes Service dns for service discovery.

    Prefer Kubernetes Service DNS over env vars for service discovery, because env vars aren't available across Namespaces or for ExternalName Services, and env vars are required at Pod startup, whereas DNS is more resilient to Endpoint updates.

    Remove 127.0.0.1 as the default for discovering Etcd and Gazette, and expect Kubernetes first. Finally, remove use of _SERVICE_HOST and _SERVICE_PORT, since they clobber DNS service discovery in favor of the env vars, which we don't prefer.
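
    As a sketch of what DNS-based discovery looks like from Go (the Service name and Namespace below are placeholders):

        package main

        import (
            "fmt"
            "log"
            "net"
        )

        func main() {
            // A Kubernetes Service is resolvable at <service>.<namespace>.svc.cluster.local.
            // Unlike env vars, this works across Namespaces, for ExternalName Services,
            // and reflects Endpoint updates made after the Pod started.
            addrs, err := net.LookupHost("etcd.default.svc.cluster.local")
            if err != nil {
                log.Fatal(err)
            }
            for _, a := range addrs {
                fmt.Println("etcd endpoint:", a)
            }
        }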


  • Make socks server endpoint a flag.

    Previously, the socks server endpoint was discovered from the environment variables SOCKS_SERVER_SERVICE_HOST and SOCKS_SERVER_SERVICE_PORT.

    This change keeps that behavior but also allows the endpoint to be configured via a CLI flag.

    This change is part of making DNS, rather than environment variables, the standard way to discover Kubernetes Services, because we use ExternalName Services for cross-Namespace service discovery and ExternalName Services don't have environment variable support.
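
    A minimal sketch of the pattern described -- a CLI flag whose default falls back to the legacy environment variables, so either mechanism works (the flag name and wiring are illustrative):

        package main

        import (
            "flag"
            "fmt"
            "os"
        )

        func main() {
            // Preserve the legacy env-var behavior as the default, while allowing an
            // explicit flag value (for example, a Kubernetes DNS name) to override it.
            def := os.Getenv("SOCKS_SERVER_SERVICE_HOST") + ":" + os.Getenv("SOCKS_SERVER_SERVICE_PORT")
            endpoint := flag.String("socksServer", def, "SOCKS server endpoint (host:port)")
            flag.Parse()

            fmt.Println("using SOCKS endpoint:", *endpoint)
        }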


  • task.Group refactor & other fixes

    This is a collection of patches addressing outstanding issues I'm aware of from recovery during recent net-splits.

    Testing:

    • I've done a bunch of flake hunting with unit tests (e.g. go test --count=X00) and feel pretty confident here.
    • I've run it through integration tests on a local stack. Plan to do more manual testing here, but wanted to kick this over now.

    Fixes #197, #198


  • fragment flush interval

    We want to run a process over Gazette fragments which should include all fragments from a previous 24-hour window (e.g. 12:00:00 AM PT - 11:59:59 PM PT).

    Today it's difficult to determine when to run the aforementioned process, because there is no guarantee as to when a spool created prior to the 24-hour window will close and upload its remaining events from that window to cloud storage. If the number of writes to the consumed topic is highly variable, the timing could be highly variable as well (I believe). Currently the only way to guarantee that we have included all fragments from a given window is to read fragments until we hit one with a timestamp after the window (and perhaps fail our process if no such fragment is read).

    The goal would be to have a process (perhaps a cron) which submits a request to the broker to close its current spool (even if said spool is still relatively small) and upload that spool to cloud storage. This would make it much easier to guarantee we are processing all fragments from a previous time frame.
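
    A rough sketch of the desired behavior -- roll the current spool whenever a flush-interval boundary has been crossed, even if the spool is small (the names here are conceptual, not the broker's internals):

        package main

        import (
            "fmt"
            "time"
        )

        // shouldFlush reports whether a spool opened at spoolStart has crossed a
        // flush-interval boundary and should be closed and uploaded, however small.
        func shouldFlush(spoolStart, now time.Time, interval time.Duration) bool {
            return now.Truncate(interval).After(spoolStart)
        }

        func main() {
            interval := 24 * time.Hour
            spoolStart := time.Date(2019, 3, 1, 23, 50, 0, 0, time.UTC)
            now := time.Date(2019, 3, 2, 0, 5, 0, 0, time.UTC)

            // The spool was opened before midnight UTC and it's now past midnight, so
            // it must be flushed for the previous day's window to be complete.
            fmt.Println("flush spool:", shouldFlush(spoolStart, now, interval))
        }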

  • INTL-549 Fetch S3 region dynamically

    @joshk0 could you take a look at this one please? Please let me know if I should tag someone else (although you seem to have contributed quite a lot to this file).
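
    For context, one way to fetch a bucket's region dynamically with the AWS SDK for Go (v1) is s3manager.GetBucketRegion; a sketch under the assumption that this is roughly the approach (the bucket name is a placeholder):

        package main

        import (
            "fmt"
            "log"

            "github.com/aws/aws-sdk-go/aws"
            "github.com/aws/aws-sdk-go/aws/session"
            "github.com/aws/aws-sdk-go/service/s3/s3manager"
        )

        func main() {
            sess := session.Must(session.NewSession())

            // GetBucketRegion queries S3 for the bucket's actual region; the final
            // argument is only a hint used to bootstrap the request.
            region, err := s3manager.GetBucketRegion(aws.BackgroundContext(), sess, "example-bucket", "us-east-1")
            if err != nil {
                log.Fatal(err)
            }
            fmt.Println("bucket region:", region)
        }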


  • Implement "gazctl {journals,shards} edit"

    I recommend combining commits for review.

    In addition to enhancing gazctl, I added CircleCI branch filters so that branches starting with "draft-" or "draft/" do not trigger builds. This gives developers a way to push partial work to GH without flooding their inbox with build-failure notifications.

    Connects #99


  • gazctl: shards prune

    As an operator, I want to provide a shard label selector, and have journal fragments of the matched ShardSpecs' recovery logs which are not required for recovering from any of a shard's current hints be deleted.

    (Add a gRPC API for fetching current hints?)

  • Identify and prune obsolete log fragments

    This PR adds a gazctl shard prune-log command, as well as a number of related underlying changes.

    Log pruning identifies Fragments of a hinted recovery log which can safely be removed, while preserving the ability to play back the pruned hints.

    The intention is that regular pruning would be operationalized by pointing gazctl at the .lastRecovered hints of consumer shards.
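
    Conceptually, a fragment is prunable when it lies entirely below the lowest log offset still referenced by the hints being preserved; a simplified sketch of that check (the real logic also accounts for the full set of hinted segments, and the types here are illustrative):

        package main

        import "fmt"

        // Fragment covers byte offsets [Begin, End) of a recovery log.
        type Fragment struct{ Begin, End int64 }

        // prunable returns fragments which end at or before the lowest offset still
        // referenced by preserved hints, and so can be deleted without breaking playback.
        func prunable(fragments []Fragment, minHintedOffset int64) []Fragment {
            var out []Fragment
            for _, f := range fragments {
                if f.End <= minHintedOffset {
                    out = append(out, f)
                }
            }
            return out
        }

        func main() {
            fragments := []Fragment{{0, 1 << 20}, {1 << 20, 3 << 20}, {3 << 20, 4 << 20}}
            // Suppose the preserved hints reference nothing before offset 3 MiB.
            fmt.Println("prunable fragments:", prunable(fragments, 3<<20))
        }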

    Extensive testing against the stream-sum example was performed by:

    • Heavily and continuously loading a local stack
    • Regularly deleting the stream-sum-summer consumer pods, to force playback and recovery.
    • Regularly using the gazctl shard prune-log command to prune both .lastRecovered and also primary hint paths.
    • Verifying that no playback errors occurred.

  • build(deps): bump certifi from 2019.9.11 to 2022.12.7 in /docs

    Bumps certifi from 2019.9.11 to 2022.12.7.

  • Cannot use `delete` in `common` section with `shards edit` subcommand

    The docs on deleting shard specs say that you can add `delete: true` under the `common` section. This does not work, though, since we unmarshal that field into a `pc.ShardSpec`, which does not contain the `delete` property.

    We should update either the docs or the code here.

  • Append Service Metrics

    Adds a Prometheus collector implementation to the broker/client package that allows tracking of disk usage across file-backed appendBuffers.

    If Append RPCs are retrying, or taking longer than normal, users can time out waits on the AsyncAppend and alert on the rate of these timeouts to indicate potential unhealthiness in the write path. While this is happening, it's useful to see the amount of data that producers have buffered on disk. Given limited disk resources, this is the key indicator of how much time the operator has to resolve any issues before there are real availability problems / data loss.
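
    A sketch of the kind of gauge this enables, using the Prometheus Go client (the metric name and the bufferedBytes helper are hypothetical stand-ins):

        package main

        import (
            "log"
            "net/http"

            "github.com/prometheus/client_golang/prometheus"
            "github.com/prometheus/client_golang/prometheus/promhttp"
        )

        // bufferedBytes stands in for summing the sizes of file-backed append buffers.
        func bufferedBytes() float64 { return 0 }

        func main() {
            // Export the current on-disk backlog so operators can alert on it alongside
            // the rate of timed-out AsyncAppend waits.
            gauge := prometheus.NewGaugeFunc(prometheus.GaugeOpts{
                Name: "append_service_buffered_bytes", // Hypothetical metric name.
                Help: "Bytes currently buffered on disk by the append service.",
            }, bufferedBytes)
            prometheus.MustRegister(gauge)

            http.Handle("/metrics", promhttp.Handler())
            log.Fatal(http.ListenAndServe(":8080", nil))
        }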


  • recovery: retry on fetching log spec

    Observed from a consumer application on a recent release, where Gazette and the consumer were rolling in tandem (the consumer happened to pick a Gazette broker that was exiting):

    {"err":"beginRecovery: fetching log spec: rpc error: code = Unavailable desc = error reading from server: read tcp 10.0.0.35:51576-\u003e10.3.247.101:8080: read: connection reset by peer","level":"error","msg":"serveStandby failed","shard":"/gazette/consumers/flow/reactor/items/ ... ","time":"2021-09-24T18:58:07Z"}
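
    The fix amounts to treating this as a transient failure. A hedged sketch of the retry shape, using gRPC status codes (fetchLogSpec is a placeholder for the actual RPC):

        package main

        import (
            "context"
            "errors"
            "log"
            "time"

            "google.golang.org/grpc/codes"
            "google.golang.org/grpc/status"
        )

        // fetchLogSpec is a placeholder for the RPC which fetches the recovery log's spec.
        func fetchLogSpec(ctx context.Context) error { return errors.New("not implemented") }

        // fetchLogSpecWithRetry retries transport-level Unavailable errors rather than
        // failing shard standby outright.
        func fetchLogSpecWithRetry(ctx context.Context) error {
            for {
                err := fetchLogSpec(ctx)
                if status.Code(err) != codes.Unavailable {
                    return err
                }
                select {
                case <-time.After(time.Second): // Transient broker unavailability; try again.
                case <-ctx.Done():
                    return ctx.Err()
                }
            }
        }

        func main() {
            ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
            defer cancel()
            log.Println(fetchLogSpecWithRetry(ctx))
        }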
    
  • service: retry on transport-level errors from Etcd within keyspace.Watch

    During a recent automated GKE upgrade, all brokers and Etcd pods were simultaneously signaled to exit (not ideal, but also not the issue at hand).

    Etcd pods exited, and on the way out Gazette brokers observed transport-level errors which were treated as terminal, and caused a controlled but fatal shutdown across all brokers (along with a pod restart):

    {"err":"service.Watch: rpc error: code = Unknown desc = closing transport due to: connection error: desc = \"error reading from server: EOF\", received prior goaway: code: NO_ERROR, debug data: ","level":"fatal","msg":"broker task failed","time":"2021-09-01T14:08:13Z"}
    

    The shutdown was controlled -- no data loss is believed or expected to have occurred -- but it did cause cluster consistency to be lost and require operator intervention (gazctl journals reset-head).

    What should happen instead

    Brokers should have retried the Etcd watch on this transport-level error.
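
    A sketch of the retry behavior described -- re-establishing the watch with backoff on transport-level errors instead of treating them as fatal (watchOnce stands in for the keyspace.Watch call):

        package main

        import (
            "context"
            "errors"
            "log"
            "time"
        )

        // watchOnce stands in for a single keyspace.Watch call against Etcd; it
        // returns when the underlying watch stream fails.
        func watchOnce(ctx context.Context) error { return errors.New("transport is closing") }

        // watchWithRetry re-establishes the watch with exponential backoff on failure,
        // rather than treating transport-level errors as fatal to the broker.
        func watchWithRetry(ctx context.Context) error {
            backoff := time.Second
            for {
                err := watchOnce(ctx)
                if ctx.Err() != nil {
                    return ctx.Err() // Shutting down; propagate cancellation.
                }
                log.Printf("watch failed (retrying in %s): %v", backoff, err)

                select {
                case <-time.After(backoff):
                case <-ctx.Done():
                    return ctx.Err()
                }
                if backoff < 30*time.Second {
                    backoff *= 2
                }
            }
        }

        func main() {
            ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
            defer cancel()
            log.Println("watch exited:", watchWithRetry(ctx))
        }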
