Dkron

Dkron - Distributed, fault-tolerant job scheduling system for cloud native environments

Website: http://dkron.io/

Dkron is a distributed cron service, easy to set up and fault tolerant, with a focus on:

  • Easy: Easy to use with a great UI
  • Reliable: Completely fault tolerant
  • Highly scalable: Able to handle high volumes of scheduled jobs and thousands of nodes

Dkron is written in Go and leverages the power of the Raft protocol and Serf to provide fault tolerance, reliability and scalability, while remaining simple and easy to install.

Dkron is inspired by the Google whitepaper Reliable Cron across the Planet and by Airbnb Chronos, borrowing the same features from it.

Dkron runs on Linux, macOS and Windows. It can be used to run scheduled commands on a server cluster using any combination of servers for each job. It has no single point of failure thanks to its use of the Gossip protocol and fault-tolerant distributed databases.

You can use Dkron to run the most important part of your company: scheduled jobs.

Installation

Installation instructions

Full, comprehensive documentation is viewable on the Dkron website

Development Quick start

The best way to test and develop Dkron is using Docker; you will need Docker installed before proceeding.

Clone the repository.

Next, run the included Docker Compose config:

docker-compose up

This will start the Dkron instances. To add more Dkron instances to the cluster:

docker-compose up --scale dkron-server=4
docker-compose up --scale dkron-agent=10

Check the port mapping using docker-compose ps and use your browser to navigate to the Dkron dashboard on one of the ports mapped by Compose.
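
Put together, a quick-start session looks roughly like this (a sketch; the repository URL matches the module path seen in the stack traces quoted further below, and the dashboard port is whatever Compose maps for you):

git clone https://github.com/distribworks/dkron.git
cd dkron
docker-compose up
docker-compose ps        # note the host port mapped to the dashboard
# then open http://localhost:<mapped-port> in your browser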

To add jobs to the system, read the API docs.
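
For example, a minimal sketch of creating a shell-executor job through the HTTP API (assuming the server listens on the default HTTP port 8080; the name, schedule and command are placeholders):

curl localhost:8080/v1/jobs -XPOST -d '{
  "name": "echo_job",
  "schedule": "@every 1m",
  "executor": "shell",
  "executor_config": {
    "command": "echo hello"
  }
}'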

Frontend development

The Dkron dashboard is built with React Admin as a single-page application.

To start developing the dashboard, enter the ui directory and run npm install to get the frontend dependencies, then start the local server with npm start. It should launch a new local web server and open a new browser window serving the web UI.

Make your changes to the code, then run make ui to generate the asset files. This embeds the web resources in the Go application.
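
Putting the frontend steps together (a rough sketch; paths are relative to the repository root):

cd ui
npm install        # fetch the frontend dependencies
npm start          # local dev server, opens the dashboard in the browser
# after making changes, back at the repository root:
cd ..
make ui            # regenerate the asset files embedded in the Go binary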

Resources

  • Chef cookbook: https://supermarket.chef.io/cookbooks/dkron
  • Python client library: https://github.com/oldmantaiter/pydkron
  • Ruby client: https://github.com/jobandtalent/dkron-rb
  • PHP client: https://github.com/gromo/dkron-php-adapter
  • Terraform provider: https://github.com/peertransfer/terraform-provider-dkron

Get in touch

Sponsor

This project is possible thanks to the support of Jobandtalent.

Comments
  • Failed jobs count increasing with new jobs added - v2.0.0-b5

    Use this template if you are reporting a bug. Don't use this template if you are proposing a feature.

    Expected Behavior

    When adding jobs that have not run yet, the failed jobs count should not increase.

    Actual Behavior

    Each new job added increases the failed jobs count.

    Steps to Reproduce the Problem

    1. Add a job via API
    2. Failed job count increases

    Specifications

    • Version: v2.0.0-b5
    • Platform: linux
    • Backend store: default
  • Skipped jobs

    Expected Behavior

    Jobs should be executed on time, without being missed.

    Actual Behavior

    Some of our jobs skip their execution once and are not executed again, because the "next execution time" never arrives (since it's in the past). So far it only happens to jobs whose schedule intervals are longer than 6 hours (I believe it's a coincidence, but it still won't hurt to mention).

    Steps to Reproduce the Problem

    1. Create a job and schedule it to run every day (@every 1440m).
    2. Check the next day whether it ran.

    Specifications

    • Version: 2.0.0-rc3
    • Platform: linux
    • Backend store: default boltDB
  • Executor not working

    Hey,

    I migrated from the command executor (since it was declared deprecated) to the shell executor, and I get the following issue from the agent:

    INFO[2018-05-23T19:08:20Z] agent: Starting job                           job=stats_update node=04fe58fe3cf0
    ERRO[2018-05-23T19:08:20Z] invoke: Specified executor is not present     executor="<nil>" node=04fe58fe3cf0
    

    with following config :

    {
        "name": "stats_update",
        "schedule": "@every 15s",
        "executor": "shell",
        "executor_config": {
            "command": "./stats_update",
            "shell": false
        },
        "concurrency": "forbid",
        "tags": {
            "role": "manager:1"
        }
    }
    

    I'm on 0.10.1 dkron version.

    Is my config bad, or is it a bug?

  • "Invalid memory address or nil pointer dereference" error

    Hey. I've got a bunch of instances running dkron with a Consul cluster as their backing store. I'm trying to update from Dkron 0.7 to 0.9.1. The instances that are running 0.9.1 throw this error:

    time="2016-10-25T14:52:07-04:00" level=info msg="agent: Listen for events" node=ip-10-0-15-163
    panic: runtime error: invalid memory address or nil pointer dereference
    [signal 0xb code=0x1 addr=0x0 pc=0x48897c]
    
    goroutine 50 [running]:
    panic(0xbb9dc0, 0xc820010060)
            /Users/victorcoder/src/github.com/mxcl/homebrew/Cellar/go/1.6.3/libexec/src/runtime/panic.go:481 +0x3e6
    github.com/victorcoder/dkron/dkron.(*AgentCommand).eventLoop(0xc82000c4e0)
            /Users/victorcoder/src/github.com/victorcoder/dkron/dkron/agent.go:490 +0x1abc
    created by github.com/victorcoder/dkron/dkron.(*AgentCommand).Run
            /Users/victorcoder/src/github.com/victorcoder/dkron/dkron/agent.go:298 +0x362
    Starting Dkron agent...
    time="2016-10-25T14:52:07-04:00" level=info msg="agent: Dkron agent starting" node=ip-10-0-15-163
    

    I'm not doing anything crazy here, I don't think. It's been working for months. Any ideas?

    More debug:

    // /etc/dkron/dkron.json
    {
      "backend": "consul",
      "backend_machine": "consul.internal:8500",
      "keyspace": "bkg-process",
      "http_addr": ":8988",
      "server": true,
      "encrypt": "KEY",
      "tags": {
        "role": "background_processing"
      },
      "join": "background-processing.service.dc1.consul"
    }
    
    $ curl 127.0.0.1:8988/v1/jobs
    [{"name":"cron_job","schedule":"0 * * * * *","shell":false,"command":"/queue.sh","owner":"Me,"owner_email":"[email protected]","success_count":254066,"error_count":0,"last_success":"2016-10-25T15:00:00.23316342-04:00","last_error":"0001-01-01T00:00:00Z","disabled":false,"tags":{"role":"background_processing:1"},"retries":0,"dependent_jobs":null,"parent_job":""}]
    
  • 0.11.2 Concurrency 'forbid' not always respected

    Originally posted to #377

    Hi @victorcoder

    I'm on 0.11.2 and have noticed that concurrency: forbid is not always respected. I have a very long-running process (nearly 2 hours) that I poll every 15 minutes. It is essential that only one instance is running at a time.

    I use Consul as the backend, so I thought that might be my issue, but I just completed a migration of the Consul cluster and the Dkron masters to greatly reduce the node count and the latency between the nodes. But Dkron seems to erroneously mark an event as complete (erroneously, because I can still see the task running via ps aux) and fires again at the next schedule. If I forcibly kill the event that was marked complete, it gets updated with the new completed timestamp.

    I cannot reliably reproduce the issue, as it seems random. Please let me know what I can do to help track down the cause.

    Geoff

  • Scheduled jobs stop execution after leader re-election

    My cluster consists of 3 server nodes (Dkron 3.1.7):

    [opc@nonprod-exa-services-0 ~]$ dkron version
    Name: Dkron
    Version: 3.1.7
    Codename: merichuas
    Agent Protocol: 5 (Understands back to: 2)
    

    All nodes are configured as follows:

    [opc@nonprod-exa-services-0 ~]$ cat /etc/dkron/dkron.yml
    server: true
    bootstrap-expect: 3
    join:
    - nonprod-exa-services-0.nonprodexasvc.nonprodvcn.oraclevcn.com
    - nonprod-exa-services-1.nonprodexasvc.nonprodvcn.oraclevcn.com
    - nonprod-exa-services-2.nonprodexasvc.nonprodvcn.oraclevcn.com
    log-level: debug
    

    This cluster has been running for ~3 weeks and has been able to execute scheduled jobs:

    [opc@nonprod-exa-services-0 ~]$ systemctl status dkron
    ● dkron.service - Dkron Agent
       Loaded: loaded (/usr/lib/systemd/system/dkron.service; disabled; vendor preset: disabled)
       Active: active (running) since Wed 2021-06-02 10:19:43 CEST; 3 weeks 1 days ago
         Docs: https://dkron.io
     Main PID: 6940 (dkron)
        Tasks: 95
       Memory: 202.0M
       CGroup: /system.slice/dkron.service
               ├─6940 /usr/bin/dkron agent
               ├─6945 /usr/bin/dkron-processor-files
               ├─6952 /usr/bin/dkron-processor-fluent
               ├─7049 /usr/bin/dkron-processor-log
               ├─7056 /usr/bin/dkron-processor-syslog
               ├─7062 /usr/bin/dkron-executor-gcppubsub
               ├─7069 /usr/bin/dkron-executor-http
               ├─7076 /usr/bin/dkron-executor-kafka
               ├─7083 /usr/bin/dkron-executor-nats
               ├─7088 /usr/bin/dkron-executor-rabbitmq
               └─7096 /usr/bin/dkron-executor-shell
    
    

    Issue details:

    One of the nodes is unable to contact the other 2 nodes (possibly an intermittent network issue), causing a re-election. After the re-election, none of the scheduled jobs are executed.

    Log snippet from one of the nodes. The complete logs from the 3 nodes are attached.

    o msg="2021/06/23 23:28:42 [DEBUG] memberlist: Initiating push/pull sync with: nonprod-exa-services-1 10.106.51.22:8946"
    o msg="2021-06-23T23:28:44.338+0200 [WARN]  raft: failed to contact: server-id=nonprod-exa-services-0 time=500.962849ms"
    o msg="2021-06-23T23:28:44.390+0200 [WARN]  raft: failed to contact: server-id=nonprod-exa-services-1 time=500.680357ms"
    o msg="2021-06-23T23:28:44.390+0200 [WARN]  raft: failed to contact: server-id=nonprod-exa-services-0 time=552.570599ms"
    o msg="2021-06-23T23:28:44.390+0200 [WARN]  raft: failed to contact quorum of nodes, stepping down"
    o msg="2021-06-23T23:28:44.390+0200 [INFO]  raft: entering follower state: follower=\"Node at 10.106.51.29:6868 [Follower]\" leader="
    o msg="2021-06-23T23:28:44.390+0200 [INFO]  raft: aborting pipeline replication: peer=\"{Voter nonprod-exa-services-1 10.106.51.22:6868}\""
    o msg="2021-06-23T23:28:44.390+0200 [INFO]  raft: aborting pipeline replication: peer=\"{Voter nonprod-exa-services-0 10.106.51.18:6868}\""
    ug msg="dkron: shutting down leader loop" node=nonprod-exa-services-2
    ug msg="scheduler: Stopping scheduler" node=nonprod-exa-services-2
    o msg="dkron: cluster leadership lost" node=nonprod-exa-services-2
    o msg="dkron: monitoring leadership" node=nonprod-exa-services-2
    o msg="2021-06-23T23:28:46.085+0200 [WARN]  raft: heartbeat timeout reached, starting election: last-leader="
    o msg="2021-06-23T23:28:46.085+0200 [INFO]  raft: entering candidate state: node=\"Node at 10.106.51.29:6868 [Candidate]\" term=54"
    o msg="2021-06-23T23:28:46.092+0200 [DEBUG] raft: votes: needed=2"
    o msg="2021-06-23T23:28:46.092+0200 [DEBUG] raft: vote granted: from=nonprod-exa-services-2 term=54 tally=1"
    o msg="2021-06-23T23:28:46.093+0200 [DEBUG] raft: vote granted: from=nonprod-exa-services-1 term=54 tally=2"
    o msg="2021-06-23T23:28:46.093+0200 [INFO]  raft: election won: tally=2"
    o msg="2021-06-23T23:28:46.094+0200 [INFO]  raft: entering leader state: leader=\"Node at 10.106.51.29:6868 [Leader]\""
    o msg="2021-06-23T23:28:46.094+0200 [INFO]  raft: added peer, starting replication: peer=nonprod-exa-services-1"
    o msg="2021-06-23T23:28:46.094+0200 [INFO]  raft: added peer, starting replication: peer=nonprod-exa-services-0"
    o msg="dkron: cluster leadership acquired" node=nonprod-exa-services-2
    o msg="dkron: monitoring leadership" node=nonprod-exa-services-2
    o msg="2021-06-23T23:28:46.095+0200 [INFO]  raft: pipelining replication: peer=\"{Voter nonprod-exa-services-1 10.106.51.22:6868}\""
    o msg="2021-06-23T23:28:46.097+0200 [INFO]  raft: pipelining replication: peer=\"{Voter nonprod-exa-services-0 10.106.51.18:6868}\""
    o msg="agent: Starting scheduler" node=nonprod-exa-services-2
    
  • Cluster panic

    Some time after start, all nodes of my cluster (two servers and two agents) crash with the same error:

    panic: runtime error: invalid memory address or nil pointer dereference
    [signal SIGSEGV: segmentation violation code=0x1 addr=0x160 pc=0x948a8a]
    
    goroutine 623378 [running]:
    google.golang.org/grpc.(*Server).Stop(0x0)
            /home/runner/go/pkg/mod/google.golang.org/[email protected]/server.go:1482 +0x4a
    github.com/distribworks/dkron/v3/plugin.(*ExecutorClient).Execute(0xc00091e6a0, 0xc00b231640, 0x357c300, 0xc000e27820, 0xc0011bc980, 0x25bb801, 0xc0044c6150)
            /home/runner/work/dkron/dkron/plugin/executor.go:65 +0x190
    github.com/distribworks/dkron/v3/dkron.(*AgentServer).AgentRun(0xc0001295b8, 0xc0044c6060, 0x35e2620, 0xc003374e20, 0x0, 0x0)
            /home/runner/work/dkron/dkron/dkron/grpc_agent.go:83 +0x6f4
    github.com/distribworks/dkron/v3/plugin/types._Agent_AgentRun_Handler(0x2561d80, 0xc0001295b8, 0x35dcda0, 0xc0002186c0, 0x4ba5ae0, 0xc000e2a200)
            /home/runner/work/dkron/dkron/plugin/types/dkron.pb.go:1862 +0x109
    google.golang.org/grpc.(*Server).processStreamingRPC(0xc000a12b60, 0x35ea7e0, 0xc0019b4300, 0xc000e2a200, 0xc000b56150, 0x4b45e20, 0x0, 0x0, 0x0)
            /home/runner/go/pkg/mod/google.golang.org/[email protected]/server.go:1329 +0xcbc
    google.golang.org/grpc.(*Server).handleStream(0xc000a12b60, 0x35ea7e0, 0xc0019b4300, 0xc000e2a200, 0x0)
            /home/runner/go/pkg/mod/google.golang.org/[email protected]/server.go:1409 +0xc64
    google.golang.org/grpc.(*Server).serveStreams.func1.1(0xc001ab37e0, 0xc000a12b60, 0x35ea7e0, 0xc0019b4300, 0xc000e2a200)
            /home/runner/go/pkg/mod/google.golang.org/[email protected]/server.go:746 +0xa1
    created by google.golang.org/grpc.(*Server).serveStreams.func1
            /home/runner/go/pkg/mod/google.golang.org/[email protected]/server.go:744 +0xa1
    runtime: note: your Linux kernel may be buggy
    runtime: note: see https://golang.org/wiki/LinuxKernelSignalVectorBug
    runtime: note: mlock workaround for kernel bug failed with errno 12
    

    Environment

    • Dkron 3.0.5

    • Docker (servers: Dockerfile, run script, agents: Dockerfile, run script)

    • uname -a:

        Linux dkron-server-1.backpack.test 5.4.0-48-generic #52-Ubuntu SMP Thu Sep 10 10:58:49 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
      
  • Skipped Jobs

    1. A job is not picked for the next run and so misses all subsequent runs.
    2. When a job is run manually using the play button on the Dkron dashboard, the next run time it picks is the current time (I have been using the @every construct of the cronspec).
    3. When using the API docs to create new jobs, the first run is executed successfully, but the next run time is not updated, so all subsequent runs fail.

    Steps to reproduce the behavior:

    1. Create a new job using the @every construct of the cronspec; the second and following runs fail.
    2. Use the play button on the dashboard; the next run time will be picked as the current time.
    • OS: CentOS
    • Version: 2 rc9
  • How to group jobs

    I would like to know if there is a way to group jobs, like a custom tag or something. My requirement is that I need to create several jobs for a user, and at some point I need to delete all the jobs which were not executed. I tried to do it using "tags" (tags: { userId: "123" }), but then the http executor does not run, although the job is created.
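
    For context, a sketch of one way to attach a grouping tag, under an assumption worth checking: judging by the executions quoted elsewhere on this page, job tags are also used to select which agents run the job (see the "Filtered tags to run" log lines), so a grouping tag on a job only matches agents started with the same tag. The names and values below are hypothetical:

    # start the agent carrying the grouping tag (the --tag flag also appears in the swarm stack files below)
    dkron agent --tag userId=123

    # create the job with the same tag; the ":1" suffix asks for one matching node, as in the configs quoted above
    curl localhost:8080/v1/jobs -XPOST -d '{
      "name": "user_123_job",
      "schedule": "@every 1h",
      "executor": "shell",
      "executor_config": { "command": "echo cleanup" },
      "tags": { "userId": "123:1" }
    }'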

  • Filter jobs by tags

    Hello guys,

    let me introduce this little change which could be useful for somebody else.

    From now on we can assign labels to jobs and query them (GET /jobs) with a labels filter. This change could be useful for users who have to build their own UI and display background jobs themselves. One use case is when you have different kinds of jobs, such as system jobs and application-related jobs, and you don't want the app to see jobs that are not assigned to it.

    Our exact case is that we have 3 types of jobs and 3 different UIs to display the jobs and their status. The jobs should not know about each other. Moreover, we have those 3 UIs built for 12 countries, which leads to 36 UIs. Before this change, we used prefixes like country-app_name-actual_name_of_the_cron, and it's getting annoying now.

    Feel free to comment and accept ;)

    Good luck


    UPDATE: 2018-08-29

    No more labels, use tags to filter jobs.

  • Scheduled jobs sometimes not being executed

    Describe the bug: Hi, there's an issue that I can't pinpoint and I need help with. We have around 50 jobs that are scheduled at different times, and we're using the master version. Sometimes, for no reason (so far), random jobs at random times are not executed, despite visibly being scheduled. I've modified the logs a bit to see the flow of a job from the time it is scheduled to the end of execution. During the issue the flow is the following:

    time="2020-04-21T02:00:00Z" level=debug msg="scheduler: Run job" job=sync_local_ip_list_dms_0 node=dkron-server-1 schedule="0 0 2 * * *"
    time="2020-04-21T02:00:00Z" level=debug msg="store: Retrieved job from datastore" job=sync_local_ip_list_dms_0 node=dkron-server-1
    time="2020-04-21T02:00:00Z" level=info msg="agent: Sending query" job_name=sync_local_ip_list_dms_0 node=dkron-server-1 query="run:job"
    time="2020-04-21T02:00:00Z" level=debug msg="store: Setting job" job=sync_local_ip_list_dms_0 node=dkron-server-1
    time="2020-04-21T02:00:00Z" level=debug msg="store: Retrieved job from datastore" job=sync_local_ip_list_dms_0 node=dkron-server-1
    time="2020-04-21T02:00:00Z" level=debug msg="agent: Filtered nodes to run" job_name=sync_local_ip_list_dms_0 job_name_from_function=sync_local_ip_list_dms_0 node=dkron-server-1 nodes="[dms-dkron-kapp-5dbf554656-vr99b]"
    time="2020-04-21T02:00:00Z" level=debug msg="agent: Filtered tags to run" job_name=sync_local_ip_list_dms_0 job_name_from_function=sync_local_ip_list_dms_0 node=dkron-server-1 tags="map[dms:cron region:global]"
    time="2020-04-21T02:00:00Z" level=debug msg="agent: Sending query" job_name=sync_local_ip_list_dms_0 json="{\"execution\":{\"job_name\":\"sync_local_ip_list_dms_0\",\"started_at\":\"0001-01-01T00:00:00Z\",\"finished_at\":\"0001-01-01T00:00:00Z\",\"success\":false,\"group\":1587434400017697229,\"attempt\":1},\"rpc_addr\":\"172.25.4.133:6868\"}" node=dkron-server-1 query="run:job"
    time="2020-04-21T02:00:08Z" level=debug msg="agent: Done receiving acks and responses" fields.time=8.014626936s job=sync_local_ip_list_dms_0 node=dkron-server-1 query="run:job"
    

    In normal cases, before the "Done receiving acks and responses" step, there are also "agent: Received ack" and "agent: Received response" steps.

    It looks like the agent node chosen in the "Filtered nodes to run" step is, for some reason, ignoring or not receiving the broadcast query.

    From the agent node's point of view, there is simply no "Running job" log message for the given job, though there are log messages from serf like:

    time="2020-04-21T02:00:00Z" level=info msg="2020/04/21 02:00:00 [DEBUG] serf: messageQueryType: run:job"
    

    Due to the intense workload, it's hard to tell whether that log message was generated by the missed job.

    What can we do to pinpoint the problem and fix it?

    Expected behavior: No misses!

  • chore(deps): bump github.com/nats-io/nats.go from 1.21.0 to 1.22.1

    Bumps github.com/nats-io/nats.go from 1.21.0 to 1.22.1.

    Release notes

    Sourced from github.com/nats-io/nats.go's releases.

    Release v1.22.1

    Changelog

    Changed

    • Service API:
      • Monitoring subjects for a service are no longer uppercase (#1166)
      • Changed RequestHandler signature to no longer return an error (#1166)

    Complete Changes

    https://github.com/nats-io/nats.go/compare/v1.22.0...v1.22.1

    Release v1.22.0

    Changelog

    Overview

    This release adds a beta implementation of micro package, which provides API for creating and monitoring microservices on top of NATS connection.

    Added

    • Service API beta implementation (#1160)
    • Getters for connection callbacks (#1162)

    Complete Changes

    https://github.com/nats-io/nats.go/compare/v1.21.0...v1.22.0

    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
  • chore(deps): bump github.com/gin-gonic/gin from 1.8.1 to 1.8.2

    Bumps github.com/gin-gonic/gin from 1.8.1 to 1.8.2.

    Release notes

    Sourced from github.com/gin-gonic/gin's releases.

    v1.8.2

    Changelog

    Bug fixes

    • 0c2a691 fix(engine): missing route params for CreateTestContext (#2778) (#2803)
    • e305e21 fix(route): redirectSlash bug (#3227)

    Others

    • 6a2a260 Fix the GO-2022-1144 vulnerability (#3432)
    Changelog

    Sourced from github.com/gin-gonic/gin's changelog.

    Gin v1.8.2

    Bugs

    • fix(route): redirectSlash bug (#3227)
    • fix(engine): missing route params for CreateTestContext (#2778) (#2803)

    Security

    • Fix the GO-2022-1144 vulnerability (#3432)
    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
  • Bringing up on Docker Swarm doesn't work

    Describe the bug: When trying to bring up a simple cluster (1 server / 1 agent) on our production Docker swarm cluster, the agent can never connect to the server; it even tries connecting to the wrong IP address. Doing the exact same thing on our standalone test box (minus the deploy restrictions in the compose file) works flawlessly. I even added a placement restriction for the swarm to force them onto the same node, and it doesn't work.

    To Reproduce: production swarm stack file:

    version: '3.8'

    services:
      dkron-server:
        image: centricdocker.azurecr.io/thirdparty/dkron
        ports:
          - "10120:8080"
        volumes:
          - /var/apps/docker-shared/shared-dkron/root:/root
          - /media/scripts:/media/scripts
          - /var/apps/docker-shared/shared-dkron/dkron-server/data:/data
        command: dkron agent --server --log-level=debug --bootstrap-expect=1 --data-dir=/data --node-name dkron-primary
        stop_grace_period: 3m
        deploy:
          replicas: 1
          placement:
            constraints:
              - node.labels.dkron==allowed

      dkron-agent:
        image: centricdocker.azurecr.io/thirdparty/dkron
        depends_on:
          - dkron-server
        volumes:
          - /var/apps/docker-shared/shared-dkron/root:/root
          - /media/scripts:/media/scripts
        command: dkron agent --retry-join=dkron-server:8946 --log-level=debug --tag agent=true
        stop_grace_period: 3m
        deploy:
          replicas: 1
          placement:
            constraints:
              - node.labels.dkron==allowed

    Does not work.

    Standalone server stack file:

    version: '3.8'

    services:
      dkron-server:
        image: centricdocker.azurecr.io/thirdparty/dkron
        ports:
          - "10120:8080"
        volumes:
          - /var/apps/docker-shared/shared-dkron/root:/root
          - /media/scripts:/media/scripts
          - /var/apps/docker-shared/shared-dkron/dkron-server/data:/data
        command: dkron agent --server --log-level=debug --bootstrap-expect=1 --data-dir=/data --node-name dkron-primary
        stop_grace_period: 3m

      dkron-agent:
        image: centricdocker.azurecr.io/thirdparty/dkron
        depends_on:
          - dkron-server
        volumes:
          - /var/apps/docker-shared/shared-dkron/root:/root
          - /media/scripts:/media/scripts
        command: dkron agent --retry-join=dkron-server:8946 --log-level=debug --tag agent=true
        stop_grace_period: 3m

    This works perfectly.

    Expected behavior: The swarm should work and the agent should try to connect to the correct internal IP of 10.0.61.9, but it is trying to connect to 10.0.61.2 instead. The server/web UI is up.

    Screenshots (attached): web UI, stack layout, server container IP, server container logs, agent container logs.

    Specifications:

    • OS: Linux
    • Version: Latest
  • chore(deps): bump github.com/hashicorp/go-plugin from 1.4.5 to 1.4.8

    Bumps github.com/hashicorp/go-plugin from 1.4.5 to 1.4.8.

    Release notes

    Sourced from github.com/hashicorp/go-plugin's releases.

    v1.4.8

    BUG FIXES:

    v1.4.7

    ENHANCEMENTS:

    • More detailed error message on plugin start failure: [GH-223]

    v1.4.6

    BUG FIXES:

    • server: Prevent gRPC broker goroutine leak when using GRPCServer type GracefulStop() or Stop() methods [GH-220]
    Changelog

    Sourced from github.com/hashicorp/go-plugin's changelog.

    v1.4.8

    BUG FIXES:

    v1.4.7

    ENHANCEMENTS:

    • More detailed error message on plugin start failure: [GH-223]

    v1.4.6

    BUG FIXES:

    • server: Prevent gRPC broker goroutine leak when using GRPCServer type GracefulStop() or Stop() methods [GH-220]
    Commits
    • 5a212b5 Merge pull request #228 from hashicorp/release-1.4.8
    • a1e6a8e Add release notes for v1.4.8
    • 7dc9726 Merge pull request #227 from hashicorp/fix-win-build
    • 06d4d53 import go-plugin in minimal.go
    • 9d19a83 move non windows additional notes to new file
    • e8d389f Merge pull request #224 from hashicorp/release-1.4.7
    • 2d8efb0 Add release notes for v1.4.7
    • 5ecc4fb Merge pull request #223 from hashicorp/vault-9832/better-plugin-error-message
    • b616f4e Add additional notes about users and groups
    • b47fd64 Update client.go
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
  • chore(deps): bump github.com/prometheus/client_golang from 1.13.0 to 1.14.0

    Bumps github.com/prometheus/client_golang from 1.13.0 to 1.14.0.

    Release notes

    Sourced from github.com/prometheus/client_golang's releases.

    1.14.0 / 2022-11-08

    It might look like a small release, but it's quite opposite 😱 There were many non user facing changes and fixes and enormous work from engineers from Grafana to add native histograms in 💪🏾 Enjoy! 😍

    What's Changed

    • [FEATURE] Add Support for Native Histograms. #1150
    • [CHANGE] Extend prometheus.Registry to implement prometheus.Collector interface. #1103

    New Contributors

    Full Changelog: https://github.com/prometheus/client_golang/compare/v1.13.1...v1.14.0

    1.13.1 / 2022-11-02

    • [BUGFIX] Fix race condition with Exemplar in Counter. #1146
    • [BUGFIX] Fix CumulativeCount value of +Inf bucket created from exemplar. #1148
    • [BUGFIX] Fix double-counting bug in promhttp.InstrumentRoundTripperCounter. #1118

    Full Changelog: https://github.com/prometheus/client_golang/compare/v1.13.0...v1.13.1

    Changelog

    Sourced from github.com/prometheus/client_golang's changelog.

    1.14.0 / 2022-11-08

    • [FEATURE] Add Support for Native Histograms. #1150
    • [CHANGE] Extend prometheus.Registry to implement prometheus.Collector interface. #1103

    1.13.1 / 2022-11-01

    • [BUGFIX] Fix race condition with Exemplar in Counter. #1146
    • [BUGFIX] Fix CumulativeCount value of +Inf bucket created from exemplar. #1148
    • [BUGFIX] Fix double-counting bug in promhttp.InstrumentRoundTripperCounter. #1118
    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
  • chore(deps): bump github.com/spf13/viper from 1.13.0 to 1.14.0

    Bumps github.com/spf13/viper from 1.13.0 to 1.14.0.

    Release notes

    Sourced from github.com/spf13/viper's releases.

    v1.14.0

    What's Changed

    Enhancements 🚀

    Breaking Changes 🛠

    Dependency Updates ⬆️

    Full Changelog: https://github.com/spf13/viper/compare/v1.13.0...v1.14.0

    Commits
    • b89e554 chore: update crypt
    • db9f89a chore: disable watch on appengine
    • 4b8d148 refactor: use new Has fsnotify method for event matching
    • 2e99a57 refactor: rename watch file to unsupported
    • dcb7f30 feat: fix compilation for all platforms unsupported by fsnotify
    • 2e04739 ci: drop dedicated wasm build
    • b2234f2 ci: add build for aix
    • 52009d3 feat: disable watcher on aix
    • b274f63 build(deps): bump github.com/fsnotify/fsnotify from 1.5.4 to 1.6.0
    • 7c62cfd build(deps): bump github.com/stretchr/testify from 1.8.0 to 1.8.1
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)