Dkron

Dkron - Distributed, fault-tolerant job scheduling system for cloud native environments

Website: http://dkron.io/

Dkron is a distributed cron service, easy to set up and fault tolerant, with a focus on:

  • Easy: Easy to use with a great UI
  • Reliable: Completely fault tolerant
  • Highly scalable: Able to handle high volumes of scheduled jobs and thousands of nodes

Dkron is written in Go and leverages the power of the Raft protocol and Serf to provide fault tolerance, reliability and scalability, while remaining simple and easy to install.

Dkron is inspired by the Google whitepaper Reliable Cron across the Planet and by Airbnb Chronos, borrowing the same features from it.

Dkron runs on Linux, macOS and Windows. It can be used to run scheduled commands on a server cluster using any combination of servers for each job. It has no single point of failure thanks to its use of the Gossip protocol and fault-tolerant distributed databases.

You can use Dkron to run the most important part of your company: scheduled jobs.

Installation

Installation instructions

Full, comprehensive documentation is viewable on the Dkron website

Development Quick start

The best way to test and develop Dkron is using Docker; you will need Docker installed before proceeding.

Clone the repository.

Next, run the included Docker Compose config:

docker-compose up

This will start the Dkron instances. To add more Dkron instances to the cluster:

docker-compose up --scale dkron-server=4
docker-compose up --scale dkron-agent=10

Check the port mapping using docker-compose ps and use your browser to navigate to the Dkron dashboard on one of the ports mapped by Compose.
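
Put together, a quick-start session looks roughly like this (a sketch; the repository URL matches the module path seen in the stack traces quoted further below, and the dashboard port is whatever Compose maps for you):

git clone https://github.com/distribworks/dkron.git
cd dkron
docker-compose up
docker-compose ps        # note the host port mapped to the dashboard
# then open http://localhost:<mapped-port> in your browser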

To add jobs to the system, read the API docs.
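
For example, a minimal sketch of creating a shell-executor job through the HTTP API (assuming the server listens on the default HTTP port 8080; the name, schedule and command are placeholders):

curl localhost:8080/v1/jobs -XPOST -d '{
  "name": "echo_job",
  "schedule": "@every 1m",
  "executor": "shell",
  "executor_config": {
    "command": "echo hello"
  }
}'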

Frontend development

The Dkron dashboard is built with React Admin as a single-page application.

To start developing the dashboard, enter the ui directory and run npm install to get the frontend dependencies, then start the local server with npm start. It should launch a new local web server and open a new browser window serving the web UI.

Make your changes to the code, then run make ui to generate the asset files. This embeds the web resources in the Go application.
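
Putting the frontend steps together (a rough sketch; paths are relative to the repository root):

cd ui
npm install        # fetch the frontend dependencies
npm start          # local dev server, opens the dashboard in the browser
# after making changes, back at the repository root:
cd ..
make ui            # regenerate the asset files embedded in the Go binary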

Resources

  • Chef cookbook: https://supermarket.chef.io/cookbooks/dkron
  • Python client library: https://github.com/oldmantaiter/pydkron
  • Ruby client: https://github.com/jobandtalent/dkron-rb
  • PHP client: https://github.com/gromo/dkron-php-adapter
  • Terraform provider: https://github.com/peertransfer/terraform-provider-dkron

Get in touch

Sponsor

This project is possible thanks to the support of Jobandtalent.

Comments
  • Failed jobs count increasing with new jobs added - v2.0.0-b5

    Use this template if you are reporting a bug. Don't use this template if you are proposing a feature.

    Expected Behavior

    When adding jobs that have not run yet, the failed jobs count should not increase.

    Actual Behavior

    Each new job added increases the failed jobs count.

    Steps to Reproduce the Problem

    1. Add a job via API
    2. Failed job count increases

    Specifications

    • Version: v2.0.0-b5
    • Platform: linux
    • Backend store: default
  • Skipped jobs

    Expected Behavior

    Jobs should be executed on time, without being missed.

    Actual Behavior

    Some of our jobs skip their execution once and are not executed again, because the "next execution time" never arrives (since it's in the past). So far it only happens to jobs whose schedule intervals are longer than 6 hours (I believe it's a coincidence, but it still won't hurt to mention).

    Steps to Reproduce the Problem

    1. Create a job and schedule it to run every day (@every 1440m).
    2. Check the next day whether it ran.

    Specifications

    • Version: 2.0.0-rc3
    • Platform: linux
    • Backend store: default boltDB
  • Executor not working

    Hey,

    I migrated from the command executor (since it was declared deprecated) to the shell executor, and I get the following issue from the agent:

    INFO[2018-05-23T19:08:20Z] agent: Starting job                           job=stats_update node=04fe58fe3cf0
    ERRO[2018-05-23T19:08:20Z] invoke: Specified executor is not present     executor="<nil>" node=04fe58fe3cf0
    

    with following config :

    {
        "name": "stats_update",
        "schedule": "@every 15s",
        "executor": "shell",
        "executor_config": {
            "command": "./stats_update",
            "shell": false
        },
        "concurrency": "forbid",
        "tags": {
            "role": "manager:1"
        }
    }
    

    I'm on 0.10.1 dkron version.

    Is my config bad, or is it a bug?

  • "Invalid memory address or nil pointer dereference" error

    Hey. I've got a bunch of instances running dkron with a Consul cluster as their backing store. I'm trying to update from Dkron 0.7 to 0.9.1. The instances that are running 0.9.1 throw this error:

    time="2016-10-25T14:52:07-04:00" level=info msg="agent: Listen for events" node=ip-10-0-15-163
    panic: runtime error: invalid memory address or nil pointer dereference
    [signal 0xb code=0x1 addr=0x0 pc=0x48897c]
    
    goroutine 50 [running]:
    panic(0xbb9dc0, 0xc820010060)
            /Users/victorcoder/src/github.com/mxcl/homebrew/Cellar/go/1.6.3/libexec/src/runtime/panic.go:481 +0x3e6
    github.com/victorcoder/dkron/dkron.(*AgentCommand).eventLoop(0xc82000c4e0)
            /Users/victorcoder/src/github.com/victorcoder/dkron/dkron/agent.go:490 +0x1abc
    created by github.com/victorcoder/dkron/dkron.(*AgentCommand).Run
            /Users/victorcoder/src/github.com/victorcoder/dkron/dkron/agent.go:298 +0x362
    Starting Dkron agent...
    time="2016-10-25T14:52:07-04:00" level=info msg="agent: Dkron agent starting" node=ip-10-0-15-163
    

    I'm not doing anything crazy here, I don't think. It's been working for months. Any ideas?

    More debug:

    // /etc/dkron/dkron.json
    {
      "backend": "consul",
      "backend_machine": "consul.internal:8500",
      "keyspace": "bkg-process",
      "http_addr": ":8988",
      "server": true,
      "encrypt": "KEY",
      "tags": {
        "role": "background_processing"
      },
      "join": "background-processing.service.dc1.consul"
    }
    
    $ curl 127.0.0.1:8988/v1/jobs
    [{"name":"cron_job","schedule":"0 * * * * *","shell":false,"command":"/queue.sh","owner":"Me,"owner_email":"[email protected]","success_count":254066,"error_count":0,"last_success":"2016-10-25T15:00:00.23316342-04:00","last_error":"0001-01-01T00:00:00Z","disabled":false,"tags":{"role":"background_processing:1"},"retries":0,"dependent_jobs":null,"parent_job":""}]
    
  • 0.11.2 Concurrency 'forbid' not always respected

    Originally posted to #377

    Hi @victorcoder

    I'm on 0.11.2 and have noticed that concurrency: forbid is not always respected. I have a very long-running process (nearly 2 hours) that I poll every 15 minutes. It is essential that only one instance is running at a time.

    I use Consul as the backend, so I thought that might be my issue, but I just completed a migration of the Consul cluster and the Dkron masters to greatly reduce the node count and the latency between the nodes. But Dkron seems to erroneously mark an event as complete (erroneously, because I can still see the task running via ps aux) and fires again at the next schedule. If I forcibly kill the event that was marked complete, it gets updated with the new completed timestamp.

    I cannot reliably reproduce the issue, as it seems random. Please let me know what I can do to help track down the cause.

    Geoff

  • Scheduled jobs stop execution after leader re-election

    My cluster consists of 3 server nodes (Dkron 3.1.7):

    [opc@nonprod-exa-services-0 ~]$ dkron version
    Name: Dkron
    Version: 3.1.7
    Codename: merichuas
    Agent Protocol: 5 (Understands back to: 2)
    

    All nodes are configured as follows:

    [opc@nonprod-exa-services-0 ~]$ cat /etc/dkron/dkron.yml
    server: true
    bootstrap-expect: 3
    join:
    - nonprod-exa-services-0.nonprodexasvc.nonprodvcn.oraclevcn.com
    - nonprod-exa-services-1.nonprodexasvc.nonprodvcn.oraclevcn.com
    - nonprod-exa-services-2.nonprodexasvc.nonprodvcn.oraclevcn.com
    log-level: debug
    

    This cluster has been running for ~3 weeks and has been able to execute scheduled jobs:

    [opc@nonprod-exa-services-0 ~]$ systemctl status dkron
    ● dkron.service - Dkron Agent
       Loaded: loaded (/usr/lib/systemd/system/dkron.service; disabled; vendor preset: disabled)
       Active: active (running) since Wed 2021-06-02 10:19:43 CEST; 3 weeks 1 days ago
         Docs: https://dkron.io
     Main PID: 6940 (dkron)
        Tasks: 95
       Memory: 202.0M
       CGroup: /system.slice/dkron.service
               ├─6940 /usr/bin/dkron agent
               ├─6945 /usr/bin/dkron-processor-files
               ├─6952 /usr/bin/dkron-processor-fluent
               ├─7049 /usr/bin/dkron-processor-log
               ├─7056 /usr/bin/dkron-processor-syslog
               ├─7062 /usr/bin/dkron-executor-gcppubsub
               ├─7069 /usr/bin/dkron-executor-http
               ├─7076 /usr/bin/dkron-executor-kafka
               ├─7083 /usr/bin/dkron-executor-nats
               ├─7088 /usr/bin/dkron-executor-rabbitmq
               └─7096 /usr/bin/dkron-executor-shell
    
    

    Issue details:

    One of the nodes is unable to contact the other 2 nodes (possibly an intermittent network issue), causing a re-election. After the re-election, none of the scheduled jobs are executed.

    Log snippet from one of the nodes. The complete logs from the 3 nodes are attached.

    o msg="2021/06/23 23:28:42 [DEBUG] memberlist: Initiating push/pull sync with: nonprod-exa-services-1 10.106.51.22:8946"
    o msg="2021-06-23T23:28:44.338+0200 [WARN]  raft: failed to contact: server-id=nonprod-exa-services-0 time=500.962849ms"
    o msg="2021-06-23T23:28:44.390+0200 [WARN]  raft: failed to contact: server-id=nonprod-exa-services-1 time=500.680357ms"
    o msg="2021-06-23T23:28:44.390+0200 [WARN]  raft: failed to contact: server-id=nonprod-exa-services-0 time=552.570599ms"
    o msg="2021-06-23T23:28:44.390+0200 [WARN]  raft: failed to contact quorum of nodes, stepping down"
    o msg="2021-06-23T23:28:44.390+0200 [INFO]  raft: entering follower state: follower=\"Node at 10.106.51.29:6868 [Follower]\" leader="
    o msg="2021-06-23T23:28:44.390+0200 [INFO]  raft: aborting pipeline replication: peer=\"{Voter nonprod-exa-services-1 10.106.51.22:6868}\""
    o msg="2021-06-23T23:28:44.390+0200 [INFO]  raft: aborting pipeline replication: peer=\"{Voter nonprod-exa-services-0 10.106.51.18:6868}\""
    ug msg="dkron: shutting down leader loop" node=nonprod-exa-services-2
    ug msg="scheduler: Stopping scheduler" node=nonprod-exa-services-2
    o msg="dkron: cluster leadership lost" node=nonprod-exa-services-2
    o msg="dkron: monitoring leadership" node=nonprod-exa-services-2
    o msg="2021-06-23T23:28:46.085+0200 [WARN]  raft: heartbeat timeout reached, starting election: last-leader="
    o msg="2021-06-23T23:28:46.085+0200 [INFO]  raft: entering candidate state: node=\"Node at 10.106.51.29:6868 [Candidate]\" term=54"
    o msg="2021-06-23T23:28:46.092+0200 [DEBUG] raft: votes: needed=2"
    o msg="2021-06-23T23:28:46.092+0200 [DEBUG] raft: vote granted: from=nonprod-exa-services-2 term=54 tally=1"
    o msg="2021-06-23T23:28:46.093+0200 [DEBUG] raft: vote granted: from=nonprod-exa-services-1 term=54 tally=2"
    o msg="2021-06-23T23:28:46.093+0200 [INFO]  raft: election won: tally=2"
    o msg="2021-06-23T23:28:46.094+0200 [INFO]  raft: entering leader state: leader=\"Node at 10.106.51.29:6868 [Leader]\""
    o msg="2021-06-23T23:28:46.094+0200 [INFO]  raft: added peer, starting replication: peer=nonprod-exa-services-1"
    o msg="2021-06-23T23:28:46.094+0200 [INFO]  raft: added peer, starting replication: peer=nonprod-exa-services-0"
    o msg="dkron: cluster leadership acquired" node=nonprod-exa-services-2
    o msg="dkron: monitoring leadership" node=nonprod-exa-services-2
    o msg="2021-06-23T23:28:46.095+0200 [INFO]  raft: pipelining replication: peer=\"{Voter nonprod-exa-services-1 10.106.51.22:6868}\""
    o msg="2021-06-23T23:28:46.097+0200 [INFO]  raft: pipelining replication: peer=\"{Voter nonprod-exa-services-0 10.106.51.18:6868}\""
    o msg="agent: Starting scheduler" node=nonprod-exa-services-2
    
  • Cluster panic

    Some time after start, all nodes of my cluster (two servers and two agents) crash with the same error:

    panic: runtime error: invalid memory address or nil pointer dereference
    [signal SIGSEGV: segmentation violation code=0x1 addr=0x160 pc=0x948a8a]
    
    goroutine 623378 [running]:
    google.golang.org/grpc.(*Server).Stop(0x0)
            /home/runner/go/pkg/mod/google.golang.org/[email protected]/server.go:1482 +0x4a
    github.com/distribworks/dkron/v3/plugin.(*ExecutorClient).Execute(0xc00091e6a0, 0xc00b231640, 0x357c300, 0xc000e27820, 0xc0011bc980, 0x25bb801, 0xc0044c6150)
            /home/runner/work/dkron/dkron/plugin/executor.go:65 +0x190
    github.com/distribworks/dkron/v3/dkron.(*AgentServer).AgentRun(0xc0001295b8, 0xc0044c6060, 0x35e2620, 0xc003374e20, 0x0, 0x0)
            /home/runner/work/dkron/dkron/dkron/grpc_agent.go:83 +0x6f4
    github.com/distribworks/dkron/v3/plugin/types._Agent_AgentRun_Handler(0x2561d80, 0xc0001295b8, 0x35dcda0, 0xc0002186c0, 0x4ba5ae0, 0xc000e2a200)
            /home/runner/work/dkron/dkron/plugin/types/dkron.pb.go:1862 +0x109
    google.golang.org/grpc.(*Server).processStreamingRPC(0xc000a12b60, 0x35ea7e0, 0xc0019b4300, 0xc000e2a200, 0xc000b56150, 0x4b45e20, 0x0, 0x0, 0x0)
            /home/runner/go/pkg/mod/google.golang.org/[email protected]/server.go:1329 +0xcbc
    google.golang.org/grpc.(*Server).handleStream(0xc000a12b60, 0x35ea7e0, 0xc0019b4300, 0xc000e2a200, 0x0)
            /home/runner/go/pkg/mod/google.golang.org/[email protected]/server.go:1409 +0xc64
    google.golang.org/grpc.(*Server).serveStreams.func1.1(0xc001ab37e0, 0xc000a12b60, 0x35ea7e0, 0xc0019b4300, 0xc000e2a200)
            /home/runner/go/pkg/mod/google.golang.org/[email protected]/server.go:746 +0xa1
    created by google.golang.org/grpc.(*Server).serveStreams.func1
            /home/runner/go/pkg/mod/google.golang.org/[email protected]/server.go:744 +0xa1
    runtime: note: your Linux kernel may be buggy
    runtime: note: see https://golang.org/wiki/LinuxKernelSignalVectorBug
    runtime: note: mlock workaround for kernel bug failed with errno 12
    

    Environment

    • Dkron 3.0.5

    • Docker (servers: Dockerfile, run script, agents: Dockerfile, run script)

    • uname -a:

        Linux dkron-server-1.backpack.test 5.4.0-48-generic #52-Ubuntu SMP Thu Sep 10 10:58:49 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
      
  • Skipped Jobs

    1. A job is not picked for the next run and so misses all subsequent runs.
    2. When a job is run manually using the play button on the Dkron dashboard, the next run time it picks is the current time (I have been using the @every construct of the cronspec).
    3. When using the API docs to create new jobs, the first run is executed successfully, but the next run time is not updated, so all subsequent runs fail.

    Steps to reproduce the behavior:

    1. Create a new job using the @every construct of the cronspec; the second and following runs fail.
    2. Use the play button on the dashboard; the next run time will be picked as the current time.
    • OS: CentOS
    • Version: 2 rc9
  • How to group jobs

    I would like to know if there is a way to group jobs, like a custom tag or something. My requirement is that I need to create several jobs for a user, and at some point I need to delete all the jobs which were not executed. I tried to do it using "tags" (tags: { userId: "123" }), but then the http executor does not run, although the job is created.
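
    For context, a sketch of one way to attach a grouping tag, under an assumption worth checking: judging by the executions quoted elsewhere on this page, job tags are also used to select which agents run the job (see the "Filtered tags to run" log lines), so a grouping tag on a job only matches agents started with the same tag. The names and values below are hypothetical:

    # start the agent carrying the grouping tag (the --tag flag also appears in the swarm stack files below)
    dkron agent --tag userId=123

    # create the job with the same tag; the ":1" suffix asks for one matching node, as in the configs quoted above
    curl localhost:8080/v1/jobs -XPOST -d '{
      "name": "user_123_job",
      "schedule": "@every 1h",
      "executor": "shell",
      "executor_config": { "command": "echo cleanup" },
      "tags": { "userId": "123:1" }
    }'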

  • Filter jobs by tags

    Hello guys,

    let me introduce this little change which could be useful for somebody else.

    From now on we can assign labels to jobs and query them (GET /jobs) with a labels filter. This change could be useful for users who have to build their own UI and display background jobs themselves. One use case is when you have different kinds of jobs, such as system jobs and application-related jobs, and you don't want the app to see jobs that are not assigned to it.

    Our exact case is that we have 3 types of jobs and 3 different UIs to display the jobs and their status. The jobs should not know about each other. Moreover, we have those 3 UIs built for 12 countries, which leads to 36 UIs. Before this change, we used prefixes like country-app_name-actual_name_of_the_cron, and it's getting annoying now.

    Feel free to comment and accept ;)

    Good luck


    UPDATE: 2018-08-29

    No more labels, use tags to filter jobs.

  • Scheduled jobs sometimes not being executed

    Describe the bug: Hi, there's an issue that I can't pinpoint and I need help with. We have around 50 jobs that are scheduled at different times, and we're using the master version. Sometimes, for no reason (so far), random jobs at random times are not executed, despite visibly being scheduled. I've modified the logs a bit to see the flow of a job from the time it is scheduled to the end of execution. During the issue the flow is the following:

    time="2020-04-21T02:00:00Z" level=debug msg="scheduler: Run job" job=sync_local_ip_list_dms_0 node=dkron-server-1 schedule="0 0 2 * * *"
    time="2020-04-21T02:00:00Z" level=debug msg="store: Retrieved job from datastore" job=sync_local_ip_list_dms_0 node=dkron-server-1
    time="2020-04-21T02:00:00Z" level=info msg="agent: Sending query" job_name=sync_local_ip_list_dms_0 node=dkron-server-1 query="run:job"
    time="2020-04-21T02:00:00Z" level=debug msg="store: Setting job" job=sync_local_ip_list_dms_0 node=dkron-server-1
    time="2020-04-21T02:00:00Z" level=debug msg="store: Retrieved job from datastore" job=sync_local_ip_list_dms_0 node=dkron-server-1
    time="2020-04-21T02:00:00Z" level=debug msg="agent: Filtered nodes to run" job_name=sync_local_ip_list_dms_0 job_name_from_function=sync_local_ip_list_dms_0 node=dkron-server-1 nodes="[dms-dkron-kapp-5dbf554656-vr99b]"
    time="2020-04-21T02:00:00Z" level=debug msg="agent: Filtered tags to run" job_name=sync_local_ip_list_dms_0 job_name_from_function=sync_local_ip_list_dms_0 node=dkron-server-1 tags="map[dms:cron region:global]"
    time="2020-04-21T02:00:00Z" level=debug msg="agent: Sending query" job_name=sync_local_ip_list_dms_0 json="{\"execution\":{\"job_name\":\"sync_local_ip_list_dms_0\",\"started_at\":\"0001-01-01T00:00:00Z\",\"finished_at\":\"0001-01-01T00:00:00Z\",\"success\":false,\"group\":1587434400017697229,\"attempt\":1},\"rpc_addr\":\"172.25.4.133:6868\"}" node=dkron-server-1 query="run:job"
    time="2020-04-21T02:00:08Z" level=debug msg="agent: Done receiving acks and responses" fields.time=8.014626936s job=sync_local_ip_list_dms_0 node=dkron-server-1 query="run:job"
    

    In normal cases, before the "Done receiving acks and responses" step, there are also "agent: Received ack" and "agent: Received response" steps.

    It looks like the agent node chosen in the "Filtered nodes to run" step is, for some reason, ignoring or not receiving the broadcast query.

    From the agent node's point of view, there is simply no "Running job" log message for the given job, though there are log messages from serf like:

    time="2020-04-21T02:00:00Z" level=info msg="2020/04/21 02:00:00 [DEBUG] serf: messageQueryType: run:job"
    

    Due to the intense workload, it's hard to tell whether that log message was generated by the missed job.

    What can we do to pinpoint the problem and fix it?

    Expected behavior: No misses!

  • chore(deps): bump github.com/nats-io/nats.go from 1.21.0 to 1.22.1

    Bumps github.com/nats-io/nats.go from 1.21.0 to 1.22.1.

    Release notes

    Sourced from github.com/nats-io/nats.go's releases.

    Release v1.22.1

    Changelog

    Changed

    • Service API:
      • Monitoring subjects for a service are no longer uppercase (#1166)
      • Changed RequestHandler signature to no longer return an error (#1166)

    Complete Changes

    https://github.com/nats-io/nats.go/compare/v1.22.0...v1.22.1

    Release v1.22.0

    Changelog

    Overview

    This release adds a beta implementation of micro package, which provides API for creating and monitoring microservices on top of NATS connection.

    Added

    • Service API beta implementation (#1160)
    • Getters for connection callbacks (#1162)

    Complete Changes

    https://github.com/nats-io/nats.go/compare/v1.21.0...v1.22.0

    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
  • chore(deps): bump github.com/gin-gonic/gin from 1.8.1 to 1.8.2

    Bumps github.com/gin-gonic/gin from 1.8.1 to 1.8.2.

    Release notes

    Sourced from github.com/gin-gonic/gin's releases.

    v1.8.2

    Changelog

    Bug fixes

    • 0c2a691 fix(engine): missing route params for CreateTestContext (#2778) (#2803)
    • e305e21 fix(route): redirectSlash bug (#3227)

    Others

    • 6a2a260 Fix the GO-2022-1144 vulnerability (#3432)
    Changelog

    Sourced from github.com/gin-gonic/gin's changelog.

    Gin v1.8.2

    Bugs

    • fix(route): redirectSlash bug (#3227)
    • fix(engine): missing route params for CreateTestContext (#2778) (#2803)

    Security

    • Fix the GO-2022-1144 vulnerability (#3432)
    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
  • Bringing up on Docker Swarm doesn't work

    Describe the bug: When trying to bring up a simple cluster (1 server / 1 agent) on our production Docker swarm cluster, the agent can never connect to the server; it even tries connecting to the wrong IP address. Doing the exact same thing on our standalone test box (minus the deploy restrictions in the compose file) works flawlessly. I even added a placement restriction for the swarm to force them onto the same node, and it doesn't work.

    To Reproduce: production swarm stack file:

    version: '3.8'

    services:
      dkron-server:
        image: centricdocker.azurecr.io/thirdparty/dkron
        ports:
          - "10120:8080"
        volumes:
          - /var/apps/docker-shared/shared-dkron/root:/root
          - /media/scripts:/media/scripts
          - /var/apps/docker-shared/shared-dkron/dkron-server/data:/data
        command: dkron agent --server --log-level=debug --bootstrap-expect=1 --data-dir=/data --node-name dkron-primary
        stop_grace_period: 3m
        deploy:
          replicas: 1
          placement:
            constraints:
              - node.labels.dkron==allowed

      dkron-agent:
        image: centricdocker.azurecr.io/thirdparty/dkron
        depends_on:
          - dkron-server
        volumes:
          - /var/apps/docker-shared/shared-dkron/root:/root
          - /media/scripts:/media/scripts
        command: dkron agent --retry-join=dkron-server:8946 --log-level=debug --tag agent=true
        stop_grace_period: 3m
        deploy:
          replicas: 1
          placement:
            constraints:
              - node.labels.dkron==allowed

    Does not work.

    Standalone server stack file:

    version: '3.8'

    services:
      dkron-server:
        image: centricdocker.azurecr.io/thirdparty/dkron
        ports:
          - "10120:8080"
        volumes:
          - /var/apps/docker-shared/shared-dkron/root:/root
          - /media/scripts:/media/scripts
          - /var/apps/docker-shared/shared-dkron/dkron-server/data:/data
        command: dkron agent --server --log-level=debug --bootstrap-expect=1 --data-dir=/data --node-name dkron-primary
        stop_grace_period: 3m

      dkron-agent:
        image: centricdocker.azurecr.io/thirdparty/dkron
        depends_on:
          - dkron-server
        volumes:
          - /var/apps/docker-shared/shared-dkron/root:/root
          - /media/scripts:/media/scripts
        command: dkron agent --retry-join=dkron-server:8946 --log-level=debug --tag agent=true
        stop_grace_period: 3m

    This works perfectly.

    Expected behavior: The swarm should work and the agent should try to connect to the correct internal IP of 10.0.61.9, but it is trying to connect to 10.0.61.2 instead. The server/web UI is up.

    Screenshots (attached): web UI, stack layout, server container IP, server container logs, agent container logs.

    Specifications:

    • OS: Linux
    • Version: Latest
  • chore(deps): bump github.com/hashicorp/go-plugin from 1.4.5 to 1.4.8

    Bumps github.com/hashicorp/go-plugin from 1.4.5 to 1.4.8.

    Release notes

    Sourced from github.com/hashicorp/go-plugin's releases.

    v1.4.8

    BUG FIXES:

    v1.4.7

    ENHANCEMENTS:

    • More detailed error message on plugin start failure: [GH-223]

    v1.4.6

    BUG FIXES:

    • server: Prevent gRPC broker goroutine leak when using GRPCServer type GracefulStop() or Stop() methods [GH-220]
    Changelog

    Sourced from github.com/hashicorp/go-plugin's changelog.

    v1.4.8

    BUG FIXES:

    v1.4.7

    ENHANCEMENTS:

    • More detailed error message on plugin start failure: [GH-223]

    v1.4.6

    BUG FIXES:

    • server: Prevent gRPC broker goroutine leak when using GRPCServer type GracefulStop() or Stop() methods [GH-220]
    Commits
    • 5a212b5 Merge pull request #228 from hashicorp/release-1.4.8
    • a1e6a8e Add release notes for v1.4.8
    • 7dc9726 Merge pull request #227 from hashicorp/fix-win-build
    • 06d4d53 import go-plugin in minimal.go
    • 9d19a83 move non windows additional notes to new file
    • e8d389f Merge pull request #224 from hashicorp/release-1.4.7
    • 2d8efb0 Add release notes for v1.4.7
    • 5ecc4fb Merge pull request #223 from hashicorp/vault-9832/better-plugin-error-message
    • b616f4e Add additional notes about users and groups
    • b47fd64 Update client.go
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
  • chore(deps): bump github.com/prometheus/client_golang from 1.13.0 to 1.14.0

    Bumps github.com/prometheus/client_golang from 1.13.0 to 1.14.0.

    Release notes

    Sourced from github.com/prometheus/client_golang's releases.

    1.14.0 / 2022-11-08

    It might look like a small release, but it's quite opposite 😱 There were many non user facing changes and fixes and enormous work from engineers from Grafana to add native histograms in 💪🏾 Enjoy! 😍

    What's Changed

    • [FEATURE] Add Support for Native Histograms. #1150
    • [CHANGE] Extend prometheus.Registry to implement prometheus.Collector interface. #1103

    New Contributors

    Full Changelog: https://github.com/prometheus/client_golang/compare/v1.13.1...v1.14.0

    1.13.1 / 2022-11-02

    • [BUGFIX] Fix race condition with Exemplar in Counter. #1146
    • [BUGFIX] Fix CumulativeCount value of +Inf bucket created from exemplar. #1148
    • [BUGFIX] Fix double-counting bug in promhttp.InstrumentRoundTripperCounter. #1118

    Full Changelog: https://github.com/prometheus/client_golang/compare/v1.13.0...v1.13.1

    Changelog

    Sourced from github.com/prometheus/client_golang's changelog.

    1.14.0 / 2022-11-08

    • [FEATURE] Add Support for Native Histograms. #1150
    • [CHANGE] Extend prometheus.Registry to implement prometheus.Collector interface. #1103

    1.13.1 / 2022-11-01

    • [BUGFIX] Fix race condition with Exemplar in Counter. #1146
    • [BUGFIX] Fix CumulativeCount value of +Inf bucket created from exemplar. #1148
    • [BUGFIX] Fix double-counting bug in promhttp.InstrumentRoundTripperCounter. #1118
    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
  • chore(deps): bump github.com/spf13/viper from 1.13.0 to 1.14.0

    Bumps github.com/spf13/viper from 1.13.0 to 1.14.0.

    Release notes

    Sourced from github.com/spf13/viper's releases.

    v1.14.0

    What's Changed

    Enhancements 🚀

    Breaking Changes 🛠

    Dependency Updates ⬆️

    Full Changelog: https://github.com/spf13/viper/compare/v1.13.0...v1.14.0

    Commits
    • b89e554 chore: update crypt
    • db9f89a chore: disable watch on appengine
    • 4b8d148 refactor: use new Has fsnotify method for event matching
    • 2e99a57 refactor: rename watch file to unsupported
    • dcb7f30 feat: fix compilation for all platforms unsupported by fsnotify
    • 2e04739 ci: drop dedicated wasm build
    • b2234f2 ci: add build for aix
    • 52009d3 feat: disable watcher on aix
    • b274f63 build(deps): bump github.com/fsnotify/fsnotify from 1.5.4 to 1.6.0
    • 7c62cfd build(deps): bump github.com/stretchr/testify from 1.8.0 to 1.8.1
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)