
Job Scheduler


Summary

Prototype job worker service that provides an API to run arbitrary Linux processes.

Overview

Library

The library (Worker) is a reusable Golang package that interacts with the Linux OS to execute arbitrary processes (Jobs). The Worker is responsible for the business logic to start and stop processes, stream the process output, and handle process errors.

The Worker keeps the process status in memory, in a local map, and updates it when the process finishes. Since the Job state is not persisted, the data is lost if the Worker goes down.

Most of the time, users want to see the full log content to check whether the job performed as expected, issuing another API call to stream the output. The Worker should therefore write the process output (stdout/stderr) to disk as a log file. On the other hand, old log files consume disk space, which can crash the system when no space is left. To address this, we can implement log rotation, a purge policy, or use a distributed file system (like Amazon S3) to keep the system healthy. For now, the logs will be stored under the /tmp folder, but the log folder should be parameterized in the configuration file.

// Command is a job request with the program name and arguments.
type Command struct {
    // Command name
    Name string
    // Command arguments
    Args []string
}

// Job represents an arbitrary Linux process scheduled by the Worker.
type Job struct {
    // ID job identifier
    ID string
    // Cmd is the underlying command that was started
    Cmd *exec.Cmd
    // Status of the process.
    Status *Status
}

// Status of the process.
type Status struct {
    // Process identifier
    Pid int
    // ExitCode of the exited process, or -1 if the process hasn't
    // exited or was terminated by a signal
    ExitCode int
    // Exited reports whether the program has exited
    Exited bool
}

// Worker defines the basic operations to manage Jobs.
type Worker interface {
    // Start creates a Linux process.
    //    - command: command to be executed 
    // It returns the job ID and the execution error encountered.
    Start(command Command) (jobID string, err error)
    // Stop a running Job by killing the underlying process.
    //    - ID: Job identifier
    // It returns the execution error encountered.
    Stop(jobID string) (err error)
    // Query a Job to check the current status.
    //    - ID: Job identifier
    // It returns process status and the execution error encountered.
    Query(jobID string) (status Status, err error)
    // Stream the process output.
    //    - ctx: context to cancel the log stream
    //    - ID: Job identifier
    // It returns a chan to stream process stdout/stderr and the
    // execution error encountered.
    Stream(ctx context.Context, jobID string) (logchan chan string, err error)
}
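
To make the lifecycle above concrete, here is a minimal sketch of how Start could be implemented on top of this interface. The workerImpl type, its logsDir field, and the use of github.com/google/uuid for job IDs are illustrative assumptions, not the actual implementation.

package worker

import (
    "os"
    "os/exec"
    "path/filepath"
    "sync"

    "github.com/google/uuid"
)

// workerImpl is an illustrative in-memory implementation of the Worker interface.
type workerImpl struct {
    mu      sync.RWMutex
    jobs    map[string]*Job
    logsDir string // e.g. /tmp; should come from the configuration file
}

// Start launches the process, redirects stdout/stderr to a log file on disk,
// and tracks the Job in the in-memory map.
func (w *workerImpl) Start(command Command) (string, error) {
    jobID := uuid.NewString()

    logFile, err := os.Create(filepath.Join(w.logsDir, jobID+".log"))
    if err != nil {
        return "", err
    }

    cmd := exec.Command(command.Name, command.Args...)
    cmd.Stdout = logFile
    cmd.Stderr = logFile

    if err := cmd.Start(); err != nil {
        logFile.Close()
        return "", err
    }

    job := &Job{ID: jobID, Cmd: cmd, Status: &Status{Pid: cmd.Process.Pid, ExitCode: -1}}

    w.mu.Lock()
    w.jobs[jobID] = job
    w.mu.Unlock()

    // Update the status asynchronously when the process exits.
    go func() {
        defer logFile.Close()
        cmd.Wait()

        w.mu.Lock()
        job.Status.ExitCode = cmd.ProcessState.ExitCode()
        job.Status.Exited = true
        w.mu.Unlock()
    }()

    return jobID, nil
}

Stop, Query, and Stream would follow the same pattern, looking the Job up in the map under the read lock.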

API

The API is a gRPC server that interacts with the Worker Library to start/stop/query processes and is consumed by the remote Client. The API is also responsible for authentication, authorization, and secure communication between the client and server.

syntax = "proto3";
option go_package = "github.com/renatoaguimaraes/job-scheduler/worker/internal/proto";

message StartRequest {
  string name = 1;
  repeated string args = 2;
}

message StartResponse {
  string jobID = 1;
}

message StopRequest {
  string jobID = 1;
}

message StopResponse {
}

message QueryRequest {
  string jobID = 1;
}

message QueryResponse {
  int32 pid = 1;
  int32 exitCode = 2;
  bool exited = 3;
}

message StreamRequest {
  string jobID = 1;
}

message StreamResponse {
  string output = 1;
}

service WorkerService {
  rpc Start(StartRequest) returns (StartResponse);
  rpc Stop(StopRequest) returns (StopResponse);
  rpc Query(QueryRequest) returns (QueryResponse);
  rpc Stream(StreamRequest) returns (stream StreamResponse);
}
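
To illustrate how the server glues the generated stubs to the library, below is a hedged sketch of the Start handler. The workerServer type, the internal package paths, and the Go identifiers produced by protoc-gen-go are assumptions for this example.

package api

import (
    "context"

    "github.com/renatoaguimaraes/job-scheduler/worker/internal/proto"
    "github.com/renatoaguimaraes/job-scheduler/worker/internal/worker"
)

// workerServer adapts the Worker library to the generated gRPC service.
type workerServer struct {
    proto.UnimplementedWorkerServiceServer
    worker worker.Worker
}

// Start translates the gRPC request into a library call and returns the job ID.
func (s *workerServer) Start(ctx context.Context, req *proto.StartRequest) (*proto.StartResponse, error) {
    jobID, err := s.worker.Start(worker.Command{Name: req.Name, Args: req.Args})
    if err != nil {
        return nil, err
    }
    return &proto.StartResponse{JobID: jobID}, nil
}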

Client

The Client is a command line interface through which the user schedules remote Linux jobs. The CLI communicates with the Worker Library through the gRPC API.
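
For illustration, the start subcommand could be implemented roughly as below. The proto import path follows the go_package option from the API section, the generated stub names are as produced by protoc-gen-go, and the connection is assumed to be created with the mTLS credentials described under Security.

package main

import (
    "context"
    "fmt"
    "time"

    "google.golang.org/grpc"

    "github.com/renatoaguimaraes/job-scheduler/worker/internal/proto"
)

// startJob sends a StartRequest and prints the returned job ID, mirroring the
// "worker-client start" subcommand (illustrative sketch).
func startJob(conn *grpc.ClientConn, name string, args []string) error {
    client := proto.NewWorkerServiceClient(conn)

    ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
    defer cancel()

    resp, err := client.Start(ctx, &proto.StartRequest{Name: name, Args: args})
    if err != nil {
        return err
    }

    fmt.Printf("Job %s is started\n", resp.JobID)
    return nil
}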

Security

Transport

Transport Layer Security (TLS) version 1.3 provides privacy and data integrity for the communication between the client and server.
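
On the client side, restricting the connection to TLS 1.3 can be done in the tls.Config used to build the gRPC transport credentials. The helper below is a sketch; the function name and certificate file paths are placeholders, not the project's actual layout.

package client

import (
    "crypto/tls"
    "crypto/x509"
    "os"

    "google.golang.org/grpc/credentials"
)

// newClientTLS builds transport credentials that pin TLS 1.3, present the client
// certificate, and trust only the Server CA (illustrative sketch).
func newClientTLS(certFile, keyFile, serverCAFile string) (credentials.TransportCredentials, error) {
    cert, err := tls.LoadX509KeyPair(certFile, keyFile)
    if err != nil {
        return nil, err
    }

    caPEM, err := os.ReadFile(serverCAFile)
    if err != nil {
        return nil, err
    }
    pool := x509.NewCertPool()
    pool.AppendCertsFromPEM(caPEM)

    return credentials.NewTLS(&tls.Config{
        MinVersion:   tls.VersionTLS13, // enforce TLS 1.3
        Certificates: []tls.Certificate{cert},
        RootCAs:      pool,
    }), nil
}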

Authentication

The authentication will be provided by mTLS. The following assets will be generated, using OpenSSL v1.1.1k, to support the authentication and authorization scheme:

  • Server CA private key and self-signed certificate
  • Server private key and certificate signing request (CSR)
  • Server signed certificate, based on Server CA private key and Server CSR
  • Client CA private key and self-signed certificate
  • Client private key and certificate signing request (CSR)
  • Client signed certificate, based on Client CA private key and Client CSR

The authentication process checks the certificate signature: the validator finds a CA certificate whose subject field matches the issuer field of the target certificate and, once the proper authority certificate is found, checks the signature on the target certificate using the public key in the CA certificate. If the signature check fails, the certificate is invalid and the connection will not be established. Both client and server execute the same process to validate each other. Intermediate certificates won't be used.
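
On the server side, this mutual validation maps to a tls.Config that presents the server certificate and requires a client certificate signed by the Client CA. This is a sketch under the same assumptions (placeholder helper name and file paths, no intermediates).

package api

import (
    "crypto/tls"
    "crypto/x509"
    "os"

    "google.golang.org/grpc/credentials"
)

// newServerTLS requires and verifies a client certificate issued by the Client CA
// (illustrative sketch).
func newServerTLS(certFile, keyFile, clientCAFile string) (credentials.TransportCredentials, error) {
    cert, err := tls.LoadX509KeyPair(certFile, keyFile)
    if err != nil {
        return nil, err
    }

    caPEM, err := os.ReadFile(clientCAFile)
    if err != nil {
        return nil, err
    }
    pool := x509.NewCertPool()
    pool.AppendCertsFromPEM(caPEM)

    return credentials.NewTLS(&tls.Config{
        MinVersion:   tls.VersionTLS13,
        Certificates: []tls.Certificate{cert},
        ClientCAs:    pool,                           // validate client certificates against the Client CA
        ClientAuth:   tls.RequireAndVerifyClientCert, // reject connections without a valid client certificate
    }), nil
}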

Authorization

The user roles will be added to the client certificate as an extension, so the gRPC server interceptors can read and check the roles to authorize the user. The available roles are reader and writer: the writer is authorized to perform the start, stop, query, and stream operations, while the reader is authorized only to perform the query and stream operations available on the API. An in-memory map will keep the name of each gRPC method and the respective roles authorized to execute it. The X.509 v3 extensions will be used to add the user role to the certificate. For that, the extension attribute roleOid 1.2.840.10070.8.1 = ASN1:UTF8String must be requested in the Certificate Signing Request (CSR) when the user certificate is created, and the user roles must be informed when the CA signs the CSR. The information after UTF8String: is encoded inside the X.509 certificate under the given OID.
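
A hedged sketch of how a unary interceptor could enforce this is shown below; the extension parsing, the method-to-roles map, and the package layout are assumptions about one possible wiring, not the project's actual code. The fully qualified method names assume the proto file declares no package, as in the definition above.

package api

import (
    "context"
    "encoding/asn1"

    "google.golang.org/grpc"
    "google.golang.org/grpc/codes"
    "google.golang.org/grpc/credentials"
    "google.golang.org/grpc/peer"
    "google.golang.org/grpc/status"
)

// roleOID identifies the custom X.509 v3 extension that carries the user role.
var roleOID = asn1.ObjectIdentifier{1, 2, 840, 10070, 8, 1}

// methodRoles maps gRPC methods to the roles allowed to call them (illustrative).
var methodRoles = map[string][]string{
    "/WorkerService/Start":  {"writer"},
    "/WorkerService/Stop":   {"writer"},
    "/WorkerService/Query":  {"reader", "writer"},
    "/WorkerService/Stream": {"reader", "writer"},
}

// UnaryAuthInterceptor reads the role extension from the verified client
// certificate and checks it against the method/role map.
func UnaryAuthInterceptor(ctx context.Context, req interface{}, info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (interface{}, error) {
    p, ok := peer.FromContext(ctx)
    if !ok {
        return nil, status.Error(codes.Unauthenticated, "no peer information")
    }
    tlsInfo, ok := p.AuthInfo.(credentials.TLSInfo)
    if !ok || len(tlsInfo.State.VerifiedChains) == 0 {
        return nil, status.Error(codes.Unauthenticated, "no verified client certificate")
    }
    cert := tlsInfo.State.VerifiedChains[0][0]

    var role string
    for _, ext := range cert.Extensions {
        if ext.Id.Equal(roleOID) {
            // The extension value is a DER-encoded UTF8String.
            if _, err := asn1.Unmarshal(ext.Value, &role); err != nil {
                return nil, status.Error(codes.PermissionDenied, "invalid role extension")
            }
            break
        }
    }

    for _, allowed := range methodRoles[info.FullMethod] {
        if role == allowed {
            return handler(ctx, req)
        }
    }
    return nil, status.Error(codes.PermissionDenied, "role not authorized for "+info.FullMethod)
}

The StreamInterceptor would apply the same check to the streaming RPCs.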

gRPC interceptors

  • UnaryInterceptor
  • StreamInterceptor

Certificates

  • X.509
  • Signature Algorithm: sha256WithRSAEncryption
  • Public Key Algorithm: rsaEncryption
  • RSA Public-Key: (4096 bit)
  • roleOid 1.2.840.10070.8.1 = ASN1:UTF8String (for the client certificate)

Scalability

For now, the client will connect to just a single Worker node. Nevertheless, for a production-grade system, the best choice is the External Load Balancer approach. The Worker API can run in cluster mode with several worker instances distributed over several nodes/containers. On the other hand, the Client must send all requests for a specific job to the same Worker, creating an affinity with that instance for later interactions.

Proxy

  • Pros: simple client; no client-side awareness of the backend
  • Cons: the LB is in the data path; higher latency; LB throughput may limit scalability

Client Side

  • Pros: high performance because of the elimination of the extra hop
  • Cons: complex client; the client keeps track of server load and health; the client implements the load-balancing algorithm; per-language implementation and maintenance burden; the client needs to be trusted, or the trust boundary needs to be handled by a lookaside LB

Trade-offs

The proxy load balancer distributes each RPC call to one of the available backend servers that implement the actual logic for serving the call. The proxy model is inefficient when considering heavy-traffic services like storage. Client-side load balancing is aware of multiple backend servers and chooses one to use for each RPC. For the client-side strategy, there are two models available: the thick client and the external load balancer. The thick client places more of the load-balancing logic in the client; the list of servers is either statically configured in the client or provided by the name resolution system. External load balancing is the primary mechanism for load balancing in gRPC, where an external load balancer provides simple clients with an up-to-date list of servers.

Architecture

(Architecture diagram)

Test

$ make test

Build and run API

$ make api
go build -o ./bin/worker-api cmd/api/main.go
$ ./bin/worker-api

Build and run Client

$ make client
go build -o ./bin/worker-client cmd/client/main.go
$ ./bin/worker-client start "bash" "-c" "while true; do date; sleep 1; done"
Job 9a8cb077-22da-488f-98b4-d2fb51ba4fc9 is started
$ ./bin/worker-client query 9a8cb077-22da-488f-98b4-d2fb51ba4fc9
Pid: 1494556 Exit code: 0 Exited: false
$ ./bin/worker-client stream 9a8cb077-22da-488f-98b4-d2fb51ba4fc9
Sun 02 May 2021 05:54:29 PM -03
Sun 02 May 2021 05:54:30 PM -03
Sun 02 May 2021 05:54:31 PM -03
Sun 02 May 2021 05:54:32 PM -03
Sun 02 May 2021 05:54:33 PM -03
Sun 02 May 2021 05:54:34 PM -03
Sun 02 May 2021 05:54:35 PM -03
Sun 02 May 2021 05:54:36 PM -03
Sun 02 May 2021 05:54:37 PM -03
$ ./bin/worker-client stop 9a8cb077-22da-488f-98b4-d2fb51ba4fc9
Job 9a8cb077-22da-488f-98b4-d2fb51ba4fc9 has been stopped