Zero downtime restarts for go servers (Drop in replacement for http.ListenAndServe)

endless

Zero downtime restarts for golang HTTP and HTTPS servers. (for golang 1.3+)

Inspiration & Credits

Well... it's what you want, right? No need to hook in and out of a load balancer or something: just compile, SIGHUP, start the new one, finish the old requests, etc.

There is https://github.com/rcrowley/goagain, and I looked at https://fitstar.github.io/falcore/hot_restart.html, which looked easier to do but still had some assembly required. I wanted something that's ideally as simple as

err := endless.ListenAndServe("localhost:4242", mux)

I found the excellent post Graceful Restart in Golang by Grisha Trubetskoy and took his code as a start. So a lot of credit to Grisha!

Features

  • Drop-in replacement for http.ListenAndServe and http.ListenAndServeTLS
  • Signal hooks to execute your own code before or after the listened to signals (SIGHUP, SIGUSR1, SIGUSR2, SIGINT, SIGTERM, SIGTSTP)
  • You can start multiple servers from one binary and endless will take care of the different sockets/ports assignments when restarting

Default Timeouts & MaxHeaderBytes

There are three variables exported by the package that control the values set for ReadTimeout, WriteTimeout, and MaxHeaderBytes on the inner http.Server:

DefaultReadTimeOut    time.Duration
DefaultWriteTimeOut   time.Duration
DefaultMaxHeaderBytes int

The endless default behaviour is to use the same defaults defined in net/http.

These settings affect endless because open connections can keep the parent process from dying until all of them are handled/finished.

Hammer Time

To deal with hanging requests on the parent after restarting, endless will hammer the parent 60 seconds after it receives the shutdown signal from the forked child process. When the parent is hammered, requests that are still running get terminated. This behaviour can be controlled by another exported variable:

DefaultHammerTime time.Duration

The default is 60 seconds. When set to -1, hammerTime() is not invoked automatically. You can then hammer the parent manually by sending SIGUSR2. This will only hammer the parent if it is already in shutdown mode, so unless the process has received a SIGTERM or SIGINT beforehand (sent manually or by the forked child), SIGUSR2 will be ignored.

If you had hanging requests and the server got hammered you will see a log message like this:

2015/04/04 13:04:10 [STOP - Hammer Time] Forcefully shutting down parent
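
The hammer-time idea can be illustrated with the standard library alone. The following is a minimal sketch, not the code endless itself uses: it waits for in-flight work tracked by a sync.WaitGroup and gives up once a deadline passes. The names waitOrHammer and simulate, and the durations, are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// waitOrHammer waits for wg to drain. It returns true if all work finished
// before hammerTime elapsed, and false if the deadline fired first (the point
// at which a server would forcefully close whatever is still running).
// Hypothetical helper for illustration; not part of endless.
func waitOrHammer(wg *sync.WaitGroup, hammerTime time.Duration) bool {
	done := make(chan struct{})
	go func() {
		wg.Wait()
		close(done)
	}()
	select {
	case <-done:
		return true
	case <-time.After(hammerTime):
		return false
	}
}

// simulate runs one fake request lasting requestMs milliseconds and reports
// whether it finished before a hammer deadline of hammerMs milliseconds.
func simulate(requestMs, hammerMs int) bool {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		time.Sleep(time.Duration(requestMs) * time.Millisecond)
	}()
	return waitOrHammer(&wg, time.Duration(hammerMs)*time.Millisecond)
}

func main() {
	fmt.Println("quick request drained in time:", simulate(10, 1000))
	fmt.Println("hanging request drained in time:", simulate(1000, 20))
}
```

A false result corresponds to the situation in which the "[STOP - Hammer Time]" message would be logged.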

Examples & Documentation

import "github.com/fvbock/endless"

Then replace http.ListenAndServe with endless.ListenAndServe, or http.ListenAndServeTLS with endless.ListenAndServeTLS:

err := endless.ListenAndServe("localhost:4242", handler)

After starting your server you can make some changes, build, and send SIGHUP to the running process. It will finish handling any outstanding requests, and all new incoming requests will be served by the new binary.

More examples are in the examples directory of the repository.

There is also GoDoc Documentation

Signals

The endless server will listen for the following signals: syscall.SIGHUP, syscall.SIGUSR1, syscall.SIGUSR2, syscall.SIGINT, syscall.SIGTERM, and syscall.SIGTSTP:

SIGHUP will trigger a fork/restart

syscall.SIGINT and syscall.SIGTERM will trigger a shutdown of the server (it will finish running requests)

SIGUSR2 will trigger hammerTime

SIGUSR1 and SIGTSTP are listened for but do not trigger anything in the endless server itself. (probably useless - might get rid of those two)

You can hook your own functions to be called pre or post signal handling, e.g. pre fork or pre shutdown. More about that in the hook example.
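
The pre/post ordering can be sketched without the library. This is an illustration under assumptions, not the endless implementation: the hookSet type and its methods are hypothetical, and the "fork" action is just a placeholder closure.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// hookSet holds functions to run before and after the built-in action for a
// signal. Hypothetical type that only mirrors the pre/post idea from the text.
type hookSet struct {
	pre  map[os.Signal][]func()
	post map[os.Signal][]func()
}

func newHookSet() *hookSet {
	return &hookSet{
		pre:  map[os.Signal][]func(){},
		post: map[os.Signal][]func(){},
	}
}

// handle runs the pre hooks, then the built-in action, then the post hooks.
func (h *hookSet) handle(sig os.Signal, action func()) {
	for _, f := range h.pre[sig] {
		f()
	}
	action()
	for _, f := range h.post[sig] {
		f()
	}
}

// trace simulates a SIGHUP and records the order in which things ran.
func trace() []string {
	var order []string
	h := newHookSet()
	h.pre[syscall.SIGHUP] = append(h.pre[syscall.SIGHUP], func() { order = append(order, "pre") })
	h.post[syscall.SIGHUP] = append(h.post[syscall.SIGHUP], func() { order = append(order, "post") })
	h.handle(syscall.SIGHUP, func() { order = append(order, "fork") })
	return order
}

func main() {
	fmt.Println(trace())
}
```

In a running server the signals would arrive on a channel registered with signal.Notify from os/signal, and handle would be called for each received value.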

Limitation: No changing of ports

Currently you cannot restart a server on a different port than the previous version was running on.

PID file

If you want to write a PID file containing the actual PID, you can override the BeforeBegin hook like this:

server := endless.NewServer("localhost:4242", handler)
server.BeforeBegin = func(add string) {
	log.Printf("Actual pid is %d", syscall.Getpid())
	// save it somehow
}
err := server.ListenAndServe()
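
The "save it somehow" step is left open above. One possible way to fill it in, using only the standard library, is sketched below. The helper names writePIDFile, readPIDFile and roundTrip and the file name server.pid are assumptions for illustration; in practice writePIDFile would be called from inside the BeforeBegin hook.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// writePIDFile writes the current process ID to path and returns the PID
// that was written. Hypothetical helper; not part of endless.
func writePIDFile(path string) (int, error) {
	pid := os.Getpid()
	err := os.WriteFile(path, []byte(strconv.Itoa(pid)+"\n"), 0o644)
	return pid, err
}

// readPIDFile parses a PID previously written by writePIDFile.
func readPIDFile(path string) (int, error) {
	b, err := os.ReadFile(path)
	if err != nil {
		return 0, err
	}
	return strconv.Atoi(strings.TrimSpace(string(b)))
}

// roundTrip writes a PID file into a temporary directory, reads it back, and
// reports whether the value matches the PID that was written.
func roundTrip() (bool, error) {
	dir, err := os.MkdirTemp("", "pidfile-example")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "server.pid")
	written, err := writePIDFile(path)
	if err != nil {
		return false, err
	}
	read, err := readPIDFile(path)
	if err != nil {
		return false, err
	}
	return written == read, nil
}

func main() {
	ok, err := roundTrip()
	fmt.Println("pid file matches:", ok, "error:", err)
}
```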

TODOs

  • tests
  • documentation
  • less ugly wrapping of the tls.listener
Owner
Florian von Bock
Comments
  • Exceptionally weird server crashes due to panic: sync: negative WaitGroup counter

    Hey, First of all thank you for this really helpful library! :-)

    After spending multiple hours testing this lib and reasoning about whether your code is free of race conditions and robust enough to survive in a high-profile scenario, I decided to give it a try. Since then I have used endless in a mission-critical production app.

    All in all I'm very happy with the result. But there are situations where my Go server sporadically crashes due to a panic (sync: negative WaitGroup counter) in endless with the following trace:

    goroutine 3849 [running]:
    runtime.panic(0x764000, 0xc208404de0)
        /usr/local/Cellar/go/1.3.3/libexec/src/pkg/runtime/panic.c:279 +0xf5
    sync.(*WaitGroup).Add(0xc208042080, 0xffffffffffffffff)
        /usr/local/Cellar/go/1.3.3/libexec/src/pkg/sync/waitgroup.go:64 +0x93
    sync.(*WaitGroup).Done(0xc208042080)
        /usr/local/Cellar/go/1.3.3/libexec/src/pkg/sync/waitgroup.go:82 +0x30
    bitbucket.org/justphil/glutenplan.de/vendor/endless.endlessConn.Close(0x7f0997bd9ce0, 0xc20803e018, 0xc208042000, 0x0, 0x0)
        /Users/pt/Dev/Workspaces/go/src/bitbucket.org/justphil/glutenplan.de/vendor/endless/endless.go:508 +0x48
    bitbucket.org/justphil/glutenplan.de/vendor/endless.(*endlessConn).Close(0xc2083d8360, 0x0, 0x0)
        <autogenerated>:16 +0xa4
    net/http.(*conn).close(0xc20836e000)
        /usr/local/Cellar/go/1.3.3/libexec/src/pkg/net/http/server.go:1047 +0x4f
    net/http.func·011()
        /usr/local/Cellar/go/1.3.3/libexec/src/pkg/net/http/server.go:1104 +0x22a
    net/http.(*conn).serve(0xc20836e000)
        /usr/local/Cellar/go/1.3.3/libexec/src/pkg/net/http/server.go:1187 +0x78b
    created by net/http.(*Server).Serve
        /usr/local/Cellar/go/1.3.3/libexec/src/pkg/net/http/server.go:1721 +0x313
    

    (Note: I forked your lib and extended it with the ability to write pid files. But the same problem appears in the original lib as well.)

    So I again started to reason about the endless code in order to get rid of this problem. There are basically two ways the WaitGroup counter can become negative. The first is when hammerTime() tries to forcefully shut down the parent process. The second is the one that is panicking here, when a connection is closed. To me this is really weird, because according to your code this panic implies that endless wants to close connections that have never been established.

    Any hints on this one? :-)

    Bye, Phil
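
For readers following the trace above: in general, a WaitGroup counter goes negative whenever Done is called more often than Add. The sketch below reproduces that with a double Done and shows one common guard, a sync.Once inside Close. It is not taken from endless, the trackedConn type is hypothetical, and whether a repeated Close is what actually happened in this report is not established here.

```go
package main

import (
	"fmt"
	"sync"
)

// trackedConn stands in for a wrapped connection whose Close decrements a
// shared WaitGroup. Hypothetical type for illustration only.
type trackedConn struct {
	wg   *sync.WaitGroup
	once sync.Once
}

// Close calls wg.Done at most once, however many times Close is invoked.
func (c *trackedConn) Close() {
	c.once.Do(c.wg.Done)
}

// unguardedDoubleDone reports whether calling Done twice after a single Add
// panics (it does: the counter would drop to -1).
func unguardedDoubleDone() (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	var wg sync.WaitGroup
	wg.Add(1)
	wg.Done()
	wg.Done()
	return false
}

// guardedDoubleClose closes a trackedConn twice and reports whether that
// panicked (it does not, because the second Close is a no-op).
func guardedDoubleClose() (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	var wg sync.WaitGroup
	wg.Add(1)
	c := &trackedConn{wg: &wg}
	c.Close()
	c.Close()
	wg.Wait()
	return false
}

func main() {
	fmt.Println("unguarded double Done panics:", unguardedDoubleDone())
	fmt.Println("guarded double Close panics:", guardedDoubleClose())
}
```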

  • Refactor svr mgmt & support Handler backends

    Added a Manager which is responsible for managing all servers and their actions.

    Moved the built-in signal handling into pluggable Handler backends, which allows endless to support Windows, which doesn't support signals.

    The default Handler on non-Windows OSes is a SignalHandler, so endless remains a drop-in replacement for the http.ListenAndServe methods.

    Manager now exports Restart(), Shutdown() and Terminate(..) methods, which are the heart of endless.

    SignalHandler can hook any signal; however, only the following are used internally:

    • SIGHUP: Call Restart()
    • SIGUSR2: Call Terminate(0)
    • SIGINT & SIGTERM: Call Shutdown()

    Hooked signals now require a function of type SignalHook which takes the triggering os.Signal. A Hook can prevent further processing of the signal by returning false.

    Also:

    • Renamed HammerTime to TerminateTimeout
    • Made state management internal using a type for compile time validation.
    • Removed initialisation of variable to their default values (go vet)
    • Documented methods.
    • Cleaned up comment formatting for easier reading.
    • Made use of http.Server ErrorLog for all logging if not nil otherwise standard logger.
    • Made TerminateTimeout configurable on each server.
    • Removed commented out debugging.
    • Renamed type endlessServer -> Server so it's exported and idiomatically named.
    • Renamed type endlessListener -> Listener so it's exported and idiomatically named.
    • Moved command line flags to Environment to avoid conflict with existing application flags.
    • Fixed err var shadowing.
    • Add state transition validation and error checking.
    • Exported GetState
    • Use a done channel to fast bypass a Terminate request.
    • Removed keep-alive disable as it wasn't operating as expected
    • Added full test suite.
    • Hide output under Debug flag to support tests properly.
    • Fix restarts with multiple servers. Prior to this rework servers would be shutdown after the first server had started.
  • Running with a process manager

    On restart, the program forks a new copy and closes the old copy, leaving the new one parentless/daemonized. How would you use this with init, upstart, systemd and others? If you can't, what do you do about crashes - restarts and reporting, and also logging?

  • Lower level API

    This library only provides public API for http.Handlers and not net.Listeners :-( endlessListener is private for some reason. Please make the TCP-level API available!

  • Ability to save pid file.

    I needed to save pid to file. It didn't turn out as easy as it sounds.

    In many cases the written PID file was incorrect.

    For example: you can try to write pid on init, before ListenAndServe.

    That goes wrong if you try to run the server a second time: it fails with bind: address already in use, but still writes the wrong pid.

    You can try to write the pid file after ListenAndServe (but you can't do that until the first request comes in, and you need to do it in handlers, which makes for messy code).

    You can try to detect whether the original pid is still running. But you might be a child, so you then need to detect whether the -continue flag was passed. That would fail if endless changes the way it runs the child, and it also relies on undocumented behavior.

    Either way, each of these approaches leads to an incorrect pid file being written in some cases.

    The only correct place to get actual pid is deep inside endless:

    log.Println(syscall.Getpid(), srv.Addr)
    

    So I replaced it with a more generic (fully backwards-compatible) call:

    srv.BeforeBegin(srv.Addr)
    

    which, by default, does exactly the same:

    srv.BeforeBegin = func(addr string) {
        log.Println(syscall.Getpid(), addr)
    }
    

    but, it can be overridden from user code, like so:

    server := endless.NewServer("localhost:4242", handler)
    server.BeforeBegin = func(add string) {
        log.Printf("Actual pid is %d", syscall.Getpid())
        // save it somehow
    }
    err := server.ListenAndServe()
    

    That way the user can always have the actual pid in the pid file.

    Thanks for the awesome library!

  • Combinable with Supervisor?

    Hey, thanks for this cool package!

    Is this combinable with supervisor in any way? I'm using it for regular restarts, log rotations, and so on. The only downside while deploying is that with supervisor I don't have graceful restarts...

    Any experiences with such a stack?

  • endless takes over param handling

    Expected behavior: endless to append whatever runtime args it needs, but preserve the application command line params.

    Actual behavior: endless takes over the command line params, and treats all application params as illegal

    What's most curious is that args required by the main app are still considered required, even though providing them is disallowed by endless.

    1074)internationalsos/riskratings % ./rr -h
    Usage of ./rr:
      -continue=false: Dummy -- needed by endless
      -debug=false: Enable debugging
      -pass="": Password
      -port=":8080": Listen port
      -refresh=10: Cache refresh time, in minutes
      -uri="******************": URI of data source
      -user="": Username
    1075)internationalsos/riskratings % ./rr
    Password is required
    Usage of ./rr:
      -continue=false: Dummy -- needed by endless
      -debug=false: Enable debugging
      -pass="": Password
      -port=":8080": Listen port
      -refresh=10: Cache refresh time, in minutes
      -uri="******************": URI of data source
      -user="": Username
    1076)internationalsos/riskratings % ./rr -pass xyz -user abc -port :8123
    2015/06/29 22:33:58 start listening on localhost:8123
    flag provided but not defined: -pass
    Usage of ./rr:
      -continue=false: listen on open fd (after forking)
      -socketorder="": previous initialization order - used when more than one listener was started
    
  • undefined on import...

    ../github.com/fvbock/endless/endless.go:380: srv.SetKeepAlivesEnabled undefined (type *endlessServer has no field or method SetKeepAlivesEnabled)

    When I used go get to get this.

  • runningServersForked should be read with runningServerReg held

    The current code in fork() is racy in the sense that two concurrent goroutines might call fork() at the same time, both see that runningServersForked is false, and perform two forks. Holding runningServerReg makes sure that only one of the goroutines will see runningServersForked set to false.
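
The check-and-set pattern proposed here can be sketched in isolation. This is a generic illustration, not the patch itself: forkGuard, tryFork and countWinners are hypothetical names, with the mutex playing the role of runningServerReg and the boolean the role of runningServersForked.

```go
package main

import (
	"fmt"
	"sync"
)

// forkGuard pairs a flag with the mutex that protects it. Hypothetical type
// modelled on the names used in the text.
type forkGuard struct {
	mu     sync.Mutex
	forked bool
}

// tryFork reads and sets the flag while holding the lock, so it returns true
// for exactly one caller no matter how many goroutines race on it.
func (g *forkGuard) tryFork() bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.forked {
		return false
	}
	g.forked = true
	return true
}

// countWinners starts n goroutines that all call tryFork concurrently and
// returns how many of them were allowed to proceed.
func countWinners(n int) int {
	var (
		g       forkGuard
		wg      sync.WaitGroup
		mu      sync.Mutex
		winners int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if g.tryFork() {
				mu.Lock()
				winners++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return winners
}

func main() {
	fmt.Println("forks performed:", countWinners(50))
}
```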

  • Disable keep-alives on servers during shutdown.

    Keep-alive connections get their timeouts reset with each new request made on the connection. As a result, they can stay open indefinitely, as long as they remain in use.

    Disabling keep-alive on shutdown should help ensure that existing connections get cleared out in a timely fashion.
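
With a plain net/http server the same idea can be sketched as follows. This is not the change made in endless: the drain and demo helpers are hypothetical, and the sketch assumes a Go version that has http.Server.Shutdown (Go 1.8 or later) in addition to SetKeepAlivesEnabled.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"net/http"
	"time"
)

// drain disables keep-alives so connections are closed once their current
// request completes, then shuts the server down gracefully, waiting at most
// timeout for in-flight requests. Hypothetical helper; not part of endless.
func drain(srv *http.Server, timeout time.Duration) error {
	srv.SetKeepAlivesEnabled(false)
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return srv.Shutdown(ctx)
}

// demo starts a server on an ephemeral local port, sends it one request, and
// then drains it. It returns nil if the server stopped cleanly.
func demo() error {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return err
	}
	srv := &http.Server{
		Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			fmt.Fprintln(w, "ok")
		}),
	}
	serveErr := make(chan error, 1)
	go func() { serveErr <- srv.Serve(ln) }()

	resp, err := http.Get("http://" + ln.Addr().String())
	if err != nil {
		return err
	}
	resp.Body.Close()

	if err := drain(srv, 5*time.Second); err != nil {
		return err
	}
	// Serve returns http.ErrServerClosed after a successful Shutdown.
	if err := <-serveErr; err != http.ErrServerClosed {
		return err
	}
	return nil
}

func main() {
	fmt.Println("drain error:", demo())
}
```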

  • Build for Windows on compilers older than go1.10

    Solution shared by one of our devs

    https://developpaper.com/using-endless-in-windows-undefined-syscall-sigusr1/

    Inside: C:\Program Files\Go\src\syscall\types_windows.go

    var signals = [...]string{
        //Omit line n here....
        /**Compatible with Windows start*/
        16: "SIGUSR1",
        17: "SIGUSR2",
        18: "SIGTSTP",
        /**Compatible with windows end*/
    }
    /**Compatible with Windows start*/
    func Kill(...interface{}) {
        return;
    }
    const (
        SIGUSR1 = Signal(0x10)
        SIGUSR2 = Signal(0x11)
        SIGTSTP = Signal(0x12)
    )
    /**Compatible with windows end*/
    

    Originally posted by @kevincobain2000 in https://github.com/fvbock/endless/issues/35#issuecomment-1033721343

    For example, on go1.5.1 the C:\Go\src\syscall\types_windows.go file is missing, and after you edit the C:\Go\src\syscall\ztypes_windows.go file you receive 10 warnings: 1 in dll_windows.go, in func (p *Proc) Call(...) (...) {...}; 1 in exec_windows.go, in func joinExeDirAndFName(...) (...) {...}; and 8 in zsyscall_windows.go, "possible misuse of unsafe.Pointer". Even if commenting out the return keyword is enough for the first two files, it is not clear how to fix the last one. Ideas?

    https://cs.opensource.google/go/go/+/refs/tags/go1.5.1:src/syscall/
    https://cs.opensource.google/go/go/+/refs/tags/go1.10:src/syscall/

  • reload daemon then http server can't listen on other port

    env: Linux l-qfy0.dba.dev.cn0 3.10.0-327.28.3.el7.x86_64 #1 SMP Thu Aug 18 19:05:49 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

    go version go1.17 linux/amd64

    test.go

    package main
    
    import (
    	"context"
    	"flag"
    	"fmt"
    	"os/exec"
    	"strconv"
    	"syscall"
    
    	"github.com/fvbock/endless"
    	"github.com/gin-gonic/gin"
    )
    
    var ctx context.Context
    var cancel context.CancelFunc
    var reload bool
    
    var port = flag.Int("port", 8080, "")
    
    func main() {
    	flag.Parse()
    	ListenHTTP(setReload)
    }
    
    func setReload() {
    	reload = true
    }
    func ListenHTTP(reloadFunc func()) error {
    	var handler = gin.Default()
    	handler.GET("/", startDaemon())
    	handler.GET("/hello", hello())
    
    	ctx, cancel = context.WithCancel(context.Background())
    	defer cancel()
    	s := endless.NewServer(":"+strconv.Itoa(*port), handler)
    	go func() {
    		select {
    		case <-ctx.Done():
    		}
    		if s != nil {
    			if err := s.Shutdown(ctx); err != nil {
    				fmt.Println(err)
    			}
    		}
    	}()
    	s.SignalHooks[endless.PRE_SIGNAL][syscall.SIGHUP] = append(
    		s.SignalHooks[endless.PRE_SIGNAL][syscall.SIGHUP],
    		reloadFunc)
    	if err := s.ListenAndServe(); err != nil {
    		return err
    	}
    	return nil
    }
    func hello() gin.HandlerFunc {
    	return func(c *gin.Context) {
    		fmt.Println("hello ", *port)
    	}
    }
    func startDaemon() gin.HandlerFunc {
    	return func(c *gin.Context) {
    		start(c)
    	}
    }
    
    func start(c *gin.Context) (err error) {
    	ctx, cancel := context.WithCancel(context.Background())
    	defer cancel()
    	cmd := exec.CommandContext(ctx, "sh", []string{"start.sh"}...)
    	err = cmd.Run()
    	if err != nil {
    		fmt.Println(err)
    		return
    	}
    	return
    }
    

    start.sh

    nohup /home/qfy/tmp/test --port=8081 > /dev/null 2>&1 &
    

    basedir:/root/tmp

    steps:

    1. run test
    2. start with a new shell, then send an HTTP request; now 8081 is listening
    3. kill 8081, and reload test
    4. then send an HTTP request, but 8081 is not listening

    Is this a bug that needs fixing? And how can I resolve it?

  • Abandoned project?

    This repo hasn't been updated since 2016. Is it officially abandoned? If so, can you mark it as archive, or put out a call for maintainers as a new issue and in the README?

  • windows endless.go:64:11: undefined: syscall.SIGUSR1

    PS C:\Users\xxx\Workspace\xxx-xxx> go env
    set GO111MODULE=
    set GOARCH=amd64
    set GOBIN=
    set GOCACHE=C:\Users\xxx\AppData\Local\go-build
    set GOENV=C:\Users\xxx\AppData\Roaming\go\env
    set GOEXE=.exe
    set GOEXPERIMENT=
    set GOFLAGS=
    set GOHOSTARCH=amd64
    set GOHOSTOS=windows
    set GOINSECURE=
    set GOMODCACHE=C:\Users\xxx\go\pkg\mod
    set GONOPROXY=.corp.example.com
    set GONOSUMDB=.corp.example.com
    set GOOS=windows
    set GOPATH=C:\Users\xxx\go
    set GOPRIVATE=*.corp.example.com
    set GOPROXY=https://proxy.golang.com.cn,direct
    set GOROOT=C:\Program Files\Go
    set GOSUMDB=sum.golang.org
    set GOTMPDIR=
    set GOTOOLDIR=C:\Program Files\Go\pkg\tool\windows_amd64
    set GOVCS=
    set GOVERSION=go1.18.1
    set GCCGO=gccgo
    set GOAMD64=v1
    set AR=ar
    set CC=gcc
    set CXX=g++
    set CGO_ENABLED=1
    set GOMOD=C:\Users\xxx\Workspace\xxx-xxx\go.mod
    set GOWORK=
    set CGO_CFLAGS=-g -O2
    set CGO_CPPFLAGS=
    set CGO_CXXFLAGS=-g -O2
    set CGO_FFLAGS=-g -O2
    set CGO_LDFLAGS=-g -O2
    set PKG_CONFIG=pkg-config
    set GOGCCFLAGS=-m64 -mthreads -fmessage-length=0 -fdebug-prefix-map=C:\Users\xxx\AppData\Local\Temp\go-build219240513=/tmp/go-build -gno-record-gcc-switches

  • I tried to get the app to send a SIGHUP signal to reload itself and got a "text file busy" error

    2021/11/17 17:14:30 8597 Received SIGHUP. forking.
    2021/11/17 17:14:30 Restart: Failed to launch, error: fork/exec ./project: text file busy

    But when I manually executed "kill -1" to send a SIGHUP signal, it worked fine and the reload was successful:

    2021/11/17 17:53:28 11425 Received SIGHUP. forking.
    2021/11/17 17:53:28 11475 0.0.0.0:8001
    2021/11/17 17:53:28 11425 Received SIGTERM.
    2021/11/17 17:53:28 11425 Waiting for connections to finish...
    2021/11/17 17:53:28 11425 Serve() returning...
    