Service orchestration and management tool.

Serf

Serf is a decentralized solution for service discovery and orchestration that is lightweight, highly available, and fault tolerant.

Serf runs on Linux, Mac OS X, and Windows. An efficient and lightweight gossip protocol is used to communicate with other nodes. Serf can detect node failures and notify the rest of the cluster. An event system is built on top of Serf, letting you use Serf's gossip protocol to propagate events such as deploys, configuration changes, etc. Serf is completely masterless with no single point of failure.

Here are some example use cases of Serf, though there are many others:

  • Discovering web servers and automatically adding them to a load balancer
  • Organizing many memcached or redis nodes into a cluster, perhaps with something like twemproxy, or simply configuring an application with the addresses of all the nodes
  • Triggering web deploys using the event system built on top of Serf
  • Propagating configuration changes to relevant nodes
  • Updating DNS records to reflect cluster changes as they occur
  • Much, much more

Quick Start

First, download a pre-built Serf binary for your operating system, compile Serf yourself, or install using go get -u github.com/hashicorp/serf/cmd/serf.

Next, let's start a couple of Serf agents. Agents run until they're told to quit and handle the communication and maintenance tasks of Serf. In a real Serf setup, each node in your system will run one or more Serf agents (a node can run multiple agents if you're running multiple cluster types, e.g. web servers vs. memcached servers).

Start each Serf agent in a separate terminal session so that we can see the output of each. Start the first agent:

$ serf agent -node=foo -bind=127.0.0.1:5000 -rpc-addr=127.0.0.1:7373
...

Start the second agent in another terminal session (while the first is still running):

$ serf agent -node=bar -bind=127.0.0.1:5001 -rpc-addr=127.0.0.1:7374
...

At this point, the two Serf agents are running independently but are still unaware of each other. Let's now tell the first agent to join an existing cluster (the second agent). To join a cluster, a Serf agent only needs to be told about at least one existing member; after that, Serf gossips and the rest of the cluster becomes aware of the join. Run the following command in a third terminal session.

$ serf join 127.0.0.1:5001
...

If you're watching your terminals, you should see both Serf agents become aware of the join. You can prove it by running serf members to see the members of the Serf cluster:

$ serf members
foo    127.0.0.1:5000    alive
bar    127.0.0.1:5001    alive
...

At this point, you can ctrl-C or force kill either Serf agent, and they'll update their membership lists appropriately. If you ctrl-C a Serf agent, it will gracefully leave by notifying the cluster of its intent to leave. If you force kill an agent, it will eventually (usually within seconds) be detected by another member of the cluster which will notify the cluster of the node failure.
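
You can also experiment with the event system from this quick-start cluster. The sketch below assumes a hypothetical handler.sh script; the agent runs it for every event and sets environment variables such as SERF_EVENT and SERF_USER_EVENT that the script can inspect. For example, restart the first agent with an event handler and then fire a custom event:

$ serf agent -node=foo -bind=127.0.0.1:5000 -rpc-addr=127.0.0.1:7373 -event-handler=./handler.sh
...
$ serf event deploy 1.2.3
...

The custom "deploy" event is gossiped to every member, and each agent with an event handler configured invokes it with the payload made available to the script.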

Documentation

Full, comprehensive documentation is viewable on the Serf website:

https://www.serf.io/docs

Developing Serf

If you wish to work on Serf itself, you'll first need Go installed (version 1.10+ is required). Make sure you have Go properly installed, including setting up your GOPATH.

Next, clone this repository into $GOPATH/src/github.com/hashicorp/serf and then just type make. In a few moments, you'll have a working serf executable:

$ make
...
$ bin/serf
...

NOTE: make will also place a copy of the executable under $GOPATH/bin/

Serf is first and foremost a library with a command-line interface, serf. The Serf library is independent of the command line agent, serf. The serf binary is located under cmd/serf and can be installed stand alone by issuing the command go get -u github.com/hashicorp/serf/cmd/serf. Applications using the Serf library should only need to include github.com/hashicorp/serf.
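
As a rough sketch of using the library directly (error handling kept minimal; the join address assumes another agent is already listening on 127.0.0.1:5001 as in the quick start above):

package main

import (
    "fmt"
    "log"

    "github.com/hashicorp/serf/serf"
)

func main() {
    // Start from the library defaults and name this node.
    conf := serf.DefaultConfig()
    conf.NodeName = "library-node"

    // Create the Serf instance; this starts gossiping in the background.
    s, err := serf.Create(conf)
    if err != nil {
        log.Fatal(err)
    }
    defer s.Shutdown()

    // Join the cluster via any known member.
    if _, err := s.Join([]string{"127.0.0.1:5001"}, false); err != nil {
        log.Fatal(err)
    }

    // Print the members this node currently knows about.
    for _, m := range s.Members() {
        fmt.Println(m.Name, m.Addr, m.Status)
    }
}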

Tests can be run by typing make test.
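
For quicker iteration on a single package you can also invoke go test directly; for example, to run only the core serf package's tests from the repository root:

$ go test ./serf
...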

If you make any changes to the code, run make format in order to automatically format the code according to Go standards.

Comments
  • [COMPLIANCE] Update MPL-2.0 LICENSE

    Hi there 👋

    This PR was auto-generated as part of an internal review of public repositories that are not in compliance with HashiCorp's licensing standards.

    Frequently Asked Questions

    Why am I getting this PR? This pull request was created because one or more of the following criteria were met:
    • This repo did not previously have a LICENSE file
    • A LICENSE file was present, but had a non-conforming name (e.g., license.txt)
    • A LICENSE file was present, but was missing an appropriate copyright statement

    More info is available in the RFC

    How do you determine the copyright date? The copyright date given in this PR is supposed to be the year the repository or project was created (whichever is older). If you believe the copyright date given in this PR is not valid, please reach out to:

    #proj-software-copyright

    I don't think this repo should be licensed under the terms of the Mozilla Public License 2.0. Who should I reach out to? If you believe this repository should not use an MPL 2.0 License, please reach out to [email protected]. Exemptions are considered on a case-by-case basis, but common reasons include if the project is co-managed by another entity that requires differing license terms, or if the project is part of an ecosystem that commonly uses a different license type (e.g., MIT or Apache 2.0).

    Please approve and merge this PR in a timely manner to keep this source code compliant with our OSS license agreement. If you have any questions or feedback, reach out to #proj-software-copyright.

    Thank you!


    Made with :heart: @HashiCorp

  • Fix the bug that consul is always in the leaving state

    Fixes https://github.com/hashicorp/serf/issues/662. The function handleNodeJoin sets member.Status = StatusAlive, and the function handleNodeJoinIntent sets member.statusLTime = joinMsg.LTime. Unfortunately, the two are not kept in sync: if member.statusLTime is updated directly in handleNodeJoinIntent, we get wrong results, and the member state we observe stays leaving. For example, suppose we have three members A, B, and C in the cluster, and both B and C joined the cluster through A:

    • We perform leave and start operations on node B.
    • When node A updates B's member.statusLTime to the latest value in handleNodeJoinIntent, B's member.Status may still be left.
    • When A sends its member state to C through pushPullMsg, C sees B's status as leave and incorrectly sets member.statusLTime to the received value + 1.
    • C therefore refreshes B's status to leaving with that incremented statusLTime.
    • As a result, C continues to report B as leaving.
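
    As a purely illustrative sketch (not Serf's actual code), the invariant the fix aims for is that an intent only wins if its Lamport time is strictly newer, and that the status and its Lamport time are always updated together:

    package sketch

    // Illustrative types only; Serf's real structures differ.
    type LamportTime uint64

    type member struct {
        status      string      // e.g. "alive", "leaving", "left"
        statusLTime LamportTime // Lamport time of the last status change
    }

    // applyIntent applies a join/leave intent only if it is strictly newer
    // than what we already know, updating status and statusLTime together.
    func applyIntent(m *member, newStatus string, ltime LamportTime) bool {
        if ltime <= m.statusLTime {
            return false // stale or duplicate intent: ignore it
        }
        m.status = newStatus
        m.statusLTime = ltime
        return true
    }
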
  • When the nodes leave and start at the same time in a large-capacity consul cluster, some member states will continue to keep leaving, not alive.

    I have a Consul setup with 3 servers and 150 clients. When the 150 clients leave at the same time and restart, some member states show leaving. Further analysis shows the following scenario:

    • Node A and node B are both rebooted.
    • After node A starts, it receives the leave message from B, and A caches the message type and LTime in recentIntents.
    • When A later receives the join message from B, the join message's LTime is smaller than the leave message's LTime. As a result, in A's consul members output, the state of B is always leaving.

    The relevant source code is as follows:

    https://github.com/hashicorp/serf/blob/830be12c38e509faa64a2a650629e91c8930a040/serf/serf.go#L1111

    https://github.com/hashicorp/serf/blob/830be12c38e509faa64a2a650629e91c8930a040/serf/serf.go#L1220

    https://github.com/hashicorp/serf/blob/830be12c38e509faa64a2a650629e91c8930a040/serf/serf.go#L956

    https://github.com/hashicorp/serf/blob/830be12c38e509faa64a2a650629e91c8930a040/serf/serf.go#L1225

    /kind bug

  • Logging and close race in RPC client

    We're regularly seeing errors like panic: close of closed channel in our application with a stack trace like:

    goroutine 502383 [running]:
    github.com/hashicorp/serf/client.(*queryHandler).Cleanup(0xc00198c060)
    	/go/src/github.com/boxcast/playlist_service/vendor/github.com/hashicorp/serf/client/rpc_client.go:629 +0x91
    github.com/hashicorp/serf/client.(*RPCClient).deregisterHandler(0xc00034c070, 0x11bea00)
    	/go/src/github.com/boxcast/playlist_service/vendor/github.com/hashicorp/serf/client/rpc_client.go:804 +0xd8
    github.com/hashicorp/serf/client.(*queryHandler).Handle(0xc00198c060, 0x11beb00)
    	/go/src/github.com/boxcast/playlist_service/vendor/github.com/hashicorp/serf/client/rpc_client.go:612 +0x1cb
    github.com/hashicorp/serf/client.(*RPCClient).respondSeq(0xc00034c070, 0x11beb00, 0xc001cb3740)
    	/go/src/github.com/boxcast/playlist_service/vendor/github.com/hashicorp/serf/client/rpc_client.go:824 +0xc4
    github.com/hashicorp/serf/client.(*RPCClient).listen(0xc00034c070)
    	/go/src/github.com/boxcast/playlist_service/vendor/github.com/hashicorp/serf/client/rpc_client.go:840 +0x86
    created by github.com/hashicorp/serf/client.ClientFromConfig
    	/go/src/github.com/boxcast/playlist_service/vendor/github.com/hashicorp/serf/client/rpc_client.go:148 +0x4bd
    

    While I am not 100% sure of the root cause of this, inspecting the RPC client code suggests that there might be a race if someone calls client.Close() on an RPC client at the same time the client is closing itself due to an error. Since none of the handlers' Cleanup methods currently do anything atomic, it's easy to see how multiple callers might close() the channels at the same time.

    This PR attempts to fix the race condition by adding a mutex to the various RPC client handlers to protect access to closed and the response channels; it also adds a check to see if the handler is closed in the Handle() methods.
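
    A minimal sketch of the pattern (hypothetical shape, not the actual patch): guard the closed flag and the channel close with a mutex so Cleanup is safe to call concurrently from client.Close() and the client's own error path.

    package sketch

    import "sync"

    // Hypothetical handler shape; field names are illustrative.
    type handler struct {
        mu     sync.Mutex
        closed bool
        respCh chan string
    }

    // Cleanup closes the response channel exactly once, even when called
    // from multiple goroutines at the same time.
    func (h *handler) Cleanup() {
        h.mu.Lock()
        defer h.mu.Unlock()
        if h.closed {
            return
        }
        h.closed = true
        close(h.respCh)
    }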

    I noticed that there may be a similar race with the init members, so they got a similar but slightly different atomic treatment.

    Finally, I noticed that the serf.Config provides the ability to customize the logging by providing your own *log.Logger, but the RPC client currently does not. So this PR also adds a *log.Logger to the config for the RPC client. If not supplied, it uses the default logger (which should have the same effect as the current code).

    It doesn't seem like there are any tests for the client package, and the top-level tests failed for me before making any changes, so if someone can suggest how I can confirm my changes, I'm more than happy to do so.

  • Is Serf suitable for integration into the kubernetes app as a library?

    As far as I know, the Consul client agent should NOT be used in each application pod as a sidecar. However, can Serf be used as a common library in a business application workload that runs in a pod? Will it cause an explosion of resource usage? Thanks.

  • [Agent | Event Handler] Honor shebang when present

    The changes I made address issue #421.

    When running on a *nix system, the first two runes of the event handler script are compared against #!. If they match, the script is run directly; if no shebang is present, the current behavior of running the script with /bin/sh remains.
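
    A rough sketch of the idea (illustrative only; the function names here are made up and this is not the actual diff):

    package sketch

    import (
        "io"
        "os"
        "os/exec"
    )

    // hasShebang reports whether the script's first two bytes are "#!".
    func hasShebang(path string) bool {
        f, err := os.Open(path)
        if err != nil {
            return false
        }
        defer f.Close()
        buf := make([]byte, 2)
        if _, err := io.ReadFull(f, buf); err != nil {
            return false
        }
        return string(buf) == "#!"
    }

    // commandForScript builds the command used to run an event handler script.
    // Scripts with a shebang are executed directly so the kernel honors it;
    // everything else keeps the existing /bin/sh behavior.
    func commandForScript(path string) *exec.Cmd {
        if hasShebang(path) {
            return exec.Command(path)
        }
        return exec.Command("/bin/sh", "-c", path)
    }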
