A distributed and coördination-free log management system

OK Log is archived

I hoped to find the opportunity to continue developing OK Log after the spike of its creation. Unfortunately, despite effort, no such opportunity presented itself. Please look at OK Log for inspiration, and consider using the (maintained!) projects that came from it, ulid and run.


OK Log

OK Log is a distributed and coördination-free log management system for big ol' clusters. It's an on-prem solution that's designed to be a sort of building block: easy to understand, easy to operate, and easy to extend.

Is OK Log for me?

You may consider OK Log if...

  • You're tailing your logs manually, find it annoying, and want to aggregate them without a lot of fuss
  • You're using a hosted solution like Loggly, and want to move logs on-prem
  • You're using Elasticsearch, but find it unreliable, difficult to operate, or don't use many of its features
  • You're using a custom log pipeline with e.g. Fluentd or Logstash, and having performance problems
  • You just wanna, like, grep your logs — why is this all so complicated?

Getting OK Log

OK Log is distributed as a single, statically-linked binary for a variety of target architectures. Download the latest release from the releases page.
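For example, on a 64-bit Linux host, fetching and installing a release might look like this. The version, platform, and asset name below are illustrative; take the exact URL from the releases page.

$ curl -OL https://github.com/oklog/oklog/releases/download/v0.3.2/oklog-0.3.2-linux-amd64
$ chmod +x oklog-0.3.2-linux-amd64
$ sudo mv oklog-0.3.2-linux-amd64 /usr/local/bin/oklog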

Quickstart

$ oklog ingeststore -store.segment-replication-factor 1
$ ./myservice | oklog forward localhost
$ oklog query -from 5m -q Hello
2017-01-01 12:34:56 Hello world!

Deploying

Small installations

If you have relatively small log volume, you can deploy a cluster of identical ingeststore nodes. By default, the replication factor is 2, so you need at least 2 nodes. Use the -cluster flag to specify a routable IP address or hostname for each node to advertise itself on. And let each node know about at least one other node with the -peer flag.

foo$ oklog ingeststore -cluster foo -peer foo -peer bar -peer baz
bar$ oklog ingeststore -cluster bar -peer foo -peer bar -peer baz
baz$ oklog ingeststore -cluster baz -peer foo -peer bar -peer baz

To grow the cluster, just add a new node, and tell it about at least one other node via the -peer flag. Optionally, you can run the rebalance tool (TODO) to redistribute the data over the new topology. To shrink the cluster, just kill fewer nodes than the replication factor at a time, and run the repair tool (TODO) to re-replicate lost records.
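For example, a fourth node qux (hostname illustrative) could join the cluster from the example above like so:

qux$ oklog ingeststore -cluster qux -peer foo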

All configuration is done via commandline flags. You can change things like the log retention period (default 7d), the target segment file size (default 128MB), and maximum time (age) of various stages of the logging pipeline. Most defaults should be sane, but you should always audit for your environment.
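As a minimal sketch of tuning, assuming flag names along the lines of the help output (the exact names are an assumption here; run oklog ingeststore -help for the authoritative list on your version), a node with two weeks of retention and smaller segment files might be started like this:

$ oklog ingeststore -store.segment-retain 336h -store.segment-target-size 67108864  # flag names assumed; verify with oklog ingeststore -help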

Large installations

If you have relatively large log volume, you can split the ingest and store (query) responsibilities. Ingest nodes make lots of sequential writes, and benefit from fast disks and moderate CPU. Store nodes make lots of random reads and writes, and benefit from large disks and lots of memory. Both ingest and store nodes join the same cluster, so provide them with the same set of peers.

ingest1$ oklog ingest -cluster 10.1.0.1 -peer ...
ingest2$ oklog ingest -cluster 10.1.0.2 -peer ...

store1$ oklog store -cluster 10.1.9.1 -peer ...
store2$ oklog store -cluster 10.1.9.2 -peer ...
store3$ oklog store -cluster 10.1.9.3 -peer ...

To add more raw ingest capacity, add more ingest nodes to the cluster. To add more storage or query capacity, add more store nodes. Also, make sure you have enough store nodes to consume from the ingest nodes without backing up.
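For example, a fourth store node would join with the same peer list as the others:

store4$ oklog store -cluster 10.1.9.4 -peer ...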

Forwarding

The forwarder is basically just netcat with some reconnect logic. Pipe the stdout/stderr of your service to the forwarder, configured to talk to your ingesters.

$ ./myservice | oklog forward ingest1 ingest2

OK Log integrates in a straightforward way with runtimes like Docker and Kubernetes. See the Integrations page for more details.
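For example, with Docker, one simple approach is to pipe a container's log stream into the forwarder (the container and ingester names here are illustrative):

$ docker logs -f mycontainer 2>&1 | oklog forward ingest1 ingest2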

Querying

Querying is an HTTP GET to /query on any of the store nodes. OK Log comes with a query tool to make it easier to play with. A good habit is to first use the -stats flag to refine your query. When you're satisfied it's sufficiently constrained, drop -stats to get results.
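For example, a first pass over the same window as the query below might be:

$ oklog query -from 2h -to 1h -q "myservice.*(WARN|ERROR)" -regex -stats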

$ oklog query -from 2h -to 1h -q "myservice.*(WARN|ERROR)" -regex
2016-01-01 10:34:58 [myservice] request_id 187634 -- [WARN] Get /check: HTTP 419 (0B received)
2016-01-01 10:35:02 [myservice] request_id 288211 -- [ERROR] Post /ok: HTTP 500 (0B received)
2016-01-01 10:35:09 [myservice] request_id 291014 -- [WARN] Get /next: HTTP 401 (0B received)
 ...

To query structured logs, combine a basic grep filter expression with a tool like jq.

$ oklog query -from 1h -q /api/v1/login
{"remote_addr":"10.34.115.3:50032","path":"/api/v1/login","method":"POST","status_code":200}
{"remote_addr":"10.9.101.113:51442","path":"/api/v1/login","method":"POST","status_code":500}
{"remote_addr":"10.9.55.2:55210","path":"/api/v1/login","method":"POST","status_code":200}
{"remote_addr":"10.34.115.1:51610","path":"/api/v1/login","method":"POST","status_code":200}
...

$ oklog query -from 1h -q /api/v1/login | jq '. | select(.status_code == 500)'
{
	"remote_addr": "10.9.55.2:55210",
	"path": "/api/v1/login",
	"method": "POST",
	"status_code": 500
}
...

UI

OK Log ships with a basic UI for making queries. You can access it on any store or ingeststore node, on the public API port (default 7650), path /ui. So, e.g. http://localhost:7650/ui.

Further reading

Integrations

Unofficial Docker images

Translation


OK icon by Karthik Srinivas from the Noun Project. Development supported by DigitalOcean.

Comments
  • Web UI

    This work started from the hypothesis that an interface which follows the same design goals as the overall system can add immediate value for the brave who want to start using OK Log, and might convince the doubtful not to discard it completely. To understand early on what might be important for such an interface, I'm putting it out in the open in a very experimental/rough state, and hope to avoid spending energy ineffectively. Therefore everybody should feel encouraged to bring forth their ideas and needs regarding a web interface that interacts (in maybe rich ways) with large log volumes.

    Goals

    • build a tool for users and operators of the system to interact with the dataset efficiently
    • stay true to OK Log's operational simplicity
    • encode/enforce best-practices and workflows for querying

    Non-Goals

    • blindly copying other log tooling
    • arguing for benefits or drawbacks of contemporary frontend technologies

    How to look at it?

    Given you have access to this code branch, start up any of the stores (oklog store, oklog ingeststore) and point your browser to /ui/.
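    A minimal way to try it locally, assuming the branch is built, is to reuse the quickstart command and the UI path from the README:

    $ oklog ingeststore -store.segment-replication-factor 1
    # then browse to http://localhost:7650/ui/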

    Improvements

    Achievable changes in this change-set.

    • [x] simplify range controls
    • [x] remove delay on stats query and fire on initialisation
    • [x] support regex query
    • [x] clarify planning output
    • [ ] responsive ui design (media queries galore)
    • [x] remove ULID column
    • [x] move towards chunked consumption (possibly infinite scrolling)
    • [ ] provide separate ui cmd
    • [ ] encode query in URL for sharing
  • error when querying store

    I have a 6-peer cluster set up: three ingestors and three store nodes. I started piling in a lot of logs for load-testing purposes. After 10 minutes I queried:

    $ oklog query -store log-store-1 -from 1h -q "JID-b6179b2707845b363de309af"

    Results all good (about 5 lines of text).

    After 15 minutes:

    $ oklog query -store log-store-1 -from 1h -q "JID-b6179b2707845b363de309af"

    Results all good (same 5 lines of text).

    After 20 minutes:

    $ oklog query -store log-store-1 -from 1h -q "JID-b6179b2707845b363de309af"

    Nothing.

    In the console for the store node I see:

    ts=2017-04-05T23:06:15.888364628Z level=error during=query_gather status_code=500 err="open /data/logs/01BD05E0P7H7B1WGY576CMHMSX-01BD05GNK917D75W1SNMXV1TMK.flushed: no such file or directory"
    

    I can also query log-store-2 and -3 and I get the same result. In log-store-1 I also sometimes get:

    ts=2017-04-05T23:06:01.113046951Z level=error during=query_gather err="Get http://10.240.0.32:7650/store/_query?from=2017-04-05T22%3A05%3A56Z&to=2017-04-05T23%3A05%3A56Z&q=JID-b6179b2707845b363de309af: net/http: timeout awaiting response headers"
    

    where 32 is store-2.

    ts=2017-04-05T23:06:01.113173666Z level=error during=query_gather err="Get http://10.240.0.40:7650/store/_query?from=2017-04-05T22%3A05%3A56Z&to=2017-04-05T23%3A05%3A56Z&q=JID-b6179b2707845b363de309af: net/http: timeout awaiting response headers"
    

    And where 40 is store-3.

  • ingeststore non functional

    Following the directions in the README I've set up two hosts to test ingeststore. After starting the hosts I get:

    matthew@log-store-1:~$ oklog ingeststore -cluster log-store-1 -peer log-store-1 -peer log-store-2
    ts=2017-03-30T17:36:20.410582124Z level=info cluster=log-store-1:7659
    ts=2017-03-30T17:36:20.410667763Z level=info fast=tcp://0.0.0.0:7651
    ts=2017-03-30T17:36:20.410689283Z level=info durable=tcp://0.0.0.0:7652
    ts=2017-03-30T17:36:20.410707298Z level=info bulk=tcp://0.0.0.0:7653
    ts=2017-03-30T17:36:20.410733243Z level=info API=tcp://0.0.0.0:7650
    ts=2017-03-30T17:36:20.410890213Z level=info ingest_path=data/ingest
    ts=2017-03-30T17:36:20.410950353Z level=info store_path=data/store
    ts=2017-03-30T17:36:20.421332098Z level=debug component=cluster Join=1
    ts=2017-03-30T17:36:20.521814528Z level=warn component=Consumer state=gather replication_factor=2 available_peers=1 err="replication currently impossible"
    ts=2017-03-30T17:36:21.522022322Z level=warn component=Consumer state=gather replication_factor=2 available_peers=1 err="replication currently impossible"
    ts=2017-03-30T17:36:22.522240947Z level=warn component=Consumer state=gather replication_factor=2 available_peers=1 err="replication currently impossible"
    ts=2017-03-30T17:36:23.621790312Z level=warn component=Consumer state=gather replication_factor=2 available_peers=1 err="replication currently impossible"
    ts=2017-03-30T17:36:24.621966192Z level=warn component=Consumer state=gather replication_factor=2 available_peers=1 err="replication currently impossible"
    ts=2017-03-30T17:36:25.42175071Z level=warn component=cluster NumMembers=1
    ts=2017-03-30T17:36:25.622150195Z level=warn component=Consumer state=gather replication_factor=2 available_peers=1 err="replication currently impossible"
    ts=2017-03-30T17:36:26.721837292Z level=warn component=Consumer state=gather replication_factor=2 available_peers=1 err="replication currently impossible"
    ts=2017-03-30T17:36:27.722031419Z level=warn component=Consumer state=gather replication_factor=2 available_peers=1 err="replication currently impossible"
    ts=2017-03-30T17:36:28.722254832Z level=warn component=Consumer state=gather replication_factor=2 available_peers=1 err="replication currently impossible"
    ts=2017-03-30T17:36:29.821805223Z level=warn component=Consumer state=gather replication_factor=2 available_peers=1 err="replication currently impossible"
    

    ....

    matthew@log-store-2:~$ oklog ingeststore -cluster log-store-2 -peer log-store-1 -peer log-store-2
    ts=2017-03-30T17:36:34.307731343Z level=info cluster=log-store-2:7659
    ts=2017-03-30T17:36:34.307863365Z level=info fast=tcp://0.0.0.0:7651
    ts=2017-03-30T17:36:34.307892432Z level=info durable=tcp://0.0.0.0:7652
    ts=2017-03-30T17:36:34.307917061Z level=info bulk=tcp://0.0.0.0:7653
    ts=2017-03-30T17:36:34.307940364Z level=info API=tcp://0.0.0.0:7650
    ts=2017-03-30T17:36:34.308098722Z level=info ingest_path=data/ingest
    ts=2017-03-30T17:36:34.308193966Z level=info store_path=data/store
    ts=2017-03-30T17:36:34.319719011Z level=debug component=cluster Join=2
    ts=2017-03-30T17:36:41.420291562Z level=warn component=Consumer state=gather replication_factor=2 available_peers=1 err="replication currently impossible"
    ts=2017-03-30T17:36:42.420504608Z level=warn component=Consumer state=gather replication_factor=2 available_peers=1 err="replication currently impossible"
    ^Cts=2017-03-30T17:36:43.420713048Z level=debug component=Compacter shutdown_took=7.052µs
    received signal interrupt
    
  • panic in github.com/djherbis/nio

    Was doing a simple test as described in the quickstart. When querying, I got this panic from the ingeststore:

    % ~/Downloads/oklog-0.1.0-darwin-amd64 ingeststore -store.segment-replication-factor 1
    ts=2017-01-17T14:20:54Z level=info cluster=0.0.0.0:7659
    ts=2017-01-17T14:20:54Z level=info fast=tcp://0.0.0.0:7651
    ts=2017-01-17T14:20:54Z level=info durable=tcp://0.0.0.0:7652
    ts=2017-01-17T14:20:54Z level=info bulk=tcp://0.0.0.0:7653
    ts=2017-01-17T14:20:54Z level=info API=tcp://0.0.0.0:7650
    ts=2017-01-17T14:20:54Z level=info ingest_path=data/ingest
    ts=2017-01-17T14:20:54Z level=info store_path=data/store
    ts=2017-01-17T14:20:54Z level=debug component=cluster Join=0
    panic: runtime error: slice bounds out of range
    
    goroutine 1858 [running]:
    github.com/djherbis/nio.(*PipeWriter).Write(0xc42000e050, 0xc4205ec99c, 0x1e, 0x664, 0x0, 0x0, 0x0)
    	/Users/peter/src/github.com/djherbis/nio/sync.go:135 +0x30a
    github.com/oklog/oklog/pkg/store.newConcurrentFilteringReadCloser.func1(0x2693d80, 0xc421f83620, 0xc42000e050, 0xc4200148c0)
    	/Users/peter/src/github.com/oklog/oklog/pkg/store/query.go:265 +0x198
    created by github.com/oklog/oklog/pkg/store.newConcurrentFilteringReadCloser
    	/Users/peter/src/github.com/oklog/oklog/pkg/store/query.go:276 +0x210
    

    Was running ingeststore like so:

    % ~/Downloads/oklog-0.1.0-darwin-amd64 ingeststore -store.segment-replication-factor 1
    

    a producer like so:

    % while true; do echo hi; done | ~/Downloads/oklog-0.1.0-darwin-amd64 forward localhost
    

    and this query:

    % ~/Downloads/oklog-0.1.0-darwin-amd64 query -from 1m -v -q hi
    -from 2017-01-17T10:25:07-04:00 -to 2017-01-17T10:26:07-04:00
    Get http://localhost:7650/store/query?from=2017-01-17T10%3A25%3A07-04%3A00&to=2017-01-17T10%3A26%3A07-04%3A00&q=hi: EOF
    

    Will try and dig in but perhaps you'll spot the problem sooner.

  • How to use with a service like Heroku?

    Hey there, I'd like to use OK Log, but I'm struggling to figure out a good way to run it with my Heroku app. The "best" thing I've come up with is:

    Copy cmd/oklog into my app's directory and let Heroku compile it and make it available (Heroku automatically compiles and makes things in cmd/ available to run). I could then pipe my app into oklog in my Procfile.

    The only downside to this is that it seems my STDOUT will be swallowed by oklog and I'll lose Heroku logs.

    Is there a better way?

  • store: implement streaming queries

    There are lots of use cases that are well served by streaming queries: that is, setting up a query that delivers results as they arrive at each store node. At first thought, this would look like:

    • User makes a stream-type query to any store node
    • The store node broadcasts that query to all store nodes
    • Each store node registers the query in some kind of table as a websocket
    • When new segments are replicated, records are matched against each registered streaming query
    • Matching records are written to the websocket
    • The originating store node uses some kind of ringbuffer to deduplicate records over a time window
    • Every effort should be made to keep the websocket connections alive: reconnect logic, etc.
  • Log records can be stored in the wrong order

    Hello,

    I've found that log records can sometimes come out in the wrong order.

    Suppose we have one ingester and two forwarders (A and B), and the store node is offline at the beginning. If, for example, the ingester saves two segments, one from A covering time range 2-5 and a second from B covering time range 3-6, then when we start the store it will fetch those segments and append them to the "active" buffer (Consumer.active in the code). That buffer will hold records in time order 2-5-3-6, and these records are never reordered, if I understood the code correctly. So as a result we'll see query output like:

    default aaaaaaaaa 2018-06-13T18:07:11+03:00 foo 000000197 G
    default aaaaaaaaa 2018-06-13T18:07:11+03:00 foo 000000198 J
    default aaaaaaaaa 2018-06-13T18:07:11+03:00 foo 000000199 F
    default bbbbbbb 2018-06-13T18:07:09+03:00 foo 000000150 T
    default bbbbbbb 2018-06-13T18:07:09+03:00 foo 000000151 N
    default bbbbbbb 2018-06-13T18:07:09+03:00 foo 000000152 S
    

    I suppose this behaviour is not expected, so maybe it needs to be fixed. I would be very grateful to anyone who can clarify it.

  • integration: Kubernetes

    We should make it as easy as possible to hook up OK Log to an existing Kubernetes cluster. At first glance, this involves some configuration or manifest files to install forwarders at the appropriate place/s, and an optional set of manifest files to actually deploy an OK Log installation into the cluster. (It probably makes sense to host that off-cluster for most people, but an all-in solution will be nice to have.)

  • Basic setup for k8s

    This is an example of how to get oklog up and running for an evaluation in k8s. I found that trying to evaluate with the Helm chart (https://github.com/oklog/oklog/pull/55) was not reliable enough, as that chart is still a bit short of being dynamic enough for k8s.

  • Buffered forwarder for #15

    @peterbourgon please see this experimental implementation for #15, although it's a bit racy for the time being. Please don't consider this complete yet.

    • It basically works - when I kill my ingeststore the forwarder buffers some messages, and then after restarting the ingeststore it reconnects & forwards the buffer :+1: ... but sometimes a couple of messages get sent twice - it needs work still.
    • I've been testing it by piping date once per second into oklog forward -buf -bufsize=5 localhost, then repeatedly querying oklog query -from 30s in another window, and then killing/restarting my ingeststore for 5-15 seconds (roughly the commands sketched after this list). It's pretty easy to see what's going on.
    • For the buffer I tried using container/ring from the standard library. I added a mutex and a couple of other fields to maintain state. Seems OKish, but not very simple really. Any advice on how you envisioned this?
    • I chose messages as the unit for buffering, rather than bytes. I figured it fits the forwarding code reasonably well.
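    A minimal sketch of that test loop, assuming the -buf and -bufsize flags from this branch and a single local ingeststore:

    $ oklog ingeststore -store.segment-replication-factor 1
    $ while true; do date; sleep 1; done | oklog forward -buf -bufsize=5 localhost
    $ oklog query -from 30s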

    It's late now so I just thought I'd elicit some feedback for now, as you may have had something very different in mind. Cheers

  • Syslog integration

    Is there any pattern for integrating syslog streams into this? If not, would you be interested in a patch that accepted the syslog protocol? This seems to be a standard way to handle containers.

  • Web UI Improvements

    I have a few suggestions for improvements that can be made to the current OK Log Web UI:

    1. Entries are shown in time ascending order. Having an option to choose descending might be helpful.

    2. Allowing the UI to specify to return at most N results would also be useful.

    3. Timestamps are not displayed per log item. These are collected as part of the ULID, but displaying these in a human-readable format may help.

    4. The bottom log line is covered by the floating debug footer, I believe this is a bug.

    5. Editing the start time, query mode (plain/regex), or streaming settings results in "Your query hasn't been planned yet." This means someone needs to edit the query again to see an estimate of the data returned.

  • Make log consumption by the store a little more aggressive

    Loop log-segment gathering by the store/ingeststore until a configurable timeout is reached or there is no more data. This change allows bursts of log entries on the ingest nodes to be caught quickly.

  • How does oklog store data?

    How does oklog store the data? Does it store data in JSON format or in text format?

    On a request to localhost:port/store/stream I am getting a bunch of data, and I don't understand how/what it is displaying.

  • What's the best way to test that the cluster in a large installation is working

    Hi, I have the following large installation (2 ingests and 3 stores) started as follows:

    Commands: oklog-0.3.2-linux-amd64 ingest -cluster 172.27.47.21 -peer 172.27.47.21 -peer 172.27.47.22 -peer 172.27.47.23 -peer 172.27.47.24 -peer 172.27.47.25

    oklog-0.3.2-linux-amd64 ingest -cluster 172.27.47.22 -peer 172.27.47.21 -peer 172.27.47.22 -peer 172.27.47.23 -peer 172.27.47.24 -peer 172.27.47.25

    oklog-0.3.2-linux-amd64 store -cluster 172.27.47.23 -peer 172.27.47.21 -peer 172.27.47.22 -peer 172.27.47.23 -peer 172.27.47.24 -peer 172.27.47.25

    oklog-0.3.2-linux-amd64 store -cluster 172.27.47.24 -peer 172.27.47.21 -peer 172.27.47.22 -peer 172.27.47.23 -peer 172.27.47.24 -peer 172.27.47.25

    oklog-0.3.2-linux-amd64 store -cluster 172.27.47.25 -peer 172.27.47.21 -peer 172.27.47.22 -peer 172.27.47.23 -peer 172.27.47.24 -peer 172.27.47.25

    Output: First ingest keeps showing the following output until the second ingest joins the cluster: level=warn component=cluster NumMembers=1 msg="I appear to be alone in the cluster"

    Second ingest shows the following output: level=info ingest_path=data/ingest

    First store keeps showing the following output until the second store joins the cluster: ts=2018-08-02T13:06:04.503169534Z level=warn component=Consumer op=gather warning="replication factor 2, available peers 1: replication currently impossible"

    Second and third stores show the following output (respectively): ts=2018-08-02T13:07:44.610961417Z level=info StoreLog=data/store ts=2018-08-02T13:08:08.235069864Z level=info StoreLog=data/store

    Testing the cluster: When I use testsvc to test the service I get output like the following, although I haven't hooked the cluster up to any application or source of logs:

    2018-08-02T14:26:32+01:00 foo 000000100 SBE2 0T30 0Y43 39E4 4Q7B TK7N VVTR K2HG VSKS Z8P6 R3A8 D49G FGWD 6QH2 A9Y7 41JJ 708W 6TGM 5RZQ AG4J ZGDJ JQVR PZVN ZZ8W A6WF ZTK0 0MBT WPH6 E5DH 3APC 58K8 KJMM 25GX Y440 HCWR SJ4D M8BG S21B 2B1M 1NAB XM1J 4D7Z 0QZZ 220Z QM5E 2BFN B216 4HM7 CMQN AXVT HEJ0 XGHV 17S8 WQXS M23N 55F6 RZHT XG9D 72AJ 9DGW S8NS 6RSV T2A1 FDJ2 771N 6HMQ WFQN 1KY3 3TD6 0DRW 2WWJ 1TGF CQ6W 8EMB B030 2TG2 K3Z2 Z9HQ DE1P HPPK BCZV SBBH 2RKD 6S16 DR8J P7DS 26YB Y4KC 4X8T 6E18 DGHE 8CDA 4KRG PA8W N 2018-08-02T14:26:32+01:00 foo 000000101 01SF32MTSYYSDGM 8N089YN3DFTGK0W RNV9BJ1Q03130GE FDEH838MRXE1PY4 DJKRTK0K1YD0BW3 TCVTZP4Q9SHEPRS ZDBE2N09XWZ3CDG JT1KJ9F8EPKJSYX W8K2EGX034KPZS9 8RS8QZ9GPVAMN24 MWM990KTNSSJEH5 T6VWGA30Z4SWCF6 DT8EKVC1E105BZX G1SG4TYCDKK4C1A GA545A0EK52MMSK THZXVHVV9DS8ER7 A1MD9Q4B93ED3X3 895GSW94RDQSCH0 Z05D63ZJKG8ZPFW RTKGT2VV5PC9NM1 85MQC6SG408DPKT V94F94H7B4YYTX0 GJ4E7YJPAJSG3TJ F6T6H3D79QBYVQZ MQY4EXTYDSW2AKC QY9NWDPYB4A30ZF DZF0NT2F7W056KN PDFPV5RDBCHW0V9 XFSBHZ65V497TQ2 ZD62Z4R 2018-08-02T14:26:33+01:00 foo 000000102 8XN18 E1HZC 6TS9H BPEP3 ARMCE CV25Y 0Q69D PFDHQ 2CDC6 Z708X Z3BFN EF20N Y2VP5 PAANT BW4WN EA8HJ 2SFX7 9BZ1V FD4VC 1A4ZF 1PP3W PBZJC 2B11Z GD0QK 419YZ Y0X2T BBJ2B ACX9Y G7X1W 4QN1Q CXNP9 JQAKB RETB7 0C6DM DBXSG 9MAVT SEFJN 1286S 5BY06 JP8SC XSAQD TYT3B T5FK7 JQSXX FNE53 DN71G 1R8J3 QSBTN HM7A2 PKAMV C6J5H CJKYT AHQK1 3SMJJ YTVT6 4AP5N 3RB3P 1WKS6 CP7FC BB7P4 VZ7QQ FT3JS 27T71 90F97 K3JSW XX6KA 8WJEC YQ5T4 719J9 Z5425 NK04S M966W 1VCM7 27TMX 8BHB6 090VN 0108D NNR48 JPH 2018-08-02T14:26:33+01:00 foo 000000103 J5S8CS5GJJNMQRK 41M7KPB5S8ZY686 SYT1ZE40XSETGN6 G608YKWF05C4EAK GBEVQEJWE6M3MYZ W1AGX1QE0NT47V4 91VJQPRRVG61AJM MTRTKJSPY98HMW2 QNHCEZ9FJBT93NV GEX84DTXNFHJW7T 7HRJ9MT6NK5AKQQ PN9E9QDN002M5TQ 4Z52WC0JMB491DV ZPE3RCSKK0XKTC8 0BPMVC63K9J8ZGS YEKGB1P84DYJM8W XHJ8TD31MRS339D QC2N7285DS17SP1 RHNH7NZHGGV7C1G VN9KBXR5S4MQF8A 6929BYWCFE5W8GH 7S26H0TAP8H15XJ F17MCQTQDJSG1ZF 71A6E7SVJZB2JZV XWWPYH809F2FWZG 4XVM7Y88Z4KQSH6 YRWG31Y41BRS50N 1N12451GAM8CXCH QA62Q85H7J0JHN0 2WT4BGG 2018-08-02T14:26:33+01:00 foo 000000104 S6YJ4SDHNDNMCCZ3 MX86GFKJCC921JPA B710CFZVY5HE2HHT RZP0AMF0A7AJDDFK S9YVY0SHS0HXY56H VAXHE1DHZFNBQ7NY 5ZAC1E70MKPQAM39 PKTVTENGJB5XY91P F674EXK1464CE8TV Q833271WAYP6PX1T 1RRKZXMZSCZ1MH5V R2TK9E35NP7B7VW3 4WQF328Y7GSSYZEH TQ48GYQG7SQK3774 MCCZRSANK695A020 7C08P0VY2CVJ2719 14Z9RAC5736CRZB9 KCRQ822P39ARB2YB HM6RSVAXP17SHYTV Q65QGX2W2SD37MT6 MYN96N87Q1XK0P5X 9SA0G4VGNS5MTA6R 57E6JVWC7EKERHV8 Z2K5RAEJP8F2YE7C WYKAMRW91YDW52QZ 93F320FVWJA5061X Z1NTG5W9YK33ZR5T H3GA9G5AYBWF

    So my question is: what is the best way to test that the cluster is working before forwarding any logs to it (using output, a test, or a member-list command), since the testsvc output doesn't seem accurate in my opinion (unless I am doing something wrong).

    Thanks, Hoch

  • Local node unable to connect to self after network change

    Observed the following error:

    ts=2018-07-29T14:09:48.652160779Z level=warn component=Consumer op=gather warning="Get http://192.168.0.50:7650/ingest/next: dial tcp 192.168.0.50:7650: connect: no route to host" msg="ingester 192.168.0.50:7650, during /next: fatal error"

    This happened during local development and with the following start command:

    oklog ingeststore -ui.local -store.segment-replication-factor 1

    My assumption is that this is due to local IP addresses changing when switching networks while traveling. Just wanted to leave it here for a second look.

  • Add ability to query by topic

    I believe one of the most popular query filters would be "get log records from a particular service and filter by some string"; services can be identified by topic in oklog, so you need to be able to filter records by topic somehow.

    You can do it by providing a regexp, like "^.*?", but this regexp works very slowly. So this patch adds an additional filter which works much faster.

    I've generated a log of 1.5 GB and 20,000,000 records. Filtering by regexp takes 18 seconds, while the separate filter by topic plus a plain string takes only 2.5 seconds.
