Akutan

There's a blog post that's a good introduction to Akutan.

Akutan is a distributed knowledge graph store, sometimes called an RDF store or a triple store. Knowledge graphs are suitable for modeling data that is highly interconnected by many types of relationships, like encyclopedic information about the world. A knowledge graph store enables rich queries on its data, which can be used to power real-time interfaces, to complement machine learning applications, and to make sense of new, unstructured information in the context of the existing knowledge.

How to model your data as a knowledge graph and how to query it will feel a bit different for people coming from SQL, NoSQL, and property graph stores. In a knowledge graph, data is represented as a single table of facts, where each fact has a subject, predicate, and object. This representation enables the store to sift through the data for complex queries and to apply inference rules that raise the level of abstraction. Here's an example of a tiny graph:

subject predicate object

To learn about how to represent and query data in Akutan, see docs/query.md.
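
For a concrete picture of the model before diving into the query docs, here is a minimal, hypothetical sketch in Go; the struct and the example facts are illustrative only and are not Akutan's actual types or data:

```go
package main

import "fmt"

// Fact is an illustrative stand-in for a knowledge-graph fact: a single row
// with a subject, predicate, and object. It is not Akutan's actual Go type.
type Fact struct {
	Subject   string
	Predicate string
	Object    string
}

func main() {
	// A tiny hypothetical graph: two facts about the same subject.
	graph := []Fact{
		{"<John_Scalzi>", "<lives>", "<Ohio>"},
		{"<John_Scalzi>", "<born>", "1969"},
	}
	for _, f := range graph {
		fmt.Println(f.Subject, f.Predicate, f.Object)
	}
}
```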

Akutan is designed to store large graphs that cannot fit on a single server. It's scalable in how much data it can store and the rate of queries it can execute. However, Akutan serializes all changes to the graph through a central log, which fundamentally limits the total rate of change. The rate of change won't improve with a larger number of servers, but a typical deployment should be able to handle tens of thousands of changes per second. In exchange for this limitation, Akutan's architecture is a relatively simple one that enables many features. For example, Akutan supports transactional updates and historical global snapshots. We believe this trade-off is suitable for most knowledge graph use cases, which accumulate large amounts of data but do so at a modest pace. To learn more about Akutan's architecture and this trade-off, see docs/central_log_arch.md.
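As a rough illustration of why this design makes snapshots cheap (a sketch only, not Akutan's actual code): every view applies entries from the single log in the same order, so "the state as of log index N" names the same consistent snapshot everywhere.

```go
package main

import "fmt"

// Minimal sketch, assuming only what the README describes: all writes pass
// through one ordered log, so a log index identifies a global snapshot.
type Entry struct {
	Index uint64
	Facts []string // facts added by this transactional update
}

type View struct {
	applied uint64
	facts   []string
}

// Apply consumes entries strictly in log order. Every view that has applied
// entries up to index N holds exactly the same state, which is what makes
// historical snapshots ("as of index N") straightforward.
func (v *View) Apply(e Entry) {
	v.facts = append(v.facts, e.Facts...)
	v.applied = e.Index
}

func main() {
	v := &View{}
	v.Apply(Entry{Index: 1, Facts: []string{"<a> <p> <b>"}})
	v.Apply(Entry{Index: 2, Facts: []string{"<b> <p> <c>"}})
	fmt.Println("snapshot at index", v.applied, "contains", len(v.facts), "facts")
}
```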

Akutan isn't ready for production-critical deployments, but it's useful today for some use cases. We've run a 20-server deployment of Akutan for development purposes and off-line use cases for about a year, which we've most commonly loaded with a dataset of about 2.5 billion facts. We believe Akutan's current capabilities exceed this capacity and scale; we haven't yet pushed Akutan to its limits. The project has a good architectural foundation on which additional features can be built and higher performance could be achieved.

Akutan needs more love before it can be used for production-critical deployments. Much of Akutan's code consists of high-quality, documented, unit-tested modules, but some areas of the code base are inherited from Akutan's earlier prototype days and still need attention. In other places, some functionality is lacking before Akutan could be used as a critical production data store, including deletion of facts, backup/restore, and automated cluster management. We have filed GitHub issues for these and a few other things. There are also areas where Akutan could be improved that wouldn't necessarily block production usage. For example, Akutan's query language is not quite compatible with SPARQL, and its inference engine is limited.

So, Akutan has a nice foundation and may be useful to some people, but it also needs additional love. If that's not for you, here are a few alternative open-source knowledge and property graph stores that you may want to consider (we have no affiliation with these projects):

  • Blazegraph: an RDF store. Supports several query languages, including SPARQL and Gremlin. Disk-based, single-master, scales out for reads only. Seems unmaintained. Powers https://query.wikidata.org/.
  • Dgraph: a triple-oriented property graph store. GraphQL-like query language, no support for SPARQL. Disk-based, scales out.
  • Neo4j: a property graph store. Cypher query language, no support for SPARQL. Single-master, scales out for reads only.
  • See also Wikipedia's Comparison of Triplestores page.

The remainder of this README describes how to get Akutan up and running. Several documents under the docs/ directory describe aspects of Akutan in more detail; see docs/README.md for an overview.

Installing dependencies and building Akutan

Akutan has the following system dependencies:

  • It's written in Go. You'll need v1.11.5 or newer.
  • Akutan uses Protocol Buffers extensively to encode messages for gRPC, the log of data changes, and storage on disk. You'll need protobuf version 3. We recommend 3.5.2 or later. Note that 3.0.x is the default in many Linux distributions, but doesn't work with the Akutan build.
  • Akutan's Disk Views store their facts in RocksDB.

On Mac OS X, these can all be installed via Homebrew:

$ brew install golang protobuf rocksdb zstd

On Ubuntu, refer to the files within the docker/ directory for package names to use with apt-get.
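
Once protobuf is installed, you can confirm that the compiler on your PATH is new enough (the exact version reported will vary; anything 3.5.2 or later should work):

$ protoc --version
libprotoc 3.6.1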

After cloning the Akutan repository, pull down several Go libraries and additional Go tools:

$ make get

Finally, build the project:

$ make build

Running Akutan locally

The fastest way to run Akutan locally is to launch the in-memory log store:

$ bin/plank

Then open another terminal and run:

$ make run

This will bring up several Akutan servers locally. It starts an API server that listens on localhost for gRPC requests on port 9987 and for HTTP requests on port 9988, such as http://localhost:9988/stats.txt.
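
As a quick smoke test once the servers are up (assuming the default ports used by make run), you can fetch the stats page mentioned above:

$ curl http://localhost:9988/stats.txt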

The easiest way to interact with the API server is using bin/akutan-client. See docs/query.md for examples. The API server exposes the FactStore gRPC service defined in proto/api/akutan_api.proto.

Deployment concerns

The log

Earlier, we used bin/plank as a log store, but this is unsuitable for real usage! Plank is in-memory only, isn't replicated, and by default, it only keeps 1000 entries at a time. It's only meant for development.

Akutan also supports using Apache Kafka as its log store. This is recommended over Plank for any deployment. To use Kafka, follow the Kafka quick start guide to install Kafka, start ZooKeeper, and start Kafka. Then create a topic called "akutan" (not "test" as in the Kafka guide) with partitions set to 1. You'll want to configure Kafka to synchronously write entries to disk.
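
For example, with a recent Kafka release the topic can be created like this (older releases take --zookeeper localhost:2181 instead of --bootstrap-server; adjust the path to wherever Kafka is installed):

$ bin/kafka-topics.sh --create --topic akutan --partitions 1 --replication-factor 1 --bootstrap-server localhost:9092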

To use Kafka with Akutan, set the akutanLog's type to kafka in your Akutan configuration (default: local/config.json), and update the locator's addresses accordingly (Kafka uses port 9092 by default). You'll need to clear out Akutan's Disk Views' data before restarting the cluster. The Disk Views by default store their data in $TMPDIR/rocksdb-akutan-diskview-{space}-{partition}, so you can delete them all with:

$ rm -rf $TMPDIR/rocksdb-akutan-diskview*
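
For orientation, the Kafka-related fragment of the configuration might look roughly like the snippet below; the exact field nesting here is an assumption, so treat the existing local/config.json as the authoritative reference.

```json
"akutanLog": {
  "type": "kafka",
  "locator": {
    "addresses": ["localhost:9092"]
  }
}
```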

Docker and Kubernetes

This repository includes support for running Akutan inside Docker and Minikube. These environments can be tedious for development purposes, but they're useful as a step towards a modern and robust production deployment.

See the cluster/k8s/Minikube.md file for the steps to build and deploy the Akutan services in Minikube; it also covers building the Docker images.

Distributed tracing

Akutan generates distributed OpenTracing traces for use with Jaeger. To try it, follow the Jaeger Getting Started Guide for running the all-in-one Docker image. The default make run is configured to send traces there, which you can query at http://localhost:16686. The Minikube cluster also includes a Jaeger all-in-one instance.
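
At the time of writing, the guide's all-in-one image can be started roughly as follows (the image name and ports are taken from the Jaeger documentation; check the guide for the current invocation):

$ docker run -d --name jaeger -p 16686:16686 -p 6831:6831/udp jaegertracing/all-in-one:latest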

Development

VS Code

You can use whichever editor you'd like, but this repository contains some configuration for VS Code. We suggest the following extensions:

Override the default settings in .vscode/settings.json with ./vscode-settings.json5.

Test targets

The Makefile contains various targets related to running tests:

| Target | Description |
| --- | --- |
| make test | run all the akutan unit tests |
| make cover | run all the akutan unit tests and open the web-based coverage viewer |
| make lint | run basic code linting |
| make vet | run all static analysis tests, including linting and formatting |

License Information

Copyright 2019 eBay Inc.

Primary authors: Simon Fell, Diego Ongaro, Raymond Kroeker, Sathish Kandasamy

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at https://www.apache.org/licenses/LICENSE-2.0.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.


Note: the project was renamed from Beam to Akutan in July 2019.

Comments
  • run "make run" error

    Running bin/plank shows "plank server started at localhost:20011". However, make run fails with errors like:

    22:23:03 txview-00 | WARN[2019-05-08 14:23:03.055084 UTC]src/github.com/ebay/beam/blog/logspecclient/client.go:222 github.com/ebay/beam/blog/logspecclient.(*Log).Read() Retrying RPC=Read error="rpc error: code = Canceled desc = grpc: the client connection is closing" server="tcp://node24:20011"
    22:23:03 txview-00 | INFO[2019-05-08 14:23:03.055528 UTC]src/github.com/ebay/beam/blog/logspecclient/client.go:455 github.com/ebay/beam/blog/logspecclient.(*Log).connectAnyLocked.func1() Logspec client connecting to server="tcp://node24:20011"
    22:23:10 hashsp-00 | WARN[2019-05-08 14:23:10.884702 UTC]src/github.com/ebay/beam/blog/logspecclient/client.go:222 github.com/ebay/beam/blog/logspecclient.(*Log).Read() Retrying RPC=Read error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 218.93.250.18:20011: i/o timeout"" server="tcp://node24:20011"
    22:23:10 hashsp-00 | INFO[2019-05-08 14:23:10.885279 UTC]src/github.com/ebay/beam/blog/logspecclient/client.go:455 github.com/ebay/beam/blog/logspecclient.(*Log).connectAnyLocked.func1() Logspec client connecting to server="tcp://node24:20011"
    22:23:10 hashsp-01 | WARN[2019-05-08 14:23:10.907590 UTC]src/github.com/ebay/beam/blog/logspecclient/client.go:222 github.com/ebay/beam/blog/logspecclient.(*Log).Read() Retrying RPC=Read error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 218.93.250.18:20011: i/o timeout"" server="tcp://node24:20011"

    How should I fix this?

  • Can't see how to resolve views.proto for Ubuntu 18.04

    Very excited to try this....

    However, I can't seem to get past some of the dependencies for Ubuntu 18.04

    not sure where to resolve "views.proto"

    go install vendor/golang.org/x/tools/cmd/goimports
    go install vendor/honnef.co/go/tools/cmd/staticcheck
    PATH=/home/fils/src/git/beam/bin:/home/fils/.cargo/bin:/home/fils/bin:/home/fils/.cargo/bin:/home/fils/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/lib/jvm/java-10-oracle/bin:/usr/lib/jvm/java-10-oracle/db/bin:/usr/local/go/bin:/usr/local/go/bin:/home/fils/src/go/bin:/home/fils/.cargo/bin:/home/fils/src/flutter/bin:/home/fils/.local/bin:/usr/local/android-studio/bin:/usr/lib/dart/bin:/home/fils/.pub-cache/bin:/usr/lib/jvm/java-10-oracle/bin:/usr/lib/jvm/java-10-oracle/db/bin:/usr/local/go/bin:/usr/local/go/bin:/home/fils/src/go/bin:/home/fils/.cargo/bin:/home/fils/src/flutter/bin:/home/fils/.local/bin:/usr/local/android-studio/bin:/usr/lib/dart/bin:/home/fils/.pub-cache/bin protoc --gogoslick_out=plugins=grpc:src/github.com/ebay/beam/rpc -Isrc:src/vendor:src/github.com/ebay/beam/rpc views.proto
    views.proto: No such file or directory
    Makefile:54: recipe for target 'src/github.com/ebay/beam/rpc/views.pb.go' failed
    make: *** [src/github.com/ebay/beam/rpc/views.pb.go] Error 1
    
  • SPARQL 1.2 W3C Community Group

    Hi everyone,

    Great to see your work with beam, looks very interesting and we are really interested in the discussions you are having about aligning with RDF/SPARQL. I'm referring to this document for example. I think you might be a valuable contributor to a W3C Community Group we just recently launched after a meetup in Berlin in February. It's about defining what could become SPARQL 1.2 and later maybe 2.0.

    This is a W3C Community Group, which is much less formal and more open than a formal W3C group to release a standard. The ones driving it right now are all people that work with SPARQL for many years and some of them implement it as well, so it is very hands-on.

    We are interested in making SPARQL easier to use and add stuff many of us are missing right now. Having people like you involved sounds like a great extension to a possible new standard in the future.

    Feel free to close this issue immediately, I just wanted to make sure that you are aware of what is going on there.

    You can find a list of collected ideas so far in the GitHub repository.

  • Fix ordering of different length literal strings. Fixes #8

    This adds a 0x00 between the string and the trailing languageID for String KGObjects. This ensures that "Bob" and "Bob's House" are ordered correctly. Previously this would be wrong because "Bob" would be "Bob00000ID" and so would be sorted based on the first '0' after Bob.

  • Hard to use beam, need more details

    I must say it's hard to use; more documentation about beam is strongly recommended for users.

    On this point, dgraph is much better, because I know how to deploy my own distributed dgraph system. Although it's not fast to load data through the gRPC interface there either, I think beam has the same problems.

    What's more, I cannot even find a way to load large RDF files, which is too bad.

  • Fix missing keys printing in db-scan. fixes #23

    key.FactKey changed from *keys.FactKey implementing keys.Spec to keys.FactKey implementing keys.Spec, and this is fallout from that. Some quality time with grep didn't show any other places with this issue. This also changes the -bytes output, as the keys are changing to be binary.

  • Better encoding for KGObject / Fact. Fixes #7

    This ditches the 19-char ASCII encoding used in KGObject & Keys for uint64s and switches to an 8-byte big-endian encoding instead. For the SPO/POS keys, unneeded separators are removed and the type prefix is shortened. This cuts the size of a Fact with a KGObject of type Int from 114 bytes to 49 bytes.

  • Remove deprecated InsertFact from API

    Towards #16.

    I also tried removing LookupSP and LookupPO, but the queryFacts stuff in beam-client uses LookupSP to fetch external IDs, so I backed that out.

  • KGObjects encoding of strings needs a null terminator

    When a literal string is encoded into a KGObject value, there's a separator between the end of the string and the language ID, but as the separator is not 0x00 this throws off sorting. (for example "Bob" & "Bob's house" aren't ordered correctly). The separator should be changed to be a null instead. This separator is not used to determine the length of the string, so there's no escaping considerations to be concerned about.

  • Improve KGObject encoding

    KGObject's encoding is inherited from the earlier prototypes, where debugging was more important than performance or space used. There are a number of fields in the key that are encoded as a 19-character ASCII number rather than as an 8-byte binary value.

  • Add kafka & zookeeper configs for kafka docker build

    This fixes the problem with docker build failing for the docker-build-kafka target.

    I've been through the minikube setup end to end with all the local image builds and was able to stand up a working beam cluster at the end.

  • use of vendored package not allowed

    Hi, while building akutan with make build I'm facing the following error.

    go install github.com/ebay/akutan/...
    src/vendor/golang.org/x/net/http2/frame.go:17:2: use of vendored package not allowed
    src/vendor/google.golang.org/grpc/internal/transport/controlbuf.go:28:2: use of vendored package not allowed
    src/vendor/golang.org/x/net/http2/transport.go:33:2: use of vendored package not allowed
    /usr/local/go/src/vendor/golang.org/x/text/secure/bidirule/bidirule.go:15:2: use of vendored package not allowed
    /usr/local/go/src/vendor/golang.org/x/net/idna/idna10.0.0.go:27:2: use of vendored package not allowed
    src/vendor/golang.org/x/net/http2/frame.go:18:2: use of vendored package not allowed
    make: *** [Makefile:89: build] Error 1

    any help appreciated.

  • prod deployment docs

    I am still confused about how to deploy my own beam cluster, and I don't have time to dig into the project structure for it. Can anyone tell me how to deploy a production beam server? Any docs?

    Originally posted by @shanghai-Jerry in https://github.com/eBay/beam/issues/33#issuecomment-494329554

  • Using beam as a library, or at least allow importing its packages

    Unless I'm missing something (a canonical package name maybe?), beam's structure makes it really really hard to go get any of its packages.

    Ideally I'd be really interested in being able to use beam as a library in another service, skipping the gRPC server part, clustering, and optionally even rocksdb.

    But even that's not possible; being able to import and use util/grpc/client to connect to a beam server would be a big improvement over having to re-generate the gRPC stuff in the client service.


    Would there be any chances of getting any of these or accepting PRs for them or other ways of getting to the end goal (other than keeping our own forks)?

    • Move the beam packages to the top level
    • ^ or introduce a canonical url we can import that points to the nested directory.
    • Get the protoc, genny, and other generated files committed.
    • Replace the custom dep tool with go mod (sarama needs to use the new package, and cheggaaa/pb needs to have their mod fix pr merged and the new version imported and everything else seems fine).

    ps This is an amazing project and bql feels very nice and easy to use, thank you for releasing this publicly. :)

  • don't assume all predicates are transitive.

    Currently the query engine assumes that all predicates are transitive, unless it knows the target object is a literal. This is a pretty expensive default. It would be better to only treat predicates explicitly declared as transitive as transitive. the owl:TransitiveProperty predicate seems like the best thing to use to indicate that. The query rewriter could be updated to fetch this property for all the predicates used in the query, and then pass this info along with the rest of the query structure.
