Distributed merge sort for large sets across nodes in a network

distMergeSort

An implementation of mergesort distributed across nodes used to sort large sets.

Introduction

Merge sort partitions sets so that they can be recursively sorted and then merged back to form a single sorted set. They can be either split across several threads in a node or across multiple nodes across a network.

But why not just use QuickSort?

Quicksort is indeed much faster than merge sort, but is a hassle when the sets to be sorted gets too big (>100M elements) that it does not fit in a single node's memory. The following implementation aims to tackle this by splitting these large sets and distributing them across nodes in a network. A trick here is we dont recursively split them until it reaches two elements, but split them until they reach a certain count so that they can then be sorted using quicksort and then be sent back to the sender node to be merged.

Example

$ curl -X POST localhost:8001/sort -d {"values": [5,4,3,2,1]} => [1,2,3,4,5]

Steps

To provision the cluster:

$ git clone https://github.com/el10savio/distMergeSort
$ cd distMergeSort
$ make provision

This creates a 3 node sort cluster established in their own docker network.

To view the status of the cluster

$ make info

This provides information on the cluster and its associated ports to access each node. An example of the output seen in make info would be like below:

d3fd26dd4df3  sort  "/go/bin/sort"  2 hours ago  Up 2 hours  0.0.0.0:8004->8080/tcp  peer-1
8830feb6cd68  sort  "/go/bin/sort"  2 hours ago  Up 2 hours  0.0.0.0:8003->8080/tcp  peer-0

Now we can also send requests to sort values from any peer node using its port allocated.

$ curl -i -X POST localhost:<peer-port>/sort -d {"values": <values>}

In the logs for each peer docker container, we can see the logs of the peer nodes partitioning and sorting the sent list.

To tear down the cluster and remove the built docker images:

$ make clean

This is not certain to clean up all the locally created docker images at times. You can do a docker rmi to manually delete them.

Testing

To provision the cluster and run automated end to end tests you can use make e2e. This uses BATS bash testing to run curl requests to each node and asserts the output received.

$ make e2e
Running E2E Testing On Sort Cluster
bash scripts/tests.sh
Provisioning Cluster
Cluster Sanity Tests
1..2
ok 1 Check Replicas Count
ok 2 Check Replicas Are Available
Sort Tests
1..3
ok 1 Sort Empty List
ok 2 Sort Basic List
ok 3 Sort Basic List II
Large Sort Tests
1..1
ok 1 Sort Large List
Tearing Down Cluster
Similar Resources

Ipcalc-contains - Golang micro-app, which check whether an IP address belongs to a given network

Ipcalc-contains - Golang micro-app, which check whether an IP address belongs to a given network

ipcalc-contains Golang micro-app, which check whether an IP address belongs to a given network I use it as an addition to standard ipcalc binary distr

Jan 6, 2022

Implementation of external merge sort with 2-way merge.

External merge sort An implementation of external merge sort algorithm (with 2-way merge) written in Go. This particular implementation sorts strings.

Jan 16, 2022

Merge Mock - testing tool for the Ethereum Merge

MergeMock Experimental debug tooling, mocking the execution engine and consensus node for testing. work in progress Quick Start To get started, build

Oct 21, 2022

A tool to find all duplicates in large sets of text documents.

⊧ dupi Dupi is an engine for identifying and exploring duplicative text in sets of documents. Status Dupi is in alpha/early beta development stage. Pl

Dec 23, 2022

Aquatone is a tool for visual inspection of websites across a large amount of hosts and is convenient for quickly gaining an overview of HTTP-based attack surface.

Aquatone is a tool for visual inspection of websites across a large amount of hosts and is convenient for quickly gaining an overview of HTTP-based attack surface.

Jan 6, 2023

Consul is a distributed, highly available, and data center aware solution to connect and configure applications across dynamic, distributed infrastructure.

Consul Website: https://www.consul.io Tutorials: HashiCorp Learn Forum: Discuss Consul is a distributed, highly available, and data center aware solut

Jan 2, 2023

Sync distributed sets using bloom filters

goSetReconciliation An implementation to sync distributed sets using bloom filters. Based on the paper "Low complexity set reconciliation using Bloom

Jan 4, 2022

go-fastdfs 是一个简单的分布式文件系统(私有云存储),具有无中心、高性能,高可靠,免维护等优点,支持断点续传,分块上传,小文件合并,自动同步,自动修复。Go-fastdfs is a simple distributed file system (private cloud storage), with no center, high performance, high reliability, maintenance free and other advantages, support breakpoint continuation, block upload, small file merge, automatic synchronization, automatic repair.(similar fastdfs).

go-fastdfs 是一个简单的分布式文件系统(私有云存储),具有无中心、高性能,高可靠,免维护等优点,支持断点续传,分块上传,小文件合并,自动同步,自动修复。Go-fastdfs is a simple distributed file system (private cloud storage), with no center, high performance, high reliability, maintenance free and other advantages, support breakpoint continuation, block upload, small file merge, automatic synchronization, automatic repair.(similar fastdfs).

中文 English 愿景:为用户提供最简单、可靠、高效的分布式文件系统。 go-fastdfs是一个基于http协议的分布式文件系统,它基于大道至简的设计理念,一切从简设计,使得它的运维及扩展变得更加简单,它具有高性能、高可靠、无中心、免维护等优点。 大家担心的是这么简单的文件系统,靠不靠谱,可不

Jan 8, 2023

A LoRaWAN nodes' and network simulator that works with a real LoRaWAN environment (such as Chirpstack) and equipped with a web interface for real-time interaction.

A LoRaWAN nodes' and network simulator that works with a real LoRaWAN environment (such as Chirpstack) and equipped with a web interface for real-time interaction.

LWN Simulator A LoRaWAN nodes' simulator to simulate a LoRaWAN Network. Table of Contents General Info Requirements Installation General Info LWN Simu

Nov 20, 2022

🌌 A libp2p DHT crawler that gathers information about running nodes in the network.

🌌 A libp2p DHT crawler that gathers information about running nodes in the network.

A libp2p DHT crawler that gathers information about running nodes in the network. The crawler runs every 30 minutes by connecting to the standard DHT bootstrap nodes and then recursively following all entries in the k-buckets until all peers have been visited.

Dec 27, 2022

LNC is a lightning network capital management tool built for routing nodes.

LNC is a lightning network capital management tool built for routing nodes.

Dec 21, 2021

Pokt-calculator - A reward explorer for Pocket Network nodes

Pokt-calculator - A reward explorer for Pocket Network nodes

POKT Calculator A reward explorer for Pocket Network nodes. Quick start You can

Oct 23, 2022

Tpu-traffic-classifier - This small program creates ipsets and iptables rules for nodes in the Solana network

TPU traffic classifier This small program creates ipsets and iptables rules for

Nov 23, 2022

Sabakan is a versatile network boot server designed for large on-premise data centers.

Sabakan is a versatile network boot server designed for large on-premise data centers.

Sabakan is a versatile network boot server designed for large on-premise data centers. Currently, it is made only for Flatcar Container Linux.

Jan 2, 2023

BlobStore is a highly reliable,highly available and ultra-large scale distributed storage system

BlobStore Overview Documents Build BlobStore Deploy BlobStore Manage BlobStore License Overview BlobStore is a highly reliable,highly available and ul

Oct 10, 2022

Portexporter - A HTTP(S) Proxy that registers one or more Gateways across a network boundary and proxies requests to those Gateways

portexporter An HTTP(S) Proxy that registers one or more Gateways across a netwo

Mar 21, 2022

network-node-manager is a kubernetes controller that controls the network configuration of a node to resolve network issues of kubernetes.

network-node-manager is a kubernetes controller that controls the network configuration of a node to resolve network issues of kubernetes.

Network Node Manager network-node-manager is a kubernetes controller that controls the network configuration of a node to resolve network issues of ku

Dec 18, 2022

K8s-network-config-operator - Kubernetes network config operator to push network config to switches

Kubernetes Network operator Will add more to the readme later :D Operations The

May 16, 2022
Implementation of external merge sort with 2-way merge.

External merge sort An implementation of external merge sort algorithm (with 2-way merge) written in Go. This particular implementation sorts strings.

Jan 16, 2022
Sync distributed sets using bloom filters

goSetReconciliation An implementation to sync distributed sets using bloom filters. Based on the paper "Low complexity set reconciliation using Bloom

Jan 4, 2022
Stalin sort in multiple languages!

stalin-sort Stalin sort in multiple languages, contributions are welcome! Motivation This repo is motivated by this tweet, this tweet contains a refer

Jan 14, 2022
Code in this repository solves the game Water Sort Puzzle for phones
Code in this repository solves the game Water Sort Puzzle for phones

Water Sort Puzzle Solver This code can solve a game called "Water Sort Puzzle" available on APPStore and PlayMarket. Start Guide Firstly, you have to

Nov 5, 2022
Smartsort - A smart sorting algorithm for Go to sort filename containing digits that is not zero padded

smartsort A smart sorting algorithm for Go to sort filename containing digits th

Mar 26, 2022
Type-safe, zero-allocation sets for Go

Set Package set is a type-safe, zero-allocation port of the excellent package fatih/set. It contains sets for most of the basic types and you can gene

Jan 5, 2023
Generic types that are missing from Go, including sets, trees, sorted lists, etc.

go-typ Generic types that are missing from Go, including sets, trees, sorted lists, etc. All code is implemented with 0 dependencies and in pure Go co

Dec 4, 2022
A small flexible merge library in go
A small flexible merge library in go

conjungo A merge utility designed for flexibility and customizability. The library has a single simple point of entry that works out of the box for mo

Dec 27, 2022
A package for Go that can be used for range queries on large number of intervals

go-stree go-stree is a package for Go that can be used to process a large number of intervals. The main purpose of this module is to solve the followi

May 14, 2022
estruct traverses javascript projects and maps all the dependencies and relationships to a JSON. the output can be used to build network visualizations of the project and document the architecture.
estruct traverses javascript projects and maps all the dependencies and relationships to a JSON. the output can be used to build network visualizations of the project and document the architecture.

EStruct traverses javascript projects and maps all the dependencies and relationships to a JSON. The output can be used to build network visualizations of the project and document the architecture.

Jan 27, 2022