Go library for writing standalone Map/Reduce jobs or for use with Hadoop's streaming protocol

dmrgo is a Go library for writing map/reduce jobs.

It can be used with Hadoop's streaming protocol, but also includes a standalone
map/reduce implementation (including partitioner) for 'small' jobs (~5G-10G).

It is partially based on ideas from Yelp's MrJob package for Python, but since
the Go is statically typed I've tried to make the API match more closely with
Hadoop's Java API.

The traditional "word count" example is in the examples directory.

This code is licensed under the GPLv3, or at your option any later version.

Further reading:

MrJob:
   http://packages.python.org/mrjob/
   https://github.com/Yelp/mrjob

Hadoop map/reduce tutorial:
   http://hadoop.apache.org/common/docs/current/mapred_tutorial.html

Hadoop streaming protocol:
   http://hadoop.apache.org/common/docs/current/streaming.html
Owner
Damian Gryski
Gopher
Damian Gryski
Similar Resources

Implementing SPEEDEX price computation engine in Golang as a standalone binary that exchanges can call

speedex-standalone Implementing SPEEDEX price computation engine in Golang as a standalone binary that exchanges can call. Notes from Geoff About Tato

Dec 1, 2021

PinGo is a standalone and feature-rich tool for common IP-based reachability checking tasks. Ping or Trace and Observe in real-time the statistics.

pingo As a network champion from designing and implementing to troubleshooting large scale networks - I know that is usually not easy for administrato

Sep 26, 2022

Scout is a standalone open source software solution for DIY video security.

Scout is a standalone open source software solution for DIY video security.

scout Scout is a standalone open source software solution for DIY video security. https://www.jonoton-innovation.com Features No monthly fees! Easy In

Oct 25, 2022

A simple, standalone, and lightWeight tool that can do health/status checking, written in Go.

A simple, standalone, and lightWeight tool that can do health/status checking, written in Go.

EaseProbe EaseProbe is a simple, standalone, and lightWeight tool that can do health/status checking, written in Go. Table of Contents EaseProbe 1. Ov

Dec 24, 2022

A library to simplify writing applications using TCP sockets to stream protobuff messages

BuffStreams Streaming Protocol Buffers messages over TCP in Golang What is BuffStreams? BuffStreams is a set of abstraction over TCPConns for streamin

Dec 13, 2022

Use pingser to create client and server based on ICMP Protocol to send and receive custom message content.

Use pingser to create client and server based on ICMP Protocol to send and receive custom message content.

pingser Use pingser to create client and server based on ICMP Protocol to send and receive custom message content. examples source code: ./examples Us

Nov 9, 2022

Gogrok is a self hosted, easy to use alternative to ngrok. It uses SSH as a base protocol, using channels and existing functionality to tunnel requests to an endpoint.

gogrok A simple, easy to use ngrok alternative (self hosted!) The server and client can also be easily embedded into your applications, see the 'serve

Dec 3, 2022

A library for the MIGP (Might I Get Pwned) protocolA library for the MIGP (Might I Get Pwned) protocol

MIGP library This contains a library for the MIGP (Might I Get Pwned) protocol. MIGP can be used to build privacy-preserving compromised credential ch

Dec 3, 2022
Comments
  • Hi! I cleaned up your code for you!

    Hi! I cleaned up your code for you!

    Hi there!

    This is WhitespaceBot. I'm an open-source robot that removes trailing white space in your code, and gives you a gitignore file if you didn't have one!

    Why whitespace? Whitespace is an eyesore for developers who use text editors with dark themes. It's not a huge deal, but it's a bit annoying if you use Vim in a terminal. Really, I'm just a proof of concept - GitHub's V3 API allows robots to automatically improve open source projects, and that's really cool. Hopefully, somebody, maybe you!, will fork me and make me even more useful. My owner is funding a bounty to anybody who can add security fixing features to me.

    I've only cleaned your most popular project, and I've added you to a list of users not to contact again, so you won't get any more pull requests from me unless you ask. If I'm misbehaving, please email my owner and tell him to turn me off! If this is pull request is of no use to you, please just ignore it.

    Thanks! WhiteSpacebot from Gun.io.

Related tags
Server and client implementation of the grpc go libraries to perform unary, client streaming, server streaming and full duplex RPCs from gRPC go introduction

Description This is an implementation of a gRPC client and server that provides route guidance from gRPC Basics: Go tutorial. It demonstrates how to u

Nov 24, 2021
Grpc-gateway-map-null - gRPC Gateway test using nullable values in map

Demonstrate gRPC gateway behavior with nullable values in maps Using grpc-gatewa

Jan 6, 2022
A standalone Web Server developed with the standard http library, suport reverse proxy & flexible configuration
A standalone Web Server developed with the standard http library, suport reverse proxy & flexible configuration

paddy 简介 paddy是一款单进程的独立运行的web server,基于golang的标准库net/http实现。 paddy提供以下功能: 直接配置http响应 目录文件服务器 proxy_pass代理 http反向代理 支持请求和响应插件 部署 编译 $ go build ./main/p

Oct 18, 2022
Hedged Go GRPC client which helps to reduce tail latency at scale.

hedgedgrpc Hedged Go GRPC client which helps to reduce tail latency at scale. Rationale See paper Tail at Scale by Jeffrey Dean, Luiz André Barroso. I

Dec 16, 2022
Use Consul to do service discovery, use gRPC +kafka to do message produce and consume. Use redis to store result.
Use  Consul to do service discovery, use gRPC +kafka to do message produce and consume. Use redis to store result.

目录 gRPC/consul/kafka简介 gRPC+kafka的Demo gRPC+kafka整体示意图 限流器 基于redis计数器生成唯一ID kafka生产消费 kafka生产消费示意图 本文kafka生产消费过程 基于pprof的性能分析Demo 使用pprof统计CPU/HEAP数据的

Jul 9, 2022
wire protocol for multiplexing connections or streams into a single connection, based on a subset of the SSH Connection Protocol

qmux qmux is a wire protocol for multiplexing connections or streams into a single connection. It is based on the SSH Connection Protocol, which is th

Dec 26, 2022
A simple tool to convert socket5 proxy protocol to http proxy protocol

Socket5 to HTTP 这是一个超简单的 Socket5 代理转换成 HTTP 代理的小工具。 如何安装? Golang 用户 # Required Go 1.17+ go install github.com/mritd/s2h@master Docker 用户 docker pull m

Jan 2, 2023
Standalone client for proxies of Opera VPN

opera-proxy Standalone Opera VPN client. Younger brother of hola-proxy. Just run it and it'll start a plain HTTP proxy server forwarding traffic throu

Jan 9, 2023
Standalone client for proxies of Windscribe browser extension

windscribe-proxy Standalone Windscribe proxy client. Younger brother of opera-proxy. Just run it and it'll start a plain HTTP proxy server forwarding

Dec 29, 2022
A standalone ipfs gateway

rainbow Because ipfs should just work like unicorns and rainbows Building go build Running rainbow Configuration NAME: rainbow - a standalone ipf

Nov 9, 2022