BitTorrent DHT Protocol && DHT Spider.

See the video on YouTube.

README in Chinese

Introduction

DHT implements the BitTorrent DHT protocol in Go. Now it includes:

It has two modes: standard mode and crawling mode. Standard mode follows the BEPs, so you can use it as a standard DHT server. Crawling mode aims to crawl as much metadata as possible and does not follow the standard BEPs. With crawling mode, you can build another BTDigg.

bthub.io is a BT search engine based on the crawling mode.

Installation

go get github.com/shiyanhui/dht

Example

Below is a simple spider. See here for more samples.

package main

import (
    "fmt"

    "github.com/shiyanhui/dht"
)

func main() {
    downloader := dht.NewWire(65535)
    go func() {
        // Print each metadata result as it arrives.
        for resp := range downloader.Response() {
            fmt.Println(resp.InfoHash, resp.MetadataInfo)
        }
    }()
    go downloader.Run()

    config := dht.NewCrawlConfig()
    config.OnAnnouncePeer = func(infoHash, ip string, port int) {
        // Ask the announcing peer for the torrent's metadata.
        downloader.Request([]byte(infoHash), ip, port)
    }
    d := dht.New(config)

    d.Run()
}

Download

You can download a precompiled demo binary here.

Note

  • The default crawl mode configuration uses about 300 MB of RAM. Adjust MaxNodes and BlackListMaxSize to suit your environment.
  • It currently can't run inside a LAN because of NAT.

TODO

  • NAT Traversal.
  • Implement the full BEP-3.
  • Optimization.

FAQ

Why is it slow compared to other spiders?

Well, there may be several reasons.

  • DHT aims to implement the standard BitTorrent DHT protocol; it was not designed purely for crawling the DHT network.
  • NAT traversal issues, if you run the crawler in a local network.
  • It blocks IPs that look bad, and a good IP may be misjudged.

License

MIT, read more here

Comments
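  • find_node no response

    Hello. I was very excited after reading your two tutorials, so I tried writing a sniffer in Java. My problem is this: I send find_node requests to all the nodes returned by the bootstrap nodes, but I have not received a single response. I have been stuck for a long time and cannot work out what is wrong. My bdecode of the IP and port contained in the compact node info should be correct. Any help would be appreciated.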
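
For readers checking the same thing, here is a minimal Go sketch of decoding compact node info as BEP 5 describes it: each entry is 26 bytes, made up of a 20-byte node ID, a 4-byte IPv4 address, and a 2-byte port in network (big-endian) byte order. The `Node` type and the function name are illustrative assumptions and are not taken from the dht package.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"net"
)

// compactNodeLen is the size of one entry in a BEP 5 "nodes" string:
// 20-byte node ID + 4-byte IPv4 address + 2-byte port.
const compactNodeLen = 26

// Node is a hypothetical holder for one decoded entry.
type Node struct {
	ID   string
	IP   net.IP
	Port int
}

// decodeCompactNodes splits a compact node info string into nodes.
func decodeCompactNodes(s string) ([]Node, error) {
	if len(s)%compactNodeLen != 0 {
		return nil, errors.New("compact node info length is not a multiple of 26")
	}
	nodes := make([]Node, 0, len(s)/compactNodeLen)
	for i := 0; i < len(s); i += compactNodeLen {
		entry := []byte(s[i : i+compactNodeLen])
		nodes = append(nodes, Node{
			ID: string(entry[:20]),
			IP: net.IP(entry[20:24]),
			// The port is big-endian (network byte order).
			Port: int(binary.BigEndian.Uint16(entry[24:26])),
		})
	}
	return nodes, nil
}

func main() {
	// 20 ID bytes, then 192.0.2.1, then port 6881 (0x1AE1).
	raw := "abcdefghij0123456789" + "\xc0\x00\x02\x01" + "\x1a\xe1"
	nodes, err := decodeCompactNodes(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(nodes[0].IP, nodes[0].Port)
}
```

A common source of silent failure is reading the port as little-endian or treating the bytes as signed, which sends queries to the wrong port.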

  • Spider needs fixing.

    The code is not working properly.

    Sometimes it finds one infohash and then never finds another, no matter how long I leave it running.

    I have used Wireshark to verify that I am receiving traffic that shows the peers and infohashes, so it's not a network issue.

    Also, the spider sometimes sends invalid transaction IDs; I can see the peers' responses to them in Wireshark. I'm not sure whether this is a problem, but I thought it might be worth mentioning.

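  • Port conflict problem

    Other DHT nodes learn my IP and port from the UDP packets I send. If my server and client bind to the same port, they conflict. If they bind to different ports, then the IP and port recorded for my DHT node in other nodes' routing tables are the ones my client sent from.
    Does that mean my client and server cannot run at the same time? Or is my understanding wrong?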
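
One common way around this, sketched below under the assumption that a single process plays both roles, is to use one UDP socket for both sending queries and receiving messages. The receiver then sees the listening port as the source port. The function name is illustrative; only standard `net` calls are used.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// sourcePortMatches opens one UDP socket, sends a datagram from it to a
// second local socket, and reports whether the receiver sees the sender's
// listening port as the source port.
func sourcePortMatches() (bool, error) {
	loopback := &net.UDPAddr{IP: net.IPv4(127, 0, 0, 1)} // port 0: let the OS pick

	// node is both "server" (it can read) and "client" (it can write).
	node, err := net.ListenUDP("udp4", loopback)
	if err != nil {
		return false, err
	}
	defer node.Close()

	// peer stands in for a remote DHT node.
	peer, err := net.ListenUDP("udp4", loopback)
	if err != nil {
		return false, err
	}
	defer peer.Close()

	// "Client" role: send a query from the same socket we listen on.
	if _, err := node.WriteToUDP([]byte("ping"), peer.LocalAddr().(*net.UDPAddr)); err != nil {
		return false, err
	}

	if err := peer.SetReadDeadline(time.Now().Add(2 * time.Second)); err != nil {
		return false, err
	}
	buf := make([]byte, 64)
	_, from, err := peer.ReadFromUDP(buf)
	if err != nil {
		return false, err
	}
	return from.Port == node.LocalAddr().(*net.UDPAddr).Port, nil
}

func main() {
	ok, err := sourcePortMatches()
	if err != nil {
		panic(err)
	}
	fmt.Println("receiver saw the listening port:", ok)
}
```

Because queries leave from the listening socket, the address other nodes record is the same one they can reply to and later query.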

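  • How do I join the DHT network?

    The KRPC section of the tutorial "一步一步教你写BT种子嗅探器-DHT篇" (Write a BT Torrent Sniffer Step by Step: DHT) says: at first you are not in the DHT network, and you need someone to introduce you; anyone already in the DHT can do so. Typically we send find_node requests to router.bittorrent.com:6881, dht.transmissionbt.com:6881 and so on, and then our DHT can start working.

    That still seems to require a network node with a public IP. If that central node is shut down, won't the network stop working just the same? I would appreciate an explanation. I would also like to ask how router.bittorrent.com:6881 introduces a new node into the DHT network.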
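
To make the bootstrap step concrete, the following Go sketch builds the bencoded find_node query that would be sent over UDP to a bootstrap node. It only constructs the bytes and sends nothing. The helper names are illustrative, and the IDs in `main` are the sample values used in BEP 5.

```go
package main

import "fmt"

// bstr bencodes a byte string as "<length>:<bytes>".
func bstr(s string) string {
	return fmt.Sprintf("%d:%s", len(s), s)
}

// findNodeQuery builds a bencoded KRPC find_node query by hand.
// Dictionary keys must appear in sorted order: "a", "q", "t", "y".
func findNodeQuery(txID, selfID, target string) string {
	args := "d" + bstr("id") + bstr(selfID) + bstr("target") + bstr(target) + "e"
	return "d" +
		bstr("a") + args +
		bstr("q") + bstr("find_node") +
		bstr("t") + bstr(txID) +
		bstr("y") + bstr("q") +
		"e"
}

func main() {
	fmt.Println(findNodeQuery("aa", "abcdefghij0123456789", "mnopqrstuvwxyz123456"))
}
```

The reply to such a query carries compact node info for nodes close to the target, and querying those nodes in turn is how a new node fills its routing table.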

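  • The code is not very readable.

    Especially this function, which is painful to read: func (wire *Wire) fetchMetadata(r Request) ...

    A few suggestions:

    1. Could you use the following functions to avoid reinventing the wheel?

    binary.Read io.ReadFull

    2. Could you group things at the same level of abstraction together?

    For example, these steps belong to one level: send the handshake, receive the handshake reply, send the extended handshake.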

    	if sendHandshake(conn, infoHash, []byte(randomString(20))) != nil ||
    		read(conn, 68, data) != nil ||
    		onHandshake(data.Next(68)) != nil ||
    		sendExtHandshake(conn) != nil {
    		return
    	}
    

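    But what about the code below? It reads the 4-byte header, then 1 byte, then another byte. These are not at the same level! The 140-line function is dizzying to read (one for loop whose context jumps around, with a pile of exposed details).

    3. Could error handling log something? Please also add protocol comments and links.

    Code should be logically clear and clean, and should state its intent plainly. This code was genuinely hard for me to read, so please forgive the complaint.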
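
As an illustration of the first suggestion, here is a hedged Go sketch that reads one length-prefixed peer wire message with `binary.Read` and `io.ReadFull`. It is not the package's `fetchMetadata` code. The function names and the `maxMessageLen` cap are assumptions made for the example.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// maxMessageLen is an assumed safety cap so a bogus length prefix
// cannot trigger a huge allocation.
const maxMessageLen = 1 << 20

// readMessage reads one length-prefixed peer wire message:
// a 4-byte big-endian length followed by that many payload bytes.
func readMessage(r io.Reader) ([]byte, error) {
	var length uint32
	if err := binary.Read(r, binary.BigEndian, &length); err != nil {
		return nil, err
	}
	if length > maxMessageLen {
		return nil, fmt.Errorf("message too large: %d bytes", length)
	}
	payload := make([]byte, length)
	if _, err := io.ReadFull(r, payload); err != nil {
		return nil, err
	}
	return payload, nil
}

// parseMessage is a small wrapper so the example can run on a byte slice.
func parseMessage(raw []byte) ([]byte, error) {
	return readMessage(bytes.NewReader(raw))
}

func main() {
	// Length 3, then message ID 20 (extended), extended ID 0, one payload byte.
	raw := []byte{0, 0, 0, 3, 20, 0, 'd'}
	msg, err := parseMessage(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println("message ID:", msg[0], "extended ID:", msg[1], "length:", len(msg))
}
```

With the framing isolated in one function, the caller can switch on the message ID without handling partial reads itself.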

  • Fix config.KBucketSize size

    In line 74 of dht.go, and in other places that set a max size, you assign a value that overflows int on 32-bit platforms. Error: /gopath/src/github.com/shiyanhui/dht/dht.go:74: constant 4294967296 overflows int. (This was run on a 32-bit armv7 machine.)
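
The error arises because Go's `int` is 32 bits wide on armv7, so the constant 4294967296 (1<<32) does not fit. The sketch below shows one possible way to pick a "practically unlimited" size that fits either width. The function name is hypothetical, and this is not the fix applied in the repository.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// unlimitedSize returns a "practically unlimited" size that fits in an int
// of the given width. 1<<32 only fits when int is 64 bits wide.
func unlimitedSize(intBits int) int64 {
	if intBits < 64 {
		return math.MaxInt32
	}
	return 1 << 32
}

func main() {
	// strconv.IntSize is 32 on armv7 and 64 on amd64.
	size := unlimitedSize(strconv.IntSize)
	fmt.Println(strconv.IntSize, size)
}
```

A simpler alternative is to use `math.MaxInt32` unconditionally, which compiles on both 32-bit and 64-bit targets.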

  • May I ask how to build all the source?

    I want to build it as a single executable binary, so I ran go build inside dht/, but it says: can't load package: package dht: cannot find package "dht" in any of: /usr/local/Cellar/go/1.8.1/libexec/src/dht (from $GOROOT) /Users/jintian/go/src/dht (from $GOPATH). What should I do?

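  • Can data structures such as the routing table be implemented as interfaces?

    Some data structures in DHT and KRPC, such as the routing table (routingTable), are implemented with in-memory containers. Could they be exposed as interfaces?

    The reason I ask is a concern: once the number of nodes grows large (say tens or hundreds of millions), several GB or even tens of GB of memory would be needed, and memory may run out.

    With an interface, storage could be customized as needed, for example by using Redis instead of memory.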
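
A minimal Go sketch of what such an interface could look like is shown below. `NodeStore`, `memoryStore`, and their methods are hypothetical names invented for illustration; none of them exist in the dht package.

```go
package main

import (
	"fmt"
	"sync"
)

// NodeStore is a hypothetical storage interface; the names are illustrative
// and do not come from the dht package.
type NodeStore interface {
	Put(id, addr string)
	Get(id string) (addr string, ok bool)
	Len() int
}

// memoryStore keeps nodes in a map guarded by a mutex.
type memoryStore struct {
	mu    sync.RWMutex
	nodes map[string]string
}

func newMemoryStore() *memoryStore {
	return &memoryStore{nodes: make(map[string]string)}
}

func (s *memoryStore) Put(id, addr string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.nodes[id] = addr
}

func (s *memoryStore) Get(id string) (string, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	addr, ok := s.nodes[id]
	return addr, ok
}

func (s *memoryStore) Len() int {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return len(s.nodes)
}

func main() {
	// Callers depend only on the interface, so another backend
	// could be swapped in without touching them.
	var store NodeStore = newMemoryStore()
	store.Put("node-1", "192.0.2.1:6881")
	addr, ok := store.Get("node-1")
	fmt.Println(addr, ok, store.Len())
}
```

A Redis-backed type would implement the same three methods, at the cost of a network round trip per lookup, which matters for a routing table consulted on every packet.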

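  • About NAT traversal.

    For reference only. BEP 5 should come with some UDP hole-punching effect built in (for address-restricted cone NAT and port-restricted cone NAT). Nodes to which you have sent find_node or any other message should be able to get through the NAT when they send you get_peers or announce_peer messages (within a limited time, of course). Nodes that discover you through other nodes' routing tables and message you directly cannot get through the NAT.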

  • Spider shows no results; waits almost 15 minutes for one metadata

    How can I improve this? Crawling is very slow, so I think I am doing something wrong. People report at least 100 torrents an hour. I even disabled the firewall, but the problem remains. Thanks.

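  • Question about node splitting

    } else if root.KBucket().prefix.Compare(nd.id, prefixLen-1) == 0 {

    This seems to differ from what your blog says:

    The first case is when the current path is a prefix of this node's ID (note: not the key being inserted, but "my" own ID); then the bucket is split.

    The code uses the node being inserted rather than the local node ID. As I understand it, it should be the local node ID.

    Is my understanding wrong?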

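  • Question about bucket splitting when inserting a node

    } else if root.KBucket().prefix.Compare(nd.id, prefixLen-1) == 0 {

    At line 388 of routingtable.go, in the bucket-split condition: shouldn't the bucket be split only when the current leaf shares a prefix with the local node? The code instead compares the newly inserted node with the current leaf. PS: I saw that someone else raised the same question and you answered that this is to hold more nodes. But in that case, wouldn't the later else branch that adds candidates never be reached?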
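
For comparison, the rule both comments describe, which matches BEP 5 (a full bucket is split only when the local node's own ID falls within the bucket's range), can be sketched in Go as follows. This is not the routingtable.go code. The function names and the byte-slice representation of prefixes are assumptions, and the slices must hold at least `n` bits.

```go
package main

import "fmt"

// sharesPrefix reports whether id starts with the first n bits of prefix.
// Both slices must be at least (n+7)/8 bytes long.
func sharesPrefix(id, prefix []byte, n int) bool {
	for i := 0; i < n; i++ {
		mask := byte(0x80) >> (uint(i) % 8)
		if (id[i/8]^prefix[i/8])&mask != 0 {
			return false
		}
	}
	return true
}

// shouldSplit applies the BEP 5 rule: a full bucket is split only if the
// local node's own ID falls within the bucket's range.
func shouldSplit(full bool, ownID, bucketPrefix []byte, prefixLen int) bool {
	return full && sharesPrefix(ownID, bucketPrefix, prefixLen)
}

func main() {
	own := []byte{0xA0} // bits 1010 0000

	// Bucket with 2-bit prefix "10": the own ID is inside its range.
	fmt.Println(shouldSplit(true, own, []byte{0x80}, 2))
	// Bucket with 2-bit prefix "01": the own ID is outside its range.
	fmt.Println(shouldSplit(true, own, []byte{0x40}, 2))
}
```

Under this rule, a full bucket that does not cover the local ID is never split, so new nodes for that bucket can only be kept as replacement candidates.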

  • No data received on Alibaba Cloud; log analysis shows most errors are decode errors

    I added some logging to krpc.go:

    func handle(dht *DHT, pkt packet) {
    	if len(dht.workerTokens) == dht.PacketWorkerLimit {
    
    		fmt.Println("return from len(dht.workerTokens) == dht.PacketWorkerLimit")
    		return
    	}
    
    	dht.workerTokens <- struct{}{}
    
    	go func() {
    		defer func() {
    			<-dht.workerTokens
    		}()
    
    		if dht.blackList.in(pkt.raddr.IP.String(), pkt.raddr.Port) {
    
    			fmt.Println("return from dht.blackList.in(pkt.raddr.IP.String(), pkt.raddr.Port)")
    			return
    		}
    
    		data, err := Decode(pkt.data)
    		if err != nil {
    
    			fmt.Print("return from data, err := Decode(pkt.data)")
    			fmt.Println(err)
    			return
    		}
    
    		response, err := parseMessage(data)
    		if err != nil {
    
    			fmt.Print("return from response, err := parseMessage(data)")
    			fmt.Println(err)
    			return
    		}
    
    		if f, ok := handlers[response["y"].(string)]; ok {
    			f(dht, pkt.raddr, response)
    		}
    	}()
    }
    

    Then I filtered the logs with the following commands:

    
    grep "Got a response" nohup_dht.logs | wc -l
    grep "return from data, err := Decode(pkt.data)" nohup_dht.logs | wc -l
    grep "return from response, err := parseMessage(data)" nohup_dht.logs | wc -l
    grep "return from dht.blackList.in(pkt.raddr.IP.String(), pkt.raddr.Port)" nohup_dht.logs| wc -l  
    

    The results were:

    0
    620
    10
    22
    

    After running for two minutes, the vast majority were decode errors:

    return from data, err := Decode(pkt.data)invalid bencode when decode item
    

    Not a single useful piece of data was received.

    Is there a problem with the decoding?

    PS: this is a Linux build compiled on a Mac. Build command:

    CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -o bin/exec_linux_dht src/main/main.go
    

    CentOS version:

    NAME="CentOS Linux"
    VERSION="7 (Core)"
    ID="centos"
    ID_LIKE="rhel fedora"
    VERSION_ID="7"
    PRETTY_NAME="CentOS Linux 7 (Core)"
    ANSI_COLOR="0;31"
    CPE_NAME="cpe:/o:centos:centos:7"
    HOME_URL="https://www.centos.org/"
    BUG_REPORT_URL="https://bugs.centos.org/"
    
    CENTOS_MANTISBT_PROJECT="CentOS-7"
    CENTOS_MANTISBT_PROJECT_VERSION="7"
    REDHAT_SUPPORT_PRODUCT="centos"
    REDHAT_SUPPORT_PRODUCT_VERSION="7"
    

    go env output on the Mac:

    GOARCH="amd64"
    GOBIN=""
    GOCACHE="/Users/xxx/Library/Caches/go-build"
    GOEXE=""
    GOHOSTARCH="amd64"
    GOHOSTOS="darwin"
    GOOS="darwin"
    GOPATH="/Users/xxx/godht:/usr/local/go/bin"
    GORACE=""
    GOROOT="/usr/local/Cellar/go/1.10/libexec"
    GOTMPDIR=""
    GOTOOLDIR="/usr/local/Cellar/go/1.10/libexec/pkg/tool/darwin_amd64"
    GCCGO="gccgo"
    CC="clang"
    CXX="clang++"
    CGO_ENABLED="1"
    CGO_CFLAGS="-g -O2"
    CGO_CPPFLAGS=""
    CGO_CXXFLAGS="-g -O2"
    CGO_FFLAGS="-g -O2"
    CGO_LDFLAGS="-g -O2"
    PKG_CONFIG="pkg-config"
    GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/_b/_xrkt7216glfsz7z989ss7zm0000gn/T/go-build666755492=/tmp/go-build -gno-record-gcc-switches -fno-common"
    
  • Following the demo collects nothing at all, with no output. What is going on?

    There is no output whether I run it locally or on a server.

    package main
    import (
        "fmt"
        "github.com/shiyanhui/dht"
    )
    
    func main() {
        downloader := dht.NewWire(65536)
        go func() {
            // once we got the request result
            for resp := range downloader.Response() {
                fmt.Println(resp.InfoHash, resp.MetadataInfo)
            }
        }()
        go downloader.Run()
    
        config := dht.NewCrawlConfig()
        config.OnAnnouncePeer = func(infoHash, ip string, port int) {
            // request to download the metadata info
            downloader.Request([]byte(infoHash), ip, port)
        }
        d := dht.New(config)
    
        d.Run()
    }
    