go-fasttld is a high performance top level domains (TLD) extraction module.

go-fasttld

Go Reference Go Report Card Codecov Coverage Mentioned in Awesome Go

GitHub license

go-fasttld is a high performance top level domains (TLD) extraction module implemented with compressed tries.

This module is a port of the Python fasttld module, with additional modifications to support extraction of subcomponents from full URLs, IPv4 addresses, and IPv6 addresses.

Trie

Background

go-fasttld extracts subcomponents like top level domains (TLDs), subdomains and hostnames from URLs efficiently by using the regularly-updated Mozilla Public Suffix List and the compressed trie data structure.

For example, it extracts the com TLD, maps subdomain, and google domain from https://maps.google.com:8080/a/long/path/?query=42.

go-fasttld also supports extraction of private domains listed in the Mozilla Public Suffix List like 'blogspot.co.uk' and 'sinaapp.com', extraction of IPv4 addresses, and extraction of IPv6 addresses.

Why not split on "." and take the last element instead?

Splitting on "." and taking the last element only works for simple TLDs like .com, but not more complex ones like oseto.nagasaki.jp.

Compressed trie example

Valid TLDs from the Mozilla Public Suffix List are appended to the compressed trie in reverse-order.

Given the following TLDs
au
nsw.edu.au
com.ac
edu.ac
gov.ac

and the example URL host `example.nsw.edu.au`

The compressed trie will be structured as follows:

START
 ╠═ au 🚩 βœ…
 β•‘  β•šβ• edu βœ…
 β•‘     β•šβ• nsw 🚩 βœ…
 β•šβ• ac
    ╠═ com 🚩
    ╠═ edu 🚩
    β•šβ• gov 🚩

=== Symbol meanings ===
🚩 : path to this node is a valid TLD
βœ… : path to this node found in example URL host `example.nsw.edu.au`

The URL host subcomponents are parsed from right-to-left until no more matching nodes can be found. In this example, the path of matching nodes are au -> edu -> nsw. Reversing the nodes gives the extracted TLD nsw.edu.au.

Installation

go get github.com/elliotwutingfeng/go-fasttld

Quick Start

Full demo available in the examples folder

Domain

// Initialise fasttld extractor
extractor, _ := fasttld.New(fasttld.SuffixListParams{})

//Extract URL subcomponents
url := "https://[email protected]:5000/a/b/c/d/e/f/g/h/i?id=42"
res := extractor.Extract(fasttld.URLParams{URL: url})

// Display results
fmt.Println(res.Scheme)           // https://
fmt.Println(res.UserInfo)         // some-user
fmt.Println(res.SubDomain)        // a.long.subdomain
fmt.Println(res.Domain)           // ox
fmt.Println(res.Suffix)           // ac.uk
fmt.Println(res.RegisteredDomain) // ox.ac.uk
fmt.Println(res.Port) // 5000
fmt.Println(res.Path) // a/b/c/d/e/f/g/h/i?id=42

IPv4 Address

extractor, _ := fasttld.New(fasttld.SuffixListParams{})

url = "https://127.0.0.1:5000"
res = extractor.Extract(fasttld.URLParams{URL: url})

// res.Scheme = https://
// res.UserInfo = <no output>
// res.SubDomain = <no output>
// res.Domain = 127.0.0.1
// res.Suffix = <no output>
// res.RegisteredDomain = 127.0.0.1
// res.Port = 5000
// res.Path = <no output>

IPv6 Address

extractor, _ := fasttld.New(fasttld.SuffixListParams{})

url = "https://[aBcD:ef01:2345:6789:aBcD:ef01:2345:6789]:5000"
res = extractor.Extract(fasttld.URLParams{URL: url})

// res.Scheme = https://
// res.UserInfo = <no output>
// res.SubDomain = <no output>
// res.Domain = aBcD:ef01:2345:6789:aBcD:ef01:2345:6789
// res.Suffix = <no output>
// res.RegisteredDomain = aBcD:ef01:2345:6789:aBcD:ef01:2345:6789
// res.Port = 5000
// res.Path = <no output>

Internationalised label separators

go-fasttld supports the following internationalised label separators (IETF RFC 3490)

  • U+002E (full stop)
  • U+3002 (ideographic full stop)
  • U+FF0E (fullwidth full stop)
  • U+FF61 (halfwidth ideographic full stop)
extractor, _ := fasttld.New(fasttld.SuffixListParams{})

url = "https://brb\u002ei\u3002am\uff0egoing\uff61to\uff0ebe\u3002a\uff61fk"
res = extractor.Extract(fasttld.URLParams{URL: url})

// res.Scheme = https://
// res.UserInfo = <no output>
// res.SubDomain = brb\u002ei\u3002am\uff0egoing\uff61to
// res.Domain = be
// res.Suffix = a\uff61fk
// res.RegisteredDomain = be\u3002a\uff61fk
// res.Port = <no output>
// res.Path = <no output>

Public Suffix List options

Specify custom public suffix list file

You can use a custom public suffix list file by setting CacheFilePath in fasttld.SuffixListParams{} to its absolute path.

cacheFilePath := "/absolute/path/to/file.dat"
extractor, _ := fasttld.New(fasttld.SuffixListParams{CacheFilePath: cacheFilePath})

Updating the default Public Suffix List cache

Whenever fasttld.New is called without specifying CacheFilePath in fasttld.SuffixListParams{}, the local cache of the default Public Suffix List is updated automatically if it is more than 3 days old. You can also manually update the cache by using Update().

// Automatic update performed if `CacheFilePath` is not specified
// and local cache is more than 3 days old
extractor, _ := fasttld.New(fasttld.SuffixListParams{})

// Manually update local cache
if err := extractor.Update(); err != nil {
    log.Println(err)
}

Private domains

According to the Mozilla.org wiki, the Mozilla Public Suffix List contains private domains like blogspot.com and sinaapp.com.

By default, go-fasttld excludes these private domains (i.e. IncludePrivateSuffix = false)

extractor, _ := fasttld.New(fasttld.SuffixListParams{})

url := "https://google.blogspot.com"
res := extractor.Extract(fasttld.URLParams{URL: url})

// res.Scheme = https://
// res.UserInfo = <no output>
// res.SubDomain = google
// res.Domain = blogspot
// res.Suffix = com
// res.RegisteredDomain = blogspot.com
// res.Port = <no output>
// res.Path = <no output>

You can include private domains by setting IncludePrivateSuffix = true

extractor, _ := fasttld.New(fasttld.SuffixListParams{IncludePrivateSuffix: true})

url := "https://google.blogspot.com"
res := extractor.Extract(fasttld.URLParams{URL: url})

// res.Scheme = https://
// res.UserInfo = <no output>
// res.SubDomain = <no output>
// res.Domain = google
// res.Suffix = blogspot.com
// res.RegisteredDomain = google.blogspot.com
// res.Port = <no output>
// res.Path = <no output>

Extraction options

Ignore Subdomains

You can ignore subdomains by setting IgnoreSubDomains = true. By default, subdomains are extracted.

extractor, _ := fasttld.New(fasttld.SuffixListParams{})

url := "https://maps.google.com"
res := extractor.Extract(fasttld.URLParams{URL: url, IgnoreSubDomains: true})

// res.Scheme = https://
// res.UserInfo = <no output>
// res.SubDomain = <no output>
// res.Domain = google
// res.Suffix = com
// res.RegisteredDomain = google.com
// res.Port = <no output>
// res.Path = <no output>

Punycode

Convert internationalised URLs to punycode before extraction by setting ConvertURLToPunyCode = true. By default, URLs are not converted to punycode.

extractor, _ := fasttld.New(fasttld.SuffixListParams{})

url := "https://hello.δΈ–η•Œ.com"
res := extractor.Extract(fasttld.URLParams{URL: url, ConvertURLToPunyCode: true})

// res.Scheme = https://
// res.UserInfo = <no output>
// res.SubDomain = hello
// res.Domain = xn--rhqv96g
// res.Suffix = com
// res.RegisteredDomain = xn--rhqv96g.com
// res.Port = <no output>
// res.Path = <no output>

res = extractor.Extract(fasttld.URLParams{URL: url, ConvertURLToPunyCode: false})

// res.Scheme = https://
// res.UserInfo = <no output>
// res.SubDomain = hello
// res.Domain = δΈ–η•Œ
// res.Suffix = com
// res.RegisteredDomain = δΈ–η•Œ.com
// res.Port = <no output>
// res.Path = <no output>

Testing

go test -v -coverprofile=test_coverage.out && go tool cover -html=test_coverage.out -o test_coverage.html

Benchmarks

go test -bench=. -benchmem -cpu 1

Modules used

Benchmark Name Source
GoFastTld go-fasttld (this module)
JPilloraGoTld github.com/jpillora/go-tld
JoeGuoTldExtract github.com/joeguo/tldextract
Mjd2021USATldExtract github.com/mjd2021usa/tldextract
M507Tlde github.com/M507/tlde

Results

Benchmarks performed on AMD Ryzen 7 5800X, Manjaro Linux.

go-fasttld performs especially well on longer URLs.


#1

https://news.google.com

Benchmark Name Iterations ns/op B/op allocs/op Fastest
GoFastTld 2389614 496.8 ns/op 176 B/op 4 allocs/op βœ”οΈ
JPilloraGoTld 2300103 521.2 ns/op 224 B/op 2 allocs/op
JoeGuoTldExtract 1480351 822.2 ns/op 208 B/op 7 allocs/op
Mjd2021USATldExtract 1336317 876.7 ns/op 208 B/op 7 allocs/op
M507Tlde 2276070 513.1 ns/op 160 B/op 5 allocs/op

#2

https://iupac.org/iupac-announces-the-2021-top-ten-emerging-technologies-in-chemistry/

Benchmark Name Iterations ns/op B/op allocs/op Fastest
GoFastTld 2254648 537.6 ns/op 304 B/op 4 allocs/op βœ”οΈ
JPilloraGoTld 1633924 737.0 ns/op 224 B/op 2 allocs/op
JoeGuoTldExtract 1532829 781.0 ns/op 288 B/op 6 allocs/op
Mjd2021USATldExtract 1444665 832.5 ns/op 288 B/op 6 allocs/op
M507Tlde 2032639 584.8 ns/op 272 B/op 5 allocs/op

#3

https://www.google.com/maps/dir/Parliament+Place,+Parliament+House+Of+Singapore,+Singapore/Parliament+St,+London,+UK/@25.2440033,33.6721455,4z/data=!3m1!4b1!4m14!4m13!1m5!1m1!1s0x31da19a0abd4d71d:0xeda26636dc4ea1dc!2m2!1d103.8504863!2d1.2891543!1m5!1m1!1s0x487604c5aaa7da5b:0xf13a2197d7e7dd26!2m2!1d-0.1260826!2d51.5017061!3e4

Benchmark Name Iterations ns/op B/op allocs/op Fastest
GoFastTld 1519119 785.9 ns/op 784 B/op 4 allocs/op βœ”οΈ
JPilloraGoTld 399526 2848 ns/op 928 B/op 4 allocs/op
JoeGuoTldExtract 778827 1420 ns/op 1120 B/op 6 allocs/op
Mjd2021USATldExtract 755976 1523 ns/op 1120 B/op 6 allocs/op
M507Tlde 806964 1584 ns/op 1120 B/op 6 allocs/op

Acknowledgements

Comments
  • Update github.com/joeguo/tldextract digest to d83daa6

    Update github.com/joeguo/tldextract digest to d83daa6

    WhiteSource Renovate

    This PR contains the following updates:

    | Package | Type | Update | Change | |---|---|---|---| | github.com/joeguo/tldextract | require | digest | 7e06486 -> d83daa6 |


    Configuration

    πŸ“… Schedule: At any time (no schedule defined).

    🚦 Automerge: Enabled.

    β™» Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

    πŸ”• Ignore: Close this PR and you won't be reminded about this update again.


    • [ ] If you want to rebase/retry this PR, click this checkbox.

    This PR has been generated by WhiteSource Renovate. View repository job log here.

  • Update golang.org/x/net digest to 2871e0c

    Update golang.org/x/net digest to 2871e0c

    WhiteSource Renovate

    This PR contains the following updates:

    | Package | Type | Update | Change | |---|---|---|---| | golang.org/x/net | require | digest | 1d1ef93 -> 2871e0c |


    Configuration

    πŸ“… Schedule: At any time (no schedule defined).

    🚦 Automerge: Enabled.

    β™» Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

    πŸ”• Ignore: Close this PR and you won't be reminded about this update again.


    • [ ] If you want to rebase/retry this PR, click this checkbox.

    This PR has been generated by WhiteSource Renovate. View repository job log here.

  • Handle internationalised period delimiters

    Handle internationalised period delimiters

    The current implementation only handles . as a period delimiter, but not 。, ., and q

    | Code | Character | Is Handled | |--|--|--| | \u002e | . | :heavy_check_mark: | | \u3002 | 。 | :no_entry_sign: | | \uff0e | . | :no_entry_sign: | | \uff61 | q | :no_entry_sign: |

    Ideally, we can handle these period delimiter variants with minimal impact on performance.

    Similar to https://github.com/john-kurkowski/tldextract/pull/253

  • Update golang.org/x/net digest to 1d1ef93

    Update golang.org/x/net digest to 1d1ef93

    WhiteSource Renovate

    This PR contains the following updates:

    | Package | Type | Update | Change | |---|---|---|---| | golang.org/x/net | require | digest | 1850ba1 -> 1d1ef93 |


    Configuration

    πŸ“… Schedule: At any time (no schedule defined).

    🚦 Automerge: Enabled.

    β™» Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

    πŸ”• Ignore: Close this PR and you won't be reminded about this update again.


    • [ ] If you want to rebase/retry this PR, click this checkbox.

    This PR has been generated by WhiteSource Renovate. View repository job log here.

  • Update golang.org/x/net digest to 1850ba1

    Update golang.org/x/net digest to 1850ba1

    WhiteSource Renovate

    This PR contains the following updates:

    | Package | Type | Update | Change | |---|---|---|---| | golang.org/x/net | require | digest | a630d4f -> 1850ba1 |


    Configuration

    πŸ“… Schedule: At any time (no schedule defined).

    🚦 Automerge: Enabled.

    β™» Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

    πŸ”• Ignore: Close this PR and you won't be reminded about this update again.


    • [ ] If you want to rebase/retry this PR, click this checkbox.

    This PR has been generated by WhiteSource Renovate. View repository job log here.

  • Update golang.org/x/net digest to a630d4f

    Update golang.org/x/net digest to a630d4f

    WhiteSource Renovate

    This PR contains the following updates:

    | Package | Type | Update | Change | |---|---|---|---| | golang.org/x/net | require | digest | 290c469 -> a630d4f |


    Configuration

    πŸ“… Schedule: At any time (no schedule defined).

    🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

    β™» Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

    πŸ”• Ignore: Close this PR and you won't be reminded about this update again.


    • [ ] If you want to rebase/retry this PR, click this checkbox.

    This PR has been generated by WhiteSource Renovate. View repository job log here.

  • Update golang.org/x/net digest to 290c469

    Update golang.org/x/net digest to 290c469

    WhiteSource Renovate

    This PR contains the following updates:

    | Package | Type | Update | Change | |---|---|---|---| | golang.org/x/net | require | digest | aac1ed4 -> 290c469 |


    Configuration

    πŸ“… Schedule: At any time (no schedule defined).

    🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

    β™» Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

    πŸ”• Ignore: Close this PR and you won't be reminded about this update again.


    • [ ] If you want to rebase/retry this PR, click this checkbox.

    This PR has been generated by WhiteSource Renovate. View repository job log here.

  • Dependency Dashboard

    Dependency Dashboard

    This issue provides visibility into Renovate updates and their statuses. Learn more

    This repository currently has no open or pending branches.


    • [ ] Check this box to trigger a request for Renovate to run again on this repository
High-performance minimalist queue implemented using a stripped-down lock-free ringbuffer, written in Go (golang.org)

This project is no longer maintained - feel free to fork the project! gringo A high-performance minimalist queue implemented using a stripped-down loc

Jan 18, 2022
skipmap is a high-performance concurrent sorted map based on skip list. Up to 3x ~ 10x faster than sync.Map in the typical pattern.
skipmap is a high-performance concurrent sorted map based on skip list. Up to 3x ~ 10x faster than sync.Map in the typical pattern.

Introduction skipmap is a high-performance concurrent map based on skip list. In typical pattern(one million operations, 90%LOAD 9%STORE 1%DELETE), th

Apr 21, 2022
A feature complete and high performance multi-group Raft library in Go.
A feature complete and high performance multi-group Raft library in Go.

Dragonboat - A Multi-Group Raft library in Go / δΈ­ζ–‡η‰ˆ News 2021-01-20 Dragonboat v3.3 has been released, please check CHANGELOG for all changes. 2020-03

May 5, 2022
A Go queue manager on top of Redis

Queue A Go library for managing queues on top of Redis. It is based on a hiring exercise but later I found it useful for myself in a custom task proce

Apr 5, 2022
low level data type and utils in Golang.

low low level data type and utils in Golang. A stable low level function set is the basis of a robust architecture. It focuses on stability and requir

Apr 18, 2022
Package mafsa implements Minimal Acyclic Finite State Automata in Go, essentially a high-speed, memory-efficient, Unicode-friendly set of strings.

MA-FSA for Go Package mafsa implements Minimal Acyclic Finite State Automata (MA-FSA) with Minimal Perfect Hashing (MPH). Basically, it's a set of str

Apr 5, 2022
TLDs finder: check domain name availability across all valid top-level domains

TLD:er TLDs finder β€” check domain name availability across all valid top-level d

May 14, 2022
fast tool for separate existing domains from list of domains using DNS/HTTP.

NETGREP How To Install β€’ How to use Description netgrep can send http/https request or resolve domain from dns (can customize dns server) to separate

Jan 27, 2022
Godaddy-domains-client-go - Godaddy domains api Client golang - Write automaticly from swagger codegen

Go API client for swagger Overview This API client was generated by the swagger-codegen project. By using the swagger-spec from a remote server, you c

Jan 9, 2022
πŸ”‘A high performance Key/Value store written in Go with a predictable read/write performance and high throughput. Uses a Bitcask on-disk layout (LSM+WAL) similar to Riak.

bitcask A high performance Key/Value store written in Go with a predictable read/write performance and high throughput. Uses a Bitcask on-disk layout

Apr 15, 2022
the pluto is a gateway new time, high performance, high stable, high availability, easy to use

pluto the pluto is a gateway new time, high performance, high stable, high availability, easy to use Acknowledgments thanks nbio for providing low lev

Sep 19, 2021
πŸš₯ Yet another pinger: A high-performance ICMP ping implementation build on top of BPF technology.

yap Yet-Another-Pinger: A high-performance ICMP ping implementation build on top of BPF technology. yap uses the gopacket library to receive and handl

Apr 29, 2022
top in container - Running the original top command in a container
top in container - Running the original top command in a container

Running the original top command in a container will not get information of the container, many metrics like uptime, users, load average, tasks, cpu, memory, are about the host in fact. topic(top in container) will retrieve those metrics from container instead, and shows the status of the container, not the host.

Mar 23, 2022
Gue is Golang queue on top of PostgreSQL that uses transaction-level locks.

Gue is Golang queue on top of PostgreSQL that uses transaction-level locks.

May 7, 2022
Coordinates shutdown of processes with multiple top-level services.

package supervisor Package supervisor coordinates shutdown among multiple services and the OS. Example func main() { m := supervisor.New(context.Back

Dec 12, 2021
A Golang library for text processing, including tokenization, part-of-speech tagging, and named-entity extraction.

prose is a natural language processing library (English only, at the moment) in pure Go. It supports tokenization, segmentation, part-of-speech tagging, and named-entity extraction.

May 14, 2022
A Go port of the Rapid Automatic Keyword Extraction algorithm (RAKE)

A Go implementation of the Rapid Automatic Keyword Extraction (RAKE) algorithm as described in: Rose, S., Engel, D., Cramer, N., & Cowley, W. (2010).

Apr 16, 2022
:wink: :cyclone: :strawberry: TextRank implementation in Golang with extendable features (summarization, phrase extraction) and multithreading (goroutine) support (Go 1.8, 1.9, 1.10)
:wink: :cyclone: :strawberry: TextRank implementation in Golang with extendable features (summarization, phrase extraction) and multithreading (goroutine) support (Go 1.8, 1.9, 1.10)

TextRank on Go This source code is an implementation of textrank algorithm, under MIT licence. The minimum requred Go version is 1.8. MOTIVATION If th

May 8, 2022
A Go native tabular data extraction package. Currently supports .xls, .xlsx, .csv, .tsv formats.

grate A Go native tabular data extraction package. Currently supports .xls, .xlsx, .csv, .tsv formats. Why? Grate focuses on speed and stability first

Apr 13, 2022
Pi-hole data right from your terminal. Live updating view, query history extraction and more!
Pi-hole data right from your terminal. Live updating view, query history extraction and more!

Pi-CLI Pi-CLI is a command line program used to view data from a Pi-Hole instance directly in your terminal.

Apr 26, 2022