GoLang ads.txt scraper

Collects and parses ads.txt

This Go program scrapes sites for ads.txt files and stores their significant details in a PostgreSQL database.
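
As a rough illustration of the parsing step, here is a minimal sketch of how a single ads.txt data line could be split into its fields. It assumes the comma-separated layout from the IAB ads.txt specification (ad system domain, account ID, relationship, optional certification authority ID). The `Record` type and `parseLine` function are hypothetical names, and which fields the program actually stores is not stated in this README.

```go
package main

import (
	"fmt"
	"strings"
)

// Record is a hypothetical holder for one ads.txt data line.
type Record struct {
	Domain       string // advertising system domain
	AccountID    string // publisher account ID within that system
	Relationship string // DIRECT or RESELLER
	CertAuthID   string // optional certification authority ID
}

// parseLine parses one ads.txt line. ok is false for blank lines, comments,
// variable declarations (such as contact=...) and malformed records.
func parseLine(line string) (rec Record, ok bool) {
	// Everything after '#' is a comment.
	if i := strings.IndexByte(line, '#'); i >= 0 {
		line = line[:i]
	}
	line = strings.TrimSpace(line)
	if line == "" {
		return Record{}, false
	}
	fields := strings.Split(line, ",")
	if len(fields) < 3 {
		return Record{}, false
	}
	for i := range fields {
		fields[i] = strings.TrimSpace(fields[i])
	}
	rel := strings.ToUpper(fields[2])
	if fields[0] == "" || fields[1] == "" || (rel != "DIRECT" && rel != "RESELLER") {
		return Record{}, false
	}
	rec = Record{
		Domain:       strings.ToLower(fields[0]),
		AccountID:    fields[1],
		Relationship: rel,
	}
	if len(fields) > 3 {
		rec.CertAuthID = fields[3]
	}
	return rec, true
}

func main() {
	rec, ok := parseLine("google.com, pub-1234567890, DIRECT, f08c47fec0942fa0 # example")
	fmt.Printf("%+v %v\n", rec, ok)
}
```

The sketch ignores extension fields after a semicolon and does not validate domain syntax.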

Give it a CSV file listing the sites to check, one entry per line (rank,site.url). I use the top 1M sites from https://tranco-list.eu/top-1m.csv.zip. For demonstration, a smaller top-1k.csv is supplied.

The scraper first tries the HTTPS scheme; if the connection fails, it falls back to HTTP. The User-Agent is spoofed. The timeout is 5 seconds, defined by the constant crawlerTimeout.

The user who runs this program must have a PostgreSQL role that allows SELECT, INSERT and DELETE queries on the working database. The program connects to the database via a Unix socket. Adjust the dbConnectionString constant if you use TCP, a different database name or a different authentication method. The PostgreSQL database is named adstxt.

sudo -u postgres psql -c 'CREATE DATABASE adstxt'

Create tables in it with the mktables.sql script.

psql -d adstxt < mktables.sql
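
For orientation, the two constants below show what a connection string might look like for a Unix socket and for TCP, written in the libpq keyword/value form that common Go PostgreSQL drivers accept. These are hypothetical values, not the actual content of dbConnectionString: the socket directory depends on your distribution, and the user name and password are placeholders.

```go
package main

import "fmt"

// Hypothetical example values for the dbConnectionString constant.
const (
	// Unix socket: host is the directory that contains the socket file.
	socketConnString = "host=/var/run/postgresql dbname=adstxt"
	// TCP with password authentication; user and password are placeholders.
	tcpConnString = "host=127.0.0.1 port=5432 user=scraper password=change-me dbname=adstxt sslmode=disable"
)

func main() {
	fmt.Println(socketConnString)
	fmt.Println(tcpConnString)
}
```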

Run the program with

go run main.go top-1k.csv

or build an executable first

go build main.go
./main top-1k.csv

By default, 64 goroutines fetch ads.txt from sites. On fast machines with fast connections, this number can be increased with an optional argument after the file name.

go run main.go top-1k.csv 1000

The third argument is the continuation flag. If a previous scraping run did not finish, you can resume it in the next run by specifying the flag c (continue). Because the arguments are positional, the goroutine count becomes mandatory when the continuation flag is used.

go run main.go top-1k.csv 64 c
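
The positional arguments could be interpreted along these lines. This is a sketch only: the `options` struct and `parseArgs` function are hypothetical, and the error handling in main.go may differ.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// options is a hypothetical holder for the three positional arguments.
type options struct {
	listFile string // CSV file with rank,site rows
	workers  int    // number of goroutines, 64 by default
	resume   bool   // true when the third argument is "c"
}

// parseArgs interprets the arguments that follow the program name.
func parseArgs(args []string) (options, error) {
	opts := options{workers: 64}
	if len(args) < 1 {
		return opts, errors.New("usage: main <sites.csv> [goroutines] [c]")
	}
	opts.listFile = args[0]
	if len(args) >= 2 {
		n, err := strconv.Atoi(args[1])
		if err != nil || n < 1 {
			return opts, fmt.Errorf("invalid goroutine count %q", args[1])
		}
		opts.workers = n
	}
	if len(args) >= 3 {
		opts.resume = args[2] == "c"
	}
	return opts, nil
}

func main() {
	// A real program would pass os.Args[1:] here.
	opts, err := parseArgs([]string{"top-1k.csv", "64", "c"})
	fmt.Printf("%+v %v\n", opts, err)
}
```

The sketch also shows why the count is mandatory for continuation: with only two arguments, c would land in the goroutine count position and fail to parse as a number.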