Antch, a fast, powerful and extensible web crawling & scraping framework for Go

Antch

Antch is inspired by Scrapy. If you're familiar with Scrapy, you can get started quickly.

Antch is a fast, powerful and extensible web crawling & scraping framework for Go, used to crawl websites and extract structured data from their pages.

Getting Started

Follow the Getting Started instructions to start your first spider.

Features

  • Polite, highly concurrent web crawler.
  • Powerful and customizable HTTP middleware.
  • Item data pipeline for the web spider.
  • Built-in proxy support (HTTP, HTTPS, SOCKS5).
  • Built-in XPath query support for HTML/XML documents.
  • Easy to use and integrate with your project.
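
To make the middleware idea in the list above concrete, here is a minimal sketch that uses only the Go standard library. It does not use Antch's own API: the `Middleware` type and the `withHeader`, `chain`, and `fetchUserAgent` helpers are hypothetical names chosen for illustration. The sketch wraps an `http.RoundTripper` so that every outgoing request carries a custom header, then checks the result against a local test server.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// Middleware wraps an http.RoundTripper with extra behavior.
// Hypothetical type for illustration; this is not Antch's middleware type.
type Middleware func(http.RoundTripper) http.RoundTripper

// roundTripFunc adapts a plain function to the http.RoundTripper interface.
type roundTripFunc func(*http.Request) (*http.Response, error)

func (f roundTripFunc) RoundTrip(r *http.Request) (*http.Response, error) { return f(r) }

// withHeader returns a middleware that sets one header on every request.
func withHeader(key, value string) Middleware {
	return func(next http.RoundTripper) http.RoundTripper {
		return roundTripFunc(func(r *http.Request) (*http.Response, error) {
			// A RoundTripper must not modify the caller's request, so clone it first.
			r = r.Clone(r.Context())
			r.Header.Set(key, value)
			return next.RoundTrip(r)
		})
	}
}

// chain applies the middlewares so that the first one listed runs first.
func chain(base http.RoundTripper, mws ...Middleware) http.RoundTripper {
	for i := len(mws) - 1; i >= 0; i-- {
		base = mws[i](base)
	}
	return base
}

// fetchUserAgent starts a local test server that echoes the User-Agent header
// and requests it through a client that uses the middleware chain.
func fetchUserAgent(ua string) (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, r.Header.Get("User-Agent"))
	}))
	defer srv.Close()

	client := &http.Client{
		Transport: chain(http.DefaultTransport, withHeader("User-Agent", ua)),
	}
	resp, err := client.Get(srv.URL)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	ua, err := fetchUserAgent("example-spider/0.1")
	if err != nil {
		panic(err)
	}
	fmt.Println(ua)
}
```

Wrapping the transport keeps per-request concerns such as headers, delays, or proxies out of the scraping logic. Antch's actual middleware interface may differ, so check the project wiki before relying on this shape.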

Examples

BingWallpaper - Bing daily wallpaper.

Documentation

See https://github.com/antchfx/antch/wiki

Owner
The open source web crawler framework project
Similar Resources

Fast conversions across various Go types with a simple API.

Go Package: conv Get: go get -u github.com/cstockton/go-conv Example: // Basic types if got, err := conv.Bool(`TRUE`); err == nil { fmt.Printf("conv.

Nov 29, 2022

Stargather is a fast GitHub repository stargazers information gathering tool

Stargather is a fast GitHub repository stargazers information gathering tool that can scrape: Organization, Location, Email, Twitter, Follow

Dec 12, 2022

Count Dracula is a fast metrics server that counts entries while automatically expiring old ones

In-Memory Expirable Key Counter This is a fast metrics server, ideal for tracking throttling. Put values to the server, and then count them. Values ex

Jun 17, 2022

Fast Entity Component System in Golang

ECS Fast Entity Component System in Golang This module is the ECS part of the game engine I'm writing in Go. Features: as fast as packages with automa

Dec 11, 2022

Web app built with Go/Golang and Buffalo, deployed on Heroku, using Heroku Postgres

hundred-go-buffalo Background Read Go Read Buffalo Read Getting Started on Heroku with Go Recommended Tools PowerShell terminal Chocolatey Windows pac

Dec 18, 2021

Radiant is used for rapid development of enterprise applications in Go, including RESTful APIs, web apps and backend services.

Mar 22, 2022

F' - A flight software and embedded systems framework

F´ (F Prime) is a component-driven framework that enables rapid development and deployment of spaceflight and other embedded software applications.

Jan 4, 2023

Tanzu Framework provides a set of building blocks to build atop of the Tanzu platform and leverages Carvel packaging

Tanzu Framework provides a set of building blocks to build atop of the Tanzu platform and leverages Carvel packaging and plugins to provide users with a much stronger, more integrated experience than the loose coupling and stand-alone commands of the previous generation of tools.

Dec 16, 2022

Highly customizable archive and index framework for EPITA

epitar.gz Highly customizable archive and index framework for EPITA. Get started

Nov 28, 2022
Comments
  • antch-getstarted demo looping forever

    Hi, this is my first post to GitHub; sorry if I did something wrong. When I run antch-getstarted on Windows 7 (64-bit) with Go 1.9, the program outputs several JSON records and then pauses or loops. Only Ctrl+C terminates it. I'm new to Go, so I was not able to solve this issue myself. Thanks,

  • Allow option for antch to set HTTP2ForceUpgrade transport param

    As a follow-up, I might propose changing the crawler interface to allow a transport (or an HTTP client) to be injected. For now, this felt like the minimal change to fix the bug where setting a DialContext on the HTTP Transport prevents the use of HTTP/2.

  • Can't set the request header

    Sometimes we need to build a request with different request headers, but Antch doesn't have a method for setting them. Can we add this to the Antch spider, for example as an option in the crawler settings?

  • Please show another exit method, thanks

    I read the example spider; it uses a signal channel as the exit method, but that is not very helpful. Can you show more exit methods? For example, once the spider has finished all its scraping work, could it exit automatically?

A simple, easily extensible and concurrent health-check library for Go services

Healthcheck A simple and extensible RESTful Healthcheck API implementation for Go services. Health provides an http.Handlefunc for use as a healthchec

Dec 30, 2022
An easy to use, extensible health check library for Go applications.

Try browsing the code on Sourcegraph! Go Health Check An easy to use, extensible health check library for Go applications. Table of Contents Example M

Dec 30, 2022
Entitas-Go is a fast Entity Component System Framework (ECS) Go 1.17 port of Entitas v1.13.0 for C# and Unity.

Entitas-Go Entitas-GO is a fast Entity Component System Framework (ECS) Go 1.17 port of Entitas v1.13.0 for C# and Unity. Code Generator Install the l

Dec 26, 2022
Go-linq - A powerful language integrated query (LINQ) library for Golang

go-linq A powerful language integrated query (LINQ) library for Go. Written in v

Jan 7, 2023
Fast and secure initramfs generator

Booster - fast and secure initramfs generator Initramfs is a specially crafted small root filesystem that is mounted at the early stages of Linux OS boot

Dec 28, 2022
The package manager for macOS you didn’t know you missed. Simple, functional, and fast.

Stew The package manager for macOS you didn’t know you missed. Built with simplicity, functionality, and most importantly, speed in mind. Installation

Mar 30, 2022
a really fast difficulty and pp calculator for osu!mania

gonia | mania star + pp calculator a very fast and accurate star + pp calculator for mania. gonia has low memory usage and very fast calculation times

Mar 10, 2022
Executor - Fast exec task with go and less mem ops

executor fast exec task with go and less mem ops Why we need executor? Go with g

Dec 19, 2022
A fast and easy-to-use gutenberg book downloader

Gutenberg Downloader A brief description of what this project does and who it's for Usage download books Download all english books as epubs with imag

Jan 11, 2022
A simple Cron library for Go that can execute closures or functions at varying intervals, from once a second to once a year on a specific date and time. Primarily for web applications and long-running daemons.

Cron.go This is a simple library to handle scheduled tasks. Tasks can be run in a minimum delay of once a second--for which Cron isn't actually design

Dec 17, 2022