Golang based web site opengraph data scraper with caching

Snapper

snapper

A Web microservice for capturing a website's OpenGraph data built in Golang

Building Snapper

building the binary

git clone https://github.com/owl-93/snapper
cd snapper
go build .

optionally give the executable a name

go build -o <executable name> .

Running Snapper

./snapper

Specifying a different port

you can tell snapper to use a different port by passing the port as a comman line argument

./snapper 8081

By default, snapper runs on port 8888

Using Snapper

To snap a webpage's Opengraph metadata, just make a REST POST request to / with the target website specified in the request body using the key page. You can optionally pass the forceRefresh option in the request body to force snapper to fetch the latest metadata and not use any cached values if present.

Example

curl --location --request POST 'http://localhost:8888/' 
--header 'Content-Type: application/json' 
--data-raw '{
    "page": "https://code.visualstudio.com/download",
    "forceRefresh" : true
}'

Response

[
    {
        "name": "og:title",
        "value": "Download Visual Studio Code - Mac, Linux, Windows"
    },
    {
        "name": "og:description",
        "value": "Visual Studio Code is free and available on your favorite platform - Linux, macOS, and Windows.  Download Visual Studio Code to experience a redefined code editor,  optimized for building and debugging modern web and cloud applications."
    },
    {
        "name": "og:image",
        "value": "https://code.visualstudio.com/opengraphimg/opengraph-home.png"
    },
    {
        "name": "og:url",
        "value": "https://code.visualstudio.com/Download"
    }
]
Owner
Stephen Schmidt
Full Stack - BS in CS from Cal Poly SLO - Native Android | Backend | Web | Kotlin | Java | Typescript | JS | Python | React | Node | Spring | Express | & more
Stephen Schmidt
Similar Resources

Best Room Price Scraper from Booking.com

Best Room Price Scraper from Booking.com This repo is a tutorial of Large Scale

Nov 11, 2022

Go-site-crawler - a simple application written in go that can fetch contentfrom a url endpoint

Go Site Crawler Go Site Crawler is a simple application written in go that can f

Feb 5, 2022

Go-based search engine URL collector , support Google, Bing, can be based on Google syntax batch collection URL

Go-based search engine URL collector , support Google, Bing, can be based on Google syntax batch collection URL

Go-based search engine URL collector , support Google, Bing, can be based on Google syntax batch collection URL

Nov 9, 2022

Youtube tutorial about web scraping using golang and Gocolly

This is an example project I wrote for a youtube tutorial about webscraping using golang and gocolly It extracts data from a tracking differences webs

Mar 26, 2022

Fast golang web crawler for gathering URLs and JavaSript file locations.

Fast golang web crawler for gathering URLs and JavaSript file locations. This is basically a simple implementation of the awesome Gocolly library.

Sep 24, 2022

Distributed web crawler admin platform for spiders management regardless of languages and frameworks. 分布式爬虫管理平台,支持任何语言和框架

Distributed web crawler admin platform for spiders management regardless of languages and frameworks. 分布式爬虫管理平台,支持任何语言和框架

Crawlab 中文 | English Installation | Run | Screenshot | Architecture | Integration | Compare | Community & Sponsorship | CHANGELOG | Disclaimer Golang-

Jan 7, 2023

ant (alpha) is a web crawler for Go.

The package includes functions that can scan data from the page into your structs or slice of structs, this allows you to reduce the noise and complexity in your source-code.

Dec 30, 2022

Declarative web scraping

Declarative web scraping

Ferret Try it! Docs CLI Test runner Web worker What is it? ferret is a web scraping system. It aims to simplify data extraction from the web for UI te

Jan 4, 2023

Fetch web pages using headless Chrome, storing all fetched resources including JavaScript files

Fetch web pages using headless Chrome, storing all fetched resources including JavaScript files. Run arbitrary JavaScript on many web pages and see the returned values

Dec 29, 2022
Related tags
Ratemyprof scraper - Ratemyprof scraper with golang

ratemyprof scraper visit https://ratemyprof-api.vercel.app/api/getProf to try ou

Jan 18, 2022
A crawler/scraper based on golang + colly, configurable via JSON

A crawler/scraper based on golang + colly, configurable via JSON

Aug 21, 2022
A crawler/scraper based on golang + colly, configurable via JSON

Super-Simple Scraper This a very thin layer on top of Colly which allows configuration from a JSON file. The output is JSONL which is ready to be impo

Aug 21, 2022
Web Scraper in Go, similar to BeautifulSoup

soup Web Scraper in Go, similar to BeautifulSoup soup is a small web scraper package for Go, with its interface highly similar to that of BeautifulSou

Jan 9, 2023
Scraper to download school attendance data from the DfE's statistics website
Scraper to download school attendance data from the DfE's statistics website

?? Simple to use. Scrape attendance data with a single command! ?? Super fast. A

Mar 31, 2022
A simple scraper to export data from buildkite to honeycomb using opentelemetry SDK
A simple scraper to export data from buildkite to honeycomb using opentelemetry SDK

A quick scraper program that let you export builds on BuildKite as OpenTelemetry data and then send them to honeycomb.io for slice-n-dice high cardinality analysis.

Jul 7, 2022
Elegant Scraper and Crawler Framework for Golang

Colly Lightning Fast and Elegant Scraping Framework for Gophers Colly provides a clean interface to write any kind of crawler/scraper/spider. With Col

Jan 9, 2023
Warhammer40K faction scraper written in Golang, powered by colly.

Wascra Description Wascra is a tool written in Golang, which lets you extract all relevant Datasheet info from a Warhammer40K (9th edition) faction fr

Feb 8, 2022
Simple price scraper with HTTP server/exporter for use with Prometheus

priceserver v0.3 Simple price scraper with HTTP server/exporter for use with Prometheus Currently working with Bitrue.com exchange but easily adaptabl

Nov 16, 2021
A cli scraper of gocomics.com made in go

goComic goComic is a cli tool written in go that scrapes your favorite childhood favorite comic from gocomics.com. It will give you a single days comi

Dec 24, 2021