Scrape the web in the eink era. Convert websites into books.

❯ papeer get --format=epub --recursive --delay=500 --limit=10 https://news.ycombinator.com/
[===============================================>--------------------] Chapters 7 / 10
[====================================================================] 1. Three ex-US intelligence officers admit hacking for UAE
[====================================================================] 2. Show HN: Time Travel Debugger
[====================================================================] 3. How much faster is Java 17?
[====================================================================] 4. The First Webcam Was Invented to Keep an Eye on a Coffee Pot
[====================================================================] 5. Nikon's 2021 Photomicrography Competition Winners
[====================================================================] 6. HTTP Status 418 – I'm a teapot
[====================================================================] 7. H3: Hexagonal hierarchical geospatial indexing system
[--------------------------------------------------------------------] 8. Automatic cipher suite ordering in Go’s crypto/tls
[--------------------------------------------------------------------] 9. Find engineering roles at over 800 YC-funded startups
[--------------------------------------------------------------------] 10. Futarchy: Robin Hanson on prediction markets
Ebook saved to "Hacker_News.epub"

Installation

From source

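go get -u github.com/lapwat/papeer

With Go 1.18 or later, go get no longer builds or installs binaries; use go install instead:

go install github.com/lapwat/papeer@latest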

From binary

On Linux / macOS

platform=linux
# platform=darwin for macOS
curl -L https://github.com/lapwat/papeer/releases/download/v0.2.1/papeer-v0.2.1-$platform-amd64 > papeer
chmod +x papeer
sudo mv papeer /usr/local/bin
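
As a minimal sketch, the platform value can be derived from uname instead of being set by hand. This assumes only the linux and darwin amd64 builds named above exist; the version variable is introduced here for illustration, and the script only prints the download URL rather than fetching it.

```shell
# Map the kernel name reported by uname to the platform string used in the
# release file names (only linux and darwin appear in the steps above).
case "$(uname -s)" in
  Linux)  platform=linux ;;
  Darwin) platform=darwin ;;
  *)      platform=unsupported ;;  # assumption: no other builds are covered here
esac

# Illustrative variable: the release tag used in the steps above.
version=v0.2.1
url="https://github.com/lapwat/papeer/releases/download/$version/papeer-$version-$platform-amd64"
echo "$url"
```

The printed URL can then be passed to the curl command shown above. On machines that are not amd64, this URL may not point to a usable binary.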

On Windows

Download the latest release from https://github.com/lapwat/papeer/releases.

Install kindlegen to export websites to MOBI (optional)

# Use a scratch directory; a separate variable avoids clobbering the TMPDIR environment variable
tmpdir=$(mktemp -d -t papeer-XXXXX)
curl -L https://github.com/lapwat/papeer/releases/download/kindlegen/kindlegen_linux_2.6_i386_v2_9.tar.gz > "$tmpdir/kindlegen.tar.gz"
tar xzvf "$tmpdir/kindlegen.tar.gz" -C "$tmpdir"
chmod +x "$tmpdir/kindlegen"
sudo mv "$tmpdir/kindlegen" /usr/local/bin
# Clean up the scratch directory
rm -rf "$tmpdir"

Usage

Browse the web in the eink era

Usage:
  papeer [flags]
  papeer [command]

Available Commands:
  completion  generate the autocompletion script for the specified shell
  get         Scrape URL content
  help        Help about any command
  ls          Print table of contents
  version     Print the version number of papeer

Flags:
  -d, --delay int         time to wait before downloading next chapter, in milliseconds (default -1)
  -f, --format string     file format [md, epub, mobi] (default "md")
  -h, --help              help for papeer
      --images            retrieve images only
  -i, --include           include URL as first chapter, in recursive mode
  -l, --limit int         limit number of chapters, in recursive mode (default -1)
  -o, --offset int        skip first chapters, in recursive mode
      --output string     output file
  -r, --recursive         create one chapter per navigation item
  -s, --selector string   table of contents CSS selector, in recursive mode
      --stdout            print to standard output
  -t, --threads int       download concurrency, in recursive mode (default -1)

Use "papeer [command] --help" for more information about a command.

Autocompletion

Execute this command in your current shell, or add it to your .bashrc.

. <(papeer completion bash)

Type papeer completion bash -h for more information.

You can replace bash with your own shell (zsh, fish, or powershell).

Dependencies

  • cobra: command line interface
  • go-readability: extract content from HTML
  • html-to-markdown: convert HTML to Markdown
  • go-epub: convert HTML to EPUB
  • colly: query HTML trees
  • uiprogress: display progress bars
Comments
  • list allows usage of `-l` w/o `-s` while get doesn't

    The list command implicitly allows using no selector and uses the default of "". Get doesn't work when only passing -l, nor does it work with passing -s "". In my case the "default" selector that get uses works quite well, but simply spits out a few chapters at the end that I intend to omit. Using papeer list -l 5 <uri> does that, however I cannot do the same with papeer get -l 5 <uri>.

    I think ideally it should be allowed to use -l and others with a default selector.

  • Download error if article headline contains ":"

    Nice work. If the article title contains ":", it downloads a 0-byte file, and if it contains "?", it gives the error "The filename, directory name, or volume label syntax is incorrect."

  • Support for <tt> for code/typewriter text

    As per https://github.com/JohannesKaufmann/html-to-markdown/issues/49, some websites don't use semantic markup but specify <tt> directly.

    Adding this rule for the markdown converter improves the output considerably:

    	// Render <tt> elements as inline code.
    	converter.AddRules(
    		md.Rule{
    			Filter: []string{"tt"},
    			Replacement: func(content string, selec *goquery.Selection, opt *md.Options) *string {
    				content = "`" + content + "`"
    				return &content
    			},
    		},
    	)
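
For the filename issue reported above (titles containing ":" or "?"), one possible workaround is to build a safe file name yourself and pass it with --output. This is only a sketch: safe_name is a hypothetical helper that is not part of papeer, it needs a POSIX shell (for example Git Bash or WSL on Windows), and whether --output avoids the reported error has not been verified.

```shell
# Hypothetical helper (not part of papeer): replace every run of characters
# outside A-Z, a-z, 0-9, '.', '_' and '-' with a single underscore.
safe_name() {
  printf '%s' "$1" | tr -cs 'A-Za-z0-9._-' '_'
}

title='Show HN: Time Travel Debugger'
out="$(safe_name "$title").epub"
echo "$out"

# Assumed usage, with papeer on the PATH and a URL of your choice:
# papeer get --format=epub --output="$out" <url>
```

Titles made only of ASCII letters, digits, dots, underscores, and hyphens pass through unchanged; everything else, including spaces, collapses to underscores.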
    