Scrappy - A multi-type web scraper with alerting

Scrappy is a CLI tool that runs multiple web scrapers, each periodically checking a page against a simple rule set, and notifies users when a rule's criteria have been met.

About

Scrappy can accept and manage multiple rule sets, scraping each one on its own period. Each scrape rule consists of a user-friendly name and all the data required for scraping and alerting. The fields are:

name - a friendly name for the scraper (string)
url - the URL to scrape (string)
attribute - the attribute to scrape the value from (string)
trim_prefix_chars - number of leading characters to trim from the scraped value (integer)
trim_suffix_chars - number of trailing characters to trim from the scraped value (integer)
value_type - the value's type (string|integer|float)
check_value - the value to check against (string, parsed as value_type)
comparator_type - the comparison applied between the scraped value and check_value
check_period - how often to check for a change (duration, e.g. 1h30m10s)
status - current status of the scraper (active|error|complete)
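
The fields above can be sketched as a Go struct. This is a minimal illustration, not Scrappy's actual code: the ScrapeRule type, the ParseRule and TrimValue helpers, the JSON layout, and the sample values are all assumptions. The only parts taken from the field list are the field names and the duration format, which Go's time.ParseDuration accepts directly.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// ScrapeRule mirrors the fields listed above. The type name and the JSON
// layout are assumptions; the real scraps.json may be organised differently.
type ScrapeRule struct {
	Name            string `json:"name"`
	URL             string `json:"url"`
	Attribute       string `json:"attribute"`
	TrimPrefixChars int    `json:"trim_prefix_chars"`
	TrimSuffixChars int    `json:"trim_suffix_chars"`
	ValueType       string `json:"value_type"`
	CheckValue      string `json:"check_value"`
	ComparatorType  string `json:"comparator_type"`
	CheckPeriod     string `json:"check_period"`
	Status          string `json:"status"`
}

// ParseRule decodes one rule and parses its check_period as a Go duration.
func ParseRule(data []byte) (ScrapeRule, time.Duration, error) {
	var rule ScrapeRule
	if err := json.Unmarshal(data, &rule); err != nil {
		return rule, 0, err
	}
	period, err := time.ParseDuration(rule.CheckPeriod)
	return rule, period, err
}

// TrimValue drops prefix characters from the start and suffix characters
// from the end. It returns "" when the trims would consume the whole value.
func TrimValue(s string, prefix, suffix int) string {
	r := []rune(s)
	if prefix < 0 {
		prefix = 0
	}
	if suffix < 0 {
		suffix = 0
	}
	if prefix+suffix >= len(r) {
		return ""
	}
	return string(r[prefix : len(r)-suffix])
}

func main() {
	// Made-up sample rule, for illustration only.
	raw := `{"name":"price watch","url":"https://example.com/item",
		"attribute":"price","trim_prefix_chars":1,"trim_suffix_chars":0,
		"value_type":"float","check_value":"20.5",
		"comparator_type":"less_than","check_period":"1h30m10s",
		"status":"active"}`
	rule, period, err := ParseRule([]byte(raw))
	if err != nil {
		panic(err)
	}
	fmt.Println(rule.Name, period)
	fmt.Println(TrimValue("$19.99", rule.TrimPrefixChars, rule.TrimSuffixChars))
}
```

Trimming on runes rather than bytes keeps multi-byte characters such as currency symbols intact.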

The values allowed for the comparator_type field depend on the value_type. For a string value_type the applicable comparator_type values are:

longer_than
shorter_than
contains
is_same
is_not_same
exists
not_exists

For an integer or float value_type the applicable comparator_type values are:

less_than
greater_than
exists
not_exists
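
One way the comparators above could be evaluated is sketched below. The function names are hypothetical and the semantics are assumptions, since they are not spelled out here: longer_than and shorter_than are taken to compare the length of the scraped text with the length of check_value, and exists/not_exists are taken to mean whether the attribute was found on the page at all.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"unicode/utf8"
)

// CompareString applies a string comparator. found reports whether the
// attribute was present on the page (assumed meaning of exists/not_exists).
func CompareString(comparator, scraped, check string, found bool) (bool, error) {
	switch comparator {
	case "exists":
		return found, nil
	case "not_exists":
		return !found, nil
	case "longer_than": // assumption: compare character counts
		return found && utf8.RuneCountInString(scraped) > utf8.RuneCountInString(check), nil
	case "shorter_than":
		return found && utf8.RuneCountInString(scraped) < utf8.RuneCountInString(check), nil
	case "contains":
		return found && strings.Contains(scraped, check), nil
	case "is_same":
		return found && scraped == check, nil
	case "is_not_same":
		return found && scraped != check, nil
	}
	return false, fmt.Errorf("unsupported string comparator %q", comparator)
}

// CompareNumber applies a numeric comparator. Both values are parsed as
// float64, which is enough for a sketch covering integer and float rules.
func CompareNumber(comparator, scraped, check string, found bool) (bool, error) {
	switch comparator {
	case "exists":
		return found, nil
	case "not_exists":
		return !found, nil
	case "less_than", "greater_than":
		if !found {
			return false, nil
		}
		s, err := strconv.ParseFloat(scraped, 64)
		if err != nil {
			return false, err
		}
		c, err := strconv.ParseFloat(check, 64)
		if err != nil {
			return false, err
		}
		if comparator == "less_than" {
			return s < c, nil
		}
		return s > c, nil
	}
	return false, fmt.Errorf("unsupported numeric comparator %q", comparator)
}

func main() {
	ok, _ := CompareNumber("less_than", "19.99", "20.5", true)
	fmt.Println(ok) // true
	ok, _ = CompareString("contains", "In stock", "stock", true)
	fmt.Println(ok) // true
}
```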

When a scrape rule is met, an email is sent to the configured email account.

Installation

Fetch the project with

$ go get -u github.com/mzampetakis/scrappy

Go modules are required.

To build the project, run:

go build -o scrappy main.go

Usage

Configuration

A valid configuration is required to access an email account. The credentials of the account used to send the alert emails go in the email.conf file, in the following format:

{
    "email": "[email protected]",
    "password": "mails_password"
}
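
A file in this format can be read with the standard library alone. The sketch below is only an illustration: the EmailConfig type and ParseEmailConfig helper are hypothetical names, and the check for empty fields is an assumption about what counts as a valid configuration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// EmailConfig mirrors the two keys shown in email.conf.
type EmailConfig struct {
	Email    string `json:"email"`
	Password string `json:"password"`
}

// ParseEmailConfig decodes the JSON and rejects a config with empty fields.
func ParseEmailConfig(data []byte) (EmailConfig, error) {
	var cfg EmailConfig
	if err := json.Unmarshal(data, &cfg); err != nil {
		return cfg, fmt.Errorf("parse email config: %w", err)
	}
	if cfg.Email == "" || cfg.Password == "" {
		return cfg, fmt.Errorf("email config: both email and password are required")
	}
	return cfg, nil
}

func main() {
	data, err := os.ReadFile("email.conf")
	if err != nil {
		fmt.Println("cannot read email.conf:", err)
		return
	}
	cfg, err := ParseEmailConfig(data)
	if err != nil {
		fmt.Println(err)
		return
	}
	// Print only the address; the password should never be logged.
	fmt.Println("alerts will be sent as", cfg.Email)
}
```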

Emails are sent over SMTP.

Don't use your personal email password. Instead, issue a third-party access credential for the account.
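
As a rough sketch of how an alert could be sent over SMTP with Go's net/smtp package: the SMTP host and port are not part of email.conf as shown, so the values a caller would pass here are assumptions, and BuildMessage and SendAlert are hypothetical helper names. The message is addressed to the sending account itself, matching the statement that the alert goes to the configured account.

```go
package main

import (
	"fmt"
	"net/smtp"
)

// BuildMessage assembles a minimal plain-text message with CRLF line
// endings. Header values are not sanitised here; real code should reject
// line breaks in them.
func BuildMessage(from, to, subject, body string) []byte {
	return []byte("From: " + from + "\r\n" +
		"To: " + to + "\r\n" +
		"Subject: " + subject + "\r\n" +
		"\r\n" +
		body + "\r\n")
}

// SendAlert sends the alert from the configured account to itself.
// host and port are assumptions; they are not stored in email.conf.
func SendAlert(host, port, email, password, subject, body string) error {
	auth := smtp.PlainAuth("", email, password, host)
	msg := BuildMessage(email, email, subject, body)
	return smtp.SendMail(host+":"+port, auth, email, []string{email}, msg)
}

func main() {
	// Only print the message; calling SendAlert needs a reachable server.
	msg := BuildMessage("me@example.com", "me@example.com",
		"Scrappy alert: price watch", "The rule criteria have been met.")
	fmt.Print(string(msg))
}
```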

The scraps.json file contains all the rules for the available scrapes. It is a JSON-formatted text file. Using the available CLI commands to manage this file is highly recommended.

Adding a scraper

A new scrape rule set can be added through the CLI. The command to add one is

./scrappy --mode add

The CLI will prompt for the required fields and validate the given values.
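
One check such validation could include is that the chosen comparator_type is allowed for the chosen value_type, using the lists from the About section. The sketch below is an assumption about how that might look, not the tool's actual validation; allowedComparators and ValidateComparator are hypothetical names.

```go
package main

import "fmt"

// allowedComparators lists, per value_type, the comparator_type values
// named in the About section.
var allowedComparators = map[string][]string{
	"string": {"longer_than", "shorter_than", "contains", "is_same",
		"is_not_same", "exists", "not_exists"},
	"integer": {"less_than", "greater_than", "exists", "not_exists"},
	"float":   {"less_than", "greater_than", "exists", "not_exists"},
}

// ValidateComparator returns nil when comparator is allowed for valueType.
func ValidateComparator(valueType, comparator string) error {
	allowed, ok := allowedComparators[valueType]
	if !ok {
		return fmt.Errorf("unknown value_type %q", valueType)
	}
	for _, c := range allowed {
		if c == comparator {
			return nil
		}
	}
	return fmt.Errorf("comparator_type %q is not valid for value_type %q",
		comparator, valueType)
}

func main() {
	fmt.Println(ValidateComparator("float", "less_than")) // <nil>
	fmt.Println(ValidateComparator("float", "contains"))
}
```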

Starting the scraper

To start scraping, run

./scrappy

or, to start it and leave it running in the background,

./scrappy &
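
Internally, a long-running process like this has to re-check each rule once per check_period. A minimal sketch of that loop for a single rule, using a time.Ticker, is shown below. RunEvery is a hypothetical helper and the stop-on-first-match behaviour is an assumption; it does not describe how Scrappy itself schedules its checks.

```go
package main

import (
	"fmt"
	"time"
)

// RunEvery calls check once per period until check reports a match or stop
// is closed. It returns how many checks were run.
func RunEvery(stop <-chan struct{}, period time.Duration, check func() bool) int {
	ticker := time.NewTicker(period)
	defer ticker.Stop()
	runs := 0
	for {
		select {
		case <-stop:
			return runs
		case <-ticker.C:
			runs++
			if check() {
				return runs
			}
		}
	}
}

func main() {
	stop := make(chan struct{})
	calls := 0
	// Stand-in for a real scrape: report a match on the third check.
	runs := RunEvery(stop, 10*time.Millisecond, func() bool {
		calls++
		return calls == 3
	})
	fmt.Println("checks before match:", runs) // checks before match: 3
}
```

Running one such loop per rule in its own goroutine would let every rule keep its own period.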

Contribute

You can contribute to this project by opening a PR, or by opening an issue first. Please describe thoroughly what your PR solves or adds.

Some ideas for contribution:

  • Add other types of informers
  • Use a separate email per scraper
  • Improve CLIs in terms of suggestions
  • Your idea here...