epitar.gz

Highly customizable archive and index framework for EPITA.

Get started

  • Create a new config.yml (see config.sample.yml) to configure the EPITA services you wish to archive by specifying the associated archive module.
  • Configure your sonic instance in sonic.cfg.
  • Run the provided docker-compose.yml file to start your sonic instance and a docconv container (text extractor for PDF files).
  • Run ./epitar start to start archiving and indexing.

How does it work

Archive modules

An archive module scrapes, downloads, or archives websites and services. These modules are highly customizable as they run in Docker containers.

Index

Archived files may be scanned to build a search index. Words in PDF files are extracted with regular text extraction, or with OCR for scanned documents.
The extracted words are then ingested by a sonic instance to build a fast search index.

UI & API

A UI is exposed along with an API to quickly search for files.

Contributing

Add an archive module

An archive module is highly customizable: it can be written in any programming language as long as a valid Dockerfile is provided.
Your archive module must have a Dockerfile, a module.json and a README.md.

Dockerfile

Your Dockerfile can use any base image but try to keep the image size small.

The output directory for archived files must be /output.

module.json

Your module.json must provide information about the website or service being archived.
Here is an example (the logo field is optional):

{
    "name": "Past-Exams",
    "slug": "past-exams",
    "url": "https://github.com/Epidocs/Past-Exams",
    "description": "Past subjects and other files, for the benefit of EPITA students.",
    "logo": "https://github.com/fluidicon.png",
    "authors": [
        {
            "name": "Aurele Oules",
            "email": "[email protected]"
        }
    ]
}
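To check a module.json before committing it, a small validator along these lines may help. This is a sketch only: the struct mirrors the fields of the example above, the choice of required fields is an assumption, and epitar's own loader may behave differently. Note that Go's encoding/json rejects comments, so a real module.json must be plain JSON.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// Author mirrors one entry of the "authors" array.
type Author struct {
	Name  string `json:"name"`
	Email string `json:"email"`
}

// Module mirrors the module.json example; Logo is treated as optional.
type Module struct {
	Name        string   `json:"name"`
	Slug        string   `json:"slug"`
	URL         string   `json:"url"`
	Description string   `json:"description"`
	Logo        string   `json:"logo,omitempty"`
	Authors     []Author `json:"authors"`
}

// ParseModule decodes module.json content and reports missing fields.
// Which fields are mandatory is an assumption of this sketch.
func ParseModule(data []byte) (*Module, error) {
	var m Module
	if err := json.Unmarshal(data, &m); err != nil {
		return nil, fmt.Errorf("invalid module.json: %w", err)
	}
	var missing []string
	if m.Name == "" {
		missing = append(missing, "name")
	}
	if m.Slug == "" {
		missing = append(missing, "slug")
	}
	if m.URL == "" {
		missing = append(missing, "url")
	}
	if m.Description == "" {
		missing = append(missing, "description")
	}
	if len(m.Authors) == 0 {
		missing = append(missing, "authors")
	}
	if len(missing) > 0 {
		return nil, errors.New("missing fields: " + strings.Join(missing, ", "))
	}
	return &m, nil
}

func main() {
	// The author entry uses placeholder values.
	data := []byte(`{
		"name": "Past-Exams",
		"slug": "past-exams",
		"url": "https://github.com/Epidocs/Past-Exams",
		"description": "Past subjects and other files.",
		"authors": [{"name": "Jane Doe", "email": "jane@example.com"}]
	}`)
	m, err := ParseModule(data)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("valid module:", m.Slug)
}
```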

README.md

You must provide a simple README.md that explains how to use the module.
An archive module may take its options as environment variables; if it does, document them here.
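For illustration, a module could read its options from the environment as sketched below. The variable names SOURCE_URL and MAX_FILES and the default of 100 are hypothetical, chosen only for this example; each module defines and documents its own.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Options holds hypothetical settings for an archive module.
type Options struct {
	SourceURL string // required
	MaxFiles  int    // optional, defaults to 100
}

// loadOptions reads options through getenv so that callers can pass
// os.Getenv in production and a fake lookup in tests.
func loadOptions(getenv func(string) string) (Options, error) {
	opts := Options{SourceURL: getenv("SOURCE_URL"), MaxFiles: 100}
	if opts.SourceURL == "" {
		return opts, fmt.Errorf("SOURCE_URL is required")
	}
	if raw := getenv("MAX_FILES"); raw != "" {
		n, err := strconv.Atoi(raw)
		if err != nil || n <= 0 {
			return opts, fmt.Errorf("MAX_FILES must be a positive integer, got %q", raw)
		}
		opts.MaxFiles = n
	}
	return opts, nil
}

func main() {
	opts, err := loadOptions(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, "configuration error:", err)
		os.Exit(1)
	}
	fmt.Printf("archiving %s (at most %d files)\n", opts.SourceURL, opts.MaxFiles)
}
```

Failing fast on a missing or malformed variable makes misconfiguration visible as soon as the container starts.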

Other files

You may add other files to the module directory, but keep it organized and commit only the files that are necessary.

You must also edit the config.sample.yml file to provide an example of how to use your archive module.

License

MIT - Aurèle Oulès
