Convert scanned image PDF file to text annotated PDF file

Jisui (自炊)

This tool is PoC (Proof of Concept).

Jisui is a helper tool to create e-book.
Ordinary the scanned book have not text information, so you cannot search text from the PDF.
Jisui extract texts from a scanned book (PDF) and merge the text to PDF.

This tool is depending on Google Cloud Vision API to extract texts.
So you need GCP account & own project.

Jisui (自炊) is Japanese slung which means that scanning a book to make e-book.

Pre-requirements

Install

$ go get github.com/sachaos/jisui

Usage

$ jisui -bucket [your GCS bucket] -font [Downloaded font] -output result.pdf [scanned PDF file]

Example

You can see example PDF file.

Please download and open it in PDF viewer.

You can recongnize the difference when you search text.

image

Owner
Similar Resources

A general purpose application and library for aligning text.

align A general purpose application that aligns text The focus of this application is to provide a fast, efficient, and useful tool for aligning text.

Sep 27, 2022

Parse placeholder and wildcard text commands

allot allot is a small Golang library to match and parse commands with pre-defined strings. For example use allot to define a list of commands your CL

Nov 24, 2022

Guess the natural language of a text in Go

guesslanguage This is a Go version of python guess-language. guesslanguage provides a simple way to detect the natural language of unicode string and

Dec 26, 2022

omniparser: a native Golang ETL streaming parser and transform library for CSV, JSON, XML, EDI, text, etc.

omniparser: a native Golang ETL streaming parser and transform library for CSV, JSON, XML, EDI, text, etc.

omniparser Omniparser is a native Golang ETL parser that ingests input data of various formats (CSV, txt, fixed length/width, XML, EDI/X12/EDIFACT, JS

Jan 4, 2023

Produces a set of tags from given source. Source can be either an HTML page, Markdown document or a plain text. Supports English, Russian, Chinese, Hindi, Spanish, Arabic, Japanese, German, Hebrew, French and Korean languages.

Produces a set of tags from given source. Source can be either an HTML page, Markdown document or a plain text. Supports English, Russian, Chinese, Hindi, Spanish, Arabic, Japanese, German, Hebrew, French and Korean languages.

Tagify Gets STDIN, file or HTTP address as an input and returns a list of most popular words ordered by popularity as an output. More info about what

Dec 19, 2022

Extract urls from text

xurls Extract urls from text using regular expressions. Requires Go 1.13 or later. import "mvdan.cc/xurls/v2" func main() { rxRelaxed := xurls.Relax

Jan 7, 2023

Easy AWK-style text processing in Go

awk Description awk is a package for the Go programming language that provides an AWK-style text processing capability. The package facilitates splitt

Jul 25, 2022

Change the color of console text.

go-colortext package This is a package to change the color of the text and background in the console, working both under Windows and other systems. Un

Oct 26, 2022

Templating system for HTML and other text documents - go implementation

FAQ What is Kasia.go? Kasia.go is a Go implementation of the Kasia templating system. Kasia is primarily designed for HTML, but you can use it for any

Mar 15, 2022
Convert your markdown files to PDF instantly
Convert your markdown files to PDF instantly

Will take a markdown file as input and then create a PDF file with the markdown formatting.

Nov 7, 2022
AppGo is an application that is intended to read a plain text log file and deliver an encoded polyline

AppGo AppGo is an application that is intended to read a plain text log file and deliver an encoded polyline. Installation To run AppGo it is necessar

Oct 23, 2021
pdfcpu is a PDF processor written in Go.
pdfcpu is a PDF processor written in Go.

pdfcpu is a PDF processing library written in Go supporting encryption. It provides both an API and a CLI. Supported are all versions up to PDF 1.7 (ISO-32000).

Jan 4, 2023
Take screenshots of websites and create PDF from HTML pages using chromium and docker

gochro is a small docker image with chromium installed and a golang based webserver to interact wit it. It can be used to take screenshots of w

Nov 23, 2022
A PDF renderer for the goldmark markdown parser.
A PDF renderer for the goldmark markdown parser.

goldmark-pdf goldmark-pdf is a renderer for goldmark that allows rendering to PDF. Reference See https://pkg.go.dev/github.com/stephenafamo/goldmark-p

Jan 7, 2023
pdf document generation library
pdf document generation library

gopdf 项目介绍 gopdf 是一个生成 PDF 文档的 Golang 库. 主要有以下的特点: 支持 Unicode 字符 (包括中文, 日语, 朝鲜语, 等等.) 文档内容的自动定位与分页, 减少用户的工作量. 支持图片插入, 支持多种图片格式, PNG, BMP, JPEG, WEBP,

Dec 8, 2022
⚙️ Convert HTML to Markdown. Even works with entire websites and can be extended through rules.
⚙️ Convert HTML to Markdown. Even works with entire websites and can be extended through rules.

html-to-markdown Convert HTML into Markdown with Go. It is using an HTML Parser to avoid the use of regexp as much as possible. That should prevent so

Jan 6, 2023
Convert xml and json to go struct

xj2go The goal is to convert xml or json file to go struct file. Usage Download and install it: $ go get -u -v github.com/wk30/xj2go/cmd/... $ xj [-t

Oct 23, 2022
Convert Microsoft Word Document to Markdown
Convert Microsoft Word Document to Markdown

docx2md Convert Microsoft Word Document to Markdown Usage $ docx2md NewDocument.docx Installation $ go get github.com/mattn/docx2md Supported Styles

Jan 4, 2023
Easily to convert JSON data to Markdown Table

Easily to convert JSON data to Markdown Table

Oct 28, 2022