Golang PDF library for creating and processing PDF files (pure go)

Last update: Dec 28, 2022

Comments: 11

UniPDF - PDF for Go

UniDoc UniPDF is a PDF library for Go (golang) with capabilities for creating, reading, and processing PDF files. The library is written and supported by FoxyUtils.com, where it powers many of the site's services.

Features

Create PDF reports. Example output: unidoc-report.pdf.
Table PDF reports. Example output: unipdf-tables.pdf.
Invoice creation
Paragraph in creator handling multiple styles within the same paragraph
Merge PDF pages
Split PDF pages and change page order
Rotate pages
Extract text from PDF files
Text extraction support with size, position and formatting info
PDF to CSV illustrates extracting tabular data from PDF.
Extract images with coordinates
Images to PDF
Add images to pages
Compress and optimize PDF
Watermark PDF files
Advanced page manipulation (blocks/templates)
Load PDF templates and modify
Form creation
Fill and flatten forms
Fill out forms and FDF merging
Unlock PDF files / remove password
Protect PDF files with a password
Digital signature validation and signing
CCITTFaxDecode decoding and encoding support
JBIG2 decoding support

Multiple examples are provided in our example repository https://github.com/unidoc/unipdf-examples.

Installation

With modules:

go get github.com/unidoc/unipdf/v3

How can I convince myself and my boss to buy unipdf rather than using a free alternative?

The choice is yours. There are multiple respectable efforts out there that can do many good things.

At UniDoc, we work hard to provide production-quality builds, taking every detail into consideration and providing excellent support to our customers. See our testimonials for examples.

Security. We take security very seriously. Access to the github.com/unidoc/unipdf repository is restricted with protected branches, only the founders have access, and every commit is reviewed before being accepted.

The profits are invested back into making unipdf better. We want to make the best possible product, and to do that we need the best people to contribute. A large fraction of the profits goes back into developing unipdf. That way we have been able to bring in many excellent people who would not be able to contribute their work for free.

Contributing

All contributors must sign a contributor license agreement before their code will be reviewed and merged.

Support and consulting

Please email us at [email protected] for any queries.

If you have any specific tasks that need to be done, we offer consulting in certain cases. Please contact us with a brief summary of what you need and we will get back to you with a quote, if appropriate.

Licensing Information

This software package (unipdf) is a commercial product and requires a license code to operate.

The use of this software package is governed by the end-user license agreement (EULA) available at: https://unidoc.io/eula/

To obtain a Trial license code to evaluate the software, please visit https://unidoc.io/

Owner

UniDoc

PDF and Office (docx, xlsx, pptx) libraries for Golang

https://github.com/unidoc/unipdf https://unidoc.io

Comments

Finding bounding boxes of substrings of extracted text.

This PR combines the TextComponent idea from https://github.com/unidoc/unidoc/compare/v3...gunnsth:giggiu16-text-extraction-origtext with a mechanism for mapping substrings in extracted text positions described in the comments below.

Addresses parts of #35.

// TextMark maps extracted text to the location of the text on the PDF page and other
// properties of the rendered text such as the font.
// Offset is the offset of the start of the textMark.text in extracted text.
// BBox is the bounding box of the textMark.
// Text is the extracted text
// Meta is set true for characters that don't appear in the input PDF.
// You can find the location of substrings in the extracted text as follows:
//   Use ToTextLocation() to return the extracted text as a []TextMark sorted by Offset.
//   substring := extracted[start:end+1]
//   binary search the []TextMark for start and end.
//   The bounding box of substring on the PDF page is the union of the TextMark.BBox's with
//   start <= TextMark.Offset < end
//
// getBBox() in test_text.go shows how to compute bounding boxes of substrings of extracted text.
// The following code extracts the text on PDF page page into text then finds the bounding box
// bbox of substring term in text. (indexRunes() works like strings.Index except that it
// returns number of runes rather than number of bytes)
//
//     ex, _ := New(page)
//     // handle errors
//     pageText, _, _, err := ex.ExtractPageText()
//     // handle errors
//     text := pageText.Text()
//     textMarks := pageText.Marks()
//
//     start := indexRunes(text, term)
//     end := start + len([]rune(term))
//     spanMarks, err := textMarks.RangeOffset(start, end)
//     // handle errors
//     bbox, ok := spanMarks.BBox()
//     // handle error

[BUG] Issue with Go 1.16
Description

I am on a MacBook M1 with Go 1.16 installed, and I can't get unipdf to compile with Go 1.16 (it works with 1.15 on another Mac).

Actual Behavior

Steps to reproduce the behavior:

Install Go 1.16 beta

Copy one of unipdf examples

go run main.go

See error:

github.com/unidoc/unipdf/v3/model <autogenerated>:1: internal compiler error: child dcl collision on symbol _febf within "".(*PdfAppender).replaceObject"
Text extraction code for columns.
This is a major update to the text extraction code that works with text arranged in columns.

extractor/text.go is now split across multiple text_*.go files.

The new design is summarised in the extractor README.

Here are new PDFs and text extraction references files for extractor/text_test.go.

reference.zip + eu.page005.txt + Productivity.page001.txt (https://github.com/unidoc/unipdf/files/4735832/Productivity.page001.txt) + we-dms.page001.txt + radar-eng.page002.txt + Nuance.page001.txt

pdfs.zip + eu.pdf + Productivity.pdf + we-dms.pdf + radar-eng.pdf + Nuance.pdf

You can also run pdf_extract_text.go to see the extraction. There is an updated version of this test here that makes it easier to test a corpus of PDFs.

Improve table rendering speed

I use unipdf to generate a PDF file containing a table with 10 columns and 652 rows of Chinese characters. The generated table is 42 pages long, and the entire process takes 33.76 seconds. The font used is microsoft.ttf. How can I optimize this to improve rendering speed?
Action support

Added actions (see Section 12.6, p. 412). Added filespec (see Section 7.11.3, p. 102).

Return was added in the creator so you can add annotations to a newly created page.

[BUG] 4002 error in Adobe Appearance Integrity Report for digital signature
Description

Whenever we add a digital signature to a document, the Adobe Appearance Integrity Report shows a 4002 error for the signature.

Expected Behavior

Add digital signature without any errors from Adobe Appearance Integrity Report

Actual Behavior

Steps to reproduce the behavior:

Add any digital signature

Open the signature tab and click "Click to view this version"

Click on view report button

See error
Render unicode font characters
Description

I'm trying to render a font glyph from the NotoEmoji.ttf font (freely available from Google). NotoEmoji-Regular.ttf.zip

If I pass a string with a code point such as \u3299 or \U00003299, it renders successfully. However, a code point that needs extra bytes, such as \U0001F60E, fails to render.

Expected Behavior

All available glyphs in the font to render correctly based on the unicode value.

Actual Behavior

No char is rendered

Attachments

font, _ = model.NewCompositePdfFontFromTTFFile("./NotoEmoji-Regular.ttf")
p := c.NewStyledParagraph()
t := p.Append("\U0001F60E")
t.Style.Font = font
c.Draw(p)

Ps. I'm currently evaluating unipdf before buying a license (including unioffice)
[BUG] UniPdf/invoice Long textblock is cut off on a page instead of being split across pages

Description

Given a large contiguous text block (no line breaks/carriage returns), sections such as invoice.SetTerms or the base AddSection do not automatically break content across pages.

Expected Behavior

It would be awesome if this text content was split to new pages when encountering the end of a page or pages.

Actual Behavior

Steps to reproduce the behavior: Using the pdf_invoice_simple.go as a starting point, simply update the content for SetTerms to be a long contiguous block of text.

Attachments

Attached is an example pdf and the modified pdf_invoice_simple.go

pdf_invoice_simple.zip
[BUG] Protected files cannot be signed

Description

A PDF that's already password protected, or one that's created with unipdf's Encrypt() can no longer be digitally signed using the Appender{}. It throws the error page 1 not found.

Expected Behavior

It should be possible to open a PDF, decrypt it using Decrypt(), and use an Appender{} to apply a digital signature.

NewPdfAppender() makes a copy of an incoming reader https://github.com/unidoc/unipdf/blob/47ae7e277ef501ca413572ab9d42fada3f87c19f/model/appender.go#L136. It seems though that at this point, the decryption is lost. I hardcoded a .Decrypt() right after the copy is made and the error vanished, but the output lost all PDF contents and produced a blank page. This could potentially be due to the object comparison that happens in the update methods in the Appender.
Unable to extract and add again a font to a new pdf document

Hello all, I am trying to extract certain text from a PDF page and write it back to a new PDF document. I am experimenting with the v3 branch, which has a new API to extract vectorized text from the page. To return all the text marks, which are private in the current version of the API, I created a new convenience struct that is returned by a new getter on the PageText struct.

The text extraction works well. The problem is that I am unable to give a new paragraph element, created while iterating over the returned marks, the same font as the original text.

I also tried to add the font before in the page and then to the paragraph, but I have only errors, such as:

[DEBUG] simple.go:56 ERROR: NewSimpleTextEncoder. Unknown encoding "default"
[DEBUG] simple.go:56 ERROR: NewSimpleTextEncoder. Unknown encoding "custom"
error is: unsupported font encoding

or, with other PDF files:

[DEBUG] ttfparser.go:527 parseCmapVersion: format=0 length=262 language=0
[DEBUG] ttfparser.go:732 No PostScript name information is provided for the font.

This is the test code I am using: Gist. You can pull the library with changes here: Repo Link.

I attach the PDF files I'm testing on.

Thank you in advance!

newspaper.pdf
[FEATURE] Ability to set opacity on styledParagraph

We have a use case where we need to write text over existing text, but the new text must not be fully opaque. Is there currently any way, which I may have missed in the docs, to set the opacity of a StyledParagraph, similar to how you can set the opacity of a drawn line?

Golang wrapper for Exiftool : extract as much metadata as possible (EXIF, ...) from files (pictures, pdf, office documents, ...)

go-exiftool go-exiftool is a golang library that wraps ExifTool. ExifTool's purpose is to extract as much metadata as possible (EXIF, IPTC, XMP, GPS,

Dec 28, 2022

A Docker-powered stateless API for PDF files.

Gotenberg provides a developer-friendly API to interact with powerful tools like Chromium and LibreOffice to convert many documents (HTML, Markdown, Word, Excel, etc.) to PDF, transform them, merge them, and more!

Dec 30, 2022

A simple library for generating PDF written in Go lang

gopdf gopdf is a simple library for generating PDF document written in Go lang. Features Unicode subfont embedding. (Chinese, Japanese, Korean, etc.)

Jan 3, 2023

Convert document to pdf with golang

Convert document to pdf Build docker: docker build --pull --rm -f "Dockerfile" -t convertdocument:latest "." docker run -p 3000:3000 registry.gitlab.

Nov 29, 2021

SeaweedFS is a distributed storage system for blobs, objects, files, and data warehouse, to store and serve billions of files fast! Blob store has O(1) disk seek, local tiering, cloud tiering. Filer supports cross-cluster active-active replication, Kubernetes, POSIX, S3 API, encryption, Erasure Coding for warm storage, FUSE mount, Hadoop, WebDAV.

SeaweedFS Sponsor SeaweedFS via Patreon SeaweedFS is an independent Apache-licensed open source project with its ongoing development made possible ent

Jan 9, 2023

A PDF document generator with high level support for text, drawing and images

GoFPDF document generator Package go-pdf/fpdf implements a PDF document generator with high level support for text, drawing and images. Features UTF-8

Jan 4, 2023

Read data from rss, convert in pdf and send to kindle. Amazon automatically convert them in azw3.

Kindle-RSS-PDF-AZW3 The Kindle RSS PDF AZW3 is a personal project. I received a Kindle for Christmas, a

Jan 10, 2022

GoCsv is a library written in pure Go to use csv data more comfortable

GoCsv GoCsv is a library written in pure Go to use csv data more comfortable Supported Go version golang >= 1.13 Installation go get github.com/shr004

Nov 1, 2022

Processing large file - go

not_yet_hit_the_wall Processing large file - go After reading Marcel Lanz's tweet (seems somebody liked it, and it was shown in my twitter's home), an

Nov 18, 2021

app-services-go-linter plugin analyze source tree of Go files and validates the availability of i18n strings in *.toml files

app-services-go-linter app-services-go-linter plugin analyze source tree of Go files and validates the availability of i18n strings in *.toml files. A

Nov 29, 2021

Related tags

A PDF processor written in Go.

PDF tools for reMarkable tablets

A command line tool for mainly exporting logbook records from Google Spreadsheet to PDF file in EASA format

PDF file parser

create PDF from ASCII File for Cable labels

Ghostinthepdf - This is a small tool that helps to embed a PostScript file into a PDF

Go-wk - PDF Generation API with wkhtmltopdf

Newser is a simple utility to generate a pdf with you favorite news articles

PDF Annotator of Nightmares 🎃