Golang PDF library for creating and processing PDF files (pure go)

UniPDF - PDF for Go

UniDoc UniPDF is a PDF library for Go (golang) for creating, reading, and processing PDF files. The library is written and supported by FoxyUtils.com, where it powers many of the site's services.


Features

Multiple examples are provided in our example repository https://github.com/unidoc/unipdf-examples.

Contact us if you need any specific examples.

Installation

With modules:

go get github.com/unidoc/unipdf/v3

How can I convince myself and my boss to buy unipdf rather than using a free alternative?

The choice is yours. There are multiple respectable efforts out there that can do many good things.

At UniDoc, we work hard to provide production-quality builds, taking every detail into consideration and providing excellent support to our customers. See our testimonials for examples.

Security. We take security very seriously. Access to the github.com/unidoc/unipdf repository is restricted with protected branches, only the founders have access, and every commit is reviewed before being accepted.

The profits are invested back into making unipdf better. We want to make the best possible product, and to do that we need the best people to contribute. A large fraction of the profits goes back into developing unipdf. That way we have been able to get many excellent people to work on and contribute to unipdf who would not otherwise be able to contribute their work for free.

Contributing


All contributors must sign a contributor license agreement before their code will be reviewed and merged.

Support and consulting

Please email us at [email protected] for any queries.

If you have any specific tasks that need to be done, we offer consulting in certain cases. Please contact us with a brief summary of what you need and we will get back to you with a quote, if appropriate.

Licensing Information

This software package (unipdf) is a commercial product and requires a license code to operate.

The use of this software package is governed by the end-user license agreement (EULA) available at: https://unidoc.io/eula/

To obtain a Trial license code to evaluate the software, please visit https://unidoc.io/

Owner
UniDoc
PDF and Office (docx, xlsx, pptx) libraries for Golang
Comments
  • Finding bounding boxes of substrings of extracted text.


    This PR combines the TextComponent idea from https://github.com/unidoc/unidoc/compare/v3...gunnsth:giggiu16-text-extraction-origtext with a mechanism for mapping substrings in extracted text positions described in the comments below.

    Addresses parts of #35.

    // TextMark maps extracted text to the location of the text on the PDF page and other
    // properties of the rendered text such as the font.
    //   Offset is the offset of the start of the textMark.text in extracted text.
    //   BBox is the bounding box of the textMark.
    //   Text is the extracted text.
    //   Meta is set true for characters that don't appear in the input PDF.
    // You can find the location of substrings in the extracted text as follows:
    //   Use ToTextLocation() to return the extracted text as a []TextMark sorted by Offset.
    //   substring := extracted[start:end+1]
    //   Binary search the []TextMark for start and end.
    //   The bounding box of substring on the PDF page is the union of the TextMark.BBox's with
    //   start <= TextMark.Offset < end
    //
    // getBBox() in test_text.go shows how to compute bounding boxes of substrings of extracted text.
    // The following code extracts the text on PDF page page into text then finds the bounding box
    // bbox of substring term in text. (indexRunes() works like strings.Index except that it
    // returns number of runes rather than number of bytes.)
    //
    //     ex, _ := New(page)
    //     // handle errors
    //     pageText, _, _, err := ex.ExtractPageText()
    //     // handle errors
    //     text := pageText.Text()
    //     textMarks := pageText.Marks()
    //
    //     start := indexRunes(text, term)
    //     end := start + len([]rune(term))
    //     spanMarks, err := textMarks.RangeOffset(start, end)
    //     // handle errors
    //     bbox, ok := spanMarks.BBox()
    //     // handle errors
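The lookup described in the comment above can be sketched without unipdf. Everything in this block is a hypothetical stand-in written for illustration: the `bbox` and `textMark` types and the helpers `indexRunes`, `rangeOffset`, `union`, and `makeMarks` are not the library's actual types or functions. The sketch assumes one mark per rune, sorted by rune offset.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"strings"
	"unicode/utf8"
)

// bbox is a hypothetical rectangle given by its lower-left and upper-right corners.
type bbox struct{ Llx, Lly, Urx, Ury float64 }

// textMark is a hypothetical stand-in for the TextMark described above.
type textMark struct {
	Offset int // rune offset of the mark in the extracted text
	BBox   bbox
}

// indexRunes works like strings.Index but returns a rune offset (or -1).
func indexRunes(text, term string) int {
	i := strings.Index(text, term)
	if i < 0 {
		return -1
	}
	return utf8.RuneCountInString(text[:i])
}

// rangeOffset returns the marks with start <= Offset < end.
// It assumes marks is sorted by Offset and start <= end.
func rangeOffset(marks []textMark, start, end int) []textMark {
	lo := sort.Search(len(marks), func(i int) bool { return marks[i].Offset >= start })
	hi := sort.Search(len(marks), func(i int) bool { return marks[i].Offset >= end })
	return marks[lo:hi]
}

// union returns the smallest box containing every mark's box.
// The bool result is false when there are no marks.
func union(marks []textMark) (bbox, bool) {
	if len(marks) == 0 {
		return bbox{}, false
	}
	b := marks[0].BBox
	for _, m := range marks[1:] {
		b.Llx = math.Min(b.Llx, m.BBox.Llx)
		b.Lly = math.Min(b.Lly, m.BBox.Lly)
		b.Urx = math.Max(b.Urx, m.BBox.Urx)
		b.Ury = math.Max(b.Ury, m.BBox.Ury)
	}
	return b, true
}

// makeMarks fabricates one mark per rune, each 10 units wide and 12 high,
// purely so the example has data to search.
func makeMarks(text string) []textMark {
	var marks []textMark
	n := 0
	for range text {
		x := float64(10 * n)
		marks = append(marks, textMark{Offset: n, BBox: bbox{x, 0, x + 10, 12}})
		n++
	}
	return marks
}

func main() {
	text, term := "naïve café", "café"
	marks := makeMarks(text)
	start := indexRunes(text, term)
	end := start + utf8.RuneCountInString(term)
	b, ok := union(rangeOffset(marks, start, end))
	fmt.Println(start, end, b, ok)
}
```

Counting runes rather than bytes matters here because "ï" occupies two bytes in UTF-8, so a byte index would point one mark too far.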



  • [BUG] Issue with Go 1.16


    Description

    As I am on a MacBook M1, I have Go 1.16 installed, and I can't get unipdf to compile with Go 1.16 (it works with 1.15 on another Mac).

    Actual Behavior

    Steps to reproduce the behavior:

    1. Install Go 1.16 beta
    2. Copy one of unipdf examples
    3. go run main.go
    4. See error:
    github.com/unidoc/unipdf/v3/model
    <autogenerated>:1: internal compiler error: child dcl collision on symbol _febf within "".(*PdfAppender).replaceObject"
    
  • Text extraction code for columns.


    This is a major update to the text extraction code that works with text arranged in columns.

    • extractor/text.go is now split across multiple text_*.go files.
    • the new design is summarised in the extractor README.

    Here are new PDFs and text extraction references files for extractor/text_test.go.

    You can also run pdf_extract_text.go to see the extraction. There is an updated version of this test here that makes it easier to test a corpus of PDFs.



  • Improve table rendering speed


    I use unipdf to generate a PDF file containing a table with 10 columns and 652 rows of Chinese characters. The generated table is 42 pages long, and the entire process takes 33.76 seconds. The font used is microsoft.ttf. How can I optimize this to improve rendering speed?

  • Action support


    Added actions (see Section 12.6, p. 412). Added filespec (see Section 7.11.3, p. 102).

    Return was added in the creator so you can add annotations to a newly created page.



  • [BUG] 4002 error in Adobe Appearance Integrity Report for digital signature


    Description

    When we create any digital signature on a document, the Adobe Appearance Integrity Report always shows a 4002 error for the digital signature.

    Expected Behavior

    A digital signature can be added without any errors in the Adobe Appearance Integrity Report.

    Actual Behavior

    Steps to reproduce the behavior:

    1. Add any digital signature
    2. Open the signature tab and click "Click to view this version"
    3. Click the view report button
    4. See the error
  • Render unicode font characters


    Description

    I'm trying to render a glyph from the NotoEmoji.ttf font (freely available from Google). Attachment: NotoEmoji-Regular.ttf.zip

    If I pass a string with a code point such as \u3299 (or \U00003299), it renders successfully. However, a code point that needs extra bytes, such as \U0001F60E, fails to render.
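The difference between the two escapes can be checked with the Go standard library. This is a minimal sketch that only illustrates encoding sizes: U+3299 lies inside the Basic Multilingual Plane, while U+1F60E lies outside it. It does not show how unipdf maps code points to glyphs, and `describe` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"unicode/utf16"
	"unicode/utf8"
)

// describe reports how many UTF-8 bytes and UTF-16 code units a rune needs.
func describe(r rune) (utf8Bytes, utf16Units int) {
	return utf8.RuneLen(r), len(utf16.Encode([]rune{r}))
}

func main() {
	for _, r := range []rune{'\u3299', '\U0001F60E'} {
		b, u := describe(r)
		fmt.Printf("U+%04X: %d UTF-8 bytes, %d UTF-16 code units\n", r, b, u)
	}
}
```

U+3299 needs three UTF-8 bytes and a single UTF-16 code unit, whereas U+1F60E needs four UTF-8 bytes and a surrogate pair (two UTF-16 code units). Any encoder that assumes one 16-bit unit per character would therefore handle the first and not the second.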

    Expected Behavior

    All available glyphs in the font to render correctly based on the unicode value.

    Actual Behavior

    No char is rendered

    Attachments

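    font, err := model.NewCompositePdfFontFromTTFFile("./NotoEmoji-Regular.ttf")
    if err != nil {
        log.Fatal(err)
    }
    p := c.NewStyledParagraph() // c is the creator used to build the document
    t := p.Append("\U0001F60E") // code point outside the Basic Multilingual Plane
    t.Style.Font = font
    c.Draw(p)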
    

    P.S. I'm currently evaluating unipdf before buying a license (including unioffice).

  • [BUG] UniPdf/invoice Long textblock is cut off on a page instead of being split across pages


    Description

    Given a large contiguous text block (no line breaks/carriage returns), sections such as invoice.SetTerms or the base AddSection do not automatically break content across pages.

    Expected Behavior

    It would be awesome if this text content was split to new pages when encountering the end of a page or pages.
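The requested behaviour amounts to wrapping the text into lines and then distributing those lines over pages. Below is a minimal sketch of that idea, assuming a fixed number of characters per line and lines per page. The helpers `wrap` and `paginate` are hypothetical and are not how unipdf lays out text (a real layout measures glyph widths and page margins).

```go
package main

import (
	"fmt"
	"strings"
)

// wrap greedily breaks text into lines of at most width runes.
// A single word longer than width is placed on a line of its own.
func wrap(text string, width int) []string {
	var lines []string
	line := ""
	for _, w := range strings.Fields(text) {
		switch {
		case line == "":
			line = w
		case len([]rune(line))+1+len([]rune(w)) <= width:
			line += " " + w
		default:
			lines = append(lines, line)
			line = w
		}
	}
	if line != "" {
		lines = append(lines, line)
	}
	return lines
}

// paginate groups lines into pages of at most perPage lines each.
// It assumes perPage > 0.
func paginate(lines []string, perPage int) [][]string {
	var pages [][]string
	for len(lines) > perPage {
		pages = append(pages, lines[:perPage])
		lines = lines[perPage:]
	}
	if len(lines) > 0 {
		pages = append(pages, lines)
	}
	return pages
}

func main() {
	terms := strings.Repeat("payment is due within thirty days ", 20)
	pages := paginate(wrap(terms, 40), 5)
	for i, p := range pages {
		fmt.Printf("page %d: %d lines\n", i+1, len(p))
	}
}
```

The point of the sketch is that overflow is handled by starting a new page with the remaining lines instead of truncating them.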

    Actual Behavior

    Steps to reproduce the behavior: Using the pdf_invoice_simple.go as a starting point, simply update the content for SetTerms to be a long contiguous block of text.

    Attachments

    Attached is an example pdf and the modified pdf_invoice_simple.go

    pdf_invoice_simple.zip

  • [BUG] Protected files cannot be signed


    Description

    A PDF that's already password protected, or one that's created with unipdf's Encrypt(), can no longer be digitally signed using the Appender{}. It fails with the error "page 1 not found".

    Expected Behavior

    It should be possible to open a PDF, decrypt it using Decrypt(), and use an Appender{} to apply a digital signature.

    NewPdfAppender() makes a copy of an incoming reader https://github.com/unidoc/unipdf/blob/47ae7e277ef501ca413572ab9d42fada3f87c19f/model/appender.go#L136. It seems though that at this point, the decryption is lost. I hardcoded a .Decrypt() right after the copy is made and the error vanished, but the output lost all PDF contents and produced a blank page. This could potentially be due to the object comparison that happens in the update methods in the Appender.

  • Unable to extract and add again a font to a new pdf document


    Hello all, I am trying to extract certain text from a PDF page and write it back to a new PDF document. I am experimenting with the v3 branch, which has a new API to extract vectorized text from the page. In order to return all the text marks (which are private in the current version of the API), I created a new convenience struct that is returned from a new getter on the PageText struct.

    The text extraction works well. The problem is that I am unable to give a new paragraph element, created while iterating over the returned marks, the same font as the original text.

    I also tried adding the font to the page first and then to the paragraph, but I only get errors, such as:

    [DEBUG] simple.go:56 ERROR: NewSimpleTextEncoder. Unknown encoding "default"
    [DEBUG] simple.go:56 ERROR: NewSimpleTextEncoder. Unknown encoding "custom"
    error is: unsupported font encoding

    or, with other PDF files:

    [DEBUG] ttfparser.go:527 parseCmapVersion: format=0 length=262 language=0
    [DEBUG] ttfparser.go:732 No PostScript name information is provided for the font.

    This is the test code I am using: Gist You can pull the library with changes here: Repo Link

    I attach the PDF files I'm testing on.

    Thank you in advance! newspaper.pdf

  • [FEATURE] Ability to set opacity on styledParagraph


    We have a use case where we need to write text over existing text, but the new text must not be fully opaque. Is there currently a way, which I may have missed in the docs, to set opacity on a StyledParagraph, similar to how you can set opacity on a drawn line?

Golang wrapper for Exiftool : extract as much metadata as possible (EXIF, ...) from files (pictures, pdf, office documents, ...)

go-exiftool go-exiftool is a golang library that wraps ExifTool. ExifTool's purpose is to extract as much metadata as possible (EXIF, IPTC, XMP, GPS,

Dec 28, 2022
A Docker-powered stateless API for PDF files.

Gotenberg provides a developer-friendly API to interact with powerful tools like Chromium and LibreOffice to convert many documents (HTML, Markdown, Word, Excel, etc.) to PDF, transform them, merge them, and more!

Dec 30, 2022
A simple library for generating PDF written in Go lang

gopdf gopdf is a simple library for generating PDF document written in Go lang. Features Unicode subfont embedding. (Chinese, Japanese, Korean, etc.)

Jan 3, 2023
Convert document to pdf with golang

Convert document to pdf Build docker: docker build --pull --rm -f "Dockerfile" -t convertdocument:latest "." docker run -p 3000:3000 registry.gitlab.

Nov 29, 2021
A PDF document generator with high level support for text, drawing and images

GoFPDF document generator Package go-pdf/fpdf implements a PDF document generator with high level support for text, drawing and images. Features UTF-8

Jan 4, 2023
Read data from rss, convert in pdf and send to kindle. Amazon automatically convert them in azw3.

Kindle-RSS-PDF-AZW3 The Kindle RSS PDF AZW3 is a personal project. The Kindle RSS PDF AZW3 is a personal project. I received a Kindle for Christmas, a

Jan 10, 2022
GoCsv is a library written in pure Go to use csv data more comfortable

GoCsv GoCsv is a library written in pure Go to use csv data more comfortable Supported Go version golang >= 1.13 Installation go get github.com/shr004

Nov 1, 2022
Processing large file - go

not_yet_hit_the_wall Processing large file - go After reading Marcel Lanz's tweet (seems somebody liked it, and it was shown in my twitter's home), an

Nov 18, 2021
app-services-go-linter plugin analyze source tree of Go files and validates the availability of i18n strings in *.toml files

app-services-go-linter app-services-go-linter plugin analyze source tree of Go files and validates the availability of i18n strings in *.toml files. A

Nov 29, 2021
A PDF processor written in Go.

pdfcpu: a Go PDF processor pdfcpu is a PDF processing library written in Go supporting encryption. It provides both an API and a CLI. Supported are al

Jan 8, 2023
PDF tools for reMarkable tablets

rm-pdf-tools - PDF tools for reMarkable Disclaimer: rm-pdf-tools is currently in a very early version, bugs are to be expected. Furthermore, the inten

Oct 14, 2022
A command line tool for mainly exporting logbook records from Google Spreadsheet to PDF file in EASA format

Logbook CLI This is a command line tool for mainly exporting logbook records from Google Spreadsheet to PDF file in EASA format. It also supports rend

Feb 6, 2022
PDF file parser

#pdf A pdf document parsing and modifying library The libary provides functions to parse and show elements in PDF documents. It checks the validity

Nov 7, 2021
create PDF from ASCII File for Cable labels

CableLable create PDF from ASCII File for Cable labels file format is one label per line, a line containing up to 3 words, each word is a line on the

Nov 8, 2021
Ghostinthepdf - This is a small tool that helps to embed a PostScript file into a PDF

This is a small tool that helps to embed a PostScript file into a PDF in a way that GhostScript will run the PostScript code during the

Dec 20, 2022
Go-wk - PDF Generation API with wkhtmltopdf

Simple PDF Generation API with wkhtmltopdf Quick start Clone the repo locally an

Jan 25, 2022
Newser is a simple utility to generate a pdf with you favorite news articles

Newser A simple utility to crawl some news sites or other resources and download content into a pdf Building Make sure you have config.yaml setup and

Nov 9, 2022
PDF Annotator of Nightmares 🎃

PDFrankenstein is a GUI tool that intends to fill the gap on Linux where a good capable PDF annotator like Adobe Acrobat does not exist. What can you

Dec 8, 2022