A PDF renderer for the goldmark markdown parser.

Stephen Afam-Osemene

Last update: Jan 7, 2023

goldmark-pdf

goldmark-pdf is a renderer for goldmark that allows rendering to PDF.

Reference

See https://pkg.go.dev/github.com/stephenafamo/goldmark-pdf

Usage

Care has been taken to match the semantics of goldmark and its extensions.

The PDF renderer is created with pdf.New(). The returned value satisfies goldmark's renderer.Renderer interface, so it can be passed to goldmark.New() using the goldmark.WithRenderer() option.

markdown := goldmark.New(
    goldmark.WithRenderer(pdf.New()),
)

Options can also be passed to pdf.New(). Each option must satisfy this interface:

// An Option interface is a functional option type for the Renderer.
type Option interface {
	SetConfig(*Config)
}

Here is the Config struct that the options modify:

type Config struct {
	Context context.Context

	PDF PDF

	// A source for images
	ImageFS fs.FS

	// All other options have sensible defaults
	Styles Styles

	// A cache for the fonts
	FontsCache fonts.Cache

	// For debugging
	TraceWriter io.Writer

	NodeRenderers util.PrioritizedSlice
}

Some helper functions for adding options are already provided; see option.go.

An example with some more options:

goldmark.New(
    goldmark.WithRenderer(
        pdf.New(
            pdf.WithTraceWriter(os.Stdout),
            pdf.WithContext(context.Background()),
            pdf.WithImageFS(os.DirFS(".")),
            pdf.WithLinkColor("cc4578"),
            pdf.WithHeadingFont(pdf.GetTextFont("IBM Plex Serif", pdf.FontLora)),
            pdf.WithBodyFont(pdf.GetTextFont("Open Sans", pdf.FontRoboto)),
            pdf.WithCodeFont(pdf.GetCodeFont("Inconsolata", pdf.FontRobotoMono)),
        ),
    ),
)

Fonts

The fonts that can be used in the PDF are based on the Font struct:

// Represents a font.
type Font struct {
	CanUseForText bool
	CanUseForCode bool

	Category string
	Family   string

	FileRegular    string
	FileItalic     string
	FileBold       string
	FileBoldItalic string

	Type fontType
}

To be used for text, a font should have regular, italic, bold and bold-italic styles. Each of these has to be loaded separately.

To ease this process, variables have been generated for all the Google fonts that have these styles. For example:

var FontRoboto = Font{
	CanUseForCode:  false,
	CanUseForText:  true,
	Category:       "sans-serif",
	Family:         "Roboto",
	FileBold:       "700",
	FileBoldItalic: "700italic",
	FileItalic:     "italic",
	FileRegular:    "regular",
	Type:           fontTypeGoogle,
}

For code blocks, if any style other than regular is missing, the regular font is used in its place.

var FontMajorMonoDisplay = Font{
	CanUseForCode:  true,
	CanUseForText:  false,
	Category:       "monospace",
	Family:         "Major Mono Display",
	FileBold:       "regular",
	FileBoldItalic: "regular",
	FileItalic:     "regular",
	FileRegular:    "regular",
	Type:           fontTypeGoogle,
}
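
The fallback rule for code fonts can be sketched as follows. This is not the generator's actual code: Font is a trimmed-down stand-in for the struct above, and withRegularFallback is a hypothetical helper that only demonstrates the rule "a missing style falls back to regular".

```go
package main

import "fmt"

// Font is a trimmed-down stand-in for the Font struct shown above.
type Font struct {
	Family         string
	FileRegular    string
	FileItalic     string
	FileBold       string
	FileBoldItalic string
}

// withRegularFallback is a hypothetical helper. It returns a copy of f in
// which every empty style points at the regular file.
func withRegularFallback(f Font) Font {
	for _, style := range []*string{&f.FileItalic, &f.FileBold, &f.FileBoldItalic} {
		if *style == "" {
			*style = f.FileRegular
		}
	}
	return f
}

func main() {
	f := withRegularFallback(Font{Family: "Major Mono Display", FileRegular: "regular"})
	fmt.Println(f.FileItalic, f.FileBold, f.FileBoldItalic)
}
```

Styles that are already set are left untouched, so a font with a real bold file keeps it.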

When loading the fonts, they are downloaded on the fly using the fonts package.

If you'd like to use a font outside of these, pass your own Font struct for a font that has already been loaded into the PDF object you set in the Config. Be sure to set the FontType to FontTypeCustom so that the renderer does not attempt to download it.

Contributing

Here's a list of things that I'd love help with:

More documentation
Testing
Finish the (currently buggy) implementation based on gopdf

License

MIT

Author

Stephen Afam-Osemene

https://github.com/stephenafamo/goldmark-pdf

Comments

Lack of image mime detection

The image renderer:

func (r *nodeRederFuncs) renderImage(w *Writer, source []byte, node ast.Node, entering bool) (ast.WalkStatus, error) {
	// while this has entering and leaving states, it doesn't appear
	// to be useful except for other markup languages to close the tag
	n := node.(*ast.Image)

	if entering {
		w.LogDebug("Image (entering)", fmt.Sprintf("Destination[%v] Title[%v]", string(n.Destination), string(n.Title)))
		// following changes suggested by @sirnewton01, issue #6
		// does file exist?
		imgPath := string(n.Destination) // <-- the path comes straight from the markdown destination
		imgFile, err := w.ImageFS.Open(imgPath)
		if err == nil {
			defer imgFile.Close()

			width, _ := w.Pdf.GetPageSize()
			mleft, _, mright, _ := w.Pdf.GetMargins()
			maxw := width - (mleft * 2) - (mright * 2)

			format := strings.ToUpper(strings.Trim(filepath.Ext(imgPath), ".")) // <-- the format is derived from the file extension only
			w.Pdf.RegisterImage(imgPath, format, imgFile)
			w.Pdf.UseImage(imgPath, (mleft * 2), w.Pdf.GetY(), maxw, 0)
		} else {
			log.Printf("IMAGE ERROR: %v", err)
			w.LogDebug("Image (file error)", err.Error())
		}
	} else {
		w.LogDebug("Image (leaving)", "")
	}

	return ast.WalkContinue, nil
}

relies on the path to determine the MIME type of the file. If I use an HTTP-backed file system to embed images that are linked via HTTP rather than stored locally, this fails unless the URL ends with a file extension that identifies the format, which is rarely the case.

Hence, there should be a built-in HTTP file system, and the MIME type should be determined from the file contents using github.com/gabriel-vasile/mimetype or a similar library.

My quickly built fs is:


type HttpFs struct{}

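func (f *HttpFs) Open(name string) (fs.File, error) {
	res, err := http.Get(name)
	if err != nil {
		return nil, err
	}
	// Treat non-2xx responses as missing files so error pages are not embedded as images.
	if res.StatusCode < 200 || res.StatusCode > 299 {
		res.Body.Close()
		return nil, &fs.PathError{Op: "open", Path: name, Err: fs.ErrNotExist}
	}
	return &HttpFile{r: res}, nil
}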

type HttpFile struct {
	r *http.Response
}

func (f *HttpFile) Stat() (fs.FileInfo, error) {
	return &HttpInfo{r: f.r}, nil
}

func (f *HttpFile) Read(p []byte) (int, error) {
	return f.r.Body.Read(p)
}

func (f *HttpFile) Close() error {
	return f.r.Body.Close()
}

type HttpInfo struct {
	r *http.Response
}

func (i *HttpInfo) Name() string {
	fn := strings.TrimPrefix(i.r.Request.URL.Path, "/")
	if fn == "" {
		if _, params, err := mime.ParseMediaType(i.r.Header.Get("Content-Disposition")); err == nil {
			fn = params["filename"]
		}
	}
	if filepath.Ext(fn) == "" {
		mt, _, _ := mime.ParseMediaType(i.r.Header.Get("Content-Type"))
		if spl := strings.Split(mt, "/"); len(spl) > 0 {
			if fn == "" {
				fn = spl[0]
			}
			fn += "." + spl[len(spl)-1]
		}
	}
	return filepath.Base(fn)
}

func (i *HttpInfo) Size() int64 {
	return i.r.ContentLength
}

func (i *HttpInfo) Mode() fs.FileMode {
	return fs.ModeIrregular
}

func (i *HttpInfo) ModTime() time.Time {
	if t, err := time.Parse(time.RFC1123, i.r.Header.Get("Last-Modified")); err == nil {
		return t
	}
	return time.Time{}
}

func (i *HttpInfo) IsDir() bool {
	return false
}

func (i *HttpInfo) Sys() any {
	return i.r
}
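
As a rough illustration of content-based detection, the sketch below uses net/http's DetectContentType from the standard library in place of the mimetype library suggested above. imageFormat is a hypothetical helper, and the returned names ("PNG", "JPG", "GIF") are an assumption about what the renderer expects, modeled on the upper-cased file extension it uses today.

```go
package main

import (
	"fmt"
	"net/http"
)

// imageFormat is a hypothetical helper. It sniffs the leading bytes of an
// image and returns an upper-case format name, or "" when the content is
// not a recognized PNG, JPEG or GIF.
func imageFormat(head []byte) string {
	switch http.DetectContentType(head) {
	case "image/png":
		return "PNG"
	case "image/jpeg":
		return "JPG"
	case "image/gif":
		return "GIF"
	default:
		return ""
	}
}

func main() {
	pngHeader := []byte("\x89PNG\r\n\x1a\n")
	fmt.Println(imageFormat(pngHeader))
}
```

DetectContentType inspects at most the first 512 bytes, so only the start of the response body has to be buffered before the image is registered.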

Tables do not render correctly

If a cell's contents are longer than the column's heading, the next column overwrites the end of the previous one, because column widths are determined from the values in the header row. Column contents do not wrap (at least not without changing the row height, which I have not tested).

This makes the table functionality not very useful except for the simplest of tables, which is a shame since otherwise the generated PDF looks very good.
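
One possible direction, sketched under simplifying assumptions, is to size each column by its widest cell rather than by the header alone. columnWidths is a hypothetical helper that counts runes; a real fix would measure text with the PDF font metrics and would still need a wrapping strategy for tables wider than the page.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// columnWidths is a hypothetical helper. It returns, for each column, the
// length in runes of the widest cell across all rows, header included.
func columnWidths(rows [][]string) []int {
	var widths []int
	for _, row := range rows {
		for i, cell := range row {
			if i >= len(widths) {
				widths = append(widths, 0)
			}
			if n := utf8.RuneCountInString(cell); n > widths[i] {
				widths[i] = n
			}
		}
	}
	return widths
}

func main() {
	rows := [][]string{
		{"ID", "Name"},
		{"1", "A much longer cell"},
	}
	fmt.Println(columnWidths(rows))
}
```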

Add option to recode embedded images when they are too large

When I link a big image (say 4MB), it is embedded into the PDF file as is, making the file that much larger.

I think there should be an option to convert images above a configured size into JPEG and embed the re-encoded images instead of the originals. This can save a lot of disk space when the assets' quality is not of utmost importance (which is usually the case; very few people embed high-resolution images into PDF documents).

With the FS branch and its new file system, the file size is available via Stat() and MIME detection is present as well. Both can be used to invoke code that looks something like this:


import (
	"errors"
	"github.com/disintegration/imaging"
	"golang.org/x/image/webp"
	"image"
	"image/gif"
	"image/jpeg"
	"image/png"
	"io"
)

func Resize(srcMime string, src io.Reader, dst io.Writer, documentWidth int) error {
	var img image.Image
	var err error

	switch srcMime {
	case "image/jpeg":
		img, err = jpeg.Decode(src)
	case "image/gif":
		img, err = gif.Decode(src)
	case "image/png":
		img, err = png.Decode(src)
	case "image/webp":
		img, err = webp.Decode(src)
	default:
		return errors.New("unsupported input mime type: " + srcMime)
	}

	if err != nil {
		return err
	}

	if img.Bounds().Dx() < documentWidth {
		documentWidth = img.Bounds().Dx()
	}

	filter := imaging.Lanczos // best quality but slow
	img = imaging.Resize(img, documentWidth, 0, filter)
	return jpeg.Encode(dst, img, nil) // jpeg has the best compression with sufficient quality for this purpose
}

Local links/navigation is not working

Having local navigation, like:

# <a name="top"></a>Markdown Test Page

* [Headings](#Headings)
* [Paragraphs](#Paragraphs)
* [Blockquotes](#Blockquotes)
* [Lists](#Lists)
* [Horizontal rule](#Horizontal)
* [Table](#Table)
* [Code](#Code)
* [Inline elements](#Inline)

***

# <a name="Headings"></a>Headings

# Heading one

Sint sit cillum pariatur eiusmod nulla pariatur ipsum. Sit laborum anim qui mollit tempor pariatur nisi minim dolor. Aliquip et adipisicing sit sit fugiat commodo id sunt. Nostrud enim ad commodo incididunt cupidatat in ullamco ullamco Lorem cupidatat velit enim et Lorem. Ut laborum cillum laboris fugiat culpa sint irure do reprehenderit culpa occaecat. Exercitation esse mollit tempor magna aliqua in occaecat aliquip veniam reprehenderit nisi dolor in laboris dolore velit.

## Heading two

[[Top]](#top)

will render links but they won't work.

Image ALT is ignored

When an image fails to load, its alt text is ignored:

![The San Juan Mountains are beautiful!](san-juan-mountains.jpggg "San Juan Mountains Alt Text")

Image caption is not centered

An image's caption is rendered aligned to the left of the document instead of centered with the image.

For example: ![The San Juan Mountains are beautiful!](/assets/images/san-juan-mountains.jpg "San Juan Mountains")

will end up with a left-aligned caption.

Small images are stretched for the entire width of the document

I have a test document which contains a small image of 176px by 176px. The image is scaled to the entire width of the document, which does not look good and is most likely not desired: it would not be rendered that way in an HTML document unless specifically configured with CSS.

The code responsible for this behavior is: https://github.com/stephenafamo/goldmark-pdf/blob/master/renderer_funcs.go#L581

I think that an image narrower than the document should be kept as is, and only a wider image should be scaled down to fit the document.
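
The proposed rule amounts to taking the smaller of the two widths. displayWidth below is a hypothetical helper, not code from the renderer, and it assumes both arguments have already been converted to the same unit (for example, pixels converted to the PDF's unit).

```go
package main

import "fmt"

// displayWidth is a hypothetical helper. It returns the width at which an
// image should be drawn: its natural width when it fits, otherwise the
// available width. Both values must use the same unit.
func displayWidth(imageWidth, availableWidth float64) float64 {
	if imageWidth < availableWidth {
		return imageWidth
	}
	return availableWidth
}

func main() {
	fmt.Println(displayWidth(176, 500))  // small image keeps its size
	fmt.Println(displayWidth(1200, 500)) // wide image is scaled down
}
```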
