Pure Go library for creating and processing Office Word (.docx), Excel (.xlsx) and PowerPoint (.pptx) documents

unioffice is a library for creation of Office Open XML documents (.docx, .xlsx and .pptx). Its goal is to be the most compatible and highest performance Go library for creation and editing of docx/xlsx/pptx files.

https://github.com/unidoc/unioffice/

Status

  • Documents (docx) [Word]
    • Read/Write/Edit
    • Formatting
    • Images
    • Tables
  • Spreadsheets (xlsx) [Excel]
    • Read/Write/Edit
    • Cell formatting including conditional formatting
    • Cell validation (drop down combobox, rules, etc.)
    • Retrieve cell values as formatted by Excel (e.g. retrieve a date or number as displayed in Excel)
    • Formula Evaluation (100+ functions supported currently, more will be added as required)
    • Embedded Images
    • All chart types
  • PowerPoint (pptx) [PowerPoint]
    • Creation from templates
    • Textboxes/shapes

Performance

There has been a great deal of interest in performance numbers for spreadsheet creation/reading lately, so here are unioffice numbers for this benchmark which creates a sheet with 30k rows, each with 100 columns.

creating 30000 rows * 100 cells took 3.92506863s
saving took 89ns
reading took 9.522383048s

Creation is fairly fast, saving is very quick due to no reflection usage, and reading is a bit slower. The downside is that the binary is large (33MB) as it contains generated structs, serialization and deserialization code for all of DOCX/XLSX/PPTX.

Installation

go get github.com/unidoc/unioffice/

Document Examples

Spreadsheet Examples

Presentation Examples

Raw Types

The OOXML specification is large and creating a friendly API to cover the entire specification is a very time consuming endeavor. This library attempts to provide an easy to use API for common use cases in creating OOXML documents while allowing users to fall back to raw document manipulation should the library's API not cover a specific use case.

The raw XML based types reside in the schema/ directory. These types are accessible from the wrapper types via a X() method that returns the raw type.

For example, the library currently doesn't have an API for setting a document background color. However, it's easy to do manually by editing the CT_Background element of the document.

doc := document.New()
doc.X().Background = wordprocessingml.NewCT_Background()
doc.X().Background.ColorAttr = &wordprocessingml.ST_HexColor{}
doc.X().Background.ColorAttr.ST_HexColorRGB = color.RGB(50, 50, 50).AsRGBString()

Contribution guidelines

All contributors must sign a contributor license agreement before their code will be reviewed and merged.

Licensing

This software package (unioffice) is a commercial product and requires a license code to operate.

The use of this software package is governed by the end-user license agreement (EULA) available at: https://unidoc.io/eula/

To obtain a Trial license code to evaluate the software, please visit https://unidoc.io/

Owner
UniDoc
PDF and Office (docx, xlsx, pptx) libraries for Golang
Comments
  • Nil pointer when trying to extract image data

    Description

    Hey folks. I am trying to extract raw image data from an MS word document. Here is the code snippet:

    	doc, err := document.Read(reader, reader.Size())
    	if err != nil {
    		return "", nil, fmt.Errorf("document read failure with error: %v", err)
    	}
    	defer doc.Close()
    
    	for _, img := range doc.Images {
    		if img.Data() == nil {
    			ctx.Logger().Warn("received an image with nil data")
    			continue
    		}
    		_, imgResults, err := ScanImage(ctx, clients, n, f, *img.Data())
    		if err != nil {
    			ctx.Logger().Errorf("image scanning failure with error: %v", err)
    		}
    	}
    

    The doc.Images array is successfully populated with the number of images in the document; however, when I call img.Data() I receive a nil pointer.

    Expected Behavior

    img.Data() should return a non-nil pointer

    Actual Behavior

    img.Data() returns a nil pointer

    Please include a reproducible code snippet or document attachment that demonstrates the issue.

  • Some Bugs In Windows

    Description

    Hi, author. First of all, I love this library. My English is not good, so please forgive me if I use the wrong words. I was excited when I looked at your examples, but after running the example code I ran into a problem: when I open the generated file, Windows tells me it can't open the file.

    Expected Behavior

    Actual Behavior

  • Unsupported purl.oclc.org (strict ooxml namespace)

    Description

    I'm hitting a weird issue when updating FldChar's. I need to replace the default text in some form fields. If I mark the fields as dirty or set SetUpdateFieldsOnOpen(true) and then select to update the entire table as prompted when the document opens, saving the result makes it unusable by gooxml. I get the following warnings and doc.X().Body is nil:

    2019/02/19 16:34:16 unsupported relationship type: http://purl.oclc.org/ooxml/officeDocument/relationships/officeDocument tgt: word/document.xml
    2019/02/19 16:34:16 unsupported relationship type: http://purl.oclc.org/ooxml/officeDocument/relationships/extendedProperties tgt: docProps/app.xml
    

    Digging through the raw XML, it doesn't look like an issue with the actual FldChar's that are being altered, it looks like an issue with the document namespacing. For some reason, after updating the form fields, some of the http://schemas.openxmlformats.org/officeDocument/2006/... attributes get replaced with http://purl.oclc.org/ooxml/officeDocument/....

    I don't know enough about the spec to know why these would change. But the resulting document opens just fine with Word. It is only gooxml that has an issue.

    I tried this with and without the presence of a table of contents. When the table of contents isn't present, there is no issue. When the table of contents is present, and you select "update entire table", then you experience the problem after saving the document.

    Expected Behavior

    Handle documents using purl.oclc.org namespacing appropriately.

    Actual Behavior

    Documents with purl.oclc.org namespacing have a nil Body.

    I've attached before and after documents. The actual FldChar changes appear starting on page 11 (a result of re-using code that originally produced the issue). If you need the code making the changes or simplified before/after documents, just let me know.

    after.docx before.docx

  • Support adding/replacing MERGEFIELDs

    I'm trying to understand if/how 'MERGEFIELDS' are supported within gooxml, or if it is the kind of thing I would need to drop into .X() to handle?

    I did see that there are doc.FormFields(), r.AddField(), etc functions, but as best I could tell, these didn't seem to do what I want. I also came across the 'KnownFields', which seems to correlate with this, but couldn't tell if it was associated to some deeper support/code:

    • https://github.com/baliance/gooxml/blob/master/document/knownfields.go

    Essentially, is there a way to create, read, edit/update, etc these elements in a gooxml native way currently? And if not, do you have any suggestions of the best way to interact with them?

    Below is a snippet from a document that uses these fields:

    <w:p w14:paraId="1566BC4D" w14:textId="3B6A9F12" w:rsidR="006D368D" w:rsidRPr="00497636" w:rsidRDefault="000E0283">
            <w:pPr>
                <w:rPr>
                    <w:lang w:val="en-AU"/>
                </w:rPr>
            </w:pPr>
            <w:r>
                <w:rPr>
                    <w:lang w:val="en-AU"/>
                </w:rPr>
                <w:t>Merge Field:</w:t>
            </w:r>
            <w:r w:rsidR="006D368D">
                <w:rPr>
                    <w:lang w:val="en-AU"/>
                </w:rPr>
                <w:t xml:space="preserve">
                </w:t>
            </w:r>
            <w:r w:rsidRPr="00497636">
                <w:rPr>
                    <w:lang w:val="en-AU"/>
                </w:rPr>
                <w:fldChar w:fldCharType="begin"/>
            </w:r>
            <w:r w:rsidRPr="00497636">
                <w:rPr>
                    <w:lang w:val="en-AU"/>
                </w:rPr>
                <w:instrText xml:space="preserve"> MERGEFIELD  $Foo.Bar  \* MERGEFORMAT </w:instrText>
            </w:r>
            <w:r w:rsidRPr="00497636">
                <w:rPr>
                    <w:lang w:val="en-AU"/>
                </w:rPr>
                <w:fldChar w:fldCharType="separate"/>
            </w:r>
            <w:r w:rsidRPr="00497636">
                <w:rPr>
                    <w:lang w:val="en-AU"/>
                </w:rPr>
                <w:t>«$Foo.Bar»</w:t>
            </w:r>
            <w:r w:rsidRPr="00497636">
                <w:rPr>
                    <w:lang w:val="en-AU"/>
                </w:rPr>
                <w:fldChar w:fldCharType="end"/>
            </w:r>
        </w:p>
    

    Refs:

    • https://github.com/baliance/gooxml/blob/master/document/knownfields.go
    • http://officeopenxml.com/WPfields.php
    • http://officeopenxml.com/WPfieldInstructions.php
    • http://officeopenxml.com/WPgeneralFieldSwitches.php
  • recompile the example and cannot open the generate .docx

    Environment:

    • OS: Windows 7 x64, VS Code
    • Go version 1.9

    First, I ran "go get baliance.com/gooxml". After that, I ran "go build baliance.com/gooxml/...", but the compiler failed with the error "The filename or extension is too long." So I renamed "schemas.openxmlformats.org" to "s" and changed every "schemas.openxmlformats.org" in the paths in all files to "s". After that, it compiled successfully.

    I then tested "_examples/document/tables/main.go" with "go run main.go". It appeared to succeed: there was no error and the "tables.docx" file was generated. However, tables.docx cannot be opened. I tested the other examples too; they all generate a .docx file, but none of them can be opened with MS Office.

    Cheers, looking forward to your reply.

  • add the ability to utilize footnotes and endnotes in documents

    Added:

    • Basic CRUD functions to handle both endnotes and footnotes
    • Tester functions in convention of the library (e.g., HasFootnotes or IsFootnote)
    • Tests to cover the added functionality.

  • [Question] Handling excel data

    As far as I can see, there is no way to get the data from an Excel file as a matrix when iterating row by row, because unioffice removes empty columns from a row.

    Is there any way to get all the data of an Excel file, including empty columns?
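
One way to get a rectangular matrix regardless of how a reader reports empty cells is to place each cell by its column reference rather than by its position in the row. The sketch below is plain Go and does not use the unioffice API: `columnIndex` and `denseRow` are hypothetical helper names, and the map of cell references to values stands in for whatever cells the library returns for a row.

```go
package main

import (
	"fmt"
	"strings"
)

// columnIndex converts the letter part of a cell reference such as "A3"
// or "AB12" into a zero-based column index (A -> 0, Z -> 25, AA -> 26).
func columnIndex(ref string) (int, error) {
	idx, letters := 0, 0
	for _, r := range strings.ToUpper(ref) {
		if r < 'A' || r > 'Z' {
			break // the row number starts here
		}
		idx = idx*26 + int(r-'A') + 1
		letters++
	}
	if letters == 0 {
		return 0, fmt.Errorf("no column letters in reference %q", ref)
	}
	return idx - 1, nil
}

// denseRow places sparse cells (reference -> value) into a slice of the
// given width. Columns without a cell keep the empty string.
func denseRow(cells map[string]string, width int) ([]string, error) {
	row := make([]string, width)
	for ref, value := range cells {
		col, err := columnIndex(ref)
		if err != nil {
			return nil, err
		}
		if col >= width {
			return nil, fmt.Errorf("cell %s is outside the %d-column matrix", ref, width)
		}
		row[col] = value
	}
	return row, nil
}

func main() {
	// Hypothetical sparse row: only columns A and D hold a value.
	row, err := denseRow(map[string]string{"A3": "id", "D3": "total"}, 5)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", row) // ["id" "" "" "total" ""]
}
```

Applying `denseRow` to every row with the same width yields a matrix in which empty columns are preserved as empty strings.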

  • [UO-129] Incorrect conversion of doc to pdf

    Description

    Incorrect conversion of doc to pdf

    Expected Behavior

    doc to pdf converted correctly

    Actual Behavior

    	outputPath := fmt.Sprintf("output/%s.pdf", filename)
    	doc, err := document.Open("https://github.com/unidoc/unioffice/files/8096886/liuna.docx")
    	if err != nil {
    		log.Fatalf("error opening document: %s", err)
    	}
    	defer doc.Close()
    	c := convert.ConvertToPdf(doc)
    
    	err = c.WriteToFile(outputPath)
    	if err != nil {
    		log.Fatalf("error converting document: %s", err)
    	}
    

    The conversion result is incorrect. When I tried https://foxyutils.com/wordtopdf/ with the same file, the conversion was correct.

  • Bug in RunProperties.IsBold()

    This is the method code:

    func (r RunProperties) IsBold() bool {
    	return r.x.B != nil
    }
    

    It works in most cases, since a non-bold run's properties look like this:

                B: (*wml.CT_OnOff)(<nil>),
    

    And a bold run's properties look like this:

                    B: (*wml.CT_OnOff)(0xc42000e450)({
                     ValAttr: (*sharedTypes.ST_OnOff)(<nil>)
                    }),
    

    However, if the style uses bold by default, and a run turns it off, then B will look like this:

                    B: (*wml.CT_OnOff)(0xc42000e3a0)({
                     ValAttr: (*sharedTypes.ST_OnOff)(0xc420011d80)(false)
                    })
    

    Therefore, IsBold() would falsely say that the run is bold.

    This seems to map directly from XML where <w:b/> is short for <w:b val="true"/>, and turning bold off requires <w:b val="false"/>.

    Actually I can't say what the proper behaviour should be, because a boolean is not enough to express the three possible states:

    • No change to boldness
    • Turn on boldness
    • Turn off boldness

    Perhaps two methods are needed: IsBold() and BoldSet():

    func (r RunProperties) IsBold() bool {
    	if r.x.B != nil {
    		if r.x.B.ValAttr != nil && r.x.B.ValAttr.Bool != nil && *r.x.B.ValAttr.Bool == false {
    			return false
    		} else {
    			return true
    		}
    	} else {
    		return false
    	}
    }
    func (r RunProperties) BoldSet() bool {
    	return r.x.B != nil
    }
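
The three states listed above can also be modeled explicitly instead of with two boolean methods. This is a minimal standalone sketch: `onOff` is a simplified stand-in for the library's CT_OnOff type (its `Val` field plays the role of ValAttr), and `boldState` and `boldStateOf` are hypothetical names that are not part of the library.

```go
package main

import "fmt"

// onOff is a simplified stand-in for an on/off element such as <w:b>.
// A nil *onOff means the element is absent; a nil Val means the element
// is present without a value, which counts as "on".
type onOff struct {
	Val *bool
}

// boldState distinguishes the three cases a single bool cannot express.
type boldState int

const (
	boldInherit boldState = iota // element absent: no change to boldness
	boldOn                       // element present and not set to false
	boldOff                      // element present and explicitly false
)

func (s boldState) String() string {
	switch s {
	case boldOn:
		return "on"
	case boldOff:
		return "off"
	default:
		return "inherit"
	}
}

// boldStateOf maps the raw element to one of the three states.
func boldStateOf(b *onOff) boldState {
	switch {
	case b == nil:
		return boldInherit
	case b.Val != nil && !*b.Val:
		return boldOff
	default:
		return boldOn
	}
}

func main() {
	off := false
	fmt.Println(boldStateOf(nil))              // inherit
	fmt.Println(boldStateOf(&onOff{}))         // on
	fmt.Println(boldStateOf(&onOff{Val: &off})) // off
}
```

With a state like this, an IsBold-style method would return true only for the "on" case, and callers that care about style inheritance can check for "inherit" separately.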
    

    I also noticed an oddity in IsItalic():

    func (r RunProperties) IsItalic() bool {
    	if r.x == nil {
    		return false
    	}
    	return r.x.I != nil
    }
    

    All other methods assume that r.x is safely non-nil, so why this?

  • Does not compile on Windows

    Description

    Trying to use document.New() on Windows.

    Expected Behavior

    It compiles.

    Actual Behavior

    Getting an error:

    go build baliance.com/gooxml/schema/soo/wml: C:\Go\pkg\tool\windows_amd64\compile.exe: fork/exec C:\Go\pkg\tool\windows_amd64\compile.exe: The filename or extension is too long.
    
  • Unable to Extract Images from Docx File

    Description

    I have a docx file that contains a single image. Here's the code I am using to pull out the images:

    	reader := bytes.NewReader(content)
    	doc, err := presentation.Read(reader, reader.Size())
    	if err != nil {
    		return "", nil, fmt.Errorf("presentation read failure with error: %v", err)
    	}
    	if doc == nil {
    		return "", nil, fmt.Errorf("internal error: [presentation.Read] returned a nil pointer")
    	}
    	defer doc.Close()
    
    	for _, img := range doc.Images {
    		if img.Path() == "" {
    			ctx.Logger().Warn("received an image with an empty path")
    			continue
    		}
    		data, err := os.ReadFile(img.Path())
    		if err != nil {
    			ctx.Logger().Error("failed to read file: %s with error: %v", img.Path(), err)
    			continue
    		}
    	}
    	extracted := doc.ExtractText()
    

    Expected Behavior

    For the file I've attached below, it contains one image. However, the length of doc.Images is 0 and it should be a length of 1. As a note, I have created this document on Microsoft OneDrive and downloaded it from OneDrive to share with you.

    Actual Behavior

    The length of doc.Images is 0, so the for loop is never entered. J1.docx

  • How to know a textline belongs to which page in a docx file

    Description

    Is there any way to know a text line belongs to which page in a .docx file?

    Expected Behavior

    We should have an attribute in a text Item to get the page information:

    for _, e := range extracted.Items {
        text := e.Text
        pageIndex := e.PageIndex // requested field; it does not exist today
        fmt.Println(pageIndex, text)
    }
    

    Actual Behavior

    There is only Text, DrawingInfo, Paragraph, Hyperlink, TableInfo BUT no PageInfo in the TextItem.

  • Can I add html markup to doc files?

    Description

    Is there a plan for the Word document package to support HTML tags? If not, how can I render the markup below when creating a Word document?

    <b>Bold first</b><div><i><b>Bold second</b></i></div><div><i><b>There are so many of us </b>asasas</i>asas<i>asasasasas<u>asasasa</u></i></div><div><i><u><br></u></i></div>
    

    Expected Behavior

    The HTML string above should be rendered with bold, italic, and underline formatting.

    Actual Behavior

    AddText() prints the markup as a literal string.
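
One possible approach, sketched here in plain Go with no claim about library support, is to tokenize the markup and track which inline tags are open, producing a list of text runs with bold, italic, and underline flags. `styledRun` and `parseInlineHTML` are hypothetical names, and the lenient `encoding/xml` settings assume simple, mostly well-nested HTML like the string above. Each resulting run could then be written to the document as a separate run with the matching formatting applied.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// styledRun is a piece of text plus the inline formatting active around it.
type styledRun struct {
	Text                    string
	Bold, Italic, Underline bool
}

// parseInlineHTML splits simple HTML into runs, tracking <b>, <i> and <u>.
// Other tags (such as <div> or <br>) are tokenized but otherwise ignored.
func parseInlineHTML(src string) ([]styledRun, error) {
	// Wrap the fragment so the decoder sees a single root element.
	d := xml.NewDecoder(strings.NewReader("<root>" + src + "</root>"))
	d.Strict = false                 // tolerate HTML that is not valid XML
	d.AutoClose = xml.HTMLAutoClose  // treat <br>, <img>, ... as self-closing
	d.Entity = xml.HTMLEntity        // resolve entities such as &nbsp;

	depth := map[string]int{} // how many times each tag is currently open
	var runs []styledRun
	for {
		tok, err := d.Token()
		if err == io.EOF {
			return runs, nil
		}
		if err != nil {
			return nil, err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			depth[strings.ToLower(t.Name.Local)]++
		case xml.EndElement:
			depth[strings.ToLower(t.Name.Local)]--
		case xml.CharData:
			if len(t) == 0 {
				continue
			}
			runs = append(runs, styledRun{
				Text:      string(t), // copy: the decoder reuses its buffer
				Bold:      depth["b"] > 0,
				Italic:    depth["i"] > 0,
				Underline: depth["u"] > 0,
			})
		}
	}
}

func main() {
	runs, err := parseInlineHTML("<b>Bold first</b><div><i><b>Bold second</b></i></div>")
	if err != nil {
		panic(err)
	}
	for _, r := range runs {
		fmt.Printf("%q bold=%t italic=%t underline=%t\n", r.Text, r.Bold, r.Italic, r.Underline)
	}
}
```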

  • Document append problem

    Description

    When I use doc0.Append(doc1), the output file fails to open.

    Expected Behavior

    Actual Behavior

    image

  • column.SetStyle does not work with wrapped style

    When using the following function, it doesn't seem to set the cells to wrapped properly. If I use the exact same process on an individual cell it works properly. https://pkg.go.dev/gitea.com/unidoc/unioffice/spreadsheet#Column.SetStyle

    Expected Behavior

    When using Column.SetStyle with the wrapped style, all cells in that column should become wrapped.

    An example of the issue:

    style := ss.StyleSheet.AddCellStyle()
    style.SetWrapped(true)
    sheet.Column(1).SetStyle(style)
    sheet.Column(1).SetWidth(200)
    sheet.Cell("A7").SetStyle(style)
    

    In this example, the cell A7 will be wrapped, but no other cells in the first column will be wrapped. I can verify this is the correct column because SetWidth(200) works as intended.

  • [Feature Request] Provide access to the "Alias" and "Tag" properties of StructuredDocumentTag objects.

    Description

    Currently, while structured document tags are available for a document, programmatic access to the title and tag properties for each structured document tag is not. The API only provides access to the paragraphs.

    Expected Behavior

    Each StructuredDocumentTag object provides Alias() and Tag() functions.

    for _, sdt := range doc.StructuredDocumentTags() {
        fmt.Printf("Alias: '%v'\n", sdt.Alias()) // returns an alias or empty string for the SDT
        fmt.Printf("Tag: '%v'\n", sdt.Tag()) // returns a tag or empty string for the SDT
    }
    

    Actual Behavior

    No access to alias or tag for a structured document tag.

Golang library for reading and writing Microsoft Excel™ (XLSX) files.

Excelize Introduction Excelize is a library written in pure Go providing a set of functions that allow you to write to and read from XLSX / XLSM / XLT

Jan 5, 2023
Fast and reliable way to work with Microsoft Excel™ [xlsx] files in Golang

Xlsx2Go package main import ( "github.com/plandem/xlsx" "github.com/plandem/xlsx/format/conditional" "github.com/plandem/xlsx/format/conditional/r

Dec 17, 2022
A simple, lightweight Excel file reader that reads a standard Excel file as a table faster, parsing it in a more relational-database-like way.

Intro Expect to create a reader library to read relate-db-like excel easily. Just like read a config. This library can read all xlsx file correct

Dec 19, 2022
Go (golang) library for reading and writing XLSX files.

XLSX Introduction xlsx is a library to simplify reading and writing the XML format used by recent version of Microsoft Excel in Go programs. Tutorial

Dec 28, 2022
go-eexcel implements encoding and decoding of XLSX like encoding/json

go-eexcel go-eexcel implements encoding and decoding of XLSX like encoding/json Usage func ExampleMarshal() { type st struct { Name string `eexce

Dec 9, 2021
Golang bindings for libxlsxwriter for writing XLSX files

goxlsxwriter provides Go bindings for the libxlsxwriter C library. Install goxlsxwriter requires the libxslxwriter library to be installe

Nov 18, 2022
Golang bindings for libxlsxwriter for writing XLSX files

goxlsxwriter goxlsxwriter provides Go bindings for the libxlsxwriter C library. Install goxlsxwriter requires the libxslxwriter library to be installe

May 30, 2021
A Go native tabular data extraction package. Currently supports .xls, .xlsx, .csv, .tsv formats.

grate A Go native tabular data extraction package. Currently supports .xls, .xlsx, .csv, .tsv formats. Why? Grate focuses on speed and stability first

Dec 26, 2022
Cheap/fast/simple XLSX file writer for textual data

xlsxwriter Cheap/fast/simple XLSX file writer for textual data -- no fancy formatting or graphs go get github.com/mzimmerman/xlsxwriter data := [][]s

Feb 8, 2022
A small, simple, fast tool written in Go that collects fofa data and exports it to an Excel sheet.

fofa A small, simple, fast tool written in Go that collects fofa data and exports it to an Excel sheet. Goroutine + retryablehttp Build git clone https://github.com/inspiringz/fofa cd fofa go build -ldf

Nov 9, 2022
A simple excel engine without ui to parse .csv files.

A simple excel engine without ui to parse .csv files.

Nov 4, 2021
Go Microsoft Excel Number Format Parser

NFP (Number Format Parser) Using NFP (Number Format Parser) you can get an Abstract Syntax Tree (AST) from Excel number format expression. Installatio

Dec 2, 2022
Using NFP (Number Format Parser) you can get an Abstract Syntax Tree (AST) from Excel number format expression

NFP (Number Format Parser) Using NFP (Number Format Parser) you can get an Abstract Syntax Tree (AST) from Excel number format expression. Installatio

Feb 4, 2022
Fastq demultiplexer for single cell data from MGI sequencer (10x converted library).

fastq_demultiplexer Converts fastq single cell data from MGI (10x converted library) to Illumina compatible format. Installation go install github.com

Nov 24, 2021
Changes the author in Microsoft Office documents (Word, Excel, PowerPoint), in case your instructor checks lab reports by document author

AuthorChanger This program helps you to change Microsoft Office 2013-2019 document author. Works with MS Word, MS Excel, MS PowerPoint. Usage Clone a

Dec 31, 2021
word2text - a tool is to convert word documents (DocX) to text on the CLI with zero dependencies for free

This tool is to convert word documents (DocX) to text on the CLI with zero dependencies for free. This tool has been tested on: - Linux 32bit and 64 bit - Windows 32 bit and 64 bit - OpenBSD 64 bit

Apr 19, 2021
Simple .docx converter implemented by Go. Convert .docx to plain text.

docc Simple ".docx" converter implemented by Go. Convert ".docx" to plain text. License MIT Features Less dependency. No need for Microsoft Office. On

Mar 30, 2022
Online preview of Word, Excel, PPT, PDF, Markdown and image files with Golang

Go View File Online demo: http://39.97.98.75:8082/view/upload (not updated often; keeps only the most basic preview features. The server is low-spec, so if the connection times out, wait a few seconds and refresh, or switch to Chrome) Completed so far: Docker deployment (no need to worry about the runtime environment) Wor

Dec 26, 2022
Golang wrapper for Exiftool : extract as much metadata as possible (EXIF, ...) from files (pictures, pdf, office documents, ...)

go-exiftool go-exiftool is a golang library that wraps ExifTool. ExifTool's purpose is to extract as much metadata as possible (EXIF, IPTC, XMP, GPS,

Dec 28, 2022
Golang library for reading and writing Microsoft Excel™ (XLSX) files.

Excelize Introduction Excelize is a library written in pure Go providing a set of functions that allow you to write to and read from XLSX / XLSM / XLT

Jan 9, 2023