Parse and generate XML easily in Go

Last update: Dec 19, 2022

Comments: 15

etree

The etree package is a lightweight, pure Go package that expresses XML in the form of an element tree. Its design was inspired by the Python ElementTree module.

Some of the package's capabilities and features:

Represents XML documents as trees of elements for easy traversal.
Imports, serializes, modifies or creates XML documents from scratch.
Writes and reads XML to/from files, byte slices, strings and io interfaces.
Performs simple or complex searches with lightweight XPath-like query APIs.
Auto-indents XML using spaces or tabs for better readability.
Implemented in pure Go; depends only on the Go standard library.
Built on top of the Go encoding/xml package.

Creating an XML document

The following example creates an XML document from scratch using the etree package and outputs its indented contents to stdout.

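package main

import (
	"os"

	"github.com/beevik/etree"
)

func main() {
	doc := etree.NewDocument()
	// Processing instructions that precede the root element.
	doc.CreateProcInst("xml", `version="1.0" encoding="UTF-8"`)
	doc.CreateProcInst("xml-stylesheet", `type="text/xsl" href="style.xsl"`)

	people := doc.CreateElement("People")
	people.CreateComment("These are all known people")

	jon := people.CreateElement("Person")
	jon.CreateAttr("name", "Jon")

	sally := people.CreateElement("Person")
	sally.CreateAttr("name", "Sally")

	doc.Indent(2)          // indent each nesting level by two spaces
	doc.WriteTo(os.Stdout) // serialize the whole document to stdout
}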

Output:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="style.xsl"?>
<People>
  <!--These are all known people-->
  <Person name="Jon"/>
  <Person name="Sally"/>
</People>

Reading an XML file

Suppose you have a file on disk called bookstore.xml containing the following data:

<bookstore xmlns:p="urn:schemas-books-com:prices">

  <book category="COOKING">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <p:price>30.00</p:price>
  </book>

  <book category="CHILDREN">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <p:price>29.99</p:price>
  </book>

  <book category="WEB">
    <title lang="en">XQuery Kick Start</title>
    <author>James McGovern</author>
    <author>Per Bothner</author>
    <author>Kurt Cagle</author>
    <author>James Linn</author>
    <author>Vaidyanathan Nagarajan</author>
    <year>2003</year>
    <p:price>49.99</p:price>
  </book>

  <book category="WEB">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <p:price>39.95</p:price>
  </book>

</bookstore>

This code reads the file's contents into an etree document.

doc := etree.NewDocument()
if err := doc.ReadFromFile("bookstore.xml"); err != nil {
    panic(err)
}

You can also read XML from a string, a byte slice, or an io.Reader.

Processing elements and attributes

This example illustrates several ways to access elements and attributes using etree selection queries.

root := doc.SelectElement("bookstore")
fmt.Println("ROOT element:", root.Tag)

for _, book := range root.SelectElements("book") {
    fmt.Println("CHILD element:", book.Tag)
    if title := book.SelectElement("title"); title != nil {
        lang := title.SelectAttrValue("lang", "unknown")
        fmt.Printf("  TITLE: %s (%s)\n", title.Text(), lang)
    }
    for _, attr := range book.Attr {
        fmt.Printf("  ATTR: %s=%s\n", attr.Key, attr.Value)
    }
}

Output:

ROOT element: bookstore
CHILD element: book
  TITLE: Everyday Italian (en)
  ATTR: category=COOKING
CHILD element: book
  TITLE: Harry Potter (en)
  ATTR: category=CHILDREN
CHILD element: book
  TITLE: XQuery Kick Start (en)
  ATTR: category=WEB
CHILD element: book
  TITLE: Learning XML (en)
  ATTR: category=WEB

Path queries

This example uses etree's path functions to select all book titles that fall into the category of 'WEB'. The double-slash prefix in the path causes the search for book elements to occur recursively; book elements may appear at any level of the XML hierarchy.

for _, t := range doc.FindElements("//book[@category='WEB']/title") {
    fmt.Println("Title:", t.Text())
}

Output:

Title: XQuery Kick Start
Title: Learning XML

This example finds the first book element under the root bookstore element and outputs the tag and text of each of its child elements.

for _, e := range doc.FindElements("./bookstore/book[1]/*") {
    fmt.Printf("%s: %s\n", e.Tag, e.Text())
}

Output:

title: Everyday Italian
author: Giada De Laurentiis
year: 2005
price: 30.00

This example finds all books with a price of 49.99 and outputs their titles.

path := etree.MustCompilePath("./bookstore/book[p:price='49.99']/title")
for _, e := range doc.FindElementsPath(path) {
    fmt.Println(e.Text())
}

Output:

XQuery Kick Start

Note that this example uses the FindElementsPath function, which takes a precompiled path object as its argument. Use precompiled paths when you plan to search with the same path more than once.

Other features

These are just a few examples of the things the etree package can do. See the documentation for a complete description of its capabilities.

Contributing

This project accepts contributions. Just fork the repo and submit a pull request!

Owner

Brett Vickers

Code hacker. Game dev.

https://github.com/beevik/etree

Comments

Add support for the '|' operator in paths.

OK, I added support for the OR operator in paths. Please try it out and let me know if it works for you.

I tagged this change as v0, so you can pull this repo from gopkg.in/beevik/etree.v0.

Add handling of xmlns namespaces

When querying XML fragments like this

<root xmlns:N="namespace"> <N:element2>v</N:element2> </root>

it is necessary to specify the full namespace name with the tag, as in namespace:element2, since the user can change the N prefix arbitrarily.

Therefore, this pull request adds FullSpace to Element to represent the full namespace name. It also allows tag queries using syntax like {namespace}name in methods such as FindElements and SelectElement, to support complex namespaces such as URLs.
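To illustrate the motivation: Go's own encoding/xml decoder already resolves prefixes to full namespace names when you call Decoder.Token (RawToken does not), so N:element2 arrives with Name.Space set to "namespace" whichever prefix the author chose. This is a stdlib-only sketch, not etree code, and namespacedNames is a hypothetical helper:

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// namespacedNames returns "{space}local" for every start element in src.
// Decoder.Token replaces a prefix such as N with the URI bound by xmlns:N.
func namespacedNames(src string) ([]string, error) {
	dec := xml.NewDecoder(strings.NewReader(src))
	var names []string
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return names, nil
		}
		if err != nil {
			return nil, err
		}
		if se, ok := tok.(xml.StartElement); ok {
			names = append(names, "{"+se.Name.Space+"}"+se.Name.Local)
		}
	}
}

func main() {
	// Two documents that differ only in the prefix bound to "namespace".
	a, _ := namespacedNames(`<root xmlns:N="namespace"><N:element2>v</N:element2></root>`)
	b, _ := namespacedNames(`<root xmlns:X="namespace"><X:element2>v</X:element2></root>`)
	fmt.Println(a) // [{}root {namespace}element2]
	fmt.Println(b) // [{}root {namespace}element2]
}
```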

Need an AddElement method

I need a way to add an etree Element under another etree Element.

Trying to explain in code:

doc := etree.NewDocument()
doc.ReadFromFile("bookstore.xml")
root := doc.SelectElement("bookstore")

Now root is an etree Element containing a bunch of <book> XML elements.

Suppose I now have

docMore.ReadFromString(xmlMoreBooks)

How can I add the contents of docMore as new entries under the root etree Element?

I think such a feature would be needed by others as well. Please consider adding it.

Thanks

Problem parsing CDATA after newline

Thanks a ton for this package - super useful for my work.

I'm parsing some RSS feeds whose post descriptions, content, etc. contain formatted HTML wrapped in <![CDATA[ ... ]]> sections. It looks like when the CDATA section is preceded by a newline, the text can't be parsed out:

	workingCDATAString := `
	<rss>
		<channel>
			<item>
		   		<summary><![CDATA[Sup]]></summary>
			</item>
		</channel>
	</rss>
	`

	doc := etree.NewDocument()
	doc.ReadFromString(workingCDATAString)
	spew.Dump(doc.FindElement("rss").FindElement("channel").FindElement("item").FindElement("summary").Text())
	// Output: (string) (len=3) "Sup"

	brokenCDATAString := `
	<rss>
		<channel>
			<item>
		   		<summary>
			 		<![CDATA[Sup]]>
				</summary>
			</item>
		</channel>
	</rss>
	`
	doc = etree.NewDocument()
	doc.ReadFromString(brokenCDATAString)
	spew.Dump(doc.FindElement("rss").FindElement("channel").FindElement("item").FindElement("summary").Text())
	// Output: (string) (len=7) "\n\t\t\t \t\t"

I'm not familiar enough with XML parsing to say whether this is the intended behavior, but I would expect these two code blocks to output the same thing ("Sup"). Any ideas?

support canonical xml

Thanks for creating this package. It's been very useful and saved me a lot of time.

I'm working on a project that requires canonical XML, so that signatures and digest hashes over the XML doc can be validated. I've made a couple changes here that should allow me to do that without affecting the default behavior for anyone else. Specifically, this enables explicit end tags (http://www.w3.org/TR/xml-c14n#Example-SETags), and bypasses the default escaping so that I can implement my own (http://www.w3.org/TR/xml-c14n#Example-Chars).

Let me know if this looks ok, or if you want it structured differently.

RemoveChildAt sometimes doesn't work

Hi, I have been trying to add and delete rules in a pfSense XML config. Adding works great, but deleting with RemoveChildAt sometimes doesn't work.

Here is my example code:

package main

import (
	"fmt"

	"github.com/beevik/etree"
)

const xml = `
<pfsense>
    <filter>
        <rule>
            <created>
                <username>testUser</username>
            </created>
        </rule>
        <rule>
            <created>
                <username>testUser</username>
            </created>
        </rule>
        <rule>
            <created>
                <username>admin</username>
            </created>
        </rule>
        <rule>
            <created>
                <username>testUser</username>
            </created>
        </rule>
        <rule>
            <created>
                <username>admin</username>
            </created>
        </rule>
    </filter>
</pfsense>
`

func main() {
	fmt.Println("Starting!")

	doc := etree.NewDocument()
	err := doc.ReadFromString(xml)
	if err != nil {
		panic(err)
	}

	fmt.Println("Original Rules:")
	displayRules(doc)

	fmt.Println("Deleting old Rules:")
	deleteOldRules(doc)

	doc.Indent(4)

	fmt.Println("New Rules:")

	displayRules(doc)
}

func displayRules(doc *etree.Document) {
	for _, rule := range doc.FindElements("pfsense/filter/rule") {
		fmt.Println("	Rule Element:", rule.Tag)
		username := rule.FindElement("created/username")
		if username != nil {
			fmt.Println("		Username: ", username.Text())
		}
	}
}

func deleteOldRules(doc *etree.Document) {
	oldrules := []int{}
	for i, rule := range doc.FindElements("pfsense/filter/*") {
		if rule.Tag == "rule" {
			username := rule.FindElement("created/username")
			if username == nil {
				//rule has no user
				continue
			}
			if username.Text() == "testUser" {
				fmt.Println("Old rule at index: ", i)
				oldrules = append([]int{i}, oldrules...)
			}
		}
	}
	for _, i := range oldrules {
		fmt.Println("deleting index", i)
		doc.FindElement("pfsense/filter").RemoveChildAt(i)
	}
}

And this is the output i get:

Starting!
Original Rules:
        Rule Element: rule
                Username:  testUser
        Rule Element: rule
                Username:  testUser
        Rule Element: rule
                Username:  admin
        Rule Element: rule
                Username:  testUser
        Rule Element: rule
                Username:  admin
Deleting old Rules:
Old rule at index:  0
Old rule at index:  1
Old rule at index:  3
deleting index 3
deleting index 1
deleting index 0
New Rules:
        Rule Element: rule
                Username:  admin
        Rule Element: rule
                Username:  testUser
        Rule Element: rule
                Username:  admin

The code should go through all rule elements in the filter element and check whether each rule has a "created" element with a child "username" element whose text is "testUser". If so, it records the index of the rule in a slice. This part works fine, as it detects all 3 rules, at indexes 0, 1 and 3. Then it deletes the rules by index, in reverse order: first index 3, then 1, then 0. But when I loop over all rules again, one of the testUser rules is still there.

Please skip BOM

When reading from a file (via ReadFrom() or ReadFromFile()), is it possible to skip the BOM (https://en.wikipedia.org/wiki/Byte_order_mark) character?

Every file created by MS under Windows has that wretched character, which is very hard to get rid of, so it would be great if etree could skip it when reading from a file.

The following file will fail:

$ cat et_example.xml | hexdump -C
00000000  ff fe 3c 00 62 00 6f 00  6f 00 6b 00 73 00 74 00  |..<.b.o.o.k.s.t.|
00000010  6f 00 72 00 65 00 3e 00  0d 00 0a 00 20 00 3c 00  |o.r.e.>..... .<.|
...

with the following error:

panic: XML syntax error on line 1: invalid UTF-8

Hmm, wait: is it because of the BOM or the UTF-16 encoding?
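The hexdump starts with ff fe, the UTF-16 little-endian byte order mark, and every ASCII character is followed by a 00 byte, so the underlying problem is the UTF-16 encoding rather than the BOM alone: encoding/xml, and therefore etree, expects UTF-8. A stdlib-only sketch that strips a UTF-8 BOM and converts UTF-16 input (either byte order) to UTF-8; the result could then be passed to doc.ReadFromBytes. toUTF8 is a hypothetical helper, and it assumes the file has no XML declaration naming a non-UTF-8 encoding:

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"unicode/utf16"
)

// toUTF8 removes a leading byte order mark and, for UTF-16 input
// (detected by its BOM), converts the remaining bytes to UTF-8.
// Input without a BOM is returned unchanged.
func toUTF8(b []byte) ([]byte, error) {
	switch {
	case bytes.HasPrefix(b, []byte{0xEF, 0xBB, 0xBF}): // UTF-8 BOM
		return b[3:], nil
	case bytes.HasPrefix(b, []byte{0xFF, 0xFE}): // UTF-16 little-endian
		return decodeUTF16(b[2:], false)
	case bytes.HasPrefix(b, []byte{0xFE, 0xFF}): // UTF-16 big-endian
		return decodeUTF16(b[2:], true)
	}
	return b, nil
}

// decodeUTF16 turns pairs of bytes into UTF-16 code units and decodes them.
func decodeUTF16(b []byte, bigEndian bool) ([]byte, error) {
	if len(b)%2 != 0 {
		return nil, errors.New("odd number of bytes in UTF-16 input")
	}
	units := make([]uint16, len(b)/2)
	for i := range units {
		lo, hi := b[2*i], b[2*i+1]
		if bigEndian {
			lo, hi = hi, lo
		}
		units[i] = uint16(lo) | uint16(hi)<<8
	}
	// utf16.Decode also combines surrogate pairs into single runes.
	return []byte(string(utf16.Decode(units))), nil
}

func main() {
	// The first bytes of et_example.xml from the hexdump: BOM + "<bookstore>".
	raw := []byte{0xFF, 0xFE, '<', 0, 'b', 0, 'o', 0, 'o', 0, 'k', 0, 's', 0, 't', 0,
		'o', 0, 'r', 0, 'e', 0, '>', 0}
	out, err := toUTF8(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // <bookstore>
}
```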

thx

New line after BOM

doc.WriteTo adds an extra newline after the BOM. I've illustrated it with et_dump.go and et_dump.xml, which you can find under https://github.com/suntong/lang/blob/master/lang/Go/src/xml/.

Here is the result:

$ go run et_dump.go | diff -wU 1 et_dump.xml -
--- et_dump.xml 2016-03-08 16:40:41.667010100 -0500
+++ -   2016-03-08 16:40:57.842603083 -0500
@@ -1,4 +1,4 @@
-ï»¿<?xml version="1.0" encoding="utf-8"?>
+ï»¿
+<?xml version="1.0" encoding="utf-8"?>
 <bookstore xmlns:p="urn:schemas-books-com:prices">
-
   <book category="COOKING">
@@ -9,3 +9,2 @@
   </book>
-
   <book category="CHILDREN">
...
@@ -34,3 +31,2 @@
   </book>
-
 </bookstore>
\ No newline at end of file

That is, an extra newline is added after the BOM. This seems like a trivial issue, but it causes Microsoft Visual Studio to fail to recognize the webtest file that such a dump creates. :-(

Please consider removing the extra newline.

Thanks

Don't deprecate InsertChild() please

Deprecated: InsertChild is deprecated. Use InsertChildAt instead.

Please don't deprecate InsertChild(), because InsertChildAt won't work in my case.

The XML file that I'm working on has a rigid layout:

<A attr=... > <B attr=... /> <C attr=... /> <D attr=... /> </A>

B comes before C, which comes before D. I know the order doesn't matter to XML, but I'm tracking the file with version control, so I'd prefer as few changes as possible.

Whether I call doc.InsertChildAt(0, c) or doc.InsertChildAt(1, c), C is always inserted before B, whereas I need it after B but before D (having removed C beforehand).

Am I using InsertChildAt incorrectly, or is InsertChild() simply not replaceable in my case? Thx.

Adding some support for xml mixed content processing

The idea is to move processing a bit closer to Python's lxml, allowing simpler processing of XML mixed content. The only possible "breaking" change is the resulting indentation: I had to rewrite the indenting code to remove superfluous CharData items.

Unable to extract Attribute Value w/ Path

Consider the following partial document (source: https://community.cablelabs.com/wiki/plugins/servlet/cablelabs/alfresco/download?id=8f900e8b-d1eb-4834-bd26-f04bd623c3d2 , Appendix I.1):

<?xml version="1.0" ?>
<ADI>
  <Metadata>
    <AMS Provider="InDemand" Product="First-Run" Asset_Name="The_Titanic" Version_Major="1" Version_Minor="0" Description="The Titanic asset package" Creation_Date="2002-01-11" Provider_ID="indemand.com" Asset_ID="UNVA2001081701004000" Asset_Class="package"/>
    <App_Data App="MOD" Name="Provider_Content_Tier" Value="InDemand1" />
    <App_Data App="MOD" Name="Metadata_Spec_Version" Value="CableLabsVod1.1" />
  </Metadata>
</ADI>

While I can use a path like //AMS[@Asset_Class='package']/../App_Data[@Name='Provider_Content_Tier'] to get to a desired element, I am not able to perform an XPath-style path search that extracts just the data in the Value attribute of the identified elements as a []string. Most other XPath implementations support a path such as //AMS[@Asset_Class='package']/../App_Data[@Name='Provider_Content_Tier']/@Value to extract attribute values directly from the path.

This would be a really great feature to have, because it would allow us to port a legacy app over to Go without having to refactor our existing paths that perform the attribute extractions.

I'll take a stab at implementing it in the coming days.

Expected " but got &quot;

My XML file contains ", but when I parse it and write it back out, &quot; replaces the ". Please tell me how to avoid this.

For example:

<test> this "is" test</test>

The parsing result is:

<test> this &quot;is&quot; test</test>

Unexpected <> when adding outer layer

I tried to wrap my XML in an outer layer. Assume my XML is <Foo></Foo>; when I add the outer layer, there's an additional empty tag like <></>. How do I remove it?

Here's my snippet:

func main() {
	result := etree.NewDocument()

	if err := result.ReadFromString("<Foo></Foo>"); err != nil {
		log.Fatal(err.Error())
	}

	doc := etree.NewDocument()
	root := doc.CreateElement("Envelope")
	root.Space = "soap"

	soapHeader := doc.CreateElement("Header")
	soapHeader.Space = "soap"


	soapBody := doc.CreateElement("Body")
	soapBody.Space = "soap"

	root.AddChild(soapHeader)
	root.AddChild(soapBody)
	root.AddChild(result)

	doc.Indent(2)
	println(doc.WriteToString())
}

and the result is

<soap:Envelope>
  <soap:Header/>
  <soap:Body/>
  <><Foo/></>
</soap:Envelope>

I expected the result to be

<soap:Envelope>
  <soap:Header/>
  <soap:Body/>
  <Foo/>
</soap:Envelope>

There's an extra <></> wrapped around <Foo/>.

can not parse xml with "&"

If an XML file contains "&", doc.ReadFromFile fails and my code panics like this:

panic: XML syntax error on line 627: invalid character entity & (no semicolon)

How can I solve it?
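A bare & is not well-formed XML: outside CDATA sections and comments it must be written as &amp; unless it begins an entity or character reference, so the parser is right to reject the file. The cleanest fix is in whatever produces the file; if that is not possible, one workaround is to escape stray ampersands before parsing. A stdlib-only sketch; escapeBareAmpersands is a hypothetical helper, and it does not special-case CDATA sections or comments, so treat it as a starting point:

```go
package main

import (
	"fmt"
	"regexp"
)

// entityOrAmp matches a complete entity or character reference, or a lone '&'.
var entityOrAmp = regexp.MustCompile(`&([A-Za-z_][A-Za-z0-9._-]*;|#[0-9]+;|#x[0-9A-Fa-f]+;)?`)

// escapeBareAmpersands rewrites every '&' that does not begin a reference as "&amp;".
func escapeBareAmpersands(s string) string {
	return entityOrAmp.ReplaceAllStringFunc(s, func(m string) string {
		if m == "&" {
			return "&amp;"
		}
		return m // already a valid reference such as &amp; or &#38;
	})
}

func main() {
	fmt.Println(escapeBareAmpersands("<a>Tom & Jerry &amp; friends &#38; more</a>"))
	// <a>Tom &amp; Jerry &amp; friends &#38; more</a>
}
```

The escaped string can then be handed to doc.ReadFromString.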

preserve CDATA sections

The parser tees its input into a buffer so that it can inspect the raw data underlying xml.CharData tokens and see whether they start with a <![CDATA[ declaration.

