Parse and generate XML easily in Go

etree

The etree package is a lightweight, pure Go package that expresses XML in the form of an element tree. Its design was inspired by the Python ElementTree module.

Some of the package's capabilities and features:

  • Represents XML documents as trees of elements for easy traversal.
  • Imports, serializes, and modifies XML documents, or creates them from scratch.
  • Writes and reads XML to/from files, byte slices, strings and io interfaces.
  • Performs simple or complex searches with lightweight XPath-like query APIs.
  • Auto-indents XML using spaces or tabs for better readability.
  • Implemented in pure Go; depends only on standard Go libraries.
  • Built on top of the Go encoding/xml package.

Creating an XML document

The following example creates an XML document from scratch using the etree package and outputs its indented contents to stdout.

doc := etree.NewDocument()
doc.CreateProcInst("xml", `version="1.0" encoding="UTF-8"`)
doc.CreateProcInst("xml-stylesheet", `type="text/xsl" href="style.xsl"`)

people := doc.CreateElement("People")
people.CreateComment("These are all known people")

jon := people.CreateElement("Person")
jon.CreateAttr("name", "Jon")

sally := people.CreateElement("Person")
sally.CreateAttr("name", "Sally")

doc.Indent(2)
doc.WriteTo(os.Stdout)

Output:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="style.xsl"?>
<People>
  <!--These are all known people-->
  <Person name="Jon"/>
  <Person name="Sally"/>
</People>

Reading an XML file

Suppose you have a file on disk called bookstore.xml containing the following data:

<bookstore xmlns:p="urn:schemas-books-com:prices">

  <book category="COOKING">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <p:price>30.00</p:price>
  </book>

  <book category="CHILDREN">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <p:price>29.99</p:price>
  </book>

  <book category="WEB">
    <title lang="en">XQuery Kick Start</title>
    <author>James McGovern</author>
    <author>Per Bothner</author>
    <author>Kurt Cagle</author>
    <author>James Linn</author>
    <author>Vaidyanathan Nagarajan</author>
    <year>2003</year>
    <p:price>49.99</p:price>
  </book>

  <book category="WEB">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <p:price>39.95</p:price>
  </book>

</bookstore>

This code reads the file's contents into an etree document.

doc := etree.NewDocument()
if err := doc.ReadFromFile("bookstore.xml"); err != nil {
    panic(err)
}

You can also read XML from a string, a byte slice, or an io.Reader.

Processing elements and attributes

This example illustrates several ways to access elements and attributes using etree selection queries.

root := doc.SelectElement("bookstore")
fmt.Println("ROOT element:", root.Tag)

for _, book := range root.SelectElements("book") {
    fmt.Println("CHILD element:", book.Tag)
    if title := book.SelectElement("title"); title != nil {
        lang := title.SelectAttrValue("lang", "unknown")
        fmt.Printf("  TITLE: %s (%s)\n", title.Text(), lang)
    }
    for _, attr := range book.Attr {
        fmt.Printf("  ATTR: %s=%s\n", attr.Key, attr.Value)
    }
}

Output:

ROOT element: bookstore
CHILD element: book
  TITLE: Everyday Italian (en)
  ATTR: category=COOKING
CHILD element: book
  TITLE: Harry Potter (en)
  ATTR: category=CHILDREN
CHILD element: book
  TITLE: XQuery Kick Start (en)
  ATTR: category=WEB
CHILD element: book
  TITLE: Learning XML (en)
  ATTR: category=WEB

Path queries

This example uses etree's path functions to select all book titles that fall into the category of 'WEB'. The double-slash prefix in the path causes the search for book elements to occur recursively; book elements may appear at any level of the XML hierarchy.

for _, t := range doc.FindElements("//book[@category='WEB']/title") {
    fmt.Println("Title:", t.Text())
}

Output:

Title: XQuery Kick Start
Title: Learning XML

This example finds the first book element under the root bookstore element and outputs the tag and text of each of its child elements.

for _, e := range doc.FindElements("./bookstore/book[1]/*") {
    fmt.Printf("%s: %s\n", e.Tag, e.Text())
}

Output:

title: Everyday Italian
author: Giada De Laurentiis
year: 2005
price: 30.00

This example finds all books with a price of 49.99 and outputs their titles.

path := etree.MustCompilePath("./bookstore/book[p:price='49.99']/title")
for _, e := range doc.FindElementsPath(path) {
    fmt.Println(e.Text())
}

Output:

XQuery Kick Start

Note that this example uses the FindElementsPath function, which takes a precompiled path object as its argument. Use precompiled paths when you plan to search with the same path more than once.

Other features

These are just a few examples of the things the etree package can do. See the documentation for a complete description of its capabilities.

Contributing

This project accepts contributions. Just fork the repo and submit a pull request!

Owner
Brett Vickers
Comments
  • Add support for the '|' operator in paths.

    Ok I added support for the OR operator in paths. Please try it out and let me know if it works for you.

    I tagged this change as v0, so you can pull this repo from gopkg.in/beevik/etree.v0

  • Add handling of xmlns namespaces

    When querying in xml fragments like this

    <root xmlns:N="namespace">
        <N:element2>v</N:element2>
    </root>
    

    It is more or less necessary to qualify the tag with the full namespace name, as in namespace:element2, since the N prefix can be changed arbitrarily by the user.

    Therefore, this pull request adds FullSpace to Element to represent the full namespace name. It also allows tag queries using syntax like {namespace}name in methods like FindElements and SelectElement, to support complex namespaces such as URLs.

  • Need an AddElement method

    I need a way to add an etree Element under another etree Element.

    Trying to explain in code:

    doc := etree.NewDocument()
    doc.ReadFromFile("bookstore.xml")
    root := doc.SelectElement("bookstore")
    
    

    Now the root is an etree Element under which are a bunch of <book> XML Elements.

    Suppose now I have

    docMore.ReadFromString(xmlMoreBooks)
    

    The question is how can I add docMore as new entries under the root etree Element?

    I think such a feature would be needed by others as well. Please consider adding it.

    Thanks

  • Problem parsing CDATA after newline

    Thanks a ton for this package - super useful for my work.

    I'm parsing some RSS feeds that wrap formatted HTML (post descriptions, content, etc.) in <![CDATA[ ... ]]> sections. It looks like when the CDATA section is preceded by a newline, the text can't be parsed out:

    	workingCDATAString := `
    	<rss>
    		<channel>
    			<item>
    		   		<summary><![CDATA[Sup]]></summary>
    			</item>
    		</channel>
    	</rss>
    	`
    
    	doc := etree.NewDocument()
    	doc.ReadFromString(workingCDATAString)
    	spew.Dump(doc.FindElement("rss").FindElement("channel").FindElement("item").FindElement("summary").Text())
    	// Output: (string) (len=3) "Sup"
    
    	brokenCDATAString := `
    	<rss>
    		<channel>
    			<item>
    		   		<summary>
    			 		<![CDATA[Sup]]>
    				</summary>
    			</item>
    		</channel>
    	</rss>
    	`
    	doc = etree.NewDocument()
    	doc.ReadFromString(brokenCDATAString)
    	spew.Dump(doc.FindElement("rss").FindElement("channel").FindElement("item").FindElement("summary").Text())
    	// Output: (string) (len=7) "\n\t\t\t \t\t"
    

    I'm not familiar with XML parsing enough to say that this isn't the intended behavior, but I would expect these two code blocks to output the same thing ("Sup"). Any ideas?
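A possible explanation, sketched with the standard library only (etree is built on encoding/xml): the decoder reports the whitespace before the CDATA section, the CDATA content, and the whitespace after it as three separate character-data tokens, so code that returns only the first token sees just the whitespace. The helper name `trimmedText` is hypothetical and is not part of etree; it concatenates every character-data token directly inside the element and trims the result.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// trimmedText returns the whitespace-trimmed concatenation of every
// character-data token that sits directly inside the first element named tag.
// It is a hypothetical helper for illustration, not an etree function.
func trimmedText(doc, tag string) (string, error) {
	dec := xml.NewDecoder(strings.NewReader(doc))
	var sb strings.Builder
	depth := 0 // 0 = outside the target element, 1 = directly inside it
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return "", fmt.Errorf("element %q not found", tag)
		}
		if err != nil {
			return "", err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			if depth > 0 || t.Name.Local == tag {
				depth++
			}
		case xml.EndElement:
			if depth > 0 {
				depth--
				if depth == 0 {
					return strings.TrimSpace(sb.String()), nil
				}
			}
		case xml.CharData:
			// Plain text and CDATA content both arrive as CharData.
			if depth == 1 {
				sb.Write(t)
			}
		}
	}
}

func main() {
	inline := "<item><summary><![CDATA[Sup]]></summary></item>"
	wrapped := "<item><summary>\n\t<![CDATA[Sup]]>\n</summary></item>"
	for _, doc := range []string{inline, wrapped} {
		text, err := trimmedText(doc, "summary")
		if err != nil {
			panic(err)
		}
		fmt.Printf("%q\n", text) // prints "Sup" for both documents
	}
}
```

Whether trimming is appropriate depends on the feed; whitespace inside a CDATA section may be significant.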

  • support canonical xml

    Thanks for creating this package. It's been very useful and saved me a lot of time.

    I'm working on a project that requires canonical XML, so that signatures and digest hashes over the XML doc can be validated. I've made a couple changes here that should allow me to do that without affecting the default behavior for anyone else. Specifically, this enables explicit end tags (http://www.w3.org/TR/xml-c14n#Example-SETags), and bypasses the default escaping so that I can implement my own (http://www.w3.org/TR/xml-c14n#Example-Chars).

    Let me know if this looks ok, or if you want it structured differently.

  • RemoveChildAt sometimes doesn't work

    Hi, I have been trying to add and delete rules from a pfSense XML config. Adding has been working great, but deleting with RemoveChildAt sometimes doesn't work.

    Here is my example code:

    package main
    
    import (
    	"fmt"
    
    	"github.com/beevik/etree"
    )
    
    const xml = `
    <pfsense>
        <filter>
            <rule>
                <created>
                    <username>testUser</username>
                </created>
            </rule>
            <rule>
                <created>
                    <username>testUser</username>
                </created>
            </rule>
            <rule>
                <created>
                    <username>admin</username>
                </created>
            </rule>
            <rule>
                <created>
                    <username>testUser</username>
                </created>
            </rule>
            <rule>
                <created>
                    <username>admin</username>
                </created>
            </rule>
        </filter>
    </pfsense>
    `
    
    func main() {
    	fmt.Println("Starting!")
    
    	doc := etree.NewDocument()
    	err := doc.ReadFromString(xml)
    	if err != nil {
    		panic(err)
    	}
    
    	fmt.Println("Original Rules:")
    	displayRules(doc)
    
    	fmt.Println("Deleting old Rules:")
    	deleteOldRules(doc)
    
    	doc.Indent(4)
    
    	fmt.Println("New Rules:")
    
    	displayRules(doc)
    }
    
    func displayRules(doc *etree.Document) {
    	for _, rule := range doc.FindElements("pfsense/filter/rule") {
    		fmt.Println("	Rule Element:", rule.Tag)
    		username := rule.FindElement("created/username")
    		if username != nil {
    			fmt.Println("		Username: ", username.Text())
    		}
    	}
    }
    
    func deleteOldRules(doc *etree.Document) {
    	oldrules := []int{}
    	for i, rule := range doc.FindElements("pfsense/filter/*") {
    		if rule.Tag == "rule" {
    			username := rule.FindElement("created/username")
    			if username == nil {
    				//rule has no user
    				continue
    			}
    			if username.Text() == "testUser" {
    				fmt.Println("Old rule at index: ", i)
    				oldrules = append([]int{i}, oldrules...)
    			}
    		}
    	}
    	for _, i := range oldrules {
    		fmt.Println("deleting index", i)
    		doc.FindElement("pfsense/filter").RemoveChildAt(i)
    	}
    }
    
    

    And this is the output I get:

    Starting!
    Original Rules:
            Rule Element: rule
                    Username:  testUser
            Rule Element: rule
                    Username:  testUser
            Rule Element: rule
                    Username:  admin
            Rule Element: rule
                    Username:  testUser
            Rule Element: rule
                    Username:  admin
    Deleting old Rules:
    Old rule at index:  0
    Old rule at index:  1
    Old rule at index:  3
    deleting index 3
    deleting index 1
    deleting index 0
    New Rules:
            Rule Element: rule
                    Username:  admin
            Rule Element: rule
                    Username:  testUser
            Rule Element: rule
                    Username:  admin
    

    What this code should do is go through all rule elements in the filter element and check whether each rule has a "created" element with a child "username" element whose text is "testUser". If so, it records the index of the rule in a slice. This part works fine: it detected all 3 rules, at indexes 0, 1 and 3. Then it tries to delete the rules by index. The loop works as intended, deleting index 3 first, then 1, then 0. But when I loop over all rules again, the last of those rule elements is still there.
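One explanation that fits the reported output, offered as an assumption rather than a confirmed diagnosis: the recorded indexes are positions in the slice returned by FindElements, which holds only elements, while RemoveChildAt takes a position in the parent's full child list, which also holds the whitespace between elements. The sketch below models that mismatch with a hypothetical `node` type and the standard library only; it is not etree's implementation.

```go
package main

import "fmt"

// node is a hypothetical stand-in for one child token of <filter>:
// either whitespace text (element == false) or a <rule> element.
type node struct {
	element bool
	user    string
}

// filterChildren models the child list from the report: whitespace
// before every <rule>, plus trailing whitespace before </filter>.
func filterChildren() []node {
	names := []string{"testUser", "testUser", "admin", "testUser", "admin"}
	var kids []node
	for _, n := range names {
		kids = append(kids, node{}, node{element: true, user: n})
	}
	return append(kids, node{})
}

// removeAt deletes the child at position i in the full child list.
func removeAt(kids []node, i int) []node {
	return append(kids[:i], kids[i+1:]...)
}

// ruleUsers lists the user of each remaining element, skipping whitespace.
func ruleUsers(kids []node) []string {
	var out []string
	for _, k := range kids {
		if k.element {
			out = append(out, k.user)
		}
	}
	return out
}

// removeUser walks the full child list backwards and removes matching
// elements by their real position, so earlier positions stay valid.
func removeUser(kids []node, user string) []node {
	for i := len(kids) - 1; i >= 0; i-- {
		if kids[i].element && kids[i].user == user {
			kids = removeAt(kids, i)
		}
	}
	return kids
}

func main() {
	// Element-only positions (3, 1, 0) applied to the full child list
	// remove two rules and one whitespace node.
	kids := filterChildren()
	for _, i := range []int{3, 1, 0} {
		kids = removeAt(kids, i)
	}
	fmt.Println(ruleUsers(kids)) // prints [admin testUser admin]

	// Removing by real child position clears every matching rule.
	fmt.Println(ruleUsers(removeUser(filterChildren(), "testUser"))) // prints [admin admin]
}
```

The first result matches the report's "New Rules" output. In etree itself, the analogous fix would be to remove each matching element by reference (for example with RemoveChild, if the installed version provides it) instead of by its position in the query result.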

  • Please skip BOM

    When reading from a file (via ReadFrom() or ReadFromFile()), is it possible to skip the BOM (https://en.wikipedia.org/wiki/Byte_order_mark) character?

    Every file created by Microsoft tools under Windows has that character, which is very hard to get rid of. So it would be great if etree could skip it when reading from a file.

    The following file will fail:

    $ cat et_example.xml | hexdump -C
    00000000  ff fe 3c 00 62 00 6f 00  6f 00 6b 00 73 00 74 00  |..<.b.o.o.k.s.t.|
    00000010  6f 00 72 00 65 00 3e 00  0d 00 0a 00 20 00 3c 00  |o.r.e.>..... .<.|
    ...
    

    with the following error

    panic: XML syntax error on line 1: invalid UTF-8

    Hmm, wait, is it because of BOM or the UTF16 encoding?

    thx
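On the closing question: the hexdump starts with ff fe, the UTF-16 little-endian byte order mark, and every character after it occupies two bytes, so the "invalid UTF-8" error is consistent with the encoding rather than with the BOM alone. A minimal standard-library sketch of one workaround is to transcode the bytes to UTF-8 before parsing. The function name `toUTF8` is hypothetical, and the sketch assumes the input carries a BOM.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"unicode/utf16"
)

// toUTF8 strips a UTF-8 byte order mark, or transcodes UTF-16 input that
// starts with a BOM into UTF-8. Input without a BOM is returned unchanged.
func toUTF8(b []byte) []byte {
	var order binary.ByteOrder
	switch {
	case bytes.HasPrefix(b, []byte{0xEF, 0xBB, 0xBF}):
		return b[3:] // UTF-8 BOM: drop it, the rest is already UTF-8
	case bytes.HasPrefix(b, []byte{0xFF, 0xFE}):
		order = binary.LittleEndian
	case bytes.HasPrefix(b, []byte{0xFE, 0xFF}):
		order = binary.BigEndian
	default:
		return b
	}
	b = b[2:] // skip the UTF-16 BOM
	units := make([]uint16, 0, len(b)/2)
	for i := 0; i+1 < len(b); i += 2 {
		units = append(units, order.Uint16(b[i:]))
	}
	return []byte(string(utf16.Decode(units)))
}

func main() {
	// The first bytes of the hexdump above: BOM, then "<bo" in UTF-16LE.
	in := []byte{0xFF, 0xFE, '<', 0, 'b', 0, 'o', 0}
	fmt.Printf("%s\n", toUTF8(in)) // prints "<bo"
}
```

The converted bytes can then be handed to any reader that expects UTF-8. If the document also declares a non-UTF-8 encoding in its XML declaration, encoding/xml additionally requires a CharsetReader to be set on the decoder.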

  • New line after BOM

    The doc.WriteTo is adding an extra new line after the BOM. I've illustrated it with et_dump.go and et_dump.xml, which you can find under https://github.com/suntong/lang/blob/master/lang/Go/src/xml/.

    Here is the result:

    $ go run et_dump.go | diff -wU 1 et_dump.xml -
    --- et_dump.xml 2016-03-08 16:40:41.667010100 -0500
    +++ -   2016-03-08 16:40:57.842603083 -0500
    @@ -1,4 +1,4 @@
    -<?xml version="1.0" encoding="utf-8"?>
    +
    +<?xml version="1.0" encoding="utf-8"?>
     <bookstore xmlns:p="urn:schemas-books-com:prices">
    -
       <book category="COOKING">
    @@ -9,3 +9,2 @@
       </book>
    -
       <book category="CHILDREN">
    ...
    @@ -34,3 +31,2 @@
       </book>
    -
     </bookstore>
    \ No newline at end of file
    

    I.e., an extra new line is added after the BOM. This seems to be a trivial issue, but it causes Microsoft Visual Studio to fail to recognize the webtest file that such a dump creates. :-(

    Please consider removing the added extra new line.

    Thanks

  • Don't deprecate InsertChild() please

    Deprecated: InsertChild is deprecated. Use InsertChildAt instead.

    Please don't deprecate InsertChild() because InsertChildAt won't work for my case --

    The xml file that I'm working on has a rigid format of where things are:

    <A attr=... >
      <B attr=... />
      <C attr=... />
      <D attr=... />
    </A>
    

    B comes before C which comes before D. I know the order doesn't matter to xml, but I'm tracking the file with version control so, I'd prefer as little change as possible.

    Whether I do doc.InsertChildAt(0, c) or doc.InsertChildAt(1, c), C is always inserted before B, whereas I need it after B but before D (after I've removed C beforehand).

    Was I using InsertChildAt incorrectly, or is InsertChild() just not replaceable in my case? Thx.

  • Adding some support for xml mixed content processing

    The idea is to move processing a bit closer to python's lxml - allowing for simpler xml mixed content processing. The only possible "breaking" change is resulting indentation - I had to rewrite indenting code removing superficial CharData items.

  • Unable to extract Attribute Value w/ Path

    Consider the following document partial (source: https://community.cablelabs.com/wiki/plugins/servlet/cablelabs/alfresco/download?id=8f900e8b-d1eb-4834-bd26-f04bd623c3d2 , Appendix I.1)

    <?xml version="1.0" ?>
    <ADI>
      <Metadata>
        <AMS Provider="InDemand" Product="First-Run" Asset_Name="The_Titanic" Version_Major="1" Version_Minor="0" Description="The Titanic asset package" Creation_Date="2002-01-11" Provider_ID="indemand.com" Asset_ID="UNVA2001081701004000" Asset_Class="package"/>
        <App_Data App="MOD" Name="Provider_Content_Tier" Value="InDemand1" />
        <App_Data App="MOD" Name="Metadata_Spec_Version" Value="CableLabsVod1.1" />
      </Metadata>
    </ADI>
    

    While I can use a Path like //AMS[@Asset_Class='package']/../App_Data[@Name='Provider_Content_Tier'] to get to a desired Element, I am not able to perform an xpath-style path search to extract just the data in the Value attribute for the identified elements as a []string. Most other XPath implementations support a path such as //AMS[@Asset_Class='package']/../App_Data[@Name='Provider_Content_Tier']/@Value to extract attribute values directly from the Path.

    This would be a really great feature to have to allow us to port a legacy app over to Go, without having to refactor our existing paths that perform the attribute extractions.

    I'll take a stab at implementing in the coming days.

  • Expected " but &quot;

    My XML file contains ". When I parse it and write it back out, &quot; replaces the ". Please tell me how to avoid this situation.

    For example:

    <test> this "is" test</test>
    The parsing result is:
    <test> this &quot;is&quot; test</test>
    
  • Unexpected <></> when adding outer layer

    I tried to wrap my XML in an outer element. Assume my XML is <Foo></Foo>. When I add the outer layer, an additional empty tag like <></> appears. How can I remove it?

    Here's my snippet

    func main() {
    	result := etree.NewDocument()
    
    	if err := result.ReadFromString("<Foo></Foo>"); err != nil {
    		log.Fatal(err.Error())
    	}
    
    	doc := etree.NewDocument()
    	root := doc.CreateElement("Envelope")
    	root.Space = "soap"
    
    	soapHeader := doc.CreateElement("Header")
    	soapHeader.Space = "soap"
    
    
    	soapBody := doc.CreateElement("Body")
    	soapBody.Space = "soap"
    
    	root.AddChild(soapHeader)
    	root.AddChild(soapBody)
    	root.AddChild(result)
    
    	doc.Indent(2)
    	println(doc.WriteToString())
    }
    

    and the result is

    <soap:Envelope>
      <soap:Header/>
      <soap:Body/>
      <><Foo/></>
    </soap:Envelope>
    

    I expected the result to be

    <soap:Envelope>
      <soap:Header/>
      <soap:Body/>
      <Foo/>
    </soap:Envelope>
    

    There's <></> between

  • Cannot parse XML with "&"

    If an XML file has "&" in it, the doc.ReadFromFile function fails like this: panic: XML syntax error on line 627: invalid character entity & (no semicolon)

    How can I solve it?
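A bare "&" is not well-formed XML, so the most reliable fix is to write it as &amp; in the source file. When the file cannot be changed, the standard encoding/xml decoder that etree builds on has a non-strict mode that leaves such an ampersand alone. The sketch below shows both behaviors with the standard library only; the helper name `charData` is hypothetical, and whether a given etree version exposes an equivalent read setting is an assumption to check against its documentation.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// charData concatenates all character data in doc. With strict set to false,
// the decoder tolerates a bare "&" instead of reporting a syntax error.
func charData(doc string, strict bool) (string, error) {
	dec := xml.NewDecoder(strings.NewReader(doc))
	dec.Strict = strict
	var sb strings.Builder
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return sb.String(), nil
		}
		if err != nil {
			return "", err
		}
		if cd, ok := tok.(xml.CharData); ok {
			sb.Write(cd)
		}
	}
}

func main() {
	doc := "<name>Tom & Jerry</name>"

	_, err := charData(doc, true)
	fmt.Println("strict mode failed:", err != nil)

	text, err := charData(doc, false)
	fmt.Println(text, err)
}
```

Non-strict mode also relaxes other checks, so it is best limited to input that is known to be malformed in this particular way.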

  • preserve CDATA sections

    Tee the parser's reads into a buffer so that we can inspect the raw data underlying xml.CharData tokens and see whether they start with a <![CDATA[ declaration.
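The idea of checking the raw bytes behind each character-data token can be sketched with the standard library alone. This is a simplified variant, not the pull request's code: it assumes the whole document is already in memory and uses the decoder's InputOffset to slice the source text, instead of teeing a stream into a buffer. The function name `cdataFlags` is hypothetical.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// cdataFlags reports, for each character-data token in doc, whether its raw
// source text began with a CDATA marker.
func cdataFlags(doc string) ([]bool, error) {
	dec := xml.NewDecoder(strings.NewReader(doc))
	var flags []bool
	start := int64(0) // offset where the next token begins
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return flags, nil
		}
		if err != nil {
			return nil, err
		}
		end := dec.InputOffset() // offset just past the token returned above
		if _, ok := tok.(xml.CharData); ok {
			raw := doc[start:end]
			flags = append(flags, strings.HasPrefix(raw, "<![CDATA["))
		}
		start = end
	}
}

func main() {
	flags, err := cdataFlags("<a><![CDATA[x < y]]><b>plain</b></a>")
	if err != nil {
		panic(err)
	}
	fmt.Println(flags) // prints [true false]
}
```

A writer that remembers these flags could emit the marked tokens as CDATA sections again instead of escaping their contents.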
