Parse and generate XML easily in Go

etree

The etree package is a lightweight, pure Go package that expresses XML in the form of an element tree. Its design was inspired by the Python ElementTree module.

Some of the package's capabilities and features:

  • Represents XML documents as trees of elements for easy traversal.
  • Imports, serializes, modifies or creates XML documents from scratch.
  • Writes and reads XML to/from files, byte slices, strings and io interfaces.
  • Performs simple or complex searches with lightweight XPath-like query APIs.
  • Auto-indents XML using spaces or tabs for better readability.
  • Implemented in pure Go; depends only on standard Go libraries.
  • Built on top of the Go encoding/xml package.

Creating an XML document

The following example creates an XML document from scratch using the etree package and outputs its indented contents to stdout.

doc := etree.NewDocument()
doc.CreateProcInst("xml", `version="1.0" encoding="UTF-8"`)
doc.CreateProcInst("xml-stylesheet", `type="text/xsl" href="style.xsl"`)

people := doc.CreateElement("People")
people.CreateComment("These are all known people")

jon := people.CreateElement("Person")
jon.CreateAttr("name", "Jon")

sally := people.CreateElement("Person")
sally.CreateAttr("name", "Sally")

doc.Indent(2)
doc.WriteTo(os.Stdout)

Output:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="style.xsl"?>
<People>
  <!--These are all known people-->
  <Person name="Jon"/>
  <Person name="Sally"/>
</People>

Reading an XML file

Suppose you have a file on disk called bookstore.xml containing the following data:

<bookstore xmlns:p="urn:schemas-books-com:prices">

  <book category="COOKING">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <p:price>30.00</p:price>
  </book>

  <book category="CHILDREN">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <p:price>29.99</p:price>
  </book>

  <book category="WEB">
    <title lang="en">XQuery Kick Start</title>
    <author>James McGovern</author>
    <author>Per Bothner</author>
    <author>Kurt Cagle</author>
    <author>James Linn</author>
    <author>Vaidyanathan Nagarajan</author>
    <year>2003</year>
    <p:price>49.99</p:price>
  </book>

  <book category="WEB">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <p:price>39.95</p:price>
  </book>

</bookstore>

This code reads the file's contents into an etree document.

doc := etree.NewDocument()
if err := doc.ReadFromFile("bookstore.xml"); err != nil {
    panic(err)
}

You can also read XML from a string, a byte slice, or an io.Reader.

Processing elements and attributes

This example illustrates several ways to access elements and attributes using etree selection queries.

root := doc.SelectElement("bookstore")
fmt.Println("ROOT element:", root.Tag)

for _, book := range root.SelectElements("book") {
    fmt.Println("CHILD element:", book.Tag)
    if title := book.SelectElement("title"); title != nil {
        lang := title.SelectAttrValue("lang", "unknown")
        fmt.Printf("  TITLE: %s (%s)\n", title.Text(), lang)
    }
    for _, attr := range book.Attr {
        fmt.Printf("  ATTR: %s=%s\n", attr.Key, attr.Value)
    }
}

Output:

ROOT element: bookstore
CHILD element: book
  TITLE: Everyday Italian (en)
  ATTR: category=COOKING
CHILD element: book
  TITLE: Harry Potter (en)
  ATTR: category=CHILDREN
CHILD element: book
  TITLE: XQuery Kick Start (en)
  ATTR: category=WEB
CHILD element: book
  TITLE: Learning XML (en)
  ATTR: category=WEB

Path queries

This example uses etree's path functions to select all book titles that fall into the category of 'WEB'. The double-slash prefix in the path causes the search for book elements to occur recursively; book elements may appear at any level of the XML hierarchy.

for _, t := range doc.FindElements("//book[@category='WEB']/title") {
    fmt.Println("Title:", t.Text())
}

Output:

Title: XQuery Kick Start
Title: Learning XML

This example finds the first book element under the root bookstore element and outputs the tag and text of each of its child elements.

for _, e := range doc.FindElements("./bookstore/book[1]/*") {
    fmt.Printf("%s: %s\n", e.Tag, e.Text())
}

Output:

title: Everyday Italian
author: Giada De Laurentiis
year: 2005
price: 30.00

This example finds all books with a price of 49.99 and outputs their titles.

path := etree.MustCompilePath("./bookstore/book[p:price='49.99']/title")
for _, e := range doc.FindElementsPath(path) {
    fmt.Println(e.Text())
}

Output:

XQuery Kick Start

Note that this example uses the FindElementsPath function, which takes a precompiled path object as its argument. Use precompiled paths when you plan to search with the same path more than once.

Other features

These are just a few examples of the things the etree package can do. See the documentation for a complete description of its capabilities.

Contributing

This project accepts contributions. Just fork the repo and submit a pull request!

Owner
Brett Vickers
Comments
  • Add support for the '|' operator in paths.

    Ok I added support for the OR operator in paths. Please try it out and let me know if it works for you.

    I tagged this change as v0, so you can pull this repo from gopkg.in/beevik/etree.v0

  • Add handling of xmlns namespaces

    When querying XML fragments like this:

    <root xmlns:N="namespace">
        <N:element2>v</N:element2>
    </root>
    

    It is often necessary to specify the full namespace name along with the tag, as in namespace:element2, because the N abbreviation can be changed arbitrarily by the user.

    Therefore, this pull request adds FullSpace to Element to represent the full namespace name. It also allows tag queries using syntax like {namespace}name in methods such as FindElements and SelectElement, to support complex namespaces like URLs.

  • Need an AddElement method

    I need a way to add an etree Element under another etree Element.

    Trying to explain in code:

    doc := etree.NewDocument()
    doc.ReadFromFile("bookstore.xml")
    root := doc.SelectElement("bookstore")
    
    

    Now the root is an etree Element under which are a bunch of <book> XML Elements.

    Suppose now I have

    docMore.ReadFromString(xmlMoreBooks)
    

    The question is how can I add docMore as new entries under the root etree Element?

    I think such a feature would be needed by others as well. Please consider adding it.

    Thanks

  • Problem parsing CDATA after newline

    Thanks a ton for this package - super useful for my work.

    I'm parsing some RSS feeds that wrap formatted HTML (post descriptions, content, etc.) in <![CDATA[ ... ]]> sections. It looks like when the CDATA section is preceded by a newline, the text can't be parsed out:

    	workingCDATAString := `
    	<rss>
    		<channel>
    			<item>
    		   		<summary><![CDATA[Sup]]></summary>
    			</item>
    		</channel>
    	</rss>
    	`
    
    	doc := etree.NewDocument()
    	doc.ReadFromString(workingCDATAString)
    	spew.Dump(doc.FindElement("rss").FindElement("channel").FindElement("item").FindElement("summary").Text())
    	// Output: (string) (len=3) "Sup"
    
    	brokenCDATAString := `
    	<rss>
    		<channel>
    			<item>
    		   		<summary>
    			 		<![CDATA[Sup]]>
    				</summary>
    			</item>
    		</channel>
    	</rss>
    	`
    	doc = etree.NewDocument()
    	doc.ReadFromString(brokenCDATAString)
    	spew.Dump(doc.FindElement("rss").FindElement("channel").FindElement("item").FindElement("summary").Text())
    	// Output: (string) (len=7) "\n\t\t\t \t\t"
    

    I'm not familiar with XML parsing enough to say that this isn't the intended behavior, but I would expect these two code blocks to output the same thing ("Sup"). Any ideas?

  • support canonical xml

    Thanks for creating this package. It's been very useful and saved me a lot of time.

    I'm working on a project that requires canonical XML, so that signatures and digest hashes over the XML doc can be validated. I've made a couple changes here that should allow me to do that without affecting the default behavior for anyone else. Specifically, this enables explicit end tags (http://www.w3.org/TR/xml-c14n#Example-SETags), and bypasses the default escaping so that I can implement my own (http://www.w3.org/TR/xml-c14n#Example-Chars).

    Let me know if this looks ok, or if you want it structured differently.

  • RemoveChildAt sometimes doesn't work

    Hi, I have been trying to add and delete rules from a pfsense XML config. Adding has been working great, but deleting using RemoveChildAt sometimes doesn't work.

    Here is my example code:

    package main
    
    import (
    	"fmt"
    
    	"github.com/beevik/etree"
    )
    
    const xml = `
    <pfsense>
        <filter>
            <rule>
                <created>
                    <username>testUser</username>
                </created>
            </rule>
            <rule>
                <created>
                    <username>testUser</username>
                </created>
            </rule>
            <rule>
                <created>
                    <username>admin</username>
                </created>
            </rule>
            <rule>
                <created>
                    <username>testUser</username>
                </created>
            </rule>
            <rule>
                <created>
                    <username>admin</username>
                </created>
            </rule>
        </filter>
    </pfsense>
    `
    
    func main() {
    	fmt.Println("Starting!")
    
    	doc := etree.NewDocument()
    	err := doc.ReadFromString(xml)
    	if err != nil {
    		panic(err)
    	}
    
    	fmt.Println("Original Rules:")
    	displayRules(doc)
    
    	fmt.Println("Deleting old Rules:")
    	deleteOldRules(doc)
    
    	doc.Indent(4)
    
    	fmt.Println("New Rules:")
    
    	displayRules(doc)
    }
    
    func displayRules(doc *etree.Document) {
    	for _, rule := range doc.FindElements("pfsense/filter/rule") {
    		fmt.Println("	Rule Element:", rule.Tag)
    		username := rule.FindElement("created/username")
    		if username != nil {
    			fmt.Println("		Username: ", username.Text())
    		}
    	}
    }
    
    func deleteOldRules(doc *etree.Document) {
    	oldrules := []int{}
    	for i, rule := range doc.FindElements("pfsense/filter/*") {
    		if rule.Tag == "rule" {
    			username := rule.FindElement("created/username")
    			if username == nil {
    				//rule has no user
    				continue
    			}
    			if username.Text() == "testUser" {
    				fmt.Println("Old rule at index: ", i)
    				oldrules = append([]int{i}, oldrules...)
    			}
    		}
    	}
    	for _, i := range oldrules {
    		fmt.Println("deleting index", i)
    		doc.FindElement("pfsense/filter").RemoveChildAt(i)
    	}
    }
    
    

    And this is the output I get:

    Starting!
    Original Rules:
            Rule Element: rule
                    Username:  testUser
            Rule Element: rule
                    Username:  testUser
            Rule Element: rule
                    Username:  admin
            Rule Element: rule
                    Username:  testUser
            Rule Element: rule
                    Username:  admin
    Deleting old Rules:
    Old rule at index:  0
    Old rule at index:  1
    Old rule at index:  3
    deleting index 3
    deleting index 1
    deleting index 0
    New Rules:
            Rule Element: rule
                    Username:  admin
            Rule Element: rule
                    Username:  testUser
            Rule Element: rule
                    Username:  admin
    

    This code should go through all rule elements in the filter element and check whether each rule has a "created" element with a child "username" element whose text is "testUser". If so, it records the index of the rule in a slice. This part works: it detected all 3 rules, at indexes 0, 1 and 3. It then tries to delete the rules by index. The loop itself runs as expected, deleting index 3 first, then 1, then 0. But when I loop over all the rules again, one testUser rule is still there.
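
    One possible explanation, offered as an assumption rather than a confirmed diagnosis: the indexes collected from FindElements count only elements, while RemoveChildAt counts every child token, including the whitespace between elements. The sketch below uses only the standard encoding/xml package (which etree is built on) to list the direct children of a small filter element. The helper name childKinds is hypothetical and not part of etree.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// childKinds lists the kind of each direct child token of the root element,
// in document order. Whitespace between elements shows up as "chardata".
func childKinds(doc string) ([]string, error) {
	dec := xml.NewDecoder(strings.NewReader(doc))
	var kinds []string
	depth := 0
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			break
		}
		if err != nil {
			return nil, err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			if depth == 1 {
				kinds = append(kinds, "element:"+t.Name.Local)
			}
			depth++
		case xml.EndElement:
			depth--
		case xml.CharData:
			if depth == 1 {
				kinds = append(kinds, "chardata")
			}
		}
	}
	return kinds, nil
}

func main() {
	kinds, err := childKinds("<filter>\n  <rule/>\n  <rule/>\n</filter>")
	if err != nil {
		panic(err)
	}
	for i, k := range kinds {
		fmt.Printf("child %d: %s\n", i, k)
	}
}
```

    Here the second rule is element number 1 but child number 3. If that is what happens in the report, removing each matched rule by reference instead of by position (for example with a RemoveChild-style method, if your etree version provides one) would avoid the mismatch.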

  • Please skip BOM

    When reading from a file (via ReadFrom() or ReadFromFile()), is it possible to skip the BOM (https://en.wikipedia.org/wiki/Byte_order_mark) character?

    Every file created by Microsoft tools under Windows has that character, which is very hard to get rid of. So it would be great if etree could skip it when reading from a file.

    The following file will fail:

    $ cat et_example.xml | hexdump -C
    00000000  ff fe 3c 00 62 00 6f 00  6f 00 6b 00 73 00 74 00  |..<.b.o.o.k.s.t.|
    00000010  6f 00 72 00 65 00 3e 00  0d 00 0a 00 20 00 3c 00  |o.r.e.>..... .<.|
    ...
    

    with the following error

    panic: XML syntax error on line 1: invalid UTF-8

    Hmm, wait, is it because of BOM or the UTF16 encoding?

    thx
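
    On the closing question: the hexdump starts with ff fe, which is the UTF-16 little-endian byte order mark, and every character is followed by a 00 byte, so the file appears to be UTF-16 rather than UTF-8 with a BOM. Below is a minimal sketch, using only the standard library, of converting such input to a UTF-8 string before handing it to ReadFromString. The helper name decodeUTF16 is hypothetical and not part of etree.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"unicode/utf16"
)

// decodeUTF16 converts input that starts with a UTF-16 byte order mark into
// a UTF-8 string. Input without a UTF-16 BOM is returned as-is, minus a
// leading UTF-8 BOM if one is present.
func decodeUTF16(b []byte) string {
	var order binary.ByteOrder
	switch {
	case bytes.HasPrefix(b, []byte{0xFF, 0xFE}):
		order = binary.LittleEndian
	case bytes.HasPrefix(b, []byte{0xFE, 0xFF}):
		order = binary.BigEndian
	default:
		return string(bytes.TrimPrefix(b, []byte{0xEF, 0xBB, 0xBF}))
	}
	b = b[2:] // drop the BOM itself
	units := make([]uint16, 0, len(b)/2)
	for i := 0; i+1 < len(b); i += 2 {
		units = append(units, order.Uint16(b[i:i+2]))
	}
	return string(utf16.Decode(units))
}

func main() {
	// Rebuild the first bytes of the hexdump: ff fe 3c 00 62 00 6f 00 ...
	raw := []byte{0xFF, 0xFE}
	for _, c := range []byte("<bookstore>") {
		raw = append(raw, c, 0x00)
	}
	fmt.Println(decodeUTF16(raw)) // <bookstore>
}
```

    If the file also declares a non-UTF-8 encoding in its XML declaration, the parser may need further handling beyond this conversion.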

  • New line after BOM

    doc.WriteTo is adding an extra new line after the BOM. I've illustrated it with et_dump.go and et_dump.xml, which you can find under https://github.com/suntong/lang/blob/master/lang/Go/src/xml/.

    Here is the result:

    $ go run et_dump.go | diff -wU 1 et_dump.xml -
    --- et_dump.xml 2016-03-08 16:40:41.667010100 -0500
    +++ -   2016-03-08 16:40:57.842603083 -0500
    @@ -1,4 +1,4 @@
    -<?xml version="1.0" encoding="utf-8"?>
    +
    +<?xml version="1.0" encoding="utf-8"?>
     <bookstore xmlns:p="urn:schemas-books-com:prices">
    -
       <book category="COOKING">
    @@ -9,3 +9,2 @@
       </book>
    -
       <book category="CHILDREN">
    ...
    @@ -34,3 +31,2 @@
       </book>
    -
     </bookstore>
    \ No newline at end of file
    

    I.e., an extra new line is added after the BOM. This seems like a trivial issue, but it causes Microsoft Visual Studio to fail to recognize the webtest file that such a dump creates. :-(

    Please consider removing the added extra new line.

    Thanks

  • Don't deprecate InsertChild() please

    Deprecated: InsertChild is deprecated. Use InsertChildAt instead.

    Please don't deprecate InsertChild() because InsertChildAt won't work for my case --

    The xml file that I'm working on has a rigid format of where things are:

    <A attr=... >
      <B attr=... />
      <C attr=... />
      <D attr=... />
    </A>
    

    B comes before C which comes before D. I know the order doesn't matter to xml, but I'm tracking the file with version control so, I'd prefer as little change as possible.

    Whether I call doc.InsertChildAt(0, c) or doc.InsertChildAt(1, c), C is always inserted before B, whereas I need it after B but before D (after I've removed C beforehand).

    Was I using InsertChildAt incorrectly, or is InsertChild() just not replaceable in my case? Thx.

  • Adding some support for xml mixed content processing

    The idea is to move processing a bit closer to Python's lxml, allowing for simpler XML mixed content processing. The only possible "breaking" change is the resulting indentation: I had to rewrite the indenting code to remove superfluous CharData items.

  • Unable to extract Attribute Value w/ Path

    Consider the following document partial (source: https://community.cablelabs.com/wiki/plugins/servlet/cablelabs/alfresco/download?id=8f900e8b-d1eb-4834-bd26-f04bd623c3d2 , Appendix I.1)

    <?xml version="1.0" ?>
    <ADI>
      <Metadata>
        <AMS Provider="InDemand" Product="First-Run" Asset_Name="The_Titanic" Version_Major="1" Version_Minor="0" Description="The Titanic asset package" Creation_Date="2002-01-11" Provider_ID="indemand.com" Asset_ID="UNVA2001081701004000" Asset_Class="package"/>
        <App_Data App="MOD" Name="Provider_Content_Tier" Value="InDemand1" />
        <App_Data App="MOD" Name="Metadata_Spec_Version" Value="CableLabsVod1.1" />
      </Metadata>
    </ADI>
    

    While I can use a Path like //AMS[@Asset_Class='package']/../App_Data[@Name='Provider_Content_Tier'] to get to a desired Element, I am not able to perform an XPath-style path search to extract just the data in the Value attribute for the identified elements as a []string. Most other XPath implementations support a path such as //AMS[@Asset_Class='package']/../App_Data[@Name='Provider_Content_Tier']/@Value to extract attribute values directly from the Path.

    This would be a really great feature to have to allow us to port a legacy app over to Go, without having to refactor our existing paths that perform the attribute extractions.

    I'll take a stab at implementing in the coming days.

  • Expected " but got &quot;

    My XML file contains " characters. After I parse it and write it back out, &quot; appears in place of each ". Please tell me how to avoid this.

    For example:

    <test> this "is" test</test>
    The parsing result is:
    <test> this &quot;is&quot; test</test>
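
    For context, &quot; is the predefined XML entity for the " character, so a conforming parser should read both forms as the same text. Below is a minimal sketch with the standard encoding/xml package (which etree is built on) that checks this. The helper name elementText and the note element are made up for illustration.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// elementText returns the character data of the single element in s.
func elementText(s string) (string, error) {
	var v struct {
		Text string `xml:",chardata"`
	}
	err := xml.Unmarshal([]byte(s), &v)
	return v.Text, err
}

func main() {
	plain, err := elementText(`<note> this "is" a note</note>`)
	if err != nil {
		panic(err)
	}
	escaped, err := elementText(`<note> this &quot;is&quot; a note</note>`)
	if err != nil {
		panic(err)
	}
	// Both inputs decode to the same text, quotes included.
	fmt.Println(plain == escaped) // true
}
```

    So the escaped output should be equivalent for any XML consumer. It only matters when the serialized bytes must match the input exactly.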
    
  • Unexpected <></> when adding outer layer

    I tried to wrap my XML in an outer layer. Assume my XML is <Foo></Foo>. When I add the outer layer, an additional empty tag like <></> appears. How do I remove it?

    Here's my snippet

    func main() {
    	result := etree.NewDocument()
    
    	if err := result.ReadFromString("<Foo></Foo>"); err != nil {
    		log.Fatal(err.Error())
    	}
    
    	doc := etree.NewDocument()
    	root := doc.CreateElement("Envelope")
    	root.Space = "soap"
    
    	soapHeader := doc.CreateElement("Header")
    	soapHeader.Space = "soap"
    
    
    	soapBody := doc.CreateElement("Body")
    	soapBody.Space = "soap"
    
    	root.AddChild(soapHeader)
    	root.AddChild(soapBody)
    	root.AddChild(result)
    
    	doc.Indent(2)
    	println(doc.WriteToString())
    }
    

    and the result is

    <soap:Envelope>
      <soap:Header/>
      <soap:Body/>
      <><Foo/></>
    </soap:Envelope>
    

    I expected the result is

    <soap:Envelope>
      <soap:Header/>
      <soap:Body/>
      <Foo/>
    </soap:Envelope>
    

    There's <></> between

  • Cannot parse XML with "&"

    If an XML file has "&" in it, the doc.ReadFromFile function fails like this: panic: XML syntax error on line 627: invalid character entity & (no semicolon)

    how to solve it?
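
    A bare & is not well-formed XML; in character data it has to be written as &amp;. Below is a minimal sketch with the standard encoding/xml package (which etree is built on) showing that the unescaped form is rejected and the escaped form is accepted. The helper names wellFormed and escape are hypothetical, and fixing the file at its source is assumed to be possible.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
)

// wellFormed returns nil if encoding/xml accepts s as a single element,
// and the parser's error otherwise.
func wellFormed(s string) error {
	var v struct {
		Text string `xml:",chardata"`
	}
	return xml.Unmarshal([]byte(s), &v)
}

// escape returns s with XML special characters replaced by entities.
func escape(s string) string {
	var buf bytes.Buffer
	if err := xml.EscapeText(&buf, []byte(s)); err != nil {
		return s
	}
	return buf.String()
}

func main() {
	raw := "Tom & Jerry"
	fmt.Println("unescaped parses:", wellFormed("<name>"+raw+"</name>") == nil)
	fmt.Println("escaped form:", escape(raw))
	fmt.Println("escaped parses:", wellFormed("<name>"+escape(raw)+"</name>") == nil)
}
```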

  • preserve CDATA sections

    Tee the parser's reads into a buffer so that we can inspect the raw data underlying xml.CharData tokens and see whether they start with a <![CDATA[ declaration.
