Generate a Go struct from XML.

zek

Zek is a prototype for creating a Go struct from an XML document. The resulting struct works best for reading XML (see also #14); to create XML, you might want to use something else.

It was developed at Leipzig University Library to shorten the time it takes to go from raw XML to a struct that gives Go programs access to the XML data.

Skip the fluff, just the code.

Given some XML, run:

$ curl -s https://raw.githubusercontent.com/miku/zek/master/fixtures/e.xml | zek -e
// Rss was generated 2018-08-30 20:24:14 by tir on sol.
type Rss struct {
    XMLName xml.Name `xml:"rss"`
    Text    string   `xml:",chardata"`
    Rdf     string   `xml:"rdf,attr"`
    Dc      string   `xml:"dc,attr"`
    Geoscan string   `xml:"geoscan,attr"`
    Media   string   `xml:"media,attr"`
    Gml     string   `xml:"gml,attr"`
    Taxo    string   `xml:"taxo,attr"`
    Georss  string   `xml:"georss,attr"`
    Content string   `xml:"content,attr"`
    Geo     string   `xml:"geo,attr"`
    Version string   `xml:"version,attr"`
    Channel struct {
        Text          string `xml:",chardata"`
        Title         string `xml:"title"`         // ESS New Releases (Display...
        Link          string `xml:"link"`          // http://tinyurl.com/ESSNew...
        Description   string `xml:"description"`   // New releases from the Ear...
        LastBuildDate string `xml:"lastBuildDate"` // Mon, 27 Nov 2017 00:06:35...
        Item          []struct {
            Text        string `xml:",chardata"`
            Title       string `xml:"title"`       // Surficial geology, Aberde...
            Link        string `xml:"link"`        // https://geoscan.nrcan.gc....
            Description string `xml:"description"` // Geological Survey of Cana...
            Guid        struct {
                Text        string `xml:",chardata"` // 304279, 306212, 306175, 3...
                IsPermaLink string `xml:"isPermaLink,attr"`
            } `xml:"guid"`
            PubDate       string   `xml:"pubDate"`      // Fri, 24 Nov 2017 00:00:00...
            Polygon       []string `xml:"polygon"`      // 64.0000 -98.0000 64.0000 ...
            Download      string   `xml:"download"`     // https://geoscan.nrcan.gc....
            License       string   `xml:"license"`      // http://data.gc.ca/eng/ope...
            Author        string   `xml:"author"`       // Geological Survey of Cana...
            Source        string   `xml:"source"`       // Geological Survey of Cana...
            SndSeries     string   `xml:"SndSeries"`    // Bedford Institute of Ocea...
            Publisher     string   `xml:"publisher"`    // Natural Resources Canada,...
            Edition       string   `xml:"edition"`      // prelim., surficial data m...
            Meeting       string   `xml:"meeting"`      // Geological Association of...
            Documenttype  string   `xml:"documenttype"` // serial, open file, serial...
            Language      string   `xml:"language"`     // English, English, English...
            Maps          string   `xml:"maps"`         // 1 map, 5 maps, Publicatio...
            Mapinfo       string   `xml:"mapinfo"`      // surficial geology, surfic...
            Medium        string   `xml:"medium"`       // on-line; digital, digital...
            Province      string   `xml:"province"`     // Nunavut, Northwest Territ...
            Nts           string   `xml:"nts"`          // 066B, 095J; 095N; 095O; 0...
            Area          string   `xml:"area"`         // Aberdeen Lake, Mackenzie ...
            Subjects      string   `xml:"subjects"`
            Program       string   `xml:"program"`       // GEM2: Geo-mapping for Ene...
            Project       string   `xml:"project"`       // Rae Province Project Mana...
            Projectnumber string   `xml:"projectnumber"` // 340521, 343202, 340557, 3...
            Abstract      string   `xml:"abstract"`      // This new surficial geolog...
            Links         string   `xml:"links"`         // Online - En ligne (PDF, 9...
            Readme        string   `xml:"readme"`        // readme | https://geoscan....
            PPIid         string   `xml:"PPIid"`         // 34532, 35096, 35438, 2563...
        } `xml:"item"`
    } `xml:"channel"`
}

Online

Try it online at https://www.onlinetool.io/xmltogo/ -- thanks, kjk!

About

Project Status: Active. The project has reached a stable, usable state and is being actively developed.

Upsides:

  • it works fine for non-recursive structures,
  • does not need XSD or DTD,
  • it is relatively convenient to access attributes, children and text,
  • will generate a single struct, which makes for a fairly compact representation,
  • simple user interface,
  • comments with examples,
  • schema inference across multiple files.

Downsides:

  • experimental, early, buggy, unstable prototype,
  • no support for recursive types (similar to the Russian Doll strategy, [1])
  • no type inference; everything is accessible as a string.

Bugs:

Mapping between XML elements and data structures is inherently flawed: an XML element is an order-dependent collection of anonymous values, while a data structure is an order-independent collection of named values.

https://golang.org/pkg/encoding/xml/#pkg-note-BUG

Related projects:

Presentations:

Install

$ go install github.com/miku/zek/cmd/...@latest

Debian and RPM packages:

It's in AUR, too.

Usage

$ zek -h
Usage of zek:
  -C    emit less compact struct
  -F    skip formatting
  -c    emit more compact struct (noop, as this is the default since 0.1.7)
  -d    debug output
  -e    add comments with example
  -j    add JSON tags
  -max-examples int
        limit number of examples (default 10)
  -n string
        use a different name for the top-level struct
  -p    write out an example program
  -s    strict parsing and writing
  -t string
        emit struct for tag matching this name
  -u    filter out duplicated examples
  -version
        show version
  -x int
        max chars for example (default 25)
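The -e, -max-examples, -x and -u flags control the example comments. Judging from the output shown in this README (for example "// 304279, 306212, 306175, 3..."), a comment appears to be the collected values joined with ", " and cut to -x characters, followed by "...". The following is a minimal sketch of that behaviour under those assumptions; `exampleComment` is a hypothetical name, not zek's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// exampleComment joins up to maxExamples values with ", ", optionally
// skipping values that were already seen (like -u), and cuts the result
// to maxChars runes (like -x), appending "..." when text was cut off.
func exampleComment(values []string, maxExamples, maxChars int, unique bool) string {
	seen := make(map[string]bool)
	var picked []string
	for _, v := range values {
		if len(picked) == maxExamples {
			break
		}
		if unique {
			if seen[v] {
				continue
			}
			seen[v] = true
		}
		picked = append(picked, v)
	}
	s := strings.Join(picked, ", ")
	if r := []rune(s); len(r) > maxChars {
		s = string(r[:maxChars]) + "..."
	}
	return s
}

func main() {
	vals := []string{"v2", "v2", "v2", "v2", "v2", "v2", "v2", "v2", "v2", "v2", "v2"}
	fmt.Println("// " + exampleComment(vals, 10, 25, false)) // without -u
	fmt.Println("// " + exampleComment(vals, 10, 25, true))  // with -u
}
```

Without deduplication the first line reproduces the "v2, v2, v2, v2, v2, v2, v..." pattern discussed in the issues below; with it, the comment shrinks to "v2".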

Examples:

$ cat fixtures/a.xml
<a></a>

$ zek -C < fixtures/a.xml
type A struct {
    XMLName xml.Name `xml:"a"`
    Text    string   `xml:",chardata"`
}

Debug output dumps the internal tree as JSON to stdout.

$ zek -d < fixtures/a.xml
{"name":{"Space":"","Local":"a"}}

Example program:

package main

import (
	"encoding/json"
	"encoding/xml"
	"fmt"
	"log"
	"os"
)

// A was generated 2017-12-05 17:35:21 by tir on apollo.
type A struct {
	XMLName xml.Name `xml:"a"`
	Text    string   `xml:",chardata"`
}

func main() {
	dec := xml.NewDecoder(os.Stdin)
	var doc A
	if err := dec.Decode(&doc); err != nil {
		log.Fatal(err)
	}
	b, err := json.Marshal(doc)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(b))
}

$ zek -C -p < fixtures/a.xml > sample.go && go run sample.go < fixtures/a.xml | jq . && rm sample.go
{
  "XMLName": {
    "Space": "",
    "Local": "a"
  },
  "Text": ""
}

More complex example:

$ zek < fixtures/d.xml
// Root was generated 2019-06-11 16:27:04 by tir on hayiti.
type Root struct {
        XMLName xml.Name `xml:"root"`
        Text    string   `xml:",chardata"`
        A       []struct {
                Text string `xml:",chardata"`
                B    []struct {
                        Text string `xml:",chardata"`
                        C    string `xml:"c"`
                        D    string `xml:"d"`
                } `xml:"b"`
        } `xml:"a"`
}

$ zek -p < fixtures/d.xml > sample.go && go run sample.go < fixtures/d.xml | jq . && rm sample.go
{
  "XMLName": {
    "Space": "",
    "Local": "root"
  },
  "Text": "\n\n\n\n",
  "A": [
    {
      "Text": "\n  \n  \n",
      "B": [
        {
          "Text": "\n    \n  ",
          "C": "Hi",
          "D": ""
        },
        {
          "Text": "\n    \n    \n  ",
          "C": "World",
          "D": ""
        }
      ]
    },
    {
      "Text": "\n  \n",
      "B": [
        {
          "Text": "\n    \n  ",
          "C": "Hello",
          "D": ""
        }
      ]
    },
    {
      "Text": "\n  \n",
      "B": [
        {
          "Text": "\n    \n  ",
          "C": "",
          "D": "World"
        }
      ]
    }
  ]
}

Annotate with comments:

$ zek -e < fixtures/l.xml
// Records was generated 2019-06-11 16:29:35 by tir on hayiti.
type Records struct {
        XMLName xml.Name `xml:"Records"`
        Text    string   `xml:",chardata"` // \n
        Xsi     string   `xml:"xsi,attr"`
        Record  []struct {
                Text   string `xml:",chardata"`
                Header struct {
                        Text       string `xml:",chardata"`
                        Status     string `xml:"status,attr"`
                        Identifier string `xml:"identifier"` // oai:ojs.localhost:article...
                        Datestamp  string `xml:"datestamp"`  // 2009-06-24T14:48:23Z, 200...
                        SetSpec    string `xml:"setSpec"`    // eppp:ART, eppp:ART, eppp:...
                } `xml:"header"`
                Metadata struct {
                        Text    string `xml:",chardata"`
                        Rfc1807 struct {
                                Text           string   `xml:",chardata"`
                                Xmlns          string   `xml:"xmlns,attr"`
                                Xsi            string   `xml:"xsi,attr"`
                                SchemaLocation string   `xml:"schemaLocation,attr"`
                                BibVersion     string   `xml:"bib-version"`  // v2, v2, v2...
                                ID             string   `xml:"id"`           // http://jou...
                                Entry          string   `xml:"entry"`        // 2009-06-24...
                                Organization   []string `xml:"organization"` // Proceeding...
                                Title          string   `xml:"title"`        // Introducti...
                                Type           string   `xml:"type"`
                                Author         []string `xml:"author"`       // KRAMPEN, G..
                                Copyright      string   `xml:"copyright"`    // Das Urhebe...
                                OtherAccess    string   `xml:"other_access"` // url:http:/...
                                Keyword        string   `xml:"keyword"`
                                Period         []string `xml:"period"`
                                Monitoring     string   `xml:"monitoring"`
                                Language       string   `xml:"language"` // en, en, en, e...
                                Abstract       string   `xml:"abstract"` // After a short...
                                Date           string   `xml:"date"`     // 2009-06-22 12...
                        } `xml:"rfc1807"`
                } `xml:"metadata"`
                About string `xml:"about"`
        } `xml:"Record"`
}

Only consider a nested element

$ zek -t metadata fixtures/z.xml
// Metadata was generated 2019-06-11 16:33:26 by tir on hayiti.
type Metadata struct {
        XMLName xml.Name `xml:"metadata"`
        Text    string   `xml:",chardata"`
        Dc      struct {
                Text  string `xml:",chardata"`
                Xmlns string `xml:"xmlns,attr"`
                Title struct {
                        Text  string `xml:",chardata"`
                        Xmlns string `xml:"xmlns,attr"`
                } `xml:"title"`
                Identifier struct {
                        Text  string `xml:",chardata"`
                        Xmlns string `xml:"xmlns,attr"`
                } `xml:"identifier"`
                Rights struct {
                        Text  string `xml:",chardata"`
                        Xmlns string `xml:"xmlns,attr"`
                        Lang  string `xml:"lang,attr"`
                } `xml:"rights"`
                AccessRights struct {
                        Text  string `xml:",chardata"`
                        Xmlns string `xml:"xmlns,attr"`
                } `xml:"accessRights"`
        } `xml:"dc"`
}

Inference across files

$ zek fixtures/a.xml fixtures/b.xml fixtures/c.xml
// A was generated 2017-12-05 17:40:14 by tir on apollo.
type A struct {
	XMLName xml.Name `xml:"a"`
	Text    string   `xml:",chardata"`
	B       []struct {
		Text string `xml:",chardata"`
	} `xml:"b"`
}

Inference across files is also useful if you deal with archives containing XML files:

$ unzip -p 4082359.zip '*.xml' | zek -e

Given a directory full of zip files, you can combine find, unzip and zek:

$ for i in $(find ftp/b571 -type f -name "*zip"); do unzip -p $i '*xml'; done | zek -e

Another example (tarball with thousands of XML files, seemingly MARC):

$ tar -xOzf /tmp/20180725.125255.tar.gz | zek -e
// OAIPMH was generated 2018-09-26 15:03:29 by tir on sol.
type OAIPMH struct {
        XMLName        xml.Name `xml:"OAI-PMH"`
        Text           string   `xml:",chardata"`
        Xmlns          string   `xml:"xmlns,attr"`
        Xsi            string   `xml:"xsi,attr"`
        SchemaLocation string   `xml:"schemaLocation,attr"`
        ListRecords    struct {
                Text   string `xml:",chardata"`
                Record struct {
                        Text   string `xml:",chardata"`
                        Header struct {
                                Text       string `xml:",chardata"`
                                Identifier struct {
                                        Text string `xml:",chardata"` // aleph-pub:000000001, ...
                                } `xml:"identifier"`
                        } `xml:"header"`
                        Metadata struct {
                                Text   string `xml:",chardata"`
                                Record struct {
                                        Text           string `xml:",chardata"`
                                        Xmlns          string `xml:"xmlns,attr"`
                                        Xsi            string `xml:"xsi,attr"`
                                        SchemaLocation string `xml:"schemaLocation,attr"`
                                        Leader         struct {
                                                Text string `xml:",chardata"` // 00001nM2.01200024
                                        } `xml:"leader"`
                                        Controlfield []struct {
                                                Text string `xml:",chardata"` // 00001nM2.01200024
                                                Tag  string `xml:"tag,attr"`
                                        } `xml:"controlfield"`
                                        Datafield []struct {
                                                Text     string `xml:",chardata"`
                                                Tag      string `xml:"tag,attr"`
                                                Ind1     string `xml:"ind1,attr"`
                                                Ind2     string `xml:"ind2,attr"`
                                                Subfield []struct {
                                                        Text string `xml:",chardata"` // KM0000002
                                                        Code string `xml:"code,attr"`
                                                } `xml:"subfield"`
                                        } `xml:"datafield"`
                                } `xml:"record"`
                        } `xml:"metadata"`
                } `xml:"record"`
        } `xml:"ListRecords"`
}

Misc

As a side effect, zek seems to be useful for debugging. Example:

This record is emitted by a typical OAI server (OJS, which is far from uncommon), yet one can quickly spot the flaw in its structure.

Over 30 different structs were generated manually in the course of a few hours (around five minutes per source): https://git.io/vbTDo.

The largest struct so far: 1532 lines.

Comments
  • Feature request: provide a flag to specify the output file (make it usable with go:generate)

    We would like to use zek with go generate. For this to work, we need a way to specify the target file instead of printing the result to stdout. An additional flag for specifying the output file would be very helpful.

  • crash if user.Current() returns an error

    This happened on a Linux machine.

    In:

    func NewStructWriter(w io.Writer) *StructWriter {
    	// Some info for banner.
    	usr, err := user.Current()
    	if err != nil {
    		usr.Name = "an unknown user"
    	}
    

    If user.Current() returns an error, usr is nil, so assigning to usr.Name will panic. It could be something like:

    	userName := "an unknown user"
    	usr, _ := user.Current()
    	if usr != nil {
    		userName = usr.Name
    	}
    
  • Consider prebuilt releases for the major platforms

    It would be very convenient if this project could provide prebuilt releases for the major platforms (in particular, I am interested in linux amd64 releases for simple integration in Docker containers), e.g. via GoReleaser.

    GoReleaser also supports building RPM and Debian packages, so the whole release process could be simplified and streamlined.

  • Issue with the debian package

    Having installed the zek_0.1.10_amd64.deb file:

    $ zek
    bash: zek: command not found
    
    $ /usr/sbin/zek
    bash: /usr/sbin/zek: Permission denied
    

    The reason:

    $ ls -l /usr/sbin/zek
    -rw-r--r-- 1 root root 6377472 2020-11-04 06:07 /usr/sbin/zek
    
  • Add flag to make examples unique

    It would be nice to have an additional flag, e.g. -u, to make examples unique. For now, zek output may look like this:

    	Text string `xml:",chardata"` // v2, v2, v2, v2, v2, v2, v...
    

    which does not provide any additional information about possible values for this field. In my opinion, a result like this is good enough:

    	Text string `xml:",chardata"` // v2
    

    I can create a PR with this feature if you think it's a good idea.

  • xml attribute with underscore and number breaks zek

    $ echo '<Yo _30day="aoeu" />' | zek
    2021/08/16 19:36:06 5:1: expected '}', found 30
    

    The resulting struct is something like:

    type Yo struct {
      30day string
    }
    

    which makes go fmt fail with the "expected '}'" error, since Go identifiers (and therefore struct field names) cannot start with a digit.

  • Optionally make structs more compact.

    Optionally turn:

    Postcode struct {
        Text string `xml:",chardata"`
    } `xml:"postcode"`
    

    Into:

    Postcode string `xml:"postcode"`
    
  • Increase perf by removing reflection

    There is a way to parse XML and bind it to a struct without using any reflection.

    It can also be extended to generate Go structs from XML.

    If you're interested, let me know. The library is used by many Go XML projects and is battle-proven.

  • added named struct extraction flag

    Hello!

    Here's my suggestion for a flag allowing non-nested structs, as discussed in #14. The diff is a bit noisy, but essentially the flag takes a slice of names; the struct writer then checks whether the node about to be written as an anonymous struct is contained in this slice and, if so, writes the capitalized version of the name instead.

    Let me know what you think!

  • Feature request: Optionally generate non-nested structs

    Right now zek generates nested structs, with one struct being nested in another.

    The downside of this is that it is (apparently) not possible to attach methods to the nested struct.

    Hence the question, can zek be extended to optionally generate non-nested structs?

    Real-life code example: https://github.com/probonopd/go-scribus/blob/48984ecccda9be0d30a4e7cb5be50670520f7dd2/scribus.go#L740-L748

  • xml: unsupported version "1.1"; only version 1.0 is supported

    zek < config.xml 
    2018/12/12 19:32:58 xml: unsupported version "1.1"; only version 1.0 is supported
    
    
    <?xml version='1.1' encoding='UTF-8'?>
    
    
  • Issue with debian package (Windows Subsystem for Linux)

    dpkg-deb: error: archive 'zek_0.1.5_amd64.deb' contains premature member 'control.tar.xz' before 'control.tar.gz', giving up. dpkg: error processing archive zek_0.1.5_amd64.deb (--install): subprocess dpkg-deb --control returned error exit status 2. Errors were encountered while processing: zek_0.1.5_amd64.deb
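The fix suggested in the issue about user.Current() above can be written as a small, self-contained helper. This is a sketch, not zek's code: `bannerUserName` is a hypothetical name, and falling back to the login name when the full name is empty is an extra assumption beyond the issue's suggestion.

```go
package main

import (
	"fmt"
	"os/user"
)

// bannerUserName returns a name for the "generated by" banner without
// dereferencing a nil *user.User when user.Current fails.
func bannerUserName() string {
	name := "an unknown user"
	usr, err := user.Current()
	if err != nil || usr == nil {
		return name
	}
	switch {
	case usr.Name != "":
		name = usr.Name
	case usr.Username != "":
		// Assumption: use the login name if the full name field is empty.
		name = usr.Username
	}
	return name
}

func main() {
	fmt.Println(bannerUserName())
}
```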

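The "_30day" issue above comes down to turning XML names into valid, exported Go identifiers. The sketch below shows one possible mapping. The rules are assumptions for illustration (non-alphanumeric characters act as word boundaries, and a leading digit gets an "X" prefix); `goFieldName` is a hypothetical name and this page does not say how zek itself handles such names.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// goFieldName turns an XML element or attribute name into an exported Go
// identifier. Non-alphanumeric runes (such as '-', '.', ':' and '_') are
// treated as word boundaries, and a leading digit gets an "X" prefix.
func goFieldName(name string) string {
	var b strings.Builder
	upperNext := true
	for _, r := range name {
		if !unicode.IsLetter(r) && !unicode.IsDigit(r) {
			upperNext = true
			continue
		}
		if upperNext {
			r = unicode.ToUpper(r)
			upperNext = false
		}
		b.WriteRune(r)
	}
	s := b.String()
	if s == "" {
		return "Field"
	}
	if first := []rune(s)[0]; unicode.IsDigit(first) {
		s = "X" + s
	}
	return s
}

func main() {
	for _, n := range []string{"_30day", "bib-version", "other_access", "OAI-PMH"} {
		fmt.Println(n, "->", goFieldName(n))
	}
}
```

For "bib-version", "other_access" and "OAI-PMH" this yields BibVersion, OtherAccess and OAIPMH, the same names that appear in the generated structs earlier in this README, while "_30day" becomes X30day instead of an invalid identifier.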
Gosaxml is a streaming XML decoder and encoder, similar in interface to the encoding/xml

gosaxml is a streaming XML decoder and encoder, similar in interface to the encoding/xml, but with a focus on performance, low memory footprint and on

Aug 21, 2022
XML to MAP converter written Golang

xml2map XML to MAP converter written Golang Sometimes there is a need for the representation of previously unknown structures. Such a universal repres

Dec 8, 2022
xmlwriter is a pure-Go library providing procedural XML generation based on libxml2's xmlwriter module

xmlwriter xmlwriter is a pure-Go library providing a procedural XML generation API based on libxml2's xmlwriter module. The package is extensively doc

Sep 27, 2022
XPath package for Golang, supports HTML, XML, JSON document query.

XPath XPath is Go package provides selecting nodes from XML, HTML or other documents using XPath expression. Implementation htmlquery - an XPath query

Dec 28, 2022
Extract data or evaluate value from HTML/XML documents using XPath

xquery NOTE: This package is deprecated. Recommends use htmlquery and xmlquery package, get latest version to fixed some issues. Overview Golang packa

Nov 9, 2022
☄ The golang convenient converter supports Database to Struct, SQL to Struct, and JSON to Struct.

Gormat - Cross platform gopher tool The golang convenient converter supports Database to Struct, SQL to Struct, and JSON to Struct. 中文说明 Features Data

Dec 20, 2022
Go encoding/xml package that improves support for XML namespaces

encoding/xml with namespaces This is a fork of the Go encoding/xml package that improves support for XML namespaces, kept in sync with golang/go#48641

Nov 11, 2022
Convert xml and json to go struct

xj2go The goal is to convert xml or json file to go struct file. Usage Download and install it: $ go get -u -v github.com/wk30/xj2go/cmd/... $ xj [-t

Oct 23, 2022
parse and generate XML easily in go

etree The etree package is a lightweight, pure go package that expresses XML in the form of an element tree. Its design was inspired by the Python Ele

Dec 30, 2022
goconfig uses a struct as input and populates the fields of this struct with parameters from command line, environment variables and configuration file.

goconfig goconfig uses a struct as input and populates the fields of this struct with parameters from command line, environment variables and configur

Dec 15, 2022
Pagser is a simple, extensible, configurable parse and deserialize html page to struct based on goquery and struct tags for golang crawler

Pagser Pagser inspired by page parser。 Pagser is a simple, extensible, configurable parse and deserialize html page to struct based on goquery and str

Dec 13, 2022
Match regex group into go struct using struct tags and automatic parsing

regroup Simple library to match regex expression named groups into go struct using struct tags and automatic parsing Installing go get github.com/oris

Nov 5, 2022
:100:Go Struct and Field validation, including Cross Field, Cross Struct, Map, Slice and Array diving

Package validator Package validator implements value validations for structs and individual fields based on tags. It has the following unique features

Jan 1, 2023
Copier for golang, copy value from struct to struct and more

Copier I am a copier, I copy everything from one to another Features Copy from field to field with same name Copy from method to field with same name

Jan 8, 2023
Go generator to copy values from type to type and fields from struct to struct. Copier without reflection.

Copygen is a command-line code generator that generates type-to-type and field-to-field struct code without adding any reflection or dependenc

Dec 29, 2022
Automatically generate Go (golang) struct definitions from example JSON

gojson gojson generates go struct definitions from json or yaml documents. Example $ curl -s https://api.github.com/repos/chimeracoder/gojson | gojson

Jan 1, 2023