Package feed implements a flexible, robust and efficient RSS and Atom parser

Feed Parser (RSS, Atom)

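If you just want to parse some bytes quickly into an object without caring about the underlying feed type, start with Simple Use.

If you want to take a deeper dive into how you can customize the parser behavior, see the later sections: Extending BasicFeed, Robustness and recovery from bad input, Parse with specification compliancy checking, and Rss and Atom extensions.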

Installation & Use

Get the pkg

go get github.com/jloup/xml

Use it in code

import "github.com/jloup/xml/feed"

Simple Use : feed.Parse(io.Reader, feed.DefaultOptions)

Example:

f, err := os.Open("feed.txt")

if err != nil {
    return
}

myfeed, err := feed.Parse(f, feed.DefaultOptions)

if err != nil {
    fmt.Printf("Cannot parse feed: %s\n", err)
    return
}

fmt.Printf("FEED '%s'\n", myfeed.Title)
for i, entry := range myfeed.Entries {
    fmt.Printf("\t#%v '%s' (%s)\n\t\t%s\n\n", i, entry.Title,
                                                 entry.Link,
                                                 entry.Summary)
}

Output:

FEED 'Me, Myself and I'
	#0 'Breakfast' (http://example.org/2005/04/02/breakfast)
		eggs and bacon, yup !

	#1 'Dinner' (http://example.org/2005/04/02/dinner)
		got soap delivered !

feed.Parse returns a BasicFeed, whose fields are:

// Rss channel or Atom feed
type BasicFeed struct {
  Title   string
  Id      string // Atom:feed:id | RSS:channel:link 
  Date    time.Time
  Image   string // Atom:feed:logo:iri | RSS:channel:image:url
  Entries []BasicEntryBlock
}

type BasicEntryBlock struct {
	Title   string
	Link    string
	Date    time.Time // Atom:entry:updated | RSS:item:pubDate
	Id      string // Atom:entry:id | RSS:item:guid
	Summary string
}
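
The field comments above describe how Atom and RSS elements are folded into one common shape. The following is a minimal sketch of that idea using only the standard encoding/xml package. It is not the package's implementation: Entry, atomEntry, rssItem, fromAtom and fromRSS are hypothetical names, and the Link mapping (Atom link href, RSS link text) is an assumption made for the illustration.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Entry is a hypothetical stand-in for a format-independent entry.
type Entry struct {
	Title, Link, ID string
}

// atomEntry mirrors the few Atom elements used in this sketch.
type atomEntry struct {
	Title string `xml:"title"`
	ID    string `xml:"id"`
	Link  struct {
		Href string `xml:"href,attr"`
	} `xml:"link"`
}

// rssItem mirrors the few RSS elements used in this sketch.
type rssItem struct {
	Title string `xml:"title"`
	GUID  string `xml:"guid"`
	Link  string `xml:"link"`
}

// fromAtom maps Atom entry:id to Entry.ID (link mapping is an assumption).
func fromAtom(src string) (Entry, error) {
	var e atomEntry
	if err := xml.Unmarshal([]byte(src), &e); err != nil {
		return Entry{}, err
	}
	return Entry{Title: e.Title, Link: e.Link.Href, ID: e.ID}, nil
}

// fromRSS maps RSS item:guid to Entry.ID (link mapping is an assumption).
func fromRSS(src string) (Entry, error) {
	var i rssItem
	if err := xml.Unmarshal([]byte(src), &i); err != nil {
		return Entry{}, err
	}
	return Entry{Title: i.Title, Link: i.Link, ID: i.GUID}, nil
}

func main() {
	a, err := fromAtom(`<entry><title>Breakfast</title><id>urn:example:1</id><link href="http://example.org/2005/04/02/breakfast"/></entry>`)
	if err != nil {
		fmt.Println("atom:", err)
		return
	}
	r, err := fromRSS(`<item><title>Breakfast</title><guid>urn:example:1</guid><link>http://example.org/2005/04/02/breakfast</link></item>`)
	if err != nil {
		fmt.Println("rss:", err)
		return
	}
	fmt.Printf("%+v\n%+v\n", a, r)
}
```

Both inputs end up in the same Entry value, which is what lets calling code ignore the underlying feed type.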

Extending BasicFeed

BasicFeed is a really basic struct implementing the feed.UserFeed interface. You may want to access more of the values extracted from feeds. For this purpose, you can pass your own implementation of feed.UserFeed to feed.ParseCustom.

type UserFeed interface {
    PopulateFromAtomFeed(f *atom.Feed) // see github.com/jloup/xml/feed/atom
    PopulateFromAtomEntry(e *atom.Entry)
    PopulateFromRssChannel(c *rss.Channel) // see github.com/jloup/xml/feed/rss
    PopulateFromRssItem(i *rss.Item)
}

func ParseCustom(r io.Reader, feed UserFeed, options ParseOptions) error

To avoid starting from scratch, you can embed feed.BasicEntryBlock and feed.BasicFeedBlock in your structs.

Example:

type MyFeed struct {
	feed.BasicFeedBlock
	Generator string
	Entries   []feed.BasicEntryBlock
}

func (m *MyFeed) PopulateFromAtomFeed(f *atom.Feed) {
	m.BasicFeedBlock.PopulateFromAtomFeed(f)

	m.Generator = fmt.Sprintf("%s V%s", f.Generator.Uri.String(), 
	                                    f.Generator.Version.String())
}

func (m *MyFeed) PopulateFromRssChannel(c *rss.Channel) {
	m.BasicFeedBlock.PopulateFromRssChannel(c)

	m.Generator = c.Generator.String()
}

func (m *MyFeed) PopulateFromAtomEntry(e *atom.Entry) {
	newEntry := feed.BasicEntryBlock{}
	newEntry.PopulateFromAtomEntry(e)
	m.Entries = append(m.Entries, newEntry)
}

func (m *MyFeed) PopulateFromRssItem(i *rss.Item) {
	newEntry := feed.BasicEntryBlock{}
	newEntry.PopulateFromRssItem(i)
	m.Entries = append(m.Entries, newEntry)
}

func main() {
    f, err := os.Open("feed.txt")

    if err != nil {
        return
    }

    myfeed := &MyFeed{}

    err = feed.ParseCustom(f, myfeed, feed.DefaultOptions)

    if err != nil {
        fmt.Printf("Cannot parse feed: %s\n", err)
        return
    }

    fmt.Printf("FEED '%s' generated with %s\n", myfeed.Title, myfeed.Generator)
    for i, entry := range myfeed.Entries {
        fmt.Printf("\t#%v '%s' (%s)\n", i, entry.Title, entry.Link)
    }
}

Output:

FEED 'Me, Myself and I' generated with http://www.atomgenerator.com/ V1.0
	#0 'Breakfast' (http://example.org/2005/04/02/breakfast)
	#1 'Dinner' (http://example.org/2005/04/02/dinner)
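
Embedding works here because Go promotes the methods of an embedded struct to the outer type, and the outer type can redefine any of them and still call the embedded version. The sketch below shows only that language mechanism with hypothetical stand-in types (atomFeed, basicBlock, populator, plainFeed, richFeed). None of these names come from the package.

```go
package main

import "fmt"

// atomFeed is a hypothetical stand-in for a parsed Atom feed.
type atomFeed struct {
	Title     string
	Generator string
}

// basicBlock is a hypothetical stand-in for an embeddable block.
type basicBlock struct {
	Title string
}

func (b *basicBlock) PopulateFromAtomFeed(f *atomFeed) {
	b.Title = f.Title
}

// populator is the interface both feed types must satisfy.
type populator interface {
	PopulateFromAtomFeed(f *atomFeed)
}

// plainFeed satisfies populator through the promoted method only.
type plainFeed struct {
	basicBlock
}

// richFeed redefines the method and delegates to the embedded one.
type richFeed struct {
	basicBlock
	Generator string
}

func (r *richFeed) PopulateFromAtomFeed(f *atomFeed) {
	r.basicBlock.PopulateFromAtomFeed(f) // keep the basic behavior
	r.Generator = f.Generator            // then add the extra field
}

func main() {
	src := &atomFeed{Title: "Me, Myself and I", Generator: "example"}
	plain, rich := &plainFeed{}, &richFeed{}
	for _, p := range []populator{plain, rich} {
		p.PopulateFromAtomFeed(src)
	}
	fmt.Printf("%q %q %q\n", plain.Title, rich.Title, rich.Generator)
}
```

This is why a custom feed only needs to define the Populate methods whose behavior it wants to change.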

Robustness and recovery from bad input

Feeds are widely used, and it is quite common for a single invalid character or a missing opening/closing tag to invalidate the whole feed. The standard encoding/xml package is quite pedantic (as it should be) about input XML.

To produce an output feed at all costs, you can set the number of times the parser should recover from invalid input via the XMLTokenErrorRetry field in ParseOptions. The strategy is simple: if the XML decoder returns an XMLTokenError while parsing, the faulty token is removed from the input and the parser retries building a feed from what remains. This is useful when invalid HTML or XML is present in a content tag (Atom), for example.

Example:

f, err := os.Open("testdata/invalid_atom.xml")
if err != nil {
  return
}

opt := feed.DefaultOptions
// allow the parser to drop one faulty token and retry
opt.XMLTokenErrorRetry = 1

_, err = feed.Parse(f, opt)

if err != nil {
  fmt.Printf("Cannot parse feed: %s\n", err)
} else {
  fmt.Println("no error")
}

Output:

no error

With XMLTokenErrorRetry set to 0, it would have produced the following error:

Cannot parse feed: [XMLTokenError] XML syntax error on line 574: illegal character code U+000C

Parse with specification compliancy checking

RSS and Atom feeds should conform to a specification (which is complex for Atom). By default, the Parse functions are not too restrictive about input feeds. To validate feeds, you can pass a custom FlagChecker to ParseOptions. If you really know what you are doing, you can enable or disable individual spec checks.

Error flags for each standard are listed in the corresponding package documentation:

  • RSS : github.com/jloup/xml/feed/rss
  • Atom : github.com/jloup/xml/feed/atom

Example:

// the input feed is not compliant to spec
f, err := os.Open("feed.txt")
if err != nil {
    return
}

// the input feed should be 100% compliant to spec...
flags := xmlutils.NewErrorChecker(xmlutils.EnableAllError)

//... but it is OK if Atom entry does not have <updated> field
flags.DisableErrorChecking("entry", atom.MissingDate)

options := feed.ParseOptions{extension.Manager{}, &flags}

myfeed, err := feed.Parse(f, options)

if err != nil {
    fmt.Printf("Cannot parse feed:\n%s\n", err)
    return
}

fmt.Printf("FEED '%s'\n", myfeed.Title)

Output:

Cannot parse feed:
in 'feed':
[MissingId]
	feed's id should exist
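
The pattern used in the example (turn every check on, then switch one off for a given element) can be pictured with a small self-contained model. Everything below is hypothetical and written only for illustration: errorFlag, checker, disable and validate are not the xmlutils API, and the real FlagChecker may be organized differently.

```go
package main

import "fmt"

// errorFlag identifies one spec rule (hypothetical).
type errorFlag int

const (
	missingID errorFlag = iota
	missingDate
)

var flagNames = map[errorFlag]string{
	missingID:   "MissingId",
	missingDate: "MissingDate",
}

// checker treats every rule as enabled unless it was disabled
// for a specific element name.
type checker struct {
	disabled map[string]map[errorFlag]bool
}

func newChecker() *checker {
	return &checker{disabled: map[string]map[errorFlag]bool{}}
}

// disable switches one rule off for one element name.
func (c *checker) disable(element string, f errorFlag) {
	if c.disabled[element] == nil {
		c.disabled[element] = map[errorFlag]bool{}
	}
	c.disabled[element][f] = true
}

// validate keeps only the violations whose rule is still enabled.
func (c *checker) validate(element string, found []errorFlag) []string {
	var out []string
	for _, f := range found {
		if !c.disabled[element][f] {
			out = append(out, fmt.Sprintf("in '%s': [%s]", element, flagNames[f]))
		}
	}
	return out
}

func main() {
	c := newChecker()
	c.disable("entry", missingDate) // tolerate entries without a date

	fmt.Println(c.validate("feed", []errorFlag{missingID}))
	fmt.Println(c.validate("entry", []errorFlag{missingDate}))
}
```

Keying the disabled set by element name is what lets a rule stay strict for the feed while being relaxed for its entries.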

Rss and Atom extensions

Both formats allow third-party extensions. Some extensions have been implemented as examples, e.g. RSS dc:creator (github.com/jloup/xml/feed/rss/extension/dc).

Example:

type ExtendedFeed struct {
    feed.BasicFeedBlock
    Entries []ExtendedEntry
}

type ExtendedEntry struct {
    feed.BasicEntryBlock
    Creator string // <dc:creator> only present in RSS feeds
}

func (f *ExtendedFeed) PopulateFromAtomEntry(e *atom.Entry) {
    newEntry := ExtendedEntry{}
    newEntry.PopulateFromAtomEntry(e)
    f.Entries = append(f.Entries, newEntry)
}

func (f *ExtendedFeed) PopulateFromRssItem(i *rss.Item) {
    newEntry := ExtendedEntry{}
    newEntry.PopulateFromRssItem(i)

    creator, ok := dc.GetCreator(i)
    // we must check the item actually has a dc:creator element
    if ok {
        newEntry.Creator = creator.String()
    }
    f.Entries = append(f.Entries, newEntry)
}

func main() {
    f, err := os.Open("rss.txt")

    if err != nil {
        return
    }

    //Manager is in github.com/jloup/xml/feed/extension
    manager := extension.Manager{}
    // we add the dc extension to it
    // dc extension is in "github.com/jloup/xml/feed/rss/extension/dc"
    dc.AddToManager(&manager)

    opt := feed.DefaultOptions
    //we pass our custom extension Manager to ParseOptions
    opt.ExtensionManager = manager

    myfeed := &ExtendedFeed{}
    err = feed.ParseCustom(f, myfeed, opt)

    if err != nil {
        fmt.Printf("Cannot parse feed: %s\n", err)
        return
    }

    fmt.Printf("FEED '%s'\n", myfeed.Title)
    for i, entry := range myfeed.Entries {
        fmt.Printf("\t#%v '%s' by %s (%s)\n", i, entry.Title,
                                                 entry.Creator,
                                                 entry.Link)
    }
}

Output:

FEED 'Me, Myself and I'
	#0 'Breakfast' by Peter J. (http://example.org/2005/04/02/breakfast)
	#1 'Dinner' by Peter J. (http://example.org/2005/04/02/dinner)
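
Extension elements such as dc:creator live in their own XML namespace, and the check on ok in the example exists because an item may not carry the element at all. The sketch below shows both points with the standard encoding/xml package only. It does not use the dc package: item and creatorOf are hypothetical names, and the namespace URL is the usual Dublin Core elements namespace.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// item decodes an RSS item plus an optional Dublin Core creator.
// The tag form "namespace-URL name" makes encoding/xml match the element
// by namespace, whatever prefix the document uses.
type item struct {
	Title   string  `xml:"title"`
	Creator *string `xml:"http://purl.org/dc/elements/1.1/ creator"`
}

// creatorOf (hypothetical) reports the creator and whether it was present.
func creatorOf(src string) (string, bool, error) {
	var it item
	if err := xml.Unmarshal([]byte(src), &it); err != nil {
		return "", false, err
	}
	if it.Creator == nil {
		return "", false, nil // element absent from this item
	}
	return *it.Creator, true, nil
}

func main() {
	with := `<item xmlns:dc="http://purl.org/dc/elements/1.1/"><title>Breakfast</title><dc:creator>Peter J.</dc:creator></item>`
	without := `<item><title>Dinner</title></item>`

	for _, src := range []string{with, without} {
		creator, ok, err := creatorOf(src)
		if err != nil {
			fmt.Println("cannot decode item:", err)
			continue
		}
		fmt.Printf("creator=%q present=%v\n", creator, ok)
	}
}
```

Using a pointer field distinguishes a missing element from an empty one, which is the same distinction the ok result conveys in the example.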
Comments
  • code.google.com is broken

    Hi,

    As announced, code.google.com repositories have been removed as of January 2016. The build of this package is broken.

    Can we expect a quick migration to either vendored package or new packages for code.google.com/p/go-charset/charset and code.google.com/p/go-charset/data?

    Thank you.
