Parse RSS, Atom and JSON feeds in Go

gofeed

The gofeed library is a robust feed parser that supports parsing RSS, Atom and JSON feeds. The library provides a universal gofeed.Parser that will parse and convert all feed types into a hybrid gofeed.Feed model. You also have the option of using the feed-specific atom.Parser, rss.Parser or json.Parser parsers, which generate atom.Feed, rss.Feed and json.Feed respectively.

Features

Supported feed types:

  • RSS 0.90
  • Netscape RSS 0.91
  • Userland RSS 0.91
  • RSS 0.92
  • RSS 0.93
  • RSS 0.94
  • RSS 1.0
  • RSS 2.0
  • Atom 0.3
  • Atom 1.0
  • JSON 1.0

Extension Support

The gofeed library provides support for parsing several popular predefined extensions into ready-made structs, including Dublin Core and Apple’s iTunes.

It parses all other feed extensions in a generic way (see the Extensions section for more details).

Invalid Feeds

A best-effort attempt is made at parsing broken and invalid XML feeds. Currently, gofeed can successfully parse feeds with the following issues:

  • Unescaped/Naked Markup in feed elements
  • Undeclared namespace prefixes
  • Missing closing tags on certain elements
  • Illegal tags within feed elements without namespace prefixes
  • Missing "required" elements as specified by the respective feed specs.
  • Incorrect date formats
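Go's standard encoding/xml decoder offers a comparable kind of leniency, which may help illustrate what tolerating broken XML involves. The sketch below is not gofeed's implementation; `lenientText` is a hypothetical helper that switches the decoder to non-strict mode, registers the HTML entity table and the HTML auto-close list, and then collects the text of an element containing a bare ampersand, a numeric character reference and an HTML named entity.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// lenientText (hypothetical helper) returns the concatenated character data
// of an XML fragment, using encoding/xml's non-strict mode so that bare
// ampersands and HTML named entities do not abort decoding.
func lenientText(doc string) (string, error) {
	d := xml.NewDecoder(strings.NewReader(doc))
	d.Strict = false                // keep unknown or unterminated entities as literal text
	d.AutoClose = xml.HTMLAutoClose // treat elements such as <br> as self-closing
	d.Entity = xml.HTMLEntity       // resolve &eacute; and other HTML named entities

	var sb strings.Builder
	for {
		tok, err := d.Token()
		if err == io.EOF {
			return sb.String(), nil
		}
		if err != nil {
			return "", err
		}
		if cd, ok := tok.(xml.CharData); ok {
			sb.Write(cd)
		}
	}
}

func main() {
	text, err := lenientText(`<title>Fish & chips, d&#8217;actualit&eacute;</title>`)
	if err != nil {
		panic(err)
	}
	fmt.Println(text) // Fish & chips, d’actualité
}
```

A strict decoder (the default) would reject the same input with an "invalid character entity" error, which is the trade-off the list above refers to.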

Overview

The gofeed library consists of a universal feed parser and several feed-specific parsers. Which one you choose depends entirely on your use case. If you will be handling RSS, Atom and JSON feeds, then it makes sense to use the gofeed.Parser. If you know ahead of time that you will only be parsing one feed type, then it makes sense to use rss.Parser, atom.Parser or json.Parser.

Universal Feed Parser

The universal gofeed.Parser works in 3 stages: detection, parsing and translation. It first detects the type of the feed it is parsing. Then it uses a feed-specific parser to parse the feed into its true representation, which will be an rss.Feed, atom.Feed or json.Feed. These models cover every field possible for their respective feed types. Finally, they are translated into a gofeed.Feed model that is a hybrid of all feed types. Performing universal feed parsing in these 3 stages allows for more flexibility and keeps the code base more maintainable by separating RSS, Atom and JSON parsing into separate packages.
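To make the detection stage concrete, here is a simplified sketch of how a detector might classify input. The function `detectFeedType` and its string results are hypothetical; gofeed ships its own detector (`gofeed.DetectFeedType`), whose rules may be more thorough. The sketch treats input starting with `{` as JSON and otherwise looks at the local name of the XML root element.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
)

// detectFeedType is a hypothetical, simplified detection stage: JSON feeds
// start with '{', XML feeds are classified by the local name of the root.
func detectFeedType(data []byte) string {
	trimmed := bytes.TrimSpace(data)
	if len(trimmed) > 0 && trimmed[0] == '{' {
		return "json"
	}
	d := xml.NewDecoder(bytes.NewReader(trimmed))
	d.Strict = false
	for {
		tok, err := d.Token()
		if err != nil {
			return "unknown" // EOF or a syntax error before any root element
		}
		// Skip the XML declaration, comments and whitespace until the root.
		if start, ok := tok.(xml.StartElement); ok {
			switch start.Name.Local {
			case "rss", "RDF": // RSS 0.9x/2.0 and RSS 1.0 (rdf:RDF)
				return "rss"
			case "feed": // Atom 0.3 and 1.0
				return "atom"
			default:
				return "unknown"
			}
		}
	}
}

func main() {
	fmt.Println(detectFeedType([]byte(`<rss version="2.0"><channel/></rss>`)))          // rss
	fmt.Println(detectFeedType([]byte(`<feed xmlns="http://www.w3.org/2005/Atom"/>`))) // atom
	fmt.Println(detectFeedType([]byte(`{"version":"1.0"}`)))                          // json
}
```

Once the type is known, the input can be handed to the matching feed-specific parser, which is the second stage described above.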

The translation step is done by anything that implements the gofeed.Translator interface. The DefaultRSSTranslator, DefaultAtomTranslator and DefaultJSONTranslator are used behind the scenes when you use the gofeed.Parser with its default settings. You can see how they translate fields from atom.Feed, rss.Feed or json.Feed to the universal gofeed.Feed struct in the Default Mappings section. However, should you disagree with the way certain fields are translated, you can easily supply your own gofeed.Translator and override this behavior. See the Advanced Usage section for an example of how to do this.

Feed Specific Parsers

The gofeed library provides three feed-specific parsers: atom.Parser, rss.Parser and json.Parser. If the hybrid gofeed.Feed model that the universal gofeed.Parser produces does not contain a field from the atom.Feed, rss.Feed or json.Feed model that you require, it might be beneficial to use the feed-specific parsers. When using atom.Parser, rss.Parser or json.Parser directly, you can access all of the fields found in the atom.Feed, rss.Feed and json.Feed models. It is also marginally faster because you are able to skip the translation step.

Basic Usage

Universal Feed Parser

The most common usage scenario will be to use gofeed.Parser to parse an arbitrary RSS, Atom or JSON feed into the hybrid gofeed.Feed model. This hybrid model allows you to treat RSS, Atom and JSON feeds the same way.

Parse a feed from a URL:
fp := gofeed.NewParser()
feed, err := fp.ParseURL("http://feeds.twit.tv/twit.xml")
if err != nil {
	log.Fatal(err)
}
fmt.Println(feed.Title)
Parse a feed from a string:
feedData := `<rss version="2.0">
<channel>
<title>Sample Feed</title>
</channel>
</rss>`
fp := gofeed.NewParser()
feed, _ := fp.ParseString(feedData)
fmt.Println(feed.Title)
Parse a feed from an io.Reader:
file, err := os.Open("/path/to/a/file.xml")
if err != nil {
	log.Fatal(err)
}
defer file.Close()
feed, err := gofeed.NewParser().Parse(file)
if err != nil {
	log.Fatal(err)
}
fmt.Println(feed.Title)
Parse a feed from a URL with a 60s timeout:
ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
defer cancel()
fp := gofeed.NewParser()
feed, err := fp.ParseURLWithContext("http://feeds.twit.tv/twit.xml", ctx)
if err != nil {
	log.Fatal(err)
}
fmt.Println(feed.Title)

Feed Specific Parsers

You can easily use the rss.Parser, atom.Parser or json.Parser directly if you have a usage scenario that requires it:

Parse an RSS feed into an rss.Feed
feedData := `<rss version="2.0">
<channel>
<webMaster>example@site.com (Example Name)</webMaster>
</channel>
</rss>`
fp := rss.Parser{}
rssFeed, _ := fp.Parse(strings.NewReader(feedData))
fmt.Println(rssFeed.WebMaster)
Parse an Atom feed into an atom.Feed
feedData := `<feed xmlns="http://www.w3.org/2005/Atom">
<subtitle>Example Atom</subtitle>
</feed>`
fp := atom.Parser{}
atomFeed, _ := fp.Parse(strings.NewReader(feedData))
fmt.Println(atomFeed.Subtitle)
Parse a JSON feed into a json.Feed
feedData := `{"version":"1.0", "home_page_url": "https://daringfireball.net"}`
fp := json.Parser{}
jsonFeed, _ := fp.Parse(strings.NewReader(feedData))
fmt.Println(jsonFeed.HomePageURL)

Advanced Usage

Parse a feed while using a custom translator

The mappings and precedence order that are outlined in the Default Mappings section are provided by the following three structs: DefaultRSSTranslator, DefaultAtomTranslator and DefaultJSONTranslator. If you have fields that you think should have a different precedence, or if you want to make a translator that is aware of an unsupported extension, you can do this by specifying your own RSS, Atom or JSON translator when using the gofeed.Parser.

Here is a simple example of creating a custom Translator that makes the /rss/channel/itunes:author field have a higher precedence than the /rss/channel/managingEditor field in RSS feeds. We will wrap the existing DefaultRSSTranslator since we only want to change the behavior for a single field.

First we must define a custom translator:

import (
	"fmt"

	"github.com/mmcdole/gofeed"
	"github.com/mmcdole/gofeed/rss"
)

type MyCustomTranslator struct {
	defaultTranslator *gofeed.DefaultRSSTranslator
}

func NewMyCustomTranslator() *MyCustomTranslator {
	// We create a DefaultRSSTranslator internally so we can wrap its Translate
	// call since we only want to modify the precedence for a single field.
	return &MyCustomTranslator{defaultTranslator: &gofeed.DefaultRSSTranslator{}}
}

func (ct *MyCustomTranslator) Translate(feed interface{}) (*gofeed.Feed, error) {
	rssFeed, found := feed.(*rss.Feed)
	if !found {
		return nil, fmt.Errorf("feed did not match expected type of *rss.Feed")
	}

	// Let the default translator map every field first.
	f, err := ct.defaultTranslator.Translate(rssFeed)
	if err != nil {
		return nil, err
	}

	// Give itunes:author precedence over managingEditor. gofeed.Feed.Author is
	// a *gofeed.Person, so wrap the name. When itunes:author is absent, keep
	// the author chosen by the default precedence rules.
	if rssFeed.ITunesExt != nil && rssFeed.ITunesExt.Author != "" {
		f.Author = &gofeed.Person{Name: rssFeed.ITunesExt.Author}
	}
	return f, nil
}

Next you must configure your gofeed.Parser to utilize the new gofeed.Translator:

feedData := `<rss version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">
<channel>
<managingEditor>Ender Wiggin</managingEditor>
<itunes:author>Valentine Wiggin</itunes:author>
</channel>
</rss>`

fp := gofeed.NewParser()
fp.RSSTranslator = NewMyCustomTranslator()
feed, err := fp.ParseString(feedData)
if err != nil {
	log.Fatal(err)
}
fmt.Println(feed.Author.Name) // Valentine Wiggin

Extensions

Every element which does not belong to the feed's default namespace is considered an extension by gofeed. These are parsed and stored in a tree-like structure located at Feed.Extensions and Item.Extensions. These fields should allow you to access and read any custom extension elements.

In addition to the generic handling of extensions, gofeed also has built-in support for parsing certain popular extensions into their own structs for convenience. It currently supports the Dublin Core and Apple iTunes extensions, which you can access at Feed.ITunesExt, Feed.DublinCoreExt, Item.ITunesExt and Item.DublinCoreExt.
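Generic extensions are stored as nested maps keyed by namespace prefix and then element name. The sketch below declares local stand-in types that mirror our understanding of the shape of gofeed's `ext.Extensions` and `ext.Extension` (the exact field set is an assumption, so check the package docs), plus a hypothetical `firstValue` helper. With gofeed itself, the equivalent lookup would be along the lines of `feed.Extensions["dc"]["creator"][0].Value`, guarded by length checks.

```go
package main

import "fmt"

// Extension is a hypothetical stand-in mirroring a generic extension element.
type Extension struct {
	Name     string
	Value    string
	Attrs    map[string]string
	Children map[string][]Extension
}

// Extensions maps namespace prefix -> element name -> occurrences.
type Extensions map[string]map[string][]Extension

// firstValue returns the text of the first prefix:name element, or "" when
// the prefix or element is absent (indexing nil maps is safe in Go).
func firstValue(exts Extensions, prefix, name string) string {
	if elems := exts[prefix][name]; len(elems) > 0 {
		return elems[0].Value
	}
	return ""
}

func main() {
	exts := Extensions{
		"dc": {
			"creator": {{Name: "creator", Value: "Valentine Wiggin"}},
		},
	}
	fmt.Println(firstValue(exts, "dc", "creator"))            // Valentine Wiggin
	fmt.Println(firstValue(exts, "media", "thumbnail") == "") // true
}
```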

Default Mappings

The DefaultRSSTranslator, the DefaultAtomTranslator and the DefaultJSONTranslator map the following rss.Feed, atom.Feed and json.Feed fields to their respective gofeed.Feed fields. They are listed in order of precedence (highest to lowest):

A format that does not appear under a field has no mapping for it.

gofeed.Feed

Title
  • RSS: /rss/channel/title, /rdf:RDF/channel/title, /rss/channel/dc:title, /rdf:RDF/channel/dc:title
  • Atom: /feed/title
  • JSON: /title

Description
  • RSS: /rss/channel/description, /rdf:RDF/channel/description, /rss/channel/itunes:subtitle
  • Atom: /feed/subtitle, /feed/tagline
  • JSON: /description

Link
  • RSS: /rss/channel/link, /rdf:RDF/channel/link
  • Atom: /feed/link[@rel="alternate"]/@href, /feed/link[not(@rel)]/@href
  • JSON: /home_page_url

FeedLink
  • RSS: /rss/channel/atom:link[@rel="self"]/@href, /rdf:RDF/channel/atom:link[@rel="self"]/@href
  • Atom: /feed/link[@rel="self"]/@href
  • JSON: /feed_url

Updated
  • RSS: /rss/channel/lastBuildDate, /rss/channel/dc:date, /rdf:RDF/channel/dc:date
  • Atom: /feed/updated, /feed/modified
  • JSON: /items[0]/date_modified

Published
  • RSS: /rss/channel/pubDate
  • JSON: /items[0]/date_published

Author
  • RSS: /rss/channel/managingEditor, /rss/channel/webMaster, /rss/channel/dc:author, /rdf:RDF/channel/dc:author, /rss/channel/dc:creator, /rdf:RDF/channel/dc:creator, /rss/channel/itunes:author
  • Atom: /feed/author
  • JSON: /author/name

Language
  • RSS: /rss/channel/language, /rss/channel/dc:language, /rdf:RDF/channel/dc:language
  • Atom: /feed/@xml:lang

Image
  • RSS: /rss/channel/image, /rdf:RDF/image, /rss/channel/itunes:image
  • Atom: /feed/logo
  • JSON: /icon

Copyright
  • RSS: /rss/channel/copyright, /rss/channel/dc:rights, /rdf:RDF/channel/dc:rights
  • Atom: /feed/rights, /feed/copyright

Generator
  • RSS: /rss/channel/generator
  • Atom: /feed/generator

Categories
  • RSS: /rss/channel/category, /rss/channel/itunes:category, /rss/channel/itunes:keywords, /rss/channel/dc:subject, /rdf:RDF/channel/dc:subject
  • Atom: /feed/category

gofeed.Item

Title
  • RSS: /rss/channel/item/title, /rdf:RDF/item/title, /rdf:RDF/item/dc:title, /rss/channel/item/dc:title
  • Atom: /feed/entry/title
  • JSON: /items/title

Description
  • RSS: /rss/channel/item/description, /rdf:RDF/item/description, /rss/channel/item/dc:description, /rdf:RDF/item/dc:description
  • Atom: /feed/entry/summary
  • JSON: /items/summary

Content
  • RSS: /rss/channel/item/content:encoded
  • Atom: /feed/entry/content
  • JSON: /items/content_html

Link
  • RSS: /rss/channel/item/link, /rdf:RDF/item/link
  • Atom: /feed/entry/link[@rel="alternate"]/@href, /feed/entry/link[not(@rel)]/@href
  • JSON: /items/url

Updated
  • RSS: /rss/channel/item/dc:date, /rdf:RDF/rdf:item/dc:date
  • Atom: /feed/entry/modified, /feed/entry/updated
  • JSON: /items/date_modified

Published
  • RSS: /rss/channel/item/pubDate, /rss/channel/item/dc:date
  • Atom: /feed/entry/published, /feed/entry/issued
  • JSON: /items/date_published

Author
  • RSS: /rss/channel/item/author, /rss/channel/item/dc:author, /rdf:RDF/item/dc:author, /rss/channel/item/dc:creator, /rdf:RDF/item/dc:creator, /rss/channel/item/itunes:author
  • Atom: /feed/entry/author
  • JSON: /items/author/name

GUID
  • RSS: /rss/channel/item/guid
  • Atom: /feed/entry/id
  • JSON: /items/id

Image
  • RSS: /rss/channel/item/itunes:image, /rss/channel/item/media:image
  • JSON: /items/image, /items/banner_image

Categories
  • RSS: /rss/channel/item/category, /rss/channel/item/dc:subject, /rss/channel/item/itunes:keywords, /rdf:RDF/channel/item/dc:subject
  • Atom: /feed/entry/category
  • JSON: /items/tags

Enclosures
  • RSS: /rss/channel/item/enclosure
  • Atom: /feed/entry/link[@rel="enclosure"]
  • JSON: /items/attachments
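As we read the precedence rule, each list above is tried in order and the first non-empty value wins. A minimal sketch of that rule, using a hypothetical trimmed-down `rssChannel` struct rather than gofeed's `rss.Feed`, might look like this; the real translators also parse dates and split author strings into names and emails, which this sketch ignores.

```go
package main

import (
	"fmt"
	"strings"
)

// firstNonEmpty returns the first candidate that is not blank, so candidates
// must be passed from highest to lowest precedence.
func firstNonEmpty(candidates ...string) string {
	for _, c := range candidates {
		if s := strings.TrimSpace(c); s != "" {
			return s
		}
	}
	return ""
}

// rssChannel is a hypothetical, trimmed-down RSS channel model.
type rssChannel struct {
	Title          string // /rss/channel/title
	DCTitle        string // /rss/channel/dc:title
	ManagingEditor string // /rss/channel/managingEditor
	WebMaster      string // /rss/channel/webMaster
	ITunesAuthor   string // /rss/channel/itunes:author
}

func main() {
	ch := rssChannel{DCTitle: "Sample Feed", WebMaster: "Ender Wiggin", ITunesAuthor: "Valentine Wiggin"}
	fmt.Println(firstNonEmpty(ch.Title, ch.DCTitle))                             // Sample Feed
	fmt.Println(firstNonEmpty(ch.ManagingEditor, ch.WebMaster, ch.ITunesAuthor)) // Ender Wiggin
}
```

Reordering the arguments is all it takes to change precedence, which is essentially what the custom translator in the Advanced Usage section does for the author field.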

License

This project is licensed under the MIT License

Comments
  • Atom: implement xml:base relative URI resolution

    My application needs to resolve relative URLs in content HTML according to the xml:base attribute of the root feed element, so this is my attempt at implementing xml:base resolution (Issue #2).

    I believe this will work well for my needs, and if you think it's a reasonable approach in general I don't mind spending more time fixing any issues you foresee with it.

    What it does:

    Resolve relative URIs in feed element attributes, feed elements which contain URIs (like author:uri), and HTML element attributes in Atom elements of type "html" or "xhtml" according to the xml:base specification (https://www.w3.org/TR/xmlbase/).

    What it is:

    Three changesets:

    1. The first actually implements the XMLBase type and functions which live in the internal/shared package (internal/shared/xmlbase.go), with a smallish patch against atom/parser.go

    2. The second adds several tests adapted from the Python feedparser project

    3. The third fixes a small bug in atom/parser_test.go which confused me while testing for a second

    How it works:

    As each atom element is parsed, a new xml:base is (recursively) pushed to the stack; the top xml:base URI is used to resolve attributes (uses golang.org/x/net/html to parse any "html" or "xhtml" element content); then the base is popped from the stack.

    TODO:

    This has not been manually tested much yet so I'm sure there are edge cases that fail and possibly some low-hanging performance improvements.

  • Support multiple links

    This adds a Links var to Feed and Items. This was mainly created to allow the usage of Atom feeds with multiple links for each item, but the variable captures RSS feeds that have multiple links for a feed as well. It's not really useful for JSON, but it still captures the single link as one would expect.

  • Allow unknown html entities

    I've encountered a few feeds with HTML entities in <title> or <description>, for example:

        <title>Site d&#8217;actualit&eacute; g&eacute;n&eacute;raliste</title>

    instead of clean UTF-8:

        <title>Site d’actualité généraliste</title>

    Even if it's not valid XML, it's quite easy to be tolerant and not reject the feed.

  • Missing or incomplete values when parsing extensions

    Expected behavior

    When parsing this RSS feed (http://portal-api.thisisdistorted.com/xml/leftism), the iTunesExt.Summary field should be correctly populated for each item in the feed.

    Actual behavior

    The iTunesExt.Summary field is blank for every item.

    Steps to reproduce the behavior

    Parse the feed and inspect the resulting rss.Item values. You could also look at the translated gofeed.Item values.

    What's going on?

    The problem appears to be in the parseExtensionElement function (https://github.com/mmcdole/gofeed/blob/master/internal/shared/extparser.go#L46). The function takes an XML node (in this case an <itunes:summary> tag) and uses it to create a new ext.Extension. It iterates over any child nodes and, if the child is of type text, sets the Value of the new Extension to the text node's value (https://github.com/mmcdole/gofeed/blob/master/internal/shared/extparser.go#L83). Note that if the parent node contains multiple child nodes of type text, only the final node's value is retained.

    In this particular feed, the item-level <itunes:summary> tags all contain three text nodes. The first and last are blank while the middle node holds the actual text. Currently this text is being overwritten with the final blank string.

    A possible cause

    If you view the source for the feed you will see that there are extra line breaks around the text in the <itunes:summary> tags. These line breaks are not present on any other tags (all of which are being parsed correctly as far as I can tell). Perhaps the line breaks are causing the spurious text nodes.

    A possible fix

    I fixed this in my vendored version of the code by changing this line (https://github.com/mmcdole/gofeed/blob/master/internal/shared/extparser.go#L83):

        e.Value = strings.TrimSpace(p.Text)

    to this:

        e.Value += strings.TrimSpace(p.Text)

    But I'm not familiar with the project. Maybe this quick fix isn't the best approach. Let me know what you think (I can submit a PR if you'd like).

  • Remember to close non-2xx responses

    I'm using gofeed to constantly poll some Atom/RSS feeds. There was a memory leak in my project caused by HTTP responses in this library not being closed: I was using ParseURL. It turns out any non-2xx response was leaking, because Close() was not being called on it.

    This PR fixes this and also closes responses even if there was a non-nil error. The presence of an err does not mean an absence of resp, as redirection failures will produce both err and resp as non-nil.

  • Adds support for publish date in Dublin Core

    Discovered this flaw when using Slashdot's RSS feed as an example during development. The only Published date is a DC extension.

    Example RSS URL: http://rss.slashdot.org/Slashdot/slashdotMain

  • atom.Parser atom:content type html might not be wrapped in DIV

    Expected behavior

    The following example entry, included within a valid Atom feed, should create a gofeed.Item with the following content.

        <atom:entry>
          <atom:title>Parsing Atom with gofeed</atom:title>
          <atom:link href="https://example.com/blog/2016/04/18/parsing-atom-with-gofeed" />
          <atom:updated>2016-04-18T00:00:00+00:00</atom:updated>
          <atom:id>https://example.com/blog/2016/04/18/parsing-atom-with-gofeed</atom:id>
          <atom:content type="html">
            &lt;p&gt;This is a directly included child element, no wrapping in a DIV element.&lt;/p&gt;

            &lt;div class="not-root"&gt;&lt;p&gt;This DIV is part of the post content, wholly unrelated to what RFC 4287 might say about DIVs.&lt;/p&gt;&lt;/div&gt;
          </atom:content>
        </atom:entry>

        for _, item := range feed.Items {
            fmt.Println(item.Content)
        }
        // <p>This is a directly included child element, no wrapping in a DIV element.</p>\n\n<div class="not-root"><p>This DIV is part of the post content, wholly unrelated to what RFC 4287 might say about DIVs.</p></div>

    Actual behavior

        for _, item := range feed.Items {
            fmt.Println(item.Content)
        }
        // <p>This DIV is part of the post content, wholly unrelated to what RFC 4287 might say about DIVs.</p>

    Steps to reproduce the behavior

    The problematic feed is https://terinstock.com/atom.xml. The author is alright.

    Supporting documentation

    RFC 4287 § 4.1.3.3 (https://tools.ietf.org/html/rfc4287#section-4.1.3.3):

        2. If the value of "type" is "html", the content of atom:content MUST NOT contain child elements and SHOULD be suitable for handling as HTML. The HTML markup MUST be escaped; for example, "<br>" as "&lt;br>". The HTML markup SHOULD be such that it could validly appear directly within an HTML <DIV> element. Atom Processors that display the content MAY use the markup to aid in displaying it.

        3. If the value of "type" is "xhtml", the content of atom:content MUST be a single XHTML div element [XHTML] and SHOULD be suitable for handling as XHTML. The XHTML div element itself MUST NOT be considered part of the content. Atom Processors that display the content MAY use the markup to aid in displaying it. The escaped versions of characters such as "&" and ">" represent those characters, not markup.

    Of course, a DIV is valid within a DIV, but it's not required for type html. Even if the content was wrapped with a DIV, it should be considered part of the content, for the html type.

  • Added support for JSON feed

    Added support for JSON Feed 1.0 based on this specification: https://jsonfeed.org/version/1

      • Support all fields except Hubs and JSON Feed Extensions
      • Uses Jsoniter for parsing JSON, 3x faster at parsing JSON (https://github.com/json-iterator/go)
      • Support for JSON feed in DetectFeedType
      • Support for using JSON feed with the global Feed struct
      • Support for translation of all known JSON feed elements in global Feed except a few which are specified in comments as TODO
      • Test coverage for all new code added to the codebase

    Resolves #80

  • Add Timeout to httpClient

    Using a client without a timeout can leak file pointers. A more thorough discussion can be found at: https://blog.cloudflare.com/the-complete-guide-to-golang-net-http-timeouts/

    The timeout is set to an arbitrary 15 seconds. Although Parse(io.Reader) decouples the parser from the request handler (allowing the use of a custom client), the default client should at least have a naive timeout.

  • ISSUE 189 - Multiple enclosures in rss

    Multiple enclosures in RSS are not handled. Rather, we get the last one.

      1. Added multiple enclosure handling to RSS and translator.
      2. Updated test data.
      3. Added specific test coverage for multiple enclosures.

    Note: since the RSS parser seemed to accumulate the last enclosure and select that one, I reversed the accumulation array so that anyone expecting the old behavior would not be surprised.

  • Some better support for "broken" xml

    Currently with beta2, if the XML contains ampersands without semicolons it is considered broken (rightfully so). However, in real life these things happen, much like they do with HTML websites. One of the feeds we had to support in our project had ampersands in titles and such (e.g. "Fish & chips"). Other solutions to this problem are possible: for example, add a flag to skip parsing of the contents and leave that to the library user to deal with. However, I like this approach and it adds one or two more points to the "broken XML handling".

  • no parse content

    Expected behavior

    The <turbo:content> tag is parsed.

    Actual behavior

    The <turbo:content> tag is not parsed.

    Steps to reproduce the behavior

    feed.Items[0].Content

    Note: Please include any links to problem feeds, or the feed content itself!

    rss:

        <channel>
          <item turbo="true">
            <turbo:content>
              <![CDATA[ <!-- ARTICLE --> ]]>
            </turbo:content>
          </item>
        </channel>
        </rss>

  • GoFeed parses the CoinDesk feed but the Link fields are empty

    Note: Please include any links to problem feeds, or the feed content itself!

    GoFeed parses https://www.coindesk.com/arc/outboundfeeds/rss/ successfully, but it returns empty Link fields for both Feed and Item. It's probably due to the multiple link tags in the feed data. See a couple of examples below.

    Channel:

        <channel>
          <title>CoinDesk</title>
          <link>https://www.coindesk.com</link>
          <link href="https://www.coindesk.com/arc/outboundfeeds/rss/?outputType=xml" rel="self" type="application/rss+xml"/>
          <link href="https://pubsubhubbub.appspot.com/" rel="hub"/>
          <atom:link href="https://www.coindesk.com/arc/outboundfeeds/rss/?outputType=xml" rel="self" type="application/rss+xml"/>
          <description>Latest headlines from Coindesk.</description>

    Item:

        <item>
          <title>
            <![CDATA[Panic Grips SOL With Record Volatility and Massive Put Demand ]]>
          </title>
          <link>https://www.coindesk.com/markets/2022/11/10/sol-market-in-the-state-of-panic-record-volatility-and-put-demand-suggests/?utm_medium=referral&amp;utm_source=rss&amp;utm_campaign=headlines</link>
          <link href="https://www.coindesk.com/arc/outboundfeeds/rss/?outputType=xml" rel="self" type="application/atom+xml"/>
          <link href="https://pubsubhubbub.appspot.com/" rel="hub"/>
          <guid isPermaLink="false">HAENGWQ2FVEE5OCLQIWC2CQBV4</guid>

  • Failed to match this url: https://rss.netkeiba.com/?pid=rss_netkeiba&site=netkeiba

    Expected behavior

    No error, but failed to get some content.

    Steps to reproduce the behavior

    Note: Please include any links to problem feeds, or the feed content itself!

  • XML syntax error on line 34: illegal character code U+0008

    Expected behavior

    RSS feed parsed correctly.

    Actual behavior

    gofeed cannot parse the RSS feed, failing with the following error:

        XML syntax error on line 34: illegal character code U+0008

    Steps to reproduce the behavior

    Parse this feed: http://newsletter.grokking.org/?format=rss

    Apparently there's a "strange" character in line 34.
Its design was inspired by the Python Ele</p> </div> <div class="p-2"> <i class="fa fa-clock-o ml-3" aria-hidden="true"></i> Dec 19, 2022 </div> </div> <div class="box shadow-sm mb-3 rounded bg-white ads-box"> <div class="p-3 border-bottom"> <a href="/g/mattn-go-shellwords-go-text-processing"><h6 class="font-weight-bold ">Parse line as shell words</h6></a> <p class="mb-0 text-muted">go-shellwords Parse line as shell words. Usage args, err := shellwords.Parse("./foo --bar=baz") // args should be ["./foo", "--bar=baz"] envs, args, e</p> </div> <div class="p-2"> <i class="fa fa-clock-o ml-3" aria-hidden="true"></i> Dec 23, 2022 </div> </div> <div class="box shadow-sm mb-3 rounded bg-white ads-box"> <div class="p-3 border-bottom"> <a href="/g/clbanning-mxj-go-text-processing"><h6 class="font-weight-bold ">Decode / encode XML to/from map[string]interface{} (or JSON); extract values with dot-notation paths and wildcards. Replaces x2j and j2x packages.</h6></a> <p class="mb-0 text-muted">mxj - to/from maps, XML and JSON Decode/encode XML to/from map[string]interface{} (or JSON) values, and extract/modify values from maps by key or key-</p> </div> <div class="p-2"> <i class="fa fa-clock-o ml-3" aria-hidden="true"></i> Dec 29, 2022 </div> </div> <div class="box shadow-sm mb-3 rounded bg-white ads-box"> <div class="p-3 border-bottom"> <a href="/g/jf-tech-omniparser-go-text-processing"><h6 class="font-weight-bold ">omniparser: a native Golang ETL streaming parser and transform library for CSV, JSON, XML, EDI, text, etc.</h6></a> <img class="lazy img-fluid float-left mr-2" style="max-width: 100px;max-height: 60px" alt="omniparser: a native Golang ETL streaming parser and transform library for CSV, JSON, XML, EDI, text, etc." 
data-original="https://github.com/jf-tech/omniparser/raw/master/cli/cmd/web/playground-demo.gif" > <p class="mb-0 text-muted">omniparser Omniparser is a native Golang ETL parser that ingests input data of various formats (CSV, txt, fixed length/width, XML, EDI/X12/EDIFACT, JS</p> </div> <div class="p-2"> <i class="fa fa-clock-o ml-3" aria-hidden="true"></i> Jan 4, 2023 </div> </div> <div class="box shadow-sm mb-3 rounded bg-white ads-box"> <div class="p-3 border-bottom"> <a href="/g/stackerzzq-xj2go-go-text-processing"><h6 class="font-weight-bold ">Convert xml and json to go struct</h6></a> <p class="mb-0 text-muted">xj2go The goal is to convert xml or json file to go struct file. Usage Download and install it: $ go get -u -v github.com/wk30/xj2go/cmd/... $ xj [-t</p> </div> <div class="p-2"> <i class="fa fa-clock-o ml-3" aria-hidden="true"></i> Oct 23, 2022 </div> </div> <div class="box shadow-sm mb-3 rounded bg-white ads-box"> <div class="p-3 border-bottom"> <a href="/g/gitHusband-markdowntable-go-text-processing"><h6 class="font-weight-bold ">Easily to convert JSON data to Markdown Table</h6></a> <p class="mb-0 text-muted">Easily to convert JSON data to Markdown Table</p> </div> <div class="p-2"> <i class="fa fa-clock-o ml-3" aria-hidden="true"></i> Oct 28, 2022 </div> </div> <div class="box shadow-sm mb-3 rounded bg-white ads-box"> <div class="p-3 border-bottom"> <a href="/g/mosuka-wikipedia-jsonl"><h6 class="font-weight-bold ">wikipedia-jsonl is a CLI that converts Wikipedia dump XML to JSON Lines format.</h6></a> <p class="mb-0 text-muted">wikipedia-jsonl wikipedia-jsonl is a CLI that converts Wikipedia dump XML to JSON Lines format. 
How to use At first, download the XML dump from Wikime</p> </div> <div class="p-2"> <i class="fa fa-clock-o ml-3" aria-hidden="true"></i> Dec 26, 2022 </div> </div> <div class="box shadow-sm mb-3 rounded bg-white ads-box"> <div class="p-3 border-bottom"> <a href="/g/krishpranav-jsonparser"><h6 class="font-weight-bold ">A simple json parser built using golang</h6></a> <p class="mb-0 text-muted">jsonparser A simple json parser built using golang Installation: go get -u githu</p> </div> <div class="p-2"> <i class="fa fa-clock-o ml-3" aria-hidden="true"></i> Dec 29, 2021 </div> </div> <div class="box shadow-sm mb-3 rounded bg-white ads-box"> <div class="p-3 border-bottom"> <a href="/g/asaskevich-govalidator-go-text-processing"><h6 class="font-weight-bold ">[Go] Package of validators and sanitizers for strings, numerics, slices and structs</h6></a> <p class="mb-0 text-muted">govalidator A package of validators and sanitizers for strings, structs and collections. Based on validator.js. Installation Make sure that Go is inst</p> </div> <div class="p-2"> <i class="fa fa-clock-o ml-3" aria-hidden="true"></i> Dec 28, 2022 </div> </div> </div> </div> </div> </div> <!-- footer --> <footer class="bg-white"> <div class="container"> <div class="copyright"> <div class="logo"> <a href="/"> <img src="/assets/images/logo_golangd.png"> </a> </div> <p>2022.GolangResource </p> </div> </div> </footer> <!-- footer--> <!-- Bootstrap core JavaScript --> <script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.4.1/jquery.min.js" integrity="sha512-bnIvzh6FU75ZKxp0GXLH9bewza/OIw6dLVh9ICg0gogclmYGguQJWl8U30WpbsGTqbIiAwxTsbe76DErLq5EDQ==" crossorigin="anonymous"></script> <script src="https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/4.5.0/js/bootstrap.bundle.min.js" integrity="sha512-Oy5BruJdE3gP9+LMJ11kC5nErkh3p4Y0GawT1Jrcez4RTDxODf3M/KP3pEsgeOYxWejqy2SPnj+QMpgtvhDciQ==" crossorigin="anonymous"></script> <!-- select2 Js --> <script 
src="https://cdnjs.cloudflare.com/ajax/libs/select2/4.0.13/js/select2.min.js" integrity="sha512-2ImtlRlf2VVmiGZsjm9bEyhjGW4dU7B6TNwh/hx/iSByxNENtj3WVE6o/9Lj4TJeVXPi4bnOIMXFIJJAeufa0A==" crossorigin="anonymous"></script> <!-- Custom --> <script src="/assets/js/custom.js"></script> <script src="https://cdnjs.cloudflare.com/ajax/libs/jquery.lazyload/1.9.1/jquery.lazyload.min.js"></script> <script> $(function() { $("img.lazy").lazyload({ threshold :180, failurelimit :20, effect : "fadeIn" }); }); </script> <script src="//cdnjs.cloudflare.com/ajax/libs/highlight.js/10.5.0/highlight.min.js"></script> <script> hljs.initHighlightingOnLoad(); </script> </body> </html><script data-cfasync="false" src="/cdn-cgi/scripts/5c5dd728/cloudflare-static/email-decode.min.js"></script>
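The error text in the issue report above has the shape of a `SyntaxError` from Go's standard `encoding/xml` decoder, which rejects any character outside the XML 1.0 `Char` production; U+0008 (backspace) is one such character. One possible workaround, assuming you fetch and preprocess the feed body yourself, is to drop those characters before parsing. The sketch below uses only the standard library; `isXMLChar` and `stripInvalidXMLChars` are hypothetical helper names for illustration, not part of gofeed's API.

```go
package main

import (
	"fmt"
	"strings"
)

// isXMLChar reports whether r is allowed by the XML 1.0 "Char" production:
// #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF].
// (Hypothetical helper, not part of gofeed.)
func isXMLChar(r rune) bool {
	switch {
	case r == 0x09 || r == 0x0A || r == 0x0D:
		return true
	case r >= 0x20 && r <= 0xD7FF:
		return true
	case r >= 0xE000 && r <= 0xFFFD:
		return true
	case r >= 0x10000 && r <= 0x10FFFF:
		return true
	}
	return false
}

// stripInvalidXMLChars drops every rune that XML 1.0 forbids, such as the
// U+0008 (backspace) control character from the issue report.
// (Hypothetical helper, not part of gofeed.)
func stripInvalidXMLChars(s string) string {
	return strings.Map(func(r rune) rune {
		if isXMLChar(r) {
			return r
		}
		return -1 // a negative value tells strings.Map to drop the rune
	}, s)
}

func main() {
	// "\b" in a Go string literal is U+0008, the character from the report.
	raw := "<rss><channel><title>Grokking\bNewsletter</title></channel></rss>"
	fmt.Println(stripInvalidXMLChars(raw))
}
```

To combine this with gofeed, you could read the response body yourself (for example with `net/http` and `io.ReadAll`), clean it, and pass the result to `gofeed.NewParser().ParseString`. Dropping characters silently changes the text (the example above yields `GrokkingNewsletter`), so returning a space instead of `-1` from the mapping function is an alternative when word boundaries matter.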