Fork me on GitHub

Feed and Entry URI Mapping, how SyndFeed and SyndEntry 'uri' properties map to RSS and Atom elements

Rss and atOM utilitiEs (ROME) Synd* beans define the concept of URI at both feed and entry levels. The purpose of this URI is to help uniquely identifying feeds and entries when processed with ROME. This is particularly useful when a system is persisting feeds and when doing data manipulation where a primary key for the feeds or the entries is needed.

The Problem

RSS 0.90, 0.91, 0.92, 0.93 and 1.0 do not define any special element for the purposes of identifying the feed or the items other than the items link element. The channel link element cannot be used to identify the feed as its semantics is to refer to the web site that has the information being published through the feed and, if web site offers more than one feed most likely both of them will have the same feed link value.

RSS 0.94 and RSS 2.0 define the guid element at item level. The guid element has an overloaded meaning. If the isPermalink attribute is true, the guid value can be used also as the URL for the item instead of the link element as well as a unique ID for the item. If the isPermalink attribute is false, the guid value can be used as a unique ID for the item. However, RSS 0.94 or RSS 2.0, do not define that the value of the guid element must be an URI.

Atom 0.3 defines an id element at feed and entry levels. The id element is defined as an URI. The Atom specification being currently designed requires the id element to contain a normalized URI as defined by RFC 2396bis.

The RSS0.94 and RSS 2.0 guid element and the Atom 0.3 id element are optional elements. Because of this, they may not be present at all in feeds.

ROME's Solution

Because the concept of a URI it is not defined in all the feed formats, ROME makes an arbitrary design decision to provide the URI functionality regardless of the original (or target) feed type. The following behavior as been chosen based on expected and assumed usage pattern of feed and entry data.

URI Normalization

If the uri property of a SyndFeed or SyndEntry bean is not NULL, the getter method must return a normalized URI following the rules defined in RFC2396bis.

Converting from WireFeed (RSS or Atom) to SyndFeed

The common use case for this scenario is when consuming a feed. Because of that, for clarity purposes, when referring to the data in the WireFeed the following sections talks about elements (as in the XML feed) instead of talking of properties.

SyndFeed

None of the RSS versions define an ID at channel level. In addition to this, ROME input classes (WireFeedInput and SyndFeedInput) do not have access to the URL (if any) used to fetch the feed. Because of this ROME does not set in the SyndFeed uri property , it is left to developer to set (if needed) a URI in the feed bean.

In the case of Atom 0.3, if the id element is present in the feed, the SyndFeed uri property will be set with the value of the id element. If it is not present, the developer must set (if needed) a URI manually.

SyndEntry

For RSS 0.91, RSS 0.92, RSS 0.93 & RSS 1.0 if the link element is present in the item, the SyndEntry uri property will be set with the value of the link element. Otherwise the SyndEntry uri property is left as NULL and the developer must set it (if needed).

For RSS 0.94 & RSS 2.0 if the guid property is present in the item, the SyndEntry uri property will be set with the value of the guid element. If the guid element is not present, the SyndEntry uri property will be set with the value of the link element. If the link element is missing but the guid is present and it is flagged as a permalink, ROME will set set the SyndEntry link property with the value of the guid element (ROME is doing this because is a common practice to generated RSS 2.0 fees with guid elements marked as permalinks and without a link element).

For Atom 0.3 if the id element is present, the SyndEntry uri property is set with the value of the id element. If the id element is not present, the SyndEntry uri property is set with the value of the lternate link element.

Converting from SyndFeed to WireFeed (RSS or Atom)

The common use case for this scenario is when generating a feed. Because of that, for clarity purposes, when referring to the data in the WireFeed the following sections talks about elements (as in the XML feed) instead of talking of properties.

SyndFeed

For RSS 0.91, RSS 0.92, RSS 0.93, RSS 1.0 & RSS 2.0 the SyndFeed uri property is lost as there is not possible representation in the channel element for it.

For Atom 0.3 set the SyndFeed uri property is set as the value for the id element.

SyndEntry

For RSS 0.91, RSS 0.92, RSS 0.93 & RSS 1.0 the SyndEntry uri property is lost as there is not possible representation in the item element for it.

For RSS 0.94 & RSS 2.0 the SyndEntry uri property is set as the value of the guid element with permalink set to false. If the SyndEntry uri property is not set, the SyndEntry link property is set as the value of the guid element with permalink set to true. Note that if the SyndFEntry linkproperty is missing the SyndEntry uri cannot be set as the value of the link element because it cannot be assumed that the uri property is an URL.

Because SyndEntry instances always return a normalized URI the value of the guid elements of a generated RSS 0.94 or RSS 2.0 may differ from the values of the guid elements of the original feed.

For Atom 0.3 the SyndEntry uri property is set as the id element.