Fork me on GitHub

ROME Feature Requests

  • BUG: com.rometools.rome.io.impl.DateParser:Date parseW3CDateTime(String) incorrectly uses a comma (",") rather than a decimal (".") to delimit the seconds from miliseconds. The correct format can be found on http://www.w3.org/TR/NOTE-datetime. The bug is on line 170 (version 0.8). The fix is to replace the line with this: int secFraction = pre.indexOf("."); -- JLP 9/4/2006
  • BUG: Atom 1.0 parsing uses wrong Content types. "text", "html", "xhtml" do not match what is parse from the content elements. Subsequently, the content elements always have a null value - no way to get content.
  • BUG: Link in description is not parsed
    Try to parse http://jakarta.apache.org/site/rss.xml, look at entry http://jakarta.apache.org/site/news/news-2006-q1.html#20060107.1 This entry has an "<a href..." in the first line, which isn't parsed by Rome -- Main.iterson - 25 Jan 2006
  • BUG: Support all encodings
    The problem is when reading RSS a space between the encoding to the value or ualue in '' insted of "" will cause error, for example: this will work work encoding="windows-1255" but this: encoding = "windows-1255" or encoding='windows-1255' won't work.
  • BUG: The reader doesn't attempt use the masks that defined in the rome.properties for reading the date for all date parsing method, e.g. RSS093Parser.parseItem uses DateParser.parserRFC822 which is not covered by that logic -- Main.den_st - 17 Jan 2006; if it will use the mask the code will run good. I had a problem to read date and I defined a mask in the properties file (datetime.extra.masks=yyyy-MM-dd'T'HH:mm:ss trying to read 2005-09-22T09:00:41} ). Then i try to change one of the mask at runtime to the mask i defined in the properties file and it works good. The logic in the code trys to format the date with each one of the default masks if it faild it returns null instead of trying to format the date using the format that defined in the rome.properties file.
  • Support for writing to OutputStreams. If I want to compress the feeds to a (.gz) file or write to a socket, I have to extend SyndFeedOutput and WireFeedOutput to add a method called output(SyndFeed, OuputStream). It would be nice to have that built in instead. -- Main.agherna - 08 Aug 2005
  • I'd like the getDate method on feeds and entries to go to the associated modules and retrieve the appropriate dc:date when the getDate() method returns null. This way entries from feeds like this one: http://www.magpiebrain.com/index.xml would have valid dates without requiring me to write code work out what format the feed is in and act accordingly.
  • Would like to see OPML parser also.
    This is already supported by Rome. RSS2.0 parser, see Date Elements mapping by default does not process Modules. Refer to the Modules Plugins documentation to see how to enable this.
  • RFC3229 support (in RomeFetcher and example code implementing it for production) would be a killer feature.
  • The RSS 1.0 Spec http://web.resource.org/rss/1.0/spec indicates that the suggested maximum length for a description field on an entry is 500 characters, but the 0.4 codebase enforces 500 characters as a hard limit -- exceeding it on input or output generates a FeedException. Since one doesn't always have control over the feeds one consumes, it seems to me that it would be a good idea if Rome were more forgiving in accepting feed entries that exceed the suggested lengths.
  • Is there a chance to include an option in Rome for liberal parsing, ie. trying to get most out of a feed even when it's non-conforming without throwing exceptions? I believe RSS is pretty close to HTML not from a technical point of view but thinking of practical use. Hence, RSS feeds will be incorrect in many cases however they still could join the party with a tolerant parser. Maybe Rome could do for Java what Mark Pilgrim has done for Python (although I did not verify his ultraliberal parser's tolerance)?
  • More liberal parsing for dates, to handle un-parseable dates like: "12 sep 1998", "'05" or monsters like this one : "[2005]". I faced this problem using DCModule, dc:date attribute can have mentioned values. In older versions of my app there were no constrains for date format, so users have written them very freely.
  • I think that Rome has problems parsing rss feeds where the xml contains a link to a stylesheet. Try parsing http://ihatemyflatmate.blogspot.com/atom.xml (Atom) or http://msdn.microsoft.com/rss.xml (RSS 1.0). I get Exceptions with both, and they both have stylesheets, whereas other working feeds don't.
  • It would be very nice to have a possibility to add stylesheet to generated feed. I can do this by replacing header in generated String, but this method is ...
  • There are problems with the correct encoding of HTML when generating RSS2.0. It is in the area of extended character sets. If you encode the following:
    <FONT size="2">Quatre pi&amp;#232;ces</FONT>
    

    In the hope of gettin "Quatre pièces" in your html feed. You get from the SynFeedOutput.output() this:

    &amp;lt;FONT size="2"&amp;gt;Quatre pi&amp;amp;#232;ces&amp;lt;/FONT&amp;gt;
    

    Which ends up with "Quatre pièces" being displayed in the RSS Reader that is taking your feed. To get the correct ouput I have had to resort to outputString.replaceAll("&#","&#"); OK as a a workaround but not very elegant or performant! -- Main.rjwallis - 19 Mar 2005

  • From that what i know about Rome it's impossible to use "<[!CDDATA [" entities in content of feed's tags. i know this isn't essential, but it would a very nice feature.
  • Provide support for lastBuildDate in RSS, many news provider, including Yahoo News and BBC use lastBuildDate instead of lastPublishDate.