Introduction to rFeedParser, the Universal Feed Parser in Ruby

rFeedParser is a translation of Mark Pilgrim's Universal Feed Parser from Python into Ruby. It behaves almost exactly like the original.

Example code:

            require 'rubygems'
            require 'rfeedparser'

            pf = FeedParser.parse(a_url_file_stream_or_string)
            # Or
            pf = rfp(a_url_file_stream_or_string)
            # These next four lines are equivalent.
            pf.entries.each{ |e| puts e.title }
            pf.entries.each{ |e| puts e['title'] }
            pf['entries'].each{ |e| puts e.title }
            pf['entries'].each{ |e| puts e['title'] }

Notes on XML Parsers: Because rfeedparser can work with any of several XML parsers, we do not include one as a requirement. This is an unfortunate side effect of two things: the expat XML bindings are not a Ruby gem, and the rubygems package management system has no way to specify that any one of multiple gems would satisfy a dependency. You'll have to install the libxml-ruby gem manually, or install the expat XML bindings. The expat bindings are available on OS X as rb-xmlparser in MacPorts, or on an Ubuntu or Debian system as the libxml-parser-ruby1.8 package.
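If you want to check which parser library is installed before parsing anything, a small probe like the one below may help. It is only a sketch: the require names ('libxml', 'xml/libxml', 'xmlparser') are my assumptions about how libxml-ruby and the expat bindings are usually loaded, and first_loadable is a hypothetical helper, not part of rfeedparser.

```ruby
# Require names are assumptions; adjust them to whatever your install provides.
XML_PARSER_CANDIDATES = ['libxml', 'xml/libxml', 'xmlparser']

# Hypothetical helper: returns the first name that can be required, or nil.
def first_loadable(candidates)
  candidates.find do |name|
    begin
      require name
      true
    rescue LoadError
      false
    end
  end
end

parser = first_loadable(XML_PARSER_CANDIDATES)
if parser
  puts "Found an XML parser library: #{parser}"
else
  puts 'No XML parser found; install libxml-ruby or the expat bindings.'
end
```

This only tells you what is loadable on your machine; which parser rfeedparser actually picks is up to rfeedparser.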

Notes on Time/Date Storage: Dates in the *_parsed items (such as updated_parsed and created_parsed) are stored in the Python 9-tuple format. While this is great for passing the date tests, Ruby has no good way of understanding these tuples. There is now a "_time" key next to every key containing a Python 9-tuple, and it holds a Ruby Time object. So, if you see updated or updated_parsed, you can also find an updated_time right next door.
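To make the relationship between the two formats concrete, here is a rough sketch of turning a 9-tuple into a Time by hand. It assumes the tuple follows Python's usual layout (year, month, day, hour, minute, second, weekday, day of year, DST flag) and that the values are in UTC. The helper nine_tuple_to_time and the sample tuple are made up for illustration; in practice you would just read the "_time" key.

```ruby
# Assumed layout: [year, month, day, hour, minute, second, wday, yday, isdst]
# nine_tuple_to_time is a hypothetical helper, not part of rfeedparser.
def nine_tuple_to_time(tuple)
  year, month, day, hour, minute, second = tuple.first(6)
  # The last three fields are derivable from the date, so they are ignored here.
  Time.utc(year, month, day, hour, minute, second)
end

# A made-up value of the kind you might find under updated_parsed.
updated_parsed = [2007, 3, 14, 15, 9, 26, 2, 73, 0]
updated_time = nine_tuple_to_time(updated_parsed)

puts updated_time.strftime('%Y-%m-%d %H:%M:%S')
```

Once you have a Time object, the usual Ruby arithmetic and formatting work on it, which is the whole point of the extra key.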

More documentation can be found over at the Feed Parser documentation site.

There is one extra thing. You can subclass StrictFeedParser or LooseFeedParser and pass the subclass to FeedParser.parse under the key :strict or :loose, and your subclass will be used to parse feeds in the corresponding case.

The latest release is 0.9.951, and you can track new releases, send bug reports or just yell at me belligerently over at the rFeedParser project page.

Dependencies and Installation

rfeedparser has undergone some major changes and now supports not only the expat XML parser, but also the new, better version of libxml-ruby. We've also dropped the ActiveSupport gem through sheer awesomeness. rfp is now quite a bit less of a Frankenstein's monster.

The libxml-ruby

The rest of the gems are as follows:

These dependencies are all being used because they allow for (mostly) UTF-8 safe parsing and speed out the wazoo. rFeedParser runs at roughly the same pace as the Python Feed Parser in its best configuration. Of course, rFeedParser is quite a bit faster and more complete than the other Ruby feed parsers available.

Well, enough talk. Go code. And get a hold of me through the project page or my contact form if you have questions.

Tests and Completeness

For the curious, rFeedParser's XML test cases are exactly the same XML files that the original Universal Feed Parser is tested against. It currently succeeds on 98.7% of the tests, with some of the failures coming from "superficial" problems ("would pass if this little twiddly bit that doesn't mean anything was different"). I hope to correct these in the near future. I should point out here that a success rate this high against such a mature and complex test suite makes rFeedParser, by far, the most viable of the current crop of Ruby feed parsers.

I'll be blogging about all the coolness I was able to create from Sam Ruby's pirate testing ideas, Hpricot and so on. Links will be appearing as I write them.

And, lo, the first link, On rFeedParser, did appear. And it was a bit long but otherwise okay.