Introduction to rFeedParser, the Universal Feed Parser in Ruby

rFeedParser is a translation of Mark Pilgrim's Universal Feed Parser from Python into Ruby. Its behavior is nearly identical to the original's.

Example code:

            require 'rubygems'
            require 'rfeedparser' 
            # Now we can parse!

            fp = FeedParser.parse('some-url-or-filepath')
            
            # These next four lines are equivalent!
            fp.entries.each{ |e| puts e.title }
            fp.entries.each{ |e| puts e['title'] }
            fp['entries'].each{ |e| puts e.title }
            fp['entries'].each{ |e| puts e['title'] }
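
For the curious, here is a minimal sketch of how one object can answer to both the method style and the key style used above. It is only an illustration under my own assumptions: the DualAccess class is hypothetical and is not how rFeedParser itself is implemented.

```ruby
# Hypothetical illustration only; NOT rFeedParser's actual class.
# Wraps a string-keyed Hash and exposes each key as a reader method too.
class DualAccess
  def initialize(data = {})
    @data = data
  end

  # Key-style access: obj['title']
  def [](key)
    @data[key.to_s]
  end

  # Method-style access: obj.title falls through to the wrapped Hash.
  def method_missing(name, *args)
    key = name.to_s
    return @data[key] if args.empty? && @data.key?(key)
    super
  end

  # Keep respond_to? consistent with method_missing.
  def respond_to_missing?(name, include_private = false)
    @data.key?(name.to_s) || super
  end
end

entry = DualAccess.new('title' => 'First post')
feed  = DualAccess.new('entries' => [entry])

# Both styles reach the same data.
feed.entries.each { |e| puts e.title }
feed['entries'].each { |e| puts e['title'] }
```

The sketch wraps a Hash instead of subclassing it because Hash already defines its own entries method (from Enumerable), which would shadow the feed's 'entries' key.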
        

Notes on Time/Date Storage: Dates in the *_parsed items (such as updated_parsed, created_parsed, etc.) are stored in the Python 9-tuple format. While this is great for passing the date tests, Ruby has no native way of interpreting these tuples. So I wrote a helper method, py2rtime, that takes one of these 9-tuples as its argument and returns a Ruby Time object. It's placed in the top-level namespace, so you can call it anywhere in your code (e.g. py2rtime(fp.entries[0].updated_parsed)).
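
To make the 9-tuple format concrete, here is a rough sketch of the kind of conversion involved. It is not the source of py2rtime: the name nine_tuple_to_time is made up for illustration, and the sketch assumes the tuple is laid out like Python's time.struct_time (year, month, day, hour, minute, second, weekday, day of year, DST flag) and expressed in UTC.

```ruby
# Hypothetical helper for illustration; NOT the actual py2rtime source.
# Assumes a Python-style 9-tuple in UTC:
#   [year, month, day, hour, minute, second, weekday, yearday, isdst]
def nine_tuple_to_time(tuple)
  # Only the first six fields are needed to build a Time;
  # weekday, yearday and isdst can be derived from them.
  year, month, day, hour, minute, second = tuple.first(6)
  Time.utc(year, month, day, hour, minute, second)
end

# Made-up sample value: 3 June 2007, 14:30:00 UTC.
updated_parsed = [2007, 6, 3, 14, 30, 0, 6, 154, 0]
time = nine_tuple_to_time(updated_parsed)
puts time.year
```

In real code you would simply call py2rtime on the *_parsed value, as described above.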

More documentation can be found over at the Feed Parser documentation site.

The latest release is 0.9.93, and you can track new releases, send bug reports, or just yell at me belligerently over at the rFeedParser project page.

Dependencies and Installation

Currently, rFeedParser is a bit of a Frankenstein's monster. It depends on Hpricot, Character-Encodings, ActiveSupport, HTMLTools, HTMLEntities, and the not-quite-easy-to-get Ruby bindings of the Expat XML Parser. All but the Expat bindings are easy to install (they come as gems, and the rfeedparser gem declares them as dependencies), while the Expat bindings are less so. If you are on an Ubuntu or Debian system, you can simply install the libxml-parser-ruby1.8 package. Everyone else will have to download the source code from Yoshida Masato's site and compile it themselves (sorry). I do have a gem for xmlparser which includes some patches from Debian, but there have been difficulties compiling it on certain systems, specifically some Mac OS X 10.4 boxes. For some reason, it compiles fine on mine but not on others. I'd love some help figuring out this Works for Me™ bug.
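
If you are unsure which pieces made it onto your system, a quick check along these lines may help. This is only a sketch: missing_libraries is a hypothetical helper of my own, and the require names in the list are assumptions that may not match how each library is actually loaded on your installation.

```ruby
# Hypothetical helper: returns the names that cannot be loaded with require.
def missing_libraries(names)
  names.reject do |name|
    begin
      require name
      true   # loaded (or already loaded), so it is not missing
    rescue LoadError
      false  # could not be found or loaded
    end
  end
end

# ASSUMED require names; adjust them to match your installation.
candidates = ['hpricot', 'htmlentities', 'xmlparser']
missing = missing_libraries(candidates)

if missing.empty?
  puts 'All listed libraries loaded.'
else
  puts "Could not load: #{missing.join(', ')}"
end
```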

These dependencies are all being used because they allow for (mostly) UTF-8 safe parsing and speed out the wazoo. rFeedParser runs at roughly the same pace as the Python Feed Parser in its best configuration. Of course, rFeedParser is quite a bit faster and more complete than the other Ruby feed parsers available.

Well, enough talk. Go code. And get a hold of me through the project page or my contact form if you have questions.

Tests and Completeness

For the curious, rFeedParser's XML test cases are exactly the same XML files that the original Universal Feed Parser is tested against. It currently succeeds on 98.7% of the tests, with some of the failures coming from "superficial" problems ("would pass if this little twiddly bit that doesn't mean anything was different"). I hope to correct these in the near future. I should point out here that a success rate this high over such a mature and complex testing environment makes rFeedParser, by far, the most viable of the current crop of Ruby feed parsers.

I'll be blogging about all the coolness I was able to create from Sam Ruby's pirate testing ideas, Hpricot and so on. Links will be appearing as I write them.

And, lo, the first link, On rFeedParser, did appear. And it was a bit long but otherwise okay.

On Me

I'm Jeff Hodges. I sometimes blog over at Something Similar, and I've got a side project up over at Mixed States. I've recently taken a position at ICTV, where part of my work is maintaining and improving rFP.