OFX? Hpricot To The Rescue!

Feb 24, 2007

My most recent pet project is a simple budgeting app I put together with Rails. Quicken and its ilk have always seemed like a bit of overkill to me. All I really want to do is easily track my expenses against a basic budget every month. Entering each expense into a form works, but it’s inefficient when my bank lets me download all my transactions for Quicken / QuickBooks / MS Money. Just being able to import this file once a week or so would save me a lot of time.

For a while now, all three of these financial apps have exchanged data in OFX format. Even though my bank gives me a separate link for each application, it’s really the same OFX file.

Who needs XML?

The issue is that the OFX my bank exports is SGML, not XML, which means certain elements have no closing tags. That rules out REXML and xml-simple for reliably parsing this stuff. HTML is SGML-based too, though, so what we really need here is a good, forgiving HTML parser. Who likes fruit?
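To make the closing-tag problem concrete, here is a minimal sketch of what a strict XML parser does with that kind of markup. The fragment and the `strict_xml_error` helper are made up for illustration (they are not from my bank's export), and I'm assuming REXML is available the way it ships with a stock Ruby install.

```ruby
require 'rexml/document'

# Hypothetical OFX-style fragment: the container <STMTTRN> is closed,
# but leaf elements such as <TRNAMT> never are.
OFX_FRAGMENT = <<~OFX
  <STMTTRN>
  <TRNTYPE>DEBIT
  <DTPOSTED>20070215
  <TRNAMT>-42.50
  <NAME>GROCERY STORE
  </STMTTRN>
OFX

# Hypothetical helper: returns the first line of REXML's parse error,
# or nil when the source is well-formed XML.
def strict_xml_error(source)
  REXML::Document.new(source)
  nil
rescue REXML::ParseException => e
  e.message.lines.first.strip
end

error = strict_xml_error(OFX_FRAGMENT)
if error
  puts "REXML rejected the fragment: #{error}"
else
  puts "REXML parsed the fragment"
end
```

The same fragment with every leaf element closed would go through fine, which is why the missing closing tags are the whole story here.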

Hpricot

So around last July _why unleashed Hpricot on the world, and I hadn’t had an excuse to use it until now. Turns out it’s been gemified and makes parsing documents super easy. After about 10 minutes of hacking together a short script, I was parsing OFX and getting the useful data out of it.

Hpricot wins.

hpricot-ofx.rb

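require 'rubygems'
require 'hpricot'
require 'time' # provides Time.parse

# Hpricot's forgiving HTML parser copes with the unclosed SGML tags
doc = File.open("export.qfx") { |f| Hpricot(f) }

# Each <STMTTRN> element holds one transaction
trns = doc.search("//stmttrn")

trns.each do |t|
  # puts already appends a newline, so no explicit "\n" is needed
  posted = Time.parse(t.search("dtposted").inner_html.strip)
  puts "Date:\t" + posted.strftime("%m/%d/%Y")
  puts "Amount:\t" + sprintf("%.2f", t.search("trnamt").inner_html.strip)
  puts "Name:\t" + t.search("name").inner_html.strip
  puts "\n"
end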
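Since the end goal is importing these into a budgeting app rather than printing them, the date and amount handling can be pulled out into small helpers. This is only a sketch: `parse_ofx_date` and `parse_ofx_amount` are hypothetical names, the sample values are invented, and I'm assuming the posted date always starts with eight `YYYYMMDD` digits (any time or zone suffix is ignored).

```ruby
require 'date'
require 'bigdecimal'

# Assumption: OFX timestamps begin with YYYYMMDD; a time and zone part
# such as "120000[0:GMT]" may follow. Only the date portion is used.
def parse_ofx_date(raw)
  Date.strptime(raw.strip[0, 8], '%Y%m%d')
end

# Amounts arrive as signed decimal strings; BigDecimal keeps the cents
# exact instead of going through binary floats.
def parse_ofx_amount(raw)
  BigDecimal(raw.strip)
end

# Invented sample values, including the stray whitespace that
# unclosed tags tend to leave behind.
date = parse_ofx_date("20070215120000[0:GMT]\n")
amount = parse_ofx_amount(" -42.50\n")

puts "Date:\t#{date.strftime('%m/%d/%Y')}"
puts "Amount:\t#{format('%.2f', amount)}"
```

Keeping the amount as a `BigDecimal` until display time is a design choice for a budgeting app, where rounding drift across many transactions would be annoying.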