Not really XmlSimple
Use Ruby? Like to parse XML–who doesn’t? Using Ruby’s XmlSimple library? Don’t do that… like, ever, ever.. EVER.
But if you must, take heed of the following advice.
Test #1: What do the Ruby defaults do?
Now let’s take a look at XmlSimple in straight up Ruby:
>> xml = "
<xml>
<head/>
<list>
<item id='1'>chunky</item>
</list>
</xml>"
>> XmlSimple.xml_in(xml)
=> {"head"=>[{}],
"list"=>[{"item"=>[{"id"=>"1", "content"=>"chunky"}]}]}
Ugh, every key returns an array of hashes, so you’ll end up writing things like hash["head"].first or hash["list"].first["item"].first to access values. It looks nasty, but it actually makes sense, since there’s no way to know a priori whether list or head contain one or many items.
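For what it’s worth, the forced-array shape can be walked without a pile of .first calls. A minimal sketch, assuming Ruby 2.3 or later for Hash#dig and Array#dig; the parsed literal below is just the Test #1 result pasted in by hand (my own variable name, not something XmlSimple gives you), so it runs without the parser:

```ruby
# The structure XmlSimple.xml_in returned in Test #1, pasted in as a literal.
parsed = {
  "head" => [{}],
  "list" => [{ "item" => [{ "id" => "1", "content" => "chunky" }] }]
}

# Chaining .first at every level works, but gets noisy quickly.
first_item = parsed["list"].first["item"].first
puts first_item["content"]

# Hash#dig and Array#dig accept keys and indexes in a single call.
item_id = parsed.dig("list", 0, "item", 0, "id")
puts item_id
```

Either way you still have to know the nesting up front; dig only saves typing, and it returns nil instead of raising when a level is missing.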
Let’s try that with the XmlSimple option of forcearray => false.
>> XmlSimple.xml_in(xml, "forcearray" => false)
=> {"head"=>{},
"list"=>{"item"=>{"id"=>"1", "content"=>"chunky"}}}
A little cleaner, but problematic as we’ll see later.
Test #2: XmlSimple uses content to reference element values, so what happens if you have an attribute called content?
>> xml = "
<xml>
<head/>
<list>
<item id='1' content='chunky'>bacon</item>
</list>
</xml>"
>> XmlSimple.xml_in(xml, "forcearray" => false)
=> {"head"=>{},
"list"=>{"item"=>{"id"=>"1", "content"=>["chunky", "bacon"]}}}
By default, the value of the content attribute and the element’s text are returned together in a single array under the “content” key. Apart from their order, there’s no way to distinguish between the two.
Test #3: What happens if you have more than one <item> in a <list>?
>> xml = "
<xml>
<head/>
<list>
<item id='1'>chunky</item>
<item id='2'>bacon</item>
</list>
</xml>"
>> XmlSimple.xml_in(xml, "forcearray" => false)
=> {"head"=>{}, "list"=>{
"item"=>[{"id"=>"1", "content"=>"chunky"},
{"id"=>"2", "content"=>"bacon"}]
}}
In this example, note that <item> returns an array of two hashes. Like I previously mentioned, there’s no way for XmlSimple to know that an element will have 1 or many items. With the "forcearray" => false option, a key could return a Hash or an Array depending on the XML. Not desirable, but you can probably coerce the correct behavior with the right XmlSimple configuration options.
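If you’d rather not fiddle with options, one way to cope with the Hash-or-Array ambiguity is to normalize the value yourself before iterating. A rough sketch only: wrap_items is a made-up helper name, not part of XmlSimple, and the two literals are the "item" values from Test #1 and Test #3 pasted in by hand. Note that Kernel#Array is not a shortcut here, since Array(some_hash) turns the Hash into key/value pairs.

```ruby
# Hypothetical helper (not part of XmlSimple): always hand back an Array,
# whether the parser produced a single Hash or an Array of Hashes.
def wrap_items(value)
  return [] if value.nil?
  value.is_a?(Array) ? value : [value]
end

# The "item" value with one <item> (Test #1 with forcearray off)...
one_item = { "id" => "1", "content" => "chunky" }
# ...and with two <item> elements (Test #3).
two_items = [{ "id" => "1", "content" => "chunky" },
             { "id" => "2", "content" => "bacon" }]

[one_item, two_items].each do |items|
  wrap_items(items).each { |item| puts "#{item['id']}: #{item['content']}" }
end
```

This keeps the calling code identical for one item or many, at the cost of an extra call everywhere you read a repeatable element.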
Now, let’s take a look at XmlSimple embedded and mixed-in with the Hash class, as it is in Rails.
Test #4: What do the Rails defaults do?
>> xml = "
<xml>
<head/>
<list>
<item id='1' content='chunky'>bacon</item>
</list>
</xml>"
>> Hash.from_xml(xml)
=> {"xml"=>{"head"=>nil, "list"=>{"item"=>"bacon"}}}
No id attribute, and no content attribute either; only the element text survives.
Yikes!
Test #5: Similarly to before, what happens if you have more than one item, as in Test #3?
>> xml = "
<xml>
<head/>
<list>
<item id='1'>chunky</item>
<item id='2'>bacon</item>
</list>
</xml>"
>> Hash.from_xml(xml)
=> {"xml"=>{"head"=>nil, "list"=>{"item"=>["chunky", "bacon"]}}}
Same as before, the id attributes are removed, and <item> references both element values with a single key.
By default, Hash.from_xml in Rails will eat your attributes.
In summary, Ruby’s XmlSimple is bork^H^H^H^H surprising to use and in Rails, doubly so. Actually this really shouldn’t be surprising since most of these cautions are already mentioned on the XmlSimple homepage.
Updated: January 3rd, 2009: What to use instead of XmlSimple?
Check out libxml-ruby and, more recently, HTTParty. The HTTParty examples are a good place to start.