Pensieri di un lunatico minore

15 July 2005 Ruby, Smalltalk

Ruby versus Smalltalk, a useless comparison

Ruby is heavily inspired by Smalltalk. I’d say it’s about 50% Smalltalk-inspired, and 50% Perl-inspired. Obviously, one can detect what my preference is in those inspirations. Since I’m talking about performance, a little anyway, I’d like to compare a bit.

Here’s the situation. I have a file, with 22,291 “micro-content” XML bits. Each root “document” has 29 elements that are interesting. All I am interested in is the textual content of these elements. The code for Smalltalk is:

firewallStream := 'ns_fw.xml' asFilename readStream.
[firewallStream atEnd] whileFalse: 
    [xml := parser parse: (firewallStream upTo: Character cr) readStream.
    root := xml root.
    newEvent := NormalizedEvent new.
    newEvent xmlConversionMap keysAndValuesDo: 
        [:elementName :selector | 
        newEvent perform: selector
            with: (root elementNamed: elementName) characterData]].

The code for Ruby:

File.open('ns_fw.xml', 'r') do |file|
  file.each_line {|line|
    tmp = NormalizedEvent.new.from_xml(line)
  }
end

This uses this code, inside the class:

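require 'rexml/document'

# Populate this event from one XML record; returns self so the call can be chained.
def from_xml(xml_string)
  root = REXML::Document.new(xml_string).root
  # XML_EVENT_MAP pairs an element name with the method that receives its text.
  XML_EVENT_MAP.each do |xml_form, event_form|
    send(event_form, root.elements[xml_form].text)
  end
  self
end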

The big difference is that in the Ruby version I used a method inside the object to populate the instance attributes, as I honestly had no idea how to do this dynamically in a clean way. I’m new to Ruby, so this is likely not the optimal way to do it. Both versions do the same basic thing; I just moved the code inside the class for Ruby, which should, if anything, make it faster.
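For readers who want to run something close to this, here is a minimal sketch of the map-driven approach. The field names, the contents of XML_EVENT_MAP, and the sample record are all made up for illustration; the real class has 29 fields and its map is not shown in this post.

```ruby
require 'rexml/document'

# Hypothetical event class: the attributes and map entries below are
# assumptions for illustration, not the original NormalizedEvent.
class NormalizedEvent
  attr_accessor :source_ip, :dest_ip, :action

  # Element name => setter that receives the element's text.
  XML_EVENT_MAP = {
    'src'    => :source_ip=,
    'dst'    => :dest_ip=,
    'action' => :action=
  }.freeze

  def from_xml(xml_string)
    root = REXML::Document.new(xml_string).root
    XML_EVENT_MAP.each do |element_name, setter|
      element = root.elements[element_name]
      # Skip elements missing from this record instead of failing on nil.
      public_send(setter, element.text) if element
    end
    self
  end
end

# Made-up sample record standing in for one line of ns_fw.xml.
line = '<event><src>10.0.0.1</src><dst>10.0.0.2</dst><action>deny</action></event>'
event = NormalizedEvent.new.from_xml(line)
puts event.source_ip
```

Returning self from from_xml means NormalizedEvent.new.from_xml(line) hands back the populated event rather than the map that each returns.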

Time? 0.89 ms/iteration for Smalltalk, 24.7 ms/iteration for Ruby, a factor of roughly 28. Both are using “native” XML parsers, not wrappers for some other library. For Ruby, this is REXML. There is definitely a difference in maturity between the two VMs. This is not to say Ruby is bad, it’s probably roughly on par with Python and Perl, but it’s not quite in the league of Smalltalk (or Lisp, likely), which have extremely mature VMs with on-the-fly compilation and optimization.
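A per-iteration figure like the ones above can be obtained with Ruby’s standard Benchmark module. This is only a sketch of one way to measure it, not necessarily how the numbers in this post were taken; ms_per_iteration is a hypothetical helper and the string split is a stand-in for the per-line XML parse.

```ruby
require 'benchmark'

# Hypothetical helper: average wall-clock milliseconds per call of the block.
def ms_per_iteration(iterations)
  seconds = Benchmark.realtime { iterations.times { yield } }
  seconds * 1000.0 / iterations
end

# Stand-in workload; in practice this would be the from_xml call on one line.
sample = ms_per_iteration(1_000) { 'a,b,c'.split(',') }
puts format('%.4f ms/iteration', sample)

# Ratio of the two figures quoted in the post.
ratio = 24.7 / 0.89
puts format('Ruby/Smalltalk ratio: %.1f', ratio)
```

Averaging over many iterations smooths out timer resolution and one-off costs such as loading the parser.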

Remember, Java’s JVM (HotSpot) grew out of the VM technology built for Strongtalk, an optionally typed version of Smalltalk.

And for those who have read my red herring of performance comments, and wondered if I’m a hypocrite, the answer is yes, but not in this case. I happen to know from profiling that this component is used tens of millions of times, representing a large percentage of the time consumed, and worse, in a real-time fashion, so performance matters in this case.

Sadly, this is not something I can fix by saying “go away, damn XML atrocity,” because it’s something dictated by entities outside my influence.

This entry was posted at 10:32 pm on 15 July 2005 and is filed under Ruby, Smalltalk.

It will be interesting to see if Ruby’s performance improves with Ruby 2.0 and the new VM (YARV).

Well, as someone pointed out to me on the Smalltalk IRC channel, the VM underlying VisualWorks (my Smalltalk of choice) has had 20 years of development, if not more, by the smartest people in the world. It’s sad to say that these sorts of things are truly hard work, and so I don’t hold out a lot of hope.

This is not to say Ruby is “slow” or “too slow”, just simply that it’s different. People should always work in what they feel productive because in most cases, developer time is the most precious commodity.

I wasn’t aware that Ruby (in its current released versions) even had a VM to be optimised!

As far as I know, Ruby does what Python does: it compiles down to byte-codes and then executes them on a “virtual machine”. Perl does this, most do. I could be totally wrong, and if so, whoops, but I’m not sure how else you could do it without ending up with some massively ugly code.

I spoke too soon. It looks like Ruby runs directly off the parse tree. Wow.

Your comment that high performance VM development is hard work is an important one. The ability to easily leverage someone else’s hard work seems to gloss over much of the magic that is going on. The result is a tendency of some to trivialize high performance and even argue that it is unnecessary.

It’s true that a lot of code can be written in any language with any level of performance. Often, these are trivial examples, but the backbone of the information age requires some really fast stuff built by some really smart people. Unfortunately, these people are often forgotten because they don’t get a lot of publicity.

The solution to the REXML performance problem, by the way, is simple. Use one of the libxml2 wrappers under Linux. Or, if you’re running under Windows and don’t care about interoperability, use MSXML via the ‘win32ole’ library (included in the Ruby distribution). There’s even a pre-made wrapper for these COM objects with the Ruby 1.8.4 Windows installer – look up ‘xml.rb’ in your Ruby directory.
