struct versus xdrlib
I have been looking at storing some data on disk, and while it would be “nice” to use the pickle format, I need it to be “cross-language,” which pickle most certainly is not. That leaves me a few choices, given that I’m dealing with potentially hundreds of millions of data pairs.
- XML – Uselessly bloated and too slow.
- struct – Simple to use, but has lots of limits on what it can represent.
- xdrlib – Based on Sun’s External Data Representation, it can represent a large number of things, and is used all over the place.

Since performance is of some concern, I figured I’d do a really quick 20-line program to compare the two of them. Here are some results, dealing with pairs of 64-bit numbers. Take them as you will; these are per-second numbers.
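The program itself is not reproduced in this post. A rough sketch of that kind of measurement, assuming Python 3 and records packed as big-endian signed 64-bit pairs, might look like the following. The names PAIR, write_pairs, read_pairs, xdr_write_pairs, xdr_read_pairs, and per_second are illustrative, the xdrlib half only runs where that module still exists (it was removed from the standard library in Python 3.13), and this sketch is not what produced the numbers reported in this post.

```python
import struct
import time

try:
    import xdrlib  # deprecated in Python 3.11, removed in 3.13
except ImportError:
    xdrlib = None

# Assumption: each record is a pair of signed 64-bit integers, big-endian
# (the byte order XDR uses), so one record is exactly 16 bytes.
PAIR = struct.Struct(">qq")

def write_pairs(pairs):
    """Pack an iterable of (a, b) integer pairs into one bytes object."""
    return b"".join(PAIR.pack(a, b) for a, b in pairs)

def read_pairs(blob):
    """Unpack bytes produced by write_pairs back into a list of tuples."""
    return list(PAIR.iter_unpack(blob))

def xdr_write_pairs(pairs):
    """Same records, encoded with xdrlib's Packer as two XDR hypers each."""
    packer = xdrlib.Packer()
    for a, b in pairs:
        packer.pack_hyper(a)
        packer.pack_hyper(b)
    return packer.get_buffer()

def xdr_read_pairs(blob):
    """Decode bytes produced by xdr_write_pairs with xdrlib's Unpacker."""
    unpacker = xdrlib.Unpacker(blob)
    count = len(blob) // 16
    return [(unpacker.unpack_hyper(), unpacker.unpack_hyper())
            for _ in range(count)]

def per_second(func, arg, count):
    """Return how many records per second a single call func(arg) handled."""
    start = time.perf_counter()
    func(arg)
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")

if __name__ == "__main__":
    n = 100_000
    pairs = [(i, -i) for i in range(n)]
    blob = write_pairs(pairs)
    print("struct write/s:", round(per_second(write_pairs, pairs, n)))
    print("struct read/s: ", round(per_second(read_pairs, blob, n)))
    if xdrlib is not None:
        print("xdrlib write/s:", round(per_second(xdr_write_pairs, pairs, n)))
        print("xdrlib read/s: ", round(per_second(xdr_read_pairs, blob, n)))
```

A single timed pass like this is noisy; repeating each measurement a few times and keeping the best run would give steadier figures.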
          Read       Write
struct    421,251    446,727
xdrlib     55,886     97,181

This is a pretty major difference in performance. A bit of research turns up that the struct library is written in C, but xdrlib is written in Python. That’s likely to be the biggest difference. If xdrlib leveraged Sun’s code (written for NFS), it’d likely be just as fast. I would really prefer to use XDR, but unfortunately I suspect I’ll just fake some of its capabilities (like variable-length strings) in struct.

This entry was posted at 5:50 pm on 31 March 2005 and is filed under Python.
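One way to fake variable-length strings in struct, as a rough sketch: write a fixed-size length prefix and then the raw bytes. The layout below (4-byte big-endian length, UTF-8 data, zero padding to a multiple of four) mirrors how XDR lays out strings, but that choice and the helper names pack_string and unpack_string are assumptions for illustration, not something the post specifies.

```python
import struct

# Assumption: a 4-byte big-endian unsigned length prefix, as in XDR.
LENGTH = struct.Struct(">I")

def pack_string(text):
    """Encode text as length prefix + UTF-8 bytes + zero padding."""
    data = text.encode("utf-8")
    padding = -len(data) % 4  # bytes needed to reach a multiple of 4
    return LENGTH.pack(len(data)) + data + b"\x00" * padding

def unpack_string(buf, offset=0):
    """Return (text, next_offset) for the string starting at offset."""
    (length,) = LENGTH.unpack_from(buf, offset)
    start = offset + LENGTH.size
    end = start + length
    text = buf[start:end].decode("utf-8")
    next_offset = end + (-length % 4)  # skip the padding as well
    return text, next_offset
```

Returning the next offset lets a reader walk through several strings stored back to back in the same buffer, for example `text, pos = unpack_string(buf, pos)` in a loop.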
Have you looked at ASN.1?
Yes, and it made me want to gouge my eyeballs out. :-) ASN.1 makes XML look well thought out.
What about YAML? It’s insanely popular in the Ruby community as a kind of XML for humans. Check out Syck, a YAML parser for Ruby, Python and PHP.
YAML is neat, and I’ve used it for configuration files, etc., but this is a data storage system, and we are dealing with potentially billions of data elements, so the verbosity of text is going to bloat the storage requirements (and thereby I/O requirements) by orders of magnitude.