Clojure, Protocol Buffers and ZeroMQ, oh my!

So, what does it mean to put together ZeroMQ and Pro­to­col Buffers inside of Clo­jure? What exists below is really just a quick thrown-together com­bi­na­tion of a few sam­ple bits of code from projects. This isn’t intended as a les­son in ZeroMQ, Pro­to­col Buffers, or Clo­jure, but if you’ve got even a lit­tle bit of expe­ri­ence with a Lisp-ish lan­guage, you should be fine.

First, you’re going to need to have ZeroMQ installed, and Antonio Garrote’s post is a great guide. On Mac OS X, at least, you can’t use the MacPorts version of ZeroMQ, as it’s built 32-bit. It might work if you ran the Java VM as 32-bit, but I just built it myself, following the instructions in the guide. The only other thing I did was install the jar using the following Maven incantation:

$ mvn install:install-file -DgroupId=org.zmq -DartifactId=jzmq \
  -Dversion=2.0.7-SNAPSHOT -Dpackaging=jar -Dfile=src/zmq.jar

Once you’ve got that, you’ll need to install Protocol Buffers, and again, I built it from scratch as I couldn’t find a port in the MacPorts collection. You’ll also need to do a mvn install in the java subdirectory to get it installed in your local Maven repository ($HOME/.m2/).

Here’s my hacked-together Leinin­gen project file:

(defproject cljpbzmq "0.0.1-SNAPSHOT"
  :description "FIXME: I'm a lazy SOB"
  :dependencies [[org.clojure/clojure "1.2.0"]
                 [org.clojure/clojure-contrib "1.2.0"]
                 [org.clojars.mikejs/clojure-zmq "2.0.7-SNAPSHOT"]
                 [org.zmq/jzmq "2.0.7-SNAPSHOT"]
                 [clojure-protobuf "0.2.11"]
                 [com.google.protobuf/protobuf-java "2.3.0"]]
  :dev-dependencies [[swank-clojure "1.3.0-SNAPSHOT"]]
  :native-path "/usr/local/lib")

Once you run lein deps you should have all the pieces. If not, please post a com­ment so I can update things since I did this over 2 days and didn’t pay com­plete atten­tion the whole time.

So the first thing we need to do is cre­ate the Pro­to­col Buffer descrip­tion. I’ve stolen this whole­sale from the Java tuto­r­ial from Google, and then sim­pli­fied the code even more.

package tutorial;

option java_package = "org.amber.tutorial";
option java_outer_classname = "AddressBookProtos";

message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;
}

I put this into $PROJECT_HOME/proto/addressbook.proto, which is the place recommended by the clojure-protobuf project. To generate the Java source from the description, and then compile that into a Java class, run the following:

$ protoc --java_out=src proto/addressbook.proto
$ javac -cp lib/protobuf-java-2.3.0.jar src/org/amber/tutorial/AddressBookProtos.java

Unfor­tu­nately, I don’t use cake and wasn’t able to get the tasks to work cor­rectly from clojure-protobuf so I did it by hand. Need to write some Leinin­gen exten­sions soon.

Now, we need to write some code to actu­ally receive all those fas­ci­nat­ing mes­sages we’re going to send. Rather than keep it all inside the sin­gle process, I wanted to just throw it together as a multi-process application.

(use 'protobuf)
(use 'org.zeromq.clojure)
(import org.amber.tutorial.AddressBookProtos)

(def *ctx* (make-context 1))                  ; one I/O thread

(defprotobuf Person org.amber.tutorial.AddressBookProtos$Person)

(future                                       ; run the receive loop in the background
 (let [sock (make-socket *ctx* +upstream+)]   ; receiving end of the pipeline
   (bind sock "tcp://127.0.0.1:5555")
   (loop [msg (recv sock)]                    ; block until a message arrives
     (println (str "Received message: " (protobuf-load Person msg)))
     (recur (recv sock)))))

So let’s walk through the code. First, lines 1-3 bring in the packages that we’re going to need. On line 5, we create a “global” context for ZeroMQ. Note that since we’re not using the inproc: communication method, we pass an argument of 1 to make-context, which tells it to create a single thread in the thread pool for communication. If we were only using inproc:, we could pass 0 instead, since in-process messaging needs no I/O threads. Next, we need to build the Clojure version of the Java Protocol Buffers skeleton code. This eliminates all the tedious getters and setters that Java developers so love, and makes it behave much more like a proper Clojure data structure.

Finally, we get into the actual code, where we create a socket, bind it, and sit and wait for messages to come in. Those messages then get passed through protobuf-load to de-serialize them, and poof, they are printed. The loop runs forever, waiting for the next message.

Now, we have the sender code.

(use 'protobuf)
(use 'org.zeromq.clojure)
(import org.amber.tutorial.AddressBookProtos)

(def *ctx* (make-context 1))                  ; one I/O thread

(defprotobuf Person org.amber.tutorial.AddressBookProtos$Person)

(doseq [i (range 0 5)]                        ; start five senders
  (future (let [s (make-socket *ctx* +downstream+)] ; sending end of the pipeline
            (connect s "tcp://127.0.0.1:5555")
            (loop [c 0]
              (send- s (protobuf-dump         ; serialize to bytes
                        (protobuf Person      ; build a Person message
                                  :id (* i c)
                                  :name "Bob"
                                  :email "bob@foobar.com")))
              (Thread/sleep (rand 5000))      ; pause for up to 5 seconds
              (recur (inc c))))))

This cre­ates 5 sep­a­rate threads — that’s what future does under­neath — and sends mes­sages. Not a lot dif­fer­ent here, except instead of protobuf-load we use protobuf with the Person “object” to build a new data struc­ture, and then protobuf-dump to seri­al­ize it into its native binary for­mat. And as “they say”, Bob’s your uncle.
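To make “native binary format” a bit more concrete, here’s a rough Python sketch that hand-encodes the Person message from the .proto file above, using only the standard library. The helper names (encode_varint, encode_person and friends) are my own and not part of any Protocol Buffers library, and it only handles non-negative ids. Real code should use the generated classes; this just shows roughly what ends up on the wire.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Wire type 2 (length-delimited): tag, byte length, UTF-8 payload."""
    payload = text.encode("utf-8")
    tag = (field_number << 3) | 2
    return encode_varint(tag) + encode_varint(len(payload)) + payload


def encode_int_field(field_number: int, value: int) -> bytes:
    """Wire type 0 (varint): tag, then the value itself."""
    tag = (field_number << 3) | 0
    return encode_varint(tag) + encode_varint(value)


def encode_person(name: str, person_id: int, email: str = None) -> bytes:
    """Hypothetical helper mirroring the Person message: name=1, id=2, email=3."""
    data = encode_string_field(1, name) + encode_int_field(2, person_id)
    if email is not None:  # optional field: simply left out when absent
        data += encode_string_field(3, email)
    return data
```

Calling encode_person("Bob", 42, "foo@bar.com") gives 20 bytes: each field is a small tag followed by its value, with no field names on the wire at all.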

One final bit is that the whole (protobuf Person ...) nonsense seems a bit annoying to me. Until I figure out a better way to do it, I’ve taken to using partial application to hide it all:

(def pb-person (partial protobuf Person))

Now, you can sim­ply do:

(pb-person :id 42 :name "Bob" :email "foo@bar.com")

It’s not per­fect, but it’s a start.
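If partial application is unfamiliar, the same trick can be sketched in Python with functools.partial. The protobuf function here is a hypothetical stand-in that just builds a dict; it is not the clojure-protobuf API, only an illustration of fixing the first argument ahead of time.

```python
from functools import partial


def protobuf(message_type, **fields):
    """Hypothetical stand-in for a message constructor; returns a plain dict."""
    return {"type": message_type, **fields}


# Fix the first argument once, so callers only supply the fields.
pb_person = partial(protobuf, "Person")

bob = pb_person(id=42, name="Bob", email="foo@bar.com")
```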

Pear and goat cheese parfait

A dessert for a cool fall evening if I’ve ever seen one.


Comice pear with goat cheese, honey, lime and grains of par­adise.

The coming IPocalypse, Pt 2 — The Addressening

For most peo­ple, if they think about IP address­ing at all, they see it as a 1:1 assign­ment of addresses to machines. This isn’t true though, either in the­ory or prac­tice. A sin­gle machine can con­tain at least the fol­low­ing address types:

  1. IPv4 uni­cast — 192.168.1.10
  2. IPv4 loop­back — 127.0.0.1
  3. IPv4 broad­cast — 192.168.1.255
  4. IPv4 mul­ti­cast — 224.0.1.41
  5. IPv6 uni­cast — 2001:DB8:0:0:0202:B3FF:FE1E:8329
  6. IPv6 loop­back — ::1
  7. IPv6 any­cast — 2001:DB8:0:0:0:0:0:0
  8. IPv6 mul­ti­cast — FF05:0:0:0:0:0:1:3
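The address types in the list above can mostly be told apart in code. Here’s a small sketch using Python’s standard ipaddress module; the classify function and its labels are my own invention. Two caveats: a broadcast address can only be recognized relative to a known network, so that is an optional argument here, and an anycast address looks exactly like a unicast one, so it is reported as unicast.

```python
import ipaddress


def classify(address, network=None):
    """Return a rough label such as 'IPv4 loopback' for an address string.

    `network` (e.g. "192.168.1.0/24") is optional and only used to spot
    the IPv4 broadcast address of that network.
    """
    ip = ipaddress.ip_address(address)
    family = f"IPv{ip.version}"
    if ip.is_loopback:
        kind = "loopback"
    elif ip.is_multicast:
        kind = "multicast"
    elif (network is not None and ip.version == 4
          and ip == ipaddress.ip_network(network).broadcast_address):
        kind = "broadcast"
    else:
        # Anycast cannot be distinguished from unicast by looking at the address.
        kind = "unicast"
    return f"{family} {kind}"


for addr in ("192.168.1.10", "127.0.0.1", "224.0.1.41", "::1", "FF05:0:0:0:0:0:1:3"):
    print(addr, "->", classify(addr))
```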

That doesn’t even begin to get into the IPv6 differentiation between global, link-local and site-local addresses. By the way, it’s important to note that IPv6 does not have the idea of a broadcast address. Instead, there’s a mandatory link-local all-nodes multicast address, FF02::1. A single interface on a machine could carry any number of these addresses, and with the rise of virtualization, this is a distinct likelihood. So what does this look like:

Now, when we intro­duce vir­tu­al­iza­tion into the pic­ture, we get this lovely thing:

As you can imag­ine, this can get very com­pli­cated quite quickly. If one of the things you’re try­ing to do is under­stand traf­fic flows — say for a secu­rity project — you need to know when mul­ti­ple addresses are shared by a sin­gle instance of an oper­at­ing sys­tem. It would also be nice to know when those oper­at­ing sys­tems are shar­ing a sin­gle piece of hard­ware. All of this requires some help­ful abstractions.


The coming IPocalypse, Pt 1

Nearly 15 years ago, the IETF gave final approval to the next ver­sion of IP: IPv6. This came after sev­eral years of exten­sive research, pro­to­typ­ing and work by a lot of very smart peo­ple attempt­ing to solve the prob­lems fac­ing the then-current ver­sion, IPv4. The prob­lem is, in the inter­ven­ing years, IPv6 has not been rolled out into pro­duc­tion on any sub­stan­tive basis. While Internet2 has made some effort at rolling it out, the pen­e­tra­tion is still not full, and IPv4 con­tin­ues to run in parallel.

So what does all this mean? Well, a few weeks ago, OMB released a memo (PDF) instructing federal agencies to begin migration now. The problem? Well, who remembers GOSIP, the US government mandate that all agencies use OSI protocols for their networks? It was issued in 1990, and by 1995 it had been watered down to effectively say “well, use something”. IETF-based solutions like IP were the dominant solutions by that point. This is to say that, as interesting as an OMB mandate might appear to be, it’s effectively a toothless piece of paper that has zero impact.

So where does that leave us?


Pick a peck of pickled encodings

In a previous post, I talked about ZeroMQ, and how it handles a lot of the underlying pieces/parts for you when you’re writing a distributed application. One thing it doesn’t deal with is the encoding of application-level data onto the wire. It just moves byte-based messages around. For that, you’ll need some kind of encoding scheme. A handful of choices dominate the options out there:

  • XML
  • ASN.1
  • XDR
  • JSON
  • Pro­to­col Buffers
  • Thrift

Each of these for­mats has trade-offs that have to be eval­u­ated. In this post, I’m going to take a look at each of them — sorta — and dis­cuss a bit of the pros and cons of each.


Refactoring a life, Pt 1

Within the soft­ware devel­op­ment com­mu­nity, the term refac­tor­ing is used pretty fre­quently. Mar­tin Fowler defines it as:

Refac­tor­ing is the process of chang­ing a soft­ware sys­tem in such a way that it does not alter the exter­nal behav­ior of the code yet improves its inter­nal structure.

The ques­tion for me is, what hap­pens if we apply some of the same prin­ci­ples of soft­ware refac­tor­ing to a person’s life?

For soft­ware, the “exter­nal behav­ior” refers to the actual func­tion­al­ity of the soft­ware. What does it accom­plish? To what pur­pose can it be put? For a human being, though, it’s more about sur­vival in the world and the hap­pi­ness we try to pur­sue while sur­viv­ing. It’s not a per­fect metaphor — what one is? — but it’ll do for this purpose.

Think of this as the first of an n-part series on figuring it out. It may take a while.

Thinking in Ø

If Berke­ley sock­ets didn’t exist, some­one would have to invent them. If that some­one then focused more on the abstrac­tion of what you want to accom­plish, rather than how, what you would get is ZeroMQ. If you then stole a char­ac­ter from Nor­we­gian, you’d get ØMQ. The cre­ators of ZeroMQ define it as:

ØMQ looks like an embed­d­a­ble net­work­ing library but acts like a con­cur­rency frame­work. It gives you sock­ets that carry whole mes­sages across var­i­ous trans­ports like inproc, IPC, TCP, and mul­ti­cast. You can con­nect sock­ets N-to-N with pat­terns like fanout, pub­sub, task dis­tri­b­u­tion, and request-reply. It’s fast and small enough to be the fab­ric for clus­tered prod­ucts. Its asyn­chro­nous I/O model gives you scal­able mul­ti­core appli­ca­tions, built as asyn­chro­nous message-processing tasks. It has over twenty lan­guage APIs and runs on most oper­at­ing sys­tems. ØMQ is open source and fully sup­ported by iMatix.

That’s a lot of ter­mi­nol­ogy in one def­i­n­i­tion, so let’s look at it in turn.

Embed­d­a­ble
This is key. ØMQ is not an appli­ca­tion. It’s a library, and it’s designed to be embed­ded into your application.
Con­cur­rency
The library is designed to deal with a lot of the issues around thread­ing and asyn­chro­nous responses. This helps enor­mously in build­ing more scal­able appli­ca­tions. In addi­tion, it allows the devel­oper to ignore a lot of that and write more famil­iar code.
Message-oriented
From the zmq_socket ref­er­ence: “Where con­ven­tional sock­ets trans­fer streams of bytes or dis­crete data­grams, ØMQ sock­ets trans­fer dis­crete mes­sages.” You either get the entire mes­sage, or you get nothing.
Trans­port
ØMQ is not designed just for net­work appli­ca­tions. While it obvi­ously can run over TCP/IP, and even mul­ti­cast PGM, it is just as happy run­ning inside a sin­gle machine (IPC), or even a sin­gle process space (inproc). The only dif­fer­ence is how you bind things together.
Pat­terns
Pat­terns can often be a dirty word. Some would argue that pat­terns are often com­pen­sa­tion for the fail­ure of a lan­guage, but in this case, pat­terns refer to com­mu­ni­ca­tion pat­terns. ØMQ sup­ports 4 core com­mu­ni­ca­tion pat­terns: publish/subscribe, request/response, pipeline and exclu­sive pair. The last is the most sim­i­lar to what peo­ple are famil­iar with when it comes to socket pro­gram­ming. It also sup­ports the tools nec­es­sary to cre­ate routers and inter­me­di­ary sys­tems with­out major effort.
Asyn­chro­nous
This is key. Effec­tively, you queue a mes­sage and then for­get about it. You don’t have to worry as much about how things work under-the-hood.
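To appreciate the message-oriented point, it helps to see what you’d otherwise write by hand. A plain stream socket only hands you bytes, so you need your own framing to get whole messages back out. Here’s a minimal sketch of length-prefixed framing using only the Python standard library. The function names are mine, and this is not ØMQ’s actual wire protocol, just an illustration of the kind of bookkeeping it takes off your plate.

```python
import socket
import struct


def send_message(sock, payload: bytes) -> None:
    """Prefix the payload with its length as a 4-byte big-endian integer."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_exact(sock, count: int) -> bytes:
    """Keep reading until exactly `count` bytes have arrived."""
    chunks = []
    while count:
        chunk = sock.recv(count)
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        chunks.append(chunk)
        count -= len(chunk)
    return b"".join(chunks)


def recv_message(sock) -> bytes:
    """Read one whole message: the length header first, then the body."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)


def demo():
    """Send two messages back to back over a stream and read them as two."""
    left, right = socket.socketpair()
    try:
        send_message(left, b"hello")
        send_message(left, b"world")
        return [recv_message(right), recv_message(right)]
    finally:
        left.close()
        right.close()
```

Without the length header, the reader could just as easily see "helloworld" as one blob, or "hel" and "loworld". With ØMQ you never write this layer yourself.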

So, for me, the best way to think of ØMQ is as a robust toolkit for building distributed applications, at both large and small scales. I was confused at first by the similarity between the name and tools like RabbitMQ, ActiveMQ and WebSphere MQ. ØMQ is really the set of tools you could build things like those with, while also offering a simpler approach. There are a few things I’ve learned about it that I like.

While with The Matrix there was no spoon, with ØMQ, there is no broker. Or, more truthfully, the broker is everywhere. It’s also very easy to use. Here’s an example from the manual that implements a network “hello world”:

import time

import zmq

context = zmq.Context()
socket = context.socket(zmq.REP)
socket.bind("tcp://*:5555")

while True:
    # Wait for next request from client
    message = socket.recv()
    print("Received request:", message)

    # Do some 'work'
    time.sleep(1)

    # Send reply back to client (bytes, as required on Python 3)
    socket.send(b"World")

In it, you can see a couple of things. First, we create a Context, which is a handle for all the backend behavior of ØMQ. Then, we create a socket, where we specify what kind of socket we want. This is the pattern that the socket is implementing. Do not think of this as a socket in the Berkeley sockets world, because it’s not. It’s an “end point” for communication. We then “bind” the socket to a specific end point location, namely port 5555 for a TCP socket. The “*” means it will bind to all interfaces. If we also wanted to listen on another port, let’s say port 6666, we could simply add another line after it:

socket.bind("tcp://*:5555")
socket.bind("tcp://*:6666")

From the developer’s per­spec­tive, we only have to deal with that one socket end point, and not mul­ti­ple ones. This is a big dif­fer­ence between ØMQ and tra­di­tional socket pro­gram­ming. You could even bind it to mul­ti­cast and IPC mech­a­nisms and not have to worry about it. To me, this is a huge gain over tra­di­tional socket pro­gram­ming. Once we’ve bound things, we sit and wait for mes­sages to show up, using socket.recv(). Pre­tend to do some work, and then send some­thing back. It’s that easy. In fact, ØMQ even takes care of things like LRU load bal­anc­ing and fair-queuing for you. It might not be a 100% solu­tion for every­one, but it is for 99% of applications.

So what else is fun and excit­ing in ØMQland? Well, first, you can con­nect to a “remote” socket before the server actu­ally starts the socket up for lis­ten­ing. In tra­di­tional pro­gram­ming, this would go boom and you’d have to catch it. With ØMQ, it just sits wait­ing. Obvi­ously, this may not be what you want, but it often is. It “just works”.
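For contrast, here’s roughly what the “go boom” case looks like with plain sockets, as a Python sketch using only the standard library. The connect_with_retry helper and its defaults are made up for illustration; the point is just that with traditional sockets the retry loop is your problem.

```python
import socket
import time


def connect_with_retry(host, port, attempts=5, delay=0.2):
    """Try a plain TCP connect, sleeping and retrying while it is refused.

    Returns the connected socket, or raises ConnectionRefusedError once
    the attempts run out.
    """
    for _ in range(attempts):
        try:
            return socket.create_connection((host, port), timeout=1.0)
        except ConnectionRefusedError:
            # Nobody is listening yet: this is the "boom" you have to catch.
            time.sleep(delay)
    raise ConnectionRefusedError(
        f"no listener on {host}:{port} after {attempts} attempts")
```

With ØMQ, the connect call simply returns, and the library keeps trying in the background until the other side shows up.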

So, everything is happy rainbows and ponies? No, there are some big impediments. First, until recently the documentation was spotty and inadequate for something as intricate as a messaging solution. This has recently been rectified with a new manual, which brings a lot of clarity to the situation. There are many things that now make sense that didn’t before. I highly recommend it as a good place to start.

In addition, unlike with a lot of tools, Java is a definite second-class citizen. It requires the baroque creation known as JNI, and, at least on the Mac, there are definite issues with 32- vs 64-bit libraries. The libraries that are built if you use MacPorts are 32-bit and you’ll have a very hard time getting them to work (if you even can) with the JNI-based Java libraries. I had to build everything from scratch, but Antonio Garrote has a great blog post on it. Once I did that, the Clojure and Java interfaces worked fine, but reading online it seems like they’re definitely not a focus. Python interfaces, however, work pretty cleanly from the start.

Next, while a bunch of transports are supported and binding to them is trivial, they do not all behave exactly the same. There are a few edge cases touched on in the manual, about inproc especially. This makes life slightly less double-rainbow. There’s also a general overloading of terms like “socket” with meanings that differ from their traditional ones. This is a pet peeve of mine, and I think it creates some confusion for developers trying to learn.

All told, though, it’s a great tool, and likely what I’ll be using to build network-based solu­tions. While it’s also designed to deal with multi-threaded prob­lems, most of my multi-threaded code already uses actors or STM to solve a lot of issues, and I find that works bet­ter for my needs. Your mileage may vary, espe­cially if you’re in a lan­guage that doesn’t have either.

Grado iGi in-ear monitors

For a while, I’ve been “sur­viv­ing” with the iPhone ear­buds, and they are, to put it sim­ply, dread­ful. They aren’t the worst I’ve ever heard, but they’re def­i­nitely a tri­umph of style over sub­stance. Given I also own a pair of Ety­motic Research ER-6i ear­phones, why was I using the dread­ful Apple bits? Sim­ple … the ER-6i made my ear hurt, though not because of the design. No, it was the “sound”. It sim­ply was painful to lis­ten to them for any extended period, some­thing I didn’t learn until after hav­ing had them for a while. The boost in the 2kHz and up range made my head hurt — per­haps from hear­ing dam­age as a child — and I couldn’t lis­ten to them for more than 15 – 20 min­utes before hav­ing to surrender.

So, when I wanted to lis­ten to things, I returned to my ever trust­wor­thy Grado Labs head­phones, for­ever wish­ing that they’d finally release an in-ear prod­uct wor­thy of their sto­ried name. And, now, they have. In fact, they’ve released two dif­fer­ent in-ear mon­i­tor head­phones: the afford­able iGi and the more up-market GR8. Since what I was look­ing for was some­thing to wear dur­ing my com­mute, the gym and per­haps at work, I went with the iGi, which arrived this afternoon.

The initial verdict? They’re definitely voiced like a Grado product. Right now, there’s a tiny bit of breaking in to be done (Grado claims 100 hours), but already they’re light-years ahead of the pitiful Apple product and the tiresome Etymotic headphones. I’m looking forward to wearing them as often as I want, and now all I need to do is figure out the best set of tips to use.

Pushing the waterfall uphill

The ideas behind agile development methodologies are not new. They were more formally stated in The Agile Manifesto. The ideas behind more basic iterative development are even older, and have their own trade-offs. Both contain the idea that you cannot know everything up front. The agile methodology states this idea explicitly:

Wel­come chang­ing require­ments, even late in
devel­op­ment. Agile processes har­ness change for
the customer’s com­pet­i­tive advantage.

Iterative, on the other hand, allows for change between iterations, though not within them in any meaningful fashion. Each has its advantages, and largely they are cultural differences. Some organizations simply aren’t capable of one or the other, and that’s OK.

The project I’m working on now is attempting to be iterative. We are trying very hard, but the difficulty is that the client is absolutely insistent that we define every single requirement up front before we do anything. For example, we are developing use cases to document the external actors’ interactions with the system. A simple enough idea, and one that is very useful in a large system such as this. The problem is two-fold, however. First, we are expected to complete the use case in one iteration and never touch it again. Touching it again requires a contract change. Brilliant. Equally bad, but more subtly so, is the fact that there is entirely too much detail in the use cases.

To me, use cases represent value when they “tell a story” about how an actor (user) interacts with the system. They are told clearly from the actor’s perspective, and discuss how the actor accomplishes their goal, but not how the system works underneath. Unfortunately, here we are putting in detailed entities, attributes and service (SOA) calls that will be considered part of the contract. This makes it nearly impossible to “do the right thing” when the time comes, and forces an inordinate amount of design detail into a period of time where not enough information exists to make the decision.

In the end, it feels like a water­fall process, but with use­less labels applied. You can pre­tend it’s “agile”, or it’s “iter­a­tive”, but when the client looks at you in a meet­ing, and with a straight face and no sense of irony says: “This project is too big to be agile” you know some­thing is going to go ter­ri­bly wrong. The larger the project, the more impor­tant it is to be agile and adap­tive. The scope is entirely too big to ever know every­thing up front. When they say they want to define every sin­gle require­ment before begin­ning the design, and will not allow revis­it­ing of the require­ments, you know it’s going to be a long uphill battle.

There has to be a bet­ter way. I’m no Mar­tin Fowler, but this just doesn’t feel like a good idea.

A beautiful Google Reader application for the iPhone

Main screen of the Reeder 2 iPhone application for reading Google Reader feeds

Currently, I subscribe to just over 740 RSS/Atom feeds in Google Reader. This has replaced my old stand-by of NetNewsWire that I used forever. There are two main reasons for this. First, it’s available and always synchronized, something NetNewsWire wasn’t at the time. More important, though, is the “social” aspect that it brings with sharing of links with my friends, as well as comments about those links. It’s helped me discover a lot of new things over the years.

One of the down­sides of that many feeds is the sheer quan­tity of items that show up every day. It ranges from around 450 on a “slow” day to over 1,200 on a reg­u­lar day. This makes it quite dif­fi­cult to get through every­thing, and often I let some cat­e­gories just lan­guish until I just wipe them clean. Not use­ful! All that has changed based on a tweet by Jacob Kaplan-Moss that alerted me to a beau­ti­ful ded­i­cated Google Reader appli­ca­tion for the iPhone: Reeder.

So what makes Reeder such a plea­sure to use?

  • It is an iPhone app. I mean this in the sense that it behaves in a very touch-centric way, and has all the nice touches that a good iPhone application should have. Like Tweetie, it has physics in its UI that feel “right”.
  • It’s rea­son­ably fast. The first sync took a lit­tle bit of time — per­haps a minute over 3G — but after that it’s been quite fast for me.
  • It works with other online services like Twitter, Instapaper and Delicious. It also lets me share things with my friends on Google Reader.
  • I keep discovering little touches that feel like presents. For example, swiping over an item lets you mark it read without looking at it.

All told, for $2.99, it’s a bargain. On my 30 minute ride home, I was able to sift through about 750 links, read some, throw some at Instapaper for later and mark the rest as read. Highly recommended.