Pensieri di un lunatico minore

8 March 2005 Technology

The red herring of performance

I’m on a bunch of mailing lists, as is everyone else in the technical world, and on one of them, there’s been a recent discussion about how to interface a library to the web. Lots of people have been spouting things like:

Python is great, but use C++ because Python is too slow.

Or Perl, or PHP, or whatever. Here are the rules:

  1. Make it work
  2. Make it right
  3. Make it fast enough

    That’s it people. Premature optimization is the mental masturbation of geeks obsessed with complexity over anything else. If you think your site is going to have performance problems: You are wrong. You aren’t writing Yahoo, or Google, and if you are, you can solve the problems incrementally as they happen, and as you understand them.

    Write in whatever language you feel most comfortable and productive in. I know quite a few Python (either Zope or SkunkWeb) based websites that are running millions of hits per day. That puts them in the top 1%. Your site isn’t like that. If you think it is, you’re wrong. Maybe someday it might be, but deal with it then.

    Also, your performance problems? Algorithmic, most likely. O(2^N) sucks in every language for any realistic value of N. Use a good algorithm and you’ll not have performance problems for a long time. One of the nice things about most of the scripting languages (as well as languages like Lisp and Smalltalk) is that they have extremely optimized algorithms under them. You cannot manage your memory better than they can, unless you are the world’s finest programmer. You’re not. You will probably never write a better hashed map algorithm, or faster list handling with any flexibility. It’s just absurd.
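A minimal sketch of the point about algorithms, assuming a made-up task (checking a list for duplicates) that is not from the original discussion. Both functions are plain Python; the only difference is that the second leans on the built-in hashed set instead of comparing every pair.

```python
import timeit


def has_duplicates_quadratic(items):
    """Compare every pair of items: O(n^2) comparisons in the worst case."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False


def has_duplicates_hashed(items):
    """Use the built-in hashed set: O(n) on average."""
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False


if __name__ == "__main__":
    data = list(range(2000))  # worst case for both: no duplicates at all
    for fn in (has_duplicates_quadratic, has_duplicates_hashed):
        seconds = timeit.timeit(lambda: fn(data), number=3)
        print(f"{fn.__name__}: {seconds:.4f}s")
```

Timings will vary by machine, so none are quoted here; the gap should widen quickly as the list grows, which is the whole argument: the algorithm dominates, not the language.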

  1. Make it work
  2. Make it right
  3. Make it fast enough

    End rant.

    This entry was posted at 10:18 am on 8 March 2005 and is filed under Technology.

    My work project is in C# (mostly). There’s exactly one place that’s performance critical, and I sped that up incrementally as I went along.

    And in fact, it is a mostly-exponential problem, and speeding it up meant figuring out ways to fail fast before testing all the combinations.
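One way to picture "failing fast" on an exponential search is a sketch like the following. It is not the commenter's C# code or problem; it assumes a stand-in task (find every subset of positive integers that sums to a target), and the function name `subsets_summing_to` is hypothetical. The search abandons a branch as soon as the partial sum can no longer succeed, rather than generating all 2^N subsets and testing each.

```python
def subsets_summing_to(values, target):
    """Return every subset of the positive ints in `values` that sums to `target`."""
    values = sorted(values)
    results = []

    def extend(start, chosen, total):
        if total == target:
            results.append(list(chosen))
            return  # values are positive, so adding more would overshoot
        for i in range(start, len(values)):
            if total + values[i] > target:
                break  # fail fast: the list is sorted, so later values overshoot too
            chosen.append(values[i])
            extend(i + 1, chosen, total + values[i])
            chosen.pop()

    extend(0, [], 0)
    return results


if __name__ == "__main__":
    print(subsets_summing_to([3, 5, 2, 8, 1], 8))
```

The worst case is still exponential; the pruning only shrinks the part of the search tree that actually gets visited.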

    C# isn’t my favorite language, but I would not have wanted to write the algorithm in C++ and have to worry about pointers and memory allocation.

    No shit man.

    I can’t believe people are still arguing this.

    What planet do these people live on, where one millisecond in performance improvement is worth 20 hours of developer time?

    1 millisecond may be worth it, if you do something a million times. This is why profiling is the only way to determine what is the correct focus. Performance can often be an issue, but it’s insane to think you know where it will be before you actually write the code and shove it in a test jig and find out.
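For scale: one millisecond saved on something done a million times is about 1,000 seconds, or roughly 17 minutes. Finding out where the time actually goes could look like the sketch below, which uses Python's standard `cProfile` and `pstats` modules. The workload (`build_report`, `format_row`) and the `profile` helper are hypothetical names invented for the example.

```python
import cProfile
import io
import pstats


def format_row(row):
    """Hypothetical per-row work."""
    return f"{row:08d}"


def build_report(rows):
    """Hypothetical workload: format every row, then sort the result."""
    formatted = [format_row(r) for r in rows]
    return sorted(formatted)


def profile(fn, *args):
    """Run fn under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(fn, *args)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(10)  # top 10 by cumulative time
    return result, out.getvalue()


if __name__ == "__main__":
    _, report = profile(build_report, list(range(100_000, 0, -1)))
    print(report)
```

The report lists call counts and time per function, which is what tells you whether the formatting or the sorting deserves attention. Guessing beforehand is exactly what the comment warns against.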

    It’s interesting to try to argue this “myth of scaling” with marketers who insist on designing the interface of the product as if it, too, will need to deal with best-case scales: “there might be tens of thousands of users in that list!” “We think our customers will store thousands of files in their storage folders!” No, you’re wrong, we really have dozens of users in that list and each one might keep tens of files in their storage area.

    Building a UI around those insane assumptions of glorious success is another kind of mental masturbation and leads to bad products. And, like optimization, it’s not hard to build UIs that can be “scaled up” when the time comes.

    In this same project I had to scale a list UI from tens to (a few) thousand. (It involves adding sorting and filtering. Maybe some searching in the future too.)

    The bottom line is that the fastest x86 CPU available right now costs a few hours of programming time, if that. As long as your application wasn’t written by a moron, you’re better off just buying a faster CPU or another machine to add to a cluster. That’ll have the effect of making all your apps faster, and it’s cheaper.

    I use FreeBSD at my hosting service, and I had a guy explain to me one day that I would get a 10% performance gain just by switching to Linux 2.6. I have no doubt that it’s a great kernel, but my machines are dual P3-750’s that are 99% idle. I laughed and said “wow, my machines could be 99.1% idle!”

    The only web site that I’ve seen with a performance problem was written in ColdFusion by a sizeable local company that was supposedly in the business of doing such. It was actually crashing the shared Win2K server where it resided. I was tasked with looking into it (and later rewriting it in PHP) and found out why it was so awful.

    The front page of the site required 193 queries to display. When I wrote it there were 5 or so. And their queries were heavy, some pulling in millions of characters from the db just to get a row count or determine whether to show a link. The only defense the programming shop had to offer was “but, he’s our best programmer.”
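A small sketch of the "millions of characters just to get a row count" pattern, using Python's built-in `sqlite3` module rather than the ColdFusion or PHP code described above. The `articles` table and both function names are assumptions made up for the illustration.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory database with a hypothetical `articles` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany(
        "INSERT INTO articles (body) VALUES (?)",
        [("x" * 10_000,) for _ in range(200)],  # 200 rows of 10,000 characters
    )
    conn.commit()
    return conn


def count_by_fetching_everything(conn):
    """The wasteful pattern: pull every row and every body just to count them."""
    return len(conn.execute("SELECT * FROM articles").fetchall())


def count_in_database(conn):
    """Let the database do the counting; only one integer comes back."""
    return conn.execute("SELECT COUNT(*) FROM articles").fetchone()[0]


if __name__ == "__main__":
    conn = make_demo_db()
    print(count_by_fetching_everything(conn), count_in_database(conn))
```

Both functions return the same number, but the first drags two million characters out of the database to get it. No choice of implementation language fixes that.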

    Unless your code is written by a moron like that, it probably just doesn’t matter.

    193 queries? That’s not many. I’ve worked on a product that takes over 1,500 queries to produce an HTML page.

    But I still agree, no matter which language you choose a bad programmer will still make bad code.

    The First Rule of Program Optimization
    Don’t do it.

    The Second Rule of Program Optimization (for experts only)
    Don’t do it yet.

    Michael Jackson “Principles of Program Design” 1975

    Hate to burst your bubble, but IMNSHO, if your web site doesn’t incorporate Performance By Design, and performance isn’t carefully “built in”, you are putting your business at risk.

    The most common performance problem on the Internet isn’t N squared algorithms, although there are programmers stupid enough to use them. The most common performance problem on the Internet is assuming your visitors have more bandwidth and fatter clients than they actually have!

    I struggled with 28.8 kilobits for a number of years before the local cable company finally got out here and installed broadband. And the phone company still can’t get DSL here!

    Ed Borasky
    http://www.borasky-research.net/

    Business is at risk for lack of value, not performance. I have watched thousands of companies fail, but find me one that failed because of performance, and I can show you 10,000 that failed because of poor value.

    Put the investment where it matters, not where it gives you a mental hard-on.

    > The bottom line is that the fastest x86 CPU available right now costs a few hours of programming time, if that.

    WRONG

    Even if the hardware is FREE, it is still very expensive. You need to pay IT to install it and keep it running. You need to pay for the power it draws. You need to pay for the bandwidth it consumes. You need to pay for the rack space it takes up. You need to pay for the extra A/C to keep it cool.

    We aren’t talking about another machine or two thrown under your desk. Running a datacenter can be very expensive.

    As I’ve often said, time to market is the most valuable commodity you can have, and giving it up, even to save power or A/C, is absurdly stupid. As for bandwidth, it’s effectively identical either way, so that’s just another red herring. Management of servers in this kind of environment had best be automated, so the incremental cost of maintenance is near zero once the investment is made. You don’t think Google has to buy admins as fast as they buy computers, do you?

    For a while, I worked with someone who, when we started running into scalability problems, thought that the whole premise of writing a client application in C# that actually talked to a database was absurd. The only acceptable architecture for a non-trivial database application was to build a server-side app in C++ that bound itself tightly to the database instance (using sockets to maximize throughput, if I remember correctly) and perform virtually all the operations there, so that we only had a tiny client app and minimized the data traffic.

    The fact that it would have taken three years to start over from scratch and build this optimized monster merely meant that as far as he was concerned, the project was doomed beyond hope.

    Of course, we simply found the worst bottlenecks (mostly, requesting drastically more data than we needed at a time), and now the application performs just fine.

    Personally, my only question about scalability in selecting a platform is whether the platform has a proven enough track record in other deployments to have confidence. If we’re the first company to ever use Uncle Bob’s Web Server in a real deployment, and it’s impractical to go inside and rip it apart to make it faster (for example, we might not have source), then we might be better off choosing a different platform. If others have gotten it to scale and we don’t have any remarkable requirements, we should be able to get it to scale as well.

    [...] Speed it up. The biggest performance improvement you can make is getting it to run. Like here and here. [...]

    It depends a whole lot on what you’re writing.

    I’ve written several scientific apps and evolutionary computation engines. In that kind of programming, you’re pushing present-day computers (yes, even today’s dual-core monsters) to their utmost limit and you’re still waiting hours or days for a result. If you can take a few hours off your run time by using inline MMX assembly language, you do it.

    But I agree… this is seldom a problem for web apps. Just had to be macho and point out that some things still take REAL ULTIMATE POWER!

    While the adage certainly holds that you shouldn’t be coding functions in assembly at the outset of your project, sticking your head in the sand about performance until it’s too late is hardly a recommended practice. This is especially true when you’re talking about something as all-encompassing as the platform.

    And it is remarkable that anyone would claim that performance doesn’t matter, when it’s one of the primary reasons software systems get abandoned. Once you’ve built a huge system with a “deal with it later” mentality, they’ll deal with it by punting the system and getting something that doesn’t die the second it’s mentioned on a meme site.

    None of that says anything about whether performance concerns of the languages mentioned matter. I mean at the outset someone might note that there is some overhead, but that selective caching can be used. That’s entirely different than pretending that performance doesn’t matter.
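As a rough illustration of what "selective caching" might mean in practice, here is a sketch using Python's standard `functools.lru_cache`. The page structure, the function names, and the 10 ms delay are all invented for the example; the idea is only that the expensive shared fragment is cached while the cheap per-user part is rebuilt on every request.

```python
import functools
import time

CALLS = {"render": 0}  # counts how often the slow path actually runs


@functools.lru_cache(maxsize=128)
def render_sidebar(category):
    """Hypothetical slow fragment, e.g. several queries plus templating."""
    CALLS["render"] += 1
    time.sleep(0.01)  # stand-in for the real work
    return f"<ul><li>{category}</li></ul>"


def render_page(category, user):
    """Only the shared fragment is cached; per-user content is rebuilt each time."""
    return f"<h1>Hello {user}</h1>{render_sidebar(category)}"


if __name__ == "__main__":
    render_page("news", "ann")
    render_page("news", "bob")
    print(CALLS["render"])  # the sidebar was rendered once and reused
```

A real site would also need a way to invalidate the cached fragment when the underlying data changes; `render_sidebar.cache_clear()` is the bluntest option.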

    Adam, even in that situation, I would argue that you have entered a “black magic” realm where the intricacies of each individual system design must be considered. I remember writing Lisp* for a Connection Machine, and it required all sorts of odd things because of the nature of the machine. This is not the case for 99% of applications that are written.

    Dennis, I would respectfully disagree. My experience is that most apps are abandoned because they fail to meet business needs, not because of speed. Speed might be an annoyance, but if they solve real business problems, then they aren’t a major issue. Not solving the business issue and being really fast is pointless.

    Fair point, but I’ve done a lot of time in the corporate space (I’ve served my debt to society!) and I’ve seen solution after solution discarded because it couldn’t handle the business even with significant amounts of hardware. Whether it was an insurance backend that was replaced by a mainframe solution, the Excel spreadsheet that was used for modelling until it died, the Access application, and so on: these “platforms” and early design decisions spelled doom for the projects.

    I entirely agree that there is such a thing as premature optimization at the micro level, but at a macro level it should absolutely be a concern at the outset.

    Sadly, much of what I say presumes a modicum of talent on the part of developers to actually not do something blindingly stupid, like join 40 tables together because they’re lazy. Alas, that’s often not the case. Even still, the advice stands I think.

    As for the Excel/Access things, you simply need to think of everything as being in constant revision. Without the ease that the Excel option presented, you might never have known you needed something. Without the little Access database, you might not have known something needed recording. The tool enabled the exploration of business processes, and didn’t limit them.

    If people had to write everything in Java/EJB/etc, rather than using a simple Excel spreadsheet, much that works in companies wouldn’t get done.
