Pensieri di un lunatico minore
If there’s one thing I hate more than Windows, it’s AIX. I admit it. While I know that AIX has some amazing capabilities sorely lacking in other UNIXish operating systems, dealing with software development on an AIX box is just too painful to speak of. So, in order to reduce my pain, I’m developing some integration software (Cisco CallManager with some proprietary back-end systems) on my Windows notebook in the interim.
Here are the steps to getting Oracle and Python happy on Windows. This assumes you’re using Python v2.5 on Windows XP, or similar:
- Download Oracle’s Instant Client from their website. You probably only need two packages: Basic Lite and SQL*Plus.
- Unpack the downloaded Instant Client into C:\instantclient.
- Open a command line and CD C:\instantclient, then run SQL*Plus with a command like sqlplus test/test@//192.168.1.100:1521/MYDB. This should connect you to the Oracle database. There’s more information here.
Assuming all of the above works, you’re ready to make sure things can be found by Python et al.
- Go to your desktop, and right click on My Computer, select Properties, then the Advanced tab.
- Click on the button labeled Environment Variables.
- Under System Variables, find the variable Path and click Edit.
- Add a semicolon to the end of the displayed list of directories, followed by C:\instantclient.
- Click OK enough times to get out of all the dialogs.
- Log out of Windows and back in to make sure the path settings “take.”
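To confirm the change took, a short script can check whether C:\instantclient really appears among the Path entries. This is a minimal sketch; `on_windows_path` is a hypothetical helper, not part of Python or Instant Client. It splits on ';' and compares entries case-insensitively (ignoring a trailing backslash), since Windows paths are case-insensitive.

```python
import ntpath
import os


def on_windows_path(directory, path_value=None):
    """Return True if `directory` is one of the entries in a Windows-style PATH."""
    if path_value is None:
        # On Windows this is the Path value edited in the dialogs above.
        path_value = os.environ.get("PATH", "")
    # normcase lowercases and turns '/' into '\'; strip any trailing backslash.
    wanted = ntpath.normcase(directory).rstrip("\\")
    for entry in path_value.split(";"):
        if ntpath.normcase(entry.strip()).rstrip("\\") == wanted:
            return True
    return False

# On the Windows box, after logging back in:
#   on_windows_path(r"C:\instantclient")
```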
Now we need to install the Python Oracle adapter. For my purposes, even though I once worked on another Oracle adapter, I recommend cx_Oracle. I’m using version 4.3.1 successfully.
- Download the Windows installer for whichever versions of Python (2.5) and Oracle (10g) you are using.[1]
- Run the installer, accepting the defaults, or changing as appropriate.
You should be done at this point. Pop open a Python interpreter and type in the following:
>>> import cx_Oracle
>>> ora = cx_Oracle.connect("test/test@//192.168.1.100:1521/MYDB")
>>> ora
<cx_Oracle.Connection to test/test@//192.168.1.100:1521/MYDB>
>>> ora.version
'10.2.0.3.0'
>>>
Assuming everything is installed correctly, and your remote Oracle instance is allowing network connections, you should get back a connection object that you can use to work with Oracle. The rest is your problem.
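cx_Oracle follows the Python DB-API 2.0, so the connection object is used through cursors. Here is a minimal sketch of a helper that works with any DB-API connection. The helper, table, and query are hypothetical. The demo uses the standard library's sqlite3 as a stand-in so it runs without an Oracle server. Both cx_Oracle and sqlite3 accept `:name` bind placeholders.

```python
import sqlite3


def fetch_all(conn, sql, params=None):
    """Run one query on any DB-API connection and return all rows."""
    cur = conn.cursor()
    try:
        # Bind values by name instead of pasting them into the SQL string.
        cur.execute(sql, params or {})
        return cur.fetchall()
    finally:
        cur.close()


def demo_with_sqlite():
    # sqlite3 stands in for Oracle here; with cx_Oracle you would pass `ora`.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE issues (id INTEGER, title TEXT)")
    conn.execute("INSERT INTO issues VALUES (1, 'leak'), (2, 'crash')")
    return fetch_all(conn, "SELECT title FROM issues WHERE id = :id", {"id": 2})
```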
[1] As far as I know, there’s no pre-built version against the 11g release, but you should be able to do anything you need with the older libraries. I’ve had no problems with the 10g cx_Oracle using the 11g Instant Client.
Since I develop a lot of things on my MacBook Pro that end up running on a multitude of other Unix-ish machines (mostly AIX, Linux and Solaris), I tend to need to make sure that I keep things as clean as possible when developing. To do that, I use virtualenv, which is a great little package that helps create miniature virtual environments to work with. A little Python clean-room.
Anyway, it comes with a script to do activation of the environment, but it was annoying to use, so here’s what I’ve done:
function activate()
{
    # Activate a virtualenv kept under $HOME/src/venv, by name
    local VIRTUALENV=$HOME/src/venv/$1
    if [[ -n $VIRTUAL_ENV ]]; then
        # Refuse to stack environments; show the name of the active one
        echo "ERROR: Already in virtual environment: ${VIRTUAL_ENV##*/}"
    elif [[ -d $VIRTUALENV && -x $VIRTUALENV/bin/python ]]; then
        source "$VIRTUALENV/bin/activate"
    else
        echo "ERROR: Virtual environment $1 not found or setup incorrectly."
    fi
}
In addition, I have a quick wrapper around the virtualenv script to make sure that everything gets put into a common location:
function virtualenv()
{
    # Create every new environment under one common directory
    local VIRTUAL_ENV_HOME=${HOME}/src/venv
    local VIRTUAL_ENV_NEW=${VIRTUAL_ENV_HOME}/$1
    if [[ -z $1 ]]; then
        echo "ERROR: Please provide a virtual environment name"
    elif [[ -d $VIRTUAL_ENV_NEW ]]; then
        echo "ERROR: A virtual environment already exists at ${VIRTUAL_ENV_NEW}"
    else
        # Call the real script by full path so this function doesn't recurse
        /usr/local/bin/virtualenv "$VIRTUAL_ENV_NEW"
    fi
}
Basically, I keep all my virtual environments in $HOME/src/venv/, so switching between them is quick and predictable. The activate function also keeps you from trying to activate a nonexistent virtual environment and getting silly messages, and it keeps you from “stacking” environments on top of one another. I can’t think of a good reason you’d ever do that.
Here’s a quick run of how I use it:
~/src$ virtualenv
ERROR: Please provide a virtual environment name
~/src$ virtualenv test
New python executable in /Users/petrilli/src/venv/test/bin/python
Installing setuptools............done.
~/src$ activate test
(test)~/src$ deactivate
~/src$ activate test
(test)~/src$ activate test
ERROR: Already in virtual environment: test
(test)~/src$ deactivate
~/src$ activate foo
ERROR: Virtual environment foo not found or setup incorrectly
~/src$
Via another blog, I caught this screencast on zc.buildout, yet another magical creation by Jim Fulton. Those of us in the Python community know Jim pretty well, and I was lucky enough to work with him at one point, so it’s got me wondering.
Does it solve the problem as well as it appears, and does it come with any scary magic I can’t live with? Anyone worked with it?
I’ve been looking at a new online backup system from a company called SpiderOak, and today I got an invitation to download their client. Until now, I think it’s been a somewhat closed trial. Anyway, since I’m traveling, and can use it with multiple machines, I downloaded the Windows client to look at. During install, I noticed a bunch of .pyd files being installed, which are Python DLL files. Interesting, so I popped open the installed software to take a look.
Interesting. It’s a pretty UI, and it runs very snappily, proving that many things can be done in a “scripting language” with excellent performance.
So, I’m working on a project. I have no idea if I’ll be allowed to open source it or not, but that’s neither here nor there for now. I’ve been using this as a good project to work on for learning Django to add to my tool-belt of web frameworks. Certainly the Pythonesque aspects have been easy, as I’ve been writing Python code since 1995. There is one thing that bugs me, though, and that is the way forms work. Let me first show you the code:
issue = Issue()
if request.method == 'POST':
    form = IssueForm(issue, request.POST)
    if form.is_valid():
        issue.creator = request.user
        issue.status = Status.objects.get(name__exact='New')
        issue.owner = Component.objects.get(id__exact=form.data['component']).lead
        new_issue = form.save()
else:
    form = IssueForm(issue)
return render_to_response('issue/new.html', {'form': form},
                          context_instance=RequestContext(request))
Now, I’m sure this is suboptimal code, but I’ve yet to figure out a clean way to collect only some of my data through a form and populate the rest of the object behind the scenes. Is this how you do it? It works, sure, but something says “kludge” to me. It’s not ugly per se, but it just feels less than elegant.
Maybe I’m just missing something?
Oi! Sometimes, people annoy me. The rest of the time, they can be awfully useless. Anyway, I’m working with Django on some stuff, and need to be able to cleanly handle UTF-8 characters. This shouldn’t be too hard, as Python has had Unicode support for quite a while. Unfortunately, changing the default character encoding is painful.
First, the solution is to put the following code in your __init__.py for the site.
import sys
# Have to futz with namespace because of idiocy: site.py deletes
# sys.setdefaultencoding at startup, and reload(sys) restores it (Python 2 only)
reload(sys)
sys.setdefaultencoding('utf-8')
So, why the futzing? Someone decided that once site.py has run, you shouldn’t be able to fix up the encoding, so site.py removes sys.setdefaultencoding after using it. I suppose this makes sense if you don’t trust anyone, but in the “real world,” it’s critical that the individual programs be allowed to choose their encodings. I’ve not found any other way to do this, unfortunately. If someone knows of a better way, please let me know.
n.b. The formatting is all hosed because the Textile processor on WordPress is so mind-numbingly stupid as to not understand that PRE tags should not be mangled, or that if something is inside a code tag, you don’t want it to translate italics.
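For comparison, Python 3 dropped sys.setdefaultencoding altogether. The usual replacement for this hack is to decode bytes to text where data enters the program and encode where it leaves, naming the encoding each time. Here is a minimal sketch of that boundary-conversion approach, assuming UTF-8. The helper names `to_text` and `to_bytes` are hypothetical.

```python
def to_text(value, encoding="utf-8"):
    """Decode bytes arriving from outside; pass text through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def to_bytes(value, encoding="utf-8"):
    """Encode text on its way out; pass bytes through unchanged."""
    if isinstance(value, str):
        return value.encode(encoding)
    return value
```

The design choice is that nothing depends on a process-wide default, so one module can't silently change how another module's strings are interpreted.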
I admit it: I hate assembler. I hate C, C++, and most everything else other people consider “programming languages.” They are glorified switch-flipping on the front panel of a PDP-8e that we no longer have sitting in front of us. They are tuned for the benefit of the computer, and honestly, as an arrogant bastard, I find my time infinitely more valuable. Computers are servants. Technology must bend to my will, not the other way around. That’s just the way I am.
So, it is with great glee that I find the resurgence of discussions about more advanced programming languages. Take an article from eWeek published yesterday.
The man from Mars sees development languages being chosen for the convenience of machines, despite attendant productivity penalties and difficulty of delivering high-quality code, instead of being chosen for the convenience of the developers who are the actual scarce resource.
Continually, I have reminded people that talented developers are expensive, while CPU cycles are asymptotically approaching free. They are not being “wasted” when we burn them to make the developer’s life easier; we are freeing his own cycles to contemplate better algorithms, more advanced approaches, more adaptive reasoning.
Like anything that’s been around for several decades, LISP carries the baggage of what “everyone knows” about it that is no longer true.
“Everyone knows,” for example, that LISP is an interpreted language and, therefore, too slow for production applications—except that modern LISPs can compile functions for run-time speeds competitive with those of C or C++ programs in algorithmically complex tasks.
I remember the first time I ran into LISP in a high-performance computing environment. This was in the days when Cray ruled the world of supercomputers, and every cycle was expensive. Some of the fastest programs ever written for a Cray supercomputer came out of their LISP compiler. Why? Because the algorithm could be the focus of the intellectual power of the developer, and not bit-twiddling. It was a shock, and yet logical all at once.
The trade-offs are clear. In a study performed in 2000 by Erann Gat, a researcher at the California Institute of Technology’s Jet Propulsion Laboratory, programmers writing in LISP produced programs with less variability in performance than more experienced programmers writing in C and C++.
The fastest versions of C and C++ programs were faster than most LISP implementations, but the median performance of the LISP implementations was actually twice as good as the median performance of the C and C++ code performing typical tasks (more at www.flownet.com/gat/papers/lisp-java.pdf).
For real-world teams, such reduction of technical risk and improved worst-case scenarios arguably outweigh best-case results.
And that’s the thing. We live in the real world, and that world is swimming in CPU cycles, memory, and untapped system resources. Use them. Don’t waste your time worrying about saving one cycle when it means the real problems go unsolved. You might find that, by focusing on the underlying problem, the solution comes sooner, with more certainty, and through more interesting work.
These observations, to me, also apply to Smalltalk and other “research languages” that people dismiss as “interesting” but not fast enough. Ruby is another such beast, as is Python, or even Perl, trading “performance” for expressiveness. Worry about “performance,” in its myopic traditional sense, only after you have solved the underlying problem.
Fill your tool-belt with all the tools you need to solve the problem. You might be able to build the Taj Mahal with a pair of dental tweezers and an ice-pick, but it sure won’t be much fun.
[via Phil Windley’s Technometria]
There seems to be a bit of a disagreement going on between Chad Fowler and Ian Bicking about what Pythoneers call “monkeypatching,” Ruby people call “opening a class” and Smalltalkers call “doing the right thing.”
What do I mean? Well, in a bit of commentary by Ian that was sent to Chad, he states:
The use I object to that I see in lots of Ruby examples (and maybe isn’t indicative of most real Ruby code) is when people add methods to other classes that aren’t meant to fix anything, but just because they don’t have an object of their own to hang the method off of. I’ve seen several examples where people add methods to Array to implement some recursive algorithm, instead of using a function.
This is a blurry line in the world of truly dynamic languages, and it reminds me of the absurd notion of final in Java. There are times when I need to extend an existing class to provide additional functionality because that is where it belongs. Sometimes people put things in the wrong place, but the idea that they should just “write a function” isn’t necessarily the right solution either.
For example, there is a coercion idiom in Smalltalk where you use message names like asString to coerce one thing into another. It is totally appropriate to extend this to allow for further expressiveness of the language.
The important thing is that Smalltalk implementations have a clear way to do this—through class extensions, which can be loaded and unloaded as needed by other systems. Is it perfect? No, but the idea that someone in the past thought of everything that might be needed in the future is a grossly arrogant position to take.
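In Python, the closest equivalent of a Smalltalk class extension is attaching a function to a class after it has been defined, which is exactly what Pythoneers call monkeypatching. Here is a minimal sketch with a hypothetical `Celsius` class that shows Ian’s “write a function” style next to the class-extension style. It also shows the catch that built-in types such as `list` refuse new attributes.

```python
class Celsius(object):
    """Hypothetical value class used only for this illustration."""

    def __init__(self, degrees):
        self.degrees = degrees


# Style 1: a free-standing function, as Ian suggests.
def as_fahrenheit(temp):
    return temp.degrees * 9 / 5 + 32


# Style 2: "open" the class and add a coercion method after the fact,
# in the spirit of Smalltalk's asString-style messages.
def _as_fahrenheit(self):
    return self.degrees * 9 / 5 + 32


Celsius.as_fahrenheit = _as_fahrenheit


def can_extend(cls, name="as_something"):
    """Return True if new attributes can be attached to cls."""
    try:
        setattr(cls, name, lambda self: None)
    except TypeError:
        return False  # built-in types such as list reject new attributes
    delattr(cls, name)  # undo the trial attribute
    return True
```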
Languages are tools, and the more they try and “protect you,” the more they limit your creativity. There are bad times to extend classes, and bad places to put functionality, but that is something one can’t make generalizations about, and has to be evaluated in the context of the specific system.
There is an odd and disquieting conversation taking place on comp.lang.python about adding symbols to Python. Every other language I use regularly (Smalltalk, Ruby and Lisp) has the concept of a symbol in the language. Wikipedia contains the following definition for a symbol:
A symbol, in its basic sense, is a conventional representation of a concept or quantity; i.e., an idea, object, concept, quality, etc. In more psychological and philosophical terms, all concepts are symbolic in nature, and representations for these concepts are simply token artifacts that are allegorical to (but do not directly codify) a symbolic meaning.
In Lisp, symbols are actually objects in the system, and to quote the hyperspec:
Symbols are used for their object identity to name various entities in Common Lisp, including (but not limited to) linguistic entities such as variables and functions.
In fact, because of the structure of macros in the Lisp world, you actually have ways to generate symbols that are held as placeholders for other symbols, in the form of GENSYM. For Smalltalk, the definition is a little different, but a symbol is also a specific type of object (from Squeak):
Symbol is a subclass of String, and understands, in large part, the same messages. The primary difference between a symbol and a string is that all symbols comprising the same sequence of characters are the same instance. Two different string instances can both have the characters ‘test one two three’, but every symbol having the characters #’test one two three’ is the same instance. This “unique instance” property means that Symbols can be efficiently compared, because equality (=) is the same as identity (==).
And in Ruby:
Simply, a symbol is something that you use to represent names and strings. What this boils down to is a way to efficiently have descriptive names while saving the space one would use to generate a string for each naming instance.
The common theme that runs through all of these implementations is that symbols are really just placeholders. We don’t particularly care what they are placeholders for, only that we can make comparison decisions based on them, and the only comparison that matters is equality. No other manipulation really matters. (This is not totally true in Lisp, but without macros, the rest vaporizes).
In the Python world, a symbol is a name for something. For example, when you define a function:
def myFunction(x, y, z):
    pass
The name of the function, myFunction, is a symbol, as are the variables x, y and z. The idea exists, but it has never been formalized the way it has been in other languages. Python even has a symbol module, but it’s really not the same thing.
Symbols can be thought of as a parallel namespace (or in the case of some languages, multiple name spaces, each attached to a package) with a whole set of strings in them which will always be the same. A veritable garden of global names. For example:
x = 'string'
x = :string
are two seemingly similar things in Ruby, but the second refers to a symbol that will always be the same object. The reality is that the Python world has been using strings as symbols for a very long time, but without many of the implementation advantages that exposing real symbols can bring (lots of reduced evaluation context costs).
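Python’s nearest built-in analogue to these symbols is string interning. `sys.intern()` (the `intern()` builtin in Python 2) returns one canonical string object per sequence of characters, so interned strings can be compared by identity, much as the Squeak description puts it. Here is a minimal sketch. The `symbol` helper is hypothetical and just gives the call a name.

```python
import sys


def symbol(name):
    """Return the canonical, interned string for `name` (hypothetical helper)."""
    return sys.intern(name)


# Build the same characters two different ways at runtime.
built = "".join(["test ", "one two three"])
literal = "test one two three"

a = symbol(built)
b = symbol(literal)
# After interning, equal strings are the same object, so `is` works like ==.
```

Unlike Ruby or Smalltalk, nothing forces you to intern, which is why strings-as-symbols in Python rarely get this benefit.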
Avi Bryant, he of the brilliant Seaside framework, writes about interfacing objects with FTP. Brilliant stuff, but it’s not actually “new.” In fact, when I worked on Zope, we implemented an FTP interface into the object database, which was amazingly useful. There are some issues related to it, but they’re all solvable.
Paul Bissex takes a look at the situation on the Mac and finds that Java has been dumped in favor of Python:
Apple’s taking a gamble here. I imagine that among other factors they expect to get better, more Mac-like applications via PyObjC than Java. This will rankle the Java folks, of course. And when we see the first official tutorial on the Ruby-ObjC bridge the rioting will start. But Apple’s never been particularly averse to pissing off developers—often to the company’s detriment. I hope this works out happily.
Seriously, though, you don’t make progress without risks. Python has a lot more in common with Objective-C and the whole Cocoa framework than Java ever will.
Mark Williamson writes about Twisted and its documentation. Twisted is an asynchronous networking framework for Python. It is amazingly powerful, written by some amazingly smart people, but the documentation is quite insubstantial. To quote Mark:
This superb tool comes with the barest minimum of documentation. The new Twisted 2.0 seems determined to add to the mystery by removing the API docs and splitting it in to a number of separate components that do not always make it clear what they are for.
Mark points out that “read the source” is a red herring for anyone who has tried to grok a large framework. It’s one thing to leave a 100-line script undocumented, but leaving tens of thousands of lines of intricate, interdependent code effectively undocumented makes the learning curve nearly unapproachable.
You could argue that by setting the bar for entry so high the developers are saving themselves an awful lot of hassle on the support front. They do hang ‘round on IRC being helpful but I personally find that it bit like gatecrashing a party to which you haven’t been invited and trying to change the subject. The bottom line is that Twisted is well worth the effort but that effort is huge and the chances are you will end up implementing things independently that are already there once in a while. I know I have.
This was a problem we had with Zope. You had to understand a certain amount of the framework before you could understand the rest. “Zen” is what Jim called it, and he was spot on. Unfortunately, that makes it unlikely that many people will ever achieve enlightenment.
What I’m talking about isn’t detailed API documentation, but instead conceptual documentation, tons of excruciatingly well documented examples, and lots of application-focused documentation, rather than low-level. API docs are great, but they’re the reference point.
I suspect Twisted would get a lot more traction with better documentation. It would also help reveal problems in the architecture. If you can’t explain something simply, then it simply may be too complex. I keep trying to use it for an application, but finding that I need more knowledge than I have time to develop, and so I move on to something else—something likely to be inferior.
While reading Joe Gregorio’s posting on implementing Sparklines in Python, I found a reference to data: URIs, which are a way to encode immediate data in the URI itself rather than requiring another retrieval. While they’re supported by everyone except IE (gee, how often have we said that?), they could be interesting for embedding small graphics in an XHTML document and reducing the load on the remote server.
Sparklines, by the way, are word-sized graphics, a concept created by the talented Edward Tufte.
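A data: URI (RFC 2397) carries the payload inline, usually as a MIME type followed by base64-encoded bytes. Here is a minimal sketch of building one in Python. The `data_uri` helper name is hypothetical, and any payload you pass stands in for a real sparkline image.

```python
import base64


def data_uri(payload, mime_type="image/png"):
    """Build an RFC 2397 data: URI using the base64 form."""
    encoded = base64.b64encode(payload).decode("ascii")
    return "data:%s;base64,%s" % (mime_type, encoded)

# The result can go straight into markup, for example:
#   '<img src="%s" alt="sparkline"/>' % data_uri(png_bytes)
```

Base64 inflates the payload by about a third, which is why this only pays off for small graphics like sparklines.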
Brian Beck has submitted a recipe for readable switch construction in Python. This is one area where the language seems to lose its elegance: you end up with a cascade of if and elif statements constantly repeating much of the same code. What Brian has done is leverage generators to create something quite nice.
It means that you can create something like this:
for case in switch(foo):
    if case(1):
        print(1)
    elif case(2):
        print(2)
    elif case(3):
        print(3)
    else:
        print(0)
In fact, you can use a cascade of plain if case(bar) statements, plus break, to get C-style “fall through” in the switch block. More examples are in his recipe.
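Here is a minimal sketch of the generator-based idea, reconstructed from the description above rather than copied from Brian’s recipe. It is written for Python 3, where the generator simply ends after its single yield (explicitly raising StopIteration inside a generator is an error since PEP 479). The `describe` function is a hypothetical usage example showing fall-through and break.

```python
class switch(object):
    """Generator-based switch helper in the style described above."""

    def __init__(self, value):
        self.value = value
        self.fall = False  # becomes True once a case has matched

    def __iter__(self):
        # Yield the matcher exactly once, so the for-loop body runs once.
        yield self.match

    def match(self, *args):
        # case() with no arguments acts as the default; after a match,
        # every later case also returns True, giving C-style fall-through.
        if self.fall or not args:
            return True
        if self.value in args:
            self.fall = True
            return True
        return False


def describe(n):
    result = []
    for case in switch(n):
        if case(1):
            result.append("one")   # no break: falls through to case(2)
        if case(2):
            result.append("two")
            break
        if case():
            result.append("other")
    return result
```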
Traditionally, I’ve almost always used a dispatch table, implemented as a combination of dictionaries and the occasional lambda. Brian’s switch reads better for simple cases, although if you’re doing really complex processing inside the switch, you might consider moving it out and using the dispatch-table model.
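A dispatch table for the same 1/2/3/default example might look like the minimal sketch below. A dictionary maps each case to a small function (lambdas here), and `dict.get` supplies the default branch. The name `classify` is hypothetical.

```python
def classify(foo):
    dispatch = {
        1: lambda: 1,
        2: lambda: 2,
        3: lambda: 3,
    }
    # Look up the handler for foo, falling back to the default case.
    handler = dispatch.get(foo, lambda: 0)
    return handler()
```

When a branch grows beyond a one-liner, the lambda can be replaced by a named function without changing the lookup.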
Nice work!
A new version of NetworkX has been released into the wild. This is a toolkit for dealing with graph models (think graph theory), and is something I’ve been playing with as a way to model system interaction in networks. The trick is keeping it up to date with the incoming data, but it provides one interesting view into your network topology from a data perspective, rather than just physical.
I have been looking at storing some data on disk, and while it would be “nice” to use the pickle format, I need it to be “cross language,” which pickle most certainly is not. So, that leaves me a couple of choices, given I’m dealing with potentially hundreds of millions of data pairs:
- XML – Uselessly bloated and too slow.
- struct – Simple to use, but has lots of limits on what it can represent.
- xdrlib – Based on Sun’s External Data Representation (XDR), it can represent a large number of things, and is used all over the place.
Since performance is of some concern, I figured I’d write a really quick 20-line program to compare struct and xdrlib. Here are some results, dealing with pairs of 64-bit numbers. Take them as you will; these are per-second figures.
| Per second | Read    | Write   |
|------------|---------|---------|
| struct     | 421,251 | 446,727 |
| xdrlib     | 55,886  | 97,181  |
This is a pretty major difference in performance. So a bit of research turns up that the struct library is written in C, but the xdrlib is written in Python. That’s likely to be the biggest difference. If the xdrlib leveraged Sun’s code (written for NFS), it’d likely be just as fast. Unfortunately, I would really prefer to use XDR, but I suspect I’ll just fake some of the capabilities (like variable length strings) in struct.
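One way to fake XDR on top of struct is sketched below, assuming the pairs are signed 64-bit values. XDR is big-endian, so the format uses '>'. 'q' is a signed 64-bit integer, which corresponds to XDR’s hyper. A variable-length string follows XDR’s layout: a 4-byte length, the bytes, then zero padding to a multiple of four. The helper names are hypothetical, and encoding strings as UTF-8 is an assumption, since XDR strings are nominally ASCII. (xdrlib itself was later deprecated under PEP 594 and removed in Python 3.13.)

```python
import struct

PAIR = struct.Struct(">qq")  # two big-endian signed 64-bit ints (XDR "hyper")


def pack_pairs(pairs):
    return b"".join(PAIR.pack(a, b) for a, b in pairs)


def unpack_pairs(data):
    return [PAIR.unpack_from(data, off) for off in range(0, len(data), PAIR.size)]


def pack_xdr_string(s):
    """XDR-style variable-length string: 4-byte length, bytes, zero padding."""
    raw = s.encode("utf-8")
    pad = (4 - len(raw) % 4) % 4
    return struct.pack(">I", len(raw)) + raw + b"\x00" * pad


def unpack_xdr_string(data, offset=0):
    """Return (string, offset just past the padding)."""
    (n,) = struct.unpack_from(">I", data, offset)
    start = offset + 4
    raw = data[start:start + n]
    end = start + n + (4 - n % 4) % 4
    return raw.decode("utf-8"), end
```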
Alas, I didn’t make it this year. This is quite unfortunate, but I had other commitments that kept me from taking several days off. Fortunately, a lot of people went, and have written about it.
Jonathan Gennick writes about his experiences, and I wanted to pull out a few responses and comments to him first:
- The location is great. Not only is it a good building, but I think he’s right that it’s not some isolated suburban “retreat” (whatever that is), but immersed in a real city, with lots of places to walk and mass transit to use. This makes it both cheaper for attendees (more hotels within reach, less need for a car) and more vibrant for social bits outside the meeting.
- Money. Funny thing, money, but it actually has a characteristic not shared with other numerical quantities: in many venues it is time-sensitive. For example, USD100 is a quantity of a specific unit, but time plays into it as well. USD100 today is not the same as USD100 fifty years ago, at least not when you compare them. This means that comparisons are time-sensitive as well. Hopefully, whoever works on this will take this into account, as the result of a comparison between different forms of money, or money over different periods of time, depends on when you make it. I have some code I wrote a long time ago, which I should at least release and update to use the Decimal data type.
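To make the point concrete, here is a minimal sketch of a Decimal-based money value that refuses to compare amounts denominated at different dates. It is not the unreleased code mentioned above. Restating an amount through a caller-supplied price index (a mapping of date to index level) is one assumed way to handle the time dimension, and the `Money` class and `restate` method are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class Money:
    amount: Decimal
    currency: str
    as_of: date  # the date at which this amount is denominated

    def restate(self, when, index):
        """Restate to another date using a price index: {date: level}."""
        factor = index[when] / index[self.as_of]
        return Money(self.amount * factor, self.currency, when)

    def __lt__(self, other):
        # Only compare like with like: same currency, same point in time.
        if self.currency != other.currency or self.as_of != other.as_of:
            raise ValueError("restate both amounts to one currency and date first")
        return self.amount < other.amount
```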
Next there was the PyCon blog, which had a lot of people writing on it, which I enjoyed. I look forward to reading the papers that were presented. Ted Leung, whose writing I always enjoy, has the collaborative note-taking (using SubEthaEdit) from the conference, which is my next read.
So far, the most interesting thing I’ve heard out of PyCon this year is PythonEggs, which comes from that inscrutable Phillip Eby, and the ever talented Bob Ippolito.
I’ve been using Trac, from Edgewall Software, for a while to manage some internal things I’m working on. I’ll steal from their helpful website to explain what it is:
- An integrated system for managing software projects
- An enhanced wiki
- A flexible web-based issue tracker
- An interface to the Subversion revision control system
It’s not a replacement for Bugzilla, nor is it a replacement for MoinMoin or MediaWiki, but it is wonderfully integrated, and even has hooks so that if you commit a file in Subversion that has a commit message like this:
Resolved memory leak. Closes #194.
It will automatically close ticket #194 and link the changes to the ticket. All very simple to use. It provides 80-90% of the power that most everyone can use, with 10% of the effort and a lot more of the fun. It also happens to ship with some very clean XHTML and does almost everything with CSS, which makes it lay out fast and clean. The internal code is also quite easy to understand, which helps extension.
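Under the hood, a hook like this has to spot command words and ticket numbers in the commit message. Here is a minimal sketch of that parsing step. The regular expression and `tickets_to_close` are hypothetical and only approximate whatever syntax Trac actually accepts.

```python
import re

# Hypothetical pattern modeled on the "Closes #194" example; Trac's real
# hook has its own list of command words and syntaxes.
COMMAND = re.compile(r"\b(closes?|fix(?:es)?)\s+#(\d+)", re.IGNORECASE)


def tickets_to_close(message):
    """Return ticket numbers that a commit message asks to close."""
    return [int(num) for _, num in COMMAND.findall(message)]
```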
Check it out. There are some features coming down the road that will hopefully make it a great tool for exposing to customers at some point.
I don’t get it, what happened to Guido? I really don’t.
def foo__(a, b): # The original function
    "your code here"

def foo(a, b): # Typechecking wrapper
    a = __typecheck__(a, t1)
    b = __typecheck__(b, t2)
    r = foo__(a, b)
    r = __typecheck__(r, t3)
    return r
Wow, yet more creative ways to do ugly code. I don’t know what else to say. I’ve yet to see a clear, concise explanation of the use cases for “static typing.” Don’t we usually start with requirements, and then figure out how we solve the requirement, rather than foisting some half-assed syntax on people that solves nobody’s problems well, and contributes to an overall decline in readability and language syntax?
Mark Williamson explains how adaptation could solve the problem today, without new syntax, and especially without introducing a gaggle of underscores. What happened to:
- Problem
- Requirements
- Options
- Trade-offs
- Solution
What happened? I am convinced the only rational explanation of how one of the smartest people I’ve ever met can propose such things is that he has been taken over by the Crab People.
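Adaptation, in the spirit of PEP 246’s adapt(), moves this checking into an ordinary function call instead of new syntax. Here is a minimal sketch built around a hypothetical adapter registry. It is neither Mark’s code nor the PEP’s exact protocol. `adapt(obj, protocol)` returns obj if it already is an instance, otherwise it looks up a registered adapter along the type’s MRO.

```python
_adapters = {}


def register_adapter(from_type, protocol, adapter):
    """Record a callable that converts from_type instances to protocol."""
    _adapters[(from_type, protocol)] = adapter


def adapt(obj, protocol):
    if isinstance(obj, protocol):
        return obj
    # Walk the MRO so adapters registered for base classes also apply.
    for klass in type(obj).__mro__:
        adapter = _adapters.get((klass, protocol))
        if adapter is not None:
            return adapter(obj)
    raise TypeError("cannot adapt %r to %s" % (obj, protocol.__name__))


# Strings of digits may stand in for ints, but nothing else may.
register_adapter(str, int, int)


def foo(a, b):
    # Same job as the typechecking wrapper above, with no new syntax.
    a = adapt(a, int)
    b = adapt(b, int)
    return a + b
```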
So, I think we, the community, have given Guido a bit of heartburn on his take on static typing.
My two posts on adding optional static typing to Python have been widely misunderstood, and spurred some flames from what I’ll call the NIMPY (Not In My PYthon) crowd
I hope that Guido understands that nothing I’ve written was intended as a flame, and in fact, the idea that people have a “my Python” mentality is actually quite reassuring, as it means people have a strong attachment to the language, and strong feelings, which generally means they feel it binds to their thinking at a good level. This is the ultimate flattery for a language designer. Imagine how other languages might have progressed if their users had more input into the design, and also some care and passion. Having said that, I realize that some comments have been harsh, and perhaps mine have been a bit overly so, but I really do see this as a slippery slope, especially when combined with the debacle that was decorators.
So into the further morass. I’m going to take some things out of order here, from Guido’s thoughts, to better put mine back together, so forgive the bizarre thought process. First, several people have said I have contradicted myself on performance-related concerns, namely whether Guido expressed interest in them, and I think his thinking speaks for itself, but:
Most importantly, I’m dropping any direct connection to compile-time type checking or generating more efficient code.
Now, static typing is generally considered a “compile time” thing, at least in my mind, so OK, that’s interesting, but he’s also thrown out efficient code as a concern, though he does go on to say that interfaces might be useful to type inferencers.
So, where does that leave us? Back to syntax, first:
@classmethod
def foo(x: t1, t: t2) -> t3:
    pass
I kept out of the discussion of decorators, but wouldn’t this be much more attractive as:
def classmethod foo(t1 x, t2 t) returns t3:
    pass
OK, so I admit it’s not much more attractive, honestly, but I do think it’s less painful on the eyes overall, and certainly a bit more in keeping with the normal style of Python. This gets especially ugly when we begin to look at default values, as Guido discusses:
def foo(x: int = 42):
    pass
For my idea, it’d be:
def foo(int x=42):
    pass
Note that I’m leaving the spaces out, in keeping with style recommendations and tradition in the Python world. He then goes on to discuss typed attributes of a class, which look like this:
class C:
    x: t1
I dunno, that just looks ugly to me, but I’m not sure how to make it better. As someone once proposed, if we make “as” a keyword of the language, then we can write things a lot more cleanly:
def foo(t1 x, t2 y) as classmethod returns t3:
foo as anotherType
For me, while it’s a bit wordier, it’s much easier to read. And I read more code than I ever write. If I wanted to save keystrokes, I’d write in APL rather than anything else. But I don’t: typing is easy, reading is hard. Make it easy to read.
From attribute typing, we move on to interfaces, where Guido returns to what I consider sane territory. Other than some residual muck from his typing nonsense, he proposes, I think, that we define interface as a keyword, and an interface becomes, effectively, a pseudo-classish thing:
interface I1(I2, I3):
    def foo(...
That’s definitely fine with me, and looks reasonably easy to read. Better than throwing an ’@’ in front of everything. But then again, I remember discussing “interfaces” with Jim Fulton and Jeffrey Shell years and years ago, and that wasn’t far from what we liked back then, and Jim later implemented.
Now, on to “design by contract” or at least by gentleman’s agreement likely:
def _pre_foo(self, a, b): # The pre-condition has the same signature as the function
    assert a > 0
    assert b > a

def _post_foo(self, rv, a, b): # The signature inserts the return value in front!
    assert rv > b

@dbc(_pre_foo, _post_foo) # design-by-contract decorator
def foo(self, a, b):
    return a+b
Holy mother of perl. This seems like a really ugly syntax. I don’t have a better one, but I know that this makes my head hurt, and is convoluted and baroque. Baroque is Perl, not Python in my mind.
Design by contract, while a popular thing to talk about on the Internet, has not taken off with a rush of enthusiasm in the “real world,” and Eiffel is an interesting but irrelevant language to most of us, with a market share that even Smalltalkers laugh at. Neat ideas, but I’m not sure they’re the right ones for most people. This seems to be low on the list of things to burn brain cells on. Few people get static typing right, and I can’t imagine that any more of them will manage full design by contract. It’s just another long piece of rope to hang your productivity with, unless you’re truly one of the best.
I’m dropping the advanced and untried ideas for now, such as overloaded methods, parameterized types, variable declarations, and ‘where’ clauses. I’m also dropping things like unions and cartesian products, and explicit references to duck typing (the adapt() function can default to duck typing). Most of these (except for ‘where’ clauses) can be added back later without introducing new syntax when people feel the need, but right now they just act as red flags for the NIMPY (Not In My PYthon) crowd.
Need? I guess I’m a weird one, but I’ve not witnessed many people in the Python community saying they need this kind of thing. Interfaces, definitely, but the type annotations seem a bit arbitrary. Then again, I’ve not been immersed in it, and have been busy writing code I can’t release for a while now. To restate my needs:
- Faster
- Library clean up
- Block closures
The last, of course, is insanely unlikely to happen given the rift between Stackless Python and regular Python, but there it is. If we’re going to burn a lot of bridges and set fire to the city, let’s at least get some productivity out of it. I don’t think I’ve heard anyone argue that static typing increases productivity.
Finally, Paul Boddie mentions in the comments of my first post on this topic that:
But even if we wish to perform type inference on unfinished systems, is it not too much to ask that if developers wish to see concrete type information then they supply concrete types to the inferencer in the form of test cases, for example? This is hardly more of a burden than writing out lots of type annotations manually and arguably reveals bugs more effectively, too.
I hadn’t thought of expressing types through test cases, but this is a potentially unique and excellent way to gain multiple benefits. It provides type hints for the inferencing engine, and it also pushes people to write real test cases, instead of just pretending that “static typing is safe.” This seems to be a much better approach, at least to me.
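As a rough illustration of Paul’s idea, here is a minimal sketch in which the test suite itself tells us which concrete types a function is exercised with. The record_types decorator, the OBSERVED_TYPES registry and the scale function are all hypothetical names invented for this example; a real inferencer would consume this kind of information rather than a printed dictionary.

```python
import functools
import unittest

# Hypothetical registry: function name -> set of observed argument-type tuples
OBSERVED_TYPES = {}

def record_types(func):
    """Record the concrete positional-argument types seen on each call."""
    @functools.wraps(func)
    def wrapper(*args):
        key = tuple(type(a).__name__ for a in args)
        OBSERVED_TYPES.setdefault(func.__name__, set()).add(key)
        return func(*args)
    return wrapper

@record_types
def scale(value, factor):
    return value * factor

class ScaleTests(unittest.TestCase):
    # Each test doubles as a statement of the types scale() should accept
    def test_ints(self):
        self.assertEqual(scale(3, 4), 12)

    def test_float_value(self):
        self.assertEqual(scale(1.5, 2), 3.0)

if __name__ == "__main__":
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ScaleTests)
    unittest.TextTestRunner().run(suite)
    # What an inferencer could use instead of hand-written annotations
    print(OBSERVED_TYPES)
```

The nice property is that the "type declaration" only exists because somebody wrote a real test, which is exactly the side effect Paul is after.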
I’m somewhat worried that there’s been a lot of work on adding bizarre widgets to the edges of Python, and very little focus on the center. This is not surprising, given the brain-power of those involved, but it is troubling. Getting Psyco running on multiple platforms would be a challenge, too, but really only one or two more (PowerPC and SPARC) are needed to get 99% market coverage.
In the end, however, it’s “Guido’s language”, and as is the fancy of open source software, it can go wherever its authors choose to take it, with no answering to its users. I can’t tell people what to work on, as they’ll work on what interests them. If this is what interests Guido, then it’s what he’s going to work on. Like it or not. When you take a free ride, you get to go where the driver wants to go.
You still don’t have to like it.
Several people have commented that those of us bent out of shape by Guido’s static-typing thoughts need to just “chill.” They’re just thoughts, they say. Here’s what I say…
Last year, at PyCon 2004, Guido stood up on stage and, as I recall, said “I know decorators are really controversial, so we need to be careful.” He then laid out a moderate idea, including some specific choices that seemed vaguely sane. Then, just a few months later, decorators went into Python with a horrendously ugly syntax and all kinds of silliness, nearly all of which contradicted what he’d said at PyCon.
As absurd as it sounds, some of us felt betrayed by the leadership, or lack thereof, displayed by Guido. Honestly, I think people wore him down, and that means the system failed at some level, in my opinion. Here’s someone who has demonstrated for a very long time a strong leadership of sanity, and managed to keep the language relatively clean, and simple. Someone who has emphasized those specific things, and come back with code solutions that are often much more elegant than anything else people have proposed. Suddenly, we’re faced with a situation where there’s huge disagreement in the community, with no real broad support for any solution, but a lot of “rah rah”ing of the ideas by a few people, and it ends up in the system in a damned unPythonic syntax in my opinion.
The only saving grace of the decorator madness is that few people will use them. They’re at the edge of normal programming practice for the average Python programmer, and therefore I don’t think they will become commonplace. So we’ve added something ugly to the language for a situation that isn’t common for most Python programmers.
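For anyone who hasn’t looked at them yet, the @ line is just shorthand for rebinding a function to the result of calling another function on it, the same thing you used to spell as foo = staticmethod(foo). A minimal sketch, where shout, greet and greet_old are made-up names:

```python
def shout(func):
    """A trivial decorator: wrap func so its string result is upper-cased."""
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs).upper()
    return wrapper

# Python 2.4+ decorator syntax
@shout
def greet(name):
    return "hello, " + name

# The pre-2.4 spelling, which the @ line is shorthand for
def greet_old(name):
    return "hello, " + name
greet_old = shout(greet_old)

print(greet("guido"))      # HELLO, GUIDO
print(greet_old("guido"))  # HELLO, GUIDO
```

Both spellings produce the same wrapped function; the only thing the new syntax buys is moving the rebinding above the def.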
Now we’re facing something that, if added, would radically change the landscape. There are but two schools in the world when it comes to language typing. They can be further refined, but the truth is, you have the Smalltalk, Lisp, and Python world with “dynamic typing,” and Java, C, C++, etc., with “static typing”[1]. While Lisp has advisory typing, and Smalltalk has had its Strongtalk version, neither of them embraces it for any reason other than performance. And yet that is one reason Guido singles out as not important.
If Python (namely CPython) needs anything, it is a massive guts change to put a serious VM under it. The brilliance of early Python is that it is very portable, and has a relatively tiny interpreter, simply understood (it is basically a huge switch statement). However, now that Python has become a “serious language” in some people’s books, its performance is being compared with languages which are either compiled (C, C++), run on a very powerful JITing VM (Smalltalk, Java, C#), or can do both (Lisp). At that point, it really looks horrible.
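To make the “huge switch statement” point concrete: CPython’s evaluation loop (in ceval.c) fetches a bytecode, switches on it, and repeats. The toy stack machine below is a sketch of that shape written in Python; the opcodes and the run function are invented for illustration and are not CPython’s real instruction set (the dis module will show you those).

```python
# Hypothetical opcodes for a toy stack machine (not CPython's real ones)
PUSH, ADD, MUL, PRINT, HALT = range(5)

def run(code):
    """Execute a list of (opcode, arg) pairs; return the printed values."""
    stack, output, pc = [], [], 0
    while True:
        op, arg = code[pc]
        pc += 1
        # One big dispatch on the opcode, the moral equivalent of C's switch
        if op == PUSH:
            stack.append(arg)
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == PRINT:
            output.append(stack.pop())
        elif op == HALT:
            return output
        else:
            raise ValueError("unknown opcode %r" % op)

# (2 + 3) * 4
program = [(PUSH, 2), (PUSH, 3), (ADD, None), (PUSH, 4), (MUL, None),
           (PRINT, None), (HALT, None)]
print(run(program))  # [20]
```

Every instruction pays for the fetch and the dispatch, and nothing is known about the values on the stack ahead of time; that per-instruction overhead is what a JIT-ing VM gets to eliminate.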
This isn’t an easy thing, and it’s certainly not “sexy” work. Nor is it likely to be visible work, but looking at what Psyco can do, as well as StarKiller, IronPython and others, it’s clear that 1) people care about performance a lot more than we’d like them to, and 2) there are some massive performance gains to be had.
If we’re going to “muss up the language,” let’s be sure we’re doing it to solve real problems, not some perceived “safety” BS, or documentation (where PyDoc does a 10x better job). Also, let’s not use some messy Perlesque line-noise implementation. If I were forced to come up with a syntax, it would look something like this:
bc. def doSomething(self, smallint x, y) returns string:
    "Doc strings belong in the doc string"
That doesn’t hurt my eyes. It does, however, require that a type hierarchy be put in place, and that some things underneath be reorganized. Honestly, that isn’t very visible from the outside. It will likely require some substantial parser changes, but again, the “easy” solution is almost guaranteed to be the wrong one.
Then we need to make sure people understand why type declarations exist. In Lisp, for example, advisory typing is only used in a few places, where you need to eke out the best performance from a heavily used function. Same thing with inlining (as compared with macros). You’ll note I didn’t bother to type the second argument; that’s because in this theoretical situation I don’t really care, as I’m going to do the same thing with it no matter what and return a string. A good JIT-VM or compiler can then take the argument x and shove it in a register for speed, and throw away all the need to dispatch. In some languages this can yield 10-100x performance boosts.
Use it when you need it; otherwise, don’t throw the baby (dynamism) out with the bath water. Hopefully some will understand why these “thoughts out loud” make some of us very, very nervous. It’s not that Guido shouldn’t be allowed to think out loud; it’s that some of us are very, very strongly opposed to what’s being said.
[1] Yes, yes, I know that C and C++ aren’t really typed languages in any meaningful sense, as they have destructive casting, but people perceive them as statically typed. They are, but only weakly.
For those interested, Peter Lount says a lot of things I’d like to think in his writing on this topic.
Sometimes, smart people can be so blind. It is a willful blindness, perhaps, but more the blindness that comes with the mind-numbing sameness of the computer industry. The drum beats of stupidity, as it were. Brilliance is punished, mediocrity that conforms is rewarded. Let’s start…
At the same time, this is something that many people, especially folks writing frameworks or large applications, need—and as long as there’s no standard syntax, they are forced to make up their own notation using existing facilities.
Really? Hmm. Some of the largest frameworks ever written, and some of the ideas that most define the word “framework,” like MVC, were written in languages much more dynamic than Python. What most people call frameworks today are really little more than huge libraries with some commonality. Guido goes on to say that interfaces are also important, but they are the first important thing that must be added, not static type declarations. Without interfaces, the rest is absurd fiddling.
I’m not doing this with code optimization in mind. I like many of the reasons for type declarations that were given by various proponents: they can be useful for documentation, for runtime introspection (including adaptation), to help intelligent IDEs do their magic (name completion, find uses, refactoring, etc.), and to help find certain bugs earlier.
Oh dear, the “type safety” issue. This is a red herring, as you can figure out by looking at actual metrics. I think you’ll find that, at least in the production world, every truly “high quality” application is driven by massive testing: unit testing, functional testing, structural testing, UI testing. I know GUI frameworks that have tens of thousands of unit tests and still don’t have full code coverage. Static typing doesn’t eliminate bugs, although it might, on some rare occasion, catch a typo. The embedded systems I’ve worked on in the past actually had a 2:1 ratio of test code to actual code. That is how you get quality: cover the corners and edges.
Indirectly, optimization is also served: the best way to optimize code is probably to use type inference, and type declarations can sometimes help the type inferencing algorithm overcome dark spots. Python is so dynamic that worst-case assumptions often make optimizations nearly impossible; this was brought home to me recently when I saw a preview of Brett Cannon’s thesis (sorry, no URL yet). But most programs uses the dynamism sparingly, and that’s where type declarations can help the type inferencer.
I don’t buy this. It’s not my field, but given that Common Lisp systems can often be faster than C code, and Smalltalk is often much faster than Java, I don’t buy it. It hasn’t been demonstrated through actual examples. Truly statically typed languages, which do not include C or C++, are generally slower, as they incur type checking all over the place.
I remember when I was working for the University of Texas, the project I was working on crossed paths with Schlumberger’s Austin Systems Center, which is where all their super-computers were. I had a discussion, illuminating in the extreme, with one of their senior researchers, whose name I’ve sadly forgotten, and he explained how they actually moved some of their code from FORTRAN to Lisp on the Crays because it was actually faster for their application.
Guido then steps into the disastrous minefield of “parameterized types”. Tell me, what language is this written in?
bc. class List(list) [T]:
That’s starting to get really ugly. One of the reasons I’ve always liked Python was it made my eyes happy visually. It’s slowly, but surely, running headlong away from that visual pleasure.
For example, consider a typed library function declared as taking a list[int] argument. Now we call this from untyped Python code with a plain list argument, where we happen to have ensured that this list only contains ints. This should be accepted of course! But if we pass it a list containing some ints and a float, this ought to fail, preferably with a TypeError at the call site.
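To see what that quoted list[int] behavior costs, here is a minimal sketch of the check done by hand at function entry. The names require_int_list and total are hypothetical; the point is that accepting a plain list which “happens” to contain only ints means walking the entire list on every call.

```python
def require_int_list(arg, name="argument"):
    """Raise TypeError unless arg is a list whose items are all ints.

    A runtime stand-in for a declared list[int] parameter: it has to
    inspect every element, so the cost grows with the list's length.
    """
    if not isinstance(arg, list):
        raise TypeError("%s must be a list, got %s" % (name, type(arg).__name__))
    for i, item in enumerate(arg):
        if not isinstance(item, int):
            raise TypeError("%s[%d] must be int, got %s"
                            % (name, i, type(item).__name__))
    return arg

def total(values):
    # Hypothetical "typed library function" taking list[int]
    require_int_list(values, "values")
    return sum(values)

print(total([1, 2, 3]))  # 6
try:
    total([1, 2, 3.5])
except TypeError as exc:
    print(exc)           # values[2] must be int, got float
```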
So what do I do if I want to say that I just want a number? Any number will do. Or maybe I only want rational numbers? What if my int is actually a Decimal? Is that OK?
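For what it’s worth, later Python versions (2.6 onward) answer these questions with behavior-based abstract classes in the numbers module (PEP 3141) rather than with concrete type names. A small sketch, where describe is a made-up helper, shows which categories a few values fall into; note that Decimal counts as a Number but not as a Real or Rational.

```python
import numbers
from decimal import Decimal
from fractions import Fraction

def describe(x):
    """List which abstract numeric categories x belongs to."""
    kinds = []
    for abc in (numbers.Number, numbers.Real, numbers.Rational, numbers.Integral):
        if isinstance(x, abc):
            kinds.append(abc.__name__)
    return kinds

for value in (3, 1.5, Fraction(1, 3), Decimal("1.5")):
    print(repr(value), describe(value))
```

“Any number will do” then becomes isinstance(x, numbers.Number), and “only rationals” becomes numbers.Rational, which is an interface question rather than a static-typing one.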
bc. def foo(a: T1, b: T2) -> T2 where T2 <= T1:
Holy crap. That’s all I’m saying. Larry Wall, your ship has arrived. Perhaps I missed something, but this is starting to look like Dylan, which is cool, but I don’t see any mention of generic functions, which means it basically comes loaded with baggage that’s empty.
I don’t know what to say about the rest of it, honestly; it’s starting to make my head hurt. The simplicity of Python is quickly being subsumed. Is Guido bored? Whatever happened to CP4E, or Computer Programming for Everybody? This certainly isn’t going to help.
Now, we can all sit back and say “oh yes, this is just optional things for just a few people”, but that is a bald-faced delusion. The minute this capability exists, it will be mandated, forced, and shoved down everyone’s throats by the type-fanatics that have been beating the drums. Libraries will start using it quickly, just as people adopt every new feature in fashion.
Thar be monsters, Guido.
The BDFL has written an article discussing options for optional static-typing in Python, and I wonder what is going through his head? The last thing Python needs is static type declarations. Well, perhaps not the last. It doesn’t need braces either, but at least braces would actually hurt productivity even less than static typing.
There are a few reasons people talk about static typing in a language:
- Performance
- “Safety”
- Documentation
I’m going to talk about them in that order. First, performance. There are a couple of reasons why this is more illusion than anything. Many languages with substantially more “dynamic” natures than Python (Smalltalk and Lisp being the two primary candidates) are much faster than Python in many places. In the case of Smalltalk, it’s a combination of a highly optimized VM plus JIT technology. In the case of Lisp, there’s some amazingly advanced native-code compiling going on. There are places where Lisp, for example, is actually faster than C, simply because it’s easier to express a good algorithm in it, and the compilers know what to do with the platform in detail.
If Python is interested in performance, I think integrating the advances in Psyco (which is x86-only) and porting it to additional platforms would be a much more useful endeavor. Psyco has, in a lot of my code, given me 2-3x improvements in total performance, and in some places 10x. It doesn’t require any type declarations, as it uses inferencing. In addition, as I recall from the presentation on StarKiller, type inferencing can go a long way toward fixing the problem. It’s not easy, but it’s a more useful solution to whatever performance problems some applications have. Honestly, another option would be to provide more heavily optimized types to help prevent hot spots in data structures.
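For the curious, turning Psyco on requires no changes to the code being accelerated, which is a big part of its appeal. A sketch, assuming the classic psyco.full() entry point; since Psyco only targets x86 and Python 2, the import is guarded so the same file still runs, just unaccelerated, everywhere else. The dot function is a made-up example of the plain-Python inner loop Psyco specializes well.

```python
# Psyco targets x86 and Python 2 only, so guard the import
try:
    import psyco
    psyco.full()           # ask Psyco to compile everything it can
    PSYCO_ENABLED = True
except ImportError:
    PSYCO_ENABLED = False  # fall back to the plain interpreter

def dot(xs, ys):
    """Plain-Python inner loop of the kind Psyco specializes well."""
    total = 0
    for x, y in zip(xs, ys):
        total += x * y
    return total

if __name__ == "__main__":
    print("psyco: %s" % PSYCO_ENABLED)
    print(dot(range(1000), range(1000)))
```

If compiling everything is too aggressive, psyco.bind(dot) limits it to the functions you name. Either way, no type declarations appear anywhere.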
Next comes safety, which I think is a mirage. As many people have pointed out in the comments to Guido’s article, simply saying “this should be an integer” isn’t that useful. Guido himself discusses the orthogonal nature of the problem. It’s not about name, it’s about behavior, which means it’s about interfaces. More than that, it’s about constraints, and the ability to express them in a clear way, or as fans of Eiffel will call them, pre-conditions and post-conditions.
Having said that, and understanding the usefulness of such checks for certain applications, I don’t think most people will use them, nor do I think the performance drain that comes with such checking is going to mesh well with the first goal… increasing performance. A possible solution would be to enable it only optionally, for debugging purposes, more as a set of triggers in your code than as actual production components. The safety that comes with static typing is just that, an illusion. Are Java programs less buggy than Python programs? Not likely.
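The “debugging only” idea is cheap to sketch with what Python already has. Below is a hypothetical dbc decorator modelled on the _pre_foo/_post_foo example quoted earlier (pre-condition gets the call’s arguments, post-condition gets the return value in front). It keys off __debug__, which is false under python -O, so optimized runs skip the wrapper entirely. dbc and Adder are illustrative names, not a library API.

```python
import functools

def dbc(pre, post):
    """Hypothetical design-by-contract decorator.

    pre(self, *args) runs before the call; post(self, rv, *args) runs
    after it with the return value first. Nothing is wrapped when
    __debug__ is false (python -O), so production pays no overhead.
    """
    def decorate(method):
        if not __debug__:
            return method
        @functools.wraps(method)
        def wrapper(self, *args):
            pre(self, *args)
            rv = method(self, *args)
            post(self, rv, *args)
            return rv
        return wrapper
    return decorate

class Adder(object):
    def _pre_foo(self, a, b):
        assert a > 0
        assert b > a

    def _post_foo(self, rv, a, b):
        assert rv > b

    @dbc(_pre_foo, _post_foo)
    def foo(self, a, b):
        return a + b
```

Adder().foo(1, 2) returns 3, while Adder().foo(2, 1) trips the b > a pre-condition with an AssertionError, unless you run under -O, in which case the checks simply vanish.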
Having worked on what is likely the largest and most complicated Python application in existence, I’m not sure static typing would have been useful at all, given our dependency on dynamism to pull off the tricks. Interfaces, on the other hand, are amazingly useful for documenting and exploring what you want to be there. Perhaps Jim disagrees.
The final bit is documentation, or the idea of “interfaces” and “protocols” (terms used interchangeably in various languages). This is something I think is absolutely critical to large applications, and Jim, as well as Philip Eby, have done some great work in this area. By allowing the documentation of interfaces, and then their application to classes and methods, we have a situation where an inferencing engine can figure out faster what’s expected, and narrow down its possible graph of solutions.
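Jim’s work lives on as zope.interface. Just to show the shape of “declare the interface, then apply it to classes” without any third-party package, here is a sketch using the standard library’s abc module instead (Python 2.6 and later; this uses Python 3 syntax). That is a different, inheritance-based mechanism, and IStorage, DictStorage and BrokenStorage are made-up names.

```python
from abc import ABCMeta, abstractmethod

# Python 3 syntax; Python 2.6+ would set __metaclass__ = ABCMeta instead
class IStorage(metaclass=ABCMeta):
    """Hypothetical interface: what any storage backend must provide."""

    @abstractmethod
    def load(self, key):
        """Return the object stored under key."""

    @abstractmethod
    def save(self, key, value):
        """Store value under key."""

class DictStorage(IStorage):
    """An in-memory implementation that satisfies IStorage."""

    def __init__(self):
        self._data = {}

    def load(self, key):
        return self._data[key]

    def save(self, key, value):
        self._data[key] = value

class BrokenStorage(IStorage):
    def load(self, key):  # save() is missing, so this can't be instantiated
        return None
```

DictStorage() works as expected, while BrokenStorage() raises TypeError at construction time because save() was never provided. The interface documents intent and catches the omission, and no argument was given a static type.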
Just a few thoughts from my side of the boat, having spent more time in Smalltalk recently than in Python, and having written identical programs in each, just for grins and education. Personally, I think there are many better projects to work on for Python to make it more usable at an enterprise level; after all, where else would you care about these issues? My 15-line script doesn’t need interfaces.
- One documentation standard for embedded documentation
- Type inferencing engines with native JIT
- Cleaned up class library (you know what I mean!)
- Improved debugging tools
That’s what I care about, but I simply don’t have the time, or the knowledge, to work on it. Static typing is simply a false meme promulgated by people who are stuck with it.
As I’m working on a multi-threaded system, I found a point where the performance goes all “off-the-chart,” and degrades exponentially. This was seriously concerning, since it could create a non-recoverable situation. In fact, it threw my box into such a fit that the load went over 30, and the system wouldn’t respond if you needed disk access.
So what was the problem? Pretty simple, actually. At the end of the process, it writes things to disk in a relational database. For performance, it batches the data into one transaction (committed once per second, or every n items). The problem is that as the database gets bigger and bigger, the time it takes to commit the data increases as well, although only linearly, it seems. Unfortunately, each time that happens, the backlog of things to commit gets bigger, which means the next commit takes even longer, causing near-exponential growth.
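The feedback loop is easy to reproduce in a toy model. In the sketch below, which is a simulation and not the real system, commit time grows linearly with the number of rows already stored, and everything that arrives while a commit is running becomes the next batch. The rate, base cost and per-row cost are made-up parameters.

```python
def simulate(rate, base_cost, cost_per_row, steps):
    """Simulate a writer that commits everything queued since the last commit.

    rate         -- items arriving per second (hypothetical)
    base_cost    -- fixed seconds per commit (hypothetical)
    cost_per_row -- extra commit seconds per row already in the database
    Returns a list of (batch_size, commit_seconds), one per commit.
    """
    db_rows = 0
    backlog = float(rate)  # first batch: one second's worth of items
    history = []
    for _ in range(steps):
        batch = backlog
        # Commit time grows linearly with the size of the database...
        commit_time = base_cost + cost_per_row * db_rows
        db_rows += batch
        # ...and whatever arrived during the commit (or the 1 s batching
        # window, whichever is longer) becomes the next batch
        backlog = rate * max(commit_time, 1.0)
        history.append((batch, commit_time))
    return history

if __name__ == "__main__":
    for step, (batch, secs) in enumerate(simulate(250, 0.1, 1e-4, 150)):
        if step % 15 == 0:
            print("commit %3d: %7.0f items, %5.2f s" % (step, batch, secs))
```

While a commit fits inside the one-second window, batches stay flat. Once it doesn’t, each commit adds more rows than the last, which makes the next commit slower still, so the database size grows geometrically even though each individual commit is only linear in size.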
Fun! By turning off fsync() in the database, I was able to keep performance effectively flat at the point I want it (250 items per second). This, unfortunately, creates other problems: namely, that it could lose data in a power failure. I need to see what I can do to let things work differently.
The nice part is I can hit that performance goal with less than 40% of the CPU burned on a desktop machine.
As part of a project I’m working on, I’m trying to use Posh, a shared-memory object sharing tool that looks pretty nice. Unfortunately, it seems to be painfully brittle, and I’m having massive issues getting it to work correctly and, finally, to not leave its handles hanging around in shared memory.
My main problem is that things don’t seem to return correctly from forkcall, and the parent process doesn’t continue on as expected. This shows up in the demonstration Producer/Consumer example as well, at least on Linux FC3. :/ When running it with a deque, using append in the producer and popleft in the consumer, I get this bizarre output:
bc. p1 appending 0
s1 queue: 1 items
s1 found index
s1 queue: 0 items
p1 queue: 0 items
p1 appending 1
s1 queue: 1 items
s1 found isalnum
s1 queue: 0 items
p1 queue: 0 items
p1 appending 2
s1 queue: 1 items
s1 found isalpha
s1 queue: 0 items
p1 queue: 0 items
p1 appending 3
s1 queue: 1 items
s1 found isdecimal
s1 queue: 0 items
p1 queue: 0 items
p1 appending 4
s1 queue: 1 items
p1 queue: 0 items
Huh? That’s just too weird for words. Where are those method names coming from? Playing with it in a regular Python shell with a shared copy still shows me that I get pure ints back.
Update: I spoke with one of the authors, Steffen Viken Valvåg, and his comment was that Posh only went through proof of concept, and never further, so it has a lot of issues. That certainly clarifies the problems I’ve had with it. For now, I’m going to put it on the back burner, and come back and perhaps update it myself.
One of the things I picked up in Smalltalk is that a method should be tiny. In Smalltalk, many people would frown on a method that is more than 10-12 lines. Most methods in many Python classes are obviously much longer than that, and so I started thinking about it. The main argument against lots of little methods (LOLM) is the dispatch overhead, but is that necessarily the right problem to be worrying about right now?
My decision was that I’m going to start using lots of little methods, as I’ve always done in Smalltalk, and then, if performance is a problem, I can worry about inlining them, potentially even using decorators to inline them for me. This is a further step on my road to not optimizing before I understand the problem.
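If you want to know what the dispatch overhead actually is before worrying about it, timeit will tell you. A sketch comparing a “lots of little methods” version against a hand-inlined one; the Point class and method names are invented for the comparison, and the numbers will vary by machine and interpreter.

```python
import timeit

class Point(object):
    def __init__(self, x, y):
        self.x, self.y = x, y

    # "Lots of little methods" style: tiny helpers composed together
    def _square_x(self):
        return self.x * self.x

    def _square_y(self):
        return self.y * self.y

    def norm2_small(self):
        return self._square_x() + self._square_y()

    # Hand-inlined equivalent: same arithmetic, no extra method calls
    def norm2_inlined(self):
        return self.x * self.x + self.y * self.y

if __name__ == "__main__":
    p = Point(3, 4)
    for name in ("norm2_small", "norm2_inlined"):
        seconds = timeit.timeit(getattr(p, name), number=1000000)
        print("%-14s %.3f s per million calls" % (name, seconds))
```

The difference is the cost of two extra method calls per invocation; whether that matters depends entirely on how hot the method is, which is the point of measuring first.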
So I’m looking at using SQLite for a simple data repository. I’m not overly interested in its SQL features; I simply want it to handle basic indexing, transactions, etc., rather than using Sleepycat’s Berkeley DB, with which I’ve had some reliability issues in the past. In order to get a handle on the performance model of SQLite, I decided I’d do some benchmarking. What follows is a bit of information about that.
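If you want to try the same kind of measurement yourself, the biggest single factor is usually transaction granularity. Here is a sketch using Python’s sqlite3 module that times inserting rows into a fresh on-disk database, either committing after every row or once at the end. The bench function, table layout and row counts are made up for illustration and are not the numbers from my write-up.

```python
import os
import sqlite3
import tempfile
import time

def bench(rows, one_transaction):
    """Insert `rows` rows into a fresh on-disk SQLite table and time it.

    one_transaction=True  -> a single COMMIT at the end
    one_transaction=False -> a COMMIT after every row
    Returns (row_count, elapsed_seconds).
    """
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        conn = sqlite3.connect(path)
        conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, payload TEXT)")
        conn.commit()
        start = time.time()
        for i in range(rows):
            conn.execute("INSERT INTO items (payload) VALUES (?)", ("row %d" % i,))
            if not one_transaction:
                conn.commit()
        conn.commit()
        elapsed = time.time() - start
        count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
        conn.close()
        return count, elapsed
    finally:
        os.remove(path)

if __name__ == "__main__":
    for mode in (True, False):
        count, secs = bench(200, mode)
        print("one transaction=%s: %d rows in %.3f s" % (mode, count, secs))
```

Per-row commits force a sync to disk each time, so expect them to be far slower; raise the row count for a more meaningful comparison. SQLite’s PRAGMA synchronous = OFF trades that durability for speed, much like turning off fsync.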
L. Peter Deutsch, a man who has had his fingers in nearly every major VM design of the past 20 years, has decided to implement Python on top of VisualWorks. Zoinks!
The numbers are quite interesting, and obviously there’s a long way to go, but this is a very early prototype, and the fact that it’s already almost 10x faster in some ways tells you that there’s something not kosher under the Python VM. The VM design is very simple, and that’s great for a lot of reasons, but it is missing a lot of the advanced designs that underlie the Lisp and Smalltalk world. I’m quite sure it made sense, when Python was purely a teaching language, to keep it simple under the VM as well as on top of it, but realistically, Python is no longer that, and the recent addition of decorators, generators, list comprehensions, and other things shows that it’s growing into a full-blown development language.
Perhaps it’s now time to give it a real VM? IronPython is probably the #1 candidate right now, but who knows what the future holds? I wonder what Cincom would think of releasing just the object engine under a totally free license (not source, necessarily)?
[Thanks to Patrick Logan for the link.]
Earlier, I had mentioned a tool called Trac, which is written in Python and manages to do everything I’ve been looking for in a project management application. I’ve been playing with it, and honestly, it’s making some great progress, and I’m looking forward to the next release.
The one thing I’ve seen so far that I dislike intensely is how it handles authentication. Not just that it really only uses Apache’s authentication mechanism, which leads to a massive disconnect, but more importantly, that it doesn’t really keep track of a user’s information other than their name. This could lead to issues in the future. For example, if you wanted to change a user’s login, you’d have to go in and munge the database to relink things to the right person, as opposed to having the normal layer of indirection. They are working on this, though, so I’ll cut them some slack.
After being pointed at it by Jeffrey, I took a look at the Trac project management system. Basically it integrates a few different things together:
- Wiki functionality
- Basic ticketing system
- Some milestone management
- Introspection into Subversion
It’s still early in its life, from my view, but it’s showing a lot of promise in integrating a lot of different things together. One of the things that I’m curious about is how it might handle inter-project ticketing and linking. That would be interesting. Especially if you want to make sub-projects, let’s say, for customer-specific problems, but then want to promote the information to the core project.
The only thing I’m missing is some of the Wiki formatting that comes with MoinMoin, but I suspect I can add some of it, or just introduce some of the MoinMoin engine into Trac. Either way, nice looking system for a start.
One thing that I think is ultra-powerful is the ability to use some of the linking syntax in Subversion commit messages. That lets you easily link back to the ticket a commit was involved in. Now, if only it were easy to find all the Subversion commits that relate to a specific ticket… maybe you can, but I’ve not figured it out yet.
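Until Trac grows that (or I find where it already lives), a crude workaround is to scan the commit messages yourself. A sketch, assuming you’ve already pulled (revision, message) pairs out of svn log by whatever means, and matching Trac’s “#123” and “ticket:123” link forms. The commits_for_ticket function and the sample log are hypothetical.

```python
import re

# Trac ticket links in commit messages: "#123" or "ticket:123"
TICKET_REF = re.compile(r"(?:#|\bticket:)(\d+)\b")

def commits_for_ticket(log_entries, ticket):
    """Return the revisions whose commit message references `ticket`.

    log_entries: iterable of (revision, message) pairs, e.g. parsed from
    svn log output (gathering them is left out of this sketch).
    """
    wanted = str(ticket)
    return [rev for rev, msg in log_entries
            if wanted in TICKET_REF.findall(msg)]

log = [
    (101, "Fix login redirect, closes #42"),
    (102, "Refactor wiki formatter (see ticket:7)"),
    (103, "Tweak CSS for #420 layout"),
    (104, "Follow-up to #42: add test"),
]
print(commits_for_ticket(log, 42))  # [101, 104]
```

Matching whole numbers rather than substrings keeps #420 from showing up in the results for ticket 42.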
It’s written in Python, by the way.