A response from the Googleplex
Some of my earlier comments did not sit well with Harold Davis. He writes in his blog, Googleplex:
An item (Delusions of Community) in the otherwise apparently unattributed Penseiri di un lunatico minore (“Thoughts of a minor lunatic”) blog attacks my blog entry Publish the PageRank Algorithm Now! for things I said, things I didn’t say, and also contributes ad hominem personal attacks on me to this discussion.
Well, the lack of attribution isn’t intentional, as the site is in transition, and 5 minutes on a search engine would turn up who I am. I’ve not hidden from anyone. As for the ad hominem attacks, I’m not sure what could be construed that way other than my “squeel” (typoed alas, as was carefully pointed out by Mr. Davis—I don’t proof most of my posts) comment, which I have removed as it was inappropriate.
Other than that, I simply attack his thought process, which is most certainly not ad hominem in any rational definition that I can find, nor is my observation of the oddity of someone demanding a feedback loop from Google who does not himself provide one. The argument that it is impossible to have trackbacks or comments is fallacious, as there are many systems (such as WordPress) which allow for comment moderation.
Now, on to the more substantive concerns. Mr. Davis begins his discussion of those issues with the observation:
In my original post I clearly noted “It’s probably unreasonable to expect Google to publish how PageRank really works in light of competition from other search engines, and the efforts of SEO Webmasters to game the system.”
While this is true, he then spends the rest of the post arguing that Google should still release it, which is, to me, a curious polarity of positions. I therefore take the bulk of his words to express his underlying desire, which, combined with the title of his post, I believe is a reasonable reading of his position.
Well, no, I don’t believe that everything benefits from openness. (I never said I did.) I just believe that the mechanisms behind forces that have a huge impact on our lives should be transparent. We should be able to verify the results of elections, and we should understand (at least roughly) how Internet search orders its results.
Comparing elections, which determine the leadership of a nation, with a search engine is a bit absurd. There is competition in the market, and I believe that so long as Google is not an abusive monopoly, the “control of ideas” concept is not a major risk. While Google is a convenience, it is not the only way to find things on the Internet, and may not represent the majority of the way memes spread, which would reduce its importance from a social perspective. As it is late, I don’t want to get into the construction of social perspective and the way things are framed; however, I believe Google has a minimal impact on the framing of social views compared with an election, and the comparison is therefore invalid.
I distrust authority even when it is as benign as Google, and I am always mindful of Lord Acton’s dictum about absolute power corrupting absolutely.
And Google is not “absolute power,” by any definition I can imagine. They represent perhaps 50% of the market, and like many predecessors, can be knocked off the top. They are not a political entity, which is what Lord Acton was discussing.
Yes, I agree with the minor lunatic that I’d rather have smart people who are naturally interested in a field working on it, than everybody under the sun regardless of proclivity or talent. That said, it is my absolute conviction that more transparency, and more community involvement, would benefit the formulation of Google’s search algorithm.
Why? That’s all I want to know. What makes Mr. Davis so sure that this is true? Thousands of eyes haven’t made OpenOffice, Linux, or any other project less of a disaster from a UI perspective. People solve problems, if they can, that cause them pain. The majority of people who will be interested in the algorithm are competitors and people attempting to game the system. This is not Wikipedia, where shallow contributions are helpful, but instead a deep problem that is not comprehensible without a huge amount of back-data and raw data to work on, neither of which is generally available to anyone who isn’t already working with search.
It’s nice to hold an ideological position, but any that demonstrates inflexibility is likely to be unsustainable. The droning of the blogosphere that “information wants to be free” is an absurdity. Information doesn’t want anything, it simply is. People don’t want to pay for information, and others want to charge—the fact that the digital age has made the legal and illegal distribution of bits easier doesn’t change the natural order of things. RMS may ramble on endlessly (and having met him, trust me on this one) about freedom as a social and philosophical construct, but make no mistake that a vast majority of people who “support” open source are more interested in the “free as in beer” perspective. That includes me sometimes.
Finally, as for things that I said he said that he did not, I would appreciate examples, as I quoted from his own article.
This entry was posted at 12:00 am on 20 April 2005 and is filed under Technology.
“Thousands of eyes haven’t made OpenOffice, Linux, or any other project less of a disaster from a UI perspective.”
But is it UI we’re talking about here? As far as I know, UI (the area in which non-Firefox open source fares most poorly) and a data-crunching algorithm are unrelated.
As far as I am concerned, this is a much more reasoned comment than your first one (although I still wish you would sign your name as I don’t want to spend the time looking you up, and I don’t want to refer to someone I am having a dialog with as “il minore” whatever).
The primary thing you said I said that I didn’t (and that in fact I don’t believe) is that everything should be open. I do not believe this, and never said I did. Some things should, and some things shouldn’t—although I think Linux is a case in point of something that has clearly benefited from being open.
It’s both a blessing and a curse to see both sides of an issue. The reason for the “polarity” of my position is, of course, I see the problems with any kind of disclosure of PageRank. Bearing in mind these problems, and the unlikeliness of it ever being disclosed, here are the reasons I think at least some more community discourse regarding the precise nature of PageRank would be helpful:
(1) PageRank, based on my searches, is not working as well as it used to. My impression is that the rate of deterioration is increasing. So it is not a case of “if it isn’t broken, don’t fix it.” Rather, it is a case of “this isn’t working,” and Google is playing catch-up to try to make it work, kludging together something with 100 variables (!). The elegant simplicity of the PageRank concept has clearly been lost.
(2) The time delay built into newer iterations of the Google model really bugs me. I like my information fresh! And as someone who is frequently putting up web sites, I like to be able to get them picked up fast without resorting to chicanery myself.
(3) In fact, Google is the predominant way people find information on the web. Anyone who thinks this is not very important to people, politics, and life is naive. And Google itself is more of a community effort than may be apparent. Case in point: Google uses the community-run Open Directory Project for major taxonomic information.
(4) It’s bad when Microsoft is heavy-handed and secretive, but OK when Google is? Come on, Googlers may be the good guys, but let’s hold them to the same standards as everyone else.
(5) No, I do not believe Google has hired all the smart people with something to contribute to search. What baloney! Sometimes the best ideas in fact do come from outside the box.
I’ll deal with your issues as they are numbered.
1) PageRank’s failure is not Google’s doing, but the result of gaming of the system, which is now endemic and amounts to a destruction of the commons. Unfortunately, this seems to be growing as fast as online usage does, if not faster. Casual statistics from a friend at one of the top 3 backbone providers say that nearly 1/4 of traffic is now spam-like. Certainly my experience is that 90%+ of email is now spam. Advanced algorithms to defeat it have more and more variables as the opponent becomes more skilled. There is no reason to believe this will end without radical sociopolitical and legal changes, or that the algorithm will become simpler in any near term.
2) The delay is largely a result of the gaming of the system. Delaying is one way to hold back the visibility of things that are being gamed. In fact, I would expect it to be a non-deterministic behavior pattern, rather than some constant. As an example, the move of my web site showed up in Google somewhere between 24 and 36 hours after happening. This involved moving a lot of URLs. This is an acceptable delay. While we might want instant gratification, it is unlikely to occur given the architectural model in use and the necessity of addressing the systemic behavioral problems.
3) Yahoo used to be the predominant way, then Alta Vista, and now Google. I have no doubt that one day it will be someone else. If I am to sweat the control of public discourse, the airwaves of television and radio are orders of magnitude more dominant than Google. Alas, those of us in the industry often forget our perspective place in the world.
4) No, heavy-handed is heavy-handed. The reasoning and behavior are important. I distrust Microsoft, but I don’t believe for 1 minute that they have any obligation to release their source code or algorithms to the public, or that the forcing of such an action is anything but theft of property. APIs are different, and Microsoft has a mixed history with those. Google is moving towards more flexibility with APIs and such, and I think this demonstrates a lot of their flexibility and cooperativeness.
5) If I had a revolutionary idea in search, I would not hand it over to Google for free. Also, by viewing their designs, unless they were released with zero restrictions (a fantasy world), it would contaminate your own IP and thereby make ownership a more difficult thing to determine.
In the end, I’ve yet to see a real argument FOR Google releasing it, from THEIR perspective. You may want it, but as with my own opinions, who cares? Does it serve Google’s goals (do no evil, profit, etc)? I can see no indication that it does.
Just a few more comments and then we’re going to have to agree to disagree on some of these points (at least as far as I am concerned).
I don’t really have a problem with a delay of 24-36 hours (as you describe in moving your site). I don’t know what this move entailed, but my own experience with new sites is of a much longer delay, which does bother me very much.
As someone who interacts with a variety of Google programs, I do have a problem with Google’s opacity (although I love much of Google’s technology and appreciate a great deal about the company). I’d guess this is the only publicly traded US corporation that inspires the kind of blind loyalty and trust that your comments indicate.
I cannot speak to “new sites,” as my site has been up since before Google existed. However, I suspect the issue is more the need to garner that “first link” so that Google can find you. Perhaps Google should make that easier to do, but then we step into the mess that is the gaming of the system. I suspect that their system adapts to the change profile of a site, crawling frequently changing sites more often.
As for the “blind loyalty,” we often confuse disagreement on fundamental principles for blindness.
Right on.