<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

  <title><![CDATA[Mimeographs from the Future]]></title>
  <link href="http://blog.amber.org/atom.xml" rel="self"/>
  <link href="http://blog.amber.org/"/>
  <updated>2013-05-08T17:07:58-04:00</updated>
  <id>http://blog.amber.org/</id>
  <author>
    <name><![CDATA[Christopher Petrilli]]></name>
    
  </author>
  <generator uri="http://octopress.org/">Octopress</generator>

  
  <entry>
    <title type="html"><![CDATA[Scaling FedRAMP]]></title>
    <link href="http://blog.amber.org/blog/2013/05/08/scaling-fedramp/"/>
    <updated>2013-05-08T08:45:00-04:00</updated>
    <id>http://blog.amber.org/blog/2013/05/08/scaling-fedramp</id>
    <content type="html"><![CDATA[<p><a href="http://www.fedramp.gov/">FedRAMP</a> was created to provide &#8220;a standardized approach to security assessment, authorization, and continuous monitoring for cloud products and services.&#8221;
To an outsider, this sounds like a whole bunch of nonsense, but if you&#8217;ve ever had to deal with the <a href="http://csrc.nist.gov/groups/SMA/fisma/Risk-Management-Framework/">NIST Risk Management Framework</a>, at least as it is always implemented by government agencies, you&#8217;ll understand how absolutely critical it is that the approach be standardized.
As a geek, nerd, and overall general enemy of paperwork, I have to start by simply saying that conceptually, I don&#8217;t have a problem with the Risk Management Framework (RMF).
If you sit down and talk to Dr. Ross, the man behind the curtain, you&#8217;ll see that he&#8217;s generally a very reasonable person, and his goals are simply to provide a conceptual framework for the Federal government to understand and assess the risk of its systems.</p>

<p>The reality, unfortunately, is much darker.
Rather than being used as the framework they are intended to be, the RMF and its <a href="http://csrc.nist.gov/publications/PubsSPs.html">accompanying standards</a> are generally treated as a veritable gospel that can never be questioned, thought about, reasoned about, or otherwise adapted to the situation at hand.
This creates a situation that inverts the incentives and often creates systems that have much lower actual security, and substantially increased risk, but have lots of paperwork to get the approval.</p>

<p>Now that I&#8217;ve taken some organizations through the FedRAMP process, I can say that it is an improvement.
It is more risk-focused, and more interested in being a collaborative effort to actually identify risks and address them wherever possible.
It still suffers from a serious paperwork overhead, however, and worse, there are some conceptual gaps within it that do not address the needs of large cloud providers.
I&#8217;m going to try and address some of the ideas that I think need to be tackled within FedRAMP to succeed with the likes of Amazon, Windows Azure, Google, Rackspace, etc.
Without these changes, or something more effective even, I believe that FedRAMP, for all its admirable goals, will wither and die.</p>

<p>In follow-on articles, I&#8217;m going to cover some ideas for how to scale FedRAMP to both larger and, where I can, smaller cloud service providers.
Most of my experience is with the gorillas in the yard, so it will focus on that, but I&#8217;d like to see it made more flexible for the organizations just starting, especially when they&#8217;re in the SaaS/PaaS space.</p>

<p>The topics I intend to cover are:</p>

<ul>
<li>Issues of technical scale &#8211; How do you scale FedRAMP to deal with the issues faced by the likes of Amazon, Google, Microsoft, Rackspace, et al.?</li>
<li>DevOps v. The Paperwork Monster &#8211; Specifically how do you deal with the enormous velocity exhibited by most cloud providers?  The RMF isn&#8217;t really used to coping with this rate of change. No calculus, I promise.</li>
<li>Importance of automation &#8211; Spot checks of technical controls are fine, but the key is all inside the automation, which is rarely covered by 3PAOs.</li>
</ul>


<p>If anyone has any other ideas they&#8217;d like to see addressed, I&#8217;d be happy to delve into them.  Just as a note for my qualifications, I&#8217;m the quality manager for a major 3PAO, and technical lead on large-scale cloud projects.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[The Downside of the Internet]]></title>
    <link href="http://blog.amber.org/blog/2013/05/01/the-downside-of-the-internet/"/>
    <updated>2013-05-01T10:32:00-04:00</updated>
    <id>http://blog.amber.org/blog/2013/05/01/the-downside-of-the-internet</id>
    <content type="html"><![CDATA[<p>A <a href="http://www.thedailybeast.com/articles/2013/04/27/false-flags-sharia-law-and-gun-grabs-gop-lawmakers-embrace-the-crazy.html">recent article</a> in <cite>The Daily Beast</cite> came with the following quote that got me thinking:</p>

<blockquote>The fact that conspiracy theories are percolating up to local party leaders and even the halls of Congress should be a warning sign for the GOP. As the faithful know, you reap what you sow, and the steady diet of hyperpartisan media has seeded these conspiracy theories in the minds of party activists to the extent that they are starting to shape policy debates. The embarrassing incidents are evidence of a larger problem that needs to be confronted: when you do not condemn the use of hate and fear to serve as a recruiting tool against your political opponents, the ability to reason together is undermined and self-government is compromised. There is a cost to condoning extremism when it seems to benefit “your team.”</blockquote>


<p>Even <cite>The Economist</cite> has <a href="http://www.economist.com/blogs/democracyinamerica/2012/06/georgia-and-united-nations">weighed in</a> on the lunacy of the issue.</p>

<p>As much as it pains me to say, I think the Internet has amplified the crazy by allowing the lunatics among us to connect with each other and reinforce their belief system.
At some level, it is the basic &#8220;walled garden&#8221; cognitive bias, where people associate only with people who reinforce their belief system, thereby removing all doubt and thought from the process.
The Internet has simply allowed us to define our garden even more narrowly than ever before, and while this has benefited small groups that often felt isolated &#8211; for example, people with a rare disease &#8211; it has also increased the &#8220;echo chamber&#8221; effect.
From outside the wall, the system seems obviously broken, but from within, all is in harmony because any element that might undermine that harmony has been removed.</p>

<p>Combine the confirmation bias exhibited by most people &#8211; part of the reason people misunderstand science &#8211; with things like the gambler&#8217;s fallacy and herd instinct, and you have an environment ripe for exploitation.
Fifty years ago, the lunatics were yelling on the street corner, and were largely ignored by society.
The Internet gave them the ability to find their own &#8220;kind&#8221;, and reinforce their beliefs, thereby creating a more monolithic social structure that can be used to garner perceived rationality.</p>

<p>Or at least, that&#8217;s my view.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[6 Months with the Hooniganmobile]]></title>
    <link href="http://blog.amber.org/blog/2013/04/30/6-months-with-the-hooniganmobile/"/>
    <updated>2013-04-30T19:47:00-04:00</updated>
    <id>http://blog.amber.org/blog/2013/04/30/6-months-with-the-hooniganmobile</id>
    <content type="html"><![CDATA[<p>In July of last year, I special ordered a <a href="http://www.ford.com/cars/focus/focusst/">Ford Focus ST</a>.
At the time, it had just been announced, and I hadn&#8217;t even driven one.
Instead, my decision was based on the reputation of the previous generation of Focus ST and RS, and the quality of the current <a href="https://en.wikipedia.org/wiki/Ford_Focus_(third_generation)">third generation Focus</a>.
Getting an order in was, shall we say, less than easy, as most dealers weren&#8217;t used to pre-ordering things, and I often knew much more about the process than they did.
Still, in November of last year, I took delivery of an Oxford White 2013 Focus ST.</p>

<p><img src="http://assets.amber.org/photos/2013_focus_st_shenandoah.jpg" title="'2013 Focus ST'" ></p>

<p>So, after six months with the car, I decided that it&#8217;s about time to put together my thoughts.
The good and the bad, though the good far outweighs the bad.</p>

<h2>The Bad</h2>

<p>I have to start with the dreadful <a href="http://www.ford.com/technology/sync/">MyFord Touch</a> system, &#8220;designed&#8221; by Microsoft.
It&#8217;s not the worst user interface I&#8217;ve ever seen, but it is slow, sometimes oddly unresponsive, and periodically requires some kind of maintenance reboot, sometimes in the middle of a drive.
Enough has been written about the system that I don&#8217;t need to go into it more here; suffice it to say that it&#8217;s not Microsoft&#8217;s finest hour.
Fortunately, as a small redeeming grace, there has been a code release that was downloadable, and I was able to upgrade the software.
The upgrade increased performance a little bit, and generally made the system more stable.
Unfortunately, the system &#8211; even though it&#8217;s capable of connecting to a WiFi network in your house &#8211; is unable to download its own updates directly, and instead you have to download them, stick them on a USB drive and sit in the car for the entire time with the engine running while it does an update, which can take an hour.</p>

<p>The next three issues are all tied together.
Rear-facing visibility is sub-optimal.
It&#8217;s certainly worse than my Mazda 3, and comes with some serious blind spots.
Unfortunately, unlike the Titanium model of the Focus, you can&#8217;t get the ST with either ultrasonic parking sensors, or a back-up camera.
This is a gigantic, glaring oversight.</p>

<p>The last big issue is the dealer.
No matter how much the domestic car makers improve their product &#8211; and they have made gigantic strides &#8211; the dealers are still sub-par at best.
They fail on sales; they fail on service; and they fail in general communication.</p>

<h2>The Good</h2>

<p>As I said, the good far outweighs the bad for me. For the good, I&#8217;m going to just hit them in bullet points:</p>

<ul>
<li>Handling is Teutonic. The handling is more like that of a BMW than anything you&#8217;ve ever run into with a domestic car before. The limits are very high, with the car pulling close to 1G on the skid pad. That&#8217;s serious sports-car territory.</li>
<li>Oversteer. Yes, Virginia, you can hang the tail out of a front-wheel-drive car. The tuning of the stability system, and the chassis as a whole mean that you can create lift-off oversteer on demand if you put the suspension in sport mode.</li>
<li>Brakes are excellent. They&#8217;re not fancy Brembo brakes, but even a day in the mountains doesn&#8217;t cause any fade, and the initial bite is excellent. Pedal feel is outstanding. They are comparable to my Infiniti G35 w/Brembo brakes.</li>
<li>Engine is a locomotive. 270 ft-lbs of torque from just off idle, and it just pulls. The performance curve is not that of a turbo-charged 4 cylinder, but more like a straight 6, or even a nice old-school V8.  There is minimal turbo lag.</li>
<li>Sounds great. Unlike most &#8220;great sounding cars&#8221;, though, it&#8217;s all intake noise coming through a special Sound Symposer system. It&#8217;s a butterfly valve that lets some of the intake noise into the cabin, but only when you step on it. Otherwise, nothing.</li>
<li>Sleeper and a car for hooning around. I don&#8217;t know how else to describe it other than Jekyll and Hyde. Keep your foot out of the &#8220;go fast&#8221; pedal, and the car is a quiet, well-mannered cruiser. Step on it, and you can do all sorts of things that your parents would never approve of. For me, this is the greatest asset of the car.</li>
</ul>

]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Waiting with Bated Breath]]></title>
    <link href="http://blog.amber.org/blog/2013/03/15/waiting-with-bated-breath/"/>
    <updated>2013-03-15T11:04:00-04:00</updated>
    <id>http://blog.amber.org/blog/2013/03/15/waiting-with-bated-breath</id>
    <content type="html"><![CDATA[<p>My first digital camera was an <a href="http://en.wikipedia.org/wiki/Apple_QuickTake">Apple Quicktake 200</a> that I bought in 1996.
It produced images in glorious 640x480 resolution, with a level of oversaturation that would make an 80s music video blush.
Soon after, I switched to another camera, a Canon point-and-shoot, but that also didn&#8217;t really provide satisfactory results.
In late 2005, I bought a <a href="http://en.wikipedia.org/wiki/Nikon_D50">Nikon D50</a> digital SLR (DSLR) camera to get back into photography.
In purchasing the DSLR, I had looked at other cameras, and there simply had been nothing else that produced an image of any quality that could be used to make prints at 8x10 or above.
It wasn&#8217;t just an issue of the sensor &#8211; the megapixel war had started already &#8211; but of the glass in front of the sensor, or plastic in some cases.</p>

<p>For a couple years, I used it extensively and took thousands of photos, but soon it became a bit of a hassle to keep with me.
The DSLR is a bulky camera structure, and when you go to take a picture, it can intimidate your subject if you&#8217;re trying to take a picture of people.
It&#8217;s a &#8220;serious camera&#8221;, and comes with all the problems associated with it.</p>

<p>So, a couple years later, I bought a <a href="http://www.kenrockwell.com/canon/s90.htm">Canon Powershot S90</a>.
While it might <em>look</em> like a simple point-and-shoot, it was the first of a new generation of serious small cameras.
It had good glass (f/2 at its fastest), a lower resolution sensor that had very low noise levels and good low-light performance, and the ability to operate in full manual with the creation of RAW images.
The camera was pocketable in pants, and I took this camera with me all over the world, taking some amazing pictures with it.
There are 16x20 enlargements in my house from this camera that people have commented on repeatedly.
Just amazing.</p>

<p>But, it&#8217;s time for something &#8220;new&#8221;, and when a good friend bought a Fujifilm <a href="http://kenrockwell.com/fuji/x-pro1.htm">X-Pro1</a>, and started showing me the amazing work he was getting out of it, I started looking at the Fujifilm line of cameras.
One of the things I learned was that I actually mainly used a 35-50mm effective focal length on my lenses.
There is something freeing about not worrying about the lens.
My Nikon D50 has had a f/1.8 35mm lens (50mm effective) on it most of its life, and the zooms stay in the bag.</p>

<p>Then, miracle of miracles, Fujifilm announced the <a href="http://fujifilm-x.com/x100s/en/">X100S</a>, an update to one of the most well-regarded, and troubled, cameras released.
After a few days of research and thought, I decided to buy one from my favorite camera shop in NYC (B&amp;H), and now the wait comes.
The hardest part is seeing reviews of the new X100S <a href="http://www.fujirumors.com/x100s-vs-x100/">start</a> to <a href="http://www.briankraft.com/Blog/personal/fuji-x100s-pros-and-cons/">be released</a>, and how amazing the camera is looking.</p>

<p><img src="http://www.fujifilm.com/products/digital_cameras/x/fujifilm_x100s/product_views/img/index/pic_additional_01.jpg"></p>

<p>Now, where&#8217;s mine?</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Over-Engineering a flashlight]]></title>
    <link href="http://blog.amber.org/blog/2013/03/06/over-engineering-a-flashlight/"/>
    <updated>2013-03-06T15:01:00-05:00</updated>
    <id>http://blog.amber.org/blog/2013/03/06/over-engineering-a-flashlight</id>
    <content type="html"><![CDATA[<p>What feels like almost 2 years ago, <a href="http://www.hexbright.com/">I backed a programmable flashlight</a> on <a href="http://www.kickstarter.com/projects/527051507/hexbright-an-open-source-light">Kickstarter</a>.
While it took <em>substantially</em> longer than intended for it to get through the manufacturing process, I actually am quite happy to report that I received it in the mail a few weeks ago.  Here it is:</p>

<p><img src="http://assets.amber.org/p/hexbright/hexbright.jpg"></p>

<p>It is perhaps the single most over-engineered flashlight in existence.
The body is CNC machined aircraft-grade aluminum that&#8217;s been hard anodized.
But that&#8217;s not what makes it the most over-engineered flashlight; this is:</p>

<p><img src="http://assets.amber.org/p/hexbright/hexbright-internals.jpg"></p>

<p>Yes, that&#8217;s a complete circuit board with microcontroller on it.
Not just <em>any</em> microcontroller, though, but an <a href="http://arduino.cc">Arduino</a> compatible one.
This means that you can program the flashlight just like an Arduino, using the same simple coding environment you&#8217;re used to.
Then, you just plug in your USB cable, and away you go.</p>

<p><img src="http://assets.amber.org/p/hexbright/hexbright-usb.jpg"></p>

<p>Is it overkill?
Is a rechargeable flashlight with a three-axis accelerometer absurd?
Yes, yes it is.
And that&#8217;s why I love it.
It also happens to be the best flashlight I&#8217;ve ever seen, with a CREE XM-L U2 LED that puts out 500 lumens, and a beautiful total internal reflection lens.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Big data as a haystack]]></title>
    <link href="http://blog.amber.org/blog/2013/02/21/big-data-as-a-haystack/"/>
    <updated>2013-02-21T15:05:00-05:00</updated>
    <id>http://blog.amber.org/blog/2013/02/21/big-data-as-a-haystack</id>
    <content type="html"><![CDATA[<p>Yesterday&#8217;s video by <a href="http://redmonk.com/jgovernor/2012/09/27/how-to-not-define-big-data/">James Governor</a> about &#8220;how not to define big data&#8221; got me to thinking, as so much of James&#8217; writing does.
First, go watch the video:</p>

<iframe width="560" height="315" src="http://www.youtube.com/embed/O1l0HiKY3tA?rel=0" frameborder="0" allowfullscreen></iframe>


<p>People often talk about big data as though it is a measurable quantity, something quantitative.
And it is, but that&#8217;s inadequate to understand the different nature of it.
For me, the more important aspect is qualitative.
Big data isn&#8217;t just about the number of gigabytes that your system deals with, but instead about the underlying nature of that data.
Take, for example, traditional account data, where we might store orders, and shipping information.
This data has a high signal-to-noise ratio.
If, instead, we look at things like website logs, telemetry from embedded devices, or even a stream of tweets, we&#8217;re talking about what is traditionally a <em>very low</em> signal-to-noise ratio.</p>

<p>Big data, then, isn&#8217;t just about the size of your haystack, but instead finding the needle hidden within.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Review: Two Scoops of Django]]></title>
    <link href="http://blog.amber.org/blog/2013/02/19/review-two-scoops-of-django/"/>
    <updated>2013-02-19T16:01:00-05:00</updated>
    <id>http://blog.amber.org/blog/2013/02/19/review-two-scoops-of-django</id>
    <content type="html"><![CDATA[<p>A few weeks ago, <a href="http://audreymroy.com/">Audrey Roy</a> and <a href="http://pydanny.com/">Daniel Greenfield</a> <a href="https://django.2scoops.org/">released the beta of their new book</a>, <cite>Two Scoops of Django: Django Best Practices for Django 1.5</cite>.
Being a fan of Danny and Audrey&#8217;s work, I obviously popped the $12 for it.
I read it in a couple of hours on the plane the next day.</p>

<p>Weighing in at approximately 200 pages, or approximately 1.52x10E-27 grams (<a href="http://www.scottkurowski.com/massbit/index.htm">thanks information theory</a>) on my iPad, there is little fluff in the book, and that is a great thing.
I&#8217;ve been using Django since v0.96 came out in early 2007, and use it for a majority of my web work, and yet the book contained a lot of interesting ideas and insight.
The result of reading the book was a TODO list with many items on it for all my projects to address a lot of issues that I&#8217;d not really thought about.</p>

<p>What more do you want from a book than for it to make you rethink how you approach solving problems? Go buy it.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[The neverending story]]></title>
    <link href="http://blog.amber.org/blog/2013/02/19/the-neverending-story/"/>
    <updated>2013-02-19T15:40:00-05:00</updated>
    <id>http://blog.amber.org/blog/2013/02/19/the-neverending-story</id>
    <content type="html"><![CDATA[<p>Trying to catch up with my backlog of papers had been put on hold for a little bit as I tried to clean up a mess at work.
Perhaps more on that later.
Still, it seems like every paper I read somehow triggers a cascading avalanche of additional reading material, which means that the backlog never shrinks.</p>

<ul>
<li><cite>Kafka: a Distributed Messaging System for Log Processing</cite> (<a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.233.1726">CiteSeerX</a>)</li>
<li><cite>Paxos Made Moderately Complex</cite> (<a href="http://www.cs.cornell.edu/courses/cs7412/2011sp/paxos.pdf">Cornell PDF</a>)</li>
<li><cite>Pregel: A System for Large-Scale Graph Processing</cite> (<a href="http://kowshik.github.com/JPregel/pregel_paper.pdf">GitHub PDF</a>)</li>
<li><cite>Nonlinear Time-Series Prediction with Missing and Noisy Data</cite> (<a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.44.1128">CiteSeerX</a>)</li>
<li><cite>Bayesian Time Series: Models and Computations for the Analysis of Time Series in the Physical Sciences</cite> (<a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.34.3587">CiteSeerX</a>)</li>
<li><cite>Probabilistic Similarity Search for Uncertain Time Series</cite> (<a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.160.8831">CiteSeerX</a>)</li>
<li><cite>Processing a Trillion Cells per Mouse Click</cite> (<a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.258.9554">CiteSeerX</a>)</li>
<li><cite>Only Aggressive Elephants are Fast Elephants</cite> (<a href="http://arxiv.org/abs/1208.0287">arXiv</a>)</li>
<li><cite>Uncertain Time-Series Similarity: Return to the Basics</cite> (<a href="http://arxiv.org/abs/1208.1931">arXiv</a>)</li>
<li><cite>Statistical Distortion: Consequences of Data Cleaning</cite> (<a href="http://arxiv.org/abs/1208.1932">arXiv</a>)</li>
</ul>


<p>The last few are all from <a href="http://www.vldb2012.org/">VLDB 2012</a>, and I have another dozen or so papers from the same conference that I want to work my way through.
Looking at these, you can see that a lot of it is an attempt to deal with streams of data, specifically in real-time, as best as possible.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Reading list for today]]></title>
    <link href="http://blog.amber.org/blog/2013/01/14/reading-list-for-today/"/>
    <updated>2013-01-14T21:45:00-05:00</updated>
    <id>http://blog.amber.org/blog/2013/01/14/reading-list-for-today</id>
    <content type="html"><![CDATA[<p>Another day, another few papers down:</p>

<ul>
<li><cite>Disco: Running Commodity Operating Systems on Scalable Multiprocessors</cite> (<a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.56.6674">CiteSeerX</a>)</li>
<li><cite>End-to-end Arguments in Systems Design</cite> (<a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.35.4167">CiteSeerX</a>)</li>
<li><cite>Weighted Voting for Replicated Data</cite> (<a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.12.6256">CiteSeerX</a>)</li>
<li><cite>A Glossary of Time Granularity Concepts</cite> (<a href="http://www.cs.arizona.edu/~rts/pubs/LNCS1399p406.pdf">PDF</a>)</li>
<li><cite>An Access Control Model Supporting Periodicity Constraints and Temporal Reasoning</cite> (<a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.25.9748">CiteSeerX</a>)</li>
<li><cite>SharedDB: Killing One Thousand Queries With One Stone</cite> (<a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.227.4609">CiteSeerX</a>)</li>
</ul>


<p>Some of these are quite old.
For example, David Gifford&#8217;s paper on weighted voting was published in 1979, but it set forth the beginning of weighted r+w quorums that are now quite common.
I&#8217;m pretty sure I had read it before, but sometimes older papers just need to be re-read to make sure that no ideas were missed.</p>
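
<p>For anyone who hasn't run into r+w quorums before, here is a minimal Python sketch of the idea. The function names and the vote table are my own for illustration, not taken from the paper: each replica holds some number of votes, and the read threshold r and write threshold w are chosen so that any read quorum overlaps any write quorum.</p>

```python
def quorums_are_valid(votes, r, w):
    """Check the usual weighted-voting conditions for a table of replica votes."""
    total = sum(votes.values())
    # Every read quorum must overlap every write quorum: r + w > total votes.
    # Every pair of write quorums must overlap too: 2 * w > total votes.
    return r + w > total and 2 * w > total


def has_quorum(votes, responders, needed):
    """True if the replicas that responded hold at least `needed` votes together."""
    return sum(votes[name] for name in responders) >= needed


if __name__ == "__main__":
    votes = {"a": 2, "b": 1, "c": 1}        # hypothetical replicas and weights
    print(quorums_are_valid(votes, 2, 3))   # read needs 2 votes, write needs 3
    print(has_quorum(votes, ["a"], 2))      # replica "a" alone can serve a read
```

<p>Giving one replica more votes, as with "a" here, lets a single well-placed copy satisfy reads on its own while writes still have to reach a majority of the votes.</p>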
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Back in the paper saddle]]></title>
    <link href="http://blog.amber.org/blog/2013/01/11/back-in-the-paper-saddle/"/>
    <updated>2013-01-11T23:17:00-05:00</updated>
    <id>http://blog.amber.org/blog/2013/01/11/back-in-the-paper-saddle</id>
    <content type="html"><![CDATA[<p>Ever since I was in college, and would sneak into the AI department&#8217;s area to grab copies of their papers and technical reports, I&#8217;ve been a voracious reader of academic research.
Too often in the &#8220;go go go&#8221; commercial world, we lose our perspective of work that is being done, and especially of the many decades of research upon which all our toys are built.
That&#8217;s not to say that there aren&#8217;t plenty of papers and such from Google, Amazon, et al., but I actually include many of those in the same academic realm as I would something from Stanford or MIT.</p>

<p>Anyway, for various reasons too tedious to go into, I&#8217;ve allowed my inbox (also known as <a href="http://dropbox.com/">Dropbox</a>) to accumulate over a hundred papers that I intended to read, but haven&#8217;t found time to yet.
That doesn&#8217;t begin to include all the amazing blog articles, etc., that accumulate in <a href="http://instapaper.com">Instapaper</a> at all times.
The Internet may be an amazing thing, but it also is a source of unlimited future reading.
So, starting this year, and today to be exact, I&#8217;ve decided to try and put time aside every day to read a few of the things I&#8217;ve accumulated and try and slowly work down my backlog.
I was asked by a friend on Twitter to keep track of what I&#8217;m reading, and so, this is the start.</p>

<ul>
<li><cite>Hancock: A language for analyzing transactional data streams</cite>
(<a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.91.3721">CiteSeerX</a>).
A DSL for performing some relatively basic stream processing on large amounts of &#8220;sensor data&#8221;, in this case primarily telephone calls.
Interesting ideas:
1) persistence mechanism that mirrors UNIX sensibilities with directories as containers;
2) view representation for abstracting data requirements over time, namely exact versus approximate representations</li>
<li><cite>Crash-only Software</cite>
(<a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.3.9953">CiteSeerX</a>).
What if we just gave up and quit trying to recover from errors?
Sometimes it&#8217;s faster to just crash the system and reboot.
Came with a couple interesting papers I want to read, namely <cite><a href="https://www.usenix.org/conference/usits-03/decoupled-storage-free-replicas">Decoupled storage: Free the replicas!</a></cite> and <cite><a href="http://research.microsoft.com/apps/pubs/default.aspx?id=74713">Session State: Beyond Soft State</a></cite></li>
<li><cite>PASSing the provenance challenge</cite>
(<a href="http://www.eecs.harvard.edu/~syrah/node/201">Harvard</a>).
Integrating <a href="http://en.wikipedia.org/wiki/Provenance#Data_provenance">data provenance</a> into the Linux operating system.</li>
<li><cite>CryptDB: Protecting Confidentiality with Encrypted Query Processing</cite>
(<a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.226.1498">CiteSeerX</a>).
A new approach to encryption in the database that seems very different than anything I&#8217;ve seen in the commercial sector yet.
One of the things I like is that it is adaptive and blends different approaches to encryption &#8211; including <a href="http://en.wikipedia.org/wiki/Homomorphic_encryption">homomorphic encryption</a> where appropriate &#8211; to obtain maximum functionality with minimized risk.
Definitely want to look at this to play with, and it seems to currently work with MySQL and PostgreSQL to varying degrees.</li>
</ul>


<p>A final note, and something that continues to amaze me in 2013: if your paper is not available <em>for free</em> to read, then why are you publishing your paper?
I continually run into annoying pay walls &#8211; ACM and IEEE, I&#8217;m looking at you &#8211; that do nothing but impede research and progress.
Now, usually you can Google around and find the paper hosted somewhere, but if you are an academic and your profession is (supposedly) about the progress of human knowledge, how can you subscribe to this kind of walling off of knowledge?</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Review: The New Kingmakers]]></title>
    <link href="http://blog.amber.org/blog/2013/01/11/review-the-new-kingmakers/"/>
    <updated>2013-01-11T15:31:00-05:00</updated>
    <id>http://blog.amber.org/blog/2013/01/11/review-the-new-kingmakers</id>
    <content type="html"><![CDATA[<p>For years, Stephen O&#8217;Grady, and his partner in crime and business, James Governor, have been telling anyone who will listen that the IT business is, and has been, undergoing a massive shift in power from the top-down of old to a new &#8211; and sometimes terrifying &#8211; bottom-up perspective.
These thoughts have been collected on <a href="https://twitter.com/sogrady">Twitter</a>, his <a href="http://redmonk.com/sogrady/">blog</a>, and at every conference he spoke at and attended, including the excellent <a href="http://monktoberfest.com/">Monktoberfest</a>.
For those who haven&#8217;t had the opportunity to immerse themselves in the hoppy goodness of their leadership, Stephen&#8217;s put a great compilation of those thoughts and themes together in a new book: <a href="http://thenewkingmakers.com/">The New Kingmakers</a>.</p>

<p>First, it&#8217;s necessary to say that this isn&#8217;t some Alfred North Whitehead tome of inscrutable kōans.
Instead, it is written in a breezy and approachable style that masks the powerful themes contained; themes that are terrifying if you sit atop the old order.
As Mary Poppins says, a spoonful of sugar helps the medicine go down, and in this case the medicine is one that will empower entire new industries and bring others to their knees.
The entire book revolves around the idea that developers &#8211; and not managers and executives &#8211; are the power brokers in the IT industry, and to a lesser extent that all things are IT.
For those of us on the inside of this transformation, this seems a somewhat obvious observation in retrospect, but when Stephen first offered this idea in 2010 &#8211; akin to Tim O&#8217;Reilly&#8217;s early observation of the power of the alpha geeks in 2002 &#8211; this wasn&#8217;t as clear as it is now.
The near-alchemic combination of open source, the Internet, and its bastard love-child cloud computing has rewritten the rules for everyone, whether they know it or not.</p>

<p>Do yourself a favor, take an hour out of your life to read Stephen&#8217;s mini-book, and then spend many more hours thinking about what it all means to your world.
Like all good writing, it should leave you with more questions than you had when you started.
Use those questions to re-evaluate and re-shape your own future.</p>

<p>Oh, and did I mention that it&#8217;s free?
Thanks to the support of the team at <a href="http://www.newrelic.com">New Relic</a>, Stephen was able to get the book <a href="http://redmonk.com/sogrady/2013/01/09/the-new-kingmakers-the-book/">edited and published for free by O&#8217;Reilly</a>.
It really is a different world.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[AWS Glacier speculation]]></title>
    <link href="http://blog.amber.org/blog/2012/08/24/aws-glacier/"/>
    <updated>2012-08-24T20:36:00-04:00</updated>
    <id>http://blog.amber.org/blog/2012/08/24/aws-glacier</id>
    <content type="html"><![CDATA[<p>Let me preface this by saying that I know absolutely <em>nothing</em> about exactly what Amazon has done with their new <a href="http://aws.amazon.com/glacier/">Glacier</a> service. In short, the product offers near-line storage of an unlimited amount of data for $0.01 per gigabyte per month. Many people have looked at this and wondered what technology is backing it up, and while I don&#8217;t have any inside information whatsoever, I do have some knowledge of large-scale storage systems and how you would approach delivering this.</p>

<p>To begin, let&#8217;s take a look at some of the assumptions we have to meet:</p>

<ol>
<li>$0.01 per gigabyte per month</li>
<li>99.999999999% durability, which is not to be confused with availability.</li>
<li>3-5 hour typical retrieval times</li>
<li>&#8220;Unlimited&#8221; scalability</li>
</ol>


<p>Let&#8217;s get this out of the way: Amazon is <em>not using tape</em>. I can&#8217;t provide any evidence for this, but various AWS people have publicly made their distaste for tape well known. So, with that assumption, I think we can safely say that they are based on traditional rotating hard drives.</p>

<p>The rest of this is going to be very speculative, but I think it likely fits into the general approach that AWS has taken. To analyze the entire system, we&#8217;re going to talk about a couple of areas:</p>

<ol>
<li>Data at rest costs</li>
<li>Power</li>
<li>Data ingest and retrieval</li>
<li>Bandwidth costs</li>
</ol>


<h2>Data at rest costs</h2>

<p>Since I&#8217;ve already said that Glacier is based on traditional rotating media, we need to decide what kind they&#8217;re using. My guess would be 3TB SATA drives, as they currently represent the best capacity for your money. So how much is Amazon paying for these drives? I would estimate that Amazon is paying less than $100 for a 3TB drive, perhaps even under $75. General retail is around <a href="http://www.newegg.com/Product/Product.aspx?Item=N82E16822148844">$150</a>, but Amazon has (conservatively) over 100PB of data in S3 currently, so their buying power is excellent.</p>

<p>Assuming that Amazon has an approximately 3x overhead for all storage &#8211; including erasure coding and management administrivia &#8211; that&#8217;s upwards of 100,000 hard drives spinning in S3, but this is a very conservative estimate. Based on a $75 per-unit pricing, we&#8217;re talking about $0.025 per gigabyte, and with a tripling for overhead that&#8217;s $0.075 per gigabyte.</p>

<p>Now, those drives have to live somewhere. While you could just pile them in the corner, they&#8217;re harder to get data in and out of at that point. Instead, let&#8217;s presume that Amazon has a custom storage server. It wouldn&#8217;t be the <a href="http://blog.backblaze.com/2011/07/20/petabytes-on-a-budget-v2-0revealing-more-secrets/">first company</a> to do such a thing. Since you don&#8217;t need any redundant power supplies or anything in the system, let&#8217;s assume they&#8217;ve got a box that holds 45 drives and costs them about $1,500 delivered. That gives you an amortized cost of $33.33 per drive in capital expense.</p>

<p>Put all that data together and we have a capital cost of $4,875 for 135TB of raw capacity, or 45TB of &#8220;usable capacity&#8221;. That is $0.1083/gigabyte. Think about that. In less than a year of usage, AWS would recoup all the costs associated with the storage itself, and it&#8217;s easy to imagine a 7-10 year usable life for the hardware given some of the things that are discussed in the next section.</p>
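
<p>The arithmetic above can be checked with a short Python sketch. Every figure in it is one of the guesses from the text (the $75 drive, the 45-drive $1,500 chassis, the 3x overhead), not anything Amazon has published, and the constant and function names are purely illustrative.</p>

```python
# Back-of-the-envelope capital cost, using the guesses from the text above.
DRIVE_COST_USD = 75.0        # assumed bulk price per 3TB drive
DRIVE_CAPACITY_GB = 3000     # 3TB, in decimal gigabytes
DRIVES_PER_CHASSIS = 45      # assumed custom storage server
CHASSIS_COST_USD = 1500.0    # assumed delivered cost, without drives
OVERHEAD_FACTOR = 3          # raw bytes stored per usable byte
PRICE_PER_GB_MONTH = 0.01    # Glacier list price


def capital_cost_per_usable_gb():
    """Return (total capital cost, usable GB, cost per usable GB)."""
    total = DRIVES_PER_CHASSIS * DRIVE_COST_USD + CHASSIS_COST_USD
    raw_gb = DRIVES_PER_CHASSIS * DRIVE_CAPACITY_GB
    usable_gb = raw_gb / OVERHEAD_FACTOR
    return total, usable_gb, total / usable_gb


total, usable_gb, per_gb = capital_cost_per_usable_gb()
payback_months = per_gb / PRICE_PER_GB_MONTH

print(f"capital cost per chassis: ${total:,.0f}")
print(f"usable capacity: {usable_gb / 1000:.0f}TB")
print(f"cost per usable GB: ${per_gb:.4f}")
print(f"months of revenue to cover it: {payback_months:.1f}")
```

<p>The payback figure ignores power, space, bandwidth, and staff, and assumes every usable gigabyte is sold, so treat it as a lower bound rather than a forecast.</p>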

<h2>Power</h2>

<p>Forrester Research <a href="http://www.cio.com/article/627363/Forrester_3_More_Ways_to_Cut_Data_Center_Energy_Costs">estimates</a> that the cost of power will exceed the capital cost of servers over their useful life. Do you know how you fix that problem?</p>

<p><em>Turn them off.</em></p>

<p>The key is the 3-5 hour retrieval time that AWS quotes.  I&#8217;ll go into it more, but it&#8217;s key to being able to turn the servers off as often as possible. The overall goal is to turn a server on, fill up all its hard drives and then turn it back off for as long as possible.  In fact, you could spin hard drives up and down as needed, saving even more power.  You&#8217;re basically treating them as &#8220;tape&#8221; by filling them up sequentially (approximately).</p>

<p>A powered-off server uses almost no power. There&#8217;s a tiny bit for the baseboard management controller that you need to remotely turn it on, but that&#8217;s nothing compared to spinning up the hard drives.</p>

<p>One additional area AWS may be benefitting from is that as such a huge hard drive purchaser, they might be able to get custom motors or at least firmware that allow the hard drive to spin slower and use less power.  Because of the totally different performance characteristics, a hard drive running at 5,400 RPM, or even 3,600 RPM, is likely to be plenty fast enough for the application.</p>

<p>Note that this also likely has something to do with the odd <a href="http://aws.amazon.com/glacier/faqs/#How_am_I_charged_for_deleting_data_that_is_less_than_3_months_old">penalty for early deletion</a>.  Early deletion means they have to keep the server spun up to refill the space the deleted data occupied.</p>

<p>If you can turn the server off, that means it&#8217;s not generating heat. That means you can use less A/C, which is a huge capital and expense item for a data center. Finally, if you were to build dedicated parts of a data center to just hold Glacier components, then you can run the whole thing at a higher temperature because you only have a small number of servers running concurrently and the remaining air becomes a bit of a heat sink allowing it to absorb the BTUs.</p>

<p>Finally, hard drives have a certain number of hours they&#8217;re rated for, but as Google <a href="http://research.google.com/pubs/pub32774.html">discovered</a>, there are a lot of interesting failure issues involved.  Hard drives that aren&#8217;t spinning last longer, though of course there are limits even to this.  Eventually, they&#8217;ll fail, but the wear-and-tear is much lower.</p>

<h2>Data ingest and retrieval</h2>

<p>So how does data get into Glacier? Dollars to donuts says that you don&#8217;t talk directly to Glacier. I would imagine there&#8217;s a bit of hierarchical storage going on, with S3 being used as a staging repository. Put simply, data is uploaded by customers to Glacier through a front end that stores it in S3 until enough data has accumulated to justify spinning up a new set of servers and filling them with data. Once that&#8217;s done, the data can be aged out of the S3 &#8220;cache&#8221;.</p>

<p>When you ask for a retrieval, however, I suspect there&#8217;s a coalescing of requests to make sure that as many requests from the same set of machines as possible are satisfied at once. This, combined with the power savings discussed earlier, is why there&#8217;s a 3-5 hour lag, and maybe longer. It also is a psychological pressure not to treat Glacier as a cheap version of S3, but instead as a true &#8220;cold storage&#8221;.  Once retrieved, the data is staged into S3 and available for download.</p>
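
<p>To make the coalescing idea concrete, here is a minimal Python sketch. It is pure speculation in the same spirit as the rest of this post: the archive and server identifiers are hypothetical placeholders, and nothing here reflects how AWS actually schedules retrievals.</p>

```python
from collections import defaultdict


def coalesce(requests):
    """Group pending retrievals by the server that holds each archive.

    requests is an iterable of (archive_id, server_id) pairs. Both ids are
    hypothetical placeholders for whatever a real system would use.
    """
    batches = defaultdict(list)
    for archive_id, server_id in requests:
        batches[server_id].append(archive_id)
    return dict(batches)


pending = [("a", "s1"), ("b", "s2"), ("c", "s1")]
for server_id, archive_ids in coalesce(pending).items():
    # One power-on per server serves every archive queued against it.
    print(f"power on {server_id}, read {archive_ids}, power off")
```

<p>The longer requests are allowed to queue, the more archives each power-on can serve, which is one way to read the multi-hour retrieval window.</p>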

<p>My guess is that the future capability to do aging in and out of S3 is linked in this. Most likely, AWS is just working out the bugs in their own use before making it generally available.</p>

<h2>Bandwidth costs</h2>

<p>When you buy large amounts of bandwidth, you&#8217;re generally buying symmetrical bandwidth. Buy 100Gbps and you actually are talking 200Gbps, with 100 in each direction. I have no actual data for AWS, but in my experience, most hosting providers have more egress data than ingress. I&#8217;m sure there&#8217;s exceptions, but that&#8217;s generally the case. This is part of why AWS can offer ingress bandwidth for free. For them, it likely is something approaching free, because it&#8217;s already provisioned and sitting idle. Also, every gigabyte you upload costs you money to store.</p>

<h2>Conclusion</h2>

<p>So there you have it.  No miracles, just the intelligent application of a lot of holistic system tuning for a specific workload.  Now some have supposed that this is just a layer on top of S3, but I think that&#8217;s likely not true just because of the totally different workload impacts.  What I suspect is it&#8217;s derived from S3 code-wise, but doesn&#8217;t share the same infrastructure in any way.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Compliance versus velocity]]></title>
    <link href="http://blog.amber.org/blog/2012/08/03/compliance-versus-velocity/"/>
    <updated>2012-08-03T20:36:00-04:00</updated>
    <id>http://blog.amber.org/blog/2012/08/03/compliance-versus-velocity</id>
    <content type="html"><![CDATA[<p>In government computing circles this year, discussions of cloud and <a href="http://www.gsa.gov/portal/category/102371">FedRAMP</a> have been all the rage.
While I can&#8217;t get into a lot of customer-specific details, I have a good bit of experience with the predecessor to FedRAMP, namely the GSA IaaS blanket purchase agreement.
In fact, I&#8217;ve helped get several enormous cloud providers through the finish line, which consists of being not only technical translator, but therapist, consultant, language translator and general divinator of government intentions.
It&#8217;s challenging.</p>

<p>Here&#8217;s the big problem I see: even at its most optimal, FedRAMP is too slow.
It&#8217;s not specific to FedRAMP, as those involved have thought through the issues and done a great job with what they have to work with.
The problem is that &#8220;what they have to work with&#8221; is the <a href="http://www.gsa.gov/portal/content/190333">NIST SP800-53</a> framework, and it&#8217;s just not something that fits well with the cloud world.</p>

<p>I take that back, it&#8217;s not cloud that&#8217;s the problem, it&#8217;s modern architectures.
I faced similar challenges when dealing with a huge SOA-based system developed specifically for a government agency.
What is a system when you have hundreds (or thousands) of services that are interdependent in intricate ways?
How do you even begin to think about that?
Clouds add another layer of complexity and vagueness to the whole recipe.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[A Voice in the Darkness]]></title>
    <link href="http://blog.amber.org/blog/2012/04/28/a-voice-in-the-darkness/"/>
    <updated>2012-04-28T20:29:00-04:00</updated>
    <id>http://blog.amber.org/blog/2012/04/28/a-voice-in-the-darkness</id>
    <content type="html"><![CDATA[<blockquote><p>&#8220;The cosmos is all that is, or ever was, or ever will be.
Our contemplations of the cosmos stir us. There is a tingling in the spine;
a catch in the voice; a faint sensation, as if a distant memory, of falling
from a great height. We know we are approaching the grandest of mysteries.&#8221;
&#8211; Carl Sagan</p></blockquote>

<p>With those eloquent words, Carl Sagan launched his epic series <a href="http://en.wikipedia.org/wiki/Cosmos:_A_Personal_Voyage">Cosmos</a>. I was 8 years old when this carefully spoken man, with a measured cadence, launched my mind on its own voyage into the curiosity and questioning that underpins science. Recently, I re-watched the <a href="http://www.youtube.com/watch?v=R7n71pm0K04">original series</a>, and while the special effects are dated, the message, and the man, are timeless.</p>

<p>For three months, Carl Sagan took the entire country on a journey through the universe, as both a place and more critically as an idea.  I would sit rapt, unable to do anything else, and the words inspired me. I wonder where today&#8217;s youth will find their inspiration? It is difficult to contemplate a world without <a href="http://www.nytimes.com/1998/12/01/science/even-in-death-carl-sagan-s-influence-is-still-cosmic.html">Carl Sagan</a>. Science and technology have advanced so far in the last 20 years, and today we have so few people who can clearly articulate the science while not forgetting the awe that the average person experiences when contemplating their place in the universe.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[The Value of eBooks]]></title>
    <link href="http://blog.amber.org/blog/2012/04/22/the-value-of-ebooks/"/>
    <updated>2012-04-22T17:06:00-04:00</updated>
    <id>http://blog.amber.org/blog/2012/04/22/the-value-of-ebooks</id>
    <content type="html"><![CDATA[<p>At lunch today, I was discussing with a friend the current pricing model behind eBooks. It doesn&#8217;t make sense when you look at it on the surface, and it makes even less sense when you dig deeper &#8211; at least if you apply any sense of rationality. It&#8217;s important, however, to understand that rationality does not underlie the model.  Note that I&#8217;m not going to deal with the ongoing lawsuits over the <a href="http://www.mercurynews.com/business/ci_20371139/doj-lawsuit-against-apple-over-e-books-said?source=rss">agency model</a>, which actually has little to do with &#8220;fairness&#8221; in any real sense, and would not, in my opinion, result in a long-term reduction in eBook prices.</p>

<p>First, let&#8217;s take a look at some of the top books right now on Amazon that are available both in paper and eBook format:</p>

<table class="table">
    <thead>
        <tr>
            <th>Book</th>
            <th>Paper</th>
            <th>eBook</th>
            <th>Savings</th>
            <th>Discount</th>
            <th>Pages</th>
            <th>Paper Price per Page</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td><i>The President&#8217;s Club</i></td>
            <td>$19.36</td>
            <td>$16.99</td>
            <td>$2.37</td>
            <td>12.24%</td>
            <td>656</td>
            <td>$0.0295</td>
        </tr>
        <tr>
            <td><i>The Land of Decoration</i></td>
            <td>$14.85</td>
            <td>$11.99</td>
            <td>$2.86</td>
            <td>19.26%</td>
            <td>320</td>
            <td>$0.0464</td>
        </tr>
        <tr>
            <td><i>The Coldest Night</i></td>
            <td>$14.37</td>
            <td>$11.99</td>
            <td>$2.38</td>
            <td>16.56%</td>
            <td>304</td>
            <td>$0.047</td>
        </tr>
        <tr>
            <td><i>Afterwards: A Novel</i></td>
            <td>$15.00</td>
            <td>$12.99</td>
            <td>$2.01</td>
            <td>13.4%</td>
            <td>400</td>
            <td>$0.0375</td>
        </tr>
        <tr>
            <td><i>Truth Like Sun</i></td>
            <td>$15.41</td>
            <td>$12.99</td>
            <td>$2.42</td>
            <td>15.7%</td>
            <td>272</td>
            <td>$0.0566</td>
        </tr>
        <tr>
            <td><i>Bird Sense</i></td>
            <td>$15.00</td>
            <td>$13.75</td>
            <td>$1.25</td>
            <td>8.33%</td>
            <td>288</td>
            <td>$0.052</td>
        </tr>
        <tr>
            <td><i>Magic Hours</i></td>
            <td>$8.40</td>
            <td>$7.69</td>
            <td>$0.71</td>
            <td>8.45%</td>
            <td>256</td>
            <td>$0.033</td>
        </tr>
        <tr>
            <td><i>Fifty Shades of Grey</i></td>
            <td>$9.57</td>
            <td>$9.99</td>
            <td>-$0.42</td>
            <td>-4.4%</td>
            <td>528</td>
            <td>$0.018</td>
        </tr>
    </tbody>
</table>
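
<p>For anyone who wants to check the derived columns, here is a small Python sketch. It assumes the per-page figure is the paper price divided by the page count, which is what the numbers in the table appear to follow; the function name is illustrative only.</p>

```python
# (title, paper price, eBook price, page count), copied from the table above.
BOOKS = [
    ("The President's Club", 19.36, 16.99, 656),
    ("The Land of Decoration", 14.85, 11.99, 320),
    ("The Coldest Night", 14.37, 11.99, 304),
    ("Afterwards: A Novel", 15.00, 12.99, 400),
    ("Truth Like Sun", 15.41, 12.99, 272),
    ("Bird Sense", 15.00, 13.75, 288),
    ("Magic Hours", 8.40, 7.69, 256),
    ("Fifty Shades of Grey", 9.57, 9.99, 528),
]


def row_stats(paper, ebook, pages):
    """Return (savings, discount in percent, paper price per page)."""
    savings = paper - ebook
    discount_pct = 100 * savings / paper
    per_page = paper / pages  # assumption: per-page cost uses the paper price
    return round(savings, 2), round(discount_pct, 2), round(per_page, 4)


for title, paper, ebook, pages in BOOKS:
    savings, discount, per_page = row_stats(paper, ebook, pages)
    print(f"{title}: saves ${savings:.2f} ({discount:.2f}%), ${per_page:.4f} per page")
```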


<p>Lots of people will look at the cost and think &#8220;wow, you don&#8217;t save any money&#8221; buying an eBook, but they misunderstand the economics. The cost of producing a paperback book is trivial relative to the overall cost. If you&#8217;re producing 50,000 copies, you&#8217;re talking $2-3 per copy in total cost to print and distribute. That&#8217;s not the problem. The idiosyncrasy of <em>Fifty Shades of Grey</em> is an artifact of the agency model.</p>

<p>The problem is that the <em>value</em> of an eBook is lower. Not only does the publisher not have to produce the physical book, but they also take from you all your rights by implementing <abbr title="Digital Rights Management" >DRM</abbr>. With a physical book I can loan it without restraint to anyone I wish. I can give it to anyone. I can do anything I want. With an eBook, though, I&#8217;m limited to what the publisher wants to allow, and many publishers do not allow anything at all.</p>

<p>If, instead, you offered me an eBook for the current 8-15% discount, but gave it to me in an unrestricted ePub format, then I&#8217;d say you have a good value. Until then, I&#8217;ll find it difficult to stomach the pricing model for DRM-protected eBooks.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Scale in Security Events]]></title>
    <link href="http://blog.amber.org/blog/2012/04/16/scale-in-security-events/"/>
    <updated>2012-04-16T20:41:00-04:00</updated>
    <id>http://blog.amber.org/blog/2012/04/16/scale-in-security-events</id>
    <content type="html"><![CDATA[<p>The other day, I was having a discussion with a developer about scaling systems to process security events. Now, let me preface this by saying that I used to work for one of the pioneering companies and have a pretty good understanding of what it takes to scale a <abbr title="Security Information and Event Management" >SIEM</abbr> solution, although back then, we just called them <abbr title="Security Information Management" >SIM</abbr>. This engineer was talking about how they were handling &#8220;millions of events per day&#8221;. How many millions, I asked? &#8220;Well, we handle over four million for our internal systems.&#8221;</p>

<p>&#8220;Four million&#8221; sounds really impressive, doesn&#8217;t it? That&#8217;s a lot of data, and I won&#8217;t deny that, but it&#8217;s not <em>that much</em> data when it comes down to it. For capacity planning, if all I had was a daily rate, I took that rate over 12 hours to compensate a bit for what is termed the <a href="http://en.wikipedia.org/wiki/Busy_hour">peak busy hour</a>. It&#8217;s not an elegant solution, but it solves the back-of-the-napkin estimate world just fine. So, 4M/day is about 93 events per second.</p>
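
<p>The napkin math is simple enough to write down. This is a minimal Python sketch of the 12-hour rule of thumb described above; the function name and the <code>busy_hours</code> parameter are my own labels, not anything standard.</p>

```python
SECONDS_PER_HOUR = 3600


def peak_eps(events_per_day, busy_hours=12):
    """Squeeze a daily total into a shortened day to approximate peak load."""
    return events_per_day / (busy_hours * SECONDS_PER_HOUR)


print(round(peak_eps(4_000_000)))  # the four-million-a-day system
```

<p>Shrinking <code>busy_hours</code> makes the estimate more pessimistic, which is usually the safer direction for capacity planning.</p>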

<p>On the surface, that seems like a lot, but it&#8217;s not. Even when I was immersed in the SIEM world on a day-to-day basis, we were dealing with many customers who were attempting to digest 10,000 <abbr title="Events per Second" >EPS</abbr>. By the same 12-hour yardstick, that translates to approximately 432,000,000 events per day, or roughly two orders of magnitude greater. Mind you, that was in the early 2000s, and the world has gotten a lot more intense than it was back then. I know of organizations dealing with 100,000 EPS, or more, today.</p>

<p>Now, if we assume the average security event is (uncompressed) approximately 64 bytes, once some normalization happens, then at 100,000 EPS you&#8217;re talking about 6.4 megabytes per second of data to deal with. It&#8217;s a lot, and 99.999% of it is useless when it arrives. It only becomes interesting later, but you won&#8217;t know how far in the future that will be.</p>

<p>To put it all in perspective, if you could print a single event on a single line on a sheet of paper, and assuming you can print 66 lines on a page (something we used to do in the days of line printers), then that&#8217;s 151 pages <em>per second</em> in my old experiences, and over 1,515 pages today. Every second. Stick that in your shredder and think about it.</p>
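
<p>Here is the same comparison as a short Python sketch. The 64-byte event size and the 66-line page are the assumptions from the text, megabytes are decimal, and the names are illustrative.</p>

```python
EVENT_BYTES = 64       # assumed normalized event size, from the text
LINES_PER_PAGE = 66    # one event per line on line-printer paper


def load(eps):
    """Return (events per 12-hour day, megabytes per second, pages per second)."""
    per_day = eps * 12 * 3600
    mb_per_s = eps * EVENT_BYTES / 1_000_000
    pages_per_s = eps / LINES_PER_PAGE
    return per_day, mb_per_s, pages_per_s


for eps in (10_000, 100_000):
    per_day, mb, pages = load(eps)
    print(f"{eps:,} EPS: {per_day:,} events/day, {mb:.2f} MB/s, {pages:,.1f} pages/s")
```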

<p>So, that brings me to my point: before you go around bragging about performance, it&#8217;s best to understand what state-of-the-art really is. Big data arrived a very long time ago in the security world, but the tools and technologies still haven&#8217;t caught up.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Secure Linux Instances in the Cloud]]></title>
    <link href="http://blog.amber.org/blog/2012/04/16/secure-linux-instances-in-the-cloud/"/>
    <updated>2012-04-16T14:56:00-04:00</updated>
    <id>http://blog.amber.org/blog/2012/04/16/secure-linux-instances-in-the-cloud</id>
    <content type="html"><![CDATA[<p>In the security world, we talk about <a href="http://www.nsa.gov/ia/_files/support/defenseindepth.pdf">defense in depth</a> (PDF), which basically means that your castle should have a moat, a drawbridge, a lock, and a lot of archers on the ramparts. Historically, in the computer security world this meant that you would have firewalls, <abbr title="Intrusion Detection System" >IDS</abbr>, and a multitude of different layers of security. Unfortunately, a lot of that is no longer applicable when you deploy applications into &#8220;the cloud&#8221;. Instead, you have to rethink what those defenses are and how they reinforce and support one another.</p>

<p>The first layer of defense you have control over is what packets end up at your systems, and what you do with them. For a Linux machine, this is controlled by the <a href="http://www.netfilter.org/projects/iptables/index.html">iptables</a> component of the operating system. The goal of this moat around your system is to try and keep a vast majority of the stupid at bay. There are lots of things that you should never, ever, see, and that there&#8217;s simply no reason to even bother with.</p>

<p>What I&#8217;m going to do is walk you through the foundation rule set (my starter moat) that I base everything else on, which you can find as a <a href="https://gist.github.com/1959001">gist on GitHub</a>. Please feel free to use however you wish, though if you find a mistake I would ask you just let me know by putting a comment on the gist itself. Feel free to add your own alligators and flaming spikes.</p>

<h2>Categorizing Flows</h2>

<p>The first thing we need to do is group our traffic into different chains of rules that will be applied.  This makes it a bit easier to deal with. Note that the <a href="https://gist.github.com/1959001">gist</a> has a lot of this as comments.</p>

<pre><code>-N ICMP_IN
-N ICMP_OUT
-N SPOOF_LOG_DROP
-N SPOOF_IN
-N SPOOF_OUT
-N BAD_TCP_FLAGS
</code></pre>

<p>The first two, <code>ICMP_IN</code> and <code>ICMP_OUT</code>, are somewhat self-explanatory.  We want to treat all <abbr title="Internet Control Message Protocol" >ICMP</abbr> carefully. The next three, <code>SPOOF_LOG_DROP</code>, <code>SPOOF_IN</code> and <code>SPOOF_OUT</code>, are all about address spoofing protection, something everyone <em>should</em> be doing, but usually isn&#8217;t. The last one, <code>BAD_TCP_FLAGS</code>, is looking for all sorts of nasty behavior that people use for either OS detection, or often to try and find exploits in a system.</p>

<p>We&#8217;ll be going through them in roughly that order.</p>

<h2>ICMP Management</h2>

<pre><code>-A ICMP_IN -p icmp --icmp-type 8 -j DROP
-A ICMP_IN -p icmp -i eth0 --icmp-type 0 -m state --state ESTABLISHED,RELATED -j ACCEPT
-A ICMP_IN -p icmp -i eth0 --icmp-type 3 -m state --state ESTABLISHED,RELATED -j ACCEPT
-A ICMP_IN -p icmp -i eth0 --icmp-type 11 -m state --state ESTABLISHED,RELATED -j ACCEPT
-A ICMP_IN -p icmp -i eth0 -j DROP
-A ICMP_OUT -p icmp -o eth0 --icmp-type 8 -m state --state NEW -j ACCEPT
-A ICMP_OUT -p icmp -o eth0 -j DROP
-A INPUT -p icmp -j ICMP_IN
-A OUTPUT -p icmp -j ICMP_OUT
</code></pre>

<p>In line 1, we just ignore all the ICMP echo requests (type 8). Ping is a good example of a use of an echo request. There&#8217;s just no reason to respond to them normally. If, however, you have a tool that needs to ping your system to get a response, then you&#8217;ll need to modify the filter slightly to be address-specific.  Line 2 accepts an echo response (type 0) only if it answers a request we initiated. Likewise, lines 3 and 4 accept the destination unreachable (type 3) and <abbr title="Time to Live" >TTL</abbr> exceeded (type 11) responses only if they&#8217;re related to something we sent. Unsolicited versions of these are another sneaky way to peek into a system, and they fall through to the drop rule at the end of the chain.</p>

<p>Then, we drop everything else, because they fail the sanity check. Generally, a system would only send an echo request in response to a ping command, and there&#8217;s only three responses that make any sense to those in the modern world: response, TTL-exceeded and destination unreachable.</p>
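
<p>If the first-match behavior of the <code>ICMP_IN</code> chain is hard to picture, this Python sketch models it. It is only a toy: the <code>tracked</code> argument stands in for the <code>ESTABLISHED,RELATED</code> state match, the interface match is ignored, and the function name is made up for illustration.</p>

```python
# ICMP types named in the rules above.
ECHO_REPLY, DEST_UNREACHABLE, ECHO_REQUEST, TIME_EXCEEDED = 0, 3, 8, 11
SOLICITED_ONLY = {ECHO_REPLY, DEST_UNREACHABLE, TIME_EXCEEDED}


def icmp_in_verdict(icmp_type, tracked):
    """Walk the ICMP_IN rules in order; the first matching rule wins.

    tracked is True when connection tracking ties the packet to traffic
    this host started (the ESTABLISHED,RELATED match in the real rules).
    """
    if icmp_type == ECHO_REQUEST:
        return "DROP"      # rule 1: never answer pings
    if icmp_type in SOLICITED_ONLY and tracked:
        return "ACCEPT"    # rules 2-4: replies to our own traffic
    return "DROP"          # rule 5: everything else


print(icmp_in_verdict(ECHO_REPLY, tracked=True))
print(icmp_in_verdict(ECHO_REPLY, tracked=False))
```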

<p>Now that we&#8217;ve dealt with incoming packets, we&#8217;re going to allow echo requests (ping) to leave the system. Everything else ICMP-related, such as redirects, timestamp requests, etc., shouldn&#8217;t be coming or going, and so we drop them without note. Finally, we attach our rule chains to the core rule chains, <code>INPUT</code> and <code>OUTPUT</code>.</p>

<h2>Spoofed Packets</h2>

<pre><code>-A SPOOF_LOG_DROP -j LOG --log-prefix "IPT: spoofed "
-A SPOOF_LOG_DROP -j DROP
-A SPOOF_IN -i eth0 -s &lt;MYIP&gt; -j SPOOF_LOG_DROP
</code></pre>

<p>With ICMP traffic out of the way, we need to deal with traffic coming from and going to addresses that don&#8217;t pass the sanity check. You can find a lot of these addresses in <a href="http://tools.ietf.org/html/rfc3330">RFC3330</a>, &#8220;Special-Use IPv4 Addresses&#8221;. People often forget there&#8217;s more out there than just the addresses in <a href="http://tools.ietf.org/html/rfc1918">RFC1918</a>. So, since we want to keep an eye on this, the first thing we do (lines 1-2) is set up some log configuration. Log messages matching this rule chain will be prefixed with &#8220;IPT: spoofed&#8221;. IPT is just short for iptables.</p>

<p>So, before we go any further, we need to make sure nobody is spoofing our own addresses. In line 3, you&#8217;ll see the placeholder <code>&lt;MYIP&gt;</code>, which needs to be replaced by whatever IP address is used by the host. The rule says &#8220;if I see something with a source address that is mine on my ethernet connection, drop it&#8221;. You shouldn&#8217;t see it showing up there. Ever. If you do, you likely either have a serious security problem, or need to talk to someone about how networking is set up in detail.</p>

<pre><code>-A SPOOF_IN -i eth0 -s 10.0.0.0/8 -j SPOOF_LOG_DROP
-A SPOOF_IN -i eth0 -s 172.16.0.0/12 -j SPOOF_LOG_DROP
-A SPOOF_IN -i eth0 -s 192.168.0.0/16 -j SPOOF_LOG_DROP
</code></pre>

<p>Next, we block all the standard RFC1918 addresses. Now, if you&#8217;re actually using them internally, you can&#8217;t do this, but this rule set comes from situations where my server only has a publicly routable address.</p>

<pre><code>-A SPOOF_IN -i eth0 -s 198.18.0.0/15 -j SPOOF_LOG_DROP
-A SPOOF_IN -i eth0 -s 169.254.0.0/16 -j SPOOF_LOG_DROP
-A SPOOF_IN -i eth0 -s 192.0.2.0/24 -j SPOOF_LOG_DROP
</code></pre>

<p>Here, we block (line 1) the official &#8220;benchmarking&#8221; networks, defined in <a href="http://tools.ietf.org/html/rfc2544">RFC2544</a>. While I&#8217;ve yet to see them in the wild, they shouldn&#8217;t show up, and part of the goal of this rule set is to make sure we set a sanity benchmark.
Next, line 2 drops link local traffic (<a href="http://tools.ietf.org/html/rfc3927">RFC3927</a>).  Link local addresses are those that are &#8220;randomly&#8221; assigned when an interface doesn&#8217;t have a static address, and can&#8217;t use something like <abbr title="Dynamic Host Configuration Protocol" >DHCP</abbr> to dynamically assign one. Again, it should never show up in a &#8220;normal&#8221; situation.  Line 3 drops TEST-NET traffic. TEST-NET, as defined in <a href="http://tools.ietf.org/html/rfc5737">RFC5737</a> is intended only for use in documentation. Once again, it should <em>never</em> show up in production use.</p>

<pre><code>-A SPOOF_IN -i eth0 -s 224.0.0.0/4 -j SPOOF_LOG_DROP
-A SPOOF_IN -i eth0 -s 240.0.0.0/4 -j SPOOF_LOG_DROP
</code></pre>

<p>Since I almost never have any use for <a href="http://en.wikipedia.org/wiki/Multicast">multicast</a>, I drop everything associated with the standard multicast block (224.0.0.0/4), defined in <a href="http://tools.ietf.org/html/rfc5771">RFC5771</a>. The second line drops 240.0.0.0/4, which is reserved for future use and shouldn&#8217;t show up either.</p>

<pre><code>-A SPOOF_IN -i eth0 -s 127.0.0.0/8 -j SPOOF_LOG_DROP
-A SPOOF_IN -i eth0 -s 0.0.0.0/8 -j SPOOF_LOG_DROP
-A SPOOF_IN -i eth0 -s 255.255.255.255/32 -j SPOOF_LOG_DROP
</code></pre>

<p>Now, we also shouldn&#8217;t see loopback addresses, or other bonkers addresses showing up on our Ethernet interface. See below for information on the loopback protections.</p>
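
<p>To sanity check an address against these lists without loading the rules, something like the following Python sketch works. It uses the standard library <code>ipaddress</code> module; the list simply mirrors the <code>SPOOF_IN</code> rules above, and the function name and the example host address are made up for illustration.</p>

```python
import ipaddress

# Source ranges the SPOOF_IN rules above drop on eth0.
BOGON_SOURCES = [ipaddress.ip_network(cidr) for cidr in (
    "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",   # RFC1918 private
    "198.18.0.0/15",                                    # benchmarking
    "169.254.0.0/16",                                   # link local
    "192.0.2.0/24",                                     # TEST-NET
    "224.0.0.0/4", "240.0.0.0/4",                       # multicast, reserved
    "127.0.0.0/8", "0.0.0.0/8", "255.255.255.255/32",   # loopback and friends
)]


def is_spoofed_source(address, my_ip):
    """True if SPOOF_IN would hand a packet from this source to SPOOF_LOG_DROP."""
    source = ipaddress.ip_address(address)
    if source == ipaddress.ip_address(my_ip):
        return True  # our own address arriving from the outside
    return any(source in network for network in BOGON_SOURCES)


MY_IP = "203.0.113.10"  # hypothetical stand-in for the MYIP placeholder
for candidate in ("10.1.2.3", "8.8.8.8", MY_IP):
    print(candidate, is_spoofed_source(candidate, MY_IP))
```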

<pre><code>-A SPOOF_OUT -o eth0 ! -s &lt;MYIP&gt; -j SPOOF_LOG_DROP
</code></pre>

<p>One thing many people forget to do is block their systems from becoming a source of problems. So, we block any outgoing traffic on our Ethernet interface that isn&#8217;t originating from my IP address.</p>

<pre><code>-A INPUT -j SPOOF_IN
-A OUTPUT -j SPOOF_OUT
</code></pre>

<p>And now, finally, we attach these new rule chains to the primary ones, just as we did before.</p>

<h2>TCP Flags</h2>

<p>That brings us to the last &#8220;protection&#8221; set of rules: those associated with all sorts of crazy flags in the TCP packet. If you&#8217;ve forgotten, the TCP header has 9 potential flags. Read <abbr title="Most Significant Bit" >MSB</abbr> to <abbr title="Least Significant Bit" >LSB</abbr>:</p>

<ul>
<li>NS: <abbr title="Explicit Congestion Notification" >ECN</abbr>-nonce concealment protection (<a href="http://tools.ietf.org/html/rfc3540">RFC3540</a>)</li>
<li>CWR: Congestion Window Reduced flag is set by the sender to indicate that it received a TCP segment with the ECE flag set and has responded with its congestion control mechanism (<a href="http://tools.ietf.org/html/rfc3168">RFC3168</a>)</li>
<li>ECE: ECN-Echo indicates:

<ul>
<li>If SYN flag is set, that the TCP peer is ECN capable.</li>
<li>If SYN flag is clear, that a packet with Congestion Experienced flag in IP header set is received during normal transmission (RFC3168).</li>
</ul>
</li>
<li>URG: the Urgent pointer field is significant</li>
<li>ACK: the Acknowledgment field is significant. All packets after the initial SYN packet sent by the client should have this flag set.</li>
<li>PSH: Push function. Asks to push the buffered data to the receiving application.</li>
<li>RST: Reset the connection</li>
<li>SYN: Synchronize sequence numbers. Only the first packet sent from each end should have this flag set.</li>
<li>FIN: Finished. No more data from sender</li>
</ul>
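
<p>As a quick illustration of how those nine bits sit in the header, here is a small Python sketch. FIN occupies the lowest bit (0x01) and NS the highest (0x100); the helper names are mine, not part of any library.</p>

```python
# Flag names from the most significant bit down to the least significant.
FLAG_NAMES = ["NS", "CWR", "ECE", "URG", "ACK", "PSH", "RST", "SYN", "FIN"]

# FIN is bit 0 (value 1), SYN is bit 1 (value 2), and so on up to NS (256).
FLAG_BITS = {name: 2 ** shift for shift, name in enumerate(reversed(FLAG_NAMES))}


def encode_flags(*names):
    """Combine flag names into the 9-bit value carried in the TCP header."""
    return sum(FLAG_BITS[name] for name in set(names))


def decode_flags(value):
    """Return the names of the flags set in a 9-bit value, highest bit first."""
    return [name for name in FLAG_NAMES if value & FLAG_BITS[name]]


print(hex(encode_flags("SYN", "ACK")))
print(decode_flags(0x12))
```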


<p>Only some of these can be set &#8220;together&#8221;, and often you find people probing systems to see how they respond to various flag combinations. For example, one of the techniques for OS detection used by <a href="http://nmap.org">nmap</a> is to play with the various flags to see how a host responds.  We don&#8217;t use the SPOOF_LOG_DROP-style reaction because we want to change the log message so we know what&#8217;s going on.</p>

<pre><code>-A BAD_TCP_FLAGS -p tcp --tcp-flags SYN,FIN SYN,FIN -j LOG --log-prefix "IPT: Bad SF flag "
-A BAD_TCP_FLAGS -p tcp --tcp-flags SYN,FIN SYN,FIN -j DROP
-A BAD_TCP_FLAGS -p tcp --tcp-flags SYN,RST SYN,RST -j LOG --log-prefix "IPT: Bad SR flag "
-A BAD_TCP_FLAGS -p tcp --tcp-flags SYN,RST SYN,RST -j DROP
-A BAD_TCP_FLAGS -p tcp --tcp-flags SYN,FIN,PSH SYN,FIN,PSH -j LOG --log-prefix "IPT: Bad SFP flag "
-A BAD_TCP_FLAGS -p tcp --tcp-flags SYN,FIN,PSH SYN,FIN,PSH -j DROP
-A BAD_TCP_FLAGS -p tcp --tcp-flags SYN,FIN,RST SYN,FIN,RST -j LOG --log-prefix "IPT: Bad SFR flag "
-A BAD_TCP_FLAGS -p tcp --tcp-flags SYN,FIN,RST SYN,FIN,RST -j DROP
-A BAD_TCP_FLAGS -p tcp --tcp-flags SYN,FIN,RST,PSH SYN,FIN,RST,PSH -j LOG --log-prefix "IPT: Bad SFRP flag "
-A BAD_TCP_FLAGS -p tcp --tcp-flags SYN,FIN,RST,PSH SYN,FIN,RST,PSH -j DROP
</code></pre>

<p>But sometimes, we need things set together, and if they aren&#8217;t, then it doesn&#8217;t make sense from a network stack perspective. Some flags cannot appear in the first SYN packet, so they must be accompanied by the ACK flag. If they&#8217;re not, we don&#8217;t want them.</p>

<pre><code>-A BAD_TCP_FLAGS -p tcp --tcp-flags ACK,FIN FIN -j LOG --log-prefix "IPT: Bad F-A flag "
-A BAD_TCP_FLAGS -p tcp --tcp-flags ACK,FIN FIN -j DROP
-A BAD_TCP_FLAGS -p tcp --tcp-flags ACK,PSH PSH -j LOG --log-prefix "IPT: Bad P-A flag "
-A BAD_TCP_FLAGS -p tcp --tcp-flags ACK,PSH PSH -j DROP
-A BAD_TCP_FLAGS -p tcp --tcp-flags ACK,URG URG -j LOG --log-prefix "IPT: Bad U-A flag "
-A BAD_TCP_FLAGS -p tcp --tcp-flags ACK,URG URG -j DROP
</code></pre>
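
<p>As a reading aid, the <code>--tcp-flags</code> match takes two lists: the first is the mask of flags to examine, and the second names the flags within that mask that must be set. Flags in the mask but not in the second list must be clear, and flags outside the mask are ignored. The sketch below annotates two cases. Neither rule is part of the rule set above; the chain name <code>EXAMPLE</code> and the log prefix are hypothetical and exist only for illustration:</p>

<pre><code># A custom chain has to be declared first, e.g. ":EXAMPLE - [0:0]" near the top of the table.
# Mask and required set are identical: matches any packet with both FIN and
# RST set, no matter what the other flags are.
-A EXAMPLE -p tcp --tcp-flags FIN,RST FIN,RST -j LOG --log-prefix "IPT: example FR "
# Mask is wider than the required set: matches only when SYN is set and
# RST, ACK, and FIN are all clear. This is what the --syn shorthand expands to.
-A EXAMPLE -p tcp --tcp-flags SYN,RST,ACK,FIN SYN -j ACCEPT
</code></pre>

<p>One consequence of this: rules are evaluated in order, so a narrow mask such as <code>SYN,FIN SYN,FIN</code> followed by a <code>DROP</code> already catches packets that also carry PSH or RST, and wider variants listed after it should never see a match.</p>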

<p>Then, we have people who think it&#8217;s OK to have no flags or all the flags set:</p>

<pre><code>-A BAD_TCP_FLAGS -p tcp --tcp-flags ALL NONE -j LOG --log-prefix "IPT: Null flag "
-A BAD_TCP_FLAGS -p tcp --tcp-flags ALL NONE -j DROP
-A BAD_TCP_FLAGS -p tcp --tcp-flags ALL ALL -j LOG --log-prefix "IPT: All flags "
-A BAD_TCP_FLAGS -p tcp --tcp-flags ALL ALL -j DROP
</code></pre>

<p>Oh, and <a href="http://en.wikipedia.org/wiki/Christmas_tree_packet">merry Christmas</a>. Normally, I&#8217;m all for Christmas, but these are just insane:</p>

<pre><code>-A BAD_TCP_FLAGS -p tcp --tcp-flags ALL FIN,URG,PSH -j LOG --log-prefix "IPT: Xmas flags "
-A BAD_TCP_FLAGS -p tcp --tcp-flags ALL FIN,URG,PSH -j DROP
-A BAD_TCP_FLAGS -p tcp --tcp-flags ALL SYN,RST,ACK,FIN,URG -j LOG --log-prefix "IPT: Merry Xmas flags "
-A BAD_TCP_FLAGS -p tcp --tcp-flags ALL SYN,RST,ACK,FIN,URG -j DROP
</code></pre>

<p>And then, just attach it to the main <code>INPUT</code> rule chain.</p>

<pre><code>-A INPUT -p tcp -j BAD_TCP_FLAGS
</code></pre>

<h2>Normal Traffic Controls</h2>

<p>Now we get into more &#8220;normal&#8221; traffic controls.  First, we want to allow everything on the loopback (lo) interface. This is used for both local servers (such as databases, proxies, etc.) and for SSH tunneling:</p>

<pre><code>-A INPUT -i lo -j ACCEPT
</code></pre>

<p>And reject it if it is addressed to the loopback network, but not coming through that interface:</p>

<pre><code>-A INPUT ! -i lo -d 127.0.0.0/8 -j REJECT
</code></pre>

<p>We also want to allow all traffic associated with previously permitted connections. These are generally called &#8220;established&#8221; connections:</p>

<pre><code>-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
</code></pre>

<p>Now, it might be helpful if we allow traffic to originate from the system to other places.  On some systems, I also tighten this down to be only a very small subset of traffic, perhaps only HTTP, but that&#8217;s the next step, and this is the 80% rule.</p>

<pre><code>-A OUTPUT -j ACCEPT
</code></pre>
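
<p>As a rough sketch of what that tightening might look like, the rules below would replace the blanket <code>ACCEPT</code> above. The choice of DNS, HTTP, and HTTPS is an assumption about what a typical host needs, not a recommendation for every system; NTP, mail, or package mirrors on other ports would need their own lines:</p>

<pre><code># Local services talking to each other over loopback
-A OUTPUT -o lo -j ACCEPT
# Replies belonging to connections that were already permitted
-A OUTPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
# DNS lookups (assumed to be needed for almost anything else to work)
-A OUTPUT -p udp --dport 53 -j ACCEPT
-A OUTPUT -p tcp --dport 53 -j ACCEPT
# New outbound web connections only
-A OUTPUT -p tcp -m state --state NEW --dport 80 -j ACCEPT
-A OUTPUT -p tcp -m state --state NEW --dport 443 -j ACCEPT
# Everything else originating from this host is refused
-A OUTPUT -j REJECT
</code></pre>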

<p>And that brings us to inbound application traffic. This is traffic we expect to be coming in, such as web browser traffic to a web server, or SSH:</p>

<pre><code>-A INPUT -p tcp --dport 80 -j ACCEPT
-A INPUT -p tcp --dport 443 -j ACCEPT
-A INPUT -p tcp -m state --state NEW --dport 22 -j ACCEPT
</code></pre>
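
<p>If SSH only ever needs to be reachable from a known network, the port 22 rule can be narrowed with a source match. This is a sketch rather than part of the rule set above; it would take the place of the unrestricted SSH rule, and <code>192.0.2.0/24</code> is a documentation-only placeholder range standing in for a real management network:</p>

<pre><code># Accept new SSH connections only from the (placeholder) management network
-A INPUT -p tcp -s 192.0.2.0/24 -m state --state NEW --dport 22 -j ACCEPT
</code></pre>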

<p>Now we need to tweak some of the logging information. We don&#8217;t want to get overwhelmed with logs and have that turn into a denial-of-service attack itself.  So, to prevent that, we allow an initial burst of up to 100 log entries and then limit logging to an average of 60 per minute:</p>

<pre><code>-A INPUT -m limit --limit-burst 100 --limit 60/min -j LOG --log-prefix "IPT: denied " --log-level 7
</code></pre>

<p>Repeat after me: that which is not explicitly permitted is denied:</p>

<pre><code>-A INPUT -j REJECT
</code></pre>

<p>Also, forwarding is evil. Do not forward on this host. Ever.</p>

<pre><code>-A FORWARD -j REJECT
</code></pre>

<p>And that&#8217;s the basic set of rules.  You can customize these to your heart&#8217;s content, but this is a start. Sadly, it won&#8217;t be all the security you need, but it&#8217;s better than what many people have sitting out there.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Once More, With Feeling]]></title>
    <link href="http://blog.amber.org/blog/2012/04/16/once-more/"/>
    <updated>2012-04-16T10:45:00-04:00</updated>
    <id>http://blog.amber.org/blog/2012/04/16/once-more</id>
    <content type="html"><![CDATA[<p>Seriously, I had intended to start writing again. No, really. Stop laughing. The thing is, life &#8211; work mostly &#8211; intervened in my spare time, and it just didn&#8217;t happen.  Now that it has let up, if only <em>just</em>, it&#8217;s time to retry.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[A New Hope]]></title>
    <link href="http://blog.amber.org/blog/2012/02/15/a-new-hope/"/>
    <updated>2012-02-15T16:48:00-05:00</updated>
    <id>http://blog.amber.org/blog/2012/02/15/a-new-hope</id>
    <content type="html"><![CDATA[<p>Like Futurama, I&#8217;m back.</p>

<p>Not too long ago, I decided to get rid of my blog. While I had enjoyed writing in it, it had stagnated for a while, and I didn&#8217;t really have any inspiration to write more at the time. Cue the ticking of the clock, and some time has passed. Part of the inspiration comes from Nathan Marz in his post “<a href="http://nathanmarz.com/blog/you-should-blog-even-if-you-have-no-readers.html">You should blog, even if you have no readers</a>”. Now, I feel like I&#8217;m ready to return to a more regular writing habit.</p>

<p>This time around, though, I&#8217;m not interested in dealing with the disaster that is WordPress. Instead, I&#8217;ve decided to go with a &#8220;static site generator&#8221;: <a href="http://octopress.org/">Octopress</a>. In time, I will bring over a lot of the writing from my old site.</p>

<p>Now, let the games begin!</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Smoke, mirrors and security]]></title>
    <link href="http://blog.amber.org/blog/2002/07/16/smoke/"/>
    <updated>2002-07-16T15:04:00-04:00</updated>
    <id>http://blog.amber.org/blog/2002/07/16/smoke</id>
    <content type="html"><![CDATA[<p>There&#8217;s this common misconception among Americans that the actions of the US
Government since September 11th have had anything to do with the security of
the common populace. This is a gross mistake. As someone who has been a
practitioner of risk management and mitigation in physical, electronic and
social mechanisms, much of what I see is nothing but a sham, largely created
to distract people from the real problems.</p>

<p>Take airports. The security at airports continues to be abysmal at best,
insulting at worst, and misleading in all actuality. I know people who
have boarded planes without photo ID through Dulles International Airport (IAD)
in Washington, DC, and people who are prototypical passengers that would not
be suspicious (this is not race based, but behavioral) who are harassed to no
end, repeatedly. Why? Not because they are suspicious, but because they were
convenient. People are somehow &#8220;comforted&#8221; by seeing people searched, even if
those people are not the ones who could potentially present a security risk.</p>

<p>This is typified by the either intentional or negligent avoidance of dealing
with private jets (otherwise known as &#8220;political supporters&#8221;), cargo transports,
and other vectors for attacks that are substantially more effective. Why
weren&#8217;t they dealt with? Because they&#8217;re not visible to the &#8220;average&#8221; person,
yet they represent the actual targets that an intelligent terrorist would go
after. We&#8217;ve been fortunate to not have been targeted by an overly intelligent
and concerted effort, just a rather clumsy one so far.</p>

<p>This is also typified in the methodology of Federal agencies in Washington, DC
that I work with on a daily basis. Many have erected all sorts of physical
barriers, but ignore thousands of other vectors that are exposed. Again, lots
of smoke, no fire. The effective reduction of risk is a difficult job, and
requires enormous intellectual resources &#8212; not necessarily monetary
resources. Many of the current government approaches are focused on spending
money &#8212; often just to line pockets of supporters or constituents &#8212; rather
than the examination of risk strategies.</p>

<p>As I said, risks cannot be eliminated, but can only be mitigated, reduced and
managed. I have addressed too many people involved in the &#8220;war on terrorism&#8221;
who feel that there is a 100% solution out there, when that is the fatal flaw.
The assumption of perfection creates its own enormous risk. Only when we
understand the risks can we manage them.</p>

<p>Today, we are no more safe than we were September 10th due to anything the
government has done, but we have lost countless liberties that will be
difficult to regain. Any increase in security is largely due to the attention
of the American people, not the government.</p>
]]></content>
  </entry>
  
</feed>
