FedRAMP was created to provide “a standardized approach to security assessment, authorization, and continuous monitoring for cloud products and services.” To an outsider, this sounds like a whole bunch of nonsense, but if you’ve ever had to deal with the NIST Risk Management Framework, at least as it is usually implemented by government agencies, you’ll understand how absolutely critical it is that the approach be standardized. As a geek, nerd, and overall general enemy of paperwork, I have to start by simply saying that conceptually, I don’t have a problem with the Risk Management Framework (RMF). If you sit down and talk to Dr. Ross, the man behind the curtain, you’ll see that he’s generally a very reasonable person, and his goals are simply to provide a conceptual framework for the Federal government to understand and assess the risk of its systems.

The reality, unfortunately, is much darker. Instead of being used as the framework they are intended to be, the RMF and its accompanying standards are generally treated as a veritable gospel that can never be questioned, thought about, reasoned about, or otherwise adapted to the situation at hand. This inverts the incentives and often produces systems that have much lower actual security, and substantially increased risk, but plenty of paperwork to get the approval.

Now that I’ve taken some organizations through the FedRAMP process, I can say that it is an improvement. It is more risk-focused, and more interested in being a collaborative effort to actually identify risks and address them wherever possible. It still suffers from a serious paperwork overhead, however, and worse, there are some conceptual gaps within it that do not address the needs of large cloud providers. I’m going to try and address some of the ideas that I think need to be tackled within FedRAMP to succeed with the likes of Amazon, Windows Azure, Google, Rackspace, etc. Without these changes, or something even more effective, I believe that FedRAMP, for all its admirable goals, will wither and die.

In follow-on articles, I’m going to cover some ideas for how to scale FedRAMP to both larger and, where I can, smaller cloud service providers. Most of my experience is with the gorillas in the yard, so it will focus on that, but I’d like to see it made more flexible for the organizations just starting, especially when they’re in the SaaS/PaaS space.

The topics I intend to cover are:

  • Issues of technical scale – How do you scale FedRAMP to deal with the issues faced by the likes of Amazon, Google, Microsoft, Rackspace, et al.?
  • DevOps v. The Paperwork Monster – Specifically, how do you deal with the enormous velocity exhibited by most cloud providers? The RMF isn’t really used to coping with this rate of change. No calculus, I promise.
  • Importance of automation – Spot checks of technical controls are fine, but the key is all inside the automation, which is rarely covered by 3PAOs.

If anyone has any other ideas they’d like to see addressed, I’d be happy to delve into them. As a note on my qualifications, I’m the quality manager for a major 3PAO, and technical lead on large-scale cloud projects.

A recent article in San Francisco’s The Daily Beast came with the following quote that got me thinking:

The fact that conspiracy theories are percolating up to local party leaders and even the halls of Congress should be a warning sign for the GOP. As the faithful know, you reap what you sow, and the steady diet of hyperpartisan media has seeded these conspiracy theories in the minds of party activists to the extent that they are starting to shape policy debates. The embarrassing incidents are evidence of a larger problem that needs to be confronted: when you do not condemn the use of hate and fear to serve as a recruiting tool against your political opponents, the ability to reason together is undermined and self-government is compromised. There is a cost to condoning extremism when it seems to benefit “your team.”

Even The Economist has weighed in on the lunacy of the issue.

As much as it pains me to say, I think the Internet has amplified the crazy by allowing the lunatics among us to connect with each other and reinforce their belief system. At some level, it is the basic “walled garden” cognitive bias, where people associate only with people who reinforce their belief system, thereby removing all doubt and thought from the process. The Internet has simply allowed us to define our garden even more narrowly than ever before, and while this has benefited small groups that often felt isolated – for example, people with a rare disease – it has also increased the “echo chamber” effect. From outside the wall, the system seems obviously broken, but from within, all is in harmony because any element that might undermine that harmony has been removed.

Combine the dominant confirmation biases exhibited by most people – and part of the reason people misunderstand science – with things like the gambler’s fallacy and herd instinct, and you have an environment ripe for exploitation. 50 years ago, the lunatics were yelling on the street corner, and were largely ignored by society. The Internet gave them the ability to find their own “kind”, and reinforce their beliefs, thereby creating a more monolithic social structure that can be used to garner perceived rationality.

Or at least, that’s my view.

In July of last year, I special ordered a Ford Focus ST. At the time, it had just been announced, and I hadn’t even driven one. Instead, my decision was based on the reputation of the previous generation of Focus ST and RS, and the quality of the current third generation Focus. Getting an order in was, shall we say, less than easy, as most dealers weren’t used to pre-ordering things, and I often knew much more about the process than they did. Still, in November of last year, I took delivery of an Oxford White 2013 Focus ST.

So, after six months with the car, I decided that it’s about time to put together my thoughts. The good and the bad, though the good far outweighs the bad.

The Bad

I have to start with the dreadful MyFord Touch system, “designed” by Microsoft. It’s not the worst user interface I’ve ever seen, but it is slow, sometimes oddly unresponsive, and periodically requires some kind of maintenance reboot, sometimes in the middle of a drive. Enough has been written about the system that I don’t need to go into it more here; suffice it to say that it’s not Microsoft’s finest hour. Fortunately, as a small redeeming grace, there has been a downloadable code release, and I was able to upgrade the software. The upgrade increased performance a little bit, and generally made the system more stable. Unfortunately, the system – even though it’s capable of connecting to a WiFi network in your house – is unable to download its own updates directly. Instead, you have to download them, stick them on a USB drive, and sit in the car with the engine running for the entire update, which can take an hour.

The next three issues are all tied together. Rear-facing visibility is sub-optimal. It’s certainly worse than my Mazda 3, and comes with some serious blind spots. Unfortunately, unlike the Titanium model of the Focus, you can’t get the ST with either ultrasonic parking sensors, or a back-up camera. This is a gigantic, glaring oversight.

The last big issue is the dealer. No matter how much the domestic car makers improve their product – and they have made gigantic strides – the dealers are still sub-par at best. They fail on sales; they fail on service; and they fail in general communication.

The Good

As I said, the good far outweighs the bad for me. For the good, I’m going to just hit them in bullet points:

  • Handling is Teutonic. The handling is more like that of a BMW than anything you’ve ever run into with a domestic car before. The limits are very high, with the car pulling close to 1G on the skid pad. That’s serious sports-car territory.
  • Oversteer. Yes, Virginia, you can hang the tail out of a front-wheel-drive car. The tuning of the stability system, and the chassis as a whole mean that you can create lift-off oversteer on demand if you put the suspension in sport mode.
  • Brakes are excellent. They’re not fancy Brembo brakes, but even a day in the mountains doesn’t cause any fade, and the initial bite is excellent. Pedal feel is outstanding. They are comparable to my Infiniti G35 w/Brembo brakes.
  • Engine is a locomotive. 270 ft-lbs of torque from just off idle, and it just pulls. The performance curve is not that of a turbo-charged 4 cylinder, but more like a straight 6, or even a nice old-school V8. There is minimal turbo lag.
  • Sounds great. Unlike most “great sounding cars”, though, it’s all intake noise coming through a special Sound Symposer system. It’s a butterfly valve that lets some of the intake noise into the cabin, but only when you step on it. Otherwise, nothing.
  • Sleeper and a car for hooning around. I don’t know how else to describe it other than Jekyll and Hyde. Keep your foot off the “go fast” pedal, and the car is a quiet, well-mannered cruiser. Step on it, and you can do all sorts of things that your parents would never approve of. For me, this is the greatest asset of the car.

My first digital camera was an Apple Quicktake 200 that I bought in 1996. It produced images in glorious 640x480 resolution, with a level of oversaturation that would make an 80s music video blush. Soon after, I switched to another camera, a Canon point-and-shoot, but that also didn’t really provide satisfactory results. In late 2005, I bought a Nikon D50 digital SLR (DSLR) camera to get back into photography. In purchasing the DSLR, I had looked at other cameras, and there simply had been nothing else that produced an image of any quality that could be used to make prints at 8x10 or above. It wasn’t just an issue of the sensor – the megapixel war had started already – but of the glass in front of the sensor, or plastic in some cases.

For a couple years, I used it extensively and took thousands of photos, but soon it became a bit of a hassle to keep with me. The DSLR is a bulky camera structure, and when you go to take a picture, it can intimidate your subject if you’re trying to take a picture of people. It’s a “serious camera”, and comes with all the problems associated with it.

So, a couple years later, I bought a Canon Powershot S90. While it might look like a simple point-and-shoot, it was the first of a new generation of serious small cameras. It had good glass (f/2 at its fastest), a lower resolution sensor that had very low noise levels and good low-light performance, and the ability to operate in full manual with the creation of RAW images. The camera was pocketable in pants, and I took this camera with me all over the world, taking some amazing pictures with it. There are 16x20 enlargements in my house from this camera that people have commented on repeatedly. Just amazing.

But, it’s time for something “new”, and when a good friend bought a Fujifilm X-Pro1, and started showing me the amazing work he was getting out of it, I started looking at the Fujifilm line of cameras. One of the things I learned was that I actually mainly used a 35-50mm effective focal length on my lenses. There is something freeing about not worrying about the lens. My Nikon D50 has had a f/1.8 35mm lens (50mm effective) on it most of its life, and the zooms stay in the bag.

Then, miracle of miracles, Fujifilm announced the X100S, an update to one of the most well-regarded, and troubled, cameras ever released. After a few days of research and thought, I decided to buy one from my favorite camera shop in NYC (B&H), and now the wait begins. The hardest part is seeing reviews of the new X100S start to be released, and how amazing the camera is looking.

Now, where’s mine?

What feels like almost 2 years ago, I backed a programmable flashlight on Kickstarter. While it took substantially longer than intended for it to get through the manufacturing process, I actually am quite happy to report that I received it in the mail a few weeks ago. Here it is:

It is perhaps the single most over-engineered flashlight in existence. The body is CNC machined aircraft-grade aluminum that’s been hard anodized. But that’s not what makes it the most over-engineered flashlight; this is:

Yes, that’s a complete circuit board with microcontroller on it. Not just any microcontroller, though, but an Arduino compatible one. This means that you can program the flashlight just like an Arduino, using the same simple coding environment you’re used to. Then, you just plug in your USB cable, and away you go.

Is it overkill? Is a rechargeable flashlight with a three-axis accelerometer absurd? Yes, yes it is. And that’s why I love it. It also happens to be the best flashlight I’ve ever seen, with a CREE XM-L U2 LED that puts out 500 lumens, and a beautiful total internal reflection lens.

Yesterday’s video by James Governor about “how not to define big data” got me to thinking, as so much of James’ writing does. First, go watch the video:

People often talk about big data as though it is a measurable quantity, something quantitative. And it is, but that’s inadequate to understand the different nature of it. For me, the more important aspect is qualitative. Big data isn’t just about the number of gigabytes that your system deals with, but instead about the underlying nature of that data. Take, for example, traditional account data, where we might store orders, and shipping information. This data has a high signal-to-noise ratio. If, instead, we look at things like website logs, telemetry from embedded devices, or even a stream of tweets, we’re talking about what is traditionally a very low signal-to-noise ratio.

Big data, then, isn’t just about the size of your haystack, but instead finding the needle hidden within.

A few weeks ago, Audrey Roy and Daniel Greenfield released the beta of their new book, Two Scoops of Django: Django Best Practices for Django 1.5. Being a fan of Danny and Audrey’s work, I obviously popped the $12 for it. I read it in a couple of hours on the plane the next day.

Weighing in at approximately 200 pages, or approximately 1.52x10E-27 grams (thanks information theory) on my iPad, there is little fluff in the book, and that is a great thing. I’ve been using Django since v0.96 came out in early 2007, and use it for a majority of my web work, and yet the book contained a lot of interesting ideas and insight. The result of reading the book was a TODO list with many items on it for all my projects to address a lot of issues that I’d not really thought about.

What more do you want from a book than for it to make you rethink how you approach solving problems? Go buy it.

Trying to catch up with my backlog of papers had been put on hold for a little bit as I tried to clean up a mess at work. Perhaps more on that later. Still, it seems like every paper I read somehow triggers a cascading avalanche of additional reading material, which means that the backlog never shrinks.

  • Kafka: a Distributed Messaging System for Log Processing (CiteSeerX)
  • Paxos Made Moderately Complex (Cornell PDF)
  • Pregel: A System for Large-Scale Graph Processing (GitHub PDF)
  • Nonlinear Time-Series Prediction with Missing and Noisy Data (CiteSeerX)
  • Bayesian Time Series: Models and Computations for the Analysis of Time Series in the Physical Sciences (CiteSeerX)
  • Probabilistic Similarity Search for Uncertain Time Series (CiteSeerX)
  • Processing a Trillion Cells per Mouse Click (CiteSeerX)
  • Only Aggressive Elephants are Fast Elephants (arXiv)
  • Uncertain Time-Series Similarity: Return to the Basics (arXiv)
  • Statistical Distortion: Consequences of Data Cleaning (arXiv)

The last few are all from VLDB 2012, and I have another dozen or so papers from the same conference that I want to work my way through. Looking at these, you can see that a lot of it is an attempt to deal with streams of data, specifically in real-time, as best as possible.

Another day, another few papers down:

  • Disco: Running Commodity Operating Systems on Scalable Multiprocessors (CiteSeerX)
  • End-to-End Arguments in System Design (CiteSeerX)
  • Weighted Voting for Replicated Data (CiteSeerX)
  • A Glossary of Time Granularity Concepts (PDF)
  • An Access Control Model Supporting Periodicity Constraints and Temporal Reasoning (CiteSeerX)
  • SharedDB: Killing One Thousand Queries With One Stone (CiteSeerX)

Some of these are quite old. For example, David Gifford’s paper on weighted voting was published in 1979, but it laid the groundwork for the weighted r+w quorums that are now quite common: every replica gets some number of votes, and the read and write thresholds are chosen so that any read quorum overlaps any write quorum. I’m pretty sure I had read it before, but sometimes older papers just need to be re-read to make sure that no ideas were missed.
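For anyone who hasn't run into r+w quorums, here is a minimal sketch of the idea in Python. It isn't code from the paper; the replica names, vote counts, and the helper functions `is_valid_config` and `has_quorum` are all made up for illustration. The only thing it tries to capture is the rule that the read threshold plus the write threshold must exceed the total votes (so reads always see the latest write), and that two writes can't both succeed on disjoint sets of replicas.

```python
from typing import Dict, Iterable


def is_valid_config(votes: Dict[str, int], r: int, w: int) -> bool:
    """Check the overlap rules for a weighted-voting configuration.

    r + w > total  -> every read quorum intersects every write quorum
    2 * w > total  -> any two write quorums intersect each other
    """
    total = sum(votes.values())
    return r + w > total and 2 * w > total


def has_quorum(votes: Dict[str, int], responders: Iterable[str], threshold: int) -> bool:
    """Return True if the responding replicas hold at least `threshold` votes."""
    # set() guards against counting the same replica twice.
    return sum(votes[name] for name in set(responders)) >= threshold


# Hypothetical cluster: replica "a" is weighted more heavily than "b" and "c".
votes = {"a": 2, "b": 1, "c": 1}
R, W = 2, 3  # read and write thresholds, chosen for illustration

print(is_valid_config(votes, R, W))        # configuration satisfies both rules
print(has_quorum(votes, ["a"], R))         # "a" alone can serve a read
print(has_quorum(votes, ["b", "c"], W))    # "b" and "c" together cannot write
```

The weights are what make this more flexible than a simple majority: giving a fast or reliable replica more votes lets reads be served from it alone, at the cost of needing it for most writes.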

Ever since I was in college, and would sneak into the AI department’s area to grab copies of their papers and technical reports, I’ve been a voracious reader of academic research. Too often in the “go go go” commercial world, we lose our perspective of work that is being done, and especially of the many decades of research upon which all our toys are built. That’s not to say that there aren’t plenty of papers and such from Google, Amazon, et al., but I actually include many of those in the same academic realm as I would something from Stanford or MIT.

Anyway, for various reasons too tedious to go into, I’ve allowed my inbox (also known as Dropbox) to accumulate over a hundred papers that I intended to read, but haven’t found time to yet. That doesn’t begin to include all the amazing blog articles, etc., that accumulate in Instapaper at all times. The Internet may be an amazing thing, but it also is a source of unlimited future reading. So, starting this year, and today to be exact, I’ve decided to try and put time aside every day to read a few of the things I’ve accumulated and try and slowly work down my backlog. I was asked by a friend on Twitter to keep track of what I’m reading, and so, this is the start.

  • Hancock: A language for analyzing transactional data streams (CiteSeerX). A DSL for performing some relatively basic stream processing on large amounts of “sensor data”, in this case primarily telephone calls. Interesting ideas: 1) persistence mechanism that mirrors UNIX sensibilities with directories as containers; 2) view representation for abstracting data requirements over time, namely exact versus approximate representations
  • Crash-only Software (CiteSeerX). What if we just gave up and quit trying to recover from errors? Sometimes it’s faster to just crash the system and reboot. Came with a couple interesting papers I want to read, namely Decoupled storage: Free the replicas! and Session State: Beyond Soft State
  • PASSing the provenance challenge (Harvard). Integrating data provenance into the Linux operating system.
  • CryptDB: Protecting Confidentiality with Encrypted Query Processing (CiteSeerX). A new approach to encryption in the database that seems very different from anything I’ve seen in the commercial sector yet. One of the things I like is that it is adaptive and blends different approaches to encryption – including homomorphic encryption where appropriate – to obtain maximum functionality with minimized risk. Definitely want to look at this to play with, and it seems to currently work with MySQL and PostgreSQL to varying degrees.

A final note, and something that continues to amaze me in 2012: if your paper is not available for free to read, then why are you publishing your paper? I continually run into annoying pay walls – ACM and IEEE, I’m looking at you – that do nothing but impede research and progress. Now, usually you can Google around and find the paper hosted somewhere, but if you are an academic and your profession is (supposedly) about the progress of human knowledge, how can you subscribe to this kind of walling off of knowledge?