Notes on the Keynote from Black Hat 2014

A few quick notes to share on the keynote yesterday at Black Hat 2014 in Las Vegas:

The keynote speaker was Dan Geer. Dan Geer is currently Chief Information Security Officer for In-Q-Tel. In-Q-Tel is the venture funding arm associated with the CIA.

The keynote was largely a policy speech, with roughly a dozen proposals related to security in a variety of technology domains.

His first proposal was a set of mandatory reporting guidelines for exploits and infections, on par with how the CDC mandates infectious disease reporting. He made several parallels between meatspace and cyberspace in handling disease, with this proposal being the most obvious response to the current state of affairs.

Of note to software companies (such as the one that currently employs me and those that have employed me in the past), he proposed that software be subject to product liability law as a means to improve software and service security. This is a provocative proposal, given that it was delivered during the keynote at the premier annual security conference by someone who leads information security for intelligence-related investments. He also noted, to great laughter, that “The only two products in America not covered by product liability law are religion and software, and software should not escape for much longer.” Ha.

There was also an interesting proposal for net neutrality that suggests an approach I have not heard of before. He suggested that ISPs should be able to opt out of net neutrality, but that doing so would mean inspecting the packets they carry, and so they would lose the Common Carrier liability protections they enjoy today because they claim no knowledge of those packets.

And he also proposed that, for devices in the Internet-of-Things world, devices should have an expiration date after which they no longer operate as a means to ensure that older devices with security vulnerabilities do not render the IoT space completely exploitable. This is a common theme for device security in general, as older, no-longer-supported devices tend to be primary attack vectors for offensive exploits.

One of the more interesting ideas he presented (in my opinion) was the idea that the US should attempt to corner the market in offensive exploit technologies in much the same way the US has used its wealth as foreign aid to influence events on the ground. He cited the surge in Iraq as an example of how the distribution of money to enemy combatants had more of an effect on the ground than did the arrival of more troops.

The full text of the keynote, including all of his proposals, is available here:

And TechCrunch has a decent writeup here:

Interesting times.

Running Sunspot and Solr with Rails 4 in Production

I recently added full text search capabilities to one of my projects and I decided to use Apache Solr and the Sunspot gem for Rails. Sunspot is awesome and ridiculously easy to integrate with Rails in a development setting, and it comes with a built-in Solr binary that is very useful for development and testing. However, I had some trouble finding good examples of how to deploy it to production with a stand-alone Solr deployment on Tomcat. I ultimately figured it out, and here is what I had to do.

First, a few links to related documents that helped me on my way:

Installing Lucene/Solr on CentOS 6

Install Solr 4.4 with Jetty in CentOS, and set up Solr server to work with Sunspot Gem

Install & Configure Solr on Ubuntu, the quickest way

If you follow the approaches in any of these docs, you end up with Solr deployed on Tomcat with a default Solr configuration for both schema and cores in Solr. I used Solr 4.9.0 on Tomcat 7.0.54, but you should be able to use whatever combination suits your Linux distro package manager. The default configuration is nice for sanity testing, but it must be customized for Sunspot before anything will actually work from Rails.

These are the four steps I used to customize a default configuration Solr deployment for Sunspot:

1) Update the directory structure within the Solr home directory

The default directory structure of the Solr home directory looks like this:

4 drwxr-xr-x. 2 tomcat 4096 Jun 27 14:59 bin/
4 drwxr-xr-x. 4 tomcat 4096 Jul 22 06:42 collection1/
4 -rw-r--r--. 1 tomcat 2473 Jun 27 14:59 README.txt
4 -rw-r--r--. 1 tomcat  446 Jul 22 06:46 solr.xml
4 -rw-r--r--. 1 tomcat  501 Jun 27 14:59 zoo.cfg

As this stands, there is a single SolrCore configured as collection1, with config files residing within the collection1/conf/ directory. Sunspot, by default (in config/sunspot.yml in Rails), will look for SolrCores named after its development, test, and production modes. And we also want a single conf/ directory for all three cores. So we need to modify the directory structure to look like this:

4 drwxr-xr-x. 2 tomcat 4096 Jun 27 14:59 bin/
4 drwxr-xr-x. 6 tomcat 4096 Jul 22 06:43 conf/
4 drwxr-xr-x. 3 tomcat 4096 Jul 22 06:44 development/
4 drwxr-xr-x. 3 tomcat 4096 Jul 22 06:46 production/
4 -rw-r--r--. 1 tomcat 2473 Jun 27 14:59 README.txt
4 -rw-r--r--. 1 tomcat  446 Jul 22 06:46 solr.xml
4 drwxr-xr-x. 3 tomcat 4096 Jul 22 06:44 test/
4 -rw-r--r--. 1 tomcat  501 Jun 27 14:59 zoo.cfg

The new top-level conf/ directory contains everything previously in collection1/conf (i.e. “mv collection1/conf .”), which we will soon customize. The new directories development, test, and production are empty for now, but Solr will populate them when it restarts. Note that, strictly speaking, you only need to add the production directory to support Rails in production mode, but I added directories for development and test just in case I ever need to test things against the production Solr instance.
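
The reshuffle described above can also be scripted. The following is a minimal Ruby sketch using FileUtils, not a tested deployment script: the helper name restructure_solr_home is made up for illustration, and the demo runs against a throwaway temp directory that only mimics the default layout. Note that it deletes what is left of collection1/ (including any index data for the default core), which matches the target listing above but is an assumption about what you want.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: reshape a default Solr home for Sunspot.
# Moves collection1/conf to the top level, removes the rest of the
# default core, and creates one empty directory per Sunspot core.
def restructure_solr_home(solr_home, cores = %w[development test production])
  FileUtils.mv(File.join(solr_home, 'collection1', 'conf'), solr_home)
  FileUtils.rm_rf(File.join(solr_home, 'collection1'))
  cores.each { |core| FileUtils.mkdir_p(File.join(solr_home, core)) }
  Dir.children(solr_home).sort
end

# Demo against a scratch directory, so no real Solr home is touched.
DEMO_LAYOUT = Dir.mktmpdir do |home|
  FileUtils.mkdir_p(File.join(home, 'collection1', 'conf'))
  FileUtils.touch(File.join(home, 'collection1', 'conf', 'schema.xml'))
  FileUtils.touch(File.join(home, 'solr.xml'))
  restructure_solr_home(home)
end
puts DEMO_LAYOUT.inspect
```

On a real server you would run this as the tomcat user (or fix ownership afterwards) so that Solr can write into the new core directories.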

2) Configure the custom Solr schema for Sunspot in schema.xml
3) Configure the custom Solr config for Sunspot in solrconfig.xml

Steps 2 and 3 customize Solr for Sunspot and ActiveModel. You will need to find customized versions of these config files and place them in the new top-level conf/ directory. If you are developing your Rails app with Sunspot in development mode with its built-in binary, then you should see a solr/ directory in your Rails development directory and you can find Sunspot’s schema.xml and solrconfig.xml in solr/conf/. You can also grab these directly from Sunspot on Github here:

Either way, overwrite the existing versions of schema.xml and solrconfig.xml with the customized versions.

4) Configure the SolrCores that Sunspot expects to see for Rails in production mode in solr.xml

This last step was the one I didn’t find in any examples and this took the longest to figure out. By default, the solr.xml file in the top-level Solr home directory looks like this:

<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="false">
  <cores adminPath="/admin/cores" host="${host:}" hostPort="${jetty.port:}">
    <core name="collection1"     instanceDir="." dataDir="default/data"/>
  </cores>
</solr>

In some distributions of Solr this file is omitted entirely, and the comments in this file state that in the absence of this file the default configuration will silently be the same (a single core named collection1) but without an explicit file-based definition. Fun.

We want to eliminate the unneeded default collection1 core definition and replace it with definitions for our production, test, and development cores. Note again that you can skip the test and development cores if you want. I want cores for test and development, so I updated solr.xml to this:

<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="false">
  <cores adminPath="/admin/cores" host="${host:}" hostPort="${jetty.port:}">
    <core name="development" instanceDir="." dataDir="development/data"/>
    <core name="test"        instanceDir="." dataDir="test/data"/>
    <core name="production"  instanceDir="." dataDir="production/data"/>
  </cores>
</solr>

And finally, after all this, you should be able to restart Tomcat, and Sunspot and Rails should then be able to operate against this stand-alone Solr instance on Tomcat. You can check that the cores are configured correctly by looking at the Solr Admin UI at http://[your Tomcat host]:[your Tomcat port]/solr/#/~cores/. You should see your configured cores in the list on the left.
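
If you would rather check from a script than from the Admin UI, Solr's CoreAdmin API can report the configured cores as JSON. Here is a rough Ruby sketch of that check. The helper names, the localhost:8080 default, and the canned sample response are all assumptions for illustration; the sample is trimmed to just the part of the STATUS response the check reads.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Returns the expected core names that are absent from a CoreAdmin
# STATUS response (passed in as JSON text).
def missing_cores(status_json, expected = %w[development test production])
  configured = JSON.parse(status_json).fetch('status', {}).keys
  expected - configured
end

# Fetches the STATUS response from a live server. Host and port are
# placeholders; this is not called below because it needs a running Solr.
def fetch_core_status(host = 'localhost', port = 8080)
  uri = URI("http://#{host}:#{port}/solr/admin/cores?action=STATUS&wt=json")
  Net::HTTP.get(uri)
end

# Canned response shaped like the STATUS output, trimmed for brevity.
SAMPLE_STATUS = '{"responseHeader":{"status":0},"status":{"development":{},"production":{}}}'
puts missing_cores(SAMPLE_STATUS).inspect
```

Against a real deployment you would call missing_cores(fetch_core_status(host, port)) and expect an empty array.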

Two final pieces of advice:

  • Make sure your Rails config/sunspot.yml port definitions match the configured TCP port of Tomcat. Sunspot defaults to port 8983 for production (and 8982 for development and 8981 for test), so be sure to either configure Tomcat to listen on these ports or change the port definitions in sunspot.yml to match Tomcat’s TCP listen port (which defaults to 8080).

  • If you see HTTP 404 errors from Tomcat in your Rails log that mention “The requested resource is not available”, then your production SolrCore is probably not configured correctly. Check the URI that Rails is hitting on Tomcat and see if it matches your SolrCore config.
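
To make the URI check concrete, here is a small Ruby sketch that derives the URL Rails should be hitting from a sunspot.yml-style config. The hostname, the port 8080, and the solr_url helper are assumptions for illustration; substitute the values from your own config/sunspot.yml.

```ruby
require 'yaml'

# Example config in the shape Sunspot uses; the values are assumptions
# (Tomcat on its default port, core named after the Rails environment).
SUNSPOT_YML = <<~YAML
  production:
    solr:
      hostname: localhost
      port: 8080
      path: /solr/production
YAML

# Hypothetical helper: build the base URL for one Rails environment.
def solr_url(config_text, env)
  solr = YAML.safe_load(config_text).fetch(env).fetch('solr')
  "http://#{solr['hostname']}:#{solr['port']}#{solr['path']}"
end

puts solr_url(SUNSPOT_YML, 'production')
```

The last path segment should match one of the core names in solr.xml; if it does not, Tomcat has nothing to serve at that path.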

Hope this helps.

Nginx Reverse Proxy Config for Rails 4 ActionController::Live Streaming

I have been working with streaming data sources on my current project and I needed to implement a way to stream the contents of Apache Kafka topics over HTTP with 1:N connection fanout. As a quick test, I implemented an ActionController::Live streaming controller in Rails 4, and I ran into some trouble with my Nginx reverse-proxy config for Rails. Specifically, the streaming response from the upstream Rails application would prematurely terminate, closing the downstream Nginx connection to the client and terminating the streaming response. I had to do a lot of searching and trial and error testing to find a working Nginx reverse-proxy config for Rails live streaming, so I thought I would capture it here to save others the trouble.

First, some version info:

  • Nginx: 1.5.8 (testing on both Mac OS X and CentOS)
  • Rails: 4.0.2
  • Ruby: 2.0.0-p353

As a simple test, I implemented the following action using ActionController::Live streaming. Note the use of the X-Accel-Buffering HTTP header to inform Nginx to turn off proxy buffering for requests to this action. Alternatively, you can uncomment proxy_buffering in the nginx location config to eliminate buffering for all Rails requests, but it is probably safer to selectively disable response buffering via the header.

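class OutboundsController < ApplicationController
  include ActionController::Live

  def consumer
    response.headers['Content-Type'] = 'application/json'
    # Ask Nginx not to buffer the response for this action
    response.headers['X-Accel-Buffering'] = 'no'

    10.times { |i|
      hash = { id: "#{i}", message: "this is a test, message id #{i}" }
      response.stream.write "#{JSON.generate(hash)}\n"
      sleep 1
    }
  ensure
    # Always close the stream so the connection is released
    response.stream.close
  end
end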

The Nginx reverse-proxy config that finally worked for me is as follows. I’m only listing the bits I added to my otherwise “standard” nginx setup.

http {

  map $http_upgrade $connection_upgrade {
    default Upgrade;
    '' close;
  }

}

upstream rails {
  server localhost:3000;
}

server {

  location / {
    proxy_pass http://rails;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection $connection_upgrade;
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    #proxy_buffering off; # using X-Accel-Buffering: no in ActionController::Live
  }

}

With this config, I get the expected output ({"id":"0","message":"this is a test, message id 0"}, one per second, with incrementing IDs) streamed all the way through to my client.

One other note: if you are using curl as your client, be sure to set --no-buffer to avoid any client-induced delays in streaming data arrival.
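
The same concern applies if your client is a Ruby script rather than curl: you want to handle each line as it arrives instead of waiting for the whole body. Below is a minimal sketch of one way to do that with Net::HTTP's block form of read_body. The LineBuffer class and stream_messages helper are illustrative names, not part of any library, and the live call is left out because it needs a running server.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Accumulates raw chunks and yields one parsed JSON object per complete
# line, since chunk boundaries need not line up with message boundaries.
class LineBuffer
  def initialize
    @pending = +''
  end

  def feed(chunk)
    @pending << chunk
    while (newline = @pending.index("\n"))
      line = @pending.slice!(0..newline).chomp
      yield JSON.parse(line) unless line.empty?
    end
  end
end

# Streams from a live endpoint, printing each message as it arrives.
# Not invoked here; the URL you pass in depends on your own routes.
def stream_messages(url)
  uri = URI(url)
  buffer = LineBuffer.new
  Net::HTTP.start(uri.host, uri.port) do |http|
    http.request(Net::HTTP::Get.new(uri)) do |response|
      response.read_body { |chunk| buffer.feed(chunk) { |msg| puts msg['message'] } }
    end
  end
end

# Offline demo: two chunks that split the first message in the middle.
RECEIVED = []
demo = LineBuffer.new
['{"id":"0","mess', "age\":\"a\"}\n{\"id\":\"1\",\"message\":\"b\"}\n"].each do |chunk|
  demo.feed(chunk) { |msg| RECEIVED << msg['id'] }
end
puts RECEIVED.inspect
```

If messages still show up in bursts with a client like this, the buffering is happening upstream of the client, most likely in the proxy.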

That’s it. Hope this helps.

The Myth of Internet Privacy and Security

I have been following the massive coverage of the Snowden affair with detached professional interest as someone with some Internet security experience. I recently read Bruce Schneier’s call to arms to the technical community to take back the Internet, and it got me thinking about what we have lost and what we actually had in the first place. But first, here’s the link to Schneier’s post. It’s worth a read:

The thought that occurred to me is that the collective outrage over the alleged invasions of our privacy obscures a fundamental question about the nature of the Internet: did Internet privacy ever exist? This question is orthogonal to the legal and constitutional aspects of the issue, and goes to the heart of why many services on the Internet are free to end users.

Think about Internet privacy for a moment. Go back to your earliest experience with the Internet – email, a web page, perhaps a search engine – and ask yourself whether you ever considered whether what you were doing was (a) secure, and (b) private. Consider an email service. There is an assumption that your emails on a service like Gmail or Hotmail are private correspondence, and that they are being stored and treated as protected communication under the Electronic Communications Privacy Act and in the context of the Fourth Amendment. However, from a hacker’s perspective the reality is that you are using a free service (unless you are one of the few who pay; more on paid services later) and as such you are not giving the service provider “consideration” (e.g. money) to maintain any specific quality of service, including the privacy and security of your emails. Mail service providers make money as a side effect, usually via advertising or by providing additional services linked to your use of an email service, and none of the mail service providers make any specific guarantees as to the security or privacy of their services. To do so would be to expose them to liability and liquidated damages in the event that security or privacy were breached. The paying customers of an email service are advertisers, not people who send and receive email.

This perspective applies equally well to search engines. Google did not build an intergalactic search engine at enormous cost as an act of altruism; instead, they provide a free search service in exchange for collecting and retaining a large set of information about your interests (your searches) and your location (the latitude and longitude of a device that presents a search request). They use this information to target and sell advertising, and this is an extremely lucrative business for them. There is an assumption that Google will not use the information it collects about you to do evil, but Google does not make any specific guarantees that would accrue liability in the event that the data were misused. Again, the paying customers of Google are advertisers, not people who search the Internet.

And now extend this perspective to social networking, where you literally tell Twitter and Facebook exactly what you are doing (tweets and status updates), who your friends are (your social network), what you see (your photo collection), and what you like (your like votes). Again, there is an assumption of privacy and security here, but these are free services, and free services rarely, if ever, make guarantees about the quality of their service. In fact, there are a variety of quotes attributed to leaders of social networking companies in which they state (paraphrasing) that users should share more information, rather than less, and that people with nothing to hide should have nothing to fear.


The point, I guess, is that we shouldn’t be surprised to hear that the free services we use on the Internet are neither as private nor as secure as we assumed they were. We are not the paying customers of the Internet services of Google, Microsoft, Facebook, Twitter, Yahoo!, et al., yet somehow we expected that they would stand up to our government and act in our best interests when our personal interests came into conflict with their commercial interests, responsibilities and obligations. The assumption of security and privacy on the Internet is a myth that has been busted, and the surprise is that it took so long for us to figure it out.

Optimistically, we can hope that Schneier’s call to arms will lead to a new generation of Internet services that are designed to guarantee privacy and security with sophisticated new cryptographic technologies. This will become increasingly necessary as each week we hear about yet another hacked Internet security technology – like the alleged SSL hacks that render browser encryption ineffective reported by the New York Times here:

Maybe part of the solution is that we will have to start paying for Internet services that we have become accustomed to receiving for free, so that the commercial interests of the Internet service companies can be aligned with our personal interests as users of the services. Or maybe we will each have to decide whether we value free Internet services more than the privacy of our information and communication. But one thing is certain: privacy and security on the Internet are fundamentally broken.

libkafka – a C++ client library for Apache Kafka v0.8+

I had a need for a C++ client API for Apache Kafka with support for the new v0.8 wire protocol, and I didn’t see one out in the wild yet. So I found some time and wrote one. You can find it here on github:

Released under an Apache 2.0 license as open source from my current employer (Adobe Research). I plan to provide ongoing support for this library.

Siri – a Shot Across Google’s Bow

I had a chance to play with Siri on my wife’s iPhone 4S and after the novelty wore off I was left with a very strong feeling that Siri will usher in a paradigm shift in how we search the internet — both semantically and economically. And this can only be bad news for Google, and, to a much lesser extent, Microsoft.

Prior to Siri (and, to be fair, prior to the existing voice recognition tools on Android devices), internet search meant a text box somewhere — on your desktop, laptop, or mobile device — into which you would type your search terms and await a response. The response usually took the form of a web page with a list of matches and some targeted advertising to pay the bills. Many companies have operated in the internet search space over the past 15 years, but Google now unquestionably owns this space (or I should say: Google has unquestionably owned this space until now). It is worth noting that every Apple iPhone and iPad sold thus far uses Google as the default search engine powering its search text boxes, and therefore Apple’s very large customer base could be counted on to provide a steady stream of search traffic to Google. Enter Siri – the new gatekeeper of search. Now, if you want to do a search on your iPhone 4S you speak your request and Siri decides how to respond, collating data from a variety of sources (which might or might not include Google). The response still looks like a list, but it is a list served up by Apple and any advertising that might be associated with those results comes from Apple. With Siri, Apple has entered the search engine business and they pose an existential threat to Google’s (and Bing’s) multi-billion-dollar search businesses because they are vertically integrated. This threat is very similar to the threat that Microsoft created (and used to massive effect) against Netscape by vertically integrating Internet Explorer with Windows — the gatekeeper controls access and, ultimately, the market.

Microsoft has long claimed that Google’s dominant market share gives them an increasing advantage because they see a more comprehensive sample of search requests and therefore they can design better algorithms based on these inputs. This same advantage is likely to accrue to Apple as Siri blazes a trail into the voice recognition semantic search space: Apple will be in possession of a more comprehensive sample of voice requests and the quality of Siri relative to any competitive offering from Google or Microsoft will continue to improve. Which can only be more bad news for Google.

More and more sources are catching on to the threat that Siri poses to Google. One of the more cogent ones is here:

Interesting times.

“Is colocation cheaper than using a cloud computing service to run the same workload?”

This quote comes from an analysis of the costs of cloud-based computing vs. traditional colocation as a function of the work load and duty cycle. This type of analysis is increasingly germane for companies that are looking to make a transition to cloud-based service providers in the hope that it will allow them to lower their overall IT costs. The results, while not surprising, do raise some interesting points. First, here’s the link:

The crux of the argument here is the concept of the duty cycle. The duty cycle for a deployed application is the percentage of available hardware resources that are being consumed at any given moment in time. Intuitively, a higher duty cycle corresponds to more efficient and cost-effective use of the underlying hardware. One of the fundamental promises of cloud computing is that it will allow you to run applications in a way that will produce a much higher duty cycle through elasticity, i.e. idle resources can be released without impacting the ability to scale back up in the future.

The analysis includes a spreadsheet with some hard data for one specific style of application, and the result is that an equivalent workload for this style of application would cost $118,248 at Amazon and $70,079 in a colocation facility, with the implication that a higher duty cycle can be achieved via colocation. However, this result is not as clear cut as it might seem, owing to the fact that the “application style” is an extremely subjective and important attribute for any given application. In my experience, it is rare to find an application that can be characterized in this way; instead, most applications that I have run across are inherently custom in some important way. I would think that this analysis would need to be performed on a case-by-case basis for any application that is considering a move to the cloud, and the specific result of that analysis would apply only to that application.

A subtle conclusion of this type of analysis is that duty cycle optimization for a given application should be a key criterion for cost reduction. And, somewhat conversely, if an application already has a high duty cycle then the opportunities for cost reduction through cloud-based resources will be limited at best. Or, more simply: if you already run highly-utilized servers then you might do better with colocation.
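
To put a rough number on "highly-utilized", here is a back-of-the-envelope Ruby sketch built on the two figures quoted above. It rests on assumptions that are mine, not the analysis's: that the $118,248 cloud figure is for resources held all the time, that cloud cost scales linearly with the fraction of time resources are held, and that the colocation cost is fixed regardless of load. Real pricing is lumpier than this.

```ruby
# Figures quoted from the analysis.
CLOUD_COST = 118_248.0
COLO_COST  = 70_079.0

# ASSUMPTION: cloud spend is proportional to the fraction of time the
# resources are actually held (the duty cycle); colocation is a flat cost.
def cloud_cost_at(duty_cycle, full_cost = CLOUD_COST)
  full_cost * duty_cycle
end

# Duty cycle at which the two options cost the same under this model.
BREAK_EVEN = COLO_COST / CLOUD_COST

puts format('cloud premium at full load: %.2fx', CLOUD_COST / COLO_COST)
puts format('break-even duty cycle: %.1f%%', BREAK_EVEN * 100)
```

Under this simple model, a workload that keeps its resources busy well above the break-even point favors colocation, and one that sits well below it favors the cloud.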

Hope this helps.

“How Big is Amazon’s Cloud Computing Business?”

Everyone seems to think that Amazon’s web services business (a.k.a. EC2, S3, and the rest of AWS) is very big and getting bigger, but Amazon stubbornly refuses to break out the AWS contribution to Amazon’s earnings. A recent blog post on GigaOm is the first that I have seen that includes some real data — both for Amazon and for the total accessible cloud services market — to estimate the size of the AWS business today and the size of the market going forward. First, here’s the link:

The data comes from a UBS Investment Research paper, and it estimates that AWS is a $500M business in 2010. It further estimates that AWS will grow to $750M in 2011 (50% year-over-year growth), reaching $2.5B, with a B, by 2014. Amazon as a whole does roughly $25B in revenue, growing in the 30%-40% range over the past year, so the numbers for AWS, while still small, are a high-growth component of Amazon’s overall business. Add in the fact that the UBS paper reports the gross margin of AWS at around 50%, vs. around 23% for Amazon as a whole, and one might draw the conclusion that the profit contribution of AWS will be a growing and very significant piece of Amazon’s pie in the years to come.
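
The arithmetic behind those numbers is easy to check. This Ruby sketch uses only the figures quoted above (in millions of dollars); the constant names and the cagr helper are mine, and applying both gross margins to the same year's revenue is a simplification, since the quoted figures are round estimates.

```ruby
# Figures quoted above, in millions of dollars.
AWS_2010 = 500.0
AWS_2011 = 750.0
AWS_2014 = 2_500.0
AMAZON_REVENUE = 25_000.0

# Compound annual growth rate between two values over a number of years.
def cagr(start_value, end_value, years)
  (end_value / start_value)**(1.0 / years) - 1.0
end

YOY_2011 = AWS_2011 / AWS_2010 - 1.0
CAGR_2010_2014 = cagr(AWS_2010, AWS_2014, 4)

# Gross profit implied by the quoted margins (50% for AWS, 23% overall).
AWS_GROSS_PROFIT = AWS_2010 * 0.50
AMAZON_GROSS_PROFIT = AMAZON_REVENUE * 0.23

puts format('2011 growth: %.0f%%', YOY_2011 * 100)
puts format('2010-2014 CAGR: %.1f%%', CAGR_2010_2014 * 100)
puts format('AWS share of revenue: %.1f%%', AWS_2010 / AMAZON_REVENUE * 100)
puts format('AWS share of gross profit: %.1f%%', AWS_GROSS_PROFIT / AMAZON_GROSS_PROFIT * 100)
```

The point of the last two lines is that, with margins like these, AWS's share of gross profit is roughly double its share of revenue.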

The question I have is this: Why aren’t more internet companies doing the same thing? Amazon’s results are a clear and undeniable validation of their AWS business strategy. That strategy, in a generic sense, was to build a very efficient cloud infrastructure for their own retail applications, and sell the excess capacity to the general public. They have proved that there is demand for the excess capacity and the service APIs that they provide for that capacity. And their list of customers has slowly transformed from penny-pinching startups and early adopters to a who’s who list of the largest, richest Fortune 500 companies in every business domain. And don’t forget that AWS has data centers around the world, and there is every reason to think that the demand for AWS from foreign companies will mirror growth in the US.

I can think of a bunch of older, larger internet companies that should definitely be trying to duplicate Amazon’s success. Some of them have already tried, albeit with a slightly different level of offering (e.g. Google’s App Engine, Microsoft’s Azure). But the barrier to entry, such as it is, requires only a large number of machines and the will to build the necessary cloud infrastructure systems. I’m sure someone will call BS on this statement and tell me that some serious skill is also required, and I would agree with that. But we live in a time when skill moves around a lot, and no company has a monopoly on talent. So why isn’t everybody trying to copy AWS?

“Android will run majority of smartphones by Spring”

This latest quote comes from Adobe Systems’ CTO Kevin Lynch during an interview with Fortune. It’s actually an approximation of what he said, but I think it captures his intent, which is to suggest that Android will shortly become the dominant smartphone operating system, and to imply that Adobe, with its close partnership with Google surrounding Flash, is not quite as dead on smart devices as we have been led to believe. More on that shortly.

But first, here is the link:

The article contains a set of statements attributed to Lynch in which he states that he expects Android to achieve a 50% share “in the springtime”. If true, this implies a meteoric rise in Android adoption from 3% in late 2009, to 26% in late 2010, to 50% by early 2011. The sheer number of devices that can potentially run Android along with the large number of wireless carriers that support it will give it a huge advantage over Apple and iOS in terms of growth and market share. This numerical superiority could justify Lynch’s prediction, but it does not imply that customers are choosing Android because of the quality of the experience. This is a key point, in my opinion, because Apple can close the proliferation gap — via a CDMA-based iPhone for Verizon, perhaps — and after they do so then market share will more closely track the customer experience. This is where the tightly controlled and polished iOS and iTunes App Store experience will continue to shine, and this is where Android will continue to deal with fragmentation issues in their device base that will affect the customer experience. What fragmentation issues, you ask?

A quick look at the Android Developers Site gives us the latest data on which versions of the Android OS are currently in use across the global Android device base. As of November 1, 2010, they are:

Android 1.5: 7.9%
Android 1.6: 15.0%
Android 2.1: 40.8%
Android 2.2: 36.2%

This data indicates that the Android device base is fragmented by OS version into three large buckets: obsolete versions (both 1.5 and 1.6) at 23%; version 2.1 at 40.8%; and version 2.2 at 36.2%. This presents a challenge, to say the least, to Android developers who wish to write apps that will run on any Android device, and this will certainly affect the quality of the overall Android experience. And, getting back to Adobe, it is worth noting that only Android 2.2 — at 36.2% — supports Adobe Flash 10.1 for a true Flash experience on a mobile device. Collectively, devices running any version of Android might achieve a 50% market share as Lynch suggests, but that metric does not appear to be an apples-to-apples comparison with Apple’s iOS because of this fragmentation. Apple does not report iOS version percentages as Android does, but an educated guess is that a much higher percentage of Apple devices are running the latest version of iOS due to the relative ease of the iOS upgrade experience via iTunes.

Which leads me back to Lynch’s implication that Adobe is well-positioned via its partnership with Google to capitalize on Android’s market share. The data above shows that only a minority of Android devices (36.2%) can run Flash 10.1 today. Unless this percentage rises significantly in the next six months, Flash adoption on Android devices will not keep pace with Android’s growth rate. To be fair, new growth in the Android device base should be almost exclusively newer devices running the latest version of Android, and this should serve to increase the overall percentage of Android devices that can run Flash 10.1. But for now, Adobe and Flash appear to be chasing an Android market that is accelerating away from them.
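
For a sense of scale, here is the fragmentation arithmetic as a short Ruby sketch. The version shares are the November 1, 2010 figures listed above. The projection at the end assumes the version mix stays frozen at today's percentages when Android reaches a 50% share, which is deliberately pessimistic given that new devices should skew toward the latest version.

```ruby
# Share of the Android device base by OS version (percent), as listed above.
VERSION_SHARE = { '1.5' => 7.9, '1.6' => 15.0, '2.1' => 40.8, '2.2' => 36.2 }.freeze

# Devices on the two oldest versions.
OBSOLETE_SHARE = (VERSION_SHARE['1.5'] + VERSION_SHARE['1.6']).round(1)

# ASSUMPTION: the version mix is unchanged when Android reaches 50% of
# smartphones, so only the 2.2 slice of that 50% can run Flash 10.1.
FLASH_CAPABLE_OF_MARKET = (0.50 * VERSION_SHARE['2.2']).round(1)

puts "obsolete (1.5 + 1.6): #{OBSOLETE_SHARE}%"
puts "Flash-capable share of all smartphones: #{FLASH_CAPABLE_OF_MARKET}%"
```

In other words, under that frozen-mix assumption, a 50% Android share would translate into well under a fifth of all smartphones being able to run Flash 10.1.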

Amazon EC2 I/O Performance: Local Ephemeral Disks vs. RAID 0 Striped EBS Volumes

I recently ran into an issue with I/O bandwidth in EC2 that produced some unexpected results. As part of my day job, I built a system to run clustered deployments of enterprise software within AWS. The enterprise software I chose for the prototype is, as it turns out, very sensitive to I/O bandwidth, and my first attempt at a clustered deployment using m1.large instances with only ephemeral disks did not provide enough I/O performance for the application to operate. After a few internet searches, I was convinced that I needed to use EBS volumes, probably in a RAID 0 striped volume configuration, to effectively increase the I/O performance through parallelism of the network volumes. We also got some guidance from Amazon to this effect, so it appeared to be, by far, our best bet to solve the problem.

So I converted my prototype to use RAID 0 4X-striped EBS volumes. And the I/O performance was different, but still not enough for the application. Uh-oh.

To better understand the problem, we decided that the next step would be to quantify the I/O performance of the different volume options, in the hope that we could ascertain how to tweak the configuration to get enough performance. A colleague of mine ran a variety of tests with iometer and produced the following data that captures read and write performance for six different volume configurations: a single EBS volume, a 4X EBS striped volume, a 6X EBS striped volume, an m1.large local ephemeral drive, an m1.xlarge local ephemeral drive, and “equivalent hardware” which, roughly speaking, corresponds to a server box that should be similar to a large EC2 instance in terms of performance and which is not heavily optimized for I/O. Some of these choices deserve some explanation. We looked at both m1.large and m1.xlarge instances because we were told that the m1.xlarge instances had a larger slice of I/O bandwidth from the underlying hardware, and we wanted to see how much larger that slice actually is, given the cost difference. The EBS-based volumes ran on m2.large instances, as we were told that these instances had a larger slice of real I/O bandwidth as well as a larger network bandwidth slice that would better support the EBS volumes.

So here is the hard data:

A few things jumped out at us when we saw these numbers:

  • The performance of the 6X EBS striped volume was significantly worse than the 4X EBS striped volume. Going from 4X to 6X must saturate something that effectively causes contention and reduced performance. Some information online suggests that EBS network performance can be quite variable, but we got consistent results over several runs.
  • The EBS-based volumes all had better read performance than the local ephemeral disk-based volumes in both IOPS and Mbps. The local ephemeral disk-based volumes had much better write performance in IOPS, and similar write performance in Mbps.
  • Neither the EBS-based volumes nor the local ephemeral disk-based volumes could stand up to a real piece of dedicated hardware. Which is reasonable, given what EC2 is. But which also implies that it is not a good idea to run I/O intensive operations in EC2. And hardware optimized for I/O would probably blow away the performance of our unoptimized “equivalent hardware” box.

Hopefully this data can point you in the right direction if you are dealing with I/O issues in EC2. We’re hoping for a little magic from Amazon on this issue later this year (and I can’t say anything more about that). In the interim, I/O issues will continue to require some trial and error to find a configuration that best matches your particular I/O profile.

Hope this helps.