I recently ran into an issue with I/O bandwidth in EC2 that produced some unexpected results. As part of my day job, I built a system to run clustered deployments of enterprise software within AWS. The enterprise software I chose for the prototype is, as it turns out, very sensitive to I/O bandwidth, and my first attempt at a clustered deployment, using m1.large instances with only ephemeral disks, did not provide enough I/O performance for the application to operate. After a few internet searches, I was convinced that I needed to use EBS volumes, probably in a RAID 0 striped configuration, to increase effective I/O performance by parallelizing across the network-attached volumes. We also got some guidance from Amazon to this effect, so it appeared to be, by far, our best bet to solve the problem.
So I converted my prototype to use a RAID 0 4X-striped EBS volume. The I/O performance profile changed, but it still wasn't enough for the application. Uh-oh.
To better understand the problem, we decided that the next step would be to quantify the I/O performance of the different volume options, in the hope that we could figure out how to tweak the configuration to get enough performance. A colleague of mine ran a variety of tests with iometer and produced the following data, capturing read and write performance for six different volume configurations: a single EBS volume, a 4X EBS striped volume, a 6X EBS striped volume, an m1.large local ephemeral drive, an m1.xlarge local ephemeral drive, and “equivalent hardware” which, roughly speaking, corresponds to a server box that should be similar to a large EC2 instance in terms of performance and which is not heavily optimized for I/O. Some of these choices deserve explanation. We looked at both m1.large and m1.xlarge instances because we were told that the m1.xlarge instances had a larger slice of I/O bandwidth from the underlying hardware, and we wanted to see how much larger that slice actually is, given the cost difference. The EBS-based volumes ran on m2.large instances, as we were told that these instances had a larger slice of real I/O bandwidth as well as a larger network bandwidth slice that would better support the EBS volumes.
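If you want a quick sanity check of your own volumes before reaching for a full iometer run, a few lines of Python can approximate the sequential-write side of the measurement. This is a rough sketch, not a substitute for iometer: the scratch path is an assumption, and buffered writes plus a final fsync only loosely reflect what the device can sustain.

```python
import os
import time

def write_throughput_mb_s(path, total_mb=64, block_kb=1024):
    """Write total_mb of zeros in block_kb chunks and report MB/s.

    Crude by design: fsync forces the data to the device at the end,
    but the page cache still absorbs the individual writes, so treat
    the number as a ballpark figure, not a benchmark result.
    """
    block = b"\0" * (block_kb * 1024)
    start = time.monotonic()
    with open(path, "wb") as f:
        for _ in range(total_mb * 1024 // block_kb):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # flush the page cache to the actual volume
    elapsed = time.monotonic() - start
    os.remove(path)  # clean up the scratch file
    return total_mb / elapsed

# Point the path at the volume you want to probe (assumed path here):
print(f"{write_throughput_mb_s('/tmp/io_probe.bin'):.1f} MB/s")
```

Running this on the mount point of the striped array versus an ephemeral disk gives a fast first impression before committing to a longer test matrix.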
So here is the hard data:
A few things jumped out at us when we saw these numbers:
- The performance of the 6X EBS striped volume was significantly worse than the 4X EBS striped volume. Going from 4X to 6X evidently saturates something, causing contention and reduced performance. Some information online suggests that EBS network performance can be quite variable, but we got consistent results over several runs.
- The EBS-based volumes all had better read performance than the local ephemeral disk-based volumes in both IOPS and Mbps. The local ephemeral disk-based volumes had much better write performance in IOPS, and similar write performance in Mbps.
- Neither the EBS-based volumes nor the local ephemeral disk-based volumes could stand up to a real piece of dedicated hardware. That is reasonable, given what EC2 is, but it also implies that EC2 is not a good place to run I/O-intensive operations. And hardware actually optimized for I/O would probably outperform our unoptimized “equivalent hardware” box by an even wider margin.
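The bullets above lean on the distinction between IOPS (how many operations complete per second) and raw throughput (how many bytes move per second); a volume can look good on one and poor on the other. As a minimal sketch of measuring the IOPS side, here is a small random-read probe. The caveats are the same as any quick script: the file path is an assumption, and without O_DIRECT the OS page cache will serve many of the reads, so the numbers skew optimistic compared to iometer.

```python
import os
import random
import time

def random_read_iops(path, file_mb=16, block=4096, reads=1000):
    """Create a scratch file, then time `reads` random 4 KiB reads of it.

    Reads are block-aligned, like a typical random-I/O benchmark pattern.
    Page-cache hits inflate the result; a real tool bypasses the cache.
    """
    size = file_mb * 1024 * 1024
    with open(path, "wb") as f:
        f.write(os.urandom(size))
    fd = os.open(path, os.O_RDONLY)
    offsets = [random.randrange(0, size // block) * block for _ in range(reads)]
    start = time.monotonic()
    for off in offsets:
        os.pread(fd, block, off)  # positioned read, no seek bookkeeping
    elapsed = time.monotonic() - start
    os.close(fd)
    os.remove(path)  # clean up the scratch file
    return reads / elapsed

# Point the path at the volume you want to probe (assumed path here):
print(f"{random_read_iops('/tmp/iops_probe.bin'):.0f} IOPS")
```

Comparing this number against the sequential MB/s figure for the same volume makes it concrete why the ephemeral disks and EBS volumes can rank differently depending on which metric your application actually stresses.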
Hopefully this data can point you in the right direction if you are dealing with I/O issues in EC2. We’re hoping for a little magic from Amazon on this issue later this year (and I can’t say anything more about that). In the interim, I/O issues will continue to require some trial and error to find a configuration that best matches your particular I/O profile.
Hope this helps.