How to Create an Amazon EC2 AMI That is Larger Than 10GB
Recently, I have been dealing with an issue surrounding the 10GB size limit for AMIs within Amazon’s EC2 service. If you don’t know what I’m talking about, here is a quick primer: a virtual instance running within Amazon’s Elastic Compute Cloud (EC2) service is launched from a read-only boot image that Amazon refers to as an Amazon Machine Image (AMI). Amazon has set the upper size limit for an AMI at 10GB, and this restricts the amount of disk content that can be loaded onto the instance at boot. For a Windows-based EC2 instance, the 10GB AMI corresponds to the C:\ drive containing Windows; for a Linux-based instance, the 10GB AMI corresponds to the boot partition containing Linux. EC2 instances have several larger, ephemeral drives with capacities far in excess of 10GB, but those ephemeral drives have no persistence, and they will be empty when an EC2 instance boots. Amazon also has a service called the Elastic Block Store (EBS) that functions like a network-mounted file system from a storage area network, but for various reasons EBS was not a feasible solution for my problem.
The problem I faced was that I needed about 16GB of data to be available on an EC2 instance at boot, and I needed it to otherwise operate like a standard instance launched from an AMI. It would be great if I could simply use a 16GB AMI, but Amazon does not permit this due to the 10GB size constraint. I was obviously going to need an alternate mechanism to load additional data on to the ephemeral drives at boot time.
My solution is ultimately derived from the same mechanism that Amazon uses to load an AMI at boot time. AMIs in EC2 are stored in Amazon’s Simple Storage Service (S3). When an instance is started in EC2, the AMI is loaded from S3 into the Xen domain that EC2 has provisioned for the instance (Xen is the open source virtualization software that is at the heart of Amazon’s EC2 service). I decided to take the same approach to populate the ephemeral drives at boot time. Specifically, I store a compressed archive in S3 that is downloaded and inflated on the first ephemeral drive in order to populate the instance with the additional content. The procedure to download the compressed archive from S3 and inflate it in the proper places is scripted and connected to the boot sequence (it’s a Windows service on Windows, and it is linked into the rc startup script mechanism on Linux).
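The download-and-inflate step described above can be sketched roughly as follows. This is a minimal illustration in Python, not the script I actually run: the function name `populate_drive`, the `download` callable (a stand-in for whatever performs the S3 GET), and the choice of a gzip-compressed tar archive are all assumptions made for the example.

```python
import os
import tarfile
import tempfile


def populate_drive(download, target_dir):
    """Fetch a compressed archive and unpack it into target_dir.

    download: hypothetical callable that takes a local file path and
              writes the archive there (in practice it would wrap the
              S3 download).
    target_dir: directory on the ephemeral drive to populate.
    """
    os.makedirs(target_dir, exist_ok=True)

    # Stage the archive on the same drive that will hold the content.
    fd, archive_path = tempfile.mkstemp(suffix=".tar.gz", dir=target_dir)
    os.close(fd)
    try:
        download(archive_path)
        # Assumes a gzip-compressed tar archive from a trusted source.
        with tarfile.open(archive_path, "r:gz") as archive:
            archive.extractall(target_dir)
    finally:
        # Remove the staged archive whether or not extraction succeeded.
        os.remove(archive_path)
```

A boot-time wrapper (a Windows service, or an rc script on Linux) would call `populate_drive` once with a real download function and the mount point of the first ephemeral drive.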
The only issue I have found with this approach is latency. It takes a non-negligible amount of time to download and inflate several GB of data from S3, and this all happens after the operating system boot has begun. Amazon provides no quantitative guarantees about the network bandwidth that a given instance will be able to use, so the time the download takes depends on a variety of factors that are out of our control. In experiments I have measured download speeds from S3 to an EC2 instance in the range of 15 MB/sec to 25 MB/sec (those units are megabytes per second), so if you are downloading several GB of data to your instance via this method, you can expect a delay of several minutes before the ephemeral drives are populated and available. This might or might not be a problem, depending on what else is starting on your instance immediately after boot. In my case, the application that starts at boot takes up to 10 minutes to come up, so I have plenty of time to populate the ephemeral drives. If you are starting up instances to add to a cluster in response to load, and you need the additional cluster capacity as soon as possible, then this method is likely not for you. In either case, keep in mind that the startup latency grows in proportion to the size of the additional content.
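To put rough numbers on that delay, here is a small Python sketch that turns the measured 15 to 25 MB/sec range into a download-time estimate. The function name and the 6000 MB archive size are illustrative assumptions, and the estimate covers the download only, not the time to inflate the archive.

```python
def download_time_range(size_mb, slow_mb_per_sec=15.0, fast_mb_per_sec=25.0):
    """Return (best_case_seconds, worst_case_seconds) for a download.

    The default rates are the 15 and 25 MB/sec figures measured above;
    size_mb is the size of the compressed archive in megabytes.
    """
    if size_mb < 0 or slow_mb_per_sec <= 0 or fast_mb_per_sec < slow_mb_per_sec:
        raise ValueError("sizes must be non-negative and rates positive")
    return size_mb / fast_mb_per_sec, size_mb / slow_mb_per_sec


# Illustrative archive size of 6000 MB (an assumption, not a measurement).
best, worst = download_time_range(6000)
print(f"{best / 60:.1f} to {worst / 60:.1f} minutes")  # 4.0 to 6.7 minutes
```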
Hope this helps.
Enterprise Cloud Computing – What is it, exactly?
I am often asked to define cloud computing. The overwhelming marketing hype attached to the term at the moment has obscured the benefits, in both technology and economics, that are available if you look closely. My typical response goes something like this: cloud computing does not allow you to do anything you could not do before, but it does allow you to do those things faster, with better response times, and with a different and mostly better cost model. These benefits, when applied to enterprise software deployments, make the deployment and maintenance of enterprise software faster, easier, and less expensive.
For example, a cloud-based virtualization service like Amazon’s Elastic Compute Cloud (EC2) gives you virtual hardware instances that can be provisioned via an API. The virtual hardware, in and of itself, is no different from real hardware running in a data center. However, a new virtual instance in EC2 can be started in minutes, whereas a new box in a data center has to be purchased, installed, and configured over a period measured in hours, if not days. Reducing the capacity provisioning response time to minutes has a profound effect on the ability of an application to scale, since new capacity can be called up in response to load rather than in anticipation of it. This benefit is not limited to cloud-based virtualization services; cloud-based services for data, messaging, and applications all benefit from the ability to respond quickly to capacity scaling requests. This is one of the key benefits of cloud computing relative to traditional deployment approaches based on physical hardware.
In economic terms, the value proposition of cloud computing resides in its cost model. Cloud-based services, and cloud-based virtualization services in particular, have adopted a pay-as-you-go pricing model without initial costs. This is fundamentally different from the traditional enterprise software cost model in which the majority of costs are incurred immediately prior to deployment, and it changes the causality relationship between cost and revenue for enterprise application deployments. In a cloud-based service, cost trails revenue, and cost increases only in response to increasing load, rather than in advance of it. Additionally, cloud-based services can scale down in response to decreasing load, and this provides additional cost benefits that are simply unattainable in the traditional enterprise deployment model. The economic benefits of this pricing model will likely become an irresistible force in the marketplace for enterprise applications.
Ephemeral Drives in Amazon EC2 – When Are They Mounted?
Virtual instances running in Amazon’s EC2 service have several ephemeral disk drives that can be used for temporary storage (temporary because they are not persisted as part of the AMI). Recently, I had to figure out exactly when those drives are mounted and available during boot. The specific issue I was seeing was that I had registered some services to start automatically during boot, and those services started software packages that relied upon the ephemeral drives. This is on Windows 2003 Server, by the way; it is a non-issue on Linux, where the drives are mounted before the init sequence starts application-level processes.
Through some trial and error (and I’ll abridge the details here), I was able to determine that the ephemeral drives are ready in all respects after the following two services have started: Ec2Config and Virtual Disk Service (vds). It was a simple matter of creating service dependencies for my registered services to ensure that they started after Ec2Config and vds had started, and that fixed the glitch. I was using Cygwin, so I was able to use the cygrunsrv command to create the dependencies (via the --dep argument). People with more Windows kung fu would probably use regedit to do the same thing.
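As a rough illustration of the dependency setup, the Python sketch below only builds a cygrunsrv command line; it does not run it. The service name `myapp` and its path are hypothetical, and `--install` and `--path` are the cygrunsrv options as I understand them (only `--dep` is mentioned above), so check `cygrunsrv --help` on your instance before relying on it.

```python
def cygrunsrv_install_args(service_name, program_path, dependencies):
    """Build the argument list for registering a service with cygrunsrv.

    Each entry in dependencies becomes one --dep argument, so the new
    service is started only after those services have started.
    """
    args = ["cygrunsrv", "--install", service_name, "--path", program_path]
    for dep in dependencies:
        args += ["--dep", dep]
    return args


# Hypothetical service that needs the ephemeral drives to be ready.
command = cygrunsrv_install_args(
    "myapp", "/usr/local/bin/myapp", ["Ec2Config", "vds"]
)
print(" ".join(command))
```

On the instance itself, the resulting list could be handed to `subprocess.run(command, check=True)` from a Cygwin Python.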
Hope this helps.