“We create 5 exabytes every two days.”

This quote comes to us courtesy of Google’s Eric Schmidt, who was offering his top 10 reasons why mobile is #1 in the following InformationWeek article:

http://www.informationweek.com/news/global-cio/interviews/showArticle.jhtml?articleID=224400178

To paraphrase the context of this quote, all of the information created between the beginning of time and 2003 was about 5 EB, give or take (an exabyte is 10^18 bytes, or one billion GB). But today, according to him, “we” create 5 EB every 2 days, and it is not clear if this is the majestic plural “we”, as in Google, or “we” as in the entire world. But 5 EB is an awful lot of data (those drunken cat videos on YouTube are really starting to add up…).

Which brings me to my point: the signal-to-noise ratio of that 5 EB is probably low and getting lower every day. And I think we’re starting to get some hard data to prove it. For example, at Chirp, Twitter founder Biz Stone mentioned that Twitter has 105M+ registered users, with 300k new users each day, and 3B+ requests to their API each day. That is a lot of users and a lot of traffic. But how many of those users are active? 20%? 80%? According to the following article, 5% of those Twitter users are responsible for 75% of all traffic, with a rapid drop-off in activity beyond that 5%:

http://www.readwriteweb.com/archives/twitters_most_active_users_bots_dogs_and_tila_tequila.php

If the premise of this article is true, then only about 20% of Twitter users are active in a measurable way. And the most active 5% are clogging the pipes with an enormous volume of Tweets — from which we might infer that the signal-to-noise ratio of those Tweets is not very high. Ask yourself this question: what percentage of Tweets that you see are relevant to you in any way? These numbers will only get worse, information-density-wise, as more people sign up for Twitter but otherwise don’t use it in a measurable (or relevant) way.
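
To put some rough numbers on that, here is a quick back-of-envelope sketch in Python. The user and traffic figures are the ones quoted above from Chirp; the 5%/75% split comes from the ReadWriteWeb article; the 20% active figure is the inference above. Everything else is just arithmetic on those numbers.

    # Back-of-envelope arithmetic using the figures quoted above. The 5%/75%
    # split comes from the ReadWriteWeb article; the user and request counts
    # are Biz Stone's numbers from Chirp.
    registered_users = 105_000_000
    api_requests_per_day = 3_000_000_000

    active_fraction = 0.20          # users active in any measurable way
    heavy_fraction = 0.05           # users responsible for 75% of traffic
    heavy_traffic_share = 0.75

    active_users = registered_users * active_fraction
    heavy_users = registered_users * heavy_fraction

    per_heavy_user = api_requests_per_day * heavy_traffic_share / heavy_users
    per_other_user = (api_requests_per_day * (1 - heavy_traffic_share)
                      / (registered_users - heavy_users))

    print(f"active users:                {active_users:,.0f}")
    print(f"requests/day per heavy user: {per_heavy_user:,.1f}")
    print(f"requests/day per other user: {per_other_user:,.1f}")

Crude, and API requests are not the same thing as tweets, but it makes the point: the heaviest 5% generate a couple of orders of magnitude more traffic per head than everyone else.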

But back to the original quote. If Google is processing 5 EB of new data every two days, then they definitely need to keep building new data centers in the Columbia River gorge and elsewhere where power is inexpensive. But perhaps what they really need is better filtering technology to drop the 80% of that 5 EB that they don’t really need to keep. Maybe Google needs to keep 5 EB every 2 days to ensure that the 1 EB of good data buried within is not lost, but if they could separate the wheat from the chaff, then they could (1) build fewer data centers, which their investors would really like, and (2) deliver information to us with a much higher signal-to-noise ratio, which we would really, really like.

Google made a name for itself by making search results more relevant than any company that came before them; we can only hope that they (and Twitter, and Facebook, et al.) will continue to pursue information relevance with similar zeal as they process those 5 EB every two days. Otherwise, we should all plan for a whole lot more drunken cat videos.

“Go Screw Yourself, Apple.”

First, let me state for the record that I am currently an Adobe employee and that the opinions stated here are entirely my own and not in any way the official position of Adobe. And also let me state that the quote above is not from me (more on that later). And let me state that I am drinking beer right now. But I am also a hardcore developer, and I’m having a really, really hard time digesting the latest directives coming out of Cupertino. If I understand the new rules for the iPhone 4.0 SDK, I can only write code in C, C++, and Objective-C if I intend to compile that code, using any compiler, into an executable that runs on an iPad or iPhone. What. The. F.?

Before I launch this rant into orbit, here is the link to the post that gives us that priceless quote:

http://theflashblog.com/?p=1888

Believe it or not, the quote is actually from an Adobe employee (not me) in a blog post. I’m sure the author got some immediate love from Adobe Legal, so I am going to attempt to avoid the same fate and will not comment directly on the specifics of the new iPhone 4.0 SDK rules, Flash, and CS5. Instead, I’m going to step back and ask the real question that should be on everyone’s minds:

“Where have you gone, Joe DiMaggio Apple?”

Apple used to be the company that everyone loved. The underdog company in the 1990s that fought the good fight against Microsoft and largely lost. The company that took on the music labels and finally, finally gave us a way to purchase and play digital music that made sense with the iPod and iTunes. The company that made laptops cool again with MacBooks and gave all of us hackers in the corporate world a way to use a non-Windows machine. The company that re-invented the smartphone with the iPhone and broke the wireless carriers’ stranglehold on third-party applications on mobile devices with the AppStore. And, for better or worse, the company that gave us the iPad, which might just have a shot at initiating a paradigm shift in how we interact with and use our personal computers.

But in the past year or two, Apple seems to have been tempted by the dark side of the Force, and their response is looking more like Anakin than Luke. Which is another way to say that they are starting to look a lot like Microsoft in the 1990s, a.k.a. the company that we all loved to hate. They launched the iPhone with Google’s CEO on stage, for crying out loud, but now they are pursuing a proxy suit against Google and Android via their lawsuit against HTC. There are the widespread complaints about iPhone app rejections, and the complete lack of transparency (and perceived capriciousness) of their criteria for acceptance. There is their intransigent position on Flash and any technology that has the potential to bypass the AppStore business model. And now there are the latest rules, which dictate which languages are blessed for use when developing for the iPhone/iPad platform, with the severe implication that applications written in any other programming language — e.g. Java, or perhaps ActionScript — will be rejected from the AppStore.

Apple has built up a huge account balance of goodwill over the last decade, and they appear to have decided that it is time to empty that account. To be fair, Apple’s official statements seem to indicate that they believe that their inventions are being infringed (in the case of the HTC lawsuit); that they believe the AppStore requires a high bar for acceptance to maintain the quality of the experience on Apple devices; that they believe Flash is a security nightmare and CPU pigdog (to paraphrase some of the actual quotes that have been attributed to Apple’s CEO in the media); and that they believe that allowing people to compile their runtime interpreters into native applications is somehow an end-around move on their AppStore business model. But whichever side you take in these disputes, I think everyone has to accept that the tone of the conversation has irreversibly changed; with revenue comes arrogance, as they say in Silicon Valley (and probably everywhere else in the business world). The friendly days when Apple was a darling underdog and partner are gone forever, and we should get used to dealing with the 220 Billion Dollar Monster.

As a developer, I couldn’t be more turned off by a company that tells me which languages I can and can’t use. I want the freedom to choose the language that makes the most sense, or maybe seems the most fun to use, or maybe has the most available libraries for what I want to do, rather than being forced to live by someone else’s language laws. And I might just want to port some older code, rather than re-writing it from scratch in a “certified” language that is permitted by the device vendor. At the end of the day it is all object code running on the device, and the programming language that I used to capture my ideas before compiling them into object code isn’t really important. Unless you have a business model that you perceive to be at risk from certain programming languages, in which case it is very important…

So how does this end? I think it can only end badly for everyone. Some number of cool apps will never make it to certain devices under these rules. Some developers will be turned off and will leave this particular ecosystem for less-restrictive ones. And most importantly, all of the big companies will have their shields up and proton torpedoes armed and they will stop collaborating on the things that only big companies can do together. For example, remember how cool it was when the iPhone shipped with Google Maps and location awareness? Some might say that was revolutionary all by itself.

But I think this week was a very bad week for us all, because it clearly marks the end of any effective inter-corporate collaboration in the emerging mobile device universe.

“You are not Google. (or: you don’t really need NoSQL…)”

It’s great to see a thoughtful articulation of the other side of the “everybody needs to dump their SQL database” argument in this blog post:

http://teddziuba.com/2010/03/i-cant-wait-for-nosql-to-die.html

To paraphrase the author’s argument, the vast majority of applications out there will simply never see the load that would require a move to a NoSQL solution. Thinking about scalability is a very good thing to do, but the choice to make that move still needs to flow from a rational decision-making process.

I see this all the time in discussions I have about scalability and cloud computing. Everyone wants to claim that they are scalable (because scalable == smart in the current lexicon), but few can articulate a rational basis for why their (relatively small) application needs massive, search-engine-class scalability.
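
To make that concrete, here is the kind of back-of-envelope check I wish more people would do before reaching for a NoSQL stack. Every number in it is an illustrative assumption, not a benchmark; plug in your own measurements.

    # A rough "do I actually need NoSQL?" sanity check. All numbers are
    # illustrative assumptions -- substitute your own measurements.
    daily_active_users = 50_000
    requests_per_user_per_day = 200
    peak_to_average = 10            # assume bursty traffic with a 10x peak

    avg_rps = daily_active_users * requests_per_user_per_day / 86_400
    peak_rps = avg_rps * peak_to_average

    # Assumption: one well-tuned relational database on decent hardware can
    # serve a few thousand simple queries per second.
    single_db_capacity_rps = 2_000

    print(f"average load: {avg_rps:,.0f} req/s, peak: {peak_rps:,.0f} req/s")
    if peak_rps < single_db_capacity_rps:
        print("A boring SQL database (plus a read replica) is probably fine.")
    else:
        print("Now you have a rational basis for the scalability conversation.")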

“The largest cloud providers are botnets.”

I ran across an interesting article that compares the largest of the botnets — the Conficker botnet — with the largest web application providers and the size of their infrastructure. The article uses the term “cloud” in a way that I don’t necessarily agree with, since it uses it to describe the overall size of a company’s infrastructure (e.g., according to the article, Google has 500,000 servers) rather than the portion of that company’s infrastructure that is available for use as part of a genuine cloud-based service. But, overall, the article does illuminate the fact that botnets have an architecture very similar to that of cloud-based service providers, leveraging a virtual datacenter composed of compromised hardware around the globe. And the sheer size of the Conficker botnet, which, according to the article, is more than ten times the size of the largest web application provider, would make it the largest cloud provider in the world by an order of magnitude.

The article can be found here: http://www.networkworld.com/community/node/58829

This article is also noteworthy, I think, because it includes some sizing data for the largest players in the industry. If true, this data represents one of the first genuine, accurate comparisons of the infrastructure behind the biggest names on the web.

Observed Performance of Amazon EC2 Instances

A thread has been emerging around the observed performance of EC2 instances and the possibility that Amazon is experiencing capacity issues as their business continues to grow. Three excellent articles on this topic are linked below:

http://www.datacenterknowledge.com/archives/2010/01/14/amazon-we-dont-have-cloud-capacity-issues/

http://alan.blog-city.com/has_amazon_ec2_become_over_subscribed.htm

https://www.cloudkick.com/blog/2010/jan/12/visual-ec2-latency/

This is a question that I receive often in my day job, so I have a few comments to add to this thread. First, if you have used EC2 then you know that Amazon explicitly refuses to quantify the performance that you are entitled to receive in real-world units. Instead, they have created qualitative terms — “compute units” for CPU performance, “moderate” bandwidth, etc. — that provide a measure of comparison against the other levels of service that Amazon provides. In and of themselves, these qualitative designators are not a problem, except when trying to determine the potential variance between two elements that should have the same qualitative performance. For example, any two “large” EC2 instances running the same operating system should, in theory, deliver the same measurable performance against a reasonable benchmark. In practice, however, there is a degree of variance between elements that should be the same; testing across a sample of “large” EC2 instances produced SPECjvm2008 results that varied by roughly 30%. This would seem to indicate that some “large” instances are better than others.
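
For what it is worth, the measurement itself is not complicated. Here is a rough sketch of how one might summarize the spread once the same benchmark has been run on a handful of supposedly identical “large” instances; the scores below are hypothetical placeholders, not my actual SPECjvm2008 results.

    # Summarizing instance-to-instance variance. The scores list is a
    # placeholder -- in practice you would launch N "large" instances, run the
    # same benchmark (SPECjvm2008, sysbench, etc.) on each, and record the
    # composite score from each one.
    import statistics

    scores = [41.2, 39.8, 29.0, 40.1, 31.7, 38.9]   # hypothetical scores

    best, worst = max(scores), min(scores)
    spread = (best - worst) / best

    print(f"mean={statistics.mean(scores):.1f}  best={best:.1f}  worst={worst:.1f}")
    print(f"best-to-worst spread: {spread:.0%}")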

Second, if you have ever asked Amazon any question about their existing capacity, their existing utilization, or their rate of capacity growth, then you probably received a polite “we don’t break out results or data for our AWS unit” response. But there are some hard data points that can be detected externally. For example, if you have attempted to start a large number of EC2 instances simultaneously, then you might have seen an error message stating that not enough instances of the requested type were available. In my experience, this response has been exceptionally rare, which is a testament to Amazon’s capacity planning. It does provide a hard data point, however: one would expect the frequency of this response to increase if EC2 were becoming oversubscribed. In the thread thus far, I have not read that anyone has detected a measurable increase in the frequency of this response, and thus one might conclude that EC2 utilization remains below the saturation level, or at least remains similar to historical levels.
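
If you wanted to track that signal yourself, a sketch like the following would do it: periodically try to launch a batch of instances and log whether EC2 comes back with its capacity error. I am assuming the boto library here, AMI_ID is a placeholder for your own image, and the error code I believe corresponds to that message is InsufficientInstanceCapacity.

    # Periodically probe EC2 for capacity errors. Assumes the boto library;
    # AMI_ID is a placeholder for your own image. Terminates whatever it
    # manages to launch so the probe doesn't get expensive.
    import time
    import boto
    from boto.exception import EC2ResponseError

    AMI_ID = 'ami-xxxxxxxx'          # placeholder
    BATCH_SIZE = 20
    INSTANCE_TYPE = 'm1.large'

    conn = boto.connect_ec2()        # credentials from the environment

    def probe_capacity():
        try:
            r = conn.run_instances(AMI_ID, min_count=BATCH_SIZE,
                                   max_count=BATCH_SIZE,
                                   instance_type=INSTANCE_TYPE)
        except EC2ResponseError as e:
            if e.error_code == 'InsufficientInstanceCapacity':
                return False         # the hard data point we are looking for
            raise
        conn.terminate_instances([i.id for i in r.instances])
        return True

    while True:
        ok = probe_capacity()
        print(time.strftime('%Y-%m-%d %H:%M:%S'),
              'capacity available' if ok else 'CAPACITY ERROR')
        time.sleep(3600)             # probe hourly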

And lastly, I do have one comment about network performance. I have observed network latency issues of the type described in the cloudkick blog, but only for smaller EC2 instance types that could be assumed to be sharing hardware and network interface ports with other EC2 instances. I have not observed latency issues with larger EC2 instance types that could be assumed to consume an entire piece of hardware. I am clearly making some unsubstantiated assumptions here, but my guess is that the observed network latency issues could be a sharing or starvation issue in the virtualization infrastructure, rather than a true network capacity problem. But I would stress that this is just an educated guess, so your mileage may vary.
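
If you want to check the latency issue for yourself, you do not need anything fancier than timing TCP connections to a peer instance and looking at the tail of the distribution. The addresses below are placeholders, and port 22 simply assumes sshd is listening on the peers.

    # Rough latency sampling between EC2 instances: time a TCP connect to each
    # peer and report the median and tail. PEERS holds placeholder addresses.
    import socket
    import time

    PEERS = {
        'small-instance peer': ('10.0.0.11', 22),   # placeholder private IPs
        'large-instance peer': ('10.0.0.12', 22),
    }
    SAMPLES = 200

    for label, (host, port) in PEERS.items():
        rtts = []
        for _ in range(SAMPLES):
            start = time.time()
            try:
                socket.create_connection((host, port), timeout=2).close()
                rtts.append((time.time() - start) * 1000.0)   # milliseconds
            except socket.error:
                rtts.append(2000.0)                           # count timeouts
            time.sleep(0.05)
        rtts.sort()
        p50 = rtts[len(rtts) // 2]
        p99 = rtts[int(len(rtts) * 0.99)]
        print(f"{label}: median {p50:.1f} ms, p99 {p99:.1f} ms, "
              f"max {max(rtts):.1f} ms")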

Hope this helps.

Cloud Computing and Mobile Devices

The explosive proliferation of mobile devices — smartphones, netbooks, and tablets — presents new challenges for software development. These devices have limited screen size, limited CPU and memory resources, and, most importantly, limited power; these constraints will complicate the direct migration of existing thick-client desktop software products to these devices. Computationally expensive applications will be especially sensitive to these constraints, given that most devices employ CPU throttling to conserve power and preserve battery life for other functions, so CPU use on these devices will need to be minimized wherever possible.

Future advances in technology may alleviate some of these concerns, but battery technology has traditionally failed to keep pace with Moore’s Law, especially in cases of miniaturization, and thus the power concerns, and by extension the CPU concerns, may persist.

Cloud computing provides a potential solution for these concerns. In terms of power consumption, cloud computing provides a source of remote CPU cycles that do not consume device power, and these remote CPU cycles can enable computationally expensive applications to run on devices at a significantly lower net power cost to the device. Network power consumption will increase marginally when using cloud-based resources, but the hypothesis is that overall device power consumption will be significantly reduced.
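
A small worked estimate illustrates the hypothesis. Every power and timing figure below is an assumption I made up for illustration, not a measurement; the point is the shape of the comparison, not the specific joules.

    # Illustrative energy comparison for offloading a compute-heavy task.
    # All power and timing figures are rough assumptions, not measurements.
    cpu_power_w   = 1.0     # device CPU running flat out
    radio_power_w = 0.8     # radio while transmitting/receiving
    idle_power_w  = 0.05    # device mostly idle while waiting on the cloud

    local_runtime_s = 120.0     # task run entirely on the device
    payload_mb      = 2.0       # data shipped to and from the cloud
    link_mbps       = 2.0       # effective throughput
    cloud_runtime_s = 10.0      # same task on a large cloud instance

    transfer_s = payload_mb * 8 / link_mbps
    energy_local   = cpu_power_w * local_runtime_s
    energy_offload = radio_power_w * transfer_s + idle_power_w * cloud_runtime_s

    print(f"local:   {energy_local:.1f} J")
    print(f"offload: {energy_offload:.1f} J  (including {transfer_s:.1f} s of transfer)")

Under these assumptions the offloaded version costs the device roughly 7 J instead of 120 J; the assumptions are debatable, but the gap is large enough to survive a lot of debate.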

I submit that a new application paradigm for these devices will need to evolve from the seeds of cloud computing, web applications, and client/server software in order to minimize device power consumption while otherwise providing a recognizable application user interface. The tools and infrastructure for these applications must be designed to maintain a bidirectional stream of data, visuals, and interface actions between a device and a cloud-based application provider, and to do so in a manner that will be both cost effective and beneficial for the power consumption of the device.
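
Here is a minimal sketch of the device side of that pattern: the expensive work is shipped to a cloud endpoint and the device only parses and displays the result. The endpoint URL and payload format are hypothetical placeholders, not a real service.

    # Minimal device-side sketch of the offload pattern described above.
    # The endpoint and payload shape are hypothetical placeholders.
    import json
    import urllib.request

    CLOUD_ENDPOINT = 'https://render.example.com/api/v1/jobs'   # hypothetical

    def offload(task_name, params):
        """Ship an expensive task to the cloud and block for the result."""
        body = json.dumps({'task': task_name, 'params': params}).encode('utf-8')
        req = urllib.request.Request(CLOUD_ENDPOINT, data=body,
                                     headers={'Content-Type': 'application/json'})
        with urllib.request.urlopen(req, timeout=30) as resp:
            return json.loads(resp.read().decode('utf-8'))

    # The heavy lifting (rendering, transcoding, analysis) happens in the
    # cloud; the device spends its power budget on the display, not the CPU.
    result = offload('thumbnail_render', {'asset_id': 'demo-123', 'size': 256})
    print(result)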

Time and Clock Issues in Windows-Based EC2 Instances

I’ve recently observed some anomalies in Windows-based EC2 instances that I think are worth sharing. The primary issue appears to affect the clock setting on some of the instances, but my guess is that there is an underlying hardware-dependent bug in the virtualization layer that is the cause of this issue and some other related side-effects (more on those later).

I built and I continue to operate a public facing enterprise software-as-a-service (SaaS) product for my company. Behind the scenes, I run both Linux-based and Windows-based EC2 instances to facilitate the functions of the service, and at times we might run large numbers of EC2 instances concurrently. Lately, I’ve started to see some Windows-based instances that intermittently boot with an incorrect clock setting. The time zone appears to be incorrect, and the time on the box is UTC instead of PST. The time, in hours/minutes/seconds, appears to be otherwise correct, but it is off by eight hours due to the time zone. Ordinarily, I wouldn’t be too worried about this, but in my case I use the S3 API on the instance, and the S3 API calculates a security signature for all requests that incorporates the current time of the machine in the algorithm. If the time setting of the S3 caller differs by more than 15 minutes from the time setting of the S3 servers, then the request will be bounced by the server with the following error message:

The difference between the request time and the current time is too large

The failure of this S3 request is a major problem for my application, so I dug into this to see what was going on. As far as I can tell, Windows is failing to sync up with time.windows.com via NTP, and the result of this failure seems to be the incorrect time zone setting (although I can’t tell you exactly why that failure would produce that outcome). I switched to an alternate NTP server at nist.gov (time-nw.nist.gov), and this appears to resolve the issue.
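
If your application is sensitive to this, it is worth checking the skew yourself at boot rather than waiting for S3 to bounce a request. Here is a sketch using the ntplib package (my assumption; any NTP client will do), with the sort of Windows time-service commands that re-point the server included as comments.

    # Detect clock skew before it breaks S3 request signing. Uses the
    # third-party ntplib package (an assumption); the reference server is the
    # alternate NTP server mentioned above.
    import ntplib

    MAX_SKEW_SECONDS = 15 * 60      # S3 rejects requests skewed beyond this

    response = ntplib.NTPClient().request('time-nw.nist.gov', version=3)
    print(f"local clock offset: {response.offset:.1f} seconds")

    if abs(response.offset) > MAX_SKEW_SECONDS:
        print("Skew exceeds the S3 window; expect signing failures.")
        # On Windows, something like the following (run as Administrator)
        # re-points and resyncs the time service:
        #   w32tm /config /manualpeerlist:"time-nw.nist.gov" /syncfromflags:manual /update
        #   w32tm /resync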

Some further searching has led me to believe that there may be an underlying hardware-related bug in the virtualization layer that affects the ability of Windows to reach the time.windows.com service, as well as some other applications (e.g. Cygwin, and Bash and Perl running via Cygwin), and that this bug is observed only on AMD hardware. This is just a guess, however, and I cannot say conclusively that this is a hardware-related issue since there is no transparency into EC2. But if you are running an application that is sensitive to the clock setting on an EC2 instance, then you should pay attention to the time zone setting of your instances as they boot.

Hope this helps.

Entropy in Cloud Computing Applications

Entropy, as it pertains to computer science and cryptography, is one of those topics that most of us (myself included) largely take for granted these days. In this context, entropy is a source of randomness that is typically collected by the operating system and made available to applications via a pseudorandom number generator (PRNG). We tend to implicitly trust that our applications have a source of entropy that is sufficiently random to ensure that the strength of our cryptographic techniques — SSL handshakes, SSH keys, and the wide variety of other cryptographic techniques used in modern public-facing applications that rely upon a pseudorandom number generator — is as strong, algorithmically speaking, as expected. But what happens when that source of entropy is not as strong as we think it is?

An excellent case study of what happens when a source of entropy is not as random as expected can be found in the weakness that was introduced into the Debian Linux package of the OpenSSL library and disclosed in May of 2008 (see http://www.debian.org/security/2008/dsa-1571). A change was made to a single line of code in the open source OpenSSL package in order to clean up the output of Purify and Valgrind as part of the build and test sequence. This minor change had a side effect: it caused the pseudorandom number generator within the OpenSSL library to be predictable because it tightly constrained the number of possible seed values that could be used. Consequently, any cryptographic keys generated using this source of entropy could be guessed within a relatively short period of time using a brute-force attack constrained to the small set of possible seed values for the pseudorandom number generator. Once discovered, the issue was addressed quickly, but it illustrates an excellent point about entropy in software applications: a reduction in the quality of a source of entropy can be very difficult to detect if you are not specifically looking for it.
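
To make the failure mode concrete, here is a toy illustration (not the actual OpenSSL code path) of why a tiny seed space is fatal. The Debian bug left roughly a process ID’s worth of entropy, on the order of 32,768 possibilities, which an attacker can simply enumerate.

    # Toy illustration of the Debian/OpenSSL failure mode: if a PRNG's seed
    # space collapses to something tiny, every "random" key can be enumerated.
    import random

    def generate_key(seed):
        """Stand-in for key generation driven by a predictably seeded PRNG."""
        return random.Random(seed).getrandbits(128)

    # The victim's "random" seed is drawn from a tiny space (e.g. a process ID).
    victim_seed = 12345
    victim_key = generate_key(victim_seed)

    # The attacker enumerates the entire seed space in well under a second.
    for guess in range(1, 32769):
        if generate_key(guess) == victim_key:
            print(f"recovered key {victim_key:x} from seed {guess}")
            break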

So what does any of this have to do with cloud computing? The current “best practice” for the collection of entropy by an operating system is to collect keyboard timings, mouse movements, network interrupts, disk drive head seeks, and other operating system events that are collectively random and can be processed to generate a stream of randomness to seed the pseudorandom number generator. This works reasonably well for a desktop or laptop that has a keyboard and a mouse and is being used interactively, in an arbitrary fashion, by a human. It can also be made to work for server hardware, although the rate of entropy generation is slower (and thus activities like key generation are slower) when a human with a keyboard and a mouse is not actively involved, since the technique relies more heavily on unattended events like network and disk use. And this is where a potential problem arises for entropy in cloud computing: a set of virtual machine instances running within a cloud-based virtualization service could potentially share a source of entropy from the underlying hardware. If the instances all share a single piece of underlying physical hardware, then they also share the same set of network and disk events, and a clever attacker might be able to predict the stream of entropy that an application on one of those instances will consume.
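
One simple thing you can do today on a Linux-based instance is keep an eye on the kernel’s entropy pool; a persistently starved pool is not proof of anything, but it is at least a warning sign that your instance is not gathering much randomness on its own. A minimal sketch:

    # Watch the Linux kernel's entropy pool on a cloud instance. A persistently
    # low value (the pool tops out at 4096 bits on typical kernels) means
    # blocking reads of /dev/random will stall and key generation will be slow.
    import time

    def entropy_available():
        with open('/proc/sys/kernel/random/entropy_avail') as f:
            return int(f.read().strip())

    while True:
        print(time.strftime('%H:%M:%S'), entropy_available(), 'bits')
        time.sleep(5)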

There are other techniques for entropy generation (e.g. hardware entropy generators, software techniques involving samples from a microphone or webcam, and entropy services available via the internet) that can be employed to attenuate or eliminate the potential threat of shared entropy sources in cloud computing environments, and as cloud computing environments continue to mature there will undoubtedly be advances in this area to address this issue. In the interim, however, we should all take a closer look at the use of entropy within our cloud-based applications to ensure that we haven’t introduced a “side effect” that will have serious security implications.

How to Jailbreak iPhone OS 3.0.1

Apple just released the iPhone OS 3.0.1 firmware update, and that means it is time to update my jailbroken iPhone to 3.0.1 and then jailbreak it again. In the past, I have been a happy user of PwnageTool for the jailbreak, and I would be again except that PwnageTool hasn’t been updated yet for the 3.0.1 firmware. Doh! I could just wait for the PwnageTool update, but the firmware update addresses an SMS exploit that can give someone root on your phone. So I guess I’d better find a way to do this without PwnageTool.

After the requisite sync and aptbackup, I decided I would first try a quick hack and see how smart PwnageTool is. I put PwnageTool in expert mode and browsed to the 3.0.1 firmware IPSW to see if I could trick PwnageTool into building a custom IPSW from it. No such luck — PwnageTool checks the firmware and simply won’t proceed if it isn’t a supported IPSW version (and 3.0.1 is not supported in the current version of PwnageTool). So I guess I really do need to use something other than PwnageTool for the jailbreak.

Luckily, I found a post on the dev-team blog that says you can use redsn0w 0.8 to jailbreak the 3.0.1 firmware provided that you use the 3.0 IPSW as a base. Apparently the changes in 3.0.1 are very minimal, and the redsn0w jailbreak procedure only changes a few things within the existing firmware, rather than completely overwriting it as PwnageTool seems to do. I couldn’t find any good postings with a complete set of instructions on how to do this with redsn0w, but here is what ultimately worked for me:

  1. Connect your phone to iTunes and do a sync. Always good to start with this.
  2. Run aptbackup and select “Backup” so we can restore Cydia after the upgrade and jailbreak.
  3. In iTunes, restore your iPhone. This will also upgrade the firmware to the official 3.0.1 from Apple.
  4. Run redsn0w 0.8, and select the 3.0 IPSW (iPhone1,2_3.0_7A341_Restore.ipsw) firmware from ~/Library/iTunes/iPhone Software Updates
  5. Follow the instructions to put the phone in DFU mode. Note these are different than how PwnageTool does it, and you need to start with your phone off and connected to iTunes.
  6. Once you are in DFU mode, kick off the jailbreak.
  7. At some point during the jailbreak, redsn0w told me it was waiting for a reboot. I waited quite a while, and it seemed to be hung. As a last resort, I decided to unplug the iPhone and start over. I unplugged the iPhone and plugged it back in, and…voilà! The phone jumped into the redsn0w firmware loader screen and the jailbreak proceeded to completion. I don’t know if I was supposed to do this or not (like I said, I don’t normally use redsn0w)…but it worked.
  8. After a little while my phone came back to life and rebooted and the jailbreak appeared to have succeeded, with Cydia installed.
  9. Run aptbackup and select “Restore”. As part of the process, Cydia asked to upgrade a bunch of essential packages.
  10. One more reboot to check everything and…all done. The firmware revision is now 3.0.1 according to iTunes, and I have all of my jailbroken applications restored and in place.

That’s it. I hope this helps. And I hope to see PwnageTool updated in the near future, since it has several features (like custom boot images) that I would like to use with my iPhone.