I’ve recently observed some anomalies in Windows-based EC2 instances that I think are worth sharing. The primary issue affects the clock setting on some of the instances, but my guess is that an underlying hardware-dependent bug in the virtualization layer is causing both this issue and some related side effects (more on those later).
I built and continue to operate a public-facing enterprise software-as-a-service (SaaS) product for my company. Behind the scenes, I run both Linux-based and Windows-based EC2 instances to support the service, and at times we run large numbers of EC2 instances concurrently. Lately, I’ve started to see some Windows-based instances that intermittently boot with an incorrect clock setting: the time zone is wrong, so the box shows UTC instead of PST. The hours, minutes, and seconds are otherwise correct, but the clock is off by eight hours because of the time zone. Ordinarily I wouldn’t be too worried about this, but in my case I use the S3 API on the instance, and the S3 API calculates a security signature for every request that incorporates the machine’s current time. If the caller’s clock differs from the S3 servers’ clock by more than 15 minutes, the server rejects the request with the following error message:
The difference between the request time and the current time is too large
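As a rough illustration of the 15-minute rule, here is a minimal Python sketch that compares the local clock against the `Date` header of any HTTP response from a trusted server. The function names and the idea of using a `Date` header as the reference are my own assumptions for illustration, not part of the S3 API.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Tolerance described above: requests off by more than 15 minutes are rejected.
MAX_SKEW = timedelta(minutes=15)


def clock_skew(date_header, local_now):
    """Return local time minus server time (hypothetical helper).

    date_header is an HTTP Date header such as 'Tue, 15 Nov 1994 08:12:31 GMT';
    local_now must be a timezone-aware datetime.
    """
    server_time = parsedate_to_datetime(date_header)
    return local_now - server_time


def is_skew_too_large(date_header, local_now, limit=MAX_SKEW):
    """True if the local clock is further from the server clock than limit."""
    return abs(clock_skew(date_header, local_now)) > limit
```

In practice, `local_now` would be `datetime.now(timezone.utc)` and the header would come from a response you already have in hand. A clock that is eight hours off, as in my case, fails this check by a wide margin.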
The failure of this S3 request is a major problem for my application, so I dug in to see what was going on. As far as I can tell, Windows is failing to sync with time.windows.com via NTP, and the incorrect time zone setting seems to be a result of that failure (although I can’t tell you exactly why that error would produce that outcome). I switched to an alternate NTP server at nist.gov, time-nw.nist.gov, and this appears to resolve the issue.
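For anyone who wants to script the NTP server change, the following is a sketch of one way to do it with the built-in `w32tm` tool, driven from Python. The helper names are hypothetical, and I am assuming the script runs from an elevated prompt with the Windows Time service available; adjust to your own setup.

```python
import subprocess
import sys


def w32tm_commands(ntp_server):
    """Build the w32tm invocations that point Windows at ntp_server."""
    return [
        # Use a manually specified peer list instead of the default source.
        ["w32tm", "/config", f"/manualpeerlist:{ntp_server}",
         "/syncfromflags:manual", "/update"],
        # Ask the time service to resynchronize right away.
        ["w32tm", "/resync"],
    ]


def apply_ntp_server(ntp_server):
    """Run the commands; raises if not on Windows or if a command fails."""
    if sys.platform != "win32":
        raise RuntimeError("w32tm is only available on Windows")
    for cmd in w32tm_commands(ntp_server):
        subprocess.run(cmd, check=True)
```

Calling `apply_ntp_server("time-nw.nist.gov")` at instance startup would apply the same change I made by hand.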
Some further searching has led me to believe that there may be an underlying hardware-related bug in the virtualization layer that affects the ability of Windows to reach the time.windows.com service, and that also affects some other applications (e.g. Cygwin, and Bash and Perl running via Cygwin). The bug seems to be observed only on AMD hardware. This is just a guess, however, and I cannot say conclusively that this is a hardware-related issue, since there is no transparency into EC2. But if you are running an application that is sensitive to the clock setting on an EC2 instance, then you should pay attention to the time zone setting of your instances as they boot.
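One way to keep an eye on the time zone at boot is a small check like the sketch below. It assumes the instance is supposed to be on Pacific time (UTC-8 for PST, UTC-7 during daylight saving); the constant and function names are hypothetical, so substitute whatever offsets your own instances should report.

```python
from datetime import datetime, timedelta

# Assumption: instances should be on Pacific time (PST = UTC-8, PDT = UTC-7).
EXPECTED_OFFSETS = {timedelta(hours=-8), timedelta(hours=-7)}


def local_utc_offset():
    """Return the UTC offset the operating system currently reports."""
    return datetime.now().astimezone().utcoffset()


def offset_is_expected(offset, expected=EXPECTED_OFFSETS):
    """True if the reported UTC offset is one of the expected values."""
    return offset in expected
```

A startup script could call `offset_is_expected(local_utc_offset())` and raise an alert, or hold off on S3 calls, when it returns `False`. An instance that came up on UTC would report an offset of zero and fail the check.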
Hope this helps.