About this blog

'Going Spatial' is my personal blog; the views on this site are entirely my own and should in no way be attributed to anyone else or taken as the opinion of any organisation.


Friday 30 November 2012

What has it been like after 12+ months?

My company has made a fundamental shift towards Cloud computing and specifically towards Amazon Web Services (AWS) in the last 12 months.

Quoting Wikipedia (it's always good to start with someone else's quote!):

 Cloud computing is the delivery of computing as a service rather than a product, whereby shared resources, software, and information are provided to computers and other devices as a utility (like the electricity grid) over a network, typically the Internet.

Why did we do it? At the technical level it was always something that we wanted to do but, like all projects, it needed strategic buy-in. The potential cost savings (as published in the media), as well as the flexible nature of the cloud, meant that it would have been foolish not to try the cloud, if only to compare it with a 'traditional' hosting environment. At a strategic level, our parent company had already migrated some of their main-line hosting applications to the cloud via AWS, so there was a path already set out (albeit with differences in application and customers).

So the process of migrating to the cloud took a number of distinct steps.

1. Planning
2. Test Deployment
3. Parallel Running
4. Go Live
5. Update documentation and processes


Pencil and paper to start
One needs a firm grip on any deployed architecture because, under AWS pricing, it is the architecture that determines the cost per month. Get it wrong or miss something and it shows up as an increase in the monthly bill. So, before we started moving to the cloud we had to ask ourselves: is our architecture flexible enough to migrate over? More importantly, did we have the system architecture documented; did we know what pieces we were working with?

Unfortunately, what we had was not quite up to scratch, so the first step was an inventory of the systems ear-marked for 'cloud deployment'. It was a useful exercise and took about four weeks to complete, though the process should now become an on-going one. At the end of this exercise, we had a good idea of the proposed architecture. With this, I had to sit down and justify the cost of this venture to my project team and those above me. Initially I went over to the product pricing pages (http://aws.amazon.com/ec2/#pricing for EC2 pricing and http://aws.amazon.com/rds/pricing/ for RDS prices), threw together a tonne of spreadsheets and tried to work out an annual cost. A bit of a headache, as AWS pricing also includes a lot of variables relating to data transfer, the use (or non-use) of Elastic IP addresses and volume storage, to name three. However, there is a handy monthly cost calculator available (http://calculator.s3.amazonaws.com/calc5.html) which was a God-send; if only I had known of this tool when I first started on our AWS adventure. Of course, the tool's accuracy improves the more of the relevant information you can put into the appropriate boxes, which is exactly why the review and inventory step was so useful: it put that information at my fingertips.

So I had my architecture and a proposed monthly figure in euros, to which I added 15% contingency, as you just know something will happen! I calculated the annual cost and presented it for budgetary consideration. Since the decision to move to the cloud had already been made, surely my figure would not deter them? Luckily, my calculations were not that far off and well within the budget. So, green light and all systems go! It was a near thing though, as most people hadn't thought past the initial headline costs ('instances from $0.03 an hour!') and were mildly surprised at the proposed figure.
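The budgeting arithmetic is simple enough to sketch. The monthly figure below is purely hypothetical (our real estimate came out of the AWS cost calculator); only the 15% contingency and the monthly-to-annual step are taken from the process described above:

```python
# Back-of-the-envelope budget sketch. The monthly estimate here is a
# made-up placeholder, not our real figure.
monthly_estimate_eur = 2000.00   # hypothetical AWS calculator output
contingency = 0.15               # because something always happens

monthly_with_contingency = monthly_estimate_eur * (1 + contingency)
annual_budget_eur = monthly_with_contingency * 12

print(f"Monthly (with contingency): EUR {monthly_with_contingency:,.2f}")
print(f"Annual budget figure:       EUR {annual_budget_eur:,.2f}")
```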


With a plan and a road-map now in mind, it was time to prototype the deployment. We had to get used to the AWS dashboard and the available tools, as well as observe the performance of the applications in this new and exciting environment. Back when we started, a lot of the products were not yet available, so the functionality was limited.
A test environment of SimpleDB plus a micro instance (the smallest AWS virtual machine available) was commissioned and then tested. SimpleDB is AWS's simple and flexible non-relational datastore. It was quick and easy to set up and worked for the initial testing and deployment stage. We used it to store all our user and audit data. When we started it was quite cheap, but now SimpleDB is free for the first 25 machine hours consumed and up to 1GB of storage per month! More details of Amazon SimpleDB can be found here: http://aws.amazon.com/simpledb/

The initial test did reveal a number of specific issues: we immediately encountered problems with security groups and the incompatibility of SimpleDB with our security model. However, the ease of deployment from some stock template Amazon Machine Images (AMIs) meant that we could easily set up a new environment in a matter of minutes. The performance of the web site and application was as fast as we had ever seen it, and all for a few dollars; a bargain, we thought, and we were impressed. Of course, there were still unanswered questions over alerting and monitoring, management reporting and metrics, but one step at a time, eh?

Another issue we discovered during the SimpleDB testing was the relatively slow interaction with the micro instance via remote desktop. Some additional gotchas: do not use the Windows 2008 32-bit edition on a micro instance. With this version, everything was just dreadfully slow. We were advised by AWS to switch to the Windows 2008 64-bit edition, and once we switched over, things went a lot more smoothly. Of course, the specifications of the micro instance only just met the recommended installation requirements for our software, but no matter.
Out of interest we looked at the specifications of the micro instance and this is what we found: 613 MB of memory, up to 2 ECUs (for short periodic bursts), EBS storage only, 32-bit or 64-bit platform. Now, interestingly, from experience the minimum RAM required for Windows 2008 is 512 MB, but in reality we don't commission anything with less than 4GB (32-bit) or 8GB (64-bit). Even though we theoretically had about 100MB of RAM to spare, the micro instance, once started up, had about 5MB left and would be frantically paging to disk. Basically, the micro instance cannot be used for anything remotely useful commercially; which is a smart move by AWS.

We also realised how quickly Amazon Web Services were bringing out new products almost every month!

RESERVED INSTANCES OR NO? (sorry digression warning!)

On Demand Instances (one year), Large Instance
  Hourly Rate:    $0.48
  Cost per year:  $4,204.80

It was at this time that we looked at reserved instances, and the concept was simple: pay AWS an upfront setup fee and get a much-reduced hourly fee for either a one-year or three-year period. If the planned utilisation of instances was low (i.e. you would be running an instance only for a few hours per week) then the reserved option was not economical. However, if monthly utilisation is high, say 100%, then reserved instances make financial sense and should be actively considered. Here are some tables I put together to convince my manager that reserved instances were a good idea.
If we're just paying the on-demand, pay-as-you-go fee, then for one large AWS instance the cost is $4,204.80 a year, and this figure does not take into account the use of other AWS products such as S3, RDS, CloudWatch etc. This is the pure cost of running one large instance, 24/7.

Now, $4,204.80 sounds a lot, but for this you get a very high performance server running Windows 2008 64-bit Advanced. While virtual, it is equivalent to a quad-core server, already licensed and ready to go.

However, if one pays a setup fee of $1,105, the hourly cost drops from $0.48 to $0.205. With 100% utilisation the annual running cost drops to $1,795.80 and, with the setup fee included, this results in a net saving of $1,304.00 per year per instance.
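The break-even point between on-demand and reserved pricing is easy to compute from the figures above ($0.48/hour on demand, versus a $1,105 setup fee plus $0.205/hour reserved, one-year term); a quick sketch:

```python
# Reserved vs on-demand break-even for one large instance, using the
# rates quoted above (one-year reserved term).
HOURS_PER_YEAR = 24 * 365          # 8,760

on_demand_rate = 0.48              # $/hour, pay as you go
reserved_rate = 0.205              # $/hour, after the setup fee
setup_fee = 1105.00                # one-off, one-year term

on_demand_annual = on_demand_rate * HOURS_PER_YEAR
reserved_annual = setup_fee + reserved_rate * HOURS_PER_YEAR

# Hours of usage at which the reserved instance starts paying for itself.
break_even_hours = setup_fee / (on_demand_rate - reserved_rate)

print(f"On demand, 24/7: ${on_demand_annual:,.2f}")
print(f"Reserved, 24/7:  ${reserved_annual:,.2f}")
print(f"Saving at 100% utilisation: ${on_demand_annual - reserved_annual:,.2f}")
print(f"Break-even at ~{break_even_hours:,.0f} hours "
      f"({break_even_hours / HOURS_PER_YEAR:.0%} utilisation)")
```

So anything above roughly 46% utilisation and the reserved instance wins; below that, stick with on-demand.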

Oh, and as of November 2011, AWS added some more product differentiation to the reserved instances. Previously there were two rates: one for a one-year deal and a second for a three-year deal. Now we have the same two rates, but spread across three 'utilisation' groups: light, medium and heavy. The differences relate to how you are charged when you have a reserved instance and it is NOT being used, i.e. 'lights on, no-one home'. We went with heavy utilisation.

Reserved Instances (Heavy Usage), Large Instance

                     One year    Three year
  Hourly Rate:
  Cost per year:
  Cost over term:
Pretty good numbers and as they say, a real no-brainer if you need to save some money!
(edit: since this post was originally drafted, there have been further changes to reserved pricing, with light-, medium- and heavy-utilisation rates now available. Check them out here!)


Of course, we couldn't just flip on our application, go live and be done with it. The prototype worked, but what about the 'proper' application and data? The next step was to introduce the finalised architecture slowly. We had decided on 'live', 'stage' and 'dev' environments, as recommended by our solution architect. Each environment would have its own elastic load balancer and at least two standard instances behind it. That is six instances; the micro instance used during the test cost $0.035 per hour, while the standard instance (confusingly classed as 'large') cost $0.48 per hour, so with six of them the hourly compute bill shot up alarmingly from $0.035 to $2.88!

DATA, DATA, how it vexes me, DATA

Since I worked for a mapping company we had lots of data, and one challenge was to get that data into the cloud and onto the AWS instances. Two of our instances were our mapping servers, with all the data and server software installed on each.
There were a number of options available to us and we considered them all. The options included:

1. Upload everything to S3.
2. Use the AWS export/import service.
3. Install an FTP client/server and use that to move the data.
4. Use the RDP copy function.

Using S3

Now, S3 (Simple Storage Service) is Amazon's fast, inexpensive cloud storage offering, which Amazon themselves use. You store things in a conceptual folder called a 'bucket'. Like the other AWS products, one only pays for as much as one uses; the cost of storage is available here (http://aws.amazon.com/s3/#pricing), with additional costs for bundles of requests to and from the S3 bucket.

We had several hundred gigabytes to move into the cloud. We decided to use the very useful Mozilla Firefox add-on 'S3Fox' to move our data onto our S3 bucket. We then installed S3Fox on each of the instances so that we could download the new data. We took timings and, not surprisingly, the download speed from an AWS S3 bucket to an AWS instance was very fast. The bottleneck was our own internet connection's speed (though not too shabby at 60 Mbps) and any limitations on upload bandwidth.

There were some gotchas, all to do with file sizes.

First, the S3 bucket was limited to objects no larger than 1TB in size[1], which posed a number of problems for us since our spatial data was already in the terabyte range. So we had to chunk up the data prior to transfer. However, S3Fox itself appeared to be limited to files no bigger than 50GB; anything bigger and it would fail. We suspected that this was more to do with the 32-bit operating systems we were using. This further compounded our problem, but it forced us to reduce each chunk to no more than 50GB. For a 1.2 TB file, it meant creating about 25 files, which we would upload sequentially to the S3 bucket in the cloud, then download each chunk to our selected instance before stitching it all back together again. The tools we used for this included 7zip[2] and some Linux tools.
The entire upload process, moving our data from our own local servers to the AWS instance via our S3 bucket, took about a week to complete. Once the data was successfully stitched together we copied the contents to a separate Elastic Block Store (EBS) volume[3] and attached it to our mapping server instance as a new drive (volume 'E'), since the default instance only came with a 35GB C drive and a 100GB D drive, both too small for our needs. We also made a copy of this volume to attach to the second of our mapping server instances; a lot easier than downloading the data a second time.
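A minimal sketch of the chunk-and-reassemble workflow, with toy 1MB chunks standing in for the 50GB ones (in practice we used 7zip and some Linux tools rather than a script like this):

```python
# Split a big file into fixed-size chunks for upload, then stitch the
# downloaded chunks back together. Toy sizes only; think 50GB per chunk.
import os

CHUNK_SIZE = 1024 * 1024  # 1 MB for this sketch

def split_file(path, chunk_size=CHUNK_SIZE):
    """Split `path` into numbered .part files; returns the part names."""
    parts = []
    with open(path, "rb") as src:
        index = 0
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            part_name = f"{path}.part{index:03d}"
            with open(part_name, "wb") as dst:
                dst.write(chunk)
            parts.append(part_name)
            index += 1
    return parts

def join_files(parts, out_path):
    """Stitch the downloaded parts back together, in order."""
    with open(out_path, "wb") as dst:
        for part_name in sorted(parts):
            with open(part_name, "rb") as src:
                dst.write(src.read())

# Round-trip check on 3.5 MB of random data: 4 parts, byte-identical result.
with open("demo.bin", "wb") as f:
    f.write(os.urandom(3 * CHUNK_SIZE + CHUNK_SIZE // 2))
parts = split_file("demo.bin")
join_files(parts, "demo_restored.bin")
with open("demo.bin", "rb") as a, open("demo_restored.bin", "rb") as b:
    print(len(parts), a.read() == b.read())
```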

AWS export/import service

Obviously, AWS had anticipated that some of their customers would need to move large amounts of data into and out of AWS, so they offer a service where one can ship a portable storage device and have its contents loaded directly onto the AWS high-speed network. We liked this idea, as the cost of a 2TB drive is now comfortably under $200.00. However, the service isn't consistent: the AWS import/export service only supports importing files from the portable drive into Amazon EBS in the US Regions. Everywhere else, import/export puts the data into your Amazon S3 bucket. This obviously cuts out the slowest part of the process (our upload time), but the fiddly stitching together of multiple files still remains; we are keen to cut out the need to chunk up the data and reassemble the bits into a coherent file. All this takes time that we're keen to save.

Using the FTP client/server

This was tried and it worked fine; however, it required a change to the AWS security group to allow FTP through, as well as extra software installation and configuration. Our aim was to keep the instances as lean and secure as possible, and installing FTP software would increase the vulnerability footprint.

Using the RDP copy and paste function

This also worked but only for small (and I mean, small) files. It was unsuitable for the size of files we were using.

At this point it was prudent to make a snapshot of this instance; after all, losing it would set us back. We then discovered another gotcha. With the new 1TB EBS volume attached, taking a snapshot of the entire instance would 1) take a long time and 2) most likely fail. Rather perturbed, we tried again with the same result. A quick chat with an AWS Solution Architect[4] and it was recommended that we first dismount the new volume under Windows Disk Management, then detach the volume through the AWS dashboard, and back up the instance and EBS volume separately.
This worked, but it added another step to an ever-growing work list.

Performance testing – does the application rock?

Despite the issues we had faced, everything was still on track, with all the instances created and ready to roll. It was at this time, with the architecture on a semi-live footing, that we ran through a very involved performance testing process. We created a test harness both on a dedicated cloud server (for cloud-to-cloud testing) and on another server within our own LAN environment. The latter test harness would probably be a more accurate reflection of real-life usage, as its tests had to navigate network traffic and the firewall before hitting the internet.
Would the cloud server really be as fast as people claim it to be?
The results were very pleasing: the average response time was under one second, and a big jump in concurrent users, from one to 100, was easily handled by the AWS infrastructure (ELB and EC2); we didn't even need to look at the auto-scaling functionality.
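A toy version of such a test harness, using nothing beyond the Python standard library: it spins up a local stub HTTP server and times 100 concurrent requests against it. (The real tests, of course, ran against the AWS-hosted application through the load balancer, not a local stub.)

```python
# Minimal concurrent load-test sketch against a local stub server.
import threading
import time
import statistics
from concurrent.futures import ThreadPoolExecutor
from http.server import ThreadingHTTPServer, BaseHTTPRequestHandler
from urllib.request import urlopen

class StubHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep per-request logging quiet
        pass

class StubServer(ThreadingHTTPServer):
    request_queue_size = 128       # allow a burst of simultaneous connects

server = StubServer(("127.0.0.1", 0), StubHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

def timed_request(_):
    start = time.perf_counter()
    with urlopen(url) as response:
        response.read()
    return time.perf_counter() - start

# 100 concurrent "users", one request each, timing every response.
with ThreadPoolExecutor(max_workers=100) as pool:
    latencies = list(pool.map(timed_request, range(100)))

server.shutdown()
print(f"mean {statistics.mean(latencies) * 1000:.1f} ms, "
      f"max {max(latencies) * 1000:.1f} ms over {len(latencies)} requests")
```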


After the performance testing, we went through an involved round of security testing, from brute-force attacks on the website (we warned AWS first in case they responded!) to social engineering, where we had some of our developers try to hack into our email accounts in order to craft sophisticated phishing attempts. While I do not want to go into too much detail, security is only as strong as your personnel. AWS itself was perfectly fine.

BETA Programme

So, with testing complete, it was time for a limited beta programme; in our experience, there's no testing quite like user testing. This was opened to a select number of our clients and the feedback was invaluable; they also discovered some bugs in the application itself that we had failed to notice. The beta programme should have run for a couple of months, but we extended it by another three months so that it would overlap with the go-live date. We had some features in beta that would not make it to release due to some legal issues, hence the overlap.


So the go live date arrived and it was actually a bit of an anti-climax. With a marketing campaign kicking off and users signing up, the site smoothly transitioned from beta to live. The usage patterns were very pleasing with a steady upward curve in registrations and users.


We had replaced a bunch of physical servers with a new set of virtual machines. We run the live production servers 24/7, just as if they were physical servers, so we pay a constant fee each month. With a physical machine, the cost is usually up-front (the capital cost) and the cost (and value) of the machine drops over time, sometimes quite rapidly, while the requirements of ever-newer applications and software place increased demands on the hardware. The cloud servers, we've been informed, sit on a rolling programme where the underlying hardware is constantly updated by the provider. Of course, one cannot control this upgrade process, and if you really need the latest and greatest hardware, you cannot just go out and get it[5]!

There is a misconception that I want to tackle, though I am not a myth buster[6]! The cloud is touted as being cheap and, for some users, it is indeed a cheap and powerful resource. You treat computers and computing power like a utility, turning it on and off whenever you need it. Pay for what you need! An excellent idea and I cannot fault it. However, for a live production system (such as the one I am working on) that doesn't actually apply. To push the utility analogy further: our usage is more akin to using electricity to power a life-support system, and there's no compelling reason to switch it off to save money.

So that makes calculating the annual cost quite easy. The standard AMIs we use are the large ones, currently at $0.48/hour, which works out at $11.52 a day and $4,204.80 a year. The jump from a headline-grabbing price of $0.48 per hour (which doesn't sound much) to a more eye-watering $4,204.80 per year can be disconcerting for those tracking the money closely. Now multiply this by the number of instances in the production environment, the staging environment and maybe a development environment, and the cost starts to hit a level that makes those in charge of budgets uncomfortable. Count the cost of the Elastic Block Store (EBS) volumes storing all your extra data, throw in an elastic load balancer, lots of snapshots (you do want backups, right?) and a few hundred gigabytes of traffic to and from your AWS cloud, and the costs just keep going up.
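The sums spelled out, scaled across the live/stage/dev split (two instances per environment) described earlier:

```python
# Annual cost of large instances running 24/7 at the on-demand rate,
# scaled across a live/stage/dev setup with two instances each.
RATE_PER_HOUR = 0.48

per_day = RATE_PER_HOUR * 24
per_year = per_day * 365

environments = {"live": 2, "stage": 2, "dev": 2}
total_instances = sum(environments.values())

print(f"One instance:  ${per_day:.2f}/day, ${per_year:,.2f}/year")
print(f"{total_instances} instances: ${per_year * total_instances:,.2f}/year "
      "(before EBS, load balancers, snapshots and traffic)")
```

Six always-on instances already run past $25,000 a year, before any of the extras are counted.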


The Cloud is here to stay, at least with my company and our parent company. It is definitely more expensive than we had first expected, partly because many of our system architects were seduced by the promised cost savings of an ephemeral, switch-it-off-when-idle system, when a 24/7 production system is nothing of the sort.

My Bug Bears

1. CloudWatch SMS notifications are only available in the US East Region.
2. The import/export service, where large files can be loaded directly into AWS: ideally imports would go straight to EBS, but this is only available in the US East Region. Those of us in Europe have to have large files loaded into S3 and then downloaded. Well, at least the
Now, we had some veterans in our company who looked at the cloud and said, 'seen it before' and went back to cutting some obscure code.

However, the advantages of rapid scaling, horizontally and/or vertically, mean that responses to architectural change can be fast, safe and efficient. If you have an excellent technical architecture, a clear understanding of what you are delivering, and support processes in the form of configuration management, change management and release management, then you're good to go.

[2] http://7-zip.org/ - a fantastic tool
[4] We had the benefit of knowing someone at AWS rather than go through Customer Support
[5] Maybe the cloud provider (AWS in this case) can give an indication of this?
