About this blog

'Going Spatial' is my personal blog; the views on this site are entirely my own and should in no way be attributed to anyone else or taken as the opinion of any organisation.

Monday 8 November 2010

Creating a Map Cache - some trials and tribulations

Here’s an update on the map cache work.

Short version:
We are optimistic that we can get the map cache creation time down to 7-8 weeks on ONE machine. By splitting the map cache creation across several machines and spending some time recombining the results at the end using the ESRI Inc ‘Export/Import Map Cache’ tool, we could be looking at a completion date of a few weeks (the original estimate!). Collectively, we’re all in a much happier place.

Long version:
We’ve been keeping an eye on the processing, and while we were aware that the progress bar indicated that progress was slow, we know from experience that the progress indicator can and does leap from 5% completion to 50% in a few minutes and then sit at 50% for another couple of hours. It isn’t reliable.

I had decided that if it didn’t ‘improve’ by the beginning of the weekend, then we would have to take some corrective action.

However, after absorbing the news that, based on current calculations, the map cache creation might well take almost 5-6 months to complete, we had to go back and bring forward some of our actions. So on Thursday morning, we were faced with two immediate problems:

  1. What was causing the slow map cache creation in the first place?
  2. Was there an alternative work process that could be used to complete the task?

Luckily for us, the need to take some action allowed us to detect an obvious issue: there was 0MB of disk space available on the C drive of the server. We think the cause was a combination of running the map cache creation for the entire country in one go (though scale by scale), a pagefile that was not tuned, and temp space that had not been relocated to spare partitions. Up until the start of the process, all tests had indicated that this part of the work flow (i.e. the AMI configuration) was fine, and there was little need to tune the AMI itself as we had other issues to address. Any Windows server running with less than 10% disk space free on the C drive is asking for trouble.
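
We had no monitoring in place to catch this; a tiny watchdog along these lines (a pure Python sketch – the path and the 10% threshold just encode the rule of thumb above, this is not something we actually ran) would have flagged the full C drive long before the cache job ground to a halt:

```python
import shutil

def check_free_space(path, min_free_fraction=0.10):
    """Return (ok, free_fraction) for the drive holding `path`.
    Less than 10% free is our 'asking for trouble' threshold."""
    usage = shutil.disk_usage(path)
    free_fraction = usage.free / usage.total
    return free_fraction >= min_free_fraction, free_fraction

# On the cache server this would be "C:\\" - "/" keeps the sketch portable
ok, frac = check_free_space("/")
print(f"free: {frac:.0%} - safe to keep caching: {ok}")
```

Run on a schedule (or at the start of each cache chunk), this would have turned a week of silent crawling into an early warning.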

I am fairly sure that the lack of disk space on C AND the high RAM usage on the server (about 90% of available physical memory consumed) were detrimental to the map cache creation and probably slowed the process down to a crawl. Maybe the progress indicator was right after all. Meanwhile, the map cache folder (on a different partition) was still growing, albeit slowly, but we could not access ArcCatalog to check. We were stuck.

So we took a copy of the AMI, created a new instance, fired it up and tuned it by moving the pagefile, temp space etc. from the C drive to the D drive. We also investigated RAM usage. There is a known issue with Windows Server 2008 and the WinSxS folder that grows uncontrollably (http://serverfault.com/questions/79485/windows-2008-winsxs-directory-growing-uncontrollably-blocking-server). Have a read; it isn’t too inspiring.

After some work and a quick test of map cache creation, we were happy with the results of our tuning. However, the exercise wasn’t just about ensuring that our server could redo the same job (without running out of disk space this time); we were keen to take the opportunity to try something even smarter. With the alternative work flow proposed below, we will also be creating a compact map cache that is of sub-national size, thus removing the need for a large memory footprint as well. This is good news.

Using just one machine to process the map cache was never a good idea – we’ve been struggling to find a method by which certain portions of the cache can be created by different machines and then eventually recombined to form a whole. We know how to do this in 9.3.1 with an exploded cache, but under ArcGIS 10 and compact map caches? It is an area that we’ve been researching for about a month now – the use of an exploded cache makes updates easier BUT it has the disadvantages of slow file copying and the possible loss of files, to name two. A compact cache is faster to create, performs better and causes less disk fragmentation.

Also, any process of split map cache creation needs the time to copy, recombine and consolidate the map cache into one and then some testing to ensure that all is there. Not an easy job.

Until this Friday morning, combining a compact cache from different compact caches was deemed a difficult task due to the lack of a suitable GP tool, the lack of a known work flow, the lack of an example of that work flow actually working, and the time to test it.

However, through some hard work, Google and a bit of luck (thanks to a colleague), we discovered that ESRI Inc does have a tool that can import, export and combine compact caches. Additionally, information and details of the challenges ESRI Inc faced when running the same type of map cache creation/consolidation that we’re now doing on Amazon were also available.

Wish we knew about this a few months ago.

The information was well hidden inside a presentation from the 2010 Dev Summit. We took lots of notes, watched the WRONG Dev Summit video, then watched the right one, had a meeting, drank lots of tea and swore at people.

At the end of it, we had a plan.  

Create a map cache according to a grid (to allow us to track progress) on two different areas using two different machines, producing two different compact map caches. Then, using some ESRI Inc magic (the GP tool), we will recombine the compact map caches into one. When this works – and it will – we have a viable work flow, saving us time and grief.

We are currently running a compact map cache creation from L0 to L19 (MiniScale down to MasterMap) for two areas in the UK. We are also applying all the tools and ideas that have worked so far, covering file formats, tile size, compression, anti-aliasing etc. We are also using a feature class (a fishnet covering the UK) to keep track of the map cache creation, as per ESRI Inc recommendations. So far, the fishnet doesn’t work, but this is a minor issue at the moment. We’re ignoring it.
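
The real fishnet is a feature class built with geoprocessing tools, but the bookkeeping idea behind it is just a grid of extents with a ‘done’ flag per cell – a rough sketch (the extent and cell size below are illustrative, not our actual settings):

```python
def make_fishnet(xmin, ymin, xmax, ymax, cell_size):
    """Divide an extent into square cells; each cell is one unit of
    cache work whose 'done' flag lets us track and restart the job."""
    cells = []
    y = ymin
    while y < ymax:
        x = xmin
        while x < xmax:
            cells.append({
                "xmin": x, "ymin": y,
                "xmax": min(x + cell_size, xmax),
                "ymax": min(y + cell_size, ymax),
                "done": False,
            })
            x += cell_size
        y += cell_size
    return cells

# The British National Grid is roughly 700km x 1300km; 100km cells
grid = make_fishnet(0, 0, 700_000, 1_300_000, 100_000)
print(len(grid))  # 91 cells for a 7 x 13 grid
```

Each machine takes a subset of cells, flips the flag as it finishes, and a crashed run resumes from the first cell still marked undone.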

The objective is two-fold: to see if the disk space problems come back when performing the compact cache creation AND to test whether the map cache file recombination works. The recombination is untested by ESRI UK and we can only go with what ESRI Inc has posted from the Dev Summit. We think it is prudent to run this through ourselves before we are happy to extend it to the rest of the country.

The map cache creation for the two separate areas will run over the weekend. Come Monday, we will have two compact map caches of two different but adjacent areas of the country from scale L0 to L19. The objectives are:

1: to record how long it took to create the map cache (we know the size of each grid and the total number of grids for the UK, so we can extrapolate to cover the entire country and thus derive timings); and

2: to see if the export/import tool can combine the map caches into one – the ESRI documentation and Dev Summit video indicate that this is exactly what it was designed for.
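
The extrapolation in objective 1 is simple arithmetic; a sketch with made-up numbers (the real figures come out of the weekend’s run) looks like this:

```python
# Extrapolating total cache time from a test run over known grid cells.
# Every number below is invented for illustration only.
cells_cached = 4      # grid cells completed in the test run
hours_taken = 48.0    # wall-clock time for those cells
total_cells = 91      # cells in the full UK fishnet
machines = 4          # instances caching in parallel

hours_per_cell = hours_taken / cells_cached
total_hours = hours_per_cell * total_cells / machines
print(f"~{total_hours / 24:.0f} days on {machines} machines")
```

Because each cell is roughly the same amount of work, the per-cell rate from a small sample gives a usable whole-country estimate.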

If the recombination of the compact map cache works (and early indications are that it will), then all the parts are in place to cache up the entire country in a very reasonable amount of time.

An update will be posted on Monday.

PS: the previous map cache had got up to 13% completion before we killed it.


Sunday 10 October 2010

A good resource for ESRI ArcGIS Server Map Cache Creation

I found a good resource for ESRI ArcGIS Server map cache creation – advanced concepts – at the 2010 ESRI Developer Summit.

The video is interesting and recommended for those working with large map caches.


There's also a useful presentation that was used in the video.

From the video and presentation, some useful points:

  1. Get the number of SOC instances correct. The guidance is for the server to be running at about 95% CPU. When we do our test areas of caching, we should look at the CPU used and increase/decrease the SOCs to reach this level. If the server runs at 100%, cache creation time gets worse. The advice is to start at ‘number of cores + 1’ and work up or down as necessary.
  2. Do NOT try to create large caches in one go. Instead, use the feature class option: create a feature class which divides the area into manageable chunks, e.g. each chunk being about 4 hours of cache creation time. Progress can then be monitored, and in case of server failure the job can be restarted covering just the uncached areas. [Script: http://resources.esri.com/geoprocessing/index.cfm?fa=codeGalleryDetails&scriptID=15896]
  3. Probably do NOT try to use cache on demand. The reason the performance is bad is that even though the request is only for a tile, ArcGIS Server will generate a ‘super tile’, which will typically contain lots of tiles. This means the map request takes a lot longer than necessary. The reason for creating a super tile is so that duplicate labels do not appear in the map cache. We could, only as a final resort, consider caching areas of the map based on what they contain (e.g. urban areas) and leave cache on demand for large areas of nothing, e.g. the sea. Cache on demand will be quick there, as there is nothing to draw!
  4. In addition to these tips, we can consider using the power of Amazon.
  5. Create several VMs for caching. Each VM will cache a different region (e.g. England, Scotland, Wales), using the feature class technique to track each server's progress.
  6. Once caching is complete, dismount the drives and remount them all onto one server [avoids the copy to S3]. Use the ‘Import Cache’ toolbox tool to merge the caches into one. This allows us to use compact caches all the time. [http://help.arcgis.com/en/arcgisserver/10.0/help/arcgis_server_dotnet_help/index.html#//009300000078000000.htm]
  7. Image formats:
     a. Use JPEG 50% compression for imagery.
     b. Use PNG 32 for vector data and >256 colours.
     c. Consider using ‘mixed format’ caches for the transparency problem. The slide show discusses a different reason for using this format, but the transparency problems we have been having may be another good reason to use it.
  8. For data updates, consider using the ‘Compare Feature Classes’ script. This can generate a list of affected tiles and therefore determine which tiles need to be updated in the cache. It can be used in conjunction with the feature class caching option. [http://arcscripts.esri.com/details.asp?dbid=16866]
  9. Investigate the ‘Keep Alive’ IIS setting. We need to investigate whether this will work through the Amazon load balancer.
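
On point 3, the super-tile cost is easy to quantify. Assuming the commonly quoted 4096-pixel super tile and the standard 256-pixel cache tile (these sizes are assumptions – check your own cache settings), one on-demand tile request triggers a render of:

```python
supertile_px = 4096   # assumed super tile edge, in pixels
tile_px = 256         # standard cache tile edge

tiles_per_supertile = (supertile_px // tile_px) ** 2
print(tiles_per_supertile)  # 256 tiles rendered to answer one request
```

That ratio is why cache on demand only pays off over areas where rendering is near-free, like the sea.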

Wednesday 6 October 2010

Setting up wildcard SSL on the cloud - breaks the AMI

Wildcard SSL certificates are great if you have a domain with multiple websites using 'A' records AND you want to enable HTTPS. A convenient approach is to purchase and install a wildcard SSL certificate instead of individual certificates per site – it saves money too. Here's a good link on the GoDaddy site (where we have been buying our certificates) about wildcard SSL.

The instructions for installing the SSL certificate on your IIS7 web server are here.

However, once you have installed AND bound a wildcard SSL certificate to your website, there is a problem:
when you restart the instance, it never comes back.

The only option is to terminate the instance and hope you have an AMI that you can go back to and use to create a new instance.


There are two workarounds:

1. You UNBIND the wildcard certificate prior to an instance restart – maybe through a script.
2. You do NOT use wildcard SSL certificates but opt for full domain SSL certificates.

Tuesday 21 September 2010

Wildcard SSL certificates and EC2

I needed to secure our site with a SSL certificate and went shopping. Verisign was the first choice but the price was too high - so I was informed that GoDaddy would be a good choice. Indeed they were, with certificates in the dozens of pounds sterling rather than hundreds.

I also saw their 'wildcard' SSL certificates, where one can secure a whole domain rather than a single FQDN. Example: I may have several sites that need HTTPS for credit card transactions, but they are all off the same domain (*.mytestsite.com) – so a single wildcard certificate is sufficient to cover all sites running under the mytestsite.com domain!

Brilliant and for a competitive sum too!

The certificate arrived and the transaction went smoothly. We then imported the certificate into the server certificate store and bound the wildcard SSL to our website – all of this was done in the IIS Management tool. Then we ensured that HTTPS was enabled on the Amazon firewall (or security group) and voila, HTTPS worked!

However, there's a catch. When the AMI is stopped or rebooted, it never comes back online. Somehow the wildcard SSL, once bound to the default website in IIS, prevents the AMI from ever coming back after a reboot or stop/start. I have some theories as to why this is, but it is a bit of a disappointment for sure.

I am confident that an SSL certificate for the specific FQDN will work, but this means purchasing a separate certificate per website. Tedious but doable.

Monday 20 September 2010

Amazon Web Services has new tags

Keeping track of AMIs is a hassle. Who knows what is running on 'instance i-7bf88e0c'? With no descriptive fields to help us, we've resorted to using a spreadsheet to keep track of all the instances and what is running on them. Some organisations use different security groups ('live-site', 'staging' etc.) as a proxy.

Well, it was with some delight that I discovered that tags were now available to each instance!

Now one can add custom information and have it displayed alongside the rather cryptic fields more commonly seen. As you can see below, I created a new tag called 'Notes', to which I attached something useful and less cryptic for each of the installed instances.

To view tags, you need to enable them in the show/hide column settings. Just click on the show/hide columns icon on the dashboard and a pop-up window will allow you to show or hide columns.

Very useful and a life saver. 

Oh, this tagging extends to volumes as well. 


Friday 17 September 2010

Amazon Elastic Load Balancers (ELB)

Setting up load balancers inside the Amazon cloud is deceptively easy, but there are a number of 'gotchas'. First, they are called 'elastic load balancers' or ELBs. They are also software load balancers of an unknown capacity. There's been some talk about how scalable they really are, though I have lost track of the news item that stated this.

As with trying out anything new, I decided to have a delve around for the documentation, just to be sure that what I had in mind would work. Oddly, I found this time round that the normally excellent Amazon Web Services documentation was a wee bit sparse. However, the GUI to set up a load balancer looks easy, so I decided it probably wasn't too difficult. I also discovered that the GUI tool provided in the AWS console was not available when this service first came out; all of the work had to be done on the command line. I love the command line, so I read up on it too. You can too, in a nice blog post by Steve Evans.

As ever, I sketched out what I am trying to achieve: a simple website running on two (or more) instances behind a load balancer supporting HTTPS. I will also need to attach a much more memorable name (e.g. loadbalancer.esriuk.com)  to the DNS of the load balancer (currently something like: mytestloadbalancer123-678076149.eu-west-1.elb.amazonaws.com ) in order to make the site easier to use. This is the CNAME as explained in a previous post.

Creating the servers or instances
The first step is to create the two instances, and I opted for small instances. They're cheaper and, for my testing, perfectly adequate for the work in mind. I created one instance (server-01), installed IIS7 on it, patched it and slapped some anti-virus software on it, then took a snapshot. This snapshot creates an AMI file, which I then used to create a second instance – an almost identical copy of server-01. Very useful.

Once I had two instances running, I made sure that each one was functioning properly. Tests were done to check that IIS on both of the instances was working. I subtly changed the default landing page on one of the IIS instances to help me differentiate between the two.
The 'secure 2' text was part of the HTTPS test and also helped me to remember that we were on a new page.

Creating the Elastic Load Balancer

Now, before you even start the creation of the ELB, you need to decide what protocols are to be used. Obviously, one would want HTTP (80) to be allowed through the load balancer, and this is the default, but if you want to load balance HTTPS (443) or some other protocol/port, these have to be defined during the creation of the ELB. One cannot add extra protocols/ports once the ELB has been created. It is a one-shot process.

The ELB I need must support HTTP and HTTPS, so these had to be defined during the ELB creation stage. HTTPS is called 'Secure HTTP server'. When this option is selected, the other options are automatically filled in; don't worry that the protocol drop-down still says 'HTTP' – the port changes to '443', the well-known port for HTTPS, so all is well. I am sure the protocol drop-down error is a bug, and a minor one at that. One can also see that you can redirect incoming traffic on a particular port (e.g. 80) to a specific one on the EC2 instances.

The rest of the process is quite straightforward, though one should spend a bit of time understanding the instance health checks that the ELB will employ to ensure that it does not redirect traffic to a 'dead' or unresponsive instance.

Instance Health Checks

The ELB is quite smart in that it needs a way to check that the instances it is redirecting traffic to are online and that traffic should still be routed to them. Otherwise, if an instance died (or was accidentally switched off), the load balancer would continue to route traffic to it, oblivious to the instance's 'dead' state, and this would cause partial downtime. What's worse, it could be quite intermittent, with one out of every two pages (for example) returning a '404' error message.

So how does the ELB do this? Simply by checking a specific file, usually a web page, on a set schedule to ensure that the instance is healthy.
There are six parameters that one needs to configure:

'Protocol' – you can use either HTTP or TCP as the protocol of choice.
'Target' – the file that the ELB checks to ensure all is okay; usually a simple web page such as the default.htm file.
'Interval' – how often the ELB performs the health check.
'Timeout' – how long the instance is allowed to respond to the health check before a failure is recorded.
'Unhealthy threshold' – the number of *consecutive* failures before the ELB marks the instance as 'failed' and 'OutOfService'.
'Healthy threshold' – the number of consecutive successful checks before the ELB marks the instance as 'working' and 'InService'.
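
The two thresholds are easy to misread; this toy state machine (our own sketch, not AWS code – parameter names are ours) shows how consecutive check results flip an instance between 'InService' and 'OutOfService':

```python
def track_state(results, unhealthy_threshold=2, healthy_threshold=3):
    """Fold a sequence of health-check results (True = passed) into
    the ELB's final view of the instance."""
    state = "InService"
    fails = passes = 0
    for ok in results:
        if ok:
            passes, fails = passes + 1, 0
            if state == "OutOfService" and passes >= healthy_threshold:
                state, passes = "InService", 0
        else:
            fails, passes = fails + 1, 0
            if state == "InService" and fails >= unhealthy_threshold:
                state, fails = "OutOfService", 0
    return state

# Two consecutive failures mark it OutOfService; a lone blip does not
print(track_state([True, False, True, False, False]))  # OutOfService
```

Note the asymmetry: a single passed check after a failure resets the failure count, which is why intermittent faults can keep an unhealthy instance 'InService' longer than you would expect.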

Create CNAME Record for ELB Instance 

When an ELB instance is first created, AWS provides you with a public DNS name. It is unique but quite long and difficult to remember – much better to have something a bit friendlier. To do this, you will need to create a CNAME record in DNS to redirect the friendly URL to this DNS name. I blogged about this in my previous post.

Thursday 16 September 2010

More Cloud
Back from leave and onto our cloud work. We've been looking at moving our ArcGIS Server instances to version 10.0, and it was with some eagerness that, on Monday, we switched the older ArcGIS Server 9.3.1 instance to ArcGIS Server 10.0 – we had the latter running in a staging environment for a couple of weeks and, I have to admit, it looks very good.

All things are green and working very well. I am happy.  

There are a few things that I need to look at – there will be no more changes to the base image as we're now on ArcGIS Server 10, but there is a lot to do around the effective management of these AMIs. The first job is to remove all the older 9.3.1 images and ensure that we're running the minimum number of images. It all costs the business money.

So what are we looking at now? Well, it is auto scaling: I need a method by which the ArcGIS Server images we have can automatically increase their resources in response to an increase in demand, while removing resources when demand drops.

I know AWS has this ability, so I went looking; it is called 'auto scaling'. However, my first idea – that this would add CPU resources to a specific EC2 instance – is wrong. It does not. Auto scaling adds instances (whole virtual machines) to your pre-configured pool as required.

OK so how does this auto scaling work? 

CloudWatch
This is Amazon's monitoring service. It is quite simple, but for $0.015/CPU hour one can keep track of instance health on a variety of dimensions such as CPU utilisation, disk I/O, availability of RAM, etc.
It would be nice to plug into this data stream directly and use ActiveXperts or something similar to send warning emails and SMS to those who need them, but I think this is limited at the moment. CloudWatch is indirectly required for auto scaling, as you create rules based on the metrics you get back – these rules include adding or removing instances as required.

Auto scaling
Auto scaling is free! However, one needs to use CloudWatch (which isn't free) before auto scaling will work, so it isn't quite as free as one would think.

With auto scaling, you first define an auto scaling group with virtual machines as members. You then create pre-defined CloudWatch parameters, and Amazon will automatically launch and add virtual servers when certain conditions and thresholds are met.
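
In essence, a scaling rule is just 'compare a metric to a threshold and adjust the group size'. A much-simplified sketch (thresholds and group sizes invented for illustration – the real thing lives in AWS, not your code):

```python
def desired_instances(cpu_history, current, high=70, low=30,
                      min_size=2, max_size=8):
    """Very simplified CloudWatch-driven rule: add a VM when average
    CPU is high, remove one when it is low, within group bounds."""
    avg = sum(cpu_history) / len(cpu_history)
    if avg > high:
        return min(current + 1, max_size)
    if avg < low:
        return max(current - 1, min_size)
    return current

print(desired_instances([85, 90, 80], current=2))  # scale out -> 3
print(desired_instances([10, 15, 20], current=2))  # at the floor -> 2
```

The min/max bounds are the important part: they are the capacity-planning control that stops a traffic spike from scaling you into a very large bill.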

All this is done through the AWS API and command-line tools. It even works with the AWS load balancer. Good stuff and quite easy to implement.

Yes, Amazon has load balancers available.

Elastic Load Balancing
For $0.025 per hour plus $0.08/GB of data transfer, you get an easy-to-configure and almost instantly available load balancer. To configure it, one maps the external DNS name of your choosing (in this case an 'A' record) via a CNAME mapping to the DNS name provided by AWS.
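
Using the figures quoted above, a rough monthly cost is easy to work out (the data-transfer volume is an assumption for illustration):

```python
# Rough monthly ELB cost from the rates quoted above
hours = 24 * 30           # one 30-day month
elb_hourly = 0.025        # $ per ELB hour
data_gb = 100             # assumed GB through the balancer per month
per_gb = 0.08             # $ per GB transferred

monthly = hours * elb_hourly + data_gb * per_gb
print(f"${monthly:.2f} per month")
```

At these rates the hourly charge, not the data, dominates for a modest site.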

However, there's a great article on why auto scaling without any capacity planning or control is a bad idea. I also learnt a new term while reading it: being 'slashdotted'.

Thursday 24 June 2010

Amazon Cloud is Ubuntu cloud - now shall we set up our own private cloud for GIS?

It was with some interest that I discovered that Amazon Web Services (AWS) was built on top of Ubuntu. I thought it was using XenServer or something but no, it is Ubuntu.

The details of their enterprise cloud offering are here.

What is of great interest is the ability to create your own cloud from the same installation. I am very tempted, as our work on AWS, while fruitful, has met with some odd issues. Also, AWS isn't cheap – there has been a lot of testing over the past month, and every hour that we have the AMIs running costs us. Yes, a large AMI may only cost $0.48 per hour, but it soon adds up over time, especially when one is only testing.

As part of our testing, we noticed that the AMIs were a bit sluggish in terms of CPU performance when compared to a standalone server. We investigated a bit further and discovered that Amazon rates their AMIs in Amazon EC2 Compute Units in order to provide everyone with a consistent measure of CPU capacity. Since Amazon purchases commodity servers with all sorts of different ratings, this approach makes perfect sense. However, how much horsepower is a single Amazon EC2 Compute Unit?

According to their website, one EC2 Compute Unit provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.

Not one of the fastest, is it? Over time, Amazon will add more processors, but this unit will remain. Please note that these are also single-core.

Here's a selection of the instances that Amazon now makes available to your CPU-hungry GIS application. We've been using the following:

Small Instance

1.7 GB memory
1 EC2 Compute Unit (1 virtual core with 1 EC2 Compute Unit)
32-bit platform
I/O Performance: Moderate
API name: m1.small
Cost: $0.12 per hour

Large Instance
7.5 GB memory
4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each)
64-bit platform
I/O Performance: High
API name: m1.large
Cost: $0.48 per hour
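
The hourly rates above look harmless until an instance is left running; a quick sum over a 30-day month:

```python
# What the two instance types above cost if left running all month
hours = 24 * 30
rates = {"m1.small": 0.12, "m1.large": 0.48}  # $ per hour, as listed

monthly = {name: rate * hours for name, rate in rates.items()}
for name, cost in monthly.items():
    print(f"{name}: ${cost:.2f} for a 30-day month")
```

Which is exactly why switching off test instances (and pruning old AMIs) matters.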

Thursday 10 June 2010

Datahub now live

Very pleased that now, we have our first venture into the brave new world of the cloud. Despite Larry Ellison's disdain for the term, I like it.

The OS OpenData was made available to the general public, and we've been busy uploading the data and rendering it through ArcGIS Server. Version 1.0 of the service went live in time for the ESRI (UK) 2010 User Conference.
We have now released version 2.0 of our service using the free data; it includes separate map services for Meridian 2, Panorama and Strategi data. Release 3.0 will see OS VectorMap District added to the list. All will be cached and highly available.

Overall, we are very pleased with the speed of the service and look forward to expanding the work more.

To connect:

1. JavaScript API: http://datahub.esriuk.com/ArcGIS/rest/services/
2. Through ArcGIS Desktop: http://datahub.esriuk.com/ArcGIS/services

Security Groups in AWS

Need to group them!
Just discovered that it isn't possible to move a created AMI from one security group to another after its initial creation. This is not too helpful, as many organisations probably have a fluid and organic idea of the concept of AWS security groups and will wish for some flexibility once they have been created.

Losing an AMI is also very easy to do!

Naming and identifying AMI
We've also decided to refer to our AMIs using the last three digits of their unique instance IDs (for example: '27b' or 'f4f') as a quick and easy way to identify them. There is still room for confusion between 'instance' and 'AMI', but we'll sort that out when we get to it.

Thursday 3 June 2010

Small AMI

I always wondered why the 'smallest' available AMI was called 'M1.LARGE', but I have discovered that if you create an AMI from Amazon's own template and use the basic 32-bit Windows web server, voila, you have the 'M1.SMALL' AMI available to you. This AMI costs $0.12 per hour compared to $0.48 per hour for the M1.LARGE AMI. The small AMI has 1.7GB of RAM available and is optimised for lightweight web servers, probably all running behind the network load balancer.

Tuesday 11 May 2010

Security Groups in Amazon Web Services

Granting access to that lovely new AMI you have created
The concept of a security group in AWS is a nice idea: it is, in effect, a firewall. Each AMI that is created and running is allocated a security group. It bears little resemblance to what one would normally call a security group – one with users and group permissions in Windows Active Directory, for example. I think the name does confuse.

Anyway, when a new AMI is spun up, it needs a number of ports open to allow web and remote desktop protocol (RDP) traffic to pass between the internet and the AMI. By default, all AMIs are put into a default security group that has all connections denied. Not a good place to be.

Pro Tip:
So, before you even create your first AMI, as tempting as it may be, create the necessary connection rules in the firewall first. Most will need a minimum of RDP, HTTP and HTTPS, to name three, and we shall create this group for all internet access. We shall call it 'Internet'.

Go and create your new security group by navigating down the left hand table of contents and selecting 'Security Groups'.

Click 'create a new security group' and call it 'Internet'. Under the connection methods, click on the pull-down menu and select one of the dozen or so well-known connection methods; each one will automatically default to its well-known port numbers. You can change these ports if required. Make sure you hit the 'save' button in the right hand column, called 'Actions', to ensure that your new firewall rule (because that is what it is) has been saved. Annoyingly, you have to do this for each connection method. Ensure that RDP is one of the choices – you want to remote desktop to your AMI, don't you? Of course, if you have a number of secured services, it might be a good idea to remove this particular connection method just to improve security. I would use NetSupport as an alternative, which uses port 5405. Just make sure that your own corporate firewall or personal firewall allows these ports out!

Once these rules are saved, they are applied almost instantly.
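
Written down as data, the 'Internet' group above is just a handful of port rules (5405 for NetSupport, as mentioned; this is an illustration of the idea, not the AWS API):

```python
# The 'Internet' security group from above, expressed as data.
# Port numbers are the usual defaults for each connection method.
internet_group = [
    {"name": "RDP",        "port": 3389, "protocol": "tcp"},
    {"name": "HTTP",       "port": 80,   "protocol": "tcp"},
    {"name": "HTTPS",      "port": 443,  "protocol": "tcp"},
    {"name": "NetSupport", "port": 5405, "protocol": "tcp"},
]

def is_allowed(port, rules):
    """Check a port against the group, the way the firewall would:
    anything without a matching rule is denied."""
    return any(r["port"] == port for r in rules)

print(is_allowed(3389, internet_group))  # True - RDP gets through
print(is_allowed(21, internet_group))    # False - FTP was never opened
```

Deny-by-default is the whole model: if you forget to save a rule, the checklist below is where you end up.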

I can't access my AMI!

OK, it could be due to the following, so check again:

1. Your security group - do you have the correct connection method selected?
2. Correct ports?
3. Did you save?
4. Check your own corporate firewall.
5. Check your own personal firewall (e.g. ZoneAlarm) – it could be blocking it.
6. Check the external DNS - you might be going to the wrong AMI.

Thursday 6 May 2010

Imagery in web applications - ArcGIS Server Blog

Interesting article here – though our initial experiences with map caches weren't too successful.

CNAME and Amazon

Amazon Web Services provides some unwieldy names for its AMIs, and I believe their IPs change as well. One obvious task after an AMI has been created and spun up is to give it something friendlier than en-1002957-gb1-eu1a.aws.amazon.com as a DNS name!

To get round this, one needs to use a CNAME to map a different name to it – we're going to alias it. There appears to be some confusion over the terms 'CNAME' and 'canonical name', as they are strictly different things, but over time both have come to be used interchangeably.

So, anyway - the goal is to have 'datahub.esriuk.com' as the URL that a user types into the address field. This will then seamlessly resolve to the 'proper' DNS name that is attached to each Amazon AMI.

The process was surprisingly simple: you just contact the ISP that holds your domain name, in this case www.esriuk.com, and make a request. That's it.

Using CNAMEs within the Amazon cloud makes access to data and resources a lot easier. Nearly everyone will have S3 buckets in the cloud as well, and using a CNAME as an alias is dead easy, especially if you're using S3Fox – a wonderful add-on for the Firefox browser.

Here's a quick Amazon video on using CNAMEs and S3 buckets.

The following is shamelessly borrowed from this page: http://developer.amazonwebservices.com/connect/entry.jspa?externalID=2456

When the status becomes 'Deployed', our distribution is ready and we are using Amazon CloudFront. As you can see above, our distribution gives us a new host name; we can now access our content at: http://d2oxqriwljg696.cloudfront.net/media_rdc_share_web-poster.jpg.

Obviously this is a cumbersome URL to work with, so you might want to learn how to create a friendlier alias. The standard way to do this is by creating an alias that maps a friendly name to our actual name – this alias is called a CNAME, or canonical name.

A CNAME is simply a way to create an alias or a nickname for a DNS record. In our case, we are going to create an alias for our cumbersome d2oxqriwljg696.cloudfront.net host name.
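
The alias chain is easy to picture with a toy resolver (the zone below is hand-written for illustration; the IP is a made-up documentation address, not a real CloudFront endpoint):

```python
# A CNAME is just an alias: resolution follows the chain until it
# reaches a real record. Toy resolver over a hand-written zone.
zone = {
    "demo.learnaws.com": ("CNAME", "d2oxqriwljg696.cloudfront.net"),
    "d2oxqriwljg696.cloudfront.net": ("A", "203.0.113.10"),  # made up
}

def resolve(name, zone, max_hops=10):
    """Follow CNAME records until an address record is reached."""
    for _ in range(max_hops):
        rtype, value = zone[name]
        if rtype != "CNAME":
            return value
        name = value
    raise RuntimeError("CNAME loop")

print(resolve("demo.learnaws.com", zone))  # 203.0.113.10
```

Both names end up at the same address, which is the whole point: the friendly name is just another hop in the lookup.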

For this example, we will create demo.learnaws.com as a CNAME that points to d2oxqriwljg696.cloudfront.net.

This is an optional step, if you are comfortable using d2oxqriwljg696.cloudfront.net in your web page or application there is no need to create a CNAME.

The first thing to do is to let Amazon CloudFront know that you plan to create the CNAME. To do this in S3 Organizer, you'll add the CNAME to the Manage Distribution dialog and click the Update Distribution button.

Next, you need to create a DNS entry for your CNAME. CNAMEs are managed by whoever manages your DNS entries. This is usually your web hosting provider. There is no standard interface for managing DNS entries, so an example from Dreamhost.com is shown below.

Usually, a web hosting provider will explain how to alter your DNS entries in their support documentation. For our example, we will continue to use Dreamhost.com and create a CNAME for our new Amazon CloudFront distribution.

The alias, or CNAME that we will use is demo and we simply specify d2oxqriwljg696.cloudfront.net as the value.

It is common to also create a www.demo CNAME entry that maps to d2oxqriwljg696.cloudfront.net as well. Incidentally, if you already have a CNAME for an Amazon S3 bucket, you can simply change its value to your new Amazon CloudFront host.

New DNS entries usually take a few minutes to propagate. Once they have, we can access our content at http://demo.learnaws.com. This is the base URL that we can use to access our content in Amazon CloudFront.

Now we have a friendly URL that will serve its content from a data center that is as close as possible to the user requesting it.

2. Use the Amazon CloudFront domain name to reference content in your web pages or applications

Once your content has been uploaded and your distribution has been setup, you can reference your content with your new Amazon CloudFront-based URL.

Your content can be served from any of the following edge locations, depending on where the request is being made:

United States
  • Ashburn, VA
  • Dallas/Fort Worth, TX
  • Los Angeles, CA
  • Miami, FL
  • Newark, NJ
  • Palo Alto, CA
  • Seattle, WA
  • St. Louis, MO
Europe
  • Amsterdam
  • Dublin
  • Frankfurt
  • London

Asia
  • Hong Kong
  • Tokyo

While one or several of these edge locations may serve your requests, your ‘origin’ server will always be the Amazon S3 bucket where you originally uploaded your data.

Your content will be copied to each edge server as it is requested. The first request will be processed by the origin server; then that content will be propagated to the appropriate edge server. The next time this content is requested, it will be handled by the edge server.

When you update your content, those updates are made at the Amazon S3 bucket (i.e. the origin server). Amazon CloudFront will then propagate those changes to the edge servers that have your content – this process can take up to 24 hours, but is usually completed within a few minutes.
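The pull-through behaviour described above can be sketched in a few lines of Python. The ORIGIN dict stands in for the S3 bucket (using the file name from this post) and edge_cache for one edge location; everything else is a made-up illustration, not CloudFront's actual code:

```python
# ORIGIN plays the role of the S3 bucket; edge_cache is one edge location.
ORIGIN = {"media_rdc_share_web-poster.jpg": "image-bytes-v1"}
edge_cache = {}
origin_hits = 0

def get(path):
    """Serve from the edge cache; fall back to the origin on a cache miss."""
    global origin_hits
    if path not in edge_cache:
        origin_hits += 1                  # first request misses the cache
        edge_cache[path] = ORIGIN[path]   # content is copied to the edge
    return edge_cache[path]

get("media_rdc_share_web-poster.jpg")  # miss: fetched from the origin
get("media_rdc_share_web-poster.jpg")  # hit: served from the edge copy
print(origin_hits)  # 1 - only the first request reached the origin
```

This is why the first request for any object is slower than the rest: the edge has to go back to the origin once, after which repeat requests are handled locally.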

Wednesday 5 May 2010

ArcGIS and Amazon

This is my first technical blog, though I have been blogging under a number of different guises on different subjects for a while, mainly online gaming and photography. I needed a log to keep track of what I was doing, merely as a reminder to myself.

Amazon Web Services (http://aws.amazon.com/) is a relatively new offering to the market, providing fast, reliable and economical cloud computing to anyone who wants to pay. As someone who is managing a hosting service, I find AWS provides very quick access to resources that would otherwise cost me and the business thousands of pounds just to get started, in terms of new machines and licenses. Amazon Web Services gives almost instant access to pay-as-you-go infrastructure, and this is a great thing.

Cloud computing has been around for a while, but AWS has made it easy for individuals and companies to access it - with a relatively clear pricing structure so that one can keep track of the cost on a daily basis. Once a service is no longer required, you can throw the AMI away and not have to worry about disposal/recycling of hardware. This evolution is a natural process for our hosting team: we started off with a few servers, grew to a few more servers, then shrank down to a handful of big servers running virtual machines, to now using the cloud.

The evolution of hosting at ESRI (UK)

Year One to Four
One very big project made it a necessity to set up a dedicated team and infrastructure to host an innovative web service. Over those three to four years, the service and accompanying infrastructure grew and grew. Kit was replaced several times, and a growing pile of older, possibly obsolete servers started to cause us issues in terms of reliability, storage and recycling/disposal.

Year Four
The web application and web service were sold off to a third party - who naturally wanted new kit in their new hosting centre. We were left with the older kit to run other hosting applications. It became a balancing act to ensure that we had enough hardware to run existing applications well and enough flex room to take on more work. However, we had to be careful that we didn't grow too big in terms of kit without a hosting contract to pay for it.

This balancing act went on for a couple of years. A few big hosting contracts were won and this required even more new kit, the cycle was repeating itself and the balancing act was maintained.

Year Five to Six
Virtualisation was touted as a possible answer to bring the cost of ownership down. The idea is very attractive: get a big host server and replace dozens of physical servers with the same number of virtual machines. Spin them up as required and adjust their resource requirements on the fly. Total Cost of Ownership (TCO) should come down, as one does not need to buy new kit and there is a saving in power requirements. However, licenses for servers and applications still need to be purchased, so the 'entry cost' was merely shifted from buying the kit to licensing the software. Still, the monetary savings on power alone through virtualisation were significant. The ease of backup and recovery was also noted - a dead virtual machine can be switched off and replaced by a backup virtual machine in a few minutes. No need to keep extra kit around, and the clutter in the server room was reduced. We still had the headache of patching the virtual machines each month, made more complicated by the need to spin up back-up VMs to patch and maintain.

Year Seven
Cloud computing offers an expansion of in-house virtualisation in that short-lived, high-intensity applications can easily be made available on the cloud and then switched off / thrown away when finished. The OS cost is included in the daily fee, so savings are immediate. Creating new VMs in the cloud is easy, and utilising very large servers (in terms of CPU and RAM) is as simple as making a different choice when you spin up a new instance. One has to learn a new vocabulary in the Amazon cloud as well. Keeping AMIs patched is still an ongoing issue, but the 'entry cost' to hosting has now been almost eliminated. There is no need to buy kit or to license operating systems (or even databases, if you opt to use the PostgreSQL AMI) - so a bare-bones system running IIS can be up and running in a matter of minutes.
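The 'entry cost' shift described above can be sketched as a back-of-the-envelope comparison. All figures below are hypothetical placeholders chosen for illustration, not real Amazon or vendor pricing:

```python
# Hypothetical numbers only - the point is the shape of the costs, not
# the amounts: on-premise pays up front, cloud pays per hour with the
# OS licence rolled into the rate.
hours_per_month = 24 * 30

server_purchase = 3000.0  # placeholder cost of a physical server (GBP)
os_licence = 500.0        # placeholder Windows Server licence (GBP)
hourly_rate = 0.5         # placeholder all-in price per instance-hour (GBP)

upfront_onprem = server_purchase + os_licence  # paid before anything runs
upfront_cloud = 0.0                            # nothing paid up front
monthly_cloud = hourly_rate * hours_per_month  # pay only while it runs

print(upfront_onprem, upfront_cloud, monthly_cloud)
```

The cloud instance can also be switched off for the weeks it is not needed, at which point the monthly figure shrinks accordingly - something the up-front model cannot do.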