About this blog

'Going Spatial' is my personal blog. The views on this site are entirely my own and should in no way be attributed to anyone else or taken as the opinion of any organisation.

Monday 8 November 2010

Creating a Map Cache - some trials and tribulations

Here’s an update on the map cache work.

TLDR:
We are optimistic that we can get the map cache creation time down to 7-8 weeks on ONE machine. By splitting the map cache creation across several machines and recombining the results at the end with the ESRI Inc ‘Export/Import Map Cache tool’, we could finish in a few weeks (the original estimate!). Collectively, we’re all in a much happier place.

Long version:
We’ve been keeping an eye on the processing. We were aware that the progress bar indicated that progress was slow, but from experience we know that the progress indicator can and does leap from 5% completion to 50% in a few minutes and then sit on 50% for another couple of hours. It isn’t reliable.

I had decided that if it didn’t ‘improve’ by the beginning of the weekend, then we would have to take some corrective action.

However, once we had absorbed that, based on current calculations, the map cache creation might well take 5-6 months to complete, we had to bring some of our actions forward. So on Thursday morning, we were faced with two immediate problems:

  1. What was causing the slow map cache creation in the first place?
  2. Is there an alternative work process that can be used to complete the task?

SLOW MAP CACHE CREATION
Luckily for us, the need to perform some action allowed us to detect an obvious issue: there was 0MB of disk space available on the C drive of the server. We think the cause was a combination of running the map cache creation for the entire country in one go (though scale by scale), a pagefile that was not tuned and temp space that had not been relocated to spare partitions. Up until the start of the process, all tests had indicated that this part of the work flow (i.e. AMI configuration) was fine, and there had been little need to tune the AMI itself as we had other issues to address. Any Windows server running with less than 10% free disk space on the C drive is asking for trouble.
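With hindsight, a simple free-space check running alongside the cache job would have flagged this much earlier. Here is a minimal sketch of the idea, assuming a modern Python 3 (not the Python that shipped with ArcGIS 10) and using the 10% rule of thumb above. The function names and the drive list are our own illustration, not part of any ESRI tooling.

```python
import shutil


def free_fraction(path):
    """Return the fraction (0.0 to 1.0) of the volume holding `path` that is free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def is_low_on_space(path, threshold=0.10):
    """True when free space is below the threshold (10% by default)."""
    return free_fraction(path) < threshold


if __name__ == "__main__":
    # Hypothetical drive list: C for the system, D for pagefile/temp.
    for drive in ("C:\\", "D:\\"):
        try:
            fraction = free_fraction(drive)
        except OSError:
            # Drive does not exist on this machine; skip it.
            continue
        status = "LOW" if fraction < 0.10 else "ok"
        print(f"{drive} {fraction:.1%} free [{status}]")
```

Run from a scheduled task every few minutes, something like this would at least leave a trail showing when the C drive started to fill up.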

I am very sure that the lack of disk space on C AND the high RAM usage on the server (about 90% of available physical memory consumed) were detrimental to the map cache creation and probably slowed the process down to a crawl. Maybe the progress indicator was right after all. Meanwhile the map cache folder (on a different partition) was still growing, albeit slowly, but we could not access ArcCatalog to check. We were stuck.
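Since the progress indicator can't be trusted, the growth of the cache folder itself is probably a more honest signal. A rough sketch of how that could be measured, again assuming plain Python 3 and with function names of our own invention: take the folder size at two points in time and work out the rate.

```python
import os


def folder_size_bytes(root):
    """Sum the sizes of all files under `root`, skipping files that vanish mid-walk."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                # File was removed or locked between listing and stat; ignore it.
                pass
    return total


def growth_rate_mb_per_hour(size_before, size_after, seconds_elapsed):
    """Convert two size readings (in bytes) into a growth rate in MB per hour."""
    if seconds_elapsed <= 0:
        raise ValueError("seconds_elapsed must be positive")
    megabytes = (size_after - size_before) / (1024 * 1024)
    return megabytes / (seconds_elapsed / 3600)
```

Folder size is only a proxy: tiles over open sea are small and tiles over city centres are large, so a slow rate doesn't necessarily mean a stalled job. A rate that drops to zero is still worth knowing about.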

CORRECTIVE ACTION TAKEN
So we took a copy of the AMI, created a new instance, fired it up and tuned it by moving the pagefile, temp etc. from the C drive to the D drive. We also investigated RAM usage etc. There is a known issue with Windows 2008 where the WinSxS folder grows uncontrollably (http://serverfault.com/questions/79485/windows-2008-winsxs-directory-growing-uncontrollably-blocking-server). Have a read, it isn’t too inspiring.

After some work and a quick test on map cache creation, we were happy with the results of our tuning. However, the exercise wasn’t just about ensuring that our server could redo the same job (without running out of disk space); we were keen to take the opportunity to try something even smarter. With the alternative work flow proposed below, we will also be creating compact map caches of sub-national size, thus removing the need for a large memory footprint as well. This is good news.

ALTERNATIVE WORK FLOW
Using just one machine to process the map cache was never a good idea – we’ve been struggling to find a method by which certain portions of the cache can be created by different machines and then eventually recombined to form a whole. We know how to do this in 9.3.1 with an exploded cache, but under ArcGIS 10 and compact map caches? It is an area that we’ve been researching for about a month now. An exploded cache makes updates easier BUT has disadvantages, slow file copying and possible loss of files to name two. A compact cache is faster to create, performs better and causes less disk fragmentation.

Also, any process of split map cache creation needs time to copy, recombine and consolidate the map caches into one, and then some testing to ensure that everything is there. Not an easy job.

Until this Friday morning, combining several compact caches into one was deemed to be a difficult task: there was no suitable GP tool, no known work flow, no example of such a work flow ‘working’ and no time to test it.

However, through some hard work, Google and a bit of luck (thanks to a colleague) we discovered that ESRI Inc does have a tool that can import, export and combine compact caches. Additionally, details of the challenges ESRI Inc faced when running the same type of map cache creation/consolidation on Amazon that we’re now doing were also available.

Wish we knew about this a few months ago.

The information was well hidden inside a presentation at the DEV summit of 2010. We took lots of notes, watched the WRONG dev video, then watched the right DEV video, had a meeting, drank lots of tea and swore at people. 

At the end of it, we had a plan.  

THE PLAN
Create a map cache according to a grid (to allow us to track the progress) on two different areas using two different machines, producing two different compact map caches. Then, using some ESRI Inc magic (the GP tool), we will recombine the compact map caches into one. When this works, which it will, we will have a viable work flow, saving us time and grief.

We are currently running a compact map cache creation from L0 to L19 (MiniScale down to MasterMap) for two areas in the UK. We are also applying all the tools and ideas that have worked so far, covering file formats, tile size, compression, anti-aliasing etc. We are also using a feature class (a fishnet covering the UK) to keep track of the map cache creation as per ESRI Inc recommendations. So far, the fishnet doesn’t work but this is a minor issue at the moment. We’re ignoring it.
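To make the grid idea concrete, here is a small sketch in plain Python (no arcpy, and not the fishnet feature class we are actually using) that builds a grid of square cells over an extent and hands a contiguous block of cells to each machine. The extent (a 700 km by 1300 km box in metres), the 100 km cell size and the machine names are all illustrative assumptions.

```python
def build_fishnet(xmin, ymin, xmax, ymax, cell_size):
    """Return [(cell_id, (xmin, ymin, xmax, ymax)), ...] covering the extent row by row."""
    cells = []
    cell_id = 0
    y = ymin
    while y < ymax:
        x = xmin
        while x < xmax:
            # Clip the last column/row so cells never spill outside the extent.
            bounds = (x, y, min(x + cell_size, xmax), min(y + cell_size, ymax))
            cells.append((cell_id, bounds))
            cell_id += 1
            x += cell_size
        y += cell_size
    return cells


def split_cells(cells, machine_names):
    """Give each machine a contiguous run of cells, sizes differing by at most one."""
    base, extra = divmod(len(cells), len(machine_names))
    assignment = {}
    start = 0
    for index, name in enumerate(machine_names):
        size = base + (1 if index < extra else 0)
        assignment[name] = cells[start:start + size]
        start += size
    return assignment


# Illustrative numbers only: 7 x 13 cells of 100 km each.
cells = build_fishnet(0, 0, 700000, 1300000, 100000)
work = split_cells(cells, ["machine_a", "machine_b"])
print(len(cells), len(work["machine_a"]), len(work["machine_b"]))  # 91 46 45
```

Because the cells are listed row by row, each machine ends up with a band of neighbouring rows, which matches the plan of caching adjacent areas separately and recombining them afterwards. Ticking off cell ids as they finish gives a progress record that doesn't depend on the progress bar.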

The objective is two-fold: to see if the disk space problem comes back when performing the compact cache creation AND to test if the map cache recombination works. The recombination is untested by ESRI UK and we can only go with what ESRI Inc has posted from the Dev Summit. We think it is prudent to run this through ourselves before we extend it to the rest of the country.

The map cache creation of the two separate areas will run over the weekend. Come Monday, we will have two compact map caches of two different but adjacent areas of the country from scale L0 to L19. The objectives are:

1: to record how long it took to create the map cache (we know the size of each grid, the total number of grids for the UK and can extrapolate to cover the entire country and thus derive timings) and

2: to see if the export/import tool can combine the map caches into one – the ESRI documentation and Dev Summit video indicate that this is exactly what it was designed for.
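The extrapolation in the first objective is simple arithmetic, sketched below. It assumes every grid cell takes roughly the same time to cache (which it won't, as dense urban cells take longer than open sea) and that the work divides evenly between machines. The function name and all the numbers in the example are made up for illustration; they are not our measured timings.

```python
def estimate_total_hours(cells_done, hours_taken, total_cells,
                         machines=1, recombine_hours=0.0):
    """Scale a measured sample up to the whole grid.

    Assumes equal time per cell and an even split across machines,
    plus an optional fixed allowance for recombining the caches.
    """
    if cells_done <= 0:
        raise ValueError("cells_done must be positive")
    if machines <= 0:
        raise ValueError("machines must be positive")
    hours_per_cell = hours_taken / cells_done
    return hours_per_cell * total_cells / machines + recombine_hours


# Hypothetical sample: 4 cells took 60 hours; 91 cells in total; 2 machines.
print(estimate_total_hours(4, 60, 91, machines=2))  # 682.5
```

Treat the result as a planning figure rather than a promise. The weekend run should tell us how far the real per-cell times drift from the average.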

If the recombination of the compact map caches works (and early indications are that it will), then all the parts are in place to cache the entire country in a very reasonable amount of time.

An update will be posted on Monday.

PS: the previous map cache had got up to 13% completion before we killed it.