Optimising Sitecore Disk Performance on Azure
If you are starting out hosting Sitecore on Virtual Machines on the Microsoft Azure platform, you may have run into performance issues that you wouldn’t expect to see on comparable hardware on dedicated servers (e.g. on-premise or data centre hosting).
One of the common ‘gotchas’ on Azure Virtual Machines is disk IO latency.
First, let’s take a look at the basic disk setup on a VM, which in this case is the default out-of-the-box setup for most Azure Windows-based VM templates:
C: and E: in Azure are ‘Blob Storage’ disks. These do not reside physically on the Virtual Machine hardware. Instead they exist somewhere else in the Azure cloud infrastructure, so all reads and writes to these disks travel across a potentially contended network connection.
C: is commonly reserved for the operating system, and E: is typically assigned as the data disk that you would use for whatever applications you are hosting on the server.
The D: drive is the Virtual Machine temporary storage drive. This drive exists on the same physical hardware that your Virtual Machine is running on. As a result, reads and writes to this drive are many times faster, as nothing is going across a network – it’s all happening on the same hardware the OS is running on.
However, when you look at the contents of the D: drive you’ll see a text file containing a warning that data stored here may be lost at any time. In reality the scenario where data loss may occur is when the Virtual Machine is re-allocated, at which point it may re-start on a different physical server. The contents of the D: drive will not move with the server, because it was stored on the original hardware the Virtual Machine was running on, and hence is lost.
Considering the E: drive is the data disk, and the D: drive risks data loss, a simple approach to deploying any application is to store it all on the E: drive – nice and safe. Minimal risk of data loss if you are using Geo Redundant Storage.
For a Sitecore site you may deploy it all to a path such as E:\sitecore\website, configure IIS to run the site from that location, and be done with it. The site will run, but it will run slowly – especially noticeable after an IIS application pool recycle when the site is booting up. You may notice that CPU or memory usage on the server isn’t particularly troubled, but digging a bit deeper you might notice that disk IO latency is startlingly slow.
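If you want to confirm this, the Performance Monitor disk latency counters tell the story; for example, from PowerShell (this assumes your data disk is E:):

```shell
Get-Counter '\LogicalDisk(E:)\Avg. Disk sec/Read','\LogicalDisk(E:)\Avg. Disk sec/Write'
```

As a rough rule of thumb, sustained values much above 20–25 ms suggest disk latency, rather than CPU or memory, is the bottleneck.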
Why is this happening?
When a Sitecore application is running, the vast majority of read/write operations are happening in four separate folders:
- The data folder (e.g. \data)
- The app_data folder (e.g \app_data)
- The media folder (e.g. \upload)
- The temp folder (e.g. \temp)
The data folder is by far the worst culprit. It is here that Sitecore stores its Lucene content search indexes, as well as log files and other diagnostic information. The IO churn here, particularly when the site first starts up and the Lucene indexes and media caches are being built, is substantial. App_data stores cached media items (which are otherwise stored in the Sitecore database), so this folder also has high IO churn when the site starts up and the media cache is built.
Fortunately, the contents of all four of these folders are either temporary data, or data that Sitecore will automatically rebuild if the folders are found empty (Sitecore will simply rebuild the Lucene indexes and media caches, for example).
This means it’s relatively safe to store these on the super-fast D: drive. In the event the Virtual Machine is reallocated to different hardware and the contents of the drive is lost, Sitecore will recover when it starts up.
The simplest solution is to create four folders on the D: drive:
- D:\data
- D:\app_data
- D:\upload
- D:\temp
Ensure each of the folders has read/write permissions for whatever user the Sitecore IIS application runs under (e.g. the Network Service account).
The next step is to drop a simple patch file into the App_Config\Include\ folder on your site to configure Sitecore to use these new folder locations, for example:
<patch:attribute name="value">D:\data</patch:attribute>
<patch:attribute name="value">D:\upload</patch:attribute>
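As a fuller sketch, a complete patch file setting all three Sitecore folder variables might look like this (this assumes the standard dataFolder, mediaFolder and tempFolder sc.variable definitions from the stock web.config; app_data is handled separately through IIS):

```xml
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <sc.variable name="dataFolder">
      <patch:attribute name="value">D:\data</patch:attribute>
    </sc.variable>
    <sc.variable name="mediaFolder">
      <patch:attribute name="value">D:\upload</patch:attribute>
    </sc.variable>
    <sc.variable name="tempFolder">
      <patch:attribute name="value">D:\temp</patch:attribute>
    </sc.variable>
  </sitecore>
</configuration>
```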
For the app_data folder, in IIS create a virtual directory called app_data under your site and point it to D:\app_data.
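The virtual directory can be created in the IIS Manager UI, or scripted with appcmd – for example (the site name "Default Web Site" here is an assumption; substitute your own):

```shell
%windir%\system32\inetsrv\appcmd add vdir /app.name:"Default Web Site/" /path:/app_data /physicalPath:"D:\app_data"
```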
With these changes in place you should start seeing substantially better page load times on your website. The performance of the site on your Virtual Machine is now far more likely to be constrained to its CPU and memory constraints than disk IO latency of the E: drive.
One thing to consider here is that by default Sitecore stores its license file in the \data folder (e.g. \data\license.xml). So if the data folder is reset and you aren’t storing your license file elsewhere, your site will start throwing errors when it can’t find its license file. The solution is simple – move the license.xml file into the App_Config folder (which is still persisted on the E: drive), and update the following setting in your Sitecore.config file to look for it in the new location:
<setting name="LicenseFile" value="/App_Config/license.xml" />
Finally, we need to ensure that if the D: drive is reset, the folders and permissions we initially created for Sitecore are recreated automatically. A simple solution is to create a batch file, stored on the E: drive (e.g. E:\initvm.bat), that checks these folders exist and that the permissions are correct. Then use Windows Task Scheduler to run this file on server startup. This final step builds in some resilience so that no manual intervention is required when the server is reallocated – everything should just come back up straight away.
for %%d in (data app_data upload temp) do if not exist D:\%%d mkdir D:\%%d
icacls D:\ /grant "NETWORK SERVICE":(OI)(CI)F /T
icacls D:\ /grant IUSR:(OI)(CI)F /T
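The scheduled task itself can be registered from an elevated command prompt, along these lines (the task name is illustrative):

```shell
schtasks /create /tn "InitVM" /tr "E:\initvm.bat" /sc onstart /ru SYSTEM
```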
Fix your SQL Server whilst you’re at it
If you are using an Azure Virtual Machine for your SQL Server, exactly the same disk IO constraints apply.
SQL Server has a special system database, called tempdb, which is subject to very high disk IO compared to the other databases on the server (even your Sitecore databases).
If you have used one of the Microsoft SQL Server Virtual Machine templates, the default location for the tempdb storage is on the E: drive along with all the other databases.
This should be moved to the D: drive – again providing significant performance benefits.
The first step is to add another entry to the initvm.bat file to apply the correct permissions to the D: drive – we need to make sure the service account SQL Server is running under has read/write permissions to the drive:
icacls D:\ /grant "NT SERVICE\MSSQLSERVER":(OI)(CI)F /T
Next, execute the following SQL statements on the server and restart the SQL Server service, after which it will use the D: drive to store tempdb:
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, FILENAME = 'D:\tempdb.mdf');
ALTER DATABASE tempdb MODIFY FILE (NAME = templog, FILENAME = 'D:\templog.ldf');
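After the restart you can double-check where the tempdb files now live with a quick query:

```sql
SELECT name, physical_name
FROM sys.master_files
WHERE database_id = DB_ID('tempdb');
```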
Azure Virtual Machines might look and feel just like a physical server, but certain aspects of the cloud infrastructure mean they don’t necessarily perform as you would expect with comparable physical hardware if they are not configured and used appropriately to work around these constraints.
Not only will the configurations discussed in this article achieve better performance for your site, but it means you can serve the maximum amount of traffic possible with the Virtual Machine size you are using. This means lower server and hosting bills and happier site users.