How to eliminate very high CPU usage when using the Windows Azure StorageClient

In a previous post, I promised to investigate a number of performance issues I'd been seeing when using the Windows Azure StorageClient. I noticed the following issues:

  • Very high CPU usage when retrieving a large amount of data from Windows Azure Table Storage.

  • A large number of round trips to the Windows Azure Storage service.

I had expected the large number of round trips, since I hadn't done anything yet to tune the queries. However, I was very surprised to see the high CPU usage.

Very high CPU usage when retrieving a large amount of data from Windows Azure Table Storage

My code retrieves a couple of thousand performance monitor samples from the WADPerformanceCountersTable filled in by the Windows Azure Diagnostics subsystem. These are fairly small records, and the number is not that high for today's hardware, so this shouldn't be a problem CPU-wise. After profiling, I got the following results:

System.Data.Services.Client.DataServiceContext.ResolveTypeFromName(string, class System.Type)	91.06%

Current function:
System.Data.Services.Client.ClientType.ResolveFromName(string, class System.Type)	91.06%

Functions called by System.Data.Services.Client.ClientType.ResolveFromName(string, class System.Type):
[mscorwks.dll]	0.45%
System.AppDomain.GetAssemblies()	0.08%
System.Data.Services.Client.ClientType.ResolveSubclass(string, class System.Type, class System.Type, class System.Type&)	44.45%
System.Reflection.Assembly.GetType(string, bool)	2.52%
System.Reflection.Assembly.GetTypes()	42.91%

Apparently there's a serious performance issue resolving the types used in the query. Some googling brought me to this blog post about ADO.Net performance tips for Azure tables. The article is about a previous CTP version, but the results seem to apply to the current RTM release of the Windows Azure StorageClient. The issue is actually located in the ADO.Net Data Services Client library, which the Windows Azure StorageClient uses as its base implementation. If your TableServiceContext-derived class doesn't explicitly provide the information to resolve the type used in the query, the default resolution mechanism is used, which consumes an awful lot of CPU power. So the solution is simple: always be explicit about which data type the results of your Azure Table query should be mapped to, as you can see in the following code:

    internal class PerformanceCounterSampleContext : TableServiceContext
    {
        internal PerformanceCounterSampleContext(string uri, StorageCredentials creds)
            : base(uri, creds)
        {
            this.ResolveType = ResolveEntityType;
        }

        internal const string TableName = "WADPerformanceCountersTable";

        public IQueryable<PerformanceCounterSampleEntity> PerformanceCounters
        {
            get { return this.CreateQuery<PerformanceCounterSampleEntity>(TableName); }
        }

        public Type ResolveEntityType(string name)
        {
            var type = typeof (PerformanceCounterSampleEntity);
            return type;
        }
    }

The crucial part in this sample is the line

            this.ResolveType = ResolveEntityType;

which explicitly tells the ADO.Net Data Services Client to map any record from the WADPerformanceCountersTable to my performance counter entity class:

        public Type ResolveEntityType(string name)
        {
            var type = typeof (PerformanceCounterSampleEntity);
            return type;
        }
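
For completeness, here's a sketch of what the PerformanceCounterSampleEntity class itself could look like. It isn't shown in the code above, so treat the property list as an assumption; the names mirror the columns that Windows Azure Diagnostics writes to the WADPerformanceCountersTable.

    // Sketch of the entity class assumed by the context above (not part of the original sample).
    // TableServiceEntity supplies PartitionKey, RowKey and Timestamp.
    internal class PerformanceCounterSampleEntity : TableServiceEntity
    {
        public long EventTickCount { get; set; }    // time of the sample, in ticks
        public string DeploymentId { get; set; }    // deployment that produced the sample
        public string Role { get; set; }            // role name
        public string RoleInstance { get; set; }    // role instance name
        public string CounterName { get; set; }     // e.g. @"\Processor(_Total)\% Processor Time"
        public double CounterValue { get; set; }    // sampled value
    }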

According to Yanan's post, the performance issue should not appear if your Azure table name is exactly the same as the name of the entity class the table is mapped to. He mentions an overhead of 8 to 16 ms per record, which is awfully high!

In my test case, the execution time dropped from 40 seconds (where my client code was using 100% of a single core between two requests to Windows Azure Storage) to 20 seconds (with practically 0% CPU usage) with the explicit type resolving fix. That particular query returned 3500 performance counter samples, so the default type resolution cost almost 6 milliseconds of overhead per record! On my live Windows Azure site, the actual queries were returning even more data, so a single request could easily consume 100 CPU seconds in my Windows Azure instance! That kind of performance overhead can easily bring down an entire site, so it is absolutely mandatory to take this issue into account.

In the case of Windows Azure Diagnostics, I can't imagine anybody would name their entity class "WADPerformanceCountersTable". Combined with the fact that the performance counter logging feature of Windows Azure Diagnostics typically creates and reads a large number of records, it is practically unavoidable that you will hit this performance issue. Luckily it can easily be solved with a couple of lines of code.

The Windows Azure Diagnostics documentation only talks about getting the data into Azure Storage, not about retrieving it afterwards. The StorageClient section on MSDN doesn't contain any samples either. With this lack of documentation and guidance, it is easy to run into these performance issues.
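
For reference, here is a minimal sketch of how such a retrieval could look with the context defined above. The storage account setup and the filter value are my own assumptions, not code from the original project; AsTableServiceQuery takes care of following the continuation tokens that show up in the Fiddler trace in the next section.

    // Minimal retrieval sketch (assumed setup, not the original code).
    var account = CloudStorageAccount.DevelopmentStorageAccount;
    var context = new PerformanceCounterSampleContext(
        account.TableEndpoint.ToString(), account.Credentials);

    // AsTableServiceQuery/Execute transparently follow the NextPartitionKey/NextRowKey
    // continuation tokens, so one call can result in many round trips.
    var samples = context.PerformanceCounters
        .Where(s => s.CounterName == @"\Processor(_Total)\% Processor Time")
        .AsTableServiceQuery()
        .Execute()
        .ToList();

Note that a filter on CounterName alone still forces the table scan described in the next section, because it doesn't constrain the partition key.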

A large number of round trips to the Windows Azure Storage service

Retrieving these performance samples seems to require a lot of round trips and data traffic. In the test case I profiled (see above), retrieving the 3500 samples resulted in the following network traffic (as seen in a Fiddler trace):

#	URL	Size (bytes)
1	/WADPerformanceCountersTable()?$filter=...	602
2	/WADPerformanceCountersTable()?$filter=...&NextPartitionKey=1!28!MDYzMzk0ODgyOTIwMDAwMDAwMA--&NextRowKey=...	602
3	/WADPerformanceCountersTable()?$filter=...&NextPartitionKey=1!28!MDYzMzk0ODg5NTIwMDAwMDAwMA--	602
4	/WADPerformanceCountersTable()?$filter=...&NextPartitionKey=1!28!MDYzMzk0OTAyNzgwMDAwMDAwMA--&NextRowKey=...	1,612,996
5	/WADPerformanceCountersTable()?$filter=...&NextPartitionKey=1!28!MDYzMzk0OTA0MDQwMDAwMDAwMA--&NextRowKey=...	1,612,909
6	/WADPerformanceCountersTable()?$filter=...&NextPartitionKey=1!28!MDYzMzk0OTA1MDAwMDAwMDAwMA--&NextRowKey=...	1,612,982
7	/WADPerformanceCountersTable()?$filter=...&NextPartitionKey=1!28!MDYzMzk0OTA2MDIwMDAwMDAwMA--&NextRowKey=...	734,151

From the URL parameters you can see how a table scan on Windows Azure Table Storage works (it is also explained more graphically in the PDC session on Table Storage). The query starts and doesn't find any results in the first partitions (requests 1 to 3, each returning only 602 bytes). The subsequent requests use continuation tokens to keep scanning over the table, hopping from one partition to the next on each round trip. Needless to say, table scans on Azure Table Storage should be avoided as much as possible: a scan that does a round trip to the storage service on every partition change is terrible for performance.

Unfortunately this is how the partitioning is designed in the Windows Azure Diagnostics system: all performance samples have a partition key derived from the timestamp of the sample, so samples that occur close to each other in time are also located in the same partition. This is good if you want to examine a short period of time (although the query still has to be optimised to hit only the partitions known to contain relevant data, which I haven't done yet; a sketch of such a time-bounded query follows below), but it is bad if you want to examine performance over a longer period, because you'd have to scan over a large number of partitions. Clearly there's room for third party vendors to make sense of all the raw diagnostics data available in Windows Azure. IConstructions might even provide such a product in the (near) future.
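
One way to keep the scan bounded is to constrain the query to the partitions that can contain the time window you're interested in. The sketch below reuses the context from the earlier sketch and assumes the WAD convention of a partition key consisting of "0" followed by the sample's tick count (which is what the decoded continuation tokens above suggest); verify this against your own table before relying on it.

    // Sketch: bound the scan to a time window via the PartitionKey (assumed "0" + ticks format).
    DateTime from = DateTime.UtcNow.AddHours(-1);
    DateTime to = DateTime.UtcNow;
    string lowKey = "0" + from.Ticks;
    string highKey = "0" + to.Ticks;

    // CompareTo is translated by the ADO.Net Data Services client into ge/le filters,
    // so only the partitions inside the window are scanned.
    var recentSamples = context.PerformanceCounters
        .Where(s => s.PartitionKey.CompareTo(lowKey) >= 0
                 && s.PartitionKey.CompareTo(highKey) <= 0)
        .AsTableServiceQuery()
        .Execute()
        .ToList();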

What is also troublesome is the large amount of data transferred to retrieve these samples. These 7 round trips return 5.6 MB of data and contain about 3500 samples, or about 1.6 KB per sample. When I tried this out on the local devfabric, the query took about 20 seconds in total. I expected that this would be much better once my code was deployed on Windows Azure, but it didn't make much difference. This is another reason for third-party software developers to step in and provide a mechanism that allows easy and fast monitoring of Windows Azure performance counter data.