Friday, December 28, 2018

How to tune up Cosmos DB queries

        This is most common question in my talks about Cosmos DB from DBAs. Cosmos DB is a managed database, this does not mean that you cannot tune up your queries. But the way you tune up the queries is nothing like SQL Server.

       First you need to be sure that you configured your Cosmos DB containers right. What do I mean with that? You should pick the right partition key before you start to tune up any of your queries. Tuning up your queries is not going to help you in long run if you selected a wrong partition key when you created Cosmos DB containers. Throughput value is another value you need to worry about, the good news about the throughput is, you can change it if you need to. You cannot change your partition key!

      Let's look at the Cosmos DB Query Processor first. Here is a diagram I prepared for my Cosmos DB talks about the query execution in Cosmos DB.

       As you can see, Cosmos DB is going to check if you are using the partition key in your query first. If you pick wrong partition key when you configured your Cosmos DB container or for some reason if you cannot use the right partition key in your query guess what's going to happen? This query is going to cost you more!

      If you pick the right partition key and you still want to tune up your query like using SQL Server hints. You can do that by passing FeedOptions from SDK 2.x. FeedOptions are similar to SQL Server hints. If you want to use them, you better know what you are doing. Here is the list of available FeedOptions in .NET environment.

     I want to make one thing clear here. When you use HINTS in SQL Server, you are forcing SQL Server query processor to run your query in a special way. Some of the following options for Cosmos DB are for client side not for Cosmos DB Query Processor. For example, when you use MaxDegreeOfParellelism, you are setting max number of parallelism in client side not in Cosmos DB Query processor!

EnableCrossPartitionQuery
Set this true if you want your query to execute more than one partition. This is a useful option when you are in developing the queries for performance tradeoffs.
EnableScanInQuery
You must set this true if you have custom indexing policies, but you want to run the query with scan. This works only if you have custom indexing policy.
MaxItemCount
Keep this -1 to let the server manager the number of items to return per round trip to the server. You can keep this number small to retrieve a small number of items to return per round trip. Default is 100
MaxBufferedItemCount
This setting is for client side, It controls the memory consumption of client when performing cross-partition ORDER BY. You can reduce the latency of cross-partition sorting by using a higher value.
MaxDegreeOfParallelism
This setting is for client side. Keep it -1 for system to decide the number of concurrent operations to run in client side. You can limit to a specific number of concurrent operations by specifying a positive number.
PopulateQueryMetrics
You can get all kind of information about the query execution like compilation time, index loop time, load time, etc by setting this value true. This is a great parameter specially when in developing the queries.
RequestContinuation
Cosmos DB Queries return continuation token. You can pass a continuation token to resume query execution by using this parameter.
ResponseContinuationTokenLimitInKb
This is a useful setting if your application host has response header size limit. You can control the max size of continuation token returned by the server by specifying a number in KB. This setting may increase the overall duration and Request Unit cost consumed for the query since Cosmos DB needs to do additional work.

       Now, Let's try to use some of these options. First, I will try the SDK 3. You can define the MaxItemCount and RequestContinuation by using CreateItemQuery function. In the following example, I am overriding the maxItemCount parameter. SDK 3 is still in preview mode and it looks like RequestContinuation is becoming continuationToken.

var query = new CosmosSqlQueryDefinition("SELECT * FROM Orders o WHERE o.OrderId = @orderid").UseParameter("@orderid", orderid); 
var queryResultSetIterator = container.Items.CreateItemQuery<Order>(sqlQueryDefinition : query, partitionKey: partitionkey, maxItemCount:150, continuationToken: null);

        To specify the rest of the options in SDK 3, We need to declare CosmosQueryRequestOptions object. As today CosmosQueryRequestOptions do not support all these options since this version is still in Preview version.

      To use the same options in 2.x version, we need to use the FeedOptions object. Here is an example to do that. All you need to do is declaring FeedOptions with the options you like to use and pass it to CreateDocumentQuery function.

var query = client.CreateDocumentQuery(UriFactory.CreateDocumentCollectionUri(DatabaseName, CollectionName), "SELECT * FROM Orders o WHERE o.OrderId = 100"
new FeedOptions
 PopulateQueryMetrics = true
 MaxItemCount = -1, 
 MaxDegreeOfParallelism = -1, 
 EnableCrossPartitionQuery = true }).AsDocumentQuery(); 
var result = await query.ExecuteNextAsync();

No comments:

Post a Comment