Hasan Savran: Introduction to Cosmos DB Indexes

INDEXING IN COSMOS DB

In SQL Server, you need to create indexes manually. SQL Server likes to suggest indexes, but it does not create them for you. After seeing some of those suggestions from SQL Server, I am glad it does not create indexes itself. By Default, CosmosDB indexes everything. You don’t need to worry about missing indexes in your containers. Does that mean we don't need to worry about indexes anymore in CosmosDB? Short answer is No! you still need to adjust settings of CosmosDB indexing policies mainly for saving money.

In SQL Server you need to pick which columns you like to index, In CosmosDB you need to pick which columns not to index. It's kind of same thing at the end. You might ask "If everything is indexed and working fine, why do you want me to poke the well running system?" When we compare SQL Server indexes to CosmosDB Indexes, one thing works exactly same. That is the index file size. CosmosDB holds the indexes in a separate file like SQL Server and if we want to index everything, index file size is going to get large. Since we need to pay for the file space in CosmosDB, you might need to pay extra for indexes that you might never use. Also, your updates, inserts and deletes might cost you more Request Units since CosmosDB needs to maintain all the indexes in the background.

You can save good amount of money by customizing your database’s Indexing Policy. You can check or update your current Indexing Policy from the Azure portal. by clicking Scale & Settings link under any of your containers.

Here is the default settings of Indexing Policy of my container.

Indexing Mode

This property tells CosmosDB when to change indexing for your data. You need to pick one of the following options for this property.

Consistent: This is the default setting. CosmosDB synchronously updates the indexes when your application changes any data.

Lazy: CosmosDB maintains your indexes eventually not as soon as you change data in your container. By just changing indexingMode to Lazy, you can save 20 to 30 percent Request Units!

None: By selecting this option, you are telling CosmosDB to not Index anything. This makes sense if you have a solution which uses CosmosDB as key-value storage.

Automatic: Default option is true, CosmosDB handles what needs to index by selecting true. If you select false, you need to tell CosmosDB what needs to index by using SDK in each transaction. I recommend keeping this true.

includedPaths & exludedPaths

You need to use these properties to tell CosmosDB which nodes should or should not be indexed. You don’t need to list every column thanks to wildcard support. Custom paths should start with “/” and typically end with “?” wildcard. For example, let’s say you have a node named "Comments", You store long text in this column and you are never going to use this column in any of your Where clauses. It makes sense to not index this column. To exclude this column from indexing, all you need to do is adding the following path to “exludedPaths”.


"excludedPaths": [

"path":  "/Posts/Comments/?"

]

Indexes Property

Kind: CosmosDB supports 3 types of indexes for data. Depending how you are going to filter data, you might want to change the kind of index on columns.

Hash Index: This index is used for equality indexes. It supports String or Number type of data.
Range Index: This index is used if you want to use range operators (> < !=) and ORDER By operator.
Spatial Index: This index is for GeoJSON documents in your tables. It supports Point, Polygon and LineString spatial objects.

Data Type: Probably you are going to pick String, or Number for this property. If you store spatial data in CosmosDB you can pick Point, Polygon, or LineString for this property.

Precision:
For number type of data, I would leave this alone and keep the maximum value which is -1. In Json documents, Number data type takes 8 bytes and by keeping precision -1, we keep it 8

Precision property might make more sense for string data types mainly because strings can be any length. For example, if you store State codes, and you are sure that data length will not be larger than 2 characters you can specify 2 for precision. Just be careful how specific you want to be with precision. You don’t want to go back and change your index policy when length of data change in your documents.

9 comments:

UnknownJanuary 9, 2019 at 2:58 PM
Thank you, wonderful.
KicsApril 13, 2019 at 11:00 PM
@hsavran: how will i handle sorting of queried data across multiple partition. I can do some aggregation at client side, but this will lead to slowdown, when i have data in TB's. Is there is any way, i can get sorted results across multiple partitions from CosmosDB itself.
Preetham ReddyMarch 4, 2020 at 9:48 PM
@Kics - The best way to query for items across multiple partitions is through a read optimized store like Azure Search. We use change feed feature of CosmosDB to sync changes incrementally to Azure Search. All read requests go to Azure Search while all state changes go to Cosmos DB. Works really well. Keep your throughput under control while getting rich querying/aggregation capabilities
DwaineApril 14, 2020 at 9:30 PM
Great article Hasan!
But I have a question on the mslearn material that I submitted feedback on. Doesn't sound like I won't get an answer there.

The issue is on page:
https://docs.microsoft.com/en-us/learn/modules/monitor-and-scale-cosmos-db/8-index-exercise

The page states "For example, if we run a query for a customer id instead of State, the search consumes more RUs."

However, there is no logical explanation as to WHY that is the case! Neither State OR customer ID appears in the partial index listing. Could you please clarify the citation.
DwaineApril 14, 2020 at 9:31 PM
* Doesn't sound like I'll get an answer there. *
hsavranApril 15, 2020 at 8:53 AM
Dwaine,
I think Azure Cloud Shell does not help with demos in this page. It looks like there is a partial index file named index-partial.json
My guess is, this file changes policy and add State to the indexing policy.
I tried to find the file and view it but I couldn't find the file by using Azure Cloud Shell.
DwaineApril 15, 2020 at 5:25 PM
Sorry, I should have included the content of the file. (from https://github.com/MicrosoftDocs/mslearn-monitor-azure-cosmos-db/blob/master/ExerciseCosmosDB/IndexConfig/index-partial.json#L1)

{
"indexingMode": "consistent",
"automatic": true,
"includedPaths": [
{
"path": "/Item/id/?"
},
{
"path": "/Customer/email/?"
},
{
"path": "/OrderTime/?"
},
{
"path": "/Merchant/?"
},
{
"path": "/OrderStatus/?"
}
],
"excludedPaths": [
{
"path": "/"
}
]
}
LeeJune 5, 2020 at 12:26 PM
This comment has been removed by the author.
AnanthaAugust 28, 2022 at 12:12 AM
Thanks, love all your posts.

Tuesday, January 8, 2019

Introduction to Cosmos DB Indexes

9 comments: