INDEXING IN COSMOS DB
In SQL Server you need to pick which columns you like to index, In CosmosDB you need to pick which columns not to index. It's kind of same thing at the end. You might ask "If everything is indexed and working fine, why do you want me to poke the well running system?" When we compare SQL Server indexes to CosmosDB Indexes, one thing works exactly same. That is the index file size. CosmosDB holds the indexes in a separate file like SQL Server and if we want to index everything, index file size is going to get large. Since we need to pay for the file space in CosmosDB, you might need to pay extra for indexes that you might never use. Also, your updates, inserts and deletes might cost you more Request Units since CosmosDB needs to maintain all the indexes in the background.
You can save good amount of money by customizing your database’s Indexing Policy. You can check or update your current Indexing Policy from the Azure portal. by clicking Scale & Settings link under any of your containers.
Here is the default settings of Indexing Policy of my container.
Indexing Mode
This property tells CosmosDB when to change indexing for your data. You need to pick one of the following options for this property.Consistent: This is the default setting. CosmosDB synchronously updates the indexes when your application changes any data.
Lazy: CosmosDB maintains your indexes eventually not as soon as you change data in your container. By just changing indexingMode to Lazy, you can save 20 to 30 percent Request Units!
None: By selecting this option, you are telling CosmosDB to not Index anything. This makes sense if you have a solution which uses CosmosDB as key-value storage.
Automatic: Default option is true, CosmosDB handles what needs to index by selecting true. If you select false, you need to tell CosmosDB what needs to index by using SDK in each transaction. I recommend keeping this true.
includedPaths & exludedPaths
"excludedPaths": [
"path": "/Posts/Comments/?"
]
Indexes Property
- Hash Index: This index is used for equality indexes. It supports String or Number type of data.
- Range Index: This index is used if you want to use range operators (> < !=) and ORDER By operator.
- Spatial Index: This index is for GeoJSON documents in your tables. It supports Point, Polygon and LineString spatial objects.
Data Type: Probably you are going to pick String, or Number for this property. If you store spatial data in CosmosDB you can pick Point, Polygon, or LineString for this property.
Precision:
For number type of data, I would leave this alone and keep the maximum value which is -1. In Json documents, Number data type takes 8 bytes and by keeping precision -1, we keep it 8
Precision property might make more sense for string data types mainly because strings can be any length. For example, if you store State codes, and you are sure that data length will not be larger than 2 characters you can specify 2 for precision. Just be careful how specific you want to be with precision. You don’t want to go back and change your index policy when length of data change in your documents.
Thank you, wonderful.
ReplyDelete@hsavran: how will i handle sorting of queried data across multiple partition. I can do some aggregation at client side, but this will lead to slowdown, when i have data in TB's. Is there is any way, i can get sorted results across multiple partitions from CosmosDB itself.
ReplyDelete@Kics - The best way to query for items across multiple partitions is through a read optimized store like Azure Search. We use change feed feature of CosmosDB to sync changes incrementally to Azure Search. All read requests go to Azure Search while all state changes go to Cosmos DB. Works really well. Keep your throughput under control while getting rich querying/aggregation capabilities
ReplyDeleteGreat article Hasan!
ReplyDeleteBut I have a question on the mslearn material that I submitted feedback on. Doesn't sound like I won't get an answer there.
The issue is on page:
https://docs.microsoft.com/en-us/learn/modules/monitor-and-scale-cosmos-db/8-index-exercise
The page states "For example, if we run a query for a customer id instead of State, the search consumes more RUs."
However, there is no logical explanation as to WHY that is the case! Neither State OR customer ID appears in the partial index listing. Could you please clarify the citation.
* Doesn't sound like I'll get an answer there. *
ReplyDeleteDwaine,
ReplyDeleteI think Azure Cloud Shell does not help with demos in this page. It looks like there is a partial index file named index-partial.json
My guess is, this file changes policy and add State to the indexing policy.
I tried to find the file and view it but I couldn't find the file by using Azure Cloud Shell.
Sorry, I should have included the content of the file. (from https://github.com/MicrosoftDocs/mslearn-monitor-azure-cosmos-db/blob/master/ExerciseCosmosDB/IndexConfig/index-partial.json#L1)
ReplyDelete{
"indexingMode": "consistent",
"automatic": true,
"includedPaths": [
{
"path": "/Item/id/?"
},
{
"path": "/Customer/email/?"
},
{
"path": "/OrderTime/?"
},
{
"path": "/Merchant/?"
},
{
"path": "/OrderStatus/?"
}
],
"excludedPaths": [
{
"path": "/"
}
]
}
This comment has been removed by the author.
ReplyDeleteThanks, love all your posts.
ReplyDelete