Tuesday, April 9, 2019

Why Partitioning is very important in Azure CosmosDB



      In any introductory CosmosDB talk, the presenter will suggest that you pay extra attention to the partitioning part of the talk. Microsoft wants you to build great solutions with Azure CosmosDB, and there are a lot of resources out there for developers. The challenge is that CosmosDB does not need a DBA, so developers may not pay attention to details like partitioning when they create databases in Azure CosmosDB.
 
      It is crucial to pick the right partition key for your containers in CosmosDB, simply because you cannot change the partition key of an existing container. If you pick the wrong partition key for your data model, it is too late to fix it later. By then you may already have a good amount of data in the container, which means you spent a good amount of RUs (Request Units) to insert it, and the only way to fix the problem is to start from scratch: create a new container with the right partition key and upload all of your data again.
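As a rough illustration of why a wrong key is so costly, here is a toy in-memory model. `ToyContainer` and `migrate` are hypothetical names, not part of any Azure SDK. The partition key path is fixed when the container is created, and the only way to "change" it is to create a new container and write every document again, so the work grows with the data you already have.

```python
from dataclasses import FrozenInstanceError, dataclass, field


@dataclass(frozen=True)
class ToyContainer:
    """Hypothetical stand-in for a CosmosDB container (not the Azure SDK).

    frozen=True mimics the rule that the partition key path is chosen once,
    when the container is created, and cannot be changed afterwards.
    """
    name: str
    partition_key_path: str  # e.g. "/userId"
    items: list = field(default_factory=list, compare=False)

    def upsert(self, item: dict) -> None:
        # Every document must carry a value for the partition key property.
        prop = self.partition_key_path.lstrip("/")
        if prop not in item:
            raise ValueError(f"item is missing partition key property '{prop}'")
        self.items.append(item)


def migrate(source: ToyContainer, new_name: str, new_path: str):
    """'Repartition' by creating a new container and rewriting every item."""
    target = ToyContainer(new_name, new_path)
    for item in source.items:
        target.upsert(dict(item))  # one write per existing document
    return target, len(source.items)


orders = ToyContainer("orders", "/country")
for i in range(1000):
    orders.upsert({"id": str(i), "userId": f"u{i % 50}", "country": "US"})

try:
    orders.partition_key_path = "/userId"  # not allowed in place
except FrozenInstanceError:
    print("cannot change the partition key of an existing container")

orders_v2, writes = migrate(orders, "orders-v2", "/userId")
print(writes)  # 1000: every document is written again
```

In the real service each of those rewrites costs RUs, which is exactly the bill the paragraph above warns about.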

     I like to use analogies in my talks, and when it comes to partitioning in CosmosDB, I like to use the container ship example. (This has nothing to do with the latest container technology; I literally mean a container ship.) When you create a container in CosmosDB, CosmosDB creates one or more physical partitions for your data in the back end. You have no control over the size or the number of physical partitions; they are all managed by Microsoft. A container ship represents a physical partition that keeps your data. CosmosDB gives you a container ship that can carry 50 GB of your data. If you have more than 50 GB of data, CosmosDB gives you another 50 GB container ship, which is another physical partition. There is no limit on the total amount of storage.
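Using the 50 GB per physical partition figure from this post, the minimum number of container ships for a given amount of data is a simple ceiling division. The helper name below is hypothetical, and it counts storage only: CosmosDB decides the real number of physical partitions itself, and throughput also plays a part, so read the result as a lower bound rather than a prediction.

```python
import math

PHYSICAL_PARTITION_LIMIT_GB = 50  # per-"ship" storage figure used in this post


def physical_partitions_needed(total_data_gb: float) -> int:
    """Minimum number of 50 GB "container ships" for a given data size."""
    if total_data_gb < 0:
        raise ValueError("data size cannot be negative")
    # Even an empty container lives on one physical partition.
    return max(1, math.ceil(total_data_gb / PHYSICAL_PARTITION_LIMIT_GB))


for size in (0, 10, 50, 51, 120, 1000):
    print(f"{size:>5} GB -> at least {physical_partitions_needed(size)} physical partition(s)")
```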

    Azure CosmosDB does not tell you how to store your data on this container ship. Since it is a container ship, you need to load your data into containers, and it is up to you to figure out how to organize and group your data. CosmosDB does not care how small or large your containers are, but you should! You could put all of your data into one big container if you liked, and this is where the problem begins: developers often do not pay attention to how they organize or group their data at this stage. Your container ship needs to be balanced, so your containers should be of similar size. You should not have one large container and hundreds of small ones, because finding the data you are looking for in a big container takes longer and costs more. The shipping containers represent logical partitions (not to be confused with CosmosDB containers), and each logical partition cannot be larger than 20 GB.

[Figure: Don't organize your data like this!]
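One way to check a candidate key before committing to it is to group a sample of your documents by that key and look at the resulting "container" sizes. The sketch below is illustrative: the function name and the `size_mb` field on each item are hypothetical (in practice you would measure document sizes yourself), and the 20 GB limit is the figure quoted above.

```python
from collections import defaultdict

# Per-logical-partition limit cited in this post (20 GB), expressed in MB.
LOGICAL_PARTITION_LIMIT_MB = 20 * 1024


def logical_partition_report(items, key):
    """Group items by partition key value and summarize the group sizes.

    Each item is assumed to carry a hypothetical "size_mb" field.
    """
    sizes = defaultdict(int)
    for item in items:
        sizes[item[key]] += item["size_mb"]
    largest = max(sizes, key=sizes.get)
    average = sum(sizes.values()) / len(sizes)
    return {
        "count": len(sizes),
        "largest": largest,
        # How many times bigger than average the biggest "container" is.
        "skew": sizes[largest] / average,
        "over_limit": sorted(k for k, v in sizes.items() if v > LOGICAL_PARTITION_LIMIT_MB),
    }


# 30,000 orders of 1 MB each; 90% of them come from one country.
orders = [
    {"id": str(i), "userId": f"u{i % 1000}", "country": "US" if i < 27000 else "CA", "size_mb": 1}
    for i in range(30000)
]

by_country = logical_partition_report(orders, "country")
by_user = logical_partition_report(orders, "userId")
print(by_country)  # {'count': 2, 'largest': 'US', 'skew': 1.8, 'over_limit': ['US']}
print(by_user)     # {'count': 1000, 'largest': 'u0', 'skew': 1.0, 'over_limit': []}
```

With `country` as the key, one container is far too big for the ship; with `userId`, the same data becomes a thousand small, evenly sized containers.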


    If you are shipping cars, you might organize your containers by VIN. If you run an e-commerce website, you might organize them by user ID, and if you are carrying money, by account number. Make sure the property you use to organize your data always appears in the WHERE clause of your queries; otherwise CosmosDB has to search every partition to find your data. When your data needs more than 50 GB, CosmosDB creates new physical partitions and starts moving some of the containers from one partition to another in the background. If your containers, which are your logical partitions, are small, they are easier for Azure CosmosDB to move between physical partitions, which gives you better scalability.
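The routing and rebalancing described above can be sketched with a toy hash-partitioning model. CosmosDB does hash the partition key value to decide where a logical partition lives, but its real hash function and split logic are internal; SHA-256 and the `ToyCosmos` class below are hypothetical stand-ins. The sketch shows that a query with the partition key touches one physical partition, a query without it fans out to all of them, and a split moves whole logical partitions rather than individual items.

```python
import hashlib
from bisect import bisect_right
from collections import defaultdict

HASH_SPACE = 2 ** 32


def key_hash(value: str) -> int:
    """Stand-in hash; CosmosDB's real partition-key hashing is internal."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:4], "big")


class ToyCosmos:
    """Hypothetical model: each physical partition owns one range of hash values."""

    def __init__(self):
        self.starts = [0]                      # sorted start of each hash range
        self.partitions = [defaultdict(list)]  # per range: logical key -> items

    def _index(self, key_value: str) -> int:
        # Find the physical partition whose range contains this key's hash.
        return bisect_right(self.starts, key_hash(key_value)) - 1

    def insert(self, key_value: str, item: dict) -> None:
        self.partitions[self._index(key_value)][key_value].append(item)

    def query(self, predicate, key_value=None):
        """Return (matching items, number of physical partitions scanned)."""
        if key_value is not None:
            # Partition key in the WHERE clause: go straight to one partition.
            targets = [self.partitions[self._index(key_value)]]
        else:
            # No partition key: the query fans out to every partition.
            targets = self.partitions
        hits = [it for part in targets for group in part.values() for it in group if predicate(it)]
        return hits, len(targets)

    def split(self, index: int) -> int:
        """Split one physical partition's hash range in half.

        Returns how many logical partitions moved to the new physical partition.
        """
        start = self.starts[index]
        end = self.starts[index + 1] if index + 1 < len(self.starts) else HASH_SPACE
        mid = (start + end) // 2
        old, new = self.partitions[index], defaultdict(list)
        # Whole logical partitions move together; they are never cut apart.
        for key_value in [k for k in old if key_hash(k) >= mid]:
            new[key_value] = old.pop(key_value)
        self.starts.insert(index + 1, mid)
        self.partitions.insert(index + 1, new)
        return len(new)


db = ToyCosmos()
for i in range(1000):
    user = f"user{i % 100}"
    db.insert(user, {"id": str(i), "userId": user, "total": i})

db.split(0)
db.split(0)  # the data now lives on 3 physical partitions

user_hits, user_scanned = db.query(lambda it: it["userId"] == "user7", key_value="user7")
print(len(user_hits), user_scanned)  # 10 1

big_hits, big_scanned = db.query(lambda it: it["total"] > 990)
print(len(big_hits), big_scanned)    # 9 3
```

The design choice worth noticing is in `split`: it only ever relocates complete logical partitions, which is why many small logical partitions rebalance more smoothly than a few huge ones.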
