Monday, July 13, 2020

How to do case insensitive searches on Azure Cosmos DB?



     Data gets saved as JSON in Azure Cosmos DB. JSON documents supports string, numeric and Boolean data types. Because of limited data types, you might end up picking string data type for most of your data. You may need to use Cosmos DB's string functions to help you in many situations. For example; you might need to find names start with bunch of characters or you might need to find data ends like a word or bunch of characters. You may need to find all data that has address contains a street name. What do you do in these cases? Azure Cosmos DB has couple of system functions to help you, I will focus on the following functions in this post. These are the functions just got updated in June 2020.
  • Contains
  • EndsWith
  • StartWith
     Before we go in details, we need to cover an important topic. Usually when you use functions in databases, you trigger a table scan. Database needs to run the function for every row to evaluate the result of function. As you might guess, that will take long time and you will end up with slow queries and locks if you use SQL Server. What about Cosmos DB? Good news about Cosmos DB is, functions will not cause any locks since there is no locks in Cosmos DB but they might cause a scan, this will cause large Request Units. It will make your Cosmos DB bills higher.

     In June, an interesting update came from awesome Cosmos DB team. They announced that Contains and EndsWith functions will work much faster in SQL API. Thanks to this performance improvement, when you run a query with Contains or EndsWith you will end up with 99% Request Unit decreases. All you need to do is to include string properties in your indexing policy.

     Here is an example. I have 250.000 documents in my container. Following query works like T-SQL LIKE command. If we write this query in SQL Server, it will be like

SELECT * FROM Table
WHERE PostBody LIKE '%SQL%'

In Cosmos DB

     It found 496 documents and it costed 146.76 Request Units. This is great! Rather than scanning the whole table and cost thousands of Request Units, now Contains and EndsWith uses indexes to find data and it reduces the Request Unit significantly. But wait! Don't go anywhere yet, I have one more great news! Now, you can specify if this search should be case sensitive or not. Before this update, probably you had to use LOWER() or UPPER() functions to deal with case sensitive problems. As you might know, both functions will cause table scan, and you will end up with really high Request Units.

     Now you can specify if you want to ignore case sensitivity or not in String functions! Here is another example. By passing true to Contains functions, I am forcing it to do a case-insensitive search.



To make a case-sensitive, you can pass false or don't pass any value as third parameter since false is the default value.


1 comment: