Like keyword is introduced to Azure Cosmos DB in January 2021. We had to use one of the following string search functions if we wanted to make a search for specific string pattern.
System Function | Description |
RegexMatch | You can use to check if a string contains a regular expression. |
Contains | You can use to check if a string contains a string value |
StringEquals | You can use to check if a string equals to specific string value. |
StartsWith | You can use to check if a string starts with a specific value. |
EndsWith | You can use to check if a string ends with a specific value. |
You can use the following wildcards with LIKE keyword.
Wildcard | Description |
% | Any string characters |
_ | Any single character |
[] | Any single character within the specified range. |
[^] | Any single character not within the specified range. |
So far, It looks and sounds like LIKE keyword works just like the LIKE keyword in SQL Server. Hold on that idea for a second until you see the following examples. Following example find all documents that have "SQL" in title.
It looks like I have 145 documents in my database. LIKE is a case-sensitive function. If I will search by "sql",you will get different results.
What if we do not want to be case-sensitive? CONTAINS keyword is here for situations like that.
You can search the same way by using LIKE, it does not look pretty but it is possible. On this example, LIKE keyword costed us little bit more than CONTAINS, it returns the same data.
Values of title property is not that large, what happens when we search a large text? I will use the PostBody property to make a search in the next example. This property has large text with HTML coding in it. Largest length is 15kb. How does LIKE will perform on this?
PostBody is an indexed property. Query is completed fast and it returned 1777 results, it was expensive too. I wonder if CONTAINS might give me a better Request Unit result. When I tried the CONTAINS function, I see the following results. The number of the documents count does not match. LIKE returned more data than CONTAINS function??? but why?
I contacted to Tim Sander from Azure Cosmos DB team to get answers. He explained me that Cosmos DB uses index file to run system string functions like (Contains, Like, RegexMatch or EndsWith) This is good news that means database is not running a table scan for these functions. The bad news is Cosmos DB indexes first 1KB of all textual properties. If property size is greater than 1 KB like my PostBody property, the query engine might not return all possible results. Cosmos DB will return only what is in index file. We can see that in CONTAINS functions, it didn't return all results and it costed almost %50 less Request Units.
Be careful If you want to use any string functions in a property that has large text (>1KB). Cosmos DB might not always return all the data as you saw in my examples.
No comments:
Post a Comment