Wednesday, July 26, 2023

How to perform a Full-Text Search in Azure Cosmos DB

     Adding full-text search to your application helps users find what they are looking for with little effort. Searching for specific words or phrases within a database has always been difficult, particularly in relational databases. Throughout my career, I've had countless discussions (and arguments) with DBAs about the importance of implementing full-text search in a relational database. Times have changed, and users now want to search by voice, image, or video.

     Full-text search is not part of the Azure Cosmos DB database engine. First, we must create an Azure Cognitive Search service and connect the data in Azure Cosmos DB to it. Setting up Azure Cognitive Search is relatively straightforward. As with other Azure services, you need to provide a few details up front: subscription, resource group, service name, region, and pricing tier.

     I selected Basic as the tier. You can select the Free tier if you just want to try it out.



     Once the service is available, you must connect your Azure Cosmos DB database to the Cognitive Search service. Go to the service's Overview page and click the Import icon, which takes you to the Import Data setup page.


     For optimal performance, it is recommended that you set the indexing mode of your Azure Cosmos DB container to Consistent. "Consistent" means that the index is always kept in sync with your data: as soon as an item changes, the index is updated.

    By default, the query selects the entire content of every document. However, Cognitive Search is not free, and each tier has storage limits. To stay within those limits, include only the fields you want to search in the query. In my data model, I included only PostId, PostBody, Title, and Tags for Cognitive Search.


     The Query box does not support the DISTINCT or GROUP BY keywords. Try to flatten any nested properties or arrays.



You may have noticed that I used the IS_DEFINED function to ensure that the PostId property has a value. To be compatible with Cognitive Search, the dataset must have a key field, and I designated PostId as the key. If PostId is undefined or null, the Cognitive Search indexer cannot function properly.
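
The query itself appears only in a screenshot, so here is a minimal sketch of what such a query could look like, built in Python. Treat everything in it as an assumption: the field list comes from the properties mentioned in this post, the `build_import_query` helper is hypothetical, the `NOT IS_NULL` check is an extra guard for the null case, and the `@HighWaterMark` / `_ts` clause follows the pattern Microsoft documents for incremental Cosmos DB indexing rather than anything confirmed here.

```python
# Hypothetical reconstruction; the post's real query is only visible in a screenshot.
# Fields named in the post, plus id (used later for point reads) and _ts
# (assumed to be needed by the indexer's change tracking).
SEARCH_FIELDS = ["id", "PostId", "PostBody", "Title", "Tags", "_ts"]
KEY_FIELD = "PostId"


def build_import_query(fields, key_field, alias="c"):
    """Build a Cosmos DB SQL query that projects only the given fields
    and skips documents whose key field is undefined or null."""
    projection = ", ".join(f"{alias}.{name}" for name in fields)
    return (
        f"SELECT {projection} FROM {alias} "
        f"WHERE IS_DEFINED({alias}.{key_field}) "
        f"AND NOT IS_NULL({alias}.{key_field}) "
        f"AND {alias}._ts >= @HighWaterMark "
        f"ORDER BY {alias}._ts"
    )


query = build_import_query(SEARCH_FIELDS, KEY_FIELD)
print(query)
```

Projecting a short, explicit field list keeps the search index small, which matters because storage is capped per tier.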



On the last page, you have the option to set the frequency of data imports into Cognitive Search. I have selected "Once" for now.
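
If you later want to script this step instead of using the portal, the import frequency corresponds to the optional `schedule` section of an indexer definition in the Cognitive Search REST API. The sketch below only builds that definition as a Python dictionary. The indexer, data source, and index names are made up for illustration, and treating the portal's "Once" option as "no schedule" is my assumption.

```python
import json


def build_indexer_definition(name, data_source_name, index_name, interval=None):
    """Return an indexer definition as a dict.

    interval is an ISO 8601 duration such as "PT1H" (every hour).
    When interval is None, no schedule is set and the indexer only
    runs when it is created or triggered on demand.
    """
    definition = {
        "name": name,
        "dataSourceName": data_source_name,
        "targetIndexName": index_name,
    }
    if interval is not None:
        definition["schedule"] = {"interval": interval}
    return definition


# Hypothetical names, used only for illustration.
run_once = build_indexer_definition("posts-indexer", "posts-datasource", "posts-index")
hourly = build_indexer_definition(
    "posts-indexer", "posts-datasource", "posts-index", interval="PT1H"
)

print(json.dumps(hourly, indent=2))
```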



    After you click the Submit button, the Import Data process begins. Click the Refresh button to see the latest progress.



     Once indexing is finished, you can search directly in the Azure Portal through the Search Explorer option. The screenshot below displays the full-text search results for "SQL Server" using Azure Cognitive Search.
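
The same search can also be issued from code through the Search Documents REST endpoint. Here is a minimal sketch that only assembles the request (URL, headers, and JSON body) without sending it. The service name, index name, and API key are placeholders, the `build_search_request` helper is hypothetical, and `2020-06-30` is simply one stable API version I assume is acceptable for your service.

```python
import json

API_VERSION = "2020-06-30"  # assumed; use whichever stable version your service supports


def build_search_request(service_name, index_name, api_key, search_text,
                         select=None, top=10):
    """Assemble the URL, headers, and JSON body for a POST search request."""
    url = (
        f"https://{service_name}.search.windows.net"
        f"/indexes/{index_name}/docs/search?api-version={API_VERSION}"
    )
    headers = {"Content-Type": "application/json", "api-key": api_key}
    body = {"search": search_text, "top": top}
    if select:
        # Return only the listed fields to keep the response small.
        body["select"] = ",".join(select)
    return url, headers, json.dumps(body)


# Placeholder values; the inner double quotes request a phrase match.
url, headers, body = build_search_request(
    service_name="my-search-service",
    index_name="posts-index",
    api_key="<query-key>",
    search_text='"SQL Server"',
    select=["id", "PostId", "Title"],
)
print(url)
print(body)
```

To actually run the search, you could pass these three values to `urllib.request.Request(url, data=body.encode("utf-8"), headers=headers, method="POST")` or to any HTTP client you prefer.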


     The PostId property is the partition key of my Azure Cosmos DB container. I included the id property in the Azure Cognitive Search index, too. With both values in each search result, I can make a point-read call to get the full document from Azure Cosmos DB. A point read is the cheapest way to retrieve data from Azure Cosmos DB, and you should use it as much as you can to keep your Cosmos DB request charges low.
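
Here is a rough sketch of that lookup. The `fetch_post` helper is hypothetical. It expects a container object whose `read_item(item=..., partition_key=...)` method has the same shape as the one on the container client in the azure-cosmos Python SDK. To keep the example runnable without an Azure account, `InMemoryContainer` is a stand-in I made up, and the sample document is invented.

```python
class InMemoryContainer:
    """Stand-in for a Cosmos DB container client (illustration only)."""

    def __init__(self, documents):
        # Index documents by (id, partition key), as a point read would address them.
        self._documents = {(doc["id"], doc["PostId"]): doc for doc in documents}

    def read_item(self, item, partition_key):
        return self._documents[(item, partition_key)]


def fetch_post(container, search_hit):
    """Point-read the full document for one search result.

    The search hit must carry both the document id and the PostId
    partition key, which is why both are included in the search index.
    """
    return container.read_item(
        item=search_hit["id"],
        partition_key=search_hit["PostId"],
    )


# Invented sample data.
container = InMemoryContainer([
    {"id": "a1", "PostId": "1001", "Title": "Tuning SQL Server", "PostBody": "..."},
])
search_hit = {"id": "a1", "PostId": "1001", "Title": "Tuning SQL Server"}

post = fetch_post(container, search_hit)
print(post["Title"])
```

With the real SDK, you would pass the container client in place of `InMemoryContainer`, and the rest of the function would stay the same.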
