Thursday, January 14, 2021

How to retrieve data from Azure Cosmos DB Analytical Storage

 

     Analyzing data in Azure Cosmos DB by using one of its API used to be the only way to analyze data. Using OLTP Database as source of your report is not a good idea for many good reasons. Azure Cosmos DB delivered a new storage system to fix this problem, You can read the details about it in my earlier post, In this post I will help you understand how to query this new storage system.

     First thing you need to know is, you need to use Azure Synapse to access to the storage system. You have two ways to retrieve data, you can use Serverless SQL pool and use OpenRowSet to get the data or you can use Apache spark pool if you are familiar with PySpark. I will focus on Serverless SQL pool in this post.

    To create Azure Synapse workspace, type Synapse to the search box of Azure. Select Azure Synapse Analytics to continue.


     You should see the following page, click on Add to create a new Azure Synapse Workspace.

     
     Just like creating any other feature of Azure, you need to specify couple of things to continue. You can define SQL admin credentials in the Security tab. SQL Pool will use these credentials.


    When you click Review + Create button, Azure will deploy your service. When it is completed, you can see your new workspace in Azure Synapse section.


     
     So far, we created a new Azure Synapse Analytics workspace. We are ready to make a connection to one of our Cosmos DB databases. To do that, click on the workspace first. You will see the summary of the workspace with couple of buttons. There is already a built-in SQL pool in your workspace, so you don't need to create a new one. Click on the Open Synapse Studio to define connections by using the built-in SQL pool.


     
     Azure Synapse Studio will open in a new tab. We are ready to define the Cosmos DB connections. Click on Data icon in the right side the click on + icon and select the Connect to external data. I am not sure why Microsoft uses External Data term here. At the end all the options you will see as "External Data" are Azure data products. I guess "External Data" term would not bother me that much if I was configuring a linked server or Polybase in my local SQL server.



     You will see the following options in your right side. You can select Mongo DB API and SQL API for now. I want to connect to my SQL API Cosmos DB Database, so I pick the Azure Cosmos DB(SQL API)


     You need to select the location of your database and click Create to continue.


     
You will get a confirmation message saying that connection is created. Your Cosmos Db connection should be under Linked Tab. I had to refresh the page to see it there. Now, we have the connection to Cosmos Db database, and we can retrieve data from Cosmos DB's Analytical Storage.



     To start querying the Cosmos DB's Analytical Storage, click on Home button and select New option. There are couple of options here,  Let's pick SQL Script to keep things easy for this for now.



     
We need to use OPENROWSET to retrieve data from Cosmos DB. We need to include credentials in the query. There are two options to do that. I pick the first one in the following example.





   Here is my query and its results. You can control which containers can have Analytical Storage, If the container you are trying to access does not have analytical storage you will get an error in this part.








No comments:

Post a Comment