Tuesday, December 17, 2019

Azure Cosmos DB Notebook : Installing Custom packages and Adding data into Panda dataframe


     You can use Jupiter Notebooks in Cosmos DB to analyze and visualize your data. Good news is, Jupiter Notebooks are internal, and you don't need to use a third party tool to use them. You can enable them when you are creating your Cosmos DB database.



     If your current database is older and it does not have Notebooks available. You can enable them from the following link in Data Explorer.


     After you enable the Notebook options, you are ready to analyze or visualize your data thanks to Python language and Python packages. Cosmos DB makes your life easy to write Python and install custom packages to use with your data. There are couple of great internal commands and wildcards you should know if you like to use Notebooks in Azure Cosmos DB. First one I want to introduce you is, %%sql command. This command lets you select data from your containers by using SQL API. You can select data and add it to your Python data frames. You need to define which database and container you want to use before you pass your query. Here is an example. In the following example, I want to use my database named Stackoverflow, and container named Posts. Then I pass the query.

%%sql --database Stackoverflow --container Posts 
SELECT p.PostId, p.CommentCount, p.OwnerDisplayName FROM p WHERE p.OwnerUserId = 1

This query returns the following data in a grid.


      If you like to add this data into a Panda data frame, you can add another magic parameter named
--output and pass your data frame name. Following command selects the same data and adds it your data frame named dframe1. From that point, you are good to go to analyze or visualize this data!


%%sql --database Stackoverflow --container Posts  --output dframe1
SELECT p.PostId, p.CommentCount, p.OwnerDisplayName FROM p WHERE p.OwnerUserId = 1

     If you have many queries, it might be annoying to define which database or container before each query. You can set default database and container with %database and %container commands.

%database Stackoverflow
%container Posts

%%sql
SELECT p.PostId, p.CommentCount, p.OwnerDisplayName FROM p WHERE p.OwnerUserId = 1

     This returns the same data, just to be sure that you ran them in separate cells. When you put all that code in the same cell, it throws error.



     Cosmos DB Notebooks come with a built-in set of packages. You can add custom packages to use in Cosmos DB Notebooks too. In the following example, I am installing the bokeh to use in my notebooks.  If you like to install a different package, just change the package name in the following code.

import sys 
!{sys.executable} -m pip install bokeh –user



No comments:

Post a Comment