SQL Server Polybase Services lets us to pull data from other data
resources by using T-SQL queries. SQL Server 2019 introduces new connectors for Polybase services like Oracle, Teradata and MongoDB. In one of the SQL Server 2019 presentations from Bob Ward, I
saw the CosmosDB logo when he was talking about the new connectors of SQL
Server 2019. CosmosDB already has an ODBC driver and you can use it as a datasource for your Power BI, SSIS or SSMS. SQL Server 2019 makes this connection easier by using the
Polybase services.
In this post, I am going to show you how to configure SQL Server 2019 Polybase to connect Azure CosmosDB Mongo databases. Before we start, you want to be sure that following requirements are available in your environment.
- Sql Server 2019 with Polybase services installed
- TCP/IP Protocol must be enabled.
- Polybase services must be enabled.
- Azure Cosmos DB Account with MongoDB API
Polybase Services has dependency on TCP/IP Protocol and that's why TCP/IP protocol is in the requirement list. Without TCP/IP, you can not enable the Polybase services in SQL Server. You can find TCP/IP status by checking the SQL Server Configuration Manager. If it is disabled, Enable the TCP/IP and restart the SQL Server service.
To use the Polybase Services, you need to enable it first. Enabling Polybase services is just like enabling the Machine Learning Services. Run the following script to enable it.
exec sp_configure @configname= 'polybase enable', @configvalue = 1
RECONFIGURE WITH OVERRIDE
exec sp_configure @configname= 'polybase enable'
You should see the following after you ran this script.
Before we start to setup the polybase services to connect CosmosDB, we need to have all information about our CosmosDB account. You need to use Azure Portal to get connection and user information of your CosmosDB. You can find this information in the Connection String tab.
Next, we can start to configure the Polybase by using SSMS. We need to create a master key first. Master Keys are used to protect the private keys of certificates in SQL Server. You can find more information about Master Key from here.
USE [WideWorldImporters]
GO
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '1234'
-- I am sure you will have a better password!
GO
Now, we need to use the CosmosDB connection information to create database credential. SQL Server 2019 will use these credentials to access CosmosDB. To do that, we are going to use CREATE DATABASE SCOPED CREDENTIAL command. We need to use this command to tell SQL Server how to access to the CosmosDB. You need to open your CosmosDB page in Azure Portal and click on the Connection String tab to get this information. Use Username and Primary Password fields for the following command from the Azure Portal.
CREATE DATABASE SCOPED CREDENTIAL CosmosDBCredentials
WITH IDENTITY = 'username from cosmosportal',
Secret = 'primary password from cosmosdb'
GO
We setup the credentials to access CosmosDB. Next, we need to define the location of our CosmosDB Mongo API. To do that, we are going to use CREATE EXTERNAL DATA SOURCE command. This command has a Location parameter which will have the information of your CosmosDB endpoint. You need to use your Azure Portal to find the endpoint. For the mongo database, endpoint should start with mongodb:// and it should continue with the host information and the port number.
CREATE EXTERNAL DATA SOURCE CosmosDB
WITH (
LOCATION = 'mongodb://url:port',
CREDENTIAL = CosmosDBCredentials
)
We are ready to map the CosmosDB table/s to SQL Server. To make things clear, Let's create a schema for CosmosDB tables and create the external tables in this schema.
CREATE SCHEMA cosmosdb
GO
So far, we defined how SQL Server is going to access to Cosmos DB by defining the location and the security access of CosmosDB. We are ready to tell which CosmosDB table we like to query by using Polybase services. To do that, we are going to use CREATE EXTERNAL TABLE command. You need to use this command for any other remote sources you like to use with Polybase or Elastic Database queries. The schema needs to match your CosmosDB table. Here is a document from my MongoDB in Azure CosmosDB.
I am going to use the schema of this document and create the following external table in SQL Server.
CREATE EXTERNAL TABLE CosmosDB.Orders
(
_id NVARCHAR(100) NOT NULL,
id NVARCHAR(100) NOT NULL,
OrderId int NOT NULL,
CustomerId int NOT NULL,
SalesPersonPersonId int NOT NULL,
CustomerName NVARCHAR(100) NOT NULL,
CustomerContact NVARCHAR(100) NOT NULL,
OrderDate NVARCHAR(100) NOT NULL
),
CustomerPO NVARCHAR(100) NOT NULL,
ExpectedDeliverDate NVARCHAR(100) NOT NULL)
WITH (
LOCATION='WideWorldImporters.Orders',
DATA_SOURCE = CosmosDB
)
We are set to query CosmosDB from SQL Server 2019. Before we do that, Let's create statistics on this external table by using the CREATE STATISTICS command.
CREATE STATISTICS CosmosDBOrdersStats ON CosmosDB.Orders (CustomerId) WITH FULLSCAN
That's it, we are ready to pull data from CosmosDB by using SQL Server 2019's Polybase services. All we need to do is, start querying it by using SELECT statement! Here is examples.
SELECT * FROM cosmosdb.Orders WHERE CustomerId = 10
There you go, we pulled data from CosmosDB to SQL Server, you can join this table to other tables in SQL Server too. This is a great feature but there are many questions on my head specially about Request Units. This procedure can be really expensive and I don't believe polybase services has any information or limit for CosmosDB Request Units. Potentially, polybase services might use most of the Request Units and your web application can have issues because of that. As I said before, It is a great feature but you should think twice before you use it and be sure you have enough Request Units!
This comment has been removed by the author.
ReplyDeleteHi Hasan, what are the components of LOCATION='WideWorldImporters.Orders. Can you define what makes up these values?
ReplyDelete