Monday, June 24, 2019

How to transform documents with CosmosDB Data Migration Tool


     Azure CosmosDB Data Migration Tool is a great application to import data into your CosmosDB databases. It supports the following sources.

Json Files CSV Files
SQL Server MongoDB
Azure Table DynamoDB
HBase CosmosDB

    All you need is the connection string of your database or location of your source files and your CosmosDB keys. If you are using a database as source, you can format the data model pretty easy. You might have issues if the source you use does not have a JSON data type. JSON Array might look like string in CosmosDB because of data type mapping problems.

     I am going to use CSV file as source in the following example. You can't define a column as JSON array in excel. I have the following two columns in my CSV file. If I import these values into CosmosDB as they are, I see coordinates field's data type will turn into a text field in CosmosDB. You can go back and try to update them in CosmosDB but that will be an expensive solution.


       Here what it looks like in CosmosDB if I insert it as it is. This is spatial data and coordinates should be array not text.


      To fix this problem, I am going to customize one of the sprocs of Data Migration Tool. Here is my Source Information tab. Nesting Seperator creates sub documents when it finds defined character. In this case, application will create a sub document when it finds period. I have two columns named location.type and location.coordinates. Data Migration tool will create a sub document named location and adds the column values under type and coordinate properties.



      I picked Bulk Import option in the target tab. Also, in the Advanced Options; I picked a javascript file which contains a Azure CosmosDB stored procedure. By default, there are two javascript files in the Data Migration Tool. (BulkInsert.js and BulkTransformationInsert.js) You can edit these stored procedures if you like. In my case, I want to edit BulkTransformationInsert file.



     I editted the transformDocument function in BulkTansformationInsert.js. I have the following code in it.  First I check if location column has any values. If it has, I get its value then parse its value to JSON. At the end, I take the parsed JSON value as result. This should convert text value to JSON array in CosmosDB.


function transformDocument(doc) {
 if (doc["location"]){
let currentval = doc["location"].coordinates;
if (currentval){
doc["location"].coordinates = JSON.parse(currentval);
}
    }       
 return doc;
}

     After I click on Next couple of times and Import button in CosmosDB Data Migration Tool, all data imported to CosmosDB and when I check my data, I can see coordinates comes up as valid JSON array.



3 comments:

  1. Pretty convenient tool, not only for transormin documents.

    ReplyDelete
  2. Thanks a lot very much for the high quality and results-oriented help. I won’t think twice to endorse your blog post to anybody who wants and needs support about this area. AsSame as your blog i found another one Data Migration .Actually I was looking for the same information on internet for Data Migration Tools and came across your blog. I am impressed by the information that you have on this blog. Thanks once more for all the details.

    ReplyDelete
  3. Is there any way we can update the same data every time?

    ReplyDelete