Monday, July 1, 2019

Importing and using Spatial Data in Azure CosmosDB


    Azure CosmosDB stores spatial data type in GeoJson format. Spatial Data comes in many formats, the most common format types are; WKT (Well Known Text), KML (Keyhole Markup Language by Google), GeoJson and Shape files by Esri. It's good to know these formats even CosmosDB does not support all of them because spatial data you might want to import into CosmosDB might be in one of these formats and you might need to convert them to GeoJson format to store in CosmosDB.

    Importing Spatial Data into CosmosDB can be a challenge. CosmosDB is not a relational database and you may need to change your data model structure to add spatial data. You cannot create a new container for spatial data and plan to join this container to your other containers. There are free tools which might help with GeoJson conversion, but you may still need to add converted geoJson data into your data models. Spatial data becomes very powerful when you find a way to join it with your application's data.

    In the following example, I am going to download the hurricanes from NOAA website. Data is in CSV format so we may need to transform data to create a good data model for CosmosDB. I downloaded the all hurricane data for 2005 which was the year of Katrina hurricane. First thing I did, was to change the name of columns and make them more user-friendly. I have used the following names for columns. Here is a sample row from the CSV file.

Serial_Num 2005236N23285
Year 2005
Num 12
Basin NA
Sub_Basin NA
Name KATRINA
ISO_Time 8/23/2005 00:00
Nature TS
Latitude 23.4
Longtitude -75.7
Wind_WMO 30
Pres_WMO 1007
Center atcf
Wind_WMO_Percentile 17.645
Pres_WMO_ Percentile 21.551
Track_Type main

    As you can see, I have the latitude and longitude locations of a hurricane in each row. If you want to be able to run any spatial functions in CosmosDB, you need to save spatial information in geoJson format that means we need to make this data look like following example.

coordinates : [Longtitude, Latitude]

"location": { 
 "type":"Point", 
 "coordinates":[ 31.9, -4.8 ] 
}

    To achieve this, I created two new columns in CSV file named (location.type and location.coordinates) I filled the values of location.type as Point. To generate valid coordination syntax, I used Excel's CONCATENATE function.

=CONCATENATE("[",I2,",",J2,"]") generates location.coordinates



     I am ready to use CosmosDB Data Migration Tool.  We are going to have one problem when we try to push this data to CosmosDB. We cannot specify the datatypes in CSV files. location.coordinates needs to be a JSON Array and it will get imported as text when we use the CosmosDB Data Migration tool. You need to override one of the sprocs to define location.coordinates as Json Array. I already have a post about this. You can find how to override the sproc and change data type of location.coordinates step by step.

    After pushing data to CosmosDB, I go to the Data Explorer to see how my data looks like. As you can see, I have a valid geoJson object in my data model and coordinates is json array.



Now, we are ready to test CosmosDB spatial functions. There are 5 spatial functions you can use.

ST_DISTANCE(spatial expression, spatial expression)
    It takes two arguments and finds the distance between two spatial objects in meters. You can make range search with it if you use it in the where clause. For example, Let's find the hurricanes got close to New Orleans in 2005. First, I need the coordination of New Orleans, I got the coordination's of New Orleans from Google maps.

SELECT h.Name,h.Year,h.Wind_WMO, 
ST_DISTANCE(h.location, {'type': 'Point', 'coordinates':[-90.0,29.9]}) distance 
FROM Hurricanes h 
WHERE ST_DISTANCE(h.location, {'type': 'Point', 'coordinates':[-90,29.9]}) < 60000

This query lists all the hurricanes that got close (60km) to New Orleans. You can import your customers locations into database and find which customers might be affected by hurricanes in this way.

ST_WITHIN(spatial obj, spatial obj)
    This function returns true if the first spatial object completely within in the second one. To demo this, I generate a polygon around New Orleans and find if there were any hurricane was in this area. I use the this polyline application to do this. This polygon can come from your user from your website. That's how Zillow makes a search when you select an area on map to search available houses. Also, you can download states, cities polygons from Census bureau and use this function by using those polygons. If I had states in my database, I could search all hurricanes in that state/county/zip etc..


SELECT h.Name,h.Year,h.Wind_WMO
FROM Hurricanes h 
WHERE ST_WITHIN(h.location, {'{"type": "Polygon","coordinates": [[[-90.0280152, 30.265521],[-90.0335083,30.26315],[-90.0032959,29.3532687],[-89.1930542,29.3460904],[-89.2067871,30.267892],[-90.0280152,30.265521]]]})

This query finds all hurricanes in given polygon. It looks like I have 4 hurricanes.

ST_INTERSECTS(spatial obj, spatial obj)

    I use this function frequently in SQL Server. It is great to see that Azure CosmosDB supports it too. This function returns true if the given spatial objects intersects. This is a great function to use in your where clauses. For example, if you have a polygon of a city and you have a location of something you can call this function to see if location of something is in city or not. I am going to use the same object I used for ST_WITHIN() function. I have the same polygon, I am going to use hurricane location data, Let's see if any hurricanes intersected in this area.

SELECT h.Name,h.Year,h.Wind_WMO
FROM Hurricanes h 
WHERE ST_INTERSECTS(h.location, {'{"type": "Polygon","coordinates": [[[-90.0280152, 30.265521],[-90.0335083,30.26315],[-90.0032959,29.3532687],[-89.1930542,29.3460904],[-89.2067871,30.267892],[-90.0280152,30.265521]]]})

I get the same data with ST_WITHIN() function, because the hurricanes intersecting with given polygon.

ST_ISVALID(spatial obj) and ST_ISVALIDDETAILED(spatial obj)

    These two functions check if given spatial object is a valid spatial object. ST_ISVALID returns a Boolean value, ST_ISVALIDDETAILED returns detailed information.

SELECT ST_ISVALID({ "type": "Point", "coordinates": [31.9, -190] }) as isvalid

SELECT ST_ISVALIDDETAILED({ "type": "Point", "coordinates": [31.9, -190] }) as isvalid
    That's all I have for you about the spatial data in CosmosDB, you can create great solutions by using spatial data in your data models. It's great to see that CosmosDB supports Spatial Data and Spatial functions. Currently, the number of spatial functions is limited in CosmosDB. I hope to see more spatial functions in CosmosDB in future.

No comments:

Post a Comment