Azure CosmosDB stores spatial data type in GeoJson format. Spatial Data comes in many formats, the most common format types are; WKT (Well Known Text), KML (Keyhole Markup Language by Google), GeoJson and Shape files by Esri. It's good to know these formats even CosmosDB does not support all of them because spatial data you might want to import into CosmosDB might be in one of these formats and you might need to convert them to GeoJson format to store in CosmosDB.
Importing Spatial Data into CosmosDB can be a challenge. CosmosDB is not a relational database and you may need to change your data model structure to add spatial data. You cannot create a new container for spatial data and plan to join this container to your other containers. There are free tools which might help with GeoJson conversion, but you may still need to add converted geoJson data into your data models. Spatial data becomes very powerful when you find a way to join it with your application's data.
In the following example, I am going to download the hurricanes from NOAA website. Data is in CSV format so we may need to transform data to create a good data model for CosmosDB. I downloaded the all hurricane data for 2005 which was the year of Katrina hurricane. First thing I did, was to change the name of columns and make them more user-friendly. I have used the following names for columns. Here is a sample row from the CSV file.
Serial_Num | 2005236N23285 |
Year | 2005 |
Num | 12 |
Basin | NA |
Sub_Basin | NA |
Name | KATRINA |
ISO_Time | 8/23/2005 00:00 |
Nature | TS |
Latitude | 23.4 |
Longtitude | -75.7 |
Wind_WMO | 30 |
Pres_WMO | 1007 |
Center | atcf |
Wind_WMO_Percentile | 17.645 |
Pres_WMO_ Percentile | 21.551 |
Track_Type | main |
As you can see, I have the latitude and longitude locations of a hurricane in each row. If you want to be able to run any spatial functions in CosmosDB, you need to save spatial information in geoJson format that means we need to make this data look like following example.
coordinates : [Longtitude, Latitude]
"location": {
"type":"Point",
"coordinates":[ 31.9, -4.8 ]
}
To achieve this, I created two new columns in CSV file named (location.type and location.coordinates) I filled the values of location.type as Point. To generate valid coordination syntax, I used Excel's CONCATENATE function.
=CONCATENATE("[",I2,",",J2,"]") generates location.coordinates
I am ready to use CosmosDB Data Migration Tool. We are going to have one problem when we try to push this data to CosmosDB. We cannot specify the datatypes in CSV files. location.coordinates needs to be a JSON Array and it will get imported as text when we use the CosmosDB Data Migration tool. You need to override one of the sprocs to define location.coordinates as Json Array. I already have a post about this. You can find how to override the sproc and change data type of location.coordinates step by step.
After pushing data to CosmosDB, I go to the Data Explorer to see how my data looks like. As you can see, I have a valid geoJson object in my data model and coordinates is json array.
Now, we are ready to test CosmosDB spatial functions. There are 5 spatial functions you can use.
ST_DISTANCE(spatial expression, spatial expression)
It takes two arguments and finds the distance between two spatial objects in meters. You can make range search with it if you use it in the where clause. For example, Let's find the hurricanes got close to New Orleans in 2005. First, I need the coordination of New Orleans, I got the coordination's of New Orleans from Google maps.
SELECT h.Name,h.Year,h.Wind_WMO,
ST_DISTANCE(h.location, {'type': 'Point', 'coordinates':[-90.0,29.9]}) distance
FROM Hurricanes h
WHERE ST_DISTANCE(h.location, {'type': 'Point', 'coordinates':[-90,29.9]}) < 60000
This query lists all the hurricanes that got close (60km) to New Orleans. You can import your customers locations into database and find which customers might be affected by hurricanes in this way.
ST_WITHIN(spatial obj, spatial obj)
This function returns true if the first spatial object completely within in the second one. To demo this, I generate a polygon around New Orleans and find if there were any hurricane was in this area. I use the this polyline application to do this. This polygon can come from your user from your website. That's how Zillow makes a search when you select an area on map to search available houses. Also, you can download states, cities polygons from Census bureau and use this function by using those polygons. If I had states in my database, I could search all hurricanes in that state/county/zip etc..
SELECT h.Name,h.Year,h.Wind_WMO
FROM Hurricanes h
WHERE ST_WITHIN(h.location, {'{"type": "Polygon","coordinates": [[[-90.0280152, 30.265521],[-90.0335083,30.26315],[-90.0032959,29.3532687],[-89.1930542,29.3460904],[-89.2067871,30.267892],[-90.0280152,30.265521]]]})
This query finds all hurricanes in given polygon. It looks like I have 4 hurricanes.
ST_INTERSECTS(spatial obj, spatial obj)
I use this function frequently in SQL Server. It is great to see that Azure CosmosDB supports it too. This function returns true if the given spatial objects intersects. This is a great function to use in your where clauses. For example, if you have a polygon of a city and you have a location of something you can call this function to see if location of something is in city or not. I am going to use the same object I used for ST_WITHIN() function. I have the same polygon, I am going to use hurricane location data, Let's see if any hurricanes intersected in this area.
SELECT h.Name,h.Year,h.Wind_WMO
FROM Hurricanes h
WHERE ST_INTERSECTS(h.location, {'{"type": "Polygon","coordinates": [[[-90.0280152, 30.265521],[-90.0335083,30.26315],[-90.0032959,29.3532687],[-89.1930542,29.3460904],[-89.2067871,30.267892],[-90.0280152,30.265521]]]})
I get the same data with ST_WITHIN() function, because the hurricanes intersecting with given polygon.
ST_ISVALID(spatial obj) and ST_ISVALIDDETAILED(spatial obj)
These two functions check if given spatial object is a valid spatial object. ST_ISVALID returns a Boolean value, ST_ISVALIDDETAILED returns detailed information.
SELECT ST_ISVALID({ "type": "Point", "coordinates": [31.9, -190] }) as isvalid
SELECT ST_ISVALIDDETAILED({ "type": "Point", "coordinates": [31.9, -190] }) as isvalid
No comments:
Post a Comment