Monday, July 22, 2019

Making smart decisions by using Graph Processing Tables and Spatial Data in SQL Server


     Today, I want to show you how Graph Processing Tables can make your data models flexible and smart. Let's say we work in a e-commerce company, we have many users and products just like Amazon. We also have many warehouses, same product might be located in multiple warehouses. Whenever we want to ship a product, we want to pick the closest warehouse to buyer. In this way, we should be able save good amount of money for shipping and products will arrive to our customers locations faster.

     I am planning to have three entity tables (Products, Users and Warehouses) and two relation tables (InStock and InProcess). Here is the script creates those tables in SQL Server 2017 or later.


DROP Table if exists products, warehouses, users, instock, inprocess

CREATE Table products (
id int primary key identity(1,1), 
name varchar(50)) as NODE

CREATE Table warehouses (
id int primary key identity(1,1), 
name varchar(50), 
location geography) as NODE

CREATE Table users (
id int primary key identity(1,1), 
name varchar(50), 
city varchar(15), 
citygeog geography) as NODE

CREATE Table instock (
INDEX Ix_instock_from ($from_id, $to_id)
) as EDGE

CREATE Table inprocess(
soldOn datetime
INDEX Ix_inprocess_from ($from_id, $to_id) INCLUDE(soldon)
) as EDGE

     Our schema is ready.As you can see, I have spatial data type column in Warehouses table and Users table. I found the latitude and longtitude of the cities from Google Maps for Warehouse table. For users table, I don't need exact location of users. If I have the city or zipcode of my users, I should be able to find latitude of longtitude that city or zipcode from Census Bureau's free datasets.
(If you have SQL Server 2017, You might get error when you try to create indexes in relation tables by using CREATE TABLE, to solve that problem use ALTER TABLE to create relation indexes in InStock and InProcess tables.)

     Next, I will insert some data in the entity tables. Warehouses table contains the location of each warehouse, also Users table contains the location of user's city location.

INSERT into products (name) values 
('Kindle Fire'),('Kindle Fire HD'),('Ipad'),('Ms Surface'), ('Dining Room Set'), 
('27" Monitor'),('Outdoor Furniture Set'),('Xbox'),('Ps4'),('Cell Phone')

INSERT into users values 
('Liz','Kansas City', geography::Point(39.0997, -94.5785, 4326)),
('Tina','Memphis',geography::Point(35.1495, -90.0489, 4326)),
('Tom','Cleveland',geography::Point(41.4993, -81.6943, 4326)),
('Bob','Boston',geography::Point(39.0997, -94.5785, 4326)),
('George','Virginia Beach',geography::Point(42.3600, -71.0588, 4326)),
('Lisa','Jacksonville',geography::Point(30.3321, -81.6556, 4326)),
('Mike','Las Vegas',geography::Point(36.1699, -115.1398, 4326)),
('Alec','Los Angeles',geography::Point(34.0522, -118.2436, 4326)),
('Anna','Salt Lake City',geography::Point(40.7607, -111.8910, 4326)),
('Avery','Chicago',geography::Point(41.8781, -87.6297, 4326)),

('Bridgette','Napa Valley',geography::Point(38.5024, -122.2653, 4326))

INSERT into warehouses (name, location) values
('Dallas', geography::Point(32.7767, -96.7970, 4326)),
('Detroit', geography::Point(42.3314, -83.0458, 4326)),
('Atlanta', geography::Point(33.7490, -84.3880, 4326)),
('Los Angeles', geography::Point(34.0522, -118.2437, 4326)),
('Denver', geography::Point(39.7392, -104.9903, 4326)),
('Minnesota', geography::Point(46.7296, -94.6859, 4326)),
('Seattle', geography::Point(47.6062, -122.3321, 4326)),

('New York', geography::Point(40.7128, -74.0060, 4326))

     We are ready to create relations between the entity tables. Following script creates relations between Warehouses and Products by using InStock relation. In this way, we know which product is available in which warehouse.

INSERT into instock values
(select $node_id from warehouses where id =1), 
(select $node_id from products where id = 2));

INSERT into instock values
(select $node_id from warehouses where id =1),
(select $node_id from products where id = 6)); 

INSERT into instock values
(select $node_id from warehouses where id =2), 
(select $node_id from products where id = 1)); 

INSERT into instock values
(select $node_id from warehouses where id =2), 
(select $node_id from products where id = 5)); 

INSERT into instock values
(select $node_id from warehouses where id =3), 
(select $node_id from products where id = 3));

INSERT into instock values
(select $node_id from warehouses where id =3),
(select $node_id from products where id = 4)); 

INSERT into instock values
(select $node_id from warehouses where id =4), 
(select $node_id from products where id = 4)); 

INSERT into instock values
(select $node_id from warehouses where id =4), 
(select $node_id from products where id = 2)); 

INSERT into instock values
(select $node_id from warehouses where id =5), 
(select $node_id from products where id = 5)); 

INSERT into instock values
(select $node_id from warehouses where id =5), 
(select $node_id from products where id = 3)); 

INSERT into instock values
(select $node_id from warehouses where id =6), 
(select $node_id from products where id = 6)); 

INSERT into instock values
(select $node_id from warehouses where id =6), 
(select $node_id from products where id = 5)); 

INSERT into instock values
(select $node_id from warehouses where id =7), 
(select $node_id from products where id = 5)); 

INSERT into instock values
(select $node_id from warehouses where id =7), 
(select $node_id from products where id = 6)); 

INSERT into instock values
(select $node_id from warehouses where id =8), 
(select $node_id from products where id = 2)); 

INSERT into instock values
(select $node_id from warehouses where id =8), 
(select $node_id from products where id = 4));

     Next, I will create some relations between Users and Products by using InProcess relation. This relation should be created by our web application when user completes a transaction and buy product or products. This will be a good point to remind you that you cannot update relations. You need to drop a relation and recreate it. You can update relation table's custom column not the $from_id and $to_id

INSERT into inprocess values (
(select $Node_id from users where id = 1), 
(select $Node_id from products where id =4), getdate()); 

INSERT into inprocess values (
(select $Node_id from users where id = 1), 
(select $Node_id from products where id =3), getdate()); 

INSERT into inprocess values (
(select $Node_id from users where id = 2), 
(select $Node_id from products where id =1), getdate()); 

INSERT into inprocess values (
(select $Node_id from users where id = 3), 
(select $Node_id from products where id =3), getdate()); 

INSERT into inprocess values (
(select $Node_id from users where id = 4), 
(select $Node_id from products where id =2), getdate()); 

INSERT into inprocess values (
(select $Node_id from users where id = 5), 
(select $Node_id from products where id =6), getdate());

INSERT into inprocess values (
(select $Node_id from users where id = 6), 
(select $Node_id from products where id =1), getdate()); 

INSERT into inprocess values (
(select $Node_id from users where id = 7), 
(select $Node_id from products where id =5), getdate()); 

INSERT into inprocess values (
(select $Node_id from users where id = 8), 
(select $Node_id from products where id =3), getdate()); 

INSERT into inprocess values (
(select $Node_id from users where id = 9), 
(select $Node_id from products where id =4), getdate()); 

INSERT into inprocess values (
(select $Node_id from users where id = 10), 
(select $Node_id from products where id =2), getdate()); 

INSERT into inprocess values (
(select $Node_id from users where id = 11), 
(select $Node_id from products where id =4), getdate());

     Whenever user orders a product. I create a relation between the product and user. Now I want to pick the closest warehouse that has this product so we can ship this product to user from closest warehouse. It will be cheaper for us to ship the product also user will receive the product faster this  way. Graph Processing Tables give us a great flexibility when it comes to the relations. Here how easy it is to pick the closest warehouse to the users thanks to spatial data and Graph Tables structures.

SELECT username, city, productname, warehouse, miles, WPriority
FROM
(
   SELECT u.id as userid, u.name as username, u.city,  u.citygeog.STDistance(w.location)/1609.344 as miles, p.id as productid, p.name as productname, w.name as warehouse, 
RANK() OVER (Partition by p.name Order by u.citygeog.STDistance(w.location)/1609.344) as WPriority
   FROM users u, inprocess inp, products p, instock ins, warehouses w
   WHERE MATCH(u-(inp)->p<-(ins)-w)
) as temp
WHERE temp.WPriority =1

This returns the following results.
We can use this data to find closest warehouse to our users. Or you can analyze the data and make some products available in other warehouses. For example, first row tells us that 27" monitor will be shipped to George from Minnesota. If there are many shippings like this, we might be able to save money by making this product available in other warehouses.

     Inner Query finds all the inprocess orders and calculates the distance between users and the warehouses. I used STDistance spatial function to find the distances between two spatial object. You can read about it in my older post.  Same products might exist in multiple warehouses, to pick the closest one, I used RANK function to prioritize the warehouses depending on the distance between user and warehouse. Outer query picks the closest warehouses by filtering by prioritize column.

1 comment:

  1. Nice use of graph concept along with geography datatype. Thanks sharing this.

    For readers who are not much aware like me this link will help you get the fundamentals first to able to consume this article -https://docs.microsoft.com/en-us/sql/t-sql/queries/match-sql-graph?view=sql-server-ver16

    ReplyDelete