Companies have separate data warehouse databases for reports and analyzing the data. Usually these separate environments have copy of OLTP Environment tables. You may need to sync these tables daily/hourly by using SSIS. The challenge in this situation is to find what needs to be copied from OLTP tables to your data warehouse tables. Finding the new rows should be easier but finding what is updated might not be that easy. You have to find updated rows fast without locking OLTP tables.
SQL Server 2016 has temporal table options which might be a good option depending on your table sizes. I am going to give you another option in this post. SQL Server has a function named CHECKSUM. This function has been with SQL Server since 2008. CHECKSUM is intended for use in building hash indexes.It accepts number of values and creates a unique integer number for given results. It computes a hash value over its list of arguments. Here is an example using it.
SELECT CHECKSUM('test',1,9000);
SELECT CHECKSUM('test',2,9000);
SELECT CHECKSUM('test',1,9000);
176718392 176718344 176718392
As you can see if the values are same, CHECKSUM returns the same integer value. I am going to use this function to create hash values for the tables i want to compare. I am going to use the Wide World Importers sample database in this demo. I pick the table named Sales.Invoices and I am going to use some of its columns. Let's say we have this table in OLTP server and it constantly gets updating by a web application. Also table exists in the warehouse database in a different server. We need to keep these tables sync.
I am going to use two temp tables to simulate the tables in different environments. Table named OLTP_Customers represents the OLTP Environment table, Warehouse_Customers represents the table in Data Warehouse server. First let's populate the both tables with the same data.
DECLARE @OLTP_Invoices as TABLE ( InvoiceId int PRIMARY KEY, CustomerId int, OrderId int, InvoiceDate datetime ) DECLARE @Warehouse_Invoices as TABLE ( InvoiceId int PRIMARY KEY, CustomerId int, OrderId int, InvoiceDate datetime ) INSERT INTO @OLTP_Invoices (InvoiceId, CustomerId, OrderId,
InvoiceDate) Select InvoiceId, CustomerId, OrderId, InvoiceDate FROM Sales.Invoices INSERT INTO @Warehouse_Invoices (InvoiceId, CustomerId, OrderId,
InvoiceDate) Select InvoiceId, CustomerId, OrderId, InvoiceDate FROM @OLTP_Invoices
Now we have two tables with the same data. Let's run bunch of updates and inserts in the OLTP table
UPDATE @OLTP_Invoices set Comments = 'comment is updated'
where invoicedate between '2013-1-1' and '2013-2-2' INSERT INTO @OLTP_Invoices (InvoiceId, CustomerId, OrderId) VALUES (80000,100,500),(800001,101,501),(800002,102,502), (800003,103,503),(800004,104,504),(800005,105,505)
Now the tables doesn't match anymore. Let''s use CHECKSUM to create hash for both tables
SELECT InvoiceId,CustomerId, OrderId, InvoiceDate, CHECKSUM(InvoiceId, CustomerId, OrderId, InvoiceDate) FROM @OLTP_Invoices SELECT InvoiceId,CustomerId, OrderId, InvoiceDate, CHECKSUM(InvoiceId, CustomerId, OrderId, InvoiceDate) FROM @Warehouse_Invoices
As you can see we have a new column and it has the integer number created by CHECKSUM.
Now all we need to do is to join these tables by the primary key and compare the CHECKSUM values to find what is updated in OLTP table. In your environment probably you are going to do this part by using SSIS. You may need to add a new column to your source and destination table in SSIS and populate this new column with CHECKSUM then you can compare the tables.
Select * from @OLTP_Invoices as o join @Warehouse_Invoices as w on o.InvoiceId = w.InvoiceId Where CHECKSUM(o.InvoiceId, o.CustomerId, o.OrderId, o.InvoiceDate,o.Comments) != CHECKSUM(w.InvoiceId, w.CustomerId, w.OrderId, w.InvoiceDate, w.Comments)
I found what is updated pretty fast. 6 seconds contains the time creating the tables, populating them, updating them and finding the updated results. What about the new rows? We can use the EXCEPT to do that. Here is how.
SELECT * FROM @OLTP_Invoices WHERE InvoiceId in ( select invoiceId from @OLTP_Invoices as o EXCEPT select invoiceId from @Warehouse_Invoices as w )
Here is the new rows we need to copy to Warehouse.
We have all the updated and new rows now, we can go a head and sync the data warehouse table now.

No comments:
Post a Comment