SCD type 1 & type 2 in MERGE statement
Merge for SCD with more than two conditions
In my previous article, I have explained what does the SCD and described the most popular types of Slowly Changing Dimensions. I also mentioned that for one process, one table, you can specify more than one method. In this article, I would like to show Type 1 and Type 2 for a table, which is probably the most common conjunction.
SCD Type 1 & 2 in one permanent house
Let’s create an example tables: [stg].[CustomerPhone] (source) and [dim].[CustomerPhone] (destination).
I’m planning to implement different behaviour for selected columns:
- Telephone – always UPDATE this column, but I’m not interested in keeping history of the values. It is SCD Type 1 thus.
- Address – INSERT new row once the value change in this column. Means keeping history, means SCD Type 2.
- StartDate, EndDate – we need these columns to provide point in time for SCD Type 2.
- ScdVersion – optional column.
CREATE SCHEMA dim AUTHORIZATION dbo; GO CREATE SCHEMA stg AUTHORIZATION dbo; GO CREATE TABLE dim.CustomerPhone ( [Id] INT NOT NULL , [Name] VARCHAR(50) NULL , [Telephone] VARCHAR(50) NULL , --For Update in place (without history) = SCD Type 1 [Address] VARCHAR(50) NULL , --For INSERT, so keep history = SCD Type 2 [StartDate] DATETIME NOT NULL , --Date attribute for SCD Type 2 [EndDate] DATETIME NOT NULL , --Date attribute for SCD Type 2 [ScdVersion] INT NULL --version attribute for SCD Type 2 ); CREATE TABLE stg.CustomerPhone ( [Id] INT NOT NULL , [Name] VARCHAR(50) NULL , [Telephone] VARCHAR(50) NULL , [Address] VARCHAR(50) NULL , ); GO
Processing SCD through MERGE statement
To achieve the goal, I will use one of my favourite method – MERGE. This T-SQL statement was introduced to SQL Server 2008. In one command you can make either UPDATE, INSERT and DELETE. Considering that you can use it in conjunction with OUTPUT – makes the very powerful alternative for SSIS.
Nevertheless, MERGE does not belong to pleasant for writing. The syntax is complex and very easy to make a mistake. Not mention how to use it correctly to implement established logic. And then, SCD Merge Wizard comes with help. I will use this application to create full MERGE statement for above table.
In short, I would like to run the following script:
--Prepare test TRUNCATE TABLE stg.CustomerPhone; TRUNCATE TABLE dim.CustomerPhone; GO --TEST 0: Insert first row INSERT INTO stg.CustomerPhone VALUES (1, 'ABC', '555-555', '5TH AVE'); SELECT * FROM stg.CustomerPhone; EXEC dbo.MergeDimCustomer; SELECT * FROM dim.CustomerPhone; GO
and whole logic should be applied. Only we need to have is Stored Procedure which contains “magic” MERGE. Let’s create the statement in an easy way. Using…
SCD Merge Wizard
You can download this free, open-source application from GitHub. When you install and run it – you’ll see the main window:
Click “Next” in order to go to the first step: set Source Connection. Use “Edit” button to establish the connection to source database:
Then select a source table like below:
Next step is Target Connection. If your server is the same as a source – just use “Copy from the Source Connection” option. Then, the only difference here will be a destination table.
Click “Next” to go to “User Variables” step, which contains certain pre-defined values we will use later on.
Step: Transformations
Yeah! This is the chamber of the sorcerer. At the beginning, the app maps all columns by names. Your role is to set up the transformation rules.
Here we prepare all magic, I mean… business logic:
- Each MERGE must have a column key: set “Business Key” for column [Id]
- Set “SCD1” for columns [Name] and [Telephone] as we want to update these fields every time.
- Set “SCD2” for column [Address] as we want to create a new row in dimension table once the value change.
Next window is Business Key Check. Merge is very sensitive to duplicate values for column(s) defined as “Business Key”. They must be unique in the source, otherwise, it leads to unexpected behaviour or error. It’s totally understandable. In this step, you can check your source data with only one click.
Step: Options
Options are in the next step where you can establish the behaviour of MERGE and other parts of the script. The description of each option you can find on the bottom part of the window. For our purposes set the options as presented below:
Last step: Query
Congrats! Your first MERGE statement using SCD Merge Wizard has been prepared:
Now, copy the body (T-SQL) and enclose it into Stored Procedure:
CREATE PROCEDURE dbo.MergeDimCustomer AS BEGIN -- ================================================== -- Slowly Changing Dimension script by SCD Merge Wizard -- Author: Miljan Radovic -- Official web site: https://github.com/SQLPlayer/SCD-Merge-Wizard/ -- Version: 4.2.1.0 -- Publish date: 08/11/2017 03:17:27 -- Script creation date: 28/01/2018 12:31:13 -- ================================================== -- ================================================== -- USER VARIABLES -- ================================================== DECLARE @CurrentDateTime datetime DECLARE @MinDateTime datetime DECLARE @MaxDateTime datetime SELECT @CurrentDateTime = cast(getdate() as datetime), @MinDateTime = cast('1900-01-01' as datetime), @MaxDateTime = cast('9999-12-31' as datetime) -- ================================================== -- SCD1 -- ================================================== MERGE [dim].[CustomerPhone] as [target] USING ( SELECT [Address], [Id], [Name], [Telephone] FROM [stg].[CustomerPhone] ) as ON ( .[Id] = [target].[Id] ) WHEN MATCHED AND ( ([target].[EndDate] = @MaxDateTime OR ([target].[EndDate] IS NULL AND @MaxDateTime IS NULL)) ) AND ( (.[Name] <> [target].[Name] OR (.[Name] IS NULL AND [target].[Name] IS NOT NULL) OR (.[Name] IS NOT NULL AND [target].[Name] IS NULL)) OR (.[Telephone] <> [target].[Telephone] OR (.[Telephone] IS NULL AND [target].[Telephone] IS NOT NULL) OR (.[Telephone] IS NOT NULL AND [target].[Telephone] IS NULL)) ) AND ( (.[Address] = [target].[Address] OR (.[Address] IS NULL AND [target].[Address] IS NULL)) ) THEN UPDATE SET [target].[Name] = .[Name], [target].[ScdVersion] = [target].[ScdVersion] + 1, [target].[Telephone] = .[Telephone] ; -- ================================================== -- SCD2 -- ================================================== INSERT INTO [dim].[CustomerPhone] ( [Address], [EndDate], [Id], [Name], [ScdVersion], [StartDate], [Telephone] ) SELECT [Address], [EndDate], [Id], [Name], [ScdVersion], [StartDate], [Telephone] FROM ( MERGE [dim].[CustomerPhone] as [target] USING ( SELECT [Address], [Id], [Name], [Telephone] FROM [stg].[CustomerPhone] ) as ON ( .[Id] = [target].[Id] ) WHEN NOT MATCHED BY TARGET THEN INSERT ( [Address], [EndDate], [Id], [Name], [ScdVersion], [StartDate], [Telephone] ) VALUES ( [Address], @MaxDateTime, [Id], [Name], 1, @MinDateTime, [Telephone] ) WHEN MATCHED AND ( ([EndDate] = @MaxDateTime OR ([EndDate] IS NULL AND @MaxDateTime IS NULL)) ) AND ( ([target].[Address] <> .[Address] OR ([target].[Address] IS NULL AND .[Address] IS NOT NULL) OR ([target].[Address] IS NOT NULL AND .[Address] IS NULL)) ) THEN UPDATE SET [EndDate] = @CurrentDateTime OUTPUT $Action as [MERGE_ACTION_91025df2-1344-4e4e-84f7-e790d1f77d7c], .[Address] AS [Address], @MaxDateTime AS [EndDate], .[Id] AS [Id], .[Name] AS [Name], INSERTED.[ScdVersion] + 1 AS [ScdVersion], @CurrentDateTime AS [StartDate], .[Telephone] AS [Telephone] ) MERGE_OUTPUT WHERE MERGE_OUTPUT.[MERGE_ACTION_91025df2-1344-4e4e-84f7-e790d1f77d7c] = 'UPDATE' AND MERGE_OUTPUT.[Id] IS NOT NULL ; END GO
Let’s get the party started!
It’s high time to test our brand-new procedure. Let’s insert the first row into staging and check if the row will be copied to target dimension.
--TEST 0: Insert first row TRUNCATE TABLE stg.CustomerPhone; INSERT INTO stg.CustomerPhone VALUES (1, 'ABC', '555-555', '5TH AVE'); SELECT * FROM stg.CustomerPhone; EXEC dbo.MergeDimCustomer; SELECT * FROM dim.CustomerPhone; GO
So far, so good.
New customer added to staging should also appear in the target table.
--TEST 1: MATCH doesn't happen using Id, then the record must be inserted INSERT INTO stg.CustomerPhone VALUES (2, 'DEF', '666', '5TH AVE'); EXEC dbo.MergeDimCustomer; SELECT * FROM dim.CustomerPhone; GO
New records always have min and max date, as we requested.
Test: SCD Type 1 (Update in place)
Now, we will test change for column [Telephone], which has been configured as SCD1. Update on this columns should cause a simple update on destination row, only for that column, nothing else. Let’s check out:
--TEST 2: MATCH but only [Telephone] field has changed - UPDATE only --Means SCD1 UPDATE stg.CustomerPhone SET [Telephone] = '777' WHERE Id = 2; EXEC dbo.MergeDimCustomer; SELECT * FROM dim.CustomerPhone; GO
Great. A new value for telephone can be seen, no other columns were changed.
Test: SCD Type 2 (Update+Insert)
Next test is to update [Address] column which has been configured as SCD2. That means we want to keep history changes and must create new row holding old version as well.
--TEST 3: MATCH happens but only the [Address] field has changed I want to add a new record and update the start and end dates. --means SCD Type 2 UPDATE stg.CustomerPhone SET [Address] = 'Summerhill Way' WHERE Id = 2; EXEC dbo.MergeDimCustomer; SELECT * FROM dim.CustomerPhone; GO
In above example we experienced following actions:
- New record has been created with [StartDate] as Current Time – GETDATE()
- The previous record has been marked as ended at Current Time (in normal scenario, I suggest to use real time of the change from OLTP, when available)
Summary
As you noticed – you can very quickly develop the business requirements doing the loading process into Data Warehouse’s dimension. Furthermore, you are able to save to a file whole mapping, connections and all options. Need to change the logic? Simply, open saved project, made the changes and replace procedure.
At any time you can review the Stored Procedure and even maintenance of that code is much easier comparing to SSIS.
SCD Merge Wizard on GitHub – last release
Facebook profile
Give me a shout if you have any concerns or questions.
Thanks!
About author
You might also like
BI Connected by SII – conference in Wrocław
Good day SQL folks! The conference season has begun. You can find very interesting conferences every week in almost every place in the globe. Of course you do not need
Auditing in Azure SQL Data Warehouse
The first article in a series on “Security Intelligence in Azure PaaS” inspired me to write something on auditing in Azure SQL Data Warehouse. To put it simply for anyone
Slowly Changing Dimensions (SCD)
In today’s article I’d would like to focus on Slowly changing dimension, aka SCD. Dimension is a word excerpted from data warehousing as such. Not without a reason SCD is used
18 Comments
DD
July 17, 10:35After I downloaded the git folder how do I open the wizard?
Kamil Nowinski
July 17, 22:56If you downloaded source code – there is a project file in “SCD-Merge-Wizard” folder. Open it in Visual Studio. Alternatively, you can download MSI file and install the tool. Let me know if you have any other questions.
Persistent
November 28, 20:14Great stuff! The MERGE statement is really useful, but a hassle to use, so a generator is a fantastic idea.
Just had a chance to play with your sample. However I had errors in the MergeDimCustomer sproc when copying it from the web page (line 99 among other things), but following your sample does produce a correct result 🙂
Could you elaborate on the effect of the various settings on the Options pane?
Kamil Nowinski
November 28, 21:06Thank you for your kind words. Much appreciated!
In terms of a bug in script: one of recently installed WordPress’ plugin had to screw it up as the code has not been changed since post publishing. I applied better plugin to present T-SQL code in this post and everything should be fine now. Thanks for being vigilant!
Option pane – great idea as I suppose that some of those options could not have to be obvious. Stay tuned.
Thank you for reading. I hope that the post helped you.
deniznur
January 29, 09:07Hi Kamil,
I can’t see MSI file. How can I reach and download MSI File ? Can you help me ?
Kamil Nowinski
January 30, 11:33hi denizur,
The MSI file should be present here: https://github.com/SQLPlayer/SCD-Merge-Wizard/releases
There is a link at the end of this post.
akash
June 20, 14:58Hi Kamil,
Great article and really helpful. It does what I want to do so far. I wanted to integrate additional feature to add a row with null values when something is deleted. How could I achieve that?
Kamil Nowinski
June 20, 21:49Thank you Akash, I’m really glad that it’s helping you. Could you raise a new issue on GitHub and describe what exactly do you need with an example, please?
https://github.com/SQLPlayer/SCD-Merge-Wizard/issues
dataanalytics
June 25, 22:37Hi Kamil,
Great job in putting this together. It helped a lot. I have a question.
For SCD2, can I record changes in multiple columns instead of just one? In your example, what if we want to change telephone no also along with address? What kind of changes are required in the code?
Kamil Nowinski
June 25, 23:04hey dataanalytics. I’m glad that it is helpful.
In a case when you want to track more columns changes – simply, in Step: Transformations, do set up ‘SCD2’ for ‘Address’ in ‘Transformation’ column rather than SCD1. That’s all.
Jenda
December 03, 16:24Thank you for an excellent tool. I would like to ask. Is there a way to add a row to transformations in a saved project if the source / target table structure has changed (add a new column)? Thanx
Kamil Nowinski
December 04, 15:45Thanks for your comment and question, Jenda. Good one! Right, that might be a tricky as you’ll not find any button to refresh the list of columns. However, once you reopen project from previously saved file – a new column appears. Bear in mind that this takes place when the application will be restarted only.
Manga
February 17, 00:45hi Kamil,
I am an ETL tester. How do I test SCD’s ?
Kamil Nowinski
February 21, 18:04Assuming that only SQL Server is involved in your SCD – you can use tSQLt for Unit tests. Everything depends on your test scenario.
Travis
November 17, 16:45Hello, thanks for great tool. I am trying to use the setting “Update Target Field” for option “Records on Target not found on Source”, but when I try to continue on I receive the error message “cannot select update target field when there are no ‘deleted date’ and/or ‘is deleted’ transformations defined”. How do I define this transformation?
BV
February 23, 21:47Thank you, with this approach – I think both the staging database and DW database should be on the same server ? Trying to see if there are any other options for implementation of SCD Type 2 (other than wizard) and not having dependency of stage and DW databases on same sql server.
Kamil Nowinski
February 26, 16:28Correct. Both should be on the same server, I can’t see any sensible reason to do that other way (performance). Other options? You can build your own generator in a Stored Procedure based on metadata.
Vedant
August 02, 10:29Not working if there is a foreign key in target table.