SCD type 1 & type 2 in MERGE statement

SCD type 1 & type 2 in MERGE statement

Merge for SCD with more than two conditions

In my previous article, I have explained what does the SCD and described the most popular types of Slowly Changing Dimensions. I also mentioned that for one process, one table, you can specify more than one method. In this article, I would like to show Type 1 and Type 2 for a table, which is probably the most common conjunction.

SCD Type 1 & 2 in one permanent house

Let’s create an example tables: [stg].[CustomerPhone] (source) and [dim].[CustomerPhone] (destination).
I’m planning to implement different behaviour for selected columns:

  • Telephone – always UPDATE this column, but I’m not interested in keeping history of the values. It is SCD Type 1 thus.
  • Address – INSERT new row once the value change in this column. Means keeping history, means SCD Type 2.
  • StartDate, EndDate – we need these columns to provide point in time for SCD Type 2.
  • ScdVersion – optional column.
CREATE SCHEMA dim AUTHORIZATION dbo;
GO
CREATE SCHEMA stg AUTHORIZATION dbo;
GO

CREATE TABLE dim.CustomerPhone
(
    [Id]		INT NOT NULL ,
    [Name]		VARCHAR(50) NULL ,
    [Telephone] VARCHAR(50) NULL ,  --For Update in place (without history) = SCD Type 1
    [Address]	VARCHAR(50) NULL ,  --For INSERT, so keep history = SCD Type 2
    [StartDate] DATETIME NOT NULL , --Date attribute for SCD Type 2
    [EndDate]	DATETIME NOT NULL , --Date attribute for SCD Type 2
    [ScdVersion] INT NULL			--version attribute for SCD Type 2
);

CREATE TABLE stg.CustomerPhone
(
    [Id]		INT NOT NULL ,
    [Name]		VARCHAR(50) NULL ,
    [Telephone] VARCHAR(50) NULL ,
    [Address]	VARCHAR(50) NULL ,
);
GO

Processing SCD through MERGE statement

To achieve the goal, I will use one of my favourite method – MERGE. This T-SQL statement was introduced to SQL Server 2008. In one command you can make either UPDATE, INSERT and DELETE. Considering that you can use it in conjunction with OUTPUT – makes the very powerful alternative for SSIS.

Nevertheless, MERGE does not belong to pleasant for writing. The syntax is complex and very easy to make a mistake. Not mention how to use it correctly to implement established logic. And then, SCD Merge Wizard comes with help. I will use this application to create full MERGE statement for above table.

In short, I would like to run the following script:

--Prepare test
TRUNCATE TABLE stg.CustomerPhone;
TRUNCATE TABLE dim.CustomerPhone;
GO

--TEST 0: Insert first row
INSERT INTO stg.CustomerPhone VALUES (1, 'ABC', '555-555', '5TH AVE');
SELECT * FROM stg.CustomerPhone;
EXEC dbo.MergeDimCustomer;
SELECT * FROM dim.CustomerPhone;
GO

and whole logic should be applied. Only we need to have is Stored Procedure which contains “magic” MERGE. Let’s create the statement in an easy way. Using…

SCD Merge Wizard

You can download this free, open-source application from GitHub. When you install and run it – you’ll see the main window:

Click “Next” in order to go to the first step: set Source Connection. Use “Edit” button to establish the connection to source database:

Then select a source table like below:

Next step is Target Connection. If your server is the same as a source – just use “Copy from the Source Connection” option. Then, the only difference here will be a destination table.

Click “Next” to go to “User Variables” step, which contains certain pre-defined values we will use later on.

Step: Transformations

Yeah! This is the chamber of the sorcerer. At the beginning, the app maps all columns by names. Your role is to set up the transformation rules.


Here we prepare all magic, I mean… business logic:

  • Each MERGE must have a column key: set “Business Key” for column [Id]
  • Set “SCD1” for columns [Name] and [Telephone] as we want to update these fields every time.
  • Set “SCD2” for column [Address] as we want to create a new row in dimension table once the value change.

Next window is Business Key Check. Merge is very sensitive to duplicate values for column(s) defined as “Business Key”. They must be unique in the source, otherwise, it leads to unexpected behaviour or error. It’s totally understandable. In this step, you can check your source data with only one click.

Step: Options

Options are in the next step where you can establish the behaviour of MERGE and other parts of the script. The description of each option you can find on the bottom part of the window. For our purposes set the options as presented below:

Last step: Query

Congrats! Your first MERGE statement using SCD Merge Wizard has been prepared:

Now, copy the body (T-SQL) and enclose it into Stored Procedure:

CREATE PROCEDURE dbo.MergeDimCustomer
AS
BEGIN
-- ==================================================
-- Slowly Changing Dimension script by SCD Merge Wizard
-- Author: Miljan Radovic
-- Official web site: https://github.com/SQLPlayer/SCD-Merge-Wizard/
-- Version: 4.2.1.0
-- Publish date: 08/11/2017 03:17:27
-- Script creation date: 28/01/2018 12:31:13
-- ==================================================

-- ==================================================
-- USER VARIABLES
-- ==================================================
DECLARE @CurrentDateTime datetime
DECLARE @MinDateTime datetime
DECLARE @MaxDateTime datetime

SELECT
	@CurrentDateTime = cast(getdate() as datetime),
	@MinDateTime = cast('1900-01-01' as datetime),
	@MaxDateTime = cast('9999-12-31' as datetime)


-- ==================================================
-- SCD1
-- ==================================================
MERGE [dim].[CustomerPhone] as [target]
USING
(
	SELECT
		[Address],
		[Id],
		[Name],
		[Telephone]
	FROM [stg].[CustomerPhone]
) as 
ON
(
	.[Id] = [target].[Id]
)

WHEN MATCHED AND
(
	([target].[EndDate] = @MaxDateTime OR ([target].[EndDate] IS NULL AND @MaxDateTime IS NULL))
)
AND
(
	(.[Name] <> [target].[Name] OR (.[Name] IS NULL AND [target].[Name] IS NOT NULL) OR (.[Name] IS NOT NULL AND [target].[Name] IS NULL)) OR
	(.[Telephone] <> [target].[Telephone] OR (.[Telephone] IS NULL AND [target].[Telephone] IS NOT NULL) OR (.[Telephone] IS NOT NULL AND [target].[Telephone] IS NULL))
)
AND
(
	(.[Address] = [target].[Address] OR (.[Address] IS NULL AND [target].[Address] IS NULL))
)
THEN UPDATE
SET
	[target].[Name] = .[Name],
	[target].[ScdVersion] = [target].[ScdVersion] + 1,
	[target].[Telephone] = .[Telephone]
;

-- ==================================================
-- SCD2
-- ==================================================
INSERT INTO [dim].[CustomerPhone]
(
	[Address],
	[EndDate],
	[Id],
	[Name],
	[ScdVersion],
	[StartDate],
	[Telephone]
)
SELECT
	[Address],
	[EndDate],
	[Id],
	[Name],
	[ScdVersion],
	[StartDate],
	[Telephone]
FROM
(
	MERGE [dim].[CustomerPhone] as [target]
	USING
	(
		SELECT
			[Address],
			[Id],
			[Name],
			[Telephone]
		FROM [stg].[CustomerPhone]
	) as 
	ON
	(
		.[Id] = [target].[Id]
	)
	WHEN NOT MATCHED BY TARGET
	THEN INSERT
	(
		[Address],
		[EndDate],
		[Id],
		[Name],
		[ScdVersion],
		[StartDate],
		[Telephone]
	)
	VALUES
	(
		[Address],
		@MaxDateTime,
		[Id],
		[Name],
		1,
		@MinDateTime,
		[Telephone]
	)
WHEN MATCHED AND
(
	([EndDate] = @MaxDateTime OR ([EndDate] IS NULL AND @MaxDateTime IS NULL))
)
AND
(
	([target].[Address] <> .[Address] OR ([target].[Address] IS NULL AND .[Address] IS NOT NULL) OR ([target].[Address] IS NOT NULL AND .[Address] IS NULL))
)
	THEN UPDATE
	SET
		[EndDate] = @CurrentDateTime
	OUTPUT
		$Action as [MERGE_ACTION_91025df2-1344-4e4e-84f7-e790d1f77d7c],
		.[Address] AS [Address],
		@MaxDateTime AS [EndDate],
		.[Id] AS [Id],
		.[Name] AS [Name],
		INSERTED.[ScdVersion] + 1 AS [ScdVersion],
		@CurrentDateTime AS [StartDate],
		.[Telephone] AS [Telephone]

) MERGE_OUTPUT
WHERE MERGE_OUTPUT.[MERGE_ACTION_91025df2-1344-4e4e-84f7-e790d1f77d7c] = 'UPDATE' 
	AND MERGE_OUTPUT.[Id] IS NOT NULL
;
END
GO

Let’s get the party started!

It’s high time to test our brand-new procedure. Let’s insert the first row into staging and check if the row will be copied to target dimension.

--TEST 0: Insert first row
TRUNCATE TABLE stg.CustomerPhone;
INSERT INTO stg.CustomerPhone VALUES (1, 'ABC', '555-555', '5TH AVE');
SELECT * FROM stg.CustomerPhone;
EXEC dbo.MergeDimCustomer;
SELECT * FROM dim.CustomerPhone;
GO

So far, so good.
New customer added to staging should also appear in the target table.

--TEST 1: MATCH doesn't happen using Id, then the record must be inserted
INSERT INTO stg.CustomerPhone VALUES (2, 'DEF', '666', '5TH AVE');
EXEC dbo.MergeDimCustomer;
SELECT * FROM dim.CustomerPhone;
GO

New records always have min and max date, as we requested.

Test: SCD Type 1 (Update in place)

Now, we will test change for column [Telephone], which has been configured as SCD1. Update on this columns should cause a simple update on destination row, only for that column, nothing else. Let’s check out:

--TEST 2: MATCH but only [Telephone] field has changed - UPDATE only
--Means SCD1
UPDATE stg.CustomerPhone
SET [Telephone] = '777'
WHERE Id = 2;
EXEC dbo.MergeDimCustomer;
SELECT * FROM dim.CustomerPhone;
GO

Great. A new value for telephone can be seen, no other columns were changed.

Test: SCD Type 2 (Update+Insert)

Next test is to update [Address] column which has been configured as SCD2. That means we want to keep history changes and must create new row holding old version as well.

--TEST 3: MATCH happens but only the [Address] field has changed I want to add a new record and update the start and end dates.
--means SCD Type 2
UPDATE stg.CustomerPhone
SET [Address] = 'Summerhill Way'
WHERE Id = 2;
EXEC dbo.MergeDimCustomer;
SELECT * FROM dim.CustomerPhone;
GO

In above example we experienced following actions:

  1. New record has been created with [StartDate] as Current Time – GETDATE()
  2. The previous record has been marked as ended at Current Time (in normal scenario, I suggest to use real time of the change from OLTP, when available)

Summary

As you noticed – you can very quickly develop the business requirements doing the loading process into Data Warehouse’s dimension. Furthermore, you are able to save to a file whole mapping, connections and all options. Need to change the logic? Simply, open saved project, made the changes and replace procedure.
At any time you can review the Stored Procedure and even maintenance of that code is much easier comparing to SSIS.

SCD Merge Wizard on GitHublast release
Facebook profile

Give me a shout if you have any concerns or questions.

Thanks!

Previous Last week reading (2018-01-28)
Next Last week reading (2018-02-04)

About author

Kamil Nowinski
Kamil Nowinski 200 posts

Blogger, speaker. Data Platform MVP, MCSE. Senior Data Engineer & data geek. Member of Data Community Poland, co-organizer of SQLDay, Happy husband & father.

View all posts by this author →

You might also like

Data Warehousing 1Comments

Slowly Changing Dimensions (SCD)

In today’s article I’d would like to focus on Slowly changing dimension, aka SCD. Dimension is a word excerpted from data warehousing as such. Not without a reason SCD is used

General 0 Comments

BI Connected by SII – conference in Wrocław

Good day SQL folks! The conference season has begun. You can find very interesting conferences every week in almost every place in the globe. Of course you do not need

Auditing in Azure SQL Data Warehouse

The first article in a series on “Security Intelligence in Azure PaaS” inspired me to write something on auditing in Azure SQL Data Warehouse. To put it simply for anyone

18 Comments

  1. DD
    July 17, 10:35 Reply

    After I downloaded the git folder how do I open the wizard?

    • Kamil Nowinski
      July 17, 22:56 Reply

      If you downloaded source code – there is a project file in “SCD-Merge-Wizard” folder. Open it in Visual Studio. Alternatively, you can download MSI file and install the tool. Let me know if you have any other questions.

  2. Persistent
    November 28, 20:14 Reply

    Great stuff! The MERGE statement is really useful, but a hassle to use, so a generator is a fantastic idea.

    Just had a chance to play with your sample. However I had errors in the MergeDimCustomer sproc when copying it from the web page (line 99 among other things), but following your sample does produce a correct result 🙂

    Could you elaborate on the effect of the various settings on the Options pane?

    • Kamil Nowinski
      November 28, 21:06 Reply

      Thank you for your kind words. Much appreciated!
      In terms of a bug in script: one of recently installed WordPress’ plugin had to screw it up as the code has not been changed since post publishing. I applied better plugin to present T-SQL code in this post and everything should be fine now. Thanks for being vigilant!
      Option pane – great idea as I suppose that some of those options could not have to be obvious. Stay tuned.
      Thank you for reading. I hope that the post helped you.

  3. deniznur
    January 29, 09:07 Reply

    Hi Kamil,

    I can’t see MSI file. How can I reach and download MSI File ? Can you help me ?

  4. akash
    June 20, 14:58 Reply

    Hi Kamil,

    Great article and really helpful. It does what I want to do so far. I wanted to integrate additional feature to add a row with null values when something is deleted. How could I achieve that?

  5. dataanalytics
    June 25, 22:37 Reply

    Hi Kamil,

    Great job in putting this together. It helped a lot. I have a question.

    For SCD2, can I record changes in multiple columns instead of just one? In your example, what if we want to change telephone no also along with address? What kind of changes are required in the code?

    • Kamil Nowinski
      June 25, 23:04 Reply

      hey dataanalytics. I’m glad that it is helpful.
      In a case when you want to track more columns changes – simply, in Step: Transformations, do set up ‘SCD2’ for ‘Address’ in ‘Transformation’ column rather than SCD1. That’s all.

  6. Jenda
    December 03, 16:24 Reply

    Thank you for an excellent tool. I would like to ask. Is there a way to add a row to transformations in a saved project if the source / target table structure has changed (add a new column)? Thanx

    • Kamil Nowinski
      December 04, 15:45 Reply

      Thanks for your comment and question, Jenda. Good one! Right, that might be a tricky as you’ll not find any button to refresh the list of columns. However, once you reopen project from previously saved file – a new column appears. Bear in mind that this takes place when the application will be restarted only.

  7. Manga
    February 17, 00:45 Reply

    hi Kamil,
    I am an ETL tester. How do I test SCD’s ?

    • Kamil Nowinski
      February 21, 18:04 Reply

      Assuming that only SQL Server is involved in your SCD – you can use tSQLt for Unit tests. Everything depends on your test scenario.

  8. Travis
    November 17, 16:45 Reply

    Hello, thanks for great tool. I am trying to use the setting “Update Target Field” for option “Records on Target not found on Source”, but when I try to continue on I receive the error message “cannot select update target field when there are no ‘deleted date’ and/or ‘is deleted’ transformations defined”. How do I define this transformation?

  9. BV
    February 23, 21:47 Reply

    Thank you, with this approach – I think both the staging database and DW database should be on the same server ? Trying to see if there are any other options for implementation of SCD Type 2 (other than wizard) and not having dependency of stage and DW databases on same sql server.

    • Kamil Nowinski
      February 26, 16:28 Reply

      Correct. Both should be on the same server, I can’t see any sensible reason to do that other way (performance). Other options? You can build your own generator in a Stored Procedure based on metadata.

  10. Vedant
    August 02, 10:29 Reply

    Not working if there is a foreign key in target table.

Leave a Reply

Click here to cancel reply.