Introduction
In today’s data-driven world, maintaining the historical accuracy of data is crucial for effective business analytics. Slowly Changing Dimension (SCD) Type 2 is one of the most commonly used methods for tracking changes in dimension tables over time. In this blog post, I’ll walk you through the process of implementing SCD Type 2 in Informatica Intelligent Cloud Services (IICS) using the inbuilt template mapping. We’ll also explore how to test the implementation with sample data.
Understanding SCD Type 2
SCD Type 2 allows us to preserve the history of data by creating multiple records for each entity, with separate rows for each change. This enables us to track changes to key attributes over time, providing a comprehensive view of data evolution.
Tables Used in the Implementation
To demonstrate the SCD Type 2 implementation, we use two tables:
src_customer
: This is the source table where customer data is stored.
CREATE TABLE raw.src_customer (
customer_id INT,
customer_name STRING,
customer_address STRING,
order_id INT,
create_date DATE,
modift_date DATE
);
dim_customer_scd2
: This is the dimension table that follows the SCD Type 2 design. It stores the historical data along with the current data.
CREATE TABLE raw.dim_customer_scd2 (
id INT,
customer_id INT,
customer_name STRING,
customer_address STRING,
order_id INT,
crc_num INT,
current_flag STRING,
valid_from DATE,
valid_to DATE
);
crc_num
: Cyclic Redundancy Check (CRC) number used to detect changes in the data.current_flag
: A flag to indicate the current record (Y for current, N for historical).valid_from
andvalid_to
: Date fields to store the period during which a record was valid.
Steps to Implement SCD Type 2 in IICS
- Create the Source and Target Tables: As defined above, create the
src_customer
anddim_customer_scd2
tables in your database. - Populate the Source Table: Insert sample data into the
src_customer
table. This data will be used to test the SCD Type 2 implementation.
INSERT INTO raw.src_customer (customer_id, customer_name, customer_address, order_id, create_date, modift_date) VALUES
(101, 'John Doe', '123 Elm St',1001,'2024-08-30','2024-08-30'),
(102, 'Jane Smith', '456 Oak St',1002,'2024-08-30','2024-08-30'),
(103, 'Alice Brown', '789 Pine St',1003,'2024-08-30','2024-08-30');
3.Configure the IICS Mapping:
- Use the inbuilt template mapping in IICS for SCD Type 2. The mapping should include the following:
- Source Transformation: Reading data from the
src_customer
table. - Expression Transformation: To calculate the CRC and other necessary fields.
- Lookup Transformation: To check if the record exists in the
dim_customer_scd2
table. - Router Transformation: To route new and updated records accordingly.
- Update Strategy Transformation: To insert new records and update existing records with historical data.
- Target Transformation: To load data into the
dim_customer_scd2
table.
- Source Transformation: Reading data from the
4. Test the Implementation: Once the mapping is configured, run the workflow and verify the results by querying the dim_customer_scd2
table.
Testing the SCD Type 2 Implementation
Let’s test the implementation by inserting new customers and updating existing ones in the src_customer
table.
- Insert New Customers:
INSERT INTO raw.src_customer (customer_id, customer_name, customer_address, order_id, create_date, modift_date) VALUES
(104, 'Alice Green', '321 Cedar St',1004,'2024-08-31','2024-08-31'),
(105, 'Alex', '234 London',1005,'2024-08-31','2024-08-31');
2. Update an Existing Customer:
UPDATE raw.src_customer SET customer_address = '999 Maple St', modift_date = '2024-08-31' WHERE customer_id = 102;
After running the updated data through the SCD Type 2 mapping in IICS, the dim_customer_scd2
table should now reflect the new customer data and the updated address for customer ID 102, with the previous address stored as a historical record.
Watch Video Tutorial:
Conclusion
Implementing SCD Type 2 in IICS using the inbuilt template mapping is a powerful way to track changes in your dimension tables. By following the steps outlined in this blog, you can easily manage historical data and ensure accurate, up-to-date information in your data warehouse. This approach is essential for businesses looking to maintain a comprehensive view of their data over time.
Hi @ Regu, Awesome content on SCD type 2. Very Crisp and clear. I tired this and when ever I rerun or update a record I am getting the same set of records added to the table again. What could be the error ? How can I contact u for queries ?
Hi Raj, Thanks for trying this. Please feel free to contact me on WhatsApp I have shared my number to your Yahoo email.