My solution was to use a Merge and Conditional Split to look at the entire dimension table.
Let’s start with the dimension (OLE_SRC Dimension). We’ll use DimStudent as the dimension. Here’s the query I used
Select StudentID, StudentNaturalKey, EffectiveStartDate, COALESCE((SELECT MIN(EffectiveStartDate) FROM DW.DIMstudent where EffectiveStartDate>s.EffectiveStartDAte and StudentNaturalKey=s.StudentNaturalKey),'12/31/2099') NextEffectiveStartDate FROM DW.DimStudent s ORDER BY StudentNaturalKey
I’m pulling the surrogatekey (StudentID), Natural Key (StudentNaturalKey) , EffectiveStartDate, and determining the NextEffectiveStartDate instead of using EffectiveEndDate because the data warehouse may have gaps or overlap in the dates. I’m going to join on the NaturalKey in the Merge Transformation so I’m using it to order by.
This is the source import query
SELECT DISTINCT StudentNaturalKey, RecordDate From Import.Student WITH (NOLOCK) Order by StudentNaturalKey
I’m pulling back the NaturalKey and RecordDate from the source and ordering by StudentNaturalKey for the Merge Transformation.
Here’s the Merge Transformation joining on natural key
Next there’s the conditional split with the following condition to determine the correct record
ISNULL(RecordDate) || ISNULL(StudentID) || (RecordDate >= EffectiveStartDate && RecordDate < NextEffectiveStartDate)
If RecordDate is null then the source record has no date and consequentially there is no corresponding record in the dimension table. If StudentID is null then there was no corresponding record in the dimension. Otherwise it checks to see if the RecordDate is between the EffectiveStartDate and the NextEffectiveStartDate.
I then load the matching records into a cache connection manager. This isn’t the only way but because of the complexity of the transformation dataflow I’d have to use the sort transformation for the merges so caching and then using the lookup transformation performed much better.
The cache consists of the natural key, record date, and StudentID. I look up on the natural key and record date to get the surrogate key. This allows me to keep the number of records to a minimum as records are often loaded in batches with the same record date.
No comments:
Post a Comment