Al's BrainDump

RSS

Navigation





Quick Search
»
Advanced Search »

PoweredBy
Types: Type 1 will just overwrite - standard approach. Type 2 will create another dimension record and the fact row points to this. No extra fact records. Type 3 adds columns to the dimension table. unpopular.

Type 2

Classic example of salesman changing region and with type 1 his new region shows all historic sales he has made even tho he didnt make any in that region. Non accurate reporting.

Typical solution is to then make a type 2 and create a new version of the record in the dim table (with same business key(s)) and new attribute value. ValidFrom/ ValidTo fields: get used when fact filling code runs

Alternative: split out the region attribute of the employee dimension into its own dimension (now have an employeeKey and a regionKey in the fact) and this would accurately track someones sales across different regions but you lose the hierarchy in the employee dimension. Aggregations would be lost so recreating the drilldown path region->employee by stacking the 2 dims may well result in poor query performance.

Current Flag/ From To Dates

Remember that the majority of the time the scd logic and flag setting will be executed directly before the filling of the fact table and the flag in the dim table is queried - 'tightly coupled' execution. The flag is not saying 'I am correct for the current real world date' it is saying 'I am current in the context of whatever file is currently being loaded'.

Are you going to load data for multiple dates before loading the fact table? This is the big question that seperates simplistic ETL from heavily involved. If this is the case then need to make sure the multiple rows for an SCD all have correct dates set for them. E.g a monthly sales file is being loaded and contains an employee that changes region twice in a month (unlikely - that is why they are slowlyCD) - 3 records will be needed in the dim table to represent this.
Obviously cant set a true flag on one of them as the fill fact code will use that for all fact records and ignore the other 2. This is where to and from dates HAVE to be used accurately.

With 1 file/data load = 1 date you could reload a historic file successfully and leave a current flag turned on for that historic record, it doesnt matter. Whenever another file is loaded for a different period the flag will be correctly set and then immediately used by the fact load routine.


Having a boolean flag you lose the ability to open the dim table and see the periods a record was valid for, as you can with from/to date fields. Who cares? The flag version is easier/quicker and if you need to see dates the record was valid for just query through the fact table to the date dim.

SCD WIzard: Duplicate rows may appear in the dim table as any rows with an expired current flag are just plain ignored by the SSIS SCD wizard.



ScrewTurn Wiki version 3.0.5.600. Some of the icons created by FamFamFam.