Data warehousing – Bringing down data twice but avoiding deleting on the data warehouse Fact Table
This is better explained with our example below.
Example:
· We had a situation where we were getting data down, but due to the nature of the data we were getting is we were missing data.
· Also another note was the fact that we did not store the granular data, but aggregated the data, so due to this it would be difficult to try and find out the new data, or missing data
· So what we wanted to do, was to find a way where we could reload the missing data, but avoid doing a delete on the fact table.
o NOTE: The reason for avoiding the delete is because the fact table was large, so the delete would take a long time.
§ Along with this it means that the log increases and the loading time increases.
· So for the solution below it is easier explained with some dates.
o Our Current Date is:
§ 2013-11-06
o Previous Days Date is:
§ 2013-11-05
o 2 Days ago Date is:
§ 2013-11-04
Solution
Below is what we implemented so that there would be no deletes but a truncation of the data.
1. Initially our starting point to get new data down was always the previous day.
a. We changed this so that it would always be 2 days back.
b. NOTE: This is so that we could then get our data for 2 days.
c. So as per our example our start date would be:
i. 2013-11-04
2. Next once the data was down we then went through all the same steps until it got to the point where we wanted to insert it into the Fact Table.
a. Here we created a new table called Mart_TF_TableName_Yesterday
3. What we then did within our SSIS package is we then created a conditional split and based on the date we would then transfer the data to the required table.
a. So if the date was for the previous day it would go into the table Mart_TF_TableName_Yesterday
i. As with our example the conditional split would take the date: 2013-11-05 and split that into the table: Mart_TF_TableName_Yesterday
b. And then if the date was older than the previous day it would go into the Fact Table.
i. So if the date was before 2013-11-05 it would then go into the Fact Table
4. Now what this meant is that we always had our previous day’s data in a separate table.
a. So when the next load happens we could then just truncate the table Mart_TF_TableName_Yesterday
i. NOTE: This is because we are loading 2 days’ worth of data, the data from 2 days ago would be going directly into the Fact table.
1. So as with our example when the data loaded tomorrow, the data for the date 2013-11-05 would now go into the Fact Table.
5. The final thing that we did was to then add the table Mart_TF_TableName_Yesterday into our SSAS Cube as a partitioned table.
a. This was so that for your current data and date, even though the data would not be 100% correct it would be 97% correct.