Aug 28 2009
In the first post in this series I defined what time-sliced measures are and why they can be useful in a universe. I also shared the design requirements for a particular project I was on. The requirements included:
- Each report is expected to have multiple time-sliced measures
- Users must be able to select the type of calendar during the refresh process; calendar types include Monthly and Fiscal
- Users can provide any possible date as the “to date” for the time slices
- The process of splitting each time-slice time period into its own SQL statement should be completely transparent
- The resulting SQL should be as efficient as possible
In this post I am going to cover the design of a special table that we built in order to support our solution for these requirements. After I talk about the table design I will cover how I use it in the universe, as well as provide a few pros and cons about this solution as I have outlined it so far. Just to prepare you, this post is a bit longer than most that I write, and gets fairly detailed. This post will cover slides 23 through 25 from my 2008 GBN Conference presentation. There is a link to download the file at the end of this post.
Time Slice Calendar Table Definition
Each time slice has a start date and an end date. The actual dates will change based on whether I am looking at a fiscal quarter or a calendar quarter. I need to be able to provide any reference date as my “to date” or input date for each time slice. Finally, I want to be able to compare the current year values with the matching prior year values. All of these requirements can be solved with the following table structure:
|ref_cal_dt||Calendar date used as input for prompts|
|cal_tmslc_mn_val||Time slice name|
|curr_yr_tmslc_strt_dt||Current year date range start date|
|curr_yr_tmslc_end_dt||Current year date range end date|
|prior_yr_tmslc_strt_dt||Prior year date range start date|
|prior_yr_tmslc_end_dt||Prior year date range end date|
ref_cal_dt is the reference calendar date. Every time I use this table I need to provide that date, either via a prompt or condition of some kind. If you don’t need the flexibility of redefining the “to date” value in your environment this can be removed. The next two columns are codes for the time slice and calendar type. The last four columns contain the starting and ending dates for each time slice; the first two for the current year and the last two are the matching dates for the prior year. I will talk about each of these columns in a bit more detail in the following paragraphs. I am going to save the reference date for last.
cal_tmslc_mn_val contains values like MTD, QTD, YTD, FM, FQ, and FY. Each of these codes represents a time slice. For example, QTD is Quarter To Date and FQ is Full Quarter. The type of quarter (monthly or fiscal) is determined by the next column.
cal_nm or Calendar Name column is used to differentiate between the different types of calendars stored in the table. For our implementation we had two calendars, but there is nothing that says I could not create more if needed. I had to create fiscal time slices and calendar type slices to solve my requirement, and the names became Month Calendar and Period Calendar. I was asked at one point why we stored the full text representation rather than a code. If my memory is correct, we configured this column as a compressed character column in Teradata. Since there are only two values, the compression algorithm was very effective.
Here is some sample data from my table. I mentioned earlier that in order to use this table I have to provide an input or “reference” date. For the sample data shown below, the reference date was August 15, 2008. The resulting rows show the start date and end date for all of the available time slices for each calendar type (Month or Period). Both the current year (CY) and prior year (PY) date ranges are retrieved on the same row.
|Code||Calendar||CY Start Date||CY End Date||PY Start Date||PY End Date|
What does this tell me? I see that for August 15, 2008, using the Month Calendar, the month to date (MTD) range starts on 8/1/2008 (all dates in this blog post are in MM/DD/YYYY format) and ends on 8/15/2008. The full month starts on 8/1/2008 and runs until 8/31/2008. If I switch to the Period Calendar instead, I see that the MTD range runs from 8/10/2008 to 8/15/2008, and the full month (fiscal month rather than calendar month at this point) runs from 8/10/2008 to 9/6/2008.
Now, about that reference date… in our requirements we had to allow the user to redefine the “to date” for any time slice. It is fairly standard for the “to date” to be the last warehouse refresh date. Since we had to be able to run any report for any “to date” our table starts with the reference date. In the sample data posted above, the reference date was always 8/15/2008. I would see another set of rows for 8/16/2008, for 8/17/2008, for 8/18/2008, and so on. Every single possible date value was stored in the time slice table, and the starting / ending date ranges for all time slices and calendar types were computed and stored for each one.
There is one important note about performance that is appropriate at this point. Because we had to be able to roll values up by multiple calendars, we were required to store all of our data at a daily grain in our fact table. Since weeks do not fall on month boundaries, and since fiscal and monthly calendars do not share the same date ranges, the only possible way to solve the issue was to leave things at the daily grain. As you can imagine, this caused some challenges with query performance.
Join Fact Table to Time Slice Table
Now that I have thoroughly explained my special calendar table and its contents, how am I going to use it? This table is going to be the “driver” for my time-sliced measures, and drivers need roads. (That is, unless you have a nuclear powered DeLorean. ) I’ve posted before about contexts being paths or routes through a universe, and that’s what I need here too. But with only one calendar table and one fact table, where will I get different roads? First I need aliases.
I will create an alias of my special calendar table for each time slice definition. That means since I have six time slices (MTD, QTD, YTD, FM, FQ, and FY) and two years (Current Year CY and Prior Year PY) I need to create 12 aliases. After I do that, I create joins from each of these aliases to my fact table. Normally having multiple dimensions per fact is going to be fine, but in this case I want to ensure that every different time slice forces a separate SQL path… and now I’m back to where I need to create the contexts.
Here’s what a slightly simplified version of the universe looks like. I only created the “To Date” slices and left the “full” versions out for now.
Yes, that is a picture of the venerable Island Resorts Marketing universe, but with some aliases of my time-sliced calendar table added. Each of those aliases is joined to the “fact” table called
SALES. The original
CAL_TIMESLICE table is in the universe in the lower left corner. The six aliases are outlined in red on the right. Each alias has a join that looks like this:
SALES.INVOICE_DATE between cy_mtd.CURR_YR_TMSLC_STRT_DT and cy_mtd.CURR_YR_TMSLC_END_DT AND cy_mtd.CAL_TMSLC_MN_VAL = 'MTD' AND cy_mtd.CAL_NM = @Prompt('Please select Calendar Type','A','Time Slice Calendar\Calendar Name',mono,free) and cy_mtd.REF_CAL_DT = @Prompt('Please select Calendar Date','D',,mono,free)
I will break this down into steps.
The most direct part of the join takes the invoice date and compares it to the time slice start date and end date, using the
SALES.INVOICE_DATE between cy_mtd.CURR_YR_TMSLC_STRT_DT and cy_mtd.CURR_YR_TMSLC_END_DT
Next, I want to make sure I get exactly one date range, and that is done first by hard-coding the time slice mnemonic value…
cy_mtd.CAL_TMSLC_MN_VAL = 'MTD'
… and then prompting the user to select the calendar type.
cy_mtd.CAL_NM = @Prompt('Please select Calendar Type','A','Time Slice Calendar\Calendar Name',mono,free)
As an aside, I created a join with a prompt for time-slice code as well. This is a nice enhancement that lets the user pick a single measure for their report and on-the-fly swap it from MTD to QTD or YTD or any other defined time-slice. Of course they can only use one at a time, so that is a limitation of that technique.
The final part of the join in my case was a date prompt for the reference date. This is the “to date” that I talked about earlier in this post. Please note that there is no LOV for the reference date; I rarely provide a list of values for dates since there is not much value added by that overhead.
cy_mtd.REF_CAL_DT = @Prompt('Please select Calendar Date','D',,mono,free)
The join logic is the same for every alias except for the hard-coded mnemonic code for time slice.
Checking Join Structure
In this scenario as I have built it so far, have I introduced any loops? any fan traps? The answer to both of those questions is “No, I have not.” However, I have introduced incompatible joins.
For example, suppose that a user picked a MTD and YTD object at the same time. The SQL (abbreviated here) would include this:
cy_mtd.CAL_TMSLC_MN_VAL = 'MTD'
AND cy_mtd.CAL_TMSLC_MN_VAL = 'YTD'
This is a problem but one that I anticipated. At the beginning of this process I said I wanted to create a way to dynamically and transparently generate separate SQL passes for each time slice. I explained the structure and data for my table, and now I have shown how I join that table to my sales fact. The last step is to isolate each one of these aliases into a separate context. Using my shortcut for copying a universe context this step takes only a few minutes to complete for the initial configuration. Each context includes all of the tables from the original Sales context plus exactly one alias of my special calendar table. Here are the joins for the current year MTD alias:
And then I create a context for each of the others as well; here is a list of the contexts from my universe at this point:
Why This Strategy?
As I have been sharing this design concept at various conferences or with other universe designers I invariably get questions like “What made you go this route?” or “Why not alias the fact table instead?” and so on. They are fair questions, and I don’t have time to answer all of them in this post. I will try to quickly point out a few advantages to this technique.
If I alias a fact table I can almost certainly guarantee that I will introduce additional loops in my universe structure. There is a limit to the number of loops that can be detected / displayed by Designer, so adding extra loops when I don’t have to is not in my best interest. Plus, as blog readers will see soon when I start building objects on this structure, there is a big benefit to having all of my measures come from a single fact.
This technique does not add any new chasm or fan traps to the universe.
This technique allows me to support a potentially infinite number of time slices without doing any date calculations on the fly.
This technique allows me to support a potentially infinite number of calendar types. Here I have shown both fiscal and calendar types, but I could easily add more.
On the negative side, there are complexities associated with creating contexts in the universe. I mentioned above that creating the initial contexts is simple. Any future maintenance will be more complex as a result of all of the contexts in the universe. However, I submit that the complexities of the universe are more than paid for by the advantages to the users who will be creating reports. And whether I chose to alias the fact or the calendar table I would still have this issue so it’s really a moot point.
At this point I have accomplished the following:
- Created and populated my special calendar table
- Created aliases for each time slice / year combination (only “to date” versions are shown in this post, but I did create the “full” aliases and contexts as well in production)
- Created joins that link each time-slice calendar alias to my fact
- Created contexts to ensure that each time slice will appear in a separate query pass
What’s next? I have to create some objects so my users can make use of all of my hard work so far. That’s my next post in this series, which will cover slides 26 through 30 of my talk. Stay tuned.