Mar 03 2008
I started to write this a long time ago, but a few things (okay, lots of things) got in the way. I did manage to write a brief introduction a while back but that’s as far as it got. So today I want to write at least a little bit more about fan traps in particular.
What is a Fan Trap?
A fan trap occurs when joins “fan out” over multiple one -< many relationships in a row. For example:
A -< B -< C
This is one of the reasons why it is important to set cardinalities in your universe design. If you don’t set the cardinalities it becomes more difficult to identify fan traps (and other design issues). I frequently get asked in Designer training classes if the cardinality changes the way the joins are written. The answer is that it will not. But it could change how many joins are included in a single query.
Sample Fan Trap Data
It’s a subtle issue, so let me demonstrate with a simple example. I will use two hypothetical tables for this example. First is a table used to store order information, and the second is used to store details about the items for the order.
create table orders (order_id number(8) not null ,order_date date not null ,order_budget decimal(11,2) not null); create table order_lines (order_line_id number(11) not null ,order_id number(8) not null ,item_id number(11) not null ,order_qty number(5) not null);
These are very simple tables, and the syntax may not be exactly correct for all databases. There will be an order number object built from the
orders.order_id column. I will build a Total Order Budget measure object from the
orders.order_budget column. And I will also create a measure object called Total Order Quantity using the formula
sum(order_lines.qty). This gives me one dimension object and two basic measure objects.
Here is what the raw data will look like:
Before We Go On, A Brief Aside…
One of the statements that I make frequently in the Designer training class is this:
As a designer you cannot control which objects a user selects to use on a query. You are not building a specific query, you are building a framework for any query.
Great. Now what does it mean?
It means that the objects I have created above will generate the wrong result if they are not used correctly. Since I cannot control how they are used, I need to fix the mistake before it occurs. The mistake in this case is the Fan Trap.
A Simple Query
If I write this sql using the data above I will get the results shown below. Everything looks fine.
group by order_id
|Order Number||Total Order Budget|
It all works because of the “group by” clause that is automatically generated in my SQL. And it only works because I have not included anything from the order lines table. What happens if I (or a user) were to select all three of my objects?
Fan Trap Demonstrated
If I included all three of the objects from my imaginary universe I would get this sql code, and the following results:
where orders.order_id = order_lines.order_id
group by orders.order_id
|Order Number||Total Order Budget||Total Order Quantity|
Anybody see a problem with this?
The issue is that the line item total (sum) is correct, but the header total budget (also a sum) is way too high. If you look back at the raw data shown earlier, you might notice that the budget amount has been multiplied by the number of line items in the second table. This is a fan trap in action. The number of lines in the detail table has an impact on the measures from the header table.
Simple Fan Trap Solution
In this case where I only have measure objects from the detail table there is a very simple solution. I would open my universe and mark the “Multiple SQL Statements for each Measure” checkbox. And here is where the cardinality settings become important. Back at the beginning of this post I said this, or something along these lines:
Cardinality won’t change the logic of how joins work, but it might change which joins are included in your query.
What this means it when the SQL generator sees the measures and notes that they are spread across a one-many join, it will split the query into two passes and then merge the results back together. So instead of the infated results shown above I will get these two sets of data:
|Order Number||Total Order Budget|
|Order Number||Total Order Quantity|
There were only two rows processed by the first query, and seven rows processed by the second query. There are two rows in each result set because of the “group by” clause. And when the query results are merged back together (in a process that is completely transparent to the user, by the way) they will be correct.
So Far, So Good…
The problem is this is not a complete solution. As long as I only have measure objects from the lower (more detailed) table then this shortcut will work. And as long as I never ever add a new table to the universe, or add a new dimension or detail object, it will continue to work. But it’s a fragile solution, because nothing has really been fixed.
To solve this issue in the most robust and correct way possible requires setting up a couple of contexts and an alias. But before I go through that process I want to talk about Chasm Traps, and I’m out of time for today.
But Do I Have a Fan Trap?
I will conclude this post by answering the question I started with. You have a fan trap if you:
- Have two (or more) tables in a one-many chain of joins
- Have measure objects at more than one level
You can solve a fan trap with a single checkbox if you have only measure objects from your detail table(s). Otherwise you will get inflated values, confused users, and irritated management.
- Blog Post: Designer SQL Traps