Mar 03 2008

Do I Have a Fan Trap?

Categories: Fan / Chasm Trap,Universe Design Dave Rathbun @ 3:00 pm

I started to write this a long time ago, but a few things (okay, lots of things) got in the way. I did manage to write a brief introduction a while back but that’s as far as it got. So today I want to write at least a little bit more about fan traps in particular.

What is a Fan Trap?

A fan trap occurs when joins “fan out” over multiple one –< many relationships in a row. For example: A –< B -< C

This is one of the reasons why it is important to set cardinalities in your universe design. If you don’t set the cardinalities it becomes more difficult to identify fan traps (and other design issues). I frequently get asked in Designer training classes if the cardinality changes the way the joins are written. The answer is that it will not. But it could change how many joins are included in a single query. :)

Sample Fan Trap Data

It’s a subtle issue, so let me demonstrate with a simple example. I will use two hypothetical tables for this example. First is a table used to store order information, and the second is used to store details about the items for the order.

create table orders
(order_id number(8) not null
,order_date date not null
,order_budget decimal(11,2) not null);

create table order_lines
(order_line_id number(11) not null
,order_id number(8) not null
,item_id number(11) not null
,order_qty number(5) not null);

These are very simple tables, and the syntax may not be exactly correct for all databases. There will be an order number object built from the orders.order_id column. I will build a Total Order Budget measure object from the orders.order_budget column. And I will also create a measure object called Total Order Quantity using the formula sum(order_lines.qty). This gives me one dimension object and two basic measure objects.

Here is what the raw data will look like:

Order Number Budget Quantity
1 500 10
1 500 20
1 500 25
1 500 10
2 1500 10
2 1500 10
2 1500 5

Before We Go On, A Brief Aside…

One of the statements that I make frequently in the Designer training class is this:

As a designer you cannot control which objects a user selects to use on a query. You are not building a specific query, you are building a framework for any query.

Great. Now what does it mean? ๐Ÿ™‚

It means that the objects I have created above will generate the wrong result if they are not used correctly. Since I cannot control how they are used, I need to fix the mistake before it occurs. The mistake in this case is the Fan Trap.

A Simple Query

If I write this sql using the data above I will get the results shown below. Everything looks fine.

select orders.order_id
, sum(orders.order_budget)
from orders
group by order_id

Order Number Total Order Budget
1 500
2 1500

It all works because of the “group by” clause that is automatically generated in my SQL. And it only works because I have not included anything from the order lines table. What happens if I (or a user) were to select all three of my objects?

Fan Trap Demonstrated

If I included all three of the objects from my imaginary universe I would get this sql code, and the following results:

select orders.order_id
, sum(orders.order_budget)
, sum(order_lines.order_qty)
from orders
, order_lines
where orders.order_id = order_lines.order_id
group by orders.order_id

Order Number Total Order Budget Total Order Quantity
1 2000 65
2 4500 25

Anybody see a problem with this?

The issue is that the line item total (sum) is correct, but the header total budget (also a sum) is way too high. If you look back at the raw data shown earlier, you might notice that the budget amount has been multiplied by the number of line items in the second table. This is a fan trap in action. The number of lines in the detail table has an impact on the measures from the header table.

Simple Fan Trap Solution

In this case where I only have measure objects from the detail table there is a very simple solution. I would open my universe and mark the “Multiple SQL Statements for each Measure” checkbox. And here is where the cardinality settings become important. Back at the beginning of this post I said this, or something along these lines:

Cardinality won’t change the logic of how joins work, but it might change which joins are included in your query.

What this means it when the SQL generator sees the measures and notes that they are spread across a one-many join, it will split the query into two passes and then merge the results back together. So instead of the infated results shown above I will get these two sets of data:

Order Number Total Order Budget
1 500
2 1500

Order Number Total Order Quantity
1 65
2 25

There were only two rows processed by the first query, and seven rows processed by the second query. There are two rows in each result set because of the “group by” clause. And when the query results are merged back together (in a process that is completely transparent to the user, by the way) they will be correct.

So Far, So Good…

The problem is this is not a complete solution. As long as I only have measure objects from the lower (more detailed) table then this shortcut will work. And as long as I never ever add a new table to the universe, or add a new dimension or detail object, it will continue to work. But it’s a fragile solution, because nothing has really been fixed.

To solve this issue in the most robust and correct way possible requires setting up a couple of contexts and an alias. But before I go through that process I want to talk about Chasm Traps, and I’m out of time for today. ๐Ÿ™‚

But Do I Have a Fan Trap?

I will conclude this post by answering the question I started with. You have a fan trap if you:

  • Have two (or more) tables in a one-many chain of joins
  • Have measure objects at more than one level

You can solve a fan trap with a single checkbox if you have only measure objects from your detail table(s). Otherwise you will get inflated values, confused users, and irritated management.

Related Links

39 Responses to “Do I Have a Fan Trap?”

  1. Comment by Josh Fletcher

    As you’ve already noted, resolving fan traps with ‘Multiple SQL Statements for each Measure’ has a number of drawbacks. the universe is also generating inefficient SQL queries in some cases, and as well, the resolutino doesn’t work for dimension objects. These obviously won’t aggregate incorrectly, but will be duplicated unnecessarily, which can confuse the end-user.

    I look forward to the explanation of resolution using contexts! ๐Ÿ™‚

    Cheers, Josh Fletcher

  2. Comment by Amit

    I have one query about your statement.

    Cardinality won’t change the logic of how joins work, but it might change which joins are included in your query.

    I am still searching for the answer to this statement. It would be grat if you could provide m with some more insight on this.

    Regards
    Amit

  3. Comment by Dave Rathbun

    The simple “fan trap” solution is to use the option “Multiple SQL Statements for each Measure” which – to the best of my knowledge – will use cardinalities to split a single SQL statement into more than one. If you have a one-to-one join between two tables then a single SQL statement should be generated. If you have a one-to-many cardinality defined then it should split into two, based on measures being used from both tables of course.

  4. Comment by Shiva

    Hey Dave i’ve a couple of questions for you.

    1. I’m not clear with the following part.
    By Dave:
    select orders.order_id
    , sum(orders.order_budget)
    , sum(order_lines.order_qty)
    from orders
    , order_lines
    where orders.order_id = order_lines.order_id
    group by orders.order_id

    The issue is that the line item total (sum) is correct, but the header total budget (also a sum) is way too high. If you look back at the raw data shown earlier, you might notice that the budget amount has been multiplied by the number of line items in the second table.

    I tried a similar query in my database it gave me the same answer. now when you are summing the total order qty and total order budget and grouping it,…. it is taking the total qty sold =65 and total budget =2000 for order 1 which i feel is correct….. had it got multiplied then it should have been 32500 under “total order budget” column.

    Assuming you have a budget of 500 for each…. 10 pair of shoes, 10 t-shirts, 20 plastic combs and 25 pencils. So that makes your total budget to 2000 for all the item…..

    I hope my question is clear.

    2. If my datamodel has m:m cardinality between the fact and dimension table… and i happen to use the Bridge table(DW concept) then i get a fan trap as per your first statement which is correct.
    Now to resolve this i think i can use context or alias. (please confirm) If yes.. IF i have to choose one among the two (alias /context) which one whould i choose in this scenario… As i feel having alias will increase thenumber of joins in my Universe schema structure.. (the more number of joins the slow may be the performance …) So is it better to use context here or there is any other way to it.

    Thanks for the patience in reading it ๐Ÿ™‚
    Cheers,
    Shiva

  5. Comment by Tamir Bar-Netzer

    Hi,

    One of the best ways of solving the example listed above would use Average projection on ‘Total Order Budget’. This is provided that users normally retreive data and not fool around with it (e.g., slice and dice)

    Cheers,

    Tamir

  6. Comment by Dave Rathbun

    Tamir, if you are talking about the example in my post, it would not be solved by an average projection since the data would be wrong coming from the database.

    Shiva, the “multiple” is not the quanty of items but rather the count of rows in the items table. If there is one line per header everything is fine. If there are two lines for a specific order, then the budget number for that order will be doubled (multiply by 2). If there are three lines for a given order then the budget for that order will be tripled, and so on. It has nothing to do with the number (quantity) of items ordered on each line. I hope that clears it up.

  7. Comment by Shiva

    Thanks for the quick response Dave.

    Yeah i got it. Thankyou very much for the explaination. One more point if you could please explain the second question of my post ….. I’ll be grateful to you. ๐Ÿ™‚

    Regards,
    Shiva

  8. Comment by Dave Rathbun

    Unless you have measures from your bridge table you have no fan traps.

  9. Comment by Shiva

    Thankyou very much for the Reply.

    Best Regards,
    Shiva

  10. Comment by thlZ

    My scenario is that my fact is at the week level.
    My time dimension is a star schema(Calender table).month to week is many to many..so i broke up the table into snowflake…
    My time dimension is Year-<Quarter-<Month-<date and week-<date seperately.

    If i want to c my data in the fact at quarter level it shd agg. accordingly…. but it is not happening..
    How shd my context be….

  11. Comment by Dave Rathbun

    If you only have one fact table you do not have a fan trap. Your challenge is that you don’t have a clean hierarchy, so your week-date relationship might need to be in one context and your year-quarter-month-date relationships might need to be in another. You also can’t really use @Aggregate_Aware() in this situation because your tables are not a hierarchy…

  12. Comment by Kerry Green

    I have a fan trap using PO header, PO lines and PO distributions which has been resolved using alias tables. The problem I am having is applying conditions at the PO distribution level. Since BO is creating multiple queries for header and distribution data, the condition on distribution will not be applied to the header query. I have read about using predefined conditions in a derived table, but this is not an option due to too many dimensions which required for filtering. Wondering if there is another solution I have missed. Thanks!

  13. Comment by Dave Rathbun

    You have hit on the main problem with the alias + context solution for fan traps; condition propagation. I have been working on a blog post with some ideas but it’s nowhere near ready for publication yet. However, your description doesn’t fit what I would expect to see. Conditions at the distribution level should roll up find to the header level. It’s conditions as the header level that don’t roll down to the distribution level that’s typically the problem.

  14. Comment by Ashok

    Dave,

    OI created the following table
    OrderID Order_date Order_budget
    1 23-MAR-12 500
    1 23-MAR-12 500
    1 23-MAR-12 500
    1 23-MAR-12 500
    2 23-MAR-12 1500
    2 23-MAR-12 1500
    2 23-MAR-12 1500

    I created the scenario and ran your simple query

    select orders.order_id , sum(orders.order_budget) from orders
    group by order_id

    Below is the result.
    1 2000
    2 4500

    Please correct me if i am wrong.

  15. Comment by Dave Rathbun

    Hi, I can’t tell you that you’re wrong when I don’t know what you’re trying to prove. ๐Ÿ™‚

    I suspect that you’ve got the idea wrong. The order header table should only show one row per order ID, and you show several. Of course if you sum those rows you’re going to get an inflated value. A fan trap comes into play when you have two (or more) tables and measures from both. You should create an order header table with two rows, and an order detail (order lines) table with seven rows and repeat your experiment.

  16. Comment by Sammy

    Hi Dave,
    I worked with this setting before and i never noticed this. The group by or any other filter in the query is not getting applied on my second SQL. So, if you have a condition in the above example,say where Order_date = ‘6/12/2012’ , this condition is getting applied only Order_Budget query not the Order_quantity. Can you please help me understand as to why its doing this.

    Thanks
    Sammy

  17. Comment by Dave Rathbun

    Conditions at the “upper” level of a fan trap should be included in both queries. It’s conditions at a lower level (order lines in this case) that are a problem. Any condition on the orders table should be fine.

  18. Comment by Avinash

    Hi Dave,

    It has been truly very informative reading about the fan traps and ways to solve them.
    Thank you very very much!!

    Cheers!
    Avinash

  19. Comment by Saurabh Gupta

    Hi Dave,

    Need your help to understand one scenario, what if I have two fact tables in my universe joined as M:M (many to many) and now i want to bring some specific columns from table 1 as in the above Given scenarion if I want to bring some dimension only from Orders table and also want to apply filter on order_date. Now as order_lines table does not have order_date column and also some columns which are specific to Orders table. What will be the behaviour of query generated after splitting. As we would be creating a Single Query where we will put all objects and condition but how Business Objects will split the query to understand that some objects should go with Query1 and filter should only be applied on query1. I can see order id is the only common Dimension between them so surely we can bring it in both queries. I beleive that separting the two query is the best possible solution for this scenario (different query from orders and order_line tables with specific object and filter to each table and merging them at report level) but anyhow I am not sure if I use Universe option for generating multiple SQL for each measure then how would it behave to capture some dimensions and filter condition only in query1 coming from Orders table and different query from Order_line tables. Thanks in Advance…

  20. Comment by Dave Rathbun

    Conditions on dimension objects from the “header” table are fine, as that table will be shared between the two query passes. It’s conditions on dimension objects on the lowest level table that cause problems.

  21. Comment by mynigoo

    Dave, there is no problem with conditions on dimension objects on the lowest level table: try to set TRUST_CARDINALITIES to YES

  22. Comment by Dave Rathbun

    When a condition is placed on a lower-level dimension object, it “breaks” the fix that the context / alias solution fixes. We end up inflating row counts from the header table after the join to the child table. That has been my usual experience; I have not researched the TRUST_CARDINALITIES setting but I’m not sure how it would fix this.

  23. Comment by sree

    Hi Dave,
    Thanks for your explanation on Fan Trap.

    Questions on Fan Trap. (A -< B -< C)
    If I select dimension from A, Measure from B and also Measure from C then
    I would be getting fan trap.( understood as per the explanation above)
    Suppose
    I select dimension from A, Measure from B (Only) and dimension from C (No measures in C) then
    Still would I get Fan Trap?

    Example: Client–<Sale—<SaleModel is the model.

    Client(Clientid,Name)

    Sale(Saleid,clientid,SaleTotal)

    SaleModel (Saleid,Modelid,Qty)

    I would be getting Fan Trap, if I select Client.Name,SaleTotal(Measure),Qty(Measure).

    Suppose if I select, Client.Name,SaleTotal(Measure),ModelID ,then still would I get Fan Trap?

    Please let me know.

    Thanks,
    Sree

  24. Comment by Dave Rathbun

    Yes, Sree, any time you invoke (include) the lower level table “C” you’re going to end up with a fan trap that will inflate the value of measures from table “B”

  25. Comment by sree

    Thank you so much..Dave!!

  26. Comment by Ganesh

    Dave if i execute the below query as per your rawdata……
    select orders.order_id
    , sum(orders.order_budget)
    from orders
    group by order_id

    I am getting the budget for order 1 as 2000.
    and for order 2 as 4500..

    What went wrong.
    Kindly help on this.

  27. Comment by Ganesh

    Dave,
    Just i saw your comment
    “Hi, I canโ€™t tell you that youโ€™re wrong when I donโ€™t know what youโ€™re trying to prove. ๐Ÿ™‚

    I suspect that youโ€™ve got the idea wrong. The order header table should only show one row per order ID, and you show several. Of course if you sum those rows youโ€™re going to get an inflated value. A fan trap comes into play when you have two (or more) tables and measures from both. You should create an order header table with two rows, and an order detail (order lines) table with seven rows and repeat your experiment.”

    My doubt is cleared.
    Thank You.

  28. Comment by Kuntal

    Hi Dave,

    If we have joins like this D1 -> D2 -> F in our universe where
    D1 and D2 are dimension tables and F is fact table and I will be using objects from all three in my query. Can I have Fan Trap in this case also.
    It would be very helpful if you please answer this.

  29. Comment by Dave Rathbun

    As stated in the post, you have a fan trap if you:

    Have two (or more) tables in a one-many chain of joins
    Have measure objects at more than one level
    Your scenario fits the first part. You need to know where your measure objects are coming from. If your measures are only coming from your fact table “F” then you’re fine.

  30. Comment by Dawei

    Hi Dave, in star schema usually we have only 1 fact table, that means the fan trap will not happen, right?

  31. Comment by Dave Rathbun

    You can have a fan trap in a star schema if you do things like count dimension values. It’s not typically done that way, but it could happen.

  32. Comment by Yvonne

    As it stands, I don’t think you have a fan trap, as long as you add Order_Budget to your Group By instead of summing it then it’s resolved.

    However, if you add a value which you would want to aggregate to the Orders table (fulfills the ‘measures in both tables’ condition), then you definitely end up with a Fan trap, because the value is multiplied by the number of detail records. Programmatically the only way around this would to use multiple statements to SUM the detail cost (in a temp table for instance)

  33. Comment by varun

    hi Dave,

    I Went through Document here my question is you have used only 2 tables that confuse me alot actually fan trap menas when we are using two connected fact tables with dimension table with one to many relationship then we will get fan trap i hope. so here you used one fact table and one dimension table . so if we use 2 tables we will get fan trap.could you brief me any suggestions much appreciated

    Thanks,
    Varun

  34. Comment by Dave Rathbun

    A fan trap simply means you are defining (and using) measures from two parts of a one-to-many join. Therefore you can actually have a fan trap with only two tables. Assume you have a measure that is a count() or something from the dimension table (assuming the two tables are dimension + fact) and you have a fan trap, where the count will be inflated.

  35. Comment by Fabrice

    Hi Dave,

    Hi Dave,

    I’m in the case that you describe in a commentary (number 13), “You have hit on the main problem with the alias + context solution for fan traps; condition propagation.” You talked that you have been working on a blog post with some ideas to solve this problem, did you already complete this blog post?

    Thanks,
    Fabrice

  36. Comment by Dave Rathbun

    I have not yet written that up. Basically it requires a condition to be a sub-query, so as to isolate the filter application from the one-to-many relationship. If that makes sense. ๐Ÿ™‚

  37. Comment by DA

    I understand the concept of Chasm and Fan trap, my questions are:

    if there is scenario like A โ€“ < B -< C in this one table A is an aliased table โ€“ so does this still a Fan trap? If yes how do I solve it ?

    Now next scenario A -< B -< C โ€” E in this table A and D are aliased tables so does this join path creates a trap โ€“ its understood that B -< C -< D is Fan trap and how do I solve this scene?

    Appreciate the help,

    Regards,
    DA

  38. Comment by Jason

    Hi Dave,

    I understand how the fan trap can happen, but why would you purposely write a query that sums a measure that isn’t meant to be summed? If Budget is truly related to Order_ID on a 1-to-1 relationship (which it is, according to your example), there’s no reason to sum it and then group by the one field that is associated with it. Can you explain why one would do this? Instead, why not write your query like this:

    select orders.order_id
    , orders.order_budget
    , sum(order_lines.order_qty)
    from orders
    , order_lines
    where orders.order_id = order_lines.order_id
    group by orders.order_id
    , orders.order_budget

    I’m confused. (Disclaimer: I am not a BI universe designer)

  39. Comment by Dave Rathbun

    Jason, the challenge rises in what happens if the order lines table is not included in the query. In that case you typically would want to sum the Budget amount.

    Bottom line is that a universe is simply a framework…a collection of bits that a user can pull together. As the universe designer you want to provide the most flexible framework possible, and that includes solving fan and chasm traps for any future “potential” query. When you write SQL by hand for a specific purpose, you do this already. As you mention, you simply would not include the sum on the order_budget column in the case above. Or if you wanted to sum the budget you would write a query that doesn’t include the detailed table. The beauty of a BOBJ universe is that you can solve for both using the appropriate techniques. ๐Ÿ™‚