Jun 16 2010

Universe Models For Recursive Data Part I: Introduction

This is the first of several posts that will review my presentation “Universe Models For Recursive Data” which was originally presented at the 2009 GBN conference, then at the North Texas / Oklahoma ASUG chapter meeting, and finally at the Mastering BusinessObjects conference in Melbourne. After presenting it three times it seemed like an appropriate time to (finally) get started writing up the blog posts. As with my other presentations there is a PDF file that can be downloaded from my conference presentations page.

This post will cover slides 6 through 13 as a basic introduction of recursive data and challenges presented to universe designers.

Defining Recursive Data

Sometimes there is confusion about the distinction between hierarchical and recursive data. Hierarchical data does not present a big challenge for BusinessObjects. It can be something related to time (Year, Quarter, Month, Day), geography (Country, Region, State, City), or something more specific like an accounting structure (Business Unit, Account, Sub-Account). What makes this hierarchical structure work easily is that each element is stored in a different place. It could be in a different column in the same table (flattened) or even in different tables (snowflake). As long as I can drill from one column to another in the hierarchy everything works fine.

Self-referencing or recursive data may initially look like a hierarchy. The key difference is that all of the elements are stored in the same place. There are keys that relate one row in a table back to a different row in the same table. That’s how recursive data is different from hierarchical data.

Why is recursion is a problem for BusinessObjects? The language used “behind the curtain” is SQL, and SQL does not natively support recursion. Some database vendors offer extensions (for example the CONNECT BY PRIOR structure in Oracle) but these are not used by BusinessObjects.

How common is recursive data? It is certainly not unusual. Consider any of the following:

  • Company organizational structure
    Object levels: President – Vice President – Director
    Object type: Person
  • Inventory BOM (Bill of Materials)
    Object levels: Product – Assembly – Sub-Assembly – Component
    Object type: Inventory item
  • Project Management
    Object levels: Project – Task – Sub-Task
    Object type: Project entry
  • Multi-Level Marketing (MLM)
    Object levels: Founder – Recruit – Recruit Level 2
    Object type: Person

In each of the above examples the type of object (or node) type is the same at any level. For example, a company organization chart is made up of people. Some people are at different levels, and there are therefore relationships from one person to another. In order to show all of the relationships from the top of the company to the bottom (or the bottom to the top) I have to keep going back to the same table. That is recursion.

Because it’s easy to think about a company organizational structure I used that example for the rest of the presentation.

Note: The Motors database is used in the standard Universe Designer training course and will not be presented in its entirety in the download package for this presentation for copyright reasons. However, I will be providing the standard HR table and all of the modified versions used in this presentation.

Example of Recursive Data Using Prestige Motors HR

A picture will help at this point. Here is a screen shot from the Prestige Motors HR universe that I built for this presentation. Notice that there are two tables in the picture, but one is an alias of the other. In other words, I am really using the same table twice.

screen shot of recursive relationship in a BusinessObjects universe

The table on the left is the Employees table. I have aliased the table and called it Manager. The two tables are joined using the link from EMPLOYEE.EMP_MGR_ID to Manager.EMP_ID. Since this is really the same table twice, this join defines the relationship from any particular person to their immediate manager. It’s a recursive relationship from a person to a person.

Notice that in this case I have defined the join as an outer (optional) join? That’s because the top person in the company does not have a manager, and the relationship would fail in that case. I want to ensure that I return every person and their manager… even if that person does not have a manager. Here is a sample of some of the data to help show why this is important.

Sample data from HR table

I can review the relationships manually if I want. I can look at the data (shown above) and determine that Pickworth works for Noakes. Davis and Ferrerez also work for Noakes. How am I making that determination? Each of those three folks has a manager ID of 101, and 101 is the employee id for Noakes.

Who does Noakes work for? The EMP_MGR_ID column is blank (null) for Noakes, which implies that he is at the top of the company organization chart.

Another way to see where people fall in the organization chart is to look at their level. Here is output from a report that I eventually will want to generate from my recursive data. It is shown in the format of a tree, with each person showing up as a node on the tree.

Tree output from HR database table

Noakes is at level 1. Davis, Ferrerez, and Pickworth are all at level 2. But the tree does not stop there. I have employees at level 3 and level 4 as well.

Typical Recursion Questions

This brings me to the set of questions that I need to be able to answer with my recursive data. I need to know:

Who do I work for?
Who works for me?
Who works at my same level and shares the same manager?
Who is my manger’s manager? My manager’s manager’s manager?
What is the total salary of my direct reports (people who work directly for me)?
What is the total salary of my indirect reports (people who work for people who work for me)?

I am sure there are many more questions but these should serve as a starting point. Some of the questions only require one level of the hierarchy (who works for me, or who do I work for). Those are simple enough to answer, and in fact can be answered with the simple alias structure already shown in this post. But in order to traverse the tree for multiple levels I need a solution that is a but more robust.

Next Time

In the next post of this series I want to talk about some of the different challenges encountered when working with recursive data. Once I define the challenges I will be in a position to start talking about solutions. As a preview, here are the four types of hierarchies I will be talking about:

  • Clean – a hierarchy with clean data, consistent node depths, and consistent node paths
  • Unbalanced- a hierarchy with inconsistent node depths
  • Ragged- a hierarchy with inconsistent node paths
  • Lateral- a hierarchy with sideways node paths

If it is not clear what some of those mean, don’t be too concerned; I will be defining each with examples in the next post.

Finally, here is a preview of the various solutions I will talk about:

  • Universe aliases
  • Flattened structures (columns or snowflake tables)
  • Ancestor / Descendant model
  • Depth first tree traversal

And a few that I won’t:

  • Oracle CONNECT BY PRIOR
  • Stored procedures

Part II of this series will talk in more detail about each of the recursive challenges. After I detail the different challenges the next post will talk about the solutions. My plans for the final post for this series are to review the impact of each solution on the native drilling functionality and then to wrap things up.

Related Links

5 Responses to “Universe Models For Recursive Data Part I: Introduction”

  1. Comment by Andreas (Xeradox)

    Dave, nice post, but of course I am looking forward to part 2 and 3 and …
    This is one of the things where traditional SQL falls flat on its nose IMHO, and I have to admit that OLAP seems the way to go here.
    Also the way hierarchies are represented in Web Intelligence XI 3.x is not optimal, I am looking forward to Pioneer (Advanaced Analysis) and Aurora (SAP BOE XI 4) in that respect.

  2. Comment by Dave Rathbun

    Hi, Andreas, I have part 2 and part 3 done, they are pending publication once I go back and do a final review and edit as needed. Parts 4 and 5 are in draft mode but essentially complete. Part 6 is only an abstract at this point. Lots of material coming, including scripts to build and populate the data used for the demonstrations that I used during my presentation.

  3. Comment by Martin

    Looking forward to part 2 !

  4. Comment by rey_havoc

    hi Dave, eager to know when part 4 and 5 getting published. your examples and explaination are fantastic as always.
    very thanks

  5. Comment by Reinaldo Nunez

    I hope isn’t too late to comment about the great examples and simplicity choosen to explain this topic. Great!