Aug 17 2010
Handling Conditions on Outer Joins
I don’t like outer joins in my reporting universes. Never have. Sure, if I am creating a universe against an application system I might consider using outer joins because of the normalized nature of the data. But if I am reporting against a warehouse schema of some kind, I really prefer to use inner joins. That way I avoid any potential performance issues caused by outer joins, but more importantly I avoid questions about report data. That being said, outer joins do have a specific purpose, and if I need to use them in my universe I certainly can.
One of the biggest challenges with outer joins (other than potential performance issues) is explaining to a user why their query results changed because they added a condition to their query. Remember that users don’t (typically) look at the SQL, so they won’t know that I have created an outer join. It can be confusing. Fortunately I have options as to how my outer joins are executed, so once I determine their usage requirements I can change the way my universe behaves.
Defining the Problem
For this post I will am going to use a very simple universe with only three tables, shown here.

This universe joins a customer to an order, and an order to order lines. In my database I have one customer that does not yet have any orders. If I run a query against the current universe structure, this new customer will not show up. My requirement is to show all customers, whether they have orders or not. This must be true even if I put a condition on the order table. That’s where it gets tricky.
ANSI_92 Parameter Setting
In order to make this work, I have to make sure that my universe parameter setting for ANSI_92 is set properly. From the File menu I will select Parameters, and on the tabbed dialog box that appears I will click on the Parameters tab. I need to set the ANSI_92 parameter to “Yes” in order for this technique to work. Without this setting, my joins will be created using Oracle-specific syntax and that isn’t what I want here.

Once I have verified this setting I can close the parameter screen. Next I need to define my outer join.
Defining an Outer Join
The outer join definition is done the same way with all databases, but it’s not always done in the same direction. In some databases I have to check the primary table and in others I need to check the secondary or optional table. Rather than trying to remember which database does what, I will pick a side and check to make sure that the tiny “o” appears on the expected side of my join. For example, I want to reconfigure my existing inner join between customers and orders and make it an outer join as shown here.

The structure window looks a bit different now.

As mentioned earlier, the tiny “o” on the right side of the join denotes the optional side. If the “o” is on the wrong side, I will open the join properties window and select the other checkbox.
That’s how I create an outer join in my universe. Let me show how it works first, then I will show how I can break it.
Outer Join Queries
I have exported my outer join universe and created a very simple query. I have included the customer ID, name, and their order total. There is one customer (appropriately named “New Customer”) that has no orders and their order amount shows up blank because of the outer join setting.

So far everything is working as I expected.
Breaking an Outer Join
An outer join is designed to allow rows that don’t completely match to still show up on my report. What happens if I apply a condition to the outer relationship? For example, I might want to add a condition to limit orders to a particular date range.

Here are the results after running this query.

What went wrong? (Answered on next page…)
Pages: 1 2
The date restriction is in the Where-clause which limits the output to customers with orders only. If the condition is moved to the outer join definition this query would produce customers without orders and customers that meet the condition on order date.
Hi Dave,
The row is missing because the customer doesn’t have an order date, therefore that row will not be found in the range you picked.
To my opinion only solid RI (relationship integrity) is the real answer for getting rid of outer joins.
Negative results when using Outer joins:
1. Bad performance
2. Writing Free hands (sometime the outer join is opposite to the one in the universe)
3.Miss understanding of the users the business logic rather than using RI and giving “Unknown” or zero values for unmatched rows between the fact table to the dimension table .
Hi, folks, make sure you read “page 2″ of the post, because I do answer the question left at the bottom of page one.
Thanks for the tip Dave, that was a feature that I was unaware of. I think that this also illustrates the value of using properly designed star schemas for reporting. A lot of people are hesitant to snap their data off into a warehouse structure because they fear the complexity and time required to do it. In my experience, 90% of most reporting requirements can be dealt with using a very simple datamart that is loaded nightly. And the time spent building this is saved by the speed with which reports can be designed.
Thank’s Dave. I never explored the advanced tab (simply overlooked it for many years). That’s what you get when you think you know it all
Hi, great article and always good to learn new tricks! My initial thought on this would to resolve the problem in the report: have two queries, one listing all customers and the other getting customers with orders in the data range. Then after merging on customer dimension, and setting the ‘extend merged dimensions’ option, we should be able to build table of required result.
Hi, Al, thanks for your comment and welcome. Using two data providers is indeed one way to answer the question of “which customers have no orders” but it’s not the most efficient. If I want to see all customers and be able to identify those with no orders it works fine. But if I want only a list of customers with no orders, it’s going to download too much data and overwhelm me with data that exists when I am interested only in finding those customers where data does not exist.
I’ll write up the techniques I have in mind in another blog post at some point.
Hi,
We can achieve the same using Derived table by pushing the filter condition (between) to the definiton of derived table and apply the outer join with customer table? isn’t it?
We have overlooked this option for these years. Thanks throwing light on this option (reason being most cases we wont edit PARAMETER options (ANSI 92))
Please share your thought on this view.
HI Dave..,
I have encountered the same issue, i used to handle this by adding OR condition in Where Clause as below.This will get the result as expected.
SSG.ORD.ORDERDATE BETWEEN ‘01-01-1987 00:00:00′ AND ‘31-03-1987 00:00:00′ or SSG.ORD.ORDERDATE is null
Thanks,
Thani
Okay Dave….. I got a related issue that I had hoped would be solved with Nested Derived Tables and forcing the filter in the FROM clause. I had hoped you had an idea for me.
I have 1a table that I alias twice to join to itself for VARIANCE reporting. There are 5 keys that may or may not be in both data sets. Thus I need to full outer join. However, I need to restrict the table BEFORE doing the full outer join (otherwise, the full outer join is negated.) I was hoping that I could expose a NESTED DERIVED table that the user can add FILTER CONDITIONS to and that “table” would be included in the derived table with the conditions nested, before I outer join. For the longest time I had belived Netsed Derived tables were going to work like this. They do not.
Any ideas on getting user supplied Filters into the inline tables From cluase?
thansk!
Dave, your blog provides so much information and very much happy with that. Thanks a ton.
But i would like to request you to cover a topic in Designer for “Aggregate awareness for other purposes”, if possible.
Some of the people think that it is just confined to store pre-calculated data to use sum() but i heard, not sure, it can be used for many other purposes.
Hope you will take up this subject in your next post.
Thanks
Raj
Hi, the @AggregateAware() has been on my list of topics to cover for quite some time. Unfortunately it never seems to bubble to the top.