Aug 17 2010
Handling Conditions on Outer Joins
I submit the answer is that nothing went wrong.
When I set up an outer join, I told my universe that some customers did not have orders, and that was okay. When my user set up their query, they said they wanted to restrict their report to show customers and order totals but only for orders that fell between a certain date range. Where does a customer with no orders go in this case? Or in other words, what can I tell from an empty amount on this report? It doesn’t tell me if the customer has zero orders, or if they have zero orders within the date range. Those are two possible cases. In order to avoid the ambiguity, the outer join is turned off and my new customer is missing.
Altering the Outer Join Behavior
In the scenario I have outlined so far, I don’t think I would change anything. In my opinion, the answer provided is the correct answer. But there could be cases where I want to allow conditions on an outer join without breaking the outer join behavior. Fortunately it is possible to do that. What I have to do is move the condition (query filter) to the FROM clause instead of applying it to the WHERE clause.
Here is the SQL code that generated the report shown above.
SELECT SSG.CUSTOMER.CUSTID, SSG.CUSTOMER.NAME, SSG.CUSTOMER.STATE, sum(SSG.ORD.TOTAL) FROM SSG.ORD RIGHT OUTER JOIN SSG.CUSTOMER ON (SSG.CUSTOMER.CUSTID=SSG.ORD.CUSTID) WHERE SSG.ORD.ORDERDATE BETWEEN '01-01-1987 00:00:00' AND '31-03-1987 00:00:00' GROUP BY SSG.CUSTOMER.CUSTID, SSG.CUSTOMER.NAME, SSG.CUSTOMER.STATE
To fix this, I need to go back to my universe and fine tune the join settings for my outer join. I showed this same screen earlier but this time I am going to click the Advanced button so I can fine-tune my join behavior.

Clicking that button gets me to this screen.

Of the options shown, I can be selective or all-inclusive. If I want every single object from the impacted table to appear in the FROM clause for a condition, there is an easy setting to do that. In this example I am going to specify that only the Order Date object goes into the FROM clause.
The “Fix” In Action
After making these changes and exporting my universe, I can open my Web Intelligence report and see what happens. Here is what the report looks like.

And here is the SQL code.
SELECT SSG.CUSTOMER.CUSTID, SSG.CUSTOMER.NAME, SSG.CUSTOMER.STATE, sum(SSG.ORD.TOTAL) FROM SSG.ORD RIGHT OUTER JOIN SSG.CUSTOMER ON (SSG.CUSTOMER.CUSTID=SSG.ORD.CUSTID AND SSG.ORD.ORDERDATE BETWEEN '01-01-1987 00:00:00' AND '31-03-1987 00:00:00') GROUP BY SSG.CUSTOMER.CUSTID, SSG.CUSTOMER.NAME, SSG.CUSTOMER.STATE
Is this better or worse?
Remember that earlier I talked about how an outer join condition can make a report ambiguous. I remember (from looking at my full data set earlier) that there is only one customer in my database that has no orders. Yet in the results shown above there are two customers showing with no orders. One of them has no orders. The other has no orders within the time frame specified in the query filter.
Before moving the order date condition into the FROM clause here are the steps my database was using to create my report dataset:
- Get a list of orders
- Join the customers to the orders; include all customers even if they do not have a matching order
- Apply the WHERE clause to restrict the results to those orders in the date range
Since the WHERE clause happens after the join has been done, it essentially invalidates the outer join process. Customers with no orders have a NULL order date, and NULL orders do not exist between any date range. Therefore the rows are dropped.
Here is the modified process with the order date moved into the from clause:
- Get a list of orders, limited to those orders within the date range (the date condition is in the FROM clause now)
- Get a list of customers, and join that list to the list of orders returned in the first step. Customers do not have to match up with an order
In the modified version, the order date filter is still used to reduce the list of order rows, but it happens before the join to the customer list. That means every customer still shows up. It’s all in the sequence of operations, and that sequencing is controlled by whether conditions appear in the FROM or WHERE clause.
Which version is correct?
Conclusion
Ultimately it is up to the business user community and the universe designer to figure out which answer is the right answer. I would submit that removing the outer join in this case is the best solution, rather than trying to tweak how it behaves. If I am asking for oder-related information, to me it stands to reason that I only want to see customers with orders. I would rather do that than try to explain all of the different possible answers that users could get with an outer join.
If I want to get a list of customers that have no orders, there are more effective ways to do that than use an outer join. That may be another blog post.
But if I need to include an outer join in my universe, at least it’s nice to know I have options as to how it behaves. In this example I only tweaked a single object but as I mentioned I can extend the behavior to an entire table by selecting a different option. The “Advanced” button has been available since version 6 (if my memory is correct) and certainly appears in all versions of XI.
Addendum
Like many posts, this one was inspired by a question posted on BOB. The poster tried to use the Advanced tab, and was told that the join was “too complex” because it was a two part key. In my simple example I only had one-part keys for my joins. The easy way to work around this is to create two simple joins between the tables instead of a single complex join. By doing that the Advanced button works as I described in this post. Just make sure both simple joins are in the same context(s) if the universe includes that feature.
Pages: 1 2
The date restriction is in the Where-clause which limits the output to customers with orders only. If the condition is moved to the outer join definition this query would produce customers without orders and customers that meet the condition on order date.
Hi Dave,
The row is missing because the customer doesn’t have an order date, therefore that row will not be found in the range you picked.
To my opinion only solid RI (relationship integrity) is the real answer for getting rid of outer joins.
Negative results when using Outer joins:
1. Bad performance
2. Writing Free hands (sometime the outer join is opposite to the one in the universe)
3.Miss understanding of the users the business logic rather than using RI and giving “Unknown” or zero values for unmatched rows between the fact table to the dimension table .
Hi, folks, make sure you read “page 2″ of the post, because I do answer the question left at the bottom of page one.
Thanks for the tip Dave, that was a feature that I was unaware of. I think that this also illustrates the value of using properly designed star schemas for reporting. A lot of people are hesitant to snap their data off into a warehouse structure because they fear the complexity and time required to do it. In my experience, 90% of most reporting requirements can be dealt with using a very simple datamart that is loaded nightly. And the time spent building this is saved by the speed with which reports can be designed.
Thank’s Dave. I never explored the advanced tab (simply overlooked it for many years). That’s what you get when you think you know it all
Hi, great article and always good to learn new tricks! My initial thought on this would to resolve the problem in the report: have two queries, one listing all customers and the other getting customers with orders in the data range. Then after merging on customer dimension, and setting the ‘extend merged dimensions’ option, we should be able to build table of required result.
Hi, Al, thanks for your comment and welcome. Using two data providers is indeed one way to answer the question of “which customers have no orders” but it’s not the most efficient. If I want to see all customers and be able to identify those with no orders it works fine. But if I want only a list of customers with no orders, it’s going to download too much data and overwhelm me with data that exists when I am interested only in finding those customers where data does not exist.
I’ll write up the techniques I have in mind in another blog post at some point.
Hi,
We can achieve the same using Derived table by pushing the filter condition (between) to the definiton of derived table and apply the outer join with customer table? isn’t it?
We have overlooked this option for these years. Thanks throwing light on this option (reason being most cases we wont edit PARAMETER options (ANSI 92))
Please share your thought on this view.
HI Dave..,
I have encountered the same issue, i used to handle this by adding OR condition in Where Clause as below.This will get the result as expected.
SSG.ORD.ORDERDATE BETWEEN ‘01-01-1987 00:00:00′ AND ‘31-03-1987 00:00:00′ or SSG.ORD.ORDERDATE is null
Thanks,
Thani
Okay Dave….. I got a related issue that I had hoped would be solved with Nested Derived Tables and forcing the filter in the FROM clause. I had hoped you had an idea for me.
I have 1a table that I alias twice to join to itself for VARIANCE reporting. There are 5 keys that may or may not be in both data sets. Thus I need to full outer join. However, I need to restrict the table BEFORE doing the full outer join (otherwise, the full outer join is negated.) I was hoping that I could expose a NESTED DERIVED table that the user can add FILTER CONDITIONS to and that “table” would be included in the derived table with the conditions nested, before I outer join. For the longest time I had belived Netsed Derived tables were going to work like this. They do not.
Any ideas on getting user supplied Filters into the inline tables From cluase?
thansk!
Dave, your blog provides so much information and very much happy with that. Thanks a ton.
But i would like to request you to cover a topic in Designer for “Aggregate awareness for other purposes”, if possible.
Some of the people think that it is just confined to store pre-calculated data to use sum() but i heard, not sure, it can be used for many other purposes.
Hope you will take up this subject in your next post.
Thanks
Raj
Hi, the @AggregateAware() has been on my list of topics to cover for quite some time. Unfortunately it never seems to bubble to the top.