Jul 06 2009
Measure objects are – in my opinion – the most critical objects in your universe. Without them you can only make lists. While lists might be good for a grocery store or library, they don’t do much to help someone run their business. Measures give your universe purpose. They also are the source of most of the design challenges like fan or chasm traps.
In this post I’m going to go back to a basic concept and talk about SQL aggregation and universe projection and how they work together.
An aggregate function is one that operates on a group of rows rather than a single row. Some SQL aggregate functions include sum(), count(), and max(). When you use an aggregate function you also have to use a GROUP BY clause to pull rows together. If the request is for Resort, Year, and a sum of the number of guests, then all of the rows are first grouped by the resort name and the year of the invoice and then the matching rows are summed up. This step happens on the database server.
Here is the code used in the Island Resorts universe for the Number of Guests object:
Here is the SQL generated for a query that includes Resort, Year, and Number of Guests:
SELECT Resort.resort, 'FY'+Format(Sales.invoice_date,'YYYY'), sum(Invoice_Line.nb_guests) FROM Resort, Sales, Invoice_Line, Service_Line, Service WHERE ( Sales.inv_id=Invoice_Line.inv_id ) AND ( Invoice_Line.service_id=Service.service_id ) AND ( Resort.resort_id=Service_Line.resort_id ) AND ( Service.sl_id=Service_Line.sl_id ) AND ( Service_Line.service_line = 'Accommodation' ) GROUP BY Resort.resort, 'FY'+Format(Sales.invoice_date,'YYYY')
Note: the condition on Accommodation is automatically applied for this object. It is not required for the SQL to be correct but is required for a business rule. The GROUP BY clause, however, is required for the SQL to be valid. The good news is that universe designers do not have to anticipate all of the potential combinations of objects a user might put together. This clause is created automatically by the query tools.
Measures can also include a Projection setting, as shown in this screen shot:
Note that there are several projection functions available, including None. What is the purpose of the projection function?
When a user is viewing a report they might see this:
From there they might remove the Year object in order to get a total rolled up by Resort. Once the Year is removed the block on the report dynamically rolls up and presents this information:
That is projection in action. The function selected on the drop-down box for the measure properties is used by the report client to know which summary function to apply.
How do these functions work together?
Is Aggregation Required?
I have seen universes where the designer did not use any SQL aggregation functions at all. Instead they assigned projection functions only. With no aggregation the database returns the rows as they are extracted. The impact? The report client gets a lot more rows, as shown here:
The first execution of this query returned only nine rows. There are three resorts and three years of data so there are nine combinations, resulting in nine rows of data. The second execution did not have any database aggregation so 324 rows were returned instead. The report looks the same, but it has to deal with a lot more data. You can imagine what the performance would be like if there were several zeros after either (or both) of the numbers from this example.
Is Projection Required?
If the universe does have aggregation, does it have to include projection as well?
Without a projection function, the measures will no longer roll up after an object is removed from the block. Here is what happens to the same report when the Year object is removed; see how the resort names still appear more than once? And the number of guests does not get rolled up? This is what happens when the projection function of “None” is used.
Aggregation and Projection Work Together
Every measure object should have an aggregation function. (If a universe designer can’t figure out which aggregation function makes sense, then the object should probably be something other than a measure.) Each aggregation operation typically gets combined with an appropriate projection function as well. Most of the aggregation + projection function pairs make sense, but there are a few that might require a bit more explanation. First, here is the chart that shows the typical matches.
|SQL Aggregate||Client Projection|
|Avg||None or Average|
The first three entries are easy to see; a sum from the database is projected using a sum on the client. The settings for counts and averages might require a bit more explanation.
I created a new object in Island Resorts called Invoice Count with the SQL code
count(Sales.inv_id) and a projection function of Count. This is wrong, and I will show you why. Here is a query with Country of Origin, Region, and Invoice Count as results.
Notice that in the raw data I do get a count of invoices by region and country. However, here’s what the block looks like.
The projection function of “count” is counting the number of rows for each combination. Oddly enough, there is one row of each. In order to get the correct results, I need to use the Sum projection function for my count, which gives me the following corrected data.
I covered averages at some length in a prior blog post so I will skip covering the same ground here. To summarize (in case you don’t want to read the link) averages cannot be projected because the math doesn’t work. The only exception (where you can average an average) is when the sample size is exactly the same for all values.
Every measure object should have an aggregation function in order to group rows on the database and return a smaller dataset to the client. The projection function should coordinate with the SQL aggregation function in order to support “slice and dice” operations on the client.