Oct 28 2009
When working with the reporting suite from Business Objects there are many different calculation engines. A report developer can create custom formulas or variables in Desktop Intelligence, Web Intelligence, and of course Crystal. A universe designer can build custom objects using database functions in the universe. An ETL architect can design special query transformations. So where do you do the work?
This post covers slides 6 through 9 from my 2009 GBN presentation titled “Return of the Variables” which can be downloaded from my conference page.
Push it Back, Push it Back, Way Back!
I can’t help but hear some football cheerleaders calling out inspirational words to their team as I write that heading. But the reality is that the concept is quite appropriate for this discussion. There are quite a few advantages to putting calculation logic into your ETL (extract / transform / load) tool. For example…
- Build on core data
ETL tools or other scripts working on core data have an advantage… they’re working on the core data. (That’s almost a recursive definition, which is another presentation. ) What I mean by that is there isn’t anything getting in the way between the process and the data. My scripts can do whatever they want because everything they need should be available. If it’s not immediately available I can build it in a temporary table and move on from there. ETL tools are very powerful in this regard.
- Procedural languages / scripts
Even in a simple ETL environment where no formal tool is being used I can still write procedural scripts. I can write procedural SQL (PL/SQL) or even C scripts. The only limits are my imagination and the grammar of the selected tool.
- Consistency across all access paths
This is a real key advantage of pushing calculations back to the ETL layer. Since this process is generally responsible for filling your data warehouse tables, and the calculations are done during that load, it means that any tool used to access that database inherits the results. It means I can use any query tool and still gain the benefits of the calculation results.
- Calculate and store once, retrieve many times
This is my primary differentiator between putting a calculation in the universe versus doing it in the ETL. I was recently given a block of code and asked to create a predefined condition in the universe to handle the logic. The problem was it was extremely complex and included a number of case statements and outer join requirements… and it was going to impact the fact table. I pushed back and requested that the logic be placed into the ETL. The end result was I got a simple Boolean flag (zero or one value) on the fact table that was much easier to use. An added benefit? If the logic required to populate that field ever changed it could easily be done in the ETL.
Unfortunately there are other issues to consider before putting a calculation into the ETL.
- Change Management
Not all of the aspects of doing calculations in the ETL are good. For example, most companies seem to have far stricter controls and change management processes around their ETL than they do for reports. Getting a change pushed through the ETL team can have a far wider impact and therefore can take longer and require more justification. Sometimes it is easier to keep a calculation closer to the report to avoid development delays.
ETL scripts can already be quite complex. Adding new calculations might slow the scripts down and cause them to run beyond their available load window.
- Impact Analysis
One of the primary advantages of including calculations in the ETL is that the results are shared by everyone. One of the disadvantages of including calculations in the ETL is that a change in that area will impact everyone as well. That means more teams to talk to in order to get a consensus that the change is appropriate and approval to start the process.
In my opinion, despite the potential for slower turn-around time on development requests and the need for greater impact analysis, complex calculations should be pushed back to the ETL if at all possible. This is especially true if the calculations impact join logic, affects security profiles, or needs to be performed on the fact where it will impact nearly every single query that is executed.
Using Universe Objects
The next possible location for calculations is the universe. There are quite a few advantages to putting something into the universe as opposed to having it done in the report. Such as…
- Build once – use many times
Universe objects are designed to be reusable. Once an object is built it can be used / reused in any number of reports. This is one of the primary reasons we even build universes (that and providing the abstraction layer so business users don’t need to know technical terms.)
- Use full range of database functions
There is a wide range of functions available inside each of the different report engines, but you can’t access the power of the database. I can use almost any function available from my database to build an object. And if there isn’t a function available to do what I want, I can build one.
- Ensure consistency from report to report
For quite a long time Business Objects used the slogan “Single version of the truth” when talking about their products. This was based on the concept of reusable objects that I already covered above. Since every business user will be using the same object for Sales Revenue, they should all get the same result when they ask the same question.
- Updates automatically propagate
Reports built on a universe will automatically get updates when they are published. This means if I discover a bug in one of my objects and fix it, every report that uses that object will get the updated code the next time the report is refreshed. This was a big advantage of the “classic” Business Objects tools over Crystal reports for many years. Any tool that requires a developer to manually write their own SQL code is subject to the same limitation.
That’s a nice list of advantages for universe objects. What about the disadvantages?
- Limited to information from a single universe
Universe calculations cannot combine objects from two different universes. In fact it’s worse than that; you can’t combine objects from two different contexts in the same universe! This can be a very limiting factor depending on how complex your universe models are.
- Maintenance required by universe designer
Getting universe work done should be less traumatic than changing the ETL, but it can still be a roadblock depending on the availability of your developer. Notice I said “developer” as it is very difficult to manage a process where more than one person works on the same universe at the same time.
- Some aggregation issues can be tricky
Percentages and ratios and average calculations are all difficult to do in a universe. There is a new feature called “smart” or “database delegated” measures that started in XI 3.0 that helps some but it’s still an issue to be considered.
- Some functionality might be missing from the database
It’s less likely today then when I wrote the very first Variables presentation back in 1997, but it might be that the functionality you want or need just isn’t available in your database. Rather than writing a custom database function it might be easier simply to create report calculations instead.
Report Structure Items
There are three basic options available in the report engines provided by Web Intelligence (and Desktop Intelligence as well if you’re using that tool.) They include constants, formulas, and variables. I will talk more about these three choices in the next post. For now I would like to consider the pros and cons of this option.
- Available on all platforms
As mentioned both Desktop Intelligence and Web Intelligence offer this feature. Crystal goes even further and provides a language that includes scripting features. A report writer can pick the appropriate tool for the job without worrying about losing a calculation engine.
- Independent of SQL restrictions
The grammar for local calculations comes from the report engine, not the SQL database. That means if you are using a database with limited functions (Sybase IQ comes to mind) you can still accomplish complex tasks in the report by using the available report functions instead. The first time I worked with Sybase IQ was years ago, and it didn’t even offer a way to retrieve the current system date from the host.
- Calculations based on document data
When you refresh a query you download a microcube into your report. The calculations are then done on that smaller summarized dataset rather than applied to the entire rowset processed during the query process. It could be a performance benefit to be able to do the calculations locally.
That all sounds great, but there are disadvantages as well.
- Stored in a single document
In the ETL and Universe section I talked about being able to reuse calculation results. I can do that with report variables too, but only in different reports (tabs) within the same document. If I want to use a complex formula in a new document I have to copy / paste or recreate from scratch.
- Require some level of technical expertise
Most report writers can build simple calculations, especially operations found on the toolbar buttons like sum and count. However, getting complex calculations correct can be a struggle even for seasoned developers. It took a long time for the concept of calculation context to “click” for me, and even today it can sometimes be a challenge.
- Volume of data could impact performance
Earlier I said that calculations at the report level might improve performance, and now I am saying that the volume could impact performance. Is this a contradiction? No, not really. What I said earlier was that if the report calculations are done on data already summarized by the database engine they should be fast. If the data is not summarized, meaning if the microcube present in the report has a large number of rows, then having additional calculations can certainly slow things down.
Wow, that was a lot of text. It’s easier to say all of this stuff than to write it down. Can you all do me a favor and just come to the conference next time so I don’t have to write so much?
In all seriousness, what I tried to do with these few slides in my presentation was to outline some of the general thoughts that go into deciding where to put a calculation. Is there a clear and obvious choice as to which to use? No, not really. I apologize if that’s what you were hoping for at this point. Each of these three areas (ETL, universe, report) has clear advantages. In general I prefer to push complex calculations as far back as possible. But if needed we have quite a few options available.
The next post in this series will focus on the different types of report calculations (specifically formulas and constants) and discuss which of those is better. In this case (unlike this post) there is a very clear choice. You can probably guess what it is… after all the presentation title was Return of the Variables.