Did J.R. Smith Miss?
Posted by Speros in QlikView, Visualization on May 15th, 2013
During last night’s Knicks - Pacers playoff game, J.R. Smith turned in a less-than-lackluster performance. Twitter account @DidJRSmithMiss live-tweeted the showcase and summed up the results at the end of the night:
Recap: yes, yes, yes, yes, no, yes, yes, yes, yes, yes, no, yes, yes, no, yes, yes, no, no, yes, no, yes, no.
— Did JR Smith Miss? (@DidJRSmithMiss) May 15, 2013
Here is his tweet visualized using QlikView:
Advanced Table Formatting in QlikView
Posted by Speros in QlikView, Visualization on April 28th, 2013
I was looking through tweets about the NBA playoffs as it’s that time of year and stumbled upon Andrew Powell-Morse’s NBA Fines: Ballers Behaving Badly. In the post, Powell-Morse uses various data visualizations to illustrate how fines have been distributed across the NBA in the last decade. One chart in his post that stood out was the following straight table:
What struck me about the chart was the formatting used. By coloring specific regions and reducing ink in other areas, the chart organizes the information in a helpful way while increasing the presentation value of the overall image. Take the top 5 rows for example. Often at clients, I receive requests to illustrate the top 5 or 10 dimensional values based on a metric. This approach can feel limiting to myself and the client when there is plenty of other data to explore, especially if the top values do not change often. However, when I follow my natural instinct to add the entire list of values, still sorted by the metric, the users often feel lost. While adding the full list provides more data, it also provides more noise that drowns out the top values. It does not jump out to users immediately who the top 5 members are, even if the obvious explanation is that they are the top 5 rows.
Formatting the top rows separately provides a clever compromise to this problem. By highlighting top values with a different color, the table can provide the full list of values to explore while still intuitively presenting users with the top values that may be of interest.
The table utilizes some other interesting formatting techniques. Rather than highlight the entire row, only the dimensional value is highlighted, reducing the amount of ink necessary to prioritize the data. I also liked the organizing of the metrics and their percentages of the total, as well as the white space included between colored regions to keep the colors from bleeding into each other.
After looking at this table for a few minutes, I decided to try to produce a similar result in QlikView. First I loaded the source data and created the base chart without any formatting changes. All screenshots are taken in IE8 using AJAX.

- Table border
- Vertical cell borders
- Numeric formatting
- Header names
- Total row label and placement
- Header and total row coloring
Even with this simple set of changes, I had some struggles. For example, I had to use Custom Format Cell to alter the colors of the header and total row. This presented a problem when I tried to edit the total row on the bottom - it was not possible to select it in the Custom Format Cell menu. However, moving the total row to the top position gave me access in the menu and the ability to alter the formatting. Returning totals to the bottom of the table retained the formatting applied.
The result:

Halfway there. My final step was to add some of the special formatting from the original chart. This presented a special set of challenges:
- Coloring the top 5 dimension values - Using the rowno() function in conjunction with the dimension background formatting setting disabled my defined sort order. Replacing with rank(Expression) solved this problem
- Inserting white spaces between the columns - I accomplished this using empty expressions, but there is a limit to the minimum width of the columns that is undesirable
- Line coloring - Custom Format Cell allowed this, but I also had to manipulate the line coloring for the header and total rows. The combination of bottom and top border row colors across cells, headers, and rows was not obvious.
- Column formatting - I adjusted the formatting to clean up the presentation. This includes right aligning all of the numeric values and moving the “$” and “%” signs to the column headers.
The end result, with a title:

A decent amount of effort, but the end result is a dressed up table that may be suitable for a styled, presentable dashboard.
Some notes on the formatting constraints of QlikView:
- I couldn’t apply custom formatting to the total row like the header row. The total row seems to follow the formatting of the table cells, even if you change it with Custom Format Cell
- The column width minimum size is frustrating
- The header shares font settings with the table. In the QlikView chart, it would be nice if you could de-emphasize the headers by making them a lighter grey
- The lack of margins makes the table harder to read. Notice how poorly the dimension values fit inside the rows. The descenders of the font even cut off! This was true in both IE and Chrome.
Logging row counts
Posted by Mike S in Data Model, Load Script, QlikView on April 27th, 2013
Something that would be nice to see in QVW log files is the number of rows that result from executing a join.
It would be useful to test assumptions about unfamiliar data, like the distinctness of a field on which you are joining, and for people who are new to transforming data and trying to understand the impacts of different join types. As it is, the “lines fetched” that appear in the log after the join is the number of rows retrieved in the load that was joined to another table.
To help remedy this, I wrote a subroutine that can be called to return the row count of any table at any point during the script execution:
Sub RowCount(vTableName) LET vRowCount = NoOfRows('$(vTableName)'); TRACE --Current '$(vTableName)' row count: $(vRowCount); End Sub
Here’s how you call it, where “Base” is the table name:
CALL RowCount ('Base')
And here is the result in the execution window, which is mirrored in the log file:

Here is a slightly more advanced version of the subroutine, which displays the delta, as well, if the same table is called multiple times in a row, which would also be nice to use following concatenate loads:
LET vLastTableName = ;
Sub RowCount(vTableName)
LET vRowCount = NoOfRows('$(vTableName)');
LET vRowCountChange = If(vTableName = vLastTableName, ' (' & Num(vRowCount-vLastRowCount,'+#,##0','-#,##0') & ')', '');
TRACE --Current '$(vTableName)' row count: $(vRowCount)$(vRowCountChange);
LET vLastTableName = vTableName;
LET vLastRowCount = vRowCount;
End Sub

Concatenate loading rows: behavior and performance
Posted by Mike S in Data Model, Load Script, Performance, QlikView on April 24th, 2013
Given two lists of values that I want to concatenate into a single list of distinct values, I noticed some interesting behavior: It didn’t matter whether I used DISTINCT in Load 1, Load 2, or both to make the final list of values distinct.
//Load 1 Base: LOAD [DISTINCT] * INLINE [ FV 1 1 2 2 3 3 ];
//Load 2 CONCATENATE (Base) LOAD [DISTINCT] * INLINE [ FV 3 3 4 4 ];
Here are the results of all combinations of DISTINCT usage:
| Scenario | Load 1 | Load 2 | Resulting Rows |
|---|---|---|---|
| 1 | No distinct | No distinct | 10 |
| 2 | No distinct | Distinct | 4 |
| 3 | Distinct | No distinct | 4 |
| 4 | Distinct | Distinct | 4 |
What number of rows would you have expected to result from the second and third scenarios? I would have guessed incorrectly.
Looking in particular at the second scenario, it retroactively made Load 1 distinct, even though there was no DISTINCT for that load. The number of rows in the table went down between Loads 1 and 2, and the log file bears that out. (When concatenate loading, the “lines fetched” is the total rows in our table at that point in the script execution.)
Scenario 1:
Load 1 (Non-distinct) -- 1 fields found: FV, 6 lines fetched Load 2 (Non-distinct) -- 1 fields found: FV, 10 lines fetched
Scenario 2:
Load 1 (Non-distinct) -- 1 fields found: FV, 6 lines fetched Load 2 (Distinct) -- 1 fields found: FV, 4 lines fetched
So what it must be doing in these cases is reloading the entire resident table when concatenating more rows. If that is true, then what would the scripting and reload performance implications be? For one thing, when creating concatenated fact tables, I would load the source tables in ascending order of row count to minimize the reload time. But, sticking to the original scenario…
Expanding my sources to two 10 million row QVDs with randomly generated integer values, then repeating scenarios 2, 3, and 4, here are my results:
//Load 1 Base: LOAD [DISTINCT] FV FROM Base1.qvd (qvd); //Load 2 CONCATENATE (Base) LOAD [DISTINCT] FV FROM Base2.qvd (qvd);
| Scenario | Load 1 | Load 2 | Reload Time (seconds) |
|---|---|---|---|
| 1 | N/A | N/A | N/A |
| 2 | No distinct | Distinct | 26 |
| 3 | Distinct | No distinct | 30 |
| 4 | Distinct | Distinct | 30 |
While scenario 4 is what I would naturally be inclined to write, it turned out to be a few seconds faster to allow the first load to be non-distinct, then wait and apply the DISTINCT only once, after all of the data had been combined. I repeated this test several times with the same result, and I would wager that the improvement would be even more pronounced with more rows or columns or more than two sources.
Updated RSS aggregators for viz and QlikView
Posted by Mike S in Miscellany, Presentation, QlikView, Visualization on April 24th, 2013
I updated the RSS aggregator feeds in the lower right and moved from Yahoo! Pipes over to Feed Rinse. Here are the current contents:
QlikView
- (this site)
- …and points beyond
- BI Review
- iQlik
- Living QlikView
- Michael Ellerbeck
- mindspank
- Official QlikView blog
- One Qlik at a Time
- Qlik Tips
- qlikblog.at
- QlikMetrics
- Qlikshare
- QlikView Addict
- QlikView Maven
- QlikView Notes
- QlikView Podcast
- Quick Inteligence
- Quick Qlear Qool
- QV Design
- The Qlik Board
- The Qlik Fix
- TIQView Blog
Visualization/Presentation
- Flowing Data
- Information Is Beautiful
- Juice Analytics
- Junk Charts
- Presentation Zen
- The Extreme Presentation Method
- The Why Axis
- Visual Business Intelligence
Performance (and other) benefits of using expression column and label references
Posted by Mike S in Performance, QlikView on April 23rd, 2013
As has been documented elsewhere, there is not always a performance benefit from using column or label references in charts, but there are performance benefits when you really need them: resource intensive functions. Consider the following example, using the dreaded Aggr function.
Source data:
Base: LOAD 'C' & Left(RowNo(),1) as Country, Left(RowNo(),4) as OrderID, Round(Rand()*500) as Sales, Round(Rand()*30) as UnitCost, Round(Rand()*10) as Quantity AUTOGENERATE 10000000;
Output:

Scenario expressions and calc times:
| Explicit | Label | Column | Explicit (sloppy) | |
| Average Order Sales | Avg(Aggr(Sum(Sales),OrderID)) | Avg(Aggr(Sum(Sales),OrderID)) | Avg(Aggr(Sum(Sales),OrderID)) | Avg(Aggr(Sum(Sales),OrderID)) |
| Overall Average Order Sales | Avg(TOTAL Aggr(Sum(Sales),OrderID)) | Avg(TOTAL Aggr(Sum(Sales),OrderID)) | Avg(TOTAL Aggr(Sum(Sales),OrderID)) | Avg(TOTAL Aggr(Sum(Sales),OrderID)) |
| Average Order Sales Index | Avg(Aggr(Sum(Sales),OrderID)) / Avg(TOTAL Aggr(Sum(Sales),OrderID)) | [Average Order Sales] / [Overall Average Order Sales] | Column(1) / Column(2) | avg(Aggr(sum(Sales),OrderID)) / avg(TOTAL Aggr(sum(Sales),OrderID)) |
| Indicator | If( Avg(Aggr(Sum(Sales),OrderID)) / Avg(TOTAL Aggr(Sum(Sales),OrderID)) > 1, ‘qmem:///BuiltIn/arrow_n_g.png’, If( Avg(Aggr(Sum(Sales),OrderID)) / Avg(TOTAL Aggr(Sum(Sales),OrderID)) < 1, ‘qmem:///BuiltIn/arrow_s_r.png’ )) |
If( [Average Order Sales Index] > 1, ‘qmem:///BuiltIn/arrow_n_g.png’, If( [Average Order Sales Index] < 1, ‘qmem:///BuiltIn/arrow_s_r.png’ )) |
If( Column(3) > 1, ‘qmem:///BuiltIn/arrow_n_g.png’, If( Column(3) < 1, ‘qmem:///BuiltIn/arrow_s_r.png’ )) |
If( Avg( Aggr( Sum( Sales ) ,OrderID ) ) / Avg( TOTAL Aggr( Sum( Sales ) ,OrderID ) ) > 1, ‘qmem:///BuiltIn/arrow_n_g.png’, If( Avg( Aggr( Sum( Sales ) ,OrderID ) ) / Avg( TOTAL Aggr( Sum( Sales ) ,OrderID ) ) < 1, ‘qmem:///BuiltIn/arrow_s_r.png’ ) ) |
| Calc Time (Init) | 2059 | 1186 | 1154 | 3198 |
| Calc Time (Filter) | 1201 | 670 | 671 | 1841 |
Column and Label reference examples performed almost 2x as quickly as explicit formulas.
Note the other possible danger of using explicit expression formulas in the final column: writing an explicit formula slightly differently, like changing the case of a function or inserting spaces before or after parentheses, makes performance even worse (about 3x worse, in this example). QlikView doesn’t detect that sum(Sales) and Sum(Sales) mean the same thing within a chart, even though they have the same result. With column or label references, inconsistency of expressions within a chart isn’t a possibility. They also result in fewer expressions to maintain if fields or business rules change, and derived expressions are shorter and easier to understand.
In another example, I found a slight calculation time improvement with column and reference labels when I added a flag to the data model, then used IF statements to conditionally return the values where the flag was equal 1. That said, I found no performance improvement when using set analysis to accomplish the same end, e.g. {<Flag={1}>}. In no instances tested was the performance worse with column or label references than explicit expressions.
To summarize, the best case scenario is that using column or label references shortens calculation time and makes the chart expressions easier to understand and maintain. The worst case scenario is that using column or label references merely makes the chart expressions easier to understand and maintain. Thus, consistently using column or label references remains a good development practice. Just remember to Disable Drag and Drop in the chart if you are using column references in a table.
Bob Pozen on Personal Productivity
Posted by Mike S in Productivity on May 26th, 2011
Harvard Business Review printed this series of blog posts from Bob Pozen in its May issue, which focused on productivity.
He manages to touch on topics that are relevant to most every knowledge worker, including PowerPoint and business travel, but here is one of the better excerpts:
I’ve seen many executives realize what they wanted to say only after they had written a lengthy draft. That’s unfortunate. You should know where you will end up before start.
Try this: After composing an outline, write the concluding paragraph. That will tell you whether you really know where your article or memo is going.
The Joy of Stats, and some less joyous stats
Posted by Mike S in Miscellany, Reading, Visualization on December 30th, 2010
Hans Rosling’s The Joy of Stats is now available in its entirety on YouTube, via FlowingData:
You may recognize Rosling from his 200 Countries, 200 Years, 4 Minutes video, which has gotten a lot of press - on the Internet, at least.
This dovetails nicely with the classic How to Lie with Statistics, which I just read over Christmas. It re-sharpened my eye towards deceptive visual representations, and not a day after finishing it did I see some intellectually dishonest charts on The Heritage Foundation’s Top Ten Charts of 2010. Their #1 is pretty egregious - see if you can guess why:

“BI has hit the wall”
Posted by Mike S in Business Intelligence on September 28th, 2010
I Am the Orson Welles of PowerPoint
Posted by Mike S in Presentation on September 17th, 2010
From McSweeney’s Internet Tendency:
I believe that sometimes a PowerPoint demands five consecutive pages of full bleed abstract images. Just to make people feel. To get mid-level managers to loosen their ties and take off their name card necklaces. Too much logic is the death knell of any PowerPoint pursuit.




