Squeezing more performance out of a large, concatenated fact table


  1. Several years ago, I discovered that joining to a resident table in QlikView causes the resulting table to be ordered by the key field, in that field’s original load order. See screenshot below. This would be like if you executed an ORDER BY and had an option for ascending original load order as a sequence.
  2. I also discovered that, similar to indexing a table in a database, ordering a table by a field commonly used in calculations, filters, and dimensions can improve the general calculation performance of an application. I found this to be especially true when ordering a table containing a dimension that was referenced in an Aggr() function. In general, though, for large applications with various point-in-time analyses, trend graphs, and no problematic Aggr usage, the central date field is a great candidate.

Preview of data model after joining Customer to Sales fact table on CustomerKey:

Customer Sort

Recently I got around to doing some more testing involving both variables and codifying this in a subroutine so it would be easy to leverage more widely to improve application performance.

To the naked eye, the three tested scenarios have identical data models, anchored with a 39M-row concatenated fact table: the only difference is in how that fact table was sorted in the final application.

  • “Field Grouping” is grouping like field values together, as would result from a join, not necessarily in a natural sequence for analysis. See #1 from above. The first scenario is what I believe the default would be if you simply built a concatenated fact table with no consideration given to table sorting at all. The second scenario is left joining the distinct Date values back to the fact table in order to group all rows for the same Date together in the final application.
  • “Field Ordering” is the result of doing a true ORDER BY at the end of the reload. See #2 above. In the third scenario, the like Date values are not just grouped together but are also in sequential numeric order. I thought this may have made a difference because it would make all of the days in the same month, year, etc. adjacent in the application’s final data model.
Field Grouping:
Row Type
Field Grouping:
Field Ordering:
Size on disk (compressed) 267MB 263MB 282MB
Size in RAM 2,473MB 2,474MB 2,474MB
Initial page calc time (avg) 5.9s 3.9s 3.9s
Peak RAM consumption during subroutine - 3,738MB 4,645MB
Subroutine run time - 29s 2m40s

As you can see, in my test, applying logic to the fact table to group like Date values together improved performance by a full third, lowering calculation time from 5.9 to 3.9 seconds. Not bad for so little additional work and reload time.

As you can also see, I did not observe better performance from ordering the dates sequentially (scenario 3) compared to just grouping the like values together (scenario 2). Because that script’s execution took significantly longer and used a lot more RAM (resident loading the entire concatenated fact table again, with an ORDER BY), I would stick with the lower overhead field grouping approach.

Here’s the subroutine script I used for testing, ready to be re-used. You just have to plug in the table name and the field whose values you want to group.

Sub Ax.IndexResidentTable( v_TableName, v_GroupFieldName )

	LET v_StartSubTimestamp = Now();
	TRACE [MESSAGE]:  Grouping like '$(v_GroupFieldName)' field values in table '$(v_TableName)' to improve calculation performance;

	LEFT JOIN ([$(v_TableName)])
	LOAD FieldValue('$(v_GroupFieldName)',RecNo()) as [$(v_GroupFieldName)]
	AUTOGENERATE FieldValueCount('$(v_GroupFieldName)');

	LET v_SubDuration = Interval(Now() - v_StartSubTimestamp, 'h:mm:ss');
	TRACE [MESSAGE]:  Finished grouping like '$(v_GroupFieldName)' field values in table '$(v_TableName)' in $(v_SubDuration);

	LET v_TableName = ;
	LET v_GroupFieldName = ;
	LET v_StartSubTimestamp = ;
	LET v_SubDuration = ;

End Sub

Here is what you see when you call the sub:

Sort Panel

That said, if you are already normalizing the data model to reduce hops between tables by combining tables, this subroutine would not even be needed. Based on what we have learned about joins, we need only to join the calendar table to the fact table last in the overall sequence of joins to group the like date values together in the final data model.


No Comments

Lasso filtering in Qlik Sense Extensions

A few months ago I created a lasso plugin for D3 that mimics the lasso functionality available in Qlik Sense out-of-the-box charts. The ultimate goal in building the plugin was to use it in Qlik Sense Extensions to enable selections in the same way the out of the box charts function.

Recently, Brian Booden and Ralf Becher built an awesome hexagonal binning extension that utilizes the d3 lasso plugin to enable selections. Through this process they gave me some great feedback on issues they found. Having resolved those, I’ve updated the library and would like to demonstrate how others can integrate the lasso plugin with their own Qlik Sense Extensions.

Brian and Ralf's extension

Brian and Ralf’s extension

Read the rest of this entry »


1 Comment

US Choropleth Using QlikView Pivot Table

We had some conversations internally around the idea that it might be easier to read a choropleth if the regions on it (in this case US states) were all a uniform size and shape. It is possible to do this using QlikView’s standard pivot table. Here I’ll explain how to set this up and provide a sample application.

First, we start with a data set. We’ll need 2 tables. The first is our [State Data] that has the data we want to compare for each state. The other table, [Map Grid], has the row and column information that will tell us where to put each state on the map. I used a relatively simple data set for my [State Data] table – just a list of states and a random number between 0 and 1 to make it simple to use the ColorMix1 function on the front end. The [Map Grid] table will have 3 columns – the row, column and state name that will relate to our [State Data] table. We’ll still need to store rows for the blank spaces between states because otherwise there isn’t actually any data to show – there needs to be something there to represent a space.

Once you have your data set, you’ll load that into QlikView. The [State Data] table and [Map Grid] table should be related on the state field.

US Choropleth - script US Choropleth - data model

On the front end, create a pivot table and add the Row and Column from the [Map Grid] table as dimensions. Set the background color and text color properties to argb(0,0,0,0) – this will essentially hide the values Row and Column values from the user as they are not useful anyway.

US Choropleth - dim background color US Choropleth - dim text color

Go to the Presentation tab and turn on the always fully expanded option. We’re using a pivot table, but we don’t actually want the user to be able to use it like a pivot table.

US Choropleth - always fully expanded

In your expressions, add the state name field. I’ve set it up so the expression is actually only({1} State), so the grid always shows all states. Selections will be highlighted with colors instead of hiding the state in the chart.

US Choropleth - expressions

In the background and text color properties, we’re going to add an expression that will color the cells based on the value of the metric being measured. In the background color, first we’re checking to see if there is a valid state in the cell – if there’s not, we’ll just use that argb(0,0,0,0) value again to make it transparent. If there is a state name, we’ll use the ColorMix1 function to return a color between my c_RiskLight and c_RiskDark variables (light blue and dark blue) based on the value of Rating 1, which from the first step is a random value between 0 and 1. The text color is just going to check for the value of Rating 1 and return white if it is greater than .5 and black if it is less than .5. This will make it easier to see the state name against the different colored backgrounds.

US Choropleth - background color US Choropleth - text color

At this point, you have a standard looking pivot table, but it doesn’t look much like a map. You’ll want to take the Column dimension and drag that up towards the top over the State Name expression.

US Choropleth - unpivottedUS Choropleth - pivotingUS Choropleth - pivoted

There’s a few touch ups we can do to make this look a little nicer. Make the column width on the Row and Column dimensions as small as you can, turn off drop down select, and make the labels just blank spaces. On the style tab, click the background button on the bottom of the window and change the transparency to 100% (not the cell background color transparency – this will turn off all coloring on your states). Soften the cell borders between states by changing the cell borders transparency on the Style tab to 100% (or whatever you’re comfortable with). On the presentation tab, wrap the cell text to 2 or 3 lines and make the Data (Text) alignment centered for the State Name expression. On the Sort tab, un-check all boxes for both the row and column dimensions so when you select a state, it stays in place relative to the other states.

When you’re done, you should have something that looks like this:

US Choropleth - complete

You can see this on Qlik Branch here or git here.


No Comments

Hands-On Workshop (Hollywood, FL): Unleash Your Intuition

Learn How to Bring Your Data to Life

Imagine an analytical tool so intuitive, anyone in your company could easily create personalized reports and dynamic dashboards to explore vast amounts of data to find meaningful insights.

That’s Qlik® Sense - a revolutionary self-service data visualization and discovery application designed for individuals, groups, and organizations.

Please join us for a half day, hands-on workshop where you can roll up your sleeves so that you can see the power of Business Discovery firsthand. During this session you will:

  • Gain an understanding of why associative, in-memory analysis is currently driving the change in the Business Intelligence market
  • See the power of visualization and how it enables your organization to make faster, smarter decisions
  • Construct a fully-functional dashboard application
  • Learn how to navigate your application and present it to your colleagues

Axis Group will also be demoing one of our visual analytics applications.Register today to reserve your seat! We look forward to seeing you.

June 18, 2015
11:30 a.m. – 4:00 p.m.
Cocktail reception to follow

Registration is from 11:30-1:00 p.m. Lunch will be provided

Hard Rock Hotel & Casino 
1 Seminole Way
Hollywood, FL 33314

Who Should Attend 
IT or BI professionals
Business Analysts

What is Required 
Latest edition of Qlik Sense


No Comments

Consistent font usage

“…Thirteen’s just obnoxious.”


No Comments

We’re hiring QlikView consultants

Apologies for the lag between posts lately, but we at Axis Group are ridiculously busy…in a good way.

In fact, we could use the help of some additional experienced QlikView consultants, if you’re looking for an opportunity with our company. We’re hiring both engineer- and architect-level positions.

  • Required
    • QlikView knowledge
  • Recommended
    • Ability to use Tuftean in a sentence

See our job page for more information, but bear in mind that the location is negotiable if you are willing to travel.

We look forward to hearing from you.



d3vl extension and updates to senseUtils

I’ve uploaded a new extension to GitHub called the d3 visualization library, or d3vl for short. It is a collection of 30 examples from the d3js.org gallery incorporated into a single Qlik Sense extension. The goal is to show how easily d3 can be integrated with Qlik Sense.

View a demo of the d3vl and it’s charts here.

The d3vl also demonstrates the concept of using an extension as a library of charts, rather than a single chart. Using an extension as a library of similar charts has a few advantages, such as an easy installation, minimal impact on Sense’s chart object list, and common themes across the charts. In the future I hope to see more libraries like these, as opposed to one-off extensions. A great example would be a network extension that contains charts related to networking, all with a common styling and notation.

The senseUtils library has also been updated with new functions. I will write up a future post about how to use them, but here is a quick rundown:

  • pageExtensionData: pages an extension’s data; currently, the max page size for an extension is 10,000 cells, so anything beyond that will not be returned to your extension object. if you need to use more data than that, you can use this function
  • flattenData: flattens pages of extension data into one array. Currently, the page arrays are contained in an outer array. This function concatenates them into a single array
  • getDimLabel: gets the dimension labels for an extension
  • getMeasureLabel: gets the measure labels for an extension
  • setupContainer: sets up the container for an extension, which includes logic for emptying the container on redraw
  • extendLayout: extends the layout object by adding some useful properties. An example is a property on the qMatrix row data that has a pre-defined function for selecting that row’s values. This makes it possible to easily incorporate selections into visualization libraries like d3 without having to write new selection functions

A post demonstrating how to use these new functions will be available soon. In the meantime, d3vl uses a lot of these new methods and can be used as an example.



No Comments

Mastering QlikView Book Review

Stephen Redmond is a QlikView expert, as indicated by his book Mastering QlikView in which he clearly illustrates myriad advanced techniques from data modeling and performance tuning to expression writing and visualization best practices. While there is a disclaimer in the beginning of the book that the content caters to more advanced QlikView developers, I believe that Mastering QlikView is a must-read for anyone endeavoring to create QlikView applications. Though having in-depth knowledge of the tool does help to provide more context to the optimization portions, Redmond does an excellent job of giving the reader a basic overview of each chapter’s content prior to delving into more detail; it doesn’t read at all like a technical manual, but more like a tutorial.

Mastering QlikView starts with a very technical-heavy performance tuning and optimization guide with detailed instructions on how to debug any QlikView application for slow behavior; the section on using JMeter for user simulation is particularly well-written and explains each step carefully without assuming foreknowledge of the tool itself. Redmond also explains how to most efficiently acquire test data and provides links to sources he uses when testing a QlikView environment. He then goes on to detail the process of data modeling, theories on data warehousing, and performance testing on the front end – he even includes a guide on how to use Expressor to optimize the ETL process.

What struck me most about this book is that it not only answers the question of “How?” but also the question of “Why?”. Redmond gives sound reasoning and empirical proof of the benefits of using a variety techniques for each topic he covers. The section on data visualization is imperative for any developer to have in his or her back pocket in order to understand how and why users interact with QlikView applications (or really anything) the way that they do – he uses an example of a door with a handle that says “PUSH”, a situation to which I’m sure all of us can relate.

Mastering QlikView is a great addition to any QlikView developer’s bookshelf – even the most sophisticated developer would certainly come away with something new or at the very least a better understanding of QlikView. I will certainly be referring back to this book in my future engagements and would recommend that any QlikView developer, novice or expert, keeps Mastering QlikView close at hand while creating QlikView applications.


No Comments

Implementing the D3 General Update Pattern into a Qlik Sense Extension

Animation in data visualization is a relatively new area that we still don’t know much about. Like any visual component of a visualization, animations used in the wrong way can be unnecessarily flashy and distracting. However, there is evidence that animations can also add value to visualizations when used appropriately.

In this interactive example, Mike Bostock demonstrates the concept of object constancy:

a graphical element that represents a particular data point…can be tracked visually through the transition. This lessens the cognitive burden by using preattentive processing of motion rather than sequential scanning of labels.

Animation therefore can help users track changes in the data. When filtering, this can help the user discern what changes their filters caused, as opposed to having to reset their bearings on a newly appearing chart. When data in a chart is filtered, there are three possible events that can happen: data elements either update with new properties, exit because they were filtered out of the data set, or enter because they are filtered in to the data set. These are the main events we handle in typical Qlik Sense extensions. For more on animation possibilities, check out https://www.youtube.com/watch?v=vLk7mlAtEXI#t=54

Read the rest of this entry »


No Comments

Your Axis Doesn’t Always Have to Start at Zero

Recently, Justin Fox of Harvard Business Review shared a chart about law school enrollment that he found interesting:

The chart shows how law school enrollment has seen a relatively drastic decline in the last 3 years. You may have noticed that the chart’s y-axis doesn’t start at 0. This design decision caused a bit of an outrage amongst what are now being called “Y-Axis-Zero Fundamentalists”.

Justin shares a lot of the feedback and discussion around this point in his post The Rise of the Y-Axis-Zero Fundamentalists.

The discussion around this point is pretty thorough but I want to add my two cents.

Line charts aren’t just for displaying quantities. They are also used to examine changes in quantities. Sometimes, the change in quantity in relation to the entire size of the quantity is not as important as visualizing the direction and magnitudes of the changes in relation to each other. The chart Justin showcased is a perfect example of this type of analysis. It is probably not important to the average reader that law school enrollment was at ~38,000 in 1974, ~52,000 in 2010, and ~40,000 in 2013. What is important is that the change from 2010 to 2013 was significantly large and fast when compared to the historical trend of enrollment numbers. This analysis comes from comparing the slopes and positions to each other, not from comparing the distance of each point to a zero-based axis.

The exact same chart can be viewed with a zero-axis thanks to a slight variation. By changing the metric being plotted, we get:

Law School Enrollment

Is this chart now valid because the axis has a 0 next to it at the baseline? One could argue that each point is now quantified by it’s distance from the axis. That isn’t the point of this visualization however. The average reader probably does not want to quantify the % change since 1974. The metric also doesn’t provide any means to derive the nominals, a metric that is more real to a reader and easier to interpret.

The original chart’s trade-off of using a non-zero axis with nominal amounts in order to focus on relative changes while retaining a link to the nominal source data so that exact quantities can be derived is valid. The distance of each point to the axis is not visually defined like in a bar chart, so the implication that the distance is equal to the size of the value is not there. Scatter plots use the same positional method of encoding each data point, but I have never heard anyone say that scatterplot axes should start at zero.

In most cases, a zero-based axis makes sense, but it ultimately depends on the data and visualization used. If you are tracking the trend in revenue of a multi-billion dollar company with a line chart for example, you probably won’t learn much if you start the axis at 0.



No Comments