Two excellent posts about *just* data versus actual analysis

I’ve had these bookmarked for a while and periodically revisit them. They are worth checking out.

Juice Analytics: Filling the gap between reporting and Reporting
Avinash Kaushik: Consultants, Analysts: Present Impactful Analysis, Insightful Reports


Commenting out an entire Load or Select statement

Rather than prefixing every line of a Load or Select statement with double slashes (//), or wrapping the statement in opening and closing comment markers (/* */), type rem at the start of the statement. Everything from rem up to the next semicolon is then treated as a comment.

For instance:

SalesFact:
LOAD Date,
    CustID,
    Amount,
    Quantity
FROM Sales.qvd (qvd);

…becomes…

rem SalesFact:
LOAD Date,
    CustID,
    Amount,
    Quantity
FROM Sales.qvd (qvd);

03/31/10: SAP achieves awareness.

04/01/10: The end of life as we know it.

(Cue Terminator theme.)

[Screenshot: SAP BusinessObjects Innovation Center page]

This was the page that loaded when I logged into the SAP BusinessObjects Innovation Center (formerly BusinessObjects Labs) today.


Axis Group is hiring experienced QlikView consultants

QlikView Consultant (Berkeley Heights, NJ/Atlanta, GA)

We are also hiring for entry-level positions and summer internships, so please pass this along to bright, young people you know who are pursuing IT-related degrees and might have an interest in Business Intelligence and Data Warehousing.

We are hiring for other positions, as well, related to different verticals and software partners. Here is the full list.

Again, please pass this along to anybody who you think may be interested, and feel free to contact me with any questions. Thanks.


Edward Tufte on NPR’s On The Media

Minister of Information (follow link to embed of audio)

Edward Tufte is perhaps the country’s foremost evangelist for the clean, clear and rich presentation of complex information. The Obama administration’s stimulus package is flooding the economy with 787 billion dollars for employment and public works projects. Put the two together, as Obama did earlier this month when he nominated Tufte for the stimulus advisory board with the hopes that the public will have a fighting chance of understanding where the stimulus money went and what it’s doing.


Bar charts vs. line charts

It may be difficult to attribute each of the following points to a specific source, but here are all of the guidelines I can remember off the top of my head about bar charts vs. line charts, mostly learned from Edward Tufte and Stephen Few. It’s a bit of an art, and how you represent your data depends on what exactly you intend to find in it, so it’s difficult to write hard-and-fast rules that dictate what to do. If you want to learn more, check out their books on our Reading page.

Line

  • When to use them: Line charts should be used only for time series (chronological data) or when there is some other sequence to the dimensions on the x-axis, e.g. dates, months, the stages of a project, or meters along a gas pipeline. They should be used to detect trends and patterns, not to give people exact quantitative readings.
  • Scale: As line charts are not really intended to give people exact numbers, forcing zero scaling is not necessary and can make it considerably more difficult to detect said trends and patterns.

Bar

  • When to use them: Bar charts should be used for comparing specific x-axis values, though they can certainly be used for time series, like line charts. They can also be used in place of pie charts to display parts of a whole, in which case the space between the bars should be reduced.
  • Orientation: Do not use vertical or diagonal text to label the axis of a bar chart. If the x-axis has longer text descriptions, use a horizontal bar chart, so the text can be read left-to-right, horizontally (the way we normally read).
  • Scale: As the area of bars implies volume, it can be deceptive to use dynamic scaling with bar charts (see: Lie Factor). If the differences between the data points are difficult to distinguish with forced-zero scaling, use symbols/points instead of bars and use dynamic scaling.
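
To put a number on the scale point above: Tufte defines the Lie Factor as the size of the effect shown in the graphic divided by the size of the effect in the data, so a value near 1 means the chart is honest. Here is a rough sketch in Python of how a truncated axis inflates it for two bars. The function name lie_factor, its parameters, and the sample values 100 and 110 are made up for illustration, and bar length stands in for the size of the effect shown.

```python
def lie_factor(value_a, value_b, axis_start=0.0):
    """Tufte's Lie Factor for two bars drawn from a given axis baseline.

    Assumes value_a is nonzero, differs from value_b, and sits above
    axis_start (otherwise one of the divisions below is undefined).
    """
    # Effect in the data: relative change between the two values.
    data_effect = (value_b - value_a) / value_a

    # Effect in the graphic: relative change between the drawn bar lengths,
    # which are measured from wherever the axis starts.
    bar_a = value_a - axis_start
    bar_b = value_b - axis_start
    shown_effect = (bar_b - bar_a) / bar_a

    return shown_effect / data_effect


# Hypothetical values: two bars at 100 and 110 (a 10% difference).
print(lie_factor(100, 110))                 # forced-zero axis: no distortion
print(lie_factor(100, 110, axis_start=90))  # axis starting at 90: large distortion
```

With the axis forced to zero, the bars differ by the same 10% as the data. Start the axis at 90 and the second bar is drawn twice as long as the first, which is roughly a tenfold exaggeration of that 10% difference.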

Applies to both

  • Dimension order: There should be some logical order to the dimensions on the x-axis. In the case of a line chart, it should follow the chronological, process, or stage order that caused you to select a line chart in the first place. In the case of bar charts, the order should have some rhyme and reason to it: sorted by y-axis value, alphabetical, etc., depending on the content of the chart and what its intended use is, e.g. ranking, distribution.
  • Scale labels: If the numbers are already being displayed on the data points, it is redundant to label the axis with numbers, too.
  • Axis labels: If you can incorporate the metric names and dimension names into the chart title or legend, do not waste space on axis labels.

The ultimate in information on demand


From GraphJam.com: Things I Want to Do in New Jersey

[Image: GraphJam chart, “Things I Want to Do in New Jersey”]

Might I add “Visit the Axis Group headquarters”? Just kidding, New Jersey people. It was good seeing you last week.

Very amusing (though occasionally vulgar) site.


Book Review: Now You See It

[Image: Now You See It book cover]

At the beginning of June, Stephen Few released Now You See It: Simple Visualization Techniques for Quantitative Analysis, his first book since 2006’s Information Dashboard Design. Divorced from the strict context of dashboards, it focuses on fundamental techniques for presenting data for analysis. Here is his description from the book cover:

Before you can present information to others, you must know its story. “Now You See It: Simple Visualization Techniques for Quantitative Analysis” teaches simple, fundamental, and practical techniques that anyone can use to make sense of numbers. These techniques rely on something that almost everyone has—vision—using graphs to discover trends, patterns, and exceptions that reside in quantitative information and interactions with those graphs to uncover what the discoveries mean.

Although some questions about quantitative data can only be answered using sophisticated statistical techniques, most can be answered using simple visualizations—quantitative sense-making methods that can be used by people with little statistical training. Until “Now You See It,” no book has taught the basic skills of data analysis to such a broad audience and for so many uses, even though the need is huge, critical, and rapidly growing.

For starters, this book is HUGE, with a larger footprint than Tufte hardcovers and nearly thrice the thickness of Information Dashboard Design, so do not order it expecting to throw it in your laptop case for your next trip. I actually laughed when I opened the box from Amazon.

It is organized into two large sections, Building Core Skills for Visual Analysis and Honing Skills for Diverse Types of Visual Analysis, with a short section at the end for Further Thoughts and Hopes. The first section is like an extended introduction to data visualization vocabulary, concepts, and patterns, while the second digs into different types of analysis, like time series, part-to-whole, deviation, distribution, and correlation. The structure is logical, and the book flows well as a result.

Every chapter is beautifully presented and rich with examples that both illustrate Few’s points and help you remember them. Absent the emphasis on dashboards, he has the opportunity to delve deeply into visual representations that are not necessarily well-suited for the precious real estate of executive information systems. So if you have read Information Dashboard Design, don’t worry: there is not a lot of repeated information.

It’s likely that several of the techniques will jump off the page as being applicable to data you have been studying for a long time, but you just have never thought to look at them in the ways described in the book. As you read, it’s difficult to resist the temptation to go to your computer to play, spinning your data to look at them differently.

As you have no doubt heard Few opine, many popular Business Intelligence tools do not possess the out-of-the-box capabilities to present data how he would like, so it was fun to read about some slightly more unusual chart types and figure out how to create them in BI applications that I use. At this point, I don’t think any single piece of software can be expected to encompass every possible representation, though most of the examples in the book can be approximated in Excel.

[Image: QlikView whisker plot]

I strongly recommend Now You See It to anybody for whom analyzing data is part of the job. I finished reading it months ago, but can still list, off the top of my head, several lessons and ideas I got from the book. Though some of the topics discussed, like geo-spatial analysis, may not be relevant for the data you use at work or the type of analysis you are capable of conducting with your current collection of tools, I can now carry on a fifteen-minute conversation about something as universal as the use of bar charts.

For a more representative preview, Few has published an excerpt from Chapter 5, Analytical Techniques and Practices, on his site:

Solutions to the Problem of Over-Plotting in Graphs (PDF)

Several other Visual Business Intelligence Newsletters from 2008 and later also contain lessons and examples that appear in the book.

Bonus: A Graph Design I.Q. Test has been added to the Perceptual Edge site. If you read that site or Few’s books - or even this blog - you should do well.


QlikView mentioned in latest Stephen Few paper

Fundamental Differences in Analytical Tools: Exploratory, Custom, or Customizable (PDF)

Excerpt from the September/October 2009 Visual Business Intelligence Newsletter, by Stephen Few, linked above:

Customizable Analytics Requirements

To build custom analytical applications, you need programming power. The tool ideally exhibits the following characteristics:
•  Provides the means to develop an application that supports precisely what’s needed in the most effective way possible. This requires a high degree of programmability, both in terms of power and flexibility.
•  Provides ready-made libraries of useful functions that can be easily plugged into the application with much less effort than it would take to build them from scratch.
•  Easy and efficient to use by those who develop the applications.
•  Provides the means to remove everything from view in the finished application that isn’t needed.
•  Provides the means to guide the analyst step by step through the process.
•  Provides the means to coach the user through the process with instructions and examples, as needed.

One of the products that I’ve seen that seems to do this fairly well is QlikView. You don’t need to be a professional programmer to work with QlikView. Most of what you need exists as ready-made widgets (for example, particular charts with built-in functionality) that can be easily plugged into the developing application and much of the customization is done by selecting the appropriate parameters from lists that are found in dialog boxes. Programming code might need to be written, but it’s the exception, not the rule.

When you’re developing a custom analytical application, you don’t mind wading through lists of parameters in dialog boxes or writing a little code. Unlike the process of analysis itself when you must remain immersed in thinking about the data without distraction, these steps are less disruptive to developers. Although even developers benefit from programming interfaces that keep them focused on the task at hand, what they need most is the ability to do everything that’s needed, precisely and efficiently. Writing code in this case isn’t a distraction, it’s the task itself.

Tools such as QlikView are often handy because they have much of the infrastructure that is often needed for data analysis built right into the product, relieving us of the task of creating it, which in some cases would be virtually impossible. For example, QlikView includes a powerful in-memory management infrastructure that makes it possible for data to be manipulated at extremely fast speeds. This is powerful, because when you move a slider control to filter 100,000 rows of data or you drill from the country to the state level, you want the results of that action to appear without delay.

Please check out the rest of the paper, or subscribe to Stephen Few’s newsletter here.
