Archive for category Business Intelligence

Two excellent posts about *just* data versus actual analysis

I’ve had these bookmarked for a while and periodically revisit them. They are worth checking out.

Juice Analytics: Filling the gap between reporting and Reporting
Avinash Kaushik: Consultants, Analysts: Present Impactful Analysis, Insightful Reports

No Comments

Book Review: Now You See It

[Image: cover of Now You See It]

At the beginning of June, Stephen Few released Now You See It: Simple Visualization Techniques for Quantitative Analysis, his first book since 2006’s Information Dashboard Design. Divorced from the strict context of dashboards, it focuses on fundamental techniques for presenting data for analysis. Here is his description from the book cover:

Before you can present information to others, you must know its story. “Now You See It: Simple Visualization Techniques for Quantitative Analysis” teaches simple, fundamental, and practical techniques that anyone can use to make sense of numbers. These techniques rely on something that almost everyone has—vision—using graphs to discover trends, patterns, and exceptions that reside in quantitative information and interactions with those graphs to uncover what the discoveries mean.

Although some questions about quantitative data can only be answered using sophisticated statistical techniques, most can be answered using simple visualizations—quantitative sense-making methods that can be used by people with little statistical training. Until “Now You See It,” no book has taught the basic skills of data analysis to such a broad audience and for so many uses, even though the need is huge, critical, and rapidly growing.

For starters, this book is HUGE, with a larger footprint than Tufte hardcovers and nearly thrice the thickness of Information Dashboard Design, so do not order it expecting to throw it in your laptop case for your next trip. I actually laughed when I opened the box from Amazon.

It is organized into large sections on Building Core Skills for Visual Analysis and Honing Skills for Diverse Types of Visual Analysis, with a short section at the end for Further Thoughts and Hopes. The first section is like an extended introduction to data visualization vocabulary, concepts, and patterns, while the second digs into different types of analysis, like time series, part-to-whole, deviation, distribution, and correlation. The structure is logical, and the book flows well, as a result.

Every chapter is beautifully presented and rich with examples that both illustrate Few’s points and help you remember them. Absent the emphasis on dashboards, he has the opportunity to delve deeply into visual representations that are not necessarily well-suited for the precious real estate of executive information systems. So if you have read IDD, don’t worry - there is not a lot of repeat information.

It’s likely that several of the techniques will jump off the page as being applicable to data you have been studying for a long time, but you just have never thought to look at them in the ways described in the book. As you read, it’s difficult to resist the temptation to go to your computer to play, spinning your data to look at them differently.

As you have no doubt heard Few opine, many popular Business Intelligence tools do not possess the out-of-the-box capabilities to present data how he would like, so it was fun to read about some slightly more unusual chart types and figure out how to create them in BI applications that I use. At this point, I don’t think any single piece of software can be expected to encompass every possible representation, though most of the examples in the book can be approximated in Excel.

[Image: QlikView whisker plot]

I strongly recommend Now You See It to anyone whose job involves analyzing data. I finished reading it months ago, but can still list, off the top of my head, several lessons and ideas I got from the book. Though some of the topics discussed, like geo-spatial analysis, may not be relevant for the data you use at work or the type of analysis you are capable of conducting with your current collection of tools, I can now carry on a fifteen-minute conversation about something as universal as the use of bar charts.

For a more representative preview, Few has published an excerpt from Chapter 5, Analytical Techniques and Practices, on his site:

Solutions to the Problem of Over-Plotting in Graphs (PDF)

Several other Visual Business Intelligence Newsletters from 2008 and later also contain lessons and examples that appear in the book.

Bonus: A Graph Design I.Q. Test has been added to the Perceptual Edge site. If you read that site or Few’s books - or even this blog - you should do well.

No Comments

QlikView mentioned in latest Stephen Few paper

Fundamental Differences in Analytical Tools: Exploratory, Custom, or Customizable (PDF)

Excerpt from the September/October 2009 Visual Business Intelligence Newsletter, by Stephen Few, linked above:

Customizable Analytics Requirements

To build custom analytical applications, you need programming power. The tool ideally exhibits the following characteristics:
•  Provides the means to develop an application that supports precisely what’s needed in the most effective way possible. This requires a high degree of programmability, both in terms of power and flexibility.
•  Provides ready-made libraries of useful functions that can be easily plugged into the application with much less effort than it would take to build them from scratch.
•  Easy and efficient to use by those who develop the applications.
•  Provides the means to remove everything from view in the finished application that isn’t needed.
•  Provides the means to guide the analyst step by step through the process.
•  Provides the means to coach the user through the process with instructions and examples, as needed.

One of the products that I’ve seen that seems to do this fairly well is QlikView. You don’t need to be a professional programmer to work with QlikView. Most of what you need exists as ready-made widgets (for example, particular charts with built-in functionality) that can be easily plugged into the developing application and much of the customization is done by selecting the appropriate parameters from lists that are found in dialog boxes. Programming code might need to be written, but it’s the exception, not the rule.

When you’re developing a custom analytical application, you don’t mind wading through lists of parameters in dialog boxes or writing a little code. Unlike the process of analysis itself when you must remain immersed in thinking about the data without distraction, these steps are less disruptive to developers. Although even developers benefit from programming interfaces that keep them focused on the task at hand, what they need most is the ability to do everything that’s needed, precisely and efficiently. Writing code in this case isn’t a distraction, it’s the task itself.

Tools such as QlikView are often handy because they have much of the infrastructure that is often needed for data analysis built right into the product, relieving us of the task of creating it, which in some cases would be virtually impossible. For example, QlikView includes a powerful in-memory management infrastructure that makes it possible for data to be manipulated at extremely fast speeds. This is powerful, because when you move a slider control to filter 100,000 rows of data or you drill from the country to the state level, you want the results of that action to appear without delay.

Please check out the rest of the paper, or subscribe to Stephen Few’s newsletter here.

No Comments

The Hawthorne Effect: soft benefit of BI?

An excerpt from The Nike Experiment, in the most recent issue of Wired:

In the mid-1920s at Western Electric’s manufacturing plant in Cicero, Illinois, the management began an experiment. The lighting in an area occupied by one set of workers was increased so there was better illumination to help them see the telephone relays they were building. Perhaps not surprisingly, workers who had more light were able to assemble relays faster.

Other changes were then made: Employees were given rest breaks. Their productivity increased. They were allowed to work shorter hours. Again, they were more efficient during those hours.

But then something weird happened. The lighting was cut back to normal … and productivity still went up. In fact, just about every change the company made had only one effect: increased worker productivity. After months of tinkering, the work conditions were returned to the original state, and workers built more relays than they did in the exact same circumstances at the start of the experiment.

What was happening? Why was it that no matter what the Hawthorne plant managers did, the workers just performed better? Researchers puzzled over the results, and some still doubt the details of the experiment’s protocols. But the study gave rise to what’s known in sociology as the Hawthorne effect.

The gist of the idea is that people change their behavior—often for the better—when they are being observed (which is why it’s sometimes called the observer effect). Those workers at Western Electric didn’t build more relays because there was more or less light or because they had more or fewer breaks. The Hawthorne effect posits that they built more relays simply because they knew someone was keeping track of how many relays they built.

It can already be difficult to quantify the exact ROI of Business Intelligence, but imagine the potential increase in service quality from implementing a system in a call center that allows a company to study detailed metrics down to the person level.  According to the Hawthorne effect, performance may improve not just from optimizing the number of staff working at peak times or identifying call topics with long average call times that may indicate a need for additional training, but simply from the employees knowing that they can be more effectively measured and, in turn, held accountable.  How does one quantify that?

[Image: Wired magazine]

Very interesting stuff.  Wired is already my favorite magazine, and the focus of several articles (and the cover) of the latest issue is data and measurement, so I highly recommend it.

No Comments

“In God we trust; all others must bring data.” W. Edwards Deming

I came across that Deming quote in a book I’m currently reading, Competing on Analytics: The New Science of Winning.

I had never really heard of Deming, but he is certainly an interesting character, and several of his ideas about systems and management are easily applicable to Business Intelligence and Data Warehousing projects.  For instance, his 14 Points for Management, from Out of the Crisis:

  1. Create constancy of purpose toward improvement of product and service, with the aim to become competitive and stay in business, and to provide jobs.
  2. Adopt the new philosophy. We are in a new economic age. Western management must awaken to the challenge, must learn their responsibilities, and take on leadership for change.
  3. Cease dependence on inspection to achieve quality. Eliminate the need for inspection on a mass basis by building quality into the product in the first place.
  4. End the practice of awarding business on the basis of price tag. Instead, minimize total cost. Move towards a single supplier for any one item, on a long-term relationship of loyalty and trust.
  5. Improve constantly and forever the system of production and service, to improve quality and productivity, and thus constantly decrease cost.
  6. Institute training on the job.
  7. Institute leadership. The aim of supervision should be to help people and machines and gadgets to do a better job. Supervision of management is in need of overhaul, as well as supervision of production workers.
  8. Drive out fear, so that everyone may work effectively for the company.
  9. Break down barriers between departments. People in research, design, sales, and production must work as a team, to foresee problems of production and in use that may be encountered with the product or service.
  10. Eliminate slogans, exhortations, and targets for the work force asking for zero defects and new levels of productivity. Such exhortations only create adversarial relationships, as the bulk of the causes of low quality and low productivity belong to the system and thus lie beyond the power of the work force.
  11. a.) Eliminate work standards (quotas) on the factory floor. Substitute leadership. b.) Eliminate management by objective. Eliminate management by numbers, numerical goals. Substitute leadership.
  12. a.) Remove barriers that rob the hourly worker of his right to pride of workmanship. The responsibility of supervisors must be changed from sheer numbers to quality. b.) Remove barriers that rob people in management and in engineering of their right to pride of workmanship. This means, inter alia, abolishment of the annual or merit rating and of management by objective.
  13. Institute a vigorous program of education and self-improvement.
  14. Put everyone in the company to work to accomplish the transformation. The transformation is everyone’s work. “Massive training is required to instill the courage to break with tradition. Every activity and every job is a part of the process.”

[Image: W. Edwards Deming]

Check out his lengthy Wikipedia entry for more on his concepts and philosophies.

W. Edwards Deming

2 Comments

The book on sparklines

[Image: sparkline]

Sparklines, a term coined by Edward Tufte, are becoming increasingly popular in Business Intelligence software.  Some applications, like Excel (through various add-ins) and QlikView (starting in version 9.0), have the ability to make them, out of the box, while they can be created elsewhere, like Xcelsius, with a bit of creativity.

You’ve likely seen them before, but do you know when it is appropriate to use them?  They’re not to be thrown around just because all of the cool data visualization kids are using them.

The background, from Wikipedia:

The term ‘Sparkline’ was proposed by Edward Tufte for “small, high resolution graphics embedded in a context of words, numbers, images.” Tufte describes sparklines as “data-intense, design-simple, word-sized graphics”. Whereas the typical chart is designed to show as much data as possible, and is set off from the flow of text, sparklines are intended to be succinct, memorable, and located where they are discussed.

The clearest and most instructive examples, not surprisingly, can be found in one of Tufte’s books, Beautiful Evidence.

[Image: sparkline example from Beautiful Evidence]

Pictured components

  • Line representing the last n data points
  • Data point for most recent reading highlighted in red
  • Value of most recent reading in corresponding red type
  • Name of metric
  • Acceptable/normal range as gray, shaded area
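
For anyone who wants to play with those components outside of a BI tool, here is a rough text-only sketch in Python. It is my own approximation rather than Tufte's specification: the Unicode block characters stand in for the line, the function names and the margin % readings are made up, and the normal range is only used to flag the latest value since plain text cannot shade a band.

```python
from typing import Sequence, Tuple

# Eight block characters, from lowest to highest bar.
BLOCKS = "▁▂▃▄▅▆▇█"


def sparkline(values: Sequence[float]) -> str:
    """Map each value to a block character, scaled to the data's own range."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A flat series has no shape to show; draw it at mid-height.
        return BLOCKS[3] * len(values)
    top = len(BLOCKS) - 1
    return "".join(BLOCKS[round((v - lo) / span * top)] for v in values)


def sparkline_row(name: str, values: Sequence[float],
                  normal: Tuple[float, float], n: int = 12) -> str:
    """Assemble the listed components as a single line of text."""
    recent = list(values)[-n:]   # line for the last n data points
    latest = recent[-1]          # most recent reading
    low, high = normal           # acceptable/normal range
    flag = "" if low <= latest <= high else " (outside normal range)"
    return f"{name} {sparkline(recent)} {latest:g}{flag}"


# Hypothetical readings, for illustration only.
margin_pct = [24.1, 24.8, 25.3, 26.0, 25.1, 24.4, 23.9, 23.2, 22.8, 22.1, 21.7, 21.9]
print(sparkline_row("margin %", margin_pct, normal=(22.0, 27.0)))
```

The point of the sketch is the layout: metric name, trend, and latest value sit together on one line, with nothing else competing for attention.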

Another example of his incorporates lows and highs over the period represented:

[Image: sparkline with low and high values marked]

(Note that, while the horizontal axis is not labeled, the 12 months header indicates the time period being displayed.)

There isn’t a single pixel wasted on meaningless or redundant data, embodying Tufte’s data-ink ratio.  Another way in which he is practicing what he preaches is that all of the data related to each metric is in close proximity, not requiring repeated references to scattered information.  Of course, those are Tufte’s specs, and different BI companies and the people who have created custom sparkline components may choose to implement them differently.

If you’re looking for guidance on the best way to apply them in your applications, I like how Stephen Few succinctly puts it: “Think of them as an enhanced, much more informative substitute for the trend arrows that often appear on dashboards.”

For only marginally more space than a trend marker, sparklines provide significantly more information and paint a more complete picture than simple up/down or green/red indicators.  The lack of context surrounding trend indicators leaves open the possibility that a positive indicator represents a minuscule uptick at the end of a significant and long-term drop.  In other words, when you look at your dashboard for the day and see a green, up arrow for margin %, that means margin % has improved in the most recent period, while it could still be down for the week, month, quarter, or year (Few explains something similar on page 140 of Information Dashboard Design).
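
To put numbers on that, here is a small Python sketch with made-up daily margin % readings. The function names are mine, and the "arrow" is simply a comparison of the last two readings, which is an assumption about how a typical trend indicator is computed.

```python
def trend_arrow(values):
    """Classic dashboard indicator: compares only the last two readings."""
    if values[-1] > values[-2]:
        return "up"
    if values[-1] < values[-2]:
        return "down"
    return "flat"


def change_over(values, periods):
    """Change between the latest reading and the one `periods` steps earlier."""
    return values[-1] - values[-1 - periods]


# Hypothetical daily margin % readings: a long slide ending in a small uptick.
margin_pct = [31.0, 30.2, 29.1, 28.4, 27.0, 26.1, 25.3, 25.5]

print(trend_arrow(margin_pct))               # direction of the latest period only
print(round(change_over(margin_pct, 1), 1))  # change over the latest period
print(round(change_over(margin_pct, 7), 1))  # change over the whole window
```

The arrow reports "up" on the strength of a 0.2-point gain, while the full window shows a 5.5-point drop. A sparkline over the same eight readings would make the slide obvious at a glance.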

While the line obviously represents some period of time, the horizontal, dimensional axis is not labeled.  In fact, neither axis is.  The reason is that sparklines are meant to show trends and comparisons, not detailed values, like standard line graphs.  This helps explain why they are not a substitute for the standard line graph, which can more easily compare multiple dimensions or multiple measures with greater precision.

And don’t forget that the line chart is but one type of sparkline.  This image from Juice Analytics shows a catalog of examples from one Excel add-in (some of which are at least mildly objectionable, in my opinion):

[Image: sparkline gallery from Juice Analytics]

Finally, see this thread on Edward Tufte’s message board for the single longest conversation about sparklines since the dawn of time.

No Comments

The quotable Stephen Few

The nice thing about Stephen Few is that, as he is not beholden to any software companies, he can be blunt in his appraisals of the programs we know and love (and hate).  Here are a few gems:

“Just for fun, I decided to go all out and take advantage of the one other visual design option that Graphwise offers: the ability to put an image in the background of the graph, which they call a watermark. From the many pictures of animals, buildings, furniture, etc., I decided to dress up the arctic cool version of my graph by appropriately pairing it with a penguin.  I particularly like how I was able to make the penguin’s beak reach for the high value of 100,000. This might look cool (arctic cool, even), but it is an example of dysfunctionality at its worst.”  [In what respect is this venture wise?]

[Image: Graphwise graph with penguin watermark]

“Try to decipher the patterns and values in the following chart. Come on, give it your best shot. Even if I offered a cash prize to anyone who managed to come close, it wouldn’t be worth your effort to try, because you’d be forced to use the prize money to pay a doctor to fix the damage done to your eyes.” [Dysfunction at its finest]

[Image: step chart]

“…Here’s a radar chart that you could use to compare the performance of three products across eight years of time. Did you know that time is circular and that in the year 2007 we have returned to where we began in 1999? Despite this revelation, I’m finding it hard to relinquish my notion that time is linear and my desire to see this information in a simple line graph.” [Dysfunction at its finest]

[Image: radar chart]

“A vendor that claims to be the best, which this one unabashedly claims (just like every other major BI vendor), should be ashamed of selling such moronic products. Don’t reward them for irresponsible work—products that assume their customers are halfwits—by wasting your money on them.” [Fast track to nowhere]

“…Don’t insult the intelligence of the business intelligence community by gluing a carrot on the head of a goat and calling it a unicorn. That only works at carnivals for children and drunks.” [Newsflash: BI discovers the obvious]

This is not a knock on him; I’m a little jealous.  I think his books are fantastic (and have the new one on preorder), thoroughly enjoyed his keynote at the QlikView partner conference last year, and have no doubt of his objectivity.  He’s doing his job.  Mine is to communicate data effectively…even with some of the tools he is referencing in those quotes.  It is possible, even if it cannot be found in the vendors’ sales material or default visualization settings.

No Comments

Continuing education in Business Intelligence

I just read an interesting article on the BeyeNETWORK from Richard Herschel, a professor at St. Joseph’s University, about what a graduate Business Intelligence program might entail.  Here’s the outline (see the article for the full details):

  1. Introduction to Business Operations
  2. Developing Decision-Making Competencies
  3. Concepts and Practice of DSS Modeling
  4. Database Management Theory and Practice
  5. Enterprise Data
  6. Applied Business Intelligence
  7. Advanced Business Intelligence
  8. Critical Performance Measurement
  9. Advanced Business Intelligence II
  10. Management Issues in Business Intelligence

I don’t know about the rest of you, but for a long list of reasons, I will not be enrolling in graduate school anytime soon.  That said, there is a free initiative called the OpenCourseWare Consortium that allows you to access selected course material from several colleges and universities and study them on your own time.

An OpenCourseWare is a free and open digital publication of high quality educational materials, organized as courses. The OpenCourseWare Consortium is a collaboration of more than 200 higher education institutions and associated organizations from around the world creating a broad and deep body of open educational content using a shared model. The mission of the OpenCourseWare Consortium is to advance education and empower people worldwide through opencourseware.

It might be hard to argue that OCW would be a substitute for real classroom study, but they could come in handy if you need to acquire a certain skill for a new role or just want to brush up on something.  Among the available courses,

Here’s another course of interest from MIT, with the accompanying image from the course page:

Street-Fighting Mathematics
[Image: Street-Fighting Mathematics course image]

Amusingly, there is exactly one class available on OCW that I actually took in college: Ancient Wisdom and Modern Love (Philosophy 202).

No Comments

BI/DW and SaaS Stock Indexes

Rick Sherman of the Data Doghouse has compiled Google spreadsheets tracking Business Intelligence/Data Warehousing- and Software as a Service-related stocks.  Perhaps not surprisingly, they are faring better than the market as a whole.

(If you’re unfamiliar with the GoogleFinance functions, these spreadsheets actually refresh, unlike the ones on your desktop.)

I am not surprised for a few reasons.

  • Business Intelligence is frequently cited as the top technological initiative in organizations, regardless of ups and downs in the market
  • Well-run data warehousing implementations have high ROI and can help identify “found money”
  • Business Intelligence expedites decision-making and empowers decision-makers to find their own answers, both of which are critical in a volatile market
  • SaaS is ostensibly a way to get BI for your company without adding significantly to your staff or spending money to train your existing staff on new technologies
  • Perhaps there is even some anticipation of stimulus money making its way to these companies in the IT-related initiatives (enormous stimulus visualization from the Washington Post here)

Nice work, Rick.

No Comments

Mint.com: secretly exposing people to BI and Data Warehousing concepts

If you’re like me, explaining to friends, strangers, and your parents what you do for a living can be an arduous task.  (No, I don’t store different companies’ data in a warehouse somewhere, nor do I fix people’s computers.)  When I recently registered with Mint.com, however, I saw parallels that could make understanding what Business Intelligence and Data Warehousing are and how they benefit companies simpler, because what is Mint if not an online, personal finance data warehouse?

Staging disparate data in one place

What’s your net worth? Just add your checking and savings account balances, plus the values of any investments you have - 401(k), IRA, individual stocks - and subtract the current balances of your credit cards, mortgage, student loans, car loans, bookie, and any other debts you may have. That’s about a dozen web sites I would have to visit (and countless passwords to remember) to calculate my current net worth, and it’s actually easier for me, because I have gone to the trouble of setting up online access to those accounts. Otherwise, I would have to peruse dated, tree-killing statements to retrieve numbers that change daily. And as time-consuming as the task is, periodically repeating the process doesn’t make it any faster or easier.

Mint allows you to register all of your accounts in one place - checking and savings, investments, loans, and credit cards - and refresh the balances on demand, even keeping track of historical values. Any question you have about your finances can be answered almost instantly, for faster, better informed decisions.
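
The arithmetic itself is trivial once the balances live in one place, which is the whole point. A minimal Python sketch, with invented account names and balances (this is not how Mint is implemented, just the calculation described above):

```python
from datetime import date

# Hypothetical balances, in dollars; account names are illustrative only.
assets = {"checking": 2_400.00, "savings": 8_150.00, "401(k)": 31_200.00, "IRA": 12_750.00}
liabilities = {"credit card": 1_890.00, "student loan": 14_300.00, "car loan": 6_100.00}


def net_worth(assets, liabilities):
    """Sum of asset balances minus sum of outstanding debts."""
    return sum(assets.values()) - sum(liabilities.values())


# Saving a dated snapshot on every refresh is what preserves the history.
history = []
history.append((date(2009, 10, 1), net_worth(assets, liabilities)))
print(history[-1])
```

The hard part is never the subtraction; it is collecting a dozen balances reliably and repeatedly.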

For a company - a commercial bank, for instance - the equivalent would be tracking multiple lines of business: deposit accounts, loans, credit cards, wire transfers, etc. Without a data warehouse, they would be adding the bottom lines of each of them manually, in a spreadsheet, the same way you would calculate your net worth, despite the fact that they have a few billion more in assets. It’s a tedious process that is dangerously prone to error and not conducive to “drilling” into the data, i.e. conducting detailed analyses of numbers that stand out to the report recipients, because the output is anything but dynamic.

Data warehouses can contain a company’s data from multiple source systems, plus custom data, like goals and projections, staged in a single place and organized in a way that is conducive to getting the data back out in the form of reports or dashboards.  They keep historical data, and new rows from the source systems are added when the warehouse is refreshed, usually daily or weekly.  At that point, the reports and dashboards can be smoothly updated, as they sit on a predictable data structure.
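
As a toy illustration of that refresh cycle, here is a Python sketch using the standard library's sqlite3 module with an in-memory database. The table name, columns, and the "load anything with a higher transaction id" rule are my own simplifying assumptions; real warehouses use far more elaborate loading logic.

```python
import sqlite3

# In-memory database standing in for the warehouse; the schema is hypothetical.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE fact_transactions ("
    " source TEXT, txn_id INTEGER, txn_date TEXT, amount REAL,"
    " PRIMARY KEY (source, txn_id))"
)


def refresh(conn, source_name, source_rows):
    """Append only the rows not yet loaded; existing history is never overwritten."""
    (last_loaded,) = conn.execute(
        "SELECT COALESCE(MAX(txn_id), 0) FROM fact_transactions WHERE source = ?",
        (source_name,),
    ).fetchone()
    new_rows = [(source_name, i, d, a) for (i, d, a) in source_rows if i > last_loaded]
    conn.executemany("INSERT INTO fact_transactions VALUES (?, ?, ?, ?)", new_rows)
    conn.commit()
    return len(new_rows)


# Hypothetical extracts from one source system on two consecutive days.
day1 = [(1, "2009-10-01", 120.0), (2, "2009-10-01", -45.5)]
day2 = day1 + [(3, "2009-10-02", 300.0)]

loaded_day1 = refresh(warehouse, "deposits", day1)
loaded_day2 = refresh(warehouse, "deposits", day2)

# The report query never changes, because the structure underneath is predictable.
report = warehouse.execute(
    "SELECT txn_date, SUM(amount) FROM fact_transactions GROUP BY txn_date ORDER BY txn_date"
).fetchall()
print(loaded_day1, loaded_day2, report)
```

The second refresh picks up only the one new row, and the same report query works before and after.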

Data cleansing

How good is the data in Mint? It makes its best guess as to how to categorize your transactions, but if you want to use its budgeting capabilities, they had better be pretty accurate. For instance, you may think you have already exceeded your entertainment budget for the month, but then come to realize that your cable/internet bill was mistakenly classified in the Entertainment category.  Mint gives you the ability to reclassify that transaction and have that rule persist for future transactions with the same description.

[Image: Transaction Description]
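
Here is roughly what that "fix it once, remember it forever" behavior could look like in Python. Everything in it is hypothetical, including the keyword-based guess, the category names, and the merchant description; it only illustrates a saved rule taking precedence over an automatic guess.

```python
# Saved corrections, keyed by transaction description.
category_rules = {}


def guess_category(description):
    """Stand-in for an automatic best guess; here just a crude keyword match."""
    return "Entertainment" if "cable" in description.lower() else "Uncategorized"


def categorize(description):
    """A saved rule wins over the automatic guess."""
    return category_rules.get(description, guess_category(description))


def reclassify(transaction, new_category, remember=True):
    """Fix one transaction and, optionally, persist the correction as a rule."""
    transaction["category"] = new_category
    if remember:
        category_rules[transaction["description"]] = new_category


october = {"description": "ACME CABLE & INTERNET", "amount": 89.99}
october["category"] = categorize(october["description"])    # automatic guess
reclassify(october, "Bills & Utilities")                     # manual fix, saved as a rule

november = {"description": "ACME CABLE & INTERNET", "amount": 89.99}
november["category"] = categorize(november["description"])  # picks up the saved rule
print(october["category"], "|", november["category"])
```

Next month's bill lands in the right category without anyone touching it.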

For data warehouses, data quality is almost always problematic. Transactions need to have valid reference data so they can be properly classified, addresses in customer or property data must be validated, and other cleansing/special rules must be applied with data cleansing tools or freehand code. In practice, when creating a data warehouse, this can be a lengthy process, as all parties and departments must agree on data definitions, what is valid, how metrics are to be computed, and against what those metrics will be compared, e.g. last year, last month, goals, projections, etc.  And every department has a set of rules regarding their data that only they know - “these transactions don’t count towards the total, those transactions are treated differently” - like the ability in Mint to tag rows as reimbursable, tax-related, etc.  Fitting everything together within a predictable, repeatable, accepted framework is one of the most challenging aspects of data warehousing.  Remember that the processing will ultimately be done by computers, and a human cannot feasibly eyeball every transaction (not in any budgets I’ve seen).

Data visualization, Key Performance Indicators (KPIs), and alerts

Now that your data is all clean and organized, you can confidently look at it.

The main page is like a dashboard, showing your high-level balances, current amounts spent relative to targets, and any alerts you may have triggered for exceeding those targets.

A BI/DW term commonly used to describe these is KPIs. A metric is anything that is being measured (total dollars spent), while a Key Performance Indicator is a metric with the added context of being measured against a specific target (amount of budget spent).
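
If it helps to see the distinction in code, here is a small Python sketch. The class names, properties, and numbers are my own illustration of the definition above, not anything taken from Mint or a BI product.

```python
from dataclasses import dataclass


@dataclass
class Metric:
    """Anything being measured, e.g. total dollars spent."""
    name: str
    value: float


@dataclass
class KPI:
    """A metric plus the target that gives it context."""
    metric: Metric
    target: float

    @property
    def pct_of_target(self) -> float:
        return self.metric.value / self.target * 100

    @property
    def over_target(self) -> bool:
        return self.metric.value > self.target


# Hypothetical numbers for illustration.
dining = KPI(Metric("Dining spend", 260.0), target=200.0)
print(f"{dining.metric.name}: {dining.pct_of_target:.0f}% of budget, "
      f"over target: {dining.over_target}")
```

On its own, $260 of dining spend says very little. Against a $200 budget it says you are 30% over.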

[Image: Mint alerts]

A trending page has more spending details, as well as your spending history.  Oddly, the “trend” page largely utilizes pie charts – a poor choice for displaying trends (PDF).

The transactions tab, pictured in the Data Cleansing section of this post, contains the row-level data.

[Image: Mint trends page]

An investments page shows how your current holdings are tracking relative to several common indices. As you can see, I have been killing the market the last few months, only losing about 20% of the total value of my portfolio.

[Image: Mint investments page]

Some screenshots from Lifehacker

The functions will be very familiar to users of BI tools: a high-level dashboard, various visual representations to help identify trends and outliers, and the ability to access row-level transactions.  This is the realm where I spend much of my work time, personally.

A typical bank might have a dashboard containing high-level information about revenue, profit, average balances, and new/lost customers, measured against targets for the current year to date and quarter to date.  It may also have information about certain types of customers, regions, or services and products offered.  A mature data warehouse might even have information about that bank compared to competitors or the market, but getting feeds of that data can be difficult, just as getting people to agree on targets is.  Reports contain the granular data needed to investigate notable findings in the dashboard.  Depending on the tool, some reports are dynamic, allowing for drilling and other interactivity, while others are static.

A bit more on KPIs in both Mint and BI tools: in both, the user can set rules that, if violated, trigger alerts that notify the appropriate parties.  In Mint, if you exceed a budget, you can be notified via email or text message.  Similarly, most BI tools have the ability to send notifications (email is most common) if business rules are violated, thereby enabling better exception management.
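
The mechanics are simple enough to sketch in a few lines of Python. The rule names, thresholds, and email address below are invented, and `notify` is a stand-in for whatever delivery mechanism a given tool actually uses (email, text message, and so on).

```python
from typing import Callable, Dict, List, NamedTuple


class Rule(NamedTuple):
    name: str
    violated: Callable[[Dict[str, float]], bool]  # returns True when the rule is broken
    recipients: List[str]


def check_rules(snapshot, rules, notify):
    """Evaluate every rule against the current numbers; notify on each violation."""
    triggered = []
    for rule in rules:
        if rule.violated(snapshot):
            for recipient in rule.recipients:
                notify(recipient, f"Alert: {rule.name}")
            triggered.append(rule.name)
    return triggered


# Hypothetical rules, values, and addresses.
rules = [
    Rule("Entertainment budget exceeded",
         lambda s: s["entertainment_spent"] > s["entertainment_budget"],
         ["me@example.com"]),
    Rule("Checking balance below $500",
         lambda s: s["checking_balance"] < 500,
         ["me@example.com"]),
]
snapshot = {"entertainment_spent": 140.0, "entertainment_budget": 100.0,
            "checking_balance": 2400.0}

sent = []  # collects messages instead of actually sending them
triggered = check_rules(snapshot, rules, notify=lambda to, msg: sent.append((to, msg)))
print(triggered, sent)
```

Only the violated rule produces a message, which is the essence of exception management: nobody has to look at the numbers that are fine.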

Data moves in one direction

When Mint alerts you that a credit card bill is due, it does not provide you the functionality to pay it, and you can’t reallocate your portfolio through their interface upon viewing your atrocious recent investment performance.  You merely decide what to do by viewing their site, with no ability to manipulate the data.

Similarly, data in a warehouse only travels in one direction: from source systems into the warehouse, and then from the warehouse to various end user applications (reports, dashboards).  It is extremely uncommon that an end user would be able to modify the data that they are viewing, as would be the case with data stored in a spreadsheet.  The warehouse is the standard, and end users cannot write data back to it.  In short, a warehouse expedites decision-making, but it does not carry out the decisions themselves.

Mint can do more than I have enumerated here, as can a warehouse, and I’m sure I’ve missed some similarities, but they do have quite a bit in common.  Both ideally grant the user better, more accurate, more timely data, affecting behavior positively and leading to faster, better informed decisions.  One major area in which they part ways, however, is that Mint is free.

1 Comment