Becoming a legend by removing legends

Pertinent observations from a re-reading of Eugene Wei's classic post on Amazon and data visualisation

Sep 13, 2024

The more perceptive of you will know that the title of this post is a reference to Eugene Wei’s classic “remove the legend to become one”. I have lost count of how many times I have read it, but each reading turns up new insights about analytics, business, data interpretation, visualisation and related areas.

If you are remotely interested in any of the above topics (or even in Amazon’s history), you should read the article in full. In any case, here are my pertinent observations from my current reading of the post (I’m sure that each time I read it, the set of observations / insights I draw will be different).

On CFOs

Recently, a friend who has worked closely with CFOs told me that “there are three kinds of CFOs. A vast majority are just controllers, interested in exact numbers and keeping down costs. Then there are those who are good at fund raising and other investor relations. Only a small number, the best, are good at making financial decisions that truly make sense for the business”.

From this blogpost, it looks like Amazon, in its early days, had one of the last kind:

I'm convinced that because Joy [Covey] knew every part of our business as well or better than almost anyone running them, she was one of those rare CFO's that can play offense in addition to defense. Almost every other CFO I've met hews close to the stereotype; always reigning in spending, urging more fiscal conservatism, casting a skeptical eye on any bold financial transactions. Joy could do that better than the next CFO, but when appropriate she would urge us to spend more with a zeal that matched Jeff's. She, like many visionary CEO's, knew that sometimes the best defense is offense, especially when it comes to internet markets, with their pockets of winner-take-all contests, first mover advantages, and network effects.

Tuftefying Graphs

Edward Tufte is a legend when it comes to data visualisation. I have copies of each of his books on the topic. And unsurprisingly, going by Eugene Wei’s account, he seems to have had a major impact on how data is presented in meetings at Amazon.

The blogpost has a detailed section where Wei takes some toy data to produce a graph according to Excel defaults, and then beautifies it according to Tufte’s principles. The title of the post (about “removing the legend”) comes from here.

The problem with a legend is that it asks the user to bounce their eyes back and forth from the graph to the legend, over and over, trying to hold what is usually some color coding system in their short-term memory.
Look at the chart above. Every time I have to see which line is which country, I have look down to the legend and then back to the graph. If I decide to compare any two data series, I have to look back down and memorize two colors, then look back at the chart.

It is interesting that even in 2024, not many of the popular charting packages offer intuitive ways to add data labels directly to graphs, rather than using legends.
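
As a rough illustration of what direct labelling takes in practice, here is a minimal sketch in Python. It assumes matplotlib as the charting package, and the function names (`end_labels`, `plot_direct_labels`) are made up for this example; it is not taken from Wei’s post. Each series is named at its most recent data point, in the colour of its line, and no legend is drawn.

```python
from typing import Dict, List, Sequence, Tuple


def end_labels(
    x: Sequence[float], series: Dict[str, Sequence[float]]
) -> List[Tuple[str, float, float]]:
    """Return (name, x, y) for the most recent point of each series."""
    return [(name, x[len(ys) - 1], ys[-1]) for name, ys in series.items()]


def plot_direct_labels(x, series):
    """Plot one line per series and name each line at its last point."""
    # Imported here so end_labels() can be used without matplotlib installed.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    colours = {}
    for name, ys in series.items():
        (line,) = ax.plot(x[: len(ys)], ys)
        colours[name] = line.get_color()

    # Put each label just to the right of the line's last point,
    # in the same colour as the line, instead of building a legend.
    for name, last_x, last_y in end_labels(x, series):
        ax.annotate(
            name,
            xy=(last_x, last_y),
            xytext=(6, 0),
            textcoords="offset points",
            va="center",
            color=colours[name],
        )

    ax.margins(x=0.15)  # leave room on the right for the labels
    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)
    return fig
```

Calling `plot_direct_labels([2021, 2022, 2023], {"Series A": [3, 4, 6], "Series B": [5, 5, 4]})` with any toy data returns a figure that can be saved with `savefig`. The sketch does nothing about labels that collide when two lines end close together, which is the fiddly part that still needs manual nudging.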

CFOs’ need for detail

Back to Joy Covey, there is this interesting tidbit on why data labels are marked directly on the graphs in Amazon’s standard formats. It has to do with her need for precise numbers. In fact, this is something I’ve faced as well in my interactions with finance leaders - they care a LOT about precision, so you had better label the data points directly.

At some point, no set of principles is one size fits all, and as the communicator you have to make some subjective judgments. For example, at Amazon, I knew that Joy wanted to see the data values marked on the graph, whenever they could be displayed. She was that detail-oriented. Once I included data values, gridlines were repetitive, and y-axis labels could be reduced in number as well.

Related to this - a few people have told me that we should target CFOs’ offices for Babbage Insight. My main worry about trying to sell to CFOs is their need for accuracy - what do you suggest?

Direct labelling of interesting happenings

Last month, while writing about offering Babbage as a service, I wrote:

Think about your current analytics workflow - right now you either have highly rigid repeatable metric calculations, which appear in your dashboard; or entirely bespoke analyses, for which you need to rely on your analytics team.
It doesn’t all have to be this way - a lot of the currently “bespoke” questions answered by the analytics team are predictable, if only one were to look at the data in the dashboard. Technology available in 2024 means that this is actually a repeatable process, one that can be done by an AI data analyst - or a pair of bots that together function as an AI data analyst.

Whenever a graph shows an interesting story, it is natural for the viewer to ask what happened. The current workflow is to ask an analytics team, or to dig deep into some dashboard, to find out. Instead, if the story were available right there on the graph, it would make for significantly superior communication.

Wei writes:

For sharp changes, like an anomalous reversal in the slope of a line graph, I often inserted a note directly on the graph, to anticipate and head off any viewer questions. For example, in the graph above, if fewer data series were included, but Greece remained, one might wish to explain the decline in health expenditures starting in 2008 by adding a note in the plot area near that data point, noting the beginning of the Greek financial crisis (I don't know if that's the actual cause, but whatever the reason or theory, I'd place it there).

And includes this helpful example:
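
Separately from Wei’s chart, the mechanics of such a note are easy to sketch. The following is a minimal Python illustration, assuming matplotlib; the turning-point rule in `slope_reversals` is a simple assumption made for this example, not Wei’s method or a description of how Babbage works, and the note text still has to come from someone (or something) that knows the story.

```python
from typing import Dict, List, Sequence


def slope_reversals(ys: Sequence[float], min_step: float = 0.0) -> List[int]:
    """Indices where the line turns around.

    A point counts as a turn when the step into it and the step out of it
    have opposite signs and both steps are larger than min_step in size.
    """
    turns = []
    for i in range(1, len(ys) - 1):
        before = ys[i] - ys[i - 1]
        after = ys[i + 1] - ys[i]
        if before * after < 0 and min(abs(before), abs(after)) > min_step:
            turns.append(i)
    return turns


def plot_with_notes(x, ys, notes: Dict[int, str]):
    """Plot a line and write each note next to the point it explains.

    notes maps an index into ys to the text shown at that point.
    """
    # Imported here so slope_reversals() can be used without matplotlib.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(x, ys)
    for i, text in notes.items():
        ax.annotate(
            text,
            xy=(x[i], ys[i]),
            xytext=(0, 20),
            textcoords="offset points",
            ha="center",
            arrowprops={"arrowstyle": "-"},
        )
    return fig
```

With made-up values such as `[1, 2, 3, 4, 3, 2]`, `slope_reversals` flags index 3, and a call like `plot_with_notes(x, ys, {3: "Explanation goes here"})` places the note above that point. Raising `min_step` filters out small wiggles so that only sharp changes get flagged.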

Charts beget charts

I’ve written too many times here about why companies get lost in a deluge of data - you notice something interesting this month based on some bespoke analysis, and want that number to be always available, and you get it added to the dashboard. Soon the dashboard bloats up.

Wei writes that “some charts beget further charts”. The job of an intelligent analyst is to dynamically figure out which of these further charts are interesting, and include precisely those. And this is the kind of intelligent analyst we’re building at Babbage.

Line charts are the OG

I’ve said this before - it is a pity that when they teach us graphing in school (I was ~11 when I first learnt it), they start with bar graphs. And soon graduate to that monstrosity - pie charts. From a business perspective, if we were to decide that Babbage is only going to produce one kind of visualisation, it would be the line chart. There are no two ways about it.

Wei writes:

A good line graph is a fusion of right and left brain, of literacy and numeracy. Just numbers alone aren't enough to explain the truth, but accurate numbers, represented truthfully, are a check on our anecdotal excesses, confirmation biases, tribal affiliations.

And he has a nice checklist of best practices for line graphs (too long to copy here, so I’ll include just a few items).

Don't include a legend; instead, label data series directly in the plot area. Usually labels to the right of the most recent data point are best. Some people argue that a legend is okay if you have more than one data series. My belief is that they're never needed on any well-constructed line graph.
Use thousands comma separators to make large figures easier to read
Related to that, never include more precision than is needed in data labels. For example, Excel often chooses two decimal places for currency formats, but most line graphs don't need that, and often you can round to 000's or millions to reduce data label size. If you're measuring figures in the billions and trillions, we don't need to see all those zeroes, in fact it makes it harder to read.
Format axis labels to match the format of the figures being measured; if it's US dollars, for example, format the labels as currency.
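
The formatting items in this checklist (comma separators, no excess precision, scaling to millions or billions, currency formatting) can be rolled into one small helper. This is a minimal Python sketch under stated assumptions: the name `format_label`, the M/B/T suffixes and the default of zero decimal places are choices made for this example, not something prescribed in Wei’s post.

```python
def format_label(value: float, currency: str = "", decimals: int = 0) -> str:
    """Build a compact data label.

    Large figures are scaled to millions (M), billions (B) or trillions (T);
    everything else gets thousands comma separators. An optional currency
    symbol is placed in front of the number.
    """
    sign = "-" if value < 0 else ""
    size = abs(value)
    for threshold, suffix in ((1e12, "T"), (1e9, "B"), (1e6, "M")):
        if size >= threshold:
            return f"{sign}{currency}{size / threshold:,.{decimals}f}{suffix}"
    return f"{sign}{currency}{size:,.{decimals}f}"


print(format_label(45000))                 # 45,000
print(format_label(1234567, decimals=1))   # 1.2M
print(format_label(2_500_000_000, "$", 1)) # $2.5B
print(format_label(-1500, "$"))            # -$1,500
```

If the chart is drawn with matplotlib, the same helper can be reused for axis labels through `matplotlib.ticker.FuncFormatter`, for example `ax.yaxis.set_major_formatter(FuncFormatter(lambda v, pos: format_label(v, "$")))`, so that the axis and the data labels share one format.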

In closing

Today’s reading of “remove the legend to become one” was prompted by this article by Cedric Chin on Amazon’s WBR process, which links to Wei’s post. Chin’s article is interesting as well, though it keeps talking about the “612 graph”, and a number of Amazonians (and ex-Amazonians) I’ve asked haven’t even heard of the concept!

Anyway, here is what a 612 graph looks like. Sadly, despite all of Eugene Wei’s work, this includes a legend:

Maybe that’s why none of my Amazonian friends know it - it’s simply not legendary.

Art of Data Science
