Afterword

ThimphuTech was the first technology blog in Bhutan. We started writing it in 2009, just as broadband and mobile internet started to take off. (Although internet service in Bhutan was launched in 1999, it was either super-slow or super-expensive, and was used by only a select few.)

In the blog, we wrote about technology and food, but also about plenty of other stuff. The blog became popular and influential in Bhutan. A companion bi-weekly column -- Ask Boaz -- was published for many years in Kuensel, Bhutan's national newspaper. (The complete Kuensel columns are available as an ebook, Blogging with Dragons.)

We stopped updating the blog when we left Bhutan in 2014, but the information within the posts can still prove useful, and thus we decided to keep it online.

We thank all our readers.
Tashi Delek,
Boaz & Galit.
Showing posts with label Data Visualization. Show all posts

Thursday, April 25, 2013

Visualize the NC elections results

This week's National Council election results were reported in one form or another by the media and by the Election Commission of Bhutan (ECB). Most of the results were reported one dzongkhag at a time, which made it difficult to see the overall picture. Using the numbers from the ECB's report (which were presented in one big table), I created two dashboards, each telling a story: one about voter participation and the other about the winner's share of votes.

Dashboard 1: Voter Participation by Dzongkhag

I computed the per-dzongkhag percentage of votes (= total votes for all candidates / registered voters). Some dzongkhags clearly stand out! To focus on a particular dzongkhag, click on the map, bar-chart or scatter plot.
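For readers who want to reproduce this kind of calculation, here is a minimal sketch in Python. The dzongkhag names and all figures are hypothetical placeholders (they are not the ECB's numbers), and the function name `turnout_percent` is just an illustrative choice.

```python
# Hypothetical figures for illustration only; these are NOT the ECB's numbers.
results = {
    "Dzongkhag A": {"registered": 10000, "votes": [2500, 1500, 1000]},
    "Dzongkhag B": {"registered": 8000, "votes": [3000, 2600]},
    "Dzongkhag C": {"registered": 5000, "votes": [1200, 800]},
}


def turnout_percent(registered, votes):
    """Total votes for all candidates as a percentage of registered voters."""
    return 100.0 * sum(votes) / registered


turnout = {
    name: turnout_percent(d["registered"], d["votes"])
    for name, d in results.items()
}

# Sort from highest to lowest so that unusual dzongkhags stand out.
for name, pct in sorted(turnout.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<12} {pct:5.1f}%")
```

Sorting before printing plays the same role as the sorted bar chart in the dashboard: the extremes end up at the top and bottom of the list.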


Dashboard 2: Winner's Share of Votes

The map shows the % of votes that the winner received out of the total number of votes in his dzongkhag. Some winners received a very large majority, while others did not. We'd expect the share to be affected by the number of candidates: the more candidates, the lower the winner's share. While this is the case in most dzongkhags, notice an interesting exception: Samtse's Sangay Khandu won almost 49% of the votes despite having the largest number of competing candidates. Also, in Dagana and Trashigang, each of which had a single contestant, the candidate received more than 90% "yes" votes, a much higher share than the winners received even in dzongkhags with two contestants.

These dashboards were created with the free Tableau Public tool. It would be great to see the media and other organizations create similar visualizations and interactive dashboards for the upcoming elections.

Sunday, April 14, 2013

Resist the temptation: avoid pie charts

With the upcoming elections, numbers can reveal important information. Unfortunately, our human brains are not that great at translating lots of numbers into a coherent story. That's why we need charts and tables to summarize and present the data for us in meaningful ways.

Yesterday's Kuensel article "Youth to the fore in Thimphu" told the story of the young candidates in Thimphu. The main visual in the article was a pie chart showing the number of eligible voters in Thimphu, broken down by gewog. The reason for presenting these numbers was to relate the chances of election to the number of registered eligible voters in that gewog ("...if the number of voters in gewogs is of any assurance [for getting elected]"). I was therefore curious to figure out the story that the chart was trying to tell. Here is what I saw:

Chart from Kuensel, April 13, 2013

The first questions that come to mind are: which are the gewogs with the highest numbers of eligible voters? The lowest? The difference between the highest and lowest? etc. We explained in many previous posts that pie charts are notoriously ineffective for conveying information and answering basic questions such as those above. 3D charts are even worse. Pie charts not only make it difficult to decipher information, but they can also lead to misinformation.

I once again diligently created a bar chart version to help us all better see the data. While doing this, I started wondering about the meaning of these numbers. How do they relate to the population size of the gewog? I went to NSB's website and grabbed 2011 gewog-wise population statistics for Thimphu. I then compared the eligible voter numbers to the population numbers. I also computed the eligible voters as a percentage of that gewog's population. Take a look at the dashboard below.

Not surprisingly, Thimphu Thromde has the largest number of eligible voters (those eligible to vote in Thimphu). But, relative to its population, it has the lowest % of eligible voters (8%). Maybe this is not surprising, because so many Thromde residents are not "native". The next two gewogs with the highest eligible voter numbers are Mewang and Kawang. If we compare to population numbers, the third largest gewog after Thromde and Mewang is... Chang! (Sort by the red bar chart or the table below it.) This means that Chang, although 3rd in terms of population size in the dzongkhag, has a relatively low number of eligible voters. Finally, sorting by % of eligible voters (green bars), we see the actual percentages in each gewog, with Naro leading (91% eligible voters) and then numbers declining more than ten-fold by the time we get to Thromde (8%).
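The comparison described above (raw counts versus share of population, each sorted) can be sketched in a few lines of Python. The gewog names and numbers below are made-up placeholders, not the NSB or Kuensel figures, and `eligible_share` is a hypothetical helper name.

```python
# Made-up placeholder data: (gewog, eligible voters, population).
gewogs = [
    ("Gewog A", 900, 1000),
    ("Gewog B", 4000, 50000),
    ("Gewog C", 1500, 3000),
]


def eligible_share(eligible, population):
    """Eligible voters as a percentage of the gewog's population."""
    return 100.0 * eligible / population


rows = [(name, elig, pop, eligible_share(elig, pop)) for name, elig, pop in gewogs]

# Two different sort orders tell two different stories.
by_count = sorted(rows, key=lambda r: r[1], reverse=True)  # raw eligible voters
by_share = sorted(rows, key=lambda r: r[3], reverse=True)  # % of population

for name, elig, pop, share in by_share:
    print(f"{name:<8} eligible={elig:>5} population={pop:>6} share={share:5.1f}%")
```

In this toy data the gewog with the most eligible voters ("Gewog B") has the lowest share, which mirrors the kind of reversal a sorted bar chart makes visible at a glance.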

Whether or not the actual voter numbers tell a more important story than their percentage of the gewog population, our main point is to highlight the power of bar charts (especially when sorted by bar length) and the inappropriateness of pie charts, not to mention 3D pie charts. Once again, we subscribe to visualization expert Stephen Few's motto "Save the Pies for Dessert".

Wednesday, March 27, 2013

Unfit data presentation

Yesterday's Kuensel article "Bhutanese are becoming fat and unfit" reported on the results of a health checkup of 1,580 people conducted on International Happiness Day. While the topic is extremely important, the article was confusing and somewhat misleading due to a lack of important information about the sample and ineffective communication of the results. Here are some issues that I had trouble figuring out:

1. Who are the "1,580 people screened for lifestyle related diseases in six hours" and how were they chosen? It is important to understand whether the sample is representative of "Bhutanese", as the article title claims. Are these people who visited the hospital in Thimphu? Are these Thimphu residents only?

2. The results of this checkup are shown in the top chart. But what is this colorful 3D pie chart telling us? Here's what it is not telling us, and therefore misleading the viewer:

  • The chart shows the results only for the 600 "unhealthy" people. The article mentions that 980 (62% of the sample) did not exhibit any of these symptoms.
  • The numbers on the pie slices are supposed to represent a percentage, not the number of people. A % sign would have made it clear.
  • But wait: the percentages in the pie chart add up to 100%. This means that each person in that pie has only one of the four symptoms. Of course, that is not true. Overweight and hypertension is a common combination, and so is obesity and diabetes. So the picture does not tell us the correct story.
3. Reporting of numbers requires telling a story that is meaningful. When we hear statistics, we try to make sense of them by comparing them to some benchmarks or to each other. So, sentences must be coherent and meaningful. For example, what does the following sentence tell you, and how does this information match the pie chart?
"Of the 600 unhealthy, 51 percent were men and 61 of them were diabetic."
4. A chart based on the (2008 or 2010?) GNH survey is attached right below the Happiness Day checkup pie chart. But I only understood that the two charts are from different data sources (and periods) after spotting the tiny text "(courtesy: GNH survey)" and reading the article carefully.

5. But wait: While the chart is trying to tell the story of obesity by age group, it conceals and distorts more than it reveals:

  • The age groups are ordered from young to old in the legend, but from old to young on the chart. Confusing! So those longer pink, green, and red noodles do not correspond to older folks, but rather to the younger ones. In this type of (highly unrecommended) chart it is already difficult to match age groups with obesity rates, so why make it even more confusing by reversing the order?
  • What do the numbers on the dial represent? The title reads "proportion", but these are more likely percentages.
  • It is difficult for our brains to compare the lengths of bending noodles. Try this: Which age group has a higher obesity rate, 46-50 or 61-65?
  • Are the 56-60 year olds extremely healthy compared to everyone else (with nearly 0 obesity rate) or maybe their data is unfit?
In short: A simple bar chart with straight bars, one for each age group, ordered from young to old, would have been a better choice. A larger annotation should also tell us that this chart is based on a different dataset from a different source and period.

Lastly, in order to show a trend ("becoming unfit"), I'd like to see numbers and charts showing the increase over the years.

Thursday, March 14, 2013

Two-day crash course: Effective Data Presentation (Mar 28-29)


Prof. Galit Shmueli will be conducting another round of the popular 2-day workshop Effective Data Presentation on March 28-29, 2013.

More and more data and statistics are being collected in Bhutan. They are communicated to the public through reports, presentations, newspaper articles and other media. Yet, current presentation practices are lacking in effectiveness.

Workshop participants will gain knowledge and experience with best practices for communicating data. We will use data from Bhutan and software that is accessible to many (Microsoft Excel and Tableau Public). The workshop is a must for those who collect, analyze or present data or statistics...

For full details and registration click here.

Wednesday, November 7, 2012

Trends over time: effective and misleading line charts

Chart from Kuenselonline.com, Nov 7, 2012
Today's Kuensel cover story "Good fences don't always make good neighbours" featured a series of three charts showing trends in cordyceps harvest, export and average price from 2004 to 2012. While line graphs are excellent for displaying trends over time, there are several guidelines that must be followed to avoid creating a misleading picture. The Kuensel charts suffer from the following issues:

  1. While the x-axis conveys years (2004 to 2012), the points are not equally spaced. For example, in the middle chart the distance between 2011 and 2012 is larger than between other pairs of neighboring years. A trend can appear much more (or less) dramatic if the time axis is not properly spaced.
  2. The three charts all use year on the x-axis. Most readers would expect the same placement of years on all charts, yet that is not the case here due to the over-stretched distance to 2012 in the Export chart.
  3. Values that appear on a line graph typically convey the y-axis value (harvest, export or average price). Including the year label just below these values is confusing and difficult to read.
  4. Using dashed lines for interpolating missing values works well. However, the extrapolation for Export in 2012 is suspect. Why is it assumed to be equal to the 2011 value?
  5. It is good practice to keep the same number of decimals for all years. We automatically use the length of the number to infer its size (longer=larger). If some values include a decimal and others do not, we mistakenly infer the longer number to be larger.
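Two of these guidelines (equal spacing of years, and a consistent number of decimals) are easy to enforce before the data ever reaches a charting tool. The sketch below is one possible way to do it in Python; the function names and the sample values are hypothetical and are not taken from the Kuensel charts.

```python
def fill_missing_years(series):
    """Return one (year, value) pair per year, with None marking a gap.

    Plotting every year, including the missing ones, keeps the time axis
    equally spaced.
    """
    first, last = min(series), max(series)
    return [(year, series.get(year)) for year in range(first, last + 1)]


def format_labels(values, decimals=1):
    """Format all values with the same number of decimals."""
    return [f"{v:.{decimals}f}" for v in values]


# Hypothetical series with one missing year.
harvest = {2004: 1.0, 2006: 2.5}
print(fill_missing_years(harvest))
print(format_labels([12.34, 9, 7.5]))
```

With gaps made explicit as `None`, the chart creator has to decide deliberately how to draw them (for example, with a dashed line) instead of silently squeezing or stretching the axis.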




I recreated the charts in Microsoft Excel using the numbers from the Kuensel charts. These charts avoid the above pitfalls -- see for yourself whether a different story emerges. 

Note that I also chose to overlay the harvest and export lines in the same chart, since they share the same units and have the same order of magnitude. Line graphs are powerful for comparing trends by overlaying multiple lines on the same chart (if the scales are different, we can use double y-axis scales or normalize all the series). 
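One common way to normalize series with different scales, so that they can share a single y-axis, is to index each series to its first value (first value = 100). This is only one of several possible normalizations, and the function name and numbers below are illustrative assumptions.

```python
def index_to_base(values, base=100.0):
    """Rescale a series so that its first value equals `base`.

    After rescaling, series measured in different units can be overlaid
    on one chart and compared by relative change.
    """
    first = values[0]
    if first == 0:
        raise ValueError("cannot index a series whose first value is zero")
    return [base * v / first for v in values]


# Hypothetical series in very different units.
quantity = [200, 300, 100]
price = [4, 5, 6]

print(index_to_base(quantity))
print(index_to_base(price))
```

The trade-off is that an indexed chart shows relative change only; readers who need the actual amounts still need the original values in a table or a second chart.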

The two sets of charts (left and right) differ only by the inclusion of values near the lines. Note that it is easier to compare trends when there are no numbers near the lines. However, if you must include values, make sure that those are only the y-axis values, and that they have the same number of decimals. 


More effective use of line charts for conveying the cordyceps trends. The trends in the charts on the left are easier to grasp, compared to the right charts, which are identical but also include values.

Monday, October 22, 2012

Visually beautiful, but scientifically ineffective

Congratulations to the new Raven magazine that was launched this month. The first issue is beautifully designed. Kudos also for reporting on a scientific survey about Pedestrian Day opinions. The reporter clearly explained issues with the previous surveys (which we wrote about in an earlier post). There is also an attempt to use charts for telling the story, and I am a big fan of charts as storytellers. However, creating effective charts requires more than Excel skills.

With all this good work, the last mile is to replace the current ineffective and confusing charts with effective ones. Below are some charts from the article on Pedestrian Day. Start from this chart. What does it teach us?


The pie chart is trying to convey the "percentage of respondents wanting Ped Day to be discontinued". A main feature of a pie chart is that its parts should add up to 100%. Yet, the percentages on the slices here do not add up to 100%, causing confusion. A careful look uncovers that the person creating the chart was trying to convey the percentage separately for each sector. If that's the case, then the pie chart is not set up correctly! You would need separate pies for separate sectors (civil servants, H-wives,...), and then each pie would have a breakdown of that sector's opinion of Ped Day. Using a pie chart incorrectly is very misleading.
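A quick sanity check before drawing any part-of-whole chart is to verify that the percentages really do add up to 100%. Here is a minimal sketch in Python; the function name, the tolerance for rounding error, and the sample numbers are all assumptions for illustration.

```python
def is_part_of_whole(percentages, tolerance=0.5):
    """True if the values sum to 100%, allowing a little rounding error."""
    return abs(sum(percentages) - 100.0) <= tolerance


# A genuine breakdown of one group: the parts sum to 100.
print(is_part_of_whole([40, 35, 25]))

# Separate percentages for separate groups: not a breakdown of one whole.
print(is_part_of_whole([60, 45, 70]))
```

If the check fails, the numbers describe several groups rather than slices of one whole, and a bar chart with one bar per group is the natural choice.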

But even a correctly used pie chart is known to be ineffective at best, and misleading at worst. Our brain is not good at comparing sizes of pizza slices. With 3D pizza slices, we are even worse off! The best solution: use a bar chart.

Here's a simple bar chart of the same data, using Excel. See if you get a better idea of responses to this survey question by different sectors (by the way, might a truck driver also be "Business" or "Pvt Employee"?)



While we're on the subject of 3D charts, you can ruin a perfectly effective bar chart by turning it into a 3D bar chart, as you can see in the following charts (from the same article). I included the original 3D chart alongside an ordinary bar chart of the same data, which I created in Excel. Not only is there no need for the third dimension (it does not represent any information in these cases), but it makes the comparison between bars much more difficult on our brain.

Left: original 3D chart. Right: ordinary bar chart for the same data

Left: original 3D chart. Right: ordinary bar chart for the same data


Creating effective charts is not rocket science and does not require expensive software. It does require learning key principles and implementing them meticulously. The media should invest in training editors and reporters in effective data presentation -- not only creating effective charts, but also being able to critically assess charts presented by their sources.

Saturday, August 25, 2012

Infographics: the good, the bad, and the ugly

Infographics refers to graphic visual representations of information and data. Local newspapers, magazines, blogs and other media in Bhutan have started jumping on the infographics bandwagon, and in particular using charts to convey a story based on data. Charts are a great way to capture the reader's attention, and if done properly, can convey an easily understandable story. As the saying goes, "a picture is worth a thousand words".

Creating an effective chart, however, is not simple and requires understanding the basics of human perception as well as charting basics. A 'bad' chart is one that does not tell the story and is not understandable. An even worse chart ('the ugly') is one that conveys a wrong story. Here are some examples from the last week. Try to figure out the story that each of these charts is telling:

Wednesday, July 18, 2012

"Effective Data Presentation" workshop offered again: July 26-27, 2012

More and more data and statistics are being collected in Bhutan. They are communicated to the public through reports, presentations, newspaper articles and other media. Yet, current presentation practices are lacking in effectiveness.

Effective Data Presentation is a 2-day workshop (Jul 26-27, 2012) by Professor Galit Shmueli. Participants will gain knowledge and experience with best practices for communicating data. We will use data from Bhutan and software that is accessible to many (Microsoft Excel and Tableau Public).

The workshop is a must for those who collect, analyze or present data or statistics.

For full details and registration click here.

Tuesday, May 22, 2012

Data Journalism: good, free resources

Free for download and sharing
A recent advance in global journalism is "Data Journalism", which focuses on telling a story in a more compelling way, basing it on numbers and presenting the data effectively.

The new, free and online book Data Journalism Handbook provides a starting point and reference for journalists who want to base their stories on data and present them in a compelling way. The book was assembled by many journalists from top news agencies around the world, including BBC, The New York Times, The Guardian, Financial Times and Deutsche Welle.
The handbook defines and illustrates data journalism in different ways, including: "Data journalism can help a journalist tell a complex story through engaging infographics."

Another interesting introduction to Data Journalism can be found in the series of free videos Journalism in the Age of Data: A Video Report on Data Visualization (also available in non-Flash and Podcast versions).




Sunday, May 13, 2012

Chart Crunch Continues: say goodbye to pie charts

The Bhutanese newspaper has been presenting front-page articles on the Rupee crunch, presenting different figures to tell a story. They are to be commended for using charts to present the data. Yet, there is much to be desired in terms of the choice of charts. Yesterday's front-page article "How private consumption and credit caused the Rupee crisis" featured the following chart for showing the different sources of private consumption Rupee usage:

From www.thebhutanese.bt (May 12, 2012 issue)

In an earlier post, we discussed why pie charts are ineffective for presenting data. We also showed that a bar chart is a much better tool for presenting counts and percentages. This particular pie chart suffers from an additional ailment: it uses 3D. In other words, instead of thin pizza slices we're looking at thick cake slices! While this might build up an appetite, note that the extra dimension does not represent any information. Even worse, it distorts our perception of size. You can see this by trying to figure out which consumption item contributes the most. While the purple slice in the front looks largest, it in fact represents the second largest item! (Can you find the first?) If an important decision (such as a ban) were made based on this misperception, it would be quite tragic.

Other questions are also difficult and time-consuming to answer with a pie chart. For example:

  • Which consumption item contributes the least?
  • How much does Transport contribute? How does this compare to Clothing & Footwear?
Now let's look at an effective chart for conveying the same information. It is not fancy or colorful, but it doesn't require much thinking as the facts just "pop out":

Sources of private consumption and credit

We see that the first two items are similar in their contributions and high compared to the others, then a drop to 10% for Clothing & Footwear, and lastly the other items contribute between 2%-6% each, with Alcoholic Beverages, Tobacco & Narcotics contributing the least.
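The idea of a plain, sorted bar chart does not depend on any particular software. As a rough illustration, here is a small Python sketch that prints a sorted text bar chart; the item names and percentages are hypothetical placeholders (not the figures from The Bhutanese), and `text_bar_chart` is an illustrative name.

```python
def text_bar_chart(data, width=40):
    """Return text lines, one bar per item, sorted from largest to smallest."""
    largest = max(data.values())
    label_width = max(len(label) for label in data)
    lines = []
    for label, value in sorted(data.items(), key=lambda kv: kv[1], reverse=True):
        # Bar length is proportional to the value; the largest gets `width`.
        bar = "#" * round(width * value / largest)
        lines.append(f"{label:<{label_width}} {bar} {value}%")
    return lines


# Hypothetical shares for illustration only.
shares = {"Item A": 10, "Item B": 40, "Item C": 20}
for line in text_bar_chart(shares, width=8):
    print(line)
```

Because the bars share a common baseline and are sorted by length, the largest and smallest items are found immediately, with no colors or 3D effects needed.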

The bottom line: forget pie charts and forget 3D. Counts and percentages are best conveyed with simple bar charts (you can find a few more examples on this page -- click on a pie chart to see a better bar chart alternative).

Using charts in the media is very important. It catches the reader's eye and can help summarize the story in one look ("a picture is worth a thousand words"). However, it is crucial that chart creators acquire the basic knowledge in creating effective charts. It's not rocket science, yet it makes a huge difference.

Friday, April 27, 2012

Weather forecasts in the media: hot or cold?

Several local newspapers have weather forecasts in their online and print versions. While forecasts from different sources can differ (after all, these are uncertain forecasts), I found a few perplexing issues when examining forecasts from the two oldest papers: Kuensel and Bhutan Observer.

Kuensel takes a safety measure of giving high/low temperature and outlook forecasts only for "today". The first perplexing fact is the discrepancy between the print and online forecasts. Here's an example from today:
Kuensel's weather forecast. Left: print edition. Right: Kuenselonline.com (accessed 27 April, 2012, 4pm)

Turns out that the Kuensel website gets updated in the evening, so in fact their website does "back-casting" of yesterday's weather... The next strange discrepancy is between the names of the Dzongkhags as they appear in print and online (not to mention the different ordering). Third, I couldn't find the source for the forecasts. Where are they coming from? Finally, it would be helpful if the graphics in both modes used the same icons, order of columns, column names, and design.

Bhutan Observer has a Weekly Weather corner in its print and online editions. It offers more detail than Kuensel in terms of listing all the 20 Dzongkhags and provides forecasts for an entire week. However, unlike Kuensel which also reports high/low temperature forecasts, BO's graphic table tells us only the "outlook", that is, whether a day is likely to be of one of six types: mostly sunny skies, possibility of rain, cloudy, partly cloudy, partly sunny, or partly cloudy chances of snow/sleet. This choice of information is a bit unusual. I can see how it is useful for determining whether an umbrella, hat or boots are needed. It can also help guess the chances of flights landing in Paro.

And now to the data presentation: Is it easy to quickly figure out the weather information that a reader would be interested in? Here are some questions that a reader might ask:
Weekly corner from Bhutan Observer's website

  1. What is the forecast for Tuesday in Chukha?
  2. How many different weather outlooks are forecast across Bhutan in the coming week?
  3. How many "partly sunny" days anywhere in Bhutan? (good for trekking?)
In trying to answer these questions you probably noticed a few challenges:
  • Dzongkhag names are not listed alphabetically! With 20 names, searching for Chukha takes quite some time. 
  • Distinguishing between the icon images is difficult because they seem similar. A simple trick that would have made it much easier on our eye is using different colours. On the website there is no reason to use greyscale. And if colour-printing is too expensive, then using a more distinctive greyscale can work.
  • Given that there are only two different icons in this week's table, do we really need to see all six possibilities in the bottom legend? Removing unused icons from the legend would make the reader's task much easier.
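Two of these fixes (alphabetical rows and a legend limited to the icons actually used) are simple to automate. The Python sketch below shows one way to do it; the dzongkhag names and the three-day forecast are made-up placeholders, and the function names are hypothetical.

```python
# Hypothetical three-day forecast; names and outlooks are placeholders.
forecast = {
    "Dzongkhag C": ["cloudy", "cloudy", "possibility of rain"],
    "Dzongkhag A": ["cloudy", "possibility of rain", "cloudy"],
    "Dzongkhag B": ["possibility of rain", "cloudy", "cloudy"],
}


def rows_alphabetical(table):
    """Rows sorted by name, so a reader can scan straight to a dzongkhag."""
    return sorted(table.items())


def used_outlooks(table):
    """Only the outlook types that appear in the table, for a short legend."""
    return sorted({outlook for days in table.values() for outlook in days})


for name, days in rows_alphabetical(forecast):
    print(f"{name:<12} {' | '.join(days)}")
print("Legend:", ", ".join(used_outlooks(forecast)))
```

Building the legend from the data means it never lists an icon that the reader will not find in this week's table.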
Lastly, one should always ask about the data source. Here the source is listed clearly at the top as "Contributed by the Metrological Department of Bhutan". I suppose this is the Hydro-Meteorological Services Division at the Department of Energy?

Monday, April 23, 2012

Telling a story effectively with charts

An article in this week's The Bhutanese showed data supporting "a study by the Bhutan Chamber of Commerce and Industry [which] shows that government expenditure and government projects are the main cause behind the rupee shortage." The following line graph was used to show the data (unfortunately, images are unavailable on the newspaper's website). The chart compares government spending with three other expenditures from (I think) 1995 to July 2011.


While a line graph is an effective way to compare trends, there are a few issues that must be kept in mind to avoid confusing the reader. Let's look at this chart carefully: First, the year labels are confusing. Second, are lines connecting annual numbers? monthly numbers? or perhaps some other aggregation? Third, the choice of line colors is quite hard to read, especially on the grey background. Try to follow the Rupee Reserves line. And fourth, the goal of a chart is to highlight the information for our story, not to dazzle us with color and bling. While the four lines and the legend are information, all those horizontal gridlines and background shading are distracting non-information.

These are just some guidelines for producing effective charts. Unfortunately, I do not have access to the data so I cannot produce an "improved" chart that more clearly communicates the message. Creating good charts is crucial for communicating news stories and convincing the audience. To learn more and improve your skills, you are welcome to join the Effective Data Presentation workshop.

Saturday, April 21, 2012

Seeing the Rupee crunch: Chart or Table?

Today's Kuensel article "Wholesalers see the silver lining" discussed the surge in sales by the Food Corporation of Bhutan following the Rupee Crunch. Here's the table they provided for comparing FCB's sales volumes and amounts in March 2011 and March 2012. Try to figure out which food product saw the largest increase in sales. In volume? And the lowest?

Image from Kuensel newspaper, April 21, 2012


Let's try a chart of the same data. Again, let's try to figure out which food product saw the largest increase in sales? in volume? And the lowest? 
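The quantity that such a chart makes visible is the percent change between the two periods. As a rough sketch, here is how it could be computed and ranked in Python; the product names and amounts are hypothetical placeholders (not FCB's figures), and `percent_change` is an illustrative helper.

```python
def percent_change(before, after):
    """Change from `before` to `after`, as a percentage of `before`."""
    if before == 0:
        raise ValueError("percent change is undefined when the baseline is zero")
    return 100.0 * (after - before) / before


# Hypothetical sales amounts: (earlier period, later period).
sales = {
    "Product A": (200, 300),
    "Product B": (100, 250),
    "Product C": (400, 380),
}

changes = {name: percent_change(b, a) for name, (b, a) in sales.items()}

# Rank from largest increase to largest decrease.
ranked = sorted(changes.items(), key=lambda kv: kv[1], reverse=True)
for name, pct in ranked:
    print(f"{name:<10} {pct:+6.1f}%")
```

A table of raw before/after amounts forces the reader to do this arithmetic mentally for every row, which is why the questions above are hard to answer from the table alone.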


In this case, a chart is more suited for telling the story than a table. Why is that? When would a table be better? And what are best practices of presenting data in a table? We'll discuss all these and more in the upcoming two-day workshop Effective Data Presentation.

Wednesday, April 18, 2012

Rupee Crunch or Chart Crunch?

Today's Kuensel front page article "Self-sufficiency through commercialization" presents a large chart displaying the breakdown of different imported food product categories in 2011. While the chart is colorful and eye-catching, it does not convey the data or the story effectively. Why do I claim that it is ineffective? Try answering the following questions:

  1. Which food category was imported the least in 2011? (in terms of Rs.)
  2. Which was the second highest import category?
  3. Sugar imports differed from Vegetables, fruits and nuts by what amount?

How many answers did you get right the first time? How long did it take to answer each of these questions? These are measures of effectiveness.







Now try answering the same questions with the following chart:

What makes the chart that I created more effective than Kuensel's fancy chart? Here are a few differences:
  • Pie charts are ineffective charts, because our brain and eye can easily compare differences in length or height, but not slices of pizza.
  • The "fancy" chart attracts too much attention to non-information compared to information, while the effective chart is focused on the information. Effects such as 3D, a plate with texture in the background, lots of color are all distractions from the information about amount of import of different food product categories.
  • Are the decimal digits really needed in Kuensel's chart? (118.43M or just 118M?) And if so, then one or two decimals?

These are just a few issues that need to be considered when communicating data. The NY Times, Washington Post and other top media houses have already embraced effective visualization, as well as online sharing of charts and dashboards. The upcoming 2-day workshop Effective Data Presentation is a great way to learn about such practices and to gain experience in creating effective charts and interactive dashboards. Journalists and other media people are encouraged to join!

Friday, October 21, 2011

Online course "Interactive Data Visualization" (Opens next Friday)

I'll be instructing an online course on Interactive Data Visualization, starting Oct 28.

This course is about the interactive exploration of data, and how it is achieved using state-of-the-art data visualization software. Participants will learn to explore a range of different data types and structures. They will learn about various interactive techniques for manipulating and examining the data and producing effective visualizations.

This is a very practical and hands-on course. We use real data and discuss practical issues.

This is also a great opportunity to interact with professionals worldwide (through an online discussion board) and to take advantage of online learning for expanding your knowledge. The course is completely asynchronous -- no need to be online at a certain time.

Who can benefit from this course? Managers, journalists, researchers and others who handle and present data. Feel free to email me for more information and for the special terms for Drukpas.