Article

Simple Data Visualization in Excel – Part One

A big part of the work that I do at AHEAD involves performance analysis. Using various tools (PERL, Python, Excel, manufacturer proprietary, etc.), I take what can be several gigabytes of raw performance data and distill it into something that is meaningful for a specific situation. While each case is unique, Excel always plays a critical role in visualizing the data. For this first blog post on the topic, I will cover two techniques that I use consistently – decimation and polynomial trend lines.

Decimation

When describing storage workloads, I like to create a chart that includes Input/Output Operations per Second (IOPS) over time with reads and writes represented by a stacked area graph on the primary axis and then a percent writes line over this on the secondary axis. The problem here is that showing percent writes at the same granularity as the IOPS data makes the chart cluttered and difficult to read.

Simple_Data_Vis_Image_1

Enter a simple technique called decimation which displays every Nth value in the data set to thin out the values in the percent writes line.

Consider the following table of values:

Simple_Data_Vis_Image_2

Using an IF function, a modulus (MOD) function, a ROW function and an integer for the decimation value gets us the following cell entry:

=IF(MOD(ROW()-ROW(AD$2),Decimation)=0,AD2,NA())

This formula first figures out the difference between the current row number and the first row number of the column of data that we would like to decimate. If this difference is divisible by the decimation value with no remainder (mod() = 0), then display the value of the source cell. If not, make the value #N/A.

Note that “Decimation” is a named cell that holds our decimation value. Check out the table using our formula and a decimation value of 10:

Simple_Data_Vis_Image_3

This meets our goal of displaying every Nth value (easily changed by modifying the decimation value). With a little formatting we get the following graph:

Simple_Data_Vis_Image_4

Polynomial Trend Lines

Some data sets have a lot of variability sample to sample, but have an overall trend that is meaningful. Many data sets in performance analysis are like this, including cache hit rates and processor utilization. Consider the following processor utilization chart:

Simple_Data_Vis_Image_5

Given the number of samples and the variability in the data, a linear trend usually doesn’t provide a good enough perspective. A polynomial trend line is curved and the degree to which it follows variations in the data is controlled by the order, i.e. the trend line reacts more to changes in the data as the order is increased. Overall, I find third to fifth order polynomial trend lines to work best with this type of data set.

First, right click the data set and select Add Trendline:

Simple_Data_Vis_Image_6

Select Polynomial and experiment with the order. In this case, I chose fourth order:

Simple_Data_Vis_Image_7

Once completed, here is the end result with the trend lines giving us a better perspective on the data:

Simple_Data_Vis_Image_8

What’s Next

In my next post, I will cover lookups and pivot tables to provide a performance view by application armed only with performance data. In the mean time, try out decimation and polynomial trend lines on your own data sets and let me know how things work out in the comments section below.