Seeing Patterns in Random Data

by | Dec 14, 2011 | Blog

What we are after is consistent data connections for our customers and clients. Below is one way to help quantify whether your Wireless LAN is giving your clients consistent results. I know not everyone enjoys statistics, but sometimes a little massaging of the data (in this case, sorting it first) helps you see patterns, and therefore information, in it. Rather than take a single sample of data throughput, take a bunch. In this case I took 25 samples; the more the better. Now you have more than a single snapshot in time: a set of data points that we can learn much more from than any single point.

When looking at collected data, it can seem quite random in nature, and folks can make mistakes analyzing it. One method we use to help ‘clean up’ this random-looking data is to first sort the collected data from high to low, then graph it by percentage of samples. This lets us see the differences between data sets graphically.
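Here is a minimal sketch of that sorting step in Python. The function name `sorted_percent_curve` and the sample values are made up for illustration; they are not the measurements behind the graphs in this post. Each value is paired with the percentage of samples that meet or exceed it, which is what goes on the bottom axis of the graph.

```python
def sorted_percent_curve(samples):
    """Sort samples high to low and pair each with its percent position.

    The percent for a value is the share of samples that meet or exceed it.
    """
    ordered = sorted(samples, reverse=True)
    count = len(ordered)
    return [(100 * (i + 1) / count, value) for i, value in enumerate(ordered)]


# Hypothetical throughput samples (say, in Mbps), in the order collected
samples = [12, 7, 18, 9, 15]

for percent, value in sorted_percent_curve(samples):
    print(f"{percent:5.1f}%  {value}")
```

Feeding the resulting pairs to any charting tool gives the ‘sorted’ graph: percent along the bottom, throughput up the side.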

As an example, I’ve put together the following sample data sets. Each has the exact same maximum, minimum, and average, but obviously much different results. This is the value of the sorting method: it lets you quickly see differences in data.

Maximum: 20
Minimum: 5
Average: 11.36
Data points: 25
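To make the point concrete, here is a small sketch with two invented data sets that share exactly these summary numbers. The values are hypothetical, built only to match the maximum, minimum, average, and count above; they are not the data used for the graphs in this post.

```python
from statistics import mean

# Hypothetical data: mostly 11s and 12s, with one high and one low outlier
consistent = [20, 5] + [11] * 17 + [12] * 6

# Hypothetical data: spread fairly evenly between 5 and 20
variable = [5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10, 11,
            12, 12, 12, 13, 14, 15, 15, 16, 17, 18, 19, 20]


def summary(samples):
    """Return the four summary numbers used in the table above."""
    return {
        "maximum": max(samples),
        "minimum": min(samples),
        "average": round(mean(samples), 2),
        "datapoints": len(samples),
    }


print(summary(consistent))
print(summary(variable))

# Identical summaries, yet the sorted values look nothing alike
print(summary(consistent) == summary(variable))
print(sorted(consistent, reverse=True))
print(sorted(variable, reverse=True))
```

The two summaries compare equal, but the sorted lists show one set sitting almost flat around 11 to 12 while the other slides steadily from 20 down to 5.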

[Figure: Seemingly Random Data]

The first is a graph showing the two sets of data, both fairly random looking. They appear quite similar in nature: both inconsistent, with roughly the same average.

[Figure: Consistent vs Inconsistent Data]

But when you take this same information and sort it first, you can see distinct differences in the resulting graphs. One set of data is much more consistent than the other, even though both have the same average.

We’d like to see very flat lines, showing that customer experiences are fairly consistent across the board. The higher the line, the higher the client’s throughput results.

A line with its curve toward the bottom left represents fairly consistent low results. A diagonal line represents high variability, meaning more inconsistency. A line with its curve in the upper right represents consistently higher results.

Another way to use these ‘sorted’ graphs is to look at the 50% line, which represents the ‘average’ (strictly speaking, the median) result someone would achieve. The 80% mark on the bottom axis shows the value that 80% of all collected data meets or exceeds.
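If you would rather pull those two numbers out directly than read them off a graph, something like the sketch below would do it. The helper `value_at_percent` is a made-up name, the samples are hypothetical, and the "round the rank up" rule is one simple convention I am assuming here; other percentile conventions can give slightly different answers on small data sets.

```python
import math


def value_at_percent(samples, percent):
    """Return the value that `percent` of the samples meet or exceed.

    Samples are sorted high to low; the rank is rounded up (an assumed,
    simple nearest-rank convention) and is never less than 1.
    """
    ordered = sorted(samples, reverse=True)
    rank = max(1, math.ceil(percent * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical throughput samples, in the order collected
samples = [18, 15, 12, 9, 7, 14, 11, 10, 16, 13]

print(value_at_percent(samples, 50))  # 13: half the samples are 13 or better
print(value_at_percent(samples, 80))  # 10: 80% of the samples are 10 or better
```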

This is a good telltale sign for following the 80/20 rule. Don’t waste too much time and money trying to fix the last 20%; put the bulk of your resources toward getting the 80% to be as consistent (flat) as possible.