How to Scale Your Network so that You Can Support 1 Million Users with Colleen Szymanik

Colleen Szymanik presented at #WLPC US Dallas 2015 about what it looks like to support a million users.

You can watch the full presentation HERE.

I installed the first Access Point at the University of Pennsylvania back in 2000. I grew that network to over 4,000 Access Points before I left, and then I started working at Comcast on their Xfinity Wi-Fi program a couple of years ago.

When I was at the University of Pennsylvania back in 2008, it's interesting because we had an RFI bake-off process. As part of that process, we solicited feedback from other universities on their deployments of the finalist vendors we were considering and how things were going.

The hardest part, they said, was supporting two different vendors at the same time on their campus. Now that I sit here supporting 9 million Access Points across 5 different vendors and all sorts of different models, I wholeheartedly agree with them.


These are some key elements:

  • Monitoring. You always need to know what's going on on your network, no matter what its size is. That's very important.
  • Automation. You need to know what's in your plant: a schema for accurate latitude/longitude, how you deploy things, not just spreadsheets. And you need automation for everything, like when you're dealing with support calls.
  • Self-healing. You need it, because the last thing you want is a thousand hands in the cookie jar with everyone going off in a different direction.

Self-Organizing Networks (SON)

SON is a concept that's well established in the world of LTE. In fact, Wikipedia describes it in terms of configuring network elements and optimizing performance, and providing self-healing capabilities in case of network interference or faulty network elements.

We've been pitching this plug-and-play idea with the controller-based architecture for years. Can we scale that? Can we scale it to a million Access Points? Can we do it in a multi-vendor environment?

Let's shrink the problem down and simplify. You have to keep it simple. It's easy to get hundreds of thousands or millions of Wi-Fi radios out there. But can we manage those radios well at scale?

What to capture?

It's the radio we're talking about, so I want to know about the activity factors:

  1. When am I going to reach that energy detect threshold?
  2. When is the Clear Channel Assessment threshold crossed?
  3. When is a radio too busy serving clients?
  4. Is the noise floor too high?
  5. Has the client RSSI reached a certain threshold?
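The checks above can be sketched as simple threshold tests over per-radio stats. This is only an illustrative sketch: the field names and threshold values here are assumptions for the example, not actual production values.

```python
# Illustrative threshold checks for per-radio health.
# All field names and threshold values are assumptions, not production values.

def radio_flags(stats, cca_busy_max=70.0, noise_floor_max=-85, rssi_min=-75):
    """Return a list of flagged conditions for one radio's polled stats.

    stats: dict with keys like 'cca_busy_pct' (Clear Channel Assessment busy %),
           'noise_floor_dbm', and 'client_rssi_dbm' (per-client RSSI readings).
    """
    flags = []
    if stats.get("cca_busy_pct", 0) > cca_busy_max:
        flags.append("channel-busy")          # radio too busy serving clients
    if stats.get("noise_floor_dbm", -100) > noise_floor_max:
        flags.append("noise-floor-high")      # noise floor too high
    weak = [r for r in stats.get("client_rssi_dbm", []) if r < rssi_min]
    if weak:
        flags.append("weak-clients")          # clients below the RSSI threshold
    return flags

print(radio_flags({"cca_busy_pct": 82.0,
                   "noise_floor_dbm": -80,
                   "client_rssi_dbm": [-60, -78]}))
```

A poller could evaluate a function like this on every sample and only escalate radios that return non-empty flags, which keeps the per-radio decision cheap.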

How often do you capture?

I'm talking about a multi-vendor environment. We have to default to the lowest common denominator for how to grab this information, and most Access Points out there support SNMP.
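One practical detail of living on SNMP: most interesting stats are cumulative counters, and 32-bit Counter32 values wrap around, so each poll has to be diffed against the previous sample to get a rate. A minimal, generic sketch:

```python
# Turning two cumulative SNMP counter samples into a per-second rate,
# handling 32-bit Counter32 wraparound. This is a generic sketch of a
# standard SNMP concern, not any vendor's specific implementation.

COUNTER32_MAX = 2**32

def counter_rate(prev, curr, interval_s):
    """Rate per second between two Counter32 samples taken interval_s apart."""
    delta = curr - prev
    if delta < 0:                  # counter rolled over past 2^32
        delta += COUNTER32_MAX
    return delta / interval_s

# Normal case: 30,000 octets in 300 seconds -> 100 octets/sec
print(counter_rate(1_000, 31_000, 300))                 # 100.0
# Wrap case: counter rolled over near 2^32
print(counter_rate(COUNTER32_MAX - 500, 1_000, 300))    # 5.0
```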

Do we poll every hour? Every five minutes? Can you imagine collecting SNMP statistics from over a million Access Points every five minutes? Is it manageable, at that scale, to poll and to store that information in a central repository?
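To put those numbers in perspective, a quick back-of-envelope calculation helps. The payload size is a made-up assumption for illustration; only the AP count and interval come from the talk.

```python
# Back-of-envelope scaling for SNMP polling. The ~1 KB payload per poll
# is an assumed figure for illustration only.

APS = 1_000_000            # one million access points
INTERVAL_S = 5 * 60        # poll every five minutes

polls_per_second = APS / INTERVAL_S
print(f"{polls_per_second:.0f} polls/sec")    # ~3333 sustained polls/sec

polls_per_day = APS * (24 * 3600 // INTERVAL_S)   # 288 polls/day per AP
bytes_per_day = polls_per_day * 1024              # assuming ~1 KB per poll
print(f"{bytes_per_day / 1e9:.0f} GB/day")        # ~295 GB/day of raw counters
```

Even with modest per-poll payloads, the collection tier has to sustain thousands of polls per second and the repository has to absorb hundreds of gigabytes a day, which is why the question is worth asking.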

Network Visibility

LTE and the cellular industry basically have this wireless mean opinion score: take these statistics, run a little algorithm, do some weighting, and assign a score. Now we have a score, so we can assess how the radio is doing. We have some visibility.
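A MOS-style score like that could be sketched as a weighted combination of normalized radio stats. The inputs, weights, and normalization ranges below are placeholder assumptions to show the shape of the idea, not the actual algorithm.

```python
# Sketch of a MOS-like radio health score on a 1-5 scale.
# Inputs, weights, and normalization ranges are illustrative assumptions only.

def normalize(value, worst, best):
    """Map a raw stat onto 0..1, where 1 is healthy."""
    return max(0.0, min(1.0, (value - worst) / (best - worst)))

def radio_score(cca_busy_pct, noise_floor_dbm, retry_rate_pct):
    components = {
        "airtime": (normalize(cca_busy_pct, 100, 0), 0.5),   # less busy is better
        "noise":   (normalize(noise_floor_dbm, -60, -95), 0.3),
        "retries": (normalize(retry_rate_pct, 50, 0), 0.2),
    }
    weighted = sum(score * weight for score, weight in components.values())
    return round(1 + 4 * weighted, 2)   # scale to a 1-5 MOS-like range

print(radio_score(cca_busy_pct=30, noise_floor_dbm=-90, retry_rate_pct=10))
# -> 4.07
```

Collapsing many counters into one number like this is what gives operators a single dial per radio, at the cost of hiding which component dragged the score down.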

We're talking about different vendor types, and not all of them use the exact same statistical counters, whether due to equipment type or chipset. Is there an apples-to-apples comparison? Can we simplify this? How do we translate this across different deployment types?

Can I reasonably compare radio resources the same way across people's houses, multi-dwelling units, and venues? The majority of the clients will be similar, but the usage patterns are not the same in all these different areas.

As I start to think through all of these cases, it gets more and more difficult to put them into buckets and scale them.

I am pitching the idea of self-organizing networks for managing RF at scale. The key takeaway is the metrics we have identified, and the important part we need to think about and discuss is knowing what is on our network and being able to understand it. That is what lets us make decisions and do any type of analysis.

Colleen Szymanik is a Principal Engineer at Comcast Cable. If you have more questions or feedback, connect with Colleen via Twitter.

Go HERE to watch this entire presentation.