Wireless Troubleshooting Best Practices

…guest post from Ahmad Nassiri

Troubleshooting a wireless network is often not as easy or straightforward as troubleshooting a wired network.

There are many components to assess, including but not limited to laptops, handheld devices, APs, controllers, authentication, network switches, routers, DNS and DHCP. Issues are more frequently caused by end-user devices, due to compatibility problems, outdated wireless card drivers, misconfigured profiles, access rights, security methods, software versions and so on, yet wireless is the first to be blamed!

Focusing first on the wireless network itself, common causes include poor wireless design, devices left at their default configuration, a lack of regular monitoring and surveys checking network health, growth in the number of users without enough APs to serve them, changes in the environment and coverage area that affect RF behaviour, upstream network devices, and the applications in use. Add to this interference such as co-channel interference (CCI), adjacent channel interference (ACI) and non-Wi-Fi interference; high transmit power and support for lower data rates, which result in sticky clients; and rogue APs.

Like any other technology, wireless troubleshooting must follow a structured, step-by-step process.

There are many troubleshooting methodologies, offered by vendors, developed through university research and used across the wireless industry as a whole. They all share the same goal: identify the problem, define its cause, create an action plan, resolve it and finally document it. I have found the CWNP methodology included in the “CWAP Official Study Guide” to be the best aligned method for troubleshooting wireless issues.

The CWNP methodology includes the following steps:

  1. Identify the problem
  2. Discover the scale of the problem
  3. Define the possible causes of the problem
  4. Narrow to the most likely cause
  5. Create a plan of action or escalate the problem
  6. Perform corrective actions
  7. Verify the solution
  8. Document the results.
(Courtesy of the CWAP Official Study Guide, CWAP-402)
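One way to keep a ticket faithful to this sequence is to require a note against each step before moving on to the next. The Python sketch below is illustrative only: the `Step` and `TroubleshootingCase` names are hypothetical and not part of the CWNP material, and the sketch does not model looping back from a failed verification to the next planned action.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import IntEnum


class Step(IntEnum):
    """The eight CWNP troubleshooting steps, in order."""
    IDENTIFY = 1
    DISCOVER_SCALE = 2
    DEFINE_CAUSES = 3
    NARROW_CAUSE = 4
    PLAN_OR_ESCALATE = 5
    CORRECT = 6
    VERIFY = 7
    DOCUMENT = 8


@dataclass
class TroubleshootingCase:
    """Hypothetical ticket record that only accepts notes in step order."""
    summary: str
    notes: dict[Step, str] = field(default_factory=dict)

    def next_step(self) -> Step | None:
        # The first step without a note is the one to work on next.
        for step in Step:
            if step not in self.notes:
                return step
        return None

    def record(self, step: Step, note: str) -> None:
        expected = self.next_step()
        if step is not expected:
            raise ValueError(f"expected {expected!r}, got {step!r}")
        self.notes[step] = note

    @property
    def closed(self) -> bool:
        # A case is closed once all eight steps carry a note.
        return self.next_step() is None
```

Recording a step out of order raises `ValueError`, a crude stand-in for the discipline the methodology asks for.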

When identifying the problem, the purpose is to narrow the cause down to a device (or device type), a user (profile) or an application.

Was the issue noticed while using a particular device or application? Discovering the scale of the problem reveals whether an outage is affecting all users, a single user or a group of users, and/or a particular location.
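Scoping can be made a little more systematic by tallying who reported the issue, where, and on what device. This is a minimal sketch, assuming each complaint is logged as a hypothetical `Report` with user, location and device type fields:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    """One user complaint (hypothetical fields)."""
    user: str
    location: str
    device_type: str


def problem_scale(reports: list[Report], total_users: int) -> str:
    """Summarise whether an issue hits one user, a group, or everyone."""
    if not reports:
        return "no reports"
    users = {r.user for r in reports}
    if len(users) >= total_users:
        return "all users"
    scope = "single user" if len(users) == 1 else "group of users"
    # A location or device type shared by every report hints at where to look next.
    locations = {r.location for r in reports}
    devices = {r.device_type for r in reports}
    if len(locations) == 1:
        scope += f"; location: {next(iter(locations))}"
    if len(devices) == 1:
        scope += f"; device type: {next(iter(devices))}"
    return scope
```

A shared location across many reports may point toward an RF or AP issue in that area. A shared device type may instead point toward drivers or profiles on the client side.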

Defining the possible causes is the most important and most difficult step. The troubleshooter must ascertain the cause from a pool of potential causes, starting with the most common cause and working down the list. A good example is to start at the client device and confirm that all settings meet the defined standards.

Common faults include client device adapter issues, improperly configured profiles, an AP or controller being down, and wired network issues with switches, PoE, DHCP, DNS, the internet link, application servers and so on. If well documented, this step will serve as a knowledge base for similar problems reported in future.

The plan of action for tackling the identified problem must be documented and followed thoroughly. It should include a step-by-step approach and a section for noting results until the problem is resolved.

I always recommend backing up the current device configuration before making any change, no matter how small the tweak. This enables a smooth rollback if the change does not fix the problem.
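Keeping the backup as plain text also makes it easy to see exactly what changed before you decide whether to roll back. This is a minimal sketch using Python's standard `difflib`. The configuration lines are hypothetical and do not follow any particular vendor's syntax:

```python
import difflib


def config_changes(backup: str, current: str) -> list[str]:
    """Return unified-diff lines between a saved backup and the current config."""
    diff = difflib.unified_diff(
        backup.splitlines(),
        current.splitlines(),
        fromfile="backup",
        tofile="current",
        lineterm="",  # splitlines() already removed the newlines
    )
    return list(diff)


# Hypothetical configuration text, for illustration only.
backup = "ssid Corp\ntx-power 20\nmin-rate 1"
current = "ssid Corp\ntx-power 14\nmin-rate 1"

for line in config_changes(backup, current):
    print(line)
```

An empty result means the running configuration still matches the backup.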

Verification is the next step. Confirm that the actions undertaken have actually resolved the issue by testing end to end. If they have not, carry out and test the next set of actions from the action plan document.
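The plan, act and verify loop can be sketched as follows. This sketch assumes each planned action may carry an optional rollback, in line with the backup advice above, and that `verify` is some end-to-end test you supply. `Action` and `run_action_plan` are hypothetical names:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Action:
    """One step of a (hypothetical) action plan."""
    description: str
    apply: Callable[[], None]
    rollback: Optional[Callable[[], None]] = None


def run_action_plan(
    actions: list[Action], verify: Callable[[], bool]
) -> tuple[bool, list[str]]:
    """Apply actions in order, verifying after each; return (resolved, results log)."""
    log: list[str] = []
    for action in actions:
        action.apply()
        if verify():
            log.append(f"{action.description}: resolved")
            return True, log
        log.append(f"{action.description}: not resolved")
        # Undo a change that did not help before trying the next action.
        if action.rollback is not None:
            action.rollback()
            log.append(f"{action.description}: rolled back")
    return False, log
```

The returned log doubles as the results section of the action plan and feeds directly into the final documentation step.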

The final step is to document all activities undertaken, both for future reference and to build a knowledge base.

A well-documented database of problems and their associated resolutions goes a long way. It adds value to staff training and capability building, and saves precious time when troubleshooting the same type of problem again.
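A knowledge base does not need to be elaborate to be useful. This is a minimal sketch, assuming each resolved ticket is stored as a hypothetical `KnownIssue` record. A naive word-overlap search can surface past resolutions for a new complaint; a real helpdesk tool would use proper search. The sample entries are illustrative, not records from an actual deployment:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class KnownIssue:
    """One resolved problem, as it might be recorded at the documentation step."""
    symptom: str
    cause: str
    resolution: str


def search(kb: list[KnownIssue], query: str) -> list[KnownIssue]:
    """Rank past issues by how many words their symptom shares with the query."""
    words = set(query.lower().split())
    scored = []
    for issue in kb:
        overlap = len(words & set(issue.symptom.lower().split()))
        if overlap:
            scored.append((overlap, issue))
    # Highest overlap first; the sort is stable, so ties keep knowledge-base order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [issue for _, issue in scored]


# Illustrative entries only.
kb = [
    KnownIssue("clients stick to distant AP",
               "lowest data rates enabled, APs at maximum power",
               "disable low data rates and tune transmit power"),
    KnownIssue("no IP address after association",
               "DHCP scope exhausted",
               "enlarge the DHCP scope"),
]
```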

Recently I was called in to help a client whose wireless ‘problems’ were keeping their IT helpdesk on its toes. The list of complaints was long, from slow and unresponsive connections to connection, association and disassociation failures.

During the initial information-gathering session, it was confirmed that the controller had been booted up with the default settings. Nothing had been changed apart from the basics and creating the SSIDs, “8 of them”. The lowest data rates were supported and APs were at maximum power. Nearly 200 devices were associated to the network, and only 5 APs covered a large office area containing 9 meeting rooms with thick glass walls. In addition, 4 APs shared the same channel on both 2.4 GHz and 5 GHz. The wireless network had not been designed by a professional engineer; electricians installed the APs wherever they found the cable run and access easiest. All of this resulted in increased medium contention, retransmissions and dropouts. In simple words, it was a terrible wireless network.
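A rough estimate of beacon airtime helps explain why 8 SSIDs combined with the lowest data rates hurts so much. This sketch assumes a 300-byte beacon, a 102.4 ms beacon interval, beacons sent at 1 Mbps with the DSSS long preamble, and 4 APs on the same channel hearing each other. It ignores inter-frame spaces, backoff and the 6 µs ERP-OFDM signal extension. Treat the figures as an order-of-magnitude illustration rather than a measurement:

```python
import math

BEACON_INTERVAL_US = 102_400  # 100 TU, a common default beacon interval


def dsss_airtime_us(frame_bytes: int, rate_mbps: float) -> float:
    """Airtime of a DSSS frame: 192 us long preamble/PLCP header plus payload."""
    return 192 + frame_bytes * 8 / rate_mbps


def ofdm_airtime_us(frame_bytes: int, rate_mbps: int) -> int:
    """Airtime of an OFDM frame: 20 us preamble/SIGNAL plus 4 us data symbols."""
    bits = 16 + frame_bytes * 8 + 6  # SERVICE field + payload + tail bits
    bits_per_symbol = rate_mbps * 4  # e.g. 24 data bits per symbol at 6 Mbps
    return 20 + math.ceil(bits / bits_per_symbol) * 4


def beacon_utilisation(airtime_us: float, ssids: int, aps_on_channel: int) -> float:
    """Fraction of each beacon interval spent on beacons alone."""
    return airtime_us * ssids * aps_on_channel / BEACON_INTERVAL_US


# Assumed: 300-byte beacons, 8 SSIDs, 4 co-channel APs.
at_1 = beacon_utilisation(dsss_airtime_us(300, 1), ssids=8, aps_on_channel=4)
at_6 = beacon_utilisation(ofdm_airtime_us(300, 6), ssids=8, aps_on_channel=4)
print(f"1 Mbps beacons: {at_1:.0%} of airtime")
print(f"6 Mbps beacons: {at_6:.0%} of airtime")
```

Under these assumptions, beacons alone would occupy roughly 80% of the airtime at 1 Mbps. If the lowest basic rate were raised to 6 Mbps, that would drop to about 13%, and this is before a single data frame is sent.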

Further analysis with Ekahau survey software and MetaGeek Chanalyzer uncovered more issues. These included CCI, ACI, interference from neighboring wireless networks (“4 neighbor APs with maximum power and on the same channel”) and ad-hoc APs. Finally, there was non-Wi-Fi interference from DECT phones and a microwave oven across the floor.
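On 2.4 GHz, the difference between CCI and ACI comes down to channel spacing. Channel centres are 5 MHz apart, so with a nominal 22 MHz channel width only channels at least five apart avoid overlapping (for example 1, 6 and 11). This simplified Python sketch classifies pairs of APs from a survey list. It assumes every pair can hear each other; a real survey would also weigh signal strength. The AP names are hypothetical:

```python
from __future__ import annotations

from itertools import combinations


def center_mhz(channel: int) -> int:
    """Centre frequency of a 2.4 GHz channel (1-13 are 5 MHz apart; 14 is special)."""
    if channel == 14:
        return 2484
    if 1 <= channel <= 13:
        return 2407 + 5 * channel
    raise ValueError(f"not a 2.4 GHz channel: {channel}")


def classify_pair(ch_a: int, ch_b: int, width_mhz: int = 22) -> str:
    """'CCI' for the same channel, 'ACI' for overlapping channels, else 'none'."""
    if ch_a == ch_b:
        return "CCI"
    if abs(center_mhz(ch_a) - center_mhz(ch_b)) < width_mhz:
        return "ACI"
    return "none"


def find_interference(aps: list[tuple[str, int]]) -> list[tuple[str, str, str]]:
    """List every AP pair whose channels collide or overlap."""
    findings = []
    for (name_a, ch_a), (name_b, ch_b) in combinations(aps, 2):
        kind = classify_pair(ch_a, ch_b)
        if kind != "none":
            findings.append((name_a, name_b, kind))
    return findings


print(find_interference([("AP1", 1), ("AP2", 1), ("AP3", 3), ("AP4", 6)]))
```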

The proposal started with a complete redesign based on a ‘predictive survey’ to place APs in the right locations. It also included proper capacity planning that took into consideration the services and applications in use, device types, bandwidth requirements and roaming, along with robust security options and a suggestion for a bigger internet link. After a few workshops and discussions ‘educating the client’, they understood that small tweaks to the controller configuration would not fix all the issues discovered. Such tweaks could not address the capacity, service and application delivery issues.

While the above scenario was a massive problem for the client, the approach I took was the easier one: ‘a complete redesign’.

But there are times when the wireless network is good enough to support what is expected of it, yet end users still experience problems. This is where the focus should shift to the wired network components that directly affect the wireless network. These devices and services must be checked for configuration inconsistencies. Examples include ACLs, TCP/IP settings, DNS and DHCP, authentication settings (such as mismatched security settings), firewalls allowing access to certain services while blocking others, traffic shaping, rate limiting and incorrect QoS parameters. All findings and proposed changes must be well documented and followed step by step to isolate and fix the root cause.
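Some of these wired-side checks can be scripted as a quick first pass. This is a minimal sketch using only Python's standard `socket` module. A passing check only shows that a name resolves or a TCP port answers; it says nothing about whether ACLs, QoS or rate limiting are correct:

```python
from __future__ import annotations

import socket
from typing import Callable


def check_dns(name: str) -> bool:
    """True if the name resolves via the system resolver."""
    try:
        socket.getaddrinfo(name, None)
        return True
    except socket.gaierror:
        return False


def check_tcp(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, or failed to resolve
        return False


def run_checks(checks: dict[str, Callable[[], bool]]) -> dict[str, bool]:
    """Run each named check and collect pass/fail results for the action plan."""
    return {name: check() for name, check in checks.items()}
```

For example, `run_checks({"dns": lambda: check_dns("intranet.example.com"), "app": lambda: check_tcp("app.example.com", 443)})` would report whether a hypothetical intranet name resolves and whether a hypothetical application server accepts connections on port 443. Running it from a wireless client and from a wired port on the same VLAN can help separate wireless from wired causes.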

As there are many factors that affect the performance of a wireless network, I cannot emphasise enough the importance of regular monitoring and passive surveys to confirm wireless network performance.

The statistics captured by regular surveys provide visibility into the network. They enable an administrator to compare results with a previous survey and take the necessary action if anomalies are found. Making small, timely changes to the wireless configuration will go a long way towards preventing much bigger issues later.
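Comparing surveys can be as simple as lining up signal readings per location. This is a minimal sketch, assuming each survey has been exported as a mapping of location to RSSI in dBm. The 5 dB drop threshold and the -67 dBm floor are illustrative defaults, not values taken from any particular survey tool:

```python
from __future__ import annotations


def compare_surveys(
    baseline: dict[str, float],
    current: dict[str, float],
    max_drop_db: float = 5.0,
    min_rssi_dbm: float = -67.0,
) -> dict[str, str]:
    """Flag locations whose signal degraded against the baseline survey."""
    anomalies: dict[str, str] = {}
    for location, before in baseline.items():
        after = current.get(location)
        if after is None:
            anomalies[location] = "not measured in current survey"
        elif before - after > max_drop_db:
            anomalies[location] = f"signal dropped {before - after:.0f} dB"
        elif after < min_rssi_dbm:
            anomalies[location] = f"below {min_rssi_dbm:.0f} dBm"
    return anomalies
```

A location that is flagged here is a prompt to investigate, for example after furniture moves, new glass partitions or a failed AP. It does not by itself prove a fault.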

The good news is that today we have some of the best wireless troubleshooting and performance testing tools on the market. They let us analyse wireless problems, generate detailed reports and confirm that a well-designed and maintained wireless network is NOT always the cause of the issue!