During the past three months, Dave has spent a lot of time coping with one broadband problem after another, including both local networking and broadband access issues. Now that he has finally resolved most of them, we've been thinking about how consumers can possibly cope as they run into similar issues.
Wi-Fi and Cable Issues At Our Condo Complex
As we've mentioned before, we own a couple of rental vacation condos in a beachfront complex on Sanibel Island. Dave's now a member of the condo board, and because of his technical background, he's increasingly acting as the (volunteer) CTO for the complex.
The complex has a Wi-Fi network used by owners and renters. There are ten 802.11g access points (APs). Three APs form a point-to-multipoint bridge between buildings; the others are used for client access. The main AP is in the clubhouse in the center of the complex, and other APs provide coverage to condos that do not have direct line of sight to the clubhouse. The network also includes a couple of routers, each connected through a commercial cable modem for high-speed Internet access. Several administrative PCs also use one of the cable modems.
When we first rented condos in the complex, the Wi-Fi network never seemed to work very well. The signal strength was fine outside the units, but fell off very rapidly inside. It was usually unusable at the dining room table, where we liked to work. When we bought our first condo three years ago, our first action was to install our own cable modem in the unit, and we soon set up our own Wi-Fi network as well. (See the Our Broadband Condo: PC and Internet: The Starting Point page for more information.)
After Dave joined the board a year ago, he started hearing complaints about the network. Some people complained about network outages, others about performance. The network seemed to fail ever more frequently; the condo association was spending a lot on service calls; and the property manager had to keep power cycling routers, cable modems, and access points to get the network back working.
During the holidays, we spent a couple of weeks at the complex and rented two more condos for our children's families. We used our own network at our condo, but our children told us the network was failing frequently--more than once a day. It wasn't at all clear what was causing the failures, and the property manager soon asked Dave to look into the problem.
Our son-in-law Jeremy Bennett was there with his family over the holidays. Jeremy is a software architect at Aruba Networks responsible for designing and building Aruba's RFprotect line of wireless security products. He's had ten years of hands-on experience with networking and was willing to get his hands dirty helping Dave track down the root causes of the problems.
Dave set up a laptop PC with AirMagnet and pointed it out our condo window. On New Year's Day, he found nearly 100 PCs trying to access the network. There are 141 condos in the complex; with more and more people carrying laptop PCs on vacation, that didn't seem too surprising--the seven adults in our family had seven laptops in three condos.
To try to understand why the complex network was crashing, Dave and Jeremy logged into all the active devices. They found most of the firmware was out of date--several access points were still running firmware dating back to the earliest release of 802.11g. The primary router, which also served as the primary access point, was a consumer-grade device. Analyzing its log files, they found it simply could not handle the load imposed by 100 PCs trying to gain access within a day. In particular, the DHCP server in the router appeared to crash and stop handing out IP addresses once it had given out 64 addresses.
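To illustrate why a small address pool can run dry under this kind of load, here is a minimal sketch that counts how many DHCP leases are held at once. It is not how the router's DHCP server actually worked; the function names, the arrival pattern, and the lease durations are all assumptions chosen for illustration, and it ignores lease renewals by clients that stay connected.

```python
def peak_concurrent_leases(arrival_times, lease_seconds):
    """Return the largest number of leases held at the same moment.

    arrival_times: seconds at which each client first asks for an address.
    lease_seconds: assumed lease duration (hypothetical; renewals are ignored).
    """
    events = []
    for t in arrival_times:
        events.append((t, 1))                   # lease granted
        events.append((t + lease_seconds, -1))  # lease expires
    # At equal timestamps, process expirations (-1) before grants (+1).
    events.sort(key=lambda e: (e[0], e[1]))
    active = peak = 0
    for _, delta in events:
        active += delta
        peak = max(peak, active)
    return peak


def pool_is_sufficient(arrival_times, lease_seconds, pool_size):
    """True if the pool never needs more addresses than it has."""
    return peak_concurrent_leases(arrival_times, lease_seconds) <= pool_size


# Assumed scenario: 100 clients arriving evenly over 12 hours.
arrivals = [i * 432 for i in range(100)]

# With a 24-hour lease every address stays allocated all day.
print(pool_is_sufficient(arrivals, 24 * 3600, pool_size=64))
# With a 1-hour lease only a handful are held at any moment.
print(pool_is_sufficient(arrivals, 3600, pool_size=64))
```

Under these assumptions, a pool of 64 addresses is too small when leases last a full day, while shorter leases or a larger pool would leave headroom.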
During the same week, we also observed several outages of the cable modems serving the complex and our condos. Jeremy did some research on the Web and found a way to interrogate the status of the cable modems; we soon found that most of the modems were reporting marginal downstream and upstream power levels. At our condo we were also having trouble receiving some of the digital and high-definition cable channels, so we suspected there were broader cable problems as well.
Resolving all of these issues took more than a month. Jeremy's experience with diagnosing network problems was very helpful. Dave was able to leverage his own long experience with networking (he installed his first Ethernet network in the early 1980s) and with cable modems (he ran the first consumer cable modem trial) to diagnose and solve these problems. He replaced both routers with more appropriate "industrial grade" devices, upgraded the firmware in all network devices to the most recent releases, and reconfigured the IP addressing. To bring the cable modems within specs, the cable operator replaced some of the equipment in the complex, replaced most of the cable modems, and replaced most of their cabling.
With the access points all upgraded to the latest firmware, 802.11g performance improved substantially. The network has been very stable, with no outages like those we saw several times a day when we arrived.
Lest we believe that only consumers face these problems, we took heart from a recent article in Network World on wireless LAN management in the enterprise. It observes that "there is little history to tap into with wireless and most performance problems are reported long after the fact, both of which make it very difficult to find or reproduce the error and stop it from happening again." Its conclusion is that network managers want "more diagnostic tools and more expert analysis built into the products..."
TiVo Problem - Debugging Ethernet
A month or so after we returned home, we found that our TiVo had stopped working. We didn't discover it had failed until the program listings suddenly disappeared, and the TiVo told us that it had not been able to connect to the TiVo network to update the listings for 30 days. We had generally been using a different DVR in the kitchen but wanted the TiVo in our bedroom to also work.
The TiVo in our bedroom is connected to our home network through a 10/100 Ethernet switch in the attic. The TiVo Series 2 does not have a built-in Ethernet port; it has several USB ports and uses an external USB-to-Ethernet adapter to provide the network connection. The lights on the attic switch looked funny; Dave tried moving the Ethernet cable to a different port and the lights didn't change, so he suspected the USB-to-Ethernet adapter wasn't working properly.
Dave disconnected the adapter, installed its drivers on one of his PCs, and used it to connect to our network. It worked like a charm for several days, so that wasn't the problem.
After scratching his head for another week, Dave climbed back into the attic, disconnected a working PC from one of the ports on the attic switch, and connected the TiVo to that port. The lights looked fine, and the TiVo immediately started working again.
Somehow, while we were away, two of the five ports on the switch had failed. Dave replaced the switch with a new one (this is the third time he's had to do this) and we were back in business.
Skype Problem - Debugging Network Performance
About a month ago, Skype suddenly stopped working properly. Skype had been getting better and better, and we had gradually come to depend on it for the majority of our phone calls--both to other Skype users (including video calls with family members) and to regular telephones (using SkypeOut). All of a sudden, we noticed that the call quality was terrible--it was hard to understand what people were saying, sentences would get clipped, and people complained that they couldn't hear us. Since this happened simultaneously to both of us, and on several different PCs, we figured the problem was probably with Skype or with our cable operator.
After looking at both the Skype client application and the online help pages, we didn't find any obvious way to isolate the problem. So we went through the "back door" and contacted Skype's PR agency, who put us in touch with our old friend Jonathan Christensen, now Skype's General Manager for Video and Audio. Jonathan listened to our description of the problem and told us how to turn on the diagnostic tools hidden in the Skype client. As we were talking with him, we could see that the jitter was on the order of 300 milliseconds and the packet loss was around 10% of packets--both awful. Jonathan said, "It's probably your router--we've seen a lot of router problems with those symptoms." We told him our router was an "industrial grade" device that we'd never had any trouble with, and we hadn't changed anything. He suggested we look more closely.
Searching the Web, Dave found a very helpful software tool called MySpeed PC VoIP Advanced Edition. It installs on a PC, and runs a periodic test to a remote server that measures the network parameters that affect VoIP services. It confirmed Skype's measurements of packet loss and jitter, and indicated that our network setup was unacceptable for VoIP calls.
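For readers who want a feel for what "jitter" and "packet loss" mean numerically, here is a minimal sketch of one common way to estimate them: the running interarrival jitter estimator defined for RTP in RFC 3550, plus a simple loss fraction. We do not know how Skype or MySpeed PC compute their figures, so treat this as an assumption about the general technique; the function names and sample timestamps are hypothetical.

```python
def interarrival_jitter(send_times_ms, recv_times_ms):
    """Running jitter estimate in the style of RFC 3550.

    For each consecutive packet pair, D is the change in transit time;
    the estimate moves 1/16 of the way toward |D| on every packet.
    """
    jitter = 0.0
    for i in range(1, len(send_times_ms)):
        d = (recv_times_ms[i] - recv_times_ms[i - 1]) - (
            send_times_ms[i] - send_times_ms[i - 1]
        )
        jitter += (abs(d) - jitter) / 16.0
    return jitter


def packet_loss_fraction(received_seq_numbers, packets_sent):
    """Fraction of sent packets that never arrived (duplicates counted once)."""
    return 1.0 - len(set(received_seq_numbers)) / packets_sent


# Hypothetical test stream: one packet every 20 ms.
sent = [0, 20, 40, 60]
steady = [50, 70, 90, 110]   # constant 50 ms delay, so no jitter
bumpy = [50, 86, 90, 110]    # second packet arrives 16 ms late

print(interarrival_jitter(sent, steady))
print(interarrival_jitter(sent, bumpy))
print(packet_loss_fraction([0, 1, 2, 3, 4, 5, 6, 7, 9], packets_sent=10))
```

A constant delay, however long, produces no jitter; it is variation in delay from packet to packet that disrupts a voice call.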
To isolate the source of the problem, we subscribed to another cable modem service, and Dave picked up and installed the second modem. When he connected the test PC directly to the second modem, MySpeed PC reported that jitter was a few milliseconds, packet loss was zero, and the network was fully suitable for VoIP. Dave then installed a spare router between the test PC and the second cable modem, and ran the test overnight, with similar results. This effectively eliminated the cable network as the source of the Skype problem.
Our home router supports VPN links to our condos in Florida, and we recently added VPN links to the routers which serve the Wi-Fi network in the complex. One of those VPN links stopped working following a cable outage at the complex, and Dave had not been able to get the link back working. Both routers kept trying to reestablish the link, but it kept failing. Skype had stopped working at home in New Jersey at about the same time as the cable failure in Florida--unlikely as it seemed, perhaps the two problems were related.
Dave finally resolved the VPN problem about a week ago. As soon as the link was stable, he reconnected the test PC to the main router and cable modem. Sure enough, MySpeed PC reported that jitter was now a few milliseconds and there was no packet loss. Skype is back working fine, just as before.
In trying to reestablish the VPN link, our home router had apparently devoted so much processor time to the attempt that it was dropping packets and creating jitter in the VoIP link. We had not noticed problems with any other applications--but unlike VoIP, they are not time-critical.
HomePlug Problem - Debugging Intermittent Failures
In an article several issues ago, we described how we had used powerline networking to solve a problem at one of our condos--we connected a cable modem in one bedroom to a router in another bedroom using a pair of Linksys HomePlug AV adapters (see Network Problem Solving with HomePlug). Soon after the article appeared, we started having intermittent problems with the broadband connection at that condo. From the symptoms, it was very difficult to tell what was causing the problem, since it went away whenever the equipment was power cycled.
Our son-in-law Jeremy stayed at that condo with his family over the holidays. Jeremy had used HomePlug to solve a similar problem in his California home, and worked with Dave to try to isolate the problem. The cable modem was clearly operating marginally; while we were there the cable company changed some equipment and cabling, which brought the downstream and upstream signals for the cable modem well within normal specs. We hoped the problem would go away, but it didn't; the Internet connection in the condo still failed several times a week, requiring power cycling to get it back working again. We both found this very frustrating--Jeremy said "If I have a problem with Ethernet, I know how to diagnose it. If there's a problem with Wi-Fi, I know what tools to use. But I don't know how to diagnose a problem with HomePlug."
In early February, Dave stayed at the condo for most of a week, and was determined to get at the root of the problem. On the way there, he stopped at Best Buy and bought a 50 foot Ethernet cable. He strung it across the floor between the bedrooms, and connected the router and the modem together directly. The network ran for several days without a problem.
Then Dave disconnected the Ethernet cable, and connected the cable modem and the router together with an ancient set of ST&T HomePlug 1.0 adapters we had tested during the summer of 2002--we've used these for five years to connect our AudioTron digital audio player to our home network. The ST&T pair worked fine while Dave was there, so he left them running when he headed home. They're still working, with only one failure in more than two months.
So the HomePlug AV pair appeared to be the root cause of the problem. When he got back home, Dave set them up and ran a test for more than two weeks without a failure. We shipped the pair to Intellon (whose chips are used in most HomePlug devices) for analysis. They reported finding a possible hardware problem, updated the devices, and sent them back to us.
We're heading to Sanibel this weekend, and we're bringing the updated Linksys pair back with us. If they run without a problem while we're there, we'll leave them running.
How Do Consumers Cope With These Problems?
We do operate a more elaborate network than you'd find in the typical home. We have nearly a dozen PCs including two servers, several networked media devices, three DVRs, and a NAS server; these are connected with Ethernet, Wi-Fi, and HomePlug (plus whatever else we're testing). Our home network is connected with VPN to routers at our two condos, and to two routers that serve the condo complex.
Some might feel we deserve the problems we get. But Dave brings an engineering mindset and many years of debugging experience when he sets out to solve problems. He looks for tools to help him isolate problems, in order to find their root cause. Because of our industry experience and contacts, we can call in help that wouldn't be available to the typical consumer.
Many consumers run into similar problems with their cable networks and their home networking equipment. Anyone could have encountered our problems with marginal cable modems, a failing Ethernet switch, and an intermittent HomePlug AV device. Our router problem was apparently not that unusual--we would not have noticed it if Skype had not acted like the canary in the coal mine, failing when everything else seemed to be working.
Consumers now take high-speed Internet for granted, and install home networks to support voice and video applications. We expect more and more consumers will run into problems similar to ours. How will they cope with them? Who will they call on to fix them?
What is our industry doing to address these issues? Will service providers, retailers, equipment makers, and application providers all point their fingers at each other and say "not my problem"? Will companies that make these products provide effective built-in tools to help consumers figure out what's wrong when they don't work right? Or will someone step up and take charge and provide help?
Cable modems provide an example of how to provide useful information to consumers with a problem. Most modems have a built-in Web server and respond to an HTTP request at the fixed address 192.168.100.1 with all the key information. But it isn't easy to figure out what the normal range is--and it's especially hard if you can't get to the Internet.
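As a sketch of the kind of built-in help that would make those numbers meaningful, the following compares power readings against a table of acceptable ranges and flags values near the edge. The ranges here are illustrative assumptions only; the right values depend on the modem and the cable operator. The reading names and functions are hypothetical, and the readings would have to be copied by hand from the modem's status page, whose layout varies by model.

```python
# Illustrative ranges only (assumptions, not operator specifications).
ASSUMED_RANGES = {
    "downstream_power_dbmv": (-8.0, 8.0),
    "upstream_power_dbmv": (35.0, 50.0),
}
MARGIN_DB = 2.0  # readings this close to a limit are flagged as marginal


def classify(value, low, high, margin=MARGIN_DB):
    """Label a reading as 'ok', 'marginal', or 'out of range'."""
    if value < low or value > high:
        return "out of range"
    if value < low + margin or value > high - margin:
        return "marginal"
    return "ok"


def check_modem(readings, ranges=ASSUMED_RANGES):
    """Classify every reading for which a range is defined."""
    return {
        name: classify(readings[name], *ranges[name])
        for name in ranges
        if name in readings
    }


# Hypothetical readings copied from a modem status page.
print(check_modem({"downstream_power_dbmv": 0.0, "upstream_power_dbmv": 49.5}))
```

A status page that showed a label like this next to each raw number would help a consumer without Internet access.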
Verizon may have the best approach. Verizon is using fiber to the home (FTTH) to provide video, voice and data service, using MoCA to provide home networking, and installing a Broadband Home Router (BHR) as the central management point for all services. In his talk at the MoCA Technology Conference, Verizon's CTO and Senior VP-Technology Mark Wegleitner described "The Verizon Managed Home". He said Verizon will use this infrastructure to remotely manage all video, voice and data communications for PCs and TVs. Since Verizon is already providing the end-to-end infrastructure, it thinks FiOS customers will expect it to solve any problems--even if they're caused by customer-installed equipment and applications.
The rest of the industry should keep an eye on how well this works, and start preparing to follow the same course if it works well.