by Locum sysadmin
In an ideal world, we all maintain networks composed of shiny, high-end equipment. Server rooms are stacked to the brim with racks of blinking lights. Neat bundles of cable wend their way through cable loops to orderly, labeled patch bays. When the occasional piece of equipment fails, a hot replacement is slotted in by trained technicians, often before users even notice the outage. Sleek, modern servers hum contentedly, offering their services all day, every day. All is well.
And then there are the other environments ...
Imagine, if you will, that you are a programmer, working for a small company. You are perhaps vaguely aware that all is not well with the small network that you use each day, but the system administrator (sysadmin, if there is one) is so busy with other duties that addressing your concerns seems to be last on the list. The occasional delay in CVS checkouts, or e-mail that never quite arrives, seems like a minor issue compared to... well, whatever it is that so occupies the sysadmin.
Or perhaps there is no sysadmin ... the network topology is neither ring, nor star, but more "accreted." It is possible that the nephew of one of the managers was responsible for its setup. Like coral, successive waves of employees have washed over the network, leaving their small additions—a cheap 8-port hub here, some gaffer-taped wiring there.
You become aware that your LAN/WAN environment is a real-world test of how deeply Ethernet hubs may be cascaded. A trip to the server room (or server closet) reveals a mess of cabling that closely resembles blue spaghetti. Access to the outside world can take several forms, but it is not uncommon to find a couple of dialup modems lurking quietly in the mess, unnoticed until a failure in the regular link means a failover to the pleasures of 30 employees sharing a 33.6k modem. The concept of labeling cables never made it to this paleolithic theme park, so if you ever trip on one of the floor-dwelling blue vines, locating its original socket can be a challenging occupation.
The servers themselves seem to be an interactive museum display charting the history of computing up until the late 1990s. Old UNIX boxes spill a mess of cables and hard drives over the bench, generic white-box servers of unknown vintage litter the room, "Powered by Linux" or FreeBSD stickers adorning them. Discolored 15-inch monitors sometimes display a blue screen of death, letting you know that some people still love NT4. Assorted tape drives blink quietly away, backing up regularly, though no one seems quite sure what they are backing up, or how to recover it. An elderly Sun box whiles away its retirement transferring mail and playing host to the occasional crackers who exploit security holes in its ancient mail software, then give up in disgust.
The spare parts for the network might occupy a shelf in the server room, or perhaps they nestle on top of a rack unit. A motley assortment of chewed-looking Category 5 cables, network cards so ancient that their manufacture date is in Roman numerals, and a sculpture of BNC connectors—the thought of turning here for help fills you with dread. A dead network adapter usually means a surreptitious raid of the petty cash and a trip to the local computer-parts store for a no-name Ethernet card.
Then, as it always does, disaster strikes. Somewhere, something goes wrong. One thing that you can be sure of is that it will happen at the worst possible time. It is likely that a crucial presentation will be under way, or perhaps a software release is due by close of business. Maybe you are hosting a server for a client, and the client has noticed its absence, and is on the phone, using words like "unscheduled outage" and "penalty clause." If your clients are so inclined, words like "kneecap" and "sledgehammer" might also be heard. Another fact you can be reasonably sure of is that the sysadmin will not be present, and the next most technical person will be called upon to work up a minor miracle to fix the ailing network.
Sound far-fetched? Believe it or not, I have been in this situation more than once. What follows are some hints that may help in fixing networks in suboptimal conditions, and as always, with the understanding that it must be done as cheaply as possible.
Many of the hints use features found on Linux, beloved for its technical excellence (and its low cost). Most of the tips here can be adapted for whatever type of operating system you have.
Ping is the venerable tool that we all know and love, and is the reigning king of the low-tech diagnostic tools. Linux (and other operating systems that use GNU tools) features an extension to ping that produces a beep on receipt of a response. The beep is enabled by the -a switch. Something as simple as ping -a missinghost.your.net, left running from a console in the server room, can alert you when you have finally reestablished network connectivity. It is like having a cable tester that can traverse routers.
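Where the local ping has no audible option, the same idea can be approximated in a few lines of Python. This is only a sketch under stated assumptions: it probes a TCP port (22 here, an arbitrary choice) rather than sending ICMP echo requests, because raw ICMP usually needs root privileges, and the function names are invented for illustration.

```python
import socket
import time


def tcp_reachable(host, port=22, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds.

    Assumption: the missing host listens on this port (22 is a guess).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False


def wait_for_host(host, probe=tcp_reachable, interval=5.0, max_tries=None,
                  sleep=time.sleep):
    """Poll until probe(host) is true and return the number of attempts.

    Returns None if max_tries is exhausted before the host answers.
    """
    tries = 0
    while max_tries is None or tries < max_tries:
        tries += 1
        if probe(host):
            # "\a" rings the terminal bell, the poor man's audible ping.
            print("\a%s is reachable again" % host)
            return tries
        sleep(interval)
    return None
```

Calling wait_for_host("missinghost.your.net") from a console in the server room blocks until the host answers, then beeps once. The probe and sleep parameters exist only so the loop can be exercised without a real network.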
Where Are You?
In a server room full of unlabeled generic boxes, it can sometimes be tricky to know which box is which. The following conversation is typical:
"Okay, I've logged in by SSH [Secure Shell Protocol], and I think the box's second hard drive has died. Can you turn off its power switch when I shut it down?"
"Sure, which box is it?"
"Ummm... I only know its hostname."
"None of them are labeled!"
"It's a Pentium 2."
"That narrows it down to five boxes..."
This kind of guessing game can continue for quite some time. Following the ground-breaking research of Murphy, if you guess wrong, it is reasonably certain that you will pick a critical server to drop. My least favorite twist on this is when the boxes have been labeled, but labeled wrong, or labeled with yellow Post-it notes (which fall off as the temperature in the server room increases).
If you are using a Linux box, and it has a CD-ROM drive, why not try ejecting it? Running the eject command (with the device name as appropriate) will make the box spit out its CD tray. It is like telling the real server to put its hand up.
[Cautionary note: Be careful of doing this to machines where the CD-ROM tray is behind a closed door, such as with the Digital Prioris or the IBM NetVista. Like a tractor-pull for plastic components, you will find out whether the server door is stronger than the internal tray mechanism of the CD-ROM drive.]
[Disappointing note: Calling eject on a nonremovable drive does not cause the hard drive to eject its platters. Bummer! A hard drive that could unleash a couple of platters at 10,000 revolutions per minute would be an interesting sight.]
Change Default Passwords (and record them for your successor)
Sometimes in one of these computer ghettos, you will stumble across an unexpectedly nice piece of equipment, such as a managed switch or a decent router. The chances are strong that it will have been left in its default configuration, so that any devious member of staff can connect to it and change its configuration, leaving the network even more fouled up.
Your natural inclination should be to change these passwords—even if people do not act maliciously, they can sometimes foul up equipment accidentally. However, because you have been pressed into service as the network admin, remember that the same fate will likely befall another hapless victim one day. As a mark of consideration, record the equipment description, location, serial number, and new password, on paper. If the company has a safe, store it there. If the company has a safety deposit box, store it there. Make sure someone (a manager or director) knows about it. The time you save may be your own.
Perhaps you have identified that the network really ought to be split up, maybe moving testing to its own segment so that the incessant load-testing does not choke the network for everyone. However, requests for budget allocation to buy a router might not actually be fulfilled. It is at times like this that an old Pentium, two network cards, and a copy of the Linux Router Project (LRP) can be pressed into service as a cheap router.
The throughput of such a lo-fi router may not match that of a dedicated unit, but it may suffice for a small organization.
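Before putting two network cards in the old Pentium, it helps to decide which addresses live on which side. The sketch below is one way to plan such a split with Python's standard ipaddress module; the 192.168.1.0/24 range and the idea of giving the router the first usable address on each segment are assumptions for illustration, not anything LRP requires.

```python
import ipaddress


def split_for_testing(network_cidr):
    """Halve a network into an office segment and a testing segment.

    Returns (office, testing, office_gateway, testing_gateway), where each
    gateway is the first usable address, to be assigned to one router card.
    """
    net = ipaddress.ip_network(network_cidr)
    office, testing = net.subnets(prefixlen_diff=1)  # exactly two halves
    return office, testing, next(office.hosts()), next(testing.hosts())


def crosses_router(addr_a, addr_b, segments):
    """Return True if the two addresses sit on different segments."""
    def segment_of(addr):
        ip = ipaddress.ip_address(addr)
        return next(seg for seg in segments if ip in seg)
    return segment_of(addr_a) != segment_of(addr_b)


# Hypothetical example range; substitute whatever the network really uses.
office, testing, office_gw, testing_gw = split_for_testing("192.168.1.0/24")
print(office, "via", office_gw)
print(testing, "via", testing_gw)
```

With the load-testing machines renumbered into the second half, their chatter stays on its own segment and only traffic for the other segment passes through the router.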
For bonus points, you might also consider setting up some firewall rules on the router, so that the next virus-ridden e-mail opened by someone in marketing does not flood the entire network with excess traffic.
Network monitoring tools can make a world of difference to your quality of life as a temporary network administrator. Rather than waiting for users to alert you to a downed Internet connection, you can detect and repair problems as they occur. The ability to maintain logs of link downtime can also help support arguments to replace unreliable links.
Nagios is a free network monitoring tool. It provides services such as monitoring whether a host is up, whether key services on a host are up, and whether a host is running services it should not.
A Web interface allows easy access to status reports. It can be configured to notify you when problems occur, for example, with an e-mail message. Of course, if the mail server is down, this notification method might not be so useful. Such a situation might be better handled by using the Nagios Short Message Service (SMS) messaging component.
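To make the kinds of check described above concrete, here is a minimal Python sketch of the idea. It is not how Nagios is configured or implemented; the function names, the use of plain TCP connects, and the port numbers in the usage note are all assumptions chosen for illustration.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_host(host, required, forbidden, probe=port_open):
    """Return a list of problems found on host; an empty list means all is well.

    required  - ports that must be answering (key services)
    forbidden - ports that must not be answering (services it should not run)
    """
    problems = []
    for port in required:
        if not probe(host, port):
            problems.append("%s: required port %d is down" % (host, port))
    for port in forbidden:
        if probe(host, port):
            problems.append("%s: unexpected service on port %d" % (host, port))
    return problems
```

For example, check_host("mailhost", required=[25, 110], forbidden=[23]) would report a dead mail service or a stray telnet daemon on a hypothetical host named mailhost. Run from cron with the results appended to a file, even something this crude yields a log of downtime to wave at management.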
Given that you might not have a dedicated Global System for Mobile Communications (GSM) modem available for sending these SMS notifications, you might like to investigate the Gnokii project. Ostensibly a project to help users communicate with a mobile phone handset (over a data-link cable or infrared), Gnokii also lets users with a capable handset send SMS messages through it.
Intrusion detection might seem a luxury on a network that is struggling to stay operational, but when the price is right (free) and you can spare time to set it up, Snort offers a range of features that is surprisingly good. Snort can even run without an IP address, making its host computer a fairly difficult target for intruders. The documentation at the Snort Website is quite comprehensive, and I recommend it.
Squid is a popular, free HTTP and FTP proxy server. The simple act of caching banner and button graphics for frequently accessed sites can give an apparent increase in Internet bandwidth. The impression for the end user is that things just get faster, because all those pretty graphics load immediately. You may know it is just a nifty trick, but why let on?
One characteristic of chaotic networks is that, like weeds after heavy rain, network services spring up everywhere. Programmers are prime offenders in this respect. But be wary—a service with a security flaw, running on an exposed server, can provide an easy beachhead for crackers (a lesson I learned the hard way).
Nmap is a free network scanner that can assist in finding servers that seem to be running more services than they ought to. It operates in several modes, and offers a range of switches to control its operation.
One of the features that seems more oriented toward people who are scanning networks they are not supposed to is the "Timing policy," specified with the -T command-line switch. The options offered here are Paranoid, Sneaky, Polite, Normal, Aggressive, and Insane. This feature actually comes in handy if the target of your attentions is heavily laden, or lives at the end of a slow link. If you are in the process of tuning a firewall to detect port scans, Nmap offers an excellent test facility too.
Another feature that will likely be helpful is the -O OS fingerprinting facility. Using a combination of techniques, it produces remarkably accurate results for most scans. Combine this result with a port scan and you can build a great picture of which machine has grabbed the wrong IP address (a favorite trick of laptop users: "I didn't know what my IP address was supposed to be, so I picked one"). You also can form a rough network map by OS-fingerprinting every active host on your network.
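The timing policies and OS fingerprinting described above map onto Nmap's -T and -O switches. The helper below is a small sketch that assembles such a command line in Python; the function name and the example address range are invented for illustration, and the script only builds the argument list rather than running a scan.

```python
# Nmap's six timing templates, from slowest to fastest, map to -T0 .. -T5.
TIMING = {
    "paranoid": 0,
    "sneaky": 1,
    "polite": 2,
    "normal": 3,
    "aggressive": 4,
    "insane": 5,
}


def nmap_command(target, timing="normal", os_fingerprint=False):
    """Build an argument list for nmap; raises KeyError on an unknown policy."""
    argv = ["nmap", "-T%d" % TIMING[timing.lower()]]
    if os_fingerprint:
        argv.append("-O")  # OS detection generally needs root privileges
    argv.append(target)
    return argv


# Hypothetical target range: a gentle scan of a small office network.
print(" ".join(nmap_command("192.168.1.0/24", "polite", os_fingerprint=True)))
```

The resulting list can be handed to subprocess.run on a machine that has Nmap installed. The Polite setting is a reasonable starting point when the targets are the elderly servers described earlier.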
It is a good idea to stay up-to-date on your tetanus shots because occasionally you will nick your hands on the sharp bits of metal found in computer equipment.
When licenses for your VisualRouteAnalyser2000 and TrafficGraphic tools have expired, remember that traceroute can be one of the most valuable tools to ascertain exactly where things are going wrong. The only (obvious) word of caution is to be aware that overzealous firewall rules can produce spurious results from traceroute.
The desirability of labeling cables is so obvious that it seems silly to even mention it, but it might not have been standard practice for the sysadmin before you. All the more reason you should do the right thing. Sure, you know where the purple cable goes, but will the next person who has to diagnose network issues?
The other impediment to labeling cables is that the sheer volume of unmarked cables makes the task seem futile. Why bother labeling the new one you have just put in, when there are another 40 unknowns? Take heart: by labeling a few here and there, the cables will gradually get less scary. Sometimes it can seem like the labor of Sisyphus, but every little bit helps.
Post-it notes do not constitute an adequate label for network equipment or servers. You are strongly urged to preserve the sanity of other sysadmins by clearly labeling all equipment, using adhesive labels (in a pinch, the labels for a floppy disk will do).
At a minimum I would suggest that host name and operating system (where appropriate), IP address, and a dire warning against tampering with the unit be included. Bonus points are awarded to people who also maintain an equipment audit and record the details of the unit, plus a list of known services that it is running. Of course these will quickly become outdated, but with a known starting point confusion may be reduced.
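An equipment audit does not need special software; a flat file that the next person can open anywhere will do. The sketch below writes the suggested details out as CSV using only the Python standard library. The field names, the location column, and the sample entry are assumptions for illustration.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class Equipment:
    hostname: str
    operating_system: str
    ip_address: str
    location: str
    services: list = field(default_factory=list)  # known services it runs


FIELDS = ["hostname", "operating_system", "ip_address", "location", "services"]


def audit_csv(items):
    """Render the audit as CSV text, one row per piece of equipment."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(FIELDS)
    for item in items:
        writer.writerow([item.hostname, item.operating_system,
                         item.ip_address, item.location,
                         " ".join(item.services)])
    return out.getvalue()


# Hypothetical entry; a real audit would list every box in the server room.
print(audit_csv([Equipment("mailhost", "Solaris", "192.168.1.10",
                           "rack 2", ["smtp", "pop3"])]))
```

Printing the file and taping a copy inside the server room door keeps the record available even when the network it describes is down.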
Destroy Faulty Cables
After several hours of cable tracing, network-card replacement, checking switch link lights, and so on, it may be that you identify a network problem as being caused by a faulty network cable. It can happen anywhere, and is not necessarily a reflection on the skills of the [acting] sysadmin. (Although if the network cable has clearly been mangled and you should have spotted it with a quick visual inspection, you will probably feel a little silly if the time to locate the fault exceeded two hours.)
So you whip a replacement cable out of your secret stash (you should have a secret stash of known-good cables) and voila! Network outage fixed. Now comes the most important duty of all—do not discard the damaged cable anywhere that subsequent admins might find it. On several occasions, damaged cables have been put back in operation, only to cause a repeat of the problem that caused them to be removed from service in the first place. It is not uncommon in server rooms to have an empty box that serves as a rubbish bin, but those unfortunates who come after you may not recognize its role as a waste repository in a time of crisis.
If waste is so abhorred that discarding cables is frowned upon, perhaps you can redo the ends of the cable and vigorously retest. Some even maintain that a long cable run can be split into several shorter runs and reused, because the cable fault is likely to be caused by a single break. I disagree: any cable that has broken in one place is likely to suffer further breaks. Demonstrating this principle to overly frugal managers is sometimes best achieved by ensuring the outcome of the demonstration. I suggest laying the cable through a close-fitting door frame and slamming the door on it a few times prior to testing.
Help Dying Equipment on Its Way
Sometimes it can be difficult to discard equipment. Combine this with the almost pathological frugality common in the small business owner, and you find the most decrepit network gear being nursed along. "I just know this old hub has another few years in it. Sure, a few of the Ethernet ports are stuffed, it overheats on warm days, and looks like it might have a mouse nest in the power supply, but that is no reason to discard it." Nothing is going to convince the owner of this piece of gear that it is time to "redeploy" it in the rubbish bin.
Sometimes you have to be cruel to be kind. Without wanting to seem too much like the Bastard Operator from Hell (BOFH), you may have to help some of this equipment meet its end. It is difficult to identify any one method that fulfills this requirement. My best suggestion is to avoid solutions that leave any externally visible marks (unless they are carbonization marks caused by an electrical fault).
You may find that some equipment shows a perverse ability to survive conditions well outside its "recommended operating environment," and nothing short of a sledgehammer will cause those last two operational ports to die. My recommendation here is to do some network reorganization so that the people responsible for the retention of the equipment are directly affected by it. Nothing says "replace me" quite like frequent trips to the server room to toggle the power switch on an ailing hub. It is surprising how fast requisition orders get signed when managers can no longer browse their favorite Websites.
The crisis has passed. Your time as a sysadmin has passed, and you are free to return to your real job. You have acquitted yourself admirably as sysadmin, and you have learned something in the process.
Like the end of a horror movie, you know that it does not really end here. Somewhere, something is waiting to go wrong. Will you be ready the next time?
LOCUM SYSADMIN is the nom de guerre of a roving programmer who often seems to find himself in sysadmin roles. Operating in deep secrecy, this elusive creature may sometimes be seen tracing cables and cursing.