Saturday, May 12, 2012

When Troubleshooting A Computer Enters the Twilight Zone

I have fixed hundreds of computers since I got into the computer maintenance business. Much of computer repair work is routine. Then again, much is not. In fact, I have seen so many strange, inexplicable occurrences that it scares me. Sometimes I sense a little superstition within me, born of these inexplicable computer behaviors: computer hauntings, if you will. I have to dispel these superstitions because they just have no place in a computer technician. We must use logic and proven troubleshooting techniques to resolve computer problems! But I think every computer technician can tell you stories that defy logic. The story I am about to tell is one of these.

Company A purchased a telecommunication switching system from Company B. At any given time of the day or night, hundreds of phone calls were wending their way through the computer circuits of that switch. A few weeks after the installation of this very expensive computer (six figures), Company A's technician noticed in the logs that the switch was resetting itself overnight. It takes a while for this switch to recover from a reboot, so not only are all current calls dropped, but no new calls can be placed. This was a serious malfunction for Company A. Company B put their second-level technical support right on the problem. However, they could find no indication of what was causing the resets. The resets continued to occur each night, usually around 11:00. Of course, they checked for power problems, but no power issues had been reported anywhere else in the building or in the switch room. Company A was displeased with the performance of the switch they had paid so much for and demanded a resolution. The problem was escalated to the design engineers. They dove into the problem fully confident that they could resolve it. They studied all the core dump logs and found nothing. The logs indicated everything was running perfectly, and then suddenly a system-wide reset would occur. This was not happening with any of their other switches in any other location. They swapped out power supplies. Reset. They swapped out memory. Reset. They swapped out circuit boards. Reset. They reloaded the software. Reset. What made the troubleshooting so difficult was that they had to make a change and then wait overnight to see whether it was effective. Company A's impatience grew, and it was clear something positive had to happen soon.

The engineers of Company B could not understand what the problem was. They were at wit's end. It was as if the switch existed in another space-time continuum where physics does not work the same way. Out of desperation, one engineer was sent to the switch site to sit up and watch the switch one night. The reset always occurred around 11:00, so he knew when to be watchful. This troubleshooting technique is not usually effective with a computer. Computer problems normally occur in the virtual world, where the naked human eye cannot see. But desperation drove this highly educated engineer to do the illogical: sit and watch the exterior of a computer to see if he could learn anything about what was going on inside.

He entered the room at about 9:00 pm, brought up some monitoring screens, and then took a seat and stared at metal and silicon. The air conditioning made the room chilly, and all the fans in the equipment created a sleepy din in his ears. 11:00 pm came, and his expectations were disappointed when nothing happened. The switch kept purring along. At 11:30 he was tired and just about ready to go back to his hotel when the door opened and a cleaning lady came in with a vacuum cleaner. This was not a typical switch room; part of it was carpeted. Curious, the engineer watched the cleaning woman. In a hurry to get her job done, she ignored him. She went straight to the electrical outlet where the switch was plugged in and unplugged the switch to make room for her vacuum cleaner. The mystery was solved.

Traditional logic does not always hold the key to solving computer problems. Desperation will lead to some very creative troubleshooting techniques. Insanity is often defined as doing the same thing again and again but expecting different results. More than once we have fixed a frustrating computer problem by applying the insanity method. At first you go through the troubleshooting logically, noting each thing you have tried. When the problem still isn't fixed, you sometimes try things you have already tried. That doesn't always make logical sense, but eventually we get the problem fixed. Clearly something must have changed while we were repeating procedures, but we don't always know what it was. But if it is fixed and stays fixed, I'm willing to put a mark in the victory column.

1 comment:

  1. I enjoyed it. The story is excellent. But, seriously, who hired that lady to work around computer equipment?