Laptop Repair Tips
One of my favourite old laptops had major issues today, and after a couple of hours tinkering I now suspect it has a dodgy motherboard. It may be heading for a well-deserved retirement.
To look on the bright side, I thought I would use this as an opportunity to share the processes I use to troubleshoot laptops and the thinking behind these steps.
Killer is an A200 Toshiba with a T7100 Core 2 Duo and excellent graphics for a six- or seven-year-old laptop. It was my main computer, on 24/7, and tortured mercilessly about twelve hours per day. Often 20 or 30 browser tabs in three different browsers plus graphics editing software, several coding IDEs, Word documents, SSH clients, a Python Web server and the odd virtual machine chugging away. All open at once. It could take half an hour to turn all this off when I finished!
The symptoms began about a month ago, when the machine started randomly turning off and turning on again. This coincided roughly with the time the little plastic tip broke off the power cable, so I assumed the cable was a bit touchy and it was losing contact. I did have a nagging feeling, though, that a dodgy power connection would cause a hard shutdown rather than a hard restart. As this was only occurring every couple of days I did not worry too much.
A week ago it started freezing. By this time it had had many hard restarts and I was expecting the hard drive to become badly corrupted, so I did not think much of it. It was about 18 months since Windows was reinstalled, so I took the opportunity to format and reload the operating system from scratch. Thanks to my obsession with multiple synced data backups this was as simple as putting in the Windows install DVD and restarting, nuking and installing. No wondering whether I needed to get this file or that file.
For a day or so it seemed better, but I was using my other computer so it was not doing much. Then it restarted again. By yesterday morning it was only lasting a few minutes before restarting. It was time to do what I do.
My Repair System
I do not really treat laptops much differently to desktop PCs when I diagnose them, but I am a little less inclined to go swapping out hardware willy-nilly. I tend to do everything I can with the software before I start with the hardware.
My first step is the same with any computer I work on. I boot it up in Linux. Linux is an open-source Operating System (OS) that can run from a live CD, DVD or USB drive. I have many different versions of Linux, and I choose which one to use based on the specs of the computer. There are lighter versions of Linux I use for older machines and full versions for newer and more powerful systems.
For diagnosis purposes you should use a Linux distribution that requires about the same power to run as the Windows installed on the machine. This way you are more likely to see the same symptoms. For testing my laptop I used Linux Mint 16 Cinnamon on a USB stick.
The beauty of Linux for troubleshooting is that it can often tell you straight away whether you have hardware or software issues. Say you have a display issue where your screen turns off during the Windows boot process. If a similar thing happens when you boot Linux, the problem is almost certainly in the hardware. If you have audio issues that also happen under Linux, that indicates a hardware problem. This is the reason I do this step first.
The less certain result is when Linux does not reproduce your fault. You would assume it means your problem is software related, and that is fairly likely, but it is not a certainty. Because Windows and Linux are different animals, they put different strain on the various parts of your computer, and they may even use different parts of your system to do the same things. Sometimes I find that Linux will take a lot longer to display the symptoms, so you may need to run it for longer.
In my case it was the former result. The machine happily restarted itself over and over no matter what OS it was running. I had a hardware problem. This was verified.
If your system can stay powered on, this is when you would try to boot into Safe Mode. Some systems will enter Safe Mode if you press F8 during the boot process, but I find this is hit and miss. If this does not work for you, you can initiate a Safe Mode session using msconfig. If your system runs properly in Safe Mode you may have a software issue, most likely related to device drivers.
Next, try to initiate a disk check. Windows will generally detect a corrupted hard disk but sometimes you need to run CHKDSK manually.
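For anyone who prefers to script the manual check, here is a minimal Python sketch of how the CHKDSK command line could be put together. The function names are made up for this example. The /f switch (fix errors) and the /r switch (locate bad sectors, which also implies /f) are standard CHKDSK options, but the drive letter C: is only an assumption about your system, and the command needs an administrator prompt.

```python
import subprocess
from typing import List


def chkdsk_command(drive: str = "C:", fix: bool = False,
                   scan_bad_sectors: bool = False) -> List[str]:
    """Build a CHKDSK command line (hypothetical helper for illustration).

    With no switches CHKDSK only reports problems and changes nothing.
    """
    command = ["chkdsk", drive]
    if scan_bad_sectors:
        command.append("/r")  # locate bad sectors; /r implies /f
    elif fix:
        command.append("/f")  # fix file system errors
    return command


def run_chkdsk(drive: str = "C:", fix: bool = False,
               scan_bad_sectors: bool = False) -> int:
    """Run CHKDSK on Windows and return its exit code."""
    return subprocess.run(chkdsk_command(drive, fix, scan_bad_sectors)).returncode
```

Calling run_chkdsk() with the defaults is the safe starting point because it only reads the disk. If you ask it to fix the drive Windows is running from, CHKDSK will normally offer to schedule the check for the next restart instead.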
If you do not have a live Linux OS to boot and cannot download one, then your first step starts here.
First, remove any external monitors, keyboards, mice or other USB devices. Turn your computer on again. If your problem is solved, you might assume one of these devices was causing the problem. Once again, you cannot be certain yet. These devices may just be compounding another problem on your system, maybe by drawing extra power. Replug the devices one at a time, testing the laptop each time. If you can reliably reproduce the problem when you plug a certain device in, you have probably found your problem.
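The replug routine is really just a loop, and it can be sketched in a few lines of Python. This is only an illustration of the logic, not a tool: the function name and the fault_appears callback are made up for this example, and in practice the callback is you, plugging things in and watching the laptop.

```python
from typing import Callable, Iterable, List


def find_suspect_devices(
    devices: Iterable[str],
    fault_appears: Callable[[List[str]], bool],
) -> List[str]:
    """Reconnect devices one at a time and note which ones bring the fault back.

    fault_appears is a hypothetical callback. It receives the list of devices
    currently plugged in and returns True if the laptop misbehaves.
    """
    suspects: List[str] = []
    connected: List[str] = []
    for device in devices:
        connected.append(device)
        if fault_appears(list(connected)):
            suspects.append(device)
            connected.pop()  # unplug the suspect before trying the next device
    return suspects
```

A device that ends up in the suspects list is not automatically guilty. It may simply be the one that tipped an existing power problem over the edge, so repeat the test with that device plugged in on its own before you blame it.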
Remove the battery. Try the laptop on mains power alone. Remove the mains power, put the battery back in. Try the laptop on battery power alone.
Turn the computer on and enter the BIOS. Check that your hard disk is listed and is first in the boot order. I will try to get a tutorial up soon for those that need help with this, although each BIOS manufacturer has different ways of checking these settings, so it is difficult to write one definitive guide.
If you can borrow another power cable, try that next.
RAM often comes to mind at this stage. RAM can cause all the issues my laptop had, and more. I generally find it is fairly robust, and it goes bad less frequently than some would have you believe. If your system, unlike mine, can actually stay powered on you can use MemTest, a free utility you boot the same way as a live Linux OS. MemTest is not a version of Linux, though. It is a small standalone program that runs without any operating system. You boot it up and it runs an intensive routine on your RAM for as long as you let it run. It should find serious problems quickly, but you may need to run it for a few hours or more to catch random or intermittent issues.
This is the last of the non-intrusive steps. If you do not like the prospect of taking some of the internal parts out of your laptop, this is the time to take it to a repair shop. If the laptop is still under warranty, now is definitely that time.
If the repair shop is not an option and you are comfortable with the risks, read on to see my next steps.
On to the Real Work
Many people speak of component-specific symptoms, but in my experience there are no cut-and-dried symptoms for all components. For instance, identical display issues can be caused by RAM or a bad power supply, not just a bad graphics card. Actual RAM issues discovered by MemTest can stem from a bad power supply. If you replace your two sticks of RAM with one larger stick and your problem improves, that may be because the bad power supply is under less strain feeding only one stick of RAM. That is why my next step is to swap some parts.
On many laptops there are only three easily-accessible internal components: The hard disk drive, the DVD drive and the RAM. I start by reseating the RAM. Turn the computer off, remove the battery and the power cable, and hold the power button down for a few seconds. The RAM usually lives in a small opening on the bottom of the laptop. There will be one or two modules. Carefully remove these modules and reinstall them. Close the cover then power on your laptop.
If your problem persists and the laptop has two RAM modules, remove one then reboot. Then reboot with just the other module. If, like me, you have a spare good RAM module in your kit, try swapping that into the laptop. Make sure the RAM is the correct type for your system, i.e. DDR, DDR2, or DDR3.
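The reasoning behind the module swaps can be written down as a tiny Python function. Treat it as a sketch of the logic only: the function name and the module labels are made up for this example, and the results are whatever you observed while booting with one module fitted at a time.

```python
from typing import Dict, List


def suspect_modules(results: Dict[str, bool]) -> List[str]:
    """Work out which RAM modules look bad from single-module boot tests.

    results maps a module label to True if the fault appeared while that
    module was the only one fitted.
    """
    faulty = [module for module, fault in results.items() if fault]
    if len(faulty) == len(results):
        # The fault shows up whatever is fitted, so it follows the laptop
        # rather than any particular module.
        return []
    return faulty
```

For example, suspect_modules({"module 1": True, "module 2": False}) points at module 1, while a fault that appears with every module, including a known-good spare, returns an empty list and sends you looking elsewhere.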
We have now almost ruled out the RAM. Leave one module in the laptop and close the cover. We will move on to the DVD drive.
There is often just one screw holding the DVD drive in. If you turn the laptop upside down and visualise the shape of the drive you can probably guess which screw it is. It is generally towards the middle of the laptop. With the screw removed, turn the laptop on its side and give it a shake and the drive should begin to slide out. Remove it and put it aside.
Now we grab our trusty Linux again. For this step you obviously need to be booting off a USB stick because you no longer have a DVD drive. Boot the laptop into Linux and see what happens.
Next, the hard drive (HDD). The HDD should be inside a cover slightly larger than the one you removed to access the RAM. Remove the cover and look for any screws holding the HDD in place. Often the same screws you took out to remove the cover also lock the HDD in place. Gently slide the HDD away from its connector and remove it from the unit. Put it somewhere safe for now, remembering that it probably contains all your important data.
Reboot the laptop. If your problem persists, you are in trouble.
What we have done is remove every component the laptop can possibly run without. This means there is nothing unnecessarily drawing power or interfering with things in any other way. Without completely dismantling the unit to access the motherboard there is nothing left to try. Taking a laptop apart to that extent and, more importantly, putting it back together properly is not a job for the faint-hearted. A computer repair shop may be the only option now.
Me, I have never paid someone to fix my computer and I will not start now. There is one more possible cause for my problem: an overheating CPU. I do not believe this is the issue, because I have not observed the usual patterns between room temperature, system load and timing that are generally obvious with problems caused by an overheating CPU. I will dismantle the laptop, reseat the CPU and apply fresh thermal paste. Sometimes reseating a CPU can work in the same way as reseating your RAM. However, my instincts tell me this is a fried motherboard. I have more important Cave projects at hand for now, so it might take me a while to get to it, but I will post part two of this article when I am done.
I hope this article has helped people gain an insight into the process of troubleshooting hardware issues. If it saves one laptop somewhere, then it was worth my time.
I decided to try one more thing before I disassemble this laptop. See the image below.
As you can see, Killer is running, and it is no longer freezing or restarting. It is, however, not fixed.
My last option before disassembly was entering the BIOS and disabling the second CPU core. Sure enough, Killer now runs. It no longer lives up to its name running on one half of a dual-core CPU, but it goes.
This is promising, because it may imply that heat is actually the issue and disabling half of the CPU tipped the balance back in favour of the cooling system. Maybe reapplying thermal paste to the CPU heat-sink will fix things.
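One way to gather evidence for or against the heat theory before opening the case is to log temperature against system load from a live Linux session. The sketch below assumes the kernel exposes sensors under /sys/class/thermal, where each thermal_zone*/temp file holds a reading in thousandths of a degree Celsius. Not every laptop exposes these files, and the function names and the log file name are made up for this example.

```python
import glob
import os
import time
from typing import List


def read_zone_temps(base: str = "/sys/class/thermal") -> List[float]:
    """Return the temperature of each thermal zone in degrees Celsius."""
    temps: List[float] = []
    for path in sorted(glob.glob(os.path.join(base, "thermal_zone*", "temp"))):
        try:
            with open(path) as handle:
                # The kernel reports millidegrees, e.g. 45000 for 45.0 C.
                temps.append(int(handle.read().strip()) / 1000.0)
        except (OSError, ValueError):
            continue  # skip zones that cannot be read
    return temps


def log_line(base: str = "/sys/class/thermal") -> str:
    """Build one CSV line: timestamp, 1-minute load average, hottest zone."""
    load_1min = os.getloadavg()[0]
    temps = read_zone_temps(base)
    hottest = f"{max(temps):.1f}" if temps else "n/a"
    return f"{time.strftime('%Y-%m-%d %H:%M:%S')},{load_1min:.2f},{hottest}"


def log_forever(path: str, interval_seconds: float = 5.0) -> None:
    """Append a line every few seconds, forcing each one onto the disk."""
    with open(path, "a") as log:
        while True:
            log.write(log_line() + "\n")
            log.flush()
            os.fsync(log.fileno())  # so the last reading survives a hard restart
            time.sleep(interval_seconds)
```

Calling log_forever("heat_log.csv") with the file saved on a USB stick leaves a record that survives a restart. If the last few readings before each restart are consistently hot and busy, heat looks more likely. If the restarts happen at low temperature and low load, it points back towards power or the motherboard.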
As I mentioned earlier, though, symptoms can sometimes be misleading. This could still be power-related, with the reduced load from the CPU helping the system run. It could also indicate that the second core of the CPU is faulty, or that one of the motherboard buses that connect the second core is fried.
Overall, though, it does give me some hope, and in my opinion justifies the time and effort it will take to open it up and change the thermal paste. That is exactly what I will do when I get the chance.
I will continue this article then.
Bye for now. In the meantime, check out all the other projects and tutorials going on in Anth's Computer Cave.