I'm getting frequent BSODs and intermittent hard drive corruption on this setup that I purchased about 2 months ago. Very frustrating, because the problems disappear for a couple days, then my machine is unusable for a few hours, and then the problems seem to go away again. Then a few days later I end up running chkdsk and find all sorts of corruption on my hard drives.
I have run the Windows extended memory test, and that appears to be fine. Prime95 also runs for hours without any errors. All of the BIOS settings are on 'Default' or 'Automatic', except for the automatic DIMM voltage adjustment, which I turned off.
This is really frustrating, because the shop I bought the components from won't give me a replacement unless they can see some obvious failure - but the problems are intermittent, appearing on and off at an interval of several days.
Windows 7 64-bit
DX58SO. BIOS Version 4405 - SOX5810J.86A.4405.2009.1020.1419
Core i7 920.
3 x 2 GB RAM. Can't remember the brand now, and can't run cpuid because I'm reinstalling Windows for the nth time.
Are you using UEFI boot?
Brand of hard disk? I assume a Samsung in this case.
How do you shut down the computer?
To S3 (sleep), to S5 (soft off), or completely off (G3, mechanical power off)?
In this case, install no additional software: just Windows, all hardware drivers, and antivirus.
Keep in mind that some 32-bit applications do not run under Windows 7 64-bit, even in its well-known Windows XP Mode.
I was not using UEFI boot, and the SATA was in IDE mode.
Yesterday I reinstalled with UEFI enabled, and SATA in AHCI mode. I've had a crash-free 36 hours, but my realistic self says it's too good to be true.
The only things I've changed this time are:
1. Automatic DIMM voltage training set to off (it was Automatic)
2. AHCI mode turned on
3. UEFI boot
The boot drive is a 1 TB Seagate 7200.11
I usually make the PC sleep in S3, but obviously it has to restart every now and then for updates etc.
Interestingly, it seems as though the PC either boots into a good state, or a bad state. I suspect this, because if it's running fine, then it can be doing so for many days, because I always make it sleep to S3. But then a Windows Update comes, I have to restart, and boom.. it's crashing all the time. And then the crashes disappear for a while.
I wish there was some reliable tool to test HD/Controller integrity.
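In the absence of a vendor tool, a crude check can be scripted. The following is only a minimal sketch, not a real controller test: it writes random data to a temporary file, reads it back, and compares SHA-256 digests. The function name `write_and_verify` and its parameters are made up for illustration. It assumes the operating system may serve the read from its file cache, so a mismatch points at the RAM/controller/disk path as a whole rather than at any single component.

```python
import hashlib
import os
import tempfile


def write_and_verify(directory, size_mb=64, chunk_mb=4):
    """Write random data to a temp file in `directory`, read it back,
    and return True if the SHA-256 digests of both passes match."""
    chunk_size = chunk_mb * 1024 * 1024
    written = hashlib.sha256()
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            for _ in range(size_mb // chunk_mb):
                chunk = os.urandom(chunk_size)
                written.update(chunk)  # digest of what we intended to write
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the data to the device
        read_back = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                read_back.update(chunk)  # digest of what actually comes back
        return written.hexdigest() == read_back.hexdigest()
    finally:
        os.remove(path)
```

Calling `write_and_verify` in a loop on each drive during a "bad" boot and again during a "good" boot would at least show whether corruption is tied to the boot state.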
Hmm great. It's just done it again - that was 4 days of uptime (since a fresh Windows install), before my hard drives were littered with NTFS errors, and files I download recently have invalid checksums.
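One way to tell whether a bad checksum is stable (the file on disk really is damaged) or intermittent (something between the disk and the CPU is flipping bits) is to hash the same file several times. This is a sketch under assumptions: `sha256_of` and `distinct_digests` are hypothetical helper names, and repeated reads may come from the OS file cache rather than the disk.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def distinct_digests(path, rounds=5):
    """Hash the same file `rounds` times and return the set of digests seen.
    More than one digest means the data changed between reads."""
    return {sha256_of(path) for _ in range(rounds)}
```

If `distinct_digests` returns a single value that differs from the published checksum, the file itself is damaged. If it returns several values, the corruption is happening at read or compute time.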
Shame on you, Intel (if it really is the board) - you should provide a tool to verify the motherboard. Some moral equivalent of Prime95, which stresses the other components too (especially the SATA controller). I guess that's not easy to do. Anyway, since this kind of stuff happens once in 4 days, I doubt the retailer is going to be able to RMA it.. sigh.. time for an ASUS.
My power supply is a Gigabyte Odin 720 watt. I'm pretty sure that's OK. I had identical problems when I was running with a 550 watt PSU. I've tried running with and without my UPS - problems occur either way.
I do not think it is the hard drive, for this reason:
The computer appears to run perfectly for many hours/days. I can run all sorts of memory tests (Prime95, Windows 7 Memory Diagnostic, Burn-In test) for hours, without any problems. Seagate seatools tests all pass. Then, all of a sudden, for no apparent reason, every process is crashing, and the system most likely dies with a BSOD.
Then, I reboot, and run the Windows memory diagnostic, and in a matter of seconds, it says "There is a problem with your hardware, etc etc". So I turn the machine off, and I re-run the memory diagnostic with only one DIMM in (to test my 3 DIMMs each separately). Then, gradually, over the course of about 1 hour, the report of memory failure becomes less and less frequent, until the machine is completely stable again, and I can't get it to report any kind of fault whatsoever. The DIMMs that failed 30 minutes ago are now presenting with no issues at all.
If the hard drive was the only problem, then the memory diagnostic would not fail.
Could it be heat? I don't think so, since these problems occur at all times of the day, and I'm running with the case open. These problems can occur seconds into bootup, after the machine has been off for several hours. Stock cooling on the CPU and GPU.
I'm really puzzled (and frustrated!).
Am I right in thinking that since the memory controller lives on the CPU, the motherboard is essentially just a sheet of metal between the DIMMs and the CPU socket? And if so, can the motherboard be to blame for this kind of thing, or can it possibly be the CPU? I hear the CPU problems are extremely rare.
I doubt it would be the CPU, unless it was overvolted or exposed to major static. It sounds like your motherboard or memory. If you cannot exchange your parts, you may want to have the motherboard tested. First I would go through everything, though. When I first built with this board it would lock up hard every time I powered up; updating the BIOS solved my problem. My video card also had issues, but neither caused hard drive errors. I would start with updating your BIOS (there is a newer one after your version) and updating the chipset driver. If that does not work, I would find out what memory you have, to make sure the voltage, speed and timings are set correctly in the BIOS; even at default the settings could be wrong. Voltage is very important: DIMM voltage must not exceed 1.65 volts or the processor will fry. The computer probably would not even have booted, but I would reseat your CPU, make sure no pins on the socket are bent, and put a high-quality paste on the heatsink. When you boot up, did you go into the BIOS and check temps? If you are not already running temperature monitoring software, I would. I doubt heat is your problem, but it could cause these symptoms. Maybe dumb questions, but is your auxiliary motherboard power plugged in? Memory in the 3 blue slots? PCI Express card in the top slot? Also, when you reinstalled Windows, did you reformat and do a clean install?
Yeah I don't think it's heat - temperature readings are dead normal when the thing crashes. And I have updated to the latest BIOS - all it did was break keyboard support outside of Windows. I don't think it's RAM, because when the system is in a good mood, I can run RAM tests until I'm blue, without a problem. All timings are by SPD, and they are conservative.
Over the weekend I ran 8 instances of Prime95 (one for each hyperthread), for 22 hours non-stop. No errors. Then tonight, about an hour after bootup, I get a BSOD. I think if I get an annual bonus this financial year end, I'm just gonna start replacing parts until I've discovered the culprit(s). The only question is - which is the most likely candidate?
1. Geforce 9800GT (65 nm, A2 revision)
2. Core i7 920 (D0 revision)
4. 3 x 2GB Apacer DIMMs (PC3-10700H) - running at 533 / 8/8/8/20/59. Part numbers 78.A1GC6.BN1.
My gut feeling is to start with the motherboard.
You did not mention whether you have checked the Windows Event Viewer.
If the Event Viewer reports ATAPI timeouts, then it's hard disk related.
What I would do is: download a bootable Linux live CD image and burn the CD.
Disconnect the hard drive and boot the computer from the Linux live CD.
Use the Linux distro for longer than you typically ran Windows before the crashes.
Now: if the Linux distro also crashes, and no memory test application finds any RAM errors, then it looks like a board problem.
Don't worry, the Linux distro looks a bit different, but the GUI is user friendly.
Booting will take much longer, though, since the CD drive is much slower than the hard disk.
I would like to add to Sergei's reply: ATAPI errors can also be related to the CD/DVD-ROM drive, the SATA controller, voltage, or even a bad cable, not just a hard drive problem. If you have those errors, you could try moving your SATA cables around and rebooting. Also, SeaTools should have found any hard drive errors.
It was my RAM. A couple months ago, I had actually sent that particular DIMM back to the shop, but the idiots in the retailer's back end sent it back to me saying they had run some or other test "7 times", and the DIMM was just fine. That is why I suspected the MB. Anyway, I finally sat down over the weekend and rebooted the machine into Memtest86 about 60 times, until I had statistically significant proof that it was that particular DIMM.
So what I have learnt is that the RAM either boots into a good state, or it boots into a bad state. If the memory test fails, it does so within the first 20 seconds. If not, then the test will run for hours without fault. This was an interesting find.
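This also explains why a handful of test runs at the shop proved nothing. As a rough sketch, assume each cold boot independently lands in the bad state with some probability `p_bad`. The 10% figure used below is purely illustrative, not a measured value from this machine. Then the chance that a faulty DIMM passes every one of `boots` tests is `(1 - p_bad) ** boots`.

```python
import math


def prob_all_pass(p_bad, boots):
    """Probability that a faulty DIMM passes `boots` independent boot tests,
    if each boot lands in the bad state with probability `p_bad`."""
    return (1.0 - p_bad) ** boots


def boots_needed(p_bad, confidence=0.95):
    """Smallest number of boots so that at least one bad boot is seen
    with probability >= `confidence`."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_bad))


if __name__ == "__main__":
    p = 0.10  # assumed, illustrative bad-boot rate
    print(f"7 boots all pass:  {prob_all_pass(p, 7):.3f}")
    print(f"60 boots all pass: {prob_all_pass(p, 60):.4f}")
    print(f"boots for 95% detection: {boots_needed(p)}")
```

Under that assumed rate, seven clean runs would happen almost half the time even with a bad DIMM, while 60 reboots leave very little room for doubt.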