It seems very strange that a single failing fan would cause the system to reset, given the redundant fans in this system.
If the CPU is overheating and thermal tripping, yes, it could, but I am not sure that is the case.
If you press F2 to go to BIOS setup and let it sit for a few minutes, does it reboot?
Can you try booting into Windows in Safe Mode (F8 after the POST completes)?
In both these cases, if the reboot is caused by CPU thermals, it should still reboot.
If it does not reboot in either case, I would suspect a driver was updated and is causing the crash.
In Safe Mode you can open the Event Log and see if anything is being reported.
If I boot to Safe Mode it does the same: it reboots at the Windows loading screen.
If I go to setup, it runs in setup mode and I can stay in the BIOS for as long as I like.
I ran a consistency check on the RAID yesterday. It ran for about 14 hours, but it made no difference.
I have since discovered that it is not the fan causing the reboot. You are right, it is something in the drivers. They were installing MS SQL 2008 Express R2 and the post-install reboot failed, so they switched the power off after about 2 hours of hanging. These stories always come out later.
So you are right, these are two separate issues that just happened to crop up at exactly the same time. Murphy at it again.
I am going to do as suggested above: put in two new drives, get it running, recover any data (even though there is a backup), and then remove the new drives and try a Windows rebuild. My only concern is that I have to boot off the Intel CD, which loads Linux and the drivers before it goes to Windows. I cannot remember those screens and the steps, so I am nervous about trying it without some sort of additional backup. In my limited experience you can never have enough backups. We only have one from Friday night, and if that fails we are in real trouble.
Now I cannot get the OS loaded.
New RAID1 configured.
I am using the Intel Server Board S5000PAL bootable disk.
It creates everything, and when it restarts it should boot to the RAID drive, but instead it gives this error:
0123Divide by Zero (86AC:1430)
ROM-DOS Fatal Error! Internal Error! (3.461)!
System Halted .....
What do I do now?
It looks like it is still booting from the CD and (hopefully) the CD just has a scratch causing the error.
Since you are rebuilding anyway, you may want to update all the drivers and the BIOS.
You can do a standard Windows install with the latest drivers from the website and you don't need the CD.
If you're installing W2008, all the drivers are native (although some updated versions may be posted).
If you're doing W2003, you will need to press F6 at the beginning to load the RAID driver from floppy.
1) Update BIOS, BMC, FRUSDR if needed
2) Build your RAID in the RAID BIOS screen (Ctrl+C or Ctrl+G when booting; I always forget which)
3) Boot to BIOS set-up and make sure your CD is first boot device and RAID is second.
4) Insert the Windows CD and boot from the CD.
5) Press F6 at the beginning of the CD load to add the RAID driver from floppy.
6) Select the RAID drive as the OS drive.
7) Answer the rest of Microsoft's questions as they come up.
8) When the install is done, go to Device Manager and install the chipset, NIC and video drivers.
The Intel drivers on the web were faulty, or rather, the firmware in particular. I contacted Intel, who mailed new firmware and drivers, and that got the system running on the new hard drives.
You were correct, the fan had nothing to do with the failure.
It seems that one of the RAID drives failed. I did not know this and ran a consistency check, which copied old data back onto the good drive. So all the data was then 4 days old, back at the point when the first drive failed. Luckily we had a backup of the data from the night before the second drive got corrupted and took the system down. So we used drive recovery software to recover some lost data (the backups were not perfect either) and then restored the last backup.
We ordered a new fan, but six weeks later it still has not arrived.
At least the system is up and running again.
The old drives seem OK. So either one RAID drive got corrupted and the RAID stopped working, which would explain why the data was 4 days old, or the firmware failed, causing the RAID to stop copying data to that drive for 4 days.
Nobody knew that there was a drive failure for 4 days.
How does one know this?