Thank you for the answer. Here is the log file from a moment when processor usage was at its minimum:
And here is one from when the processors were at 100% load (around 30-60 s before the shutdown procedure):
As you can see, 'P1 Therm Margin' is -20. When this log was made, Everest was showing 76°C; a few seconds before the shutdown it was showing 81°C. I don't know how reliable Everest is, but according to it 'P1 Therm Margin' was still negative. Too bad the logs contain no information about the processor temperature, which would let me figure out what the limit is.
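For a rough idea of the limit, the two readings can be combined. This is only a minimal sketch: it assumes the margin is the current temperature minus the trip point (so a negative margin means the processor is still below it) and that Everest and the log measure the same temperature. Neither assumption is confirmed here, and the function name is made up for illustration.

```python
def implied_limit_c(temperature_c, therm_margin_c):
    """Estimate the trip point, ASSUMING margin = temperature - limit."""
    return temperature_c - therm_margin_c


# Values from the post: Everest showed 76 °C while the log showed a margin of -20.
print(implied_limit_c(76, -20))
```

Under those assumptions the limit would be somewhere around 96°C, so 81°C just before the shutdown would still be well below it.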
Maybe it is something about voltages?
I managed to make this log 15 s before the shutdown procedure:
This is what happened: I started loading the processors at 100%, made a log after 90 s, stopped the load 7 s later, and 8 s after that the shutdown procedure started anyway.
OK, now it happened again while I was in the BIOS, so this is definitely not a processor temperature problem.
The only thing I did before it started happening was to enable only 1 core (instead of all 4) in the BIOS. After a few days I set it back to all cores, and then it started happening. Now I've installed the newest BIOS and set everything to default.
Maybe there is something wrong with the power supply, but how can I be sure? (I don't have another PS.)
If this is a power supply issue, then I'm very unlucky, because about 10 months ago I had a similar problem with another world-class PS in the same system...
The sysinfo.log does not look complete.
The tool should dump data on the following 25 areas, but your logs show only the first 2.
Possibly an error in the BMC / FRUSDR flashing (just guessing here).
You may want to try reflashing and then run the tool again to see if you get the full dump.
1. Platform Firmware Inventory
3. Sensor Data Records
4. BMC SEL (IN HUMAN READABLE FORM)
5. BMC SEL (IN HEX FORM)
6. Base Board FRU
7. System BMC Boot Order
8. BMC User Settings
9. BMC LAN Channel Settings
10. BMC SOL Channel Settings
11. BMC Power Restore Policy Settings
12. BMC channel settings
13. SMBIOS Type 1, Type 2, Type 3
18. HARD Drive
19. Operating System Information
20. Device Manager Information (a.k.a drivers)
21. List Of Software Installed
22. Operating System Event Log
23. PCI Bus Device Information
24. RAID settings and RAID log
25. BIOS Settings (per BIOS SETUP F2 Screen).
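To check a dump against this list without reading it by eye, something like the following could be used. It is a minimal sketch that assumes the log prints each section title verbatim somewhere in its text, which is not confirmed here; the function name is made up, and only a few titles from the list above are included, so extend it as needed.

```python
# A few section titles copied from the list above; add the rest as needed.
EXPECTED_SECTIONS = [
    "Platform Firmware Inventory",
    "Sensor Data Records",
    "Base Board FRU",
    "BMC User Settings",
    "Operating System Event Log",
    "PCI Bus Device Information",
]


def missing_sections(log_text, expected=EXPECTED_SECTIONS):
    """Return the expected titles that never appear in the log (case-insensitive)."""
    lowered = log_text.lower()
    return [title for title in expected if title.lower() not in lowered]


# Tiny stand-in for a truncated dump; a real one would be read from sysinfo.log.
sample = "Platform Firmware Inventory\n...\nSensor Data Records\n...\n"
print(missing_sections(sample))
```

Any title that comes back is a section the dump probably stopped before reaching.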
Well, it seems that I've got the answer. It was an IOH temperature issue. The question is why the System Information Retrieval Utility shows that everything is OK by a big margin.
-33.000000 Degrees Celsius IOH Therm Margin
The SIT tool takes a snapshot of the system at that moment; it does not reflect what happens during run time.
The SEL log in the SIL dump would show whether any run-time events were logged.
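Since a single dump is only a snapshot, one way to get closer to the run-time picture is to take several dumps under load and pull the margin readings out of each. A minimal sketch, assuming sensor lines look like the one quoted earlier in the thread (`-33.000000 Degrees Celsius IOH Therm Margin`); the function names and the 10-degree warning threshold are made up for illustration.

```python
import re

# Matches lines such as "-33.000000 Degrees Celsius IOH Therm Margin".
SENSOR_LINE = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s+Degrees Celsius\s+(.+?)\s*$")


def parse_margins(dump_text):
    """Return {sensor name: value} for lines that name a 'Therm Margin' sensor."""
    readings = {}
    for line in dump_text.splitlines():
        match = SENSOR_LINE.match(line)
        if match and "Therm Margin" in match.group(2):
            readings[match.group(2)] = float(match.group(1))
    return readings


def close_to_limit(readings, threshold=-10.0):
    """Names of sensors whose margin is above the threshold, i.e. closer to zero."""
    return sorted(name for name, value in readings.items() if value > threshold)
```

Running `parse_margins` on each dump and comparing the results shows how quickly a margin shrinks under load, which a single idle reading cannot.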