i7 4790 mce hardware errors and kernel panics

HBana
Beginner

All,

This box has been up and running for over 2 years with no issues.

All of a sudden it started to kernel panic and reboot every 5-7 days. I checked the usual suspects like temperature and fans, as well as CPU and memory usage, and everything looks good.

During debugging I disconnected everything and left only the motherboard, CPU and power supply. I also tried running with one RAM module at a time in different memory slots. My last test used a completely different set of RAM from a different working machine. I still got these mce hardware errors no matter what.

Memtest throws "Unexpected Interrupt – Halting CPU0" within a few seconds. This happens even with known-good RAM from a different machine.

I also updated the kernel and microcode, but that didn't help either.

I'm running out of ideas other than to RMA either the CPU or the motherboard, or both.

I will take any suggestions you may have.

Thank you!

Hubert

CPU: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz

Motherboard: Gigabyte H97N-WIFI

OS: Debian 9.4

Kernel: 4.15.0-0.bpo.2-amd64 #1 SMP Debian 4.15.11-1~bpo9+1 (2018-04-07)

Apr 18 19:20:26 vmhost01 kernel: mce: CPU supports 9 MCE banks

Apr 18 19:20:52 vmhost01 kernel: mce: [Hardware Error]: Machine check events logged

Apr 18 19:20:52 vmhost01 kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 6: 8c2000400004117a

Apr 18 19:20:52 vmhost01 kernel: mce: [Hardware Error]: TSC 2c8475c502 ADDR 3e321e900 MISC 4936e020086

Apr 18 19:20:52 vmhost01 kernel: mce: [Hardware Error]: PROCESSOR 0:306c3 TIME 1524093652 SOCKET 0 APIC 0 microcode 1c

Apr 29 11:42:07 vmhost01 kernel: mce: CPU supports 9 MCE banks

Apr 29 11:42:10 vmhost01 kernel: mce: [Hardware Error]: Machine check events logged

Apr 29 11:42:10 vmhost01 kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 6: 8c20004000041152

Apr 29 11:42:10 vmhost01 kernel: mce: [Hardware Error]: TSC f90abaa67 ADDR 19655e900 MISC 3020020086

Apr 29 11:42:10 vmhost01 kernel: mce: [Hardware Error]: PROCESSOR 0:306c3 TIME 1525016530 SOCKET 0 APIC 0 microcode 22

Apr 29 11:57:28 vmhost01 kernel: mce: CPU supports 9 MCE banks

Apr 29 11:57:55 vmhost01 kernel: mce: [Hardware Error]: Machine check events logged

Apr 29 11:57:55 vmhost01 kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 6: 8c2000400004117a

Apr 29 11:57:55 vmhost01 kernel: mce: [Hardware Error]: TSC 2eb7df22ab ADDR 3f22fe900 MISC 4936e020086

Apr 29 11:57:55 vmhost01 kernel: mce: [Hardware Error]: PROCESSOR 0:306c3 TIME 1525017475 SOCKET 0 APIC 0 microcode 22

May 01 09:21:34 vmhost01 kernel: mce: CPU supports 9 MCE banks

May 01 09:21:44 vmhost01 kernel: mce: [Hardware Error]: Machine check events logged

May 01 09:21:44 vmhost01 kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 6: 8c20004000041152

May 01 09:21:44 vmhost01 kernel: mce: [Hardware Error]: TSC 1571a9a68c ADDR 41eb7e900 MISC 3020020086

May 01 09:21:44 vmhost01 kernel: mce: [Hardware Error]: PROCESSOR 0:306c3 TIME 1525180904 SOCKET 0 APIC 0 microcode 22

May 01 09:44:17 vmhost01 kernel: mce: CPU supports 9 MCE banks

May 01 09:44:17 vmhost01 kernel: mce: [Hardware Error]: Machine check events logged

May 01 09:44:17 vmhost01 kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 6: 8c20004000041152

May 01 09:44:17 vmhost01 kernel: mce: [Hardware Error]: TSC 189820f452 ADDR 405b7e900 MISC 3022020086

May 01 09:44:17 vmhost01 kernel: mce: [Hardware Error]: PROCESSOR 0:306c3 TIME 1525182257 SOCKET 0 APIC 0 microcode 22

May 01 09:52:29 vmhost01 kernel: mce: CPU supports 9 MCE banks

May 01 09:52:29 vmhost01 kernel: mce: [Hardware Error]: Machine check events logged

May 01 09:52:29 vmhost01 kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 6: 8c2000400004117a

May 01 09:52:29 vmhost01 kernel: mce: [Hardware Error]: TSC 0 ADDR 3c021e900 MISC 4936e020086

May 01 09:52:29 vmhost01 kernel: mce: [Hardware Error]: PROCESSOR 0:306c3 TIME 1525182746 SOCKET 0 APIC 0 microcode 22
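
For anyone reading these entries, the 64-bit value after "Bank 6:" is the bank's status register, and it can be split into fields by hand. The sketch below is a rough decoder, not an official tool. It assumes the IA32_MCi_STATUS layout and the "cache hierarchy" compound error code format (000F 0001 RRRR TTLL) as described in the machine-check chapter of the Intel SDM, so please verify against the manual before relying on it. The function names `decode_status` and `mce_time` are made up for this example.

```python
from datetime import datetime, timezone

# Assumed bit positions of the architectural flags in IA32_MCi_STATUS.
FLAGS = {"VAL": 63, "OVER": 62, "UC": 61, "EN": 60,
         "MISCV": 59, "ADDRV": 58, "PCC": 57}

# Assumed sub-field encodings of a cache hierarchy error code.
REQUEST = {0: "ERR", 1: "RD", 2: "WR", 3: "DRD", 4: "DWR",
           5: "IRD", 6: "PREFETCH", 7: "EVICT", 8: "SNOOP"}
TYPE = {0: "instruction", 1: "data", 2: "generic"}
LEVEL = {0: "L0", 1: "L1", 2: "L2", 3: "generic"}


def decode_status(status):
    """Split a raw MCi_STATUS value into a dict of fields."""
    fields = {name: bool((status >> bit) & 1) for name, bit in FLAGS.items()}
    fields["corrected_count"] = (status >> 38) & 0x7FFF  # bits 52:38
    fields["model_code"] = (status >> 16) & 0xFFFF       # bits 31:16
    code = status & 0xFFFF                               # bits 15:0
    fields["mca_code"] = code
    # Cache hierarchy pattern: bits 15:13 are zero, bit 12 (F) is ignored,
    # bits 11:8 are 0001.
    if (code & 0xEF00) == 0x0100:
        fields["cache"] = (
            REQUEST.get((code >> 4) & 0xF, "reserved"),  # RRRR
            TYPE.get((code >> 2) & 0x3, "reserved"),     # TT
            LEVEL[code & 0x3],                           # LL
        )
    return fields


def mce_time(epoch_seconds):
    """Convert the TIME field (seconds since the Unix epoch) to UTC."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).isoformat()


if __name__ == "__main__":
    for raw in (0x8C2000400004117A, 0x8C20004000041152):
        print(hex(raw), decode_status(raw))
    print(mce_time(1524093652))
```

If the assumed layout is right, both status values in the log have VAL set and UC clear (a logged, corrected error) with a cache hierarchy code at level L2: 0x117a as an eviction and 0x1152 as an instruction read. That is only a reading of the register bits, not a diagnosis. The TIME field of the first entry converts to 23:20:52 UTC on 2018-04-18, which matches the syslog timestamp if the host clock is four hours behind UTC.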

7 Replies
HBana
Beginner

I just swapped the PSU, as this was suggested in another thread, but that didn't make a difference. I ran MemTest and within a few seconds got "Unexpected Interrupt – Halting CPU0". This was with all peripherals disconnected: only the motherboard, CPU, RAM and the new PSU.

HBana
Beginner

I tried to run the Intel Processor Diagnostic Tool under Fedora. It starts up fine, then a few seconds later everything freezes and the box reboots itself. I don't know what else to try. I wish I had another Haswell CPU around to test with.

Please let me know if any of you have any ideas.

Thank you!

idata
Employee

Hello hubert.banas,

I understand that you are experiencing unexpected reboots with your machine.

In order to assist you better, please let us know your memory part number.

We would also like you to install an OS supported by your motherboard and CPU on a spare drive, to check whether you get a BSOD or random reboots there. Any error code you capture may help us debug your issue.

I hope to hear from you soon.

Best Regards,

Diego S.

HBana
Beginner

This box has been running for over two years with two 8GB Crucial CT102464BA160B.C16FPR modules. I suspected issues with the RAM, so I got hold of two 4GB MT16JTF51264AZ-1G6M1 modules just for testing. The box freezes and reboots with these modules as well, so it is not a RAM issue.

Which OS would you like me to install, and what kind of testing would you like me to do? As I mentioned in my other post, I already ran the Intel Processor Diagnostic Tool under a fresh Fedora 28 Xfce install. The test froze and rebooted the box within 20-30 seconds of hitting the "Start" button.

As for error codes, the only errors I captured are the kernel panics and mce hardware errors, which you can review in my first post.

Thank you!

idata
Employee

Hello hubert.banas,

Thank you for your response.

In this case, the supported operating systems for the graphics controller are:

Windows 7*
Windows 8*
Windows 8.1*
Windows® 10*

Feel free to select one, as long as the other components in your system support it too.

Regarding the tests, we would like you to run another diagnostic with our Intel® Processor Diagnostic Tool.

Here you will find the Windows* version:

https://downloadcenter.intel.com/download/19792/Intel-Processor-Diagnostic-Tool

I hope to hear from you soon.

Best Regards,

Diego S.

HBana
Beginner

Hi Diego,

Thank you for getting back to me on this.

My friend let me use his CPU and I was able to run some stress tests today. This was using my motherboard and my RAM with his CPU. Everything ran stable.

I already have case #03390202 open for my CPU warranty. I would appreciate it if you could expedite the replacement.

Thank you!

idata
Employee

Hello hubert.banas,

Thank you for your response.

I can see that there is a ticket being handled by the warranty department. We are going to work on this so the case can be resolved as soon as possible.

Best Regards,

Diego S.