I have an i7-3612QE system that kernel panics fairly consistently a couple of minutes after a certain application starts running. After the kernel panic, the machine reboots. When I turn off turbo mode in the BIOS, the panics happen less frequently. All other BIOS settings are at their defaults, with no overclocking or anything fancy.
-Can be reproduced by running a streaming application that uses ~250% CPU; temps are a little high, floating around 68-71 degrees
-sysbench runs fine with 8 threads and pushes CPU usage up to ~800%, with no kernel panics; temps remain below 70 degrees
-MemTest did not report any errors
-Intel Processor Diagnostic Tool passed
-Tried swapping RAM
-Happens on all of our machines, not just one (which likely rules out a single bad processor)
-Able to mitigate most of the kernel panics and reboots by disabling Turbo mode (this is unacceptable as a fix, I'm just including it for debugging purposes)
-Also able to mitigate the kernel panics and reboots by changing the CPU frequency scaling_governor from ondemand to conservative. According to the kernel documentation, conservative "gracefully increases and decreases the CPU speed rather than jumping to max speed the moment there is any load on the CPU" (https://www.kernel.org/doc/Documentation/cpu-freq/governors.txt)
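For anyone trying the governor change from the last bullet, here is a minimal Python sketch that reads and sets the governor through the standard cpufreq sysfs files (`/sys/devices/system/cpu/cpuN/cpufreq/scaling_governor`). The function names `read_governors` and `set_governor` are made up for this example, writing needs root, and the change does not survive a reboot.

```python
from pathlib import Path

# Standard cpufreq sysfs location on Linux.
CPU_ROOT = Path("/sys/devices/system/cpu")
GOVERNOR_GLOB = "cpu[0-9]*/cpufreq/scaling_governor"


def read_governors(root=CPU_ROOT):
    """Return {cpu_name: governor} for every CPU that exposes cpufreq."""
    governors = {}
    for gov_file in sorted(Path(root).glob(GOVERNOR_GLOB)):
        cpu_name = gov_file.parent.parent.name  # e.g. "cpu0"
        governors[cpu_name] = gov_file.read_text().strip()
    return governors


def set_governor(governor, root=CPU_ROOT):
    """Write the governor name to every CPU; returns the CPUs changed.

    Requires root on a real system. No check is made here that the
    governor is listed in scaling_available_governors.
    """
    changed = []
    for gov_file in sorted(Path(root).glob(GOVERNOR_GLOB)):
        gov_file.write_text(governor + "\n")
        changed.append(gov_file.parent.parent.name)
    return changed

# Example (as root):
#   print(read_governors())
#   set_governor("conservative")
```

The `root` parameter exists only so the functions can be pointed at a scratch directory for testing instead of the live sysfs tree.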
Here is what I could copy down from the kernel panic on the monitor. The messages sometimes vary slightly, but the TSC and PROCESSOR messages are almost always the same.
[Hardware Error]: TSC 6e496d96062
[Hardware Error]: PROCESSOR 0:306a9 TIME 1418929330 SOCKET 0 APIC 3 microcode 12
[Hardware Error]: Run the above through 'mcelog --ascii'
[Hardware Error]: CPU 0: Machine Check Exception: 5 Bank 4: b200000000100402
[Hardware Error]: RIP !INEXACT! 10: {intel_idle+0xb9/0x119}
[Hardware Error]: TSC 148c99828a0
[Hardware Error]: PROCESSOR 0:306a9 TIME 1418929330 SOCKET 0 APIC 3 microcode 12
[Hardware Error]: Run the above through 'mcelog --ascii'
[Hardware Error]: Some CPUs didn't answer in synchronization
[Hardware Error]: Machine check: Processor context corrupt
Kernel panic - not syncing: Fatal machine check on current CPU
Pid: 0, comm: swapper/3 Tainted: P M 0 3.2.0-4-amd64 #1 Debian 3.2.51-1
Call Trace: ...
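Running the log through `mcelog --ascii` is the proper way to decode it, but the raw values can also be split by hand. The sketch below is an illustration only: it assumes the architectural IA32_MCi_STATUS bit layout (VAL, OVER, UC, EN, MISCV, ADDRV, PCC in bits 63 down to 57, model-specific code in bits 31:16, MCA error code in bits 15:0) and the usual CPUID family/model/stepping encoding. The function names are hypothetical, and the meaning of the model-specific code for this CPU is not decoded.

```python
# Flag bits of IA32_MCi_STATUS (architectural layout, assumed here).
FLAG_BITS = {
    "VAL": 63,    # register contains valid error information
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # uncorrected error
    "EN": 60,     # error reporting was enabled
    "MISCV": 59,  # MCi_MISC register is valid
    "ADDRV": 58,  # MCi_ADDR register is valid
    "PCC": 57,    # processor context corrupt
}


def decode_mci_status(status):
    """Split a 64-bit bank status value into flags and error codes."""
    fields = {name: bool((status >> bit) & 1) for name, bit in FLAG_BITS.items()}
    fields["model_specific_code"] = (status >> 16) & 0xFFFF
    fields["mca_error_code"] = status & 0xFFFF
    return fields


def decode_cpuid_signature(sig):
    """Decode the CPUID signature printed after 'PROCESSOR 0:'."""
    family = (sig >> 8) & 0xF
    model = (sig >> 4) & 0xF
    stepping = sig & 0xF
    if family in (0x6, 0xF):
        # Extended model bits are prepended for family 6 and 15.
        model |= ((sig >> 16) & 0xF) << 4
    if family == 0xF:
        family += (sig >> 20) & 0xFF
    return {"family": family, "model": model, "stepping": stepping}


if __name__ == "__main__":
    # Values copied from the panic output above.
    for name, value in decode_mci_status(0xB200000000100402).items():
        print(name, hex(value) if isinstance(value, int) and not isinstance(value, bool) else value)
    print(decode_cpuid_signature(0x306A9))
```

For the Bank 4 status above this gives VAL, UC, EN and PCC set, which is consistent with the "Processor context corrupt" line, and an MCA error code of 0x0402. As far as I can tell from the Intel SDM, codes in that range fall under the "internal unclassified" category, but treat that reading as an assumption and confirm it with mcelog.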
Also, here is a Super User post ("Kernel Panic from overheating?") with some suggestions that I tried: http://superuser.com/questions/854199/kernel-panic-from-overheating?noredirect=1#comment1130272_854199
Any ideas for what to try next?
Hello ejo4041,
May I know which applications you are running when the issue appears? Which Linux distro are you using?
Kevin M
The application is FFmpeg. The Linux Distro is Debian 3.2.51-1.
Kernel: 3.2.0-4-amd64 #1 SMP Debian 3.2.51-1 x86_64 GNU/Linux
Debian Version: 7.2
Here is a screenshot of a kernel panic that just happened while the machine was idle.
Thank you for sharing this information.
My recommendation is to post your query to the Debian community so that other users can share their experience with the relevant commands.
Here is the link:
https://www.debian.org/support
Kevin M