Intel® Processors, Tools, and Utilities

Skylake hard locks seemingly when IDLE / CStates ON

GGlyn
Novice

Others and I have been struggling with a different hard lock problem with Skylake, and we have run out of options other than to assume it is a CPU fault/issue. Does anyone have any comments or feedback?

The long thread is over here, but I will summarise: http://www.tomshardware.co.uk/forum/id-2830772/skylake-build-randomly-freezing-crashing/page-7.html (New Skylake Build Randomly Freezing/Crashing, page 7)

Our systems hard lock randomly, but for the most part it appears to happen when IDLE or close to IDLE. The hard lock can come when playing video, browsing (reading), or most often when not interacting at all. Some (like myself) noted that the locks seem to occur more often during a large amount of disk IO (a partition move, for example, or copying 40+ GB of data disk to disk).

Hard locks happen in either Windows (7,8,10) or Linux.

With CStates on, the crashes are frequent, usually within 10 minutes of going IDLE. With CStates disabled completely, most find that the system remains "mainly" stable for days. Generally, no one has reported locks when gaming or under high CPU load (so I do not think this is the Prime95 issue, but since compilers and drivers use the latest instructions reported by CPUID, I can't be sure).

The symptoms happen across many different motherboard manufacturers, RAM types and setups. We have removed or replaced every component, and the only change that seems to make a difference, for a few of us, is the CPU. Turning CStates off in the BIOS reliably improves stability significantly, but it does not fix the problem.

I changed the CPU (after trying everything else) and find the system now mostly stable with CStates ON. I must be clear that my system is otherwise 100% identical: CPU #1 crashes all the time and must have CStates off, while CPU #2 does not crash so often even with CStates ON. One may argue this is a fault in my PC, but with so many people having the exact same issue I can't believe it is an isolated defect. And with it tied so specifically to CStates/SpeedStep (a known new part of Skylake's architecture), I remain convinced that this is something real.

Others were not so lucky: a replacement CPU did not make the situation better, even with batches from different factories etc.

My new CPU has hard locked but not when IDLE like the last one (not yet anyway). Mine can lock when under full load, video transcoding, but not more than 1 in 10 sessions. Cstates ON seems equally stable on this CPU as OFF.

Thanks for any suggestions you can give.

393 Replies
Allan_J_Intel1
Employee

Thanks for the information.

Was the BIOS updated to the latest microcode?

CStates are connected directly to the BIOS.

Allan.

GGlyn
Novice

Thank you for taking the time to answer.

I will collect the information and ask the others with the problem to return their versions too (everyone has different boards and everyone updated to their respective "latest" BIOS as part of diagnosis).

Glyn

GGlyn
Novice

Hello,

OK, I have some feedback.

Everyone suffering the CState freezing seems to have Microcode 39. They all have the latest BIOS from their respective vendors, most of them beta BIOSes. My old i7 6700K had the same problems, which is why I am involved with the others, and changing only my CPU to another identical i7 6700K fixed them for me; I have CStates ON with reasonable stability now. However, my CPU is at Microcode 55. Since I tested the old CPU and the new one with zero other changes (same BIOS, same MB etc.), and since the old one hard locked all the time with CStates ON and the new one doesn't, I am left confused. I do get random hard locks under transcode load (more than the load in the Intel diagnostic tool), but my hope is that these locks are the other known issue with Skylake, and I don't want to confuse the two issues here. If microcode comes from the BIOS, why would the OLD CPU suffer CState locks and this one not, if this is a microcode issue? Strange, I think, but equally strange that they all have 39 and I now don't.

Lots of people report that changing RAM makes things more stable; some say increasing vCore to 1.2 does. But in all cases they still leave CStates OFF, so I honestly think that these results are individual and that not enough evidence exists to say whether any of this is a cure or just a lucky hit.

Is there a way to deliver new microcode (in OS driver form) so that they can test it?

The Intel Diagnostic tool passes on their systems and mine. With CStates ON on their systems it still gets through each time (the crashes happen mostly when IDLE).

Thank you very much for your time and feedback. A lot of very fed up people monitor that thread and I would not like to guess how many are there but not writing.

Glyn

DThif
Beginner

@Skokie: thanks for taking the time to write this here.

You can add this thread to the conversation.

http://forum.asrock.com/forum_posts.asp?TID=1712&PN=1&title=new-asrock-z170-pro4-randomly-freezing (New ASRock Z170 Pro4 - Randomly freezing, ASRock Forums, page 1)

I'm not sure how to check my microcode and what CStates are though... But I can confirm my MOBO has the latest ASRock Bios.
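
For anyone wanting to compare microcode revisions on Linux (the hard locks are reported there too), here is a minimal sketch. It assumes an x86 kernel that lists a `microcode` field per logical CPU in `/proc/cpuinfo`; the helper name `microcode_revisions` is made up for this example and is not part of any tool mentioned in the thread.

```python
from pathlib import Path
from typing import Dict, Optional


def microcode_revisions(cpuinfo_text: str) -> Dict[int, int]:
    """Map logical CPU number -> microcode revision parsed from /proc/cpuinfo text."""
    revisions: Dict[int, int] = {}
    current_cpu: Optional[int] = None
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "processor":
            current_cpu = int(value)
        elif key == "microcode" and current_cpu is not None:
            # Assumption: the kernel prints the revision in hex, e.g. "0x55".
            revisions[current_cpu] = int(value, 16)
    return revisions


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        for cpu, rev in sorted(microcode_revisions(cpuinfo.read_text()).items()):
            print(f"cpu{cpu}: microcode revision {rev:#x} (decimal {rev})")
```

Different tools may show the same revision in hex or in decimal, so numbers are best compared when they come from the same tool on each machine.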

DThif
Beginner

Any update Allan?

Is Intel investigating the claims? Do you need more information from us to help identify the problem?

Allan_J_Intel1
Employee

I got the following notification from engineering team: Intel has identified an issue that potentially affects the 6th Gen Intel® Core™ family of products. This issue only occurs under certain complex workload conditions, like those that may be encountered when running applications like Prime95. In those cases, the processor may hang or cause unpredictable system behavior. Intel has identified and released a fix and is working with external business partners to get the fix deployed through BIOS.

Allan.

AMohr1
Novice

In which c-state does the CPU freeze?

DThif
Beginner

Thanks for your reply Allan.

However, could you indicate that our problem occurs at IDLE, when the processor is doing close to nothing, and not under a "complex workload"? To me, these seem to be 2 different problems, especially since Skokie wrote above, 2 posts ago: "New BIOSes for the Prime95 fixes are starting to appear. The initial (early) feedback is that it does not help in the CStates/IDLE hard locks."

We just want to be sure that engineering is investigating our problem too...

AMohr1
Novice

It would be easier if it were clear which C-state is reached.

But a hard lock is caused if the FETs in the cores do not switch over.

1. Supply voltage too low: inherent noise higher than the switching voltage.

2. The clock frequency is too low: I don't think so.

AMohr1
Novice

Have you installed the latest BIOS for your mainboard?

AMohr1
Novice

Windows 7 may prevent a core from becoming completely idle.

Solution for that:

1. A lowest level for the supply voltage: set by the BIOS.

2. Restart core by integrated circuit in the processor itself.

a) Check the status of core by minimum calculating routine (="idle routine")

b) reset single core

What is done when the processor is started at power-up ("computer on")?

GGlyn
Novice

Hi,

OK, how would one know which CState? We can run the Intel diagnostic CStates power tool (it displays the current CStates as percentages), but at the point of a hard lock, how would one read from it which state the CPU is currently in, and on which core?

Since the CPU sits mainly (85%) in C7 and never in C8 or above according to the tool, is it statistically safe to assume it is C7 or the switch into it? Many CStates involve 0 V, so your FET idea would not seem to point to one state in particular, but rather to the flipping between them?

Since the flipping between states happens tens of thousands of times a second (a guess), why would the crash happen so rarely (i.e. one crash per 10 minutes of that flipping seems a LONG time)?

Sorry, my expertise at this level is next to zero, so it's impossible for me to give you detailed diagnostic feedback without some instruction.
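
On Linux, one rough way to approach the "which CState" question is to read the per-core idle-state counters the kernel exposes under `/sys/devices/system/cpu/cpuN/cpuidle/stateM/`. The sketch below assumes the cpuidle sysfs files `name` and `time` (accumulated microseconds) are present; `cstate_residency` is a hypothetical helper name. These are the core idle states the OS requested, which are not necessarily the package states shown by the Intel tool.

```python
from pathlib import Path
from typing import Dict

# Assumed location of the Linux per-CPU sysfs directories.
CPU_ROOT = Path("/sys/devices/system/cpu")


def cstate_residency(cpu_dir: Path) -> Dict[str, float]:
    """Return {idle state name: percent of the accumulated idle-state time} for one CPU."""
    times: Dict[str, int] = {}
    for state_dir in sorted(cpu_dir.glob("cpuidle/state[0-9]*")):
        name = (state_dir / "name").read_text().strip()
        # "time" holds the total time spent in this state, in microseconds.
        times[name] = int((state_dir / "time").read_text())
    total = sum(times.values())
    return {n: (100.0 * t / total if total else 0.0) for n, t in times.items()}


if __name__ == "__main__":
    for cpu_dir in sorted(CPU_ROOT.glob("cpu[0-9]*")):
        shares = cstate_residency(cpu_dir)
        if shares:
            summary = ", ".join(f"{n}: {p:.1f}%" for n, p in shares.items())
            print(f"{cpu_dir.name}: {summary}")
```

Nothing can be read once the machine has hard locked, so the best this offers is a statistical hint: running it periodically and appending the output to a file on disk would at least preserve the last sample taken before a freeze.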

AMohr1
Novice

Then try to disable the lowest C-state in the BIOS or anywhere else, until a new BIOS is written.

It is not possible to get around physical fundamentals like the inherent noise of semiconductors (it comes from Brownian motion). This can't be suppressed.
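
As one possible "anywhere else": on Linux, a single idle state can be switched off at runtime through the cpuidle sysfs `disable` file, without touching the BIOS. The sketch below assumes that the highest-numbered `stateM` directory is the deepest state and that the script runs as root; `disable_deepest_cstate` is a hypothetical helper name, and the setting does not survive a reboot.

```python
from pathlib import Path


def disable_deepest_cstate(cpu_dir: Path) -> str:
    """Write 1 to the `disable` file of the highest-numbered idle state.

    Returns the name of the state that was disabled.
    """
    states = sorted(
        cpu_dir.glob("cpuidle/state[0-9]*"),
        key=lambda p: int(p.name[len("state"):]),  # numeric order, so state10 > state2
    )
    if not states:
        raise FileNotFoundError(f"no cpuidle states under {cpu_dir}")
    deepest = states[-1]
    (deepest / "disable").write_text("1")
    return (deepest / "name").read_text().strip()


# Example (as root), applied to every logical CPU that has cpuidle states:
#   for cpu_dir in Path("/sys/devices/system/cpu").glob("cpu[0-9]*"):
#       if (cpu_dir / "cpuidle").is_dir():
#           print(cpu_dir.name, "disabled", disable_deepest_cstate(cpu_dir))
```

Calling it repeatedly only rewrites the same file, so to step upwards through the states one would extend it to skip states that are already disabled. The kernel boot parameter `intel_idle.max_cstate` is another commonly used way to cap the deepest state.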

GGlyn
Novice

Thank you for your feedback, it's very much appreciated.

Most of us basically disable CStates completely and/or play with voltages (especially VccIO), which seems to be set very low (lower than spec) on a lot of boards in earlier BIOSes. That combination makes for a reasonably stable system for most people. But most are not really stable, and although crash times go from minutes to hours/days, the lock ups still happen.

But CStates OFF is not ideal, and if this is a production fault, people should really be able to rectify that. So we are all looking for the cause and then for a good solution.

But given the symptoms and what seems to make things a little better, your comment on Vcc too low and noise when flipping CStates is perhaps the closest we have come to a universal diagnosis.

But could you please answer some questions for us regarding this?

1) One person changed their CPU, just the CPU, and is now 99% stable. The new CPU is the identical model, an i7 6700K, as before. Would this result match your explanation, given your knowledge of FETs and voltages? Others have changed CPU and had the same symptoms afterwards, so it's hard to diagnose without real knowledge of the subject.

2) What is the BIOS solution for this? You talk about FET problems only being solved in the BIOS, but is the BIOS solution to increase voltages, or are there some under-the-hood things that we mortals do not understand? If it is just voltages, then it should be possible to adjust them in the BIOS to correct levels with Intel's help.

Thank you for your continued answers.

AMohr1
Novice

I have an X99 board and a Core i7-5820K.

I won't disable the processor's power management either.

1. The lowest possible voltage for a FET is limited by inherent noise. The switching point is defined by semiconductor abatement.

So this question can't be answered generally. A possible solution to lower inherent noise is reducing the temperature.

2. Define a lowest supply voltage for the processor core through the BIOS settings. You can measure or test this by lowering the supply voltage step by step, then testing whether the processor runs without any problems.

GGlyn
Novice

For 2.

Do you know what such a setting in the BIOS is called? There are many Vccs but nothing which suggests "minimum volts". Is that something that the BIOS delivers but does not present to the user?

MNova3
Beginner

I have very similar specs to the person above. I am having MAJOR freezing issues when the system is IDLE. Freezes usually happen immediately after boot (login screen), after exiting a game or demanding application, and also when I simply leave my computer for a few minutes.

The system has not yet frozen when copying files, surfing the web etc.

I have put a GTX 960 into the system and use it for gaming, where it works without flaws.

This is totally unacceptable. With CStates on or off, the freezing still happens, although less frequently with CStates off...

My specs:

I have used alternative configurations, differing only in the add-on graphics card and the disk (same problems):

Gigabyte GA-H170N-WIFI Intel H170

Intel Core i5 6500 4x 3.20GHz

16GB (2x 8192MB) Crucial CT2K8G4DFD8213 DDR4-2133 DIMM

128GB Intenso Top III or 250GB Samsung 850 Evo (tried both, same result: same IDLE freezing issue). The Intenso had Win8.1, the Samsung has Win10 installed (both with the latest drivers).

Super Flower Golden Green SF-350P14XE

Now also using:

4096MB KFA2 GeForce GTX 960 Gamer (in games there are NO freezes!)

I have the latest BIOS.

OTrau
Beginner

triglav

http://www.tomshardware.co.uk/forum/id-2830772/skylake-build-randomly-freezing-crashing/page-15.html#17455667 (New Skylake Build Randomly Freezing/Crashing, page 15)

EDIT: This might tell you that you have a) a RAM issue, and b) this highly annoying thing with malfunctioning C-states on Skylake CPUs. If you have solved your RAM problem, disabling C-states might finally stabilize your PC. However, this is unacceptable and clearly INTEL'S RESPONSIBILITY!

MNova3
Beginner

I saw the post on Tom's Hardware. I posted here because I hope someone from Intel will address the issue. I am quite sure that the RAM is OK (I tested it), plus there is no logic in the RAM working fine under demanding software but then failing when the system idles.

My system was not bought primarily for gaming and is fairly useless with this issue.

INTEL fix this!

OTrau
Beginner

As far as we have seen at Tom's, you likely have both issues. That's why disabling C-states only helps in part. Also, you will probably need to RMA your CPU until you get a good one. At least that's what we have learned from it. Intel will never acknowledge that they let batches of bad CPUs onto the market, so they will probably never help with BIOS updates that fix this idle issue. Sad but true.
