Intel,
We have been trying to find the root cause of BSODs occurring on 1000+ computers that have an i7-4765T with the HD4600 iGPU. The OS is Windows 8.1 Enterprise x64, running Intel video driver version 10.18.14.4170 or 10.18.14.4080.
After collecting 500 GB worth of BSOD dumps, we hired OSR to analyze them and find the root cause of the crashes. Here are their findings:
"The systems are experiencing a wide variety of system crashes, though they appear to all trace back to a very consistent memory corruption pattern. Specifically, we are consistently seeing one of two values "randomly" appear in memory:
Interestingly, when the corruption is discovered the value appears at physical memory page offset 0xFD8 (most common) or 0xD70 (less common).
For example, in one crash the problem was that the MRXSMB20 image file is corrupted:
3: kd> !chkimg -d mrxsmb20
fffff800826a6fd8-fffff800826a6fdd 6 bytes - mrxsmb20!Smb2UpdateFileInfoCacheEntry+4c8
[ 89 7d 18 49 89 45:04 00 00 00 10 00 ]
fffff800826a6fdf - mrxsmb20!Smb2UpdateFileInfoCacheEntry+4cf (+0x07)
[ e8:00 ]
7 errors : mrxsmb20 (fffff800826a6fd8-fffff800826a6fdf)
Dumping the start of the corrupted range, we see our offset and value:
3: kd> dq fffff800826a6fd8
fffff800`826a6fd8 00000010`00000004 4c2b894c`0000e99c
fffff800`826a6fe8 ade901b6`41986d8b 850f02f8`83fffffd
fffff800`826a6ff8 8bc03345`fffffbb5 445e15ff`ce8b49d7
fffff800`826a7008 fb9f850f`c0840002 03fffffe`fee9ffff
In another crash a pool header is corrupted:
2: kd> !pool ffffc00089adcd70
Pool page ffffc00089adcd70 region is Paged pool
ffffc00089adcc00 size: 170 previous size: b0 (Free ) MPsc
ffffc00089adcd70 doesn't look like a valid small pool allocation, checking to see
if the entire page is actually part of a large page allocation...
2: kd> dq ffffc00089adcd70
ffffc000`89adcd70 00000010`00000004 8d5eb149`4b83d33a
ffffc000`89adcd80 00000000`00000000 ffffe000`bdecfec0
ffffc000`89adcd90 ffffe000`bc728860 ffffc000`89adcd98
Due to the fact that the crash appears at random in different virtual address ranges, we believe that the corruption must be generated by a device in the system (or by the platform)."
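For anyone who wants to check the offsets against the dumps quoted above, here is a minimal sketch. It assumes 4 KiB pages, so the low 12 bits of a virtual address are the same as the offset within the physical page; the names PAGE_SIZE, page_offset and SCRIBBLE are only illustrative and do not come from OSR's report.

```python
# Minimal sketch: derive the in-page offset of each corrupted address
# quoted in the dumps above. Assumption: 4 KiB pages.
PAGE_SIZE = 0x1000


def page_offset(address: int) -> int:
    """Return the offset of an address within its 4 KiB page."""
    return address & (PAGE_SIZE - 1)


# Addresses taken from the two dumps quoted above.
corrupted = {
    "mrxsmb20 image": 0xFFFFF800826A6FD8,
    "paged pool header": 0xFFFFC00089ADCD70,
}

for label, address in corrupted.items():
    print(f"{label}: page offset {page_offset(address):#x}")

# The 64-bit value seen at both locations (dq shows it as 00000010`00000004).
SCRIBBLE = 0x0000001000000004

# Little-endian byte order, as the bytes are laid out in memory.
print(SCRIBBLE.to_bytes(8, "little").hex(" "))
```

This gives 0xfd8 for the mrxsmb20 address and 0xd70 for the pool address. The byte sequence it prints starts with 04 00 00 00 10 00, which is what !chkimg reports in place of the original code bytes.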
The team at OSR also discovered that our "memory scribble" BSODs were mostly occurring around power state transitions on our computers. For example, just after a wake event: the user moves the mouse or presses a key to log on, the monitor wakes from sleep, and the system BSODs.
We disabled the monitor inactivity time-out (sleep) on all of our 1000+ computers with the i7-4765T and HD4600 iGPU, and the BSODs stopped! (All our systems are on the Windows High performance power plan with the idle monitor power-off disabled.)
While reading the release notes for the new driver version 18.104.22.16894, I found this in the fixed issues section:
" - Set monitor to turn off after 1 min. Unplug DVI monitor once monitor is off. System hang seen when DVI monitor is plugged back. - Windows 7 / 8"
Would it be possible to get details on the nature of these system hangs? We would like to know whether they match the types of BSODs we have collected.