12 Replies Latest reply on Oct 18, 2016 8:30 AM by Intel Corporation

    NUC6i3SYB instable/crashing on disk I/O

    HelpMyNUC

      Hi there,

       

      I have a brand new NUC6i3SYB version H81132-504 with a SYSKLi35.86A.0042.2016.0409.1246 BIOS. It runs 16GB Crucial RAM (CT2K8G4SFD8213 DDR4-2133 SO-DIMM CL15 Dual Kit). RAM settings are factory default. There is a 1 TB 2,5" WD Red WD10JFCX-68N6GN0 HDD installed, and on it runs Linux Mint x86_64 (Sarah). Under disk I/O the system exhibits kernel oopses, bus faults, and segmentation faults. It becomes extremely fragile, unstable and ultimately unusable. Since I first thought this problem originated in the ext4 driver contained in the 4.4.0-21-generic #37 Ubuntu stock kernel, I posted my problem on the linux-ext4 mailing list yesterday:

       

      https://www.spinics.net/lists/linux-ext4/msg53857.html

       

      The course of the discussion has led me to upgrade to a vanilla 4.8.0 kernel, which exhibits the same problem. If I put the CPU under high load (for example, by compiling a kernel) and it's done from a ramdisk (e.g., /dev/shm, which has a tmpfs mounted), everything works great. When I try the same while reading from disk (i.e., my filesystem), the system becomes completely unstable and crashes as described before. Note that I've found a previous stacktrace by someone else running the same kernel on a different NUC who ran into a very similar issue (kernel stacktrace looks almost identical): http://pastebin.com/BJbu35H4 (unfortunately, no further info available).
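The tmpfs-versus-disk comparison described above can also be run without a kernel build. The following is only a minimal sketch, not a tool used in this thread: the function name `verify_io` and its parameters are hypothetical. It writes random blocks into a directory, reads them back, and counts SHA-256 mismatches. Note that the read-back is usually served from the page cache, so this exercises the filesystem write path and memory rather than proving the data reached the platter intact.

```python
import hashlib
import os
import tempfile


def verify_io(directory, rounds=4, size=1 << 20):
    """Write random blocks into `directory`, read them back, and return
    the number of rounds whose SHA-256 digest did not match."""
    mismatches = 0
    for _ in range(rounds):
        data = os.urandom(size)
        expected = hashlib.sha256(data).hexdigest()
        fd, path = tempfile.mkstemp(dir=directory)
        try:
            # Write the block and ask the kernel to flush it to the device.
            with os.fdopen(fd, "wb") as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())
            # Read it back (likely from the page cache) and compare digests.
            with open(path, "rb") as f:
                actual = hashlib.sha256(f.read()).hexdigest()
            if actual != expected:
                mismatches += 1
        finally:
            os.remove(path)
    return mismatches


if __name__ == "__main__":
    # Compare a tmpfs directory against a directory on the real filesystem.
    for target in ("/dev/shm", os.path.expanduser("~")):
        if os.path.isdir(target):
            print(target, verify_io(target, rounds=16, size=8 << 20))
```

Any non-zero count on either target would point at data corruption somewhere between memory and storage, but a zero count does not rule a fault out.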

       

      Note also that I've been made aware of the thread here: https://communities.intel.com/thread/105640 which describes a similar problem. Unfortunately I do not run Windows on the NUC and therefore cannot provide the output of the System Support Utility -- but I will happily provide any other information that is useful.

       

      I'm starting to suspect a hardware defect. Is that assessment likely?

      Thanks in advance,

      Johannes

        • 1. Re: NUC6i3SYB instable/crashing on disk I/O
          Intel Corporation
          This message was posted by Intel Corporation on behalf of

          HelpMyNUC,

           

          Thank you for contacting the Intel Communities.

           

          I would recommend posting this inquiry in our dedicated support forum for Linux-based operating systems.

           

          There are peers in that forum/community who can assist you with troubleshooting this matter.

           

          01.org is the forum to go to.

           

          Regards,
          Esteban C

          • 2. Re: NUC6i3SYB instable/crashing on disk I/O
            HelpMyNUC

            Hi Esteban,

             

            I found your answer disappointing since at no place do you even try to address any of my issues. Especially since I find there is a high likelihood that I'm dealing with a hardware fault, this forum should be the correct place to ask such questions.

             

            Nevertheless, I've looked at 01.org and I cannot see at all how my question would fit in there. First, I didn't find a forum, but lots of mailing lists and IRC channels. The mailing lists deal with specific Intel side-projects such as Linux UEFI, Linux NFC and such. My question would definitely be wrong on any of the lists shown here https://01.org/community/contribution-tools?qt-projects_aggregated_links=1 -- except maybe LKML (which is more or less where I started out, i.e., on the kernel ext4 subsystem mailing list).

             

            Regards,

            Johannes

            • 3. Re: NUC6i3SYB instable/crashing on disk I/O
              hegenious

              Totally agree, that's useless advice. Unfortunately this seems the default answer from Intel reps here as soon as it comes to Linux. Heck, my first impression is that your issue is not Linux related at all.

              This being said, it may be a hardware issue, in particular with your hard disk. Read/write to tmpfs works great, but as soon as you do the same kind of I/O operation to the real disk your system eventually becomes unusable.

               

              Do you have another 2.5'' laptop disk lying around so you could test with that? (Install an OS, generate disk I/O, see what happens.) I'm not sure, but from what I read in your post your HDD may be failing; perhaps check what SMART reports.

              • 4. Re: NUC6i3SYB instable/crashing on disk I/O
                Intel Corporation
                This message was posted by Intel Corporation on behalf of

                Hello HelpMyNUC,

                 

                The Linux support is to be handled by peers at the website provided before (01.org). Possible troubleshooting for this issue may be available there (i.e.: "Linux* Kernel" from contribution tools).

                 

                From our end, we have to find out if this is an issue with hardware or software.

                 

                In order to do so, please provide me with your system configuration:

                 

                -RAM (maker and model noted in the white sticker):
                -Drive (maker and model):
                -Any other additional components added to the system:
                -BIOS version:

                 

                In terms of software, if you have the option, please install Windows and try to replicate the issue with this OS.

                 

                Screenshots reflecting the issue would definitely assist to find the possible source of the problem.

                 

                Hope I can hear from you.

                 

                Regards,
                Esteban C

                • 5. Re: NUC6i3SYB instable/crashing on disk I/O
                  HelpMyNUC

                  Esteban,

                   

                  I hope you can understand my frustration when the information you're asking for is *literally* what I provided in my original post, with the expectation that it would be relevant to diagnosing the fault. Allow me to repeat myself:

                   

                  -RAM (maker and model noted in the white sticker):

                   

                  16GB Crucial RAM (CT2K8G4SFD8213 DDR4-2133 SO-DIMM CL15 Dual Kit)


                  -Drive (maker and model):

                   

                  1 TB 2,5" WD Red WD10JFCX-68N6GN0


                  -Any other additional components added to the system:

                   

                  None


                  -BIOS version:

                   

                  SYSKLi35.86A.0042.2016.0409.1246

                   

                  I do not understand how you want me to screenshot a headless system. I've provided a detailed kernel stacktrace before:

                  [ 3405.666456] general protection fault: 0000 [#1] SMP
                  [ 3405.666519] Modules linked in: rfcomm bnep binfmt_misc snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic arc4 snd_soc_skl snd_soc_skl_ipc snd_hda_ext_core iwlmvm snd_soc_sst_ipc snd_soc_sst_dsp snd_soc_core mac80211 btusb snd_compress btrtl ac97_bus snd_pcm_dmaengine dw_dmac_core snd_hda_intel snd_hda_codec snd_hda_core 8250_dw snd_hwdep intel_rapl snd_pcm x86_pkg_temp_thermal intel_powerclamp coretemp iwlwifi snd_seq_midi snd_seq_midi_event kvm_intel snd_rawmidi hci_uart btbcm btqca snd_seq cfg80211 snd_seq_device kvm btintel bluetooth snd_timer irqbypass ir_sharp_decoder ir_rc5_decoder snd ir_lirc_codec ir_jvc_decoder ir_xmp_decoder lirc_dev ir_mce_kbd_decoder ir_sanyo_decoder ir_rc6_decoder ir_sony_decoder ir_nec_decoder rc_rc6_mce soundcore ite_cir intel_lpss_acpi mei_me rc_core
                  [ 3405.667326]  idma64 virt_dma shpchp intel_lpss_pci intel_lpss mei acpi_pad acpi_als kfifo_buf industrialio mac_hid parport_pc ppdev lp parport autofs4 btrfs xor raid6_pq jitterentropy_rng drbg ansi_cprng dm_crypt algif_skcipher af_alg dm_mirror dm_region_hash dm_log crct10dif_pclmul crc32_pclmul i915_bpo intel_ips i2c_algo_bit drm_kms_helper aesni_intel syscopyarea sysfillrect aes_x86_64 lrw gf128mul glue_helper sysimgblt ablk_helper e1000e fb_sys_fops sdhci_pci cryptd ahci ptp i2c_hid drm pps_core libahci sdhci pinctrl_sunrisepoint video hid pinctrl_intel fjes
                  [ 3405.667929] CPU: 3 PID: 2261 Comm: hexchat Not tainted 4.4.0-21-generic #37-Ubuntu
                  [ 3405.667998] Hardware name:                  /NUC6i3SYB, BIOS SYSKLi35.86A.0042.2016.0409.1246 04/09/2016
                  [ 3405.668082] task: ffff88003565ac40 ti: ffff8804332e8000 task.ti: ffff8804332e8000
                  [ 3405.668148] RIP: 0010:[<ffffffff811eb027>]  [<ffffffff811eb027>] kmem_cache_alloc+0x77/0x1f0
                  [ 3405.668234] RSP: 0018:ffff8804332eba88  EFLAGS: 00010282
                  [ 3405.668282] RAX: 0000000000000000 RBX: 0000000002408040 RCX: 00000000000e1547
                  [ 3405.668345] RDX: 00000000000e1546 RSI: 0000000002408040 RDI: 000000000001a940
                  [ 3405.668408] RBP: ffff8804332ebab8 R08: ffff88046ed9a940 R09: ffdb88033bb3a3a8
                  [ 3405.668470] R10: ffff8804591a4ed0 R11: ffffffff81ccc462 R12: 0000000002408040
                  [ 3405.668533] R13: ffffffff81243351 R14: ffff88045e08bc00 R15: ffff88045e08bc00
                  [ 3405.668597] FS:  00007f1df9704a40(0000) GS:ffff88046ed80000(0000) knlGS:0000000000000000
                  [ 3405.668668] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
                  [ 3405.668719] CR2: 00007fd945ecebd6 CR3: 0000000456a48000 CR4: 00000000003406e0
                  [ 3405.668782] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
                  [ 3405.668844] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
                  [ 3405.668906] Stack:
                  [ 3405.668926]  01ff880438ee2508 0000000000001000 ffff8803344df000 ffffea000cd137c0
                  [ 3405.669003]  0000000000000000 0000000000000000 ffff8804332ebad0 ffffffff81243351
                  [ 3405.669080]  ffff8800354bd024 ffff8804332ebb18 ffffffff81243829 00000001332ebb70
                  [ 3405.669156] Call Trace:
                  [ 3405.669186]  [<ffffffff81243351>] alloc_buffer_head+0x21/0x60
                  [ 3405.669240]  [<ffffffff81243829>] alloc_page_buffers+0x79/0xe0
                  [ 3405.669294]  [<ffffffff812438ae>] create_empty_buffers+0x1e/0xc0
                  [ 3405.669351]  [<ffffffff812979cc>] ext4_block_write_begin+0x3cc/0x4d0
                  [ 3405.669410]  [<ffffffff812e74db>] ? jbd2__journal_start+0xdb/0x1e0
                  [ 3405.669469]  [<ffffffff81296e10>] ? ext4_inode_attach_jinode.part.60+0xb0/0xb0
                  [ 3405.669536]  [<ffffffff812cb83d>] ? __ext4_journal_start_sb+0x6d/0x120
                  [ 3405.669596]  [<ffffffff8129d574>] ext4_da_write_begin+0x154/0x320
                  [ 3405.669656]  [<ffffffff8118d4de>] generic_perform_write+0xce/0x1c0
                  [ 3405.669713]  [<ffffffff8118f382>] __generic_file_write_iter+0x1a2/0x1e0
                  [ 3405.669773]  [<ffffffff81291ffc>] ext4_file_write_iter+0xfc/0x460
                  [ 3405.669833]  [<ffffffff81794d6e>] ? inet_recvmsg+0x7e/0xb0
                  [ 3405.669885]  [<ffffffff816fdb6b>] ? sock_recvmsg+0x3b/0x50
                  [ 3405.669938]  [<ffffffff8120bedb>] new_sync_write+0x9b/0xe0
                  [ 3405.669990]  [<ffffffff8120bf46>] __vfs_write+0x26/0x40
                  [ 3405.670040]  [<ffffffff8120c8c9>] vfs_write+0xa9/0x1a0
                  [ 3405.672397]  [<ffffffff8120c776>] ? vfs_read+0x86/0x130
                  [ 3405.674693]  [<ffffffff8120d585>] SyS_write+0x55/0xc0
                  [ 3405.676925]  [<ffffffff818244f2>] entry_SYSCALL_64_fastpath+0x16/0x71
                  [ 3405.679111] Code: 08 65 4c 03 05 83 f1 e1 7e 49 83 78 10 00 4d 8b 08 0f 84 29 01 00 00 4d 85 c9 0f 84 20 01 00 00 49 63 47 20 48 8d 4a 01 49 8b 3f <49> 8b 1c 01 4c 89 c8 65 48 0f c7 0f 0f 94 c0 84 c0 74 bb 49 63
                  [ 3405.683725] RIP  [<ffffffff811eb027>] kmem_cache_alloc+0x77/0x1f0
                  [ 3405.685876]  RSP <ffff8804332eba88>
                  [ 3405.696001] ---[ end trace 4968a9119e168c92 ]---


                  Since unfortunately your forum completely trashes the log, here it is again in legible form: http://pastebin.com/GPFKMtnU
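One detail in the trace above can be checked mechanically. Decoded by hand, the marked bytes `<49> 8b 1c 01` in the Code: line appear to be a load through R09 plus RAX, and R09 holds ffdb88033bb3a3a8, while every other kernel heap pointer in the dump starts with ffff88. The sketch below assumes 48-bit virtual addressing; the helper names and the "intended" value are hypothetical illustrations, not something confirmed in this thread. It tests whether an address is canonical and counts how many bits separate R09 from a plausible ffff88... pointer.

```python
def is_canonical(addr, va_bits=48):
    """True if bits 63..(va_bits-1) of a 64-bit address are all equal,
    which is what the CPU requires for a valid virtual address."""
    top = addr >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return top == 0 or top == all_ones


def flipped_bits(addr, reference):
    """Number of bit positions in which addr differs from reference."""
    return bin(addr ^ reference).count("1")


r09 = 0xFFDB88033BB3A3A8    # R09 as printed in the trace
guess = 0xFFFF88033BB3A3A8  # hypothetical intended value (assumption)

print(is_canonical(r09))         # False
print(is_canonical(guess))       # True
print(flipped_bits(r09, guess))  # 2
```

A non-canonical address in a load would raise a general protection fault rather than a page fault, which matches the first line of the trace. A pointer that is only a couple of bits away from a sane value is the kind of pattern often associated with memory corruption, although this alone does not prove where the corruption came from.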

                   

                  Installing Windows is not an option since I do not own a Windows license and am not going to buy one.

                   

                  Regards,

                  Johannes

                  • 6. Re: NUC6i3SYB instable/crashing on disk I/O
                    HelpMyNUC

                    Quick note: Just upgraded the BIOS to SYSKLi35.86A.0052.2016.0910.1456. Problem persists.

                    • 7. Re: NUC6i3SYB instable/crashing on disk I/O
                      HelpMyNUC

                      Another note: I've swapped the hard disk for a 256 GB SSD. The problem persists, so the HDD is not at fault. That leaves the RAM or the NUC, in my opinion.

                       

                      Since I have two 8 GiB modules in there, I'll remove one and the other to try to reproduce the problem.

                      • 8. Re: NUC6i3SYB instable/crashing on disk I/O
                        N.Scott.Pearson

                        Try each DIMM individually, first in one and then in the other DIMM socket. If you see the issue with either DIMM in a specific socket, you will know that it is something with the NUC itself (though this may be a general incompatibility with the DIMMs, and the only way to verify this would be to use different memory)...
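The swap procedure above amounts to filling in a small table of DIMM and socket combinations and looking for the row or column that always fails. As a rough sketch only (the function name `diagnose`, the DIMM labels "A"/"B" and the socket labels "1"/"2" are hypothetical), the reasoning could be written down like this:

```python
def diagnose(results):
    """results maps (dimm, socket) -> True if the system was stable
    with only that DIMM installed in that socket."""
    dimms = sorted({d for d, _ in results})
    sockets = sorted({s for _, s in results})
    # A DIMM is suspect if it failed in every socket it was tried in.
    bad_dimms = [d for d in dimms
                 if all(not ok for (dd, _), ok in results.items() if dd == d)]
    # A socket is suspect if every DIMM tried in it failed.
    bad_sockets = [s for s in sockets
                   if all(not ok for (_, ss), ok in results.items() if ss == s)]
    if all(results.values()):
        return "no fault reproduced"
    if bad_dimms and not bad_sockets:
        return "suspect DIMM " + ", ".join(map(str, bad_dimms))
    if bad_sockets and not bad_dimms:
        return "suspect socket " + ", ".join(map(str, bad_sockets))
    # Everything failed, or the pattern is mixed: only different memory
    # can separate a board fault from a DIMM incompatibility.
    return "inconclusive: try different memory"


print(diagnose({("A", "1"): True, ("A", "2"): True,
                ("B", "1"): False, ("B", "2"): False}))
# suspect DIMM B
```

A passing combination only means the fault was not reproduced during that run, so each configuration should be kept under load long enough to be meaningful.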

                         

                        Hope this helps,

                        ...S

                        • 9. Re: NUC6i3SYB instable/crashing on disk I/O
                          HelpMyNUC

                          Hi Scott,

                           

                          yup, that's what I meant with my admittedly cryptic "remove one and the other" :-)

                           

                          In fact, that's what I did today and it cleared it up somewhat: With only DIMM "A" installed, I had no problems whatsoever and no system instabilities. With only DIMM "B" installed, the problems reappeared. So it appears that one of the two DIMMs is defective. Strange though that I ran both through Memtest86 and no error was found (ran all night). That's something I did early on. But on the ext4 mailing list it was already suggested to me that the memtest might not be completely reliable.

                           

                          So I've sent in an RMA to get the DIMMs replaced. I'll leave this issue open, however, and will report back with new RAM to see if that was really the solution. I'm still a bit wary, even though the test seems to be rather conclusive.

                           

                          Cheers,
                          Johannes

                          • 10. Re: NUC6i3SYB instable/crashing on disk I/O
                            N.Scott.Pearson

                            You verified that the "good" DIMM worked fine in both sockets, right? If so, then yes, I would conclude you have a bad DIMM.

                             

                            Every time I set up a system with new DIMMs, I run a test of these DIMMs using MemTest86+. On a number of occasions, I have seen DIMMs work just fine, only to fail a week or two later. The explanation? I don't have a good one. Sometimes, it takes being run for a while for the problems to show up...

                             

                            ...S

                            • 11. Re: NUC6i3SYB instable/crashing on disk I/O
                              Intel Corporation
                              This message was posted by Intel Corporation on behalf of

                              Thank you for the update, HelpMyNUC.

                               

                              Please keep us posted on the results of Scott's recommendations and of the RMA'd DIMMs.

                               

                              Regards,
                              Esteban C

                              • 12. Re: NUC6i3SYB instable/crashing on disk I/O
                                Intel Corporation
                                This message was posted by Intel Corporation on behalf of

                                Hello HelpMyNUC,

                                 

                                I would like to verify whether you were able to try the recommendations provided by Scott.

                                 

                                Please let us know.

                                 

                                Regards,
                                Esteban C