2 Replies Latest reply on Feb 28, 2018 6:34 AM by Intel Corporation

    P4600 and Linux kernel 4.13 timeout

    berthierp

      Hi

       

I have installed two P4600 NVMe devices in a server and installed Proxmox 5.1-3.  The running kernel is 4.13.13-6-pve.  There is no RAID controller involved.

       

      # nvme list

      Node             SN                   Model                                    Namespace Usage                      Format           FW Rev

      ---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------

      /dev/nvme0n1     BTLE736103CH4P0KGN   INTEL SSDPEDKE040T7                      1           4.00  TB /   4.00  TB    512   B +  0 B   QDV10170

      /dev/nvme1n1     BTLE736103AG4P0KGN   INTEL SSDPEDKE040T7                      1           4.00  TB /   4.00  TB    512   B +  0 B   QDV10170

       

      The output of "isdct show -a -intelssd" is attached in the file "intelssdp4600-2.txt".

       

Using LVM, I can reliably reproduce a "hang" of 1-2 minutes that does not lead to any fatal error:

       

      vgcreate SSD /dev/nvme0n1 /dev/nvme1n1

      lvcreate -l 100%FREE -n SSDVMSTORE01 --stripes 2 --stripesize 128 --type striped SSD

      lvremove -d -v SSD/SSDVMSTORE01

Do you really want to remove and DISCARD active logical volume SSD/SSDVMSTORE01? [y/n]: y

       

Here the command hangs for 1-2 minutes.  In the logs I see:

       

      Feb 21 14:14:17 px kernel: [ 3654.745355] nvme nvme0: I/O 200 QID 14 timeout, aborting

      Feb 21 14:14:17 px kernel: [ 3654.745772] nvme nvme0: I/O 201 QID 14 timeout, aborting

      Feb 21 14:14:17 px kernel: [ 3654.746110] nvme nvme0: I/O 202 QID 14 timeout, aborting

      Feb 21 14:14:17 px kernel: [ 3654.746436] nvme nvme0: I/O 203 QID 14 timeout, aborting

      Feb 21 14:14:32 px kernel: [ 3669.013614] nvme nvme0: Abort status: 0x0

      Feb 21 14:14:32 px kernel: [ 3669.014012] nvme nvme0: Abort status: 0x0

      Feb 21 14:14:32 px kernel: [ 3669.014325] nvme nvme0: Abort status: 0x0

      Feb 21 14:14:32 px kernel: [ 3669.014629] nvme nvme0: Abort status: 0x0

      Feb 21 14:15:10 px kernel: [ 3707.737495] nvme nvme1: I/O 297 QID 14 timeout, aborting

      Feb 21 14:15:10 px kernel: [ 3707.737902] nvme nvme1: I/O 298 QID 14 timeout, aborting

      Feb 21 14:15:10 px kernel: [ 3707.738231] nvme nvme1: I/O 299 QID 14 timeout, aborting

      Feb 21 14:15:10 px kernel: [ 3707.738547] nvme nvme1: I/O 300 QID 14 timeout, aborting

      Feb 21 14:15:25 px kernel: [ 3722.005726] nvme nvme1: Abort status: 0x0

      Feb 21 14:15:25 px kernel: [ 3722.006113] nvme nvme1: Abort status: 0x0

      Feb 21 14:15:25 px kernel: [ 3722.006434] nvme nvme1: Abort status: 0x0

      Feb 21 14:15:25 px kernel: [ 3722.006751] nvme nvme1: Abort status: 0x0
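For what it's worth, the gap between each "timeout, aborting" message and the corresponding "Abort status: 0x0" completion is about 14 seconds on both controllers. That can be checked with a quick POSIX awk pass over the log lines (self-contained sketch; the log lines are reproduced inline with the syslog prefix stripped, keeping only the kernel's monotonic timestamps):

```shell
# For each controller, measure how long the first abort took to complete
# after the first timeout fired, using the bracketed monotonic timestamps.
delays="$(awk '
  /nvme nvme/ {
    ts = $2; sub(/\]/, "", ts)      # "3654.745355]" -> "3654.745355"
    c  = $4; sub(/:$/, "", c)       # "nvme0:"       -> "nvme0"
    if ($0 ~ /timeout, aborting/ && !(c in t)) t[c] = ts
    if ($0 ~ /Abort status/ && (c in t) && !(c in done)) {
      done[c] = 1
      printf "%s: abort completed %.1f s after timeout\n", c, ts - t[c]
    }
  }
' <<'EOF'
[ 3654.745355] nvme nvme0: I/O 200 QID 14 timeout, aborting
[ 3669.013614] nvme nvme0: Abort status: 0x0
[ 3707.737495] nvme nvme1: I/O 297 QID 14 timeout, aborting
[ 3722.005726] nvme nvme1: Abort status: 0x0
EOF
)"
echo "$delays"
```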

       

      After this, the command completes without error.
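The reproduction above can also be wrapped in a small script that times the lvremove step. It is a sketch only: the device paths are taken from this post, it assumes issue_discards=1 in lvm.conf (which is what makes lvremove issue the DISCARD), and it is deliberately gated behind an environment variable because it destroys any data on the two devices:

```shell
#!/bin/sh
# Scripted reproduction of the hang (hypothetical wrapper; DESTRUCTIVE).
# Set RUN_REPRO=1 explicitly to run it against /dev/nvme0n1 and /dev/nvme1n1.
set -eu
status=skipped
if [ "${RUN_REPRO:-0}" = 1 ] && [ -b /dev/nvme0n1 ] && [ -b /dev/nvme1n1 ]; then
  vgcreate SSD /dev/nvme0n1 /dev/nvme1n1
  lvcreate -l 100%FREE -n SSDVMSTORE01 --stripes 2 --stripesize 128 \
    --type striped SSD
  # -f answers the "remove and DISCARD" prompt; time makes the hang visible.
  time lvremove -f SSD/SSDVMSTORE01
  vgremove -f SSD
  status=done
fi
echo "$status"
```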

       

This does not happen with Debian 9.3 (kernel 4.9.x).  If I instead partition each device into a ~2 GB primary partition plus the remainder, and repeat the same operation on /dev/nvmeXn1p1 or p2, the timeout occurs on the second partition but not on the first:

      parted -a optimal /dev/nvme0n1 mklabel gpt

      parted -a optimal /dev/nvme0n1 mkpart primary 4 2047

      parted -a optimal /dev/nvme0n1 mkpart primary 2048 100%

       

      parted -a optimal /dev/nvme1n1 mklabel gpt

      parted -a optimal /dev/nvme1n1 mkpart primary 4 2047

      parted -a optimal /dev/nvme1n1 mkpart primary 2048 100%

       

       

      Any clues?

      Best,

      Pierre