Thank you for reaching out. As you mention, this product has been in the End of Interactive Support phase since 2011, meaning that it is no longer supported and resources are limited. However, I can provide you with some recommendations, under no liability.
With that said, it is very important that you confirm you have a valid backup of your data before taking any further action, unless data loss would have no real impact on your production environment.
First off, I would like to confirm the following information to get a better idea of what the problem could be:
- You have an Intel® Modular Server Chassis MFSYS35; however, this chassis is compatible with two different compute modules, the Intel® Compute Module MFS5000SI and the Intel® Compute Module MFS5520VIR. Which module is actually being used in your system?
- What kind of controller are you using (hardware, software, onboard)? What is the model or name?
- What OS is running on the system?
- Have you checked the status from the RAID BIOS?
- How many drives are in use on the server showing the message, which models are they, and in what arrays are they configured?
- Is the server experiencing any kind of outage or performance degradation at the moment?
- Are there any alerted drives?
- What are the firmware versions on your system (board and controller)?
- Could you provide the hardware logs so we can get a deeper view of what is going on? If so, please follow these steps to generate the file:
- You will need a USB flash drive formatted as FAT32.
- Please download the Sysinfo_V14_0_Build12_AllOS.zip package and extract the contents of the Sysinfo_V14_0_Build12_AllOS\Sysinfo_V14_0_Build12_AllOS\UEFI folder into the root of the flash drive (not into a folder).
- Boot into the internal EFI shell (with the flash drive connected to the server), switch to the flash drive by typing "FS0:" and pressing Enter, and run the sysinfo.efi file to start the utility. The FS number may change depending on which USB port is being used; try FS1 or FS2, or change the USB port if needed.
- Once the utility completes its process, it will copy the log file to your flash drive.
- Once completed, please share the files with us.
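If you prefer to script the extraction step, the following is a minimal Python sketch, not an Intel-provided tool. It assumes the package contains a folder named UEFI somewhere in its path (as described above) and that you know where the flash drive is mounted; the function name and the example paths in the comment are placeholders for illustration only.

```python
import zipfile
from pathlib import Path, PurePosixPath


def extract_uefi_to_root(zip_path, dest_root, folder="UEFI"):
    """Copy everything under the archive's `folder` directory into dest_root.

    The directory levels above and including `folder` are dropped, so the
    files land directly in dest_root rather than in a subfolder.
    Returns the sorted list of relative paths that were written.
    """
    dest_root = Path(dest_root)
    written = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            parts = PurePosixPath(info.filename).parts
            if ".." in parts or folder not in parts[:-1]:
                continue  # unsafe path, or not inside the UEFI folder
            relative = PurePosixPath(*parts[parts.index(folder) + 1:])
            target = dest_root.joinpath(*relative.parts)
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(archive.read(info))
            written.append(str(relative))
    return sorted(written)


# Example call (both paths are assumptions; adjust them to your machine):
# extract_uefi_to_root("Sysinfo_V14_0_Build12_AllOS.zip", "E:/")
```

After running it, sysinfo.efi should sit directly in the root of the flash drive, which is what the EFI shell step expects.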
Intel Customer Support
Hi Kenneth! Thank you for the fast response!
I have written the required information below:
1. I have an Intel® Modular Server Chassis MFSYS35 with 5 MFS5520VI Compute Modules.
2. I use the built-in chassis dual-controller SAN and the MFS5520VI HBAs, which take LUNs from it.
3. Previously I ran Windows Server 2008 R2 on it, but now all the compute modules are clean. I moved all the data to spare servers.
4. There are no RAID controllers in the compute modules, only HBAs, which take LUNs from the built-in SAN.
5. The built-in chassis SAN contains six 3.5" SAS HUC101890CS4204 drives in RAID 5.
6. At the moment the servers are clean, so I can't measure performance.
7. All drives are in good condition.
8. Chassis firmware: Current Build Version: 18.104.22.16820307.34729
MFS5520VI Compute Module: BMC Firmware: 1.27.1, BMC Boot: 0.28, BIOS: S5500.86B.01.00.0060.092120111445
Storage Control Module 1 Firmware ok 22.214.171.124
Storage Control Module 2 Firmware ok 126.96.36.199
9. I can provide the diagnostic report from the chassis itself: https://drive.google.com/open?id=0B4ywsAhL5S0MR255d0tDUTYza28
INTERNAL DIAGNOSTICS TEST
Test run: 10/19/2017 10:38:56
Device   Present  I2C    Ping   SNMP   CIM    HAPI   VBMC   PBMC
-------  -------  -----  -----  -----  -----  -----  -----  -----
ESM1     yes      PASS   PASS   PASS   -      -      -      -
SCM1     yes      PASS   -      -      -      -      -      -
SCM2     yes      [T/O]  -      -      -      -      -      -
VSCM     yes      -      PASS   PASS   PASS   -      -      -
FAN1     yes      PASS   -      -      -      -      -      -
FAN2     yes      PASS   -      -      -      -      -      -
IOFAN    yes      PASS   -      -      -      -      -      -
PS1      yes      -      -      -      -      -      -      -
PS2      yes      -      -      -      -      -      -      -
PS3      yes      -      -      -      -      -      -      -
PS4      yes      -      -      -      -      -      -      -
SERVER1  yes      PASS   PASS   -      -      PASS   PASS   PASS
SERVER2  yes      PASS   PASS   -      -      PASS   PASS   PASS
SERVER3  yes      PASS   PASS   -      -      PASS   PASS   PASS
SERVER4  yes      PASS   PASS   -      -      PASS   PASS   PASS
SERVER5  yes      PASS   PASS   -      -      PASS   PASS   PASS
CHASSIS  yes      -      -      PASS   -      PASS   PASS   PASS
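For reference, a report like the one above can be scanned for any result other than PASS. This is only a rough Python sketch: it assumes the columns are separated by whitespace exactly as pasted here, the function name is made up for illustration, and the sample is a short excerpt of the rows above. The meaning of [T/O] is not stated in the report itself (presumably a timeout).

```python
def failed_checks(report_text):
    """Return (device, test, result) for every cell that is not PASS or '-'."""
    rows = [line.split() for line in report_text.splitlines() if line.strip()]
    # Locate the header row so any text before the table is ignored.
    header_idx = next(
        i for i, fields in enumerate(rows) if fields[:2] == ["Device", "Present"]
    )
    columns = rows[header_idx]
    problems = []
    for fields in rows[header_idx + 1:]:
        # Skip the dashed rule under the header and any malformed line.
        if len(fields) != len(columns) or set(fields[0]) == {"-"}:
            continue
        device = fields[0]
        for test, result in zip(columns[2:], fields[2:]):
            if result not in ("PASS", "-"):
                problems.append((device, test, result))
    return problems


# Excerpt of the report above, used as sample input.
SAMPLE = """\
Device Present I2C Ping SNMP CIM HAPI VBMC PBMC
------- ------- ------ ------ ------ ------ ------ ------ ------
SCM1 yes PASS - - - - - -
SCM2 yes [T/O] - - - - - -
SERVER1 yes PASS PASS - - PASS PASS PASS
"""

print(failed_checks(SAMPLE))  # prints [('SCM2', 'I2C', '[T/O]')]
```

In this report the only flagged entry is the I2C check on SCM2.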
I just finished reviewing the logs. From what I can see, at some point controller 2 was removed and then reinserted, and after that it started to show as Offline. It could be that the hardware was physically damaged during removal, but it is also possible that the firmware became corrupted: while the controller was active the corruption would have had no visible effect, and it only took effect once the controller was reinserted. Based on the logs I can't be sure of either scenario, and I can't find previous cases since the product has been out for a while.
So, with the information available so far, I would recommend confirming that all hardware components are properly seated and then running the firmware update. Find the latest package here, and make sure to read the release notes for the specific instructions, known issues, and requirements. I am also wondering about the drive distribution: you say all your drives are in a single RAID. What if controller 2 is showing as offline because it has no drives assigned and the whole current RAID is running on controller 1? Could you check that, please?
I'll stay tuned for your comments.
In addition to the previous answer, I would like to know:
- What options are available when you check the SCM2 properties for the device?
- Could you please reboot the SCM2?
- Try to generate diagnostic logs from the device; try checking this, and please take a look at the release notes for instructions for the different supported operating systems or UEFI.
I would like to know if you have had the chance to review the emails sent to your address and whether there are any updates from your side, or whether the assistance is no longer needed and we are OK to set this case as closed. Either way, please let me know by replying to this email.
I'll stay tuned for your comments. Best regards,
Intel Customer Support
Sorry for the delay. Today I'll try to generate the diag logs.
Could you get the Complete System Diagnostics file so we can better understand this issue? Additionally, could you click on SCM2 and let us know what actions are available, as well as the status of the storage pools?
Thanks for the response.
I've got the Complete System Diagnostics file and a screenshot of the failed SCM in the management interface.
P.S. I swapped the controllers. Now the failed one is in the SCM1 slot. I was trying to figure out whether the problem is related to the midplane or to the controller.
I sent you the complete DiagLogs via email.