Awaiting the results of your odyssey as I am very curious. I just started to investigate the capabilities & limits of the SS4000-E in anticipation of making a purchase of the device.
So... Weird thing that happened today...
Yesterday, I attempted to stress the solution by restoring two partitions worth of info concurrently.
My intent was to write to NAS-1 "public" as well as NAS-1 "public-3" at the same time.
Due to my own error, I actually wrote both restores to "public" (... into separate subdirectories... so should have been no big deal).
I stopped the restore of the data that should have gone to public-3 when I saw my mistake (a few hours later), and I proceeded to copy that data from "public" to "public-3".
Data copy was S L O W ... much slower than the initial network write.
So, after about 30 minutes, I decided to delete that misplaced data... since I knew I had a backup anyway.
Now, TODAY is when the REALLY strange behavior begins...
On trying to access public-3 today, Win7 said that the network resource was not available, or that I had insufficient rights.
Looking at the "home page", I saw that the NAS was ... confused.
Apparently one partition (the smallest, public-3) was gone, another partition seemed to think that it was for backups... and the overall status was "NOT READY".
This was troubling, as I had seen this type of behavior on the first box that I had attempted to test.
At that point, I tried a reboot of the box, but on restart, the status was the same.
Checking the partitions, I saw that "public-3" had decided that it was "0" MB in size:
However, the disks in the RAID still were good... and hotplug indicator was YELLOW (which is good)
Thinking that the NAS could be doing something in the background, I checked the System Status page... and the CPU was idle.
And... the system log showed no errors.
So: I'll give the S4000-E the benefit of the doubt on this one...
I deleted partitions "public-2" and "public-3", recreated them and assigned rights, and continued with the restoration of the "public" partition.
Now the home page shows what we expect to be normal:
Bottom line: Working, and still restoring, but just more than a little disconcerting with that disappearing partition.
OK... Now it looks like we are getting to a potential problem that appears to be replicated.
Reminder: My S4000-E has four 2 TB drives installed in a RAID 5 config, resulting in 5.5 TB (5587 GB) of available space. This requires the creation of three partitions: public (created by default, which I chose to size at 2048 GB), public-2 (created by choice at 2048 GB), and public-3 (created by choice with the remaining space).
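The capacity figures above can be checked with a little arithmetic. The sketch below is only a rough illustration, not the NAS firmware's actual logic. It assumes each "2 TB" drive holds 2 x 10^12 bytes, that the web interface reports binary gigabytes (2^30 bytes), and that 2048 GB is the largest partition the box will create (inferred from the sizes chosen above). The function names are made up for this example.

```python
# Rough capacity check for a 4-drive RAID 5 array (illustrative only).

DRIVE_BYTES = 2 * 10**12   # assumption: a "2 TB" drive is 2 x 10^12 bytes
GIB = 2**30                # assumption: the NAS reports binary gigabytes
MAX_PARTITION_GB = 2048    # assumption: largest partition size, inferred from the post


def raid5_usable_bytes(drive_count, drive_bytes):
    """RAID 5 gives up one drive's worth of space to parity."""
    return (drive_count - 1) * drive_bytes


def split_into_partitions(total_gb, max_partition_gb):
    """Fill partitions up to the maximum size until the space is used up."""
    sizes = []
    remaining = total_gb
    while remaining > 0:
        size = min(remaining, max_partition_gb)
        sizes.append(size)
        remaining -= size
    return sizes


usable_gb = raid5_usable_bytes(4, DRIVE_BYTES) // GIB
partitions = split_into_partitions(usable_gb, MAX_PARTITION_GB)
print(usable_gb, partitions)
```

Under those assumptions this gives 5587 GB, split as 2048 + 2048 + 1491, which matches the figure above and leaves public-3 as the smallest partition.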
So far, NAS-1/public has restored well.
I paused restoring NAS-1/public-2 and decided to let NAS-1/public-3 restore for a while.
That is when the problem became evident.
When I began to restore NAS-1/public-3, I saw that the speed of transfer was extremely slow.
Here is what that looked like:
The picture above shows a very slow transfer. Compare that with the screenshot of NAS-1/public-2 below, which shows the expected transfer speed...
Also disturbing was finding that the system log no longer had a complete record, but appeared to start over:
Knowing that the running restore of NAS-1/public-3 may take weeks at the speed displayed, I attempted to abort the restore, finally pulling the ethernet cable out to cause a loss of network resource.
Having stopped the restore, I reconnected the ethernet cable to the NAS. All partitions were still there, as well as all physical drives still indicating YELLOW in the RAID configuration.
On reboot of the drive, the home screen showed a view that we have seen before:
A partition that thinks that it is shared, a partition that thinks that it is a backup, and a partition that is GONE.
(... and, yes, "public-3" again was at 0 bytes.)
In this case, I was able to again delete NAS-1/public-3, and then public and public-2 "came back" and the system again was ready.
The good news is that the physical drives remain in the RAID, and the RAID remains valid.
Again, I continued with the previously interrupted restore of NAS-1/public-2, with apparently no problem.
Here is the "new" home window, showing the space occupied by public and public-2:
SO.... HERE IS MY QUESTION(S) TO THE INTEL SUPPORT TEAM:
I await your input.
While waiting, I continue on with the restoration of NAS-1/public-2.
v.
Sadly, the S4000-E has again suffered a critical error.
Unless the support folk at Intel can provide a reason why, this may be the end of the pursuit of having the S4000-E work w/ 2 TB drives.
System XRAYoutput file available at http://dl.dropbox.com/u/23866842/Apr_22_xray.tgz
SUMMARY:
NAS-1/public-2 was restoring, and continued to restore.
However there was a Windows message informing me that NAS-1/public was no longer available as a resource.
On checking windows network resources, I could access the data on NAS-1/public-2, but I could not access NAS-1/public.
On checking the S4000-E, Drive #1 was dark.
On logging in, there was a Disk Change Notification message, stating that Drive #1 was no longer active, and the raid was degraded (3 of 4 drives functional).
Removing and reinserting the drive begins the rebuild process; however, NAS-1/public remains unavailable.
DETAILS:
While I did not initiate any change, here is the Disk Change Notification:
After removing the drive and reinserting, I received a rebuilding update:
If the time message is correct, then the rebuild process of that drive will take over 4 days.
However, the rebuild process may not be of much value... as after reinserting the drive there is still no access to NAS-1/public:
("admin" and "public-2" can be accessed.)
After reinserting the drive, selecting [ Continue ] on the Disk Change Notification screen would NOT allow me to proceed to the Home screen, so I could not tell if the "public" partition was still there.
I was able to run the Intel XRAY diagnostic program built in on the S4000-E. The output is available for anyone to view at: http://dl.dropbox.com/u/23866842/Apr_22_xray.tgz
While unfamiliar with all the info that could be reviewed in this data, checking the MESSAGES file, I found the record of the disk being shut down by the S4000-E. This looks like the system failing and the system choosing to shut down the drive, rather than a physical drive failure:
Apr 22 06:44:22 ZEBRAITIS-NAS-1 user.err kernel: drivers/scsi/gd31244/drv/gd31244_lld.c#2201:gd31244_device_reset: Dev Reset 0:0:0:0, dev# 0: Success
Apr 22 06:44:22 ZEBRAITIS-NAS-1 user.err kernel: drivers/scsi/gd31244/drv/gd31244_lld.c#2218:gd31244_device_reset: reconfigure device #0 failed
Apr 22 06:44:22 ZEBRAITIS-NAS-1 user.err kernel: drivers/scsi/gd31244/drv/gd31244_lld.c#2112:gd31244_bus_reset: Bus Reset called for Ho:Ch:Tgt:Lun (0:0:0:0)
Apr 22 06:44:22 ZEBRAITIS-NAS-1 user.err kernel: drivers/scsi/gd31244/drv/gd31244_lld.c#2201:gd31244_device_reset: Dev Reset 0:0:0:0, dev# 0: Success
Apr 22 06:44:22 ZEBRAITIS-NAS-1 user.err kernel: drivers/scsi/gd31244/drv/gd31244_lld.c#2218:gd31244_device_reset: reconfigure device #0 failed
Apr 22 06:44:22 ZEBRAITIS-NAS-1 user.info kernel: scsi: Device offlined - not ready after error recovery: host 0 channel 0 id 0 lun 0
Apr 22 06:44:22 ZEBRAITIS-NAS-1 user.warn kernel: SCSI error : <0 0 0 0> return code = 0x10000
Apr 22 06:44:22 ZEBRAITIS-NAS-1 user.warn kernel: end_request: I/O error, dev sda, sector 3907029008
Apr 22 06:44:22 ZEBRAITIS-NAS-1 user.err kernel: scsi0 (0:0): rejecting I/O to offline device
Apr 22 06:44:22 ZEBRAITIS-NAS-1 user.alert kernel: raid1: Disk failure on sda1, disabling device.
Apr 22 06:44:22 ZEBRAITIS-NAS-1 user.warn kernel: ^IOperation continuing on 3 devices
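For anyone else digging through the MESSAGES file from an XRAY archive, the key events in the excerpt above can be picked out with a short script. This is a minimal sketch, assuming the log lines look like the ones quoted here and that the drive uses 512-byte logical sectors. The pattern names and the scan_messages function are made up for illustration and are not part of any Intel tool.

```python
import re

# Sample lines copied from the MESSAGES excerpt above.
SAMPLE_LOG = """\
Apr 22 06:44:22 ZEBRAITIS-NAS-1 user.info kernel: scsi: Device offlined - not ready after error recovery: host 0 channel 0 id 0 lun 0
Apr 22 06:44:22 ZEBRAITIS-NAS-1 user.warn kernel: end_request: I/O error, dev sda, sector 3907029008
Apr 22 06:44:22 ZEBRAITIS-NAS-1 user.alert kernel: raid1: Disk failure on sda1, disabling device.
"""

SECTOR_BYTES = 512  # assumption: 512-byte logical sectors

# Hypothetical event names mapped to patterns seen in the excerpt.
PATTERNS = {
    "offlined": re.compile(r"Device offlined"),
    "io_error": re.compile(r"I/O error, dev (?P<dev>\w+), sector (?P<sector>\d+)"),
    "raid_failure": re.compile(r"Disk failure on (?P<member>\w+)"),
}


def scan_messages(text):
    """Return one dict per log line that matches a known pattern."""
    events = []
    for line in text.splitlines():
        for kind, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                event = {"kind": kind, **match.groupdict()}
                if kind == "io_error":
                    # Convert the sector number to a byte offset on the device.
                    event["byte_offset"] = int(match["sector"]) * SECTOR_BYTES
                events.append(event)
    return events


events = scan_messages(SAMPLE_LOG)
for event in events:
    print(event)
```

With 512-byte sectors, sector 3907029008 works out to a byte offset of roughly 2.0 x 10^12, so the failed request was near the very end of a 2 TB drive.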
Strictly out of curiosity, I will be letting the rebuild continue, just to see what will happen and to see the status of the "public" partition.
Admittedly, I have spent nearly a month working on this, and considering these issues continue on more than one box... Well, I'm nearly at the end.
I would very much like the Intel support team to look at the XRAY output and let me know what's going on.
RAID FAILURE
Turning off the S4000-E and restarting resulted in the failure of the RAID.
Even though three drives remained, and the RAID and data should have been secure, it completely failed, requiring an initialization of the drives to continue in any manner.
At this point, there is no sense in making any other posts unless the Intel support team provides guidance.
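On the point above that three remaining drives should have kept the RAID and data secure: RAID 5 stores parity so that any single missing member can be recomputed from the others. Here is a toy sketch of that idea, with short byte strings standing in for disk blocks. It is not how the S4000-E firmware is implemented, and real arrays rotate parity across the drives.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks plus one parity block, as on a 4-drive RAID 5 stripe.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Pretend the drive holding data[0] has dropped out of the array.
survivors = [data[1], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[0])
```

The lost block comes back from the three survivors, which is why a degraded 3-of-4 array would normally still be readable.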
zebraitis,
I appreciate your detail for investigating this. However, we know that the 1.4 firmware for the SS4000 was created to allow for support for drives greater than 500 GB. The SS4000 was officially discontinued July 1, 2008. The last Tested hardware and operating system list was published February 2008 and contained one "officially" tested 1TB HDD. We don't know from a validation standpoint how drives that are not tested will function. If a customer chooses to use non-validated components, the operational testing becomes their responsibility. It looks like you've performed more than enough testing to determine that the 2TB HDDs you're using may not be reliable enough.
Again, thanks for your effort and I wish the results would have been more favorable.
Regards,
John

