8 Replies · Latest reply on Jun 25, 2012 9:28 AM by john_s@intel

    Intel SS4200-E - How to get back some data with SSH?

    kloog

      Hi,

       

      After a reboot of my SS4200-E, all the LEDs turned amber, and I can no longer access my data.

      My NAS uses 4 disks in parity mode.

       

      I saw many posts about this subject on this forum and tried several of the suggested solutions:

       

      - Did a hard reset -> unsuccessful

      - Bought a brand new disk and tried swapping it with one of the original disks, 4 times -> unsuccessful

      - Connected with SSH, but "ps -af" didn't show any "e2fsck" process -> unsuccessful

       

      When I ran "mdadm --examine sda1" (and the same for sdb1, sdc1 and sdd1), I understood that I have at least 2 faulty disks, I think, but I'm not sure (actually, I don't fully understand the output of the commands I used).

       

      My question is:

       

      Is there a way to recover some data from my degraded array?

      Of my 3 TB of data, there are only 70 GB of very important files that I hope to recover.

      I hope the array is not completely unreadable, is it?

       

       

      Here is the output of "mdadm --examine":

      # mdadm --examine sda1 sdb1 sdc1 sdd1

      sda1:

                Magic : a92b4efc

              Version : 00.90.00

                 UUID : 7a5ebb7c:46fc5931:47ed1e69:69a35450

        Creation Time : Tue Sep 23 11:32:57 2008

           Raid Level : raid5

          Device Size : 976759936 (931.51 GiB 1000.20 GB)

           Array Size : 2930279808 (2794.53 GiB 3000.61 GB)

         Raid Devices : 4

        Total Devices : 4

      Preferred Minor : 0

       

          Update Time : Sat Jun  2 12:51:50 2012

                State : clean

      Active Devices : 2

      Working Devices : 3

      Failed Devices : 2

        Spare Devices : 1

             Checksum : eef3e5e6 - correct

               Events : 0.52370

       

               Layout : left-asymmetric

           Chunk Size : 32K

       

            Number   Major   Minor   RaidDevice State

      this     0     254        0        0      active sync   /dev/evms/.nodes/sda1

       

         0     0     254        0        0      active sync   /dev/evms/.nodes/sda1

         1     1       0        0        1      faulty removed

         2     2       0        0        2      faulty removed

         3     3     254        3        3      active sync   /dev/evms/.nodes/sdb1

         4     4     254        1        4      spare   /dev/evms/.nodes/sdc1

      sdb1:

                Magic : a92b4efc

              Version : 00.90.00

                 UUID : 7a5ebb7c:46fc5931:47ed1e69:69a35450

        Creation Time : Tue Sep 23 11:32:57 2008

           Raid Level : raid5

          Device Size : 976759936 (931.51 GiB 1000.20 GB)

           Array Size : 2930279808 (2794.53 GiB 3000.61 GB)

         Raid Devices : 4

        Total Devices : 4

      Preferred Minor : 0

       

          Update Time : Sat Jun  2 12:51:50 2012

                State : clean

      Active Devices : 2

      Working Devices : 3

      Failed Devices : 2

        Spare Devices : 1

             Checksum : eef3e5e9 - correct

               Events : 0.52370

       

               Layout : left-asymmetric

           Chunk Size : 32K

       

            Number   Major   Minor   RaidDevice State

      this     4     254        1        4      spare   /dev/evms/.nodes/sdc1

       

         0     0     254        0        0      active sync   /dev/evms/.nodes/sda1

         1     1       0        0        1      faulty removed

         2     2       0        0        2      faulty removed

         3     3     254        3        3      active sync   /dev/evms/.nodes/sdb1

         4     4     254        1        4      spare   /dev/evms/.nodes/sdc1

      sdc1:

                Magic : a92b4efc

              Version : 00.90.00

                 UUID : 7a5ebb7c:46fc5931:47ed1e69:69a35450

        Creation Time : Tue Sep 23 11:32:57 2008

           Raid Level : raid5

          Device Size : 976759936 (931.51 GiB 1000.20 GB)

           Array Size : 2930279808 (2794.53 GiB 3000.61 GB)

         Raid Devices : 4

        Total Devices : 4

      Preferred Minor : 0

       

          Update Time : Sat Jun  2 12:51:37 2012

                State : active

      Active Devices : 3

      Working Devices : 4

      Failed Devices : 1

        Spare Devices : 1

             Checksum : eef3193a - correct

               Events : 0.52365

       

               Layout : left-asymmetric

           Chunk Size : 32K

       

            Number   Major   Minor   RaidDevice State

      this     2     254        2        2      active sync   /dev/evms/.nodes/sdd1

       

         0     0     254        0        0      active sync   /dev/evms/.nodes/sda1

         1     1       0        0        1      faulty removed

         2     2     254        2        2      active sync   /dev/evms/.nodes/sdd1

         3     3     254        3        3      active sync   /dev/evms/.nodes/sdb1

         4     4     254        1        4      spare   /dev/evms/.nodes/sdc1

      sdd1:

                Magic : a92b4efc

              Version : 00.90.00

                 UUID : 7a5ebb7c:46fc5931:47ed1e69:69a35450

        Creation Time : Tue Sep 23 11:32:57 2008

           Raid Level : raid5

          Device Size : 976759936 (931.51 GiB 1000.20 GB)

           Array Size : 2930279808 (2794.53 GiB 3000.61 GB)

         Raid Devices : 4

        Total Devices : 4

      Preferred Minor : 0

       

          Update Time : Sat Jun  2 12:51:50 2012

                State : clean

      Active Devices : 2

      Working Devices : 3

      Failed Devices : 2

        Spare Devices : 1

             Checksum : eef3e5ef - correct

               Events : 0.52370

       

               Layout : left-asymmetric

           Chunk Size : 32K

       

            Number   Major   Minor   RaidDevice State

      this     3     254        3        3      active sync   /dev/evms/.nodes/sdb1

       

         0     0     254        0        0      active sync   /dev/evms/.nodes/sda1

         1     1       0        0        1      faulty removed

         2     2       0        0        2      faulty removed

         3     3     254        3        3      active sync   /dev/evms/.nodes/sdb1

         4     4     254        1        4      spare   /dev/evms/.nodes/sdc1

        • 2. Re: Intel SS4200-E - How to get back some data with SSH?
          john_s@intel

          kloog,

           

          Your mdadm readout shows State: clean, Active Devices: 2 and Failed Devices: 2. With RAID 5, an array with two failed disks is generally not recoverable.

           

          You can check whether you still have access (whether the share subdirectories exist) at:
          /mnt/soho_storage/samba/shares/
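
          For example, a quick check from the SSH shell (just a sketch; the actual share names depend on your setup):

          ls -l /mnt/soho_storage/samba/shares/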

           

          On every mount of the file system, the Storage System software does a file system check. This replays the ext3 journal, correcting any file system issues that may have been caused by an improper shutdown. If this fails, a full file system check is performed.

          The full check will make the file system mountable if possible. It may also recover some user data that would otherwise be lost and place it in the lost+found directory, located at /mnt/soho_storage/lost+found. If the full file system check doesn't make the file system mountable, the user data is likely permanently lost. However, it may be worthwhile to attempt the file system check manually:

          e2fsck -f /dev/evms/md0vol1
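
          If you'd rather see what the check would do before letting it make any changes, a read-only pass first is an option (a sketch; the -n flag makes e2fsck open the file system read-only and answer "no" to all prompts):

          e2fsck -n /dev/evms/md0vol1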

           

          If this finishes successfully, attempt to mount the file system:

          mount /dev/evms/md0vol1 /mnt/soho_storage -t ext3
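
          A quick way to confirm the volume actually came up (a sketch):

          df -h /mnt/soho_storage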

           

          If the mount succeeds, the file system is now available. It's possible that some user files were lost in the file system corruption and recovered by the full file system check; they would be placed in /mnt/soho_storage/lost+found. If that is the case, the data can be made available to the user by copying it to the public folder:

          cp -r /mnt/soho_storage/lost+found /mnt/soho_storage/samba/shares/public

           

          The files in lost+found may not be complete, or may be corrupted, but the user will now have access to some or all of their data.  The Storage System device should now be rebooted to restart all services.

           

          Be very careful while logged into the system as root; a mistake can make the system unusable.

           

          If the above steps don't make the SS4200 data available, you'll need to recover data from your backup.

           

          Regards,
          John

          • 3. Re: Intel SS4200-E - How to get back some data with SSH?
            kloog

            Thanks for your answer, John.

             

            Unfortunately, I have nothing in the /mnt/soho_storage/ folder:


            # cd /mnt/soho_storage

            # ls -l

            -r-xr-xr-x    1 root     root          881 Jun 21 17:14 smb.conf

             

            And it seems that I don't have md0vol1 either:

             

            #  e2fsck -f /dev/evms/md0vol1

            e2fsck 1.38 (30-Jun-2005)

            e2fsck: No such file or directory while trying to open /dev/evms/md0vol1

            # pwd

            /dev/evms

            # ls -l

            drwxr-xr-x    2 root     root           60 Jun 21 17:13 dm

             

            Is this dead?

            • 4. Re: Intel SS4200-E - How to get back some data with SSH?
              john_s@intel

              Kloog,

               

              It may be possible to recover the array if 3 of the 4 original drives are available and functioning. 

               

              With a RAID 5 array, the following command can be used to attempt to reassemble it:

               

              mdadm --assemble --force /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1

               

              If that's not successful, use:

               

              mdadm --examine /dev/sdX1 (where X is the specific disk you want to inspect: a, b, c or d) to determine which disk drive failed. Check whether all working disks report an identical UUID after issuing this command on each disk.
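
              To compare the UUIDs side by side, the relevant lines can be pulled out in one pass (a sketch, assuming the grep on the unit supports -E; the ^/dev anchor keeps only the per-device header lines):

              mdadm --examine /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 | grep -E '^/dev|UUID'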

               

              In the case where 1 of the 4 disks has a different UUID and the others are the same, try the following command:

               

              mdadm --assemble --run --force /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdd1 (assuming here that sdc1 has the different UUID and sda1, sdb1 and sdd1 are the same; this command will force assembly of md0 without the mismatched disk)

               

              If this is successful, the array will begin recovery. Use cat /proc/mdstat to monitor the progress. If it does not succeed, whether because one or more of the drives is bad, the hardware is bad, or any other reason, the user data is irretrievably lost.

              If it does succeed, the user data may still be recoverable. Reboot the device; the file system, including the user data, is still on the array and, if not too badly corrupted, will be available. The Storage System device will find the file system and check it. If it can be mounted, it will be. If not, you may try a manual check as described in the previous section.
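
              If the shell on the unit doesn't have watch, a simple loop is enough to keep an eye on the rebuild (a sketch; stop it with Ctrl-C):

              while true; do cat /proc/mdstat; sleep 60; done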

               

              It's possible the file system check succeeded and created a lost+found area. If so, copy the data out as described in the previous section.

               

              Good luck,

              John

              • 5. Re: Intel SS4200-E - How to get back some data with SSH?
                kloog

                Thank you, John.

                I tried it; here are the logs below.

                I understand that it didn't work, did it?


                # mdadm --assemble --force /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1

                mdadm: forcing event count in /dev/sdc1(2) from 52370 upto 52374

                mdadm: clearing FAULTY flag for device 2 in /dev/md0 for /dev/sdc1

                mdadm: /dev/md0 has been started with 3 drives (out of 4) and 1 spare.

                 

                # cat /proc/mdstat

                Personalities : [raid0] [raid1] [raid6] [raid5] [raid4]

                md0 : active raid5 sda1[0] sdb1[4] sdd1[3] sdc1[2]

                      2930279808 blocks level 5, 32k chunk, algorithm 0 [4/3] [U_UU]

                      [=============>.......]  recovery = 66.3% (648277552/976759936) finish=62.2min speed=87882K/sec

                unused devices: <none>

 

                After 2 hours...

                # cat /proc/mdstat

                Personalities : [raid0] [raid1] [raid6] [raid5] [raid4]

                md0 : active raid5 sda1[0] sdb1[4](S) sdd1[3] sdc1[5](F)

                      2930279808 blocks level 5, 32k chunk, algorithm 0 [4/2] [U__U]

                 

                Then I rebooted:

                 

                # cat /proc/mdstat

                Personalities : [raid0] [raid1] [raid6] [raid5] [raid4]

                unused devices: <none>

                # pwd

                /mnt/soho_storage

                # ls

                smb.conf

                • 6. Re: Intel SS4200-E - How to get back some data with SSH?
                  john_s@intel

                  Unfortunately, it doesn't look like it, kloog. If things were correctly in place, the reboot would have mounted md0 through the normal process.

                   

                  John

                  • 7. Re: Intel SS4200-E - How to get back some data with SSH?
                    kloog

                    OK, thank you John.

 

                    I tried everything I could; now I'm ready to say goodbye to my data.

 

                    Bye

                    • 8. Re: Intel SS4200-E - How to get back some data with SSH?
                      john_s@intel

                      kloog,

                       

                      I understand this information comes after the fact, but we cannot stress enough the importance of a backup solution for important data. Computer systems can and do fail. Data should be stored in multiple locations to increase the chance of recovering from a failure.

                       

                      See our web solution, "Planning for the worst case: the importance of a backup solution", for more information.

                       

                      Regards,

                      John