
    Replacing a Failed Drive in MD RAID 10

    IT Discussion
    failed drive md raid raid linux raid 10 how to
    • DustinB3403D
      DustinB3403
      last edited by DustinB3403

      So tomorrow's project (as I'm building backups and heading home for the night) will be how to determine which drive has failed using mdadm, how to identify it physically, and then how to eject the disk from the software array so it can be replaced.

      • DustinB3403D
        DustinB3403
        last edited by

        So to start let's check the array.

        0_1453936142530_chrome_2016-01-27_18-08-43.png

        Obviously sdc is in a Failed state.
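For anyone who can't see the screenshot: the failed state can also be read straight from /proc/mdstat, where the kernel tags a faulty member with "(F)". Below is a minimal sketch, assuming the usual mdstat layout; the function name failed_members is made up for illustration, and `mdadm --detail /dev/md0` gives the same information in longer form.

```shell
#!/bin/sh
# Print "<array>: <member>" for every member flagged as faulty.
# Reads mdstat-formatted text from the file given as $1
# (defaults to /proc/mdstat). failed_members is a hypothetical helper name.
failed_members() {
    awk '/^md/ && $2 == ":" {
        for (i = 3; i <= NF; i++)
            if ($i ~ /\(F\)$/) {
                dev = $i
                sub(/\[.*/, "", dev)   # strip the "[2](F)" suffix
                print $1 ": " dev
            }
    }' "${1:-/proc/mdstat}"
}

# Usage on a live system:
#   failed_members
```

It prints nothing when no member is flagged, so it can also be used in a cron check.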

        So let's see what smartctl has to say...

          smartctl -i /dev/sdc
        

        Hrm... something is off....
        0_1453936287111_chrome_2016-01-27_18-10-53.png

        So it would appear I have to update the smartctl drive database...
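A quick way to check for this without reading the whole output, as a sketch: it assumes the "something is off" part is smartctl's usual "Not in smartctl database" line, which is a guess since the screenshot isn't readable here. The helper name not_in_drivedb is made up. Newer smartmontools packages ship an update-smart-drivedb script for refreshing the database; whether the build on this server includes it is not something I can confirm.

```shell
#!/bin/sh
# Succeeds (exit 0) when `smartctl -i` output on stdin reports that the
# drive has no entry in the drive database.
# not_in_drivedb is a hypothetical helper name.
not_in_drivedb() {
    grep -q 'Not in smartctl database'
}

# Usage on the suspect disk (as root):
#   if smartctl -i /dev/sdc | not_in_drivedb; then
#       echo "sdc: no drivedb entry, database may need updating"
#   fi
```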

        • DustinB3403D
          DustinB3403
          last edited by DustinB3403

          For now I'm leaving smartctl as is (I'll have to come back to it). I don't have hot-swap capabilities on this server. An updated version of smartctl would be nice for the additional information about my disks, and it is something I want to do, but the critical point is to get this drive swapped out as quickly as possible so that I can get this server back into good running condition.

          Since I don't have hot-swap capabilities, I'm going to have to shut down the server in order to actually perform the disk exchange. Not overly complex, but adds to the risk of having to restore from backup should something go horribly wrong.

          • DustinB3403D
            DustinB3403
            last edited by

            Now there are a few guides that keep popping up in Google Search that give instructions on how to do this for RAID 1 MDADM Arrays.

            And even @scottalanmiller has recommended the same guide for RAID 10, as well as this one on SW. But again, those cover RAID 1.

            So we'll have to work through them and make sure they are still accurate for RAID 10.

            • travisdh1T
              travisdh1 @DustinB3403
              last edited by

              @DustinB3403 Should be, mdadm still works the same way.

              • DustinB3403D
                DustinB3403 @travisdh1
                last edited by DustinB3403

                Thanks, just being extra cautious to ensure this works smoothly.

                To remove the disk from the array I should have to simply type

                mdadm --manage /dev/md0 --fail /dev/sdc
                

                and then

                mdadm --manage /dev/md0 --remove /dev/sdc
                

                At this point I should be able to shut down the server with the command below, then remove the disk and add its replacement.

                 shutdown -h now
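
Since the fail and remove steps above always go together, they can be wrapped so the second only runs if the first succeeds. A minimal sketch, not tested on this server; the names run and retire_member and the DRY_RUN switch are made up for illustration, while the mdadm options are the ones shown above.

```shell
#!/bin/sh
# Print the command instead of executing it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Mark a member faulty, then remove it from the array.
# $1 = array (e.g. /dev/md0), $2 = member (e.g. /dev/sdc)
retire_member() {
    run mdadm --manage "$1" --fail "$2" &&
    run mdadm --manage "$1" --remove "$2"
}

# Preview first, then run for real as root:
#   DRY_RUN=1 retire_member /dev/md0 /dev/sdc
#   retire_member /dev/md0 /dev/sdc
```

The dry run is there so the device names can be double-checked before anything touches the array.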
                
                • DustinB3403D
                  DustinB3403
                  last edited by DustinB3403

                  Obviously at this point there is some manual labor involved since I have no hot-swap capabilities. If your server has hot-swap you can just pull the drive at this point and add the replacement disk.

                  • DustinB3403D
                    DustinB3403
                    last edited by

                    I'm at a stand-still as I wait for my replacement disk to arrive, so this project will have to get picked up in a day or so.

                    • travisdh1T
                      travisdh1 @DustinB3403
                      last edited by

                      Yep. After putting a replacement drive in, just add it back.

                      mdadm --manage /dev/md0 --add /dev/sd?
                      

                      I like to keep an eye on the rebuild process with:

                      watch cat /proc/mdstat
                      

                      Once the rebuild finishes, the array should be back to normal.
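
If watching the screen isn't practical, the progress line can be pulled out of /proc/mdstat instead. A rough sketch, assuming the usual "recovery = NN.N%" line that md prints while rebuilding; rebuild_status is a made-up helper name.

```shell
#!/bin/sh
# Print "<action> <percent>" for each array that is rebuilding,
# or "idle" when no recovery or resync is in progress.
# Reads the file given as $1 (defaults to /proc/mdstat).
# rebuild_status is a hypothetical helper name.
rebuild_status() {
    awk '/(recovery|resync) *=/ {
        for (i = 2; i < NF; i++)
            if ($i == "=") { print $(i - 1), $(i + 1); found = 1 }
    }
    END { if (!found) print "idle" }' "${1:-/proc/mdstat}"
}

# Block until the rebuild is done, checking once a minute:
#   while [ "$(rebuild_status)" != idle ]; do sleep 60; done
```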

                      • coliverC
                        coliver
                        last edited by

                        How did you figure out what drive it was in the array? Or did you pull them until you saw the one with that serial number?

                        • DustinB3403D
                          DustinB3403 @coliver
                          last edited by

                          How do I know which disk it is?

                          Well, the other day I noticed that the array had a failed disk. Since I was rebuilding the system anyway, I pulled each disk and ran a check disk from Windows, including a scan for bad sectors.

                          Only 1 disk was found with bad sectors.

                          Knowing which disk this was, and with Windows saying it had fixed the problem, I re-added the disk and simply "remembered" which disk had the bad sectors.

                          So this disk is the disk that has to be removed.

                          • coliverC
                            coliver @DustinB3403
                            last edited by

                            OK, so you wouldn't be able to figure this out from the Linux CLI; you would have to have a record of the serial number of the drive in each bay.

                            • DustinB3403D
                              DustinB3403 @coliver
                              last edited by

                              @coliver Pretty much.

                              Since there is no hot-swap function on my server (no indicator lights either) it's simply a matter of my knowing which disk is connected to which SATA port.
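
For the serial number half of that, the CLI can help: on systems where udev populates /dev/disk/by-id, SATA disks usually appear there as ata-<model>_<serial> symlinks pointing at the sdX node. A sketch, assuming that directory exists on this server (not confirmed); list_disk_ids is a made-up helper name. Matching a serial to a physical port still needs a label on the drive or a written record.

```shell
#!/bin/sh
# Print "<kernel name> <by-id name>" for each whole SATA disk.
# $1 = directory of by-id symlinks (defaults to /dev/disk/by-id).
# list_disk_ids is a hypothetical helper name.
list_disk_ids() {
    dir=${1:-/dev/disk/by-id}
    for link in "$dir"/ata-*; do
        [ -L "$link" ] || continue                        # no match or not a symlink
        case $link in *-part[0-9]*) continue ;; esac      # skip partition entries
        printf '%s %s\n' "$(basename "$(readlink "$link")")" "$(basename "$link")"
    done
}

# Find the serial for the failed member:
#   list_disk_ids | grep '^sdc '
```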

                              • DustinB3403D
                                DustinB3403
                                last edited by DustinB3403

                                So at this point I have the disk marked as failed, and removed from the array as shown below.

                                0_1453994344578_XenCenterMain_2016-01-28_10-18-57.png

                                As you can see, sdc is not part of the array at the moment, which means nothing will be written to the disk. Obviously I'm at a dangerous point in time.

                                If I can't get my replacement disk soon, I risk losing the entire array.

                                Now, because I've already had issues with this array (specifically this disk), I have nothing running on this system that I don't have several backups of. So the drive has been ordered and will be here in a day or so.

                                At which point I'll shut down the server, remove the bad disk, and put the new one in.

                                • DustinB3403D
                                  DustinB3403
                                  last edited by

                                  While I wait for that drive to arrive, I'm going to figure out how to configure email alerts for the mdadm array, seeing as this would be incredibly useful to have.

                                  Since I can't sit here watching cat /proc/mdstat... 🙂

                                  • travisdh1T
                                    travisdh1 @DustinB3403
                                    last edited by

                                    No remote ssh access?

                                    • DustinB3403D
                                      DustinB3403 @travisdh1
                                      last edited by

                                      @travisdh1 I do have access, but I'm still not going to sit here and watch it.

                                      • DustinB3403D
                                        DustinB3403
                                        last edited by DustinB3403

                                        So now that I have the email alerts configured for my Xen Servers, I really want to work on updating SmartCTL so it supports the drives that I have in this server.

                                        Which are pretty common drives.

                                        Western Digital Red 1TB.

                                        I'm really surprised at how old the drive database built into XenServer 6.5 is.

                                        So time to figure this part out.

                                        • JaredBuschJ
                                          JaredBusch @DustinB3403
                                          last edited by

                                          @DustinB3403 said:

                                          So now that I have the email alerts configured for my Xen Servers, I really want to work on updating SmartCTL so it supports the drives that I have in this server.

                                          Which are pretty common drives.

                                          Western Digital Red 1TD.

                                          I'm really surprised how old of a database is built into XenServer 6.5.

                                          So time to figure this part out.

                                          WTF is a TD?

                                          • DustinB3403D
                                            DustinB3403 @JaredBusch
                                            last edited by

                                            That would be a typo, whoops.

                                            1TB.
