
    Hot Swap vs. Blind Swap

    Announcements
    storage raid hot swap blind swap cold swap
    • scottalanmiller

      There are three different concepts of drive swapping: cold, hot, and blind (or blind hot).

      Cold Swapping: No enterprise or business class servers would ever require cold swapping of hard drives, although a shocking number of IT pros assume that cold swapping is required or recommended. Cold swapping of drives means that the entire server is powered down before being able to replace a failed drive. This defeats much of the value of RAID and is mostly a product of consumer desktop products.

      Hot Swapping: The ability to replace a hard drive while the server is still running. This allows for zero downtime drive replacement so, in theory, an array could be replaced, drive by drive, over a period of time many times over without the server ever needing to be powered down. Any enterprise class or business class server will be hot swap by definition. Lacking this feature would disqualify a device from being considered business ready as a server.

      Blind Swapping: Generally a feature unique to hardware RAID systems. This is an extension of hot swapping that includes not needing to interact with the operating system first. Hot swapping alone does not imply that no interaction is needed. Blind swapping is popular in large datacenters so that datacenter staff who do not have access to the operating system can replace failed drives without any interaction from the systems administrators.
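
To make the distinction between the three concepts concrete, here is a minimal Python sketch. The names `SwapMode`, `replacement_steps` and `causes_downtime` are hypothetical, made up for illustration, and the step lists are a simplification of the descriptions above rather than any vendor's procedure.

```python
from enum import Enum


class SwapMode(Enum):
    COLD = "cold"
    HOT = "hot"
    BLIND = "blind"


def replacement_steps(mode: SwapMode) -> list[str]:
    """Return the (simplified, illustrative) steps to replace a failed drive."""
    if mode is SwapMode.COLD:
        # The whole server goes down before the drive can be touched.
        return ["power down the server", "replace the drive", "power the server back on"]
    if mode is SwapMode.HOT:
        # The server stays up, but the OS or controller may need to be told.
        return [
            "tell the system the drive is being removed (if required)",
            "replace the drive",
            "tell the system to add the new drive (if required)",
        ]
    # Blind swap: no interaction with the operating system at all.
    return ["replace the drive"]


def causes_downtime(mode: SwapMode) -> bool:
    """Only cold swapping requires the server to be powered down."""
    return mode is SwapMode.COLD


for mode in SwapMode:
    print(mode.value, causes_downtime(mode), replacement_steps(mode))
```

The point of the sketch is that blind swap is the hot swap case with the two "tell the system" steps removed, which is why datacenter staff without OS access can do it.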

      • BRRABill

        Can you pull a hotplug drive out at any time, or is that dependent on server manufacturer and RAID card?

        • scottalanmiller @BRRABill

          @BRRABill said:

          Can you pull a hotplug drive out at any time, or is that dependent on server manufacturer and RAID card?

          Depends what you are asking.

          The hardware will determine if pulling a hot plug drive will cause a short. Hot plug hardware will allow a random drive yank to not cause an electrical issue.

          Hot plug software lets you tell the OS that a drive has been removed and then have the OS add the new drive when you replace it.

          Blind swap allows you to just walk up to an array, pull a drive without preparing anything, and put a new one in without needing to tell the system what you have done.
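
As a rough illustration of what "telling the OS" can look like, the sketch below builds the commands an administrator might run for a non-blind hot swap. It assumes Linux md software RAID managed with `mdadm`, which is my assumption and not something mentioned in this thread; the array and device names are hypothetical. The commands are only built and printed, never executed.

```python
def md_hot_swap_commands(array: str, failed_member: str, new_member: str) -> list[list[str]]:
    """Build (but do not run) mdadm commands for a hot swap on Linux md RAID.

    Assumption: software RAID via mdadm. All device names are placeholders.
    """
    return [
        # Before pulling the drive: mark it failed and remove it from the array.
        ["mdadm", array, "--fail", failed_member],
        ["mdadm", array, "--remove", failed_member],
        # After inserting the replacement: add it so the rebuild can start.
        ["mdadm", array, "--add", new_member],
    ]


def blind_swap_commands() -> list[list[str]]:
    """A blind swap needs no administrator commands at all."""
    return []


# Hypothetical device names, for illustration only.
for cmd in md_hot_swap_commands("/dev/md0", "/dev/sdb1", "/dev/sdb1"):
    print(" ".join(cmd))
```

With a blind swap capable controller the command list is empty: the controller notices the pull and the new drive and handles the rest itself.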

          • BRRABill

            I ask because I had an issue yesterday on our DELL server. Which, admittedly, is very, very old. Experienced, I should say. No one likes to be called old.

            It's our main data server. One of two servers that really matter.

            We have 4 drives in a RAID5 array. (This is from the dark ages when that was considered OK.)

            I went into the server room for something else, and noticed one of the drives was blinking amber. I go from a 1 to a 5 on the 1 to 10 anxiety scale because that kind of stuff always makes me nervous. Anyway, no problem, I have spare drives on the shelf ready to go. I pull out the old drive. No problem. I put in the new drive, no problem. I go to log in to start rebuilding the array, and I notice that the server is rebooting. Hmm, that's odd. I look at the drive. Now TWO of the four are blinking amber. I've now gone to a 10, LOL.

            Turns out a second drive failed after I did the hot plug. I'm not sure if it was just random (which seems unlikely) or something weird happened during the hot plug.

            I spent a long, long time getting everything back to how it was.

            • scottalanmiller

              RAID 5 induces other failures when you go to rebuild. It's extremely common and just an artifact of that RAID level. Doesn't mean that it will always do it or even normally do it, but it is very common. Once you do a drive swap it immediately increases the load on the drives and makes them more likely to fail.
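
To put a rough number on that extra load: in a conventional RAID 5 rebuild, every surviving drive is read end to end to reconstruct the missing one. The sketch below estimates how much data that is and how long the drives stay busy. The drive size and sustained read speed are made-up illustrative values, not figures from the server discussed here, and `rebuild_read_load` is a hypothetical helper.

```python
def rebuild_read_load(drive_count: int, drive_size_tb: float, read_mb_per_s: float):
    """Estimate the read load of a conventional RAID 5 rebuild.

    Assumptions (illustrative only): every surviving drive is read in full,
    the drives are read in parallel, and the read speed is sustained.
    """
    surviving = drive_count - 1
    total_read_tb = surviving * drive_size_tb
    # Drives are read in parallel, so the duration is bounded by one full drive read.
    seconds = drive_size_tb * 1_000_000 / read_mb_per_s  # 1 TB = 1,000,000 MB
    return surviving, total_read_tb, seconds / 3600


# A 4-drive array like the one described above, with hypothetical 1 TB drives at 100 MB/s.
surviving, total_tb, hours = rebuild_read_load(4, 1.0, 100.0)
print(f"{surviving} drives read in full, {total_tb} TB total, about {hours:.1f} hours at full load")
```

In practice a rebuild on a server that is still in use takes longer than this lower bound, which only lengthens the window in which a second drive can fail.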

              • BRRABill

                Interesting. The second failed drive definitely sounded like it was dead...mechanical issue.

                I think that happened to me a long time ago on a server, which is why I'm always nervous doing it.

                THOUGH thanks to ML I'll never have another RAID 5 array, so no need to worry!

                It doesn't do that for any other RAID level?

                And I am assuming RAID 5 of SSDs wouldn't do that?

                • scottalanmiller @BRRABill

                  @BRRABill SSDs do suffer from mechanically induced failed like Winchester drives.

                  • scottalanmiller

                    RAID 6 induces even more immediate wear and tear, so it is even more likely to kill off a second drive at the time of drive replacement, and it has one extra drive that can fail. But it can withstand losing one additional drive, so it is dramatically safer overall.
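
The trade-off can be sketched with a simple binomial model. The per-drive failure probabilities below are invented purely for illustration, with RAID 6 deliberately given the higher one to reflect the extra wear, and the model assumes drives fail independently during the rebuild, which is a simplification. `prob_array_loss` is a hypothetical helper.

```python
from math import comb


def prob_array_loss(surviving_drives: int, tolerated_failures: int, p_fail: float) -> float:
    """Probability that more drives fail during the rebuild than the array can tolerate.

    Assumes each surviving drive fails independently with probability p_fail
    during the rebuild window (an illustrative simplification).
    """
    p_survive = sum(
        comb(surviving_drives, k) * p_fail**k * (1 - p_fail) ** (surviving_drives - k)
        for k in range(tolerated_failures + 1)
    )
    return 1 - p_survive


# RAID 5, 4 drives, one already failed: 3 survivors, no further failure tolerated.
raid5 = prob_array_loss(surviving_drives=3, tolerated_failures=0, p_fail=0.05)
# RAID 6, 5 drives (one extra), one already failed: 4 survivors, one more tolerated.
# A higher made-up p_fail stands in for the heavier rebuild wear.
raid6 = prob_array_loss(surviving_drives=4, tolerated_failures=1, p_fail=0.08)

print(f"RAID 5 loss chance during rebuild: {raid5:.3f}")
print(f"RAID 6 loss chance during rebuild: {raid6:.3f}")
```

Even with more drives and a higher assumed per-drive failure chance, the RAID 6 array comes out well ahead in this model because it takes two further failures, not one, to lose it.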

                    • BRRABill @scottalanmiller

                      @scottalanmiller said:

                      @BRRABill SSDs do suffer from mechanically induced failed like Winchester drives.

                      Is the rate the same? Or is this a random (but common) thing?

                      • scottalanmiller @BRRABill

                        @BRRABill said:

                        @scottalanmiller said:

                        @BRRABill SSDs do suffer from mechanically induced failed like Winchester drives.

                        Is the rate the same? Or is this a random (but common) thing?

                        Sorry that was a typo. SSDs do NOT suffer mechanically induced failure.

                        • BRRABill

                          Oh. Phew.

                          What is the point of RAID if that happens?

                          That's it. I'm quitting IT.

                          I've had enough.

                          • scottalanmiller @BRRABill

                            @BRRABill not much point to RAID 5, that's what we've been saying for years. By 2009 it was so dangerous that it was actually worse in most cases than doing nothing at all.

                            • BRRABill

                              Well this server is from well before 2009.

                              It's a miracle nothing has happened yet.

                              • scottalanmiller

                                That is indeed pretty old.

                                • BRRABill

                                  I'm not going to say exactly HOW old because I'm not sure I can take any more heads shaking at me this month. LOL.

                                  • drewlander

                                    @BRRABill said:

                                    0 anxiety scale because that kind of stuff always makes me nervous. Anyway, no problem, I have spare drives on the shelf ready to go. I pull out the

                                    In complete honesty I will admit that one time I was cold swapping a failed drive in a ProLiant DL360 G5 and replaced the wrong one. Fortunately the server wouldn't even boot and I was able to power it down, sort it out and bring it back up. Since then I will never run a server without the backplane kit and hot swappable drive caddies with the status indicator LED.

                                    • drewlander @BRRABill

                                      @BRRABill Sounds like a situation I had to deal with last year where an organization was running Dell PowerEdge 2950 Gen II pizza boxes. I tried reasoning with them explaining that 9 year old servers should not be production machines for mission critical systems. They didn't seem to care about business continuity until they started failing.

                                      • BRRABill @drewlander

                                        @drewlander said:

                                        @BRRABill Sounds like a situation I had to deal with last year where an organization was running Dell PowerEdge 2950 Gen II pizza boxes. I tried reasoning with them explaining that 9 year old servers should not be production machines for mission critical systems. They didn't seem to care about business continuity until they started failing.

                                        This was a PowerEdge 2800. I've been kind of proud of the fact that I kept these things up and running for so long. And considering the low RAM and age, they still run awesome.

                                        BUT ... like I said it's a miracle that things haven't gone south quicker. The second drive that failed was a replacement drive, which of course was not new.

                                        Key point, as in anything, is to always have a good backup. 🙂

                                        • scottalanmiller

                                          We once had a set of Compaq Proliant 800s that made it a decade without failing. They were all retired effectively still healthy - just old and worthless.

                                          • BRRABill @scottalanmiller

                                            @scottalanmiller said:

                                            We once had a set of Compaq Proliant 800s that made it a decade without failing. They were all retired effectively still healthy - just old and worthless.

                                            That's about where we are. I've hung lucky mementos in there, and am hoping for the best. 🙂

                                            I actually have a construction paper good luck charm a vendor's wife once gave me a long time ago (before these servers even) that's actually hanging in there. It has done its job pretty well so far.
