    How Reliable Is Your Server

    IT Discussion
    Tags: best practice, server, risk, risk analysis
    • scottalanmiller

      Servers are unreliable things, right? We hear about this all of the time. Everyone is concerned that their server(s) will fail. They fail left and right. It happens all of the time. Servers are fragile and need risk mitigation for nearly all situations. They don't even have internally redundant components, right?

      Wrong. None of this is true. Once upon a time it was true that commodity servers and their corresponding operating systems were highly fragile, but that was the early 1990s, and even then the failure risks were mostly limited to the storage layer and to those systems where cost-cutting measures reduced reliability far below what was available at the time.

      Enterprise servers have long been highly reliable, even going back to the 1970s. Commodity servers entered this world of high reliability by the late 1990s and moved even closer in the 2000s, especially with the advent of 64-bit computing and full virtualization. Today commodity enterprise servers like those from HPE's ProLiant line and Dell's PowerEdge line are incredibly reliable. When properly designed, built and maintained, their reliability might move towards the six nines range! This puts a normal server well into consideration for "high availability" right from the outset today.
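For a sense of scale, a "nines" figure translates directly into allowed downtime per year. The snippet below is only a sketch of that arithmetic, assuming a 365-day year; the function name is made up for illustration. Six nines works out to roughly 31.5 seconds of downtime per year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def downtime_minutes_per_year(nines: int) -> float:
    """Allowed downtime per year for an availability of N nines (3 -> 99.9%)."""
    unavailability = 10.0 ** -nines
    return MINUTES_PER_YEAR * unavailability


if __name__ == "__main__":
    # Print the yearly downtime budget for two through six nines.
    for n in range(2, 7):
        minutes = downtime_minutes_per_year(n)
        print(f"{n} nines: {minutes:10.4f} min/year ({minutes * 60:10.1f} s/year)")
```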

      Standard servers do this through a couple of techniques. One is simply using very solid, well engineered components. Parts like processors and motherboards have come a very long way and almost never fail, even after a decade or more of continuous abuse. But some parts will always carry some risk, with power supplies and hard drives being among the riskiest. In modern enterprise commodity servers nearly all such components are redundant, field serviceable and nearly always hot swappable. Hot swap power supplies, hard drives, fans and more are standard. Pretty much every component with significant risk is already redundant and field replaceable, and can be replaced live without any downtime even after a component has failed. And others, like NICs, are often redundant as well.
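The reason redundancy helps so much with the riskier parts can be sketched with basic probability: if any one working copy is enough, the set is only down when every copy is down at once. The code below is a rough model that assumes failures are independent, which real hardware only approximates. The 0.999 figure is an illustrative placeholder and not a vendor specification.

```python
def redundant_availability(component_availability: float, copies: int) -> float:
    """Availability of N redundant copies where any one working copy is enough.

    Assumes failures are independent, which real hardware only approximates.
    """
    if not 0.0 <= component_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if copies < 1:
        raise ValueError("need at least one copy")
    # The set fails only if every copy is unavailable at the same time.
    return 1.0 - (1.0 - component_availability) ** copies


if __name__ == "__main__":
    single_psu = 0.999  # illustrative figure only, not a measured value
    for copies in (1, 2):
        print(copies, redundant_availability(single_psu, copies))
```

Under those assumptions, a second power supply takes the pair from three nines to about six nines, which is why the parts most likely to fail are the ones that ship doubled up.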

      Even two decades ago it was standard to have hot swappable PCI slots so that support components could be replaced without downtime!

      Of course these are only commodity servers that we are talking about. Today even AMD64 architecture servers are available in non-commodity approaches (minicomputers and better). RAS features (reliability, availability and serviceability) on mini (HPE Integrity, Superdome, Oracle M, IBM Power, Fujitsu SPARC) and mainframe systems are extreme and go far beyond what can be done with commodity servers. Hot swappable memory, backplanes, CPUs, controllers and even motherboards are available as standard. Downtime isn't a word that systems like these know, at all.

      Simply put, servers today are not the fragile things that they were twenty or thirty or even forty years ago. Servers are generally rock solid, incredibly reliable devices. The idea that servers will simply die regularly, that they are unreliable and need to be protected from hardware failure in all cases, is emotional, irrational and based on the fears of not just a different era, but a different generation entirely.

      Before giving in to fear that your server will stop functioning every few months, take a minute to think... perhaps your servers are more reliable than you give them credit for.

      • scottalanmiller

        And all of that is before we introduce software technologies like virtualization that allow us to build server clusters. There are many software techniques available on commodity servers today that were available only on minicomputers and better two decades or more ago. Good hardware, software, system design and maintenance in combination are what produce highly reliable systems. Modern approaches in all of these areas are providing us with reliability and uptimes that were unthinkable just two decades ago.

        • BRRABill

          I am more afraid of the software ON the server than the server itself.

          Oh, and recently, USB boot devices. 🙂

          • Alex Sage @BRRABill

            @BRRABill said :

            Oh, and recently, USB boot devices. 🙂

            That's why you always have two USB devices 😉

            • scottalanmiller @BRRABill

              @BRRABill said in How Reliable Is Your Server:

              I am more afraid of the software ON the server than the server itself.

              Oh, and recently, USB boot devices. 🙂

              You don't protect that in the same way. You just restore from backup.

              • DustinB3403 @scottalanmiller

                @scottalanmiller said in How Reliable Is Your Server:

                @BRRABill said in How Reliable Is Your Server:

                I am more afraid of the software ON the server than the server itself.

                Oh, and recently, USB boot devices. 🙂

                You don't protect that in the same way. You just restore from backup.

                How many spare bootable USB clones would you recommend for most SMBs?

                It might seem like an odd question but I'm sure someone is asking it.

                I keep 1 spare at all times for my USB booted systems.

                • MattSpeller

                  Humans are the weakness in the chain, have been for a long time now.

                  Chances are exceptionally good that if you are having issues, PEBKAC.

                  • Danp @DustinB3403

                    @DustinB3403 Just order 3 more of these from Amazon for $12.49 each (current Deal of the Day).

                    • BRRABill

                      Just seems like a strange thing, to have to keep spare USBs around because they fail.

                      I mean, yeah they are cheaper, but what is the time cost to keep making backups, keep buying USBs, manage the backups and USBs, etc...

                      You could set up a small 2 disk array and accomplish the same thing. Probably easier (at least on XS) to just back up the config.

                      What is the point again? Or am I, like usual, missing it?

                      • DustinB3403 @BRRABill

                        @BRRABill said in How Reliable Is Your Server:

                        Just seems like a strange thing, to have to keep spare USBs around because they fail.

                        I mean, yeah they are cheaper, but what is the time cost to keep making backups, keep buying USBs, manage the backups and USBs, etc...

                        You could setup a small 2 disk array and accomplish the same thing. Probably easier (at least on XS) to just backup the config.

                        What is the point again? Or am I, like usual, missing it?

                        They don't fail that often is what you're missing.

                        The time to clone a USB is minutes a month (or every few months).

                        The time to restore a config in XS would be hours, at the point in time it crashes. If not longer. Plus you have no recent backup to work from.

                        • BRRABill @DustinB3403

                          @DustinB3403 said

                          They don't fail that often is what you're missing.

                          The time to clone a USB in minutes a month (or every few months).

                          The time to restore a config in XS would be hours, at the point in time it crashes. If not longer. Plus you have no recent backup to work from.

                          Understood.

                          Do they really not fail that much? We've seen a few on ML just this month.

                          Coincidence, maybe.

                          I wonder if you let the logs write to the USB stick, would it really die quickly, anyway?

                          • travisdh1 @BRRABill

                            @BRRABill said in How Reliable Is Your Server:

                            @DustinB3403 said

                            They don't fail that often is what you're missing.

                            The time to clone a USB in minutes a month (or every few months).

                            The time to restore a config in XS would be hours, at the point in time it crashes. If not longer. Plus you have no recent backup to work from.

                            Understood.

                            Do they really not fail that much? We've seen a few on ML just this month.

                            Coincidence, maybe.

                            I wonder if you let the logs write to the USB stick, would it really die quickly, anyway?

                            USB storage sticks are very hit and miss with their reliability.

                            The only brand I haven't had a problem with is the Micro Center branded USB drives. They're also the only ones I know of that give you a lifetime guarantee. Walk in with a bad USB drive and walk out with a new one.

                            • stacksofplates @BRRABill

                              @BRRABill said in How Reliable Is Your Server:

                              @DustinB3403 said

                              They don't fail that often is what you're missing.

                              The time to clone a USB in minutes a month (or every few months).

                              The time to restore a config in XS would be hours, at the point in time it crashes. If not longer. Plus you have no recent backup to work from.

                              Understood.

                              Do they really not fail that much? We've seen a few on ML just this month.

                              Coincidence, maybe.

                              I wonder if you let the logs write to the USB stick, would it really die quickly, anyway?

                              You can always just have one extra USB and keep a copy of the image somewhere. Once you need to use your backup USB then just order another one and write the image to it.
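A rough sketch of the "keep a copy of the image somewhere" step, with a checksum so you can tell later whether a freshly written stick actually matches. This is not a vetted procedure: the helper names are made up, any device path you pass in (something like /dev/sdX) is a placeholder you would have to fill in yourself, reading a raw device normally needs root, and the stick should not be in use by a running system while it is being imaged.

```python
import hashlib
from typing import Optional

CHUNK_SIZE = 4 * 1024 * 1024  # read 4 MiB at a time


def copy_with_sha256(source: str, destination: str) -> str:
    """Copy source to destination byte for byte; return the SHA-256 of the data."""
    digest = hashlib.sha256()
    with open(source, "rb") as src, open(destination, "wb") as dst:
        while True:
            chunk = src.read(CHUNK_SIZE)
            if not chunk:
                break
            digest.update(chunk)
            dst.write(chunk)
    return digest.hexdigest()


def sha256_of(path: str, length: Optional[int] = None) -> str:
    """SHA-256 of the first `length` bytes of path (the whole thing if None)."""
    digest = hashlib.sha256()
    remaining = length
    with open(path, "rb") as f:
        while remaining is None or remaining > 0:
            size = CHUNK_SIZE if remaining is None else min(CHUNK_SIZE, remaining)
            chunk = f.read(size)
            if not chunk:
                break
            digest.update(chunk)
            if remaining is not None:
                remaining -= len(chunk)
    return digest.hexdigest()
```

Usage would be something like `copy_with_sha256(stick, "boot.img")` to take the image and record the hash, then `copy_with_sha256("boot.img", new_stick)` to write a replacement and `sha256_of(new_stick, image_size)` to check it. The `length` argument is there because a replacement stick is often slightly larger than the image, so only the first image-sized span should be compared.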

                              • BRRABill @stacksofplates

                                @stacksofplates said

                                You can always just have one extra USB and keep a copy of the image somewhere. Once you need to use your backup USB then just order another one and write the image to it.

                                I guess my point is going along with the OP of reliability, that two small SATA drives would probably run for years without needing a reboot. My servers are 10+ years old, and have just recently started having drive failures. That's 24x7x365x10 (or whatever haha) without needing spares or worrying constantly that it was going to fail.

                                Why introduce something that is so finicky into a server situation if we are concerned about reliability?

                                • stacksofplates @BRRABill

                                  @BRRABill said in How Reliable Is Your Server:

                                  @stacksofplates said

                                  You can always just have one extra USB and keep a copy of the image somewhere. Once you need to use your backup USB then just order another one and write the image to it.

                                  I guess my point is going along with the OP of reliability, that two small SATA drives would probably run for years without needing a reboot. My servers are 10+ years old, and have just recently started having drive failures. That 24x7x365x10 (or whatever haha) without needing spares and worrying constantly it was going to fail.

                                  Why introduce that is so finicky into a server situation if we are concerned about reliability.

                                  Oh I'm not arguing that the drives wouldn't last longer, just that it's cheap to replicate the USB drives. I think stopping log writing to the USB drive would drastically increase the life of it. You could also just load the whole hypervisor to a RAM disk ha.

                                  • BRRABill @stacksofplates

                                    @stacksofplates said

                                    I think stopping log writing to the USB drive would drastically increase the life of it.

                                    Haha my server responded to me doing this by crashing and burning.

                                    • stacksofplates @BRRABill

                                      @BRRABill said in How Reliable Is Your Server:

                                      @stacksofplates said

                                      I think stopping log writing to the USB drive would drastically increase the life of it.

                                      Haha my server responded to me doing this by crashing and burning.

                                      Lol, you could always do software RAID 1 with two USB drives.

                                      • scottalanmiller @stacksofplates

                                        @stacksofplates said in How Reliable Is Your Server:

                                        @BRRABill said in How Reliable Is Your Server:

                                        @stacksofplates said

                                        You can always just have one extra USB and keep a copy of the image somewhere. Once you need to use your backup USB then just order another one and write the image to it.

                                        I guess my point is going along with the OP of reliability, that two small SATA drives would probably run for years without needing a reboot. My servers are 10+ years old, and have just recently started having drive failures. That 24x7x365x10 (or whatever haha) without needing spares and worrying constantly it was going to fail.

                                        Why introduce that is so finicky into a server situation if we are concerned about reliability.

                                        Oh I'm not arguing that they drives wouldn't last longer, just that it's cheap to replicate the USB drives. I think stopping log writing to the USB drive would drastically increase the life of it. You could also just load the whole hypervisor to a RAM disk ha.

                                        Outside of systems logging to the USBs dying, I really never run into them having problems.

                                        • BRRABill @scottalanmiller

                                          @scottalanmiller said

                                          Outside of systems logging to the USBs dying, I really never run into them having problems.

                                          Just for giggles, how much data do you think can be written to a USB drive before it kicks the bucket?

                                          Like say you left logging on for some crazy reason.

                                          How long would you feel "safe" using the USB?

                                          • scottalanmiller @BRRABill

                                            @BRRABill said in How Reliable Is Your Server:

                                            @scottalanmiller said

                                            Outside of systems logging to the USBs dying, I really never run into them having problems.

                                            Just for giggles, how much data do you think can be written to a USB drive before it kicks the bucket?

                                            Like say you left logging on for some crazy reason.

                                            How long would you feel "safe" using the USB?

                                            Not very long. Totally not how they are meant to be used. Their utility is in being a write once, read many device. In fact, I'd recommend hitting that little lock option on the side if it is available.
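If you wanted to put a rough number on "how much data before it dies", the usual back-of-the-envelope model is capacity times rated program/erase cycles, divided by write amplification, divided by how much you write per day. The sketch below does only that arithmetic. Every input is an assumption you would have to supply: consumer sticks rarely publish endurance figures, and the model assumes writes are spread evenly across the flash, so treat the result as an optimistic ceiling at best.

```python
def estimated_lifetime_days(
    capacity_gb: float,
    pe_cycles: float,
    write_amplification: float,
    daily_writes_gb: float,
) -> float:
    """Very rough flash lifetime estimate in days.

    Assumes writes are spread evenly over the whole device. All four inputs
    are caller-supplied guesses, not published specifications.
    """
    if min(capacity_gb, pe_cycles, write_amplification, daily_writes_gb) <= 0:
        raise ValueError("all inputs must be positive")
    # Total data the host can write before the cells reach their cycle limit.
    total_host_writes_gb = capacity_gb * pe_cycles / write_amplification
    return total_host_writes_gb / daily_writes_gb


if __name__ == "__main__":
    # Placeholder numbers purely to show the calculation, not measurements.
    days = estimated_lifetime_days(
        capacity_gb=16, pe_cycles=1000, write_amplification=10, daily_writes_gb=1
    )
    print(f"estimated lifetime: {days:.0f} days")
```

Small, frequent log writes are the bad case for this model, since they push write amplification up and can keep hitting the same blocks.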
