    Xen Server 6.5 + Xen Orchestra w. HA & SAN

    IT Discussion
    112 Posts 8 Posters 37.2k Views
    • ntoxicator

      Absolutely... Noted.

      Also doesn't help that for 2 years I've been begging for an IT budget. They want me to be director of IT, but without a budget to work with, so it is very difficult to make decisions in the company's best interest. I just give 'suggestions'. Then we have a guy internally who will beat up our vendors on pricing, more so on my efforts - or they get the approval. But that's beside the point, and more of an internal struggle.

      • scottalanmiller

        You might want to bring someone in from operations and talk about mitigation strategies should there be downtime. For example...

        • How much can you do with the server down? Lots of companies can keep doing something.
        • If you had extended downtime, could you shift lunch breaks, send people home early, get them back early the next day, do a company picnic, whatever, to offset the downtime?
        • Could you restore critical workloads to old hardware?
        • Could you work from a cloud resource?
        • Could you run off of your backup appliance?

        The list goes on and on.

        • scottalanmiller @ntoxicator

          @ntoxicator said:

          Also doesn't help that for 2 years I've been begging for an IT budget. They want me to be director of IT, but without a budget to work with.

          Honestly, while that sounds bad, it is not. IT should never have a budget, that's a bad thing. Budgets mean that no one understands how it works.

          In reality you should buy what is best for the business. That number is always better than the budget number. Generally it is much smaller than what companies will budget - budgets most often cause wild overspending. And sometimes when you need something important, or to invest in the future, a budget kills it and you have to "make do" with something less suited to the needs and financial future of the business.

          • scottalanmiller

            We went through a major "single point of failure" event last year. It was one of those "all hands on deck, massive disasters" that people fear in IT. It was our biggest one in a decade and a half. There was no failover system. It took a monumental effort to get things back online. Everything that could go wrong, did. It was huge, it was painful and it was very, very emotional.

            And when it was all said and done and we did the post mortem... as you should do, the final answer was this....

            Yes, it was painful and emotional and costly... but not as costly as it would have been to have mitigated the risk. We knew, at the end of the outage, how much money was lost. We also knew how much we would have spent on HA to "maybe" have avoided the outage. Had we paid for the HA and had it worked perfectly.... it would still have been the wrong decision. Even with the incredibly unlikely outage that we had, HA would have been the bigger "outage" or "money loss event."

            • scottalanmiller

              "As you should do", I realized, can be stressed two ways.

              As you should do or as you should do.

              I meant the latter. Wasn't saying that you should go do one, I meant that after an outage you should run a post mortem.

              • scottalanmiller

                Two important things to think about with HA when running numbers...

                • HA isn't foolproof. It can fail and sometimes does. Not often, but it can. So it mitigates only "most" scenarios.
                • HA requires the issues to be IT issues. If there is a fire or a flood, platform-level HA will do nothing.
                • scottalanmiller

                  Also... many systems should not use platform HA. For Active Directory, for example, you should have platform HA turned off. You need to quantify which workloads would be on HA and which would not for your calculations.

                  • ntoxicator

                    More great info here, Thank you @scottalanmiller

                    • scottalanmiller

                      Also... one of the best quotes ever in IT from a senior architect at VMware.... "HA is something that you do, not something that you buy."

                      That's great, and you are approaching it well by not looking at just "buying your way out of it," but have you considered that....

                      The first thing for getting HA is moving your servers to a Tier IV datacenter? Your power supply and HVAC are the most critical components of your HA strategy. To look at HA you need things like dual generators, HA HVAC systems, dual UPS systems that can fail over, dual rail power supply, etc.

                      You get more uptime moving a single server to a good datacenter than you do putting HA servers on premises. The facilities matter a lot.

                      That's why we generally see six nines from our standard servers. Six nines!!!
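
For a sense of scale, here is a small Python sketch (not from the thread) that converts a count of nines into a yearly downtime allowance. It assumes a 365-day year, and the function name is made up for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: non-leap year, for simplicity


def allowed_downtime_seconds(nines: int) -> float:
    """Return the yearly downtime allowance, in seconds, for N nines of availability."""
    if nines < 1:
        raise ValueError("nines must be a positive integer")
    unavailability = 10.0 ** (-nines)  # e.g. six nines -> 0.000001
    return SECONDS_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in (3, 4, 5, 6):
        seconds = allowed_downtime_seconds(n)
        print(f"{n} nines: about {seconds:,.1f} seconds of downtime per year")
```

Under that assumption, six nines leaves roughly 31.5 seconds of downtime per year, while three nines leaves nearly nine hours.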

                      • scottalanmiller

                        Also, somewhat obvious things that often get lost when talking to management...

                        HA does not cover application errors, data corruption and the like. Those things replicate through the HA. Windows patching issues, downtime for upgrades... HA won't help with those and might make them more complicated.

                        In my experience, most IT outages are not addressed by HA. A big percentage are, maybe 30%, but only 30%. The other 70% you have to mitigate some other way.
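
A rough Python sketch of that point, for running the numbers with management. The 30% coverage figure is the estimate given above. The 95% HA success rate is a made-up placeholder standing in for the fact that HA itself can fail, and the function name is hypothetical.

```python
def residual_outage_share(ha_coverage: float, ha_success_rate: float = 1.0) -> float:
    """Share of outages that still cause downtime after HA is in place.

    ha_coverage:     fraction of outages that are the kind HA can address
    ha_success_rate: fraction of those where HA actually works as intended
    """
    for value in (ha_coverage, ha_success_rate):
        if not 0.0 <= value <= 1.0:
            raise ValueError("fractions must be between 0 and 1")
    return 1.0 - ha_coverage * ha_success_rate


if __name__ == "__main__":
    perfect = residual_outage_share(0.30)            # HA never fails
    realistic = residual_outage_share(0.30, 0.95)    # 0.95 is a placeholder guess
    print(f"Outages left over with perfect HA:   {perfect:.1%}")
    print(f"Outages left over with imperfect HA: {realistic:.1%}")
```

Whatever share is left over has to be covered by backups, warranties, process, or simply accepting the risk.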

                        And hardware issues, which are mostly what HA protects against, are the easier ones to remedy. How much does a spare part cost, or a Dell 6-hour repair warranty?

                        • ntoxicator

                          I try to plan the environment similar to a datacenter setup with N+1, although that can't be the case without serious $$.

                          But yes, after your great detail and points of seeing the bigger picture. I've also now considered what it would cost to have a cold spare back-up server. Or just simply the 5-year NBD or as you said 6-hour repair warranty.

                          It all comes full circle to what downtime costs the company and what it is worth spending to mitigate that downtime.
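
One way to put that trade-off on paper is sketched below in Python. Every figure in it is a made-up placeholder (option prices, recovery hours, failure frequency, hourly downtime cost), and the `Option` class and function name are hypothetical. The idea is only to add each option's yearly price to the downtime loss you would still expect with it.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    yearly_cost: float       # what the mitigation costs per year
    hours_to_recover: float  # expected downtime per hardware failure


def expected_yearly_total(option: Option, failures_per_year: float,
                          downtime_cost_per_hour: float) -> float:
    """Yearly price of the option plus the downtime loss still expected with it."""
    expected_loss = failures_per_year * option.hours_to_recover * downtime_cost_per_hour
    return option.yearly_cost + expected_loss


if __name__ == "__main__":
    # All numbers below are placeholders, not quotes or measurements.
    options = [
        Option("NBD warranty", 500.0, 24.0),
        Option("6-hour repair warranty", 1500.0, 6.0),
        Option("cold spare server", 3000.0, 2.0),
        Option("platform HA", 15000.0, 0.1),
    ]
    failures_per_year = 0.05         # placeholder: one failure every 20 years
    downtime_cost_per_hour = 1000.0  # placeholder

    def total(o: Option) -> float:
        return expected_yearly_total(o, failures_per_year, downtime_cost_per_hour)

    for opt in sorted(options, key=total):
        print(f"{opt.name:<24} expected yearly total: {total(opt):>10,.2f}")
```

Changing the failure frequency or the hourly cost of downtime changes the ranking, which is why those two inputs need to come from the business rather than from IT alone.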

                          • scottalanmiller @ntoxicator

                            @ntoxicator said:

                            But yes, after your great detail and points of seeing the bigger picture. I've also now considered what it would cost to have a cold spare back-up server. Or just simply the 5-year NBD or as you said 6-hour repair warranty.

                            And, of course, look at lower cost providers like xByte who take a lot of the cost off of getting good gear. It lets you get better warranties and better equipment for a lower price. Better equipment means longer MTBF.

                            • ntoxicator @scottalanmiller

                              @scottalanmiller

                              I have reached out to them recently 🙂

                              • scottalanmiller @ntoxicator

                                @ntoxicator said:

                                @scottalanmiller

                                I have reached out to them recently 🙂

                                How quickly they can get you a replacement machine is a factor, too.

                                For us, our logistics partners can often get us a server in two hours. A full server. So the need for spare parts goes way down.

                                • ntoxicator @scottalanmiller

                                  @scottalanmiller

                                  That's Jimmy John's fast. Nice!

                                  • scottalanmiller @ntoxicator

                                    @ntoxicator said:

                                    @scottalanmiller

                                    That's Jimmy John's fast. Nice!

                                    Yeah, we love Softmart.

                                    • Dashrender @ntoxicator

                                      @ntoxicator said:

                                      Ok, why don't you come consult for us then? Explain why HA is not needed and list the negatives and upside.

                                      I don't get why you're so anti-HA?

                                      So we get another single server, spec'd full of drives, and hope that we don't have a hardware failure.

                                      What are the chances of the mobo dying on a Dell R730? Or the integrated NIC card failing, etc? I suppose a low percentage rate.

                                      Of the approximately 100 servers that I personally have supported in the past 15 years I've seen exactly one motherboard failure, and zero RAID card failures. I've seen probably 5 Power Supply failures and perhaps 20-30 drives fail. This is a pretty low number of servers, but gives us easy things to make percentages out of.

                                      So
                                      motherboard failures = 1% over 15 years
                                      RAID controllers = 0%
                                      Power Supplies = 5%
                                      drives - I can't give a real number here because I have no idea how many actual drives were on all these servers over the years. Assuming a minimum of 4 per server, for a total of 400 drives (this is probably low), 20-30 failures would be a 5-7.5% failure rate at most, and lower if there were more drives.

                                      So looking at those numbers (as much BS as they really are) we can already see why we make certain parts redundant, and others not. Power supplies and drives fail often, at around 5%, so we've set up redundant power supplies, and for drives we've created RAID to keep the systems running in case of a failure.

                                      But Mobo's and RAID cards fail so infrequently that we don't worry about it.
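
For anyone who wants to redo that arithmetic with their own counts, a minimal Python sketch follows. The server and failure counts are the rough ones quoted above. The 400-drive population is the stated assumption of 4 drives per server, and the function name is made up for illustration.

```python
def failure_rate_percent(failures: int, population: int) -> float:
    """Failures as a percentage of the population observed."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 100.0 * failures / population


if __name__ == "__main__":
    servers = 100
    drives = servers * 4  # assumption from the post: at least 4 drives per server

    print(f"Motherboards:     {failure_rate_percent(1, servers):.1f}%")
    print(f"RAID controllers: {failure_rate_percent(0, servers):.1f}%")
    print(f"Power supplies:   {failure_rate_percent(5, servers):.1f}%")
    low = failure_rate_percent(20, drives)
    high = failure_rate_percent(30, drives)
    print(f"Drives:           {low:.1f}% to {high:.1f}%")
```

With exactly 400 drives, 20 to 30 failures works out to 5% to 7.5%. A larger real drive count would pull that figure down. These are totals over roughly 15 years, not annual rates.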

                                      • scottalanmiller @Dashrender

                                        @Dashrender said:

                                        I've seen probably 5 Power Supply failures and perhaps 20-30 drives fail.

                                        And those are hot swap in any enterprise class server, even entry level. So they don't result in down time.

                                        • BRRABill @scottalanmiller

                                          @scottalanmiller said:

                                          @Dashrender said:

                                          I've seen probably 5 Power Supply failures and perhaps 20-30 drives fail.

                                          And those are hot swap in any enterprise class server, even entry level. So they don't result in down time.

                                          Until they blow out the other drives! 🙂

                                          • scottalanmiller @Dashrender

                                            @Dashrender said:

                                            But Mobo's and RAID cards fail so infrequently that we don't worry about it.

                                            High-end servers make those redundant too. They are the least likely to go bad and the most expensive to make redundant, which is why redundancy there is usually skipped. But an Integrity, Oracle M, or IBM i or z will all do redundancy there.
