I.T. Availability Management
(Facts & Fiction)
Availability What?
So here’s the scenario. One of the many hats you are wearing at the non-profit organization that you work for is to oversee I.T. services. In this role you are asked to find a new hosted service to manage payroll. You are very familiar with the business processes that support payroll functions for your organization and you have narrowed the candidates to just a few options. As you review the services they offer, you make your way down to the section that describes the “availability” of their service and it reads something like “We provide 99.999% availability” or “Availability: Five 9s.”
You don’t have a technical background, and you aren’t sure what this means. However, someone told you that whatever service you contract with needs to offer “Five 9s” of availability, so that is what you are looking for in a service. Still, since you really aren’t sure what the term means, you decide to search the internet on the topic of “Five 9s” to educate yourself on the subject. Your search yields a variety of results, from exact calculations to abstract ideas and definitions of “Five 9s.”
Here’s the deal. It is a generally accepted practice to communicate the availability of a service as a percentage (e.g., 99.999%). However, the term Five 9s or 99.999% can be misleading. It has almost become more of a marketing term than a true calculation of the availability of an I.T. service. With that in mind, rather than give you an availability table with percentages and their corresponding timeframes, I would rather share with you how to wisely and intelligently analyze technology service availability and introduce you to the subject of availability management.
Why?
Organizations continue to grow in their reliance on third-party vendors to provide application, infrastructure and support services to meet their technology needs. Many of these services are accessed through an internet browser, and all they require is a decent internet connection. This means that in many situations, business units other than I.T. are making decisions about technology solutions that were once the sole responsibility of the I.T. department. Technology providers can take advantage of this knowledge gap and gloss over certain technical details. The result is that the customer ends up paying for more availability than they need, or they end up with a false sense of security that the service they are paying for will be there when they need it. With this in mind, I am not necessarily making a plug for I.T. departments, and I am not saying that these decisions should continue to fall under I.T. departments. As technology becomes more and more democratized, decisions about how technology is used in an organization have also moved beyond the boundaries of I.T. departments.
How?
With all of this in mind, here are a few tips to help the business manager in a non-profit make an educated analysis when it comes to managing the “availability” risk of an I.T. service they are investigating:
1. Start with a standard Definition of Availability
A generally accepted definition of availability as it applies to I.T. services is the “ability of a Configuration Item or IT Service to perform its agreed Function when required.” (Note: A configuration item, or CI, is just a fancy term for a computer, server, device, etc.) Translation: Will the service I am paying for be there when I need it?
For example, let’s say that you are looking at a new hosted payroll system. Rather than looking for five 9s of availability, or 3 nines of availability or whatever, ask yourself the following question instead: Will this service be available when I need it? You may conclude that you need this service to be available from 8 a.m. to 5 p.m., Monday through Friday. From this point of view, the percentage of availability is really only relevant to the times that you need to access the service.
Google Apps is usually available 99.9% of the time. That is 3 nines of availability for those of you keeping score. This means that I can expect that Google Apps may not be available for up to around 43.8 minutes a month. On one hand, that seems like a lot of downtime. On the other hand, I can’t remember a time when I could not access my Google Apps account. Therefore, it has been available when I have needed it. In this case, “3 nines” of availability has been more than enough for me. If I allowed myself to get caught up in a sales pitch that tried to convince me that, though their service costs more, they offer “Five 9s” of availability, I would be paying more for that service than I needed to.
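For readers who like to see the idea in code, here is a minimal Python sketch of "availability only matters when I need the service." It assumes the 8 a.m. to 5 p.m., Monday through Friday window from the example. The function name, the constants and the sample outage dates are made up for illustration and do not come from any provider.

```python
from datetime import datetime, timedelta

# Assumed service window from the example: 8 a.m. to 5 p.m., Monday through Friday.
OPEN_HOUR, CLOSE_HOUR = 8, 17


def business_minutes_down(outage_start, outage_end):
    """Count the minutes of an outage that fall inside the service window."""
    minutes = 0
    current = outage_start
    while current < outage_end:
        is_weekday = current.weekday() < 5  # Monday is 0, Friday is 4
        in_hours = OPEN_HOUR <= current.hour < CLOSE_HOUR
        if is_weekday and in_hours:
            minutes += 1
        current += timedelta(minutes=1)
    return minutes


# Hypothetical outages: three hours early on a Saturday, one hour late on a Tuesday.
weekend_outage = business_minutes_down(datetime(2024, 1, 6, 2, 0), datetime(2024, 1, 6, 5, 0))
tuesday_outage = business_minutes_down(datetime(2024, 1, 2, 16, 30), datetime(2024, 1, 2, 17, 30))

print(weekend_outage)  # 0 minutes that matter to a payroll team
print(tuesday_outage)  # 30 minutes that matter to a payroll team
```

The three-hour weekend outage costs this hypothetical payroll team nothing, while only half of the shorter Tuesday outage lands inside their working day.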
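If you want to check a figure like 43.8 minutes yourself, the arithmetic is short. The sketch below assumes an average month of 43,830 minutes (a 365.25-day year divided by 12). The function name is my own and is not part of any vendor's tooling.

```python
# Average minutes in a month, assuming a 365.25-day year.
MINUTES_PER_MONTH = 365.25 * 24 * 60 / 12  # 43,830 minutes


def allowed_downtime_minutes(availability_percent, period_minutes=MINUTES_PER_MONTH):
    """Return the downtime that a given availability percentage still permits."""
    unavailable_fraction = 1 - availability_percent / 100
    return period_minutes * unavailable_fraction


for label, percent in [("3 nines", 99.9), ("5 nines", 99.999)]:
    minutes = allowed_downtime_minutes(percent)
    print(f"{label} ({percent}%): about {minutes:.1f} minutes of downtime a month")
```

At 99.9% this works out to roughly 43.8 minutes a month, and at 99.999% to well under a minute. The gap between the two is what a “Five 9s” price premium is buying.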
2. Understand what goes into the availability of a Service.
Gaining a clear understanding of the concept of availability when it comes to technology services can be challenging. Even a service like a credit card processing service that claims to be available 99.999% of the time could add little value to an organization if there is a pattern of service interruptions, even if the interruptions last only a few seconds a day. Learning the factors that go into calculating availability can clear up confusion and help to ensure that the service you are paying for will be available when you need it.
According to ITILv3, Availability is determined by Reliability, Maintainability, Serviceability, Performance and Security.
- Reliability is a measure of how long a Configuration Item or IT Service can perform its agreed Function without interruption.
- Maintainability is a measure of how quickly and Effectively a Configuration Item or IT Service can be restored to normal working after a failure. In the case of Software as a Service, maintainability can also be applied as a measure of how easy it is to make changes and/or repairs to the software.
- Serviceability refers to the “contractual conditions with a given supplier covering the availability of, and the conditions under which the contractual conditions are valid for, a Configuration Item or system.”
- Performance is a measure of what is achieved or delivered by a System, person, team, Process or IT Service.
- Security is defined as the “process of ensuring that services are used in an appropriate way by the appropriate people.”
I will dig into each of the items listed above in future posts. However, even a basic understanding of the elements that determine the availability of a technology service can go a long way toward helping you craft intelligent analysis questions like:
- How often is the performance of the service impacted by unplanned interruptions?
- How long do service interruptions last on average?
- What are the terms and conditions as to when the promise of availability is in effect?
- What needs to be achieved in order to call the service available? For example, in the case of an online credit card processor, they might commit to a certain level of availability for credit card processing but exclude access to reporting from that commitment.
- To whom does the service need to be available in order to label the service as available? What are the conditions for that availability? For example, I know of a service provider that blocked access to the admin dashboard for 24 hours to anyone who tried to gain access from a specific location, because of too many failed login attempts. It was a documented security measure that released the provider from any availability commitments. It was disruptive to the user. However, the disruption was at least expected.
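The first two questions in this list can be answered with very simple math once you have a list of outages. Here is a minimal sketch with made-up numbers for two hypothetical providers. Neither the data nor the function name comes from a real service.

```python
def summarize_outages(durations):
    """Summarize a list of outage lengths, each given in minutes."""
    count = len(durations)
    total = sum(durations)
    average = total / count if count else 0
    return {"count": count, "total_minutes": total, "average_minutes": average}


# Made-up outage logs for one month. Both add up to 30 minutes of downtime.
provider_a = [30]      # one long interruption
provider_b = [1] * 30  # thirty short interruptions

print(summarize_outages(provider_a))
print(summarize_outages(provider_b))
```

Both providers would report the same availability percentage for the month, yet one interrupted its customers once and the other thirty times. That difference is what the "how often" and "how long" questions are meant to expose.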
3. Examine References, Customer Forums, Product Reviews and the Provider Directly
I believe one of the most effective ways to manage risk when it comes to the availability of a technology service that I am investigating is to take questions like the ones I mentioned above and apply them to references, customer forums and product reviews, as well as to the provider directly. Many service providers claim Five 9s of availability. In today’s social media crazed world, it is not hard to find information about the technology services that we are interested in. You might search an independent forum on the reliability of a specific I.T. service that you are researching. If you notice that there have been availability issues, you might post some additional questions asking how long service interruptions last. You might inquire as to how the service provider handled the outage. Did they honor their commitment to provide service credits, if offered? You might even ask the service provider how they came up with their availability percentage figures, or what timeframe they use to make their availability calculations (e.g., monthly, annually, billing cycle, 24/7, 8/5).
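The timeframe question matters more than it might seem. As a rough illustration, the sketch below applies the same 99.9% figure to three measurement windows. The window sizes are my own assumptions (a 365.25-day year, and an 8/5 schedule of 40 hours a week for 52 weeks), and the function name is hypothetical.

```python
def downtime_budget_minutes(availability_percent, hours_in_window):
    """Return the downtime allowed within a measurement window of the given length."""
    return hours_in_window * 60 * (1 - availability_percent / 100)


# Assumed window lengths in hours.
windows = {
    "monthly, 24/7": 365.25 * 24 / 12,
    "annual, 24/7": 365.25 * 24,
    "monthly, 8/5": 40 * 52 / 12,
}

for name, hours in windows.items():
    budget = downtime_budget_minutes(99.9, hours)
    print(f"99.9% measured {name}: about {budget:.1f} minutes of allowed downtime")
```

Measured annually, a provider could be down for more than eight hours in a single stretch and still honor a 99.9% commitment, which is why it is worth asking which window the contract uses.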
Conclusion
As more and more organizations rely on third-party technology services to support their operational requirements, it is important for organizations to have a clear understanding of what they can expect when it comes to managing the availability risks of the technology services they rely on.
Key points to successful availability management include:
- A clear, generally accepted definition of availability (Will the service be available when I need it?)
- A clear understanding of when you need to access the service
- A clear understanding of the elements that determine availability
- A concise list of questions that highlight availability elements
- A simple application of those questions as they apply to references, customer forums, product reviews and even the service provider directly
The points we covered in this post are not prescriptive or exhaustive. However, I hope that this information helps my friends who spend the best hours of their days helping non-profits run smoothly avoid costly surprises that lower productivity and momentum.
Further Reading/Study
- Wikipedia Definition of High Availability
- Helpful Video Tutorial on Availability from the ITIL perspective
- ITIL Official Website
- ITIL 60 Overview