Availability Management for IT Services Best Practice Handbook: Proactively manage and maintain Service Levels to meet SLA expectations in Reliability, Maintainability, Serviceability, Resilience and Security - Ready to use bringing Theory into Action
Notice of Rights: Copyright © The Art Of Service. All rights reserved. No part of this book may be reproduced or transmitted in any form by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher. Notice of Liability: The information in this book is distributed on an “As Is” basis without warranty. While every precaution has been taken in the preparation of the book, neither the author nor the publisher shall have any liability to any person or entity with respect to any loss or damage caused or alleged to be caused directly or indirectly by the instructions contained in this book or by the products described in it. Trademarks: Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations appear as requested by the owner of the trademark. All other product names and services identified throughout this book are used in editorial fashion only and for the benefit of such companies with no intention of infringement of the trademark. No such use, or the use of any trade name, is intended to convey endorsement or other affiliation with this book. ITIL® is a Registered Community Trade Mark of OGC (Office of Government Commerce, London, UK), and is Registered in the U.S. Patent and Trademark Office.
Availability Management Workbook
Table of Contents
INTRODUCTION ROADMAP 5
AVAILABILITY MANAGEMENT 9
SUPPORTING DOCUMENTS 43
Objectives and Goals 45
Policies, Objectives and Goals 49
Business Justification Document 53
Recovery Template 59
Component Failure Impact Analysis 67
Availability Requirements 73
Roles & Responsibilities 81
Availability Management Process Manager 83
Service Outage Analysis 87
Reports, KPIs and other Metrics 95
Communication Plan 101
IMPLEMENTATION PLAN – PROJECT PLAN 107
FURTHER INFORMATION 115
INTRODUCTION ROADMAP
Many organizations are looking to implement Availability Management as a way to improve the structure and quality of the business.
This document describes the contents of the Availability Management Workbook. The information found within the book is based on the ITIL Version 3 framework, specifically the Service Design phase which incorporates the updated ITIL version 3 Availability Management process.
The workbook is designed to answer many of the questions that the Availability Management process raises, and it provides you with useful guides, templates and essential but simple assessments.
The supporting documents and assessments will help you identify the areas within your organization that require the most activity in terms of change and improvement.
The presentations can be used for education, as the basis for management presentations, or when making business cases for Availability Management implementation.
The additional information and bonus resources will enable you to improve your organization’s methodology knowledge base.
The workbook serves to act as a starting point. It will give you a clear path to travel. It is designed to be a valuable source of information and activities.
The Availability Management Workbook:
• Flows logically
• Is scalable
• Provides presentations, templates and documents
• Saves you time
Step 1
Start by reviewing the PowerPoint presentation:
• Availability Management
This presentation will give you a good knowledge and understanding of all the terms, activities and concepts required within the Availability Management process. It can also be used as the basis for management presentations or when making a formal business case for Availability Management implementation. Make sure you pay close attention to the notes pages, as well as the slides, as references to further documents and resources are highlighted here.
Step 2
If you did not look at the supporting documents and resources when prompted during the PowerPoint presentations, do this now.
Below is an itemized list of the supporting documents and resources for easy reference. You can use these documents and resources within your own organization or as a template to help you prepare your own bespoke documentation.
• Objectives and Goals
• Policies, Objectives and Scope
• Business Justification Document
• Recovery Template
• Component Failure Impact Analysis
• Availability Requirements
• Roles and Responsibilities
• Availability Management Process Manager
• Service Outage Analysis
• Reports, KPIs and other Metrics
• Communication Plan
Step 3
Alternatively, continue by working through the Availability Management Implementation & Project Plan with the focus on your organization.
AVAILABILITY MANAGEMENT
Availability Management needs to ensure that the level of service availability delivered in all services is matched to or exceeds the current and future agreed needs of the business, in a cost effective manner.
Availability Management should ensure the agreed level of availability is provided. The measurement and monitoring of IT availability is a key activity to ensure availability levels are being met consistently. Availability Management should look to continually optimize and proactively improve the availability of the IT infrastructure, the services and the supporting organization, in order to provide cost-effective availability improvements that can deliver business and customer benefits.
More information on Objectives and Goals can be found on page 45.
Availability Management needs to understand the service and component availability requirements from the business perspective in terms of the:
• Current business processes, their operation and requirements
• Future business plans and requirements
• Service targets and the current IT service operation and delivery
• IT infrastructure, data, applications and environments and their performance
• Business impacts and priorities in relation to the services and their usage.
Understanding all of this will enable Availability Management to ensure that all services and components are designed and delivered to meet their targets in terms of agreed business needs.
There is a Policies, Objectives and Scope document available on page 49.
The Availability Management process and planning should be involved in all stages of the Service Lifecycle, from Strategy to Design, through to Transition and Operation to Improvement. The appropriate availability and resilience should be designed into services and components from the initial design stages. This ensures not only that the availability of new or changed services meets expected targets, but also that all existing services and components continue to meet all their targets.
There is a Business Justification Document available on page 53.
The process is continually trying to ensure that all operational services meet their agreed availability targets, and that new or changed services are designed appropriately to meet their intended targets, without compromising the performance of existing services. In order to achieve this, Availability Management performs both reactive and proactive activities.
Proactive activities: Involve the proactive planning, design and improvement of availability. These activities are principally carried out within design and planning roles (Service Design phase).
Reactive activities: Involve the monitoring, measuring, analysis and management of all events, incidents and problems regarding availability (Service Operation phase).
Availability Management relies on the monitoring, measurement, analysis and reporting of the following aspects:
• Availability
• Reliability
• Maintainability
• Serviceability.
(See pages 16-17 for definitions)
Availability Management relies on the monitoring, measurement, analysis and reporting of the following aspects:
Security: Security Management determines the requirements; Availability Management implements the measures.
Availability: The ability of an IT Service or component to perform its required function at a stated instant or over a stated period of time.
Reliability: Freedom from operational failure.
Resilience: The ability to withstand failure.
Maintainability (internal): The ability of an IT component to be retained in, or restored to, an operational state. It is based on skills, knowledge, technology, backups and the availability of staff.
Serviceability (external): The contractual obligations and arrangements made with third-party external suppliers. Measured by the Availability, Reliability and Maintainability of the IT Service and components under the control of the external suppliers. Managed by Supplier Management in Service Design.
Vital Business Function (VBF): The business critical elements of the business process supported by an IT Service.
Mean Time Between Failures (MTBF), or uptime:
• Average time between the recovery from one incident and the occurrence of the next incident; relates to the reliability of the service.
Mean Time to Restore Service (MTRS), or downtime:
• Average time taken to restore a CI or IT service after a failure.
• Measured from when the CI or IT service fails until it is fully restored and delivering its normal functionality.
Mean Time Between System Incidents (MTBSI):
• Average time between the occurrences of two consecutive incidents.
• Sum of the MTRS and MTBF.
Relationships:
• A high ratio of MTBF/MTBSI indicates there are many minor faults.
• A low ratio of MTBF/MTBSI indicates there are few major faults.
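The three averages above can be derived from a simple incident log. The following is a minimal sketch only: the log entries, the function names and the choice to average over complete failure-to-failure cycles are assumptions made for illustration, not part of the workbook.

```python
from datetime import datetime, timedelta

# Hypothetical incident log (not from the workbook): each entry is
# (time the service failed, time it was fully restored), oldest first.
INCIDENTS = [
    (datetime(2024, 1, 1, 8, 0), datetime(2024, 1, 1, 10, 0)),  # 2 h down
    (datetime(2024, 1, 5, 8, 0), datetime(2024, 1, 5, 9, 0)),   # 1 h down
    (datetime(2024, 1, 9, 8, 0), datetime(2024, 1, 9, 11, 0)),  # 3 h down
]


def mean_hours(deltas):
    """Average a non-empty list of timedeltas, expressed in hours."""
    total = sum(deltas, timedelta())
    return total.total_seconds() / 3600 / len(deltas)


def availability_metrics(incidents):
    """Return (MTBF, MTRS, MTBSI) in hours.

    Each metric is averaged over complete failure-to-failure cycles, so the
    last incident only marks the end of the final cycle.
    """
    if len(incidents) < 2:
        raise ValueError("need at least two incidents to form one cycle")
    cycles = list(zip(incidents, incidents[1:]))
    # Downtime: failure until full restoration.
    mtrs = mean_hours([cur[1] - cur[0] for cur, _ in cycles])
    # Uptime: restoration until the next failure.
    mtbf = mean_hours([nxt[0] - cur[1] for cur, nxt in cycles])
    # Failure until the next failure.
    mtbsi = mean_hours([nxt[0] - cur[0] for cur, nxt in cycles])
    return mtbf, mtrs, mtbsi


mtbf, mtrs, mtbsi = availability_metrics(INCIDENTS)
print(f"MTBF={mtbf} h, MTRS={mtrs} h, MTBSI={mtbsi} h")
print(f"MTBF/MTBSI ratio: {mtbf / mtbsi:.3f}")
```

For this sample log the sketch gives an MTBF of 94.5 hours, an MTRS of 1.5 hours and an MTBSI of 96 hours, so MTBSI equals MTBF plus MTRS. Averaging per cycle is what keeps that identity exact; averaging MTRS over every incident, including the last one, would only satisfy it approximately.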
Lifecycle of an Incident (Availability Management Metrics)
Detection Time: Time for the service provider to be informed of the fault (reported).
Diagnosis Time: Time for the service provider to respond after diagnosis is completed.
Repair Time: Time taken by the service provider to restore the components that caused the fault. Calculated from diagnosis to recovery time.
Restoration Time (MTRS): Time taken to restore the agreed level of service to the user. Calculated from detection to restore point.
Restore Point: The point where the agreed level of service has been restored.
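As an illustration of how these lifecycle intervals could be derived from recorded timestamps, here is a minimal sketch. The timestamps and field names are hypothetical, and two start points are assumptions because the definitions above do not state them: detection is measured from the moment of failure, and diagnosis from detection until diagnosis is completed.

```python
from datetime import datetime

# Hypothetical timestamps for a single incident; the field names are
# illustrative and not taken from the workbook.
INCIDENT = {
    "failed": datetime(2024, 3, 4, 9, 0),       # fault occurs
    "detected": datetime(2024, 3, 4, 9, 10),    # provider is informed
    "diagnosed": datetime(2024, 3, 4, 9, 40),   # diagnosis completed
    "recovered": datetime(2024, 3, 4, 10, 30),  # failed components repaired
    "restored": datetime(2024, 3, 4, 10, 45),   # restore point
}


def minutes_between(start, end):
    """Elapsed whole minutes between two datetimes."""
    return int((end - start).total_seconds() // 60)


def lifecycle_minutes(incident):
    """Break one incident into the lifecycle intervals listed above."""
    return {
        # Assumption: detection is measured from the moment of failure.
        "detection": minutes_between(incident["failed"], incident["detected"]),
        # Assumption: diagnosis runs from detection to diagnosis completed.
        "diagnosis": minutes_between(incident["detected"], incident["diagnosed"]),
        # Per the text: from diagnosis to recovery time.
        "repair": minutes_between(incident["diagnosed"], incident["recovered"]),
        # Per the text: from detection to the restore point.
        "restoration": minutes_between(incident["detected"], incident["restored"]),
    }


for stage, minutes in lifecycle_minutes(INCIDENT).items():
    print(f"{stage}: {minutes} min")
```

Recording all five timestamps per incident, rather than only failure and restoration, is what makes it possible to see which stage dominates the overall restoration time.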
A Recovery Template can be found in a separate document on page 59.
4 easy steps to calculating Availability!
Example: 24x7 Service with 2 hours agreed downtime for maintenance
1. Calculate Agreed Service Time: 24 x 7 = 168 hours per week
2. Subtract Agreed Downtime (2 hours per week in this example): 168 – 2 = 166
3. Divide the result by the original Agreed Service Time: 166 / 168 ≈ 0.9881
4. Multiply by 100: 0.9881 x 100 = 98.81%
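The four steps can be expressed as a single function. This is a minimal sketch; the function and parameter names are illustrative rather than taken from the workbook.

```python
def availability_percent(agreed_service_hours, downtime_hours):
    """Availability (%) = (agreed service time - downtime) / agreed service time x 100."""
    if agreed_service_hours <= 0:
        raise ValueError("agreed service time must be positive")
    if not 0 <= downtime_hours <= agreed_service_hours:
        raise ValueError("downtime must be between 0 and the agreed service time")
    return (agreed_service_hours - downtime_hours) / agreed_service_hours * 100


# The example above: a 24x7 service with 2 hours of agreed downtime per week.
weekly = availability_percent(24 * 7, 2)
print(f"{weekly:.2f}%")  # prints 98.81%
```

The same function works for any reporting period, provided the service time and the downtime are expressed in the same unit.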
Measurement and reporting can provide the basis for:
• Monitoring the actual availability delivered versus agreed targets
• Establishing measures of availability and agreeing availability targets with the business
• Identifying unacceptable levels of availability that impact business and users
• Reviewing availability with the IT support organization
• Continual improvement activities to optimize availability.
The whole point of collecting these availability measurements and reports is to improve the quality and availability of the IT service provided to the business and users. All measures, reports and activities should reflect this purpose.
Availability, when measured and reported to reflect the experience of the user, provides a more representative view of overall IT service quality. The user view of availability is influenced by 3 main factors:
• Frequency of downtime
• Duration of downtime
• Scope of impact
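To see why a single availability percentage can hide the user experience, the three factors can be reported side by side. The sketch below is illustrative only: the outage figures are hypothetical, and "user minutes lost" is an assumed combined measure, not one defined in the workbook.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    minutes: float       # duration of the outage
    users_affected: int  # scope of impact


def user_view(outages):
    """Summarize outages by frequency, duration and scope of impact."""
    return {
        "frequency": len(outages),
        "total_minutes": sum(o.minutes for o in outages),
        # Assumed combined measure: minutes of downtime times users affected.
        "user_minutes_lost": sum(o.minutes * o.users_affected for o in outages),
    }


# Two hypothetical months with the same total downtime (120 minutes).
one_long_outage = [Outage(minutes=120, users_affected=50)]
many_short_outages = [Outage(minutes=10, users_affected=500) for _ in range(12)]

print(user_view(one_long_outage))
print(user_view(many_short_outages))
```

Both months would report the same availability percentage, yet the second has twelve times the frequency and ten times the user minutes lost, which is the kind of difference a user-centred report is meant to expose.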
Definition: Service Failure Analysis
An activity that identifies underlying causes of one or more IT service interruptions. SFA identifies opportunities to improve the IT service provider’s processes and tools, and not just the IT infrastructure. SFA is a time-constrained, project-like activity, rather than an ongoing process of analysis.
SFA initiatives should use input from all areas and all processes including, most importantly, the business and users. Each SFA assignment should have a recognized sponsor (or sponsors) and involve resources from many technical and process areas.
The use of the SFA approach:
• Provides the ability to deliver enhanced levels of availability without major cost
• Provides the business with visible commitment from the IT support organization
• Develops in-house skills and competencies to avoid expensive consultancy assignments related to availability improvement
• Encourages cross-functional team working and breaks barriers between teams
• Provides a program of improvement opportunities that are focused on delivering benefit to the user
• Provides an independent ‘health check’ of IT Service Management processes and is the trigger for process improvement.
A Component Failure Impact Analysis template can be found in a separate document on page 67.
Select opportunity: prior to scheduling the SFA, agree which IT service or technology is to be selected.
Scope assignment: state explicitly which areas are to be covered (documented in the Terms of Reference).
Plan assignment: plan a number of weeks in advance, with a committed set of resources.
Build hypothesis: a useful method of building likely scenarios to aid early conclusions within the analysis period.
Analyze data: the SFA team dictates how to allocate specific analysis responsibilities.
Interview key personnel: to capture user and business perspectives.
Findings and conclusions: document initial findings and conclusions, supported by evidence and facts gathered during analysis.
Recommendations: the SFA team formulates recommendations based on the findings and conclusions of the previous step.
Report: final report to be issued with a management summary.
Validation: it is recommended that, for each SFA, key measures that reflect the business and user perspectives prior to the assignment are captured and recorded as the ‘before’ view. As SFA recommendations are progressed, the positive impacts on availability should be captured to provide the ‘after’ view.
The next few slides will show the activities that are the proactive techniques of the Availability Management process.
Definition: Vital Business Function (VBF)
A function of a business process that is critical to the success of the business. Vital Business Functions are an important consideration of Business Continuity Management, IT Service Continuity Management and Availability Management.
When considering how the availability requirements of the business are to be met, it is important to ensure that the level of availability to be provided for an IT service is at the level actually required, and is affordable and cost justifiable to the business. The example above indicates the products and processes required to provide varying levels of availability and the cost implications. Availability Requirements can be found in a separate document on page 73.
Additional investment required to achieve higher levels of availability will be wasted and availability levels not met if these base products and components are unreliable and prone to failure.
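A short calculation illustrates the point. The sketch below assumes that the components sit in series (all must be up for the service to be up) and fail independently, so the end-to-end availability is the product of the individual availabilities. The component figures are hypothetical and the function name is illustrative.

```python
import math


def series_availability(component_availabilities):
    """End-to-end availability of components that must all be up.

    Assumes the components are in series and fail independently, so the
    overall availability is the product of the individual availabilities.
    """
    for a in component_availabilities:
        if not 0.0 <= a <= 1.0:
            raise ValueError("availability must be a fraction between 0 and 1")
    return math.prod(component_availabilities)


# Hypothetical figures: two resilient components on top of an unreliable base.
baseline = series_availability([0.999, 0.999, 0.95])
# Investing in the two resilient components while leaving the base untouched.
upgraded = series_availability([0.9999, 0.9999, 0.95])
print(f"baseline: {baseline:.4f}, after upgrade: {upgraded:.4f}")
```

Under these assumptions the overall figure can never exceed that of the weakest component, so the upgrade barely moves the result while the 95% base component remains in place.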
The design also needs to eliminate or minimize the effects on the business operation of the planned downtime normally required to accommodate maintenance activity or the implementation of changes to the IT infrastructure or business applications.
The business requirements for IT availability should contain at least:
• A definition of the VBFs supported by the IT Service
• A definition of IT service downtime, i.e. the conditions under which the business considers the IT service to be unavailable
• The business impact caused by loss of service, together with the associated risk
• Quantitative availability requirements, i.e. the extent to which the business tolerates IT service downtime or degraded service
• The required service hours
• An assessment of the relative importance of different working periods
• Specific security requirements
• The service backup and recovery capability.
The Projected Service Outage (PSO) document contains details of all the scheduled and planned service downtime within the agreed service hours for all services. These documents should be agreed with all the appropriate areas and representatives of both the business and IT. Once the PSO has been agreed, the Service Desk will ensure that it is communicated to all relevant parties so that everyone is aware of any additional planned downtime.
A Service Outage Analysis can be found on page 87.
The criticality of services will often change, and it is important that the design and the technology supporting such services are regularly reviewed and improved by Availability Management to ensure that the change of importance in the service is reflected within the revised design and supporting technology. Where the agreed levels of availability are already being delivered, it may take considerable effort and incur significant cost to achieve a small incremental improvement within the level of availability.
A number of sources of information are relevant to the Availability Management process; they are summarized above.
The outputs produced by Availability Management should include the examples summarized above.
You can find the Roles & Responsibilities of Availability Management and the Availability Management Process Manager within separate documents on pages 81 and 83 respectively.
These are just some examples that can be used to measure the effectiveness and efficiency of Availability Management.
There is more information on Reports, KPIs and other Metrics in a separate document on page 95.
Availability Management faces many challenges, but probably the main challenge is meeting the expectations of customers, the business and senior management. These expectations are that services will always be available on a 24-hour, 365-day basis. When they aren’t, it is assumed that they will be recovered within minutes.
This is only the case when the appropriate level of investment and design has been applied to the service, and that investment should only be made where the business impact justifies it. However, the message needs to be publicized to all customers and areas of the business, so that when services do fail they have the right level of expectation about their recovery. It also means that Availability Management must have the right access to the right level of quality information on the current business need for IT services and its plans for the future. This is another challenge faced by many Availability Management processes.
A Communication Plan can be found on page 101.
The main Critical Success Factors for the Availability Management process are summarized above.
Some of the major risks associated with the Availability Management process are summarized above.
SUPPORTING DOCUMENTS
Throughout the documents, look for text surrounded by << and >>; these are indicators for you to create some specific text.
Watch also for highlighted text which provides further guidance and instructions.
Objectives and Goals
IT Services
Detailed Objectives/Goals
Process: Availability Management
Status:
Version: 0.1
Release Date:
Detailed Objectives/Goals for Availability Management
The document is not to be considered an extensive statement as its topics have to be generic enough to suit any reader for any organization. However, the reader will certainly be reminded of the key topics that have to be considered.
The detailed objectives for Availability Management should include the following salient points:
Objective | Notes (Met/Exceeded/Shortfall; Dates/names/role titles)
• To ensure a high level of Availability of IT Services and the supporting infrastructure through optimisation. Availability Management will provide a cost effective and sustained level of availability that is aligned with the needs and objectives of the business.
• Minimise the adverse effects on the IT Infrastructure and the Business by designing for Availability. Once developed, an Availability Management process can be used to plan for availability recovery for the business before loss of service can cause significant harm to the IT services being delivered.
• To establish efficient assessment guidelines that cover the business, technical and financial aspects of Availability Management and the supporting infrastructure. Generally this will involve different people, so the challenge is designing a process that minimizes the time taken.
• To develop a variety of activities to cater for the required levels of Availability. For example, there is a wide range of potential impacts that loss of service may have on the environment. If we can categorize and target these areas, then we can pre-build models for dealing with them.
• To establish ground rules that distinguish between Availability, Reliability, Maintainability and Serviceability.
• Develop working relationships with all other process areas. The Availability Management process should be considered a proactive one requiring input from other process areas. Obvious links include Security Management (Confidentiality, Integrity and Availability), Service Level Management (to help gather requirements), IT Service Continuity Management (planning for availability, and planning for assurances and recovery) and Network Management tools (to identify potential threats or loss of service to the IT Infrastructure).
• Develop a sound Availability Management process and look for continuous improvement.
Use these objectives to generate discussion about others that may be more appropriate to list than those provided. Refer also to the Communication Plan on page 101 for ideas on how to communicate the benefits of Availability Management.
Policies, Objectives and Goals
IT Services
Policies, Objectives and Goals
Process: Availability Management
Status:
Version: 0.1
Release Date:
Policies, Objectives and Scope for Availability Management
The document is not to be considered an extensive statement as its topics have to be generic enough to suit any reader for any organization. However, the reader will certainly be reminded of the key topics that have to be considered.
Policy Statement
A course of action, guiding principle, or procedure considered expedient, prudent, or advantageous.
Use this text box to answer the “SENSE OF URGENCY” question regarding this process. Why is effort being put into this process? Not simply because someone thinks it’s a good idea. That won’t do. The reason has to be based in business benefits. You must be able to concisely document the reason behind starting or improving this process. Is it because of legal requirements or competitive advantage? Perhaps the business has suffered major problems or user satisfaction ratings are at the point where outsourcing is being considered.
A policy statement any bigger than this text box may be too lengthy to read, lose the intended audience with detail, or not be clearly focussed on answering the WHY question for this process.
The above Policy Statement was:
Prepared by:
On: << >>
And accepted by:
Refer to Implementation Plan – Project Plan for planning and implementation guidelines (that includes the Policy, Objectives and Scope statements) on page 107.
Objectives Statement
Something worked toward or striven for, a goal.
Use this text box to answer the “WHERE ARE WE GOING” question regarding this process. What will be the end result of this process and how will we know when we have reached the end result? Will we know because we will establish a few key metrics or measurements or will it be a more subjective decision, based on instinct?
A generic sample statement on the “objective” for Availability Management is: The objective of Availability Management is to ensure that the capability of the IT Services and the supporting Infrastructure can be delivered in a cost effective manner, with a sustained level of Availability, in line with the Business Objectives. In addition to this, Availability Management will perform iterative optimisation activities to ensure constant improvements and alignment.
Note the keywords in the statement. For the statement on Availability Management they are “cost effective” and “sustained level of availability”. These are definite areas that we can set metrics for and therefore measure progress.
An objective statement any bigger than this text box may be too lengthy to read, lose the intended audience with detail, or not be clearly focussed on answering the WHERE question for this process.
The above Objective Statement was:
Prepared by:
On: << >>
And accepted by:
Refer to Reports, KPIs and Metrics on page 95 for metrics and KPIs for Availability Management.
Refer to Objectives and Goals on page 45 for detailed statement of process objectives/goals
Scope Statement
The area covered by a given activity or subject.
Use this text box to answer the “WHAT” question regarding this process. What are the boundaries for this process? What does the information flow look like into this process and from this process to other processes and functional areas?
A generic sample statement on the “scope” for Availability Management is: The Availability Management process will be responsible for measuring and setting availability involving the following aspects of the IT Infrastructure:
• Hardware
• Software
• System Software
• Etc
Availability Management will not be responsible for those components that exist under the banner of Applications Development. Availability issues will be reported to the Service Desk, via the Incident Management process. Availability Management will implement the requirements as described in the Security Management policy.
A scope statement any bigger than this text box may be too lengthy to read, lose the intended audience with detail, or not be clearly focussed on answering the WHAT question for this process.
The above Scope Statement was:
Prepared by:
On: << >>
And accepted by:
On: << >>
Business Justification Document
IT Services
Business Justification Document
Process: Availability Management
Status:
Version: 0.1
Release Date:
Business Justification Document for Availability Management
The document is not to be considered an extensive statement as its topics have to be generic enough to suit any reader for any organization. However, the reader will certainly be reminded of the key topics that have to be considered.
This document serves as a reference for HOW TO APPROACH THE TASK OF SEEKING FUNDS for the implementation of the Availability Management process. This document provides a basis for completion within your own organization.
This document was:
Prepared by:
On: << >>
And accepted by:
On: << >>
54
Availability Management Business Justification
A strong enough business case will ensure progress and funds are made available for any IT initiative. This may sound like a bold statement, but it is true. As IT professionals we have (for too long) assumed that we miss out on funds while other functional areas (e.g. Human Resources and other shared services) seem to get all that they want. However, the problem is not with them, it’s with US. We are typically poor salespeople when it comes to putting our case forward. We try to impress with technical descriptions, rather than talking in a language that a business person understands. For example:

We say: We have to increase IT security controls, with the implementation of a new firewall.
We should say: Two weeks ago our biggest competitor lost information that is now rumored to be available on the internet.

We say: The network bandwidth is our biggest bottleneck and we have to go to a switched local environment.
We should say: The e-mail you send to the other national managers will take 4 to 6 hours to be delivered. It used to be 2 to 3 minutes, but we are now using our computers for so many more tasks.

We say: Changes to the environment are scheduled for a period of time when we expect there to be minimal business impact.
We should say: We are making the changes on Sunday afternoon. There will be fewer people working then.
Doesn’t that sound familiar? To reinforce this point even further, consider the situation of buying a new fridge. What if the technically savvy salesperson wants to explain “the intricacies of the tubing structure used to super cool the high pressure gases, which flow in an anti-clockwise direction in the Southern hemisphere”? Wouldn’t you say “too much information, who cares – does it make things cold?” Well, IT managers need to stop trying to tell business managers about the tubing structure and just tell them what they are interested in. So let’s now look at some benefits of Availability Management. Remember that the comments here are generic, as they have to apply to any organization.
Benefits
Notes/Comments/Relevance
Through a properly controlled and structured Availability Management process we will be able to more effectively help in the alignment of the delivery of IT service to the business requirements. This is achieved through the nature of the process by understanding such things as Vital Business Functions and the true needs of the business.
A reduction in the amount of unavailability will therefore allow IT to spend more time on aligning the IT Services with the needs of the Business.
Heightened visibility and increased communication related to the Availability of Services for both business and IT support staff. The reader should be able to draw upon experience regarding the overall negative impact on the business when IT departments have been concerned with supplying high levels of availability for services that aren’t critical to the business.
Organizations, and therefore IT environments, are becoming increasingly complex and continually face new challenges. The ability to meet these challenges is dependent on the speed and flexibility of the organization. The ability to cope with more changes at the business level will be directly impacted by how well IT Departments can reduce the amount of time lost to service outages caused by bad Availability Management planning.
(Reader, here you can describe a missed opportunity, due to bad Availability Management or a process dragged down by bureaucracy)
Noticeable increases in the potential productivity of end users and key personnel through reduced interruption times, higher levels of availability. The goal statement of Availability Management is to optimise the availability of IT Services and ensure alignment back to the business. By the very nature of this statement we can expect to start seeing a reduction in loss of service due to availability issues and bad planning. Whether end users and staff take advantage of this reduced down-time is not an issue for IT professionals to monitor. Knowing that we have made more working time available is what we need to publish – NOT productivity rates.
An ITIL Availability Management process will guide you towards understanding the financial implications of all those necessary availability requirements needed in the IT infrastructure. This has real benefits as it may prevent an organization from spending money on areas of the IT Infrastructure where there really isn’t a need for building high availability services for the business.
Availability Management aids in improving the security aspects of the organization with respect to IT.
Availability Management will work in conjunction with Security Management to implement those security requirements described in the Security Policy.
Correct management of Security Requirements will help in maintaining the right levels of availability needed by the business.
The Availability Manager will ensure that the impact of the loss of service has been fully assessed prior to starting a service improvement programme in conjunction with Problem Management and Service Level Management.
With a sound Availability Management process we can expect an overall improvement in the level of Availability, as better planning can occur under a structured, repeatable process.
Any ITIL process has the potential to increase the credibility of the IT group, as they offer a higher quality of service, combined with an overall professionalism that can be lacking in ad-hoc activities.
Recovery Template
IT Services Availability Template Recovery Template
Status: Version:
Draft 0.1
Release Date:
Document Control Author Prepared by
Document Source This document is located on the LAN under the path: I:/IT Services/Service Delivery/Availability Management/
Document Approval
This document has been approved for use by the following:
♦ , IT Services Manager
♦ , IT Service Delivery Manager
♦ , Availability Manager
♦ , IT Service Continuity Process Manager
♦ , Customer representative or Service Level Manager
Amendment History Issue
Date
Amendments
Completed By
Distribution List When this procedure is updated the following copyholders must be advised through email that an updated copy is available on the intranet site: Business Unit
Stakeholders
IT
Introduction
Purpose
The Recovery Template provides the basic IT requirements needed to recover an IT Service in the event of failure.
Scope
This document describes the following:
Detailed form for Recovery of IT Services.
Summary of Recovery for each type of IT Service.
Note: It is assumed for each service described in this document that the supporting back-end technology is already in place and operational.
Audience
This document is relevant to all staff in
Ownership IT Services has ownership of this document. Related Documentation Include in this section any related document reference numbers and other associated documentation: IT Service Continuity Management Policies, Guidelines and Scope Document Business Continuity Strategy Template Risk Assessment Reciprocal Arrangements Relevant SLA and procedural documents IT Services Catalogue Relevant Technical Specification documentation Relevant User Guides and Procedures
Executive Overview
Describe the purpose, scope and organization of the Availability Recovery document.
Scope
Not all IT Services may initially be included within the Availability Recovery document. Use this section to outline what will be included and the timetable for other services to be included. Scope for the Recovery document may be determined by the business, therefore covering only a select few of the IT Services provided by the IT department that are seen as critical to the support of the business processes.
Note that this document needs to differ from IT Continuity Recovery. Include in the scope the difference between IT Service Continuity and Availability. This will depend on how the service is defined in the Service Catalogue.
To improve recovery, use the Component Failure Impact Analysis document.
Service Availability Summary
This section of the document provides a summary of all the services listed in the document and the pertinent information regarding recovery of that service. It should be used as a check list.

IT Service | Owner | Business Process | Business Owners | SLA #/Service Catalogue Reference
Service A | J. Ned | Billing | T. Smith | SLA001
Email | A. Boon | Communication | R. Jones | SLA234
SAP | C. Jones | Invoice and Payroll | P. Boon | SLA123
Service B | L. Smith | Marketing | R. Reagan | SLA009
Service C | R. Smith | Manufacturing | R. Smith | SLA007

Probability of Failure | Recovery Time | Recovery Procedure | Back Up Available and Tested
 | | << List appropriate procedures. NB. This will require Incident and Problem Mgt input >> | Yes
Data Capture
Service A
Service Description
In this section briefly describe the service.
Probability of Loss
In this section, describe the probability of a disruption to this service and the effect on the business. For example, will the loss of service invoke a contract that has set costs associated with it? If we lose this service, can we expect to lose customers, clients or market share? Define each form that the loss of the service may take.
Service Degradation
Use this section to specify, for this service or application, how quickly the situation is likely to degrade overall performance once the service is lost. That is, provide a score of 1 (lowest) to 10 (highest) that indicates how the service loss will grow in severity.

Escalation Score (1 is slow/barely noticeable, 10 is a rapid pace of overall deterioration) | Resulting Business Impact
9 | Complete loss of Service. Company reputation at threat.
1 | Minor Degradation. Customers unaware.
Escalation Procedures Use this section to detail all escalation procedures. In the event of a failure in service it is important to provide a concise list of personnel that will need to be contacted. This will help reduce the service disruption time. Priority
Hierarchical Name Dept Number
Functional Name Dept Number
Business Name Dept Number
1 2 3 4 5 9
Device Dependencies
In this section list those devices that are components of the service. Understanding this will help pinpoint the area of failure, thus decreasing the time to respond and recover.

IT Components (Configuration Items (CI))
CI # | Serial # | CI Name | Type | Sub-Type | Criticality
SER345 | 15434563 | EMERO | Hardware | Server | High
RT5700 | 54444443 | CISCO-002 | Hardware | Router | High
RT4567 | 76547457 | CISCO-001 | Hardware | Router | High
MS001 | N/A | MS Office | Software | Microsoft | Low
Business Needs
Use this section to describe any and all information that needs to be supplied to the business to help them manage the impact of failure on their processes. This will also help in setting the correct expectations and managing any issues that may arise due to the failure.
IT Needs and Resource Factors
Use this section to specify for this Service/application the combination of the complexity of facilities and the level of skills required in the people that will permit this service to stay operating, in the event of a failure. List out all necessary involvement with third party vendors as well.
Recovery Procedure
In this section you should list the recovery procedures for the above listed Configuration Items. We have added a simple procedure template as well.

IT Components (Configuration Items (CI))
CI # | CI Name | Type | Sub-Type | Criticality | Recovery Procedure
SER345 | EMERO | Hardware | Server | High |
RT5700 | CISCO-002 | Hardware | Router | High |
RT4567 | CISCO-001 | Hardware | Router | High |
MS001 | MS Office | Software | Microsoft | Low |
PROCEDURE TEMPLATE
Step | Task / Activity | Timing / Dependency | Expected Duration
1 | | |
2 | | |
3 | | |

The expected duration column allows measurements to be taken so as to identify opportunities for improving recovery times.
Data Capture
In this section detail the required level of diagnostics that need to be captured in the event of failure. This will include things such as server logs, application logs or error files, system management tool diagnostics, etc. This information will be used in the Problem Management process to help identify the underlying cause and provide an avenue for removing the possibility of the failure.
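As a minimal sketch of how the Expected Duration column in the procedure template might be used, the Python below compares expected and actual step durations after a recovery and lists the steps that overran, worst first. The step names, the timings, the `Step` class and the `overruns` function are all hypothetical and serve only to illustrate the idea; they are not part of the template.

```python
from dataclasses import dataclass


@dataclass
class Step:
    number: int
    task: str
    expected_minutes: float  # value from the "Expected Duration" column
    actual_minutes: float    # value measured during a real or test recovery


def overruns(steps):
    """Return (step number, minutes over) for steps that ran long, worst first."""
    late = [
        (s.number, s.actual_minutes - s.expected_minutes)
        for s in steps
        if s.actual_minutes > s.expected_minutes
    ]
    return sorted(late, key=lambda item: item[1], reverse=True)


# Hypothetical recovery run, invented for this example.
recovery_run = [
    Step(1, "Confirm failure and notify the Service Desk", 10, 12),
    Step(2, "Restore server from backup", 60, 95),
    Step(3, "Verify service with the business owner", 15, 15),
]

for number, minutes_over in overruns(recovery_run):
    print(f"Step {number} overran by {minutes_over} minutes")
```

Steps that repeatedly appear at the top of such a list are natural candidates for review when looking to improve recovery times.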
Appendices
Include any applicable appendixes that are needed.
Terminology
Make sure that all terminology is captured and documented correctly, e.g.
CMDB: Configuration Management Data Base
ITSCM: Information Technology Services Continuity Management
SLA: Service Level Agreement
UC: Underpinning Contract
Component Failure Impact Analysis
IT Services Availability Management Component Failure Impact Analysis (CFIA)
Status: Version: Release Date:
0.1
Document Control Author Prepared by
Document Source This document is located on the LAN under the path: I:/IT Services/Service Delivery/Availability Management
Document Approval
This document has been approved for use by the following:
♦ , IT Services Manager
♦ , IT Service Delivery Manager
♦ , National IT Help Desk Manager
Amendment History
Issue
Date
Amendments
Completed By
Distribution List When this procedure is updated the following copyholders must be advised through email that an updated copy is available on the intranet site: Business Unit IT
Stakeholders
Introduction
Purpose
This template provides an approach for understanding the criticality of components with relation to IT Services.
Scope This document describes the following: Component Failure Impact Analysis. Note: It is assumed for each service described in this document that the supporting back-end technology is already in place and operational.
Audience This document is relevant to all staff in
Ownership IT Services has ownership of this document.
Related Documentation Include in this section any related document reference numbers and other associated documentation:
IT Service Continuity Management Policies, Guidelines and Scope Document Business Continuity Strategy Template Risk Assessment Reciprocal Arrangements Relevant SLA and procedural documents IT Services Catalogue Relevant Technical Specification documentation Relevant User Guides and Procedures
Executive Overview
Describe the purpose, scope and organization of the document.
Scope
Not all IT Services may initially be included within the Availability Requirements document. Use this section to outline what will be included and the timetable for other services to be included. Scope for the document may be determined by the business, therefore covering only a select few of the IT Services provided by the IT department that are seen as critical to the support of the business processes.
The document is not an extensive description of IT Services or the components comprising the IT Services. The document is to be used in conjunction with Availability Recovery and Availability Requirements. A CFIA provides input to requirements planning and recovery planning. It can also help identify areas of failure during the loss of availability of an IT service.
Component Failure Impact Analysis Matrix
Once the Business Process and their corresponding IT Services are captured and documented, it is then possible to start the mapping of the configuration items. We can do this in a CFIA Matrix, shown below. The below table provides a template for capturing IT Components (Configuration Items) against the IT Services. The values in the Service Columns give an indication of the criticality of the IT component in relation to the IT Service that it is supporting. A criticality of 5 indicates that the service has a high dependency on the related IT Component. In situations such as this, IT personnel may consider building in more resilience for that component or in the event of loss of IT Service, this will be the first point of investigation.

IT Components (Configuration Items (CI)) against IT Services
CI # | Serial # | CI Name | Type | SubType | Service A | Service B | Service C | Service D
SER345 | 15434563 | EMERO | Hardware | Server | 0 | 0 | 5 | 1
RT5700 | 54444443 | CISCO-002 | Hardware | Router | 5 | 5 | 2 | 3
RT4567 | 76547457 | CISCO-001 | Hardware | Router | 5 | 5 | 5 | 5
MS001 | N/A | MS Office | Software | Microsoft | 0 | 0 | 3 | 3
This information is critical in providing quality and known services to the organisation. For example, if we look at the two CISCO routers above, we can see that CISCO-001 is integral to all four services listed, whilst CISCO-002 is only integral to two of those services. This information will help Service Level Management and Availability Management agree levels of service that rely on those particular configuration items and plan for the correct availability. We can also use this information in conjunction with our IT Service Continuity planning. In the above table it would be important to ensure that appropriate measures are in place in the event of CISCO-001 failing, since that would affect four services. To get a list of your configuration items, you will need to go to your Configuration Management Database.
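As a rough illustration of how the matrix can be queried, the Python sketch below lists, for each configuration item, the services that depend on it at the highest criticality and then ranks the items by how many services they are integral to. The scores are transcribed from the sample matrix above; the function names and the choice to treat only a score of 5 as "integral" are assumptions made for this example, not part of the CFIA template.

```python
# Criticality scores from the sample CFIA matrix (0 = no dependency, 5 = high dependency).
CFIA_MATRIX = {
    "EMERO": {"Service A": 0, "Service B": 0, "Service C": 5, "Service D": 1},
    "CISCO-002": {"Service A": 5, "Service B": 5, "Service C": 2, "Service D": 3},
    "CISCO-001": {"Service A": 5, "Service B": 5, "Service C": 5, "Service D": 5},
    "MS Office": {"Service A": 0, "Service B": 0, "Service C": 3, "Service D": 3},
}

HIGH_DEPENDENCY = 5  # assumption: only a score of 5 counts as "integral"


def integral_services(matrix, threshold=HIGH_DEPENDENCY):
    """Map each CI name to the sorted list of services scoring at or above the threshold."""
    return {
        ci: sorted(svc for svc, score in scores.items() if score >= threshold)
        for ci, scores in matrix.items()
    }


def rank_by_exposure(matrix, threshold=HIGH_DEPENDENCY):
    """Order CI names by the number of services they are integral to, most first."""
    services = integral_services(matrix, threshold)
    return sorted(services, key=lambda ci: len(services[ci]), reverse=True)


for ci in rank_by_exposure(CFIA_MATRIX):
    print(ci, integral_services(CFIA_MATRIX)[ci])
```

Items at the top of the ranking are the first candidates for added resilience and the first place to look when several services fail together.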
Appendices
List any appendices needed in conjunction with this document.
Terminology
IT Infrastructure: includes hardware, software, procedures, policies, documentation, etc.
Configuration Item: those components that are recorded that make up the IT Infrastructure in helping supply a service back to the Organisation.
Availability Requirements
IT Services Availability Management Availability Requirements
Version:
0.1
Release Date:
Document Control Author Prepared by
Document Source This document is located on the LAN under the path: I:/IT Services/Service Delivery/Availability Management/
Document Approval
This document has been approved for use by the following:
♦ , IT Services Manager
♦ , IT Service Delivery Manager
♦ , Availability Manager
♦ , IT Service Continuity Process Manager
♦ , Customer representative or Service Level Manager
Amendment History Issue
Date
Amendments
Completed By
Distribution List When this procedure is updated the following copyholders must be advised through email that an updated copy is available on the intranet site: Business Unit
Stakeholders
IT
Introduction
Purpose
This template provides an approach for capturing availability requirements for IT Services. Scope This document describes the following: Detailed form for Availability Requirements of IT Services. Note: It is assumed for each service described in this document that the supporting back-end technology is already in place and operational. Audience This document is relevant to all staff in Ownership IT Services has ownership of this document. Related Documentation Include in this section any related document reference numbers and other associated documentation:
IT Service Continuity Management Policies, Guidelines and Scope Document Business Continuity Strategy Template Risk Assessment Reciprocal Arrangements Relevant SLA and procedural documents IT Services Catalogue Relevant Technical Specification documentation Relevant User Guides and Procedures
Executive Overview
Describe the purpose, scope and organization of the Availability Requirements document.
Scope
Not all IT Services may initially be included within the Availability Requirements document. Use this section to outline what will be included and the timetable for other services to be included. Scope for the requirements document may be determined by the business, therefore covering only a select few of the IT Services provided by the IT department that are seen as critical to the support of the business processes.
Note that this document needs to differ from IT Continuity Requirements. Include in the scope the difference between IT Service Continuity and Availability. This will depend on how the service is defined in the Service Catalogue.
To improve recovery, use the Component Failure Impact Analysis document.
Service Availability Summary
This section of the document provides a summary of all the services listed in the document and the pertinent information regarding requirements of that service. It can be used as a check list.

IT Service | Owner | Business Process | Business Owners | SLA #/Service Catalogue Reference
Service A | J. Ned | Billing | T. Smith | SLA001
Email | A. Boon | Communication | R. Jones | SLA234
SAP | C. Jones | Invoice and Payroll | P. Boon | SLA123
Service B | L. Smith | Marketing | R. Reagan | SLA009
Service C | R. Smith | Manufacturing | R. Smith | SLA007

Recovery Times | Resilience | Serviceability | Maintenance
Service A
Service Description
In this section briefly describe the service.
Business Requirements
In this section for this service describe the reason and need for the IT Service and where it aligns with the needs of the business. Describe in business terms how any unavailability of this service will affect the business. List out any Vital Business Functions for this service.
Requirements Mapping
Use this section to map the technical requirements of the service against the components that are involved in the delivery of the service.

Service Component
CI # | CI Name | Type | SubType | Criticality
SER345 | EMERO | Hardware | Server | High
RT5700 | CISCO-002 | Hardware | Router | High
RT4567 | CISCO-001 | Hardware | Router | High
MS001 | MS Office | Software | Microsoft | Low

Requirements
Recovery Times | Availability Times | Availability % | Resilience | Maintenance Times
No Yes Yes No
This table would be better presented in landscape format, but it conveys the idea nonetheless. The requirements columns can also include the following:
• Serviceability
• SLA
• OLA
• Etc.
SLA’s
Use this section to detail all Service Level Agreements that may be in place or need to be in place, and any pertinent information.

Service Component and SLA
CI # | CI Name | Type | SubType | Criticality | SLA # | Availability Times | Availability % | Variants / Conditions
SER345 | EMERO | Hardware | Server | High | | | |
RT5700 | CISCO-002 | Hardware | Router | High | | | |
RT4567 | CISCO-001 | Hardware | Router | High | | | |
MS001 | MS Office | Software | Microsoft | Low | | | |
OLA’s
In addition to Service Level Agreements, it is important to capture all necessary Operational Level Agreements. OLA’s can directly affect the requirements stipulated by the business with regards to the IT Service. To plan correctly for Availability you will need to know about existing OLA’s.

Service Component and OLA
CI # | CI Name | Type | SubType | Criticality | OLA # | Response Times | IT Departments Involved
SER345 | EMERO | Hardware | Server | High | | |
RT5700 | CISCO-002 | Hardware | Router | High | | |
RT4567 | CISCO-001 | Hardware | Router | High | | |
MS001 | MS Office | Software | Microsoft | Low | | |
Serviceability Requirements
Use this section to describe any and all information that needs to be supplied with regards to serviceability of the IT Service and its components. Serviceability is related to third party suppliers.

Service Component and Vendor Information
CI # | CI Name | Type | SubType | Criticality | Vendor | Service Schedule | Lease Period | Rates
SER345 | EMERO | Hardware | Server | High | | | |
RT5700 | CISCO-002 | Hardware | Router | High | | | |
RT4567 | CISCO-001 | Hardware | Router | High | | | |
MS001 | MS Office | Software | Microsoft | Low | | | |
Metrics
Use this section to list all the metrics that need to be captured with regard to this service. Some of these metrics will be captured within the Incident Management process. However, other measurements will be determined by what is available from an IT Service Management tool perspective.
Note: If the tool used to take measurements seems fairly extensive, this should not be considered an invitation to turn on all available measurements. The metrics that you capture need to provide information to business management as well as IT departments, so as to allow improvement in levels of availability. Ask yourself: if I take this measurement, what will it let me improve?
Testing Procedures
In this section you should list the testing procedures for the above listed Configuration Items. We have added a simple procedure template as well.

IT Components (Configuration Items (CI))
CI # | CI Name | Type | Sub-Type | Criticality | Testing Procedure
SER345 | EMERO | Hardware | Server | High |
RT5700 | CISCO-002 | Hardware | Router | High |
RT4567 | CISCO-001 | Hardware | Router | High |
MS001 | MS Office | Software | Microsoft | Low |

PROCEDURE TEMPLATE
Step | Task / Activity | Timing / Dependency | Expected Duration
1 | | |
2 | | |
3 | | |

The expected duration column allows measurements to be taken so as to identify opportunities for improving testing times.
Conclusion (not part of the repetitive process)
This template has given you a concise and simple way to look at the requirement options for particular IT Services. This document should be maintained on a regular basis, timed to coincide with Service Level Management reviews of the Service Catalogue or of Service Level Agreements.
Appendices
Include any applicable appendixes that are needed.
Terminology
Make sure that all terminology is captured and documented correctly, e.g.
CMDB: Configuration Management Data Base
ITSCM: Information Technology Services Continuity Management
SLA: Service Level Agreement
UC: Underpinning Contract
Roles & Responsibilities
Availability Manager
An Availability Manager has responsibility for ensuring that the aims of Availability Management are met. This includes responsibilities such as:
• Ensuring that all existing services deliver the levels of availability agreed with the business in SLAs
• Ensuring that all new services are designed to deliver the levels of availability required by the business, and validation of the final design to meet the minimum levels of availability as agreed by the business for IT services
• Assisting with the investigation and diagnosis of all incidents and problems that cause availability issues or unavailability of services or components
• Participating in the IT infrastructure design, including specifying the availability requirements for hardware and software
• Specifying the requirements for new or enhanced event management systems for automatic monitoring of availability of IT components
• Specifying the reliability, maintainability and serviceability requirements for components supplied by internal and external suppliers
• Being responsible for monitoring actual IT availability achieved against SLA targets, and providing a range of IT availability reporting to ensure that agreed levels of availability, reliability and maintainability are measured and monitored on an ongoing basis
• Proactively improving service availability wherever possible, and optimizing the availability of the IT infrastructure to deliver cost-effective improvements that deliver tangible benefits to the business
• Creating, maintaining and regularly reviewing an AMIS and a forward-looking Availability Plan, aimed at improving the overall availability of IT services and infrastructure components, to ensure that existing and future business availability requirements can be met
• Ensuring that the Availability Management process, its associated techniques and methods are regularly reviewed and audited, and that all of these are subject to continual improvement and remain fit for purpose
• Creating availability and recovery design criteria to be applied to new or enhanced infrastructure design
• Working with Financial Management, ensuring the levels of IT availability required are cost-justified
• Maintaining and completing an availability testing schedule for all availability mechanisms
• Ensuring that all availability tests and plans are tested after every major business change
• Assisting Security and IT Service Continuity Management with the assessment and management of risk
• Assessing changes for their impact on all aspects of availability, including overall service availability and the Availability Plan
• Attending CAB meetings when appropriate.
Availability Management Process Manager
IT Services Roles and Responsibilities Process: Availability Management
Status: Version: Release Date:
0.1
Detailed responsibilities of the Availability Management process owner
The Availability Manager…
Description
1. Will develop and maintain the Availability Management Process. Will develop, maintain and promote Availability Management.
2. Will coordinate process reviews utilizing independent parties to provide an objective view on the simplicity of the process and areas for improvement. Will be responsible for implementing any design improvements identified.
3. Will chair the Technical Observation Post meetings that are used to identify and action availability issues and to verify that all steps were completed and the objective of the process was achieved.
4. Will arrange and run all Availability Management reviews with the Availability Management team. The reviews, where necessary, will include other IT Departments as well as key customers.
5. Will control and review:
Any outstanding process related actions
Current targets for availability performance
The process mission statement
6. Will make available relevant, concise reports that are both timely and readable for Customers and Management.
Notes/Comments
Use the Notes/Comments column in different ways. If you are looking to apply for a process role, then you can check yourself against the list (with ticks, or look to update your resume). If you are looking to appoint a process manager or promote someone from within the organization, you can make notes about their abilities in the particular area.
Detailed skills of the Availability Management process owner
The Availability Manager…
Description
1. The Availability Manager will display a communication style based around information and escalation.
2. Have practical and quantifiable process management experience. High degree of analytical skills to be able to assess the impact of incidents on different business systems and people.
3. High degree of analytical skill needed to be able to help in the process of restoring service as quickly as possible. Technical ability in being able to read data from the Availability Management process that will help with the identification of trends and improvements relating to availability.
4. An ability to run a meeting according to strict guidelines (not to get side-tracked on items that one person may be interested in). Must possess skills in influencing and negotiation as well.
5. The Availability Manager must be able to communicate with people at all levels of the organization. This is especially important during meetings.
6. The process manager must be able to demonstrate ways to “do things differently” that will improve the process.
7. Must be able to think logically about availability issues that could affect the organization and design appropriate assessment and diagnosis activities.
Notes/Comments
Use the Notes/Comments column in different ways. If you are looking to apply for a process role, then you can check yourself against the list (with ticks, or look to update your resume).
If you are looking to appoint a process manager or promote someone from within the organization, you can make notes about their abilities in the particular area.
This will provide a strong link into the Problem Management process and Service Level Management process.
Service Outage Analysis
IT Services Availability Management Service Outage Analysis (SOA) Service: << insert service name >>
Version:
0.1
Release Date:
Document Control Author Prepared by
Document Source This document is located on the LAN under the path: I:/IT Services/Service Delivery/Availability Management
Document Approval
This document has been approved for use by the following:
♦ , IT Services Manager
♦ , IT Service Delivery Manager
♦ , National IT Help Desk Manager
♦ <first name, last name>, Availability Manager
Amendment History
Issue | Date | Amendments | Completed By

Distribution List
When this procedure is updated the following copyholders must be advised through email that an updated copy is available on the intranet site:
Business Unit | Stakeholders
IT |
Introduction

Purpose
The purpose of this document is to provide a structured approach to help improve end-to-end service availability for a selected IT Service or a set of Infrastructure components.

Scope
The scope for this document will be one IT Service and/or a set of Infrastructure components.

Audience
This document is relevant to all staff in

Ownership
IT Services has ownership of this document.

Related Documentation
Include in this section any related document reference numbers and other associated documentation:
• IT Service Continuity Management Policies, Guidelines and Scope Document
• Business Continuity Strategy Template
• Risk Assessment
• Reciprocal Arrangements
• Relevant SLA and procedural documents
• IT Services Catalogue
• Relevant Technical Specification documentation
• Relevant User Guides and Procedures
Executive Overview
Describe the purpose, scope and organization of the document.

Scope
<< For the user of this document: This document will outline the pertinent areas of consideration when performing a Service Outage Analysis. The document has been developed to allow you to use the structure when performing your own Service Outage Analysis. A Service Outage Analysis will generally be performed due to an outage occurring on an IT Service and the components involved in that service. The document is to be used in conjunction with Availability Recovery, Availability Requirements and Problem Management Objectives and Goals. >>
In this section detail the scope of the SOA.
Planning
In this section include the plan for the SOA for the service. This is much like performing a Post Implementation Review. Record in this section of the document the following:

Milestones
• Record the start and end dates for the SOA.
• Record all deliverables, including their start and end times.

Team
List all team members involved in the SOA. Appropriate information will include:
• Name
• Department
• Contact Details
• Technical Expertise
• Roles and Responsibilities
• % of Involvement

Data Sources
In this section detail a list of possible data sources needed for the SOA. Potential sources of information are:
• Incident Management
• Problem Management
• Configuration Management Database
• Availability Management Database
• Capacity Database
• Network Monitoring Tools
• Desktop Monitoring Tools

Resources
During the SOA you may require appropriate resources to complete the assignment. Potential resources are:
• PCs or Laptops
• Accommodation
• Stationery
Schedules
Include in this section any appropriate Project Management plans. It is important to have a clearly defined schedule, one that is distributed amongst the team, which will help you drive the SOA assignment. Within this section list the following:
• Start and End Dates for the assignment
• When data is to be collected
• An interview schedule for key personnel
  o It is important to include business people here as the true perception of the service is through their eyes.
• Site visits and surveys
• 3rd Party inputs
• etc.

Hypotheses
The next thing is to list all hypotheses regarding the Service Outage. This can be done in the following table:

Hypotheses | Probability | Investigative Area
List your hypotheses here | What is the probability of it being true? | Where will you look to get the information to prove it right or wrong?

Data Analysis
From the above table, gather the necessary data from the selected sources. Data Analysis techniques can vary dramatically, and it is not the intent of this document to provide such techniques. Create a table for each data source to capture the necessary information so that appropriate analysis can take place. Provide a summary of the data in the following table:

Hypotheses | Data Source | Data | Supportive
The hypotheses can be re-listed | What was the data source | What data was collected | Did it support the hypotheses
Interviews
Interviews are a key aspect of the SOA. They can provide better insight into the outage and the processes around it. The “human factor” can provide more meaningful input than straight data, because interviews give the business and user perspective of the service outage. By interviewing staff, it will be easier to determine where the real issues have occurred within the user community. The solution may be quite different from what the technical data is pointing to. Interview the Problem Management team as well.

Findings and Recommendations
From your hypotheses and interviews, you should be able to provide a list of findings and recommend the solutions needed to help improve end-to-end service availability. Recommendations can be captured in the table below:

Priority | Hypotheses | Findings | Recommendations
This column is used for prioritizing the solutions | List the hypotheses | List any findings, supportive or not, for the hypotheses | Provide the recommendations for improving the service availability.
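The hypotheses, data analysis and findings tables above can also be kept in a simple structured form so that evidence is tallied consistently. The following is a minimal sketch in Python, not a prescribed SOA tool: the class name Hypothesis, its fields, the prioritise function and all sample data are assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Hypothesis:
    """One row of the SOA hypotheses table (field names are illustrative)."""
    statement: str
    probability: float        # estimated chance of being true, 0.0 to 1.0
    investigative_area: str   # where evidence will be sought
    evidence: list = field(default_factory=list)  # (data_source, data, supportive)

    def record(self, data_source: str, data: str, supportive: bool) -> None:
        """Add one row of the data analysis summary table."""
        self.evidence.append((data_source, data, supportive))

    def support_ratio(self) -> float:
        """Share of the collected evidence that supports the hypothesis."""
        if not self.evidence:
            return 0.0
        supporting = sum(1 for _, _, supportive in self.evidence if supportive)
        return supporting / len(self.evidence)


def prioritise(hypotheses):
    """Order hypotheses so the best supported, most probable ones come first."""
    return sorted(
        hypotheses,
        key=lambda h: (h.support_ratio(), h.probability),
        reverse=True,
    )


# Invented sample data, for illustration only.
h1 = Hypothesis("Storage firmware fault", 0.6, "Problem Management records")
h1.record("Incident Management", "Tickets cite storage timeouts", True)
h1.record("Network Monitoring Tools", "No packet loss during the outage", True)

h2 = Hypothesis("Network congestion", 0.3, "Network Monitoring Tools")
h2.record("Network Monitoring Tools", "No packet loss during the outage", False)

ranked = prioritise([h1, h2])
for rank, h in enumerate(ranked, start=1):
    print(rank, h.statement, f"{h.support_ratio():.0%} of evidence supportive")
```

Ranking by the share of supportive evidence is only one possible rule. Interview findings may justify a different priority order than the data alone suggests.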
Appendices
List any appendices needed in conjunction with this document.

Terminology
IT Infrastructure: includes hardware, software, procedures, policies, documentation, etc.
Configuration Item: those recorded components that make up the IT Infrastructure and help supply a service back to the Organisation.
Reports, KPIs and other Metrics
IT Services
Reports and KPI Targets
Process: Availability Management
Status:
Version: 0.1
Release Date:
Reports and KPI Targets for Availability Management
The document is not to be considered an extensive statement as its topics have to be generic enough to suit any reader for any organization. However, the reader will certainly be reminded of the key topics that have to be considered.

This document serves as a GUIDE ON SUITABLE KEY PERFORMANCE INDICATORS (KPIs) and REPORTS FOR MANAGEMENT for the Availability Management process. This document provides a basis for completion within your own organization.

This document contains suggestions regarding the measures that would be meaningful for this process. The metrics demonstrated are intended to show the reader the range of metrics that can be used. The message must also be clear that technology metrics must be heavily supplemented with non-technical and business-focused metrics, KPIs and measures.

This document was:
Prepared by:
On: <>
And accepted by:
On: <>
Key Performance Indicators (KPIs)
Continuous improvement requires that each process has a plan about “how” and “when” to measure its own performance. While there can be no set guidelines presented for the timing (“when”) of these reviews, the “how” question can be answered with metrics and measurements.

With regard to the timing of reviews, factors such as resource availability, cost and “nuisance factor” need to be accounted for. Many initiatives begin with good intentions to do regular reviews, but these fall away very rapidly. This is why the process owner must have the conviction to follow through on assessments, meetings and reviews. If the process manager feels that reviews are too seldom or too often then the schedule should be changed to reflect that.

Establishing SMART targets is a key part of good process management. SMART is an acronym for:
Simple
Measurable
Achievable
Realistic
Time Driven

Metrics help to ensure that the process in question is running effectively.
With regard to AVAILABILITY MANAGEMENT the following metrics and associated targets should be considered:

Key Performance Indicator | Target Value (some examples) | Time Frame/Notes/Who

Using data from the Configuration Management Database (CMDB), indicate any particular Configuration Items that are experiencing frequent losses of Availability.

Number of Incidents logged relating to Availability issues. Incident tickets will be able to provide the following measurements:
• Detection Time
• Response Time
• Repair Time
• Recovery Time
• Mean Time to Repair
• Mean Time Between Failures
• Mean Time Between System Incidents

Availability Management Trend Analysis. This should be done by:
• IT Service
• IT System
• IT Component
This can be further broken down by using Incident Management to supply the following:
• Type
• Category
• Priority, Impact, Urgency

The average cost per availability issue.

What is the client perspective with relation to Availability of IT Services?

Number of Availability Management meetings. This will indicate a constant cycle of discussions and a process of improvement.

Increased learning and growth. This refers to the interaction with other processes, staffing, training and investments in software and hardware.
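Several of the measurements listed above (Mean Time to Repair, Mean Time Between Failures, Mean Time Between System Incidents) can be derived from outage start and restore times taken from incident tickets. The following is a minimal sketch in Python under stated assumptions: the function name availability_metrics, the dictionary keys and the sample outages are hypothetical, the reporting period is treated as the agreed service time, MTBF is approximated as total uptime divided by the number of failures, and MTBSI as agreed service time divided by the number of failures.

```python
from datetime import datetime


def availability_metrics(outages, period_start, period_end):
    """Summarise outages given as (failed_at, restored_at) datetime pairs.

    Assumes every outage lies within the reporting period and that the
    whole period counts as agreed service time.
    """
    agreed_hours = (period_end - period_start).total_seconds() / 3600
    downtime_hours = sum(
        (restored - failed).total_seconds() for failed, restored in outages
    ) / 3600
    failures = len(outages)

    result = {
        "failures": failures,
        "downtime_hours": downtime_hours,
        "availability_pct": (agreed_hours - downtime_hours) / agreed_hours * 100,
        "mttr_hours": None,
        "mtbf_hours": None,
        "mtbsi_hours": None,
    }
    if failures:
        result["mttr_hours"] = downtime_hours / failures
        result["mtbf_hours"] = (agreed_hours - downtime_hours) / failures
        result["mtbsi_hours"] = agreed_hours / failures
    return result


# Invented sample data: a 30-day period with a 2-hour and a 4-hour outage.
outages = [
    (datetime(2024, 1, 5, 9, 0), datetime(2024, 1, 5, 11, 0)),
    (datetime(2024, 1, 20, 14, 0), datetime(2024, 1, 20, 18, 0)),
]
metrics = availability_metrics(outages, datetime(2024, 1, 1), datetime(2024, 1, 31))

for name, value in metrics.items():
    print(f"{name}: {value}")
```

Reporting the absolute number of failures and hours of downtime alongside the availability percentage keeps small counts readable, since a single extra failure can move a percentage noticeably when the totals are low.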
Special Tip: Beware of using percentages in too many cases. It may even be better to use absolute values when the potential number of maximum failures is less than 100.

Reports for Management
Management reports help identify future trends and allow review of the “health” of the process. Setting a security level on certain reports may be appropriate, as may categorizing the report as Strategic, Operational or Tactical.

The acid test for a relevant report is to have a sound answer to the question: “What decisions is this report helping management to make?”

Management reports for Availability Management should include:

Report | Time Frame/Notes/Who

The number of Incidents lodged as a result of loss of Availability. As well as the numbers, a very concise view of major failures in availability can also be included.

Service Outage Analysis Report, which provides a detailed analysis of service interruptions and opportunities to improve levels of Availability.

Summary of availability recommendations for the coming year. The business will be interested in this as it shows a proactive approach to providing IT Services and demonstrates the benefits to the business.

The number of incidents attributable to different business areas is also useful. This will help Management to understand which departments are in a state of continual disruption. Incidents can indicate poor management, fluctuating internal pressures or increasing pressures from external forces.

Service Level Achievements. These are essentially service management reports, which the business managers will use for understanding if their SLAs are being met. In addition to this, other Service Managers can use these reports for high-level process control.

Analysis and results of meetings completed.

The situation regarding the process staffing levels and any suggestions regarding redistribution, recruitment and training required.

Human resource reporting including hours worked against activities (including weekend/after hours work, for example, on call duties).

Audit Reports should verify that availability infrastructure has all relevant and expected information recorded.

Relevant Financial information, to be provided in conjunction with Financial Management for IT Services. This information will include costs relating to the building of appropriate infrastructure to maintain the right levels of availability.
Communication Plan
IT Services
Communication Plan
Process: Availability Management
Status:
Version: 0.1
Release Date:
Communication Plan for Availability Management
The document is not to be considered an extensive statement as its topics have to be generic enough to suit any reader for any organization. However, the reader will certainly be reminded of the key topics that have to be considered.

This document serves as a GUIDE FOR COMMUNICATIONS REQUIRED for the Availability Management process. This document provides a basis for completion within your own organization.

This document contains suggestions regarding information to share with others. The document is deliberately concise and broken into communication modules. This will allow the reader to pick and choose information for e-mails, flyers, etc. from one or more modules if and when appropriate.

This document was:
Prepared by:
On: <>
And accepted by:
Initial Communication

Sell the Benefits
The first step in communication is to answer the question that most people (quite rightly) ask when the IT department suggests a new system or a new way of working: WHY?

It is here that we need to promote and sell the benefits. However, be cautious of using generic words. Cite specific examples from your own organization that the reader will be able to relate to.

Generic Benefit statements | Specific Organizational example
Improved Customer Service | This is important because…
Reduction in the number of Incidents | In recent times our incidents within IT have…
Provides quicker resolution of Incidents | Apart from the obvious benefits, the IT department in recent times has…
Improved Organisational learning | A recent example of … saw the individual and others in the company start to…

The above Communication module (or elements of) was/were distributed:
To:
On: <>
By:
Availability Management Goal

The Goal of Availability Management
The Goal of Availability Management can be promoted in the following manner.

Official Goal Statement: To maintain and optimise the IT Services and supporting infrastructure to provide a high level of Availability that has been designed to meet the needs of the business.

• High visibility and wide channels of communication are essential in this process. Gather specific Availability Requirements from nominated personnel.
(Special Tip: Beware of using only Managers to gain information from, as the resistance factor will be high)
• Oversee the monitoring of the process to ensure that the business needs of IT are not impacted, while taking into account that changes are required to ensure continued high levels of IT Service Delivery and Support Availability.
• Provide relevant reports to nominated personnel.
(Special Tip: Beware of reporting only to Managers. If you speak to a lot of people regarding Service Support and Delivery then you need to establish ways to report to these people the outcomes and progress of the discussions).

Always bear in mind the “so what” factor when discussing areas like goals and objectives. If you cannot honestly and sensibly answer the question “so what”, then you are not selling the message in a way that is personal to the listener and gets their “buy-in”.

The above Availability Management Goals module was distributed:
To:
On: <>
By:
Availability Management Activities

Intrusive & Hidden Activities
The list of actions in this module will have a direct impact on end users and IT Staff. They will be curious as to why IT is working with them in this manner, rather than following the historical method of just “doing it”. There could be an element of suspicion and resistance, so consider different strategies to overcome this initial scepticism.

Business Availability Requirements
• Interview and record the needs from the Business
• Capture any Vital Business Functions
• Create availability and recovery design criteria based on the requirements

Business Impact Assessment
• Perform an Impact Assessment for the case where the particular service is unavailable
• Put in place communication guidelines in the event of loss of service
• Create IT Infrastructure resilience and risk assessment documents from these results

Availability, Reliability, and Maintainability
• Don’t just plan and communicate for Availability
• Set up measurements for Reliability and Maintainability of Service
• Communicate the difference between the measurements

Incident and Problem Data
• Correct categorisation of Incidents will allow for more accurate problem identification when it comes to unavailability of service
• Communicate methods for recording Incident and Problem tickets relating to availability
• Provide a process for dealing with Availability Incidents and Problems

Service Level Achievements
• Set appropriate Service Level Achievements
• Agree through Service Level Management the SLAs for Availability
• Communicate to IT Staff the reasons and benefits
• Communicate to business staff the reasons and benefits

Information regarding activities was distributed:
To:
On: <>
By:
Availability Management Planning

Costs
Information relating to costs may be a topic that would be held back from general communication. Failure to convince people of the benefits will mean total rejection of associated costs. If required, costs fall under several categories:
• Personnel – availability management staff, database management team (Set-up and ongoing of the availability database)
• Accommodation – Physical location (Set-up and ongoing)
• Software – Tools (Set-up and ongoing)
• Hardware – Infrastructure (Set-up)
• Education – Training (Set-up and ongoing)
• Procedures – external consultants etc. (Set-up)

The costs of implementing Availability Management will be outweighed by the benefits. For example, many organizations have a negative perception of the Availability Management process as it seems to constantly measure the wrong services. To alleviate this, customers and end-users need to be constantly informed of the levels of availability being provided. This provides good customer service and adds a level of comfort to the users in the sense that they can “see” action taking place. A well run Availability Management process will make major inroads into altering the perception of the IT Organisation.

Details regarding the cost of Availability Management were distributed:
To:
On: <>
By:
On: <>
IMPLEMENTATION PLAN – PROJECT PLAN
IT Services
Implementation Plan/Project Plan Skeleton Outline
Process: Availability Management
Status:
Version: 0.1
Release Date:
Planning and implementation for Availability Management
This document provides guidance for the planning and implementation of the Availability Management ITIL process. The document is not to be considered an extensive plan as its topics have to be generic enough to suit any reader for any organization. However, the reader will certainly be reminded of the key topics that have to be considered for planning and implementation of this process.

Initial planning
When beginning the process planning the following items must be completed:

CHECK (☺ or tick or date) | DESCRIPTION

Get agreement on the objective (use the ITIL definition), purpose, scope, and implementation approach (e.g. internal, outsourced, hybrid) for the process.

Assign a person to the key role of process manager/owner. This person is responsible for the process and all associated systems. This person will generally be the Network or Operations Manager.

Conduct a review of activities that would currently be considered as an activity associated with this process. Make notes and discuss the “re-usability” of that activity. Key activities of Availability Management are:
• Gathering Availability Requirements
• Gathering Vital Business Functions from the Business
• Designing for Availability
• Designing for Resilience
• Designing for Recovery

Create and gain agreement on a high-level process plan and a design for any associated process systems. NOTE: the plan need not be detailed. Too many initiatives get caught up in too much detail in the planning phase. KEEP THE MOMENTUM GOING.

Review the finances required for the process as a whole and any associated systems (expenditure including people, software, hardware, accommodation). Don’t forget that the initial expenditure may be higher than the ongoing costs. Don’t forget annual allowances for systems maintenance or customizations to systems by development staff.

Agree the policy regarding this process.
Create Strategic statements

Policy Statement
The policy establishes the “SENSE OF URGENCY” for the process. It helps us to think clearly about and agree on the reasons WHY effort is put into this process. Being able to answer this seemingly simple, but actually complex, question is a major stepping stone towards successful implementation.

The most common mistake made is that reasons regarding IT are given as the WHY we should do this. Reasons like “to make our IT department more efficient” are far too generic and don’t focus on the real issue behind why this process is needed. The statement must leave the reader in no doubt that the benefits of this process will be far reaching and contribute to the business in a clearly recognizable way.

Objective Statement
When you are describing the end or ultimate goal for a unit of activity that is about to be undertaken you are outlining the OBJECTIVE for that unit of activity. Of course the activity may be some actions for just yourself or a team of people. In either case, writing down the answer to WHERE this activity will take me/us/the organization is a powerful exercise. There are many studies that indicate the simple act of putting a statement about the expected end result onto a piece of paper, then continually referring to it, makes achieving that end result realistic.

As a tip regarding the development of an objective statement: don’t get caught up in spending hours on this. Do it quickly and go with your instincts or first thoughts, BUT THEN wait a few days, review what you did for another short period of time, and THEN commit to the outcome of the second review as your statement.

Scope Statement
In defining the scope of this process we are answering what activities and what “information interfaces” this process has. Don’t get caught up in trying to be too detailed about the information flow into and out of this process. What is important is that others realize that information does in fact flow.
For example, with regard to the AVAILABILITY MANAGEMENT process we can create a simple table such as:

Availability Management Information flows
Process | | Process | Information
Availability Management | to | Problem Management | Availability reports to indicate current or future problems
Problem Management | to | Availability Management | Report of availability related problems and known errors
Availability Management | to | Change Management | RFC
Change Management | to | Availability Management | Info on planned changes, as some RFCs may affect availability
Availability Management | to | Service Level Management | Availability reporting for planned vs. actual comparison
Service Level Management | to | Availability Management | Service Level Requirements, SLAs, OLAs, UCs

Refer to Policies, Objectives and Scope on page 49 for more template information regarding Policy, Objective and Scope statements.

Steps for Implementation
There can be a variety of ways to implement this process. For a lot of organizations a staged implementation may be suited. For others a “big bang” implementation, due to absolute equality, may be appropriate. In reality however, we usually look at implementation according to pre-defined priorities. Consider the following options and then apply a suitable model to your own organization or case study.

STEPS | NOTES/RELEVANCE/DATES/WHO

Define the Objective and Scope for Availability Management.

Establish and agree on a clear definition for the words:
• Availability
• Reliability
• Maintainability
• Serviceability
• Resilience
This is one of the most interesting aspects. It can be very difficult to get everyone to agree to a definition, and it can be very difficult to establish the correct understanding of the definition. However, get this right, and the rest of the process is made easier.

Seek initial approval.

Establish and Define Roles and Responsibilities for the process. Appoint an Availability Manager.

Establish and Define the Scope for Availability Management and the relationships with IT Services.

Establish Availability Management Process.

Establish and Define Relationship with all other processes. This is another key aspect of the Availability Management process. Availability Management is where we are helping set customers’ expectations of service and influence their perceptions. Availability Management works closely with Service Level Management to achieve this.

Establish monitoring levels. Availability as seen by the business is related to the service and not the components that make up the service.

Define reporting standards.

Publicize and market.
The priority selection has to be made with other factors in mind, such as competitive analysis, any legal requirements, and desires of “politically powerful influencers”.

Costs
The cost of process implementation is something that must be considered before, during and after the implementation initiative. The following points and table help to frame these considerations. (A variety of symbols, such as ☺, can be used to indicate required expenditure, rising or falling expenditure, level of satisfaction regarding costs in a particular area, etc.)

Cost area | Initial | During | Ongoing

Personnel: Costs of people for initial design of process, implementation and ongoing support.

Accommodation: Costs of housing new staff and any associated new equipment and space for documents or process related concepts.

Software: New tools required to support the process and/or the costs of migration from an existing tool or system to the new one. Maintenance costs.

Hardware: New hardware required to support the process activities. IT hardware and even new desks for staff.

Education: Re-education of existing staff to learn new techniques and/or learn to operate new systems.

Procedures: Development costs associated with filling in the detail of a process activity. The step-by-step recipe guides for all involved and even indirectly involved personnel.

In most cases, costs for Process implementation have to be budgeted for (or allocated) well in advance of expenditure. Part of this step involves deciding on a charging mechanism (if any) for the new services to be offered.

Build the team
Each process requires a process owner and in most situations a team of people to assist. The Availability Management process is one of the processes in the Service Delivery set that shows very visible benefits from the outset and is very influential in setting the perception of IT Services to its customers and end users.

Of course a lot will be dependent on the timing of the implementation and whether it is to be staged or implemented as one exercise.

Refer to Roles and Responsibilities on pages 81 and 83 for role, responsibilities and tasks of involved personnel.
Analyse current situation and FLAG
Naturally there are many organizations that have existing procedures, processes and people in place and that feel the activities of Availability Management are already being done. It is critical to identify these systems and consider their future role as part of the new process definition.

Examples of areas to review are:

Area | Notes
Power teams |
Current formal procedures |
Current informal procedures |
Current role descriptions |
Existing organizational structure |
Spreadsheets, databases and other repositories |
Other… |
Implementation Planning
After base decisions regarding the scope of the process and the overall planning activities are complete we need to address the actual implementation of the process. It is unlikely that there will not be some current activity or work being performed that would fit under the banner of this process. However, we can provide a comprehensive checklist of points that must be reviewed and done.

Implementation activities for Availability Management

Activity | Notes/Comments/Time Frame/Who

Review current and existing Availability Management practices in greater detail. Make sure you also review current process connections from these practices to other areas of IT Service Delivery and Support.

Review the ability of existing functions and staff. Can we “reuse” some of the skills to minimize training, education and time required for implementation?

Establish the accuracy and relevance of current processes, procedures and meetings. As part of this step, if any information is credible, document the transition from the current format to any new format that is selected.

Decide how best to select any vendor that will provide assistance in this process area (including tools, external consultancy or assistance to help with initial high workload during process implementation).

Establish a selection guideline for the evaluation and selection of tools required to support this process area (i.e. Availability Management tools).

Purchase and install tools required to support this process (i.e. Availability Management tool). Ensure adequate skills transfer and on-going support is catered for if external systems are selected.

Create any required business process interfaces for this process that can be provided by the automated tools (e.g. reporting: frequency, content).

Document and get agreement on roles, responsibilities and training plans.

Communicate with and provide necessary education and training for staff that covers the actual importance of the process and the intricacies of being part of the process itself.
An important point to remember is that if this process is to be implemented at the same time as other processes, it is crucial that the implementation plans and, importantly, the timing of work are complementary.

Cutover to new processes
The question of when a new process actually starts is one that is not easy to answer. Most process activity evolves without rigid starting dates and this is what we mean when we answer a question with “that’s just the way it’s done around here”. Ultimately we do want the new process to become the way things are done around here, so it may even be best not to set specific launch dates, as this will set the expectation that from the given date all issues relating to the process will disappear (not a realistic expectation).
FURTHER INFORMATION For more information on other products available from The Art of Service, you can visit our website: http://www.theartofservice.com
If you found this guide helpful, you can find more publications from The Art of Service at: http://www.amazon.com