Business continuity planning
Business continuity planning (BCP) “identifies [an] organization's exposure to internal and external threats and synthesizes hard and soft assets to provide effective prevention and recovery for the organization, whilst maintaining competitive advantage and value system integrity”. It is also called business continuity and resiliency planning (BCRP). A business continuity plan is a roadmap for continuing operations under adverse conditions (i.e. interruption from natural or man-made hazards). BCP is an ongoing state or methodology governing how business is conducted. In the US, governmental entities refer to the process as continuity of operations planning (COOP).

BCP involves working out how to continue operations under adverse conditions. These include local events such as building fires, theft, and vandalism; regional incidents such as earthquakes and floods; and national incidents such as pandemic illnesses. In fact, any event that could impact operations should be considered, including supply chain interruption and loss of or damage to critical infrastructure (major machinery or computing/network resources). As such, risk management must be incorporated as part of BCP.

BCP may be part of an organizational learning effort that helps reduce operational risk. A backup plan that lets a business activity run uninterrupted is part of a business continuity plan. The BCP for an organization is implemented at the organizational level, on a large scale, whereas backup plans for individual units are implemented on a small scale. The organization's management team is accountable for the large-scale BCP, while each unit's management team is accountable for its own small-scale plan. This process may be integrated with improving security and corporate reputation risk management practices.

In December 2006, the British Standards Institution (BSI) released a new independent standard for BCP, BS 25999-1. Prior to the introduction of BS 25999, BCP professionals relied on the BSI information security standard BS 7799, which only peripherally addressed BCP to improve an organization's information security compliance. BS 25999 applies to organizations of all types, sizes, and missions, whether governmental or private, profit or non-profit, large or small, and regardless of industry sector.

In 2007, the BSI published the second part, BS 25999-2 "Specification for Business Continuity Management", that specifies requirements for implementing, operating and improving a documented business continuity management system (BCMS).

In 2004, the United Kingdom enacted the Civil Contingencies Act 2004, a statute that instructs all emergency services and local authorities to actively prepare and plan for emergencies. Local authorities also have a legal obligation under this act to actively lead the promotion of business continuity practices in their respective geographical areas.

– Identification of top risks and mitigating strategies.
– Considerations for resource reallocation e.g. skills matrix for larger organizations.

Analysis

The analysis phase in the development of a BCP manual consists of an impact analysis, a threat analysis, and impact scenarios, with the resulting BCP plan requirement documentation.

Impact analysis (business impact analysis, BIA)

An impact analysis results in the differentiation between critical (urgent) and non-critical (non-urgent) organization functions and activities. A function may be considered critical if the implications for stakeholders of the resulting damage to the organization are regarded as unacceptable. Perceptions of the acceptability of disruption may be modified by the cost of establishing and maintaining appropriate business or technical recovery solutions. A function may also be considered critical if dictated by law. For each critical (in scope) function, two values are then assigned:
  • Recovery Point Objective (RPO) – the acceptable latency of data that will be recovered
  • Recovery Time Objective (RTO) – the acceptable amount of time to restore the function


The Recovery Point Objective must ensure that the maximum tolerable data loss for each activity is not exceeded.
The Recovery Time Objective must ensure that the Maximum Tolerable Period of Disruption (MTPD) for each activity is not exceeded.
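
The two constraints above can be checked mechanically once each critical function has been assigned its values. The following is a minimal sketch, not a prescribed method: the class name, the field names (including `max_data_loss`), and the sample figures are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class CriticalFunction:
    """One critical (in scope) function from the impact analysis."""
    name: str
    rpo: timedelta            # Recovery Point Objective
    rto: timedelta            # Recovery Time Objective
    max_data_loss: timedelta  # maximum tolerable data loss (hypothetical field name)
    mtpd: timedelta           # Maximum Tolerable Period of Disruption


def objective_violations(fn: CriticalFunction) -> list[str]:
    """Return one message for each objective that exceeds its tolerance."""
    problems = []
    if fn.rpo > fn.max_data_loss:
        problems.append(f"{fn.name}: RPO exceeds the maximum tolerable data loss")
    if fn.rto > fn.mtpd:
        problems.append(f"{fn.name}: RTO exceeds the MTPD")
    return problems


# Illustrative values only; they are not taken from any standard.
payroll = CriticalFunction(
    name="payroll",
    rpo=timedelta(hours=4),
    rto=timedelta(hours=24),
    max_data_loss=timedelta(hours=8),
    mtpd=timedelta(hours=12),
)
for message in objective_violations(payroll):
    print(message)
```

Here the RPO is within tolerance, but the RTO of 24 hours is longer than the 12-hour MTPD, so the function would need either a faster recovery solution or a revised objective.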

Next, the impact analysis results in the recovery requirements for each critical function. Recovery requirements consist of the following information:
  • The business requirements for recovery of the critical function, and/or
  • The technical requirements for recovery of the critical function

Threat analysis

After defining recovery requirements, documenting potential threats is recommended to detail a specific disaster’s unique recovery steps. Some common threats include the following:
  • Disease
  • Earthquake
  • Fire
  • Flood
  • Cyber attack
  • Sabotage (insider or external threat)
  • Hurricane or other major storm
  • Utility outage
  • Terrorism
  • Theft (insider or external threat, vital information or material)
  • Random failure of mission-critical systems


All threats in the examples above share a common impact, the potential for damage to organizational infrastructure, with one exception: disease.
The impact of disease can be regarded as purely human, and may be alleviated with technical and business solutions. However, if the people behind these recovery plans are also affected by the disease, then the process can break down.
During the 2002–2003 SARS outbreak, some organizations grouped staff into separate teams and rotated the teams between the primary and secondary work sites, with a rotation frequency equal to the incubation period of the disease.
The organizations also banned face-to-face contact between members of different teams during business and non-business hours. With such a split, organizations increased their resiliency against the threat of government-ordered quarantine measures if one person in a team contracted or was exposed to the disease.
Damage from flooding also has a unique characteristic. If an office environment is flooded with non-salinated and contamination-free water (e.g., in the event of a pipe burst), equipment can be thoroughly dried and may still be functional.

Definition of impact scenarios

After defining potential threats, documenting the impact scenarios that form the basis of the business recovery plan is recommended. In general, planning for the most wide-reaching disaster or disturbance is preferable to planning for a smaller scale problem, as almost all smaller scale problems are partial elements of larger disasters. A typical impact scenario like 'building loss' will most likely encompass all critical business functions, and the worst potential outcome from any potential threat. A business continuity plan may also document additional impact scenarios if an organization has more than one building. Other more specific impact scenarios – for example a scenario for the temporary or permanent loss of a specific floor in a building – may also be documented. Organizations sometimes underestimate the space necessary to make a move from one venue to another. It is imperative that organizations consider this in the planning phase so they do not have a problem when making the move.

Recovery requirement documentation

After the completion of the analysis phase, the business and technical plan requirements are documented in order to commence the implementation phase. A good asset management program can be of great assistance here and allow for quick identification of available and re-allocatable resources. For an office-based, IT intensive business, the plan requirements may cover the following elements which may be classed as ICE (In Case of Emergency) Data:
  • The numbers and types of desks, whether dedicated or shared, required outside of the primary business location in the secondary location
  • The individuals involved in the recovery effort along with their contact and technical details
  • The applications and application data required from the secondary location desks for critical business functions
  • The manual workaround solutions
  • The maximum outage allowed for the applications
  • The peripheral requirements like printers, copier, fax machine, calculators, paper, pens etc.
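
One way to keep the elements listed above consistent and easy to query is to record them in a structured form. The sketch below is only an illustration: every class, field, and sample value is hypothetical and not part of any BCP standard.

```python
from dataclasses import dataclass, field
from datetime import timedelta


@dataclass
class RecoveryContact:
    """A person involved in the recovery effort."""
    name: str
    phone: str
    role: str


@dataclass
class ApplicationRequirement:
    """An application needed at the secondary location."""
    name: str
    max_outage: timedelta        # maximum outage allowed
    manual_workaround: str = ""  # empty if no workaround is documented


@dataclass
class PlanRequirements:
    """Hypothetical container for the ICE data of one business unit."""
    dedicated_desks: int
    shared_desks: int
    contacts: list[RecoveryContact] = field(default_factory=list)
    applications: list[ApplicationRequirement] = field(default_factory=list)
    peripherals: list[str] = field(default_factory=list)

    def total_desks(self) -> int:
        """Desks required at the secondary location."""
        return self.dedicated_desks + self.shared_desks

    def applications_without_workaround(self) -> list[str]:
        """Names of applications with no documented manual workaround."""
        return [a.name for a in self.applications if not a.manual_workaround]


# Illustrative values only.
finance = PlanRequirements(
    dedicated_desks=4,
    shared_desks=6,
    contacts=[RecoveryContact("A. Example", "555-0100", "recovery lead")],
    applications=[
        ApplicationRequirement("ledger", timedelta(hours=8), "paper journal"),
        ApplicationRequirement("email", timedelta(hours=4)),
    ],
    peripherals=["printer", "copier"],
)
print(finance.total_desks(), finance.applications_without_workaround())
```

A record like this makes gaps visible early, for example applications that have a maximum outage but no manual workaround.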

Other business environments, such as production, distribution, warehousing etc. will need to cover these elements, but are likely to have additional issues to manage following a disruptive event.

Solution design

The goal of the solution design phase is to identify the most cost-effective disaster recovery solution that meets two main requirements from the impact analysis stage. For IT applications, this is commonly expressed as:
  1. The minimum application and application data requirements
  2. The time frame in which the minimum application and application data must be available
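
Selecting the most cost-effective solution against these two requirements can be pictured as a filter followed by a minimum. The sketch below assumes each candidate is described by the applications it can host, the time it needs to make them available, and an annual cost; these attributes, the function name, and the sample figures are illustrative assumptions rather than an established method.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class CandidateSolution:
    """A hypothetical disaster recovery option under evaluation."""
    name: str
    applications: frozenset[str]  # applications (with data) it can provide
    time_to_available: timedelta  # how long until they are usable
    annual_cost: float


def cheapest_adequate(
    candidates: list[CandidateSolution],
    required_apps: frozenset[str],
    time_frame: timedelta,
) -> Optional[CandidateSolution]:
    """Return the lowest-cost candidate meeting both requirements, or None."""
    adequate = [
        c for c in candidates
        if required_apps <= c.applications and c.time_to_available <= time_frame
    ]
    return min(adequate, key=lambda c: c.annual_cost, default=None)


# Illustrative figures only.
apps = frozenset({"email", "payroll"})
candidates = [
    CandidateSolution("cold site", apps, timedelta(hours=72), 10_000.0),
    CandidateSolution("warm site", apps, timedelta(hours=12), 40_000.0),
    CandidateSolution("hot site", apps, timedelta(hours=1), 120_000.0),
]
choice = cheapest_adequate(candidates, apps, timedelta(hours=24))
print(choice.name if choice else "no adequate solution")
```

With a 24-hour time frame the cold site is too slow and the hot site is more expensive than necessary, so the warm site is selected. If no candidate meets both requirements the function returns `None`, which signals that the requirements or the candidate list need revisiting.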

Disaster recovery plans may also be required outside the IT applications domain, for example in the preservation of information in hard copy format, the management of the loss of skilled staff, or the restoration of embedded technology in process plant.
This BCP phase overlaps with disaster recovery planning methodology. The solution phase determines:
  • the crisis management command structure
  • the location of a secondary work site (where necessary)
  • telecommunication architecture between primary and secondary work sites
  • data replication methodology between primary and secondary work sites
  • the application and software required at the secondary work site, and
  • the type of physical data requirements at the secondary work site.

Implementation

The implementation phase, quite simply, is the execution of the design elements identified in the solution design phase. Work package testing may take place during the implementation of the solution; however, work package testing does not take the place of organizational testing.

Testing and organizational acceptance

The purpose of testing is to achieve organizational acceptance that the business continuity solution satisfies the organization's recovery requirements. Plans may fail to meet expectations due to insufficient or inaccurate recovery requirements, solution design flaws, or solution implementation errors. Testing may include:
  • Crisis command team call-out testing
  • Technical swing test from primary to secondary work locations
  • Technical swing test from secondary to primary work locations
  • Application test
  • Business process test


At minimum, testing is generally conducted on a biannual or annual schedule. Problems identified in the initial testing phase may be rolled up into the maintenance phase and retested during the next test cycle.

In the 2008 book Exercising for Excellence, published by the British Standards Institution, the authors, Crisis Solutions, identified three types of exercise that can be employed when testing business continuity plans.

Simple exercises
A simple exercise is often called a ‘desktop’ or ‘workshop’. It typically involves a small number of people, perhaps 5–20, and concentrates on a specific aspect of a business continuity plan or a specific subject area (for example, Human Resources, Information Technology or Media). However, a Simple exercise can easily accommodate complete teams from various areas of a business. The numbers, and with them the logistics, may increase, but the objectives will remain the same.
Alternatively it could involve a single representative from several teams rather than needing the whole team to attend. It will seldom involve the provision of a Virtual World environment or the need for other than everyday resources. Typically, participants will be given a simple scenario and then be invited to discuss specific aspects of a company’s BCP.
For example, a fire is discovered out of working hours – what are the current call out procedures – how is the incident management team activated – where does it meet – do the current documented procedures cover all eventualities?
It will probably last no more than three hours and is often split into two or three sessions, each concentrating on a different theme. In this case either two or three different scenarios can be used or one scenario can be progressively developed to introduce themes that need to be addressed. Real time pressure is not usually an element of Simple exercises.
Questions will need to be crafted ahead of time so that facilitators ensure discussions are productive and germane to the objectives of the event.

Medium exercises
A medium exercise will invariably be conducted within a Virtual World and will usually bring together several departments, teams or disciplines. It will typically concentrate on more than one aspect of the BCP prompting interaction between teams.
The scope of a medium exercise can range from a small number of teams from one organisation being co-located in one building to multiple teams operating from dispersed locations. Attempts should be made to create as realistic an environment as practicable and the numbers of participants should reflect a realistic situation. Depending on the degree of realism required it may be necessary to produce simulated news broadcasts, together with simulated websites.
A medium exercise will normally last between two and three hours, though it can take place over several days.
It typically involves a Scenario Cell that feeds in pre-scripted injects throughout the exercise to give information and prompt actions.

Complex exercises
A Complex exercise is perhaps the hardest to define as it aims to have as few boundaries as possible. It will probably incorporate all the aspects of a medium exercise and many more. Elements of the exercise will inevitably have to remain within a virtual world, but every attempt should be made to achieve realism. This might include a no-notice activation, actual evacuation and actual invocation of a disaster recovery site.
While a start and cut off time will have to be agreed, the actual duration of the exercise might be unknown if events are allowed to run their course in real time. If it takes two hours to get to the DR site instead of the expected forty-five minutes, the exercise must be flexible enough to cater for this. If a key player is unavailable a deputy must be prepared to step in.

Definitions
These definitions provide broad guidance as to the types of available exercise but it should be recognised that there can be considerable ‘blurring of the edges’. It is possible to conduct a Simple exercise at a Recovery Site thereby adding a different dimension but this would not necessarily make it a Medium exercise. Regardless of the category, the importance of an exercise is that it achieves its defined objectives.

Maintenance

Maintenance of a BCP manual is broken down into three periodic activities.
The first activity is the confirmation of information in the manual, its roll-out to all staff for awareness, and specific training for individuals whose roles are identified as critical in response and recovery.
The second activity is the testing and verification of technical solutions established for recovery operations.
The third activity is the testing and verification of documented organization recovery procedures. A biannual or annual maintenance cycle is typical.

Information update and testing

All organizations change over time, so a BCP manual must change to stay relevant to the organization. Once data accuracy is verified, a call tree test is normally conducted to evaluate the notification plan's efficiency as well as the accuracy of the contact data. Some types of changes that should be identified and updated in the manual include:
  • Staffing changes
  • Staffing personal
  • Changes to important clients and their contact details
  • Changes to important vendors/suppliers and their contact details
  • Departmental changes like new, closed or fundamentally changed departments.
  • Changes in company investment portfolio and mission statement
  • Changes in upstream/downstream supplier routes
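
The call tree test mentioned above can be rehearsed on paper before anyone picks up a phone. The sketch below assumes the call tree is recorded as a mapping from each caller to the people they are responsible for calling, and that a person who is not reached cannot pass the message on. The function name, the data layout, and the sample roles are hypothetical, and a real plan would normally add deputies or escalation paths that this sketch does not model.

```python
from collections import deque


def run_call_tree_test(call_tree, root, answered):
    """Simulate a call-out starting at root.

    call_tree maps each caller to the list of people they call.
    answered is the set of people whose contact details worked.
    Returns (reached, not_reached) as two sets.
    """
    reached = set()
    if root in answered:
        reached.add(root)
        queue = deque([root])
        while queue:
            caller = queue.popleft()
            for callee in call_tree.get(caller, []):
                # Only people who are reached can continue the chain.
                if callee in answered and callee not in reached:
                    reached.add(callee)
                    queue.append(callee)
    everyone = {root} | set(call_tree)
    for callees in call_tree.values():
        everyone.update(callees)
    return reached, everyone - reached


# Illustrative call tree; the roles are made up.
call_tree = {
    "crisis lead": ["it manager", "hr manager"],
    "it manager": ["dba", "network engineer"],
    "hr manager": ["payroll clerk"],
}
answered = {"crisis lead", "it manager", "dba", "network engineer", "payroll clerk"}
reached, not_reached = run_call_tree_test(call_tree, "crisis lead", answered)
print(sorted(not_reached))
```

In this example one stale contact (the HR manager) also cuts off the payroll clerk, whose own details are correct. This is the kind of weakness in the notification plan that a call tree test is meant to expose.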

Testing and verification of technical solutions

As a part of ongoing maintenance, any specialized technical deployments must be checked for functionality. Some checks include:
  • Virus definition distribution
  • Application security and service patch distribution
  • Hardware operability check
  • Application operability check
  • Data verification
  • Data application

Testing and verification of organization recovery procedures

As work processes change over time, the previously documented organizational recovery procedures may no longer be suitable. Some checks include:
  • Are all work processes for critical functions documented?
  • Have the systems used in the execution of critical functions changed?
  • Are the documented work checklists meaningful and accurate for staff?
  • Do the documented work process recovery tasks and supporting disaster recovery infrastructure allow staff to recover within the predetermined recovery time objective?
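
The last check above lends itself to a simple comparison once recovery times have been measured during an exercise. This is a minimal sketch under the assumption that both the objectives and the measured durations are recorded per function; the function name and the sample figures are illustrative only.

```python
from datetime import timedelta


def rto_overruns(measured, rto):
    """Return, per function, how far the measured recovery exceeded its RTO.

    Functions that met their objective, or that were not exercised,
    are left out of the result.
    """
    return {
        name: measured[name] - limit
        for name, limit in rto.items()
        if name in measured and measured[name] > limit
    }


# Illustrative figures only.
rto = {
    "order entry": timedelta(hours=4),
    "payroll": timedelta(hours=24),
}
measured = {
    "order entry": timedelta(hours=5, minutes=30),
    "payroll": timedelta(hours=20),
}
print(rto_overruns(measured, rto))
```

Any function that appears in the result is a candidate for revised procedures, additional infrastructure, or a reassessed objective.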

Treatment of test failures

As suggested by the diagram included in this article, there is a direct relationship between the test and maintenance phases and the impact phase. When establishing a BCP manual and recovery infrastructure from scratch, issues found during the testing phase often must be reintroduced to the analysis phase.

See also

  • Business Continuity Institute
  • Catastrophe modeling
  • Disaster recovery
  • Disaster
  • Emergency management
  • Natural hazards
  • Man-made hazards
  • Space accidents and incidents
  • Risk management
  • Disaster recovery and business continuity auditing
  • Systems engineering
  • Systems engineering process
  • System lifecycle
  • Systems thinking
  • Resilience (organizational)


International Organization for Standardization

  • ISO/IEC 27001:2005 (formerly BS 7799-2:2002) Information Security Management System
  • ISO/IEC 27002:2005 (renumbered from ISO/IEC 17799:2005) Information Security Management – Code of Practice
  • ISO/PAS 22399:2007 Guideline for incident preparedness and operational continuity management
  • ISO/IEC 24762:2008 Guidelines for information and communications technology disaster recovery services
  • IWA 5:2006 Emergency Preparedness

British Standards Institution

  • BS 25999-1:2006 Business Continuity Management Part 1: Code of practice
  • BS 25999-2:2007 Business Continuity Management Part 2: Specification
  • BS 25777:2008 Information and communications technology continuity management – Code of practice

Others

  • "A Guide to Business Continuity Planning" by James C. Barnes
  • "Business Continuity Planning", A Step-by-Step Guide with Planning Forms on CDROM by Kenneth L Fulmer
  • "Business Continuity Plan Design, 8 Steps for Getting Started Designing a Plan" By Richard Kepenach
  • "Disaster Survival Planning: A Practical Guide for Businesses" by Judy Bell
  • ICE Data Management (In Case of Emergency) made simple – by MyriadOptima.com
  • Harney, J. (2004). Business continuity and disaster recovery: Back up or shut down. AIIM E-Doc Magazine, 18(4), 42–48.
  • Dimattia, S. (November 15, 2001). Planning for Continuity. Library Journal, 32–34.
  • Exercising for Excellence (Delivering successful business continuity management exercises) by Crisis Solutions

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 