What is meant by Reliability Specification?
Meaning: A technical document that sets out the requirements for a system's reliability.
Reliability can be measured, so non-functional reliability requirements may be specified quantitatively.
Software reliability requirements may also be included to cope with hardware failure or operator error.
Name each part of the specification process (4)
Risk Identification:
Identify the types of system failures that may lead to economic losses.
Risk Analysis:
Estimate the costs and consequences of the different types of software failure.
Risk Decomposition:
Identify the root causes of system failure.
Risk Reduction:
Generate reliability specifications, including quantitative requirements defining the acceptable levels of failure.
Recap (Name the types of System Failures)
Loss of Service:
The system is unavailable and cannot deliver its services to users.
Incorrect service delivery:
The system does not deliver a service correctly to users.
System/data corruption:
Damage to the system or its data, usually occurring in conjunction with other types of failure.
What is meant by a reliability metric?
Meaning: A unit of measurement for system reliability, typically expressed as the probability that a system failure will occur when the system is in use in a particular setting.
Note: System reliability is measured by counting the number of operational failures and, where appropriate, relating these to the demands made on the system and the time that the system has been operational.
Key Note: A long-term measurement programme is required to assess the reliability of critical systems.
Name and Describe each Reliability Metric?
Probability of failure on demand (POFOD):
- Meaning: The probability that the system will fail when a request for service is made.
- Used when demands for service are intermittent and relatively infrequent.
- Appropriate for protection systems where services are demanded occasionally and where there are serious consequences in case of failure.
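Example: a minimal sketch of how POFOD could be estimated from observed counts, assuming failures and demands are simply tallied during operation. The function name `pofod` and the sample counts are illustrative and not part of the notes.

```python
def pofod(failed_demands: int, total_demands: int) -> float:
    """Estimate probability of failure on demand from observed counts."""
    if total_demands <= 0:
        raise ValueError("total_demands must be positive")
    if not 0 <= failed_demands <= total_demands:
        raise ValueError("failed_demands must be between 0 and total_demands")
    return failed_demands / total_demands


# Hypothetical observation: a protection system failed on 1 of 1000 demands.
estimate = pofod(1, 1000)
print(estimate)
```

A lower value is better: the estimate is compared against the POFOD target written into the specification.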
Rate of occurrence of failures (ROCOF):
- Meaning: Reflects the rate of occurrence of failure in the system. The reciprocal of ROCOF is the mean time to failure (MTTF).
* ROCOF of 0.002 means 2 failures are likely in each 1000 operational time units, e.g. 2 failures per 1000 hours of operation.
- Relevant for systems that process a large number of similar requests in a defined time period.
* E.g., Credit card processing system, Supermarket checkout system.
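Example: a small sketch of the ROCOF arithmetic, assuming failures are counted over a measured period of operation. It also shows MTTF taken as the reciprocal of ROCOF. The function names are illustrative.

```python
def rocof(failures: int, operational_time: float) -> float:
    """Observed failures per unit of operational time."""
    if operational_time <= 0:
        raise ValueError("operational_time must be positive")
    if failures < 0:
        raise ValueError("failures cannot be negative")
    return failures / operational_time


def mttf_from_rocof(rate: float) -> float:
    """MTTF as the reciprocal of ROCOF, in the same time units."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return 1.0 / rate


# The figures used in the notes: 2 failures in 1000 hours of operation.
rate = rocof(2, 1000.0)
print(rate, mttf_from_rocof(rate))
```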
Mean Time to Failure (MTTF):
- Meaning: Measures the average length of time a system can be expected to run without failure.
- Relevant for systems with long transactions, i.e. where system processing takes a long time (e.g. a conveyor belt).
- MTTF should be longer than the expected transaction length, so that the system does not normally fail during a session or transaction.
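Example: a rough sketch of checking MTTF against the expected transaction length, assuming MTTF is estimated as the mean of observed failure-free running periods. The sample periods and the 6-hour transaction length are made-up values for illustration.

```python
from statistics import mean


def mttf(failure_free_periods):
    """Average failure-free running time from observed periods."""
    if not failure_free_periods:
        raise ValueError("need at least one observed period")
    return mean(failure_free_periods)


def covers_transaction(mttf_value, transaction_length):
    """True if MTTF is longer than the expected transaction length."""
    return mttf_value > transaction_length


# Hypothetical failure-free running periods, in hours.
observed_hours = [120.0, 95.0, 145.0]
system_mttf = mttf(observed_hours)
print(system_mttf, covers_transaction(system_mttf, transaction_length=6.0))
```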
Availability:
- Meaning: Measure of the fraction of the time that the system is available for use.
- Takes repair and restart time into account.
- Availability of 0.998 means software is available for 998 out of 1000 time units.
- Relevant for non-stop, continuously running systems:
* telephone switching systems, railway signalling systems, e-commerce systems, etc.
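Example: a minimal sketch of the availability arithmetic, assuming uptime and downtime (including repair and restart time) are recorded in the same time units. The helper `allowed_downtime` is an illustrative addition that turns a target availability into a downtime budget.

```python
def availability(uptime: float, downtime: float) -> float:
    """Fraction of total time the system was available for use."""
    total = uptime + downtime
    if uptime < 0 or downtime < 0 or total <= 0:
        raise ValueError("times must be non-negative and not both zero")
    return uptime / total


def allowed_downtime(target_availability: float, period: float) -> float:
    """Downtime budget implied by a target availability over a period."""
    if not 0.0 <= target_availability <= 1.0:
        raise ValueError("target_availability must be between 0 and 1")
    return (1.0 - target_availability) * period


# The figures used in the notes: available for 998 out of 1000 time units.
print(availability(uptime=998, downtime=2))
print(allowed_downtime(0.998, period=1000))
```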
What are the failure consequences?
When specifying reliability, it is not just the number of system failures that matters but also the consequences of these failures.
Failures that have serious consequences are clearly more damaging than those where repair and recovery are straightforward.
In some cases, therefore, different reliability specifications for different types of failure may be defined.
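Example: one possible way to record a separate reliability requirement per failure type, sketched under the assumption that each requirement pairs a failure class with one metric and a target. The class name, failure classes and target values are hypothetical.

```python
from dataclasses import dataclass

VALID_METRICS = ("POFOD", "ROCOF", "AVAIL")


@dataclass(frozen=True)
class ReliabilityRequirement:
    failure_class: str  # e.g. "loss of service"
    metric: str         # one of VALID_METRICS
    target: float       # required value for that metric

    def __post_init__(self):
        if self.metric not in VALID_METRICS:
            raise ValueError(f"unknown metric: {self.metric}")

    def is_met(self, measured: float) -> bool:
        # Availability should be at least the target;
        # POFOD and ROCOF should be at most the target.
        if self.metric == "AVAIL":
            return measured >= self.target
        return measured <= self.target


# Hypothetical targets: stricter for the more serious failure class.
requirements = [
    ReliabilityRequirement("loss of service", "AVAIL", 0.998),
    ReliabilityRequirement("system/data corruption", "POFOD", 0.0001),
]
measured = {"loss of service": 0.999, "system/data corruption": 0.001}
for req in requirements:
    print(req.failure_class, req.is_met(measured[req.failure_class]))
```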
What is meant by ‘over-specification’ of reliability?
Meaning: A high level of reliability is specified, but it is not cost-effective to achieve it.
How do we avoid this?
- Specify reliability requirements for different types of failure. Minor failures may be acceptable.
- Specify requirements for different services separately. Critical services should have the highest reliability requirements.
- Decide whether or not high reliability is really required or if dependability goals can be achieved in some other way.
Describe the steps to reliability specification?
Note: Different metrics may be used for different requirements.
What is meant by Functional reliability requirements?
Meaning: Specifications of system and software functionality that help to avoid, detect and tolerate software faults.
Name each type of functional reliability requirement?
4 Types:
Checking requirements:
- Identify checks needed to ensure that incorrect data is detected before it leads to a failure.
Recovery requirements:
- Help the system recover after a failure has occurred.
Redundancy requirements:
- Specify redundant features of the system to be included.
Process requirements:
- Specify the development process to be used; these may also be included as reliability requirements.
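Example: a small sketch of what a checking requirement and a recovery requirement might look like in code, assuming the check is a range check on an input and the recovery action is to fall back to the last known good value. Both functions and the range limits are hypothetical.

```python
def checked_input(value: float, low: float, high: float) -> float:
    """Checking requirement: reject out-of-range data before it is used."""
    if not low <= value <= high:
        raise ValueError(f"input {value} outside allowed range [{low}, {high}]")
    return value


def read_with_recovery(raw_value: float, last_good: float,
                       low: float, high: float) -> float:
    """Recovery requirement: fall back to the last known good value."""
    try:
        return checked_input(raw_value, low, high)
    except ValueError:
        return last_good


# Hypothetical sensor reading that falls outside the allowed range 0 to 100.
print(read_with_recovery(250.0, last_good=20.0, low=0.0, high=100.0))
```

Here the incorrect value is detected by the check, so it never reaches later processing, and the system carries on with the previous valid value.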
Summary of Topic (Reliability Specification)
Reliability requirements can be defined quantitatively. The metrics used include probability of failure on demand (POFOD), rate of occurrence of failure (ROCOF) and availability (AVAIL).