Critical Systems Flashcards

(119 cards)

1
Q

What is a system?

A

A construct or collection of different elements that together produce results not obtainable by the elements alone.

2
Q

What is a critical system?

A

A system whose failure leads directly to an incident that has an associated loss of some kind.

3
Q

What are some essential properties of critical systems?

A

Safety
Availability
Reliability
Security
Resilience
[Integrity]
[Confidentiality]
-> not all attributes are relevant for a given system

4
Q

What are primary safety-critical systems?

A

Embedded software systems whose failure can cause the associated hardware to fail and directly threaten people.

5
Q

What are secondary safety-critical systems?

A

Systems whose failure results in faults in other (socio-technical) systems, which can then have safety consequences.

6
Q

What is an accident (or mishap)?

A

An unplanned event or sequence of events which results in human death or injury, or in damage to property or the environment.

7
Q

What is a hazard?

A

A condition with the potential for causing or contributing to an accident.

8
Q

What is damage?

A

A measure of the loss resulting from a mishap.

9
Q

What are some approaches for security assurance?

A

Vulnerability avoidance
Attack detection and elimination
Exposure limitation and recovery

10
Q

What is a computer-based system?

A

A socio-technical system.

11
Q

In the pathology of failures, when is a fault defined as active? Dormant?

A

Active: When it produces an error.
Dormant: When it does not produce an error.

12
Q

What is error propagation?

A

When an error successively transforms into other errors.
Chaining of errors.

13
Q

What is a service failure?

A

When an error propagates to the service interface and causes the service to deviate from what it should do. This can also chain.

14
Q

What is fault tolerance?

A

The ability to avoid service failures in the presence of faults.

15
Q

What is fault removal?

A

Reduce the number and severity of faults.

16
Q

What is fault forecasting?

A

Estimates the present number, future incidence and hence the likely consequences of faults.

17
Q

What are two definitions of a dependable system?

A

1: A system that has the ability to deliver a service that can justifiably be trusted.
2: A system that can avoid service failures that are more frequent or more severe than is acceptable.

18
Q

Give an example of a chain software failure.

A
  • error by programmer leads to a dormant fault in the written software
  • upon activation the fault becomes active, producing an error
  • once the error affects the delivered service, a failure occurs
19
Q

What are some verification approaches (system not exercised aka static verification)?

A
  1. System
    - static analysis
    - theorem proving
  2. behaviour model
    - model checking
20
Q

What are some verification approaches (system exercised aka dynamic verification)?

A
  1. Symbolic inputs
    - symbolic execution
  2. Actual inputs
    - testing
21
Q

Why isn’t testing enough?

A
  • Testing can show the presence of bugs, but never their absence (Dijkstra).
  • Developers should be able to provide evidence that their system satisfies given dependability goals.
  • Software certification may still rely primarily on testing.
22
Q

What do we mean by a system cannot be dependable without evidence?

A

Dependability is not merely the absence of defects or of the failures that result from them, but the presence of concrete evidence suggesting that such failures will not occur.

23
Q

What are the building blocks of a critical system?

A
  1. system boundaries, modularity
    - e.g. components and interactions
  2. critical properties and their level of confidence.
    - not associated with a single function; typically cutting across several.
24
Q

How do we define the dependability case?

A
  1. Auditable
    - allows a third party certifier to evaluate it
  2. Complete
    - the argument that the critical properties hold should contain no holes to be filled in by the certifier
  3. Sound
    - the claims, and the reasoning connecting them, must be correct
25
What do we mean by decoupling and simplicity in the dependability mindset?
Localise critical properties to individual components; this makes assurances easier to check locally.
26
What do we mean by formal verification?
- Rigorous techniques for the specification, development, and (manual or automated) verification of software and hardware systems.
- Logic-based (e.g. propositional logic), so that we can formulate what we want to check.
- Includes model checking.
27
What is model checking?
- Allows desired behavioural properties to be verified against a suitable model of the system.
- Completely automatic.
- Offers counterexamples when a property fails.
28
What is the underlying state based model - LTS?
- A set of states connected by transitions.
- Transitions are labelled with elements from an alphabet (actions).
- States denote a snapshot or configuration of the system.
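An LTS can be sketched in a few lines of Python: a dictionary maps (state, action) pairs to successor states, and replaying a trace of actions walks the transitions. The drink-machine states and actions below are invented purely for illustration.

```python
# A minimal labelled transition system (LTS):
# keys are (state, action) pairs, values are successor states.
transitions = {
    ("idle", "coin"): "paid",
    ("paid", "dispense"): "idle",
    ("paid", "refund"): "idle",
}

def run(trace, start="idle"):
    """Replay a trace of actions; return the final state, or None if a step is not allowed."""
    state = start
    for action in trace:
        state = transitions.get((state, action))
        if state is None:
            return None
    return state
```

For example, `run(["coin", "dispense"])` ends back in `"idle"`, while `run(["dispense"])` returns `None` because the machine cannot dispense before payment.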
29
What does dependability mean?
A system can be trusted to give the right service.
30
What is availability?
The system is up and running when needed.
31
What is reliability?
The system gives the correct service again and again.
32
What is safety?
The system does not harm people or the environment.
33
What is security?
The system can resist attacks or mistakes.
34
What is a fault?
A problem in the system (like bad code).
35
What is an error?
A wrong state inside the system.
36
What is a failure?
The system gives the wrong service to users.
37
How do we avoid failures (dependable software)?
Fault avoidance: stop mistakes early. Fault detection/removal: find and fix errors. Fault tolerance: keep working even if faults exist.
38
What is safety in systems?
Work without causing injury, death, or damage.
39
How do we achieve safety?
Hazard avoidance: design so hazards can’t happen. Hazard detection/removal: catch hazards before accidents. Damage limitation: reduce harm if accidents occur.
40
How do we assure security?
Avoid vulnerabilities Detect and stop attacks Limit damage and recover
41
What is the chain of software failure?
Programmer makes a mistake → a fault in the code. The fault gets activated → creates an error state. If the error affects the service → a failure.
42
What happened in the case studies (Denver, St Helena, Renfe)?
Denver: baggage system failed → huge delays. St Helena: airport built but planes couldn’t land safely. Renfe: trains too big for tunnels → wasted money. Common issue: poor planning + weak requirements.
43
What is a dependability case?
Auditable → can be checked by others. Complete → no missing parts. Sound → arguments are correct. Shows why a system can be trusted.
44
What is model checking? (railroad crossing example)
Build a model of system behaviour. Check safety rules (e.g., gates closed when train passes). Automatic → gives counterexamples if rules fail. Helps prove safety before real use.
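A toy version of the railroad-crossing check can be written as an exhaustive breadth-first search over the state space. The model below (train positions, gate behaviour, initial state) is an assumption for illustration, not the course's actual model; note that because this sketchy controller is never *forced* to close the gate, the checker finds a counterexample, which is exactly the kind of output model checking provides.

```python
from collections import deque

# Toy railroad-crossing model: a state is (train, gate),
# with train in {"far", "near", "in"} and gate in {"open", "closed"}.
def next_states(state):
    train, gate = state
    succ = [({"far": "near", "near": "in", "in": "far"}[train], gate)]  # train advances
    if train == "near":
        succ.append((train, "closed"))  # controller may close the gate
    if train == "far":
        succ.append((train, "open"))    # controller may reopen it
    return succ

def check_invariant(initial, successors, safe):
    """Breadth-first state-space search; returns a path to an unsafe state, or None."""
    seen = {initial}
    queue = deque([[initial]])
    while queue:
        path = queue.popleft()
        if not safe(path[-1]):
            return path  # counterexample trace
        for nxt in successors(path[-1]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# Safety rule: the gate must be closed whenever the train is in the crossing.
counterexample = check_invariant(
    ("far", "open"), next_states,
    lambda s: not (s[0] == "in" and s[1] == "open"))
```

Here the search returns the trace ("far","open") → ("near","open") → ("in","open"): the train can reach the crossing before the gate ever closes, so the model violates the safety rule.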
45
What is the environment problem in verification?
Systems run in real-world settings. Example: plane thrust only works on ground, not in air. Hard to model humans, traffic, or changing rules. Environment adds uncertainty.
46
What is the socio-technical systems (STS) stack?
Equipment → hardware/devices. OS → basic software. Middleware → connects systems. Applications → do tasks. Business processes → people + systems. Organisation → strategy. Society → laws, culture. All layers affect each other.
47
What are emergent properties?
Appear when parts work together. Examples: reliability, security, usability. Cannot be seen in single components. Only visible in whole system.
48
Why are socio-technical systems non-deterministic?
People act differently each time. Systems change often (software, hardware, data). Same input ≠ same output.
49
What is success vs failure in systems?
Success depends on viewpoint. Example: hospital system → managers happy (reports), doctors unhappy (less patient time). One group’s success = another’s failure.
50
What are normal failures?
Everyday glitches. Cause extra work for users. Not catastrophic, but waste time. Recovery cost = extra effort.
51
What is requirements engineering?
Requirements engineering is the systematic process of identifying, analysing, documenting, validating, and managing stakeholder needs so that the system built actually solves the right problem and meets stakeholder expectations.
52
Why is requirements engineering difficult?
Environments change fast. Stakeholders disagree. People unclear about needs. Politics influence decisions. Hard to get stable, clear requirements.
53
What was the SERUMS healthcare project?
EU project for secure medical data sharing. Patient-focused, GDPR compliant. Used blockchain + privacy-preserving AI. Aim: trustworthy healthcare systems.
54
What is the inevitability of failures in critical systems?
All systems fail at some point. Failures cannot be fully avoided. We must plan for them.
55
Why is modelling the environment hard?
Real world is complex. Traffic, people, weather, rules all change. Hard to capture everything in one model.
56
What are requirement conflicts?
Different groups want different things. One group’s success = another group’s failure. Conflicts never fully go away.
57
What are viewpoints in requirements engineering?
Ways to group needs by stakeholder type. Examples: end‑user, manager, admin, engineer.
58
What are concerns in requirements engineering?
Big issues that affect all. Examples: safety, privacy, cost, usability. Link goals → system needs.
59
What laws affect the MHCPM system?
Data Protection Act → keep info private. Mental Health Act → rules for patient detention.
60
Give one MHCPM safety requirement.
System must warn if patient allergic to medicine. Prescriber can override, but system records it.
61
What is the V‑Model in system engineering?
Step‑by‑step process. From requirements → design → build → test → integrate → validate.
62
Why do requirements change over time?
Tech changes. Organisations change. Markets, politics, laws change. So system needs change too.
63
What is stakeholder uncertainty?
Busy people give vague needs. Hard to get clear, detailed requirements.
64
Why does process variability matter?
Different systems need different detail. Example: railway signals need strict specs. Games need storyboards.
65
What is risk‑driven specification?
Start with risks and find ways to reduce them. Steps: 1. risk identification, 2. risk analysis, 3. risk reduction, 4. risk requirements.
66
What are the phases of risk analysis?
Preliminary → risks from environment. Life cycle → risks from design/build. Operational → risks from users/operators.
67
What is ALARP in risk assessment?
“As Low As Reasonably Practicable.” Keep risks small, within cost/time limits.
68
What is hazard identification?
Find dangers that may harm system. Types: physical, electrical, biological, service failure.
69
What is the main safety rule for insulin pumps?
Never give too much insulin. Overdose is life‑threatening.
70
What is a fault tree?
Diagram of causes of a hazard. Shows how failures combine. Goal: avoid single points of failure.
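A fault tree's AND/OR structure can be evaluated mechanically. The sketch below uses nested tuples for gates; the insulin-pump event names are invented for illustration and are not from the course material.

```python
# A fault tree as nested tuples: ("AND", child, ...), ("OR", child, ...),
# or ("basic", "event-name").
def occurs(gate, events):
    """Evaluate whether the top event occurs, given which basic events are true."""
    kind = gate[0]
    if kind == "basic":
        return events[gate[1]]
    results = [occurs(child, events) for child in gate[1:]]
    return all(results) if kind == "AND" else any(results)

# Hypothetical hazard: an overdose occurs if the dose computation is buggy,
# or if a sensor fault coincides with a missing hardware interlock.
overdose = ("OR",
            ("basic", "dose_computation_bug"),
            ("AND", ("basic", "sensor_fault"), ("basic", "no_hardware_interlock")))
```

A sensor fault alone does not trigger the top event (the AND gate needs the interlock to be absent too), whereas the dose-computation bug is a single point of failure on its own branch.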
71
What are dependability requirements?
Functional → check for errors, recover from attacks. Non‑functional → usability, reliability, availability. Excluding → avoid unsafe system states.
72
How do we check requirements?
Use methods like DISCOS to test if they’re complete and consistent.
73
What are the main risk reduction strategies?
Avoid the risk. Detect & remove faults. Limit damage if failure happens.
74
What is a Safety Decision Support System (SDSS)?
A tool that helps humans judge risks better. It’s itself safety‑critical.
75
What are the 5 levels of driving automation?
L0 → No automation. L1–2 → Driver assist (ADAS). L3 → Conditional automation (car monitors environment, driver still needed). L4 → High automation (car handles most tasks, driver steps in sometimes). L5 → Full automation (no driver needed at all).
76
What are the key reliability metrics?
POFOD → Probability of Failure on Demand; use for on‑demand safety systems (airbags, shutdown systems).
ROCOF → Rate of Occurrence of Failures; use for continuous systems where failure frequency matters.
MTTF → Mean Time To Failure; use when you need time‑to‑failure for hardware or long‑running components.
Availability → % of time the system is up and running; use when service uptime is the key requirement (web, ATC, hospitals).
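Each metric is a simple ratio over observed data, which can be captured in a few helper functions. The numbers in the usage examples are made up for illustration.

```python
def pofod(failures, demands):
    """Probability of Failure on Demand: fraction of demands that failed."""
    return failures / demands

def rocof(failures, operating_time):
    """Rate of Occurrence of Failures per unit of operating time."""
    return failures / operating_time

def mttf(times_to_failure):
    """Mean Time To Failure over a set of observed component lifetimes."""
    return sum(times_to_failure) / len(times_to_failure)

def availability(uptime, downtime):
    """Fraction of total time the system was up."""
    return uptime / (uptime + downtime)
```

For instance, 2 failures in 1000 demands gives `pofod(2, 1000)` = 0.002, and 999 hours up against 1 hour down gives `availability(999, 1)` = 0.999.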
77
How is security different from safety?
Safety → failures are accidental. Security → failures may be caused by attackers who know system weaknesses.
78
What steps are in security risk assessment?
1. Identifying assets, 2. Identifying threats, 3. Analysing vulnerabilities, 4. Evaluating and prioritising risks based on likelihood and impact.
79
What is redundancy vs diversity?
Redundancy → backup copies/components. Diversity → different ways to do the same job, so one bug doesn’t break all versions.
80
What is a Protection System?
A separate system that monitors and shuts down equipment if danger is detected. Example: reactor shutdown system.
81
What is Self‑Monitoring Architecture?
Multiple channels run the same computation. If outputs differ → assume a failure. Used in Airbus flight control.
82
What is Triple Modular Redundancy (TMR)?
Three identical components. They vote on the output. If one disagrees → it is ignored.
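The voting step of TMR can be sketched directly. This is a simplification: real voters must also handle timing and tolerance bands for analogue values.

```python
def tmr_vote(a, b, c):
    """Majority vote over three redundant channel outputs.

    A single disagreeing channel is outvoted; if all three disagree,
    there is no majority and the voter signals an error.
    """
    if a == b or a == c:
        return a
    if b == c:
        return b
    raise RuntimeError("no majority: all three channels disagree")
```

So `tmr_vote(42, 42, 7)` masks the faulty third channel and returns 42, while three distinct answers raise an error, since voting cannot decide between them.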
83
What is N‑Version Programming?
Several teams build different versions of the same software. A voting system picks the majority result.
84
What are the problems with software diversity?
Teams make similar mistakes. Specification errors affect all versions. Hard to ensure true independence.
85
What is Emergent Behaviour?
Unexpected behaviour that appears when components interact. Not visible when looking at components alone.
86
What are Timed Automata?
LTS + clock variables. Used to model real‑time systems like controllers.
87
What are Priced (or Cost) Timed Automata?
Timed automata with costs such as energy, memory, or usage counts.
88
What is a Discrete‑Time Markov Chain (DTMC)?
A model where transitions have probabilities.
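A DTMC can be represented as a dictionary from each state to its outgoing transition probabilities (which sum to 1), and a distribution over states can be pushed forward one step at a time. The two-state repairable-system model and its probabilities below are illustrative assumptions.

```python
# DTMC: state -> {successor: probability}; probabilities out of each state sum to 1.
chain = {
    "up":   {"up": 0.99, "down": 0.01},   # small chance of failing in each step
    "down": {"up": 0.90, "down": 0.10},   # usually repaired within one step
}

def step(dist, chain):
    """Push a probability distribution over states forward by one transition."""
    out = {}
    for state, p in dist.items():
        for nxt, q in chain[state].items():
            out[nxt] = out.get(nxt, 0.0) + p * q
    return out
```

Starting from certainty that the system is up, `step({"up": 1.0}, chain)` yields 0.99 probability of still being up and 0.01 of being down; iterating `step` approximates longer-run behaviour.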
89
What is a Markov Decision Process (MDP)?
Mix of choices (actions) and probabilities.
90
What is a Hybrid Automaton?
Automaton with continuous variables (e.g., physics).
91
What is SCADA?
Industrial systems that monitor and control infrastructure like water, power, gas.
92
What risks do LLMs introduce?
Hallucinations, indirect prompt injection, jailbreaks.
93
Why are hallucinations dangerous in healthcare?
LLMs may produce confident but false medical statements. This can lead to unsafe treatment.
94
What was the Kegworth Air Disaster?
A 1989 crash of a British Midland Boeing 737‑400 near Kegworth after the crew shut down the wrong engine.
95
What caused the initial problem (kegworth)?
A fan blade failure in the left engine caused vibration, smoke smell, and loss of power. The pilots misinterpreted vibration readings, cockpit cues, and smell location. They shut down the right engine, which was working.
96
What role did human factors play in kegworth?
New aircraft model. Crew trained on the older version. Instruments placed differently; the vibration gauge moved to a new location. Crew relied on habit, not the new layout.
97
How does this relate to the taxonomy of dependable systems (kegworth)?
Fault: fan blade failure. Error: crew believed the wrong engine was failing. Failure: shutting down the working engine → loss of thrust. Hazard: aircraft unable to maintain flight. Accident: crash short of the runway.
98
What lessons relate to N‑version or diversity (kegworth)?
The aircraft relied on pilot judgement rather than diverse independent systems. A second independent diagnostic system could have prevented the wrong engine shutdown.
99
What happened (denver)?
Automated baggage system failed → jams, lost bags, huge delays.
100
What caused it (denver)?
Over‑ambition, late requirement changes, poor integration, no full testing.
101
Who is to blame (denver)?
Shared — airport authority, contractors, airlines, weak project governance.
102
How could it have been prevented (denver)?
Freeze requirements, incremental rollout, proper integration testing, realistic schedule.
103
What exam notions does it illustrate?
Requirements engineering, socio‑technical systems, dependability, emergent failures.
104
What happened (ST HELENA)?
£285m airport built but planes couldn’t land safely due to wind shear.
105
What caused it (ST HELENA)?
Poor environmental modelling, missed hazard, political pressure.
106
Who is to blame (ST HELENA)?
Government planners, consultants, political stakeholders.
107
How could it have been prevented (ST HELENA)?
Full hazard analysis, independent safety review, prototype testing.
108
What exam notions does it illustrate (ST HELENA)?
Safety vs reliability, hazard identification, system boundaries, socio‑technical pressure.
109
What happened (ARIANE)?
The Ariane 5 rocket exploded 37 seconds after launch (1996) due to a software failure in the inertial reference system.
110
What caused it (ARIANE)?
- Reused Ariane 4 software without re‑validating assumptions.
- Integer overflow: a 64‑bit floating‑point → 16‑bit integer conversion failed.
- Unhandled exception → inertial system shut down.
- Backup system failed the same way (identical software).
- Rocket received nonsense attitude data, causing a violent course correction → breakup.
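The hazard at the heart of the failure can be sketched in a few lines: a 64-bit float squeezed into a signed 16-bit integer. The velocity values below are invented for illustration; on Ariane 5 the analogous out-of-range conversion raised an exception that no handler caught.

```python
# Range of a signed 16-bit integer.
INT16_MIN, INT16_MAX = -32768, 32767

def to_int16(value: float) -> int:
    """Convert a float to a 16-bit signed integer, raising if it does not fit."""
    result = int(value)
    if not INT16_MIN <= result <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in 16 bits")
    return result
```

A value that was always modest under the Ariane 4 flight profile converts fine (`to_int16(1200.0)` returns 1200), while a larger Ariane 5-style value such as `to_int16(40000.0)` raises `OverflowError`; with no handler, that exception shuts the computation down, as it shut down both inertial reference systems.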
111
Who is to blame (ARIANE)?
- Design team, for reusing software without checking environmental assumptions.
- Management, for not requiring full re‑validation.
- A process failure, not an individual failure:
  - no exception handling,
  - no independent diversity in the backup system,
  - inadequate testing for Ariane 5’s different flight profile.
112
How could it have been prevented (ARIANE)?
- Validate reused software against the new system environment.
- Add exception handling for overflow conditions.
- Use design diversity in backup systems.
- Perform full system‑level testing under Ariane 5 flight conditions.
- Apply risk‑driven specification to identify critical assumptions.
113
What exam notions does it illustrate (ARIANE)?
- Dependability (failure due to an unhandled fault).
- Fault → Error → Failure chain (classic example).
- Fault prevention (bad assumptions).
- Fault tolerance (identical backup system → no diversity).
- Safety‑critical software engineering.
- Requirements engineering (environment assumptions not captured).
- Reuse of components without re‑verification.
- Hazard analysis (integer overflow as an intolerable risk).
- System boundaries (software assumed Ariane 4 flight dynamics).
114
How could it have been prevented (KEGWORTH)?
- Better human‑centred design of engine indicators.
- Clearer warning systems for engine failure.
- Updated training for the new engine behaviour.
- Stronger cockpit–cabin communication protocols.
- Use of formal methods to check assumptions about human interaction and system cues.
115
What does DISCOS stand for?
Distributed, Interacting, Complex, Organisational Systems — a socio‑technical method for analysing accidents by looking at the whole system, not just individuals.
116
What is the main purpose of the DISCOS method?
To understand how organisational structures, communication, tools, and people interact to create conditions for failure, avoiding “blame the operator”.
117
What kinds of failures does DISCOS help reveal?
Latent organisational failures such as poor training, outdated procedures, mismatched assumptions, unclear responsibilities, and weak communication channels.
118
What types of systems is DISCOS most useful for analysing?
Large‑scale socio‑technical systems: aviation, healthcare, rail, defence, banking, and complex IT deployments — anywhere many people + tech interact.
119
How does the Kegworth air accident illustrate the DISCOS method?
Kegworth shows that the crash wasn’t just “pilot error”:
- training was based on older aircraft models,
- cockpit indicators were ambiguous,
- organisational communication didn’t highlight differences in the new engine behaviour,
- procedures didn’t match real‑world cues.
DISCOS reveals how distributed organisational decisions created the conditions in which the pilots’ mistake became likely.