What is data classification and how is it used by systems?
Data classification labels data by sensitivity so systems can apply different controls.
- Systems tag assets and datasets (public, internal, confidential, regulated).
- Policies use labels to decide access, encryption requirements, logging, and retention.
- Classification only matters if enforcement points read the label and apply rules.
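A minimal sketch of a label-driven policy lookup, assuming a hypothetical `POLICY` table and `Controls` record (neither comes from a real product). It illustrates the last point above: the label only has an effect where an enforcement point reads it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Controls:
    encrypt_at_rest: bool
    log_reads: bool


# Hypothetical label-to-controls table; the values are illustrative only.
POLICY = {
    "public": Controls(encrypt_at_rest=False, log_reads=False),
    "internal": Controls(encrypt_at_rest=True, log_reads=False),
    "confidential": Controls(encrypt_at_rest=True, log_reads=True),
    "regulated": Controls(encrypt_at_rest=True, log_reads=True),
}

# Assumption: data with a missing or unknown label gets the strictest controls.
STRICTEST = Controls(encrypt_at_rest=True, log_reads=True)


def controls_for(label):
    """Return the controls an enforcement point should apply for a label."""
    return POLICY.get(label, STRICTEST)
```

Falling back to the strictest controls is one possible design choice; a system could also reject unlabeled data outright.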
How does asset identification work for data security?
Asset identification is mapping where sensitive data lives and how it flows.
- Systems inventory stores (databases, object stores, logs, analytics) and the datasets inside them.
- Systems map data producers/consumers and transfer paths (APIs, ETL, exports).
- Controls are applied per asset, so an asset missing from the inventory typically has no controls applied.
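A small sketch of the inventory idea, assuming a hypothetical in-memory `INVENTORY` and `FLOWS` list (real inventories usually come from discovery tooling). It flags the two gaps described above: stores without a label and transfer paths that lead to a store nobody inventoried.

```python
# Hypothetical inventory: store name -> metadata. Names are illustrative.
INVENTORY = {
    "orders-db": {"kind": "database", "label": "confidential"},
    "app-logs": {"kind": "logs", "label": None},
}

# Hypothetical data flows as (source, destination) pairs.
FLOWS = [
    ("orders-db", "app-logs"),
    ("orders-db", "analytics-export"),
]


def unlabeled_assets(inventory):
    """Stores that are known but carry no classification label."""
    return sorted(name for name, meta in inventory.items() if not meta.get("label"))


def unknown_destinations(inventory, flows):
    """Destinations that receive data but do not appear in the inventory."""
    return sorted({dst for _, dst in flows if dst not in inventory})
```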
What breaks classification and asset identification in practice?
Cause → system behavior → security impact.
- Cause: data stores or pipelines are not inventoried or labeled.
- Behavior: policies do not apply because enforcement cannot target unknown or unlabeled assets.
- Impact: sensitive data is stored or exported without required controls.
What are data access control decisions (who/what can read)?
Access control decisions are checks that determine whether a principal can read or modify data.
- Input: principal identity (user/service), requested action, target dataset/object, context.
- Decision: allow or deny based on policy rules.
- Enforcement: the storage or service must perform the check before returning data.
How do systems enforce data access control mechanically?
Step 1: Principal authenticates and receives an identity.
Step 2: Principal requests an action on a data resource.
Step 3: Enforcement evaluates policy rules (role/attribute/context) against the request.
Step 4: If allowed, data is returned; if denied, data is not returned.
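The steps above can be sketched as a role-based check with default deny. The `Request` type, `ALLOW_RULES`, and `DATA` are hypothetical names for illustration; authentication (Step 1) is assumed to have happened already.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    principal: str        # authenticated identity (Step 1 is assumed done)
    roles: frozenset      # roles attached to that identity
    action: str           # e.g. "read" or "write"
    resource: str         # target dataset


# Hypothetical allow rules as (role, action, resource) triples.
ALLOW_RULES = {
    ("analyst", "read", "sales_reports"),
    ("admin", "read", "sales_reports"),
    ("admin", "write", "sales_reports"),
}

DATA = {"sales_reports": "Q1 totals"}


class AccessDenied(Exception):
    pass


def is_allowed(req):
    """Step 3: default deny; allow only if some role matches a rule."""
    return any((role, req.action, req.resource) in ALLOW_RULES for role in req.roles)


def read(req):
    """Step 4: the check runs before any data is returned."""
    if req.action != "read" or not is_allowed(req):
        raise AccessDenied(f"{req.principal} may not {req.action} {req.resource}")
    return DATA[req.resource]
```

The key property is that `read` cannot return data on a path that skips `is_allowed`.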
What breaks data access control in practice?
Cause → system behavior → security impact.
- Cause: over-permissioned roles, shared credentials, or missing authorization checks in a data API.
- Behavior: data is returned to principals that should be denied.
- Impact: unauthorized disclosure or modification of sensitive data.
What is encryption in transit vs at rest (applied)?
They protect data in different states and against different attacker positions.
- In transit: protects data while moving between endpoints by encrypting network traffic, so eavesdroppers on the path cannot read it.
- At rest: protects stored data by encrypting it on disk/object storage (disk snapshots or stolen media cannot be read without keys).
| Neither control replaces access control; both reduce exposure when a specific layer is compromised.
What breaks encryption in transit in practice?
Cause → system behavior → security impact.
- Cause: TLS (Transport Layer Security) not used, weak validation, or trust misconfiguration.
- Behavior: attacker on the network can read or modify traffic because the channel is unencrypted or the peer's certificate is not properly validated.
- Impact: credential theft, data interception, and request tampering.
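As a hedged illustration of avoiding the causes above on the client side, the sketch below builds a TLS context with Python's standard `ssl` module. The function name `strict_client_context` is an assumption; the minimum version shown is one reasonable choice, not a requirement from the text.

```python
import ssl


def strict_client_context():
    """Client TLS context that validates the server certificate and hostname."""
    # create_default_context() enables certificate verification and
    # hostname checking by default; the sketch leaves both switched on.
    ctx = ssl.create_default_context()
    # Assumption: refuse protocol versions older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx
```

"Weak validation" in practice often means code that sets `check_hostname = False` or `verify_mode = ssl.CERT_NONE` on such a context.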
What breaks encryption at rest in practice?
Cause → system behavior → security impact.
- Cause: keys are accessible to the attacker in the same environment as the encrypted data.
- Behavior: attacker reads ciphertext and also obtains keys, so decryption succeeds.
- Impact: at-rest encryption fails to reduce disclosure in that compromise scenario.
What are key management patterns for data protection?
Key management patterns define how keys are generated, stored, used, and rotated.
- Separate data encryption keys from key encryption keys to limit blast radius.
- Restrict who/what can request decryption operations.
- Rotate keys so a compromised key cannot decrypt data protected after the rotation; data encrypted under the old key stays exposed until it is re-encrypted.
| The core mechanism is controlling which principals can cause decryption to happen.
How do systems enforce key usage controls mechanically?
Step 1: Principal requests an encrypt/decrypt operation using a key identifier.
Step 2: Key management service validates the principal’s authorization for that key and operation.
Step 3: If allowed, the service performs the cryptographic operation and returns result.
Step 4: All key usage is logged so misuse can be detected and investigated.
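The steps above can be sketched with a hypothetical in-process `KeyService`. To keep the sketch standard-library only, the cryptographic operation is an HMAC computation standing in for encrypt/decrypt; the point is the order authorize, operate, log, not the primitive.

```python
import hashlib
import hmac


class KeyService:
    """Hypothetical key service: callers name a key, they never see key bytes."""

    def __init__(self, keys, grants):
        self._keys = keys          # key_id -> key bytes
        self._grants = grants      # set of (principal, key_id, operation)
        self.audit_log = []        # Step 4: every attempt is recorded

    def use_key(self, principal, key_id, operation, payload):
        # Step 2: authorize this principal for this key and this operation.
        allowed = (principal, key_id, operation) in self._grants
        self.audit_log.append(
            (principal, key_id, operation, "allow" if allowed else "deny")
        )
        if not allowed:
            raise PermissionError(f"{principal} may not {operation} with {key_id}")
        # Step 3: perform the operation inside the service (HMAC as stand-in).
        return hmac.new(self._keys[key_id], payload, hashlib.sha256).digest()
```

Denied attempts are logged as well, since repeated denials are often the earliest sign of misuse.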
What is envelope encryption (applied) and what does it achieve?
Envelope encryption encrypts data with a Data Encryption Key (DEK) and protects that DEK with a Key Encryption Key (KEK).
- Data is encrypted with a per-object/per-record DEK.
- DEK is encrypted (“wrapped”) using a KEK controlled by a key management system.
- To decrypt, system must unwrap DEK via authorized access to the KEK.
| This limits blast radius (one exposed DEK affects one item) and simplifies rotation (rotating the KEK means re-wrapping DEKs, not re-encrypting the data).
How does envelope encryption work mechanically?
Step 1: Generate a DEK for the data item.
Step 2: Encrypt data using the DEK, producing ciphertext.
Step 3: Encrypt (wrap) the DEK using the KEK, producing wrapped key material.
Step 4: Store ciphertext + wrapped DEK; decrypt requires authorized KEK unwrap then data decrypt.
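A structural sketch of the four steps, standard library only. Important assumption: `_toy_cipher` is a toy keystream cipher used only so the example runs without third-party packages; it provides no authentication and is not suitable for real data. A real system would use an authenticated cipher from a vetted library and would unwrap the DEK through a key management service rather than holding the KEK locally.

```python
import hashlib
import hmac
import secrets


def _toy_cipher(key, nonce, data):
    """Toy stream cipher (HMAC-SHA256 keystream XOR). Illustration only."""
    out = bytearray()
    for start in range(0, len(data), 32):
        counter = (start // 32).to_bytes(8, "big")
        pad = hmac.new(key, nonce + counter, hashlib.sha256).digest()
        out.extend(b ^ p for b, p in zip(data[start:start + 32], pad))
    return bytes(out)


def envelope_encrypt(kek, plaintext):
    dek = secrets.token_bytes(32)                          # Step 1: fresh DEK
    data_nonce = secrets.token_bytes(16)
    ciphertext = _toy_cipher(dek, data_nonce, plaintext)   # Step 2: encrypt data
    wrap_nonce = secrets.token_bytes(16)
    wrapped_dek = _toy_cipher(kek, wrap_nonce, dek)        # Step 3: wrap DEK
    # Step 4: store ciphertext and wrapped DEK together; the plain DEK is dropped.
    return {
        "ciphertext": ciphertext,
        "data_nonce": data_nonce,
        "wrapped_dek": wrapped_dek,
        "wrap_nonce": wrap_nonce,
    }


def envelope_decrypt(kek, record):
    # Unwrap the DEK first (this is the operation access control should guard).
    dek = _toy_cipher(kek, record["wrap_nonce"], record["wrapped_dek"])
    return _toy_cipher(dek, record["data_nonce"], record["ciphertext"])
```

Because each call generates its own DEK, two records never share a data key even when they share a KEK.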
What breaks envelope encryption in practice?
Cause → system behavior → security impact.
- Cause: KEK access is too broad or decryption service is exposed.
- Behavior: attacker can unwrap DEKs and decrypt data because authorization checks allow it.
- Impact: envelope structure exists but does not limit who can decrypt.
What is tokenization vs encryption (mechanical difference)?
They protect data using different mechanisms.
- Encryption: transforms plaintext into ciphertext using a key; decryption reverses it with the key.
- Tokenization: replaces plaintext with a token; a separate mapping system returns plaintext when authorized.
- Tokenization security depends on controlling access to the token vault/mapping lookup, not on cryptographic secrecy alone.
How do systems enforce tokenization mechanically?
Step 1: Service sends sensitive value to tokenization system.
Step 2: System stores mapping (token ↔ value) and returns token.
Step 3: Downstream systems store/process token instead of real value.
Step 4: Detokenization requires an authorized lookup; unauthorized principals cannot get the original value.
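The steps above can be sketched with a hypothetical in-memory `TokenVault`. The token format and the principal names are assumptions; a real vault would persist the mapping and sit behind its own authentication.

```python
import secrets


class TokenVault:
    """Hypothetical vault: maps random tokens to original values."""

    def __init__(self, authorized_principals):
        self._by_token = {}
        self._authorized = set(authorized_principals)

    def tokenize(self, value):
        # Steps 1-2: store the mapping and hand back a random token.
        # The token is random, so it reveals nothing about the value.
        token = "tok_" + secrets.token_hex(16)
        self._by_token[token] = value
        return token

    def detokenize(self, principal, token):
        # Step 4: only authorized principals may look up the original value.
        if principal not in self._authorized:
            raise PermissionError(f"{principal} may not detokenize")
        return self._by_token[token]
```

Downstream services (Step 3) would store only the returned token; unlike ciphertext, there is no key that turns it back into the value without the vault.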
What breaks tokenization in practice?
Cause → system behavior → security impact.
- Cause: token vault access is broad or detokenization endpoints are reachable by too many services.
- Behavior: attacker detokenizes tokens at scale because authorization checks allow it.
- Impact: tokens become thin wrappers and do not reduce disclosure.
What are data integrity checks (hash/MAC/signature usage)?
Integrity checks verify data has not been altered, with different trust properties.
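- Hash: detects change only if you compare against a reference hash obtained from a trusted source; an attacker who can replace both the data and the hash defeats it, and a hash alone says nothing about who produced the data.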
- MAC (Message Authentication Code): proves integrity and authenticity to parties that share a secret key.
- Digital signature: proves integrity and authenticity to anyone who trusts the signer’s public key.
How do systems enforce integrity checks mechanically?
Step 1: When data is created, compute integrity proof (hash/MAC/signature) over exact bytes.
Step 2: Store or transmit data with its integrity proof.
Step 3: On read/use, recompute proof and compare or verify with key/public key.
Step 4: If verification fails, system rejects the data as modified or untrusted.
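A minimal sketch of the hash and MAC cases using Python's `hashlib` and `hmac`. Digital signatures are left out because the standard library has no signing primitive. The helper names and `load_verified` are assumptions for illustration.

```python
import hashlib
import hmac


def sha256_hex(data):
    """Hash: only useful when compared against a trusted reference value."""
    return hashlib.sha256(data).hexdigest()


def make_mac(key, data):
    """Step 1: compute the proof over the exact bytes."""
    return hmac.new(key, data, hashlib.sha256).digest()


def verify_mac(key, data, tag):
    """Step 3: recompute and compare in constant time."""
    return hmac.compare_digest(make_mac(key, data), tag)


def load_verified(key, data, tag):
    """Step 4: reject data whose proof does not verify."""
    if not verify_mac(key, data, tag):
        raise ValueError("integrity check failed: data modified or untrusted")
    return data
```

`hmac.compare_digest` is used instead of `==` so the comparison time does not leak how many leading bytes matched.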
What breaks integrity checks in practice?
Cause → system behavior → security impact.
- Cause: integrity proof is not verified at use time, or keys used for MAC/signing are compromised.
- Behavior: altered data is accepted because verification is skipped or attacker can forge valid proofs.
- Impact: data tampering becomes silent and can affect correctness, safety, and security decisions.
What are data retention and deletion mechanics?
Retention defines how long data exists; deletion defines how it is removed or made inaccessible.
- Systems enforce retention via lifecycle rules and expiration policies.
- Deletion requires removing references and ensuring copies (backups/replicas) are handled per policy.
| “Deleted” is only meaningful if systems prevent future reads of the data.
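A small sketch of a lifecycle rule, assuming a hypothetical `RETENTION` table and record layout. It selects records past their retention period and lists every location that deletion has to cover, including known copies.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention periods per dataset.
RETENTION = {
    "session_logs": timedelta(days=30),
    "invoices": timedelta(days=365),
}


def expired_records(records, now):
    """Records whose age exceeds the retention period of their dataset."""
    return [r for r in records if now - r["created_at"] > RETENTION[r["dataset"]]]


def locations_to_purge(record):
    """Deletion must cover the primary location and every known copy."""
    return [record["primary"], *record.get("copies", [])]
```

Copies that are not recorded anywhere cannot appear in `locations_to_purge`, which is why untracked exports and snapshots escape deletion.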
What breaks retention and deletion in practice?
Cause → system behavior → security impact.
- Cause: unmanaged copies (exports, snapshots, caches) are outside retention enforcement.
- Behavior: data persists in secondary locations and remains readable.
- Impact: long-term exposure continues even after primary deletion.
What are backups and restore trust (what is verified)?
Backups preserve data; restore trust means verifying that the restored data and systems are safe to use.
- Backups can also preserve compromised or tampered state.
- Restore trust requires verifying integrity, provenance, and that restored credentials and configs are not attacker-controlled.
| Restore is not just copying bytes back; it is re-establishing a trustworthy state.
How do you verify backups before and after restore?
Step 1: Verify backup integrity (hashes/signatures, immutable storage properties if used).
Step 2: Verify backup content version/time matches the intended recovery point.
Step 3: Restore into an isolated environment and validate expected behavior and access controls.
Step 4: Rotate credentials/keys as needed so old compromised material cannot be reused after restore.
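Steps 1 and 2 can be sketched as a pre-restore check. The `manifest` layout and the `latest_safe_time` parameter are assumptions; the manifest is presumed to be stored separately from the backup so an attacker who alters the backup cannot also alter the reference hash. Steps 3 and 4 are operational and not shown.

```python
import hashlib
import hmac
from datetime import datetime, timezone


def check_backup(blob, manifest, latest_safe_time):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    # Step 1: integrity, compared against a separately stored reference hash.
    digest = hashlib.sha256(blob).hexdigest()
    if not hmac.compare_digest(digest, manifest["sha256"]):
        problems.append("hash mismatch")
    # Step 2: the backup must not be newer than the intended recovery point,
    # for example the last time the system was known to be uncompromised.
    if manifest["taken_at"] > latest_safe_time:
        problems.append("backup is newer than the intended recovery point")
    return problems
```

A passing check only says the bytes match the manifest and the timestamp fits; it does not show that the content was trustworthy when the backup was taken.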