How do you approach anonymization in general?
Explain what device manufacturers do to make tracking harder
However: once connected, address is stable
Name and explain 5 ethics theories and practices
Name 3 types of publication
What categories of records exist?
Name 3 anonymization objectives
Protect individuals against:
be aware that additional information can be used
Name 4 anonymization approaches
Suppression: remove (parts of) attributes
Generalization: Limit granularity: age: 21 -> age 20-30
Perturbation: Add noise to data while preserving general properties
Permutation: Swap association of attributes across records
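A minimal sketch of the four approaches above, assuming records are plain Python dicts. The function names and the record layout are illustrative assumptions, not part of the notes:

```python
import random

def suppress(record, attrs):
    """Suppression: drop the given attributes from a record."""
    return {k: v for k, v in record.items() if k not in attrs}

def generalize_age(age, width=10):
    """Generalization: replace an exact age with a range, e.g. 21 -> '20-30'."""
    low = (age // width) * width
    return f"{low}-{low + width}"

def perturb(values, scale, rng):
    """Perturbation: add zero-mean Gaussian noise to numeric values."""
    return [v + rng.gauss(0, scale) for v in values]

def permute(records, attr, rng):
    """Permutation: shuffle one attribute's values across records."""
    values = [r[attr] for r in records]
    rng.shuffle(values)
    return [{**r, attr: v} for r, v in zip(records, values)]
```

Example use: generalize_age(21) gives "20-30", matching the generalization example above. Passing an explicit random.Random instance as rng keeps the noisy steps reproducible.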
What is k-Anonymity and how does it work?
Each record should be indistinguishable from at least (k-1) others on its QI attributes
or: Cardinality of any query result should be at least k
We have to consider all different data sources and try to avoid linking
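The definition can be checked mechanically. A minimal sketch, assuming records are dicts and the caller names the quasi-identifier (QI) attributes; is_k_anonymous is a hypothetical helper name:

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of QI values is shared by at least k records."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())
```

This only tests an existing table. Finding a generalization that reaches k-anonymity with good utility is the expensive part.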
Name some problems with k-anonymity
Efficiency problem: expensive to find k-anonymity transform with max utility
Security problem (e.g. k=4): k-anonymity splits attributes into identifier, quasi-identifier, and other, which causes two issues:
1: Other data might have little diversity: information may be leaked
2: In presence of side-information, any information can become quasi-identifier
What is L-diversity?
Build q-blocks (records grouped by their quasi-identifier values).
A q-block is l-diverse if it contains at least l well-represented values for the sensitive attribute S.
A table is l-diverse if every q-block is l-diverse
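A minimal sketch of the table-level check, assuming "well represented" is read in its simplest form as "distinct" (other readings, such as entropy-based ones, exist). is_l_diverse is a hypothetical helper name:

```python
from collections import defaultdict

def is_l_diverse(records, quasi_identifiers, sensitive, l):
    """True if every q-block holds at least l distinct sensitive values."""
    blocks = defaultdict(set)
    for r in records:
        key = tuple(r[q] for q in quasi_identifiers)
        blocks[key].add(r[sensitive])
    return all(len(values) >= l for values in blocks.values())
```

A table can be k-anonymous and still fail this check: a block in which every record has the same disease leaks that disease for everyone in the block.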
What is t-closeness?
An equivalence class is said to have t-closeness if the distance between the distribution of a sensitive attribute in this class and the distribution of the attribute in the whole table is not more than t
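A minimal sketch of this check for a categorical sensitive attribute. The distance measure is an assumption: total variation distance is used here, while the definition above leaves the distance open and other measures (e.g. Earth Mover's Distance) are common. All function names are hypothetical:

```python
from collections import Counter, defaultdict

def distribution(values):
    """Relative frequency of each value."""
    counts = Counter(values)
    total = len(values)
    return {v: c / total for v, c in counts.items()}

def total_variation(p, q):
    """Half the L1 distance between two discrete distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def satisfies_t_closeness(records, quasi_identifiers, sensitive, t):
    """True if every equivalence class is within distance t of the whole table."""
    overall = distribution([r[sensitive] for r in records])
    classes = defaultdict(list)
    for r in records:
        key = tuple(r[q] for q in quasi_identifiers)
        classes[key].append(r[sensitive])
    return all(
        total_variation(distribution(vals), overall) <= t
        for vals in classes.values()
    )
```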
How can we practically deal with anonymity vs utility?
Releasing data sets after sanitization rarely preserves both privacy and utility, so in practice the rich data can only be released under license to designated trusted parties