what is the goal of anonymization
Balancing data privacy and data utility: making data less specific while retaining its usefulness
original database goes through _________ to become published database
anonymization
what are some anonymisation techniques
Attribute Suppression
Character Masking
Generalisation
Swapping
Data Perturbation
Synthetic Data
Data aggregation
K-anonymity
Pseudonymization
what is attribute suppression
what is an example of attribute suppression
Example: Data consists of test scores
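Attribute suppression removes an entire attribute (column) from the dataset. A minimal sketch for the test-score example, assuming hypothetical field names (student, trainer, score) that are not given in the notes:

```python
# Hypothetical test-score records; the field names are illustrative assumptions.
records = [
    {"student": "Alice", "trainer": "Tina", "score": 87},
    {"student": "Ben", "trainer": "Tina", "score": 56},
    {"student": "Chen", "trainer": "Huang", "score": 92},
]

def suppress(rows, attributes):
    """Return copies of the rows with the named attributes removed entirely."""
    drop = set(attributes)
    return [{k: v for k, v in row.items() if k not in drop} for row in rows]

# The analysis only needs trainer and score, so the student name is suppressed.
published = suppress(records, ["student"])
```

The original records are left untouched; only the published copy loses the column.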
what is character masking
what is an example of character masking
Example: online grocery store conducting a study of its delivery demand from historical data
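Character masking replaces some characters of a value with a constant symbol. A minimal sketch, assuming the masked attribute is a six-digit postal code (the notes do not say which attribute the grocery store masks):

```python
def mask_tail(value, keep=2, symbol="x"):
    """Keep the first `keep` characters and replace the rest with `symbol`."""
    return value[:keep] + symbol * max(len(value) - keep, 0)

# Hypothetical postal codes; real delivery data would come from the order history.
postal_codes = ["100111", "200222", "300333"]
masked = [mask_tail(code) for code in postal_codes]
```

Keeping the first two digits preserves the general delivery area while hiding the exact location.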
what is generalisation
example of generalisation
Example: Dataset contains person name, age in years, and residential address
* Age ranges of 10 years, starting with a range <20 years and ending with a range >60 years
* Remove the block/house number and retain only the road name in Address
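The two generalisation rules above can be sketched as follows. The exact band boundaries (20 falls in 20-29, and 60 and above share the top band) and the address pattern are assumptions, since the notes do not define them:

```python
import re

def age_band(age):
    """Map an exact age to a 10-year band (boundary handling is an assumption)."""
    if age < 20:
        return "<20"
    if age >= 60:
        return ">=60"
    low = (age // 10) * 10
    return f"{low}-{low + 9}"

# Assumed address shape: optional "Blk"/"Block", a house number, then the road name.
_HOUSE_NUMBER = re.compile(r"^\s*(?:blk|block)?\s*\d+[a-z]?\s+", re.IGNORECASE)

def road_only(address):
    """Strip a leading block/house number and keep the road name."""
    return _HOUSE_NUMBER.sub("", address, count=1).strip()

person = {"name": "Alice", "age": 34, "address": "Blk 123 Clementi Avenue 3"}
generalised = {
    "name": person["name"],
    "age": age_band(person["age"]),
    "address": road_only(person["address"]),
}
```

Addresses in other formats would need their own pattern; the regular expression here only covers the assumed shape.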
what is swapping
what is an example of swapping
Example: Dataset contains information about customer records for a business organisation
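Swapping rearranges the values of an attribute across records, so that each value still appears in the dataset but no longer sits with its original record. A minimal sketch, assuming hypothetical customer fields:

```python
import random

# Hypothetical customer records; the field names are illustrative assumptions.
customers = [
    {"customer": "A", "occupation": "Teacher", "spend": 120},
    {"customer": "B", "occupation": "Nurse", "spend": 340},
    {"customer": "C", "occupation": "Driver", "spend": 75},
    {"customer": "D", "occupation": "Chef", "spend": 210},
]

def swap_column(rows, attribute, seed=0):
    """Shuffle one attribute's values across all rows, leaving other fields alone."""
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    values = [row[attribute] for row in rows]
    rng.shuffle(values)
    return [{**row, attribute: value} for row, value in zip(rows, values)]

swapped = swap_column(customers, "spend")
```

Column-level statistics such as the total and mean spend are unchanged, but relationships between the swapped attribute and the other attributes are not preserved.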
what is Data Perturbation
what is an example of data perturbation
Example: rounding off the values of the numeric columns to either base 3 or base 5, depending on the range of values of the attribute.
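Base-x rounding can be sketched as rounding each value to the nearest multiple of the base. The sample values and the choice of which attribute gets which base are assumptions:

```python
def round_to_base(value, base):
    """Round an integer value to the nearest multiple of `base`."""
    return base * round(value / base)

# Hypothetical attributes: a small-range one rounded to base 3,
# a wider-range one rounded to base 5.
ages = [24, 25, 26]
heights_cm = [172, 173, 180]

perturbed_ages = [round_to_base(a, 3) for a in ages]
perturbed_heights = [round_to_base(h, 5) for h in heights_cm]
```

With integer inputs and base 3 or 5 there are no exact half-way cases, so Python's round-half-to-even behaviour does not affect the result.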
what is synthetic data
example of synthetic data
Example: Office facility providing “hot-desking” facilities keeps records of the times at which users start and end using the facilities.
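Synthetic data is generated artificially instead of being released from the individual records. A rough sketch for the hot-desking example, assuming that start times and durations can be approximated by normal distributions fitted to the original records (a modelling assumption, not something the notes state):

```python
import random
import statistics

# Hypothetical original records: (start_hour, end_hour) as decimal hours.
original = [(9.0, 12.5), (8.5, 17.0), (13.0, 15.0), (10.0, 18.0), (9.5, 11.0)]

def synthesise(records, n, seed=0):
    """Sample n artificial (start, end) pairs from normal distributions
    fitted to the start times and durations of the original records."""
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    starts = [s for s, _ in records]
    durations = [e - s for s, e in records]
    s_mu, s_sd = statistics.mean(starts), statistics.stdev(starts)
    d_mu, d_sd = statistics.mean(durations), statistics.stdev(durations)
    out = []
    for _ in range(n):
        start = min(max(rng.gauss(s_mu, s_sd), 0.0), 23.0)
        duration = max(rng.gauss(d_mu, d_sd), 0.25)  # at least 15 minutes
        end = min(start + duration, 24.0)
        out.append((round(start, 2), round(end, 2)))
    return out

synthetic = synthesise(original, n=len(original))
```

The synthetic records follow the overall usage pattern but none of them corresponds to a real user's session.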
what is data aggregation
what is an example of data aggregation
Example: charity organisation has records of the donations made, as well as some information about the donors. Aggregated data is assessed to be sufficient to perform data analysis
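Aggregation replaces individual records with summary values per group. A minimal sketch for the donation example, assuming a hypothetical grouping attribute (income_band) that the notes do not name:

```python
from collections import defaultdict

# Hypothetical donation records; donor IDs and income bands are illustrative.
donations = [
    {"donor": "D1", "income_band": "low", "amount": 50},
    {"donor": "D2", "income_band": "low", "amount": 30},
    {"donor": "D3", "income_band": "high", "amount": 500},
    {"donor": "D4", "income_band": "high", "amount": 300},
]

def aggregate(rows, group_key, value_key):
    """Return the record count and total of `value_key` for each group."""
    totals = defaultdict(lambda: {"count": 0, "total": 0})
    for row in rows:
        group = totals[row[group_key]]
        group["count"] += 1
        group["total"] += row[value_key]
    return dict(totals)

summary = aggregate(donations, "income_band", "amount")
```

Only the summary is published; very small groups may still need to be merged or suppressed, since a group of one reveals an individual's donation.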
what is K-anonymity
what is an example of k-anonymity
Example: Research needs to be done on the types of disease
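A dataset is k-anonymous when every combination of quasi-identifier values is shared by at least k records. A minimal sketch that measures k for an already generalised disease dataset; the attribute names and values are assumptions:

```python
from collections import Counter

# Hypothetical records after generalisation and masking.
patients = [
    {"age_band": "20-29", "postal": "10xxxx", "disease": "Flu"},
    {"age_band": "20-29", "postal": "10xxxx", "disease": "Asthma"},
    {"age_band": "30-39", "postal": "20xxxx", "disease": "Flu"},
    {"age_band": "30-39", "postal": "20xxxx", "disease": "Diabetes"},
    {"age_band": "30-39", "postal": "20xxxx", "disease": "Flu"},
]

def k_of(rows, quasi_identifiers):
    """Return the size of the smallest group sharing the same quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values()) if groups else 0

k = k_of(patients, ["age_band", "postal"])
```

If the measured k is below the required threshold, the quasi-identifiers need further generalisation, or the outlier records need to be suppressed.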
what is Pseudonymization
what is an example of pseudonymization
Example: names of persons who obtained their driving licenses and other information
Useful for cross-dataset linking and where the original data structure is needed. However, if applied only to explicit identifiers, it is not sufficient on its own to comply with personal data protection regulations.
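Pseudonymisation replaces an identifier with a made-up value, with the same original always mapping to the same pseudonym. A minimal sketch for the driving-licence example using random pseudonyms; the field names are assumptions, and the mapping table would have to be stored securely and separately from the published data:

```python
import secrets

# Hypothetical licence records; the field names are illustrative assumptions.
licences = [
    {"name": "Joe Phang", "licence_class": "3", "passed": "2021-03"},
    {"name": "Zack Lim", "licence_class": "2B", "passed": "2020-11"},
    {"name": "Joe Phang", "licence_class": "2B", "passed": "2022-07"},
]

def pseudonymise(rows, attribute):
    """Replace `attribute` with a random pseudonym, reusing the same pseudonym
    for repeated values. Returns the new rows and the mapping table."""
    mapping = {}
    out = []
    for row in rows:
        original = row[attribute]
        if original not in mapping:
            mapping[original] = secrets.token_hex(8)
        out.append({**row, attribute: mapping[original]})
    return out, mapping

published, mapping = pseudonymise(licences, "name")
```

Because repeated names receive the same pseudonym, records can still be linked within and across datasets that share the mapping.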
what are the 2 phases in the anonymisation methodology
Anonymisation Preparation Phase
Anonymisation Execution Phase
what are the 4 steps in the anonymisation preparation phase
determine the release model
determine the reidentification risk threshold
classify the data attributes
remove unused data attributes
what does determine the release model mean?