Data Quality Flashcards

(23 cards)

1
Q

What is data quality?

A

Data quality refers to how accurate, complete, reliable, and relevant data is for its intended purpose.

Poor quality = poor decisions.

2
Q

Define reliability in the context of data quality, with an example.

A

Data is consistent over time and from trusted sources.
If you measure the same thing multiple times under the same conditions, you get the same result.

Example: A sensor records the temperature as 25°C every time in the same environment — this is reliable. If readings jump randomly (25°C → 30°C → 20°C), it’s unreliable.
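
The sensor example can be sketched in code. This is only an illustration: the `is_reliable` helper and the 0.5°C tolerance are assumptions, not part of the card.

```python
def is_reliable(readings, tolerance=0.5):
    """Return True if repeated readings stay within `tolerance` of each other.

    `tolerance` (in °C) is an assumed threshold chosen for illustration.
    """
    if not readings:
        raise ValueError("need at least one reading")
    return max(readings) - min(readings) <= tolerance


steady = [25.0, 25.0, 25.0]  # same environment, same result each time
jumpy = [25.0, 30.0, 20.0]   # readings jump randomly

print(is_reliable(steady))  # True
print(is_reliable(jumpy))   # False
```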

3
Q

List methods to ensure reliability.

A

Use standardised measurement tools – ensures the same method or device is used each time so results are consistent and comparable.

Automate data collection to reduce human error – removes bias or mistakes that can occur from manual entry, keeping results stable.

Test data consistency over time – regularly check if the same process produces similar outcomes across different periods or datasets.

Example: A sensor records the same temperature consistently.

4
Q

Define validity in the context of data quality.

A

Validity is how well the data measures what it is supposed to measure.

5
Q

List methods to ensure validity.

A
  • Use accurate measurement instruments
  • Compare with trusted benchmarks
  • Ensure the data collection method fits the research question

Example: A customer satisfaction survey question that asks about satisfaction itself, not about something only loosely related to it.

6
Q

What is accuracy in data quality? Give an example.

A

How close the data is to the true or accepted value

Example: GPS coordinates of a store that match exactly with Google Maps = accurate.

7
Q

Define relevance in data quality, with an example.

A

Data must be applicable to the problem you are solving

Example: Using sales data to predict foot traffic in a store is relevant; using social media likes may not be.

8
Q

Define completeness in data quality, with an example.

A

All necessary data is present

Example: A customer database missing phone numbers = incomplete.
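
A completeness check along these lines might look like the sketch below. The `missing_field` helper and the sample records are hypothetical, made up for illustration.

```python
def missing_field(records, field):
    """Return the records where `field` is absent or empty."""
    return [r for r in records if not r.get(field)]


# Hypothetical customer records; two of them lack a phone number.
customers = [
    {"name": "Ana", "phone": "555-0101"},
    {"name": "Ben", "phone": ""},
    {"name": "Cy"},
]

incomplete = missing_field(customers, "phone")
print([r["name"] for r in incomplete])  # ['Ben', 'Cy']
```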

9
Q

Define timeliness in data quality, with an example.

A

Data is up-to-date

Example: Stock prices updated every second are timely; last year’s prices = not timely.
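
One way to express timeliness in code is to compare a record's age against a maximum allowed age. The `is_timely` helper, the timestamps, and the five-second limit below are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta


def is_timely(recorded_at, now, max_age):
    """Return True if the data point is no older than `max_age`."""
    return now - recorded_at <= max_age


# Fixed, made-up timestamps so the example gives the same result every run.
now = datetime(2024, 6, 1, 12, 0, 0)
fresh_price = datetime(2024, 6, 1, 11, 59, 59)  # one second old
stale_price = datetime(2023, 6, 1, 12, 0, 0)    # one year old

print(is_timely(fresh_price, now, timedelta(seconds=5)))  # True
print(is_timely(stale_price, now, timedelta(seconds=5)))  # False
```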

10
Q

What is the purpose of validation rules?

A

Prevent incorrect data entry

Example: Age field only allows 0–120.
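
The age rule could be written as a small check like the one below. The `validate_age` name and the choice to reject non-integers are assumptions; the card only specifies the 0–120 range.

```python
def validate_age(value):
    """Accept only whole-number ages from 0 to 120 inclusive."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError("age must be a whole number")
    if not 0 <= value <= 120:
        raise ValueError("age must be between 0 and 120")
    return value


print(validate_age(35))  # 35

try:
    validate_age(150)
except ValueError as err:
    print(f"rejected: {err}")  # rejected: age must be between 0 and 120
```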

11
Q

What is the purpose of data cleaning?

A

Remove or correct errors and duplicates

Example: Fix missing addresses, remove duplicate entries.
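
A rough sketch of such a cleaning step follows. The `clean` helper and the sample records are hypothetical. The code only flags missing addresses for follow-up, since the correct value has to come from a real source.

```python
def clean(records):
    """Drop exact duplicate records and list names that still need an address."""
    seen = set()
    unique = []
    for r in records:
        key = (r["name"], r.get("address"))
        if key not in seen:
            seen.add(key)
            unique.append(r)
    needs_address = [r["name"] for r in unique if not r.get("address")]
    return unique, needs_address


# Made-up records: one exact duplicate and one missing address.
records = [
    {"name": "Ana", "address": "1 High St"},
    {"name": "Ana", "address": "1 High St"},
    {"name": "Ben", "address": None},
]

unique, needs_address = clean(records)
print(len(unique))     # 2
print(needs_address)   # ['Ben']
```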

12
Q

What is the purpose of data auditing?

A

Review data periodically

Example: Check monthly sales data for anomalies.
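
A very simple audit could flag months that sit far from the median. The `find_anomalies` helper, the 50% threshold, and the sales figures are all assumptions for illustration; a real audit would pick a rule suited to the data, and this one assumes a positive median.

```python
from statistics import median


def find_anomalies(values, max_ratio=0.5):
    """Return indices of values further than `max_ratio` * median from the median."""
    mid = median(values)
    return [i for i, v in enumerate(values) if abs(v - mid) > max_ratio * mid]


# Made-up monthly sales; the fifth month is far out of line with the rest.
monthly_sales = [100, 105, 98, 102, 300, 101]
print(find_anomalies(monthly_sales))  # [4]
```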

13
Q

What is the purpose of automated data collection?

A

Reduce human error

Example: IoT sensors logging temperature.

14
Q

What is the purpose of cross-verification?

A

Compare data from multiple sources

Example: Check customer emails in CRM vs sign-up form.
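
The CRM versus sign-up comparison might be sketched as below. The `mismatched_emails` helper, the customer IDs, and the sample addresses are hypothetical. Comparing case-insensitively after trimming whitespace is also an assumption.

```python
def mismatched_emails(crm, signup):
    """Return customer IDs whose email differs between the two sources."""
    shared = crm.keys() & signup.keys()  # only IDs present in both sources
    return sorted(
        cid for cid in shared
        if crm[cid].strip().lower() != signup[cid].strip().lower()
    )


crm = {1: "ana@example.com", 2: "ben@example.com", 3: "cy@example.com"}
signup = {1: "Ana@Example.com", 2: "ben@exmaple.com"}

print(mismatched_emails(crm, signup))  # [2]
```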

15
Q

What is the purpose of standardised protocols?

A

Maintain consistency

Example: Use the same units (kg, m, $).
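
Using the same units can be enforced by converting every value to one agreed unit on entry. The sketch below assumes kilograms as the standard and a hypothetical `to_kg` helper; the set of supported units is illustrative.

```python
# Conversion factors to kilograms (1 lb is defined as exactly 0.45359237 kg).
TO_KG = {"kg": 1.0, "g": 0.001, "lb": 0.45359237}


def to_kg(value, unit):
    """Convert a weight in `unit` to kilograms, rejecting unknown units."""
    if unit not in TO_KG:
        raise ValueError(f"unknown unit: {unit}")
    return value * TO_KG[unit]


# Mixed-unit inputs all end up in kilograms.
weights = [(2, "kg"), (500, "g"), (10, "lb")]
print([round(to_kg(v, u), 3) for v, u in weights])
```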

16
Q

What is structured data?

A

Clearly organised in a fixed format (e.g. tables in a SQL database)

Easy to analyse.

17
Q

What is unstructured data?

A

Text, images, videos

Harder to analyse; may lack reliability.

18
Q

What is semi-structured data?

A

JSON, XML: has structure, but not as strict as tables

Combines elements of both structured and unstructured data.
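
A short illustration, using made-up JSON: every value is labelled with a key, but the records do not have to share the same set of keys the way rows in a table share columns.

```python
import json

# Hypothetical JSON payload; the two records carry different fields.
raw = '[{"name": "Ana", "phone": "555-0101"}, {"name": "Ben", "tags": ["vip"]}]'
records = json.loads(raw)

# Collect the keys each record actually contains.
fields = [sorted(r.keys()) for r in records]
print(fields)  # [['name', 'phone'], ['name', 'tags']]
```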

19
Q

Give an example of a reliability issue.

A

Temperature sensor gives random spikes

Indicates inconsistency in data.

20
Q

Give an example of a validity issue.

A

Using social media likes as a proxy for customer satisfaction

May not accurately reflect true customer sentiment.

21
Q

Give an example of a completeness issue.

A

Missing transaction IDs in a sales report

Indicates lack of necessary data.

22
Q

Give an example of a timeliness issue.

A

Using last year’s stock prices for today’s trading decisions

Data is outdated.

23
Q

Why does data quality matter?

A

Poor data quality → bad decisions → financial loss or missed opportunities

Reliable and valid data ensures trustworthy analysis.