F.4. Data Visualization Flashcards

Master the principles of visual discovery and learn how to interpret and design data visualizations. (26 cards)

1
Q

What is data visualization or data discovery?

A

It is the visual representation of information, used to better understand data and the predictions made from data.

2
Q

What is data discovery analysis?

A

The process by which businesses collect data from various sources and analyze it, using advanced analytics and visual analysis to detect patterns, trends, and outliers within the datasets.

3
Q

What are the two uses for data visualization?

A
  • As part of the data mining process
  • To communicate information to users of the data
4
Q

In which steps of the data mining process is visualization primarily used?

A

In the exploration and cleansing of data during the preprocessing step, and in the data dimension reduction step.

  • In data exploration - helps determine which variables to include in the analysis
  • In data cleansing - helps find erroneous and missing values, duplicate records, and so forth
  • In data reduction - helps determine which categories can be combined
5
Q

What are some benefits of data visualization?

A
  • Makes data more understandable
  • Promotes quick assimilation of large amounts of data
  • Helps to identify correlations by illustrating relationships between data items and events
  • Facilitates faster decision-making
  • Communicates business insights
  • Useful in data mining
  • Enables identification of trends and better interpretation of data
  • Helps in recognizing patterns
6
Q

What are some limitations of data visualization?

A
  • May lead to speculative conclusions when used to make projections and estimates
  • Meaningful data may be excluded, leading to biased results
  • Information presented may be oversimplified
  • Users may rely too much on visuals and miss important insights
  • Potential for misrepresentation or distortion of data through choices about how it is presented
7
Q

What are characteristics of a well-designed dashboard?

A
  • All required information is available on a single screen
  • Uses visual components such as charts to highlight information
  • Easy to use with minimal training
  • Allows for further exploration
  • Draws data from multiple systems and combines them into a summarized view
  • Does not perform complex calculations
  • Provides ability to drill down to underlying data
  • Information is refreshed in a timely manner
  • Provides benchmarks for comparison of key performance indicators
8
Q

What are some best practices for table and chart design to avoid distortion?

A
  • Don’t omit important data
  • For a time series, ensure sequential and consistent dates on the x-axis
  • Use an appropriate scale on the y-axis
  • Begin the y-axis scale at zero when practical
  • Use consistent increments on the y-axis
  • Ensure pie chart percentages sum to 100%
  • Keep pie chart items to a minimum
  • Indicate sampling error with confidence intervals when appropriate
  • Provide comparison to external benchmarks when available
  • Use the right chart for the job
  • Consider whether a table might work better than a chart or graph
9
Q

What should be considered when using color in charts and graphs?

A
  • Use color meaningfully, not just for decoration
  • Consider the needs of viewers with visual disabilities:
    • Ensure colors have sufficient contrast
    • Avoid using red and green together
10
Q

What is a scatterplot used for?

A

It is used to plot all the values in a dataset, typically one with two variables, and to illustrate the relationship between those variables.

11
Q

What does a scatterplot reveal about variables?

A

A correlation or a lack of correlation between variables.

12
Q

What is a dot plot used for?

A

It is used to visualize summarized data points for each category on the x-axis.

13
Q

What are the measures of central tendency?

A
  • Mean
  • Median
  • Mode

A measure of central tendency is a value that represents the center point of a set of data.
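A short sketch using Python's standard `statistics` module shows all three measures on one hypothetical dataset. The single large value (45) pulls the mean above the median, which is why the median is often preferred for skewed data.

```python
import statistics

# Hypothetical daily order counts; 45 is an unusually busy day.
orders = [12, 15, 15, 18, 20, 22, 45]

mean_value = statistics.mean(orders)      # sum / count = 147 / 7
median_value = statistics.median(orders)  # middle value once sorted
mode_value = statistics.mode(orders)      # most frequent value

print(mean_value, median_value, mode_value)  # 21 18 15
```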

14
Q

Define:

Dispersion in a dataset

A

It describes how much individual values in a set of data are scattered or spread out about their center.
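Dispersion is usually summarized with the range, variance, or standard deviation. The sketch below uses a hypothetical dataset whose mean is 5; the population functions divide by n, while `statistics.stdev` treats the data as a sample and divides by n - 1.

```python
import statistics

# Hypothetical dataset; its mean is 5.
values = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(values) - min(values)        # spread between the extremes
pop_variance = statistics.pvariance(values)   # mean squared deviation from the mean
pop_std_dev = statistics.pstdev(values)       # square root of the population variance
sample_std_dev = statistics.stdev(values)     # sample version, uses n - 1

print(data_range, pop_variance, pop_std_dev)  # 7 4 2.0
print(round(sample_std_dev, 3))
```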

15
Q

What is the purpose of a bar chart?

A

It is useful for comparing a statistic across groups.
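The "statistic across groups" behind a bar chart is often a total or an average per category. A minimal sketch, assuming hypothetical (region, amount) records and using text bars in place of a plotting library:

```python
from collections import defaultdict
import statistics

def mean_by_group(records):
    """Group (key, value) pairs by key and return the mean value for each group."""
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    return {key: statistics.mean(vals) for key, vals in groups.items()}

# Hypothetical sales records: (region, sale amount).
sales = [("North", 120), ("South", 80), ("North", 100), ("East", 90), ("South", 60)]
means = mean_by_group(sales)

# One text "bar" per region; each # stands for 10 units of the average.
for region, avg in sorted(means.items()):
    print(f"{region:<6} {avg:>6.1f} {'#' * int(avg // 10)}")
```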

16
Q

What is a pie chart primarily used for?

A

It is primarily used for showing proportions.
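Because a pie chart shows proportions of a whole, each slice is a category's share of the total, and the shares should sum to 100% (one of the design practices listed earlier). A small sketch with hypothetical revenue figures:

```python
def to_percentages(amounts):
    """Convert each category's amount into its percentage of the total."""
    total = sum(amounts.values())
    return {label: 100 * amount / total for label, amount in amounts.items()}

# Hypothetical revenue by product line.
revenue = {"Hardware": 50_000, "Software": 30_000, "Services": 20_000}
shares = to_percentages(revenue)

for label, pct in shares.items():
    print(f"{label:<9} {pct:5.1f}%")
print(f"Total     {sum(shares.values()):5.1f}%")  # slices should total 100%
```

When displayed percentages are rounded, the visible labels can add up to slightly more or less than 100% even though the underlying shares are correct.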

17
Q

What is a line chart used to depict?

A
  • Can be used to visualize several observations for each category, using one line for each series of observations.
  • Can also depict change over time, such as in a time series.
18
Q

What additional dimension does a bubble chart add compared to a scatterplot?

A

A bubble chart replaces the data points with bubbles whose size varies with the values they depict, adding the relative sizes of those values as an additional dimension of the chart.

19
Q

What does a histogram show?

A

The frequencies of a variable using a series of vertical bars.
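The bar heights in a histogram are counts of values falling into equal-width intervals (bins). A minimal sketch, assuming hypothetical non-negative amounts and a bin width of 10:

```python
from collections import Counter

def histogram_counts(data, width):
    """Count how many values fall into each equal-width bin, keyed by bin start."""
    counts = Counter((x // width) * width for x in data)
    return dict(sorted(counts.items()))

# Hypothetical invoice amounts.
amounts = [3, 7, 12, 15, 18, 21, 24, 25, 29, 33]
BIN_WIDTH = 10

freq = histogram_counts(amounts, BIN_WIDTH)
# Print one text bar per bin; each # is one observation.
for start, count in freq.items():
    print(f"{start:>3}-{start + BIN_WIDTH - 1:<3} {'#' * count}")
```

This sketch omits bins that contain no values; a plotted histogram would normally show them as empty bars.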

20
Q

What is the difference between a histogram and a bar graph?

A

A bar graph relates two variables to one another, such as a category and a value, whereas a histogram shows only one variable and is used to depict that variable's frequency distribution.

21
Q

What does a box plot display?

A

The full distribution of a variable, including the minimum, maximum, mean, median, quartiles, and outliers.
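The numbers a box plot is drawn from can be computed with the standard library (Python 3.8+ for `statistics.quantiles`). This sketch assumes the "inclusive" quartile convention and a hypothetical dataset; other quartile conventions can give slightly different Q1 and Q3.

```python
import statistics

def box_plot_summary(data):
    """Summary statistics behind a box plot (quartiles via the 'inclusive' method)."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "min": min(data),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(data),
        "mean": statistics.mean(data),
    }

# Hypothetical dataset with one unusually large value.
data = [2, 4, 4, 5, 6, 7, 8, 9, 30]
summary = box_plot_summary(data)
for name, value in summary.items():
    print(f"{name:>6}: {value:g}")
```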

22
Q

What is an outlier in a dataset?

A

Outliers are values that are far away from most of the other values in the dataset.

23
Q

How is the interquartile range (IQR) of a dataset calculated?

A

Third quartile (Q3) minus the first quartile (Q1).

IQR = Q3 − Q1
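The formula is simple, but the quartiles themselves depend on the convention used to locate them. This sketch applies IQR = Q3 − Q1 to the hypothetical values 1 through 8 under both methods that Python's `statistics.quantiles` supports:

```python
import statistics

def iqr(data, method="exclusive"):
    """Interquartile range, IQR = Q3 - Q1, using the given quartile convention."""
    q1, _, q3 = statistics.quantiles(data, n=4, method=method)
    return q3 - q1

data = list(range(1, 9))  # hypothetical values 1, 2, ..., 8
print(iqr(data, "exclusive"))  # 6.75 - 2.25 = 4.5
print(iqr(data, "inclusive"))  # 6.25 - 2.75 = 3.5
```

On exam-style questions, Q1 and Q3 are usually given, so the convention only matters when computing quartiles from raw data.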

24
Q

How is an outlier determined in a dataset?

A

An outlier is any value that is either:

  • Less than Q1 − (1.5 × IQR) or
  • Greater than Q3 + (1.5 × IQR)

Q1 and Q3 are the first and third quartiles, respectively, and IQR is the interquartile range calculated as Q3 − Q1.
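The 1.5 × IQR rule translates directly into code. A minimal sketch, assuming the "inclusive" quartile convention of Python's `statistics.quantiles` and a hypothetical dataset:

```python
import statistics

def iqr_outliers(data):
    """Return (lower_fence, upper_fence, outliers) using the 1.5 x IQR rule."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    # Strict inequalities, matching "less than" / "greater than" in the rule.
    outliers = [x for x in data if x < lower or x > upper]
    return lower, upper, outliers

# Hypothetical dataset (unsorted; quantiles() sorts internally).
data = [15, 12, 45, 13, 10, 16, 19, 14, 18]
lower, upper, outliers = iqr_outliers(data)
print(lower, upper, outliers)  # 5.5 25.5 [45]
```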

25
Q

What is the interquartile range (IQR) of a dataset if the first quartile (Q1) is 17.5 and the third quartile (Q3) is 29?

A

11.5

The interquartile range is calculated as Q3 − Q1, which in this case is 29 − 17.5 = 11.5.
26
Q

Fill in the blank: For a dataset where the first quartile (Q1) is 17.5 and the third quartile (Q3) is 29, an outlier on the high end of the dataset would be any value greater than ________.

A

46.25

The interquartile range is 29 − 17.5, which equals 11.5. An outlier on the high end of the dataset would be any value greater than 29 + (1.5 × 11.5), which equals 46.25.
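The arithmetic for the last two cards can be checked in a few lines of Python. The lower fence is not asked for in the cards but follows from the same 1.5 × IQR rule.

```python
# Quartiles given in the last two cards.
q1, q3 = 17.5, 29

iqr = q3 - q1                 # 29 - 17.5 = 11.5
upper_fence = q3 + 1.5 * iqr  # 29 + 17.25 = 46.25
lower_fence = q1 - 1.5 * iqr  # 17.5 - 17.25 = 0.25

print(iqr, upper_fence, lower_fence)  # 11.5 46.25 0.25
```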