What is data visualization or data discovery?
It is the visual representation of information, used to better understand data and the predictions drawn from it.
What is data discovery analysis?
The process by which businesses collect data from various sources and analyze it, using advanced analytics and visual analysis to detect patterns, trends, and outliers in the datasets.
What are the two uses for data visualization?
In which steps of the data mining process is visualization primarily used?
Primarily in the preprocessing step, for exploring and cleansing the data, and in the dimension reduction step.
What are some benefits of data visualization?
What are some limitations of data visualization?
What are characteristics of a well-designed dashboard?
What are some best practices for table and chart design to avoid distortion?
What should be considered when using color in charts and graphs?
What is a scatterplot used for?
It plots every value in a dataset, typically for two variables, to illustrate the relationship between them.
What does a scatterplot reveal about variables?
A correlation or a lack of correlation between variables.
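The visual impression a scatterplot gives can be backed with a number. The sketch below computes the Pearson correlation coefficient by hand; the function name `pearson_r` and the sample data (`hours`, `scores`) are hypothetical and used only for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (numerator of r).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data: the points of a scatterplot with x = hours, y = scores.
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 61, 70, 74]

r = pearson_r(hours, scores)
print(round(r, 2))  # close to 1: strong positive correlation
```

A value near 1 or -1 suggests the points fall close to a rising or falling line, while a value near 0 suggests no linear relationship.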
What is a dot plot used for?
It is used to visualize summarized data points for each category on the x-axis.
What are the measures of central tendency?
A measure of central tendency is a value that represents the center point of a set of data. The most common measures are the mean, median, and mode.
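As a minimal sketch, the three most common measures of central tendency (mean, median, and mode) can be computed with Python's standard `statistics` module. The sample values are hypothetical.

```python
import statistics

# Hypothetical sample values, chosen only for illustration.
values = [2, 3, 3, 5, 7, 10]

mean_value = statistics.mean(values)      # arithmetic average
median_value = statistics.median(values)  # middle value of the sorted data
mode_value = statistics.mode(values)      # most frequent value

print(mean_value, median_value, mode_value)
```

Here the three measures differ (5, 4, and 3), which is typical when a dataset is skewed toward larger values.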
Define:
Dispersion in a dataset
It describes how much individual values in a set of data are scattered or spread out about their center.
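To make the idea concrete, the sketch below compares two hypothetical datasets that share the same center but differ in spread. The helper `spread_summary` is an illustrative name, and population variance and standard deviation are just one common choice of dispersion measure.

```python
import statistics

# Two hypothetical datasets with the same mean (5) but different spread.
tight = [4, 5, 6]
wide = [1, 5, 9]

def spread_summary(values):
    """Return a few common dispersion measures for a list of numbers."""
    return {
        "range": max(values) - min(values),
        "variance": statistics.pvariance(values),  # population variance
        "std_dev": statistics.pstdev(values),      # population standard deviation
    }

tight_summary = spread_summary(tight)
wide_summary = spread_summary(wide)

print(tight_summary)
print(wide_summary)
```

Both datasets have the same center, yet every measure is larger for `wide`, reflecting values scattered farther from that center.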
What is the purpose of a bar chart?
It is useful for comparing a statistic across groups.
What is a pie chart primarily used for?
It is primarily used for showing proportions.
What is a line chart used to depict?
What additional dimension does a bubble chart add compared to a scatterplot?
A bubble chart replaces data points with bubbles that vary in size according to the size of the values they depict, thus adding the relative sizes of the values plotted as an additional dimension to the chart.
What does a histogram show?
The frequencies of a variable using a series of vertical bars.
What is the difference between a histogram and a bar graph?
A bar graph relates two variables to one another, whereas a histogram depicts only one variable, showing its frequency distribution.
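The frequencies behind a histogram's bars can be sketched by grouping a single variable into equal-width bins and counting the values in each. The function `histogram_counts`, its parameters, and the `ages` data are hypothetical; real plotting tools choose bins in their own ways.

```python
def histogram_counts(values, bin_width, start):
    """Count how many values fall in each equal-width bin.

    Each bin is keyed by its lower edge and covers
    [lower, lower + bin_width). Empty bins are not listed.
    """
    counts = {}
    for v in values:
        # Lower edge of the bin that contains v.
        lower = start + ((v - start) // bin_width) * bin_width
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))

# Hypothetical single variable: ages of ten people.
ages = [21, 22, 25, 31, 33, 34, 38, 42, 47, 55]
counts = histogram_counts(ages, bin_width=10, start=20)

# Print a simple text histogram, one row per bin.
for lower, count in counts.items():
    print(f"{lower}-{lower + 9}: {'#' * count}")
```

Only one variable (age) is involved; the bar heights are counts derived from it, which is what distinguishes this from a bar graph of a statistic across categories.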
What does a box plot display?
A summary of the distribution of a variable, including the minimum, maximum, median, quartiles, and outliers. Some tools also mark the mean.
What is an outlier in a dataset?
Outliers are values that are far away from most of the other values in the dataset.
How is the interquartile range (IQR) of a dataset calculated?
Third quartile (Q3) minus the first quartile (Q1).
IQR = Q3 − Q1
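A minimal sketch of the calculation, using `statistics.quantiles` from Python's standard library on hypothetical data. Quartile conventions differ between tools, so other software may report slightly different values of Q1 and Q3 for the same data.

```python
import statistics

# Hypothetical data, chosen only for illustration.
values = [1, 2, 3, 4, 5, 6, 7]

# n=4 splits the data into quartiles and returns the three cut points.
q1, q2, q3 = statistics.quantiles(values, n=4)

iqr = q3 - q1  # IQR = Q3 - Q1
print(q1, q3, iqr)
```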
How is an outlier determined in a dataset?
An outlier is any value that is either less than Q1 − 1.5 × IQR or greater than Q3 + 1.5 × IQR.
Q1 and Q3 are the first and third quartiles, respectively, and IQR is the interquartile range calculated as Q3 − Q1.
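The sketch below flags outliers using fences built from Q1, Q3, and the IQR. It assumes the widely used multiplier of 1.5, exposed here as the hypothetical parameter `k`; the function name `find_outliers` and the data are also illustrative.

```python
import statistics

def find_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR].

    k=1.5 is the conventional multiplier and is an assumption here.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]

# Hypothetical data with one value far from the rest.
values = [10, 12, 13, 14, 15, 16, 18, 50]
outliers = find_outliers(values)
print(outliers)
```

Because the fences are based on quartiles rather than the mean, the extreme value itself has little influence on where the cutoffs fall.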