Define mean.
The average of a set of numbers, calculated by dividing the sum by the count.
What function calculates the mean in R?
The function is mean(). It computes the average of numeric values.
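A minimal sketch, using made-up values, showing that mean() agrees with the definition (sum divided by count). The na.rm argument is part of base R's mean().

```r
# Illustrative values, made up for this example
x <- c(2, 4, 4, 5, 10)

# Definition: the sum of the values divided by how many values there are
manual_mean <- sum(x) / length(x)

# Built-in function; na.rm = TRUE drops missing values before averaging
builtin_mean <- mean(c(x, NA), na.rm = TRUE)

print(manual_mean)   # [1] 5
print(builtin_mean)  # [1] 5
```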
True or false: The median is always the same as the mean.
FALSE
The median is the middle value of the sorted data. It can differ from the mean, especially when the data are skewed or contain outliers.
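A small illustration with made-up numbers: one extreme value pulls the mean far from the median, while for symmetric values the two coincide.

```r
# Made-up values; the single large value (100) pulls the mean but not the median
skewed <- c(1, 2, 3, 4, 100)
skewed_mean   <- mean(skewed)    # 110 / 5 = 22
skewed_median <- median(skewed)  # middle of the sorted values = 3

# For symmetric values the two coincide
symmetric <- c(1, 2, 3, 4, 5)
print(c(mean = mean(symmetric), median = median(symmetric)))  # both 3
print(c(mean = skewed_mean, median = skewed_median))
```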
Fill in the blank: The standard deviation measures _______ in a dataset.
Variability or dispersion
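A sketch, assuming made-up values, of how R's sd() relates to the formula. Note that sd() divides by n - 1 (the sample standard deviation); dividing by n gives the population version.

```r
# Made-up values with mean 5
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# sd() is the sample standard deviation: it divides by n - 1
sample_sd <- sd(x)
manual_sd <- sqrt(sum((x - mean(x))^2) / (length(x) - 1))

# Dividing by n instead gives the population standard deviation
population_sd <- sqrt(sum((x - mean(x))^2) / length(x))  # sqrt(32 / 8) = 2

print(c(sample = sample_sd, population = population_sd))
```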
What does the summary() function do in R?
It returns a summary whose contents depend on the object's class; for a numeric vector, it reports the minimum, first quartile, median, mean, third quartile, and maximum.
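A short sketch of summary() on a made-up numeric vector and on a factor (the Species column of R's built-in iris data), showing that the output depends on the object's class.

```r
# Numeric vector (made-up values): minimum, quartiles, median, mean, maximum
x <- c(3, 7, 8, 12, 15)
s <- summary(x)
print(s)
# Min. 3, 1st Qu. 7, Median 8, Mean 9, 3rd Qu. 12, Max. 15

# Factor (built-in iris data): summary() returns counts per level instead
print(summary(iris$Species))
```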
Define outlier.
A data point that significantly differs from other observations in a dataset.
What is the purpose of the boxplot() function?
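To visualize the distribution of data (median, quartiles, and spread) and to flag potential outliers, which by default are points lying more than 1.5 times the box length (roughly the interquartile range) beyond the box.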
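A minimal sketch with made-up values. boxplot.stats() (from base R's grDevices) computes the statistics boxplot() draws, so its $out element lists the points that would be plotted as outliers.

```r
# Made-up values with one point far from the rest
x <- c(10, 12, 11, 13, 12, 11, 40)

# boxplot.stats() applies the rule boxplot() uses for its whiskers:
# with the default coef = 1.5, points more than 1.5 box lengths beyond
# the box are returned in $out as potential outliers
stats <- boxplot.stats(x)
print(stats$out)  # [1] 40

# Draw the plot (in a non-interactive script this goes to the default device file)
boxplot(x, main = "Example boxplot")
```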
True or false: A p-value less than 0.05 indicates statistical significance.
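TRUE, under the conventional significance level of 0.05.
It means results at least this extreme would be unlikely if the null hypothesis were true, which counts as evidence against the null hypothesis; the 0.05 cutoff is a convention, and smaller p-values indicate stronger evidence.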
Fill in the blank: A normal distribution is also known as a _______ distribution.
Gaussian
What does the cor() function compute?
It calculates the correlation coefficient (Pearson by default) between two variables, or a correlation matrix for a matrix or data frame of numeric columns.
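A sketch using R's built-in mtcars data: Pearson and Spearman correlation between car weight and fuel economy, then a correlation matrix for three numeric columns.

```r
# Built-in mtcars data: car weight (wt) versus fuel economy (mpg)
r   <- cor(mtcars$wt, mtcars$mpg)                        # Pearson (default)
rho <- cor(mtcars$wt, mtcars$mpg, method = "spearman")   # rank-based
print(round(c(pearson = r, spearman = rho), 3))

# Passing a data frame of numeric columns returns a correlation matrix
cor_mat <- cor(mtcars[, c("mpg", "wt", "hp")])
print(round(cor_mat, 2))
```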
Define regression analysis.
A statistical method for modeling the relationship between a dependent variable and one or more independent variables.
What is the purpose of the lm() function in R?
To fit linear models for regression analysis.
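A minimal regression sketch on the built-in mtcars data, with mpg as the dependent variable and wt as the single independent variable, written with R's formula syntax.

```r
# Built-in mtcars: model fuel economy (mpg, dependent) as a linear function
# of weight (wt, independent); the formula reads "mpg modelled by wt"
fit <- lm(mpg ~ wt, data = mtcars)

print(coef(fit))     # intercept and slope (negative: heavier cars, lower mpg)
print(summary(fit))  # standard errors, t-values, p-values, R-squared

# Predicted mpg for a car weighing 3,000 lbs (wt is in 1000 lbs)
pred <- predict(fit, newdata = data.frame(wt = 3))
print(pred)
```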
True or false: Histograms display frequency distributions of continuous data.
TRUE
They group values into bins (ranges) and show how many observations fall into each bin.
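A sketch with made-up continuous values and bins of width 1. With plot = FALSE, hist() returns the bin counts without drawing; by default bins are closed on the right, so 5.0 falls in the (4, 5] bin.

```r
# Made-up continuous values
x <- c(1.2, 2.5, 2.7, 3.1, 3.3, 3.8, 4.4, 5.0, 5.6, 7.9)

# breaks = 0:8 gives bins of width 1; plot = FALSE returns counts only
h <- hist(x, breaks = 0:8, plot = FALSE)
print(h$counts)  # [1] 0 1 2 3 2 1 0 1

# Same call, drawn as a histogram
hist(x, breaks = 0:8, main = "Example histogram", xlab = "Value")
```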
Fill in the blank: The t-test compares means between _______ groups.
Two (a one-sample t-test instead compares a single mean with a fixed value)
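A sketch of a two-sample t-test on the built-in mtcars data, comparing mean mpg between automatic (am = 0) and manual (am = 1) cars. R's t.test() uses the Welch version (unequal variances) by default; the p-value is read against the conventional 0.05 level.

```r
# Built-in mtcars: compare mean mpg between automatic (am = 0) and manual (am = 1) cars
res <- t.test(mpg ~ am, data = mtcars)   # Welch two-sample t-test by default
print(res$estimate)   # the two group means
print(res$p.value)

# At the conventional 0.05 level, a smaller p-value is read as a significant difference
significant <- res$p.value < 0.05
```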
What does the ggplot2 package do?
It provides a system for creating complex graphics based on the Grammar of Graphics.
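A minimal sketch, assuming the ggplot2 package is installed, of the Grammar of Graphics idea: a plot is built from data, aesthetic mappings, and geometric layers joined with +. It uses the built-in mtcars data.

```r
library(ggplot2)

# A plot is assembled from components: data, aesthetic mappings,
# and geometric layers, combined with +
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +                                  # one point per car
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Cylinders")
print(p)
```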
Define confidence interval.
A range of values computed from sample data by a procedure that, across repeated samples, captures the true population parameter at a stated rate (the confidence level, e.g. 95%).
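A sketch, assuming made-up sample values and the usual t-based interval for a mean, mean(x) ± t(0.975, n - 1) · s / √n. The manual calculation should match the interval that t.test() reports in $conf.int.

```r
# Made-up sample values
x <- c(4.8, 5.1, 5.4, 4.9, 5.6, 5.0, 5.3)
n <- length(x)

# 95% t-interval for the mean: mean(x) +/- t(0.975, n - 1) * s / sqrt(n)
se        <- sd(x) / sqrt(n)
t_crit    <- qt(0.975, df = n - 1)
ci_manual <- mean(x) + c(-1, 1) * t_crit * se

# t.test() reports the same interval in $conf.int
ci_builtin <- t.test(x, conf.level = 0.95)$conf.int
print(ci_manual)
print(ci_builtin)
```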
What is the purpose of the shapiro.test() function?
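To test whether a numeric sample is consistent with a normal distribution, using the Shapiro-Wilk test; the null hypothesis is that the data are normally distributed.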
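A sketch on simulated data (set.seed() makes it reproducible): one sample from a normal distribution and one from a right-skewed exponential distribution. shapiro.test() accepts between 3 and 5000 non-missing values.

```r
# Null hypothesis: the sample was drawn from a normal distribution.
set.seed(1)                  # simulated data, for reproducibility
normal_sample <- rnorm(100)  # drawn from a normal distribution
skewed_sample <- rexp(100)   # drawn from a right-skewed exponential distribution

res_normal <- shapiro.test(normal_sample)
res_skewed <- shapiro.test(skewed_sample)
print(c(normal = res_normal$p.value, skewed = res_skewed$p.value))

# p < 0.05: reject normality; a larger p-value only means
# normality was not rejected, not that it was proven
rejects_skewed <- res_skewed$p.value < 0.05
```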
True or false: ANOVA is used to compare means across multiple groups.
TRUE
It stands for Analysis of Variance.
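A one-way ANOVA sketch on R's built-in PlantGrowth data (a control group and two treatments). A single F-test checks whether all group means are equal.

```r
# Built-in PlantGrowth data: dried plant weight for a control group and two treatments
fit <- aov(weight ~ group, data = PlantGrowth)
tab <- summary(fit)[[1]]   # ANOVA table: Df, Sum Sq, Mean Sq, F value, Pr(>F)
print(tab)

# One p-value tests whether all group means are equal
p_value <- tab[["Pr(>F)"]][1]
```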
Fill in the blank: Data frames in R are similar to _______ in Excel.
Tables
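A small sketch with hypothetical data: like a spreadsheet table, a data frame holds one variable per column and one observation per row.

```r
# Hypothetical example data: each column is a variable, each row an observation
df <- data.frame(
  name  = c("Ann", "Ben", "Cara"),
  age   = c(28, 35, 42),
  score = c(88.5, 92.0, 79.5)
)
str(df)                 # column names and types
print(df[2, "age"])     # row 2, column "age": 35
print(df$score)         # one column as a vector
print(dim(df))          # rows and columns: 3 3
```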
What is the dplyr package used for?
For data manipulation and transformation in R, through functions such as filter(), select(), mutate(), summarise(), and arrange().
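A sketch, assuming dplyr is installed, that chains the main verbs with the %>% pipe on the built-in mtcars data. The km-per-litre column is an illustrative derived variable.

```r
library(dplyr)

# Built-in mtcars: keep 4- and 6-cylinder cars, add a derived column,
# then summarise per cylinder group and sort
result <- mtcars %>%
  filter(cyl %in% c(4, 6)) %>%               # keep rows
  select(cyl, mpg) %>%                        # keep columns
  mutate(kpl = mpg * 0.425144) %>%            # US miles/gallon to km/litre
  group_by(cyl) %>%
  summarise(n = n(), mean_kpl = mean(kpl)) %>%
  arrange(desc(mean_kpl))                     # highest efficiency first
print(result)
```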
Define variable.
A characteristic or attribute that can take on different values.
What does the tidyverse include?
A collection of R packages designed for data science, including ggplot2 and dplyr.
True or false: Factor variables are used for categorical data in R.
TRUE
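A factor stores a fixed set of levels, so modeling functions such as lm() treat it as categories (creating dummy variables), and table() or plot() summarize it by level.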
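A sketch with made-up categories: a factor keeps an explicit level order and integer codes, and lm() expands a factor (here the built-in mtcars cyl column) into dummy variables.

```r
# Made-up categorical data with an explicit, meaningful level order
size <- factor(c("small", "large", "medium", "small", "large"),
               levels = c("small", "medium", "large"))
print(levels(size))      # "small" "medium" "large"
print(table(size))       # counts per level: 2, 1, 2
print(as.integer(size))  # stored integer codes: 1 3 2 1 3

# In a model, a factor is expanded into indicator (dummy) variables
fit <- lm(mpg ~ factor(cyl), data = mtcars)
print(names(coef(fit)))  # "(Intercept)" "factor(cyl)6" "factor(cyl)8"
```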
Fill in the blank: The plot() function creates a _______ of data points.
Scatter plot (when given two numeric vectors)
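A short sketch on the built-in mtcars data: given two numeric vectors, plot() draws a scatter plot. The abline() overlay of a fitted line is optional.

```r
# Built-in mtcars: with two numeric vectors, plot() draws a scatter plot
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "Weight vs. fuel economy")

# Optional: overlay the fitted regression line from lm()
abline(lm(mpg ~ wt, data = mtcars), col = "red")
```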