What is a general definition of a graph?
Visual representation of a relationship between two or three variables (sometimes more), where the variables can be of any type (categorical or numerical).
What is the example that measures the influence of stress on grasshoppers: stress measured by serotonin levels?
What is jittering in graphs?
Jittering is a data visualization technique where small random variations are added to data points in a scatterplot or other chart to prevent overlapping, making patterns, clusters, and trends easier to identify.
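The jittering idea above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `jitter`, the `width` parameter, the uniform noise, and the data are all assumptions made for the example.

```python
import random

def jitter(values, width=0.2, seed=None):
    """Return a copy of values with uniform noise in [-width, +width] added."""
    rng = random.Random(seed)  # seeded generator so the example is repeatable
    return [v + rng.uniform(-width, width) for v in values]

# Hypothetical data: five observations that would overlap at x = 1.
x = [1, 1, 1, 1, 1]
x_jittered = jitter(x, width=0.2, seed=42)
print(x_jittered)  # five slightly different values, all within 0.2 of 1
```

The noise is kept small relative to the spacing between groups, so points spread out without appearing to belong to a different value.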
What are 7 reasons to use graphs?
1) Powerful way of summarizing data that is easy to read (i.e., quick and direct).
2) Highlight the most important information (i.e., facilitate communication).
3) Facilitate (summarize) data understanding.
4) Help convince others.
5) Easy to remember (general trends).
6) Aid in detecting unusual features in data.
7) Tell stories.
What does the space between bars on a bar graph represent?
A non-numerical (categorical) variable → the gaps automatically tell us the type of variable!
In the example that looks at the activities of people at the time they were killed by tigers, what were the 2 variable types?
1) Activity - categorical = does not have magnitude on a numerical scale
2) Frequency - numerical (discrete) = has magnitude on a numerical scale
Why should you avoid pie charts?
Avoid pie charts because, with too many categories, small differences between slices become impossible to see.
What are the 2 types of categorical variables in bar graphs?
1) Response variable: Ex = malaria susceptibility → This variable is responding to the explanatory variable
2) Explanatory variable: Ex = Egg removal → Variable that is controlled
What is the best way to display the data in the following example: Is reproduction risky to health?
Mosaic graph: the relative frequencies of female birds with and without malaria are stacked on top of each other for the control group, and likewise for the egg removal treatment.
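The stacked segments of a mosaic graph are relative frequencies computed within each treatment group. A minimal sketch of that calculation follows; the helper `relative_frequencies` is a hypothetical name, and the counts are invented for illustration only (they are not the study's data).

```python
def relative_frequencies(counts):
    """Convert a dict of counts into a dict of proportions that sum to 1."""
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}

# Hypothetical counts, invented for illustration (NOT the real study data).
table = {
    "control": {"malaria": 5, "no malaria": 20},
    "egg removal": {"malaria": 15, "no malaria": 15},
}

# Each treatment becomes one stacked bar whose segments sum to 1.
for treatment, counts in table.items():
    print(treatment, relative_frequencies(counts))
```

Because each bar is rescaled to the same total, groups of different sizes can be compared directly.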
Why should you start at 0 on a bar graph?
To ensure that the length/pattern on the graph properly represents the data
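A small calculation shows why the baseline matters: the drawn length of a bar is its value minus the axis baseline, so a nonzero baseline changes the apparent ratio between bars. The function name and the values below are hypothetical, chosen only to illustrate the point.

```python
def drawn_length_ratio(a, b, baseline=0.0):
    """Ratio of the drawn bar lengths for values a and b when the axis starts at baseline."""
    return (a - baseline) / (b - baseline)

# Hypothetical values: the true ratio of 60 to 50 is 1.2.
print(drawn_length_ratio(60, 50))               # 1.2 (axis starts at 0)
print(drawn_length_ratio(60, 50, baseline=45))  # 3.0 (axis starts at 45)
```

With the axis starting at 45, the first bar looks three times as long as the second even though the values differ by only 20%.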
What is an experimental study?
Researcher randomly assigns observational units (birds) to different groups (often called treatments), i.e., they control the treatments
Which variables are explanatory and which are response variables in an experiment (e.g., the malaria experiment)?
When conducting an experiment (e.g., the malaria study), the treatment variable (the one manipulated by the researcher) is the explanatory variable, and the measured effect of the treatment is the response variable.
In the following experiment, where a dose of a toxin was given and mortality rates were measured, which variable would be explanatory and which would be the response variable?
The administered dose of a toxin in a toxicology experiment would be the explanatory variable, and organism mortality would be the response variable.
What may the “assumed” explanatory power depend on?
“Assumed” explanatory power may depend on the type of study: [1] experimental versus [2] observational studies
What is an observational study?
Observational study: Researchers have no control over which observational units fall into which treatment or values of the explanatory variable. Examples:
What is an example of an observational study that must remain observational due to ethical issues?
Ex: Studies on the health consequences of cigarette smoking in humans. Assigning smoking and no-smoking treatments to observational units (i.e., people) is unethical, so it cannot be done as an experiment.
What is an example of an observational study that must remain observational due to location issues?
Ex: Growth of fish in warm versus cold lakes (the observational units, i.e., fish, are already in lakes; the researcher has no control over which fish go in which lake).
What occurs when neither variable is manipulated by the researcher (provide tv example)?
On which axes of a graph do independent versus dependent variables (= explanatory versus response variables) go?
The explanatory variable goes on the X-axis and the expected response variable goes on the Y-axis.
What is a scatter plot?
Graphical display of 2 numerical variables in which each observation is represented as a point on a graph with 2 (or 3) axes.
What are line graphs?
Use dots connected by line segments to display trends in a measurement over time or other ordered states (e.g., size).
What are 2 mistakes of data visualization?
1) Mistake 1: Hide data
2) Mistake 2: Making Patterns Hard to See
What 2 things makes patterns hard to see (mistake 2 of data visualization)?
1) Nonsensical Order Hides Patterns: Sensible ordering of factors makes patterns more evident (e.g., arrange by mean for nominal variables).
2) Bad Axis-Limits Hide Patterns: Y-axis limits much wider than the range of the data hide the pattern.
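Ordering nominal categories by their mean, as suggested in point 1, can be sketched as follows. The function `order_by_mean`, the group names, and the measurements are all hypothetical, used only to show the idea.

```python
from statistics import mean

def order_by_mean(groups):
    """Return group names sorted from the largest mean to the smallest."""
    return sorted(groups, key=lambda name: mean(groups[name]), reverse=True)

# Hypothetical measurements for three nominal categories.
groups = {"A": [2, 3, 4], "B": [8, 9, 10], "C": [5, 6, 7]}
print(order_by_mean(groups))  # ['B', 'C', 'A']
```

Plotting the categories in this order, rather than alphabetically, makes the trend across groups visible at a glance.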