What are the issues with grid search?
Why do we maximize the LOG likelihood instead of just the likelihood?
The likelihood is the product of many numbers between 0 and 1. For large datasets this product eventually gets rounded down to zero (numerical underflow). Taking the log turns the product into a sum, which stays in a representable range.
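A minimal sketch of the underflow problem, using a hypothetical dataset where every observation has probability 0.5:

```python
import math

# Hypothetical dataset: 2000 observations, each with probability 0.5
probs = [0.5] * 2000

# Direct product underflows to exactly 0.0 in double precision
likelihood = 1.0
for p in probs:
    likelihood *= p
print(likelihood)      # 0.0 — numerical underflow

# Summing logs keeps the same information in a representable range
log_likelihood = sum(math.log(p) for p in probs)
print(log_likelihood)  # ≈ -1386.29 (i.e., 2000 * ln 0.5)
```

Since the log is a monotonic transformation, the parameter values that maximize the log likelihood also maximize the likelihood itself.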
4 steps for maximum likelihood
Step 1: Formulate a model that predicts probabilities of all possible outcomes as a function of parameters
Step 2: Calculate the probability of each observation given parameters
Step 3: The product of the probability of all observations is the Likelihood
Step 4: Search/Solve for parameters that maximize Likelihood
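The four steps above can be sketched for a toy Bernoulli model (the data and grid resolution here are hypothetical choices for illustration):

```python
import math

# Hypothetical data: 10 binary outcomes (e.g., correct/incorrect responses)
data = [1, 1, 0, 1, 1, 1, 0, 1, 0, 1]  # 7 successes out of 10

# Step 1: model — each outcome is 1 with probability p (Bernoulli)
def log_likelihood(p, data):
    # Steps 2-3: probability of each observation, combined on the log scale
    # (the sum of logs equals the log of the product of probabilities)
    return sum(math.log(p if x == 1 else 1 - p) for x in data)

# Step 4: search for the p that maximizes the log likelihood (crude grid search)
grid = [i / 1000 for i in range(1, 1000)]
best_p = max(grid, key=lambda p: log_likelihood(p, data))
print(best_p)  # 0.7 — matches the analytic MLE, successes / n
```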
What is the difference in how we fit linear and non-linear models to data?
Name four types of models used in behavioral sciences
What is a poisson distribution?
When (and when not) would you use a poisson distribution and why?
Process-based/computational models of behavior
General framework for fitting/analyzing nearly all models
What do we use instead of OLS (in this course) and why?
Likelihood. Reason: OLS does not work for all types of data (e.g., non-normally distributed or binary outcomes)
What is likelihood?
Likelihood is the joint probability/ probability density of the data given a set of parameter values
* In other words “the probability of the data given the parameter values”
* When errors of data points are independent, the joint probability of the data is the product of the probabilities/probability densities of all observations
PMF AND PDF
PMF: probability mass function – for discrete probability distributions (gives the probability of observations as a function of parameters)
PDF: probability density function – for continuous probability distributions (gives the probability density of observations as a function of parameters)
What is the problem we are trying to solve by optimization methods?
Non-linear models.
General problem: we want to find the parameters that maximize the log likelihood, but we don't know what the likelihood surface looks like; we can only evaluate the likelihood one parameter combination at a time
2 types of optimization methods
Gradient-based methods (e.g., Newton's method, gradient descent); gradient-free methods (e.g., Nelder–Mead simplex)
Nelder–Mead simplex
How does gradient descent find the parameters that maximize the likelihood?
How do we avoid local minima?
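A sketch of both ideas together: gradient descent repeatedly steps downhill on the negative log likelihood, and restarting from several random points is one common way to avoid local minima. The objective here is a toy stand-in with two minima, not a real model:

```python
import random

# Toy objective standing in for a negative log likelihood with two local minima
def f(x):
    return (x**2 - 1)**2 + 0.3 * x

def grad(x):
    # Analytic derivative of f
    return 4 * x * (x**2 - 1) + 0.3

def gradient_descent(x, lr=0.01, steps=2000):
    # Step downhill: move opposite the gradient, scaled by the learning rate
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# A single start can settle in the shallower minimum near x ≈ +0.96,
# so restart from several random points and keep the best result
random.seed(0)
starts = [random.uniform(-2, 2) for _ in range(10)]
candidates = [gradient_descent(x0) for x0 in starts]
best = min(candidates, key=f)
print(round(best, 3))  # ≈ -1.036, the deeper (global) minimum
```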
Pros and cons of grid search, gradient-based and gradient free optimization methods
Grid search
Pros:
*Easy
*Unlikely to miss the global minimum if the grid is set appropriately
Cons:
*Very slow for high dimensional problems
*Only as precise as the grid
Gradient-based (e.g. Newton’s Method)
Pros:
*Fast – converges much faster to minima
Cons:
*Easily caught in local minima if they exist
Gradient-free (e.g., Nelder–Mead)
Pros:
*Works for models of intermediate complexity
*Faster than grid search
Cons:
*Slower to converge than gradient-based methods
*Can still get caught in local minima
What question are we trying to answer with parameter recoverability and what steps does it involve?
If this cognitive process/behavior works like I think it works, will my experiment provide sufficient information to recover parameters with the desired precision and without bias?
Steps of Parameter recoverability
1. Use your model and known parameter values to generate a synthetic data set
2. Simulate the experiment you plan to conduct (# of replicates, etc.)
3. Fit your model to the simulated data set
4. Compare the true and fitted parameter values
5. Repeat many times and evaluate the distribution of fitted parameter estimates compared to the true value that generated the dataset (to estimate precision and bias)
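The steps above can be sketched with a simple Bernoulli model, where the MLE has a closed form (the sample mean); the true parameter, trial count, and number of simulations are hypothetical choices:

```python
import random
import statistics

true_p = 0.7   # known "true" parameter used to generate synthetic data
n_trials = 50  # planned number of trials per simulated experiment
n_sims = 1000  # number of repeated simulations

random.seed(1)
estimates = []
for _ in range(n_sims):
    # Steps 1-2: generate a synthetic data set from the model
    data = [1 if random.random() < true_p else 0 for _ in range(n_trials)]
    # Step 3: fit the model — for a Bernoulli model the MLE is the sample mean
    estimates.append(sum(data) / n_trials)

# Steps 4-5: compare the distribution of estimates to the true value
bias = statistics.mean(estimates) - true_p
precision = statistics.stdev(estimates)
print(f"bias ≈ {bias:.3f}, SD of estimates ≈ {precision:.3f}")
```

Low bias with a narrow spread of estimates suggests the planned design can recover the parameter; a wide spread suggests more trials are needed.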
Why must we use probability density instead of probability for continuous distributions?
3 continuous distributions other than gaussian (normal)
Probability density function
What type of distribution is usually used to model RTs?
Standard error and CIs (in normal distribution)
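A minimal sketch of the standard error of the mean and a normal-approximation 95% CI, using a small hypothetical sample:

```python
import math
import statistics

# Hypothetical sample of measurements
sample = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0, 5.1, 4.9]

mean = statistics.mean(sample)
# Standard error of the mean: sample SD divided by sqrt(n)
se = statistics.stdev(sample) / math.sqrt(len(sample))
# Approximate 95% CI using the normal quantile 1.96
# (a t quantile is more exact for small samples)
ci = (mean - 1.96 * se, mean + 1.96 * se)
print(f"mean = {mean:.2f}, SE = {se:.3f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```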