Value Based Decision Making II Flashcards

(48 cards)

1
Q

Describe the stages of value-based decision making = representation and valuation

A

Value is encoded as a common currency (options aligned on the same axis) in vmPFC/OFC
Representation and valuation = represent the state of the world and value the options; the question is which area is responsible for this

2
Q

what about changes in value across different directions = 2 things measured

A

Desirability = current worth of the available options; this is what the common currency measures
Availability = chance that the chosen item is obtained or the goal of the action is realized
- Distinguish between the 2: they are processed by different areas

3
Q

Patient with vmPFC lesion has suboptimal performance in the Iowa gambling task = describe overall

A

Broad, extensive PFC lesion, but with specific deficits: chooses the high-risk decks the whole time
Not able to compute value, so cannot switch
Perseverates in choosing the risky decks even though they have negative average returns in the long run
Similar to the Wisconsin card sorting task (inability to change behaviour with new info), but the domain differs: estimating probabilistic rewards rather than rules
Can we be more precise = decompose which type of value is encoded

4
Q

prefrontal areas

A

Ventrolateral prefrontal cortex (vlPFC)
Orbitofrontal cortex (OFC)
Regions in vmPFC
Compare availability and desirability
Lesions in patients are rarely localized
Experimental lesions in macaques can test more finely the contribution of different PFC areas

5
Q

definition vmpfc

A

Sometimes encompasses parts of OFC (lateral parts, areas 11/13)
Large, encompasses many areas

6
Q

describe 3 arm bandit task

A

3 objects, each associated with a probability of obtaining reward
Reward probabilities change slowly within blocks (not completely stable)
Abrupt changes in the probabilities across blocks
Monkeys need to evaluate the options
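
The structure of the task can be sketched as a reward schedule. This is only an illustration: the number of blocks, the block length, the drift size and the function names are assumptions, not values from the experiment.

```python
import random


def make_schedule(n_blocks=3, trials_per_block=100, drift_sd=0.02, seed=0):
    """Build per-trial reward probabilities for 3 objects.

    All parameter values are hypothetical and chosen for illustration only.
    """
    rng = random.Random(seed)
    schedule = []
    for _ in range(n_blocks):
        # Abrupt change: new probabilities are drawn at the start of each block.
        probs = [rng.random() for _ in range(3)]
        for _ in range(trials_per_block):
            schedule.append(tuple(probs))
            # Slow change within the block: small random drift, clipped to [0, 1].
            probs = [min(1.0, max(0.0, p + rng.gauss(0.0, drift_sd))) for p in probs]
    return schedule


def draw_reward(schedule, trial, option, rng):
    """Return 1 if the chosen object is rewarded on this trial, else 0."""
    return 1 if rng.random() < schedule[trial][option] else 0


schedule = make_schedule()
```

Because the reward is probabilistic, a single outcome says little about which object is best, which is why the options have to be evaluated over many trials.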

7
Q

what do you have to do to do 3 arm bandit task

A

Have to integrate info across history = use reward history to build value
Have to re-explore and figure out which one is best
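
One common way to express "integrate reward history" and "re-explore" is a running value estimate combined with occasional random choices. The sketch below is an assumption for illustration (the update rule, learning rate and exploration rate are not taken from the study).

```python
import random


def update_value(value, reward, learning_rate=0.2):
    """Move the value estimate a fraction of the way towards the latest outcome.

    Recent outcomes count more than old ones; learning_rate is hypothetical.
    """
    return value + learning_rate * (reward - value)


def choose(values, epsilon, rng):
    """Pick the highest-valued option, but explore at random with probability epsilon."""
    if rng.random() < epsilon:
        return rng.randrange(len(values))
    return max(range(len(values)), key=lambda i: values[i])


# Example: one rewarded choice of option 1 raises its value estimate.
values = [0.5, 0.5, 0.5]
values[1] = update_value(values[1], reward=1)
```

With epsilon set to 0 the agent never re-explores, so it could miss a change in which option is best.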

8
Q

why is location of objects changed on 3 arm bandit task

A

Controlling for things we aren’t testing for
Make sure value is not assigned to location = control for location bias
Identity of the object is most important

9
Q

describe what happens in bandit task when ofc lesion

A

Bilateral excitotoxic OFC lesion does not affect behaviour
Measure probability of choosing the high-reward option when the probabilities switch
Control subjects track the changing reward probabilities = choose the option with the same probability as its reward
No changes in behaviour with OFC lesion

10
Q

describe what happens in bandit task when vlpfc lesion

A

Bilateral excitotoxic lesion
= changes behaviour, big deficit in task
Subjects with vlPFC lesions have a deficit in tracking the dynamics of stimulus-outcome contingencies
Able to track initially (lagging a bit though) but then cannot update choices to pick the new high-reward option
Choices are less affected by recent outcomes, leading to a deficit in tracking reward contingencies
= cannot integrate recent outcomes to build the expected value of the world

11
Q

specificity of value properties affected by lesions across areas = conclusions from OFC and vlPFC lesions in the 3 arm bandit task exp

A

OFC lesion does not affect behaviour in the bandit task
vlPFC lesion impairs the ability to track dynamic stimulus-outcome contingencies = availability of reward (ability to evaluate whether the object is reachable)
Possible signature of a required contribution of vlPFC to computing and acting upon the availability component of value = assessing the probability that the reward is present
Next = try to find a task that is affected by an OFC lesion, to show a double dissociation

12
Q

describe devaluation task

A

Measure changes in desirability of a reward
60 training trials per session, 30 per food reward = training
In test phase = first have access to one reward type and consume until satiety (e.g. eat berries until no longer wanted)
Then tested = 30 trials on preference between the 2 reward types
Devalued berries = value decreased, no longer wanted, preference changed

13
Q

what happens in devaluation task when lesion ofc

A

Measure the proportion of choices that shift away from the sated option, towards the high-value (non-sated) option, compared to a baseline condition without satiation
Inability to adapt following OFC lesion = monkeys with OFC lesion made fewer adaptive choices: unable to re-evaluate the value of options after they have changed, cannot update value based on internal state, so they do not register the devaluation and cannot switch
Possible signature of a required contribution of OFC to computing and acting upon the desirability component of value

14
Q

double dissociation function = gen

A

2 tasks probe different areas and different aspects of value = these functions are specific to those areas, because each deficit is specific to 1 task
Double dissociation of function between OFC and vlPFC = selective and independent contributions to different types of value learning

15
Q

double dissociation function = ofc

A

Signature of a required contribution of OFC to computing and acting upon the desirability component of value = what is the subjective value of one unit of reward

16
Q

double dissociation function = vlpfc

A

Signature of a required contribution of vlPFC to computing and acting upon the availability component of value = what is the probability of obtaining reward

17
Q

subdivisions of OFC

A

OFC is divided into medial OFC and lateral OFC based on anatomical and connectivity info
= use tasks and targeted lesions to start probing at a finer level

18
Q

3 arm bandit task adapted version

A

Reward probabilities of the best and second-best options are very close and relatively stable during the task
- 2 options very close to each other = have to compare relative values (small differences in value)

19
Q

what happens when lesion lateral ofc - adapted 3 arm bandit task

A

No behavioural deficits in task following lateral OFC lesion = chose the best option with the same probability

20
Q

what happens when lesion medial ofc - adapted 3 arm bandit task

A

Behavioural deficits in task following medial OFC lesion = lose the ability to make distinctions between options
Critical for making fine judgements between options = fine discrimination
mOFC deficits are rescued in sessions with larger differences across options = suggests that mOFC is critical for comparing available options in order to guide choices

21
Q

describe devaluation task =lesion lateral and medial ofc

A

Look at how devaluation changes over time
More precise lesion localization to understand the contributions of the subdivisions of OFC
Control animals have robust changes in behaviour after devaluation = time course of behaviour changes, keep picking the one they haven’t had much of

22
Q

describe devaluation task =lesion lateral ofc results

A

Long-term deficits in behaviour = do not adapt to changes in value due to satiation
Picks both options equally

23
Q

describe devaluation task =lesion medial ofc results

A

More transient
Initially strong deficits in behaviour, but these recover back to baseline (pick the non-devalued option) from the second post-op test
Time course = important, can recover after medial OFC lesion

24
Q

muscimol devaluation task = set up

A

Muscimol inactivations = allow for a temporally and spatially defined lesion: compare the effect of the lesion during the devaluation process (when eating berries) or after the devaluation process (when choosing)
Is it a deficit in creating value or in being able to act on it?
Certain areas/circuits can be needed for learning/updating value but not for action selection
Dissociation between building a value representation and using the representation for action

25
Q

muscimol devaluation task = lesion area 13

A

Deficits in behaviour if area 13 is lesioned during devaluation = deficit in updating value
Strong impairment = behaviour does not shift if the lesion is made while eating berries

26
Q

muscimol devaluation task = lesion area 11

A

Deficit in behaviour if area 11 is lesioned after devaluation = deficit in goal selection
In the test phase, cannot easily use value to switch behaviour

27
Q

define desirability

A

Current worth of the available options = what the common currency measures; domain general

28
Q

define lateral OFC

A

Representations of value important for learning and value updating

29
Q

define medial OFC

A

Needed for choices based on value comparison
Transforms value representations into a common currency, making it possible to compare values

30
Q

define availability

A

Chance that the chosen item is obtained or the goal of the action is realized (vlPFC)

31
Q

define vlPFC

A

Estimation of the availability of the different options
What is the chance that the options can be obtained

32
Q

action selection

A

Choose what we want, not always the best option = might want to explore the world
Saw dissociation between building a valuation and using the valuation for action
Basal ganglia and striatum
Involved in biasing choices one way or another: compute the value of actions and bias choices by changing specific components of value

33
Q

describe the 2 pathways in striatum

A

Direct = activation promotes action by selecting the intended motor program: neurons more active = promotes action = accelerator
Indirect = activation suppresses competing motor programs, suppresses what it is encoding = brake/inhibit/repress

34
Q

value depends on reward history = mice exp set up

A

Similar to bandit tasks
Value of an option depends on the reward history
One port not rewarded
Other port rewarded 75% of the time
Contingencies change during the task, so mice have to evaluate whether the rewarded port has changed or whether it is just an unrewarded trial
Integrate over history = which one is best, rewarded most of the time

35
Q

value depends on reward history = mice exp general results

A

Compute the reward history that leads the mouse to get reward
Mice persevere at a port following rewarded trials
Tend to switch preference after 2 unrewarded trials
Use reward history to assess whether an unrewarded trial is one of the 25% unrewarded trials or due to a change in the block

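
Why roughly 2 unrewarded trials is a sensible switching point can be sketched with Bayes' rule. The 75% reward rate and the never-rewarded port come from the card; the prior probability that the block has changed (`prior_switch`) and the function name are assumptions for illustration.

```python
def posterior_block_changed(n_unrewarded, p_reward=0.75, prior_switch=0.05):
    """Probability that the block changed, given a run of unrewarded trials at one port.

    Simplifying assumption: if the block changed, the port is now never rewarded,
    so every miss has likelihood 1. prior_switch is a hypothetical value.
    """
    likelihood_same_block = (1.0 - p_reward) ** n_unrewarded  # 0.25 per miss
    likelihood_changed = 1.0
    numerator = prior_switch * likelihood_changed
    return numerator / (numerator + (1.0 - prior_switch) * likelihood_same_block)


# Evidence for a block change grows quickly with consecutive unrewarded trials.
posteriors = [posterior_block_changed(n) for n in range(4)]
```

Under these assumptions a single miss is weak evidence (it happens on 25% of trials at the good port), while two misses in a row have only a 0.25 x 0.25 = 6.25% chance if nothing changed.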
36
Q

value depends on reward history = mice model - can infer action value on each trial

A

Value is not given by the stimulus but has to be inferred from trial history
Build a computational model = how much reward in the past influences the current estimate of value: estimate the relative contribution of previous rewarded and unrewarded trials as a function of their recency
Approach allows estimating an action value for each trial = similar to the decision variable in perceptual decision making
Assess relative value = don’t always pick the best one, graded shift in variability
Base action on reward history = which port is most likely rewarded

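
A minimal sketch of such a model, assuming exponentially decaying weights for past rewarded and unrewarded trials. The weights, the decay constant and the function name are hypothetical and are not the fitted values from the experiment.

```python
import math


def relative_action_value(choices, rewards, w_rewarded=1.0, w_unrewarded=-0.5, tau=2.0):
    """Infer relative action value (left minus right) from trial history.

    choices: +1 for left, -1 for right, oldest trial first.
    rewards: 1 for rewarded, 0 for unrewarded, same order.
    Each past trial pushes value towards (rewarded) or away from (unrewarded)
    the chosen side, with less weight the further back it is.
    """
    value = 0.0
    history = zip(reversed(choices), reversed(rewards))
    for lag, (choice, reward) in enumerate(history):
        base_weight = w_rewarded if reward else w_unrewarded
        value += choice * base_weight * math.exp(-lag / tau)
    return value


# Three rewarded left choices give a positive (left-favouring) value.
example_value = relative_action_value([1, 1, 1], [1, 1, 1])
```

The output plays the role of the decision variable: one number per trial that can be placed on the x-axis of a psychometric curve.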
37
Q

why do mice not always pick the most valuable option

A

Bias = might be biased
Exploration = task is probabilistic, need to explore to see where the reward is
Variability = task is variable
Laziness = hard to know if the animal is doing the task or trying; at the extremes, when the stimulus is easy, picking one side over the other is a good sign they are actually doing the task

38
Q

describe microstimulation, can we do this exp in mice in striatum

A

Microstimulation in MT leverages anatomy = add value in one part and see if it changes behaviour; only possible because the neurons are organized like that
Not the same in D1/D2 = neurons are intermingled, cannot stimulate one type specifically

39
Q

optogenetic manipulation of neural activity = gen

A

Insert a light-sensitive ion channel = gene coding for a protein, so it can be inserted in a specified subset of cells
Shine light = activate or inhibit neurons
Allows targeted manipulation of a genetically defined subset of cells in a temporally precise (ms) manner = very precise

40
Q

optogenetic manipulation of neural activity = in striatum

A

With light = can activate one specific cell type even though the types are not anatomically segregated
Can deliver laser pulses to 1 hemisphere only
Activate each of the pathways independently to understand their contribution to action selection = see how it affects behaviour

41
Q

simplified model of action selection

A

Right striatum DMS: D1 direct = increase left action value, D2 indirect = inhibit left action value
Left striatum DMS: D1 direct = increase right action value, D2 indirect = inhibit right action value
Define relative AV = left AV - right AV
Define positive relative action value as left AV higher than right AV = psychometric curves will therefore have p(left) increasing with higher relative action value

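
The sign conventions of this model can be written down compactly. The logistic curve shape, the `slope` and `stim_strength` parameters and all names below are assumptions for illustration; only the direction of each pathway's effect comes from the card.

```python
import math

# Effect of stimulating each pathway on relative AV (left AV - right AV).
PATHWAY_EFFECT = {
    ("right", "D1"): +1,  # increases left action value
    ("right", "D2"): -1,  # inhibits left action value
    ("left", "D1"): -1,   # increases right action value
    ("left", "D2"): +1,   # inhibits right action value
}


def p_left(relative_av, stim=None, stim_strength=1.0, slope=1.0):
    """Probability of choosing left; increases with relative action value.

    stim is an optional (hemisphere, cell_type) pair. The logistic form and
    the parameter values are hypothetical.
    """
    shift = PATHWAY_EFFECT[stim] * stim_strength if stim else 0.0
    return 1.0 / (1.0 + math.exp(-slope * (relative_av + shift)))


# With equal action values, the unstimulated choice probability is 0.5.
baseline = p_left(0.0)
```

In this sketch, D1 stimulation pushes choices towards the side opposite the stimulated hemisphere and D2 stimulation towards the same side.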
42
Q

optogenetic activation during behavioural task

A

When the mouse enters the centre port and the lights are on = optical stimulation during a specific time window on 6% of trials
Optical stimulation is done on a subset of trials to activate a specific subpopulation = D1 or D2 in one hemisphere
Can manipulate each of the 4 pathways independently to test this model of action selection = one pathway specifically
Biases behaviour in a crude sense, not super precise

43
Q

biasing of behaviour with pathway-specific manipulation = stimulate left hemisphere D1

A

Direct pathway = biases towards the contralateral side (more towards the right)

44
Q

biasing of behaviour with pathway-specific manipulation = stimulate left hemisphere D2

A

Indirect pathway = biases towards the ipsilateral side
More towards the left

45
Q

changing bias - measuring effect of perturbation

A

Do the same as in a sensory decision task
Induce an increase in the perceived stimulus = the subjective effect is negative in the other direction
Manipulation provides a negative contribution = the value of the environment now has to be positive to perceive 50/50
Same thing happens with a manipulation that provides negative info

46
Q

biasing choices with increasing stimulation = first effects

A

Bias pathways and look at how behaviour changes
Compute relative action value = left AV - right AV, based on reward history, and plot the probability of choosing the left side
Inhibit left action value = shift to the right: something contributes a negative action value and decreases the relative action value
Stimulating D2 neurons in right DMS = decreases left action value and therefore the subjective relative action value
With stimulation = need a positive relative AV to observe p(left choice) = 0.5
More precise control with optogenetics

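
The "need a positive relative AV to observe p(left) = 0.5" point can be illustrated by finding the indifference point of a shifted psychometric curve. The logistic form, the size of the stimulation shift and the function names are assumptions, not values from the experiment.

```python
import math


def p_left(relative_av, stim_shift=0.0, slope=1.0):
    """Logistic psychometric curve; stim_shift is added to relative AV (hypothetical)."""
    return 1.0 / (1.0 + math.exp(-slope * (relative_av + stim_shift)))


def indifference_point(stim_shift, lo=-10.0, hi=10.0, tol=1e-9):
    """Find by bisection the relative AV at which p(left) = 0.5."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if p_left(mid, stim_shift) < 0.5:
            lo = mid  # still biased right: need a higher relative AV
        else:
            hi = mid
    return (lo + hi) / 2.0


# Right DMS D2 stimulation modelled as a negative shift of 1.5 (assumed size):
# the mouse is now indifferent only when reward history favours left by about 1.5.
shifted = indifference_point(-1.5)
```

The indifference point expresses the stimulation in units of action value, and a larger assumed shift moves it further from zero.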
47
Q

biasing choices with increasing stimulation = as increase

A

Bias increases with more stimulation = more powerful
Amount of bias is correlated with the intensity of stimulation
Faster stimulation frequency leads to a larger bias in action value
Graded effect as the strength of stimulation increases
More stimulation reduces left action value further = now picks left only if rewarded every time
Strong evidence that the model is correct = manipulation quantitatively affects behaviour

48
Q

comparing reward history and stimulation contributions = overall, describe conclusions of model

A

Fit the model and assign a subjective value to the stimulation = put a number on it
Can be compared to equivalent added motion coherence
Model can provide an estimate of the equivalent change in action value for a given stimulation level
Graded effect that can be controlled experimentally strengthens the evidence for the model = quantified as an equivalent amount of subjective value
Downstream of building the action value = value is built from reward history, then pick which to act on; behaviour can be biased specifically and precisely
By activating one of the output pathways = picking which to act on given the representation