3 types of adaptive processes
evolution (as in evolutionary robots); reinforcement learning; learning by demonstration
Genotypes
the set of numbers that encodes the phenotype of the evolutionary robot
Genotypes can be…:
A. discrete numbers in [0, 1]
B. continuous numbers in [0, 1]
C. both A and B
D. neither A nor B
C. both A and B
T/F: Different parts of the genotype can describe different parts of the phenotype of the robot
True
An evolutionary algorithm can be split into two main phases…
3 methods to create offspring (in the context of evolutionary algorithms)…
genetic algorithms; evolutionary strategies; modern evolutionary strategies
Match the offspring-creation method (in the context of evolutionary algorithms) to its description…
A. genetic algorithm
B. evolutionary strategy
C. modern evolutionary strategy
A-2
B-1
C-3
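The selection/variation loop behind these methods can be sketched as a minimal (1+1) evolutionary strategy; the toy fitness function and the mutation strength below are assumptions for illustration:

```python
import random

random.seed(0)

def fitness(genotype):
    # Toy fitness (assumption): genes closer to 0.5 are better.
    return -sum((g - 0.5) ** 2 for g in genotype)

def mutate(genotype, sigma=0.1):
    # Gaussian mutation, clipped back into [0, 1].
    return [min(1.0, max(0.0, g + random.gauss(0, sigma))) for g in genotype]

parent = [random.random() for _ in range(5)]
for _ in range(200):                        # (1+1)-ES: one offspring per generation
    child = mutate(parent)
    if fitness(child) >= fitness(parent):   # selection: keep the better of the two
        parent = child
```

The offspring is a mutated copy of the parent; selection keeps whichever scores higher, so fitness never decreases.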
Advantages of evolutionary algorithms
If one component is weaker, the others can compensate!
Disadvantages of evolutionary algorithms
T/F: in RL, only the offspring “carries” the improvement
False; that is how evolutionary algorithms generally work. In RL, the robot itself improves continuously, guided by a reward
Policy
a mapping that tells the robot which action to take in a given state
Is the policy “ideal” before the robot starts exploring?
No, the robot updates the policy as it explores the environment
Two types of policies
1. deterministic
2. stochastic
Gaussian policy is an example of…
A. deterministic policy
B. stochastic policy
B. stochastic policy
In a deterministic policy…
each state maps to exactly one action: given the same state, the robot always picks the same one of its n possible actions
In a stochastic policy…
the policy assigns a probability to each action in a given state: e.g., when the robot is in a given field, there is a 60 % chance it moves right, a 20 % chance it moves left, a 10 % chance it moves up, and a 10 % chance it moves down
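A sketch of sampling from such a discrete stochastic policy (the action probabilities below are illustrative, not from the source):

```python
import random

random.seed(42)

# Stochastic policy for one state: one probability per action (illustrative numbers).
policy = {"right": 0.6, "left": 0.2, "up": 0.1, "down": 0.1}

def sample_action(policy):
    # Draw one action according to the policy's probabilities.
    actions, probs = zip(*policy.items())
    return random.choices(actions, weights=probs, k=1)[0]

# Over many samples, the empirical frequencies approach the policy's probabilities.
counts = {a: 0 for a in policy}
for _ in range(10_000):
    counts[sample_action(policy)] += 1
```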
T/F: A Gaussian policy is used when the action space is continuous
True
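A minimal sketch of a Gaussian policy for one continuous action; the state-dependent mean and standard deviation (here, a hypothetical steering angle) are assumptions:

```python
import random

random.seed(1)

def gaussian_policy(mean, std):
    # Gaussian policy: for a continuous action space, the action is sampled
    # from a normal distribution whose parameters depend on the state.
    return random.gauss(mean, std)

# Hypothetical example: steering angle in radians for one given state.
actions = [gaussian_policy(0.3, 0.05) for _ in range(1000)]
mean_action = sum(actions) / len(actions)
```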
Finite-horizon undiscounted return
the sum of all rewards over a fixed number of steps (the horizon), with no discounting; the best policy is the one with the highest sum
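In symbols, assuming a reward r_t at each step t and a horizon of T steps:

```latex
R(\tau) = \sum_{t=0}^{T} r_t
```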
Infinite-horizon discounted return
the sum of all rewards, each weighted by a discount factor γ^t with 0 < γ < 1: rewards collected later count for less, so a policy that earns a high total reward in few steps is preferred over one that needs many steps
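In symbols, with the same per-step reward r_t and a discount factor γ:

```latex
R(\tau) = \sum_{t=0}^{\infty} \gamma^{t}\, r_t, \qquad 0 < \gamma < 1
```

Because γ^t shrinks as t grows, the same reward is worth less the later it arrives.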
Q-learning
a model-free RL method in which the robot learns an action-value table Q(s, a) — the expected return of taking action a in state s — and updates it from the rewards it experiences
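A minimal sketch of the tabular Q-learning update rule; the learning rate, discount factor, and the tiny state/action sets are assumptions for illustration:

```python
# Tabular Q-learning update sketch (illustrative values).
alpha, gamma = 0.5, 0.9                       # learning rate and discount factor (assumptions)
Q = {("s0", "right"): 0.0, ("s1", "right"): 1.0}

def q_update(Q, s, a, r, s_next, actions):
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

# One experienced transition: s0 --right--> s1 with reward 0.
q_update(Q, "s0", "right", 0.0, "s1", ["right"])
```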
Monte Carlo approach
class of computational algorithms that rely on repeated random sampling to obtain numerical results.
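A classic sketch of the idea: estimating π by repeated random sampling of points in the unit square (the fraction landing inside the quarter circle approaches π/4):

```python
import random

random.seed(0)

# Monte Carlo sketch: sample random points in the unit square and count
# how many fall inside the quarter circle of radius 1.
n = 100_000
inside = sum(
    1 for _ in range(n)
    if random.random() ** 2 + random.random() ** 2 <= 1.0
)
pi_estimate = 4 * inside / n
```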
T/F: the “learning by demonstration” adaptation method has 3 main phases.
If TRUE, name them
If FALSE, give the exact number
True;
demonstration phase; training phase; testing phase
In the context of learning by demonstration, in the demonstration phase…
…match the following types of demonstrations with their description
A. kinesthetic teaching
B. tele-operated teaching
C. direct imitation of human behavior
C-1
A-3
B-2