Backward view TD(lambda) - pseudocode
Sarsa(lambda) - pseudocode
Gradient MC for estimating v_hat
Semi-gradient TD(0) for estimating v_hat
Semi-gradient n-step TD for estimating v_hat
Episodic semi-gradient Sarsa for estimating q_hat
MC policy gradient method for estimating pi_theta
QAC (action-value actor-critic)
QAC with advantage function
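
For the backward view TD(lambda) item in the list above, here is a minimal Python sketch. It assumes a tabular value function, accumulating eligibility traces, and a 5-state random walk as the example environment. The environment, the function names (`step`, `td_lambda`), and the parameter values are illustrative choices, not taken from these notes. On every step the one-step TD error is broadcast to all states in proportion to their traces, and then the traces decay by gamma * lambda.

```python
import random

N_STATES = 5  # assumed example environment: random walk with non-terminal states 0..4


def step(state, rng):
    """Hypothetical dynamics: move left/right with p = 0.5; reward 1 only on exiting right."""
    nxt = state + rng.choice((-1, 1))
    if nxt < 0:
        return None, 0.0  # left terminal
    if nxt >= N_STATES:
        return None, 1.0  # right terminal
    return nxt, 0.0


def td_lambda(episodes=2000, alpha=0.02, gamma=1.0, lam=0.5, seed=0):
    """Tabular backward-view TD(lambda) with accumulating eligibility traces."""
    rng = random.Random(seed)
    V = [0.5] * N_STATES  # arbitrary initial estimates
    for _ in range(episodes):
        E = [0.0] * N_STATES  # traces are reset at the start of each episode
        s = N_STATES // 2  # every episode starts in the centre state
        while s is not None:
            s_next, r = step(s, rng)
            v_next = 0.0 if s_next is None else V[s_next]  # terminal value is 0
            delta = r + gamma * v_next - V[s]  # one-step TD error
            E[s] += 1.0  # accumulating trace for the current state
            for i in range(N_STATES):
                V[i] += alpha * delta * E[i]  # each state moves in proportion to its trace
                E[i] *= gamma * lam  # then every trace decays
            s = s_next
    return V


V = td_lambda()
print([round(v, 2) for v in V])  # true values are 1/6, 2/6, 3/6, 4/6, 5/6
```

With lam = 0 this reduces to tabular TD(0); pushing lam toward 1 makes the update behave more like an every-visit Monte Carlo update.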
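
Here is a matching sketch for the Sarsa(lambda) item. It assumes a tabular Q, accumulating traces over state-action pairs, an epsilon-greedy policy, and a hypothetical deterministic corridor. In that corridor every step costs -1 and moving right from the last state ends the episode. All names and parameter values are illustrative. The structure mirrors TD(lambda), except that the TD error bootstraps from the next action A' chosen by the same epsilon-greedy policy.

```python
import random

N = 6  # assumed corridor: states 0..5; moving right from state 5 ends the episode
ACTIONS = (0, 1)  # 0 = left, 1 = right


def step(s, a):
    """Hypothetical dynamics: reward -1 per step; moving left at state 0 stays put."""
    s_next = max(0, s - 1) if a == 0 else s + 1
    return (None if s_next == N else s_next), -1.0


def eps_greedy(Q, s, eps, rng):
    if rng.random() < eps:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[s][a])  # ties go to the first action


def sarsa_lambda(episodes=2000, alpha=0.05, gamma=1.0, lam=0.9, eps=0.05, seed=0):
    """Tabular Sarsa(lambda) with accumulating traces and an epsilon-greedy policy."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N)]
    for _ in range(episodes):
        E = [[0.0, 0.0] for _ in range(N)]  # reset eligibility traces
        s = 0
        a = eps_greedy(Q, s, eps, rng)
        while s is not None:
            s_next, r = step(s, a)
            if s_next is None:
                a_next = None
                delta = r - Q[s][a]  # terminal: Q(terminal, .) = 0
            else:
                a_next = eps_greedy(Q, s_next, eps, rng)
                delta = r + gamma * Q[s_next][a_next] - Q[s][a]
            E[s][a] += 1.0  # accumulating trace for the pair just taken
            for i in range(N):
                for b in ACTIONS:
                    Q[i][b] += alpha * delta * E[i][b]
                    E[i][b] *= gamma * lam
            s, a = s_next, a_next
    return Q


Q = sarsa_lambda()
print([max(ACTIONS, key=lambda a: Q[s][a]) for s in range(N)])  # greedy action per state
```

Initialising Q to 0 with -1 rewards is optimistic, so untried actions look attractive and get explored early. That choice is what lets a small epsilon suffice in this toy setting.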
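
The three v_hat items (gradient MC, semi-gradient TD(0), semi-gradient n-step TD) differ only in the target plugged into the same update, w <- w + alpha * (target - v_hat(S, w)) * grad v_hat(S, w). The sketch below shows this under some assumptions. It uses a linear v_hat with one-hot features, so grad v_hat(s, w) = x(s), and reuses the hypothetical 5-state random walk. Helper names such as `sgd_step` and `generate_episode` are illustrative. Each episode is generated first and the updates are then applied in time order. That is equivalent to updating online here because the trajectory does not depend on w.

```python
import random

N = 5  # assumed example: random walk with non-terminal states 0..4


def features(s):
    """One-hot feature vector x(s); for v_hat(s, w) = w . x(s) the gradient is x(s)."""
    x = [0.0] * N
    x[s] = 1.0
    return x


def v_hat(s, w):
    if s is None:  # terminal state has value 0
        return 0.0
    return sum(wi * xi for wi, xi in zip(w, features(s)))


def sgd_step(w, s, target, alpha):
    """w <- w + alpha * (target - v_hat(s, w)) * x(s); the target is treated as a constant."""
    err = target - v_hat(s, w)
    for i, xi in enumerate(features(s)):
        w[i] += alpha * err * xi


def generate_episode(rng):
    """Return [(S_0, R_1), (S_1, R_2), ...] for one episode started in the centre."""
    s, traj = N // 2, []
    while s is not None:
        s_next = s + rng.choice((-1, 1))
        traj.append((s, 1.0 if s_next == N else 0.0))
        s = None if s_next in (-1, N) else s_next
    return traj


def gradient_mc(episodes=2000, alpha=0.02, gamma=1.0, seed=0):
    rng, w = random.Random(seed), [0.0] * N
    for _ in range(episodes):
        traj = generate_episode(rng)
        G, returns = 0.0, []
        for _, r in reversed(traj):  # compute G_t backwards ...
            G = r + gamma * G
            returns.append(G)
        for (s, _), G in zip(traj, reversed(returns)):  # ... then update for t = 0..T-1
            sgd_step(w, s, G, alpha)
    return w


def semi_gradient_n_step_td(episodes=2000, alpha=0.02, n=3, gamma=1.0, seed=0):
    rng, w = random.Random(seed), [0.0] * N
    for _ in range(episodes):
        traj = generate_episode(rng)
        T = len(traj)
        for tau in range(T):
            # n-step return: up to n discounted rewards, then bootstrap if the episode continues
            G = sum(gamma ** (k - tau) * traj[k][1] for k in range(tau, min(tau + n, T)))
            if tau + n < T:
                G += gamma ** n * v_hat(traj[tau + n][0], w)
            sgd_step(w, traj[tau][0], G, alpha)
    return w


def semi_gradient_td0(episodes=2000, alpha=0.02, gamma=1.0, seed=0):
    """Semi-gradient TD(0) is the n = 1 case of the n-step method."""
    return semi_gradient_n_step_td(episodes, alpha, n=1, gamma=gamma, seed=seed)


for method in (gradient_mc, semi_gradient_td0, semi_gradient_n_step_td):
    print(method.__name__, [round(v, 2) for v in method()])
```

Only gradient MC is a true gradient method, because its target G_t does not depend on w. The TD variants are "semi-gradient" because they ignore the dependence of the bootstrapped target on w.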
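
For the episodic semi-gradient Sarsa item, a sketch under assumptions similar to the earlier ones. It uses a linear q_hat with one-hot state-action features (so the gradient is x(s, a)), an epsilon-greedy policy, and the same hypothetical -1-per-step corridor. The names and parameter values are illustrative. The update has the same shape as semi-gradient TD(0), but it is applied to q_hat and bootstraps from the next epsilon-greedy action. On the terminal transition the target is just the reward.

```python
import random

N = 6  # assumed corridor: states 0..5; moving right from state 5 ends the episode
ACTIONS = (0, 1)  # 0 = left, 1 = right
D = N * len(ACTIONS)  # feature dimension


def features(s, a):
    """One-hot state-action features x(s, a); the gradient of q_hat = w . x(s, a) is x(s, a)."""
    x = [0.0] * D
    x[s * len(ACTIONS) + a] = 1.0
    return x


def q_hat(s, a, w):
    return sum(wi * xi for wi, xi in zip(w, features(s, a)))


def step(s, a):
    """Hypothetical dynamics: reward -1 per step; moving left at state 0 stays put."""
    s_next = max(0, s - 1) if a == 0 else s + 1
    return (None if s_next == N else s_next), -1.0


def eps_greedy(s, w, eps, rng):
    if rng.random() < eps:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_hat(s, a, w))


def episodic_semi_gradient_sarsa(episodes=1000, alpha=0.1, gamma=1.0, eps=0.1, seed=0):
    rng, w = random.Random(seed), [0.0] * D
    for _ in range(episodes):
        s = 0
        a = eps_greedy(s, w, eps, rng)
        while True:
            s_next, r = step(s, a)
            if s_next is None:
                target = r  # terminal transition: no bootstrap term
            else:
                a_next = eps_greedy(s_next, w, eps, rng)
                target = r + gamma * q_hat(s_next, a_next, w)
            err = target - q_hat(s, a, w)  # target treated as a constant (semi-gradient)
            for i, xi in enumerate(features(s, a)):
                w[i] += alpha * err * xi
            if s_next is None:
                break
            s, a = s_next, a_next
    return w


w = episodic_semi_gradient_sarsa()
print([max(ACTIONS, key=lambda a: q_hat(s, a, w)) for s in range(N)])  # greedy action per state
```

With one-hot features this coincides with tabular one-step Sarsa. Swapping `features` for a coarser encoding (for example tile coding) is where the function-approximation form starts to matter.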
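
For the MC policy gradient item, here is a REINFORCE-style sketch. It assumes a softmax policy over tabular action preferences theta[s][a], for which grad log pi(a|s) with respect to theta[s][b] is 1{a = b} - pi(b|s). It also uses a hypothetical 4-state corridor with -1 per step. Names and parameter values are illustrative. Each episode is sampled with the current policy, the returns G_t are computed, and theta is moved along gamma^t * G_t * grad log pi(A_t|S_t).

```python
import math
import random

N = 4  # assumed corridor: states 0..3; moving right from state 3 ends the episode
ACTIONS = (0, 1)  # 0 = left, 1 = right


def step(s, a):
    """Hypothetical dynamics: reward -1 per step; moving left at state 0 stays put."""
    s_next = max(0, s - 1) if a == 0 else s + 1
    return (None if s_next == N else s_next), -1.0


def policy(theta, s):
    """Softmax over tabular action preferences theta[s][a] (max subtracted for stability)."""
    m = max(theta[s])
    exps = [math.exp(p - m) for p in theta[s]]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce(episodes=2000, alpha=0.01, gamma=1.0, seed=0):
    rng = random.Random(seed)
    theta = [[0.0, 0.0] for _ in range(N)]
    for _ in range(episodes):
        # 1. Generate a full episode with the current policy.
        traj, s = [], 0
        while s is not None:
            a = rng.choices(ACTIONS, weights=policy(theta, s))[0]
            s_next, r = step(s, a)
            traj.append((s, a, r))
            s = s_next
        # 2. Compute the return G_t for every step.
        G, returns = 0.0, []
        for _, _, r in reversed(traj):
            G = r + gamma * G
            returns.append(G)
        returns.reverse()
        # 3. theta <- theta + alpha * gamma^t * G_t * grad log pi(A_t | S_t)
        for t, ((s, a, _), G) in enumerate(zip(traj, returns)):
            probs = policy(theta, s)
            for b in ACTIONS:
                grad_log = (1.0 if b == a else 0.0) - probs[b]  # softmax score function
                theta[s][b] += alpha * gamma ** t * G * grad_log
    return theta


theta = reinforce()
print([round(policy(theta, s)[1], 3) for s in range(N)])  # P(right | s) per state
```

The gamma**t factor follows the Sutton and Barto form; some presentations omit it, and with gamma = 1, as here, the two coincide. No baseline is used, so the updates are high-variance, which is the motivation for the critic-based items that follow.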
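
The last two items, QAC and QAC with an advantage function, can share one sketch. The actor is a softmax policy updated by theta <- theta + alpha * grad log pi(s, a) * Q_w(s, a). The critic Q_w is learned by a Sarsa-style TD(0) update with step size beta. In the advantage variant the actor instead weights the score by A(s, a) = Q_w(s, a) - V_v(s), with V_v learned by TD(0) as a second critic. Several choices here are assumptions for illustration, not taken from these notes: the tabular critics (the one-hot case of a linear Q_w), the corridor environment, the `use_advantage` flag, and all step sizes.

```python
import math
import random

N = 4  # assumed corridor: states 0..3; moving right from state 3 ends the episode
ACTIONS = (0, 1)  # 0 = left, 1 = right


def step(s, a):
    """Hypothetical dynamics: reward -1 per step; moving left at state 0 stays put."""
    s_next = max(0, s - 1) if a == 0 else s + 1
    return (None if s_next == N else s_next), -1.0


def policy(theta, s):
    """Softmax over tabular action preferences theta[s][a]."""
    m = max(theta[s])
    exps = [math.exp(p - m) for p in theta[s]]
    total = sum(exps)
    return [e / total for e in exps]


def sample(theta, s, rng):
    return rng.choices(ACTIONS, weights=policy(theta, s))[0]


def qac(episodes=2000, alpha=0.01, beta=0.1, gamma=1.0, use_advantage=False, seed=0):
    """Actor: softmax policy. Critic: tabular Q_w, plus V_v for the advantage variant."""
    rng = random.Random(seed)
    theta = [[0.0, 0.0] for _ in range(N)]  # actor parameters
    q = [[0.0, 0.0] for _ in range(N)]  # critic Q_w(s, a)
    v = [0.0] * N  # critic V_v(s); only read when use_advantage=True
    for _ in range(episodes):
        s = 0
        a = sample(theta, s, rng)
        while s is not None:
            s_next, r = step(s, a)
            a_next = None if s_next is None else sample(theta, s_next, rng)
            q_next = 0.0 if s_next is None else q[s_next][a_next]
            v_next = 0.0 if s_next is None else v[s_next]
            # Actor: theta <- theta + alpha * grad log pi(s, a) * Q_w(s, a)  (or A(s, a))
            weight = q[s][a] - v[s] if use_advantage else q[s][a]
            probs = policy(theta, s)
            for b in ACTIONS:
                theta[s][b] += alpha * ((1.0 if b == a else 0.0) - probs[b]) * weight
            # Critic: TD(0) with delta = r + gamma * Q_w(s', a') - Q_w(s, a), same for V_v
            q[s][a] += beta * (r + gamma * q_next - q[s][a])
            v[s] += beta * (r + gamma * v_next - v[s])
            s, a = s_next, a_next
    return theta


for flag in (False, True):
    theta = qac(use_advantage=flag)
    print("advantage" if flag else "Q", [round(policy(theta, s)[1], 3) for s in range(N)])
```

The critic step size beta is set larger than the actor step size alpha so the critic roughly tracks the current policy's values. Subtracting V_v(s) does not change the expected actor update, but it typically reduces its variance.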