What are the steps for training a multilayer neural network using backpropagation?
What is the process for the forward pass?
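In the forward pass, each layer computes a weighted sum of its inputs and applies the activation function, passing the result to the next layer until the output is produced. A minimal Python sketch (the 2–3–1 layer sizes, weight values, and sigmoid activation are illustrative assumptions):

```python
import numpy as np

def sigmoid(x, beta=1.0):
    # Logistic activation: 1 / (1 + e^(-beta * x))
    return 1.0 / (1.0 + np.exp(-beta * x))

def forward(x, V, W):
    # Hidden layer: weighted sum b, then activation z = sigmoid(b)
    b = V @ x
    z = sigmoid(b)
    # Output layer: weighted sum a, then activation y = sigmoid(a)
    a = W @ z
    y = sigmoid(a)
    return b, z, a, y

# Tiny illustrative network: 2 inputs -> 3 hidden -> 1 output
x = np.array([0.5, -0.2])
V = np.full((3, 2), 0.1)   # input -> hidden weights (assumed values)
W = np.full((1, 3), 0.1)   # hidden -> output weights (assumed values)
_, z, _, y = forward(x, V, W)
```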
What is the mean squared error function?
E(X) = 0.5 * ∑_n (y_n - t_n)^2
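A quick numeric illustration of the error function (the values are chosen arbitrarily):

```python
import numpy as np

def mse(y, t):
    # E = 0.5 * sum over n of (y_n - t_n)^2
    return 0.5 * np.sum((y - t) ** 2)

y = np.array([0.8, 0.2])   # network outputs (assumed)
t = np.array([1.0, 0.0])   # targets (assumed)
# E = 0.5 * ((0.8 - 1)^2 + (0.2 - 0)^2) = 0.5 * (0.04 + 0.04) = 0.04
```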
What is the differential of the sigmoid equation?
σ = 1 / (1 + e^-βx) = (1 + e^-βx)^-1
y = u^-1, u = 1 + e^-βx
dy/du = -u^-2, du/dx = -βe^-βx
σ’ = (-u^-2)(-βe^-βx) = βe^-βx / (1 + e^-βx)^2
1 - σ = 1 - 1 / (1 + e^-βx)
1 - σ = (1 + e^-βx - 1) / (1 + e^-βx)
1 - σ = e^-βx / (1 + e^-βx)
σ(1 - σ) = e^-βx / (1 + e^-βx)^2
σ’ = βσ(1 - σ), which reduces to σ(1 - σ) when β = 1 (the convention used below)
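The closed form σ’ = βσ(1 − σ) can be checked against a central finite difference; the evaluation point and β below are chosen arbitrarily:

```python
import numpy as np

def sigmoid(x, beta=1.0):
    return 1.0 / (1.0 + np.exp(-beta * x))

def sigmoid_prime(x, beta=1.0):
    # Closed form: beta * sigma * (1 - sigma)
    s = sigmoid(x, beta)
    return beta * s * (1.0 - s)

# Central finite difference as an independent estimate of the derivative
x, beta, h = 0.7, 2.0, 1e-6
numeric = (sigmoid(x + h, beta) - sigmoid(x - h, beta)) / (2 * h)
```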
What is the differential of the mean squared error function with respect to the output neuron's weighted input (a)?
[Image 6]
E(X) = 0.5 * ∑(σ(a) - t_n)^2
y = 0.5 * u^2, u = σ(a) - t_n
dy/du = u, du/da = σ(a)(1 - σ(a))
dE/da = σ(a)(1 - σ(a))u
dE/da = σ(a)(1 - σ(a))(σ(a) - t_n)
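This result can also be checked numerically for a single output against a finite difference of E itself (the values of a and the target are arbitrary):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def dE_da(a, t):
    # dE/da = sigma(a) * (1 - sigma(a)) * (sigma(a) - t)
    s = sigmoid(a)
    return s * (1 - s) * (s - t)

# Finite-difference check with E(a) = 0.5 * (sigma(a) - t)^2
a, t, h = 0.3, 1.0, 1e-6
E = lambda a_: 0.5 * (sigmoid(a_) - t) ** 2
numeric = (E(a + h) - E(a - h)) / (2 * h)
```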
What is the differential of the weighted-input sum (a) with respect to an output neuron weight (w_n)?
[Image 6]
a = w_0z_0 + w_1z_1 + … + w_nz_n
da/dw_n = d/dw_n (w_nz_n) = z_n
What is the differential of the error with respect to a hidden neuron's weighted input (b)?
Reminder:
> dE/da = σ(1 - σ(a))(σ(a) - t_n)
[Image 6]
a = w_0z_0 + w_1z_1 + … + w_nz_n, where z_1 = σ(b)
da/db = d/db (w_1σ(b))
da/db = w_1σ(b)(1 - σ(b))
da/db = w_1z_1(1 - z_1)
dE/db = dE/da × da/db
dE/da = σ(a)(1 - σ(a))(σ(a) - t_n)
dE/db = σ(a)(1 - σ(a))(σ(a) - t_n)w_1z_1(1 - z_1)
What is the differential of the error with respect to a hidden neuron weight (v_n)?
Reminder:
> dE/db = σ(a)(1 - σ(a))(σ(a) - t_n)w_1z_1(1 - z_1)
[Image 6]
db/dv_n = d/dv_n (v_nx_n) = x_n
dE/dv_n = dE/db × db/dv_n
dE/dv_n = σ(a)(1 - σ(a))(σ(a) - t_n)w_1z_1(1 - z_1)x_n
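The full chain can be verified numerically on a tiny assumed network (two inputs feeding one hidden neuron b, whose output z_1 = σ(b) feeds the output through weight w_1; all values are arbitrary):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([0.4, -0.3])   # inputs x_n (assumed)
v = np.array([0.2, 0.5])    # hidden weights v_n (assumed)
w1, t = 0.7, 1.0            # output weight and target (assumed)

def error(v_):
    b = np.dot(v_, x)
    z1 = sigmoid(b)
    a = w1 * z1
    return 0.5 * (sigmoid(a) - t) ** 2

# Analytic gradient from the chain rule derived above
b = np.dot(v, x); z1 = sigmoid(b); a = w1 * z1; s = sigmoid(a)
grad = s * (1 - s) * (s - t) * w1 * z1 * (1 - z1) * x

# Finite-difference check on v_0
h = 1e-6
e = np.zeros_like(v); e[0] = h
numeric0 = (error(v + e) - error(v - e)) / (2 * h)
```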
For this example, what is the differential of the error with respect to a_k?
[Image 7]
E(X) = 0.5 * ∑_k (z_k - t_k)^2 = 0.5 * ∑_k (σ(a_k) - t_k)^2
y = 0.5 * u^2, u = σ(a_k) - t_k
dy/du = u, du/da_k = σ(a_k)(1 - σ(a_k))
dE/da_k = σ(a_k)(1 - σ(a_k))u (only the k-th term of the sum depends on a_k, so the sum drops)
dE/da_k = σ(a_k)(1 - σ(a_k))(σ(a_k) - t_k)
What is the symbol for the differential of the error with respect to a_k?
δ_k
For this example, what is the differential of the error with respect to w_jk?
[Image 7]
dE/dw_jk = dE/da_k × da_k/dw_jk
da_k/dw_jk = d/dw_jk (w_0kz_0 + w_1kz_1 + … + w_jkz_j + …) = z_j
dE/da_k = δ_k
dE/dw_jk = δ_kz_j
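In vectorized form, dE/dw_jk = δ_kz_j is an outer product of the output deltas with the hidden activations. A sketch with assumed values, including a finite-difference check on one weight:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

z = np.array([0.3, 0.9, 0.5])           # hidden activations z_j (assumed)
W = np.array([[0.1, -0.2, 0.4],
              [0.3,  0.1, -0.1]])       # weights w_jk stored as W[k, j] (assumed)
t = np.array([1.0, 0.0])                # targets t_k (assumed)

a = W @ z
zk = sigmoid(a)
delta_k = zk * (1 - zk) * (zk - t)      # dE/da_k
grad_W = np.outer(delta_k, z)           # dE/dw_jk = delta_k * z_j

# Finite-difference check on the weight w_01 (i.e. W[0, 0])
def E(W_):
    y_ = sigmoid(W_ @ z)
    return 0.5 * np.sum((y_ - t) ** 2)

h = 1e-6
Wp = W.copy(); Wp[0, 0] += h
Wm = W.copy(); Wm[0, 0] -= h
numeric = (E(Wp) - E(Wm)) / (2 * h)
```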
How do you apply gradient descent to the output layer?
[Image 7]
w_jk(t+1) = w_jk(t) - ηδ_kz_j, where η is the learning rate
How do weights propagate through the hidden layers?
The error does not propagate back along a single path. Each hidden neuron connects to multiple neurons in the following layer, so its error signal is the sum of the contributions from every path passing through it.
What is the equation for the error with respect to a_j?
[Image 7]
dE/da_j = ∑_k dE/da_k × da_k/da_j
dE/da_j = ∑_k δ_k × da_k/da_j
da_k/da_j = d/da_j ∑_j (w_jkσ(a_j))
da_k/da_j = w_jkσ(a_j)(1 - σ(a_j)) (only the j-th term of the sum depends on a_j)
da_k/da_j = w_jkz_j(1 - z_j)
dE/da_j = z_j(1 - z_j) ∑_k w_jkδ_k = δ_j
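The hidden-layer delta δ_j = z_j(1 − z_j) ∑_k w_jkδ_k is a single matrix-vector product in code; the activations, weights, and output deltas below are assumed values:

```python
import numpy as np

z = np.array([0.3, 0.9, 0.5])           # hidden activations z_j (assumed)
W = np.array([[0.1, -0.2, 0.4],
              [0.3,  0.1, -0.1]])       # weights w_jk stored as W[k, j] (assumed)
delta_k = np.array([0.05, -0.02])       # output deltas (assumed)

# Back-propagate: delta_j = z_j * (1 - z_j) * sum over k of w_jk * delta_k
delta_j = z * (1 - z) * (W.T @ delta_k)
```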
What is the equation for the error with respect to weights w_ij?
[Image 7]
dE/dw_ij = dE/da_j × da_j/dw_ij
da_j/dw_ij = d/dw_ij (∑_i (w_ij z_i))
da_j/dw_ij = z_i
dE/dw_ij = δ_jz_i
How do you apply gradient descent to the hidden layer?
[Image 7]
w_ij(t+1) = w_ij(t) - ηδ_jz_i
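Putting the forward pass, the two delta computations, and both gradient-descent updates together gives a complete training loop. This is a hedged sketch: the 2–3–1 layer sizes, the OR training data, η = 0.5, the epoch count, and the random seed are all assumptions, not prescriptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)

# Illustrative task: learn logical OR from its four input patterns
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [1]], dtype=float)

V = rng.normal(0.0, 0.1, (3, 2))   # input -> hidden weights w_ij, small random init
W = rng.normal(0.0, 0.1, (1, 3))   # hidden -> output weights w_jk

def total_error(V, W):
    Y = sigmoid(W @ sigmoid(V @ X.T)).T
    return 0.5 * np.sum((Y - T) ** 2)

eta = 0.5
E0 = total_error(V, W)
for _ in range(2000):
    for x, t in zip(X, T):
        z = sigmoid(V @ x)                        # forward: hidden activations z_j
        y = sigmoid(W @ z)                        # forward: output activations z_k
        delta_k = y * (1 - y) * (y - t)           # output deltas
        delta_j = z * (1 - z) * (W.T @ delta_k)   # hidden deltas
        W -= eta * np.outer(delta_k, z)           # gradient descent on w_jk
        V -= eta * np.outer(delta_j, x)           # gradient descent on w_ij
E1 = total_error(V, W)
```

The per-sample (online) updates used here are one common choice; summing the gradients over the whole training set before updating (batch gradient descent) is equally valid.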
Why might the output value change every time the AI is trained?
Because gradient descent can settle in different local minima: the random initial weights lead each training run to a different solution.
What should the weights of the neural network be set to initially?
> Set randomly
> Close to 0
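A common way to realise this with NumPy (the (3, 2) shape and the 0.1 scale are assumptions):

```python
import numpy as np

rng = np.random.default_rng(42)
# Small random values break the symmetry between neurons while keeping
# the sigmoids out of their flat, saturated regions
W = rng.normal(loc=0.0, scale=0.1, size=(3, 2))
```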