What does b' mean when reconstructing an autoencoder? - autoencoder

I'm just learning about autoencoders, and I was reading this tutorial: http://deeplearning.net/tutorial/dA.html
I didn't get what b' means.
So I tried to build an autoencoder with tied weights.
For example,
Encoding : hid = s(x*w + b)
x : (1000, 2000)
w : (2000, 500)
b : (500, 1)
-> hid : (1000, 500)
And when I decode the encoded data,
decode : y = s(hid*w' + b')
w' : (500, 2000)
so hid*w' will be (1000, 2000),
and I have to add b', whose shape should be (2000, 1), but I only have b with shape (500, 1).
What have I done wrong here?
I found some code where they simply created a bias with the shape of the number of inputs, i.e. (2000, 1), and optimized it.

I just found my answer, and I'm posting it here for others' convenience.
When decoding an AE, you need a separate bias vector whose shape is (n_input).
So in my example, I have to create a new bias vector b* : (2000, 1), use this b* to reconstruct the inputs, and optimize the weights and b* together.
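For illustration, here is a minimal NumPy sketch of that tied-weight forward pass with the separate decoder bias (the names, sigmoid choice, and initializations are mine, not from the tutorial):

import numpy as np

def s(a):  # sigmoid activation
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
n_samples, n_input, n_hidden = 1000, 2000, 500

x = rng.random((n_samples, n_input))          # (1000, 2000) input batch
w = rng.normal(0, 0.01, (n_input, n_hidden))  # (2000, 500) shared (tied) weights
b = np.zeros(n_hidden)                        # encoder bias, one per hidden unit
b_star = np.zeros(n_input)                    # decoder bias b*, one per input unit

hid = s(x @ w + b)          # (1000, 500)
y = s(hid @ w.T + b_star)   # (1000, 2000): tied weights reuse w transposed

# Training would minimize a reconstruction loss over w, b, and b_star together:
loss = np.mean((y - x) ** 2)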

Related

Solving the Linear Regression Model using QR Decomposition (MATLAB)

Background: I want to implement a MATLAB algorithm that takes as input vectors x and y, solves the linear regression problem associated with the data stored in x and y using a modified QR version, and then plots the graph of the linear function.
So first I wrote the modified QR algorithm:
function x=QRQ(A,b,n)
    [Q1,R1]=qr(A);
    c1=Q1'*b;
    n=length(c1);
    x=backward(R1,c1,n);
end

function x=backward(U,y,n)
    x=zeros(n,1);
    x(n)=y(n)/U(n,n);
    for i=n-1:-1:1
        x(i)=(y(i)-U(i,i+1:n)*x(i+1:n))/U(i,i);
    end
end
Then I wrote the algorithm for the linear regression:
function ysol = LinearReg(x,y)
    A=[x ones(21,1)];
    z=QRQ(A,y,2);
    ysol=z(1)*x+z(2);
    plot(x,y,'bo',x,ysol,'g-');
end
I tried to run this algorithm on the following data:
x=[0;0.25;0.5;0.75;1;1.25;1.5;1.75;2;2.25;2.5;2.75;3;3.25;3.5;3.75;4;4.25;4.5;4.75;5];
y=[4;3;7;7;1;4;4;6;7;7;2;6;6;1;1;4;9;3;5;2;7];
The full error message that I received is:
Index in position 2 exceeds array bounds (must not exceed 2).
Error in untitled>backward (line 12)
x(n)=y(n)/U(n,n);
Error in untitled>QRQ (line 8)
x=backward(R1,c1,n);
Error in untitled>LinearReg (line 20)
z=QRQ(A,y,2);
The line causing the error is x(n)=y(n)/U(n,n);
the only variable with an index in position 2 is U. Apparently U only has 2 columns, and n is a value >2, hence the error.
Using the debugger, I see that U is a 21x2 array, and n has a value of 21.
How can this MATLAB algorithm be fixed?
The U in your case is R1. Since the matrix A only has two columns, R1 will also only have two columns. You then try to solve the system R1 * x = y using backward substitution with the index starting at n, but n was overwritten by n=length(c1), making it 21; here the substitution clearly has to start at 2.
(Keep in mind that R1 is an upper triangular matrix.)
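To illustrate the fix, here is a small NumPy sketch; in MATLAB the same effect is obtained with the economy-size qr(A,0) (which makes R1 2x2) or by not overwriting the passed-in n. The data are the x and y vectors from the question:

import numpy as np

def backward(U, y):
    # Back-substitution for a square upper-triangular system U x = y.
    n = U.shape[0]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
    return x

x = np.arange(0, 5.25, 0.25)  # 21 points: 0, 0.25, ..., 5
y = np.array([4,3,7,7,1,4,4,6,7,7,2,6,6,1,1,4,9,3,5,2,7], dtype=float)

A = np.column_stack([x, np.ones_like(x)])
Q, R = np.linalg.qr(A)     # reduced QR: Q is 21x2, R is 2x2
z = backward(R, Q.T @ y)   # solve R z = Q'y; n is now 2, as it should be
ysol = z[0] * x + z[1]     # fitted line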

Using Julia Flux to build a simple neural network

I have a dataset of images (https://www.kaggle.com/iarunava/cell-images-for-detecting-malaria), and I want to use a neural network to tell whether a picture shows an uninfected cell or not.
So I arranged my data to get 4 variables :
X_tests, Y_tests, X_training, Y_training
Each of these variable is of type Array{Array{Float64,1},1}
And I have a function that builds a simple neural network (it comes from this example: https://smist08.wordpress.com/2018/09/24/julia-flux-for-machine-learning/):
function simple_nn(X_tests, Y_tests, X_training, Y_training)
    input = 100*100*3
    hl1 = 32
    m = Chain(
        Dense(input, 32, relu),
        Dense(32, 2),
        softmax) |> gpu
    loss(x, y) = crossentropy(m(x), y)
    accuracy(x, y) = mean(onecold(m(x)) .== onecold(y))
    dataset = [(X_training, Y_training)]
    evalcb = () -> @show(loss(X_training, Y_training))
    opt = ADAM(params(m))
    Flux.train!(loss, dataset, opt, cb = throttle(evalcb, 10))
    println("acc X,Y ", accuracy(X_training, Y_training))
    println("acc tX, tY ", accuracy(X_tests, Y_tests))
end
And I get this error after executing simple_nn(X_tests, Y_tests, X_training, Y_training):
ERROR: DimensionMismatch("matrix A has dimensions (32,30000), vector B has length 2668")
...
The error is on this line : Flux.train!(loss, dataset, opt, cb = throttle(evalcb, 10))
I don't know what these functions do, what arguments they take, or what they return, and I can't find any documentation on the internet; I can only find examples.
So I have two questions: How can I make this work for my dataset? And is there documentation for Flux functions, like there is for sklearn? (like this, for example: https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html)
Can you provide a self-contained MWE? I think your X_training is not of dimension 3*100*100 by something; it is in fact 2688 by something.
Your first layer is Dense(input, 32, relu) and input is 3*100*100, so it expects an input where one of the dimensions is 3*100*100, which you don't satisfy.
Maybe try to replace
dataset = [(X_training,Y_training)]
with
dataset = zip(X_training,Y_training)
zip pairs each element of X with the corresponding element of Y, and thus turns a tuple of vectors into a collection of (sample, label) tuples. I would guess that your training data has 2688 samples?
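The pairing behaviour is the same as Python's zip, which may make the difference easier to see. A toy sketch with made-up shapes (not the question's actual data):

import numpy as np

# Toy stand-ins: 4 samples, each a length-6 feature vector, with one-hot labels.
X_training = [np.random.rand(6) for _ in range(4)]
Y_training = [np.eye(2)[i % 2] for i in range(4)]

# [(X, Y)] is ONE giant pair: the whole list of samples with the whole list of labels.
dataset_wrong = [(X_training, Y_training)]
print(len(dataset_wrong))         # 1 "batch" containing everything at once

# zip(X, Y) yields one (sample, label) pair per training example.
dataset_right = list(zip(X_training, Y_training))
print(len(dataset_right))         # 4 pairs, one per sample
print(dataset_right[0][0].shape)  # (6,)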

Backpropagation formula seems to be unimplementable as is

I've been working on getting some proficiency on backpropagation, and have run across the standard mathematical formula for doing this. I implemented a solution which seemed to work properly (and passed the relevant test with flying colours).
However ... the actual solution (implemented in MATLAB, and using vectorization) is at odds with the formula in two important respects.
The formula looks like this:
delta2 = Theta2' * delta3 .* gPrime(z2)   (Theta2' is the transpose; the argument of gPrime is not important right now)
The working code looks like this:
% d3 is delta3, d2 is delta2, Theta2 is minus the bias column
% dimensions: d3--[5000x10], d2--[5000x25], Theta2--[10x25]
d3 = (a3 - y2);
d2 = (d3 * Theta2) .* gPrime(z2);
I can't reconcile what I implemented with the mathematical formula, on two counts:
The working code reverses the terms in the first part of the expression;
The working code does not transpose Theta-layer2, but the formula does.
How can this be? The dimensions of the individual matrices don't seem to allow for any other working combination.
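For concreteness, here is the dimension bookkeeping in LaTeX (the textbook formula is written for a single example as a column vector, while the code stacks the 5000 examples as rows; transposing with (AB)^T = B^T A^T converts one into the other):

\delta^{(2)} = \left(\Theta^{(2)\top}\,\delta^{(3)}\right)\odot g'\!\left(z^{(2)}\right),
\qquad \delta^{(3)}\in\mathbb{R}^{10},\ \Theta^{(2)}\in\mathbb{R}^{10\times 25},\ \delta^{(2)}\in\mathbb{R}^{25}

\delta^{(2)\top} = \left(\delta^{(3)\top}\,\Theta^{(2)}\right)\odot g'\!\left(z^{(2)}\right)^{\!\top}
\ \Longrightarrow\ d_2 = \left(d_3\,\Theta_2\right)\odot g'(z_2),
\qquad d_3\in\mathbb{R}^{5000\times 10},\ d_2\in\mathbb{R}^{5000\times 25}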
Josh
This isn't a bad question; I don't know why the downvotes. The implementation of a backpropagation algorithm is not as intuitive as it appears. I'm not so great at math and I've never used MATLAB (usually C), so I avoided answering this question at first, but it deserves an answer.
First of all we have to make some simplifications.
1. We will use only one input pattern, so: vector in_data[N] (in the case below, N = 2). (If we succeed with only one pattern, it is not difficult to extend it to a matrix.)
2. We will use this structure: 2 input, 2 hidden, 2 output neurons (if we succeed with this, we will succeed with anything); this is the network I've taken from this blog.
Let's start: we know that to update the weights we need the gradient of the error, summed over the M training patterns:

w <- w - (eta / M) * sum_p dE_p/dw

Note: M = num_patterns, but we have previously declared in_data as a single vector, so you can drop the sum (and the matrix form). So this is your new formula:

w <- w - eta * dE/dw

We will study 2 connections: w1 (input to hidden) and w5 (hidden to output). By the chain rule, the derivatives are:

dE/dw5 = (a[O1] - y[O1]) * g_prime(z[O1]) * a[H1]
dE/dw1 = ( sum over outputs Ok of (a[Ok] - y[Ok]) * g_prime(z[Ok]) * w_{H1->Ok} ) * g_prime(z[H1]) * in_data[1]

Let's code them (I really don't know MATLAB, so I'll write pseudocode):
vector d[num_connections + num_output_neurons] // number of derivatives: 8 connections (without counting the bias) + 2 output deltas
vector z[num_neurons]      // z is the weighted input of each neuron
vector a[num_neurons]      // a is the output (activation) of each neuron
vector w[num_connections]  // yes, a vector! we previously removed the matrix and the sum

// O layer: start from the last layer to compute the error
d[10] = (a[O1] - y[O1]);
d[9]  = (a[O2] - y[O2]);

// H -> O layer (hidden-to-output connections 5..8)
for (i = 5; i <= 8; i++) {
    d[i] = d[out(i)] * g_prime(z[out(i)]); // out(i): the output neuron (O1 or O2) that connection i feeds
}

// I -> H layer (input-to-hidden connections 1..4)
for (i = 1; i <= 4; i++) {
    d[i] = 0;
    for (j = 1; j <= num_connections_from_neuron; j++) { // e.g. d[1] depends on every connection from H1 to the outputs
        d[i] = d[i] + d[j+4] * w[j+4];
    }
    d[i] = d[i] * g_prime(z[hid(i)]); // hid(i): the hidden neuron that connection i feeds
}
If you need me to extend it to a matrix, write a comment and I'll extend the code.
And so you have found all the derivatives. Maybe this is not exactly what you are searching for. I'm not even sure that everything I wrote is correct (I hope it is); I will try to code backpropagation myself in the coming days, so I will be able to correct any errors. I hope this is a little helpful; better than nothing.
Best regards, Marco.
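For the vectorized MATLAB-style version the question asks about, a compact NumPy sketch is below. It matches the dimensions in the question (5000 examples as rows, 25 hidden units, 10 outputs); the input size of 400, the sigmoid activation, and the initializations are assumptions, not from the original post:

import numpy as np

def g(z):
    return 1.0 / (1.0 + np.exp(-z))  # sigmoid

def g_prime(z):
    s = g(z)
    return s * (1.0 - s)

rng = np.random.default_rng(0)
m, n_in, n_hid, n_out = 5000, 400, 25, 10      # n_in = 400 is assumed

X = rng.random((m, n_in))                      # examples as rows
y2 = np.eye(n_out)[rng.integers(0, n_out, m)]  # one-hot labels, (5000, 10)
Theta1 = rng.normal(0, 0.01, (n_hid, n_in))    # bias columns omitted for brevity
Theta2 = rng.normal(0, 0.01, (n_out, n_hid))   # (10, 25), as in the question

# Forward pass
z2 = X @ Theta1.T;  a2 = g(z2)                 # (5000, 25)
z3 = a2 @ Theta2.T; a3 = g(z3)                 # (5000, 10)

# Backward pass: the two lines from the question, in row-vector form
d3 = a3 - y2                                   # (5000, 10)
d2 = (d3 @ Theta2) * g_prime(z2)               # (5000, 25)

# Gradients, averaged over the m examples
grad2 = d3.T @ a2 / m                          # (10, 25), same shape as Theta2
grad1 = d2.T @ X / m                           # (25, 400), same shape as Theta1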

Passing Individual Channels of Tensors to Layers in Keras

I am trying to emulate something equivalent to a SeparableConvolution2D layer for the Theano backend (it already exists for the TensorFlow backend). As a first step, what I need to do is pass ONE channel from a tensor into the next layer. So say I have a 2D convolution layer called conv1 with 16 filters which produces an output with shape (batch_size, 16, height, width). I need to select the subtensor with shape (:, 0, :, :) and pass it to the next layer. Simple enough, right?
This is my code:
from keras import backend as K
image_input = Input(batch_shape=(batch_size, 1, height, width), name='image_input')
conv1 = Convolution2D(16, 3, 3, name='conv1', activation='relu')(image_input)
conv2_input = K.reshape(conv1[:,0,:,:], (batch_size, 1, height, width))
conv2 = Convolution2D(16, 3, 3, name='conv1', activation='relu')(conv2_input)
This throws:
Exception: You tried to call layer "conv1". This layer has no information about its expected input shape, and thus cannot be built. You can build it manually via: layer.build(batch_input_shape)
Why does the layer not have the required shape information? I'm using reshape from the Theano backend. Is this the right way of passing individual channels to the next layer?
I asked this question on the keras-user group and I got an answer there:
https://groups.google.com/forum/#!topic/keras-users/bbQ5CbVXT1E
Quoting it:
You need to use a Lambda layer, like: Lambda(lambda x: x[:, 0:1, :, :], output_shape=lambda x: (x[0], 1, x[2], x[3]))
Note that such a manual implementation of a separable convolution would be horribly inefficient. The correct solution is to use the TensorFlow backend.
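Put into the question's code, the quoted fix would look roughly like this (old Keras 1.x functional API to match the question; the batch_size, height, and width values are placeholders):

from keras.layers import Input, Convolution2D, Lambda
from keras.models import Model

batch_size, height, width = 32, 64, 64  # placeholder values

image_input = Input(batch_shape=(batch_size, 1, height, width), name='image_input')
conv1 = Convolution2D(16, 3, 3, name='conv1', activation='relu')(image_input)

# Keep only channel 0; slicing with 0:1 preserves the channel axis, and
# output_shape tells Keras the static shape the backend cannot infer.
take_ch0 = Lambda(lambda x: x[:, 0:1, :, :],
                  output_shape=lambda s: (s[0], 1, s[2], s[3]))(conv1)

conv2 = Convolution2D(16, 3, 3, name='conv2', activation='relu')(take_ch0)
model = Model(input=image_input, output=conv2)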

Optimization with Unknown Number of Variables

Since the original problem is more complicated, the idea is described using the simple example below.
Suppose we want to put several router antennas somewhere in a room so that a cellphone gets the strongest signal on the table (received power > Pmax) and the weakest signal on the bed (received power < Pmin). What is the best (minimum) number of antennas that should be used, and where should they be placed, in order to achieve the goal?
Mathematically,
SIGNAL_STRENGTH depends on the variables (x, y, z) and on the number of variables, i.e. the locations and the number of antennas.
Besides, assume
PREDICTION = f((x1, y1, z1), (x2, y2, z2), ..., (xi, yi, zi), ..., (xn, yn, zn))
where n and the (xi, yi, zi) are to be optimized. The goal is to minimize
cost function = ||SIGNAL_STRENGTH - PREDICTION||
I tried to implement this with GA plus mixed-integer programming in Matlab, using two nested optimizations: the outer one optimizes n, and the inner one optimizes (x, y, z) for the given n. This method is slow, and I haven't obtained a single result from it so far.
Does anyone have a more efficient way to solve this problem? Any suggestion is appreciated. Thanks in advance.
Terminology | Problem Definition
An antenna transmits at position a in R^3 with constant power. Its signal strength can be measured by some S: R^3 -> R, where S has a single maximum S_0 at a and each superlevel set {x : S(x) > const} is simply connected, e.g. S(x) = S_0 * exp(-const * ||x - a||^2).
Given a set of antennas A, the resulting signal strength is the maximum over the single antennas:
S_A(x) = max{ S_a(x) : a in A },
which means we 'lock on' to the strongest antenna, which is what cell phones do.
Let K = R^3 x R denote the space of points (position, intensity). Now consider two finite subsets POI_min and POI_max of K. We want to find the set A with the minimal number of antennas (|A| -> min) that satisfies
for all (x, w) in POI_min : S_A(x) < w, and for all (x, w) in POI_max : S_A(x) > w.
Implication
As {x : S(x) > const} is simply connected, there has to be an antenna inside a sphere around the position of each element (x, w) in POI_max, with radius r = max{ ||x' - x|| : S(x') = w }. In other words: if we placed an antenna at the position x of (x, w), then r is the furthest we could move away from x and still have signal strength w, so an actual antenna has to be positioned within that radius.
By a similar argument for POI_min, it follows that there can be no antenna within the radius r = min{ ||x' - x|| : S(x') = w } of each element (x, w) in POI_min.
Solution
Instead of solving a nonlinear optimization task, we can intersect spheres to obtain the optimal solution. If k of the spheres around the POI_max positions intersect, we can place a single antenna in the intersection, reducing the number of antennas needed by k-1.
However, each antenna that is placed must satisfy all constraints given by the elements of POI_min. Assuming that antennas are omnidirectional, so the orientation of an antenna doesn't matter, we can do the following (pseudocode; a rough Python sketch follows below):
min_spheres = {(x_i, r_i) : from POI_min}
spheres_to_cover = {(x_i, r_i) : from POI_max}
A = {}
while not is_empty(spheres_to_cover)
    power_set_score = struct // holds score, k
    PS <- construct power set of spheres_to_cover
    for i = 1:number_of_elements(PS)
        k = PS[i]
        if intersection(k) \ min_spheres is not empty
            power_set_score[i].score = |k|
        else
            power_set_score[i].score = 0
        end if
        power_set_score[i].k = k
    end for
    sort(power_set_score) // sort by score, biggest first
    A <- add arbitrary point in (intersection(power_set_score[1].k) \ min_spheres)
    spheres_to_cover = spheres_to_cover \ power_set_score[1].k
end while
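A rough Python sketch of this greedy loop is below (assuming SciPy is available). The intersection test is a crude numerical search rather than an exact geometric one, and all names and the toy data are illustrative:

import numpy as np
from itertools import combinations
from scipy.optimize import minimize

def common_point(spheres, forbidden):
    # Try to find a point inside all `spheres` and outside all `forbidden`
    # spheres; each sphere is a (center, radius) pair. Returns None on failure.
    centers = np.array([c for c, _ in spheres])
    x0 = centers.mean(axis=0)  # start the search at the centroid

    def worst_violation(p):
        inside = max(np.linalg.norm(p - c) - r for c, r in spheres)
        outside = max((r - np.linalg.norm(p - c) for c, r in forbidden),
                      default=-np.inf)
        return max(inside, outside)  # <= 0 means p satisfies everything

    res = minimize(worst_violation, x0, method='Nelder-Mead')
    return res.x if worst_violation(res.x) <= 0 else None

def place_antennas(poi_max, poi_min):
    remaining = list(range(len(poi_max)))
    antennas = []
    while remaining:
        for size in range(len(remaining), 0, -1):  # biggest subsets first
            hit = None
            for idx in combinations(remaining, size):
                p = common_point([poi_max[i] for i in idx], poi_min)
                if p is not None:
                    hit = (idx, p)
                    break
            if hit:
                idx, p = hit
                antennas.append(p)
                remaining = [i for i in remaining if i not in idx]
                break
        else:
            raise RuntimeError("some POI_max sphere cannot be served at all")
    return antennas

# Toy example: two overlapping POI_max spheres, one POI_min exclusion zone.
poi_max = [(np.array([0.0, 0.0, 0.0]), 1.0), (np.array([1.0, 0.0, 0.0]), 1.0)]
poi_min = [(np.array([0.0, 3.0, 0.0]), 0.5)]
print(place_antennas(poi_max, poi_min))  # one antenna should suffice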
On the other hand, you have only given an example problem, so this solution may not be applicable or broad enough for your case; I did make a few assumptions. Being more specific in the question might get you an even better answer.