Consider the hypothetical neural network here
$o_1$ is the output of neuron 1.
$o_2$ is the output of neuron 2.
$w_1$ is the weight of the connection between neurons 1 and 3.
$w_2$ is the weight of the connection between neurons 2 and 3.
So the input to neuron 3 is $i = o_1 w_1 + o_2 w_2$.
Let the activation function of neuron 3 be the sigmoid function $f(x) = \dfrac{1}{1+e^{-x}}$, and let the threshold value of neuron 3 be $\theta$.
Therefore, the output of neuron 3 will be $f(i)$ if $i \geq \theta$, and $0$ if $i < \theta$.
Am I correct?
Thresholds are used for binary neurons (I forget the technical name), whereas biases are used for sigmoid (and pretty much all modern) neurons. Your understanding of the threshold is correct, but again, this is used in neurons whose output is either 1 or 0, which is not very useful for learning (optimization). With a sigmoid neuron, you simply add the bias (previously the threshold, but moved to the other side of the equation), so your output would be f(weight * input + bias). All the sigmoid function is doing (for the most part) is limiting your output to a value between 0 and 1.
I do not think this is the place to ask this sort of question; you will find lots of NN resources online. For your simple case, each link has a weight, so basically the input of neuron 3 is:
Neuron3Input = Neuron1Output * WeightOfLinkNeuron1To3 + Neuron2Output * WeightOfLinkNeuron2To3 + Bias
Then, to get the output, just use the activation function: Neuron3Output = F_Activation(Neuron3Input)
O3 = F(O1 * W1 + O2 * W2 + Bias)
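A minimal runnable sketch of exactly that formula (in Python, with made-up values for the outputs, weights and bias):

```python
import math

def sigmoid(x):
    # Logistic activation: maps any real number into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical values for the outputs of neurons 1 and 2, the weights, and the bias
o1, o2 = 0.5, 0.8
w1, w2 = 0.4, -0.3
bias = 0.1  # the bias plays the role of the threshold moved to the other side

# Input to neuron 3 and its output
i = o1 * w1 + o2 * w2 + bias
o3 = sigmoid(i)
print(o3)   # ~0.515
```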
I am currently teaching myself the concept of neural networks, and I am working with the very good PDF from
http://neuralnetworksanddeeplearning.com/chap1.html
There are also a few exercises I did, but there is one exercise I really don't understand, or at least one step of it.
Task:
There is a way of determining the bitwise representation of a digit by adding an extra layer to the three-layer network above. The extra layer converts the output from the previous layer into a binary representation, as illustrated in the figure below. Find a set of weights and biases for the new output layer. Assume that the first 3 layers of neurons are such that the correct output in the third layer (i.e., the old output layer) has activation at least 0.99, and incorrect outputs have activation less than 0.01.
I also found the solution, as can be seen in the second image.
I understand why the matrix has to have this shape, but I really struggle to understand the step where the user calculates
0.99 + 3*0.01
4*0.01
I really don't understand these two steps. I would be very happy if someone could help me understand this calculation.
Thank you very much for your help.
The output of the previous layer is 10x1 (call it x). The weight matrix W is 4x10, so the new output layer will be 4x1. There are two assumptions first:
First, x is 1 in only one row, e.g. x^T = [1 0 0 0 0 0 0 0 0 0]. If you multiply this vector with the matrix W, your output will be y^T = [0 0 0 0], because there is only one 1 in x; after multiplication by W, that single 1 is multiplied by the 0th column of W, which is all zeros.
The second assumption is: what if x is not exactly 1 anymore? Instead of a one, x can be x^T = [0.99 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01]. If you multiply this x with the first row of W, the result is 0.05 (I believe there is a typo here). When x^T = [0.01 0.99 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01], multiplication with the first row of W gives 1.03, because:
0.01*0 + 0.99*1 + 0.01*0 + 0.01*1 + 0.01*0 + 0.01*1 + 0.01*0 + 0.01*1 + 0.01*0 + 0.01*1 = 1.03
So I believe there is a typo: the author probably assumed 4 ones in the first row of W, which is not true, because there are 5 ones. If there really were 4 ones in the first row, then the results would indeed be 0.04 for 0.99 in the first row of x and 1.02 for 0.99 in the second row of x.
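To make the arithmetic concrete, here is a small check in Python/numpy (the row below is the one assumed above: ones for digits 1, 3, 5, 7, 9, i.e. the least-significant output bit):

```python
import numpy as np

# First row of the 4x10 weight matrix: 1 for every digit whose
# least-significant bit is set (digits 1, 3, 5, 7, 9), 0 otherwise
w_row = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])

# "Digit 0" activation pattern: 0.99 at position 0, 0.01 elsewhere
x0 = np.full(10, 0.01)
x0[0] = 0.99
print(w_row @ x0)   # 0.05  (the 0.99 meets a 0, the five 1s each meet 0.01)

# "Digit 1" activation pattern: 0.99 at position 1, 0.01 elsewhere
x1 = np.full(10, 0.01)
x1[1] = 0.99
print(w_row @ x1)   # 1.03  (0.99 * 1, plus the four remaining 1s * 0.01)
```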
I have trained an RNN in Keras. Now, I want to get the values of the trained weights:
model = Sequential()
model.add(SimpleRNN(27, return_sequences=True, input_shape=(None, 27), activation='softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])
model.get_weights()
This gives me 2 arrays of shape (27,27) and 1 array of shape (27,1). I am not getting the meaning of these arrays. Also, shouldn't I get 2 more arrays, of shape (27,27) and (27,1), that calculate the hidden state 'a' activation? How can I get these weights?
The arrays returned by model.get_weights() directly correspond to the weights used by SimpleRNNCell. They include:
The kernel matrix of size (input_shape[-1], units). In your case, input_shape=(None, 27) and units=27, so it's (27, 27). The kernel gets multiplied by the input.
The recurrent_kernel matrix of size (units, units), which also happens to be (27, 27). This matrix gets multiplied by the previous state.
The bias array of shape (units,) == (27,).
These arrays correspond to the standard formula:
# W = kernel
# U = recurrent_kernel
# B = bias
output = new_state = act(W * input + U * state + B)
Note that the Keras implementation uses a single bias vector, so all in all there are exactly three arrays.
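As an illustration (not the actual Keras source), one step of this recurrence can be reproduced by hand from the arrays returned by get_weights() of the model defined in the question; the input vector and softmax helper below are just assumptions for demonstration:

```python
import numpy as np

kernel, recurrent_kernel, bias = model.get_weights()

def softmax(z):
    # The activation chosen in the question's model
    e = np.exp(z - z.max())
    return e / e.sum()

x_t = np.random.rand(27)   # one (made-up) input vector at time step t
state = np.zeros(27)       # previous hidden state; all zeros at t = 0

# One SimpleRNN step: act(W * input + U * state + B)
new_state = softmax(x_t @ kernel + state @ recurrent_kernel + bias)
```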
I constantly get the error "Total number of RBF neurons must be some integer to the power of 'dimensions'" when using the method SetRBFCentersAndWidthsEqualSpacing in C#.
Can someone who is familiar with RBF networks in Encog check line 232 in RBFNetwork.cs? I think there may be a bug, or I am missing something:
var expectedSideLength = (int) Math.Pow(totalNumHiddenNeurons, 1.0d/dimensions);
double cmp = Math.Pow(totalNumHiddenNeurons, 1.0d/dimensions);
if (expectedSideLength != cmp) // -> error
These two variables can't be equal in general, because the (int) cast truncates the number. It's a coincidence that it works for the XOR example; it won't work with a different dimension, like 19 for example.
This is how I create RBF network:
dataSet is VersatileMLDataSet
RBFNetwork n = new RBFNetwork(dataSet.CalculatedInputSize, dataSet.Count, 1, RBFEnum.Gaussian);
n.SetRBFCentersAndWidthsEqualSpacing(0, 1, RBFEnum.Gaussian, 2.0/(dataSet.CalculatedInputSize * dataSet.CalculatedInputSize), true);
My dataset has 19 attributes (dimensions) and 731 records.
The number of hidden neurons must be an integer raised to the power of the number of input neurons. So if you have 3 input attributes and a window size of 2, the number of hidden neurons would be some integer (say 3) raised to the power of 6 (3 x 2), i.e. 729. This limits the number of input attributes and window size, as the number of hidden neurons gets very large very quickly.
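For illustration (plain Python, not Encog), the only hidden-neuron counts that pass that check for 19 inputs are exact 19th powers, which is why 731 fails:

```python
dimensions = 19  # 19 input attributes, as in the question

# Valid hidden-neuron counts are k ** dimensions for some integer k
for k in range(1, 4):
    print(k, k ** dimensions)
# 1 -> 1
# 2 -> 524288
# 3 -> 1162261467
# 731 (the record count used as the hidden-neuron count) is not a 19th power, hence the error.
```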
From this site,
The output node has a "threshold" t.
Rule:
If summed input ≥ t, then it "fires" (output y = 1).
Else (summed input < t) it doesn't fire (output y = 0).
How does y equal zero? Any ideas appreciated.
Neural networks have a so-called "activation function"; it's usually some form of sigmoid-like function that maps the inputs into separate outputs.
http://zephyr.ucd.ie/mediawiki/images/b/b6/Sigmoid.png
For you it happens to be either 0 or 1, using a comparison instead of a sigmoid function,
so your activation curve will be even sharper than the graph above. In said graph, your t, the threshold, is 0 on the X axis.
So, as pseudocode:
sum = w1 * I1 + w2 * I2 + ... + wn * In
sum is the weighted sum of all the inputs to the neuron; now all you have to do is compare that sum to t, the threshold:
if sum >= t then y = 1 // Your neuron is activated
else y = 0
You can use the last neuron's output as the network's output to classify something as 1/0, true/false, etc.
If you're studying NNs, I'd suggest you start with the XOR problem, then it will all make sense.
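For example, a single threshold neuron with hand-picked weights can already act as an AND gate (a sketch in Python; XOR, by contrast, needs more than one such neuron, which is what makes it a good exercise):

```python
def threshold_neuron(inputs, weights, t):
    # Weighted sum of the inputs, then compare against the threshold t
    s = sum(w * i for w, i in zip(weights, inputs))
    return 1 if s >= t else 0

# AND gate: fires (outputs 1) only when both inputs are 1
for i1 in (0, 1):
    for i2 in (0, 1):
        print(i1, i2, '->', threshold_neuron([i1, i2], [1, 1], t=2))
```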
I am using 64-bit MATLAB with 32 GB of RAM (just so you know).
I have a file (vector) of 1.3 million numbers (integers). I want to make another vector of the same length, where each point is a weighted average of the entire first vector, weighted by the inverse distance from that position (actually it's position^-0.1, not ^-1, but for example purposes). I can't use MATLAB's 'filter' function, because it can only average things before the current point, right? To explain more clearly, here's an example with 3 elements:
data = [ 2 6 9 ]
weights = [ 1 1/2 1/3; 1/2 1 1/2; 1/3 1/2 1 ]
results=data*weights= [ 8 11.5 12.666 ]
i.e.
8 = 2*1 + 6*1/2 + 9*1/3
11.5 = 2*1/2 + 6*1 + 9*1/2
12.666 = 2*1/3 + 6*1/2 + 9*1
So each point in the new vector is the weighted average of the entire first vector, weighting by 1/(distance from that position+1).
I could just remake the weight vector for each point, then calculate the results vector element by element, but this requires 1.3 million iterations of a for loop, each of which contains 1.3 million multiplications. I would rather use straight matrix multiplication, multiplying a 1 x 1.3M vector by a 1.3M x 1.3M matrix, which works in theory, but I can't load a matrix that large.
I am then trying to make the matrix using a shell script and index it in matlab so only the relevant column of the matrix is called at a time, but that is also taking a very long time.
I don't have to do this in MATLAB, so any advice people have about handling such large numbers and getting averages would be appreciated. Since I am using a weight of ^-0.1, and not ^-1, it does not drop off that fast: the millionth point is still weighted at 0.25 compared to the original point's weighting of 1, so I can't just cut the window off as it gets big, either.
Hope this was clear enough?
Here is the code for the answer below (so it can be formatted?):
data = load('/Users/mmanary/Documents/test/insertion.txt');
data = data.';
total = length(data);
x = 1:total;
% Zero-pad so the circular FFT convolution behaves like a linear one
datapad = [zeros(1,total) data];
% Distance-based kernel, normalised to sum to 1
weights = ([(total+1):-1:2 1:total]).^(-.4);
weights = weights/sum(weights);
% Convolution via FFT, keeping only the original-length portion
Fdata = fft(datapad);
Fweights = fft(weights);
Fresults = Fdata .* Fweights;
results = ifft(Fresults);
results = results(1:total);
plot(x,results)
The only sensible way to do this is with FFT convolution, which underpins the filter function and similar. It is very easy to do manually:
% Simulate some data
n = 10^6;
x = randi(10,1,n);
xpad = [zeros(1,n) x];
% Setup smoothing kernel
k = 1 ./ [(n+1):-1:2 1:n];
% FFT convolution
Fx = fft(xpad);
Fk = fft(k);
Fxk = Fx .* Fk;
xk = ifft(Fxk);
xk = xk(1:n);
Takes less than half a second for n=10^6!
This is probably not the best way to do it, but with lots of memory you could definitely parallelize the process.
You can construct sparse matrices consisting of the entries of your original weight matrix which have value i^(-1) (where i = 1 .. 1.3 million), multiply each with your original vector, and sum all the results together.
So for your example the product would be essentially:
a = rand(3,1);
b1 = [1 0 0;
0 1 0;
0 0 1];
b2 = [0 1 0;
1 0 1;
0 1 0] / 2;
b3 = [0 0 1;
0 0 0;
1 0 0] / 3;
c = sparse(b1) * a + sparse(b2) * a + sparse(b3) * a;
Of course, you wouldn't construct the sparse matrices this way. If you wanted fewer iterations of the inner loop, you could have more than one of the i's in each matrix.
Look into the parfor loop in MATLAB: http://www.mathworks.com/help/toolbox/distcomp/parfor.html
I can't use MATLAB's 'filter' function, because it can only average things before the current point, right?
That is not correct. You can always add or remove samples (i.e., zeros) from your data or from the filtered data. Since filtering with filter (you can also use conv, by the way) is a linear operation, it won't change the result: adding and removing zeros does nothing, and linearity then allows you to swap the order to add samples -> filter -> remove samples.
Anyway, in your example, you can take the averaging kernel to be:
weights = 1 ./ [3 2 1 2 3]; % this kernel introduces a delay of 2 samples
and then simply:
result = filter(weights, 1, [data, zeros(1,3)]); % or conv(data, weights)
% remove the delay introduced by the kernel
result = result(3:end-1);
You considered only 2 options:
Multiplying a 1.3M x 1.3M matrix with a vector once, or multiplying two 1.3M vectors 1.3M times.
But you can divide your weight matrix into as many sub-matrices as you wish, and do a multiplication of an n x 1.3M matrix with the vector 1.3M/n times.
I assume the fastest will be the smallest number of iterations, with n chosen so that the largest sub-matrix still fits in your memory without making your computer start swapping pages to your hard drive.
With your memory size you should start with n = 5000.
You can also make it faster by using parfor (with n divided by the number of processors).
The brute force way will probably work for you, with one minor optimisation in the mix.
The ^-0.1 operations to create the weights will take a lot longer than the + and * operations to compute the weighted means, but you re-use the weights across all the million weighted-mean operations. The algorithm becomes (see the sketch after the cost estimate below):
Create a weightings vector with all the weights any computation would need:
weights = (abs(-n:n) + 1).^(-0.1)
For each element in the vector:
Index the relevant portion of the weights vector to consider the current element as the 'centre'.
Perform the weighted-mean with the weights portion and the entire vector. This can be done with a fast vector dot-multiply followed by a scalar division.
The main loop does n^2 additions and n^2 multiplications. With n equal to 1.3 million, that's about 3.4 trillion operations. A single core of a modern 3 GHz CPU can do, say, 6 billion additions/multiplications a second, so that comes out to around 10 minutes. Add time for indexing the weights vector and overheads, and I still estimate you could come in under half an hour.
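For what it's worth, here is that idea sketched in Python/numpy rather than MATLAB (the data and n below are made up; test with a small n first, since the loop is O(n^2)):

```python
import numpy as np

n = 10_000                    # use the full 1.3 million only once it works
data = np.random.rand(n)

# Every weight any element will ever need: (distance + 1) ** -0.1
weights = (np.abs(np.arange(-(n - 1), n)) + 1.0) ** -0.1

results = np.empty(n)
for j in range(n):
    # Slice the weights so that distance 0 lines up with element j
    w = weights[n - 1 - j : 2 * n - 1 - j]
    results[j] = data @ w / w.sum()   # weighted mean of the whole vector
```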