Currently I am trying to implement Resilient Propagation for my network. I'm doing this based on the encog implementation, but there is one thing I don't understand:
The documentation for RPROP and iRPROP+ says when change > 0: weightChange = -sign(gradient) * delta
The source code in lines 298 and 366 does not have a minus!
Since I assume both are correct in some way: why is there a difference between the two?
And concerning the gradient: I'm using tanh as the activation in the output layer. Is this the correct calculation of the gradient?
gradientOutput = (1 - lastOutput[j] * lastOutput[j]) * (target[j] - lastOutput[j]);
After re-reading the relevant papers and checking a textbook, I think the Encog documentation is not correct at this point. Why don't you just try it out by temporarily adding the minus signs in the source code? If you use the same initial weights, you should get exactly the same results, if the documentation were correct. But in the end it only matters how you use the weightUpdate variable. If the author of the documentation is used to subtracting the weightUpdate from the weights instead of adding it, this will work.
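To make that point concrete, here is a toy illustration (MATLAB-style, made-up numbers, not Encog code): if, as speculated above, the documentation's author subtracts the update while the source code adds it, the minus sign cancels out and the resulting weight is the same either way.
g = 0.3; delta = 0.1; w = 0.5;
wDoc = w - (-sign(g) * delta);   % documentation: weightChange = -sign(gradient)*delta, then subtract it
wSrc = w + ( sign(g) * delta);   % source code:   weightChange =  sign(gradient)*delta, then add it
% wDoc and wSrc are both 0.6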
Edit: I revisited the part about the gradient calculation in my original answer.
First, here is a brief explanation of how you can picture the gradient for the weights in your output layer. You start by calculating the error between your outputs and the target values.
What you are now trying to do is to "blame" those neurons in the previous layer that were active. Imagine the output neuron saying "Well, I have an error here, who is responsible?". The neurons of the previous layer are responsible. Depending on whether the output is too small or too large compared to the target value, the weights to the neurons of the previous layer are increased or decreased, in proportion to how active those neurons have been.
x is the activation of a neuron in the hidden layer.
o is the activation of the output neuron.
φ is the activation function of the output neuron, φ' its derivative.
Edit2: Corrected the part below. Added matrix style computation of backpropagation.
The error at each output neuron j is:
(1) δ_out,j = φ'(o_j) * (t_j - o_j)
The gradient for the weight connecting the hidden neuron i with the output neuron j:
(2) grad_i,j = x_i * δ_out,j
The backpropagated error at each hidden neuron i with the weights w:
(3) δ_hid,i = φ'(x_i) * Σ_j w_i,j * δ_out,j
By repeatedly applying formulas (2) and (3), you can backpropagate all the way back to the input layer.
Written as loops, for a single training sample:
The error at each output neuron j is:
for(int j=0; j < numOutNeurons; j++) {
errorOut[j] = activationDerivative(o[j])*(t[j] - o[j]);
}
The gradient for the weight connecting the hidden neuron i with the output neuron j:
for(int i=0; i < numHidNeurons; i++) {
for(int j=0; j < numOutNeurons; j++) {
grad[i][j] = x[i] * errorOut[j];
}
}
The backpropagated error at each hidden neuron i:
for(int i=0; i < numHidNeurons; i++) {
errorHid[i] = 0;
for(int j=0; j < numOutNeurons; j++) {
errorHid[i] += weights[i][j] * errorOut[j];
}
errorHid[i] *= activationDerivative(x[i]);
}
In fully connected multilayer perceptrons, without convolution or anything like that, you can use standard matrix operations, which is a lot faster.
Assuming each of your samples is a row in your input matrix and the columns are its attributes, you can propagate the input through your network like this:
activations[0] = input;
for(int i=0; i < numWeightMatrices; i++){
activations[i+1] = activations[i].dot(weightMatrices[i]);
activations[i+1] = activationFunction(activations[i+1]);
}
Backpropagation then becomes:
n = numWeightMatrices;
error = activationDerivative(activations[n]) * (target - activations[n]);
for (int l=n-1; l >= 0; l--){
gradient[l] = activations[l].transposed().dot(error);
if (l > 0) {
error = error.dot(weightMatrices[l].transposed());
error = activationDerivative(activations[l])*error;
}
}
I omitted the bias neuron in the explanations above. In the literature it is recommended to model the bias neuron as an additional column in each activation matrix which is always 1.0. You will need to deal with some slice assigns. When using the matrix backpropagation loop, do not forget to set the error at the position of the bias to 0 before each step!
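To illustrate what that looks like, here is a minimal MATLAB-style sketch of one possible convention (all sizes and names below are made up for illustration): every activation matrix carries a trailing bias column that is forced back to 1.0 after each layer (the slice assign), and the corresponding error column is zeroed during backpropagation.
X  = rand(5, 3);  T = rand(5, 2);      % 5 samples, 3 features, 2 targets
W1 = rand(4, 5) * 0.2 - 0.1;           % (3 inputs + bias) x (4 hidden units + bias column)
W2 = rand(5, 2) * 0.2 - 0.1;           % (4 hidden units + bias) x 2 outputs
A1 = [X ones(5, 1)];                   % bias column, always 1.0
A2 = tanh(A1 * W1);  A2(:, end) = 1;   % slice assign: restore the bias column
A3 = tanh(A2 * W2);                    % network output
err2  = (1 - A3.^2) .* (T - A3);       % error at the output layer
grad2 = A2' * err2;                    % gradient for W2
err1  = (err2 * W2') .* (1 - A2.^2);   % backpropagate through W2
err1(:, end) = 0;                      % set the error at the bias position to 0
grad1 = A1' * err1;                    % gradient for W1 (its bias-unit column stays 0)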
private float resilientPropagation(int i, int j){
float gradientSignChange = sign(prevGradient[i][j]*gradient[i][j]);
float delta = 0;
if(gradientSignChange > 0){
// same sign as last time: speed up by growing the step size, capped at maxDelta
float change = Math.min((prevChange[i][j]*increaseFactor), maxDelta);
delta = sign(gradient[i][j])*change;
prevChange[i][j] = change;
prevGradient[i][j] = gradient[i][j];
}
else if(gradientSignChange < 0){
// sign flipped: we overshot a minimum, shrink the step size and revert the previous update
float change = Math.max((prevChange[i][j]*decreaseFactor), minDelta);
prevChange[i][j] = change;
delta = -prevDelta[i][j];
prevGradient[i][j] = 0; // forces the next call into the == 0 branch
}
else if(gradientSignChange == 0){
// previous gradient was zeroed (or the gradient is zero): take a plain step with the current step size
float change = prevChange[i][j];
delta = sign(gradient[i][j])*change;
prevGradient[i][j] = gradient[i][j];
}
prevDelta[i][j] = delta;
return delta;
}
gradient[i][j] = error[j]*layerInput[i];
weights[i][j]= weights[i][j]+resilientPropagation(i,j);
I have a large matrix of random values (e.g. 200,000 x 6,000) between 0-1 named 'allGSR.'
I used the following code to create a logical array (?) where 1 represents numbers less than .05
sig = (allGSR < .05);
What I'd like to do is to return an array of size 1 x 200,000 called maxSIG where each element holds the MAXIMUM number of sequential ones in the corresponding row. So for example, if in row 1, columns 3-6 are ones, that is 4 ones in a row, and if columns 100-109 are ones, that is 10 ones in a row, and if that is the maximum number of ones in a row, I would like the first element of maxSIG to be the value '10'.
I have been doing this with for loops, if statements, and counters; this is ugly and tedious, and I was wondering if there is an easier or more efficient way.
Thank you for any insight.
EDIT: Whoops, should probably share the loop.
EDIT 2: So I just wrote out what my basic code is with a smaller (100 x 6,000) matrix. This code should run. Sorry for the inconvenience.
GSR = 6000;
samples = 100;
allGSR = zeros(samples, GSR);
for x = 1:samples
y = rand(GSR, 1)'; %Transpose so it's 1x6000 and not 6000x1
allGSR(x,:) = y;
end
countSIG = zeros(samples,1);
abovethreshold = (allGSR < .05); %.05 can be replaced by whatever
for z = 1:samples
count = 0;
holdArray = zeros(1,GSR);
for a = 1:GSR
if abovethreshold(z,a) == true
count = count + 1;
else
count = 0;
end
holdArray(1,a) = count;
end
maxrun = max(holdArray);
countSIG(z,1) = maxrun;
end
Here's one approach using diff, find & accumarray -
append_col = zeros(size(abovethreshold,1),1);
df = diff([append_col abovethreshold append_col],[],2).';
[R1,C1] = find(df==1);
[R2,C2] = find(df==-1);
out = zeros(samples,1);
out(1:max(C1)) = accumarray(C1,R2 - R1,[],@max);
In the code posted above, we create a wide array from abovethreshold and then transpose it. From a performance point of view, the transpose operation might not be the best thing to do. So instead, we can keep the array as it is and rearrange the indexing around it, like so -
append_col = zeros(size(abovethreshold,1),1);
df = diff([append_col abovethreshold append_col],[],2);
[R1,C1] = find(df==1);
[R2,C2] = find(df==-1);
[~,idx1] = sort(R1);
[~,idx2] = sort(R2);
out = zeros(samples,1);
out(1:max(R1)) = accumarray(R1(idx1),C2(idx2) - C1(idx1),[],@max);
If you're worried about memory allocation, speed, etc. on huge arrays, I'd just do your same basic algorithm in C++. Throw this in a file like myfunction.cpp and compile with mex -largeArrayDims myfunction.cpp.
You can then call it from MATLAB with counts = myfunction(allGSR, .05);
I haven't tested this beyond that it compiles.
#include "mex.h"
#include "matrix.h"
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[]) {
if(nrhs != 2)
mexErrMsgTxt("Invalid number of inputs. Shoudl be 2 input argument.");
if(nlhs != 1)
mexErrMsgTxt("Invalid number of outputs. Should be 1 output arguments.");
if(!mxIsDouble(prhs[0]) || !mxIsDouble(prhs[1]))
mexErrMsgTxt("First two arguments are not doubles");
const mxArray *input_array = prhs[0];
const mxArray *threshold_array = prhs[1];
size_t input_rows = mxGetM(input_array);
size_t input_cols = mxGetN(input_array);
size_t threshold_rows = mxGetM(threshold_array);
size_t threshold_cols = mxGetN(threshold_array);
if(threshold_rows != 1 || threshold_cols != 1)
mexErrMsgTxt("threshold array should be a scalar");
mxArray *output_array = mxCreateDoubleMatrix(1, input_rows, mxREAL);
double *output_data = mxGetPr(output_array);
double *input_data = mxGetPr(input_array);
double threshold = *mxGetPr(threshold_array);
for(int z = 0; z < input_rows; z++) {
int count = 0;
int max_count = 0;
for(int a = 0; a < input_cols; a++) {
if(input_data[z + a * input_rows] < threshold) {
count++;
} else {
if(count > max_count)
max_count = count;
count = 0;
}
}
if(count > max_count)
max_count = count;
output_data[z] = max_count;
}
plhs[0] = output_array;
}
I'm not sure whether you want to check for values above or below the threshold. Whatever you want, just change the comparison input_data[z + a * input_rows] < threshold to whichever comparison operator you need.
Here's a one-liner, albeit slow since cellfun is a loop:
maxSIG = cellfun(@(x) max([0, arrayfun(@(s) s.Area, regionprops(x, 'Area'))]), mat2cell(abovethreshold, ones(size(abovethreshold,1),1), size(abovethreshold,2)));
The Image Processing Toolbox function regionprops identifies connected groups of 1's in a logical matrix. By operating on each row of your logical matrix (abovethreshold), and returning specifically the Area property, we get the length of each connected segment of 1's in that row. The max function then picks out the longest run in each row (the 0 guards against rows that contain no 1's at all).
Note the mat2cell call is necessary to split the logical matrix into a cell array of rows, so that cellfun can be called.
I have a 900×1 vector of values (in MATLAB). Each 9 consecutive values should be averaged, without overlap, resulting in a 100×1 vector of values. The problem is that the averaging should be weighted based on a weighting vector of [1 2 1;2 4 2;1 2 1]. Is there any efficient way to do that averaging? I've heard about the conv function in MATLAB; is it helpful?
conv works by sliding a kernel through your data. But in your case, you need the mask to be jumping through your data, so I don't think conv will work for you.
If you want to use existing MATLAB functions, you can do this (I have to assume your weighting matrix has only one dimension):
kernel = [1;2;1;2;4;2;1;2;1];
in_matrix = reshape(in_matrix, 9, 100);
base = sum(kernel);
out_matrix = bsxfun(@times, in_matrix, kernel);
result = sum(out_matrix,1)/base;
I don't know if there is any clever way to speed this up. bsxfun handles the singleton expansion, but it only performs element-wise operations, so the reduction still has to be done by the separate sum.
A faster way would be to use MEX. Open a new file in the editor, paste the following code and save the file as weighted_average.c.
#include "mex.h"
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
double *in_matrix, *kernel, *out_matrix, base;
int niter;
size_t nrows_data, nrows_kernel;
/* Get number of element along first dimension of input matrix. */
nrows_kernel = mxGetM(prhs[1]);
nrows_data = mxGetM(prhs[0]);
/* Create output matrix*/
plhs[0] = mxCreateDoubleMatrix((mwSize)nrows_data/nrows_kernel,1,mxREAL);
/* Get a pointer to the real data */
in_matrix = mxGetPr(prhs[0]);
kernel = mxGetPr(prhs[1]);
out_matrix = mxGetPr(plhs[0]);
/* Sum the elements in weighting array */
base = 0;
for (int i = 0; i < nrows_kernel; i +=1)
{
base += kernel[i];
}
/* Perform calculation */
niter = nrows_data/nrows_kernel;
for (int i = 0; i < niter ; i += 1)
{
for (int j = 0; j < nrows_kernel; j += 1)
{
out_matrix[i] += in_matrix[i*nrows_kernel+j]*kernel[j];
}
out_matrix[i] /= base;
}
}
Then in the command window, type
mex weighted_average.c
To use it:
result = weighted_average(input, kernel);
Note that both input and kernel have to be M x 1 matrices. On my computer, the first method took 0.0012 seconds and the second method took 0.00007 seconds. That's an order of magnitude faster than the first method.
Hi everyone, I have created a neural network with 1600 inputs, one hidden layer with a varying number of neuron nodes, and 24 output neurons.
My code shows that I can decrease the error in each epoch, but the output of the hidden layer is always 1. Because of this, the adjusted weights always produce the same result for my testing data.
I have tried different numbers of neuron nodes and learning rates in the ANN and also randomly initialized my initial weights. I use the sigmoid function as my activation function, since my output is either 1 or 0 for the different outputs.
May I know what the main reason is that the output of the hidden layer is always 1, and how I should solve it?
My purpose for this neural network is to recognize 24 hand shapes for the alphabet; I am trying intensity data in the first phase of my project.
I have tried 30 hidden neuron nodes, also 100 and even 1000, but the output of the hidden layer is still 1. Because of this, all of the outcomes on the testing data are always similar.
I added the code for my network below.
Thanks
g = inline('logsig(x)');
[row, col] = size(input);
numofInputNeurons = col;
weight_input_hidden = rand(numofInputNeurons, numofFirstHiddenNeurons);
weight_hidden_output = rand(numofFirstHiddenNeurons, numofOutputNeurons);
epochs = 0;
errorMatrix = [];
while(true)
if(totalEpochs > 0 && epochs >= totalEpochs)
break;
end
totalError = 0;
epochs = epochs + 1;
for i = 1:row
targetRow = zeros(1, numofOutputNeurons);
targetRow(1, target(i)) = 1;
hidden_output = g(input(1, 1:end)*weight_input_hidden);
final_output = g(hidden_output*weight_hidden_output);
error = abs(targetRow - final_output);
error = sum(error);
totalError = totalError + error;
if(error ~= 0)
delta_final_output = learningRate * (targetRow - final_output) .* final_output .* (1 - final_output);
delta_hidden_output = learningRate * (hidden_output) .* (1-hidden_output) .* (delta_final_output * weight_hidden_output');
for m = 1:numofFirstHiddenNeurons
for n = 1:numofOutputNeurons
current_changes = delta_final_output(1, n) * hidden_output(1, m);
weight_hidden_output(m, n) = weight_hidden_output(m, n) + current_changes;
end
end
for m = 1:numofInputNeurons
for n = 1:numofFirstHiddenNeurons
current_changes = delta_hidden_output(1, n) * input(1, m);
weight_input_hidden(m, n) = weight_input_hidden(m, n) + current_changes;
end
end
end
end
totalError = totalError / (row);
errorMatrix(end + 1) = totalError;
if(errorThreshold > 0 && totalEpochs == 0 && totalError < errorThreshold)
break;
end
end
I see a few obvious errors that need fixing in your code:
1) You have no negative weights when initialising. This is likely to get the network stuck. The weight initialisation should be something like:
weight_input_hidden = 0.2 * rand(numofInputNeurons, numofFirstHiddenNeurons) - 0.1;
2) You have not implemented bias. That will severely limit the ability of the network to learn. You should go back to your notes and figure that out; it is usually implemented as an extra column of 1's inserted into the input and activation vectors/matrices before determining the activations of each layer, and there should be a matching extra row (or column, depending on orientation) of weights.
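As a rough sketch of that idea, forward pass only (this is an illustration with made-up sizes using names similar to your code, not a drop-in fix; backpropagation and the weight updates need the matching extra entries as well):
g = @(x) 1 ./ (1 + exp(-x));                                                                % logsig
numofInputNeurons = 1600; numofFirstHiddenNeurons = 30; numofOutputNeurons = 24;
weight_input_hidden  = 0.2 * rand(numofInputNeurons + 1, numofFirstHiddenNeurons) - 0.1;    % +1 row of weights for the bias
weight_hidden_output = 0.2 * rand(numofFirstHiddenNeurons + 1, numofOutputNeurons) - 0.1;   % +1 row of weights for the bias
inputRow = rand(1, numofInputNeurons);                                                      % stand-in for one training sample
hidden_output = g([inputRow 1] * weight_input_hidden);                                      % append the constant bias input
final_output  = g([hidden_output 1] * weight_hidden_output);                                % same trick before the output layer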
3) Your delta for output layer is wrong. This line
delta_final_output = learningRate * (targetRow - final_output) .* final_output .* (1 - final_output);
. . . is not the delta for the output layer activations. It has some extra unwanted factors.
The correct delta for a log-loss objective function and sigmoid activation in the output layer would be:
delta_final_output = (final_output - targetRow);
There are other possibilities, depending on your objective function, which is not shown. Your original code is close to correct for mean squared error, which would probably still work if you changed the sign and removed the factor of learningRate.
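For reference, a sketch of what that mean-squared-error variant would look like with the same variable names (this assumes you also apply the corrected weight update from point 5 below, which multiplies by -learningRate):
delta_final_output = (final_output - targetRow) .* final_output .* (1 - final_output);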
4) Your delta for hidden layer is wrong. This line:
delta_hidden_output = learningRate * (hidden_output) .* (1-hidden_output) .* (delta_final_output * weight_hidden_output');
. . . is not the delta for the hidden layer activations. You have multiplied by the learningRate for some reason (combined with the other delta that means you have a factor of learningRate squared).
The correct delta would be:
delta_hidden_output = (hidden_output) .* (1-hidden_output) .* (delta_final_output * weight_hidden_output');
5) Your weight update step needs adjusting to match fixes to (3) and (4). These lines:
current_changes = delta_final_output(1, n) * hidden_output(1, m);
would need to be adjusted to get the correct sign and learning-rate multiplier:
current_changes = -learningRate * delta_final_output(1, n) * hidden_output(1, m);
That's 5 bugs from looking through the code; I may have missed some. But I think that's more than enough for now.
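To show how the pieces fit together, here is a rough, simplified sketch of one training step with fixes 1, 3, 4 and 5 applied (toy sizes and made-up sample data; the bias handling from point 2 is omitted for brevity):
g = @(x) 1 ./ (1 + exp(-x));                    % logsig
numIn = 1600; numHid = 30; numOut = 24; learningRate = 0.1;
W1 = 0.2 * rand(numIn, numHid) - 0.1;           % fix 1: weights in [-0.1, 0.1]
W2 = 0.2 * rand(numHid, numOut) - 0.1;
xRow = rand(1, numIn);                          % stand-in for one training sample
targetRow = zeros(1, numOut); targetRow(1) = 1;
hidden_output = g(xRow * W1);
final_output = g(hidden_output * W2);
delta_final_output = final_output - targetRow;                                              % fix 3 (log-loss)
delta_hidden_output = hidden_output .* (1 - hidden_output) .* (delta_final_output * W2');   % fix 4
W2 = W2 - learningRate * (hidden_output' * delta_final_output);                             % fix 5, vectorised
W1 = W1 - learningRate * (xRow' * delta_hidden_output);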
I am analysing Gafchromic filters in a freeware program called ImageJ, which uses a simplified form of Java to write macros.
I have a set of data points that I have successfully fitted with different methods, and I have decided that a third-degree polynomial fits the data best. However, I need to work with the actual curve, so I need to somehow extract the equation/formula of said polynomial. This should be possible, as the variables defining the polynomial are listed on the generated graph, but I can't seem to find a way to extract them in the code.
Here's my code so far:
n = nResults();
x = newArray(n);
for (i=0; i<x.length; i++)
{
x[i] = getResult("Grays ", i);
}
y = newArray(n);
for (i=0; i<y.length; i++)
{
y[i] = getResult("Mean ", i);
}
// Do all possible fits, plot them and add the plots to a stack
setBatchMode(true);
for (i = 0; i < Fit.nEquations; i++) {
Fit.doFit(i, x, y);
Fit.plot();
if (i == 0)
stack = getImageID;
else {
run("Copy");
close();
selectImage(stack);
run("Add Slice");
run("Paste");
}
Fit.getEquation(i, name, formula);
print(""); print(name+ " ["+formula+"]");
print(" R^2="+d2s(Fit.rSquared,3));
for (j=0; j<Fit.nParams; j++)
print(" p["+j+"]="+d2s(Fit.p(j),6));
}
setBatchMode(false);
run("Select None");
rename("Curve Fits");
As hinted above, I already got an answer elsewhere. Nonetheless, I'd like to also keep it here for the record.
Basically, the answer is already included in the original post, as it prints the individual variables into the "Log" window.
For the third-degree polynomial, I could have just used:
Fit.doFit(2, x, y); // 2 is 3rd Degree Polynomial
Fit.plot();
rename("Calibrating curve");
And then the coefficients can be extracted easily, like this:
a = Fit.p(0);
b = Fit.p(1);
c = Fit.p(2);
d = Fit.p(3);
I am searching for a MATLAB implementation of the Moore-Penrose algorithm for computing the pseudoinverse of a matrix.
I tried several algorithms; this one
http://arxiv.org/ftp/arxiv/papers/0804/0804.4809.pdf
looked good at first glance.
However, the problem is that for large elements it produces badly scaled matrices and some internal operations fail. It concerns the following steps:
L=L(:,1:r);
M=inv(L'*L);
I am trying to find a more robust solution that is easy to implement in my other software. Thanks for your help.
I re-implemented one in C# using the Mapack matrix library by Lutz Roeder. Perhaps this, or the Java version, will be useful to you.
/// <summary>
/// The difference between 1 and the smallest exactly representable number
/// greater than one. Gives an upper bound on the relative error due to
/// rounding of floating point numbers.
/// </summary>
const double MACHEPS = 2E-16;
// NOTE: Code for pseudoinverse is from:
// http://the-lost-beauty.blogspot.com/2009/04/moore-penrose-pseudoinverse-in-jama.html
/// <summary>
/// Computes the Moore–Penrose pseudoinverse using the SVD method.
/// Modified version of the original implementation by Kim van der Linde.
/// </summary>
/// <param name="x"></param>
/// <returns>The pseudoinverse.</returns>
public static Matrix MoorePenrosePsuedoinverse(Matrix x)
{
if (x.Columns > x.Rows)
return MoorePenrosePsuedoinverse(x.Transpose()).Transpose();
SingularValueDecomposition svdX = new SingularValueDecomposition(x);
if (svdX.Rank < 1)
return null;
double[] singularValues = svdX.Diagonal;
double tol = Math.Max(x.Columns, x.Rows) * singularValues[0] * MACHEPS;
double[] singularValueReciprocals = new double[singularValues.Length];
for (int i = 0; i < singularValues.Length; ++i)
singularValueReciprocals[i] = Math.Abs(singularValues[i]) < tol ? 0 : (1.0 / singularValues[i]);
Matrix u = svdX.GetU();
Matrix v = svdX.GetV();
int min = Math.Min(x.Columns, u.Columns);
Matrix inverse = new Matrix(x.Columns, x.Rows);
for (int i = 0; i < x.Columns; i++)
for (int j = 0; j < u.Rows; j++)
for (int k = 0; k < min; k++)
inverse[i, j] += v[i, k] * singularValueReciprocals[k] * u[j, k];
return inverse;
}
What is wrong with using the built-in pinv?
Otherwise, you could take a look at the implementation used in Octave. It is not in Octave/MATLAB syntax, but you should be able to port it without much trouble.
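If you do want to roll your own in MATLAB, here is a minimal sketch of the usual SVD-based construction (the same idea pinv and the C# code above use); the function name and the tolerance choice are my own assumptions:
function X = pinv_svd(A, tol)
% Moore-Penrose pseudoinverse via the SVD: invert only the singular values above tol.
[U, S, V] = svd(A, 'econ');
s = diag(S);
if nargin < 2
    tol = max(size(A)) * eps(max(s));   % similar in spirit to MATLAB's default tolerance
end
r = sum(s > tol);                       % numerical rank
X = V(:, 1:r) * diag(1 ./ s(1:r)) * U(:, 1:r)';
end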
Here is the R code I have written (see http://hamedhaseli.webs.com/downloads) to compute the M-P pseudoinverse. I think it is simple enough to be translated into MATLAB code.
pinv<-function(H){
x=t(H) %*% H
s=svd(x)
xp=s$d
for (i in 1:length(xp)){
if (xp[i] != 0){
xp[i]=1/xp[i]
}
else{
xp[i]=0
}
}
return(s$u %*% diag(xp) %*% t(s$v) %*% t(H))
}
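Since the question asks for MATLAB, a possible direct translation of the same idea could look like this (a sketch; it keeps the exact-zero test from the R version, so it is just as sensitive to singular values that are only numerically zero):
function P = pinvMP(H)
% Moore-Penrose pseudoinverse of H via the SVD of H'*H, mirroring the R code above.
[U, S, V] = svd(H' * H);
d = diag(S);
d(d ~= 0) = 1 ./ d(d ~= 0);     % invert only the non-zero singular values
P = U * diag(d) * V' * H';
end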