First, I should say that I'm really new to neural networks and don't understand them very well ;)
I've made my first C# implementation of a backpropagation neural network. I've tested it using XOR and it looks like it works.
Now I would like to change my implementation to use resilient backpropagation (Rprop - http://en.wikipedia.org/wiki/Rprop).
The definition says: "Rprop takes into account only the sign of the partial derivative over all patterns (not the magnitude), and acts independently on each 'weight'."
Could somebody tell me what the partial derivative over all patterns is? And how should I compute this partial derivative for a neuron in a hidden layer?
Thanks a lot
UPDATE:
My implementation is based on this Java code: www_.dia.fi.upm.es/~jamartin/downloads/bpnn.java
My backPropagate method looks like this:
public double backPropagate(double[] targets)
{
    double error, change;

    // calculate error terms for output
    double[] output_deltas = new double[outputsNumber];
    for (int k = 0; k < outputsNumber; k++)
    {
        error = targets[k] - activationsOutputs[k];
        output_deltas[k] = Dsigmoid(activationsOutputs[k]) * error;
    }

    // calculate error terms for hidden
    double[] hidden_deltas = new double[hiddenNumber];
    for (int j = 0; j < hiddenNumber; j++)
    {
        error = 0.0;
        for (int k = 0; k < outputsNumber; k++)
        {
            error = error + output_deltas[k] * weightsOutputs[j, k];
        }
        hidden_deltas[j] = Dsigmoid(activationsHidden[j]) * error;
    }

    // update output weights
    for (int j = 0; j < hiddenNumber; j++)
    {
        for (int k = 0; k < outputsNumber; k++)
        {
            change = output_deltas[k] * activationsHidden[j];
            weightsOutputs[j, k] = weightsOutputs[j, k] + learningRate * change + momentumFactor * lastChangeWeightsForMomentumOutpus[j, k];
            lastChangeWeightsForMomentumOutpus[j, k] = change;
        }
    }

    // update input weights
    for (int i = 0; i < inputsNumber; i++)
    {
        for (int j = 0; j < hiddenNumber; j++)
        {
            change = hidden_deltas[j] * activationsInputs[i];
            weightsInputs[i, j] = weightsInputs[i, j] + learningRate * change + momentumFactor * lastChangeWeightsForMomentumInputs[i, j];
            lastChangeWeightsForMomentumInputs[i, j] = change;
        }
    }

    // calculate error
    error = 0.0;
    for (int k = 0; k < outputsNumber; k++)
    {
        error = error + 0.5 * (targets[k] - activationsOutputs[k]) * (targets[k] - activationsOutputs[k]);
    }
    return error;
}
So can I use the change = hidden_deltas[j] * activationsInputs[i] variable as the gradient (partial derivative) for checking the sign?
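I think "over all patterns" simply means that the derivative is taken for the whole training set in every iteration, i.e. the per-pattern derivatives are summed before the weights are updated... take a look at the RPROP paper.
For the partial derivative: you've already implemented the normal back-propagation algorithm, which is a method for efficiently calculating the gradient. There you calculate the δ values for the individual neurons; multiplied by the activation feeding each weight (your change variable), these are in fact the negative ∂E/∂w values, i.e. the partial derivatives of the global error as a function of the weights.
So instead of scaling the weight update by these values, you keep a step size for each weight and multiply it by one of two constants (η+ or η-), depending on whether the sign of the derivative has changed since the previous iteration. The weight is then moved by that step size in the direction given by the sign alone.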
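The update rule described above can be sketched roughly as follows, in Java. This is a minimal sketch of the variant without weight backtracking, not the code of any particular library. The class and method names are hypothetical, the constants (1.2, 0.5, step bounds of 1e-6 and 50) are just commonly used defaults, and `gradient` is assumed to hold ∂E/∂w summed over all training patterns of one epoch (for the backPropagate method in the question, that would be the negative of `change`, accumulated over the patterns).

```java
import java.util.Arrays;

/** Hypothetical helper: per-weight Rprop state and update (no weight backtracking). */
class RpropSketch {
    static final double ETA_PLUS = 1.2;   // step growth factor (assumed default)
    static final double ETA_MINUS = 0.5;  // step shrink factor (assumed default)
    static final double STEP_MAX = 50.0;  // assumed upper bound on the step size
    static final double STEP_MIN = 1e-6;  // assumed lower bound on the step size

    final double[] step;          // current step size, one per weight
    final double[] prevGradient;  // dE/dw from the previous epoch

    RpropSketch(int weightCount, double initialStep) {
        step = new double[weightCount];
        prevGradient = new double[weightCount];
        Arrays.fill(step, initialStep);
    }

    /** gradient[i] must be dE/dw_i summed over ALL training patterns of the epoch. */
    void update(double[] weights, double[] gradient) {
        for (int i = 0; i < weights.length; i++) {
            double signProduct = prevGradient[i] * gradient[i];
            if (signProduct > 0) {
                // Same sign as last epoch: grow the step.
                step[i] = Math.min(step[i] * ETA_PLUS, STEP_MAX);
            } else if (signProduct < 0) {
                // Sign flipped: we jumped over a minimum, so shrink the step.
                step[i] = Math.max(step[i] * ETA_MINUS, STEP_MIN);
            }
            // Only the sign of the gradient is used, never its magnitude.
            weights[i] -= Math.signum(gradient[i]) * step[i];
            prevGradient[i] = gradient[i];
        }
    }
}
```

Each weight matrix would get its own flattened state of this kind, and `update` would be called once per epoch after the gradients of all patterns have been summed, not once per pattern.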
The following is an example of a part of an implementation of the RPROP training technique in the Encog Artificial Intelligence Library. It should give you an idea of how to proceed. I would recommend downloading the entire library, because it will be easier to go through the source code in an IDE rather than through the online svn interface.
http://code.google.com/p/encog-cs/source/browse/#svn/trunk/encog-core/encog-core-cs/Neural/Networks/Training/Propagation/Resilient
http://code.google.com/p/encog-cs/source/browse/#svn/trunk
Note the code is in C#, but shouldn't be difficult to translate into another language.
I tried programming a neural network in the Processing IDE.
It worked quite well until I tried the MNIST handwritten digits data set. I had tried the iris data set and a few others from the UCI Machine Learning Repository, but with MNIST it didn't work. For some reason all of the outputs approached zero over time, which caused the total error to always equal 1. I am almost sure that my problem is the activation function, so I tried using softmax for classification, but it wasn't very successful: I got the same results. I thought maybe I should use a different loss function, so I tried the negative log probability according to this video. Now the result is the same cost value for each output neuron, and the sum of the outputs is not 1 as it should be.
Here are the functions for each part of the code that I have changed (I prefer not to share the full code because it's long and messy, and not really helpful):
softmax:
float[] softmax(float[] inputVector){
float[] result = new float[inputVector.length];
float sigma = 0;
for(int i = 0; i < inputVector.length; i++){
sigma += exp(inputVector[i]);
}
for(int i = 0; i < result.length; i++){
result[i] = exp(inputVector[i]) / sigma;
}
return result;
}
derivative of softmax:
float[] derivativeSoftmax(float[] inputVector){
float[] result = new float[inputVector.length];
for(int i = 0; i < result.length; i++){
result[i] = softmax(inputVector)[i] * (1 - softmax(inputVector)[i]);
}
return result;
}
loss function:
for(int i = 0; i < outputNeuronsNumber; i++){
float tempSigma = 0;
for(int j = 0; j < outputNeuronsNumber; j++){
tempSigma += target[diffCounter2] * log(outputLayer[j]);
}
cost[i] = -tempSigma;
}
I can't see what the problem with my code is.
float[] derivativeSoftmax(float[] inputVector){
float[] result = new float[inputVector.length];
for(int i = 0; i < result.length; i++){
result[i] = softmax(inputVector)[i] * (1 - softmax(inputVector)[i]);
}
return result;
}
I believe this is wrong, given the derivative of the softmax as defined on Wikipedia. It should look more like this:
// Derivative of every softmax output i with respect to input k.
float[] derivativeSoftmax(float[] inputVector, int k){
  float[] s = softmax(inputVector); // compute once instead of twice per element
  float[] result = new float[inputVector.length];
  for(int i = 0; i < result.length; i++){
    result[i] = s[i] * ((i == k ? 1 : 0) - s[k]);
  }
  return result;
}
You should be taking the derivative of the output at one index with respect to the input at some other index. The expression as you have it, x*(1-x), only covers the case where the two indices are equal, so on its own it doesn't make a lot of sense. But I may be wrong.
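To make the point above concrete, here is a small sketch in Java of softmax combined with the negative log-likelihood (cross-entropy) loss. The names are hypothetical and this is not the asker's code. It assumes the targets are one-hot, or at least sum to 1. Under that assumption the loss is a single number per example, and its gradient with respect to the softmax inputs reduces to `p - target`, so the full matrix of softmax derivatives never has to be built. The maximum is subtracted before `exp` to avoid overflow, which does not change the result.

```java
/** Hypothetical helper names; a sketch under the assumptions stated above. */
class SoftmaxSketch {
    /** Softmax with the maximum subtracted first so exp() cannot overflow. */
    static double[] softmax(double[] z) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : z) max = Math.max(max, v);
        double sum = 0.0;
        double[] p = new double[z.length];
        for (int i = 0; i < z.length; i++) {
            p[i] = Math.exp(z[i] - max);
            sum += p[i];
        }
        for (int i = 0; i < z.length; i++) p[i] /= sum;
        return p;
    }

    /** Cross-entropy: -sum_j target[j] * log(p[j]); one scalar per example. */
    static double crossEntropy(double[] p, double[] target) {
        double loss = 0.0;
        for (int j = 0; j < p.length; j++) {
            if (target[j] != 0.0) loss -= target[j] * Math.log(p[j]);
        }
        return loss;
    }

    /** Gradient of that loss w.r.t. the softmax inputs, assuming targets sum to 1. */
    static double[] gradientWrtInputs(double[] p, double[] target) {
        double[] g = new double[p.length];
        for (int i = 0; i < p.length; i++) g[i] = p[i] - target[i];
        return g;
    }
}
```

Note that each target is paired with the output of the same index `j`, and that the outputs of `softmax` always sum to 1 by construction.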
I am designing a feed-forward backpropagation ANN with 22 inputs and 1 output (either a 1 or a 0). The NN has 3 layers and uses 10 hidden neurons. When I run the NN it only changes the weights a tiny bit, and the total error for the output is about 40%. Initially, I thought it was over- or underfitting, but after I changed the number of hidden neurons, nothing changed.
N is the number of inputs (22)
M is the number of hidden neurons (10)
This is the code that I am using to backpropagate
oin is the output calculated before putting into sigmoid function
oout is the output after going through sigmoid function
double odelta = sigmoidDerivative(oin) * (TARGET_VALUE1[i] - oout);
double dobias = 0.0;
double doweight[] = new double[m];
for(int j = 0; j < m; j++)
{
doweight[j] = (ALPHA * odelta * hout[j]) + (MU * (oweight[j] - oweight2[j]));
oweight2[j] = oweight[j];
oweight[j] += doweight[j];
} // j
dobias = (ALPHA * odelta) + (MU * (obias - obias2));
obias2 = obias;
obias += dobias;
updateHidden(N, m, odelta);
This is the code I am using to change the hidden neurons.
for(int j = 0; j < m; j++)
{
hdelta = (d * oweight[j]) * sigmoidDerivative(hin[j]);
for(int i = 0; i < n; i++)
{
dhweight[i][j] = (ALPHA * hdelta * inputNeuron[i]) + (MU * (hweight[i][j] - hweight2[i][j]));
hweight2[i][j] = hweight[i][j];
hweight[i][j] += dhweight[i][j];
}
dhbias[j] = (ALPHA * hdelta) + (MU * (hbias[j] - hbias2[j]));
hbias2[j] = hbias[j];
hbias[j] += dhbias[j];
}
You are training your network to represent two classes with a single output node. The weights connected to that node adapt to predict one class and then the other, so most of the time they end up fitted to the dominant class in your data. To avoid this problem, add a second output node so that each of the two nodes corresponds to one class.
I've heard that it should be possible to do a lossless rotation on a JPEG image. That means you do the rotation in the frequency domain without an IDCT. I've tried to google it but haven't found anything. Could someone shed some light on this?
What I mean by lossless is that I don't lose any additional information in the rotation. And of course that's probably only possible when rotating multiples of 90 degrees.
You do not need to IDCT an image to rotate it losslessly (note that lossless rotation for raster images is only possible for angles that are multiples of 90 degrees).
The following steps achieve a transposition of the image, in the DCT domain:
transpose the elements of each DCT block
transpose the positions of each DCT block
I'm going to assume you can already do the following:
Grab the raw DCT coefficients from the JPEG image (if not, see here)
Write the coefficients back to the file (if you want to save the rotated image)
I can't show you the full code, because it's quite involved, but here's the bit where I IDCT the image (note the IDCT is for display purposes only):
Size s = coeff.size();
Mat result = cv::Mat::zeros(s.height, s.width, CV_8UC1);
for (int i = 0; i < s.height - DCTSIZE + 1; i += DCTSIZE)
    for (int j = 0; j < s.width - DCTSIZE + 1; j += DCTSIZE)
    {
        Rect rect = Rect(j, i, DCTSIZE, DCTSIZE);
        Mat dct_block(coeff, rect); // view of one block, no copy
        idct_step(dct_block, i/DCTSIZE, j/DCTSIZE, result);
    }
This is the image that is shown:
Nothing fancy is happening here -- this is just the original image.
Now, here's the code that implements both the transposition steps I mentioned above:
Size s = coeff.size();
Mat result = cv::Mat::zeros(s.height, s.width, CV_8UC1);
for (int i = 0; i < s.height - DCTSIZE + 1; i += DCTSIZE)
    for (int j = 0; j < s.width - DCTSIZE + 1; j += DCTSIZE)
    {
        Rect rect = Rect(j, i, DCTSIZE, DCTSIZE);
        Mat dct_block(coeff, rect); // view of one block, no copy
        Mat dct_bt(cv::Size(DCTSIZE, DCTSIZE), coeff.type());
        cv::transpose(dct_block, dct_bt); // First transposition
        idct_step(dct_bt, j/DCTSIZE, i/DCTSIZE, result); // Second transposition, swap i and j
    }
This is the resulting image:
You can see that the image is now transposed. To achieve proper rotation, you need to combine reflection with transposition.
EDIT
Sorry, I forgot that reflection is also not trivial. It also consists of two steps:
Obviously, reflect the positions of each DCT block in the required axis
Less obviously, invert (multiply by -1) each odd row OR column in each DCT block. If you're flipping vertically, invert odd rows. If you're flipping horizontally, invert odd columns.
Here's code that performs a vertical reflection after the transposition.
for (int i = 0; i < s.height - DCTSIZE + 1; i += DCTSIZE)
    for (int j = 0; j < s.width - DCTSIZE + 1; j += DCTSIZE)
    {
        Rect rect = Rect(j, i, DCTSIZE, DCTSIZE);
        Mat dct_block(coeff, rect); // view of one block, no copy
        Mat dct_bt(cv::Size(DCTSIZE, DCTSIZE), coeff.type());
        cv::transpose(dct_block, dct_bt);
        // This is the less obvious part of the reflection.
        Mat dct_flip = dct_bt.clone();
        for (int k = 1; k < DCTSIZE; k += 2)
            for (int l = 0; l < DCTSIZE; ++l)
                dct_flip.at<double>(k, l) *= -1;
        // This is the more obvious part of the reflection.
        idct_step(dct_flip, (s.width - j - DCTSIZE)/DCTSIZE, i/DCTSIZE, result);
    }
Here's the image you get:
You will note that this constitutes a rotation by 90 degrees counter-clockwise.
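The per-block part of the transposition and reflection steps above can be checked in isolation. The following Java sketch uses hypothetical names and a naive orthonormal 8x8 DCT that exists only for verification. It transposes a block of coefficients and negates its odd rows, and the test confirms that this matches transposing and vertically flipping the pixels. Moving the blocks to their new positions in the image, and reading and writing the real JPEG coefficients, are not covered here.

```java
/** Hypothetical sketch: transposing and flipping one 8x8 block in the DCT domain. */
class DctBlockOps {
    static final int N = 8;

    /** Transpose the coefficients of one block (rows = vertical frequencies). */
    static double[][] transpose(double[][] c) {
        double[][] t = new double[N][N];
        for (int u = 0; u < N; u++)
            for (int v = 0; v < N; v++)
                t[v][u] = c[u][v];
        return t;
    }

    /** Vertical flip of the pixels: negate every odd row of coefficients. */
    static double[][] flipVertical(double[][] c) {
        double[][] f = new double[N][N];
        for (int u = 0; u < N; u++)
            for (int v = 0; v < N; v++)
                f[u][v] = (u % 2 == 1) ? -c[u][v] : c[u][v];
        return f;
    }

    /** Orthonormal DCT-II basis value for frequency k at sample n. */
    static double basis(int k, int n) {
        double a = (k == 0) ? Math.sqrt(1.0 / N) : Math.sqrt(2.0 / N);
        return a * Math.cos((2 * n + 1) * k * Math.PI / (2.0 * N));
    }

    /** Naive forward 2D DCT, used only to check the operations above. */
    static double[][] dct(double[][] x) {
        double[][] c = new double[N][N];
        for (int u = 0; u < N; u++)
            for (int v = 0; v < N; v++)
                for (int i = 0; i < N; i++)
                    for (int j = 0; j < N; j++)
                        c[u][v] += x[i][j] * basis(u, i) * basis(v, j);
        return c;
    }

    /** Naive inverse 2D DCT, used only to check the operations above. */
    static double[][] idct(double[][] c) {
        double[][] x = new double[N][N];
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                for (int u = 0; u < N; u++)
                    for (int v = 0; v < N; v++)
                        x[i][j] += c[u][v] * basis(u, i) * basis(v, j);
        return x;
    }
}
```

The sign flip on odd rows follows from the cosine basis: reversing the sample order multiplies the basis function of frequency u by (-1)^u, so only the odd frequencies change sign.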
I need to make a good implementation of matrix multiplication, better than the naive method.
Here are the methods I used:
1- removed false dependencies, which made the performance a lot better
2- used a recursive approach
There is also something I need to try: loop unrolling. The thing is, each time I use it, it makes the performance worse, and I can't find an explanation for it.
I need help. Here is the code:
for (i = 0; i < M; i++)
    for (j = 0; j < N; j++) {
        double sum = 0;
        #pragma unroll(5)
        for (k = 0; k < K; k++)
        {
            sum += A[i + k*LDA] * B[k + j*LDB];
        }
        C[i + j*LDC] = sum;
    }
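For comparison, here is a sketch in Java (names hypothetical) of what manually unrolling the inner k loop by four looks like, with separate accumulators and a remainder loop, using the same column-major indexing as the code above. Whether this is any faster is not guaranteed: it depends on the compiler and the hardware. Note also that in this loop A is read with a stride of LDA elements, so memory access may matter more than loop overhead.

```java
/** Hypothetical sketch of a 4-way unrolled inner product for column-major matrices. */
class UnrollSketch {
    /** Computes sum over k of A(i,k) * B(k,j), where a and b are column-major arrays. */
    static double dotUnrolled(double[] a, int lda, int i,
                              double[] b, int ldb, int j, int kCount) {
        // Four independent accumulators, so the additions do not form one long chain.
        double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
        int k = 0;
        for (; k + 3 < kCount; k += 4) {
            s0 += a[i + k * lda]       * b[k + j * ldb];
            s1 += a[i + (k + 1) * lda] * b[k + 1 + j * ldb];
            s2 += a[i + (k + 2) * lda] * b[k + 2 + j * ldb];
            s3 += a[i + (k + 3) * lda] * b[k + 3 + j * ldb];
        }
        double sum = (s0 + s1) + (s2 + s3);
        // Remainder loop for the last kCount % 4 elements.
        for (; k < kCount; k++) {
            sum += a[i + k * lda] * b[k + j * ldb];
        }
        return sum;
    }
}
```

Because the partial sums are added in a different order than in the simple loop, the result can differ from it in the last bits for general floating-point inputs.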
This is part of the code of a spectral subtraction algorithm. I'm trying to optimize it for Android. Please help me.
This is the Matlab code:
function Seg=segment(signal,W,SP,Window)
% SEGMENT chops a signal to overlapping windowed segments
% A= SEGMENT(X,W,SP,WIN) returns a matrix which its columns are segmented
% and windowed frames of the input one dimentional signal, X. W is the
% number of samples per window, default value W=256. SP is the shift
% percentage, default value SP=0.4. WIN is the window that is multiplied by
% each segment and its length should be W. the default window is hamming
% window.
% 06-Sep-04
% Esfandiar Zavarehei
if nargin<3
SP=.4;
end
if nargin<2
W=256;
end
if nargin<4
Window=hamming(W);
end
Window=Window(:); %make it a column vector
L=length(signal);
SP=fix(W.*SP);
N=fix((L-W)/SP +1); %number of segments
Index=(repmat(1:W,N,1)+repmat((0:(N-1))'*SP,1,W))';
hw=repmat(Window,1,N);
Seg=signal(Index).*hw;
And this is our Java code for this function:
public class MatrixAndSegments
{
public int numberOfSegments;
public double[][] res;
public MatrixAndSegments(int numberOfSegments,double[][] res)
{
this.numberOfSegments = numberOfSegments;
this.res = res;
}
}
public MatrixAndSegments segment (double[] signal_in,int samplesPerWindow, double shiftPercentage, double[] window)
{
//default shiftPercentage = 0.4
//default samplesPerWindow = 256 //W
//default window = hanning
int L = signal_in.length;
shiftPercentage = fix(samplesPerWindow * shiftPercentage); //SP
int numberOfSegments = fix ( (L - samplesPerWindow)/ shiftPercentage + 1); //N
double[][] reprowMatrix = reprowtrans(samplesPerWindow,numberOfSegments);
double[][] repcolMatrix = repcoltrans(numberOfSegments, shiftPercentage,samplesPerWindow );
//Index=(repmat(1:W,N,1)+repmat((0:(N-1))'*SP,1,W))';
double[][] index = new double[samplesPerWindow+1][numberOfSegments+1];
for (int x = 1; x < samplesPerWindow+1; x++ )
{
for (int y = 1 ; y < numberOfSegments + 1; y++) //numberOfSegments was 3
{
index[x][y] = reprowMatrix[x][y] + repcolMatrix[x][y];
}
}
//hamming window
double[] hammingWindow = this.HammingWindow(samplesPerWindow);
double[][] HW = repvector(hammingWindow, numberOfSegments);
double[][] seg = new double[samplesPerWindow][numberOfSegments];
for (int y = 1 ; y < numberOfSegments + 1; y++)
{
for (int x = 1; x < samplesPerWindow+1; x++)
{
seg[x-1][y-1] = signal_in[ (int)index[x][y]-1 ] * HW[x-1][y-1];
}
}
MatrixAndSegments Matrixseg = new MatrixAndSegments(numberOfSegments,seg);
return Matrixseg;
}
public int fix(double val) {
if (val < 0) {
return (int) Math.ceil(val);
}
return (int) Math.floor(val);
}
public double[][] repvector(double[] vec, int replications)
{
double[][] result = new double[vec.length][replications];
for (int x = 0; x < vec.length; x++) {
for (int y = 0; y < replications; y++) {
result[x][y] = vec[x];
}
}
return result;
}
public double[][] reprowtrans(int end, int replications)
{
double[][] result = new double[end +1][replications+1];
for (int x = 1; x <= end; x++) {
for (int y = 1; y <= replications; y++) {
result[x][y] = x ;
}
}
return result;
}
public double[][] repcoltrans(int end, double multiplier, int replications)
{
double[][] result = new double[replications+1][end+1];
for (int x = 1; x <= replications; x++) {
for (int y = 1; y <= end ; y++) {
result[x][y] = (y-1)*multiplier;
}
}
return result;
}
public double[] HammingWindow(int size)
{
double[] window = new double[size];
for (int i = 0; i < size; i++)
{
window[i] = 0.54-0.46 * (Math.cos(2.0 * Math.PI * i / (size-1)));
}
return window;
}
"Porting" Matlab code statement by statement to Java is a bad approach.
Data is rarely manipulated in Matlab using loops and addressing individual elements (because the Matlab interpreter/VM is rather slow), but rather through calls to block processing functions (which have been carefully written and optimized). This leads to a very idiosyncratic programming style in which repmat, reshape, find, fancy indexing et al. are used to do operations which would be much more naturally expressed through Java loops.
For example, to multiply each column of a matrix A by a vector v, you would write in Matlab:
A = diag(v) * A
or
A = repmat(v', 1, size(A, 2)) .* A
This solution:
for i = 1:size(A, 2),
A(:, i) = A(:, i) .* v';
end;
is inefficient.
But it would be terribly foolish to try to do the same thing in Java and invoke a matrix product or to build a matrix with repeated copies of v. Instead, just do:
for (int i = 0; i < rows; i++) {
    for (int j = 0; j < columns; j++) {
        a[i][j] *= v[i];
    }
}
I suggest you try to understand what this Matlab function is actually doing, instead of focusing on how it is doing it, and reimplement it from scratch in Java, forgetting all of the Matlab implementation except the specification given in the comments. Indeed, half of the code you have written is useless. Actually, it seems to me that this function wouldn't be needed at all, and what it does could be efficiently integrated into the caller's code.
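As an illustration of that advice, here is one way the segmentation could be written directly from the specification in the Matlab comments, without building any index matrices. This is only a sketch: the class and method names are hypothetical, and it assumes the signal is at least one window long and that the computed shift is at least one sample.

```java
/** Hypothetical sketch: overlapping windowed segments, written as plain loops. */
class SegmentSketch {
    /**
     * Returns a matrix whose columns are the windowed frames of the signal.
     * Assumes signal.length >= samplesPerWindow, window.length == samplesPerWindow,
     * and samplesPerWindow * shiftPercentage >= 1.
     */
    static double[][] segment(double[] signal, int samplesPerWindow,
                              double shiftPercentage, double[] window) {
        // SP = fix(W .* SP): the cast truncates toward zero, like Matlab's fix.
        int shift = (int) (samplesPerWindow * shiftPercentage);
        // N = fix((L - W) / SP + 1): integer division already truncates here.
        int segments = (signal.length - samplesPerWindow) / shift + 1;

        double[][] seg = new double[samplesPerWindow][segments];
        for (int y = 0; y < segments; y++) {
            int start = y * shift; // first sample of frame y (0-based)
            for (int x = 0; x < samplesPerWindow; x++) {
                seg[x][y] = signal[start + x] * window[x];
            }
        }
        return seg;
    }
}
```

The window (for example the output of the HammingWindow method from the question) is passed in once, so there is no need to replicate it into a matrix, and the number of segments can be read from the length of the first row of the result.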