how to implement Softmax function for neural networks in processing 3 environment? - neural-network

I tried programming a neural network in processing IDE.
I managed to do it quite well, until I tried using the MNIST handwritten digits data set. I tried the iris data set and few others from UCI machine learning repository, but when I used the MNIST data set it didn't worked. for some reason all of the outputs approached zero with time, and that caused the total error to be always equal to 1. I am almost sure that my problem is the activation function; so I tried using softmax for classification, but it wasn't very successful. I got the same results. I think maybe I should have use a different loss function, so I tried the negative log probability according to this video. the results now are the same cost value for each output neuron, and the sum of the outputs is not 1 as it should be.
Here are the functions for each part of the code that I have changed (I prefer not to share the full code because it's long and messy, and not really helpful):
softmax:
float[] softmax(float[] inputVector){
float[] result = new float[inputVector.length];
float sigma = 0;
for(int i = 0; i < inputVector.length; i++){
sigma += exp(inputVector[i]);
}
for(int i = 0; i < result.length; i++){
result[i] = exp(inputVector[i]) / sigma;
}
return result;
}
derivative of softmax:
float[] derivativeSoftmax(float[] inputVector){
float[] result = new float[inputVector.length];
for(int i = 0; i < result.length; i++){
result[i] = softmax(inputVector)[i] * (1 - softmax(inputVector)[i]);
}
return result;
}
loss function:
for(int i = 0; i < outputNeuronsNumber; i++){
float tempSigma = 0;
for(int j = 0; j < outputNeuronsNumber; j++){
tempSigma += target[diffCounter2] * log(outputLayer[j]);
}
cost[i] = -tempSigma;
}
I can't see what is the problem with my code.

float[] derivativeSoftmax(float[] inputVector){
float[] result = new float[inputVector.length];
for(int i = 0; i < result.length; i++){
result[i] = softmax(inputVector)[i] * (1 - softmax(inputVector)[i]);
}
return result;
}
I believe this is wrong, given the derivative of the softmax as defined on wikipedia.
float[] derivativeSoftmax(float[] inputVector, int k){
float[] result = new float[inputVector.length];
for(int i = 0; i < result.length; i++){
result[i] = softmax(inputVector)[i] * ((i==k ? 1 : 0) - softmax(inputVector)[k]);
}
return result;
}
You should be taking the derivative at an index with respect to some other index. The equation as you have it, which is x*(1-x) doesn't make a lot of sense. But I may be wrong.

Related

loop unrolling for matrix multiplication

I need to make a good implementation for matrix multiplication better than the naive method
here is the methods i used :
1- removed false dependencies which made the performance a lot better
2- used a recursive approach
and then there is something i need to try loop unrolling. The thing is each time i used it , it makes the performance worst i can't find an explanation for it
i need help here is the code
for (i = 0; i < M; i++)
for (j = 0; j < N; j++) {
double sum = 0;
#pragma unroll(5)
for (k = 0; k < K; k++)
{
sum += A[i + k*LDA] * B[k + j*LDB];
}
C[i + j*LDC] = sum ;
}

Porting signal windowing code from Matlab to Java

This is part of a code from spectral subtraction algorithm,i'm trying to optimize it for android.please help me.
this is the matlab code:
function Seg=segment(signal,W,SP,Window)
% SEGMENT chops a signal to overlapping windowed segments
% A= SEGMENT(X,W,SP,WIN) returns a matrix which its columns are segmented
% and windowed frames of the input one dimentional signal, X. W is the
% number of samples per window, default value W=256. SP is the shift
% percentage, default value SP=0.4. WIN is the window that is multiplied by
% each segment and its length should be W. the default window is hamming
% window.
% 06-Sep-04
% Esfandiar Zavarehei
if nargin<3
SP=.4;
end
if nargin<2
W=256;
end
if nargin<4
Window=hamming(W);
end
Window=Window(:); %make it a column vector
L=length(signal);
SP=fix(W.*SP);
N=fix((L-W)/SP +1); %number of segments
Index=(repmat(1:W,N,1)+repmat((0:(N-1))'*SP,1,W))';
hw=repmat(Window,1,N);
Seg=signal(Index).*hw;
and this is our java code for this function:
public class MatrixAndSegments
{
public int numberOfSegments;
public double[][] res;
public MatrixAndSegments(int numberOfSegments,double[][] res)
{
this.numberOfSegments = numberOfSegments;
this.res = res;
}
}
public MatrixAndSegments segment (double[] signal_in,int samplesPerWindow, double shiftPercentage, double[] window)
{
//default shiftPercentage = 0.4
//default samplesPerWindow = 256 //W
//default window = hanning
int L = signal_in.length;
shiftPercentage = fix(samplesPerWindow * shiftPercentage); //SP
int numberOfSegments = fix ( (L - samplesPerWindow)/ shiftPercentage + 1); //N
double[][] reprowMatrix = reprowtrans(samplesPerWindow,numberOfSegments);
double[][] repcolMatrix = repcoltrans(numberOfSegments, shiftPercentage,samplesPerWindow );
//Index=(repmat(1:W,N,1)+repmat((0:(N-1))'*SP,1,W))';
double[][] index = new double[samplesPerWindow+1][numberOfSegments+1];
for (int x = 1; x < samplesPerWindow+1; x++ )
{
for (int y = 1 ; y < numberOfSegments + 1; y++) //numberOfSegments was 3
{
index[x][y] = reprowMatrix[x][y] + repcolMatrix[x][y];
}
}
//hamming window
double[] hammingWindow = this.HammingWindow(samplesPerWindow);
double[][] HW = repvector(hammingWindow, numberOfSegments);
double[][] seg = new double[samplesPerWindow][numberOfSegments];
for (int y = 1 ; y < numberOfSegments + 1; y++)
{
for (int x = 1; x < samplesPerWindow+1; x++)
{
seg[x-1][y-1] = signal_in[ (int)index[x][y]-1 ] * HW[x-1][y-1];
}
}
MatrixAndSegments Matrixseg = new MatrixAndSegments(numberOfSegments,seg);
return Matrixseg;
}
public int fix(double val) {
if (val < 0) {
return (int) Math.ceil(val);
}
return (int) Math.floor(val);
}
public double[][] repvector(double[] vec, int replications)
{
double[][] result = new double[vec.length][replications];
for (int x = 0; x < vec.length; x++) {
for (int y = 0; y < replications; y++) {
result[x][y] = vec[x];
}
}
return result;
}
public double[][] reprowtrans(int end, int replications)
{
double[][] result = new double[end +1][replications+1];
for (int x = 1; x <= end; x++) {
for (int y = 1; y <= replications; y++) {
result[x][y] = x ;
}
}
return result;
}
public double[][] repcoltrans(int end, double multiplier, int replications)
{
double[][] result = new double[replications+1][end+1];
for (int x = 1; x <= replications; x++) {
for (int y = 1; y <= end ; y++) {
result[x][y] = (y-1)*multiplier;
}
}
return result;
}
public double[] HammingWindow(int size)
{
double[] window = new double[size];
for (int i = 0; i < size; i++)
{
window[i] = 0.54-0.46 * (Math.cos(2.0 * Math.PI * i / (size-1)));
}
return window;
}
"Porting" Matlab code statement by statement to Java is a bad approach.
Data is rarely manipulated in Matlab using loops and addressing individual elements (because the Matlab interpreter/VM is rather slow), but rather through calls to block processing functions (which have been carefully written and optimized). This leads to a very idiosyncratic programming style in which repmat, reshape, find, fancy indexing et al. are used to do operations which would be much more naturally expressed through Java loops.
For example, to multiply each column of a matrix A by a vector v, you will write in matlab:
A = diag(v) * A
or
A = repmat(v', 1, size(A, 2)) .* A
This solution:
for i = 1:size(A, 2),
A(:, i) = A(:, i) .* v';
end;
is inefficient.
But it would be terribly foolish to try to do the same thing in Java and invoke a matrix product or to build a matrix with repeated copies of v. Instead, just do:
for (int i = 0; i < rows; i++) {
for (int j = 0; j < columns; j++) {
a[i][j] *= v[i]
}
}
I suggest you to try to understand what this matlab function is actually doing, instead of focusing on how it is doing it, and reimplement it from scratch in Java, forgetting all the matlab implementation except the specifications given in the comments. Half of the code you have written is useless, indeed. Actually, it seems to me that this function wouldn't be needed at all, and what it does could be efficiently integrated in the caller's code.

Extracting largest element from my array

I'm extracting samples from an audio file in order to draw the wave form - I've generated an array with the samples ( waveDisplayArray in my code below ).
I wish to extract the largest value from this waveDisplayArray, what's the best way to do this?
(I initally defined waveDisplayArray like this: int waveDisplayArray[280]={ 0 };
I'm not sure this is the best way to go about this)
thanks in advance :)
for( int y=0; y<convertedData.mNumberBuffers; y++ )
{
NSLog(#"buffer# %u", y);
AudioBuffer audioBuffer = convertedData.mBuffers[y];
int bufferSize = audioBuffer.mDataByteSize / sizeof(Float32);
Float32 *frame = audioBuffer.mData;
NSLog(#"Buffer Size is: %i", bufferSize);
int numberOfPixels = 280;
int waveDisplayArray[280]={ 0 };
int i;
for (i = 0; i<numberOfPixels; i++)
{
//NSLog(#"i is: %i", i);
int j;
int numberOfSamplesPerPixel = bufferSize/numberOfPixels;
float average = 0;
for (j=i*numberOfSamplesPerPixel; j<(i+1)*numberOfSamplesPerPixel; j++){
average += frame[j];
average = average/numberOfSamplesPerPixel;
}
waveDisplayArray[i] = average;
NSLog(#"Average %i is %f",i,average);
NSLog(#"waveDisplay Array %i: %f",i, waveDisplayArray[i]);
}
}
I have edited and posted only relevant part to find the largest value.
int i;
float largest = 0;
for (i = 0; i<numberOfPixels; i++)
{
//NSLog(#"i is: %i", i);
int j;
int numberOfSamplesPerPixel = bufferSize/numberOfPixels;
float average = 0;
for (j=i*numberOfSamplesPerPixel; j<(i+1)*numberOfSamplesPerPixel; j++){
average += frame[j];
average = average/numberOfSamplesPerPixel;
}
waveDisplayArray[i] = average;
if( largest < average )
{
largest = average;
}
NSLog(#"Average %i is %f",i,average);
NSLog(#"waveDisplay Array %i: %f",i, waveDisplayArray[i]);
}
NSLog(#"Largest %f",largest);
Do you ever thought of using Apples Accelerate framework for this? There is for example a function which calculates the maximum value of a whole vector (your array).
One line replaces your for loop:
vDSP_maxmgv(&maxValue, 1, waveDisplayArray, 280);
There are also useful functions for calculating mean, min or whatever. This should be a lot faster.

Teaching a Neural Net: Bipolar XOR

I'm trying to to teach a neural net of 2 inputs, 4 hidden nodes (all in same layer) and 1 output node. The binary representation works fine, but I have problems with the Bipolar. I can't figure out why, but the total error will sometimes converge to the same number around 2.xx. My sigmoid is 2/(1+ exp(-x)) - 1. Perhaps I'm sigmoiding in the wrong place. For example to calculate the output error should I be comparing the sigmoided output with the expected value or with the sigmoided expected value?
I was following this website here: http://galaxy.agh.edu.pl/~vlsi/AI/backp_t_en/backprop.html , but they use different functions then I was instructed to use. Even when I did try to implement their functions I still ran into the same problem. Either way I get stuck about half the time at the same number (a different number for different implementations). Please tell me if I have made a mistake in my code somewhere or if this is normal (I don't see how it could be). Momentum is set to 0. Is this a common 0 momentum problem? The error functions we are supposed to be using are:
if ui is an output unit
Error(i) = (Ci - ui ) * f'(Si )
if ui is a hidden unit
Error(i) = Error(Output) * weight(i to output) * f'(Si)
public double sigmoid( double x ) {
double fBipolar, fBinary, temp;
temp = (1 + Math.exp(-x));
fBipolar = (2 / temp) - 1;
fBinary = 1 / temp;
if(bipolar){
return fBipolar;
}else{
return fBinary;
}
}
// Initialize the weights to random values.
private void initializeWeights(double neg, double pos) {
for(int i = 0; i < numInputs + 1; i++){
for(int j = 0; j < numHiddenNeurons; j++){
inputWeights[i][j] = Math.random() - pos;
if(inputWeights[i][j] < neg || inputWeights[i][j] > pos){
print("ERROR ");
print(inputWeights[i][j]);
}
}
}
for(int i = 0; i < numHiddenNeurons + 1; i++){
hiddenWeights[i] = Math.random() - pos;
if(hiddenWeights[i] < neg || hiddenWeights[i] > pos){
print("ERROR ");
print(hiddenWeights[i]);
}
}
}
// Computes output of the NN without training. I.e. a forward pass
public double outputFor ( double[] argInputVector ) {
for(int i = 0; i < numInputs; i++){
inputs[i] = argInputVector[i];
}
double weightedSum = 0;
for(int i = 0; i < numHiddenNeurons; i++){
weightedSum = 0;
for(int j = 0; j < numInputs + 1; j++){
weightedSum += inputWeights[j][i] * inputs[j];
}
hiddenActivation[i] = sigmoid(weightedSum);
}
weightedSum = 0;
for(int j = 0; j < numHiddenNeurons + 1; j++){
weightedSum += (hiddenActivation[j] * hiddenWeights[j]);
}
return sigmoid(weightedSum);
}
//Computes the derivative of f
public static double fPrime(double u){
double fBipolar, fBinary;
fBipolar = 0.5 * (1 - Math.pow(u,2));
fBinary = u * (1 - u);
if(bipolar){
return fBipolar;
}else{
return fBinary;
}
}
// This method is used to update the weights of the neural net.
public double train ( double [] argInputVector, double argTargetOutput ){
double output = outputFor(argInputVector);
double lastDelta;
double outputError = (argTargetOutput - output) * fPrime(output);
if(outputError != 0){
for(int i = 0; i < numHiddenNeurons + 1; i++){
hiddenError[i] = hiddenWeights[i] * outputError * fPrime(hiddenActivation[i]);
deltaHiddenWeights[i] = learningRate * outputError * hiddenActivation[i] + (momentum * lastDelta);
hiddenWeights[i] += deltaHiddenWeights[i];
}
for(int in = 0; in < numInputs + 1; in++){
for(int hid = 0; hid < numHiddenNeurons; hid++){
lastDelta = deltaInputWeights[in][hid];
deltaInputWeights[in][hid] = learningRate * hiddenError[hid] * inputs[in] + (momentum * lastDelta);
inputWeights[in][hid] += deltaInputWeights[in][hid];
}
}
}
return 0.5 * (argTargetOutput - output) * (argTargetOutput - output);
}
General coding comments:
initializeWeights(-1.0, 1.0);
may not actually get the initial values you were expecting.
initializeWeights should probably have:
inputWeights[i][j] = Math.random() * (pos - neg) + neg;
// ...
hiddenWeights[i] = (Math.random() * (pos - neg)) + neg;
instead of:
Math.random() - pos;
so that this works:
initializeWeights(0.0, 1.0);
and gives you initial values between 0.0 and 1.0 rather than between -1.0 and 0.0.
lastDelta is used before it is declared:
deltaHiddenWeights[i] = learningRate * outputError * hiddenActivation[i] + (momentum * lastDelta);
I'm not sure if the + 1 on numInputs + 1 and numHiddenNeurons + 1 are necessary.
Remember to watch out for rounding of ints: 5/2 = 2, not 2.5!
Use 5.0/2.0 instead. In general, add the .0 in your code when the output should be a double.
Most importantly, have you trained the NeuralNet long enough?
Try running it with numInputs = 2, numHiddenNeurons = 4, learningRate = 0.9, and train for 1,000 or 10,000 times.
Using numHiddenNeurons = 2 it sometimes get "stuck" when trying to solve the XOR problem.
See also XOR problem - simulation

Resilient backpropagation neural network - question about gradient

First I want to say that I'm really new to neural networks and I don't understand it very good ;)
I've made my first C# implementation of the backpropagation neural network. I've tested it using XOR and it looks it work.
Now I would like change my implementation to use resilient backpropagation (Rprop - http://en.wikipedia.org/wiki/Rprop).
The definition says: "Rprop takes into account only the sign of the partial derivative over all patterns (not the magnitude), and acts independently on each "weight".
Could somebody tell me what partial derivative over all patterns is? And how should I compute this partial derivative for a neuron in hidden layer.
Thanks a lot
UPDATE:
My implementation base on this Java code: www_.dia.fi.upm.es/~jamartin/downloads/bpnn.java
My backPropagate method looks like this:
public double backPropagate(double[] targets)
{
double error, change;
// calculate error terms for output
double[] output_deltas = new double[outputsNumber];
for (int k = 0; k < outputsNumber; k++)
{
error = targets[k] - activationsOutputs[k];
output_deltas[k] = Dsigmoid(activationsOutputs[k]) * error;
}
// calculate error terms for hidden
double[] hidden_deltas = new double[hiddenNumber];
for (int j = 0; j < hiddenNumber; j++)
{
error = 0.0;
for (int k = 0; k < outputsNumber; k++)
{
error = error + output_deltas[k] * weightsOutputs[j, k];
}
hidden_deltas[j] = Dsigmoid(activationsHidden[j]) * error;
}
//update output weights
for (int j = 0; j < hiddenNumber; j++)
{
for (int k = 0; k < outputsNumber; k++)
{
change = output_deltas[k] * activationsHidden[j];
weightsOutputs[j, k] = weightsOutputs[j, k] + learningRate * change + momentumFactor * lastChangeWeightsForMomentumOutpus[j, k];
lastChangeWeightsForMomentumOutpus[j, k] = change;
}
}
// update input weights
for (int i = 0; i < inputsNumber; i++)
{
for (int j = 0; j < hiddenNumber; j++)
{
change = hidden_deltas[j] * activationsInputs[i];
weightsInputs[i, j] = weightsInputs[i, j] + learningRate * change + momentumFactor * lastChangeWeightsForMomentumInputs[i, j];
lastChangeWeightsForMomentumInputs[i, j] = change;
}
}
// calculate error
error = 0.0;
for (int k = 0; k < outputsNumber; k++)
{
error = error + 0.5 * (targets[k] - activationsOutputs[k]) * (targets[k] - activationsOutputs[k]);
}
return error;
}
So can I use change = hidden_deltas[j] * activationsInputs[i] variable as a gradient (partial derivative) for checking the sing?
I think the "over all patterns" simply means "in every iteration"... take a look at the RPROP paper
For the paritial derivative: you've already implemented the normal back-propagation algorithm. This is a method for efficiently calculate the gradient... there you calculate the δ values for the single neurons, which are in fact the negative ∂E/∂w values, i.e. the parital derivative of the global error as function of the weights.
so instead of multiplying the weights with these values, you take one of two constants (η+ or η-), depending on whether the sign has changed
The following is an example of a part of an implementation of the RPROP training technique in the Encog Artificial Intelligence Library. It should give you an idea of how to proceed. I would recommend downloading the entire library, because it will be easier to go through the source code in an IDE rather than through the online svn interface.
http://code.google.com/p/encog-cs/source/browse/#svn/trunk/encog-core/encog-core-cs/Neural/Networks/Training/Propagation/Resilient
http://code.google.com/p/encog-cs/source/browse/#svn/trunk
Note the code is in C#, but shouldn't be difficult to translate into another language.