I have been trying to learn about neural networks, and all the examples I found on the internet show how to emulate logic gates, say an XOR gate. But what I want to do is create a network that can be trained to approximate functions, say x^2 or e^x. Is this possible? What changes do I need to make to the network?
Here's my code for a neural network consisting of one input node, one hidden layer of three nodes, and one output node.
#include <iostream>
#include <fstream>
#include <cmath>
#include <cstdlib>
#include <ctime>
using namespace std;
const double eeta=0.9;
const int n=5;
struct Net_elem
{
double weights1[3];
double weights2[3];
double bias1,bias2;
};//structure to store network parameters
Net_elem net_elem;
double sigma(double input)
{
return 1/(1+exp(-input));
}
void show_net_elem()
{
cout.precision(15);
for(int i=0;i<3;i++)
{
cout<<"weights1["<<i<<"]="<<net_elem.weights1[i];
cout<<endl;
}
for(int i=0;i<3;i++)
{
cout<<"weights2["<<i<<"]="<<net_elem.weights2[i];
cout<<endl;
}
cout<<"bias1="<<net_elem.bias1<<" bias2="<<net_elem.bias2<<endl;
system("pause");
system("cls");
}
//function to train the network
void train(double input,double expected)
{
double Output,output[3],Delta,delta[3],delta_bias1,delta_bias2;
//Propagate forward
double sum=0;
for(int i=0;i<3;i++)
output[i]=sigma(input*net_elem.weights1[i]+net_elem.bias1);
sum=0;
for(int i=0;i<3;i++)
sum=sum+output[i]*net_elem.weights2[i];
Output=sigma(sum+net_elem.bias2);
cout<<"Output="<<Output<<endl;
//Backpropagate
Delta=expected-Output;
for(int i=0;i<3;i++)
delta[i]=net_elem.weights2[i]*Delta;
delta_bias2=net_elem.bias2*Delta;
//Update weights
for(int i=0;i<3;i++)
net_elem.weights1[i]=net_elem.weights1[i]+eeta*delta[i]*output[i]*(1-output[i])*input;
for(int i=0;i<3;i++)
net_elem.weights2[i]=net_elem.weights2[i]+eeta*Delta*Output*(1-Output)*output[i];
net_elem.bias2=net_elem.bias2+eeta*delta_bias2;
double sum1=0;
for(int i=0;i<3;i++)
sum1=sum1+net_elem.weights1[i]*delta[i];
net_elem.bias1=net_elem.bias1+eeta*sum1;
show_net_elem();
}
void test()
{
cout.precision(15);
double input,Output,output[3];
cout<<"Enter Input:";
cin>>input;
//Propagate forward
double sum=0;
for(int i=0;i<3;i++)
output[i]=sigma(input*net_elem.weights1[i]+net_elem.bias1);
for(int i=0;i<3;i++)
sum=sum+output[i]*net_elem.weights2[i];
Output=sigma(sum+net_elem.bias2);
cout<<"Output="<<Output<<endl;
}
I have tried running it to emulate the square root function, but the output simply jumps between 0 and 1, alternating.
Main:
int main()
{
net_elem.weights1[0]=(double)(rand()%100+0)/10;
net_elem.weights1[1]=(double)(rand()%100+0)/10;
net_elem.weights1[2]=(double)(rand()%100+0)/10;
net_elem.weights2[0]=(double)(rand()%100+0)/10;
net_elem.weights2[1]=(double)(rand()%100+0)/10;
net_elem.weights2[2]=(double)(rand()%100+0)/10;
net_elem.bias1=(double)(rand()%100+0)/10;
net_elem.bias2=(double)(rand()%100+0)/10;
double output[n],input[n];
int ch;
for(int i=1;i<n;i++)
{
input[i]=100;
output[i]=sqrt(input[i]);
}
do
{
cout<<endl<<"1. Train"<<endl;
cout<<"2. Test"<<endl;
cout<<"3. Exit"<<endl;
cin>>ch;
switch(ch)
{
case 1:for(int i=1;i<n;i++)
{
train(input[i],output[i]);
}
break;
case 2:test();break;
case 3:break;
default:cout<<"Enter Proper Choice"<<endl;
}
}while(ch!=3);
}
I think you are missing the point of using a neural network. Neural networks don't imitate known functions; they separate regions of an unknown vector space. The XOR problem is often given as an example because it is the minimal non-linearly-separable problem: a simple perceptron is just a line separating two regions of your problem space.
In that case, the blue dots can be separated from the red dots with a single straight line (the problem is linearly separable). In the XOR problem, however, the dots are arranged so that no single line can separate the two classes.
Here a single line (one perceptron) is not enough, but a multi-layer perceptron (most probably the type of neural network you are using) can combine several perceptrons (in this case two) to separate the blue and red dots. In a similar manner, a neural network can separate any space.
However, the XOR problem produces two classes of output, and we use a neural network to separate them. On the other hand, x^2 produces a continuous line of points, so there is nothing to separate. Also, keep in mind that imitating the XOR function is given only as an example of such problems; in practice, nobody ever uses a neural network to replace the XOR function. If you want to use a function, just use it instead of building something that approximates it.
PS: If you still want to emulate the x^2 function for practice, you need regression. What you are doing now is classification (since you are using a sigmoid on your output). For practice, though, you are better off sticking with classification problems; they are by far more common. Also, for such problems try MATLAB, or, if you want to write C++, use a linear algebra library (e.g. Eigen 3) so you don't have to write a thousand for loops.
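For illustration only, here is a minimal regression sketch in C++: a 1-3-1 network with a sigmoid hidden layer and a linear (identity) output, trained with squared error. All names and constants are made up, and inputs/targets are scaled into [0,1]; it is a sketch of the idea, not the asker's code fixed.

#include <iostream>
#include <cmath>
#include <cstdlib>

// 1-3-1 regression network: sigmoid hidden layer, linear (identity) output.
double w1[3], b1[3], w2[3], b2 = 0.0;

double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

double predict(double x)
{
    double y = b2;
    for (int i = 0; i < 3; i++)
        y += w2[i] * sigmoid(w1[i] * x + b1[i]);
    return y;                      // no sigmoid here: the output can be any real number
}

void train_step(double x, double target, double eta)
{
    double h[3], y = b2;
    for (int i = 0; i < 3; i++) { h[i] = sigmoid(w1[i] * x + b1[i]); y += w2[i] * h[i]; }
    double err = target - y;       // gradient of 0.5*(target - y)^2 with respect to y, up to sign
    for (int i = 0; i < 3; i++)
    {
        double dh = err * w2[i] * h[i] * (1.0 - h[i]);   // error back-propagated to hidden unit i
        w2[i] += eta * err * h[i];
        w1[i] += eta * dh * x;
        b1[i] += eta * dh;
    }
    b2 += eta * err;
}

int main()
{
    for (int i = 0; i < 3; i++)
    {
        w1[i] = (std::rand() / (double)RAND_MAX) - 0.5;  // small random initial weights
        w2[i] = (std::rand() / (double)RAND_MAX) - 0.5;
        b1[i] = 0.0;
    }
    for (int epoch = 0; epoch < 20000; epoch++)
        for (double x = 0.0; x <= 1.0; x += 0.1)
            train_step(x, std::sqrt(x), 0.1);            // inputs and targets kept in [0,1]
    std::cout << "sqrt(0.25) ~ " << predict(0.25) << std::endl;
    return 0;
}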
I have a model for a mining problem. I am working on adding to the model the use of the shortest path inside a mine (open pit) for hauling ore and waste. For this, I was thinking of Dijkstra's algorithm. I could not find any example of the use of Dijkstra's algorithm in OPL. Has anyone done it before, and can you share some ideas, please?
If you need to write Dijkstra's algorithm then Daniel is right and you'd rather use the scripting part. Now, if you need a shortest path within an existing OPL model, you could use the following shortest path example:
.mod
tuple edge
{
key int o;
key int d;
int weight;
}
{edge} edges=...;
{int} nodes={i.o | i in edges} union {i.d | i in edges};
int st=1; // start
int en=8; // end
dvar int obj; // distance
dvar boolean x[edges]; // do we use that edge ?
minimize obj;
subject to
{
obj==sum(e in edges) x[e]*e.weight;
forall(i in nodes)
sum(e in edges:e.o==i) x[e]
-sum(e in edges:e.d==i) x[e]
==
((i==st)?1:((i==en)?(-1):0));
}
{edge} shortestPath={e | e in edges : x[e]==1};
execute
{
writeln(shortestPath);
}
.dat
edges=
{
<1,2,9>,
<1,3,9>,
<1,4,8>,
<1,10,18>,
<2,3,3>,
<2,6,6>,
<3,4,9>,
<3,5,2>,
<3,6,2>,
<4,5,8>,
<4,7,7>,
<4,9,9>,
<4,10,10>,
<5,6,2>,
<5,7,9>,
<6,7,9>,
<7,8,4>,
<7,9,5>,
<8,9,1>,
<8,10,4>,
<9,10,3>,
};
which gives
// solution (optimal) with objective 19
{<1 4 8> <4 7 7> <7 8 4>}
If you have a problem that can be solved using Dijkstra's algorithm then it seems a bit of overkill to use OPL or CPLEX to solve it. You could code up the algorithm in any programming language and use it from there. I guess that is why you don't find any examples.
If you still want to implement it in OPL then use a scripting (execute) or a main block. The scripting code you can provide there is a superset of JavaScript, so you can implement Dijkstra's algorithm in JavaScript and put it there.
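For reference, here is a minimal sketch of Dijkstra's algorithm itself, written in C++ (the scripting version inside an execute or main block would follow the same structure; the example graph and node numbering are made up for illustration):

#include <cstdio>
#include <vector>
#include <queue>
#include <utility>
#include <limits>

// Plain Dijkstra on an adjacency list; nodes are 0-based indices,
// adj[u] holds (neighbour, weight) pairs.
std::vector<int> dijkstra(int n,
                          const std::vector<std::vector<std::pair<int,int>>>& adj,
                          int start)
{
    const int INF = std::numeric_limits<int>::max();
    std::vector<int> dist(n, INF);
    // min-heap ordered by tentative distance
    std::priority_queue<std::pair<int,int>,
                        std::vector<std::pair<int,int>>,
                        std::greater<std::pair<int,int>>> pq;
    dist[start] = 0;
    pq.push({0, start});
    while (!pq.empty())
    {
        auto [d, u] = pq.top();
        pq.pop();
        if (d > dist[u]) continue;          // stale queue entry, node already relaxed
        for (auto [v, w] : adj[u])
            if (dist[u] + w < dist[v])
            {
                dist[v] = dist[u] + w;
                pq.push({dist[v], v});
            }
    }
    return dist;
}

int main()
{
    // Tiny made-up graph: 0 -> 1 (weight 4), 0 -> 2 (1), 2 -> 1 (2)
    std::vector<std::vector<std::pair<int,int>>> adj(3);
    adj[0] = {{1, 4}, {2, 1}};
    adj[2] = {{1, 2}};
    std::vector<int> d = dijkstra(3, adj, 0);
    std::printf("shortest 0 -> 1 : %d\n", d[1]);   // prints 3
    return 0;
}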
I am calculating the intersection point of two lines given in the polar coordinate system:
typedef ap_fixed<16,3,AP_RND> t_lines_angle;
typedef ap_fixed<16,14,AP_RND> t_lines_rho;
bool get_intersection(
hls::Polar_< t_lines_angle, t_lines_rho>* lineOne,
hls::Polar_< t_lines_angle, t_lines_rho>* lineTwo,
Point* point)
{
float angleL1 = lineOne->angle.to_float();
float angleL2 = lineTwo->angle.to_float();
t_lines_angle rhoL1 = lineOne->rho.to_float();
t_lines_angle rhoL2 = lineTwo->rho.to_float();
t_lines_angle ct1=cosf(angleL1);
t_lines_angle st1=sinf(angleL1);
t_lines_angle ct2=cosf(angleL2);
t_lines_angle st2=sinf(angleL2);
t_lines_angle d=ct1*st2-st1*ct2;
// we make sure that the lines intersect
// which means that parallel lines are not possible
point->X = (int)((st2*rhoL1-st1*rhoL2)/d);
point->Y = (int)((-ct2*rhoL1+ct1*rhoL2)/d);
return true;
}
After synthesis for our FPGA I saw that the 4 instances of the float sine (and cosine) take 4800 LUTs per instance, which sums up to 19000 LUTs for these 4 functions. I want to reduce the LUT count by using a fixed-point sine. I already found an implementation of CORDIC, but I am not sure how to use it. The input of that function is an integer, but I have an ap_fixed datatype. How can I map this ap_fixed to an integer? And how can I map my 3.13 fixed-point format to the required 2.14 fixed-point format?
With the help of one of my colleagues I figured out a quite easy solution that does not require any hand-written implementations or manipulation of the fixed-point data:
use #include "hls_math.h" and the hls::sinf() and hls::cosf() functions.
It is important to note that the input of these functions should be ap_fixed<32, I> where I <= 32. The output of the functions can be assigned to a different type, e.g. ap_fixed<16, I>.
Example:
void CalculateSomeTrig(ap_fixed<16,5>* angle, ap_fixed<16,5>* output)
{
ap_fixed<32,5> functionInput = *angle;
*output = hls::sinf(functionInput);
}
LUT consumption:
In my case the LUT consumption was reduced to 400 LUTs for each instance of the function.
You can use bit-slicing to get the fraction and the integer parts of the ap_fixed variable, and then manipulate them to get the new ap_fixed. Perhaps something like:
constexpr int max(int a, int b) { return a > b ? a : b; }
template <int W2, int I2, int W1, int I1>
ap_fixed<W2, I2> convert(ap_fixed<W1, I1> f)
{
// Read fraction part as integer:
ap_fixed<max(W2, W1) + 1, max(I2, I1) + 1> result = f(W1 - I1 - 1, 0);
// Shift by the original number of bits in the fraction part
result >>= W1 - I1;
// Add the integer part
result += f(W1 - 1, W1 - I1);
return result;
}
I haven't tested this code well, so take it with a grain of salt.
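For example, a hypothetical (and equally untested) use of the convert<> template above for the question's 3.13 to 2.14 case might look like this; the function name and values are made up for illustration:

#include "ap_fixed.h"

// Hypothetical usage: re-quantize the question's 3.13 angle format
// (ap_fixed<16,3>) into the 2.14 format (ap_fixed<16,2>) expected by a CORDIC core,
// assuming the convert<> template above is in scope.
void example()
{
    ap_fixed<16, 3> angle_3_13 = 1.625;
    ap_fixed<16, 2> angle_2_14 = convert<16, 2, 16, 3>(angle_3_13);
    // angle_2_14 now carries the same value with 2 integer / 14 fraction bits
    (void)angle_2_14;
}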
A question/problem for anyone experienced with Xilinx Vivado HLS and FPGA design:
I need help reducing the utilization numbers of a design within the confines of HLS (i.e. can't just redo the design in an HDL). I am targeting the Zedboard (Zynq 7020).
I'm trying to implement 2048-bit RSA in HLS, using the Tenca-Koç multiple-word radix-2 Montgomery multiplication algorithm shown below (more algorithm details here):
I wrote this algorithm in HLS and it works in simulation and in C/RTL cosim. My implementation is here:
#define MWR2MM_m 2048 // Bit-length of operands
#define MWR2MM_w 8 // word size
#define MWR2MM_e 257 // number of words per operand
// Type definitions
typedef ap_uint<1> bit_t; // 1-bit scan
typedef ap_uint< MWR2MM_w > word_t; // 8-bit words
typedef ap_uint< MWR2MM_m > rsaSize_t; // m-bit operand size
/*
* Multiple-word radix 2 montgomery multiplication using carry-propagate adder
*/
void mwr2mm_cpa(rsaSize_t X, rsaSize_t Yin, rsaSize_t Min, rsaSize_t* out)
{
// extend operands to 2 extra words of 0
ap_uint<MWR2MM_m + 2*MWR2MM_w> Y = Yin;
ap_uint<MWR2MM_m + 2*MWR2MM_w> M = Min;
ap_uint<MWR2MM_m + 2*MWR2MM_w> S = 0;
ap_uint<2> C = 0; // two carry bits
bit_t qi = 0; // an intermediate result bit
// Store concatenations in a temporary variable to eliminate HLS compiler warnings about shift count
ap_uint<MWR2MM_w> temp_concat=0;
// scan X bit-by-bit
for (int i=0; i<MWR2MM_m; i++)
{
qi = (X[i]*Y[0]) xor S[0];
// C gets top two bits of temp_concat, j'th word of S gets bottom 8 bits of temp_concat
temp_concat = X[i]*Y.range(MWR2MM_w-1,0) + qi*M.range(MWR2MM_w-1,0) + S.range(MWR2MM_w-1,0);
C = temp_concat.range(9,8);
S.range(MWR2MM_w-1,0) = temp_concat.range(7,0);
// scan Y and M word-by-word, for each bit of X
for (int j=1; j<=MWR2MM_e; j++)
{
temp_concat = C + X[i]*Y.range(MWR2MM_w*j+(MWR2MM_w-1), MWR2MM_w*j) + qi*M.range(MWR2MM_w*j+(MWR2MM_w-1), MWR2MM_w*j) + S.range(MWR2MM_w*j+(MWR2MM_w-1), MWR2MM_w*j);
C = temp_concat.range(9,8);
S.range(MWR2MM_w*j+(MWR2MM_w-1), MWR2MM_w*j) = temp_concat.range(7,0);
S.range(MWR2MM_w*(j-1)+(MWR2MM_w-1), MWR2MM_w*(j-1)) = (S.bit(MWR2MM_w*j), S.range( MWR2MM_w*(j-1)+(MWR2MM_w-1), MWR2MM_w*(j-1)+1));
}
S.range(S.length()-1, S.length()-MWR2MM_w) = 0;
C=0;
}
// if final partial sum is greater than the modulus, bring it back to proper range
if (S >= M)
S -= M;
*out = S;
}
Unfortunately, the LUT utilization is huge.
This is problematic because I need to be able to fit multiple of these blocks in hardware as axi4-lite slaves.
Could someone please provide a few suggestions as to how I can reduce the LUT utilization, WITHIN THE CONFINES OF HLS?
I've already tried the following:
Experimenting with different word lengths
Switching the top-level inputs to arrays so they are stored in BRAM (i.e. not using ap_uint<2048>, but instead ap_uint foo[MWR2MM_e])
Experimenting with all sorts of directives: compartmentalizing into multiple inline functions, dataflow architecture, resource limits on lshr, etc.
However, nothing really drives the LUT utilization down in a meaningful way. Is there a glaringly obvious way that I could reduce the utilization that is apparent to anyone?
In particular, I've seen papers on implementations of the mwr2mm algorithm that use only one DSP block and one BRAM. Is this even worth attempting to implement using HLS? Or is there no way that I can actually control the resources that the algorithm is mapped to without describing it in HDL?
Thanks for the help.
I have implemented the backpropagation algorithm to train my neural network. It solves AND and OR perfectly, but when I try to train it to solve XOR, the total error is really high.
The network topology for the XOR network is: 2 neurons in the input layer, 2 neurons in the hidden layer, and one neuron in the output layer.
I'm using sigmoid as my activation function, and weighted sum as input.
Here is the part of my code responsible for back propagation:
protected void updateOutputLayer(double[] outputErr)
{
double delta;
Neuron neuron;
double errorDerivative;
for ( int i=0;i<this.getNeuralNetwork().getOutputLayer().getSize();i++)
{
neuron=this.getNeuralNetwork().getOutputLayer().getAt(i);
errorDerivative=neuron.getTransferFunction().getDerivative(neuron.getNetInput());
delta=outputErr[i]*errorDerivative;
neuron.setDelta(roundThreeDecimals(delta));
// now update the weights
this.updateNeuronWeights(neuron);
}
}
protected void updateHiddenLayerNeurons()
{
List<Layer> layers=this.network.getLayers();
Layer currentLayer;
double neuronErr;
for ( int i=layers.size()-2;i>0;i--)
{
currentLayer= layers.get(i);
for (int j=0;j<currentLayer.getSize();j++)
{
neuronErr=calculateHiddenLayerError(currentLayer.getAt(j));
currentLayer.getAt(j).setDelta(neuronErr);
this.updateNeuronWeights(currentLayer.getAt(j));
}
}
//System.out.println("*****************************************");
}
protected double calculateHiddenLayerError(Neuron node)
{
List<Connection> outputCon= node.getOutputConnections();
double errFactor=0;
for (Connection outputCon1 : outputCon) {
//System.out.println("output od dst: "+outputCon1.getDst().getOutput());
// System.out.println("w dst: "+outputCon1.getWeight());
//System.out.println("in CalcErr Factor err: "+outputCon.get(i).getDst().getError()+" w: "+outputCon.get(i).getWeight());
errFactor += outputCon1.getDst().getDelta() * outputCon1.getWeight();
}
double derivative= node.getTransferFunction().getDerivative(node.getNetInput());
return roundThreeDecimals(derivative*errFactor);
}
public void updateNeuronWeights(Neuron neuron)
{
double weightChange;
double input, error;
for (Connection con: neuron.getInConnections())
{
input=con.getInput();
// System.out.println("input: "+input);
error = neuron.getDelta();
weightChange=this.learningRate*error*input;// error here is : output error * error derivative
con.setWeight(roundThreeDecimals(con.getWeight()+weightChange));
}
// now update bias
if(neuron.isBiasUsed())
{
//System.out.println("old bias: "+neuron.getBias());
double biasChange=neuron.getBias()+neuron.getDelta()*this.learningRate;
//System.out.println("new bias: "+biasChange);
neuron.setBias(roundThreeDecimals(biasChange));
}
}
I'm using a learning rate in the range [0.01,0.5]. Can anyone tell me what is wrong with my code?
TL;DR: You should update the bias with backpropagation in the very same way that the weights are learned.
The bias certainly plays a big role in learning XOR compared to OR or AND (see: Why is a bias neuron necessary for a backpropagating neural network that recognizes the XOR operator?). Hence, the bias might be the culprit.
You say "I'm using sigmoid as my activation function, and weighted sum as input." You need a bias that can be learned in the very same way that the weights are learned.
Note: the bias must be added to the weighted sum before applying the activation function.
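As a rough sketch (in C++ rather than the question's Java, with illustrative names), the point is that the bias behaves like one more weight whose input is always 1, so it gets exactly the same update rule as the other weights:

#include <cmath>

double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// One neuron with two inputs; the bias is simply w[2] attached to a constant input of 1.
struct Neuron
{
    double w[3];
    double net(const double in[2]) const { return w[0]*in[0] + w[1]*in[1] + w[2]*1.0; }
    double out(const double in[2]) const { return sigmoid(net(in)); }
    // delta = (error term) * sigmoid'(net), computed by the caller
    void update(const double in[2], double delta, double eta)
    {
        w[0] += eta * delta * in[0];
        w[1] += eta * delta * in[1];
        w[2] += eta * delta * 1.0;   // the bias is learned exactly like the weights
    }
};

int main()
{
    Neuron n{{0.1, -0.2, 0.05}};
    double in[2] = {1.0, 0.0};
    double y = n.out(in);
    n.update(in, (1.0 - y) * y * (1.0 - y), 0.5);   // target 1: delta = (t - y) * y * (1 - y)
    return 0;
}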
So I'm trying to write a simple genetic algorithm for solving a Sudoku (not the most efficient way, I know, but it's just to practice evolutionary algorithms). I'm having some problems coming up with an efficient evaluation function to test whether the puzzle is solved and how many errors there are. My first instinct would be to check that each row and column of the matrix (I'm doing it in Octave, which is similar to MATLAB) has unique elements by ordering them, checking for duplicates, and then putting them back the way they were, which seems long-winded. Any thoughts?
Sorry if this has been asked before...
Speedups:
Use bitwise operations instead of sorting.
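For example, a hedged sketch of such a bitmask check in C++ (function and variable names are illustrative): each row, column, or 3x3 block is checked by setting one bit per value and counting a conflict whenever the bit is already set.

#include <cstdio>

// Count duplicates in a group of nine cells using a bitmask:
// bit v is set once value v (1..9) has been seen; a repeat is a conflict.
int countConflicts(const int cells[9])
{
    int seen = 0, conflicts = 0;
    for (int i = 0; i < 9; i++)
    {
        int bit = 1 << cells[i];           // cells[i] is assumed to be in 1..9
        if (seen & bit) conflicts++;       // value already present in this group
        else            seen |= bit;
    }
    return conflicts;                      // 0 means the group is valid
}

int main()
{
    int row[9] = {1, 2, 3, 4, 5, 6, 7, 8, 8};               // one duplicate
    std::printf("conflicts: %d\n", countConflicts(row));    // prints 1
    return 0;
}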
I made a 100-line Sudoku solver in C; it is reasonably fast. For super speed you need to implement the DLX algorithm (Dancing Links); there is also a file for that on the MATLAB File Exchange.
http://en.wikipedia.org/wiki/Exact_cover
http://en.wikipedia.org/wiki/Dancing_Links
http://en.wikipedia.org/wiki/Knuth's_Algorithm_X
#include "stdio.h"
int rec_sudoku(int (&mat)[9][9],int depth)
{
int sol[9][9][10]; //for eliminating
if(depth == 0) return 1;
for(int i=0;i<9;i++)
{
for(int j=0;j<9;j++)
{
sol[i][j][9]=9;
for(int k=0;k<9;k++)
{
if(mat[i][j]) sol[i][j][k]=0;
else sol[i][j][k]=1;
}
}
}
for(int i=0;i<9;i++)
{
for(int j=0;j<9;j++)
{
if(mat[i][j] == 0) continue;
for(int k=0;k<9;k++)
{
if(sol[i][k][mat[i][j]-1])
{
if(--sol[i][k][9]==0) return 0;
sol[i][k][mat[i][j]-1]=0;
}
if(sol[k][j][mat[i][j]-1])
{
if(--sol[k][j][9]==0) return 0;
sol[k][j][mat[i][j]-1]=0;
}
}
for(int k=(i/3)*3;k<(i/3+1)*3;k++)
{
for(int kk=(j/3)*3;kk<(j/3+1)*3;kk++)
{
if(sol[k][kk][mat[i][j]-1])
{
if(--sol[k][kk][9]==0) return 0;
sol[k][kk][mat[i][j]-1]=0;
}
}
}
}
}
for(int c=1;c<=9;c++)
{
for(int i=0;i<9;i++)
{
for(int j=0;j<9;j++)
{
if(sol[i][j][9] != c) continue;
for(int k=0;k<9;k++)
{
if(sol[i][j][k] != 1) continue;
mat[i][j]=k+1;
if(rec_sudoku(mat,depth-1)) return 1;
mat[i][j]=0;
}
return 0;
}
}
}
return 0;
}
int main(void)
{
int matrix[9][9] =
{
{1,0,0,0,0,7,0,9,0},
{0,3,0,0,2,0,0,0,8},
{0,0,9,6,0,0,5,0,0},
{0,0,5,3,0,0,9,0,0},
{0,1,0,0,8,0,0,0,2},
{6,0,0,0,0,4,0,0,0},
{3,0,0,0,0,0,0,1,0},
{0,4,0,0,0,0,0,0,7},
{0,0,7,0,0,0,3,0,0}
};
int d=0;
for(int i=0;i<9;i++) for(int j=0;j<9;j++) if(matrix[i][j] == 0) d++;
if(rec_sudoku(matrix,d)==0)
{
printf("no solution");
return 0;
}
for(int i=0;i<9;i++)
{
for(int j=0;j<9;j++)
{
printf("%i ",matrix[i][j]);
}
printf("\n");
}
return 1;
}
The check is easy: you create sets for the rows, columns, and 3x3 blocks, adding a number if it is not already present and penalizing your fitness if it is.
The real trick, however, is altering your fitness accordingly. Some problems are well suited to GAs and ES (evolution strategies), that is, problems where we look for a solution within some tolerance; Sudoku has an exact answer, which makes it tricky.
My first crack would probably be creating solutions with variable-length chromosomes (well, they could be fixed length, but 9x9s with blanks). The fitness function should be able to determine which part of the solution is guaranteed and which part is not (sometimes you must take a guess in the dark in a really tough Sudoku game and then backtrack if it does not work out), and it would be a good idea to create children for each possible branch.
This, then, is a recursive solution. However, you could start scanning from different positions on the board. Recombination would combine solutions whose unverified portions overlap.
Just thinking about it in this high-level, easy-going fashion, I can see how mind-bending this will be to implement!
Mutation would only be applied when there is more than one path to take; after all, a mutation is a kind of guess.
Sounds good, except for the 'putting them back' part. You can just put the numbers from any row, column, or square of the puzzle in a list and check for doubles any way you want. If there are doubles, there is an error; if all numbers are unique, there is not. You don't need to take the actual numbers out of the puzzle, so there is no need to put them back either.
Besides, if you're writing a solver, it should not make any invalid move, so this check would not be needed at all.
I would use the grid's numbers as indices and increment the respective element of a 9-element array: s_array[x]++, where x is the number taken from the grid.
After checking one row, every element of the array must be 1. If a 0 occurs anywhere in the array, that row is wrong.
However, this is just a simple row-wise sanity check.
PS: if it were 10 years ago, I would suggest an assembly solution with bit manipulation (one bit for the value 1, another for 2, another for 3, and so on) and a check that all nine value bits end up set.
When I solved this problem, I just counted the number of duplicates in each row, column and sub-grid (in fact I only had to count duplicates in columns and sub-grids as my evolutionary operators were designed never to introduce duplicates into rows). I just used a HashSet to detect duplicates. There are faster ways but this was quick enough for me.
You can see this visualised in my Java applet (if it's too fast, increase the population size to slow it down). The coloured squares are duplicates. Yellow squares conflict with one other square, orange with two other squares and red with three or more.
Here is my solution: Sudoku solving solution in C++.
Here is my solution using a set. If for a row, a block, or a column you get a set size of (let's say) 7, your fitness penalty would be 9 - 7.
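A minimal sketch of that idea (shown here in C++ with std::set; names are illustrative):

#include <cstdio>
#include <set>

// Penalty for one group of nine cells: 9 minus the number of distinct values.
int groupPenalty(const int cells[9])
{
    std::set<int> distinct(cells, cells + 9);
    return 9 - (int)distinct.size();       // 0 when all nine values are different
}

int main()
{
    int row[9] = {5, 3, 3, 1, 9, 9, 2, 7, 7};           // only 6 distinct values
    std::printf("penalty: %d\n", groupPenalty(row));    // prints 3
    return 0;
}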
If you are operating on a small set of integers, sorting can be done in O(n) using bucket sort.
You can use a temporary array to do this task in MATLAB:
function tf = checkSubSet( board, sel )
%
% given a 9x9 board and a selection (using logical 9x9 sel matrix)
% verify that board(sel) has 9 unique elements
%
% assumptions made:
% - board is 9x9 with numbers 1,2,...,9
% - sel has only 9 "true" entries: nnz(sel) = 9
%
tmp = zeros(1,9);
tmp( board( sel ) ) = 1; % poor man's bucket sorting
tf = all( tmp == 1 ) && nnz(sel) == 9 && numel(tmp) == 9; % check validity
Now we can use checkSubSet to verify that the board is correct:
function isCorrect = checkSudokuBoard( board )
%
% assuming board is 9x9 matrix with entries 1,2,...,9
%
isCorrect = true;
% check rows and columns
for ii = 1:9
sel = false( 9 );
sel(:,ii) = true;
isCorrect = checkSubSet( board, sel );
if ~isCorrect
return;
end
sel = false( 9 );
sel( ii, : ) = true;
isCorrect = checkSubSet( board, sel );
if ~isCorrect
return;
end
end
% check all 3x3
for ii=1:3:9
for jj=1:3:9
sel = false( 9 );
sel( ii + (0:2) , jj + (0:2) ) = true;
isCorrect = checkSubSet( board, sel );
if ~isCorrect
return;
end
end
end