loop unrolling for matrix multiplication

I need to write a matrix multiplication implementation that is faster than the naive method.
Here are the methods I have used so far:
1- removed false dependencies, which made the performance a lot better
2- used a recursive approach
The next thing I want to try is loop unrolling. The problem is that every time I use it, the performance gets worse, and I can't find an explanation for it.
I need help. Here is the code:
for (i = 0; i < M; i++)
    for (j = 0; j < N; j++) {
        double sum = 0;
        #pragma unroll(5)
        for (k = 0; k < K; k++)
        {
            sum += A[i + k*LDA] * B[k + j*LDB];
        }
        C[i + j*LDC] = sum;
    }

Related

Bit slicing with variable width in SystemVerilog

I am trying to access certain sections of an array using the +: operator, but I am getting the infamous "[variable] is not a constant" error. The added problem is that the width I would like to get from the array is changing as well.
This is the loop I have:
logic [N-1:0] a;
logic [2**N-2:0] b;
for (i = 0; i < N; i++)
    a[i] = b[(2**i)-1 +: 2**i] == {(2**i){1'b1}};
In other words, if N = 4, I want this loop to do this:
a[0] = b[0:0] == 1'b1;
a[1] = b[2:1] == 2'b11;
a[2] = b[6:3] == 4'b1111;
a[3] = b[14:7] == 8'b11111111;
Logically, I'm pretty certain that the loop I provided works; however, SystemVerilog doesn't allow non-constants to be used for setting the width (after the +:).
How can I use the +: operator when my starting index and width both depend on a non-constant variable? Or is there another way of doing this, considering that N can be a large number?
Thanks!
EDIT:
This can be done with shifts; here is working code:
for (i = 0; i < N; i++)
    a[i] <= ((b >> (2**i)-1) << ((2**N) - (2**i) - 1)) ==
            {(2**N-1){1'b1}} << ((2**N) - (2**i) - 1);

You cannot use +: with variable widths. It is actually just shorthand for shifts and masks. For example, something like the following should work in your case:
logic [N-1:0] a;
logic [2**N-2:0] b;
always_comb begin
    for (int i = 0; i < N; i++) begin
        logic [2**N-2:0] tmpb, tmp1;
        tmpb = b >> ((2**i)-1);                 // move the slice down to bit 0
        tmp1 = ((2**N)'(1) << (2**i)) - 1;      // mask with the low 2**i bits set
        a[i] = (tmpb & tmp1) == tmp1;           // true when the slice is all ones
    end
end
You just need to double-check the exact shift amounts and widths for your case.
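To make the "shifts and masks" idea concrete, here is a minimal software model of the intended result, written in Python rather than SystemVerilog purely to check the arithmetic. The function name all_ones_flags and its arguments are made up for this illustration; b is treated as an unsigned integer whose bit 0 corresponds to b[0].

```python
def all_ones_flags(b, n):
    """Model of a[i] = (b[(2**i)-1 +: 2**i] == all ones), using only shifts and masks."""
    a = 0
    for i in range(n):
        width = 2 ** i              # slice width doubles with every i
        start = width - 1           # slices start at bit 0, 1, 3, 7, ...
        mask = (1 << width) - 1     # the low 'width' bits set
        if ((b >> start) & mask) == mask:
            a |= 1 << i             # set bit i of the result
    return a
```

For N = 4, a value of b whose low seven bits are set gives a = 0b0111, which matches the a[0]..a[3] expansion written out in the question.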

You can use a combination of the +: operator with a mask:
parameter N = 8; localparam N2 = 2**(N-1);
logic [N-1:0] a;
logic [2**N-1:0] b;
initial begin
    b ={8'b000001,4'b1111,2'b01,1'b1};
    for (int i = 0; i < N; i++)
        a[i] = (b[(2**i)-1 +: N2] | ~N2'((1 << 2**i)- 1)) == '1;
    $displayb(a,,b);
end

You can use +: provided that the width on the right-hand side is a constant expression, which includes expressions of a genvar.
logic [N-1:0] a;
logic [2**N-2:0] b;
for (genvar i = 0; i < N; i++) begin : gen_a
    assign a[i] = b[(2**i)-1 +: 2**i] == {(2**i){1'b1}};
end
Note that this for-loop is a generate-for-loop, which is not within a procedural block (i.e. not inside an always or initial block).

Converting an arma::mat adjacency matrix into an igraph graph in C (Rcpp)

I use Armadillo objects in some (Rcpp) code where I work with matrices.
The matrices are adjacency matrices and I need to quickly compute the components of the underlying network, and I thought I could do this via igraph.
But I already fail at converting the adjacency matrix into something that can be used with igraph.
#include <RcppArmadillo.h>
#include <iostream>
#include <igraph-0.7.1\include\igraph.h>
using namespace arma;
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
vec component_membership(const mat& adjacencymatrix) {
    igraph_t g;
    igraph_adjacency(&g,&adjacencymatrix,IGRAPH_ADJ_DIRECTED);
    // here is more code that is immaterial to my problem
}
On compilation it complains
cannot convert 'const mat* {aka const arma::Mat<double>*}' to
'igraph_matrix_t*' for argument '2' to
'int igraph_adjacency(igraph_t*, igraph_matrix_t*, igraph_adjacency_t)'
I understand why that is the case: I believe igraph_matrix_t and arma::mat must be fundamentally different data types. How can I convert one into the other, i.e., how do I fix this easily?

As you suspected, igraph_matrix_t and arma::mat are completely different types. The igraph documentation lists no methods that would make use of a C array for constructing an igraph_matrix_t, so I think one has to do it by hand. Something like this might work (totally untested!):
igraph_matrix_t m;  // the struct itself, not an uninitialized pointer
int rc = igraph_matrix_init(&m, adjacencymatrix.n_rows, adjacencymatrix.n_cols);
for (unsigned long j = 0; j < adjacencymatrix.n_cols; ++j)
    for (unsigned long i = 0; i < adjacencymatrix.n_rows; ++i)
        igraph_matrix_set(&m, i, j, adjacencymatrix(i, j));
// call igraph_matrix_destroy(&m) once the graph has been built

Following @Ralf_Stubner's suggestion I ended up using the following code. I'm not sure it is smart, but I thought I'd share it anyway.
void armamat_to_igraph_matrix(const mat &x_in, igraph_matrix_t *x_out) {
    igraph_matrix_init(x_out, x_in.n_rows, x_in.n_cols);
    for (unsigned long j = 0; j < x_in.n_cols; ++j)
        for (unsigned long i = 0; i < x_in.n_rows; ++i)
            igraph_matrix_set(x_out, i, j, x_in(i, j));
    return;
}
void igraph_vector_to_armauvec(const igraph_vector_t *x_in, uvec &x_out) {
    x_out = uvec(igraph_vector_size(x_in));
    for (unsigned long j = 0; j < igraph_vector_size(x_in); ++j)
        x_out(j) = igraph_vector_e(x_in,j);
    return;
}
void igraph_vector_to_armavec(const igraph_vector_t *x_in, vec &x_out) {
    x_out = vec(igraph_vector_size(x_in));
    for (unsigned long j = 0; j < igraph_vector_size(x_in); ++j)
        x_out(j) = igraph_vector_e(x_in,j);
    return;
}

How to implement the softmax function for neural networks in the Processing 3 environment?

I tried programming a neural network in the Processing IDE.
It worked quite well until I tried using the MNIST handwritten digits data set. I had tried the iris data set and a few others from the UCI machine learning repository, but when I used the MNIST data set it didn't work. For some reason all of the outputs approached zero over time, and that caused the total error to always equal 1. I am almost sure that my problem is the activation function, so I tried using softmax for classification, but it wasn't very successful: I got the same results. I thought maybe I should use a different loss function, so I tried the negative log probability according to this video. The result now is the same cost value for each output neuron, and the sum of the outputs is not 1 as it should be.
Here are the functions for each part of the code that I have changed (I prefer not to share the full code because it's long and messy, and not really helpful):
softmax:
float[] softmax(float[] inputVector){
    float[] result = new float[inputVector.length];
    float sigma = 0;
    for(int i = 0; i < inputVector.length; i++){
        sigma += exp(inputVector[i]);
    }
    for(int i = 0; i < result.length; i++){
        result[i] = exp(inputVector[i]) / sigma;
    }
    return result;
}
derivative of softmax:
float[] derivativeSoftmax(float[] inputVector){
    float[] result = new float[inputVector.length];
    for(int i = 0; i < result.length; i++){
        result[i] = softmax(inputVector)[i] * (1 - softmax(inputVector)[i]);
    }
    return result;
}
loss function:
for(int i = 0; i < outputNeuronsNumber; i++){
    float tempSigma = 0;
    for(int j = 0; j < outputNeuronsNumber; j++){
        tempSigma += target[diffCounter2] * log(outputLayer[j]);
    }
    cost[i] = -tempSigma;
}
I can't see what the problem with my code is.

float[] derivativeSoftmax(float[] inputVector){
    float[] result = new float[inputVector.length];
    for(int i = 0; i < result.length; i++){
        result[i] = softmax(inputVector)[i] * (1 - softmax(inputVector)[i]);
    }
    return result;
}
I believe this is wrong, given the derivative of the softmax as defined on Wikipedia.
float[] derivativeSoftmax(float[] inputVector, int k){
    float[] result = new float[inputVector.length];
    for(int i = 0; i < result.length; i++){
        result[i] = softmax(inputVector)[i] * ((i==k ? 1 : 0) - softmax(inputVector)[k]);
    }
    return result;
}
You should be taking the derivative at an index with respect to some other index. The equation as you have it, which is x*(1-x), only covers the case where the two indices are equal, so it doesn't make a lot of sense on its own. But I may be wrong.
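For reference, here is a small sketch of the full matrix of partial derivatives, written in Python rather than Processing so it can be checked quickly. The names softmax and softmax_jacobian are just for this illustration, and subtracting the maximum before exponentiating is an added numerical-stability step that is not in the question's code. Entry [i][k] is the derivative of output i with respect to input k, i.e. s_i * ((i == k ? 1 : 0) - s_k).

```python
import math

def softmax(xs):
    # Subtracting the maximum avoids overflow in exp(); the result is unchanged.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_jacobian(xs):
    # J[i][k] = d softmax_i / d x_k = s_i * (delta_ik - s_k)
    s = softmax(xs)
    n = len(s)
    return [[s[i] * ((1.0 if i == k else 0.0) - s[k]) for k in range(n)]
            for i in range(n)]
```

The diagonal of this matrix is the s * (1 - s) expression from the question, and the off-diagonal entries are -s_i * s_k. Each row sums to zero because the outputs always sum to 1.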

Time complexity analysis for a loop

Why is the time complexity O(n) instead of O(n log n)? Wouldn't you have to multiply the complexity of the outer loop by that of the inner loop?
int fun(int n){
    int count = 0;
    for (int i = n; i > 0; i /= 2)
        for (int j = 0; j < i; j++)
            count += 1;
    return count;
}

In the first iteration of the outer loop the inner loop runs n times. The next iteration runs n/2 times, then n/4, and so forth. The total is n*(1 + 1/2 + 1/4 + ...), and the coefficients in parentheses form a geometric series that sums to two. The inner loop therefore runs fewer than 2n times in total, so the entire function is O(n).
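As a quick empirical check that the count grows linearly, the sketch below ports fun to Python (with integer division written as //) and compares the result against 2n. The port is an illustration only; the loop structure is the same as in the question.

```python
def fun(n):
    count = 0
    i = n
    while i > 0:            # outer loop: i = n, n/2, n/4, ..., 1
        for _ in range(i):  # inner loop runs i times
            count += 1
        i //= 2
    return count
```

For example, fun(8) counts 8 + 4 + 2 + 1 = 15 steps, which is below 2 * 8 = 16.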

Resilient backpropagation neural network - question about gradient

First I want to say that I'm really new to neural networks and I don't understand them very well ;)
I've made my first C# implementation of a backpropagation neural network. I've tested it using XOR and it looks like it works.
Now I would like to change my implementation to use resilient backpropagation (Rprop - http://en.wikipedia.org/wiki/Rprop).
The definition says: "Rprop takes into account only the sign of the partial derivative over all patterns (not the magnitude), and acts independently on each "weight"."
Could somebody tell me what the partial derivative over all patterns is? And how should I compute this partial derivative for a neuron in a hidden layer?
Thanks a lot
UPDATE:
My implementation is based on this Java code: www_.dia.fi.upm.es/~jamartin/downloads/bpnn.java
My backPropagate method looks like this:
public double backPropagate(double[] targets)
{
    double error, change;
    // calculate error terms for output
    double[] output_deltas = new double[outputsNumber];
    for (int k = 0; k < outputsNumber; k++)
    {
        error = targets[k] - activationsOutputs[k];
        output_deltas[k] = Dsigmoid(activationsOutputs[k]) * error;
    }
    // calculate error terms for hidden
    double[] hidden_deltas = new double[hiddenNumber];
    for (int j = 0; j < hiddenNumber; j++)
    {
        error = 0.0;
        for (int k = 0; k < outputsNumber; k++)
        {
            error = error + output_deltas[k] * weightsOutputs[j, k];
        }
        hidden_deltas[j] = Dsigmoid(activationsHidden[j]) * error;
    }
    //update output weights
    for (int j = 0; j < hiddenNumber; j++)
    {
        for (int k = 0; k < outputsNumber; k++)
        {
            change = output_deltas[k] * activationsHidden[j];
            weightsOutputs[j, k] = weightsOutputs[j, k] + learningRate * change + momentumFactor * lastChangeWeightsForMomentumOutpus[j, k];
            lastChangeWeightsForMomentumOutpus[j, k] = change;
        }
    }
    // update input weights
    for (int i = 0; i < inputsNumber; i++)
    {
        for (int j = 0; j < hiddenNumber; j++)
        {
            change = hidden_deltas[j] * activationsInputs[i];
            weightsInputs[i, j] = weightsInputs[i, j] + learningRate * change + momentumFactor * lastChangeWeightsForMomentumInputs[i, j];
            lastChangeWeightsForMomentumInputs[i, j] = change;
        }
    }
    // calculate error
    error = 0.0;
    for (int k = 0; k < outputsNumber; k++)
    {
        error = error + 0.5 * (targets[k] - activationsOutputs[k]) * (targets[k] - activationsOutputs[k]);
    }
    return error;
}
So can I use the change = hidden_deltas[j] * activationsInputs[i] variable as a gradient (partial derivative) for checking the sign?

I think "over all patterns" simply means the derivative accumulated over the whole training set in every iteration (epoch); take a look at the RPROP paper.
For the partial derivative: you've already implemented the normal back-propagation algorithm. This is a method for efficiently calculating the gradient. There you calculate the δ values for the single neurons; multiplied by the incoming activation (your change variable), they are in fact the negative ∂E/∂w values, i.e. the partial derivative of the global error as a function of the weights.
So instead of scaling the weight update by these values, you only look at their sign: each weight keeps its own step size, which is multiplied by one of two constants (η+ or η-), depending on whether the sign has changed since the previous iteration.
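A minimal sketch of that update rule, in Python rather than C# and not taken from Encog or from the question's code. It follows the variant commonly called iRprop-, which skips the update for a weight whose derivative has just changed sign. The function name rprop_step, the flat-list layout, and the constants (1.2, 0.5, and the step bounds 1e-6 and 50) are assumptions for illustration; the constants are the values usually quoted for Rprop. Note that grads holds ∂E/∂w, so with the question's code you would accumulate -change over all patterns before calling it.

```python
def sign(x):
    return (x > 0) - (x < 0)

def rprop_step(weights, grads, prev_grads, steps,
               eta_plus=1.2, eta_minus=0.5, step_min=1e-6, step_max=50.0):
    """One Rprop update per epoch; all four lists are modified in place.

    grads[i] is dE/dw_i summed over all training patterns.
    steps[i] is the current step size of weight i.
    """
    for i in range(len(weights)):
        same = grads[i] * prev_grads[i]
        if same > 0:
            # Sign unchanged: the direction is stable, so grow the step.
            steps[i] = min(steps[i] * eta_plus, step_max)
        elif same < 0:
            # Sign flipped: a minimum was overshot, so shrink the step.
            steps[i] = max(steps[i] * eta_minus, step_min)
            grads[i] = 0.0  # no move this epoch, and no second shrink next epoch
        # Only the sign of the derivative is used, never its magnitude.
        weights[i] -= sign(grads[i]) * steps[i]
        prev_grads[i] = grads[i]
```

A caller would keep prev_grads (initially all zeros) and steps (for example all 0.1) alive between epochs, one entry per weight, and call rprop_step once after each full pass over the training set.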

The following is an example of a part of an implementation of the RPROP training technique in the Encog Artificial Intelligence Library. It should give you an idea of how to proceed. I would recommend downloading the entire library, because it will be easier to go through the source code in an IDE rather than through the online svn interface.
http://code.google.com/p/encog-cs/source/browse/#svn/trunk/encog-core/encog-core-cs/Neural/Networks/Training/Propagation/Resilient
http://code.google.com/p/encog-cs/source/browse/#svn/trunk
Note the code is in C#, but shouldn't be difficult to translate into another language.
