How to solve a linear system of matrices in scala breeze? - scala

How to solve a linear system of matrices in scala breeze? ie, I have Ax = b, where A is a matrix (usually positive definite), and x and b are vectors.
I can see that there is a cholesky decomposition available, but I couldn't seem to find a solver? (if it was matlab I could do x = b \ A. If it was scipy I could do x = A.solve(b) )

Apparently, it is quite simple in fact, and built into scala-breeze as an operator:
x = A \ b
It doesnt use Cholesky, it uses LU decomposition, which is I think about half as fast, but they are both O(n^3), so comparable.

Well, I wrote my own solver in the end. I'm not sure if this is the optimal way to do it, but it doesn't seem unreasonable? :
// Copyright Hugh Perkins 2012
// You can use this under the terms of the Apache Public License 2.0
// http://www.apache.org/licenses/LICENSE-2.0
package root
import breeze.linalg._
object Solver {
// solve Ax = b, for x, where A = choleskyMatrix * choleskyMatrix.t
// choleskyMatrix should be lower triangular
def solve( choleskyMatrix: DenseMatrix[Double], b: DenseVector[Double] ) : DenseVector[Double] = {
val C = choleskyMatrix
val size = C.rows
if( C.rows != C.cols ) {
// throw exception or something
}
if( b.length != size ) {
// throw exception or something
}
// first we solve C * y = b
// (then we will solve C.t * x = y)
val y = DenseVector.zeros[Double](size)
// now we just work our way down from the top of the lower triangular matrix
for( i <- 0 until size ) {
var sum = 0.
for( j <- 0 until i ) {
sum += C(i,j) * y(j)
}
y(i) = ( b(i) - sum ) / C(i,i)
}
// now calculate x
val x = DenseVector.zeros[Double](size)
val Ct = C.t
// work up from bottom this time
for( i <- size -1 to 0 by -1 ) {
var sum = 0.
for( j <- i + 1 until size ) {
sum += Ct(i,j) * x(j)
}
x(i) = ( y(i) - sum ) / Ct(i,i)
}
x
}
}

Related

Fast CVX solvers in Matlab

I am wondering what is the fastest convex optimizer in Matlab or is there any way to speed up current solvers? I'm using CVX, but it's taking forever to solve the optimization problem I have.
The optimization I have is to solve
minimize norm(Ax-b, 2)
subject to
x >= 0
and x d <= delta
where the size of A and b are very large.
Is there any way that I can solve this by a least square solver and then transfer it to the constraint version to make it faster?
I'm not sure what x.d <= delta means, but I'll just assume it's supposed to be x <= delta.
You can solve this problem using the projected gradient method or an accelerated projected gradient method (which is just a slight modification of the projected gradient method, which "magically" converges much faster). Here is some python code that shows how to minimize .5|| Ax - b ||^2 subject to the constraint that 0 <= x <= delta using FISTA, which is an accelerated projected gradient method. More details about the projected gradient method and FISTA can be found for example in Boyd's manuscript on proximal algorithms.
import numpy as np
import matplotlib.pyplot as plt
def fista(gradf,proxg,evalf,evalg,x0,params):
# This code does FISTA with line search
maxIter = params['maxIter']
t = params['stepSize'] # Initial step size
showTrigger = params['showTrigger']
increaseFactor = 1.25
decreaseFactor = .5
costs = np.zeros((maxIter,1))
xkm1 = np.copy(x0)
vkm1 = np.copy(x0)
for k in np.arange(1,maxIter+1,dtype = np.double):
costs[k-1] = evalf(xkm1) + evalg(xkm1)
if k % showTrigger == 0:
print "Iteration: " + str(k) + " cost: " + str(costs[k-1])
t = increaseFactor*t
acceptFlag = False
while acceptFlag == False:
if k == 1:
theta = 1
else:
a = tkm1
b = t*(thetakm1**2)
c = -t*(thetakm1**2)
theta = (-b + np.sqrt(b**2 - 4*a*c))/(2*a)
y = (1 - theta)*xkm1 + theta*vkm1
(gradf_y,fy) = gradf(y)
x = proxg(y - t*gradf_y,t)
fx = evalf(x)
if fx <= fy + np.vdot(gradf_y,x - y) + (.5/t)*np.sum((x - y)**2):
acceptFlag = True
else:
t = decreaseFactor*t
tkm1 = t
thetakm1 = theta
vkm1 = xkm1 + (1/theta)*(x - xkm1)
xkm1 = x
return (xkm1,costs)
if __name__ == '__main__':
delta = 5.0
numRows = 300
numCols = 50
A = np.random.randn(numRows,numCols)
ATrans = np.transpose(A)
xTrue = delta*np.random.rand(numCols,1)
b = np.dot(A,xTrue)
noise = .1*np.random.randn(numRows,1)
b = b + noise
def evalf(x):
AxMinusb = np.dot(A, x) - b
val = .5 * np.sum(AxMinusb ** 2)
return val
def gradf(x):
AxMinusb = np.dot(A, x) - b
grad = np.dot(ATrans, AxMinusb)
val = .5 * np.sum(AxMinusb ** 2)
return (grad, val)
def evalg(x):
return 0.0
def proxg(x,t):
return np.maximum(np.minimum(x,delta),0.0)
x0 = np.zeros((numCols,1))
params = {'maxIter': 500, 'stepSize': 1.0, 'showTrigger': 5}
(x,costs) = fista(gradf,proxg,evalf,evalg,x0,params)
plt.figure()
plt.plot(x)
plt.plot(xTrue)
plt.figure()
plt.semilogy(costs)

Best Way to Add 3 Numbers (or 4, or N) in Java - Kahan Sums?

I found a completely different answer to this question, the whole original question makes no sense anymore. However, the answer way be useful, so I modify it a bit...
I want to sum up three double numbers, say a, b, and c, in the most numerically stable way possible.
I think using a Kahan Sum would be the way to go.
However, a strange thought occured to me: Would it make sense to:
First sum up a, b, and c and remember the (absolute value of the) compensation.
Then sum up a, c, b
If the (absolute value of the) compensation of the second sum is smaller, use this sum instead.
Proceed similar with b, a, c and other permutations of the numbers.
Return the sum with the smallest associated absolute compensation.
Would I get a more "stable" Addition of three numbers this way? Or does the order of numbers in the sum have no (use-able) impact on the compensation left at the end of the Summation? With (use-able) I mean to ask whether the compensation value itself is stable enough to contain Information that I can use?
(I am using the Java programming language, although I think this does not matter here.)
Many thanks,
Thomas.
I think I found a much more reliable way to solve the "Add 3" (or "Add 4" or "Add N" numbers problem.
First of all, I implemented my idea from the original post. It resulted into quite some big code which seemed, initially, to work. However, it failed in the following case: add Double.MAX_VALUE, 1, and -Double.MAX_VALUE. The result was 0.
#njuffa's comments inspired me dig somewhat deeper and at http://code.activestate.com/recipes/393090-binary-floating-point-summation-accurate-to-full-p/, I found that in Python, this problem has been solved quite nicely. To see the full code, I downloaded the Python source (Python 3.5.1rc1 - 2015-11-23) from https://www.python.org/getit/source/, where we can find the following method (under PYTHON SOFTWARE FOUNDATION LICENSE VERSION 2):
static PyObject*
math_fsum(PyObject *self, PyObject *seq)
{
PyObject *item, *iter, *sum = NULL;
Py_ssize_t i, j, n = 0, m = NUM_PARTIALS;
double x, y, t, ps[NUM_PARTIALS], *p = ps;
double xsave, special_sum = 0.0, inf_sum = 0.0;
volatile double hi, yr, lo;
iter = PyObject_GetIter(seq);
if (iter == NULL)
return NULL;
PyFPE_START_PROTECT("fsum", Py_DECREF(iter); return NULL)
for(;;) { /* for x in iterable */
assert(0 <= n && n <= m);
assert((m == NUM_PARTIALS && p == ps) ||
(m > NUM_PARTIALS && p != NULL));
item = PyIter_Next(iter);
if (item == NULL) {
if (PyErr_Occurred())
goto _fsum_error;
break;
}
x = PyFloat_AsDouble(item);
Py_DECREF(item);
if (PyErr_Occurred())
goto _fsum_error;
xsave = x;
for (i = j = 0; j < n; j++) { /* for y in partials */
y = p[j];
if (fabs(x) < fabs(y)) {
t = x; x = y; y = t;
}
hi = x + y;
yr = hi - x;
lo = y - yr;
if (lo != 0.0)
p[i++] = lo;
x = hi;
}
n = i; /* ps[i:] = [x] */
if (x != 0.0) {
if (! Py_IS_FINITE(x)) {
/* a nonfinite x could arise either as
a result of intermediate overflow, or
as a result of a nan or inf in the
summands */
if (Py_IS_FINITE(xsave)) {
PyErr_SetString(PyExc_OverflowError,
"intermediate overflow in fsum");
goto _fsum_error;
}
if (Py_IS_INFINITY(xsave))
inf_sum += xsave;
special_sum += xsave;
/* reset partials */
n = 0;
}
else if (n >= m && _fsum_realloc(&p, n, ps, &m))
goto _fsum_error;
else
p[n++] = x;
}
}
if (special_sum != 0.0) {
if (Py_IS_NAN(inf_sum))
PyErr_SetString(PyExc_ValueError,
"-inf + inf in fsum");
else
sum = PyFloat_FromDouble(special_sum);
goto _fsum_error;
}
hi = 0.0;
if (n > 0) {
hi = p[--n];
/* sum_exact(ps, hi) from the top, stop when the sum becomes
inexact. */
while (n > 0) {
x = hi;
y = p[--n];
assert(fabs(y) < fabs(x));
hi = x + y;
yr = hi - x;
lo = y - yr;
if (lo != 0.0)
break;
}
/* Make half-even rounding work across multiple partials.
Needed so that sum([1e-16, 1, 1e16]) will round-up the last
digit to two instead of down to zero (the 1e-16 makes the 1
slightly closer to two). With a potential 1 ULP rounding
error fixed-up, math.fsum() can guarantee commutativity. */
if (n > 0 && ((lo < 0.0 && p[n-1] < 0.0) ||
(lo > 0.0 && p[n-1] > 0.0))) {
y = lo * 2.0;
x = hi + y;
yr = x - hi;
if (y == yr)
hi = x;
}
}
sum = PyFloat_FromDouble(hi);
_fsum_error:
PyFPE_END_PROTECT(hi)
Py_DECREF(iter);
if (p != ps)
PyMem_Free(p);
return sum;
}
This summation method is different from Kahan's method, it uses a variable number of compensation variables. When adding the ith number, at most i additional compensation variables (stored in the array p) get used. This means if I want to add 3 numbers, I may need 3 additional variables. For 4 numbers, I may need 4 additional variables. Since the number of used variables may increase from n to n+1 only after the nth summand is loaded, I can translate the above code to Java as follows:
/**
* Compute the exact sum of the values in the given array
* {#code summands} while destroying the contents of said array.
*
* #param summands
* the summand array – will be summed up and destroyed
* #return the accurate sum of the elements of {#code summands}
*/
private static final double __destructiveSum(final double[] summands) {
int i, j, n;
double x, y, t, xsave, hi, yr, lo;
boolean ninf, pinf;
n = 0;
lo = 0d;
ninf = pinf = false;
for (double summand : summands) {
xsave = summand;
for (i = j = 0; j < n; j++) {
y = summands[j];
if (Math.abs(summand) < Math.abs(y)) {
t = summand;
summand = y;
y = t;
}
hi = summand + y;
yr = hi - summand;
lo = y - yr;
if (lo != 0.0) {
summands[i++] = lo;
}
summand = hi;
}
n = i; /* ps[i:] = [summand] */
if (summand != 0d) {
if ((summand > Double.NEGATIVE_INFINITY)
&& (summand < Double.POSITIVE_INFINITY)) {
summands[n++] = summand;// all finite, good, continue
} else {
if (xsave <= Double.NEGATIVE_INFINITY) {
if (pinf) {
return Double.NaN;
}
ninf = true;
} else {
if (xsave >= Double.POSITIVE_INFINITY) {
if (ninf) {
return Double.NaN;
}
pinf = true;
} else {
return Double.NaN;
}
}
n = 0;
}
}
}
if (pinf) {
return Double.POSITIVE_INFINITY;
}
if (ninf) {
return Double.NEGATIVE_INFINITY;
}
hi = 0d;
if (n > 0) {
hi = summands[--n];
/*
* sum_exact(ps, hi) from the top, stop when the sum becomes inexact.
*/
while (n > 0) {
x = hi;
y = summands[--n];
hi = x + y;
yr = hi - x;
lo = y - yr;
if (lo != 0d) {
break;
}
}
/*
* Make half-even rounding work across multiple partials. Needed so
* that sum([1e-16, 1, 1e16]) will round-up the last digit to two
* instead of down to zero (the 1e-16 makes the 1 slightly closer to
* two). With a potential 1 ULP rounding error fixed-up, math.fsum()
* can guarantee commutativity.
*/
if ((n > 0) && (((lo < 0d) && (summands[n - 1] < 0d)) || //
((lo > 0d) && (summands[n - 1] > 0d)))) {
y = lo * 2d;
x = hi + y;
yr = x - hi;
if (y == yr) {
hi = x;
}
}
}
return hi;
}
This function will take the array summands and add up the elements while simultaneously using it to store the compensation variables. Since we load the summand at index i before the array element at said index may become used for compensation, this will work.
Since the array will be small if the number of variables to add is small and won't escape the scope of our method, I think there is a decent chance that it will be allocated directly on the stack by the JIT, which may make the code quite fast.
I admit that I did not fully understand why the authors of the original code handled infinities, overflows, and NaNs the way they did. Here my code deviates from the original. (I hope I did not mess it up.)
Either way, I can now sum up 3, 4, or n double numbers by doing:
public static final double add3(final double x0, final double x1,
final double x2) {
return __destructiveSum(new double[] { x0, x1, x2 });
}
public static final double add4(final double x0, final double x1,
final double x2, final double x3) {
return __destructiveSum(new double[] { x0, x1, x2, x3 });
}
If I want to sum up 3 or 4 long numbers and obtain the precise result as double, I will have to deal with the fact that doubles can only represent longs in -9007199254740992..9007199254740992L. But this can easily be done by splitting each long into two parts:
public static final long add3(final long x0, final long x1,
final long x2) {
double lx;
return __destructiveSum(new long[] {new double[] { //
lx = x0, //
(x0 - ((long) lx)), //
lx = x1, //
(x1 - ((long) lx)), //
lx = x2, //
(x2 - ((long) lx)), //
});
}
public static final long add4(final long x0, final long x1,
final long x2, final long x3) {
double lx;
return __destructiveSum(new long[] {new double[] { //
lx = x0, //
(x0 - ((long) lx)), //
lx = x1, //
(x1 - ((long) lx)), //
lx = x2, //
(x2 - ((long) lx)), //
lx = x3, //
(x3 - ((long) lx)), //
});
}
I think this should be about right. At least I can now add Double.MAX_VALUE, 1, and -Double.MAX_VALUE and get 1 as result.

How to implement the Softmax derivative independently from any loss function?

For a neural networks library I implemented some activation functions and loss functions and their derivatives. They can be combined arbitrarily and the derivative at the output layers just becomes the product of the loss derivative and the activation derivative.
However, I failed to implement the derivative of the Softmax activation function independently from any loss function. Due to the normalization i.e. the denominator in the equation, changing a single input activation changes all output activations and not just one.
Here is my Softmax implementation where the derivative fails the gradient checking by about 1%. How can I implement the Softmax derivative so that it can be combined with any loss function?
import numpy as np
class Softmax:
def compute(self, incoming):
exps = np.exp(incoming)
return exps / exps.sum()
def delta(self, incoming, outgoing):
exps = np.exp(incoming)
others = exps.sum() - exps
return 1 / (2 + exps / others + others / exps)
activation = Softmax()
cost = SquaredError()
outgoing = activation.compute(incoming)
delta_output_layer = activation.delta(incoming) * cost.delta(outgoing)
Mathematically, the derivative of Softmax σ(j) with respect to the logit Zi (for example, Wi*X) is
where the red delta is a Kronecker delta.
If you implement iteratively:
def softmax_grad(s):
# input s is softmax value of the original input x. Its shape is (1,n)
# i.e. s = np.array([0.3,0.7]), x = np.array([0,1])
# make the matrix whose size is n^2.
jacobian_m = np.diag(s)
for i in range(len(jacobian_m)):
for j in range(len(jacobian_m)):
if i == j:
jacobian_m[i][j] = s[i] * (1 - s[i])
else:
jacobian_m[i][j] = -s[i] * s[j]
return jacobian_m
Test:
In [95]: x
Out[95]: array([1, 2])
In [96]: softmax(x)
Out[96]: array([ 0.26894142, 0.73105858])
In [97]: softmax_grad(softmax(x))
Out[97]:
array([[ 0.19661193, -0.19661193],
[-0.19661193, 0.19661193]])
If you implement in a vectorized version:
soft_max = softmax(x)
# reshape softmax to 2d so np.dot gives matrix multiplication
def softmax_grad(softmax):
s = softmax.reshape(-1,1)
return np.diagflat(s) - np.dot(s, s.T)
softmax_grad(soft_max)
#array([[ 0.19661193, -0.19661193],
# [-0.19661193, 0.19661193]])
It should be like this: (x is the input to the softmax layer and dy is the delta coming from the loss above it)
dx = y * dy
s = dx.sum(axis=dx.ndim - 1, keepdims=True)
dx -= y * s
return dx
But the way you compute the error should be:
yact = activation.compute(x)
ycost = cost.compute(yact)
dsoftmax = activation.delta(x, cost.delta(yact, ycost, ytrue))
Explanation: Because the delta function is a part of the backpropagation algorithm, its responsibility is to multiply the vector dy (in my code, outgoing in your case) by the Jacobian of the compute(x) function evaluated at x. If you work out what does this Jacobian look like for softmax [1], and then multiply it from the left by a vector dy, after a bit of algebra you'll find out that you get something that corresponds to my Python code.
[1] https://stats.stackexchange.com/questions/79454/softmax-layer-in-a-neural-network
The other answers are great, here to share a simple implementation of forward/backward, regardless of loss functions.
In the image below, it is a brief derivation of the backward for softmax. The 2nd equation is loss function dependent, not part of our implementation.
backward verified by manual grad checking.
import numpy as np
class Softmax:
def forward(self, x):
mx = np.max(x, axis=1, keepdims=True)
x = x - mx # log-sum-exp trick
e = np.exp(x)
probs = e / np.sum(np.exp(x), axis=1, keepdims=True)
return probs
def backward(self, x, probs, bp_err):
dim = x.shape[1]
output = np.empty(x.shape)
for j in range(dim):
d_prob_over_xj = - (probs * probs[:,[j]]) # i.e. prob_k * prob_j, no matter k==j or not
d_prob_over_xj[:,j] += probs[:,j] # i.e. when k==j, +prob_j
output[:,j] = np.sum(bp_err * d_prob_over_xj, axis=1)
return output
def compute_manual_grads(x, pred_fn):
eps = 1e-3
batch_size, dim = x.shape
grads = np.empty(x.shape)
for i in range(batch_size):
for j in range(dim):
x[i,j] += eps
y1 = pred_fn(x)
x[i,j] -= 2*eps
y2 = pred_fn(x)
grads[i,j] = (y1 - y2) / (2*eps)
x[i,j] += eps
return grads
def loss_fn(probs, ys, loss_type):
batch_size = probs.shape[0]
# dummy mse
if loss_type=="mse":
loss = np.sum((np.take_along_axis(probs, ys.reshape(-1,1), axis=1) - 1)**2) / batch_size
values = 2 * (np.take_along_axis(probs, ys.reshape(-1,1), axis=1) - 1) / batch_size
# cross ent
if loss_type=="xent":
loss = - np.sum( np.take_along_axis(np.log(probs), ys.reshape(-1,1), axis=1) ) / batch_size
values = -1 / np.take_along_axis(probs, ys.reshape(-1,1), axis=1) / batch_size
err = np.zeros(probs.shape)
np.put_along_axis(err, ys.reshape(-1,1), values, axis=1)
return loss, err
if __name__ == "__main__":
batch_size = 10
dim = 5
x = np.random.rand(batch_size, dim)
ys = np.random.randint(0, dim, batch_size)
for loss_type in ["mse", "xent"]:
S = Softmax()
probs = S.forward(x)
loss, bp_err = loss_fn(probs, ys, loss_type)
grads = S.backward(x, probs, bp_err)
def pred_fn(x, ys):
pred = S.forward(x)
loss, err = loss_fn(pred, ys, loss_type)
return loss
manual_grads = compute_manual_grads(x, lambda x: pred_fn(x, ys))
# compare both grads
print(f"loss_type = {loss_type}, grad diff = {np.sum((grads - manual_grads)**2) / batch_size}")
Just in case you are processing in batches, here is an implementation in NumPy (tested vs TensorFlow). However, I will suggest avoiding the associated tensor operations, by mixing the jacobian with the cross-entropy, which leads to a very simple and efficient expression.
def softmax(z):
exps = np.exp(z - np.max(z))
return exps / np.sum(exps, axis=1, keepdims=True)
def softmax_jacob(s):
return np.einsum('ij,jk->ijk', s, np.eye(s.shape[-1])) \
- np.einsum('ij,ik->ijk', s, s)
def np_softmax_test(z):
return softmax_jacob(softmax(z))
def tf_softmax_test(z):
z = tf.constant(z, dtype=tf.float32)
with tf.GradientTape() as g:
g.watch(z)
a = tf.nn.softmax(z)
jacob = g.batch_jacobian(a, z)
return jacob.numpy()
z = np.random.randn(3, 5)
np.all(np.isclose(np_softmax_test(z), tf_softmax_test(z)))
Here is a c++ vectorized version, using intrinsics ( 22 times (!) faster than the non-SSE version):
// How many floats fit into __m256 "group".
// Used by vectors and matrices, to ensure their dimensions are appropriate for
// intrinsics.
// Otherwise, consecutive rows of matrices will not be 16-byte aligned, and
// operations on them will be incorrect.
#define F_MULTIPLE_OF_M256 8
//check to quickly see if your rows are divisible by m256.
//you can 'undefine' to save performance, after everything was verified to be correct.
#define ASSERT_THE_M256_MULTIPLES
#ifdef ASSERT_THE_M256_MULTIPLES
#define assert_is_m256_multiple(x) assert( (x%F_MULTIPLE_OF_M256) == 0)
#else
#define assert_is_m256_multiple (q)
#endif
// usually used at the end of our Reduce functions,
// where the final __m256 mSum needs to be collapsed into 1 scalar.
static inline float slow_hAdd_ps(__m256 x){
const float *sumStart = reinterpret_cast<const float*>(&x);
float sum = 0.0f;
for(size_t i=0; i<F_MULTIPLE_OF_M256; ++i){
sum += sumStart[i];
}
return sum;
}
f_vec SoftmaxGrad_fromResult(const float *softmaxResult, size_t size,
const float *gradFromAbove){//<--gradient vector, flowing into us from the above layer
assert_is_m256_multiple(size);
//allocate vector, where to store output:
f_vec grad_v(size, true);//true: skip filling with zeros, to save performance.
const __m256* end = (const __m256*)(softmaxResult + size);
for(size_t i=0; i<size; ++i){// <--for every row
//go through this i'th row:
__m256 sum = _mm256_set1_ps(0.0f);
const __m256 neg_sft_i = _mm256_set1_ps( -softmaxResult[i] );
const __m256 *s = (const __m256*)softmaxResult;
const __m256 *gAbove = (__m256*)gradFromAbove;
for (s; s<end; ){
__m256 mul = _mm256_mul_ps(*s, neg_sft_i); // sftmaxResult_j * (-sftmaxResult_i)
mul = _mm256_mul_ps( mul, *gAbove );
sum = _mm256_add_ps( sum, mul );//adding to the total sum of this row.
++s;
++gAbove;
}
grad_v[i] = slow_hAdd_ps( sum );//collapse the sum into 1 scalar (true sum of this row).
}//end for every row
//reset back to start and subtract a vector, to account for Kronecker delta:
__m256 *g = (__m256*)grad_v._contents;
__m256 *s = (__m256*)softmaxResult;
__m256 *gAbove = (__m256*)gradFromAbove;
for(s; s<end; ){
__m256 mul = _mm256_mul_ps(*s, *gAbove);
*g = _mm256_add_ps( *g, mul );
++s;
++g;
}
return grad_v;
}
If for some reason somebody wants a simple (non-SSE) version, here it is:
inline static void SoftmaxGrad_fromResult_nonSSE(const float* softmaxResult,
const float *gradFromAbove, //<--gradient vector, flowing into us from the above layer
float *gradOutput,
size_t count ){
// every pre-softmax element in a layer contributed to the softmax of every other element
// (it went into the denominator). So gradient will be distributed from every post-softmax element to every pre-elem.
for(size_t i=0; i<count; ++i){
//go through this i'th row:
float sum = 0.0f;
const float neg_sft_i = -softmaxResult[i];
for(size_t j=0; j<count; ++j){
float mul = gradFromAbove[j] * softmaxResult[j] * neg_sft_i;
sum += mul;//adding to the total sum of this row.
}
//NOTICE: equals, overwriting any old values:
gradOutput[i] = sum;
}//end for every row
for(size_t i=0; i<count; ++i){
gradOutput[i] += softmaxResult[i] * gradFromAbove[i];
}
}

How to add a row to a sparse matrix in itpp

In matlab we write :
H2= H(p==1,:)
where H2 and H are sparse double matrix and p is a logical vector.
how I can write it in itpp?
Not 100% familiar with itpp, but I would try something like
int h = 0, w = H.cols();
// count number of set elements in p to get number of rows of H2
for ( int i = 0 ; i < p.length() ; i++ ) {
h += (p[i] == 1);
}
// alocate H2
H2 = Sparse_Mat( h, w, H.nnz() ); // estimate number of nonzeros in H2
// copy the relevant elements
for ( int i = 0, i2 = 0 ; i < p.length() && i2 < h ; i++ ) {
if ( p[i] != 1 ) {
continue;
}
H2.set_submatrix( i2, 0, H.get_submatrix( i, i+1, 0, w ).full() );
i2++;
}
Clearly, working with sparse columns is much easier using get_col and set_col, so you might consider transposing H first and then perform the operation returning H2.transpose().

Simplex noise displaying incorrectly with BufferedImage

I finally got a working tile-able version of Simplex noise working after much work, but I can't seem to get it to record and display correctly when using a BufferedImage. Whenever I try to create an image, it ends up with bands or rings of black and white, instead of a smooth change of shades, which is what I'm expecting. I'm guessing there's something simple I'm not doing, but for the life of me, I can't find it.
This is my code (quite a bit of which is from Stefan Gustavson's Simplex noise implementation):
import java.awt.image.BufferedImage
import javax.imageio.ImageIO
import java.io.File
import scala.util.Random
object ImageTest {
def main(args: Array[String]): Unit = {
val image = generate(1024, 1024, 1)
ImageIO.write(image, "png", new File("heightmap.png"))
}
def generate(width: Int, height: Int, octaves: Int) = {
val map = new BufferedImage(width, height, BufferedImage.TYPE_USHORT_GRAY)
val pi2 = Math.PI * 2
for ( x <- 0 until width;
y <- 0 until height) {
var total = 0.0
for (oct <- 1 to octaves) {
val scale = (1 - 1/Math.pow(2, oct))
val s = x / width.toDouble
val t = y / height.toDouble
val dx = 1-scale
val dy = 1-scale
val nx = scale + Math.cos(s*pi2) * dx
val ny = scale + Math.cos(t*pi2) * dy
val nz = scale + Math.sin(s*pi2) * dx
val nw = scale + Math.sin(t*pi2) * dy
total += (((noise(nx,ny,nz,nw)+1)/2)) * Math.pow(0.5, oct)
}
map.setRGB(x,y, (total * 0xffffff).toInt)
}
map
}
// Simplex 4D noise generator
// returns -1.0 <-> 1.0
def noise(x: Double, y: Double, z: Double, w: Double) = {
// Skew the (x,y,z,w) space to determine which cell of 24 simplices we're in
val s = (x + y + z + w) * F4; // Factor for 4D skewing
val i = Math.floor(x+s).toInt
val j = Math.floor(y+s).toInt
val k = Math.floor(z+s).toInt
val l = Math.floor(w+s).toInt
val t = (i+j+k+l) * G4 // Factor for 4D unskewing
val xBase = x - (i-t) // Unskew the cell space and set the x, y, z, w
val yBase = y - (j-t) //distances from the cell origin
val zBase = z - (k-t)
val wBase = w - (l-t)
// For the 4D case, the simplex is a 4D shape I won't even try to describe.
// To find out which of the 24 possible simplices we're in, we need to
// determine the magnitude ordering of x0, y0, z0 and w0.
// Six pair-wise comparisons are performed between each possible pair
// of the four coordinates, and the results are used to rank the numbers.
var rankx = 0
var ranky = 0
var rankz = 0
var rankw = 0
if(xBase > yBase) rankx+=1 else ranky+=1
if(xBase > zBase) rankx+=1 else rankz+=1
if(xBase > wBase) rankx+=1 else rankw+=1
if(yBase > zBase) ranky+=1 else rankz+=1
if(yBase > wBase) ranky+=1 else rankw+=1
if(zBase > wBase) rankz+=1 else rankw+=1
// simplex[c] is a 4-vector with the numbers 0, 1, 2 and 3 in some order.
// Many values of c will never occur, since e.g. x>y>z>w makes x<z, y<w and x<w
// impossible. Only the 24 indices which have non-zero entries make any sense.
// We use a thresholding to set the coordinates in turn from the largest magnitude.
// Rank 3 denotes the largest coordinate.
val i1 = if (rankx >= 3) 1 else 0
val j1 = if (ranky >= 3) 1 else 0
val k1 = if (rankz >= 3) 1 else 0
val l1 = if (rankw >= 3) 1 else 0
// Rank 2 denotes the second largest coordinate.
val i2 = if (rankx >= 2) 1 else 0
val j2 = if (ranky >= 2) 1 else 0
val k2 = if (rankz >= 2) 1 else 0
val l2 = if (rankw >= 2) 1 else 0
// Rank 1 denotes the second smallest coordinate.
val i3 = if (rankx >= 1) 1 else 0
val j3 = if (ranky >= 1) 1 else 0
val k3 = if (rankz >= 1) 1 else 0
val l3 = if (rankw >= 1) 1 else 0
// The fifth corner has all coordinate offsets = 1, so no need to compute that.
val xList = Array(xBase, xBase-i1+G4, xBase-i2+2*G4, xBase-i3+3*G4, xBase-1+4*G4)
val yList = Array(yBase, yBase-j1+G4, yBase-j2+2*G4, yBase-j3+3*G4, yBase-1+4*G4)
val zList = Array(zBase, zBase-k1+G4, zBase-k2+2*G4, zBase-k3+3*G4, zBase-1+4*G4)
val wList = Array(wBase, wBase-l1+G4, wBase-l2+2*G4, wBase-l3+3*G4, wBase-1+4*G4)
// Work out the hashed gradient indices of the five simplex corners
val ii = if (i < 0) 256 + (i % 255) else i % 255
val jj = if (j < 0) 256 + (j % 255) else j % 255
val kk = if (k < 0) 256 + (k % 255) else k % 255
val ll = if (l < 0) 256 + (l % 255) else l % 255
val gradIndices = Array(
perm(ii+perm(jj+perm(kk+perm(ll)))) % 32,
perm(ii+i1+perm(jj+j1+perm(kk+k1+perm(ll+l1)))) % 32,
perm(ii+i2+perm(jj+j2+perm(kk+k2+perm(ll+l2)))) % 32,
perm(ii+i3+perm(jj+j3+perm(kk+k3+perm(ll+l3)))) % 32,
perm(ii+1+perm(jj+1+perm(kk+1+perm(ll+1)))) % 32)
// Calculate the contribution from the five corners
var total = 0.0
for (dim <- 0 until 5) {
val (x,y,z,w) = (xList(dim), yList(dim), zList(dim), wList(dim))
var t = 0.5 - x*x - y*y - z*z - w*w
total += {
if (t < 0) 0.0
else {
t *= t
val g = grad4(gradIndices(dim))
t * t * ((g.x*x)+(g.y*y)+(g.z*z)+(g.w*w))
}
}
}
// Sum up and scale the result to cover the range [-1,1]
27.0 * total
}
case class Grad(x: Double, y: Double, z: Double, w: Double = 0.0)
private lazy val grad4 = Array(
Grad(0,1,1,1), Grad(0,1,1,-1), Grad(0,1,-1,1), Grad(0,1,-1,-1),
Grad(0,-1,1,1),Grad(0,-1,1,-1),Grad(0,-1,-1,1),Grad(0,-1,-1,-1),
Grad(1,0,1,1), Grad(1,0,1,-1), Grad(1,0,-1,1), Grad(1,0,-1,-1),
Grad(-1,0,1,1),Grad(-1,0,1,-1),Grad(-1,0,-1,1),Grad(-1,0,-1,-1),
Grad(1,1,0,1), Grad(1,1,0,-1), Grad(1,-1,0,1), Grad(1,-1,0,-1),
Grad(-1,1,0,1),Grad(-1,1,0,-1),Grad(-1,-1,0,1),Grad(-1,-1,0,-1),
Grad(1,1,1,0), Grad(1,1,-1,0), Grad(1,-1,1,0), Grad(1,-1,-1,0),
Grad(-1,1,1,0),Grad(-1,1,-1,0),Grad(-1,-1,1,0),Grad(-1,-1,-1,0))
private lazy val perm = new Array[Short](512)
for(i <- 0 until perm.length)
perm(i) = Random.nextInt(256).toShort
private lazy val F4 = (Math.sqrt(5.0) - 1.0) / 4.0
private lazy val G4 = (5.0 - Math.sqrt(5.0)) / 20.0
}
I've checked the output values of the noise function I'm using, which as of yet hasn't returned anything outside of (-1, 1) exclusive. And for a single octave, the value I'm supplying to the image (total) has not gone outside of (0,1) exclusive, either.
The only thing I can see that would be a problem is either the BufferedImage type is set incorrectly, or I'm multiplying total by the wrong hex value when setting the values in the image.
I've looked through the Javadocs on BufferedImage for information about the types and the values they accept, though nothing I've found seems to be out of place in my code (though, I am fairly new to using BufferedImage in general). And I've tried changing the hex value, but neither seems to change anything. The only thing I've found that has any affect is if I divide the (total * 0xffffff).toInt value by 256, which seems to darken the bands a bit and a slight gradient appears over the areas it should, but if I increase the division too much, the image just becomes black. So as of right now I'm stuck on what could be the issue.
(total * 0xffffff).toInt doesn't seem to make sense. You are creating an ARGB value from a grayscale float with a single multiplication?
I think you want something like this:
val i = (total * 0xFF).toInt
val rgb = 0xFF000000 | (i << 16) | (i << 8) | i
That gives me a smooth random texture, although with very low contrast—with 1 octave, your total seems to vary approx from 0.2 to 0.3, so you may need to adjust the scale a bit.
I'm not sure though how you can get 16-bit grayscale resolution. Perhaps you need to set the raster data directly instead of using setRGB (which forces you down to 8 bits). The comments below this question suggest that you use the raster directly.