I have the following CoffeeScript code:
for y in [coY - limit .. coY + limit]
  for x in [coX - limit .. coX + limit]
While looking for ways to improve the speed of my code, I checked what it compiles into:
for (y = _i = _ref = coY - limit, _ref1 = coY + limit; _ref <= _ref1 ? _i <= _ref1 : _i >= _ref1; y = _ref <= _ref1 ? ++_i : --_i) {
  for (x = _j = _ref2 = coX - limit, _ref3 = coX + limit; _ref2 <= _ref3 ? _j <= _ref3 : _j >= _ref3; x = _ref2 <= _ref3 ? ++_j : --_j) {
    // ...
  }
}
When I replaced that with my own JavaScript:
for (y = coY - limit; y <= coY + limit; y++) {
  for (x = coX - limit; x <= coX + limit; x++) {
    // ...
  }
}
I measured the script to be significantly faster (from 25 ms down to 15 ms). Can I somehow force CoffeeScript to compile into code similar to mine, or is there another solution?
Thank you.
Assuming your loop always goes from a smaller number to a bigger one, you can add by 1:
for y in [coY - limit .. coY + limit] by 1
  for x in [coX - limit .. coX + limit] by 1
Which compiles to:
for (y = _i = _ref = coY - limit, _ref1 = coY + limit; _i <= _ref1; y = _i += 1) {
  for (x = _j = _ref2 = coX - limit, _ref3 = coX + limit; _j <= _ref3; x = _j += 1) {
    // ...
  }
}
It's not a huge improvement, but it may help a bit.
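The extra ternaries in the default output exist because a CoffeeScript range such as [a .. b] counts downward when a > b, and the compiler cannot know the direction in advance. The two loop shapes can be mirrored in C++ to show the difference (rangeAnyDirection and rangeBy1 are hypothetical names for illustration): they agree on ascending ranges, but the by 1 form runs zero times on a descending one, which is why the "smaller to bigger" assumption matters.

```cpp
#include <cassert>
#include <vector>

// Mirrors the default output for [lo .. hi]: the direction test (lo <= hi)
// appears in both the condition and the step, because the direction is
// not known when the code is compiled.
std::vector<int> rangeAnyDirection(int lo, int hi) {
    std::vector<int> out;
    for (int i = lo; lo <= hi ? i <= hi : i >= hi; lo <= hi ? ++i : --i) {
        out.push_back(i);
    }
    return out;
}

// Mirrors the output for [lo .. hi] by 1: a plain ascending loop.
// If lo > hi, the body never runs.
std::vector<int> rangeBy1(int lo, int hi) {
    std::vector<int> out;
    for (int i = lo; i <= hi; i += 1) {
        out.push_back(i);
    }
    return out;
}
```

For example, rangeAnyDirection(3, 1) yields 3, 2, 1, while rangeBy1(3, 1) yields nothing.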
I dunno buddy, the code in your edit compiles to this for me:
// Generated by CoffeeScript 1.4.0
var x, y, _i, _j, _ref, _ref1, _ref2, _ref3;
for (y = _i = _ref = coY - limit, _ref1 = coY + limit; _i <= _ref1; y = _i += 1) {
  for (x = _j = _ref2 = coX - limit, _ref3 = coX + limit; _j <= _ref3; x = _j += 1) {
  }
}
To get it exactly the way you want, you may simply have to write it in JavaScript. Luckily, CoffeeScript has syntax for embedding literal JS in a CS file: if you surround JS with backticks (`), the CS compiler includes it in the output without changing anything inside the backticks.
Here's an example:
console.log "regular coffeescript"

# Surround inline JS with backticks, like so
# (var keeps y and x from becoming implicit globals):
`for (var y = coY - limit; y <= coY + limit; y++) {
  for (var x = coX - limit; x <= coX + limit; x++) {
    console.log('inline JS!');
  }
}`

console.log "continue writing regular CS after"
Source: http://coffeescript.org/#embedded
In the following CUDA code, taken from the book "Accelerating MATLAB with GPU computing: a primer with examples", I think that
int row = blockIdx.x * blockDim.x + threadIdx.x;
if (row < 1 || row > numRows - 1)
    return;
int col = blockIdx.y * blockDim.y + threadIdx.y;
if (col < 1 || col > numCols - 1)
    return;
should actually be
int row = blockIdx.x * blockDim.x + threadIdx.x;
if (row < 0 || row > numRows - 1)
    return;
int col = blockIdx.y * blockDim.y + threadIdx.y;
if (col < 0 || col > numCols - 1)
    return;
Am I right?
The following is the whole code, which performs image convolution with CUDA code called from MATLAB:
#include "conv2Mex.h"

__global__ void conv2MexCuda(float* src,
                             float* dst,
                             int numRows,
                             int numCols,
                             float* mask)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < 1 || row > numRows - 1)
        return;
    int col = blockIdx.y * blockDim.y + threadIdx.y;
    if (col < 1 || col > numCols - 1)
        return;
    int dstIndex = col * numRows + row;
    dst[dstIndex] = 0;
    int mskIndex = 3 * 3 - 1;
    for (int kc = -1; kc < 2; kc++)
    {
        int srcIndex = (col + kc) * numRows + row;
        for (int kr = -1; kr < 2; kr++)
        {
            dst[dstIndex] += mask[mskIndex--] * src[srcIndex + kr];
        }
    }
}

void conv2Mex(float* src, float* dst, int numRows, int numCols, float* msk)
{
    ...
    conv2MexCuda<<<gridSize, blockSize>>>...
    ...
}
Am I right?
I don't think you are right.
The construction of the row and col indices in the kernel code is such that they will vary (across threads in the grid) from 0 to numRows-1 and 0 to numCols-1 (and perhaps larger, depending on actual grid sizing, which you haven't shown).
Based on the code you have shown, the mask is evidently a 3x3 mask, which means that it acts as a stencil over the current (row, col) position, and extends plus and minus one row, and plus and minus one column. Let's take a careful look at the indexing here for the case where (row, col) = (0,0); this is one of the positions you have allowed to execute based on your proposed change:
for (int kc = -1; kc < 2; kc++)
{
    int srcIndex = (col + kc) * numRows + row;
    for (int kr = -1; kr < 2; kr++)
    {
        dst[dstIndex] += mask[mskIndex--] * src[srcIndex + kr];
    }
}
At the first iteration of the outer for loop, kc is -1, so srcIndex is (0-1)*numRows+0. Let's assume numRows is reasonably large, say 256. Then srcIndex is -1*256 = -256. At the first iteration of the inner for loop, kr is -1, so the index used to access src is -256-1 = -257, which reads before the start of the src array. That is almost never sensible.
If anything, the upper bounds look incorrect to me. If we assume that the valid image index ranges are 0..numRows-1 and 0..numCols-1, then I think the restrictions should be as follows:
int row = blockIdx.x * blockDim.x + threadIdx.x;
if (row < 1 || row > numRows - 2)
    return;
int col = blockIdx.y * blockDim.y + threadIdx.y;
if (col < 1 || col > numCols - 2)
    return;
That appears to be the classic computer science off-by-1 error.
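One way to check these bounds without a GPU is a host-side C++ version of the same loop. This is a minimal sketch assuming the column-major layout the kernel uses (dstIndex = col * numRows + row); the name conv3x3Interior is hypothetical. It applies the tightened guards (1..numRows-2 and 1..numCols-2) and asserts that every src index it reads is in range. Border pixels are left at 0 here, whereas the kernel simply never writes them.

```cpp
#include <cassert>
#include <vector>

// CPU reference for the 3x3 kernel, assuming column-major storage
// (index = col * numRows + row) as in conv2MexCuda. Only interior pixels
// are computed, so every neighbour read stays inside the image.
std::vector<float> conv3x3Interior(const std::vector<float>& src,
                                   int numRows, int numCols,
                                   const float* mask) {
    std::vector<float> dst(src.size(), 0.0f);  // borders stay 0
    for (int col = 1; col <= numCols - 2; ++col) {
        for (int row = 1; row <= numRows - 2; ++row) {
            int dstIndex = col * numRows + row;
            int mskIndex = 3 * 3 - 1;  // mask is read back to front, as in the kernel
            float acc = 0.0f;
            for (int kc = -1; kc < 2; ++kc) {
                int srcIndex = (col + kc) * numRows + row;
                for (int kr = -1; kr < 2; ++kr) {
                    int idx = srcIndex + kr;
                    assert(idx >= 0 && idx < numRows * numCols);
                    acc += mask[mskIndex--] * src[idx];
                }
            }
            dst[dstIndex] = acc;
        }
    }
    return dst;
}
```

With the original lower bound of 0, the assert would fire at (row, col) = (0, 0) for exactly the -257 index worked out above.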
I am working on the following problem: given a string s, partition s such that every substring of the partition is a palindrome.
Return the minimum number of cuts needed for a palindrome partitioning of s. The problem can also be found here: https://oj.leetcode.com/problems/palindrome-partitioning-ii/
Version 1 is a solution I found online.
Version 2 is my code.
They both seem to work in very similar ways. However, with a reasonably large input, version 2 takes more than 6000 milliseconds whereas version 1 takes around 71 milliseconds.
Can anyone explain where the time difference comes from?
Version 1:
#include <algorithm>
#include <string>
#include <vector>
using namespace std;  // both versions use unqualified string, vector and min

int minSol(string s) {
    int len = s.size();
    vector<int> D(len + 1);
    vector<vector<int>> P;
    for (int i = 0; i < len; i++){
        vector<int> t(len);
        P.push_back(t);
    }
    for (int i = 0; i <= len; i++)
        D[i] = len - i;
    for (int i = 0; i < len; i++)
        for (int j = 0; j < len; j++)
            P[i][j] = false;
    for (int i = len - 1; i >= 0; i--){
        for (int j = i; j < len; j++){
            if (s[i] == s[j] && (j - i < 2 || P[i + 1][j - 1])){
                P[i][j] = true;
                D[i] = min(D[i], D[j + 1] + 1);
            }
        }
    }
    return D[0] - 1;
}
Version 2:
int minCut(string s) {
    int size = s.size();
    vector<vector<bool>> map;
    for (int i = 0; i < size; i++){
        vector<bool> t;
        for (int j = 0; j < size; j++){
            t.push_back(false);
        }
        map.push_back(t);
    }
    vector<int> minCuts;
    for (int i = 0; i < size; i++){
        map[i][i] = true;
        minCuts.push_back(size - i - 1);
    }
    for (int i = size - 1; i >= 0; i--){
        for (int j = size - 1; j >= i; j--){
            if (s[i] == s[j] && (j - i <= 1 || map[i + 1][j - 1])){
                map[i][j] = true;
                if (j == size - 1){
                    minCuts[i] = 0;
                }else if (minCuts[i] > minCuts[j + 1] + 1){
                    minCuts[i] = minCuts[j + 1] + 1;
                }
            }
        }
    }
    return minCuts[0];
}
I would guess it's because the second version makes size^2 push_back calls to build map, whereas the first version makes only size push_back calls, since each row (vector<int> t(len)) is created at full length in one step.
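If that guess is right, preallocating the table should close most of the gap. Below is a sketch of version 2's algorithm with the table and minCuts sized in their constructors instead of grown by push_back. The name minCutPrealloc is hypothetical; apart from an added guard for the empty string, the logic is unchanged, so timing it against the original on the same large input would confirm or rule out the push_back explanation.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Version 2 with up-front allocation: one sized constructor per row
// instead of size * size push_back calls.
int minCutPrealloc(const std::string& s) {
    const int size = static_cast<int>(s.size());
    if (size == 0) return 0;  // the original would read minCuts[0] of an empty vector
    // isPal[i][j] is true when s[i..j] is a palindrome.
    std::vector<std::vector<bool>> isPal(size, std::vector<bool>(size, false));
    // minCuts[i] = minimum cuts for the suffix s[i..]; start with one cut per character.
    std::vector<int> minCuts(size);
    for (int i = 0; i < size; ++i) {
        isPal[i][i] = true;
        minCuts[i] = size - i - 1;
    }
    for (int i = size - 1; i >= 0; --i) {
        for (int j = size - 1; j >= i; --j) {
            if (s[i] == s[j] && (j - i <= 1 || isPal[i + 1][j - 1])) {
                isPal[i][j] = true;
                if (j == size - 1) {
                    minCuts[i] = 0;  // the whole suffix is a palindrome
                } else if (minCuts[i] > minCuts[j + 1] + 1) {
                    minCuts[i] = minCuts[j + 1] + 1;
                }
            }
        }
    }
    return minCuts[0];
}
```

For example, "aab" needs one cut ("aa" | "b"), and "aba" needs none.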
Noob question. I am trying to write a for loop with a range. For example, this is what I want to produce in JavaScript:
var i, a, j, b, len = arr.length;
for (i = 0; i < len - 1; i++) {
  a = arr[i];
  for (j = i + 1; j < len; j++) {
    b = arr[j];
    doSomething(a, b);
  }
}
The closest I've come so far is the following, but it generates unnecessary and expensive slice calls and accesses the array length inside the inner loop.
CoffeeScript:
for a, i in arr[0...arr.length-1]
  for b, j in arr[i+1...arr.length]
    doSomething a, b
Generated code:
var a, b, i, j, _i, _j, _len, _len1, _ref, _ref1;
_ref = arr.slice(0, arr.length - 1);
for (i = _i = 0, _len = _ref.length; _i < _len; i = ++_i) {
  a = _ref[i];
  _ref1 = arr.slice(i + 1, arr.length);
  for (j = _j = 0, _len1 = _ref1.length; _j < _len1; j = ++_j) {
    b = _ref1[j];
    doSomething(a, b);
  }
}
(How) can this be expressed in CoffeeScript?
Basically, transcribing your first JS code to CS:
len = arr.length
for i in [0...len - 1] by 1
  a = arr[i]
  for j in [i + 1...len] by 1
    b = arr[j]
    doSomething a, b
It seems the only way to avoid the extra variables is a while loop (see http://js2.coffee):
i = 0
len = arr.length
while i < len - 1
  a = arr[i]
  j = i + 1
  while j < len
    b = arr[j]
    doSomething a, b
    j++
  i++
Or, a bit less readable:
i = 0; len = arr.length - 1
while i < len
  a = arr[i++]; j = i
  while j <= len
    doSomething a, arr[j++]
I'm looking to implement SLAB6 in my raycaster, especially its kv6 support for voxel models. However, the SLAB6 source by Ken Silverman is totally unreadable (mostly ASM), so I was hoping someone could point me to proper C or Java source for loading kv6 models, or explain how it works, preferably in pseudocode (I want to know how to support kv6; I already know how it works). Thanks, Kaj
EDIT: the implementation would be in Java.
I found some code in an application called VoxelGL (the author is not named in the source code):
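#include <cstring>  // memcpy

// Run-length encodes one 256-cell column: data[] holds 1 (solid) or 0 (air),
// vdata[] the matching voxel data. Runs alternate solid/air, starting with solid.
void CVoxelWorld::generateSlabFromData(unsigned char *data, VoxelData *vdata, Slab *slab)
{
    int currentpattern = 1;
    int i = 0;
    int n, totalcount, v, count;
    // First pass: count the runs (n) and the solid voxels (v).
    n = 0;
    v = 0;
    while (1)
    {
        while (data[i] == currentpattern)
        {
            if (currentpattern == 1)
                v++;
            i++;
            if (i == 256)
                break;
        }
        n++;
        if (i == 256)
        {
            // A trailing air run is not stored.
            if (currentpattern == 0)
                n--;
            break;
        }
        currentpattern ^= 1;
    }
    slab->nentries = n;
    if (slab->description != 0) delete [] slab->description;
    if (slab->data != 0) delete [] slab->data;
    slab->description = new int[n];
    slab->data = new VoxelData[v];
    // Second pass: store each run length minus one, and copy the data of solid runs.
    totalcount = 0;
    v = 0;
    currentpattern = 1;
    for (i = 0; i < n; i++)
    {
        count = 0;
        while (data[totalcount] == currentpattern)
        {
            count++;
            totalcount++;
            if (totalcount == 256)
                break;
        }
        slab->description[i] = count - 1;
        if (i % 2 == 0)
        {
            // Even entries are solid runs: copy count VoxelData elements.
            memcpy(slab->data + v, vdata + totalcount - count, count * sizeof(VoxelData));
            v += count;
        }
        currentpattern ^= 1;
    }
}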
And:
#define clustersize 8

Slab *CVoxelWorld::getSlab(int x, int z)
{
    int xgrid = x / clustersize;
    int ygrid = z / clustersize;
    int clusteroffset = xgrid * 1024 * clustersize + ygrid * clustersize * clustersize;
    return &m_data[clusteroffset + (x & (clustersize - 1)) + (z & (clustersize - 1)) * clustersize];
}
And:
int CVoxelWorld::isSolid(int x, int y, int z)
{
    Slab *slab;
    if (y < 0 || y > 256)
        return 0;
    slab = getSlab(x, z);
    int counter = 0;
    for (int i = 0; i < slab->nentries; i++)
    {
        int height = slab->description[i] + 1;
        if (i % 2 == 0)
        {
            if (y >= counter && y < counter + height)
                return 1;
        }
        counter += height;
    }
    return 0;
}
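The VoxelGL code stores each vertical column as alternating solid and air runs, always starting with a solid run (possibly of length 0) and dropping a trailing air run. A compact C++ sketch of that idea, using the hypothetical names encodeColumn and isSolidAt, may be easier to port to Java. Unlike the original, it stores run lengths directly rather than count - 1, works for any column height, and leaves out the voxel data. Note that it illustrates the VoxelGL slab layout only, not the kv6 file format itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Encodes a column of cells (true = solid) as alternating run lengths,
// starting with a solid run (which may be 0 long) and dropping a
// trailing air run, like generateSlabFromData.
std::vector<int> encodeColumn(const std::vector<bool>& cells) {
    std::vector<int> runs;
    std::size_t i = 0;
    bool pattern = true;  // even-indexed runs are solid, odd-indexed runs are air
    while (i < cells.size()) {
        int count = 0;
        while (i < cells.size() && cells[i] == pattern) {
            ++count;
            ++i;
        }
        runs.push_back(count);
        pattern = !pattern;
    }
    if (!runs.empty() && runs.size() % 2 == 0) {
        runs.pop_back();  // the last run was air
    }
    return runs;
}

// Mirrors isSolid: walk the runs while tracking the start height of each.
bool isSolidAt(const std::vector<int>& runs, int y) {
    int start = 0;
    for (std::size_t k = 0; k < runs.size(); ++k) {
        if (k % 2 == 0 && y >= start && y < start + runs[k]) {
            return true;
        }
        start += runs[k];
    }
    return false;
}
```

For example, a column that is air at heights 0-4, solid at 5-7 and air above encodes as 0, 5, 3.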
I'm trying to teach a neural net with 2 inputs, 4 hidden nodes (all in the same layer) and 1 output node. The binary representation works fine, but I have problems with the bipolar one. I can't figure out why, but the total error sometimes converges to the same number, around 2.xx. My sigmoid is 2/(1 + exp(-x)) - 1. Perhaps I'm applying the sigmoid in the wrong place. For example, to calculate the output error, should I compare the sigmoided output with the expected value or with the sigmoided expected value?
I was following this website: http://galaxy.agh.edu.pl/~vlsi/AI/backp_t_en/backprop.html, but it uses different functions than the ones I was instructed to use. Even when I tried to implement their functions, I still ran into the same problem. Either way, about half the time I get stuck at the same number (a different number for each implementation). Please tell me whether I have made a mistake somewhere in my code or whether this is normal (I don't see how it could be). Momentum is set to 0. Is this a common problem with zero momentum? The error functions we are supposed to use are:
If ui is an output unit:
    Error(i) = (Ci - ui) * f'(Si)
If ui is a hidden unit:
    Error(i) = Error(Output) * weight(i to output) * f'(Si)
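A small C++ sketch of these two formulas for the bipolar case may help with the question of what to compare. As I read the formula, Ci is the raw target (already in the bipolar range, -1 to 1) and ui is the unit's sigmoided output, so the sigmoid is applied to the weighted sums, not to the target. All function names are hypothetical. The derivative uses the identity f'(x) = 0.5 * (1 + f(x)) * (1 - f(x)) for f(x) = 2/(1 + e^-x) - 1, which is what the question's fPrime computes from the activation.

```cpp
#include <cassert>
#include <cmath>

// Bipolar sigmoid from the question: f(x) = 2 / (1 + e^-x) - 1, range (-1, 1).
double bipolarSigmoid(double x) { return 2.0 / (1.0 + std::exp(-x)) - 1.0; }

// Derivative expressed through the activation u = f(x): 0.5 * (1 - u^2).
double bipolarPrimeFromOutput(double u) { return 0.5 * (1.0 - u * u); }

// Output unit: Error = (C - u) * f'(S), with C the raw target and u the output.
double outputError(double target, double u) {
    return (target - u) * bipolarPrimeFromOutput(u);
}

// Hidden unit (single output): Error = Error(out) * w(hidden -> out) * f'(S_hidden).
double hiddenError(double outErr, double weightToOutput, double hiddenActivation) {
    return outErr * weightToOutput * bipolarPrimeFromOutput(hiddenActivation);
}
```

For instance, with target 1.0 and output 0.5, f'(S) = 0.5 * (1 - 0.25) = 0.375, so the output error is 0.5 * 0.375 = 0.1875.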
public double sigmoid( double x ) {
    double fBipolar, fBinary, temp;
    temp = (1 + Math.exp(-x));
    fBipolar = (2 / temp) - 1;
    fBinary = 1 / temp;
    if(bipolar){
        return fBipolar;
    }else{
        return fBinary;
    }
}
// Initialize the weights to random values.
private void initializeWeights(double neg, double pos) {
    for(int i = 0; i < numInputs + 1; i++){
        for(int j = 0; j < numHiddenNeurons; j++){
            inputWeights[i][j] = Math.random() - pos;
            if(inputWeights[i][j] < neg || inputWeights[i][j] > pos){
                print("ERROR ");
                print(inputWeights[i][j]);
            }
        }
    }
    for(int i = 0; i < numHiddenNeurons + 1; i++){
        hiddenWeights[i] = Math.random() - pos;
        if(hiddenWeights[i] < neg || hiddenWeights[i] > pos){
            print("ERROR ");
            print(hiddenWeights[i]);
        }
    }
}
// Computes the output of the NN without training, i.e. a forward pass.
public double outputFor ( double[] argInputVector ) {
    for(int i = 0; i < numInputs; i++){
        inputs[i] = argInputVector[i];
    }
    double weightedSum = 0;
    for(int i = 0; i < numHiddenNeurons; i++){
        weightedSum = 0;
        for(int j = 0; j < numInputs + 1; j++){
            weightedSum += inputWeights[j][i] * inputs[j];
        }
        hiddenActivation[i] = sigmoid(weightedSum);
    }
    weightedSum = 0;
    for(int j = 0; j < numHiddenNeurons + 1; j++){
        weightedSum += (hiddenActivation[j] * hiddenWeights[j]);
    }
    return sigmoid(weightedSum);
}
// Computes the derivative of f from the unit's output u.
public static double fPrime(double u){
    double fBipolar, fBinary;
    fBipolar = 0.5 * (1 - Math.pow(u,2));
    fBinary = u * (1 - u);
    if(bipolar){
        return fBipolar;
    }else{
        return fBinary;
    }
}
// This method is used to update the weights of the neural net.
public double train ( double [] argInputVector, double argTargetOutput ){
    double output = outputFor(argInputVector);
    double lastDelta;
    double outputError = (argTargetOutput - output) * fPrime(output);
    if(outputError != 0){
        for(int i = 0; i < numHiddenNeurons + 1; i++){
            hiddenError[i] = hiddenWeights[i] * outputError * fPrime(hiddenActivation[i]);
            deltaHiddenWeights[i] = learningRate * outputError * hiddenActivation[i] + (momentum * lastDelta);
            hiddenWeights[i] += deltaHiddenWeights[i];
        }
        for(int in = 0; in < numInputs + 1; in++){
            for(int hid = 0; hid < numHiddenNeurons; hid++){
                lastDelta = deltaInputWeights[in][hid];
                deltaInputWeights[in][hid] = learningRate * hiddenError[hid] * inputs[in] + (momentum * lastDelta);
                inputWeights[in][hid] += deltaInputWeights[in][hid];
            }
        }
    }
    return 0.5 * (argTargetOutput - output) * (argTargetOutput - output);
}
General coding comments:
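initializeWeights(-1.0, 1.0);
may not actually give you the initial values you were expecting: Math.random() returns values in [0.0, 1.0), so Math.random() - pos with pos = 1.0 only produces values in [-1.0, 0.0), never positive ones.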
initializeWeights should probably have:
inputWeights[i][j] = Math.random() * (pos - neg) + neg;
// ...
hiddenWeights[i] = (Math.random() * (pos - neg)) + neg;
instead of:
Math.random() - pos;
so that this works:
initializeWeights(0.0, 1.0);
and gives you initial values between 0.0 and 1.0 rather than between -1.0 and 0.0.
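In C++, the same [neg, pos) range can be drawn with the standard <random> header. This is a minimal sketch; the name initWeights and the seed parameter are hypothetical, and the fixed seed only makes runs repeatable.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Fills `count` weights uniformly in [neg, pos), the same range that
// Math.random() * (pos - neg) + neg produces in Java.
std::vector<double> initWeights(std::size_t count, double neg, double pos, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> dist(neg, pos);
    std::vector<double> w(count);
    for (double& x : w) {
        x = dist(gen);
    }
    return w;
}
```

Calling initWeights(n, -1.0, 1.0, seed) then gives a mix of negative and positive starting weights rather than only negative ones.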
lastDelta is used before it has been assigned a value (Java rejects this as a variable that might not have been initialized):
deltaHiddenWeights[i] = learningRate * outputError * hiddenActivation[i] + (momentum * lastDelta);
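For momentum to have any effect, each weight needs its own remembered delta from the previous step, updated after every change. A minimal C++ sketch of that bookkeeping follows; updateWithMomentum and its parameters are hypothetical names, and errorTimesInput stands for outputError * hiddenActivation[i] in the question's code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One update step with momentum:
//   delta = learningRate * errorTimesInput + momentum * prevDelta
// prevDelta is read, then overwritten with this step's delta.
void updateWithMomentum(std::vector<double>& weights,
                        std::vector<double>& prevDelta,
                        const std::vector<double>& errorTimesInput,
                        double learningRate, double momentum) {
    for (std::size_t i = 0; i < weights.size(); ++i) {
        double delta = learningRate * errorTimesInput[i] + momentum * prevDelta[i];
        weights[i] += delta;
        prevDelta[i] = delta;
    }
}
```

With learningRate 0.5, momentum 0.9 and a constant error term of 1.0, the first step adds 0.5 and the second adds 0.5 + 0.9 * 0.5 = 0.95.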
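I'm not sure whether the + 1 in numInputs + 1 and numHiddenNeurons + 1 is necessary. If it is meant for a bias unit, make sure inputs[numInputs] and hiddenActivation[numHiddenNeurons] are actually set to a constant (usually 1); the code shown never sets them.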
Watch out for integer division: in Java, 5/2 is 2, not 2.5!
Use 5.0/2.0 instead. In general, add the .0 in your code when the result should be a double.
Most importantly, have you trained the NeuralNet long enough?
Try running it with numInputs = 2, numHiddenNeurons = 4, learningRate = 0.9, and train for 1,000 or 10,000 times.
With numHiddenNeurons = 2, it sometimes gets "stuck" when trying to solve the XOR problem.
See also XOR problem - simulation