Boids in compute shader: How can I avoid a stable state when processing every boid in parallel?

I want to implement the simple, age-old boids algorithm in 2D, but as a compute shader in HLSL (the host program is VVVV).
Now my problem: in all the 2D sample applications I've found, boids are updated one at a time. This sequential updating introduces some jitter, which produces the typical "random"-looking results: direction changes and so on.
I am not sure whether my implementation is correct, but if I test each rule individually, my results look like other references. If I apply the rules together, however (in pretty much any combination), my boids very soon enter a stable state in which they form a fixed formation and just fly in one particular direction. Changing the view radius influences the size and number of formations, but doesn't introduce anything "chaotic" or flock-like, just static bunches after a couple of seconds.
Is there a problem with my implementation, or is this simply a property of parallel updates that you have to compensate for somehow?
Relevant code:
void CS(uint3 tid : SV_DispatchThreadID)
{
    if (tid.x >= elementCount)
        return;

    if (reset)
    {
        // Random initial position (xy) and velocity (zw)
        OutputBuffer[tid.x].x = random(float2(rand.x + 12, tid.x - 4)); // PosXY
        OutputBuffer[tid.x].y = random(float2(tid.x + 16, rand.y * 6));
        OutputBuffer[tid.x].z = random(float2(tid.x, tid.x * 2)) / 100; // VelXY
        OutputBuffer[tid.x].w = random(float2(tid.x * 16, tid.x / 4)) / 100;
    }
    else
    {
        float maxSpeed = 0.01;
        float2 myPos = OutputBuffer[tid.x].xy;
        float2 myVel = OutputBuffer[tid.x].zw;
        float2 myAcc = 0;

        float2 steerAlign = 0;
        float2 steerCohesion = 0;
        float2 steerAvoid = 0;
        int alignCount = 0;
        int cohesionCount = 0;
        int avoidCount = 0;

        // Accumulate the three rules over all other boids
        for (uint i = 0; i < elementCount; i++)
        {
            if (i != tid.x)
            {
                float2 iPos = OutputBuffer[i].xy;
                float2 iVel = OutputBuffer[i].wz;
                float dist = distance(iPos, myPos);

                // Alignment
                if (dist < range / 2)
                {
                    steerAlign += iVel;
                    alignCount++;
                }
                // Cohesion
                if (dist < range * 3)
                {
                    steerCohesion += iPos - myPos;
                    cohesionCount++;
                }
                // Separation
                if (dist < range)
                {
                    float2 diff = myPos - iPos;
                    diff /= dist * dist;
                    steerAvoid += diff;
                    avoidCount++;
                }
            }
        }

        if (alignCount > 0 && steerAlign.x != 0)
        {
            steerAlign /= alignCount;
            steerAlign = normalize(steerAlign) * maxForce;
        }
        if (cohesionCount > 0)
        {
            steerCohesion /= cohesionCount;
            steerCohesion = normalize(steerCohesion) * maxForce;
        }
        if (avoidCount > 0)
        {
            steerAvoid /= avoidCount;
            steerAvoid = normalize(steerAvoid) * maxForce;
        }

        // Wrap around the [-1, 1] area
        if (myPos.x < -1) myPos.x = 1;
        if (myPos.x > 1)  myPos.x = -1;
        if (myPos.y < -1) myPos.y = 1;
        if (myPos.y > 1)  myPos.y = -1;

        myAcc = (steerAlign * alignFx) + (steerCohesion * cohesionFx) + (steerAvoid * seperationFx);
        myAcc = clamp(myAcc, -maxForce, maxForce);

        myVel += myAcc;
        myVel = clamp(myVel, -maxSpeed, maxSpeed);
        myPos += myVel;

        OutputBuffer[tid.x].xy = myPos;
        OutputBuffer[tid.x].zw = myVel;
    }
}

Related

How to implement Offset on Perlin noise?

I need a little help. I have this Perlin noise function, but I don't know how to properly create offsets.
I am using it for infinite terrain generation, and when I use this script the noise values of individual chunks don't fit together properly, and they create holes.
Is there a way of fixing this?
public float[,] GenerateNoise(int chunkSize, int octaves, string seed, float noiseScale, float persistence, float lacunarity, Vector2 offset)
{
    if (noiseScale <= 0)
    {
        noiseScale = 0.0001f;
    }

    float halfWidth = chunkSize / 2f;
    float halfHeight = chunkSize / 2f;

    float[,] noiseMap = new float[chunkSize, chunkSize];

    System.Random rand = new System.Random(seed.GetHashCode());

    // Octaves offset
    Vector2[] octavesOffset = new Vector2[octaves];
    for (int i = 0; i < octaves; i++)
    {
        float offset_X = rand.Next(-100000, 100000) + offset.x;
        float offset_Y = rand.Next(-100000, 100000) + offset.y;
        octavesOffset[i] = new Vector2(offset_X / chunkSize, offset_Y / chunkSize);
    }

    for (int x = 0; x < chunkSize; x++)
    {
        for (int y = 0; y < chunkSize; y++)
        {
            float amplitude = 1;
            float frequency = 1;
            float noiseHeight = 0;
            float superpositionCompensation = 0;

            for (int i = 0; i < octaves; i++)
            {
                float sampleX = (x - halfWidth) / noiseScale * frequency + octavesOffset[i].x * frequency;
                float sampleY = (y - halfHeight) / noiseScale * frequency + octavesOffset[i].y * frequency;

                float noiseValue = Mathf.PerlinNoise(sampleX, sampleY);

                noiseHeight += noiseValue * amplitude;
                noiseHeight -= superpositionCompensation;

                amplitude *= persistence;
                frequency *= lacunarity;
                superpositionCompensation = amplitude / 2;
            }

            noiseMap[x, y] = Mathf.Clamp01(noiseHeight);
        }
    }

    return noiseMap;
}
It is quite simple, actually: just add the chunk's x,y coordinates to the Mathf.PerlinNoise call. Taking your code as an example, you can:
Pass chunkPos as an argument to the function:
public float[,] GenerateNoise(Vector2 chunkPos, int chunkSize, int octaves, string seed, float noiseScale, float persistence, float lacunarity, Vector2 offset)
Add it to the Mathf.PerlinNoise invocation:
float noiseValue = Mathf.PerlinNoise(sampleX + chunkPos.x, sampleY + chunkPos.y);
Then make sure to generate each chunk with an appropriate chunkPos, where chunkPos can be its transform.position or whatever coordinates you have.
That's it.
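To make that concrete, here is a rough usage sketch; the 4x4 chunk grid, the BuildChunks name, and the fields (chunkSize, octaves, seed, and so on) are illustrative assumptions, not part of the original code. Each chunk gets its own chunkPos, here simply its grid position times chunkSize, standing in for transform.position:
void BuildChunks()
{
    for (int cx = 0; cx < 4; cx++)
    {
        for (int cy = 0; cy < 4; cy++)
        {
            // each chunk samples its own region of the noise
            Vector2 chunkPos = new Vector2(cx * chunkSize, cy * chunkSize);
            float[,] heights = GenerateNoise(chunkPos, chunkSize, octaves, seed,
                                             noiseScale, persistence, lacunarity, offset);
            // build or position the chunk's terrain from 'heights' here
        }
    }
}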

Unity perlin noise having repeating patterns

I made a Noise class using the Perlin Noise from Unity like this:
public static float[,] GetNoise(Vector2Int initialOffset, float scale, float persistance, float lacunarity, int octaves)
{
    float[,] noiseMap = new float[Chunk.width, Chunk.height];
    float maxHeight = 0;
    float minHeight = 0;

    for (int y = 0; y < Chunk.height; y++)
    {
        for (int x = 0; x < Chunk.width; x++)
        {
            float amplitude = 1;
            float frequency = 1;
            float noiseHeight = 0;

            for (int oc = 0; oc < octaves; oc++)
            {
                float coordX = (x + initialOffset.x) / scale * frequency;
                float coordY = (y + initialOffset.y) / scale * frequency;

                float perlin = Mathf.PerlinNoise(coordX, coordY) * 2 - 1;
                noiseHeight += perlin * amplitude;

                amplitude *= persistance;
                frequency *= lacunarity;
            }

            if (noiseHeight < minHeight)
            {
                minHeight = noiseHeight;
            }
            if (noiseHeight > maxHeight)
            {
                maxHeight = noiseHeight;
            }

            noiseMap[x, y] = noiseHeight;
        }
    }

    for (int y = 0; y < Chunk.height; y++)
    {
        for (int x = 0; x < Chunk.width; x++)
        {
            noiseMap[x, y] = Mathf.InverseLerp(minHeight, maxHeight, noiseMap[x, y]);
        }
    }

    return noiseMap;
}
However, this code is giving me repeating patterns like this:
What am I doing wrong? Or is there no way to get rid of the patterns?
I got it working, not very well, but working. The way I did it was to generate the height map for every tile in the chunk and then do some random placement of tiles while taking the height map into account. Something like this:
if (heightMap[x, y] < 0.3 && Random.value < 0.5)
    // Add tile
This way I got this result:
EDIT:
Doing some more research on Perlin noise, I found out that it just doesn't like negative coordinates for some reason, so I handled them as shown below. Hope this helps someone!
I fixed the negative coordinates like this:
// account for negatives (ex. -1 % 256 = -1, needs to loop around to 255)
if (noiseOffset.x < 0)
    noiseOffset = new Vector2(noiseOffset.x + noiseRange.x, noiseOffset.y);
if (noiseOffset.y < 0)
    noiseOffset = new Vector2(noiseOffset.x, noiseOffset.y + noiseRange.y);
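A slightly more general variant of the same idea, as a sketch (the Wrap helper is an illustrative name, not part of the original script), wraps any offset into [0, range), including values that are more negative than one full range:
// Wrap 'value' into [0, range); also handles values below -range
static float Wrap(float value, float range)
{
    float r = value % range;
    return r < 0 ? r + range : r;
}
// usage:
noiseOffset = new Vector2(Wrap(noiseOffset.x, noiseRange.x), Wrap(noiseOffset.y, noiseRange.y));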

Renderscript hangs device

I am implementing part of an FFT algorithm using RenderScript on Android. When I run the code, my application hangs. I want to process 512 values from the real and img allocations at a time; the kernel should execute 512 times, driven by the provided dummy allocation of size 512.
Here is my Java code:
RenderScript rs = RenderScript.create(WajinViewerApplication.getApplication());
ScriptC_fft scriptC_fft = new ScriptC_fft(rs);

float inReal[] = new float[512 * 512];
float inImg[] = new float[512 * 512];

int k = 0;
for (int i = 0; i < 512; i++) {
    for (int j = 0; j < 512; j++) {
        // copy values from complex 2d array to 1d array
        inReal[k] = data[i][j].real;
        inImg[k] = data[i][j].imaginary;
        k++;
    }
}

Allocation realAllocation = Allocation.createSized(rs, Element.F32(rs), 512 * 512);
Allocation imgAllocation = Allocation.createSized(rs, Element.F32(rs), 512 * 512);
realAllocation.copyFrom(inReal);
imgAllocation.copyFrom(inImg);
scriptC_fft.set_real(realAllocation);
scriptC_fft.set_img(imgAllocation);

Allocation inAllocation = Allocation.createSized(rs, Element.U16(rs), 512);
Allocation outAllocation = Allocation.createTyped(rs, inAllocation.getType());
inAllocation.copyFrom(new short[512]);

// set direction
if (direction == Direction.Forward) {
    scriptC_fft.set_is_forward(true);
} else {
    scriptC_fft.set_is_forward(false);
}
scriptC_fft.set_len(512);
scriptC_fft.set_levels(Integer.numberOfLeadingZeros(512));

scriptC_fft.forEach_root(inAllocation, outAllocation);
outAllocation.copyTo(new short[512]);

float outReal[] = new float[512 * 512];
float outImg[] = new float[512 * 512];
scriptC_fft.get_real().copyTo(outReal);
scriptC_fft.get_img().copyTo(outImg);

k = 0;
for (int i = 0; i < 512; i++) {
    for (int j = 0; j < 512; j++) {
        // copy values from complex 1d array to 2d array
        data[i][j].real = outReal[k];
        data[i][j].imaginary = outImg[k];
        k++;
    }
}
rs.destroy();
And here is my RenderScript code:
#pragma version(1)
#pragma rs java_package_name(jp.drmh.wajin.newversion)
#include "common.rsh"

rs_allocation real;
rs_allocation img;
bool is_forward;
uint32_t len;
uint32_t levels;

uint16_t __attribute__((kernel)) root(uint16_t in, uint32_t x, uint32_t y) {
    // rsDebug("call", x);
    float realval[512];
    float imagval[512];

    if (is_forward) {
        for (uint32_t i = 0; i < len; i++) {
            realval[i] = rsGetElementAt_float(real, x * 512 + i);
            imagval[i] = rsGetElementAt_float(img, x * 512 + i);
            // rsDebug("values", realval[i], imagval[i]);
        }
    } else {
        for (uint32_t i = 0; i < len; i++) {
            realval[i] = rsGetElementAt_float(img, x * 512 + i);
            imagval[i] = rsGetElementAt_float(real, x * 512 + i);
        }
    }

    float costable[256], sintable[256];
    for (uint32_t i = 0; i < len / 2; i++) {
        costable[i] = cos(2 * M_PI * i / len);
        sintable[i] = sin(2 * M_PI * i / len);
    }

    // Bit-reversed addressing permutation
    for (uint32_t i = 0; i < len; i++) {
        uint32_t j = bit_reverse32(i);
        uint32_t ans = j >> (32 - levels);
        if (j > i) {
            float temp = realval[i];
            realval[i] = realval[j];
            realval[j] = temp;
            temp = imagval[i];
            imagval[i] = imagval[j];
            imagval[j] = temp;
        }
    }

    for (uint32_t size = 2; size <= len; size *= 2) {
        uint32_t halfsize = size / 2;
        uint32_t tablestep = len / size;
        for (uint32_t i = 0; i < len; i += size) {
            for (uint32_t j = i, k = 0; j < i + halfsize; j++, k += tablestep) {
                float tpre = realval[j + halfsize] * costable[k]
                           + imagval[j + halfsize] * sintable[k];
                float tpim = -realval[j + halfsize] * sintable[k]
                           + imagval[j + halfsize] * costable[k];
                realval[j + halfsize] = realval[j] - tpre;
                imagval[j + halfsize] = imagval[j] - tpim;
                realval[j] += tpre;
                imagval[j] += tpim;
            }
        }
        if (size == len)
            break;
    }

    if (!is_forward) {
        for (uint32_t i = 0; i < len; i++) {
            realval[i] = realval[i] / len;
            imagval[i] = imagval[i] / len;
            rsDebug("values", realval[i], imagval[i]);
        }
        for (uint32_t i = 0; i < len; i++) {
            rsSetElementAt_float(real, realval[i], x * 512 + i);
            rsSetElementAt_float(img, imagval[i], x * 512 + i);
        }
    }
    return in;
}

iRPROP+ Multilayer Perceptron

Hello everyone. This is the code of the iRPROP+ algorithm for my MLP. When I try to train my network, the standard deviation decreases for about 1500 epochs (very slowly: from ~0.5 to 0.4732), but then it suddenly starts to increase.
Can someone tell me what I did wrong?
public void RPROP()
{
    double a = 1.2, b = 0.5, nMax = 50, nMin = 0.000001;

    // Backward pass: compute the deltas layer by layer
    for (int l = Network.Length - 1; l > 0; l--)
    {
        for (int i = 0; i < Network[l].getSize(); i++)
        {
            Neuron n = Network[l].Neurons[i];
            double sum = 0;
            if (l == Network.Length - 1)
                n.Delta = (n.Output - DesiredOutput[i]) * ActFunc.calcDeprivateFunction(n.Output);
            else
            {
                for (int k = 0; k < Network[l + 1].getSize(); k++)
                {
                    sum += Network[l + 1].Neurons[k].getWeight(i) * Network[l + 1].Neurons[k].Delta;
                }
                n.Delta = sum * ActFunc.calcDeprivateFunction(n.Output);
            }
        }
    }

    // Weight update
    for (int l = 1; l < Network.Length; l++)
    {
        for (int i = 0; i < Network[l].getSize(); i++)
        {
            Neuron n = Network[l].Neurons[i];
            if ((n.PrevDelta * n.Delta) > 0)
            {
                n.N = Math.Min(a * n.PrevN, nMax);
                n.Bias -= n.N * Math.Sign(n.Delta);
                for (int j = 0; j < Network[l - 1].getSize(); j++)
                {
                    n.setWeight(j, n.getWeight(j) - n.N * Math.Sign(n.Delta));
                }
                n.PrevDelta = n.Delta;
            }
            else if ((n.PrevDelta * n.Delta) < 0)
            {
                n.N = Math.Max(b * n.PrevN, nMin);
                if (this.CurrentError > this.LastError)
                {
                    n.Bias += n.PrevN * Math.Sign(n.PrevDelta);
                    for (int j = 0; j < Network[l - 1].getSize(); j++)
                    {
                        n.setWeight(j, n.getWeight(j) + n.PrevN * Math.Sign(n.PrevDelta));
                    }
                }
                n.Delta = 0;
            }
            else if ((n.PrevDelta * n.Delta) == 0)
            {
                n.Bias -= n.N * Math.Sign(n.Delta);
                for (int j = 0; j < Network[l - 1].getSize(); j++)
                {
                    n.setWeight(j, n.getWeight(j) - n.N * Math.Sign(n.Delta));
                }
                n.PrevDelta = n.Delta;
            }
            n.PrevN = n.N;
        }
    }
}
At first glance: you calculate the error for a single training element and immediately apply it to the network. Try running over the full training set without changing any weights, and just sum up the deltas. After that, update the weights once, store the previous deltas, and start over.
Also, there is no update for the neuron threshold.
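As a minimal, self-contained sketch of that accumulate-then-update (batch) pattern, here is a toy one-weight example with a quadratic loss; it is not the MLP above, all names in it are made up, and only the epoch structure is the point:
using System;

class BatchRpropSketch
{
    static void Main()
    {
        double[] samples = { 1.0, 2.0, 3.0 };           // toy training set
        double w = 5.0;                                 // a single weight
        double step = 0.1, prevGrad = 0.0;
        const double a = 1.2, b = 0.5, stepMax = 50, stepMin = 0.000001;

        for (int epoch = 0; epoch < 100; epoch++)
        {
            // 1) sum the gradient over the WHOLE training set, weights untouched
            double grad = 0.0;
            foreach (double x in samples)
                grad += 2.0 * (w - x);                  // d/dw of (w - x)^2

            // 2) a single RPROP-style update per epoch
            if (prevGrad * grad > 0) step = Math.Min(step * a, stepMax);
            else if (prevGrad * grad < 0) step = Math.Max(step * b, stepMin);

            w -= step * Math.Sign(grad);
            prevGrad = (prevGrad * grad < 0) ? 0 : grad; // remember the previous gradient
        }

        Console.WriteLine(w);                           // drifts toward the sample mean, 2.0
    }
}
In the real network, grad corresponds to the per-weight Delta summed by the backward pass over every training sample, and RPROP() then runs once per epoch.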

Teaching a Neural Net: Bipolar XOR

I'm trying to teach a neural net with 2 inputs, 4 hidden nodes (all in the same layer) and 1 output node. The binary representation works fine, but I have problems with the bipolar one. I can't figure out why, but the total error will sometimes converge to the same number around 2.xx. My sigmoid is 2/(1 + exp(-x)) - 1. Perhaps I'm sigmoiding in the wrong place. For example, to calculate the output error, should I be comparing the sigmoided output with the expected value, or with the sigmoided expected value?
I was following this website here: http://galaxy.agh.edu.pl/~vlsi/AI/backp_t_en/backprop.html, but they use different functions than I was instructed to use. Even when I did try to implement their functions, I still ran into the same problem. Either way, I get stuck about half the time at the same number (a different number for different implementations). Please tell me if I have made a mistake in my code somewhere, or if this is normal (I don't see how it could be). Momentum is set to 0. Is this a common zero-momentum problem? The error functions we are supposed to be using are:
if ui is an output unit
Error(i) = (Ci - ui) * f'(Si)
if ui is a hidden unit
Error(i) = Error(Output) * weight(i to output) * f'(Si)
public double sigmoid( double x ) {
    double fBipolar, fBinary, temp;
    temp = (1 + Math.exp(-x));
    fBipolar = (2 / temp) - 1;
    fBinary = 1 / temp;
    if(bipolar){
        return fBipolar;
    }else{
        return fBinary;
    }
}

// Initialize the weights to random values.
private void initializeWeights(double neg, double pos) {
    for(int i = 0; i < numInputs + 1; i++){
        for(int j = 0; j < numHiddenNeurons; j++){
            inputWeights[i][j] = Math.random() - pos;
            if(inputWeights[i][j] < neg || inputWeights[i][j] > pos){
                print("ERROR ");
                print(inputWeights[i][j]);
            }
        }
    }
    for(int i = 0; i < numHiddenNeurons + 1; i++){
        hiddenWeights[i] = Math.random() - pos;
        if(hiddenWeights[i] < neg || hiddenWeights[i] > pos){
            print("ERROR ");
            print(hiddenWeights[i]);
        }
    }
}

// Computes output of the NN without training. I.e. a forward pass
public double outputFor ( double[] argInputVector ) {
    for(int i = 0; i < numInputs; i++){
        inputs[i] = argInputVector[i];
    }
    double weightedSum = 0;
    for(int i = 0; i < numHiddenNeurons; i++){
        weightedSum = 0;
        for(int j = 0; j < numInputs + 1; j++){
            weightedSum += inputWeights[j][i] * inputs[j];
        }
        hiddenActivation[i] = sigmoid(weightedSum);
    }
    weightedSum = 0;
    for(int j = 0; j < numHiddenNeurons + 1; j++){
        weightedSum += (hiddenActivation[j] * hiddenWeights[j]);
    }
    return sigmoid(weightedSum);
}

// Computes the derivative of f
public static double fPrime(double u){
    double fBipolar, fBinary;
    fBipolar = 0.5 * (1 - Math.pow(u,2));
    fBinary = u * (1 - u);
    if(bipolar){
        return fBipolar;
    }else{
        return fBinary;
    }
}

// This method is used to update the weights of the neural net.
public double train ( double [] argInputVector, double argTargetOutput ){
    double output = outputFor(argInputVector);
    double lastDelta;

    double outputError = (argTargetOutput - output) * fPrime(output);
    if(outputError != 0){
        for(int i = 0; i < numHiddenNeurons + 1; i++){
            hiddenError[i] = hiddenWeights[i] * outputError * fPrime(hiddenActivation[i]);
            deltaHiddenWeights[i] = learningRate * outputError * hiddenActivation[i] + (momentum * lastDelta);
            hiddenWeights[i] += deltaHiddenWeights[i];
        }
        for(int in = 0; in < numInputs + 1; in++){
            for(int hid = 0; hid < numHiddenNeurons; hid++){
                lastDelta = deltaInputWeights[in][hid];
                deltaInputWeights[in][hid] = learningRate * hiddenError[hid] * inputs[in] + (momentum * lastDelta);
                inputWeights[in][hid] += deltaInputWeights[in][hid];
            }
        }
    }
    return 0.5 * (argTargetOutput - output) * (argTargetOutput - output);
}
General coding comments:
initializeWeights(-1.0, 1.0);
may not actually get the initial values you were expecting.
initializeWeights should probably have:
inputWeights[i][j] = Math.random() * (pos - neg) + neg;
// ...
hiddenWeights[i] = (Math.random() * (pos - neg)) + neg;
instead of:
Math.random() - pos;
so that this works:
initializeWeights(0.0, 1.0);
and gives you initial values between 0.0 and 1.0 rather than between -1.0 and 0.0.
lastDelta is used before it is assigned a value:
deltaHiddenWeights[i] = learningRate * outputError * hiddenActivation[i] + (momentum * lastDelta);
I'm not sure if the + 1 on numInputs + 1 and numHiddenNeurons + 1 is necessary.
Remember to watch out for integer division: 5/2 = 2, not 2.5!
Use 5.0/2.0 instead. In general, add the .0 in your code whenever the result should be a double.
Most importantly, have you trained the NeuralNet long enough?
Try running it with numInputs = 2, numHiddenNeurons = 4, learningRate = 0.9, and train for 1,000 or 10,000 iterations.
With numHiddenNeurons = 2 it sometimes gets "stuck" when trying to solve the XOR problem.
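For reference, a rough sketch of that run over the bipolar XOR set (C#-flavoured; NeuralNet is a hypothetical wrapper exposing the train method shown above with the same signature, and its constructor arguments are assumptions):
double[][] inputs =
{
    new[] { -1.0, -1.0 }, new[] { -1.0,  1.0 },
    new[] {  1.0, -1.0 }, new[] {  1.0,  1.0 }
};
double[] targets = { -1.0, 1.0, 1.0, -1.0 };

var net = new NeuralNet(numInputs: 2, numHiddenNeurons: 4, learningRate: 0.9, momentum: 0.0);
for (int epoch = 0; epoch < 10000; epoch++)
{
    double totalError = 0;
    for (int p = 0; p < inputs.Length; p++)
        totalError += net.train(inputs[p], targets[p]);   // train returns 0.5 * (target - output)^2
    // log totalError every few hundred epochs to check that it keeps decreasing
}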
See also: XOR problem