Why can't I use tex2D inside a loop in Unity ShaderLab? - unity3d

I am trying to do something like
while (currentLayerDepth < currentDepth)
{
    currentUV -= step;
    currentDepth = tex2D(_HeightTex, currentUV).a;
    currentLayerDepth += eachLayer;
}
It logged an error: Shader error in 'Unlit/CustomParallax': unable to unroll loop, loop does not appear to terminate in a timely manner (1024 iterations) at line 76 (on metal)
So now I have two choices: one is to add [unroll(100)] to limit the number of loop iterations, and the other is to use tex2Dlod instead of tex2D.
I'm curious why this happens.
Also, why can tex2Dlod be used in a loop?

tex2D has to compute a local derivative to determine the correct LOD for the sample. Because of how derivatives are usually computed (as the difference between neighbouring computation units), they can only be computed under predictable control flow.
Your loop doesn't predictably make the same number of tex2D calls for neighbouring fragments, so the derivative can't be computed predictably.
For more details, have a look at the GLSL specs; search for "derivative" and "uniform control flow".
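For reference, here is a minimal sketch of the tex2Dlod variant (same variable names as in the question; pinning the mip level to 0 is an assumption, but it is usually what you want for a height lookup in a parallax loop):
while (currentLayerDepth < currentDepth)
{
    currentUV -= step;
    // tex2Dlod takes a float4 whose .w component selects the mip level explicitly,
    // so no screen-space derivative (and no uniform control flow) is needed.
    currentDepth = tex2Dlod(_HeightTex, float4(currentUV, 0, 0)).a;
    currentLayerDepth += eachLayer;
}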

Related

Is there an efficient way to append values in an SSB in a compute shader with GLSL?

I have an OpenGL compute shader that generates an undefined number of vertices and stores them in a shader storage buffer (SSB). The SSB capacity is big enough that the compute shader never generates more vertices than it can hold. I need the generated values to fill the buffer from the beginning and with no gaps (just like using push_back on a C++ vector). For that I'm using an atomic counter to determine the index at which to place the vertex values in the SSB when one is generated. This method seems to work but makes the compute shader run much slower. Here is what the GLSL function looks like:
void createVertex(/*some parameters*/) {
    uint index = atomicCounterIncrement(numberOfVertices);
    Vector vertex;
    // some processing that calculates the coordinates of the vertex
    vertices[index] = vertex;
}
Where vertices is a vec3-style SSB defined by:
struct Vector
{
    float x, y, z;
};
layout (std430, binding = 1) buffer vertexBuffer
{
    Vector vertices[];
};
And numberOfVertices is an atomic counter buffer whose value is initialized to 0 before running the shader.
Once the shader has finished running, I can read back the numberOfVertices value on the CPU side to know how many vertices are stored in the buffer, in the range [0; numberOfVertices*3*sizeof(float)].
When measuring the time the shader takes to run (with glBegin/EndQuery(GL_TIME_ELAPSED)), I get about 50 ms. However, when removing the atomicCounterIncrement line (and therefore also not assigning the vertex into the array), the measured time is only a few milliseconds. And that gap increases as I increase the number of workgroups.
I think the problem may be caused by the use of the atomic operation. So is there a better way to append values to an SSB? One that would also give me the total number of added values once the shader has finished running?
EDIT: After some refactoring and tests I noticed that it's actually the assignment of values into the buffer (vertices[index] = vertex;) that slows everything down (about 40 ms less when this line is removed). I should mention that the createVertex() function is called inside a for loop whose iteration count differs between shader instances.

Dimensionality reduction using PCA - MATLAB

I am trying to reduce dimensionality of a training set using PCA.
I have come across two approaches.
[V,U,eigen]=pca(train_x);
eigen_sum=0;
for lamda = 1:length(eigen)
    eigen_sum = eigen_sum + eigen(lamda,1);
    if (eigen_sum/sum(eigen) >= 0.90)
        break;
    end
end
train_x=train_x*V(:, 1:lamda);
Here, I simply use the eigenvalues to rebuild the training set with a lower number of features, determined by the principal components describing 90% of the variance of the original set.
The alternate method that I found is almost exactly the same, save the last line, which changes to:
train_x=U(:,1:lamda);
In other words, we take the training set as the principal component representation of the original training set up to some feature lamda.
Both of these methods seem to yield similar results (out-of-sample test error), but there is a difference, however minuscule it may be.
My question is, which one is the right method?
The answer depends on your data, and what you want to do.
Using your variable names: generally speaking, it is easy to expect that the outputs of pca satisfy
U = train_x * V
But this is only true if your data is normalized, specifically if you already removed the mean from each component. If not, then what one can expect is
U = train_x * V - mean(train_x * V)
And in that regard, whether you want to remove or keep the mean of your data before processing it depends on your application.
It's also worth noting that even if you remove the mean before processing, there might still be a small difference, but it will be around floating point precision error:
((train_x * V) - U) ./ U ~~ 1.0e-15
and this error can be safely ignored.
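As a rough MATLAB sketch (reusing your variable names; the explicit centring, the norm check, and the cumsum shortcut are only for illustration), you can verify the relationship and pick the 90% cut-off without the loop:
[V, U, eigen] = pca(train_x);                    % coefficients, scores, eigenvalues
Xc = train_x - mean(train_x);                    % centre each column, as pca does internally
norm(Xc * V - U)                                 % should be around floating point precision
k = find(cumsum(eigen)/sum(eigen) >= 0.90, 1);   % same index as the loop over lamda
train_x_reduced = U(:, 1:k);                     % equivalent to Xc * V(:, 1:k)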

How to mimic MATLAB/Simulink relay behavior?

I am trying to mimic the behavior of MATLAB's Simulink relay block with just MATLAB code.
My code is as follows (it uses a persistent variable):
function out = fcn(u,delta)
persistent y;
if isempty(y)
    y = 0;
end
if u >= delta
    y = 1;
elseif u <= -delta
    y = 0;
end
out = y;
When I look at the output and compare it with the real relay block, I see:
Where does the difference come from?
Both blocks use the same sample time; does the relay block do something extra to capture the discontinuity?
Simulink block diagram download
I'm not quite sure about this explanation; maybe somebody can confirm it.
The MATLAB Function block does not support zero-crossing detection, while the Relay block does. That means the latter knows in advance when your sine will reach the threshold delta, and sets the output at exactly the right time. The MATLAB Function block needs two or more steps to detect the crossing of the threshold, so it only realizes from one step to the next that the condition for the new output was met; it then updates the output, and you get a ramp, not a step.
C/C++ S-Functions do support zero-crossing detection, though it seems quite complicated.
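A quick MATLAB illustration of that one-step delay (the 0.1 s sample time and the sine input here are assumptions, not taken from your model):
delta = 0.5;
t = 0:0.1:1;                    % coarse sample grid
u = sin(2*pi*t);                % example input
y = zeros(size(t));
for k = 2:numel(t)
    y(k) = y(k-1);              % hold the previous output
    if u(k) >= delta
        y(k) = 1;
    elseif u(k) <= -delta
        y(k) = 0;
    end
end
% The true crossing is at asin(delta)/(2*pi) ~ 0.083 s, but y only switches
% at the next sample, t = 0.1 s; plotted with interpolation this looks like
% a ramp rather than a clean step.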

For iterator (loop)

I am trying to simulate the throw of a ball at different angles using Simulink. I'm able to simulate it for one angle, but I would like to simulate it in a loop. This is what I want to do in Simulink using FOR:
for i=-5:10:85
Here is a picture of my Simulink model:
If I understand your question correctly, you essentially want to rerun your simulation multiple times for different values of the constant Degrees. Instead of using a For Iterator, you may be able to achieve effectively the same result by using vector operations. That is to say, change the value of the constant Degrees from a scalar value to a vector (in this particular case just set its value to [5:10:85]). The outputs of your Simulink model (i.e. the x and y results) should now be vectors corresponding to the various Degrees values.
Put all the blocks into the for-iterator subsystem. The For Iterator block will output the current iteration; you can use that index (which starts at 0 or 1, depending on the block's settings) to cycle the angle from -5 to 85 (try hooking the For Iterator block up to a Gain and a Sum block). At each iteration, all the blocks in the for-iterator subsystem will run, and the output of the For Iterator block will increment by one.
The previous solution to make the angles a vector will also work.
Using MATLAB's for reference page, I'd rewrite your line as:
for i=5:10:85
...
end
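If the For Iterator subsystem turns out to be awkward, a hedged alternative is to drive the model from a script and rerun it once per angle (the model name ballThrow below is made up; Degrees is the constant block mentioned above):
angles = -5:10:85;
results = cell(size(angles));
for k = 1:numel(angles)
    set_param('ballThrow/Degrees', 'Value', num2str(angles(k)));  % hypothetical model/block path
    results{k} = sim('ballThrow');                                % logged x and y for this angle
end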

Neural Network Backpropagation?

Can anyone recommend a website or give me a brief explanation of how backpropagation is implemented in a NN? I understand the basic concept, but I'm unsure of how to go about writing the code.
Many of the sources I've found simply show the equations without explaining why they're doing it, and the variable names make it difficult to figure out.
Example:
void bpnn_output_error(delta, target, output, nj, err)
double *delta, *target, *output, *err;
int nj;
{
    int j;
    double o, t, errsum;
    errsum = 0.0;
    for (j = 1; j <= nj; j++) {
        o = output[j];
        t = target[j];
        delta[j] = o * (1.0 - o) * (t - o);
        errsum += ABS(delta[j]);
    }
    *err = errsum;
}
In that example, can someone explain the purpose of
delta[j] = o * (1.0 - o) * (t - o);
Thanks.
The purpose of
delta[j] = o * (1.0 - o) * (t - o);
is to find the error of an output node in a backpropagation network.
o represents the output of the node, and t is the expected output value for the node.
The term o * (1.0 - o) is the derivative of a common transfer function, the sigmoid function. (Other transfer functions are not uncommon, and would require rewriting the code to use that function's first derivative instead; a mismatch between function and derivative would likely mean that training would not converge.) The node has an "activation" value that is fed through a transfer function to obtain the output o, like
o = f(activation)
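For the common case where f is the sigmoid, f(x) = 1 / (1 + exp(-x)), the derivative works out to f'(x) = f(x) * (1 - f(x)), which is exactly the o * (1.0 - o) factor in the code.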
The main thing is that backpropagation uses gradient descent, and the error gets backward-propagated by application of the Chain Rule. The problem is one of credit assignment, or blame if you will, for the hidden nodes whose output is not directly comparable to the expected value. We start with what is known and comparable, the output nodes. The error is taken to be proportional to the first derivative of the output times the raw error value between the expected output and actual output.
So more symbolically, we'd write that line as
delta[j] = f'(activation_j) * (t_j - o_j)
where f is your transfer function, and f' is the first derivative of it.
Further back in the hidden layers, the error at a node is its estimated contribution to the errors found at the next layer. So the deltas from the succeeding layer are multiplied by the connecting weights, and those products are summed. That sum is multiplied by the first derivative of the activation of the hidden node to get the delta for a hidden node, or
delta[j] = f'(activation_j) * Sum(delta[k] * w_jk)
where j now references a hidden node and k a node in a succeeding layer.
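To make that concrete, here is a hedged sketch (not the original library's code) of what the matching hidden-layer routine could look like, following the same 1-based array convention as bpnn_output_error:
#include <math.h>   /* fabs */

/* delta_h[j] = f'(act_j) * sum_k delta_o[k] * who[j][k], for sigmoid units */
void hidden_error_sketch(double *delta_h, int nh, double *delta_o, int no,
                         double **who, double *hidden, double *err)
{
    int j, k;
    double h, sum, errsum;

    errsum = 0.0;
    for (j = 1; j <= nh; j++) {
        h = hidden[j];
        sum = 0.0;
        for (k = 1; k <= no; k++)
            sum += delta_o[k] * who[j][k];   /* credit assigned from the next layer */
        delta_h[j] = h * (1.0 - h) * sum;    /* times the sigmoid derivative */
        errsum += fabs(delta_h[j]);
    }
    *err = errsum;
}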
(t-o) is the error in the output of the network since t is the target output and o is the actual output. It is being stored in a normalized form in the delta array. The method used to normalize depends on the implementation and the o * ( 1.0 - o ) seems to be doing that (I could be wrong about that assumption).
This normalized error is accumulated for the entire training set to judge when the training is complete: usually when errsum is below some target threshold.
Actually, if you know the theory, the programs should be easy to understand. You can read a textbook and work through some simple examples with a pencil to figure out the exact steps of propagation. This is a general principle for implementing numerical programs: you must understand the details in small cases first.
If you know MATLAB, I'd suggest reading some MATLAB source code for backpropagation, which is easier to understand than C.
For the code in your question, the names are quite self-explanatory: output is probably the array of predictions, target the array of training labels, and delta the error between prediction and true values, which also serves as the value used to update the weight vector.
Essentially, what backprop does is run the network on the training data, observe the output, and then adjust the weights, working iteratively from the output nodes back to the input nodes.