How does the reversebits function of HLSL SM5 work? - unity3d

I am trying to implement an inverse FFT in an HLSL compute shader and don't understand how the new reversebits function works. The shader runs under Unity3D, but that shouldn't make a difference.
The problem is that the resulting texture remains black, except for the leftmost one or two pixels in every row. It seems to me that the reversebits function doesn't return the correct indices.
My very simple code is as follows:
#pragma kernel BitReverseHorizontal
Texture2D<float4> inTex;
RWTexture2D<float4> outTex;
uint2 getTextureThreadPosition(uint3 groupID, uint3 threadID) {
uint2 pos;
pos.x = (groupID.x * 16) + threadID.x;
pos.y = (groupID.y * 16) + threadID.y;
return pos;
}
[numthreads(16,16,1)]
void BitReverseHorizontal (uint3 threadID : SV_GroupThreadID, uint3 groupID : SV_GroupID)
{
uint2 pos = getTextureThreadPosition(groupID, threadID);
uint xPos = reversebits(pos.x);
uint2 revPos = uint2(xPos, pos.y);
float4 values;
values.x = inTex[pos].x;
values.y = inTex[pos].y;
values.z = inTex[revPos].z;
values.w = 0.0f;
outTex[revPos] = values;
}
I played around with this for quite a while and found that if I replace the reversebits line with this one:
uint xPos = reversebits(pos.x << 23);
it works, although I have no idea why. It could be just coincidence. Could someone please explain how to use the reversebits function correctly?

Are you sure you want to reverse the bits?
x = 0: reversed: x = 0
x = 1: reversed: x = 2,147,483,648
x = 2: reversed: x = 1,073,741,824
etc....
If you fetch texels from a texture using coordinates exceeding the width of the texture then you're going to get black. Unless the texture is > 1 billion texels wide (it isn't) then you're fetching well outside the border.

I am doing the same thing and ran into the same problem. The answers here solved it for me, but I'll give the explanation and a complete solution.
For buffers of variable (power-of-two) length, the solution in HLSL is:
uint reversedIndx;
uint bits = 32 - log2(xLen); // bit width of uint (32) - log2(numberOfIndices)
for (uint j = 0; j < xLen; j ++)
reversedIndx = reversebits(j << bits);
What you found essentially shifts out all the leading zeros of your index, so that only the least significant (rightmost) bits, up to the number of bits we want, get reversed.
For example:
int length = 8;
int bits = 32 - 3; // because 8 == 1 << 3 (0b1000), so an index needs only 3 bits
int j = 6;
Since an int is generally 32 bits wide, j in binary is
j = 0b00000000000000000000000000000110;
Reversed directly (i.e. reversebits(j)), it would be
j = 0b01100000000000000000000000000000;
which was our error. Instead, j shifted left by bits is
j = 0b11000000000000000000000000000000;
and reversing that gives what we want:
j = 0b00000000000000000000000000000011;
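To make the shift-then-reverse trick easy to check outside a shader, here is a minimal Python sketch. It emulates a 32-bit bit reversal by hand; the function names reverse_bits32 and bit_reversed_index are made up for this illustration, and the length is assumed to be a power of two. If the asker's texture is 512 pixels wide (an inference from the shift of 23, not something stated in the question), then 32 - log2(512) = 23 would explain why that shift worked.

```python
def reverse_bits32(value: int) -> int:
    """Reverse all 32 bits of an unsigned value (hand-written emulation)."""
    value &= 0xFFFFFFFF
    result = 0
    for _ in range(32):
        result = (result << 1) | (value & 1)  # move the lowest bit to the other end
        value >>= 1
    return result


def bit_reversed_index(index: int, length: int) -> int:
    """Reverse only the lowest log2(length) bits of index.

    Assumes length is a power of two (hypothetical helper, not an HLSL API).
    """
    num_bits = length.bit_length() - 1   # log2(length) for a power of two
    shift = 32 - num_bits                # plays the role of `bits` in the answer
    return reverse_bits32((index << shift) & 0xFFFFFFFF)


# Plain 32-bit reversal sends small indices far outside any texture:
print(reverse_bits32(1), reverse_bits32(2))

# Shifting first keeps the result inside 0..length-1:
print([bit_reversed_index(j, 8) for j in range(8)])
```

The second print shows the usual FFT bit-reversal permutation for 8 elements, while the first shows the huge values that caused the out-of-range reads.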

Generating a flat mesh that shares vertices using compute shaders

I have what seemed to be a simple problem that has now resulted in noise complaints from the neighbours over my screams of frustration.
TL;DR
Procedural meshes are normally made using strips of quads. I'm instead trying to make a mesh as one piece, reusing edge vertices, instead of lining up the quad strips as if they were one mesh.
I'm testing something, so maybe this is a weird way to do it, but it should work.
Shader 1:
RWStructuredBuffer<float3> vertexBuffer;
uniform uint yColumnHeight;
[numthreads(8,8,1)]
void calcVerts (uint3 id : SV_DispatchThreadID)
{
//convert x and y to 1 dimensional counter
int idx = (id.y + (yColumnHeight * id.x));
//create a flat array of vertices
float3 vA = float3(id.x, 1, id.y);
vertexBuffer[idx] = vA;
}
Shader 2:
RWStructuredBuffer<float3> vertexBuffer;
RWStructuredBuffer<float3> triangleBuffer;
uniform uint yColumnHeight;
[numthreads(8,8,1)]
void createMeshFromVerts (uint3 id : SV_DispatchThreadID)
{
int idx = (id.y + (yColumnHeight * id.x));
if (id.x > 0 && id.y > 0){
//convert idx to index for tri/quad vertices, skipping first row and column
int subtractFirstYColumn = idx - yColumnHeight;
int subtractFirstXRow = id.y - 1;
int trID = (subtractFirstYColumn - subtractFirstXRow) * 6;
//find the vertices of the quad using verts from first row and column
int tri_a = idx - yColumnHeight - 1;
int tri_b = idx - 1;
int tri_c = idx;
int tri_d = idx - yColumnHeight;
triangleBuffer[trID] = vertexBuffer[tri_a];
triangleBuffer[trID + 1] = vertexBuffer[tri_b];
triangleBuffer[trID + 2] = vertexBuffer[tri_c];
triangleBuffer[trID + 3] = vertexBuffer[tri_d];
triangleBuffer[trID + 4] = vertexBuffer[tri_a];
triangleBuffer[trID + 5] = vertexBuffer[tri_c];
}
}
The second shader may initially seem obtuse, but it's quite simple. I'm getting an array of verts:
. . . .
. . . .
. . . .
. . . .
In the above, that's a 3x3 grid of quads, made of 4x4 verts.
I start by getting the vert 1 across and 1 down, and making a quad with the top left corner verts.
Each quad starts with vert _ and uses the preceding verts . like this:
. .
. _
And tied together in the main C#:
//buffers for vertices and map of vertices to make triangles
vertexBuffer = new ComputeBuffer(triVertCount, stride, ComputeBufferType.Default);
triangleBuffer = new ComputeBuffer(tris, stride, ComputeBufferType.Default);
//create initial vertices grid
calcVerts.SetBuffer(verts, "vertexBuffer", vertexBuffer);
calcVerts.Dispatch(verts, Mathf.Max(1, (widthInVertices) / (int)threadsx), Mathf.Max(1, (heightInVertices) / (int)threadsy), (int)z);
//use vertices grid to make mesh
createMeshFromVerts.SetBuffer(meshFromVerts, "vertexBuffer", vertexBuffer);
createMeshFromVerts.SetBuffer(meshFromVerts, "triangleBuffer", triangleBuffer);
createMeshFromVerts.Dispatch(meshFromVerts, Mathf.Max(1, (widthInVertices) / (int)threadsx), Mathf.Max(1, (heightInVertices) / (int)threadsy), (int)z);
I skipped the code for normals, and where I pass to material to render. When this runs I get scrambled triangles. Can you see where I messed up?
The calculation of trId results in overlapping of some indices, and skipping of other values.
With a 4x2 grid of vertices (idx shown) and yColumnHeight of 4:
0 4
x <- (desired trID 0)
1 5
x <- (desired trID 6)
2 6
x <- (desired trID 12)
3 7
The currently calculated trId for id = 1,1 (idx 5) comes out to 6, but it should probably come to 0 so that the first 6 items in the triangleBuffer are set to something useful. In fact, no trId ever equals 0 using the current calculation. Furthermore, the currently calculated trId for id = 1,2 (idx 6) comes out to 6 as well! And so does id=1,3 (idx 7).
Sadly, this overlap occurs in every column, and most of the triangleBuffer goes unset as a result of this.
The answer is to change how trId is calculated.
A simple way is to re-use your method of mapping from 2d to 1d array, only reducing the x and y coordinates by 1 and also reducing the height by one:
Vertex mapping (current):
int idx = (id.y + (yColumnHeight * id.x));
Triangle mapping (proposed):
int trId = ((id.y-1) + ((yColumnHeight-1) * (id.x-1)));
trId *= 6;
or more simply:
int trId = 6 * (id.y - 1 + (yColumnHeight-1) * (id.x-1));
or, expanding and substituting idx (I find this version less clear, but it's more succinct):
// = 6 * (id.y - 1 + yColumnHeight * id.x - yColumnHeight - id.x + 1)
// = 6 * (id.y + yColumnHeight * id.x - yColumnHeight - id.x)
int trId = 6 * (idx - yColumnHeight - id.x);
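The overlap is easy to confirm on the CPU. The following Python sketch recomputes both index formulas for every thread that passes the id.x > 0 && id.y > 0 test; the function names are invented for this check and the grid sizes are just the example from above.

```python
def original_tr_id(x: int, y: int, column_height: int) -> int:
    """trID as computed in the question's second shader."""
    idx = y + column_height * x
    subtract_first_y_column = idx - column_height
    subtract_first_x_row = y - 1
    return (subtract_first_y_column - subtract_first_x_row) * 6


def proposed_tr_id(x: int, y: int, column_height: int) -> int:
    """trId from the answer: 2D-to-1D mapping over the quad grid, times 6."""
    return 6 * ((y - 1) + (column_height - 1) * (x - 1))


def quad_offsets(tr_id_fn, width: int, column_height: int) -> list:
    """Offsets written by all threads with x > 0 and y > 0."""
    return [tr_id_fn(x, y, column_height)
            for x in range(1, width)
            for y in range(1, column_height)]


# 2 columns of 4 vertices, as in the example grid (idx 0..7).
print(quad_offsets(original_tr_id, 2, 4))  # every quad lands on the same offset
print(quad_offsets(proposed_tr_id, 2, 4))  # one distinct slot of 6 per quad
```

With the proposed mapping each quad gets its own block of six entries starting at 0, so a buffer of 6 * (width - 1) * (height - 1) entries is filled without gaps.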

Renderscript Greyscale not quite working

This is my renderscript code for now:
#pragma version(1)
#pragma rs java_package_name(com.apps.foo.bar)
rs_allocation inPixels;
uchar4 RS_KERNEL root(uchar4 in, uint32_t x, uint32_t y) {
uchar4 pixel = in.rgba;
pixel.r = (pixel.r + pixel.g + pixel.b)/3;
pixel.g = (pixel.r + pixel.g + pixel.b)/3;
pixel.b = (pixel.r + pixel.g + pixel.b)/3;
return pixel;
}
My phone shows a "greyscaled" picture. I say "greyscaled" because red, for example, is still kind of red. It is grayish, but you can still see that it is red. I know I can use more sophisticated methods, but I would like to stick to the simple one for now.
I would like to know if my renderscript code is wrong. Should I be converting the char to another type?
Use a temporary variable to hold the result as you compute it. Otherwise, in the first line you're modifying pixel.r, and in the very next one you are using it to calculate pixel.g. No wonder you get artifacts.
Also, don't forget to assign the alpha value to avoid surprises with "invisible" output.
Also I would recommend not to use equal weights for r, g and b but the weights as below. See e.g. http://www.johndcook.com/blog/2009/08/24/algorithms-convert-color-grayscale/
uchar4 __attribute__((kernel)) gray(uchar4 in) {
uchar4 out;
// Compute the gray value once, so all three channels get the same result.
float gr = 0.2125f*in.r + 0.7154f*in.g + 0.0721f*in.b;
out.r = out.g = out.b = (uchar)gr;
out.a = in.a;
return out;
}
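To see why the in-place version leaves a tint, here is a small Python sketch that mimics the three assignments on a pure red pixel and compares them with the temporary-variable fix and with the weighted formula. The function names are made up for the illustration, and integer division stands in for the integer arithmetic on the channel values.

```python
def gray_in_place(r: int, g: int, b: int) -> tuple:
    """Mimics the question's kernel: each line reuses already modified channels."""
    r = (r + g + b) // 3
    g = (r + g + b) // 3   # r has already been overwritten here
    b = (r + g + b) // 3   # r and g have already been overwritten here
    return r, g, b


def gray_with_temp(r: int, g: int, b: int) -> tuple:
    """Computes the average once, then assigns it to all channels."""
    gray = (r + g + b) // 3
    return gray, gray, gray


def gray_weighted(r: int, g: int, b: int) -> tuple:
    """Same idea with the luminance weights used in the answer."""
    gray = int(0.2125 * r + 0.7154 * g + 0.0721 * b)
    return gray, gray, gray


print(gray_in_place(255, 0, 0))   # channels differ, so the pixel keeps a red tint
print(gray_with_temp(255, 0, 0))  # all channels equal
print(gray_weighted(255, 0, 0))   # all channels equal, darker for pure red
```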

Perform autocorrelation with vDSP_conv from Apple Accelerate Framework

I need to perform the autocorrelation of an array (vector) but I am having trouble finding the correct way to do so. I believe that I need the method "vDSP_conv" from the Accelerate Framework, but I can't follow how to successfully set it up. The thing throwing me off the most is the need for 2 inputs. Perhaps I have the wrong function, but I couldn't find one that operated on a single vector.
The documentation can be found here
Copied from the site
vDSP_conv
Performs either correlation or convolution on two vectors; single precision.
void vDSP_conv (
    const float __vDSP_signal[],
    vDSP_Stride __vDSP_signalStride,
    const float __vDSP_filter[],
    vDSP_Stride __vDSP_strideFilter,
    float __vDSP_result[],
    vDSP_Stride __vDSP_strideResult,
    vDSP_Length __vDSP_lenResult,
    vDSP_Length __vDSP_lenFilter );
Parameters
__vDSP_signal
Input vector A. The length of this vector must be at least __vDSP_lenResult + __vDSP_lenFilter - 1.
__vDSP_signalStride
The stride through __vDSP_signal.
__vDSP_filter
Input vector B.
__vDSP_strideFilter
The stride through __vDSP_filter.
__vDSP_result
Output vector C.
__vDSP_strideResult
The stride through __vDSP_result.
__vDSP_lenResult
The length of __vDSP_result.
__vDSP_lenFilter
The length of __vDSP_filter.
For an example, just assume you have an array of float x = [1.0, 2.0, 3.0, 4.0, 5.0]. How would I take the autocorrelation of that?
The output should be something similar to float y = [5.0, 14.0, 26.0, 40.0, 55.0, 40.0, 26.0, 14.0, 5.0] //generated using Matlab's xcorr(x) function
Performing autocorrelation simply means taking the cross-correlation of a vector with itself. There is nothing fancy about it.
So in your case, do:
vDSP_conv(x, 1, x, 1, result, 1, 2*len_X-1, len_X);
Check the sample code for more details (it performs a convolution):
http://disanji.net/iOS_Doc/#documentation/Performance/Conceptual/vDSP_Programming_Guide/SampleCode/SampleCode.html
EDIT: This borders on ridiculous, but you need to pad the x values with a specific number of zeros, which is just crazy.
The following is working code. Just set filter to the values of x you want, and it will put the rest in the correct position:
float *signal, *filter, *result;
int32_t signalStride, filterStride, resultStride;
uint32_t lenSignal, filterLength, resultLength;
uint32_t i;
filterLength = 5;
resultLength = filterLength*2 -1;
lenSignal = ((filterLength + 3) & 0xFFFFFFFC) + resultLength;
signalStride = filterStride = resultStride = 1;
printf("\nCorrelation ( resultLength = %u, "
"filterLength = %u )\n\n", resultLength, filterLength);
/* Allocate memory for the operands. The signal is zero-initialized (calloc)
   because only its middle section is filled in below. */
signal = (float *) calloc(lenSignal, sizeof(float));
filter = (float *) malloc(filterLength * sizeof(float));
result = (float *) malloc(resultLength * sizeof(float));
for (i = 0; i < filterLength; i++)
filter[i] = (float)(i+1);
/* Copy x into the signal after filterLength - 1 leading zeros. */
for (i = 0; i < filterLength; i++)
signal[i + filterLength - 1] = filter[i];
/* Correlation. */
vDSP_conv(signal, signalStride, filter, filterStride,
result, resultStride, resultLength, filterLength);
printf("signal: ");
for (i = 0; i < lenSignal; i++)
printf("%2.1f ", signal[i]);
printf("\n filter: ");
for (i = 0; i < filterLength; i++)
printf("%2.1f ", filter[i]);
printf("\n result: ");
for (i = 0; i < resultLength; i++)
printf("%2.1f ", result[i]);
/* Free allocated memory. */
free(signal);
free(filter);
free(result);
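For checking the numbers without Accelerate, here is a plain Python sketch of the same idea. It assumes the correlation form result[n] = sum over p of signal[n + p] * filter[p], and pads x with len(x) - 1 zeros on each side, which gives the signal exactly the minimum length of lenResult + lenFilter - 1 from the parameter description. The function names correlate and autocorrelate are made up for this sketch and are not part of vDSP.

```python
def correlate(signal: list, kernel: list, result_length: int) -> list:
    """result[n] = sum over p of signal[n + p] * kernel[p]."""
    return [sum(signal[n + p] * kernel[p] for p in range(len(kernel)))
            for n in range(result_length)]


def autocorrelate(x: list) -> list:
    """Full autocorrelation (2 * len(x) - 1 lags) via zero padding."""
    pad = [0.0] * (len(x) - 1)
    signal = pad + list(x) + pad          # length 3 * len(x) - 2
    return correlate(signal, list(x), 2 * len(x) - 1)


print(autocorrelate([1.0, 2.0, 3.0, 4.0, 5.0]))
# [5.0, 14.0, 26.0, 40.0, 55.0, 40.0, 26.0, 14.0, 5.0]
```

The output matches the xcorr(x) values quoted in the question, with the zero lag (55.0) in the middle.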

How would I implement the Matlab skeletonizing/thinning algorithm on the iPhone?

How can I implement the Matlab algorithm that skeletonizes / thins binary (black and white) images in Objective-C within an iPhone app?
Well, basically you could use morphological operators for this...
Build eight hit-or-miss operators like this:
      0 0 0
St1 = x 1 x   (for deleting upper pixels)
      1 1 1
Rotate this 4 times to get it for the 4 sides. Then build 4 more for the corners, like this:
      0 0 x
St5 = 0 1 1   (rotate this again 4 times for the 4 corners)
      x 1 1
Then you erode your image (in a loop) until none of the operators can be applied anymore... what is left is the skeleton of that image.
This shouldn't be too hard to implement in Objective-C, I guess (I'm not familiar with it)... this is a general strategy.
Hope that helps... if not, keep asking... ;-)
EDIT:
I've written a GLSL fragment shader that performs fast skeletonization on images. You can apply this shader in a loop until you get what you need. The GLSL shader code:
uniform sampler2D Texture0;
varying vec2 texCoord;
// 3x3 pixel window
// (-1,+1) (0,+1) (+1,+1)
// (-1,0) (0,0) (+1,0)
// (-1,-1) (0,-1) (+1,-1)
float dtex = 1.0 / float(textureSize(Texture0,0));
vec4 pixel(int dx, int dy) {
return texture2D(Texture0,texCoord +
vec2(float(dx)*dtex, float(dy)*dtex));
}
int exists(int dx, int dy) {
return int(pixel(dx,dy).r < 0.5);
}
int neighbors() {
return exists(-1,+1) +
exists(0,+1) +
exists(+1,+1) +
exists(-1,0) +
exists(+1,0) +
exists(-1,-1) +
exists(0,-1) +
exists(+1,-1);
}
int transitions() {
return int(
clamp(float(exists(-1,+1))-float(exists(0,+1)),0.,1.) + // (-1,+1) -> (0,+1)
clamp(float(exists(0,+1))-float(exists(+1,+1)),0.,1.) + // (0,+1) -> (+1,+1)
clamp(float(exists(+1,+1))-float(exists(+1,0)),0.,1.) + // (+1,+1) -> (+1,0)
clamp(float(exists(+1,0))-float(exists(+1,-1)),0.,1.) + // (+1,0) -> (+1,-1)
clamp(float(exists(+1,-1))-float(exists(0,-1)),0.,1.) + // (+1,-1) -> (0,-1)
clamp(float(exists(0,-1))-float(exists(-1,-1)),0.,1.) + // (0,-1) -> (-1,-1)
clamp(float(exists(-1,-1))-float(exists(-1,0)),0.,1.) + // (-1,-1) -> (-1,0)
clamp(float(exists(-1,0))-float(exists(-1,+1)),0.,1.) // (-1,0) -> (-1,+1)
);
}
int MarkedForRemoval() {
int neib = neighbors();
int tran = transitions();
if (exists(0,0)==0 // do not remove if already white
|| neib==0 // do not remove an isolated point
|| neib==1 // do not remove tip of a line
|| neib==7 // do not remove located in concavity
|| neib==8 // do not remove not a boundary point
|| tran>=2 // do not remove on a bridge connecting two or more edge pieces
)
return 0;
else
return 1;
}
void main(void)
{
int remove = MarkedForRemoval();
vec4 curr = texture2D(Texture0,texCoord);
vec4 col = vec4(remove,remove,remove,1.0);
gl_FragColor = (remove==1)? col:((curr.r > 0.05)?
vec4(1.0,1.0,1.0,1.0):curr);
}
Only this time, the code is based on this lecture (actually on the first part of the lecture, so the algorithm has some bugs :-) ).
See what happens when a poor chimpanzee is repeatedly fed to this GLSL shader:
[Images: results after iterations 0, 5, 10 and 15]
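For experimenting with the removal rule on the CPU, here is a rough Python sketch of the same neighbor and transition tests on a list-of-lists binary image, where 1 means an object pixel. It mirrors the shader's criteria only, so it inherits the bugs mentioned above and is not a complete thinning algorithm. The names RING, exists, marked_for_removal and thin_once are chosen for this sketch, and treating pixels outside the image as background is an assumption.

```python
# Walk once around the 8 neighbors, in the same order as the shader's transitions().
RING = [(-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0)]


def exists(image, x, y):
    """1 for an object pixel, 0 otherwise; outside the image counts as background."""
    if 0 <= y < len(image) and 0 <= x < len(image[0]):
        return image[y][x]
    return 0


def marked_for_removal(image, x, y):
    ring = [exists(image, x + dx, y + dy) for dx, dy in RING]
    neighbors = sum(ring)
    # Count 1 -> 0 steps while walking once around the ring (wrapping at the end).
    transitions = sum(1 for a, b in zip(ring, ring[1:] + ring[:1]) if a == 1 and b == 0)
    if image[y][x] == 0:            # nothing to remove
        return False
    if neighbors in (0, 1, 7, 8):   # isolated point, line tip, concavity, interior
        return False
    if transitions >= 2:            # bridge between two or more pieces
        return False
    return True


def thin_once(image):
    """One pass; every decision reads the unmodified input, like one shader draw."""
    return [[0 if marked_for_removal(image, x, y) else image[y][x]
             for x in range(len(image[0]))]
            for y in range(len(image))]


block = [[0, 0, 0, 0, 0],
         [0, 1, 1, 1, 0],
         [0, 1, 1, 1, 0],
         [0, 1, 1, 1, 0],
         [0, 0, 0, 0, 0]]
for row in thin_once(block):
    print(row)
```

On this 3x3 block a single pass strips the whole border and keeps only the center pixel, while a one-pixel-wide line is left untouched by the same rule.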

Looking for some help working with premultiplied alpha

I am trying to update a source image with the contents of multiple destination images. From what I can tell, using premultiplied alpha is the way to go, but I think I am doing something wrong (function below). The image I am starting with is initialized with all ARGB values set to 0. When I run the function once, the resulting image looks great, but when I start compositing any others, all the pixels that have alpha information get really messed up. Does anyone know if I am doing something glaringly wrong, or if there is something extra I need to do to modify the color values?
void CompositeImage(unsigned char *src, unsigned char *dest, int srcW, int srcH){
int w = srcW;
int h = srcH;
int px0;
int px1;
int px2;
int px3;
int inverseAlpha;
int r;
int g;
int b;
int a;
int y;
int x;
for (y = 0; y < h; y++) {
for (x= 0; x< w*4; x+=4) {
// pixel number
px0 = (y*w*4) + x;
px1 = (y*w*4) + (x+1);
px2 = (y*w*4) + (x+2);
px3 = (y*w*4) + (x+3);
inverseAlpha = 1 - src[px3];
// create new values
r = src[px0] + inverseAlpha * dest[px0];
g = src[px1] + inverseAlpha * dest[px1];
b = src[px2] + inverseAlpha * dest[px2];
a = src[px3] + inverseAlpha * dest[px3];
// update destination image
dest[px0] = r;
dest[px1] = g;
dest[px2] = b;
dest[px3] = a;
}
}
}
I'm not clear on what data you are working with. Do your source images already have the alpha values pre-multiplied as they are stored? If not, then pre-multiplied alpha does not apply here and you would need to do normal alpha blending.
Anyway, the big problem in your code is that you're not keeping track of the value ranges that you're dealing with.
inverseAlpha = 1 - src[px3];
This needs to be changed to:
inverseAlpha = 255 - src[px3];
You have all integral value types here, so the normal incoming 0..255 value range will result in an inverseAlpha range of -254..1, which will give you some truly wacky results.
After changing the 1 to 255, you also need to divide your results for each channel by 255 to scale them back down to the appropriate range. The alternative is to do the intermediate calculations using floats instead of integers and divide the initial channel values by 255.0 (instead of these other changes) to get values in the 0..1 range.
If your source data really does already have pre-multiplied alpha, then your result lines should look like this.
r = src[px0] + inverseAlpha * dest[px0] / 255;
If your source data does not have pre-multiplied alpha, then it should be:
r = src[px0] * src[px3] / 255 + inverseAlpha * dest[px0] / 255;
There's nothing special about blending the alpha channel. Use the same calculation as for r, g, and b.
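The range fix can be tried out on single pixels with a short Python sketch. It works on (r, g, b, a) tuples in the 0..255 range and uses integer division, as the C code would. The function names are made up for the illustration. For straight (non-premultiplied) sources, this sketch premultiplies the color channels first and then reuses the premultiplied formula, which gives the same color result as the second formula above.

```python
def over_premultiplied(src: tuple, dst: tuple) -> tuple:
    """Composite a premultiplied src pixel over dst; all channels in 0..255."""
    inverse_alpha = 255 - src[3]
    # Same formula for r, g, b and a: src + (255 - src_alpha) * dst / 255
    return tuple(s + inverse_alpha * d // 255 for s, d in zip(src, dst))


def premultiply(pixel: tuple) -> tuple:
    """Scale the color channels of a straight-alpha pixel by its alpha."""
    r, g, b, a = pixel
    return (r * a // 255, g * a // 255, b * a // 255, a)


background = (0, 0, 255, 255)             # opaque blue
half_red = premultiply((255, 0, 0, 128))  # about 50% red, straight alpha input

print(half_red)
print(over_premultiplied(half_red, background))
print(over_premultiplied((0, 0, 0, 0), background))  # fully transparent source
```

A fully transparent source leaves the destination unchanged, and an opaque source replaces it, which is a quick sanity check for any variant of the blend.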