Generating a flat mesh that shares vertices using compute shaders - unity3d

I have what seemed to be a simple problem that has now resulted in noise complaints from the neighbours over my screams of frustration.
TL;DR
Procedural meshes are normally made using strips of quads. I'm instead trying to build the mesh as one piece, reusing the edge vertices, rather than lining up separate quad strips so that they merely look like one mesh.
I'm testing something, so maybe this is a weird way to do it, but it should work.
Shader 1:
RWStructuredBuffer<float3> vertexBuffer;
uniform uint yColumnHeight;
[numthreads(8,8,1)]
void calcVerts (uint3 id : SV_DispatchThreadID)
{
//convert x and y to 1 dimensional counter
int idx = (id.y + (yColumnHeight * id.x));
//create a flat array of vertices
float3 vA = float3(id.x, 1, id.y);
vertexBuffer[idx] = vA;
}
Shader 2:
RWStructuredBuffer<float3> vertexBuffer;
RWStructuredBuffer<float3> triangleBuffer;
uniform uint yColumnHeight;
[numthreads(8,8,1)]
void createMeshFromVerts (uint3 id : SV_DispatchThreadID)
{
int idx = (id.y + (yColumnHeight * id.x));
if (id.x > 0 && id.y > 0){
//convert idx to index for tri/quad vertices, skipping first row and column
int subtractFirstYColumn = idx - yColumnHeight;
int subtractFirstXRow = id.y - 1;
int trID = (subtractFirstYColumn - subtractFirstXRow) * 6;
//find the vertices of the quad using verts from first row and column
int tri_a = idx - yColumnHeight - 1;
int tri_b = idx - 1;
int tri_c = idx;
int tri_d = idx - yColumnHeight;
triangleBuffer[trID] = vertexBuffer[tri_a];
triangleBuffer[trID + 1] = vertexBuffer[tri_b];
triangleBuffer[trID + 2] = vertexBuffer[tri_c];
triangleBuffer[trID + 3] = vertexBuffer[tri_d];
triangleBuffer[trID + 4] = vertexBuffer[tri_a];
triangleBuffer[trID + 5] = vertexBuffer[tri_c];
}
}
The second shader may initially seem obtuse, but it's quite simple. I'm getting an array of verts:
. . . .
. . . .
. . . .
. . . .
In the above, that's a 3x3 grid of quads, made of 4x4 verts.
I start by getting the vert 1 across and 1 down, and making a quad with the top left corner verts.
Each quad starts with vert _ and uses the preceding verts . like this:
. .
. _
And tied together in the main C#:
//buffers for vertices and map of vertices to make triangles
vertexBuffer = new ComputeBuffer(triVertCount, stride, ComputeBufferType.Default);
triangleBuffer = new ComputeBuffer(tris, stride, ComputeBufferType.Default);
//create initial vertices grid
calcVerts.SetBuffer(verts, "vertexBuffer", vertexBuffer);
calcVerts.Dispatch(verts, Mathf.Max(1, (widthInVertices) / (int)threadsx), Mathf.Max(1, (heightInVertices) / (int)threadsy), (int)z);
//use vertices grid to make mesh
createMeshFromVerts.SetBuffer(meshFromVerts, "vertexBuffer", vertexBuffer);
createMeshFromVerts.SetBuffer(meshFromVerts, "triangleBuffer", triangleBuffer);
createMeshFromVerts.Dispatch(meshFromVerts, Mathf.Max(1, (widthInVertices) / (int)threadsx), Mathf.Max(1, (heightInVertices) / (int)threadsy), (int)z);
I skipped the code for normals, and where I pass the buffers to the material for rendering. When this runs I get scrambled triangles. Can you see where I messed up?

The calculation of trId causes some indices to overlap and other values to be skipped entirely.
With a 4x2 grid of vertices (idx shown) and yColumnHeight of 4:
0 4
x <- (desired trID 0)
1 5
x <- (desired trID 6)
2 6
x <- (desired trID 12)
3 7
The currently calculated trId for id = (1,1) (idx 5) comes out to 6, but it should probably come out to 0 so that the first 6 items in the triangleBuffer are set to something useful. In fact, no trId ever equals 0 using the current calculation. Furthermore, the currently calculated trId for id = (1,2) (idx 6) comes out to 6 as well! And so does the one for id = (1,3) (idx 7).
Sadly, this overlap occurs in every column, and most of the triangleBuffer goes unset as a result of this.
The answer is to change how trId is calculated.
A simple way is to re-use your method of mapping from 2d to 1d array, only reducing the x and y coordinates by 1 and also reducing the height by one:
Vertex mapping (current):
int idx = (id.y + (yColumnHeight * id.x));
Triangle mapping (proposed):
int trId = ((id.y-1) + ((yColumnHeight-1) * (id.x-1)));
trId *= 6;
or more simply:
int trId = 6 * (id.y - 1 + (yColumnHeight-1) * (id.x-1));
or, expanding and substituting idx (I find it less clear what's happening, but it's more succinct):
// = 6 * (id.y - 1 + yColumnHeight * id.x - yColumnHeight - id.x + 1)
// = 6 * (id.y + yColumnHeight * id.x - yColumnHeight - id.x)
int trId = 6 * (idx - yColumnHeight - id.x);
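If you want to sanity-check the mapping on the CPU, here is a minimal C# sketch (using the 4x2 example grid above, with Unity's Debug.Log for output; variable names mirror the shader) that prints the old and new trId for each interior vertex:
// C# sanity check for the trId mapping on the 4x2 example grid
// (widthInVertices = 2, yColumnHeight = 4).
int yColumnHeight = 4;
int widthInVertices = 2;
for (int x = 1; x < widthInVertices; x++)
{
    for (int y = 1; y < yColumnHeight; y++)
    {
        int idx = y + yColumnHeight * x;
        int trIdOld = ((idx - yColumnHeight) - (y - 1)) * 6; // original: 6, 6, 6 (overlap!)
        int trIdNew = 6 * (idx - yColumnHeight - x);         // fixed:    0, 6, 12
        Debug.Log($"id=({x},{y}) idx={idx} old={trIdOld} new={trIdNew}");
    }
}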

Related

Manually building Hexagonal Torus

I am interested in building a hexagonal torus using a mesh of points.
I think I can start with a 2D polygon and then iterate 360 times (1 degree resolution) to build a complete solid.
Is this the best way to do this? What I'm really after is building wing profiles with variable cross-section geometry along the span.
You can do this with polyhedron(). Add an appropriate number of points per profile, in a defined order, to a vector "points", define the faces by the indices of those points in a second vector "faces", and pass both vectors as parameters to polyhedron(), see the documentation. You can control the quality of the surface by the number of points per profile and the distance between the profiles (sections of the torus).
Here is some example code:
// parameter:
r1 = 20; // radius of torus
r2 = 4; // radius of polygon/ thickness of torus
s = 360; // sections per 360 deg
p = 6; // points on polygon
a = 30; // angle of the first point on Polygon
// points on cross-section
// angle = 360*i/p + startangle, x = r1 + r2*cos(angle), y = 0, z = r2*sin(angle)
function cs_point(i) = [r1 + r2*cos(360*i/p + a), 0, r2*sin(360*i/p + a)];
// for an index i into the points vector, returns [section number, number of the point on that section]
function point_index(i) = [floor(i/p), i - p*floor(i/p)];
// returns the point's x-, y-, z-coordinates by rotating the corresponding cross-section point around the z-axis
function iterate_cs(i) = [cs[point_index(i)[1]][0]*cos(360*floor(i/p)/s), cs[point_index(i)[1]][0]*sin(360*floor(i/p)/s), cs[point_index(i)[1]][2]];
// for every point find neighbour points to build faces, ( + p: point on the next cross-section), points ordered clockwise
// to connect point on last section to corresponding points on first section
function item_add1(i) = i >= (s - 1)*p ? -(s)*p : 0;
// to connect last point on section to first points on the same and the next section
function item_add2(i) = i - p*floor(i/p) >= p-1 ? -p : 0;
// build faces
function find_neighbours1(i) = [i, i + 1 + item_add2(i), i + 1 + item_add2(i) + p + item_add1(i)];
function find_neighbours2(i) = [i, i + 1 + item_add2(i) + p + item_add1(i), i + p + item_add1(i)];
cs = [for (i = [0:p-1]) cs_point(i)];
points = [for (i = [0:s*p - 1]) iterate_cs(i)];
faces1 = [for (i = [0:s*p - 1]) find_neighbours1(i)];
faces2 = [for (i = [0:s*p - 1]) find_neighbours2(i)];
faces = concat(faces1, faces2);
polyhedron(points = points, faces = faces);
Here is the result:
Since OpenSCAD 2015.03, faces can have more than 3 points, as long as all points of a face lie on the same plane. So in this case the faces could be built in one step, too.
Are you building something like NACA airfoils? https://en.wikipedia.org/wiki/NACA_airfoil
There are a few OpenSCAD designs for those floating around, see e.g. https://www.thingiverse.com/thing:898554

How does the reversebits function of HLSL SM5 work?

I am trying to implement an inverse FFT in an HLSL compute shader and don't understand how the new reversebits function works. The shader runs under Unity3D, but that shouldn't make a difference.
The problem is that the resulting texture remains black, with the exception of the leftmost one or two pixels in every row. It seems to me as if the reversebits function isn't returning the correct indexes.
My very simple code is as follows:
#pragma kernel BitReverseHorizontal
Texture2D<float4> inTex;
RWTexture2D<float4> outTex;
uint2 getTextureThreadPosition(uint3 groupID, uint3 threadID) {
uint2 pos;
pos.x = (groupID.x * 16) + threadID.x;
pos.y = (groupID.y * 16) + threadID.y;
return pos;
}
[numthreads(16,16,1)]
void BitReverseHorizontal (uint3 threadID : SV_GroupThreadID, uint3 groupID : SV_GroupID)
{
uint2 pos = getTextureThreadPosition(groupID, threadID);
uint xPos = reversebits(pos.x);
uint2 revPos = uint2(xPos, pos.y);
float4 values;
values.x = inTex[pos].x;
values.y = inTex[pos].y;
values.z = inTex[revPos].z;
values.w = 0.0f;
outTex[revPos] = values;
}
I played around with this for quite a while and found out that if I replace the reversebits line with this one:
uint xPos = reversebits(pos.x << 23);
it works, although I have no idea why; it could be just coincidence. Could someone please explain how to use the reversebits function correctly?
Are you sure you want to reverse the bits?
x = 0: reversed: x = 0
x = 1: reversed: x = 2,147,483,648
x = 2: reversed: x = 1,073,741,824
etc....
If you fetch texels from a texture using coordinates exceeding the width of the texture then you're going to get black. Unless the texture is > 1 billion texels wide (it isn't) then you're fetching well outside the border.
I am doing the same thing, ran into the same problem, and these answers actually answered it for me, but I'll give you the explanation and a whole solution.
So the solution with variable length buffers in HLSL is:
uint reversedIndx;
uint bits = 32 - log2(xLen); // bits in a uint (32) - log2(numberOfIndices)
for (uint j = 0; j < xLen; j ++)
reversedIndx = reversebits(j << bits);
What you found essentially shifts out all the leading zeros of your index, so you end up reversing only the least significant (rightmost) bits, up to the maximum number of bits we actually want.
for example:
int length = 8;
int bits = 32 - 3; // length 8 = 1 << 3, i.e. 3 significant bits
int j = 6;
and since an int is generally 32 bits, in binary j would be
j = 0b00000000000000000000000000000110;
and reversed (i.e. reversebits(j);) it would be
j = 0b01100000000000000000000000000000;
which was our error. So j shifted left by bits would be
j = 0b11000000000000000000000000000000;
and reversed again, this is what we want:
j = 0b00000000000000000000000000000011;
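To check the same logic off-GPU, here is a small C# sketch; ReverseBits32 is a hypothetical software stand-in for HLSL's reversebits:
// Software stand-in for HLSL's reversebits, plus the pre-shift trick.
static uint ReverseBits32(uint v)
{
    uint r = 0;
    for (int i = 0; i < 32; i++)
    {
        r = (r << 1) | (v & 1); // peel bits off the low end, push onto r
        v >>= 1;
    }
    return r;
}

// For a length-8 FFT row: log2(8) = 3 significant bits, so pre-shift by 32 - 3.
int bits = 32 - 3;
uint j = 6;                               // 0b110
uint naive = ReverseBits32(j);            // 0b0110...0 = 1,610,612,736: far out of range
uint fixedIdx = ReverseBits32(j << bits); // 0b011 = 3: the 3-bit reversal of 6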

3D Texture emulation in shader (subpixel related)

I am working on a Unity3D project which currently relies on a 3D texture.
The problem is that Unity only allows Pro users to make use of Texture3D. Hence I'm looking for an alternative to Texture3D, perhaps a one-dimensional texture (although not natively available in Unity) that is interpreted as 3-dimensional in the shader (which uses the 3D texture).
Is there a way to do this whilst (preferably) keeping subpixel information?
(GLSL and Cg tags added because here lies the core of the problem)
Edit: The problem is addressed here as well: webgl glsl emulate texture3d
However this is not yet finished and working properly.
Edit: For the time being I disregard proper subpixel information. So any help on converting a 2D texture to contain 3D information is appreciated!
Edit: I retracted my own answer, as it isn't sufficient yet:
float2 uvFromUvw( float3 uvw ) {
float2 uv = float2(uvw.x, uvw.y / _VolumeTextureSize.z);
uv.y += float(round(uvw.z * (_VolumeTextureSize.z - 1))) / _VolumeTextureSize.z;
return uv;
}
With initialization as Texture2D(volumeWidth, volumeHeight * volumeDepth).
Most of the time it works, but sometimes it shows wrong pixels, probably because of subpixel information it is picking up on. How can I fix this? Clamping the input doesn't work.
I'm using this for my 3D clouds if that helps:
float SampleNoiseTexture( float3 _UVW, float _MipLevel )
{
float2 WrappedUW = fmod( 16.0 * (1000.0 + _UVW.xz), 16.0 ); // UW wrapped in [0,16[
float IntW = floor( WrappedUW.y ); // Integer slice number
float dw = WrappedUW.y - IntW; // Remainder for interpolating between slices
_UVW.x = (17.0 * IntW + WrappedUW.x + 0.25) * 0.00367647058823529411764705882353; // divided by 17*16 = 272
float4 Value = tex2D( _TexNoise3D, float4( _UVW.xy, 0.0, 0.0 ) );
return lerp( Value.x, Value.y, dw );
}
The "3D texture" is packed as 16 slices of 17 pixels wide in a 272x16 texture, with the 17th column of each slice being a copy of the 1st column (wrap address mode)...
Of course, no mip-mapping allowed with this technique.
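For reference, the slice packing can be reproduced on the CPU. This is a hypothetical C# helper (the names are mine, not from the shader above) that maps an integer texel coordinate (u, v, w) in the 16³ volume to its pixel position in the 272x16 texture:
// Mirrors the packing described above: 16 slices, each 17 texels wide
// (the 17th column duplicates the 1st), laid side by side in a 272x16 texture.
const int SliceSize = 16;              // 3D volume is 16x16x16
const int PaddedWidth = SliceSize + 1; // 17: one duplicated wrap column

static (int x, int y) VolumeTexelToPixel(int u, int v, int w)
{
    // Slice w starts at x = 17 * w; u indexes within the slice, v is unchanged.
    return (PaddedWidth * w + u, v);
}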
Here's the code I'm using to create the 3D texture, if that's what's bothering you:
const int NOISE3D_TEXTURE_POT = 4;
const int NOISE3D_TEXTURE_SIZE = 1 << NOISE3D_TEXTURE_POT;
/// <summary>
/// Create the "3D noise" texture.
/// To simulate 3D textures that are not available in Unity, I create a single long 2D slice of (17*16) x 16.
/// The width is 17*16 so that all 3D slices are packed into a single line, and I use 17 as a single slice width
/// because I pad the last pixel with the first column of the same slice so bilinear interpolation is correct.
/// The texture contains 2 significant values in Red and Green:
/// Red is the noise value in the current W slice
/// Green is the noise value in the next W slice
/// Then, the actual 3D noise value is an interpolation of red and green based on the W remainder.
/// </summary>
protected NuajTexture2D Build3DNoise()
{
// Build first noise mip level
float[,,] NoiseValues = new float[NOISE3D_TEXTURE_SIZE,NOISE3D_TEXTURE_SIZE,NOISE3D_TEXTURE_SIZE];
for ( int W=0; W < NOISE3D_TEXTURE_SIZE; W++ )
for ( int V=0; V < NOISE3D_TEXTURE_SIZE; V++ )
for ( int U=0; U < NOISE3D_TEXTURE_SIZE; U++ )
NoiseValues[U,V,W] = (float) SimpleRNG.GetUniform();
// Build actual texture
int MipLevel = 0; // In my original code, I build several textures for several mips...
int MipSize = NOISE3D_TEXTURE_SIZE >> MipLevel;
int Width = MipSize*(MipSize+1); // Pad with an additional column
Color[] Content = new Color[MipSize*Width];
// Build content
for ( int W=0; W < MipSize; W++ )
{
int Offset = W * (MipSize+1); // W Slice offset
for ( int V=0; V < MipSize; V++ )
{
for ( int U=0; U <= MipSize; U++ )
{
Content[Offset+Width*V+U].r = NoiseValues[U & (MipSize-1),V,W];
Content[Offset+Width*V+U].g = NoiseValues[U & (MipSize-1),V,(W+1) & (MipSize-1)];
}
}
}
// Create texture
NuajTexture2D Result = Help.CreateTexture( "Noise3D", Width, MipSize, TextureFormat.ARGB32, false, FilterMode.Bilinear, TextureWrapMode.Repeat );
Result.SetPixels( Content, 0 );
Result.Apply( false, true );
return Result;
}
I followed Patapom's response and came up with the following. However, it's still not working as it should.
float getAlpha(float3 position)
{
float2 WrappedUW = fmod( _Volume.xz * (1000.0 + position.xz), _Volume.xz ); // UW wrapped in [0,_Volume.xz[
float IntW = floor( WrappedUW.y ); // Integer slice number
float dw = WrappedUW.y - IntW; // Remainder for intepolating between slices
position.x = ((_Volume.z + 1.0) * IntW + WrappedUW.x + 0.25) / ((_Volume.z + 1.0) * _Volume.x); // divided by (_Volume.z+1)*_Volume.x, e.g. 17*16 = 272
float4 Value = tex2Dlod( _VolumeTex, float4( position.xy, 0.0, 0.0 ) );
return lerp( Value.x, Value.y, dw );
}
public int GetPixelId(int x, int y, int z) {
return y * (volumeWidth + 1) * volumeDepth + z * (volumeWidth + 1) + x;
}
// Code to set the pixelbuffer one pixel at a time starting from a clean slate
pixelBuffer[GetPixelId(x, y, z)].r = color.r;
if (z > 0)
pixelBuffer[GetPixelId(x, y, z - 1)].g = color.r;
if (z == volumeDepth - 1 || z == 0)
pixelBuffer[GetPixelId(x, y, z)].g = color.r;
if (x == 0) {
pixelBuffer[GetPixelId(volumeWidth, y, z)].r = color.r;
if (z > 0)
pixelBuffer[GetPixelId(volumeWidth, y, z - 1)].g = color.r;
if (z == volumeDepth - 1 || z == 0)
pixelBuffer[GetPixelId(volumeWidth, y, z)].g = color.r;
}

How would I implement the Matlab skeletonizing/thinning algorithm on the iPhone?

How can I implement the Matlab algorithm that skeletonizes / thins binary (black and white) images in Objective-C within an iPhone app?
Well, basically you could use morphological operators for this...
Build eight hit-or-miss operators like this:
0 0 0
St1 = x 1 x (for deleting upper pixels)
1 1 1
Rotate this 4 times to get it for the 4 sides. Then also build 4 more for the corners like this:
0 0 x
St5 = 0 1 1 (rotate this again 4 times for the 4 corners)
x 1 1
Then you erode your image (in a loop) until none of the operators can be applied anymore... what is left is the skeleton of that image...
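Not Objective-C, but to make the strategy concrete, here is a minimal C# sketch of that erode-until-stable loop, assuming a bool[,] binary image with true = foreground. The mask encoding (0 / 1 / don't care) and all names are my own, not a reproduction of Matlab's implementation:
using System.Collections.Generic;

// Hit-or-miss thinning sketch: 0 = must be background, 1 = must be
// foreground, 2 = don't care (the 'x' in the masks above).
static readonly int[,] St1 = { { 0, 0, 0 }, { 2, 1, 2 }, { 1, 1, 1 } };
static readonly int[,] St5 = { { 0, 0, 2 }, { 0, 1, 1 }, { 2, 1, 1 } };

static int[,] Rotate90(int[,] m)
{
    var r = new int[3, 3];
    for (int y = 0; y < 3; y++)
        for (int x = 0; x < 3; x++)
            r[x, 2 - y] = m[y, x]; // 90 degree rotation
    return r;
}

// True if the 3x3 neighbourhood around (cx, cy) matches the mask;
// pixels outside the image count as background.
static bool Matches(bool[,] img, int cx, int cy, int[,] mask)
{
    for (int dy = -1; dy <= 1; dy++)
        for (int dx = -1; dx <= 1; dx++)
        {
            int want = mask[dy + 1, dx + 1];
            if (want == 2) continue; // don't care
            int x = cx + dx, y = cy + dy;
            bool fg = y >= 0 && x >= 0 && y < img.GetLength(0)
                   && x < img.GetLength(1) && img[y, x];
            if (fg != (want == 1)) return false;
        }
    return true;
}

// Erode with all 8 operators until nothing changes; what remains is the skeleton.
static void Thin(bool[,] img)
{
    var masks = new List<int[,]>();
    foreach (var baseMask in new[] { St1, St5 })
    {
        var m = baseMask;
        for (int i = 0; i < 4; i++) { masks.Add(m); m = Rotate90(m); }
    }
    bool changed = true;
    while (changed)
    {
        changed = false;
        foreach (var mask in masks)
        {
            var hits = new List<(int x, int y)>();
            for (int y = 0; y < img.GetLength(0); y++)
                for (int x = 0; x < img.GetLength(1); x++)
                    if (img[y, x] && Matches(img, x, y, mask))
                        hits.Add((x, y));
            foreach (var (x, y) in hits) img[y, x] = false;
            changed |= hits.Count > 0;
        }
    }
}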
This shouldn't be too hard to implement in Objective-C, I guess... (I'm not familiar with it)... this is a general strategy...
Hope that helps... if not, keep asking... ;-)
EDIT:
I've written a GLSL fragment shader which performs fast skeletonization on images. You can apply this shader in a loop until you get what you need. GLSL shader code:
uniform sampler2D Texture0;
varying vec2 texCoord;
// 3x3 pixel window
// (-1,+1) (0,+1) (+1,+1)
// (-1,0) (0,0) (+1,0)
// (-1,-1) (0,-1) (+1,-1)
float dtex = 1.0 / float(textureSize(Texture0,0).x);
vec4 pixel(int dx, int dy) {
return texture2D(Texture0,texCoord +
vec2(float(dx)*dtex, float(dy)*dtex));
}
int exists(int dx, int dy) {
return int(pixel(dx,dy).r < 0.5);
}
int neighbors() {
return exists(-1,+1) +
exists(0,+1) +
exists(+1,+1) +
exists(-1,0) +
exists(+1,0) +
exists(-1,-1) +
exists(0,-1) +
exists(+1,-1);
}
int transitions() {
return int(
clamp(float(exists(-1,+1))-float(exists(0,+1)),0.,1.) + // (-1,+1) -> (0,+1)
clamp(float(exists(0,+1))-float(exists(+1,+1)),0.,1.) + // (0,+1) -> (+1,+1)
clamp(float(exists(+1,+1))-float(exists(+1,0)),0.,1.) + // (+1,+1) -> (+1,0)
clamp(float(exists(+1,0))-float(exists(+1,-1)),0.,1.) + // (+1,0) -> (+1,-1)
clamp(float(exists(+1,-1))-float(exists(0,-1)),0.,1.) + // (+1,-1) -> (0,-1)
clamp(float(exists(0,-1))-float(exists(-1,-1)),0.,1.) + // (0,-1) -> (-1,-1)
clamp(float(exists(-1,-1))-float(exists(-1,0)),0.,1.) + // (-1,-1) -> (-1,0)
clamp(float(exists(-1,0))-float(exists(-1,+1)),0.,1.) // (-1,0) -> (-1,+1)
);
}
int MarkedForRemoval() {
int neib = neighbors();
int tran = transitions();
if (exists(0,0)==0 // do not remove if already white
|| neib==0 // do not remove an isolated point
|| neib==1 // do not remove tip of a line
|| neib==7 // do not remove a point located in a concavity
|| neib==8 // do not remove: not a boundary point
|| tran>=2 // do not remove a point on a bridge connecting two or more edge pieces
)
return 0;
else
return 1;
}
void main(void)
{
int remove = MarkedForRemoval();
vec4 curr = texture2D(Texture0,texCoord);
vec4 col = vec4(remove,remove,remove,1.0);
gl_FragColor = (remove==1)? col:((curr.r > 0.05)?
vec4(1.0,1.0,1.0,1.0):curr);
}
Only this time the code is based on this lecture (actually on the first part of the lecture, so the algorithm has some bugs :-) )
See what happens when the poor chimpanzee is repeatedly fed through this GLSL shader:
(Images: iteration 0, 5, 10, 15)

Looking for some help working with premultiplied alpha

I am trying to update a source image with the contents of multiple destination images. From what I can tell, premultiplied alpha is the way to go with this, but I think I am doing something wrong (function below). The image I am starting with is initialized with all ARGB values set to 0. When I run the function once, the resulting image looks great, but as soon as I composite any others on top, all the pixels that carry alpha information get really messed up. Does anyone know if I am doing something glaringly wrong, or if there is something extra I need to do to modify the color values?
void CompositeImage(unsigned char *src, unsigned char *dest, int srcW, int srcH){
int w = srcW;
int h = srcH;
int px0;
int px1;
int px2;
int px3;
int inverseAlpha;
int r;
int g;
int b;
int a;
int y;
int x;
for (y = 0; y < h; y++) {
for (x= 0; x< w*4; x+=4) {
// pixel number
px0 = (y*w*4) + x;
px1 = (y*w*4) + (x+1);
px2 = (y*w*4) + (x+2);
px3 = (y*w*4) + (x+3);
inverseAlpha = 1 - src[px3];
// create new values
r = src[px0] + inverseAlpha * dest[px0];
g = src[px1] + inverseAlpha * dest[px1];
b = src[px2] + inverseAlpha * dest[px2];
a = src[px3] + inverseAlpha * dest[px3];
// update destination image
dest[px0] = r;
dest[px1] = g;
dest[px2] = b;
dest[px3] = a;
}
}
}
I'm not clear on what data you are working with. Do your source images already have the alpha values pre-multiplied as they are stored? If not, then pre-multiplied alpha does not apply here and you would need to do normal alpha blending.
Anyway, the big problem in your code is that you're not keeping track of the value ranges that you're dealing with.
inverseAlpha = 1 - src[px3];
This needs to be changed to:
inverseAlpha = 255 - src[px3];
You have all integral value types here, so the normal incoming 0..255 value range will result in an inverseAlpha range of -254..1, which will give you some truly wacky results.
After changing the 1 to 255, you also need to divide your results for each channel by 255 to scale them back down to the appropriate range. The alternative is to do the intermediate calculations using floats instead of integers and divide the initial channel values by 255.0 (instead of these other changes) to get values in the 0..1 range.
If your source data really does already have pre-multiplied alpha, then your result lines should look like this.
r = src[px0] + inverseAlpha * dest[px0] / 255;
If your source data does not have pre-multiplied alpha, then it should be:
r = src[px0] * src[px3] / 255 + inverseAlpha * dest[px0] / 255;
There's nothing special about blending the alpha channel. Use the same calculation as for r, g, and b.
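Putting those fixes together, a corrected loop might look like this C# sketch (same byte layout as the question, with alpha in the fourth channel; it assumes the source really is premultiplied):
// Premultiplied "over" composite with 255-based inverse alpha and
// per-channel rescaling, for RGBA byte buffers.
static void CompositeImage(byte[] src, byte[] dest, int srcW, int srcH)
{
    for (int i = 0; i < srcW * srcH * 4; i += 4)
    {
        int inverseAlpha = 255 - src[i + 3];  // 255-based, not 1-based
        // result = src + dest * (1 - srcAlpha); the /255 rescales to 0..255
        dest[i]     = (byte)(src[i]     + inverseAlpha * dest[i]     / 255);
        dest[i + 1] = (byte)(src[i + 1] + inverseAlpha * dest[i + 1] / 255);
        dest[i + 2] = (byte)(src[i + 2] + inverseAlpha * dest[i + 2] / 255);
        dest[i + 3] = (byte)(src[i + 3] + inverseAlpha * dest[i + 3] / 255);
    }
}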