Optimizing Unity mesh generation | Faster than SetVertexBufferData/SetIndexBufferData

I’m trying to squeeze every bit of performance out of some voxel mesh generation, and it takes about 1.1ms per chunk at the moment (average measurements):
~0.04ms for vertex, normal, and triangle index calculation in a Burst job
~0.61ms for the SetVertexBufferData call
~0.45ms for the SetIndexBufferData call
Is there a better way to update a mesh than this? Maybe one that works inside a Burst job? I know 1.1ms is pretty fast, but I can't help feeling that I'm missing out on performance when the actual data generation is an order of magnitude faster.
Here's what I have:
// Input heightmap
// arr is an int[] created from perlin noise
NativeArray<int> heights = new NativeArray<int>(arr, Allocator.TempJob);
int numVerts = heights.Length * 20;
// VData is the struct: { float3 Vert; float3 Norm; }
NativeArray<VData> verts = new NativeArray<VData>(numVerts, Allocator.TempJob);
// Triangle indices
NativeArray<ushort> tris = new NativeArray<ushort>(heights.Length * 30, Allocator.TempJob);
// create Verts, Tris, and Norms in IJob
Job job = new Job
{
    Heights = heights,
    Verts = verts,
    Tris = tris
};
// Calculate the values
int indices = heights.Length * 30;
Mesh mesh = new Mesh();
mesh.SetVertexBufferParams(numVerts,
    new VertexAttributeDescriptor(VertexAttribute.Position),
    new VertexAttributeDescriptor(VertexAttribute.Normal)
);
// slow
mesh.SetVertexBufferData(verts, 0, 0, numVerts, 0, MeshUpdateFlags.DontValidateIndices);
mesh.SetIndexBufferParams(indices, IndexFormat.UInt16);
// also slow
mesh.SetIndexBufferData(tris, 0, 0, indices, MeshUpdateFlags.DontValidateIndices);
mesh.SetSubMesh(0, new SubMeshDescriptor(0, indices));
If you have any alternatives or if you see any ways I could improve this, I'm all ears!


Do two floats in a compute shader being added or subtracted not give the same value 100% of the time?

I have a function I call to generate some randomness in my HLSL compute shader code:
float rand3dTo1d(float3 value, float3 dotDir = float3(12.9898, 78.233, 37.719)){
    //make value smaller to avoid artefacts
    float3 smallValue = sin(value);
    //get scalar value from 3d vector
    float random = dot(smallValue, dotDir);
    //make value more random by making it bigger and then taking the fractional part
    random = frac(sin(random) * 43758.5453);
    return random;
}
If I pass in an incoming vector's location, all is fine, but if I instead pass in the center point of three vectors, computed with this function:
float3 GetTriangleCenter3d(float3 a, float3 b, float3 c) {
    return (a + b + c) / 3.0;
}
then occasionally SOME of my points are not the same from frame to frame. I can see this as flickering in the color I paint the triangles with, using this code:
float3 color = lerp(_ColorFrom, _ColorTo, rand1d);
I am at a total loss. I was able to at least get consistent results by using the thread id as the seed for the randomness, but not being able to use the center point of the triangle is really weird to me, and I have no idea what I am doing wrong or what I am missing. Any help would be great.
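One possible explanation, offered as a hedged sketch rather than a diagnosis: 32-bit float addition is not associative, so (a + b + c) / 3.0 can come out slightly different if the three vertices reach the function in a different order from one frame to the next, and frac(sin(x) * 43758.5453) turns a tiny input difference into a very different output. The plain JavaScript below emulates float32 arithmetic with Math.fround. centerF32 is a hypothetical scalar stand-in for one component of GetTriangleCenter3d, and the input values are chosen to exaggerate the effect; whether the vertex order actually varies in the shader is an assumption.

```javascript
// Emulate 32-bit float arithmetic by rounding after every operation.
// centerF32 is a hypothetical scalar stand-in for one component of
// GetTriangleCenter3d; it is not part of the original shader.
function centerF32(a, b, c) {
  const f = Math.fround;
  const sum = f(f(f(a) + f(b)) + f(c)); // (a + b) + c, rounded at each step
  return f(sum / f(3.0));
}

// Same three numbers, different order. The values are deliberately extreme
// so that the rounding difference is easy to see.
const orderA = centerF32(1e8, -1e8, 1.0); // (1e8 + -1e8) + 1
const orderB = centerF32(1e8, 1.0, -1e8); // (1e8 + 1) + -1e8: the 1 is rounded away

console.log(orderA === orderB); // false
```

If ordering does turn out to be the cause, seeding from something that does not depend on the order of the additions would avoid it, which is consistent with the thread id giving stable results.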

MeshData GetVertexData has the incorrect length

I'm trying to optimize some mesh generation using MeshData & the Job System, but for some reason when I try to use 2 params in meshData.SetVertexBufferParams, the resulting meshData.GetVertexData is half the length it should be (I set the vertex count to 5120, but the resulting VertexData NativeArray is only 2560 items long).
When I force it to be double the length (SetVertexBufferParams(numVerts * 2, ...)), it creates a mesh that appears to treat the normals and vertex positions as all position data. It also makes the screen go black, so I can't provide a screenshot.
Here's my code:
// generate 256 height values
int[] arr = new int[256];
for (int i = 0; i < arr.Length; i++)
arr[i] = (int) (Mathf.PerlinNoise(i / 16 / 16f, i % 16 / 16f) * 5);
// put it in a NativeArray
NativeArray<int> heights = new NativeArray<int>(arr, Allocator.TempJob);
// 4 verts per face * 5 faces = 20
int numVerts = heights.Length * 20; // this value is always 5120
// 2 tris per face * 5 faces * 3 indices = 30
int indices = heights.Length * 30;
// MeshData setup
Mesh.MeshDataArray meshDataArray = Mesh.AllocateWritableMeshData(1);
Mesh.MeshData meshData = meshDataArray[0];
meshData.SetVertexBufferParams(numVerts,
    new VertexAttributeDescriptor(VertexAttribute.Position, VertexAttributeFormat.Float32, 3, stream:0),
    new VertexAttributeDescriptor(VertexAttribute.Normal, VertexAttributeFormat.Float32, 3, stream:1)
);
meshData.SetIndexBufferParams(indices, IndexFormat.UInt16);
// Create job
Job job = new Job
{
    Heights = heights,
    MeshData = meshData
};
// run job
// struct I'm using for vertex data
public struct VData
{
    public float3 Vert;
    public float3 Norm;
}
// Here's some parts of the job
public struct Job : IJob
{
    public NativeArray<int> Heights;
    public Mesh.MeshData MeshData;
    public void Execute()
    {
        NativeArray<VData> Verts = MeshData.GetVertexData<VData>();
        NativeArray<ushort> Tris = MeshData.GetIndexData<ushort>();
        // loops from 0 to 255
        for (int i = 0; i < Heights.Length; i++)
        {
            ushort t1 = (ushort)(w1 + 16);
            // This indicates that Verts.Length is 2560 when it should be 5120
            int t = i * 30; // tris
            int height = Heights[i];
            // x and y coordinate in chunk
            int x = i / 16;
            int y = i % 16;
            float3 up = new float3(0, 1, 0);
            // This throws an index out of bounds error because t1 becomes larger than Verts.Length
            Verts[t1] = new VData { Vert = new float3(x + 1, height, y + 1), Norm = up};
            // ...
        }
    }
}
new VertexAttributeDescriptor(VertexAttribute.Position, VertexAttributeFormat.Float32, 3, stream:0),
new VertexAttributeDescriptor(VertexAttribute.Normal, VertexAttributeFormat.Float32, 3, stream:1)
Your SetVertexBufferParams call places VertexAttribute.Position and VertexAttribute.Normal on separate streams, so each stream's buffer holds only one float3 per vertex. If a stream's buffer is then reinterpreted with the wrong struct by mistake, its length comes out wrong; here, reading it as a struct twice the size halves the length.
This is how documentation explains streams:
Vertex data is laid out in separate "streams" (each stream goes into a separate vertex buffer in the underlying graphics API). While Unity supports up to 4 vertex streams, most meshes use just one. Separate streams are most useful when some vertex attributes don't need to be processed, for example skinned meshes often use two vertex streams (one containing all the skinned data: positions, normals, tangents; while the other stream contains all the non-skinned data: colors and texture coordinates).
But why might it end up reinterpreted at half the length? Well, because of this line:
NativeArray<VData> Verts = MeshData.GetVertexData<VData>();
How? Because there is an implicit stream parameter there (see the documentation):
public NativeArray<T> GetVertexData(int stream = 0);
and it defaults to 0. So what happens here is this:
var Verts = Positions_Only.Reinterpret<Position_And_Normals>();
or in other words:
var Verts = NativeArray<float3>().Reinterpret<float3x2>();
case solved :T
To fix it, either change stream:1 to stream:0 so both vertex attributes end up on the same stream,
or read each stream separately: var Positions = MeshData.GetVertexData<float3>(0); and var Normals = MeshData.GetVertexData<float3>(1);
or create a dedicated struct per stream: var Stream0 = MeshData.GetVertexData<VStream0>(0); and var Stream1 = MeshData.GetVertexData<VStream1>(1);
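The arithmetic behind the halved length can be checked with a small sketch. It is written in plain JavaScript so it runs on its own; reinterpretedLength is a hypothetical helper, not a Unity API, and the byte sizes assume float3 is three 32-bit floats.

```javascript
// Byte sizes of the types involved (three 32-bit floats per float3).
const FLOAT3_BYTES = 3 * 4;           // 12
const VDATA_BYTES = 2 * FLOAT3_BYTES; // Vert + Norm = 24

// Length of a stream's buffer when it is viewed as elements of elementBytes.
// reinterpretedLength is a hypothetical helper, not a Unity API.
function reinterpretedLength(vertexCount, bytesPerVertexInStream, elementBytes) {
  const streamBytes = vertexCount * bytesPerVertexInStream;
  return streamBytes / elementBytes;
}

const numVerts = 5120;
// Position alone on stream 0, read back as VData: half the expected length.
const split = reinterpretedLength(numVerts, FLOAT3_BYTES, VDATA_BYTES);
// Position and Normal both on stream 0, read back as VData: full length.
const interleaved = reinterpretedLength(numVerts, VDATA_BYTES, VDATA_BYTES);

console.log(split, interleaved); // 2560 5120
```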

Unity Compute Shader Texture Array Sampling

I've been struggling with this for a while now and it is quite time critical, so I have to ask here. I'm quite new to compute shaders, but from what I've read they are what I need for my use case. I'm trying to find the total score from an array of textures, with the score being the product of each channel and a given weight. Previously I was using Node.js to do it, but it doesn't scale well: increasing the dimensions by a factor of 4 increases the area per texture by a factor of 16, and with multiple textures this isn't a good solution.
This is my compute shader right now:
// Each #kernel tells which function to compile; you can have many kernels
#pragma kernel CSMain
// Create a RenderTexture with enableRandomWrite flag and set it
// with cs.SetTexture
SamplerState linearClampSampler;
float4 weights;
RWStructuredBuffer<Texture2DArray<float4>> scoreInput;
float output;
void CSMain (uint3 id : SV_DispatchThreadID)
{
    float4 result_mult = scoreInput[id.z].Sample(id.uv).rgba * weights.xyzw;
    output = result_mult.r + result_mult.g + result_mult.b + result_mult.a;
}
For my C# dispatcher, I am doing:
string[] paths = new string[sessionData.masks.Length];
Texture2D[] textures = new Texture2D[sessionData.masks.Length];
for (int i = 0; i < sessionData.masks.Length; i++)
{
    paths[i] = sessionData.masks[i].combinedMasks;
    textures[i] = CustomUtility.LoadPNG(paths[i]);
}
int colourSize = sizeof(float) * 4;
ComputeBuffer wallBuffer = new ComputeBuffer(textures.Length, colourSize);
CalculateScoreShader.SetBuffer(0, "scoreInput", wallBuffer);
CalculateScoreShader.Dispatch(0, 8,8,1);
I can't figure out how to sample the texture properly, and I want to make sure that I am setting up the buffer correctly for the shader to be used like this. I also want to retrieve the output, but again I'm unsure how to do this.
I have looked through a decent amount of tutorials and documentation but I just can't seem to find the solution.

In Unity, how do you find voxel information at a given world-space position?

I am trying to have a GameObject in Unity react with sound if another object is inside it. I want the GameObject to use the entering object's location to find the closest voxel and then play audio based on that voxel's intensity/colour. Does anyone have any ideas? I am working with a dataset that is 512x256x512 voxels. I want it to work if the object is resized as well. Any help is much appreciated :).
The dataset I'm working with is a 3d .mhd medical scan of a body. Here is how the texture is added to the renderer on start:
for (int k = 0; k < NumberOfFrames; k++) {
    string fname_ = "T" + k.ToString("D2");
    Color[] colors = LoadData(Path.Combine (imageDir, fname_+".raw"));
    _volumeBuffer.Add (new Texture3D (dim [0], dim [1], dim [2], TextureFormat.RGBAHalf, mipmap));
    _volumeBuffer [k].Apply ();
}
GetComponent<Renderer>().material.SetTexture("_Data", _volumeBuffer[0]);
The size of the object is defined using the .mhd header file's spacing as well as the voxel dimensions:
transform.localScale = new Vector3(mhdheader.spacing[0] * volScale, mhdheader.spacing[1] * volScale * dim[1] / dim[0], mhdheader.spacing[2] * volScale * dim[2] / dim[0]);
I have tried making my own function to get the index from the world by offsetting it to the beginning of the render mesh (not sure if this is right). Then, scaling it by the local scale. Then, multiplying by the amount of voxels in each dimension. However, I am not sure if my logic is right whatsoever... Here is the code I tried:
public Vector3Int GetIndexFromWorld(Vector3 worldPos)
{
    Vector3 startOfTex = gameObject.GetComponent<Renderer>().bounds.min;
    Vector3 localPos = transform.InverseTransformPoint(worldPos);
    Vector3 localScale = gameObject.transform.localScale;
    Vector3 OffsetPos = localPos - startOfTex;
    Vector3 VoxelPosFloat = new Vector3(OffsetPos[0] / localScale[0], OffsetPos[1] / localScale[1], OffsetPos[2] / localScale[2]);
    VoxelPosFloat = Vector3.Scale(VoxelPosFloat, new Vector3(voxelDims[0], voxelDims[1], voxelDims[2]));
    Vector3Int voxelPos = Vector3Int.FloorToInt(VoxelPosFloat);
    return voxelPos;
}
You can try setting up a large number of box colliders with OnTriggerEnter() running on each. But a much better solution is to sort your array of voxels, then round the moving object's position vector down to ints and use simple math to map that vector to an index in the array. For example, the vector (0,0,0) could map to voxels[0]. Then just fetch that voxel's properties as you like. For a voxel application this is a much faster calculation than colliders.
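The index mapping described here can be sketched as follows, in plain JavaScript so that it runs on its own. It assumes the voxels are stored with x varying fastest, then y, then z, which the question does not confirm; flatIndex and dims are illustrative names.

```javascript
// Dataset size from the question: 512 x 256 x 512 voxels.
const dims = [512, 256, 512];

// Map integer voxel coordinates to an index into a flat array.
// Assumes x varies fastest, then y, then z (an assumption about the data layout).
function flatIndex(x, y, z, dims) {
  return x + dims[0] * (y + dims[1] * z);
}

console.log(flatIndex(0, 0, 0, dims)); // 0
console.log(flatIndex(0, 1, 0, dims)); // 512
console.log(flatIndex(0, 0, 1, dims)); // 131072
```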
I figured it out, I think. If anyone sees any flaw in my coding, please let me know :).
public Vector3Int GetIndexFromWorld(Vector3 worldPos)
{
    Vector3 deltaBounds = rend.bounds.max - rend.bounds.min;
    Vector3 OffsetPos = worldPos - rend.bounds.min;
    Vector3 normPos = new Vector3(OffsetPos[0] / deltaBounds[0], OffsetPos[1] / deltaBounds[1], OffsetPos[2] / deltaBounds[2]);
    Vector3 voxelPositions = new Vector3(normPos[0] * voxelDims[0], normPos[1] * voxelDims[1], normPos[2] * voxelDims[2]);
    Vector3Int voxelPos = Vector3Int.FloorToInt(voxelPositions);
    return voxelPos;
}
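One flaw worth checking, sketched here in plain JavaScript with arrays standing in for Vector3: a position exactly on bounds.max normalizes to 1.0 and floors to voxelDims, which is one past the last valid index, and positions outside the bounds give negative or oversized indices. The sketch clamps the result. worldToVoxel and its parameter names are hypothetical. Like the code above, it relies on an axis-aligned bounding box, so it assumes the object is not rotated.

```javascript
// Convert a world position to integer voxel coordinates inside an
// axis-aligned box [boundsMin, boundsMax], clamped to the valid range.
function worldToVoxel(worldPos, boundsMin, boundsMax, voxelDims) {
  return worldPos.map((p, axis) => {
    const size = boundsMax[axis] - boundsMin[axis];
    const normalized = (p - boundsMin[axis]) / size; // 0..1 inside the box
    const index = Math.floor(normalized * voxelDims[axis]);
    return Math.min(Math.max(index, 0), voxelDims[axis] - 1);
  });
}

const voxelDims = [512, 256, 512];
const boxMin = [0, 0, 0];
const boxMax = [2, 1, 2];

console.log(worldToVoxel([1, 0.5, 1], boxMin, boxMax, voxelDims)); // [ 256, 128, 256 ]
console.log(worldToVoxel([2, 1, 2], boxMin, boxMax, voxelDims));   // [ 511, 255, 511 ]
```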

combining several iterations of same object in different location using .add() or .merge()

I am trying to make a coil with several small loops. I have a custom function to create a single helix for each loop, and at first I was calling that within a for loop several hundred times, but it was taking too long to render and slowed down the scene.
I tried the merge function several different ways to no avail, so I'm now simply trying to combine two meshes by using the .add command. Here is my process:
(1) add the helix mesh to the total mesh
(2) move the position of the helix mesh
(3) try to add it again so that the total mesh will include both helixes
Only the second (moved) helix shows up when I say scene.add(createCoil()); in my init() function though. How do I add, or merge, several differently positioned helices into one object, geometry, mesh, or whatever, without calling a function to create a new Geometry for every iteration of the for loop?
Here is the code (I took the for loop out just to try one iteration):
function createCoil(){
    var geometry = new THREE.TorusGeometry( 11, 0.5, 16, 100 );
    var material = new THREE.MeshBasicMaterial( { color: 0x017FFF } );
    mesh = new THREE.Mesh( geometry, material );
    var clockwise = false;
    var radius = 10;
    var height = 3.4;
    var arc = 1;
    var radialSegments = 24;
    var tubularSegments = 2;
    var tube = 0.1;
    var bottom = new THREE.Vector3();
    bottom.set(radius, -height / 2, 0);
    mesh2 = createHelix(clockwise, radius, height, arc, radialSegments, tubularSegments, tube, material, bottom);
    return mesh;
}
createHelix(...) creates a new THREE.Geometry. I have also tried this and the merge function with the helix being a THREE.Object3D.
Please don't point to an answer that includes
THREE.GeometryUtils.merge(geometry, otherGeometry);
(...it's obsolete)
Used another link that was helpful, but I can only (1) change the position of a mesh (not geometry), and (2) only merge geometries (not meshes), within the for loop.
How do I get 500 loops of a coil into a scene without a terrible frame rate?
Please and Thanks!
Use the Matrix4 toolset for translation (and rotation if you want), then merge your geometries:
var geometry = new THREE.TorusGeometry( 11, 0.5, 16, 100 );
var mergeGeometry = new THREE.Geometry();
var matrix = new THREE.Matrix4();
for ( var i = 1; i <= 50; i++ ) {
    matrix.makeTranslation( 0, 3.4 * i, 0 );
    mergeGeometry.merge( geometry, matrix );
}
// material is the one defined in the question
var mesh = new THREE.Mesh( mergeGeometry, material );
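Conceptually, merging with a translation matrix copies each vertex with the offset applied and shifts the face indices by the number of vertices already in the target. The plain JavaScript below sketches only that idea on bare arrays; mergeTranslated and the { vertices, faces } shape are hypothetical and are not the three.js implementation.

```javascript
// Append a translated copy of `source` to `target`.
// Both are plain objects: { vertices: [[x, y, z], ...], faces: [[a, b, c], ...] }.
function mergeTranslated(target, source, offset) {
  const base = target.vertices.length; // indices of the copy start here
  for (const [x, y, z] of source.vertices) {
    target.vertices.push([x + offset[0], y + offset[1], z + offset[2]]);
  }
  for (const [a, b, c] of source.faces) {
    target.faces.push([a + base, b + base, c + base]);
  }
  return target;
}

// One triangle standing in for a single loop of the coil.
const loop = { vertices: [[0, 0, 0], [1, 0, 0], [0, 0, 1]], faces: [[0, 1, 2]] };
const coil = { vertices: [], faces: [] };
for (let i = 1; i <= 50; i++) {
  mergeTranslated(coil, loop, [0, 3.4 * i, 0]); // same spacing as the answer
}

console.log(coil.vertices.length, coil.faces.length); // 150 50
```

The result is a single set of vertices and faces, so the whole coil can be drawn as one mesh rather than as hundreds of separate objects.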