I have a simple task: I have 10,000 3D boxes, each with an x, y, z position, width, height, depth, rotation, and color. I want to throw them into a 3D space, visualize it, and let the user fly through it using the mouse. Is there an easy way to put this together?
One easy way of doing this using recent (v 3.2) OpenGL would be:
make an array with 8 vertices (the corners of a cube) and give them coordinates on the unit cube, that is from (-1, -1, -1) to (1, 1, 1)
create a vertex buffer object and bind it
use glBufferData to upload your vertex array into it
create, set up, and bind any textures that you may want to use (skip this if you don't use textures)
create a vertex shader which applies a transform matrix that is read from "some source" (see below) according to the value of gl_InstanceID
compile the shader, link the program, bind the program
set up the instance transform data (see below) for all cube instances
depending on what method you use to communicate the transform data, you may draw everything in one batch, or use several batches
call glDrawElementsInstanced once per batch, with the instance count set to as many cubes as fit into one batch
if you use several batches, update the transform data in between
the vertex shader applies the transform in addition to the normal MVP stuff
To communicate the per-cube transform data, you have several alternatives, among them are:
uniform buffer objects: you have a guaranteed minimum of 4096 values, i.e. 256 4x4 matrices, but you can query the actual limit
texture buffer objects: again you have a guaranteed minimum of 65536 values, i.e. 4096 4x4 matrices (but usually something much larger; my elderly card can do 128,000,000 values; you should query the actual limit)
manually set uniforms for each batch; this does not need any "buffer" stuff, but is most probably somewhat slower
Alternatively: Use pseudo-instancing which will work even on hardware that does not support instancing directly. It is not as elegant and very slightly slower, but it does the job.
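For the uniform buffer variant, here is a minimal sketch of what the shader and one batch draw could look like (GLSL 3.30 embedded in C; the names CubeTransforms, viewProjection, transformUBO and batchMatrices are made up for illustration, not taken from any particular source):

/* Hypothetical vertex shader: indexes a uniform block of per-instance
   matrices with gl_InstanceID and applies the usual view-projection on top. */
static const char *cubeVertexShader =
    "#version 330 core\n"
    "layout(location = 0) in vec3 position;\n"
    "layout(std140) uniform CubeTransforms {\n"
    "    mat4 model[256];          // 256 mat4 = the guaranteed UBO minimum\n"
    "};\n"
    "uniform mat4 viewProjection;  // the normal MVP part\n"
    "void main() {\n"
    "    gl_Position = viewProjection * model[gl_InstanceID] * vec4(position, 1.0);\n"
    "}\n";

/* One batch: upload up to 256 matrices, then draw that many cube instances.
   Assumes the block was bound to binding point 0 with glUniformBlockBinding
   and that the cube's 36-entry index buffer is bound. */
glBindBufferBase(GL_UNIFORM_BUFFER, 0, transformUBO);
glBufferSubData(GL_UNIFORM_BUFFER, 0, batchCount * 16 * sizeof(float), batchMatrices);
glDrawElementsInstanced(GL_TRIANGLES, 36, GL_UNSIGNED_SHORT, 0, batchCount);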
I want to create a shader that can cover a surface with "circles" from many random positions.
The circles keep growing until the whole surface is covered with them.
Here is my first try with Amplify Shader Editor.
The problem is I don't know how to make this shader create an array of "point makers" with random positions. I also want to control the circles from C#, for example:
point_maker[] point_maker = new point_maker[10];
point_maker[1].position = Vector2.one;
point_maker[1].scale = 1;
// and so on ...
Heads-up: that's probably not the way to do what you're looking for, as every pixel in your shader would need to loop over all your input points, while each of those pixels will be covered by at most one of them. It's a classic case of embracing the parallel nature of shaders. (The keyword for me here is 'random', as in 'random looking'.)
There are two distinct problems here: generating the circles, and masking them.
I would start by generating a grid out of your input space (most likely your UV coordinates, so I'll assume that from here on), by taking the fractional part of the coordinates scaled by some value: UVs (usually) go from 0 to 1, so if you want 100 circles you'd multiply the coordinate by 10. You now have a grid of 100 UV cells, in which you can do something similar to what you already have to generate the circle (tip: the dot product of a vector with itself gives the squared distance, which is much cheaper to compute).
You want some randomness, so you need to add some offset to the center of each circle. You need some sort of random number that is unique per grid cell (there might be one built into ASE, I can't remember, or you can make your own - there are plenty if you look online). To do this you'd feed the integer part of the scaled coordinate (what's left once you take the frac()) into your hash/random function. You also need to limit that offset based on the radius of the circle so it doesn't touch the sides of the cell. You can overlay more than one layer of circles if you want more coverage as well.
The second step is to figure out whether to display those circles at all, and for this you could make the drawing conditional on the distance from the center of the circle to an input coordinate you provide to the shader, against some threshold (it doesn't have to be an 'if' condition per se, it could be clamping the value to the background color or something).
I'm making a lot of assumptions on what you want to do here, and if you have stronger conditions on the point distribution you might be better off rendering quads to a render texture for example, but that's a whole other topic :)
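If it helps, here is a rough CPU-side sketch of that per-pixel logic in plain C (the hash function, the cell count, and the growing "reveal" radius and its center are assumptions for illustration; in the real thing this would be fragment shader code or ASE nodes):

#include <math.h>

/* Cheap per-cell hash: turns an integer cell id into a pseudo-random float
   in [0, 1). Any hash with a similar shape would do. */
static float cell_hash(int cx, int cy)
{
    unsigned int h = (unsigned int)cx * 374761393u + (unsigned int)cy * 668265263u;
    h = (h ^ (h >> 13)) * 1274126177u;
    return (float)(h & 0xFFFFu) / 65536.0f;
}

/* Returns 1.0 if the UV lies inside its cell's circle AND that circle has
   already been revealed by the growing reveal radius, otherwise 0.0. */
static float circle_coverage(float u, float v, float cells,
                             float reveal_u, float reveal_v, float reveal_radius)
{
    float su = u * cells, sv = v * cells;              /* scale UVs by grid size */
    int   cx = (int)floorf(su), cy = (int)floorf(sv);  /* integer part = cell id */
    float fu = su - (float)cx, fv = sv - (float)cy;    /* frac() = position in cell */

    float radius = 0.35f;                  /* circle radius, in cell units */
    float maxOff = 0.5f - radius;          /* keep the circle inside its cell */
    float ox = 0.5f + (cell_hash(cx, cy)     * 2.0f - 1.0f) * maxOff;
    float oy = 0.5f + (cell_hash(cy, cx + 7) * 2.0f - 1.0f) * maxOff;

    float dx = fu - ox, dy = fv - oy;      /* squared distance, no sqrt needed */
    float in_circle = (dx * dx + dy * dy < radius * radius) ? 1.0f : 0.0f;

    /* masking: only show circles whose center lies within the reveal radius */
    float cu = ((float)cx + ox) / cells, cv = ((float)cy + oy) / cells;
    float ru = cu - reveal_u, rv = cv - reveal_v;
    float revealed = (ru * ru + rv * rv < reveal_radius * reveal_radius) ? 1.0f : 0.0f;

    return in_circle * revealed;
}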
Which MATLAB functions or examples should be used to (1) track the distance from a moving object to a stereo (binocular) camera pair, and (2) track the centroid (X, Y, Z) of moving objects, ideally in the range of 0.6 m to 6 m from the cameras?
I've used the MATLAB example that uses the PeopleDetector function, but it becomes inaccurate when a person is within 2 m because it begins clipping heads and legs.
The first thing you need to deal with is how to detect the object of interest (I suppose you have already resolved this issue). There are a lot of approaches to detecting moving objects. If your cameras stand in a fixed position, you can work with only one camera and use some background subtraction to get the objects that appear in the scene (some info here). If your cameras are moving, I think the best approach is to work with the optical flow of the two cameras (instead of using a previous frame to get the flow map, the stereo pair images are used to compute the optical flow map in each frame).
In MATLAB there is a disparity computation option; this could help you detect the objects in the scene. After this you need to add a stage to extract the objects of interest, for which you can use some thresholds. Once you have the desired objects, put them in a binary mask. On this mask you can use image moments (check this and this) to calculate the centroids. If the images in the binary mask look noisy, you can use some morphological operations to improve the results (watch this).
In my web application I only need to add static objects to my scene. It ran slowly, so I started searching and found that merging geometries and merging vertices were the solution. When I implemented it, it indeed worked a lot better. All the articles said that the reason for this improvement is the decrease in the number of WebGL calls. As I am not very familiar with things like OpenGL and WebGL (I use Three.js to avoid their complexity), I would like to know why exactly it reduces the WebGL calls.
Because you send one large object instead of many little ones, the overhead is reduced. So I understand that loading one big mesh into the scene is faster than loading many small meshes.
BUT I do not understand why merging geometries also has a positive influence on the rendering calculation? I would also like to know the difference between merging geometries and merging vertices?
Thanks in advance!
three.js is a framework that helps you work with the WebGL API.
What a "mesh" is to three.js is, to WebGL, a series of low-level calls that set up state and issue commands to the GPU.
Let's take a sphere for example. With three.js you would create it with a few lines:
var sphereGeometry = new THREE.SphereGeometry(10);
var sphereMaterial = new THREE.MeshBasicMaterial({color:'red'});
var sphereMesh = new THREE.Mesh( sphereGeometry, sphereMaterial);
myScene.add( sphereMesh );
You have your renderer.render() call, and poof, a sphere appears on screen.
A lot of stuff happens under the hood though.
The first line creates the sphere "geometry" - the CPU does a bunch of math and logic, describing a sphere with points and triangles. Points are vectors, three floats grouped together; triangles are structures that group these points by indices (groups of integers).
Somewhere there is a loop that calculates the vectors using trigonometry (sin, cos), and another that weaves the resulting array of vectors into triangles (take every N, N + M, N + 2M and create a triangle, etc.).
Now these numbers exist in JavaScript land; it's just a bunch of floats and ints, grouped together in a specific way to describe shapes such as cubes, spheres and aliens.
You need a way to draw this construct on a screen - a two dimensional array of pixels.
WebGL does not actually know much about 3D. It knows how to manage memory on the GPU and how to compute things in parallel (or rather, it gives you the tools). It does know how to do the mathematical operations that are crucial for 3D graphics, but the same math could be used to mine bitcoins without ever drawing anything.
In order for WebGL to draw something on screen, it first needs the data put into appropriate buffers and it needs the shader programs. It needs to be set up for that specific call (is there going to be blending - transparency in three.js land - depth testing, stencil testing, etc.). Then it needs to know what it's actually drawing (so you need to provide strides, attribute sizes, etc. to let it know where a 'mesh' actually is in memory), how it's drawing it (triangle strips, fans, points...), and what to draw it with - which shaders it will apply to the data you provided.
So, you need a way to 'teach' WebGL to do 3d.
I think the best way to get familiar with this concept is to look at this tutorial, re-reading it if necessary, because it explains what happens to pretty much every single 3D object ever drawn in perspective.
To sum up the tutorial:
a perspective camera is basically two 4x4 matrices - a perspective matrix, which puts things into perspective, and a view matrix, which moves the entire world into camera space. Every camera you make consists of these two matrices.
Every object exists in its own object space. A TRS matrix (the world matrix in three.js terms) is used to transform this object into world space.
So this stuff - a concept such as the "projection matrix" - is what teaches WebGL how to draw perspective.
Three.js abstracts this further and gives you things like "field of view" and "aspect ratio" instead of left, right, top, bottom.
Three.js also abstracts the transformation matrices (the view matrix on the camera, and the world matrices on every object) because it allows you to set "position" and "rotation" and computes the matrices from these under the hood.
Since every mesh has to be processed by the vertex shader and the pixel shader in order to appear on the screen, every mesh needs to have all this information available.
When a draw call is issued for a specific mesh, that mesh will have the same perspective matrix and view matrix as any other object being rendered with the same camera. They will each have their own world matrices - numbers that move them around your scene.
This is transformation alone, happening in the vertex shader. These results are then rasterized, and go to the pixel shader for processing.
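As a rough sketch of what those matrices boil down to, in plain C rather than three.js (the helper names are made up; matrices are column-major, the way OpenGL/WebGL expect them):

#include <math.h>
#include <string.h>

typedef struct { float m[16]; } Mat4;   /* column-major 4x4 matrix */

/* Perspective matrix from field of view and aspect ratio - the two values
   three.js exposes instead of left/right/top/bottom. */
static Mat4 perspective(float fovyRadians, float aspect, float znear, float zfar)
{
    Mat4 p; memset(&p, 0, sizeof p);
    float f = 1.0f / tanf(fovyRadians * 0.5f);
    p.m[0]  = f / aspect;
    p.m[5]  = f;
    p.m[10] = (zfar + znear) / (znear - zfar);
    p.m[11] = -1.0f;
    p.m[14] = (2.0f * zfar * znear) / (znear - zfar);
    return p;
}

/* c = a * b. mul(projection, mul(view, world)) gives the matrix the vertex
   shader applies to every vertex of a mesh. */
static Mat4 mul(Mat4 a, Mat4 b)
{
    Mat4 c;
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row) {
            float s = 0.0f;
            for (int k = 0; k < 4; ++k)
                s += a.m[k * 4 + row] * b.m[col * 4 + k];
            c.m[col * 4 + row] = s;
        }
    return c;
}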
Let's consider two materials - black plastic and red plastic. They will have the same shader, perhaps one you wrote using THREE.ShaderMaterial, or maybe one from three's library. It's the same shader, but it has one uniform value exposed - color. This allows you to have many instances of a plastic material - green, blue, pink - but it means that each of these requires a separate draw call.
Webgl will have to issue specific calls to change that uniform from red to black, and then it's ready to draw stuff using that 'material'.
So now imagine a particle system, displaying a thousand cubes each with a unique color. You have to issue a thousand draw calls to draw them all, if you treat them as separate meshes and change colors via a uniform.
If on the other hand, you assign vertex colors to each cube, you don't rely on the uniform any more, but on an attribute. Now if you merge all the cubes together, you can issue a single draw call, processing all the cubes with the same shader.
You can see why this is more efficient simply by taking a glance at WebGLRenderer in three.js and all the work it has to do in order to translate your 3D calls into WebGL. Better done once than a thousand times.
Back to those few lines above: the sphereMaterial can take a color argument; if you look at the source, this translates to a uniform vec3 in the shader. However, you can achieve the same thing by using vertex colors and assigning the color you want beforehand.
sphereMesh will wrap the computed geometry into an object that three's WebGLRenderer understands, which in turn sets up WebGL accordingly.
EDIT - To help clarify the question up top: I guess I'm looking for which sorting would perform better, sorting by program or sorting by texture? Will it matter? All my objects are in a similar z space and all are stored in the same VBO. And if I don't switch shaders via glUseProgram, do I have to re-set attributes for each object?
Original Post:
This is sort of a two-part question. I'm trying to figure out how best to sort my 3D objects before drawing them, and which OpenGL calls have to be made for each glDrawElements versus which can be made once per screen refresh (or even just once). The purpose is of course speed. For my game let's assume that z front-to-back ordering isn't much of an issue (most objects are at the same z), so I won't be sorting by z other than drawing all objects with transparency last.
Of course I don't want the sorting process to take longer than rendering unsorted.
Part 2 is: which OpenGL calls have to be made per glDrawElements, and which ones need to be made only when the information changes? And does presentRenderbuffer wipe certain things out so that you have to re-issue them?
Most OpenGL 2 demos make every call for every object. Actually, most demos only draw one object. So in a 3D engine (like the one I'm writing) I want to avoid unnecessary redundant calls.
This is the order I was doing it (unsorted, unoptimized):
glUseProgram(glPrograms[useProgram]);
glDisable(GL_BLEND);
glEnable(GL_CULL_FACE);
Loop through objects {
Do all matrix calcs
Set Uniforms (matrix, camera pos, light pos, light colors, material properties)
Activate Textures.. (x2)
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, texture0);
glUniform1i(glUniforms[useProgram][U_textureSampler], 0);
Bind VBOs
glBindBuffer(GL_ARRAY_BUFFER, modelVertVBO);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, modelIndVBO);
Set Attributes (vertexpos, texcoord, norm, tan, bitan)
glDrawElements(GL_TRIANGLES, models[modelToUse].indSize, GL_UNSIGNED_INT, (void *) (models[modelToUse].indOffset * sizeof(GLuint)));
}
Of course that only worked when all objects used the same shader/program. In practice they won't.
3D objects are in an array with all the properties for each object: model id, shader id, texture ids, position, etc. So my idea was to do a fast simple sort to stack similar objects' index numbers in other arrays. Then draw the items in each of those arrays. I could sort by 3d model (object type), texture, or by shader. Many models share the same texture. Many models share the same shader. At this point I have 3 shaders. ALL OBJECTS share a single VBO.
Can I do it like this?
Bind the VBO - since all objects use the same one
Loop through object types {
If the shader has changed
glUseProgram
Set Attributes
If the texture has changed
glActiveTexture(s) - based on which program is active
Loop through objects of that type {
Do matrix calcs
Set Uniforms - based on which program is active
glDrawElements
}
}
EDIT - To be clear - I'm still drawing all objects, just in a different order to combine uses of shaders and/or textures so as to avoid binding and then rebinding again within one 'frame' of the game.
I'm currently getting a crash on a glDrawElements on the 2nd refresh, but I think that will be easy to find. I only include this fact because it leads me to think that binding a texture might not carry over to a second frame (or presentBuffers).
Is it going to be faster to avoid changing the shader, or changing the texture? Will attributes, the vbo, and the textures stay active across multiple glDrawElement calls? Across multiple presentBuffers?
Answering my own question.
First some context. I currently have 3 shaders and expect I'll end up with no more than 4 or 5. For example, I have a bump map shader that uses a base texture and a normal texture. I also have a shader that doesn't use a base texture and instead uses a solid color for the object, but still has a normal texture. Then I have the opposite, a simple flat-lighting shader that uses a base texture only.
I have many different 3d models but all use the same VBO. And some 3d models use the same textures as others.
So in the definition of a 3d object I added a renderSort property that I can preset knowing what shader program it uses and what texture(s) it needs.
Then as I update objects and determine if they need to be drawn on screen, I also do a one pass simple sort on them based on the renderSort property of their 3d object type... I just toss the array index of the object in a 'bucket' array. I don't see having more than 10 of these buckets.
After the update and quick-sort I render.
The render iterates through the buckets, and inside that through the objects in each bucket. Inside the inner loop I check to see if the program has changed since the last object, and do a glUseProgram if it's changed. Same with textures.. I only bind them if they're not currently bound. Then update all the other uniforms and do the glDrawElements.
The previous way.. unsorted.. if there were 1000 objects it would call glUseProgram, bind the textures, bind the vbo, set all the attributes.. 1000 times.
Now.. it only changes these things when it needs to.. if it needs to 1000 times it will still do it 1000 times. But with the bucket sort it should only need to do it once per bucket. This way I prioritize drawing properly even if they're not sorted properly.
Here's the code:
Sorting...
if (drawThisOne) {
// if an object needs to be drawn - toss it in a sort bucket.
// each itemType has a predetermined bucket number so that objects will be grouped into rough program and texture groups
int itemTypeID = allObjects[objectIndex].itemType;
int bucket = itemTypes[itemTypeID].renderSort;
sorted3dObjects[bucket][sorted3Counts[bucket]]=objectIndex;
// increment the count for that bucket
sorted3Counts[bucket]++;
}
Rendering...
// only do these once per cycle as all objects are in the same VBO
glBindBuffer(GL_ARRAY_BUFFER, modelVertVBO);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, modelIndVBO);
for (int bucket=0; bucket<10; bucket++) {
// does this bucket have anything in it?
if (sorted3Counts[bucket]>0) {
// if so, iterate through the items in that bucket
for (int thisObject=0; thisObject < sorted3Counts[bucket]; thisObject++) {
// get the object index for this object in this bucket
int objectIndex = sorted3dObjects[bucket][thisObject];
int itemTypeID = allObjects[objectIndex].itemType;
int modelToUse = itemTypes[itemTypeID].model;
// switching to pseudocode...
GLuint useProgram = itemTypes[itemTypeID].shader;
if (Program Changed or is not set) {
glUseProgram(glPrograms[useProgram]);
glDisable(GL_BLEND);
glEnable(GL_CULL_FACE);
currentProgram=useProgram;
USE glVertexAttribPointer to set all attributes
}
// based on which program is active set textures and program specific uniforms
switch (useProgram) { ....
if (Texture Changed or is not set) {
glActiveTexture(s)
}
}
Matrix Calculations
glUniform - to set unforms
glDrawElements(GL_TRIANGLES, models[modelToUse].indSize, GL_UNSIGNED_INT, (void *) (models[modelToUse].indOffset * sizeof(GLuint)));
}}}
I'd like to hear what people think the optimal draw calls are for Open GL ES (on the iphone).
Specifically, I've read in many places that it is best to minimise the number of calls to glDrawArrays/glDrawElements - I think Apple say 10 should be the max in their recent WWDC presentation. As I understand it, to do this you need to put all the vertices into one array if possible, so you only need to make the drawArrays call once.
But I am confused, because this surely means you can't use the translate, rotate and scale functions, since they would apply across the whole geometry. Which is fine, except doesn't that mean you need to pre-calculate every vertex position yourself, rather than getting OpenGL to do it?
Also, doesn't it mean you can't use any of the fan/strip settings unless you just have one continuous shape?
These drawbacks make me think I'm not understanding something correctly, so I guess I'm looking for confirmation that I should:
Be trying to make an uber array of all triangles to draw.
Resign myself to the fact I'll have to work out all the vertex positions myself.
Forget about push'ing and pop'ing each thing to draw into its desired location
Is that what others do?
Thanks
Vast question; batching is always a matter of compromise.
The ideal structure for performance would be, as you mention, a single array containing all the triangles to draw.
Starting from here, we can start adding constraints:
One additional constraint is that having vertex indices in 16 bits saves bandwidth and memory, and is probably the fast path for your platform. So you could consider grouping triangles in chunks of 65536 vertices.
Then, if you want to switch the shader/material/glState used to draw the geometry, you have no choice (*) but to emit one draw call per shader/material/glState. So you could consider grouping triangles by shaderID/materialID/glStateID.
Next, if you want to animate things, you have no choice (*) but to transmit your transform matrix to GL and then issue a draw call. So you could consider grouping triangles by 'transform group': for example, all static geometry together, and animated geometry that shares a common transform can be grouped too.
In these cases, you'd have to transform the vertices yourself (using CPU) before merging the meshes together.
Regarding triangle strips, you can turn any mesh into strips, even if it has discontinuities in its topology, by introducing degenerate triangles. So this is a technique that always applies.
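As a rough illustration (a sketch only; the helper name and buffer layout are made up), two strips can be stitched into one index buffer by repeating the last index of the first strip and the first index of the second, producing zero-area triangles the GPU skips:

#include <stddef.h>

/* Appends strip B after strip A in a single index buffer. The two repeated
   indices create degenerate (zero-area) triangles, so both strips can then be
   drawn with one glDrawElements(GL_TRIANGLE_STRIP, ...) call.
   Note: if lenA is odd, one extra repeat of a[lenA - 1] is needed to keep the
   winding of strip B correct. */
static size_t stitch_strips(unsigned short *out,
                            const unsigned short *a, size_t lenA,
                            const unsigned short *b, size_t lenB)
{
    size_t n = 0;
    for (size_t i = 0; i < lenA; ++i) out[n++] = a[i];
    out[n++] = a[lenA - 1];   /* repeat last index of A */
    out[n++] = b[0];          /* repeat first index of B */
    for (size_t i = 0; i < lenB; ++i) out[n++] = b[i];
    return n;                 /* number of indices written */
}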
All in all, reducing draw calls is a game of compromises; some techniques might work well for one 3D model, while others may be better suited to others. IMHO, the key is to be creative and to carefully benchmark your application to see whether your changes actually improve performance on your target platform.
HTH, cheers,
(*) actually there are techniques that allow you to reduce the number of draw calls in these cases, such as:
texture atlases, which group different textures into a single one, preventing texture switches in GL and thus helping to limit draw calls
(pseudo) hardware instancing, which allows shaders to fetch transforms from various sources in order to transform mesh instances in different ways
...