Efficient way of editing vertex positions during runtime in Unreal Engine

I’m looking for a way to update the vertex positions of every vertex of a mesh with 65536 vertices from C++ code. The positions need to be updated every few frames with values calculated in code, so the update has to be reasonably efficient.
I tried this, with no effect:
if (NewElement->GetStaticMeshComponent()->GetStaticMesh()->RenderData->LODResources.Num() > 0)
{
    FPositionVertexBuffer* VertexBuffer = &NewElement->GetStaticMeshComponent()->GetStaticMesh()->RenderData->LODResources[0].VertexBuffers.PositionVertexBuffer;
    if (VertexBuffer)
    {
        const int32 VertexCount = VertexBuffer->GetNumVertices();
        for (int32 Index = 0; Index < VertexCount; Index++)
        {
            VertexBuffer->VertexPosition(Index) += FVector(float(Index), float(100 * Index), float(10000 * Index));
        }
    }
}
I’d appreciate help finding a working solution.
For now I’m looking for a simple solution, just to start with something. But I know that updating the mesh on the CPU side is not the most efficient approach, so maybe it would be easier/faster to calculate the position of every vertex and then pass it to the vertex shader? Or generate some pseudo-texture, upload it to the GPU and use it in the vertex shader? Does anyone have an example of such a mechanism in UE?
Regards

Your code doesn't actually push any updates to the GPU. You're using a static mesh here, which isn't really intended to have its vertices modified at runtime, hence the "static" moniker. That's not to say you can't modify that data at runtime, but that's not what you're doing here: your code only changes data on the CPU side.
If you look through the various vertex buffers implemented in the engine code, you'll see that they all ultimately extend FRenderResource, which provides the RHI-management functions, or FVertexBuffer, which is itself an FRenderResource and contains an FBufferRHIRef field: the actual GPU-bound vertex buffer.
Because rendering in Unreal Engine is multithreaded, the engine uses the concept of scene proxies, which extend FPrimitiveSceneProxy. Each primitive type that exists on the game thread and needs to be rendered creates some form of FPrimitiveSceneProxy and passes data and updates to its proxy in a thread-safe manner, usually by queuing rendering commands via ENQUEUE_RENDER_COMMAND(...), to which you pass a lambda that the rendering thread executes when it determines it is time to run it. The proxy contains the vertex and index buffers, and it is where the "real" updates to your rendered geometry happen.
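In rough outline, queuing such an update from the game thread might look something like this (a minimal sketch only; UMyMeshComponent, FMyMeshSceneProxy and UpdatePositions_RenderThread are hypothetical placeholders, not engine API):
void UMyMeshComponent::UpdatePositions(TArray<FVector> NewPositions)
{
    // SceneProxy is created by the engine when the component is registered.
    if (FMyMeshSceneProxy* Proxy = static_cast<FMyMeshSceneProxy*>(SceneProxy))
    {
        ENQUEUE_RENDER_COMMAND(UpdateMeshPositions)(
            [Proxy, Positions = MoveTemp(NewPositions)](FRHICommandListImmediate& RHICmdList)
            {
                // Executed later on the rendering thread, where it is safe to touch RHI resources.
                Proxy->UpdatePositions_RenderThread(RHICmdList, Positions);
            });
    }
}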
One example is the following excerpt (taken from BaseDynamicMeshSceneProxy.h, the FMeshRenderBufferSet::TransferVertexUpdateToGPU() function), which shows the render buffer set in a UDynamicMeshComponent's scene proxy pushing an update of its vertex positions to the GPU by copying its CPU-side data directly into the GPU-side position vertex buffer:
FPositionVertexBuffer& VertexBuffer = this->PositionVertexBuffer;
void* VertexBufferData = RHILockBuffer(VertexBuffer.VertexBufferRHI, 0, VertexBuffer.GetNumVertices() * VertexBuffer.GetStride(), RLM_WriteOnly);
FMemory::Memcpy(VertexBufferData, VertexBuffer.GetVertexData(), VertexBuffer.GetNumVertices() * VertexBuffer.GetStride());
RHIUnlockBuffer(VertexBuffer.VertexBufferRHI);
I won't provide a full sample here because, as you can see from everything described so far, there is much more to this than a simple snippet of code. I wanted to outline the overall concepts and patterns you'll need to understand, because if you're going to do this directly in your own code you must understand them, and they can be confusing when you first start digging into Unreal Engine's rendering code.
The best resource to help gain a solid understanding of the patterns the engine expects you to follow would be the official documentation found here: Unreal Engine Graphics Programming.
If you want to modify geometry at runtime, there are also other options that make the process much easier than writing it completely yourself, such as the engine-provided Procedural Mesh Component plugin, the third-party RuntimeMeshComponent plugin, and, in later versions of Unreal Engine (4 and 5), UDynamicMeshComponent (known as USimpleDynamicMeshComponent in earlier versions), which is part of the Interactive Tools Framework and in the most recent engine versions has become a core part of the GeometryFramework runtime module.
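To give a flavour of the Procedural Mesh Component route, updating positions every few frames can look something like the following (a minimal sketch, assuming the section was created earlier with CreateMeshSection_LinearColor and that Mesh, Vertices, Normals and UVs are members you maintain yourself):
void AMyDeformingActor::UpdateDeformation(float Time)
{
    // Recalculate positions on the CPU (placeholder deformation).
    for (int32 i = 0; i < Vertices.Num(); ++i)
    {
        Vertices[i].Z = 100.0f * FMath::Sin(Time + 0.01f * i);
    }

    // Push the new positions to section 0; unchanged attributes can be passed as empty arrays.
    Mesh->UpdateMeshSection_LinearColor(0, Vertices, Normals, UVs,
        TArray<FLinearColor>(), TArray<FProcMeshTangent>());
}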
I hope this helps you in your journey. Runtime-modifiable geometry is tough to get started with, but it's definitely worth the effort.

Related

In WebGPU, can you reuse the same render pass in multiple frames?

In WebGPU you can create a render pass by defining its descriptor:
const renderPassDesc: GPURenderPassDescriptor = {
    colorAttachments: [
        {
            view: context.getCurrentTexture().createView(),
            loadValue: [0.2, 0.3, 0.5, 1],
            storeOp: "store"
        }
    ]
};
And then run it through the command encoder and start recording.
const commandEncoder = device.createCommandEncoder();
const renderPass = commandEncoder.beginRenderPass(renderPassDesc);
So, essentially, it appears that you need the current texture to start recording (i.e. without calling context.getCurrentTexture().createView() you can't create the descriptor, and without the descriptor you can't start recording). But the API seems to suggest that the texture can change every frame (this was the case even months ago, when the API was different and you retrieved the texture from the swap chain). So, basically, it appears that you can't reuse render passes across different frames (unless, of course, you don't render to the swap chain and target an offscreen texture instead).
So the question is: in WebGPU, can you reuse the same render pass in multiple frames?
Comparison with Vulkan
My question stems from the (little) exposure I had to Vulkan. In Vulkan, you can reuse recorded resources because there is a way to know upfront how many VkImage objects are in the swap chain; they have 0-based indices such as 0, 1 and 2. I can't remember the exact syntax, but I remember that you can basically record 3 separate command buffers, one per VkImage, and reuse them across frames. All you have to do in the render loop is query the index of the current VkImage and submit the corresponding recorded command buffer.
Looking at the specification for getCurrentTexture, it seems there is, at this time, no control over the number of "swap" textures.
The texture is created (if it is null or has been destroyed) in the "allocate a new context texture" step, and the note there states that:
If a previously presented texture from context matches the required criteria, its GPU memory may be re-used.
Each time, in the "update the rendering [of the] Document" step, if the current texture is not null and not destroyed, it will be presented, destroyed, and set to null.
Another note from the specs:
Developers can expect that the same GPUTexture object will be returned by every call to getCurrentTexture() made within the same frame (i.e. between invocations of Update the rendering) unless configure() is called.
All of this seems to indicate that you have to get the current texture every frame and recreate all the related objects as well.
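In practice, the per-frame code therefore ends up looking something like this (a minimal sketch, assuming device and context are already configured; the color attachment uses the current loadOp/clearValue form of the API rather than the older loadValue shown in the question):
function frame() {
    // The current texture is only valid for this frame, so the view,
    // the descriptor and the pass encoder are all recreated each time.
    const view = context.getCurrentTexture().createView();

    const renderPassDesc: GPURenderPassDescriptor = {
        colorAttachments: [
            {
                view,
                loadOp: "clear",
                clearValue: [0.2, 0.3, 0.5, 1],
                storeOp: "store"
            }
        ]
    };

    const commandEncoder = device.createCommandEncoder();
    const renderPass = commandEncoder.beginRenderPass(renderPassDesc);
    // ... record draw calls ...
    renderPass.end();
    device.queue.submit([commandEncoder.finish()]);

    requestAnimationFrame(frame);
}
requestAnimationFrame(frame);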

Pd-GEM - using multiple, separate particle streams

I'm working on a live music visualisation project, where I am using a particle stream to visualise each channel of audio (vocals, guitar, percussion, bass) which are each coming from a looper.
I have the visualisation aspects working - I do envelope tracking in a separate pd instance, send the envelope details via udp to my gem instance, which then uses that to vary the size and colour of multiple particle streams.
The problem I have is that I am trying to set the origin point of each stream, and they are either interacting or they are controlling the origin of a different stream. The part_velocity also seems to be having a similar issue.
Each particle system has its own gemhead (which I init as, say, [gemhead 20], so each one is unique), but changing the XYZ for its [part_source 1 point] object seems to affect a stream that's in a different gemhead chain.
I have also moved it into an abstraction, where I name its head [gemhead $0], and I am having the same issue.
This unanswered thread from years ago shows two other people having the same problem, but no answers.
Here's a portion of my main patch which calls the abstraction:
And this is the abstraction:
Am I missing something simple here, or is there perhaps a bug in that one of the part_xxx objects is not checking which gemhead list it's in? Note that there are other gemheads in the main patch, some have an argument, some don't, but they're doing other stuff.
Oh yeah, and input is welcome on the somewhat dumb-looking way I'm preserving state here; I have NO idea what the usual patterns are, and cannot for the life of me find any good advice on it!

Unity - use GetRawTextureData to change underlying RGB bytes without copying

I am trying to use OpenCV with Unity for image processing, and I am trying to make the data transfers between OpenCV and Unity code as efficient as possible.
Currently, I am able to create a new byte[] in C#, then load an image into these bytes in OpenCV, and then use texture.LoadRawTextureData(array) and texture.Apply() to show this texture in Unity.
However, the Unity documentation recommends using texture.GetRawTextureData() to get a reference to the NativeArray (the overload that returns byte[] makes a copy of the raw data) and then writing the data directly into this buffer (plus calling Apply()).
Unfortunately, the documentation on NativeArray is rather scarce: how exactly does a NativeArray look in memory? It does have a ToArray() function, but that again makes a copy of the data. What I need is a byte[] array, either RGB24 or RGBA32 (RGBA seems to be preferred; even though it is memory-inefficient for opaque textures, modern GPUs apparently do not support RGB24).
Is there any way to pass a pointer to the beginning of the texture's buffer without making copies and calling LoadRawTextureData()? Or is the data in the Texture stored in a completely different format?
I had the same confusion over NativeArray. It looks like the CopyTo() method doesn't allocate memory. There is also a ToArray() method, which I'm certain does allocate.
I was able to work out this utility method which is working fine for a webcam feed.
private byte[] m_byteCache = null;

public byte[] GetRawTextureData(Texture2D texture)
{
    NativeArray<byte> nativeByteArray = texture.GetRawTextureData<byte>();
    if (m_byteCache?.Length != nativeByteArray.Length)
    {
        m_byteCache = new byte[nativeByteArray.Length];
    }
    nativeByteArray.CopyTo(m_byteCache);
    return m_byteCache;
}
ToArray() allocates a new array. CopyTo() doesn't alloc memory but of course it copies the data.
But what I gather from the documentation is that you should be able to just access the NativeArray like a normal array to modify the memory and then call .Apply() on the corresponding Texture object. If you can make your C# OpenCV code write to it, that should do the trick.
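A minimal sketch of that direct-write approach (assuming a Texture2D created with TextureFormat.RGBA32; the fill loop just stands in for whatever your OpenCV code would write into the buffer):
using Unity.Collections;
using UnityEngine;

public class RawTextureWriter : MonoBehaviour
{
    public Texture2D texture; // assumed to be TextureFormat.RGBA32

    void Update()
    {
        // Direct view of the texture's underlying buffer; no copy is made.
        NativeArray<byte> data = texture.GetRawTextureData<byte>();

        // Write RGBA bytes in place (placeholder for the OpenCV output).
        for (int i = 0; i < data.Length; i += 4)
        {
            data[i + 0] = 255; // R
            data[i + 1] = 0;   // G
            data[i + 2] = 0;   // B
            data[i + 3] = 255; // A
        }

        // Upload the modified buffer to the GPU.
        texture.Apply();
    }
}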
The catch with NativeArray, I guess, is that you get direct access to the memory of whatever implementation your code runs on, so the exact byte representation could differ depending on the platform. Also, the memory becomes invalid as soon as the texture is gone.

Register Ranges in HLSL?

I am currently refactoring a large chunk of old code and have finally dived into the HLSL section, where my knowledge is minimal due to being out of practice. I've come across some documentation online that specifies which registers are to be used for which purposes:
t – for shader resource views (SRV)
s – for samplers
u – for unordered access views (UAV)
b – for constant buffer views (CBV)
This part is pretty self-explanatory. If I want to create constant buffers, I can just declare them as:
cbuffer LightBuffer: register(b0) { };
cbuffer CameraBuffer: register(b1) { };
cbuffer MaterialBuffer: register(b2) { };
cbuffer ViewBuffer: register(b3) { };
However, coming from the world of MIPS assembly, I can't help but wonder whether there are finite, restricted ranges for these. For example, temporary registers are restricted to the range t0 - t7 in MIPS assembly. In the case of HLSL I haven't been able to find any documentation on this topic, as everything seems to point to assembly languages and microprocessors (such as the 8051, if you'd like a random topic to read up on).
Is there a set range for the four register types in HLSL or do I just continue as much as needed in a sequential fashion and let the underlying assembly handle the messy details?
Note
I have answered this question partially, as I am unable to find a range for u currently; however, if someone has a better, more detailed answer than what I've given through testing, then feel free to post it and I will mark that as the correct answer. I will leave this question open until December 1st, 2018 to give others a chance to give a better answer for future readers.
Resource slot counts for D3D11 (the D3D12 case expands them) are specified on the MSDN Resource Limits page.
The ones of interest here are:
D3D11_COMMONSHADER_INPUT_RESOURCE_REGISTER_COUNT (which is t) = 128
D3D11_COMMONSHADER_SAMPLER_SLOT_COUNT (which is s) = 16
D3D11_COMMONSHADER_CONSTANT_BUFFER_HW_SLOT_COUNT (which is b) = 15, but one slot is reserved to store constant data generated from shaders (if you have a large static const array, for example)
The u case is different, as it depends on the feature level (and, to be honest, is a vendor/OS-version mess):
D3D11_FEATURE_LEVEL_11_1 or greater: 64 slots.
D3D11_FEATURE_LEVEL_11_0: always 8, although some cards/drivers support 64; you need at least Windows 8 for that (it might also be available on Windows 7 with the Platform Update). I do not recall a way to test whether 64 is supported (many NVIDIA cards in the 700 range do, for example).
D3D11_FEATURE_LEVEL_10_1: either 0 or 1; there is a way to check whether compute is supported.
You need to perform a feature check:
D3D11_FEATURE_DATA_D3D10_X_HARDWARE_OPTIONS checkData = {};
d3dDevice->CheckFeatureSupport(D3D11_FEATURE_D3D10_X_HARDWARE_OPTIONS, &checkData, sizeof(checkData));
BOOL computeSupport = checkData.ComputeShaders_Plus_RawAndStructuredBuffers_Via_Shader_4_x;
Please note that on some OS/driver versions I have had this flag return TRUE while the feature was not actually supported (Intel did that on Windows 7/8), so in that case the only reliable solution was to try to create a small raw (byte-address) buffer or a structured buffer and check the HRESULT.
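For example, something along these lines (a minimal sketch; error handling trimmed, and the 16-byte size is arbitrary):
// Try to create a tiny raw (byte-address) buffer with a UAV-compatible bind flag.
// If the capability flag was reported incorrectly, this is where creation actually fails.
D3D11_BUFFER_DESC desc = {};
desc.ByteWidth = 16; // arbitrary small size
desc.Usage = D3D11_USAGE_DEFAULT;
desc.BindFlags = D3D11_BIND_UNORDERED_ACCESS | D3D11_BIND_SHADER_RESOURCE;
desc.MiscFlags = D3D11_RESOURCE_MISC_BUFFER_ALLOW_RAW_VIEWS;

ID3D11Buffer* testBuffer = nullptr;
HRESULT hr = d3dDevice->CreateBuffer(&desc, nullptr, &testBuffer);
const bool rawBufferSupported = SUCCEEDED(hr);
if (testBuffer) { testBuffer->Release(); }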
As a side note, feature level 10 or below corresponds to quite old configurations nowadays, so except for rare scenarios you can probably safely ignore it (I leave it here for information purposes).
Since questions like this usually involve a long wait, I tested the b register by attempting to create a cbuffer in register b51. This failed as I expected, and luckily SharpDX threw an exception stating that the maximum is 14. So, for the sake of future readers, I tested all four register types and am posting the ranges I found to work.
b has a range of b0 - b13.
s has a range of s0 - s15.
t has a range of t0 - t127.
u has a range I was unable to determine through testing.
At the moment I cannot find the range of the u register, as I have no examples of it in my code and have never actually used it. If someone comes along who does have an example usage, feel free to test it and update this post for future readers.
I did find a contradiction to my findings above in the documentation linked in my question; it has an example using a t register well above the range noted in this answer:
Texture2D a[10000] : register(t0);
Texture2D b[10000] : register(t10000);
ConstantBuffer<myConstants> c[10000] : register(b0);
Note
I would like to point out that I am using the SharpDX version of the HLSL compiler and so I am unsure if these ranges vary from compiler to compiler; I heavily doubt that they do, but you can never be too sure until you try to exceed them. GLSL may be the same due to being similar to HLSL, but it could also be very different.

Maya Plugin attribute validation

I am trying to validate my custom MPxEmitterNode attributes.
I have force_min and force_max attributes that are double3 typed in maya parlance, basically two objects containing double[3] data.
I want to ensure that force_min is less than force_max for each of its 3 components. I'd like to do this by just swapping the min and max around whenever someone enters a value on the attribute in the Attribute Editor, or calls MEL's setAttr on those attributes, and the new value fails the "min < max" check.
I have tried setting up ATTRIBUTE_AFFECTS relationships between force_min, force_max and their individual x, y, z component objects. That seems to cause a cyclic issue, leading to Maya crashing. I have also tried editing the custom compute function of the derived MPxEmitterNode so that it swaps the force_min and force_max values, but the force_* attributes are seemingly never computed in that case.
Any help would be much appreciated.
Generally the 'Maya' way to do this would be to let the output look wrong if the min and max are set incorrectly. You don't know who is going to set those attributes (it could be a connection or a script, and the value could even get reset between frames of an animation), so it's better to let the DAG evaluation flow through even if the result is nonsense. It's like setting a radius of zero on a sphere node: it's 'correct' even though it's wrong.
You can, however, swap the values inside your compute() method to get the same effect without resetting the plug values themselves. Setting an input plug from inside compute() is a bad idea, because it introduces a loop into the flow of the DAG evaluation; DAG nodes must be acyclic (that's the "A" in DAG: Directed Acyclic Graph).
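A minimal sketch of that idea inside compute() (aForceMin, aForceMax and aOutForce are hypothetical attribute objects standing in for your node's own; this is not a drop-in implementation):
#include <maya/MDataBlock.h>
#include <maya/MDataHandle.h>
#include <maya/MPlug.h>
#include <algorithm>

MStatus MyEmitterNode::compute(const MPlug& plug, MDataBlock& data)
{
    if (plug != aOutForce)
        return MS::kUnknownParameter;

    // Read the inputs; never write back to the input plugs from here.
    MDataHandle minHandle = data.inputValue(aForceMin);
    MDataHandle maxHandle = data.inputValue(aForceMax);
    const double3& fmin = minHandle.asDouble3();
    const double3& fmax = maxHandle.asDouble3();

    // Work with swapped copies so the output is always computed from
    // (lo <= hi), while the stored attribute values stay untouched.
    double lo[3], hi[3];
    for (int i = 0; i < 3; ++i)
    {
        lo[i] = std::min(fmin[i], fmax[i]);
        hi[i] = std::max(fmin[i], fmax[i]);
    }

    MDataHandle outHandle = data.outputValue(aOutForce);
    // ... compute the emitter output from lo/hi and write it to outHandle ...
    data.setClean(plug);
    return MS::kSuccess;
}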