Storage of `ray_data` in ray tracing payload - swift

I am currently working with Metal's ray tracing API. I remembered I could pass data from an intersection function to the compute kernel that started the ray intersection process. After rewatching the WWDC 2020 talk Discover ray tracing with Metal by Sean James (linked here), I found the relevant section around 16:13 where he talks about the ray payload.
However, I was curious where this payload is stored as it's passed to the intersection function. When declared with the relevant [[ payload ]] attribute in the intersection function, it must be in the ray_data address space. According to the Metal Shading Language Specification (version 2.3), pg. 64, the data passed into the intersection function is copied into the ray_data address space and is copied back out once the intersection function returns. However, this doesn't specify whether, e.g., the data is stored in tile memory (like data in the threadgroup address space is) or in per-thread memory (the thread address space). The video did not specify this either.
In fact, the declarations of the intersect function (see pg. 204) that take a payload expect it in the thread address space (which makes sense).
So where does the copied ray_data "version" of the data, which lives in the thread address space in the kernel, actually go?
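For reference, a minimal sketch (in Metal Shading Language, which is C++-based, rather than Swift) of the pattern being described; the payload struct, its fields, and the function names are illustrative assumptions, not taken from the spec or the talk:

```cpp
// Hypothetical payload type for illustration only.
struct RayPayload {
    float3 color;
    float  hitDistance;
};

// In the compute kernel, the payload is an ordinary thread-address-space
// variable handed to intersector<>::intersect(), e.g.:
//
//   RayPayload payload;
//   intersector<triangle_data> isect;
//   auto result = isect.intersect(r, accelerationStructure, payload);

// In the intersection function, the same data arrives as a ray_data copy
// and is copied back out when the function returns.
[[intersection(triangle, triangle_data)]]
bool sketchIntersection(float dist                    [[distance]],
                        ray_data RayPayload& payload  [[payload]])
{
    payload.hitDistance = dist;  // writes go to the ray_data copy
    return true;                 // accept the hit; the copy is written back out
}
```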

According to the answer I received on the Apple Developer Forums,
The way the GPU stores the payload varies between devices, and there is no single answer we can give. All we can really say is that the cost scales roughly with the payload size, so you should minimize it. If the payload gets too large, you may run into a dramatic performance drop.

Related

What is the purpose of the CIR if I have the MDR in Von Neumann Architecture?

From the fetch-decode-execute cycle of the Von Neumann architecture, at a basic level, here is what I understand:
The memory address in the PC is copied to the MAR.
PC += 1
The instruction/data at the address held in the MAR is fetched from main memory and stored in the MDR.
The instruction in the MDR is copied to the CIR.
The instruction in the CIR is decoded and executed by the CU.
The result of the calculation is stored in the ACC.
Repeat
Now, if the MDR value is copied to the CIR, why are both registers necessary? I am quite new to systems architecture, so I may have gotten the wrong end of the stick, but I've tried my best :)
Think about what happens if the current instruction is a load or store: does anything need to happen after the MDR has done its part? If so, how is the CPU going to remember what it's in the middle of doing if it doesn't keep track of that somehow?
Whether that requires the original instruction bits the whole time or not depends on the design.
A CPU that needs to do a lot of decoding (e.g. a CISC with a compact variable-length instruction set, like the original 8086) may not keep the actual instruction itself around, but instead just some decode results. For example, the actual 8086 decoded incrementally, scanning through prefixes one byte at a time until reaching the opcode. And modern x86 decodes to uops which it sends down the pipeline; the original machine-code bytes aren't needed after that.
But CPUs like MIPS were specifically designed so parts of the instruction word could be used directly as internal control signals. Still, it's not always necessary to keep the whole instruction around in one piece.
It might make more sense to look at the CIR as the input latches of the decoding process that produces the necessary internal control signals, or the sequence of microcode steps, depending on the design. Having a truly physical CIR separate from that is fine if you don't mind redoing the decoding at any step where you need to consult it to figure out what to do next.
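To make the load/store point concrete, here is a toy simulator sketch in C++; the instruction encoding, register widths, and opcodes are invented purely for illustration:

```cpp
#include <array>
#include <cstdint>

// Toy machine: an instruction word is [8-bit opcode | 8-bit address].
enum : uint8_t { OP_LOAD = 1, OP_ADD = 2, OP_STORE = 3 };

struct ToyCPU {
    std::array<uint16_t, 256> memory{};
    uint8_t  PC  = 0;   // program counter
    uint8_t  MAR = 0;   // memory address register
    uint16_t MDR = 0;   // memory data register
    uint16_t CIR = 0;   // current instruction register
    uint16_t ACC = 0;   // accumulator

    void step() {
        // Fetch: PC -> MAR, memory[MAR] -> MDR, MDR -> CIR.
        MAR = PC;
        MDR = memory[MAR];
        CIR = MDR;
        ++PC;

        // Decode the instruction held in the CIR.
        uint8_t opcode  = CIR >> 8;
        uint8_t operand = CIR & 0xFF;

        // Execute. Note that a LOAD reuses MAR and MDR for the *data* access,
        // so the CPU still needs the CIR (or its decode results) to know what
        // to do with the value once it arrives.
        switch (opcode) {
            case OP_LOAD:  MAR = operand; MDR = memory[MAR]; ACC  = MDR; break;
            case OP_ADD:   MAR = operand; MDR = memory[MAR]; ACC += MDR; break;
            case OP_STORE: MAR = operand; MDR = ACC; memory[MAR] = MDR;  break;
        }
    }
};
```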

What is the difference between UPrimitiveComponent's GetMass and CalculateMass?

In a Udemy course I just went through a lecture where we needed to calculate the mass of some objects.
I ended up using CalculateMass, but the instructor used GetMass.
The Unreal documentation for CalculateMass shows that it accepts a parameter FName BoneName, but I did not use this parameter and it still worked. The documentation also says CalculateMass can be roughly 0.1 kilograms off the actual mass, but that looks like an insignificant amount.
What is the important difference between these 2 functions? When should one be used over the other?
float CalculateMass(FName BoneName)
Returns the mass for the specified bone/body. In a physics asset you can set up multiple physics bodies for a single mesh and query their masses separately; hence the bone name parameter. It returns the overridden mass instead if one was specified. This is a mass estimated by the engine and may be a little different from GetMass. It also seems to be faster.
On the other hand...
float GetMass() const
is pretty much equivalent to calling CalculateMass(NAME_None),
except that it returns the real mass of the body instance as calculated by the physics engine.
To get the exact value it has to lock the physics engine while reading it,
so it is a little slower to execute.
I would use CalculateMass unless the mass must be exact.
Also consider caching the calculated mass in a variable if it isn't changing frequently.
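As a rough sketch of that advice (the helper functions, the impulse, and the cached variable are placeholders, not from the course or the engine API):

```cpp
#include "Components/PrimitiveComponent.h"

// Placeholder helpers; Comp is any physics-enabled UPrimitiveComponent.

float RefreshCachedMass(UPrimitiveComponent* Comp)
{
    // Cheap engine estimate; NAME_None means the whole body, which is
    // effectively what the parameterless GetMass() queries as well.
    return Comp->CalculateMass(NAME_None);
}

void ApplyUpwardImpulse(UPrimitiveComponent* Comp, float CachedMass)
{
    // Use the cached value in hot paths instead of calling GetMass()
    // every frame, which reads the exact value from the physics engine.
    Comp->AddImpulse(FVector(0.f, 0.f, CachedMass * 980.f));
}
```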
You can also dig into the engine code to better understand where these values come from ;)

How to retrieve signal quality measures on the iPhone?

In particular I would like to retrieve:
1. RSSI (received signal strength indicator),
2. RSCP (received signal code power),
3. SC (scrambling code), and
4. Ec/No (signal-to-noise ratio).
Which API function from the iPhone SDK can help me retrieve these values?
Further to your comment above, there is also a GetSignalStrength function referenced among the private functions here.
But if you use one of these GetSignalStrength functions how do you know what you are really getting?
I can't find any documentation, but I would question the assumption that it will always be RSSI.
There is no standard for calculating the number of bars shown on screen. However, there is a standard measure of network strength, used when the mobile phone decides whether or not to move over to another cell.
For GSM, this standard is RSSI.
For UMTS, it is CPICH RSCP.
For LTE, it is RSRP.
Therefore, if you have 1 single function, that purports to return RSSI in all cases, I ask myself whether it will actually return RSCP when on a UMTS network, and RSRP when on an LTE network. In other words, is it a fudge that over-simplifies the true case?
The 3GPP AT command AT+CESQ (defined here) retrieves network strength. It has parameters that allow for any of the three network types, and you would expect that if you are currently registered on a UMTS cell (for example), that it would return UMTS parameters only. But I can't see any evidence of an equivalent way to get all the data across iPhone APIs.
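For reference, the response format defined for AT+CESQ in 3GPP TS 27.007 looks like this (a single combined set of fields, with "not known or not detectable" values reported for the technologies you are not camped on):

```
AT+CESQ
+CESQ: <rxlev>,<ber>,<rscp>,<ecno>,<rsrq>,<rsrp>
```

Here rxlev/ber cover GSM, rscp/ecno cover UMTS, and rsrq/rsrp cover LTE, which is exactly why a single RSSI-only function looks like an over-simplification.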
The next obvious question to ask would be "Can I use that AT command on the iPhone?" Someone has asked that on StackOverflow here. I don't know if AT+CESQ is supported on the iPhone.

Is using Vertex Buffer Objects for very dynamic data a good idea performance-wise?

I have many particles whose vertices change every frame. The vertices are currently being drawn using a vertex array in 'client' memory. What performance characteristics can I expect if I use a vertex buffer object?
Since I have to use a number of glBufferSubData calls to update the particle vertices, I am transferring the vertices to video memory every frame anyway, right (like I would with a regular vertex array)? Is there any benefit to VBOs in this case?
This is for iOS devices. The actual draw call: glDrawElements(GL_POINTS, num_particles, GL_UNSIGNED_SHORT, pindices);
Should I use GL_STREAM_DRAW or GL_DYNAMIC_DRAW?
Apple's documentation appears to recommend VBOs in all situations. If you're using ES 2.x then the GL_STREAM_DRAW vertex buffer type is explicitly for "when your application needs to create transient geometry that is rendered a small number of times and then discarded. This is most useful when your application must dynamically change vertex data every frame in a way that cannot be performed in a vertex shader." Use of glBufferSubData is then directly advocated.
Logically, I guess the only difference between supplying the data completely afresh and sending it to an existing GL_STREAM_DRAW or GL_DYNAMIC_DRAW buffer is that your space in the memory map (GPU or CPU, depending on the chip — MBXs don't really do VBOs but Apple supports them for other performance reasons) can be allocated once rather than allocated and released every frame.
Using the alignment and packing tips given in that document is likely to give a better improvement than a switch to VBOs, since otherwise the CPU just has to unpack and repack data upon glDrawElements. Though quite probably you're already aware of that and I appreciate that it isn't directly part of the question — I mainly throw it in as a comparative guess about performance benefits.
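A minimal sketch of that per-frame update pattern on ES 2.x (function and variable names are illustrative; error checking omitted):

```cpp
#include <OpenGLES/ES2/gl.h>

GLuint vbo = 0;

// One-time setup: allocate a buffer big enough for one frame's vertices.
void setupParticleVBO(GLsizei maxParticles)
{
    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    // GL_STREAM_DRAW: data is respecified every frame and used a few times.
    glBufferData(GL_ARRAY_BUFFER, maxParticles * 3 * sizeof(GLfloat),
                 NULL, GL_STREAM_DRAW);
}

// Per-frame: overwrite the buffer contents, then draw.
void drawParticles(const GLfloat* positions, GLsizei numParticles,
                   const GLushort* pindices, GLint positionAttrib)
{
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferSubData(GL_ARRAY_BUFFER, 0,
                    numParticles * 3 * sizeof(GLfloat), positions);

    glEnableVertexAttribArray(positionAttrib);
    // With a VBO bound, the attribute pointer is an offset, not a client pointer.
    glVertexAttribPointer(positionAttrib, 3, GL_FLOAT, GL_FALSE, 0, (void*)0);

    // Indices can stay in client memory, matching the original draw call.
    glDrawElements(GL_POINTS, numParticles, GL_UNSIGNED_SHORT, pindices);
}
```

Whether GL_STREAM_DRAW or GL_DYNAMIC_DRAW wins is driver-dependent; the Apple document quoted above points at GL_STREAM_DRAW for data respecified every frame, but measuring both is the only reliable answer.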
By setting up VBOs properly, you are using the optimal way of transferring data to the GPU, and you might skip some driver processing. The only way to see how much improvement you get is to measure; it differs from card to card.
For a VBO how-to, see this: VBO tutorial
EDIT
Forgot to answer the question: yes, it is a good idea. But measure first.

Dijkstra algorithm for iPhone

It has been possible to easily use the GPS functionality in the iPhone since SDK 3.0, but it is explicitly forbidden to use Google's Maps.
This has two implications, I think:
You will have to provide maps yourself
You will have to calculate the shortest routes yourself.
I know that calculating the shortest route has puzzled mathematicians for ages, but both Tom Tom and Google are doing a great job, so that issue seems to have been solved.
Searching on the 'net, not being a mathematician myself, I came across the Dijkstra Algorithm. Is there anyone of you who has successfully used this algorithm in a Maps-like app in the iPhone?
Would you be willing to share it with me/the community?
Would this be the right approach, or are there other options?
Thank you so much for your consideration.
I do not believe Dijkstra's algorithm would be useful for real-world mapping because, as Tom Leys said (I would comment on his post, but lack the rep to do so), it requires a single starting point. If the starting point changes, everything must be recalculated, and I would imagine this would be quite slow on a device like the iPhone for a significantly large data set.
Dijkstra's algorithm is for finding the shortest path to all nodes (from a single starting node). Game programmers use a directed search such as A*. Where Dijkstra processes the node that is closest to the starting position first, A* processes the one that is estimated to be nearest to the end position.
The way this works is that you provide a cheap "estimate" function from any given position to the end point. A good example is how far a bird would fly to get there. A* adds this to the current distance from the start for each node and then chooses the node that seems to be on the shortest path.
The better your estimate, the shorter the time it will take to find a good path. If this time is still too long, you can do a path find on a simple map and then another on a more complex map to find the route between the places you found on the simple map.
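For instance, the "bird flight" estimate can be as simple as a straight-line distance in projected map coordinates (the Node type and its units are assumptions of this sketch):

```cpp
#include <cmath>

struct Node { double x, y; };   // projected map coordinates, e.g. metres

// A* heuristic: straight-line distance from n to the goal. Because it never
// overestimates the real road distance, A* still finds the shortest path.
double heuristic(const Node& n, const Node& goal)
{
    const double dx = n.x - goal.x;
    const double dy = n.y - goal.y;
    return std::sqrt(dx * dx + dy * dy);
}
```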
Update
After much searching, I have found an article on A* for you to read.
Dijkstra's algorithm is O(m log n) for n nodes and m edges (for a single path) and is efficient enough to be used for network routing. This means that it's efficient enough to be used for a one-off computation.
Briefly, Dijkstra's algorithm works like:
Take the start node
Assign it a depth of zero
Insert it into a priority queue at its depth key
Repeat:
Pop the node with the lowest depth from the priority queue
Record the node that you came from so you can track the path back
Mark the node as having been visited
If this node is the destination:
Break
For each neighbour:
If the neighbour has not previously been visited:
Calculate depth as depth of current node + distance to neighbour
Insert neighbour into the priority queue at the calculated depth.
Return the destination node and list of the nodes through which it was reached.
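A compact C++ sketch of those steps (the adjacency-list representation and the lazy-deletion priority queue are assumptions of this sketch, not part of the description above):

```cpp
#include <algorithm>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// Adjacency list: graph[u] holds (neighbour, edge length) pairs.
using Graph = std::vector<std::vector<std::pair<int, double>>>;

// Returns the path from start to destination (empty if unreachable).
std::vector<int> dijkstra(const Graph& graph, int start, int destination)
{
    const double INF = std::numeric_limits<double>::infinity();
    std::vector<double> depth(graph.size(), INF);
    std::vector<int> cameFrom(graph.size(), -1);
    std::vector<bool> visited(graph.size(), false);

    using Entry = std::pair<double, int>;   // (depth, node)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> pq;

    depth[start] = 0.0;
    pq.push({0.0, start});

    while (!pq.empty()) {
        auto [d, node] = pq.top();
        pq.pop();
        if (visited[node]) continue;        // stale queue entry
        visited[node] = true;
        if (node == destination) break;

        for (auto [next, length] : graph[node]) {
            if (!visited[next] && d + length < depth[next]) {
                depth[next]    = d + length;
                cameFrom[next] = node;      // remember how we reached it
                pq.push({depth[next], next});
            }
        }
    }

    // Walk the cameFrom chain back from the destination.
    std::vector<int> path;
    if (depth[destination] == INF) return path;
    for (int n = destination; n != -1; n = cameFrom[n]) path.push_back(n);
    std::reverse(path.begin(), path.end());
    return path;
}
```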
Contrary to popular belief, Dijkstra's algorithm is not necessarily an all-pairs shortest path calculator, although it can be adapted to do this.
You would have to get a graph of the streets and intersections with the distances between the intersections. If you had this data you could use Dijkstra's algorithm to compute a shortest route.
If you look at the technology TomTom calls 'IQ Routes', they measure actual speed and travel time per road stretch per time of day. This makes the expected arrival time more accurate and fact-based: http://www.tomtom.com/page/iq-routes
Calculating a route using the A* algorithm is plenty fast enough on an iPhone with offline map data. I have experience of doing this commercially. I use the A* algorithm as documented on Wikipedia, and I keep the road network in memory and re-use it; once it's loaded, routing even over a large area like Spain or the western half of Canada is practically instant.
I take data from OpenStreetMap or elsewhere and convert it into a directed graph, assuming (which is the right way to do it, according to those who know) that any two roads sharing a point with the same ID are joined. I assign weights to different types of roads based on expected speeds, and if a portion of a road is one-way I create only a single arc; two-way roads get two arcs, one in each direction. That's pretty much the whole thing, apart from some ad-hoc code to prevent dangerous turns and to implement routing restrictions.
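In code, that conversion boils down to something like this (the RoadSegment fields and the graph container are placeholders for however you load the OSM data):

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Arc { int64_t toNodeId; double cost; };

// Directed graph keyed by node ID.
using RoadGraph = std::unordered_map<int64_t, std::vector<Arc>>;

struct RoadSegment {
    int64_t fromNodeId, toNodeId;   // shared node IDs are what join roads
    double  lengthMetres;
    double  expectedSpeedMps;       // chosen per road class
    bool    oneWay;
};

void addSegment(RoadGraph& graph, const RoadSegment& s)
{
    const double cost = s.lengthMetres / s.expectedSpeedMps;   // travel time
    graph[s.fromNodeId].push_back({s.toNodeId, cost});         // forward arc
    if (!s.oneWay)
        graph[s.toNodeId].push_back({s.fromNodeId, cost});     // reverse arc for two-way roads only
}
```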
This was discussed earlier here: What algorithms compute directions from point a to point b on a map?
Have a look at CloudMade. They offer a free service for iPhone and iPad that allows navigation based on your current location. It is built on OpenStreetMap and has some nifty features like making your own map style. It is a little slow from time to time, but it's totally free.