When to use device and when to use constant address space qualifier in Metal Shading Language?

I know that device address space is used when indexing a buffer and constant address space is used when many invocations of the function will access the same portion of the buffer. But I am still not very clear. Thank you!

Based on the Metal Shading Language Specification:
device Address Space
The device address space name refers to buffer memory objects
allocated from the device memory pool that are both readable and
writeable. A buffer memory object can be declared as a pointer or
reference to a scalar, vector, or user-defined structure. In an app,
Metal API calls allocate the memory for the buffer object, which
determines the actual size of the buffer memory. Some examples are:
// An array of a float vector with four components.
device float4 *color;
struct Foo {
float a[3];
int b[2];
};
// An array of Foo elements.
device Foo *my_info;
Since you always allocate texture objects from the device address
space, you do not need the device address space attribute for texture types.
constant Address Space
The constant address space name refers to buffer memory objects
allocated from the device memory pool that are read-only. Variables in
program scope must be declared in the constant address space and
initialized during the declaration statement. The initializing
expression(s) must be core constant expressions. Variables in program
scope have the same lifetime as the program, and their values persist
between calls to any of the compute or graphics functions in the
program.
constant float samples[] = { 1.0f, 2.0f, 3.0f, 4.0f };
Pointers or references to the constant address space are allowed as
arguments to functions. Writing to variables declared in the constant
address space is a compile-time error, as is declaring such a variable
without initialization.

To decide which address space (device or constant) a read-only buffer
passed to a graphics or kernel function should use, look at how the
buffer is accessed inside that function. The constant address space is
optimized for the case where multiple instances of a graphics or kernel
function access the same location in the buffer. Examples of this
access pattern include light or material properties used for lighting
and shading, a matrix from a matrix array used for skinning, or filter
weights read from a filter weight array for convolution. If multiple
executing instances of a graphics or kernel function access the buffer
using an index such as the vertex ID, fragment coordinate, or thread
position in the grid, the buffer must be allocated in the device
address space.
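In short: if every invocation reads the same values (uniforms, transformation matrices, filter weights), use constant; if each invocation indexes its own element, use device. A minimal sketch of both cases (the struct, function names, and buffer indices are made up for illustration):

#include <metal_stdlib>
using namespace metal;

struct Uniforms {
    float4x4 transform; // every thread reads this same value -> constant
};

kernel void transform_points(device const float4 *inPoints [[buffer(0)]],
                             device float4 *outPoints [[buffer(1)]],
                             constant Uniforms &uniforms [[buffer(2)]],
                             uint tid [[thread_position_in_grid]])
{
    // Indexed by the thread position in the grid: each invocation touches
    // a different location, so these buffers use the device address space.
    outPoints[tid] = uniforms.transform * inPoints[tid];
}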

Related

Storage of `ray_data` in ray tracing payload

I am currently working with Metal's ray tracing API. I remembered I could pass data from an intersection function to the compute kernel that started the ray intersection process. After rewatching the WWDC 2020 talk Discover ray tracing with Metal by Sean James, I found the relevant section around 16:13 where he talks about the ray payload.
However, I was curious where this payload is stored as it's passed to the intersection function. When declared with the relevant [[payload]] attribute in the intersection function, it must be in the ray_data address space. According to the Metal Shading Language Specification (version 2.3), pg. 64, the data passed into the intersection function is copied into the ray_data address space and is copied back out once the intersection function returns. But this doesn't specify whether, e.g., the data is stored in tile memory (like data in the threadgroup address space) or in per-thread memory (the thread address space). The video did not specify this either.
In fact, the declarations of the intersect function (see pg. 204) that include the payload parameter take it in the thread address space (which makes sense).
So where does the ray_data copy go while the intersection function runs, given that the kernel's original payload lives in the thread address space?
According to the answer I received on the Apple Developer Forums,
The way the GPU stores the payload varies between devices and there is no particular size. All we can really say is that the cost scales roughly with the size, so you should minimize the payload. If the payload gets too large you may run into a dramatic performance drop.
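For reference, the declaration the question describes looks roughly like this (a sketch only; the payload struct and function names are hypothetical):

#include <metal_stdlib>
using namespace metal;
using namespace metal::raytracing;

// Hypothetical payload type; keep it small, since (per the answer above)
// the cost scales roughly with the payload size.
struct RayPayload {
    float3 color;
};

[[intersection(triangle, triangle_data)]]
bool myIntersection(uint primitiveIndex [[primitive_id]],
                    float2 barycentric [[barycentric_coord]],
                    ray_data RayPayload &payload [[payload]])
{
    // Here payload is in the ray_data address space: it is copied in from
    // the caller's thread-space value and copied back out when this
    // function returns.
    payload.color = float3(barycentric, 0.0f);
    return true; // accept the candidate hit
}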

How is a variable assigned a memory address?

If I write an instruction x = 7, I understand x to be some address. What then assigns a memory address to x? Is this address a virtual address that is then translated into a physical memory address?
If I write an instruction x = 7, I understand x to be some address. What then assigns a memory address to x?
It depends on the type of var x.
if x is a global or static variable, several tools will cooperate to give it an address
the compiler will write in the object file that it needs to store a global var named x with 4 bytes.
the linker will collect all the global vars from the object files, put them in the data segment, and choose a position for each of them. For instance, x will be at #data_segment+0x1000. The linker then replaces all references to x in the code with #data_segment+0x1000
when the program runs, the loader will first ask the operating system for memory to store the different segments, including the data segment. One then knows the value of #data_segment and the actual address of x.¹
if x is a local variable, things are slightly simpler. All local vars live on the stack, and their addresses are computed relative to the stack (or frame) pointer by the compiler. So the address of x will be something like #stack_pointer+8, and it is generated by the compiler. But its actual value is only known at execution time and depends on the stack pointer.
if x is dynamically allocated (malloc-ed), its address is only known at run time. malloc() asks the OS for chunks of memory and dynamically positions vars in it. x will be put at a position that depends on the free space in the memory managed by malloc(). (The sketch after this list illustrates all three cases.)
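A minimal C++ sketch of those three cases (for illustration only; the printed addresses differ between machines and runs):

#include <cstdio>
#include <cstdlib>

int g = 7; // global: placed in the data segment by the linker/loader

int main() {
    int local = 7; // local: address = stack (or frame) pointer + offset
    int *dyn = (int *)std::malloc(sizeof *dyn); // heap: chosen by malloc at run time
    *dyn = 7;

    // With address space layout randomization, these values typically
    // change from one execution to the next.
    std::printf("global:  %p\n", (void *)&g);
    std::printf("local:   %p\n", (void *)&local);
    std::printf("dynamic: %p\n", (void *)dyn);

    std::free(dyn);
    return 0;
}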
Is this address a virtual address that is then translated into a physical memory address?
On systems with virtual memory, all addresses seen by a program are virtual addresses that the hardware translates into physical memory addresses.
¹ Virtual addresses of program segments (including the data segment) used to be constant across different executions of a program, but this is no longer true: for security reasons, they are now randomized.
There are generally four ways this is done.
1) The variable is mapped to a hardware register. In that case x has no address.
2) The variable has an absolute address. This is usually considered bad form because code using absolute addresses cannot be relocated, meaning it has to be placed in a fixed location in the address space. However, there are cases where a variable must be at a specific location, such as in some interfaces to devices.
In this case the address of x may be specified by the compiler or by the linker.
3) The variable is defined as an offset from a stack-related register. This is the method used to implement local variables in most programming languages. If you have 4-byte integers and, say, a C declaration like
int x, y;
in a function with no other variables, there would be instructions at the top of the function that look something like:
SUBL2 #8, SP ; Allocate 8 bytes from the stack
MOVL SP, BP ; Set the Base Pointer Register to the start of the allocation
where SP is the stack pointer and BP is some base pointer register.
In that case, x could then be the offset located at BP + 0 and, y could be at BP + 4.
Thus something like
x = y
would look like
MOVL Y(BP), X(BP)
or, with the offsets written out:
MOVL 4(BP), (BP)
The memory locations of x and y are determined entirely at run time. Only the offsets from the base pointer register are known ahead of time. In fact, there could be multiple x and y active at the same time, with different addresses, if their containing function is called recursively or through an interrupt.
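A small C++ sketch of that last point (hypothetical example):

#include <cstdio>

// Each recursive activation gets its own stack frame, so the "same"
// local variable x has a different address in each active call.
void recurse(int depth) {
    int x = depth;
    std::printf("depth %d: &x = %p\n", depth, (void *)&x);
    if (depth < 3)
        recurse(depth + 1);
}

int main() {
    recurse(1);
    return 0;
}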
4) The memory location is an offset from another register (usually the program counter).
Let's say you are using traditional uppercase FORTRAN, where all variables are static. It is common for the compiler to determine a location for a variable but refer to it using an offset from the program counter register (or some other register). The variable remains in a fixed place while the program runs, but that place can differ from one run to the next. Using such an offset allows the code to be position independent, meaning it can be loaded anywhere in memory. This allows the code to be used in shared libraries that multiple programs can share.
Usually the compiler sets some location for the variable and then that gets fixed by the linker.

Testpoint and its use in Matlab

What is a test point, and what is its intended use in MATLAB?
I am working on a model and have to use three AND gates coupled with two more similar AND gates. While checking the model I get the warning "Identify single logical operator blocks with more than 9 inputs signals.", which is not shown if I put a test point on the output of each of these AND gates.
Think of a signal in Simulink as corresponding to a memory location.
In an effort to reduce memory consumption, one of the standard optimizations used by Simulink is to re-use the same memory address when possible.
For instance, assume the input to a gain block is stored at memory location X. Then the output of the gain block would overwrite the data in X. Consequently, the input value would no longer be available. But it doesn't need to be, as its value is never used again. (This assumes that the input value is not used elsewhere, such as feeding a block like a Scope.)
In your case, the warning is telling you something about Simulink storing the logical values in memory locations that it subsequently overwrites when possible.
Note that Simulink will never re-use memory when it needs the signal value in subsequent calculations, i.e. when re-using it would affect the simulation result.
Nor will it re-use memory (for a specific signal) when you designate the signal as being a test point.
This is why the warning goes away in your case.
One particular use of a test point is if you are using a Floating Scope. Floating Scopes cannot be made to look at signals whose memory is being re-used, because then it wouldn't be clear which signal was being displayed.
By looking only at test points, you are guaranteed to be looking at the expected data/memory.

Dynamically sized bus objects in Simulink

I wrote a C S-function which has a variable number of states depending on one parameter that is passed to it (I'm doing computational fluid dynamics, and the parameter is the number of cells). I want to output a bus object from my S-function that contains a temperature profile. The problem is that I don't know the length of the output when I create the bus object in Simulink (in the Bus Editor). Is there a way to dynamically set the size of the bus object from the C S-function?
I think you can set the DimensionsMode property to "variable" instead of "fixed" (the default). See Simulink.BusElement and Variable-Size Signal Basics in the documentation for more details. Not sure how to code this in the S-function though.

How do scalar variables perform in memory?

In other languages besides Perl, when you declare an integer it has minimum and maximum values based on the amount of memory the variable takes up.
When you declare a scalar variable in Perl, whether it be a number or string, does the language only allocate enough for the variable value and then increase the space if necessary later or does Perl allocate a large amount of memory initially?
In Perl, a scalar variable is a pointer to a C struct called an SV. This includes various fields for metadata like the reference count, a bitfield that determines the exact type, and a pointer to additional (meta-)data.
If you use a scalar as an integer, it is called an IV and contains an integer. The size of this integer is fixed when perl itself is compiled. You can look at the perl -V output to see the sizes of various data types; I have ivsize=8. The representable values are the same as for the C integer of that size.
If you use a scalar as a decimal number, it is called an NV (numerical value) and usually contains a double. Again, the exact size is determined at compile time.
If you use a scalar as a string, it is called a PV and contains a pointer to a C string, plus some additional metadata like length. The C string is reallocated if it grows.
If you use a scalar as both a string and a number, it is a PVIV or PVNV respectively, and includes the data of both types.
There are additional types like references (RV) or unsigned integers (UV).
For the IV and NV, Perl does not automatically promote numbers to bignums when they grow too large; an IV that overflows is converted to a floating-point NV instead (a module like Math::BigInt is needed for arbitrary precision).
Then there are hashes (HV) and arrays (AV). These use the same SV header for things like reference counting but point to more complicated data structures.
Arrays contain a C array of pointers to SVs. If the array grows, it is reallocated.
Hashes are far more complex. Basically, they are an array as well, but one that contains hash entries instead of SVs. The slots in this array are called buckets. If the entries-to-buckets ratio grows too high, the array is reallocated (usually to double the size) and the entries are redistributed across the new buckets. This isn't strictly necessary, but if it isn't done then lookup degrades from O(1) to O(n) (i.e. slow).
Variable sized data structures like strings, arrays, hashes are initially allocated conservatively. If more space is required, then a larger piece of memory is allocated, and the data copied over.
Scalars have a constant-sized header. Memory for additional metadata is allocated when the type changes (e.g. through stringification).
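As a rough illustration of the layout described above, here is a simplified, hypothetical C++ sketch (the real definitions live in perl's sv.h and differ in detail):

#include <cstddef>
#include <cstdint>

// Every scalar starts with a fixed-size header like this.
struct SVHeader {
    void *sv_any;            // points to the type-specific body (IV, NV, PV, ...)
    std::uint32_t sv_refcnt; // reference count
    std::uint32_t sv_flags;  // bitfield encoding the current type
};

// Body used when the scalar holds a string (a "PV").
struct PVBody {
    char *pv;        // heap-allocated buffer, reallocated if the string grows
    std::size_t cur; // length of the string currently stored
    std::size_t len; // bytes allocated (>= cur)
};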
For more information (and confusing pointer diagrams), read the Illustrated Perl Guts.