WDF EvtIoDeviceControl buffer lengths - drivers

When handling DeviceIoControl requests in a WDF driver what is the correct way to get the size of the input/output buffer.
It seems to be both passed as a parameter:
VOID
EvtIoDeviceControl(IN WDFQUEUE Queue, IN WDFREQUEST Request,
IN size_t OutputBufferLength,
IN size_t InputBufferLength,
IN ULONG IoControlCode)
But also available through WdfRequestRetrieveInputBuffer which is required in order to gain access to the request buffer.
I am therefore wondering if there is a difference between the InputBufferLength parameter and the value set by WdfRequestRetrieveInputBuffer (the Length parameter).

There is no difference between those two. The guy who designed the interface thought it would be convenient to have the parameter in both places. I've never found that to be the case, myself, but it's there in case you do.
Jake Oshins

Related

Behaviour of passing struct as a parameter to a CUDA kernel

I'm relatively new to CUDA programming, so I want to clarify the behaviour of a struct when I pass it into a kernel. I've defined the following struct to somewhat imitate the behavior of a 3D array that knows its own size:
struct protoarray {
size_t dim1;
size_t dim2;
size_t dim3;
float* data;
};
I create two variables of type protoarray, dynamically allocate space to data via malloc and cudaMalloc on the host and device side, and update dim1, dim2 and dim3 to reflect the size of array I want this struct to represent. I read in this thread that the struct should be passed via copy. So this is what I do in my kernel
__global__ void kernel(curandState_t *state, protoarray arr_device){
const size_t dim1 = arr_device.dim1;
const size_t dim2 = arr_device.dim2;
for(size_t j(0); j < dim2; j++){
for(size_t i(0); i < dim1; i++){
// Do something
}
}
}
The struct is passed by copy, so all its contents are copied into shared memory of each block. This is where I'm getting bizarre behaviour, which I'm hoping you could help me with. Suppose I had set arr_device.dim1 = 2 on the host side. While debugging inside the kernel and setting a breakpoint at one of the for loops, checking the value of arr_device.dim1 yields something like 16776576, nowhere large enough to cause overflow, but this value copies correctly into dim1 as 2, which means that the for loops execute as I intended them to. As a side question, is using size_t which is essential unsigned long long int bad practice, seeing as the GPU's are made of 32bit cores?
Generally, how safe is it to pass struct and class into kernels as arguments, is bad practice that should be avoided at all cost? I imagine that passing pointers to classes to kernels is difficult in case they contain members which point to dynamically allocated memory, and that they should be very lightweight if I want to pass them by value.
This is a partial answer, since without a proper program to look into, it is difficult/impossible to guess why you would see an invalid value in your arr_device.dim1.
The struct is passed by copy, so all its contents are copied into shared memory of each block.
Incorrect. Kernel arguments are stored in constant memory, which is device-global and not block-specific. They are not stored shared memory (which is block-specific).
When a thread runs, it typically reads arguments from constant memory into registers (and again, not shared memory).
Generally, how safe is it to pass struct and class into kernels as arguments
My personal rule of thumb on this matter is: If the struct/class...
is trivially-copyable; and
all its members of the struct/class are defined both for the host and the device side, or at least - designed with GPU use in mind;
then it should be safe to pass to a kernel.
passing struct and class into kernels as arguments [ - ] is [it] bad practice that should be avoided at all cost?
No. But remember that most C++ libraries only provide host-side code; and were not written with a mind of being used on a GPU. So I'd be wary of using non-trivial classes without a lot of scrutiny.
I imagine that passing pointers to classes to kernels is difficult in case they contain members which point to dynamically allocated memory
Yes, this can be problematic. However - if you used cuda::memory::managed::allocate(), cuda::memory::managed::make_unique() or cudaMallocManaged() - then this should "just work", i.e. the relevant memory pages will be fetched to the GPU or the CPU as necessary when accessed. See:
Unified Memory in CUDA for beginners
Beyond GPU Memory Limits with Unified Memory on Pascal
and that they should be very lightweight if I want to pass [objects to kernels] by value.
Yes, because each and every thread has to read each argument from constant memory before it can use that argument. And while constant memory allows this to happen relatively quickly, it's still a bunch of overhead that you want to minimize.
Also remember that you can't pass anything to kernels by (C++) reference; it's all "by-value" - the object itself or a pointer to it.

Do I call g_free on the return from gtk_text_buffer_get_text

The gtk documentation makes it clear in most areas on whether you have to call g_free or g_object_unref on objects returned in response to a gtk function/sub call.
However, in a few locations, it is not explicitly specified or some cryptic message is specified, making the assumption that the reader is informed enough to figure it out.
In the case of gtk_text_buffer_get_text, it specifies that the returned is "an allocated UTF-8 string."
Does that mean it is an internal needed by gtk allocated string. Or is it an allocated just for my return string that has to have g_free called on it?
In general, what is the proper assumption on my part if it is not specified what to do with a string/object returned?
The rule of thumb for returned strings is:
const char* values are owned by GTK, and should not be freed
char* values are allocated, and the ownership is transferred to you
In ambiguous situations, especially for random pointers, the API is annotated with a transfer none tag, which means the ownership is not transferred alongside the pointer, and that the callee still owns the data; or a transfer full tag, which means the owner of the pointer is the caller.

"nfds" parameter in select() in sockets programming

I never really got the idea behind this parameter, what is it good for? I also noticed this parameter is ignored in WinSock2, why is that?
Do Unix systems use this parameter or do they ignore it as well?
Windows' implementation of select() uses linked lists internally, so it doesn't need to use the nfds parameter for anything.
On other OS's, however, the fd_set struct is implemented to hold an array of bits (one bit per socket). For example, here is how it is declared (in sys/_types/_fd_def.h) under MacOS/X:
typedef struct fd_set {
__int32_t fds_bits[__DARWIN_howmany(__DARWIN_FD_SETSIZE, __DARWIN_NFDBITS)];
} fd_set;
... and in order to do the right thing, the select() call will have to loop over the bits in the array to see what they contain. By supplying select() with the nfds parameter, we tell the select() implementation that it only needs to iterate over the first (nfds) bits of the array, rather than always having to iterate over the entire array on every call. This allows select() to be more efficient than it would otherwise be.

NSInvocation needing NSMethodSignature

I have been wondering for a couple of days if NSInvocation should need the NSMethodSignature.
Lets say we want to write our own NSInvocation, my requirements would be as so:
I need a selector SEL
The target object to call the selector on
The argument array
Then i would get the IMP out from the target and the SEL, and pass the argument as parameters.
So, my question is, why do we need an NSMethodSignature to construct and use an NSInvocation?
Note: I do know that by having only a SEL and a target, we dont have the arguments and return type for this method, but why would we care about the types of the args and returns?
Each type in C has a different size. (Even the same type can have different sizes on different systems, but we'll ignore that for now.) int can have 32 or 64 bits, depending on the system. double takes 64 bits. char represents 8 bits (but might be passed as a regular int depending on the system's passing convention). And finally and most importantly, struct types have various sizes, depending on how many elements are in it and each of their sizes; there is no bound to how big it can be. So it is impossible to pass arguments the same way regardless of type. Therefore, how the calling function arranges the arguments, and how the called function interprets its arguments, must be dependent on the function's signature. (You cannot have a type-agnostic "argument array"; what would be the size of the array elements?) When normal function calls are compiled, the compiler knows the signature at compile time, and can arrange it correctly according to calling convention. But NSInvocation is for managing an invocation at runtime. Therefore, it needs a representation of the method signature to work.
There are several things that the NSInvocation can do. Each of those things requires knowledge of the number and types (at least the sizes of the types) of the parameters:
When a message is sent to an object that does not have a method for it, the runtime constructs an NSInvocation object and passes it to -forwardInvocation:. The NSInvocation object contains a copy of all the arguments that are passed, since it can be stored and invoked later. Therefore, the runtime needs to know, at the very least, how big the parameters in total are, in order to copy the right amount of data from registers and/or stack (depending on how arguments are arranged in the calling convention) into the NSInvocation object.
When you have an NSInvocation object, you can query for the value of the i'th argument, using -getArgument:atIndex:. You can also set/change the value for the i'th argument, using -setArgument:atIndex:. This requires it to know 1) Where in its data buffer the i'th parameter starts; this requires knowing how big the previous parameters are, and 2) how big the i'th parameter is, so that it can copy the right amount of data (if it copies too little, it will have a corrupt value; if it copies too much, say, when you do getArgument, it can overwrite the buffer you gave it; or when you do setArgument, overwrite other arguments).
You can have it do -retainArguments, which causes it to retain all arguments of object pointer type. This requires it to distinguish between object pointer types and other types, so the type information must include not only the size.
You can invoke the NSInvocation, which causes it to construct and execute the call to the method. This requires it to know, at the very least, how much data to copy from its buffer into the registers/stack to place all the data where the function will expect it. This requires knowing at least the combined size of all parameters, and probably also needs to know the sizes of individual parameters, so that the divide between parameters on registers and parameters on stack can be figured out correctly.
You can get the return value of the call using -getReturnValue:; this has similar issues to the getting of arguments above.
A thing not mentioned above is that the return type may also have a great effect on the calling mechanism. On x86 and ARM, the common architectures for Objective-C, when the return type is a struct type, the calling convention is very different -- effectively an additional (first) parameter is added before all the regular parameters, which is a pointer to the space that the struct result should be written. This is instead of the regular calling convention where the result is returned in a register. (In PowerPC I believe that double return type is also treated specially.) So knowing the return type is essentially for constructing and invoking the NSInvocation.
NSMethodSignature is required for the message sending and forwarding mechanism to work properly for invocations. NSMethodSignature and NSInvocation were built as a wrapper around __builtin_call(), which is both architecture dependent, and extremely conservative about the stack space a given function requires. Hence, when invocations are invoked, __builtin_call() gets all the information it needs from the method signature, and can fail gracefully by throwing the call out to the forwarding mechanism knowing that it, too, receives the proper information about how the stack should look for re-invocation.
That being said, you cannot make a primitive NSInvocation without a method signature without modifying the C language to support converting arrays to VARARGS seeing as objc_msgSend() and its cousins won't allow it. Even if you could get around that, you'd need to calculate the size of the arguments and the return type (not too hard, but if you're wrong, you're wrong big time), and manage a proper call to __builtin_call(), which would require an intimate knowledge of the message sending architecture or an ffi (which probably drops down to __builtin_call() anyhow).

An IOCP documentation interpretation question - buffer ownership ambiguity

Since I'm not a native English speaker I might be missing something so maybe someone here knows better than me.
Taken from WSASend's doumentation at MSDN:
lpBuffers [in]
A pointer to an array of WSABUF
structures. Each WSABUF structure
contains a pointer to a buffer and the
length, in bytes, of the buffer. For a
Winsock application, once the WSASend
function is called, the system owns
these buffers and the application may
not access them. This array must
remain valid for the duration of the
send operation.
Ok, can you see the bold text? That's the unclear spot!
I can think of two translations for this line (might be something else, you name it):
Translation 1 - "buffers" refers to the OVERLAPPED structure that I pass this function when calling it. I may reuse the object again only when getting a completion notification about it.
Translation 2 - "buffers" refer to the actual buffers, those with the data I'm sending. If the WSABUF object points to one buffer, then I cannot touch this buffer until the operation is complete.
Can anyone tell what's the right interpretation to that line?
And..... If the answer is the second one - how would you resolve it?
Because to me it implies that for each and every data/buffer I'm sending I must retain a copy of it at the sender side - thus having MANY "pending" buffers (in different sizes) on an high traffic application, which really going to hurt "scalability".
Statement 1:
In addition to the above paragraph (the "And...."), I thought that IOCP copies the data to-be-sent to it's own buffer and sends from there, unless you set SO_SNDBUF to zero.
Statement 2:
I use stack-allocated buffers (you know, something like char cBuff[1024]; at the function body - if the translation to the main question is the second option (i.e buffers must stay as they are until the send is complete), then... that really screws things up big-time! Can you think of a way to resolve it? (I know, I asked it in other words above).
The answer is that the overlapped structure and the data buffer itself cannot be reused or released until the completion for the operation occurs.
This is because the operation is completed asynchronously so even if the data is eventually copied into operating system owned buffers in the TCP/IP stack that may not occur until some time in the future and you're notified of when by the write completion occurring. Note that with write completions these may be delayed for a surprising amount of time if you're sending without explicit flow control and relying on the the TCP stack to do flow control for you (see here: some OVERLAPS using WSASend not returning in a timely manner using GetQueuedCompletionStatus?) ...
You can't use stack allocated buffers unless you place an event in the overlapped structure and block on it until the async operation completes; there's not a lot of point in doing that as you add complexity over a normal blocking call and you don't gain a great deal by issuing the call async and then waiting on it.
In my IOCP server framework (which you can get for free from here) I use dynamically allocated buffers which include the OVERLAPPED structure and which are reference counted. This means that the cleanup (in my case they're returned to a pool for reuse) happens when the completion occurs and the reference is released. It also means that you can choose to continue to use the buffer after the operation and the cleanup is still simple.
See also here: I/O Completion Port, How to free Per Socket Context and Per I/O Context?