Multiple I/O Completion Ports

Can I create multiple I/O Completion Ports in a single application? I mean, can I hold two or more handles returned by CreateIoCompletionPort, each with its own CompletionKeys? My application has two IOCP classes, each with its own client structures indexed from 0. I use these indexes as the CompletionKey, so I suspect that at some point this causes a conflict, because my application deadlocks for no apparent reason. I have triple-checked for possible deadlock situations, and running in debug mode did not help!

Yes. You can create as many IOCPs as you like*.
I expect you have a bug in your code or a standard 'deadlock caused by lock inversions'.
Can you break into the app in the debugger when it has deadlocked and see what the threads are doing?
(* subject to the usual resource limitations, memory, etc).

Related

How can the Resource-Allocation-Graph Algorithm prevent deadlocks?

According to the book Operating System Concepts, the Resource-Allocation-Graph Algorithm can prevent deadlocks as follows:
If we have the following allocation graph
https://www.cs.uic.edu/~jbell/CourseNotes/OperatingSystems/images/Chapter7/7_07_DeadlockAvoidance.jpg
and P1 tries to allocate resource R2, the system prevents this and makes P1 wait, because granting the request would lead to an unsafe state.
My question: as the graph shows, P2 is waiting for P1 to release R1, and P1 is now waiting to allocate R2, which leads to a deadlock. How can this algorithm prevent this type of deadlock?

I don't have a copy of your book, but I suspect a typo. The idea is to return an error (EDEADLOCK) to the resource allocation request that would complete the cycle; thus detecting pending deadlock rather than actively avoiding it. It is still up to the process with the failed request to take some corrective action, like dropping all its resources and trying to re-acquire them.
If you replace resources with semaphore or mutex, it should be clear that waiting isn't going to help anything.
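The "return an error to the request that would complete the cycle" idea can be sketched with a small wait-for table. This is only an illustration, not the book's algorithm: the names ResourceTable and DeadlockError are hypothetical, every resource is assumed to have a single instance, and each process is assumed to block on at most one resource at a time.

```python
class DeadlockError(Exception):
    """Hypothetical error, playing the role of an EDEADLK-style return code."""


class ResourceTable:
    def __init__(self):
        self.owner = {}        # resource -> process currently holding it
        self.waiting_for = {}  # process -> resource it is blocked on

    def request(self, proc, res):
        """Return True if granted, False if proc must wait.

        Raise DeadlockError if waiting would close a cycle.
        """
        holder = self.owner.get(res)
        if holder is None or holder == proc:
            self.owner[res] = proc
            self.waiting_for.pop(proc, None)
            return True
        # Walk the chain: holder -> resource it waits on -> that holder ...
        current = holder
        while current is not None:
            if current == proc:
                raise DeadlockError(f"{proc} waiting on {res} would deadlock")
            blocked_on = self.waiting_for.get(current)
            current = self.owner.get(blocked_on) if blocked_on else None
        self.waiting_for[proc] = res
        return False

    def release(self, proc, res):
        """Give up a resource; waiters are expected to retry request()."""
        if self.owner.get(res) == proc:
            del self.owner[res]
```

With P1 holding R1 and P2 holding R2, P2 requesting R1 simply waits, while P1 then requesting R2 gets the error, and it is up to P1 to release what it holds and retry.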
To actively avoid deadlock, you pretty much need to either use semaphore sets -- that is acquire all the locks that a particular code path will need in one place (see system V semaphores) -- or arrange your code to use a particular ordering of locks. An example of the latter is to allocate locks by increasing address, thus all actors will attempt the allocation in the same order. Neither is practical for finely grained general purpose code, but possible for transaction processing applications.
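As a rough sketch of the lock-ordering approach, the helper below always acquires a set of locks in one global order. Python has no stable notion of an address, so id() is used here as a stand-in for "increasing address"; acquire_in_order, Account and transfer are hypothetical names made up for this example.

```python
import threading
from contextlib import contextmanager


@contextmanager
def acquire_in_order(*locks):
    """Acquire all locks in a fixed global order (by id), release in reverse."""
    unique = {id(lock): lock for lock in locks}  # drop duplicates
    ordered = [unique[key] for key in sorted(unique)]
    acquired = []
    try:
        for lock in ordered:
            lock.acquire()
            acquired.append(lock)
        yield
    finally:
        for lock in reversed(acquired):
            lock.release()


class Account:
    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src, dst, amount):
    # Both directions lock the two accounts in the same order,
    # so two opposite transfers cannot end up waiting on each other.
    with acquire_in_order(src.lock, dst.lock):
        src.balance -= amount
        dst.balance += amount
```

Without the ordering, transfer(a, b, ...) and transfer(b, a, ...) running on two threads could each take one lock and wait forever for the other, which is the classic lock inversion.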

How did the apples fall for threads to be conceived?

I was going through the following lecture notes on operating systems:
http://williamstallings.com/Extras/OS-Notes/h2.html
What I could gather is that "A process is a stream of execution, i.e. basically a sequence of statements, and so is a thread. However, the register states of one process are independent of the register states of another process, but the register states of another thread can be accessed inside a thread. For every process at least one thread is allotted or dedicated; when a process is started, the OS activities for that process are taken over by the thread (or a thread)."
What was the rationale behind conceiving the idea of threads? When the OS is running a particular process, why do we need an intermediary like a thread between them?
"However, the register states of one process are independent of the register states of another process, but the register states of another thread can be accessed inside a thread."
Can the above statement be taken to mean that in the code for a process we cannot access the register states of another process, but in the code for a thread we can access the register states of another thread?
(The above question substitutes for process and thread their definitions as code, or sequences of statements.)
P.S.: The title of the question is a metaphorical one. Please forgive me if it misleads in any way. :P Could I take the liberty of broadening the question and asking: if the processor generates a thread for every process, what does it write in the code for a thread? (What does the code for a thread look like?)

Terminology - for a system with virtual memory, threads share the same virtual memory address space, while each process has its own address space. Processes can share physical memory by having a portion of that memory mapped into their virtual address spaces (but the virtual address in each process may be different even though it is the same physical memory block).
Early (1960s) instances of multi-processing were mainframes that ran multiple processes that usually did not communicate with each other. Most of this activity was for batch-oriented jobs, with a stream of jobs to be run, often from a punched card reader, or in more advanced situations, from remote job entry sites, which were other computers with a few peripherals (card readers, tape drives, line printers, ... ) that communicated with the mainframe to run jobs. There were also time-sharing applications, similar to servers, except that in many cases relatively dumb terminals were used to communicate with the mainframe. By the 1970s, APL/SV (A Programming Language / Shared Variables) was a time-sharing application / programming language that could share variables between users.
For multi-process / multi-threaded operating systems, the device drivers operate from a queue of requests (such as a file read or write). Adding a request to a device driver queue is done in a manner similar to a context switch, so there won't be conflicts between process or thread requests for I/O. Some peripherals, such as mainframe, SCSI, or ... disk drives also operated from an internal queue, and could process I/O requests out of order to reduce random access overhead.

The basic problem that drove threads was: how can an application handle multiple tasks at the same time, and do it in a system-independent manner?
In classical eunuchs, a process could only do one thing at a time. If you needed to handle multiple things you kicked off multiple processes.
In the olde RSX and VMS systems (and Windoze under the covers), programmers relied on software interrupts. A process could queue I/O requests to multiple devices and receive a software interrupt when the request completed, thus allowing the application to do multiple things at once.
Another approach to the multiple things at once problem was to use event queues (Windoze, X Windows).
The Ada programming language was the first (and still really the only) mainstream programming language to support threads (tasks) as a system-independent way to handle these kinds of problems. DOD compliance mandates drove the creation of threads.
Originally, threads were implemented through libraries ("user threads", "many-to-one model"). With the rise of multiprocessor systems, demand grew for threads that could execute in parallel on different processors. This drove the creation of kernel threads in operating systems. (Many operating systems still do not support kernel threads.)

Is it required to use spin_lock inside tasklets?

As far as I know, there is no need for synchronization techniques in an interrupt handler, because an interrupt handler cannot run concurrently; in short, pre-emption is disabled in an ISR. However, I have a doubt regarding tasklets. As far as I know, tasklets run in interrupt context, so in my opinion there is no need for a spin lock inside a tasklet function. However, I am not sure about this. Can somebody please explain? Thanks for your replies.

If data is shared between the top half and the bottom half, then use a lock. Locks are meant to protect data, not code. Simple questions to ask about locking:
1. What to protect?
2. Why protect it?
3. How to protect it?

Two tasklets of the same type do not ever run simultaneously. Thus, there is no need to protect data used only within a single type of tasklet. If the data is shared between two different tasklets, however, you must obtain a normal spin lock before accessing the data in the bottom half. You do not need to disable bottom halves because a tasklet never preempts another running tasklet on the same processor.

For synchronization between code running in process context (A) and code running in softirq context (B) we need to use special locking primitives. We must use spinlock operations augmented with deactivation of bottom-half handlers on the current processor in (A), and only basic spinlock operations in (B). Using spinlocks makes sure that we don't have races between multiple CPUs, while deactivating the softirqs makes sure that we don't deadlock if the softirq is scheduled on the same CPU where we already acquired a spinlock. (c) Kernel docs

What's the best way to handle incoming messages?

I'm writing a server for an online game, that should be able to handle 1,000-2,000 clients in the end. The 3 ways I found to do this were basically:
1. One thread per connection (blocking)
2. Making a list of clients, and looping through them (non-blocking)
3. Select (basically a blocking statement for all clients at once, with optional timeout?)
In the past I've used 1, but as we all know, it doesn't scale well. 2 is okay, but I have mixed feelings, about one client technically being able to make everyone else freeze. 3 sounds interesting (a bit better than 2), but I've heard it's not suitable for too many connections.
So, what would be the best way to do it (in D)? Are there other options?

The usual approach is closest to 3: asynchronous programming with a higher-performance select alternative, such as the poll or epoll system calls on Linux, IOCP on Windows, or higher-level libraries wrapping them. D does not support them directly, but you can find D bindings or 3rd-party D libraries (e.g. Tango) providing support for them.
Higher-performance servers (e.g. nginx) use one thread/process per CPU core, and use asynchronous event handling within that thread/process.

One option to consider is to have a single thread that runs the select/poll/epoll but does not process the results. Rather, it queues up connections known to have results and lets a thread pool feed from that. If checking that a full request has been read in is cheap, you might do that in the poll thread with non-blocking IO and only queue up full requests.
I don't know if D provides any support for any of that aside from (possibly) the inter-thread communication and queuing.
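To make the "one poll thread feeding a thread pool" layout concrete, here is a minimal sketch in Python rather than D (I can't vouch for the D library side). It assumes a made-up protocol in which each request is one newline-terminated line; serve and handle_request are hypothetical names, and error handling is kept to a bare minimum.

```python
import selectors
import socket
from concurrent.futures import ThreadPoolExecutor


def handle_request(conn, line):
    """Worker-side handler (hypothetical): reply with the line upper-cased."""
    # Assumption: replies are small, so sendall on a non-blocking socket
    # succeeds at once; a real server would buffer and wait for writability.
    conn.sendall(line.upper() + b"\n")


def serve(listener, stop, workers=4):
    """Poll in this thread; hand only complete requests to the pool.

    listener is a bound, listening socket; stop is a threading.Event.
    """
    sel = selectors.DefaultSelector()  # epoll/kqueue/select, whichever exists
    listener.setblocking(False)
    sel.register(listener, selectors.EVENT_READ)
    buffers = {}  # connection -> bytes received so far
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while not stop.is_set():
            for key, _events in sel.select(timeout=0.1):
                sock = key.fileobj
                if sock is listener:
                    conn, _addr = listener.accept()
                    conn.setblocking(False)
                    buffers[conn] = b""
                    sel.register(conn, selectors.EVENT_READ)
                    continue
                try:
                    chunk = sock.recv(4096)
                except BlockingIOError:
                    continue  # spurious wakeup, nothing to read yet
                except ConnectionError:
                    chunk = b""
                if not chunk:  # peer closed the connection
                    sel.unregister(sock)
                    del buffers[sock]
                    sock.close()
                    continue
                buffers[sock] += chunk
                # The cheap "is a full request in?" check stays in the poll thread.
                while b"\n" in buffers[sock]:
                    line, buffers[sock] = buffers[sock].split(b"\n", 1)
                    pool.submit(handle_request, sock, line)
    for conn in buffers:
        conn.close()
    sel.close()
```

Because the poll thread never blocks on any one client, a slow or silent client cannot freeze the others, and the expensive work runs in the pool. Note that this sketch does not serialise replies per connection, so two requests from the same client may be answered out of order.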

Are nonblocking I/O operations in Perl limited to one thread? Good design?

I am attempting to develop a service that contains numerous client and server sockets (a server service as well as clients that connect out to managed components and persist) that are synchronously polled through IO::Select. The idea was to handle the I/O and/or request processing needs that arise through pools of worker threads.
The shared keyword that makes data shareable across threads in Perl (threads::shared) has its limits--handle references are not among the primitives that can be made shared.
Before I figured out that handles and/or handle references cannot be shared, the plan was to have a select() thread that takes care of the polling, and then puts the relevant handles in certain ThreadQueues spread across a thread pool to actually do the reading and writing. (I was, of course, designing this so that modification to the actual descriptor sets used by select would be thread-safe and take place in one thread only--the same one that runs select(), and therefore never while it's running, obviously.)
That doesn't seem like it's going to happen now because the handles themselves can't be shared, so the polling as well as the reading and writing is all going to need to happen from one thread. Is there any workaround for this? I am referring to the decomposition of the actual system calls across threads; clearly, there are ways to use queues and buffers to have data produced in other threads and actually sent in others.
One problem that arises from this situation is that I have to give select() a timeout, and expect that it'll be high enough to not cause any issues with polling a rather large set of descriptors while low enough not to introduce too much latency into my timing event loop - although, I do understand that if there is actual I/O set membership detected in the polling process, select() will return early, which partly mitigates the problem. I'd rather have some way of waking select() up from another thread, but since handles can't be shared, I cannot easily think of a way of doing that nor see the value in doing so; what is the other thread going to know about when it's appropriate to wake select() anyway?
If no workaround, what is a good design pattern for this type of service in Perl? I have a requirement for a rather high amount of scalability and concurrent I/O, and for that reason went the nonblocking route rather than just spawning threads for each listening socket and/or client and/or server process, as many folks using higher-level languages these days are wont to do when dealing with sockets - it seems to be kind of a standard practice in Java land, and nobody seems to care about java.nio.* outside the narrow realm of systems-oriented programming. Maybe that's just my impression. Anyway, I don't want to do it that way.
So, from the point of view of an experienced Perl systems programmer, how should this stuff be organised? Monolithic I/O thread + pure worker (non-I/O) threads + lots of queues? Some sort of clever hack? Any thread safety gotchas to look out for beyond what I have already enumerated? Is there a Better Way? I have extensive experience architecting this sort of program in C, but not with Perl idioms or runtime characteristics.
EDIT: P.S. It has definitely occurred to me that perhaps a program with these performance requirements and this design should simply not be written in Perl. But I see an awful lot of very sophisticated services produced in Perl, so I am not sure about that.

Bracketing out your several, larger design questions, I can offer a few approaches to sharing filehandles across perl threads.
One may pass $client to a thread start routine or simply reference it in a new thread:
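    my $client = $server_socket->accept();
    # either pass the handle to a thread start routine ...
    threads->new(\&handle_client, $client);
    # ... or simply reference it from within a new thread
    async { handle_client($client) };
    # $client will be closed only when all threads' references
    # to it pass out of scope.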
For a Thread::Queue design, one may enqueue() the underlying fd:
    $q->enqueue( POSIX::dup(fileno $client) );
    # we dup(2) so that $client may safely go out of scope,
    # closing its underlying fd but not the duplicate thereof
    async {
        my $client = IO::Handle->new_from_fd( $q->dequeue, "r+" );
        handle_client($client);
    };
Or one may just use fds exclusively, and the bit vector form of Perl's select.
