Determine the maximum number of threads that can run on different Windows systems - threadpool

Can anyone tell me if there is a way to find out the maximum number of threads that can run on different Windows systems?
For example (assumption): a 32-bit Windows system can run a maximum of 4000 threads.

I doubt there is a fixed maximum. Since we're working with a finite amount of memory, it would be as many threads as you can fit into memory, or as many as the OS can keep track of. Each system is different, and as far as I know Java and C don't have a function to report this. C# can tell you how much memory a specific object/app needs, so you could calculate an estimate from that.
You could test this on your system: write a sample app which keeps spawning threads and see when it runs out of memory, using a counter to count them. That will give you a rough range for your system.
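A minimal Java sketch of that experiment (class and variable names are mine; the exact count and failure mode depend on the OS, the JVM, and the configured stack size):
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.LockSupport;

public class ThreadLimitProbe {
    public static void main(String[] args) {
        AtomicInteger started = new AtomicInteger();
        try {
            while (true) {
                // Each thread just parks forever so it keeps its stack allocated.
                Thread t = new Thread(() -> LockSupport.park());
                t.setDaemon(true); // daemon threads let the JVM exit when main finishes
                t.start();
                started.incrementAndGet();
            }
        } catch (Throwable failure) {
            // Typically OutOfMemoryError: unable to create native thread
            System.out.println("Started roughly " + started.get() + " threads before: " + failure);
        }
    }
}
Be warned that running this can make the machine unresponsive while the probe is going.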
In Java, you can use an ExecutorService with a thread pool. Depending on which executor service you use, it can keep spawning threads as you submit more jobs.
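For instance, a small sketch of the two behaviours (a fixed pool never grows; a cached pool keeps creating threads as you submit more work; names are illustrative):
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PoolDemo {
    public static void main(String[] args) {
        ExecutorService fixed = Executors.newFixedThreadPool(4);   // at most 4 threads, extra jobs wait in a queue
        ExecutorService cached = Executors.newCachedThreadPool();  // creates a new thread whenever none is idle
        for (int i = 0; i < 10; i++) {
            fixed.submit(() -> System.out.println("fixed:  " + Thread.currentThread().getName()));
            cached.submit(() -> System.out.println("cached: " + Thread.currentThread().getName()));
        }
        fixed.shutdown();
        cached.shutdown();
    }
}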
A similar technique exists in C#.
A better question is: what is the maximum number of threads you can spawn before the system starts thrashing?
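One common rule of thumb for CPU-bound work (a sketch, not a universal answer; I/O-bound workloads often want more threads) is to size the pool by the number of logical processors:
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SizedPool {
    public static void main(String[] args) {
        // For CPU-bound jobs, more threads than logical processors mostly adds
        // context-switching overhead rather than throughput.
        int workers = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < 100; i++) {
            pool.submit(() -> { /* CPU-bound work goes here */ });
        }
        pool.shutdown();
    }
}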
Are you trying to take over the OS and do your own process/thread management? You should not be doing this.

Related

What is the relation between threads and concurrency?

Concurrency means the ability for more than one task to make progress at a time.
But where does threading fit into it?
What is the relation between threading and concurrency?
What is the link between the two that will clear up the confusion?
Threads are one way to achieve concurrency. Concurrency can be achieved at many levels and in many ways. Here are some of them, from low level to high level, to give you a rough idea:
CPU pipelines: at a hardware level, multiple instructions are executed in parallel (each instruction is at a different stage in the pipeline).
Duplicated ALU and FPU units: a processor has multiple arithmetic-logic units and floating-point units that can execute instructions in parallel.
Vectorized instructions: single instructions which operate on multiple data elements at once.
Hyperthreading/SMT: duplication of the register context, so one core can run two hardware threads.
Threads: streams of instructions which can be executed in parallel.
Processes: you run both a browser and a word processor on your system.
Tasks: a higher abstraction over threads and async work (see the sketch after this list).
Multiple computers: run your program on several machines at once.
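To make the thread/task distinction concrete, here is a small Java sketch (CompletableFuture is one common task abstraction; the example itself is illustrative):
import java.util.concurrent.CompletableFuture;

public class ThreadVsTask {
    public static void main(String[] args) throws Exception {
        // A thread: a stream of instructions you start and manage yourself.
        Thread worker = new Thread(() ->
                System.out.println("thread on " + Thread.currentThread().getName()));
        worker.start();
        worker.join();

        // A task: you describe the work; a pool decides which thread runs it.
        CompletableFuture<Integer> task = CompletableFuture.supplyAsync(() -> 21 * 2);
        System.out.println("task result: " + task.join());
    }
}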
Nothing executing on the CPU is a "process" as such; what the CPU actually runs are threads, scheduled and managed entirely by the kernel using a variety of algorithms to reach the expected performance for any given application. The CPU can only execute n threads at once, where n = cores * hyperthreads. In most cases hyperthreads will be 2, so you double the core count to get the logical CPU count. What this really means is that instead of (for example) 4 threads running at once, it can support up to 8.
Now, the OS may have hundreds of threads at any given time - how is that possible? The kernel uses a variety of checks, such as how frequently and how long a thread sleeps, to assign it a priority. Whenever the CPU triggers a timer interrupt, the OS swaps threads out if they have used up the time slice allotted to them based on their priority.
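You can read the n = cores * hyperthreads figure directly from code; a one-line Java example (the value reported depends entirely on the machine):
public class LogicalCpus {
    public static void main(String[] args) {
        // Number of logical processors the OS exposes to the JVM,
        // i.e. how many threads can truly execute at the same instant.
        System.out.println("logical processors: " + Runtime.getRuntime().availableProcessors());
    }
}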

Can Multiprocessor CPUs avoid context-switching?

Today's computer architectures try to maximize the number of registers, because accessing a register (an integrated memory circuit right next to the CPU) is faster than accessing the first-level cache. The problem is that each context switch has to save all the registers, because the next thread needs its own register values. A modern CPU may cycle through a hundred tasks per second, and each time it has to save the current registers and restore the saved ones before the next task can continue.
IMHO it would be nice to dedicate one CPU to one task so that no context switching happens at all. That would mean 100 CPUs, each with 1000 registers that never have to be saved. Is that possible, or have I ignored an important detail?
The only way to completely avoid context switching is to have at least as many cores as there are tasks. Generally, there is no guarantee regarding the maximum number of tasks that may run. Current GPUs, manycore processors, and co-processors contain hundreds of small cores; if you put several of them in the same system or in a cluster of systems, you can have thousands of cores or more. Still, even if you could avoid context switching with such a design, these cores are much slower than traditional high-end CPU cores, so the net effect might be negative.
But let's take a step back. The number of context switches is not determined solely by the number of tasks and cores. Tasks don't just perform computations; they also interact with I/O devices and wait for things to happen, such as results from other tasks or user input, so at any moment some tasks are in a wait state. The overhead of context switching therefore depends not only on the number of tasks but also on their behavior.
Both processor architects and OS developers are aware of context-switching overhead and employ a variety of techniques to alleviate it. For example, x86 provides a number of instructions tuned to (partially) saving the context of the current task, and the OS thread scheduler uses techniques such as priorities, preemption (with possibly large time slices on servers), and priority boosting. All of these help reduce the number of context switches and therefore their overall overhead. Reducing that overhead is not the only thing that matters, though: the responsiveness of the system is very important as well, and it is at odds with that overhead.
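A rough way to observe this overhead from user code (a Java sketch; only the comparison between the two runs matters, absolute numbers vary a lot by machine and load): run the same CPU-bound work once with as many threads as cores and once heavily oversubscribed, and compare wall-clock times.
import java.util.ArrayList;
import java.util.List;

public class OversubscribeDemo {
    // Same total amount of CPU-bound work, split across a varying number of threads.
    static long runMillis(int threads, long totalIterations) throws InterruptedException {
        List<Thread> workers = new ArrayList<>();
        long start = System.nanoTime();
        for (int i = 0; i < threads; i++) {
            long slice = totalIterations / threads;
            Thread t = new Thread(() -> {
                long x = 0;
                for (long j = 0; j < slice; j++) x += j;   // busy work
                if (x == -1) System.out.println(x);        // keep the loop from being optimised away
            });
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) t.join();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        int cores = Runtime.getRuntime().availableProcessors();
        long work = 2_000_000_000L;
        System.out.println(cores + " threads: " + runMillis(cores, work) + " ms");
        System.out.println((cores * 50) + " threads: " + runMillis(cores * 50, work) + " ms");
    }
}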

Number of processes a real time operating system can handle

I was asked this question in an interview a long time ago, as part of a "design your own RTOS" question. Is there a limit to the number of processes a real-time operating system can handle? What would cause this limit? From what I know, each process should have its own PC, call stack, heap, file descriptors, page tables, etc. I assume the kernel has to keep track of each process using some data structure. Is the limit derived from this data structure?
In most cases the amount of RAM available is the only limiting factor (as is the case in FreeRTOS); however, in a few cases there are limits imposed by the chosen scheduling algorithm. For example, uC/OS-II has (I think) a limit of 255 tasks because of the bitmap scheduler it uses - but even so, 255 is more than you would ever need in a real-time system of the type it is designed for.
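To see where a cap like that comes from, here is a hedged Java sketch of the bitmap-scheduler idea (not uC/OS-II's actual code; a single 64-bit word gives at most 64 distinct priorities, and a table-based variant extends the same trick to 255):
public class BitmapScheduler {
    // One bit per priority level: bit i set means "the task at priority i is ready".
    // The word size is the hard limit on how many tasks the scheduler can track.
    private long readyBitmap = 0L;

    void markReady(int priority)   { readyBitmap |=  (1L << priority); }
    void markBlocked(int priority) { readyBitmap &= ~(1L << priority); }

    // Highest-priority ready task in O(1): find the lowest set bit
    // (convention here: lower number = higher priority).
    int highestReadyPriority() {
        if (readyBitmap == 0) return -1;               // nothing to run
        return Long.numberOfTrailingZeros(readyBitmap);
    }

    public static void main(String[] args) {
        BitmapScheduler s = new BitmapScheduler();
        s.markReady(5);
        s.markReady(2);
        System.out.println(s.highestReadyPriority());  // prints 2
        s.markBlocked(2);
        System.out.println(s.highestReadyPriority());  // prints 5
    }
}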

Number of threads with NSOperationQueueDefaultMaxConcurrentOperationCount

I'm looking for any concrete info on the number of background threads an NSOperationQueue will create given the NSOperationQueueDefaultMaxConcurrentOperationCount maximum concurrency setting.
I had assumed that some sort of load monitoring is employed to determine the most appropriate number of threads to spawn, and this setting is the one recommended in the docs. What I'm finding is that the queue spawns roughly 100 background threads and my app (running on an iPad 3 with iOS 5.1.1) crashes with SIGABRT. I've reduced the limit to a more modest number like 3 and everything works fine.
Any comments or insight would be appreciated.
My experience matches yours (though not quite 100 threads; do add some instrumentation to make sure you really have that many running simultaneously - I've never seen it go quite that high). Unless you manually manage the number of concurrent operations, NSOperationQueue tends to generate too many concurrent operations. (I have yet to see anyone refute this with testable code rather than inferences from the documentation.) For anything that may generate a large number of potentially concurrent operations, I recommend setMaxConcurrentOperationCount:. While not ideal, I often wind up using a function like the one below to assist (it of course doesn't help you balance between queues, so it is very sub-optimal):
#include <sys/types.h>
#include <sys/sysctl.h>

// Returns the number of CPU cores reported by the hw.ncpu sysctl.
unsigned int countOfCores() {
    unsigned int ncpu;
    size_t len = sizeof(ncpu);
    sysctlbyname("hw.ncpu", &ncpu, &len, NULL, 0);
    return ncpu;
}
I eagerly await anyone posting real code demonstrating NSOperationQueue automatically performing correct load balancing for CPU-bound operations. I've posted a sample gist demonstrating what I'm talking about. Without calling setMaxConcurrentOperationCount:, it will run about 6 operations in parallel on a 2-core iPad 3. In this very simplistic case, with no contention or shared resources, that adds about 10%-15% overhead. In more complicated code with contention (and particularly if operations might be cancelled), it can slow things down by an order of magnitude.
Assuming your threads are busy working, 100 active threads in one process on a dual-core iPad is unreasonable; each thread consumes a good amount of time and memory, and having that many busy threads is going to slow things down on a dual-core device.
Regardless of whether you're doing something silly (like sleeping them all, adding run loops, or just giving them nothing to do), this would be a bug.
From the documentation:
The default maximum number of operations is determined dynamically by the NSOperationQueue object based on current system conditions.
The iPad 3 has a powerful processor and 1 GB of RAM. Since NSOperationQueue calculates the number of threads based on system conditions, it's very likely that it determined it could run a large number of NSOperations given the power available on that device. The reason it crashed might have less to do with the number of threads running simultaneously than with the code being executed inside those threads. Check the backtrace and see if there is some condition or resource being shared among those threads.

Can memcached make full use of multi-core?

Is memcached capable of making full use of multi-core? Or is there any way tuning this?
memcached has a "-t" option:
-t <threads>
    Number of threads to use to process incoming requests. This option is only
    meaningful if memcached was compiled with thread support enabled. It is
    typically not useful to set this higher than the number of CPU cores on the
    memcached server. The default is 4.
So, I believe it can use all of your CPU cores, provided it was compiled with the corresponding option.
memcached is multi-threaded by default and has no problem saturating many cores. It's a bit harder to saturate all cores on more massively parallel boxes (e.g. a 256-core CMT box) just because it gets harder to get the data in and out of the network.
If you find areas where some sort of contention is preventing you from saturating cores, file a bug or start a discussion.
According to this research by Intel, memcached v1.6 beta cannot scale well on a multicore system: their experiments show that as the core count increases from 1 to 8, maximum throughput (with a median RTT < 1 ms SLA) only doubles.
CAREFUL: this terminology is quite confusing. The memcached man page says the -t option is only useful up to the number of cores. However, this is odd, because threads and processes are very different. Threads have NOTHING to do with the number of cores; processes can definitely run on more than one core, while threads cannot (unless they call into an OS routine, in which case they can switch and pack in more than 100% CPU usage). Threads share memory and rely only on an instruction pointer to differentiate who is who; processes share nothing unless it is explicitly declared as shared ahead of time, and that sharing occurs via the OS.
Overall, I want MORE CLARITY from the memcached people about whether their app is multi-process or multi-threaded, and thus whether it can use more than 100% of a CPU.