Is it possible to avoid a wakeup-waiting race using only POSIX semaphores? Is it benign? - queue

I'd like to use POSIX semaphores to manage atomic get and put from a file representing a queue. I want the flexibility of having something named in the filesystem, so that completely unrelated processes can share a queue. I think this plan rules out pthreads. The named posix semaphores are great for putting something in the filesystem that any process can see, but I can't find the standard CondWait primitive:
... decide we have to wait ....
CondWait(sem, cond);
When CondWait is called by a process it atomically posts to sem and waits on cond. When some other process posts to cond, the waiting process wakes up only if it can atomically decrement sem as well. The alternative of
... decide we have to wait ....
sem_post(sem);
sem_wait(cond);
sem_wait(sem);
is subject to a race condition in which some other process signals cond just before this process waits on it.
I hardly ever do any concurrent programming, so I thought I would ask SO: if I use a standard POSIX counting semaphore for the condition variable, is it possible that this race is benign?
Just in case anybody wants the larger context, I am building get and put operations for an atomic queue that can be called from shell scripts.

Since there are no other answers I will follow up with what I've learned:
Pthreads will not work with my application because I have processes without a common ancestor which need to share an atomic queue.
Posix semaphores are subject to the wakeup-waiting race, but because unlike classic condition variables they are counting semaphores, the race is benign. I don't have a proof of this claim but I have had a system running for two days now and working well. (Completely meaningless I know, but at least it meant I got the job done.)
Named Posix semaphores are difficult to garbage-collect from the filesystem.
To summarize, named Posix semaphores turned out to be a good basis for implementing an atomic queue abstraction to be shared among unrelated processes.
I would like to have a proof or a validated SPIN model, but as my need for the application is limited, it seems unlikely that I will write one. I hope this helps someone else who may want to use Posix semaphores.

According to the POSIX standard, the set of semaphore routines is:
sem_close()
sem_destroy()
sem_getvalue()
sem_init()
sem_open()
sem_post()
sem_timedwait()
sem_trywait()
sem_unlink()
sem_wait()
The sem_trywait() and sem_timedwait() functions might be what you are looking for.

I know this question is old, but the obvious solution would be to just use process-shared mutexes and condition variables located in a file you can mmap.

You are looking for: pthread_cond_wait, pthread_cond_signal, I think.
That's if you are using posix threads, then the pthread methods would supply the functionality of CondWait and Signal.
Look here for source code on multiprocess pthreads via shared memory.
http://linux.die.net/man/3/pthread_mutexattr_init
That's for Linux, but the documents are posix. They're similar to Solaris, but you'll want to peruse the man pages on your OS.

Related

What condition variables can do that unlock+yield cannot?

In POSIX, there's the requirement that when a wait is called on a condition variable and a mutex, the 2 operations - unlocking the mutex and blocking the thread, be atomically performed, in such way that any broadcast/signal should take effect as if they happened after blocking. I suppose there should be equivalent requirements on C11, C++ condition variables as well, and I won't go on to do a verbose enumeration.
However, in some system (such as many people's nostalgia WinXP), there wasn't a condition variable mechanism. Instead, they have to perform a unlock+yield to achieve similar (same?) effect. And this works, because even if the broadcast/signal occured in-between the unlock and yield, when the thread is re-scheduled, its observable behavior is the same as if the wake occured after the block. WinXP supported mutex, and it had an SleepEx function that can work like an yield.
So it begs the question: What condition variables can do, that unlock+yield cannot?
In response to the comment: I use WinXP as an example because it happens to be one that supported mutex but not condvar, and the fact that it's one generation's memory. Of course, we assume correctness and reasonable performance, and the question doesn't specifically ask Windows and it asks any implementation in general.

From where is the code for dealing with critical section originated?

While learning the subject of operating systems, Critical Section is a topic which I've come across. To solve this problem, certain methods are provided like semaphores, certain software solutions, etc...etc..etc. But I've a question that from where is the code for implementing these solutions originated? As programmers never are found writing such codes for their program. Suppose I write a simple program executing printf in 'C', I never write any code for critical section problem. And the code is converted into low level instructions and is executed by OS, which behaves as our obedient servant. So, where does code dealing with critical section originate and fit in? Let resources like frame buffer be the critical section.
The OS kernel supplies such inter-thread comms synchronization mechanisms, mutex, semaphore, event, critical section, conditional variables etc. It has to because the kernel needs to block threads that cannot proceed. Many languages provide convenient wrappers around such calls.
Your app accesses them, directly or indirectly, via system calls, ie intrrupts that enter kernel state and ask for such services.
In some cases, a short-term user-space spinlock may get plastered on top, but such code should defer to a system call if the spinner is not quickly satisfied.
In the case of C printf, the relevant library, (stdio usually), will make the calls to lock/unlock the I/O stream, (assuming you have linked in a multithreaded version of the library).

Is there a way to copy files in a non-blocking way in Scala?

I have checked java.nio.file.Files.copy but that blocks a thread until the copy is done. Are there any libraries that allow one to copy a file in a non-blocking way? I need to perform many of these operations simultaneously and cannot afford to have so many threads blocked.
While I could write something myself using non-blocking streams, I would rather use something tried and tested that would guarantee a correct copy every time (or detect if something went wrong).
Check this: Iterate over lines in a file in parallel (Scala)?
val chunkSize = 128 * 1024
val iterator = Source.fromFile(path).getLines.grouped(chunkSize)
iterator.foreach { lines =>
lines.par.foreach { line => process(line) }
}
Reading (copying) files by chunks in parallel. In this case "par" is used.
So it quite non-blocking in terms / scope of processors (cores).
But you may follow same idea of chunks, for example using Akka/Future/Promises to be even in wider scopes.
You may customize you chunk-size deepening on your performance characteristic, level of system load, etc..
One more link that explains possible way to do read / write data from (property) file in parallel using Akka Actors. This is not quite that you might be want, but it may give an idea.
Idea - you may build your own not-blocking way of reading / copying files.
--
And about your statement "While I could write something myself using non-blocking streams":
I would remind that each OS / File System (FS) may have its own vision about what and where to block. Like Windows blocks a file (write-block at leat) if one thread writes to it. On Linux is is configurable. So if you want to stick to something stable, I would suggest to think it out and go with your own wrapper (over FS) solution based on events, chunks, states.
I have used the Process class, issuing an operating system command to copy the file. Of course, one has to check under which OS the application is running, and issue the appropriate command, but this allows for fast and asynchronous copies.
As Marius rightly mentions in the comments, Scala Process blocks, so I run it wrapped in a Future.
Java 8 Process introduces a function isAlive(). A non-blocking alternative would be to use Java 8 processes and use the scheduler to poll at regular intervals to see if the process has finished. However, I did no need to go to this extent.
Have you checked out the async stuff in scala-io?
http://jesseeichar.github.io/scala-io-doc/0.4.2/index.html#!/core/async%20read%20write

Speed improvements for Perl's chameneos-redux in the Computer Language Benchmarks Game

Ever looked at the Computer Language Benchmarks Game (formerly known as the Great Language Shootout)?
Perl has some pretty healthy competition there at the moment. It also occurs to me that there's probably some places that Perl's scores could be improved. The biggest one is in the chameneos-redux script right now—the Perl version runs the worst out of any language: 1,626 times slower than the C baseline solution!
There are some restrictions on how the programs can be made and optimized, and there is Perl's interpreted runtime penalty, but 1,626 times? There's got to be something that can get the runtime of this program way down.
Taking a look at the source code and the challenge, how can the speed be improved?
I ran the source code through the Devel::SmallProf profiler. The profile output is a little too verbose to post here, but you can see the results yourself using $ perl -d:SmallProf chameneos.pl 10000 (no need to run it for 6000000 meetings unless you really want to!) See perlperf for more details on some profiling tools in Perl.
It turns out that using semaphores is the major bottleneck. The lion's share of total CPU time is spent on checking whether a semaphore is locked or not. Although I haven't had enough time to look at why the source code uses semaphores, it may be that you can work around having to use semaphores altogether. That's probably your best shot at improving the code's performance.
As Zaid posted, Thread::Semaphore is rather slow. One optimization could be to use the implicit locks on shared variables instead of them. It should be faster, though I suspect it won't be faster by much.
In general, Perl's threading implementation sucks for any kind of usage that requires a lot of interthread communication. It's very suitable for tasks with little communication (as unlike CPython's threads and CRuby's threads they are actually preemptive).
It may be possible to improve that situation, we need better primitives.
I have a version based on another version from Jesse Millikian, which I think was never published.
I think it may run ~ 7x faster than the current entry, and uses standard modules all around. I'm not sure if it actually complies with all the rules though.
I've tried the forks module on it, but I think it slows it down a bit.
Anyone tried s/threads/forks/ on the Perl entry? Or Coro / Coro::MP, though the latter would probably trigger the 'interesting alternative implementations' clause.

Is there a way to have managed processes in Perl (i.e. a threads replacement that actually works)?

I have a multithreded application in perl for which I have to rely on several non-thread safe modules, so I have been using fork()ed processes with kill() signals as a message passing interface.
The problem is that the signal handlers are a bit erratic (to say the least) and often end up with processes that get killed in inapropriate states.
Is there a better way to do this?
Depending on exactly what your program needs to do, you might consider using POE, which is a Perl framework for multi-threaded applications with user-space threads. It's complex, but elegant and powerful and can help you avoid non-thread-safe modules by confining activity to a single Perl interpreter thread.
Helpful resources to get started:
Programming POE presentation by Matt Sergeant (start here to understand what it is and does)
POE project page (lots of cookbook examples)
Plus there are hundreds of pre-built POE components you can use to assemble into an application.
You can always have a pipe between parent and child to pass messages back and forth.
pipe my $reader, my $writer;
my $pid = fork();
if ( $pid == 0 ) {
close $reader;
...
}
else {
close $writer;
my $msg_from_child = <$reader>;
....
}
Not a very comfortable way of programming, but it shouldn't be 'erratic'.
Have a look at forks.pm, a "drop-in replacement for Perl threads using fork()" which makes for much more sensible memory usage (but don't use it on Win32). It will allow you to declare "shared" variables and then it automatically passes changes made to such variables between the processes (similar to how threads.pm does things).
From perl 5.8 onwards you should be looking at the core threads module. Have a look at http://metacpan.org/pod/threads
If you want to use modules which aren't thread safe you can usually load them with a require and import inside the thread entry point.