How to call select on sockets in Haskell? [duplicate]

How could I watch several files/sockets from Haskell and wait for these to become readable/writable?
Is there anything like select/epoll/... in Haskell? Or am I forced to spawn one thread per file/socket and always use the blocking resource from within that thread?

The question is wrong: you aren't forced to spawn one thread per file/socket and use blocking calls, you get to spawn one thread per file/socket and use blocking calls. This is the cleanest solution (in any language); the only reason to avoid it in other languages is that it's a bit inefficient there. GHC's threads are cheap enough, however, that it is not inefficient in Haskell. (Additionally, behind the scenes, GHC's IO manager uses an epoll-alike to wake up threads as appropriate.)

There's a wrapper for select(2): https://hackage.haskell.org/package/select
Example usage here: https://github.com/pxqr/udev/blob/master/examples/monitor.hs#L36
There's a wrapper for poll(2):
https://hackage.haskell.org/package/poll
GHC base comes with functionality that wraps epoll on Linux (and equivalent on other platforms) in the GHC.Event module.
Example usage:
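import GHC.Event
import Data.Maybe (fromMaybe)
import Control.Concurrent (threadDelay)

main :: IO ()
main = do
    fd <- getSomeFileDescriptorOfInterest
    mgr <- fromMaybe (error "Must be compiled with -threaded") <$> getSystemEventManager
    -- Register a one-shot callback that fires when fd becomes readable.
    _ <- registerFd mgr (\_fdKey event -> print event) fd evtRead OneShot
    -- Keep the main thread alive long enough for the callback to run.
    threadDelay 100000000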
More documentation at http://hackage.haskell.org/package/base-4.11.1.0/docs/GHC-Event.html
Example use of an older version of the lib at https://wiki.haskell.org/Simple_Servers#Epoll-based_event_callbacks
Though, the loop in that example has since been moved to the hidden module GHC.Event.Manager, and is not exported publicly as far as I can tell. GHC.Event itself says "This module should be considered GHC internal."
In Control.Concurrent there's threadWaitRead and threadWaitWrite.
So, to translate the above epoll example:
import Control.Concurrent (threadWaitRead)

main = do
    fd <- getSomeFileDescriptorOfInterest
    threadWaitRead fd
    putStrLn "Got a read ready event"
You can wrap the threadWaitRead and subsequent IO action in Control.Monad.forever to run them repeatedly. You can also wrap the thing in forkIO to run it in the background while your program does something else.
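For readers more familiar with select-style APIs, here is a minimal sketch of the same readiness-notification model in Python, using the standard selectors module (which picks epoll, kqueue or select depending on the platform). The helper name wait_for_readable is made up for this illustration; it is not part of any library.

```python
import selectors
import socket


def wait_for_readable(socks, timeout=1.0):
    """Return the sockets from `socks` that are ready for reading.

    Hypothetical helper: registers every socket with the platform's
    default selector and performs a single wait of at most `timeout` seconds.
    """
    sel = selectors.DefaultSelector()
    try:
        for s in socks:
            sel.register(s, selectors.EVENT_READ)
        # select() blocks until at least one socket is readable or the timeout expires.
        return [key.fileobj for key, _events in sel.select(timeout)]
    finally:
        sel.close()


if __name__ == "__main__":
    a, b = socket.socketpair()
    b.sendall(b"ping")
    print(wait_for_readable([a]))  # the list contains `a`, because data is waiting on it
    a.close()
    b.close()
```

In Haskell the equivalent multiplexing happens inside GHC's IO manager, which is why a blocking call per thread is usually enough there.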

Related

Solutions to Python's multiprocessing Queue buffer deadlock? How to "get" from a multiprocessing Queue when it's full and continue multiprocessing?

This question is about multiprocessing with Python and the buffer limitations of Python's multiprocessing Queue, which come from my computer's OS pipe. Basically, I hit the limit of the multiprocessing Queue's buffer.
Here is my simple implementation of what I have so far:
import os
from multiprocessing import Process, Queue, Lock, Manager

def threaded_results(q, *args):
    """do something"""
    q.put(*args)

def main():
    manager = Manager()
    return_dict = manager.dict()
    cpu = os.cpu_count()
    q = Queue()
    processes = []
    for i in range(cpu):
        p = Process(target=threaded_results, args=(q, *args))
        processes.append(p)
        p.start()
    # join() can block here while a child is still flushing queued data to the pipe
    for p in processes:
        p.join()
    results = [q.get() for proc in processes]
I read that I have to empty the queue first before adding back to the queue, orchestrated by something called a semaphore. I'm considering using my own data structure or refactoring the design of my code. The question is, are there any conventional solutions to bypass the OS-level Queue buffer limitations for storing things in cache memory using Python? How to "get" from a multiprocessing Queue when it's full and continue multiprocessing?
After working with the multiprocessing library for a while, I've found that the simplest way to implement a robust multiprocessing queue is to use multiprocessing.Manager objects. From the docs:
Create a shared queue.Queue object and return a proxy for it.
Rather than allocating a separate thread for flushing data through a pipe, a Manager object creates and manages a standard multithreading queue, which doesn't have to have data flushed through a Pipe (haven't looked through the source code, so I can't say for sure). This means your code can keep chugging away practically indefinitely.
None of this is free, and I've found that the managed queue operates much (almost 20x) slower than a multiprocessing queue in a simple test, though the difference isn't nearly as noticeable when the queue is integrated into a full system, due to other bottlenecks.
Using managed queues can make your IPC far more robust, and it's likely a good idea to take the performance trade-off unless you can find a way to live with the unreliability of a normal multiprocessing queue.
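As a rough illustration of the managed-queue approach, here is a minimal sketch. The function names worker and run, the payload sizes, and the choice of the "fork" start method are assumptions made for this example (fork keeps it self-contained on POSIX systems; it is not available on Windows).

```python
import multiprocessing as mp


def worker(q, payload):
    # Hypothetical worker: push one (possibly large) result onto the shared queue.
    q.put(payload)


def run(n_workers=4, size=1_000_000):
    # Assumption: a POSIX platform where the "fork" start method is available.
    ctx = mp.get_context("fork")
    with ctx.Manager() as manager:
        q = manager.Queue()  # proxy to a queue.Queue living in the manager process
        procs = [ctx.Process(target=worker, args=(q, "x" * size))
                 for _ in range(n_workers)]
        for p in procs:
            p.start()
        # The manager process already holds each item once put() returns,
        # so joining before draining the queue does not hang here.
        for p in procs:
            p.join()
        return [q.get() for _ in procs]


if __name__ == "__main__":
    results = run()
    print(len(results), "results received")
```

With a plain multiprocessing.Queue the usual alternative is to call q.get() for every expected result before calling join() on the workers.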

Use persistent external program for occasional input / output translation in Scala

I'm writing some Scala code that needs to make use of an external command line program for string translation. The external program takes many minutes to start up, then listens for data on stdin (terminated by newline), converts the data, and prints the converted data to stdout (again terminated by newline). It will remain alive forever until it receives a SIGINT.
For simplicity, let's assume the external command runs like this:
$ convert
input1
output1
input2
output2
$
convert, input1, and input2 were all typed by me; output1 and output2 were written by the program to stdout. I typed Control-C at the end to return to the shell.
In my Scala code, I'd like to start up this external program, and keep it running in the background (because it is costly to startup, but cheap to keep running once it's initialized), while providing three methods to the rest of my program with an API like:
def initTranslation(): Unit
def translate(input: String): String
def stopTranslation(): Unit
initTranslation should start up the external program and keep it running in the background.
translate should put the input argument on the stdin of the external program (followed by newline), wait for output (followed by newline), and then return the output.
stopTranslation should send SIGINT to the external program.
I've worked with Java and Scala external process management before, but I don't have much experience with Java pipes and am not 100% sure how to hook this all up. In particular, I've read that there are subtle gotchas with regard to deadlocks when I/O pipes get hooked up in situations similar to this. I'm sure I'll need some Thread to start up and watch over the background process in initTranslation, some piping to send a String to stdin followed by blocking to wait for data and a newline on stdout in translate, then some sort of termination of the external program in stopTranslation.
I'd like to achieve this with as much pure Scala as possible, though I realize that this may require some bits of the Java I/O library. I also do not want to use any third party Scala or Java libraries (anything outside java.*, javax.* or scala.*)
What would these three methods look like?
It turns out that this is quite a bit easier than I first expected. I had been misled by various posts and recommendations (off SO) which had suggested that this would be more complex.
Caveats to this solution:
All Java. Yes, I know I mentioned that I'd rather use the Scala standard library, but this is sufficiently succinct that I think it warrants an answer.
Limited error handling - among other things, if the external program explodes and reports errors to stderr, I'm not handling that. Certainly, that could be added on later.
Usage of var for storage of local variables. Clearly, var is frowned upon for best-practice Scala use, but this example illustrates the object state needed, and you can structure your variables in your own programs as you like.
No thread-safety. If you need thread-safety, because multiple threads might call any of the following methods, use some synchronization constructs (like the synchronized keyword in the translate method) to protect yourself.
Solution:
import java.io.BufferedReader
import java.io.InputStreamReader
import java.lang.Process
import java.lang.ProcessBuilder

var process: Process = _
var outputReader: BufferedReader = _

def initTranslation(): Unit = {
  process = new ProcessBuilder("convert").start()
  outputReader = new BufferedReader(new InputStreamReader(process.getInputStream()))
}

def translate(input: String): String = {
  // write the input line to the external program's stdin
  process.getOutputStream.write(input.getBytes)
  process.getOutputStream.write(System.lineSeparator.getBytes)
  process.getOutputStream.flush()
  // block until the program writes one line to stdout
  outputReader.readLine()
}

def stopTranslation(): Unit = {
  process.destroy()
}
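For comparison, the same start / translate / stop shape can be sketched in Python with the standard subprocess module. Everything named here is hypothetical: the Translator class, and CHILD_SOURCE, a tiny upper-casing script that stands in for the slow external convert program so that the example runs anywhere.

```python
import signal
import subprocess
import sys

# Hypothetical stand-in for the external converter: echoes each line upper-cased.
CHILD_SOURCE = """
import sys
for line in sys.stdin:
    sys.stdout.write(line.upper())
    sys.stdout.flush()
"""


class Translator:
    def __init__(self, cmd):
        # Start the program once and keep its pipes open for later calls.
        self.proc = subprocess.Popen(
            cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL, text=True, bufsize=1)

    def translate(self, text):
        # One line in, flush, then block until one line comes back.
        self.proc.stdin.write(text + "\n")
        self.proc.stdin.flush()
        return self.proc.stdout.readline().rstrip("\n")

    def stop(self):
        # Ask politely with SIGINT; fall back to kill() if it does not exit.
        self.proc.send_signal(signal.SIGINT)
        try:
            self.proc.wait(timeout=5)
        except subprocess.TimeoutExpired:
            self.proc.kill()
            self.proc.wait()
        self.proc.stdin.close()
        self.proc.stdout.close()


if __name__ == "__main__":
    t = Translator([sys.executable, "-c", CHILD_SOURCE])
    print(t.translate("input1"))
    t.stop()
```

The strict one-line-in, one-line-out protocol is what keeps this free of pipe deadlocks; a program that writes a lot to stderr or emits several lines per input would need a reader thread.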

threads in Dancer

I'm using Dancer 1.31, in a standard configuration (plackup/Starman).
In a request I wished to call a Perl function asynchronously, so that the request returns immediately. Think of the typical "long running operation" scenario, in which one wants to return a "processing page" with a refresh+redirect.
I (naively?) tried with a thread:
sub myfunc {
    sleep 9; # just for testing a slow operation
}

any '/test1' => sub {
    my $thr = threads->create('myfunc');
    $thr->detach();
    return "done";
};
It does not work: the server seems to freeze, and the error log does not show anything. I guess manual creation of threads is forbidden inside Dancer? Is it an issue with PSGI? What is the recommended way?
I would stay away from perl threads especially in a web server environment. It will most likely crash your server when you join or detach them.
I usually create a few threads (thread pool) BEFORE initializing other modules and keep them around for the entire life time of the application. Thread::Queue nicely provides communication between the workers and the main thread.
The best asynchronous solution I find in Perl is POE. In Linux I prefer using POE::Wheel::Run to run executables and subroutines asynchronously. It uses fork and has a beautiful interface allowing communication with the child process. (In Windows it's not usable due to thread dependency)
Setting up Dancer and POE inside the same application/script may cause problems and POE's event loop may be blocked. A single worker thread dedicated to POE may come handy, or I would write another server based on POE and just communicate with the Dancer application via sockets.
Threads are definitely iffy with Perl. It might be possible to write some threaded Dancer code, but to be honest I don't think we ever tried it. And considering that Dancer 1's core uses singleton classes, it might also be very tricky.
As Ogla says, there are other ways to implement asynchronous behavior in Dancer. You say that you are using Starman, which is a forking engine. But there is also Twiggy, which is AnyEvent-based. To see how to leverage it to write asynchronous code, have a gander at Dancer::Plugin::Async.

Scala actors: receive vs react

Let me first say that I have quite a lot of Java experience, but have only recently become interested in functional languages. Recently I've started looking at Scala, which seems like a very nice language.
However, I've been reading about Scala's Actor framework in Programming in Scala, and there's one thing I don't understand. In chapter 30.4 it says that using react instead of receive makes it possible to re-use threads, which is good for performance, since threads are expensive in the JVM.
Does this mean that, as long as I remember to call react instead of receive, I can start as many Actors as I like? Before discovering Scala, I've been playing with Erlang, and the author of Programming Erlang boasts about spawning over 200,000 processes without breaking a sweat. I'd hate to do that with Java threads. What kind of limits am I looking at in Scala as compared to Erlang (and Java)?
Also, how does this thread re-use work in Scala? Let's assume, for simplicity, that I have only one thread. Will all the actors that I start run sequentially in this thread, or will some sort of task-switching take place? For example, if I start two actors that ping-pong messages to each other, will I risk deadlock if they're started in the same thread?
According to Programming in Scala, writing actors to use react is more difficult than with receive. This sounds plausible, since react doesn't return. However, the book goes on to show how you can put a react inside a loop using Actor.loop. As a result, you get
loop {
  react {
    ...
  }
}
which, to me, seems pretty similar to
while (true) {
  receive {
    ...
  }
}
which is used earlier in the book. Still, the book says that "in practice, programs will need at least a few receive's". So what am I missing here? What can receive do that react cannot, besides return? And why do I care?
Finally, coming to the core of what I don't understand: the book keeps mentioning how using react makes it possible to discard the call stack to re-use the thread. How does that work? Why is it necessary to discard the call stack? And why can the call stack be discarded when a function terminates by throwing an exception (react), but not when it terminates by returning (receive)?
I have the impression that Programming in Scala has been glossing over some of the key issues here, which is a shame, because otherwise it's a truly excellent book.
First, each actor waiting on receive is occupying a thread. If it never receives anything, that thread will never do anything. An actor on react does not occupy any thread until it receives something. Once it receives something, a thread gets allocated to it, and it is initialized in it.
Now, the initialization part is important. A receiving thread is expected to return something, a reacting thread is not. So the previous stack state at the end of the last react can be, and is, wholly discarded. Not needing to either save or restore the stack state makes the thread faster to start.
There are various performance reasons why you might want one or the other. As you know, having too many threads in Java is not a good idea. On the other hand, because you have to attach an actor to a thread before it can react, it is faster to receive a message than to react to it. So if you have actors that receive many messages but do very little with them, the additional delay of react might make it too slow for your purposes.
The answer is "yes" - if your actors are not blocking on anything in your code and you are using react, then you can run your "concurrent" program within a single thread (try setting the system property actors.maxPoolSize to find out).
One of the more obvious reasons why it is necessary to discard the call stack is that otherwise the loop method would end in a StackOverflowError. As it is, the framework rather cleverly ends a react by throwing a SuspendActorException, which is caught by the looping code which then runs the react again via the andThen method.
Have a look at the mkBody method in Actor and then the seq method to see how the loop reschedules itself - terribly clever stuff!
Those statements about "discarding the stack" also confused me for a while. I think I get it now, and this is my understanding. In the case of "receive" there is a dedicated thread blocking on the message (using object.wait() on a monitor), which means that the complete thread stack is available and ready to continue from the point of "waiting" on receiving a message.
For example if you had the following code
def a = 10;
while (! done) {
  receive {
    case msg => println("MESSAGE RECEIVED: " + msg)
  }
  println("after receive and printing a " + a)
}
the thread would wait in the receive call until the message is received and then would continue on and print the "after receive and printing a 10" message and with the value of "10" which is in the stack frame before the thread blocked.
In the case of react there is no such dedicated thread. The whole body of the react call is captured as a closure and is executed by some arbitrary thread when the corresponding actor receives a message. This means that only those statements that can be captured as a closure will be executed, and that's where the return type of "Nothing" comes into play. Consider the following code
def a = 10;
while (! done) {
  react {
    case msg => println("MESSAGE RECEIVED: " + msg)
  }
  println("after react and printing a " + a)
}
If react had a return type of void, it would mean that it is legal to have statements after the "react" call (in the example, the println statement that prints the message "after react and printing a 10"), but in reality those would never get executed, as only the body of the "react" method is captured and sequenced for execution later (on the arrival of a message). Since the contract of react has the return type of "Nothing", there cannot be any statements following react, and therefore there is no reason to maintain the stack. In the example above, variable "a" would not have to be maintained, as the statements after the react call are not executed at all. Note that all the variables needed by the body of react are already captured as a closure, so it can execute just fine.
The Java actor framework Kilim actually does the stack maintenance by saving the stack, which gets unrolled when the react gets a message.
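The closure-capture model described above can be sketched in a few lines of Python. This is a conceptual toy under stated assumptions, not how scala.actors is implemented: the Scheduler and ReactActor classes and the ping_pong function are hypothetical names, and a single-threaded run loop stands in for the actor thread pool. It shows two react-style actors ping-ponging on one thread without deadlock, because react only stores a handler and returns.

```python
from collections import deque


class Scheduler:
    """Single-threaded run loop standing in for the actor thread pool."""

    def __init__(self):
        self.ready = deque()

    def run(self):
        while self.ready:
            task = self.ready.popleft()
            task()


class ReactActor:
    def __init__(self, scheduler):
        self.scheduler = scheduler
        self.mailbox = deque()
        self.handler = None  # continuation registered by react()

    def react(self, handler):
        # Store the closure and return at once; no thread or stack waits here.
        self.handler = handler
        self._maybe_schedule()

    def send(self, msg):
        self.mailbox.append(msg)
        self._maybe_schedule()

    def _maybe_schedule(self):
        # Only when both a handler and a message exist is any work queued.
        if self.handler is not None and self.mailbox:
            handler, self.handler = self.handler, None
            msg = self.mailbox.popleft()
            self.scheduler.ready.append(lambda: handler(msg))


def ping_pong(rounds):
    sched = Scheduler()
    ping, pong = ReactActor(sched), ReactActor(sched)
    log = []

    def on_ping(n):
        log.append(("ping", n))
        if n < rounds:
            pong.send(n)
            ping.react(on_ping)  # re-register, like loop { react { ... } }

    def on_pong(n):
        log.append(("pong", n))
        ping.send(n + 1)
        pong.react(on_pong)

    ping.react(on_ping)
    pong.react(on_pong)
    ping.send(1)
    sched.run()
    return log


if __name__ == "__main__":
    print(ping_pong(2))  # [('ping', 1), ('pong', 1), ('ping', 2)]
```

A receive-style actor would instead block a whole thread inside a call like queue.get(), which is why its stack has to stay alive between messages.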
Just to have it here:
Event-Based Programming without Inversion of Control
These papers are linked from the scala api for Actor and provide the theoretical framework for the actor implementation. This includes why react may never return.
I haven't done any major work with Scala/Akka; however, I understand that there is a very significant difference in the way actors are scheduled.
Akka is just a smart thread pool which time-slices the execution of actors...
Every time slice will be one message executed to completion by an actor, unlike in Erlang, where it could be per instruction?!
This leads me to think that react is better, as it hints to the current thread to consider other actors for scheduling, whereas receive "might" engage the current thread to continue executing other messages for the same actor.

Is there a way to have managed processes in Perl (i.e. a threads replacement that actually works)?

I have a multithreaded application in Perl for which I have to rely on several non-thread-safe modules, so I have been using fork()ed processes with kill() signals as a message passing interface.
The problem is that the signal handlers are a bit erratic (to say the least), and I often end up with processes that get killed in inappropriate states.
Is there a better way to do this?
Depending on exactly what your program needs to do, you might consider using POE, which is a Perl framework for multi-threaded applications with user-space threads. It's complex, but elegant and powerful and can help you avoid non-thread-safe modules by confining activity to a single Perl interpreter thread.
Helpful resources to get started:
Programming POE presentation by Matt Sergeant (start here to understand what it is and does)
POE project page (lots of cookbook examples)
Plus there are hundreds of pre-built POE components you can use to assemble into an application.
You can always have a pipe between parent and child to pass messages back and forth.
pipe my $reader, my $writer;
my $pid = fork();
if ( $pid == 0 ) {
    close $reader;
    ...
}
else {
    close $writer;
    my $msg_from_child = <$reader>;
    ....
}
Not a very comfortable way of programming, but it shouldn't be 'erratic'.
Have a look at forks.pm, a "drop-in replacement for Perl threads using fork()" which makes for much more sensible memory usage (but don't use it on Win32). It will allow you to declare "shared" variables and then it automatically passes changes made to such variables between the processes (similar to how threads.pm does things).
From perl 5.8 onwards you should be looking at the core threads module. Have a look at http://metacpan.org/pod/threads
If you want to use modules which aren't thread safe you can usually load them with a require and import inside the thread entry point.