Are nonblocking I/O operations in Perl limited to one thread? Good design? - perl

I am attempting to develop a service that contains numerous client and server sockets (a server service as well as clients that connect out to managed components and persist) that are synchronously polled through IO::Select. The idea was to handle the I/O and/or request processing needs that arise through pools of worker threads.
The shared keyword that makes data shareable across threads in Perl (threads::shared) has its limits--handle references are not among the primitives that can be made shared.
Before I figured out that handles and/or handle references cannot be shared, the plan was to have a select() thread that takes care of the polling, and then puts the relevant handles in certain ThreadQueues spread across a thread pool to actually do the reading and writing. (I was, of course, designing this so that modification to the actual descriptor sets used by select would be thread-safe and take place in one thread only--the same one that runs select(), and therefore never while it's running, obviously.)
That doesn't seem like it's going to happen now because the handles themselves can't be shared, so the polling as well as the reading and writing is all going to need to happen from one thread. Is there any workaround for this? I am referring to the decomposition of the actual system calls across threads; clearly, there are ways to use queues and buffers to have data produced in other threads and actually sent in others.
One problem that arises from this situation is that I have to give select() a timeout, and expect that it'll be high enough to not cause any issues with polling a rather large set of descriptors while low enough not to introduce too much latency into my timing event loop - although, I do understand that if there is actual I/O set membership detected in the polling process, select() will return early, which partly mitigates the problem. I'd rather have some way of waking select() up from another thread, but since handles can't be shared, I cannot easily think of a way of doing that nor see the value in doing so; what is the other thread going to know about when it's appropriate to wake select() anyway?
If no workaround, what is a good design pattern for this type of service in Perl? I have a requirement for a rather high amount of scalability and concurrent I/O, and for that reason went the nonblocking route rather than just spawning threads for each listening socket and/or client and/or server process, as many folks using higher-level languages these days are wont to do when dealing with sockets - it seems to be kind of a standard practice in Java land, and nobody seems to care about java.nio.* outside the narrow realm of systems-oriented programming. Maybe that's just my impression. Anyway, I don't want to do it that way.
So, from the point of view of an experienced Perl systems programmer, how should this stuff be organised? Monolithic I/O thread + pure worker (non-I/O) threads + lots of queues? Some sort of clever hack? Any thread safety gotchas to look out for beyond what I have already enumerated? Is there a Better Way? I have extensive experience architecting this sort of program in C, but not with Perl idioms or runtime characteristics.
EDIT: P.S. It has definitely occurred to me that perhaps a program with these performance requirements and this design should simply not be written in Perl. But I see an awful lot of very sophisticated services produced in Perl, so I am not sure about that.

Bracketing out your several, larger design questions, I can offer a few approaches to sharing filehandles across perl threads.
One may pass $client to a thread start routine or simply reference it in a new thread:
$client = $server_socket->accept();
threads->new(\&handle_client, $client);
async { handle_client($client) };
# $client will be closed only when all threads' references
# to it pass out of scope.
For a Thread::Queue design, one may enqueue() the underlying fd:
$q->enqueue( POSIX::dup(fileno $client) );
# we dup(2) so that $client may safely go out of scope,
# closing its underlying fd but not the duplicate thereof
async {
my $client = IO::Handle->new_from_fd( $q->dequeue, "r+" );
handle_client($client);
};
Or one may just use fds exclusively, and the bit vector form of Perl's select.

Related

Asynchronous blocking thread magic

I've been learning play, and I'm getting most of the major concepts, but I'm struggling with what magic the platform is doing to enable all of these things.
In particular, let's say I have a controller that does something time-intensive. Now I understand how using Futures and asynchronous processing I can make these things appear not to block, but if it's something resource intensive, of course in the end it must block somewhere. Per the documentation:
You can’t magically turn synchronous IO into asynchronous by wrapping it in a Future. If you can’t change the application’s architecture to avoid blocking operations, at some point that operation will have to be executed, and that thread is going to block. So in addition to enclosing the operation in a Future, it’s necessary to configure it to run in a separate execution context that has been configured with enough threads to deal with the expected concurrency.
This bit I'm not understanding: if some task that I'm doing via a Future is possibly being handled in a separate thread pool, how/what magic is Scala/Play doing in the framework to coordinate these threads such that whichever thread is listening to the HTTP socket blocks long enough to do all of the complex processing (DB loads, serialization to JSON, etc. etc.) -- in separate threads, and yet somehow returning to the original blocking thread that has to send something back to the client for that request?
Disclaimer: this is a simplified answer for the general problem, I don't want to make this even more complex by going inside Play and Akka internals.
One method is to have a thread listening to the socket, but not writing to it, let's call it A. A spans a Future that contains, on itself, all the data needed for the computation. It is important that you don't confuse the thread that does the processing with the data that is being processed, as the data (memory) is shared by all threads (and sometimes needs explicit synchronization). The future will be processed (eventually), by a thread B.
Now, do I need for A to block until B is done? It could (and in many general cases that might be the right solution), but in this case, we hardly want to stop listening to our socket. So no, we don't, A forgets everything about the message and carries on with its life.
So when B is done, the Future might be mapped or have a listener that sends the proper response. B itself can send it given the information that it has on the original message! You just need to be careful synchronizing access to the socket, to avoid colliding with a possible thread C that might have been processing a previous or later message in parallel.
Things can obviously get more complex by having threads spawning even more threads, queues where some threads write data and other read data, etc. (Play, being based in Akka, certainly includes a lot of message queues). But I hope to have convinced you that while this statement is correct:
You can’t magically turn synchronous IO into asynchronous by wrapping
it in a Future. If you can’t change the application’s architecture to
avoid blocking operations
Such a change in application's architecture is certainly possible in many (most?) cases, and certainly has been done inside Play.

Concurrency, how to create an efficient actor setup?

Alright so I have never done intense concurrent operations like this before, theres three main parts to this algorithm.
This all starts with a Vector of around 1 Million items.
Each item gets processed in 3 main stages.
Task 1: Make an HTTP Request, Convert received data into a map of around 50 entries.
Task 2: Receive the map and do some computations to generate a class instance based off the info found in the map.
Task 3: Receive the class and generate/add to multiple output files.
I initially started out by concurrently running task 1 with 64K entries across 64 threads (1024 entries per thread.). Generating threads in a for loop.
This worked well and was relatively fast, but I keep hearing about actors and how they are heaps better than basic Java threads/Thread pools. I've created a few actors etc. But don't know where to go from here.
Basically:
1. Are actors the right way to achieve fast concurrency for this specific set of tasks. Or is there another way I should go about it.
2. How do you know how many threads/actors are too many, specifically in task one, how do you know what the limit is on number of simultaneous connections is (Im on mac). Is there a golden rue to follow? How many threads vs how large per thread pool? And the actor equivalents?
3. Is there any code I can look at that implements actors for a similar fashion? All the code Im seeing is either getting an actor to print hello world, or super complex stuff.
1) Actors are a good choice to design complex interactions between components since they resemble "real life" a lot. You can see them as different people sending each other requests, it is very natural to model interactions. However, they are most powerful when you want to manage changing state in your application, which does not seem to be the case for you. You can achieve fast concurrency without actors. Up to you.
2) If none of your operations is blocking the best rule is amount of threads = amount of CPUs. If you use a non blocking HTTP client, and NIO when writing your output files then you should be fully non-blocking on IOs and can just safely set the thread count for your app to the CPU count on your machine.
3) The documentation on http://akka.io is very very good and comprehensive. If you have no clue how to use the actor model I would recommend getting a book - not necessarily about Akka.
1) It sounds like most of your steps aren't stateful, in which case actors add complication for no real benefit. If you need to coordinate multiple tasks in a mutable way (e.g. for generating the output files) then actors are a good fit for that piece. But the HTTP fetches should probably just be calls to some nonblocking HTTP library (e.g. spray-client - which will in fact use actors "under the hood", but in a way that doesn't expose the statefulness to you).
2) With blocking threads you pretty much have to experiment and see how many you can run without consuming too many resources. Worry about how many simultaneous connections the remote system can handle rather than hitting any "connection limits" on your own machine (it's possible you'll hit the file descriptor limit but if so best practice is just to increase it). Once you figure that out, there's no value in having more threads than the number of simultaneous connections you want to make.
As others have said, with nonblocking everything you should probably just have a number of threads similar to the number of CPU cores (I've also heard "2x number of CPUs + 1", on the grounds that that ensures there will always be a thread available whenever a CPU is idle).
With actors I wouldn't worry about having too many. They're very lightweight.
If you have really no expierience in Akka try to start with something simple like doing a one-to-one actor-thread rewriting of your code. This will be easier to grasp how things work in akka.
Spin two actors at the begining one for receiving requests and one for writting to the output file. Then when request is received create an actor in request-receiver actor that will do the computation and send the result to the writting actor.

Is Scala's actors similar to Go's coroutines?

If I wanted to port a Go library that uses Goroutines, would Scala be a good choice because its inbox/akka framework is similar in nature to coroutines?
Nope, they're not. Goroutines are based on the theory of Communicating Sequential Processes, as specified by Tony Hoare in 1978. The idea is that there can be two processes or threads that act independently of one another but share a "channel," which one process/thread puts data into and the other process/thread consumes. The most prominent implementations you'll find are Go's channels and Clojure's core.async, but at this time they are limited to the current runtime and cannot be distributed, even between two runtimes on the same physical box.
CSP evolved to include a static, formal process algebra for proving the existence of deadlocks in code. This is a really nice feature, but neither Goroutines nor core.async currently support it. If and when they do, it will be extremely nice to know before running your code whether or not a deadlock is possible. However, CSP does not support fault tolerance in a meaningful way, so you as the developer have to figure out how to handle failure that can occur on both sides of channels, and such logic ends up getting strewn about all over the application.
Actors, as specified by Carl Hewitt in 1973, involve entities that have their own mailbox. They are asynchronous by nature, and have location transparency that spans runtimes and machines - if you have a reference (Akka) or PID (Erlang) of an actor, you can message it. This is also where some people find fault in Actor-based implementations, in that you have to have a reference to the other actor in order to send it a message, thus coupling the sender and receiver directly. In the CSP model, the channel is shared, and can be shared by multiple producers and consumers. In my experience, this has not been much of an issue. I like the idea of proxy references that mean my code is not littered with implementation details of how to send the message - I just send one, and wherever the actor is located, it receives it. If that node goes down and the actor is reincarnated elsewhere, it's theoretically transparent to me.
Actors have another very nice feature - fault tolerance. By organizing actors into a supervision hierarchy per the OTP specification devised in Erlang, you can build a domain of failure into your application. Just like value classes/DTOs/whatever you want to call them, you can model failure, how it should be handled and at what level of the hierarchy. This is very powerful, as you have very little failure handling capabilities inside of CSP.
Actors are also a concurrency paradigm, where the actor can have mutable state inside of it and a guarantee of no multithreaded access to the state, unless the developer building an actor-based system accidentally introduces it, for example by registering the Actor as a listener for a callback, or going asynchronous inside the actor via Futures.
Shameless plug - I'm writing a new book with the head of the Akka team, Roland Kuhn, called Reactive Design Patterns where we discuss all of this and more. Green threads, CSP, event loops, Iteratees, Reactive Extensions, Actors, Futures/Promises, etc. Expect to see a MEAP on Manning by early next month.
There are two questions here:
Is Scala a good choice to port goroutines?
This is an easy question, since Scala is a general purpose language, which is no worse or better than many others you can choose to "port goroutines".
There are of course many opinions on why Scala is better or worse as a language (e.g. here is mine), but these are just opinions, and don't let them stop you.
Since Scala is general purpose, it "pretty much" comes down to: everything you can do in language X, you can do in Scala. If it sounds too broad.. how about continuations in Java :)
Are Scala actors similar to goroutines?
The only similarity (aside the nitpicking) is they both have to do with concurrency and message passing. But that is where the similarity ends.
Since Jamie's answer gave a good overview of Scala actors, I'll focus more on Goroutines/core.async, but with some actor model intro.
Actors help things to be "worry free distributed"
Where a "worry free" piece is usually associated with terms such as: fault tolerance, resiliency, availability, etc..
Without going into grave details how actors work, in two simple terms actors have to do with:
Locality: each actor has an address/reference that other actors can use to send messages to
Behavior: a function that gets applied/called when the message arrives to an actor
Think "talking processes" where each process has a reference and a function that gets called when a message arrives.
There is much more to it of course (e.g. check out Erlang OTP, or akka docs), but the above two is a good start.
Where it gets interesting with actors is.. implementation. Two big ones, at the moment, are Erlang OTP and Scala AKKA. While they both aim to solve the same thing, there are some differences. Let's look at a couple:
I intentionally do not use lingo such as "referential transparency", "idempotence", etc.. they do no good besides causing confusion, so let's just talk about immutability [a can't change that concept]. Erlang as a language is opinionated, and it leans towards strong immutability, while in Scala it is too easy to make actors that change/mutate their state when a message is received. It is not recommended, but mutability in Scala is right there in front of you, and people do use it.
Another interesting point that Joe Armstrong talks about is the fact that Scala/AKKA is limited by the JVM which just wasn't really designed with "being distributed" in mind, while Erlang VM was. It has to do with many things such as: process isolation, per process vs. the whole VM garbage collection, class loading, process scheduling and others.
The point of the above is not to say that one is better than the other, but it's to show that purity of the actor model as a concept depends on its implementation.
Now to goroutines..
Goroutines help to reason about concurrency sequentially
As other answers already mentioned, goroutines take roots in Communicating Sequential Processes, which is a "formal language for describing patterns of interaction in concurrent systems", which by definition can mean pretty much anything :)
I am going to give examples based on core.async, since I know internals of it better than Goroutines. But core.async was built after the Goroutines/CSP model, so there should not be too many differences conceptually.
The main concurrency primitive in core.async/Goroutine is a channel. Think about a channel as a "queue on rocks". This channel is used to "pass" messages. Any process that would like to "participate in a game" creates or gets a reference to a channel and puts/takes (e.g. sends/receives) messages to/from it.
Free 24 hour Parking
Most of work that is done on channels usually happens inside a "Goroutine" or "go block", which "takes its body and examines it for any channel operations. It will turn the body into a state machine. Upon reaching any blocking operation, the state machine will be 'parked' and the actual thread of control will be released. This approach is similar to that used in C# async. When the blocking operation completes, the code will be resumed (on a thread-pool thread, or the sole thread in a JS VM)" (source).
It is a lot easier to convey with a visual. Here is what a blocking IO execution looks like:
You can see that threads mostly spend time waiting for work. Here is the same work but done via "Goroutine"/"go block" approach:
Here 2 threads did all the work, that 4 threads did in a blocking approach, while taking the same amount of time.
The kicker in above description is: "threads are parked" when they have no work, which means, their state gets "offloaded" to a state machine, and the actual live JVM thread is free to do other work (source for a great visual)
note: in core.async, channel can be used outside of "go block"s, which will be backed by a JVM thread without parking ability: e.g. if it blocks, it blocks the real thread.
Power of a Go Channel
Another huge thing in "Goroutines"/"go blocks" is operations that can be performed on a channel. For example, a timeout channel can be created, which will close in X milliseconds. Or select/alt! function that, when used in conjunction with many channels, works like a "are you ready" polling mechanism across different channels. Think about it as a socket selector in non blocking IO. Here is an example of using timeout channel and alt! together:
(defn race [q]
(searching [:.yahoo :.google :.bing])
(let [t (timeout timeout-ms)
start (now)]
(go
(alt!
(GET (str "/yahoo?q=" q)) ([v] (winner :.yahoo v (took start)))
(GET (str "/bing?q=" q)) ([v] (winner :.bing v (took start)))
(GET (str "/google?q=" q)) ([v] (winner :.google v (took start)))
t ([v] (show-timeout timeout-ms))))))
This code snippet is taken from wracer, where it sends the same request to all three: Yahoo, Bing and Google, and returns a result from the fastest one, or times out (returns a timeout message) if none returned within a given time. Clojure may not be your first language, but you can't disagree on how sequential this implementation of concurrency looks and feels.
You can also merge/fan-in/fan-out data from/to many channels, map/reduce/filter/... channels data and more. Channels are also first class citizens: you can pass a channel to a channel..
Go UI Go!
Since core.async "go blocks" has this ability to "park" execution state, and have a very sequential "look and feel" when dealing with concurrency, how about JavaScript? There is no concurrency in JavaScript, since there is only one thread, right? And the way concurrency is mimicked is via 1024 callbacks.
But it does not have to be this way. The above example from wracer is in fact written in ClojureScript that compiles down to JavaScript. Yes, it will work on the server with many threads and/or in a browser: the code can stay the same.
Goroutines vs. core.async
Again, a couple of implementation differences [there are more] to underline the fact that theoretical concept is not exactly one to one in practice:
In Go, a channel is typed, in core.async it is not: e.g. in core.async you can put messages of any type on the same channel.
In Go, you can put mutable things on a channel. It is not recommended, but you can. In core.async, by Clojure design, all data structures are immutable, hence data inside channels feels a lot safer for its wellbeing.
So what's the verdict?
I hope the above shed some light on differences between the actor model and CSP.
Not to cause a flame war, but to give you yet another perspective of let's say Rich Hickey:
"I remain unenthusiastic about actors. They still couple the producer with the consumer. Yes, one can emulate or implement certain kinds of queues with actors (and, notably, people often do), but since any actor mechanism already incorporates a queue, it seems evident that queues are more primitive. It should be noted that Clojure's mechanisms for concurrent use of state remain viable, and channels are oriented towards the flow aspects of a system."(source)
However, in practice, Whatsapp is based on Erlang OTP, and it seemed to sell pretty well.
Another interesting quote is from Rob Pike:
"Buffered sends are not confirmed to the sender and can take arbitrarily long. Buffered channels and goroutines are very close to the actor model.
The real difference between the actor model and Go is that channels are first-class citizens. Also important: they are indirect, like file descriptors rather than file names, permitting styles of concurrency that are not as easily expressed in the actor model. There are also cases in which the reverse is true; I am not making a value judgement. In theory the models are equivalent."(source)
Moving some of my comments to an answer. It was getting too long :D (Not to take away from jamie and tolitius's posts; they're both very useful answers.)
It isn't quite true that you could do the exact same things that you do with goroutines in Akka. Go channels are often used as synchronization points. You cannot reproduce that directly in Akka. In Akka, post-sync processing has to be moved into a separate handler ("strewn" in jamie's words :D). I'd say the design patterns are different. You can kick off a goroutine with a chan, do some stuff, and then <- to wait for it to finish before moving on. Akka has a less-powerful form of this with ask, but ask isn't really the Akka way IMO.
Chans are also typed, while mailboxes are not. That's a big deal IMO, and it's pretty shocking for a Scala-based system. I understand that become is hard to implement with typed messages, but maybe that indicates that become isn't very Scala-like. I could say that about Akka generally. It often feels like its own thing that happens to run on Scala. Goroutines are a key reason Go exists.
Don't get me wrong; I like the actor model a lot, and I generally like Akka and find it pleasant to work in. I also generally like Go (I find Scala beautiful, while I find Go merely useful; but it is quite useful).
But fault tolerance is really the point of Akka IMO. You happen to get concurrency with that. Concurrency is the heart of goroutines. Fault-tolerance is a separate thing in Go, delegated to defer and recover, which can be used to implement quite a bit of fault tolerance. Akka's fault tolerance is more formal and feature-rich, but it can also be a bit more complicated.
All said, despite having some passing similarities, Akka is not a superset of Go, and they have significant divergence in features. Akka and Go are quite different in how they encourage you to approach problems, and things that are easy in one, are awkward, impractical, or at least non-idiomatic in the other. And that's the key differentiators in any system.
So bringing it back to your actual question: I would strongly recommend rethinking the Go interface before bringing it to Scala or Akka (which are also quite different things IMO). Make sure you're doing it the way your target environment means to do things. A straight port of a complicated Go library is likely to not fit in well with either environment.
These are all great and thorough answers. But for a simple way to look at it, here is my view. Goroutines are a simple abstraction of Actors. Actors are just a more specific use-case of Goroutines.
You could implement Actors using Goroutines by creating the Goroutine aside a Channel. By deciding that the channel is 'owned' by that Goroutine you're saying that only that Goroutine will consume from it. Your Goroutine simply runs an inbox-message-matching loop on that Channel. You can then simply pass the Channel around as the 'address' of your "Actor" (Goroutine).
But as Goroutines are an abstraction, a more general design than actors, Goroutines can be used for far more tasks and designs than Actors.
A trade-off though, is that since Actors are a more specific case, implementations of actors like Erlang can optimize them better (rail recursion on the inbox loop) and can provide other built-in features more easily (multi process and machine actors).
can we say that in Actor Model, the addressable entity is the Actor, the recipient of message. whereas in Go channels, the addressable entity is the channel, the pipe in which message flows.
in Go channel, you send message to the channel, and any number of recipients can be listening, and one of them will receive the message.
in Actor only one actor to whose actor-ref you send the message, will receive the message.

Can a server handle multiple sockets in a single thread?

I'm writing a test program that needs to emulate several connections between virtual machines, and it seems like the best way to do that is to use Unix domain sockets, for various reasons. It doesn't really matter whether I use SOCK_STREAM or SOCK_DGRAM, but it seems like SOCK_STREAM is easier/simpler for my usage.
My problem seems to be a little backwards from the typical scenario. I want to have a single client communicating with the server over 4 distinct sockets. (I could have 4 clients with one socket each, but that distinction shouldn't matter.) Now, the thing I'm emulating doesn't have multiple threads and gets an interrupt whenever a data packet is received over one of the "sockets". Is there some easy way to emulate this with Unix sockets?
I believe that I have to do the socket(), bind(), and listen() for all 4 sockets first, then do an accept() for all 4, and do fcntl( fd, F_SETFF, FNDELAY ) for each one to make them nonblocking, so that I can check each one for data with read() in a round-robin fashion. Is there any way to make it interrupt-driven or event-driven, so that my main loop only checks for data in the socket if there's data there? Or is it better to poll them all like this?
Yes. Handling multiple connections is almost synonymous with "server", and they are often single threaded -- but please not this way:
check each one for data with read() in a round-robin fashion
That would require, as you mention, non-blocking sockets and some kind of delay to prevent your "round-robin" from becoming a system killing busy loop.
A major problem with that is the granularity of the delay. You can't make it too small, or the loop will still hog too much CPU time when nothing is happening. But what about when something is happening, and that something is data incoming simultaneously on multiple connections? Now your delay can produce a snowballing backlog of tish leading to refused connections, etc.
It just is not feasible, and no one writes a server that way, although I am sure anyone would give it serious thought if they were unaware of the library functions intended to tackle the problem. Note that networking is a platform specific issue, so these are not actually part of the C standard (which does not deal with sockets at all).
The functions are select(), poll(), and epoll(); the last one is linux specific and the other two are POSIX. The basic idea is that the call blocks, waiting until one or more of any number of active connections is ready to read or write. Waiting for a socket to be ready to write only meaningfully applies to NON_BLOCK sockets. You don't have to use NON_BLOCK, however, and the select() call blocks regardless. Using NON_BLOCK on the individual sockets makes the implementation more complex, but increases performance potential in a single threaded server -- this is the idea behind asynchronous servers (such as nginx), a paradigm which contrasts with the more traditional threaded synchronous model.
However, I would recommend that you not use NON_BLOCK initially because of the added complexity. When/if it ends up being called for, you'll know. You still do not need threads.
There are many, many, many examples and tutorials around about how to use select() in particular.

What's the best way to handle incoming messages?

I'm writing a server for an online game, that should be able to handle 1,000-2,000 clients in the end. The 3 ways I found to do this were basically:
1 thread/connection (blocking)
Making a list of clients, and loop through them (non-blocking)
Select (Basically a blocking statement for all clients at once with optional timeout?)
In the past I've used 1, but as we all know, it doesn't scale well. 2 is okay, but I have mixed feelings, about one client technically being able to make everyone else freeze. 3 sounds interesting (a bit better than 2), but I've heard it's not suitable for too many connections.
So, what would be the best way to do it (in D)? Are there other options?
The usual approach is closest to 3: asynchronous programming with a higher-performance select alternative, such as the poll or epoll system calls on Linux, IOCP on Windows, or higher-level libraries wrapping them. D does not support them directly, but you can find D bindings or 3rd-party D libraries (e.g. Tango) providing support for them.
Higher-performance servers (e.g. nginx) use one thread/process per CPU core, and use asynchronous event handling within that thread/process.
One option to consider is to have a single thread that runs the select/pole/epoll but not process the results. Rather it queues up connections known to have results and lets a thread pool feed from that. If checking that a full request has been read in is cheap, you might do that in the poll thread with non-blocking IO and only queue up full requests.
I don't know if D provides any support for any of that aside from (possibly) the inter-thread communication and queuing.