What solutions exist for background job processing in Common Lisp?

I need a solution for background job processing: a task queue whose workers can be remote processes on different machines.
I've searched the Internet but found only Psychiq, which is in alpha and not recommended for production.
I can't believe that a language as mature as Common Lisp has no other solutions.
Where are they?
Update:
Possible solutions:
lfarm (suggested by #coredump).
Gearman, with client and worker built on cl-gearman (which I found myself in yet another Google search).

I am not sure if this is exactly what you are after, but LFARM might be a good candidate:
lfarm is a distributed version of lparallel which replaces worker threads with remote processes. For example lfarm:pmap will subdivide the input sequence(s), send the parts to remote machines for mapping, and then combine the results. Likewise lfarm:future wraps remote task execution in the metaphor of promises. Most of the lparallel kernel API is retained with minor variations.
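The subdivide / map / combine idea behind lfarm:pmap can be illustrated locally. This C sketch is not lfarm's API and does no networking. Each chunk is mapped in-process by a hypothetical map_chunk helper that stands in for a remote worker, and square is a placeholder for the real per-element work.

```c
#include <stddef.h>

/* Placeholder per-element work; in lfarm this would run on a remote worker. */
static int square(int x) { return x * x; }

/* Map one contiguous chunk: the unit of work a remote machine would receive. */
static void map_chunk(const int *in, int *out, size_t len) {
    for (size_t i = 0; i < len; i++)
        out[i] = square(in[i]);
}

/* Subdivide `in` into `parts` chunks, map each one, keep results in order. */
void pmap_sketch(const int *in, int *out, size_t n, size_t parts) {
    if (parts == 0) parts = 1;
    size_t base = n / parts, extra = n % parts, start = 0;
    for (size_t p = 0; p < parts; p++) {
        size_t len = base + (p < extra ? 1 : 0);  /* spread the remainder */
        map_chunk(in + start, out + start, len);  /* "send" chunk p */
        start += len;
    }
    /* Combining is trivial here: each chunk wrote into its own slice of out. */
}
```

In a distributed version the interesting parts are exactly the ones this sketch skips: shipping each chunk to a worker and collecting the partial results in the original order.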
The GitHub repository has some examples.
See also Erlangen for a distributed Erlang-like approach based on native threads.
Erlangen brings distributed, asynchronous message passing to Clozure Common Lisp. It orchestrates Clozure CL processes (native threads) using message passing, and encourages fault-tolerant software architectures using supervision trees. It is also transparently distributed: all its features work seamlessly across IP networks. Thus, it can be used to build applications across multiple Clozure CL instances on different hosts. Erlangen borrows many ideas from Erlang/OTP, hence the name. (It's a town!)

Related

ZeroMQ Choosing Correct Client-Worker Model for a Call Center

I have a project that needs to be written in Perl, so I've chosen ZeroMQ.
There is a single client program generating work for a variable number of workers. The workers are real human operators who complete a task and then request a new one. The job of the client program is to keep all available workers busy all day. It's a call center.
Each worker can process only one task at a time, and some time may pass before it requests a new task. The number of workers may also vary during the day.
The client needs to keep a queue of tasks ready to give to workers as and when they request them. Whenever the client queue gets low the client can generate more tasks to top-up the queue.
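The client-side bookkeeping described here (a queue of ready tasks, handed out one at a time as workers ask, and topped up when it runs low) can be sketched independently of any socket type. This is a single-process C model under stated assumptions: CAPACITY, LOW_WATER, generate_task and integer task ids are hypothetical, and the transport (workers sending a request message to the client) is left out.

```c
#include <stddef.h>

#define CAPACITY  16   /* hypothetical queue size */
#define LOW_WATER 4    /* hypothetical refill threshold */

typedef struct {
    int    tasks[CAPACITY];
    size_t head, count;
    int    next_id;    /* stands in for whatever the client really generates */
} task_queue;

/* Hypothetical task generator: a task is just an increasing id here. */
static int generate_task(task_queue *q) { return q->next_id++; }

/* Refill to capacity, but only once the queue has dropped below LOW_WATER. */
static void top_up(task_queue *q) {
    if (q->count >= LOW_WATER) return;
    while (q->count < CAPACITY) {
        q->tasks[(q->head + q->count) % CAPACITY] = generate_task(q);
        q->count++;
    }
}

void queue_init(task_queue *q) {
    q->head = q->count = 0;
    q->next_id = 1;
    top_up(q);
}

/* Called when a worker asks for work: hand out exactly one task. */
int next_task_for_worker(task_queue *q) {
    int t = q->tasks[q->head];
    q->head = (q->head + 1) % CAPACITY;
    q->count--;
    top_up(q);          /* refill lazily, so the queue is never empty */
    return t;
}
```

Because workers pull tasks only when they are ready, a busy operator is never pushed a second task; the client only has to answer requests and keep its own queue above the low-water mark.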
What design pattern (i.e. what ZeroMQ Socket combination) should I use for this? I've skimmed through all the patterns in the 0MQ Guide and can't find anything that matches this.
Thanks
Sure. There is no single archetype that matches the whole requirement list; use several ZeroMQ Scalable Formal Communication Patterns together.
A typical software project uses many ZeroMQ sockets (with various archetypes) as a node-to-node signalling and message-passing platform.
It is fair to note that automated load balancers may work fine for automated processes, but not always for processes executed by, or interacting with, humans.
Humans (both the call-centre agents and their line supervisors) introduce another layer of requirements: sometimes a workload-distribution logic other than plain round-robin is needed, and sometimes a call must be switched from agent A to agent B. A trivial archetype is simply not capable of that, and it may get into trouble if its hardwired logic runs into a collision (a mutually blocked REQ-REP stalemate being one such example).
So do not wait for one super-powered archetype; rather, create a smart network of behaviours that covers the event handling your distributed-computing problem requires.
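The REQ-REP stalemate mentioned above comes from the REQ socket's strict send/receive lockstep. The small C model here only illustrates that two-state machine and is not ZeroMQ code; the names are hypothetical. Real libzmq behaves similarly, rejecting an out-of-order operation on a REQ socket with errno EFSM.

```c
#include <stdbool.h>

/* Hypothetical model of the two states a REQ socket alternates between. */
typedef enum { REQ_READY_TO_SEND, REQ_AWAITING_REPLY } req_state;

typedef struct { req_state state; } req_model;

/* Refused (false) unless the previous request has been answered. */
bool req_send(req_model *s) {
    if (s->state != REQ_READY_TO_SEND) return false;
    s->state = REQ_AWAITING_REPLY;
    return true;
}

/* `reply_arrived` simulates whether the peer ever answered. */
bool req_recv(req_model *s, bool reply_arrived) {
    if (s->state != REQ_AWAITING_REPLY) return false;
    if (!reply_arrived) return false;    /* a lost reply leaves us stuck here */
    s->state = REQ_READY_TO_SEND;
    return true;
}
```

If the peer never answers, every further send is refused, which is the stalemate. The 0MQ Guide's reliable request-reply patterns work around it by polling with a timeout and, when no reply comes, closing and reopening the socket.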
There are many other aspects one ought to learn before putting the first ZeroMQ socket into service:
failure resilience
performance scaling
latency profiling (high-priority voice traffic vs. low-priority logging)
watchdog acknowledgements and timeout handling
cross-compatibility issues (version 2.1.x vs 3.x vs 4.x API)
processing robustness against a malfunctioning agent / malicious attack / deadly spurious traffic storms ... to name just a few problems.
The ZeroMQ toolbox has some built-ins for all of these, though some may need advanced thinking to handle the known constraints.
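Watchdog acknowledgements and timeout handling usually come down to recording when each peer was last heard from and treating silence beyond a deadline as failure. Here is a minimal C sketch of that bookkeeping. MAX_AGENTS and HEARTBEAT_TIMEOUT_MS are hypothetical, and the caller supplies the current time so the logic stays testable.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_AGENTS           8
#define HEARTBEAT_TIMEOUT_MS 3000   /* hypothetical deadline */

typedef struct {
    long last_seen_ms[MAX_AGENTS];
    bool alive[MAX_AGENTS];
} watchdog;

/* Record a heartbeat (or any message) from agent `id` at time `now_ms`. */
void wd_heard(watchdog *wd, size_t id, long now_ms) {
    if (id >= MAX_AGENTS) return;
    wd->last_seen_ms[id] = now_ms;
    wd->alive[id] = true;
}

/* Mark agents silent for longer than the timeout as dead and return how many
   were newly marked, so the caller can re-route their calls or tasks. */
size_t wd_sweep(watchdog *wd, long now_ms) {
    size_t newly_dead = 0;
    for (size_t i = 0; i < MAX_AGENTS; i++) {
        if (wd->alive[i] && now_ms - wd->last_seen_ms[i] > HEARTBEAT_TIMEOUT_MS) {
            wd->alive[i] = false;
            newly_dead++;
        }
    }
    return newly_dead;
}
```

In a ZeroMQ program, wd_heard would be called for every message received from a peer and wd_sweep on each poll timeout.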
The Best Next Step?
I would advocate Pieter Hintjens' fabulous book "Code Connected, Volume 1". For everyone who is serious about distributed processing, it is a must-read. Do not hesitate to check my other posts to find a direct URL to a PDF version of this ZeroMQ bible.
It is worth the time, and one's tears and sweat.

Is Communicating Sequential Processes [CSP] an alternative to the actor model in Scala?

In a 1978 paper, Hoare introduced an idea called Communicating Sequential Processes (CSP). It is used by Go, Occam, and Clojure's core.async.
Is it possible to use CSP as an alternative to the Actor Model in Scala? (I'm seeing JCSP but I'm wondering if this is the only option, if it is mature, and if anyone uses it).
EDIT: I'm also seeing Communicating Scala Objects as an alternative to JCSP in Scala. But both of these seem to be tied to real threads, which seems to miss one of the benefits of CSP: getting away from the memory cost of keeping large numbers of threads always active.
You should consult this document, but in general there are a few differences:
Channels are anonymous while actors have identities
In CSP, you use channels to transmit messages, but actors can directly contact each other.
In CSP communication is done in the form of rendezvous (i.e., it is synchronous). Actors support asynchronous message passing.
And yes, it is possible to use CSP as an alternative to the Actor model if these differences are acceptable in your situation. I don't have any experience with JCSP, but I wouldn't recommend that specific library, because as far as I can see there has been no activity in the project since 2011.
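To make the rendezvous difference concrete: with an unbuffered CSP-style channel, the sender does not continue until a receiver has taken the value, whereas an actor's send just drops the message into the target's mailbox and returns. This is a minimal C sketch of such a channel using POSIX threads. The chan_* and demo names are hypothetical, and like the Scala libraries mentioned in the question, it is built on real threads.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical unbuffered (rendezvous) channel carrying ints. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cv;
    int  value;
    bool full;    /* a sender has deposited a value */
    bool taken;   /* a receiver has picked that value up */
} chan_t;

void chan_init(chan_t *c) {
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cv, NULL);
    c->full = c->taken = false;
}

/* Returns only after a receiver has taken v: sender and receiver meet. */
void chan_send(chan_t *c, int v) {
    pthread_mutex_lock(&c->lock);
    while (c->full)                       /* wait for any earlier hand-off */
        pthread_cond_wait(&c->cv, &c->lock);
    c->value = v;
    c->full = true;
    c->taken = false;
    pthread_cond_broadcast(&c->cv);
    while (!c->taken)                     /* block until someone receives */
        pthread_cond_wait(&c->cv, &c->lock);
    c->full = false;                      /* hand-off complete */
    pthread_cond_broadcast(&c->cv);
    pthread_mutex_unlock(&c->lock);
}

int chan_recv(chan_t *c) {
    pthread_mutex_lock(&c->lock);
    while (!c->full || c->taken)          /* wait for an untaken value */
        pthread_cond_wait(&c->cv, &c->lock);
    int v = c->value;
    c->taken = true;
    pthread_cond_broadcast(&c->cv);
    pthread_mutex_unlock(&c->lock);
    return v;
}

/* Usage: a sender thread hands 42 to the calling thread. */
static chan_t demo_ch;

static void *demo_sender(void *arg) {
    (void)arg;
    chan_send(&demo_ch, 42);   /* does not return until chan_recv has run */
    return NULL;
}

int chan_demo(void) {
    pthread_t t;
    chan_init(&demo_ch);
    if (pthread_create(&t, NULL, demo_sender, NULL) != 0) return -1;
    int v = chan_recv(&demo_ch);
    pthread_join(t, NULL);
    return v;
}
```

Note also that the channel is anonymous: neither side names the other, which is the first difference in the list above.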

Event or polled based embedded MCU system architecture?

I have prior experience writing both event-based and poll-based embedded systems (for tiny MCUs with no preemptive OS).
In an event-based system, tasks usually receive events (messages) on a queue and handle them in turn.
In a poll-based system, tasks poll status at a certain interval and respond to changes.
Which architecture do you prefer? Can both co-exist?
UPDATE: POINTS MADE
POLL BASED
- Tight coupling related to timing aspects (#Lundin)
* Can co-exist alongside event system using queues (#embedded.kyle)
* Fine for smaller programs (#Lundin)
EVENT BASED
+ More flexible system in the long run (#embedded.kyle)
- Adding an RTOS adds complexity (#Lundin)
* Small programs = state-machine controlled (#Lundin)
* Can be implemented using queues and a "super-loop" (inside controller/main) (#embedded.kyle)
* The only true "events" are hardware interrupts (#Lundin)
RELATED QUESTIONS
* Looking for a comparison of different scheduling algorithms for a Finite State Machine (#embedded.kyle)
RELATED INFO
* "Prefer Using Active Objects Instead of Naked Threads" (#Miro)
http://www.drdobbs.com/parallel/prefer-using-active-objects-instead-of-n/225700095
* "Use Threads Correctly = Isolation + Asynchronous Messages" (#Miro)
http://www.drdobbs.com/parallel/use-threads-correctly-isolation-asynch/215900465
There is really no such thing as "event-driven" on a bare bone MCU platform, despite what the buzzword-spitters are trying to tell you. The only kind of true events you can receive are hardware interrupts.
Depending on the nature of the application and its real time requirements, interrupts may or may not be suitable. Generally, it is far easier to achieve deterministic real time with a polling system. However, systems relying solely on polling are very hard to maintain, because you get tight coupling between all timing aspects.
Suppose you try to start up an LCD, which is slow. Instead of polling some timer repeatedly while burning CPU cycles in an empty loop, you might decide to receive some data over a bus in the meantime. Then you want to print the received data on the LCD. Such a design creates a tight coupling between the LCD start-up time and the serial bus, and another between the serial bus and the printing of data. From an object-oriented point of view, these things are not related to each other at all. If you were to speed up the serial bus at some point in the future, you could suddenly encounter LCD printing bugs, because the LCD has not finished starting up when you try to print on it.
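One way to remove that particular coupling while staying with polling is to give the LCD its own small state machine with its own timer, and have the printing code ask the LCD whether it is ready instead of relying on how long the serial transfer happens to take. This is a minimal C sketch. It assumes a millisecond tick supplied by the caller and a hypothetical 50 ms start-up time, and the hardware accesses are left as comments.

```c
#include <stdbool.h>
#include <stdint.h>

#define LCD_STARTUP_MS 50u   /* hypothetical power-up delay */

typedef enum { LCD_OFF, LCD_STARTING, LCD_READY } lcd_state;

typedef struct {
    lcd_state state;
    uint32_t  started_at;
} lcd_t;

void lcd_begin(lcd_t *lcd, uint32_t now_ms) {
    /* hardware: drive the LCD's enable/reset lines here */
    lcd->state = LCD_STARTING;
    lcd->started_at = now_ms;
}

/* Called once per pass of the main loop; never busy-waits. The unsigned
   subtraction keeps working when the tick counter wraps around. */
void lcd_poll(lcd_t *lcd, uint32_t now_ms) {
    if (lcd->state == LCD_STARTING &&
        (uint32_t)(now_ms - lcd->started_at) >= LCD_STARTUP_MS) {
        /* hardware: send the initialisation command sequence here */
        lcd->state = LCD_READY;
    }
}

bool lcd_ready(const lcd_t *lcd) { return lcd->state == LCD_READY; }
```

In the super-loop, the serial driver and lcd_poll are each called on every pass, and the code that prints received data checks lcd_ready() first, so a faster bus can no longer cause printing before start-up has finished.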
In a small program, it is perfectly fine to use polling like in the above example. But if the program has potential of growing, polling will make it very complex and the tight coupling will ultimately lead to many strange and fatal bugs.
On the other hand, multithreading and an RTOS add quite a lot of extra complexity, which in turn can lead to bugs as well. Where to draw the line isn't simple to determine.
From personal experience, I'd say that any program smaller than 20-30k LOC will not benefit from scheduling and multitasking beyond simple state machines. If the program gets larger than that, I'd consider a multitasking RTOS.
Also, low-end MCUs (8- and 16-bitters) are far from suitable to run an OS. If you find that you need an OS to handle complexity on an 8- or 16-bit platform, you probably picked the wrong MCU to begin with. I'd be sceptical of any attempt to introduce an OS on anything smaller than a 32-bitter.
Actually, event-driven programming and threads can be combined and the resulting pattern is widely known as "active objects" or "actors".
Active objects (actors) are encapsulated, event-driven state machines, which communicate with one another asynchronously by posting events to each other. Active objects process all events in their own thread of execution (at least conceptually, if a cooperative scheduler is used), so they avoid by design most concurrency hazards.
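Here is a minimal sketch of the idea in C, under simplifying assumptions. Each active object owns a small event queue and a handler (its state machine). ao_post only enqueues and returns, and a cooperative scheduler in the main loop dispatches one event at a time, so each handler runs to completion without preemption. This is not the API of the QP frameworks mentioned in this answer; all names and QLEN are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define QLEN 8   /* hypothetical per-object queue depth */

typedef struct active_object active_object;
typedef void (*handler_fn)(active_object *self, int event);

struct active_object {
    handler_fn     handler;   /* the object's event-driven state machine */
    active_object *peer;      /* whom it posts to (may be NULL) */
    int            state;     /* private: touched only by its own handler */
    int            queue[QLEN];
    size_t         head, count;
};

/* Asynchronous post: enqueue and return at once, never run the handler. */
bool ao_post(active_object *ao, int event) {
    if (ao->count == QLEN) return false;              /* queue full */
    ao->queue[(ao->head + ao->count) % QLEN] = event;
    ao->count++;
    return true;
}

/* Cooperative scheduler: round-robin over objects, one event each, every
   handler running to completion. Returns once all queues are empty (a real
   MCU would go to sleep there instead) and reports events dispatched. */
size_t ao_run(active_object **aos, size_t n) {
    size_t dispatched = 0;
    bool busy = true;
    while (busy) {
        busy = false;
        for (size_t i = 0; i < n; i++) {
            active_object *ao = aos[i];
            if (ao->count == 0) continue;
            int ev = ao->queue[ao->head];
            ao->head = (ao->head + 1) % QLEN;
            ao->count--;
            ao->handler(ao, ev);
            dispatched++;
            busy = true;
        }
    }
    return dispatched;
}

/* Example handler: accumulate events, forward ten times the value to peer. */
void accumulate_and_forward(active_object *self, int event) {
    self->state += event;
    if (self->peer != NULL)
        ao_post(self->peer, event * 10);
}
```

Since an object's state is only ever touched by its own handler, and handlers never run concurrently, no locks are needed around that state.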
Actors and active objects are all the rage (again) in general-purpose computing (you can search for Erlang, Scala, Akka). Herb Sutter has written a couple of good articles that explain the "active object" pattern: "Prefer Using Active Objects Instead of Naked Threads" (http://www.drdobbs.com/parallel/prefer-using-active-objects-instead-of-n/225700095) and "Use Threads Correctly = Isolation + Asynchronous Messages" (http://www.drdobbs.com/parallel/use-threads-correctly-isolation-asynch/215900465)
Here is what Herb says in the first of these articles:
"Using raw threads directly is trouble for a number of reasons ...
Active objects dramatically improve our ability to reason about our thread's code and operation by giving us higher-level abstractions and idioms that raise the semantic level of our program and let us express our intent more directly. As with all good patterns, we also get better vocabulary to talk about our design. Note that active objects aren't a novelty: UML and various libraries have provided support for active classes"
So, all this is really not new. But what's perhaps less known, especially in the embedded systems community, is that active objects are not only fully applicable to the embedded systems, but they are actually a perfect match for embedded and they are lighter than a traditional RTOS.
I've been using the event-driven active objects for over a decade now and have created the QP family of active object frameworks for embedded systems (see http://www.state-machine.com/). I would never go back to the polling "superloop" or the raw RTOS.
I prefer whichever architecture is best suited to the application at hand.
Both can co-exist in a multilevel queue architecture. One queue works on a poll basis, running in the main loop, while another, most likely tasked with higher-priority events, uses interrupt-based preemption.
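A common way to wire the two levels together is a single-producer/single-consumer ring buffer. The interrupt handler (high priority) only pushes a byte and returns, and the main loop (low priority) polls the buffer and does the slow processing. This C sketch assumes a hypothetical UART receive interrupt that would call isr_push with the byte read from the data register. It also assumes a single-core MCU where 8-bit index reads and writes are atomic, which is what makes plain volatile indices sufficient here.

```c
#include <stdbool.h>
#include <stdint.h>

#define RB_SIZE 16u   /* power of two, so indices wrap with a mask */

static volatile uint8_t rb_data[RB_SIZE];
static volatile uint8_t rb_head;   /* written only by the ISR */
static volatile uint8_t rb_tail;   /* written only by the main loop */

/* High-priority level: called from the (hypothetical) UART RX interrupt. */
bool isr_push(uint8_t byte) {
    uint8_t next = (uint8_t)((rb_head + 1u) & (RB_SIZE - 1u));
    if (next == rb_tail) return false;   /* full: drop it, keep the ISR short */
    rb_data[rb_head] = byte;
    rb_head = next;                      /* publish only after storing data */
    return true;
}

/* Low-priority level: polled from the main loop. */
bool main_loop_pop(uint8_t *out) {
    if (rb_tail == rb_head) return false;   /* nothing pending */
    *out = rb_data[rb_tail];
    rb_tail = (uint8_t)((rb_tail + 1u) & (RB_SIZE - 1u));
    return true;
}
```

One slot is always left empty to tell "full" from "empty", so this buffer holds at most RB_SIZE - 1 bytes.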
See my answer to this SO question for a more detailed explanation and comparison of the different scheduling algorithms.

Windows message pump

This is just a technical question to improve my understanding of OS architecture.
I understand that when the Application.Run() method is executed, a new form with its message pump is created. From MSDN and other online articles, I understand its thread-safe nature, and I also understand that Windows OS components such as the HAL layer, core OS services, and applications at the top of the hierarchy all communicate with one another using messaging too.
Is this specific to Windows, or does it happen in the Linux environment too?
Can this be thought of as a semaphore? Or does the definition and context of a semaphore only make sense in a multi-threaded environment?
Please advise.
Thanks,
Subbu
There are many ways processes can communicate, collectively called IPC (inter-process communication). For historical reasons, UNIX-like systems use mechanisms other than the message loop for communicating between processes. UNIX processes usually communicate through pipes (one can think of them as temporary files that can only be written by one process and read by another), signals (code preempting the current execution of a process) or process return values (similar to a function returning). There are many other ways to communicate (sockets, shared memory, files), but these are the most common.
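Here is a minimal example of the pipe mechanism using the POSIX pipe(), fork(), read(), write() and waitpid() calls. The child writes a message into the pipe and the parent reads it until end-of-file. The function name and message text are only illustrative.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child writes `msg` into a pipe; parent reads it into buf (NUL-terminated).
   Returns the number of bytes read, or -1 on error. */
ssize_t pipe_roundtrip(const char *msg, char *buf, size_t bufsize) {
    int fds[2];                        /* fds[0]: read end, fds[1]: write end */
    if (bufsize == 0 || pipe(fds) == -1) return -1;

    pid_t pid = fork();
    if (pid == -1) { close(fds[0]); close(fds[1]); return -1; }

    if (pid == 0) {                    /* child: the writer */
        close(fds[0]);
        ssize_t w = write(fds[1], msg, strlen(msg));
        close(fds[1]);
        _exit(w == (ssize_t)strlen(msg) ? 0 : 1);
    }

    close(fds[1]);                     /* parent: the reader */
    size_t total = 0;
    ssize_t n;
    while (total < bufsize - 1 &&
           (n = read(fds[0], buf + total, bufsize - 1 - total)) > 0)
        total += (size_t)n;            /* read() returns 0 once the child closes */
    close(fds[0]);
    buf[total] = '\0';

    int status;
    if (waitpid(pid, &status, 0) == -1) return -1;   /* reap the child */
    return (ssize_t)total;
}
```

The parent must close its own copy of the write end; otherwise read() would never see end-of-file.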
As for semaphores: I am not sure how they relate to message passing; semaphores are objects designed to let programmers create critical sections of code. Because in UNIX a semaphore can be shared even between different processes (not only between different threads in one process), semaphores make sense in any multi-process OS (which is almost every OS today), even one with no threading support.
Semaphores can even be used with fibrils: userspace threads that are not preempted when their time quantum runs out, as threads are, but that yield control to another fibril manually (for example, when a fibril is about to begin a long blocking operation such as reading data from the hard disk, it may request the data and, instead of blocking, switch to another fibril that wants the CPU).
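The manual yielding that distinguishes fibrils from preempted threads can be demonstrated with the <ucontext.h> calls getcontext(), makecontext() and swapcontext(). These have been removed from recent POSIX editions but are still provided by glibc, so treat this as a Linux-oriented sketch. The trace string only records the order in which code runs, and the names are hypothetical.

```c
#include <string.h>
#include <ucontext.h>

static ucontext_t main_ctx, fib_ctx;
static char fib_stack[64 * 1024];      /* the fibril's own stack */
static char trace[16];
static size_t trace_len;

static void note(char c) { trace[trace_len++] = c; trace[trace_len] = '\0'; }

static void fibril_body(void) {
    note('b');                         /* start some work */
    swapcontext(&fib_ctx, &main_ctx);  /* yield voluntarily, e.g. before a slow read */
    note('d');                         /* resumed later */
}                                      /* returning switches to uc_link */

const char *run_fibril_demo(void) {
    trace_len = 0;
    trace[0] = '\0';
    getcontext(&fib_ctx);
    fib_ctx.uc_stack.ss_sp = fib_stack;
    fib_ctx.uc_stack.ss_size = sizeof fib_stack;
    fib_ctx.uc_link = &main_ctx;       /* where to go when fibril_body returns */
    makecontext(&fib_ctx, fibril_body, 0);

    note('a');
    swapcontext(&main_ctx, &fib_ctx);  /* run the fibril until it yields */
    note('c');
    swapcontext(&main_ctx, &fib_ctx);  /* resume it until it finishes */
    note('e');
    return trace;                      /* "abcde": strictly interleaved */
}
```

Nothing here is ever interrupted: control changes hands only at the explicit swapcontext() calls, which is exactly the cooperative property the answer describes.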
Unix systems have System V message queues:
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
int msgsnd(int msqid, const void *msgp, size_t msgsz, int msgflg);
ssize_t msgrcv(int msqid, void *msgp, size_t msgsz, long msgtyp, int msgflg);
which are much less used than Windows messages but operate in a very similar fashion. A very similar concept is CSP (communicating sequential processes), which the Go language implements nicely; it is an excellent multitasking paradigm because it does not suffer from exponential complexity growth. I would recommend that Unix system programmers use message queues more.
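For reference, here is a minimal round trip through a System V message queue using the functions declared above, plus msgget() to create a private queue and msgctl() with IPC_RMID to remove it afterwards. The message structure must begin with a positive long type field, and the size passed to msgsnd()/msgrcv() counts only the payload after it. The demo_msg layout, function name and text are illustrative.

```c
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/types.h>

struct demo_msg {
    long mtype;          /* must be > 0 */
    char mtext[64];      /* payload */
};

/* Send `text` through a fresh private queue and read it back into out
   (outsize must be > 0). Returns 0 on success, -1 on failure. */
int msgq_roundtrip(const char *text, char *out, size_t outsize) {
    int qid = msgget(IPC_PRIVATE, IPC_CREAT | 0600);
    if (qid == -1) return -1;

    struct demo_msg m = { .mtype = 1 };          /* also zeroes mtext */
    strncpy(m.mtext, text, sizeof m.mtext - 1);

    int rc = -1;
    if (msgsnd(qid, &m, strlen(m.mtext) + 1, 0) == 0) {  /* size excludes mtype */
        struct demo_msg r;
        /* msgtyp 0: take the first message of any type, blocking if needed */
        if (msgrcv(qid, &r, sizeof r.mtext, 0, 0) != -1) {
            strncpy(out, r.mtext, outsize - 1);
            out[outsize - 1] = '\0';
            rc = 0;
        }
    }
    msgctl(qid, IPC_RMID, NULL);   /* queues outlive the process unless removed */
    return rc;
}
```

In real use the sender and receiver would be different processes sharing a key, typically derived with ftok(), instead of IPC_PRIVATE.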
Windows messages are also somewhat similar to Unix signals, but Unix signals (usually) don't have arguments, are very limited in number (often only 32, compared to thousands of Windows messages) and the signal handlers have to execute in a weird suspended environment, which makes them much less practical. Nonetheless, signals are much more popular in Unix programming than message queues.
Regarding semaphores
Rather than using semaphores (which have an attached counter), you should first try to use mutexes, which are more lightweight and usable for synchronizing threads inside the same process.
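Here is a minimal illustration of that advice with POSIX threads: a pthread mutex protecting a counter shared by several threads of one process. The thread and iteration counts are arbitrary. Without the lock, increments from different threads could be lost.

```c
#include <pthread.h>
#include <stddef.h>

#define N_THREADS 4
#define N_ITERS   100000

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static long counter = 0;

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < N_ITERS; i++) {
        pthread_mutex_lock(&counter_lock);    /* enter critical section */
        counter++;
        pthread_mutex_unlock(&counter_lock);  /* leave critical section */
    }
    return NULL;
}

/* Returns the final counter value, or -1 if a thread could not be started. */
long run_counter_demo(void) {
    pthread_t t[N_THREADS];
    counter = 0;
    for (int i = 0; i < N_THREADS; i++)
        if (pthread_create(&t[i], NULL, worker, NULL) != 0) return -1;
    for (int i = 0; i < N_THREADS; i++)
        pthread_join(t[i], NULL);
    return counter;
}
```

A POSIX semaphore initialised to 1 (sem_init, sem_wait, sem_post) could guard the same critical section, but it carries a counter that a plain lock does not need.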

Perl - Can you run threads across multiple machines?

I was wondering whether it is possible to run threads in Perl across multiple machines. I'm working in a clustered environment and need to run some of my processes in parallel, but am unable to use MPI.
If threads cannot be used across machines, are there any other alternatives I should look at that would let me do something similar without requiring special modules?
Threads (and forks) in Perl are tied to the same computer as the parent thread / process, so no cross-computer threading / forking. That said, you can use AnyEvent::MP / Coro::MP modules, message-passing extensions to the AnyEvent asynchronous event-loop framework and the Coro co-routine, cooperative threading framework respectively, which let you create a network of nodes doing different tasks on one or multiple machines. See AnyEvent::MP::Intro for details.
As for alternatives not requiring special modules (by which, I guess, you mean modules not in the perl distribution), you could conceivably write a daemon for your tasks and have them communicate over TCP or UDP. Anything beyond that would probably require at least a few modules not installed with Perl, but available from the CPAN.
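To give a rough idea of the "communicate over UDP" option, here is a minimal C sketch using the BSD socket API, the same calls that Perl's core Socket module wraps. One socket bound to an OS-chosen port on 127.0.0.1 plays the daemon, and a second socket sends it a task name. Both live in one process only so that the example is self-contained, and the task payload is hypothetical. A real daemon would loop on recvfrom() and reply to the sender's address.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Send `task` from a client socket to a "daemon" socket over loopback UDP
   and copy what the daemon received into out (outsize must be > 0).
   Returns the number of bytes received, or -1 on error. */
ssize_t udp_loopback(const char *task, char *out, size_t outsize) {
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    ssize_t n = -1;
    int daemon_fd = socket(AF_INET, SOCK_DGRAM, 0);
    int client_fd = socket(AF_INET, SOCK_DGRAM, 0);

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                               /* let the OS pick a port */

    if (daemon_fd != -1 && client_fd != -1 &&
        bind(daemon_fd, (struct sockaddr *)&addr, sizeof addr) == 0 &&
        getsockname(daemon_fd, (struct sockaddr *)&addr, &len) == 0 &&  /* learn the port */
        sendto(client_fd, task, strlen(task), 0,
               (struct sockaddr *)&addr, sizeof addr) != -1) {
        n = recvfrom(daemon_fd, out, outsize - 1, 0, NULL, NULL);  /* blocks */
        if (n >= 0) out[n] = '\0';
    }
    if (daemon_fd != -1) close(daemon_fd);
    if (client_fd != -1) close(client_fd);
    return n;
}
```

UDP gives no delivery guarantee between machines, so a real job daemon would need acknowledgements and retries, or would use TCP instead.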
Have a look at Gearman, a multi-machine job manager queue. It does require special modules; I answered here "just in case" you can in fact use additional modules/infrastructure.
There are Perl bindings, Gearman::XS, which I use successfully in projects where I want specific tasks done in an environment where requester or worker processes may reside on multiple machines. It also works well for multiple worker processes on one machine and one requester (example: a certain web scraper which requests all links from a page parsed by any worker, but wants to keep control of the results).
The way it works is that you create a "worker" Perl program which has a number of subroutines which perform the action you'd like to perform in a distributed fashion. You launch those worker programs on whichever machines you want and as many times as you want, and have them connect to one (or multiple) master gearman "managers".
You then create a requester (gearman client) Perl program which will perform the requests. This can also run on any machine, and will contact the master gearman manager to request a number of the workers' specific actions completed. Any worker does it, and your requester gets the result back.
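The worker/requester split described above can be modelled in a few lines to show the moving parts. Workers register named functions with a central job server, and a requester submits a job by function name and gets the result back without knowing which worker ran it. This in-process C sketch only models that architecture; it is not the Gearman protocol or the Gearman::XS API. All names (job_server, register_function, submit_job, reverse_job) are hypothetical, the network hops are replaced by direct calls, and the first registered worker is simply chosen instead of any idle one.

```c
#include <stddef.h>
#include <string.h>

#define MAX_FUNCS 8

typedef void (*job_fn)(const char *in, char *out, size_t outsize);

typedef struct {
    const char *name[MAX_FUNCS];   /* function names that workers can perform */
    job_fn      fn[MAX_FUNCS];
    int         worker_id[MAX_FUNCS];
    size_t      count;
} job_server;

/* A worker announces that it can perform `name`. */
int register_function(job_server *s, int worker_id, const char *name, job_fn fn) {
    if (s->count == MAX_FUNCS) return -1;
    s->name[s->count] = name;
    s->fn[s->count] = fn;
    s->worker_id[s->count] = worker_id;
    s->count++;
    return 0;
}

/* A requester submits a job by name; returns the id of the worker that ran
   it, or -1 if no registered worker can do it. */
int submit_job(const job_server *s, const char *name,
               const char *in, char *out, size_t outsize) {
    for (size_t i = 0; i < s->count; i++) {
        if (strcmp(s->name[i], name) == 0) {
            s->fn[i](in, out, outsize);   /* in Gearman: a network round trip */
            return s->worker_id[i];
        }
    }
    return -1;
}

/* Example worker function: reverse a string (outsize must be > 0). */
void reverse_job(const char *in, char *out, size_t outsize) {
    size_t n = strlen(in);
    if (n >= outsize) n = outsize - 1;
    for (size_t i = 0; i < n; i++)
        out[i] = in[n - 1 - i];
    out[n] = '\0';
}
```

The requester depends only on the function name, which is what lets workers be started on any machine, any number of times.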
If your requesters don't need a result back but "just" need a task to happen, have instead a look at TheSchwartz which has a similar architecture but does not provide a facility for getting messages from the workers back to the requesters, IIRC.
Check GRID::Machine.
I stumbled upon GNU parallel a week or two ago. It helps reduce run time by letting regular programs take advantage of multiple cores, and it can also distribute jobs to other machines over SSH (its --sshlogin option). It may help speed up whatever you're doing.