Given the server-client model, would the OS initiate messages to applications, or is message passing always initiated by programs that want to use resources and thus must communicate with the OS?
OS is an overloaded term, and application is a vague term.
A pure message-passing OS might implement traditional (Unix) system calls in applications. For example, you might have an application called FileSystem, which accepts messages like Read, Write, Open, Close, and so on. In such a scheme, that application would be considered a server, and the client would be an application that wanted to use the file services.
Pure message passing systems typically have difficulty with asynchronous events. When you look at implementing a normal read system call in a message passing system, it is natural that it will be an RPC: the client sends a read request, then suspends until the server has satisfied the read and sent a reply.
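To make the shape of that RPC concrete, here is a hedged C sketch of a read implemented as a request/reply message exchange; the message layout and the transport helpers are invented for illustration, not taken from any real kernel.

    /* Hypothetical sketch: read() expressed as an RPC over message passing.
     * The message layout and helpers below are made up for illustration;
     * a real message-passing kernel would define its own primitives. */
    #include <stddef.h>
    #include <string.h>
    #include <unistd.h>

    enum msg_type { MSG_READ_REQUEST, MSG_READ_REPLY };

    struct msg {
        enum msg_type type;
        int           fd;          /* which open file on the FileSystem server */
        size_t        count;       /* how many bytes the client wants */
        long          result;      /* bytes actually read, or a negative error */
        char          data[4096];  /* reply payload */
    };

    /* Toy transport: ship the whole struct over a stream fd (a socketpair end,
     * say); partial reads/writes are ignored to keep the sketch short. */
    static void send_msg(int server, const struct msg *m) { (void)write(server, m, sizeof *m); }
    static void recv_msg(int server, struct msg *m)       { (void)read(server, m, sizeof *m);  }

    /* Client-side read: send the request, then suspend until the reply arrives. */
    long my_read(int file_server, int fd, void *buf, size_t count)
    {
        struct msg req = { .type = MSG_READ_REQUEST, .fd = fd, .count = count };
        struct msg rep;

        send_msg(file_server, &req);   /* client -> FileSystem server */
        recv_msg(file_server, &rep);   /* blocks: this is what makes it an RPC */

        if (rep.result > 0)
            memcpy(buf, rep.data, (size_t)rep.result);
        return rep.result;
    }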
When the client wants asynchronous notification, such as "send me a message when there are new mouse events available", the RPC model somewhat falls down. While purely asynchronous systems exist, they are cumbersome to use with plain old programming languages like C, C++, and so on. There is hope that message-based languages like Golang can break the impasse, but that remains to be seen.
Higher-level OS-like services may deploy a number of interaction methods quite distinct from client-server. Publish-subscribe, a more recent reimplementation of 1980s multicast, has been popular in the last decade. Clients subscribe to a set of channels they are interested in, and every event delivered to a channel is copied to every client subscribed to that channel before it is retired. Normal clients can generate events as well, so the mechanism serves as a dynamic interconnect between modules.
D-Bus and ZeroMQ are publish-subscribe systems of differing scales. Note that both can be implemented outside of a message-passing OS.
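For a concrete feel of the channel mechanism, here is a minimal ZeroMQ publish-subscribe sketch in C; the endpoint and the "mouse" channel name are made up, and the publisher and subscriber would normally live in separate processes.

    /* Minimal publish-subscribe sketch with the libzmq C API. */
    #include <zmq.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        void *ctx = zmq_ctx_new();

        /* The "bus": one PUB socket that every module's SUB socket connects to. */
        void *pub = zmq_socket(ctx, ZMQ_PUB);
        zmq_bind(pub, "tcp://127.0.0.1:5556");

        /* A client subscribes only to the channels it cares about. */
        void *sub = zmq_socket(ctx, ZMQ_SUB);
        zmq_connect(sub, "tcp://127.0.0.1:5556");
        zmq_setsockopt(sub, ZMQ_SUBSCRIBE, "mouse", 5);   /* prefix match on "mouse" */

        sleep(1);   /* crude: give the subscription time to propagate */

        const char *event = "mouse moved to 10,20";
        zmq_send(pub, event, strlen(event), 0);           /* copied to every subscriber */

        char buf[64];
        int n = zmq_recv(sub, buf, sizeof buf - 1, 0);
        if (n >= 0) {
            if (n > (int)sizeof buf - 1) n = sizeof buf - 1;   /* message was truncated */
            buf[n] = '\0';
            printf("got event: %s\n", buf);
        }

        zmq_close(sub);
        zmq_close(pub);
        zmq_ctx_destroy(ctx);
        return 0;
    }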
Related
What is the actual difference between Socket and RPC (Remote Procedure Call)?
As per my understanding, both work on the client-server model. Also, which one should be used under which conditions?
PS: The confusion arose while reading Operating System Concepts by Galvin.
Short answer:
RPC is the protocol. The socket provides access to the transport to implement that protocol.
RPC is the service and protocol offered by the operating system to allow code to be triggered by a remote application. It has a defined protocol by which procedures or objects can be accessed by another device over a network. An implementation of RPC can be done over basically any network transport (e.g. TCP, UDP, cups with strings).
The socket is just a programming abstraction such that the application can send and receive data with another device through a particular network transport. You implement protocols (such as RPC) on top of a transport (such as TCP) with a socket.
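For example (a hedged sketch, with a made-up port and a toy request line), this is all a socket gives you: a handle for the transport over which you speak whatever protocol you have defined.

    /* Toy "protocol" spoken over a TCP socket: send one request line, read one
     * reply line.  The port 7000 and the GET_TIME command are invented here. */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = socket(AF_INET, SOCK_STREAM, 0);          /* the transport: TCP */

        struct sockaddr_in addr = { .sin_family = AF_INET, .sin_port = htons(7000) };
        inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);

        if (connect(fd, (struct sockaddr *)&addr, sizeof addr) != 0) {
            perror("connect");
            return 1;
        }

        /* The protocol layered on top of the transport. */
        const char *request = "GET_TIME\n";
        write(fd, request, strlen(request));

        char reply[128];
        ssize_t n = read(fd, reply, sizeof reply - 1);
        if (n > 0) {
            reply[n] = '\0';
            printf("server said: %s", reply);
        }
        close(fd);
        return 0;
    }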
It is operating-system specific. So first read a good OS book like Operating Systems: Three Easy Pieces (freely downloadable).
Network sockets are a way to do some inter-process communication (notably between different machines). Read also about the Berkeley sockets API, e.g. socket(7) on Linux.
Remote procedure calls are a programming technique (often using the socket(2) system call on Linux). Every RPC request expects exactly one reply and is software-initiated.
Sockets are also often used for asynchronous messages (for example, the X11 protocol stack, WebSockets, SMTP). Message passing is a programming paradigm (more general than RPC); messages are often sent without expecting any reply. For example, the X11 server would send a keyboard event message for every key press, etc.
(so in some ways, you are comparing apples and oranges)
If on Linux, I recommend reading Advanced Linux Programming (freely downloadable) and reading more about syscalls(2) (notably poll(2) for multiplexing).
PS: The confusion arose while reading Operating System Concepts by Galvin.
That's your problem right there.
A remote procedure call (RPC) is a high-level model for network communication. There are numerous RPC protocols in existence. In the RPC model, your underlying implementation creates a stub for each remote procedure. When your application calls the "remote procedure", the stub packs up the parameters and sends them over the network, the remote version of the procedure is invoked, its return values are sent back over the network to the caller, the stub unpacks the return values, and your application then receives them.
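A hand-written client-side stub for a toy remote_add(a, b) call might look like the sketch below; the wire format (two 32-bit integers in, one out, network byte order) is made up here, since real RPC systems generate this marshalling code from an interface definition.

    #include <arpa/inet.h>   /* htonl / ntohl */
    #include <stdint.h>
    #include <unistd.h>

    /* remote_add() as seen by the application: it looks like a local call. */
    int32_t remote_add(int rpc_fd, int32_t a, int32_t b)
    {
        /* 1. The stub packs up the parameters ... */
        uint32_t request[2] = { htonl((uint32_t)a), htonl((uint32_t)b) };

        /* 2. ... and sends them over the network to the server. */
        write(rpc_fd, request, sizeof request);

        /* 3. The server runs the real add() and sends the result back. */
        uint32_t reply = 0;
        read(rpc_fd, &reply, sizeof reply);

        /* 4. The stub unpacks the return value for the caller. */
        return (int32_t)ntohl(reply);
    }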
The RPC model became hip in the late 1980s. The idea was that it would be transparent where your functions actually executed (in your process, in another process, on another computer). This concept expanded into distributed objects around the early 1990s (e.g., DCOM, CORBA).
Unfortunately, in the real world, applications really needed to know whether a procedure was executing remotely, because of delay and error handling.
Somewhere in the RPC implementation a network interface gets called.
Sockets are such a network interface. They are not the only programming interface but they are the most common on Unix systems.
Thus, an RPC MIGHT be implemented using a socket.
Is there a conventional way to write a program such that commands can be issued to it from the command line without a REPL? For example, the way you can send commands to a running nginx server using sudo /etc/init.d/nginx restart (or any other valid command besides restart).
One idea I had was having the long-running program create and monitor a Unix socket that other programs can write to in order to send it commands. Another was to create a local server with a REST interface that can be sent commands that way, though that seems a bit gross.
What's the right way to do this?
Both ways are OK, and you could even consider using some RPC machinery, such as making your application serve JSON-RPC on some unix(7) socket. Or use a fifo(7). Or use D-Bus.
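As a hedged sketch of the first option (the socket path /tmp/myprog.cmd and the command names are made up), the long-running program could accept one-line commands on a unix(7) stream socket:

    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/un.h>
    #include <unistd.h>

    int main(void)
    {
        int srv = socket(AF_UNIX, SOCK_STREAM, 0);

        struct sockaddr_un addr = { .sun_family = AF_UNIX };
        strncpy(addr.sun_path, "/tmp/myprog.cmd", sizeof addr.sun_path - 1);
        unlink(addr.sun_path);                       /* remove a stale socket file */
        bind(srv, (struct sockaddr *)&addr, sizeof addr);
        listen(srv, 4);

        for (;;) {                                   /* the program's command loop */
            int c = accept(srv, NULL, NULL);
            char cmd[64];
            ssize_t n = read(c, cmd, sizeof cmd - 1);
            if (n > 0) {
                cmd[n] = '\0';
                cmd[strcspn(cmd, "\n")] = '\0';      /* strip the trailing newline */
                if (strcmp(cmd, "restart") == 0)
                    puts("would restart here");
                else if (strcmp(cmd, "status") == 0)
                    dprintf(c, "running\n");         /* reply to the caller */
                else
                    dprintf(c, "unknown command\n");
            }
            close(c);
        }
    }

Another program (or the sysadmin) can then send a command with something like echo restart | nc -U /tmp/myprog.cmd (with a netcat that supports Unix sockets) or with socat.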
A common habit on Unix is to have applications reload their configuration files on e.g. a SIGHUP signal, and save some persistent state (before terminating) on SIGTERM. Read signal(7) (notice that only async-signal-safe routines can be called from signal handlers; a good way is to only set some volatile sig_atomic_t variable inside the handler and test it outside). See also the POSIX signal.h documentation.
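A minimal sketch of that sig_atomic_t pattern (the actions on reload and terminate are placeholders):

    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    static volatile sig_atomic_t got_sighup  = 0;
    static volatile sig_atomic_t got_sigterm = 0;

    static void on_signal(int sig)      /* async-signal-safe: only sets flags */
    {
        if (sig == SIGHUP)  got_sighup  = 1;
        if (sig == SIGTERM) got_sigterm = 1;
    }

    int main(void)
    {
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_handler = on_signal;
        sigaction(SIGHUP,  &sa, NULL);
        sigaction(SIGTERM, &sa, NULL);

        for (;;) {
            pause();                    /* or poll(2) in a real event loop */
            if (got_sighup)  { got_sighup = 0; puts("reloading configuration files"); }
            if (got_sigterm) { puts("saving persistent state and exiting"); break; }
        }
        return 0;
    }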
You might make your application become a specialized HTTP server (e.g. using some HTTP server library like libonion) and give it some web interface (or REST, or SOAP, ...); the user (or sysadmin) will then use their browser to interact with your application.
You could make your server systemd compatible. (I don't know exactly what that requires; it is perhaps D-Bus related.)
You could embed some command interpreter (like Guile or Lua) in your app and have some limited kind of REPL running on some IPC channel like a socket or a FIFO. Beware of nasty code injection.
I had a similar issue where I have a plethora of services running on any number of machines and each is in need of communicating with several others.
My main problem was not so much the communication between the services. That can be done with a simple message sent over a connection (as Basile mentioned, it can be TCP, UDP, Unix sockets, FIFOs...). However, when you have over 20 services, many of which need to communicate with several other services, you start having a headache about how to get all the connections right (I have such a system, although with a relatively limited number of services, like just 10, and that's already very complicated).
So I created a process (yet another service) called Communicator. All services connect to the Communicator service and when they need to send a message, they include the name of the service they want to reach. The Communicator service is in charge of sending the message to the right place—i.e. it could be to another Communicator service running on a different computer. Communicator has a graph of all the services available on your network and knows how to send messages to them without your service having to know anything about all of that. Computing a graph can be really complex.
For that purpose, I created the eventdispatcher project. It is in C++, which may not be what you're interested in, although you could use it from other languages that interface with C/C++. The structure of the messages is "proprietary" (specific to the Communicator), but you can create any message you want. A message includes a name and parameters (param-name=value). The first version has a simple one-line text communication system. The newer version accepts JSON as well (still one line of text per message).
The system supports TCP, UDP, Unix sockets, and FIFOs, and between threads you can have thread-safe FIFOs. It also understands signals (like SIGHUP, SIGTERM, etc.). It has a specific connection to listen for the death of a thread. It supports encryption over TCP via OpenSSL. Messages can automatically be dispatched (hence the current name of the library). Connections are assigned a timer. And there are CUI and GUI (Qt) extensions as well.
The one main point here is that all your connections can be polled (see poll()) and thus you can implement a system that reacts to events instead of a system that sleeps and checks for events, sleeps and checks, etc., or worse, has a single blocking connection so that everything has to happen on that one connection or your service gets stuck. This is one reason Unix has been using signals: early versions of Unix did not have select() or poll().
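A bare-bones version of such a reactive loop, assuming two already-open connection descriptors (they could be TCP sockets, Unix sockets, FIFOs, ...):

    #include <poll.h>
    #include <stdio.h>
    #include <unistd.h>

    void event_loop(int fd_a, int fd_b)
    {
        struct pollfd fds[2] = {
            { .fd = fd_a, .events = POLLIN },
            { .fd = fd_b, .events = POLLIN },
        };

        for (;;) {
            /* Block until at least one connection has something for us;
             * no sleeping and re-checking, and no single blocking read. */
            if (poll(fds, 2, -1) < 0) {
                perror("poll");
                return;
            }
            for (int i = 0; i < 2; i++) {
                if (fds[i].revents & (POLLIN | POLLHUP)) {
                    char buf[256];
                    ssize_t n = read(fds[i].fd, buf, sizeof buf);
                    if (n <= 0)
                        fds[i].fd = -1;     /* peer gone: poll() ignores negative fds */
                    else
                        printf("fd %d: %zd bytes ready to process\n", fds[i].fd, n);
                }
            }
        }
    }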
I understand the main principles behind both. However, I have a question I can't answer.
Benchmarks show that WebSockets can serve more messages, as this website demonstrates: http://blog.arungupta.me/rest-vs-websocket-comparison-benchmarks/
This makes sense, as the connections do not have to be closed and reopened, and the HTTP headers do not have to be resent each time, etc.
My question is: what if the connections are always from different clients (and perhaps some from the same client)? The benchmark suggests it's the same clients connecting, from what I understand, which would make sense when keeping a constant connection open.
If a user only makes a request every minute or so, would it not be beneficial for the communication to run over REST instead of WebSockets, as the server frees up sockets and can handle a larger crowd, so to speak?
To fix the issue with REST you would go with vertical scaling, and with WebSockets it would be horizontal?
Does this make sense, or am I out of it?
This is my experience so far; I am happy to discuss my conclusions about using WebSockets in big applications approached with CQRS:
Real Time Apps
Are you creating a financial application, game, chat or whatever kind of application that needs low latency, frequent, bidirectional communication? Go with WebSockets:
Well supported.
Standard.
You can use either publisher/subscriber model or request/response model (by creating a correlationId with each request and subscribing once to it).
Small size apps
Do you need push communication and/or pub/sub in your client and your application is not too big? Go with WebSockets. Probably there is no point in complicating things further.
Regular Apps with some degree of high load expected
If you do not need to send commands very fast, and you expect to do far more reads than writes, you should expose a REST API to perform CRUD (create, read, update, delete), especially C_UD.
Not all devices prefer WebSockets. For example, mobile devices may prefer to use REST, since maintaining a WebSocket connection may prevent the device from saving battery.
You expect an outcome, even if it is a timeout. Even though you can do request/response over WebSockets using a correlationId, the response is still not guaranteed. When you send a command to the system, you need to know whether the system has accepted it. Yes, you can implement your own logic and achieve the same effect, but what I mean is that an HTTP request has the semantics you need to send a command.
Does your application send commands very often? You should strive for chunky communication rather than chatty, so you should probably batch those change requests.
You should then expose a WebSocket endpoint to subscribe to specific topics and to perform low-latency query-response, like filling autocomplete boxes, checking for unique items (e.g. usernames), or any kind of search in your read model. Also to get notified when a change request (write) has actually been processed and completed.
What I am doing in a pet project is to place the WebSocket endpoint in the read model; on connection, the server gives a connectionID to the client via the WebSocket. When the client performs an operation via REST, it includes an optional parameter that indicates "when done, notify me through this connectionID". The REST server returns whether the command was sent correctly to a service bus. A queue consumer processes the command, and when done (successfully or not), if the command had a notification request, another message is placed in a "web notification queue" indicating the outcome of the command and the connectionID to be notified. The read model is subscribed to this queue, gets the messages, and forwards them to the appropriate WebSocket connection.
However, if your REST API is going to be consumed by non-browser clients, you may want to offer a way to check for the completion of a command using the async REST approach: https://www.adayinthelifeof.nl/2011/06/02/asynchronous-operations-in-rest/
I know it is quite appealing to have a low-latency UP channel available to send commands, but if you do, your overall architecture gets messed up. For example, if you are using a CQRS architecture, where is your WebSocket endpoint? In the read model or in the write model?
If you place it on the read model, then you have easy access to your read DB to answer fast search queries, but then you somehow have to couple in the logic to process commands, making the read model responsible for sending the commands to the write model and notifying if it is unable to do so.
If you place it on the write model, then it is easy to place commands, but then you need access to your read model and read DB if you want to answer search queries through the WebSocket.
By considering WebSockets part of your read model and leaving command processing to the REST interface, you keep your loose coupling between your read model and your write model.
I have to develop a message bus for processes to send and receive messages from each other. Currently, we are running on Linux with a view to porting to other platforms later.
For this, I am using ZeroMQ over TCP. The pattern is PUB-SUB with a forwarder. My bus runs as a separate process and all clients connect to the SUB port to receive messages and the PUB port to send messages. Each process subscribes to messages by a unique tag. A send call from a process sends messages to all. A receive call fetches, for that process, the messages marked with that process's tag. This is working fine.
Now I need to wrap the ZeroMQ stuff. My clients only need to supply a unique tag. I need to maintain a global list of tags vs. ZeroMQ context and socket details. When a client says
initialize_comms("name"); the bus needs to check whether this name is unique and create the ZeroMQ contexts and sockets. Similarly, when a client says receive("name"); the bus needs to fetch messages with that tag.
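For what it's worth, a minimal sketch of that wrapper layer might look like the following, assuming the forwarder exposes its PUB side on port 6000 and its SUB side on port 6001; the ports, the registry size, and the error handling are all placeholders.

    #include <zmq.h>
    #include <string.h>

    #define MAX_CLIENTS 64

    static void *g_ctx;
    static struct { char tag[32]; void *sub; void *pub; } g_clients[MAX_CLIENTS];
    static int g_nclients;

    int initialize_comms(const char *tag)
    {
        for (int i = 0; i < g_nclients; i++)         /* enforce tag uniqueness */
            if (strcmp(g_clients[i].tag, tag) == 0)
                return -1;
        if (g_nclients == MAX_CLIENTS)
            return -1;

        if (!g_ctx)
            g_ctx = zmq_ctx_new();

        void *sub = zmq_socket(g_ctx, ZMQ_SUB);      /* receives messages for "tag" */
        zmq_connect(sub, "tcp://127.0.0.1:6000");    /* forwarder's PUB side */
        zmq_setsockopt(sub, ZMQ_SUBSCRIBE, tag, strlen(tag));

        void *pub = zmq_socket(g_ctx, ZMQ_PUB);      /* sends messages to the bus */
        zmq_connect(pub, "tcp://127.0.0.1:6001");    /* forwarder's SUB side */

        strncpy(g_clients[g_nclients].tag, tag, sizeof g_clients[0].tag - 1);
        g_clients[g_nclients].sub = sub;
        g_clients[g_nclients].pub = pub;
        g_nclients++;
        return 0;
    }

    int receive(const char *tag, char *buf, size_t len)
    {
        for (int i = 0; i < g_nclients; i++)
            if (strcmp(g_clients[i].tag, tag) == 0)
                return zmq_recv(g_clients[i].sub, buf, len, 0);
        return -1;                                   /* unknown tag */
    }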
To summarize the problems I am facing:
Is there any way to achieve this using the facilities provided by ZeroMQ?
Is ZeroMQ the right tool for this, or should I look for something like nanomsg?
Is PUB-SUB with forwarder the right pattern for this?
Or, am I missing something here?
Answers
Yes, ZeroMQ is capable of serving this need
Yes. ZeroMQ is the right tool (rather, a powerful toolbox of low-latency components) for this. While nanomsg has a straight primitive for a bus, the core distributed logic can be integrated in the ZeroMQ framework.
Yes & no. PUB-SUB as given above may serve to emulate the "shout-cast"-to-bus and build on the SUB-side effect of using subscription key(s). The whole rest of the logic has to be rethought and designed so that the whole scope of the fabrication meets your plans (ref. below). Also kindly bear in mind that initial versions of ZeroMQ implemented the PUB/SUB primitive with "subscription filtering" of the incoming stream of messages done on the receiver side, so massive designs shall check against traffic volumes / risk of flooding / process inefficiency at a massive scale...
Yes. ZeroMQ is rather a well-tuned foundation of primitive elements (as far as the architecture is discussed, not the power & performance thereof) to build more clever, more robust & almost linearly scalable Formal Communication Pattern(s). Do not get stuck on the PUB/SUB or PAIR primitives when sketching the architecture. Any design will remain poor if one forgets where the true power comes from.
A good place to start the next step forward towards a scalable & fault-resilient Bus
Thus the best next step one may take is, IMHO, to get a bit more of a global view, which may sound complicated for the first few things one tries to code with ZeroMQ, but jump at least to page 265 of Code Connected, Volume 1, if you are not reading it step by step up to there.
The fastest-ever learning curve would be to first have an un-exposed view of Fig. 60 Republishing Updates and the Fig. 62 HA Clone Server pair for a possible high-availability approach, and then go back to the roots, elements, and details.
Here is what I ended up designing, if anyone is interested. Thanks everyone for the tips and pointers.
I have a message bus implemented using ZeroMQ (and CZMQ) running as a separate process.
The pattern is PUBLISHER-SUBSCRIBER with a LISTENER. They are connected using a PROXY.
In addition, there is a ROUTER invoked using a newly forked thread.
These three endpoints run on TCP and are bound to predefined ports which the clients know of.
PUBLISHER accepts all messages from clients.
SUBSCRIBER sends messages with a unique tag to the clients who have subscribed to that tag.
LISTENER listens to all messages passing through. Currently, this is for logging and testing purposes.
ROUTER provides a separate comms channel to clients. Messages such as control commands are directed here so that they will not get passed downstream.
Clients connect to,
PUBLISHER to send messages.
SUBSCRIBER to receive messages. Subscription is using unique tags.
ROUTER to send commands (check tag uniqueness etc.)
I am still working on the implementation, so there may be unseen problems, but right now it works fine. Also, there may be a more elegant way, but I didn't want to throw away the PUB-SUB thing I had built.
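For completeness, the forwarder part of such a design can be sketched with zmq_proxy() and an XSUB/XPUB pair, with a capture socket playing the LISTENER role; the ports are made up and the ROUTER thread is omitted.

    #include <zmq.h>

    int main(void)
    {
        void *ctx = zmq_ctx_new();

        void *frontend = zmq_socket(ctx, ZMQ_XSUB);   /* clients' PUB sockets connect here */
        zmq_bind(frontend, "tcp://*:6001");

        void *backend = zmq_socket(ctx, ZMQ_XPUB);    /* clients' SUB sockets connect here */
        zmq_bind(backend, "tcp://*:6000");

        void *capture = zmq_socket(ctx, ZMQ_PUB);     /* LISTENER taps every message */
        zmq_bind(capture, "tcp://*:6002");

        /* Forward messages both ways and copy everything to the capture socket;
         * this call blocks for the lifetime of the bus. */
        zmq_proxy(frontend, backend, capture);

        zmq_close(capture);
        zmq_close(backend);
        zmq_close(frontend);
        zmq_ctx_destroy(ctx);
        return 0;
    }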
I am trying to understand implementations/options for server-side Websocket endpoints - particularly in Perl using PSGI/Plack and I have a question: Why are all server-side websocket implementations based around event-driven PSGI servers (Twiggy, Tatsumaki, etc.)?
I get that WebSocket communication is asynchronous, but a non-event-driven PSGI server (say Starman) could spawn an asynchronous listener to handle the WebSocket side of things. I have seen (but not understood) PHP implementations of WebSocket servers, so why can't the same be done with PSGI without having to change the server to an event-driven one?
The underlying network logic to deal with sockets depends on the platform, the OS, and the particular software implementation.
The three most common methods are:
polling - there is constant, blocking "asking" whether the socket has some data. This method is quite bad, as it will block execution of the main thread for as long as it waits for some data.
thread per socket - each new connection involves creating a new thread, and asking each socket in a blocking manner happens within that thread, so it won't block the main thread's logic. This method is bad because creating a thread for each connection is too expensive in memory; it can be around 1 MB of RAM depending on the OS and other criteria.
async - uses system features to "notify" your process when there is something. So you can react once your app is ready (in the case of a single-threaded app) or even react straight away in a separate thread. This method is quite efficient, as it saves RAM and allows your app to work without needing to wait or ask for data. It utilises existing functionality that most OSes and platforms provide.
Taking this into account, you indeed can create a single-process, functional way to deal with socket traffic. But that is not efficient at all, as explained above. That is why fully async models dominate today, as most languages and platforms support that paradigm.