3-way communication via sockets - perl

Good Afternoon Gurus,
I am pretty familiar with basic socket programming, and the IO::Socket module but I need to code something now that I have not encountered before. It will be a 3 tier application. The first tier is an event-loop that sends messages upstream when certain events are encountered. The second tier is the 'middle-ware' server, which (among other things) acts as the message repository. The third tier is a cgi application, which will update a graphical display.
I am confused about how to set up the server to accept uni-directional connections from multiple clients on one side, and communicate bi-directionally with the cgi application on the other. I can do either of those tasks separately, just not in the same script (yet). Does my question make sense? I would like to stick with using the IO::Socket module, but it is not a requirement by any means. I am not asking for polished code, just advice on setting up the socket(s) and how to communicate from one client to another via the server.
Also, does it make more sense to have the cgi application query the server for new messages, or have the server push the new message upstream to the cgi application? The graphical updates need to be near real-time.
Thank you in advance,
Daren

You said you already have an event loop in the first tier. In a way, your second-tier server should also arrange some kind of event loop for asynchronous processing. There are many ways to code it using perl, such as AnyEvent, POE, and Event, to name just a few. In the end, they all use one of the select, poll, epoll, or kqueue OS facilities (or their equivalent on Windows). If you feel comfortable coding at a relatively low level, you can just use perl's select builtin, or, alternatively, its object-oriented counterpart, IO::Select.
Basically you create two listening sockets (you might only need one if the first tier uses the same communication protocol as the third tier to talk to your server), add them to the IO::Select object and do a select on it. Once a connection is made, you add the accepted socket to the select object.
The select method of IO::Select will give you back a list of sockets ready for reading or writing (I am ignoring the possibility of exceptions here). Of course you have to keep track of your sockets to know which one is which. Also, the communication logic will be somewhat complicated because you have to use non-blocking sockets.
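To make the shape of that loop concrete, here is a minimal sketch. It is in Python rather than Perl (the selectors module plays the role of IO::Select), and the Relay class, the role labels and the "replay on any viewer input" behaviour are my own assumptions for illustration, not something from the question.

```python
import selectors
import socket


def make_listener(host="127.0.0.1", port=0):
    """Create a non-blocking listening TCP socket (port 0 lets the OS pick)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen()
    sock.setblocking(False)
    return sock


class Relay:
    """Hypothetical middle tier: producers write in, viewers get pushes."""

    def __init__(self):
        self.sel = selectors.DefaultSelector()
        self.producers = make_listener()  # tier 1 connects here
        self.viewers = make_listener()    # tier 3 connects here
        self.viewer_conns = set()
        self.messages = []                # the message repository
        # The data registered with each socket records which role it plays.
        self.sel.register(self.producers, selectors.EVENT_READ, "accept-producer")
        self.sel.register(self.viewers, selectors.EVENT_READ, "accept-viewer")

    def step(self, timeout=1.0):
        """Wait up to `timeout` seconds and service every ready socket once."""
        for key, _events in self.sel.select(timeout):
            sock, role = key.fileobj, key.data
            if role.startswith("accept-"):
                conn, _addr = sock.accept()
                conn.setblocking(False)
                new_role = role[len("accept-"):]
                self.sel.register(conn, selectors.EVENT_READ, new_role)
                if new_role == "viewer":
                    self.viewer_conns.add(conn)
                continue
            data = sock.recv(4096)
            if not data:  # peer closed the connection
                self.sel.unregister(sock)
                self.viewer_conns.discard(sock)
                sock.close()
            elif role == "producer":
                self.messages.append(data)
                for viewer in self.viewer_conns:
                    # Fine for small payloads; a real server would buffer the
                    # data and wait for EVENT_WRITE instead of sending here.
                    viewer.sendall(data)
            else:
                # A viewer sent something: replay the stored messages to it.
                sock.sendall(b"".join(self.messages))
```

A real server would call step() in a loop forever. The point is only that one selector watches both listening sockets plus every accepted connection, and the value registered alongside each socket is how you know which one is which.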
As for the second part of your question, I am a little bit confused about what you mean by "cgi" - whether it is a Common Gateway Interface (i.e., server-side web scripts), or whether it is a shorthand for "computer graphics". In both cases I think that it makes sense for your task to use server push.
In the latter case that's all I'd like to say. In the former case, I suggest you google for "Comet" (as in "AJAX"). :-)

In a standard CGI application, I don't see how you can "push" data to it. For a client interaction, the data goes through the CGI/presentation layer to the middle tier to remain in session storage (or cache) or to the backend to get stored in the database.
That is of course unless you have a thick application layer which is a caching locus and kind of a middle tier in itself.

Related

WebSocket/REST: Client connections?

I understand the main principles behind both. I have however a thought which I can't answer.
Benchmarks show that WebSockets can serve more messages as this website shows: http://blog.arungupta.me/rest-vs-websocket-comparison-benchmarks/
This makes sense, as it states that the connections do not have to be closed and reopened, and the HTTP headers etc. do not have to be resent each time.
My question is, what if the connections are always from different clients all the time (and perhaps maybe some from the same client). The benchmark suggests it's the same clients connecting from what I understand, which would make sense keeping a constant connection.
If a user only makes a request every minute or so, would it not be beneficial for the communication to run over REST instead of WebSockets, as the server frees up sockets and can handle a larger crowd, so to speak?
To fix the issue of REST you would go by vertical scaling, and WebSockets would be horizontal?
Does this make sense or am I out of it?
This is my experience so far, I am happy to discuss my conclusions about using WebSockets in big applications approached with CQRS:
Real Time Apps
Are you creating a financial application, game, chat or whatever kind of application that needs low latency, frequent, bidirectional communication? Go with WebSockets:
Well supported.
Standard.
You can use either publisher/subscriber model or request/response model (by creating a correlationId with each request and subscribing once to it).
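The correlationId idea from the last point can be sketched like this. It is a transport-agnostic illustration in Python: the class name, the JSON envelope with "correlationId" and "payload" fields, and the send callable are all assumptions of mine, not part of any WebSocket library.

```python
import itertools
import json


class RequestResponseClient:
    """Request/response layered over a channel that only delivers messages."""

    def __init__(self, send):
        self._send = send               # hypothetical: writes one text frame
        self._ids = itertools.count(1)
        self._pending = {}              # correlationId -> callback

    def request(self, payload, on_reply):
        """Send a request and subscribe exactly once to its correlationId."""
        cid = str(next(self._ids))
        self._pending[cid] = on_reply
        self._send(json.dumps({"correlationId": cid, "payload": payload}))
        return cid

    def on_message(self, raw):
        """Feed every incoming frame here; returns True if it was a reply."""
        msg = json.loads(raw)
        callback = self._pending.pop(msg.get("correlationId"), None)
        if callback is None:
            return False                # unsolicited push or duplicate reply
        callback(msg["payload"])
        return True
```

Note that nothing here guarantees a reply ever arrives; a real client would also need a timeout that removes stale entries from the pending table.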
Small size apps
Do you need push communication and/or pub/sub in your client and your application is not too big? Go with WebSockets. Probably there is no point in complicating things further.
Regular Apps with some degree of high load expected
If you do not need to send commands very fast, and you expect to do far more reads than writes, you should expose a REST API to perform CRUD (create, read, update, delete), especially C_UD.
Not all devices prefer WebSockets. For example, mobile devices may prefer to use REST, since maintaining a WebSocket connection may prevent the device from saving battery.
You expect an outcome, even if it is a time out. Even when you can do request/response in WebSockets using a correlationId, still the response is not guaranteed. When you send a command to the system, you need to know if the system has accepted it. Yes you can implement your own logic and achieve the same effect, but what I mean, is that an HTTP request has the semantics you need to send a command.
Does your application send commands very often? You should strive for chunky communication rather than chatty, so you should probably batch those change requests.
You should then expose a WebSocket endpoint to subscribe to specific topics, and to perform low latency query-response, like filling autocomplete boxes, checking for unique items (eg: usernames) or any kind of search in your read model. Also to get notification on when a change request (write) was actually processed and completed.
What I am doing in a pet project is to place the WebSocket endpoint in the read model; on connection, the server gives a connectionID to the client via WebSocket. When the client performs an operation via REST, it includes an optional parameter that indicates "when done, notify me through this connectionID". The REST server returns saying whether the command was sent correctly to a service bus. A queue consumer processes the command, and when done (successfully or not), if the command had a notification request, another message is placed in a "web notification queue" indicating the outcome of the command and the connectionID to be notified. The read model is subscribed to this queue, gets the messages and forwards them to the appropriate WebSocket connection.
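The flow described above can be simulated in a few lines. This is only an in-memory sketch in Python: the deques stand in for the service bus and the "web notification queue", a plain list stands in for each WebSocket connection, and every function name is hypothetical.

```python
from collections import deque
from itertools import count

command_queue = deque()        # stands in for the service bus
notification_queue = deque()   # stands in for the "web notification queue"
connections = {}               # connectionID -> outbox (fake WebSocket)
_next_id = count(1)


def open_websocket():
    """Read model: hand a connectionID to a newly connected client."""
    connection_id = f"conn-{next(_next_id)}"
    connections[connection_id] = []
    return connection_id


def rest_post_command(command, notify=None):
    """REST endpoint: enqueue the command, report only that it was accepted."""
    command_queue.append({"command": command, "notify": notify})
    return {"accepted": True}


def consume_one_command(handler):
    """Queue consumer: run one command, then request a notification if asked."""
    item = command_queue.popleft()
    try:
        handler(item["command"])
        outcome = "ok"
    except Exception as exc:
        outcome = f"failed: {exc}"
    if item["notify"] is not None:
        notification_queue.append(
            {"connectionID": item["notify"], "command": item["command"], "outcome": outcome}
        )


def forward_notifications():
    """Read model: deliver queued outcomes to the matching connection."""
    while notification_queue:
        note = notification_queue.popleft()
        outbox = connections.get(note["connectionID"])
        if outbox is not None:  # the client may have disconnected meanwhile
            outbox.append(note)
```

The REST call returns as soon as the command is queued; the outcome reaches the client later through the connection it named, which is what keeps the write path and the read path loosely coupled.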
However, if your REST API is going to be consumed by non-browser clients, you may want to offer a way to check for the completion of a command using the async REST approach: https://www.adayinthelifeof.nl/2011/06/02/asynchronous-operations-in-rest/
I know that it is quite appealing to have a low-latency UP channel available to send commands, but if you do, your overall architecture gets messed up. For example, if you are using a CQRS architecture, where is your WebSocket endpoint? In the read model or in the write model?
If you place it on the read model, then you have easy access to your read DB to answer fast search queries, but then you have to couple in the logic to process commands somehow, making the read model responsible for sending the commands to the write model and notifying the client if it is unable to do so.
If you place it on the write model, then you have it easy to place commands, but then you need access to your read model and read DB if you want to answer search queries through the WebSocket.
By considering WebSockets part of your read model and leaving command processing to the REST interface, you keep your loose coupling between your read model and your write model.

What is the best, most efficient, Client pool technique with Erlang

I'm a real Erlang newbie (started 1 week ago), and I'm trying to learn this language by creating a small but efficient chat server. (When I say efficient I mean I have 5 servers used to stress test this with hundreds of thousands of connected clients. A million would be great!)
I have found some tutorials doing so; the only thing is that every tutorial I found is IRC-like: if one user sends a message, all users except the sender receive it.
I would like to change that a bit, and use one-to-one discussion.
What would be the most effective client pool for searching a connected user ?
I thought about registering the process, because it seems to do everything I need, but I really don't think this is the best way to do it (or the prettiest way, anyway).
Does anyone have any suggestions for doing this?
EDIT :
Every connected client is assigned an ID.
When a user connects, it first sends a login command to give its id.
When a user wants to send a message to another one, the message looks like this:
[ID-NUMBER][Message] %% ID-NUMBER IS A FIXED LENGTH
When I ask for "the most effective client pool", I'm actually looking for the fastest way to retrieve/add/delete one client on the connected client list, which could potentially be large (hundreds of thousands, maybe millions).
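To pin down what is being asked for, here is a small language-neutral sketch (written in Python, not Erlang). A hash table keyed by client id gives constant-time add, delete and lookup on average; in Erlang an ets table would play the role of the dict and a Pid the role of the inbox. The 8-character ID length is an assumption, since the question only says the length is fixed, and all function names are made up.

```python
ID_LENGTH = 8   # assumption: the question only says the ID has a fixed length

clients = {}    # client id -> inbox (a list standing in for a process)


def login(client_id, inbox):
    """Add a client to the pool."""
    clients[client_id] = inbox


def logout(client_id):
    """Remove a client; silently ignore ids that are not connected."""
    clients.pop(client_id, None)


def parse_frame(frame):
    """Split '[ID-NUMBER][Message]' into (id, message)."""
    if len(frame) < ID_LENGTH:
        raise ValueError("frame shorter than the fixed-length id")
    return frame[:ID_LENGTH], frame[ID_LENGTH:]


def route(frame):
    """Deliver a frame to its target; return False if the target is offline."""
    target, body = parse_frame(frame)
    inbox = clients.get(target)
    if inbox is None:
        return False
    inbox.append(body)
    return True
```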
EDIT 2 :
For answering some questions :
I'm using Raw Socket (Using telnet right now to communicate with server) - will probably move to ssl later...
It is my own protocol
Every Client is a spawned Pid
Every Client's Pid is linked to its own monitor (mostly for debugging reasons; if disconnected, the client should reconnect on its own, starting auth from scratch)
I read a couple of books before starting to code, so I have not yet mastered every aspect of Erlang, but I'm not unaware of them; I will read more about it when needed, I guess.
What I'm really looking for is the best way to store and search those PIDs to send messages directly from process to process.
Should I write my own search Client function using lists ?
or should I use ets ?
Or even use register/2, unregister/1 and whereis/1 to maintain my client list, using its unique id as an atom? It seems to be the simplest way to do so; I really don't know if it is efficient, but I'm pretty sure this is the ugly solution ;-)
I'm doing something similar to your chat program using gproc as a pubsub (similar to the demo on that page). Each client registers as its id. To find a particular client, you do a lookup on that client id. To subscribe to a client, you add a property to that process of the client id being subscribed to. To publish, you call gproc:send(ClientId,Message). This covers your use case, the more general room based chat as well, and can handle distributed masterless registry of processes.
I haven't tested to see if it scales to millions, but it uses ets to do the storage and gproc is rock solid code by Ulf Wiger. I wouldn't count on being able to write a better implementation.
I'm also kind of new to Erlang (a couple of months), so I hope this can put you on the right path :)
First of all, since you're a "newbie", you should know about these sites:
Erlang Official Documentation:
Most common modules are in the stdlib application, so start from there.
Alternative Documentation:
There's a real time search engine, so it is really good when searching for specific modules.
Erlang Programming Rules:
For you to enter in the mindset of erlang programming.
Learn You Some Erlang Book:
A must read for everyone starting with Erlang. It's really comprehensive and fun to read!
Trapexit.org:
Forum and cookbooks, to search for common problems faced by programmers.
Well, thinking about a non persistent database, I would suggest the sets or gb_sets modules (documentation here).
If you want persistence, you should try dets (see documentation above), but I can't state anything about efficiency, so you should research this topic a bit further.
In the book Learn You Some Erlang there is a chapter on data structures that says that sets are better for read intensive systems, while gb_sets is more appropriate for a balanced usage.
Messaging systems are what everyone wants to build when they come to Erlang, because the two blend naturally. However, there are a number of things to look into before continuing. Messaging basically involves user registration, user authentication, session management, logging, message switching/routing, etc. To do all or most of these you need a database, certainly an in-memory one, which leads me to either Mnesia or ETS tables. Since you are new to Erlang, I suppose you have not yet really mastered working with these. At some point you will need to track who is communicating with whom, who is available for chat, etc., so you will need to look things up and write things somewhere.
Another thing: you have not told us about the client. Is it going to be a web client (HTTP), or is it an entirely new protocol you are implementing over raw sockets? Either way, you will need to master concurrency in Erlang. If a user connects and is assigned an ID, and your design is a process per user, then you will have to save the Pids of these processes or register them against some criteria, and also monitor them in case they die. Which brings me to OTP and supervision trees. There is quite a lot; however, tell us more about the client and server interaction, the network communication you need, etc. Or is it just a simple Erlang RPC project you are doing for your own revision?
EDIT: Use ETS tables, or use Mnesia RAM tables. Do not think of registering these Pids or storing them in a list, array or set. Look at this solution which was given to this question

Should I connect directly to CouchDB's socket and pass HTTP requests or use node.js as a proxy?

First, here's my original question that spawned all of this.
I'm using Appcelerator Titanium to develop an iPhone app (eventually Android too). I'm connecting to CouchDB's port directly by using Titanium's Titanium.Network.TCPSocket object. I believe it utilizes the Apple SDK's CFSocket/NSStream class.
Once connected, I simply write:
'GET /mydb/_changes?filter=app/myfilter&feed=continuous&gameid=4&heartbeat=30000 HTTP/1.1\r\n\r\n'
directly to the socket. It keeps it open "forever" and returns JSON data whenever the db is updated and matches the filter and change request. Cool.
I'm wondering, is it ok to connect directly to CouchDB's socket like this, or would I be better off opening the socket to node.js instead, and maybe using this CouchDB node.js module to handle the CouchDB proxy through node.js?
My main concern is performance. I just don't have enough experience with CouchDB to know if hitting its socket and passing faux HTTP requests directly is good practice or not. Looking for experience and opinions on any ramifications or alternate suggestions.
It's me again. :-)
CouchDB inherits super concurrency handling from Erlang, the language it was written in. Erlang uses lightweight processes and message passing between those processes to achieve excellent performance under high concurrent load. It will take advantage of all cpu cores, too.
Nodejs runs a single process and basically only does one thing at a time within that process. Its event-based, non-blocking IO approach does allow it to multitask while it waits for chunks of IO but it still only does one thing at a time.
Both should easily handle tens of thousands of connections, but I would expect CouchDB to handle concurrency better (and with less effort on your part) than Node. And keep in mind that Node adds some latency if you put it in front of CouchDB. That may only be noticeable if you have them on different machines, though.
Writing directly to Couch via TCPSocket is a-ok as long as you write a well-formed HTTP request that follows the spec. (You're not passing a faux request...that's a real HTTP request you're sending just like any other.)
Note: HTTP 1.1 does require you to include a Host header in the request, so you'll need to correct your code to reflect that OR just use HTTP 1.0 which doesn't require it to keep things simple. (I'm curious why you're not using Titanium.Network.HTTPClient. Does it only give you the request body after the request finishes or something?)
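For reference, this is roughly what the corrected request looks like when built by hand. The sketch is in Python just to show the bytes on the wire; the host name is a placeholder, the Accept header is optional, and 5984 is CouchDB's default port. The path is the one from the question.

```python
def build_get_request(host, path, port=5984):
    """Return the bytes of a minimal HTTP/1.1 GET request with a Host header."""
    # The port may be left out of Host only when it is the scheme's default.
    host_header = host if port == 80 else f"{host}:{port}"
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host_header}",          # mandatory in HTTP/1.1
        "Accept: application/json",
        "",                              # blank line ends the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


request = build_get_request(
    "couch.example.com",  # placeholder host, not from the question
    "/mydb/_changes?filter=app/myfilter&feed=continuous&gameid=4&heartbeat=30000",
)
```

The only difference from the string in the question is the Host line (plus the optional Accept line); the header block still ends with an empty line.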
Anyway, CouchDB can totally handle direct connections and--unless you put a lot of effort into your Node proxy--it's probably going to give users a better experience when you have 100k of them playing the game at once.
EDIT: If you use Node, write an actual HTTP proxy. That will run a lot faster than using the module you provided and be simpler to implement. (Rather than defining your own API that then makes requests to Couch, you can just pass certain requests on to CouchDB and block others, say, for security reasons.)
Also take a look at how "multinode" works:
http://www.sitepen.com/blog/2010/07/14/multi-node-concurrent-nodejs-http-server/

How should I implement bi-directional networking between an iPhone application and an Objective-C server-side application?

I'm looking for advice on the best way to implement some kind of bi-directional communication between a "server-side" application, written in Objective-C and running on a mac, and a client application running on an iPhone.
To cut a long story short, I'm adapting an existing library for use in a client-server environment. The library (which runs on the server) is basically a search engine which provides periodic results, and additionally can provide updates for any of those results at a later date. In an ideal world therefore I would be able to achieve the following with my hypothetical networking solution:
Start queries on the server.
Have the server "push" results to the client as they arrive.
Have the server "push" updates to individual results to the client as they arrive.
If I was writing this client to run on another Mac, I might well look at using Distributed Objects to mask the fact that the server was actually running remotely, but DO is not available on an iPhone.
If I was writing a more generic client-server application I would probably look at using HTTP to provide some kind of RESTful interface to searches, but this solution does not lend itself well to asynchronous updates and additionally what I am proposing does not fit well with the "stateless" tenet of REST: I would have to model my protocol so I "created" a search resource that I could subsequently query the state of and I would have to poll for updates to it.
One suggestion someone made was to make use of something like BLIP to provide me with a two-way pipe between the client and the server and implement my own "proxy" type objects for the server-side resources that knew how to fetch data from the server and additionally were addressable so that the server could push updates to them. Whilst BLIP provides the low-level messaging framework needed to communicate bi-directionally it still leaves me with a few questions:
How will I manage the lifetime of the objects on the server? I can have a message type that "creates" a search object, but when should that object be destroyed?
How well will this perform on an iPhone: if I have a persistent connection to the server will this drain the batteries too fast? This question is also pertinent in the HTTP world: most async updates are done using a COMET type hack which again requires a persistent connection.
So right now I'm still completely unsure what the best way to go is: I've done a lot of searching and reading but have not settled on any solution. I'm asking here on SO because I'm sure that there are many of you out there who have already solved this problem.
How have you gone about achieving real-time bidirectional networking between the iPhone and an Objective-C server-side app?

What's the best IPC mechanism for medium-sized data in Perl? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 4 years ago.
I'm working on designing a multi-tiered app in Perl and I'm wondering about the pros and cons of the various IPC mechanisms available to me. I'm looking at handling moderately-sized data, typically a few dozen kilobytes but up to a couple of megabytes, and the load is pretty light, at most a couple of hundred requests per minute.
My primary concerns are maintainability and performance (in that order). I don't think I'll need to scale up to more than one server, or port off of our main platform (RHEL), but I suppose it's something to consider.
I can think of the following options:
Temporary files - Simplistic, probably the worst option in terms of speed and storage requirements
UNIX domain sockets - Not portable, not scalable
Internet Sockets - Portable, scalable
Pipes - Portable, not scalable (?)
Considering that scalability and portability are not my primary concerns, I need to learn more. What's the best choice, and why? Please comment if you need additional information.
EDIT: I'll try to give more detail in response to ysth's questions (warning, wall of text follows):
Are readers/writers in a one-to-one relationship, or something more complicated?
What do you want to happen to the writer if the reader is no longer there or busy?
And vice versa?
What other information do you have about your desired usage?
At this point, I'm contemplating a three-tiered approach, but I'm not sure how many processes I'll have in each tier. I think I need to have more processes towards the left side and fewer toward the right, but maybe I should have the same number across the board:
.---------.        .----------.        .-------.
| Request | -----> | Business | -----> | Data  |
| Manager | <----- | Logic    | <----- | Layer |
`---------'        `----------'        `-------'
These names are still generic and probably won't make it into the implementation in these forms.
The request manager is responsible for listening for requests from different interfaces, for example web requests and CLI (where response time is important) and e-mail (where response time is less important). It performs logging and manages the responses to the requests (which are rendered in a format appropriate to the type of request).
It sends data about the request to the business logic which performs logging, authorization depending on business rules, etc.
The business logic (if it needs to) then requests data from the data layer, which can either talk to (most often) the internal MySQL database or some other data source outside our team's control (e.g., our organization's primary LDAP servers, or our DB2 employee information database, etc.). This is mostly simply a wrapper which formats the data in a uniform way so that it can be handled more easily in the business logic.
The information then flows back to the request manager for presentation.
If, when data is flowing to the right, the reader is busy, for the interactive requests I'd like to simply wait a suitable period of time, and return a timeout error if I don't get access in that amount of time (e.g. "Try again later"). For the non-interactive requests (e.g. e-mail), the polling system can simply exit and try again on the next invocation (which will probably be once per 1-3 minutes).
When data is flowing in the other direction, there shouldn't be any waiting situations. If one of the processes has died when trying to travel back to the left, all I can really do is log and exit.
Anyway, that was pretty verbose, and since I'm still in early design I probably still have some confused ideas in there. Some of what I've mentioned is probably tangential to the issue of which IPC system to use. I'm open to other suggestions on the design, but I was trying to keep the question limited in scope (for example, maybe I should consider collapsing down to two tiers, which is much simpler for IPC). What are your thoughts?
If you're unsure about your exact requirements at the moment, try to think of a simple interface that you can code to, that any IPC implementation (be it temporary files, TCP/IP or whatever) needs to support. You can then choose a particular IPC flavour (I would start with whatever's easiest and/or easiest to debug -- probably temporary files) and implement the interface using that. If that turns out to be too slow, implement the interface using e.g. TCP/IP. Actually implementing the interface does not involve much work as you will essentially just be forwarding calls to some existing library.
The point is that you have a high-level task to perform ("transmit data from program A to program B") which is more or less independent of the details of how it is performed. By establishing an interface and coding to it, you isolate the main program from changes in the event that you need to change the implementation.
Note that you don't need to use any heavyweight Perl language mechanisms to capitalise on the idea of having an interface. You could simply have e.g. 3 different packages (for temp files, TCP/IP, Unix domain sockets), each of which exports the same set of methods. Choosing which implementation you want to use in your main program amounts to choosing which module to use.
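Here is a minimal sketch of that idea. It is in Python rather than Perl, and the class names, the send/receive/close method set and the 4-byte length prefix are all my own choices for illustration. Two transports expose the same methods, and the calling code never needs to know which one it was given.

```python
import os
import socket
import struct
import tempfile


class TempFileChannel:
    """Transport 1: hand data over through a temporary file."""

    def __init__(self):
        fd, self.path = tempfile.mkstemp()
        os.close(fd)

    def send(self, data):
        with open(self.path, "wb") as handle:
            handle.write(data)

    def receive(self):
        with open(self.path, "rb") as handle:
            return handle.read()

    def close(self):
        os.remove(self.path)


class SocketChannel:
    """Transport 2: a connected socket pair with length-prefixed messages."""

    def __init__(self):
        self._writer, self._reader = socket.socketpair()

    def send(self, data):
        # Single-process demo: keep payloads small enough for the socket buffer.
        self._writer.sendall(struct.pack("!I", len(data)) + data)

    def receive(self):
        (length,) = struct.unpack("!I", self._recv_exactly(4))
        return self._recv_exactly(length)

    def _recv_exactly(self, count):
        buffer = bytearray()
        while len(buffer) < count:
            chunk = self._reader.recv(count - len(buffer))
            if not chunk:
                raise ConnectionError("peer closed before the message was complete")
            buffer += chunk
        return bytes(buffer)

    def close(self):
        self._writer.close()
        self._reader.close()


def transmit(channel, payload):
    """The 'main program': works with any object offering send/receive."""
    channel.send(payload)
    return channel.receive()
```

Switching the IPC flavour then comes down to constructing a different class, e.g. transmit(SocketChannel(), data) instead of transmit(TempFileChannel(), data).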
Temporary files (and related things, like a shared memory region) are probably a bad bet. If you ever want to run your server on one machine and your clients on another, you will need to rewrite your application. If you pick any of the other options, at least the semantics are essentially the same if you need to switch between them at a later date.
My only real advice, though, is to not write this yourself. On the server side, you should use POE (or Coro, etc.), rather than doing select on the socket yourself. Also, if your interface is going to be RPC-ish, use something like JSON-RPC-Common from CPAN.
Finally, there is IPC::PubSub, which might work for you.
Temporary files have other problems besides that. I think Internet sockets are really the best choice. They are well documented and, as you say, scalable and portable. Even if that is not a core requirement, you get it nearly for free. Sockets are pretty easy to deal with, and again there are copious amounts of documentation. You can build your data sharing mechanism and protocol out in a library and never have to look at it again!
UNIX domain sockets are portable across unices. They're no less portable than pipes. They're also more efficient than IP sockets.
Anyway, you missed a few options, shared memory for example. Some would add databases to that list but I'd say that's a rather heavyweight solution.
Message queues would also be a possibility, though you'd have to change a kernel option for it to handle such large messages. Otherwise, they have an ideal interface for a lot of things, and IMHO they are greatly underused.
I generally agree, though, that using an existing solution is better than building something of your own. I don't know the specifics of your problem, but I'd suggest you check out the IPC section of CPAN.
There are so many different options because most of them are better for some particular case, but you haven't really given any information that would identify your case.
Are readers/writers in a one-to-one relationship, or something more complicated?
What do you want to happen to the writer if the reader is no longer there or busy? And vice versa?
What other information do you have about your desired usage?
For "interactive" requests (holding the connection open while waiting for a response, asynchronously or not): HTTP + JSON. JSON::XS is insanely fast. Everyone and everything can speak HTTP and it's easy to load balance, debug, ...
For queued requests ("please do this, thanks!"): Beanstalkd and Beanstalk::Client. Serialize the requests in the beanstalk queue with JSON.
Thrift might also be worth looking into depending on your application.