Related
What are the advantages and disadvantages of using only socket based communication vs a hybrid of REST and socket (using socket only when bidirectional communication is necessary, like receiving messages in a chat).
When I say only socket, I mean that instead of sending a GET request asking for /entities, I'd send update_needed and the server would send a push via socket.
My question is not really about performance, it's more about the concept, like delegate vs block/lambda (using socket would be like the delegate concept and REST is more like block).
It all boils down to what type of application and level of scalability you have in mind.
WebSocket/REST: Client connections?
How to handle CQRS from a client-side perspective
Hard downsides of long polling?
The main reason why I wouldn't use WebSockets in any major project is simply that still many users don't use a modern browser that support them. Namely IE 8 and 9 don't support them and both together still have a market share of over 20 % (Oct 15).
Good Afternoon Gurus,
I am pretty familiar with basic socket programming, and the IO::Socket module but I need to code something now that I have not encountered before. It will be a 3 tier application. The first tier is an event-loop that sends messages upstream when certain events are encountered. The second tier is the 'middle-ware' server, which (among other things) acts as the message repository. The third tier is a cgi application, which will update a graphical display.
I am confused on how to set up the server to accept uni-directional connections from multiple clients one one side, and communicate bi-directionally with the cgi application on the other. I can do either of those tasks separately, just not in the same script (yet). Does my question make sense? I would like to stick with using the IO::Socket module, but it is not a requirement by any means. I am not asking for polished code, just advice on setting up the socket(s) and how to communicate from one client to another via the server.
Also, does it make more sense to have the cgi application query the server for new messages, or have the server push the new message upstream to the cgi application? The graphical updates need to be near real-time.
Thank you in advance,
Daren
You said you already have an event loop in the first tier. In a way, your second-tier server should also arrange some kind of event loop for asynchronous processing. There are many ways to code it using perl, like AnyEvent, POE, Event to name just a few. In the end, they all use one of select, poll, epoll, kqueue OS facilities (or their equivalent on Windows). If you feel comfortable coding in a relatively low-level, you can just use perl's select builtin, or, alternatively, its object-oriented counterpart, IO::Select.
Basically you create two listening sockets (you might only need one if the first tier uses the same communication protocol as the third tier to talk to your server), add it to the IO::Select object and do a select on it. Once the connection
is made, you add the accepted sockets to the select object.
The select method of IO::Select will give you back a list of sockets ready for reading or writing (I am ignoring the possibility of exceptions here). Of course you have to keep track of your sockets to know which one is which. Also, the communication logic will be somewhat complicated because you have to use non-blocking sockets.
As for the second part of your question, I am a little bit confused what you mean by "cgi" - whether it is a Common Gateway Interface (i.e., server-side web scripts), or whether it is a shorthand for "computer graphics". In both cases I think that it makes sense for your task to use server push.
In the latter case that's all I'd like to say. In the former case, I suggest you google for "Comet" (as in "AJAX"). :-)
In a standard CGI application, I don't see how you can "push" data to them. For a client interaction, the data goes through the CGI/presentation layer to the middle tier to remain in session storage (or cache) or to the backend to get stored in the database.
That is of course unless you have a thick application layer which is a caching locus and kind of a middle tier in itself.
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 4 years ago.
Improve this question
I'm working on designing a multi-tiered app in Perl and I'm wondering about the pros and cons of the various IPC mechnisms available to me. I'm looking at handling moderately-sized data, typically a few dozen kilobytes but up to a couple of megabytes, and the load is pretty light, at most a couple of hundred requests per minute.
My primary concerns are maintainability and performance (in that order). I don't think I'll need to scale up to more than one server, or port off of our main platform (RHEL), but I suppose it's something to consider.
I can think of the following options:
Temporary files - Simplistic, probably the worst option in terms of speed and storage requirements
UNIX domain sockets - Not portable, not scalable
Internet Sockets - Portable, scalable
Pipes - Portable, not scalable (?)
Considering that scalability and portability are not my primary concerns, I need to learn more. What's the best choice, and why? Please comment if you need additional information.
EDIT: I'll try to give more detail in response to ysth's questions (warning, wall of text follows):
Are readers/writers in a one-to-one relationship, or something more more complicated?
What do you want to happen to the writer if the reader is no longer there or busy?
And vice versa?
What other information do you have about your desired usage?
At this point, I'm contemplating a three-tiered approach, but I'm not sure how many processes I'll have in each tier. I think I need to have more processes towards the left side and fewer toward the right, but maybe I should have the same number across the board:
.---------. .----------. .-------.
| Request | -----> | Business | -----> | Data |
| Manager | <----- | Logic | <----- | Layer |
`---------' `----------' `-------'
These names are still generic and probably won't make it into the implementation in these forms.
The request manager is responsible for listening for requests from different interfaces, for example web requests and CLI (where response time is important) and e-mail (where response time is less important). It performs logging and manages the responses to the requests (which are rendered in a format appropriate to the type of request).
It sends data about the request to the business logic which performs logging, authorization depending on business rules, etc.
The business logic (if it needs to) then requests data from the data layer, which can either talk to (most often) the internal MySQL database or some other data source outside our team's control (e.g., our organization's primary LDAP servers, or our DB2 employee information database, etc.). This is mostly simply a wrapper which formats the data in a uniform way so that it can be handled more easily in the business logic.
The information then flows back to to the request manager for presentation.
If, when data is flowing to the right, the reader is busy, for the interactive requests I'd like to simply wait a suitable period of time, and return a timeout error if I don't get access in that amount of time (e.g. "Try again later"). For the non-interactive requests (e.g. e-mail), the polling system can simply exit and try again on the next invocation (which will probably be once per 1-3 minutes).
When data is flowing in the other direction, there shouldn't be any waiting situations. If one of the processes has died when trying to travel back to the left, all I can really do is log and exit.
Anyway, that was pretty verbose, and since I'm still in early design I probably still have some confused ideas in there. Some of what I've mentioned is probably tangential to the issue of which IPC system to use. I'm open to other suggestions on the design, but I was trying to keep the question limited in scope (For example, maybe I should consider collapsing down to two tiers, which is a much simpler for IPC). What are your thoughts?
If you're unsure about your exact requirements at the moment, try to think of a simple interface that you can code to, that any IPC implementation (be it temporary files, TCP/IP or whatever) needs to support. You can then choose a particular IPC flavour (I would start with whatever's easiest and/or easiest to debug -- probably temporary files) and implement the interface using that. If that turns out to be too slow, implement the interface using e.g. TCP/IP. Actually implementing the interface does not involve much work as you will essentially just be forwarding calls to some existing library.
The point is that you have a high-level task to perform ("transmit data from program A to program B") which is more or less independent of the details of how it is performed. By establishing an interface and coding to it, you isolate the main program from changes in the event that you need to change the implementation.
Note that you don't need to use any heavyweight Perl language mechanisms to capitalise on the idea of having an interface. You could simply have e.g. 3 different packages (for temp files, TCP/IP, Unix domain sockets), each of which exports the same set of methods. Choosing which implementation you want to use in your main program amounts to choosing which module to use.
Temporary files (and related things, like a shared memory region), are probably a bad bet. If you ever want to run your server on one machine and your clients on another, you will need to rewrite your application. If you pick any of the other options, at least the semantics are the essentially the same, if you need to switch between them at a later date.
My only real advice, though, is to not write this yourself. On the server side, you should use POE (or Coro, etc.), rather than doing select on the socket yourself. Also, if your interface is going to be RPC-ish, use something like JSON-RPC-Common/ from the CPAN.
Finally, there is IPC::PubSub, which might work for you.
Temporary files have other problems besides that. I think Internet socks are really the best choice. They are well documented, and as you say, scalable and portable. Even if that is not a core requirement, you get it nearly for free. Sockets are pretty easy to deal with, again there is copious amounts of documentation. You can build out your data sharing mechanism and protocol out in a library and never have to look at it again!
UNIX domain sockets are portable across unices. It's no less portable than pipes. It's also more efficient than IP sockets.
Anyway, you missed a few options, shared memory for example. Some would add databases to that list but I'd say that's a rather heavyweight solution.
Message queues would also be a possibility, though you'd have to change a kernel option for it to handle such large messages. Otherwise, they have an ideal interface for a lot of things, and IMHO they are greatly underused.
I generally agree though that using an existing solution is better than building somethings of your own. I don't know the specifics of your problem, but I'd suggest you'd check out the IPC section of CPAN
There are so many different options because most of them are better for some particular case, but you haven't really given any information that would identify your case.
Are readers/writers in a one-to-one relationship, or something more more complicated?
What do you want to happen to the writer if the reader is no longer there or busy? And vice versa?
What other information do you have about your desired usage?
For "interactive" requests (holding the connection open while waiting for a response (asynchronously or not): HTTP + JSON. JSON::XS is insanely fast. Everyone and everything can speak HTTP and it's easy to load balance, debug, ...
For queued requests ("please do this, thanks!"): Beanstalkd and Beanstalk::Client. Serialize the requests in the beanstalk queue with JSON.
Thrift might also be worth looking into depending on your application.
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 years ago.
Improve this question
I'm looking to establish some kind of socket/COMET type functionality from my server(s) to my iPhone application. Essentially, anytime a user manages to set an arbitrary object 'dirty' on the server, by say, updating their Address.. the feedback should be pushed from the server to any clients keeping a live poll to the server. The buzzword for this is COMET I suppose. I know there is DWR out there for web browser applications, so I'm thinking, maybe it's best to set a hidden UIWebView in each of my controllers just so I can get out of the box COMET from their javascript framework? Is there a more elegant approach?
There are a couple of solutions available to use a STOMP client.
STOMP is incredibly simple and lightweight, perfect for the iPhone.
I used this one as my starting point, and found it very good. It has a few object allocation/memory leak problems, but once I got the hang of iPhone programming, these were easy to iron out.
Hope that helps!
Can you use ordinary TCP/IP socket in your application?
A) If yes then definitely a raw TCP/IP socket is more elegant solution. From your iPhone app you just wait for notification events. The socket is open as long as your application is open. If you want you can even use HTTP protocol / headers.
On the server side you can use some framework to write servers which efficiently handle thousands of open TCP/IP connections. e.g Twisted, EventMachine or libevent. Then just bind the server main socket to http port (80).
The idea is to use a server which keeps just a single data structure per client. Receives update event from some DB application and then pushes it to right client.
B) No, you have to use Apache and http client on iPhone side. Then you should know that whole COMET solution is in fact work around for limitations of HTTP protocol and Apache / PHP.
Apache was designed to handle many short time connections. As far I know only newest versions Apache (mpm worker) can handle efficiently big number of opened connection. Previously Apache was keeping one process per connection.
Web browsers have a limit of concurrent open connections to one web server (URL address in fact, eg. www.foo.com, not IP address of www.foo.com). And the limit is 2 connections. Additionally, a browser will allow only for AJAX connections to the same server from which the main HTML page was downloaded.
I wrote a web server for doing exactly this kind of thing. I'm pushing realtime updates through the server with long polling and, as an example, I had safari on the iPhone displaying that data.
A given instance of the server should be able to handle a few thousand concurrent clients without trying too hard. I've got a plan to put them in a hierarchy to allow for more horizontal scaling (should be quite trivial, but doesn't affect my current application).
WebSync has a javascript client that works on the iPhone, if that's what you're after
Would long-polling work for what you want to achieve? You can implement the client-side in a few lines of regular Javascript, which will be lighter than any framework could possibly be.
It would also be trivial to implement it in ObjC (connect, wait for a response or timeout, repeat)
The answers to my question Simple "Long Polling" example code? hopefully explain how extremely simple Long Polling is..
Basically you would just request a URL as usual - the web-server would accept the connection, but not send any data until it's available. When you receive data, or the connection times-out, you reconnect (and repeat)
The most complicated bit would be server server-side, as you cannot use a regular threaded web-server like Apache, although this is also the case with Comet..
StreamHub Comet Server works with the iPhone out of the box, no plugins or anything required. Just browsed to their website on my iPhone and all the examples worked, didn't need to install Flash or anything.
Do you want/have do the communication for your app over http? If not, you can use CFNetwork framework to use sockets (TCP/UDP) to allow your app and server to communicate. From what I have seen of the CFNetwork stack, it is pretty cool, and makes it fairly straitforward to read and write to streams, and allows for synchronous and asynchronous communication. It also allows for you to define callbacks on your socket allowing you to get notified of events like data received, connection made, etc. So, in your example you could send the information over the socket to your server, and then you could define a callback that would listen for incoming data on the stream and then update your app accordingly.
EDIT: Did a little more research, and if you go the socket approach, you may want to also look at the NSStream classes. They are Cocoa abstractions build on top of the CFSocket stuff.
you didn't mention what serverside tech you're using. But in case it's microsoft .net (or for any other googlers who come across this), there is a simple option for comet: http://www.codeplex.com/ncomet.
COMET, LightStreamer, AJAX all that junk is broken. It is basics of TCP that no 'keep-alives' are ever guaranteed without pinging traffic.. So you can forget that long-polling if any decent reliability or timely delivery is to be guaranteed..
It's just hype everyone saw through back in 2003 when the lame-mania kicked off..
If you have a situation where a TCP connection is potentially too slow and a UDP 'connection' is potentially too unreliable what do you use? There are various standard reliable UDP protocols out there, what experiences do you have with them?
Please discuss one protocol per reply and if someone else has already mentioned the one you use then consider voting them up and using a comment to elaborate if required.
I'm interested in the various options here, of which TCP is at one end of the scale and UDP is at the other. Various reliable UDP options are available and each brings some elements of TCP to UDP.
I know that often TCP is the correct choice but having a list of the alternatives is often useful in helping one come to that conclusion. Things like Enet, RUDP, etc that are built on UDP have various pros and cons, have you used them, what are your experiences?
For the avoidance of doubt there is no more information, this is a hypothetical question and one that I hoped would elicit a list of responses that detailed the various options and alternatives available to someone who needs to make a decision.
What about SCTP. It's a standard protocol by the IETF (RFC 4960)
It has chunking capability which could help for speed.
Update: a comparison between TCP and SCTP shows that the performances are comparable unless two interfaces can be used.
Update: a nice introductory article.
It's difficult to answer this question without some additional information on the domain of the problem.
For example, what volume of data are you using? How often? What is the nature of the data? (eg. is it unique, one off data? Or is it a stream of sample data? etc.)
What platform are you developing for? (eg. desktop/server/embedded)
To determine what you mean by "too slow", what network medium are you using?
But in (very!) general terms I think you're going to have to try really hard to beat tcp for speed, unless you can make some hard assumptions about the data that you're trying to send.
For example, if the data that you're trying to send is such that you can tolerate the loss of a single packet (eg. regularly sampled data where the sampling rate is many times higher than the bandwidth of the signal) then you can probably sacrifice some reliability of transmission by ensuring that you can detect data corruption (eg. through the use of a good crc)
But if you cannot tolerate the loss of a single packet, then you're going to have to start introducing the types of techniques for reliability that tcp already has. And, without putting in a reasonable amount of work, you may find that you're starting to build those elements into a user-space solution with all of the inherent speed issues to go with it.
ENET - http://enet.bespin.org/
I've worked with ENET as a reliable UDP protocol and written an asynchronous sockets friendly version for a client of mine who is using it in their servers. It works quite nicely but I don't like the overhead that the peer to peer ping adds to otherwise idle connections; when you have lots of connections pinging all of them regularly is a lot of busy work.
ENET gives you the option to send multiple 'channels' of data and for the data sent to be unreliable, reliable or sequenced. It also includes the aforementioned peer to peer ping which acts as a keep alive.
We have some defense industry customers that use UDT (UDP-based Data Transfer) (see http://udt.sourceforge.net/) and are very happy with it. I see that is has a friendly BSD license as well.
Anyone who decides that the list above isn't enough and that they want to develop their OWN reliable UDP should definitely take a look at the Google QUIC spec as this covers lots of complicated corner cases and potential denial of service attacks. I haven't played with an implementation of this yet, and you may not want or need everything that it provides, but the document is well worth reading before embarking on a new "reliable" UDP design.
A good jumping off point for QUIC is here, over at the Chromium Blog.
The current QUIC design document can be found here.
RUDP - Reliable User Datagram Protocol
This provides:
Acknowledgment of received packets
Windowing and congestion control
Retransmission of lost packets
Overbuffering (Faster than real-time streaming)
It seems slightly more configurable with regards to keep alives then ENet but it doesn't give you as many options (i.e. all data is reliable and sequenced not just the bits that you decide should be). It looks fairly straight forward to implement.
As others have pointed out, your question is very general, and whether or not something is 'faster' than TCP depends a lot on the type of application.
TCP is generally as fast as it gets for reliable streaming of data from one host to another. However, if your application does a lot of small bursts of traffic and waiting for responses, UDP may be more appropriate to minimize latency.
There is an easy middle ground. Nagle's algorithm is the part of TCP that helps ensure that the sender doesn't overwhelm the receiver of a large stream of data, resulting in congestion and packet loss.
If you need the reliable, in-order delivery of TCP, and also the fast response of UDP, and don't need to worry about congestion from sending large streams of data, you can disable Nagle's algorithm:
int opt = -1;
if (setsockopt(sock_fd, IPPROTO_TCP, TCP_NODELAY, (char *)&opt, sizeof(opt)))
printf("Error disabling Nagle's algorithm.\n");
If you have a situation where a TCP connection is potentially too slow and a UDP 'connection' is potentially too unreliable what do you use? There are various standard reliable UDP protocols out there, what experiences do you have with them?
The key word in your sentence is 'potentially'. I think you really need to prove to yourself that TCP is, in fact, too slow for your needs if you need reliability in your protocol.
If you want to get reliability out of UDP then you're basically going to be re-implementing some of TCP's features on top of UDP which will probably make things slower than just using TCP in the first place.
Protocol DCCP, standardized in RFC 4340, "Datagram Congestion Control Protocol" may be what you are looking for.
It seems implemented in Linux.
May be RFC 5405, "Unicast UDP Usage Guidelines for Application Designers" will be useful for you.
RUDP. Many socket servers for games implement something similar.
Did you consider compressing your data ?
As stated above, we lack information about the exact nature of your problem, but compressing the data to transport them could help.
It is hard to give a universal answer to the question but the best way is probably not to stay on the line "between TCP and UDP" but rather to go sideways :).
A bit more detailed explanation:
If an application needs to get a confirmation response for every piece of data it transmits then TCP is pretty much as fast as it gets (especially if your messages are much smaller than optimal MTU for your connection) and if you need to send periodic data that gets expired the moment you send it out then raw UDP is the best choice for many reasons but not particularly for speed as well.
Reliability is a more complex question, it is somewhat relative in both cases and it always depends on a specific application. For a simple example if you unplug the internet cable from your router then good luck keeping reliably delivering anything with TCP. And what even worse is that if you don't do something about it in your code then your OS will most likely just block your application for a couple of minutes before indicating an error and in many cases this delay is just not acceptable as well.
So the question with conventional network protocols is generally not really about speed or reliability but rather about convenience. It is about getting some features of TCP (automatic congestion control, automatic transmission unit size adjustment, automatic retransmission, basic connection management, ...) while also getting at least some of the important and useful features it misses (message boundaries - the most important one, connection quality monitoring, multiple streams within a connection, etc) and not having to implement it yourself.
From my point of view SCTP now looks like the best universal choice but it is not very popular and the only realistic way to reliably pass it across the Internet of today is still to wrap it inside UDP (probably using sctplib). It is also still a relatively basic and compact solution and for some applications it may still be not sufficient by itself.
As for the more advanced options, in some of the projects we used ZeroMQ and it worked just fine. This is a much more of a complete solution, not just a network protocol (under the hood it supports TCP, UDP, a couple of higher level protocols and some local IPC mechanisms to actually deliver messages). Since a couple of releases its initial developer has switched his attention to his new NanoMSG and currently the newest NNG libraries. It is not as thoroughly developed and tested and it is not very popular but someday it may change. If you don't mind the CPU overhead and some network bandwidth loss then some of the libraries might work for you. There are some other network-oriented message exchange libraries available as well.
You should check MoldUDP, which has been around for decades and it is used by Nasdaq's ITCH market data feed. Our messaging system CoralSequencer uses it to implement a reliable multicast event-stream from a central process.
Disclaimer: I'm one of the developers of CoralSequencer
The best way to achieve reliability using UDP is to build the reliability in the application program itself( for example, by adding acknowledgment and retransmission mechanisms)