The goal is to introduce a transport and application-layer protocol that improves latency and network throughput. Currently the application uses REST over HTTP/1.1 and we experience high latency. I need to resolve this latency problem and I am open to using either gRPC (HTTP/2) or REST over HTTP/2.
HTTP/2:
Multiplexed
Single TCP Connection
Binary instead of textual
Header compression
Server Push
I am aware of all the above advantages. Question No. 1: If I use REST with HTTP/2, I am sure I will get a significant performance improvement compared to REST with HTTP/1.1, but how does this compare with gRPC (HTTP/2)?
I am also aware that gRPC uses Protocol Buffers, a highly efficient binary serialization format for transmitting structured data on the wire. Protocol Buffers also helps in developing a language-agnostic approach. I agree with that, and I could achieve something similar in REST using GraphQL. But my concern is over serialization: Question No. 2: Given that HTTP/2 already has this binary framing, does using Protocol Buffers give an added advantage on top of HTTP/2?
Question No. 3: In terms of streaming and bi-directional use cases, how does gRPC (HTTP/2) compare with REST over HTTP/2?
There are so many blogs/videos out on the internet that compare gRPC (HTTP/2) with REST over HTTP/1.1, like this. As stated earlier, I would like to know the differences and benefits when comparing gRPC (HTTP/2) and REST over HTTP/2.
gRPC is not faster than REST over HTTP/2 by default, but it gives you the tools to make it faster. There are some things that would be difficult or impossible to do with REST.
Selective message compression. In gRPC a streaming RPC can decide to compress or not compress messages. For example, if you are streaming mixed text and images over a single stream (or really any mixed compressible content), you can turn off compression for the images. This saves you from compressing already compressed data which won't get any smaller, but will burn up your CPU.
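To make that concrete, here is a minimal Python sketch of a server-streaming handler using the grpc library's per-message compression controls (an experimental but real API). The service, the message's `content_type` field, the `media_pb2*` modules and the `_load_chunks` data source are hypothetical stand-ins for your own .proto and storage, not anything from the question:

```python
from concurrent import futures
import grpc
# import media_pb2, media_pb2_grpc   # hypothetical modules generated from your .proto

class MediaService:  # in real code this would subclass media_pb2_grpc.MediaServiceServicer
    def StreamContent(self, request, context):
        # Server-streaming handler: yield one message per chunk of mixed content.
        for chunk in self._load_chunks(request):
            if chunk.content_type == "image/jpeg":
                # Already-compressed payloads won't shrink; skip gzip for this one message.
                context.disable_next_message_compression()
            yield chunk

    def _load_chunks(self, request):
        raise NotImplementedError  # placeholder for the real data source

def serve():
    # Gzip is the server-wide default; the handler opts out per message above.
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=4),
                         compression=grpc.Compression.Gzip)
    # media_pb2_grpc.add_MediaServiceServicer_to_server(MediaService(), server)
    server.add_insecure_port("[::]:50051")
    server.start()
    server.wait_for_termination()
```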
First-class load balancing. While not an improvement for point-to-point connections, gRPC can intelligently pick which backend to send traffic to (this is a library feature, not a wire-protocol feature). This means you can send your requests to the least-loaded backend server without resorting to a proxy. This is a latency win.
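As a rough illustration (not from the original answer), this is how you would ask the Python grpc client for proxy-free, client-side balancing. The target name is a placeholder, and round_robin is the simplest built-in policy rather than a true least-loaded picker:

```python
import grpc
# import my_service_pb2, my_service_pb2_grpc   # hypothetical generated stub modules

# "dns:///" makes the client resolve every A/AAAA record for the name, and the
# round_robin policy spreads RPCs across the resolved backends without a proxy.
channel = grpc.insecure_channel(
    "dns:///my-backend.internal.example:50051",          # placeholder target
    options=[("grpc.lb_policy_name", "round_robin")],
)
# stub = my_service_pb2_grpc.MyServiceStub(channel)
# response = stub.DoWork(my_service_pb2.WorkRequest())
```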
Heavily optimized. gRPC (the library) is under continuous benchmarks to ensure that there are no speed regressions. Those benchmarks are improving constantly. Again, this doesn't have anything to do with gRPC the protocol, but your program will be faster for having used gRPC.
As nfirvine said, you will see most of your performance improvement just from using Protobuf. While you could use proto with REST, it is very nicely integrated with gRPC. Technically, you could use JSON with gRPC, but most people don't want to pay the performance cost after getting used to protos.
I am not an expert on this by any means and I have no data to back any of this up.
The "binary feature" you're talking about is the binary representation of HTTP/2 frames. The content itself (a JSON payload) will still be UTF-8. You can compress that JSON and set Content-Encoding: gzip, just like HTTP/1.
But gRPC does gzip compression as well. So really, we're talking about the difference between gzip-compressed JSON vs gzip-compressed protobufs.
As you can imagine, compressed protobufs should beat compressed JSON in every way, or else protobufs have failed at their goal.
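If you want to sanity-check that on your own payloads, a quick size comparison is easy to do. The sketch below assumes a hypothetical `telemetry_pb2` module generated from your own .proto (the protobuf half is shown in comments and won't run without it):

```python
import gzip, json
# import telemetry_pb2   # hypothetical module generated from a .proto such as:
#   message Reading { int32 sensor_id = 1; int64 timestamp = 2; repeated double values = 3; }

record = {"sensor_id": 42, "timestamp": 1700000000, "values": [1.5, 2.25, 3.0]}
json_gz = gzip.compress(json.dumps(record).encode("utf-8"))

# reading = telemetry_pb2.Reading(sensor_id=42, timestamp=1700000000, values=[1.5, 2.25, 3.0])
# proto_gz = gzip.compress(reading.SerializeToString())

print(len(json_gz))      # compressed JSON size in bytes
# print(len(proto_gz))   # compressed protobuf size, typically noticeably smaller
```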
Besides the ubiquity of JSON vs protobufs, the only downside I can see to using protobufs is that you need the .proto to decode them, say in a tcpdump situation.
What I found out is:
If you want to send files such as text, image, or video files, then REST/HTTP is much faster than gRPC.
If you want to send objects over the wire, then gRPC is efficient.
Sources:
For sending files, sending large files, another blog
I'm new to network programming and have recently been playing around with using sockets in C++.
I have a pretty decent handle on it at this point, and I understand HTTP/TCP/IP packets pretty well.
However, upon doing some research online it seems like the bulk of network programmers suggest using external libraries such as libcurl (or curl++ for C++) for sending HTTP requests.
Considering that HTTP is a text-based protocol, why is this more beneficial/easier than simply sending HTTP requests as text messages using socket programming?
I found a few sites that show that you can do this without too much difficulty: "HTTP Requests in C++ without external libraries?" and "Simple C example of doing an HTTP POST and consuming the response".
It seems like sending HTTP requests is simply a matter of getting the formatting correct and then sending it via a TCP socket. How is this made easier with external libraries?
Please bear with me as I'm new to network programming and eager to learn.
The links you've provided in your question are, in a way, a pretty good explanation of why you should not code HTTP yourself: the first link only points to the socket API and says nothing about HTTP, while the second one provides examples and code which are too simplified for real-world use and will not even work with the typical setup of multiple domains on the same host, since the requests are missing the Host field. In other words: these are resources which might look useful to the inexperienced developer, but they will actually lead you into trouble.
HTTP is not as simple as it might look. HTTP/0.9 was simple but is no longer supported by many clients. HTTP/1.0 is still fairly simple if you restrict yourself to the basic aspects. But there are already enough pitfalls, like using the wrong delimiter between lines and between request header and body, or omitting the Host field when accessing multi-domain hosts.
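To make those pitfalls concrete, here is roughly the minimum a hand-rolled HTTP/1.1 GET has to get right, sketched in Python purely for brevity: CRLF line endings, the mandatory Host header, and some way of knowing when the response ends.

```python
import socket

host = "example.com"
request = (
    "GET / HTTP/1.1\r\n"          # request line, CRLF-terminated
    f"Host: {host}\r\n"           # mandatory in HTTP/1.1, easy to forget
    "Connection: close\r\n"       # otherwise the server may keep the socket open
    "\r\n"                        # blank line ends the header block
)

with socket.create_connection((host, 80)) as sock:
    sock.sendall(request.encode("ascii"))
    response = b""
    while chunk := sock.recv(4096):
        response += chunk

print(response.split(b"\r\n", 1)[0])   # status line only
```

And even this ignores status-code handling, chunked bodies, compression, redirects and TLS, which is exactly the drudgery libcurl handles for you.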
And once you want to be efficient, you want multiple requests per TCP connection and compression, and then it gets more complex. With HTTP/1.1 it gets even more complex due to chunked transfer encoding, and with HTTP/2 it gets more efficient but far more complex, with a binary protocol and interleaved requests and responses.
And this was only HTTP. With HTTPS you have the additional and not trivial TLS layer which has its own pitfalls.
Thus, if you just want to use HTTP and HTTPS it is much better to use established libraries. Of course if you want to learn the innards of HTTP it might be useful to read all the relevant standards, look at packet traces and try to implement your own.
I was reading about HTTP/2 as the new HTTP protocol, with advantages such as binary framing and multiplexing. But I would like to know: for REST calls, does migrating from HTTP/1.1 to HTTP/2 provide any reasonable advantage? I am not able to find any specific gain for RESTful calls with HTTP/2.
Thanks in advance.
It does, in several ways:
Better support for streaming transfers. The conventional alternative is a combination of HTTP/1.1's chunked transfer encoding, Connection header voodoo, and the willingness (or not) of each party to implement HTTP pipelining (which, e.g., curl enables by default). In my experience it takes a lot more work to get those three to play well together than to just switch to HTTP/2. There is no need for chunked transfer encoding with HTTP/2, and the protocol does not support it.
With HTTP/2, you can have many requests in flight, REST or not, with zero additional time spent establishing connections. This is a blessing both for the browser and for the server, which has to allocate fewer file descriptors per client.
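As an illustration of that point (not part of the original answer): with an HTTP/2-capable client such as the third-party Python httpx library, many REST calls share one connection as separate streams. The URL below is a placeholder.

```python
import asyncio
import httpx  # third-party; install as `pip install "httpx[http2]"`

async def main():
    async with httpx.AsyncClient(http2=True) as client:
        # Ten REST calls multiplexed as separate streams over one TCP+TLS connection.
        tasks = [client.get(f"https://api.example.com/items/{i}") for i in range(10)]
        for resp in await asyncio.gather(*tasks):
            print(resp.http_version, resp.status_code)   # e.g. "HTTP/2 200"

asyncio.run(main())
```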
Header compression also applies to HTTP/2 REST requests, together with the associated bandwidth reduction.
So, if in doubt, always go for HTTP/2. There are also excellent tools out there for developing HTTP/2 applications. Some, like ShimmerCat, even remove the drudgery of setting up certificates and DNS aliases, so that starting with HTTP/2 from day one becomes a no-brainer.
I am working on designing an API for consuming messages from an application that will generate a very large amount of data; 10+ GB/s is likely, even for smaller clients. I am looking for a protocol that allows me to deliver this data in a way that is easy for clients to consume.
The obvious answer for me is: split up the messages so they are consumable over multiple connections. Each connection would consume a fraction of the overall load.
But if I do this, there are a few things I need to account for:
How does the user know they are falling behind and need to launch more connections?
Twitter says consumers should check timestamps, which could work for us
When they launch a new connection to consume more of the data, how do they specify that this is part of the same consumption session?
We could give the session a name, correlate that with a "direct" amqp queue, and let our queue do the hard work
Is there something very important I am missing?
Probably.
For this reason, I'd much rather use a protocol that already exists.
The protocol would be considered extra awesome if it:
is websocket or streaming HTTP friendly
supports data compression
The problems you are describing are pretty much the same issues that video streaming has to deal with, which you probably already know. The key HTTP friendly streaming protocols are HLS (Apple), SmoothStreaming (Microsoft), HDS (Adobe) and MPEG-DASH (open protocol, but new).
When considering video streaming, it is also worth understanding whether your streams are more like 'live' streams or 'static' content. The former is generated on the fly and any given part of the live stream may only be available for a set time, while the latter is stored on the server in full and generally any part is available at any time (until the content is removed). How you stream and play back these is subtly different.
It may be that you can simply reuse one of the above video streaming protocols by wrapping your data as if it were video (or maybe it even is video), and implementing your own custom client on the receiving side.
Alternatively, these protocols could provide a good reference point if you wanted to create your own simpler protocol - there are several open source streaming servers you could look to for ideas or even adapt to your needs if that looks like a sensible route:
http://gstreamer.freedesktop.org
http://icecast.org
Video streaming is quite complex as you may already be aware, but if your use cases are simpler you may be able to ignore or remove much of the complexity - for example you may not need seek, multiple format and bit rate streams, accompanying streams (for subtitles etc). Being able to simplify like this might justify the effort to modify one of the above for your needs, if you are not able to use them out of the box.
One final point - video and audio streaming protocols usually have a built in way of dealing with delayed or lost packets. Depending on your application these may not be applicable to you so you should look carefully at this aspect if reusing a video or audio streaming protocol or server. For example, audio clients are typically tolerant of a small amount of packet loss, and will generally discard delayed packets rather than pause the audio (packets received outside the 'jitter buffer' window). If your application cannot tolerate any packet loss, then you will need to look carefully at the underlying solution and protocol to make sure it really meets your needs over all network conditions.
I'm working on a project in which I want to display biosensor EEG/ECG data measured by a portable device (e.g., a microcontroller with wireless data transmission via WiFi or Bluetooth). For this purpose, I need to interface with the portable device/microcontroller; many of these devices seem to offer RESTful interfaces, though some probably also expose raw sockets.
One example of a microcontroller with WiFi is the "spark.io", which is based on a Cortex-M3 and a CC3000 wireless controller for on-board WiFi access. The data to be transferred amount to around 500 to 1000 float values per second, which should arrive at the REST client with as little delay as possible. A non-REST approach like raw sockets would probably fit better, but I would still like to test an approach based on a RESTful interface (a small argument in its favour is that transferring data via a RESTful interface seems very common and has good library support).
Q: What is the best approach for a performant (in the sense of near-real-time) implementation that interfaces with such a device via a REST interface?
I am sure this problem has been solved before, but I could not quickly find a paper via Google Scholar or a technical/scientific blog post that explains it. The only link I found is on "REST hooks", but I am not sure if this is a good approach. Searching on SE didn't reveal a past question on this.
Side note: My approach would be to implement the interface in Haskell first to test the design and performance of the RESTful interface. Later, the working approach should be ported to or implemented with Java/Android/spark.io/some other microcontroller.
(Please note this question is entirely about the architecture and not at all about Haskell libraries or anything like that. If using REST is a poor idea, I will accept that as an answer, provided it is argued. The question is then also whether microcontroller web interfaces in general, and specifically their APIs like that of "spark.io", are a bad idea if they are implemented via REST. Is this the case? If not, what definition of "near real time" justifies that a REST interface is a bad idea and other means of communication are better? Like: one sensor read per minute? Or one per second, per 1/10 second, per 1/100 second, per 1/1000 second?)
Okay, let's go through this.
REST is not necessarily a bad idea but it has a lot of features which you may not need. For example, there are REST verbs not just for retrieval, but also updating, deleting, and creating resources. If those functions are important (e.g. you need to send certain control data to the EEG controller) then REST will be nice. If you just want fast access to the stream of data, consider raw TCP instead.
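If raw TCP wins out, the consuming side can stay tiny. A hedged sketch, assuming the device simply streams little-endian 8-byte doubles back-to-back (the host, port and framing are made up for illustration, not the spark.io protocol):

```python
import socket
import struct

with socket.create_connection(("device.local", 9000)) as sock:   # placeholder host/port
    buf = b""
    while chunk := sock.recv(4096):
        buf += chunk
        n = len(buf) // 8
        values = struct.unpack(f"<{n}d", buf[: n * 8])   # decode the complete doubles
        buf = buf[n * 8:]                                # keep any partial trailing bytes
        print(len(values), "samples received")           # hand off to plotting/processing here
```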
Similarly, REST will package messages into "requests" and their "responses" which come with a bunch of "headers" indicating things like whether the request could be fulfilled, whether it's compressed, etc. These can be great features but may be bloat. You'll probably want to emit enough data on each request so that the ~1kB of headers are a small fraction of it. But given 8-byte floats (doubles), that requires transmitting 500-1000 data points, which you've said will take about one second. Is that our fate -- to always have 1s of latency?
REST will allow you to avoid some of that bloat by declaring a Transfer-Encoding: chunked so that the client can operate on individual chunks as they become available. So that's an architectural decision that I think will need to be made.
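For the client side of that decision, here is a small sketch using the Python requests library as a stand-in (the URL and the process() handler are placeholders): with stream=True each chunk is handled as the server flushes it, rather than after the whole response arrives.

```python
import requests  # third-party HTTP client, used here only for illustration

def process(chunk: bytes) -> None:
    ...  # hypothetical handler: parse floats, update the plot, etc.

# stream=True keeps the body unread until iterated, so chunks are consumed as they arrive.
with requests.get("http://device.local/eeg/stream", stream=True, timeout=30) as r:
    for chunk in r.iter_content(chunk_size=None):   # yields data in whatever chunks arrive
        process(chunk)
```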
I would definitely get Keep-Alive working as soon as possible, and it would be my chief feature when looking for what library to use on the server. Keep-Alive is a standard extension to HTTP which avoids tearing down and rebuilding the TCP stack for each HTTP request. If you don't do this then you have some heavy protocol negotiations each time you send a request.
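For comparison, Keep-Alive is what a pooled session gives you more or less for free; another Python stand-in with a placeholder endpoint:

```python
import requests

# A Session pools connections, so repeated requests to the same host reuse one
# TCP connection (HTTP keep-alive) instead of reconnecting for every call.
session = requests.Session()
for i in range(100):
    session.get(f"http://device.local/eeg/latest?seq={i}")   # placeholder endpoint
```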
A crucial decision you'll have to make involves whether you want to do HTTP pipelining or not. You can combine HTTP pipelining with longer-lived requests (ones where you don't expect an immediate response) to essentially "send the data when it becomes available" (i.e. send the headers first and let the server push out the data when it's good and ready). This is an alternative to chunked transfers.
If you can work those out, then HTTP is regularly used to send megabytes per second, so your use case fits well within what REST is capable of. In terms of REST/HTTP libraries for Haskell, if you have to somehow program the controller yourself, the big options are wai, yesod, snap, and rest. If you just need an HTTP client there are a few of those too.
Assuming a standard Jetty servlet container, what is the effect (On the server, or the client) of sending a large set of binary (string) data over RPC?
Specifically, since it does not seem that GWT RPC has support for streaming, I am concerned that two things might happen:
Large memory consumption on the server side since the binary data is being loaded into memory of the RPC class.
Slow serialization or de-serialization.
Assuming any of these are true, what are my options? I am trying to build a uniform API so I'd rather not have to tell the developer: "Oh in this case, manually create a REST request to get the data".
If you need to transfer a really big amount of binary data, GWT-RPC is a bad choice (all the problems you've listed are real). But if you want a uniform API on the client side, without telling the developer to simply use raw HTTP to get the data, you'll have to provide a client-side implementation for your binary service.