What is middleware in distributed systems? - sockets

I am taking a class on distributed systems right now and I can't grasp the idea of middleware. I understand that it is a software layer that provides a level of abstraction between the application and the actual communication over the network, but I need concrete examples. I know CORBA and Java RMI are examples of middleware, but those don't really make sense to me.
When I write a client-server program in Java that communicates over DatagramSockets, is that middleware? If not, why not? The Java DatagramSocket class provides a level of abstraction between my application and the actual communication over the network.

I agree with commenters so far - it's not really a clear-cut term.
However, the thing most applicable to your question that I can think of is message queues such as 0MQ or RabbitMQ. They provide ways of interacting with the network that are somewhat more abstract than using TCP or UDP directly. Both of them encourage the use of "network patterns" which can be applied to most distributed systems problems.
0MQ is flexible about the order in which the client and server start, supports multiple transports (between threads / between processes / over TCP / over UDP / ...) for sending messages, and deals with network disconnects.
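To make that concrete, here is a minimal request/reply sketch in Python using pyzmq; the port number and payloads are arbitrary, and either side can be started first:

    import zmq

    def run_server():
        ctx = zmq.Context()
        sock = ctx.socket(zmq.REP)
        sock.bind("tcp://*:5555")          # could also be ipc:// or inproc://
        while True:
            request = sock.recv()          # blocks until a request arrives
            sock.send(b"ack: " + request)  # REP must reply before the next recv

    def run_client():
        ctx = zmq.Context()
        sock = ctx.socket(zmq.REQ)
        sock.connect("tcp://localhost:5555")
        sock.send(b"hello")
        print(sock.recv())                 # waits for the server's reply

Notice that the application code never deals with connection retries or framing; that is the kind of abstraction people usually mean by "middleware".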
RabbitMQ (which I know less about) provides centralized message queues which can be persisted to disk, and allows the clients on your network to only know where the relevant queues are (and not where all the producers/consumers for each queue are).
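For comparison, a minimal RabbitMQ producer sketch in Python using pika; the queue name and broker address are placeholders. The producer only needs to know the broker and the queue, not who will consume the message:

    import pika

    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    channel.queue_declare(queue="task_queue", durable=True)   # survives a broker restart

    channel.basic_publish(
        exchange="",
        routing_key="task_queue",
        body=b"hello",
        properties=pika.BasicProperties(delivery_mode=2),     # mark the message persistent
    )
    connection.close()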

Related

Does gRPC vs NATS or Kafka make any sense?

For a long time, when it comes to microservice architecture, NATS and Kafka are the first options that come to my mind. But recently I found this gRPC template in dotnet core and it caught my attention. I read a lot about it and watched a lot of videos, but I don't think any of them addressed gRPC correctly, as they usually contrast gRPC with message brokers or with approaches such as REST, which I think is inappropriate, although a comparison with SOAP would be relevant here.
My assumption is that gRPC is a modern version of SOAP, with better performance and less implementation hassle thanks to protocol buffers. I also think that gRPC can by no means be compared against Kafka or NATS, and that it cannot replace RESTful services any more than SOAP could.
Now, the question: to what extent are my assumptions true? For example, when it comes to selecting a communication bridge between nodes in a cluster, should I now put gRPC among my options (NATS, Kafka, Rabbit, etc.), or should I only consider it when creating a web proxy to bridge external requests to my microservices?
Finally, what about real-time communication: can gRPC replace websocket/socket.io/signalR completely? What does it replace?
I often see people misplace these technologies on one crucial aspect: public authentication.
For instance, consider the benchmark published by Inverted Json (https://github.com/lega911/ijson), which compares tools such as iJson, RabbitMQ, NATS, 0MQ, etc.
Notice that NATS, ZeroMQ and iJson are not meant to be used as public endpoints (for instance, NATS has user/password, token and key authentication, but that is useless in an open environment such as web browsers, because there is no way to keep the key private).
On the other hand, gRPC works just fine with JWT and OAuth2, making it as safe for public endpoints as any other HTTP endpoint, because those tokens are server-signed (so, even though they are exposed to the client, they can't be forged or tampered with).
So, what I'm trying to say is: there are technologies meant to face the public, and technologies meant to glue together servers and processes within servers (which are private connections).
gRPC is suited to public use; ZeroMQ and iJson are totally private (iJson, for instance, doesn't have any kind of authentication). NATS works with keys or passwords, so, although it is "safer" than iJson and ZeroMQ, it is not meant to be public.
When you say REST (I'm assuming HTTP here, because REST is just an architecture) or websocket/socket.io/signalR, you are describing public interfaces. gRPC will cover you here: it is comparable to REST for request/response, and to websocket/socket.io/signalR because it supports half- and full-duplex streaming (similar to sockets).
NATS, iJson and ZeroMQ, on the other hand, are not meant for that; they are meant for communication between services.
So, basically: REST/websocket/socket.io/signalR = gRPC.
Internal communication between services (on the same or on different servers) = NATS, iJson, ZeroMQ.
(Notice that I'm not even considering the other technologies in the graph, because they are products, IMO, not simple libraries you can use to achieve an end, such as RabbitMQ, nginx, etc. The others I'm not familiar enough with to form an opinion, though I'm surprised by uvloop's showing in that graph.)
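As a rough illustration of the "public endpoint with server-signed tokens" point, here is how a Python gRPC client could attach a JWT as per-call credentials; the target address and token are placeholders and the service stub is hypothetical:

    import grpc

    jwt_token = "<server-signed JWT from your auth service>"      # placeholder

    call_creds = grpc.access_token_call_credentials(jwt_token)    # sent as an authorization header
    channel_creds = grpc.composite_channel_credentials(
        grpc.ssl_channel_credentials(),   # TLS for the transport
        call_creds,                       # token attached to every call
    )
    channel = grpc.secure_channel("api.example.com:443", channel_creds)
    # stub = MyServiceStub(channel)       # hypothetical stub generated from your .proto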
Your intuition is correct that gRPC is not comparable to an asynchronous queueing system like Kafka, Rabbit, etc.
It is, however, a replacement for synchronous server-to-server communication technologies often implemented over SOAP, RPC, REST, etc., where you expect a response from the other server rather than firing a message into a queue and then effectively forgetting about it.
gRPC is definitely an option for real-time communication. It can replace socket communication if you are not streaming to the browser (which has no native gRPC support); have a look at its bidirectional streaming support.
As for replacing Kafka/Rabbit: gRPC can be used as a pub/sub system, since it supports bidirectional streaming, but I would not recommend it.

IBM IoT Foundation: When to use MQTT and when to use REST for event submission?

The IBM IoT Foundation allows devices to submit events to the IBM cloud for consumption and recording. There appear to be two primary mechanisms to achieve the transmission of events ... MQTT and REST (HTTP POST requests). Assuming that a project will have sensors with direct TCP connectivity to the IBM cloud over the Internet, what might we consider as the potential distinctions between the two technologies? What factors would cause us to choose MQTT or REST as the technology to use? Are there any substantial performance differences over the final mile at the IBM end that would suggest one technology is preferred over the other?
MQTT is designed to be a fast and lightweight messaging protocol and is, as a result, faster and more efficient than HTTP when used for the equivalent job. More efficient not only means less traffic data and more speed; sometimes it can mean less electrical power as well. MQTT is particularly good where bandwidth is a concern.
MQTT does, however, need a client implementation (like Paho) which is possibly a rarer thing than an HTTP client implementation, which would be more ubiquitous and therefore more likely/easily available on any given device.
There are also TCP/IP port considerations, where some network hardware may require HTTP ports 80 or 443 (although IoTF supports MQTT and MQTTWS on port 443).
There may also be an ideological or philosophical reason for choosing HTTP instead of MQTT (or CoAP for that matter), but usually, I would say the reasons for choosing HTTP instead of MQTT would be network related or client-support related.
There is no official paper on the performance differences yet, but it is safe to say MQTT will be more efficient and faster in just about any messaging scenario (long-lived connections or ad hoc, etc.).
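For a sense of what the MQTT path looks like from a device, here is a minimal Python sketch using paho-mqtt (the broker address, topic and payload are placeholders; the constructor shown is the paho-mqtt 1.x form, and 2.x additionally requires a callback API version argument):

    import json
    import paho.mqtt.client as mqtt

    client = mqtt.Client()                          # paho-mqtt 1.x style constructor
    client.connect("broker.example.com", 1883, keepalive=60)
    client.loop_start()                             # network loop in a background thread

    payload = json.dumps({"temperature": 21.5})
    client.publish("iot/device42/events", payload, qos=1)   # QoS 1 = at-least-once

    client.loop_stop()
    client.disconnect()

The long-lived connection is reused for every event, which is where most of the bandwidth and power savings come from.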
I would summarize the considerations as:
MQTT will support higher throughput, and its API is much simpler than a REST API.
A REST API is likely much more readily available on IoT devices, but this could be changing as MQTT gains popularity and big players like Google Cloud Platform and IBM Bluemix support MQTT in their IoT services.
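For comparison, the equivalent event submission over REST is a plain HTTP POST per event; a Python sketch using requests, with a placeholder endpoint and token:

    import requests

    event = {"temperature": 21.5}
    response = requests.post(
        "https://iot.example.com/api/v1/device42/events",   # placeholder endpoint
        json=event,
        headers={"Authorization": "Bearer <token>"},         # placeholder credential
        timeout=5,
    )
    response.raise_for_status()   # each event is its own request/response round trip

The HTTP client is ubiquitous, but every event pays the cost of a full request/response exchange.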

Microservices, AMQP or REST [duplicate]

A little background.
Very big monolithic Django application. All components use the same database. We need to separate services so we can independently upgrade some parts of the system without affecting the rest.
We use RabbitMQ as a broker to Celery.
Right now we have two options:
HTTP Services using a REST interface.
JSON-RPC over AMQP to an event-loop service.
My team is leaning towards HTTP because that's what they are familiar with, but I think the advantages of using RPC over AMQP far outweigh that familiarity.
AMQP gives us the ability to easily add load balancing and high availability, with guaranteed message delivery.
With HTTP, by contrast, we have to create client HTTP wrappers to work with the REST interfaces, and we have to put in a load balancer and set up that infrastructure in order to have HA, etc.
With AMQP I can just spawn another instance of the service; it connects to the same queue as the other instances and, bam, HA and load balancing.
Am I missing something with my thoughts on AMQP?
First of all:
REST and RPC are architectural patterns; AMQP is a wire-level protocol and HTTP is an application protocol, both running on top of TCP/IP.
AMQP is a special-purpose protocol while HTTP is a general-purpose one; as a result, HTTP has damn high overhead compared to AMQP.
AMQP is asynchronous by nature, whereas HTTP is synchronous by nature.
Both REST and RPC use data serialization; the format is up to you and depends on your infrastructure. If you are using Python everywhere, I think you can use Python's native serialization, pickle, which should be faster than JSON or other formats.
Both HTTP+REST and AMQP+RPC can run in heterogeneous and/or distributed environments.
So if you are choosing between HTTP+REST and AMQP+RPC, the answer is really a matter of infrastructure complexity and resource usage. Without any specific requirements both solutions will work fine, but I would rather build some abstraction so I could switch between them transparently (a sketch follows below).
You said that your team is familiar with HTTP but not with AMQP. If development time is an important factor, you have your answer.
If you want to build an HA infrastructure with minimal complexity, I guess the AMQP protocol is what you want.
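Here is a minimal sketch of that switchable-transport abstraction in Python; every class and method name is hypothetical, and the actual HTTP/AMQP plumbing is deliberately left out:

    from abc import ABC, abstractmethod

    class Transport(ABC):
        # Hypothetical seam so callers don't care whether a request travels
        # over HTTP+REST or AMQP+RPC.

        @abstractmethod
        def call(self, service: str, method: str, payload: dict) -> dict:
            """Send a request to service.method and return the decoded reply."""

    class HttpTransport(Transport):
        def call(self, service, method, payload):
            # e.g. POST http://<service>/<method> with a JSON body (not implemented here)
            raise NotImplementedError

    class AmqpRpcTransport(Transport):
        def call(self, service, method, payload):
            # e.g. publish to an RPC queue and wait on a reply-to queue (not implemented here)
            raise NotImplementedError

    def make_transport(kind: str) -> Transport:
        return {"http": HttpTransport, "amqp": AmqpRpcTransport}[kind]()

The rest of the codebase only ever sees Transport.call(), so swapping HTTP for AMQP later is a configuration change rather than a rewrite.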
I have experience with both of them, and the advantages of RESTful services are:
they map well onto a web interface
people are familiar with them
easy to debug (due to the general-purpose nature of HTTP)
easy to provide an API to third-party services.
Advantages of AMQP-based solution:
damn fast
flexible
cost-effective (in terms of resource usage)
Note that you can provide a RESTful API to third-party services on top of your AMQP-based API (since REST is not a protocol but rather a paradigm), but you should keep that in mind while building your AMQP RPC API. I have done it this way to provide an API to external third-party services and to give access to the API from the parts of the infrastructure that run on an old codebase or where it is not possible to add AMQP support.
If I understand correctly, your question is about how to better organize communication between different parts of your software, not how to provide an API to end users.
If you have a high-load project, RabbitMQ is a damn good piece of software and you can easily add any number of workers running on different machines. It also has mirroring and clustering out of the box. One more thing: RabbitMQ is built on top of Erlang/OTP, which is a highly reliable, stable platform ... (bla-bla-bla); that is good not only for marketing but for engineers too. I had an issue with RabbitMQ only once, when nginx logs took up all the disk space on the same partition where RabbitMQ was running.
UPD (May 2018):
Saurabh Bhoomkar posted a link to the MQ vs. HTTP article written by Arnold Shoon on June 7th, 2012; here's a copy of it:
I was going through my old files and came across my notes on MQ and thought I’d share some reasons to use MQ vs. HTTP:
If your consumer processes at a fixed rate (i.e. can’t handle floods to the HTTP server [bursts]) then using MQ provides the flexibility for the service to buffer the other requests vs. bogging it down.
Time independent processing and messaging exchange patterns — if the thread is performing a fire-and-forget, then MQ is better suited for that pattern vs. HTTP.
Long-lived processes are better suited for MQ, as you can send a request and have a separate thread listening for responses (note that WS-Addressing allows HTTP to process in this manner but requires both endpoints to support that capability).
Loose coupling where one process can continue to do work even if the other process is not available vs. HTTP having to retry.
Request prioritization where more important messages can jump to the front of the queue.
XA transactions – MQ is fully XA compliant – HTTP is not.
Fault tolerance – MQ messages survive server or network failures – HTTP does not.
MQ provides for 'assured' delivery of messages once and only once; HTTP does not.
MQ provides the ability to do message segmentation and message grouping for large messages; HTTP does not have that ability, as it treats each transaction separately.
MQ provides a pub/sub interface where-as HTTP is point-to-point.
UPD (Dec 2018):
As noted by Kevin in the comments below, it's questionable whether RabbitMQ scales better than RESTful services. My original answer was based on simply adding more workers, which is just one part of scaling; as long as a single AMQP broker's capacity is not exceeded, that holds true, but beyond that you need more advanced techniques like Highly Available (Mirrored) Queues, which means both HTTP- and AMQP-based services require some non-trivial complexity to scale at the infrastructure level.
After careful thought I also removed the claim that maintaining an AMQP broker (RabbitMQ) is simpler than maintaining any HTTP server: the original answer was written in June 2013 and a lot has changed since then, but the main change is that I have gained more insight into both approaches, so the best I can say now is "your mileage may vary".
Also note that comparing HTTP and AMQP is apples to oranges to some extent, so please do not interpret this answer as the ultimate guidance to base your decision on; rather, take it as one source or reference for your further research to find out what exact solution will match your particular case.
The irony of the solution the OP had to accept is that AMQP and other MQ solutions are often used to insulate callers from the inherent unreliability of HTTP-only services -- to provide some level of timeout and retry logic and message persistence so the caller doesn't have to implement its own HTTP insulation code. A very thin HTTP gateway or adapter layer over a reliable AMQP core, with the option to go straight to AMQP using a more reliable client protocol like JSON-RPC, would often be the best solution for this scenario.
Your thoughts on AMQP are spot on!
Furthermore, since you are transitioning from a monolithic to a more distributed architecture, adopting AMQP for communication between the services is a better fit for your use case. Here is why…
Communication via a REST interface, and by extension HTTP, is synchronous in nature. This synchronous nature of HTTP makes it a not-so-great communication pattern for a distributed architecture like the one you describe. Why?
Imagine you have two services in your Django application, service A and service B, that communicate via REST API calls. These calls usually play out this way: service A makes an HTTP request to service B, waits idly for the response, and only proceeds to the next task after getting a response from service B. In essence, service A is blocked until it receives a response from service B.
This is problematic because one of the goals of microservices is to build small autonomous services that remain available even if one or more other services are down: no single point of failure. The fact that service A connects directly to service B and waits for a response introduces a level of coupling that detracts from the intended autonomy of each service.
AMQP, on the other hand, is asynchronous in nature, and this asynchronous nature makes it a great fit for your scenario and others like it.
If you go down the AMQP route, instead of service A making requests to service B directly, you can introduce an AMQP-based message queue between the two services. Service A adds requests to the queue; service B then picks them up and processes them at its own pace (see the sketch below).
This approach decouples the two services and, by extension, makes them autonomous. This is true because:
If service B fails unexpectedly, service A will keep accepting requests and adding them to the queue as though nothing happened. The requests will wait in the queue for service B to process when it's back online.
If service A experiences a spike in traffic, service B won't even notice, because it only picks up requests from the message queue at its own pace.
This approach also has the added benefit of being easy to scale: you can add more queues or create more copies of service B to process more requests.
Lastly, service A does not have to wait for a response from service B, and end users don't have to wait long either; this leads to improved performance and, by extension, a better user experience.
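A minimal Python sketch of that service A / service B arrangement using pika; the broker host, queue name and business logic are placeholders:

    import json
    import pika

    QUEUE = "service_b_requests"                      # placeholder queue name

    def service_a_publish(request: dict) -> None:
        # Service A drops the request on the queue and moves on (fire and forget).
        conn = pika.BlockingConnection(pika.ConnectionParameters("rabbitmq-host"))
        channel = conn.channel()
        channel.queue_declare(queue=QUEUE, durable=True)
        channel.basic_publish(
            exchange="",
            routing_key=QUEUE,
            body=json.dumps(request),
            properties=pika.BasicProperties(delivery_mode=2),   # persist to disk
        )
        conn.close()

    def service_b_consume() -> None:
        # Service B works through the queue at its own pace.
        conn = pika.BlockingConnection(pika.ConnectionParameters("rabbitmq-host"))
        channel = conn.channel()
        channel.queue_declare(queue=QUEUE, durable=True)
        channel.basic_qos(prefetch_count=1)           # at most one unacked message at a time

        def handle(ch, method, properties, body):
            process(json.loads(body))                 # hypothetical business logic
            ch.basic_ack(delivery_tag=method.delivery_tag)   # ack only after success

        channel.basic_consume(queue=QUEUE, on_message_callback=handle)
        channel.start_consuming()

If service B is down, messages simply accumulate in the durable queue; if service A is flooded, service B still drains the queue at its own rate.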
Just in case you are considering moving from HTTP to AMQP in your distributed architecture and are not sure how to go about it, you can check out this 7-part beginner guide on message queues and microservices. It shows you how to use a message queue in a distributed architecture by walking you through a demo project.

Differences between AMQP and ZeroMQ

I recently started looking into AMQP (RabbitMQ, ActiveMQ) and ZeroMQ, being interested in distributed systems/computation. I've been Googling and StackOverflow'ing around but couldn't find a definitive comparison between the two.
The farthest I got is that the two aren't really comparable, but I want to know the differences. It seems to me that ZeroMQ is more decentralized (no message broker playing middle-man, handling messages and guaranteeing delivery) and as such is faster, but is not meant to be a fully fledged system, rather something to be handled more programmatically, something like Actors.
AMQP on the other hand seems to be a more fully fledged system, with a central message broker ensuring reliable delivery, but slower than ZeroMQ because of this. However, the central broker creates a single point of failure.
Perhaps a metaphor would be client/server vs. P2P?
Are my findings true? Also, what would be the advantages, disadvantages, or use cases of using one over the other? A comparison of the uses of *MQ vs. something like Akka Actors would be nice as well.
EDIT: Did a bit more looking around. ZeroMQ seems to be the new contender to AMQP and seems to be much faster; the only issue would be adoption/implementations?
Here's a fairly detailed comparison of AMQP and 0MQ: http://www.zeromq.org/docs:welcome-from-amqp
Note that 0MQ is also a protocol (ZMTP) with several implementations, and a community.
AMQP is a protocol. ZeroMQ is a messaging library.
AMQP offers flow control and reliable delivery. It defines standard but extensible meta-data for messages (e.g. reply-to, time-to-live, plus any application defined headers). ZeroMQ simply provides message delimitation (i.e. breaking a byte stream up into atomic units), and assumes the properties of the underlying protocol (e.g. TCP) are sufficient or that the application will build extra functionality for flow control, reliability or whatever on top of ZeroMQ.
Although earlier versions of AMQP were defined along client/server lines and therefore required a broker, that is no longer true of AMQP 1.0, which at its core is a symmetric, peer-to-peer protocol. Rules for intermediaries (such as brokers) are layered on top of that. The link from Alexis comparing brokered and brokerless gives a good description of the benefits such intermediaries can offer. AMQP defines the rules for interoperability between different components - clients, 'smart clients', brokers, bridges, routers etc. - such that a system can be composed by selecting the parts that are useful.
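To illustrate the kind of standard message metadata AMQP defines, here is a small Python sketch using pika; queue names and values are placeholders:

    import pika

    props = pika.BasicProperties(
        reply_to="responses.service_a",      # where the consumer should send its answer
        expiration="60000",                  # per-message time-to-live, in milliseconds
        content_type="application/json",
        headers={"trace_id": "abc123"},      # application-defined headers
    )

    conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = conn.channel()
    channel.queue_declare(queue="requests.service_b")
    channel.basic_publish(exchange="", routing_key="requests.service_b",
                          body=b"{}", properties=props)
    conn.close()

With ZeroMQ, by contrast, any such metadata would have to be invented and encoded by the application itself.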
In ZeroMQ there are NO MESSAGE QUEUES at all, thus the name. It merely provides a way to use messaging semantics over otherwise ordinary sockets.
AMQP is a standard protocol for message queueing that is meant to be used with a message broker handling all message sends and receives. It has a lot of features that are available precisely because it funnels all message traffic through a broker. This may sound slow, but it is actually quite fast when used inside a data centre, where host-to-host latencies are tiny.
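A brokerless publish/subscribe sketch in Python using pyzmq shows what "messaging semantics over ordinary sockets" means in practice; the port and topic are arbitrary:

    import zmq

    def publisher():
        ctx = zmq.Context()
        pub = ctx.socket(zmq.PUB)
        pub.bind("tcp://*:5556")
        # In practice, give subscribers a moment to connect before publishing,
        # or early messages will be dropped (the "slow joiner" effect).
        pub.send_multipart([b"weather", b"sunny"])   # topic frame + payload frame

    def subscriber():
        ctx = zmq.Context()
        sub = ctx.socket(zmq.SUB)
        sub.connect("tcp://localhost:5556")
        sub.setsockopt(zmq.SUBSCRIBE, b"weather")    # prefix-based topic filter
        topic, payload = sub.recv_multipart()
        print(topic, payload)

The two peers talk directly to each other; there is no intermediary process, and therefore no broker to operate or to fail.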
I'm not really sure how to respond to your question, which is comparing a lot of different things... but see this which may help you begin to dig into these issues: http://www.rabbitmq.com/blog/2010/09/22/broker-vs-brokerless/
AMQP (Advanced Message Queuing Protocol) is a standard binary wire level protocol that enables conforming client applications to communicate with conforming messaging middleware brokers. AMQP allows cross platform services/systems between different enterprises or within the enterprise to easily exchange messages between each other regardless of the message broker vendor and platform. There are many brokers that have implemented the AMQP protocol like RabbitMQ, Apache QPid, Apache Apollo etc.
ZeroMQ is a high-performance asynchronous messaging library aimed at use in scalable distributed or concurrent applications. It provides a message queue, but unlike message-oriented middleware, a ØMQ system can run without a dedicated message broker.
"Broker-less" is something of a misnomer when ZeroMQ is compared to message brokers like ActiveMQ, Qpid or Kafka; it is really just simple wiring.
It is useful and can be applied at hotspots to reduce network hops and hence latency. As you add reliability, store-and-forward, and high-availability requirements, you will probably need a distributed broker service along with a queue for sharing data, to support loose coupling (decoupled in time); that topology and architecture can also be implemented using ZeroMQ. You have to consider your use cases and see whether asynchronous messaging is required and, if so, where ZeroMQ would fit; it appears to have a good role in the solution. A reasonable knowledge of TCP/IP and socket programming will help you appreciate all of these technologies, ZeroMQ, AMQP and the rest.

P2P Based Solutions prefers SIP or XMPP (Jingle) for Signaling

It's just a start: I am exploring the P2P side further, and I'm looking for reasons, in terms of scalability or anything else, to choose SIP or XMPP (Jingle) for the following use case:
A P2P client application capable of performing file transfer in all network-traversal scenarios.
For signaling (e.g. to connect/locate/disconnect peers), both XMPP (Jingle) and SIP are available.
May I know the possible reasons to use one or the other, and why? Any practical considerations, e.g. scalability or anything else that really makes a difference for the above use case?
Jingle is an XMPP extension to handle multimedia sessions. In effect Jingle is the XMPP equivalent of SIP.
As far as a P2P file application goes:
Jingle and SIP are roughly equivalent as far as scalability goes. Both separate the signalling and the media, providing more flexibility (and consequently more complications) in the way server-side components can be deployed.
XMPP/Jingle has a better security design, making it much more practicable to enforce clients using an SSL signalling layer. SIP does support SSL, but it's more convoluted and doesn't enjoy widespread support in the real world.
As far as NAT goes, you're going to have the same problems with both. The scalability you get from having separate signalling and media paths comes back to bite you when NAT is involved. There are a few different mechanisms to deal with NAT; the latest attempt is ICE. ICE is a collection of different mechanisms that try to resolve different NAT configurations, and it's worth bearing in mind that not all configurations can be resolved; the fallback is to use a media-proxying server such as TURN.
If I were you I'd use XMPP, but before starting I'd work out exactly which NAT configurations need to be supported. If you need to support arbitrary clients from anywhere on the internet, then you will not be able to rely on always being able to establish direct P2P communication between your clients, and that's where you will face your biggest challenge.