Does Confluent REST Proxy API (producer) retry event publish on failure?

We are planning to use Confluent REST Proxy as the platform layer for listening to all user events (and publishing them to Kafka). Working in a microservices model with varied types of event generators, we want our APIs/event generators to be decoupled from event listening/handling. Also, at the event-listening layer, resilience is important for us.
From what I understand, if the REST Proxy layer fails to publish to Kafka (once) for any reason, it doesn't retry. This functionality needs to be handled by the caller (the client layer), which needs to make synchronous calls and retry on failure. However, I couldn't find any details on this in the product documentation. Could someone please confirm this?
Confluent REST Proxy developers claim that with the right REST Proxy cluster set-up and the right request batching by the client, performance as good as a native producer's can be achieved. Any exceptions or thoughts (positive or negative) here?
Calls to the REST Proxy Producer API are blocking. If the client doesn't need to know the partition and offset details, can we configure these calls to be non-blocking in any way, such that once the request is received, resilience is managed by the REST Proxy layer itself? The client would just receive a 200 HTTP status as acknowledgement whenever a produce request is received.

The REST Proxy is just a normal Kafka Producer and Kafka Consumer and can be configured with or without retries enabled, just as any other Kafka Producer can.
A single producer publishing via a REST Proxy will not achieve the same throughput as a single native Java producer. However, you can scale out many REST Proxy instances and many HTTP producers to get high performance in aggregate. You can also mitigate the performance penalty imposed by HTTP by batching multiple messages together into a consolidated HTTP request, minimizing the number of HTTP round trips on the wire.
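To make the retry point concrete: retries, if you want them, happen inside the proxy's embedded producer. My understanding is that producer-level settings can be passed through in kafka-rest.properties with a producer. prefix (e.g. producer.retries, producer.acks), but verify that against the docs for your version. Batching, on the other hand, is entirely in the client's hands. Here is a minimal sketch of a batched produce call against the v2 REST API in Python, assuming a proxy on localhost:8082 and a made-up topic name:

    # Batch many records into a single HTTP round trip via the REST Proxy v2 API.
    # The host, port, topic name, and payload shape are assumptions.
    import requests

    records = [{"value": {"event_id": i}} for i in range(100)]

    resp = requests.post(
        "http://localhost:8082/topics/user-events",
        headers={"Content-Type": "application/vnd.kafka.json.v2+json"},
        json={"records": records},
        timeout=10,
    )
    resp.raise_for_status()
    # The response lists a partition/offset (or a per-record error) for each
    # record, so a client that cares about resilience can retry selectively.
    print(resp.json()["offsets"][:3])

This also bears on the blocking question above: the v2 produce endpoint responds with these per-record results, so it has to wait for the producer to finish rather than returning a bare 200 immediately.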

Related

Sending HTTP Requests with Kafka Streams

I am aware that it's not recommended to send HTTP requests from Kafka Streams, as the blocking nature of external RPC calls may impact performance.
However, what if the use case doesn't allow me to avoid sending HTTP requests?
I'm building an application that consumes from an input topic, and for each message it goes through various iterations of filtering, mapping, and joining with a KTable. At the end of these iterations the result is ready to be "delivered".
The "delivery" method, however, is via HTTP request: I have to call external REST APIs to send these results to various vendors. I also need to wait for the response to come back, and based on the result I mark the delivery as either successful or failed and produce the result to an output topic, which is consumed by another service for archiving purposes.
I'm aware that HTTP calls will block the calling stream thread, so I configured a timeout that is strictly (and substantially) less than the consumer's max.poll.interval.ms to avoid rebalances in case the external API service is down. Timed-out requests are also sent to a low-priority queue for delivery re-attempts later.
As you can see, I cannot really avoid making external RPC calls within Kafka Streams. I'm just curious whether there is a better architecture for such a use case.
If you cannot avoid it, then another option is to send the data to an outbound "request" topic and write a consumer that performs the requests and produces to a "response" topic with HTTP status codes or success/fail indicators, for example.
Then have Streams also consume this response topic for the join.
The main reason not to do blocking RPC within Streams is that Streams is very sensitive to time, and excessively increasing timeouts should generally be avoided when possible.
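A rough sketch of that consumer, using the confluent_kafka and requests packages; the topic names, vendor URL, and payload shape are all placeholders:

    # Consume "request" messages, perform the blocking HTTP call outside the
    # Streams topology, and produce the outcome to a "response" topic.
    import json
    import requests
    from confluent_kafka import Consumer, Producer

    consumer = Consumer({
        "bootstrap.servers": "localhost:9092",
        "group.id": "delivery-workers",
        "auto.offset.reset": "earliest",
    })
    consumer.subscribe(["delivery-requests"])
    producer = Producer({"bootstrap.servers": "localhost:9092"})

    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        payload = json.loads(msg.value())
        try:
            resp = requests.post("https://vendor.example/api/deliver",
                                 json=payload, timeout=5)
            outcome = {"id": payload["id"], "http_status": resp.status_code}
        except requests.RequestException:
            outcome = {"id": payload["id"], "http_status": None, "failed": True}
        producer.produce("delivery-responses", json.dumps(outcome).encode())
        producer.flush()

Because this worker is a plain consumer, a slow or down vendor API only delays this consumer group; the Streams application's poll loop is never blocked.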

What happens to subscribers when the Kafka service is down? Do they need to re-subscribe to a specific topic when it restarts?

Currently I have to send events externally to a client, which needs to subscribe to these events. I have an endpoint that the client calls (subscribe) that follows the Server-Sent Events specification. This opens an HTTP connection that is kept alive by the server, which sends "heartbeat" events.
The problem is that when this service needs to be redeployed, or goes down, it is the client's responsibility to re-subscribe by calling this endpoint again in order to keep receiving events in real time.
I was wondering: if I switch to a technology like RabbitMQ or Kafka, can I solve this problem? In other words, I would like to remove the client's responsibility to re-subscribe when something goes wrong on the server side.
If you could attach articles/resources to your answer, that would be great.
With RabbitMQ, the reconnection feature depends on the client library. For example, the Java and .NET clients do provide this (check here).
With Kafka, there are client configurations to support this. It's also worth reading the excellent recommendations from Kafka for surviving broker outages here.
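To expand on the Kafka side: a consumer that belongs to a consumer group commits its offsets to the broker, and the client library reconnects by itself, so after a broker restart the consumer simply resumes from the last committed offset; the application does not re-subscribe. A minimal sketch with the confluent_kafka client (broker address, topic name, and backoff values are assumptions):

    # The client library handles reconnection; the loop just keeps polling.
    from confluent_kafka import Consumer

    consumer = Consumer({
        "bootstrap.servers": "localhost:9092",
        "group.id": "event-listeners",     # committed offsets are stored per group
        "enable.auto.commit": True,
        "reconnect.backoff.ms": 100,       # retry schedule while brokers are down
        "reconnect.backoff.max.ms": 10000,
    })
    consumer.subscribe(["user-events"])

    while True:
        msg = consumer.poll(1.0)           # returns None while disconnected
        if msg is None or msg.error():
            continue
        print(msg.value())

Note this moves the re-subscribe problem rather than eliminating it: the client process still has to be running and polling, but no application-level re-subscription call is needed after an outage.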

Event-based microservices: MQTT with a broker OR HTTP with an API gateway?

I'm trying to develop a project with microservices.
I have some questions on this topic (some things are not clear to me):
1) How do I implement communication between microservices?
A) HTTP: every microservice exposes an HTTP API, and an API gateway forwards requests.
B) MQTT: every microservice publishes/subscribes to a broker.
C) Both: but how do I know when one is better than the other?
Do I have to use a pub/sub protocol as a standard even for classic operations usually performed over HTTP? For example, I have two microservices:
web-management and product-service. web-management is a panel that lets the administrator add, modify, ... products in their e-commerce shop. Let's say we want to implement a createProduct operation. It's a command (according to the event/command distinction), a one-to-one communication.
I can expose an API in product-service, say (POST, "/product"), that adds the new product. I can also implement this by transforming the command into a productCreationRequest event. In this case web-management publishes the event; product-service listens for productCreationRequest events (and also productUpdateRequest, productGetEvents, ...); once notified, it performs the operation and emits a productCreated event.
I find this case borderline. For example, a last-occasion-service may listen for productCreated events and immediately send a message (email or push notification) to customers. What do you think about this use case?
2) What would be a valid broker (I will use docker-compose or Kubernetes to orchestrate containerized microservices; languages adopted: probably Java, JavaScript, Python)?
Both is definitely a possibility! Choose a broker that allows you to easily mix and match between synchronous HTTP communication and more asynchronous, event-driven pub/sub. It should allow you to migrate your microservices between the two options as required.
HTTP APIs are great at the edge of your distributed application, where a customer wants to submit an order or something, and block waiting for a response (200 OK).
But internally within your application, between microservices, a lot of interactions don't need a response... they can be async and eventually consistent. Using pub/sub (like MQTT) also allows for multiple downstream consumers easily. Another great use for MQTT is streaming updates to downstream consumers... like a data feed from a bus or airline company, rather than having to poll a REST API for updates.
For your use-case and similar ones, I would almost always recommend using pub/sub communication, even if today it's a simple request-reply interaction with a single backend process. REST over HTTP is point-to-point, and perhaps in the future you want another process to be able to see/consume/monitor that event or interaction. If you're already using publish-subscribe, adding that 2nd (or more) consumer of that data flow is trivial. Harder with REST/HTTP.
In terms of performance, I would highly doubt a blocking protocol like HTTP is going to outperform something asynchronous and bidirectional, like MQTT, which can run over WebSockets for web communication.
As for a broker to glue all this together, check out the standard edition Solace PubSub+ event broker... it can do both (and translate between) MQTT and HTTP. I even wrote a CodeLab for this (almost) exact use case, haha!
(BTW, I work for Solace! FYI.)
Consider using the SMF framework for JavaScript/Node.js; it helps prototype pub/sub communication between microservices via a message broker (RabbitMQ) out of the box:
https://medium.com/@krawa76/bootstrap-node-js-microservice-stack-4a348db38e51
As for the message broker routes, use an event-driven naming convention, e.g. post a "web.new-product" event, where "web" is the sub-system name and "new-product" the event name.
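A small sketch of that convention with the pika client: publish a "web.new-product" event to a topic exchange. The exchange name, host, and payload are made up:

    import json
    import pika

    conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = conn.channel()
    channel.exchange_declare(exchange="events", exchange_type="topic")

    channel.basic_publish(
        exchange="events",
        routing_key="web.new-product",   # <sub-system>.<event-name>
        body=json.dumps({"name": "T-shirt", "price": 19.99}),
    )
    conn.close()

    # Any consumer whose queue is bound with "web.*" (or "#") receives the
    # event; adding another consumer later needs no change to the publisher.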

How to implement communication between consumer and producer with fast and slow workers?

I'm looking for a pattern and existing implementations for my situation:
I have a synchronous SOA that uses REST APIs; in fact, I implement remote procedure calls over REST. I also have some slow workers that process requests for a substantial time (around 30 seconds), and due to license constraints I can process some requests only sequentially (see system setup).
What are the recommended ways to implement communication for such a case?
How can I mix synchronous and asynchronous communication in a situation where the consumer is behind a firewall, I cannot easily send it notifications about completed tasks, and I might not be able to let the consumer use my message broker, if I have one?
Workers are implemented in Python using Flask and gunicorn. At the moment I'm using synchronous REST interfaces and allow for the delay, as I previously only had fast workers. I looked at Kafka and RabbitMQ, and they would fit for backend-side communication; however, how does the producer communicate with the consumer?
If the consumer fires an API request, my producer can return code 202, but then how should the producer notify the consumer that the result is available? Will the consumer have to poll the producer for results? (A sketch of this polling pattern follows below.)
Also, if I use a message broker and my gateway acts on behalf of the consumer, it should have a registry of requests (I already have a GUID for every request) and results. Which approach would you recommend for implementing that?
Producer: the agent that produces the message.
Consumer: the agent that handles the message and implements the logic for processing it.
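Since the workers already use Flask and every request already has a GUID, here is a minimal sketch of the 202-plus-polling flow mentioned above; the in-memory registry and the endpoint names are illustrative only:

    import uuid
    from flask import Flask, jsonify

    app = Flask(__name__)
    registry = {}  # request GUID -> {"status": ..., "result": ...}

    @app.route("/tasks", methods=["POST"])
    def submit_task():
        task_id = str(uuid.uuid4())
        registry[task_id] = {"status": "pending", "result": None}
        # hand the work off to the slow worker here (thread, queue, broker, ...)
        return jsonify({"id": task_id}), 202, {"Location": f"/tasks/{task_id}"}

    @app.route("/tasks/<task_id>", methods=["GET"])
    def poll_task(task_id):
        task = registry.get(task_id)
        if task is None:
            return jsonify({"error": "unknown task"}), 404
        return jsonify(task)  # the consumer polls until status == "done"

Polling works even when the consumer is behind a firewall, because every connection is initiated from the consumer's side; a real registry would live in a shared store rather than process memory once gunicorn runs multiple workers.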

Communication between microservices - request data

I am dealing with communication between microservices.
For example (a fictitious example, just for illustration):
Microservice A - Store Users (getUser, etc.)
Microservice B - Store Orders (createOrder, etc.)
Now, if I want to add a new order from the client app, I need to know the user's address. So the request flow would be like this:
Client -> Microservice B (createOrder for userId 5) -> Microservice A (getUser with id 5)
Microservice B will create the order with details (the address) from the user microservice.
PROBLEM TO SOLVE: How do we deal effectively with communication between microservice A and microservice B, given that we have to wait until the response comes back?
OPTIONS:
Use a REST API,
Use AMQP, like RabbitMQ, and handle this via RPC (https://www.rabbitmq.com/tutorials/tutorial-six-dotnet.html).
I don't know which will be better for performance. Is a call faster via RabbitMQ or a REST API? What is the best solution for a microservice architecture?
In your case using direct REST calls should be fine.
Option 1, use a REST API: when you need synchronous communication. For example, your case; this option is suitable.
Option 2, use AMQP: when you need asynchronous communication. For example, when your order service creates an order, you may want to notify the product service to reduce the product quantity, or notify the user service that the order for the user was successfully placed.
I highly recommend having a look at http://microservices.io/patterns/index.html
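To illustrate option 2, here is a hedged sketch with the pika client: the order service publishes an OrderCreated event to a fanout exchange, and each interested service binds its own queue to it. All names are illustrative:

    import json
    import pika

    conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = conn.channel()
    channel.exchange_declare(exchange="order-events", exchange_type="fanout")

    channel.basic_publish(
        exchange="order-events",
        routing_key="",  # ignored by fanout exchanges
        body=json.dumps({"event": "OrderCreated", "orderId": 42, "userId": 5}),
    )
    conn.close()

    # Each interested service declares and binds its own queue once, e.g.:
    #   channel.queue_declare(queue="product-service")
    #   channel.queue_bind(queue="product-service", exchange="order-events")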
Choosing between REST APIs, event-based design, or both depends entirely on your services' communication behaviour.
Based on your requirements, you can choose REST APIs where you see synchronous behaviour between services, and go with event-based design where services need asynchronous behaviour; there is no harm in combining the two.
Ideally, for inter-process communication it is better to go with messaging, and REST APIs are best fitted for client-to-service communication.
Check the communication-style patterns on microservices.io.
REST-based Architecture
Advantages
Request/response is simple and best fitted when you need synchronous environments.
A simpler system, since there is no intermediate broker.
Promotes orchestration, i.e. a service can take action based on the response of another service.
Drawbacks
Services need to discover the locations of service instances.
One-to-one mapping between services.
REST uses HTTP, a general-purpose protocol built on top of TCP/IP, which adds an enormous amount of overhead when used to pass messages.
Event-Driven Architecture
Advantages
Event-driven architectures are appealing to API developers because they function very well in asynchronous environments.
Loose coupling, since services are decoupled: on an event from one service, multiple services can take action based on application requirements, and it is easy to plug any new consumer into a producer.
Improved availability, since the message broker buffers messages until the consumer is able to process them.
Drawbacks
The additional complexity of a message broker, which must be highly available.
Debugging an event request is not that easy.
Personally I am not a fan of using a message broker for RPC. It adds unnecessary complexity and overhead.
How do you host your long-lived RabbitMQ consumer in your Users web service? If you make it some static singleton, how do you deal with scaling and concurrency in your web service? Or do you make it a stand-alone daemon process? Then you have two Users applications instead of one. And what happens if your Users consumer slows down? By the time it consumes the request message, the Orders service context might have timed out and sent another message, or given up.
For RPC I would suggest simple HTTP.
There is a pattern involving a message broker that can avoid the need for a synchronous network call: services consume events from other services and store that data locally in their own database. Then, when the Orders service needs a user record, it can read it from its own database.
In your case, your Users app doesn't need to know anything about orders, but your Orders app needs to know some details about your users. So every time a user is added, modified, or removed, the Users service emits an event (UserCreated, UserModified, UserRemoved). The Orders service subscribes to those events and stores only the data it needs, such as the user's address.
The benefit is that at request time your Orders service has one less synchronous dependency on another service, and testing the service is easier as you have fewer request-time dependencies. There are also drawbacks, however, such as some latency between user record changes occurring and being received by the Orders app. Something to consider.
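A rough sketch of the Orders-side consumer for this pattern, keeping a local copy of just the user fields it needs; the topic, event shapes, and SQLite storage are assumptions for illustration:

    import json
    import sqlite3
    from confluent_kafka import Consumer

    db = sqlite3.connect("orders-local.db")
    db.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, address TEXT)")

    consumer = Consumer({
        "bootstrap.servers": "localhost:9092",
        "group.id": "orders-service",
        "auto.offset.reset": "earliest",  # replay history to build the local copy
    })
    consumer.subscribe(["user-events"])

    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        event = json.loads(msg.value())
        if event["type"] in ("UserCreated", "UserModified"):
            # keep only what the Orders service needs: the address
            db.execute("INSERT OR REPLACE INTO users VALUES (?, ?)",
                       (event["userId"], event["address"]))
        elif event["type"] == "UserRemoved":
            db.execute("DELETE FROM users WHERE id = ?", (event["userId"],))
        db.commit()

At order-creation time the service then reads the address with a local SELECT instead of making a synchronous call to the Users service.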
UPDATE
If you do go with RabbitMQ for RPC, then remember to make use of the message TTL feature. If the client will time out, set the message expiration to that period. This helps avoid wasted work on the part of the consumer and keeps a queue from getting backed up under load. One issue with RPC over a message broker is that once a queue fills up, it can add long latencies that take a while to recover from. Setting your message expiration to your client timeout helps avoid that.
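With pika, per-message expiration is expressed as a string of milliseconds, so the advice above might look like this sketch (queue name and timeout are assumptions):

    import pika

    conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = conn.channel()
    channel.queue_declare(queue="rpc-requests")

    channel.basic_publish(
        exchange="",
        routing_key="rpc-requests",
        body=b'{"orderId": 42}',
        properties=pika.BasicProperties(expiration="30000"),  # match 30 s client timeout
    )
    conn.close()

If the message sits in the queue longer than the client is willing to wait, the broker drops it instead of handing the consumer work whose caller has already given up.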
Regarding RabbitMQ for RPC: normally we use a message broker for decoupling and durability. Since RPC is synchronous communication, that is, we are waiting for a response, durability is not a consideration. That leaves decoupling. The question is: does that decoupling buy you anything over the decoupling you can achieve with HTTP via a gateway or Docker service names?