Will WebFlux have any bottlenecks in such an architecture?

We're about to migrate from a monolithic design to a microservice architecture, trying to choose the best way to replace JAX-WS with REST, and we're considering Spring WebFlux.
We currently have a JAX-WS endpoint deployed on Tomcat EE serving requests from third-party clients. The web service endpoint makes a long-running, blocking call to the database and then sends a SOAP response to the client with the data retrieved from the DB (Oracle).
The Oracle DB will soon be replaced with a NoSQL database (probably MongoDB). Since MongoDB supports asynchronous calls, we're considering replacing the current implementation with a microservice exposing a REST endpoint based on WebFlux.
We see about 2500 req/sec at peak, so the current endpoint often goes down with an OutOfMemoryError. That was the root cause that pushed us towards migration.
My idea is to create a non-blocking endpoint that calls MongoDB asynchronously and sends a REST response to the client. So I have a few questions about the basic features WebFlux provides:
1. As far as I understand, WebFlux has built-in backpressure control at the business level (not TCP flow control), working generally via Reactive Streams. Since our clients are not reactive, does that mean this kind of backpressure control is not applicable here?
2. Suppose the calls to the new database remain long-running in the new architecture. Since Netty uses an event loop to serve incoming requests, is the following situation possible: the microservice has accepted all incoming HTTP connections, invoked async calls to the DB and subscribed the resulting Monos on a scheduler, but, since the request volume keeps growing explosively, the application keeps creating new workers in the scheduler pools until it crashes. Is this a realistic scenario?
3. Suppose the calls to the database remain synchronous. Is there a way to handle them with WebFlux such that the microservice stays reachable under load?
4. Which bottlenecks can be found in such a design? Does this solution look adequate?
5. Does Netty (or Reactor Netty, or whatever) have a tool to limit the number of requests processed simultaneously? Say I want the endpoint to serve no more than 100 parallel requests and reject all requests above that limit; is that possible? (See the sketch after this list.)
6. Suppose I create a huge number of threads serving async (or maybe sync) calls to the DB. Where is the breaking point at which the application will crash or stop responding to incoming HTTP requests? What will happen there: will we run out of memory, or...?
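For question 5, here is a minimal sketch of the kind of limiter I have in mind: a Spring WebFlux WebFilter counting in-flight requests (the class name ConcurrencyLimitFilter and the limit of 100 are illustrative, not a built-in Netty feature):

```java
import java.util.concurrent.atomic.AtomicInteger;

import org.springframework.http.HttpStatus;
import org.springframework.stereotype.Component;
import org.springframework.web.server.ServerWebExchange;
import org.springframework.web.server.WebFilter;
import org.springframework.web.server.WebFilterChain;

import reactor.core.publisher.Mono;

// Sketch: reject requests with 503 once 100 are already in flight, instead of queueing them.
@Component
public class ConcurrencyLimitFilter implements WebFilter {

    private static final int MAX_IN_FLIGHT = 100;
    private final AtomicInteger inFlight = new AtomicInteger();

    @Override
    public Mono<Void> filter(ServerWebExchange exchange, WebFilterChain chain) {
        if (inFlight.incrementAndGet() > MAX_IN_FLIGHT) {
            inFlight.decrementAndGet();
            exchange.getResponse().setStatusCode(HttpStatus.SERVICE_UNAVAILABLE);
            return exchange.getResponse().setComplete();
        }
        // Release the slot whenever the request completes, errors, or is cancelled.
        return chain.filter(exchange).doFinally(signal -> inFlight.decrementAndGet());
    }
}
```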

Finally, there were no major performance issues during our pilot project. But unfortunately we didn't initially take into account some specific Linux (and also OpenShift) TCP tuning properties.
They can significantly affect overall performance; in our case, we handled about 10 times more requests after tuning.
So pay attention to net.core.somaxconn and other related kernel parameters.
I've summarized our experience in an article.
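For illustration, the kind of settings meant here looks like this (the values are placeholders, not recommendations; measure and tune for your own workload):

```
# /etc/sysctl.d/99-tuning.conf -- example values only
net.core.somaxconn = 4096            # length of the TCP accept backlog
net.ipv4.tcp_max_syn_backlog = 4096  # queue of half-open (SYN received) connections
net.core.netdev_max_backlog = 8192   # packets queued per interface before the kernel drops them
```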

Related

Microservices: API Call Vs Messaging. When to Use?

I know that messaging systems are non-blocking and scalable and should be used in microservice environments.
The use case I am questioning is this:
Imagine there's an admin dashboard client responsible for sending an API request to create an Item object. There is a microservice that provides the API endpoint and uses a MySQL database where the Item should be stored. There is another microservice that uses Elasticsearch for text-search purposes.
Should this admin dashboard client:
A. send 2 API calls: one to the MySQL service and another to the Elasticsearch service,
or
B. send a message to a topic to be consumed by both the MySQL service and the Elasticsearch service?
What are the pros and cons of A versus B?
I'm thinking it's a little overkill when only 2 microservices consume this topic. Also, the frequency with which the admin creates Item objects is very low.
Like many things in software architecture, it depends. Your requirements, SLAs and business needs should make it clearer.
As you noted, a messaging system is non-blocking and much more scalable, but API communication has its pluses as well.
In general, REST APIs are best suited to request/response interactions where the client application sends a request to the API backend over HTTP.
Message streaming is best suited for notifications when new data or events occur that you may want to take action upon.
In your specific case, I would go with a messaging system, which is much more scalable and non-blocking.
Your approach A couples the "routing" logic into your application. Suppose you later need an API call to audit your requests: you will have to change the code and add another call to your application logic. As you said, the approach is synchronous, and unless you add threading logic, your calls will be lined up and won't scale, i.e. call MySQL -> wait for the response, then call Elasticsearch -> wait for the response, and so on.
In any case, you may prefer this approach if you need immediate consistency, i.e. the result of one call feeding the next call.
Approach B decouples that routing logic, so any other service interested in the event can subscribe to the topic and perform the expected action. It is totally asynchronous and scalable. Here you get eventual consistency, and you have to recover from any possible failure yourself.
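As a minimal sketch of approach B, assuming Kafka as the messaging system (the topic name items.created and the JSON payload are invented for illustration): the dashboard publishes one event, and the MySQL service and the Elasticsearch service each consume it from their own consumer group.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ItemEventPublisher {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        // One event out; any number of interested services can subscribe later
        // without changing this publisher.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            String event = "{\"type\":\"ItemCreated\",\"id\":\"42\",\"name\":\"widget\"}";
            producer.send(new ProducerRecord<>("items.created", "42", event));
        }
    }
}
```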

Kafka messages over REST API

We currently have a library which we use to interact with Kafka, but we are planning to develop this library into a separate application. Other applications will send Kafka messages using a REST endpoint. We are planning to use Vert.x in this application to make it non-blocking and fast. Is this a good strategy? My concerns: 1) HTTP will make it slower compared to Kafka's native TCP protocol; 2) streaming may not be possible; 3) it becomes a single point of failure.
On the other hand, as a separate application, release management, control and support will be a lot easier than they are currently.
Is it a good strategy, and has someone done this before? Any suggestions?
Your choice between HTTP and Kafka's native TCP protocol will depend on the number of applications that will be talking to your service. Say there is an IoT device sending lots of messages continuously; then using HTTP will be expensive and will increase latency, since establishing an HTTP connection is an expensive operation.
Now consider the case where you have a transactional system that sends transaction events as they commit to your database. The rate of messages will be lower, I assume, so it makes sense to use HTTP there.
In short, the rate of messages your service will receive should decide which way you go.
As for your current approach of maintaining a library, it is a good way to keep consistency across the organisation, as long as the library is maintained and its users update whenever you make changes to it. It also has the advantage of not requiring separate infrastructure/servers, since your code runs inside your users' applications.
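For what it's worth, a minimal sketch of the proposed REST-to-Kafka bridge, assuming Vert.x 4 with the vertx-web and vertx-kafka-client modules (the /topics/:topic route is invented for illustration):

```java
import java.util.HashMap;
import java.util.Map;

import io.vertx.core.Vertx;
import io.vertx.ext.web.Router;
import io.vertx.ext.web.handler.BodyHandler;
import io.vertx.kafka.client.producer.KafkaProducer;
import io.vertx.kafka.client.producer.KafkaProducerRecord;

public class KafkaBridge {

    public static void main(String[] args) {
        Vertx vertx = Vertx.vertx();

        Map<String, String> config = new HashMap<>();
        config.put("bootstrap.servers", "localhost:9092");
        config.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        config.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        KafkaProducer<String, String> producer = KafkaProducer.create(vertx, config);

        Router router = Router.router(vertx);
        router.route().handler(BodyHandler.create());

        // POST /topics/my-topic with the message as the request body.
        router.post("/topics/:topic").handler(ctx -> {
            KafkaProducerRecord<String, String> record =
                    KafkaProducerRecord.create(ctx.pathParam("topic"), ctx.body().asString());
            producer.send(record)
                    .onSuccess(meta -> ctx.response().setStatusCode(202).end())
                    .onFailure(err -> ctx.response().setStatusCode(500).end(err.getMessage()));
        });

        vertx.createHttpServer().requestHandler(router).listen(8080);
    }
}
```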

Communication between microservices - request data

I am dealing with communication between microservices.
For example (a fictitious example, just for illustration):
Microservice A - Store Users (getUser, etc.)
Microservice B - Store Orders (createOrder, etc.)
Now, if I want to add a new Order from the client app, I need to know the user's address. So the request flow would be:
Client -> Microservice B (createOrder for userId 5) -> Microservice A (getUser with id 5)
Microservice B will create the order with the details (the address) from the User microservice.
PROBLEM TO SOLVE: How do we deal effectively with the communication between microservice A and microservice B, given that we have to wait until the response comes back?
OPTIONS:
Use a REST API,
Use AMQP, e.g. RabbitMQ, and handle this via RPC (https://www.rabbitmq.com/tutorials/tutorial-six-dotnet.html).
I don't know which will be better for performance. Is a call faster via RabbitMQ or via a REST API? What is the best solution for a microservice architecture?
In your case using direct REST calls should be fine.
Option 1: Use a REST API.
Use it when you need synchronous communication, as in your case. This option is suitable here.
Option 2: Use AMQP.
Use it when you need asynchronous communication. For example, when your order service creates an order, you may want to notify the product service to reduce the product quantity, or notify the user service that the order for the user has been successfully placed.
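As a minimal sketch of option 1, assuming Spring's RestTemplate (the user-service URL and the User record are invented for illustration):

```java
import org.springframework.web.client.RestTemplate;

public class OrderService {

    private final RestTemplate restTemplate = new RestTemplate();

    // Synchronously fetch the user's address from the Users service,
    // then create the order with it.
    public void createOrder(long userId) {
        User user = restTemplate.getForObject(
                "http://user-service/users/{id}", User.class, userId);
        // ... persist the order using user.address() ...
    }

    record User(long id, String address) { }
}
```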
I highly recommend having a look at http://microservices.io/patterns/index.html
Whether to choose REST APIs, an event-based design, or both depends entirely on your services' communication behaviour.
Based on your requirements, you can choose REST APIs where you see synchronous behaviour between services, and go with an event-based design where services need asynchronous behaviour; there is no harm in combining the two.
Ideally, for inter-process communication it is better to go with messaging, while for client-to-service communication REST APIs are the best fit.
Check the communication styles at microservices.io.
REST-based architecture
Advantages:
- Request/response is easy and best fitted when you need synchronous interactions.
- The system is simpler, since there is no intermediate broker.
- It promotes orchestration, i.e. a service can take action based on the response of another service.
Drawbacks:
- Services need to discover the locations of service instances.
- There is a one-to-one mapping between services.
- REST uses HTTP, a general-purpose protocol built on top of TCP/IP, which adds considerable overhead when used to pass messages.
Event-driven architecture
Advantages:
- Event-driven architectures are appealing to API developers because they work very well in asynchronous environments.
- Loose coupling: on an event from one service, multiple services can take action based on application requirements, and it is easy to plug any new consumer into a producer.
- Improved availability, since the message broker buffers messages until the consumer is able to process them.
Drawbacks:
- The additional complexity of the message broker, which must be highly available.
- Debugging an event-driven request is not as easy.
Personally, I am not a fan of using a message broker for RPC. It adds unnecessary complexity and overhead.
How would you host your long-lived RabbitMQ consumer in your Users web service? If you make it a static singleton, how do you deal with scaling and concurrency in your web service? Or do you make it a stand-alone daemon process? Then you have two Users applications instead of one. And what happens if your Users consumer slows down? By the time it consumes the request message, the Orders service context might have timed out and sent another message, or given up.
For RPC I would suggest simple HTTP.
There is a pattern involving a message broker that can avoid the need for a synchronous network call: services consume events from other services and store that data locally in their own database. Then, when the Orders service needs a user record, it reads it from its own database.
In your case, the Users app doesn't need to know anything about orders, but the Orders app needs to know some details about your users. So every time a user is added, modified, removed, etc., the Users service emits an event (UserCreated, UserModified, UserRemoved). The Orders service can subscribe to those events and store only the data it needs, such as the user's address.
The benefit is that at request time your Orders service has one synchronous dependency fewer on another service. Testing the service is easier, since you have fewer request-time dependencies. There are also drawbacks, however, such as some latency between user record changes occurring and those changes being received by the Orders app. Something to consider.
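A minimal sketch of that pattern, assuming the RabbitMQ Java client (the queue name, the payload format, and the in-memory map standing in for the Orders database are all invented for illustration):

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.DeliverCallback;

// The Orders service keeps a local copy of user addresses, fed by
// UserCreated/UserModified events, so createOrder needs no remote call.
public class UserReplicaConsumer {

    // Stand-in for the Orders service's own table of user addresses.
    private static final Map<String, String> userAddresses = new ConcurrentHashMap<>();

    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");
        Connection connection = factory.newConnection();
        Channel channel = connection.createChannel();
        channel.queueDeclare("orders.user-events", true, false, false, null);

        DeliverCallback onUserEvent = (consumerTag, delivery) -> {
            // Assume a simple "userId|address" payload for illustration.
            String[] parts = new String(delivery.getBody(), StandardCharsets.UTF_8).split("\\|", 2);
            userAddresses.put(parts[0], parts[1]);
        };
        channel.basicConsume("orders.user-events", true, onUserEvent, consumerTag -> { });
    }
}
```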
UPDATE
If you do go with RabbitMQ for RPC, then remember to make use of the message TTL feature. If the client will time out after a given period, set the message expiration to that period. This helps avoid wasted work on the part of the consumer and stops a queue from getting backed up under load. One issue with RPC over a message broker is that once a queue fills up, it can add long latencies that take a while to recover from. Setting your message expiration to your client timeout helps avoid that.
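With the RabbitMQ Java client, the per-message TTL is set via the expiration property; for example (the queue names and the 5-second timeout are invented for illustration):

```java
import java.nio.charset.StandardCharsets;

import com.rabbitmq.client.AMQP;
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

public class RpcClientWithTtl {

    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");
        try (Connection connection = factory.newConnection();
             Channel channel = connection.createChannel()) {

            String replyQueue = channel.queueDeclare().getQueue();

            // Expire the request after 5 s -- the same period as the client's
            // own timeout -- so a backed-up consumer never works on stale RPCs.
            AMQP.BasicProperties props = new AMQP.BasicProperties.Builder()
                    .expiration("5000")          // per-message TTL in milliseconds
                    .replyTo(replyQueue)
                    .correlationId("req-1")
                    .build();

            channel.basicPublish("", "rpc.requests",
                    props, "getUser:5".getBytes(StandardCharsets.UTF_8));
        }
    }
}
```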
Regarding RabbitMQ for RPC: normally we use a message broker for decoupling and durability. Since RPC is a synchronous communication, that is, we are waiting for a response, durability is not a consideration. That leaves us with decoupling. The question is whether that decoupling buys you anything over the decoupling you can get with HTTP via a gateway or Docker service names.

Is it a good idea to write a gateway for PostgreSQL in Erlang?

I'm writing a web application in Erlang and want to store my data in PostgreSQL.
There are two kinds of resources in my application. One kind is very important, while the other is not that important.
For the important one, no data loss is allowed.
For the less important one, data loss due to system failure is ok.
I want to gain maximum efficiency, and I came up with this idea: write a gateway for PostgreSQL. The gateway is a gen_server, and the business logic (BL) parts talk to the gateway to store resources.
For storing important resources, the BL parts send the resources to the gateway, block until they receive a reply (success or failure), and finally respond to the user with a web page.
For storing less important resources, the BL parts just send the resources to the gateway without blocking and respond with a web page directly.
What I'm expecting from this idea is a lower time per request, since most of the resources are less important ones. But I wonder if this is a good idea; in other words, can I really get what I'm expecting?
Please answer from your experience or some reliable "web search results". Thanks. :-)
I can see two problems with your proposal:
All messages sent to the gen_server gateway will be serialized (blocking or not).
If the gen_server gateway process crashes, you will lose at least the messages sitting in its mailbox.
What I would do is create a helper module (not a process) responsible for the database interactions. This module would use a PostgreSQL library that supports connection pooling, so the DB calls can run with some parallelism.
If you want non-blocking DB operations for the less important resources, just spawn a process to do the DB interaction and move on.
Some links to PostgreSQL Erlang libraries (I haven't used any):
Postgresql connection pooling in Erlang
http://zotonic.com/page/519/epgsql-postgresql-driver

Rate-limiting RESTful consumer

I am consuming a rate-limited web service which only allows me to make 5 calls per second. I am using my server to proxy these calls for a web client:
Mashery > My Web Server > Client's Browser
I have optimized the usage of this web service, but there are still occasional times when I go over the rate limit. What I would like to do instead is hold the client's request for one second (or longer if warranted) before making the web service call to Mashery.
There are some ways I can solve this by building a simple queue system with a database back-end, but I'd rather avoid that if something already exists. Does something already exist to rate-limit the consuming side of this?
To reliably throttle requests to only 5 per second, I would employ a queue/worker system. But before that, I would simply request a higher rate limit by contacting the API support for that platform. Of course, I would also consider caching if it's a read-only method, you're pulling down a lot of the same data, and it's compliant with the API's terms of service.
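If you'd rather not build a full queue system, a minimal in-process sketch using Guava's RateLimiter (a token-bucket style throttle; callMashery is a placeholder for the actual proxied call, and this only throttles a single server instance, not a cluster):

```java
import com.google.common.util.concurrent.RateLimiter;

public class ThrottledProxy {

    // Hand out at most 5 permits per second across all callers.
    private static final RateLimiter limiter = RateLimiter.create(5.0);

    public static String proxyRequest(String request) {
        limiter.acquire(); // blocks (holds the client's request) until a permit is free
        return callMashery(request);
    }

    private static String callMashery(String request) {
        return "response for " + request; // placeholder for the real web service call
    }
}
```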