What is the main purpose of IExecutorService? (thread pool)

I know about the high-availability and scalability (etc.) advantages of Hazelcast, but I just want to ask about the main purpose of the distributed executor service, and I have some questions in mind. Kindly answer the following:
If the client load on the server is only in the form of blocking I/O requests (database queries, etc.), is there a need to use IExecutorService, or is a ThreadPoolExecutor enough for this scenario?
If the client load on the server is only in the form of CPU-intensive requests but the request rate is high, then IExecutorService can serve this scenario better on a cluster. Is this statement true?
The main purpose of IExecutorService is to handle a CPU-intensive request load on the cluster by horizontal scaling. Is this statement true?

If the client load on the server is only in the form of blocking I/O requests (database queries, etc.), is there a need to use IExecutorService, or is a ThreadPoolExecutor enough for this scenario?
It depends. The tasks don't need to be CPU-intensive. For example, if each task requires doing a lot of I/O, but that I/O resource is scalable, e.g.:
- the local file system of a member machine,
- another cluster (maybe there is a big Cassandra cluster) that stores the data,
then it could still be a good use case for Hazelcast.
If you are using Hazelcast to scale up remote calls to a database, it could very well be that you bring the database to its knees :)
If the client load on the server is only in the form of CPU-intensive requests but the request rate is high, then IExecutorService can serve this scenario better on a cluster. Is this statement true?
It depends. You pay a price for the RPC, so if you have very small tasks, it could very well be that the IExecutorService is not your friend. For similar reasons, even a plain Executor might not be your friend, because there could be huge contention on the executor's work queue.
So whether it makes sense to use the IExecutorService, or even an Executor at all, depends on the type of task being processed; a minimal submission sketch follows.
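As a rough sketch of submitting a coarse-grained, CPU-bound task (the executor name and the prime-counting task are hypothetical; shown against the classic com.hazelcast.core API, which may differ slightly between Hazelcast versions):

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IExecutorService;

import java.io.Serializable;
import java.util.concurrent.Callable;
import java.util.concurrent.Future;

public class DistributedTaskDemo {

    // A coarse-grained, CPU-bound task. It must be Serializable so
    // Hazelcast can ship it to whichever member ends up executing it.
    static class PrimeCount implements Callable<Long>, Serializable {
        private final long limit;
        PrimeCount(long limit) { this.limit = limit; }

        @Override
        public Long call() {
            long count = 0;
            for (long n = 2; n < limit; n++) {
                boolean prime = true;
                for (long d = 2; d * d <= n; d++) {
                    if (n % d == 0) { prime = false; break; }
                }
                if (prime) count++;
            }
            return count;
        }
    }

    public static void main(String[] args) throws Exception {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        IExecutorService exec = hz.getExecutorService("calc"); // hypothetical name

        // The task runs on some member of the cluster, not necessarily here.
        Future<Long> result = exec.submit(new PrimeCount(1_000_000));
        System.out.println("primes below 1M: " + result.get());

        hz.shutdown();
    }
}
```

Each submit pays the serialization and RPC cost mentioned above, so this only wins when the task itself is expensive relative to that overhead.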
The main purpose of IExecutorService is to handle a CPU-intensive request load on the cluster by horizontal scaling. Is this statement true?
See the answer to question 1.
There are no absolute answers to your questions; it depends heavily on a lot of factors.

Related

Distributed Resource Allocation Architecture

I am currently working on scaling a large-scale infrastructure that involves distributing complex calculations over a calculation farm (a cluster with a limited number of machines). The current system is based on a service-oriented architecture, whereby a limited number of services run on each machine in the cluster.
The resources used (CPU, memory) by each request sent to these services vary widely depending on the content of the request, but may be known (or at least predicted) in advance. In other words, it is possible to know, for a given request, the following:
Time it will take to process the request (can vary from milliseconds to minutes to sometimes hours).
Maximum memory required to process the request (from a few MB to several GB).
Maximum number of cores required to process the request (mostly single-threaded, but sometimes multi-threaded).
Our current architecture is problematic because our 'scheduler' does not take any of those parameters into account. Because of this, we often run into issues where one particular server is occupied by very expensive/'incompatible' requests (in terms of memory usage, CPU cores used, etc.), so processing each of them becomes wildly inefficient, while other servers are occupied by relatively 'cheap' requests.
We would like to optimise this allocation process by moving our current infrastructure to a more modern orchestration system, such as Kubernetes (or another). The question I have at the moment is: given those requirements (efficient distribution of requests with varying resource requirements, known before processing the request), what currently available platforms could be a good fit to optimise this type of workflow?
Thanks,
Jon
Kubernetes seems a good fit for that type of workload. Each request could be run as a Job, which would run one or more containers to process the request. Each container can request, ahead of time in its specification, the minimum amount of resources it will require, and can also specify a limit on those resources (e.g. maximum memory and maximum number of cores); the Kubernetes scheduler can then pick a node within the cluster that can satisfy and enforce these requirements.
This will allow you to forget about where the workloads are actually running and focus on making sure you just describe the requirements of each request accurately.
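As a rough sketch of what that could look like (the Job name, container image, and resource figures are all hypothetical), each incoming request might be turned into a Job spec such as:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: calc-request-42            # hypothetical: one Job per request
spec:
  backoffLimit: 2                  # retry a failed request at most twice
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: calc
        image: registry.example.com/calc-worker:latest   # hypothetical image
        resources:
          requests:                # what the scheduler reserves on a node
            cpu: "2"
            memory: "4Gi"
          limits:                  # what is enforced at runtime
            cpu: "2"
            memory: "4Gi"
```

Since you can predict each request's resource needs in advance, you would fill in the requests/limits per Job and let the scheduler do the bin-packing.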

Resilient microservices design patterns

In reactive programming, resilience is achieved by replication, containment, isolation, and delegation.
Two of the well-known design patterns are bulkheads with a supervisor, and circuit breakers. Are these only for achieving isolation and containment?
What are the most popular design patterns for microservices, especially the ones that provide resiliency?
Reactive programming cannot be reduced to just design patterns. There are many considerations about system architecture, DevOps, and so on to keep in mind when you are designing high-performance, high-availability systems.
Specifically, regarding resiliency, you should be thinking, for example, about:
Containerization
Services Orchestration
Fault Tolerant Jobs
Pub/Sub Model
And looots of other things :)
Other than bulkheads and circuit breakers, a few other things can be implemented:
Retry pattern on idempotent operations. This requires that the operation to be retried is idempotent and will produce the same result on repeated execution (see the sketch after this list).
Proper timeout configuration, such as connection and command timeouts, wherever there is a network dependency.
Bounded request queues at the virtual host/listener level.
Failover strategies such as caching.
Redundancy and failover systems can be incorporated to achieve resiliency against system failures as well.
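For illustration, here is a minimal retry-with-backoff sketch in plain Java (the operation, attempt count, and backoff values are placeholders):

```java
import java.util.concurrent.Callable;

public final class Retry {

    // Retries an idempotent operation up to maxAttempts times with
    // exponential backoff. This is only safe because repeating the
    // operation produces the same result.
    public static <T> T withRetry(Callable<T> op, int maxAttempts,
                                  long initialBackoffMillis) throws Exception {
        long backoff = initialBackoffMillis;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) { // in real code, catch only transient failures
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoff);
                    backoff *= 2;   // exponential backoff between attempts
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical idempotent call, e.g. an HTTP GET or a keyed upsert.
        String result = withRetry(() -> "ok", 3, 100);
        System.out.println(result);
    }
}
```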
You can implement various resilience patterns to achieve different levels of resilience based on your needs.
Unit Isolation – split systems into parts and isolate the parts against each other. The entire system must never fail.
Shed Load – implement a rate limiter, which sheds any extra load the application can't handle, to ensure that the application is resilient to spikes in the number of requests. Any request that is processed by an application consumes resources like CPU, memory, I/O, and so on. If requests come at a rate that exceeds the application's available resources, the app may become unresponsive, behave inconsistently, or crash.
Retry – enable an application to handle transient failures when it tries to connect to a service or network resource, by transparently retrying a failed operation.
Timeout – wait for a predetermined length of time and take alternative action if that time is exceeded.
Circuit Breaker – when connecting to a remote service or resource, handle faults that might take a variable amount of time to recover from.
Bounded Queue – limit request queue sizes in front of heavily used resources (a combined shed-load/bounded-queue sketch follows this list).
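Here is a minimal sketch of the Shed Load and Bounded Queue patterns together, using only the JDK: a fixed-capacity queue in front of a thread pool, with excess requests rejected (shed) rather than queued without bound. The pool size, queue capacity, and task body are made up:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedQueueDemo {
    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                4, 4,                                   // fixed pool of 4 workers
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(100),          // bounded request queue
                new ThreadPoolExecutor.AbortPolicy());  // reject once it is full

        for (int i = 0; i < 10_000; i++) {
            final int id = i;
            try {
                pool.execute(() -> process(id));
            } catch (RejectedExecutionException overload) {
                // Queue is full: shed the request instead of letting the
                // whole application become unresponsive.
                System.out.println("shed request " + id);
            }
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }

    static void process(int id) {
        // placeholder for real work
    }
}
```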

In Oracle RAC, will an application be faster, if there is a subset of the code using a separate Oracle service to the same database?

For example, I have an application that does lots of audit trails writing. Lots. It slows things down. If I create a separate service on my Oracle RAC just for audit CRUD, would that help speed things up in my application?
In other words, I point most of the application to the main service listening on my RAC via SCAN. I take the subset of my application, the audit-trail data manipulation, and point it to a separate service that listens on the same RAC but points at the same schema as the main service.
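Concretely, by 'pointing' a subset at a separate service I mean something like the following JDBC sketch (the SCAN host and service names are hypothetical):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class RacServices {
    public static void main(String[] args) throws SQLException {
        // Most of the application connects to the main service via SCAN.
        Connection main = DriverManager.getConnection(
                "jdbc:oracle:thin:@//rac-scan.example.com:1521/main_svc",
                "app", "secret");

        // The audit-trail subset connects to its own service, same schema,
        // which the DBA could configure to prefer a different set of nodes.
        Connection audit = DriverManager.getConnection(
                "jdbc:oracle:thin:@//rac-scan.example.com:1521/audit_svc",
                "app", "secret");

        // ... use each connection for its own workload ...
        main.close();
        audit.close();
    }
}
```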
As with anything else, it depends. You'd need to be a lot more specific about your application, what services you'd define, your workloads, your goals, etc. Realistically, you'd need to test it in your environment to know for sure.
A separate service could allow you to segregate the workload of one application (the one writing the audit trail) from the workload of other applications by having different sets of nodes in the cluster running each service (under normal operation). That can help ensure that the higher-priority application (presumably not the one writing the audit trail) has a set amount of hardware to handle its workload even if the lower-priority application is running at full throttle. Of course, since all the nodes share the same disk, if the bottleneck is disk I/O, that segregation of workload may not accomplish much.
Separating the services onto different sets of nodes can also impact how frequently a particular service gets blocks from the local node's buffer cache rather than requesting them from another node and waiting for them to be shipped over the interconnect. It's quite possible that an application that is constantly writing to log tables might end up spending quite a bit of time waiting for a small number of hot blocks (such as the right-most block in the primary key index for the log table) to be shipped back and forth between different nodes. If all the audit records are being written on just one node (or on a smaller number of nodes), that hot block will always be available in the local buffer cache. On the other hand, if writing the audit trail involves querying the database to get information about a change, separating the workload may mean that blocks that were in the local cache (because they were just changed) are now getting shipped across the interconnect, so you could end up hurting performance.
Separating the services even if they're running on the same set of nodes may also be useful if you plan on managing them differently. For example, you can configure Oracle Resource Manager rules to give priority to sessions that use one service over another. That can be a more fine-grained way to allocate resources to different workloads than running the services on different nodes. But it can also add more overhead.

Why do we need message brokers like RabbitMQ over a database like PostgreSQL?

I am new to message brokers like RabbitMQ, which we can use to create task/message queues for a scheduling system like Celery.
Now, here is the question:
I can create a table in PostgreSQL that gets appended with new tasks and consumed by a consumer program like Celery.
Why on earth would I want to set up a whole new piece of technology for this, like RabbitMQ?
Now, I believe scaling cannot be the answer, since a database like PostgreSQL can work in a distributed environment.
I googled what problems a database poses for this particular use case, and I found:
polling keeps the database busy and performing poorly,
locking of the table -> again, poor performance,
millions of rows of tasks -> again, polling performs poorly.
Now, how does RabbitMQ, or any other message broker like it, solve these problems?
Also, I found out that AMQP is the protocol it follows. What's so great about that?
Can Redis also be used as a message broker? I find it more analogous to Memcached than to RabbitMQ.
Please shed some light on this!
RabbitMQ's queues reside in memory and will therefore be much faster than implementing this in a database. A good dedicated message queue should also provide essential queuing-related features such as throttling/flow control and the ability to choose different routing algorithms, to name a couple (RabbitMQ provides these and more). Depending on the size of your project, you may also want the message-passing component separate from your database, so that if one component experiences heavy load, it need not hinder the other's operation.
As for the problems you mentioned:
polling keeping the database busy and performing poorly: using RabbitMQ, producers can push updates to consumers, which is far more performant than polling. Data is simply sent to the consumer when it needs to be, eliminating the need for wasteful checks.
locking of the table -> again, poor performance: there is no table to lock :P
millions of rows of tasks -> again, polling performs poorly: as mentioned above, RabbitMQ operates faster as it resides in RAM, and it provides flow control. If needed, it can also use the disk to temporarily store messages if it runs out of RAM. After 2.0, RabbitMQ has significantly improved its RAM usage. Clustering options are also available.
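To make the push model concrete, here is a minimal sketch using the RabbitMQ Java client (queue name and host are hypothetical): the broker delivers each message to the consumer's callback, so there is no polling loop anywhere.

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.DeliverCallback;

import java.nio.charset.StandardCharsets;

public class PushDemo {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");                  // hypothetical broker host
        try (Connection conn = factory.newConnection();
             Channel channel = conn.createChannel()) {

            // Durable queue named "tasks" (hypothetical name).
            channel.queueDeclare("tasks", true, false, false, null);

            // Consumer: the broker pushes each message to this callback.
            DeliverCallback onDeliver = (consumerTag, delivery) -> {
                String body = new String(delivery.getBody(), StandardCharsets.UTF_8);
                System.out.println("got: " + body);
                channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false);
            };
            channel.basicConsume("tasks", false, onDeliver, consumerTag -> { });

            // Producer: publish a task and return immediately.
            channel.basicPublish("", "tasks", null,
                    "do-something".getBytes(StandardCharsets.UTF_8));

            Thread.sleep(1000); // give the callback a moment before closing
        }
    }
}
```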
In regards to AMQP, I would say a really cool feature is the 'exchange', and the ability for it to route to other exchanges. This gives you more flexibility and enables you to create a wide array of elaborate routing topologies, which can come in very handy when scaling. For a good example, see: http://blog.springsource.org/2011/04/01/routing-topologies-for-performance-and-scalability-with-rabbitmq/
Finally, in regards to Redis: yes, it can be used as a message broker, and can do well. However, RabbitMQ has more message-queuing features than Redis, as RabbitMQ was built from the ground up to be a full-featured, enterprise-level, dedicated message queue. Redis, on the other hand, was primarily created to be an in-memory key-value store (though it does much more than that now; it's even referred to as a Swiss Army knife). Still, I've read/heard of many people achieving good results with Redis for smaller-sized projects, but haven't heard much about it in larger applications.
Here is an example of Redis being used in a long-polling chat implementation: http://eflorenzano.com/blog/2011/02/16/technology-behind-convore/
PostgreSQL 9.5
PostgreSQL 9.5 incorporates SELECT ... FOR UPDATE ... SKIP LOCKED. This makes implementing working queuing systems a lot simpler and easier. You may no longer require an external queueing system since it's now simple to fetch 'n' rows that no other session has locked, and keep them locked until you commit confirmation that the work is done. It even works with two-phase transactions for when external co-ordination is required.
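As a concrete sketch in Java/JDBC, assuming a hypothetical tasks(id, payload, status) table, a worker can claim and complete one row like this:

```java
import java.sql.*;

public class QueueWorker {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/app", "worker", "secret")) {
            conn.setAutoCommit(false);

            // Claim one pending task that no other session has locked.
            String claim =
                "SELECT id, payload FROM tasks WHERE status = 'pending' " +
                "ORDER BY id LIMIT 1 FOR UPDATE SKIP LOCKED";
            try (Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery(claim)) {
                if (rs.next()) {
                    long id = rs.getLong("id");
                    // ... process rs.getString("payload") here ...
                    try (PreparedStatement done = conn.prepareStatement(
                            "UPDATE tasks SET status = 'done' WHERE id = ?")) {
                        done.setLong(1, id);
                        done.executeUpdate();
                    }
                }
            }
            conn.commit(); // releases the row lock; the work is confirmed done
        }
    }
}
```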
External queueing systems remain useful, providing canned functionality, proven performance, integration with other systems, options for horizontal scaling and federation, etc. Nonetheless, for simple cases you don't really need them anymore.
Older versions
You don't need such tools, but using one may make life easier. Doing queueing in the database looks easy, but you'll discover in practice that high performance, reliable concurrent queuing is really hard to do right in a relational database.
That's why tools like PGQ exist.
You can get rid of polling in PostgreSQL by using LISTEN and NOTIFY, but that won't solve the problem of reliably handing out entries off the top of the queue to exactly one consumer while preserving highly concurrent operation and not blocking inserts. All the simple and obvious solutions you think will solve that problem actually don't in the real world, and tend to degenerate into less efficient versions of single-worker queue fetching.
If you don't need highly concurrent multi-worker queue fetches then using a single queue table in PostgreSQL is entirely reasonable.

Can a shared ready queue limit the scalability of a multiprocessor system?

Can a shared ready queue limit the scalability of a multiprocessor system?
Simply put, most definitely. Read on for some discussion.
Tuning a service is an art form, or it requires benchmarking (and the space of concepts you need to benchmark is huge). I believe that it depends on factors such as the following (this is not exhaustive):
how much time an item picked up from the ready queue takes to process,
how many worker threads there are,
how many producers there are, and how often they produce,
what type of wait primitives you are using: spin locks or kernel waits (the latter being slower).
So, if items are produced often, the number of threads is large, and the processing time is low, the data structure could be locked for large windows, thus causing thrashing.
Other factors may include the data structure used and how long the data structure is locked for. E.g., if you use a linked list to manage such a queue, the add and remove operations take constant time; a priority queue (heap) takes a few more operations on average when items are added.
If your system is for business processing, you could take this question out of the picture by just using:
a process-based architecture, spawning multiple producer/consumer processes and using the file system for communication, or
a language with non-preemptive, cooperative threading, such as Stackless Python, Lua, or Erlang.
Also note: synchronization primitives cause floods of inter-processor cache-coherence traffic, which is not good, and should therefore be used sparingly.
The discussion could go on to fill a Ph.D dissertation :D
A per-CPU ready queue is a natural choice for the data structure. This is because most operating systems will try to keep a process on the same CPU, for many reasons you can google. What does that imply? If a thread is ready and another CPU is idling, the OS will not quickly migrate the thread to the other CPU; load balancing only kicks in over the long run.
Had the situation been different, that is, had keeping thread-CPU affinity not been a design goal and had thread migration been frequent, then keeping separate per-CPU run queues would be costly.
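To make the contrast concrete, here is a small JDK-only sketch comparing a single shared work queue with a work-stealing pool, whose per-worker deques are roughly analogous to per-CPU run queues (the task count and empty task body are arbitrary):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class QueueContentionDemo {

    static void run(String label, ExecutorService pool) throws InterruptedException {
        long start = System.nanoTime();
        for (int i = 0; i < 1_000_000; i++) {
            pool.execute(() -> { /* tiny task: queue contention dominates */ });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.printf("%s: %d ms%n", label,
                (System.nanoTime() - start) / 1_000_000);
    }

    public static void main(String[] args) throws InterruptedException {
        int cpus = Runtime.getRuntime().availableProcessors();

        // One shared work queue: every submit and every take contends
        // on the same data structure.
        run("shared queue", Executors.newFixedThreadPool(cpus));

        // Work stealing (ForkJoinPool): each worker has its own deque,
        // analogous to per-CPU run queues; idle workers steal from others.
        run("work stealing", Executors.newWorkStealingPool(cpus));
    }
}
```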