Infinispan cluster fails to communicate when loading huge data - jboss

I am using Infinispan in Distributed Async mode with 4 nodes on 4 different systems. Each node runs with 3 GB of heap size.
Only one node plays the role of loader and tries to load 50 million records in chunks (in a loop where 5 million records go to the cache 10 times). According to my calculation, 4 nodes can handle that much data, so space is not a problem.
When I start all 4 nodes, the cluster forms successfully and data starts loading into the cache. But because the data volume is so large, after some time one node fails to get a response from another node and fails with the exception below:
2013-11-01 05:35:14 ERROR org.infinispan.interceptors.InvocationContextInterceptor - ISPN000136: Execution error
org.infinispan.util.concurrent.TimeoutException: Timed out after 15 seconds waiting for a response from INUMUU410-54463
at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.processCalls(CommandAwareRpcDispatcher.java:459)
at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommands(CommandAwareRpcDispatcher.java:154)
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:534)
INUMUU410-54463 is the machine name.

(copied from Flavius' comment above:)
What I'd do in your case is split the load so that one putAll does not contain more than, e.g., 1 MB of data, and then send those chunks synchronously (using cache.getAdvancedCache().withFlags(FORCE_SYNCHRONOUS)). Or otherwise throttle the number of messages that are in flight simultaneously (see also the putAllAsync method on the advanced cache).
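A minimal sketch of that chunked, synchronous approach, assuming String keys/values and a hypothetical CHUNK_SIZE (tune it so a single putAll stays around 1 MB):

import java.util.HashMap;
import java.util.Map;
import org.infinispan.AdvancedCache;
import org.infinispan.Cache;
import org.infinispan.context.Flag;

public class ChunkedLoader {
    private static final int CHUNK_SIZE = 1_000; // placeholder; size so one putAll is ~1 MB

    public static void load(Cache<String, String> cache, Iterable<Map.Entry<String, String>> records) {
        // FORCE_SYNCHRONOUS makes each putAll wait for remote responses even though
        // the cache is configured in async mode, so only one chunk is in flight at a time.
        AdvancedCache<String, String> syncCache =
                cache.getAdvancedCache().withFlags(Flag.FORCE_SYNCHRONOUS);

        Map<String, String> chunk = new HashMap<>();
        for (Map.Entry<String, String> record : records) {
            chunk.put(record.getKey(), record.getValue());
            if (chunk.size() >= CHUNK_SIZE) {
                syncCache.putAll(chunk); // blocks until replicated, throttling the loader
                chunk.clear();
            }
        }
        if (!chunk.isEmpty()) {
            syncCache.putAll(chunk);
        }
    }
}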

Related

How to use ZooKeeper to distribute work across a cluster of servers

I'm studying up for system design interviews and have run into this pattern in several different problems. Imagine I have a large volume of work that needs to be repeatedly processed at some cadence. For example, I have a large number of alert configurations that need to be checked every 5 min to see if the alert threshold has been breached.
The general approach is to split the work across a cluster of servers for scalability and fault tolerance. Each server would work as follows:
start up
read assigned shard
while true:
    process the assigned shard
    sleep 5 min
Based on this answer (Zookeeper for assigning shard indexes), I came up with the following approach using ZooKeeper (a rough code sketch follows the steps below):
When a server starts up, it adds itself as a child under the node /service (i.e., it creates /service/{server-id}) and watches the children of /service. ZooKeeper assigns a unique sequence number to the server.
Server reads its unique sequence number i from ZooKeeper. It also reads the total number of children n under the /service node.
Server identifies its shard by dividing the total volume of work into n pieces and locating the ith piece.
While true:
    If the watch triggers (because servers have been added to or removed from the cluster), server recalculates its shard.
    Server processes its shard.
    Sleep 5 min.
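A rough sketch of steps 1-3 with the plain ZooKeeper Java client (the connection string, a pre-created /service parent node, and processShard are assumptions; for simplicity it re-reads the membership every cycle rather than only when the watch fires):

import java.util.Collections;
import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ShardedWorker {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 15000,
                event -> { /* session/watch events arrive here */ });

        // Step 1: register as an ephemeral sequential child of /service.
        // The znode disappears automatically if this server dies, and ZooKeeper
        // appends a unique, monotonically increasing sequence number to the name.
        String myPath = zk.create("/service/worker-", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
        String myName = myPath.substring(myPath.lastIndexOf('/') + 1);

        while (true) {
            // Step 2: read the current membership and re-arm the child watch (watch = true).
            List<String> members = zk.getChildren("/service", true);
            Collections.sort(members);

            // Step 3: this server's shard index is its rank in the sorted membership.
            int n = members.size();
            int i = members.indexOf(myName);

            processShard(i, n); // placeholder: process the i-th of n work partitions
            Thread.sleep(5 * 60 * 1000L);
        }
    }

    private static void processShard(int shardIndex, int totalShards) {
        // placeholder for the actual work, e.g. checking this shard's alert configurations
    }
}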
Does this sound reasonable? Is this generally the way that it is done in real world systems? A few questions:
In step #2, when the server reads the number of children, does it need to wait a period of time to let things settle down? What if every server is joining at the same time?
I'm not sure how timely the watch would be. It seems like there would be a window where the server is still processing its shard while a reassignment of shards causes another server to pick up a shard that overlaps with what this server is processing, leading to duplicate processing (which may or may not be ok). Is there any way to solve this?
Thanks!

NiFi PutDatabaseRecord to Postgres database : Performance improvement in loading data into Postgres database

We are trying to load data from Oracle into Postgres using NiFi.
We are using PutDatabaseRecord to load the data (which is in Avro format).
We are using ExecuteSQL to extract the data, which is very fast, but even though we are using 150+ threads for PutDatabaseRecord, it sustains an average of only 1 GB of writes per 5 minutes.
Even if we use 3 PutDatabaseRecord processors (one processor per table) with 50 threads each, the overall average is still 1 GB per 5 minutes (e.g., 250 MB for the 1st processor, 350 MB for the 2nd and 400 MB for the 3rd, or some other combination, but still about 1 GB in total).
We are not sure whether it is the Postgres side that is limiting write throughput or the NiFi side.
We need help deciding whether to change NiFi properties or some Postgres settings to improve the data-loading performance.
One observation: data extraction from Oracle is very fast, and the NiFi queues fill very quickly and then wait to be processed by the PutDatabaseRecord processors.
If you have a single NiFi instance, there will be a limit on how much data you can push through regardless of the number of threads (once the number of threads reaches the number of cores on your machine). To increase throughput, you could set up a 3-5 node NiFi cluster and run the PutDatabaseRecord processors in parallel; then you should see 3-5 GB of throughput to Postgres (as long as PG can handle that).

Postgres Replication Slots Checking Lag

I'm attempting to detect on my AWS RDS Aurora Postgres 11.9 instance if my three Logical Replication slots are backing up. I'm using wal2json plugin to read off of them continuously. Two of the slots are being read off by python processes. The third is kafka-connect consumer.
I'm using the query below, but am getting odd results. It says two of my slots are several GB behind, even in the middle of the night when we have very little load. Am I misinterpreting what the query is saying?
SELECT redo_lsn, slot_name, restart_lsn,
       round((redo_lsn - restart_lsn) / 1024 / 1024 / 1024, 2) AS GB_behind
FROM pg_control_checkpoint(), pg_replication_slots;
Things I've checked:
I've checked that the consumers are still running.
I have also looked at the logs and the timestamps of the rows being inserted are coming off the database within 0-2 seconds after they were inserted. So it doesn't appear like I'm lagging behind.
I've performed an end-to-end test and the data is making it through my pipeline in a few seconds, so it is definitely consuming data relatively fast.
Both of the slots I'm using for my python processes have the same value for GB_behind (currently 12.40), even though the two slots are on different logical databases with dramatically different load (one has ~1000x higher load).
I have a 3rd replication slot being read by a different program (kafka connect). It shows 0 GB_behind.
There is just no way, even at peak load, that my workloads could generate 12.4 GB of data in a few seconds (not even in a few minutes). Am I misinterpreting something? Is there a better way to check how far a replication slot is behind?
Thanks much!
Here is a small snippet of my code (Python 3.6) in case it helps, but I've been using it for a while and data has been flowing:
def consume(msg):
    print(msg.payload)
    try:
        kinesis_client.put_record(StreamName=STREAM_NAME, Data=msg.payload, PartitionKey=partition_key)
    except:
        logger.exception('PG ETL: Failed to send load to kinesis. Likely too large.')

with con.cursor() as cur:
    cur.start_replication(slot_name=replication_slot, options={'pretty-print': 1}, decode=True)
    cur.consume_stream(consume)
I wasn't properly performing send_feedback during my consume function. So I was consuming the records, but I wasn't telling the Postgres replication slot that I had consumed the records.
Here is my complete consume function in case others are interested:
def consume(msg):
    print(msg.payload)
    try:
        kinesis_client.put_record(StreamName=STREAM_NAME, Data=msg.payload, PartitionKey=partition_key)
    except:
        logger.exception('PG ETL: Failed to send load to kinesis. Likely too large.')
    # Acknowledge the message so Postgres can advance the slot's restart_lsn
    # and release the WAL it no longer needs to retain.
    msg.cursor.send_feedback(flush_lsn=msg.data_start)

with con.cursor() as cur:
    cur.start_replication(slot_name=replication_slot, options={'pretty-print': 1}, decode=True)
    cur.consume_stream(consume)

Mirth performance benchmark

We are using Mirth Connect for message transformation from HL7 to text and storing the transformed messages in an Azure SQL database. Our current throughput is 45000 messages per hour.
The machine configuration is 8 GB RAM and a 2-core CPU. Memory assigned to Mirth is -Xms = 6122 MB.
We have no idea what performance to expect from Mirth with the above configuration. Does anyone have performance benchmarks for Mirth Connect?
I'd recommend looking into the Max Processing Threads option in version 3.4 and above. It's configurable in the Source Settings (Source tab). By default it's set to 1, which means only one message can process through the channel's main processing thread at any given time. This is important for certain interfaces where order of messages is paramount, but obviously it limits throughput.
Note that whatever client is sending your channel messages also needs to be reconfigured to send multiple messages in parallel. For example if you have a single-threaded process that is sending your channel messages via TCP/MLLP one after another in sequence, increasing the max processing threads isn't necessarily going to help because the client is still single-threaded. But, for example, if you stand up 10 clients all sending to your channel simultaneously, then you'll definitely reap the benefits of increasing the max processing threads.
If your source connector is a polling type, like a File Reader, you can still benefit from this by turning the Source Queue on and increasing the Max Processing Threads. When the source queue is enabled and you have multiple processing threads, multiple queue consumers are started and all read and process from the source queue at the same time.
Another thing to look at is destination queuing. In the Advanced (wrench icon) queue settings, there is a similar option to increase the number of Destination Queue Threads. By default when you have destination queuing enabled, there's just a single queue thread that processes messages in a FIFO sequence. Again, good for message order but hampers throughput.
If you do need messages to be ordered and want to maximize parallel throughput (AKA have your cake and eat it too), you can use the Thread Assignment Variable in conjunction with multiple destination Queue Threads. This allows you to preserve order among messages with the same unique identifier, while messages pertaining to different identifiers can process simultaneously. A common use-case is to use the patient MRN for this, so that all messages for a given patient are guaranteed to process in the order they were received, but messages longitudinally across different patients can process simultaneously.
We are using an AWS EC2 c4.4xlarge instance to test a bare-bones proof-of-concept performance limit. We got about 50 msgs/sec without obvious bottlenecks on CPU/memory/network/disk IO/DB IO etc. We want to push the limit higher. Please share your observations if you have any.
We run the same process. Mirth -> Azure SQL Database. We're running through performance testing right now and have been stuck at 12 - 15 messages/second (43000 - 54000 per hour).
We've run tests on each channel and found this:
1 channel (source: file reader -> destination: Azure SQL DB) was about 36k per hour
2 channels (source: file reader -> destination: Azure SQL DB) were about 59k per hour
3 channels (source: file reader -> destination: Azure SQL DB) were about 80k per hour
We've added multi-threading (2, 4, 8) to both the source and destination on 1 channel with no performance increase. Mirth is running on 8 GB of memory and 2 cores with the heap size set to 2048 MB.
We are now going to run a few tests with Mirth running on similar "hardware" to a c4.4xlarge, which in Azure is 16 cores and 32 GB of memory. There is 200 GB of SSD available as well.
Our goal is 100k messages per hour per channel.

JBoss ActiveMQ 6.1.0 queue message processing slows down after 10000 messages

Below is the configuration:
2 JBoss application nodes
5 listeners on each application node with 50 threads each; they support clustering and are set up as active-active listeners, so they run on both app nodes
The listeners simply get the message and log the information into a database
50000 messages are posted into ActiveMQ using JMeter
Here is the observation from the first execution:
A total of 50000 messages are consumed in approx 22 mins:
first 0-10000 messages consumed in approx 1 min
10000-20000 messages consumed in approx 2 mins
20000-30000 messages consumed in approx 4 mins
30000-40000 messages consumed in approx 6 mins
40000-50000 messages consumed in 8 mins
So we see that the message consumption time increases as the number of messages grows.
Second execution, without restarting any of the servers: 50000 messages consumed in approx 53 mins!
After deleting the ActiveMQ data folder and restarting ActiveMQ, performance improves again but degrades as more data enters the queue!
I tried multiple configurations in activemq.xml, but with no success...
Has anybody faced a similar issue and found a solution? Let me know. Thanks.
I've seen similar slowdowns in our production systems when pending message counts go high. If you're flooding the queues then the MQ process can't keep all the pending messages in memory, and has to go to disk to serve a message. Performance can fall off a cliff in these circumstances. Increase the memory given to the MQ server process.
It also looks as though the disk storage layout is not particularly efficient - perhaps each message is stored as a file in a single directory? This can make access times rise, as traversing the disk directory takes longer.
50000 messages in > 20 mins seems like very low performance.
The following configuration works well for me (these are just pointers; you may already have tried some of these, but see if they work for you):
1) Server and queue/topic policy entry
// server
server.setDedicatedTaskRunner(false);
// queue policy entry
policyEntry.setMemoryLimit(queueMemoryLimit); // 32mb
policyEntry.setOptimizedDispatch(true);
policyEntry.setLazyDispatch(true);
policyEntry.setReduceMemoryFootprint(true);
policyEntry.setProducerFlowControl(true);
policyEntry.setPendingQueuePolicy(new StorePendingQueueMessageStoragePolicy());
2) If you are using KahaDB for persistence then use a per-destination adapter (MultiKahaDBPersistenceAdapter). This keeps the storage folders separate for each destination and reduces synchronization effort. Also, if you are not worried about abrupt server restarts (due to any technical reason), then you can reduce the disk sync effort with:
kahaDBPersistenceAdapter.setEnableJournalDiskSyncs(false);
3) Try increasing the memory usage, temp usage and store (disk) usage values at the server level.
4) If possible, increase prefetchSize in the prefetch policy. This will improve performance but also increases the memory footprint of consumers.
5) If possible, use transactions in consumers. This will help reduce the message acknowledgement handling and disk sync effort on the server.
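For reference, a minimal sketch of point 5 using a plain JMS transacted session (the broker URL, queue name and the processing step are placeholders, not taken from the question):

import javax.jms.Connection;
import javax.jms.Message;
import javax.jms.MessageConsumer;
import javax.jms.Session;
import org.apache.activemq.ActiveMQConnectionFactory;

public class TransactedConsumer {
    public static void main(String[] args) throws Exception {
        ActiveMQConnectionFactory factory =
                new ActiveMQConnectionFactory("tcp://localhost:61616"); // placeholder URL
        Connection connection = factory.createConnection();
        connection.start();

        // Transacted session: received messages are acknowledged as a batch on commit,
        // which reduces per-message ack handling and disk syncs on the broker.
        Session session = connection.createSession(true, Session.SESSION_TRANSACTED);
        MessageConsumer consumer = session.createConsumer(session.createQueue("test.queue"));

        Message message;
        while ((message = consumer.receive(1000)) != null) {
            // process / persist the message here (e.g., the DB insert the listener performs)
            session.commit(); // commit per message, or every N messages for larger batches
        }

        session.close();
        connection.close();
    }
}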
Point 5 mentioned by #hemant1900 solved the problem :) Thanks.
5) If possible, use transactions in consumers. This will help reduce the message acknowledgement handling and disk sync effort on the server.
The problem was in my code. I had not used a transaction to persist the data in the consumer, which is bad programming anyway... I know :(
But I didn't expect that it could have caused this issue.
Now 50000 messages are getting processed in less than 2 mins.