Can somebody please explain what causes this error:
You have reached maximum pool size for given partition
In the latest 2.1.x versions, this exception no longer exists.
The caller simply waits until a connection becomes available.
But I will explain it anyway. To improve multiprocessor scalability, the pool is split into partitions, and several threads work together on a single partition.
Each partition has a queue; when the connection limit for that queue is reached, the exception is thrown. Again, this is no longer the case in the latest version.
So the best way to fix this issue is to upgrade to the latest version and set a maximum connection limit. It would help if you added more information to your question, but I assume you are using OrientGraphFactory, which in the latest version defaults to a maximum number of connections equal to the number of CPU cores.
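For reference, here is a minimal sketch of configuring the pool size explicitly, assuming the OrientDB 2.x Blueprints graph API; the database URL and credentials are placeholders:

    import com.tinkerpop.blueprints.impls.orient.OrientGraph;
    import com.tinkerpop.blueprints.impls.orient.OrientGraphFactory;

    public class PoolExample {
        public static void main(String[] args) {
            // setupPool(min, max) caps the number of pooled connections.
            OrientGraphFactory factory =
                    new OrientGraphFactory("remote:localhost/mydb", "admin", "admin") // placeholders
                            .setupPool(1, 16);
            OrientGraph graph = factory.getTx(); // borrows a connection from the pool
            try {
                // ... work with the graph ...
            } finally {
                graph.shutdown(); // returns the connection to the pool
                factory.close();
            }
        }
    }

If the pool is still exhausted with a sensible maximum, the usual culprit is connections that are never returned, i.e. graph instances that are not shut down.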
As written in the HikariCP docs, the formula for calculating connection pool size is connections = ((core_count * 2) + effective_spindle_count). But which core count is this: my app server's or the database server's?
For example: my app is running on 2 CPUs, but the database is running on 16 CPUs.
This is Kevin's formula for connection pool size, where the cores and spindles (you can tell it is an old formula) are the database server's.
This assumes that the connections are kept fairly busy. If you have transactions with longer idle times, you might need to make the pool bigger.
In the end, only trial and error can find the ideal pool size.
The quote below is from the PostgreSQL wiki and refers to the database server's cores:
A database server only has so many resources, and if you don't have enough connections active to use all of them, your throughput will generally improve by using more connections.
Note that this formula may be outdated (comment by mustaccio):
That wiki page was last updated nearly 5 years ago, and the advice in question is even older. I/O queue depth might be more relevant today than the number of spindles, even if the latter are actually present.
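As an illustration only (not from the docs), plugging the question's 16-core database server into the formula and assuming a single effective spindle gives (16 * 2) + 1 = 33, which with HikariCP would be set roughly like this; the JDBC URL and credentials are placeholders:

    import com.zaxxer.hikari.HikariConfig;
    import com.zaxxer.hikari.HikariDataSource;

    public class PoolSizing {
        public static void main(String[] args) {
            HikariConfig config = new HikariConfig();
            config.setJdbcUrl("jdbc:postgresql://db-host:5432/mydb"); // placeholder
            config.setUsername("app");                                // placeholder
            config.setPassword("secret");                             // placeholder
            // (core_count * 2) + effective_spindle_count, using the DATABASE server's cores:
            // (16 * 2) + 1 = 33
            config.setMaximumPoolSize(33);
            try (HikariDataSource ds = new HikariDataSource(config)) {
                // hand ds to the persistence layer; adjust the size based on measurements
            }
        }
    }

Treat the computed value as a starting point and tune it against real load, as noted above.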
I'm running a somewhat large Kafka cluster, but I'm currently stuck on properly setting max.incremental.fetch.session.cache.slots and need some guidance. The documentation about this is not very clear either: https://cwiki.apache.org/confluence/display/KAFKA/KIP-227%3A+Introduce+Incremental+FetchRequests+to+Increase+Partition+Scalability
By scale I mean: 3 nodes, ~400 topics, 4,500 partitions, 300 consumer groups, 500 consumers.
For a while now I've been seeing FETCH_SESSION_ID_NOT_FOUND errors in the logs and wanted to address them.
So I tried increasing the value in the config and restarted all brokers, and the pool quickly filled up again to its maximum capacity. This reduced the occurrence of the errors, but they are not completely gone. At first I set the value to 2,000, and it was instantly full. I then raised it in several steps up to 100,000, and the pool was full again in roughly 40 minutes.
From the documentation I was expecting the pool to cap out after 2 minutes, when min.incremental.fetch.session.eviction.ms kicks in, but this does not seem to be the case.
What would be the right metrics to gauge the appropriate size of the cache? Are the errors I'm still seeing something I can fix on the brokers, or do I need to hunt down misconfigured consumers? If so, what should I look out for?
Such a high usage of Fetch Sessions is most likely caused by a bad client.
Sarama, a Golang client, had an issue that caused a new Fetch Session to be allocated on every Fetch request between versions 1.26.0 and 1.26.2, see https://github.com/Shopify/sarama/pull/1644.
I'd recommend checking whether you have users running this client and ensuring they update to the latest release.
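If you do need to hunt down misconfigured consumers, one possible starting point is to enumerate the client IDs and hosts behind your consumer groups with the Java AdminClient and cross-check them against the applications you expect; this is just a sketch, and the bootstrap address is a placeholder:

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.ConsumerGroupDescription;
    import org.apache.kafka.clients.admin.ConsumerGroupListing;
    import org.apache.kafka.clients.admin.MemberDescription;

    public class ListConsumerClients {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // placeholder
            try (AdminClient admin = AdminClient.create(props)) {
                for (ConsumerGroupListing g : admin.listConsumerGroups().all().get()) {
                    ConsumerGroupDescription d = admin
                            .describeConsumerGroups(Collections.singletonList(g.groupId()))
                            .all().get().get(g.groupId());
                    for (MemberDescription m : d.members()) {
                        // client.id and host hint at which application/library each member is
                        System.out.printf("group=%s clientId=%s host=%s%n",
                                g.groupId(), m.clientId(), m.host());
                    }
                }
            }
        }
    }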
We are running a 5-node Flink cluster (1.6.3) on Kubernetes, with a 5-partition Kafka topic as the source.
5 jobs are reading from that topic (each with a different consumer group), each with parallelism = 5.
Each task manager is running with 10 GB of RAM, and the task manager heap size is limited to 2 GB.
The ingestion load is rather small (100-200 messages per second) and the average message size is ~4-8 KB.
All jobs run fine for a few hours; then we suddenly see one or more jobs failing with:
java.lang.OutOfMemoryError: Direct buffer memory
at java.nio.Bits.reserveMemory(Bits.java:666)
at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:241)
at sun.nio.ch.IOUtil.read(IOUtil.java:195)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at org.apache.kafka.common.network.PlaintextTransportLayer.read(PlaintextTransportLayer.java:110)
at org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:97)
at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)
at org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:169)
at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:150)
at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:355)
at org.apache.kafka.common.network.Selector.poll(Selector.java:303)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:349)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:226)
at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1047)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:995)
at org.apache.flink.streaming.connectors.kafka.internal.KafkaConsumerThread.run(KafkaConsumerThread.java:257)
Flink restarts the job, but it keeps failing with the same exception.
We've tried reducing the records per poll as suggested here:
Kafka Consumers throwing java.lang.OutOfMemoryError: Direct buffer memory
We also tried increasing the Kafka heap size as suggested here:
Flink + Kafka, java.lang.OutOfMemoryError when parallelism > 1, although I fail to understand how a failure to allocate memory in the Flink process has anything to do with the JVM memory of the Kafka broker process, and I don't see anything indicating OOM in the broker logs.
What might be the cause of that failure? What else should we check?
Thanks!
One thing that you may have underestimated is that with a parallelism of 5 there are 5+4+3+2+1 = 15 pairs of combinations. If we compare this to the linked thread, there were likely 3+2+1 = 6 combinations.
In the linked thread the problem was resolved by setting the max poll records to 250, hence my first thought would be to scale that down proportionally and set it to about 100 here (or even to 10) and see if that resolves the problem.
(I am not sure if the requirements are shaped this way, but the only noticeable difference is the parallelism going from 3 to 5, so that seems like a good point to compensate for.)
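For illustration, a minimal sketch of lowering max.poll.records on the Flink Kafka source; the topic name, bootstrap servers, group id and the FlinkKafkaConsumer011 connector are assumptions based on Flink 1.6 with Kafka:

    import java.util.Properties;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;

    public class JobWithSmallerPolls {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            Properties props = new Properties();
            props.setProperty("bootstrap.servers", "kafka:9092");   // placeholder
            props.setProperty("group.id", "my-consumer-group");     // placeholder
            // Lower max.poll.records, as suggested in the linked thread.
            props.setProperty("max.poll.records", "100");

            env.addSource(new FlinkKafkaConsumer011<>("my-topic", new SimpleStringSchema(), props))
               .print();

            env.execute("job-with-smaller-polls");
        }
    }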
We are using Mirth Connect for message transformation from HL7 to text and storing the transformed messages in an Azure SQL database. Our current throughput is 45,000 messages per hour.
The machine configuration is:
8 GB RAM and a 2-core CPU. Memory assigned to Mirth is -Xms = 6122 MB.
We don't have any idea what performance to expect from Mirth with the above configuration. Does anyone have performance benchmarks for Mirth Connect?
I'd recommend looking into the Max Processing Threads option in version 3.4 and above. It's configurable in the Source Settings (Source tab). By default it's set to 1, which means only one message can process through the channel's main processing thread at any given time. This is important for certain interfaces where order of messages is paramount, but obviously it limits throughput.
Note that whatever client is sending your channel messages also needs to be reconfigured to send multiple messages in parallel. For example if you have a single-threaded process that is sending your channel messages via TCP/MLLP one after another in sequence, increasing the max processing threads isn't necessarily going to help because the client is still single-threaded. But, for example, if you stand up 10 clients all sending to your channel simultaneously, then you'll definitely reap the benefits of increasing the max processing threads.
If your source connector is a polling type, like a File Reader, you can still benefit from this by turning the Source Queue on and increasing the Max Processing Threads. When the source queue is enabled and you have multiple processing threads, multiple queue consumers are started and all read and process from the source queue at the same time.
Another thing to look at is destination queuing. In the Advanced (wrench icon) queue settings, there is a similar option to increase the number of Destination Queue Threads. By default when you have destination queuing enabled, there's just a single queue thread that processes messages in a FIFO sequence. Again, good for message order but hampers throughput.
If you do need messages to be ordered and want to maximize parallel throughput (AKA have your cake and eat it too), you can use the Thread Assignment Variable in conjunction with multiple destination Queue Threads. This allows you to preserve order among messages with the same unique identifier, while messages pertaining to different identifiers can process simultaneously. A common use-case is to use the patient MRN for this, so that all messages for a given patient are guaranteed to process in the order they were received, but messages longitudinally across different patients can process simultaneously.
We are using an AWS EC2 c4.4xlarge instance to test the performance limits of a bare-bones proof of concept. We got about 50 msgs/sec without obvious bottlenecks in CPU, memory, network, disk I/O, DB I/O, etc. We want to push the limits higher. Please share your observations if you have any.
We run the same process. Mirth -> Azure SQL Database. We're running through performance testing right now and have been stuck at 12 - 15 messages/second (43000 - 54000 per hour).
We've run tests on each channel and found this:
1 channel (source: File Reader -> destination: Azure SQL DB): about 36k per hour
2 channels (source: File Reader -> destination: Azure SQL DB): about 59k per hour
3 channels (source: File Reader -> destination: Azure SQL DB): about 80k per hour
We've added multi-threading (2, 4, 8) to both the source and destination on 1 channel with no performance increase. Mirth is running on 8 GB of memory and 2 cores with the heap size set to 2048 MB.
We are now going to run a few tests with Mirth on "hardware" similar to a c4.4xlarge, which in Azure terms is 16 cores and 32 GB of memory. There is 200 GB of SSD available as well.
Our goal is 100k messages per hour per channel.
Sorry for the dumb question, but why exactly that number? At first I thought it only had to be an odd number, but actually the maximum is 7 voting nodes and 12 nodes overall. If you wish to run more than this, you must use the deprecated master/slave configuration.
So how to explain this number?
All members of a replica set maintain knowledge of the current state of each of the other members. This is the rationale for limiting the total number of nodes to twelve - more than that would introduce too much overhead in heartbeats between each pair of nodes.
The maximum of seven voting members is to avoid slowing down elections: since you need a consensus election to select a new primary, limiting the number of nodes that have to coordinate among themselves keeps things faster.
For anyone still wondering about this, the latest dev branch of MongoDB (2.7.8 as of writing this answer) has increased the overall limit to 50 as part of SERVER-15060. This change, assuming it is not reverted due to irreconcilable problems during testing, will be available in the 2.8 production release. Note: the max number of voting nodes remains at 7 but this change should prevent the need to use the legacy master/slave configuration for those niche cases that require more than 12 nodes.