WebLogic work manager HARVESTER_WM max threads constraint in 12c - weblogic12c

We keep getting a work manager thread limit error in the WebLogic server log. Although our app is working fine, I would like to know what the issue is and what impact it could have on our app in the future. Since this is just a POC, we have been ignoring it so far.
java.lang.RuntimeException: MaxThreads constraint 'HARVESTER_WM' queue for workManager 'HARVESTER_WM' exceeded maximum capacity of '8000' elements. Max threads constraint count is set to 1
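
The error above describes a work manager limited to one thread whose queue of pending requests has hit its capacity. One rough way to picture the effect is a plain java.util.concurrent pool with a single thread and a bounded queue. This is only an analogy built on the JDK, not WebLogic's implementation; the class name, the small queue size, and the task count are assumptions made up for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a JDK thread pool standing in for a work manager
// with a max threads constraint and a bounded request queue.
final class BoundedWorkQueueSketch {

    // Submits `tasks` blocked tasks to a pool with `maxThreads` threads and a
    // queue of `queueCapacity` slots, and returns how many were rejected.
    static int countRejected(int maxThreads, int queueCapacity, int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                maxThreads, maxThreads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity)); // default handler rejects when full
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        try {
            for (int i = 0; i < tasks; i++) {
                try {
                    // Each task waits on the latch, simulating slow work.
                    pool.execute(() -> {
                        try {
                            release.await();
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    rejected++; // threads busy and queue full
                }
            }
        } finally {
            release.countDown(); // let the accepted tasks finish
            pool.shutdown();
            try {
                pool.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        // 1 thread, 3 queue slots, 10 slow tasks: 1 runs, 3 wait, the rest are rejected.
        System.out.println("rejected = " + countRejected(1, 3, 10));
    }
}
```

In this model, work submitted after the queue fills is simply refused, which is the kind of impact to look for if the same error appears under real load.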

Related

Celery Prefetch count ignored when using greenlet based pools with SQS

We are experiencing an issue with Celery+SQS, using the gevent or the eventlet pools.
Even if we have a large amount of messages in the queue, and the concurrency set to 100, prefetch multiplier set to 5 (so the total prefetch_count reported is 500) celery will receive 10 messages from the queue (the max allowed by SQS) and wait for all of them to finish before attempting to receive again (from AWS we see either 10 or 0 messages in flight).
We do not see the same behavior using the prefork pool.
Our tasks are I/O bound, so using prefork would be inefficient. Is there any way to increase the number of messages pulled from the queue to keep all the greenlets busy?

Error in dataflow plugins.adfprod.AutoResolveIntegrationRuntime.45

I am getting the error below while running my data flow. The data flow was running fine until yesterday; since today it fails with this error:
Operation on target LoadAccount failed:
[plugins.adfprod.AutoResolveIntegrationRuntime.45 WorkspaceType: CCID:<1a11d7e0-b019-4845-ab29-641100c79f04>] The job has surpassed the max number of seconds it can be in ResourceAcquisition state [1000s], so ending the job.
In many cases, the MAX limitations in Data Factory are only soft restrictions that can easily be lifted via a support ticket.
There is no such thing as a limitless cloud platform.
Refer to this article by MRPAULANDREW

Kafka producer timeout exception

[1] 2022-01-18 21:56:10,280 ERROR [org.apa.cam.pro.err.DefaultErrorHandler] (Camel (camel-1) thread #9 - KafkaProducer[test]) Failed delivery for (MessageId: 95835510BC9E9B2-0000000000134315 on ExchangeId: 95835510BC9E9B2-0000000000134315). Exhausted after delivery attempt: 1 caught: org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for test-0:121924 ms has passed since batch creation
[1]
[1] Message History (complete message history is disabled)
[1] ---------------------------------------------------------------------------------------------------------------------------------------
[1] RouteId ProcessorId Processor Elapsed (ms)
[1] [route1 ] [route1 ] [from[netty://udp://0.0.0.0:8080?receiveBufferSize=65536&sync=false] ] [ 125320]
[1] ...
[1] [route1 ] [to1 ] [kafka:test?brokers=10.99.155.100:9092&producerBatchSize=0 ] [ 0]
[1]
[1] Stacktrace
[1] ---------------------------------------------------------------------------------------------------------------------------------------
[1] : org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for test-0:121924 ms has passed since batch creation
Here's the flow for my project:
1) External Service ---> Netty
2) Netty ---> Kafka(consumer)
3) Kafka(producer) ---> processing events
Steps 1 and 2 run in one Kubernetes pod, and step 3 runs in a separate pod.
Initially I encountered a TimeoutException like this:
org.apache.kafka.common.errors.TimeoutException: Expiring 20 record(s) for test-0:121924 ms has passed since batch creation
I searched online and found a couple of potential solutions:
Kafka Producer error Expiring 10 record(s) for TOPIC:XXXXXX: 6686 ms has passed since batch creation plus linger time
Based on the suggestions, I have:
increased the timeout to double the default value
set the batch size to 0, so that events are not sent in batches and memory usage stays low.
Unfortunately, I still encounter the error because memory is used up.
Does anyone know how to solve it?
Thanks!
There are several things to take into account here.
You have not shown what your throughput is. You need to take that value into account, along with whether your broker at 10.99.155.100:9092 can handle such a load.
Did you check 10.99.155.100 during the time of the transfer? The fact that Kafka can potentially process hundreds of thousands of messages per second doesn't mean that you can do it on any hardware.
That said, the timeout is the first thing that comes to mind, but in your case you allow 2 minutes and still time out. To me, this sounds more like a problem with your broker than with your producer.
Basically, you are filling your mouth faster than you can swallow: by the time you push a message, the broker cannot acknowledge it in time (in this case, within 2 minutes).
Things you can do here:
Check the broker performance for the given load.
Change your delivery.timeout.ms to an acceptable value; I guess you have SLAs to adhere to.
Increase your retry backoff timer (retry.backoff.ms).
Do not set the batch size to 0; this attempts a live push to the broker, which in this case does not seem feasible for the load.
Make sure your max.block.ms is set correctly.
Change to bigger batches (even if this increases latency), but not too big; you need to sit down, check how many records you are pushing, and size the batches accordingly.
Now, some rules:
delivery.timeout.ms must be at least the sum of request.timeout.ms and linger.ms.
All of the above are affected by batch.size.
If you don't have many rows but those rows are huge, control max.request.size.
So, to summarize, the properties to change are the following:
delivery.timeout.ms, request.timeout.ms, linger.ms, max.request.size
Assuming the hardware is adequate and that you are not sending more than it can handle, those should do the trick.
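
As a rough illustration of the timeout rule above, the sketch below puts the producer property names from this answer into a plain java.util.Properties object and checks that delivery.timeout.ms is at least request.timeout.ms plus linger.ms. The class and method names are hypothetical, and every value is an illustrative assumption rather than a recommendation; the right numbers depend on your broker and load. It deliberately uses no Kafka classes, so it runs without the client library.

```java
import java.util.Properties;

// Hypothetical helper: sanity-checks producer timeout settings before use.
final class ProducerTimeoutCheck {

    // Example settings; all values are assumptions chosen for illustration only.
    static Properties exampleConfig() {
        Properties p = new Properties();
        p.setProperty("bootstrap.servers", "10.99.155.100:9092"); // broker from the question
        p.setProperty("linger.ms", "20");
        p.setProperty("request.timeout.ms", "30000");
        p.setProperty("delivery.timeout.ms", "120000");
        p.setProperty("batch.size", "16384");       // non-zero, so records are batched
        p.setProperty("max.request.size", "1048576");
        p.setProperty("retry.backoff.ms", "100");
        p.setProperty("max.block.ms", "60000");
        return p;
    }

    // True when delivery.timeout.ms >= request.timeout.ms + linger.ms.
    static boolean timeoutsConsistent(Properties p) {
        long delivery = Long.parseLong(p.getProperty("delivery.timeout.ms"));
        long request = Long.parseLong(p.getProperty("request.timeout.ms"));
        long linger = Long.parseLong(p.getProperty("linger.ms"));
        return delivery >= request + linger;
    }

    public static void main(String[] args) {
        System.out.println("timeouts consistent: " + timeoutsConsistent(exampleConfig()));
    }
}
```

The same Properties object could then be handed to a producer once a serializer configuration is added; that part is omitted here.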

Multiple concurrent connections with Vertx

I'm trying to build a web application that should be able to handle at least 15000 rps. Some of the optimizations I have made are increasing the worker pool size to 20 and setting the accept backlog to 25000. Since I have set my worker pool size to 20, will this help with the blocking piece of code?
A worker pool size of 20 seems to be the default.
I believe the important question in your case is how long you expect each request to run. On my side, I expect to have thousands of short-lived requests, each with a payload size of about 5-10KB. All of these will be blocking, because of a blocking database driver I use at the moment. I have increased the default worker pool size to 40 and I have explicitly set the number of verticle instances to deploy using the following formula:
final int instances = Math.min(Math.max(Runtime.getRuntime().availableProcessors() / 2, 1), 2);
A test run of 500 simultaneous clients running for 60 seconds, on a vert.x server doing nothing but blocking calls, produced an average of 6 failed requests out of 11089. My test payload in this case was ~28KB.
Of course, from experience I know that running my software in production would often produce results that I have not anticipated. Thus, the important thing in my case is to have good atomicity rules in place, so that I don't get half-baked or corrupted data in the database.
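
To see what the instance-count formula in this answer evaluates to, here is a small sketch that takes the core count as a parameter instead of reading it from the runtime. The class and method names are made up for illustration; the expression itself is copied from the answer.

```java
// Hypothetical wrapper around the formula so it can be tried with different core counts.
final class VerticleInstanceCount {

    // Half the available processors, but never fewer than 1 and never more than 2.
    static int instancesFor(int availableProcessors) {
        return Math.min(Math.max(availableProcessors / 2, 1), 2);
    }

    public static void main(String[] args) {
        for (int cores : new int[] {1, 2, 4, 8, 16}) {
            System.out.println(cores + " cores -> " + instancesFor(cores) + " instance(s)");
        }
    }
}
```

Because of the outer Math.min, the result is 1 on machines with up to three cores and 2 on anything larger, so adding cores beyond four does not increase the instance count.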

Akka Actor Messaging Delay

I'm experiencing issues scaling my app with multiple requests.
Each request sends an ask to an actor, which then spawns other actors. This is fine, however, under load(5+ asks at once), the ask takes a massive amount of time to deliver the message to the target actor. The original design was to bulkhead requests evenly, but this is causing a bottleneck. Example:
In this picture, the ask is sent right after the query plan resolver. However, there is a multi-second gap when the Actor receives this message. This is only experienced under load(5+ requests/sec). I first thought this was a starvation issue.
Design:
Each planner-executor is a separate instance for each request. It spawns a new 'Request Acceptor' actor each time (it logs 'requesting score' when it receives a message).
I gave the actorsystem a custom global executor (a big one). I noticed the threads were not utilized beyond the core threadpool size even during this massive delay.
I made sure all executioncontexts in the child actors used the correct executioncontext.
Made sure all blocking calls inside actors used a future.
I gave the parent actor (and all children) a custom dispatcher with core size 50 and max size 100. It did not request more (it stayed at 50) even during these delays.
Finally, I tried creating a totally new Actorsystem for each request (inside the planner-executor). This also had no noticeable effect!
I'm a bit stumped by this. From these tests it does not look like a thread starvation issue. Back at square one, I have no idea why the message takes longer and longer to deliver the more concurrent requests I make. The Zipkin trace before reaching this point does not degrade with more requests until it reaches the ask here. Before then, the server is able to handle multiple steps, e.g. verify the request, talk to the db, and then finally go inside the planner-executor. So I doubt the application itself is running out of CPU time.
We had a very similar issue with Akka. We observed a huge delay in the ask pattern delivering messages to the target actor under peak load.
Most of these issues are related to heap memory consumption, not to the use of dispatchers.
We finally fixed them with the configuration changes below.
1) Make sure you stop entities/actors which are no longer required. If it's a persistent actor then you can always bring it back when you need it.
Refer : https://doc.akka.io/docs/akka/current/cluster-sharding.html#passivation
2) If you are using cluster sharding then check the akka.cluster.sharding.state-store-mode. By changing this to persistence we gained 50% more TPS.
3) Minimize your log entries (set it to info level).
4) Tune your logging to flush messages to your logging system frequently. Adjust the batch size, batch count, and interval accordingly so that memory is freed. In our case a large amount of heap memory was used to buffer log messages before sending them in bulk. If the interval is too long, the heap may fill up, which hurts performance (more GC activity required).
5) Run blocking operations on a separate dispatcher.
6) Use custom serializers (protobuf) and avoid JavaSerializer.
7) Add the JAVA_OPTS below when launching your jar
export JAVA_OPTS="$JAVA_OPTS -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:MaxRAMFraction=2 -Djava.security.egd=file:/dev/./urandom"
The main thing is -XX:MaxRAMFraction=2, which lets the heap use up to half of the available memory. The default is 4, which means your application will use only one fourth of the available memory, and that might not be sufficient.
Refer : https://blog.csanchez.org/2017/05/31/running-a-jvm-in-a-container-without-getting-killed/
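
The effect of MaxRAMFraction can be worked out with simple division: the maximum heap is roughly the available memory divided by the fraction. The sketch below shows that arithmetic only. The class name and the 4 GiB container size are assumptions for illustration, and the real JVM applies further ergonomics that this ignores.

```java
// Hypothetical arithmetic sketch: approximate max heap for a given MaxRAMFraction.
final class HeapFractionSketch {

    // Approximate heap ceiling: available memory divided by the fraction.
    static long maxHeapBytes(long availableBytes, int maxRamFraction) {
        return availableBytes / maxRamFraction;
    }

    public static void main(String[] args) {
        long containerMemory = 4L * 1024 * 1024 * 1024; // assumed 4 GiB container limit
        for (int fraction : new int[] {4, 2}) {
            long heapMiB = maxHeapBytes(containerMemory, fraction) / (1024 * 1024);
            System.out.println("MaxRAMFraction=" + fraction + " -> about " + heapMiB + " MiB heap");
        }
    }
}
```

To compare against a running JVM, Runtime.getRuntime().maxMemory() reports the heap ceiling the JVM actually chose.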
Regards,
Vinoth