In the Tomcat thread pool executor, we specify the thread size, right? Is that thread count included in Tomcat's thread count? - threadpool

As far as I know, whenever the Tomcat server is started, it starts around 250 threads.
Assume we specify 25 as the thread size in the thread pool executor. Is the total number of threads in Tomcat now 250 + 25, or just 250, with the 25 executor threads included in that count?
Regards,
Sathish

The executor will create the number of threads specified in the minSpareThreads parameter. Connectors may create additional threads of their own if they are not assigned to use the executor. In addition to this, Tomcat will create threads for its internal purposes.
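For illustration, a minimal server.xml sketch of that split (names, ports, and sizes are just examples):

<Executor name="tomcatThreadPool" namePrefix="catalina-exec-"
          maxThreads="150" minSpareThreads="25"/>
<!-- this connector shares the executor above instead of creating its own pool -->
<Connector port="8080" protocol="HTTP/1.1" executor="tomcatThreadPool"/>
<!-- a connector without an executor attribute maintains a separate pool of its own -->
<Connector port="8009" protocol="AJP/1.3"/>

So in the question's terms: the 25 executor threads come on top of whatever threads the connectors and Tomcat internals create, unless the connectors are pointed at the executor.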

Related

Spark Streaming - waiting for a dead executor

I have a Spark Streaming application running inside a k8s cluster (using spark-operator).
I have 1 executor, reading batches every 5s from a Kinesis stream.
The Kinesis stream has 12 shards, and the executor creates 1 receiver per shard. I gave it 24 cores, so it should be more than enough to handle it.
For some unknown reason, sometimes the executor crashes. I suspect it is due to memory going over the k8s pod memory limit, which would cause k8s to kill the pod. But I have not been able to prove this theory yet.
After the executor crashes, a new one is created.
However, the "work" stops. The new executor is not doing anything.
I investigated a bit:
Looking at the logs of the pod - I saw that it did execute a few tasks successfully after it was created, and then it stopped because it did not get any more tasks.
Looking in Spark Web UI - I see that there is 1 “running batch” that is not finishing.
I found some docs saying there can only be one active batch at a time. So this is why the work stopped: it is waiting for this batch to finish.
Digging a bit deeper in the UI, I found this page that shows details about the tasks.
So executor 2 finished doing all the tasks it was assigned.
There are 12 tasks that were assigned to executor 1 which are still waiting to finish, but executor 1 is dead.
Why does this happen? Why does Spark not realize that executor 1 is dead and will never finish its assigned tasks? I would expect Spark to reassign these tasks to executor 2.

Distributed JMeter max number of servers

I use distributed JMeter for load tests in Kubernetes. To maximize the number of concurrent threads, I use several JMeter server instances.
My test plan has 2,000 users, so I have 10,000 concurrent users with 5 JMeter servers. Each user sends 1 request per second to Kafka. This runs without any problems.
But if I increase the number of server instances to 10, JMeter gets a lot of errors when sending requests and is not able to send the required rate of requests per second.
Is there a way to use more than 5 server instances in JMeter (my cluster has 24 vCPUs and 192 GB RAM)?
The theoretical maximum number of slaves is very high: you can have as many as 2,147,483,647 slaves, which is a little bit more than 5.
So my expectation is that the problem is somewhere else, e.g. there is a maximum number of connections per IP configured (see the excerpt below the checklist), or the broker is running out of resources.
We cannot give meaningful advice unless we have more details. In the meantime you can check:
jmeter.log files for master and slaves
response data
kafka logs
resource consumption on the Kafka broker(s) side; this can be done using e.g. the JMeter PerfMon Plugin
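For instance, if the per-IP limit theory holds, the relevant knob would be max.connections.per.ip in the Kafka broker's server.properties (illustrative value; the default is effectively unlimited):

# server.properties (excerpt, illustrative)
# a lowered value here would explain errors that appear only once
# 10 JMeter servers open connections instead of 5
max.connections.per.ip=1000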

Consuming # subscription via MQTT causes huge queue on lost connection

I am using Artemis 2.14 and Java 14.0.2 on two Ubuntu 18.04 VMs with 4 cores and 16 GB RAM each. My producers send approximately 2,000 messages per second to approximately 5,500 different topics.
When I connect via the MQTT.FX client with certificate-based authorization and subscribe to #, the MQTT.FX client dies after some time, and in the web console I see a queue under # with my client ID that won't be cleared by Artemis. It seems that this queue grows until RAM is 100% used. After some time my Artemis broker restarts itself.
Is this behaviour of Artemis normal? How can I tell Artemis to clean up "zombie" queues after some time?
I have already tried to use these configuration parameters in different ways, but nothing works:
confirmationWindowSize=0
clientFailureCheckPeriod=30000
consumerWindowSize=0
Auto-created queues are automatically removed by default when:
consumer-count is 0
message-count is 0
This is done so that no messages are inadvertently deleted.
However, you can change the corresponding auto-delete-queues-message-count address-setting in broker.xml to -1 to skip the message-count check. You can also adjust auto-delete-queues-delay to configure a delay if needed.
It's worth noting that if you create a subscription like # (which is fairly dangerous) you need to be prepared to consume the messages as quickly as they are produced to avoid accumulation of messages in the queue. If accumulation is unavoidable then you should configure the max-size-bytes and address-full-policy according to your needs so the broker doesn't get overwhelmed. See the documentation for more details on that.
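Putting those settings together, an illustrative address-setting block in broker.xml could look like this (the values are examples to adapt, not recommendations):

<address-settings>
   <address-setting match="#">
      <!-- remove auto-created queues even if they still hold messages -->
      <auto-delete-queues-message-count>-1</auto-delete-queues-message-count>
      <!-- wait 60 seconds after the last consumer disconnects before deleting -->
      <auto-delete-queues-delay>60000</auto-delete-queues-delay>
      <!-- cap the address at 100 MiB and page further messages to disk -->
      <max-size-bytes>104857600</max-size-bytes>
      <address-full-policy>PAGE</address-full-policy>
   </address-setting>
</address-settings>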

Kafka Streams - Relation between "Stream threads" vs "Tasks" running on 1 C4.XLarge Machine

I have a Kafka Streams topology which has 5 processors and 1 source. The source topic for this topology has 200 partitions. My understanding is that 200 tasks get created to match the number of partitions of the input topic.
This Kafka Streams app is running on a C4.XLarge, and these 200 tasks run on a single stream thread, which means this stream thread should be using up all the CPU cores (8) and memory.
I know Kafka Streams parallelism/scalability is controlled by the number of stream threads. I can increase num.stream.threads to 10, but how would that improve performance if all of the threads run on a single EC2 instance? How would it differ from running all tasks on a single stream thread on a single EC2 instance?
If you have an 8-core machine, you might want to run 8 StreamThreads.
"This Kafka Streams app is running on a C4.XLarge, and these 200 tasks run on a single stream thread, which means this stream thread should be using up all the CPU cores (8) and memory."
This does not sound correct. A single thread cannot utilize multiple cores. While configuring a single StreamThread implies that a few other background threads are started as well (consumer heartbeat thread, producer sender thread), I would assume that you cannot fully utilize all 8 cores with this setting.
If 8 StreamThreads do not fully utilize your 8 cores, you might consider configuring 16 threads. Note, however, that all threads will share the same network, and thus, if the network is actually the limiting factor, running more threads won't give you higher throughput (or higher CPU utilization). In this case, you need to scale out using multiple EC2 instances.
Given that you have 200 tasks, you could conceptually run up to 200 StreamThreads, but you probably won't need that many.
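As a rough sketch of what that looks like in code (the application id, bootstrap server, and topic names are placeholders, and buildTopology stands in for the real 5-processor topology):

import java.util.Properties;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;

public class StreamsThreading {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");  // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");  // placeholder
        // one thread per core on a C4.XLarge; the 200 tasks are then spread
        // across the 8 threads, roughly 25 tasks each
        props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 8);

        KafkaStreams streams = new KafkaStreams(buildTopology(), props);
        streams.start();
    }

    private static Topology buildTopology() {
        // placeholder for the real topology (200-partition source, 5 processors)
        Topology topology = new Topology();
        topology.addSource("source", "input-topic");
        topology.addSink("sink", "output-topic", "source");
        return topology;
    }
}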

CeleryExecutor: Does the airflow metric "executor.queued_tasks" report the number of tasks in the celery broker?

Using its StatsD plugin, Airflow can report the metric executor.queued_tasks, as well as some others.
I am using the CeleryExecutor and need to know how many tasks are waiting in the Celery broker, so I know when new workers should be spawned. Indeed, I have set up my workers so they cannot take many tasks concurrently. Is this metric what I need?
Nope. If you want to know how many task instances (TIs) are waiting in the broker, you'll have to connect to it.
Task instances that are waiting to get picked up in the Celery broker are queued according to the Airflow DB, but running according to the CeleryExecutor. This is because the CeleryExecutor considers any task instance that was successfully sent to the broker to be running (unlike the DB, which waits for a worker to pick it up before marking it as running).
The metric executor.queued_tasks reports the number of tasks queued according to the executor, not the DB.
The number of queued task instances according to the DB is not exactly what you need either, because it reports the number of task instances waiting in the broker plus the number of task instances queued to the executor. But when would TIs be stuck in the executor's queue, you ask? When the parallelism setting of Airflow prevents the executor from sending them to the broker.
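To illustrate the "connect to it" part: assuming the broker is Redis and tasks go to Celery's default queue (both assumptions; check broker_url and your queue names), the backlog is just the length of a Redis list, since Celery's Redis transport keeps pending tasks in a list named after the queue. A sketch using the Jedis client:

import redis.clients.jedis.Jedis;

public class CeleryBacklog {
    public static void main(String[] args) {
        // assumes a Redis broker on localhost and the default "celery" queue
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            long waiting = jedis.llen("celery"); // TIs sent to the broker but not yet picked up
            System.out.println("Task instances waiting in the broker: " + waiting);
        }
    }
}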