How do I gracefully stop a tsung process that is running under stress load (assume 100k concurrent XMPP users)?
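A hedged starting point, assuming the standard tsung command-line wrapper is on the path (whether the shutdown is actually clean with 100k live XMPP sessions would still need to be verified on your setup):

tsung stop

This asks the running tsung controller to stop the load test rather than killing the Erlang processes outright.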
I am using an Akka cluster (server) and exchanging a HeartBeat message with the client over Akka TCP every 5 seconds.
The heartbeat works fine as long as I am not using the scheduler,
but when I start 4-5 schedulers, the server stops receiving the heartbeat messages from the client's TCP connection. After the scheduled work finishes, I get 4-5 heartbeat messages at the same time.
The Akka scheduler is blocking the actor's other processing (buffer reading etc.).
I have already tried the following, but I am still facing the same issue:
different dispatchers
created a new actor and moved the scheduler call into that separate actor
using an 8-core machine
tried both fork-join-executor and thread-pool-executor
already tried changing Tcp-SO-ReceiveBufferSize and Tcp-SO-SendBufferSize to 1024 or 2048, but it didn't work
already tried Tcp-SO-TcpNoDelay
Kindly help.
I have a remote process that sends thousands of requests to my humble Spark Standalone cluster:
3 worker nodes with 4 cores and 8 GB
an identical master node where the driver runs
The cluster hosts a simple data-processing service developed in Scala. The requests are sent via a curl command, passing some parameters to the .jar, through the Apache Livy REST interface like this:
curl -s -X POST -H "Content-Type: application/json" REMOTE_IP:8998/batches -d '{"file":"file://PATH_TO_JAR/target/MY_JAR-1.0.0-jar-with-dependencies.jar","className":"project.update.MY_JAR","args":["C1:1","C2:2","C3:3"]}'
This triggers a Spark job each time. The resource scheduling for the cluster is dynamic, so it can serve at most 3 requests at a time; when a worker goes idle, another queued request is served.
At some point the requests exhaust the master node's memory, even though they are in a WAITING state (because Spark registers the jobs to be served); the master node hangs and the workers lose their connection to it.
Is there a way I can queue these requests so that Spark does not hold RAM for them, and then, when a worker is free, process another request from that queue?
This question is similar; it says that yarn.scheduler.capacity.max-applications only allows N RUNNING applications, but I can't figure out whether that is the solution I need. Apache Livy doesn't have this functionality, as far as I'm aware.
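One client-side workaround (not a built-in Livy or Spark Standalone feature) is to gate the submissions yourself: poll Livy's GET /batches endpoint and only POST a new batch when fewer than 3 batches are still alive. A minimal Python sketch, assuming the standard Livy REST endpoints; LIVY_URL, MAX_ACTIVE, the terminal-state list and the payload are placeholders/assumptions:

import time
import requests

LIVY_URL = "http://REMOTE_IP:8998"    # same host/port as in the curl example
MAX_ACTIVE = 3                        # the cluster serves at most 3 requests at a time
FINISHED_STATES = {"success", "error", "dead", "killed"}  # assumed terminal batch states

def active_batches():
    # Count batches Livy still tracks that are not in a terminal state.
    resp = requests.get(LIVY_URL + "/batches")
    resp.raise_for_status()
    sessions = resp.json().get("sessions", [])
    return sum(1 for s in sessions if s.get("state") not in FINISHED_STATES)

def submit_when_free(payload, poll_seconds=10):
    # Block until a slot is free, then submit one batch to Livy.
    while active_batches() >= MAX_ACTIVE:
        time.sleep(poll_seconds)
    return requests.post(LIVY_URL + "/batches", json=payload).json()

payload = {
    "file": "file://PATH_TO_JAR/target/MY_JAR-1.0.0-jar-with-dependencies.jar",
    "className": "project.update.MY_JAR",
    "args": ["C1:1", "C2:2", "C3:3"],
}
print(submit_when_free(payload))

Because the queue now lives outside Spark, the master never has to register more than MAX_ACTIVE applications at once, so the WAITING jobs no longer consume its memory.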
I'm using beanstalkd to manage queues. I just realised that if there are jobs in a queue and the beanstalkd process is restarted or crashes, then the jobs are lost forever (or so I think).
Is there a way to preserve the jobs in the queue on beanstalkd failure or restart? If not, what's the best practice to ensure jobs are never lost?
Beanstalkd can be started with the -b (binary log) option, which makes it write all jobs to a binlog. If the power goes out, you can restart beanstalkd with the same option and it will recover the contents of the log.
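For example (the listen address, port and binlog directory below are placeholders; the directory must exist and be writable by the daemon):

beanstalkd -l 127.0.0.1 -p 11300 -b /var/lib/beanstalkd

On the next start with the same -b directory, beanstalkd replays the write-ahead log and the jobs reappear in their tubes.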
We have partitioned a large number of our jobs to improve the overall performance of our application. We are now investigating running several of these partitioned jobs in parallel (kicked off by an external scheduler). The jobs are all configured to use the same fixed reply queue. As a test, I created a batch job with a parallel flow in which the partitioned jobs are executed in parallel. On a single server (local testing) it works fine. When I try it on a multi-server cluster, I see the remote steps complete, but the parent step never finishes. I see the messages in the reply queue, but they are never read.
Is this an expected problem with our approach, or can you suggest how we can try to resolve it?
I'm running multiple Celery worker processes on an AWS c3.xlarge (a 4-core machine). There is a "batch" worker process with its --concurrency parameter set to 2, and a "priority" process with its --concurrency parameter set to 1. Both worker processes draw from the same priority queue. I am using Mongo as my broker. When I submit multiple jobs to the priority queue they are processed serially, one after the other, even though multiple workers are available. All items are processed by the "priority" process, but if I stop the "priority" process, the "batch" process will process everything (still serially). What could I have configured incorrectly that prevents Celery from processing jobs asynchronously?
EDIT: It turned out that the synchronous bottleneck is in the server submitting the jobs rather than in Celery.
By default the worker prefetches 4 * concurrency tasks to execute, which means your first running worker prefetches 4 tasks; so if you queue 4 or fewer tasks, they will all be processed by that worker alone, and there won't be any other messages left in the queue for the second worker to consume.
You should set CELERYD_PREFETCH_MULTIPLIER to a number that works best for you. I had this problem before and set this option to 1; now all my tasks are consumed fairly across the workers.
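For example, in the Celery configuration module (celeryconfig.py and the broker URL are illustrative; recent Celery versions spell the same setting worker_prefetch_multiplier):

# celeryconfig.py -- load it with app.config_from_object("celeryconfig")
BROKER_URL = "mongodb://localhost:27017/celery"  # placeholder Mongo broker URL

# Reserve only the task currently being executed instead of the default
# 4 * concurrency, so queued tasks stay visible to the other worker process.
CELERYD_PREFETCH_MULTIPLIER = 1

With a multiplier of 1 each worker reserves just one task at a time, at the cost of slightly more round-trips to the broker.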