Last night I had a problem with the connection between the Gearman server and the job server; the servers are on separate machines.
There was a valid job in the Gearman queue, but it wasn't executing.
The job server uses Supervisor to run its processes; all processes were in 'RUNNING' status and were waiting for a job from Gearman.
We are using Gearman v1.1.8 on CentOS 6.6.
The problems started after this message appeared in the log:
ERROR 2016-08-03 20:36:11.000000 [ main ] Timeout occured when calling bind() for 0.0.0.0:4730 -> libgearman-server/gearmand.cc:679
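A bind() timeout on 0.0.0.0:4730 usually means something else is still holding gearmand's port, for example a previous gearmand instance that did not exit cleanly. A quick check (these commands are a suggestion, not from the original report):

```shell
# Is anything still listening on gearmand's default port 4730?
ss -ltn | grep ':4730' || echo "port 4730 is free"
# On older CentOS 6 boxes, netstat may be more familiar:
#   netstat -ltnp | grep ':4730'
```

If a stale gearmand shows up, stop it before restarting the service.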
I started a dramatiq worker to do some tasks, and after a point it just gets stuck and throws the error below after some time.
[MainThread] [dramatiq.MainProcess] [CRITICAL] Worker with PID 53 exited unexpectedly (code -9). Shutting down...
What could be the potential reason for this? Are system resources a constraint?
This queuing task runs inside a Kubernetes pod.
Check the kernel logs (/var/log/kern.log and /var/log/kern.log.1).
Exit code -9 means the process received SIGKILL, so the worker is most likely being killed by the OOM killer (out of memory).
To resolve this, try increasing the memory limit if you are running in a Docker container or a pod.
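A minimal sketch of what to look for (the exact log wording varies by kernel version, and the process name here is an assumption):

```shell
# A typical OOM-killer line as it appears in kern.log / dmesg
line='Aug  3 20:36:11 host kernel: Out of memory: Killed process 53 (dramatiq-worker)'
# Counting matches in the real logs would be:
#   grep -ci "killed process" /var/log/kern.log /var/log/kern.log.1
echo "$line" | grep -ci "killed process"
# → 1
```

Inside Kubernetes, `kubectl describe pod <pod-name>` also records the kill: look for `Reason: OOMKilled` under the container's Last State.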
I want to kill my Kafka Connect distributed worker, but I am unable (or do not know how) to determine which Linux process is that worker.
When running
ps aux | grep worker
I see a lot of worker processes, but I am unsure which is the Connect worker and which are ordinary, non-Connect workers.
It is true that only one of these processes was started yesterday, and I suspect that is the one, but that would obviously not be a sufficient condition in all cases, for example if the Kafka cluster was brought online yesterday. So, in general, how can I determine which process is a Kafka Connect worker?
What is the foolproof method here?
If the other worker processes are not related to Connect, you can find the Connect process by searching for the properties file you passed when starting the worker:
ps aux | grep connect-distributed.properties
There is no kill script for Connect workers. You have to run the kill command with SIGTERM to stop the worker process gracefully.
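Putting the two steps together, a sketch (the bracket trick `[c]onnect` keeps the grep process itself out of the match):

```shell
# Find the Connect worker by its properties file, then stop it gracefully.
pid=$(ps aux | grep '[c]onnect-distributed.properties' | awk '{print $2}')
if [ -n "$pid" ]; then
  kill -TERM "$pid"   # SIGTERM lets the worker shut down its tasks cleanly
else
  echo "no Connect worker found"
fi
```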
An Upstart service is responsible for creating Gearman workers, which run in parallel (one per CPU) with the help of GNU Parallel. To understand the setup, you can read my Stack Overflow post, which describes how to run the workers in parallel:
Fork processes indefinitely using gnu-parallel which catch individual exit errors and respawn
Upstart service: workon.conf
# workon
description "worker load"
start on runlevel [2345]
stop on runlevel [!2345]
respawn
script
exec seq 1000000 | parallel -N0 --joblog out.log ./worker
end script
Alright, so the above service is started:
$ sudo service workon start
workon start/running, process 4620
4620 is the process ID of the workon service.
Four workers are spawned, one per CPU core. For example:
Name   | PID
worker | 1011
worker | 1012
worker | 1013
worker | 1014
perl   | 1000
perl is the process running GNU Parallel, and GNU Parallel is responsible for spawning the parallel worker processes.
Now, here is the problem.
If I kill the workon service:
$ sudo kill 4620
the service has a respawn instruction, so it restarts. But the processes created by the service are not killed, which means it creates a new set of processes. Now we have 2 perl processes and 8 workers:
Name   | PID
worker | 1011
worker | 1012
worker | 1013
worker | 1014
worker | 2011
worker | 2012
worker | 2013
worker | 2014
perl   | 1000
perl   | 2000
You might ask: the old processes abandoned by the service, are they zombies?
The answer is no. They are still alive; I tested them. Every time the service dies, it creates a new set.
That is one problem. Another problem is with GNU Parallel itself.
Let's say I started the service fresh and it is running fine.
I then run this command to kill GNU Parallel, i.e. the perl process:
$ sudo kill 1000
This does not kill the workers, and again they are left without a parent. But the workon service intercepts the death of perl and respawns a new set of workers. This time we have 1 perl process and 8 workers: all 8 workers are alive, 4 with a parent and 4 orphaned.
Now, how do I solve this problem? I want to kill all processes created by the service whenever it crashes.
I was able to solve this with post-stop. It is an event listener, I believe, which executes after a service ends. In my case, if I run kill -9 <pid> (the PID of the service), the post-stop block is executed after the service process is killed, so I can put the code that removes all processes spawned by the service there.
Here is my code using post-stop:
post-stop script
killall php
killall perl
end script
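killall matches by name, which can also hit unrelated php or perl processes on the box. An alternative sketch (my own suggestion, not from the original answer) is to signal the service's whole process group, since the script's children typically share one group:

```shell
# Every process started from the service script shares a process group.
# Signalling the negative PGID reaches perl (GNU Parallel) and all workers at once.
pgid=$(ps -o pgid= -p "$$" | tr -d ' ')
kill -0 -- "-$pgid" && echo "process group $pgid exists"
# In post-stop you would send the real signal instead:
#   kill -TERM -- "-$pgid"
```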
I'm having issues with a celery beat worker not sending tasks to the celery workers. Celery runs on three servers, with a RabbitMQ cluster behind HAProxy as the backend.
Celery beat is used to schedule a task every day at 9 AM. When I start the worker, the first task usually succeeds, but after that the following tasks never seem to be sent to RabbitMQ. In the celery beat log file (celery beat is run with the -l debug option), I see messages such as Scheduler: Sending due task my-task (tasks.myTask), but no sign of the task being received by any celery worker.
I also tried tracing messages in RabbitMQ via the rabbitmq_tracing plugin, which only confirmed that the tasks never reach RabbitMQ.
Any idea what could be happening? Thanks!
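For reference, a minimal sketch of the kind of schedule described above; all names and the broker URL here are assumptions, not taken from the original post:

```python
from celery import Celery
from celery.schedules import crontab

# Broker URL points at the HAProxy front for the RabbitMQ cluster (hypothetical host).
app = Celery('tasks', broker='amqp://user:pass@haproxy-host:5672//')

# Fire tasks.myTask once a day at 09:00.
app.conf.beat_schedule = {
    'my-task': {
        'task': 'tasks.myTask',
        'schedule': crontab(hour=9, minute=0),
    },
}
```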
I'm dealing with a very strange problem now.
Since I queued over 1,000 jobs at once, Gearman has not been working properly.
The problem is that when I submit the jobs in background mode, I can see them correctly queued on the monitoring page (Gearman Monitor),
but the queue is drained within a few seconds without the jobs ever being delivered to a worker.
In the end, the jobs are never executed by the worker; they just disappear from the queue (job server).
So I tried rebooting the server entirely and reinstalling Gearman as well as the PHP library. (I'm using one CentOS and one Ubuntu machine with the PHP gearman library, versions 0.34 and 1.0.2.)
But no luck yet; the job server misbehaves just as I explained above.
What should I do now?
Can I check the workers' state, or see and monitor the whole process, from queuing the jobs to delivering them to a worker?
When I ran gearmand with a verbose option ('gearmand -vvvv'), it never printed anything on the screen while I registered a worker with the server and ran a job with client code (PHP).
Any comment will be appreciated.
For your information, I'm not considering a persistent queue using MySQL or SQLite for now, because it sometimes causes performance issues with slow execution.
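On checking the workers' state: the gearadmin tool that ships with gearmand can show the queue and the connected workers. A sketch (the function name and exact column layout are assumptions; check your version's output):

```shell
# Against a live job server (default port 4730):
#   gearadmin --status    # per function: name, jobs queued, jobs running, capable workers
#   gearadmin --workers   # connected workers and the functions they registered
# Parsing a sample --status line:
printf 'sendReservation\t1000\t0\t0\n' \
  | awk '{print $1": queued="$2" running="$3" workers="$4}'
# → sendReservation: queued=1000 running=0 workers=0
```

A workers count of 0 for a function while jobs are queued would point at the workers registering a different function name than the clients submit to.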