When I run the !threads command, in the left-most column I see the ID shown as XXX for a few threads. My understanding is that this means dead threads. Does this include any threads that exited (gracefully or ungracefully), or is it only for threads that didn't exit gracefully and were killed via APIs like Abort or Interrupt?
You are right, threads marked as XXXX are dead threads - i.e. they no longer exist from the perspective of the OS. The CLR keeps information about threads a bit longer than the OS does, which is why you see dead threads in the output.
Yes, both normally terminated and forcibly killed threads appear with XXXX in the output.
Consider the following fork bomb in Python (source):
import os
while 1:
    os.fork()
I'm too afraid to test it out myself, but I'm somewhat skeptical that if I just took this program and ran it, my computer would just freeze up and die. Assuming this is true, my question is: what mechanisms or policies does my operating system use to fight it off?
My question can be viewed as a sort of "applied" version of what one might learn in an OS class.
As expected, when I tried it out on my machine, the computer froze and I had to hard reboot. So definitely don't do this on a regular basis.
The last error that I was able to capture from the program was:
BlockingIOError: [Errno 11] Resource temporarily unavailable
File "fork_bomb.py", line 3, in <module>
os.fork()
So at some point, the OS couldn't handle the OS fork calls and returned an error. The only other useful message I can see from /var/log/syslog is
cgroup: fork rejected by pids controller in /user.slice/user-1000.slice/session-2.scope
Cgroups are a way to restrict the resources available to processes within a particular cgroup. So presumably the Python processes were in a cgroup that had reached its pid/task limit. That is one way the OS tries to deal with fork bombs: limiting tasks using cgroups. Of course, the infinite loop of forks, even when the forks were failing, still required overhead to request resources from the OS, hence the system freeze.
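If you want to see the limit that rejected the fork, here is a minimal sketch (assuming a cgroup v2 unified hierarchy mounted at /sys/fs/cgroup; on cgroup v1 the pids controller lives under /sys/fs/cgroup/pids instead):
from pathlib import Path

# /proc/self/cgroup on a cgroup v2 system looks like
# "0::/user.slice/user-1000.slice/session-2.scope"
cgroup = Path("/proc/self/cgroup").read_text().strip().splitlines()[-1].split("::")[-1]
base = Path("/sys/fs/cgroup") / cgroup.lstrip("/")

print("pids.max:    ", (base / "pids.max").read_text().strip())      # task limit ("max" = unlimited)
print("pids.current:", (base / "pids.current").read_text().strip())  # tasks currently in the cgroup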
Theoretically, another way the OS can try to limit fork bombs is through memory limits. Ignoring copy-on-write, if all the forked processes required extra memory, the Linux OOM (out-of-memory) killer would be invoked. This kernel process is woken up when memory is tight, and its job is to start killing the processes it thinks will free up enough memory to keep the system running. Memory limits can be set using cgroups, or the minimum free memory can be set via /proc/sys/vm/min_free_kbytes.
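As a rough sketch of checking those two knobs (same cgroup v2 path assumption as above; a value of "max" in memory.max means no limit is configured):
from pathlib import Path

# min_free_kbytes: the minimum free memory (in kB) the kernel tries to keep available
print("vm.min_free_kbytes:", Path("/proc/sys/vm/min_free_kbytes").read_text().strip())

# memory.max: per-cgroup memory ceiling for the current session ("max" = no limit)
cg = Path("/proc/self/cgroup").read_text().strip().splitlines()[-1].split("::")[-1]
print("memory.max:", (Path("/sys/fs/cgroup") / cg.lstrip("/") / "memory.max").read_text().strip())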
I have a daemon-like process that starts two subprocesses (and one of the subprocesses starts ~10 others). When I systemctl stop my service, the child subprocesses appear to be 'aggressively' killed, which doesn't give my process a chance to clean up.
How do I get systemctl stop to skip the aggressive kill and instead allow my process to orchestrate an orderly clean-up?
I tried timeoutSec=30 to no avail.
KillMode= defaults to control-group. That means every process of your service is killed with SIGTERM.
You have two options:
Handle SIGTERM in each of your processes and shut down within TimeoutStopSec (which defaults to 90 seconds)
If you really want to delegate the shutdown to your main process, set KillMode=mixed. SIGTERM will then be sent to the main process only. Again, shut down within TimeoutStopSec; if you do not, systemd will send SIGKILL to all your processes (see the unit snippet after the note below).
Note: I suggest using KillMode=mixed in option 2 rather than KillMode=process, as the latter would send the final SIGKILL only to your main process, meaning your sub-processes would not be killed if they have locked up.
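For option 2, the relevant part of the unit file would look roughly like this (the binary path and the 30-second timeout are placeholders, not values from the question):
[Service]
ExecStart=/usr/local/bin/mydaemon
# Send SIGTERM only to the main process, which is then responsible for stopping its children
KillMode=mixed
# How long systemd waits after SIGTERM before escalating to SIGKILL for everything left
TimeoutStopSec=30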
A late (possible) answer, but as I googled for weeks with a similar issue and found nothing, I figured I would add my solution.
My error was that I ran the systemd unit as root and switched (using sudo) to "the correct" user in the start script (inherited from a SysVinit script).
That starts the processes in the user.slice, which is killed mercilessly on shutdown. When I changed the unit file to run as the correct user (User=myuser) and removed sudo from the start script, the processes start in the system.slice and are handled properly on shutdown.
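In unit-file terms, the change amounts to something like this (the path and user name are placeholders):
[Service]
# Run directly as the target user instead of starting as root and switching via sudo
User=myuser
ExecStart=/usr/local/bin/start-myservice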
One of the mongo nodes in the replica set went down today. I couldn't find what happened, but when I checked the logs on the server I saw this message: 'mongod main process killed by KILL signal'. I tried googling for more information but failed. Basically, I would like to know what the KILL signal is, who triggered it, and possible causes/fixes.
Mongo version 3.2.10 on Ubuntu.
The KILL signal (SIGKILL) means that the app is killed instantly and the process has no chance to exit cleanly. It is issued by the system when something goes very wrong.
If this is the only log entry left, the process was killed abruptly. This probably means your system ran out of memory (I've had this problem with other processes before). You could check whether swap is configured on your machine (using swapon -s), but you should probably consider adding more memory to your server; swap would only keep things from breaking, as it is very slow.
Other things worth looking at are the free disk space left and the syslog (/var/log/syslog).
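As a quick way to check both of those hints, here is a minimal sketch (log location and exact message wording vary by distro; on many systems the OOM killer logs lines containing "Out of memory" or "oom-killer"):
from pathlib import Path

# Is swap configured? /proc/meminfo reports SwapTotal in kB (0 kB means no swap)
meminfo = dict(line.split(":", 1) for line in Path("/proc/meminfo").read_text().splitlines())
print("SwapTotal:", meminfo["SwapTotal"].strip())

# Look for OOM-killer activity around the time mongod died (may need root to read the log)
for line in Path("/var/log/syslog").read_text(errors="replace").splitlines():
    if "oom-killer" in line or "Out of memory" in line:
        print(line)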
From the bind9 man page, I understand that the named process starts one worker thread per CPU if it is able to determine the number of CPUs. If it is unable to, a single worker thread is started.
My question is: how does it calculate the number of CPUs? I presume that by CPU it means cores. The Linux machine I work on is customized, has kernel 2.6.34, and does not have the lscpu or nproc utilities. named starts a single thread even if I give the -n 4 option. Is there any other way to force named to start multiple threads?
Thanks in advance.
I'm dealing with a very strange problem now.
Since I started queuing over 1,000 jobs at once, Gearman hasn't been working properly...
The problem is that when I submit the jobs in background mode, I can see that the jobs are correctly queued from the monitoring page (gearman monitor), but the queue is drained right afterwards (within a few seconds) without the jobs being delivered to a worker.
In the end, the jobs are never executed by a worker; they just disappear from the queue (job server).
So I tried rebooting the server entirely and reinstalling Gearman as well as the PHP library. (I'm using one CentOS and one Ubuntu machine with the PHP gearman library; the versions are 0.34 and 1.0.2.)
But no luck yet... The job server is still misbehaving as I explained above.
What should I do for now?
Can I check the workers' state, or see and monitor the whole process from queuing the jobs to delivering them to the worker?
When I tried gearmand with an option like 'gearmand -vvvv', it never printed anything on the screen while I registered a worker with the server and ran a job with the client code (PHP).
Any comment will be appreciated.
For your information, I'm not considering a persistent queue using MySQL or SQLite for now, because it sometimes causes performance issues with slow execution.