how to get the pid of a process which has sent a SIGABRT signal to another process which has exited dumping core

In short, by installing a signal handler for SIGABRT. More specifically, if you specify the SA_SIGINFO flag when installing the handler, the siginfo_t structure passed to it should be populated with extra information about the signal, including the sender's PID in the si_pid field.
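A minimal sketch of that approach (POSIX C; error checking omitted for brevity):

#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Three-argument handler: when the signal was sent with kill(),
   info->si_pid holds the sender's PID and info->si_uid its UID. */
static void on_sigabrt(int signo, siginfo_t *info, void *ctx)
{
    /* printf() is not async-signal-safe; fine for a demo only. */
    printf("SIGABRT from pid %ld, uid %ld\n",
           (long)info->si_pid, (long)info->si_uid);
}

int main(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_sigabrt;  /* use sa_sigaction, not sa_handler */
    sa.sa_flags = SA_SIGINFO;      /* request the siginfo_t argument */
    sigemptyset(&sa.sa_mask);
    sigaction(SIGABRT, &sa, NULL);

    printf("pid %ld waiting; from another shell try: kill -ABRT %ld\n",
           (long)getpid(), (long)getpid());
    pause();                       /* block until a signal arrives */
    return 0;
}

Note that a handled SIGABRT no longer terminates the process with a core dump on its own; if you still want the dump, record si_pid in the handler, restore the default disposition, and re-raise the signal.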

Related

Parallel h5py - The MPI_Comm_dup() function was called before MPI_INIT was invoked

I am experiencing the issue below with Parallel h5py on macOS Ventura 13.0.1, on a 2020 MacBook Pro with a 4-core Intel Core i7.
I installed h5py and its dependencies by following both of these docs and this guide.
Running a job which requires only mpi4py runs and finishes without any issues. The problem comes when I try to run a job which requires Parallel h5py, e.g. trying out this code.
I get back the following:
*** The MPI_Comm_dup() function was called before MPI_INIT was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.
[...] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
(the above block is repeated once for each of the four MPI processes)
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[20469,1],3]
Exit code: 1
--------------------------------------------------------------------------
I found this GitHub issue, but it didn't help in my case.
I should also point out that I managed to install and use Parallel h5py on a MacBook Air running macOS Monterey, though that one is only dual-core, so it doesn't let me test Parallel h5py with as many cores without using -overcommit.
Since I have not found any ideas on how to resolve this apart from the above GitHub issue, I would appreciate any suggestions.
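For reference, a minimal Parallel h5py job of the kind I mean (a hypothetical sketch, not the code linked above; it assumes h5py was built against a parallel HDF5) would be run with mpiexec -n 4 python demo.py:

from mpi4py import MPI   # importing mpi4py first guarantees MPI_Init has run
import h5py

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Opening with driver='mpio' is the collective call that internally
# duplicates the communicator (the step failing in the error above).
with h5py.File("parallel_test.h5", "w", driver="mpio", comm=comm) as f:
    dset = f.create_dataset("ranks", (comm.Get_size(),), dtype="i")
    dset[rank] = rank    # each rank writes its own element

if rank == 0:
    print("wrote parallel_test.h5")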

Celery loses worker

I use Celery 4.4.0 in my project (Ubuntu 18.04.2 LTS). When I raise Exception('too few functions in features to classify'), the Celery worker is lost and I get logs like these:
[2020-02-11 15:42:07,364] [ERROR] [Main ] Task handler raised error: WorkerLostError('Worker exited prematurely: exitcode 0.')
Traceback (most recent call last):
File "/var/lib/virtualenvs/simus_classifier_new/lib/python3.7/site-packages/billiard/pool.py", line 1267, in mark_as_worker_lost human_status(exitcode)), billiard.exceptions.WorkerLostError: Worker exited prematurely: exitcode 0.
[2020-02-11 15:42:07,474] [DEBUG] [ForkPoolWorker-61] Closed channel #1
Do you have any idea how to solve this problem?
WorkerLostError is almost like an OutOfMemory error - it can't really be "solved", and it will continue to happen from time to time. What you should do is make your task(s) idempotent and let Celery retry tasks that failed due to a worker crash.
It sounds trivial, but in many cases it is not - not all tasks can be idempotent, for example. Celery also still has bugs in the way it handles WorkerLostError. You therefore need to monitor your Celery cluster closely, react to these events, and try to minimize them. In other words, find out why the worker crashed: was it killed by the system because it was consuming all the memory? Was it killed simply because it was running on an AWS spot instance that got terminated? Was it killed by someone executing kill -9 <worker pid>? Each of these circumstances can be handled one way or another...
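A sketch of that approach (hypothetical task and threshold names; assumes Celery 4.x with a broker that supports redelivery):

from celery import Celery

app = Celery("tasks", broker="amqp://localhost")

MIN_FEATURES = 5  # hypothetical threshold

@app.task(
    bind=True,
    acks_late=True,               # acknowledge only after the task completes
    reject_on_worker_lost=True,   # requeue the message if the worker dies mid-task
    max_retries=3,
)
def classify(self, features):
    # Idempotent: re-running with the same input yields the same result,
    # so a redelivered task is harmless.
    if len(features) < MIN_FEATURES:
        # Retry later (or fail cleanly) instead of crashing the pool worker.
        raise self.retry(countdown=30)
    return sorted(features)      # stand-in for the real classification step

With acks_late plus reject_on_worker_lost, a task whose worker disappears is redelivered instead of being silently lost - which is only safe because the task is idempotent.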

boofuzz - Target connection reset, skip error

I am using boofuzz to fuzz a specific application. While creating the blocks and doing some testing, I noticed that the target sometimes closes the connection. This causes procmon to terminate the target process and restart it. However, this is completely unnecessary for this target.
Can I somehow tell boofuzz not to treat this as an error (so the target is not restarted)?
[2017-11-04 17:09:07,012] Info: Receiving...
[2017-11-04 17:09:07,093] Check Failed: Target connection reset.
[2017-11-04 17:09:07,093] Test Step: Calling post_send function:
[2017-11-04 17:09:07,093] Info: No post_send callback registered.
[2017-11-04 17:09:07,093] Test Step: Sleep between tests.
[2017-11-04 17:09:07,094] Info: sleeping for 0.100000 seconds
[2017-11-04 17:09:07,194] Test Step: Contact process monitor
[2017-11-04 17:09:07,194] Check: procmon.post_send()
[2017-11-04 17:09:07,196] Check OK: No crash detected.
Excellent question! There isn't (wasn't) any way to do this, but there really should be. A reset connection does not always mean a failure.
I just added ignore_connection_reset and ignore_connection_aborted options to the Session class to ignore ECONNRESET and ECONNABORTED errors respectively. Available in version 0.0.10.
A description of the arguments is available in the docs: http://boofuzz.readthedocs.io/en/latest/source/Session.html
You may find the commit that added these arguments informative for how some of the boofuzz internals work (relevant lines 182-183, 213-214, 741-756): https://github.com/jtpereyda/boofuzz/commit/a1f08837c755578e80f36fd1d78401f21ccbf852
Thank you for the solid question.
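For example, a session configured to ignore both conditions might look like this (boofuzz >= 0.0.10; the host and port are placeholders for the sketch):

from boofuzz import Session, SocketConnection, Target

session = Session(
    target=Target(connection=SocketConnection("192.168.1.10", 9999, proto="tcp")),
    ignore_connection_reset=True,    # ECONNRESET no longer counts as a failure
    ignore_connection_aborted=True,  # likewise for ECONNABORTED
)
# define requests, session.connect(...) them, then session.fuzz() as usual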

Couldn't register com.xxx.appname with the bootstrap server. Error: unknown error code. Program received signal: “SIGABRT”

I am getting this error while running on a device.
Couldn't register com.xxxx.appname with the bootstrap server.
Error: unknown error code.
This generally means that another instance of this process was already running or is hung in the debugger. Program received signal: “SIGABRT”.
I have now rectified this problem. I am posting the answer as well; if anyone gets this error, follow my post.
Try this. Deleting the app, restarting Xcode, and a clean build didn't do anything, so:
--> restart your computer
--> restart the phone

Can a sleeping Perl program be killed using kill(SIGKILL, $$)?

I am running a Perl program; a module in the program is triggered by an external process to kill all the child processes and terminate execution.
This works fine.
But when a certain function, say xyz(), is executing, there is a sleep(60) statement on a condition.
Right now the function executes repeatedly, as it is waiting for some value.
When I trigger the kill process as mentioned above, the kill does not take place.
Does anybody have a clue as to why this is happening?
I don't understand how you are trying to kill a process from within itself (the $$ in your question's subject) when it's sleeping.
If you are killing from a DIFFERENT process, then it will have its own $$. You need to find out the PID of the original process to kill first (by trawling the process list or by having the original process communicate it somehow).
Killing a sleeping process works very well:
$ ( date ; perl5.8 -e 'sleep(100);' ; date ) &
Wed Sep 14 09:48:29 EDT 2011
$ kill -KILL 8897
Wed Sep 14 09:48:54 EDT 2011
This also works with other "killish" signals ('INT', 'ABRT', 'QUIT', 'TERM').
UPDATE: Upon re-reading, maybe the issue you meant was that the "triggered by an external process" part doesn't happen. If that's the case, you need to:
Set up a CATCHABLE signal handler in your process before going to sleep ($SIG{'INT'}) - SIGKILL cannot be caught by a handler.
Send SIGINT from said "external process".
Do all the needed cleanup, either in the SIGINT handler itself or right after sleep() is interrupted by SIGINT.
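A minimal sketch of that pattern in Perl (the cleanup body is hypothetical; adapt it to how you track your child processes):

#!/usr/bin/perl
use strict;
use warnings;

my $interrupted = 0;
$SIG{'INT'} = sub { $interrupted = 1 };   # catchable, unlike SIGKILL

print "pid $$ sleeping; from another shell run: kill -INT $$\n";
sleep(60);                                # returns early when a signal arrives

if ($interrupted) {
    # hypothetical cleanup: kill child processes, close handles, etc.
    print "sleep interrupted by SIGINT, cleaning up\n";
    exit 0;
}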