How can I debug why a sys.process command from Scala fails with a bad exit code (3) when it works from the command line? - scala

I'm writing a simple function to test whether Cassandra is running on the local machine (by capturing the output of service cassandra status). The problem is that this command always exits with a bad return status of 3. I have successfully called service cassandra stop immediately prior to this command, and have even tried running status in a loop to see if there was some race condition (it reliably fails with status 3 for a long time). However, running service cassandra status through the shell works. Any ideas what the issue may be, or even just how to debug this?
private def isCassandraStopped(): Boolean = {
  val s = Seq("sudo", "sh", "-c", "service cassandra status").!!
  val r = " * Cassandra is not running" == s
  if (!r) logger.info(s"Cassandra is not stopped: #$s#")
  r
}
This is the line that succeeds prior to executing the above method:
Seq("sudo", "sh", "-c", "service cassandra stop").!

If you read process output with !!, any non-zero exit status causes an exception to be thrown by default, as described in the docs. You can work around that and read the output as well as the process exit status in the ways described here.
By default, exit status 3 from an init script's status command means that the program is not running, as described here. However, this might not be respected by your init script. So the best thing is to read the script and make sure that it actually returns exit code 3 in that case.
So your options are:
read only the exit code, if it is reliable
check how your init script is implemented and duplicate its process status check logic yourself; usually it checks that the pid file exists and that the process with that pid is running (checkpid).
read both the output and the exit status, as described in other SO Q&As
read all lines without throwing an exception with lines_! (renamed lazyLines_! in newer Scala versions): Process("your command here").lines_!. You will not get an exit status in this case.
Usually exit status is more reliable in properly written shell scripts than output parsing.
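For the "duplicate the status check yourself" option, the checkpid logic is usually just "does the pid file exist, and is a process with that pid alive". A minimal sketch in sh (the pidfile path, the helper names, and the output strings are assumptions; read your actual init script for the real logic):

```shell
#!/bin/sh
# Sketch of the checkpid-style status logic many init scripts use.
# /var/run/cassandra.pid is an assumed path; check your init script.

is_running() {
    pidfile=$1
    # "Running" means: the pidfile exists AND a process with that pid is alive.
    [ -f "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null
}

status_of() {
    if is_running "$1"; then
        echo " * Cassandra is running"
    else
        echo " * Cassandra is not running"
        return 3    # LSB convention: exit status 3 = program is not running
    fi
}

# Example: status_of /var/run/cassandra.pid; echo "exit status: $?"
```

Running the equivalent checks by hand next to your Scala call is a quick way to see which exit codes and strings your system actually produces.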

Related

exec power shell script having corrective action every time

Getting the corrective action for exec while using PowerShell to add users and groups to the local admin group. Please note I am not a scripting guy; I'm not sure what I am doing wrong.
By default, an Exec resource is applied on every run. That is mediated, where desired, by the resource's unless, onlyif, and / or creates parameters, as described in that resource type's documentation.
The creates parameter is probably not appropriate for this particular case, so choose one of unless or onlyif. Each one is expected to specify a command for Puppet to run, whose success or failure (as judged by its exit status) determines whether the Exec should be applied. These two parameters differ primarily in how they interpret the exit status:
unless interprets exit status 0 (success) as indicating that the Exec's main command should not be run
onlyif interprets exit statuses other than 0 (success) as indicating that the Exec's main command should not be run
I cannot advise you about the specific command to use here, but the general form of the resource declaration would be:
exec { 'Add-LocalGroupMember Administrators built-in':
  command  => '... PowerShell command to do the work ...',
  unless   => '... PowerShell command that exits with status 0 if the work is already done ...',
  provider => 'powershell',
}
(That assumes that the puppetlabs-powershell module is installed, which I take to be the case for you based on details presented in the question.)
I see your comment on the question claiming that you tried this approach without success, but this is the answer. If your attempts to implement this were unsuccessful then you'll need to look more deeply into what went wrong with those. You haven't presented any of those details, and I'm anyway not fluent in PowerShell, but my first guess would be that the exit status of your unless or onlyif script was computed wrongly.
Additionally, you probably should set the Exec's refresh parameter to a command that succeeds without doing anything. I'm not sure what that would be on Windows, but on most other systems that Puppet supports, /bin/true would be idiomatic. (That's not correct for Windows; I give it only as an example of the kind of thing I mean.) This will prevent running the main command twice in the same Puppet run in the event that the Exec receives an event.

Task exited with exit code null

I am running into a problem of a task exiting with a null exit code. With this exit code, I noted that I can't access files on the node to check the stderr and stdout files. What could be the problem? Also, what does a null exit code mean, and how can I set the exit code to be non-null in case of failure?
Thanks!
You will want to check the task's failureInfo field within the executionInfo property.
There is a difference between task failure and application logic failure for the process (command to execute) that is executed under the task. A task failure can be a multitude of things such as a resource file for a task failing to download. A process failing to launch properly for some reason is also a task failure. However, if the process does launch and execute, but the process itself "fails" (as per application logic) and returns a non-zero exit code and no other issues are encountered with the task, this task will have the proper exit code saved. Thus, if a task completes with a null exit code, you will need to consult the failureInfo field as per above along with any stdout/stderr logs if they exist.

Catching the error status while running scripts in parallel on Jenkins

I'm running two perl scripts in parallel on Jenkins, plus one more script which should execute only if the first two succeed. If I get an error in script 1, script 2 still runs, and hence the exit status becomes successful.
I want to run it in such a way that if any one of the parallel script fails, the job should stop with a failure status.
Currently my setup looks like
perl_script_1 &
perl_script_2 &
wait
perl_script_3
If script 1 or 2 fails in the middle, the job should be terminated with a Failure status without executing script 3.
Note: I'm using tcsh shell in Jenkins.
I have a similar setup where I run several java processes (tests) in parallel and wait for them to finish. If any fail, I fail the rest of my script.
Each test process writes its result to a file to be tested once done.
Note - the code examples below are written in bash, but it should be similar in tcsh.
To do this, I get the process id for every execution:
test1 &
test1_pid=$!
# test1 will write pass or fail to file test1_result
test2 &
test2_pid=$!
...
Now, I wait for the processes to finish by using the kill -0 PID command
For example test1:
# Check test1
kill -0 $test1_pid
# Check if process is done or not
if [ $? -ne 0 ]
then
    echo process test1 finished
    # check results
    grep fail test1_result
    if [ $? -eq 0 ]
    then
        echo test1 failed
        mark_whole_build_failed
    fi
fi
Same for other tests (you can do a loop to test all running processes periodically).
Later condition the rest of the execution based on mark_whole_build_failed.
I hope this helps.
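As a variation on polling with kill -0: wait can also be given a PID, in which case it blocks and returns that process's exit status, letting you catch a failed parallel script before the third one runs. A hedged sketch in sh (run_parallel is a made-up helper, and the sh -c commands stand in for the two perl scripts; tcsh would need different syntax):

```shell
#!/bin/sh
# run_parallel CMD1 CMD2: run both commands in the background, then use
# `wait PID` to collect each one's exit status.
run_parallel() {
    sh -c "$1" & pid1=$!
    sh -c "$2" & pid2=$!
    status1=0; wait "$pid1" || status1=$?
    status2=0; wait "$pid2" || status2=$?
    if [ "$status1" -ne 0 ] || [ "$status2" -ne 0 ]; then
        echo "parallel stage failed (statuses: $status1 $status2)"
        return 1
    fi
    echo "parallel stage passed"
}

# Example: run_parallel 'perl_script_1' 'perl_script_2' && perl_script_3
```

Because wait blocks until the process exits, there is no polling interval during which a failure can be missed.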

What does the EC2 command line say when a machine won't start?

When starting an instance on Amazon EC2, how would I detect a failure, for instance, if there's no machine available to fulfill my request? I'm using one of the less-common machine types and am concerned it won't start up, but am having trouble finding out what message to look for to detect this.
I'm using the EC2 commandline tools to do this. I know I can look for 'running' when I do ec2-describe-instance to see if the machine is up, but don't know what to look for to see if the startup failed.
Thanks!
The output from ec2-start-instances only shows the state transition (e.g. previous state stopped, current state pending), so as you say you need to use ec2-describe-instances to retrieve the current state.
For that, you have a couple of choices; you can either use a loop to check for instance-state-name, looking for a result of running or stopped; alternatively you could look at either the reason or state-reason-code fields; unfortunately you'll need to trigger the failure you're worried about, to obtain the values that indicate failure.
The batch file I use to wait for a successful startup (fill in the underscores):
@echo off
set EC2_HOME=C:\tools\ec2-api-tools
set EC2_PRIVATE_KEY=C:\_\pk-_.pem
set EC2_CERT=C:\_\cert-_.pem
set JAVA_HOME=C:\Program Files (x86)\Java\jre6
%EC2_HOME%\bin\ec2-start-instances i-_
:docheck
%EC2_HOME%\bin\ec2-describe-instances | C:\tools\gnuwin32\bin\grep.exe -c stopped > %EC2_HOME%\temp.txt
findstr /m "1" %EC2_HOME%\temp.txt > nul
if %errorlevel%==0 (c:\tools\gnuwin32\bin\echo -n "."
goto docheck)
del %EC2_HOME%\temp.txt
ec2-start-instances will return you the previous state (after the last command to the instance) and the current state (after your command). ec2-stop-instances does the same thing. THE PROBLEM IS, if you are scripting and you use -start- on a 'stopping' instance -OR- you use -stop- on a 'pending' instance, these will cause exceptions in the command line tool and NASTILY exit your scripts all the way to the original console (VERY BAD BEHAVIOR, AMAZON). So you have to go all the way through parsing the ec2-describe-instances [instance-id] result. HOWEVER, that still leaves you vulnerable to the tiny little bit of time between when you GET the status from your instance and you APPLY A COMMAND. If someone else, or Amazon, puts you into 'pending' or 'stopping', and you then do 'stop' or 'start' respectively, your script will break. I really don't know how to catch such an exception in script. Bad Amazon AWS, BAD DOG!
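One way to script around those exceptions is to never issue start/stop while the instance is in a transitional state: extract the state from ec2-describe-instances and wait for it to settle first. A sketch in sh (the awk field number and the helper names are assumptions; the INSTANCE line layout varies between tool versions, so verify against your own output):

```shell
#!/bin/sh
# get_state CMD...: run a describe command and print the instance state.
# The state is assumed to be field 6 of the INSTANCE line -- verify this
# against your tool version's actual output.
get_state() {
    "$@" | awk '$1 == "INSTANCE" { print $6 }'
}

# wait_until_settled CMD...: poll until the state is no longer transitional,
# so a following start/stop will not hit the exception described above.
wait_until_settled() {
    state=$(get_state "$@")
    while [ "$state" = "pending" ] || [ "$state" = "stopping" ]; do
        sleep 5
        state=$(get_state "$@")
    done
    echo "$state"
}

# Example: [ "$(wait_until_settled ec2-describe-instances i-_)" = "stopped" ] \
#   && ec2-start-instances i-_
```

This does not close the race window the answer describes (someone else can still change the state between the check and the command), but it avoids triggering the exception from your own scripted sequences.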

how to use a shell script to supervise a program?

I've searched around but haven't quite found what I'm looking for. In a nutshell, I have created a bash script that runs in an infinite while loop, sleeping and checking if a process is running. The only problem is that even if the process is running, it says it is not and opens another instance.
I know I should check by process name and not process id, since another process could jump in and take the id. However, all perl programs are named Perl5.10.0 on my system, and I intend to have multiple instances of the same perl program open.
The following "if" always returns false. What am I doing wrong here???
while true; do
    if [ ps -p $pid ]; then
        echo "Program running fine"
        sleep 10
    else
        echo "Program being restarted\n"
        perl program_name.pl &
        sleep 5
        read -r pid < "$(unknown)_pid.txt"
    fi
done
Get rid of the square brackets. It should be:
if ps -p $pid; then
The square brackets are syntactic sugar for the test command. This is an entirely different beast and does not invoke ps at all:
if test ps -p $pid; then
In fact that yields "-bash: [: -p: binary operator expected" when I run it.
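To see the difference concretely, here is the corrected check as a small runnable sketch (is_alive is a made-up helper name; the point is that if tests the exit status of ps directly, with no brackets):

```shell
#!/bin/sh
is_alive() {
    ps -p "$1" > /dev/null    # exits 0 iff a process with that pid exists
}

# $$ is this shell's own pid, so this branch is taken:
if is_alive $$; then
    echo "Program running fine"
fi
```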
Aside from the syntax error already pointed out, this is a lousy way to ensure that a process stays alive.
First, you should find out why your program is dying in the first place; this script doesn't fix a bug, it tries to hide one.
Secondly, if it is so important that a program remain running, why do you expect your (at least once already) buggy shell script will do the job? Use a system facility that is specifically designed to restart server processes. If you say what platform you are using and the nature of your server process, I can offer more concrete advice.
added in response to comment:
Sure, there are engineering exigencies, but as the OP noted in the OP, there is still a bug in this attempt at a solution:
I know I should check by process name
and not process id, since another
process could jump in and take the id.
So now you are left with a PID tracking script, not a process "nanny". Although the chances are small, the script as it now stands has a ten-second window in which:
the "monitored" process fails
I start up my week-long emacs process, which grabs the same PID
the nanny script continues on, blissfully unaware that its dependent has failed
The script isn't merely buggy, it is invalid because it presumes that PIDs are stable identifiers of a process. There are ways that this could be better handled even at the shell script level. The simplest is to never detach the execution of perl from the script since the script is doing nothing other than watching the subprocess. For example:
while true ; do
    if perl program_name.pl ; then
        echo "program_name terminated normally, restarting"
    else
        echo "oops program_name died again, restarting"
    fi
done
Which is not only shorter and simpler, but it actually blocks for the condition that you are really interested in: the run-state of the perl program. The original script repeatedly checks a bad proxy indication of the run state condition (the PID) and so can get it wrong. And, since the whole purpose of this nanny script is to handle faults, it would be bad if it were faulty itself by design.
I totally agree that fiddling with the PID is nearly always a bad idea. The while true ; do ... done script is quite good; however, for production systems there are a couple of process supervisors which do exactly this and much more, e.g. they:
enable you to send signals to the supervised process (without knowing its PID)
check how long a service has been up or down
capture its output and write it to a log file
Examples of such process supervisors are daemontools or runit. For a more elaborate discussion and examples, see Init scripts considered harmful. Don't be disturbed by the title: traditional init scripts suffer from exactly the same problem you do (they start a daemon, keep its PID in a file and then leave the daemon alone).
I agree that you should find out why your program is dying in the first place. However, an ever-running shell script is probably not a good idea. What if this supervising shell script dies? (And yes, get rid of the square brackets around ps -p $pid. You want the exit status of the ps -p $pid command. The square brackets are a replacement for the test command.)
There are two possible solutions:
Use cron to run your "supervising" shell script to see if the process you're supervising is still running, and if it isn't, restart it. The supervised process can write its PID to a file; your supervising program can then cat this file and get the PID to check.
If the program you're supervising is providing a service upon a particular port, make it an inetd service. This way, it isn't running at all until there is a request upon that port. If you set it up correctly, it will terminate when not needed and restart when needed. Takes less resources and the OS will handle everything for you.
That's what kill -0 $pid is for. It returns success if a process with pid $pid exists.
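The cron option might look like this sketch (ensure_running, the pidfile path, and the command are placeholders; note that it still carries the PID-reuse caveat discussed in the earlier answer, which the inetd and runit approaches avoid):

```shell
#!/bin/sh
# ensure_running PIDFILE CMD...: restart CMD unless the pid recorded in
# PIDFILE is still alive. Meant to be run from cron, e.g. once a minute:
#   * * * * * /path/to/check_program.sh
ensure_running() {
    pidfile=$1; shift
    if [ -f "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null; then
        return 0                  # still alive, nothing to do
    fi
    "$@" &                        # (re)start the program...
    echo $! > "$pidfile"          # ...and record its new pid
}

# Example: ensure_running /var/run/program_name.pid perl program_name.pl
```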