My master script executes on a gateway server. It invokes another script and runs it on a remote server; however, there is no definite run time for the remote script, so I would like to disable the timeout.
Ok let me explain.
sub rollout
{
    my ($path, $address) = @_;
    my $errorStatus = 0;
    print "Rollout process being initiated on netsim\n";

    # Creating SSH object
    my $ssh = Net::SSH::Expect->new(
        host    => "netsim",
        user    => 'root',
        raw_pty => 1
    );

    # Starting SSH process
    $ssh->run_ssh() or die "SSH process couldn't start: $!";

    # Start the interactive session
    $ssh->exec("/tmp/rollout.pl $path $address", 30);

    return $errorStatus;
}
When this part of the code is executed, the rollout.pl script is invoked, but after about a second the process gets killed on the remote server.
Without knowing exactly what is going wrong, let me take a wild guess at an answer. Should I have misinterpreted your question, I will edit this later:
The default and enforced timeout of Net::SSH::Expect is only a timeout for read operations. It does not mean that a started script will be killed when its time has run out; it just means that the last reading operation no longer waits for it to say anything.
You can start your remote script with, for example, a timeout of 1 second, and the exec call may return after one second. The string returned by exec will then only contain the first second of output from the remote script. But the remote script may continue to run; you just need to start a new reading operation (there are several to choose from) to capture future output. The remote script, if everything works as expected, should continue to run until it is finished or you close your SSH connection.
That means it will be killed if you just exec, wait for that to return, and then close your connection. One way to go would be to have the remote script print something when it is finished ("done" comes to mind) and wait for that string with a waitfor with a very high timeout, and only close the connection when waitfor has returned TRUE. Or, well, if the timeout was reasonably long, also when it returns FALSE. If you want to monitor for your script crashing, I would suggest using pgrep or ps on the remote machine. Or make sure your remote script says something when dying, for example using eval{} or die "done".
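For illustration, the tail end of the remote rollout.pl could then look roughly like this (a sketch only; do_rollout is a made-up stand-in for whatever the script really does):

$| = 1;   # autoflush, so the "done" marker is not stuck in a buffer

eval {
    do_rollout($path, $address);   # hypothetical stand-in for the real rollout work
};
warn "rollout failed: $@" if $@;   # report the failure, but still emit the marker
print "done\n";                    # the string the gateway side will wait for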
edit:
One thing to remember is that this exec is in fact not something like system. It basically sends the command to the remote shell and hits Enter; at least that is how you might visualize what is happening. But then, to appear more like system, it waits for the default timeout of 1 second for any output and stores that in its own return value. Sadly, it is a rather dumb reading operation, as it just sits there and listens, so it is not a good idea to give it a high timeout or to actually use it to interact with the script.
I would suggest using exec with a timeout of 0 and then using waitfor with a very high timeout to wait for whatever pattern your remote script prints as output once it is finished. And only then close your connection.
edit 2:
Just replace your exec with something like:
$ssh->exec("/tmp/rollout.pl $path $address", 0);
my $didItReturn = $ssh->waitfor("done", 300);
And make sure rollout.pl prints "done" when it is finished, and not before. Now this will start the script and then wait for up to 5 minutes for the "done". If a "done" showed up, $didItReturn will be true; if it timed out, it will be false. It will not wait the full 5 minutes if it sees a "done" (as opposed to giving a timeout to exec).
edit 3:
If you want no timeout at all:
$ssh->exec("/tmp/rollout.pl $path $address", 0);
my $didItReturn = 0;
while (!$didItReturn)
{
$didItReturn = $ssh->waitfor("done", 300);
}
Related
I have a perl script which takes a bunch of commands, say command1, command2. The commands supposedly take a long time to complete, about 3-4 hours.
In the script (in Perl), for each command I create (fork) a child process, and in that child process I execute $ ssh s1 "command1". Then I wait for the child to finish.
my $exec_command = "ssh $machine \"$command\"";
The $command is a long, computationally intensive C++ executable. Nothing too fancy there. In the child,
my $out = `$exec_command`;
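Put together, each child does roughly this (a sketch of the structure described above, assuming $machine and $command are already set):

my $pid = fork();
die "fork failed: $!\n" unless defined $pid;

if ($pid == 0) {                   # child: run the remote command over ssh
    my $exec_command = "ssh $machine \"$command\"";
    my $out = `$exec_command`;
    exit 0;
}

waitpid($pid, 0);                  # parent: wait for the child to finish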
Now, sometimes it so happens that the child does not quit by itself. I have to repeatedly press Enter, and only then do the child processes quit.
I have googled around, but to no avail. I don't even know where the problem is: in ssh or in my child-parent relationship.
Thanks.
I'm executing multiple commands over ssh using the Net::SSH2 module.
My problem is the following:
Sometimes, if I send a command, I get no output back. The command is still executed; I just don't get the response. Sometimes I get the response in the output of the next command I send.
I suspected that it tries to get the output before it appears in the shell, but then it should at least return the shell "welcome lines" (on shell startup) if it's the first command; sometimes, though, I get no output at all. At this point I just don't know what to try anymore.
I've tried
to get the output in multiple ways described in several forums, for example from this demo.
Adding a sleep between sending the command and grabbing the output. This helped somewhat most of the time, but it still occurs. And having to determine the sleep time for each command individually is just not feasible, especially since 1s seems to help on some occurrences and doesn't help on others.
Running the command multiple times, which results in the next command sometimes grabbing output from the previous command.
Pushing until the last line of the output is a prompt, with the same result: either OK or 0 lines.
My implementation
# Net::SSH2 is loaded and $ip/$port are set elsewhere in the script.
sub connectSSH {
    my $user     = "...";
    my $password = "...";
    my $ssh2 = Net::SSH2->new();
    my $chan;
    if ($ssh2->connect($ip, $port, Timeout => 10)) {
        if (!$ssh2->auth_password($user, $password)) {
            print "Error: Password wrong\n";
            exit;
        } else {
            $chan = $ssh2->channel(); # SSH channel
            $chan->blocking(0);
            $chan->shell();
        }
    } else {
        print "Connection to $ip not possible\n";
        exit;
    }
    return $chan;
}

sub sendCommand {
    my ($chan, $command) = @_;
    my @output = ();
    print $chan "$command\n";
    #usleep(500)
    push(@output, "$_") while <$chan>;
    # process output...
}
I have to note that I've tried the use cases that didn't work in Perl using Python + Paramiko, and that seemed to work fine (I didn't test as thoroughly).
Update 26.08 (Partly solved)
I have retried (thanks @jcaron for making me retry) pushing until the prompt appears, and that seems to work for most (the important) cases. It turns out that when I tried that method the first time, there was another problem on top of the existing one, where I get no output from that specific host at all (not even after a delay or after repeating the commands). Again, with Python/Paramiko it works but not with my implementation in Perl. However, since the output from that specific group of hosts is not as important, it's OK.
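For reference, the read loop that works now looks roughly like this (a sketch; the prompt regex and the timings are assumptions, not my exact values):

use Time::HiRes qw(usleep);

sub readUntilPrompt {
    my ($chan, $timeout) = @_;          # $timeout in seconds
    my @output;
    my $deadline = time() + $timeout;
    while (time() < $deadline) {
        while (defined(my $line = <$chan>)) {
            push @output, $line;
            # stop as soon as the last line looks like a shell prompt
            return @output if $line =~ /[\$#>]\s*$/;
        }
        usleep(100_000);                # nothing available yet, retry in 100 ms
    }
    return @output;                     # timed out, return what we collected
}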
I have a simple Perl script that simply prints a line of text to stdout. What I want to accomplish is that, while this script runs, if I (or someone else) send a signal to that process to stop, it traps that signal and exits cleanly. The code I have looks like the following:
#!/usr/bin/perl -w
$| = 1;
use sigtrap 'handler' => \&sigtrap, 'HUP', 'INT', 'ABRT', 'QUIT', 'TERM';

while (1) {
    print "Working...\n";
    sleep(2);
}

sub sigtrap {
    print "Caught a signal\n";
    exit(1);
}
While this works well when I actually hit ctrl-c from the command line, if I issue a
kill -9 <pid>
It just dies. How do I get it to execute something before exiting? My general idea is to use this framework to capture when this script dies on a server due to a server reboot for maintenance or failure.
Thanks much in advance
Signal #9 (SIGKILL) can not be trapped. That's how Unix is designed.
But the system does not send that signal when shutting down for maintenance, at least if your daemon behaves correctly. It will normally send the TERM signal (or, more exactly, whatever your daemon handling script in /etc/init.d sends). Only processes that do not shut down correctly after a timeout will receive SIGKILL.
So your aim should be to handle the TERM signal correctly and to write the wrapper script in /etc/init.d that will be called when the system is changing runlevel.
Update: You can use the Daemon::Control module for the init script.
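A minimal sketch of such an init script with Daemon::Control (all names and paths below are placeholders, not taken from the question):

#!/usr/bin/perl
use strict;
use warnings;
use Daemon::Control;

exit Daemon::Control->new(
    name        => 'my_worker',                    # placeholder daemon name
    program     => '/usr/local/bin/my_worker.pl',  # placeholder path to your script
    pid_file    => '/var/run/my_worker.pid',
    stdout_file => '/var/log/my_worker.log',
    stderr_file => '/var/log/my_worker.log',
    fork        => 2,                              # daemonize with a double fork
)->run;   # handles the usual start/stop/restart/status arguments

Dropped into /etc/init.d, this gives the runlevel machinery a well-behaved script to call, which will deliver TERM to your process on shutdown.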
You're sending two very different signals to your process. Pressing Ctrl-C in the console usually sends the process an INT signal (SIGINT), which, judging by your code, is caught and serviced. kill -9, though, sends signal number 9 explicitly, which is called KILL. This is one of the signals whose handling cannot be redefined; delivery of this signal always immediately ends the process, which is done by the kernel itself.
As far as I know you can't capture kill -9. Try kill <pid> instead.
I'm writing a perl script that kicks off a script on several different servers using ssh. The remote script needs to run as long as this script is running:
#!/usr/bin/perl
require 'config.cfg';

# @servers is defined in config.cfg
# Contains server info as [username, hostname]
#
# @servers = ([username,server1.test.com],[username,server2.test.com])
#
foreach $server ( @servers ) {
    my $pid = fork();
    if ( $pid == 0 ) {
        $u = ${$server}[0];
        $h = ${$server}[1];
        print "Running script on $h \n";
        print `ssh -tl $u $h perl /opt/scripts/somescript.pl`;
        exit 0;
    } else {
        die "Couldn't start the process: $!\n";
    }
}
[...]
When I run this script, I get the following output:
./brokenscript.pl
Running script on server01
$ tcsetattr: Input/output error
Connection to server01 closed.
The same result occurs when running with system (and backticks are preferred anyway, since I want the output of the script). When I run the exact command between the backticks on the command line, it works exactly as expected. What is causing this?
The tcsetattr: Input/output error message comes from ssh when it tries to put the local terminal into “raw” mode (which involves a call to tcsetattr; see enter_raw_mode in sshtty.c, called from client_loop in clientloop.c).
From IEEE Std 1003.1, 2004 (Posix) Section 11.1.4: Terminal Access Control, tcsetattr may return -1 with errno == EIO (i.e. “Input/output error”) if the calling process is in an orphaned (or background?) process group.
Effectively, ssh is trying to change the settings of the local terminal even though it is not in the foreground process group (due to your fork and the local script exiting, as evidenced by the apparent shell prompt that comes immediately before the error message in your quoted output).
If you just want to avoid the error message, you can use ssh -ntt (redirect stdin from /dev/null, but ask the remote side to allocate a tty anyway) instead of ssh -t (add your -l and any other options you need back in too, of course).
More likely, you are interested in keeping the local script running as long as some of the remote processes are still running. For that, you need to use the wait function (or one of its "relatives") to wait for each forked process to exit before you exit the program that forked them (this will keep them in the foreground process group as long as the program that started them is in it). You may still want to use -n though, since it would be confusing if the multiple instances of ssh that you forked all tried to use (read from, or change the settings of) the local terminal at the same time.
As a simple demonstration, you could have the local script do sleep 30 after forking off all the children, so that the ssh command(s) have time to start up while they are part of the foreground process group. This should suppress the error message, but it will not address your stated goal. You need wait for that (if I am interpreting your goal correctly).
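Putting those two points together, the loop from the question could be restructured roughly like this (a sketch reusing the @servers layout from the question):

my @pids;
foreach my $server ( @servers ) {
    my $pid = fork();
    die "Couldn't start the process: $!\n" unless defined $pid;
    if ( $pid == 0 ) {
        my ( $u, $h ) = @{$server};
        print "Running script on $h\n";
        # -n: stdin from /dev/null; -tt: still ask the remote side for a tty
        print `ssh -ntt -l $u $h perl /opt/scripts/somescript.pl`;
        exit 0;
    }
    push @pids, $pid;
}
# Wait for every child: the local script stays alive (and in the foreground
# process group) as long as any of the remote scripts is still running.
waitpid( $_, 0 ) for @pids;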
That probably happens because you are forcing SSH to allocate a tty when stdin/stdout are not really ttys. SSH tries to call some tty-specific function on those handles (probably forwarded from the remote side) and the call fails, returning an error.
Is there any reason why you should be allocating a tty?
Is there also any reason to use the obsolete version 1 of the SSH protocol?
I've searched around but haven't quite found what I'm looking for. In a nutshell, I have created a bash script that runs in an infinite while loop, sleeping and checking if a process is running. The only problem is that even if the process is running, it says it is not and opens another instance.
I know I should check by process name and not process ID, since another process could jump in and take the ID. However, all Perl programs are named Perl5.10.0 on my system, and I intend to have multiple instances of the same Perl program open.
The following "if" always returns false, what am I doing wrong here???
while true; do
    if [ ps -p $pid ]; then
        echo "Program running fine"
        sleep 10
    else
        echo "Program being restarted\n"
        perl program_name.pl &
        sleep 5
        read -r pid < "${filename}_pid.txt"
    fi
done
Get rid of the square brackets. It should be:
if ps -p $pid; then
The square brackets are syntactic sugar for the test command. This is an entirely different beast and does not invoke ps at all:
if test ps -p $pid; then
In fact that yields "-bash: [: -p: binary operator expected" when I run it.
Aside from the syntax error already pointed out, this is a lousy way to ensure that a process stays alive.
First, you should find out why your program is dying in the first place; this script doesn't fix a bug, it tries to hide one.
Secondly, if it is so important that a program remain running, why do you expect your (at least once already) buggy shell script to do the job? Use a system facility that is specifically designed to restart server processes. If you say what platform you are using and the nature of your server process, I can offer more concrete advice.
added in response to comment:
Sure, there are engineering exigencies, but as the OP noted in the question, there is still a bug in this attempt at a solution:
I know I should check by process name
and not process id, since another
process could jump in and take the id.
So now you are left with a PID tracking script, not a process "nanny". Although the chances are small, the script as it now stands has a ten-second window in which:
the "monitored" process fails
I start up my week-long Emacs process, which grabs the same PID
the nanny script continues on, blissfully unaware that its dependent has failed
The script isn't merely buggy, it is invalid because it presumes that PIDs are stable identifiers of a process. There are ways that this could be better handled even at the shell script level. The simplest is to never detach the execution of perl from the script since the script is doing nothing other than watching the subprocess. For example:
while true ; do
if perl program_name.pl ; then
echo "program_name terminated normally, restarting"
else
echo "oops program_name died again, restarting"
fi
done
This is not only shorter and simpler, it actually blocks on the condition you are really interested in: the run state of the Perl program. The original script repeatedly checks a bad proxy indication of that run-state condition (the PID) and so can get it wrong. And, since the whole purpose of this nanny script is to handle faults, it would be bad if it were faulty itself by design.
I totally agree that fiddling with the PID is nearly always a bad idea. The while true ; do ... done script is quite good; however, for production systems there are a couple of process supervisors which do exactly this and much more, e.g.:
enable you to send signals to the supervised process (without knowing its PID)
check how long a service has been up or down
capture its output and write it to a log file
Examples of such process supervisors are daemontools or runit. For a more elaborate discussion and examples, see Init scripts considered harmful. Don't be disturbed by the title: traditional init scripts suffer from exactly the same problem as you do (they start a daemon, keep its PID in a file and then leave the daemon alone).
I agree that you should find out why your program is dying in the first place. However, an ever-running shell script is probably not a good idea. What if this supervising shell script dies? (And yes, get rid of the square brackets around ps -p $pid. You want the exit status of the ps -p $pid command. The square brackets are a replacement for the test command.)
There are two possible solutions:
Use cron to run your "supervising" shell script to see if the process you're supervising is still running, and if it isn't, restart it. The supervised process can write its PID into a file; your supervising program can then read this file and get the PID to check (a sketch follows below).
If the program you're supervising provides a service on a particular port, make it an inetd service. This way, it isn't running at all until there is a request on that port. If you set it up correctly, it will terminate when not needed and restart when needed. It takes fewer resources, and the OS handles everything for you.
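A sketch of such a cron-driven check (the file locations and the restart command are placeholders):

#!/usr/bin/perl
use strict;
use warnings;

my $pid_file = '/var/run/program_name.pid';   # written by the supervised program

my $running = 0;
if ( open my $fh, '<', $pid_file ) {
    my $pid = <$fh>;
    close $fh;
    if ( defined $pid ) {
        chomp $pid;
        # kill with signal 0 sends nothing; it only checks that the process exists
        $running = kill 0, $pid;
    }
}

system('perl /path/to/program_name.pl &') unless $running;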
That's what kill -0 $pid is for. It returns success if a process with pid $pid exists.