I have written a Perl script that connects to a set of URLs one by one. Each time it connects to a URL, it fetches the result and performs some operations on it.
However, there are a few possibilities where an error might occur.
The most important of these, which I am trying to address, is availability: the Internet connection might stop working partway through. Say I have a list of 1000 URLs and the script has gone through 500 of them successfully when the connection drops; as a result, the script aborts with an error. I want to handle this condition and implement a pause-and-resume feature.
A few lines of code to explain:
my $mech = WWW::Mechanize->new();
$mech->timeout(10);
my $base_url = "http://example.com/index.php?id=";
while (<INPUT>) {
    chomp $_;
    my $word = $_;
    my $url  = $base_url . $word;   # build the URL fresh each iteration
    eval { $mech->get($url); };
    if ($@) {
        # Pause the script once this condition evaluates to true
    }
    ....
}
There could be several conditions where I would want to pause the script.
So, I understand that I need to send some kind of signal to the script that causes it to pause. At the same time, there should be functionality for the end user to press "some key combination" to resume the script.
My guess is that this would involve some OS API calls.
On Linux, there is a way to suspend a process by pressing the Ctrl+Z key combination.
However, I am not sure how this can be automated from within the Perl script.
Please let me know if you need more details in order to find a solution to this.
Thanks.
It's pretty standard to have a process block while waiting for a read; while it's blocked, it won't receive any CPU time. No OS calls, outside of what Perl normally does, would be needed. So just issue a prompt ("Press Enter when ready to resume") and read STDIN.
Blocked-on-I/O is a standard state in an OS's process table. Quite recently, I wrote a socket test script in Perl to work as a dummy server. I left in a hurry one night and found the "Waiting on port 3371..." prompt still on my dummy server's terminal in the morning. It would have stayed like that for weeks.
So, you can wrap the whole thing in a labeled IO loop and, instead of dying or croaking, use loop controls like so:
BIT_OF_WORK:
while ( $more_work ) {
    ALL_GOOD:
    while ( $all_good ) {
        ...
        my $socket = open_socket() or last ALL_GOOD;
        ...
        last BIT_OF_WORK unless ( $more_work = @queue );
        ...
    }
    say 'Detected network down. Press Enter to continue';
    my $user_input = <STDIN>;
}
say 'Phew, that was a lot of work!';
You might even find redo more to your taste.
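For instance, a minimal sketch of the redo variant, written against the question's loop (it reuses $mech and $base_url from there; the retry prompt is my own):

while ( my $word = <INPUT> ) {
    chomp $word;
    my $url = $base_url . $word;
    eval { $mech->get($url); };
    if ($@) {
        print "Fetch of $url failed. Press Enter to retry: ";
        <STDIN>;
        redo;    # re-run this iteration without reading the next word
    }
    # ... process the fetched page ...
}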
I'm trying to write a program, actually a daemon, which stays in memory and performs something like tail -F on a rapidly updated log file. When it detects a new line in the file, the program has to launch another compiled Perl script, which performs some operations on the log line and then sends it off with a POST.
To explain clearly, I will refer to these two programs as "prgTAIL" and "prgPROCESS". So, prgTAIL tails the log and launches prgPROCESS, passing the new line to it.
Obviously prgTAIL doesn't have to wait for prgPROCESS to finish, because prgTAIL has to stay in memory and keep detecting new lines in the log. Also, the rate of file updates calls for launching multiple prgPROCESS instances in parallel. For this reason I'm using two programs: the first is small and fast and just passes the data to the second, which may be heavier since it can be launched in multiple instances.
On the prgTAIL side I used:
a pipe to tail the log file
a while loop to launch prgPROCESS on each new log line
a fork() to continue without waiting for prgPROCESS to end
my $log_csv = "/log/csv.csv";
open(my $pipe, "-|", "tail", "-n0", "-F", $log_csv) or die "error";
while (<$pipe>) {
    my $line = $_;
    my $pid = fork();
    if (defined $pid && $pid == 0) {
        exec("/bin/prgPROCESS " . $line);   # I tried system() too.
        exit 0;
    }
}
What prgPROCESS does is not so important anyway: it parses the $line passed as an argument, constructs an XML document, and then posts it via HTTPS.
So, this actually runs, but I think I messed something up with the processes, because once it reaches around 550 new lines and prgPROCESS calls, prgTAIL keeps running but can't launch prgPROCESS anymore, because there are too many processes. I get this error in bash:
-bash: fork: Resource temporarily unavailable
What's wrong? Any ideas? Maybe the prgPROCESS processes don't end and stay stuck, leaving no room for other processes?
PS: I'm using Mac OS X now, but this will run on Linux.
Your problem is this:
while () {
doesn't have any constraint condition, so it's just spinning as fast as it can. You're never actually reading from your pipe, you're just forking as fast as you can and spawning that new script.
You might be wanting:
while ( my $line = <$pipe> ) {
#....
}
But really, it's arguable that you don't actually need to fork at all, because a read/process/read loop would probably do just fine: fork() and exec() is basically what system already does anyway.
You should also, if forking, clean up your child processes. It doesn't matter too much for short-running programs, but anything that sits in a loop will leave a lot of zombie processes. Do it either by setting $SIG{CHLD} or by using waitpid, as sketched below.
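A minimal sketch of both cleanup styles (which one to use is a judgment call):

use POSIX ':sys_wait_h';

# Option 1: tell the OS to discard exit statuses entirely; no zombies,
# but you can no longer inspect $? for your children.
$SIG{CHLD} = 'IGNORE';

# Option 2: reap explicitly, e.g. once per pass through the tail loop.
while ( ( my $kid = waitpid(-1, WNOHANG) ) > 0 ) {
    # $kid has exited; its status is in $?
}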
I am writing a Perl script that automatically interacts with another script.
The script requires a confirmation for some critical operations.
Executing the script without Perl looks something like the following:
$ ./TheScript
TheScript Starting.......
Following step might be harmful to your system.
Are You Sure (Y/N)?
$ Y
TheScript finished!
Now I want a Perl script doing that for me.
I am sure that the (Y/N) confirmation will appear within 10 seconds, so I've tried:
system('./TheScript');
sleep 10;
system('Y');
This failed because it got stuck in system('./TheScript') and never reached the rest of the script, including the 'Y' reply.
Backticks (`) are almost the same as system, except that they capture STDOUT.
exec() is even less suitable, because it replaces the current process with TheScript and never returns.
Did I make any mistakes in my analysis? Or are there functions that do what I want?
You misunderstand the system function. It waits for the program to exit before your Perl program continues.
To drive an interactive program from Perl, you want the Expect module (or perhaps Expect::Simple). However, for a very simple case like you're suggesting, IPC::Open2 may suffice, and it's a core module.
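For the simple case, a minimal sketch of the IPC::Open2 route, assuming TheScript reads its answer from STDIN and ends its prompt with a newline (if the prompt is not newline-terminated, a line-based read like this will block, which is exactly the situation Expect is built for):

use IPC::Open2;

my $pid = open2( my $out, my $in, './TheScript' );

while ( my $line = <$out> ) {
    print $line;
    print $in "Y\n" if $line =~ /Are You Sure \(Y\/N\)\?/;   # answer the prompt
}
waitpid( $pid, 0 );   # collect the exit status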
As per your Perl script, the issue you are facing is the double confirmation requested by the system. For that I can suggest writing the script in such a way that:
first it handles the first confirmation; OK, fine;
if it asks for confirmation again, your script must handle the second confirmation as well.
For this:
my $conf = "Are You Sure (Y/N)?";
if (length $conf > 0) {
    system('Y');
} else {
    system('N');
}
I hope this script works for you.
I have a Perl script like so:
foreach my $addr ('http://site1.com', ...., 'http://site2.com') {
    my $script = `curl -m 15 $addr`;
    # do stuff with $script
}
The -m sets a timeout of 15 seconds. Is there a way to make it so that if the user presses a key, the script stops the current fetch and moves on to the next item in the foreach? I know next can move to the next item, but I am unsure how to tie that to a keypress, or how to do it while curl is running.
Edit: Based on the answers, it seems difficult to do this while curl is running. Would it be possible to press a key while curl is running and have the script skip to the next item as soon as curl returns (or times out after 15 seconds)?
The problem you've got with this is that when you run curl, Perl hands over control and waits for completion; it blocks until curl is 'done'.
So it's not as easy to do this as it might seem.
As another poster alludes to, you can use a variety of parallel-processing options. I would suggest the easiest is to move away from using 'any' key and to require a Ctrl-C.
So you'd then do:
foreach my $addr ('http://site1.com', ...., 'http://site2.com') {
    my $pid = open( my $curl_fh, "-|", "curl -m 15 $addr" );
    $SIG{'INT'} = sub {
        print "Aborting fetch of $addr";
        kill 'TERM', $pid;
    };
    while ( <$curl_fh> ) {
        print;
    }
    # Might want to set it to something else.
    # 'DEFAULT' means ctrl-c will abort the whole program.
    # 'IGNORE' means exactly what it says on the tin.
    # Important to change it though, as the handler above kills a
    # specific pid, and that might cause problems.
    $SIG{'INT'} = 'DEFAULT';
}
What this does is configure SIGINT (i.e. Ctrl-C) so it doesn't kill your program, but does kill the subprocess.
If you wanted to look at other options, I'd offer:
Multithreading: spawn a thread to do the curl fetching in the background and use Thread::Queue to pass results back and forth (Thread::Queue supports non-blocking checks; see the sketch after this list).
Forking: fork a subprocess to do the curl, and use your 'main' process to send a signal if a key is pressed.
IO::Select, so that you're not making blocking reads on your process.
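A minimal sketch of the Thread::Queue option, where @addresses stands in for the question's URL list:

use threads;
use Thread::Queue;

my $results = Thread::Queue->new;
my $worker  = threads->create(sub {
    $results->enqueue( scalar `curl -m 15 $_` ) for @addresses;
});

while ( $worker->is_running ) {
    if ( defined( my $page = $results->dequeue_nb ) ) {   # non-blocking check
        # ... do stuff with $page ...
    }
    # a real loop would also poll for the "skip" keypress here
    sleep 1;
}
$worker->join;
while ( defined( my $page = $results->dequeue_nb ) ) {
    # ... drain any results that arrived after the last check ...
}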
Basically you have two options:
1. Use threads
Create a new thread and call the desired system function there, waiting for its output. In another thread, check for user input; on input, you can kill the child process. Once the child process has finished, you can start ignoring user input.
Such a solution seems rather complex, with a lot of synchronization needed, probably involving signals. Risky.
2. Use non-blocking IO
Please read this thread. It explains how to make non-blocking IO reads from either a file or a pipe. You'd make a non-blocking read from the pipe (created with open), then a non-blocking read from STDIN, in a loop.
Seems like the way to go but, alas, rather complex as well.
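A minimal sketch of that loop, written against the question's curl fetch; @addresses is an assumption, and mixing select() with buffered <> reads has well-known pitfalls, so treat this as a starting point rather than a finished implementation:

use IO::Select;

for my $addr (@addresses) {
    my $pid = open(my $curl_fh, '-|', 'curl', '-m', '15', $addr)
        or die "cannot run curl: $!";
    my $sel = IO::Select->new($curl_fh, \*STDIN);

    FETCH: while (my @ready = $sel->can_read) {
        for my $fh (@ready) {
            if (fileno($fh) == fileno(STDIN)) {
                <STDIN>;              # consume the keypress
                kill 'TERM', $pid;    # abandon this fetch
                last FETCH;           # and move on to the next $addr
            }
            my $line = <$fh>;
            last FETCH unless defined $line;   # curl finished or timed out
            # ... process $line ...
        }
    }
    close $curl_fh;
}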
I am trying to run an application inside a Perl script using system(). The application sometimes gets stuck (it enters some kind of infinite loop). Is there a way I can detect that the application is stuck and kill it, so the Perl script can continue?
I'm trying to do something like this:
start testapp.exe;
if(stuck with testapp.exe) {
kill testapp.exe;
}
Determining whether "it is stuck in an infinite loop" is known as the halting problem, and it is undecidable.
If you want to kill it, you will have to fork the application using fork and then kill it from the other process if it runs for too long.
You can determine whether the process has been running too long like this:
use POSIX ":sys_wait_h";
waitpid($pid, WNOHANG) > 0   # waitpid returns 0 if the child is still running
at least, according to this manual page.
I am not sure how well it works on various systems; you can try it out.
Not a direct answer, but I can recommend the forks module if you want to fork with ease; it works only on UNIX systems (not Windows).
OK, here is more helpful code :) It works on UNIX; according to the perlfork perldoc, it should work on Windows exactly the same way.
use warnings;
use strict;
use POSIX ":sys_wait_h";

my $exited_cleanly;                 # to this variable I will save the info about exiting

my $pid = fork;
if (!$pid) {
    system("anything_long.exe");    # your long-running program
    exit;                           # don't let the child fall through to the code below
} else {
    sleep 10;                       # wait 10 seconds (can be longer)
    my $result = waitpid(-1, WNOHANG);       # here will be the result
    if ($result == 0) {             # the program is still running
        $exited_cleanly = 0;        # I already know I had to kill it
        kill('TERM', $pid);         # kill it with TERM ("cleaner") first
        sleep(1);                   # wait a bit to see if it ends
        my $result_term = waitpid(-1, WNOHANG);
        # did it end?
        if ($result_term == 0) {    # if it still didn't...
            kill('KILL', $pid);     # kill it with full force!
        }
    } else {
        $exited_cleanly = 1;        # it exited cleanly
    }
}
# You can now say something to the user, for example:
if (!$exited_cleanly) { ... }
system("start testapp")
is short for
system("cmd", "/c", "start testapp")
Perl just knows about cmd; it doesn't know anything about start, much less about testapp. system is not the tool you want. That's the first problem.
The second problem is that you haven't defined what it means to be "stuck". If you want to monitor a program, it needs a heartbeat. A heartbeat is a periodic activity that can be externally examined. It can be writing to a pipe. It can be changing a file. Anything.
The monitoring program listens for this heartbeat, and presumes the program is dead if the heart stops beating, so to speak.
"Killing" is done using signals in unix, but it's done using TerminateProcess in Windows. The third problem is that Perl core does not give you access to that function.
The solution to the first and third problem is Win32::Process. It allows you to launch a process in the background, and it also allows you to terminate it.
Creating a heartbeat is up to you.
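A minimal sketch of the launch-and-terminate half (the executable path and the 30-second budget here are assumptions for illustration; the heartbeat is still up to you):

use Win32;
use Win32::Process;

my $proc;
Win32::Process::Create($proc, 'C:\\full\\path\\to\\testapp.exe',
                       'testapp', 0, NORMAL_PRIORITY_CLASS, '.')
    or die Win32::FormatMessage(Win32::GetLastError());

if ($proc->Wait(30_000)) {    # wait up to 30 seconds for it to exit
    print "testapp exited on its own\n";
} else {
    $proc->Kill(1);           # TerminateProcess, with exit code 1
    print "testapp looked stuck, so it was terminated\n";
}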
Here is one way you can handle the problem: if you know that testapp should not take more than N seconds to do its thing, you can use a timeout to kill the app by way of IPC::Run.
In the example below there is a timeout of 1 second, which kills the sleep 10 command because it takes longer than the 1-second timeout. If this doesn't do what you want, then you should provide more information about how you can detect that testapp.exe is "stuck".
#!/usr/bin/env perl
use IPC::Run qw( run timeout );

my ($in, $out, $err);
eval {    # if (stuck with testapp.exe for more than N seconds)
    my @cmd = ('sleep', '10');   # this could be testapp.exe instead of sleep
    run \@cmd, \$in, \$out, \$err, timeout( 1 ) or die "test";   # start testapp.exe
    print "do stuff if cmd succeeds\n";
};
print "more stuff to do afterwards whether or not the command fails or succeeds\n";
You can't determine that the application is stuck if you execute it like that, because the system statement won't return until the application terminates.
So, at least, you need to start the test application so it can run asynchronously from the Perl script that is to monitor it.
Having resolved that part of the problem, you have to establish a mechanism that will allow the monitoring Perl script to determine that the application is stuck. That is a non-trivial exercise, and likely system dependent, unless you adopt a simple expedient such as requiring the application to write a heart-beat indication somewhere, and the Perl script monitors for the heart-beat. For example (not necessarily a good example), the application could write the current time into a file identified by its PID, and the Perl script could monitor the file to see if the heart-beat is sufficiently recent. Of course, this assumes that the 'infinite loop' doesn't include code that writes to the heart-beat file.
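For example, a minimal sketch of the monitoring side under that scheme, assuming the monitor already knows the application's $pid; the file path and the 30-second threshold are made up for illustration, and the file's mtime is checked in place of parsing the timestamp it contains:

my $heartbeat_file = "/tmp/testapp.$pid.heartbeat";   # written by the application
my $max_age        = 30;    # seconds of silence before we call it stuck

my $mtime = (stat $heartbeat_file)[9];
if ( !defined $mtime || time() - $mtime > $max_age ) {
    kill 'KILL', $pid;    # the heartbeat stopped, so put the app down
}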
We have a script on an FTP endpoint that monitors the FTP logs spewed out by our FTP daemon.
Currently what we do is have a Perl script that essentially runs a tail -F on the file and sends every single line into a remote MySQL database, with slightly different column content based on the record type.
This database has tables for both the tarball names/content and the FTP user actions on those packages: downloads, deletes, and everything else VSFTPd logs.
I see this as particularly bad, but I'm not sure what's better.
The goal of a replacement is still to get log-file content into the database as quickly as possible. I'm thinking of putting a FIFO/pipe in place of the FTP log file, so I can read it in periodically and ensure I never read the same thing twice. That assumes VSFTPd will play nice with a FIFO (I'm thinking it won't; insight welcome!).
The FTP daemon is VSFTPd; I'm at least fairly sure the extent of its logging capabilities is: an xferlog-style log, a vsftpd-style log, both, or no logging at all.
The question is, what's better than what we're already doing, if anything?
Honestly, I don't see much wrong with what you're doing now. tail -f is very efficient. The only real problem with it is that it loses state if your watcher script ever dies (which is a semi-hard problem to solve with rotating logfiles). There's also a nice File::Tail module on CPAN that saves you from shelling out and has some nice customization available.
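A minimal sketch of the File::Tail variant; the log path and tuning numbers are assumptions:

use File::Tail;

my $tail = File::Tail->new(
    name        => '/var/log/vsftpd.log',
    maxinterval => 5,    # poll at most every 5 seconds
    adjustafter => 7,
);
while (defined(my $line = $tail->read)) {
    # parse $line and insert it into the database here
}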
Using a FIFO as a log can work (as long as vsftpd doesn't try to unlink and recreate the logfile at any point, which it may do), but I see one major problem with it. If no one is reading from the other end of the FIFO (for instance, if your script crashes or was never started), then a short time later all writes to the FIFO will start blocking. I haven't tested this, but it's quite likely that blocking logfile writes would cause the entire server to hang. Not a very pretty scenario.
Your problem with reading a continually updated file is that you want to keep reading even after the end of file is reached. The solution is to re-seek to your current position in the file:
seek FILEHANDLE, 0, 1;
Here is my code for doing this sort of thing:
open(FILEHANDLE, '<', '/var/log/file') || die 'Could not open log file';
seek(FILEHANDLE, 0, 2) || die 'Could not seek to end of log file';
for (;;) {
    while (<FILEHANDLE>) {
        if ( $_ =~ /monitor status down/ ) {
            print "Gone down\n";
        }
    }
    sleep 1;
    seek FILEHANDLE, 0, 1;    # clear eof
}
You should look into inotify (assuming you are on a nice, POSIX-based OS) so you can run your Perl script whenever the logfile is updated. If this level of I/O causes problems, you could always keep the logfile on a RAM disk so I/O is very fast.
This should help you set this up:
http://www.cyberciti.biz/faq/linux-inotify-examples-to-replicate-directories/
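A minimal sketch using the Linux::Inotify2 module from CPAN (the module choice and log path are assumptions, and this is Linux-only):

use Linux::Inotify2;

my $inotify = Linux::Inotify2->new
    or die "unable to create inotify object: $!";

$inotify->watch('/var/log/vsftpd.log', IN_MODIFY, sub {
    my $event = shift;
    # the file grew: read the new lines and ship them to the database
    print $event->fullname, " was modified\n";
});

1 while $inotify->poll;   # block until events arrive, then dispatch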
You can open the file as an input pipe:
open(my $file, '-|', '/ftp/file/path');
while (<$file>) {
    # you know the rest
}
File::Tail does this, plus heuristic sleeping and nice error handling and recovery.
Edit: On second thought, a real system pipe is better if you can manage it. If not, whenever your process starts you need to find the last thing you put in the database and spin through the file until you reach it. That is not easy to accomplish, and potentially impossible if you have no way of identifying where you left off.