Will data in a pipe queue up for reading by Perl?

I have a Perl script that executes a long running process and observes its command line output (log messages), some of which are multiple lines long. Once it has a full log message, it sends it off to be processed and grabs the next log message.
open(PS_F, "run.bat |") or die $!;
$logMessage = "";
while (<PS_F>) {
    $lineRead = $_;
    if ($lineRead =~ m!(\d{4}-\d{2}-\d{2}\ \d{2}:\d{2}:\d{2})!) {
        #process the previous log message
        $logMessage = $lineRead;
    }
    else {
        $logMessage = $logMessage.$_;
    }
}
close(PS_F);
In its current form, do I have to worry about the line reading and processing "backing up"? For example, if I get a new log message every 1 second and it takes 5 seconds to do all the processing (random numbers I pulled out), do I have to worry that I will miss log messages or have memory issues?

In general, data written to a pipe by one application will be buffered if the next application cannot consume it fast enough. If the buffer fills up, the writing application blocks (i.e. calls that write to the output file handle simply stall) until the consumer catches up. I believe the pipe buffer on Linux is (or was) 65536 bytes.
In this fashion, you can never run out of memory, but you can seriously stall the producer application in the pipeline.
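A quick way to see the blocking for yourself, as a minimal sketch (it assumes a Unix-like system where the yes command is available; any fast producer and slow consumer would do):
open my $producer, '-|', 'yes', 'a line of output'
    or die "Cannot start producer: $!";
while (my $line = <$producer>) {
    # Slow consumer: one line per second. The producer fills the pipe
    # buffer (64 KB on Linux) almost immediately and then blocks in its
    # write() until we drain some of it -- nothing is lost and the
    # consumer's memory use stays flat.
    sleep 1;
    print "consumed: $line";
}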

No you will not lose messages. The writing end of the pipe will block if the pipe buffer is full.

Strictly speaking, this should be a comment: Please consider re-writing your code as
# use lexical filehandle and 3 arg open
open my $PS_F, '-|', 'run.bat'
    or die "Cannot open pipe to 'run.bat': $!";

# explicit initialization not needed
# limit scope
my $logMessage;

while (<$PS_F>) {
    # you probably meant to anchor the pattern
    # and no need to capture if you are not going to use
    # captured matches
    # there is no need to escape a space, although you might
    # want to use [ ] for clarity
    $logMessage = '' if m!^\d{4}-\d{2}-\d{2}[ ]\d{2}:\d{2}:\d{2}!;
    $logMessage .= $_;
}

close $PS_F
    or die "Cannot close pipe: $!";

Related

How to print before a while loop is started in Perl?

I have this code in Perl:
print "Processing ... ";
while ( some condition ) {
# do something over than 10 minutes
}
print "OK\n";
Now I get back the first print after the while loop is finished.
How can I print the messeage before the while loop is started?
Output is buffered, meaning the program decides when it actually writes out what you printed. You can put
$| = 1;
at the top to turn on autoflush for the currently selected output handle (normally STDOUT). For more methods (autoflush on other handles, explicit flushing, etc.) you can search around SO for questions about this.
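Applied to the code in the question, a minimal sketch (some_condition() is just a stand-in for whatever the real loop tests):
$| = 1;                     # autoflush the currently selected handle (STDOUT)

print "Processing ... ";    # now appears immediately
while ( some_condition() ) {
    # do something that takes over 10 minutes
}
print "OK\n";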
Ordinarily, perl will buffer up to 8KB of output text before flushing it to the device, or up to the next newline if the device is a terminal. You can avoid this by adding
STDOUT->autoflush;
to the top of your code, assuming that you are printing to STDOUT. This will force the data to be flushed after every print, say or write operation.
Note that this is the same as using $| = 1, but it is significantly less cryptic and lets you change the properties of any given file handle.
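For example, a sketch (the explicit use IO::Handle is only needed on older perls, where method calls on bare filehandles don't load it automatically, and is harmless otherwise):
use strict;
use warnings;
use IO::Handle;

STDOUT->autoflush;          # same effect as $| = 1 for STDOUT

print "Processing ... ";    # shows up right away
# ... long-running work ...
print "OK\n";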
You can see the prints by flushing the buffers immediately after.
print "Processing ... ";
STDOUT->flush;
If you are using autoflush, you should save the current configuration by duplicating the file handle.
use autodie;   # dies if error on open and close

{
    STDOUT->flush;   # empty its buffer
    open my $saved_stdout, '>&', \*STDOUT;
    STDOUT->autoflush;

    # ... output with autoflush active

    open STDOUT, '>&', $saved_stdout;   # restore old STDOUT
}
See perldoc -f open and search for /\>\&/

perl File::Tail not reading lines from file after certain period

I am having trouble understanding why File::Tail fails to read lines from a constantly updated file with thousands of transactions and automatic rollover.
It reads correctly up to a point, but then slows down, and for long periods it is not able to read the lines in the log at all. I can confirm that the log is actually being written to while File::Tail shows nothing.
my $file = File::Tail->new(name => "$name",
                           tail => 1000,
                           maxinterval => 1,
                           interval => 1,
                           adjustafter => 5,
                           resetafter => 1,
                           ignore_nonexistant => 1,
                           maxbuf => 32768);
while (defined(my $line = $file->read)) {
    #my $line=$file->read;
    my $xml_string = "";
    #### Read only one event per line and deal with the XML.
    #### If the log entry is not a SLIM log, I will ignore it.
    if ($line =~ /(\<slim\-log\>.*\<\/slim\-log\>)/) {
        do_something();
    } else {
        # not_working for some reason
    }
}
Can someone please help me understand this? Note that this log file grows at roughly 10 MB per second, or about 1000 events per second.
Should I be handling the filehandle or the File::Tail results in some other, more efficient way?
It seems there are limitations in File::Tail. There are suggestions about other, more direct options (a pipe, a fork, a thread, seeking to the end of the file in Perl) discussed at http://www.perlmonks.org/?node_id=387706.
My favorite pick is the blocking read from a pipe:
open(TAIL, "tail -F $name|") or die "TAIL : $!";
while (<TAIL>) {
    test_and_do_something
}
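A slightly more defensive variant of the same idea, as a sketch (it assumes a Unix-like tail that supports -F; the list form of the piped open bypasses the shell, so whitespace or metacharacters in $name cannot cause trouble):
open my $tail_fh, '-|', 'tail', '-F', $name
    or die "Cannot start tail -F on $name: $!";
while (my $line = <$tail_fh>) {
    # test_and_do_something($line);   # placeholder for the real processing
}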

Print a file until certain characters found

My script is fetching data from a .txt file.
I need to print the data until I find "MSISDN1" in that file.
The code I have executed is:
open (SERVICE, "Service Req.txt") || die "Cannot open service req file $!\n";
my @serv = <SERVICE>;
open(LOG, ">>logfile.txt");
foreach $ser_req (@serv) {
    #until ($ser_req =~ m/MSISDN1/g)
    {
        print $conn $ser_req;
        print LOG $ser_req;
        print $ser_req;
    }
}
close(LOG);
close (SERVICE) || die "Cannot close service req file $!\n";
The code does not run well when the until condition is uncommented.
There is a small error in the code: the until loop is not what you want here.
until starts a loop. You probably want
last if $ser_req =~ /MSISDN1/;
instead of the (until) loop. You'll need to balance your braces { ... } too.
You could sensibly close SERVICE immediately after slurping the file into memory. It's a good idea to release resources such as files as quickly as you can. If you decide not to slurp the whole file (which would probably be better, especially if the marker text occurs near the beginning of big files), then you replace the foreach loop with while (<SERVICE>) or something similar.
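Putting those suggestions together, a sketch (it keeps the asker's $conn handle, whatever that is connected to, and switches to lexical filehandles):
open my $service_fh, '<', 'Service Req.txt'
    or die "Cannot open service req file $!\n";
open my $log_fh, '>>', 'logfile.txt'
    or die "Cannot open logfile $!\n";

while (my $ser_req = <$service_fh>) {
    last if $ser_req =~ /MSISDN1/;    # stop once the marker line is seen
    print {$conn}   $ser_req;
    print {$log_fh} $ser_req;
    print $ser_req;
}

close $log_fh;
close $service_fh
    or die "Cannot close service req file $!\n";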

Capturing command-line output on win32 that hasn't been flushed yet

(Context: I'm trying to monitor a long-running process from a Perl CGI script. It backs up an MSSQL database and then 7-zips it. So far, the backup part (using WITH STATS=1) outputs to a file, which I can have the browser look at, refreshing every few seconds, and it works.)
I'm trying to use 7zip's command-line utility but capture the progress bar to a file. Unfortunately, unlike SQL backups, where every time another percent is done it outputs another line, 7zip rewinds its output before outputting the new progress data, so that it looks nicer if you're just using it normally on the command-line. The reason this is unfortunate is that normal redirects using >, 1>, and 2> only create a blank file, and no output ever appears in it, except for >, which has no output until the job is done, which isn't very useful for a progress bar.
How can I capture this kind of output, either by having every change in % somehow be appended to a logfile (so I can use my existing method of logfile monitoring) just using command-line trickery (no Perl), or by using some Perl code to capture it directly after calling system()?
If you need to capture the output all at once then this is the code you want:
$var = `cmd`;    # backticks run cmd and return everything it printed
If you want to read the output line by line then you need this code:
#! perl -slw
use strict;
use threads qw[ yield async ];
use threads::shared;

my( $cmd, $file ) = @ARGV;

my $done : shared = 0;
my @lines : shared;

async {
    my $pid = open my $CMD, "$cmd |" or die "$cmd : $!";
    open my $fh, '>', $file or die "$file : $!";
    while( <$CMD> ) {
        chomp;
        print $fh $_;       ## output to the file
        push @lines, $_;    ## and push it to a shared array
    }
    $done = 1;
}->detach;

my $n = 0;
while( !$done or @lines ) {        ## keep draining until the reader thread is finished
    if( @lines ) {                 ## lines to be processed
        print shift @lines;        ## process them in arrival order
    }
    else {
        ## Else nothing to do but wait.
        yield;
    }
}
Another option is the Windows CreateProcess API. The C/C++ CreateProcess call will let you redirect all of stdout, and Perl has access to the same functionality: see Win32::Process.
You can try opening a pipe to read 7zip's output.
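Since the progress bar is redrawn in place, the updates are most likely separated by carriage returns rather than newlines, so one sketch is to read the pipe with the input record separator set to "\r" (the 7za command line and log path are placeholders, and 7za may still suppress or block-buffer its progress output when it is not writing to a console):
use strict;
use warnings;
use IO::Handle;

open my $log, '>>', 'progress.log' or die "progress.log: $!";
$log->autoflush(1);

open my $zip, '-|', '7za a archive.7z backup.bak'    # placeholder command
    or die "Cannot run 7za: $!";

local $/ = "\r";             # each redraw of the progress line becomes one "record"
while (my $chunk = <$zip>) {
    chomp $chunk;            # strips the trailing "\r"
    print {$log} "$chunk\n" if $chunk =~ /\d+%/;     # append each percentage update
}
close $zip;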
This doesn't answer how to capture output that gets rewound, but it was a useful way of going about it that I ended up using.
For restores:
use 7za l to list the files in the zip file and their sizes
fork 7za e using open my $command
track each file as it comes out with -s $filename and compare to the listing
when all output files are their full size, you're done
For backups:
create a unique dir somewhere
fork 7za a -w
find the .tmp file in the dir
track its size
when the .tmp file no longer exists, you're done
For restores you get enough data to show a percentage done, but for backups you can only show the total file size so far; you could compare that with historical ratios, if you're backing up similar data, to get a guesstimate. Still, it's more feedback than before (none).
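A sketch of the restore-tracking idea (the %expected_size hash stands in for whatever you parse out of the 7za l listing, and the archive name, output directory and polling interval are made up):
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical sizes, collected beforehand from the `7za l` listing.
my %expected_size = (
    'C:/restore/db_backup.bak' => 1_234_567_890,
);
my $total = sum values %expected_size;

# Start the extraction in the background via a piped open (this forks 7za).
# For archives with many files you would also want to drain $sevenzip's
# output so the pipe buffer cannot fill up and stall it.
open my $sevenzip, '-|', '7za e archive.7z -oC:/restore -y'
    or die "Cannot run 7za: $!";

my $done = 0;
while ($done < $total) {
    $done = sum map { -s $_ || 0 } keys %expected_size;
    printf "%.1f%% restored\n", 100 * $done / $total;
    sleep 2;
}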

How do I read a file which is constantly updating?

I am getting a stream of data (text format) from an external server and would like to pass it on to a script line by line. The file is being appended to continuously. What is the ideal method to perform this operation? Will the IO::Socket approach in Perl do? Eventually this data has to pass through a PHP program (reusable) and end up in a MySQL database.
The question is: how do I open the file while it is continuously being updated?
In Perl, you can make use of seek and tell to read from a continuously growing file. It might look something like this (borrowed liberally from perldoc -f seek)
open(FH, '<', $the_file) || handle_error();  # typical open call
for (;;) {
    while (<FH>) {
        # ... process $_ and do something with it ...
    }
    # eof reached on FH, but wait a second and maybe there will be more output
    sleep 1;
    seek FH, 0, 1;    # this clears the eof flag on FH
}
In Perl there are a couple of modules that make tailing a file easier: IO::Tail and File::Tail. One uses a callback, the other a blocking read, so it just depends on which suits your needs better. There are likely other tailing modules as well, but these are the two that came to mind.
IO::Tail - follow the tail of files/stream
use IO::Tail;
my $tail = IO::Tail->new();
$tail->add('test.log', \&callback);
$tail->check();
$tail->loop();
File::Tail - Perl extension for reading from continuously updated files
use File::Tail;
my $file = File::Tail->new("/some/log/file");
while (defined(my $line = $file->read)) {
    print $line;
}
Perhaps a named pipe would help you?
You talk about opening a file, and ask about IO::Socket. These aren't quite the same things, even if deep down you're going to be reading data off a file descriptor.
If you can access the remote stream from a named pipe or FIFO, then you can just open it as an ordinary file. It will block when nothing is available, and return whenever there is data that needs to be drained. You may, or may not, need to bring File::Tail to bear on the problem of not losing data if the sender runs too far ahead of you.
On the other hand, if you're opening a socket directly to the other server (which seems more likely), IO::Socket is not going to work out of the box as there is no getline method available. You would have to read and buffer block-by-block and then dole it out line by line through an intermediate holding pen.
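A minimal sketch of that "holding pen" approach, assuming a plain TCP connection (the host, port and processing are placeholders):
use strict;
use warnings;
use IO::Socket::INET;

my $sock = IO::Socket::INET->new(
    PeerAddr => 'example.com',   # placeholder host
    PeerPort => 1337,            # placeholder port
    Proto    => 'tcp',
) or die "Cannot connect: $!";

my $pending = '';
while (sysread($sock, my $chunk, 4096)) {
    $pending .= $chunk;
    # Hand out complete lines; keep any partial line for the next read.
    while ($pending =~ s/^(.*?)\n//) {
        my $line = $1;
        # ... process $line ...
    }
}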
You could pull out the socket descriptor into an IO::Handle, and use getline() on that. Something like:
my $sock = IO::Socket::INET->new(
    PeerAddr => '172.0.0.1',
    PeerPort => 1337,
    Proto    => 'tcp'
) or die $!;

my $io = IO::Handle->new;
$io->fdopen(fileno($sock), "r") or die $!;

while (defined( my $data = $io->getline() )) {
    chomp $data;
    # do something
}
You may have to perform a handshake in order to start receiving packets, but that's another matter.
In Python it is pretty straightforward:
import time

f = open('teste.txt', 'r')

# read all lines already in the file
for line in f:
    print line.strip()

# keep waiting forever for more lines.
while True:
    line = f.readline()      # just read more
    if line:                 # if you got something...
        print 'got data:', line.strip()
    time.sleep(1)            # wait a second to not fry the CPU needlessly
The solutions that read the whole file just to seek to the end are performance-unwise. If this is happening under Linux, I would suggest simply renaming the log file. Then you can scan all the entries in the renamed file while the original file fills up again. After scanning the renamed file, delete it, or move it wherever you like. This way you get something like logrotate, but for scanning newly arriving data.
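A sketch of that rotate-and-scan idea (the paths and processing are placeholders; note that it only works if the writer reopens the log by name after rotation, the way syslog-style daemons do, rather than holding one file handle open forever):
use strict;
use warnings;

my $log      = '/var/log/app/stream.log';   # placeholder path
my $snapshot = "$log.scan.$$";

# Move the current data aside atomically; a writer that reopens by name
# will start a fresh file at the original path.
rename $log, $snapshot or die "Cannot rename $log: $!";

open my $fh, '<', $snapshot or die "Cannot open $snapshot: $!";
while (my $line = <$fh>) {
    # ... process $line ...
}
close $fh;

unlink $snapshot or warn "Cannot remove $snapshot: $!";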