A Perl program uses IPC::Run to pipe a file through a series of commands determined at runtime and into another file, as this small test excerpt demonstrates:
#!/usr/bin/perl
use IO::File;
use IPC::Run qw(run);
open (my $in, 'test.txt');
my $out = IO::File->new_tmpfile;
my @args = ( [ split / /, shift ], "<", $in); # this code
while ($#ARGV >= 0) { # extracted
push @args, "|", [ split / /, shift ]; # verbatim
} # from the
push @args, ">pipe", $out; # program
print "Running...";
run @args or die "command failed ($?)";
print "Done\n";
It builds the pipeline from commands given as arguments; the test file is hard-coded. The problem is that the pipeline hangs if the file is bigger than 64 KiB. Here is a demonstration that uses cat in the pipeline to keep things simple. First, a 64 KiB (65536-byte) file works as expected:
$ dd if=/dev/urandom of=test.txt bs=1 count=65536
65536 bytes (66 kB, 64 KiB) copied, 0.16437 s, 399 kB/s
$ ./test.pl cat
Running...Done
Next, one byte more. The call to run never returns...
$ dd if=/dev/urandom of=test.txt bs=1 count=65537
65537 bytes (66 kB, 64 KiB) copied, 0.151517 s, 433 kB/s
$ ./test.pl cat
Running...
With IPCRUNDEBUG enabled, plus a few more cats, you can see it's the last child that doesn't exit:
$ IPCRUNDEBUG=basic ./test.pl cat cat cat cat
Running...
...
IPC::Run 0000 [#1(3543608)]: kid 1 (3543609) exited
IPC::Run 0000 [#1(3543608)]: 3543609 returned 0
IPC::Run 0000 [#1(3543608)]: kid 2 (3543610) exited
IPC::Run 0000 [#1(3543608)]: 3543610 returned 0
IPC::Run 0000 [#1(3543608)]: kid 3 (3543611) exited
IPC::Run 0000 [#1(3543608)]: 3543611 returned 0
(with a file under 64KiB you see all four exit normally)
How can this be made to work for files of any size?
(Perl 5, version 30, subversion 3 (v5.30.3) built for x86_64-linux-thread-multi, tried on Alpine Linux, the target platform, and Arch Linux to rule out Alpine as a cause)
You have a deadlock.
Consider using one of the following instead:
run [ 'cat' ], '<', $in_fh, '>', \my $captured;
# Do something with the captured output in $captured.
or
my $receiver = sub {
# Do something with the chunk in $_[0].
};
run [ 'cat' ], '<', $in_fh, '>', $receiver;
For example, the following receiver processes lines as they come in:
my $buffer = '';
my $receiver = sub {
$buffer .= $_[0];
while ($buffer =~ s/^(.*)\n//) {
process_line("$1");
}
};
run [ 'cat' ], '<', $in_fh, '>', $receiver;
die("Received partial line") if length($buffer);
Here is an example that does not deadlock but still uses the >pipe output handle. I would not recommend this complicated approach for your use case; instead, consider the approach suggested by @ikegami.
The problem is that the >pipe handle is never read from. cat tries to write to the >pipe handle but it gets filled up (since no one reads from it) and the cat process blocks when the pipe content reaches 64 KiB which is the capacity of a pipe on Linux. Now the IPC::Run::finish() process is waiting for the child cat process to exit, but at the same time the cat process is waiting for the parent to read from its pipe so we have a deadlock situation.
To avoid this situation, we can use IPC::Run::start() instead of IPC::Run::run():
use feature qw(say);
use strict;
use warnings;
use constant READ_BUF_SIZE => 8192;
use Errno qw( EAGAIN );
use IO::Select;
use IPC::Run qw();
use Symbol 'gensym';
my $outfile = 'out.txt';
open (my $out, '>', $outfile) or die "Could not open file '$outfile': $!";
my $h = IPC::Run::start ['cat'], '<', 'test.txt', '>pipe', my $pipeout = gensym;
my $select = IO::Select->new( $pipeout );
my $data = '';
my $read_offset = 0;
while (1) {
my @ready = $select->can_read;
last if !@ready;
for my $fh (@ready) {
my $bytes_read = sysread $fh, $data, READ_BUF_SIZE, $read_offset;
say "Read $bytes_read bytes..";
if ( !defined $bytes_read ) {
die "sysread failed: $!" if $! != EAGAIN;
$bytes_read = 0;
}
elsif ( $bytes_read == 0 ) {
say "Removing pipe handle from select loop";
$select->remove( $fh );
close $fh;
}
$read_offset += $bytes_read;
}
}
say "Saving data to file..";
print $out $data; #Save data to file
close $out;
say "Finishing harness..";
IPC::Run::finish $h or die "cat returned $?";
say "Done.";
Related
I am trying to improve the warning message issued by Encode::decode(). Instead of printing the name of the module and the line number in the module, I would like it to print the name of the file being read and the line number in that file where the malformed data was found. To a developer, the original message can be useful, but to an end user not familiar with Perl, it is probably quite meaningless. The end user would probably rather like to know which file is giving the problem.
I first tried to solve this using a $SIG{__WARN__} handler (which is probably not a good idea), but I get a segfault. Probably a silly mistake, but I could not figure it out:
#! /usr/bin/env perl
use feature qw(say);
use strict;
use warnings;
use Encode ();
binmode STDOUT, ':utf8';
binmode STDERR, ':utf8';
my $fn = 'test.txt';
write_test_file( $fn );
# Try to improve the Encode::FB_WARN fallback warning message :
#
# utf8 "\xE5" does not map to Unicode at <module_name> line xx
#
# Rather we would like the warning to print the filename and the line number:
#
# utf8 "\xE5" does not map to Unicode at line xx of file <filename>.
my $str = '';
open ( my $fh, "<:encoding(utf-8)", $fn ) or die "Could not open file '$fn': $!";
{
local $SIG{__WARN__} = sub { my_warn_handler( $fn, $_[0] ) };
$str = do { local $/; <$fh> };
}
close $fh;
say "Read string: '$str'";
sub my_warn_handler {
my ( $fn, $msg ) = @_;
if ( $msg =~ /\Qdoes not map to Unicode\E/ ) {
recover_line_number_and_char_pos( $fn, $msg );
}
else {
warn $msg;
}
}
sub recover_line_number_and_char_pos {
my ( $fn, $err_msg ) = @_;
chomp $err_msg;
$err_msg =~ s/(line \d+)\.$/$1/; # Remove period at end of sentence.
open ( $fh, "<:raw", $fn ) or die "Could not open file '$fn': $!";
my $raw_data = do { local $/; <$fh> };
close $fh;
my $str = Encode::decode( 'utf-8', $raw_data, Encode::FB_QUIET );
my ($header, $last_line) = $str =~ /^(.*\n)([^\n]*)$/s;
my $line_no = $str =~ tr/\n//;
++$line_no;
my $pos = ( length $last_line ) + 1;
warn "$err_msg, in file '$fn' (line: $line_no, pos: $pos)\n";
}
sub write_test_file {
my ( $fn ) = @_;
my $bytes = "Hello\nA\x{E5}\x{61}"; # 2 lines ending in iso 8859-1: åa
open ( my $fh, '>:raw', $fn ) or die "Could not open file '$fn': $!";
print $fh $bytes;
close $fh;
}
Output:
utf8 "\xE5" does not map to Unicode at ./p.pl line 27
, in file 'test.txt' (line: 2, pos: 2)
Segmentation fault (core dumped)
Here is another way to locate where the warning fires, with un-buffered sysread
use warnings;
use strict;
binmode STDOUT, ':utf8';
binmode STDERR, ':utf8';
my $file = 'test.txt';
open my $fh, "<:encoding(UTF-8)", $file or die "Can't open $file: $!";
$SIG{__WARN__} = sub { print "\t==> WARN: @_" };
my $char_cnt = 0;
my $char;
while (sysread($fh, $char, 1)) {
++$char_cnt;
print "$char ($char_cnt)\n";
}
The file test.txt was written by the posted program, except that I had to add to it to reproduce the behavior -- it runs without warnings on v5.10 and v5.16. I added \x{234234} to the end. The line number can be tracked with $char =~ /\n/.
The sysread returns undef on error. It can be moved into the body of while (1) to allow reads to continue and catch all warnings, breaking out on 0 (returned on EOF).
This prints
H (1)
e (2)
l (3)
l (4)
o (5)
(6)
A (7)
å (8)
a (9)
==> WARN: Code point 0x234234 is not Unicode, may not be portable at ...
(10)
While this does catch the character warned about, re-reading the file using Encode may well be better than reaching for sysread, in particular if sysread uses Encode.
However, Perl is utf8 internally and I am not sure that sysread needs Encode.
Note. The page for sysread supports its use on data with encoding layers
Note that if the filehandle has been marked as :utf8 Unicode
characters are read instead of bytes (the LENGTH, OFFSET, and the
return value of sysread are in Unicode characters). The
:encoding(...) layer implicitly introduces the :utf8 layer.
See binmode, open, and the open pragma.
Note: Apparently things have moved on, and after a certain version sysread no longer supports encoding layers. The page linked above, for an older version (v5.10 for one), indeed shows what is quoted; for newer versions it instead says that an exception is raised.
Good afternoon,
I have a small Perl script that essentially emulated the tail -f functionality on a Minecraft server.log. The script checks for certain strings and acts in various ways.
A simplified version of the script is below:
#!/usr/bin/perl
use 5.010;
use warnings;
use strict;
my $log = "PATH TO LOG";
my $curpos;
open(my $LOGFILE, $log) or die "Cannot open log file";
# SEEK TO EOF
seek($LOGFILE, 0, 2);
for (;;){
my $line = undef;
seek($LOGFILE,0,1); ### clear OF condition
for($curpos = tell($LOGFILE); <$LOGFILE>; $curpos = tell($LOGFILE)){
$line = "$_ \n";
if($line =~ /test string/i){
say "Found test string!";
}
}
sleep 1;
seek($LOGFILE,$curpos,0); ### Setting cursor at the EOF
}
When I had a test server up, everything seemed to work fine. In production, the server.log file gets rotated. When the log is rotated, the script keeps hold of the original file, not the file that replaces it.
I.e. server.log is being monitored, server.log gets moved and compressed to logs/date_x.log.gz, server.log is now a new file.
How can I adapt my script to monitor the filename "server.log", rather than the file that is currently called "server.log"?
Have you considered just using tail -F as the input to your script:
tail -F server.log 2>/dev/null | perl -nE 'say if /match/'
This will watch the named file, passing each line to your script on STDIN. It will correctly track only server.log, as shown below:
echo 'match' >server.log
(matched by script)
mv server.log server.log.old
echo 'match' >server.log
(also matched)
You can open the tail -F as a file in Perl using:
open(my $fh, '-|', 'tail -F server.log 2>/dev/null') or die "$!\n";
You can check inode numbers for both, file name and file handle using stat(). If they differ, log file was rotated and file should be reopened for reading.
$readline is an iterator (obtained via get_readline($file_name)) which transparently takes care of such changes and does the right thing.
use strict;
use warnings;
sub get_readline {
my ($fname) = @_;
my $fh;
return sub {
my ($i1, $i2) = map { $_ ? (stat $_)[1] : 0 } $fh, $fname;
if ($i1 != $i2) {
undef $fh;
open $fh, "<", $fname or return;
}
# reset handle to current position
seek($fh, 0, 1) or die $!;
return wantarray ? <$fh> : scalar <$fh>;
};
}
`seq 11 > log_file`;
my $readline = get_readline("log_file");
print "[regular reading]\n";
print $readline->();
print "[any new content?]\n";
print $readline->();
`rm log_file; seq 11 > log_file`;
print "[reading after log rotate]\n";
print $readline->();
output
[regular reading]
1
2
3
4
5
6
7
8
9
10
11
[any new content?]
[reading after log rotate]
1
2
3
4
5
6
7
8
9
10
11
I am reading a file, containing integers using the "33441122" byte order. How can I convert the file to the "11223344" (big endian) byte order? I have tried a few things, but I am really lost.
I have read a lot about Perl, but when it comes to swapping bytes, I'm in the dark. How can I convert this:
33 44 11 22
into this:
11 22 33 44
using Perl.
Any input would be greatly appreciated :)
You can read 4 bytes at a time, split them into individual bytes, swap them, and write them out again:
#! /usr/bin/perl
use strict;
use warnings;
open(my $fin, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
binmode($fin);
open(my $fout, '>', $ARGV[1]) or die "Cannot create $ARGV[1]: $!";
binmode($fout);
my $bytes_in;
my $n;
while (($n = read($fin, $bytes_in, 4)) == 4) {
my @c = split('', $bytes_in);
my $bytes_out = join('', $c[2], $c[3], $c[0], $c[1]);
print $fout $bytes_out;
}
if ($n > 0) {
print $fout $bytes_in;
}
close($fout);
close($fin);
This will be called on the command line as
perl script.pl infile.bin outfile.bin
outfile.bin will be overwritten.
I think the best way is to read two bytes at a time and swap them before outputting them.
This program creates a data file test.bin, and then reads it in, swapping the bytes as described.
use strict;
use warnings;
use autodie;
open my $fh, '>:raw', 'test.bin';
print $fh "\x34\x12\x78\x56";
open my $out, '>:raw', 'new.bin';
open $fh, '<:raw', 'test.bin';
while (my $n = read $fh, my $buff, 2) {
$buff = reverse $buff if $n == 2;
print $out $buff;
}
I want to print certain lines from a text file in Unix. The line numbers to be printed are listed in another text file (one on each line).
Is there a quick way to do this with Perl or a shell script?
Assuming the line numbers to be printed are sorted.
open my $fh, '<', 'line_numbers' or die $!;
my @ln = <$fh>;
open my $tx, '<', 'text_file' or die $!;
foreach my $ln (@ln) {
my $line;
do {
$line = <$tx>;
} until $. == $ln and defined $line;
print $line if defined $line;
}
$ cat numbers
1
4
6
$ cat file
one
two
three
four
five
six
seven
$ awk 'FNR==NR{num[$1];next}(FNR in num)' numbers file
one
four
six
You can avoid the limitations of some of the other answers (the requirement that the line numbers be sorted) simply by using eof within a basic while(<>) block. That tells you when you've stopped reading line numbers and started reading data. Note that you need to reset $. when the switch occurs.
# Usage: perl script.pl LINE_NUMS_FILE DATA_FILE
use strict;
use warnings;
my %keep;
my $reading_line_nums = 1;
while (<>){
if ($reading_line_nums){
chomp;
$keep{$_} = 1;
$reading_line_nums = $. = 0 if eof;
}
else {
print if exists $keep{$.};
}
}
cat -n foo | join foo2 - | cut -d" " -f2-
where foo is your file with lines to print and foo2 is your file of line numbers
Here is a way to do this in Perl without slurping anything so that the memory footprint of the program is independent of the sizes of both files (it does assume that the line numbers to be printed are sorted):
#!/usr/bin/perl
use strict; use warnings;
use autodie;
@ARGV == 2
or die "Supply src_file and filter_file as arguments\n";
my ($src_file, $filter_file) = @ARGV;
open my $src_h, '<', $src_file;
open my $filter_h, '<', $filter_file;
my $to_print = <$filter_h>;
while ( my $src_line = <$src_h> ) {
last unless defined $to_print;
if ( $. == $to_print ) {
print $src_line;
$to_print = <$filter_h>;
}
}
close $filter_h;
close $src_h;
Generate the source file:
C:\> perl -le "print for aa .. zz" > src
Generate the filter file:
C:\> perl -le "print for grep { rand > 0.75 } 1 .. 52" > filter
C:\> cat filter
4
6
10
12
13
19
23
24
28
44
49
50
Output:
C:\> f src filter
ad
af
aj
al
am
as
aw
ax
bb
br
bw
bx
To deal with an unsorted filter file, you can modify the while loop:
while ( my $src_line = <$src_h> ) {
last unless defined $to_print;
if ( $. > $to_print ) {
seek $src_h, 0, 0;
$. = 0;
}
if ( $. == $to_print ) {
print $src_line;
$to_print = <$filter_h>;
}
}
This would waste a lot of time if the contents of the filter file are fairly random because it would keep rewinding to the beginning of the source file. In that case, I would recommend using Tie::File.
I wouldn't do it this way with large files, but (untested):
open(my $fh1, "<", "line_number_file.txt") or die "Err: $!";
chomp(my @line_numbers = <$fh1>);
$_-- for @line_numbers;
close $fh1;
open(my $fh2, "<", "text_file.txt") or die "Err: $!";
my @lines = <$fh2>;
print @lines[@line_numbers];
close $fh2;
I'd do it like this:
#!/bin/bash
numbersfile=numbers
datafile=data
while read lineno; do
sed -n "${lineno}p" "$datafile"
done < "$numbersfile"
Downside to my approach is that it will spawn a lot of processes so it will be slower than other options. It's infinitely more readable though.
This is a short solution using bash and sed
sed -n -e "$(cat num |sed 's/$/p/')" file
Where num is the file of numbers and file is the input file ( Tested on OS/X Snow leopard)
$ cat num
1
3
5
$ cat file
Line One
Line Two
Line Three
Line Four
Line Five
$ sed -n -e "$(cat num |sed 's/$/p/')" file
Line One
Line Three
Line Five
$ cat input
every
good
bird
does
fly
$ cat lines
2
4
$ perl -ne 'BEGIN{($a,$b) = `cat lines`} print if $.==$a .. $.==$b' input
good
bird
does
If that's too much for a one-liner, use
#! /usr/bin/perl
use warnings;
use strict;
sub start_stop {
my($path) = @_;
open my $fh, "<", $path
or die "$0: open $path: $!";
local $/;
return ($1,$2) if <$fh> =~ /\s*(\d+)\s*(\d+)/;
die "$0: $path: could not find start and stop line numbers";
}
my($start,$stop) = start_stop "lines";
while (<>) {
print if $. == $start .. $. == $stop;
}
Perl's magic open allows for creative possibilities such as
$ ./lines-between 'tac lines-between|'
print if $. == $start .. $. == $stop;
while (<>) {
Here is a way to do this using Tie::File:
#!/usr/bin/perl
use strict; use warnings;
use autodie;
use Tie::File;
@ARGV == 2
or die "Supply src_file and filter_file as arguments\n";
my ($src_file, $filter_file) = @ARGV;
tie my @source, 'Tie::File', $src_file, autochomp => 0
or die "Cannot tie source '$src_file': $!";
open my $filter_h, '<', $filter_file;
while ( my $to_print = <$filter_h> ) {
print $source[$to_print - 1];
}
close $filter_h;
untie @source;
Below is part of my code. It enters the if block with $value = 1, and the output of the iperf.exe process goes into my_output.txt. Since I am timing the process out after the alarm time (20 sec), I wanted to capture the output of that process only.
Afterwards I want to continue printing to the command prompt, but I am not able to get back to it.
The code's own print statements no longer appear on the command prompt and instead end up in my_output.txt.
(I am looping over this if block through the rest of my code.)
output.txt
inside value loop2
------------------------------------------------------------
Server listening on UDP port 5001
Receiving 1470 byte datagrams
UDP buffer size: 8.00 KByte (default)
------------------------------------------------------------
[160] local 10.232.62.151 port 5001 connected with 10.232.62.151 port 1505
[ ID] Interval Transfer Bandwidth Jitter Lost/Total Datagrams
[160] 0.0- 5.0 sec 2.14 MBytes 3.59 Mbits/sec 0.000 ms 0/ 1528 (0%)
inside value loop3
clue1
clue2
inside value loop4
one iperf completed
Transfer
Transfer Starting: Intent { act=android.settings.APN_SETTINGS }
******AUTOMATION COMPLETED******
It looks like some problem with reinitializing STDOUT.
I tried close(STDOUT);, but output still did not return to STDOUT.
Code
if($value)
{
my $file = 'my_output.txt';
use Win32::Process;
print"inside value loop\n";
# redirect stdout to a file
open STDOUT, '>', $file
or die "can't redirect STDOUT to <$file> $!";
Win32::Process::Create(my $ProcessObj,
"iperf.exe",
"iperf.exe -u -s -p 5001",
0,
NORMAL_PRIORITY_CLASS,
".") || die ErrorReport();
$alarm_time = $IPERF_RUN_TIME+2; #20sec
print"inside value loop2\n";
sleep $alarm_time;
$ProcessObj->Kill(0);
sub ErrorReport{
print Win32::FormatMessage( Win32::GetLastError() );
}
print"inside value loop3\n";
print"clue1\n";
#close(STDOUT);
print"clue2\n";
print"inside value loop4\n";
print"one iperf completed\n";
}
my $data_file="my_output.txt";
open(ROCK, $data_file)|| die("Could not open file!");
@raw_data = <ROCK>;
@COUNT_PS = split(/ /, $raw_data[7]);
my $LOOP_COUNT_PS_4 = $COUNT_PS[9];
my $LOOP_COUNT_PS_5 = $COUNT_PS[10];
print "$LOOP_COUNT_PS_4\n";
print "$LOOP_COUNT_PS_5\n";
my $tput_value = "$LOOP_COUNT_PS_4"." $LOOP_COUNT_PS_5";
print "$tput_value";
close(ROCK);
print FH1 "\n $count \| $tput_value \n";
You need to save STDOUT before reopening it:
open(SAVE, ">&STDOUT") || die "Can't save STDOUT\n";
...
open(STDOUT, ">&SAVE") || die "Can't restore STDOUT\n";
You can also use a dynamic variable to "save" STDOUT (but this might require reorganizing your code):
do {
local *STDOUT;
open(STDOUT, "...");
...
}; # STDOUT is restored here
Finally, you can use backticks to capture the complete output:
$x = `iperf.exe ...`;
Use local:
{
local *STDOUT;
open STDOUT, '>', $file
or die "can't redirect STDOUT to <$file> $!";
Win32::Process::Create(my $ProcessObj,
"iperf.exe",
"iperf.exe -u -s -p 5001",
0,
NORMAL_PRIORITY_CLASS,
".") || die ErrorReport();
$alarm_time = $IPERF_RUN_TIME+2; #20sec
print"inside value loop2\n";
sleep $alarm_time;
$ProcessObj->Kill(0);
}
You might also try select
open FF,">$filename" or die "error opening $filename for writing: $!";
print "hi 1\n"; # this goes to stdout
...
select FF; # sets output to FF
print "hi 2\n"; # this goes to FF ($filename )
...
select STDOUT; # resets output to STDOUT
print "hi 3\n"; # this goes to stdout