Perl nested loops

I have a text file with the following contents:
NW1 SN1 DEV1
NW2 SN1 DEV2
I wrote a Perl script to iterate over the file, but it is running only once. The code is:
open(INPUT1,"input.txt");
@input_array = <INPUT1>;
for($i=0;$i<@input_array;$i++)
{
my ($ser,$node,@dev)=split(/ +/,$input_array[$i]);
for($x=0;$x<@dev;$x++)
{
print("Hi");
}
}
The script iterates over the first line but not over the second.

The code you posted could be improved, and brought up to more modern standards.
It uses a bareword filehandle INPUT1.
It doesn't use 3-arg open.
It doesn't use strict or warnings.
It doesn't check the return value of open or close. (That's what the autodie line in the following code is for.)
It uses C-style for loops when it doesn't need to.
It loads the entire file into memory even though it only deals with the file one line at a time.
use strict;
use warnings;
use autodie; # checks return value of open and close for us
# 3 arg open
open( my $in_fh, '<', 'input.txt' );
# don't read the file into memory until needed
while( <$in_fh> ){
# using $_ simplified this line
my ($ser,$node,@dev) = split;
# no need to track the indices; just loop over the array
for my $dev (@dev){
print "Hi\n";
}
}
close $in_fh;
If for some reason you really did need the indices of the @dev array, it would be better to write it like this:
for my $x ( 0..$#dev ){
...
}
If you want to explicitly store the line into a variable of a different name, you would change the while loop to this:
while( my $line = <$in_fh> ){
my ($ser,$node,@dev) = split / +/, $line;
...
}
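As a quick check that the loop visits both lines, here is a minimal sketch that runs without input.txt: it reads the sample data from an in-memory filehandle (opening a reference to a scalar, a core Perl feature) and counts the device fields on each line. The helper name count_devices is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: counts the device fields on every line of $fh.
sub count_devices {
    my ($fh) = @_;
    my @counts;
    while ( my $line = <$fh> ) {
        # split ' ' skips leading whitespace and drops the trailing newline
        my ( $ser, $node, @dev ) = split ' ', $line;
        my $n = 0;
        for my $dev (@dev) {    # inner loop runs once per device field
            $n++;
        }
        push @counts, $n;
    }
    return @counts;
}

my $data = "NW1 SN1 DEV1\nNW2 SN1 DEV2\n";
open my $in_fh, '<', \$data or die "Cannot open in-memory file: $!";
my @counts = count_devices($in_fh);
close $in_fh;
print scalar(@counts), " lines processed\n";    # prints "2 lines processed"
```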

You forgot to switch the '<>' operator into slurp mode. To read in the complete file, do this:
open(INPUT1,"input.txt");
undef $/;
@input_array = split(/\r?\n/, <INPUT1>);
close INPUT1;
or better yet like this:
open(INPUT1,"input.txt");
while(<INPUT1>) {
chomp;
my ($ser,$node,@dev)=split(/ +/,$_);
for($x=0;$x<@dev;$x++)
{
print("Hi");
}
}
close INPUT1;
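For comparison, a small sketch of the two reading modes involved, assuming in-memory data in place of input.txt: in list context the input operator already returns one element per line, while undefining $/ (slurp mode) makes a single read return the whole file as one string. The helper names read_lines and slurp are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: in list context <$fh> returns one element per line.
sub read_lines {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open: $!";
    my @lines = <$fh>;
    close $fh;
    return @lines;
}

# Hypothetical helper: with $/ undefined, one read returns everything.
sub slurp {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open: $!";
    local $/;    # slurp mode, restored automatically when the sub returns
    my $all = <$fh>;
    close $fh;
    return $all;
}

my $data  = "NW1 SN1 DEV1\nNW2 SN1 DEV2\n";
my @lines = read_lines($data);
my @split = split /\r?\n/, slurp($data);
print scalar(@lines), " lines via list context\n";     # prints "2 lines via list context"
print scalar(@split), " lines via slurp and split\n";  # prints "2 lines via slurp and split"
```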

Related

Push File contents into the array using perl

Suppose this is the file I am reading
hey how are you
I am fine thank you
I want to store the contents of the file in an array using a single while loop, so that I can easily use the array later without having to open and close the file again.
Code
use warnings;
use strict;
my @point1;
my @point ;
my $log1= "log1.log";
open(IN1, "<$log1" ) or die "Could not open file $log1: $!";
while (my $line = <IN1>) {
@point = split " ",$line;
push(@point1,@point);
push(@point1,"\n");
}
print "$point1[0] 2nd\n";
close IN1;
Output
hey 2nd
I want the output below when I print outside the while loop.
Output I want:
hey 2nd
I 2nd
What changes should I make here?
You are pushing all the words in the file onto the same list, which will make it difficult to tell them apart.
push(@point1,@point);
Apart from the "\n" elements you also push, this is the same as doing
@point1 = qw(hey how are you I am fine thank you);
I suspect what you want is a two-dimensional array, so that afterwards you can supply a line number and a word number and print that word, like this:
print $point1[0][0]; # prints "hey"
To do that, you would do this:
push @point1, \@point; # the backslash gives us a reference to the array
But then you also have to make sure that all the lines do not point to the same array, as they would if you declared my @point outside of the loop.
my @point; # outside the loop
while (my $line = <IN1>) {
@point = split " ",$line;
push(@point1, \@point); # wrong
}
You have to declare it inside the loop instead:
while (my $line = <IN1>) {
my @point = split " ",$line; # inside the loop
push(@point1, \@point); # correct
}
Then each loop iteration creates a new array, and so a new reference, one for each line. But you do not need a temporary variable; you can push the values directly:
while (my $line = <IN1>) {
push @point1, [ split " ", $line ];
}
The square brackets create a reference to an anonymous array containing the values inside them. Afterwards you can solve your task like this:
for my $aref (@point1) {
print "$aref->[0] 2nd\n";
}
Or
for my $line_no (0 .. $#point1) {
print "$point1[$line_no][0] 2nd\n";
}
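Putting the pieces together, a minimal sketch that builds the array of arrays and prints the first word of each line. It reads the sample text from an in-memory filehandle rather than log1.log, and the sub name words_per_line is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns one array reference of words per line.
sub words_per_line {
    my ($fh) = @_;
    my @lines;
    while ( my $line = <$fh> ) {
        push @lines, [ split ' ', $line ];    # a fresh anonymous array each time
    }
    return @lines;
}

my $log = "hey how are you\nI am fine thank you\n";
open my $fh, '<', \$log or die "Cannot open: $!";
my @point1 = words_per_line($fh);
close $fh;

for my $aref (@point1) {
    print "$aref->[0] 2nd\n";    # prints "hey 2nd", then "I 2nd"
}
```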
use strict;
use warnings;
use feature qw( say );
my $log1 = "log1.log";
open(my $fh, "<", $log1) # Don't use a global. Use 3-arg open.
or die("Can't open file \"$log1\": $!\n"); # No need for the line number.
my @first_words;
while (my $line = <$fh>) {
# chomp($line); # Not needed with C<< split " " >>, so that was ok.
my @words = split " ", $line; # Declare variables in the scope where they are needed.
push @first_words, $words[0]; # You want the first word of each line.
}
for my $first_word (@first_words) { # Need a loop to print stuff repeatedly.
say "$first_word 2nd";
}
or
use strict;
use warnings;
use feature qw( say );
my @first_words;
while (my $line = <>) {
my @words = split " ", $line;
push @first_words, $words[0];
}
for my $first_word (@first_words) {
say "$first_word 2nd";
}
The second is more flexible. Just pass the desired file name as an argument. It can also handle input via STDIN.
It's not really clear what data structure you're looking for. Here are a few possibilities.
You want each line of your file in an element in the array.
my @array = <$fh>;
You want each line of your file in an element in the array, but you also want the input split into individual words (so you end up with an array of arrays).
my @array = map { [ split ] } <$fh>;
You want the first word of each line in an element in the array.
my @array = map { (split)[0] } <$fh>;
I've mentioned before that you should switch to using the three-arg version of open() and lexical filehandles. So, assume that all of my code examples above are prefaced with:
open my $fh, '<', $log1 or die "Cannot open '$log1': $!\n";
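A runnable sketch of the three variants, assuming an in-memory filehandle in place of log1.log. The helper open_text is hypothetical and simply reopens the sample text before each read, because each variant consumes the whole handle.

```perl
use strict;
use warnings;

# Hypothetical helper: an in-memory filehandle standing in for log1.log.
sub open_text {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open: $!";
    return $fh;
}

my $log1 = "hey how are you\nI am fine thank you\n";

my $fh = open_text($log1);
my @lines = <$fh>;                       # one element per line

$fh = open_text($log1);
my @words = map { [ split ] } <$fh>;     # array of arrays of words

$fh = open_text($log1);
my @first = map { (split)[0] } <$fh>;    # first word of each line

print "$_ 2nd\n" for @first;             # prints "hey 2nd", then "I 2nd"
```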
Looking back over your last few questions, I can't help thinking that this is an X/Y question. You're asking for our help with tiny problems that make up part of your code, but actually, we could give far better help if we knew more about the bigger picture.

Perl sub skips foreach within which it is called

I'm having some problem with a subroutine that locates certain files and extracts some data out of them.
This subroutine is called inside a foreach loop, but whenever the call is made the loop skips to its next iteration. So I am wondering whether any of the next;'s are somehow escaping from the subroutine to the foreach loop where it is called?
As far as I can tell the sub looks solid, so I'm hoping someone can spot something I'm missing.
sub FindKit{
opendir(DH, "$FindBin::Bin\\data");
my @kitfiles = readdir(DH);
closedir(DH);
my $nametosearch = $_[0];
my $numr = 1;
foreach my $kitfile (@kitfiles)
{
# skip . and .. and Thumbs.db and non-K-files
if($kitfile =~ /^\.$/) {shift @kitfiles; next;}
if($kitfile =~ /^\.\.$/) {shift @kitfiles; next;}
if($kitfile =~ /Thumbs\.db/) {shift @kitfiles; next;}
if($kitfile =~ /^[^K]/) {shift @kitfiles; next;}
# $kitfile is the file used on this iteration of the loop
open (my $fhkits,"<","data\\$kitfile") or die "$!";
while (<$fhkits>) {}
if ($. <= 1) {
print " Empty File!";
next;
}
seek($fhkits,0,0);
while (my $kitrow = <$fhkits>) {
if ($. == 0 && $kitrow =~ /Maakartikel :\s*(\S+)\s+Montagekit.*?($nametosearch)\s{3,}/g) {
close $fhkits;
return $1;
}
}
$numr++;
close $fhkits;
}
return 0;
}
To summarize comments, the refactored code:
use FindBin;
use File::Glob ':bsd_glob';
sub FindKit {
my $nametosearch = $_[0];
my @kitfiles = glob "$FindBin::Bin/data/K*"; # files that start with K
foreach my $kitfile (@kitfiles)
{
open my $fhkits, '<', $kitfile or die "$!";
my $kitrow_first_line = <$fhkits>; # read first line
next if eof $fhkits; # next read is end-of-file, so it was just the header
my ($result) = $kitrow_first_line =~
/Maakartikel :\s*(\S+)\s+Montagekit.*?($nametosearch)\s{3,}/;
return $result if $result;
}
return 0;
}
I use the core File::Glob module with its :bsd_glob option, which can handle spaces in filenames. Following the documentation's note, I use a "real slash" in the pattern even on Win32 systems.
I check whether there is only a header line using eof.†
I do not see how this can affect the calling code, other than by its return value. Also, I don't see how the posted code can make the caller skip a beat, either. The problem is unlikely to be in this sub.
Please let me know if I missed some point with the above rewrite.
† Previous version used to check whether there is just one (header) line by
1 while <$fhkits>; # check number of lines ...
return if $. == 1; # there was only one line, the header
This is also correct, but eof is way better.
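The eof check can be tried in isolation. A minimal sketch, assuming in-memory handles in place of the K files: the hypothetical helper first_line_unless_header_only returns the first line only when at least one more line follows it, so empty and header-only inputs yield undef.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the first line only when more lines follow;
# returns undef for empty or header-only input.
sub first_line_unless_header_only {
    my ($fh) = @_;
    my $first = <$fh>;
    return unless defined $first;    # empty input
    return if eof $fh;               # nothing after the first (header) line
    return $first;
}

for my $text ( "", "HEADER\n", "HEADER\nrow 1\n" ) {
    my $copy = $text;    # in-memory handles need a real scalar to point at
    open my $fh, '<', \$copy or die "Cannot open: $!";
    my $first = first_line_unless_header_only($fh);
    print( defined $first ? "has data\n" : "empty or header only\n" );
    close $fh;
}
# prints "empty or header only" twice, then "has data"
```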
The thing that is almost certainly screwing you here is that you are shifting the list that you are iterating over.
That's bad news, as you're deleting elements, but not from the places you're probably thinking of.
For example:
#!/usr/bin/env perl
use strict;
use warnings;
my @list = qw ( one two three );
my $count;
foreach my $value ( @list ) {
print "Iteration ", ++$count," value is $value\n";
if ( $value eq 'two' ) { shift @list; next };
}
print "@list\n";
How many times do you think that should iterate, and which values should end up in the array?
Because you shift you never process element 'three' and you delete element 'one'. That's almost certainly what's causing you problems.
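A sketch of the usual alternative, assuming a made-up list of directory entries: leave the array alone while iterating and build the filtered list separately with grep, which also covers the "skip everything that doesn't start with K" case.

```perl
use strict;
use warnings;

# Sample directory entries, invented for the example.
my @entries = ( '.', '..', 'Thumbs.db', 'K100.txt', 'notes.txt', 'K200.txt' );

# Filter into a new array instead of shifting the one being iterated.
my @kitfiles = grep { /^K/ } @entries;

my $count = 0;
for my $kitfile (@kitfiles) {
    print "Iteration ", ++$count, " value is $kitfile\n";
}
print "@entries\n";    # the original list is untouched
```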
You also:
open using a relative path, when your opendir used an absolute one.
skip a bunch of files, and then skip anything that doesn't start with K. Why not just search for things that do start with K?
read the file twice, once just to check whether it's empty. Perl's -z file test will do that just fine.
set $kitrow for each line in the file, but don't really use it for anything other than pattern matching. It'd probably work better using implicit variables.
only actually do anything with the first line, so you never need to iterate over the whole file. ($numr seems to be discarded.)
use a global match, but only use one result. The g flag seems redundant here.
I'd suggest a big rewrite, and do something like this:
#!/usr/bin/env perl
use strict;
use warnings;
use FindBin;
sub FindKit{
my ($nametosearch) = @_;
foreach my $kitfile (glob "$FindBin::Bin/data/K*" )
{
if ( -z $kitfile ) {
print "$kitfile is empty\n";
next;
}
# $kitfile is the file used on this iteration of the loop
open (my $fhkits,"<", $kitfile) or die "$!";
# only the first line matters, so read just that one
<$fhkits> =~ m/Maakartikel :\s*(\S+)\s+Montagekit.*?($nametosearch)\s{3,}/
and return $1;
}
return 0; # no K file matched
}
As a big fan of the Path::Tiny module (I always have it installed and use it in every project), my solution would be:
use strict;
use warnings;
use Path::Tiny;
my $found = FindKit('mykit');
print "$found\n";
sub FindKit {
my($nametosearch) = @_;
my $datadir = path($0)->realpath->parent->child('data');
die "$datadir doesn't exist" unless -d $datadir;
for my $file ($datadir->children( qr /^K/ )) {
next if -z $file; #skip empty
my @lines = $file->lines;
return $1 if $lines[0] =~ /Maakartikel :\s*(\S+)\s+Montagekit.*?($nametosearch)\s{3,}/;
}
return;
}
Some comments and still-open issues:
With Path::Tiny you can always use forward slashes in path names, regardless of the OS (UNIX/Windows); e.g. data/file will work on Windows too.
AFAIK FindBin is considered broken, so the above uses $0 and realpath.
What if the kit is in multiple files? The above always returns the first one found.
my @lines = $file->lines; reads all lines, which is unnecessary, but on small files it's no big deal.
In reality this function returns the argument of Maakartikel, so a better name would probably be find_articel_by_kit or find_articel :)
It's easy to switch to UTF-8: just change $file->lines to $file->lines_utf8.

Perl: How to print a line to a file from within a while loop?

I have a very basic perl script which prints the next line in a text file after matching a search pattern.
@ARGV = <dom_boot.txt>;
while ( <> ) {
print scalar <> if /name=sacux445/;
}
This works. However, I would like to capture the output in a file for further use, rather than printing it to STDOUT.
I'm just learning (slowly) so attempted this:
my $fh;
my $dom_bootdev = 'dom_bootdev.txt';
open ($fh, '>', $dom_bootdev) or die "No such file";
@ARGV = <dom_boot.txt>;
while ( <> ) {
print $fh <> if /name=sacux445/;
}
close $fh;
But I get a syntax error.
syntax error at try.plx line 19, near "<>"
I'm struggling to figure this out. I'm guessing it's probably very simple so any help would be appreciated.
Thanks,
Luke.
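The Perl parser sometimes has problems with indirect object notation. The canonical way to handle it is to wrap the filehandle in a block:
print {$fh} <> if /name=sacux445/;
Are you sure you want to remove scalar? Without it, print puts <> in list context, so it prints all the remaining lines instead of just the next one.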
Simply fetch the next line within the loop and print it, if the line matches the pattern:
while (<>) {
next unless /name=sacux445/;
my $next = <>;
last unless defined $next;
print $fh $next;
}
Note that you need to check the return value of the diamond operator: if the pattern matches on the last line, there is no next line to print.
Input
name=sacux445 (1)
aaa
name=sacux445 (2)
bbb
name=sacux445 (3)
Output
aaa
bbb
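To check the loop above against this sample input and output, here is a minimal sketch that wraps it in a hypothetical sub copy_next_lines and uses in-memory filehandles (a core Perl feature) instead of dom_boot.txt and dom_bootdev.txt.

```perl
use strict;
use warnings;

# Hypothetical helper: copies the line that follows each match of $re
# from $in to $out.
sub copy_next_lines {
    my ( $in, $out, $re ) = @_;
    while ( my $line = <$in> ) {
        next unless $line =~ $re;
        my $next = <$in>;
        last unless defined $next;    # the pattern matched on the last line
        print {$out} $next;
    }
    return;
}

my $input  = "name=sacux445 (1)\naaa\nname=sacux445 (2)\nbbb\nname=sacux445 (3)\n";
my $output = '';
open my $in,  '<', \$input  or die "Cannot open input: $!";
open my $out, '>', \$output or die "Cannot open output: $!";
copy_next_lines( $in, $out, qr/name=sacux445/ );
close $in;
close $out;
print $output;    # prints "aaa" and "bbb" on separate lines
```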
One should learn to use state machines for parsing data. A state machine keeps the input read in only one place in the code. Here is the code rewritten as a state machine:
use strict;
use warnings;
use autodie; # See http://perldoc.perl.org/autodie.html
my $dom_bootdev = 'dom_bootdev.txt';
open ( my $fh, '>', $dom_bootdev ); # autodie handles open errors
use File::Glob qw( :bsd_glob ); # Perl's default glob() does not handle spaces in file names
@ARGV = glob( 'dom_boot.txt' );
my $print_next_line = 0;
while( my $line = <> ){
if( $line =~ /name=sacux445/ ){
$print_next_line = 1;
next;
}
if( $print_next_line ){
print {$fh} $line;
$print_next_line = 0;
next;
}
}
When To Use a State Machine
If each line of the data can be understood on its own, without context, it can be parsed using only regular expressions.
If the meaning of a line depends on the lines before it (as here, where a line matters only because of the line preceding it), a simple state machine is enough.
For nested structures such as trees, at least one state machine with a push-down stack is required. The stack records the previous state so that the machine can return to it when the current state is finished.
The most complex data structure in use is XML. It requires a state machine for its syntax and a second one with a stack for its semantics.
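To make the push-down idea concrete, here is a small sketch that tracks nested sections with a stack. The line format (begin NAME / end NAME) and the sub name paths_of_lines are invented for the illustration; each ordinary line is reported together with the path of the sections that enclose it.

```perl
use strict;
use warnings;

# Hypothetical format: "begin NAME" opens a section, "end NAME" closes it.
sub paths_of_lines {
    my ($fh) = @_;
    my @stack;    # the push-down stack: one entry per open section
    my @result;
    while ( my $line = <$fh> ) {
        chomp $line;
        if ( $line =~ /^begin (\w+)$/ ) {
            push @stack, $1;    # enter a nested state
        }
        elsif ( $line =~ /^end (\w+)$/ ) {
            my $open = pop @stack;    # return to the enclosing state
            die "mismatched end $1\n" unless defined $open && $open eq $1;
        }
        else {
            push @result, '/' . join( '/', @stack ) . ": $line";
        }
    }
    die "unclosed section $stack[-1]\n" if @stack;
    return @result;
}

my $text = "top\nbegin a\nx\nbegin b\ny\nend b\nend a\n";
open my $fh, '<', \$text or die "Cannot open: $!";
print "$_\n" for paths_of_lines($fh);    # prints "/: top", "/a: x", "/a/b: y"
```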

In Perl, how can I make two passes over all the files specified on the command line via the diamond operator?

If I have a text file and I want to run two types of operations, each of which must read every line of the file separately from the other, the only way I know how to do it is:
open out,(">>out.txt");
while (<>){
#operation one
}
while (<>){
#operation two
}
close out;
but this only works for the first while loop. Operation one runs fine, but operation two never completes, because the second while(<>) does not re-read the file; it continues from where the first loop left off, which is the end of the file. Is there another way? Or is there a way to tell the second while loop to start again at the beginning?
Given you mention in a comment:
perl example.pl text.txt
The answer is - don't use <> and instead open a filehandle.
my ( $filename ) = @ARGV;
open ( my $input, "<", $filename ) or die $!;
while ( <$input> ) {
print;
}
seek ( $input, 0, 0 );
while ( <$input> ) {
#something else
}
Alternatively, assuming text.txt isn't particularly large, you can just read the whole thing into an array.
my @input_lines = <$input>;
foreach ( @input_lines ) {
#something
}
If you want to specify multiple files on the command line, you can wrap the whole thing in a foreach loop:
foreach my $filename ( @ARGV ) {
## open; while; seek; while etc.
}
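A minimal sketch of the seek-based two-pass approach, assuming an in-memory filehandle in place of text.txt (seek works on those as it does on plain files, unlike on a pipe):

```perl
use strict;
use warnings;

my $text = "alpha\nbeta\ngamma\n";
open my $input, '<', \$text or die "Cannot open: $!";

# Pass 1: count the lines.
my $count = 0;
$count++ while <$input>;

# Rewind to the start; seek returns false on failure.
seek( $input, 0, 0 ) or die "Cannot seek: $!";

# Pass 2: read the same lines again, this time upper-casing them.
my @upper;
while ( my $line = <$input> ) {
    chomp $line;
    push @upper, uc $line;
}
close $input;
print "$count lines: @upper\n";    # prints "3 lines: ALPHA BETA GAMMA"
```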
Couldn't you simply use the following?
while (<>) {
operation1($_);
operation2($_);
}
If not, I'm assuming you need to process the content of all the files with one operation before it's processed by the other.
<> reads from the files listed in @ARGV, removing them as it opens them, so the simplest solution is to back up @ARGV and repopulate it.
my @argv = @ARGV;
while (<>) { operation1($_); }
@ARGV = @argv;
while (<>) { operation2($_); }
Of course, it will fail if <> reads from something other than a plain file or a symlink to a plain file. (The same goes for any solution using seek.) The only way to make that work would be to load the entire file into temporary storage (e.g. memory or a temporary file). The following is the simplest example of that:
my @lines = <>;
for (@lines) { operation1($_); }
for (@lines) { operation2($_); }
If the data fits into memory:
my @lines = <>;
for ( @lines ){
# operation one
}
for ( @lines ){
# operation two
}
You can localize @ARGV before the first pass.
#!/usr/bin/env perl
use strict;
use warnings;
{
local @ARGV = @ARGV;
while (<>){
print "Pass 1: $_";
}
}
while (<>){
print "Pass 2: $_";
}
If no filehandle is used with the diamond operator, Perl will examine the @ARGV special variable. If @ARGV has no elements, the diamond operator reads from STDIN.
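A minimal sketch of the local @ARGV approach, assuming two throwaway files created with the core File::Temp module stand in for the command-line arguments; it counts the lines seen in each pass.

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );

# Create two throwaway input files so the sketch runs on its own.
my @files;
for my $content ( "a\nb\n", "c\n" ) {
    my ( $tmp_fh, $name ) = tempfile( UNLINK => 1 );
    print {$tmp_fh} $content;
    close $tmp_fh;
    push @files, $name;
}
@ARGV = @files;    # stands in for the real command-line arguments

my ( $pass1, $pass2 ) = ( 0, 0 );
{
    local @ARGV = @ARGV;    # <> empties this copy; the outer @ARGV survives
    $pass1++ while <>;
}
$pass2++ while <>;          # the restored @ARGV is read again
print "pass 1: $pass1 lines, pass 2: $pass2 lines\n";    # 3 lines each
```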
This is another way to achieve what you need:
my @stdin = <>;
foreach my $item( @stdin ) {
# ...
}
foreach my $item( @stdin ) {
# ...
}
If you need to run the operations line by line, why not try something like this:
sub operation_1 {
my $line = shift;
#processing for operation 1
}
sub operation_2 {
my $line = shift;
#processing for operation 2
}
while(<>) {
my $line = $_;
chomp($line);
operation_1($line);
operation_2($line);
}
If you were reading from an actual file, you could use
seek FILEHANDLE,0,0;
However, you are using stdin and I don't think that it's possible to rewind stdin and start over.

'merging' 2 files into a third using perl

I am reviewing for a test and I can't seem to get this example to code out right.
Problem: Write a perl script, called ileaf, which will interleave the lines of a file with those of another file, writing the result to a third file. If the files have different lengths, the excess lines are written at the end.
A sample invocation:
ileaf file1 file2 outfile
This is what I have:
#!/usr/bin/perl -w
open(file1, "$ARGV[0]");
open(file2, "$ARGV[1]");
open(file3, ">$ARGV[2]");
while(($line1 = <file1>)||($line2 = <file2>)){
if($line1){
print $line1;
}
if($line2){
print $line2;
}
}
This sends the information to the screen so I can immediately see the result. The final version should use "print file3 $line1;". I am getting all of file1 and then all of file2, without any interleaving of the lines.
If I understand correctly, this is caused by the "||" in my while loop. The while checks the first comparison and, if it's true, drops into the loop, so only file1 is read. Once file1 is exhausted, the while checks file2 and again drops into the loop.
What can I do to interleave the lines?
You're not getting what you want from while(($line1 = <file1>)||($line2 = <file2>)){ because as long as ($line1 = <file1>) is true, ($line2 = <file2>) never happens.
Try something like this instead:
open my $file1, "<", $ARGV[0] or die;
open my $file2, "<", $ARGV[1] or die;
open my $file3, ">", $ARGV[2] or die;
while (my $f1 = readline ($file1)) {
print $file3 $f1; #line from file1
if (defined(my $f2 = readline ($file2))) { #if there are any lines left in file2
print $file3 $f2;
}
}
while (my $f2 = readline ($file2)) { #if there are any lines left in file2
print $file3 $f2;
}
close $file1;
close $file2;
close $file3;
You'd think if they're teaching you Perl, they'd use modern Perl syntax. Please don't take this personally; after all, this is how you were taught. However, you should know the new Perl programming style, because it helps eliminate all sorts of programming mistakes and makes your code easier to understand.
Use the pragmas use strict; and use warnings;. The warnings pragma replaces the -w flag on the command line, and it's more flexible: for example, I can turn off particular warnings when I know they'll be an issue. The use strict; pragma requires me to declare my variables with either my or our. (NOTE: Don't declare Perl's built-in variables.) 99% of the time, you'll use my. These variables are called lexically scoped, but you can think of them as true local variables. Lexically scoped variables don't have any value outside of their scope. For example, if you declare a variable using my inside a while loop, that variable will disappear once the loop exits.
Use the three-parameter syntax for the open statement: In the example below, I use the three-parameter syntax. This way, if a file is called >myfile, I'll be able to read from it.
Use lexically scoped file handles: Note that I use my $file_1 instead of simply FILE_1_HANDLE. The old way, FILE_1_HANDLE, is globally scoped, plus it's very difficult to pass the file handle to a function. Using lexically scoped file handles just works better.
Use or and and instead of || and &&: They're easier to read, and their very low precedence means they're less likely to bind to the wrong part of an expression.
Always check whether your open statement worked: You need to make sure your open statement actually opened a file. Or use the use autodie; pragma, which will kill your program if an open statement fails (which is probably what you want anyway).
And, here's your program:
#! /usr/bin/env perl
#
use strict;
use warnings;
use autodie;
open my $file_1, "<", shift;
open my $file_2, "<", shift;
open my $output_fh, ">", shift;
for (;;) {
my $line_1 = <$file_1>;
my $line_2 = <$file_2>;
last if not defined $line_1 and not defined $line_2;
no warnings qw(uninitialized);
print {$output_fh} $line_1 . $line_2;
use warnings;
}
In the above example, I read from both files on every pass, even after one of them has run out of lines. If there's nothing left to read, $line_1 or $line_2 is simply undefined. After the reads, I check whether both $line_1 and $line_2 are undefined. If so, I use last to end the loop.
Because my file handle is a scalar variable, I like putting it in curly braces, so people know it's a file handle and not a variable I want to print out. I don't need it, but it improves clarity.
Notice the no warnings qw(uninitialized);. This turns off the uninitialized warning I'd otherwise get. I know that either $line_1 or $line_2 might be undefined, so I don't want the warning. I turn it back on right below my print statement because it is a valuable warning.
Here's another way to do that for loop:
while ( 1 ) {
my $line_1 = <$file_1>;
my $line_2 = <$file_2>;
last if not defined $line_1 and not defined $line_2;
print {$output_fh} $line_1 if defined $line_1;
print {$output_fh} $line_2 if defined $line_2;
}
The infinite loop is a while loop instead of a for loop. Some people don't like the C style of for loop and have banned it from their coding practices. Thus, if you have an infinite loop, you use while ( 1 ) {. To me, maybe because I came from a C background, for (;;) { means infinite loop, and while ( 1 ) { takes a few extra milliseconds to digest.
Also, I check whether $line_1 or $line_2 is defined before I print them out. I guess it's better than using no warnings and use warnings, but I need two separate print statements instead of combining them into one.
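The interleaving loop can be exercised without real files. A minimal sketch, assuming in-memory filehandles in place of file1, file2 and outfile; the sub name interleave is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: writes alternating lines from $fh1 and $fh2 to $out;
# once one input runs out, the rest of the other is copied unchanged.
sub interleave {
    my ( $fh1, $fh2, $out ) = @_;
    while (1) {
        my $line_1 = <$fh1>;
        my $line_2 = <$fh2>;
        last if !defined $line_1 && !defined $line_2;
        print {$out} $line_1 if defined $line_1;
        print {$out} $line_2 if defined $line_2;
    }
    return;
}

my ( $first, $second, $result ) = ( "1a\n1b\n1c\n", "2a\n", '' );
open my $in1, '<', \$first  or die "Cannot open: $!";
open my $in2, '<', \$second or die "Cannot open: $!";
open my $out, '>', \$result or die "Cannot open: $!";
interleave( $in1, $in2, $out );
close $_ for $in1, $in2, $out;
print $result;    # prints 1a, 2a, 1b, 1c on separate lines
```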
Here's another option that uses List::MoreUtils's zip to interleave arrays and File::Slurp to read and write files:
use strict;
use warnings;
use List::MoreUtils qw/zip/;
use File::Slurp qw/read_file write_file/;
chomp( my @file1 = read_file shift );
chomp( my @file2 = read_file shift );
write_file shift, join "\n", grep defined $_, zip @file1, @file2;
Just noticed Tim A has a nice solution already posted. This solution is a bit wordier, but might illustrate exactly what is going on a bit more.
The method I went with reads all of the lines from both files into two arrays, then loops through them using a counter.
#!/usr/bin/perl -w
use strict;
open(IN1, "<", $ARGV[0]);
open(IN2, "<", $ARGV[1]);
my @file1_lines;
my @file2_lines;
while (<IN1>) {
push (@file1_lines, $_);
}
close IN1;
while (<IN2>) {
push (@file2_lines, $_);
}
close IN2;
my $file1_items = @file1_lines;
my $file2_items = @file2_lines;
open(OUT, ">", $ARGV[2]);
my $i = 0;
while (($i < $file1_items) || ($i < $file2_items)) {
if (defined($file1_lines[$i])) {
print OUT $file1_lines[$i];
}
if (defined($file2_lines[$i])) {
print OUT $file2_lines[$i];
}
$i++
}
close OUT;