In AWK, it is common to see this kind of structure for a script that runs on two files:
awk 'NR==FNR { print "first file"; next } { print "second file" }' file1 file2
This uses the fact that there are two variables defined: FNR, the line number in the current file, and NR, the global line count (equivalent to Perl's $.).
Is there something similar to this in Perl? I suppose that I could maybe use eof and a counter variable:
perl -nE 'if (! $fn) { say "first file" } else { say "second file" } ++$fn if eof' file1 file2
This works but it feels like I might be missing something.
To provide some context, I wrote this answer in which I manually define a hash, but instead I would like to populate the hash from the values in the first file and then do the substitutions on the second file. I suspect that there is a neat, idiomatic way of doing this in Perl.
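Concretely, this is the kind of thing I mean, still using the eof-counter workaround (map.txt and input.txt are placeholder names, and the word-boundary substitution is only an illustration):
perl -nE '
    if (!$seen_first) { chomp; my ($k, $v) = split /=/, $_, 2; $h{$k} = $v }
    else              { s{\b(\w+)\b}{ $h{$1} // $1 }ge; print }
    $seen_first ||= eof;
' map.txt input.txt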
Unfortunately, Perl doesn't have a similar NR==FNR construct to differentiate between two files. What you can do is use a BEGIN block to process one file and the main body to process the other.
For example, to process a file with the following:
map.txt
a=apple
b=ball
c=cat
d=dog
alpha.txt
f
a
b
d
You can do:
perl -lne'
BEGIN {
$x = pop;
%h = map { chomp; ($k,$v) = split /=/; $k => $v } <>;
@ARGV = $x
}
print join ":", $_, $h{$_} //= "Not Found"
' map.txt alpha.txt
f:Not Found
a:apple
b:ball
d:dog
Update:
I gave a pretty simple example, and now when I look at that, I can only say TIMTOWTDI, since you can do:
perl -F'=' -lane'
if (@F == 2) { $h{$F[0]} = $F[1]; next }
print join ":", $_, $h{$_} //= "Not Found"
' map.txt alpha.txt
f:Not Found
a:apple
b:ball
d:dog
However, I can say for sure that there is no NR==FNR construct in Perl, and you can process the files in various ways depending on their structure.
It looks like what you're aiming for is to use the same loop for reading both files, and have a conditional inside the loop that chooses what to do with the data. I would avoid that idea because you are hiding two distinct processes in the same stretch of code, making it less than clear what is going on.
But, in the case of just two files, you could record the first file name before the loop starts (the <> operator shifts names off @ARGV as it opens them, so @ARGV itself changes during the run) and compare the current file, $ARGV, against it, like this
perl -nE 'BEGIN { $first = $ARGV[0] } if ($ARGV eq $first) { say "first file" } else { say "second file" }' file1 file2
Forgetting about one-line programs, which I hate with a passion, I would just explicitly open $ARGV[0] and $ARGV[1]. Perhaps naming them like this
use strict;
use warnings;
use 5.010;
use autodie;
my ($definitions, $data) = @ARGV;
open my $fh, '<', $definitions;
while (<$fh>) {
# Build hash
}
open $fh, '<', $data;
while (<$fh>) {
# Process file
}
But if you want to avail yourself of the automatic opening facilities then you can mess with @ARGV like this
use strict;
use warnings;
my ($definitions, $data) = @ARGV;
@ARGV = ($definitions);
while (<>) {
# Build hash
}
@ARGV = ($data);
while (<>) {
# Process file
}
You can also create your own $fnr and compare to $..
Given:
var='first line
second line'
echo "$var" >f1
echo "$var" >f2
echo "$var" >f3
You can create a pseudo FNR by setting a variable in the BEGIN block and resetting it at each eof:
perl -lnE 'BEGIN{$fnr=1;}
if ($fnr==$.) {
say "first file: $ARGV, $fnr, $. $_";
}
else {
say "$ARGV, $fnr, $. $_";
}
eof ? $fnr=1 : $fnr++;' f{1..3}
Prints:
first file: f1, 1, 1 first line
first file: f1, 2, 2 second line
f2, 1, 3 first line
f2, 2, 4 second line
f3, 1, 5 first line
f3, 2, 6 second line
Definitely not as elegant as awk but it works.
Note that Ruby has support for FNR==NR type logic.
Is there a way to get the line number (and maybe filename) where a __DATA__ token was coded? Or some other way to know the actual line number in the original source file where a line of data read from the DATA filehandle came from?
Note that $. counts from 1 when reading from the DATA filehandle. So if the line number of the __DATA__ token were added to $. it would be what I'm looking for.
For example:
#!/usr/bin/perl
while (<DATA>) {
my $n = $. + WHAT??;
die "Invalid data at line $n\n" if /bad/;
}
__DATA__
something good
something bad
I want this to say "Invalid data at line 9", not "line 2" (which is what you get if $. is used by itself).
In systems that support /proc/<pid> virtual filesystems (e.g., Linux), you can do:
# find the file where <DATA> handle is read from
my $DATA_FILE = readlink("/proc/$$/fd/" . fileno(*DATA));
# find the line where DATA begins
open my $THIS, "<", $DATA_FILE;
my @THIS = <$THIS>;
my ($DATA_LINE) = grep { $THIS[$_] =~ /^__DATA__\b/ } 0 .. $#THIS;
Files don't actually have lines; they're just sequences of bytes. The OS doesn't even offer the capability of getting a line from a file, so it has no concept of line numbers.
Perl, on the other hand, does keep track of a line number for each handle. It is accessed via $..
However, the Perl handle DATA is created from a file descriptor that has already been moved to the start of the data (it's the file descriptor that Perl itself uses to load and parse the file), so there's no record of how many lines have already been read. So line 1 of DATA is the first line after __DATA__.
To correct the line count, one must seek back to the start of the file, and read it line by line until the file handle is back at the same position it started.
#!/usr/bin/perl
use strict;
use warnings qw( all );
use Fcntl qw( SEEK_SET );
# Determines the line number at the current file position without using «$.».
# Corrects the value of «$.» and returns the line number.
# Sets «$.» to «1» and returns «undef» if unable to determine the line number.
# The handle is left pointing to the same position as when this was called, or this dies.
sub fix_line_number {
my ($fh) = @_;
( my $initial_pos = tell($fh) ) >= 0
or return undef;
seek($fh, 0, SEEK_SET)
or return undef;
$. = 1;
while (<$fh>) {
( my $pos = tell($fh) ) >= 0
or last;
if ($pos >= $initial_pos) {
if ($pos > $initial_pos) {
seek($fh, $initial_pos, SEEK_SET)
or die("Can't reset handle: $!\n");
}
return $.;
}
}
seek($fh, $initial_pos, SEEK_SET)
or die("Can't reset handle: $!\n");
$. = 1;
return undef;
}
my $prefix = fix_line_number(\*DATA) ? "" : "+";
while (<DATA>) {
printf "%s:%s: %s", __FILE__, "$prefix$.", $_;
}
__DATA__
foo
bar
baz
Output:
$ ./a.pl
./a.pl:48: foo
./a.pl:49: bar
./a.pl:50: baz
$ perl <( cat a.pl )
/dev/fd/63:+1: foo
/dev/fd/63:+2: bar
/dev/fd/63:+3: baz
Perl keeps track of the file and line at which each symbol is created. A symbol is normally created when the parser/compiler first encounters it. But if __DATA__ is encountered before DATA is otherwise created, this will create the symbol. We can take advantage of this to set the line number associated with the file handle in DATA.
For the case where the Package::DATA handle is not used in Package.pm itself, the line number of the __DATA__ token could be obtained via B::GV->LINE on the DATA handle:
$ cat Foo.pm
package Foo;
1;
__DATA__
good
bad
$ perl -I. -MFoo -MB -e '
my $ln = B::svref_2object(\*Foo::DATA)->LINE;
warn "__DATA__ at line $ln\n";
Foo::DATA->input_line_number($ln);
while(<Foo::DATA>){ die "no good" unless /good/ }
'
__DATA__ at line 4
no good at -e line 1, <DATA> line 6.
In the case where the DATA handle is referenced in the file itself, a possible kludge would be to use an @INC hook:
$ cat DH.pm
package DH;
unshift @INC, sub {
my ($sub, $fname) = @_;
for(@INC){
if(open my $fh, '<', my $fpath = "$_/$fname"){
$INC{$fname} = $fpath;
return \'', $fh, sub {
our (%ln, %pos);
if($_){ $pos{$fname} += length; ++$ln{$fname} }
}
}
}
};
$ cat Bar.pm
package Bar;
print while <DATA>;
1;
__DATA__
good
bad
$ perl -I. -MDH -MBar -e '
my $fn = "Bar.pm";
warn "__DATA__ at line $DH::ln{$fn} pos $DH::pos{$fn}\n";
seek Bar::DATA, $DH::pos{$fn}, 0;
Bar::DATA->input_line_number($DH::ln{$fn});
while (<Bar::DATA>){ die "no good" unless /good/ }
'
good
bad
__DATA__ at line 6 pos 47
no good at -e line 6, <DATA> line 8.
Just for the sake of completion, in the case where you do have control over the file, all could be easily done with:
print "$.: $_" while <DATA>;
BEGIN { our $ln = __LINE__ + 1; DATA->input_line_number($ln) }
__DATA__
...
You can also use the first B::GV solution, provided that you reference the DATA handle via an eval:
use B;
my ($ln, $data) = eval q{B::svref_2object(\*DATA)->LINE, \*DATA}; die $@ if $@;
$data->input_line_number($ln);
print "$.: $_" while <$data>;
__DATA__
...
None of these solutions assumes that the source file is seekable (except if you want to read the DATA more than once, as I did in the second example) or tries to reparse your files, etc.
Comparing the end of the file to itself in reverse might do what you want:
#!/usr/bin/perl
open my $f, "<", $0;
my @lines;
my @dataLines;
push @lines, $_ while <$f>;
close $f;
push @dataLines, $_ while <DATA>;
my @revLines = reverse @lines;
my @revDataLines = reverse @dataLines;
my $count = @lines;
my $offset = 0;
$offset++ while ($revLines[$offset] eq $revDataLines[$offset]);
$count -= $offset;
print "__DATA__ section is at line $count\n";
__DATA__
Hello there
"Some other __DATA__
lkjasdlkjasdfklj
ljkasdf
Running give a output of :
__DATA__ section is at line 19
The above script reads itself (using $0 as the file name) into the @lines array and reads the DATA section into the @dataLines array.
The arrays are reversed and then compared element by element until they differ. The number of matching lines is tracked in $offset, and this is subtracted from the $count variable, which holds the number of lines in the file.
The result is the line number the DATA section starts at. Hope that helps.
Thank you @mosvy for the clever and general idea.
Below is a consolidated solution which works anywhere. It uses a symbolic reference instead of eval to avoid mentioning "DATA" at compile time, but otherwise uses the same ideas as mosvy.
The important point is that code in a package containing __DATA__ must not refer to the DATA symbol by name so that that symbol won't be created until the compiler sees the __DATA__ token. The way to avoid mentioning DATA is to use a filehandle ref created at run-time.
# Get the DATA filehandle for a package (default: the caller's),
# fixed so that "$." provides the actual line number in the
# original source file where the last-read line of data came
# from, rather than counting from 1.
#
# In scalar context, returns the fixed filehandle.
# In list context, returns ($fh, $filename)
#
# For this to work, a package containing __DATA__ must not
# explicitly refer to the DATA symbol by name, so that the
# DATA symbol (glob) will not yet be created when the compiler
# encounters the __DATA__ token.
#
# Therefore, use the filehandle ref returned by this
# function instead of DATA!
#
sub get_DATA_fh(;$) {
my $pkg = $_[0] // caller;
# Using a symbolic reference to avoid mentioning "DATA" at
# compile time, in case we are reading our own module's __DATA__
my $fh = do{ no strict 'refs'; *{"${pkg}::DATA"} };
use B;
$fh->input_line_number( B::svref_2object(\$fh)->LINE );
wantarray ? ($fh, B::svref_2object(\$fh)->FILE) : $fh
}
Usage examples:
my $fh = get_DATA_fh; # read my own __DATA__
while (<$fh>) { print "$. : $_"; }
or
my ($fh,$fname) = get_DATA_fh("Otherpackage");
while (<$fh>) {
print " $fname line $. : $_";
}
I'm trying to find the number of positive (P) and negative (N) integers, the number of words with all lower-case characters (L), the number with all upper-case characters (F), and the number of words with the first character capital and the rest lower case (U).
I also need a list of words in alphabetical order, together with the line number and the filename of each occurrence. The following example illustrates the output of the program on sample input.
file1
Hello! world my friend. ALI went to school. Ali has -1 dollars and 10 TL
file2
Hello there my friend. VELI went to school. Veli has 10,
dollars and -10,TL
After you run your program,
>prog.pl file1 file2
the output you get is as follows:
N=2
P=2
L=18
F=4
U=4
-----------
ali file1 (1 1)
and file1 (2) file2 (2)
dollars file1 (2) file2 (2)
friend file1 (1) file2 (1)
has file1 (1) file2 (1)
hello file1 (1) file2 (1)
my file1 (1) file2 (1)
school file1 (1) file2 (1)
there file2 (1)
tl file1 (2) file2 (2)
to file1 (1) file2 (1)
veli file2 (1 1)
went file1 (1) file2 (1)
world file1 (1)
I tried to fill in the blanks; could you help me deal with it?
#!/usr/bin/perl
$N= 0 ;
$P= 0 ;
$L= 0 ;
$F= 0 ;
$U= 0 ;
foreach __________ ( ____________) {__________________
or die("Cannot opened because: $!") ;
$lineno = 0 ;
while($line=<>) {
chomp ;
$lineno++ ;
@tokens = split $line=~ (/[ ,.:;!\?]+/) ;
foreach $str (@tokens) {
$N++ if ($str =~ /^-\d+$/) ;
$P++ if ($str =~ /^\d+$/) ;
$L++ if ($str =~ /^[a-z]+$/) ;
$F++ if ($str =~ /^[A-Z][a-z]+$/) ;
$U++ if ($str =~ /^[A-Z]+$/) ;
if ($str =~ /^[a-zA-Z]+$/) {
$str =~ __________________;
if ( (____________________) || ($words{$str} =~ /\)$/ ) ) {
$words{$str} = $words{$str} . " " . $file . " (" . $lineno ;
}
else {_______________________________________;
}}}}
close(FH) ;
foreach $w (__________________) {
if ( ! ($words{$w} =~ /\)$/ )) {
$words{$w} = ______________________;
}}}
print "N=$N\n" ;
print "P=$P\n" ;
print "L=$L\n" ;
print "F=$F\n" ;
print "U=$U\n" ;
print "-----------\n" ;
foreach $w (sort(keys(%words))) {
print $w," ", $words{$w}, "\n";
}
A few hints, and I'll let you get on your way...
Perl has what is called a diamond operator. This operator opens all files placed on the command line (their names are read into the @ARGV array) and reads them line by line.
use strict;
use warnings;
use autodie;
use feature qw(say);
while ( my $line = <> ) {
chomp $line;
say "The line read in is '$line'";
}
Try this program and run it as you would your program. See what happens.
Next, take a look at the Perl documentation for variables related to file handles. Especially take a look at the $/ variable. This variable is what's used to break input into records. It's normally set to a newline, so when you read in a file, you read it line by line (there's a short $/ sketch after the next example). You may want to try that. If not, you can fall back onto something like this:
use strict;
use warnings;
use autodie;
use feature qw(say);
while ( my $line = <> ) {
chomp $line;
my @words = split /\s+/, $line;
for my $word ( @words ) {
say "The word is '$word'";
}
}
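As a quick aside on $/: here is a minimal sketch of slurping a whole file by clearing the record separator (file1 is just the sample file name from the question; the local keeps the change scoped to the do-block):
use strict;
use warnings;

# Slurp an entire file as one string by clearing the input record
# separator ($/) for the duration of the do-block.
my $contents = do {
    local $/;                          # undef => no record separator
    open my $fh, '<', 'file1' or die "Cannot open file1: $!";
    <$fh>;                             # one "record" = the whole file
};
print length($contents), " characters read\n";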
Now you can use a hash to track which words were in each file and how many times. You can also track the various types of words you've mentioned. However, please don't use variables such as $U. Use $first_letter_uppercase. This will have more meaning in your program and will be less confusing for you.
Your teacher is teaching you the way Perl was written almost 30 years ago. This was back before God created the Internet. (Well, not quite. The Internet was already 10 years old, but no one outside of a few academics had heard of it). Perl programming has greatly evolved since then. Get yourself a good book on Modern Perl (that is Perl 5.x).
The pragmas at the beginning of my program (the use statements) do the following:
use strict - Use strict syntax. This does several things, but the main thing is to make sure you cannot use a variable unless you first declare it (most likely with my). This prevents mistakes such as putting $name in one place and referring to $Name in another.
use warnings - This warns you of basic errors such as attempting to use a variable that hasn't been given a value. By default, Perl assumes such a variable is a null string, or zero in an arithmetic context. If you attempt to print or check a variable that hasn't been assigned a value, it probably means you have a logic mistake.
The above two pragmas will catch 90% of your errors.
use autodie - This will cause your program to automatically die in many circumstances, for example if you attempt to open a nonexistent file for reading. This way, you don't have to remember to check whether each operation succeeded or failed.
use feature qw(say) - This allows you to use say instead of print. The say command is just like print, but automatically adds a new line on the end. It can make your code way cleaner and easier to understand.
For example:
print "N=$N\n" ;
vs.
say "N=$N" ;
Here's how I'd write that program. But it won't get you many marks as it's a long way from the "fill in the blanks" approach that your teacher is using. But that's good, because your teacher's Perl is very dated.
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
my ($N, $P, $L, $F, $U);
my %words;
while (<>) {
my @tokens = split /[^-\w]+/;
foreach my $token (@tokens) {
$N++ if $token =~ /^-\d+$/;
$P++ if $token =~ /^\d+$/;
next unless $token =~ /[a-z]/i;
$L++ if $token eq lc $token;
$U++ if $token eq uc $token;
$F++ if $token eq ucfirst lc $token;
push @{$words{lc $token}{$ARGV}}, $.;
}
close ARGV if eof;
}
say "N=$N";
say "P=$P";
say "L=$L";
say "F=$F";
say "U=$U";
for my $word (sort { $a cmp $b } keys %words) {
print "$word ";
for my $file (sort { $a cmp $b } keys %{$words{$word}} ) {
print "$file (", join(' ', #{$words{$word}{$file}}), ') ';
}
print "\n";
}
I have an input file as follows. I need to break it into multiple files based on columns 2, 3 and 5. The file has more columns, but I have used the cut command to get only the required columns.
12,Accounts,India,free,Internal
13,Finance,China,used,Internal
16,Finance,China,free,Internal
12,HR,India,free,External
19,HR,China,used,Internal
33,Finance,Japan,free,Internal
39,Accounts,US,used,External
14,Accounts,Japan,used,External
11,Finance,India,used,External
11,HR,US,used,External
10,HR,India,used,External
Output files:
Accounts_India_Internal --
12,Accounts,India,free,Internal
Finance_China_Internal --
13,Finance,China,used,Internal
16,Finance,China,free,Internal
HR_India_External --
12,HR,India,free,External
10,HR,India,used,External
HR_China_Internal --
19,HR,China,used,Internal
and so on..
Please let me know how to achieve this.
As of now, I am thinking of sorting the file based on these columns (2, 3, 5) and then looping over each record, creating files as I go. If a file does not exist, create it and add the record; otherwise open the existing file and append the record.
Is it possible to do this using shell scripting (bash)?
If you simply want to split the files based on fields 2, 3 and 5 you can do that quickly with awk:
awk -F, '{print >> $2"_"$3"_"$5}' infile.txt
That appends each line to a file whose name is made up of fields 2, 3 and 5.
Example:
[me@home]$ awk -F, '{print >> $2"_"$3"_"$5}' infile.txt
[me@home]$ cat Accounts_India_Internal
12,Accounts,India,free,Internal
[me@home]$ cat Finance_China_Internal
13,Finance,China,used,Internal
16,Finance,China,free,Internal
If you do want output sorted, you can first run the file through sort.
sort -k2,3 -k5,5 -t, infile.txt | awk -F, '{print >> $2"_"$3"_"$5}'
That sorts the lines on fields 2, 3, and 5 before passing them on to the awk command.
Do note that we're appending to the files, so if you repeat the command without deleting the output files, you'll end up with duplicate data. To address this, as well as include your additional requirements (using the first line as a header for all new files) as mentioned in the chat, see this solution.
I suggest you keep a hash of file handles keyed by their corresponding file names.
This program demonstrates; the input file is expected as a parameter on the command line.
use strict;
use warnings;
my %fh;
while (<>) {
chomp;
my $filename = join '_', (split /,/)[1,2,4];
if (not $fh{$filename}) {
open $fh{$filename}, '>', $filename or die "Unable to open '$filename' for output: $!";
print "$filename created\n";
}
print { $fh{$filename} } $_, "\n";
}
output
Accounts_India_Internal created
Finance_China_Internal created
HR_India_External created
HR_China_Internal created
Finance_Japan_Internal created
Accounts_US_External created
Accounts_Japan_External created
Finance_India_External created
HR_US_External created
Note: To use the code, simply change <DATA> to <> and use the file name as argument. The Data::Dumper print is there only for demonstration purposes and can also be removed.
use strict;
use warnings;
use Data::Dumper;
my %h;
while (<DATA>) {
chomp;
my @data = split /,/;
my $file = join "_", @data[1,2,4];
push @{$h{$file}}, $_;
}
print Dumper \%h;
__DATA__
12,Accounts,India,free,Internal
13,Finance,China,used,Internal
16,Finance,China,free,Internal
12,HR,India,free,External
19,HR,China,used,Internal
33,Finance,Japan,free,Internal
39,Accounts,US,used,External
14,Accounts,Japan,used,External
11,Finance,India,used,External
11,HR,US,used,External
10,HR,India,used,External
To print the files, you could use a subroutine like so:
for my $key (keys %h) {
print_file($key, $h{$key});
}
sub print_file {
my ($file, $data) = @_;
open my $fh, ">", $file or die $!;
print $fh "$_\n" for @$data;
}
save input text as foo, then:
cat foo | perl -nle '$k = join "_", (split ",", $_)[1,2,4]; $t{$k} = [@{$t{$k}}, $_]; END{for (keys %t){print join "\n", "$_ --", @{$t{$_}}, undef }}' | csplit -sz - '/^$/' {*}
I am new to Perl, but I know a little bit of C.
$STUFF = "C:/scripts/stuff.txt";
open STUFF or die "Cannot open $STUFF for read: $!";
print "Line $. is: $_" while (<STUFF>);
Why is the while after the print statement? What does it do?
It's the same as
while (<STUFF>) {
print "Line $. is : $_";
}
It's written the way it is because it's simpler and more compact. This is, in general, Perl's (in)famous "statement modifier" form for conditionals and loops.
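To give a feel for the general pattern, here are a few more statement modifiers (generic, self-contained fragments, not taken from the original script):
use strict;
use warnings;

my $found = 1;
print "found it\n" if $found;           # if as a statement modifier

my @lines;
warn "no input\n" unless @lines;        # unless as a statement modifier

print "$_\n" for 1 .. 3;                # foreach as a statement modifier

my $count = 0;
my $text  = "one line, two lines";
$count++ while $text =~ /line/g;        # while as a statement modifier
print "$count matches\n";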
The other answers have explained the statement modifier form of the while loop. However, there's a lot of other magic going on here. In particular, the script relies on three of Perl's special variables. Two of these ($_ and $!) are very common; the other ($.) is reasonably common. But they're all worth knowing.
When you run while (<$fh>) on an open filehandle, Perl automagically runs through the file, line by line, until it hits EOF. On each iteration, the current line is set to $_ without you doing anything. So these two are the same:
while (<$fh>) { ... }                 # do something with $_
while (defined($_ = <$fh>)) { ... }   # do something with $_
See perldoc perlop, the section on I/O operators. (Some people find this too magical, so they use while (my $line = <$fh>) instead. This gives you $line for each line rather than $_, which is a clearer variable name, but it requires more typing. To each his or her own.)
$! holds the value of a system error (if one is set). See perldoc perlvar, the section on $OS_ERROR, for more on how and when to use this.
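A typical use of $! (a minimal sketch; the file name is made up):
use strict;
use warnings;

# $! is only meaningful immediately after a system call fails,
# so report it right away.
open my $fh, '<', '/no/such/file'
    or die "Cannot open /no/such/file: $!";   # e.g. "No such file or directory"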
$. holds a line number. See perldoc perlvar, the section on $NR. This variable can be surprisingly tricky. It won't necessarily have the line number of the file you are currently reading. An example:
#!/usr/bin/env perl
use strict;
use warnings;
while (<>) {
print "$ARGV: $.\n";
}
If you save this as lines and run it as perl lines file1 file2 file3, then Perl will count lines straight through file1, file2 and file3. You can see that Perl knows what file it's reading from (it's in $ARGV; the filenames will be correct), but it doesn't reset line numbering automatically for you at the end of each file. I mention this since I was bitten by this behavior more than once until I finally got it through my (thick) skull. You can reset the numbering to track individual files this way:
#!/usr/bin/env perl
use strict;
use warnings;
while (<>) {
print "$ARGV: $.\n";
}
continue {
close ARGV if eof;
}
You should also check out the strict and warnings pragmas and take a look at the newer, three-argument form of open. I just noticed that you are "unknown (google)", which means you are likely never to return. I guess I got my typing practice for the day, at least.
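For reference, here is a minimal sketch of the three-argument open applied to the snippet from the question (same file path, but with a lexical filehandle):
use strict;
use warnings;

my $stuff_file = 'C:/scripts/stuff.txt';

# Three-argument open: the mode is a separate argument and the
# handle is a lexical variable rather than a global bareword.
open my $stuff, '<', $stuff_file
    or die "Cannot open $stuff_file for read: $!";

print "Line $. is: $_" while <$stuff>;

close $stuff;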
The following snippets are exactly equivalent, just different ways of writing the same thing:
print "Line $. is : $_" while (<STUFF>);
while (<STUFF>) {
print "Line $. is : $_";
}
What this does is: on each iteration through the loop, Perl reads one line of text from the STUFF file and puts it in the special variable $_ (this is what the angle brackets do). Then the body of the loop prints lines like:
Line 1 is : test
Line 2 is : file
The special variable $. is the line number of the last line read from a file, and $_ is the contents of that line as set by the angle bracket operator.
Placing the while after the print makes the line read almost like normal English.
It also puts the emphasis on the print instead of the while, and you don't need the curly brackets { ... }.
It can also be used with if and unless, for example,
print "Debug: foobar=$foobar\n" if $DEBUG;
print "Load file...\n" unless $QUIET;
I've taken the liberty of rewriting your snippet as I would.
Below my suggested code is a rogues gallery of less than optimal methods you might see in the wild.
use strict;
use warnings;
my $stuff_path = 'c:/scripts/stuff.txt';
open (my $stuff, '<', $stuff_path)
or die "Cannot open'$stuff_path' for read : $!\n";
# My preferred method to loop over a file line by line:
# while loop with explicit variable
while( my $line = <$stuff> ) {
print "Line $. is : $line\n";
}
Here are other methods you might see. Each one could be substituted for the while loop in my example above.
# while statement modifier
# - OK, but less clear than explicit code above.
print "Line $. is : $_" while <$stuff>;
# while loop
# - OK, but if I am operating on $_ explicitly, I prefer to use an explicit variable.
while( <$stuff> ) {
print "Line $. is : $_";
}
# for statement modifier
# - inefficient
# - loads whole file into memory
print "Line $. is : $_" for <$stuff>;
# for loop - inefficient
# - loads whole file into memory;
for( <$stuff> ) {
print "Line $. is : $_\n";
}
# for loop with explicit variable
# - inefficient
# - loads whole file into memory;
for my $line ( <$stuff> ) {
print "Line $. is : $line\n";
}
# Exotica -
# map and print
# - inefficient
# - loads whole file into memory
# - stores complete output in memory
print map "Line $. is : $_\n", <$stuff>;
# Using readline rather than <>
# - Alright, but overly verbose
while( defined (my $line = readline($stuff)) ) {
print "Line $. is : $line\n";
}
# Using IO::Handle methods on a lexical filehandle
# - Alright, but overly verbose
use IO::Handle;
while( defined (my $line = $stuff->readline) ) {
print "Line $. is : $line\n";
}
Note that the while modifier can only follow your loop body when that body is a single statement. If your loop body needs several statements, then the while has to come first, with the body in a block.
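To illustrate, using the STUFF handle from the question (a sketch, assuming STUFF is already open):
# One statement in the body: the while modifier works.
print "Line $. is: $_" while <STUFF>;

# Several statements in the body: switch to the block form.
while (<STUFF>) {
    chomp;
    print "Line $. is: $_\n";
}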
It is the same as a:
while (<STUFF>) { print "Line $. is: $_"; }
I have one file where the contents look like:
pch
rch
channel
cap
nch
kappa
.
.
.
kary
ban
....
Now I want to read my file from nch to kary and copy only those lines into some other file. How can I do this in Perl?
If I understand your question correctly, this is pretty simple.
#!perl -w
use strict;
use autodie;
open my $in,'<',"File1.txt";
open my $out,'>',"File2.txt";
while(<$in>){
print $out $_ if /^nch/ .. /^kary/;
}
From perlfaq6's answer to How can I pull out lines between two patterns that are themselves on different lines?
You can use Perl's somewhat exotic .. operator (documented in perlop):
perl -ne 'print if /START/ .. /END/' file1 file2 ...
If you wanted text and not lines, you would use
perl -0777 -ne 'print "$1\n" while /START(.*?)END/gs' file1 file2 ...
But if you want nested occurrences of START through END, you'll run up against the problem described in the question in this section on matching balanced text.
Here's another example of using ..:
while (<>) {
$in_header = 1 .. /^$/;
$in_body = /^$/ .. eof;
# now choose between them
} continue {
$. = 0 if eof; # fix $.
}
You could use this in 'sed':
sed -n /nch/,/kary/p $file
You could use 's2p' to convert this to Perl.
You could also write pure Perl:
while (<>)
{
next unless /nch/;
print;
while (<>)
{
print;
last if /kary/;
}
}
Strictly, both these solutions will print each set of lines from 'nch' to 'kary'; if 'nch' appears more than once, it will print more than one chunk of code. It is easy to fix that, especially in the pure Perl ('sed' solution left as an exercise for the reader).
OUTER:
while (<>)
{
next unless /nch/;
print;
while (<>)
{
print;
last OUTER if /kary/;
}
}
Also, the solutions look for 'nch' and 'kary' as part of the line - not for the whole line. If you need them to match the whole line, use '/^nch$/' etc as the regex.
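For example, borrowing the flip-flop form from the perlfaq answer above, the anchored version would look like this (assuming nch and kary each sit on a line of their own):
perl -ne 'print if /^nch$/ .. /^kary$/' file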
Something like:
$filter = 0;
while (<>) {
chomp;
$filter = 1 if (! $filter && /^nch$/);
$filter = 0 if ($filter && /^ban$/);
print($_, "\n") if ($filter);
}
should work.
if you only want to read one block, in gawk
gawk '/kary/&&f{print;exit}/nch/{f=1}f' file
in Perl
perl -lne '$f && /kary/ && print && exit;$f=1 if/nch/; $f && print' file
or
while (<>) {
chomp;
if ($f && /kary/) {
print $_."\n";
last;
}
if (/nch/) { $f = 1; }
print $_ ."\n" if $f;
}