Read in first 5 lines, print specified line - perl

I am trying to read in the first 5 lines of my input file and print only a single line (in this case line 4) that is given to the perl script from the command line. I am having some trouble comparing the current line number to the specified line number.
Here is the important part of my perl script:
# Variables
my $sInputFile = $ARGV[0];
my $sOutputFile = $ARGV[1];
my $sRowExtractNumber = $ARGV[2];
# Open-Close / Exceptions
open(my $in, "<", $sInputFile) or die "cannot open output file: $sOutputFile\n";
open(my $out, ">", $sOutputFile) or die "cannot open input file: $sInputFile\n";
# Script
while (<$in>) {
if (1..5) {
print $out $_ if $_ == $sRowExtractNumber;
}
}
I am not getting an error per se, but nothing is being printed to the $out file.
How can I accomplish my goal?
Thanks.

The $. variable is the current input line number. I presume you used $_ by mistake.
There is no need to check both that the line number is five or less and that it matches the requested line, although you may want to verify your input beforehand.
You must always use strict and use warnings at the top of every module. It is a simple measure that will alert you to many trivial mistakes that you could otherwise easily overlook. And you should use lower case letters for local identifiers: upper case is reserved for globals such as package names.
use strict;
use warnings;
my ($input_file, $output_file, $row_extract_number) = @ARGV;
die "Line number must be five or less" if $row_extract_number < 1 or $row_extract_number > 5;
open my $in, '<', $input_file or die qq{Cannot open "$input_file" for input: $!};
open my $out, '>', $output_file or die qq{Cannot open "$output_file" for output: $!};
while (<$in>) {
if ($. == $row_extract_number) {
print $out $_;
last;
}
}
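As a rough sketch of verifying the input beforehand, the check and the lookup could be split into two small subroutines. The names valid_line_number and nth_line are hypothetical, and the in-memory file is only there so the example runs on its own:

```perl
use strict;
use warnings;

# Hypothetical helper: true if $n is a whole number between 1 and $max.
sub valid_line_number {
    my ($n, $max) = @_;
    return defined $n && $n =~ /\A[0-9]+\z/ && $n >= 1 && $n <= $max;
}

# Hypothetical helper: returns line number $wanted from the handle,
# or nothing if the input ends first.
sub nth_line {
    my ($fh, $wanted) = @_;
    while (my $line = <$fh>) {
        return $line if $. == $wanted;    # $. is the current input line number
    }
    return;
}

# Assumed sample data, read through an in-memory filehandle.
my $text = "one\ntwo\nthree\nfour\nfive\nsix\n";
open my $in, '<', \$text or die "Cannot open in-memory file: $!";
print nth_line($in, 4) if valid_line_number(4, 5);    # prints "four"
close $in;
```

Rejecting non-numeric arguments up front also avoids the "isn't numeric" warning that the == comparison would otherwise raise under use warnings.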

Related

Counting number of lines with conditions

This is my script count.pl, I am trying to count the number of lines in a file.
The script's code :
chdir $filepath;
if (-e "$filepath"){
$total = `wc -l < file.list`;
printf "there are $total number of lines in file.list";
}
I get the correct output, but I do not want to count blank lines or lines that start with #. Any ideas?
Since this is already a Perl program, open the file and read it, filtering out the lines that shouldn't be counted:
open my $fh, '<', $filename or die "Can't open $filename: $!";
my $num_lines = grep { not /^$|^\s*#/ } <$fh>;
where $filename is "file.list". If by "blank lines" you also mean lines containing only whitespace, then change the regex to /^\s*$|^\s*#/. See grep, and perlretut for the regex used in its condition.
That filehandle $fh gets closed when the control exits the current scope, or add close $fh; after the file isn't needed for processing any more. Or, wrap it in a block with do
my $num_lines = do {
open my $fh, '<', $filename or die "Can't open $filename: $!";
grep { not /^$|^\s*#/ } <$fh>;
};
This makes sense if the sole purpose of opening the file is to count lines.
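To see the counting idiom in isolation, here is a minimal sketch that runs against an in-memory file instead of file.list. The subroutine name count_significant_lines and the sample data are assumptions made for the example, and it uses the whitespace-tolerant variant of the regex:

```perl
use strict;
use warnings;

# Hypothetical helper: counts lines that are neither blank nor comments.
sub count_significant_lines {
    my ($fh) = @_;
    # grep in scalar context returns the number of elements that pass.
    my $count = grep { not /^\s*$|^\s*#/ } <$fh>;
    return $count;
}

# Assumed sample data: three real lines, two blanks, two comments.
my $sample = "alpha\n\n# a comment\n   \nbeta\n  # indented comment\ngamma\n";
open my $fh, '<', \$sample or die "Can't open in-memory file: $!";
print count_significant_lines($fh), "\n";    # prints 3
close $fh;
```

Note that <$fh> in list context reads the whole file into memory, which is fine for a small list file but worth reconsidering for very large inputs.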
Another thing though: an operation like chdir should always be checked, and then there is no need for the race-sensitive if (-e $filepath) either. Altogether
# Perhaps save the old cwd first so to be able to return to it later
#my $old_cwd = Cwd::cwd;
chdir $filepath or die "Can't chdir to $filepath: $!";
open my $fh, '<', $filename or die "Can't open $filename: $!";
my $num_lines = grep { not /^$|^\s*#/ } <$fh>;
A couple of other notes:
There is no reason for printf. For all normal prints use say, for which you need use feature qw(say); at the beginning of the program. See feature pragma
Just in case, allow me to add: every program must have at the beginning
use warnings;
use strict;
Perhaps the original intent of the code in the question is to allow a program to try a non-existing location, and not die? In any case, one way to keep the -e test, as asked for
#my $old_cwd = Cwd::cwd;
chdir $filepath or warn "Can't chdir to $filepath: $!";
my $num_lines;
if (-e $filepath) {
open my $fh, '<', $filename or die "Can't open $filename: $!";
$num_lines = grep { not /^$|^\s*#/ } <$fh>;
}
where I still added a warning if chdir fails. Remove that if you really don't want it. I also added a declaration of the variable that is assigned the number of lines, with my $num_lines;. If it is declared earlier in your real code then of course remove that line here.
perl -ne '$n++ unless /^$|^#/; print "$n\n" if eof'
Works with multiple files too.
perl -ne '$n++ unless /^$|^#/; END {print "$n\n"}'
Better for a single file.
open(my $fh, '<', $filename) or die "Can't open $filename: $!";
my $n = 0;
for(<$fh>) { $n++ unless /^$|^#/}
print $n;
Using sed to filter out the "unwanted" lines in a single file:
sed '/^\s*#/d;/^\s*$/d' infile | wc -l
Obviously, you can also replace infile with a list of files.
The solution is very simple; there is no magic involved.
use strict;
use warnings;
use feature 'say';
my $count = 0;
while( <> ) {
$count++ unless /^\s*$|^\s*#/;
}
say "Total $count lines";
Reference:
<>

Search string with multiple words in the pattern

My program is trying to search a string from multiple files in a directory. The code searches for single patterns like perl but fails to search a long string like Status Code 1.
Can you please let me know how to search for strings with multiple words?
#!/usr/bin/perl
my @list = `find /home/ad -type f -mtime -1`;
# printf("Lsit is $list[1]\n");
foreach (@list) {
# print("Now is : $_");
open(FILE, $_);
$_ = <FILE>;
close(FILE);
unless ($_ =~ /perl/) { # works, but fails to find string "Status Code 1"
print "found\n";
my $filename = 'report.txt';
open(my $fh, '>>', $filename) or die "Could not open file '$filename' $!";
say $fh "My first report generated by perl";
close $fh;
} # end unless
} # end For
There are a number of problems with your code:
You must always use strict and use warnings at the top of every Perl program. There is little point in declaring anything with my without strict in place
The lines returned by the find command will have a newline at the end which must be removed before Perl can find the files
You should use lexical file handles (my $fh instead of FILE) and the three-parameter form of open as you do with your output file
$_ = <FILE> reads only the first line of the file into $_
unless ($_ =~ /perl/) is inverted logic, and there's no need to specify $_ as it is the default. You should write if ( /perl/ )
You can't use say unless you have use feature 'say' at the top of your program (or use 5.010, which adds all features available in Perl v5.10)
It is also best to avoid using shell commands as Perl is more than able to do anything that you can using command line utilities. In this case -f $file is a test that returns true if the file is a plain file, and -M $file returns the (floating point) number of days since the file's modification time
This is how I would write your program
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
for my $file ( glob '/home/ad/*' ) {
next unless -f $file and int(-M $file) == 0;
open my $fh, '<', $file or die $!;
while ( <$fh> ) {
if ( /perl/ ) {
print "found\n";
my $filename = 'report.txt';
open my $out_fh, '>>', $filename or die "Could not open file '$filename': $!";
say $out_fh "My first report generated by perl";
close $out_fh;
last;
}
}
}
It should have matched, unless $_ contains the text in a different case.
Try this:
unless($_ =~ /Status\s+Code\s+1/i) {
Change
unless ($_ =~ /perl/) {
to:
unless ($_ =~ /(Status Code 1)/) {
I am certain the above works, except it's case sensitive.
Since you question it, I rewrote your script to make more sense of what you're trying to accomplish and to implement the above suggestion. Correct me if I am wrong, but you're trying to write a script that matches "Status Code 1" in a set of files that were last modified within one day, and prints each matching filename to a text file.
Anyways, below is what I recommend:
#!/usr/bin/perl
use strict;
use warnings;
my $output_file = 'report.txt';
my @list = `find /home/ad -type f -mtime -1`;
chomp @list; # remove the trailing newline from each filename
foreach my $filename (@list) {
print "PROCESSING: $filename\n";
open (INCOMING, "<$filename") || die "FATAL: Could not open '$filename' $!";
foreach my $line (<INCOMING>) {
if ($line =~ /(Status Code 1)/) {
open( FILE, ">>$output_file") or die "FATAL: Could not open '$output_file' $!";
print FILE sprintf ("%s\n", $filename);
close(FILE) || die "FATAL: Could not CLOSE '$output_file' $!";
# Bail when we get the first match
last;
}
}
close(INCOMING) || die "FATAL: Could not close '$filename' $!";
}
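The multi-word matching discussed in this thread can be sketched as a small helper that builds the pattern from the phrase. This is only one possible approach, not the method of any answer above: the name contains_phrase is hypothetical, and it assumes you want a case-insensitive match that tolerates any run of whitespace between the words:

```perl
use strict;
use warnings;

# Hypothetical helper: true if any line contains the phrase, ignoring case
# and allowing any amount of whitespace between the words.
sub contains_phrase {
    my ($phrase, @lines) = @_;
    # quotemeta makes each word match literally, even if it contains
    # regex metacharacters such as "." or "*".
    my $pattern = join '\s+', map { quotemeta($_) } split ' ', $phrase;
    my $re = qr/$pattern/i;
    for my $line (@lines) {
        return 1 if $line =~ $re;
    }
    return 0;
}

# Assumed sample data standing in for the lines of a file.
my @log = ("job started\n", "status   code 1 returned\n");
print contains_phrase('Status Code 1', @log) ? "found\n" : "not found\n";    # prints "found"
```

A space inside a regex is matched literally unless the /x modifier is in effect, so a plain /Status Code 1/ also works when the spacing and case are exactly as written.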

Perl: comparing words in two files

This is my current script to try and compare the words in file_all.txt to the ones in file2.txt. It should print out any of the words in file_all that are not in file2.
I need to format these as one word per line, but that's not the more pressing issue.
I am new to Perl ... I get C and Python more but this is being a bit tricky, I know my variable assignment is off.
use strict;
use warnings;
my $file2 = "file_all.txt"; # I know my assignment here is wrong
my $file1 = "file2.txt";
open my $file2, '<', 'file2' or die "Couldn't open file2: $!";
while ( my $line = <$file2> ) {
++$file2{$line};
}
open my $file1, '<', 'file1' or die "Couldn't open file1: $!";
while ( my $line = <$file1> ) {
print $line unless $file2{$line};
}
EDIT: OH, it should ignore case... like Pie is the same as PIE when comparing. and remove apostrophes
These are the errors I am getting:
"my" variable $file2 masks earlier declaration in same scope at absent.pl line 9.
"my" variable $file1 masks earlier declaration in same scope at absent.pl line 14.
Global symbol "%file2" requires explicit package name at absent.pl line 11.
Global symbol "%file2" requires explicit package name at absent.pl line 16.
Execution of absent.pl aborted due to compilation errors.
Your error messages:
"my" variable $file2 masks earlier declaration in same scope at absent.pl line 9.
"my" variable $file1 masks earlier declaration in same scope at absent.pl line 14.
Global symbol "%file2" requires explicit package name at absent.pl line 11.
Global symbol "%file2" requires explicit package name at absent.pl line 16.
Execution of absent.pl aborted due to compilation errors.
You are assigning a file name to $file2, and then later you are using open my $file2 ... The use of my $file2 in the second case masks the use in the first case. Then, in the body of the while loop, you pretend there is a hash table %file2, but you haven't declared it at all.
You should use more descriptive variable names to avoid conceptual confusion.
For example:
my @filenames = qw(file_all.txt file2.txt);
Using variables with integer suffixes is a code smell.
Then, factor common tasks to subroutines. In this case, what you need are: 1) A function that takes a filename and returns a table of words in that file, and 2) A function that takes a filename, and a lookup table, and prints words that are in the file, but do not appear in the lookup table.
#!/usr/bin/env perl
use strict;
use warnings;
use Carp qw( croak );
my @filenames = qw(file_all.txt file2.txt);
print "$_\n" for @{ words_notseen(
$filenames[0],
words_from_file($filenames[1])
)};
sub words_from_file {
my $filename = shift;
my %words;
open my $fh, '<', $filename
or croak "Cannot open '$filename': $!";
while (my $line = <$fh>) {
$words{ lc $_ } = 1 for split ' ', $line;
}
close $fh
or croak "Failed to close '$filename': $!";
return \%words;
}
sub words_notseen {
my $filename = shift;
my $lookup = shift;
my %words;
open my $fh, '<', $filename
or croak "Cannot open '$filename': $!";
while (my $line = <$fh>) {
for my $word (split ' ', $line) {
unless (exists $lookup->{ lc $word }) {
$words{ $word } = 1;
}
}
}
return [ keys %words ];
}
You are almost there.
The % sigil denotes a hash. You can't store a file name in a hash, you need a scalar for that.
my $file2 = 'file_all.txt';
my $file1 = 'file2.txt';
You need a hash to count the occurrences.
my %count;
To open a file, specify its name - it's stored in the scalar, do you remember?
open my $FH, '<', $file2 or die "Can't open $file2: $!";
Then, process the file line by line:
while (my $line = <$FH> ) {
chomp $line; # Remove newline if present.
++$count{lc $line}; # Store the lowercased string.
}
Then, open the second file, process it line by line, use lc again to get the lowercased string.
To remove apostrophes, use a substitution:
$line =~ s/'//g; # Replace ' by nothing globally (i.e. everywhere).
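Putting these pieces together, a minimal sketch of the whole comparison might look like the following. The helper names normalize and words_missing are hypothetical, and the word lists are passed as array references rather than read from file_all.txt and file2.txt so that the example is self-contained:

```perl
use strict;
use warnings;

# Hypothetical helper: lowercase a word and strip apostrophes.
sub normalize {
    my ($word) = @_;
    $word = lc $word;
    $word =~ s/'//g;
    return $word;
}

# Hypothetical helper: returns the words from @$all whose normalized
# form does not appear among the normalized words of @$known.
sub words_missing {
    my ($all, $known) = @_;
    my %seen = map { ( normalize($_) => 1 ) } @$known;
    return grep { not $seen{ normalize($_) } } @$all;
}

# Assumed sample data in place of the two input files.
my @missing = words_missing([qw(Pie don't Cake)], [qw(PIE DONT)]);
print "$_\n" for @missing;    # prints "Cake"
```

Normalizing both sides with the same function keeps the lookup consistent, so Pie and PIE, or don't and DONT, compare as equal.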
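As you mentioned in your question: It should print out any of the words in file_all that are not in file2
The short script below compares the two files line by line and prints each line of file_all.txt that differs from the line at the same position in file2.txt: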
#!/usr/bin/perl
use strict;
use warnings;
my ($file1, $file2) = qw(file_all.txt file2.txt);
open my $fh1, '<', $file1 or die "Can't open $file1: $!";
open my $fh2, '<', $file2 or die "Can't open $file2: $!";
while (<$fh1>)
{
last if eof($fh2);
my $compline = <$fh2>;
chomp($_, $compline);
if ($_ ne $compline)
{
print "$_\n";
}
}
file_all.txt:
ab
cd
ee
ef
gh
df
file2.txt:
zz
yy
ee
ef
pp
df
Output:
ab
cd
gh
The issue is the following two lines:
my %file2 = "file_all.txt";
my %file1 = "file2.txt";
Here you are assigning a single value, called a scalar in Perl, to a hash (denoted by the % sigil). Hashes consist of key-value pairs, conventionally written with the fat comma operator (=>), e.g.
my %hash = ( key => 'value' );
Hashes expect an even number of arguments because they must be given both a key and a value. You currently only give each Hash a single value, thus this error is thrown.
To assign a value to a SCALAR, you use the $ sigil:
my $file2 = "file_all.txt";
my $file1 = "file2.txt";

Extract file contents between given lines using perl

I want to use only sed in Perl to capture the file contents between lines 1000 and 2000 of a given file.
I tried the code below, but it didn't work. Can someone help me with this, please?
$firstLIne="1000";
$lastline="2000";
$output=`sed -n '$firstLIne,$lastline'p sample.txt`;
Here is another pure perl solution:
my ($firstline, $lastline) = (1000,2000);
open my $fh, '<', 'sample.txt' or die "$!";
while(<$fh>){
print if $. == $firstline .. $. == $lastline;
}
If you don't use the variables anywhere else, you can use the special case of .. with constant operands (see the fourth paragraph of the documentation for the range operator): if either operand is a constant expression, it is automatically compared to $.:
while(<$fh>){
print if 1000 .. 2000;
}
Here is the important part from the perldoc for the .. operator:
In scalar context, ".." returns a boolean value. The operator is bistable, like a flip-flop, and emulates the line-range (comma) operator of sed, awk, and various editors.
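To watch the flip-flop behaviour on a small scale, here is a sketch that picks lines 3 to 5 from a ten-line in-memory file. The sample data and the variable names are assumptions made for the demonstration:

```perl
use strict;
use warnings;

# Build a ten-line sample "file" in memory (assumed data for this demo).
my $text = join '', map "line $_\n", 1 .. 10;
my ($first, $last) = (3, 5);

open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my @picked;
while (my $line = <$fh>) {
    # The flip-flop turns on when line $first is read and stays on
    # through line $last, after which it turns off again.
    push @picked, $line if $. == $first .. $. == $last;
}
close $fh;

print @picked;    # prints lines 3, 4 and 5
```

The operator keeps its on/off state between evaluations, which is what lets a single condition cover the whole range.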
Edit Per request, with storing the intermediate lines in a variable.
my ($firstline, $lastline) = (1000,2000);
my $output = '';
open my $fh, '<', 'sample.txt' or die $!;
while(<$fh>){
$output .= $_ if $. == $firstline .. $. == $lastline;
}
print $output;
Also, if your file isn't too big (it fits completely into memory) you also can read it into a list and select the lines you're interested in:
my $output = join '', (<$fh>)[$firstline-1..$lastline-1];
For comparison, to do this in Perl only, one could write:
my $firstLine=1000;
my $lastLine=2000;
my $fn="sample.txt";
my $output;
open (my $fh, "<", $fn) or die "Could not open file '$fn': $!\n";
while (<$fh>) {
last if $. > $lastLine;
$output .= $_ if $. >= $firstLine;
}
close($fh);
Note that this stops reading from the file once it passes line $lastLine, so even if the file contains 100,000 lines, only about the first 2000 are read.
If you just want to print out the lines then:
perl -ne 'print if 1000 .. 2000' example_data.txt
should work.
If you want to incorporate that into a script somehow then you can "semi-slurp" the filehandle:
use strict;
use warnings;
open my $filehandle, '<', 'example_data.txt' or die $!;
my $lines_1k_to_2k = '';
while (<$filehandle>) {
$lines_1k_to_2k .= $_ if 1000 .. 2000 ;
}
print $lines_1k_to_2k ;
The .= operator will add the lines to the string in variable $lines_1k_to_2k only if they are in the range 1000 .. 2000

Select rows based on text pattern

I want to extract rows from a file that match a particular pattern and I want to do this for over 500 files. It should have the ability to retain the unique name of the file as well.
I used awk, but then I have to do each file individually.
c:\>gawk "/S1901/" Census_Tract_*.csv > Census_Tract_*.csv
In the example shown in the link here (http://bit.ly/nMX8qh) I want to retain only those records that have S1901 in them. Apologies for the external link, but I am not able to retain the formatting of the table.
I found some perl code that I used to write it but it retains all the rows and does not select only those rows/records where the pattern matches. Any tips would be much appreciated. The perl code is below:
#perl -w
$pattern = "Subject_Census*.csv"; # process only those files that match pattern
while (defined ($in = glob($pattern))) {
($out = $in) =~ s/\.csv$/.outcsv/; # read from "xyz.in" and write to "xyz.out"
open (IN, "<", $in) or die "Can't open $in for reading: $!";
open (OUT,">>", $out) or die "Can't open $out for writing: $!";
while (<IN>) {
$mystring =~ /S1901/;
print OUT $_ if $mystring == 0;
}
close (IN) or die "Can't close $in: $!"; # good idea to do some housekeeping
close (OUT) or die "Can't close $out: $!";
}
Untested:
use strict;
use warnings;
use autodie;
my $files_list_filename = 'files.txt';
open my $fl, '<', $files_list_filename;
my @list_of_files = <$fl>;
chomp @list_of_files;
close $fl;
foreach my $file ( @list_of_files ) {
open my $test_fh, '<', $file;
while ( my $line = <$test_fh> ) {
if( $line =~ m/S1901/ ) {
print "$file at $.: $line";
}
}
close $test_fh;
}
Is that sort of what you had in mind? It opens a file named files.txt and reads in a list of however many filenames you want to give it. Then it iterates over that list, opening and scanning each file in turn, and if a line is found containing the trigger text, it prints the filename and line number, as well as the line itself where the trigger was met. Then it moves on to the next.
perl -ni.bak -e 'print if /S1901/' Subject_Census*.csv