Count records in Perl

Is there a built-in Perl variable that keeps track of how many records have been read in a while loop?
For example, suppose I do this:
my $count;
while (<>) {
    $count++;
}
print $count;
Is there a way to do this without defining $count? That is, is there already some variable that contains this information?

$. will tell you the current line number of the file currently being read.
Note that the variable only resets when the filehandle is close()d. If the old filehandle isn't closed before you start reading from a new one, $. keeps incrementing across files; once the filehandle is closed, it is reset to 0. For example, both the code in your question and this code count continuously across all the files being read:
foreach my $arg (@ARGV) {
    open(I, $arg);
    while (<I>) {
        print $., "\n";
    }
}
But if you close the filehandle at any point before the next open call:
foreach my $arg (@ARGV) {
    open(I, $arg);
    while (<I>) {
        print $., "\n";
    }
    close(I); # NEW LINE
}
then $. is reset to zero again and you get a separate count for each file.

There is no automatic loop counter in Perl. There are counters for the current line number in a filehandle (see Wes Hardaker's answer).
A general loop counter would be complicated to define (how should it handle a loop inside a loop?).
So, back to the old $count++ :)

You can use a simple command line script, too:
perl -ne 'if (eof) { printf "%6d %s\n", $., $ARGV; close ARGV }' file1 file2 file3
    10 file1
 13921 file2
    12 file3
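If instead you want a single grand total across all the files, leave ARGV open so that $. never resets, and print it once at the end. A minimal sketch:
perl -ne 'END { print "$.\n" }' file1 file2 file3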

Related

Using Perl to find and fix errors in CSV files

I am dealing with very large amounts of data. Every now and then there is a slip-up. I want to identify each row that has an error, under a condition of my choice. With that I want the row number along with the data of each erroneous row. I will be running this script on a handful of files, and I will want to output the report to a single file.
So here is my example data:
File_source,ID,Name,Number,Date,Last_name
1.csv,1,Jim,9876,2014-08-14,Johnson
1.csv,2,Jim,9876,2014-08-14,smith
1.csv,3,Jim,9876,2014-08-14,williams
1.csv,4,Jim,9876,not_a_date,jones
1.csv,5,Jim,9876,2014-08-14,dean
1.csv,6,Jim,9876,2014-08-14,Ruzyck
Desired output:
Row#5,4.csv,4,Jim,9876,not_a_date,jones (this is an erroneous row)
The condition I have chosen is print to output if anything in the date field is not a date.
As you can see, my desired output contains the line number where the error occurred, along with the data itself.
After I have my output showing which lines within each file are in error, I want to grab those lines from the untouched original CSV files to redo them (the modified and original files contain the same number of rows). Once I have a file of these redone rows, I can omit and clean up rows where needed to prevent interruption of an import.
Folder structure will contain:
Modified: 4.txt
Original: 4.csv
I have something started here, written in Perl, whose logic should at least return the rows I need. However, I believe my syntax is a little off, and I do not know how to plug in the other subroutines.
Code:
$count = 1;
while (<>) {
    unless ($F[4] =~ /\d+[-]\d+[-]\d+/)
    print "Row#" . $count++ . "," . "$_";
}
The code above is supposed to give me my erroneous rows, but to be able to extract them from the originals is beyond me. The above code also contains some syntax errors.
This will do as you ask.
Please be certain that none of the fields in the data can ever contain a comma; otherwise you will need to use Text::CSV to process it instead of just a simple split.
use strict;
use warnings;
use 5.010;
use autodie;

open my $fh, '<', 'example.csv';
<$fh>; # Skip header

while (<$fh>) {
    my @fields = split /,/;
    if ( $fields[4] !~ /^\d{4}-\d{2}-\d{2}$/ ) {
        print "Row#$.,$_";
    }
}
output
Row#5,4.csv,4,Jim,9876,not_a_date,jones
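If the fields can ever contain quoted commas, a minimal sketch using the Text::CSV module instead of split (same example.csv and date check as above) would be:
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new({ binary => 1 }) or die Text::CSV->error_diag;

open my $fh, '<', 'example.csv' or die $!;
$csv->getline($fh); # skip the header row

while ( my $row = $csv->getline($fh) ) {
    unless ( $row->[4] =~ /^\d{4}-\d{2}-\d{2}$/ ) {
        # $. matches the row number as long as no field spans multiple lines
        print "Row#$.,", join(',', @$row), "\n";
    }
}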
Update
If you want to process a number of files then you need this instead.
The close ARGV at the end of the loop is there so that the line counter $. is reset to 1 at the start of each file. Without it, the count just continues upwards across all the files.
You would run this like
rob@Samurai-U:~$ perl findbad.pl *.csv
or you could list the files individually, separated by spaces.
For the test I have created files 1.csv and 2.csv which are identical to your example data except that the first field of each line is the name of the file containing the data.
You may not want the line in the output that announces each file name, in which case you should replace the entire first if block with just next if $. == 1.
use strict;
use warnings;

@ARGV = map { glob qq{"$_"} } @ARGV; # For Windows

while (<>) {
    if ($. == 1) {
        print "\n\nFile: $ARGV\n\n";
        next;
    }
    my @fields = split /,/;
    unless ( $fields[4] =~ /^\d{4}-\d{2}-\d{2}$/ ) {
        printf "Row#%d,%s", $., $_;
    }
    close ARGV if eof ARGV;
}
output
File: 1.csv
Row#5,1.csv,4,Jim,9876,not_a_date,jones
File: 2.csv
Row#5,2.csv,4,Jim,9876,not_a_date,jones

Read a file from second line till end in perl

I have a file with many lines. I want to discard the first line and
read the file from the second line to the end, but I'm not finding enough help on Google.
Please help me out in this case.
Below is the code in which I am trying to extract the 4th and 5th columns of a CSV file; however, it is also including the first line, the header, which I do not want.
My code should get me only the values, starting from the second line, not the headers.
foreach my $inputfile (glob("$previous_path/*Analysis*.txt")) {
    open(INFILE, $inputfile) or die("Could not open file.");
    foreach my $line (<INFILE>) {
        my @values = split(',', $line); # parse the file
        my $previous_result = $values[5];
        my $previous_time = $values[4];
        print $previous_result, "\n";
        print $previous_time, "\n";
        push(@previous_result, $previous_result);
        push(@previous_time, $previous_time);
    }
    close(INFILE);
}
Just skip the first line, then read the rest.
<>;          # read and discard a line
while (<>) { # loop over the other lines
    print $_;
}
UPDATE: after you've edited the question, it turns out you want something completely different: how to read a CSV file in Perl.
That is a completely different question, and it is what you should have asked in the first place. The answer is to use an established library, like CSV::Slurp.
Just skip line number ($.) 1, perhaps using next, like this:
while (<>) {
    next if $. == 1;
    print $_;
}
You can skip the first line while reading the file itself. For example:
open(IN, "tail -n +2 filename |") || die "cannot open file: $!";
while (<IN>) {
    # process further
}
close(IN);

Perl - while (<>) file handling [duplicate]

This question already has an answer here:
Which file is Perl's diamond operator (null file handle) currently reading from?
A simple program with while (<>) handles files given as arguments (./program 1.file 2.file 3.file) as well as standard input on Unix systems.
I think it concatenates them together into one stream and works line by line. The problem is: how do I know that I'm working with the first file, and then with the second one?
For a simple example, I want to print each file's content on one line.
while (<>) {
    print "\n" if (it's the second file already);
    print $_;
}
The diamond operator does not concatenate the files; it just opens and reads them consecutively. How you control this depends on how you need it controlled. A simple way to check when we have read the last line of a file is to use eof:
while (<>) {
    chomp;             # remove newline
    print;             # print the line
    print "\n" if eof; # at end of file, print a newline
}
You can also use a counter to keep track of which file in the sequence you are processing:
$counter++ if eof;
Note that this count increases at the last line of each file, so do not use it prematurely.
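For the question's example (a separator between files), a minimal sketch is to watch $ARGV, the name of the file currently being read, and print a newline whenever it changes:
my $current = '';
while (<>) {
    if ($ARGV ne $current) {
        print "\n" if $current ne ''; # we just finished a file
        $current = $ARGV;
    }
    chomp;
    print;
}
print "\n";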
If you want to keep track of line number $. in the current file handle, you can close the ARGV file handle to reset this counter:
while (<>) {
    print "line $. : ", $_;
    close ARGV if eof;
}
The <> is a special case of the readline operator. It usually takes a filehandle: <$fh>.
If the filehandle is left out, then the magic ARGV filehandle is used.
If no command line arguments are given, then ARGV is STDIN. If command line arguments are given, then ARGV will be opened to each of those in turn. This is similar to
# Pseudocode
while ($ARGV = shift @ARGV) {
    open ARGV, $ARGV or do {
        warn "Can't open $ARGV: $!";
        next;
    };
    while (<ARGV>) {
        ...; # your code
    }
}
The $ARGV variable is real, and holds the filename of the file currently opened.
Please be aware that the two-arg form of open (which is probably used here behind the scenes) is quite unsafe. The filename rm -rf * | may not do what you want.
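If that is a concern, the same loop written with the safe three-argument open would look like this (still pseudocode; a lexical $fh replaces the magic ARGV handle):
# Pseudocode with three-argument open, which treats the
# argument strictly as a file name
while (my $file = shift @ARGV) {
    open my $fh, '<', $file or do {
        warn "Can't open $file: $!";
        next;
    };
    while (<$fh>) {
        ...; # your code
    }
}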
The name of the current file for <> is contained in the special $ARGV variable.
You can cross-match your list of files from the @ARGV parameter array with the current file name to get the file's position in the list. Assuming the only parameters you expect are filenames, you can simply do:
my %filename_positions = map { ( $ARGV[$_] => $_ ) } 0 .. $#ARGV;
while (<>) {
    my $file_number = $filename_positions{$ARGV};
    # ... if ($file_number == 0) { ... } # first file
}

Perl: How to add a line to sorted text file

I want to add a line to a text file in Perl which has data in sorted form. I have seen examples that show how to append data at the end of the file, but I want the data to stay in a sorted format.
Please guide me how it can be done.
Basically, from what I have tried so far:
(I open the file and grep its content to see if the line which I want to add already exists. If it does, then exit; else add it to the file, such that the data remains in a sorted format.)
open(my $FH, $file) or die "Failed to open file $file\n";
@file_data = <$FH>;
close($FH);

my $line = grep (/$string1/, @file_data);
if ($line) {
    print "Found\n";
    exit(1);
}
else {
    # add the line to the file
    print "Not found!\n";
}
Here's an approach using Tie::File so that you can easily treat the file as an array, and List::BinarySearch's bsearch_str_pos function to quickly find the insert point. Once you've found the insert point, check whether the element at that position is equal to your insert string. If it's not, splice the string into the array; if it is, don't. Finish up with untie so that the file gets closed cleanly.
use strict;
use warnings;
use Tie::File;
use List::BinarySearch qw(bsearch_str_pos);

my $insert_string = 'Whatever!';
my $file          = 'something.txt';

my @array;
tie @array, 'Tie::File', $file or die $!;

my $idx = bsearch_str_pos $insert_string, @array;
splice @array, $idx, 0, $insert_string
    if $array[$idx] ne $insert_string;

untie @array;
The bsearch_str_pos function from List::BinarySearch is an adaptation of a binary search implementation from Mastering Algorithms with Perl. Its convenient characteristic is that if the search string isn't found, it returns the index point where it could be inserted while maintaining the sort order.
Since you have to read the contents of the text file anyway, how about a different approach?
Read the lines of the file one by one, comparing each against your target string. If you read a line equal to the target string, you don't have to do anything.
Otherwise, you will eventually read a line 'greater' than your current line according to your sort criteria, or hit the end of the file. In the former case, you just insert the string at that position and then copy the rest of the lines; in the latter case, you append the string to the end. A sketch of this follows.
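A minimal sketch of that approach (the file name and the inserted string are illustrative; $insert carries its trailing newline):
use strict;
use warnings;

my $file   = 'sorted.txt';  # illustrative file name
my $insert = "lemon\n";     # line to insert, with trailing newline

open my $in,  '<', $file       or die "Can't read $file: $!";
open my $out, '>', "$file.new" or die "Can't write $file.new: $!";

my $done = 0;
while (my $line = <$in>) {
    if (!$done) {
        if ($line eq $insert) {    # already present, nothing to add
            $done = 1;
        }
        elsif ($line gt $insert) { # passed the insertion point
            print $out $insert;
            $done = 1;
        }
    }
    print $out $line;
}
print $out $insert unless $done;   # greater than every line: append
close $in;
close $out;

rename "$file.new", $file or die "Can't rename $file.new: $!";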
If you don't want to do it that way, you can do a binary search in #file_data to find the spot to add the line without having to examine all of the entries, then insert it into the array before outputting the array to the file.
Here's a simple version that reads from stdin (or filename(s) specified on the command line) and appends 'string to append' to the output if it's not found in the input. Output is printed on stdout.
#!/usr/bin/perl
$found  = 0;
$append = 'string to append';
while (<>) {
    $found = 1 if (m/$append/o);
    print;
}
print "$append\n" unless ($found);
Modifying it to edit a file in-place (with perl -i) and taking the append string from the command line would be quite simple.
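For instance, a hedged sketch of such an in-place version (untested, single file, with the string hard-coded for brevity):
perl -i -ne '$found = 1 if /string to append/; print; print "string to append\n" if eof && !$found' file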
A 'simple' one-liner to insert a line without using any module could be:
perl -ni -le '$insert="lemon"; $eq=($insert cmp $_); if ($eq == 0){$found++} elsif ($eq == -1 && !$found){print $insert; $found++} print' list.txt
given a list.txt whose content is:
ananas
apple
banana
pear
the output is:
ananas
apple
banana
lemon
pear
{
    local ($^I, @ARGV) = ("", $file); # Enable in-place editing of $file
    while (<>) {
        # If we found the line exactly, bail out without printing it twice
        last if $_ eq $insert;
        # If we found the place where the line should be, insert it
        if ($_ gt $insert) {
            print $insert;
            print;
            last;
        }
        print;
    }
    # We've passed the insertion point, now output the rest of the file
    print while <>;
}
Essentially the same answer as pavel's, except with a lot of readability added. Note that $insert should already contain a trailing newline.
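For example, an illustrative setup for the snippet above:
my $file   = 'list.txt';
my $insert = "lemon\n"; # must end with a newline, as noted above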

Nested while loop which does not seem to keep variables appropriately

I'm an amateur Perl coder, and I'm having a lot of trouble figuring out what is causing this particular issue. It seems as though it's a variable issue.
sub patch_check {
    my $pline;
    my $sline;
    while (<SYSTEMINFO>) {
        chomp($_);
        $sline = $_;
        while (<PATCHLIST>) {
            chomp($_);
            $pline = $_;
            print "sline $sline pline $pline underscoreline $_ "; # troubleshooting
            print "$sline - $pline\n";
            if ($pline =~ /($sline)/) {
                #print " - match $pline -\n";
            }
        } # end while
    }
}
There is more code, but I don't think it is relevant. When I print $sline in the first loop it works fine, but not in the second loop. I tried making the variables global, but that did not work either.
The point of the subroutine is that I want to open one file (patches) and see whether its lines appear in another (systeminfo). I also tried reading the files into arrays and using foreach loops.
Does anyone have another solution?
It looks like your actual goal here is to find lines which are in both files, correct? The normal (and much more efficient, since it only requires reading each file once rather than re-reading all of one file for each line of the other) way to do this in Perl is to read the lines from one file into a hash, then use hash lookups on each line of the other file to check for matches.
Untested (but so simple it should work) code:
sub patch_check {
    my %slines;
    while (<SYSTEMINFO>) {
        # Since we'll just be comparing one file's lines
        # against the other file's lines, there's no real
        # reason to chomp() them
        $slines{$_}++;
    }
    # %slines now has all lines from SYSTEMINFO as its
    # keys and the values are the number of times the
    # line appears, in case that's interesting to you
    while (<PATCHLIST>) {
        print "match: $_" if exists $slines{$_};
    }
}
Incidentally, if you're reading your data from SYSTEMINFO and PATCHLIST, then you're doing it the old-fashioned way. When you get a chance, read up on lexical filehandles and the three-argument form of open if you're not already familiar with them.
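A minimal sketch of that style, with illustrative file names:
# Three-argument open with a lexical filehandle; the handle is
# closed automatically when $fh goes out of scope.
open my $fh, '<', 'systeminfo.txt' or die "Can't open systeminfo.txt: $!";
while ( my $line = <$fh> ) {
    # ... process $line ...
}
close $fh;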
Your code is not entering the PATCHLIST while loop the second time through the SYSTEMINFO while loop, because you already read all the contents of PATCHLIST the first time through. You'd have to re-open the PATCHLIST filehandle to accomplish what you're trying to do.
That's a pretty inefficient way to see whether the lines of one file match the lines of another, though. Take a look at grep with the -f flag for another way:
grep -f PATCHFILE SYSTEMINFO
What I like to do in such cases is read one file and create hash keys from the values you are looking for, then read the second file and check whether those keys already exist. This way you have to read each file only once.
Here is example code, untested:
sub patch_check {
    my %patches = ();

    open(my $PatchList, '<', "patch.txt") or die $!;
    open(my $SystemInfo, '<', "SystemInfo.txt") or die $!;

    while ( my $PatchRow = <$PatchList> ) {
        $patches{$PatchRow} = 0;
    }
    while ( my $SystemRow = <$SystemInfo> ) {
        if ( exists $patches{$SystemRow} ) {
            # The patch is in SystemInfo
            # Do whatever you want
        }
    }
}
You cannot read one file inside the read loop of another; the inner filehandle is exhausted after the first pass. Slurp one file into an array first, then make the loop over the slurped lines the inner loop, and keep the read loop over the other file as the outer loop.
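A minimal sketch of that structure, with illustrative file names and the original's match condition (\Q...\E added so the line is matched literally):
# Slurp PATCHLIST once, then use it as the inner loop
open my $patchlist, '<', 'patchlist.txt' or die $!;
my @patches = <$patchlist>;
close $patchlist;
chomp @patches;

open my $systeminfo, '<', 'systeminfo.txt' or die $!;
while ( my $sline = <$systeminfo> ) { # outer read loop
    chomp $sline;
    foreach my $pline (@patches) {    # inner loop over the slurped lines
        print "match: $pline\n" if $pline =~ /\Q$sline\E/;
    }
}
close $systeminfo;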