Read a file from the second line to the end in Perl

I have a file with many lines. I want to discard the first line and read the file from the second line to the end, but I have not found enough help on Google.
Please help me out with this.
Below is my code, in which I am trying to extract the fourth and fifth columns of a CSV file. However, it also includes the first line, the header, which I do not want.
My code should give me only the values, starting from the second line, not the headers.
foreach my $inputfile (glob("$previous_path/*Analysis*.txt")) {
    open(INFILE, $inputfile) or die("Could not open file.");
    foreach my $line (<INFILE>) {
        my @values = split(',', $line); # parse the file
        my $previous_result = $values[5];
        my $previous_time = $values[4];
        print $previous_result, "\n";
        print $previous_time, "\n";
        push(@previous_result, $previous_result);
        push(@previous_time, $previous_time);
    }
    close(INFILE);
}

Just skip the first line, then read the rest.
<>;          # read and discard a line
while (<>) { # loop over the other lines
    print $_;
}
UPDATE: after you've edited the question, it turns out you want something completely different: to
read a CSV file in Perl
That is a completely different question, and what you should have asked in the first place. The answer is to use an established library, like CSV::Slurp.
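For illustration, here is a minimal sketch using Text::CSV, another established CSV module; the file name is an assumption, and the column indices follow the question's code:
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });
open my $fh, '<', 'Analysis.txt' or die "Could not open file: $!";

my $header = $csv->getline($fh);       # read and discard the header row
while (my $row = $csv->getline($fh)) { # each $row is an array reference
    print "$row->[5]\n";               # previous result (index 5, as in the question)
    print "$row->[4]\n";               # previous time (index 4)
}
close $fh;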

Just skip line number ($.) 1, perhaps using next, like this:
while (<>) {
    next if ($. == 1);
    print $_;
}

You can skip the first line while reading the file itself. For example:
open(IN, "cat filename | tail -n +2 |") || die "can not open file: $!";
while (<IN>) {
    # process further
}
close(IN);

Related

Using Perl to find and fix errors in CSV files

I am dealing with very large amounts of data. Every now and then there is a slip-up. I want to identify each row with an error, under a condition of my choice, and report the row number along with the data of each erroneous row. I will be running this script on a handful of files and want to output the report to one file.
So here is my example data:
File_source,ID,Name,Number,Date,Last_name
1.csv,1,Jim,9876,2014-08-14,Johnson
1.csv,2,Jim,9876,2014-08-14,smith
1.csv,3,Jim,9876,2014-08-14,williams
1.csv,4,Jim,9876,not_a_date,jones
1.csv,5,Jim,9876,2014-08-14,dean
1.csv,6,Jim,9876,2014-08-14,Ruzyck
Desired output:
Row#5,4.csv,4,Jim,9876,not_a_date,jones (this is an erroneous row)
The condition I have chosen is print to output if anything in the date field is not a date.
As you can see, my desired output contains the line number where the error occurred, along with the data itself.
After I have my output showing the lines within each file that are in error, I want to grab those lines from the untouched original CSV files to redo them (both modified and original files contain the same number of rows). After I have a file of these redone rows, I can omit and clean up where needed to prevent interruption of an import.
Folder structure will contain:
Modified: 4.txt
Original: 4.csv
I have something started here, written in Perl, which by its logic should at least return the rows I need. However, I believe my syntax is a little off, and I do not know how to plug in the other subroutines.
Code:
$count = 1;
while (<>) {
unless ($F[4] =~ /\d+[-]\d+[-]\d+/)
print "Row#" . $count++ . "," . "$_";
}
The code above is supposed to give me my erroneous rows, but to be able to extract them from the originals is beyond me. The above code also contains some syntax errors.
This will do as you ask.
Please be certain that none of the fields in the data can ever contain a comma, otherwise you will need to use Text::CSV to process it instead of just a simple split.
use strict;
use warnings;
use 5.010;
use autodie;
open my $fh, '<', 'example.csv';
<$fh>; # Skip header
while (<$fh>) {
    my @fields = split /,/;
    if ( $fields[4] !~ /^\d{4}-\d{2}-\d{2}$/ ) {
        print "Row#$.,$_";
    }
}
output
Row#5,4.csv,4,Jim,9876,not_a_date,jones
Update
If you want to process a number of files then you need this instead.
The close ARGV at the end of the loop is there so that the line counter $. is reset to 1 at the start of each file. Without it, the counter just continues upward across all the files.
You would run this like
rob@Samurai-U:~$ perl findbad.pl *.csv
or you could list the files individually, separated by spaces.
For the test I have created files 1.csv and 2.csv which are identical to your example data except that the first field of each line is the name of the file containing the data.
You may not want the line in the output that announces each file name, in which case you should replace the entire first if block with just next if $. == 1.
use strict;
use warnings;

@ARGV = map { glob qq{"$_"} } @ARGV; # For Windows

while (<>) {
    if ($. == 1) {
        print "\n\nFile: $ARGV\n\n";
        next;
    }
    my @fields = split /,/;
    unless ( $fields[4] =~ /^\d{4}-\d{2}-\d{2}$/ ) {
        printf "Row#%d,%s", $., $_;
    }
    close ARGV if eof ARGV;
}
output
File: 1.csv
Row#5,1.csv,4,Jim,9876,not_a_date,jones
File: 2.csv
Row#5,2.csv,4,Jim,9876,not_a_date,jones
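As for pulling the flagged rows back out of the untouched originals, which the question also asks about, here is a hypothetical sketch. It assumes the report above was saved to report.txt and that the matching original is 1.csv; both names are illustrative only:
use strict;
use warnings;

# Collect the erroneous row numbers from the report (lines like "Row#5,...")
my %wanted;
open my $report, '<', 'report.txt' or die $!;
while (<$report>) {
    $wanted{$1} = 1 if /^Row#(\d+),/;
}
close $report;

# Print only those line numbers from the untouched original file
open my $orig, '<', '1.csv' or die $!;
while (<$orig>) {
    print if $wanted{$.};   # $. is the current line number
}
close $orig;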

How to read large files with different line delimiters?

I have two very large XML files that have different kinds of line endings.
File A has CR LF at the end of each XML record. File B has only CR at the end of each XML record.
In order to read File B properly, I need to set the built-in Perl variable $/ to "\r".
But if I'm using the same script with File A, the script does not read each line in the file and instead reads it as a single line.
How can I make the script compatible with text files that have various line-ending delimiters? In the code below, the script reads XML data and then uses a regex to split records on a specific record-ending XML tag like </record>. Finally it writes the requested records to a file.
open my $file_handle, '+<', $inputFile or die $!;
local $/ = "\r";
while (my $line = <$file_handle>) { # read file line-by-line; does not load the whole file into memory
    $current_line = $line;
    if ($spliceAmount > $recordCounter) { # if the splice amount hasn't been reached yet
        push(@setofRecords, $current_line); # start adding each line to the set-of-records array
        if ($current_line =~ m|$recordSeparator|) { # check for the node to splice on
            $recordCounter++; # the record separator was found (end of that record), so increment the record counter
        }
    }
    # don't close the file because we need to read the last line
}
$current_line =~ /(\<\/\w+\>$)/;
$endTag = $1;
print "\n\n";
print "End Tag: $endTag \n\n";
close $file_handle;
While you may not need it for this, in theory, to parse XML you should use an XML parser. I'd recommend XML::LibXML, or perhaps XML::Simple to start off with.
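As a minimal sketch of the parser route, assuming the records live in <record> elements (adjust the XPath to the real tag name); a parser also sidesteps this question entirely, since the XML spec requires line-ending normalization:
use strict;
use warnings;
use XML::LibXML;

# load_xml parses the whole document; the parser normalizes
# \r\n and bare \r to \n, so mixed line endings are a non-issue
my $doc = XML::LibXML->load_xml(location => $inputFile);
for my $record ($doc->findnodes('//record')) {
    print $record->toString, "\n";
}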
If the file isn't too big to hold in memory, you can slurp the whole thing into a scalar and split it into the correct lines yourself with a suitably flexible regular expression. For example,
local $/ = undef;
my $data = <$file_handle>;
my @lines = split /(?>\r\n)|(?>\r)|(?>\n)/, $data;
foreach my $line (@lines) {
    ...
}
Note that (?>...) is an atomic group rather than a look-ahead assertion, and split discards whatever the pattern matches, so this removes the end-of-line characters much as chomp would. If you want to keep them, use capturing parentheses in the split pattern; if you were going to chomp them anyway, you can save yourself a step by passing /\r\n|\r|\n/ to split instead.
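For illustration, a small sketch of the capturing variant: with capturing parentheses, split returns the delimiters as separate list elements, which can then be reattached to their lines:
my @parts = split /(\r\n|\r|\n)/, $data;   # capturing keeps the delimiters in the list
my @lines;
while (@parts) {
    my $line = shift @parts;
    my $eol  = @parts ? shift @parts : ''; # the last line may have no terminator
    push @lines, $line . $eol;
}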

Perl: How to add a line to a sorted text file

I want to add a line to a text file in Perl which has data in sorted form. I have seen examples which show how to append data at the end of the file, but I want the data to stay sorted.
Please guide me on how this can be done.
Basically, from what I have tried so far:
(I open the file and grep its contents to see if the line I want to add already exists. If it does, exit; else add it to the file, such that the data remains sorted.)
open(my $FH, $file) or die "Failed to open file $file \n";
@file_data = <$FH>;
close($FH);

my $line = grep(/$string1/, @file_data);
if ($line) {
    print "Found\n";
    exit(1);
}
else {
    # add the line to the file
    print "Not found!\n";
}
Here's an approach using Tie::File, so that you can easily treat the file as an array, and List::BinarySearch's bsearch_str_pos function to quickly find the insert point. Once you've found the insert point, check whether the element at that point is equal to your insert string: if it's not, splice the string into the array; if it is, don't. Finish up with untie so that the file gets closed cleanly.
use strict;
use warnings;
use Tie::File;
use List::BinarySearch qw(bsearch_str_pos);
my $insert_string = 'Whatever!';
my $file          = 'something.txt';

my @array;
tie @array, 'Tie::File', $file or die $!;

my $idx = bsearch_str_pos $insert_string, @array;

splice @array, $idx, 0, $insert_string
    if $array[$idx] ne $insert_string;

untie @array;
The bsearch_str_pos function from List::BinarySearch is an adaptation of a binary search implementation from Mastering Algorithms with Perl. Its convenient characteristic is that if the search string isn't found, it returns the index point where it could be inserted while maintaining the sort order.
Since you have to read the contents of the text file anyway, how about a different approach?
Read the lines in the file one-by-one, comparing against your target string. If you read a line equal to the target string, then you don't have to do anything.
Otherwise, you eventually read a line 'greater' than your current line according to your sort criteria, or you hit the end of the file. In the former case, you just insert the string at that position, and then copy the rest of the lines. In the latter case, you append the string to the end.
If you don't want to do it that way, you can do a binary search in #file_data to find the spot to add the line without having to examine all of the entries, then insert it into the array before outputting the array to the file.
Here's a simple version that reads from stdin (or the filename(s) specified on the command line) and appends 'string to append' to the output if it's not found in the input. Output is printed on stdout.
#! /usr/bin/perl
$found  = 0;
$append = 'string to append';
while (<>) {
    $found = 1 if (m/$append/o);
    print;
}
print "$append\n" unless ($found);
Modifying it to edit a file in-place (with perl -i) and taking the append string from the command line would be quite simple.
A 'simple' one-liner to insert a line without using any module could be:
perl -ni -le '$insert="lemon"; $eq=($insert cmp $_); if ($eq == 0){$found++}elsif($eq==-1 && !$found){print $insert; $found++} print; END{print $insert unless $found}'
($found is set as soon as the line is printed, so it is inserted only once; the END block covers an insert string that sorts after every existing line.)
Given a list.txt whose content is:
ananas
apple
banana
pear
the output is:
ananas
apple
banana
lemon
pear
{
    local ($^I, @ARGV) = ("", $file); # Enable in-place editing of $file

    while (<>) {
        # If we found the line exactly, bail out without printing it twice
        last if $_ eq $insert;

        # If we found the place where the line should be, insert it
        if ($_ gt $insert) {
            print $insert;
            print;
            last;
        }

        print;
    }

    # We've passed the insertion point, now output the rest of the file
    print while <>;
}
Essentially the same answer as pavel's, except with a lot of readability added. Note that $insert should already contain a trailing newline.

How to check whether a CSV file with a header is empty or not in Perl

I have a csv file xyz.csv with contents
name,age,place,phone
xyz,12,ohio,7372829
sed,23,hrh,9890908
I need to parse the CSV file and check whether it is an empty file, i.e. whether it has no values for the headers provided.
A null file xyz.csv would contain just the headers (the number of headers may decrease or increase), e.g. decreased:
name,age,place,phone
increased:
name,age,place,mob,phno,ht
How do I check for a null file in the code below and print whether it is null or not?
I have developed the below script to parse the CSV:
open(my $data, '<', $file_c) or die "Could not open '$file_c' $!\n";
while (my $line = <$data>) {
    next if ($. == 1);
    chomp $line;
    my @fields = split ",", $line;
    print "$fields[0] $fields[1]";
}
If I were you I'd look into one of the CSV handling CPAN modules, e.g. Text::CSV, Tie::CSV_File or DBD::CSV.
To check if the file is empty would be a simple case of counting the number of parsed lines in the file. With DBI and DBD::CSV you could use a SELECT COUNT(*) FROM table_name SQL statement.
The following link provides a quick tutorial on parsing CSV files with perl: http://perlmeme.org/tutorials/parsing_csv.html
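For example, a minimal DBI/DBD::CSV sketch along those lines; the f_ext mapping and file location are assumptions:
use strict;
use warnings;
use DBI;

# Treat the current directory as a database of CSV files;
# f_ext maps the table "xyz" to the file "xyz.csv"
my $dbh = DBI->connect('dbi:CSV:', undef, undef, {
    f_dir      => '.',
    f_ext      => '.csv/r',
    RaiseError => 1,
});

# DBD::CSV takes the first line as the header, so the count covers data rows only
my ($count) = $dbh->selectrow_array('SELECT COUNT(*) FROM xyz');
print $count ? "not null\n" : "null\n";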
You can use the range operator to search for any lines that aren't the first line. This should be pretty efficient:
while (<$data>) {
    unless (1..1) { print "not null\n"; exit 0; }
}
print "null\n";
exit 1;
Or you could just pluck the lines off one by one - if the second is defined, then it's not null:
<$data>;
print (defined <$data> ? "not null" : "null");
You could also just check the number of lines, if you are sure the header is always present: the first line is always the header, so if line 2 is not present (or has length 0) then it is an empty file.
This should work. From your shell prompt you can confirm that wc -l < FNAME returns only the line count; in Perl:
chomp(my $count = `wc -l < FNAME`);
print "empty" if $count == 1;
(chomp returns the number of characters removed, not the line count, so the count has to go into a variable first.) It may be inefficient, though.

Nested while loop which does not seem to keep variables appropriately

I'm an amateur Perl coder, and I'm having a lot of trouble figuring out what is causing this particular issue. It seems as though it's a variable issue.
sub patch_check {
    my $pline;
    my $sline;
    while (<SYSTEMINFO>) {
        chomp($_);
        $sline = $_;
        while (<PATCHLIST>) {
            chomp($_);
            $pline = $_;
            print "sline $sline pline $pline underscoreline $_ "; # troubleshooting
            print "$sline - $pline\n";
            if ($pline =~ /($sline)/) {
                #print " - match $pline -\n";
            }
        } # end while
    }
}
There is more code, but I don't think it is relevant. When I print $sline in the first loop it works fine, but not in the second loop. I tried making the variables global, but that did not work either.
The point of the subroutine is that I want to open one file (patches) and see whether its lines are in the other (systeminfo). I also tried reading the files into arrays and using foreach loops.
Does anyone have another solution?
It looks like your actual goal here is to find lines which are in both files, correct? The normal (and much more efficient! - it only requires you to read in each file once, rather than reading all of one file for each line in the other) way to do this in Perl would be to read the lines from one file into a hash, then use hash lookups on each line in the other file to check for matches.
Untested (but so simple it should work) code:
sub patch_check {
    my %slines;

    while (<SYSTEMINFO>) {
        # Since we'll just be comparing one file's lines
        # against the other file's lines, there's no real
        # reason to chomp() them
        $slines{$_}++;
    }

    # %slines now has all lines from SYSTEMINFO as its
    # keys and the values are the number of times the
    # line appears, in case that's interesting to you

    while (<PATCHLIST>) {
        print "match: $_" if exists $slines{$_};
    }
}
Incidentally, if you're reading your data from SYSTEMINFO and PATCHLIST, then you're doing it the old-fashioned way. When you get a chance, read up on lexical filehandles and the three-argument form of open if you're not already familiar with them.
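For example (the file name here is hypothetical):
# Lexical filehandle plus the three-argument form of open
open my $patchlist, '<', 'patchlist.txt' or die "Cannot open patchlist.txt: $!";
while (my $line = <$patchlist>) {
    # process $line
}
close $patchlist;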
Your code is not entering the PATCHLIST while loop the 2nd time through the SYSTEMINFO while loop because you already read all the contents of PATCHLIST the first time through. You'd have to re-open the PATCHLIST filehandle to accomplish what you're trying to do.
That's a pretty inefficient way to see if the lines of one file match the lines of another file. Take a look at grep with the -f flag for another way.
grep -f PATCHFILE SYSTEMINFO
What I like to do in such cases is read one file and create hash keys from the values you are looking for, then read the second file and check whether those keys already exist. This way you have to read each file only once.
Here is example code, untested:
sub patch_check {
    my %patches = ();

    open(my $PatchList, '<', "patch.txt") or die $!;
    open(my $SystemInfo, '<', "SystemInfo.txt") or die $!;

    while (my $PatchRow = <$PatchList>) {
        $patches{$PatchRow} = 0;
    }

    while (my $SystemRow = <$SystemInfo>) {
        if (exists $patches{$SystemRow}) {
            # The patch is in SystemInfo
            # Do whatever you want
        }
    }
}
You cannot read one file inside the read loop of another. Slurp one file into an array, then loop over that array with foreach inside the outer read loop, as sketched below.
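A minimal sketch of that structure, using the question's filehandle names:
# Slurp PATCHLIST once, outside the read loop
my @patches = <PATCHLIST>;
chomp @patches;

# Outer read loop over SYSTEMINFO, inner foreach over the slurped lines
while (my $sline = <SYSTEMINFO>) {
    chomp $sline;
    foreach my $pline (@patches) {
        print "match: $pline\n" if $pline =~ /\Q$sline\E/;
    }
}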