Perl 5.12.3 fails to loop CSV file line by line - perl

I'm sure someone has an explanation as to what is happening with the following script:
Please note, the file I specify is available and is opening. I know this because the last line of the file is output when the program is run, but it is only the last line.
Note about the .csv file: it's generated on windows (I'm using OS X 10.7.4 with Perl 5.12.3) and uses \r line breaks. I attempted to tell perl that the line break character was \r at the top of the script but it does not work. I know they're \r as the grep search finds them in a text editor.
The script runs and only prints the last line of the file. If I plug in a regular expression it will grab the first matching field from the first line and echo it fine, but I cannot iterate over the entire file.
Any clarification is appreciated as I am new to perl.
#!/usr/bin/perl
use warnings;
print "Please enter your filename:";
my ($dataline);
open(INFO,'./expensereport.csv') || die("can't open datafile: $!");
while (my $line = <INFO>) {
chomp $line;
print $line;
}
print $!;

The carriage returns without linefeed are causing print to overwrite each line on the same line, so all you see is the last.
Run dos2unix on your input file before processing.

There are several ways to tell perl that your input file is windows-style :crlf.
perldoc -f binmode or perldoc -f open
open(INFO, '<:crlf', './expensereport.csv')
...

Ahh, that's clear! :)
Look, you have a file with \r (carriage return, literally) and \n (newline). chomp cuts off \n (new line). So you print over the same line (remember "carriage return") again and again.
Use print "$line\n"; instead

Related

How to remove newline from the end of a file using Perl

I have a file that reads like this:
dog cat mouse
apple orange pear
red yellow green
There is a tab \t separating the words on each row, and a newline \n separating each of the rows. Below the last line, red yellow green there is a blank line due to a newline \n after green.
I would like to use Perl to remove the newline.
I have seen a few articles like this How can I delete a newline if it is the last character in a file? that give solutions for Perl, but I would like to do this in hard code so that I can incorporate it into my Perl script.
I don't know if this might be possible using chomp, or if chomp works on each line separately (I would like to keep the newline between lines).
Also I have seen previously comments that suggest maintaining a newline at the end of a file because Unix commands work better when a file ends with a newline. However, I have created a script which relies on input files not ending with a newline, therefore I really feel removing the newlines is necessary for my work.
You can try this:
perl -pe 'chomp if eof' file.txt
Here is another simple way, if you need it in a script:
open $fh, "file.txt";
#lines=<$fh>; # read all lines and store in array
close $fh;
chomp $lines[-1]; # remove newline from last line
print #lines;
Or something like this (in script), as suggested by jnhc for the command line:
open $fh, "file.txt";
while (<$fh>) {
chomp if eof $fh;
print;
}
close $fh;

Change line in textfile using perl

I read other places on how to do this but they were confusing for me.
I want to read lines from a text file and when I come across a certain line I want to append something to it.
My code is:
open my $p, "$username_filename" or die "can not open $username_filename: $!";
foreach $line (<$p>){
if ($line =~ /^listen/){
`echo "whatever" >> $username_file`;
}
}
However when I run this I get this error
sh: -c: line 0: syntax error near unexpected token `newline' sh: -c: line 0: `echo "current_user" >> '
Is this way correct to edit the file and why am I getting this error?
Working with files is not like editing in a word processor. Lines are an illusion, a file is just a big string of characters. You can't change a line in the middle of a file for the same reason you can't change a line in the middle of a book, the words can't be moved around to make room.
Instead, like a book, if you want to change something you need to rewrite the whole thing.
The basic algorithm is to...
Open the file for reading.
Open a temporary file for writing.
Read a line, alter the line, write the line.
Repeat 3 until done reading.
Overwrite the file with the temp file.
Some other notes...
print writes to STDOUT by default, but you can give it a filehandle to write to instead.
foreach my $line (<$fh>) is unfortunately not optimized to read files. It will read the possibly enormous file into memory. while(my $line = <$fh>) reads one line at a time.
I've turned on strict. This forces you to declare your variables. It protects you from typos like the one you made of $username_file vs $username_filename.
You could use something like "$filename.tmp" but File::Temp provides temp files that are guaranteed to be temporary, unique and cleaned up when the program exits.
use strict;
use warnings;
use autodie; # because writing 'or die' gets old fast
use File::Temp; # provides safe temp files
my $filename = ...; # set it somehow
open my $read, "<", $filename;
my $temp = File::Temp->new;
while(my $line = <$read>) {
if( $line =~ /^listen/ ) {
chomp $line; # remove the newline
$line .= " whatever\n"; # add our content and put a newline back
}
# Write the line to the temp file
print $temp $line;
}
# Overwrite our file with the rewritten temp file
rename $temp->filename, $filename;
That's inside a program. If you just want to do it quickly, you can do it on the command line with -i and -p.
perl -i.bak -pe 'if( /^listen/ ) { chomp; $_ .= "whatever" }' filename
-p says to run the code on each line of the file. The line will be put into $_ and whatever is in $_ will be printed. -i says to edit the file in place. -i.bak makes a backup of the original file just in case you make a mistake.
There are a few problems with your attempt. The big one is that using echo >> file will append to the file, not insert at some arbitrary place inside the file.
Another problem is that you're trying to append to a file called $username_file, and you haven't declared or defined that variable.
I don't think perl lets you insert into the middle of a file. I think your best bet would be to read the file a line at a time, and on the correct line(s), append the text you want. Write each line to a new file, then swap the files around at the end.
For example:
#!/usr/bin/perl
my $in_filename = "in.txt";
my $out_filename = "out.txt";
open (my $in, "<", $in_filename) or die;
open (my $out, ">", $out_filename) or die;
while (my $lline = <$in>)
{
chomp $lline;
if ( $lline =~ /listen/ )
{
print "$lline whatever\n";
}
else
{
print "$lline\n";
}
}
close $in;
close $out;
rename $in_filename, "$in_filename.original";
rename $out_filename, $in_filename;
I use chomp to remove line endings, because <$in> gives us a line including its line endings, wish otherwise messes up the append.
As always there are many ways to achieve this. I think using sed is probably a better option for this, but you specifically asked how to do it in perl, so perl it is.

CR vs LF perl parsing

I have a perl script which parses a text file and breaks it up per line into an array.
It works fine when each line are terminated by LF but when they terminate by CR my script is not handling properly.
How can I modify this line to fix this
my #allLines = split(/^/, $entireFile);
edit:
My file has a mixture of lines with either
ending LF or ending CR it just collapses all lines when its ending in CR
Perl can handle both CRLF and LF line-endings with the built-in :crlf PerlIO layer:
open(my $in, '<:crlf', $filename);
will automatically convert CRLF line endings to LF, and leave LF line endings unchanged. But CR-only files are the odd-man out. If you know that the file uses CR-only, then you can set $/ to "\r" and it will read line-by-line (but it won't change the CR to a LF).
If you have to deal with files of unknown line endings (or even mixed line endings in a single file), you might want to install the PerlIO::eol module. Then you can say:
open(my $in, '<:raw:eol(LF)', $filename);
and it will automatically convert CR, CRLF, or LF line endings into LF as you read the file.
Another option is to set $/ to undef, which will read the entire file in one slurp. Then split it on /\r\n?|\n/. But that assumes that the file is small enough to fit in memory.
If you have mixed line endings, you can normalize them by matching a generalized line ending:
use v5.10;
$entireFile =~ s/\R/\n/g;
You can also open a filehandle on a string and read lines just like you would from a file:
open my $fh, '<', \ $entireFile;
my #lines = <$fh>;
close $fh;
You can even open the string with the layers that cjm shows.
You can probably just handle the different line endings when doing the split, e.g.:
my #allLines = split(/\r\n|\r|\n/, $entireFile);
It will automatically split the input into lines if you read with <>, but you need to you need to change $/ to \r.
$/ is the "input record separator". see perldoc perlvar for details.
There is not any way to change what a regular expression considers to be the end-of-line - it's always newline.

Why does Perl just give me the last line in the file?

I have downloaded the following file: rawdata_2001.text
and I have the following perl code:
open TEXTFILE, "rawdata_2001.text";
while (<TEXTFILE>) {
print;
}
This however only prints the last line in the file. Any ideas why? Any feedback would be greatly appreciated.
The file is formatted with carriage returns only, so it's being sucked in as one line. You should be able to set $/ to "\r" to get it to read line by line. You then should strip off the carriage return with chomp, and be sure to print a new line after the string.
your file probably is using "\r" line endings, but your terminal expects "\n" or "\r\n". try running:
open my $textfile, '<', "rawdata_2001.text" or die;
while (<$textfile>) {
chomp;
print "$_\n";
}
you can also experiment with changing the input record separator before the loop with local $/ = $ending;, where $ending could be "\n", "\r\n", "\r"

How can I extract a line or row?

How can I extract the whole line in a row, for example, row 3.
These data are saved in my text editor in linux.
Here's my data:
1,julz,kath,shiela,angel
2,may,ann,janice,aika
3,christal,justine,kim
4,kris,allan,jc,mine
I want output like:
3,christal,justine,kim
The following snippet reads in the first three lines, prints only the third then exits to ensure that no unnecessary processing takes place.
Without the exit, the script would continue to process the input file despite you knowing that you have no use for it.
perl -ne 'if ($. == 3) {print;exit}' infile.txt
As perlvar points out, $. is the current line number for the last file handle accessed.
$ perl -ne'print if $. == 3' your_file.txt
Below is a script version of #ysth's answer:
$ perl -mTie::File -e'tie #lines, q(Tie::File), q(your_file.txt);
> print $lines[2]'
If it's always the third line:
perl -ne 'print if 3..3' <infile >outfile
If it's always the one that has a numeric value of "3" as the first column:
perl -F, -nae 'print if $F[0] == 3' <infile >outfile # thanks for the comment doh!
Since you didn't say how you were identifying that line, I am providing alternatives.
For a more general solution:
open my $fh, '<', 'infile.txt';
while (my $line = <$fh>) {
print $line if i_want_this_line($line);
}
where i_want_this_line implements the criteria defining which line(s) you want.
Um, the -n answers are assuming the question is "what is a script that...". In which case, perl isn't even the best answer. But I don't read that into the question.
In general, if the lines are not of fixed length, you have to read through a file line by
line until you get to the line you want. Tie::File automates this process for you (though since the code it would replace is so trivial, I rarely bother with it, myself).
use Tie::File;
use Fcntl "O_RDONLY";
tie my #line, "Tie::File", "yourfilename", mode => O_RDONLY
or die "Couldn't open file: $!";
print "The third line is ", $line[2];
You can assign the diamond operator on your filehandle to a list, each element will be a line or row.
open $fh, "myfile.txt";
my #lines = <$fh>;
EDIT: This solution grabs all the lines so that you can access any one you want, e.g. row 3 would be $lines[2] ... If you really only want one specific line, that'd be a different solution, like the other answerers'.