Why does Perl just give me the last line in the file?

I have downloaded the following file: rawdata_2001.text
and I have the following perl code:
open TEXTFILE, "rawdata_2001.text";
while (<TEXTFILE>) {
    print;
}
However, this only prints the last line of the file. Any ideas why? Any feedback would be greatly appreciated.

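The file uses carriage returns ("\r") only as line endings, so Perl, which splits input on "\n" by default, reads the whole file as one line. When that line is printed, each "\r" moves the terminal cursor back to the start of the line, so every line overwrites the previous one and only the last is visible. You should be able to set $/ to "\r" to read the file line by line. Then strip the carriage return from each line with chomp (which removes the current value of $/), and be sure to print a newline after each string.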
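A minimal sketch of that idea, assuming the file really uses bare CR ("\r") line endings throughout. The helper name read_cr_lines is hypothetical, not from the answer; it localizes $/ inside the sub so the changed separator doesn't leak into the rest of the program.

```perl
use strict;
use warnings;

# Hypothetical helper: read a file whose lines end in a bare "\r"
# and return the lines with the line endings removed.
sub read_cr_lines {
    my ($path) = @_;
    open my $fh, '<', $path or die "Can't open $path: $!";
    local $/ = "\r";          # input record separator: one record per CR
    my @lines;
    while (my $line = <$fh>) {
        chomp $line;          # chomp removes the current $/, i.e. "\r"
        push @lines, $line;
    }
    close $fh;
    return @lines;
}

# Print each line with a real newline so the terminal shows them all.
if (@ARGV) {
    print "$_\n" for read_cr_lines($ARGV[0]);
}
```

Run it as perl script.pl rawdata_2001.text; each line should then appear on its own terminal line.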

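Your file is probably using "\r" line endings, but your terminal expects "\n" or "\r\n". Try running:
open my $textfile, '<', "rawdata_2001.text" or die "Can't open rawdata_2001.text: $!";
local $/ = "\r";   # read one CR-terminated line at a time
while (<$textfile>) {
    chomp;         # strip the trailing "\r"
    print "$_\n";
}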
You can also experiment with changing the input record separator before the loop, using local $/ = $ending;, where $ending is "\n", "\r\n", or "\r" depending on the file.
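If you don't know the convention in advance, one option is to sniff it first. The sketch below is an illustration under stated assumptions, not part of the answer: detect_line_ending is a hypothetical helper that reads the first 64 KB in :raw mode and returns the first line break it finds, which is then used as $/. It assumes the file uses one convention consistently; a CRLF pair split exactly at the 64 KB boundary would be misread as a bare CR.

```perl
use strict;
use warnings;

# Hypothetical helper: guess a file's line-ending convention from the
# first line break in its first 64 KB.
# Returns "\r\n", "\n" or "\r", or undef if no line break was found.
sub detect_line_ending {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Can't open $path: $!";
    my $n = read($fh, my $buf, 65536);
    die "Can't read $path: $!" unless defined $n;
    close $fh;
    # Alternation order matters: try "\r\n" before a lone "\n" or "\r".
    return $1 if $buf =~ /(\r\n|\n|\r)/;
    return;
}

if (@ARGV) {
    my $path   = $ARGV[0];
    my $ending = detect_line_ending($path) // "\n";   # fall back to LF
    open my $fh, '<:raw', $path or die "Can't open $path: $!";
    local $/ = $ending;          # read one line per detected line ending
    while (my $line = <$fh>) {
        chomp $line;             # removes $ending, whatever it is
        print "$line\n";
    }
    close $fh;
}
```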

Related

How to remove newline from the end of a file using Perl

I have a file that reads like this:
dog cat mouse
apple orange pear
red yellow green
There is a tab (\t) separating the words on each row, and a newline (\n) separating each of the rows. Below the last line, red yellow green, there is a blank line due to the newline (\n) after green.
I would like to use Perl to remove the newline.
I have seen a few articles, such as "How can I delete a newline if it is the last character in a file?", that give Perl solutions, but I would like to write this as code that I can incorporate into my Perl script.
I don't know if this might be possible using chomp, or if chomp works on each line separately (I would like to keep the newline between lines).
I have also seen comments suggesting that a file should keep its final newline, because Unix commands work better when a file ends with one. However, I have written a script that relies on its input files not ending with a newline, so I really do need to remove it.
You can try this (it prints the result to standard output; add -i to edit file.txt in place):
perl -pe 'chomp if eof' file.txt
Here is another simple way, if you need it in a script:
open my $fh, '<', 'file.txt' or die "Can't open file.txt: $!";
my @lines = <$fh>;              # read all lines and store them in an array
close $fh;
chomp $lines[-1] if @lines;     # remove the newline from the last line only
print @lines;
Or, in a script, following jnhc's suggestion for the command line:
open my $fh, '<', 'file.txt' or die "Can't open file.txt: $!";
while (<$fh>) {
    chomp if eof $fh;   # strip the newline only from the final line
    print;
}
close $fh;
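Both snippets above print to standard output. If the goal is to change the file itself, a minimal sketch is to slurp it, strip only the final newline with a \z-anchored substitution, and write it back. This assumes \n line endings and a file small enough to hold in memory; remove_final_newline is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: remove a single trailing "\n" from the end of a file
# and write the result back, keeping all the newlines between lines.
# Reads the whole file into memory, so it suits small-to-medium files.
sub remove_final_newline {
    my ($path) = @_;
    open my $in, '<:raw', $path or die "Can't open $path: $!";
    my $content = do { local $/; <$in> };    # slurp the whole file
    close $in;
    $content = '' unless defined $content;   # guard against an empty read

    # \z anchors at the very end of the string, so only the last newline goes
    return 0 unless $content =~ s/\n\z//;

    open my $out, '>:raw', $path or die "Can't write $path: $!";
    print {$out} $content;
    close $out or die "Can't close $path: $!";
    return 1;
}

remove_final_newline($_) for @ARGV;
```

It returns 1 if it removed a newline and 0 if the file already ended without one, so running it twice is harmless. For a CRLF file only the \n would be removed and the \r would remain.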

Remove CRLF end of csv file using Perl

I have a CSV file in which every record ends with CRLF. Using Perl, how can I remove the CRLF from only the final record, so that there is no empty row at the end of the file? Thank you.
If I follow the question correctly, there is a line feed trailing the last record, which is creating an empty row at the end of the file.
You can read the file into a scalar and remove the trailing blank row with a substitution. \R, available since Perl 5.10, matches any common line-break sequence (\r\n, \n, or \r); on older Perls you'll need to use \n or \r\n instead.
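open my $fh, '<', 'test.csv' or die "Can't open test.csv: $!";
my $str = do { local $/; <$fh> };   # slurp the whole file into one string
close $fh;
$str =~ s/\R+\z//;                  # remove the line break(s) after the last record only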
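To see why \R is convenient here, the sketch below (strip_trailing_breaks is a hypothetical helper) strips the line breaks at the very end of a string for all three conventions. \R matches "\r\n" as one unit, and \z anchors the match to the end of the string, so the breaks between records are untouched. Using \R+ also removes any extra blank lines at the end; use a single \R if you only want to drop one.

```perl
use strict;
use warnings;

# Strip the line break(s) at the very end of a string, whatever the
# convention. \R (Perl 5.10+) matches "\r\n" as a single unit, as well
# as a lone "\n" or "\r".
sub strip_trailing_breaks {
    my ($str) = @_;
    $str =~ s/\R+\z//;
    return $str;
}

for my $ending ("\r\n", "\n", "\r") {
    my $csv   = "id,name${ending}1,apple${ending}2,pear${ending}";
    my $clean = strip_trailing_breaks($csv);
    # Only the final break is gone; the ones between records remain.
    printf "%d line breaks left\n", scalar(() = $clean =~ /\R/g);
}
```

Run as is, it should print "2 line breaks left" three times, once per convention.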

Perl 5.12.3 fails to loop CSV file line by line

I'm sure someone has an explanation as to what is happening with the following script:
Please note that the file I specify exists and opens: I know this because the last line of the file is output when the program runs, but only the last line.
Note about the .csv file: it's generated on Windows (I'm using OS X 10.7.4 with Perl 5.12.3) and uses \r line breaks. I tried to tell Perl at the top of the script that the line-break character was \r, but it does not work. I know they're \r because a grep search finds them in a text editor.
The script runs and only prints the last line of the file. If I plug in a regular expression, it grabs the first matching field from the first line and echoes it fine, but I cannot iterate over the entire file.
Any clarification is appreciated, as I am new to Perl.
#!/usr/bin/perl
use warnings;
print "Please enter your filename:";
my ($dataline);
open(INFO,'./expensereport.csv') || die("can't open datafile: $!");
while (my $line = <INFO>) {
    chomp $line;
    print $line;
}
print $!;
The carriage returns without line feeds cause print to overwrite each line on the same screen line, so all you see is the last one.
Run dos2unix on your input file before processing.
There are several ways to tell Perl that your input file has Windows-style line endings, such as the :crlf layer (see perldoc -f binmode or perldoc -f open):
open(INFO, '<:crlf', './expensereport.csv')
...
Ahh, that's clear! :)
Look: your file has \r (a literal carriage return) followed by \n (newline) at the end of each line. chomp cuts off only the \n, so each printed line still ends in \r, which returns the cursor to the start of the same line, and the next line is printed over it, again and again.
Use print "$line\n"; instead.
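Another option, if you'd rather not convert the file or change layers, is to strip the line ending yourself. A minimal sketch, assuming the file uses CRLF or LF endings (not bare CR); clean_lines is a hypothetical helper name. The substitution removes "\r\n" or "\n", unlike chomp, which only removes "\n" under the default $/.

```perl
use strict;
use warnings;

# Hypothetical helper: return every line from a filehandle with its
# CRLF or LF ending removed, so no stray "\r" reaches the terminal.
sub clean_lines {
    my ($fh) = @_;
    my @lines;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;   # handles both CRLF and LF endings
        push @lines, $line;
    }
    return @lines;
}

if (@ARGV) {
    open my $info, '<', $ARGV[0] or die "Can't open $ARGV[0]: $!";
    print "$_\n" for clean_lines($info);
    close $info;
}
```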

Why doesn't chomp() work in this case?

I'm trying to use chomp() to remove all the newline characters from a file. Here's the code:
use strict;
use warnings;
open (INPUT, 'input.txt') or die "Couldn't open file, $!";
my @emails = <INPUT>;
close INPUT;
chomp(@emails);
my $test = '';
foreach (@emails)
{
    $test = $test.$_;
}
print $test;
and the test content of the input.txt file is simple:
hello.com
hello2.com
hello3.com
hello4.com
My expected output is something like this: hello.comhello2.comhello3.comhello4.com
However, I'm still getting the same content as the input file. Any help, please?
Thank you
If the input file was generated on a different platform (one that uses a different EOL sequence), chomp might not strip off all the newline characters. For example, if you created the text file in Windows (which uses \r\n) and ran the script on Mac or Linux, only the \n would get chomp()ed and the output would still "look" like it had newlines.
If you know what the EOL sequence of the input is, you can set $/ before chomp(). Otherwise, you may need to do something like
my @emails = map { s/[\n\r]+$//g; $_ } <INPUT>;
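A self-contained sketch of that map approach, wrapped in a hypothetical join_lines helper so it can be tried on any filehandle. It anchors with \z (the absolute end of the string) and drops the /g, which isn't needed for a single anchored match.

```perl
use strict;
use warnings;

# Hypothetical helper: read every line, strip any trailing CR/LF characters
# (whatever platform produced the file), and join the results.
sub join_lines {
    my ($fh) = @_;
    my @emails = map { s/[\r\n]+\z//; $_ } <$fh>;
    return join '', @emails;
}

if (@ARGV) {
    open my $input, '<', $ARGV[0] or die "Couldn't open $ARGV[0]: $!";
    print join_lines($input), "\n";
    close $input;
}
```

With the Windows-generated input.txt from the question, this should print hello.comhello2.comhello3.comhello4.com on Mac or Linux as well.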

Counting records separated by CR/LF (carriage return and newline) in Perl

I'm trying to create a simple script to read a text file that contains records of book titles. Each record is separated from the next by a plain old blank line (\r\n\r\n). I need to count how many records are in the file.
For example here is the input file:
record 1
some text

record 2
some text
...
I'm using a regex to check for carriage return and newline, but it fails to match. What am I doing wrong? I'm at my wits' end.
sub readInputFile {
    my $inputFile = $_[0]; #read first argument from the commandline as fileName
    open INPUTFILE, "+<", $inputFile or die $!; #Open File
    my $singleLine;
    my @singleRecord;
    my $recordCounter = 0;
    while (<INPUTFILE>) { # loop through the input file line-by-line
        $singleLine = $_;
        push(@singleRecord, $singleLine); # start adding each line to a record array
        if ($singleLine =~ m/\r\n/) { # check for carriage return and new line
            $recordCounter += 1;
            createHashTable(@singleRecord); # send record make a hash table
            @singleRecord = (); # empty the current record to start a new record
        }
    }
    print "total records : $recordCounter \n";
    close(INPUTFILE);
}
It sounds like you are processing a Windows text file on Linux, in which case you want to open the file with the :crlf layer, which will convert all CRLF line-endings to the standard Perl \n ending.
If you are reading Windows files on a Windows platform then the conversion is already done for you, and you won't find CRLF sequences in the data you have read. If you are reading a Linux file then there are no CR characters in there anyway.
It also sounds like your records are separated by a blank line. Setting the built-in input record separator variable $/ to a null string will cause Perl to read a whole record at a time.
I believe this version of your subroutine is what you need. Note that people familiar with Perl will thank you for using lower-case letters and underscores for variable and subroutine names. Mixed case is conventionally reserved for package names.
You don't show create_hash_table so I can't tell what data it needs. I have chomped and split the record into lines, and passed a list of the lines in the record with the newlines removed. It would probably be better to pass the entire record as a single string and leave create_hash_table to process it as required.
sub read_input_file {
    my ($input_file) = @_;
    open my $fh, '<:crlf', $input_file or die $!;
    local $/ = '';                 # paragraph mode: one blank-line-separated record per read
    my $record_counter = 0;
    while (my $record = <$fh>) {
        chomp $record;             # in paragraph mode, removes all trailing newlines
        ++$record_counter;
        create_hash_table(split /\n/, $record);
    }
    close $fh;
    print "Total records : $record_counter\n";
}
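Because create_hash_table isn't shown, the sketch below only counts the records, which makes the approach easy to try out. count_records is a hypothetical name; like the subroutine above, it relies on the :crlf layer to turn "\r\n" into "\n" on input and on paragraph mode ($/ = '') to return one blank-line-separated record per read.

```perl
use strict;
use warnings;

# Hypothetical helper: count blank-line-separated records in a file that
# may use CRLF endings. The :crlf layer turns "\r\n" into "\n" on input,
# and $/ = '' (paragraph mode) makes each read return one whole record.
sub count_records {
    my ($path) = @_;
    open my $fh, '<:crlf', $path or die "Can't open $path: $!";
    local $/ = '';
    my $count = 0;
    while (my $record = <$fh>) {
        chomp $record;     # paragraph mode: removes all trailing newlines
        ++$count;
    }
    close $fh;
    return $count;
}

print "Total records : ", count_records($_), "\n" for @ARGV;
```

The same function also works on a file with plain LF endings, since the :crlf layer leaves lone "\n" characters alone.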
You can do this more succinctly by changing Perl's record-separator, which will make the loop return a record at a time instead of a line at a time.
E.g. after opening your file:
local $/ = "\r\n\r\n";
my $recordCounter = 0;
$recordCounter++ while(<INPUTFILE>);
$/ holds Perl's global input record separator, and scoping it with local lets you override its value temporarily until the end of the enclosing block, when it automatically reverts to its previous value.
But it sounds like the file you're processing may actually have "\n\n" record-separators, or even "\r\r". You'd need to set the record-separator correctly for whatever file you're processing.
If your files are not huge multi-gigabyte files, the easiest and safest way is to read the whole file and use the generic newline metacharacter \R.
This way, it also works if some file actually uses LF instead of CRLF (or even the old Mac standard CR).
Use it with split if you also need the actual records:
perl -ln -0777 -e 'my @records = split /\R\R/; print scalar(@records)' $Your_File
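Or, if you only want to count the records (the match counts the blank-line separators, so add one; this assumes the file doesn't end with a blank line):
perl -ln -0777 -e 'my $count=()=/\R\R/g; print $count+1' $Your_File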
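For use inside a script rather than as a one-liner, the same idea might look like this. count_records_generic is a hypothetical name; it slurps the file in :raw mode so no layer rewrites the line endings, splits on \R\R, and, as an extra assumption beyond the one-liners, ignores empty or whitespace-only chunks such as those produced by leading blank lines.

```perl
use strict;
use warnings;

# Hypothetical helper: count records separated by a blank line, whatever
# the line-ending convention (LF, CRLF or bare CR), by slurping the file
# and splitting on two consecutive generic line breaks (\R\R, Perl 5.10+).
sub count_records_generic {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Can't open $path: $!";
    my $content = do { local $/; <$fh> };
    close $fh;
    return 0 unless defined $content;
    # Ignore chunks that are empty or whitespace-only.
    return scalar grep { /\S/ } split /\R\R/, $content;
}

print count_records_generic($_), "\n" for @ARGV;
```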
For more details, see also my other answer here to a similar question.