Why is Perl's chomp affecting the output of my print? - perl

It's been a couple months since I've been Perling, but I'm totally stuck on why this is happening...
I'm on OSX, if it matters.
I'm trying to transform lines in a file like
08/03/2011 01:00 PDT,1.11
into stdout lines like
XXX, 20120803, 0100, KWH, 0.2809, A, YYY
Since I'm reading a file, I want to chomp after each line is read in. However, when I chomp, I find my printing gets all messed up. When I don't chomp the printing is fine (except for the extra newline...). What's going on here?
while(<SOURCE>) {
chomp;
my @tokens = split(' |,'); # @tokens is now (08/03/2011, 01:00, PDT, 1.11)
my $converted_date = convertDate($tokens[0]);
my $converted_time = convertTime($tokens[1]);
print<<EOF;
$XXX, $converted_date, $converted_time, KWH, $tokens[3], A, YYY
EOF
}
With the chomp call in there, the output is all mixed up:
, A, YYY10803, 0100, KWH, 1.11
Without the chomp call in there, it's at least printing in the right order, but with the extra new line:
XXX, 20110803, 0100, KWH, 1.11
, A, YYY
Notice that with the chomp in there, it's like it overwrites the newline "on top of" the first line. I've added the $|=1; autoflush, but don't know what else to do here.
Thoughts? And thanks in advance....

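The lines of your input end with CR LF. You're removing the LF only, so the CR that is left behind sends the cursor back to the start of the line and the rest of the output overwrites it. A simple solution is to use the following instead of chomp: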
s/\s+\z//;
You could also use the dos2unix command line tool to convert the files before passing them to Perl.
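To see the difference, here is a minimal sketch that compares chomp with the substitution above on one CR LF line. The helper name strip_eol and the sample string are only for illustration; they are not from the original code.

```perl
use strict;
use warnings;

# Hypothetical helper: strip any trailing line terminator
# (LF, CR LF, or a lone CR) along with other trailing whitespace.
sub strip_eol {
    my ($line) = @_;
    $line =~ s/\s+\z//;
    return $line;
}

# A line as it would be read from a file with CR LF line endings.
my $raw = "08/03/2011 01:00 PDT,1.11\r\n";

my $chomped = $raw;
chomp $chomped;    # removes only the "\n"; the "\r" is still there

my $stripped = strip_eol($raw);

printf "after chomp:     ends with CR: %s\n", ($chomped  =~ /\r\z/ ? 'yes' : 'no');
printf "after strip_eol: ends with CR: %s\n", ($stripped =~ /\r\z/ ? 'yes' : 'no');
```

Note that s/\s+\z// also removes trailing spaces and tabs, which may or may not be what you want for your data.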

The problem is that you have DOS line-endings and are running on a Unix build of Perl.
One solution to this is to use PerlIO::eol. You may have to install it but there is no need for a use line in the program.
Then you can write
binmode $filehandle, ':raw:eol(LF)';
after which, regardless of the format or source of the file, the lines read will be terminated with the standard "\n".

Related

How to remove newline from the end of a file using Perl

I have a file that reads like this:
dog cat mouse
apple orange pear
red yellow green
There is a tab \t separating the words on each row, and a newline \n separating each of the rows. Below the last line, red yellow green there is a blank line due to a newline \n after green.
I would like to use Perl to remove the newline.
I have seen a few articles like this How can I delete a newline if it is the last character in a file? that give solutions for Perl, but I would like to do this in hard code so that I can incorporate it into my Perl script.
I don't know if this might be possible using chomp, or if chomp works on each line separately (I would like to keep the newline between lines).
Also I have seen previously comments that suggest maintaining a newline at the end of a file because Unix commands work better when a file ends with a newline. However, I have created a script which relies on input files not ending with a newline, therefore I really feel removing the newlines is necessary for my work.
You can try this:
perl -pe 'chomp if eof' file.txt
Here is another simple way, if you need it in a script:
open my $fh, '<', 'file.txt' or die "Can't open file.txt: $!";
my @lines = <$fh>; # read all lines and store in array
close $fh;
chomp $lines[-1] if @lines; # remove newline from last line
print @lines;
Or something like this (in script), as suggested by jnhc for the command line:
open my $fh, '<', 'file.txt' or die "Can't open file.txt: $!";
while (<$fh>) {
chomp if eof $fh;
print;
}
close $fh;
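If you want to try the eof approach without touching a real file, a rough sketch using in-memory filehandles might look like this. The sub name copy_without_final_newline and the sample text are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: copy every line from $in to $out,
# dropping the newline of the final line only.
sub copy_without_final_newline {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        chomp $line if eof $in;    # true only once the last line has been read
        print {$out} $line;
    }
    return;
}

# Sample data standing in for file.txt (tab-separated, newline-terminated).
my $text = "dog\tcat\tmouse\nred\tyellow\tgreen\n";

open my $in,  '<', \$text      or die "Cannot open in-memory input: $!";
open my $out, '>', \my $result or die "Cannot open in-memory output: $!";
copy_without_final_newline($in, $out);
close $in;
close $out;

print $result, "<- no newline before this marker\n";
```

The newlines between rows are kept because eof is false until the last line has been read.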

Print Line By Line

I've been trying to work on a lyrical bot for my server, but before I started to work on it, I wanted to give it a test so I came up with this script using the Lyrics::Fetcher module.
use strict;
use warnings;
use Lyrics::Fetcher;
my ($artist, $song) = ('Coldplay', 'Adventures Of A Lifetime');
my $lyrics = Lyrics::Fetcher->fetch($artist, $song, [qw(LyricWiki AstraWeb)]);
my @lines = split("\n\r", $lyrics);
foreach my $line (@lines) {
sleep(10);
print $line;
}
This script runs fine: it grabs the lyrics and prints them out all at once (which is not what I'm looking for).
I was hoping to achieve a line by line print of the lyrics every 10 seconds. Help please?
Your call to split looks suspicious. In particular the regex "\n\r". Note, the first argument to split is always interpreted as a regex regardless of whether you supply a quoted string.
On Unix systems the line ending is typically "\n". On DOS/Windows it's "\r\n" (the reverse of what you have). On ancient Macs it was "\r". To match all three you could do:
my @lines = split(/\r\n|\n|\r/, $lyrics);
You will need to enable autoflush, otherwise the lines will just be buffered and printed when the buffer is full or when the program terminates.
STDOUT->autoflush;
You can use the regex generic newline pattern \R to split on any line ending, whether your data contains CR, LF, or CR LF. This feature is available only in Perl v5.10 or later.
my @lines = split /\R/, $lyrics;
And you will need to print a newline after each line of lyrics, because the split will have removed them
print $line, "\n";
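Putting those pieces together, a minimal sketch could look like the following. The lyrics string is a hard-coded stand-in for whatever Lyrics::Fetcher returns, and split_lines and $delay are names invented for this example.

```perl
use 5.010;        # \R needs Perl 5.10 or later
use strict;
use warnings;
use IO::Handle;   # makes STDOUT->autoflush available on older Perls

# Hypothetical helper: split text on any line ending (CR LF, LF, or CR).
sub split_lines {
    my ($text) = @_;
    return split /\R/, $text;
}

STDOUT->autoflush(1);    # print each line as soon as it is produced

# Stand-in for the fetched lyrics, with mixed line endings on purpose.
my $lyrics = "first line\r\nsecond line\nthird line\r";
my $delay  = 1;          # seconds between lines; the question used 10

for my $line (split_lines($lyrics)) {
    print $line, "\n";   # split removed the line endings, so add one back
    sleep $delay;
}
```

Without the autoflush call the output may still arrive in one block when STDOUT is not a terminal, for example when it is piped to another program.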

Perl 5.12.3 fails to loop CSV file line by line

I'm sure someone has an explanation as to what is happening with the following script:
Please note, the file I specify is available and is opening. I know this because the last line of the file is output when the program is run, but it is only the last line.
Note about the .csv file: it's generated on windows (I'm using OS X 10.7.4 with Perl 5.12.3) and uses \r line breaks. I attempted to tell perl that the line break character was \r at the top of the script but it does not work. I know they're \r as the grep search finds them in a text editor.
The script runs and only prints the last line of the file. If I plug in a regular expression it will grab the first matching field from the first line and echo it fine, but I cannot iterate over the entire file.
Any clarification is appreciated as I am new to perl.
#!/usr/bin/perl
use warnings;
print "Please enter your filename:";
my ($dataline);
open(INFO,'./expensereport.csv') || die("can't open datafile: $!");
while (my $line = <INFO>) {
chomp $line;
print $line;
}
print $!;
The carriage returns without linefeed are causing print to overwrite each line on the same line, so all you see is the last.
Run dos2unix on your input file before processing.
There are several ways to tell perl that your input file is windows-style :crlf.
perldoc -f binmode or perldoc -f open
open(INFO, '<:crlf', './expensereport.csv')
...
Ahh, that's clear! :)
Look, you have a file with \r (carriage return, literally) and \n (newline). chomp cuts off \n (new line). So you print over the same line (remember "carriage return") again and again.
Use print "$line\n"; instead

Need help with my first perl program

I am only a few days in and have only made a couple things from the book I have been going through, so go easy :P. I have tried searching online and have tried different ways but I can not seem to pass the MAC properly to system().
What I am trying to achieve is to have Perl open a .txt file of MACs, with each MAC on its own line. It should then read the file one line at a time and pass each MAC to system() as an argument, so that aircrack receives it. The MAC is printed correctly for each line, but I cannot figure out why aircrack complains that the MAC it is given is not valid. Is this because I am not chomping the line that was read?
What I have not tried as of yet due to this complication is I eventually want it to print a found key to a file if aircrack says it has found one, or if it does not find one moves on to the next BSSID, continuing until there are no more MACs in the file to try.
the MACs are in a txt file as so
00:00:00:00:00:00
00:00:00:00:00:00
00:00:00:00:00:00
and so on
#!/usr/bin/perl
use strict;
use warnings;
my $file = '/kismetLOGS/PcapDumpKismet/WEPMACS.txt';
open my $info, $file or die "Could not open $file: $!";
while( my $line = <$info>)
{
print $line;
system("/usr/local/bin/aircrack-ng", "-b $line", "*.pcap *.pcapdump");
last if $. == 0;
}
close $info;
exit;
Thanks for any help, tips and pointers. Not looking for a spoon feed :) And hopefully I posted properly for everyone and if I am way off in how I am trying this for my end goal please feel free to say and any tips about the correct route to try would be appreciated
You can either combine all your arguments together, like
system("/usr/local/bin/aircrack-ng -b $line *.pcap *.pcapdump");
or separate them all, like
system("/usr/local/bin/aircrack-ng", "-b","$line", "*.pcap","*.pcapdump");
The latter is usually safer, because spaces in the items do not need to be escaped. But then globbing doesn't work, as the arguments are passed directly to the system for execution without going through a shell.
If you want *.pcap to work, you'll need to go with the first version.
$line ends with a newline character. You should remove the newline character.
chomp $line;
About last if $. == 0;: $. holds the number of the last line read, and it is already 1 after the first line, so this condition is never true and the statement has no effect. Remove it if you want to iterate over all of the MAC addresses.
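If you want both the safety of the list form and wildcard expansion, one option is to let Perl's built-in glob do the expansion. This is only a sketch: the helper name aircrack_args is invented here, and the example prints the command instead of running it.

```perl
use strict;
use warnings;

# Hypothetical helper: build the argument list for the list form of system().
# glob() performs the wildcard expansion that a shell would otherwise do.
sub aircrack_args {
    my ($mac, @patterns) = @_;
    chomp $mac;    # drop the trailing newline left over from reading the file
    return ('-b', $mac, map { glob($_) } @patterns);
}

# One line as it would be read from the MAC list, newline included.
my @args = aircrack_args("00:00:00:00:00:00\n", '*.pcap', '*.pcapdump');

# Show what would be executed; each element is passed as a separate argument.
print join(' ', '/usr/local/bin/aircrack-ng', @args), "\n";

# To actually run it (not done here):
#   system('/usr/local/bin/aircrack-ng', @args) == 0
#       or warn "aircrack-ng exited with status $?\n";
```

Note that "-b" and the MAC are separate list elements. Passing "-b $line" as a single element hands the program one argument containing a space, which is probably not what it expects.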

Opening a CSV file created in Mac Excel with Perl

I'm having a bit of trouble with the Perl code below. I can open and read in a CSV file that I've made manually, but if I try to open any Mac Excel spreadsheet that I save as a CSV file, the code below reads it all as a single line.
#!/usr/bin/perl
use strict;
use warnings;
open F, "file.csv";
foreach (<F>)
{
($first, $second, undef, undef) = split (',', $_);
}
print "$first : $second\n";
close(F);
Always use a specialised module (such as Text::CSV or Text::CSV_XS) for this purpose as there are lots of cases where split-ing will not help (for example when the fields contain a comma which is not a field separator but is within quotes).
Traditional Macintosh (Mac OS 9 and earlier) uses CR (0x0D, \r) as the line separator. Mac OS X (Unix based) uses LF (0x0A, \n) as the default line separator, so the perl script, being a Unix tool, is probably expecting LF but is getting CR. Since there are no LF line separators in the file, perl thinks there is only one line. If it had Windows line endings (CR, LF) you'd probably be getting an invisible CR at the end of each line.
A quick loop over the input replacing 0x0D with 0x0A should fix your problem.
I've directly experienced this problem with Excel 2004 for Mac. The line endings are indeed \r, and IIRC, the text uses the MacRoman character set, rather than Latin-1 or UTF-8 as you might expect.
So as well as the good advice to use Text::CSV / Text::CSV_XS and splitting on \r, you will want to open the file using the MacRoman encoding like so:
open my $fh, "<:encoding(MacRoman)", $filename
or die "Can't read $filename: $!";
Likewise, when reading a file exported with Excel on Windows, you may wish to use :encoding(cp1252) instead of :encoding(MacRoman) in that code.
Not sure about Mac excel, but certainly the windows version tends to enclose all values in quotes: "like","this". Also, you need to take into account the possibility of there being a quote in the value, which would show up "like""this" (there's only a single " in that value).
To actually answer your question however, it's likely that it's using a different newline character from what you'd expect. It's probably saving as \r\n instead of \n, or vice versa.
As others have suspected, your line endings are probably to blame. On my Linux-based system there are builtin utilities to change these line endings. mac2unix (which I think is just a wrapper around dos2unix) will read your file and change the line endings for you. You should have something similar both on Linux and Mac (Microsoft may not care about you).
If you want to handle this in Perl, look into setting the $/ variable to change the "input record separator" from "\n" to "\r" (if that's the right ending). Try local $/ = "\r" before you read the file. Read more about it in perldoc perlvar (near $/) or in perldoc perlport (devoted to writing portable Perl code).
P.S. if I have some part of this incorrect let me know, I don't use Mac, I just think I know the theory
If you set the special variable that defines what Perl considers a line ending to \r, you'll be able to read one line at a time: $/ = "\r";. In this particular case Perl on the Mac defaults to \n, but the file is probably using \r. This builds on what Flynn1179 and Mark Thalman said, but shows what to do to keep using the while (<F>) style of reading.
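As a rough illustration of the $/ approach, the sketch below reads CR-separated records from an in-memory filehandle. The sub name read_cr_lines and the sample data are assumptions made for the example, and it assumes the file really uses bare \r line endings.

```perl
use strict;
use warnings;

# Hypothetical helper: read CR-terminated records from a filehandle
# and return them with the terminators removed.
sub read_cr_lines {
    my ($fh) = @_;
    local $/ = "\r";    # input record separator; restored when the sub returns
    my @lines = <$fh>;
    chomp @lines;       # chomp removes whatever $/ currently holds
    return @lines;
}

# Stand-in for a CSV file saved with classic Mac (CR only) line endings.
my $data = "a,b,c,d\re,f,g,h\r";
open my $fh, '<', \$data or die "Cannot open in-memory data: $!";

for my $line (read_cr_lines($fh)) {
    my ($first, $second) = split /,/, $line;
    print "$first : $second\n";
}
close $fh;
```

Using local keeps the change to $/ confined to the sub, so other reads in the program still see the default "\n". For real CSV data, a module such as Text::CSV is still the safer way to split the fields.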