How to remove newline from the end of a file using Perl - perl

I have a file that reads like this:
dog cat mouse
apple orange pear
red yellow green
There is a tab \t separating the words on each row, and a newline \n separating the rows. Below the last line, red yellow green, there is a blank line caused by a newline \n after green.
I would like to use Perl to remove the newline.
I have seen a few questions, such as "How can I delete a newline if it is the last character in a file?", that give solutions for Perl, but I would like to hard-code this so that I can incorporate it into my Perl script.
I don't know if this might be possible using chomp, or if chomp works on each line separately (I would like to keep the newline between lines).
I have also seen comments that suggest keeping a newline at the end of a file because Unix commands work better when a file ends with one. However, I have written a script that relies on input files not ending with a newline, so I really do need to remove it.

You can try this:
perl -pe 'chomp if eof' file.txt
Here is another simple way, if you need it in a script:
open my $fh, '<', "file.txt" or die "Cannot open file.txt: $!";
my @lines = <$fh>; # read all lines and store in array
close $fh;
chomp $lines[-1] if @lines; # remove newline from last line
print @lines;
Or something like this (in script), as suggested by jnhc for the command line:
open my $fh, '<', "file.txt" or die "Cannot open file.txt: $!";
while (<$fh>) {
    chomp if eof $fh; # only the last line loses its newline
    print;
}
close $fh;
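If the file itself should be rewritten rather than printed, one possible approach is to slurp it, strip a single trailing newline, and write it back. This is only a sketch: the sub name remove_final_newline is made up for illustration, and it assumes the file is small enough to hold in memory.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite $path without its final newline, if it has one.
sub remove_final_newline {
    my ($path) = @_;

    open my $in, '<', $path or die "Cannot read $path: $!";
    my $content = do { local $/; <$in> };   # undef $/ slurps the whole file
    close $in;
    $content = '' unless defined $content;  # guard for an empty file

    $content =~ s/\n\z//;                   # \z matches only at the very end

    open my $out, '>', $path or die "Cannot write $path: $!";
    print {$out} $content;
    close $out or die "Cannot close $path: $!";
    return $content;
}
```

Calling remove_final_newline('file.txt') leaves the newlines between rows alone, because the substitution is anchored with \z.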

Related

Print Line By Line

I've been trying to work on a lyrical bot for my server, but before I started to work on it, I wanted to give it a test so I came up with this script using the Lyrics::Fetcher module.
use strict;
use warnings;
use Lyrics::Fetcher;
my ($artist, $song) = ('Coldplay', 'Adventures Of A Lifetime');
my $lyrics = Lyrics::Fetcher->fetch($artist, $song, [qw(LyricWiki AstraWeb)]);
my @lines = split("\n\r", $lyrics);
foreach my $line (@lines) {
    sleep(10);
    print $line;
}
This script works fine: it grabs the lyrics and prints them all at once (which is not what I'm looking for).
I was hoping to achieve a line by line print of the lyrics every 10 seconds. Help please?
Your call to split looks suspicious, in particular the regex "\n\r". Note that the first argument to split is interpreted as a regex even if you supply a quoted string (the one exception is a single-space string, which is treated specially).
On Unix systems the line ending is typically "\n". On DOS/Windows it's "\r\n" (the reverse of what you have). On ancient Macs it was "\r". To match all three you could do:
my @lines = split(/\r\n|\n|\r/, $lyrics);
You will need to enable autoflush, otherwise the lines will just be buffered and printed when the buffer is full or when the program terminates (on Perl versions before 5.14, add use IO::Handle; first):
STDOUT->autoflush;
You can use the regex generic newline pattern \R to split on any line ending, whether your data contains CR, LF, or CR LF. This feature is available only in Perl v5.10 or later:
my @lines = split /\R/, $lyrics;
And you will need to print a newline after each line of lyrics, because the split will have removed them
print $line, "\n";
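Putting those pieces together might look roughly like the sketch below. The sample text stands in for whatever Lyrics::Fetcher returns, and split_lines and $delay are names invented for this example.

```perl
use strict;
use warnings;
use IO::Handle;   # provides STDOUT->autoflush on Perls older than 5.14

# Hypothetical helper: split text on any line ending (\R needs Perl 5.10+).
sub split_lines {
    my ($text) = @_;
    return split /\R/, $text;
}

my $delay  = 0;   # seconds between lines; the question wanted 10
my $lyrics = "first line\r\nsecond line\nthird line\r";   # made-up sample text

STDOUT->autoflush(1);    # send each line to the terminal immediately
for my $line (split_lines($lyrics)) {
    print $line, "\n";   # split removed the line endings, so add one back
    sleep $delay;
}
```

The delay is kept in a variable so the loop can be tried quickly before setting it to 10.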

Removing bullet points from a txt file using perl

I am writing a Perl script to process a text file. I need to remove bullet points from the text file and create a new one without bullets. When I look at the binary version of the text file, the bullet is stored as a Unicode bullet (0xe280a2). How do I remove the bullet from a string?
I have tried the following code:
open($filehandle, '<:encoding(UTF-8)', $filename)
    or die "Could not open file '$filename' $!";
while ($row = <$filehandle>)
{
    @txt_str = split(/\•/, $row);
    $row = join(" ",@txt_str);
}
The backslash doesn't help you here, as the bullet is not a special character in regexes.
If you specify the input is UTF-8, you should search for a UTF-8 bullet. To do so, either prepend
use utf8;
and save your script as UTF-8; or, use
\N{BULLET}
In your case, splitting and joining can be replaced by simple replacement of the bullet by a space:
while (<$filehandle>) {
s/\N{BULLET}/ /g; # or s/•/ /g under utf8
print; # <-- this was missing in your code
}
Why not use a simple s/•/ /g instead of splitting and joining? You should also print the resulting variable ($row in your case) to another file or to stdout, otherwise you won't see the 'unbulleted' version.
For this task, though, I'd use sed from the command line; I'm pretty sure it can handle Unicode characters too.
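As a rough, self-contained illustration of the \N{BULLET} substitution: the sketch below assumes the text has already been decoded (for example by reading through :encoding(UTF-8)), and strip_bullets is just a name chosen for this example.

```perl
use strict;
use warnings;
use charnames ':full';   # needed for \N{BULLET} before Perl 5.16; harmless later

# Hypothetical helper: replace every U+2022 BULLET with a space.
sub strip_bullets {
    my ($text) = @_;
    $text =~ s/\N{BULLET}/ /g;
    return $text;
}

binmode STDOUT, ':encoding(UTF-8)';
print strip_bullets("\x{2022} first item\n\x{2022} second item\n");
```

Because the bullet is written by name (or as \x{2022} in the sample string), the script itself does not have to be saved as UTF-8.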

Perl 5.12.3 fails to loop CSV file line by line

I'm sure someone has an explanation as to what is happening with the following script:
Please note, the file I specify is available and is opening. I know this because the last line of the file is output when the program is run, but it is only the last line.
Note about the .csv file: it's generated on windows (I'm using OS X 10.7.4 with Perl 5.12.3) and uses \r line breaks. I attempted to tell perl that the line break character was \r at the top of the script but it does not work. I know they're \r as the grep search finds them in a text editor.
The script runs and only prints the last line of the file. If I plug in a regular expression it will grab the first matching field from the first line and echo it fine, but I cannot iterate over the entire file.
Any clarification is appreciated as I am new to perl.
#!/usr/bin/perl
use warnings;
print "Please enter your filename:";
my ($dataline);
open(INFO,'./expensereport.csv') || die("can't open datafile: $!");
while (my $line = <INFO>) {
    chomp $line;
    print $line;
}
print $!;
The carriage returns without linefeed are causing print to overwrite each line on the same line, so all you see is the last.
Run dos2unix on your input file before processing.
There are several ways to tell perl that your input file uses Windows-style CRLF line endings, such as the :crlf layer.
See perldoc -f binmode or perldoc -f open.
open(INFO, '<:crlf', './expensereport.csv')
...
Ahh, that's clear! :)
Look, you have a file with \r (carriage return, literally) and \n (newline). chomp cuts off \n (new line). So you print over the same line (remember "carriage return") again and again.
Use print "$line\n"; instead
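If the file really does use bare \r as its only line terminator, as the question describes, another option is to change the input record separator $/ while reading. The sketch below is an assumption-laden example: it reads from an in-memory string instead of expensereport.csv, and read_cr_records and the sample data are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: read CR-terminated records from a string.
sub read_cr_records {
    my ($data) = @_;
    local $/ = "\r";   # input record separator: a bare carriage return
    open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";
    my @records = <$fh>;
    close $fh;
    chomp @records;    # chomp removes the current $/, which is "\r" here
    return @records;
}

# Made-up sample data; each record is printed on its own line.
print "$_\n" for read_cr_records("date,amount\r2012-06-01,12.50\r2012-06-02,8.00\r");
```

Using local keeps the change to $/ confined to the sub, so other reads in the script still split on "\n".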

Perl: How to remove spaces and blank lines in one pass

I have two Perl scripts: the first one removes blank lines from a file and the second one removes all spaces inside a file. Is it possible to combine both of these operations in one script?
For spaces, I have used this transliteration: $str =~ tr/ //d;
and for blank lines, I have used this regexp:
while (<$file>) {
    if (/\S/) {
        print $new_file $_;
    }
}
It should be really easy: just add tr/ //d before the if line.
Note: It will remove lines containing spaces only, too. If you want to keep them (but transliterated to empty lines), insert the transliteration before the print line.
If you wish to trim trailing whitespace from the end of each line, you might want something like this:
perl -pi -e 's/\s*$/\n/' f1 f2 f3 #UNIX file format
perl -pi -e 's/\s*$/\r\n/' f1 f2 f3 #DOS file format
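To make the one-pass idea concrete, here is a small sketch that deletes spaces first and then drops whatever lines end up empty. It works on a list of lines rather than filehandles, and squeeze_lines is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: delete spaces, then drop lines with nothing left.
sub squeeze_lines {
    my @kept;
    for my $line (@_) {
        (my $copy = $line) =~ tr/ //d;        # remove every space character
        push @kept, $copy if $copy =~ /\S/;   # keep only lines with visible content
    }
    return @kept;
}

print squeeze_lines("a b c\n", "   \n", "\n", "d  e\n");
```

In a file-processing loop the same two steps would be tr/ //d; followed by print $new_file $_ if /\S/;.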

CR vs LF perl parsing

I have a Perl script which parses a text file and breaks it up line by line into an array.
It works fine when each line is terminated by LF, but when lines are terminated by CR my script does not handle them properly.
How can I modify this line to fix this?
my @allLines = split(/^/, $entireFile);
Edit:
My file has a mixture of lines ending in either LF or CR. It just collapses all lines when they end in CR.
Perl can handle both CRLF and LF line-endings with the built-in :crlf PerlIO layer:
open(my $in, '<:crlf', $filename);
will automatically convert CRLF line endings to LF, and leave LF line endings unchanged. But CR-only files are the odd-man out. If you know that the file uses CR-only, then you can set $/ to "\r" and it will read line-by-line (but it won't change the CR to a LF).
If you have to deal with files of unknown line endings (or even mixed line endings in a single file), you might want to install the PerlIO::eol module. Then you can say:
open(my $in, '<:raw:eol(LF)', $filename);
and it will automatically convert CR, CRLF, or LF line endings into LF as you read the file.
Another option is to set $/ to undef, which will read the entire file in one slurp. Then split it on /\r\n?|\n/. But that assumes that the file is small enough to fit in memory.
If you have mixed line endings, you can normalize them by matching a generalized line ending:
use v5.10;
$entireFile =~ s/\R/\n/g;
You can also open a filehandle on a string and read lines just like you would from a file:
open my $fh, '<', \ $entireFile;
my @lines = <$fh>;
close $fh;
You can even open the string with the layers that cjm shows.
You can probably just handle the different line endings when doing the split, e.g.:
my @allLines = split(/\r\n|\r|\n/, $entireFile);
Perl will automatically split the input into lines if you read with <>, but you need to change $/ to "\r".
$/ is the "input record separator". see perldoc perlvar for details.
There is not any way to change what a regular expression considers to be the end-of-line - it's always newline.
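The slurp-and-split approach mentioned in the answers could be sketched as follows. The sample data and the sub name slurp_lines are invented for illustration, and the whole input is assumed to fit in memory.

```perl
use strict;
use warnings;

# Hypothetical helper: slurp everything from $fh and split it into lines.
sub slurp_lines {
    my ($fh) = @_;
    my $all = do { local $/; <$fh> };   # undef $/ reads the whole input at once
    return () unless defined $all;
    return split /\r\n?|\n/, $all;      # CRLF, lone CR, or LF
}

my $sample = "one\r\ntwo\rthree\nfour";   # made-up data with mixed endings
open my $fh, '<', \$sample or die "Cannot open in-memory handle: $!";
print "$_\n" for slurp_lines($fh);
close $fh;
```

The returned lines have their terminators removed, so a newline is added back when printing.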