Why doesn't chomp() work in this case? - perl

I'm trying to use chomp() to remove all the newline character from a file. Here's the code:
use strict;
use warnings;
open (INPUT, 'input.txt') or die "Couldn't open file, $!";
my #emails = <INPUT>;
close INPUT;
chomp(#emails);
my $test;
foreach(#emails)
{
$test = $test.$_;
}
print $test;
and the test conent for the input.txt file is simple:
hello.com
hello2.com
hello3.com
hello4.com
my expected output is something like this: hello.comhello2.comhello3.comhello4.com
however, I'm still getting the same content as the input file, any help please?
Thank you

If the input file was generated on a different platform (one that uses a different EOL sequence), chomp might not strip off all the newline characters. For example, if you created the text file in Windows (which uses \r\n) and ran the script on Mac or Linux, only the \n would get chomp()ed and the output would still "look" like it had newlines.
If you know what the EOL sequence of the input is, you can set $/ before chomp(). Otherwise, you may need to do something like
my #emails = map { s/[\n\r]+$//g; $_ } <INPUT>;

Related

Remove mysterious line breaks in CSV file using Perl

I have a CSV file that I'm parsing using Perl. The file is a BOM produced by Solidworks 2015 that was saved as an XLS file, then opened in Excel and saved as a CSV file.
There are cells that have line breaks. When I read a line with such a cell from the file, the line comes in with the line breaks. For example, here is one of the lines read looks like this:
74,,74,1,1,"SJ-TL303202-DET-074-
001",PDSI,"2.25"" DIA. X 8.00""",A2,513,1,
It reads in as a single line in Perl.
When I turn the Show All Characters in Notepad++, I can see the line breaks are cause by [CR][LF].
So I thought this would work to remove the line feeds:
$line =~ s/[\r\n]+//g;
but it does not.
You don't give much of a sample of your CSV data, but what you show is perfectly valid. A text field may contain newlines if you wish, as long as it is enclosed in double-quotes
The Text::CSV module will process it quite happily as long as you enable the binary option in the constructor call, and you may reformat the data as you wish before you write it back out again
This program expects the path to the input file as a parameter on the command line, and it will write the modified data to STDOUT, which you can redirect on the command line, like this
$ perl fix_csv.pl input.csv > output.csv
I've assumed that your data contains only 7-bit ASCII data, and it should work whether you're running it on a Windows system or on Linux
use strict;
use warnings 'all';
my ($csv_file) = #ARGV;
use Text::CSV;
open my $fh, '<', $csv_file or die qq{Unable to open "$csv_file" for input: $!};
my $csv = Text::CSV->new( { binary => 1 } );
while ( my $row = $csv->getline( $fh ) ) {
tr/\r\n//d for #$row;
$csv->combine(#$row);
print $csv->string, "\n";
}
output
74,,74,1,1,SJ-TL303202-DET-074-001,PDSI,"2.25"" DIA. X 8.00""",A2,513,1,

How to read the contents of a file

I am confused on how to read the contents of a text file. I'm able to read the files name but can't figure out how to get the contents. By the way the file is encrypted that's why I'm trying to decrypt.
#!/Strawberry/perl/bin/perl
use v5.14;
sub encode_decode {
shift =~ tr/A-Za-z/Z-ZA-Yz-za-y/r;
}
my ($file1) = #ARGV;
open my $fh1, '<', $file1;
while (<$fh1>) {
my $enc = encode_decode($file1);
print my $dec = encode_decode($enc);
# ... do something with $_ from $file1 ...
}
close $fh1;
This line
my $enc = encode_decode($file1)
passes the name of the file to encode_decode
A loop like while ( <$fh1> ) { ... } puts each line from the file into the default variable $_. You've written so yourself in your comment “do something with $_ from $file1 ...”. You probably want
my $enc = encode_decode($_)
And, by the way, your encode_decode subroutine won't reverse its own encoding. You've written what is effectively a ROT25 encoding, so you would have to apply encode_decode 26 times to get back to the original string
It's also worth noting that your shebang line
#!/Strawberry/perl/bin/perl
is pointless on Windows because the command shell doesn't process shebang lines. Perl itself will check the line for options like -w or -i, but you shouldn't be using those anyway. Just omit the line, or if you want to be able to run your program on Linux as well as Windows then use
#!/bin/env perl
which will cause a Linux shell to search the PATH variable for the first perl executable

Change line in textfile using perl

I read other places on how to do this but they were confusing for me.
I want to read lines from a text file and when I come across a certain line I want to append something to it.
My code is:
open my $p, "$username_filename" or die "can not open $username_filename: $!";
foreach $line (<$p>){
if ($line =~ /^listen/){
`echo "whatever" >> $username_file`;
}
}
However when I run this I get this error
sh: -c: line 0: syntax error near unexpected token `newline' sh: -c: line 0: `echo "current_user" >> '
Is this way correct to edit the file and why am I getting this error?
Working with files is not like editing in a word processor. Lines are an illusion, a file is just a big string of characters. You can't change a line in the middle of a file for the same reason you can't change a line in the middle of a book, the words can't be moved around to make room.
Instead, like a book, if you want to change something you need to rewrite the whole thing.
The basic algorithm is to...
Open the file for reading.
Open a temporary file for writing.
Read a line, alter the line, write the line.
Repeat 3 until done reading.
Overwrite the file with the temp file.
Some other notes...
print writes to STDOUT by default, but you can give it a filehandle to write to instead.
foreach my $line (<$fh>) is unfortunately not optimized to read files. It will read the possibly enormous file into memory. while(my $line = <$fh>) reads one line at a time.
I've turned on strict. This forces you to declare your variables. It protects you from typos like the one you made of $username_file vs $username_filename.
You could use something like "$filename.tmp" but File::Temp provides temp files that are guaranteed to be temporary, unique and cleaned up when the program exits.
use strict;
use warnings;
use autodie; # because writing 'or die' gets old fast
use File::Temp; # provides safe temp files
my $filename = ...; # set it somehow
open my $read, "<", $filename;
my $temp = File::Temp->new;
while(my $line = <$read>) {
if( $line =~ /^listen/ ) {
chomp $line; # remove the newline
$line .= " whatever\n"; # add our content and put a newline back
}
# Write the line to the temp file
print $temp $line;
}
# Overwrite our file with the rewritten temp file
rename $temp->filename, $filename;
That's inside a program. If you just want to do it quickly, you can do it on the command line with -i and -p.
perl -i.bak -pe 'if( /^listen/ ) { chomp; $_ .= "whatever" }' filename
-p says to run the code on each line of the file. The line will be put into $_ and whatever is in $_ will be printed. -i says to edit the file in place. -i.bak makes a backup of the original file just in case you make a mistake.
There are a few problems with your attempt. The big one is that using echo >> file will append to the file, not insert at some arbitrary place inside the file.
Another problem is that you're trying to append to a file called $username_file, and you haven't declared or defined that variable.
I don't think perl lets you insert into the middle of a file. I think your best bet would be to read the file a line at a time, and on the correct line(s), append the text you want. Write each line to a new file, then swap the files around at the end.
For example:
#!/usr/bin/perl
my $in_filename = "in.txt";
my $out_filename = "out.txt";
open (my $in, "<", $in_filename) or die;
open (my $out, ">", $out_filename) or die;
while (my $lline = <$in>)
{
chomp $lline;
if ( $lline =~ /listen/ )
{
print "$lline whatever\n";
}
else
{
print "$lline\n";
}
}
close $in;
close $out;
rename $in_filename, "$in_filename.original";
rename $out_filename, $in_filename;
I use chomp to remove line endings, because <$in> gives us a line including its line endings, wish otherwise messes up the append.
As always there are many ways to achieve this. I think using sed is probably a better option for this, but you specifically asked how to do it in perl, so perl it is.

Removing bullet points from a txt file using perl

I am writing a perl script to process a text file. I need to remove bullet points from the text file and create a new one without bullets. When I look at the binary version of the text file, the bullet is stored as a unicode bullet (0xe280a2). How do I remove the bullet from a string.
I have tried the following code:
open($filehandle, '<:encoding(UTF-8)', $filename)
or die "Could not open file '$filename' $!";
while ($row = <$filehandle>)
{
#txt_str = split(/\•/, $row);
$row = join(" ",#txt_str);
}
The backslash doesn't help you here, as the bullet is not a special character in regexes.
If you specify the input is UTF-8, you should search for a UTF-8 bullet. To do so, either prepend
use utf8;
and save your script as UTF-8; or, use
\N{BULLET}
In your case, splitting and joining can be replaced by simple replacement of the bullet by a space:
while (<$filehandle>) {
s/\N{BULLET}/ /g; # or s/•/ /g under utf8
print; # <-- this was missing in your code
}
why not use use a simple s/•/ /g instead of splitting/joining? and you should print the resulted variable ($row in your case) to an other file or stdout, otherwise you won't see the 'unbulleted' version
but for this task i'd use sed from the command line, i'm pretty sure it can handle unicode characters too

Unicode in Perl not working

I have some text files which I am trying to transform with a Perl script on Windows. The text files look normal in Notepad+, but all the regexes in my script were failing to match. Then I noticed that when I open the text files in NotePad+, the status bar says "UCS-2 Little Endia" (sic). I am assuming this corresponds to the encoding UCS-2LE. So I created "readFile" and "writeFile" subs in Perl, like so:
use PerlIO::encoding;
my $enc = ':encoding(UCS-2LE)';
sub readFile {
my ($fName) = #_;
open my $f, "<$enc", $fName or die "can't read $fName\n";
local $/;
my $txt = <$f>;
close $f;
return $txt;
}
sub writeFile {
my ($fName, $txt) = #_;
open my $f, ">$enc", $fName or die "can't write $fName\n";
print $f $txt;
close $f;
}
my $fName = 'someFile.txt';
my $txt = readFile $fName;
# ... transform $txt using s/// ...
writeFile $fName, $txt;
Now the regexes match (although less often than I expect), but the output contains long strings of Asian-looking characters interspersed with longs strings of the correct text. Is my code wrong? Or perhaps Notepad+ is wrong about the encoding? How should I proceed?
OK, I figured it out. The problem was being caused by a disconnect between the encoding translation done by the "encoding..." parameter of the "open" call and the default CRLF translation done by Perl on Windows. What appeared to be happening was that LF was being translated to CRLF on output after the encoding had already been done, which threw off the "parity" of the 16-bit encoding for the following line. Once the next line was reached, the "parity" got put back. That would explain the "long strings of Asian-looking characters interspersed with longs strings of the correct text"... every other line was being messed up.
To correct it, I took out the encoding parameter in my "open" call and added a "binmode" call, as follows:
open my $f, $fName or die "can't read $fName\n";
binmode $f, ':raw:encoding(UCS-2LE)';
binmode apparently has a concept of "layered" I/O handling that is somewhat complicated.
One thing I can't figure out is how to get my CRLF translation back. If I leave out :raw or add :crlf, the "parity" problem returns. I've tried re-ordering as well and can't get it to work.
(I added this as a separate question: CRLF translation with Unicode in Perl)
I don't have the Notepad+ editor to check but it may be a BOM problem with your output encoding not containing a BOM.
http://perldoc.perl.org/Encode/Unicode.html#Size%2c-Endianness%2c-and-BOM
Maybe you need to encode $txt using a byte order mark as described above.