Remove mysterious line breaks in CSV file using Perl - perl

I have a CSV file that I'm parsing using Perl. The file is a BOM produced by Solidworks 2015 that was saved as an XLS file, then opened in Excel and saved as a CSV file.
There are cells that have line breaks. When I read a line with such a cell from the file, the line comes in with the line breaks. For example, one of the lines read in looks like this:
74,,74,1,1,"SJ-TL303202-DET-074-
001",PDSI,"2.25"" DIA. X 8.00""",A2,513,1,
It reads in as a single line in Perl.
When I turn on Show All Characters in Notepad++, I can see the line breaks are caused by [CR][LF].
So I thought this would work to remove the line feeds:
$line =~ s/[\r\n]+//g;
but it does not.

You don't give much of a sample of your CSV data, but what you show is perfectly valid. A text field may contain newlines if you wish, as long as it is enclosed in double-quotes
The Text::CSV module will process it quite happily as long as you enable the binary option in the constructor call, and you may reformat the data as you wish before you write it back out again
This program expects the path to the input file as a parameter on the command line, and it will write the modified data to STDOUT, which you can redirect on the command line, like this
$ perl fix_csv.pl input.csv > output.csv
I've assumed that your data contains only 7-bit ASCII data, and it should work whether you're running it on a Windows system or on Linux
use strict;
use warnings 'all';

use Text::CSV;

my ($csv_file) = @ARGV;

open my $fh, '<', $csv_file or die qq{Unable to open "$csv_file" for input: $!};

my $csv = Text::CSV->new( { binary => 1 } );

while ( my $row = $csv->getline( $fh ) ) {
    tr/\r\n//d for @$row;    # strip embedded CR and LF from every field
    $csv->combine(@$row);
    print $csv->string, "\n";
}
output
74,,74,1,1,SJ-TL303202-DET-074-001,PDSI,"2.25"" DIA. X 8.00""",A2,513,1,
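If you would rather have the script write the output file itself instead of redirecting STDOUT, Text::CSV can also print rows straight to an output handle. Here is a minimal sketch along the same lines; the output filename fixed.csv is just an assumption for illustration.
use strict;
use warnings 'all';

use Text::CSV;

my ($csv_file) = @ARGV;

open my $in,  '<', $csv_file   or die qq{Unable to open "$csv_file" for input: $!};
open my $out, '>', 'fixed.csv' or die qq{Unable to open "fixed.csv" for output: $!};   # output name is an assumption

# eol => "\n" tells Text::CSV to terminate each output record itself
my $csv = Text::CSV->new( { binary => 1, eol => "\n" } );

while ( my $row = $csv->getline( $in ) ) {
    tr/\r\n//d for @$row;        # strip embedded line breaks from every field
    $csv->print( $out, $row );   # write the repaired record to the output file
}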

Related

How to read the contents of a file

I am confused about how to read the contents of a text file. I'm able to read the file's name but can't figure out how to get the contents. By the way, the file is encrypted; that's why I'm trying to decrypt it.
#!/Strawberry/perl/bin/perl
use v5.14;

sub encode_decode {
    shift =~ tr/A-Za-z/Z-ZA-Yz-za-y/r;
}

my ($file1) = @ARGV;

open my $fh1, '<', $file1;

while (<$fh1>) {
    my $enc = encode_decode($file1);
    print my $dec = encode_decode($enc);
    # ... do something with $_ from $file1 ...
}
close $fh1;
This line
my $enc = encode_decode($file1)
passes the name of the file to encode_decode
A loop like while ( <$fh1> ) { ... } puts each line from the file into the default variable $_. You've written so yourself in your comment “do something with $_ from $file1 ...”. You probably want
my $enc = encode_decode($_)
And, by the way, your encode_decode subroutine won't reverse its own encoding. You've written what is effectively a ROT25 encoding, so you would have to apply encode_decode 26 times to get back to the original string
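If the goal is a letter substitution that undoes itself, a ROT13-style tr is one common choice, because shifting by 13 twice brings every letter back to where it started. A minimal sketch follows; note this is not the cipher from the question, just an illustration of a self-inverse mapping.
use strict;
use warnings;
use v5.14;   # the /r modifier on tr needs Perl 5.14 or later

# ROT13: applying it twice returns the original string
sub rot13 {
    return shift =~ tr/A-Za-z/N-ZA-Mn-za-m/r;
}

my $enc = rot13('Hello');   # 'Uryyb'
my $dec = rot13($enc);      # 'Hello' again
print "$enc $dec\n";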
It's also worth noting that your shebang line
#!/Strawberry/perl/bin/perl
is pointless on Windows because the command shell doesn't process shebang lines. Perl itself will check the line for options like -w or -i, but you shouldn't be using those anyway. Just omit the line, or if you want to be able to run your program on Linux as well as Windows then use
#!/usr/bin/env perl
which will cause a Linux shell to search the PATH variable for the first perl executable

Change line in textfile using perl

I have read how to do this in other places, but they were confusing to me.
I want to read lines from a text file and when I come across a certain line I want to append something to it.
My code is:
open my $p, "$username_filename" or die "can not open $username_filename: $!";
foreach $line (<$p>){
    if ($line =~ /^listen/){
        `echo "whatever" >> $username_file`;
    }
}
However, when I run this I get this error:
sh: -c: line 0: syntax error near unexpected token `newline' sh: -c: line 0: `echo "current_user" >> '
Is this the correct way to edit the file, and why am I getting this error?
Working with files is not like editing in a word processor. Lines are an illusion; a file is just a big string of characters. You can't change a line in the middle of a file for the same reason you can't change a line in the middle of a book: the words can't be moved around to make room.
Instead, like a book, if you want to change something you need to rewrite the whole thing.
The basic algorithm is to...
1. Open the file for reading.
2. Open a temporary file for writing.
3. Read a line, alter the line, write the line.
4. Repeat step 3 until done reading.
5. Overwrite the file with the temp file.
Some other notes...
print writes to STDOUT by default, but you can give it a filehandle to write to instead.
foreach my $line (<$fh>) is unfortunately not optimized to read files. It will read the possibly enormous file into memory. while(my $line = <$fh>) reads one line at a time.
I've turned on strict. This forces you to declare your variables. It protects you from typos like the one you made of $username_file vs $username_filename.
You could use something like "$filename.tmp" but File::Temp provides temp files that are guaranteed to be temporary, unique and cleaned up when the program exits.
use strict;
use warnings;
use autodie;    # because writing 'or die' gets old fast
use File::Temp; # provides safe temp files

my $filename = ...; # set it somehow

open my $read, "<", $filename;
my $temp = File::Temp->new;

while (my $line = <$read>) {
    if ( $line =~ /^listen/ ) {
        chomp $line;            # remove the newline
        $line .= " whatever\n"; # add our content and put a newline back
    }

    # Write the line to the temp file
    print $temp $line;
}

# Overwrite our file with the rewritten temp file
rename $temp->filename, $filename;
That's inside a program. If you just want to do it quickly, you can do it on the command line with -i and -p.
perl -i.bak -pe 'if( /^listen/ ) { chomp; $_ .= " whatever\n" }' filename
-p says to run the code on each line of the file. The line will be put into $_ and whatever is in $_ will be printed. -i says to edit the file in place. -i.bak makes a backup of the original file just in case you make a mistake.
There are a few problems with your attempt. The big one is that using echo >> file will append to the file, not insert at some arbitrary place inside the file.
Another problem is that you're trying to append to a file called $username_file, and you haven't declared or defined that variable.
I don't think perl lets you insert into the middle of a file. I think your best bet would be to read the file a line at a time, and on the correct line(s), append the text you want. Write each line to a new file, then swap the files around at the end.
For example:
#!/usr/bin/perl
my $in_filename = "in.txt";
my $out_filename = "out.txt";

open (my $in, "<", $in_filename) or die;
open (my $out, ">", $out_filename) or die;

while (my $lline = <$in>)
{
    chomp $lline;
    if ( $lline =~ /listen/ )
    {
        print $out "$lline whatever\n";
    }
    else
    {
        print $out "$lline\n";
    }
}

close $in;
close $out;

rename $in_filename, "$in_filename.original";
rename $out_filename, $in_filename;
I use chomp to remove the line ending, because <$in> gives us a line including its line ending, which otherwise messes up the append.
As always there are many ways to achieve this. I think using sed is probably a better option for this, but you specifically asked how to do it in perl, so perl it is.

Why doesn't chomp() work in this case?

I'm trying to use chomp() to remove all the newline characters from a file. Here's the code:
use strict;
use warnings;
open (INPUT, 'input.txt') or die "Couldn't open file, $!";
my @emails = <INPUT>;
close INPUT;

chomp(@emails);

my $test;
foreach (@emails)
{
    $test = $test.$_;
}
print $test;
and the test content of the input.txt file is simple:
hello.com
hello2.com
hello3.com
hello4.com
My expected output is something like this: hello.comhello2.comhello3.comhello4.com
However, I'm still getting the same content as the input file. Any help, please?
Thank you
If the input file was generated on a different platform (one that uses a different EOL sequence), chomp might not strip off all the newline characters. For example, if you created the text file in Windows (which uses \r\n) and ran the script on Mac or Linux, only the \n would get chomp()ed and the output would still "look" like it had newlines.
If you know what the EOL sequence of the input is, you can set $/ before chomp(). Otherwise, you may need to do something like
my @emails = map { s/[\n\r]+$//g; $_ } <INPUT>;
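For example, if you know the input file uses Windows CRLF line endings, a sketch of the $/ approach could look like this (the CRLF assumption is the only change from the original code):
use strict;
use warnings;

open (INPUT, 'input.txt') or die "Couldn't open file, $!";

my @emails;
{
    local $/ = "\r\n";   # assume Windows line endings
    @emails = <INPUT>;
    chomp @emails;       # chomp now removes the whole CRLF sequence
}
close INPUT;

print join '', @emails;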

Saving Data that's Been Run Through ActivePerl

This must be a basic question, but I can't find a satisfactory answer to it. I have a script here that is meant to convert CSV formatted data to TSV. I've never used Perl before now and I need to know how to save the data that is printed after the Perl script runs it through.
Script below:
#!/usr/bin/perl
use warnings;
use strict;

my $filename = 'data.csv';

open FILE, $filename or die "can't open $filename: $!";

while (<FILE>) {
    s/"//g;
    s/,/\t/g;
    s/Begin\.Time\.\.s\./Begin Time (s)/;
    s/End\.Time\.\.s\./End Time (s)/;
    s/Low\.Freq\.\.Hz\./Low Freq (Hz)/;
    s/High\.Freq\.\.Hz\./High Freq (Hz)/;
    s/Begin\.File/Begin File/;
    s/File\.Offset\.\.s\./File Offset (s)/;
    s/Random.Number/Random Number/;
    s/Random.Percent/Random Percent/;
    print;
}
All the data that's been analyzed is in the cmd prompt. How do I save this data?
edit:
thank you everyone! It worked perfectly!
From your cmd prompt:
perl yourscript.pl > C:\result.txt
Here you run the perl script and redirect the output to a file called result.txt
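Alternatively, you can have the script open an output file and print to that handle instead of the screen. A minimal sketch of that change, where the output filename data.tsv is an assumption:
#!/usr/bin/perl
use warnings;
use strict;

my $filename     = 'data.csv';
my $out_filename = 'data.tsv';   # output name is an assumption

open FILE, $filename or die "can't open $filename: $!";
open my $out, '>', $out_filename or die "can't open $out_filename: $!";

while (<FILE>) {
    s/"//g;
    s/,/\t/g;
    # ... the rest of your s/.../.../ lines here ...
    print $out $_;   # write to the file instead of the cmd prompt
}

close $out;
close FILE;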
It's always potentially dangerous to treat all commas in a CSV file as field separators. CSV files can also include commas embedded within the data. Here's an example.
1,"Some data","Some more data"
2,"Another record","A field with, an embedded comma"
In your code, the line s/,/\t/g treats all commas the same, so the comma embedded in the final field will also be converted to a tab. That's probably not what you want.
Here's some code that uses Text::ParseWords to do this correctly.
#!/usr/bin/perl
use strict;
use warnings;

use Text::ParseWords;

while (<>) {
    my @line = parse_line(',', 0, $_);
    $_ = join "\t", @line;

    # All your s/.../.../ lines here

    print;
}
If you run this, you'll see that the comma in the final field doesn't get updated.
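Since the script reads from <>, you can run it the same way and redirect the output; the script and file names below are only placeholders:
perl csv2tsv.pl data.csv > data.tsv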

Line breaks don't exist on input from FTP file (Perl)

I downloaded a csv file using Net::FTP. When I look at this file in a text editor or Excel, or even when I cut and paste it, it has line breaks and looks like this:
000000000G911|06
0000000000CDR|25|123
0000000000EGP|19
When I read the file in Perl it sees the entire text as one line like this:
000000000G911|060000000000CDR|25|1230000000000EGP|19
I have tried reading it using
tie @lines, 'Tie::File', "C:/Programs/myfile.csv", autochomp => 0 or die "Can't read file: $!\n";
foreach $l (@lines) {
    print "$l\n";
}
and
open FILE, "<$filename" or die $!;
my @lines = <FILE>;
foreach $l (@lines) {
    print "$l\n";
}
close FILE;
The file has line breaks in a format that Perl is not recognizing because it is coming from a different operating system. The other programs are automatically detecting the different line break format, but Perl doesn't do that.
If you have Net::FTP perform the transfer in ASCII mode (e.g. $ftp->ascii to enable this mode), this should be taken care of and corrected for you.
Alternatively, you can figure out what is being used for line breaks and then set the special $/ variable to that value.
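A minimal sketch of the ASCII-mode transfer is shown below; the host, credentials, and file names are placeholders for whatever your script already uses.
use strict;
use warnings;
use Net::FTP;

my $ftp = Net::FTP->new('ftp.example.com') or die "Cannot connect: $@";   # host is a placeholder
$ftp->login('user', 'password') or die "Cannot login: ", $ftp->message;   # credentials are placeholders

$ftp->ascii;    # transfer in ASCII mode so line endings are converted for your platform
$ftp->get('myfile.csv', 'C:/Programs/myfile.csv') or die "Cannot get file: ", $ftp->message;
$ftp->quit;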