Line breaks don't exist on input from FTP file (Perl)

I downloaded a CSV file using Net::FTP. When I look at this file in a text editor or Excel, or even when I cut/paste it, it has line breaks and looks like this:
000000000G911|06
0000000000CDR|25|123
0000000000EGP|19
When I read the file in Perl it sees the entire text as one line like this:
000000000G911|060000000000CDR|25|1230000000000EGP|19
I have tried reading it using
tie @lines, 'Tie::File', "C:/Programs/myfile.csv", autochomp => 0 or die "Can't read file: $!\n";
foreach $l (@lines) {
    print "$l\n";
}
and
open FILE, "<$filename" or die $!;
my @lines = <FILE>;
foreach $l (@lines) {
    print "$l\n";
}
close FILE;

The file has line breaks in a format that Perl is not recognizing because it is coming from a different operating system. The other programs are automatically detecting the different line break format, but Perl doesn't do that.
If you have Net::FTP perform the transfer in ASCII mode (e.g. $ftp->ascii to enable this mode), this should be taken care of and corrected for you.
Alternatively, you can figure out what is being used for line breaks and then set the special $/ variable to that value.
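For example, here is a minimal sketch of both approaches; the FTP host, login details and the assumption that the file uses bare carriage returns ("\r") as line endings are placeholders for illustration:

use Net::FTP;

# Fetch the file in ASCII mode so line endings are translated for the local system
my $ftp = Net::FTP->new('ftp.example.com') or die "Can't connect: $@";
$ftp->login('user', 'password') or die "Login failed: ", $ftp->message;
$ftp->ascii or die "Can't switch to ASCII mode: ", $ftp->message;
$ftp->get('myfile.csv', 'C:/Programs/myfile.csv') or die "Get failed: ", $ftp->message;

# Or keep the file as-is and read it with $/ set to the foreign line ending
{
    local $/ = "\r";    # input record separator, only for this block
    open my $fh, '<', 'C:/Programs/myfile.csv' or die "Can't read file: $!";
    while (my $line = <$fh>) {
        chomp $line;
        print "$line\n";
    }
}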


Remove mysterious line breaks in CSV file using Perl

I have a CSV file that I'm parsing using Perl. The file is a BOM produced by Solidworks 2015 that was saved as an XLS file, then opened in Excel and saved as a CSV file.
There are cells that have line breaks. When I read a line with such a cell from the file, the line comes in with the line breaks. For example, here is what one of the lines looks like when read in:
74,,74,1,1,"SJ-TL303202-DET-074-
001",PDSI,"2.25"" DIA. X 8.00""",A2,513,1,
It reads in as a single line in Perl.
When I turn on Show All Characters in Notepad++, I can see the line breaks are caused by [CR][LF].
So I thought this would work to remove the line feeds:
$line =~ s/[\r\n]+//g;
but it does not.
You don't give much of a sample of your CSV data, but what you show is perfectly valid. A text field may contain newlines if you wish, as long as it is enclosed in double quotes.
The Text::CSV module will process it quite happily as long as you enable the binary option in the constructor call, and you may reformat the data as you wish before you write it back out again.
This program expects the path to the input file as a parameter on the command line, and it will write the modified data to STDOUT, which you can redirect on the command line, like this:
$ perl fix_csv.pl input.csv > output.csv
I've assumed that your data contains only 7-bit ASCII data, and it should work whether you're running it on a Windows system or on Linux
use strict;
use warnings 'all';

use Text::CSV;

my ($csv_file) = @ARGV;

open my $fh, '<', $csv_file or die qq{Unable to open "$csv_file" for input: $!};

my $csv = Text::CSV->new( { binary => 1 } );

while ( my $row = $csv->getline( $fh ) ) {
    tr/\r\n//d for @$row;
    $csv->combine(@$row);
    print $csv->string, "\n";
}
output
74,,74,1,1,SJ-TL303202-DET-074-001,PDSI,"2.25"" DIA. X 8.00""",A2,513,1,

Change line in textfile using perl

I read other places on how to do this but they were confusing for me.
I want to read lines from a text file and when I come across a certain line I want to append something to it.
My code is:
open my $p, "$username_filename" or die "can not open $username_filename: $!";
foreach $line (<$p>){
if ($line =~ /^listen/){
`echo "whatever" >> $username_file`;
}
}
However when I run this I get this error
sh: -c: line 0: syntax error near unexpected token `newline' sh: -c: line 0: `echo "current_user" >> '
Is this the correct way to edit the file, and why am I getting this error?
Working with files is not like editing in a word processor. Lines are an illusion, a file is just a big string of characters. You can't change a line in the middle of a file for the same reason you can't change a line in the middle of a book, the words can't be moved around to make room.
Instead, like a book, if you want to change something you need to rewrite the whole thing.
The basic algorithm is to:
1. Open the file for reading.
2. Open a temporary file for writing.
3. Read a line, alter the line, write the line.
4. Repeat step 3 until done reading.
5. Overwrite the original file with the temp file.
Some other notes...
print writes to STDOUT by default, but you can give it a filehandle to write to instead.
foreach my $line (<$fh>) is unfortunately not optimized to read files. It will read the possibly enormous file into memory. while(my $line = <$fh>) reads one line at a time.
I've turned on strict. This forces you to declare your variables. It protects you from typos like the one you made of $username_file vs $username_filename.
You could use something like "$filename.tmp" but File::Temp provides temp files that are guaranteed to be temporary, unique and cleaned up when the program exits.
use strict;
use warnings;
use autodie; # because writing 'or die' gets old fast
use File::Temp; # provides safe temp files
my $filename = ...; # set it somehow
open my $read, "<", $filename;
my $temp = File::Temp->new;
while (my $line = <$read>) {
    if ( $line =~ /^listen/ ) {
        chomp $line;            # remove the newline
        $line .= " whatever\n"; # add our content and put a newline back
    }

    # Write the line to the temp file
    print $temp $line;
}
# Overwrite our file with the rewritten temp file
rename $temp->filename, $filename;
That's inside a program. If you just want to do it quickly, you can do it on the command line with -i and -p.
perl -i.bak -pe 'if( /^listen/ ) { chomp; $_ .= "whatever" }' filename
-p says to run the code on each line of the file. The line will be put into $_ and whatever is in $_ will be printed. -i says to edit the file in place. -i.bak makes a backup of the original file just in case you make a mistake.
There are a few problems with your attempt. The big one is that using echo >> file will append to the file, not insert at some arbitrary place inside the file.
Another problem is that you're trying to append to a file called $username_file, and you haven't declared or defined that variable.
I don't think perl lets you insert into the middle of a file. I think your best bet would be to read the file a line at a time, and on the correct line(s), append the text you want. Write each line to a new file, then swap the files around at the end.
For example:
#!/usr/bin/perl
my $in_filename = "in.txt";
my $out_filename = "out.txt";
open (my $in, "<", $in_filename) or die;
open (my $out, ">", $out_filename) or die;
while (my $lline = <$in>)
{
    chomp $lline;
    if ( $lline =~ /listen/ )
    {
        print $out "$lline whatever\n";
    }
    else
    {
        print $out "$lline\n";
    }
}
close $in;
close $out;
rename $in_filename, "$in_filename.original";
rename $out_filename, $in_filename;
I use chomp to remove line endings, because <$in> gives us a line including its line ending, which would otherwise mess up the append.
As always there are many ways to achieve this. I think using sed is probably a better option for this, but you specifically asked how to do it in perl, so perl it is.

perl parsing inserting new line and ^M

I am trying to modify a few strings in a file using Perl, with the logic below:
open FILE1, "< /tmp/sam.dsl" //In read mode
open FILE2, "> /tmp/sam2.dsl" // Open in write mode
while(<FILE1>)
if($_=s/string/found/g)
push FILE2, $_...
I am able to change the contents; however, when I read the file it has ^M in it.
My data file is of the below format:
name 'SAMPLE'
I would like to change this to:
name 'SAMPLE2'
Currently, with my code, it changes to:
name 'SAMPLE2
'
which creates a new line and then does the replacement.
Do I need to use any other mode to open the file for writing?
My guess is that you are working with a Linux file on some Windows machine. On DOS-compatible machines Perl automatically converts \r\n into \n when reading and \n into \r\n when writing. To get rid of this behaviour you can use binmode on your filehandles, but that puts the filehandle into "raw binary mode"; other layers (like :utf8 or :encoding(UTF-8)) are then not enabled, and you might want to set them yourself if you are handling character data. You could also use the PerlIO::eol module from CPAN.
Consider looking at these documentation pages:
PerlIO for a general understanding of how Perl I/O works.
open the pragma (not the function) to set layers for one program.
binmode the function you might want to consider.
My suggestion, but I can't test it (no Windows around), would be to use the following:
open my $infile,  '<:encoding(UTF-8)', '/tmp/sam.dsl'  or die "error opening input: $!";
open my $outfile, '>:encoding(UTF-8)', '/tmp/sam2.dsl' or die "error opening output: $!";
# push :raw to drop any :crlf layer, then restore the encoding layer
binmode $outfile, ':raw:encoding(UTF-8)' or die "error setting output layers: $!";
while (<$infile>) {
    s/string/found/g;
    print $outfile $_;
}

Perl: Substitute text string with value from list (text file or scalar context)

I am a Perl novice, but have read "Learning Perl" by Schwartz, foy and Phoenix, and have a weak understanding of the language. I am still struggling, even after using the book and the web.
My goal is to be able to do the following:
1. Search a specific folder (current folder) and grab filenames with full path. Save filenames with complete path and current foldername.
2. Open a template file and insert the filenames with full path at a specific location (e.g. using substitution) as well as the current foldername (in another location in the same text file; I have not gotten this far yet).
3. Save the new modified file to a new file in a specific location (current folder).
I have many files/folders that I want to process, and I plan to copy the Perl program to each of these folders so it can make a new output file in each one.
I have gotten this far:
use strict;
use warnings;
use Cwd;
use File::Spec;
use File::Basename;
my $current_dir = getcwd;
open SECONTROL_TEMPLATE, '<secontrol_template.txt' or die "Can't open SECONTROL_TEMPLATE: $!\n";
my @secontrol_template = <SECONTROL_TEMPLATE>;
close SECONTROL_TEMPLATE;
opendir(DIR, $current_dir) or die $!;
my @seq_files = grep {
    /gz/
} readdir (DIR);
open FASTQFILENAMES, '> fastqfilenames.txt' or die "Can't open fastqfilenames.txt: $!\n";
my @fastqfiles;
foreach (@seq_files) {
    $_ = File::Spec->catfile($current_dir, $_);
    push(@fastqfiles,$_);
}
print FASTQFILENAMES @fastqfiles;
open (my ($fastqfilenames), "<", "fastqfilenames.txt") or die "Can't open fastqfilenames.txt: $!\n";
my @secontrol;
foreach (@secontrol_template) {
    $_ =~ s/#/$fastqfilenames/eg;
    push(@secontrol,$_);
}
open SECONTROL, '> secontrol.txt' or die "Can't open SECONTROL: $!\n";
print SECONTROL @secontrol;
close SECONTROL;
close FASTQFILENAMES;
My problem is that I cannot figure out how to use my list of files to replace the "#" in my template text file:
my @secontrol;
foreach (@secontrol_template) {
    $_ =~ s/#/$fastqfilenames/eg;
    push(@secontrol,$_);
}
The substitute function will not replace the "#" with the list of files listed in $fastqfilenames. I get the "#" replaced with GLOB(0x8ab1dc).
Am I doing this the wrong way? Should I not use a substitution because this cannot be done, and instead insert the list of files ($fastqfilenames) into the template.txt file some other way? Instead of $fastqfilenames, can I substitute with the content of a file (e.g. s/A/{r file.txt ...)? Any suggestions?
Cheers,
JamesT
EDIT:
This made it all better.
foreach (@secontrol_template) {
    s/#/$fastqfilenames/g;
    push @secontrol, $_;
}
And, as both answers pointed out, $fastqfilenames was a filehandle.
replaced this: open (my ($fastqfilenames), "<", "fastqfilenames.txt") or die "Can't open fastqfilenames.txt: $!\n";
with this:
my $fastqfilenames = join "\n", @fastqfiles;
made it all good. Thanks both of you.
$fastqfilenames is a filehandle. You have to read the information out of the filehandle before you can use it.
However, you have other problems.
You are printing all of the filenames to a file, then reading them back out of the file. This is not only a questionable design (why read from the file again, since you already have what you need in an array?), it also won't even work:
Perl buffers file I/O for performance reasons. The lines you have written to the file may not actually be there yet, because Perl is waiting until it has a large chunk of data saved up, to write it all at once.
You can override this buffering behavior in a few different ways (closing the file handle being the simplest if you are done writing to it), but as I said, there is no reason to reopen the file again and read from it anyway.
Also note, the /e option in a regex replacement evaluates the replacement as Perl code. This is not necessary in your case, so you should remove it.
Solution: Instead of reopening the file and reading it, just use the #fastqfiles variable you previously created when replacing in the template. It is not clear exactly what you mean by replacing # with the filenames.
Do you want to replace each # with a list of all the filenames together? If so, you probably need to join the filenames together in some way before doing the replacement.
Do you want to create a separate version of the template for each filename? If so, you need an inner for loop that goes over each filename for each template line. And you will need something other than a simple replacement, because the replacement would change the original string the first time through. If you are on Perl 5.16, you could use the /r option to replace non-destructively: push(@secontrol, s/#/$file_name/gr); Otherwise, you should copy to another variable before doing the replacement.
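As a rough sketch of that second option (assuming Perl 5.16+ for /r, and reusing @fastqfiles and @secontrol_template from the question's code):

# One substituted copy of the template per filename
my @secontrol;
for my $file_name (@fastqfiles) {
    for my $template_line (@secontrol_template) {
        # /r returns the modified copy and leaves the template line untouched
        push @secontrol, $template_line =~ s/#/$file_name/gr;
    }
}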
$_ =~ s/#/$fastqfilenames/eg;
$fastqfilenames is a file handle, not the file contents.
In any case, I recommend the use of Text::Template module in order to do this kind of work (file text substitution).
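For instance, a minimal sketch of that approach (the template file would contain {$fastqfilenames} where the file list should go; the filenames and variables are taken from the question, everything else is illustrative):

use Text::Template;

my $template = Text::Template->new(TYPE => 'FILE', SOURCE => 'secontrol_template.txt')
    or die "Couldn't construct template: $Text::Template::ERROR";

# Pass the joined file list in; the template refers to it as {$fastqfilenames}
my $text = $template->fill_in(HASH => { fastqfilenames => join("\n", @fastqfiles) });
die "Couldn't fill in template: $Text::Template::ERROR" unless defined $text;

open my $out, '>', 'secontrol.txt' or die "Can't open secontrol.txt: $!\n";
print $out $text;
close $out;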

Is there an issue with opening filenames provided on the command line through $_?

I'm having trouble modifying a script that processes files passed as command-line arguments, merely for copying those files, so that it additionally modifies those files. The following Perl script worked just fine for copying files:
use strict;
use warnings;
use File::Copy;
foreach $_ (@ARGV) {
    my $orig = $_;
    (my $copy = $orig) =~ s/\.js$/_extjs4\.js/;
    copy($orig, $copy) or die(qq{failed to copy $orig -> $copy});
}
Now that I have files named "*_extjs4.js", I would like to pass those into a script that similarly takes file names from the command line, and further processes the lines within those files. So far I am able to get a file handle successfully, as the following script and its output show:
use strict;
use warnings;
foreach $_ (@ARGV) {
    print "$_\n";
    open(my $fh, "+>", $_) or die $!;
    print $fh;
    #while (my $line = <$fh>) {
    #    print $line;
    #}
    close $fh;
}
Which outputs (in part):
./filetree_extjs4.js
GLOB(0x1a457de8)
./async_submit_extjs4.js
GLOB(0x1a457de8)
What I really want to do though rather than printing a representation of the file handle, is to work with the contents of the files themselves. A start would be to print the files lines, which I've tried to do with the commented out code above.
But that code has no effect, the files' lines do not get printed. What am I doing wrong? Is there a conflict between the $_ used to process command line arguments, and the one used to process file contents?
It looks like there are a couple of questions here.
What I really want to do though rather than printing a representation of the file handle, is to work with the contents of the files themselves.
The reason why print $fh is returning GLOB(0x1a457de8) is because the scalar $fh is a filehandle and not the contents of the file itself. To access the contents of the file itself, use <$fh>. For example:
while (my $line = <$fh>) {
print $line;
}
# or simply print while <$fh>;
will print the contents of the entire file.
This is documented in perldoc perlop:
If what the angle brackets contain is a simple scalar variable (e.g.,
<$foo>), then that variable contains the name of the filehandle to
input from, or its typeglob, or a reference to the same.
But it has already been tried!
I can see that. Try it after changing the open mode to +<.
According to perldoc perlfaq5:
How come when I open a file read-write it wipes it out?
Because you're using something like this, which truncates the file
then gives you read-write access:
open my $fh, '+>', '/path/name'; # WRONG (almost always)
Whoops. You should instead use this, which will fail if the file
doesn't exist:
open my $fh, '+<', '/path/name'; # open for update
Using ">" always clobbers or creates. Using "<" never does either. The
"+" doesn't change this.
It goes without saying that the or die $! after the open is highly recommended.
But take a step back.
There is a more Perlish way to back up the original file and subsequently manipulate it. In fact, it is doable via the command line itself (!) using the -i flag:
$ perl -p -i._extjs4 -e 's/foo/bar/g' *.js
See perldoc perlrun for more details.
I can't fit my needs into the command-line.
If the manipulation is too much for the command-line to handle, the Tie::File module is worth a try.
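For example, a rough sketch with Tie::File (the filename and substitution are illustrative, borrowed from earlier in this thread):

use strict;
use warnings;
use Tie::File;

# The tied array is a live view of the file: changing an element rewrites that line in place
tie my @lines, 'Tie::File', 'filetree_extjs4.js' or die "Can't tie file: $!";
for my $line (@lines) {
    $line =~ s/foo/bar/g;
}
untie @lines;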
To read the contents of a filehandle you have to call readline or read, or place the filehandle in angle brackets <>.
my $line = readline $fh;
my $actually_read = read $fh, $text, $bytes;
my $line = <$fh>; # similar to readline
To print to a filehandle other than STDOUT, you have to give the filehandle as the first argument to print, followed by what you want to print, without a comma between them.
print $fh 'something';
To prevent someone from accidentally adding a comma, I prefer to put the filehandle in a block.
print {$fh} 'something';
You could also select your new handle.
{
    my $oldfh = select $fh;
    print 'something';
    select $oldfh; # reset it back to the previous handle
}
Also, your mode argument to open causes it to clobber the contents of the file, at which point there is nothing left to read.
Try this instead:
open my $fh, '+<', $_ or die;
I'd like to add something to Zaid's excellent suggestion of using a one-liner.
When you are new to perl, and trying some tricky regexes, it can be nice to use a source file for them, as the command line may get rather crowded. I.e.:
The file:
#!/usr/bin/perl
use warnings;
use strict;
s/complicated/regex/g;
While tweaking the regex, use the source file like so:
perl -p script.pl input.js
perl -p script.pl input.js > testfile
perl -p script.pl input.js | less
Note that you don't use the -i flag here while testing. These commands will not change the input files, only print the changes to stdout.
When you're ready to execute the (permanent!) changes, just add the in-place edit -i flag, and if you wish (recommended), supply an extension for backups, e.g. ".bak".
perl -pi.bak script.pl *.js