perl parsing inserting new line and ^M - perl

I am trying to modify a few strings in a file using Perl, with the logic below:
open FILE1, "<", "/tmp/sam.dsl";   # read mode
open FILE2, ">", "/tmp/sam2.dsl";  # write mode
while (<FILE1>) {
    if (s/string/found/g) {
        print FILE2 $_;
    }
}
I am able to change the contents, but when I read the file back it has ^M in it.
my datafile is of the below format
name 'SAMPLE'
I would like to change this to
name 'SAMPLE2'
currently with my code it changes to
name 'SAMPLE2
'
which creates a new line and then does the replacement.
Do I need to use another mode to open the file for writing?

My guess is that you are working with a Linux file on Windows. On DOS-compatible systems Perl automatically converts \r\n to \n when reading and \n back to \r\n when writing. To get rid of this behaviour you can use binmode on your filehandles, but that puts the filehandle into "raw binary mode": any other layers (like :utf8 or :encoding(UTF-8)) are no longer enabled, and you might want to set them yourself if you are handling character data. You could also use the PerlIO::eol module from CPAN.
Consider looking at these documentation pages:
PerlIO for a general understanding of how Perl I/O layers work.
open (the pragma, not the function) to set layers for a whole program.
binmode, the function you might want to consider.
My suggestion, but I can't test it (no Windows around), would be to use the following:
open my $infile,  '<:encoding(UTF-8)', '/tmp/sam.dsl'
    or die "error opening input: $!";
open my $outfile, '>:raw:encoding(UTF-8)', '/tmp/sam2.dsl'
    or die "error opening output: $!";
while (<$infile>) {
    s/match/replacement/g;
    print $outfile $_;
}
close $outfile or die "error closing output: $!";
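Another defensive option, independent of which layers are in effect, is to strip the carriage return yourself on each line. A minimal sketch (the sample file, paths, and the string/found pattern are just illustrations taken from the question):

```perl
use strict;
use warnings;

# create a sample input with DOS (CRLF) line endings, so the sketch is self-contained
open my $make, '>:raw', '/tmp/sam.dsl' or die "error creating sample: $!";
print {$make} "name 'string'\r\n";
close $make;

open my $infile,  '<', '/tmp/sam.dsl'  or die "error opening input: $!";
open my $outfile, '>', '/tmp/sam2.dsl' or die "error opening output: $!";
while (my $line = <$infile>) {
    $line =~ s/\r\n\z/\n/;       # drop the CR of a CRLF ending, keep the LF
    $line =~ s/string/found/g;
    print {$outfile} $line;
}
close $outfile or die "error closing output: $!";
```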

Related

Perl log message is unexpectedly written into dest file

Here is a segment of test code for writing data to a file:
open(OUT_FILE, ">", $destfile) || die("can not open file!");
select(OUT_FILE);
binmode(OUT_FILE);
printf "test file name:\t'%s'\n", $destfile;
writebinary(OUT_FILE, pack('H*', $name));
The log message "test file name: datatest.txt" ends up appended to datatest.txt instead of being printed to the screen.
What's wrong?
You have selected your file handle OUT_FILE. select will make output from print and printf go to the selected handle instead of STDOUT, which is selected by default.
Remove the call to select. You don't need it.
Please note your code is very old-fashioned. It could be rewritten as follows to use lexical filehandles and proper error handling:
open my $fh, '>', $destfile or die "Can't open file '$destfile': $!";
binmode $fh;
printf "test file name:\t'%s'\n", $destfile;
writebinary($fh, pack('H*', $name));
Of course you're not telling us what writebinary does. You might need to make changes there. But keep in mind that glob filehandles are global, and other parts of your program might mess with your OUT_FILE.
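A minimal sketch of what select actually does to bare print/printf, and the usual alternative of naming the handle explicitly (the file name is just for illustration):

```perl
use strict;
use warnings;

open my $fh, '>', '/tmp/selected.txt' or die "Can't open file: $!";

select $fh;                       # $fh is now the default output handle
print "this goes to the file\n";  # no handle given, so it lands in the file

select STDOUT;                    # restore the default
print "this goes to the terminal\n";

print {$fh} "explicit handle, no select needed\n";
close $fh or die "Can't close file: $!";
```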

backtick vs native way of doing things in Perl

Consider these 2 snippets :
#!/usr/bin/perl
open(DATA, "<", "input.txt");
while (<DATA>) {
    print $_;
}
and
$abcd = `cat input.txt`;
print $abcd;
Both will print the contents of the file input.txt as output.
Question: Is there any standard as to which one (backticks or the native method) should be preferred over the other in a particular case, or are the two always equivalent?
The reason I am asking is that I find the cat method easier than opening a file the native Perl way, so I am in doubt: if I can achieve something through backticks, should I go with it, or prefer the native way of doing it?
I checked this thread too: What's the difference between Perl's backticks, system, and exec? but it went a different route than my doubt.
Use builtin functions wherever possible:
They are more portable: open works on Windows, while `cat input.txt` will not.
They have less overhead: Using backticks will fork, exec a shell which parses the command, which execs the cat program. This unnecessarily loads two programs. This is in contrast to open which is a builtin Perl function.
They make error handling easier. The open function will return a false value on error, which allows you to take different actions, e.g. like terminating the program with an error message:
open my $fh, "<", "input.txt" or die "Couldn't open input.txt: $!";
They are more flexible. For example, you can add encoding layers if your data isn't Latin-1 text:
open my $fh, "<:utf8", "input.txt" or die "Couldn't open input.txt: $!";
open my $fh, "<:raw", "input.bin" or die "Couldn't open input.bin: $!";
If you want a “just read this file into a scalar” function, look at the File::Slurp module:
use File::Slurp;
my $data = read_file "input.txt";
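If installing a module is not an option, the same slurp can be done in core Perl by locally undefining the input record separator, a standard idiom sketched here (the sample file is created inline so the snippet is self-contained):

```perl
use strict;
use warnings;

# create a small sample input file for the demonstration
open my $make, '>', '/tmp/input.txt' or die "Couldn't create input.txt: $!";
print {$make} "line one\nline two\n";
close $make;

my $data = do {
    local $/;                     # undef $/ : the next read returns the whole file
    open my $fh, '<', '/tmp/input.txt' or die "Couldn't open input.txt: $!";
    <$fh>;
};
print $data;
```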
Using the backtick operators to call cat is highly inefficient, because:
It spawns a separate process (or maybe more than one if a shell is used) which does nothing more than read the file, which Perl could do itself.
You are reading the whole file into memory instead of processing it one line at a time. That is OK for a small file, not so good for a large one.
The backtick method is OK for a quick and dirty script, but I would not use it for anything serious.

Perl: Substitute text string with value from list (text file or scalar context)

I am a Perl novice, but have read "Learning Perl" by Schwartz, foy and Phoenix, and have a weak understanding of the language. I am still struggling, even after using the book and the web.
My goal is to be able to do the following:
Search a specific folder (current folder) and grab filenames with full path. Save filenames with complete path and current foldername.
Open a template file and insert the filenames with full path at a specific location (e.g. using substitution) as well as current foldername (in another location in the same text file, I have not gotten this far yet).
Save the new modified file to a new file in a specific location (current folder).
I have many files/folders that I want to process and plan to copy the Perl program to each of these folders so it can create the new file there.
I have gotten this far:
use strict;
use warnings;
use Cwd;
use File::Spec;
use File::Basename;
my $current_dir = getcwd;
open SECONTROL_TEMPLATE, '<secontrol_template.txt' or die "Can't open SECONTROL_TEMPLATE: $!\n";
my @secontrol_template = <SECONTROL_TEMPLATE>;
close SECONTROL_TEMPLATE;
opendir(DIR, $current_dir) or die $!;
my @seq_files = grep { /gz/ } readdir(DIR);
open FASTQFILENAMES, '> fastqfilenames.txt' or die "Can't open fastqfilenames.txt: $!\n";
my @fastqfiles;
foreach (@seq_files) {
    $_ = File::Spec->catfile($current_dir, $_);
    push(@fastqfiles, $_);
}
print FASTQFILENAMES @fastqfiles;
open (my ($fastqfilenames), "<", "fastqfilenames.txt") or die "Can't open fastqfilenames.txt: $!\n";
my @secontrol;
foreach (@secontrol_template) {
    $_ =~ s/#/$fastqfilenames/eg;
    push(@secontrol, $_);
}
open SECONTROL, '> secontrol.txt' or die "Can't open SECONTROL: $!\n";
print SECONTROL @secontrol;
close SECONTROL;
close FASTQFILENAMES;
My problem is that I cannot figure out how to use my list of files to replace the "#" in my template text file:
my @secontrol;
foreach (@secontrol_template) {
    $_ =~ s/#/$fastqfilenames/eg;
    push(@secontrol, $_);
}
The substitution will not replace the "#" with the list of files in $fastqfilenames. Instead, the "#" gets replaced with GLOB(0x8ab1dc).
Am I doing this the wrong way? Should I not use a substitution because this cannot be done, and instead insert the list of files ($fastqfilenames) into the template.txt file some other way? Instead of $fastqfilenames, can I substitute with the content of a file (e.g. s/A/{r file.txt ...)? Any suggestions?
Cheers,
JamesT
EDIT:
This made it all better.
foreach (@secontrol_template) {
    s/#/$fastqfilenames/g;
    push @secontrol, $_;
}
And, as both answers point out, $fastqfilenames was a filehandle.
I replaced this: open (my ($fastqfilenames), "<", "fastqfilenames.txt") or die "Can't open fastqfilenames.txt: $!\n";
with this:
my $fastqfilenames = join "\n", @fastqfiles;
and that made it all good. Thanks to both of you.
$fastqfilenames is a filehandle. You have to read the information out of the filehandle before you can use it.
However, you have other problems.
You are printing all of the filenames to a file, then reading them back out of the file. This is not only a questionable design (why read from the file again, since you already have what you need in an array?), it also won't even work:
Perl buffers file I/O for performance reasons. The lines you have written to the file may not actually be there yet, because Perl is waiting until it has a large chunk of data saved up, to write it all at once.
You can override this buffering behavior in a few different ways (closing the file handle being the simplest if you are done writing to it), but as I said, there is no reason to reopen the file again and read from it anyway.
Also note, the /e option in a regex replacement evaluates the replacement as Perl code. This is not necessary in your case, so you should remove it.
Solution: Instead of reopening the file and reading it, just use the @fastqfiles array you previously created when replacing in the template. It is not clear exactly what you mean by replacing # with the filenames.
Do you want to replace each # with a list of all filenames together? If so, you will probably need to join the filenames together in some way before doing the replacement.
Do you want to create a separate version of the template file for each filename? If so, you need an inner for loop that goes over each filename for each template line. And you will need something other than a simple replacement, because a plain substitution would change the original string on the first pass. If you are on Perl 5.14 or later, you can use the /r option to substitute non-destructively: push @secontrol, s/#/$file_name/gr; Otherwise, you should copy the line to another variable before doing the replacement.
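A sketch of that second interpretation, producing one filled-in copy of each template line per filename with the non-destructive /r modifier (requires Perl 5.14 or later; the template and filenames below are stand-ins, not the asker's real data):

```perl
use strict;
use warnings;
use 5.014;                        # the /r substitution modifier needs 5.14+

my @secontrol_template = ("name #\n", "unrelated line\n");    # stand-in template
my @fastqfiles = ('/data/a.fastq.gz', '/data/b.fastq.gz');    # stand-in filenames

my @secontrol;
for my $line (@secontrol_template) {
    if ($line =~ /#/) {
        # /r returns the modified copy and leaves $line itself untouched
        push @secontrol, $line =~ s/#/$_/gr for @fastqfiles;
    }
    else {
        push @secontrol, $line;
    }
}
print @secontrol;
```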
$_ =~ s/#/$fastqfilenames/eg;
$fastqfilenames is a file handle, not the file contents.
In any case, I recommend the Text::Template module from CPAN for this kind of work (text-file substitution).

Line breaks don't exist on input from FTP file (Perl)

I downloaded a csv file using Net::FTP. When I look at this file in a text editor or Excel, or even when I cut/paste it, it has line breaks and looks like this:
000000000G911|06
0000000000CDR|25|123
0000000000EGP|19
When I read the file in Perl, it sees the entire text as one line, like this:
000000000G911|060000000000CDR|25|1230000000000EGP|19
I have tried reading it using
tie @lines, 'Tie::File', "C:/Programs/myfile.csv", autochomp => 0 or die "Can't read file: $!\n";
foreach $l (@lines) {
    print "$l\n";
}
and
open FILE, "<$filename" or die $!;
my @lines = <FILE>;
foreach $l (@lines) {
    print "$l\n";
}
close FILE;
The file has line breaks in a format that Perl is not recognizing, because the file comes from a different operating system. The other programs detect the line-break format automatically, but Perl does not.
If you have Net::FTP perform the transfer in ASCII mode (call $ftp->ascii before the get), this should be taken care of and corrected for you.
Alternatively, you can figure out what is being used for line breaks and then set the special $/ variable to that value.
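If re-downloading in ASCII mode is not possible, one robust fallback is to slurp the file and normalize whatever endings it has (CR, CRLF, or LF) yourself. A sketch, using a CR-terminated sample file created inline so it is self-contained:

```perl
use strict;
use warnings;

# create a sample file with CR-only line endings (the kind Perl won't split on $/ = "\n")
open my $make, '>:raw', '/tmp/myfile.csv' or die "Can't write sample: $!";
print {$make} "000000000G911|06\r0000000000CDR|25|123\r0000000000EGP|19\r";
close $make;

open my $fh, '<:raw', '/tmp/myfile.csv' or die "Can't read file: $!";
my $data = do { local $/; <$fh> };    # slurp the whole file
close $fh;

$data =~ s/\r\n?|\n/\n/g;             # CR, CRLF, or LF all become a plain LF
for my $line (split /\n/, $data) {
    print "$line\n";
}
```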

How can I read input from a text file in Perl?

I would like to take input from a text file in Perl. Though a lot of info is available on the net, it is still very confusing how to do the simple task of printing every line of a text file. So how do I do it? I am new to Perl, hence the confusion.
eugene has already shown the proper way. Here is a shorter script:
#!/usr/bin/perl
print while <>
or, equivalently,
#!/usr/bin/perl -p
on the command line:
perl -pe0 textfile.txt
You should start learning the language methodically, following a decent book, not through haphazard searches on the web.
You should also make use of the extensive documentation that comes with Perl.
See perldoc perltoc or perldoc.perl.org.
For example, opening files is covered in perlopentut.
First, open the file:
open my $fh, '<', "filename" or die $!;
Next, use the while loop to read until EOF:
while (<$fh>) {
# each line is automatically stored in the $_ variable
}
close $fh or die $!;
# open the file and associate with a filehandle
open my $file_handle, '<', 'your_filename'
or die "Can't open your_filename: $!\n";
while (<$file_handle>) {
# $_ contains each record from the file in turn
}