Perl chomp plus =~s yielding very strange results

I am writing a Perl program that creates a procmailrc text file.
The output procmail requires looks like this
(partial IP addresses separated by "|"):
\(\[54\.245\.|\(\[54\.252\.|\(\[60\.177\.|
Here is my Perl script:
open (DEAD, "$dead");
@dead = <DEAD>;
foreach $line (@dead) {
    chomp $line;
    $line =~ s/\./\\./g;
    print FILE "\\(\\[$line\|\n";
}
close (DEAD);
Here is the output I am getting:
|(\[54\.245\.
|(\[54\.252\.
|(\[60\.177\.
Why is chomp not removing the line breaks?
Stranger still, why is the '|' appearing at the front of each line, rather than at the end?
If I replace $line with a word (print FILE "\\(\\[test\|"; }),
The output looks the way I would expect it to look:
\(\[test|\(\[test|\(\[test|\(\[test|
What am I missing here?
So I found a work-around that does not use chomp:
The answer was here: https://www.perlmonks.org/?node_id=504626
My new code:
open (DEAD, "$dead");
@dead = <DEAD>;
foreach $line (@dead) {
    $line =~ s/\r[\n]*//gm;
    $line =~ s/\./\\./g;
    $line =~ s/\:/\\:/g;
    print FILE "\\(\\[$line \|";
}
close (DEAD);
Thanks to those of you who gave me some hints.

chomp removes a trailing copy of the value of $/ (the input record separator), which is "\n" by default.
Your input, however, has Windows (CRLF) line endings, so you have
@dead = (
    "54.245.\r\n",
    "54.252.\r\n",
    "60.177.\r\n",
);
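The difference is easy to see in a small sketch using the CRLF data above (the variable names are only for illustration). chomp strips the "\n" but leaves the "\r". This also explains the leading "|": when a string ending in "\r" is followed by "|", the carriage return moves the cursor back to the start of the line, so the pipe overwrites the first character when the file is displayed.

```perl
use strict;
use warnings;

# Sample lines as read from a file with Windows (CRLF) line endings.
my @dead = ("54.245.\r\n", "54.252.\r\n", "60.177.\r\n");

# chomp only removes the trailing "\n" ($/), so a "\r" is left behind.
my @chomped = @dead;
chomp @chomped;

# Stripping all trailing whitespace removes both "\r" and "\n".
my @stripped = map { (my $s = $_) =~ s/\s+\z//; $s } @dead;

for my $i (0 .. $#dead) {
    printf "chomp leaves CR: %s, stripped: %s\n",
        ($chomped[$i] =~ /\r\z/ ? 'yes' : 'no'), $stripped[$i];
}
```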
Here is a simple fix:
open(my $dead_fh, '<', $dead_qfn)
    or die("Can't open \"$dead_qfn\": $!\n");
while (my $line = <$dead_fh>) {
    $line =~ s/\s+\z//;    # removes "\r\n" (and any other trailing whitespace)
    $line =~ s/\./\\./g;
    print($out_fh "\\(\\[$line\|");
}
print($out_fh "\n");
(The alternative is to add the :crlf layer to your input handle.)
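A minimal sketch of the :crlf alternative, assuming the input really uses CRLF line endings. With the layer on the input handle, Perl translates each "\r\n" into "\n" while reading, so a plain chomp is enough again. The demo writes its own temporary input file (made-up contents matching the question) so that it is self-contained.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a sample input file with CRLF line endings (demo data only).
my ($tmp_fh, $dead_qfn) = tempfile();
binmode $tmp_fh;                       # write the bytes exactly as given
print $tmp_fh "54.245.\r\n54.252.\r\n60.177.\r\n";
close $tmp_fh;

# The :crlf layer turns each "\r\n" into "\n" while reading.
open(my $dead_fh, '<:crlf', $dead_qfn)
    or die("Can't open \"$dead_qfn\": $!\n");

my $pattern = '';
while (my $line = <$dead_fh>) {
    chomp $line;                       # now removes the whole line ending
    $line =~ s/\./\\./g;               # escape the dots for procmail
    $pattern .= "\\(\\[$line|";
}
close $dead_fh;
unlink $dead_qfn;

print "$pattern\n";
```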
Using quotemeta makes more sense:
open(my $dead_fh, '<', $dead_qfn)
    or die("Can't open \"$dead_qfn\": $!\n");
while (my $line = <$dead_fh>) {
    $line =~ s/\s+\z//;
    print($out_fh quotemeta("([$line") . "|");    # keep the | unescaped
}
print($out_fh "\n");
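The quotemeta approach can also be wrapped in a small helper that takes the raw lines (with any line ending) and returns the complete procmail pattern. procmail_pattern is a hypothetical name used only for this sketch. The trailing | is appended outside quotemeta, because quotemeta would otherwise escape it to \| and turn the alternation into a literal pipe.

```perl
use strict;
use warnings;

# Hypothetical helper: build "\(\[a\.b\.|\(\[c\.d\.|..." from raw input lines.
sub procmail_pattern {
    my @lines = @_;
    my $pattern = '';
    for my $line (@lines) {
        (my $clean = $line) =~ s/\s+\z//;    # drop "\n", "\r\n" or trailing blanks
        next if $clean eq '';                # skip blank lines
        # quotemeta escapes "(", "[" and "."; the "|" is added unescaped.
        $pattern .= quotemeta("([$clean") . '|';
    }
    return $pattern;
}

print procmail_pattern("54.245.\r\n", "54.252.\n", "60.177."), "\n";
```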

Related

how can I replace lines from 2 different files using regex in Perl?

I have a first file (Delta_spike_sorted.fasta) with this format:
>lcl|KJ584357.1
AAAAA
>lcl|JQ065046.1
GGGGG
and a second file (Delta_final.fasta) with this format:
>KJ584357.1 Porcine coronavirus HKU15 strain KY4813, complete genome
TTTTTT
>JQ065046.1 Magpie-robin coronavirus HKU18 strain HKU18-chu3, complete genome
CCCCCC
I'm trying to write a script that replaces the >lcl... header lines of the first file with the equivalent titles from the second file, matching them by their IDs (the part after lcl|). The final output should look something like this:
>Porcine coronavirus HKU15 strain KY4813
AAAAA
>Magpie-robin coronavirus HKU18 strain HKU18-chu3
GGGGG
Now that I look at it again, using hashes might be the most suitable option. (Sorry for the many mistakes; it's my first post here, and I'm a beginner at programming.)
#!/usr/bin/perl -w
open (FIN, "< coronavirus_complete/complete_final/Delta_final.fasta") or die "unable to open FIN \n";
open (FH, "< coronavirus_cds/Spikes/Spikes_complete/sorted/Delta_spike_sorted.fasta") or die "unable to open FH \n";
while ($line=<FH>){
    if ($line =~ /^>/){
        chomp($line);
        $acc=substr($line,5,9);
        #print "$acc\n";
    }
    while ($string=<FIN>){
        if ($string =~ /^>/){
            chomp ($string);
            $gen=substr($string,12);
            #print "$gen\n";
        }
        if ($acc =~ /\Q$string/){
            $line =~ s/$line/$gen/g;
            print "$acc\n";
        }
    }
}
Here is an example that reads the definitions from the second file into a hash first. That way you avoid rereading that file for each line of the first file:
use feature qw(say);
use strict;
use warnings;
{ # <-- scope to keep the lexical variables from 'leaking' into the subs below
    my $map = read_fin("Delta_final.fasta");
    my $fn = 'Delta_spike_sorted.fasta';
    open (my $fh, '<', $fn) or die "unable to open file '$fn': $! \n";
    while (my $line=<$fh>) {
        chomp $line;
        if ($line =~ /^>/){
            my $acc = substr $line,5,10;
            if (exists $map->{$acc}) {
                say ">$map->{$acc}";
                next;
            }
        }
        say $line;
    }
    close $fh;
}
sub read_fin {
    my ($fn) = @_;
    my %map;
    open ( my $fh, '<', $fn ) or die "Could not open file '$fn': $!";
    while( my $line = <$fh> ) {
        chomp $line;
        if ( $line =~ /^>((?:\S){10})\s+(\S.*)$/ ) {
            $map{$1} = $2;
        }
    }
    close $fh;
    return \%map;
}
Output:
>Porcine coronavirus HKU15 strain KY4813, complete genome
AAAAA
>Magpie-robin coronavirus HKU18 strain HKU18-chu3, complete genome
GGGGG
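The output above still carries the ", complete genome" suffix that the question wanted removed. A minimal sketch that strips it, assuming every title either ends with exactly that suffix or has none, and that the accession is everything after ">lcl|" up to the first whitespace (instead of a fixed substr offset). parse_titles and rename_headers are hypothetical helper names, and the demo uses in-memory lines rather than the real files.

```perl
use strict;
use warnings;

# Hypothetical helper: map accession => title, dropping ", complete genome".
sub parse_titles {
    my @lines = @_;
    my %title_for;
    for my $line (@lines) {
        next unless $line =~ /^>(\S+)\s+(.+?)\s*$/;
        my ($acc, $title) = ($1, $2);
        $title =~ s/,\s*complete genome\z//;
        $title_for{$acc} = $title;
    }
    return \%title_for;
}

# Hypothetical helper: replace ">lcl|ACC" headers by ">title" when ACC is known.
sub rename_headers {
    my ($title_for, @lines) = @_;
    my @out;
    for my $line (@lines) {
        if ($line =~ /^>lcl\|(\S+)/ && exists $title_for->{$1}) {
            push @out, ">$title_for->{$1}";
        }
        else {
            push @out, $line;    # sequence lines and unknown headers pass through
        }
    }
    return @out;
}

my $titles = parse_titles(
    '>KJ584357.1 Porcine coronavirus HKU15 strain KY4813, complete genome',
    'TTTTTT',
);
print "$_\n" for rename_headers($titles, '>lcl|KJ584357.1', 'AAAAA');
```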

Nested if statements: Swapping headers and sequences in fasta files

I am opening a directory and processing each file. A sample file looks like this when opened:
>AAAAA
TTTTTTTTTTTAAAAATTTTTTTTTT
>BBBBB
TTTTTTTTTTTTTTTTTTBBBBBTTT
>CCCCC
TTTTTTTTTTTTTTTTCCCCCTTTTT
For the above sample file, I am trying to make it look like this:
>TAAAAAT
AAAAA
>TBBBBBT
BBBBB
>TCCCCCT
CCCCC
I need to find the "header" in the sequence on the next line, take one flanking base on either side of the match, and then swap header and sequence. I want to print each file's contents to a separate output file.
Here is my code so far. It runs without errors but doesn't generate any output. My guess is that this is related to the nested if statements; I have never worked with those before.
#!/usr/bin/perl
use strict;
use warnings;
my ($directory) = @ARGV;
my $dir = "$directory";
my @ArrayofFiles = glob "$dir/*";
my $count = 0;
open(OUT, ">", "/path/to/output_$count.txt") or die $!;
foreach my $file(@ArrayofFiles){
    open(my $fastas, $file) or die $!;
    while (my $line = <$fastas>){
        $count++;
        if ($line =~ m/(^>)([a-z]{5})/i){
            my $header = $2;
            if ($line !~ /^>/){
                my $sequence .= $line;
                if ($sequence =~ m/(([a-z]{1})($header)([a-z]{1}))/i){
                    my $matchplusflanks = $1;
                    print OUT ">", $matchplusflanks, "\n", $header, "\n";
                }
            }
        }
    }
}
How can I fix this code? Thanks.
Try this:
foreach my $file (@ArrayofFiles)
{
    open my $fh, "<", $file or die "error opening $file: $!\n";
    while (my $head = <$fh>)
    {
        chomp $head;
        $head =~ s/>//;
        my $next_line = <$fh>;
        my ($extract) = $next_line =~ m/(.$head.)/;
        print ">$extract\n$head\n";
    }
}
There are several mistakes in your code, but the main problem is this:
if ($line =~ m/(^>)([a-z]{5})/i) {
    my $header = $2;
    if ($line !~ /^>/) {
        # here you write to the output file
Because the same line can't both start and not start with >, the second if condition is always false and its block is never executed, so your output files are never written.
The statements open(OUT, ">", "/path/to/output_$count.txt") or die $!; and $count++ are misplaced. Since you want to produce one output file (with a new name) for each input file, they belong in the foreach block, not outside it or inside the while loop.
Example:
#!/usr/bin/perl
use strict;
use warnings;
my ($dir) = @ARGV;
my @files = glob "$dir/*";
my $count;
my $format = ">%s\n%s\n";
foreach my $file (@files) {
    open my $fhi, '<', $file
        or die "Can't open file '$file': $!";
    $count++;
    my $output_path = "/path/to/output_$count.txt";
    open my $fho, '>', $output_path
        or die "Can't open file '$output_path': $!";
    my ($header, $seq);
    while(<$fhi>) {
        chomp;
        if (/^>([a-z]{5})/i) {
            # a new header: write out the previous record first
            if ($seq) { printf $fho $format, $seq =~ /([a-z]${header}[a-z])/i, $header; }
            ($header, $seq) = ($1, '');
        } else { $seq .= $_; }
    }
    # write out the last record of the file
    if ($seq) { printf $fho $format, $seq =~ /([a-z]${header}[a-z])/i, $header; }
    # the handles are lexical to this loop, so close them here
    close $fhi;
    close $fho;
}
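To test the swapping logic separately from the file handling, the core step can be pulled into subs. This is a sketch under the same assumptions as the example above (header lines hold letters only, exactly one flanking base on each side); swap_record and swap_fasta are hypothetical names, and a record whose sequence does not contain its header is skipped instead of being printed with an undefined value.

```perl
use strict;
use warnings;

# Hypothetical helper: return (flanked match, header) for one record, or an
# empty list when the header plus one base on each side is not in the sequence.
sub swap_record {
    my ($header, $seq) = @_;
    my ($flanked) = $seq =~ /([a-z]\Q$header\E[a-z])/i;
    return defined $flanked ? ($flanked, $header) : ();
}

# Hypothetical helper: walk FASTA lines, join multi-line sequences,
# and collect the swapped records as a flat list.
sub swap_fasta {
    my @lines = @_;
    my (@out, $header, $seq);
    for my $line (@lines, ">") {          # sentinel ">" flushes the last record
        chomp(my $l = $line);
        if ($l =~ /^>([a-z]*)/i) {
            my $new_header = $1;          # save before any other match runs
            push @out, swap_record($header, $seq) if defined $header;
            ($header, $seq) = ($new_header, '');
        }
        else {
            $seq .= $l;
        }
    }
    return @out;
}

my @swapped = swap_fasta(">AAAAA\n", "TTTTTTTTTTTAAAAATTTTTTTTTT\n",
                         ">BBBBB\n", "TTTTTTTTTTTTTTTTTTBBBBBTTT\n");
for (my $i = 0; $i < @swapped; $i += 2) {
    print ">$swapped[$i]\n$swapped[$i + 1]\n";
}
```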

Want to add random string to identifier line in fasta file

I want to add a random string to the existing identifier line in a FASTA file,
so that I get:
MMETSP0259|AmphidiniumcarteCMP1314aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Then the sequence follows on the next lines as normal. I think my problem is in the output format. This is what I get:
MMETSP0259|AmphidiniumCMP1314aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
CTTCATCGCACATGGATAACTGTGTACCTGACTaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab
TCTGGGAAAGGTTGCTATCATGAGTCATAGAATaaaaaaaaaaaaaaaaaaaaaaaaaaaaaac
It's added to every line (I shortened the lines to fit here). I only want to add it to the identifier line.
This is what I have so far:
use strict;
use warnings;
my $currentId = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa";
my $header_line;
my $seq;
my $uniqueID;
open (my $fh,"$ARGV[0]") or die "Failed to open file: $!\n";
open (my $out_fh, ">$ARGV[0]_longer_ID_MMETSP.fasta");
while( <$fh> ){
    if ($_ =~ m/^(\S+)\s+(.*)/) {
        $header_line = $1;
        $seq = $2;
        $uniqueID = $currentId++;
        print $out_fh "$header_line$uniqueID\n$seq";
    } # if
} # while
close $fh;
close $out_fh;
Thanks very much, any ideas will be greatly appreciated.
Your program isn't working because the regex ^(\S+)\s+(.*) matches every line in the input file. For instance, \S+ matches CTTCATCGCACATGGATAACTGTGTACCTGACT, the newline at the end of the line matches \s+, and .* matches the empty string that is left.
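This is easy to check: the sketch below runs the question's regex over one identifier line and one sequence line (sample data from the question) and shows that both match, whereas testing for a pipe picks out only the identifier line. Like the answer's code, it assumes that only identifier lines contain a |.

```perl
use strict;
use warnings;

# Sample input: one identifier line and one sequence line, as read from a file.
my @lines = ("MMETSP0259|AmphidiniumCMP1314\n",
             "CTTCATCGCACATGGATAACTGTGTACCTGACT\n");

# The question's regex: \S+ takes the text, the trailing "\n" satisfies \s+,
# and .* matches the empty string, so every line matches.
my @question_matches = grep { /^(\S+)\s+(.*)/ } @lines;

# Testing for the pipe picks out only the identifier line.
my @identifiers = grep { tr/|// } @lines;

printf "question regex matched %d of %d lines\n", scalar @question_matches, scalar @lines;
printf "pipe test matched %d line(s)\n", scalar @identifiers;
```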
Here's how I would code your solution. It simply appends $current_id to the end of any line that contains a pipe (|) character:
use strict;
use warnings;
use 5.010;
use autodie;
my ($filename) = @ARGV;
my $current_id = 'a' x 57;
open my $in_fh, '<', $filename;
open my $out_fh, '>', "${filename}_longer_ID_MMETSP.fasta";
while ( my $line = <$in_fh> ) {
    chomp $line;
    $line .= $current_id if $line =~ tr/|//;
    print $out_fh $line, "\n";
}
close $in_fh;
close $out_fh;
Output:
MMETSP0259|AmphidiniumCMP1314aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
CTTCATCGCACATGGATAACTGTGTACCTGACT
TCTGGGAAAGGTTGCTATCATGAGTCATAGAAT
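The code above appends the same string to every identifier. If each identifier should instead get a different suffix, as the $currentId++ in the question suggests, Perl's magic string increment ("aaa", "aab", "aac", ...) can generate one per header. A sketch, assuming identifier lines are the ones containing | (as above); add_suffixes is a hypothetical helper, the seed is shortened to three letters, and the second identifier line in the demo is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: append a different suffix to each identifier line
# (a line containing "|"). $suffix++ uses Perl's magic string increment.
sub add_suffixes {
    my ($seed, @lines) = @_;
    my $suffix = $seed;
    my @out;
    for my $line (@lines) {
        chomp(my $copy = $line);
        $copy .= $suffix++ if $copy =~ tr/|//;   # append, then advance the suffix
        push @out, "$copy\n";
    }
    return @out;
}

print add_suffixes('aaa',
    "MMETSP0259|AmphidiniumCMP1314\n",
    "CTTCATCGCACATGGATAACTGTGTACCTGACT\n",
    "MMETSP0260|MadeUpIdentifier\n",       # made-up second identifier
);
```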

Perl: printing original file with changes

I wrote this code and it works: it finds all lines that don't start with 'SID' and adds a pipe | at the beginning of each of them. But the way I wrote it, I can only output the lines that were changed and have a pipe. What I actually want is to leave the file as it is and just add the pipes to the lines that match. Thank you.
#!/usr/bin/perl
use strict;
use warnings;
use autodie;
my $fh;
open $fh, '<', 'file1.csv';
my $out = 'file2.csv';
open(FILE, '>', $out);
my $myline = "";
while (my $line = <$fh>) {
    chomp $line;
    unless ($line =~ m/^SID/) {
        $line =~ m/^(.*)$/;
        $myline = "\|$1";
    }
    print FILE $myline . "\n";
}
close $fh;
close FILE;
My file example:
SID,bla
foo bar <- my code adds the pipe to the beginning of this line
The output should look like this:
SID,bla
| foo bar
but in my case, since I only print $myline, I get:
| foo bar
The line
$line =~ m/^(.*)$/
is misguided: all it does is put the contents of $line into $1, so the following statement
$myline = "\|$1"
may as well be
$myline = "|$line"
(The pipe | doesn't need escaping unless it is part of a regular expression.)
Since you print $myline at the end of your loop, you never see the contents of unmodified lines.
You can fix that by printing $line or $myline, according to which one contains the required output, like this:
while (my $line = <$fh>) {
    chomp $line;
    if ($line =~ m/^SID/) {
        print "$line\n";
    }
    else {
        my $myline = "|$line";
        print "$myline\n";
    }
}
or, much more simply, by dropping the intermediate variable and using the default $_ for the input lines, like this:
while (<$fh>) {
    print '|' unless /^SID/;
    print;
}
Note that I have also removed the chomp as it just means you have to put the newline back on the end of the string when you print it.
Instead of creating a new variable $myline, use the one you already have:
while (my $line = <$fh>) {
    $line = '|' . $line if $line !~ /^SID/;
    print FILE $line;
}
You can also use a lexical filehandle for the output file. Moreover, you should check the return value of open (unless autodie is in effect, as it is in your script):
open my $OUT, '>', $out or die $!;
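Putting these suggestions together, here is a sketch of the whole loop with lexical filehandles for both files. The loop is wrapped in a hypothetical sub prefix_non_sid that takes the input and output handles, so it can be tried on in-memory strings (sample lines from the question); in the real script you would pass the handles opened on file1.csv and file2.csv.

```perl
use strict;
use warnings;

# Hypothetical helper: copy every line from $in to $out,
# prefixing "|" to lines that do not start with SID.
sub prefix_non_sid {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        $line = '|' . $line unless $line =~ /^SID/;
        print {$out} $line;       # the newline is still there, no chomp needed
    }
}

# Demo with in-memory filehandles instead of real files.
my $input  = "SID,bla\nfoo bar\n";
my $output = '';
open my $in,  '<', \$input  or die "Can't read from string: $!";
open my $out, '>', \$output or die "Can't write to string: $!";
prefix_non_sid($in, $out);
close $in;
close $out;
print $output;
```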

open a file and replace a word using perl

I want to open a file and replace a word in it.
My code is below:
open(my $fh, "<", "pcie_7x_v1_7.v") or die "cannot open <pcie_7x_v1_7.v:$!";
while (my $line = <$fh>) {
    if ($line =~ timescale 1 ns) {
        print $line
        $msg = "pattern found \n ";
        print "$msg";
        $line =~ s/`timescale 1ns/`timescale 1ps/;
    }
    else {
        $msg = "pattern not found \n ";
        print "$msg";
    }
}
The file contains the pattern `timescale 1ns/1ps.
My requirement is to replace `timescale 1ns/1ps with `timescale 1ps/1ps.
At present, the else branch is always taken.
Updated code after receiving a comment:
Hi,
Thanks for the quick solution.
I changed the code accordingly, but it still doesn't work.
I have attached the updated code here.
Please let me know if I missed anything.
use strict;
use warnings;
open(my $fh, "<", "pcie_7x_v1_7.v" )
    or die "cannot open <pcie_7x_v1_7.v:$!" ;
open( my $fh2, ">", "cie_7x_v1_7.v2")
    or die "cannot open <pcie_7x_v1_7.v2:$!" ;
while(my $line = <$fh> )
{
    print $line ;
    if ($_ =~ /timescale\s1ns/ )
    {
        $msg = "pattern found \n " ;
        print "$msg" ;
        $_ =~ s/`timescale 1ns/`timescale 1ps/g ;
    }
    else
    {
        $msg = "pattern not found \n " ;
        print "$msg" ;
    }
    print $fh2 $line ;
}
close($fh) ;
close($fh2) ;
Result:
pattern not found
pattern not found
pattern not found
pattern not found
Regards,
Binu
3rd update:
// File : pcie_7x_v1_7.v
// Version : 1.7
//
// Description: 7-series solution wrapper : Endpoint for PCI Express
//
//--------------------------------------------------------------------------------
//`timescale 1ps/1ps
`timescale 1ns/1ps
(* CORE_GENERATION_INFO = "pcie_7x_v1_7,pcie_7x_v1_7,
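You can use a Perl one-liner from the command line; there is no need to write a script.
perl -p -i -e 's/`timescale\s1ns/`timescale 1ps/g' pcie_7x_v1_7.v
(In a Unix shell, keep the code in single quotes as shown: inside double quotes, the shell would treat the backticks as command substitution. In Windows cmd.exe, use double quotes instead.)
However, if you still want to use the script, you are almost there. You just need to fix a couple of errors: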
print $line;  # missing
if ($line =~ /timescale\s1ns/)  # made it a real regex; this should match now
$line =~ s/`timescale 1ns/`timescale 1ps/g;  # added g to replace all occurrences in the line
After the if-else, you must print the line to a file again.
For example, open a new file for writing (let's call it 'pcie_7x_v1_7.v.2') at the beginning of your script:
open(my $fh2, ">", "pcie_7x_v1_7.v.2" ) or die "cannot open >pcie_7x_v1_7.v.2:$!" ;
Then, after the else block, print the line (whether it's changed or not) to the file:
print $fh2 $line;
Don't forget to close the filehandles when you're done:
close($fh);
close($fh2);
EDIT:
Your main problem was that you used $_ for the check, while you had assigned the line to $line. So you printed $line, but then tested if ($_ =~ /timescale/), which can never work.
I copied your script, made a couple of corrections, and formatted it a bit more compactly to fit the site. I also removed the separate match check, as suggested by TLP, and did the substitution directly in the if condition; the result is the same. This works:
use strict;
use warnings;
open(my $fh, "<", "pcie_7x_v1_7.v" )
    or die "cannot open <pcie_7x_v1_7.v:$!" ;
open( my $fh2, ">", "pcie_7x_v1_7.v2")
    or die "cannot open >pcie_7x_v1_7.v2:$!" ;
while(my $line = <$fh> ) {
    print $line;
    if ($line =~ s|`timescale 1ns/1ps|`timescale 1ps/1ps|g) {
        print "pattern found and replaced\n ";
    }
    else {
        print "pattern not found \n ";
    }
    print $fh2 $line ;
}
close($fh);
close($fh2);
# now it's finished, just overwrite the old file with the new file
rename("pcie_7x_v1_7.v2", "pcie_7x_v1_7.v")
    or die "cannot rename pcie_7x_v1_7.v2: $!";
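As an alternative to writing a second file and renaming it yourself, a script can let Perl do the in-place edit, much like the -i switch of the one-liner, by localizing $^I (the backup extension) and @ARGV and reading with <>. A sketch, assuming the `timescale 1ns/1ps to `timescale 1ps/1ps substitution; fix_timescale is a hypothetical helper name, and the demo edits a temporary two-line sample instead of the real pcie_7x_v1_7.v.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: rewrite `timescale 1ns/1ps as `timescale 1ps/1ps
# in $file, keeping a "$file.bak" backup. Returns the number of changed lines.
sub fix_timescale {
    my ($file) = @_;
    my $changed = 0;
    local ($^I, @ARGV) = ('.bak', $file);   # in-place edit, like perl -i.bak
    while (<>) {
        $changed++ if s{`timescale 1ns/1ps}{`timescale 1ps/1ps}g;
        print;                              # goes to the rewritten file
    }
    return $changed;
}

# Demo on a temporary copy of a two-line sample.
my ($fh, $file) = tempfile();
print $fh "//`timescale 1ps/1ps\n`timescale 1ns/1ps\n";
close $fh;

my $n = fix_timescale($file);
print "lines changed: $n\n";
```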