Perl: printing original file with changes - perl

I wrote this code and it works fine, it should find lines in which there's no string like 'SID' and append a pipe | at the beginning of the line, so like this: find all lines in which there's no 'SID' and append a pipe | at the beginning of the line. But how I wrote it, I can just output the lines which were changed and have a pipe. What I actually want: leave the file as it is and just append the pipes to the lines which match. Thank you.
#!usr/bin/perl
use strict;
use warnings;
use autodie;
my $fh;
open $fh, '<', 'file1.csv';
my $out = 'file2.csv';
open(FILE, '>', $out);
my $myline = "";
while (my $line = <$fh>) {
chomp $line;
unless ($line =~ m/^SID/) {
$line =~ m/^(.*)$/;
$myline = "\|$1";
}
print FILE $myline . "\n";
}
close $fh;
close FILE;
my file example:
SID,bla
foo bar <- my code adds the pipe to the beginning of this line
output should be like this:
SID,bla
| foo bar
but in my case I only print $myline, I know:
| foo bar

The line
$line =~ m/^(.*)$/
is misguided: all it does is put the contents of $line into $1, so the following statement
$myline = "\|$1"
may as well be
$myline = "|$line"
(The pipe | doesn't need escaping unless it is part of a regular expression.)
Since you are printing $myline at the end of your loop you are never seeing the contents of unmodified lines.
You can fix that by printing $line or $myline according to which one contains the required output, like this
while (my $line = <$fh>) {
chomp $line;
if ($line =~ m/^SID/) {
print "$line\n";
}
else {
my $myline = "|$line";
print "$myline\n";
}
}
or, much more simply, by dropping the intermediate variable and using the default $_ for the input lines, like this
while (<$fh>) {
print '|' unless /^SID/;
print;
}
Note that I have also removed the chomp as it just means you have to put the newline back on the end of the string when you print it.

Instead of creating a new variable $myline, use the one you already have:
while (my $line =<$fh>) {
$line = '|' . $line if $line !~ /^SID/;
print FILE $line;
}
Also, you can use lexical filehandle for the output file as well. Moreover, you should check the return value of open:
open my $OUT, '>', $out or die $!;

Related

Perl - substring keywords

I have a text file where is lot of lines, I need search in this file keywords and if exist write to log file line where is keywords and line one line below and one above the keyword. Now search or write keyword not function if find write all and I dont known how can I write line below and above. Thanks for some advice.
my $vstup = "C:/Users/Omega/Documents/Kontroly/testkontroly/kontroly20220513_154743.txt";
my $log = "C:/Users/Omega/Documents/Kontroly/testkontroly/kontroly.log";
open( my $default_fh, "<", $vstup ) or die $!;
open( my $main_fh, ">", $log ) or die $!;
my $var = 0;
while ( <$default_fh> ) {
if (/\Volat\b/)
$var = 1;
}
if ( $var )
print $main_fh $_;
}
}
close $default_fh;
close $main_fh;
The approach below use one semaphore variable and a buffer variable to enable the desired behavior.
Notice that the pattern used was replaced by 'A` for simplicity testing.
#!/usr/bin/perl
use strict;
use warnings;
my ($in_fh, $out_fh);
my ($in, $out);
$in = 'input.txt';
$out = 'output.txt';
open($in_fh, "< ", $in) || die $!."\n";
open($out_fh, "> ", $out) || die $!;
my $p_next = 0;
my $p_line;
while (my $line = <$in_fh>) {
# print line after occurrence
print $out_fh $line if ($p_next);
if ($line =~ /A/) {
if (defined($p_line)) {
# print previous line
print $out_fh $p_line;
# once printed undefine variable to avoid printing it again in the next loop
undef($p_line);
}
# Print current line if not already printed as the line follows a pattern
print $out_fh $line if (!$p_next);
# toggle semaphore to print the next line
$p_next = 1;
} else {
# pattern not found.
# if pattern was not detected in both current and previous line.
$p_line = $line if (!$p_next);
$p_next = 0;
}
}
close($in_fh);
close($out_fh);

Perl chomp plus =~s Yielding very strange results

I am writing a perl program which creates a procmailrc text file.
The output procmail requires looks like this:
(partial IP addresses separated by "|")
\(\[54\.245\.|\(\[54\.252\.|\(\[60\.177\.|
Here is my perl script:
open (DEAD, "$dead");
#dead = <DEAD>;
foreach $line (#dead) { chomp $line; $line =~s /\./\\./g;
print FILE "\\(\\[$line\|\n"; }
close (DEAD);
Here is the output I am getting:
|(\[54\.245\.
|(\[54\.252\.
|(\[60\.177\.
Why is chomp not removing the line breaks?
Stranger still, why is the '|' appearing at the front of each line, rather than at the end?
If I replace $line with a word, (print FILE "\(\[test\|"; })
The output looks the way I would expect it to look:
\(\[test|\(\[test|\(\[test|\(\[test|
What am I missing here?
So I found a work-around that does not use chomp:
The answer was here:https://www.perlmonks.org/?node_id=504626
My new code:
open (DEAD, "$dead");
#dead = <DEAD>;
foreach $line (#junk) {
$line =~ s/\r[\n]*//gm; $line =~ s/\./\\./g; $line =~ s/\:/\\:/g;
print FILE "\\(\\[$line \|"; }
close (DEAD);
Thanks to those of you who gave me some hints.
chomp removes the value of $/, which is set to the string produced by "\n" by default.
For input, however, you have
#dead = (
"54.245\.\r\n",
"54.252\.\r\n",
"60.177\.\r\n",
);
This simple fix:
open(my $dead_fh, '<', $dead_qfn)
or die("Can't open \"$dead_qfn: $!\n");
while (my $line = <$dead_fh>) {
$line =~ s/\s+\z//;
$line =~ s/\./\\./g
print($out_fh "\\(\\[$line\|");
}
print($out_fh "\n");
(The alternative is to add the :crlf layer to your input handle.)
Using quotemeta makes more sense
open(my $dead_fh, '<', $dead_qfn)
or die("Can't open \"$dead_qfn: $!\n");
while (my $line = <$dead_fh>) {
$line =~ s/\s+\z//;
print($out_fh quotemeta("([$line|"));
}
print($out_fh "\n");

Want to add random string to identifier line in fasta file

I want to add random string to existing identifier line in fasta file.
So I get:
MMETSP0259|AmphidiniumcarteCMP1314aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Then the sequence on the next lines as normal. I am have problem with i think in the format output. This is what I get:
MMETSP0259|AmphidiniumCMP1314aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
CTTCATCGCACATGGATAACTGTGTACCTGACTaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab
TCTGGGAAAGGTTGCTATCATGAGTCATAGAATaaaaaaaaaaaaaaaaaaaaaaaaaaaaaac
It's added to every line. (I altered length to fit here.) I want just to add to the identifier line.
This is what i have so far:
use strict;
use warnings;
my $currentId = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa";
my $header_line;
my $seq;
my $uniqueID;
open (my $fh,"$ARGV[0]") or die "Failed to open file: $!\n";
open (my $out_fh, ">$ARGV[0]_longer_ID_MMETSP.fasta");
while( <$fh> ){
if ($_ =~ m/^(\S+)\s+(.*)/) {
$header_line = $1;
$seq = $2;
$uniqueID = $currentId++;
print $out_fh "$header_line$uniqueID\n$seq";
} # if
} # while
close $fh;
close $out_fh;
Thanks very much, any ideas will be greatly appreciated.
Your program isn't working because the regex ^(\S+)\s+(.*) matches every line in the input file. For instance, \S+ matches CTTCATCGCACATGGATAACTGTGTACCTGACT; the newline at the end of the line matches \s+; and nothing matches .*.
Here's how I would encode your solution. It simply appends $current_id to the end of any line that contains a pipe | character
use strict;
use warnings;
use 5.010;
use autodie;
my ($filename) = #ARGV;
my $current_id = 'a' x 57;
open my $in_fh, '<', $filename;
open my $out_fh, '>', "${filename}_longer_ID_MMETSP.fasta";
while ( my $line = <$in_fh> ) {
chomp $line;
$line .= $current_id if $line =~ tr/|//;
print $line, "\n";
}
close $out_fh;
output
MMETSP0259|AmphidiniumCMP1314aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
CTTCATCGCACATGGATAACTGTGTACCTGACT
TCTGGGAAAGGTTGCTATCATGAGTCATAGAAT

Ignore lines in a file till match and process lines after that

I am looping over lines in a file and when matched a particular line, i want to process the lines after the current (matched) line. I can do it :-
open my $fh, '<', "abc" or die "Cannot open!!";
while (my $line = <$fh>){
next if($line !~ m/Important Lines below this Line/);
last;
}
while (my $line = <$fh>){
print $line;
}
Is there a better way to do this (code needs to be a part of a bigger perl script) ?
I'd use flip-flop operator:
while(<DATA>) {
next if 1 .. /Important/;
print $_;
}
__DATA__
skip
skip
Important Lines below this Line
keep
keep
output:
keep
keep

How to search and replace using hash with Perl

I'm new to Perl and I'm afraid I am stuck and wanted to ask if someone might be able to help me.
I have a file with two columns (tab separated) of oldname and newname.
I would like to use the oldname as key and newname as value and store it as a hash.
Then I would like to open a different file (gff file) and replace all the oldnames in there with the newnames and write it to another file.
I have given it my best try but am getting a lot of errors.
If you could let me know what I am doing wrong, I would greatly appreciate it.
Here are how the two files look:
oldname newname(SFXXXX) file:
genemark-scaffold00013-abinit-gene-0.18 SF130001
augustus-scaffold00013-abinit-gene-1.24 SF130002
genemark-scaffold00013-abinit-gene-1.65 SF130003
file to search and replace in (an example of one of the lines):
scaffold00013 maker gene 258253 258759 . - . ID=maker-scaffold00013-augustus-gene-2.187;Name=maker-scaffold00013-augustus-gene-2.187;
Here is my attempt:
#!/usr/local/bin/perl
use warnings;
use strict;
my $hashfile = $ARGV[0];
my $gfffile = $ARGV[1];
my %names;
my $oldname;
my $newname;
if (!defined $hashfile) {
die "Usage: $0 hash_file gff_file\n";
}
if (!defined $gfffile) {
die "Usage: $0 hash_file gff_file\n";
}
###save hashfile with two columns, oldname and newname, into a hash with oldname as key and newname as value.
open(HFILE, $hashfile) or die "Cannot open $hashfile\n";
while (my $line = <HFILE>) {
chomp($line);
my ($oldname, $newname) = split /\t/;
$names{$oldname} = $newname;
}
close HFILE;
###open gff file and replace all oldnames with newnames from %names.
open(GFILE, $gfffile) or die "Cannot open $gfffile\n";
while (my $line2 = <GFILE>) {
chomp($line2);
eval "$line2 =~ s/$oldname/$names{oldname}/g";
open(OUT, ">SFrenamed.gff") or die "Cannot open SFrenamed.gff: $!";
print OUT "$line2\n";
close OUT;
}
close GFILE;
Thank you!
Your main problem is that you aren't splitting the $line variable. split /\t/ splits $_ by default, and you haven't put anything in there.
This program builds the hash, and then constructs a regex from all the keys by sorting them in descending order of length and joining them with the | regex alternation operator. The sorting is necessary so that the longest of all possible choices is selected if there are any alternatives.
Every occurrence of the regex is replaced by the corresponding new name in each line of the input file, and the output written to the new file.
use strict;
use warnings;
die "Usage: $0 hash_file gff_file\n" if #ARGV < 2;
my ($hashfile, $gfffile) = #ARGV;
open(my $hfile, '<', $hashfile) or die "Cannot open $hashfile: $!";
my %names;
while (my $line = <$hfile>) {
chomp($line);
my ($oldname, $newname) = split /\t/, $line;
$names{$oldname} = $newname;
}
close $hfile;
my $regex = join '|', sort { length $b <=> length $a } keys %names;
$regex = qr/$regex/;
open(my $gfile, '<', $gfffile) or die "Cannot open $gfffile: $!";
open(my $out, '>', 'SFrenamed.gff') or die "Cannot open SFrenamed.gff: $!";
while (my $line = <$gfile>) {
chomp($line);
$line =~ s/($regex)/$names{$1}/g;
print $out $line, "\n";
}
close $out;
close $gfile;
Why are you using an eval? And $oldname is going to be undefined in the second while loop, because the first while loop you redeclare them in that scope (even if you used the outer scope, it would store the very last value that you processed, which wouldn't be helpful).
Take out the my $oldname and my $newname at the top of your script, it is useless.
Take out the entire eval line. You need to repeat the regex for each thing you want to replace. Try something like:
$line2 =~ s/$_/$names{$_}/g for keys %names;
Also see Borodin's answer. He made one big regex instead of a loop, and caught your lack of the second argument to split.