Add a line after every string match

I have a sample file here http://pastebin.com/m5m40nGF
What I want to do is add a line after every instance of protein_id.
protein_id always has the same pattern:
TAB-TAB-TAB-protein_id-TAB-gnl|CorradiLab|M715_#SOME_NUMBER
What I need to do is to add this after every line of protein_id:
TAB-TAB-TAB-transcript_id-TAB-gnl|CorradiLab|M715_mRNA_#SOME_NUMBER
The catch is that #SOME_NUMBER has to stay the same.
In the first case, it would look like this:
94 1476 CDS
protein_id gnl|CorradiLab|M715_ECU01_0190
transcript_id gnl|CorradiLab|M715_mRNA_ECU01_0190
product serine hydroxymethyltransferase
label serine hydroxymethyltransferase
Thanks! Adrian
I tried a perl solution, but I get an error.
open(IN, $in); while(<IN>){
print $_;
if ($_ ~= /gnl\|CorradiLab\|/) {
$_ =~ s/tprotein_id/transcript_id/;
print $_;
}
}
Error:
syntax error at test.pl line 3, near "$_ ~"
syntax error at test.pl line 7, near "}"
Execution of test.pl aborted due to compilation errors.

The following Perl script worked:
my $in=shift;
open(IN, $in); while(<IN>){
print $_;
if ($_ =~ /gnl\|CorradiLab\|/) {
my $tmp = $_;
$tmp =~ s/protein_id/transcript_id/;
print $tmp;
}
}

Offering an update on the existing answer, because I feel it can be improved further.
The precise problem in the original post is this line:
if ($_ ~= /gnl\|CorradiLab\|/) {
You have written ~= instead of =~. That is what syntax error at test.pl line 3, near "$_ ~" is trying to tell you.
I would suggest a few improvements to this script:
my $in=shift;
open(IN, $in); while(<IN>){
print $_;
if ($_ =~ /gnl\|CorradiLab\|/) {
my $tmp = $_;
$tmp =~ s/protein_id/transcript_id/;
print $tmp;
}
}
while ( my $tmp = <IN> ) { reads each line straight into a lexical variable, which skips the need to copy $_.
A 3-argument open with a lexical filehandle is preferable, and you should test whether the open worked too. E.g. open ( my $in, "<", $input_filename ) or die $!;
An explicit open may well be unnecessary if you're just reading a filename from the command line. Using <> reads either the files named on the command line (opening and processing each in turn) or STDIN, which makes your script a bit more versatile.
Thus I would rewrite as:
#!/usr/bin/perl
use strict;
use warnings;
while ( my $line = <> ) {
print $line;
if ( $line =~ /gnl\|CorradiLab\|/ ) {
$line =~ s/protein_id/transcript_id/;
print $line;
}
}
Or alternatively:
#!/usr/bin/perl
use strict;
use warnings;
while (<>) {
print;
if (m/gnl\|CorradiLab\|/) {
s/protein_id/transcript_id/;
print;
}
}
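
Note that the scripts above only swap the label, whereas the question also asks for mRNA_ to be inserted after M715_ on the new line. A minimal sketch of one way to do both in a single substitution, assuming the tab layout described in the question; the sub name transcript_line and the sample lines are made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the transcript_id line for a protein_id line,
# or nothing (undef in scalar context) for any other line.
sub transcript_line {
    my ($line) = @_;
    my $copy = $line;
    # s/// returns true only if it matched, so other lines fall through.
    return $copy
        if $copy =~ s/\tprotein_id\tgnl\|CorradiLab\|M715_/\ttranscript_id\tgnl|CorradiLab|M715_mRNA_/;
    return;
}

# Sample lines modelled on the question; real input would come from <>.
my @sample = (
    "94\t1476\tCDS\n",
    "\t\t\tprotein_id\tgnl|CorradiLab|M715_ECU01_0190\n",
    "\t\t\tproduct\tserine hydroxymethyltransferase\n",
);

for my $line (@sample) {
    print $line;
    my $extra = transcript_line($line);
    print $extra if defined $extra;
}
```

Everything after M715_ is left untouched by the substitution, so the number stays the same on both lines.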

Related

How to combine the code so that the output is on the same line?

Can you help me combine these two programs so that the output is displayed in rows with two columns? The first column is for $1 from code 1 and the second column is for $1 from code 2.
Kindly help me to solve this. Thank you :)
This is my code 1.
#!/usr/local/bin/perl
#!/usr/bin/perl
use strict ;
use warnings ;
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);
my $input = "par_disp_fabric.all_max_lowvcc_qor.rpt.gz";
my $output = "par_disp_fabric.all_max_lowvcc_qor.txt";
gunzip $input => $output
or die "gunzip failed: $GunzipError\n";
open (FILE, '<',"$output") or die "Cannot open $output\n";
while (<FILE>) {
my $line = $_;
chomp ($line);
if ($line=~ m/^\s+Timing Path Group \'(\S+)\'/) {
$line = $1;
print ("$1\n");
}
}
close (FILE);
This is my code 2.
my $input = "par_disp_fabric.all_max_lowvcc_qor.rpt.gz";
my $output = "par_disp_fabric.all_max_lowvcc_qor.txt";
gunzip $input => $output
or die "gunzip failed: $GunzipError\n";
open (FILE, '<',"$output") or die "Cannot open $output\n";
while (<FILE>) {
my $line = $_;
chomp ($line);
if ($line=~ m/^\s+Levels of Logic:\s+(\S+)/) {
$line = $1;
print ("$1\n");
}
}
close (FILE);
This is my output for code 1, which contains 26 lines of data:
**async_default**
**clock_gating_default**
Ddia_link_clk
Ddib_link_clk
Ddic_link_clk
Ddid_link_clk
FEEDTHROUGH
INPUTS
Lclk
OUTPUTS
VISA_HIP_visa_tcss_2000
ckpll_npk_npkclk
clstr_fscan_scanclk_pulsegen
clstr_fscan_scanclk_pulsegen_notdiv
clstr_fscan_scanclk_wavegen
idvfreqA
idvfreqB
psf5_primclk
sb_nondet4tclk
sb_nondetl2tclk
sb_nondett2lclk
sbclk_nondet
sbclk_sa_det
stfclk_scan
tap4tclk
tapclk
The output of code 2 has the same number of lines.

paste is useful for this. Assuming your shell is bash, you can use process substitution:
paste <(perl script1.pl) <(perl script2.pl)
That emits columns separated by a tab character. For prettier output, you can pipe the output of paste to column:
paste <(perl script1.pl) <(perl script2.pl) | column -t -s $'\t'
With this, you don't need to try to "merge" your Perl programs.

To combine the two scripts and to output two items of data on the same line, you need to hold on until the end of the file (or until you have both data items) and then output them at once. So you need to combine both loops into one:
#!/usr/bin/perl
use strict ;
use warnings ;
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);
my $input = "par_disp_fabric.all_max_lowvcc_qor.rpt.gz";
my $output = "par_disp_fabric.all_max_lowvcc_qor.txt";
gunzip $input => $output
or die "gunzip failed: $GunzipError\n";
open (FILE, '<',"$output") or die "Cannot open $output\n";
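my( $levels, $timing );
while (<FILE>) {
my $line = $_;
chomp ($line);
if ($line=~ m/^\s+Levels of Logic:\s+(\S+)/) {
$levels = $1;
}
if ($line=~ m/^\s+Timing Path Group \'(\S+)\'/) {
$timing = $1;
}
# Once both items have been seen, print the pair and reset for the next group
if (defined $levels and defined $timing) {
print "$levels, $timing\n";
undef $levels;
undef $timing;
}
}
close (FILE);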

You still haven't given us one vital piece of information: what does the input data look like? Most importantly, are the two pieces of information you're looking for on the same line?
[Update: Looking more closely at your regexes, I see it's not possible for both pieces of information to be on the same line, as they are both supposed to be the first item on the line. It would be helpful if you were clearer about that in your question.]
I think this will do the right thing, no matter what the answer to that question is. I've also added the improvements I suggested in my answer to your previous question:
#!/usr/bin/perl
use strict ;
use warnings ;
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);
my $zipped = "par_disp_fabric.all_max_lowvcc_qor.rpt.gz";
my $unzipped = "par_disp_fabric.all_max_lowvcc_qor.txt";
gunzip $zipped => $unzipped
or die "gunzip failed: $GunzipError\n";
open (my $fh, '<', $unzipped) or die "Cannot open '$unzipped': $!\n";
my ($levels, $timing);
while (<$fh>) {
chomp;
if (m/^\s+Levels of Logic:\s+(\S+)/) {
$levels = $1;
}
if (m/^\s+Timing Path Group \'(\S+)\'/) {
$timing = $1;
}
# If we have both values, then print them out and
# set the variables to 'undef' for the next iteration.
# Test with defined() so that a value of 0 still counts.
if (defined $levels and defined $timing) {
print "$levels, $timing\n";
undef $levels;
undef $timing;
}
}
close ($fh);
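
The question asks for the path group in the first column and the levels value in the second. A minimal sketch of that ordering, tab-separated, run against a made-up report excerpt held in memory (the real layout of the .rpt file is not shown in the question, and paired_rows is a hypothetical name):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: collect "group<TAB>levels" rows from a filehandle.
sub paired_rows {
    my ($fh) = @_;
    my ( $timing, $levels, @rows );
    while ( my $line = <$fh> ) {
        $timing = $1 if $line =~ /^\s+Timing Path Group '(\S+)'/;
        $levels = $1 if $line =~ /^\s+Levels of Logic:\s+(\S+)/;
        # defined() rather than truthiness, so a value of 0 still counts.
        if ( defined $timing and defined $levels ) {
            push @rows, "$timing\t$levels";
            undef $timing;
            undef $levels;
        }
    }
    return @rows;
}

# Made-up excerpt, assumed to resemble the unzipped report.
my $report = <<'END';
  Timing Path Group 'Lclk'
  Levels of Logic:   0
  Timing Path Group 'tapclk'
  Levels of Logic:   12
END

# Perl can open a filehandle on a scalar reference, which avoids a temp file here.
open my $fh, '<', \$report or die "Cannot open in-memory report: $!\n";
print "$_\n" for paired_rows($fh);
close $fh;
```

In the real script the same sub could be handed the filehandle opened on the unzipped file.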

Populate an array by splitting a string

I am trying to convert a string into an array based on space delimiter.
My input file looks like this:
>Reference
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnctcACCATGGTGTCGACTC
TTCTATGGAAACAGCGTGGATGGCGTCTCCAGGCGATCTGACGGTTCACTAAACGAGCTC
Ignoring the line starting with >, the length of rest of the string is 360.
I am trying to convert this into an array.
Here's my code so far:
#!/usr/bin/perl
use strict;
use warnings;
#### To change bases with less than 10X coverage to N #####
#### Take depth file and consensus fasta file as input arguments ####
my ($in2) = @ARGV;
my $args = $#ARGV + 1;
if ( $args != 1 ) {
print "Error!!! Insufficient Number of Arguments\n";
print "Usage: $0 <consensus fasta file> \n";
}
#### Open a filehandle to read in consensus fasta file ####
my $FH2;
my $line;
my @consensus;
my $char;
open($FH2, '<', $in2) || die "Could not open file $in2\n";
while ( <$FH2> ) {
$line = $_;
chomp $line;
next if $line =~ />/; # skip header line
$line =~ s/\s+//g;
my $len = length($line);
print "$len\n";
#print "$line";
@consensus = split(// , $line);
print "$#consensus\n";
#print "@consensus\n";
#for $char (0 .. $#consensus){
# print "$char: $consensus[$char]\n";
# }
}
The problem is that $len is 60 instead of 360, and $#consensus is 59 rather than a value matching the full 360-character string.
I have removed the whitespace from each line with $line =~ s/\s+//g; but it still is not working.

It looks like your code is essentially working. It's just your checking logic that makes no sense. I'd do the following:
use strict;
use warnings;
if (@ARGV != 1) {
print STDERR "Usage: $0 <consensus fasta file>\n";
exit 1;
}
open my $fh, '<', $ARGV[0] or die "$0: cannot open $ARGV[0]: $!\n";
my @consensus;
while (my $line = readline $fh) {
next if $line =~ /^>/;
$line =~ s/\s+//g;
push @consensus, split //, $line;
}
print "N = ", scalar @consensus, "\n";
Main things to note:
Error messages should go to STDERR, not STDOUT.
If an error occurs, the program should exit with an error code, not keep running.
Error messages should include the name of the program and the reason for the error.
chomp is redundant if you're going to remove all whitespace anyway.
As you're processing the input line by line, you can just keep pushing elements to the end of @consensus. At the end of the loop it'll have accumulated all characters across all lines.
Examining @consensus within the loop makes little sense as it hasn't finished building yet. Only after the loop do we have all characters we're interested in.
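
To see the accumulation in isolation, here is a minimal sketch that reads a made-up two-line record from an in-memory filehandle instead of a file. The sub name read_bases and the sample data are assumptions for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return every base from a FASTA filehandle as a list.
sub read_bases {
    my ($fh) = @_;
    my @bases;
    while ( my $line = <$fh> ) {
        next if $line =~ /^>/;           # skip header lines
        $line =~ s/\s+//g;               # drop the newline and any stray whitespace
        push @bases, split //, $line;    # append this line's characters
    }
    return @bases;
}

# Made-up record, much shorter than the 360-base file in the question.
my $fasta = ">Reference\nnnnnACGT\nctcACC\n";
open my $fh, '<', \$fasta or die "Cannot open in-memory FASTA: $!\n";
my @consensus = read_bases($fh);
close $fh;

print "N = ", scalar @consensus, "\n";
print "last index = $#consensus\n";
```

With this sample the count is 14 and the last index is 13: $#consensus is always one less than the number of elements, so it would never equal the string length even once the whole file has been read.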

Perl: printing original file with changes

I wrote this code and it mostly works. It should find every line in which there is no 'SID' and add a pipe | at the beginning of that line. As written, though, it only outputs the lines that were changed and now have a pipe. What I actually want is to leave the rest of the file as it is and just add the pipe to the lines that match. Thank you.
#!/usr/bin/perl
use strict;
use warnings;
use autodie;
my $fh;
open $fh, '<', 'file1.csv';
my $out = 'file2.csv';
open(FILE, '>', $out);
my $myline = "";
while (my $line = <$fh>) {
chomp $line;
unless ($line =~ m/^SID/) {
$line =~ m/^(.*)$/;
$myline = "\|$1";
}
print FILE $myline . "\n";
}
close $fh;
close FILE;
my file example:
SID,bla
foo bar <- my code adds the pipe to the beginning of this line
output should be like this:
SID,bla
| foo bar
but in my case I only print $myline, I know:
| foo bar

The line
$line =~ m/^(.*)$/
is misguided: all it does is put the contents of $line into $1, so the following statement
$myline = "\|$1"
may as well be
$myline = "|$line"
(The pipe | doesn't need escaping unless it is part of a regular expression.)
Since you are printing $myline at the end of your loop you are never seeing the contents of unmodified lines.
You can fix that by printing $line or $myline according to which one contains the required output, like this
while (my $line = <$fh>) {
chomp $line;
if ($line =~ m/^SID/) {
print "$line\n";
}
else {
my $myline = "|$line";
print "$myline\n";
}
}
or, much more simply, by dropping the intermediate variable and using the default $_ for the input lines, like this
while (<$fh>) {
print '|' unless /^SID/;
print;
}
Note that I have also removed the chomp as it just means you have to put the newline back on the end of the string when you print it.

Instead of creating a new variable $myline, use the one you already have:
while (my $line =<$fh>) {
$line = '|' . $line if $line !~ /^SID/;
print FILE $line;
}
Also, you can use a lexical filehandle for the output file as well. Moreover, you should check the return value of open:
open my $OUT, '>', $out or die $!;
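
Putting those two suggestions together, a minimal sketch with lexical handles for both input and output. It uses made-up data in an in-memory buffer rather than file1.csv and file2.csv, and mark_line is a hypothetical name:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: prefix a pipe unless the line starts with SID.
sub mark_line {
    my ($line) = @_;
    return $line =~ /^SID/ ? $line : "|$line";
}

# Made-up input held in memory; the question reads file1.csv instead.
my $csv = "SID,bla\nfoo bar\nSID,baz\n";
open my $in, '<', \$csv or die "Cannot open in-memory input: $!\n";

# Lexical output handle, with the result of open checked.
my $buffer = '';
open my $out, '>', \$buffer or die "Cannot open in-memory output: $!\n";

while ( my $line = <$in> ) {
    print {$out} mark_line($line);    # every line is written, changed or not
}
close $in;
close $out;

print $buffer;
```

Because the line is never chomped, the original newline travels with it and nothing has to be added back when printing.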

What is this compilation error in perl? I do not see a syntax problem

#!/usr/bin/perl
use strict;
use warnings;
open(TEST, "leet.txt") or die "Can't open leet.txt: $!\n";
while(my $line = <TEST>) {
if($line =~ tr/34/ea/)
print <<EOF;
$line
EOF
}
It produces this:
syntax error at ./practice.pl line 11, near ")
print"
Execution of ./practice.pl aborted due to compilation errors.

You have to enclose the if commands in a { } block, even when it has just one command. Unlike other languages, in Perl, this is not optional.
if($line =~ tr/34/ea/) {
print <<EOF;
$line
EOF
}

I suppose you just skipped pasting two "}" at the end. Then add "{":
if($line =~ tr/34/ea/) {

To conditionally run a single statement place the conditional after the statement. For example, these two statements have the same behavior:
if ($bar) { print "foo!\n"; }
print "foo!\n" if ($bar);
In the case of your code you could write it like this:
#!/usr/bin/perl
use strict;
use warnings;
open(TEST, "leet.txt") or die "Can't open leet.txt: $!\n";
while(my $line = <TEST>) {
print <<EOF if ($line =~ tr/34/ea/);
$line
EOF
}
close TEST;
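
The condition itself works because tr returns the number of characters it replaced, so it is false for lines containing neither 3 nor 4. A minimal sketch of that behaviour together with the statement-modifier form, using made-up input strings and a hypothetical sub name deleet:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: translate leet digits and report how many were changed.
sub deleet {
    my ($text) = @_;
    my $count = ( $text =~ tr/34/ea/ );    # tr returns the number of characters replaced
    return ( $text, $count );
}

for my $line ( "l33t\n", "plain\n", "h4x\n" ) {
    my ( $fixed, $count ) = deleet($line);
    print $fixed if $count;    # statement-modifier form: no braces required
}
```

Only the first and third strings are printed, since the second contains nothing for tr to replace.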

How can I avoid warnings in Perl?

I have a small piece of code for printing the contents in a text file like this,
use strict;
use warnings;
open (FILE, "2.txt") || die "$!\n";
my $var = <FILE>;
while ($var ne "")
{
print "$var";
$var = <FILE>;
}
Text file is,
line 1
line 2
line 3
After running the code I am getting a warning like this,
line 1
line 2
line 3
Use of uninitialized value $var in string ne at del.pl line 10, <FILE> line 3.
How can I overcome this warning?

The common idiom for reading from a file is this:
open my $fh, '<', $file or die $!;
while (defined(my $line = <$fh>)) {
print $line;
}
Although the while loop implicitly tests for whether the result of the assignment is defined, it's better to do the test explicitly for clarity.

I always use:
while(<FILE>) {
print $_;
}
No such problems...

The quickest fix is probably to replace
while ($var ne "")
with
while (defined $var)
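
The warning appears because reading past the last line returns undef rather than an empty string, and comparing undef with ne is what triggers it. A minimal sketch of the question's loop with that one change, reading from an in-memory copy of the text (an assumption made so the example needs no 2.txt; slurp_lines is a hypothetical name):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read all lines, stopping when readline returns undef.
sub slurp_lines {
    my ($fh) = @_;
    my @lines;
    my $var = <$fh>;
    while ( defined $var ) {    # undef marks end of file; no string comparison involved
        push @lines, $var;
        $var = <$fh>;
    }
    return @lines;
}

my $text = "line 1\nline 2\nline 3\n";
open my $fh, '<', \$text or die "Cannot open in-memory text: $!\n";
print for slurp_lines($fh);
close $fh;
```

Testing with defined also keeps the loop going for a final line that is just 0 with no newline, which a plain truth test on $var would treat as the end of input.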
