modify lines between two tags in perl - perl

I need some help with replaceing lines between two tages in perl. I have a file in which I want to modify lines between two tags:
some lines
some lines
tag1
ABC somelines
NOP
NOP
ABC somelines
NOP
NOP
ABC somelines
tag2
As you can see, I have two tags, tag1 and tag2 and basically, I want to replace all instances of ABC with NOP between tag1 and tag2. Here is the relevant portion of code but it doesn't replace. Can anyone please help..?
my $fh;
my $cur_file = "file_name";
my #lines = ();
open($fh, '<', "$cur_file") or die "Can't open the file for reading $!";
print "Before while\n";
while(<$fh>)
{
print "inside while\n";
my $line = $_;
if($line =~ /^tag1/)
{
print "inside range check\n";
$line = s/ABC/NOP/;
push(#lines, $line);
}
else
{
push(#lines, $line);
}
}
close($fh);
open ($fh, '>', "$cur_file") or die "Can't open file for wrinting\n";
print $fh #lines;
close($fh);

Consider a one-liner using the Flip-Flop operator.
perl -i -pe 's/ABC/NOP/ if /^tag1/ .. /^tag2/' file

Use $INPLACE_EDIT in conjunction with the range operator ..
use strict;
use warnings;
local $^I = '.bak';
local #ARGV = $cur_file;
while (<>) {
if (/^tag1/ .. /^tag2/) {
s/ABC/NOP/;
}
print;
}
unlink "$cur_file$^I"; #delete backup;
For alternative ways to edit a file, check out: perlfaq5

Your line which says $line = s/ABC/NOP/; is incorrect, you need =~ there.
#!/usr/bin/perl
use strict;
use warnings;
my $tag1 = 0;
my $tag2 = 0;
while(my $line = <DATA>){
if ($line =~ /^tag1/){
$tag1 = 1; #Set the flag for tag1
}
if ($line =~ /^tag2/){
$tag2 = 1; #Set the flag for tag2
}
if($tag1 == 1 && $tag2 == 0){
$line =~ s/ABC/NOP/;
}
print $line;
}
Demo

Related

Perl - substring keywords

I have a text file where is lot of lines, I need search in this file keywords and if exist write to log file line where is keywords and line one line below and one above the keyword. Now search or write keyword not function if find write all and I dont known how can I write line below and above. Thanks for some advice.
my $vstup = "C:/Users/Omega/Documents/Kontroly/testkontroly/kontroly20220513_154743.txt";
my $log = "C:/Users/Omega/Documents/Kontroly/testkontroly/kontroly.log";
open( my $default_fh, "<", $vstup ) or die $!;
open( my $main_fh, ">", $log ) or die $!;
my $var = 0;
while ( <$default_fh> ) {
if (/\Volat\b/)
$var = 1;
}
if ( $var )
print $main_fh $_;
}
}
close $default_fh;
close $main_fh;
The approach below use one semaphore variable and a buffer variable to enable the desired behavior.
Notice that the pattern used was replaced by 'A` for simplicity testing.
#!/usr/bin/perl
use strict;
use warnings;
my ($in_fh, $out_fh);
my ($in, $out);
$in = 'input.txt';
$out = 'output.txt';
open($in_fh, "< ", $in) || die $!."\n";
open($out_fh, "> ", $out) || die $!;
my $p_next = 0;
my $p_line;
while (my $line = <$in_fh>) {
# print line after occurrence
print $out_fh $line if ($p_next);
if ($line =~ /A/) {
if (defined($p_line)) {
# print previous line
print $out_fh $p_line;
# once printed undefine variable to avoid printing it again in the next loop
undef($p_line);
}
# Print current line if not already printed as the line follows a pattern
print $out_fh $line if (!$p_next);
# toggle semaphore to print the next line
$p_next = 1;
} else {
# pattern not found.
# if pattern was not detected in both current and previous line.
$p_line = $line if (!$p_next);
$p_next = 0;
}
}
close($in_fh);
close($out_fh);

Nested if statements: Swapping headers and sequences in fasta files

I am opening a directory and processing each file. A sample file looks like this when opened:
>AAAAA
TTTTTTTTTTTAAAAATTTTTTTTTT
>BBBBB
TTTTTTTTTTTTTTTTTTBBBBBTTT
>CCCCC
TTTTTTTTTTTTTTTTCCCCCTTTTT
For the above sample file, I am trying to make them look like this:
>TAAAAAT
AAAAA
>TBBBBBT
BBBBB
>TCCCCCT
CCCCC
I need to find the "header" in next line sequence, take flanks on either side of the match, and then flip them. I want to print each file's worth of contents to another separate file.
Here is my code so far. It runs without errors, but doesn't generate any output. My guess is this is probably related to the nested if statements. I have never worked with those before.
#!/usr/bin/perl
use strict;
use warnings;
my ($directory) = #ARGV;
my $dir = "$directory";
my #ArrayofFiles = glob "$dir/*";
my $count = 0;
open(OUT, ">", "/path/to/output_$count.txt") or die $!;
foreach my $file(#ArrayofFiles){
open(my $fastas, $file) or die $!;
while (my $line = <$fastas>){
$count++;
if ($line =~ m/(^>)([a-z]{5})/i){
my $header = $2;
if ($line !~ /^>/){
my $sequence .= $line;
if ($sequence =~ m/(([a-z]{1})($header)([a-z]{1}))/i){
my $matchplusflanks = $1;
print OUT ">", $matchplusflanks, "\n", $header, "\n";
}
}
}
}
}
How can I fix this code? Thanks.
Try this
foreach my $file(#ArrayofFiles)
{
open my $fh," <", $file or die"error opening $!\n";
while(my $head=<$fh>)
{
chomp $head;
$head=~s/>//;
my $next_line = <$fh>;
my($extract) = $next_line =~m/(.$head.)/;
print ">$extract\n$head\n";
}
}
There are several mistakes in your code but the main problem is:
if ($line =~ m/(^>)([a-z]{5})/i) {
my $header = $2;
if ($line !~ /^>/) {
# here you write to the output file
Because the same line can't start and not start with > at the same time, your output files are never written. The second if statement always fails and its block is never executed.
open(OUT, ">", "/path/to/output_$count.txt") or die $!; and $count++ are misplaced. Since you want to produce an output file (with a new name) for each input file, you need to put them in the foreach block, not outside or in the while loop.
Example:
#!/usr/bin/perl
use strict;
use warnings;
my ($dir) = #ARGV;
my #files = glob "$dir/*";
my $count;
my $format = ">%s\n%s\n";
foreach my $file (#files) {
open my $fhi, '<', $file
or die "Can't open file '$file': $!";
$count++;
my $output_path = "/path/to/output_$count.txt";
open my $fho, '>', $output_path
or die "Can't open file '$output_path': $!";
my ($header, $seq);
while(<$fhi>) {
chomp;
if (/^>([a-z]{5})/i) {
if ($seq) { printf $fho $format, $seq =~ /([a-z]$header[a-z])/i, $header; }
($header, $seq) = ($1, '');
} else { $seq .= $_; }
}
if ($seq) { printf $fho $format, $seq =~ /([a-z]$header[a-z])/i, $header; }
}
close $fhi;
close $fho;

Printing value from split result Perl

Here I have a abc.txt file:
aaa,1000,kevin
bbb,2000,john
ccc,3000,jane
ddd,4000,kevin
Then I want to print out:
kevin
john
jane
my Perl script is:
open (INFILE, $ARGV[1]) or die "An input file is required as argument\n";
#store=();
while(<INFILE>)
{
chomp();
#data=split(/,/);
#
#
#
if (%store ne "0")
{
print "Printing users:\n";
foreach $key (keys %store)
{print $key."\n";}
}
print "Printing users:\n";
foreach $key (keys %store)
{print $key."\n";}
}
My idea is to store the value into hash and create key to each value. How can I do in the ### line?
You have declared #store and then using %store. I didn't understand that why you doing that, but the below code will give you desire output. First read the input file, split the data and then remove the duplicates.
use strict;
use warnings;
my $infile = $ARGV[0];
open my $fh, "<", $infile or die "An input file is required as argument: $!";
my %store;
while(my $line = <$fh>)
{
chomp($line);
my #data = split /,/, $line;
my #removeduplicate = (grep { !$store{$_}++ } #data)[2];
foreach(#removeduplicate){
if( $_ ne ''){
print "$_\n";
}
}
}
close $fh;
Output:
kevin
john
jane
hmmm. it depends what do you want. maybe this example will help you:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper; #for debug if you want
my $infile='abc.txt'; #or ARGV[0] whatever it is
my $fh;
open $fh,'<',$infile or die "problem with $infile $# $!";
my $inputline;
my %Storage;
my #Values;
while (defined($inputline=<$fh>)) {
chomp $inputline;
#Values=split ',',$inputline;
if (#Values != 3) {
warn "$inputline has formatted badly";
next;
}
#warn if exists $Storage{$Values[1]}; #optional warning for detected duplicates
$Storage{$Values[1]}=#Values[0,2]; #create hash data
#duplicates will be removed automaticly
}
close $fh;
print Dumper \%Storage; #print how perl it stores
foreach my $Key (keys %Storage) { #example loop
print #{Storage->{$Key}},"\n"; #do anything
}
I hope this template will be enough for you.

how to count the number of specific characters through each line from file?

I'm trying to count the number of 'N's in a FASTA file which is:
>Header
AGGTTGGNNNTNNGNNTNGN
>Header2
AGNNNNNNNGNNGNNGNNGN
so in the end I want to get the count of number of 'N's and each header is a read so I want to make a histogram so I would at the end output something like this:
# of N's # of Reads
0 300
1 240
etc...
so there are 300 sequences or reads that have 0 number of 'N's
use strict;
use warnings;
my $file = shift;
my $output_file = shift;
my $line;
my $sequence;
my $length;
my $char_N_count = 0;
my #array;
my $count = 0;
if (!defined ($output_file)) {
die "USAGE: Input FASTA file\n";
}
open (IFH, "$file") or die "Cannot open input file$!\n";
open (OFH, ">$output_file") or die "Cannot open output file $!\n";
while($line = <IFH>) {
chomp $line;
next if $line =~ /^>/;
$sequence = $line;
#array = split ('', $sequence);
foreach my $element (#array) {
if ($element eq 'N') {
$char_N_count++;
}
}
print "$char_N_count\n";
}
Try this. I changed a few things like using scalar file handles. There are many ways to do this in Perl, so some people will have other ideas. In this case I used an array which may have gaps in it - another option is to store results in a hash and key by the count.
Edit: Just realised I'm not using $output_file, because I have no idea what you want to do with it :) Just change the 'print' at the end to 'print $out_fh' if your intent is to write to it.
use strict;
use warnings;
my $file = shift;
my $output_file = shift;
if (!defined ($output_file)) {
die "USAGE: $0 <input_file> <output_file>\n";
}
open (my $in_fh, '<', $file) or die "Cannot open input file '$file': $!\n";
open (my $out_fh, '>', $output_file) or die "Cannot open output file '$output_file': $!\n";
my #results = ();
while (my $line = <$in_fh>) {
next if $line =~ /^>/;
my $num_n = ($line =~ tr/N//);
$results[$num_n]++;
}
print "# of N's\t# of Reads\n";
for (my $i = 0; $i < scalar(#results) ; $i++) {
unless (defined($results[$i])) {
$results[$i] = 0;
# another option is to 'next' if you don't want to show the zero totals
}
print "$i\t\t$results[$i]\n";
}
close($in_fh);
close($out_fh);
exit;

Using perl, how do I search a text file for _NN (at the end of a word) and print the word in front?

This gives the whole line:
#!/usr/bin/perl
$file = 'output.txt';
open(txt, $file);
while($line = <txt>) {
print "$line" if $line =~ /_NN/;
}
close(txt);
#!/usr/bin/perl
use strict;
use warnings FATAL => "all";
binmode(STDOUT, ":utf8") || die;
my $file = "output.txt";
open(TEXT, "< :utf8", $file) || die "Can't open $file: $!";
while(<TEXT>) {
print "$1\n" while /(\w+)_NN\b/g;
}
close(TEXT) || die "Can't close $file: $!";
Your answer script reads a bit awkwardly, and has a couple of potential errors. I'd rewrite the main logic loop like so:
foreach my $line (grep { /expend_VB/ } #sentences) {
my #nouns = grep { /_NN/ } split /\s+/, $line;
foreach my $word (#nouns) {
$word =~ s/_NN//;
print "$word\n";
}
print "$line\n" if scalar(#nouns);
}
You need to put the my declaration inside the loop - otherwise it will persist longer than you want it to, and could conceivably cause problems later.
foreach is a more common perl idiom for iterating over a list.
print "$1" if $line =~ /(\S+)_NN/;
#!/usr/bin/perl
use strict;
use warnings FATAL => "all";
my $search_key = "expend"; ## CHANGE "..." to <>
open(my $tag_corpus, '<', "ch13tagged.txt") or die $!;
my #sentences = <$tag_corpus>; # This breaks up each line into list
my #words;
for (my $i=0; $i <= #sentences; $i++) {
if ( defined( $sentences[$i] ) and $sentences[$i] =~ /($search_key)_VB.*/i) {
#words = split /\s/,$sentences[$i]; ## \s is a whitespace
for (my $j=0; $j <= #words; $j++) {
#FILTER if word is noun:
if ( defined( $words[$j] ) and $words[$j] =~ /_NN/) {
#PRINT word and sentence:
print "**",split(/_\S+/,$words[$j]),"**", "\n";
print split(/_\S+/,$sentences[$i]), "\n"
}
} ## put print sentences here to print each sentence after all the nouns inside
}
}
close $tag_corpus || die "Can't close $tag_corpus: $!";