How can I search and replace a match a specific number of times in a string in Perl? - perl

How can I search and replace a match a specific number of times using s///? For example:
$string="abcabdaaa";
I want to replace a with i in $string n times. How can I do that? n is an integer provided by the user.

The simple answer probably doesn't do what you want.
my $str = 'aaaa';
$str =~ s/a/a_/ for 1..2;
print $str, "\n"; # a__aaa. But you want a_a_aa, right?
You need to count the replacements yourself, and act accordingly:
$str = 'aaaa';
my $n = 0;
$str =~ s/(a)/ ++$n > 2 ? $1 : 'a_' /ge;
print $str, "\n";
See the FAQ, How do I change the Nth occurrence of something? for related examples.

Just substitute $n times:
$string =~ s/a/i/ for 1..$n;
This will do it.
A more general solution is a global substitution with a counter, which also works when the replacement itself contains the pattern:
my $i = 0; # count the substitutions made
$string =~ s/(a)/ ++$i > $n ? $1 : "i" /ge;
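The counter idea can be wrapped in a small helper. This is only a sketch: the name replace_first_n is made up for illustration, it assumes the search text should be matched literally (hence \Q...\E), and it assumes a count that is not a non-negative integer should be rejected.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the answers above): replace only the
# first $n occurrences of the literal text $from with $to.
sub replace_first_n {
    my ($string, $from, $to, $n) = @_;
    die "n must be a non-negative integer\n" unless $n =~ /^\d+$/;
    my $seen = 0;    # counts matches as the global substitution walks the string
    $string =~ s/(\Q$from\E)/ ++$seen > $n ? $1 : $to /ge;
    return $string;
}

print replace_first_n('abcabdaaa', 'a', 'i', 3), "\n";   # ibcibdiaa
```

Because every match is visited exactly once, text produced by an earlier replacement is never matched again.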

I'm not aware of any flag that would do that. I'd simply use a loop:
for (my $i = 0; $i < $n; $i++)
{
$string =~ s/a/i/;
}

you can try this:
# a limit of $n + 1 fields means exactly $n separators are consumed
$str1=join('i',split(/a/,$str,$n+1));

Here is a way to do it, based on the comment you made on eugene y's answer:
#!/usr/bin/perl
use strict; use warnings;
my $string = '***ab***c';
my $n = 3;
1 while $n -- and $string =~ s/\*([^\n])/*\n$1/;
print "$string\n";
Output:
*
*
*
ab***c

Using
sub substitute_n {
my $n = shift;
my $pattern = shift;
my $replace = shift;
local $_ = shift;
my $i = 1;
s{($pattern)} {
$i++ <= $n ? eval qq{"$replace"} : $1;
}ge;
$_;
}
You can then write
my $s = "***ab***c";
print "[", substitute_n(2, qr/\*/, '$1\n', $s), "]\n";
to get the following output:
[*
*
*ab***c]

Related

Variable Scope outside foreach loop Perl

Here is the problem:
Generating 10 iterations of 50 iterations and accessing the 50 character string outside the inner foreach loop.
I have tried putting the 50x iteration inside a sub function and calling it, but that was unsuccessful.
Thus far, I only get a single character outside the foreach loop whether it's in a sub function or not. I'm fairly certain this is a scope issue that I'm failing to see.
So, code:
#!/usr/bin/perl
use strict;
use warnings;
my @dna = ('A','G','T','C');
my $i;
my $str;
for ($i=1; $i<11; $i++){
#print $i . " ";
foreach(1..50){
my $nt = int(rand $#dna + 1);
$str = $dna[$nt];
#correct here all 50 nts
print $str;
}
#single nt here
#print $str;
print "\n";
}
Output: Correct, but I need to access $str as shown below, outside the foreach loop but within the first for loop.
TGATTAGCGTCCGCGCGTATTGTATTAAGCCACAGAATGTAATGCCAAGA
GCTATAGGAAGACGCCGATCCCTGGACCGGCACAGGCACGGTAACAGCAG
TTGTTGTAGGATCCCAGGGAGCGAAGCACGTGAACTGCGACTAATTTCAA
TAACCAGGCAACACTAAACAGCTCCCATGTGTAAGGACGTATAGGCAGTT
GTAATTGTAGATCACAAAATTTACACGGTATAGCATTAACTGGAACCTGC
AACAGTGCCGTTTATTAATCTCCTCTAGTGTAGGGACGAATCGACCACGG
CGTGAGCAAGCACAAATATCCTTTAGGGGTGTGCTTAAAACACCCAGTAG
GAGTTCATAGGCCAACAATATGGCAAAGCCTTGCCCCATCAAATTCGGCG
TTGCGTCTGCGAACACTGTTGGTGTGCCTTTAGTGCGGGTTACTCGAGAA
CGCGATCTCCGTTTATAACGCTAGCAAACTACTACGGACCGAGGCATCGC
I removed the extra space in the string. It was superfluous.
This was another attempt at getting to the variable to no avail:
use strict;
use warnings;
my $str;
my @dna = ('A','G','T','C');
for (my $i=1; $i<11; $i++){
fifty();
print $str;
}
sub fifty {
foreach (1 .. 50){
my $nt = int(rand $#dna + 1);
$str = $dna[$nt];
return $str;
}
}
for (my $i=1; $i<11; $i++){
fifty();
In fifty you return something, but you discard it, because you make no assignment such as $str = fifty();
print $str;
}
And here you print something that, as it seems, has no value yet. In fact you do assign a value inside fifty, but you shouldn't use global variables.
sub fifty {
foreach (1 .. 50){
my $nt = int(rand $#dna + 1);
$str = $dna[$nt];
Here you discard whatever is in $str and assign one letter instead. Also you assign to a global variable - which you should avoid.
return $str;
}
}
And here you leave fifty immediately, on the first iteration, and return just the one character, which you (see above) discard.
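Putting these points together, one way to get the whole 50-character string is to append each letter with .= inside the sub, return only after the loop has finished, and assign the result at the call site. A minimal sketch; the name random_dna and its length parameter are illustrative and not from the question:

```perl
use strict;
use warnings;

my @dna = ('A', 'G', 'T', 'C');

# Hypothetical helper: build and return a random sequence of $length letters.
sub random_dna {
    my ($length) = @_;
    my $str = '';                        # lexical, so no global is touched
    for (1 .. $length) {
        $str .= $dna[ int rand @dna ];   # append; a plain "=" would overwrite
    }
    return $str;                         # return after the loop, not inside it
}

for my $i (1 .. 10) {
    my $seq = random_dna(50);            # keep the return value
    print "$seq\n";
}
```

Here $seq holds the full sequence inside the outer loop, so it can be printed, written to a file, or stored in an array.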
I found this to work perfectly. It turned out to be a scope issue as far as I could tell, and I'm not sure why I was stuck. Regardless, moving on now.
#!/usr/bin/perl
use strict;
use warnings;
my @dna = ('A','G','T','C');
my $i;
my $str;
for ($i=1; $i<11; $i++){
my $filename = "seq_" . $i;
open(my $OUT, '>', $filename) or die("Can't open $filename($!)");
foreach(1..50){
my $nt = int(rand $#dna + 1);
$str = $dna[$nt];
print $OUT $str;
}
close $OUT; # close the filehandle, not the filename
}

Is there a better way to count occurrence of char in a string?

I felt there must be a better way to count occurrences than writing a sub, either in Perl or in a Linux shell.
#!/usr/bin/perl -w
use strict;
return 1 unless $0 eq __FILE__;
main() if $0 eq __FILE__;
sub main{
my $str = "ru8xysyyyyyyysss6s5s";
my $char = "y";
my $count = count_occurrence($str, $char);
print "count<$count> of <$char> in <$str>\n";
}
sub count_occurrence{
my ($str, $char) = @_;
my $len = length($str);
$str =~ s/$char//g;
my $len_new = length($str);
my $count = $len - $len_new;
return $count;
}
If the character is constant, the following is best:
my $count = $str =~ tr/y//;
If the character is variable, I'd use the following:
my $count = length( $str =~ s/[^\Q$char\E]//rg );
I'd only use the following if I wanted compatibility with versions of Perl older than 5.14 (as it is slower and uses more memory):
my $count = () = $str =~ /\Q$char/g;
The following uses no extra memory, but might be a bit slow:
my $count = 0;
++$count while $str =~ /\Q$char/g;
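As a quick sanity check, the four approaches can be run side by side on the string from the question. This is just a sketch for comparison; the variable names are made up, and the s///r variant assumes Perl 5.14 or later.

```perl
use strict;
use warnings;

my $str  = 'ru8xysyyyyyyysss6s5s';
my $char = 'y';

my $by_tr    = $str =~ tr/y//;                          # constant character only
my $by_strip = length( $str =~ s/[^\Q$char\E]//rg );    # /r returns a modified copy
my $by_list  = () = $str =~ /\Q$char/g;                 # count of the list of matches
my $by_loop  = 0;
++$by_loop while $str =~ /\Q$char/g;                    # one match per iteration

print "$by_tr $by_strip $by_list $by_loop\n";           # all four numbers agree
```

None of these modify $str, so they can be applied to the same variable in any order.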
Counting the occurrences of a character in a string can be performed with one line in Perl (as compared to your 4 lines). There is no need for a sub (although there is nothing wrong with encapsulating functionality in a sub). From perlfaq4 "How can I count the number of occurrences of a substring within a string?"
use warnings;
use strict;
my $str = "ru8xysyyyyyyysss6s5s";
my $char = "y";
my $count = () = $str =~ /\Q$char/g;
print "count<$count> of <$char> in <$str>\n";
In a beautiful* Bash/Coreutils/Grep one-liner:
$ str=ru8xysyyyyyyysss6s5s
$ char=y
$ fold -w 1 <<< "$str" | grep -c "$char"
8
Or maybe
$ grep -o "$char" <<< "$str" | wc -l
8
The first one works only if the substring is just one character long; the second one works only if the substrings are non-overlapping.
* Not really.
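For comparison, the overlap caveat applies in Perl too: a plain /g match resumes after the end of the previous match, whereas a pattern wrapped in a lookahead also counts overlapping occurrences. A small sketch with made-up sample data:

```perl
use strict;
use warnings;

my $str    = 'aaaa';
my $needle = 'aa';

# Non-overlapping: each match consumes the text it matched.
my $plain = () = $str =~ /\Q$needle\E/g;

# Overlapping: the lookahead matches without consuming anything,
# so the regex engine tries again at every following position.
my $overlapping = () = $str =~ /(?=\Q$needle\E)/g;

print "non-overlapping: $plain, overlapping: $overlapping\n";   # 2 and 3
```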
toolic has given a correct answer, but you might consider not hardcoding your values to make the program reusable.
use strict;
use warnings;
die "Usage: $0 <text> <characters>" if @ARGV < 1;
my $search = shift; # the string you are looking for
my $str; # the input string
if (@ARGV && -e $ARGV[0] || !@ARGV) { # if str is file, or there is no str
local $/; # slurp input
$str = <>; # use diamond operator
} else { # else just use the string
$str = shift;
}
my $count = () = $str =~ /\Q$search\E/gms;
print "Found $count of '$search' in '$str'\n";
This will allow you to use the program to count the occurrences of a character, or a string, inside a string, a file, or standard input. For example:
count.pl needles haystack.txt
some_process | count.pl foo
count.pl x xyzzy

perl replace characters in a string but retain special character or space

I would like to create a program that replaces characters and retains the special characters. An example input and output is shown below.
Here's what I did so far:
$sentence = userinput;
@words = split(/ /, $sentence);
for ($i = 0; $i < @words.length; $i ++){
$words[$i] =~ s/\W//g;
@characters = split(//, $words[$i]);
#print $words[$i] . "\n";
$wordlength = length($words[$i]);
for ($j = 0; $j < @characters.length; $j ++){
$char = $characters[$j];
for ($x = 0; $x < $wordlength; $x++){
$char++;
if ($char eq "aa"){
$char = "a";
}
elsif ($char eq "AA"){
$char = "A";
}
}
print $char;
if ($x = 0){
$output[$i] = $char;
}
else {
$output[$i] = join ($char);
}
}
print $output[$i];
}
Input:
Hi! how are you doing?
Output:
Jk! krz duh brx itnsl?
A couple of things in your code don't make sense:
Missing use strict; use warnings;.
All variables are global (you should be using my to create variables)
@foo.length is not the number of elements in the array @foo. It's the number of elements in the array @foo concatenated with the number of characters in $_ (because arrays in scalar context return their length, . concatenates strings, and length works on $_ by default).
join ($char) always returns the empty string: You're joining an empty list (no elements) using $char as a separator.
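To make the @foo.length point concrete, here is a short sketch (sample values invented for illustration) that contrasts the usual ways of getting an array's size with what the dotted form actually evaluates to:

```perl
use strict;
use warnings;

my @foo = ('a', 'b', 'c');
local $_ = 'xy';                 # length with no argument reads $_

my $count  = scalar @foo;        # 3: an array in scalar context gives its size
my $last   = $#foo;              # 2: index of the last element
my $oddity = @foo . length;      # "32": 3 concatenated with length('xy')

print "$count $last $oddity\n";  # 3 2 32
```

In a loop condition, $i < @foo (or $i <= $#foo) is therefore the form to use.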
Here's an attempt to fix all of these issues:
use strict;
use warnings;
my $sentence = readline;
$sentence =~ s{([A-Za-z]+)}{
my $word = $1;
join '', map {
my $base = ord(/^[A-Z]/ ? 'A' : 'a');
chr((ord($_) - $base + length($word)) % 26 + $base)
} split //, $word
}eg;
print $sentence;
I think what you are doing is ROT3 encoding, but if so then your example output is wrong:
my $sentence = 'Hi! how are you doing?';
$sentence =~ tr/A-Za-z/D-ZA-Cd-za-c/;
print $sentence, "\n";
output
Kl! krz duh brx grlqj?
which is similar, but not identical to
Jk! krz duh brx itnsl?

How do I speed up pattern recognition in perl

This is the program as it stands right now. It takes in a .fasta file (a file containing genetic code), creates a hash table with the data and prints it. However, it is quite slow. It splits a string and compares it against all other letters in the file.
use strict;
use warnings;
use Data::Dumper;
my $total = $#ARGV + 1;
my $row;
my $compare;
my %hash;
my $unique = 0;
open( my $f1, '<:encoding(UTF-8)', $ARGV[0] ) or die "Could not open file '$ARGV[0]' $!\n";
my $discard = <$f1>;
while ( $row = <$f1> ) {
chomp $row;
$compare .= $row;
}
my $size = length($compare);
close $f1;
for ( my $i = 0; $i < $size - 6; $i++ ) {
my $vs = ( substr( $compare, $i, 5 ) );
for ( my $j = 0; $j < $size - 6; $j++ ) {
foreach my $value ( substr( $compare, $j, 5 ) ) {
if ( $value eq $vs ) {
if ( exists $hash{$value} ) {
$hash{$value} += 1;
} else {
$hash{$value} = 1;
}
}
}
}
}
foreach my $val ( values %hash ) {
if ( $val == 1 ) {
$unique++;
}
}
my $OUTFILE;
open $OUTFILE, ">output.txt" or die "Error opening output.txt: $!\n";
print {$OUTFILE} "Number of unique keys: " . $unique . "\n";
print {$OUTFILE} Dumper( \%hash );
close $OUTFILE;
Thanks in advance for any help!
It is not clear from the description what is wanted from this script, but if you're looking for matching sets of 5 characters, you don't actually need to do any string matching: you can just run through the whole sequence and keep a tally of how many times each 5-letter sequence occurs.
use strict;
use warnings;
use Data::Dumper;
my $str; # store the sequence here
my %hash;
# slurp in the whole file
open(IN, '<:encoding(UTF-8)', $ARGV[0]) or die "Could not open file '$ARGV[0]' $!\n";
while (<IN>) {
chomp;
$str .= $_;
}
close(IN);
# not sure if you were deliberately omitting the last two letters of sequence
# this looks at all the sequence
my $l_size = length($str) - 4;
for (my $i = 0; $i < $l_size; $i++) {
$hash{ substr($str, $i, 5) }++;
}
# grep in a scalar context will count the values.
my $unique = grep { $_ == 1 } values %hash;
open OUT, ">output.txt" or die "Error opening output.txt: $!\n";
print OUT "Number of unique keys: ". $unique."\n";
print OUT Dumper(\%hash);
close OUT;
It might help to remove searching for information that you already have.
I don't see that $j depends upon $i so you're actually matching values to themselves.
So you're getting bad counts as well. It works for 1, because 1 is the square of 1.
But if for each five-character string you're counting strings that match, you're going
to get the square of the actual number.
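The squaring effect is easy to see on a tiny input. The following sketch uses a made-up ten-letter sequence and, unlike the question's code, looks at every 5-letter window rather than stopping two short of the end:

```perl
use strict;
use warnings;

my $seq = 'ACGTAACGTA';              # made-up sample; 'ACGTA' occurs twice
my $last_start = length($seq) - 5;   # last index where a 5-letter window fits

my (%nested, %single);
for my $i (0 .. $last_start) {
    my $window = substr $seq, $i, 5;
    $single{$window}++;                      # one pass: the true count
    for my $j (0 .. $last_start) {           # nested pass, as in the question
        $nested{$window}++ if substr($seq, $j, 5) eq $window;
    }
}

print "single pass: $single{ACGTA}, nested loops: $nested{ACGTA}\n";   # 2 and 4
```

Windows that occur once still come out as 1 in both hashes, which is why the "unique" tally happens to be right even though the other counts are not.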
You would actually get better results if you did it this way:
# compute it once.
my $lim = length( $compare ) - 6;
for ( my $i = 0; $i < $lim; $i++ ){
my $vs = substr( $compare, $i, 5 );
# count each unique identity *once*
# if it's in the table, we've already counted it.
next if $hash{ $vs };
$hash{ $vs }++; # we've found it, record it.
for ( my $j = $i + 1; $j < $lim; $j++ ) {
my $value = substr( $compare, $j, 5 );
$hash{ $value }++ if $value eq $vs;
}
}
However, you could improve on this by using index for the second loop
and letting Perl's C-level code do the matching for you.
my $pos = $i;
while ( $pos > -1 ) {
$pos = index( $compare, $vs, ++$pos );
$hash{ $vs }++ if $pos > -1;
}
Also, if you used index, and wanted to omit the last two characters--as you do, it might make sense to remove those from the characters you have to search:
substr( $compare, -2 ) = '';
But you could do all of this in one pass, as you loop through file. I believe the code
below is almost an equivalent.
my $last_4 = '';
my $last_row = '';
my $discard = <$f1>;
# each row in the file after the first...
while ( $row = <$f1> ) {
chomp $row;
$last_row = $row;
$row = $last_4 . $row;
my $lim = length( $row ) - 4; # number of 5-character windows in $row
for ( my $i = 0; $i < $lim; $i++ ) {
$hash{ substr( $row, $i, 5 ) }++;
}
# four is the maximum we can copy over to the new row and not
# double count a strand of characters at the end.
$last_4 = substr( $row, -4 );
}
# I'm not sure what you're getting by omitting the last two characters of
# the last row, but this would replicate it
foreach my $bad_key ( map { substr( $last_row, $_, 5 ) } ( -5, -6 )) {
--$hash{ $bad_key };
delete $hash{ $bad_key } if $hash{ $bad_key } < 1;
}
# grep in a scalar context will count the values.
$unique = grep { $_ == 1 } values %hash;
You may be interested in this more concise version of your code that uses a global regex match to find all the subsequences of five characters. It also reads the entire input file in one go, and removes the newlines afterwards.
The path to the input file is expected as a parameter on the command line, and the output is sent to STDOUT, so it can be redirected to a file on the command line, like this
perl subseq5.pl input.txt > output.txt
I've also used Data::Dump instead of Data::Dumper because I believe it to be vastly superior. However it is not a core module, and so you will probably need to install it.
use strict;
use warnings;
use open qw/ :std :encoding(utf-8) /;
use Data::Dump;
my $str = do { local $/; <>; };
$str =~ tr/\n//d; # tr does not interpolate variables, so name the newline explicitly
my %dups;
++$dups{$1} while $str =~ /(?=(.{5}))/g;
my $unique = grep $_ == 1, values %dups;
print "Number of unique keys: $unique\n";
dd \%dups;

Perl - How to change every $variable occurrence of ";" in a string

Very new here so be gentle. :)
Here is the gist of what I want to do:
I want to take a string that is made up of numbers separated by semi-colons (ex. 6;7;8;9;1;17;4;5;90) and replace every "X" number of semicolons with a "\n" instead. The "X" number will be defined by the user.
So if:
$string = "6;7;8;9;1;17;4;5;90";
$Nth_number_of_semicolons_to_replace = 3;
The output should be:
6;7;8\n9;1;17\n4;5;90
I've found lots on changing the Nth occurrence of something but I haven't been able to find anything on changing every Nth occurrence of something like I am trying to describe above.
Thanks for all your help!
use List::MoreUtils qw(natatime);
my $input_string = "6;7;8;9;1;17;4;5;90";
my $it = natatime 3, split(";", $input_string);
my $output_string;
while (my @vals = $it->()) {
$output_string .= join(";", @vals)."\n";
}
Here is a quick and dirty answer.
my $input_string = "6;7;8;9;1;17;4;5;90";
my $count = 0;
$input_string =~ s/;/++$count % 3 ? ";" : "\n"/eg;
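Since the interval is supposed to come from the user, the same counter trick can be parameterised. This is a sketch only: break_every_nth is a made-up name, and it assumes the separator is literal text and that anything other than a positive integer should be rejected (a zero would otherwise cause a modulus-by-zero error).

```perl
use strict;
use warnings;

# Hypothetical wrapper: turn every $n-th occurrence of $sep into a newline.
sub break_every_nth {
    my ($string, $n, $sep) = @_;
    die "n must be a positive integer\n" unless $n =~ /^[1-9]\d*$/;
    my $count = 0;   # running number of separators seen so far
    $string =~ s/\Q$sep\E/ ++$count % $n ? $sep : "\n" /ge;
    return $string;
}

print break_every_nth('6;7;8;9;1;17;4;5;90', 3, ';'), "\n";
```

With the sample input this prints the three lines 6;7;8, 9;1;17 and 4;5;90.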
Don't have time for a full answer now, but this should get you started.
$string = "6;7;8;9;1;17;4;5;90";
$Nth_number_of_semicolons_to_replace = 3;
my $regexp = '(' . ('\d+;' x ($Nth_number_of_semicolons_to_replace - 1)) . '\d+)';
$string =~ s{ $regexp ; }{$1\n}xsmg;
sub split_x{
my($str,$num,$sep) = @_;
return unless defined $str;
$num ||= 1;
$sep = ';' unless defined $sep;
my @return;
my @tmp = split $sep, $str;
while( @tmp >= $num ){
push @return, join $sep, splice @tmp, 0, $num;
}
push @return, join $sep, @tmp if @tmp;
return @return;
}
print "$_\n" for split_x '6;7;8;9;1;17;4;5;90', 3;
print join( ',', split_x( '6;7;8;9;1;17;4;5;90', 3 ) ), "\n";
my $string = "6;7;8;9;1;17;4;5;90";
my $Nth_number_of_semicolons_to_replace = 3;
my $num = $Nth_number_of_semicolons_to_replace - 1;
$string =~ s{ ( (?:[^;]+;){$num} [^;]+ ) ; }{$1\n}gx;
print $string;
prints:
6;7;8
9;1;17
4;5;90
The regex explained:
s{
( # start of capture group 1
(?:[^;]+;){$num} # one or more non ';' characters followed by a ';'
# repeated $num times
[^;]+ # one or more non ';' characters
) # end of capture group
; # the ';' to replace
}{$1\n}gx; # replace with capture group 1 followed by a new line
If you've got 5.10 or higher, this could do the trick:
#!/usr/bin/perl
use strict;
use warnings;
my $string = '1;2;3;4;5;6;7;8;9;0';
my $n = 3;
my $search = ';.*?' x ($n -1);
print "string before: [$string]\n";
$string =~ s/$search\K;/\n/g;
print "print string after: [$string]\n";
HTH,
Paul