How to separate an array in Perl based on pattern - perl

I am trying to write a big script but I am stuck on one part. I want to split an array based on "..".
From the script I got this:
print @coordinates;
gene complement(872..1288)
my desired output:
complement 872 1288
I tried:
1) my @answer = split(.., @coordinates)
print("@answer\n");
2) my @answer = split /../, @coordinates;
3) print +(split /\../)[-1],[-2],[-3] while <@coordinates>
4) foreach my $anwser ( @coordinates )
{$anwser =~ s/../"\t"/;
print $anwser;}
5) my @answer = split(/../, "complement(872..1288)"); #to see if the printed array is problematic.
which prints:
) ) ) ) ) ) ) ) )
6) my @answer = split /"gene "/, @coordinates; # I tried to "catch" the entire output's spaces and tabs
which prints
0000000000000000000000000000000001000000000100000000
But none of them work. Does anyone have any idea how to get past this issue?
P.S. Unfortunately, I can't run my script on Linux right now, so I used this website to run it. I hope this is not the reason why I didn't get my desired output.

my $RE_COMPLEMENT = qr{(complement)\((\d+)\.\.(\d+)\)}msx;
for my $item (@coordinates) {
my ($head, $i, $j) = $item =~ $RE_COMPLEMENT;
if (defined($head) && defined($i) && defined($j)) {
print("$head\t$i\t$j\n");
}
}

split operates on a scalar, not on an array.
my $string = 'gene complement(872..1288)';
my @parts = split /\.\./, $string;
print $parts[0]; # gene complement(872
print $parts[1]; # 1288)
To get the desired output, you can use a substitution:
my $string = 'gene complement(872..1288)';
$string =~ s/gene +|\)//g;
$string =~ s/\.\./ /;
$string =~ s/\(/ /;
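Because split works on one string at a time, an array such as @coordinates has to be processed element by element. The following is a minimal sketch of that idea, not a definitive solution: the helper name format_coordinate and the second sample line are made up for illustration, and it assumes every line has the shape "gene NAME(START..END)".

```perl
use strict;
use warnings;

# Hypothetical helper: turn 'gene complement(872..1288)' into
# 'complement<TAB>872<TAB>1288'. Returns undef for lines of another shape.
sub format_coordinate {
    my ($line) = @_;
    # Split on runs of whitespace, parentheses and dots.
    my @fields = split /[\s().]+/, $line;
    return unless @fields == 4;
    return join "\t", @fields[1 .. 3];
}

# Sample data standing in for the asker's @coordinates (assumed contents).
my @coordinates = ('gene complement(872..1288)', 'gene complement(1500..2100)');

# split handles one scalar at a time, so loop over the array.
for my $line (@coordinates) {
    my $formatted = format_coordinate($line);
    print "$formatted\n" if defined $formatted;
}
```

Splitting on a character class avoids the separate substitutions, at the cost of silently skipping lines that do not yield exactly four fields.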

The desired effect can be achieved in four steps:
use the tr operator to replace each of the characters '(', '.' and ')' with a space,
split the data string into elements on whitespace,
keep only the required part of the resulting array,
output the elements joined with tabs.
use strict;
use warnings;
use feature 'say';
my $data = <DATA>;
chomp $data;
$data =~ tr/(.)/ /;
my @elements = (split ' ', $data)[1..3];
say join "\t", @elements;
__DATA__
gene complement(872..1288)
Or as an alternative solution with only substitutions (without splitting data string into array)
use strict;
use warnings;
use feature 'say';
my $data = <DATA>;
chomp $data;
$data =~ s/gene\s+//;
$data =~ s/\)//;
$data =~ s/[(.]+/\t/g;
say $data;
__DATA__
gene complement(872..1288)
Output
complement 872 1288

Related

Perl: Grep unique value

Basically I wanted to emulate a piped grep operation, as we do in shell scripts (grep pattern1 | grep pattern2), in my Perl code to make the result unique.
The code below works, but I just wanted to know whether this is the right approach. Please note, I don't want to introduce an inner loop here just for the grep part.
foreach my $LINE ( @ARRAY1 ) {
@LINES = split /\s+/, $LINE;
@RESULT= grep ( /$LINES[0]/, ( grep /$LINES[1]/, @ARRAY2 ) );
...
This is basically the same thing you're doing: for every @ARRAY2 element, check whether it matches ALL elements of @LINES (stopping as soon as any @LINES element does not match).
use List::Util "none";
my @RESULT= grep { my $s = $_; none { $s !~ /$_/ } @LINES } @ARRAY2;
# index() is faster for literal values
my @RESULT= grep { my $s = $_; none { index($s, $_) <0 } @LINES } @ARRAY2;
There is no need to cascade calls to grep; you can simply "and" the conditions together.
It's also worth saying that you should be using lower-case letters for your identifiers, and split /\s+/ should almost always be split ' '
Here's what I would write
for my $line ( @array1 ) {
my @fields = split ' ', $line;
my @result = grep { /$fields[0]/ and /$fields[1]/ } @array2;
...
}
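When the number of fields is not fixed at two, the same "match every condition" idea can be written without naming each field. The sketch below is one possible way to do that; the helper name grep_all and the sample data are hypothetical, and it assumes the fields should be matched as literal text (hence \Q...\E), which may not hold if they are meant to be regex patterns.

```perl
use strict;
use warnings;
use List::Util qw(all);

# Hypothetical helper: keep the items that contain every given substring.
# \Q...\E quotes regex metacharacters so each needle is matched literally;
# remove it if the needles really are patterns.
sub grep_all {
    my ($needles, @items) = @_;
    return grep { my $item = $_; all { $item =~ /\Q$_\E/ } @$needles } @items;
}

# Sample data (assumed, for illustration only).
my @array2 = ('alpha beta', 'alpha gamma', 'beta gamma');
my @result = grep_all([ 'alpha', 'gamma' ], @array2);
print "$_\n" for @result;
```

all stops at the first needle that fails, so no item is tested against more patterns than necessary.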
There are different ways to grep/extract unique values from an array in Perl.
##2) Best of all
use Data::Dumper;
my %hash = map { $_ , 1 } @array;
my @uniq = keys %hash;
print "\n Uniq Array:", Dumper(\@uniq);
##3) Costly process as it involves 'greping'
my %saw;
my @out = grep(!$saw{$_}++, @array);
print "\n Uniq Array: @out \n";
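One practical difference between the two approaches is ordering: keys returns hash keys in no particular order, while the grep form keeps the first occurrence of each value in its original position. A small sketch of the order-preserving variant follows; the helper name uniq_in_order and the sample list are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: remove duplicates, keeping the first occurrence of
# each value in its original position.
sub uniq_in_order {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

my @array = qw(b a b c a);
my @uniq  = uniq_in_order(@array);
print "@uniq\n";    # b a c
```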

Split and add digits

If I open a file with strings like "233445", how can I split each string into its digits "2 3 3 4 4 5", add them together ("2 + 3 + 3 ..."), and print out the result?
My code so far looks like this:
use strict;
#open (FILE, '<', shift);
#my @strings = <FILE>;
@strings = qw(12243434, 345, 676744); ## or a contents of a file
foreach my $numbers (@strings) {
my @done = split(undef, $numbers);
print "@done\n";
}
But I don't know where to start for the actual add function.
use strict;
use warnings;
my @strings = qw( 12243434 345 676744 );
for my $string (@strings) {
my $sum;
$sum += $_ for split(//, $string);
print "$sum\n";
}
or
use strict;
use warnings;
use List::Util qw( sum );
my @strings = qw( 12243434 345 676744 );
for my $string (@strings) {
my $sum = sum split(//, $string);
print "$sum\n";
}
PS: Always use use strict; use warnings;. It would have detected your misuse of commas in qw, and it would have detected your misuse of undef for split's first argument.
use strict;
use warnings;
#open (FILE, '<', shift);
#my @strings = <FILE>;
my @strings = qw(12243434 345 676744); ## or the contents of a file
foreach my $numbers (@strings) {
my @done = split(//, $numbers);
print "@done\n";
# sum the digits of this string
my $tot = 0;
$tot += $_ for @done;
print $tot, "\n";
}
No one suggested an eval solution?
my @strings = qw( 12243434 345 676744 );
foreach my $string (@strings) {
my $sum = eval join '+',split //, $string;
print "$sum\n";
}
If your numbers are in a file, a one-liner might be nice:
perl -lnwe 'my $sum; s/(\d)/$sum += $1/eg; print $sum' numbers.txt
Since addition only uses numbers, it is safe to ignore all other characters. So just extract the digits one at a time with the regex and sum them up.
TIMTOWTDI:
perl -MList::Util=sum -lnwe 'print sum(/\d/g);' numbers.txt
perl -lnwe 'my $a; $a+=$_ for /\d/g; print $a' numbers.txt
Options:
-l auto-chomp input and add newline to print
-n implicit while(<>) loop around program -- open the file name given as argument and read each line into $_.
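The "match the digits, then sum them" idea from the one-liners can also be kept as a reusable sub inside a script. This is only a sketch under that assumption: the name digit_sum is hypothetical, and sum0 is chosen over sum so that a string without digits yields 0 rather than undef.

```perl
use strict;
use warnings;
use List::Util qw(sum0);

# Hypothetical helper: add up every digit in a string, ignoring anything else.
# In list context /\d/g returns all single digits found in the string.
sub digit_sum {
    my ($string) = @_;
    return sum0($string =~ /\d/g);
}

for my $string (qw(12243434 345 676744)) {
    printf "%s -> %d\n", $string, digit_sum($string);
}
```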

Perl - How to change every $variable occurrence of ";" in a string

Very new here so be gentle. :)
Here is the gist of what I want to do:
I want to take a string that is made up of numbers separated by semi-colons (ex. 6;7;8;9;1;17;4;5;90) and replace every "X" number of semicolons with a "\n" instead. The "X" number will be defined by the user.
So if:
$string = "6;7;8;9;1;17;4;5;90";
$Nth_number_of_semicolons_to_replace = 3;
The output should be:
6;7;8\n9;1;17\n4;5;90
I've found lots on changing the Nth occurrence of something but I haven't been able to find anything on changing every Nth occurrence of something like I am trying to describe above.
Thanks for all your help!
use List::MoreUtils qw(natatime);
my $input_string = "6;7;8;9;1;17;4;5;90";
my $it = natatime 3, split(";", $input_string);
my $output_string;
while (my @vals = $it->()) {
$output_string .= join(";", @vals)."\n";
}
Here is a quick and dirty answer.
my $input_string = "6;7;8;9;1;17;4;5;90";
my $count = 0;
$input_string =~ s/;/++$count % 3 ? ";" : "\n"/eg;
Don't have time for a full answer now, but this should get you started.
my $string = "6;7;8;9;1;17;4;5;90";
my $Nth_number_of_semicolons_to_replace = 3;
# build a pattern that captures N numbers and matches the semicolon after them
my $regexp = '(' . ('\d+;' x ($Nth_number_of_semicolons_to_replace - 1)) . '\d+);';
$string =~ s{$regexp}{$1\n}g;
sub split_x{
my($str,$num,$sep) = @_;
return unless defined $str;
$num ||= 1;
$sep = ';' unless defined $sep;
my @return;
my @tmp = split $sep, $str;
while( @tmp >= $num ){
push @return, join $sep, splice @tmp, 0, $num;
}
push @return, join $sep, @tmp if @tmp;
return @return;
}
print "$_\n" for split_x '6;7;8;9;1;17;4;5;90', 3;
print join( ',', split_x( '6;7;8;9;1;17;4;5;90', 3 ) ), "\n";
my $string = "6;7;8;9;1;17;4;5;90";
my $Nth_number_of_semicolons_to_replace = 3;
my $num = $Nth_number_of_semicolons_to_replace - 1;
$string =~ s{ ( (?:[^;]+;){$num} [^;]+ ) ; }{$1\n}gx;
print $string;
prints:
6;7;8
9;1;17
4;5;90
The regex explained:
s{
( # start of capture group 1
(?:[^;]+;){$num} # one or more non ';' characters followed by a ';'
# repeated $num times
[^;]+ # any non ';' characters
) # end of capture group
; # the ';' to replace
}{$1\n}gx; # replace with capture group 1 followed by a new line
If you've got 5.10 or higher, this could do the trick:
#!/usr/bin/perl
use strict;
use warnings;
my $string = '1;2;3;4;5;6;7;8;9;0';
my $n = 3;
my $search = ';.*?' x ($n -1);
print "string before: [$string]\n";
$string =~ s/$search\K;/\n/g;
print "print string after: [$string]\n";
HTH,
Paul
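The counter-based substitution shown among the answers generalizes easily to a user-supplied N and separator. The following sketch wraps it in a sub; the name replace_every_nth and its parameter order are assumptions for illustration, and it expects $n to be a positive integer.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every $n-th occurrence of $sep with
# $replacement. Assumes $n is a positive integer.
sub replace_every_nth {
    my ($string, $n, $sep, $replacement) = @_;
    my $count = 0;
    # \Q...\E matches the separator literally; /e evaluates the replacement
    # as code, so the running counter decides what each match becomes.
    $string =~ s/\Q$sep\E/ ++$count % $n ? $sep : $replacement /ge;
    return $string;
}

print replace_every_nth('6;7;8;9;1;17;4;5;90', 3, ';', "\n"), "\n";
```

Unlike the pattern-building answers, this version makes no assumption about what sits between the separators, so it also works for non-numeric fields.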

How can I iterate through nested arrays?

I have created an array as follows
while (defined ($line = <STDIN>))
{
chomp ($line);
push @stack,($line);
}
each line has two numbers.
15 6
2 8
How do I iterate over each item in each line?
i.e. I want to print
15
6
2
8
I understand it's something like
foreach (@{stack}) (@stack){
print "?????
}
This is where I am stuck.
See the perldsc documentation. That's the Perl Data Structures Cookbook, which has examples for dealing with arrays of arrays. From what you're doing though, it doesn't look like you need an array of arrays.
For your problem of taking two numbers per line and outputting one number per line, just turn the whitespace into newlines:
while( <> ) {
s/\s+/\n/g; # turn all whitespace runs into newlines
print; # it's ready to print
}
With Perl 5.10, you can use the new \h character class that matches only horizontal whitespace:
while( <> ) {
s/\h+/\n/g; # turn all horizontal whitespace runs into newlines
print; # it's ready to print
}
As a Perl one-liner, that's just:
% perl -pe 's/\h+/\n/g' file.txt
#!/usr/bin/perl
use strict;
use warnings;
while ( my $data = <DATA> ) {
my @values = split ' ', $data;
print $_, "\n" for @values;
}
__DATA__
15 6
2 8
Output:
C:\Temp> h
15
6
2
8
Alternatively, if you want to store each line in @stack and print it out later:
my @stack = map { [ split ] } grep { chomp; length } <DATA>;
The line above slurps everything coming from the DATA filehandle into a list of lines (because <DATA> happens in list context). The grep chomps each line and filters by length after chomping (to avoid picking up any trailing empty lines in the data file; you can skip it if there are none). The map then splits each line on whitespace and creates an anonymous array reference for each line. Finally, these array references are stored as the elements of @stack. You might want to use Data::Dumper to look at @stack to understand what's going on.
print join("\n", @$_), "\n" for @stack;
Now, we loop over each entry in @stack, dereferencing each array in turn, then joining the elements of each array with newlines to print one element per line.
Output:
C:\Temp> h
15
6
2
8
The long way of writing essentially the same thing (with less memory consumption) would be:
my @stack;
while ( my $line = <DATA> ) {
last unless $line =~ /\S/;
my @values = split ' ', $line;
push @stack, \@values;
}
for my $ref ( @stack ) {
print join("\n", @$ref), "\n";
}
Finally, if you wanted to do something other than printing all values, say, sum all the numbers, you should store one value per element of @stack:
use List::Util qw( sum );
my @stack;
while ( my $line = <DATA> ) {
last unless $line =~ /\S/;
my @values = split ' ', $line;
push @stack, @values;
}
printf "The sum is %d\n", sum @stack;
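For a genuinely nested structure, where each element of the outer array is a reference to an inner array, two nested loops visit every item. The sketch below illustrates that under stated assumptions: the hard-coded @stack stands in for data read from input, and the helper name flatten_rows is hypothetical.

```perl
use strict;
use warnings;

# Sample array of arrays, standing in for a @stack built from input lines.
my @stack = ( [ 15, 6 ], [ 2, 8 ] );

# Hypothetical helper: flatten a list of array references into one list.
sub flatten_rows {
    my @rows = @_;
    my @items;
    for my $row (@rows) {         # $row is a reference to one inner array
        for my $item (@$row) {    # @$row dereferences it
            push @items, $item;
        }
    }
    return @items;
}

print "$_\n" for flatten_rows(@stack);
```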
#!/usr/bin/perl
while ($line = <STDIN>) {
chomp ($line);
push @stack, $line;
}
# prints each line
foreach $line (@stack) {
print "$line\n";
}
# splits each line into items using ' ' as separator
# and prints the items
foreach $line (@stack) {
@items = split / /, $line;
foreach $item (@items) {
print $item . "\n";
}
}
I use 'for' for "C" style loops, and 'foreach' for iterating over lists.
#!/usr/bin/perl
use strict;
use warnings;
open IN, "< read.txt" or
die "Can't read in 'read.txt'!";
my $content = join '', <IN>;
while ($content =~ m`(\d+)`g) {
print "$1\n";
}

How can I extract the values after = in my string with Perl?

I have a string like this
field1=1 field2=2 field3=abc
I want to output this as
2,1,abc
Any ideas as to how I can go about this? I can write a small C or Java program to do this, but I'm trying to find a simple way to do it in Perl.
use strict;
use warnings;
my $string = 'field1=1 field2=2 field3=abc';
my @values = ($string =~ m/=(\S+)/g);
print join(',', @values), "\n";
#!/usr/bin/perl
use strict;
use warnings;
# Input string
my $string = "field1=1 field2=2 field3=abc";
# Split string into a list of "key=value" strings
my @pairs = split(/\s+/,$string);
# Convert pair strings into hash
my %hash = map { split(/=/, $_, 2) } @pairs;
# Output hash
printf "%s,%s,%s\n", $hash{field2}, $hash{field1}, $hash{field3}; # => 2,1,abc
# Output hash, alternate method
print join(",", @hash{qw(field2 field1 field3)}), "\n";
Use m//g in list context:
#!/usr/bin/perl
use strict;
use warnings;
my $x = "field1=1 field2=2 field3=abc";
if ( my @matches = $x =~ /(?:field[1-3]=(\S+))/g ) {
print join(',', @matches), "\n";
}
__END__
Output:
C:\Temp> klm
1,2,abc
$_='field1=1 field2=2 field3=abc';
$,=',';
say /=(\S+)/g
Let's play Perl golf :D
my $str = 'field1=1 field2=2 field3=abc';
print(join(',', map { (split('=', $_))[1] } split(' ', $str)));
There are several ways you can do that:
Regex match
my $s = "field1=1 field2=2 field3=abc";
$s =~ /field1=(\w*) field2=(\w*) field3=(\w*)$/; # pick out each field
print $1,$2,$3;
12abc
Split the string on match
my $s = "field1=1 field2=2 field3=abc";
my @arr = split / /, $s; print @arr,"\n"; # make an array of name=value pairs
my @vals = map { my @pairs = split /=/, $_; $pairs[1] } @arr; # get the values only from each pair
print @vals;
field1=1field2=2field3=abc
12abc
Split and put in a hash (I think that's the most useful one)
my $s = "field1=1 field2=2 field3=abc";
my @arr = split / /, $s;
my %pairs = map { split /=/, $_ } @arr;
print $pairs{field1}, $pairs{field2}, $pairs{field3};
12abc
Assuming your ordering was a typo:
#!/usr/bin/perl
use strict; use warnings;
my $str='a=1 b=2 c=abc';
my @v;
while ($str =~ /=(\S+)/g) {
push @v, $1;
}
print join (',', @v);
Perl is definitely the right tool for this.
#! /usr/bin/perl
$str = "field1=1 field2=2 field3=abc";
$str =~ /field1=(\S+)\ field2=(\S+)\ field3=(\S+)/;
print "$1,$2,$3", "\n";
my $a = "field1=1 field2=2 field3=abc";
my @f = split /\s*\w+=/, $a;
shift(@f);
print join(",", @f), "\n";
$string="field1=1 field2=2 field3=abc";
@s=split /\s+/,$string;
$temp=$s[1];$s[1]=$s[0];$s[0]=$temp;
foreach (@s){s/.*=//; push(@a,$_ );}
print join(",",@a);
If you actually need both the keys and the values, I would put them into a hash. You can capture both sides of the "=" and assign the result directly to the hash.
use strict;
use warnings;
my $str = 'field1=1 field2=2 field3=abc';
my %fields = $str =~ / (\S+) \s* = \s* (\S+) /xg;
use YAML;
print Dump \%fields
---
field1: 1
field2: 2
field3: abc
For further information please read perldoc perlre.
If you are just a beginner, you may want to read perldoc perlretut.
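If the output order in the question (2,1,abc) was intentional, capturing keys and values into a hash makes it easy to print the values in any chosen order. The sketch below shows one way this might look; the helper name parse_fields is hypothetical, and it assumes keys are word characters and values contain no whitespace.

```perl
use strict;
use warnings;

# Hypothetical helper: parse whitespace-separated 'key=value' pairs into a
# hash. With two capture groups, m//g in list context returns
# (key1, value1, key2, value2, ...), which is exactly a hash initializer.
sub parse_fields {
    my ($string) = @_;
    my %fields = $string =~ /(\w+)=(\S+)/g;
    return %fields;
}

my %fields = parse_fields('field1=1 field2=2 field3=abc');
# A hash slice returns the values in the order the keys are listed.
print join(',', @fields{qw(field2 field1 field3)}), "\n";    # 2,1,abc
```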