Map with Split & Trim in Perl

How do I use map with the split function to trim the constituents: $a, $b, $c and $d; of $line?
my ($a, $b, $c, $d, $e) = split(/\t/, $line);
# Perl trim function to remove whitespace from the start and end of the string
sub trim($)
{
my $string = shift;
$string =~ s/^\s+//;
$string =~ s/\s+$//;
return $string;
}

Don't put a prototype such as ($) on your function unless you actually need one.
my ( $a, $b, $c, $d, $e ) =
map { s/^\s+|\s+$//g; $_ } ## Notice the `; $_` -- this is common
split(/\t/, $line, 5)
;
Don't forget that in the above, s/// returns the number of substitutions made, not $_. So we return $_ explicitly.
Or, more simply:
my @values = map { s/^\s+|\s+$//g; $_ } split(/\t/, $line, 5);
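The explicit `; $_` can be avoided on Perl 5.14 or later, where the /r modifier makes s/// return the modified copy instead of the substitution count. A minimal sketch, assuming a made-up sample line (the value of $line below is not from the question):

```perl
use strict;
use warnings;

# Hypothetical sample line: tab-separated fields with stray spaces.
my $line = "1\t 2\t3 \t 4 \t 5 \n";

# With /r (Perl 5.14+), s/// returns the trimmed copy and leaves $_ untouched,
# so the block's value is already the string we want.
my @fields = map { s/^\s+|\s+$//gr } split /\t/, $line, 5;

print join('|', @fields), "\n";
```

Because /r works on a copy, the list returned by split is not modified in place.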

map takes two inputs:
an expression or block: this would be the trim expression (you don't have to write your own -- it's on CPAN)
and a list to operate on: this should be split's output:
use String::Util 'trim';
my @values = map { trim($_) } split /\t/, $line;

This should work:
my ($a, $b, $c, $d, $e) = map {trim ($_)} (split(/\t/, $line));
By the way, it's a minor point, but you should not use $a and $b as variable names.

You can also use "foreach" here.
foreach my $i ($a, $b, $c, $d, $e) {
$i=trim($i);
}

Just for variety:
my @trimmed = grep { s/^\s*|\s*$//g } split /\t/, $line;
grep acts as a filter on lists. This is why the \s+s need to be changed to \s*s inside the regex. Forcing matches on 0 or more spaces prevents grep from filtering out items in the list that have no leading or trailing spaces.
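The filtering effect is easy to see side by side. The following is only an illustrative sketch with invented sample fields; it works on copies because grep aliases $_ to each element and the substitution would otherwise change the original list.

```perl
use strict;
use warnings;

# Hypothetical fields: one clean, one with a leading space, one with a trailing space.
my @raw = ('a', ' b', 'c ');

# With \s+ the substitution makes no change to 'a', returns false,
# and grep drops that element.
my @copy_plus = @raw;
my @plus = grep { s/^\s+|\s+$//g } @copy_plus;

# With \s* the pattern always matches (possibly an empty string),
# so every element passes the filter.
my @copy_star = @raw;
my @star = grep { s/^\s*|\s*$//g } @copy_star;

print "plus: [@plus]\n";
print "star: [@star]\n";
```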

When I trim a string, I don't often want to keep the original. It would be nice to have the abstraction of a sub but also not have to fuss with temporary values.
It turns out that we can do just this, as perlsub explains:
Any arguments passed in show up in the array @_. Therefore, if you called a function with two arguments, those would be stored in $_[0] and $_[1]. The array @_ is a local array, but its elements are aliases for the actual scalar parameters. In particular, if an element $_[0] is updated, the corresponding argument is updated (or an error occurs if it is not updatable).
In your case, trim becomes
sub trim {
for (@_) {
s/^ \s+ //x;
s/ \s+ $//x;
}
wantarray ? @_ : $_[0];
}
Remember that map and for are cousins, so with the loop in trim, you no longer need map. For example
my $line = "1\t 2\t3 \t 4 \t 5 \n";
my ($a, $b, $c, $d, $e) = split(/\t/, $line);
print "BEFORE: [", join("] [" => $a, $b, $c, $d), "]\n";
trim $a, $b, $c, $d;
print "AFTER: [", join("] [" => $a, $b, $c, $d), "]\n";
Output:
BEFORE: [1] [ 2] [3 ] [ 4 ]
AFTER: [1] [2] [3] [4]

Related

How to separate an array in Perl based on pattern

I am trying to write a big script but I am stuck on one part. I want to split an array based on "..".
From the script I got this:
print @coordinates;
gene complement(872..1288)
my desired output:
complement 872 1288
I tried:
1) my @answer = split(.., @coordinates)
print("@answer\n");
2) my @answer = split /../, @coordinates;
3) print +(split /\../)[-1],[-2],[-3] while <@coordinates>
4) foreach my $anwser ( @coordinates )
{$anwser =~ s/../"\t"/;
print $anwser;}
5) my @answer = split(/../, "complement(872..1288)"); #to see if the printed array is problematic.
which prints:
) ) ) ) ) ) ) ) )
6) my @answer = split /"gene "/, @coordinates; # I tried to "catch" the entire output's spaces and tabs
which prints
0000000000000000000000000000000001000000000100000000
But none of them works. Does anyone have any idea how to get past this issue?
P.S. Unfortunately, I can't run my script on Linux right now, so I used this website to run it. I hope this is not the reason why I didn't get my desired output.

my $RE_COMPLEMENT = qr{(complement)\((\d+)\.\.(\d+)\)}msx;
for my $item (@coordinates) {
my ($head, $i, $j) = $item =~ $RE_COMPLEMENT;
if (defined($head) && defined($i) && defined($j)) {
print("$head\t$i\t$j\n");
}
}

split operates on a scalar, not on an array.
my $string = 'gene complement(872..1288)';
my @parts = split /\.\./, $string;
print $parts[0]; # gene complement(872
print $parts[1]; # 1288)
To get the desired output, you can use a substitution:
my $string = 'gene complement(872..1288)';
$string =~ s/gene +|\)//g;
$string =~ s/\.\./ /;
$string =~ s/\(/ /;
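Since split works on one string at a time, an array has to be processed element by element. The sketch below is one possible way to do that, assuming every element looks like the sample in the question; parse_coordinate is a hypothetical helper name, and the contents of @coordinates are assumed.

```perl
use strict;
use warnings;

# Assumed input: one string per element, shaped like the question's sample.
my @coordinates = ('gene complement(872..1288)');

# parse_coordinate is a hypothetical helper, not part of any library.
sub parse_coordinate {
    my ($item) = @_;
    # Split on runs of whitespace, parentheses, and dots; the first
    # field ("gene") is discarded by assigning it to undef.
    my (undef, @parts) = split /[\s(.)]+/, $item;
    return @parts;
}

# Print one tab-separated line per element.
print join("\t", parse_coordinate($_)), "\n" for @coordinates;
```

Inside a character class, `(`, `.` and `)` are literal characters, so no escaping is needed there.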

The desired effect can be achieved by using the tr operator to replace each of '(', '.' and ')' with a space, splitting the resulting string on whitespace, keeping only the required elements of the list, and printing those elements joined with tabs:
use strict;
use warnings;
use feature 'say';
my $data = <DATA>;
chomp $data;
$data =~ tr/(.)/ /;
my @elements = (split ' ', $data)[1..3];
say join "\t", @elements;
__DATA__
gene complement(872..1288)
Or as an alternative solution with only substitutions (without splitting data string into array)
use strict;
use warnings;
use feature 'say';
my $data = <DATA>;
chomp $data;
$data =~ s/gene\s+//;
$data =~ s/\)//;
$data =~ s/[(.]+/\t/g;
say $data;
__DATA__
gene complement(872..1288)
Output
complement 872 1288

Perl unexpected result

Imagine I have this Perl script
my $name = " foo ";
my $sn = " foosu";
trim($name, \$sn);
print "name: [$name]\n";
print "sn: [$sn]\n";
exit 0;
sub trim{
my $fref_trim = sub{
my ($ref_input) = @_;
${$ref_input} =~ s/^\s+// ;
${$ref_input} =~ s/\s+$// ;
};
foreach my $input (@_){
if (ref($input) eq "SCALAR"){
$fref_trim->($input);
} else {
$fref_trim->(\$input);
}
}
}
Result:
name: [foo]
sn: [foosu]
I would expect $name to be "[ foo ]" when printing the value after calling trim, but the sub is setting $name as I would want it. Why is this working, when it really shouldn't?
I'm not passing $name by reference and the trim sub is not returning anything. I'd expect the trim sub to create a copy of the $name value, process the copy, but then the original $name would still have the leading and trailing white spaces when printed in the main code.
I assume it is because of the alias with @_, but shouldn't the foreach my $input (@_) force the sub to copy the value and only treat the value not the alias?
I know I can simplify this sub and I used it only as an example.

Elements of @_ are aliases to the original variables. What you are observing is the difference between:
sub ltrim {
$_[0] =~ s/^\s+//;
return $_[0];
}
and
sub ltrim {
my ($s) = @_;
$s =~ s/^\s+//;
return $s;
}
Compare your code to:
#!/usr/bin/env perl
my $name = " foo ";
my $sn = " foosu";
trim($name, \$sn);
print "name: [$name]\n";
print "sn: [$sn]\n";
sub trim {
my @args = @_;
my $fref_trim = sub{
my ($ref_input) = @_;
${$ref_input} =~ s/^\s+//;
${$ref_input} =~ s/\s+\z//;
};
for my $input (@args) {
if (ref($input) eq "SCALAR") {
$fref_trim->($input);
}
else {
$fref_trim->(\$input);
}
}
}
Output:
$ ./zz.pl
name: [ foo ]
sn: [foosu]
Note also that the loop variable in for my $input ( @array ) does not create a new copy for each element of the array. See perldoc perlsyn:
The foreach loop iterates over a normal list value and sets the scalar variable VAR to be each element of the list in turn. ...
...
the foreach loop index variable is an implicit alias for each item in the list that you're looping over.
In your case, this means that at each iteration $input is an alias to the corresponding element of @_, which is itself an alias to the variable that was passed in as an argument to the subroutine.
Making a copy of @_ thus prevents the variables in the calling context from being modified. Of course, you could do something like:
sub trim {
my $fref_trim = sub{
my ($ref_input) = @_;
${$ref_input} =~ s/^\s+//;
${$ref_input} =~ s/\s+\z//;
};
for my $input (@_) {
my $input_copy = $input;
if (ref($input_copy) eq "SCALAR") {
$fref_trim->($input_copy);
}
else {
$fref_trim->(\$input_copy);
}
}
}
but I find making a wholesale copy of @_ once to be clearer and more efficient, assuming you do not want to be selective.

I assume it is because of the alias with @_, but shouldn't the foreach my $input (@_) force the sub to copy the value and only treat the value not the alias?
You're right that @_ contains aliases. The part that's missing is that foreach also aliases the loop variable to the current list element. Quoting perldoc perlsyn:
If any element of LIST is an lvalue, you can modify it by modifying VAR inside the loop. Conversely, if any element of LIST is NOT an lvalue, any attempt to modify that element will fail. In other words, the foreach loop index variable is an implicit alias for each item in the list that you're looping over.
So ultimately $input is an alias for $_[0], which is an alias for $name, which is why you see the changes appearing in $name.
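The alias chain can be seen in a stripped-down pair of subs. This is only a sketch; trim_in_place and trimmed_copy are hypothetical names chosen for the illustration, not functions from the question.

```perl
use strict;
use warnings;

# Modifies the caller's variables: the statement-modifier "for" aliases $_
# to each element of @_, and each element of @_ aliases the caller's argument.
sub trim_in_place {
    s/^\s+|\s+$//g for @_;
    return;
}

# Leaves the caller's variables alone by copying @_ into a lexical array first.
sub trimmed_copy {
    my @copy = @_;
    s/^\s+|\s+$//g for @copy;
    return wantarray ? @copy : $copy[0];
}

my $name  = ' foo ';
my $clean = trimmed_copy($name);
print "after trimmed_copy:  [$name] -> [$clean]\n";

trim_in_place($name);
print "after trim_in_place: [$name]\n";
```

The only difference between the two subs is the `my @copy = @_;` assignment, which is what breaks the link to the caller's variables.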

Perl: Grep unique value

Basically I wanted to emulate the piped grep operation as we do in shell script, (grep pattern1 |grep pattern2) in my Perl code to make the result unique.
The code below works, but I just wanted to know whether this is the right approach. Please note that I don't want to introduce an inner loop here just for the grep part.
foreach my $LINE ( @ARRAY1 ) {
@LINES = split /\s+/, $LINE;
@RESULT= grep ( /$LINES[0]/, ( grep /$LINES[1]/, @ARRAY2 ) );
...

This is basically the same thing you're doing: for every @ARRAY2 element, check whether it matches ALL elements of @LINES (stopping as soon as any @LINES element does not match):
use List::Util "none";
my @RESULT= grep { my $s = $_; none { $s !~ /$_/ } @LINES } @ARRAY2;
# index() is faster for literal values
my @RESULT= grep { my $s = $_; none { index($s, $_) <0 } @LINES } @ARRAY2;

There is no need to cascade calls to grep -- you can simply "and" the conditions together.
It's also worth saying that you should be using lower-case letters for your identifiers, and split /\s+/ should almost always be split ' '.
Here's what I would write:
for my $line ( @array1 ) {
my @fields = split ' ', $line;
my @result = grep { /$fields[0]/ and /$fields[1]/ } @array2;
...
}
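When the number of fields is not fixed at two, the same "match every condition" idea can be written with all from List::Util (available in List::Util 1.33 and later). A minimal sketch with invented sample data; the \Q...\E quoting assumes the fields are meant as literal text rather than regular expressions.

```perl
use strict;
use warnings;
use List::Util qw(all);   # all() needs List::Util 1.33 or later

# Hypothetical sample data standing in for @array2 and one line of @array1.
my @array2 = ('alpha beta', 'beta gamma', 'alpha gamma beta');
my @fields = split ' ', 'alpha beta';

# Keep only the elements that contain every field.
my @result = grep {
    my $candidate = $_;                       # $_ is rebound inside all { }
    all { $candidate =~ /\Q$_\E/ } @fields;
} @array2;

print "$_\n" for @result;
```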

There are different ways to extract unique values from an array in Perl.
use Data::Dumper;
##2) Best of all
my %hash = map { $_ , 1 } @array;
my @uniq = keys %hash;
print "\n Uniq Array:", Dumper(\@uniq);
##3) Costly process as it involves 'greping'
my %saw;
my @out = grep(!$saw{$_}++, @array);
print "\n Uniq Array: @out \n";
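Another option, on installations with List::Util 1.45 or later, is the uniq function from that module, which keeps the first occurrence of each value in its original order (the keys of a hash come back in no particular order). A short sketch with an invented sample array:

```perl
use strict;
use warnings;
use List::Util qw(uniq);   # uniq() needs List::Util 1.45 or later

# Hypothetical sample data.
my @array = qw(b a b c a);

# uniq keeps the first occurrence of each value, preserving order.
my @uniq = uniq @array;

print "@uniq\n";
```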

Comparing multiple numerical values in Perl

Say I have a few variables, $x, $y, $z, $a, $b, $c, and I want to make sure they all have the same value.
Can I test with something like if ($x == $y == $z == $a == $b == $c) to avoid multiple binary comparisons, i.e. (if $x == $y and $x == $z and $y == $z ...)?
Is there any way I can do all the comparing with one short and simple test?

if ( grep $x != $_, $y, $z, $a, $b, $c ) {
print "not all the same\n";
}

$x == $y and $x == $z and $y == $z is equivalent to $x == $y and $x == $z due to equality being transitive. This latter one is also the optimal solution, with N-1 comparisons for N variables.
If you have an array, you can use uniq from List::MoreUtils:
use List::MoreUtils qw(uniq);
my @arr1 = qw(foo foo foo foo foo foo);
my @arr2 = qw(foo BAR foo foo foo foo);
print "arr1: ", (uniq @arr1) == 1 ? "All same" : "Different" , "\n";
print "arr2: ", (uniq @arr2) == 1 ? "All same" : "Different" , "\n";
(If you have more than several variables and don't have an array, it might be worth considering to rewrite the code...)

You can use List::MoreUtils::first_index.
#!/usr/bin/env perl
use strict;
use warnings;
use List::MoreUtils qw( first_index );
my ($x, $y, $z, $a, $b, $c) = (1) x 6;
if (are_all_same($x, $y, $z, $a, $b, $c)) {
print "They all have the same value\n";
}
$c = 3;
unless (are_all_same($x, $y, $z, $a, $b, $c)) {
print "At least one has a different value than the others\n";
}
sub are_all_same {
my $x = shift;
-1 == first_index { $x != $_ } @_;
}
Of course, there is the issue of whether having so many variables in a small scope is appropriate (are you suffering from Fortranitis?), and whether one should use a hash to avoid a problem like this in the first place.
You can also use are_all_same with a large array, and it will impose minimal additional space and time penalties.

If they are all the same, then in particular the first must be equal to all the remaining ones. So that suggests the use of List::Util::all:
use List::Util 'all';
if( all { $x == $_ } $y, $z, $a, $b, $c ) {
...
}

How can I iterate through nested arrays?

I have created an array as follows
while (defined ($line = <STDIN>))
{
chomp ($line);
push @stack,($line);
}
Each line has two numbers.
15 6
2 8
How do I iterate over each item in each line?
I.e., I want to print
15
6
2
8
I understand it's something like
foreach (@{stack}) (@stack){
print "?????
}
This is where I am stuck.

See the perldsc documentation. That's the Perl Data Structures Cookbook, which has examples for dealing with arrays of arrays. From what you're doing though, it doesn't look like you need an array of arrays.
For your problem of taking two numbers per line and outputting one number per line, just turn the whitespace into newlines:
while( <> ) {
s/\s+/\n/g; # turn all whitespace runs into newlines
print; # it's ready to print
}
With Perl 5.10, you can use the new \h character class that matches only horizontal whitespace:
while( <> ) {
s/\h+/\n/g; # turn all horizontal whitespace runs into newlines
print; # it's ready to print
}
As a Perl one-liner, that's just:
% perl -pe 's/\h+/\n/g' file.txt

#!/usr/bin/perl
use strict;
use warnings;
while ( my $data = <DATA> ) {
my @values = split ' ', $data;
print $_, "\n" for @values;
}
__DATA__
15 6
2 8
Output:
C:\Temp> h
15
6
2
8
Alternatively, if you want to store each line in @stack and print out later:
my @stack = map { [ split ] } grep { chomp; length } <DATA>;
The line above slurps everything coming from the DATA filehandle into a list of lines (because <DATA> happens in list context). The grep chomps each line and filters by length after chomping (to avoid getting any trailing empty lines in the data file -- you can avoid it if there are none). The map then splits each line on whitespace and creates an anonymous array reference for each line. Finally, these array references are stored in the elements of @stack. You might want to use Data::Dumper to look at @stack to understand what's going on.
print join("\n", @$_), "\n" for @stack;
Now we loop over each entry in @stack, dereferencing each array in turn, then joining the elements of each array with newlines to print one element per line.
Output:
C:\Temp> h
15
6
2
8
The long way of writing essentially the same thing (with less memory consumption) would be:
my @stack;
while ( my $line = <DATA> ) {
last unless $line =~ /\S/;
my @values = split ' ', $line;
push @stack, \@values;
}
for my $ref ( @stack ) {
print join("\n", @$ref), "\n";
}
Finally, if you wanted to do something other than printing all values, say, sum all the numbers, you should store one value per element of @stack:
use List::Util qw( sum );
my @stack;
while ( my $line = <DATA> ) {
last unless $line =~ /\S/;
my @values = split ' ', $line;
push @stack, @values;
}
printf "The sum is %d\n", sum @stack;
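For the literal question of walking a nested structure, two nested loops over an array of array references are enough. A minimal sketch, assuming the input lines are already in memory (the contents of @lines below are invented, and @flat exists only to collect the values in visiting order):

```perl
use strict;
use warnings;

# Hypothetical input lines, as they might be read from STDIN.
my @lines = ("15 6\n", "2 8\n");

# Build an array of array references: one inner array per line.
my @stack = map { [ split ' ', $_ ] } @lines;

my @flat;
for my $row (@stack) {          # $row is a reference to one line's numbers
    for my $number (@$row) {    # dereference to visit each number
        push @flat, $number;
        print "$number\n";
    }
}
```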

#!/usr/bin/perl
while ($line = <STDIN>) {
chomp ($line);
push @stack, $line;
}
# prints each line
foreach $line (@stack) {
print "$line\n";
}
# splits each line into items using ' ' as separator
# and prints the items
foreach $line (@stack) {
@items = split / /, $line;
foreach $item (@items) {
print $item . "\n";
}
}

I use 'for' for "C" style loops, and 'foreach' for iterating over lists.

#!/usr/bin/perl
use strict;
use warnings;
open IN, "< read.txt" or
die "Can't read in 'read.txt'!";
my $content = join '', <IN>;
while ($content =~ m/(\d+)/g) {
print "$1\n";
}