How can I use PDL rcols in a subroutine with pass-by-reference? - perl

Specifically, I want to use rcols with the PERLCOLS option.
Here's what I want to do:
my #array;
getColumn(\#array, $file, 4); # get the fourth column from file
I can do it if I use \#array, but for backward compatibility I'd prefer not to do this. Here's how I'd do it using an array-ref-ref:
sub getColumn {
my ($arefref, $file, $colNum) = #_;
my #read = rcols $file, { PERLCOLS => [$colNum] };
$$arefref = $read[-1];
return;
}
But, I don't see how to make a subroutine that takes an array ref as an argument without saying something like #$aref = #{$read[-1]}, which, afaict, copies each element individually.
PS: reading the PDL::IO::Misc documentation, it seems like the perl array ought to be $read[0] but it's not.
PERLCOLS
- an array of column numbers which are to be read into perl arrays
rather than piddles. Any columns not specified in the explicit list
of columns to read will be returned after the explicit columns.
(default B).
I am using PDL v2.4.4_05 with Perl v5.10.0 built for x86_64-linux-thread-multi

I don't understand why this wouldn't work:
my $arr_ref;
getColumn( $arr_ref, $file, 4 );
sub getColumn {
my ( $arr_ref, $file, $colNum ) = #_;
my #read = rcols, $file, { PERLCOLS => [ $colNum ] };
# At this point, #read is a list of PDLs and array references.
$arr_ref = $read[-1];
}
Looking at the rcols() documentation, it looks like if you add the PERLCOLS option it returns whatever column you request as an array reference, so you should be able to just assign it to the array reference you passed in.
And as for the documentation question, what I understand from that is you haven't specified any explicit columns, therefore rcols() will return all of the columns in the file as PDLs first, and then return the columns you requested as Perl arrayrefs, which is why your arrayref is coming out in $read[-1].

I believe part of the difficulty with using rcols here is that the user is running PDL-2.4.4 while the rcols docs version was from PDL-2.4.7 which may have version skew in functionality. With the current PDL-2.4.10 release, it is easy to use rcols to read in a single column of data as a perl array which is returned via an arrayref:
pdl> # cat data
1 2 3 4
1 2 3 4
1 2 3 4
pdl> $col = rcols 'data', 2, { perlcols=>[2] }
ARRAY(0x2916e60)
pdl> #{$col}
3 3 3
Notice that in the current release, the perlcols option allows one to specify the output type of a column rather than just adding a perl-style column at the end.
Use pdldoc rcols or do help rcols in the PDL shell to see the more documentation.
A good resource is the perldl mailing list.

Related

Why doesn't this deref of a reference work as a one-liner?

Given:
my #list1 = ('a');
my #list2 = ('b');
my #list0 = ( \#list1, \#list2 );
then
my #listRef = $list0[1];
my #list = #$listRef; # works
but
my #list = #$($list0[1]); # gives an error message
I can't figure out why. What am I missing?
There is one simple de-referencing rule that covers this. Loosely put:
What follows the sigil need be the correct reference for it, or a block that evaluates to that.
A specific case from perlreftut
You can always use an array reference, in curly braces, in place of the name of an array.
In your case then, it should be
my #list = #{ $list0[1] };
(not index [2] since your #list0 has two elements)  Spaces are there only for readability.
The attempted #$($list0[2]) is a syntax error, first because the ( (following the $) isn't allowed in an identifier (variable name), what presumably follows that $.
A block {} though would be allowed after the $ and would be evaluated, and must yield a scalar reference in this case, to be dereferenced by that $ in front of it; but then the first # would be in error. That can then fixed as well but this gets messy if pushed, and wasn't meant to go that far. While the exact rules are (still) a little murky, see Identifier Parsing in perldata.
The #$listRef earlier is correct syntax in general. But it refers to a scalar variable $listRef (which must be an array reference since it's getting dereferenced into an array by the first #), and there is no such a thing in the example -- you have an array variable #listRef.
So with use strict; in effect this, too, would fail to compile.
Dereferencing an arrayref to assign a new array is expensive as it has to copy all elements (and to construct the new array variable), while it's rarely needed (unless you actually want a copy). With the array reference on hand ($ar) all that one may need is readily available
#$ar; # list of elements
$ar->[$index]; # specific element
#$ar[#indices]; # slice -- list of some elements, like #$ar[0,2..5,-1]
$ar->#[0,-1]; # slice, with new "postfix dereferencing" (stable at v5.24)
$#$ar; # last index in the anonymous array referred by $ar
See Slices in perldata and Postfix reference slicing in perlref
You need
#{ $list0[1] }
Whenever you can use the name of a variable, you can use a block that evaluates to a reference. That means the syntax for getting the elements of an array are
#NAME # If you have the name
#BLOCK # If you have a reference
That means that
my #array1 = 4..5;
my #array2 = #array1;
and
my $array1 = [ 4..5 ];
my #array2 = #{ $array1 }
are equivalent.
When the only thing in the block is a simple scalar ($NAME or $BLOCK), you can omit the curlies. That means that
#{ $array1 }
is equivalent to
#$array1
That's why #$listRef works, and it's why #{ $list0[1] } can't be simplified.
See Perl Dereferencing Syntax.
You have a lot going on there and multiple levels of inadvertent references, so let's go through it:
First, you start by making a list of two items, each of which is an array reference. You store that in an array:
my #list0 = ( \#list2, \#list2 );
Then you ask for the item with index 2, which is a single item, and store that in an array:
my #listRef = $list0[2];
However, there is no item with index 2 because Perl indexes from zero. The value in #listRef in undefined. Not only that, but you've asked for a single item and stored it in an array instead of a scalar. That's probably not what you meant.
You say this following line works, but I don't think you know that because it won't give you the value you were expecting even if you didn't get an error. Something else is happening. You haven't declared or used a variable $listRef, so Perl creates it for you and gives it the value undef. When you try to dereference it, Perl uses "autovivification" to create the reference. This is the process where Perl helpfully creates a reference structure for you if you start with undef:
my #list = #$listRef; # works
There is nothing in that array so #list should be empty.
Fix that to get the last item, which has index of 1, and fix it so you are assigning the single value (the reference) to a scalar variable:
my $listRef = $list0[1];
Data::Dumper is handy here:
use Data::Dumper;
my #list2 = qw(a b c);
my #list0 = ( \#list2, \#list2 );
my $listRef = $list0[1];
print Dumper($listRef);
You get the output:
$VAR1 = [
'a',
'b',
'c'
];
Perl has some features that can catch these sorts of variable naming mistakes and will help you track down problems. Add these to the top of your program:
use strict;
use warnings;
For the rest, you might want to check out my book Intermediate Perl which explains all this reference stuff.
And, recent Perls have a new feature called postfix dereferencing that allows you to write dereferences from left to right:
my #items = ( \#list2, \#list2 );
my #items_of_last_ref = $items[1]->#*;
my #list = #$#listRef; # works
I doubt that works. That may not throw a syntax error but it sure as hell does not do what you think it does. For once
my #list0 = ( \#list2, \#list2 );
defines an array with 2 elements and you access
my #listRef = $list0[2];
the third element. So #listRef is an array that contains one element which is undef. The following code doesn't make sense either.
Unless the question is purely academic (answered by zdim already), I assume you want the second element of #list into a separate array, I would write
my #list = #{ $list0[1] };
The question is not complete and not clear on desired outcome.
OP tries to access an element $list0[2] of array #list0 which does not exist -- array has elements with indexes 0 and 1.
Perhaps #listRef should be $listRef instead in the post.
Bellow is my vision of described problem
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
my #list1 = qw/word1 word2 word3 word4/;
my #list2 = 1000..1004;
my #list0 = (\#list1, \#list2);
my $ref_array = $list0[0];
map{ say } #{$ref_array};
$ref_array = $list0[1];
map{ say } #{$ref_array};
say "Element: " . #{$ref_array}[2];
output
word1
word2
word3
word4
1000
1001
1002
1003
1004
Element: 1002

How to check rows returned by fetchall_arrayref method in Perl?

I am trying to understand following piece of code, in particular what's happening in line 4, 5 and 6.
I have understood most of it , but just can't seem to understand what's being done with #$r != 1; in line 4 (does #$r represents the number of rows returned?) and similarly what's happening with #$r[0] in line 5 and #$rr[0] in line 6:
1 my $sth = $dbh->prepare(" a select statement ");
2 $sth->execute();
3 my $r = $sth->fetchall_arrayref();
4 die "Failed to select " if !defined($r) || #$r != 1;
5 my $rr = #$r[0];
6 my $rec = #$rr[0];
7 print "Rec is $rec\n";
The fetchall_arrayref() returns a reference to an array of all rows, in which each element is also a reference to an array, with that row's elements.
Then the die line checks
that the top-level reference $r is defined (that the call worked), and
that the size of that array, #$r, is exactly 1 – so, that the array contains exactly one element. This betrays the expectation that the query will return one row and since the code is prepared to die for this it may well ask for one row, by fetchrow_arrayref or fetchrow_array.
The #$r dereferences the $r arrayref, and in the scalar context imposed by != we indeed get the number of elements in the list.
The line 5 is very misleading, to say the least, even as the syntax happens to be legitimate: it extracts the first element and feeds it into a scalar, but using #$r[0] syntax which would normally imply that we are getting a list, by its sigil #. It's equivalent to #{$r}[0] and is an abuse of notation.
It should either clearly get the first element, if that's the intent
my $rr = $r->[0];
or also dereference it to get the whole array
my #row = #{ $r->[0] };
if that's wanted.
The last line that you query does the exact same, using the retrieved $rr reference. But the first element of the first arrayref (row) is easily obtained directly
my $rec = $r->[0]->[0]; # or $r->[0][0]
what replaces lines 5 and 6.
See perlreftut and perldsc.
Evaluating an array (reference) in scalar context returns the number of elements in the array. The answer to your other questions is that it's just standard dereferencing using ugly syntax and useless intermediate variables.
I believe in teaching people how to fish, though, so consider this code, which is essentially what you're working with:
use strict;
use warnings;
use Data::Dumper;
my $r = [[1, 2, 3]];
my $rr = #$r[0];
my $rec = #$rr[0];
print Dumper($r, $rr, $rec);
Output:
$VAR1 = [
[
1,
2,
3
]
];
$VAR2 = $VAR1->[0];
$VAR3 = 1;
It should be easy to see what's going on now that you can see what each variable holds, right?

Why I can use #list to call an array, but can't use %dict to call a hash in perl? [duplicate]

This question already has answers here:
Why do you need $ when accessing array and hash elements in Perl?
(9 answers)
Closed 8 years ago.
Today I start my perl journey, and now I'm exploring the data type.
My code looks like:
#list=(1,2,3,4,5);
%dict=(1,2,3,4,5);
print "$list[0]\n"; # using [ ] to wrap index
print "$dict{1}\n"; # using { } to wrap key
print "#list[2]\n";
print "%dict{2}\n";
it seems $ + var_name works for both array and hash, but # + var_name can be used to call an array, meanwhile % + var_name can't be used to call a hash.
Why?
#list[2] works because it is a slice of a list.
In Perl 5, a sigil indicates--in a non-technical sense--the context of your expression. Except from some of the non-standard behavior that slices have in a scalar context, the basic thought is that the sigil represents what you want to get out of the expression.
If you want a scalar out of a hash, it's $hash{key}.
If you want a scalar out of an array, it's $array[0]. However, Perl allows you to get slices of the aggregates. And that allows you to retrieve more than one value in a compact expression. Slices take a list of indexes. So,
#list = #hash{ qw<key1 key2> };
gives you a list of items from the hash. And,
#list2 = #list[0..3];
gives you the first four items from the array. --> For your case, #list[2] still has a "list" of indexes, it's just that list is the special case of a "list of one".
As scalar and list contexts were rather well defined, and there was no "hash context", it stayed pretty stable at $ for scalar and # for "lists" and until recently, Perl did not support addressing any variable with %. So neither %hash{#keys} nor %hash{key} had meaning. Now, however, you can dump out pairs of indexes with values by putting the % sigil on the front.
my %hash = qw<a 1 b 2>;
my #list = %hash{ qw<a b> }; # yields ( 'a', 1, 'b', 2 )
my #l2 = %list[0..2]; # yields ( 0, 'a', 1, '1', 2, 'b' )
So, I guess, if you have an older version of Perl, you can't, but if you have 5.20, you can.
But for a completist's sake, slices have a non-intuitive way that they work in a scalar context. Because the standard behavior of putting a list into a scalar context is to count the list, if a slice worked with that behavior:
( $item = #hash{ #keys } ) == scalar #keys;
Which would make the expression:
$item = #hash{ #keys };
no more valuable than:
scalar #keys;
So, Perl seems to treat it like the expression:
$s = ( $hash{$keys[0]}, $hash{$keys[1]}, ... , $hash{$keys[$#keys]} );
And when a comma-delimited list is evaluated in a scalar context, it assigns the last expression. So it really ends up that
$item = #hash{ #keys };
is no more valuable than:
$item = $hash{ $keys[-1] };
But it makes writing something like this:
$item = $hash{ source1(), source2(), #array3, $banana, ( map { "$_" } source4()};
slightly easier than writing:
$item = $hash{ [source1(), source2(), #array3, $banana, ( map { "$_" } source4()]->[-1] }
But only slightly.
Arrays are interpolated within double quotes, so you see the actual contents of the array printed.
On the other hand, %dict{1} works, but is not interpolated within double quotes. So, something like my %partial_dict = %dict{1,3} is valid and does what you expect i.e. %partial_dict will now have the value (1,2,3,4). But "%dict{1,3}" (in quotes) will still be printed as %dict{1,3}.
Perl Cookbook has some tips on printing hashes.

Using the $# operator

I've just taken over maintenance of a piece of Perl system. The machine it used to run on is dead so I do not know which version of Perl it was using, but it was working. It included the following line to count the lines in a page of ASCII text
my $lcnt = $#{#{$page{'lines'}}};
In Perl 5.10.1 ( we are now running this on CentOS 6.3 ) the above code no longer works. I instead use the following, which works fine.
my #arr = #{$page{'lines'}};
my $lcnt = $#arr;
I'll admit my Perl isn't great but from what I can see the first version should never have worked as it is trying to deference an array rather than an array ref
First question - is my guess at why this first line of code doesn't now work correct, and secondly did it work earlier due to a now fixed bug in a prior Perl version?
Thanks!
The first version never worked. Assuming $page{'lines'} is an arrayref, this is what you want:
my $lcnt = $#{$page{'lines'}};
Note that this is going to give you one less than the number of items in your arraref. The $# operator is the INDEX of the last item, not the number of items. If you want the number of items in $page{'lines'}, you probably want this:
my $lcnt = scalar(#{$page{'lines'}});
Some things about your code. This:
my $lcnt = $#{#{$page{'lines'}}};
Was never correct. Take a look at the three things going on here
$page{'lines'} # presumably an array ref
#{ ... } # dereference into an array
$#{ ... } # get last index of an array ref
This is equivalent to (continuing on your own code):
my #arr = #{$page{'lines'}};
my $foo = #arr; # foo is now the size of the array, e.g. 3
my $lcnt = $#$foo;
If you use
use strict;
use warnings;
Which you should always do, without question (!), you will get the informative fatal error message:
Can't use string ("3") as an ARRAY ref while "strict refs" in use
(Where 3 will be the size of your array)
The correct way to get the size (number of elements) of an array is to put the array in scalar context:
my $size = #{ $page{'lines'} };
The way to get the index of the last element is using the $# sigil:
my $last_index = $#{ $page{'lines'} };
As you'll note, the syntax is the same, it is just a matter of using # or $# to get what you want, just the same as when using a regular array
my $size = #array;
my $last = $#array;
So, to refer back to the beginning: Using both # and $# is not and was never correct.

Reading numbers from a file to variables (Perl)

I've been trying to write a program to read columns of text-formatted numbers into Perl variables.
Basically, I have a file with descriptions and numbers:
ref 5.25676 0.526231 6.325135
ref 1.76234 12.62341 9.1612345
etc.
I'd like to put the numbers into variables with different names, e.g.
ref_1_x=5.25676
ref_1_y=0.526231
etc.
Here's what I've got so far:
print "Loading file ...";
open (FILE, "somefile.txt");
#text=<FILE>;
close FILE;
print "Done!\n";
my $count=0;
foreach $line (#text){
#coord[$count]=split(/ +/, $line);
}
I'm trying to compare the positions written in the file to each other, so will need another loop after this.
Sorry, you weren't terribly clear on what you're trying to do and what "ref" refers to. If I misunderstood your problem please commend and clarify.
First of all, I would strongly recommend against using variable names to structure data (e.g. using $ref_1_x to store x coordinate for the first row with label "ref").
If you want to store x, y and z coordinates, you can do so as an array of 3 elements, pretty much like you did - the only difference is that you want to store an array reference (you can't store an array as a value in another array in Perl):
my ($first_column, #data) = split(/ +/, $line); # Remove first "ref" column
#coordinates[$count++] = \#data; # Store the reference to coordinate array
Then, to access the x coordinate for row 2, you do:
$coordinates[1]->[0]; # index 1 for row 2; then sub-index 0 for x coordinate.
If you insist on storing the 3 coordinates in named data structure, because sub-index 0 for x coordinate looks less readable - which is a valid concern in general but not really an issue with 3 columns - use a hash instead of array:
my ($first_column, #data) = split(/ +/, $line); # Remove first "ref" column
#coordinates[$count++] = { x => $data[0], y => $data[1], z => $data[2] };
# curly braces - {} - to store hash reference again
Then, to access the x coordinate for row 2, you do:
$coordinates[1]->{x}; # index 1 for row 2
Now, if you ALSO want to store the rows that have a first column value "ref" in a separate "ref"-labelled data structure, you can do that by wrapping the original #coordinates array into being a value in a hash with a key of "ref".
my ($label, #data) = split(/ +/, $line); # Save first "ref" label
$coordinates{$label} ||= []; # Assign an empty array ref
#if we did not create the array for a given label yet.
push #{ $coordinates{$label} }, { x => $data[0], y => $data[1], z => $data[2] };
# Since we don't want to bother counting per individual label,
# Simply push the coordinate hash at the end of appropriate array.
# Since coordinate array is stored as an array reference,
# we must dereference for push() to work using #{ MY_ARRAY_REF } syntax
Then, to access the x coordinate for row 2 for label "ref", you do:
$label = "ref";
$coordinates{$label}->[1]->{x}; # index 1 for row 2 for $label
Also, your original example code has a couple of outdated idioms that you may want to write in a better style (use 3-argument form of open(), check for errors on IO operations like open(); use of lexical filehandles; storing entire file in a big array instead of reading line by line).
Here's a slightly modified version:
use strict;
my %coordinates;
print "Loading file ...";
open (my $file, "<", "somefile.txt") || die "Can't read file somefile.txt: $!";
while (<$file>) {
chomp;
my ($label, #data) = split(/ +/); # Splitting $_ where while puts next line
$coordinates{$label} ||= []; # Assign empty array ref if not yet assigned
push #{ $coordinates{$label} }
, { x => $data[0], y => $data[1], z => $data[2] };
}
close($file);
print "Done!\n";
It is not clear what you want to compare to what, so can't advise on that without further clarifications.
The problem is you likely need a double-array (or hash or ...). Instead of this:
#coord[$count]=split(/ +/, $line);
Use:
#coord[$count++]=[split(/ +/, $line)];
Which puts the entire results of the split into a sub array. Thus,
print $coord[0][1];
should output "5.25676".