What's wrong with this alternative mechanism to make DBI queries? - perl

In the DBI documentation, this is the recommended code for executing a query many times:
$sth = $dbh->prepare_cached($statement);
$sth->execute(@bind);
$data = $sth->fetchall_arrayref(@attrs);
$sth->finish;
However, I see that many* query methods allow passing a prepared and cached statement handle in place of a query string, which makes this possible:
$sth = $dbh->prepare_cached($statement);
$data = $dbh->selectall_arrayref($sth, \%attrs, @bind);
Is there anything wrong with this approach? I haven't seen it used in the wild.
FWIW, I have benchmarked these two implementations, and the second approach appears marginally (4%) faster when querying two consecutive rows, using fetchall_arrayref in the first implementation vs selectall_arrayref in the second.
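For reference, a comparison along these lines can be set up with the core Benchmark module; the following is only a sketch of such a harness (assuming an already connected $dbh, a $statement, and @bind placeholders), not the exact code used for the numbers above:
use Benchmark qw(cmpthese);
cmpthese(-2, {
    fetchall => sub {
        my $sth = $dbh->prepare_cached($statement);
        $sth->execute(@bind);
        my $data = $sth->fetchall_arrayref;
        $sth->finish;
    },
    selectall => sub {
        my $sth  = $dbh->prepare_cached($statement);
        my $data = $dbh->selectall_arrayref($sth, undef, @bind);
    },
});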
* The full list of query methods which support this are:
selectrow_arrayref - normal method with prepared statements is fetchrow_arrayref
selectrow_hashref  - normal method with prepared statements is fetchrow_hashref
selectall_arrayref - normal method with prepared statements is fetchall_arrayref
selectall_hashref  - normal method with prepared statements is fetchall_hashref
selectcol_arrayref - doesn't really count, as it has no parallel method using the first code path described above, so the only way to use prepared statements with this method is to use the second code path

There's nothing wrong with it, as long as you were planning to do only one fetch. When you use the select*_* methods, all the data comes back in one chunk. My DBI code more often looks like this:
$sth = $dbh->prepare_cached($statement);
$sth->execute(@bind);
while (my $row = $sth->fetch) { # alias for fetchrow_arrayref
    # do something with @$row here
}
There's no equivalent to this using a select*_* method.
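The closest you can get is to pull the whole result set into memory first and then loop over it, which defeats the point of fetching row by row (a sketch, with @bind as a placeholder):
my $rows = $dbh->selectall_arrayref($sth, undef, @bind);
for my $row (@$rows) {
    # do something with @$row here
}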
If you're going to call fetchall_* (or you're only fetching 1 row), then go ahead and use a select*_* method with a statement handle.

No, there's nothing wrong with that approach. There is something wrong with your benchmark or its analysis, though.
You've claimed that
$sth->execute(@bind);
$data = $sth->fetchall_arrayref(@attrs);
$sth->finish;
is slower than a call to
sub selectall_arrayref {
    my ($dbh, $stmt, $attr, @bind) = @_;
    my $sth = (ref $stmt) ? $stmt : $dbh->prepare($stmt, $attr)
        or return;
    $sth->execute(@bind) || return;
    my $slice = $attr->{Slice}; # typically undef, else hash or array ref
    if (!$slice and $slice=$attr->{Columns}) {
        if (ref $slice eq 'ARRAY') { # map col idx to perl array idx
            $slice = [ @{$attr->{Columns}} ]; # take a copy
            for (@$slice) { $_-- }
        }
    }
    my $rows = $sth->fetchall_arrayref($slice, my $MaxRows = $attr->{MaxRows});
    $sth->finish if defined $MaxRows;
    return $rows;
}
Maybe if you got rid of the useless call to finish you'll find the first faster? Note that benchmarks with less than 5% difference are not very telling; the accuracy isn't that high.
Update: s/faster than/slower than/

The performance difference should not be between selectall_arrayref() and fetchall_arrayref() but between fetchall_arrayref() and doing a fetch() in a loop yourself. fetchall_arrayref() may be faster as it is hand optimized in C.
The docs for fetchall_arrayref discuss performance...
If $max_rows is defined and greater than or equal to zero then it is
used to limit the number of rows fetched before returning.
fetchall_arrayref() can then be called again to fetch more rows. This
is especially useful when you need the better performance of
fetchall_arrayref() but don't have enough memory to fetch and return
all the rows in one go.
Here's an example (assumes RaiseError is enabled):
my $rows = []; # cache for batches of rows
while( my $row = ( shift(@$rows) || # get row from cache, or reload cache:
                   shift(@{$rows=$sth->fetchall_arrayref(undef,10_000)||[]}) )
     ) {
    ...
}
That might be the fastest way to fetch and process lots of rows using
the DBI, but it depends on the relative cost of method calls vs memory
allocation.
A standard "while" loop with column binding is often faster because the
cost of allocating memory for the batch of rows is greater than the
saving by reducing method calls. It's possible that the DBI may provide
a way to reuse the memory of a previous batch in future, which would
then shift the balance back towards fetchall_arrayref().
So that's a definitive "maybe". :-)
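For reference, the "standard while loop with column binding" mentioned above looks roughly like this (the column names $foo and $bar are placeholders):
$sth->execute(@bind);
$sth->bind_columns(\my ($foo, $bar));
while ($sth->fetch) {
    # $foo and $bar are updated in place on each fetch
    print "$foo\t$bar\n";
}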

I don't think there's really any advantage to using one over the other, other than that the first uses three lines and the second uses one (less possibility for bugs with the second method). The first might be more commonly used because the documentation states that the "typical method call sequence for a SELECT statement is prepare, execute, fetch, fetch, ... execute, fetch, fetch, ..." and gives this example:
$sth = $dbh->prepare("SELECT foo, bar FROM table WHERE baz=?");
$sth->execute( $baz );
while ( @row = $sth->fetchrow_array ) {
    print "@row\n";
}
Now, I'm not suggesting that programmers actually read the documentation (heaven forbid!) but given its prominence near the top of the documentation in a section designed to show you how to use the module, I would suspect that the more-verbose method is slightly more preferred by the module's author. As to why, your guess is as good as mine.

Related

Single Responsibility Principle: Write data to file after running a query

I have to write the rows generated by running a SQL query to a file.
# Run the SQL script.
my $dbh = get_dbh($source);
my $qry = $dbh->prepare("$sql_data");
$qry->execute();
# Dump the data to file.
open(my $fh_write, ">", "$filename");
while (my @data = $qry->fetchrow_array())
{
    print {$fh_write} join("\t", @data) . "\n";
}
close($fh_write);
Clearly I am doing two things in one function:
Running the SQL query.
Writing the data to a file.
Is there a way to do this using SRP?
There are lots of rows in the data, so returning the array of rows from a separate function might not be a nice idea.
You could split it up into two different functions. One would query the database, and the other would write data to a file.
sub run_query {
    my ( $sql, @args ) = @_;
    # if you truly want separation of concerns,
    # you need to connect $dbh somewhere else
    my $sth = $dbh->prepare($sql);
    $sth->execute(@args);
    # this creates an iterator
    return sub {
        return $sth->fetchrow_arrayref;
    };
}
This function takes an SQL query and some arguments (remember to use placeholders!) and runs the query. It returns a code reference that closes over $sth. Every time that reference is invoked, one row of results will be fetched. When the statement handle $sth is exhausted, it will return undef, which is handed through, and you're done iterating. That might seem overkill, but stay with me for a moment.
Next, we make a function to write data to a file.
sub write_to_file {
    my ( $filename, $iter ) = @_;
    open my $fh, '>', $filename or die $!;
    while ( my $data = $iter->() ) {
        print $fh join( "\t", @{$data} ), "\n";
    }
    return;
}
This takes a filename and an iterator, which is a code reference. It opens the file, and then iterates until there is no more data left. Every line is written to the file. We don't need close $fh because it's a lexical filehandle that will be closed implicitly once $fh goes out of scope at the end of the function anyway.
What you've done now is define an interface. Your write_to_file function's interface is that it takes a file name and an iterator that always returns an array reference of fields.
Let's put this together.
my $iter = run_query('SELECT * FROM orders');
write_to_file( 'orders.csv', $iter );
Two lines of code. One runs the query, the other one writes the data. Looks pretty separated to me.
The good thing about this approach is that now you can also write other things to a file with the same code. The following code could for example talk to some API. The iterator that it returns again gives us one row of results per invocation.
sub api_query {
    my ($customer_id) = @_;
    my $api = API::Client->new;
    my $res = $api->get_orders($customer_id); # returns [ {}, {}, {}, ... ]
    my $i = 0;
    return sub {
        return if $i > $#{ $res };   # stop once every element has been returned
        return $res->[$i++];
    };
}
You could drop this into the above example instead of run_query() and it would work, because this function returns something that adheres to the same interface. You could just as well make a write_to_api or write_to_slack_bot function that has the same partial interface. One of the parameters would be the same kind of iterator. Now those are exchangeable too.
Of course this whole example is very contrived. In reality it highly depends on the size and complexity of your program.
If it's a script that runs as a cronjob that does nothing but create this report once a day, you should not care about this separation of concerns. The pragmatic approach would likely be the better choice.
Once you have a lot of those, you'd start caring a bit more. Then my above approach might be viable. But only if you really need to have things flexible.
Not every concept is always applicable, and not every concept always makes sense.
Please keep in mind that there are tools that are better suited for those jobs. Instead of making your own CSV file you can use Text::CSV_XS. Or you could use an ORM like DBIx::Class and have ResultSet objects as your interface.
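For example, the tab-separated output could be written with Text::CSV_XS along these lines (a sketch; $filename and an already executed statement handle $sth are placeholders):
use Text::CSV_XS;
my $csv = Text::CSV_XS->new({ binary => 1, sep_char => "\t", eol => "\n" })
    or die Text::CSV_XS->error_diag;
open my $fh, '>', $filename or die $!;
while ( my $row = $sth->fetchrow_arrayref ) {
    $csv->print($fh, $row); # handles quoting and escaping for you
}
close $fh;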
You could use a separate function for each job, but in your situation doing things the way you are doing them now makes much more sense than sticking to SRP.

Reducing code verbosity and efficiency

I came across the code below, where some heavy stipulations were done; at the end we have a number of @hits and we need to return just one:
if ($#hits > 0)
{
    my $highestScore = 0;
    my $chosenMatch = "";
    for $hit (@hits)
    {
        my $currScore = 0;
        foreach $k (keys %{$hit})
        {
            next if $k eq $retColumn;
            $currScore++ if ($hit->{$k} =~ /\S+/);
        }
        if ($currScore > $highestScore)
        {
            $chosenMatch = $hit;
            $highestScore = $currScore;
        }
    }
    return ($chosenMatch);
}
elsif ($#hits == 0)
{
    return ($hits[0]);
}
That's an eye full and I was hoping to simplify the above code, I came up with:
return reduce {grep /\S+/, values %{$a} > grep /\S+/, values %{$b} ? $a : $b} @matches;
After use-ing List::Util, of course.
I wonder if the terse version is any more efficient and/or has any advantage over the original one. Also, there's one condition that's skipped: next if $k eq $retColumn. How can I efficiently get that in?
There is a famous quote:
"Premature optimisation is the root of all evil" - Donald Knuth
It is almost invariably the case that making code more concise really doesn't make much difference to the efficiency, and can cause significant penalties to readability and maintainability.
Algorithm is important, code layout ... isn't really. Things like reduce, map and grep are still looping - they're just doing so behind the scenes. You've gained almost no efficiency by using them, you've just saved some bytes in your file. That's fine if they make your code more clear, but that should be your foremost consideration.
Please - keep things clear first, foremost and always. Make your algorithm good. Don't worry about replacing an explicit loop with a grep or map unless these things make your code clearer.
And in the interests of being constructive:
use strict and warnings is really important. Really really important.
To answer your original question:
I wonder if the terse version is any more efficient and/or has any advantage over the original one
No, I think if anything the opposite. Short of profiling code speed, the rule of thumb is look at number and size of loops - a single chunk of code rarely makes much difference, but running it lots and lots of times (unnecessarily) is where you get your inefficiency.
In your first example - you have two loops, a foreach loop inside a for loop. It looks like you traverse your @hits data structure once, and 'unwrap' it to get at the inner layers.
In your second example, both your greps are loops, and your reduce is as well. If I'm reading it correctly, then it'll be traversing your data structure multiple times. (Because you are grepping the values of $a and $b, those greps will be applied several times.)
So I don't think you have gained either readability or efficiency by doing what you've done. But you have made a function that's going to make future maintenance programmers have to think really hard. To take another quote:
"Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it?" - Brian Kernighan
I wonder if the terse version is any more efficient and/or has any advantage over the original one
The terse version is less efficient than the original because it calculates the score of every element twice, but it does have readability advantages.
The following keeps the readability gain (and even adds some):
sub get_score {
    my ($match) = @_;
    my @keys  = grep { $_ ne $retColumn } keys %$match;
    my $score = grep { /\S/ } @{$match}{ @keys };
    return $score;
}
return reduce { get_score($a) > get_score($b) ? $a : $b } @matches;
You can look at any part of that sub and understand it without looking around. The least context you need to understand code, the more readable it is.
If you did need an efficiency boost, you can avoid calling get_score on every input twice by using a Schwartzian Transform. As with many optimizations, you will take a readability hit, but at least it's idiomatic (well known and thus well recognizable).
return
    map    { $_->[0] }
    reduce { $a->[1] > $b->[1] ? $a : $b }
    map    { [ $_, get_score($_) ] }
    @matches;

Perl - Data comparison taking huge time

open(INFILE1,"INPUT.txt");
my $modfile = 'Data.txt';
open MODIFIED,'>',$modfile or die "Could not open $modfile : $!";
for (;;) {
    my $line1 = <INFILE1>;
    last if not defined $line1;
    my $line2 = <INFILE1>;
    last if not defined $line2;
    my ($tablename1, $colname1,$sql1) = split(/\t/, $line1);
    my ($tablename2, $colname2,$sql2) = split(/\t/, $line2);
    if ($tablename1 eq $tablename2)
    {
        my $sth1 = $dbh->prepare($sql1);
        $sth1->execute;
        my $hash_ref1 = $sth1->fetchall_hashref('KEY');
        my $sth2 = $dbh->prepare($sql2);
        $sth2->execute;
        my $hash_ref2 = $sth2->fetchall_hashref('KEY');
        my @fieldname = split(/,/, $colname1);
        my $colcnt=0;
        my $rowcnt=0;
        foreach $key1 ( keys(%{$hash_ref1}) )
        {
            foreach (@fieldname)
            {
                $colname =$_;
                my $strvalue1='';
                @val1 = $hash_ref1->{$key1}->{$colname};
                if (defined @val1)
                {
                    my @filtered = grep /@val1/, @metadata;
                    my $strvalue1 = substr(@filtered[0],index(@filtered[0],'||') + 2);
                }
                my $strvalue2='';
                @val2 = $hash_ref2->{$key1}->{$colname};
                if (defined @val2)
                {
                    my @filtered = grep /@val2/, @metadata2;
                    my $strvalue2 = substr(@filtered[0],index(@filtered[0],'||') + 2);
                }
                if ($strvalue1 ne $strvalue2 )
                {
                    $colcnt = $colcnt + 1;
                    print MODIFIED "$tablename1\t$colname\t$strvalue1\t$strvalue2\n";
                }
            }
        }
        if ($colcnt>0)
        {
            print "modified count is $colcnt\n";
        }
        %$hash_ref1 = ();
        %$hash_ref2 = ();
    }
}
The program reads an input file in which every line contains three strings separated by tabs. The first is the table name, the second is all the column names with commas in between, and the third contains the SQL to be run. As this utility compares data, there are two rows for every table name, one for each DB. So data needs to be picked from each respective DB and then compared column by column.
The SQL returns an ID in the result set, and if the value comes from the DB it needs to be translated to a string by reading from an array (that array contains 100K records, with key and value separated by ||).
I ran this for one set of tables which contains 18K records in each DB. There are 8 columns picked from the DB in each SQL statement. So for every record out of 18K, and then for every one of the 8 fields in that record, this script is taking a lot of time.
My question is whether someone can look and see if it can be improved so that it takes less time.
File contents sample
INPUT.TXT
TABLENAME COL1,COL2 select COL1,COL2 from TABLENAME where ......
TABLENAMEB COL1,COL2 select COL1,COL2 from TABLENAMEB where ......
The metadata array contains something like this (there are two, i.e. one for each DB):
111||Code 1
222||Code 2
Please suggest
Your code does look a bit unusual, and could gain clarity from using subroutines vs. just using loops and conditionals. Here are a few other suggestions.
The excerpt
for (;;) {
my $line1 = <INFILE1>;
last if not defined $line1;
my $line2 = <INFILE1>;
last if not defined $line2;
...;
}
is overly complicated: Not everyone knows the C-ish for(;;) idiom. You have lots of code duplication. And aren't you actually saying loop while I can read two lines?
while (defined(my $line1 = <INFILE1>) and defined(my $line2 = <INFILE1>)) {
...;
}
Yes, that line is longer, but I think it's a bit more self-documenting.
Instead of doing
if ($tablename1 eq $tablename2) { the rest of the loop }
you could say
next if $tablename1 ne $tablename2;
the rest of the loop;
and save a level of indentation. Better indentation equals better readability, which makes it easier to write good code. And better code might perform better.
What are you doing at foreach $key1 (keys ...) — something tells me you didn't use strict! (Just a hint: lexical variables with my can perform slightly better than global variables)
Also, doing $colname = $_ inside a for-loop is a dumb thing, for the same reason.
for my $key1 (keys ...) {
    ...;
    for my $colname (@fieldname) { ... }
}
my $strvalue1='';
@val1 = $hash_ref1->{$key1}->{$colname};
if (defined @val1)
{
    my @filtered = grep /@val1/, @metadata;
    my $strvalue1 = substr(@filtered[0],index(@filtered[0],'||') + 2);
}
I don't think this does what you think it does.
From the $hash_ref1 you retrieve a single element, then assign that element to an array (a collection of multiple values).
Then you call defined on this array. An array cannot be undefined, and what you are doing is deprecated. Calling the defined function on a collection returns info about memory management, but does not indicate ① whether the array is empty or ② whether the first element in that array is defined.
Interpolating an array into a regex isn't likely to be useful: The elements of the array are joined with the value of $", usually a whitespace, and the resulting string treated as a regex. This will wreak havoc if there are metacharacters present.
When you only need the first value of a list, you can force list context, but assign to a single scalar like
my ($filtered) = produce_a_list;
This frees you from weird subscripts you don't need and that only slow you down.
Then you assign to a $strvalue1 variable you just declared. This shadows the outer $strvalue1. They are not the same variable. So after the if branch, you still have the empty string in $strvalue1.
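A minimal illustration of that shadowing (not taken from the original code):
my $strvalue1 = '';
if (1) {
    my $strvalue1 = 'something'; # a new lexical, visible only inside this block
}
print "[$strvalue1]\n";          # prints [] -- the outer variable was never changed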
I would write this code like
my $val1 = $hash_ref1->{$key1}{$colname};
my $strvalue1 = defined $val1
    ? do {
        my ($filtered) = grep /\Q$val1/, @metadata;
        substr $filtered, 2 + index $filtered, '||'
    } : '';
But this would be even cheaper if you pre-split @metadata into pairs and test for equality with the correct field. This would remove some of the bugs that are still lurking in that code.
$x = $x + 1 is commonly written $x++.
Emptying the hashrefs at the end of the iteration is unnecessary: the hashrefs are assigned a new value at the next iteration of the loop. Also, it is unnecessary to assist Perl's garbage collection for such simple tasks.
About the metadata: 100K records is a lot, so either put it in a database itself, or at the very least a hash. Especially for so many records, using a hash is a lot faster than looping through all entries and using slow regexes … aargh!
Create the hash from the file, once at the beginning of the program
my %metadata;
while (<METADATA>) {
    chomp;
    my ($key, $value) = split /\|\|/;
    $metadata{$key} = $value; # assumes each key only has one value
}
Simply look up the key inside the loop
my $strvalue1 = defined $val1 ? $metadata{$val1} // '' : ''
That should be so much faster.
(Oh, and please consider using better names for variables. $strvalue1 doesn't tell me anything, except that it is a stringy value (d'oh). $val1 is even worse.)
This is not really an answer but it won't really fit well in a comment either so, until you provide some more information, here are some observations.
Inside your inner for loop, there is:
@val1 = $hash_ref1->{$key1}->{$colname};
Did you mean @val1 = @{ $hash_ref1->{$key1}->{$colname} };?
Later, you check if (defined @val1)? What did you really want to check? As perldoc -f defined points out:
Use of "defined" on aggregates (hashes and arrays) is
deprecated. It used to report whether memory for that aggregate
had ever been allocated. This behavior may disappear in future
versions of Perl. You should instead use a simple test for size:
In your case, if (defined @val1) will always be true.
Then, you have my @filtered = grep /@val1/, @metadata; Where did @metadata come from? What did you actually intend to check?
Then you have my $strvalue1 = substr(@filtered[0],index(@filtered[0],'||') + 2);
There is some interesting stuff going on in there.
You will need to verbalize what you are actually trying to do.
I strongly suspect there is a single SQL query you can run that will give you what you want but we first need to know what you want.

How can I prevent perl from reading past the end of a tied array that shrinks when accessed?

Is there any way to force Perl to call FETCHSIZE on a tied array before each call to FETCH? My tied array knows its maximum size, but could shrink from this size depending on the results of earlier FETCH calls. Here is a contrived example that filters a list down to only the even elements, with lazy evaluation:
use warnings;
use strict;
package VarSize;
sub TIEARRAY { bless $_[1] => $_[0] }
sub FETCH {
    my ($self, $index) = @_;
    splice @$self, $index, 1 while $$self[$index] % 2;
    $$self[$index]
}
sub FETCHSIZE {scalar @{$_[0]}}
my @source = 1 .. 10;
tie my @output => 'VarSize', [@source];
print "@output\n"; # array changes size as it is read, perl only checks size
                   # at the start, so it runs off the end with warnings
print "@output\n"; # knows correct size from start, no warnings
For brevity I have omitted a bunch of error-checking code (such as how to deal with accesses starting from an index other than 0).
EDIT: rather than the above two print statements, if ONE of the following two lines is used, the first will work fine, the second will throw warnings.
print "$_ " for #output; # for loop "iterator context" is fine,
# checks FETCHSIZE before each FETCH, ends properly
print join " " => #output; # however a list context expansion
# calls FETCHSIZE at the start, and runs off the end
Update:
The actual module that implements a variable sized tied array is called List::Gen which is up on CPAN. The function is filter which behaves like grep, but works with List::Gen's lazy generators. Does anyone have any ideas that could make the implementation of filter better?
(the test function is similar, but returns undef in failed slots, keeping the array size constant, but that of course has different usage semantics than grep)
sub FETCH {
    my ($self, $index) = @_;
    my $size = $self->FETCHSIZE;
    ...
}
Ta da!
I suspect what you're missing is that they're just methods. Methods called by tie magic, but still just methods you can call yourself.
Listing out the contents of a tied array basically boils down to this:
my @array;
my $tied_obj = tied @array;
for my $idx (0..$tied_obj->FETCHSIZE-1) {
    push @array, $tied_obj->FETCH($idx);
}
return @array;
So you don't get any opportunity to control the number of iterations. Nor can FETCH reliably tell if it's being called from @array or $array[$idx] or @array[@idxs]. This sucks. Ties kinda suck, and they're really slow: about 3 times slower than a normal method call and 10 times slower than a regular array.
Your example already breaks expectations about arrays (10 elements go in, 5 elements come out). What happens when a user asks for $array[3]? Do they get undef? Alternatives include just using the object API; if your thing doesn't behave exactly like an array, pretending it does will only add confusion. Or you can use an object with array dereferencing overloaded.
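For reference, a minimal sketch of that last idea using the overload pragma's '@{}' hook; the EvenFilter class and its behaviour are made up for illustration:
package EvenFilter;
use strict;
use warnings;
use overload '@{}' => sub {
    my ($self) = @_;
    # build the filtered view on each dereference
    return [ grep { $_ % 2 == 0 } @{ $self->{source} } ];
};

sub new {
    my ($class, @source) = @_;
    return bless { source => \@source }, $class;
}

package main;
my $obj = EvenFilter->new(1 .. 10);
print "@$obj\n"; # prints: 2 4 6 8 10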
So, what you're doing can be done, but it's difficult to get it to work well. What are you really trying to accomplish?
I think the order in which Perl calls the FETCH/FETCHSIZE methods can't be changed; it's internal to Perl.
Why not just explicitly remove warnings:
sub FETCH {
    my ($self, $index) = @_;
    splice @$self, $index, 1 while ($$self[$index] || 0) % 2;
    exists $$self[$index] ? $$self[$index] : '' ## replace '' with default value
}

Is returning a whole array from a Perl subroutine inefficient?

I often have a subroutine in Perl that fills an array with some information. Since I'm also used to hacking in C++, I often find myself doing it like this in Perl, using references:
my @array;
getInfo(\@array);

sub getInfo {
    my ($arrayRef) = @_;
    push @$arrayRef, "obama";
    # ...
}
instead of the more straightforward version:
my @array = getInfo();

sub getInfo {
    my @array;
    push @array, "obama";
    # ...
    return @array;
}
The reason, of course, is that I don't want the array to be created locally in the subroutine and then copied on return.
Is that right? Or does Perl optimize that away anyway?
What about returning an array reference in the first place?
sub getInfo {
    my $array_ref = [];
    push @$array_ref, 'foo';
    # ...
    return $array_ref;
}

my $a_ref = getInfo();
# or if you want the array expanded
my @array = @{getInfo()};
Edit according to dehmann's comment:
It's also possible to use a normal array in the function and return a reference to it.
sub getInfo {
    my @array;
    push @array, 'foo';
    # ...
    return \@array;
}
Passing references is more efficient, but the difference is not as big as in C++. The argument values themselves (that means: the values in the array) are always passed by reference anyway (returned values are copied though).
Question is: does it matter? Most of the time, it doesn't. If you're returning 5 elements, don't bother about it. If you're returning/passing 100'000 elements, use references. Only optimize it if it's a bottleneck.
If I look at your example and think about what you want to do, I would usually write it in this manner:
sub getInfo {
    my @array;
    push @array, 'obama';
    # ...
    return \@array;
}
It seems to me like the straightforward version when I need to return a large amount of data. There is no need to allocate the array outside the sub as you did in your first code snippet, because my does it for you. Anyway, you should not do premature optimization, as Leon Timmermans suggests.
To answer the final rumination, no, Perl does not optimize this away. It can't, really, because returning an array and returning a scalar are fundamentally different.
If you're dealing with large amounts of data or if performance is a major concern, then your C habits will serve you well - pass and return references to data structures rather than the structures themselves so that they won't need to be copied. But, as Leon Timmermans pointed out, the vast majority of the time, you're dealing with smaller amounts of data and performance isn't that big a deal, so do it in whatever way seems most readable.
This is the way I would normally return an array.
sub getInfo {
    my @array;
    push @array, 'foo';
    # ...
    return @array if wantarray;
    return \@array;
}
This way it will work the way you want, in either scalar or list context.
my $array = getInfo;
my #array = getInfo;
$array->[0] == $array[0];
# same length
@$array == @array;
I wouldn't try to optimize it unless you know it is a slow part of your code. Even then I would use benchmarks to see which subroutine is actually faster.
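A rough Benchmark sketch of that kind of comparison (the sub bodies and the 10_000-element size are made up for illustration):
use Benchmark qw(cmpthese);

sub by_value { my @a = (1) x 10_000; return @a  }
sub by_ref   { my @a = (1) x 10_000; return \@a }

cmpthese(-1, {
    copy => sub { my @got = by_value() },
    ref  => sub { my $got = by_ref()   },
});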
There's two considerations. The obvious one is how big is your array going to get? If it's less than a few dozen elements, then size is not a factor (unless you're micro-optimizing for some rapidly called function, but you'd have to do some memory profiling to prove that first).
That's the easy part. The oft overlooked second consideration is the interface. How is the returned array going to be used? This is important because whole array dereferencing is kinda awful in Perl. For example:
for my $info (@{ getInfo($some, $args) }) {
    ...
}
That's ugly. This is much better.
for my $info ( getInfo($some, $args) ) {
    ...
}
It also lends itself to mapping and grepping.
my @info = grep { ... } getInfo($some, $args);
But returning an array ref can be handy if you're going to pick out individual elements:
my $address = getInfo($some, $args)->[2];
That's simpler than:
my $address = (getInfo($some, $args))[2];
Or:
my @info = getInfo($some, $args);
my $address = $info[2];
But at that point, you should question whether @info is truly a list or a hash.
my $address = getInfo($some, $args)->{address};
What you should not do is have getInfo() return an array ref in scalar context and an array in list context. This muddles the traditional use of scalar context as array length which will surprise the user.
Finally, I will plug my own module, Method::Signatures, because it offers a compromise for passing in array references without having to use the array ref syntax.
use Method::Signatures;

method foo(\@args) {
    print "@args";  # @args is not a copy
    push @args, 42; # this alters the caller array
}

my @nums = (1,2,3);
Class->foo(\@nums); # prints 1 2 3
print "@nums";      # prints 1 2 3 42
This is done through the magic of Data::Alias.
Three other potentially LARGE performance improvements if you are reading an entire, largish file and slicing it into an array (a rough sketch follows the list):
1. Turn off buffering by using sysread() instead of read() (the manual warns about mixing them).
2. Pre-extend the array by assigning to its last element, which saves memory allocations.
3. Use unpack() to swiftly split data such as uint16_t graphics channel data.
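A sketch of those three ideas together; the file layout (raw little-endian uint16 samples) and the variable names are assumptions for illustration, not from the original answer:
use strict;
use warnings;

my $path = 'channel.raw';              # hypothetical input file
open my $fh, '<:raw', $path or die "open $path: $!";

my $size = -s $fh;
sysread($fh, my $buf, $size) == $size  # unbuffered read of the whole file
    or die "short read on $path";

my @samples;
$#samples = $size / 2 - 1;             # pre-extend: one slot per uint16
@samples  = unpack 'v*', $buf;         # 'v' = little-endian uint16

print scalar(@samples), " samples read\n";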
Passing an array ref to the function allows the main program to deal with a simple array while the write-once-and-forget worker function uses the more complicated "$#" and arrow ->[$II] access forms. Being quite C'ish, it is likely to be fast!
I know nothing about Perl so this is a language-neutral answer.
It is, in a sense, inefficient to copy an array from a subroutine into the calling program. The inefficiency arises in the extra memory used and the time taken to copy the data from one place to another. On the other hand, for all but the largest arrays, you might not give a damn, and might prefer to copy arrays out for elegance, cussedness or any other reason.
The efficient solution is for the subroutine to pass the calling program the address of the array. As I say, I haven't a clue about Perl's default behaviour in this respect. But some languages provide the programmer the option to choose which approach.