I have taken a principles of programming class and have been given a Perl expression that concatenates a string number to an int number and then adds another number to it and it evaluates fine. i.e. ("4" . 3) + 7 == 50.
I'm trying to understand why Perl does this and what concerns it may bring up. I'm having a hard time grasping many of the concepts of the class and am trying to get explanations from different sources apart from my horrible teacher and equally horrible notes.
Can the concept behind this kind of expression be explained to me as well as concerns they might bring up? Thanks in advance for the help.
Edit: For Clarity
Perl is built around the central concept of 'do what I mean'.
A scalar is a multi purpose variable type, and is intended to implicitly cast values to a data type that's appropriate to what you're doing.
The reason this works is because perl is context sensitive - it knows the difference between different expected return values.
At a basic level, you can see this with the wantarray function. (Which as noted below - is probably badly named, because we're talking about a LIST context)
sub context_test {
if ( not defined wantarray() ) {
print "Void context\n";
}
if ( wantarray() ) {
return ( "List", "Context" );
}
else {
return "scalar context";
}
}
context_test();
my $scalar = context_test();
my #list = context_test();
print "Scalar context gave me $scalar\n";
print "List context gave me #list\n";
This principle occurs throughout perl. If you want, you can use something like Contextual::Return to extend this further - testing the difference between numeric, string and boolean subsets of scalar contexts.
The reason I mention this is because a scalar is a special sort of data type - if you look at Scalar::Util you will see a capability of creating a dualvar - a scalar that has different values in different contexts.
my $dualvar = Scalar::Util::dualvar ( 666, "the beast" );
print "Numeric:",$dualvar + 0,"\n";
print "String:",$dualvar . '',"\n";
Now, messing around with dualvars is a good way to create some really annoying and hard to trace bugs, but the point is - a scalar is a magic datatype, and perl is always aware of what you're doing with the result.
If you perform a string operation, perl treats it as a string. If you perform a numeric operation, perl tries to treat it as a number.
my $value = '4'; #string;
my $newvalue = $value . 3; #because we concat, perl treats _both_ as strings.
print $newvalue,"\n";
my $sum = $newvalue + 7; #perl turns strings back to numbers, because we're adding;
print $sum,"\n";
if ( Scalar::Util::isdual ( $newvalue ) ) { print "newvalue Is a dual var\n" };
if ( not Scalar::Util::isdual ( $sum ) ) { print "sum is NOT a dual var\n"; };
Mostly 'context' is something that happens behind the scenes in perl, and you don't have to worry about it. If you've come from a programming background, the idea of implicit casting between int and string may seem a little bit dirty. But it mostly works fine.
You may occasionally get errors like:
Argument "4a3" isn't numeric in addition (+)
One of the downsides of this approach is these are runtime errors, because you're not doing strong type checking at 'compile' time.
So in terms of specific concerns:
You're runtime type checking, not compile time. If you have strict types, you can detect an attempt to add a string to an int before you start to run anything.
You're not always operating in the context that you assume you are, which can lead to some unpredictable behaviour. One of the best examples is that print operates in a list context - so to take the example above:
print context_test();
You'll get List Context.
If you monkey around with context sensitive return types, you can create some really annoying bugs that are immensely irritating to back trace and troubleshoot.
Related
Frequently I need to convert data from one type to another and then compare them. Some operators will convert to specific types first and this conversion may cause loss of efficiency. For instance, I may have
my $a, $b = 0, "foo"; # initial value
$a = (3,4,5).Set; # re-assign value
$b = open "dataFile"; # re-assign value
if $a eq $b { say "okay"; } # convert to string
if $a == 5 { say "yes"; } # convert to number
if $a == $b {} # error, Cannot resolve caller Numeric(IO::Handle:D: );
The operators "eq" and "==" will convert data to the digestible types first and may slow things down. Will the operators "eqv" and "===" skip converting data types and be more efficient if data to be compared cannot be known in advance (i.e., you absolutely have no clue what you are going to get in advance)?
It's not quite clear to me from the question if you actually want the conversions to happen or not.
Operators like == and eq are really calls to multi subs with names like infix:<==>, and there are many candidates. For example, there's one for (Int, Int), which is selected if we're comparing two Ints. In that case, it knows that it doesn't need to coerce, and will just do the integer comparison.
The eqv and === operators will not coerce; the first thing they do is to check that the values have the same type, and if they don't, they go no further. Make sure to use the correct one depending of if you want snapshot semantics (eqv) or reference semantics (===). Note that the types really must be the exact same, so 1e0 === 1 will not come out true because the one value is a Num and the other an Int.
The auto-coercion behavior of operators like == and eq can be really handy, but from a performance angle it can also be a trap. They coerce, use the result of the coercion for the comparison, and then throw it away. Repeatedly doing comparisons can thus repeatedly trigger coercions. If you have that situation, it makes sense to split the work into two phases: first "parse" the incoming data into the appropriate data types, and then go ahead and do the comparisons.
Finally, in any discussion on efficiency, it's worth noting that the runtime optimizer is good at lifting out duplicate type checks. Thus while in principle, if you read the built-ins source, == might seem cheaper in the case it gets two things have the same type because it isn't enforcing they are precisely the same type, in reality that extra check will get optimized out for === anyway.
Both === and eqv first check whether the operands are of the same type, and will return False if they are not. So at that stage, there is no real difference between them.
The a === b operator is really short for a.WHICH eq b.WHICH. So it would call the .WHICH method on the operands, which could be expensive if an operand is something like a really large Buf.
The a eqv b operator is more complicated in that it has special cased many object comparisons, so in general you cannot say much about it.
In other words: YMMV. And if you're really interested in performance, benchmark! And be prepared to adapt your code if another way of solving the problem proves more performant.
Inspired a little by: https://stackoverflow.com/questions/30977789/why-is-c-not-a-functional-programming-language
I found: Higher Order Perl
It made me wonder about the assertion that Perl is a functional programming language. Now, I appreciate that functional programming is a technique (much like object oriented).
However I've found a list of what makes a functional programming language:
First Class functions
Higher Order Functions
Lexical Closures
Pattern Matching
Single Assignment
Lazy Evaluation
Garbage Collection
Type Inference
Tail Call Optimization
List Comprehensions
Monadic effects
Now some of these I'm quite familiar with:
Garbage collection, for example, is Perl reference counting and releasing memory when no longer required.
Lexical closures are even part of the FAQ: What is a closure? - there's probably a better article here: http://www.perl.com/pub/2002/05/29/closure.html
But I start to get a bit fuzzy on some of these - List Comprehensions, for example - I think that's referring to map/grep (List::Util and reduce?)
I anyone able to help me fill in the blanks here? Which of the above can Perl do easily (and is there an easy example) and are there examples where it falls down?
Useful things that are relevant:
Perl monks rant about functional programming
Higher Order Perl
C2.com functional programming definitions
First Class functions
In computer science, a programming language is said to have first-class functions if it treats functions as first-class citizens. Specifically, this means the language supports passing functions as arguments to other functions, returning them as the values from other functions, and assigning them to variables or storing them in data structures.
So in Perl:
my $print_something = sub { print "Something\n" };
sub do_something {
my ($function) = #_;
$function->();
}
do_something($print_something);
Verdict: Natively supported
Higher Order Functions
In mathematics and computer science, a higher-order function (also functional form, functional or functor) is a function that does at least one of the following:
takes one or more functions as an input
outputs a function
With reference to this post on perlmonks:
In Perl terminology, we often refer to them as callbacks, factories, and functions that return code refs (usually closures).
Verdict: Natively supported
Lexical Closures
Within the perl FAQ we have questions regarding What is a closure?:
Closure is a computer science term with a precise but hard-to-explain meaning. Usually, closures are implemented in Perl as anonymous subroutines with lasting references to lexical variables outside their own scopes. These lexicals magically refer to the variables that were around when the subroutine was defined (deep binding).
Closures are most often used in programming languages where you can have the return value of a function be itself a function, as you can in Perl.
This is explained perhaps a little more clearly in the article: Achieving Closure
sub make_hello_printer {
my $message = "Hello, world!";
return sub { print $message; }
}
my $print_hello = make_hello_printer();
$print_hello->()
Verdict: Natively supported
Pattern Matching
In the context of pure functional languages and of this page, Pattern Matching is a dispatch mechanism: choosing which variant of a function is the correct one to call. Inspired by standard mathematical notations.
Dispatch tables are the closest approximation - essentially a hash of either anonymous subs or code refs.
use strict;
use warnings;
sub do_it {
print join( ":", #_ );
}
my $dispatch = {
'onething' => sub { print #_; },
'another_thing' => \&do_it,
};
$dispatch->{'onething'}->("fish");
Because it's just a hash, you can add code references and anonymous subroutines too. (Note - not entirely dissimilar to Object Oriented programming)
Verdict: Workaround
Single Assignment
Any assignment that changes an existing value (e.g. x := x + 1) is disallowed in purely functional languages.4 In functional programming, assignment is discouraged in favor of single assignment, also called initialization. Single assignment is an example of name binding and differs from assignment as described in this article in that it can only be done once, usually when the variable is created; no subsequent reassignment is allowed.
I'm not sure perl really does this. The closest approximation might be references/anonymous subs or perhaps constant.
Verdict: Not Supported
Lazy Evaluation
Waiting until the last possible moment to evaluate an expression, especially for the purpose of optimizing an algorithm that may not use the value of the expression.
Examples of lazy evaluation techniques in Perl 5?
And again, coming back to Higher Order Perl (I'm not affiliated with this book, honest - it just seems to be one of the key texts on the subject).
The core concept here seems to be - create a 'linked list' in perl (using object oriented techniques) but embed a code reference at your 'end marker' that evaluates if you ever get that far.
Verdict: Workaround
Garbage Collection
"GarbageCollection (GC), also known as automatic memory management, is the automatic recycling of heap memory."
Perl does this via reference counting, and releasing things when they are no longer referenced. Note that this can have implications for certain things that you're (probably!) more likely to encounter when functional programming.
Specifically - circular references which are covered in perldoc perlref
Verdict: Native support
Type Inference
TypeInference is the analysis of a program to infer the types of some or all expressions, usually at CompileTime
Perl does implicitly cast values back and forth as it needs to. Usually this works well enough that you don't need to mess with it. Occasionally you need to 'force' the process, by making an explicit numeric or string operation. Canonically, this is either by adding 0, or concatenating an empty string.
You can overload a scalar to do different things in by using dualvars
Verdict: Native support
Tail Call Optimization
Tail-call optimization (or tail-call merging or tail-call elimination) is a generalization of TailRecursion: If the last thing a routine does before it returns is call another routine, rather than doing a jump-and-add-stack-frame immediately followed by a pop-stack-frame-and-return-to-caller, it should be safe to simply jump to the start of the second routine, letting it re-use the first routine's stack frame (environment).
Why is Perl so afraid of "deep recursion"?
It'll work, but it'll warn if your recursion depth is >100. You can disable this by adding:
no warnings 'recursion';
But obviously - you need to be slightly cautious about recursion depth and memory footprint.
As far as I can tell, there isn't any particular optimisation and if you want to do something like this in an efficient fashion, you may need to (effectively) unroll your recursives and iterate instead.
Tailcalls are supported by perl. Either see the goto ⊂ notation, or see the neater syntax for it provided by Sub::Call::Tail
Verdict: Native
List Comprehensions
List comprehensions are a feature of many modern FunctionalProgrammingLanguages. Subject to certain rules, they provide a succinct notation for GeneratingElements? in a list.
A list comprehension is SyntacticSugar for a combination of applications of the functions concat, map and filter
Perl has map, grep, reduce.
It also copes with expansion of ranges and repetitions:
my #letters = ( "a" .. "z" );
So you can:
my %letters = map { $_ => 1 } ( "A" .. "z" );
Verdict: Native (List::Utils is a core module)
Monadic effects
... nope, still having trouble with these. It's either much simpler or much more complex than I can grok.
If anyone's got anything more, please chip in or edit this post or ... something. I'm still a sketchy on some of the concepts involved, so this post is more a starting point.
Really nice topic, I wanted to write an article titled something link "the camel is functional". Let me contribute with some code.
Perl also support this anonymous functions like
sub check_config {
my ( $class, $obj ) = #_;
my $separator = ' > ';
# Build message from class namespace.
my $message = join $separator, ( split '::', $class );
# Use provided object $obj or
# create an instance of class with defaults, provided by configuration.
my $object = $obj || $class->new;
# Return a Function.
return sub {
my $attribute = shift;
# Compare attribute with configuration,
# just to ensure it is read from there.
is $object->config->{$attribute},
# Call attribute accessor so it is read from config,
# and validated by type checking.
$object->$attribute,
# Build message with attribute.
join $separator, ( $message, $attribute );
}
}
sub check_config_attributes {
my ( $class, $obj ) = #_;
return sub {
my $attributes = shift;
check_config( $class, $obj )->($_) for (#$attributes);
}
}
I have a sub in perl (generated automatically by SWIG) that I want to return multiple values from. However, I seem to be getting variable meta-data instead of the actual values.
sub getDate {
my $tm = *libswigperlc::MyClass_getDate;
($tm.sec, $tm.min, $tm.hour, $tm.day, $tm.month, $tm.year + 1900);
}
The caller is like this...
my ($sec,$min,$hour,$day,$month,$year) = $s->getDate();
print "$year-$month-$day $hour:$min\n";
The $tm.year + 1900 does return the value as wanted. If I add "+ 1" to the other values, they work as wanted too.
But
print $month;
results in
*libswigperlc::MyClass_getDatemonth
instead of
3
What is the best way to return the values to the caller?
I am a novice perl user - I use C++ normally.
Let's go for a longer answer here:
First of all, are you using strict and warnings pragmas? (use strict; use warnings;) They will save you a lot of time by taking some of your Perl freedom away (To me, without them you're stepping out from freedom into extreme anarchism (: ).
$tm . sec would do this: tries to concatenate $tm with sec. Then what's sec?
-If you are using strict pragma, then sec is a sub declared somewhere before the call
-If you are not using strict pragma (I guess this is the case) sec is used as a bareword.
What is *libswigperlc::MyClass_getDate? Is it returning an object that's overloading concatenation operator(.) and/or add operator (+) in it? If yes (and specially without strict/warnings pragmas) you may expect any kind of result depending on the definition of the overload functions. Getting a correct result that you are getting by putting + is one of the possibilities.
That's all that comes to my mind, I hope others add their explanations too or correct mine.
tchrist - you were right to question the typeglob line. That line was generated by Swig, and I had no understanding of it.
All I had to do was return the typeglob as is...
sub getDate {
*libswigperlc::MyClass_getDate2;
}
Now the caller can access the members like this...
my $tm = myClass->getDate();
print "Year = $tm->{year}";
At least now I understand it just well enough to know to leave it as it is, or to change it to be the way as per the original idea.
I am mulling over a best practice for passing hash references for return data to/from functions.
On the one hand, it seems intuitive to pass only input values to a function and have only return output variables. However, passing hashes in Perl can only be done by reference, so it is a bit messy and would seem more of an opportunity to make a mistake.
The other way is to pass a reference in the input variables, but then it has to be dealt with in the function, and it may not be clear what is an input and what is a return variable.
What is a best practice regarding this?
Return references to an array and a hash, and then dereference it.
($ref_array,$ref_hash) = $this->getData('input');
#array = #{$ref_array};
%hash = %{$ref_hash};
Pass in references (#array, %hash) to the function that will hold the output data.
$this->getData('input', \#array, \%hash);
Just return the reference. There is no need to dereference the whole
hash like you are doing in your examples:
my $result = some_function_that_returns_a_hashref;
say "Foo is ", $result->{foo};
say $_, " => ", $result->{$_} for keys %$result;
etc.
I have never seen anyone pass in empty references to hold the result. This is Perl, not C.
Trying to create copies by saying
my %hash = %{$ref_hash};
is even more dangerous than using the hashref. This is because it only creates a shallow copy. This will lead you to thinking it is okay to modify the hash, but if it contains references they will modify the original data structure. I find it better to just pass references and be careful, but if you really want to make sure you have a copy of the reference passed in you can say:
use Storable qw/dclone/;
my %hash = %{dclone $ref_hash};
The first one is better:
my ($ref_array,$ref_hash) = $this->getData('input');
The reasons are:
in the second case, getData() needs to
check the data structures to make
sure they are empty
you have freedom to return undef as a special value
it looks more Perl-idiomatic.
Note: the lines
#array = #{$ref_array};
%hash = %{$ref_hash};
are questionable, since you shallow-copy the whole data structures here. You can use references everywhere where you need array/hash, using -> operator for convenience.
If it's getting complicated enough that both the callsite and the called function are paying for it (because you have to think/write more every time you use it), why not just use an object?
my $results = $this->getData('input');
$results->key_value_thingies;
$results->listy_thingies;
If making an object is "too complicated" then start using Moose so that it no longer is.
My personal preference for sub interfaces:
If the routine has 0-3 arguments, they may be passed in list form: foo( 'a', 12, [1,2,3] );
Otherwise pass a list of name value pairs. foo( one => 'a', two => 12, three => [1,2,3] );
If the routine has or may have more than one argument seriously consider using name/value pairs.
Passing in references increases the risk of inadvertent data modification.
On returns I generally prefer to return a list of results rather than an array or hash reference.
I return hash or array refs when it will make a noticeable improvement in speed or memory consumption (ie BIG structures), or when a complex data structure is involved.
Returning references when not needed deprives one of the ability to take advantage of Perl's nice list handling features and exposes one to the dangers of inadvertent modification of data.
In particular, I find it useful to assign a list of results into an array and return the array, which provides the contextual return behaviors of an array to my subs.
For the case of passing in two hashes I would do something like:
my $foo = foo( hash1 => \%hash1, hash2 => \%hash2 ); # gets number of items returned
my #foo = foo( hash1 => \%hash1, hash2 => \%hash2 ); # gets items returned
sub foo {
my %arg = #_;
# do stuff
return #results;
}
I originally posted this to another question, and then someone pointed to this as a "related post", so I'll post it here to for my take on the subject, assuming people will encounter it in the future.
I'm going to contradict the Accepted Answer and say that I prefer to have my data returned as a plain hash (well, as an even-sized list which is likely to be interpreted as a hash). I work in an environment where we tend to do things like the following code snippet, and it's much easier to combine and sort and slice and dice when you don't have to dereference every other line. (It's also nice to know that someone can't damage your hashref because you passed the entire thing by value -- though someone pointed out that if your hash contains more than simple scalars it's not so simple.)
my %filtered_config_slice =
hashgrep { $a !~ /^apparent_/ && defined $b } (
map { $_->build_config_slice(%some_params, some_other => 'param') }
($self->partial_config_strategies, $other_config_strategy)
);
This approximates something that my code might do: building a configuration for an object based on various configuration strategy objects (some of which the object knows about inherently, plus some extra guy) and then filters out some of them as irrelevant.
(Yes, we have nice tools like hashgrep and hashmap and lkeys that do useful things to hashes. $a and $b get set to the key and the value of each item in the list, respectively). (Yes, we have people who can program at this level. Hiring is obnoxious, but we have a quality product.)
If you don't intend to do anything resembling functional programming like this, or if you need more performance (have you profiled?) then sure, use hashrefs.
Uh... "passing hashes can only be done by reference"?
sub foo(%) {
my %hash = #_;
do_stuff_with(%hash);
}
my %hash = (a => 1, b => 2);
foo(%hash);
What am I missing?
I would say that if the issue is that you need to have multiple outputs from a function, it's better as a general practice to output a data structure, probably a hash, that holds everything you need to send out rather than taking modifiable references as arguments.
Has anybody found a good solution for lazily-evaluated lists in Perl? I've tried a number of ways to turn something like
for my $item ( map { ... } #list ) {
}
into a lazy evaluation--by tie-ing #list, for example. I'm trying to avoid breaking down and writing a source filter to do it, because they mess with your ability to debug the code. Has anybody had any success. Or do you just have to break down and use a while loop?
Note: I guess that I should mention that I'm kind of hooked on sometimes long grep-map chains for functionally transforming lists. So it's not so much the foreach loop or the while loop. It's that map expressions tend to pack more functionality into the same vertical space.
As mentioned previously, for(each) is an eager loop, so it wants to evaluate the entire list before starting.
For simplicity, I would recommend using an iterator object or closure rather than trying to have a lazily evaluated array. While you can use a tie to have a lazily evaluated infinite list, you can run into troubles if you ever ask (directly or indirectly, as in the foreach above) for the entire list (or even the size of the entire list).
Without writing a full class or using any modules, you can make a simple iterator factory just by using closures:
sub make_iterator {
my ($value, $max, $step) = #_;
return sub {
return if $value > $max; # Return undef when we overflow max.
my $current = $value;
$value += $step; # Increment value for next call.
return $current; # Return current iterator value.
};
}
And then to use it:
# All the even numbers between 0 - 100.
my $evens = make_iterator(0, 100, 2);
while (defined( my $x = $evens->() ) ) {
print "$x\n";
}
There's also the Tie::Array::Lazy module on the CPAN, which provides a much richer and fuller interface to lazy arrays. I've not used the module myself, so your mileage may vary.
All the best,
Paul
[Sidenote: Be aware that each individual step along a map/grep chain is eager. If you give it a big list all at once, your problems start much sooner than at the final foreach.]
What you can do to avoid a complete rewrite is to wrap your loop with an outer loop. Instead of writing this:
for my $item ( map { ... } grep { ... } map { ... } #list ) { ... }
… write it like this:
while ( my $input = calculcate_next_element() ) {
for my $item ( map { ... } grep { ... } map { ... } $input ) { ... }
}
This saves you from having to significantly rewrite your existing code, and as long as the list does not grow by several orders of magnitude during transformation, you get pretty nearly all the benefit that a rewrite to iterator style would offer.
If you want to make lazy lists, you'll have to write your own iterator. Once you have that, you can use something like Object::Iterate which has iterator-aware versions of map and grep. Take a look at the source for that module: it's pretty simple and you'll see how to write your own iterator-aware subroutines.
Good luck, :)
There is at least one special case where for and foreach have been optimized to not generate the whole list at once. And that is the range operator. So you have the option of saying:
for my $i (0..$#list) {
my $item = some_function($list[$i]);
...
}
and this will iterate through the array, transformed however you like, without creating a long list of values up front.
If you wish your map statement to return variable numbers of elements, you could do this instead:
for my $i (0..$#array) {
for my $item (some_function($array[$i])) {
...
}
}
If you wish more pervasive laziness than this, then your best option is to learn how to use closures to generate lazy lists. MJD's excellent book Higher Order Perl can walk you through those techniques. However do be warned that they will involve far larger changes to your code.
Bringing this back from the dead to mention that I just wrote the module List::Gen on CPAN which does exactly what the poster was looking for:
use List::Gen;
for my $item ( #{gen { ... } \#list} ) {...}
all computation of the lists are lazy, and there are map / grep equivalents along with a few other functions.
each of the functions returns a 'generator' which is a reference to a tied array. you can use the tied array directly, or there are a bunch of accessor methods like iterators to use.
Use an iterator or consider using Tie::LazyList from CPAN (which is a tad dated).
I asked a similar question at perlmonks.org, and BrowserUk gave a really good framework in his answer. Basically, a convenient way to get lazy evaluation is to spawn threads for the computation, at least as long as you're sure you want the results, Just Not Now. If you want lazy evaluation not to reduce latency but to avoid calculations, my approach won't help because it relies on a push model, not a pull model. Possibly using Corooutines, you can turn this approach into a (single-threaded) pull model as well.
While pondering this problem, I also investigated tie-ing an array to the thread results to make the Perl program flow more like map, but so far, I like my API of introducing the parallel "keyword" (an object constructor in disguise) and then calling methods on the result. The more documented version of the code will be posted as a reply to that thread and possibly released onto CPAN as well.
If I remember correctly, for/foreach do get the whole list first anyways, so a lazily evaluated list would be read completely and then it would start to iterate through the elements. Therefore, I think there's no other way than using a while loop. But I may be wrong.
The advantage of a while loop is that you can fake the sensation of a lazily evaluated list with a code reference:
my $list = sub { return calculate_next_element };
while(defined(my $element = &$list)) {
...
}
After all, I guess a tie is as close as you can get in Perl 5.