Since simply overloading '=' in Perl does not act as one would expect, what is the proper way to do this?
Quote from overload perldoc:
Simple assignment is not overloadable (the '=' key is used for the Copy Constructor). Perl does have a way to make assignments to an object do whatever you want, but this involves using tie(), not overload - see tie and the COOKBOOK examples below.
I have read through the COOKBOOK and the documentation for tie and am having trouble figuring out how you could use it in this way.
I want to be able to create an object like so: my $object = Object->new(). Then, when I assign something to it, I want it to do some special processing.
For example: $object = 3 would internally do something like $object->set_value(3);
I know this isn't necessarily good practice. This is more of an educational question. I just want to know how this can be done. Not whether it should be done.
You can't do that. You can add magic to a variable so that a sub is called after a value is assigned to the variable, but that's a far cry from what you asked.
Besides, what you asked doesn't really make sense. What should the following do?
my $object;
$object = Object->new();
$object = Object->new();
The perl documentation quoted above is somewhat poorly worded. There is no way to overload assignment into some storage location (i.e. variable) that currently contains some specific object or other; the current value of a variable is never important for assignment.
However, what you can do is add magic to that variable directly, that captures the attempt to store into it (SET magic, implemented by the STORE method of the tied class). But it is important to realise this is magic on the variable itself, and not the value that it currently contains.
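A minimal sketch of what that variable-level magic looks like, assuming a hypothetical tie class called LoggedScalar (the name and the stores counter are made up for illustration). It only shows that assignment to the tied variable lands in STORE. It does not make assignment to an arbitrary object behave specially.

```perl
use strict;
use warnings;

# Hypothetical tie class: intercepts reads and writes on one scalar variable.
package LoggedScalar;

sub TIESCALAR {
    my ($class, $initial) = @_;
    return bless { value => $initial, stores => 0 }, $class;
}

# Called whenever the tied variable is read.
sub FETCH {
    my ($self) = @_;
    return $self->{value};
}

# Called whenever something is assigned to the tied variable.
sub STORE {
    my ($self, $new) = @_;
    $self->{stores}++;          # the "special processing" would go here
    $self->{value} = $new;
    return;
}

package main;

tie my $var, 'LoggedScalar', 0;
$var = 3;                       # triggers LoggedScalar::STORE
$var = 7;                       # triggers it again
my $stores = tied($var)->{stores};
print "value=$var stores=$stores\n";
```

Note that the magic stays with $var: copying its value into another variable gives a plain scalar with no STORE hook.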
I'm very new to perl so you'll have to excuse my ignorance.
I'm working on a legacy project. I don't have a dedicated IDE, I'm using PHPStorm with a dedicated Perl plugin.
When hovering over the new keyword I'm getting the warning "Using of fancy calls is not recommended, use TCO::AEMAP->new()".
The code in question is
my $aemapper = new TCO::AEMAP();
Basically it's suggesting doing
my $aemapper = TCO::AEMAP->new();
Is there any merit to this claim or is it simply more of a convention? I can't find much on google since I'm not exactly sure what to look for.
The new Foo version is called indirect object syntax. It's an old-fashioned way of calling the constructor on a package, and it's discouraged in modern Perl. Here's a partial quote of the relevant section in perldoc.
We recommend that you avoid this syntax, for several reasons.
First, it can be confusing to read. In the above example, it's not
clear if save is a method provided by the File class or simply a
subroutine that expects a file object as its first argument.
When used with class methods, the problem is even worse. Because Perl
allows subroutine names to be written as barewords, Perl has to guess
whether the bareword after the method is a class name or subroutine
name. In other words, Perl can resolve the syntax as either File->new(
$path, $data ) or new( File( $path, $data ) ) .
To parse this code, Perl uses a heuristic based on what package names
it has seen, what subroutines exist in the current package, what
barewords it has previously seen, and other input. Needless to say,
heuristics can produce very surprising results!
Older documentation (and some CPAN modules) encouraged this syntax,
particularly for constructors, so you may still find it in the wild.
However, we encourage you to avoid using it in new code.
The alternative is calling new as a class method on a package with the arrow, as in Foo->new. The arrow -> does three things:
It looks up what's on its left-hand side. In this case, the bareword Foo looks like a package name. So Perl will see if it knows a package (or namespace) with that name.
It calls the method on the right-hand side of the arrow in that package it's just found.
It passes in the thing that's on the left, which in our case is Foo, the package name, as the first argument. That's why in the method declaration you will see my ($class, @args) = @_ or similar.
For all other object oriented calls, it's typical to use the arrow syntax. But there is lots of old code around that uses indirect object syntax for new, and especially older modules on CPAN still use it in their documentation.
Both work, but the indirect object syntax is discouraged. Use Foo->new for new code.
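As an illustration of the three steps above, here is a small sketch with a hypothetical class Foo whose constructor takes named arguments (the name attribute is an assumption, not something from the question). The class name arrives in $class because it sits to the left of the arrow.

```perl
use strict;
use warnings;

package Foo;    # hypothetical class for illustration

sub new {
    # When called as Foo->new(...), $class is the string 'Foo'.
    my ($class, %args) = @_;
    my $self = { name => $args{name} // 'anonymous' };
    return bless $self, $class;
}

sub name {
    my ($self) = @_;
    return $self->{name};
}

package main;

# Arrow syntax: unambiguous, Foo is clearly the invocant.
my $obj = Foo->new(name => 'example');
print ref($obj), " ", $obj->name, "\n";
```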
As far as I know, in Perl, we can call a subroutine from a Module by using these techniques:
Export subroutine foo, import the module which has this subroutine. Finally call it in your perl script.
Create an Object of that Module in your perl script finally call foo using that Object.
Directly call foo using its path, like this myDir::Module::foo();.
I am always confused about which is the better way of calling a subroutine foo.
If I have a dynamic script, which I run from the browser and not command line, which approach one should go for so that the script takes less time.
Thanks.
There is a difference between the fastest, and the best way to call code in Perl.
Edit: please see simbabque's answer as well. He especially covers the differences between #1 and #3, and why you would use either.
#1, #3: Function calls
Your #1 and #3 are identical: The subroutine has a unique name in the globally visible namespace. Many names may map to one subroutine via aliases, or importing a module.
If the name of the function you are calling is known at compile time, the sub will be resolved at compile time. This assumes that you don't spontaneously redefine your functions. If the exact function is only known at runtime, this is only a hash lookup away.
There are three ways how functions can be called:
foo(@args);
&foo(@args);
@_ = @args; goto &foo;
Number one (parentheses sometimes optional) is the default, and validates your arguments against the sub prototype (don't use prototypes). Also, a whole call stack frame (with much useful debug information) is constructed. This takes time.
Number two skips the prototype verification, and assumes that you know what you are doing. This is slightly faster. I think this is sloppy style.
Number three is a tail call. This returns from the current sub with the return value of foo. This is fast, as prototypes are ignored, and the current call stack frame can be reused. This isn't useful very often, and has ugly syntax. Inlining the code is about an order of magnitude faster (i.e. in Perl, we prefer loops over recursion ☹).
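The three call styles can be put side by side in a short sketch. The subs total and via_goto are hypothetical names chosen for the example, and no timing claims are made here.

```perl
use strict;
use warnings;

# Sums its arguments.
sub total {
    my $sum = 0;
    $sum += $_ for @_;
    return $sum;
}

# Tail call: sets up @_ and hands control to total, reusing the current frame.
sub via_goto {
    @_ = map { $_ * 2 } @_;
    goto &total;
}

my @args = (1, 2, 3);

my $plain = total(@args);      # ordinary call, prototypes (if any) are checked
my $amp   = &total(@args);     # ampersand call, prototypes are ignored
my $tail  = via_goto(@args);   # total sees the doubled list via goto &total

print "$plain $amp $tail\n";
```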
#2: Method calls
The flexibility of OO comes at a hefty performance price: As the type of the object you call the method on is never known until runtime, the actual method can only be resolved at runtime.
This means that $foo->bar() looks up the function bar in the package that $foo was blessed into. If it can't be found there, it will be searched for in parent classes. This is slow. If you want to use OO, pay attention to shallow hierarchies (→ fewer lookups). Do also note that Perl's default Method Resolution Order is unusual.
You cannot generally reduce a method call to a function call, even if you know the type.
If $foo is of class Foo, and Foo::bar is a sub, then Foo::bar($foo) will skip the method resolution, and might even work. However, this breaks encapsulation, and will break once Foo is subclassed. Also, this doesn't work if Foo doesn't define bar, but the method was defined in a parent class.
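A small sketch of why the function-call shortcut is fragile, using a made-up hierarchy Animal, Dog, Puppy (all names are hypothetical). The method call finds the inherited override, while the fully qualified function calls either pick the wrong sub or fail outright.

```perl
use strict;
use warnings;

package Animal;
sub new   { my ($class) = @_; return bless {}, $class; }
sub speak { return 'generic noise'; }

package Dog;
use parent -norequire, 'Animal';
sub speak { return 'Woof'; }        # overrides Animal::speak

package Puppy;
use parent -norequire, 'Dog';       # defines no speak of its own

package main;

my $puppy = Puppy->new;

# Method call: resolved at runtime through the inheritance chain.
my $via_method = $puppy->speak;

# Plain function call: no resolution, so the override in Dog is skipped.
my $via_function = Animal::speak($puppy);

# Puppy has no sub named speak, so calling it as a function dies.
my $died = eval { Puppy::speak($puppy); 1 } ? 0 : 1;

print "$via_method / $via_function / died=$died\n";
```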
I am generally in favour of object orientation, until it is clear from benchmarks that this will not provide the performance you require.
Export subroutine foo, import the module which has this subroutine. Finally call it in your perl script.
In order to do this, you would use Exporter in the module/package that implements the sub. You tell your module what it will export via @EXPORT_OK and @EXPORT. If you use the module, stuff gets imported into your current namespace at compile time. The following two statements are equivalent.
# This is the same...
use Module;
# ... as this
BEGIN {
require Module;
Module->import();
}
You want to do this if you have stuff you are going to use in your main script, or you are going to use often. Some examples are List::Util, Data::Dumper or use feature 'say'. Of course you can also use it in other modules.
use Data::Dumper;
use List::Util qw(max);
use feature qw(say);
my @foo = (1, 2, 3, 4, 5, 23);
print Dumper \@foo;
say max(@foo);
The catch is that here, you 'pollute' your namespace. Do this if you must, but keep in mind that it happens at compile time, so it is not conditional. You cannot say:
if ($foo) {
use Some::Module 'foo';
foo($foo);
} else {
use Something::Else 'bar';
bar();
}
It will load both Some::Module and Something::Else at compile time, thus increasing the time and memory your program consumes. The condition will work of course, but it is not efficient.
Create an Object of that Module in your perl script finally call foo using that Object.
This is the OOP approach. It is (as mentioned above) not comparable to the other methods. You don't need to import methods of an object. You just load your class (which is a module) either with use or require (see above), create an instance and use its methods to your liking. However, you need an object-oriented module for that. If you are interested in how that works, start by taking a look at perlootut.
Directly call foo using its path, like this myDir::Module::foo();.
It's actually not quite its path, but rather its name(space). For example, Data::Dumper is Dumper.pm located in the folder Data, somewhere in your lib dir. But that is not really important.
The main difference from the first approach is that you omit the importing part. This is useful if you want to build something that conditionally loads certain modules, or if you are in a huge (maybe legacy) application and do not want to pollute the namespace.
if ($order_has_some_condition) {
require Very::Long::NameSpace::For::This::Condition::Module;
Very::Long::NameSpace::For::This::Condition::Module::do_stuff_with_an_order($order);
}
Imagine this piece of code is in a legacy sub with 2k lines and a lot of stuff going on, most of which is never called in our case. We do not want to use our module, making it available for each of the maybe 100 different cases that are handled in this huge piece of code. Instead, we want to only load it if we really need it. Now we require the module and call its sub directly using the full name.
In conclusion, both the first and the third way have their merits. They both need to exist, and they should both be used if appropriate. In some cases, it is just flavor, but in others it makes sense to decide. The second, OOP, approach is something else entirely.
There are no real speed differences, and as Borodin said, Perl is fast. Of course, if you do not import stuff, you don't have to 'pay' for the import. In a 10-liner script, that doesn't matter. In legacy software with potentially thousands of lines of code and many use cases in one huge file, it matters a lot.
I hope this helps you decide.
Apologies if this question isn't appropriate for StackOverflow. I suspect the answer is largely a matter of opinion (unless one of the style guides has a recommendation).
I have code that looks something like this
use File::Temp;
sub foo {
...
}
sub bar {
...
}
sub baz {
my $fh = tempfile();
...
}
baz is the only subroutine that uses File::Temp, and I'm not using AutoLoader. Is it reasonable to put the use declaration inside baz, or should I leave it at the top of my script?
Since (as chepner said), there is no difference technically, it really is a matter of style.
The pros of putting them all on top:
Clear at first glance what all the direct module dependencies are
Easier to maintain - if you need to move around code using the library, you don't need to remember to move the library.
Please note that the same exact logic also applies to variable declarations, but in that case, the scoping concerns severely trump the "remember to move the declaration" concerns and therefore you should declare variables in the innermost possible scope as close to where they are used as possible.
For esoteric cases where your own code contains complicated logic in BEGIN{} blocks that depends on all the libraries being loaded (e.g., call a specifically named method from ALL loaded libraries - which I have done) - you will have a bug if some library's use call is AFTER that BEGIN{} block
The cons of putting them all on top:
One can possibly argue that this makes the code less readable since you need to seek out to the start of file to see what you imported from the module. Frankly, I don't believe that but I have heard it expressed and it has at least some merit.
use is essentially the same as putting a require statement inside a BEGIN block, so it doesn't really matter where you put them; they are evaluated before any of the rest of your code.
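To see this in action, here is a minimal sketch using List::Util instead of File::Temp (a substitution made only to keep the example side-effect free). The use statement is written inside baz, yet the import has already happened before baz is ever called.

```perl
use strict;
use warnings;

sub baz {
    # Written inside the sub, but executed once at compile time.
    use List::Util qw(max);
    return max(@_);
}

# baz has not been called, yet max is already present in package main.
my $imported = defined &main::max ? 1 : 0;

# The import is package-wide, so max is callable outside baz as well.
my $largest = max(3, 9, 4);

print "imported=$imported largest=$largest\n";
```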
I have a variable that I need to pass to a subroutine. It is very possible that the subroutine will not need this variable, and providing the value for the variable is expensive. Is it possible to create a "lazy-loading" object that will only be evaluated if it is actually being used? I cannot change the subroutine itself, so it must still look like a normal Perl scalar to the caller.
You'll want to look at Data::Lazy and Scalar::Defer. Update: There's also Data::Thunk and Scalar::Lazy.
I haven't tried any of these myself, but I'm not sure they work properly for an object. For that, you might try a Moose class that keeps the real object in a lazy attribute which handles all the methods that object provides. (This wouldn't work if the subroutine does an isa check, though, unless it calls isa as a method, in which case you can override it in your class.)
Data::Thunk is the most transparent and robust way of doing this that I'm aware of.
However, I'm not a big fan of it, or any other similar modules or techniques that try to hide themselves from the user. I prefer something more explicit, like having the code using the value that's hard to compute simply call a function to retrieve it. That way you don't need to precompute your value, your intent is more clearly visible, and you can also have various options to avoid re-computing the value, like lexical closures, perl's state variables, or modules like Memoize.
You might look into tying.
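A rough sketch of the tying idea, using only core Perl. The class name LazyScalar and the subs ignores_it and uses_it are hypothetical, and this is not how the CPAN modules mentioned above are implemented. The expensive code runs inside FETCH on first read and the result is cached.

```perl
use strict;
use warnings;

package LazyScalar;    # hypothetical tie class

sub TIESCALAR {
    my ($class, $builder) = @_;
    return bless { builder => $builder }, $class;
}

sub FETCH {
    my ($self) = @_;
    # Run the expensive code once, on first read, then cache the result.
    $self->{value} = delete($self->{builder})->() if $self->{builder};
    return $self->{value};
}

sub STORE {
    my ($self, $new) = @_;
    delete $self->{builder};    # an explicit assignment replaces the lazy value
    $self->{value} = $new;
    return;
}

package main;

my $calls = 0;
tie my $expensive, 'LazyScalar', sub { $calls++; return 42 };

sub ignores_it { return 'unused' }                 # never touches its argument
sub uses_it    { my ($v) = @_; return $v + 1 }     # copying from @_ reads it

ignores_it($expensive);
my $calls_before = $calls;          # the builder has not run yet

my $result = uses_it($expensive);   # first read triggers the builder
uses_it($expensive);                # second read uses the cached value

print "before=$calls_before result=$result calls=$calls\n";
```

The value is computed as soon as the subroutine copies its argument out of @_, so this only helps when the subroutine skips the argument entirely on some code paths.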
I would suggest stepping back and rethinking how you are structuring your program. Instead of passing a variable to a method that it might not need, make that value available in some other way, such as another method call, that can be called as needed (and not when it isn't).
In Moose, data like this is ideally stored in attributes. You can make attributes lazily built, so they are not calculated until they are first needed, but after that the value is saved so it does not need to be calculated a second time.
If I only want to check if something is impossible or not (i.e., I will not be using something like if(possible)), should I name the boolean notPossible and use if(notPossible) or should I name it possible and use if(!possible) instead?
And just to be sure, if I also have to check for whether it is possible, I would name the boolean possible and use if(possible) along with else, right?
You should probably use isPossible.
Negative names for booleans like notPossible are a very bad idea. You might end up having to write things like if (!notPossible) which makes the code difficult to read. Don't do that.
I tend to err on the side of positivity here and use possible. It means someone can't write some code later that does this...
if (!notPossible)
Which is unreadable.
I like to name booleans with consistent short-verb prefixes such as is or has, and I'd find a not prefix peculiar and tricky to mentally "parse" (so, I suspect, would many readers of the code, whether aware of feeling that way or not;-) -- so, I'd either name the variable isPossible (and use !isPossible), or just name the variable isImpossible (many adjectives have handy antonyms like this, and for a prefix has you could use a prefix lacks to make an antonym of the whole thing;-).
I generally try to name my flags so that they will read as nicely as possible where they are being used. That means try to name them so that they won't have to be negated where they are used.
I know some folks insist all names be positive, so that people don't get confused between the negations in the name and those in their head. That's probably a good policy for a boolean in a class interface. But if it's local to a single source file, and I know all the calls will negate it, I'd much rather see if (impossible && ...) than if (!isPossible && ...).
Whichever is easier to read in your specific application. Just be sure you don't end up with "if (!notPossible)".
I think it's considered preferable to avoid using negatives in variable names so that you avoid the double negative of if(!notPossible).
I recommend using isPossible. It just seems to make a lot of sense to use 'is' (or perhaps 'has') whenever you can when naming boolean variables. This is logical because you want to find out if something is possible, right?
I agree that negatively named booleans are a bad idea, but sometimes it's possible to reframe the condition in a way that's positive. For example, you might use pathIsBlocked rather than cannotProceed, or rather than isNotAbleToDie you could use isImmortal.
You should name it for exactly what is being stored in it. If you're storing whether it's possible then name it isPossible. If you're storing whether it's impossible name it isImpossible.
In either case you can use an else if you need to check for both cases.
From your description it seems more important to you to check for impossibility so I'd go with isImpossible:
if(isImpossible)
{
// ...
}
else
{
//...
}