How can I represent sets in Perl? - perl

I would like to represent a set in Perl. What I usually do is use a hash with some dummy value, e.g.:
my %hash=();
$hash{"element1"}=1;
$hash{"element5"}=1;
Then use if (defined $hash{$element_name}) to decide whether an element is in the set.
Is this a common practice? Any suggestions on improving this?
Also, should I use defined or exists?
Thank you

Yes, building hash sets that way is a common idiom. Note that:
my @keys = qw/a b c d/;
my %hash;
@hash{@keys} = ();
is preferable to using 1 as the value because undef takes up significantly less space. This also forces you to use exists (which is the right choice anyway).
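To make the exists/defined distinction concrete, here is a minimal sketch using the same keys as above (the extra elements "e" and "z" are made up for illustration). With a hash slice every value is undef, so only exists reports membership:

```perl
use strict;
use warnings;

# Build the set with a hash slice; every value is undef.
my @keys = qw/a b c d/;
my %set;
@set{@keys} = ();

# exists tests for the key itself, so it works whatever the value is.
my $has_a = exists $set{a} ? 1 : 0;    # "a" is a member
my $has_z = exists $set{z} ? 1 : 0;    # "z" was never added

# defined tests the value, which is undef here, so it misses real members.
my $defined_a = defined $set{a} ? 1 : 0;

# Adding and removing elements.
$set{e} = undef;
delete $set{b};

my @members = sort keys %set;
print "members: @members\n";
print "exists a: $has_a, exists z: $has_z, defined a: $defined_a\n";
```

If the values are all 1, as in the question, both tests give the same answer; the difference only shows up once undef values are involved.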

Use one of the many Set modules on CPAN. Judging from your example, Set::Light or Set::Scalar seem appropriate.
I can defend this advice with the usual arguments pro CPAN (disregarding possible synergy effects).
How can we know that look-up is all that is needed, both now and in the future? Experience teaches that even the simplest programs expand and sprawl. Using a module would anticipate that.
An API is much nicer for maintenance, and for people who need to read and understand the code in general, than an ad-hoc implementation, as it allows one to think about partial problems at different levels of abstraction.
Related to that, if it turns out that the overhead is undesirable, it is easy to go from a module to a simple ad-hoc implementation by removing indirections or paring down data structures and source code. On the other hand, if one needs more features later, going the other way around is moderately more difficult.
CPAN modules are already tested and to some extent thoroughly debugged, and their APIs may have been improved over time, whereas with an ad-hoc approach programmers usually implement the first design that comes to mind.
It rarely turns out that picking a module at the beginning is the wrong choice.

That's how I've always done it. I would tend to use exists rather than defined but they should both work in this context.

Related

Should I prefer hashes or hashrefs in Perl?

I'm still learning perl.
For me it feels more "natural" to use references to hashes rather than access them directly, because it is easier to pass references to a sub (one variable can be passed instead of a list). Generally I prefer this approach to directly accessing %hashes.
The question is: where (in what situations) is it better to use plain %hashes, so
$hash{key} = $value;
instead of
$href->{key} = $value
Is there any speed advantage, or any other "thing", that favours %hashes over $hashrefs? Or is it only a matter of personal taste and TIMTOWTDI? What are some examples of when it is better to use a %hash?
I think this kind of question is very legitimate: Programming languages such as Perl or C++ have come a long way and accumulated a lot of historical baggage, but people typically learn them from ahistorical, synchronous exposés. Hence they keep wondering why TIMTOWTDI and WTF all these choices and what is better and what should be preferred?
So, before version 5, Perl didn't have references. It only had value types. References were added in Perl 5, on top of what Perl 4 offered, enabling lots more stuff to be written. Value types had to be retained, of course, to keep backward compatibility; and also, for simplicity's sake, because frequently you don't need the indirection that references bring.
To answer your question:
Don't waste time thinking about the speed of Perl hashes. They're fast. They're memory access. Accessing a database or the filesystem or the net, that is where your program will typically spend time.
In theory, a dereference operation should take a tiny bit of time, so tiny it shouldn't matter.
If you're curious, then benchmark. Don't draw too many conclusions from differences you might see. Things could look different on another release.
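For the curious, a minimal sketch of such a benchmark using the core Benchmark module. The hash contents and the labels plain_hash and hash_ref are made up for illustration, and the numbers will differ between machines and perl releases:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my %hash = map { $_ => 1 } 1 .. 100;
my $href = \%hash;

# A negative count means: run each sub for at least that many CPU seconds.
# cmpthese prints a comparison chart and returns it as an array of rows.
my $rows = cmpthese(-1, {
    plain_hash => sub { my $v = $hash{42} },
    hash_ref   => sub { my $v = $href->{42} },
});
```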
So there is no speed reason to favour references over value types or vice versa.
Is there any other reason? I'd say it's a question of style and taste. Personally, I prefer the syntax without -> accessors.
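One practical difference the question mentions is passing a hash to a sub. A small sketch of both styles follows; add_stock_ref and add_stock_copy are hypothetical helpers invented for this example:

```perl
use strict;
use warnings;

# Takes a hash reference: one scalar argument, and changes made
# through it are visible to the caller.
sub add_stock_ref {
    my ($stock, $item, $count) = @_;
    $stock->{$item} += $count;
    return;
}

# Takes a plain hash: the hash is flattened into the argument list and
# copied, so the caller's hash is left untouched.
sub add_stock_copy {
    my ($item, $count, %stock) = @_;
    $stock{$item} += $count;
    return %stock;
}

my %stock = (widget => 3);

add_stock_ref(\%stock, 'widget', 2);                  # modifies %stock in place
my %updated = add_stock_copy('widget', 10, %stock);   # works on a copy

print "original: $stock{widget}, copy: $updated{widget}\n";
```

So even if you keep a plain %hash in your own code, you can still take a reference with \%hash at the point where you hand it to a sub.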
If you can use a plain hash to describe your data, use a plain hash. However, when your data structure gets a bit more complex, you will need to use references.
Imagine a program where I'm storing information about inventory items, and how many I have in stock. A simple hash works quite well:
$item{XP232} = 324;
$item{BV348} = 145;
$item{ZZ310} = 485;
If all you're doing is creating quick programs that can read a file and store simple information for a report, there's no need to use references at all.
However, when things get more complex, you need references. For example, my program isn't just tracking my stock, I'm tracking all aspects of my inventory. Inventory items also have names, the company that creates them, etc. In this case, I'll want to have my hashes not pointing to a single data point (the number of items I have in stock), but a reference to a hash:
$item{XP232}->{DESCRIPTION} = "Blue Widget";
$item{XP232}->{IN_STOCK} = 324;
$item{XP232}->{MANUFACTURER} = "The Great American Widget Company";
$item{BV348}->{DESCRIPTION} = "A Small Purple Whatzit";
$item{BV348}->{IN_STOCK} = 145;
$item{BV348}->{MANUFACTURER} = "Acme Whatzit Company";
You can do all sorts of wacky things to do something like this (like have separate hashes for each field or put all fields in a single value separated by colons), but it's simply easier to use references to store these more complex structures.
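The same structure can also be written as one literal and walked with a loop. This is only a sketch built from the data above; the variable names $stock, $record and $total are mine:

```perl
use strict;
use warnings;

# Inventory keyed by part number; each value is a reference to an
# anonymous hash holding that item's fields.
my %item = (
    XP232 => {
        DESCRIPTION  => "Blue Widget",
        IN_STOCK     => 324,
        MANUFACTURER => "The Great American Widget Company",
    },
    BV348 => {
        DESCRIPTION  => "A Small Purple Whatzit",
        IN_STOCK     => 145,
        MANUFACTURER => "Acme Whatzit Company",
    },
);

# The arrow between adjacent subscripts is optional.
my $stock = $item{XP232}{IN_STOCK};    # same as $item{XP232}->{IN_STOCK}

# Walk the whole structure.
my $total = 0;
for my $part (sort keys %item) {
    my $record = $item{$part};         # a hash reference
    printf "%s: %s (%d in stock)\n",
        $part, $record->{DESCRIPTION}, $record->{IN_STOCK};
    $total += $record->{IN_STOCK};
}
print "total in stock: $total\n";
```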
For me the main reason to prefer $hashrefs to %hashes is the ability to give them meaningful names (a related idea would be to name the references to an anonymous hash), which can help you separate data structures from program logic and make things easier to read and maintain.
If you end up with multiple levels of references (refs to refs?!) you start to lose this clean and readable advantage, though. Also, for short programs or modules, or at earlier stages of development where you are retesting things as you go, directly accessing the %hash can make simple debugging (print statements and the like) easier and help avoid accidental "action at a distance" issues, so you can focus on "iterating" through your design, using references where appropriate.
In general though I think this is a great question because TIMTOWTDI and TIMTOCWDI where C = "correct". Thanks for asking it and thanks for the answers.

Moose vs. MooseX::Declare

POSTLUDE
MooseX::Declare would no longer be recommended by anyone as it relies on Devel::Declare, which served its purpose but is itself obsolete. At this point, if anyone wants MX::D they should look at Moops.
ORIGINAL
Assuming I already have a decent knowledge of old-style Perl OO, and assuming I am going to write some new code in some flavor of Moose (yes, I understand there is a performance hit), I was wondering if deeper down either rabbit hole, am I going to wish that I had chosen the other path? Could you SO-monks enlighten me with the relative merits of Moose vs. MooseX::Declare (or some other?). Also how interchangeable they are, one for one class and the other for another, should I choose to switch.
(p.s. I would be ok cw-ing this question, however I think a well formed answer might be able to avoid subjectivity)
MooseX::Declare is basically a sugar-layer of syntax over Moose. They are, for everything past the parser, identical in what they produce. MooseX::Declare just produces a lot more of it, with a lot less writing.
Speaking as someone who enjoys the syntax of MooseX::Declare but still prefers to write all of my code in plain Moose, the tradeoffs are mostly on the development & maintainability side.
The basic list of items of note when comparing them:
MooseX::Declare has much more concise syntax. Things that take several hundred lines in plain old perl objects (POPO?), may take 50 lines in Moose, may take 30 lines in MooseX::Declare. The code from MooseX::Declare is to me more readable and elegant as well.
MooseX::Declare means you have MooseX::Types and MooseX::Method::Signatures for free. This leads to the very elegant method foo(Bar $bar, Baz $baz) { ... } syntax that caused people to come back to Perl after several years in Ruby.
A downside to MooseX::Declare is that some of the error messages are much more cryptic than Moose. The error to a TypeConstraint validation failure may happen several layers deep in MooseX::Types::Structured and getting from there to where in your code you broke it can be difficult for people new to the system. Moose has this problem too, but to a lesser degree.
The places where the dragons hide in MooseX::Declare can be subtly different than where they hide in Moose. MooseX::Declare puts in an effort to walk around known Moose issues (the timing of with() for example) but introduces some new places to be aware of. MooseX::Types for example have a wholly different set of problems from Moose's native Stringy types[^1].
MooseX::Declare has yet another performance hit. This is known to the MooseX::Declare developers and people are working on it (for several values of working I believe).
MooseX::Declare adds more dependencies to Moose. I add this one because people complain already about Moose's dependency list which is around 20 modules. MooseX::Declare adds around another 5 direct dependencies on top of that. The total list however according to http://deps.cpantesters.org/ is Moose 27, MooseX::Declare 91.
If you're willing to go with MooseX::Declare, the best part is you can swap between them at the per-class level. You need not pick one over the other in a project. If this class is better in Moose because of performance needs, or because it's being maintained by junior programmers, or being installed on a more tightly controlled system, you can do that. If that class can benefit from the extra clarity of the MooseX::Declare syntax, you can do that too.
Hope this helps answer the question.
[^1]: Some say fewer, some say more. Honestly the Moose core developers are still arguing this one, and there is no right answer.
One minor aspect that may interest you, and I would also be interested in an answer to this: the main problem I had with MooseX::Declare, which was important in my specific case, was that I was unable to pack my application as an executable, neither with PAR::Packer nor ActiveState PerlApp.
I then used https://github.com/komarov/undeclare/blob/master/undeclare.pl to go back to Moose code.
As written above
other problems with MooseX::Declare:
- terrible error messages ( really, useless. unless you use Method::Signatures::Modifiers )
- performance hit ( as You have noted ), but in my opinion not small. ( we profiled some big real-life apps )
- problem with TryCatch ( if you use that, see: https://rt.cpan.org/Public/Bug/Display.html?id=82618 )
- some incompatibilities in mixed ( MooseX - non-Moose environment, eg. failed $VERSION check )
If You do not need the 'syntactic sugar' of MooseX, do not use it. Depending on the task You are into, I'd use from 'bottom-to-top', eg.
1. Mouse+Method::Signatures
2. Moose
3. then perhaps MooseX
depending on what you want.
Upgrading is not too complicated in this order. However, if you come to the point that you really need MooseX, I'd rather suggest looking for some other language developed with OO in mind that offers most of the features out of the box (e.g. horribile dictu Ruby or Python); those that are not found, you can perhaps live without.
If you really want Moose, consider a bottom-to-top approach starting with the least sugar. I prefer using Mouse + Method::Signatures first. My scenario is that I am sitting on the backend where we need very few objects, shallow hierarchy, but sometimes fast accessors - then we can still fall back to XSAccessor. Mouse + Method::Signatures seems to be a rather good compromise between syntactic help and speed. If my design really needs more, then simply upgrade to Moose.
I can confirm the speed penalty with MooseX::Declare not only with simple accessor benchmarks ( https://metacpan.org/pod/App::Benchmark::Accessors ), but also in real-life application. This combined with cryptic error messages rules MooseX::Declare out.

What's the best way to make a deep copy of a data structure in Perl?

Given a data structure (e.g. a hash of hashes), what's the clean/recommended way to make a deep copy for immediate use? Assume reasonable cases, where the data's not particularly large, no complicated cycles exist, and readability/maintainability/etc. are more important than speed at all costs.
I know that I can use Storable, Clone, Clone::More, Clone::Fast, Data::Dumper, etc. What's the current best practice?
Clone is much faster than Storable::dclone, but the latter supports more data types.
Clone::Fast and Clone::More are pretty much equivalent if memory serves me right, but less feature complete than even Clone, and Scalar::Util::Clone supports even less but IIRC is the fastest of them all for some structures.
With respect to readability these should all work the same, they are virtually interchangeable.
If you have no specific performance needs I would just use Storable's dclone.
I wouldn't use Data::Dumper for this simply because it's so cumbersome and roundabout. It's probably going to be very slow too.
For what it's worth, if you ever want customizable cloning then Data::Visitor provides hooking capabilities and fairly feature complete deep cloning is the default behavior.
My impression is that Storable::dclone() is somewhat canonical.
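A minimal sketch of what that looks like, with a made-up %config hash of hashes, and a shallow copy next to it for contrast:

```perl
use strict;
use warnings;
use Storable qw(dclone);

# A small hash of hashes (hypothetical data).
my %config = (
    db    => { host => 'localhost', port => 5432 },
    cache => { ttl  => 60 },
);

# A shallow copy duplicates only the top level; the inner hashes are shared.
my %shallow = %config;

# dclone takes a reference and returns a reference to an independent copy.
my $deep = dclone(\%config);

$shallow{db}{port}  = 1111;   # also changes $config{db}{port}
$deep->{cache}{ttl} = 5;      # leaves $config{cache}{ttl} alone

print "port: $config{db}{port}, ttl: $config{cache}{ttl}\n";
```

Storable ships with perl, so this needs nothing from CPAN.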
Clone is probably what you want for that. At least, that's what all the code I've seen uses.
Try to use fclone from Panda::Lib which seems the fastest one (written in XS)
Quick and dirty hack if you're already dealing with JSONs and using the JSON module in your code: convert the structure to a JSON and then convert the JSON back to a structure:
use JSON;
my %hash = (
    obj => {},
    arr => [],
);
my $hash_ref_to_hash_copy = from_json(to_json(\%hash));
The only negative possibly being having to deal with a hash reference instead of a pure hash, but still, this has come in handy a few times for me.

How do you do Design by Contract in Perl?

I'm investigating using DbC in our Perl projects, and I'm trying to find the best way to verify contracts in the source (e.g. checking pre/post conditions, invariants, etc.)
Class::Contract was written by Damian Conway and is now maintained by C. Garret Goebel, but it looks like it hasn't been touched in over 8 years.
It looks like what I want to use is Moose, as it seems as though it might offer functionality that could be used for DbC, but I was wondering if anyone had any resources (articles, etc.) on how to go about this, or if there are any helpful modules out there that I haven't been able to find.
Is anyone doing DbC with Perl? Should I just "jump in" to Moose and see what I can get it to do for me?
Moose gives you a lot of the tools (if not all the sugar) to do DbC. Specifically, you can use the before, after and around method hooks (here's some examples) to perform whatever assertions you might want to make on arguments and return values.
As an alternative to "roll your own DbC" you could use a module like MooseX::Method::Signatures or MooseX::Method to take care of validating parameters passed to a subroutine. These modules don't handle the "post" or "invariant" validations that DbC typically provides, however.
EDIT: Motivated by this question, I've hacked together MooseX::Contract and uploaded it to the CPAN. I'd be curious to get feedback on the API as I've never really used DbC first-hand.
Moose is an excellent oo system for perl, and I heartily recommend it for anyone coding objects in perl. You can specify "subtypes" for your class members that will be enforced when set by accessors or constructors (the same system can be used with the Moose::Methods package for functions). If you are coding more than one liners, use Moose;
As for doing DbC, well, it might not be the best fit for perl5. It's going to be hard in a language that offers you very few guarantees. Personally, in a lot of dynamic languages, but especially perl, I tend to make my guiding philosophy DRY, and test-driven development.
I would also recommend using Moose.
However as an "alternative" take a look at Sub::Contract.
To quote the author....
Sub::Contract offers a pragmatic way to implement parts of the programming by contract paradigm in Perl.
Sub::Contract is not a design-by-contract framework.
Sub::Contract aims at making it very easy to constrain subroutines input arguments and return values in order to emulate strong typing at runtime.
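The general idea of constraining a subroutine's arguments and return value can be sketched in plain Perl. To be clear, this is a hand-rolled illustration with hypothetical names (with_contract, pre, post, body); it is not the API of Sub::Contract, Moose, or any other module mentioned in this thread:

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical helper: wrap a code reference with a precondition on its
# arguments and a postcondition on its (scalar) return value.
sub with_contract {
    my (%spec) = @_;
    my ($pre, $post, $body) = @spec{qw(pre post body)};
    return sub {
        my @args = @_;
        croak "precondition failed" unless $pre->(@args);
        my $result = $body->(@args);
        croak "postcondition failed" unless $post->($result, @args);
        return $result;
    };
}

# Integer square root: the input must be a non-negative integer, and the
# result r must satisfy r*r <= n < (r+1)*(r+1).
my $isqrt = with_contract(
    pre  => sub { defined $_[0] && $_[0] =~ /^\d+$/ },
    post => sub { my ($r, $n) = @_; $r * $r <= $n && $n < ($r + 1) ** 2 },
    body => sub { int(sqrt($_[0])) },
);

my $ok  = $isqrt->(17);
my $err = eval { $isqrt->(-1); 1 } ? '' : $@;   # -1 violates the precondition

print "isqrt(17) = $ok\n";
print "error: $err" if $err;
```

This covers pre- and postconditions only. Class invariants would need hooks around every method, which is where the Moose method modifiers mentioned above come in.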
If you don't need class invariants, I've found the following Perl Hacks book recommendation to be a good solution for some programs. See Smart::Comments.

When should I use OO Perl?

I'm just learning Perl.
When is it advisable to use OO Perl instead of non-OO Perl?
My tendency would be to always prefer OO unless the project is just a code snippet of < 10 lines.
TIA
From Damian Conway:
10 criteria for knowing when to use object-oriented design
Design is large, or is likely to become large
When data is aggregated into obvious structures, especially if there’s a lot of data in each aggregate
For instance, an IP address is not a good candidate: There’s only 4 bytes of information related to an IP address. An immigrant going through customs has a lot of data related to him, such as name, country of origin, luggage carried, destination, etc.
When types of data form a natural hierarchy that lets us use inheritance.
Inheritance is one of the most powerful features of OO, and the ability to use it is a flag.
When operations on data vary by data type
GIFs and JPGs might have their cropping done differently, even though they’re both graphics.
When it’s likely you’ll have to add data types later
OO gives you the room to expand in the future.
When interactions between data are best shown by operators
Some relations are best shown by using operators, which can be overloaded.
When implementation of components is likely to change, especially in the same program
When the system design is already object-oriented
When huge numbers of clients use your code
If your code will be distributed to others who will use it, a standard interface will make maintenance and safety easier.
When you have a piece of data on which many different operations are applied
Graphics images, for instance, might be blurred, cropped, rotated, and adjusted.
When the kinds of operations have standard names (check, process, etc)
Objects allow you to have a DB::check, ISBN::check, Shape::check, etc without having conflicts between the types of check.
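As a rough illustration of that last point, here is a sketch in plain (non-Moose) Perl OO. The ISBN and Shape classes and their deliberately simplistic validity rules are made up for this example:

```perl
use strict;
use warnings;

# Two hypothetical classes that each define their own check method.
package ISBN;

sub new {
    my ($class, $digits) = @_;
    return bless { digits => $digits }, $class;
}

# Deliberately simplistic: only checks for 10 or 13 digits.
sub check {
    my ($self) = @_;
    return $self->{digits} =~ /^(?:\d{10}|\d{13})$/ ? 1 : 0;
}

package Shape;

sub new {
    my ($class, %args) = @_;
    return bless { %args }, $class;
}

# A shape is valid here if both dimensions are positive.
sub check {
    my ($self) = @_;
    return $self->{width} > 0 && $self->{height} > 0 ? 1 : 0;
}

package main;

# The same method name dispatches to the right class for each object.
my @things = (
    ISBN->new('9780596001735'),
    Shape->new(width => 3, height => 0),
);
my @results = map { ref($_) . '=' . $_->check } @things;
print "@results\n";
```

The calling code never needs to know which check it is invoking, which is also what makes it easy to add more data types later.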
There is a good discussion about the same subject at PerlMonks.
Having Moose certainly makes it easier to always use OO from the word go. The only real exception is if compilation start-up is an issue (Moose does currently have a compile time overhead).
I don't think you should measure it by lines of code.
You are right, often when you are just writing a simple script OO is probably too much overhead, but I think you should be more flexible regarding the 10-line approach.
In all cases when you are using OO Perl, remember to use Moose (or Mouse).
This question doesn't have that much to do with Perl. The question is "when, given a choice, should I use OO?" That "given a choice" bit is because in some languages (Java, for example), you really don't have any choice.
The answer is "when it makes sense". Think about the problem you're trying to solve. Does the problem fit into the OO concepts of classes and object? If it does, great, use OO. Otherwise use some other paradigm.
Perl is fairly flexible, and you can easily write procedural, functional, or OO Perl, or even mix them together. Don't get hung up on doing OO because everyone else is. Learn to use the right approach for each task.
All of this takes experience and practice, so make sure to try all these approaches out, and maybe even take some smaller problems and solve them in multiple ways to see how each works.
Damian Conway has a passage in Perl Best Practices about this. It is not a rule that you have to follow it, but it is probably better advice than I can give without knowing a lot about what you are doing.
Here is the publisher's page if that is a better place to link to the book.