Why are Perl modules case sensitive?

I haven't seen any modules with the same name but different cases, but just out of curiosity: I was trying to install Log::Log4perl, and during installation I misspelled it as 'Perl' instead of 'perl':
% cpan -i Log::Log4Perl
Cannot install Log::Log4Perl, don't know what it is.
When I used correct name then things went well:
% cpan -i Log::Log4perl
Same names but different cases can create conflicts. Is there any specific reason behind that?

Because if module lookup ignored case, use Foo::Bar; would be ambiguous on case-sensitive file systems (Foo/Bar.pm? foo/bar.pm? FOO/BAR.pm? Foo/Bar.PM? etc.), and it would require traversing the directory's contents to find the file's name. (Up to 9 directories per element of @INC would need traversing for Foo::Bar.)

In Perl, modules loaded with use are translated directly onto the file system. Something such as use Log::Log4perl translates into:
BEGIN {
require 'Log/Log4perl.pm';
Log::Log4perl->import;
}
On a system that has a case sensitive file system, if the name is not exactly in the same case, it might as well not even exist. This is explained in the documentation for use and require. Different cases mean different names.
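As a rough illustration using only core Perl, the helper below (module_to_path is a hypothetical name, not a Perl built-in) performs the same "::" to "/" conversion that use and require apply. It makes it easy to see that the misspelled name points at a different file:

```perl
use strict;
use warnings;

# Hypothetical helper: turn a package name into the relative path that
# require() looks for, e.g. Log::Log4perl -> Log/Log4perl.pm.
# The case of every character is preserved; nothing is normalized.
sub module_to_path {
    my ($module) = @_;
    (my $path = $module) =~ s{::}{/}g;
    return "$path.pm";
}

for my $name ('Log::Log4perl', 'Log::Log4Perl') {
    printf "%-15s -> %s\n", $name, module_to_path($name);
}
```

On a case-sensitive file system those are two different files, and only the one matching the real distribution gets installed.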
As such, when the cpan command translates a package name into a distribution, it uses the exact case you specify. The filesystem might be case insensitive, but inside Perl, the package names are still case sensitive. The literal case you enter is the one that Perl (and the cpan client) uses. If a package of that exact case isn't defined, the right things won't happen.
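To see that the case sensitivity lives inside Perl itself and not only in the filesystem, here is a small self-contained sketch. The package name My::Log4perl and its hello method are made up for the example; the check uses the standard can method that every class inherits from UNIVERSAL:

```perl
use strict;
use warnings;

# Only the lowercase-"perl" spelling is defined (hypothetical package).
package My::Log4perl;
sub hello { return 'hello' }

package main;

# Same letters, different case: Perl treats these as two unrelated
# package names, regardless of the filesystem the code was loaded from.
for my $class ('My::Log4perl', 'My::Log4Perl') {
    printf "%s: %s\n", $class,
        $class->can('hello') ? 'has hello()' : 'no such method';
}
```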
I consider this to be one of the major design decisions that hold Perl back, and I talked about it in my Frozen Perl 2011 keynote address.
Curiously, case-insensitive filesystems let you get away with it, as seen in the "use seems to be case INSENSITIVE!!" post on Perlmonks.

Related

Simplest way to get a comprehensive listing of package names available in CPAN?

Suppose that, as a private project, I have implemented a Perl package, and tested it, both formally and through extensive everyday use. I find the package useful and solid enough to warrant submitting it to CPAN.
Up to this point, since the package has been a private project, I have not worried too much about its name. Now that I want to submit it to CPAN, however, I would like the package's name to fit well within the ecology of package names already in CPAN.
In order to find a suitable "CPAN name" for my package, I would have to inspect a comprehensive listing of all these package names.[1]
What is the simplest way to get this comprehensive listing of names of packages in CPAN?
ObPedantry
(IOW, if the question above is already clear enough for you, you may safely ignore what follows.)
I don't think that I can give a technically correct formal definition of what I mean here by "package name", so let me at least give an "operational definition".
If, for example, the one-liner
$ perl -MFoo::Bar::Baz -c -e 1
fails with an error beginning with
Can't locate Foo/Bar/Baz.pm in @INC ...
..., but after installing some distributions from CPAN, the same one-liner succeeds with
-e syntax OK
...then I'll say that "Foo::Bar::Baz is a package name in CPAN".
(We could split hairs over the package/module distinction, and consider scenarios in which the distinction matters, but please let's not.)
Furthermore, if after inspecting the list this question asks about I discover that, on the one hand, there are in fact many eminent package names in CPAN that begin with the prefix Foo::Bar::, and on the other, there are none (or negligibly few) that begin with the prefix Fubar::, then this would be a good enough reason for me to change the name of my Fubar::Frobozz package to Foo::Bar::Frobozz before submitting it to CPAN.
[1] Of course, after inspecting such a list, I may discover that my package does not add sufficiently new functionality relative to what's already available in CPAN to warrant submitting my package to CPAN after all.
If you have run cpan before, you have downloaded a comprehensive package and distribution list under <cpan-home>/sources/modules/02packages.details.txt.gz.
A fresh copy is available on any CPAN mirror, e.g.
http://www.cpan.org/modules/02packages.details.txt.gz .
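If you would rather search that file yourself, a minimal sketch along these lines should work. It assumes the usual layout of 02packages.details.txt (a block of "Key: value" header lines, a blank line, then one package per line followed by its version and distribution path) and uses the core IO::Uncompress::Gunzip module; the function name and the optional prefix argument are illustrative:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw($GunzipError);

# Return the package names in an 02packages.details.txt stream that
# start with $prefix. $fh can be any readable filehandle.
sub packages_with_prefix {
    my ($fh, $prefix) = @_;

    # Skip the header block, which ends at the first blank line.
    while (defined(my $line = <$fh>)) {
        last if $line =~ /^\s*$/;
    }

    # Each remaining line is: package-name  version  distribution-path
    my @found;
    while (defined(my $line = <$fh>)) {
        my ($package) = split ' ', $line;
        push @found, $package
            if defined $package && index($package, $prefix) == 0;
    }
    return @found;
}

# Usage: perl list_packages.pl path/to/02packages.details.txt.gz [Prefix::]
if (@ARGV) {
    my ($file, $prefix) = (@ARGV, '');
    my $z = IO::Uncompress::Gunzip->new($file)
        or die "Cannot open $file: $GunzipError\n";
    print "$_\n" for packages_with_prefix($z, $prefix);
    $z->close;
}
```

Pointing it at <cpan-home>/sources/modules/02packages.details.txt.gz with a prefix such as Foo::Bar:: would list every indexed package under that namespace.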
PAUSE::Packages can do what you want, although you will probably just want to use this list directly. Also, http://prepan.org/ can provide advice and review before you submit to CPAN; of course, read up on the naming of modules first.
Are you sure that's a thing you want? There are 33,623 distributions on CPAN at the time of writing. Within cpan you can enter
cpan> d /./
That's d for distributions, followed by a regex pattern that matches the names you're interested in.
If you're really interested in packages -- and a distribution may contain multiple package names -- you need
cpan> m /./
where m is for modules. There are 163,136 of those, which means there's an average of four or five packages per distribution, and it takes cpan a few minutes to generate the list. (I'm sorry, I didn't monitor the exact time.)
You could use MetaCPAN::Client.
I found this article, which gives the idea of how to use this module.
#!/usr/bin/perl
use strict;
use warnings;
use MetaCPAN::Client;
my $mcpan = MetaCPAN::Client->new();
# Search for the latest release of every distribution
my $release_results = $mcpan->release({ status => 'latest' });
# Walk the result set, printing "Distribution vVersion" for each release
while ( my $release = $release_results->next ) {
    printf "%s v%s\n", $release->distribution, $release->version;
}
Currently this gave me 32601 results like these:
Proc-tored v0.11
Locale-Utils-PlaceholderBabelFish v0.004
Perinci-To-Doc v0.83
Mojolicious-Plugin-Qooxdoo v0.905
App-cdnget v0.05
Baal-Parser v0.01
Acme-DoOrDie v0.001
Net-Shadowsocks v0.9.0
MetaCPAN-Client v2.006000
This module also gives information about releases, modules, authors, and files, and it uses Elasticsearch.
It is also updated regularly whenever the MetaCPAN API changes.

Module naming convention: Reserved prefix for internal distributions? [duplicate]

I have read the perldoc on modules, but I don't see a recommendation on naming a package so it won't collide with builtin or CPAN module/package names.
In the past, to develop a local Session.pm module, I have created a local directory using my company's name, such as:
package Company::Session;
... and Session.pm would be found in directory Company/.
But I'm just not a fan of this naming convention. I would rather name the package hierarchy closer to the functionality of the code. But that's how it's done on CPAN generally...
I feel like I am missing something fundamental. I also looked in Damian's Perl Best Practices but I may not have been looking in the right place...
Any recommendations on avoiding package namespace collisions the right way?
Update w/ Related Question: if there is a package name conflict, how does Perl choose which one to use? Thanks everyone.
The namespace Local:: has been reserved for just this purpose. No module that starts with that prefix will be accepted to CPAN or the core. Alternatively, you can use an underscore in the top-level name (like My_Corp::Session or just My_Session). All categories with an underscore have also been reserved. (This is mentioned in perlmodlib, under "Select a name for the module".)
Note that both those reservations apply only to the top-level name. For example, there are CPAN modules named Time::Local and Text::CSV_XS. But Local::Time and Text_CSV::XS are reserved names and would not be accepted on CPAN.
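Because both reservations concern only the top-level name, they are easy to check mechanically. This is only a sketch of the rule as described here (is_reserved_for_local_use is a made-up name, not something PAUSE or CPAN provide):

```perl
use strict;
use warnings;

# Hypothetical checker for the rule described above: a name is reserved
# for local use if its top-level component is "Local" or contains "_".
sub is_reserved_for_local_use {
    my ($module) = @_;
    my ($top) = split /::/, $module;
    return ($top eq 'Local' || $top =~ /_/) ? 1 : 0;
}

for my $name (qw(Local::Time Time::Local Text_CSV::XS Text::CSV_XS My_Corp::Session)) {
    printf "%-18s %s\n", $name,
        is_reserved_for_local_use($name) ? 'reserved' : 'not reserved';
}
```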
Naming modules after your company is fine too. (Well, unless you work for some really generic sounding company.) Using the reverse domain name is probably overkill, unless you intend to distribute your modules to others. (But in that case, you should probably register a normal module name.)
How Perl resolves a conflict:
Perl searches the directories in @INC for a module with the specified name. The first module found is the one used. So the order of directories in @INC determines which module would be used (if you have modules with the same name installed in different locations).
perl -V will report the contents of @INC (the highest-priority directories are listed first). But there are lots of ways to manipulate @INC at runtime, too.
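To see the "first match wins" rule in action, the sketch below (find_module_copies is a hypothetical helper) walks @INC in order, lists every copy of a module it finds, and prints the entry in %INC, which records the file Perl actually loaded:

```perl
use strict;
use warnings;

# Hypothetical helper: return every copy of $module found along @INC,
# in search order. The first one is the copy that use/require picks.
sub find_module_copies {
    my ($module) = @_;
    (my $rel = $module) =~ s{::}{/}g;
    $rel .= '.pm';
    my @hits;
    for my $dir (@INC) {
        next if ref $dir;    # skip code refs / objects used as @INC hooks
        my $path = "$dir/$rel";
        push @hits, $path if -f $path;
    }
    return @hits;
}

my @copies = find_module_copies('strict');
print "first match: $copies[0]\n" if @copies;
print "shadowed:    $_\n" for @copies[1 .. $#copies];
print "loaded (per %INC): $INC{'strict.pm'}\n";
```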
BTW, Raku can handle multiple modules with the same name by different authors, and even use more than one in a single program. That's a different solution.
There is nothing wrong with naming your internal modules after your company; I always do this. 90% of my code ends up on CPAN, so it has "normal" names, but the internal stuff always starts with ClientName::.
I'm sure everyone else does this too.
What's wrong with just picking a name for your package that you like and then googling "perl the-name-you-picked"?
The @INC array contains a list of directories in which to look for modules. Perl starts with the first entry and moves on to the next if it doesn't find the requested module. @INC has a default value that is set when perl is compiled, but you can change it with the PERL5LIB environment variable, the lib pragma, or by directly manipulating the @INC array in a BEGIN block:
#!/usr/bin/perl
BEGIN {
    @INC = (); # no modules can be found
}
use strict; # error: Can't locate strict.pm in @INC (@INC contains:)
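Of those options, the lib pragma is usually the least surprising. A minimal sketch, assuming your modules live in a lib/ directory next to the script (FindBin is a core module that reports the script's directory):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use FindBin qw($Bin);
use lib "$Bin/lib";    # unshifted onto @INC, so it is searched first

# Print @INC in search order; "$Bin/lib" should be the first entry.
print "$_\n" for @INC;
```

Because use lib runs at compile time, any use statements after it already see the new directory.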
If you need the maximum level of certainty that your module name will not conflict with someone else's, you can take a page from Java's book: name the module after your company's domain. So if you work for Example, Inc. and their domain name is example.com, you would name your HTML parser module Com::Example::HTML::Parser or Example::Com::HTML::Parser. The benefit of the first is that if you have multiple subunits, they can all have their own namespace, but the modules will still sort together:
Com::Example::Biz::FindCustomers
Com::Example::IT::ParseLogs
Com::Example::QA::TestServer
but it does look odd at first.
(I know this post is old, but as I've had to sort this out in the past few months, I thought I'd weigh in)
At work we decided that 'Local::' felt too geographic. CompanyName:: had some problems for us too that aren't development-related; I'll skip those, though I will say that CompanyName is long when you have to type it dozens of times.
So we settled on 'Our::'. Sure, we're not 'CPAN safe', as there could come a day when we want to use a CPAN module with the Our:: prefix. But it feels nice.
Our::Data is our Class::DBI module
Our::App is our generic app framework that does config handling and Getopt stuff
Nice to read and nice to type.

What is the difference between library files and modules?

What is the difference between library files and modules in Perl?
It's all Perl code to perl. All distinctions are purely idiomatic.
- Perl code meant for inclusion that uses a package directive:
  - Called "module".
  - Usually has the extension .pm. Must have this extension for use to find it.
  - Should always be loaded with require, possibly via use.
  - Must therefore return a true value.
  - More modular, better supported by CPAN.
- Perl code meant for inclusion that doesn't use a package directive:
  - Called "library". (At least historically. These days, "library" might also be used to refer to a module or distribution.)
  - Usually has the extension .pl.
  - Should always be loaded with do.
  - Pollutes the caller's namespace.
  - Usually indicative of a substandard design. Avoid these!
- Perl code meant for direct execution by the interpreter:
  - Called "script".
  - Usually has the extension .pl, or none at all.
  - Will probably start with a shebang (#!) line so it can be started without specifying perl.
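A small sketch of the first two categories side by side. It writes two throwaway files into a temporary directory (Greet.pm and greet_lib.pl are made-up names), loads the module with require and the library with do, and shows which package each file's subroutine ends up in:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec::Functions qw(catfile);

# Scratch directory for two throwaway files (removed at exit).
my $dir = tempdir(CLEANUP => 1);

# A "module": package directive, .pm extension, ends with a true value.
write_file(catfile($dir, 'Greet.pm'), <<'PM');
package Greet;
use strict;
use warnings;
sub hello { return 'hello from Greet' }
1;
PM

# A "library": no package directive of its own.
write_file(catfile($dir, 'greet_lib.pl'), <<'PL');
sub lib_hello { return 'hello from greet_lib.pl' }
1;
PL

unshift @INC, $dir;
require Greet;                       # located via @INC as Greet.pm
print Greet::hello(), "\n";

my $lib = catfile($dir, 'greet_lib.pl');
do $lib or die "Cannot load $lib: ", ($@ || $!);
print main::lib_hello(), "\n";       # defined in main, this script's package

sub write_file {
    my ($path, $content) = @_;
    open my $fh, '>', $path or die "Cannot write $path: $!";
    print {$fh} $content;
    close $fh or die "Cannot close $path: $!";
}
```

Greet::hello stays neatly inside its own namespace, while lib_hello simply appears in main next to everything else the script defines.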
Library files (I'm assuming you mean require 'foo.pl' stuff here) are an obsolete (pre-Perl 5) form of external module. For the most part, you shouldn't need to care any more, although there are still some Perl 4 installations around and therefore still some Perl code that remains backward compatible with them (and there's some code that's simply never been updated and still loads getcwd.pl etc.).
Nothing. They are both files that contain Perl code. Here are some of the possible circumstantial differences, though.
An executable Perl script is more likely to have a #!/bin/perl shebang.
Old .pl Perl libraries (hence the 'p' + 'l') are more likely to expect to be required than .pm modules.
Perl 5 style (.pm) modules are more likely to use Exporter, although many newer modules avoid exporting anything at all.

What's the best way to have two modules which use functions from one another in Perl?

Unfortunately, I'm a total noob when it comes to creating packages, exporting, etc. in Perl. I tried reading some of the documentation on modules and often found myself dozing off from the long chapters. It would be helpful if I could find what I need to understand in just one simple webpage without the need to scroll down. :P
Basically I have two modules, A and B: A uses some functions from B, and B uses some functions from A. I get tons of warnings about functions being redefined when I try to compile via perl -c.
Is there a way to do this properly? Or is my design flawed? If so, what would be a better way? The reason I did this is to avoid copy-and-pasting the other module's functions into this module and renaming them.
It's not really good practice to have circular dependencies. I'd advise factoring the shared code out into a third module, so that A depends on B, A depends on C, and B depends on C.
So... the suggestion to factor out common code into another module is a good one. But you shouldn't name modules *.pl, and you shouldn't load them by require-ing a certain pathname (as in require "../lib/foo.pl";). (For one thing, saying '..' makes your script depend on being executed from the same working directory every time. So your script may work when you run it as perl foo.pl, but it won't work when you run it as perl YourApp/foo.pl. That is generally not good.)
Let's say your app is called YourApp. You should build your application as a set of modules that live in a lib/ directory. For example, here is a "Foo" module; its filename is lib/YourApp/Foo.pm.
package YourApp::Foo;
use strict;
use warnings;
sub do_something {
    # code goes here
}
1;  # a module must return a true value
Now, let's say you have a module called "Bar" that depends on "Foo". You just make lib/YourApp/Bar.pm and say:
package YourApp::Bar;
use strict;
use warnings;
use YourApp::Foo;
sub do_something_else {
    return YourApp::Foo::do_something() + 1;
}
1;
(As an advanced exercise, you can use Sub::Exporter or Exporter to make use YourApp::Foo install subroutines in the consuming package's namespace, so that you don't have to write YourApp::Foo:: before everything.)
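Here is one way the Exporter variant might look. To keep the sketch in a single runnable file, YourApp::Foo is declared inline and registered in %INC by hand, which is a demo-only trick; in a real app the package would live in lib/YourApp/Foo.pm and end with 1;. The return value 41 is arbitrary:

```perl
use strict;
use warnings;

# Stand-in for lib/YourApp/Foo.pm, compiled and run at compile time.
BEGIN {
    package YourApp::Foo;
    use Exporter 'import';                # gives YourApp::Foo an import() method
    our @EXPORT_OK = ('do_something');    # exported only on request
    sub do_something { return 41 }
    $INC{'YourApp/Foo.pm'} = __FILE__;    # demo only: mark as already loaded
}

package main;
use YourApp::Foo qw(do_something);        # installs main::do_something

# No YourApp::Foo:: prefix needed any more.
print do_something() + 1, "\n";
```

Using @EXPORT_OK rather than @EXPORT keeps the consuming package in control of what gets imported.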
Anyway, you build your whole app like this. Logical pieces of functionality should be grouped together in modules (or even better, classes).
To make all this run, you write a small script that looks like this (I put these in bin/, so let's call it bin/yourapp.pl):
#!/usr/bin/env perl
use strict;
use warnings;
use feature ':5.10';
use FindBin qw($Bin);
use lib "$Bin/../lib";
use YourApp;
YourApp::run(@ARGV);
The key here is that none of your code is outside of modules, except a tiny bit of boilerplate to start your app running. This is easy to maintain, and more importantly, it makes it easy to write automated tests. Instead of running something from the command line, you can just call a function with some values.
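A minimal sketch of that testing benefit, with a hypothetical YourApp package (its run and greeting functions are invented for the example) inlined so everything fits in one file. The test calls greeting directly with Test::More instead of going through bin/yourapp.pl:

```perl
use strict;
use warnings;

# Hypothetical stand-in for lib/YourApp.pm.
package YourApp;

# Entry point that bin/yourapp.pl would call as YourApp::run(@ARGV).
sub run {
    my @args = @_;
    print greeting(@args ? $args[0] : 'world'), "\n";
    return 0;
}

# The actual logic: a plain function that a test can call directly.
sub greeting {
    my ($name) = @_;
    return "Hello, $name!";
}

package main;
use Test::More;

# No command line involved: just call the function with some values.
is(YourApp::greeting('Perl'),  'Hello, Perl!',  'greeting uses the given name');
is(YourApp::greeting('world'), 'Hello, world!', 'default name formats the same way');

done_testing();
```

In a real project the two halves would be separate files, lib/YourApp.pm and something like t/greeting.t, run with prove -l.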
Anyway, this is probably off-topic now. But I think it's important to know.
The simple answer is to not test-compile modules with perl -c... use perl -e'use Module' or perl -e0 -MModule instead.
perl -c is designed for doing a test compile of a script, not a module. When you run it on one of your
When recursively using modules, the key point is to make sure anything externally referenced is set up early. Usually this means at least making sure @ISA is set in a compile-time construct (in BEGIN{}, or via "use parent" or the discouraged "use base") and that @EXPORT and friends are set in BEGIN{}.
The basic problem is that if module Foo uses module Bar (which uses Foo), compilation of Foo stops right at that point until Bar is fully compiled and its mainline code has executed. The answer is to make sure that whatever parts of Foo are needed by Bar's compilation and mainline code are already in place by then.
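A sketch of two modules that use each other, following the BEGIN{} advice above. The module names YourApp::Alpha and YourApp::Beta are hypothetical, and the script writes them into a temporary directory so it can run on its own. If the two BEGIN blocks were replaced by plain runtime assignments, Beta's use YourApp::Alpha would run before Alpha's assignment executes, and Exporter should then refuse to export alpha:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
mkdir "$dir/YourApp" or die "mkdir: $!";

my %source = (
    'YourApp/Alpha.pm' => <<'PM',
package YourApp::Alpha;
use strict;
use warnings;
use Exporter 'import';
our @EXPORT_OK;
BEGIN { @EXPORT_OK = ('alpha') }    # ready before the recursive use below
use YourApp::Beta qw(beta);
sub alpha      { return 'alpha' }
sub alpha_beta { return alpha() . '+' . beta() }
1;
PM
    'YourApp/Beta.pm' => <<'PM',
package YourApp::Beta;
use strict;
use warnings;
use Exporter 'import';
our @EXPORT_OK;
BEGIN { @EXPORT_OK = ('beta') }
use YourApp::Alpha qw(alpha);       # Alpha is only half-compiled here
sub beta       { return 'beta' }
sub beta_alpha { return beta() . '+' . alpha() }
1;
PM
);

for my $rel (sort keys %source) {
    open my $fh, '>', "$dir/$rel" or die "Cannot write $rel: $!";
    print {$fh} $source{$rel};
    close $fh or die "Cannot close $rel: $!";
}

unshift @INC, $dir;
require YourApp::Alpha;                      # pulls in Beta, which uses Alpha back

print YourApp::Alpha::alpha_beta(), "\n";    # alpha+beta
print YourApp::Beta::beta_alpha(), "\n";     # beta+alpha
```

Even so, the refactoring suggested next (moving shared code into a third module) is the more robust fix.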
(In many cases, you can sensibly separate out the functionality into more modules and break the recursion. This is best of all.)