Simplest way to get a comprehensive listing of package names available in CPAN? - perl

Suppose that, as a private project, I have implemented a Perl package, and tested it, both formally and through extensive everyday use. I find the package useful and solid enough to warrant submitting it to CPAN.
Up to this point, since the package has been a private project, I have not worried too much about the package's name, but now that I want to submitted to CPAN, however, I would like the package's name to fit well within the ecology of package names already in CPAN.
In order to find a suitable "CPAN name" for my package, I would have to inspect a comprehensive listing of all these package names1.
What is the simplest way to get this comprehensive listing of names of packages in CPAN?
ObPedantry
(IOW, if the question above is already clear enough for you, you may safely ignore what follows.)
I don't think that I can give a technically correct formal definition of what I mean here by "package name", so let me at least give an "operational definition".
If, for example, the one-liner
$ perl -MFoo::Bar::Baz -c -e 1
fails with an error beginning with
Can't locate Foo/Bar/Baz.pm in #INC ...
..., but after installing some distributions from CPAN, the same oneliner succeeds with
-e syntax OK
...then I'll say that "Foo::Bar::Baz is a package name in CPAN".
(We could split hairs over the package/module distinction, and consider scenarios in which the distinction matters, but please let's not.)
Furthermore, if after inspecting the list this question asks about I discover that, on the one hand, there are in fact many eminent package names in CPAN that begin with the prefix Foo::Bar::, and on the other, there are none (or negligibly few) that begin with the prefix Fubar::, then this would be a good enough reason for me to change the name of my Fubar::Frobozz package to Foo::Bar::Frobozz before submitting it to CPAN.
1 Of course, after inspecting such a list, I may discover that my package does not add sufficiently new functionality relative to what's already available in CPAN to warrant submitting my package to CPAN after all.

If you have run cpan before, you have downloaded a comprehensive package and distribution list under <cpan-home>/sources/modules/02packages.details.txt.gz.
A fresh copy is available on any CPAN mirror, e.g.
http://www.cpan.org/modules/02packages.details.txt.gz .

PAUSE::Packages can do what you want, however you probably want to use this list, but http://prepan.org/ can provide advice/review before submission to cpan, with of course reading on the naming of modules first.

Are you sure that's a thing you want? There are 33,623 distributions on CPAN at the time of writing. Within cpan you can enter
cpan> d /./
That's d for distributions followed by a regex pattern that matches the names you're interested in
If you're really interested in packages -- and a distribution may contain multiple package names -- you need
cpan> m/./
where m is for modules. There are 163,136 of those, which means there's an average of four or five packages per distribution, and it takes cpan a few minutes to generate the list. (I'm sorry, I didn't monitor the exact time.)

You could use MetaCPAN::Client
I found this article which gives the idea about using this module.
#!/usr/bin/perl
use strict; use warnings; use MetaCPAN::Client;
my $mcpan = MetaCPAN::Client->new();
my $release_results = $mcpan->release({ status => 'latest' } );
while ( my $release = $release_results->next ) {
printf "%s v%s\n", $release->distribution, $release->version;
}
Currently this gave me 32601 result like this:
Proc-tored v0.11
Locale-Utils-PlaceholderBabelFish v0.004
Perinci-To-Doc v0.83
Mojolicious-Plugin-Qooxdoo v0.905
App-cdnget v0.05
Baal-Parser v0.01
Acme-DoOrDie v0.001
Net-Shadowsocks v0.9.0
MetaCPAN-Client v2.006000
This modules also gives information about release, module, author, and file & uses Elasticsearch.
It also get updated regularly on every MetaCPAN API change.

Related

Why is the same module listed three times at metacpan.org?

Searching for Devel::Peek at metacpan.org gives the following screen shot:
Why is the module listed three times? (It looks a little bit strange, and could easily confuse the user..)
Oddly enough, there's one missing. The following are Devel::Peek's official distributions:
perl
Devel-Peek
These two distributions are returned when searching search.cpan.org, and the only two distributions returned when searching search.cpan.org.
Being part of the perl distribution and part of its own distribution is called being a "dual-lifed" module. It allows the module to be bundled with Perl without having to upgrade Perl to upgrade the module.
I don't know why meta::cpan doesn't pick up the official distribution, and I don't know why it doesn't flag the other distributions as unofficial. You could alert the site's maintainers of the problem.
Conversely, I don't know why search.cpan.org doesn't return CookBookA and CookBookB, and why it doesn't flag theses other distributions as unofficial when one goes to it directly. I think it has to do with the fact that Devel::Peek is only present as a documentation file (.pod) —not a module (.pm)— in them.

Why are Perl modules case sensitive?

Although I haven't seen any modules with same name but with different cases, but just for curiosity, I was trying to install Log::Log4perl and during installation I misspelled it 'Perl' in place of 'perl':
% cpan -i Log::Log4Perl
Cannot install Log::Log4Perl, don't know what it is.
When I used correct name then things went well:
% cpan -i Log::Log4perl
Same names but different cases can create conflicts. Is there any specific reason behind that?
Because
use Foo::Bar;
would be ambiguous on case-sensitive file systems (Foo/Bar.pm? foo/bar.pm? FOO/BAR.pm? Foo/Bar.PM? etc), and it would require traversing the directory's contents to find the file's name. (Up to 9 directories per element of #INC would need traversing for Foo::Bar.)
In Perl, modules loaded with use translated directly onto the file system. Something such as use Log::Log4perl translates into:
BEGIN {
require 'Log/Log4perl.pm';
Log::Log4perl->import;
}
On a system that has a case sensitive file system, if the name is not exactly in the same case, it might as well not even exist. This is explained in the documentation for use and require. Different cases mean different names.
As such, when the cpan command translates a package name into a distribution, it uses the exact case you specify. The filesystem might be case insensitive, but inside Perl, the package names are still case sensitive. The literal case you enter is the one that Perl (and the cpan client) uses. If a package of that exact case isn't defined, the right things won't happen.
I consider this to be one of the major design decisions that hold Perl back and talked about it in my Frozen Perl 2011 keynote address.
Curiously, case insensitive filesystems lets you get away with it, as seen with the use seems to be case INSENSITIVE!! post on Perlmonks.

Why is "package" keyword sometimes separated by a comment from the package name?

Analyzing sources of CPAN modules I can see something like this:
...
package # hide from PAUSE
Try::Tiny::ScopeGuard;
...
Obviously, it's taken from Try::Tiny, but I have seen this kind of comments between package keyword and package identifier in other modules too.
Why this procedure is used? What is its goal and what benefits does it have?
It is indeed a hack to hide a package from PAUSE's indexer.
When a distribution is uploaded to PAUSE, the indexer will examine each file in the upload, looking for the names of packages that are included in the distribution. Any indexed packages can show up in CPAN search results.
There are many reasons for not wanting the indexer to discover your packages. Your distribution may have many small or insignificant packages that would clutter up the search results for your module. You may have packages defined in your t (test) directory or some other non-standard directory that are not meant to be installed as part of the distribution. Your distribution may include files from a completely different distribution (that somebody else wrote).
The hack works because the indexer strictly looks for the keyword package and an expression that looks like a package name on the same line.
Nowadays, you can include a META.yml file with your distribution. The PAUSE indexer will look for and respect a no_index specification in this file. But this is a relatively new capability of the indexer so older modules and old-timer CPAN contributors will still use the line break hack.
Here's an example of a no_index spec from Forks::Super
no_index:
directory:
- t
- inc
package:
- Sys::CpuAffinity
- Signals::XSIG
- Signals::XSIG::Default
- Signals::XSIG::TieArray56
Sys::CpuAffinity and Signals::XSIG are separate distributions that are also packaged with Forks::Super. Some of the test scripts contain package declarations (e.g., Arbitrary::Test::Package) that shouldn't be indexed.
Okay, here's another shot at this phenomenon ... I've been whacky-hacking Perl for a dozen years and I've rarely seen this packy hack and possibly simply ignored and never bothered to investigate. One thing seems clear, though. There's some hackish processing going on at PAUSE that's been crafted in the good ol' Perl'n'UNIX school of thought that without the shadow of a doubt involves line-oriented text parsing, so they parse those Perl files, possibly even using grep, but rather perl itself, who knows, to extract package names and then kick of some procedure or get some stats or whatnot. And to trip up this procedure and hack around its ways the author splits the package declaration in two lines so the hacky packy grep job doesn't have a clue that there's a package declared right under its nose and the programmer is happy about his hacky skills and the PAUSE stats or whatever it is they're cobbling together are as they should be. Does that make sense?

Module naming convention: Reserved prefix for internal distributions? [duplicate]

I have read the perldoc on modules, but I don't see a recommendation on naming a package so it won't collide with builtin or CPAN module/package names.
In the past, to develop a local Session.pm module, I have created a local directory using my company's name, such as:
package Company::Session;
... and Session.pm would be found in directory Company/.
But I'm just not a fan of this naming convention. I would rather name the package hierarchy closer to the functionality of the code. But that's how it's done on CPAN generally...
I feel like I am missing something fundamental. I also looked in Damian's Perl Best Practices but I may not have been looking in the right place...
Any recommendations on avoiding package namespace collisions the right way?
Update w/ Related Question: if there is a package name conflict, how does Perl choose which one to use? Thanks everyone.
The namespace Local:: has been reserved for just this purpose. No module that starts with that prefix will be accepted to CPAN or the core. Alternatively, you can use an underscore in the top-level name (like My_Corp::Session or just My_Session). All categories with an underscore have also been reserved. (This is mentioned in perlmodlib, under "Select a name for the module".)
Note that both those reservations apply only to the top-level name. For example, there are CPAN modules named Time::Local and Text::CSV_XS. But Local::Time and Text_CSV::XS are reserved names and would not be accepted on CPAN.
Naming modules after your company is fine too. (Well, unless you work for some really generic sounding company.) Using the reverse domain name is probably overkill, unless you intend to distribute your modules to others. (But in that case, you should probably register a normal module name.)
How Perl resolves a conflict:
Perl searches the directories in #INC for a module with the specified name. The first module found is the one used. So the order of directories in #INC determines which module would be used (if you have modules with the same name installed in different locations).
perl -V will report the contents of #INC (the highest-priority directories are listed first). But there are lots of ways to manipulate #INC at runtime, too.
BTW, Raku can handle multiple modules with the same name by different authors, and even use more than one in a single program. That's a different solution.
There is nothing wrong with naming your internal modules after your company; I always do this. 90% of my code ends up on CPAN, so it has "normal" names, but the internal stuff is always starts with ClientName::.
I'm sure everyone else does this too.
What's wrong with just picking a name for your package that you like and then googling "perl the-name-you-picked"?
The #INC variable contains a list of directories to in which to look for modules. It starts with the first entry and then moves on to next if it doesn't find the request module. #INC has a default value that created when perl is compiled, but you can can change it with the PERL5LIB environment variable, the lib pragma, and directly manipulating the #INC array in a BEGIN block:
#!/usr/bin/perl
BEGIN {
#INC = (); #no modules can be found
}
use strict; #error: Can't locate strict.pm in #INC (#INC contains:)
If you need the maximum level of certainty that your module name will not conflict with someone else's you can take a page from Java's book: name the module with the name of the companies domain. So if you work for Example, Inc. and their domain name is example.com, you would name your HTML parser module Com::Example::HTML::Parser or Example::Com::HTML::Parser. The benefit of the first is that if you have multiple subunits they can all have their own name space, but the modules will still sort together:
Com::Example::Biz::FindCustomers
Com::Example::IT::ParseLogs
Com::Example::QA::TestServer
but it does look odd at first.
(I know this post is old, but as I've had to sort this out in the past few months, I thought I'd weigh in)
At work we decided that 'Local::' felt too geographic. CompanyName:: had some problems for us too that aren't development related, I'll skip those, though I will say that CompanyName is long when you have to type it dozens of times.
So we settled on 'Our::'. Sure, we're not 'CPAN Safe' as there could be the day when we want to use a CPAN module with the Our:: prefix. But it feels nice.
Our::Data is our Class::DBI module
Our::App is our generic app framework that does config handling and Getopt stuff
Nice to read and nice to type.

How do I choose a package name for a custom Perl module that does not collide with builtin or CPAN packages names?

I have read the perldoc on modules, but I don't see a recommendation on naming a package so it won't collide with builtin or CPAN module/package names.
In the past, to develop a local Session.pm module, I have created a local directory using my company's name, such as:
package Company::Session;
... and Session.pm would be found in directory Company/.
But I'm just not a fan of this naming convention. I would rather name the package hierarchy closer to the functionality of the code. But that's how it's done on CPAN generally...
I feel like I am missing something fundamental. I also looked in Damian's Perl Best Practices but I may not have been looking in the right place...
Any recommendations on avoiding package namespace collisions the right way?
Update w/ Related Question: if there is a package name conflict, how does Perl choose which one to use? Thanks everyone.
The namespace Local:: has been reserved for just this purpose. No module that starts with that prefix will be accepted to CPAN or the core. Alternatively, you can use an underscore in the top-level name (like My_Corp::Session or just My_Session). All categories with an underscore have also been reserved. (This is mentioned in perlmodlib, under "Select a name for the module".)
Note that both those reservations apply only to the top-level name. For example, there are CPAN modules named Time::Local and Text::CSV_XS. But Local::Time and Text_CSV::XS are reserved names and would not be accepted on CPAN.
Naming modules after your company is fine too. (Well, unless you work for some really generic sounding company.) Using the reverse domain name is probably overkill, unless you intend to distribute your modules to others. (But in that case, you should probably register a normal module name.)
How Perl resolves a conflict:
Perl searches the directories in #INC for a module with the specified name. The first module found is the one used. So the order of directories in #INC determines which module would be used (if you have modules with the same name installed in different locations).
perl -V will report the contents of #INC (the highest-priority directories are listed first). But there are lots of ways to manipulate #INC at runtime, too.
BTW, Raku can handle multiple modules with the same name by different authors, and even use more than one in a single program. That's a different solution.
There is nothing wrong with naming your internal modules after your company; I always do this. 90% of my code ends up on CPAN, so it has "normal" names, but the internal stuff is always starts with ClientName::.
I'm sure everyone else does this too.
What's wrong with just picking a name for your package that you like and then googling "perl the-name-you-picked"?
The #INC variable contains a list of directories to in which to look for modules. It starts with the first entry and then moves on to next if it doesn't find the request module. #INC has a default value that created when perl is compiled, but you can can change it with the PERL5LIB environment variable, the lib pragma, and directly manipulating the #INC array in a BEGIN block:
#!/usr/bin/perl
BEGIN {
#INC = (); #no modules can be found
}
use strict; #error: Can't locate strict.pm in #INC (#INC contains:)
If you need the maximum level of certainty that your module name will not conflict with someone else's you can take a page from Java's book: name the module with the name of the companies domain. So if you work for Example, Inc. and their domain name is example.com, you would name your HTML parser module Com::Example::HTML::Parser or Example::Com::HTML::Parser. The benefit of the first is that if you have multiple subunits they can all have their own name space, but the modules will still sort together:
Com::Example::Biz::FindCustomers
Com::Example::IT::ParseLogs
Com::Example::QA::TestServer
but it does look odd at first.
(I know this post is old, but as I've had to sort this out in the past few months, I thought I'd weigh in)
At work we decided that 'Local::' felt too geographic. CompanyName:: had some problems for us too that aren't development related, I'll skip those, though I will say that CompanyName is long when you have to type it dozens of times.
So we settled on 'Our::'. Sure, we're not 'CPAN Safe' as there could be the day when we want to use a CPAN module with the Our:: prefix. But it feels nice.
Our::Data is our Class::DBI module
Our::App is our generic app framework that does config handling and Getopt stuff
Nice to read and nice to type.