How can I create a Zip archive in Perl? - perl

I need to create a Zip archive after filtering the list of files I want to include. Preferably I'd like the module to work in both Windows and Linux.
Since I need to filter the list of files, I don't really want to use an external program. I'd rather not introduce external dependencies either, so that I can compile the script into a single executable on Windows (using ActiveState PDK).
What I already tried
Until now I've used Archive::Zip from CPAN, but it has a major bug on Windows machines that use non-ASCII filenames: the filenames get corrupted in the archive because they aren't translated into Unicode.
There is a bug report filed for that but it hasn't been updated in over 10 months and in the module documentation the developer is rather unhelpful (of the "fix your computer or get rid of Windows" kind).
Update:
Thanks to the clarifications from brian and Alan Haggai Alavi it seems that enough love is being put in Archive::Zip to get these bugs out soon and finally have a fully functioning zip module in Windows.

Although the module documentation says some stupid things about Windows, the current maintainer is Adam Kennedy, the same guy who brought you Strawberry Perl. He's definitely not anti-Windows. He released a version in October, so they are working on it. There's also an open grant from The Perl Foundation to fix Archive::Extract bugs. The bug you mention, RT 35334: Filename Encoding by Archive::Zip, may just need someone to show it some love. That could be you. People solve the problems that bother them, so maybe nobody interested in the module needs this just yet.
The module has had problems, and I've been following its progress since I use it in a couple projects. It has gotten a lot better recently and can certainly use some love. Sometimes open source means helping to fix the problems that you encounter. I know this doesn't help you solve your problem immediately, but that's how I think you're going to get this done aside from system() calls.

The bug mentioned above was fixed very recently by the addition of Unicode filename support under Windows. A release featuring the fix will be available on CPAN within a week.

You could try the standard-distribution Archive::Extract. It may not be any better than Archive::Zip, but the documentation says that, if there are problems, it goes under the hood to try to use command-line tools on your system to unzip the file. This is probably most robust on Unix, but Windows has a zip archive utility, and it should be accessible via the command line. Plus, Archive::Extract can handle many other types of compression (theoretically).
Of course, it may turn out that Archive::Extract simply figures out what kind of compression the file uses and then passes it to the appropriate other library, which might be Archive::Zip.
You might also try IO::Uncompress::Unzip and its counterpart, IO::Compress::Zip, for unzipping, reading, and rezipping, if absolutely necessary. Again, I don't know how much better these will work, but they are all part of the standard library.
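For the original use case (zip only the files that pass a filter), a minimal sketch with IO::Compress::Zip could look like the following. The helper names zip_filtered and zip_member_names and the sample file names are made up for illustration, and the sketch does not check whether this route avoids the non-ASCII filename problem on Windows.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Spec;
use File::Temp qw(tempdir);
use IO::Compress::Zip qw(zip $ZipError);
use IO::Uncompress::Unzip qw($UnzipError);

# Hypothetical helper: zip only the candidates accepted by $filter (a code ref).
sub zip_filtered {
    my ($zip_path, $filter, @candidates) = @_;
    my @wanted = grep { $filter->($_) } @candidates;
    die "No files matched the filter\n" unless @wanted;

    # FilterName stores each member under its base name instead of its full path.
    zip \@wanted => $zip_path, FilterName => sub { $_ = basename($_) }
        or die "zip failed: $ZipError\n";
    return scalar @wanted;
}

# Hypothetical helper: list the member names stored in an archive.
sub zip_member_names {
    my ($zip_path) = @_;
    my $unzip = IO::Uncompress::Unzip->new($zip_path)
        or die "Cannot read $zip_path: $UnzipError\n";
    my @names;
    for (my $status = 1; $status > 0; $status = $unzip->nextStream()) {
        push @names, $unzip->getHeaderInfo()->{Name};
        my $buffer;
        # Drain the current member before moving on to the next one.
        while (($status = $unzip->read($buffer)) > 0) { }
        last if $status < 0;
    }
    return @names;
}

# Demo: create three sample files and archive only the .txt ones.
my $dir = tempdir(CLEANUP => 1);
my @paths;
for my $name (qw(notes.txt report.txt image.png)) {
    my $path = File::Spec->catfile($dir, $name);
    open my $fh, '>', $path or die "Cannot write $path: $!\n";
    print {$fh} "sample content for $name\n";
    close $fh or die "Cannot close $path: $!\n";
    push @paths, $path;
}
my $zip_path = File::Spec->catfile($dir, 'filtered.zip');
my $count = zip_filtered($zip_path, sub { $_[0] =~ /\.txt\z/ }, @paths);
print "Archived $count files: ", join(', ', zip_member_names($zip_path)), "\n";
```

Both modules ship with Perl 5.10 and later, so this adds no extra dependency to a packaged executable.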

Related

perl: new cpan module maker? local configuration text files and executables, too?

I am writing a perl program that I want to share with others, eventually via cpan. it's getting to the point where I should start thinking about this on a bigger scale.
a decade ago, I used the h2xs package maker once. is this still the most recommended way to get started? there used to be a couple of alternatives. because I am starting from scratch with very little recollection, anything simple will do at this point.
I need to read a few long text files (not perl modules) for configuration. where do I put them and how do I access them, no matter where the module is installed? (FindBin?) __DATA__ is inconvenient.
I need to provide an executable (linux and osx). can putting an executable into the user's path be part of the module installation? (how?)
I would like to be able to continue developing it, run it for test purposes, have a new version, repack it, and reupload it easily.
before uploading to cpan, can I share a cpan bundle for easy local installation to downloaders and testers?
# cpan < mybundle.cpanbundle
advice appreciated.
regards,
/iaw
If anything I say conflicts with Andy Lester, listen to him instead. He knows more than I ever will.
Module::Starter is a good, simple way to generate module scaffolding. My take is it's been the default for this sort of thing for a few years now.
For configuration/support files, I think you probably want File::ShareDir. Might be worth considering Data::Section if it's just a matter of needing multiple __DATA__ sections though.
You can certainly put scripts in the bin subdirectory of your distribution, the build tool will put it in the right place at install time.
A build tool will take care of the work-flow you describe.
Bundles are something different. You make a distribution and share the tarball/archive.
If you set up PERL5LIB appropriately, you can then repeat make test, make install, and make dist to your heart's content. For development and sharing purposes, a lot of projects do their work on GitHub or similar, which makes it easy to share. They have private accounts for business purposes too. Version control is also very useful if you want to rewind and see where/when a problem was introduced.
If you get a copy of cpanm (simple to install, fairly lightweight) then it can install from a tar.gz file or even direct from a git repository. You can also tell it to install to a local dir (local::lib compatible - another utility that's very useful).
Hopefully that's reasonably up-to-date as of 2014. You may see Dist::Zilla mentioned for module development. My understanding is that it's most useful for those with a large family of CPAN distributions to manage. Oh - if you (or other readers) aren't aware of them, do check out autodie and Try::Tiny around errors and exceptions, Moose (for a full-featured object-oriented framework) and Moo (for a smaller lightweight version).
I think that advice is all reasonably non-controversial. I find cpanm to be much more pleasant than the "full" cpan client, and Moo seems pretty popular nowadays too.
Take a look at Module::Starter and its much more capable (and complex) successor Dist::Zilla.
Whatever you do, don't use h2xs. Module::Starter was created specifically because h2xs was such an inappropriate tool for creating distributions.

How 'bad' is replacing /usr/bin/perl with /usr/local/bin/perl on CentOS?

ANSWERED: Basically, it can be done with no major side-effects if you compiled your own perl and you did it the same way your OS did. While it isn't a recommended practice, I've been able to run like this for more than a month. I would conclude it is relatively safe to do if you know what you are doing.
We came to the conclusion at work today that we needed to upgrade perl to 5.10.0
CentOS 5.x comes with perl 5.8.8.
We determined that the effort involved in maintaining scripts with #!/usr/bin/perl was futile.
According to some install stuff on CPAN and other places, it isn't a 'good' idea to replace the OS's version of perl. I already updated the link in /usr/bin/. So my question is, how bad is it really to replace /usr/bin/perl?
I've not noticed any adverse effects in our systems yet, but I'm prepared to correct the link (back to 5.8.8) as soon as there is a problem.
I'm worried that there may be some modules in the CentOS standard distro that aren't included in CPAN's source 5.10.0. I'm still trying to figure out what those modules might be.
Thanks in advance.
In my experience, the best practice is to compile your entire stack (Perl, Apache, ImageMagick, ...) from source yourself. That gives you complete control over which versions of everything are used and when everything gets upgraded.
Replacing /usr/bin/perl with one you compiled is a crap shoot. The OS might be using /usr/bin/perl as part of its maintenance or init scripts so changing it could brick your server or cause strange failures.
So ignore the system Perl, build your own, and fix your scripts to refer to your version of Perl.
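To find out which scripts would need their shebang line fixed, something like the sketch below can list every file whose first line invokes /usr/bin/perl. The helper name scripts_using and the sample files are made up for illustration; point it at your own script directories.

```perl
use strict;
use warnings;
use File::Find qw(find);
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: return the files under @dirs whose first line is a
# shebang that invokes exactly $interpreter (for example /usr/bin/perl).
sub scripts_using {
    my ($interpreter, @dirs) = @_;
    my @hits;
    find({
        no_chdir => 1,
        wanted   => sub {
            my $path = $File::Find::name;
            return unless -f $path;
            open my $fh, '<', $path or return;    # skip unreadable files
            my $first = <$fh>;
            close $fh;
            return unless defined $first;
            # Match "#!/usr/bin/perl" followed by whitespace or end of line,
            # so "/usr/bin/perl5.8.8" is not counted.
            push @hits, $path
                if $first =~ m{\A#!\s*\Q$interpreter\E(?:\s|\z)};
        },
    }, @dirs);
    @hits = sort @hits;
    return @hits;
}

# Demo on a throwaway directory containing two scripts and a text file.
my $demo_dir = tempdir(CLEANUP => 1);
my %samples = (
    'old.pl'    => "#!/usr/bin/perl -w\nprint 1;\n",
    'new.pl'    => "#!/usr/local/bin/perl\nprint 2;\n",
    'notes.txt' => "not a script\n",
);
for my $name (sort keys %samples) {
    my $path = File::Spec->catfile($demo_dir, $name);
    open my $fh, '>', $path or die "Cannot write $path: $!\n";
    print {$fh} $samples{$name};
    close $fh or die "Cannot close $path: $!\n";
}
print "$_\n" for scripts_using('/usr/bin/perl', $demo_dir);
```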
Generally newer versions of Perl 5 attempt to maintain backward compatibility with older versions. But that's not 100% assured. For example, if a script depends on undefined behavior in Perl 5.8.8 (which shouldn't happen but sometimes does), that behavior may be different under 5.10.0. Nevertheless, it's usually fairly safe to assume that a script written for Perl 5.8.8 will run under 5.10.0, assuming there are no other factors involved.
But there usually are other factors (modules, byte compatibility for XS code, and so on). The list of possible gotchas is huge. That doesn't mean that any one of them will snag you on this go-around, but there is potential for problems.
If you've already got an upgraded Perl in /usr/local/bin, go ahead and use it. But don't dismantle or upgrade the old /usr/bin/ version. It's only a small chunk of hard drive (very small by today's standards).
By the way, a lot of people speak highly of perlbrew (App::Perlbrew on CPAN) as a tool to help maintain multiple versions of Perl.
Well, if you do decide to change the location of where Perl is installed, that is completely up to you and where you prefer it to be. But, keep in mind that any scripts that exist with a shebang line pointing to #!/usr/bin/perl will possibly break.
My recommendation would be that after you have installed it, create a soft link at /usr/bin/perl pointing to the executable for the new version of Perl that you installed. Just a thought. It's a workaround to avoid breaking anything.
Creating the link above would certainly help to avoid the possibility of 'bricking your server' as @Mu pointed out.
Regards,
Jeff

Merging PDF file on Windows with perl

I need a way to merge PDF files on Windows using Perl. It has to be Perl because it is part of my script to organize a directory on a Windows server. Any ideas?
See this very related question: How can I merge PDF files with Perl.
If the CAM::PDF module doesn't suit you (if you can't get it on your Windows environment), the pdftk mentioned there is available for Windows (see Installing pdftk). You can use that from your perl scripts.
Please have a look at the PDF Processing with Perl article for other options.
It's not trivial to write a program that parses two PDF files, manipulates them, and writes them back out as a single merged file. But if I were to dive into the task I would probably use the CPAN module PDF::API2. It seems to be one of the most complete and most robust PDF modules on CPAN, though not necessarily the simplest to figure out. There are other PDF modules under the PDF::* hierarchy on CPAN, and some of them may provide just enough functionality for you, with less of a learning curve.
But let me suggest something else: If you can find a ready-made tool that will merge two PDF files, you could allow Perl to send the files through that program, and retrieve the results. This might be a simpler approach, and one that you can be reasonably certain already works (as opposed to you spending a lot of time debugging your own solution). Your existing Perl script could interface with an external program that already has the capability you need.
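As a rough sketch of that approach, the script below shells out to pdftk (mentioned in the other answer) using its "cat" operation. It assumes pdftk is installed and on the PATH; the helper names pdftk_merge_command and merge_pdfs are made up for illustration.

```perl
use strict;
use warnings;

# Build the argument list for pdftk's "cat" operation:
#   pdftk in1.pdf in2.pdf ... cat output merged.pdf
sub pdftk_merge_command {
    my ($output, @inputs) = @_;
    die "Need at least two input PDFs\n" unless @inputs >= 2;
    return ('pdftk', @inputs, 'cat', 'output', $output);
}

# Run pdftk and die with a readable message if it fails.
sub merge_pdfs {
    my ($output, @inputs) = @_;
    my @cmd = pdftk_merge_command($output, @inputs);

    # List-form system() passes each file name as a separate argument,
    # which avoids shell interpolation of the names.
    if (system(@cmd) != 0) {
        my $reason = $? == -1 ? "could not start pdftk: $!"
                              : 'exit status ' . ($? >> 8);
        die "pdftk failed ($reason)\n";
    }
    return $output;
}

# Usage: perl merge.pl merged.pdf first.pdf second.pdf [more.pdf ...]
if (@ARGV >= 3) {
    my ($output, @inputs) = @ARGV;
    merge_pdfs($output, @inputs);
    print "Wrote $output\n";
}
```

Keeping the command construction in its own function makes it easy to check the arguments without actually running pdftk.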

How do I add custom module distributions to my local CPAN mirror?

I'm getting ready to set up a full CPAN mirror for internal use at my company. However, we have several internal Module::Build based distributions that I'd like to make available to people from this mirror. These distributions should ONLY be available from our mirror; they are internal libraries only. Essentially, once people have set up their CPAN config file to load the "cpan.mycompany.com" mirror, I'd like them to be able to do a
cpan install MyCompany::Bundle
cpan install MyCompany::Other::Module
on their command line to install any number of internal, custom module distributions. Ideally, as versions of these module distributions are incremented, all of those versions would be indexed by our internal CPAN mirror and made available, just as previous versions of CPAN modules are made available.
After the initial question, I was able to come up with some other possibilities.
There's CPAN::Inject, but it looks like I can't use it to get a cpan install My::Module syntax.
Then there's MyCPAN::App::DPAN, which also looks interesting, and almost looks like what I need. Does anyone have experience with this tool?
Another one I just came across was CPAN::Site. This seems to also be able to set up a custom CPAN distribution. Any thoughts on this tool?
If you're using CPAN::Mini to create your mirror, then you use CPAN::Mini::Inject to add your own modules to it.
To do this with a full CPAN mirror, CPAN::Site covers this nicely. It lets you make a mirror, and then inject your own libraries right into it, complete with tools to help you manage setting it up and keeping it up to date.
I would like to second the suggestion for CPAN::Site - the author is responsive and will gladly apply fixes if you ask or file a bug report on the CPAN RT.
I've been using it recently to make a "micro-cpan" containing only what a particular application needs and nothing else, along with cpanminus to make installation in any environment dead-simple. However, don't ask me for my solution - miyagawa++ was at YAPC::NA this year and showed off "Carton" which does all that and more, way better than my hacky stuff.
CPAN::Mini::Inject is perhaps a bit too "low-level" in that it requires that you specify a whole lot of information about each dist up-front before injecting into the minicpan - I feel that just about all of that should be auto-detected by analyzing the dist, for example by using CPAN::ParseDistribution.
MyCPAN::App::DPAN is actually quite cool, but has a bit of a learning curve and may not be the right tool for the job. I've also found it has a tendency to choke on some badly-formed dists, and detecting that involves trawling through the logs (as far as I can tell - maybe there's a better way to do it). However, I'd highly suggest checking it out.
If you're still interested in MyCPAN::App::DPAN, I've just posted how I use it to create a mini CPAN-like directory structure, in (one of) the answers to this question:
Internal CPAN - what module
(I don't know if it's OK to link to my own answer here. Let me know if it isn't.)

How do I use CPAN.pm to download other Perl modules?

I'm new to Object-Oriented programming and the perldoc page on CPAN.pm confuses the hell out of me. My program needs to download a couple of modules if they don't already exist. Is this basically just:
CPAN::Shell->install("Module::Name::Here");
or is there more to it? Does that download the package, unarchive it, and install it, or just one or two of those steps? If it's not all three, how do I do the other one (or two)? I would like it to make sure it doesn't try to re-install anything if the package is already there - is this the default behavior of the function or no?
And how can I tell if Perl couldn't connect to CPAN to get the package?
No one else has mentioned it, but you have to load the CPAN config first:
use CPAN;
CPAN::HandleConfig->load;
CPAN::Shell::setup_output;
CPAN::Index->reload;
# now do your stuff
You can also look at the cpan(1) script that comes with CPAN.pm to see a lot of the programmer's interface in action. I also wrote an article for the latest issue of The Perl Review showing examples of the programmer's interface to CPAN.pm.
However, you might not need to do any of this. Why is your program downloading modules on its own? Are you trying to create a distribution that has dependencies? There are better ways to handle that so you don't have to repeat the work that's already done in other tools. For instance, see my article Creating Perl Application Distributions. You treat your program as if it's a module and get the benefit of all the cool module tools so you don't have to reinvent something.
If you tell us more about the problem that you're actually trying to solve, we might have other good answers too. :)
Good luck,
the perldoc page on CPAN.pm confuses the hell out of me.
Yes, documentation of the CPAN API is still a bit lacking. It wasn't ever really designed for programmatic use by others. You might have better luck with CPANPLUS, if that's available to you.
My program needs to download a couple of modules if they don't already exist. Is this basically just: CPAN::Shell->install("Module::Name::Here");
Yes, that's pretty much it for the simplest possible thing. In fact, that's pretty much all the 'cpan' command line program does when you type "cpan Module::Name::Here". However, you will need to have CPAN.pm configured in advance.
Does that download the package, unarchive it, and install it?
Yes, all three.
I would like it to make sure it doesn't try to re-install anything if the package is already there - is this the default behavior of the function or no?
Yes, the default behavior is not to install anything if the module is up to date. You can actually check that yourself with the "uptodate()" method like this:
my $mod = CPAN::Shell->expand("Module", "Module::Name::Here");
$mod->install unless $mod->uptodate;
And how can I tell if Perl couldn't connect to CPAN to get the package?
That's hard to do programmatically in a way that would be simple to explain. You either need to look at the output or else just check $mod->uptodate afterwards:
my $mod = CPAN::Shell->expand("Module", "Module::Name::Here");
if ( ! $mod->uptodate ) {
    $mod->install;
    die "Problems installing" unless $mod->uptodate;
}
Best of luck!
Basically using CPAN is the following:
perl -MCPAN -e shell
if this is the first time you are running it, it will ask you a few questions and save the results in a configuration file.
then to install PGP::Sign just type:
install PGP::Sign
and you're set.
As for your last question, don't worry, it will tell you whether it can connect or not.
As you can tell, most of us only use CPAN.pm in interactive mode. However, you're on the right track.
Things I can point out for the moment:
Yes, calling CPAN::Shell->install() will download, compile, test and install a package. It should also do the same for any dependencies the package has, recursively.
The default behaviour is to not install anything which is already installed (unless a newer version is available).
I'm not strictly sure how the error handling works - I'll look into it, and report back.
It might prompt your user, though.
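If the goal is only "install it when it's missing", a sketch along these lines checks whether the module can be loaded before involving CPAN.pm at all. The helper names module_available and ensure_module are made up for illustration; the install step assumes CPAN.pm has already been configured, as mentioned in the other answers, and it is not exercised in the demo at the bottom.

```perl
use strict;
use warnings;

# Return true if $module can be loaded from the current @INC.
sub module_available {
    my ($module) = @_;
    # Accept only plain package names so the string eval below stays safe.
    return 0 unless defined $module
        && $module =~ /\A[A-Za-z_]\w*(?:::\w+)*\z/;
    my $loaded = eval "require $module; 1";
    return $loaded ? 1 : 0;
}

# Install $module through CPAN.pm only when it is not already loadable.
sub ensure_module {
    my ($module) = @_;
    return 1 if module_available($module);

    require CPAN;    # loaded lazily; needs an existing CPAN.pm configuration
    CPAN::HandleConfig->load;
    CPAN::Shell->install($module);

    # Following the advice in this thread, re-check afterwards rather than
    # trying to parse CPAN.pm's output.
    return module_available($module);
}

# Demo: only the availability check runs here, nothing is installed.
for my $name ('File::Spec', 'No::Such::Module::Hopefully') {
    printf "%-30s %s\n", $name,
        module_available($name) ? 'available' : 'missing';
}
```

Note that this only tests whether the module loads, not whether a newer version exists on CPAN; the uptodate() method shown earlier covers that case.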
Keltia has it right. I'll add that his first instruction is done from the command prompt, usually as root, but not necessarily so. The second command is done from the CPAN prompt. You can also do it all on the command line, but I usually don't.
If you're using Windows, your best bet is to use PPM, but its repositories are annoyingly out of date most of the time.