Merging PDF files on Windows with Perl

I need a way to merge PDF files on Windows using Perl. It has to be Perl because it is part of my script that organizes a directory on a Windows server. Any ideas?

See this closely related question: How can I merge PDF files with Perl?
If the CAM::PDF module doesn't suit you (for example, if you can't get it working in your Windows environment), the pdftk tool mentioned there is also available for Windows (see Installing pdftk). You can drive it from your Perl scripts.
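For a rough idea, calling pdftk from a Perl script might look something like this (the filenames are placeholders, and it assumes pdftk.exe is somewhere on your PATH):

use strict;
use warnings;

# Placeholder input and output names; adjust to your directory layout.
my @inputs = ('part1.pdf', 'part2.pdf');
my $output = 'merged.pdf';

# pdftk's "cat" operation concatenates the input files into one PDF.
system('pdftk', @inputs, 'cat', 'output', $output) == 0
    or die "pdftk failed: exit status $?\n";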
Please have a look at the PDF Processing with Perl article for other options.

It's not trivial to write a program that parses two PDF files, manipulates them, and writes them back out as a single merged file. But if I were to dive into the task, I would probably use the CPAN module PDF::API2. It seems to be one of the most complete and robust PDF modules on CPAN, though not necessarily the simplest to figure out. There are other PDF modules under the PDF::* hierarchy on CPAN, and some of them may provide just enough functionality for you, with less of a learning curve.
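If you go that route, a merge might be sketched roughly like this (the filenames are placeholders, error handling is omitted, and I'm assuming a reasonably current PDF::API2):

use strict;
use warnings;
use PDF::API2;

my $merged = PDF::API2->new();
for my $file ('first.pdf', 'second.pdf') {
    my $source = PDF::API2->open($file);
    # import_page() copies a page and its resources into the target
    # document, appending it at the end.
    $merged->import_page($source, $_) for 1 .. $source->pages();
}
$merged->saveas('merged.pdf');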
But let me suggest something else: if you can find a ready-made tool that merges PDF files, you could have Perl send the files through that program and retrieve the results. This might be a simpler approach, and one you can be reasonably certain already works (as opposed to spending a lot of time debugging your own solution). Your existing Perl script could simply drive an external program that already has the capability you need.

Related

Need a solution for file conversion in Progress 4GL from .htm to .xlsx format

I have to load a .htm file and save it in .xlsx format through automation using Progress. I need a solution for this.
AFAIK there is nothing in Progress that will help you do this. If I had to do this, I'd look to Apache POI, which is capable of creating .xlsx fairly cleanly and has a reasonable learning curve, although it is picky about data coming into it and its error messages are typically obtuse. Using Progress directly, you could parse the .html (painfully), but creating .xlsx yourself is probably unrealistic. So, I'd also hunt around for any tools that can do this directly. Good luck.
What O/S and Progress version are you using?
If the HTML is well-formed, perhaps you could use XML parsing to assist. I have never tried this, though.
As for writing an Excel document, there are many approaches. If you are running on a Windows box that has Excel on it, there are solutions that allow you to call the Excel libraries from inside Progress. If you need something portable, your choices are fewer. We use ABL_xks.i, which works under both Linux and Windows. It uses the native libraries under Windows and produces an Excel XML spreadsheet under Linux.
As I recall, there is a library that allows you to edit an OpenOffice template (Word or Excel) from within Progress. (I would have to go searching for it, but a good place to start looking might be the OpenEdge Hive.) And there are several commercial packages (especially report generators) that range from using the OO template technique to full automation of Excel output.
If this doesn't point you in the right direction, fill us in with some more details about what you want to do.

Perl CPAN-style Packaging with no lib/*.pm

I've got a collection of Perl scripts and a couple of XML data files that they depend on, which I'd like to distribute. Currently, I've got a shell script that copies bin/* and share/* to a target installation tree. That seems a little clunky, so I'd like to go with something like the standard CPAN way of packaging Perl.
Does it make sense to bundle what I've got in a CPAN-style package? I suspect there is nothing wrong with it, but every tutorial I've looked at treats lib/Blah.pm as an essential file in any package - I don't even have a lib/ directory, let alone any .pm files.
Is there a standard solution for packaging a collection of Perl scripts, along with some data in a share/ directory?
Distributions don't care about modules. Most of the tools are set up to handle modules by default because that's the common case, but you can really distribute anything, as long as you provide the logic that tells the build files what to do with whatever files you provide.
ExtUtils::MakeMaker is difficult to use for this sort of thing, but Module::Build (despite the word "Module") makes it much easier. However, you have to know a bit about custom Module::Build subclasses so you can override the default behavior that you don't want.
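As a rough sketch, a Build.PL for a module-less distribution might look something like this (the distribution name and file layout are placeholders, and share_dir needs a reasonably recent Module::Build plus File::ShareDir to read the data back at runtime):

use strict;
use warnings;
use Module::Build;

my $build = Module::Build->new(
    dist_name     => 'My-Scripts',                        # placeholder
    dist_version  => '0.01',
    dist_abstract => 'Scripts plus their XML data files',
    dist_author   => 'you@example.com',                   # placeholder
    license       => 'perl',
    script_files  => [ glob('bin/*') ],   # install everything under bin/
    share_dir     => 'share',             # install share/ as dist-level shared data
);
$build->create_build_script();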
If you are talking about standalone scripts, you can look at my scriptdist distribution, or the Dr. Dobb's article I wrote about it. It won't handle the share/ portion for you, but it's not too hard to add.

Simple example of a batch file and Windows Scheduler

I need to create a batch file that will copy web log files from a web server to a local desktop box on a daily basis.
I'm a web developer, but I'd like to take a stab at learning the process for creating a batch file, and I think using the Windows scheduler should get me where I need to go.
In any case, I'm just looking for a jumping off point.
I understand the premise behind a batch file (echo to print info, commands to cause actions such as mkdir or move, etc.), but some straightforward tutorials would be great.
Or even a reference guide such as devguru.com or 4guysfromrolla.com would be helpful.
Thanks,
Creating a batch file is relatively straightforward.
Just type out the commands you want as you would in the command shell, and save the file with a .bat extension.
There's a simple example here that you may find useful. Note, you can use any editor to create your batch file, as long as it saves in a text format.
Depending on which version of Windows you're using, the process to create a scheduled task is slightly different:
Windows XP
Windows Vista
Edit: A little follow-up on misteraiden's answer.
Essentially, what you're looking for is scripting functionality. There are a variety of tools available. A batch file is the simplest form of scripting that Windows supports. You could, for example, write scripts in PowerShell or Python. Both are more powerful and flexible scripting languages. Depending on what the requirements are for your script, and what you feel like learning, they may be more appropriate.
However, if all you want to do is copy files, the simplest, easiest place to start is a batch file.
This is a little left of field, but using an XML-based build tool such as NAnt could come in handy here. It's probably overkill for what you are trying to do, but if you learn it now, you'll be able to apply it in many different places.
You could use Windows Scheduler to trigger the build, which would then carry out various operations such as deleting, copying, and logging on to network shares.
To use this, though, you would probably need to learn more about the command line and command-line programming.
Either way, I recommend you check out some of the NAnt examples that deal with copying and other basics.
One of the best references I have found, other than the Microsoft site mentioned in an earlier answer, is http://www.robvanderwoude.com/batchfiles.php. I have been using it for many of the issues I have had, and to learn more. Since you already have the premise of how batch files work, I think it will work out well for you.

How to distribute native perl scripts without custom module overhead?

How can someone distribute native (non-"compiled/perl2exe/...") Perl scripts without forcing users to be aware of the custom (non-CPAN) modules that the scripts need in order to run?
The problem is that users will inevitably copy the script somewhere else on the system, taking it out of its native environment, and then it can no longer find the modules it needs to run.
I've sometimes settled for just copying the module into the actual script, but I'd prefer a cleaner solution.
Update: I'd better clarify. I distribute a bunch of scripts which happen to use similar modules in the back end. The users understand how to run Perl scripts, but rather than relying on telling them "no, don't move the script", I'd prefer to simply allow them to move the files. The path of least resistance.
The right way is to tell them "Don't do that!" I would hope that they wouldn't expect to move an exe file and have the program continue to work. This is no different.
That said, there are a couple of alternatives. One is replacing the script with a wrapper (e.g. pl2bat) that knows the full path to the real script. Another is to use PAR, but that would require PAR and/or parl (from PAR::Packer) to be installed.
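A stripped-down sketch of such a wrapper (the install path and script name are only placeholders):

#!/usr/bin/perl
use strict;
use warnings;

# Thin stub that users can copy anywhere; the real script and its
# modules stay in one fixed, known location.
my $app_home = 'C:/myapp';    # assumed install location

exec $^X, "-I$app_home/lib", "$app_home/bin/real_script.pl", @ARGV
    or die "Cannot run $app_home/bin/real_script.pl: $!\n";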
If a script that you prepared for a client needs "custom" modules, simply pack your modules as if you were trying to upload them to CPAN. Then give the package to the client, and he can use the cpan utility to install the script and the modules.
Distribute an installer along with the script. The installer will need to be run with root privileges and it will put the custom modules into the standard system location (/usr/local/lib/perl5/site_perl or whatever).
I've not tried this, but Module::Install looks helpful in this regard. It's described as a:
Standalone, extensible Perl module installer
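I haven't tried it for this either, but a Makefile.PL using it might look roughly like the following (the distribution name, script name, and prerequisites are placeholders):

use inc::Module::Install;

name           'My-Scripts';        # placeholder distribution name
version        '0.01';
license        'perl';

requires       'Some::Module';      # any CPAN prerequisites you have

install_script 'bin/myscript.pl';   # placeholder script name
install_share  'share';             # installs the share/ directory

WriteAll;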
As a variant of the "put your modules all in one place and make your applications aware of it" approach that will even work across multiple computers and networks, maybe you should check out PAR::Repository and its counterpart PAR::Repository::Client. You'd just provide a single binary client executable that connects to the repository (via the file system or https) and executes any of the arbitrary number of programs (using an arbitrary set of modules) provided by the repository that the user asks for.
If there are many users, this also has a maintenance benefit: simply update the software provided by the repository, and the users will start using the updated code the next time they start the programs.
It would be really nice if you could just use a NeXTSTEP style application bundle. Since you probably aren't developing for a platform that uses bundles, you'll have to make do.
Put all support files in a known location, and point your executable at those files for access to settings and libraries. The easiest way to do this is with a simple installer.
For example, with an app called foo, put all your required files in /opt/xlyd_apps/foo: libraries in /opt/xlyd_apps/foo/lib, configuration in /opt/xlyd_apps/foo/etc, and so on. Put the executable in /opt/xlyd_apps/foo/bin.
The important thing is to make sure the executable knows to look in /opt/xlyd_apps/foo for all its goodies, so that if the customer/client moves the foo script to a new location, the installation still works.
So, while you can't make the whole thing self contained and relocatable, you have made the actual calling script relocatable.
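In Perl terms, the foo script would start out with something along these lines (the paths follow the example above; the config file name is hypothetical):

use strict;
use warnings;
use lib '/opt/xlyd_apps/foo/lib';    # bundled modules live here

my $etc_dir = '/opt/xlyd_apps/foo/etc';    # configuration lives here
my $config  = "$etc_dir/foo.conf";         # hypothetical settings file
die "Missing configuration file: $config\n" unless -e $config;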
I've actually come up with my own solution, and I'm kind of curious what kind of reception it will have.
I've written a script that reads a Perl script and looks for "use/require" statements. Upon finding them, it checks whether the module is part of the standard library (by looking for /perl5/\d+.\d+[\d.]+/ in the module path) and then rewrites the use/require line in two different ways, depending on usage.
If require is found:
{
... inline the entire module here...
}
If use is found:
BEGIN {
... inline the entire module here...
}
If use has imports, immediately follow above with:
BEGIN { import Module ...imports seen... }
I understand this doesn't work with modules that use XS, but I was fine with that; mostly I need to support only pure-Perl modules.
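For what it's worth, the detection step might look roughly like this; the regex is deliberately simplified and ignores strings, POD, and the other corner cases a real tool has to handle:

#!/usr/bin/perl
use strict;
use warnings;

# Report the use/require statements found in the script(s) named on the command line.
while (my $line = <>) {
    next unless $line =~ /^\s*(use|require)\s+([A-Za-z_][\w:]*)([^;]*);/;
    my ($keyword, $module, $imports) = ($1, $2, $3);
    $imports =~ s/^\s+|\s+$//g;
    print "$keyword $module", ($imports ? " (imports: $imports)" : ''), "\n";
}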

How can I create a Zip archive in Perl?

I need to create a Zip archive after filtering the list of files I want to include. Preferably I'd like the module to work in both Windows and Linux.
Since I need to filter the list of files, I don't really want to use an external program. I'd rather not introduce external dependencies either, so that I can compile the script into a single executable on Windows (using the ActiveState PDK).
What I already tried
Until now I've used Archive::Zip from CPAN, but it has a major bug on Windows machines that use non-ASCII filenames: the filenames get corrupted in the archive because they don't get translated into Unicode.
There is a bug report filed for it, but it hasn't been updated in over 10 months, and in the module documentation the developer is rather unhelpful (of the "fix your computer or get rid of Windows" kind).
Update:
Thanks to the clarifications from brian and Alan Haggai Alavi, it seems that enough love is being put into Archive::Zip to get these bugs fixed soon and finally have a fully functioning zip module on Windows.
Although the module documentation says some stupid things about Windows, the current maintainer is Adam Kennedy, the same guy who brought you Strawberry Perl. He's definitely not anti-Windows. He released a version in October, so they are working on it. There's also an open grant from The Perl Foundation to fix Archive::Extract bugs. The bug you mention, RT 35334: Filename Encoding by Archive::Zip, may just need someone to show it some love. That could be you. People solve the problems that bother them, so maybe nobody interested in the module needs this just yet.
The module has had problems, and I've been following its progress since I use it in a couple of projects. It has gotten a lot better recently, and it can certainly use some love. Sometimes open source means helping to fix the problems that you encounter. I know this doesn't help you solve your problem immediately, but that's how I think you're going to get this done, aside from system() calls.
The above-mentioned bug has recently been solved by the addition of Unicode filename support under Windows. A release featuring the fix will be available on CPAN within a week.
You could try the standard-distribution Archive::Extract. It may not be any better than Archive::Zip, but the documentation says that, if there are problems, it goes under the hood to try to use command-line tools on your system to unzip the file. This is probably most robust on Unix, but Windows has a zip archive utility, and it should be accessible via the command line. Plus, Archive::Extract can handle many other types of compression (theoretically).
Of course, it may turn out that Archive::Extract simply figures out what kind of compression the file uses and then passes it to the appropriate other library, which might be Archive::Zip.
You might also try IO::Uncompress::Unzip and its counterpart, IO::Compress::Zip, for just unzipping, reading, and re-zipping, if absolutely necessary. Again, I don't know how much better these will work, but they are all part of the standard library.
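If you do go the IO::Compress::Zip route, creating an archive from a filtered file list can be sketched like this (the glob pattern and the archive name are placeholders):

use strict;
use warnings;
use IO::Compress::Zip qw(zip $ZipError);

# Filter the candidate files however you like, then zip the survivors;
# each file becomes its own member of the archive.
my @files = grep { -f && /\.log\z/ } glob('logs/*');

zip \@files => 'archive.zip'
    or die "zip failed: $ZipError\n";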