How can I merge PDF files with Perl?

Using Perl, how can I combine or merge the sample PDF files into a single PDF file?

CAM::PDF can do this quite easily, and has a simple command-line front end to help. Note: I'm the author of that library. Example:
appendpdf.pl file1.pdf file2.pdf outfile.pdf
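From the SYNOPSIS section of the perldoc (excerpt; call $pdf->cleanoutput('out.pdf') afterwards to write the merged file):
use CAM::PDF;
my $pdf = CAM::PDF->new('test1.pdf');
my $anotherpdf = CAM::PDF->new('test2.pdf');
$pdf->appendPDF($anotherpdf);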

Why do you need to do it from Perl? Chris has already mentioned CAM::PDF.
If you just need to merge them, pdftk (PDF ToolKit) works just fine. It's a simple command line:
pdftk file1.pdf file2.pdf cat output merged.pdf
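If the merge has to be driven from a script rather than typed by hand, the same pdftk command line can be assembled programmatically. The following is only a sketch, assuming pdftk is installed and on the PATH; the function names pdftk_merge_command and merge_pdfs are made up for illustration.

```python
import subprocess
from typing import List, Sequence


def pdftk_merge_command(inputs: Sequence[str], output: str) -> List[str]:
    """Build the argv list for: pdftk in1.pdf in2.pdf ... cat output out.pdf"""
    if not inputs:
        raise ValueError("need at least one input PDF")
    return ["pdftk", *inputs, "cat", "output", output]


def merge_pdfs(inputs: Sequence[str], output: str) -> None:
    """Run pdftk (assumed to be on the PATH) to concatenate the inputs."""
    # check=True raises CalledProcessError if pdftk exits with an error.
    subprocess.run(pdftk_merge_command(inputs, output), check=True)
```

Calling merge_pdfs(["file1.pdf", "file2.pdf"], "merged.pdf") would run exactly the command shown above. Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.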

You can use the Ghostscript utility pdf2ps to convert the PDFs into PostScript files, concatenate the PostScript files, and then use ps2pdf to convert the result back into a PDF.

PerlMonks has a fine discussion of this topic with, well, more than one way to do it.

Related

Convert a folder containing asciidocs and pictures to pdf

I would like to convert the book Mastering the Lightning Network, which is freely available on GitHub, to a PDF for personal use.
Unfortunately, I have only figured out how to "translate" single files using asciidoc or asciidoctor-pdf. The options for folders don't seem to work with the configuration of the repository.
There has to be an easy way to translate everything, including all files and pictures. I would be very thankful if somebody could help me out.
As far as I know, there is no built-in way to convert a whole folder of AsciiDoc files to a single PDF. A simple script could do it, but it would have to decide in what order to convert the files.
The simplest solution is to create your own content.adoc file and use the include directive to select which files to convert and in what order. It could look something like this:
= Mastering the Lightning Network
include::01_introduction.asciidoc[]
include::02_getting_started.asciidoc[]
include::03_how_ln_works.asciidoc[]
include::04_node_client.asciidoc[]
include::05_node_operations.asciidoc[]
include::06_lightning_architecture.asciidoc[]
include::07_payment_channels.asciidoc[]
include::08_routing_htlcs.asciidoc[]
include::09_channel_operation.asciidoc[]
include::10_onion_routing.asciidoc[]
include::11_gossip_channel_graph.asciidoc[]
include::12_path_finding.asciidoc[]
include::13_wire_protocol.asciidoc[]
include::14_encrypted_transport.asciidoc[]
include::15_payment_requests.asciidoc[]
include::16_security_privacy_ln.asciidoc[]
include::17_conclusion.asciidoc[]
Then convert it with:
asciidoctor-pdf content.adoc
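Writing the include list by hand gets tedious for many chapters. Since the chapter files above start with zero-padded numbers, a plain alphabetical sort already yields the right order, so the master file can be generated. This is a sketch under that assumption; the names build_master_document and write_master_document and the glob pattern are illustrative, not part of any tool.

```python
from pathlib import Path
from typing import Iterable


def build_master_document(title: str, chapter_files: Iterable[str]) -> str:
    """Return AsciiDoc text with a title and one include per chapter file."""
    lines = [f"= {title}", ""]  # blank line ends the document header
    for name in sorted(chapter_files):  # zero-padded prefixes sort correctly
        lines.append(f"include::{name}[]")
        lines.append("")  # blank line so included files do not run together
    return "\n".join(lines)


def write_master_document(folder: str, title: str,
                          pattern: str = "[0-9][0-9]_*.asciidoc") -> Path:
    """Write content.adoc into the folder, including every matching chapter."""
    root = Path(folder)
    chapters = [p.name for p in root.glob(pattern)]
    target = root / "content.adoc"
    target.write_text(build_master_document(title, chapters), encoding="utf-8")
    return target
```

After write_master_document(".", "Mastering the Lightning Network"), running asciidoctor-pdf content.adoc in that folder should pick up the chapters in numeric order. Files that do not match the assumed pattern (a preface, appendices) would still need to be added by hand.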
You could try using ImageMagick:
magick *.jpg out.pdf

Cleaning up text files with sed?

I have a bunch of text files that need cleaning up. Example
`E..4B?#.#...
..9J5.....P0.z.n9.9.. ........
.k#a..5
E...y^#.r...J5..
E...y_#.r...J5..
..9.P..n9..0.z............
….2..3..9…n7…..#.yr`
Is there any way sed can do this? Like notice weird patterns?
For this answer, I will assume that you have access to standard Unix/Linux tools.
Your file might be in some word-processor format. If so, the best way to get rid of the junk is to open it with that program. You may be able to find out which with file:
$ file mysteryfile
mysteryfile: Composite Document File V2 Document, Little Endian, Os: Windows, Version 6.1 ....
If that doesn't work, there is a standard Unix utility for extracting text from binary files. It is called strings:
$ strings mysteryfile
Some
Recovered Text
...
The behavior of strings can be fine-tuned with several options. See man strings.
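To show roughly what strings does, here is a small sketch that pulls runs of printable ASCII characters out of binary data. It only approximates the real utility (which has more options and encodings); the function name extract_strings and the default minimum length of 4 are choices made for this example.

```python
import re
from typing import List


def extract_strings(data: bytes, min_len: int = 4) -> List[str]:
    """Return every run of at least min_len printable ASCII bytes in data."""
    # \x20-\x7e covers space through tilde, i.e. the printable ASCII range.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.decode("ascii") for match in pattern.findall(data)]
```

For a whole file, something like extract_strings(open("mysteryfile", "rb").read()) would return the recoverable text fragments as a list.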

Easy way to get source files from doxygen documentation?

I have found the source for a game that I am interested in modifying, and it's in the form of doxygen "documentation".
Under "files" I get html files with names not related to the cpp and h files, and also line numbers.
I was wondering if there is a way to convert the doxygen documentation into regular cpp and h files.
How many files are there? If not too many, a manual copy-paste perhaps...
html2text is a Python program that is pretty good at converting HTML to, well, text.
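If there are too many files for copy-paste, the listing pages can be scraped and the line numbers dropped along the way. The sketch below assumes the pages wrap each source line in a div with class "line" and put the number in a span with class "lineno", which is what recent doxygen versions appear to emit; check the HTML of your pages first, since other versions may lay this out differently. The class and function names are made up for this example.

```python
from html.parser import HTMLParser


class DoxygenSourceExtractor(HTMLParser):
    """Collect the text of each <div class="line">, skipping line numbers."""

    def __init__(self):
        super().__init__()      # character references are decoded for us
        self.lines = []         # recovered source lines
        self._current = None    # text pieces of the line being read
        self._div_depth = 0     # nesting depth of divs inside the line
        self._skip_depth = 0    # > 0 while inside <span class="lineno">

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self._current is None:
            if tag == "div" and "line" in classes:
                self._current, self._div_depth = [], 1
            return
        if tag == "div":
            self._div_depth += 1
        elif tag == "span" and (self._skip_depth or "lineno" in classes):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if self._current is None:
            return
        if tag == "span" and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "div":
            self._div_depth -= 1
            if self._div_depth == 0:
                text = "".join(self._current)
                # A non-breaking space separates the number from the code.
                self.lines.append(text[1:] if text.startswith("\xa0") else text)
                self._current = None

    def handle_data(self, data):
        if self._current is not None and not self._skip_depth:
            self._current.append(data)


def extract_source(html: str) -> str:
    """Return the source code contained in one doxygen listing page."""
    parser = DoxygenSourceExtractor()
    parser.feed(html)
    parser.close()
    return "".join(line + "\n" for line in parser.lines)
```

The original file name usually appears in the page title or heading, so mapping each HTML page back to its cpp or h name would be a separate step.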

How to programmatically convert PostScript to PDF with the fewest steps?

Is there any way to just slap on a header and use a PS file as a PDF, assuming that the PS is very simple and doesn't do anything complicated?
I want to do this programmatically, not using ps2pdf.
Thanks.
You can certainly *try* "just slapping on a header" ... but I don't think you'll get too far :-)
Personally, I'd suggest ps2pdf is the best solution (for example, invoke it with ShellExec() or system()).
But if you want a programmatic solution, ps2pdf is just a wrapper around Ghostscript. Have you considered using the Ghostscript libraries?
You cannot wrap a PostScript file into a PDF file.
Although a PDF file looks similar to a PostScript file,
a PDF file must have a special structure, including a cross-reference
table at the end with file offsets to different parts of the PDF file.
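To make that structure concrete, here is a sketch that writes a minimal one-page (blank) PDF by hand. It is meant only to show the header, the numbered objects, the cross-reference table with byte offsets, and the trailer; the function name build_minimal_pdf and the page size are choices made for this example, and a real document would also need content streams and fonts.

```python
def build_minimal_pdf() -> bytes:
    """Return the bytes of a minimal PDF containing one blank page."""
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792]"
        b" /Resources << >> >>",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where this object starts
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    xref_pos = len(out)  # byte offset of the cross-reference table
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry 0 is always the free-list head
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset  # each entry is exactly 20 bytes

    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)
```

Every offset in the table depends on the exact byte length of everything before it, which is why prepending a header to an existing PostScript file cannot produce a valid PDF.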
To understand the PDF file format you can download the PDF Reference from:
http://partners.adobe.com/public/developer/en/pdf/PDFReference.pdf
If your software generates the PostScript file, maybe you can
extend it to write a PDF file too? It takes some time to understand
the PDF file format but it is not especially difficult if you are familiar with PostScript.
If this is too difficult, then use ps2pdf to do the hard work for you.

How can I search and replace in a PDF document using Perl?

Does anyone know of a free Perl program (command line preferable), module, or any other way to search and replace text in a PDF file without opening it in an editor?
Basically I want to write a program (in Perl preferably) to automate replacing certain words (e.g. our old address) in a few hundred PDF files. I could use any program that supports command-line arguments. I know there are many modules on CPAN that manipulate or create PDFs, but none that I've seen offers a simple search and replace.
Thanks in advance for any and all advice!!!
Take a look at CAM::PDF, specifically the changeString method.
How did you generate those PDFs in the first place? Doing the search-and-replace in the original sources and regenerating the PDFs seems more viable. Editing PDFs directly can be very difficult, and I'm not aware of any free tools that can do it easily.