Merge 2 pdf files and preserve forms - forms

I'd like to merge at least 2 PDF files into one while preserving all the form elements in the original PDFs. The form elements include text fields, radio buttons, check boxes, drop down menus and others. Please have a look at this sample PDF file with forms:
http://foersom.com/net/HowTo/data/OoPdfFormExample.pdf
Now try to merge it with any other arbitrary PDF file.
Can you do it?
EDIT: As for the implementation, I'd ideally prefer a command line solution on a linux plattform using open source tools such as 'ghostscript', or any other tool that you think is appropriate to solve this task.
Of course, everybody is welcome to supply any working solution to this problem, including a coded solution that involves writing a script which makes some API calls to a pdf-processing library. However, I'd suggest to take the path of least resistance first (CMD Solution).
Best Regards
EDIT #2: Well there are indeed several CMD tools that merge PDFs. However, these tools don't seem to, AFAIK, to preserve the forms in the original PDFs! These tools appear to simply just concatenate the printouts of all those PDFs into a single Printout, which is then presented as a single PDF.
Furthermore, If you printout a PDF file with forms into a file, you lose all the forms in it. This clearly not what I'm looking for.

I have found success using pdftk, which is an open-source software that runs on linux and can be called from your terminal.
To concatenate multiple pdfs into one (and preserve form-fillable elements), you can use the following command:
pdftk input1.pdf input2.pdf cat output output-file.pdf

Related

Can VSCode display a binary file if there is an executable that will convert it to an equivalent text file?

I sometimes use VSCode for a Delphi 7 project because I like VSCode's git functionality and for a few other reasons (superior string search, diff, etc).
Delphi 7 is a pain, and to get it to consistently compile I need to convert the dfm files to their binary version (all 2300 of them). This of course makes them unviewable in the diff viewer, or to just open the file?
Is there a setting where if I open that file, it will first pass it through the convert.exe (that's its actual name) util so that it can be viewed as a text? I understand that this might be read-only, which would be sufficient to my needs (though if on save it could just pass it back through, that'd be great too).
I'm having trouble figuring out what exactly to to search for on Google (the keywords seem too generic), but I can imagine some generalized functionality that would work for other environments beyond just Delphi/pascal.

Index PDF files and generate keywords summary

I have a large amount of PDF files in my local filesystem I use as documentation base and I would like to create an index of these files.
I would like to :
Parse the contents of the PDF files to get keywords.
Select the most relevant keywords to make a summary.
Create static HTML pages for some keywords with entries linked to the appropriate files.
My questions are :
Is there an existing tool to perform the whole job ?
What is the most appropriate tool to parse PDF files content, filter (by words size) and counting the words?
I consider using Perl, swish-e, pdfgrep to make a script. Do you know other tools which could be useful?
Given that points 2 and 3 seem custom I'd recommend to have your own script, use a tool out of it to parse pdf, process its output as you please, and write HTML (perhaps using another tool).
Perl is well suited for that, since it excels in processing that you'll need and also provides support for working with all kinds of file formats, via modules.
As for reading pdf, here are some options if your needs aren't too elaborate
Use CAM::PDF (and CAM::PDF::PageText) or PDF-API2 modules
Use pdftotext from the poppler library (probably in poppler-utils package)
Use pdftohtml with -xml option, read the generated simple XML file with XML::libXML or XML::Twig
The last two are external tools which you use via Perl's builtins like system.
The following text processing, to build your summary and design the output, is precisely what languages like Perl are for. The couple of tasks that are mentioned take a few lines of code.
Then write out HTML, either directly if simple or using a suitable module. Given your purpose, you may want to look into HTML::Template. Also see this post, for example.
Full parsing of PDF may be infeasible, but if the files aren't too complex it should work.
If your process for selecting keywords and building statistics is fairly common, there are integrated tools for document management (search for bibliography managers). However, I think that most of them resort to external tools to parse pdf so you may still be better off with your own script.

How to create reports containing text and figures with MATLAB

I am using a MATLAB script to tune the control system on a machine. When the tuning is complete, I would like a report containing text (especially serial number, date/time and the values determined during tuning) and plots, especially transfer functions.
What do to you recommend?
Whatever solution I use should be compatible with the MATLAB compiler so I can distribute my solution to a team of field engineers.
Ideally the report will be a PDF document.
The MATLAB report generator does not seem to be the right product as it appears that I have to break up my script into little pieces and embed them in the report template. My script contains opportunities for the user to intervene and change values or reject the tune if plots don't look right and my hunch is that this will be difficult if the code runs from the report generator. Also, I fear code structure and maintainability will be lost if the code structure is determined by the requirements of the report template.
Please comment if my assumptions are wrong.
UPDATE
I have now switched to use the MATLAB Report Generator with release r2016b and it is working very well for my compiled code users. Unfortunately it means that colleagues who have a MATLAB licence need to buy the Report Generator too, to use my tools scripted.
As the MATLAB Report Generator's development manager, I am concerned that this question may leave the wrong impression about the Report Generator's capabilities.
For one thing, the Report Generator does not require you to break a script up into little pieces and run them inside a template. You can do this if you choose and in some circumstances, it makes sense, but it is not a requirement. In fact, many Report Generator applications use a MATLAB script or program to interact with a user, generate data in the MATLAB workspace, and as a final step, generate a report from the workspace data.
Moreover, as of the R2014b version, the MATLAB Report Generator comes with a document generation API, called the DOM API, that allows you to embed document generation statements in a MATLAB program. For example, you can programmatically create a document object, add and format text, paragraphs, tables, images, lists, and subdocuments, and output Microsoft Word, HTML, or PDF output, depending on the output type you select. You can even programmatically fill in the blanks in forms that you create, using Word or an HTML editor.
The API runs on Windows, Linux, and Mac platforms and generates Word and HTML output on all three, without the use of Word. On Windows, it uses Word under the hood to produce PDF output from the Word documents that it generates.
The latest release of the MATLAB Report Generator introduces a PowerPoint API with capabilities similar to the DOM API. If you need to include report generation in your MATLAB application, please don't rule out the MATLAB Report Generator based on past impressions. You may be surprised at just how powerful it has become.
I've done this quite a bit. You're right that MATLAB Report Generator is typically not a great solution. #Max suggests the right approach (automating Word through its COM interface), but I'd add a few extra comments and tips, based on my experiences.
Remember that if you're going with this solution, you are depending that your end-users will be running Windows, and have a copy of Office on their machine. If you want to ultimately produce a PDF report, that will need to be Office 2010 or above.
I would bet that you'll find it easier to automate the report generation in Excel rather than Word. Given that you're producing a report from MATLAB, you'll likely be wanting quite a lot of things in tables of numbers, which are easier to lay out in Excel.
If you are going to do it in Word, the easiest way is to first (without MATLAB) create a template .doc/.docx file, which contains any generic text that will be the same for all reports and blank tables for any information. Turn on track changes, and insert empty comments at each point that you will be filling in information. Then within your report creation routine in MATLAB, connect to Word and iterate through each comment, replacing it with whatever data you wish.
If you are learning to automate Excel from MATLAB, this page from the Excel Interop documentation is really helpful. There's an equivalent one for Word.
Unlike #Max, I've never had good results by saving figures to an .emf file and then inserting them. In theory that does preserve editability, but I've never found that valuable. Instead, get the figure looking right (and the right size) in MATLAB, then copy it to the clipboard with print(figHandle, 'dbitmap') and paste to Excel with Worksheet.Range('A1').PasteSpecial.
To save as a PDF, use Workbook.ExportAsFixedFormat('xlTypePDF', pathToOutputFile).
Hope that helps!
I think you are right about the report generator.
In my opinion the fastest/easiest approach would be to generate the report in a html document. For that you just need the figures and write a text file, conversion should be trivial.
Quite similar approach would be to create a Latex file. And then create a pdf from it - though for this you'd need to install latex on your deployed machines.
Lastly you could use the good integration of Java in Matlab. There are several libraries you could use - like this. But I wonder if all the complication will be worth it.
Have you considered driving Microsoft Word through its ActiveX interface? I've done this in compiled Matlab programs and it works well. Look at the Matlab help for actxserver(): The object you want to create is of type Word.Application.
Edit to add: To get figures into the document, save them as .emf files using the -dmeta argument to print(), then add them to the document like this:
WordServer.Selection.InlineShapes.AddPicture(fileName);

Screen scraping: Automating a vim script

In vim, I loaded a series of web pages (one at a time) into a vim buffer (using the vim netrw plugin) and then parsed the html (using the vim elinks plugin). All good. I then wrote a series of vim scripts using regexes with a final result of a few thousand lines where each line was formatted correctly (csv) for uploading into a database.
In order to do that I had to use vim's marking functionality so that I could loop over specific points of the document and reassemble it back together into one csv line. Now, I am considering automating this by using Perl's "Mechanize" library of classes (UserAgent, etc).
Questions:
Can vim's ability to "mark" sections of a document (in order to
perform substitutions on) be accomplished in Perl?
It was suggested to use "elinks" directly - which I take to mean to
load the page into a headless browser using ellinks and perform Perl
scripts on the content from there(?)
If that's correct, would there become a deployment problem with
elinks when I migrate the site from my localhost LAMP stack setup to
a hosting company like Bluehost?
Thanks
Edit 1:
TYRING TO MIGRATE KNOWLEDGE FROM VIM TO PERL:
If #flesk (below) is right, then how would I go about performing this routine (written in vim) that "marks" lines in a text file ("i" and "j") and then uses that as a range ('i,'j) to perform the last two substitutions?
:g/^\s*\h/d|let#"=substitute(#"[:-2],'\s\+and\s\+',',','')|ki|/\n\s*\h\|\%$/kj|
\ 'i,'js/^\s*\(\d\+\)\s\+-\s\+The/\=#".','.submatch(1).','/|'i,'js/\s\+//g
I am not seeing this capability in the perldoc perlre manual. Am I missing either a module or some basic Perl understanding of m/ or qr/ ??
I'm sure all you need is some kind of HTML parser. For example I'm using HTML::TreeBuilder::XPath.

How can I search and replace in a PDF document using Perl?

Does anyone know of a free Perl program (command line preferable), module, or anyway to search and replace text in a PDF file without using it like an editor.
Basically I want to write a program (in Perl preferably) to automate replacing certain words (e.g. our old address) in a few hundred PDF files. I could use any program that supports command line arguments. I know there are many modules on CPAN that manipulate or create pdfs but they don't have (that I've seen) any sort of simple search and replace.
Thanks in advance for any and all advice!!!
Take a look at CAM::PDF. More specifically the changeString method.
How did you generate those PDFs in the first place? Search-and-replace in the original sources and re-generate PDFs seems to be more viable. Direct editing PDFs can be very difficult, and I'm not aware of any free tools that can do it easily.