Is there a way to catenate specific pages from multiple pdfs using pdftk? - pdftk

Say I have a few PDFs: some with a single page, some with 3, 4, 5 pages and so on.
I want to extract, say, the 4th page from each PDF (ignoring the ones without a 4th page) and merge those pages into a single PDF.
Tried something like
$ pdftk *.pdf cat 4 output merged.pdf
With this, pdftk gives me a single-page PDF containing only the 4th page of the first input PDF.
I'm wondering whether I should write an elaborate script or if there's an easier way.
I do have a few workarounds where I burst the PDFs and then merge the pages I need, but I'm looking for something simpler.
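A short loop may be enough here. The following is a minimal sketch, not a tested recipe: it assumes pdftk is on the PATH, relies on the "NumberOfPages:" line that pdftk's dump_data operation prints, and uses two hypothetical helper names (pdf_page_count and merge_nth_page) that are not part of pdftk.

```shell
#!/bin/sh
# Sketch: collect page N from every PDF that has at least N pages.
# The function names are made up for this example; only the pdftk
# operations (dump_data, cat, output) come from pdftk itself.

# Print the page count pdftk reports for one file (empty on failure).
pdf_page_count() {
    pdftk "$1" dump_data 2>/dev/null | awk '/^NumberOfPages:/ {print $2}'
}

# Usage: merge_nth_page N OUTPUT FILE...
merge_nth_page() {
    page=$1
    out=$2
    shift 2
    tmpdir=$(mktemp -d) || return 1
    i=0
    for f in "$@"; do
        count=$(pdf_page_count "$f")
        # Skip files that are unreadable or have fewer than N pages.
        [ -n "$count" ] && [ "$count" -ge "$page" ] || continue
        i=$((i + 1))
        # Zero-padded names keep the glob below in input order.
        pdftk "$f" cat "$page" output "$(printf '%s/%06d.pdf' "$tmpdir" "$i")"
    done
    # Merge only if at least one page was extracted.
    if [ "$i" -gt 0 ]; then
        pdftk "$tmpdir"/*.pdf cat output "$out"
    fi
    rm -rf "$tmpdir"
}
```

It could then be called as merge_nth_page 4 merged.pdf *.pdf. If merged.pdf from an earlier run is still in the same folder, the *.pdf glob will pick it up as an input, so writing the output to a different directory is probably safer.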

Related

Merge 2 pdf files and preserve forms

I'd like to merge at least 2 PDF files into one while preserving all the form elements in the original PDFs. The form elements include text fields, radio buttons, check boxes, drop down menus and others. Please have a look at this sample PDF file with forms:
http://foersom.com/net/HowTo/data/OoPdfFormExample.pdf
Now try to merge it with any other arbitrary PDF file.
Can you do it?
EDIT: As for the implementation, I'd ideally prefer a command-line solution on a Linux platform using open-source tools such as 'ghostscript', or any other tool that you think is appropriate to solve this task.
Of course, everybody is welcome to supply any working solution to this problem, including a coded solution that involves writing a script which makes some API calls to a pdf-processing library. However, I'd suggest to take the path of least resistance first (CMD Solution).
Best Regards
EDIT #2: Well, there are indeed several CMD tools that merge PDFs. However, AFAIK these tools don't seem to preserve the forms in the original PDFs! They appear to simply concatenate the printouts of all those PDFs into a single printout, which is then presented as a single PDF.
Furthermore, if you print a PDF file with forms to a file, you lose all the forms in it. This is clearly not what I'm looking for.
I have found success using pdftk, an open-source tool that runs on Linux and can be called from your terminal.
To concatenate multiple pdfs into one (and preserve form-fillable elements), you can use the following command:
pdftk input1.pdf input2.pdf cat output output-file.pdf

ImageMagick: How to batch append 4 parts of images into one (2 rows, 2 columns) (I have 500+ images that need to be combined like this)

Hi everyone!
I am using ImageMagick-7.0.10-Q16 on Windows 10. I've tried Googling for answers, but I'm still very confused about how to do this. Most of the answers I found were for UNIX rather than Windows, were beyond my understanding, or gave me errors. I don't have any experience with coding or Windows PowerShell, so forgive my slowness.
I have scanned pages of books that have been split into four jpg files each. The images are named after the orientation of the piece and the page number. BL=Bottom left. BR=Bottom right. TR=Top right. TL=Top left. (BM=Bottom pieces merged. TM=Top pieces merged). So "BL0001.jpg" is the bottom-left piece of page 1. I'm not mentioning their sizes because I don't want them to be resized. I just want them to be combined via append, like a puzzle, like this:
Combined jpg pieces.
The borders and the text-boxes there are just to demonstrate, and are not to be included
So the files are for example like this:
BL0001.jpg
BR0001.jpg
TL0001.jpg
TR0001.jpg
BL0002.jpg
BR0002.jpg
TL0002.jpg
TR0002.jpg
And so on...
This was the last thing I’ve tried in Windows PowerShell:
magick convert B*0001.jpg +append 0001BM.jpg
magick convert T*0001.jpg +append 0001TM.jpg
magick convert 0001*.jpg +swap -append 0001merged.jpg
This combines the 4 parts into one image just like I want. I found out that * works as a wildcard and merges the matching pieces together in one go. But I can't use a wildcard for the page number (in this case '0001' in 'B*0001.jpg'), because that would merge all the files in the folder into the same image, which I don't want. So what I want to figure out is how to "batch" run this command with sequential numbering for the different pages. In other words, I want a command that combines the pieces of each scanned page into one image, for every page in the folder. I know the commands above create additional files with the merged top and bottom parts before the final merge, but I don't know how to write the command otherwise. I'm willing to try other commands or approaches too.
Using ImageMagick v7 in a simple Windows BAT script you could do something like this...
@echo off
setlocal EnableDelayedExpansion
for /l %%n in ( 1 1 9999 ) do (
set V1=000%%n
set V1=!V1:~-4!
magick *!V1!.jpg +append -crop 2x1@ +swap -append +repage !V1!merged.jpg
)
exit /b
That uses a "for" loop to read all four "*0001.jpg" images at a time into an ImageMagick command. The "set V1=" lines are to make sure the variables have the correct number of leading zeros.
The IM command appends, crops, and appends the four images into the properly ordered output, and writes the image as "0001merged.jpg". Then it moves on to process "*0002.jpg" and so on.
I put a top limit on the number of image sets to process with that "9999" in the "for" command to work with the number of leading zeros. Make sure that number is the same or more than the number of image sets you have. It will just print an error for each loop after it goes over the number of image sets, but no harm done.
Note: Using ImageMagick v7 you should just use "magick" because when you use "magick convert" it emulates IMv6 behavior. You probably won't usually want that.
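For readers on Linux, macOS, or a POSIX shell on Windows (Git Bash, WSL), the same batch job could be sketched as below. This is a variation, not the BAT script above: it assumes the four pieces are named TL, TR, BL and BR followed by the page number, it loops over the TL files that actually exist instead of counting to 9999, and it names the pieces explicitly with parenthesised +append groups rather than using the crop-and-swap trick. The function name merge_quadrants is made up for the example.

```shell
#!/bin/sh
# Sketch: merge TL/TR/BL/BR pieces of every page found in a folder.
# Assumes ImageMagick v7 ("magick") is on the PATH.

# Usage: merge_quadrants [DIR]   (DIR defaults to the current folder)
merge_quadrants() {
    dir=${1:-.}
    for tl in "$dir"/TL*.jpg; do
        # If nothing matches, the pattern stays literal; skip it.
        [ -e "$tl" ] || continue
        # Derive the page number, e.g. ./TL0001.jpg -> 0001
        n=${tl##*/TL}
        n=${n%.jpg}
        # Skip pages that do not have all four pieces.
        for part in TR BL BR; do
            [ -e "$dir/$part$n.jpg" ] || {
                echo "skipping page $n: missing $part$n.jpg" >&2
                continue 2
            }
        done
        # Top row and bottom row are joined side by side (+append),
        # then the two rows are stacked vertically (-append).
        magick \( "$dir/TL$n.jpg" "$dir/TR$n.jpg" +append \) \
               \( "$dir/BL$n.jpg" "$dir/BR$n.jpg" +append \) \
               -append "$dir/${n}merged.jpg"
    done
}
```

Calling merge_quadrants in the folder with the scans would write 0001merged.jpg, 0002merged.jpg and so on next to the pieces, without creating the intermediate BM and TM files.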

Combining pdftk strings for specific pages

I've checked "Similar questions" and did a lot of searching, but I can't seem to find a way to combine the snippets I already figured out; it would be awesome if someone is able to help.
I'm using pdftk, alternatively running it through PowerShell.
I have two .pdf files (e.g. A=1000 pages, B=5000 pages) which I need to combine in a specific way to generate a new .pdf file. In detail, I need page 1-3, 4-6[...] of file A merged with page 1-4, 4-8[...] of file B with a blank page between 1-3 & 4-6.
So far I have figured out how to burst the files, add a blank page and combine them into a new .pdf file. Yet I'm only able to do that for one needed document at a time (a new file with 8 pages).
pdftk fileC.pdf fileD.pdf cat output fileE.pdf
pdftk A=fileE.pdf B=blankpage.pdf cat A1-1 B1-1 A2-4 output conclusion.pdf
Now I'm wondering if there's a way to output the complete file with a command? Otherwise I'd have to do it for every merge of two long files.
Thanks in advance!
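One possible approach is to let a small script generate the long page-range list and pass it to a single pdftk call. The sketch below rests on an assumed layout that the question does not spell out: each output document is 3 pages from A, then the blank page, then 4 pages from B. The handle C for the blank page and the function name build_ranges are also made up for the example, so the arithmetic will need adjusting if the real layout differs.

```shell
#!/bin/sh
# Sketch: print a pdftk "cat" range list for an ASSUMED layout of
# 3 pages from A, one blank page (handle C), then 4 pages from B,
# repeated CHUNKS times.

# Usage: build_ranges CHUNKS
build_ranges() {
    chunks=$1
    k=0
    ranges=
    while [ "$k" -lt "$chunks" ]; do
        a_start=$((k * 3 + 1))
        a_end=$((a_start + 2))
        b_start=$((k * 4 + 1))
        b_end=$((b_start + 3))
        ranges="$ranges A$a_start-$a_end C1 B$b_start-$b_end"
        k=$((k + 1))
    done
    # Drop the leading space before printing.
    printf '%s\n' "${ranges# }"
}
```

For two chunks this prints A1-3 C1 B1-4 A4-6 C1 B5-8. It could be used as pdftk A=fileA.pdf B=fileB.pdf C=blankpage.pdf cat $(build_ranges 333) output conclusion.pdf, where the command substitution is left unquoted on purpose so that each range becomes its own argument. The file names fileA.pdf and fileB.pdf and the chunk count 333 are placeholders.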

How to export a PDF with figures on multiple pages?

I am trying to export a large number of Matlab figures that are generated in a for loop to a single PDF file. Right now the best thing I could come up with is to print them all to a PostScript file using the -append option like this:
print('Temp_Plots','-dpsc','-append')
After that I could convert the PS file to a PDF file. This workflow was okay until I started to use plots with two y-axes. Unfortunately it seems like Matlab's PS export cannot properly handle this situation and does not color the lines appropriately.
As there is no -append option for the direct PDF export, what other methods do I have to append all my plots to a single file without losing the assigned colors or running into other hiccups?
I would recommend trying out the publish command and push that to its limits first.
Following the documentation:
options = struct('format','pdf','outputDir','C:\myPublishedOutput');
publish('myCode.m',options);
Take a look at Publishing Markup to see how to get the look you want.
This search brings up some possibly related posts, but none that I saw that directly match your issue.
References:
1. Publishing Markup (Mathworks)
2. Output Preferences for Publishing (Mathworks)
3. Publishing M-Files in MATLAB
4. Publish Your Work in Matlab

MATLAB How to delete a specific page from a .pdf File?

I recently learned how to download .pdf files using urlwrite, but I was wondering if there is any way to specify which pages of the .pdf to save.
The files are always either 1 or 2 pages long, and I only want to keep the first page of the .pdf. Is there any way to directly download just the first page, and if not, is there a way to download the entire .pdf and then get rid of the 2nd page?
I know that it is possible to manually get rid of the second page in Preview, Adobe Acrobat and other applications, but it'd make things a lot easier if I could automate the process in MATLAB.
Any help would be greatly appreciated!
Find an appropriate command line tool (example uses pdftk), and then you can make a call to it from MATLAB. Use sprintf to assemble the appropriate command and then pass it to system. This puts the output in a temporary file then uses movefile to change the filename back:
temp = 'sometempfile.pdf';
urlwrite(someurl, filename);
system(sprintf('pdftk %s cat 1 output %s dont_ask',filename,temp));
movefile(temp, filename);
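The same pdftk step can also be run directly from a shell, for instance to test it before wiring it into MATLAB. A minimal sketch, assuming pdftk is on the PATH; the function name keep_first_page is made up for the example, and the download itself is left out:

```shell
#!/bin/sh
# Sketch: replace a PDF with a copy that contains only its first page.

# Usage: keep_first_page FILE
keep_first_page() {
    f=$1
    # Write to a fresh temp folder so pdftk never sees an existing output.
    tmpdir=$(mktemp -d) || return 1
    status=0
    if pdftk "$f" cat 1 output "$tmpdir/page1.pdf"; then
        # Only overwrite the original once pdftk has succeeded.
        mv "$tmpdir/page1.pdf" "$f"
    else
        status=1
    fi
    rm -rf "$tmpdir"
    return "$status"
}
```

If pdftk fails, the original file is left untouched and the function returns a non-zero status.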