Batch merging image files to pdf files using perl in windows

I have a bunch of image files in this naming format:
313024_Page_1_Image_0001.png
313024_Page_1_Image_0002.png
313025_Page_1_Image_0001.png
313025_Page_1_Image_0002.png
313025_Page_2_Image_0001.png
I would like to combine the files that share the same number (the part before "_Page_") into a single PDF named after that number. For example, using the five files above:
313024_Page_1_Image_0001.png
313024_Page_1_Image_0002.png
would merge to 313024.pdf
and
313025_Page_1_Image_0001.png
313025_Page_1_Image_0002.png
313025_Page_2_Image_0001.png
would merge to 313025.pdf
I would like to be able to run this as a Perl script on Windows.
Thanks in advance,
Jake

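ImageMagick includes a convert program that will take PNG files and make PDF files from them, e.g.:
$ convert source.png -compress zip source.pdf
You can also append image files vertically into one larger image before converting to PDF (this produces a single tall page):
$ convert {listOfImageFilenames} -append -compress zip verticallyStitchedFilename.pdf
If you want one PDF page per image instead, leave out -append:
$ convert {listOfImageFilenames} -compress zip multiPageFilename.pdf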
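You can run this from a Perl script via system(), or through ImageMagick's Perl API, PerlMagick (Image::Magick).
On Windows, note that the name convert clashes with the built-in convert.exe (a FAT-to-NTFS volume conversion tool), so call ImageMagick's program by its full path. In ImageMagick 7 the command is magick rather than convert.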
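The grouping the question asks for can be sketched in Python (the same logic maps directly onto a Perl hash of arrays plus system()). This is a minimal sketch under stated assumptions: every file follows the `<number>_Page_<p>_Image_<i>.png` scheme, ImageMagick 7 is on the PATH as `magick`, and each image should become its own PDF page. The names `group_by_prefix` and `build_command` are hypothetical.

```python
import re
import subprocess
import sys
from collections import defaultdict
from pathlib import Path

# Assumed naming scheme, e.g. 313024_Page_1_Image_0001.png
PATTERN = re.compile(r"^(\d+)_Page_(\d+)_Image_(\d+)\.png$", re.IGNORECASE)


def group_by_prefix(names):
    """Map each numeric prefix to its file names, ordered by page, then image."""
    groups = defaultdict(list)
    for name in names:
        m = PATTERN.match(name)
        if m:  # files that don't follow the naming scheme are skipped
            prefix, page, image = m.group(1), int(m.group(2)), int(m.group(3))
            groups[prefix].append((page, image, name))
    return {p: [n for _, _, n in sorted(items)] for p, items in groups.items()}


def build_command(prefix, files, exe="magick"):
    """One PDF page per image; -append is omitted so images are not stitched."""
    return [exe, *files, "-compress", "zip", f"{prefix}.pdf"]


if __name__ == "__main__" and len(sys.argv) == 2:
    folder = Path(sys.argv[1])
    names = [p.name for p in folder.iterdir() if p.is_file()]
    for prefix, files in group_by_prefix(names).items():
        cmd = build_command(prefix, files)
        print(" ".join(cmd))
        subprocess.run(cmd, cwd=folder, check=True)
```

Run it as `python merge_pages.py C:\scans` (a hypothetical script name and folder); it prints each command before running it. Sorting on the numeric page and image fields rather than on the raw names keeps `Page_10` after `Page_9`.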

Related

7zip command line powershell extract and view txt compressed

I use a command to extract only the folders whose names contain a specific string, but now I want to know if it is possible to make 7-Zip extract only the .txt files that contain at least one word of my choice. For example:
& 'C:\Program Files\7-Zip\7z.exe' x *.* -o(directory) *.txt (word I want to have inside the .txt, e.g. password)
I used this command as an example of what I want it to look like, but in reality it can only filter by file name.
If you know another way to do this with compressed files other than 7-Zip, I will be happy with any help.
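7-Zip's switches filter members by name only, so one way to filter by content is to read each .txt member before extracting it. A minimal sketch in Python, assuming the archive is a .zip (the standard-library zipfile module cannot open .7z archives; those would need a third-party package such as py7zr). Matching is a case-insensitive substring test, and the text is assumed to be mostly UTF-8. The function name `extract_txt_containing` is hypothetical.

```python
import sys
import zipfile


def extract_txt_containing(archive, word, dest):
    """Extract only the .txt members whose text contains `word` (case-insensitive)."""
    needle = word.lower()
    extracted = []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir() or not info.filename.lower().endswith(".txt"):
                continue
            # Read the member into memory; decode leniently since encodings vary.
            text = zf.read(info).decode("utf-8", errors="ignore")
            if needle in text.lower():
                zf.extract(info, dest)  # keeps the member's folder structure
                extracted.append(info.filename)
    return extracted


if __name__ == "__main__" and len(sys.argv) == 4:
    archive, word, dest = sys.argv[1:]
    for name in extract_txt_containing(archive, word, dest):
        print("extracted", name)
```

Each matching member is read twice (once to test, once to extract), which is simple but means large archives are decompressed more than once.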

Convert a folder containing asciidocs and pictures to pdf

I would like to convert the book Mastering the Lightning Network, which is freely available on GitHub, to a PDF for personal use.
Unfortunately, I have only figured out how to convert single files using asciidoc or asciidoctor-pdf. The options for converting whole folders don't seem to work with the repository's layout.
There has to be an easy way to convert everything, including all files and pictures. I would be very thankful if somebody could help me out.

As far as I know, it is not possible to convert a folder of AsciiDoc files to a PDF directly. A simple script could do it, but the problem is deciding in what order the files should be combined.
The simplest solution is to create your own content.adoc file and use the include directive to choose which files to convert and in what order. It could look something like this:
= Mastering the Lightning Network
include::01_introduction.asciidoc[]
include::02_getting_started.asciidoc[]
include::03_how_ln_works.asciidoc[]
include::04_node_client.asciidoc[]
include::05_node_operations.asciidoc[]
include::06_lightning_architecture.asciidoc[]
include::07_payment_channels.asciidoc[]
include::08_routing_htlcs.asciidoc[]
include::09_channel_operation.asciidoc[]
include::10_onion_routing.asciidoc[]
include::11_gossip_channel_graph.asciidoc[]
include::12_path_finding.asciidoc[]
include::13_wire_protocol.asciidoc[]
include::14_encrypted_transport.asciidoc[]
include::15_payment_requests.asciidoc[]
include::16_security_privacy_ln.asciidoc[]
include::17_conclusion.asciidoc[]
Then convert it with: asciidoctor-pdf content.adoc
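The "simple script" route can be sketched in Python: generate content.adoc from the chapter files, ordered by their numeric prefix, then run asciidoctor-pdf on it. This assumes every chapter follows the `NN_name.asciidoc` pattern shown in the list; anything named differently (a preface, appendices) would have to be added by hand. `build_master` is a hypothetical name.

```python
import re
import subprocess
import sys
from pathlib import Path

# Assumed chapter naming scheme, e.g. 01_introduction.asciidoc
CHAPTER = re.compile(r"^(\d+)_.+\.asciidoc$")


def build_master(title, names):
    """Return a master .adoc document that includes the chapters in numeric order."""
    chapters = []
    for name in names:
        m = CHAPTER.match(name)
        if m:  # anything else (README, images, ...) is skipped
            chapters.append((int(m.group(1)), name))
    lines = [f"= {title}", ""]
    for _, name in sorted(chapters):
        # A blank line after each include stops the last paragraph of one
        # chapter from running into the first line of the next.
        lines += [f"include::{name}[]", ""]
    return "\n".join(lines)


if __name__ == "__main__" and len(sys.argv) == 2:
    repo = Path(sys.argv[1])
    names = [p.name for p in repo.iterdir() if p.is_file()]
    text = build_master("Mastering the Lightning Network", names)
    (repo / "content.adoc").write_text(text, encoding="utf-8")
    subprocess.run(["asciidoctor-pdf", "content.adoc"], cwd=repo, check=True)
```

Sorting on the integer prefix means a chapter 100 would still land after chapter 99, even without extra zero padding.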

You could try using ImageMagick:
magick *.jpg out.pdf

pytesseract results different from tesseract command line results

I am trying to convert a scanned page to text using both pytesseract and the tesseract command line on Ubuntu. The results are remarkably different (pytesseract performs much better than the tesseract command line) and I am unable to understand why. I looked at the default parameter values and tried changing some of them on the tesseract command line (such as psm), but I am unable to get the same result as pytesseract. Due to the lack of proper documentation for pytesseract, I cannot figure out which default parameter values it uses.
Here is my pytesseract code
print(pytesseract.image_to_string(Image.open('test.tiff')))

Looking at the source code of pytesseract, it seems the image is always converted into a .bmp file.
Working with a .bmp file and a psm of 6 on the command line with Tesseract gives the same result as pytesseract.
Also, Tesseract can only read uncompressed BMP files. Hence, if ImageMagick is used to convert the .pdf to .bmp, the following will work:
convert -density 300 -quality 100 mypdf.pdf BMP3:mypdf.bmp
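tesseract mypdf.bmp -psm 6 mypdf txt
(-psm is the Tesseract 3.x spelling; Tesseract 4.0 and later use --psm, e.g. tesseract mypdf.bmp mypdf --psm 6.)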
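The same two steps can be driven from Python to compare against pytesseract, whose `image_to_string` accepts the equivalent option as `config='--psm 6'`. A minimal sketch, assuming ImageMagick's `convert` and Tesseract 4+ are on the PATH and the PDF has a single page (a multi-page PDF makes convert write numbered BMP files instead). `ocr_commands` is a hypothetical helper.

```python
import subprocess
import sys
from pathlib import Path


def ocr_commands(pdf, psm=6, density=300):
    """Return the two command lines: PDF -> BMP, then BMP -> text."""
    stem = str(Path(pdf).with_suffix(""))
    bmp = stem + ".bmp"
    # BMP3: requests the older BMP version 3 format used in the answer.
    to_bmp = ["convert", "-density", str(density), "-quality", "100", pdf, "BMP3:" + bmp]
    # Tesseract 4.0+ syntax: output base name first, then --psm.
    ocr = ["tesseract", bmp, stem, "--psm", str(psm)]
    return to_bmp, ocr


if __name__ == "__main__" and len(sys.argv) == 2:
    for cmd in ocr_commands(sys.argv[1]):
        subprocess.run(cmd, check=True)
    # Tesseract writes its result to <stem>.txt.
    print(Path(sys.argv[1]).with_suffix(".txt").read_text(encoding="utf-8"))
```

Passing argument lists to subprocess.run avoids shell quoting problems with file names that contain spaces.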

In Tesseract v5 3.0+:
pytesseract does not convert images to BMP. You can verify this by commenting out cleanup(f.name) in the save context manager, found in the source file /pytesseract/pytesseract.py. You will also need to retrieve the filename of the temp file (pytesseract saved its files in the user's temp directory, i.e. "[path-to-user]\AppData\Local[file-name]"). I found that what pytesseract actually does happens in the prepare function.
Basically, taking the temp file and passing that same file to the tesseract command directly yields the same results.

How do you trim the XMP XML contained within a jpg

Through the use of Sanselan I've found that the root cause of iPhone photos imported to Windows becoming uneditable is that there is content (white space?) after the actual XML (for more details and a linked example of the bad XMP XML see https://apple.stackexchange.com/questions/45326/why-can-i-not-edit-some-photos-imported-from-an-iphone-to-windows-vista).
I'd like to scan through my photo archive and 'trim' the XMP XML.
Is there an easy way to do this?
I have some java code that can recursively navigate my photo archive and DETECT the issue. I'm not sure how to trim and write the XML back though.

Obtain the existing XML by any means.
The following works if you are using the Apache Sanselan library:
String xmpXml = Sanselan.getXmpXml(new File("/path/to/jpeg"));
Then trim it:
xmpXml = xmpXml.trim();
Then write it back to the file using the solution to serializing XMP XML to an existing JPEG.
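Both the trim and the write-back can also be done at the byte level, without an XMP library. A minimal sketch in Python, assuming a well-formed JPEG whose XMP lives in an APP1 segment that starts with the standard `http://ns.adobe.com/xap/1.0/` signature (Extended XMP split across several segments is not handled). Only whitespace after the last non-whitespace byte is removed; the padding XMP deliberately places inside the packet, before the closing `<?xpacket end=...?>`, is left alone. `trim_xmp` and the `_trimmed` output name are hypothetical.

```python
import sys
from pathlib import Path

# Signature that marks an APP1 segment as holding an XMP packet.
XMP_HEADER = b"http://ns.adobe.com/xap/1.0/\x00"


def trim_xmp(jpeg: bytes) -> bytes:
    """Return a copy of `jpeg` with trailing whitespace removed from its XMP packet."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    out = bytearray(jpeg[:2])
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = jpeg[pos + 1]
        if marker == 0xFF:  # optional fill byte: drop it and re-read
            pos += 1
            continue
        if marker == 0xDA:  # SOS: compressed image data follows, copy the rest as-is
            break
        # Segment length is big-endian and counts its own two bytes.
        length = int.from_bytes(jpeg[pos + 2:pos + 4], "big")
        payload = jpeg[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(XMP_HEADER):
            payload = XMP_HEADER + payload[len(XMP_HEADER):].rstrip()
        out += bytes([0xFF, marker]) + (len(payload) + 2).to_bytes(2, "big") + payload
        pos += 2 + length
    out += jpeg[pos:]
    return bytes(out)


if __name__ == "__main__" and len(sys.argv) == 2:
    src = Path(sys.argv[1])
    # Write next to the original so the photo itself is never overwritten.
    dst = src.with_name(src.stem + "_trimmed" + src.suffix)
    dst.write_bytes(trim_xmp(src.read_bytes()))
```

Because the segment length is recomputed from the trimmed payload, the rest of the file (other metadata and the image data) is copied byte for byte.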

Try the following steps:
collect all of the .xml files to fix in a single folder (e.g. a folder xmlToConvert on your Desktop)
open a Terminal.app window
cd to the directory you put the files in (e.g. cd ~/Desktop/xmlToConvert)
run the following command from your command-line prompt:
mkdir converted ; for f in *.xml ; do perl -0777 -pe 's/\s+\z/\n/' "$f" > "converted/$f" ; done
The converted/ subdirectory (i.e. a folder called converted inside the xmlToConvert folder on your Desktop) should now contain all the files without the whitespace at the end; each file ends with exactly one newline.
hth

Basic script to replace image files in multiple directories

I have a situation where we have several thousand image files that have become corrupted on our server (Windows 2008 R2 x64). I have a working image file that I want to replace the corrupt files with. The files must retain the same name and path (size, timestamps, etc do not matter).
So the basic idea would be to replace each corrupt image file with the working file.
I do not write code, only the occasional windows batch file.
Should I use VB or PowerShell (or something else) for this? What will the script look like for this?
I apologize in advance if this question is too basic for stackoverflow.

You don't really need a batch file; try looking at the FOR command, e.g.:
FOR /R %f in (*.jpg) DO copy /Y newfile.jpg "%f"
This does a recursive search from the current directory and copies newfile.jpg over every .jpg file it finds (/Y suppresses the overwrite prompt; inside a .bat file, write %%f instead of %f).
It all boils down to how you are identifying the broken JPGs.

When I don't use a wildcard, for example
FOR /R %f in (broken.jpg) DO copy newfile.jpg "%f"
then newfile.jpg gets copied into every subdirectory. If I use a wildcard (* or ?), the command works as expected. Is there a way to make this command work with a (set) that does not contain wildcards?
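That behaviour comes from FOR /R: with no wildcard in the set, it yields the name once for every directory whether or not the file exists there, so in cmd a common fix is to add IF EXIST "%f" before copy. For a scripted alternative, here is a minimal sketch in Python, assuming the corrupt files can be identified by a name or glob pattern. `replace_matching` is a hypothetical name.

```python
import shutil
import sys
from pathlib import Path


def replace_matching(root, pattern, replacement):
    """Overwrite every existing file under `root` whose name matches `pattern`.

    Path.rglob only yields paths that exist, so a pattern without wildcards
    (e.g. "broken.jpg") does not drop new copies into every folder.
    """
    replacement = Path(replacement).resolve()
    replaced = []
    for path in sorted(Path(root).rglob(pattern)):
        # Skip folders and the known-good file itself.
        if path.is_file() and path.resolve() != replacement:
            shutil.copyfile(replacement, path)  # same name and path, new content
            replaced.append(path)
    return replaced


if __name__ == "__main__" and len(sys.argv) == 4:
    root, pattern, good_file = sys.argv[1:]
    for p in replace_matching(root, pattern, good_file):
        print("replaced", p)
```

For example, `python replace_images.py D:\images broken.jpg C:\good\newfile.jpg` (hypothetical paths) replaces only the broken.jpg files that actually exist, and `*.jpg` behaves like the wildcard version of the FOR command.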
