ImageMagick: How to batch-append four image pieces into one image (2 rows, 2 columns) for 500+ pages - PowerShell

Hi everyone!
I am using ImageMagick-7.0.10-Q16 on Windows 10. I’ve tried Googling for answers, but I’m still very confused about how to do this. Most of the answers I found were for UNIX rather than Windows, used terms I didn't understand, or gave me errors. I don’t have any experience with coding or Windows PowerShell, so forgive my slowness.
I have scanned pages of books, and each page has been split into four JPG pieces. The files are named after the position of the piece and the page number: BL = bottom left, BR = bottom right, TL = top left, TR = top right (BM = bottom pieces merged, TM = top pieces merged). So “BL0001.jpg” is the bottom-left piece of page 1. I’m not mentioning their sizes because I don’t want them to be resized. I just want them to be appended together like a puzzle, like this:
[Image: the four jpg pieces combined into one page]
The borders and the text boxes in that picture are only for demonstration and should not be included.
So the files are for example like this:
BL0001.jpg
BR0001.jpg
TL0001.jpg
TR0001.jpg
BL0002.jpg
BR0002.jpg
TL0002.jpg
TR0002.jpg
And so on...
This was the last thing I’ve tried in Windows PowerShell:
magick convert B*0001.jpg +append 0001BM.jpg
magick convert T*0001.jpg +append 0001TM.jpg
magick convert 0001*.jpg +swap -append 0001merged.jpg
This combines the four pieces into one image just like I want. I found out that * works as a wildcard, so “B*0001.jpg” picks up both BL0001.jpg and BR0001.jpg in one go (and “T*0001.jpg” picks up TL and TR). But I can’t use a wildcard for the page number (the ‘0001’ in ‘B*0001.jpg’), because that would merge every file in the folder into the same image, which I don’t want. So what I want to figure out is how to batch-run these commands over the sequentially numbered pages: in other words, combine the four pieces of every scanned page in the folder into one image per page. I know the commands above create additional files with the merged top and bottom parts before the final merge, but I don’t know how to write the command any other way. I'm willing to try other commands too.
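One way to picture the batch step being asked for is to group the files by page number first and then build one set of commands per page. This is a Python sketch for illustration only (the thread itself uses BAT/PowerShell). The regex, the helper names `group_pieces` and `commands_for_page`, and listing the pieces explicitly instead of relying on `+swap` are all assumptions; the commands are only built here, not run.

```python
import re
from collections import defaultdict

# Assumed naming scheme from the question: position (TL, TR, BL, BR)
# followed by a zero-padded page number, e.g. "BL0001.jpg".
PIECE = re.compile(r"^(TL|TR|BL|BR)(\d+)\.jpg$", re.IGNORECASE)

def group_pieces(filenames):
    """Map each page number string to a dict {position: filename}."""
    pages = defaultdict(dict)
    for name in filenames:
        m = PIECE.match(name)
        if m:  # files that do not follow the scheme are ignored
            pages[m.group(2)][m.group(1).upper()] = name
    return dict(pages)

def commands_for_page(page, pieces):
    """Build the question's three magick steps for one page.
    Returns None when any of the four pieces is missing."""
    if set(pieces) != {"TL", "TR", "BL", "BR"}:
        return None
    bottom, top = f"{page}BM.jpg", f"{page}TM.jpg"
    return [
        # Bottom row: BL then BR, side by side.
        ["magick", pieces["BL"], pieces["BR"], "+append", bottom],
        # Top row: TL then TR, side by side.
        ["magick", pieces["TL"], pieces["TR"], "+append", top],
        # Stack the top row above the bottom row.
        ["magick", top, bottom, "-append", f"{page}merged.jpg"],
    ]
```

To actually run them, `filenames` could come from `os.listdir(".")` in the folder holding the pieces, and each command list could be passed to `subprocess.run(cmd, check=True)`. Because only page numbers that really exist are visited, there is no fixed upper limit to choose.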

Using ImageMagick v7 in a simple Windows BAT script you could do something like this...
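@echo off
setlocal EnableDelayedExpansion
rem Loop over page numbers 1 to 9999
for /l %%n in ( 1 1 9999 ) do (
  rem Pad the counter to four digits, so 1 becomes 0001
  set V1=000%%n
  set V1=!V1:~-4!
  rem Append the four pieces, split into two rows, swap the rows, and stack them
  magick *!V1!.jpg +append -crop 2x1@ +swap -append +repage !V1!merged.jpg
)
exit /b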
That uses a "for" loop to feed the four "*0001.jpg" images into one ImageMagick command at a time. The two "set V1=" lines pad the counter with leading zeros so it always has four digits (1 becomes 0001).
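The two `set` lines are a common cmd.exe padding idiom: prepend three zeros, then keep only the last four characters. A small Python sketch comparing that idiom with Python's own formatting (both function names are hypothetical) shows why the loop's upper limit of 9999 matches the four-digit padding.

```python
def pad_like_batch(n: int) -> str:
    """Mimic `set V1=000%%n` followed by `set V1=!V1:~-4!`:
    prepend three zeros, then keep only the last four characters."""
    return ("000" + str(n))[-4:]

def pad_with_format(n: int) -> str:
    """Python's own zero-padding: at least four digits, never truncated."""
    return f"{n:04d}"
```

For example, both give "0001" for 1, but for 12345 the batch idiom yields "2345" (the leading digit is dropped) while `{n:04d}` keeps "12345".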
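The IM command appends the four pieces side by side (in alphabetical order: BL, BR, TL, TR), crops that strip into two halves (the bottom pair and the top pair), swaps them so the top pair comes first, and appends them vertically into the properly ordered output, which it writes as "0001merged.jpg". Then it moves on to process "*0002.jpg" and so on.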
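The ordering trick in that command is easiest to follow by tracking the image list step by step. The Python model below tracks only position labels, not pixels. It assumes the wildcard expands alphabetically (BL, BR, TL, TR) and that the bottom pair and the top pair have the same combined width, so that `2x1@` cuts the strip exactly between BR and TL; `simulate_stack` is a hypothetical name.

```python
def simulate_stack(filenames):
    """Model the image list for
    `magick *0001.jpg +append -crop 2x1@ +swap -append`.
    Each piece is represented by its two-letter position label."""
    # Assumed: the wildcard expands in alphabetical order -> BL, BR, TL, TR.
    strip = [name[:2] for name in sorted(filenames)]
    # +append: one wide strip; -crop 2x1@: cut it into two equal-width halves.
    half = len(strip) // 2
    tiles = [strip[:half], strip[half:]]
    # +swap: exchange the first two images in the list (top pair moves first).
    tiles[0], tiles[1] = tiles[1], tiles[0]
    # -append: stack the tiles top to bottom; each inner list is one row.
    return tiles
```

Whatever order the four names are listed in, the result is the top row TL TR above the bottom row BL BR.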
I put an upper limit on the number of image sets to process with the "9999" in the "for" command, to match the four-digit padding. Make sure that number is at least as large as the number of image sets you have. The loop will just print an error for each number past the last image set, but no harm done.
Note: Using ImageMagick v7 you should just use "magick" because when you use "magick convert" it emulates IMv6 behavior. You probably won't usually want that.

Related

ImageMagick crop with row/column in file name only saving last image

I'm attempting to crop an image using ImageMagick and via PowerShell. I can crop the image fine with the following command, and it creates the 2000+ images:
convert -crop 16x16 .\original.png tileOut%d.png
However, I would like to take advantage of ImageMagick's ability to dynamically set the file name.
According to a post on their forums I should be able to run something like the following via a batch file:
convert ^
bigimage.jpg ^
-crop 256x256 ^
-set filename:tile "%%[fx:page.x/256+1]_%%[fx:page.y/256+1]" ^
+repage +adjoin ^
tiled_%%[filename:tile].gif
I shouldn't need to escape the % since I'm running this in PowerShell directly, so I used the following:
convert -crop 16x16 .\original.png -set filename:tile "%[fx:page.x/16+1]_%[fx:page.y/16+1]" +repage +adjoin directory\tiled_%[filename:tile].png
However, when I run this command I end up with one file called tiled_%[filename and another called tiled_45_47.png.
So while it does seem to create the last file, it only creates the one. The first file is 0 bytes in size, but takes up over 8 MB of space on disc, according to properties on the file.
Trying to run the command in a batch file results in the same behavior, which makes me think PowerShell itself isn't the issue, but rather the command is.
According to the documentation +adjoin is required since I want different images. +repage doesn't make much sense to me, but I've kept it in the command since the original had it, and excluding it doesn't seem to change the output. -set filename seems pretty straightforward.
The large size of the first file leads me to believe that all the previous images might be getting added to it. However, the file name also suggests it's getting hung up on the colon, even though that doesn't appear to be a special character in PowerShell. It's also creating an image for the very last crop. Baffling.
So what am I doing wrong?
Thanks in advance!
EDIT:
PowerShell 5.0.10586.0, on Windows 10.
ImageMagick 6.9.2 Q16 (64-bit)
From the comments, I'm thinking the issue might be with the ImageMagick command.
I'm not using PowerShell, but I think you will have more success by specifying your image first, then the crop, then setting the filename:
convert original.png -crop 16x16 -set filename:tile "%[fx:page.x/16+1]_%[fx:page.y/16+1]" +repage "tiled_%[filename:tile].png"
So in the past I was using the following command to crop images, with the %d being automatically converted to a number based upon the sequence.
convert -crop 16x16 .\original.png directory\tileOut%d.png
That works perfectly fine. However, the example provided on that forum had the original file name listed as the first argument to the convert command. Changing my command so that it was listed first results in the expected behavior.
convert .\original.png -crop '16x16' -set 'filename:tile' '%[fx:page.x/16+1]_%[fx:page.y/16+1]' +repage +adjoin 'directory\tiled_%[filename:tile].png'
The use of single quotes in so many locations may not be required, but it works.
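To check which names the working command should produce, the tile offsets can be worked out by hand: `-crop 16x16` emits tiles left to right, then top to bottom, and `page.x`/`page.y` are each tile's offset in the original image. A Python sketch of that arithmetic follows; the helper name `tile_filenames` and its defaults are hypothetical, chosen to mirror the command above.

```python
def tile_filenames(width, height, tile=16, prefix="tiled_", ext=".png"):
    """Predict the names produced by
    -crop {tile}x{tile} -set filename:tile "%[fx:page.x/{tile}+1]_%[fx:page.y/{tile}+1]".
    Tiles are visited row by row; partial edge tiles keep their offset."""
    names = []
    for y in range(0, height, tile):        # page.y of each tile row
        for x in range(0, width, tile):     # page.x of each tile in the row
            col = x // tile + 1             # %[fx:page.x/16+1]
            row = y // tile + 1             # %[fx:page.y/16+1]
            names.append(f"{prefix}{col}_{row}{ext}")
    return names
```

For example, a hypothetical 720x752 image gives 45 x 47 = 2115 tiles, the last one named tiled_45_47.png.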

ghostscript not creating exact images

I am running the script below to create images from a PostScript file. The images are created, but the watermark is missing on the first page.
gs -dUseCIEColor -dNOPAUSE -sDEVICE=jpeg -dFirstPage="1" -dLastPage=2 -sOutputFile=outputImage_%0d_A.gif -dJPEGQ=100 -r300 -q inputFile.ps -c quit;'
Here is a link to the PS file I am using:
http://speedy.sh/Y7vWj/inputFile.ps
Can anybody please help?
Thanks in advance...
OK, you haven't stated what version of Ghostscript you are using, nor have you been very clear about what is missing. By 'watermark' do you mean the dark grey text 'PAULDAVIS' written diagonally across the very dark grey rectangle?
If so, then I can see that with the current version of Ghostscript and your command line, it's not missing.
A few observations on your command line:
-dUseCIEColor - Don't use this unless you know exactly what you are doing and why you want this, I'm guessing you don't (because you have not set any Color Rendering Dictionary). With this you get very dark grey text which is nearly invisible against the very dark grey rectangle. Not surprising since this relates to colour management.
You've set the device to jpeg, but you've set the output file to have a .gif extension.
You are using -dFirstPage and -dLastPage which have no effect when the input is not PDF (though this is added as a new feature in unreleased code).
You've set FirstPage=1 and LastPage=2 on a 2-page file, which would select every page anyway.
You have set -dFirstPage="1", which isn't going to work for any code which parses and uses it. The quotes won't work.
I'd recommend you do not set -q or -dQUIET when trying to diagnose problems, telling Ghostscript to be quiet will potentially mean you miss useful information.
-c quit; -c means 'process the next part of the command line as PostScript'. But quit; isn't valid PostScript (the semicolon should not be present) and will throw a PostScript error. If you want GS to exit after processing, consider simply using -dBATCH.
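Putting those observations together, a corrected invocation might look like the argument list below. It is built in Python as a list so no shell quoting is involved, which also avoids the quoted `-dFirstPage="1"` problem. This is a sketch: the executable name `gs` is an assumption (on Windows it is typically `gswin64c`), the output pattern uses the plain `%d` page-number placeholder with a `.jpg` extension to match the jpeg device, and `gs_args` is a hypothetical helper.

```python
def gs_args(input_ps, output_pattern="outputImage_%d_A.jpg", dpi=300, quality=100):
    """Build a Ghostscript argument list following the observations above:
    no -dUseCIEColor, no -q, no FirstPage/LastPage, a .jpg extension for the
    jpeg device, and -dBATCH instead of `-c quit;`."""
    return [
        "gs",                            # assumed executable name
        "-dNOPAUSE",                     # do not pause between pages
        "-dBATCH",                       # exit when the input is finished
        "-sDEVICE=jpeg",
        f"-dJPEGQ={quality}",
        f"-r{dpi}",
        f"-sOutputFile={output_pattern}",  # %d is replaced by the page number
        input_ps,
    ]
```

The list could then be run with `subprocess.run(gs_args("inputFile.ps"), check=True)`.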

Defining what is a line in Tesseract

I'm working on document recognition for scanned bank statements. The statements I have are organized by lines, such as the one attached. Because Tesseract does such a good job at detecting areas of text, it breaks the lines in the middle. I'm assuming this is because of the large white space between the first block in the line (blurred for privacy reasons) and the next one ('EUR' or 'COURS').
In the hocr file, the bbox of all the elements in the line are within 2px or so, so I could potentially rebuild a line myself. However, this seems more like a hack. Is there a way to tell Tesseract that lines should be as wide as the document itself? Or would there be another way to go about it? I've tried playing with the psm option, but with no luck.
-psm 6 (assume a single uniform block of text) should work; in Tesseract 4 and later the option is spelled --psm 6. If not, you may want to use the older version 2.0x, which does not perform page layout analysis.
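The question also mentions rebuilding lines by hand from the hOCR bounding boxes, whose vertical positions agree to within about 2 px. A minimal Python sketch of that fallback, assuming the word boxes have already been extracted from the hOCR `bbox x0 y0 x1 y1` title attributes into tuples; `group_into_lines` and the 2 px tolerance are illustrative choices, not a Tesseract feature.

```python
def group_into_lines(words, tolerance=2):
    """Group word boxes (text, x0, y0, x1, y1) into text lines.

    A word joins the first existing line whose reference top edge is within
    `tolerance` pixels of the word's own top edge; otherwise it starts a new
    line. Lines are returned top to bottom, words left to right."""
    lines = []  # each entry: (reference_top, list_of_words)
    # Visit words top to bottom, then left to right.
    for word in sorted(words, key=lambda w: (w[2], w[1])):
        top = word[2]
        for ref_top, members in lines:
            if abs(ref_top - top) <= tolerance:
                members.append(word)
                break
        else:
            lines.append((top, [word]))
    # Join each line's words in left-to-right order.
    return [" ".join(w[0] for w in sorted(members, key=lambda w: w[1]))
            for _, members in lines]
```

Because the grouping ignores horizontal gaps entirely, a wide blank space between 'COURS' and the first block no longer splits the line.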

Reading image files serially from a folder in matlab

I tried to read .jpg files from a folder in MATLAB using the dir command, but I am not getting them starting from the first image stored in the folder. Instead, the list starts at the 10th image. How can I read the files in order, starting from the first one?
I am almost certain that if you are using a simple enough command, it will give you all files that are there. However, this may not seem to be the case because of this little line in the description:
Results appear in the order returned by the operating system.
This may mean that you will first see files like 1, 100, 1000, 1999 and only later a file numbered 2. Of course, you can sort the results after you have collected them and then process them in your desired order.
For completeness, one would like to have something like:
dir *.jpg
or if you want to be sure to catch everything that even remotely resembles .jpg:
dir *.*j*p*g*
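The sorting step mentioned above is usually a "natural" sort, which compares runs of digits in a name as numbers rather than as text. The sketch is in Python for illustration only; in MATLAB the same idea would be applied to the names returned by dir before the processing loop. `natural_key` is a hypothetical helper name.

```python
import re

def natural_key(name):
    """Split a filename into text and integer chunks so that "img2" sorts
    before "img10" (plain string sorting puts "img10" first)."""
    # re.split with a capture group alternates text and digit runs,
    # so chunks at the same index always have comparable types.
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]
```

For example, `sorted(["img10.jpg", "img2.jpg", "img1.jpg"], key=natural_key)` gives img1, img2, img10.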

Diff tool to align shuffled lines

Suppose I have two documents that are identical except the lines are shuffled. Is there a tool that can show me which lines in document A correspond to which lines on document B by drawing lines to connect them (kinda like Cairo does for machine translation word alignments)?
What if the files have some differing lines? (I don't want to figure out which lines are similar to each other: if there isn't an exact match for a line, then that line has no match.)
Note: I am not looking to sort the files and compare them, rather I am looking to get a visualization of how far out of order the files are relative to each other, and which particular regions tend to move together, and which tend to be shuffled.
Windiff will show you the line in the left file it thinks the line in the right file came from, but it's often mistaken when lines are the same (e.g. a line with just a } in a cc file).
I just discovered psame in a google search which (at least algorithmically) does the same thing.
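The matching rule described in the question (exact matches only, otherwise no match) is simple to compute; drawing the connecting lines is then a plotting task. A minimal Python sketch, not the algorithm used by Windiff or psame: `align_lines` is a hypothetical helper, and pairing repeated lines in order of appearance is an assumed policy for lines like a lone }.

```python
from collections import defaultdict, deque

def align_lines(a_lines, b_lines):
    """Pair each line of A with an identical line of B.
    Repeated lines are matched in order of appearance; a line with no exact
    counterpart maps to None. Returns (index_in_a, index_in_b) pairs."""
    positions = defaultdict(deque)
    for j, line in enumerate(b_lines):
        positions[line].append(j)  # every index where this text occurs in B
    pairs = []
    for i, line in enumerate(a_lines):
        remaining = positions[line]
        pairs.append((i, remaining.popleft() if remaining else None))
    return pairs
```

The difference `j - i` for each matched pair shows how far a line moved, and runs of pairs with the same difference are regions that moved together.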