I am running below script to create images from postscript file, the images are coming but on first page watermark is not there.
gs -dUseCIEColor -dNOPAUSE -sDEVICE=jpeg -dFirstPage="1" -dLastPage=2 -sOutputFile=outputImage_%0d_A.gif -dJPEGQ=100 -r300 -q inputFile.ps -c quit;'
I am giving the link of ps file which i am using.
http://speedy.sh/Y7vWj/inputFile.ps
Can anybody please help!!!!
Thanks in advance...
OK you haven't stated what version of Ghostscript you are using, nor have you been very clear about what is missing. By 'watermak' do you mean the dark grey text 'PAULDAVIS' written diagonally across the very dark grey rectangle ?
If so then I can see that using the current version of Ghostscript and your command line, its not missing
A few observations on your command line:
-dUseCIEColor - Don't use this unless you know exactly what you are doing and why you want this, I'm guessing you don't (because you have not set any Color Rendering Dictionary). With this you get very dark grey text which is nearly invisible against the very dark grey rectangle. Not surprising since this relates to colour management.
You've set the device to jpeg, but you've set the output file to have a .gif extension.
You are using -dFirstPage and -dLastPage which have no effect when the input is not PDF (though this is added as a new feature in unreleased code).
You've set FirstPage=1 and LastPage=2 on a 2 page file.....
You have set -dFirstPage="1", which isn't going to work for any code which parses and uses it. The quotes won't work.
I'd recommend you do not set -q or -dQUIET when trying to diagnose problems, telling Ghostscript to be quiet will potentially mean you miss useful information.
-c quit; -c means 'process the next part of the command line as PostScript'. But quit; isn't valid PostScript (the semicolon should not be present) and will throw a PostScript error. If you want GS to exit after processing, consider simply using -dBATCH.
Related
I am running tesseract on windows 11 using the command prompt.
The text file is my training data. Words that I want to turn into images.
The output is the next step in the Tesseract process for training my font.
I am saying find fonts but I only have one font in the folder.
text2image --text="C:\PythonProjects\DiabloTesseractTrainFont\text.txt" --outputbase="C:\PythonProjects\DiabloTesseractTrainFont\Output\Dia.font.exp0" --fontconfig_tmpdir="C:\PythonProjects\DiabloTesseractTrainFont" --find_fonts --fonts_dir="C:\PythonProjects\DiabloTesseractTrainFont\Diablo Fonts"
The result:
Total chars = 223645
Font Exocet Light failed with 223518 hits = 99.94%
Not sure why it fails. I have built something similar to this before. I have tried with a font file that I know has worked and it does the exact same thing.
Any help would be appreciated.
I solved it. In the text file, there were some characters that had been changed when I read them into python. I believe they used to be bullet points but when I read the file I had implemented in python ASCII encoding and ignore errors. I figured that those characters would be removed. I was wrong. Those bullet points were replaced with text that said PAD. I found it in notepad++ and highlighted one of them and then replaced them with a space. Note in Notepad++ when I did the replace it did not have anything in the find field but it still replaced all of them. Now it compiles just fine. I was stuck for many hours I hope this helps someone.
everyone!
I am using ImageMagick-7.0.10-Q16 on Windows 10. I’ve tried Googling for answers, but I’m still left very confused about how to do this. Most of the answers have been for UNIX and not Windows, I have no idea what it means, or given me errors. I don’t have any experience with coding or Windows PowerShell, so forgive my slowness
I have scanned pages of books that have been split into four pieces of jpg files. The images are named after the page number and the orientation of the corresponding piece. BL=Bottom left. BR=Bottom right. TR=Top right. TL=Top left. (BM=Bottom pieces merged. TB=Top pieces merged). So “BL0001.jpg" is the bottomleft piece of page 1. I’m not mentioning their sizes because I don’t want them to be resized or whatever. I just want them to be combined via append like a puzzle like this:
Combined jpg pieces.
The borders and the text-boxes there are just to demonstrate, and are not to be included
So the files are for example like this:
BL0001.jpg
BR0001.jpg
TL0001.jpg
BR0001.jpg
BL0002.jpg
BR0002.jpg
TL0002.jpg
BR0002.jpg
And so on...
This was the last thing I’ve tried in Windows PowerShell:
magick convert B*0001.jpg +append 0001BM.jpg
magick convert T*0001.jpg +append 0001TM.jpg
magick convert 0001*.jpg +swap -append 0001merged.jpg
This combines 4 parts into one image just like I want it to. I found out adding * works like a wildcard and merges all the images like BR and TR together in one go. But I can’t do that for the page number (in this case ‘0001’ in ‘B*0001.jpg’), because that would merge all the files in the folder into the same image, something I don’t want. So what I want to figure out is to how to “batch” run this command for with a sequential numbering system for the different pages. In other words, use a command to batch combine pieces of an image into one image, but with all the scanned pages in jpg in the folder. I know the commands above create addition files with the merged top and bottom parts before the final merge, but I don’t know how to make this command otherwise. I'm willing to try other commands/things too
Using ImageMagick v7 in a simple Windows BAT script you could do something like this...
#echo off
setlocal EnableDelayedExpansion
for /l %%n in ( 1 1 9999 ) do (
set V1=000%%n
set V1=!V1:~-4!
magick *!V1!.jpg +append -crop 2x1# +swap -append +repage !V1!merged.jpg
)
exit /b
That uses a "for" loop to read all four "*0001.jpg" images at a time into an ImageMagick command. The "set V1=" lines are to make sure the variables have the correct number of leading zeros.
The IM command appends, crops, and appends the four images into the properly ordered output, and writes the image as "0001merged.jpg". Then it moves on to process "*0002.jpg" and so on.
I put a top limit on the number of image sets to process with that "9999" in the "for" command to work with the number of leading zeros. Make sure that number is the same or more than the number of image sets you have. It will just print an error for each loop after it goes over the number of image sets, but no harm done.
Note: Using ImageMagick v7 you should just use "magick" because when you use "magick convert" it emulates IMv6 behavior. You probably won't usually want that.
I'm attempting to crop an image using ImageMagick and via PowerShell. I can crop the image fine with the following command, and it creates the 2000+ images:
convert -crop 16x16 .\original.png tileOut%d.png
However, I would like to take advantage of ImageMagick's ability to dynamically set the file name.
According to a post on their forums I should be able to run something like the following via a batch file:
convert ^
bigimage.jpg ^
-crop 256x256 ^
-set filename:tile "%%[fx:page.x/256+1]_%%[fx:page.y/256+1]" ^
+repage +adjoin ^
tiled_%%[filename:tile].gif
I shouldn't need to escape the % since I'm running this in PowerShell directly, so I used the following:
convert -crop 16x16 .\original.png -set filename:tile "%[fx:page.x/16+1]_%[fx:page.y/16+1]" +repage +adjoin directory\tiled_%[filename:tile].png
However, when I run this command I end up with one file called tiled_%[filename and another called tiled_45_47.png.
So while it does seem to create the last file, it only creates the one. The first file is 0 bytes in size, but takes up over 8 MB of space on disc, according to properties on the file.
Trying to run the command in a batch file results in the same behavior, which makes me think PowerShell itself isn't the issue, but rather the command is.
According to the documentation +adjoin is required since I want different images. +repage doesn't make much sense to me, but I've kept it in the command since the original had it, and excluding it doesn't seem to change the output. -set filename seems pretty straightforward.
Large size of the first leads me to believe that all the previous images might be getting added to it. However, the file name also suggests it's getting hung up on the :, but it doesn't appear to be a special character in PowerShell. It's also creating an image for the very last crop. Baffling.
So what am I doing wrong?
Thanks in advance!
EDIT:
PowerShell 5.0.10586.0, on Windows 10.
ImageMagick 6.9.2 Q16 (64-bit)
From the comments, I'm thinking the issue might be with the ImageMagick command.
I'm not using Powershell, but I think you will have more success by specifying your image first, then the crop, then setting the filename:
convert original.png -crop 16x16 -set filename:tile "%[fx:page.x/16+1]_%[fx:page.y/16+1]" +repage "tiled_%[filename:tile].png"
So in the past I was using the following command to crop images, with the %d being automatically converted to a number based upon the sequence.
convert -crop 16x16 .\original.png directory\tileOut%d.png
That works perfectly fine. However, the example provided on that forum had the original file name listed as the first argument to the convert command. Changing my command so that it was listed first results in the expected behavior.
convert .\original.png -crop '16x16' -set 'filename:tile' '%[fx:page.x/16+1]_%[fx:page.y/16+1]' +repage +adjoin 'directory\tiled_%[filename:tile].png'
The use of single quotes in so many locations may not be required, but it works.
I'm working on document recognition for scanned bank statement. The statements that I have are organized by lines, such as the one attached. Because Tesseract does such a good job at detecting the areas of text, it breaks the lines in the middle (I'm assuming this is because of the large white space between the first block in the line (blurred for privacy reason), and the next one ('EUR', or 'COURS').
In the hocr file, the bbox of all the elements in the line are within 2px or so, so I could potentially rebuild a line myself. However, this seems more like a hack. Is there a way to tell Tesseract that lines should be as wide as the document itself? Or would there be another way to go about it? I've tried playing with the psm option, but with no luck.
-psm 6 -- Assume a single uniform block of text -- should work. If not, you may want to use the older version 2.0x, which does not perform page layout analysis.
I have an .svg file with only one color. I want to change this color to another and export it through command line. (I have to do it about 100 times, so doing it by hand doesn't work.) I use Inkscape at the moment.
I am able to change the background color and export with this command:
inkscape -f name.svt -e output_name.png -b #000080
But I cannot find the way to change the normal color.
I find this verb:
org.inkscape.color.replacecolor
But I don't know, how to add the color I want to use, somewhere I read, that I cannot add variables to verbes, but in that case how can be this verb used?
Thank you in advance.
This is not a general purpose solution, but since a SVG file is just XML internally, if your SVG file is simple enough you might be able to get away with a simple sed replacement. For example the following replaces #000000 (black) with #ffffff (white):
sed -e "s/#000000/#ffffff/" input.svg >output.svg
This may or may not be good enough for your needs.
Although inkscape is a powerful tool, I think you will have more luck (control) with xlst tranformations. This will allow you to do anything to an xml file rather than parsing the file as an svg image and relying on the available API. You might take a look at a tool like xsltproc for this.