I'm new to Tesseract and investigating how it works.
But in some cases it fails to recognise even the simplest text ("0").
I've checked the processed image and it looks pretty clear to me.
Any suggestions as to what might be wrong?
Source image and tessinput.tif:
$ ./tesseract.exe /c/dev/git/fifa/proclubs-stats/assist.png stdout conf.txt
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 450
Empty page!!
Estimating resolution as 450
Empty page!!
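One variation that might be worth trying (just a sketch on my part: --psm 10 treats the image as a single character and --dpi sets the resolution explicitly, so both are guesses at what could help here):
$ ./tesseract.exe /c/dev/git/fifa/proclubs-stats/assist.png stdout --psm 10 --dpi 300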
My Java project deals with OCRing PDFs to index them. Each PDF page is converted into a PNG which is then piped to Tesseract 4.
The PDF->PNG conversion uses renderImageWithDPI from PDFBox's PDFRenderer:
buffImage = pdfRenderer.renderImageWithDPI(currentPage,
                                           PNG_DENSITY,
                                           ImageType.GRAY);
with PNG_DENSITY = 300 as advised on Tesseract's wiki to get the best results.
The Tesseract command used is
tesseract input.png output -l fra --psm 1 --oem 1
I also tried --psm 2 and --psm 3, which also involve page segmentation (see the example commands after the list), i.e.:
1 Automatic page segmentation with OSD.
2 Automatic page segmentation, but no OSD, or OCR.
3 Fully automatic page segmentation, but no OSD. (Default)
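For reference, those variants amount to the same command as above with only the segmentation mode changed:
tesseract input.png output -l fra --psm 2 --oem 1
tesseract input.png output -l fra --psm 3 --oem 1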
With a scanned PDF of 146 pages (producer/creator is Adobe Acrobat 7.0; it involves copyrighted content so I can't share it), Tesseract never finishes: the process computes endlessly on a given page (85).
As it took too long to test (i.e. waiting until page 85 gets OCRed), I generated an extract of this PDF with Evince's "print to file" feature.
Tesseract handles the PDF produced by Evince (producer/creator is cairo 1.14.8) successfully, i.e. the image gets OCRed.
The difference is the image resolution. The image that fails is 4991x3508 pixels whereas the one that succeeds is only 3507x2480 pixels.
Please note: Tesseract in "Sparse text with OSD" mode (i.e. --psm 12) handles the page "successfully", although the text (laid out in 2 columns) is not understandable (i.e. the 2 columns are mixed together).
EDIT after several rounds of trial and error
It looks like the input image has to be strictly less than 4000 pixels wide to work with page segmentation. Looking at the Tesseract source code, in a class called "pgedit" the canvas size seems limited to 4000 x 4000, as the constructor of a "ScrollView" (whatever it is for) is:
ScrollView::ScrollView(const char* name, int x_pos, int y_pos, int x_size,
int y_size, int x_canvas_size, int y_canvas_size, bool y_axis_reversed)
So my question now is: why is there a limit of 4000 pixels in width/height to use page segmentation, and what should I do if a PDF page converted to PNG at 300 dpi exceeds 4000 pixels (in width, height, or both)?
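In the meantime, a workaround I'm considering is simply downscaling oversized pages before OCR (a sketch assuming ImageMagick is available; the 3999-pixel cap and file names are my own guesses):
convert page.png -resize 3999x3999 page_small.png
tesseract page_small.png output -l fra --psm 1 --oem 1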
Any help appreciated,
I'm currently experimenting with the webp encoder (no WIC) in a 64-bit Windows environment. My samples are 10 JPEG stock photos depicting landscapes and houses, already optimized with jpegtran. I'm doing this because my goal is to optimize the images of a whole website, where the images have already been compressed in Photoshop using the "Save for Web" command with various quality values and then optimized with jpegtran.
I found out that values smaller than -q 85 have a visible impact on the quality of the webp images, so I'm working with values above 90, where the difference is smaller. I also concluded that I have to use -jpeg_like, because without it the output is sometimes bigger than the original, which is not acceptable. I also use -m 6 -f 100 -strong because I really don't mind how long the encoder takes and I'm trying to achieve the smoothest results; I tried several values for these and concluded that -m 6 -f 100 -strong gives the best output with regard to quality and size.
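For reference, the kind of invocation I'm describing looks roughly like this (file names are placeholders):
cwebp -q 90 -m 6 -f 100 -strong -jpeg_like input.jpg -o output.webp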
I also tried -preset photo, avoiding any other parameter except -q, but the output gets bigger.
What I don't understand from https://developers.google.com/speed/webp/docs/cwebp#options are the options -sns and -segments, which seem to have a great impact on the output size. Sometimes the output is bigger and sometimes smaller for the same options, but I haven't yet worked out why, or how to use them properly.
I also don't understand the -sharpness option, which doesn't seem to affect the output size, at least for me.
My approach is far from scientific and more of a trial-and-error method. If anybody knows how to use those options for this kind of input and can explain how to get optimum results, I would appreciate the feedback.
-strong and -sharpness only change the strength of the filtering in the header of the compressed bitstream. They will be used at decoding time. That's why you don't see a change in file size for these.
-sns controls the choice of filtering strength and quantization values within each segment. A segment is just a group of macroblocks in the picture that are believed to share similar properties regarding complexity and compressibility. A complex photo should likely use the maximum allowed 4 segments (which is the default).
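If you want to see their effect, you can set them explicitly and compare the results (the values below are arbitrary, just for experimenting):
cwebp -q 90 -m 6 -sns 70 -segments 4 input.jpg -o output.webp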
I'm using the Computer Vision System Toolbox in Matlab (R2015a, Windows 7) to mask frames in a video file and write them into a new video file. By masking, I replace about 80% of the image with 0s and 1s:
videoFileReader = vision.VideoFileReader(fin);
videoFileWriter = vision.VideoFileWriter(fout, ...
    'FileFormat', 'MPEG4', 'FrameRate', videoFileReader.info.VideoFrameRate);
% per-frame processing (the surrounding loop and release calls are omitted here)
frame = step(videoFileReader);
frame_new = mask(frame); % user function that masks ~80% of the frame
step(videoFileWriter, frame_new);
The size (1080 x 1920 x 3) and the format (single) of the original and modified frames remain the same. Yet the masked videos are much bigger than the original ones, e.g. a 1 GB original video turns into nearly 4 GB after masking. These large new files cannot be opened (Windows 7, VLC media player), and Handbrake does not recognize them as a legitimate video file either.
When I mask only about 20% of the image, the masked videos still come out large (up to 2.5 GB), but I have no problem opening them.
I tried adding 'VideoCompressor', 'MJPEG Compressor', but this gives a warning:
videoFileWriter = vision.VideoFileWriter(fout, 'FileFormat', 'MPEG4', ...
    'FrameRate', videoFileReader.info.VideoFrameRate, 'VideoCompressor', 'MJPEG Compressor');
<...>
Warning: The VideoCompressor property is not relevant in this configuration of the System object.
We have TBs of video data to deidentify, so any suggestion would be much appreciated.
Thanks!
Larissa,
The size of the output MPEG-4 file can be controlled by adjusting the Quality property of the System object. This is a value from 0-100 which controls the output bitrate: the higher the quality, the larger the file. The default value is 75. The System object uses the Microsoft APIs to create MPEG-4 files.
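For example (a sketch based on the snippet in your question; the value 50 is arbitrary):
videoFileWriter = vision.VideoFileWriter(fout, 'FileFormat', 'MPEG4', ...
    'FrameRate', videoFileReader.info.VideoFrameRate, 'Quality', 50);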
Secondly, you need to call release(videoFileWriter) to complete writing the file. I just want to confirm that you are doing it and have just omitted it for the purposes of this code snippet.
The VideoCompressor property is not valid for MPEG-4 file format because the compressor to be used is fixed. You can choose that property only when you write out AVI files. However, you probably will not reach the same level of compression as MPEG-4.
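If you do go the AVI route, it would look something like this (a sketch; which compressors are available depends on what is installed on your system):
videoFileWriter = vision.VideoFileWriter(fout, 'FileFormat', 'AVI', ...
    'FrameRate', videoFileReader.info.VideoFrameRate, ...
    'VideoCompressor', 'MJPEG Compressor');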
Hope this helps.
Dinesh
Download ffmpeg here: https://git.ffmpeg.org/ffmpeg.git
For Windows, open a bash terminal and run:
cat <path to folder with images>/*.png | <path to ffmpeg bin folder>/ffmpeg.exe -f image2pipe -i - output.mkv
For Unix, do the same but download the appropriate build of ffmpeg.
I tried it on a 7.90 GB folder and got a 6.4 MB .mkv file. Works like a charm!
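If you need more control, you can also set the frame rate and codec explicitly (a sketch; the values are arbitrary):
cat <path to folder with images>/*.png | <path to ffmpeg bin folder>/ffmpeg.exe -f image2pipe -framerate 30 -i - -c:v libx264 output.mkv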
I wanted to import a TIFF image into the Matlab workspace as a variable using the File/Import Data tool, but I got the following error: "Warning: The datatype for tag SamplesPerPixel should be TIFF_SHORT instead of TIFF_LONG. This may cause data corruption". The image type is single-precision float, 32 bit, and the size is really big (4144 x 12619 x 7). Can Matlab read and display such an image? What does this error mean, and how can I correct it?
Thank you so much
Read the TIFF specification.
From the warning message it appears there's some problem with the format chosen. When reading a TIFF file, each IFD has many entries, one of them being SamplesPerPixel (see page 24 of the specification). This should be of type SHORT (see page 15 for the list of types and what they mean). However, apparently, you have type LONG there. That seems to be causing the problem. Either Matlab is identifying it incorrectly, or the software you used to save the image is not following the TIFF specification.
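That said, this is a warning rather than a hard error, so the data can often still be read (a sketch; the file name is a placeholder):
t = Tiff('image.tif', 'r'); % low-level interface, also gives access to individual tags
img = read(t);
close(t);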
The problem I have is that when using ffmpeg to encode a YUV using libx264, I don't get all the frame information in the -vstats output. It raises the question of how reliable ffmpeg is, and therefore whether any 'codec benchmark' review based on ffmpeg can be trusted.
I am analysing codecs to determine how they perform. I am using ffmpeg and its -vstats option to look at an encoded movie frame by frame. The process I use:
RAW YUV -> bar-code each frame with frame number -> Bar-coded YUV
Bar-coded YUV -> encoded (e.g. with libx264) -> MKV -> Decoded to YUV
I can compare the two outputs ('Bar-coded YUV' & 'Decoded to YUV') using the bar-code in each frame. I can then compare, exactly, an original frame with an encoded frame using PSNR etc.
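For context, the encode/decode legs of that pipeline look roughly like this (a sketch: the resolution, pixel format and frame rate below are placeholders, not my actual test settings):
ffmpeg -f rawvideo -s 1280x720 -r 25 -pix_fmt yuv420p -i barcoded.yuv -c:v libx264 -vstats encoded.mkv
ffmpeg -i encoded.mkv -f rawvideo -pix_fmt yuv420p decoded.yuv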
When encoding with libx264 and libdirac, some frame information is missing. Other codecs, such as mpeg2video or even libvpx, don't have this problem.
At first it looked as though the libx264 vstats were missing for the first 40 to 50 frames; I have since established that the missing information is actually for the last 40 to 50 frames.
It also looks like ffmpeg calculates the average bitrate based on the information in vstats, but as there are missing frames the average bitrate is lower than it should be.
Below are links to the average bitrate error example:
http://dl.dropbox.com/u/6743276/ffmpeg_probs/ffmpeg_av_bitrate_error.png
http://dl.dropbox.com/u/6743276/ffmpeg_probs/ffmpeg_av_bitrate_error.xlsx
Below is a link to the PSNR & f_size graph:
http://dl.dropbox.com/u/6743276/ffmpeg_probs/frame_mismatch.png
Below is a link to the output & command line options:
http://dl.dropbox.com/u/6743276/ffmpeg_probs/stderr.txt
I think this is also a bug; anyone clever enough to work it out might want to follow this tracker:
http://roundup.ffmpeg.org/issue2248
I have just discovered something which makes me very red in the face!! Quite annoyed, but never mind :)
A fellow ffmpeg user pointed out that ffprobe should output more frame info, which it did. Here is a link to his handy tip:
http://forums.creativecow.net/thread/291/71
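The gist, as far as I can tell, is to dump per-frame information with something along these lines (my paraphrase, not a quote from the tip):
ffprobe -show_frames encoded.mkv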
Using this I found the following:
Actual average bitrate (ffprobe data): 8355.2776056338
Actual average bitrate (ffmpeg vstats data): 8406.23275471698
Ffmpeg -vstats avg_br: 7816.3
Reproduced above: 7816.32168421053
Ffmpeg standard error output 'bitrate=': 8365.8
Below is a link to my workings out:
http://dl.dropbox.com/u/6743276/ffmpeg_probs/ffprobe_vs_ffmpeg-vstats.xlsx
What I have discovered is that I should have been using the average bitrate info from ffmpeg's standard error output; it looks like the most reliable!