I'm using Tesseract on some images (excellent quality, computer-generated, 300 dpi) and generally it works OK. But there's a problem with numbers: 1123.45 is being returned as 1 123.45. It seems Tesseract is being a little aggressive about the numeric kerning and breaking numbers where it shouldn't. I've dug through the big list of parameters and tried tweaking a few, but without success. Any recommendations?
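For context, here is roughly the kind of harness I'm using to experiment with parameters (a minimal sketch assuming the Tesseract 3.x C++ API and Leptonica; the file names are placeholders):

```cpp
#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>
#include <cstdio>

int main() {
    tesseract::TessBaseAPI api;
    if (api.Init(NULL, "eng")) return 1;       // NULL = default tessdata path

    // Restricting the character set tends to stabilize numeric output.
    api.SetVariable("tessedit_char_whitelist", "0123456789.,");

    Pix* image = pixRead("sample.png");        // placeholder input file
    api.SetImage(image);

    char* text = api.GetUTF8Text();
    printf("%s\n", text);

    // Dump every tunable parameter with its current value, which makes
    // systematic experiments with the spacing-related ones less painful.
    FILE* fp = fopen("params.txt", "w");
    api.PrintVariables(fp);
    fclose(fp);

    delete[] text;
    pixDestroy(&image);
    api.End();
    return 0;
}
```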
Thanks
Terry
I did a very simple test: the footer contact form on the left of the website versus on the right. The results showed "no clear winner". But the data below shows that one variant has 5 conversions vs 1, which I consider significant (albeit low numbers). It also says there is a 95% probability that this variant will be better.
What am I not understanding about this data? Are the numbers too low in volume to give a reading, is it a bug, or is there something I've missed?
It's probably because your A/B test did not have a lot of traffic in each variant. With so little volume, 5 conversions vs. 1 is not really a big difference between the two.
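To put numbers on that, here is a rough two-proportion z-test sketch; the post doesn't give visitor counts, so the 100 per variant below is a made-up assumption:

```cpp
#include <cmath>
#include <cstdio>

int main() {
    // Conversion counts come from the post; visitor counts are assumed.
    double nA = 100, convA = 5;   // variant A: 5 conversions
    double nB = 100, convB = 1;   // variant B: 1 conversion

    double pA = convA / nA, pB = convB / nB;
    double pooled = (convA + convB) / (nA + nB);
    double se = std::sqrt(pooled * (1.0 - pooled) * (1.0 / nA + 1.0 / nB));
    double z = (pA - pB) / se;

    // |z| must exceed roughly 1.96 for significance at the 95% level.
    printf("pA=%.2f pB=%.2f z=%.2f -> %s\n", pA, pB, z,
           std::fabs(z) > 1.96 ? "significant" : "not significant");
    return 0;
}
```

With these assumed figures z is about 1.66, which falls short of the 1.96 cutoff: even a 5-to-1 conversion ratio is not enough evidence at this volume, which is why the tool reports no clear winner.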
I'm currently experimenting with the WebP encoder (cwebp, the no-WIC build) in a Windows 64-bit environment. My samples are 10 JPEG stock photos depicting landscapes and houses, and the photos are already optimized with jpegtran. I do this because my goal is to optimize the images of a whole website, where the images have already been compressed with Photoshop's Save for Web command at various quality values and then optimized with jpegtran.
I found that using values smaller than -q 85 has a visible impact on the quality of the WebP images, so I'm playing with values above 90, where the difference is smaller. I also concluded that I have to use -jpeg_like, because without it the output is sometimes bigger than the original, which is not acceptable. I also use -m 6 -f 100 -strong, because I really don't mind how long the encoder takes to produce the output, and I'm trying to achieve the smoothest results. I tried several values for these and concluded that -m 6 -f 100 -strong gives the best output regarding quality and size.
I also tried -preset photo, avoiding any other parameter except -q, but the output gets bigger.
What I don't understand from https://developers.google.com/speed/webp/docs/cwebp#options are the options -sns and -segments, which seem to have a great impact on the output size. Sometimes the output is bigger and sometimes smaller for the same options, but I haven't yet worked out why that is or how to use them properly.
I also don't understand the -sharpness option, which has no impact on the output size, at least for me.
My approach is less scientific and more trial and error. If anybody knows how to use these options for this kind of input and can explain how to get optimal results, I would appreciate the feedback.
-strong and -sharpness only change the strength of the filtering recorded in the header of the compressed bitstream; the filtering is applied at decoding time. That's why you don't see a change in file size for these.
-sns controls the choice of filtering strength and quantization values within each segment. A segment is just a group of macroblocks in the picture that are believed to share similar properties regarding complexity and compressibility. A complex photo should likely use the maximum allowed 4 segments (which is the default).
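If it helps to see where those flags end up, here is a sketch against the libwebp C API; the flag-to-field mapping is my reading of the headers and docs, so treat it as an assumption:

```cpp
#include <webp/encode.h>

// Maps the cwebp flags discussed above onto WebPConfig fields.
bool configure(WebPConfig* config) {
    // -preset photo -q 90
    if (!WebPConfigPreset(config, WEBP_PRESET_PHOTO, 90.0f)) return false;

    config->method = 6;            // -m 6: slowest, most thorough search
    config->sns_strength = 80;     // -sns: drives per-segment quantization,
    config->segments = 4;          //       so these two do affect file size

    // These only parameterize the deblocking filter the *decoder* applies,
    // which is why changing them barely moves the file size:
    config->filter_strength = 100; // -f 100
    config->filter_type = 1;       // -strong
    config->filter_sharpness = 0;  // -sharpness

    config->emulate_jpeg_size = 1; // -jpeg_like
    return WebPValidateConfig(config) != 0;
}
```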
I'm trying to find a secret message, a string, in a 256x256 PNG image. It's supposed to have "used an old school trick to hide the data", and apparently that method is mentioned in the Wikipedia article on steganography.
I first tried what appeared to me the most old-school and straightforward method: LSB steganography. But no luck. I know the first and last characters of the string ("F" and "}"), and I thought the common LSB method may have been mixed up a bit, so I inspected the very first and very last pixels of the picture myself. However, no apparent combination (like using only the red value of each pixel) would produce the correct character. Hence I'm pretty positive it's not using LSB.
In a second, rather desperate attempt, I saw that Wikipedia talks about stripping the six most significant bits, leaving only the two least significant, and then normalizing the picture. I wrote a little script to do this (the per-value transformation is sketched below), but no luck here either.
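In essence the transformation looks like this (a sketch; the PNG decode/encode around it is omitted):

```cpp
// Keep only the two least significant bits of a color value and rescale
// the range 0..3 to 0..255 so any hidden layer becomes visible.
unsigned char normalize_two_lsb(unsigned char v) {
    return (v & 0x03) * 85;  // 3 * 85 = 255
}
```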
I also looked at the metadata with identify -verbose image.png. Nothing. The file ends as it should after the IEND chunk, so nothing is hidden beyond that either.
I'm running out of ideas, so here is my question:
Any hints as to what might qualify as an "old school trick" that I haven't already tried? I'm sure I missed something obvious. This exercise came with a few others, and they all looked harder at first glance than they really were.
Thanks a lot. :)
It turned out that there was a region in the middle of the picture with a long text, which contained the wanted string, hidden in the least significant bits of the blue values only, least significant bit first. Somehow I missed that combination in my preliminary tests. So there you go. :)
To anybody having a similar problem: I find it's best to write a script that tests all the more common-sense variations (single color channels, vertical order, least- or most-significant-bit first, etc.) in one large run; a sketch of the extraction that found my message is below. It's too easy to miss a simple one otherwise and get hopelessly stuck on crazily complicated theories.
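For illustration, here is a minimal sketch of that extraction, using stb_image for decoding; the file name is a placeholder, and the NUL terminator is an assumption about where the text ends:

```cpp
#include <cstdio>
#include <string>
#define STB_IMAGE_IMPLEMENTATION
#include "stb_image.h"   // single-header image decoder

int main() {
    int w, h, n;
    // Force 3 channels, so pixels are laid out as R,G,B,R,G,B,...
    unsigned char* px = stbi_load("image.png", &w, &h, &n, 3);
    if (!px) return 1;

    std::string msg;
    unsigned char byte = 0;
    int bits = 0;
    for (int i = 0; i < w * h; ++i) {
        int lsb = px[i * 3 + 2] & 1;   // blue channel only
        byte |= lsb << bits;           // least significant bit first
        if (++bits == 8) {
            if (byte == 0) break;      // assume a NUL ends the message
            msg += static_cast<char>(byte);
            byte = 0;
            bits = 0;
        }
    }
    printf("%s\n", msg.c_str());
    stbi_image_free(px);
    return 0;
}
```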
I am trying to reproduce the experiments on the ai-junkie website http://www.ai-junkie.com/ann/som/som1.html to cluster/group different colors together using Self-Organizing Maps (SOM) on a larger color dataset. I use about 400 images of differing solid colors, and since they are solid colors, the color value in any color space (for example, RGB) is the same for all points in a particular image. Hence the feature I use for clustering with SOM is just the 3-dimensional color value of each image.
When I perform SOM, the source code of which is obtained from http://knnl.sourceforge.net/, with 40 rows, 40 columns and 20 iterations (epoch = 20), the result of the clustering makes no sense to me. It looks like this:
I feel like this is just random clustering (if I can call it that), and even a k-means algorithm would give better results. Any thoughts on what could have possibly gone wrong?
20 iterations is not enough for the SOM algorithm. Try rows*columns*500; it's the default value for the learning algorithm. On simple datasets like yours you can reduce this number, but 20 is far too small. And be patient, it's going to take a while :)
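To make the role of the iteration count concrete, here is a bare-bones SOM training loop for 3D color vectors; this is the generic algorithm, not the knnl API:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdlib>
#include <vector>

struct Vec3 { double r, g, b; };

// grid holds rows*cols weight vectors, initialized randomly by the caller.
void trainSOM(std::vector<Vec3>& grid, int rows, int cols,
              const std::vector<Vec3>& data) {
    const int iters = rows * cols * 500;               // suggested count
    const double radius0 = std::max(rows, cols) / 2.0; // initial neighborhood
    const double lambda = iters / std::log(radius0);   // radius decay constant

    for (int t = 0; t < iters; ++t) {
        const Vec3& x = data[std::rand() % data.size()];

        // 1. Find the best matching unit (closest node in RGB space).
        int bmu = 0;
        double best = 1e18;
        for (int i = 0; i < rows * cols; ++i) {
            double dr = grid[i].r - x.r, dg = grid[i].g - x.g,
                   db = grid[i].b - x.b;
            double d = dr * dr + dg * dg + db * db;
            if (d < best) { best = d; bmu = i; }
        }

        // 2. Both the neighborhood radius and the learning rate decay over
        //    time; otherwise the map never settles into smooth color regions.
        double radius = radius0 * std::exp(-t / lambda);
        double lrate = 0.1 * std::exp(-(double)t / iters);

        // 3. Pull the BMU and its grid neighbors toward the sample.
        int br = bmu / cols, bc = bmu % cols;
        for (int r = 0; r < rows; ++r)
            for (int c = 0; c < cols; ++c) {
                double d2 = (r - br) * (r - br) + (c - bc) * (c - bc);
                if (d2 > radius * radius) continue;
                double infl = std::exp(-d2 / (2 * radius * radius));
                Vec3& w = grid[r * cols + c];
                w.r += lrate * infl * (x.r - w.r);
                w.g += lrate * infl * (x.g - w.g);
                w.b += lrate * infl * (x.b - w.b);
            }
    }
}
```

With only 20 iterations, the radius and learning rate barely have a chance to decay, so the map stays essentially random, which matches what you're seeing.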
It looks wrong; as you say, it looks just like random clustering.
A variety of things could have gone wrong. A few that come to mind: the number of iterations is not sufficient, the neighborhood function is not adequate, or the implementation of the library you're using has a bug.
You can download the example posted on ai-junkie.com directly:
ai-junkie.com SOM Demo
Not sure what the SourceForge library is. Or are you asking for help debugging it?
I have made a similar SOM with AForge; you can have the source if you still need it. I tried with a 4x4 and a 16x16 SOM, and I needed just a few iterations (<100) to adapt. Of course, it also depends on the learning factor.
I am trying to use tesseract-2.04 in my iPhone application and just want to detect numbers. What I am doing here is first cross-compiling Tesseract to generate the lib file using this post http://robertcarlsen.net/2009/07/15/cross-compiling-for-iphone-dev-884 and then using the demo application at http://robertcarlsen.net/2010/01/12/ocr-for-iphone-source-1080, but the results are far from realistic.
I am not able to resolve the issue, nor do I know how to train Tesseract so that it comes closer to practical usage.
Please help.
Thanks,
Madhup
I get quite good results setting
TessBaseAPI::SetVariable("tessedit_char_whitelist", "0123456789");
while gently urging the user to fit the numbers into a certain box. This makes locating the numbers easier for me, and it ensures the user keeps the image steady and at a reasonable distance, leading to a sharper image.
I have thought about altering valid_word() in tesseract-2.04/dict/permute.cpp, but there seems to be no need for that.
The next step will be to hardcode a minimum/maximum character size so recognition time can drop well below the 500 ms it takes now. After that, the next step will be to add some code that keeps track of results over time, so that reading a 5 90% of the time and an 8 only 10% of the time leads the code to settle on the 5.
It all depends on your use case. I'm lucky in the sense that I'm allowed to just show a 200x50 box which will contain the number.
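Roughly, the setup looks like this; note that this sketch uses the newer TessBaseAPI instance methods, so with tesseract-2.04 the calls will differ slightly, and the box coordinates are placeholders:

```cpp
#include <tesseract/baseapi.h>

// Recognize digits in a raw image buffer; caller frees the result
// with delete[].
char* recognizeDigits(const unsigned char* imagedata,
                      int width, int height, int bytes_per_pixel) {
    tesseract::TessBaseAPI api;
    if (api.Init(NULL, "eng")) return NULL;

    // Only digits can come out of the recognizer.
    api.SetVariable("tessedit_char_whitelist", "0123456789");
    api.SetImage(imagedata, width, height,
                 bytes_per_pixel, bytes_per_pixel * width);

    // Restrict recognition to the on-screen box the user was asked to
    // fill, e.g. the 200x50 region mentioned above (placeholder coords).
    api.SetRectangle(0, 0, 200, 50);

    return api.GetUTF8Text();
}
```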