Scanning invoices using OCR in Swift

I am currently working on scanning invoices with OCR. All invoices use the "OCR-B" font and have the same formatting.
The bottom of a sample invoice looks like this:
This is what the user needs to scan.
I have tried many different libraries to detect what I want, but most of them don't give me the correct result. The best result came from Firebase ML Vision text recognition.
But the resulting output I get is this:
I can verify whether the values are correct, except for the amount, presented in the middle. In this case it's recognized as "3557 00", but if the user moves the camera a bit further to the right, the result I get is "557 00". Since both ML Kit and other libraries cut off text around the edge of the frame, I have no way to tell whether the full sum is present or not.
If the output included a single space before the word, I could tell that a full "word", in this case the sum, is present.
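For example, that check could look like this (a minimal sketch in Python, since the logic is language-independent; the amount format is an assumption based on the sample):

import re

def full_amount(ocr_line):
    # Require an actual space before the amount: a frame-clipped token
    # such as "557 00" at the very start of the recognized text has no
    # leading space, so it is rejected instead of being taken as the sum.
    match = re.search(r"\s(\d+ \d{2})(?:\s|$)", ocr_line)
    return match.group(1) if match else None

print(full_amount(" 3557 00 "))   # "3557 00" (complete)
print(full_amount("557 00"))      # None (possibly clipped)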
Does anyone have any ideas about which library to use to get the best result?

Related

Mozilla DeepSpeech STT suddenly can't spell

I am using deep speech for speech to text. Up to 0.8.1, when I ran transcriptions like:
import subprocess

byte_encoding = subprocess.check_output(
    "deepspeech --model deepspeech-0.8.1-models.pbmm --scorer deepspeech-0.8.1-models.scorer --audio audio/2830-3980-0043.wav",
    shell=True)
transcription = byte_encoding.decode("utf-8").rstrip("\n")
I would get back results that were pretty good. But since 0.8.2, where the scorer argument was removed, my results are rife with misspellings that make me think I am now getting a character-level model where I used to get a word-level model. The errors are in a direction that looks like the model isn't correctly specified somehow.
Now when I call:
byte_encoding = subprocess.check_output(
    ['deepspeech', '--model', 'deepspeech-0.8.2-models.pbmm', '--audio', myfile])
transcription = byte_encoding.decode("utf-8").rstrip("\n")
I now see errors like
endless -> "endules"
service -> "servic"
legacy -> "legaci"
earning -> "erting"
before -> "befir"
I'm not 100% sure that it is related to removing the scorer from the API, but it is one thing I see changing between releases, and the documentation suggested accuracy improvements in particular.
Short: The scorer matches letter output from the audio to actual words. You shouldn't leave it out.
Long: If you leave out the scorer argument, you won't reliably recognize real-world sentences, because the scorer matches the output of the acoustic model against the words and word combinations present in the textual language model that is part of the scorer. And bear in mind that each scorer has specific lm_alpha and lm_beta values that make the search even more accurate.
Version 0.8.2 should still accept the scorer argument; otherwise, update to 0.9.0, which supports it as well. Perhaps your environment has changed somehow; I would start over in a new directory and venv.
Assuming you are using Python, you could add this to your code:
ds.enableExternalScorer(args.scorer)
ds.setScorerAlphaBeta(args.lm_alpha, args.lm_beta)
And check the example script.
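For reference, here is a minimal self-contained sketch of that flow using the deepspeech Python package (the file names and the alpha/beta values are placeholders; each scorer ships with its own recommended values):

import wave
import numpy as np
from deepspeech import Model

ds = Model("deepspeech-0.9.0-models.pbmm")
ds.enableExternalScorer("deepspeech-0.9.0-models.scorer")
ds.setScorerAlphaBeta(0.93, 1.18)  # placeholders; use the values shipped with your scorer

# DeepSpeech expects 16-bit mono PCM at the model's sample rate (16 kHz).
with wave.open("audio/2830-3980-0043.wav", "rb") as w:
    audio = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)

print(ds.stt(audio))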

Why does a fastText test of a model return only 1 example when my test file contains 135?

I'm trying to test a model (model.bin) I've made with fastText on a test file (test.txt). The test file contains 135 labelled examples, and I expect fastText to test my model on that many examples, but instead it tests on only 1. Where does this problem come from?
I've already done the same thing with another model and another test file, and everything worked fine.
This is how I test my model; model_baby.bin is the model and test.data.txt is my test file:
./fasttext test model_baby.bin test.data.txt
N 1
P@1 1
R@1 0.0164
Number of examples: 1
And here is an extract from my test file:
__label__4.0 I love the fact you can hide your stuff. Only down is that the straps to hold it at midpoint and bottom could be better designed for your car. It's got plenty of room which is great. __label__5.0 This hid our ipad wonderfully. Especially for those quick stops where we all had jump out and use the restroom. It zipped, folded and held all our stuff for the kids in the back seat. __label__3.0
Since I have more than one labelled example in my test file, I expect the output "Number of examples:" to be greater than 1, but it is actually 1.
From the official documentation (https://fasttext.cc/docs/en/supervised-tutorial.html): each line of the text file contains a list of labels, followed by the corresponding document. All the labels start with the __label__ prefix, which is how fastText recognizes what is a label and what is a word.
I can't make much sense of your extract; I think it should look like this:
__label__4.0 I love the fact you can hide your stuff. Only down is that the straps to hold it at midpoint and bottom could be better designed for your car. It's got plenty of room which is great.
__label__5.0 This hid our ipad wonderfully. Especially for those quick stops where we all had jump out and use the restroom. It zipped, folded and held all our stuff for the kids in the back seat.
__label__3.0 ...
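If your file really does have every example on one line, a small preprocessing pass can split it so that each labelled example starts on its own line (a hedged sketch; the file names are assumptions, and it assumes one label per example as in your extract):

import re

with open("test.data.txt") as f:
    text = f.read()

# Start a new line before each __label__ marker so every example
# sits on its own line, then strip the leading newline this creates.
fixed = re.sub(r"\s*(__label__)", r"\n\1", text).lstrip("\n")

with open("test.data.fixed.txt", "w") as f:
    f.write(fixed + "\n")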

Ad Hoc Information Retrieval

I want to extract the total bill from image receipts. I can extract all of the text present in the image, but now I am stuck on the problem of extracting only the information that I need.
This is the image that I have.
I am pasting the extracted information from the image:
m cm lnnk 3mm: :33; no 1 z m
x Visut all! ms“; (or nulnunn mfn an an: nan.
Sub Iota] 19.56
TOTAL 19.56
VISA 1956
Fun 19.56
D!!! You Know 0
For ureat-tastlru dessens under 200
cahries, try our Triple Berry Frozen
Yogurt Sunda: a dish of Frozen Yogurt.
or a Vanma rozen Vugurt Done.
From this data I just want to extract the total bill. To do this, I found that I could use ad hoc normalization (ad hoc retrieval). Can someone provide any insight into ad hoc retrieval? If there are other options for extracting the data from the image, please let me know. I am using Tesseract to extract this information, but sometimes it does not give the proper output; I could use some help improving the output Tesseract gives.
Why do you need ad hoc retrieval in this case? Since you are getting the OCR result from the receipt, you can simply perform a regular text search for the item appearing next to "TOTAL".
There are algorithms for image text search, but they seem like overkill for such a straightforward application unless there is a good reason to use them.
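For example, a plain regular expression over the Tesseract output is often enough (a minimal sketch; it assumes the amount appears on the same line as "TOTAL", as in your paste):

import re

ocr_text = """Sub Total 19.56
TOTAL 19.56
VISA 1956"""

# Find the line that starts with TOTAL followed by a decimal amount.
match = re.search(r"^TOTAL\s+(\d+\.\d{2})\s*$", ocr_text, re.MULTILINE)
if match:
    print(match.group(1))  # 19.56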

Twitter advanced search by date AND location for research purposes

I'm trying to research a topic, and I need to get all tweets between 2013 and 2015, from a specific location, for two keywords.
I tried to get the results via Advanced Search, but I always get no results.
I tried:
cannabis near:"España" within:15mi since:2013-10-07 until:2015-01-01
cannabis near:"Spain" within:15mi since:2013-10-07 until:2015-01-01
Basically, I have a database of scraped press articles sorted by date from a bunch of sources, and I want to know how the agendas of these news sources impact the social media conversation.
I could do this with Reddit if this were about the US, but there's no Spanish alternative (well, we have Meneame, but the user base is very left-leaning and I think it would be too narrow).
So I wanted to either scrape the search results or get them via the API, but it's not working, and AFAIK I can't do anything similar with Facebook.
One way to achieve this is by using Twitter's geocode operator. In the example below I took Madrid as the center and covered a radius of 600 km around it, like this:
(canabis OR cannabis) geocode:40.4381311,-3.8196196,600km since:2013-10-07 until:2015-12-31
The syntax is as follows:
([your_boolean_search_query]) geocode:[latitude],[longitude],[radius]km since:[] until:[]
One easy way to find the latitude and longitude of a location is to use Google Maps. Simply navigate to a place using the search box, then copy the latitude and longitude from the URL in the browser. Here it is for Madrid; the latitude and longitude are right after the @ sign, separated by a comma:
https://www.google.com/maps/place/Madrid,+Spain/@40.4381311,-3.8196196,54451m/data=!3m2!1e3!4b1!4m5!3m4!1s0xd422997800a3c81:0xc436dec1618c2269!8m2!3d40.4167754!4d-3.7037902?hl=en
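If you are generating many such queries, you can assemble them from that template programmatically (a quick sketch; the function name and values are just placeholders):

def build_query(terms, lat, lon, radius_km, since, until):
    # ([your_boolean_search_query]) geocode:[lat],[lon],[radius]km since:[] until:[]
    return ("(" + " OR ".join(terms) + ")"
            + f" geocode:{lat},{lon},{radius_km}km"
            + f" since:{since} until:{until}")

print(build_query(["canabis", "cannabis"], 40.4381311, -3.8196196, 600,
                  "2013-10-07", "2015-12-31"))
# (canabis OR cannabis) geocode:40.4381311,-3.8196196,600km since:2013-10-07 until:2015-12-31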

Single barcode with Code128B and Code128C with iTextSharp

I want to generate a barcode mixing Code128B and Code128C with the iTextSharp DLL. Do you know how to do that? I currently only know how to do it with a single code set.
For example, I wish to generate a barcode with the value 8L1 91450 883421 0550 001065,
where "8L1 91450" is in Code128B and "883421 0550 001065" is in Code128C.
Thanks
Barcode128 will actually automatically switch from B to C if and when it can, but it sounds like you don't want this. For the control that you're looking for, you'll need to set your barcode's CodeType property to Barcode.CODE128_RAW and manually set the raw values.
There are a couple of posts out there that give the basic idea, but unfortunately they tend to assume too much knowledge of iText or too much knowledge of barcodes.
I'm not a barcode expert either but the basic idea is to create a string that starts with Barcode128.START_B, then the first part of your text, then Barcode128.START_C and then the second. When in raw mode, text isn't ASCII, however. You can use this site to get the character codes for various ASCII values. But basically instead of sending the letter L you'd send (char)44.
Hopefully this gets you started at least.
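To make the raw values concrete, here is a rough sketch of the arithmetic in Python (not iTextSharp code; the constants come from the Code128 specification, and in raw mode you would send a string whose characters carry these code points):

# Code128 raw values, per the Code128 spec (not iText constants):
#   Start Code B = 104, switch from B to C = 99
#   code set B: value = ASCII code - 32, so 'L' (76) -> 44
#   code set C: each pair of digits becomes one value 0-99
START_B = 104
SWITCH_B_TO_C = 99

def raw_values(text_b, digits_c):
    values = [START_B]
    values += [ord(c) - 32 for c in text_b]        # "8L1 91450" in code set B
    values.append(SWITCH_B_TO_C)
    digits = digits_c.replace(" ", "")             # code set C encodes digits only
    assert len(digits) % 2 == 0, "code set C needs an even number of digits"
    values += [int(digits[i:i+2]) for i in range(0, len(digits), 2)]
    return values

print(raw_values("8L1 91450", "883421 0550 001065"))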