Is barcode Code39 scanning reliable?

Within my iOS app I am using a third-party scanning library to scan Code39 barcodes. This software sometimes gets the scan wrong (e.g. a value of "13415566" comes back as "U *"). Sometimes a barcode scans correctly, and then I scan the same code again and the result is wrong.
The 3rd party software vendor reports that Code39 isn't a 'reliable' format, and that 'it has no error protection and it is often possible to get false reads'.
This seems ridiculous to me. The codes in question have no check digit, but even so, surely this is simply a bug in the scanning software? Is Code39 known for this sort of thing? How can it possibly be an adopted format if it 'gets it wrong' sometimes?
Thanks.

There should be no major issues with Code 39 readability. In applications where reliable scanning is important, Code 39 is normally deployed with protection against misreads in the form of a modulo 43 check digit, which the scanner is configured to verify before passing the code on to the system. Any half-decent barcode generator or barcode reader will support Code 39 check digits.
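For illustration, here is a minimal Python sketch of the standard modulo 43 check character: each character maps to its position in the 43-character Code 39 set, the values are summed, and the remainder selects the check character. The function names are hypothetical. A real scanner does this internally once its check-digit option is enabled.

```python
# Code 39 character set in check-value order: 0-9 -> 0..9, A-Z -> 10..35,
# then '-', '.', space, '$', '/', '+', '%' -> 36..42
CODE39_CHARS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ-. $/+%"


def mod43_check_char(data: str) -> str:
    """Return the modulo 43 check character for `data` (start/stop '*' excluded)."""
    total = sum(CODE39_CHARS.index(c) for c in data)  # ValueError on invalid characters
    return CODE39_CHARS[total % 43]


def verify_mod43(decoded: str) -> bool:
    """Treat the last decoded character as the check character and verify it."""
    if len(decoded) < 2:
        return False
    body, check = decoded[:-1], decoded[-1]
    try:
        return mod43_check_char(body) == check
    except ValueError:
        return False  # a character outside the Code 39 set cannot be valid


print(mod43_check_char("13415566"))  # V
```

With a check character printed, "13415566" would be encoded as "13415566V", and a misread is rejected unless its last character happens to match the sum of the others.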
As I haven't seen the print quality of the Code 39 barcodes that you are scanning, it is impossible to be certain; however, I strongly suspect that your reader software has very poor Code 39 decoding.
Your barcode library is probably confused for the following reason, but it is impossible to be sure without extensive debugging on the device:
Below I have aligned two Code 39 images that were created using the online barcode generator based on Barcode Writer in Pure PostScript. On top is a horizontally flipped image containing "U" and beneath is an image containing "13415566".
Reading the top image from right-to-left you can see that there is a degree of similarity with some portion of the bottom image.
A scanner might be forgiven for misreading this unprotected Code 39 symbol, except that the following count against it:
It should be expecting a quiet zone (whitespace) before the leading start bar sequence.
It should be expecting a quiet zone after the trailing stop bar sequence.
The bar pattern for "U" is not entirely correct.
The assumed stop bar sequence is not entirely correct.

Many barcode scanners read black and white sections on a single line. They have no clue as to whether the line is horizontal, vertical, or diagonal, and have no inherent means of knowing if the line "enters" a barcode on one side and leaves at the other, or if it enters via the top, crosses the barcode diagonally, and exits via the bottom.
Some barcode formats like Interleaved 2 of 5 start and end with patterns which can commonly occur within a barcode [I2of5 starts with BwBw and ends with BBwB], and it is possible for a partial scan which slips off the top or bottom to be misread as though it were a valid scan of a shorter code. Some other barcode formats start and end with patterns that are chosen so that there is no way a partial scan can read as valid data. Code 39 is somewhere between.
Every valid Code 39 barcode starts with BwBBwBBwwBw and ends with wBwBBwBBwwB. It is possible for the sequence wBwBB to appear at the end of one character and BBwwBw to appear at the start of the next, with a single "w" between them. If two such pairs of characters appear within a barcode, a limited variety of characters appear between them, and the scan exits the first pair at just the right place and likewise exits the second pair at just the right place, it is possible that the scanner would see a legitimately formed barcode whose content bore no apparent resemblance to the original. Someone who deliberately chose barcode data that met the necessary criteria and tried to scan it at an angle to generate a false read would have little trouble getting false reads from many scanners, but both the data and the scanning angle would have to be "just right" in order to cause problems.
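To make the argument concrete, the following Python sketch (a hypothetical helper, using the module strings exactly as written above, where B is a black module and w a white one) lists every window of a scan line that begins with the start pattern and ends with the stop pattern. Feeding it two of the "wBwBB w BBwwBw" character junctions shows how a clipped scan can contain a complete-looking frame.

```python
START = "BwBBwBBwwBw"  # start sequence as written in the answer
STOP = "wBwBBwBBwwB"   # stop sequence as written in the answer


def framed_windows(modules, start=START, stop=STOP):
    """Return (i, j) pairs where modules[i:j] begins with `start` and ends with `stop`."""
    hits = []
    for i in range(len(modules) - len(start) + 1):
        if not modules.startswith(start, i):
            continue
        # the stop pattern must begin after the start pattern ends
        for j in range(i + len(start) + len(stop), len(modules) + 1):
            if modules[j - len(stop):j] == stop:
                hits.append((i, j))
    return hits


junction = "wBwBB" + "w" + "BBwwBw"  # end of one character, gap, start of the next
scan = "BB" + junction + "BwwB" + junction + "wB"
print(framed_windows(scan))  # [(3, 29)]
```

Real decoders also check element widths, quiet zones and character validity, so a framed window alone is not a misread; it only shows why the start/stop patterns cannot by themselves rule one out.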
If one is concerned about the possibility of such misreads, it is possible to print barcodes in such a way as to guarantee that a scan which leaves the code will not be seen as valid. A simple way of doing this is to print black above and below the barcode, so that any scan which enters and/or exits via the top or bottom will perceive the code as starting and/or ending with an exceptionally thick black bar. In many places where one sees "stacked" barcodes, they will be separated by a pattern of dots which do not hold information, but are instead designed to ensure that a scan which crosses from one row to another cannot be perceived as valid.

Assistance with my class assignment

Can anyone help me with this problem?
Problem Statement
As outlined in the background story, you are in the middle of a shark-infested ocean. The sandbank along which you may travel is very narrow (only wide enough to walk in a straight line with a little room to make turns) and is bordered by electric wire along the edges, i.e. if you touch the wire, you will be electrocuted and fall into the ocean, a state from which there is no return.
The years in prison have taken their toll both physically and mentally. Due to adverse sleeping conditions, sensory deprivation and regular beatings, your senses are poor and your movements are now restricted to taking one fixed-size step at a time in a forward direction (i.e. the direction in which you are facing). Also, you are no longer able to turn left, and turning to the right can only be done 90 degrees at a time. The only data you are able to store in your memory is limited to a single integer value (it may be possible to scrounge up some space to store a boolean value, though you're unlikely to need this). Luckily, you can still do the basic arithmetic operations of addition and subtraction. To aid your decision-making and control of repetitive actions, you know about an IF statement and a WHILE loop. You also recognise a true or false response to a question and can test the values of integers using any of the operators <, >, ≤ and ≥. Of course, you also know about the integer values 0, 1, 2, etc. Unfortunately, logical operations are beyond your current processing power. Any further assumptions should be confirmed by the tutor.
In a rare display of compassion, HAL has allowed you to ask 3 questions, to which you will receive a true or false response: “in front of gate?”, “in front of wire?” and “in front of sand?”. You may also ask the complement of these questions, i.e. “not in front of gate?”, “not in front of wire?” and “not in front of sand?”. There is no restriction on how many times you may ask these questions, and these states cannot exist at the same time.
Basically, I need a short algorithm to get from gate 1 to gate 2. I've been at it for hours and can't seem to do it, and it's due today. Please, please help. Thank you!
While (not in front of gate)
    int right_count = 0
    While (not in front of sand)
        Turn right
        right_count++
    If right_count == 2
        Turn right
    Step forward
Step forward // party time
Edit: I'd be interested in seeing some of your attempts
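If you want to test an attempt before handing it in, a tiny simulator helps. This is a hypothetical Python sketch (the grid encoding, class and method names are my own, not part of the assignment) that runs one reading of the pseudocode above, with the nesting inferred, over a small map.

```python
# Hypothetical grid encoding (not from the assignment):
#   '#' = wire, '.' = sand, 'G' = gate; the map is surrounded by wire.
DIRS = [(-1, 0), (0, 1), (1, 0), (0, -1)]  # N, E, S, W in clockwise order


class Walker:
    """A prisoner who can only step forward, turn right and ask the three questions."""

    def __init__(self, grid, row, col, facing):
        self.grid = grid
        self.row, self.col = row, col
        self.facing = facing  # index into DIRS
        self.steps = 0

    def _ahead(self):
        dr, dc = DIRS[self.facing]
        return self.grid[self.row + dr][self.col + dc]

    def in_front_of_gate(self):
        return self._ahead() == "G"

    def in_front_of_sand(self):
        return self._ahead() == "."

    def turn_right(self):
        self.facing = (self.facing + 1) % 4  # 90 degrees clockwise

    def step_forward(self):
        if self._ahead() == "#":
            raise RuntimeError("electrocuted")
        dr, dc = DIRS[self.facing]
        self.row, self.col = self.row + dr, self.col + dc
        self.steps += 1


def walk_to_gate(w, max_steps=10_000):
    """One reading of the pseudocode from the answer."""
    while not w.in_front_of_gate():
        right_count = 0
        while not w.in_front_of_sand():
            w.turn_right()
            right_count += 1
            if right_count > 3:
                raise RuntimeError("no sand in any direction")
        if right_count == 2:
            w.turn_right()  # three right turns = one left turn
        w.step_forward()
        if w.steps > max_steps:
            raise RuntimeError("no gate found")
    w.step_forward()  # party time
    return w.row, w.col


# A corridor with two left turns; start at row 3, column 1, facing east.
grid = ["#####",
        "#G..#",
        "###.#",
        "#...#",
        "#####"]
print(walk_to_gate(Walker(grid, 3, 1, 1)))  # (1, 1)
```

The simulator also exposes an edge case: the inner loop only stops when facing sand, so a gate sitting immediately to the right at the end of a corridor is turned past instead of approached. Looping on "in front of wire" instead, and only stepping when "in front of sand", is one way around it.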

ways to hide secret in png image (steganography)

I'm trying to find a secret message, a string, in a 256x256 png image. It's supposed to have "used an old school trick to hide the data", and apparently that method is mentioned in the steganography Wikipedia article.
I tried what seemed to me the most old-school and straightforward approach first: LSB steganography. But no luck. I know the first and last characters of the string ("F" and "}"), and I thought they may have varied the common LSB method a bit, so I inspected the very first and the very last pixels of the picture myself. However, no apparent combination (like only the red values of each pixel) yielded the correct character. Hence I'm pretty sure it's not using LSB.
In a second, rather desperate attempt, I noticed that Wikipedia talks about removing the six most significant bits of each color component, leaving only the two least significant bits, and then normalizing the picture. I wrote a little script to do this, but no luck here either.
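For reference, the Wikipedia variant described above (keep only the two least significant bits of each component, then stretch them to full brightness) can be sketched like this in Python. It works on plain (R, G, B) tuples; with Pillow you could obtain those via list(img.getdata()) and write the result back with putdata(). The function name is hypothetical.

```python
def low_two_bits_normalized(pixels):
    """Drop the six most significant bits of every component and rescale 0..3 to 0..255."""
    return [tuple((v & 0b11) * 85 for v in px) for px in pixels]  # 3 * 85 == 255


print(low_two_bits_normalized([(255, 4, 6), (1, 2, 3)]))  # [(255, 0, 170), (85, 170, 255)]
```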
I also looked at the metadata with identify -verbose image.png. Nothing. The file ends as it should after the IEND chunk, so nothing hidden beyond that either.
I'm running out of ideas, so here is my question:
Any hints what might classify as old school trick, that I haven't already tried? I'm sure I missed something obvious. This exercise came with a few others, and they all looked harder at first glance than they really were.
Thanks a lot. :)
It turned out that there was a chunk in the middle of the picture containing a long text, which included the wanted string, hidden in the least significant bits of the blue values only, in least-significant-bit-first order. Somehow I missed that combination in my preliminary tests. So there you go. :)
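The combination that finally worked (least significant bit of the blue component only, least significant bit first within each byte) boils down to something like this Python sketch. The starting pixel index is a parameter because the text sat in the middle of the image; the function name and the flat, row-major list of (R, G, B) tuples are assumptions.

```python
def decode_blue_lsb(pixels, start=0, lsb_first=True):
    """Collect the LSB of each pixel's blue value from `start` on and pack 8 bits per byte."""
    bits = [px[2] & 1 for px in pixels[start:]]
    out = bytearray()
    for i in range(0, len(bits) - 7, 8):
        chunk = bits[i:i + 8]
        if lsb_first:
            chunk = chunk[::-1]  # the first bit read is bit 0, so reverse before shifting in
        byte = 0
        for b in chunk:
            byte = (byte << 1) | b
        out.append(byte)
    return bytes(out)
```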
To anybody having a similar problem: I find it's best to write a script that tests all the common-sense variations (like single color channels only, vertical traversal, least-significant or most-significant bit first, etc.) in one large run. It's too easy to miss a simple one otherwise and get hopelessly stuck on crazily complicated theories.
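In that spirit, here is a hypothetical Python sketch of such a brute-force run: for each color channel, row-wise or column-wise traversal, bit order within a byte, and bit offset 0 to 7, it decodes the LSB stream and reports every variant whose output contains a known fragment of the string. It still assumes the data sits in the least significant bits; checking other bit planes would be another loop.

```python
from itertools import product


def lsb_stream_bytes(values, lsb_first):
    """Pack the least significant bit of each value into bytes, 8 values per byte."""
    out = bytearray()
    for i in range(0, len(values) - 7, 8):
        chunk = values[i:i + 8]
        byte = 0
        for v in (reversed(chunk) if lsb_first else chunk):
            byte = (byte << 1) | (v & 1)
        out.append(byte)
    return bytes(out)


def search_lsb_variants(pixels, width, fragment):
    """Try channel x traversal x bit order x bit offset; return variants containing `fragment`.

    `pixels` is a flat, row-major list of (R, G, B) tuples for an image `width` pixels wide.
    """
    height = len(pixels) // width
    traversals = {
        "rows": range(width * height),
        "columns": [y * width + x for x in range(width) for y in range(height)],
    }
    channels = {"red": 0, "green": 1, "blue": 2}
    hits = []
    for (ch_name, ch), (trav_name, order), lsb_first, offset in product(
            channels.items(), traversals.items(), (True, False), range(8)):
        values = [pixels[i][ch] for i in order][offset:]
        if fragment in lsb_stream_bytes(values, lsb_first):
            hits.append((ch_name, trav_name, "lsb-first" if lsb_first else "msb-first", offset))
    return hits
```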

Which 2d barcode has the highest data capacity/density

If you wanted to encode 2 MB of data into a 2D barcode, which 2D barcode would you recommend as a starting point?
There are lots of different types of 2D barcodes today (Aztec codes, MaxiCode, PDF417, Microsoft HCCB, VeriCode, etc.), each unique in its own way.
I guess, in a nutshell, my question is: which barcode would make a good starting point for encoding 2 MB of data?
I tried reading through the QR code international standard; it turns out that even at version 40-L, the most data you can encode onto a QR code is:
1) numeric data: 7 089 characters
2) alphanumeric data: 4 296 characters
3) 8-bit byte data: 2 953 characters
4) Kanji data: 1 817 characters
All of these are a far cry from the roughly 17 million bits that make up 2 MB.
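To put numbers on it, here is a quick back-of-the-envelope calculation in Python, assuming "2 MB" means 2 × 1024 × 1024 bytes and using the 2,953-byte capacity of a version 40-L symbol quoted above. It ignores any per-symbol sequencing overhead.

```python
import math

QR_40L_BYTE_CAPACITY = 2953  # 8-bit byte mode, version 40, error correction level L


def qr_symbols_needed(payload_bytes, capacity=QR_40L_BYTE_CAPACITY):
    """Minimum number of full-size QR symbols needed to hold `payload_bytes` of raw data."""
    return math.ceil(payload_bytes / capacity)


payload = 2 * 1024 * 1024          # "2 MB" as binary megabytes
print(payload * 8)                 # 16777216 bits
print(qr_symbols_needed(payload))  # 711
```

So even at maximum size and minimum error correction, it would take over 700 separate QR codes.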
My goal was to create something like
http://realestatemobilemarketingsolutions.com/wp-content/uploads/2012/07/real-estate-mobile-marketing.png
After scanning the barcode, you could view photos of the house/property on your phone without having to walk in or wait for an open home. 20 photos at 100 KB each is about 2 MB.
Even if you could create a single 2D barcode which will encode the whole thing, the user won't be able to scan the whole thing in one go. No one has a cellphone imager which will support that kind of resolution. Your best bet is to do a QR-code with a URL in it.
Things like DataMatrix and QR-codes are extensible. You have a limit to how much data can be encoded into one block, but you CAN create a code which has multiple blocks. Indeed, if you look at this page, you'll see a discussion of using pages full of 2D barcodes as a form of data backup. They were able to fit up to 1/2 MByte of raw data into a single page. That's at 600 dpi, which will require a scanner (not a smartphone) to decode.
From what I've been reading, DataMatrix tends to have less overhead and, therefore, will stuff more (payload) data into a square inch for a given DPI. You would need a mobile app capable of shooting multiple images (tiles) of a very large image and either:
compositing the individual images into one large one for decoding OR
decoding each of the smaller blocks and reconstructing the original data from the pieces
I know of no app which will do that.
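The second option (decode each block, then reconstruct) mostly needs each block to say where it belongs. Below is a hypothetical Python sketch of such framing: a 4-byte header with the chunk index and chunk count in front of each payload slice, so the pieces can be scanned in any order. This is an ad-hoc format for illustration, not a standard; QR Code's built-in Structured Append mode only links up to 16 symbols, which is far too few for megabytes.

```python
import struct

HEADER = struct.Struct(">HH")  # hypothetical header: chunk index, total chunks (big-endian uint16)


def split_payload(data: bytes, chunk_size: int) -> list:
    """Cut `data` into chunks of at most `chunk_size` bytes, each prefixed with HEADER."""
    total = max(1, -(-len(data) // chunk_size))  # ceiling division
    if total > 0xFFFF:
        raise ValueError("too many chunks for a 16-bit counter")
    return [HEADER.pack(i, total) + data[i * chunk_size:(i + 1) * chunk_size]
            for i in range(total)]


def reassemble(chunks) -> bytes:
    """Rebuild the payload from chunks scanned in any order; repeated scans are harmless."""
    parts, total = {}, None
    for chunk in chunks:
        index, total = HEADER.unpack_from(chunk)
        parts[index] = chunk[HEADER.size:]
    if total is None:
        raise ValueError("no chunks")
    missing = [i for i in range(total) if i not in parts]
    if missing:
        raise ValueError(f"missing chunks: {missing}")
    return b"".join(parts[i] for i in range(total))
```

With chunk_size=2949, each chunk plus its 4-byte header fits the 2,953-byte capacity of a version 40-L QR symbol.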
I've pondered providing bulk data via 2D barcodes. I was pondering publishing a mobile app in a magazine and providing a way for people to "download" the app from the magazine, without needing to provide a website / FTP site where they could download it. I'd first need to provide an app which could decode such a monster. Then, the end user would have to be patient enough to scan the whole thing. Good luck with that.
I MIGHT be able to provide a large 2D barcode containing a .torrent file and then using existing BitTorrent apps to download the resulting app; I have a .torrent for a recent Linux Live-DVD where the .torrent is < 32 KB.
A chunk of data (an app or images) in the megabyte range or larger is really not feasible through this channel, and that includes the megabytes of data you want to provide.
Voiceye Code is the highest-density 2D code I have been able to find. It works well, too, but the code-making software is too expensive to screw around with: 500.00 (ish).
How about using some variant of DataGlyphs, which has a lot in common with steganography? In other words, you use a greyscale image to also store your data...
I have developed a reader for JAB codes that can read a whole audio file from a barcode. JAB codes have very high capacity due to their polychrome nature.
More on this here

Near Duplicate Detection in Data Streams

I am currently working on a streaming API that generates a lot of textual content. As expected, the API gives out a lot of duplicates and we also have a business requirement to filter near duplicate data.
I did a bit of research on duplicate detection in data streams and read about Stable Bloom Filters. Stable bloom filters are data structures for duplicate detection in data streams with an upper bound on the false positive rate.
But I want to identify near duplicates, so I also looked at hashing algorithms like LSH and MinHash that are used in nearest-neighbour problems and near-duplicate detection.
I am kind of stuck and am looking for pointers on how to proceed, and for papers/implementations that I could look at.
First, normalize the text to all lowercase (or uppercase) characters, replace all non-letters with a whitespace character, compress runs of whitespace into a single space, and remove leading and trailing whitespace; for speed, I would perform all these operations in one pass over the text. Next, take the MD5 hash (or something faster) of the resulting string. Look up the MD5 hash (stored as two 64-bit integers) in a database table: if it exists, the text is an exact duplicate; if not, add it to the table and proceed to the next step. You will want to age off old hashes based either on time or on memory usage.
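A minimal Python sketch of this first, exact-duplicate stage, assuming an in-memory table in place of the database and a size cap as the age-off policy (both are simplifications). For readability the normalization here is not a single pass.

```python
import hashlib
import struct
from collections import OrderedDict


def normalize(text: str) -> str:
    """Lowercase, turn every non-letter into a space, collapse runs of spaces, trim."""
    return " ".join("".join(c if c.isalpha() else " " for c in text.lower()).split())


def md5_key(normalized: str) -> tuple:
    """MD5 of the normalized text as two unsigned 64-bit integers."""
    return struct.unpack(">QQ", hashlib.md5(normalized.encode("utf-8")).digest())


class ExactDuplicateFilter:
    def __init__(self, max_entries: int = 1_000_000):
        self.seen = OrderedDict()  # insertion order doubles as age
        self.max_entries = max_entries

    def is_duplicate(self, text: str) -> bool:
        key = md5_key(normalize(text))
        if key in self.seen:
            return True
        self.seen[key] = None
        if len(self.seen) > self.max_entries:
            self.seen.popitem(last=False)  # age off the oldest hash
        return False
```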
To find near duplicates the normalized string needs to be converted into potential signatures (hashes of substrings), see the SpotSigs paper and blog post by Greg Linden. Suppose the routine Sigs() does that for a given string, that is, given the normalized string x, Sigs(x) returns a small (1-5) set of 64 bit integers. You could use something like the SpotSigs algorithm to select the substrings in the text for the signatures, but making your own selection method could perform better if you know something about your data. You may also want to look at the simhash algorithm (the code is here).
Given the Sigs() the problem of efficiently finding the near duplicates is commonly called the set similarity joins problem. The SpotSigs paper outlines some heuristics to trim the number of sets a new set needs to be compared to as does the simhash method.
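A simple way to avoid comparing every new text against every old one is an inverted index from signature to documents: only documents sharing at least one signature become candidates, and only those get a full similarity check. The Python sketch below uses a hypothetical stand-in for Sigs() (the 5 smallest 64-bit hashes of word 2-grams, a bottom-k sketch, not SpotSigs) and the Jaccard overlap of the signature sets; the threshold is an assumption.

```python
import hashlib


def sigs(normalized: str, k: int = 5, shingle: int = 2) -> frozenset:
    """Hypothetical Sigs(): the k smallest 64-bit hashes of the word 2-grams."""
    words = normalized.split()
    grams = {" ".join(words[i:i + shingle])
             for i in range(max(1, len(words) - shingle + 1))}
    hashes = sorted(int.from_bytes(hashlib.md5(g.encode("utf-8")).digest()[:8], "big")
                    for g in grams)
    return frozenset(hashes[:k])


class NearDuplicateIndex:
    """Inverted index from signature to document ids, used to prune candidate pairs."""

    def __init__(self, threshold: float = 0.6):
        self.threshold = threshold
        self.postings = {}  # signature -> set of doc ids
        self.doc_sigs = {}  # doc id -> signature set

    def add(self, doc_id, normalized: str) -> list:
        """Return earlier docs whose signature overlap reaches the threshold, then index this one."""
        s = sigs(normalized)
        candidates = set()
        for sig in s:
            candidates |= self.postings.get(sig, set())
        matches = []
        for other in candidates:
            o = self.doc_sigs[other]
            if len(s & o) / len(s | o) >= self.threshold:  # Jaccard similarity
                matches.append(other)
        for sig in s:
            self.postings.setdefault(sig, set()).add(doc_id)
        self.doc_sigs[doc_id] = s
        return sorted(matches)
```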
http://micvog.com/2013/09/08/storm-first-story-detection/ has some nice implementation notes

Training tesseract to use with iPhone

I am trying to use tesseract-2.04 in my iPhone application and just want to detect numbers. First, I cross-compile tesseract to generate a library file using this post http://robertcarlsen.net/2009/07/15/cross-compiling-for-iphone-dev-884, and then I use the demo application at http://robertcarlsen.net/2010/01/12/ocr-for-iphone-source-1080, but the results are far from usable.
I am not able to resolve the issue, or to work out how to train tesseract so that it comes close to practical usage.
Please help.
Thanks,
Madhup
I get quite good results setting
TessBaseAPI::SetVariable("tessedit_char_whitelist", "0123456789");
while gently urging the user to let the numbers fit in a certain box. This makes locating the numbers easier for me, and ensures the user keeps the image steady and at a reasonable distance leading to a sharper image.
I have thought about altering valid_word() in tesseract-2.04/dict/permute.cpp, but there seems to be no need for that.
The next step will be to hardcode a minimum/maximum character size so recognition time can drop well below the 500 ms it takes now. After that, I will add some code that keeps track of results over time, so that if it reads 5 90% of the time and 8 only 10% of the time, the code remembers the 5.
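The "remember the 5" idea is essentially a majority vote over recent frames. A minimal Python sketch, with a hypothetical window size and agreement threshold:

```python
from collections import Counter, deque


class StableReading:
    """Keep the last `window` OCR results and report one only when it clearly dominates."""

    def __init__(self, window: int = 20, min_share: float = 0.6):
        self.recent = deque(maxlen=window)  # oldest results fall off automatically
        self.min_share = min_share

    def add(self, reading: str) -> None:
        self.recent.append(reading)

    def best(self):
        """Most frequent recent reading, or None if no reading reaches `min_share`."""
        if not self.recent:
            return None
        value, count = Counter(self.recent).most_common(1)[0]
        return value if count / len(self.recent) >= self.min_share else None


s = StableReading()
for reading in ["5"] * 9 + ["8"]:
    s.add(reading)
print(s.best())  # 5
```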
It all depends on the use case you have. I'm lucky in the sense that I'm allowed to just show a 200x50 box which will contain the number.