Handwritten Character Recognition differentiating between forward slash "/" and 1? - neural-network

Does anyone have any suggestions for helping a neural network differentiate between the forward slash character "/" and the number one? I'm using the MINST dataset and a lot of the 1's are sidewise and look a lot like the forward slash character, making a lot of my forward slashes become categorized as 1's instead of /'s.
Anyone have any idea how to fix this kind of problem?

Add more "/" images and "1" images to your data. You can also add data augmentation like slight rotation.
To get more accuracy in classifying between "/" and "1", you can train a binary CNN classifier as your secondary classifier just to do this job. Call this classifier when your primary model predicts either "/" or "1".
More importantly, if you, as a human, are not able to classify between "/" and "1", then it's better not to expect a CNN to do the classification.

Related

Implementating spell drawing/casting mechanism in Luau (Roblox)

I am coding a spell-casting system where you draw a symbol with your wand (mouse), and it can recognize said symbol.
There are two methods I believe might work; neural networking and an "invisible grid system"
The problem with the neural networking system is that It would be (likely) suboptimal in Roblox Luau, and not be able to match the performance nor speed I wish for. (Although, I may just be lacking in neural networking knowledge. Please let me know whether I should continue to try implementing it this way)
For the invisible grid system, I thought of converting the drawing into 1s and 0s (1 = drawn, 0 = blank), then seeing if it is similar to one of the symbols. I create the symbols by making a dictionary like:
local Symbol = { -- "Answer Key" shape, looks like a tilted square
00100,
01010,
10001,
01010,
00100,
}
The problem is that user error will likely cause it to be inaccurate, like this "spell"'s blue boxes, showing user error/inaccuracy. I'm also sure that if I have multiple Symbols, comparing every value in every symbol will surely not be quick.
Do you know an algorithm that could help me do this? Or just some alternative way of doing this I am missing? Thank you for reading my post.
I'm sorry if the format on this is incorrect, this is my first stack-overflow post. I will gladly delete this post if it doesn't abide to one of the rules. ( Let me know if there are any tags I should add )
One possible approach to solving this problem is to use a template matching algorithm. In this approach, you would create a "template" for each symbol that you want to recognize, which would be a grid of 1s and 0s similar to what you described in your question. Then, when the user draws a symbol, you would convert their drawing into a grid of 1s and 0s in the same way.
Next, you would compare the user's drawing to each of the templates using a similarity metric, such as the sum of absolute differences (SAD) or normalized cross-correlation (NCC). The template with the lowest SAD or highest NCC value would be considered the "best match" for the user's drawing, and therefore the recognized symbol.
There are a few advantages to using this approach:
It is relatively simple to implement, compared to a neural network.
It is fast, since you only need to compare the user's drawing to a small number of templates.
It can tolerate some user error, since the templates can be designed to be tolerant of slight variations in the user's drawing.
There are also some potential disadvantages to consider:
It may not be as accurate as a neural network, especially for complex or highly variable symbols.
The templates must be carefully designed to be representative of the expected variations in the user's drawings, which can be time-consuming.
Overall, whether this approach is suitable for your use case will depend on the specific requirements of your spell-casting system, including the number and complexity of the symbols you want to recognize, the accuracy and speed you need, and the resources (e.g. time, compute power) that are available to you.

What's the point to have a UNK token for out of vocabulary words during decoding?

First of all, I know this question is kind of off-topic, but I have already tried to ask elsewhere but got no response.
Adding a UNK token to the vocabulary is a conventional way to handle oov words in tasks of NLP. It is totally understandable to have it for encoding, but what's the point to have it for decoding? I mean you would never expect your decoder to generate a UNK token during prediction, right?
Depending on how you preprocess your training data, you might need the UNK during training. Even if you use BPE or other subword segmentation, OOV can appear in the training data, usually some weird UTF-8 stuff, fragments of alphabets, you are not interested in at all, etc.
For example, if you take WMT training data for English-German translation, do BPE and take the vocabulary, you vocabulary will contain thousands of Chinese characters that occur exactly once in the training data. Even if you keep them in the vocabulary, the model has no chance to learn anything about them, not even to copy them. It makes sense to represent them as UNKs.
Of course, what you usually do at the inference time is that you prevent the model predict UNK tokens, UNK is always incorrect.
I have used it one time in the following situation:
I had a preprocessed word2vec(glove.6b.50d.txt) and I was outputting an embedded vector, in order to transform it into a word I used cosine similarity based on all vectors in the word2vec if the most similar vector was the I would output it.
Maybe I'm just guessing it here, but what I think might happen under the hoods is that it predicts based on previous words(e.g. it predicts the word that appeared 3 iterations ago) and if that word is the neural net outputs it.

Neural network to figure out the meaning of user input

I'm trying to make a neural network try to figure out the meaning of input(keyboard keys in this case) according to the user.
I have multiple possible output "commands" that the NN can interpret the inputs to mean, and at each state certain outputs can count as beneficial while others are a detriment
When the NN starts up for the first time, no input should have any particular meaning to it but as time goes on I want the NN to be able to figure out what the user most likely meant.
I've tried a Multilayer perceptron NN that has as many input nodes as there are physical inputs and as many output nodes as there are commands and a number of nodes equal to the sum of the other two layers as a single hidden layer, in this case it is then a 5,15,10.
The NN assumes that the user will only make moves that are in the NN's best interest.
So far it seems the NN is just figuring out what is the command it can take that will most likely result in a beneficial move, regardless of the input key rather than trying to figure out what key should result in what move according to the user.
Because of this I'm wondering (most likely wrong) if I should produce a separate NN for each input to try and figure out the current output according to the user.
Is there a different type of NN I should look into that will work better, and is there a recommended configuration for this problem?
I'll be happy with some recommendations of reading material that would help in this particular problem.
I'm at best an amateur in NN and would like to learn a lot more about the whole field, But I'm trying to focus my efforts on this problem for now.
Accordng to me you want the output to be according to behaviour of the player as number of inputs are more than in actual case. So according to me there should be some type of memory for the actions taken by the player in order to find the patterns.This can be done using Long Short term Memory.

Transforming text with neural network

I previously worked with neural networks, but just for fun mainly using them with normalized Categorical(enum),Numeric and Bit(bool) values. I know NNs have trouble understanding characters, but I was wondering if they could understand how to transform text.
So is it possible for example to train NN to do following:
13/20 = 20
aa/bb = bb
20/10 = 10
Or (replaced d with f)
abcde = abcfe
tdfg = tffg
ddhj = ffhj
If yes, how reliably? Or maybe there is something better suited for the job?
The question wasn't related to implementation but since I have just finished a lab on a similar topic, here are some practical tips:
It would be quite possible and easy to use a RNN network to transform the text in that way. Clone this (https://github.com/karpathy/char-rnn) repository and train it with a file placed in data/folder/input.txt with enough size in the same format you want the output:
abcde = abcfe
tdfg = tffg
ddhj = ffhj
use this command to train:
th train.lua -data_dir data/folder
When testing the network it should be able to produce a correct output based on the seed text you provide:
th sample.lua cv/[latest_sample] -primetext "abcd" -length 7
should be able to produce the output:
abcd = abcf
Everything depends on complexity of the transformation. If you are interested in these examples, then, certanly yes, this is doable and reliable. Second example is trivial, you just present NN one character at the time, encode input and output as one-hot vector (one neuron per character) and it will do the job. First example can be solved by converting left and right part to one-hot input vector representing one of all possible combinations of two symbols and having two outputs, asking NN to select if first part or second part should be selected (better ways to encode input exists, especially for long strings). Provided you have enough training examples, everything should work fine.
Generally, days when it was prohibitively difficut for NNs to deal with texts are long gone. Now NNs can be trained to do machine translation (better then any other method) and even, to some extent, trained to predict output of simple computer programs based on program character string (but this is still difficult task for NNs).

What is the relation between OCR and Artificial Neural Network?

I saw different articles speaking about OCR form recognition (data extraction) and they said that they used Neural Network in order to do form recognition, so what's the relation between Artificial Neural network (ANN) and form recognition? If I want to extract fields from a BusinessCard, is it required to use ANN or is it optional? In other words when do I need to use ANN and when I don't?
It's a little different. ANN is just an "expert" in all OCR. But OCR engines contain many experts. When you study ANN you will build a simple OCR engine using just ANN but this does not compare to modern engines that use this in conjunction with tri-grams, morphology, data types ( very important for BCR and Forms ), dictionaries, connected components algorithm, etc. So look at it as just one of the tools in the bag of tricks to extract quality results. A good engine will incorporate ANN and all the others. In BCR there are additional considerations and it should be very heavy on connected components, dictionaries first, then use ANN and pattern matching for the actually recognition.
ANN is one way to perform OCR. There are others. Hence if you want to extract fields from a BusinessCard using ANN is only optional.
Good question. I recently spent some time playing with OCRopus, a Google project that does OCR - you can get it for free and play with it yourself. I'm pretty sure that it has an ANN as one of the modules behind it. However, the whole process of Optical Character Recognition can have many steps (lots of different little modules that each do something and pass the results to the next module).
So, here are some of the things I remember as being done by modules in that project:
There was a module that turned the image into black and white - this makes it easier for later modules to deal with.
Getting rid of speckles / spackles.
Straightening out the lines of text.
Breaking lines of text into individual words (it's been a few weeks, not sure about this one)
Basically, you can do the above using little bits of code that don't involve a neural net. So it's simpler doing it with these little bits of code.
The neural net I think is used just to recognize the individual characters - which character of a group of possible characters is it.
There's a training command in the OCRopus that I had running for over a week on end, and it kept sending line samples to the map, slowly changing the map as it went. I think it was training the ANN part.