I am going to do a neural network project on handwritten digit recognition, but this area is already well studied. I have found some papers online, but most of them are from before 2012. Could anyone tell me what the state-of-the-art techniques and current open issues in this area are?
You can refer to the MNIST dataset: it covers a huge number of corner cases and consists of 28x28-pixel images, distributed (in its Kaggle CSV form) as 42,000 rows and 785 columns (one label column plus 784 pixel columns), so you can easily decide how much data you want for training and how much for prediction without needing train_test_split.
That way you get a proper idea of the dataset and can visualize it.
Dataset link: MNIST DATASET
Reference video: HANDWRITTEN DIGIT RECOGNITION
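For a quick look, here is a minimal loading and visualization sketch, assuming the Kaggle digit-recognizer train.csv layout (a label column followed by 784 pixel columns); the file path is just a placeholder:
import pandas as pd
import matplotlib.pyplot as plt

# Kaggle-style CSV: first column is the label, the remaining 784 columns are pixel values
df = pd.read_csv("train.csv")  # placeholder path
labels = df["label"].values
pixels = df.drop(columns=["label"]).values

# Show the first digit as a 28x28 grayscale image
plt.imshow(pixels[0].reshape(28, 28), cmap="gray")
plt.title("label: %d" % labels[0])
plt.show()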
The state of the art is almost always determined by performance against a particular dataset. For handwritten digits, the MNIST dataset is the one I have seen most commonly referenced (though I'm not an expert in the area). For any dataset, you should be able to find the state-of-the-art performance listed very near where you download it.
I have a dilemma. If you have only one type of invoice/document, and there is a specific field in that invoice that you want to extract and use somewhere else (that field happens to be a handwritten digit, sometimes written with dashes or slashes), would you use some OCR software or build your own CNN for recognizing the digits? What accuracy would you expect from OCR? Would your CNN be more accurate, since you are only interested in a specific style of digit writing, with specific image dimensions, etc.? Which would be better in this situation?
Keep in mind that it would not be used in any other way or in any other place for handwritten digit recognition, and that you already have 100k or more documents that have been transcribed to a computer by a human, which you can use for training and testing.
Thank you.
I would definitely go for a CNN-based solution. Since the structure of your document is consistent:
Extract the desired portion of the document with a standard computer vision approach
Train a CNN on an annotated set of a few thousand documents. You should even be able to fine-tune an existing CNN trained on MNIST, which would require fewer training images (see the sketch below).
This approach should give you >99% accuracy without much effort. The accuracy of the OCR solution really depends on which library you use and the preprocessing you implement.
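As a rough illustration of that fine-tuning step, here is a minimal Keras sketch; the saved MNIST model file, the crop arrays, and the 28x28 grayscale preprocessing are all assumptions, not part of the original answer:
import numpy as np
from tensorflow import keras

# Hypothetical: a CNN previously trained on MNIST and saved locally
base = keras.models.load_model("mnist_cnn.h5")  # placeholder file name

# Replace the final softmax head so it can be re-trained on the invoice digits
x = base.layers[-2].output
out = keras.layers.Dense(10, activation="softmax", name="invoice_head")(x)
model = keras.Model(base.input, out)

# Freeze the convolutional base; only the new head is trained at first
for layer in base.layers[:-1]:
    layer.trainable = False

model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# x_train: (N, 28, 28, 1) float32 crops scaled to [0, 1]; y_train: integer labels 0-9
# model.fit(x_train, y_train, epochs=5, validation_split=0.1)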
I'm looking for a database of handwritten mathematical operators, like MNIST (the handwritten digit database), to develop a new application using neural networks.
So far, I have found only one satisfactory source on Kaggle, with 45x45 JPEG images extracted from CROHME.
Are there any other (better) known sources of handwritten mathematical symbols?
You can take a look at this dataset HAMEX, though it was also one of the datasets partially merged to form the dataset for CROHME.
You can find more information and other datasets here.
Background
I've been studying Neural Networks, specifically the implementation provided by this incredible online book. In the example network provided, we're shown how to create a neural network that classifies the MNIST training data to perform Optical Character Recognition (OCR).
The network is configured so that the input stimuli represent thresholded pixel data from a 28x28 image; at the output, we have ten signal paths, one for each of the possible solutions, which are used to classify a handwritten digit from zero to nine. In this implementation, a handwritten '3' would drive a strong signal down the output path corresponding to '3'.
Now, I've seen that Neural Networks can be applied to far more 'unpredictable' output solutions; for example, take the team who taught a network to recognize the hair on a human:
Question
Surely in the application above we couldn't use a fixed-length output array, because the number of qualifying points in an image would vary so wildly between samples. Can anyone recommend what kind of pattern would have been used to accomplish this?
Assumption
In the interest of completeness, I'm going to propose that the team could have employed a kind of 'line-following robot' for the classification task: for an input image, the network could be trained to emit a small set of discrete commands (LEFT, RIGHT, UP, DOWN) over a fixed period t, controlling the robot like an Etch-a-Sketch.
Alternatively, we could implement a network which would map pixels one-to-one, and define whether individual pixels contributed to hair; but this wouldn't be compatible with different image resolutions.
So, do either of these solutions sound plausible? If so, are these basic implementations of a known generic solution for this kind of problem? What approach would you use?
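To make the second idea concrete, here is a rough, purely illustrative sketch of a one-to-one pixel mapping; the fixed-size Dense output is exactly what ties it to a single resolution:
from tensorflow import keras

H, W = 96, 96  # the fixed resolution this design is tied to

# One sigmoid output per pixel: 1 = "this pixel is hair", 0 = "not hair"
model = keras.Sequential([
    keras.layers.Input(shape=(H, W, 3)),
    keras.layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
    keras.layers.Flatten(),
    keras.layers.Dense(H * W, activation="sigmoid"),
    keras.layers.Reshape((H, W)),
])
model.compile(optimizer="adam", loss="binary_crossentropy")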
I have a face recognition project with five people that I want my CNN to detect, and I was wondering if people could have a look at my model to see if this is a step in the right direction:
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

def model(number_of_faces=5):
    model = Sequential()
    # input: 800x800 RGB head shots (channels last)
    model.add(Conv2D(64, (3, 3), activation='relu', input_shape=(800, 800, 3)))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(number_of_faces, activation='softmax'))
    return model
So the model takes in pictures (head shots of 5 people found on Google) with 3 channels of size 800 by 800, produces 64 feature maps, pools them, applies another set of feature maps, and then connects to an MLP that classifies into a binary vector over the 5 output neurons. My question is: is this a decent approach for trying to classify head shots of particular people? For example, if I were to download one hundred pictures of a certain person and put them through this model, would the feature space created by the convolutions be big enough to capture the features of that face and the four others?
Thanks for the help, guys.
Well, it is not an engineering issue but a scientific one. It is hard to judge whether 100 pictures are enough for your purpose without seeing the current progress (e.g., what is the accuracy now? Are you facing overfitting or underfitting?).
But, YES, extra face data can help your model, especially when those faces share the same context (background, lighting, angle, skin color, etc.) as your eventual test data.
If you are interested in face recognition, you can start with Deep Learning Face Representation from Predicting 10,000 Classes (unofficial code here); they use an extra dataset of 10,000 face classes for training. You can search for "DeepID" for more information.
If you are more of an engineering person, you can check Facial Expression Recognition with Convolutional Neural Networks; this report focuses more on implementation and is also implemented in Keras.
By the way, 800x800 is extremely large by face recognition standards. You might want to resize the images to something smaller; otherwise your model may be too large to train and will consume a huge amount of memory. A minimal resize sketch follows below.
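For example, a quick resize pass over a folder of images with OpenCV (folder names and the 224x224 target size are just placeholders):
import glob
import os
import cv2

SRC, DST, SIZE = "faces_raw", "faces_resized", (224, 224)  # placeholder folders and target size
os.makedirs(DST, exist_ok=True)

for path in glob.glob(os.path.join(SRC, "*.jpg")):
    img = cv2.imread(path)
    small = cv2.resize(img, SIZE, interpolation=cv2.INTER_AREA)  # shrink e.g. 800x800 -> 224x224
    cv2.imwrite(os.path.join(DST, os.path.basename(path)), small)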
Face recognition is not a regular classification task. If you train your model on 5 people, then even if it is successful, you will need to re-train it whenever a new person joins the team, and the new model might not be successful anymore.
Instead, we first train a regular classification model, then drop its final softmax layer and use an earlier layer to represent images. Representations are multi-dimensional vectors. We expect an image pair of the same person to have high similarity, whereas an image pair of different persons should have low similarity. We can measure vector similarity with cosine similarity or Euclidean distance.
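As an illustration, a tiny sketch of comparing two such embedding vectors; the vectors and the 0.8 threshold here are made up, and in practice they would come from the network's penultimate layer and the model's tuned threshold:
import numpy as np

def cosine_similarity(a, b):
    # 1.0 means identical direction; values near 0 mean unrelated embeddings
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

emb_a = np.array([0.12, 0.87, 0.33, 0.05])  # embedding of image 1 (made-up values)
emb_b = np.array([0.10, 0.90, 0.30, 0.07])  # embedding of image 2 (made-up values)

score = cosine_similarity(emb_a, emb_b)
print(score, score > 0.8)  # True -> treat as the same person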
To sum up, you do not need to train a model from scratch for a face recognition application. You just need to use an existing neural network to predict; the predictions are the representations.
I recommend using deepface. It wraps state-of-the-art face recognition models such as VGG-Face, Google FaceNet, OpenFace, Facebook DeepFace, DeepID and Dlib. It also handles face detection and alignment in the background. You just need to call a single line of code to apply face recognition.
#!pip install deepface
from deepface import DeepFace
models = ['VGG-Face', 'Facenet', 'OpenFace', 'DeepFace', 'DeepID', 'Dlib']
obj = DeepFace.verify("img1.jpg", "img2.jpg", model_name = models[0])
print(obj["verified"], ", ", obj["distance"])
The returned object stores the maximum threshold value and the distance found. The verified field is True if the image pair is the same person and False if the pair shows different persons.
I need to develop an optical character recognition program in Matlab (or any other language that can do this) to be able to extract the reading on this photograph.
The program must be able to load as many picture files as possible, since I have around 40,000 pictures that I need to work through.
The general aim of this task is to record intraday gas readings from the specific gas meter shown in the photograph. There is a webcam currently set up that is programmed to photograph the readings every minute, so the OCR program would make it possible to build up historic intraday gas reading data.
Which is the best software to do this in, and are there any online resources available for this?
I'd break down the basic recognition steps as follows:
Locate meter display within the image
Isolate and clean up the digits
Calculate features
Classify each digit using a model you've trained using historic examples
Assuming that the camera for a particular location does not move, step 1 will only need to be performed once. Step 2 will include things like enhancing contrast and filtering noise. Step 3 can include any useful calculations you can think of, such as mean and skew of "ink" (white) pixels. Step 4 would utilize a model you build to classify a single digit as '0', '1', ... '9', and could be accomplished using k-nearest neighbors, logistic regression, SVM, neural network, etc.
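A rough Python sketch of steps 2-4 under some simplifying assumptions (the display has already been cropped from the image, digits are light on a dark background, and you have hand-labelled digit crops from historic photos; all file names are placeholders):
import cv2
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def digit_features(gray):
    # Steps 2-3: threshold the crop and use downsampled pixels as a baseline feature vector
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    small = cv2.resize(binary, (16, 16)).astype(np.float32) / 255.0
    return small.flatten()

# Step 4: train a k-nearest-neighbours classifier on hand-labelled historic digit crops
train_crops = [cv2.imread(f, cv2.IMREAD_GRAYSCALE) for f in ["digit_0.png", "digit_7.png"]]  # placeholders
train_labels = [0, 7]
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit([digit_features(c) for c in train_crops], train_labels)

# Classify one segmented digit from the latest meter photo
new_crop = cv2.imread("latest_digit.png", cv2.IMREAD_GRAYSCALE)  # placeholder
print(knn.predict([digit_features(new_crop)])[0])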
A couple of things would make step 1 in Predictor's answer easy: placing the cam directly above the meter, adding sufficient light, and maybe placing bright pink strips around the meter to help segment out the display :).
Once you do this, and the cam remains fixed, you can use a manual process once and then have it applied to all subsequent images to segment out the digits. If the lighting is good and consistent, you might just be able to use simple template matching to identify each of the segmented digits.
Actually, once you have a sample of every digit, you might even be able to classify them with something simpler (like the sum of thresholded pixels).
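A minimal template-matching sketch with OpenCV, assuming you have already captured one clean reference crop per digit from the fixed camera (file names are placeholders):
import cv2

# Hypothetical reference crops, one per digit, captured once from the fixed camera
templates = {d: cv2.imread("template_%d.png" % d, cv2.IMREAD_GRAYSCALE) for d in range(10)}

def classify_digit(crop_gray):
    # Pick the digit whose template correlates best with the segmented crop
    scores = {}
    for d, tmpl in templates.items():
        resized = cv2.resize(crop_gray, (tmpl.shape[1], tmpl.shape[0]))
        scores[d] = cv2.matchTemplate(resized, tmpl, cv2.TM_CCOEFF_NORMED).max()
    return max(scores, key=scores.get)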
More recently, many object detection methods have become available that could also be used to deal with this problem.