I'm new to deep learning and I'm working on a project that involves recognizing the emotions of cartoon characters in cartoon images. I tried transfer learning, but after some research I got the impression that ImageNet-pretrained models such as InceptionV3 only work well for human faces. What approach should I follow? The training set is limited to about 300 images and the test set has around 180 images. I'm still a beginner in this field and thought this would be a good project to start with. Any suggestions/guidance would be much appreciated. Thank you.
If you have very little data, you can use data augmentation. Take a look at:
https://towardsdatascience.com/data-augmentation-for-deep-learning-4fe21d1a4eb9
and also:
https://machinelearningmastery.com/how-to-configure-image-data-augmentation-when-training-deep-learning-neural-networks/
If data augmentation does not help, you should try other algorithms. Neural networks need lots of data; if your data is scarce, your network will overfit.
You can use data augmentation. For data augmentation, you could use the imgaug package; see its documentation.
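A minimal imgaug sketch, assuming your images are already loaded as a NumPy array; the chosen augmenters and ranges are only examples:

    # Small imgaug pipeline sketch; the specific augmenters and ranges are just examples.
    import imgaug.augmenters as iaa
    import numpy as np

    seq = iaa.Sequential([
        iaa.Fliplr(0.5),                  # horizontally flip 50% of images
        iaa.Affine(rotate=(-15, 15)),     # random rotation
        iaa.GaussianBlur(sigma=(0, 1.0))  # mild blur
    ])

    # images: NumPy array of shape (N, H, W, 3), dtype uint8
    images = np.random.randint(0, 255, size=(4, 128, 128, 3), dtype=np.uint8)
    augmented = seq(images=images)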
I need to find the location of an image that the user provides within an image that I provide.
At the time of the analysis, it is safe to assume that the user-provided image is contained somewhere within the image it is compared against.
I've looked through, and even have some experience with, Core ML and Vision image classification; however, I am struggling to convince myself that it is the correct way to approach this problem. I feel like the way "feature values" are handled in Vision is almost the reverse of what I'm looking for.
My question: Is there a feature of Core ML or Vision that tackles this particular problem head on?
Other information that may be relevant:
It is not safe to assume that the provided images are pixel-for-pixel identical, due to possible resolution differences.
They may also be provided in any shape, although it is possible to crop them to a standardised shape before analysis.
Rotation will also need to be accounted for.
There would not be cases where the user's image appears in my image twice.
Take a look at some of the feature detection and matching algorithms.
For example, you could use SIFT (Scale-Invariant Feature Transform) together with RANSAC (Random Sample Consensus) to do exactly what you described.
If you are using OpenCV, there are plenty of such algorithms you can easily use (FAST, Shi-Tomasi, etc.).
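As a rough illustration (in Python with OpenCV rather than Core ML/Vision; the file names are placeholders), SIFT matching followed by a RANSAC homography might look like this:

    # Sketch of SIFT feature matching + RANSAC homography in OpenCV.
    # Requires opencv-python >= 4.4 for SIFT_create; file names are placeholders.
    import cv2
    import numpy as np

    query = cv2.imread("user_image.png", cv2.IMREAD_GRAYSCALE)   # image to find
    scene = cv2.imread("large_image.png", cv2.IMREAD_GRAYSCALE)  # image to search in

    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(query, None)
    kp2, des2 = sift.detectAndCompute(scene, None)

    # Match descriptors and keep good matches via Lowe's ratio test.
    matcher = cv2.BFMatcher()
    matches = matcher.knnMatch(des1, des2, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]

    if len(good) >= 4:
        src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
        dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
        # RANSAC rejects outlier matches and estimates where the query sits in the scene.
        H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
        h, w = query.shape
        corners = cv2.perspectiveTransform(
            np.float32([[0, 0], [0, h], [w, h], [w, 0]]).reshape(-1, 1, 2), H
        )
        print("Query image located at corners:", corners.reshape(-1, 2))

Because SIFT features are scale- and rotation-invariant, this handles the resolution and rotation differences you mentioned.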
I think you need something like this example in OpenCV.
Is there a simple explanation for dummies like me? I know there is source code for Leela, and I've heard that it uses neural networks with MCTS (plus UCT), but a lot of it is still hard to follow. Do I need to train Leela myself by running it, or do I need to download something from the Internet (the so-called trained data)? If so, do I need to update this data constantly? Does it play stronger with every game?
Thank you very much in advance.
What would be the best way to detect differences between two images, where one image is taken at the beginning of a process and the other at the end? The goal is to detect whether there is any difference between the images.
Based on my research, neural networks seem well suited to this type of problem, but I don't have experience using them, and I am not sure whether this problem should be treated as classification or anomaly detection. If you have any useful literature, GitHub projects, or papers to share, I would be thankful.
I have a few general questions regarding using pre-trained image classification models on mobile.
How big is a typical pre-trained model?
If it is too big for mobile, what is the best strategy from there?
I checked out the documentation of Deep Learning for Java; is there anywhere to download a pre-trained model?
Thanks in advance.
It's really task-dependent. I mean, given an unknown problem, you can't just name an arbitrary size for a neural net.
In general, if you're doing vision, expect models to be hundreds of megabytes, but a lot of it comes down to the activation sizes. I would advise just doing some benchmarking overall; you can't really handwave that.
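For a rough sense of scale, you can check a model's parameter count yourself; a sketch assuming a Keras environment, with MobileNetV2 chosen only as an example of a mobile-oriented model:

    # Rough size check for a pretrained Keras model (MobileNetV2 chosen as an example).
    from tensorflow.keras.applications import MobileNetV2

    model = MobileNetV2(weights="imagenet")
    params = model.count_params()
    # float32 weights: 4 bytes per parameter, ignoring activations and file overhead
    print(f"{params:,} parameters, ~{params * 4 / 1e6:.1f} MB of weights")

Mobile-oriented architectures like this come in around tens of megabytes, while larger vision models can indeed run to hundreds.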
A lot of the pre-trained models are for computer vision only, and they are based on Keras. You shouldn't "download" them yourself; no framework works like that.
We have a managed module for that in the model zoo, which you should use instead.
Hi, I have been searching through research papers for features that would be good to use in my handwritten-OCR classifying neural network. I am a beginner, so I have simply been taking the image of the handwritten character, drawing a bounding box around it, and then resizing it to a 15x20 binary image. This means I have an input layer of 300 features. In the papers I have found on Google (most of which are quite old), the methods vary a lot. My accuracy is not bad with just the binary grid of the image, but I was wondering if anyone has other features I could use to boost my accuracy, or could even just point me in the right direction. I would really appreciate it!
Thanks,
Zach
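For reference, the preprocessing described in the question could look roughly like this (assuming OpenCV and NumPy; the threshold choice and file name are illustrative):

    # Sketch of the preprocessing described above: threshold, crop to the bounding
    # box of the character, resize to 15x20, and flatten to a 300-element binary vector.
    import cv2
    import numpy as np

    img = cv2.imread("character.png", cv2.IMREAD_GRAYSCALE)
    _, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

    ys, xs = np.nonzero(binary)                              # pixels belonging to the character
    cropped = binary[ys.min():ys.max() + 1, xs.min():xs.max() + 1]

    resized = cv2.resize(cropped, (15, 20), interpolation=cv2.INTER_AREA)  # width 15, height 20
    features = (resized > 127).astype(np.float32).ravel()    # 300 binary input features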
I haven't read any actual papers on this topic, but my advice would be to get creative. Use anything you can think of that might help the classifier identify the characters.
My first thought would be to try to identify "lines" in the image, maybe via a modified "sliding window" algorithm (a sliding/rotating line?), or to fit a "line of best fit" to the image (to help the classifier respond to changes in slant or writing style). Really, though, if you're using a neural network, it should be picking up on these sorts of things without your manual help (that's the whole point of them!).
I would focus first on the structure and topology of your net to try to improve performance, and worry about additional features only if you cannot get satisfactory performance some other way. You could also try improving the features you already have: make sure the character is centred in the image, and maybe try an algorithm to deskew slanted characters so they are vertical.
In my experience these sorts of things don't often help, but you could get lucky and run into one that improves your net :)
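For the deskewing idea above, one common trick is to estimate the slant from second-order image moments and shear it away. A sketch assuming OpenCV, with no guarantee it helps in your case:

    # Moment-based deskewing sketch: estimate the slant from second-order image
    # moments and shear the character back to vertical.
    import cv2
    import numpy as np

    def deskew(img):
        h, w = img.shape
        m = cv2.moments(img)
        if abs(m["mu02"]) < 1e-2:
            return img.copy()                  # nothing measurable to correct
        skew = m["mu11"] / m["mu02"]
        M = np.float32([[1, skew, -0.5 * h * skew],
                        [0, 1, 0]])
        return cv2.warpAffine(img, M, (w, h),
                              flags=cv2.WARP_INVERSE_MAP | cv2.INTER_LINEAR)

You would apply this to the binary character image before resizing it to the 15x20 grid, so the network sees upright characters regardless of the writer's slant.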