I am trying to build a CNN model (Keras) that can classify images based on users' emotions. I am having issues with data: I have really little data for training. Will augmenting the data help? Does it improve accuracy? In which cases should one choose to augment data, and when should one avoid it?
Will augmenting data help? Does it improve accuracy?
That's hard to say in advance. But it almost certainly helps when you already have a model which is better than random, and when you choose the right augmentation method.
See my master's thesis, Analysis and Optimization of Convolutional Neural Network Architectures, page 80, for many different augmentation methods.
In which case one should choose to augment data and should avoid?
When you don't have enough data -> augment
Avoid augmentations after which you can no longer tell the emotion. In character recognition, for instance, rotation is a bad idea (e.g. 6 vs 9, u vs n, or \rightarrow vs \nearrow).
Yes, data augmentation really helps, and sometimes it's really necessary. (But take a look at Martin Thoma's answer, there are more details there and some important "take-cares").
You should use it when:
You have too little data
You notice your model is overfitting too easily (the model may also simply be too powerful)
Overfitting is something that happens when your model is capable of memorizing the data. It then gets splendid accuracy on the training data, but terrible accuracy on the test data.
Increasing the size of the training data makes it harder for your model to memorize. Small changes here and there make your model stop paying attention to details that don't mean anything (but are capable of creating distinctions between images) and start paying attention to details that actually cause the desired effect.
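To make those "small changes here and there" concrete, a couple of label-preserving transforms can be sketched in plain NumPy (in practice Keras's ImageDataGenerator does this for you; the 48x48 grayscale shape is just an assumed face-crop size for illustration):

```python
import numpy as np

def augment(image, rng):
    """Apply simple label-preserving transforms: a random horizontal
    flip and a small pixel shift. Neither changes the depicted emotion."""
    out = image
    if rng.random() < 0.5:
        out = out[:, ::-1, :]                 # horizontal flip
    dy, dx = rng.integers(-2, 3, size=2)      # shift by up to 2 pixels
    out = np.roll(out, (dy, dx), axis=(0, 1))
    return out

rng = np.random.default_rng(0)
img = rng.random((48, 48, 1))                 # a fake 48x48 grayscale face crop
batch = np.stack([augment(img, rng) for _ in range(8)])
print(batch.shape)                            # (8, 48, 48, 1)
```

Each augmented copy counts as a new training example, which is why this effectively multiplies a small dataset.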
Related
I am a beginner with very little knowledge about CNNs & RNNs.
For example, RNNs work better for time series and CNNs for spatial features; knowing this makes it easy for me to select between an RNN and a CNN.
However, if I had to choose between ResNet, InceptionNet, etc. for a particular application, how do I get an intuition of which would work better?
Please state your particular application if you want your answer in detail.
But if I want to answer your general question, I must state that:
- It depends on your dataset (number of items and size of data), feature engineering and type of your features.
- The evaluation measure of your particular application: such as accuracy, precision, recall, RMSE, F-measure, etc.
So, if you want to get an intuition about your network, it is best to run it on your data; if you can't, read a paper that uses the same dataset as yours, and read its analysis section.
Every neural network performs better on some kinds of data, though. For example, it is typical to use LSTMs for sequential data.
Experimentation Experimentation Experimentation
Get your hands dirty, you'll automatically get an intuition of which would work better.
Recently, I was asked how to pre-train a deep neural network with unlabeled data — meaning, instead of initializing the model weights with small random numbers, we set the initial weights from a model pre-trained on unlabeled data.
Well, intuitively I kind of get it: it probably helps with the vanishing-gradient issue and shortens training time when not much labeled data is available. But I still don't really know how it is done — how can you train a neural network with unlabeled data? Is it something like a SOM or a Boltzmann machine?
Has anybody heard about this? If so, can you provide some links to sources or papers? I am curious. Greatly appreciated!
There are lots of ways to deep-learn from unlabeled data. Layerwise pre-training was developed back in the 2000s by Geoff Hinton's group, though that's generally fallen out of favor.
More modern unsupervised deep learning methods include Auto-Encoders, Variational Auto-Encoders, and Generative Adversarial Networks. I won't dive into the details of all of them, but the simplest of these, auto-encoders, work by compressing an unlabeled input into a low dimensional real-valued representation, and using this compressed representation to reconstruct the original input. Intuitively, a compressed code that can effectively be used to recreate an input is likely to capture some useful features of said input. See here for an illustration and more detailed description. There are also plenty of examples implemented in your deep learning library of choice.
I guess in some sense any of the listed methods could be used as pre-training, e.g for preparing a network for a discriminative task like classification, though I'm not aware of that being a particularly common practice. Initialization methods, activation functions, and other optimization tricks are generally advanced enough to do well without more complicated initialization procedures.
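For intuition, the auto-encoder idea above can be reduced to a few lines of NumPy: a linear encoder compresses unlabeled inputs into 3 dimensions and a decoder tries to reconstruct them, trained by gradient descent on the reconstruction error. This is only an illustrative toy (real auto-encoders are deep and nonlinear), and all sizes here are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))                 # 200 unlabeled samples, 8 features
W_enc = rng.normal(scale=0.1, size=(8, 3))    # encoder: 8 -> 3 dims
W_dec = rng.normal(scale=0.1, size=(3, 8))    # decoder: 3 -> 8 dims

initial_loss = float(np.mean((X @ W_enc @ W_dec - X) ** 2))

lr = 0.01
for _ in range(500):
    Z = X @ W_enc                             # compressed code
    err = Z @ W_dec - X                       # reconstruction error
    # gradients of the mean squared reconstruction loss (up to a constant)
    grad_dec = (Z.T @ err) / len(X)
    grad_enc = (X.T @ (err @ W_dec.T)) / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

loss = float(np.mean((X @ W_enc @ W_dec - X) ** 2))
print(initial_loss, loss)                     # loss decreases with training
```

No labels appear anywhere: the input itself is the training target, which is what makes this usable as unsupervised pre-training.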
I've already trained the neural network in Keras for detecting two classes of images (cats and dogs) and got accuracy on test data. Is it enough for the conclusion in the master thesis or should I do other actions for evaluating the quality of network (for instance, cross-validation)?
Not really; I would expect more than just accuracy from my students in any classification setup. Accuracy only evaluates that particular network on that particular test set, but you would also have to justify, to some extent, the design choices you made in building the network. Here are some things to consider:
Presumably you have some hyper-parameters you've fixed, you can investigate how these affect your results. How many filters? How many layers? and most importantly why?
An important aspect of object classification is how your model handles noise. Depending on your dataset, one simple way to check is to pre-process the test data — blur it, invert colours, etc. — and you'll see that your performance drops. Why does it do that? What does the confusion matrix look like then?
What is the performance of the network? Is it fast or slow compared to another system, say VGG?
When you evaluate your project in general not just the network, asking why things worked helps a lot, not just why things didn't work.
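As a concrete starting point for the noise experiment above, here is a sketch of comparing confusion matrices on clean vs. blurred test data for a two-class (cats/dogs) setup. The predictions here are made-up numbers for illustration, not the output of a real model:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes=2):
    """Rows = true class, columns = predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# hypothetical labels/predictions: 0 = cat, 1 = dog
y_true         = np.array([0, 0, 1, 1, 1, 0])
y_pred_clean   = np.array([0, 0, 1, 1, 1, 1])   # model on clean test images
y_pred_blurred = np.array([0, 1, 1, 0, 1, 1])   # same model on blurred copies

print(confusion_matrix(y_true, y_pred_clean))    # few off-diagonal errors
print(confusion_matrix(y_true, y_pred_blurred))  # more off-diagonal errors
```

The off-diagonal cells growing under blur is exactly the kind of evidence about robustness that a thesis evaluation can discuss.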
I'm handling a Deep Learning classification task to distinguish whether an image/video is boring or interesting.
Based on ten thousand labeled examples (1. interesting, 2. a little interesting, 3. normal, 4. boring), I used some pre-trained ImageNet models (ResNet / Inception / VGG, etc.) and fine-tuned them for my classification task.
My training error is very small, which means the model has already converged. But the test error is very high; accuracy is only around 35%, very similar to a random result.
I found the difficult parts are:
The same object can have different labels. For example, for a dog on grass, a very cute dog may be labeled as an interesting image, but an ugly dog may be labeled as a boring one.
There are so many factors that define interesting or boring: image quality, image color, the object, the environment... Detecting just good image quality, or just a good environment, might be possible, but how can we combine all these factors?
Everyone's interests are different: I may be interested in pets, while someone else may find them boring. Still, there is some common ground that everyone agrees on — but how can I detect it?
At last, do you think it is a possible problem that can be solved using deep learning? If so, what will you do with this task?
This is a very broad question. I'll try and give some pointers:
"My training error is very small... But test error is very high" means you overfit your training set: your model learns specific training examples instead of learning general "classification rules" applicable to unseen examples.
This usually means you have too many trainable parameters relative to the number of training samples.
Your problem is not exactly a "classification" problem: classifying a "little interesting" image as "boring" is worse than classifying it as "interesting". Your label set has order. Consider using a loss function that takes that into account. Maybe "InfogainLoss" (if you want to maintain discrete labels), or "EuclideanLoss" (if you are willing to accept a continuous score).
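A minimal sketch of the "continuous score" option: map the four ordered labels to the values 0–3 and penalize squared distance, so a near miss costs less than a far one. This illustrates the idea behind an ordinal/Euclidean loss, not the actual "InfogainLoss" or "EuclideanLoss" implementations:

```python
# labels on an ordered scale:
# 0 = interesting, 1 = a little interesting, 2 = normal, 3 = boring
def ordinal_loss(true_label, predicted_score):
    """Squared distance on the label scale: far misses cost more than near ones."""
    return (true_label - predicted_score) ** 2

# misclassifying "a little interesting" (1) as "normal" (2) vs. as "boring" (3):
near = ordinal_loss(1, 2.0)   # 1.0
far  = ordinal_loss(1, 3.0)   # 4.0
```

A plain cross-entropy loss would penalize both mistakes equally, which is exactly what the ordered label set argues against.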
If you have enough training examples, I think it is not too much to ask from a deep model to distinguish between an "interesting" dog image and a "boring" one. Even though the semantic difference is not large, there is a difference between the images, and a deep model should be able to capture it.
However, you might want to start your finetuning from a net that is trained for "aesthetic" tasks (e.g., MemNet, flickr style etc.) and not a "semantic" net like VGG/GoogLeNet etc.
How do I approach the problem of using a neural network in an intrusion detection system where, let's say, we have an attack via FTP?
Let's say someone continuously attempts different logins via a brute-force attack on an FTP account.
How would I set up the structure of the NN? What do I have to consider? How would it recognise "similar approaches in the future"?
Any diagrams and input would be much appreciated.
Your question is extremely general and a good answer is a project in itself. I recommend contracting someone with experience in neural network design to help come up with an appropriate model or even tell you whether your problem is amenable to using a neural network. A few ideas, though:
Inputs need to be quantized, so start by making a list of possible numeric inputs that you could measure.
Outputs also need to be quantized and you probably can't generate a simple "Yes/no" response. Most likely you'll want to generate one or more numbers that represent a rough probability of it being an attack, perhaps broken down by category.
You'll need to accumulate a large set of training data that has been analyzed and quantized into the inputs and outputs you've designed. Figuring out the process of doing this quantization is a huge part of the overall problem.
You'll also need a large set of validation data, which should be quantized in the same way as the training data but must not take any part in the training; otherwise you will simply fit correlations that may well be completely meaningless.
Once you've completed the above, you can think about how you want to structure your network and the specific algorithms you want to use to train it. There is a wide range of literature on this topic, but, honestly, this is the simpler part of the problem. Representing the problem in a way that can be processed coherently is much more difficult.
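As an illustration of the input-quantization step, here is a hypothetical sketch that turns raw FTP log events into a few numeric features suitable as network inputs. The event format and feature names are assumptions for the example, not any standard:

```python
from collections import Counter

def extract_features(log_events):
    """Turn raw FTP log events into numeric inputs for a network.
    Each event is assumed to be (timestamp, source_ip, success_flag)."""
    failures = [ip for _, ip, ok in log_events if not ok]
    per_ip = Counter(failures)
    return {
        "failed_logins": len(failures),           # total failed attempts
        "distinct_ips": len(per_ip),              # how many sources failed
        "max_failures_one_ip": max(per_ip.values(), default=0),
    }

# a burst of failures from one IP — the brute-force pattern in question
events = [(1, "10.0.0.5", False), (2, "10.0.0.5", False),
          (3, "10.0.0.5", False), (4, "10.0.0.9", True)]
print(extract_features(events))
# {'failed_logins': 3, 'distinct_ips': 1, 'max_failures_one_ip': 3}
```

Features like these, computed over sliding time windows, are what the network would actually consume; designing them well is, as noted above, the hard part of the problem.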