I have a variable A which is a 581×581 matrix of type single, and when I run imshow(mat2gray(A)) I see the biological cell I expect to see. Say I am conducting research where I want to test how the JPEG algorithm (or some other compression algorithm) performs on this data. How do I test this? When I read research papers, the authors are able to test various compression algorithms on their data. I want to know the achieved compression ratio, time complexity, and other such details, so that I can compare the performance of different compression algorithms.
What I'm trying to do is detect, in a small set of audio samples, whether any were generated by the same instrument. If so, those are considered duplicates and filtered out.
Listen to this file of ten concatenated samples. You can hear that the first five are all generated by the same instrument (an electric piano) so four of them are to be deemed duplicates.
What algorithm or method can I use to solve this problem? Note that I don't need full-fledged instrument detection as I'm only interested in whether the instrument is or isn't the same. Note also that I don't mean literally "the same instrument" but rather "the same acoustic flavor just different pitches."
Task Formulation
What you need is a similarity metric (a type of distance metric) that scores two samples of the same instrument / instrument type as very similar (low score) and two samples of different instruments as quite different (high score), and that does so regardless of which note is being played. It should be sensitive to timbre and insensitive to musical content.
Learning setup
The task can be referred to as Similarity Learning. A popular and effective approach for neural networks is Triplet Loss. Here is a blog-post introducing the concept in the context of image similarity. It has been applied successfully to audio before.
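For intuition, the triplet loss itself is just a hinge on distances between an anchor, a positive (same instrument), and a negative (different instrument). A minimal NumPy sketch on toy embedding vectors:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss: pull same-instrument pairs together, push others apart."""
    d_pos = np.sum((anchor - positive) ** 2)  # distance to same instrument
    d_neg = np.sum((anchor - negative) ** 2)  # distance to other instrument
    return max(0.0, d_pos - d_neg + margin)

# Toy 2-D embeddings: the anchor is near the positive, far from the negative.
anchor = np.array([1.0, 0.0])
positive = np.array([0.9, 0.1])
negative = np.array([-1.0, 0.0])

print(triplet_loss(anchor, positive, negative))  # 0.0 -- triplet satisfied
print(triplet_loss(anchor, negative, positive))  # large -- triplet violated
```

In training, the gradient of this loss is backpropagated through the embedding network so that satisfied triplets become the norm.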
Model architecture
The primary model architecture I would consider is a Convolutional Neural Network on log-mel spectrograms. First try using a generic model like OpenL3 as a feature extractor. It produces a 1024-dimensional output called an audio embedding, on top of which you can train a triplet-loss model.
Datasets
The key to success for your application will be to have a suitable dataset. You might be able to use the NSynth dataset. Maybe training on that alone will give OK performance, or you may be able to use it as a training set and then fine-tune on your own data.
At a minimum you will need to create a validation/test set from your own audio clips in order to evaluate the performance of the model: some 10-100 labeled examples of each instrument type of interest.
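Once each clip has an embedding, the duplicate filtering itself reduces to thresholding pairwise distances. A sketch with made-up 2-D embeddings and an assumed threshold:

```python
import numpy as np

def filter_duplicates(embeddings, threshold=0.5):
    """Keep the first clip of each group of near-identical embeddings."""
    kept = []
    for i, e in enumerate(embeddings):
        if all(np.linalg.norm(e - embeddings[j]) > threshold for j in kept):
            kept.append(i)
    return kept

# Toy embeddings: clips 0-2 share a timbre, clip 3 is a different instrument.
emb = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [3.0, 3.0]])
print(filter_duplicates(emb))  # [0, 3] -- clips 1 and 2 removed as duplicates
```

The threshold is what your labeled validation set lets you tune.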
Recently, I was asked how to pre-train a deep neural network with unlabeled data, meaning that instead of initializing the model weights with small random numbers, we set the initial weights from a model pretrained on unlabeled data.
Well, intuitively, I kind of get it: it probably helps with the vanishing gradient issue and shortens the training time when there is not much labeled data available. But I still don't really know how it is done. How can you train a neural network with unlabeled data? Is it something like a SOM or a Boltzmann machine?
Has anybody heard about this? If yes, can you provide some links to sources or papers? I am curious. Greatly appreciated!
There are lots of ways to deep-learn from unlabeled data. Layerwise pre-training was developed back in the 2000s by Geoff Hinton's group, though that's generally fallen out of favor.
More modern unsupervised deep learning methods include Auto-Encoders, Variational Auto-Encoders, and Generative Adversarial Networks. I won't dive into the details of all of them, but the simplest of these, auto-encoders, work by compressing an unlabeled input into a low dimensional real-valued representation, and using this compressed representation to reconstruct the original input. Intuitively, a compressed code that can effectively be used to recreate an input is likely to capture some useful features of said input. See here for an illustration and more detailed description. There are also plenty of examples implemented in your deep learning library of choice.
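As an illustration of the idea, here is a minimal linear autoencoder in NumPy (no deep-learning library), trained by gradient descent on synthetic unlabeled data that lies on a low-dimensional subspace. The dimensions, learning rate, and step count are arbitrary choices for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

# Unlabeled data: 200 samples in 10-D that actually live on a 3-D subspace.
basis = rng.normal(size=(3, 10))
X = rng.normal(size=(200, 3)) @ basis

# Linear autoencoder: encode 10-D -> 3-D, decode 3-D -> 10-D.
W_enc = rng.normal(scale=0.1, size=(10, 3))
W_dec = rng.normal(scale=0.1, size=(3, 10))

def loss(X, W_enc, W_dec):
    return np.mean((X @ W_enc @ W_dec - X) ** 2)  # reconstruction error

lr = 0.005
initial = loss(X, W_enc, W_dec)
for _ in range(300):
    code = X @ W_enc                       # compressed representation
    err = code @ W_dec - X                 # reconstruction residual
    W_dec -= lr * code.T @ err / len(X)    # gradient step on the decoder
    W_enc -= lr * X.T @ (err @ W_dec.T) / len(X)  # and on the encoder
final = loss(X, W_enc, W_dec)
print(f"reconstruction error: {initial:.3f} -> {final:.3f}")
```

After training, `X @ W_enc` is the learned compressed representation; with nonlinearities and more layers this becomes a standard deep autoencoder.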
I guess in some sense any of the listed methods could be used as pre-training, e.g. for preparing a network for a discriminative task like classification, though I'm not aware of that being a particularly common practice. Initialization methods, activation functions, and other optimization tricks are generally advanced enough to do well without more complicated initialization procedures.
I have read a lot about image encoding techniques, e.g. Bag of Visual Words, VLAD or Fisher Vectors.
However, I have a very basic question: we know that we can perform descriptor matching (brute force or with ANN techniques). My question is: why don't we just use that directly?
From my knowledge, Bag of Visual Words representations have hundreds of thousands of dimensions per image in order to be accurate. If we consider an image with a thousand SIFT descriptors (already a considerable number), we have 128 thousand floating-point numbers, which is usually less than the number of dimensions of BoVW. So it's not for memory reasons (at least if we are not considering large-scale problems, where VLAD/FV codes are preferred).
Then why do we use such encoding techniques? Is it for performance reasons?
I had a hard time understanding your question.
Concerning descriptor matching, brute-force and ANN matching techniques are used in retrieval systems. Common indexing techniques include KD-trees, hashing, etc.
BoVW is a traditional representation scheme. At one time BoVW combined with an inverted index was the state of the art in information retrieval systems. But the dimensionality (memory usage per image) of the BoVW representation (up to millions) limits the number of images that can be indexed in practice.
FV and VLAD are both compact visual representations with high discriminative ability, something BoVW lacked. VLAD is known to be extremely compact (32 KB per image), very discriminative, and efficient in retrieval and classification tasks.
So yes, such encoding techniques are used for performance reasons.
You may check this paper for a deeper understanding: Aggregating local descriptors into a compact image representation.
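To make the aggregation concrete, here is a minimal NumPy sketch of VLAD: assign each local descriptor to its nearest visual word and accumulate residuals, giving one fixed-size code per image regardless of how many descriptors it has. The vocabulary size of 64 and the random stand-in descriptors are arbitrary choices:

```python
import numpy as np

def vlad(descriptors, centroids):
    """Aggregate local descriptors into one residual vector per visual word."""
    k, d = centroids.shape
    # Assign each descriptor to its nearest centroid.
    dists = np.linalg.norm(descriptors[:, None, :] - centroids[None, :, :], axis=2)
    nearest = np.argmin(dists, axis=1)
    # Accumulate residuals (descriptor minus its centroid) per visual word.
    v = np.zeros((k, d))
    for x, c in zip(descriptors, nearest):
        v[c] += x - centroids[c]
    v = v.ravel()
    return v / (np.linalg.norm(v) + 1e-12)  # L2-normalize the final code

rng = np.random.default_rng(0)
descs = rng.normal(size=(1000, 128))  # e.g. the SIFT descriptors of one image
cents = rng.normal(size=(64, 128))    # visual vocabulary from k-means
code = vlad(descs, cents)
print(code.shape)  # (8192,): 64 x 128 floats, versus 128,000 raw floats
```

The fixed size is the point: two images with wildly different descriptor counts become directly comparable vectors.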
If I've understood correctly, when training neural networks to recognize objects in images it's common to map a single pixel to a single input-layer node. However, sometimes we might have a large picture with only a small area of interest. For example, if we're training a neural net to recognize traffic signs, we might have images where the traffic sign covers only a small portion, while the rest is taken up by the road, trees, sky, etc. Creating a neural net which tries to find a traffic sign at every position seems extremely expensive.
My question is, are there any specific strategies to handle these sort of situations with neural networks, apart from preprocessing the image?
Thanks.
Using one pixel per input node is usually not done. What enters your network is the feature vector, and as such you should input actual features, not raw data. Inputting raw data (with all its noise) will not only lead to bad classification, but training will also take longer than necessary.
In short: preprocessing is unavoidable. You need a more abstract representation of your data. There are hundreds of ways to deal with the problem you're asking. Let me give you some popular approaches.
1) Image processing to find regions of interest. When detecting traffic signs, a common strategy is to use edge detection (i.e. convolution with some filter), apply some heuristics and a threshold filter, and isolate regions of interest (blobs, strongly connected components, etc.), which are taken as input to the network.
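A toy version of that pipeline (gradient-based edge detection, a threshold, and a bounding box around the surviving pixels) in NumPy, with a synthetic image containing one bright square as the "sign"; the threshold value is an arbitrary assumption:

```python
import numpy as np

def find_roi(img, thresh=0.25):
    """Edge-detect, threshold, and box the region of strong edges."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)              # edge strength per pixel
    ys, xs = np.nonzero(mag > thresh)   # pixels surviving the threshold
    if len(ys) == 0:
        return None
    return int(ys.min()), int(ys.max()), int(xs.min()), int(xs.max())

# Synthetic 100x100 scene: flat background with a bright "sign" at 40..59.
img = np.zeros((100, 100))
img[40:60, 40:60] = 1.0
print(find_roi(img))  # (39, 60, 39, 60): the box around the square's edges
```

Only the cropped region then goes to the network, instead of the whole frame.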
2) Applying features without any prior knowledge or image processing. Viola/Jones use a specific image representation, from which they can compute features in a very fast way. Their framework has been shown to work in real-time. (I know their original work doesn't state NNs but I applied their features to Multilayer Perceptrons in my thesis, so you can use it with any classifier, really.)
3) Deep Learning.
Learning better representations of the data can be incorporated into the neural network itself. These approaches are among the most actively researched at the moment. Since this is a very large topic, I can only give you some keywords so that you can research it on your own. Autoencoders are networks that learn efficient representations; it is possible to use them with conventional ANNs. Convolutional Neural Networks seem a bit sophisticated at first sight, but they are worth checking out: before the actual classification layers, they have alternating layers of sub-window convolution (edge detection) and resampling. CNNs currently achieve some of the best results in OCR.
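The alternating convolution/resampling idea can be shown in a few lines of NumPy (a single hand-made edge filter followed by non-overlapping max pooling; a real CNN learns its filters instead of hard-coding them):

```python
import numpy as np

def conv2d(img, kernel):
    """Valid-mode 2-D convolution (cross-correlation, as CNNs compute it)."""
    kh, kw = kernel.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def maxpool(img, size=2):
    """Non-overlapping max pooling -- the 'resampling' step."""
    h, w = img.shape[0] - img.shape[0] % size, img.shape[1] - img.shape[1] % size
    return img[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

img = np.random.rand(28, 28)
edge = np.array([[1.0, -1.0]])       # a tiny hand-made edge detector
feat = maxpool(conv2d(img, edge))
print(feat.shape)  # (14, 13): 28x27 feature map pooled down by a factor of 2
```

Stacking several such convolution/pooling pairs, then a fully connected classifier, gives the basic CNN layout.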
In every scenario you have to ask yourself: am I 1) giving my ANN a representation that has all the data it needs to do the job (a representation that is not too abstract), and 2) keeping the noise away (and thus staying abstract enough)?
We usually don't use a fully connected network to deal with images, because the number of units in the input layer would be huge. For images we have a specific kind of neural network: the Convolutional Neural Network (CNN).
However, the CNN plays the role of a feature extractor; the encoded features are finally fed into a fully connected network which acts as a classifier. In your case, I don't know how small your object is compared to the full image, but if the object of interest is really small, the performance of plain image classification won't be very good even with a CNN. Then we probably need to use object detection (which uses a sliding window) to deal with it.
If you want to recognize small objects in a large image, you should use a "scanning window".
Within the "scanning window" you can apply dimension-reduction methods:
DCT (http://en.wikipedia.org/wiki/Discrete_cosine_transform)
PCA (http://en.wikipedia.org/wiki/Principal_component_analysis)
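A scanning (sliding) window is straightforward to sketch in NumPy; each extracted patch could then be flattened and reduced with DCT or PCA before classification. Window size and stride below are arbitrary:

```python
import numpy as np

def scanning_windows(img, win=32, stride=16):
    """Yield (row, col, patch) for every window position over the image."""
    h, w = img.shape
    for r in range(0, h - win + 1, stride):
        for c in range(0, w - win + 1, stride):
            yield r, c, img[r:r + win, c:c + win]

img = np.zeros((128, 128))
patches = list(scanning_windows(img))
print(len(patches))  # 49: a 7x7 grid of 32x32 windows at stride 16
```

The classifier then runs once per patch, and positions with high scores are reported as detections.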
How do I approach the problem of building an intrusion detection system with a neural network where, let's say, we have an attack via FTP?
Let's say someone continuously tries different logins via a brute-force attack on an FTP account.
How would I set the structure of the NN? What things do I have to consider? How would it recognise "similar approaches in the future"?
Any diagrams and input would be much appreciated.
Your question is extremely general and a good answer is a project in itself. I recommend contracting someone with experience in neural network design to help come up with an appropriate model or even tell you whether your problem is amenable to using a neural network. A few ideas, though:
Inputs need to be quantized, so start by making a list of possible numeric inputs that you could measure.
Outputs also need to be quantized and you probably can't generate a simple "Yes/no" response. Most likely you'll want to generate one or more numbers that represent a rough probability of it being an attack, perhaps broken down by category.
You'll need to accumulate a large set of training data that has been analyzed and quantized into the inputs and outputs you've designed. Figuring out the process of doing this quantization is a huge part of the overall problem.
You'll also need a large set of validation data, which should be quantized in the same way as the training data but should not take any part in the training; otherwise you will simply force correlations that may well be completely meaningless.
Once you've completed the above, you can think about how you want to structure your network and the specific algorithms you want to use to train it. There is a wide range of literature on this topic, but, honestly, this is the simpler part of the problem. Representing the problem in a way that can be processed coherently is much more difficult.
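As a toy illustration of the quantization step, here is a sketch that turns a window of hypothetical parsed FTP login events into a small numeric feature vector (failure rate, distinct-user ratio, attempt rate). All field names and features here are invented for illustration; your real inputs depend on what your logs actually contain:

```python
from datetime import datetime

# Hypothetical parsed log window: (timestamp, username, login succeeded?).
events = [
    (datetime(2024, 1, 1, 12, 0, s), "admin", False) for s in range(0, 50, 5)
]

def quantize(events):
    """Turn a window of login events into a numeric feature vector."""
    n = len(events)
    failures = sum(1 for _, _, ok in events if not ok)
    users = len({user for _, user, _ in events})
    span = (events[-1][0] - events[0][0]).total_seconds() or 1.0
    return [failures / n,   # failure rate: 1.0 under brute force
            users / n,      # distinct-user ratio
            n / span]       # attempts per second

print(quantize(events))  # [1.0, 0.1, ~0.22]: a classic brute-force signature
```

Vectors like this, labeled "attack" or "benign", are what the network would actually be trained on.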