How to combine an anomaly detection model with an object detection model - classification

Newbie here in deep learning. My question is:
I have an already-trained object detection model (YOLOv5) for 3 classes [0,1,2]. My next step is to classify one class, e.g. class [0], as anomalous or not. In other words, I need an additional classifier to further split it into two sub-classes, i.e. anomalous or non-anomalous, through the use of a classifier or anomaly detection model. Can you give me advice on how to proceed with this? I plan to use GANs as the anomaly detection model. This would be a great help. Thank you in advance.

One way to solve your problem is through One-Class Learning (OCL). In OCL, the algorithm learns from only one user-defined class of interest (in your case, non-anomalous objects of class 0) and classifies a new example as belonging to this class or not. Thus, you can adapt OCL algorithms to your problem: train the algorithm on labeled non-anomalous examples of class 0, and it will then answer whether new instances of class 0 are non-anomalous (the class of interest) or anomalous (the non-interest class). Examples of OCL algorithms can be found at: https://scikit-learn.org/stable/modules/outlier_detection.html. Note that these are traditional OCL algorithms; there are also versions of OCL algorithms based on deep learning.
In addition, I'll provide you with an example of anomaly detection using the One-Class Support Vector Machine (OCSVM), one of the most traditional and well-known OCL algorithms.
from sklearn.svm import OneClassSVM as OCSVM

# Feature vectors of known non-anomalous class-0 examples (placeholder values).
normal_data_of_class_0 = [[0.1, 0.5], [0.2, 0.4], [0.15, 0.45]]
ocsvm = OCSVM(gamma='auto')
ocsvm.fit(normal_data_of_class_0)

# New class-0 examples to check (placeholder values).
data_of_class_0 = [[0.12, 0.48], [0.9, 0.1]]
y_pred = ocsvm.predict(data_of_class_0)  # +1 == normal (interest class), -1 == abnormal (outlier)
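
For completeness, a rough sketch of how the full pipeline could fit together, assuming you crop the class-0 detections from YOLOv5 and turn each crop into a feature vector. The feature extractor below is a placeholder; in practice, an embedding from a pretrained CNN is a better choice.

import numpy as np
from sklearn.svm import OneClassSVM

def extract_features(crop):
    # Placeholder: flatten the crop and scale pixel values to [0, 1].
    # Any fixed-length embedding (e.g. from a pretrained CNN) would do.
    return crop.flatten() / 255.0

# Suppose these are class-0 regions cropped from YOLOv5 detections,
# each resized beforehand to a fixed shape (here 32x32 grayscale).
normal_crops = [np.random.randint(0, 256, (32, 32)) for _ in range(50)]
features = np.stack([extract_features(c) for c in normal_crops])

ocsvm = OneClassSVM(gamma='auto')
ocsvm.fit(features)  # train on non-anomalous class-0 crops only

new_crop = np.random.randint(0, 256, (32, 32))  # a new class-0 detection
label = ocsvm.predict(extract_features(new_crop).reshape(1, -1))
print(label[0])  # +1 -> non-anomalous, -1 -> anomalous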

Related

What machine learning algorithm is used when creating a text classifier model with CreateML?

I'm creating a text classification model for sentiment analysis, and I would like to know which machine learning algorithm CreateML uses here.
You can set the algorithm used by the classifier, or specify the language you would like to classify, in the parameters of the initializer. See the developer documentation in Xcode under CreateML > MLTextClassifier > MLTextClassifier.ModelParameters:
init(validation: MLTextClassifier.ModelParameters.ValidationData, algorithm: MLTextClassifier.ModelAlgorithmType, language: NLLanguage?)
Under the algorithm parameter, set the algorithm you would like to use.
In the developer documentation, under the enum MLTextClassifier.ModelAlgorithmType, you will find what is available. For example, case crf(revision: Int?) is a conditional random field model algorithm.

Image Processing MLP - Detecting Classes

I've implemented an MLP that is able to detect handwritten digits. So far the algorithm can identify the digits 0 and 1, but when I implement a new class, i.e. 2, the algorithm is unable to learn it. At first I thought I had made a mistake in the implementation of the new class, so I decided to swap the new class with a previous one that worked; in other words, if class0 was 0 and the new class was 2, now class0 is 2 and the new class is 0. Surprisingly, the new class was detected with almost no error, but class0 had a huge error, which means the new class is properly implemented.
The MLP has two hidden layers with 20 units each, both nonlinear with a sigmoid activation function.
If I understand your question correctly: when you add a new class and train a model such as the neural network you trained here, the final layer must change, i.e. the number of neurons in the final layer has to grow to match the new number of classes (and the target encoding has to change accordingly).
If this is not done, it can be one of the reasons the new class is not detected.
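
As a minimal illustration of this point, here is a sketch using scikit-learn's MLPClassifier (rather than the custom MLP from the question, and with placeholder data): the size of the output layer is determined by the number of distinct classes in the training labels, so the network has to be rebuilt and refit when a class is added.

import numpy as np
from sklearn.neural_network import MLPClassifier

# Toy stand-in for digit features: 64-dimensional vectors (e.g. 8x8 images).
rng = np.random.default_rng(0)
X_two = rng.random((100, 64))
y_two = rng.integers(0, 2, 100)   # classes {0, 1}

# Two hidden layers of 20 sigmoid units, as in the question.
clf = MLPClassifier(hidden_layer_sizes=(20, 20), activation='logistic', max_iter=500)
clf.fit(X_two, y_two)
print(clf.n_outputs_)  # 1 output neuron suffices for the binary case

# Adding class 2 means refitting: the output layer is rebuilt for 3 classes.
X_three = rng.random((150, 64))
y_three = rng.integers(0, 3, 150)  # classes {0, 1, 2}
clf.fit(X_three, y_three)
print(clf.n_outputs_)  # now 3 outputs, one per class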

Multiclass classification in SVM

I have been working on "Script identification from bilingual documents".
I want to classify the pages/blocks as either English (class 1), Hindi (class 2), or Mixed, using libsvm in Matlab. The problem is that the training data I have consists only of samples corresponding to Hindi and English pages/blocks, with no mixed pages.
The test data I want to classify may also contain Mixed pages/blocks, and in that case I want them to be classified as "Mixed". I am planning to do this using confidence scores or probability values: if the probability of class 1 is greater than a threshold (say 0.8) and the probability of class 2 is less than a threshold (say 0.05), the block is classified as class 1, and vice versa for class 2. If neither condition is satisfied, I want to classify it as "Mixed".
The third return value from "libsvmpredict" is prob_values, and I was planning to use these prob_values to decide whether the test data is Hindi, English, or Mixed. However, in a few places I learned that "libsvmpredict" does not produce the actual probability values.
Is there any way to classify the test data into 3 classes (Hindi, English, Mixed) using training data consisting of only 2 classes in an SVM?
This is not the modus operandi for SVMs.
There is no way an SVM can predict a given class without knowing it, i.e. without knowing how to separate that class from all the other classes.
The function svmpredict() in LibSVM does return probability estimates, and the greater this value is, the more confident you can be in your prediction. But you cannot rely on such values to predict a third class when you have only two classes: svmpredict() will return as many decision values as there are classes.
You can go on with your thresholding system (which, again, is not SVM-based), but it will most likely fail or perform badly. Think about it: you have to set up two thresholds and combine them with a logical AND, so the chance of correctly classifying non-Mixed documents will drastically decrease.
My suggestion is: instead of wasting time setting up thresholds, with a high chance of bad performance, join some of these texts together or create some new files with some Hindi and some English lines, in order to add proper Mixed documents to your training data and train a standard 3-class SVM.
To create such files you can use Matlab as well, which has pretty decent file I/O functions such as fread(), fwrite(), fprintf(), fscanf(), importdata() and so on.
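
To make the suggested 3-class setup concrete, here is a minimal sketch in Python using scikit-learn's SVC instead of libsvm in Matlab. The feature vectors and the way Mixed examples are synthesized are placeholders; in practice you would extract real script features from the documents.

import numpy as np
from sklearn.svm import SVC

# Placeholder feature vectors for document blocks (e.g. script statistics).
rng = np.random.default_rng(42)
X_english = rng.normal(0.0, 1.0, (50, 10))
X_hindi = rng.normal(3.0, 1.0, (50, 10))
# Synthetic "Mixed" examples built by combining the two scripts,
# analogous to concatenating English and Hindi lines into one file.
X_mixed = 0.5 * (rng.normal(0.0, 1.0, (50, 10)) + rng.normal(3.0, 1.0, (50, 10)))

X = np.vstack([X_english, X_hindi, X_mixed])
y = np.array([1] * 50 + [2] * 50 + [3] * 50)  # 1=English, 2=Hindi, 3=Mixed

# probability=True enables Platt-scaled probability estimates,
# the scikit-learn counterpart of libsvm's '-b 1' option.
clf = SVC(kernel='rbf', probability=True)
clf.fit(X, y)

X_test = rng.normal(1.5, 1.0, (5, 10))  # placeholder test blocks
print(clf.predict(X_test))        # hard 3-class decisions
print(clf.predict_proba(X_test))  # one probability per class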

Bidirectional LSTM for Classification

I have searched for "how to implement a bidirectional LSTM network for a classification problem (say with the iris data)" and have not found any satisfying answer. In almost every case I came across a solution where a BLSTM is implemented for a sequence-prediction problem. My simple question is: how can I create a bidirectional network in pybrain? Whenever I try to build one I write
network = pybrain.BidirectionalNetwork()
My intention was to add modules later via pybrain.addInputModule() and so on. But of course it fails, as I am not specifying seqlen as in
n = BidirectionalNetwork(seqlen=20, inputsize=1,
                         hiddensize=5, symmetric=False)
What should seqlen be if I have 4 inputs, 3 outputs (as in the iris data), and 150 samples? Will it be 150? Things are not clear to me, as I have no example of a classification problem.

Why are so few features selected in this dataset by subset selection methods?

I have a classification dataset with 148 input features (20 of which are binary and the rest continuous on the range [0, 1]). The dataset has 66171 negative examples and only 71 positive examples.
The dataset (arff text file) can be downloaded from this dropbox link: https://dl.dropboxusercontent.com/u/26064635/SDataset.arff.
In the Weka suite, when I use CfsSubsetEval and GreedyStepwise (with setSearchBackwards() set to true and also false), the selected feature set contains only 2 features (i.e. 79 and 140)! It is probably needless to say that the classification performance with these two features is terribly bad.
Using ConsistencySubsetEval (in Weka as well) leads to the selection of ZERO features! When feature ranking methods are used instead and the best (e.g. 12) features are selected, much better classification performance is achieved.
I have two questions:
First, what is it about the dataset that leads to the selection of so few features? Is it because of the imbalance between the number of positive and negative examples?
Second, and more importantly, are there any other subset selection methods (in Matlab or otherwise) that I can try that may lead to the selection of more features?
Clearly, the class imbalance is not helping. You could try taking a subsample of the dataset for a better diagnostic. The SpreadSubsample filter lets you do that, stating the maximum admissible class imbalance, like 10:1, 3:1, or whatever you find appropriate.
For selection methods, you could first try dimensionality reduction methods, like PCA, in Weka.
But if the algorithms are selecting those small sets of features, they seem to be the most meaningful ones for your classification task.
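
As a sketch of both suggestions translated from Weka to Python with scikit-learn (the data arrays are placeholders standing in for the 148-feature dataset): subsample the majority class down to a fixed imbalance ratio, then either project with PCA or keep the best-ranked features.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, mutual_info_classif

rng = np.random.default_rng(0)
X = rng.random((2000, 148))          # placeholder for the 148 features
y = np.array([0] * 1980 + [1] * 20)  # heavy imbalance, as in the question

# Subsample the majority class down to a 10:1 ratio
# (the idea behind Weka's SpreadSubsample filter).
pos = np.flatnonzero(y == 1)
neg = rng.choice(np.flatnonzero(y == 0), size=10 * len(pos), replace=False)
keep = np.concatenate([pos, neg])
X_bal, y_bal = X[keep], y[keep]

# Option 1: project onto a handful of principal components.
X_pca = PCA(n_components=12).fit_transform(X_bal)

# Option 2: rank features and keep the best 12, instead of subset search.
selector = SelectKBest(mutual_info_classif, k=12).fit(X_bal, y_bal)
print(selector.get_support(indices=True))  # indices of the 12 selected features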