Adaboost algorithm and its usage in face detection - classification

I am trying to understand the AdaBoost algorithm, but I have some trouble with it. After reading about AdaBoost, I realized that it is a classification algorithm (somewhat like a neural network). But I could not work out how the weak classifiers are chosen (I think they are Haar-like features for face detection), or how the final strong classifier H can be used. I mean, if I find the alpha values and compute H, how do I turn it into a value (one or zero) for new images? Is there an example that describes this clearly? I found the plus-and-minus example that appears in most AdaBoost tutorials, but I did not understand exactly how each h_i is chosen or how to apply the same concept to face detection. I have read many papers and had many ideas, but so far my ideas are not well organized.
Thanks.

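AdaBoost is a classification algorithm. It uses weak classifiers (anything that gives more than 50% correct results on a two-class problem, i.e. better than random guessing) and finally combines them into one strong classifier.
Training runs in rounds. In each round every training sample carries a weight, and the weak classifier with the lowest weighted error is chosen as h(i); in Viola-Jones face detection, each candidate weak classifier is a threshold on a single Haar-like feature. The chosen classifier gets a weight alpha(i) that is larger the smaller its error is, and the misclassified samples are given more weight so that the next round concentrates on them.
With weak classifier outputs $h_i(x) \in \{-1, +1\}$, the final strong classifier is $H(x) = \operatorname{sign}\left(\sum_i \alpha_i h_i(x)\right)$. With outputs $h_i(x) \in \{0, 1\}$, as in Viola-Jones, the equivalent rule is $H(x) = 1$ if $\sum_i \alpha_i h_i(x) \ge \frac{1}{2} \sum_i \alpha_i$ and $H(x) = 0$ otherwise.
So H is a weighted vote of the weak classifiers: for a new input (not seen before), we apply each weak classifier h(i), multiply its output by the alpha(i) found during training, add everything up, and threshold the sum to get one or zero.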
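A minimal sketch of the training loop and the final vote, in Python. It assumes labels in {-1, +1} and uses one-feature threshold stumps as the weak classifiers; in a Viola-Jones detector, each feature value would instead be a Haar-like feature computed on an image window. The six-point dataset is a hypothetical toy example in which no single stump is correct on its own.

```python
import math

def stump_predict(x, j, t, polarity):
    # weak classifier h(x): +polarity if feature j is above threshold t, else -polarity
    return polarity if x[j] > t else -polarity

def train_stump(X, y, w):
    """Choose the (feature, threshold, polarity) stump with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(x[j] for x in X))
        # candidate thresholds: below every value, then midpoints between neighbours
        thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
        for t in thresholds:
            for polarity in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(xi, j, t, polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, polarity)
    return best

def adaboost(X, y, rounds):
    n = len(X)
    w = [1.0 / n] * n          # start with uniform sample weights
    ensemble = []              # list of (alpha, feature, threshold, polarity)
    for _ in range(rounds):
        err, j, t, polarity = train_stump(X, y, w)
        if err >= 0.5:
            break              # nothing better than chance is left
        err = max(err, 1e-10)  # guard against log(0) for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, j, t, polarity))
        # misclassified samples (y * h = -1) gain weight, correct ones lose it
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, j, t, polarity))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def strong_classify(ensemble, x):
    # H(x) = sign(sum_i alpha_i * h_i(x)), returned as +1 or -1
    score = sum(alpha * stump_predict(x, j, t, p) for alpha, j, t, p in ensemble)
    return 1 if score >= 0 else -1

# hypothetical toy data: positives at both ends, negatives in the middle
X = [[1], [2], [3], [4], [5], [6]]
y = [1, 1, -1, -1, 1, 1]
ensemble = adaboost(X, y, rounds=3)
predictions = [strong_classify(ensemble, x) for x in X]
```

On this toy data the three selected stumps are a constant "+1" vote, "+1 when x <= 2.5" and "+1 when x > 4.5". None of them is perfect alone, but their weighted vote classifies all six points correctly.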
For more detail, see the AdaBoost chapter of the book "The Top Ten Algorithms in Data Mining", which can be found on the gigapeida.com website.

Related

What is the Difference between evolutionary computing and classification?

I am looking for a comprehensive description. I couldn't find one by browsing, because the information on the web is scattered and the topic is not currently in my scope.
Comparing classification and evolutionary computing is comparing apples to oranges. Let me explain:
Classification is a type of problem, where the goal is to determine a label given some input. (A typical example: given pixel values, determine the image label.)
Evolutionary computing is a family of algorithms to solve different types of problems. They work with a "population" of candidates (imagine a set of different neural networks trying to solve a given problem). Somehow you evaluate how good each candidate is in the given task (typically using a "fitness function", but there are other methods). Then a new generation of candidates is produced, taking the best candidates from the previous generation as a model, and including mutations and cross-over (that is, introducing changes). Repeat until happy.
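The loop described above (population, fitness, selection, crossover, mutation, repeat) can be sketched as a small genetic algorithm in Python. This is a minimal illustration under assumed settings: candidates are bit strings, the fitness function is the toy "OneMax" problem (count the 1 bits), and the selection scheme (tournaments plus keeping the best candidate) is one common choice among many.

```python
import random

def onemax(bits):
    # fitness: number of 1 bits (toy problem; the optimum is all ones)
    return sum(bits)

def tournament(population, fitness, rng, size=3):
    # pick `size` random candidates and return the fittest one
    return max(rng.sample(population, size), key=fitness)

def crossover(a, b, rng):
    # one-point crossover: prefix from one parent, suffix from the other
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(bits, rng, rate):
    # flip each bit independently with probability `rate`
    return [1 - b if rng.random() < rate else b for b in bits]

def genetic_algorithm(n_bits=20, pop_size=30, generations=60, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        best = max(population, key=onemax)
        history.append(onemax(best))
        next_gen = [best]  # elitism: the best candidate survives unchanged
        while len(next_gen) < pop_size:
            p1 = tournament(population, onemax, rng)
            p2 = tournament(population, onemax, rng)
            next_gen.append(mutate(crossover(p1, p2, rng), rng, 1.0 / n_bits))
        population = next_gen
    best = max(population, key=onemax)
    history.append(onemax(best))
    return best, history

best, history = genetic_algorithm()
```

Because the best candidate is always carried over, the best fitness never decreases from one generation to the next; whether it reaches the optimum within 60 generations depends on the seed and the settings.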
Evolutionary computing can absolutely be used for classification! But it is also used in other ways. You may use evolutionary computing to create an artificial neural network controlling a robot (in this case, inputs are sensor values and outputs are commands for actuators). Or to create original content free of a given goal, as in Picbreeder.
Classification may be solved using evolutionary computation (maybe this is why you were confused in the first place), but other techniques are also common. You can use decision trees or, notably, deep learning (based on backpropagation).
Deep learning based on backpropagation may sound similar to evolutionary computation, but it is quite different. Here you have only one artificial neural network, and a clear rule (backpropagation) telling you which changes to introduce at every iteration.
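For contrast with the population-based loop, here is a minimal sketch of the "one network, one clear update rule" idea: a single sigmoid neuron trained by gradient descent on the logical AND function. With a single neuron, backpropagation reduces to this one gradient computation; the learning rate and epoch count are assumed values, not tuned ones.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_neuron(X, y, lr=1.0, epochs=2000):
    w = [0.0] * len(X[0])  # one network, deterministic start: no population, no randomness
    b = 0.0
    for _ in range(epochs):
        # forward pass on all samples
        preds = [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) for x in X]
        # gradient of the mean cross-entropy loss: (prediction - target) * input
        grad_w = [sum((p - t) * x[k] for p, t, x in zip(preds, y, X)) / len(X)
                  for k in range(len(w))]
        grad_b = sum(p - t for p, t in zip(preds, y)) / len(X)
        # the update rule: step every weight against its gradient
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 0, 1]  # logical AND
w, b = train_neuron(X, y)
outputs = [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) for x in X]
```

Each iteration the gradient says exactly which direction to change each weight, instead of generating and scoring many random variants.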
Hope this helps to complement other answers!
Classification algorithms and evolutionary computing are different approaches. However, they are related in some ways.
Classification algorithms aim to identify the class label of new instances. They are trained with some labeled instances. For example, digit recognition is a classification task.
Evolutionary algorithms are used to find the minimum or maximum of an optimization problem. They explore the solution space of the given problem stochastically. They can often find a good solution in reasonable time, but they are not guaranteed to find the global optimum for every problem.
In some classification approaches, evolutionary algorithms are used to find optimal values for the parameters.
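A minimal sketch of that idea, assuming a hypothetical one-feature dataset and a classifier with a single parameter (a decision threshold). The evolutionary loop mutates candidate thresholds with Gaussian noise and keeps the best ones, using training accuracy as the fitness function; in practice you would score candidates on held-out or cross-validated data instead.

```python
import random

# hypothetical toy data: one feature, class 1 when the value is 5 or more
X = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

def accuracy(threshold):
    # classifier with one parameter: predict class 1 when x > threshold
    return sum((x > threshold) == bool(y) for x, y in zip(X, Y)) / len(X)

def evolve_threshold(pop_size=10, offspring=20, generations=30, sigma=0.5, seed=1):
    rng = random.Random(seed)
    population = [rng.uniform(0, 10) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # mutation: each child is a randomly chosen parent plus Gaussian noise
        children = [rng.choice(population) + rng.gauss(0, sigma) for _ in range(offspring)]
        # (mu + lambda) selection: keep the best of parents and children together
        population = sorted(population + children, key=accuracy, reverse=True)[:pop_size]
        history.append(accuracy(population[0]))
    return population[0], history

best_threshold, history = evolve_threshold()
```

Keeping parents in the selection pool means the best accuracy found so far can never be lost.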

Difference between function approximator and optimization algorithm?

I just started learning about artificial neural networks and genetic algorithms and found that the difference between them is that ANN is a function approximator and that GA is an optimization algorithm (according to SO). Problem is I am not 100% sure where and how to draw the line between these definitions; is there a simpler way to explain what the difference is using e.g. analogies (assume I am a 10 year old)? What I found particularly confusing is that both types seem to be able to solve the same problem in some cases (e.g. Traveling Salesman Problem).
ANNs approximate an unknown function that maps inputs to outputs. The goal of an ANN is to find the mathematical relation between the two: when a new input is presented, the model found by the net gives an approximation of the true value. Example: find the pressure of a gas in a tube given as input the temperature, viscosity, density, cross-section of the tube, etc., using a set of measurements for training.
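The "function approximator" half of the distinction can be sketched with the simplest possible approximator, a straight line fitted by least squares, standing in for an ANN: fit parameters to measured input/output pairs, then predict the output for an unseen input. The data below is hypothetical and exactly linear so the fitted parameters are easy to check.

```python
def fit_line(xs, ys):
    """Least-squares fit of y ~ a*x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

# hypothetical measurements (inputs -> outputs) generated by y = 2x + 1
inputs = [0, 1, 2, 3, 4]
outputs = [1, 3, 5, 7, 9]
a, b = fit_line(inputs, outputs)

def predict(x):
    # the learned approximation, applied to a new input
    return a * x + b
```

An ANN does the same thing with a far more flexible family of functions and an iterative fitting procedure instead of a closed-form solution.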
GAs are often used to find the maximum or minimum of a function (optimization). For example: find the optimal net (the one with the lowest error) for my previous example by evolving a population of nets, or solve the traveling salesman problem (given a set of cities, find the shortest route that visits each city exactly once).
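The traveling salesman example can be sketched as a small genetic algorithm over permutations. This is a minimal illustration under assumed settings: truncation selection (the better half survives), an order-based crossover that always yields a valid tour, and an occasional swap of two cities as mutation. The six city coordinates are hypothetical; with so few cities, brute force is feasible and lets you check the GA's answer.

```python
import itertools
import math
import random

def tour_length(tour, cities):
    # closed tour: return to the starting city at the end
    return sum(math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def order_crossover(p1, p2, rng):
    # copy a slice from parent 1, fill the other positions with the
    # missing cities in the order they appear in parent 2
    i, j = sorted(rng.sample(range(len(p1)), 2))
    middle = p1[i:j]
    rest = [c for c in p2 if c not in middle]
    return rest[:i] + middle + rest[i:]

def swap_mutation(tour, rng, rate=0.2):
    tour = tour[:]
    if rng.random() < rate:
        a, b = rng.sample(range(len(tour)), 2)
        tour[a], tour[b] = tour[b], tour[a]
    return tour

def ga_tsp(cities, pop_size=40, generations=200, seed=0):
    rng = random.Random(seed)
    n = len(cities)

    def length(tour):
        return tour_length(tour, cities)

    population = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=length)
        parents = population[: pop_size // 2]  # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            children.append(swap_mutation(order_crossover(p1, p2, rng), rng))
        population = parents + children
    return min(population, key=length)

def brute_force(cities):
    # fix city 0 as the start and try every ordering of the rest
    return min(tour_length([0, *perm], cities)
               for perm in itertools.permutations(range(1, len(cities))))

CITIES = [(0, 0), (1, 0), (2, 0), (2, 1), (1, 1), (0, 1)]  # hypothetical coordinates
best_tour = ga_tsp(CITIES)
```

Unlike the function approximator, nothing here is learned from examples: the GA only searches for the input (a tour) that minimizes a known objective.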

How to combine two classification models in MATLAB?

I am trying to detect faces using MATLAB's built-in Viola-Jones face detector. Is there any way I can combine two classification models, such as "FrontalFaceCART" and "ProfileFace", into one in order to get a better result?
Thank you.
You can't merge two models into one. That makes no sense in any classification task, since every classifier is different (it works differently, i.e. there is a different algorithm behind it, and it may also have been trained differently).
According to the MATLAB documentation for the classification models, your two classifiers work as follows:
FrontalFaceCART is a model composed of weak classifiers, based on classification and regression tree analysis
ProfileFace is composed of weak classifiers, based on a decision stump
More information can be found in the documentation, but you can easily see that their inner behaviour is rather different, so you can't mix or combine them.
It's like (in machine learning) mixing a Support Vector Machine with a k-Nearest Neighbours classifier: the former uses separating hyperplanes, whereas the latter is simply based on distances.
You can, however, train several models in parallel (i.e. independently) and choose the model that suits you best (e.g. lowest error rate or highest accuracy): you create as many different classifiers as you like, give them the same training set, evaluate the accuracy (and/or other metrics) of each, and choose the best model.
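The train-several-and-pick-the-best procedure can be sketched in Python as follows. The two candidate models (nearest centroid and 1-nearest-neighbour) and the XOR-shaped toy data are hypothetical stand-ins chosen only to show the selection step: every model is trained on the same data and scored on the same held-out validation set.

```python
def nearest_centroid(train_X, train_y):
    # model 1: predict the class whose mean point is closest
    centroids = {}
    for label in sorted(set(train_y)):
        pts = [x for x, y in zip(train_X, train_y) if y == label]
        centroids[label] = [sum(col) / len(pts) for col in zip(*pts)]

    def predict(x):
        return min(centroids,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))
    return predict

def one_nn(train_X, train_y):
    # model 2: predict the label of the single closest training point
    def predict(x):
        i = min(range(len(train_X)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(x, train_X[k])))
        return train_y[i]
    return predict

def select_best(trainers, train_X, train_y, val_X, val_y):
    scores = {}
    for name, trainer in trainers.items():
        model = trainer(train_X, train_y)  # every model sees the same training set
        scores[name] = sum(model(x) == y for x, y in zip(val_X, val_y)) / len(val_y)
    best = max(scores, key=scores.get)
    return best, scores

# hypothetical XOR-shaped data
train_X = [(0, 0), (1, 1), (0, 1), (1, 0)]
train_y = ["a", "a", "b", "b"]
val_X = [(0.1, 0.1), (0.9, 0.9), (0.1, 0.9), (0.9, 0.1)]
val_y = ["a", "a", "b", "b"]
best, scores = select_best({"nearest centroid": nearest_centroid, "1-NN": one_nn},
                           train_X, train_y, val_X, val_y)
```

Here both class centroids coincide, so the nearest-centroid model is no better than guessing, while 1-NN gets every validation point right and is selected.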
One option is to make a hierarchical classifier. So in a first step you use the frontal face classifier (assuming that most pictures are frontal faces). If the classifier fails, you try with the profile classifier.
I did that with a dataset of faces and it improved my overall classification accuracy. Furthermore, if you have some a priori information, you can use it. In my case, the faces were usually in the upper-middle part of the picture.
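The hierarchical fallback and the location prior can be sketched in Python as below. The detector functions are hypothetical placeholders: in MATLAB they would correspond to two vision.CascadeObjectDetector objects created with the 'FrontalFaceCART' and 'ProfileFace' models, and the "upper-middle" prior uses assumed image dimensions.

```python
def hierarchical_detect(image, detectors, prior=None):
    """Try each detector in order; return the first non-empty list of boxes.

    detectors: list of (name, detect_fn) pairs; detect_fn(image) returns a
    list of (x, y, w, h) boxes. prior: optional predicate that rejects boxes
    in implausible places before deciding whether a detector "failed".
    """
    for name, detect in detectors:
        boxes = detect(image)
        if prior is not None:
            boxes = [b for b in boxes if prior(b)]
        if boxes:
            return name, boxes
    return None, []

def upper_middle(box, width=640, height=480):
    # hypothetical prior: box centre in the middle half horizontally, upper half vertically
    x, y, w, h = box
    cx, cy = x + w / 2, y + h / 2
    return width / 4 <= cx <= 3 * width / 4 and cy <= height / 2

# hypothetical stand-ins for the real frontal and profile detectors
def fake_frontal(image):
    return [(300, 400, 50, 50)]  # a detection near the bottom edge

def fake_profile(image):
    return [(300, 100, 80, 80)]

name, boxes = hierarchical_detect(None, [("frontal", fake_frontal), ("profile", fake_profile)],
                                  prior=upper_middle)
```

In this example the frontal detection is discarded by the prior, so the profile detector is tried next and its box is returned.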
To improve your performance further, beyond the two MATLAB classifiers you are using, you would need to change your technique (and probably your programming language). At the time of writing, the best method was FaceNet.

Can KNN be better than other classifiers?

As is well known, some classifiers, such as SVM or Random Forest, have a training (learning) step. KNN, on the other hand, does not.
Can KNN be better than these classifiers?
If no, why?
If yes, when, how and why?
The short answer is yes, it can, as a consequence of the no free lunch (NFL) theorem. The NFL theorem can be loosely stated (in terms of classification) as:
There is no universal classifier that is consistently better than all others at every task.
It can also be (not very strictly) inverted:
For each (well defined) classifier there exists a dataset where it is the best one
In particular, kNN is a well-defined classifier; moreover, it is universally consistent (provided k grows with the number of training points n so that k goes to infinity while k/n goes to 0), which means that, given infinitely many training points, it converges to the optimal Bayes classifier for any distribution.
So can it be better than SVM or RF? Obviously! When? There is no clear answer. First of all, in supervised learning you often get just one training set and try to fit the best model to it. In such a scenario any model can turn out to be the best one. When statisticians and theoretical ML researchers try to answer whether one model is better than another, they actually ask "what would happen if we had infinitely many training sets", so they look at the expected behaviour of the classifiers. In that setting, we can often show that SVM/RF is better than KNN. But that does not mean they are always better. It only means that for a randomly selected dataset you should expect KNN to work worse, and this is only a statement about probabilities. Just as you can always win the lottery (no matter the odds!), you can also always win with KNN (to be clear, KNN has much better chances of being a good model than you have of winning the lottery :-)).
What are particular examples? Let us consider, for example, a rotated XOR problem.
If the true decision boundaries are those of a rotated XOR (the original answer showed them in a figure that is not reproduced here) and you have only the four points that define the XOR pattern, then 1NN will clearly be much better than SVM (with a dot, poly or rbf kernel) or RF. This should also remain true as you include more and more training points.
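The four-point case can be checked directly with a few lines of Python. The coordinates are an assumption, since the original figure is not reproduced: class "A" sits on the x axis and class "B" on the y axis, so the true boundaries are the diagonals y = x and y = -x. With only these four points, the 1NN decision regions coincide exactly with those diagonals, because the perpendicular bisector between an "A" point and a neighbouring "B" point is a diagonal.

```python
import math

# assumed rotated-XOR layout: class "A" on the x axis, class "B" on the y axis
train = [((1, 0), "A"), ((-1, 0), "A"), ((0, 1), "B"), ((0, -1), "B")]

def one_nn(point):
    # label of the closest training point (Euclidean distance)
    return min(train, key=lambda item: math.dist(point, item[0]))[1]

# points well away from the training set still land on the correct side
queries = [(2, 0.5), (-3, 1), (0.3, -2), (0.5, 3)]
labels = [one_nn(q) for q in queries]
```

An axis-aligned split, as used by the trees in a random forest, cannot reproduce a diagonal boundary from four points, which is the intuition behind the claim above.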
"In general, kNN would not be expected to exceed SVM or RF. When kNN does, that says something very interesting about the training data. If many doublets are present in the data set, a nearest neighbor algorithm works very well."
I heard an argument along these lines from Claudia Perlich in this podcast:
http://www.thetalkingmachines.com/blog/2015/6/18/working-with-data-and-machine-learning-in-advertizing
My intuitive understanding of why RF and SVM are better than kNN in general: all algorithms basically assume some local similarity, such that very similar samples get classified alike. kNN can only choose the most similar samples by distance (or some other global kernel), so the samples that can influence a kNN prediction lie within a hypersphere for the Euclidean distance. RF and SVM can learn other definitions of locality, which may stretch far along some features and only a short way along others. These learned neighbourhoods can also take many shapes, and the shapes can differ throughout the feature space.

Bayesian Classifier

When using the Bayesian classifier in MATLAB, what's the best way to avoid overfitting and inaccuracies?
I am using 1000 samples at the moment for the training data of which 750 are "normal" and 250 are "anomalous" (of one specific kind).
Has anyone found a good percentage of data to use for training the classifier, or does each problem require a specific amount of training data? I would assume the latter, but I am struggling to figure out how to improve the accuracy and which method I could use. Any example would be appreciated.
Below is an example of what I am currently using:
training_data = data;
target_class = Book2(indX,:);
% 'diaglinear': discriminant analysis with a diagonal covariance estimate (naive-Bayes-like)
predicted = classify(test_data, training_data, target_class, 'diaglinear');
% NOTE: confusionmat needs the true labels of the rows that were classified.
% These only line up if test_data is the training data itself (resubstitution),
% which overestimates accuracy; otherwise pass the test set's true labels.
confusionmat(target_class, predicted)
% Display Results of Naive Bayes Classification
labels = target_class;
% find the unique class names (ignoring case) and each label's index into them
[uniqueNames, ~, idx] = unique(lower(labels));
% count the occurrences of each class
counts = accumarray(idx(:), 1);
% pretty printing
for i = 1:numel(uniqueNames)
    fprintf('%s: %d\n', uniqueNames{i}, counts(i));
end
% output matching data
dataSample = fulldata(indX, :)
This is an old question, but maybe someone arriving here from Google could still benefit from an answer. I've not used Naive Bayes with MATLAB, but I have experience in other environments and authored the Ruby nbayes gem. You've got at least a few questions in here, so let's unpack them.
Overfitting and Accuracy. Don't buy the hype -- Naive Bayes is definitely prone to overfitting, so make sure you use cross validation when measuring the validity of your classifier. I've found that good feature selection (e.g., removing useless terms/tokens) usually boosts accuracy and will also help to reduce overfitting. And, of course, more data never hurts (but may not help if you already have a lot).
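A minimal sketch of measuring a Naive Bayes classifier with k-fold cross-validation, written from scratch in Python so it is self-contained. It assumes Gaussian class-conditional features (Gaussian naive Bayes) and a small hypothetical two-cluster dataset; each fold's model is trained without the samples it is then scored on. In MATLAB, fitcnb together with crossval and kfoldLoss should cover the same ground, but check the documentation for your release.

```python
import math

def fit_gnb(X, y, var_eps=1e-9):
    """Gaussian naive Bayes: per class, a log prior plus a mean/variance per feature."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        means = [sum(col) / len(rows) for col in zip(*rows)]
        variances = [sum((v - m) ** 2 for v in col) / len(rows) + var_eps
                     for col, m in zip(zip(*rows), means)]
        model[label] = (math.log(len(rows) / len(X)), means, variances)
    return model

def predict_gnb(model, x):
    def log_posterior(label):
        log_prior, means, variances = model[label]
        # features are treated as independent given the class (the "naive" part)
        return log_prior + sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
                               for xi, m, v in zip(x, means, variances))
    return max(model, key=log_posterior)

def cross_val_accuracy(X, y, k=5):
    """k-fold cross-validation: each sample is predicted by a model that never saw it."""
    correct = 0
    for fold in range(k):
        train_idx = [i for i in range(len(X)) if i % k != fold]
        test_idx = [i for i in range(len(X)) if i % k == fold]
        model = fit_gnb([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct += sum(predict_gnb(model, X[i]) == y[i] for i in test_idx)
    return correct / len(X)

# hypothetical data: two well-separated clusters of 20 points each
X = ([(0.2 * (i % 5), 0.2 * (i // 5)) for i in range(20)]
     + [(5 + 0.2 * (i % 5), 5 + 0.2 * (i // 5)) for i in range(20)])
y = ["normal"] * 20 + ["anomalous"] * 20
cv_acc = cross_val_accuracy(X, y)
```

On real data, a large gap between training accuracy and the cross-validated accuracy is the usual sign of overfitting.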
Class imbalance issues. It looks like you are trying to classify new instances as either "normal" or "anomalous". In general, you want the balance of classes to match what exists in the real world (what you are modelling). If you choose not to, maybe because anomalous instances are too few, then make sure you manually set the prior distributions on the classes to their real value.
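To see why the prior matters, here is Bayes' rule for a single decision, sketched in Python. The likelihood values are hypothetical; the 0.75/0.25 prior mirrors the 750/250 split in the question. The same evidence leads to different decisions under different priors, which is why the priors should reflect the real-world class frequencies. In MATLAB, classify accepts the priors as an optional fifth argument.

```python
def posterior(likelihoods, priors):
    """Bayes' rule: P(class | x) is proportional to P(x | class) * P(class)."""
    unnormalized = {c: likelihoods[c] * priors[c] for c in likelihoods}
    total = sum(unnormalized.values())
    return {c: p / total for c, p in unnormalized.items()}

def decide(likelihoods, priors):
    post = posterior(likelihoods, priors)
    return max(post, key=post.get)

# hypothetical class-conditional likelihoods P(x | class) for one new sample
likelihoods = {"normal": 0.2, "anomalous": 0.3}
balanced = decide(likelihoods, {"normal": 0.5, "anomalous": 0.5})
skewed = decide(likelihoods, {"normal": 0.75, "anomalous": 0.25})
```

With equal priors the sample is labelled "anomalous" (0.10 vs 0.15 before normalization), but with the 0.75/0.25 priors it becomes "normal" (0.15 vs 0.075).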
For more detailed info, I highly recommend excerpts from the Stanford IR book:
http://nlp.stanford.edu/IR-book/html/htmledition/naive-bayes-text-classification-1.html