Perceptron model: Effect of the learning rate when all weights are initialized to zeros - neural-network

In the book Python Machine Learning (2nd edition), in the part about the weight initialization step of the perceptron, the author wrote:
I'm not an expert in trigonometry or algebra, so I don't really understand how the angle between two vectors being zero relates to the effect of the learning rate on the weights when they are all initialized to zero. Could you please help me connect the two points?
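One way to see the connection is to train the same perceptron twice with different learning rates. The sketch below is a minimal illustration, not the book's code: it assumes the usual perceptron rule (update = eta * (target - prediction), prediction thresholded at zero), and the toy data and the names train_perceptron and cosine are made up for the example. With all-zero initial weights, every weight is just a sum of eta-scaled updates, so changing eta only rescales the weight vector. Two vectors that differ only by a positive scale factor point in the same direction (the angle between them is zero, cosine 1), and the sign of the net input, hence every prediction, is unchanged.

```python
import math

def train_perceptron(X, y, eta, epochs=10):
    """Perceptron rule with the bias and all weights initialized to zero."""
    w = [0.0] * (len(X[0]) + 1)              # w[0] is the bias
    for _ in range(epochs):
        for xi, target in zip(X, y):
            net = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            pred = 1 if net >= 0.0 else -1
            update = eta * (target - pred)   # zero when the prediction is right
            w[0] += update
            for j, xj in enumerate(xi):
                w[j + 1] += update * xj
    return w

def cosine(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical, linearly separable toy data with labels in {-1, 1}.
X = [[2.0, 1.0], [3.0, 4.0], [-1.0, -2.0], [-3.0, -1.0], [1.0, -3.0], [-2.0, 3.0]]
y = [1, 1, -1, -1, -1, 1]

w_half = train_perceptron(X, y, eta=0.5)
w_one = train_perceptron(X, y, eta=1.0)

print(w_half, w_one)
print([2 * w for w in w_half] == w_one)   # same vector up to the factor eta
print(cosine(w_half, w_one))              # cosine of the angle between them
```

The learning rates 0.5 and 1.0 were picked so that the scaling is exact in floating point. If the initial weights were non-zero, the initial vector would not be scaled by eta, so different learning rates could lead to different directions and different predictions.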

Related

Bagging with knn as learners

I am struggling to understand why the MATLAB function fitcensemble does not allow creating an ensemble model that uses knn learners with bagging, but only with the random subspace method, which is more similar to the random forest approach.
I would like to use bagging in order to compare the bagging method using different types of learners (e.g., knn and trees).
I hope you will help me, thank you in advance,
Marta
Bagging is rarely used in conjunction with k-NN classifiers, as the decision surfaces are typically too stable and duplicated datapoints in the bootstrap sample do not shift the 'weight' as they do in many other models. Paraphrasing (1):
The probability that any single datapoint appears at least once in a bootstrap sample is ~0.632. Consider a simple 2-class 1-NN classifier bagged with N bootstrap samples. A test datapoint can change classification only if its nearest neighbour in the learning set is missing from at least half of the N bootstrap samples. The probability of this is the same as the probability of flipping a weighted coin with a 0.632 probability of heads N times and getting fewer than 0.5N heads. As N gets larger, this probability gets smaller and smaller. Similar logic holds for multiclass problems and k-NN.
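The coin-flip argument can be checked numerically. The following is a minimal sketch in Python (not MATLAB), assuming the large-sample value 1 - 1/e (about 0.632) for the probability of heads; flip_probability is a name made up for this example.

```python
from math import comb, exp

def flip_probability(n_samples, p=1.0 - exp(-1.0)):
    """P(fewer than n_samples/2 heads) for a coin with P(heads) = p."""
    return sum(
        comb(n_samples, k) * p ** k * (1.0 - p) ** (n_samples - k)
        for k in range(n_samples + 1)
        if k < n_samples / 2
    )

# The probability shrinks quickly as the number of bootstrap samples grows.
for n in (1, 11, 51, 101):
    print(n, flip_probability(n))
```

Odd values of N are used so that "fewer than 0.5N" has no tie case to worry about.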
If you want to create your own bagging models you can do it with bootstrp. bootstrp() can be called without a function like this:
% BootIndices holds one column of indices per bootstrap sample
[~, BootIndices] = bootstrp(N, [], Data);
BootSample = Data(BootIndices);
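For readers outside MATLAB, the same idea can be sketched in plain Python: draw index sets with replacement, then index the data with them. This is only an illustration of the resampling step, not a port of bootstrp; bootstrap_indices and the toy data are made up for the example.

```python
import random

def bootstrap_indices(n_points, n_samples, seed=0):
    """Return n_samples lists of n_points indices drawn with replacement."""
    rng = random.Random(seed)
    return [
        [rng.randrange(n_points) for _ in range(n_points)]
        for _ in range(n_samples)
    ]

data = [10 * i for i in range(1000)]            # hypothetical dataset
boot_indices = bootstrap_indices(len(data), n_samples=5)
boot_samples = [[data[i] for i in idx] for idx in boot_indices]

# Fraction of distinct original points in the first sample (close to 0.632).
unique_fraction = len(set(boot_indices[0])) / len(data)
print(unique_fraction)
```

Each entry of boot_samples can then be used to train one learner of the ensemble.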
(1) Breiman, Leo. "Bagging predictors." Machine Learning 24.2 (1996): 123-140. Chapter 6.4.

ANN: Perceptron performance in determining point positions

I'm getting started with machine learning and artificial neural networks, and I found the article The Nature of Code, which introduces artificial neural networks and the idea of a perceptron.
The article shows how to create a perceptron that can discriminate between points positioned above or below a line based on the function:
f(x) = 2x + 1
I developed my own perceptron in Swift and used Xcode Playgrounds to illustrate its performance.
The perceptron takes 3 inputs: x, y and bias (always 1). The weights of the 3 inputs are generated at random, and after some training they are adjusted.
This first graph shows the value of the first weight over the training iterations. As you can see, the value stabilizes at the end, which I take as evidence that the perceptron has learned to discriminate the points:
The second graph represents the function line and all the training points (selected at random). The green points are the ones that the perceptron predicted correctly, whereas the red ones are the wrong predictions:
As you can see, almost all of the red dots lie along the inverse function:
f(x) = -2x - 1
My question is why this line of error dots appears. I thought that at a certain point all the weights would stabilize and the perceptron's accuracy would reach 100%, but it never does. Is this because of a bug in my code, or do ANNs always have this small band of error?
Any explanation is welcome, although please keep in mind that I'm a newbie at ML and ANNs.
Thank you very much.

Beginners issue in polynomial curve fitting [Part 1]

I have just started learning modeling techniques based on regression models and was going through the MATLAB Curve Fitting Toolbox and SO. I have some fundamental doubts and am unable to proceed further. I have a single vector with k=100 data points, which I want to fit to an AR model, an MA model and an ARMA model in turn to see which is better suited. Starting with an AR(p) model of the form y(k+1)=a*y(k)+b*y(k-1), the command
coeff = polyfit(x,y,d)
will fit a polynomial of degree, say, d=1, with p coefficients indicating the order of the model (AR(p)). But I have just one set of data, which is the recording of the angular moment. So what goes in as the first parameter (x) of the function, i.e. what will x and y be? Also, what if the linear models are not good enough and I have to select nonlinear models? Can somebody please guide me, with code snippets, through the steps of fitting, checking for overfitting, residual calculation, etc.?
x is likely to be k (the index of y). The whole call is:
c = polyfit(1:length(y), y, d);
MATLAB has a Curve Fitting Toolbox. You could use its GUI to try different nonlinear fits and get some intuition.
If you want the steps, there's a great Coursera Machine Learning course. The beginning of the course covers linear regression, and I recommend spending at least a few hours on it.
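Note that polyfit against the sample index fits a polynomial trend in time, whereas the AR form in the question regresses each sample on the previous ones. As a minimal sketch of the latter (in Python rather than MATLAB, with synthetic noise-free data and made-up coefficients 0.5 and -0.3), the two coefficients of y(k+1) = a*y(k) + b*y(k-1) can be estimated by least squares from the 2x2 normal equations, and the residuals computed afterwards. The function name fit_ar2 is hypothetical.

```python
def fit_ar2(y):
    """Least-squares estimate of a, b in y[k+1] = a*y[k] + b*y[k-1]."""
    s11 = s12 = s22 = r1 = r2 = 0.0
    for k in range(1, len(y) - 1):
        s11 += y[k] * y[k]
        s12 += y[k] * y[k - 1]
        s22 += y[k - 1] * y[k - 1]
        r1 += y[k + 1] * y[k]
        r2 += y[k + 1] * y[k - 1]
    det = s11 * s22 - s12 * s12          # determinant of the normal matrix
    a = (r1 * s22 - r2 * s12) / det
    b = (r2 * s11 - r1 * s12) / det
    return a, b

# Synthetic series of 100 points generated by a known AR(2) recursion.
y = [1.0, 2.0]
for _ in range(98):
    y.append(0.5 * y[-1] - 0.3 * y[-2])

a_hat, b_hat = fit_ar2(y)
residuals = [
    y[k + 1] - (a_hat * y[k] + b_hat * y[k - 1]) for k in range(1, len(y) - 1)
]
print(a_hat, b_hat, max(abs(r) for r in residuals))
```

With real, noisy data the residuals will not vanish; comparing their size on data held out from the fit is one simple way to check for overfitting when trying higher orders.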

Pattern recognition in Neural Network using matlab simulation

I am new to neural networks in MATLAB. I want to create a neural network using a MATLAB simulation.
The simulation uses pattern recognition.
I am running on a windows XP platform.
For example, I have a set of waveforms of circular shape.
I have extracted out the poles.
These poles will teach my Neural Network that it is circular in shape, hence whenever I input another set of slightly different circular shape waveform, the Neural Network is able to distinguish between the shape.
Currently, I have extracted the poles of these 3 shapes, cylinder, circle and rectangle.
But I am clueless about how I should go about creating my neural network.
I'd recommend using a SOM (self-organizing map) for pattern recognition since it's really robust. There's also a SOM Toolbox for MATLAB you might be interested in. However, to make it learn waves while neglecting their offsets, you'd need to make some changes to the "similarity function". These changes will noticeably increase the SOM's training time, but if that's not a problem, keep reading.
For the SOM you'll have to sample your waves into constant-sized vectors, let's say:
sin x -> sin_vector = (a1, a2, a3, ..., aN)
cos x -> cos_vector = (b1, b2, b3, ..., bN)
Usually the similarity of "SOM vectors" is calculated with the Euclidean distance. The Euclidean distance between those two vectors is huge since they have a different offset, whereas in your case they should be considered similar, i.e. the distance should be small. So if you don't sample all the similar waves from the same starting point, they will be classified into different classes, which is probably a problem. However, the similarity of vectors in a SOM is calculated in order to find the BMU (best-matching unit) in the map and to pull the vectors of the BMU and its neighborhood towards the values of the given sample. So all you need to change is the way those vectors are compared and the way the vectors' values are pulled towards the sample, so that both are "offset-tolerant".
A slow but working solution is to first find the best offset index for each vector. The best offset index is the one that produces the smallest Euclidean distance to the sample. The node of the net with the smallest such distance is then the BMU. Then the vectors of the BMU and its neighborhood are pulled towards the given sample using the offset index calculated for each node just before. Everything else should work out of the box.
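The two modified steps can be sketched as follows. This is a rough illustration in plain Python, not code from any SOM toolbox: it assumes the offset is a circular shift of the sampled vector, and best_offset and pull_towards are names made up for the example.

```python
import math

def best_offset(sample, node):
    """Return (offset, distance) for the circular shift of node closest to sample."""
    n = len(sample)
    best = None
    for k in range(n):
        dist = math.sqrt(
            sum((sample[i] - node[(i + k) % n]) ** 2 for i in range(n))
        )
        if best is None or dist < best[1]:
            best = (k, dist)
    return best

def pull_towards(sample, node, offset, rate):
    """Move node towards sample, aligning the two with the given offset."""
    n = len(sample)
    updated = list(node)
    for i in range(n):
        j = (i + offset) % n
        updated[j] += rate * (sample[i] - node[j])
    return updated

N = 16
sin_vector = [math.sin(2 * math.pi * i / N) for i in range(N)]
cos_vector = [math.cos(2 * math.pi * i / N) for i in range(N)]

# Plain Euclidean distance is large, the offset-tolerant one is close to zero.
plain = math.sqrt(sum((c - s) ** 2 for c, s in zip(cos_vector, sin_vector)))
offset, dist = best_offset(cos_vector, sin_vector)
print(plain, offset, dist)
```

In a SOM, best_offset would be evaluated for every node to find the BMU, and pull_towards would be applied to the BMU and its neighbors with each node's own offset. Trying every shift makes each comparison N times more expensive, which is where the slowdown comes from.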
This solution is relatively slow but should work well. I'd recommend studying the concept of SOM thoroughly and then reading this post (and the angry comments) again :)
PLEASE comment if you know some mathematical solution that would be better than the one above!
You can try MATLAB's neural network pattern recognition tool, nprtool, as it is specialized for training and testing neural networks for pattern recognition.

Adaboost algorithm and its usage in face detection

I am trying to understand the AdaBoost algorithm, but I am having some trouble. After reading about AdaBoost, I realized that it is a classification algorithm (somewhat like a neural network). But I could not work out how the weak classifiers are chosen (I think they are Haar-like features for face detection) or how the final result H, the final strong classifier, can be used. I mean, if I have found the alpha values and computed H, how do I use it to get a value (one or zero) for new images? Is there an example that describes this clearly? I found the plus-and-minus example that appears in most AdaBoost tutorials, but I could not see exactly how each h_i is chosen or how to apply the same concept to face detection. I have read many papers and have many ideas, but so far my ideas are not well organized.
Thanks....
AdaBoost is a classification algorithm. It uses weak classifiers (anything that gives more than 50% correct results, i.e. better than random) and finally combines them into one strong classifier.
The training stage finds the alpha values, which are used to compute H (the final result).
H = Sigma(alpha(i)*h(i)), where h(i) is 1 or 0 for a two-class problem.
So H is a weighted sum of all the weak classifiers: when we have a new input (not seen before), we apply the weak classifiers h(i), multiply their outputs by the corresponding alphas from the training stage, and threshold the sum to get one or zero (in the Viola-Jones face detector, for example, the output is one when the sum is at least half the sum of the alphas).
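As a small illustration of how the trained alphas are used on a new input, here is a sketch in Python. It assumes weak classifier outputs of 0 or 1 and the Viola-Jones style threshold of half the sum of the alphas; the alpha values and the function name strong_classify are made up for the example and do not come from a real training run.

```python
def strong_classify(alphas, weak_outputs):
    """Combine 0/1 weak outputs into a 0/1 decision by a weighted vote."""
    score = sum(a * h for a, h in zip(alphas, weak_outputs))
    return 1 if score >= 0.5 * sum(alphas) else 0

alphas = [0.9, 0.5, 0.3]   # hypothetical values, as if produced by training

# The single most reliable weak classifier outvotes the other two.
print(strong_classify(alphas, [1, 0, 0]))   # prints 1
print(strong_classify(alphas, [0, 1, 1]))   # prints 0
```

For face detection, each weak output would come from thresholding one Haar-like feature computed on the image window being tested.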
For more clarification see the "The Top Ten Algorithms in Data Mining" book which can be found on gigapeida.com website.