Neural networks - input values - neural-network

I have a question that may be trivial, but it's not described anywhere I've looked. I'm studying neural networks, and everywhere I look there's some theory and some trivial example with some 0s and 1s as input. I'm wondering: do I have to put only one value as the input value for one neuron, or can it be a vector of, let's say, 3 values (an RGB colour, for example)?

The above answers are technically correct, but don't explain the simple truth: there is never a situation where you'd need to give a vector of numbers to a single neuron.
From a practical standpoint this is because (as one of the earlier solutions has shown) you can just have a neuron for each number in a vector and then have all of those be the input to a single neuron. This should get you your desired behavior after training, as the second layer neuron can effectively make use of the entire vector.
From a mathematical standpoint, any finite vector of bounded integers can be encoded as a single number (three 8-bit RGB channels, for instance, fit into one 24-bit integer). Thus, if you really don't want an extra layer of neurons, you could simply encode the RGB values into a single number and input that to the neuron. However, such an encoding would probably make most learning problems more difficult, so I doubt this solution would be worth it in most cases.
To summarize: artificial neural networks are used without giving a vector to an input unit, but lose no computational power because of this.
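As an illustration of encoding a vector as a single number, here is a minimal Python sketch that packs three 8-bit RGB channels into one integer and unpacks them again. The function names pack_rgb and unpack_rgb are made up for this example, and the 0..255 channel range is an assumption.

```python
def pack_rgb(r, g, b):
    """Pack three 8-bit channel values into one integer laid out as 0xRRGGBB."""
    for channel in (r, g, b):
        if not 0 <= channel <= 255:
            raise ValueError("each channel must be in 0..255")
    return (r << 16) | (g << 8) | b


def unpack_rgb(packed):
    """Recover the (r, g, b) tuple from a packed 0xRRGGBB integer."""
    return (packed >> 16) & 0xFF, (packed >> 8) & 0xFF, packed & 0xFF


colour = pack_rgb(255, 128, 0)
print(colour)              # 16744448
print(unpack_rgb(colour))  # (255, 128, 0)
```

Nearby packed values are not necessarily similar colours (a small change in red moves the number far more than a large change in blue), which is one way to see why such an encoding tends to make learning harder.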

When dealing with multi-dimensional data, I believe a two-layer neural network is said to give better results.
In your case:
R[0..1] => (N1)----\
                    \
G[0..1] => (N2)-----(N4) => Result[0..1]
                    /
B[0..1] => (N3)----/
As you can see, the N4 neurone can handle 3 entries.
The [0..1] interval is a convention but a good one imo. That way, you can easily code a set of generic neuron classes that can take an arbitrary number of entries (I had template C++ classes with the number of entries as template parameter personally). So you code the logic of your neurons once, then you toy with the structure of the network and/or combinations of functions within your neurons.

Generally, the input for a single neuron is a value between 0 and 1. That convention is not just for ease of implementation but because normalizing the input values to the same range ensures that each input carries similar weighting. (If you have some images with 8-bit color with pixel values between 0 and 255 and some images with 16-bit color with pixel values between 0 and 65535, you probably wouldn't want to favor the 16-bit images just because the numerical values are higher. Similarly, you will probably want your images to be the same dimensions.)
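A minimal Python sketch of that normalization, assuming unsigned integer pixel values whose maximum is 2^bit_depth - 1 (255 for 8-bit, 65535 for 16-bit). The helper name normalize_pixels is hypothetical.

```python
def normalize_pixels(pixels, bit_depth):
    """Scale unsigned integer pixel values into the [0, 1] range."""
    max_value = (1 << bit_depth) - 1  # 255 for 8-bit, 65535 for 16-bit
    return [p / max_value for p in pixels]


print(normalize_pixels([0, 255], 8))     # [0.0, 1.0]
print(normalize_pixels([0, 65535], 16))  # [0.0, 1.0]
```

After this step, the brightest pixel of either kind of image maps to 1.0, so neither bit depth dominates purely because of its numeric range.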
As far as using pixel values as inputs, it is very common to try to gather a higher level representation of the image than its pixels (more info). For example, given a 5 x 5 (normalized) gray scale image:
[1 1 1 1 1]
[0 0 1 0 0]
[0 0 1 0 0]
[0 0 1 0 0]
[0 0 1 0 0]
We could use the following feature matrices to help discover horizontal, vertical, and diagonal features of the image. See python haar face detection for more information.
[1 1]  [0 0]  [1 0]  [0 1]  [1 0]  [0 1]
[0 0], [1 1], [1 0], [0 1], [0 1], [1 0]
To build the input vector, v, for this image, take the first 2x2 feature matrix and "apply" it to the first position in the image: multiply the two element-wise and sum the products. Applying
[1 1]                               [1 1]
[0 0] (the first feature matrix) to [0 0] (the first position in the image)
will result in 2 because 1*1 + 1*1 + 0*0 + 0*0 = 2. Append 2 to the back of your input vector for this image. Then move this feature matrix to the next position, one to the right, and apply it again, adding the result to the input vector. Do this repeatedly for each position of the feature matrix and for each of the feature matrices. This will build your input vector for a single image. Be sure that you build the vectors in the same order for each image.
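The procedure above can be sketched in Python as follows. This is only an illustration under stated assumptions: the feature matrix moves one pixel at a time, left to right and then top to bottom, and the function names are invented for the example.

```python
IMAGE = [
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]

FEATURES = [
    [[1, 1], [0, 0]],
    [[0, 0], [1, 1]],
    [[1, 0], [1, 0]],
    [[0, 1], [0, 1]],
    [[1, 0], [0, 1]],
    [[0, 1], [1, 0]],
]


def apply_feature(image, feature, row, col):
    """Multiply the feature with the image patch at (row, col) and sum the products."""
    return sum(
        image[row + i][col + j] * feature[i][j]
        for i in range(len(feature))
        for j in range(len(feature[0]))
    )


def build_input_vector(image, features):
    """Slide every feature over every valid position, always in the same order."""
    vector = []
    for feature in features:
        f_rows, f_cols = len(feature), len(feature[0])
        for row in range(len(image) - f_rows + 1):
            for col in range(len(image[0]) - f_cols + 1):
                vector.append(apply_feature(image, feature, row, col))
    return vector


v = build_input_vector(IMAGE, FEATURES)
print(v[0])    # 2
print(len(v))  # 96
```

A 2x2 feature fits in 4 x 4 = 16 positions of a 5 x 5 image, so six features give 96 values per image.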
In this case the image is black and white, but with RGB values you could extend the algorithm to do the same computation but add 3 values to the input vector for each pixel--one for each color. This should provide you with one input vector per image and a single input to each neuron. The vectors will then need to be normalized before running through the network.

Normally a single neuron takes as its input multiple real numbers and outputs a real number, which typically is calculated as applying the sigmoid function to the sum of the real numbers (scaled, and then plus or minus a constant offset).
If you want to put in, say, two RGB vectors (2 x 3 reals), you need to decide how you want to combine the values. If you add all the elements together and apply the sigmoid function, it is equivalent to getting in six reals "flat". On the other hand, if you process the R elements, then the G elements, and the B elements, all individually (e.g. sum or subtract the pairs), you have in practice three independent neurons.
So in short, no, a single neuron does not take in vector values.
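A minimal Python sketch of such a neuron, assuming a plain weighted sum plus a bias passed through the sigmoid function. The weights and bias below are arbitrary illustrative values, not trained ones.

```python
import math


def sigmoid(x):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron(inputs, weights, bias):
    """Weighted sum of scalar inputs plus a bias, passed through the sigmoid."""
    if len(inputs) != len(weights):
        raise ValueError("need exactly one weight per input")
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(total)


# Two RGB vectors are fed in "flat", as six separate real inputs.
rgb_a = [0.2, 0.4, 0.6]
rgb_b = [0.1, 0.3, 0.5]
output = neuron(rgb_a + rgb_b, [0.5] * 6, bias=-1.0)
print(0.0 < output < 1.0)  # True
```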

Use the light wavelength, normalized to the visible spectrum, as the input.
There are some approximate equations on the net; search for "RGB to wavelength conversion".
Alternatively, use the HSL color model and extract the Hue component, possibly using Saturation and Lightness as well.
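For the HSL route, here is a small Python sketch using the standard library's colorsys module. It assumes 8-bit RGB input; note that colorsys returns the components in H, L, S order, so the hypothetical wrapper below reorders them.

```python
import colorsys


def rgb_to_hsl_inputs(r, g, b):
    """Convert 8-bit RGB to (hue, saturation, lightness), each in [0, 1]."""
    h, l, s = colorsys.rgb_to_hls(r / 255.0, g / 255.0, b / 255.0)
    return h, s, l


print(rgb_to_hsl_inputs(255, 0, 0))  # (0.0, 1.0, 0.5)
```

All three values already lie in [0, 1], so they can be fed to the network without further scaling.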

It can be whatever you want, as long as you write your inner function accordingly.
The examples you mention use [0;1] as their domain, but you can use R, R², or whatever you want, as long as the function you use in your neurons is defined on this domain.
In your case, you can define your functions on R³ to allow RGB values to be handled.
A trivial example : use (x1, y1, z1),(x2,y2,z2)->(ax1+x2,by1+y2,cz1+z2) as your function to transform two colors into one, a b and c being your learning coefs, which you will determine during the learning phase.
Very detailed information (including the answer to your question) is available on Wikipedia.

Related

Does my Neural Net Vector Input Size Need to Match the Output Size?

I’m trying to use a Neural Network for purposes of binary classification. It consists of three layers. The first layer has three input neurons, the hidden layer has two neurons, and the output layer has three neurons that output a binary value of 1 or 0. Actually the output is usually a floating point number, but it typically rounds up to a whole number.
If the network only outputs vectors of 3, then shouldn't my input vectors be the same size? Otherwise, for classification, how else do you map the output to the input?
I wrote the neural network in Excel using VBA based on the following article: https://www.analyticsvidhya.com/blog/2017/05/neural-network-from-scratch-in-python-and-r/
So far it works exactly as described in the article. I don’t have access to a machine learning library at the moment so I’ve chosen to give this a try.
For example:
If the output of the network is [n, n ,n], does that mean that my input data has to be [n, n, n] also?
From what I read in here: Neural net input/output
It seems that's the way it should be. I'm not entirely sure though.
Put simply:
For a regression task, your output usually has dimension [1] (if you predict a single value).
For a classification task, the output dimension should equal the number of classes you have (the outputs are probabilities, and they sum to 1).
So, there is no need to have equal dimensions of input and output. NN is just a projection of one dimension to another.
For example,
regression, where we predict house prices: the input is [1, 10] (10 features of the property) and the output is [1], the price
classification, where we predict a class (will be sold or not): the input is [1, 11] (same features + listed price) and the output is [1, 2] (the probabilities of class 0 (will not be sold) and class 1 (will be sold); for example, [1; 0], [0; 1] or [0.5; 0.5] and so on; it is binary classification)
Additionally, equal input and output dimensions do occur in more specific tasks, for example autoencoder models (where you represent your data in another dimension and then map it back to the original dimension).
Again, the output dimension is the size of the output for a single sample (a batch of one), not for the whole dataset.
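To make the dimension mismatch concrete, here is a hedged Python sketch of a single dense layer followed by softmax that maps 11 input features to 2 class probabilities. The weights are arbitrary placeholders rather than trained values, and the names dense and softmax are just for this example.

```python
import math


def dense(inputs, weights, biases):
    """One output per weight row: weighted sum of the inputs plus a bias."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


features = [0.5] * 11                  # 11 inputs (assumed already normalized)
weights = [[0.1] * 11, [-0.1] * 11]    # 2 rows -> 2 outputs
biases = [0.0, 0.0]

probs = softmax(dense(features, weights, biases))
print(len(features), len(probs))  # 11 2
```

The input has 11 values and the output has 2; the shape of the weight matrix (2 x 11) is what connects them.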

Plotting K-means results in Matlab

I have 3 sets of signals, each containing 4 distinct operational states, and I have to classify the states in each signal using K-means in Matlab. The classification is done after I have smoothened the original signal using a filter. My output should be a plot of the smoothened signal with each part of the signal in a different color to denote the different operational state.
I am very new to Matlab, and this is what I have for the classification part.
numClusters = 4;
idx_1 = kmeans([X_1 smoothY_1],numClusters,'Replicates', 5);
[numDataPoints,numDimensions] = size(smoothY_1);
Colors = hsv(numClusters);
for i = 1 : numDataPoints
    plot(X_1(i),smoothY_1(i),'.','Color',Colors(idx_1(i),:))
    hold on
end
I have a few questions.
1) It appears to me that the kmeans function in Matlab will return a set of arbitrary cluster index in every run. For example, running the code above on the same signal twice may give me the cluster index (for 10 data points) [4 4 2 2 2 1 1 3 3 3] and [2 2 1 1 1 4 4 3 3 3], resulting in arbitrary colors denoting each state. Ideally, I would like the indices to be (somewhat) ordered and the colors to be the same for corresponding states, so that it makes sense to say "Red means Operational State 1, blue means State 2, etc". How can I synchronize this?
I have 2 pictures to illustrate this.
Set 1 and 2 are two of the datasets. Each stage of the signal is in a different color. I would like, for example, the first segment to be red, second in cyan, third in green, fourth in purple.
2) I can't seem to plot the graph using the specifier '-'. There is no output when I tried to do that, so I'm forced to use '.', which isn't what I want. How can I plot a continuous curve here?
3) Right now, I'm running K-means independently on all 3 sets of data, so there's no concept of training/test datasets. I would like to use one dataset for training and the other 2 for testing, but I don't know how to do that using K-means in Matlab. How can I do that?
ETA: I noticed that my smoothed plots are all about half the heights of my plots of the original data, e.g. the highest point in my original signal is y = 22, while the highest point in my smoothed signal is y = 11, although the shape remains the same. Is this correct?
ETA2: I realized that it seems as if what the K-means clustering did was simply divide the graph into numClusters segments (based on X_1 values) and that's it. I've tried with different values of numClusters and each gave me equally divided segments. Surely this can't be right? For instance, isn't it more likely that the long segment after the biggest spike belong to the same cluster, rather than 3 clusters? Should I be using K-means at all?
For the first question:
You can reorder your vector with
[~,~,a] = unique(a,'stable');
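The effect of that relabelling (numbering clusters in order of first appearance) can be sketched in Python; this is an illustration of the idea rather than a translation of MATLAB's unique, and the function name is made up.

```python
def relabel_by_first_appearance(labels):
    """Renumber cluster labels 1, 2, 3, ... in the order they first occur."""
    mapping = {}
    relabelled = []
    for label in labels:
        if label not in mapping:
            mapping[label] = len(mapping) + 1
        relabelled.append(mapping[label])
    return relabelled


run_1 = [4, 4, 2, 2, 2, 1, 1, 3, 3, 3]
run_2 = [2, 2, 1, 1, 1, 4, 4, 3, 3, 3]
print(relabel_by_first_appearance(run_1))  # [1, 1, 2, 2, 2, 3, 3, 4, 4, 4]
print(relabel_by_first_appearance(run_2))  # [1, 1, 2, 2, 2, 3, 3, 4, 4, 4]
```

Both runs from the question end up with the same labels, so the same color can be assigned to the same segment each time.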
For the second question:
You can find all the information about the LineSpec here:
LineSpec
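If you don't add a LineSpec the default option is a continuous line, as you want. Note, however, that a line needs at least two points: your loop passes a single point to each plot call, so with '-' there is nothing to connect and nothing is drawn. Passing all the points of one segment to a single plot call should avoid this.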
For the third question:
I don't think you can train a k-means algorithm (because of how the method works) the way you could an SVM, but I'm waiting for an expert opinion.

Is my implementation of confusion matrix correct? Or is something else at fault here?

I have trained a multi class svm classifier with 5 classes, i.e. svm(1)...svm(5).
I then used 5 images that were not used during the training of these classifiers for testing.
These 5 images are then tested with their respective classifier. i.e. If 5 images were taken from class one they are tested against the same class.
predict = svmclassify(svm(i_t),test_features);
The predict produces a 5 by 1 vector showing the result.
-1
1
1
1
-1
I sum these and then insert it into a diagonal matrix.
Ideally it should be a diagonal matrix with 5 written diagonally when all images are correctly classified. But the result is very poor. I mean in some cases I am getting negative result. I just want to verify if this poor result is because my confusion matrix is not accurate or if I should use some other feature extractor.
Here is the code I wrote
svm_table = [];
for i_t = 1:numel(svm)
    test_folder = [Path_training folders(i_t).name '\']; %select writer
    feature_count = 1; %Initialize count for feature vector accumulation
    for j_t = 6:10 %these 5 images were not used for training
        [img,map] = imread([test_folder imlist(j_t).name]);
        test_img = imresize(img, [100 100]);
        test_img = imcomplement(test_img);
        %Features extracted here for each image.
        %The feature vector for each image is a 1 x 16 vector.
        test_features(feature_count,:) = Features_extracted;
        %The feature vectors are accumulated in a single matrix. Each row is an image
        feature_count = feature_count + 1; % increment the count
    end
    test_features(isnan(test_features)) = 0; %locate NaN and replace with 0
    %I was getting NaN in some images, which was causing problems with svm, so just replaced with 0
    predict = svmclassify(svm(i_t),test_features); %produce column vector of predictions
    svm_table(end+1,end+1) = sum(predict); %sum them and add to matrix diagonally
end
This is what I am getting. It looks like a confusion matrix, but the result is very poor.
-1 0 0 0 0
0 -1 0 0 0
0 0 3 0 0
0 0 0 1 0
0 0 0 0 1
So I just want to know what is at fault here: my implementation of the confusion matrix, my way of testing the SVM, or my selection of features.
I would like to add some issues:
You mention that: << These 5 images are then tested with their respective classifier. i.e. If 5 images were taken from class one they are tested against the same class. >>
You are never supposed to know the class (category) of test images. Of course, you need to know the test category labels for calculating various metrics such as accuracy, precision, confusion matrix etc. Apart from that, when you are using SVM to determine which class the example belongs to, you have to try all the SVMs.
There are two popular ways of training and testing multi-class SVMs, namely one-vs-all and one-vs-one approach. Read this answer and its corresponding question to understand them in detail.
I don't know if MATLAB SVM is capable of doing multiclass classification, but if you use LIBSVM then it uses the one-vs-one approach. It will also do the testing for you correctly. However, if you want to design your own one-vs-one classifier, this is how you should proceed:
Say you have 5 classes, then train all possible unordered pairs = 5C2 = 10 pairs ({1,2}, ..., {1,5}, {2,3}, ..., {2,5}, ..., {4,5}). While testing, you have to apply all 10 models and count the votes to decide the final result. For example, suppose we have models for 4 pairs, ({1 vs 2}, {1 vs 3}, {2 vs 1}, {2 vs 3}), and their outputs are {1,1,0,1} respectively, where 1 means the first class of the pair wins. Then the 4 predicted classes are {1,1,1,2}, so the final class is 1.
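The voting step can be sketched in Python as below. The pairwise "models" here are toy stand-ins (each simply picks whichever of its two class numbers is closer to a scalar sample); they are an assumption made purely so the example runs, not trained SVMs.

```python
from collections import Counter
from itertools import combinations


def make_toy_model(a, b):
    """Hypothetical stand-in for a trained binary classifier for classes a and b."""
    return lambda x: a if abs(x - a) <= abs(x - b) else b


def one_vs_one_predict(sample, pairwise_models):
    """Ask every pairwise model for its winner and return the most-voted class."""
    votes = Counter(model(sample) for model in pairwise_models.values())
    return votes.most_common(1)[0][0]


classes = [1, 2, 3, 4, 5]
pairs = list(combinations(classes, 2))  # unordered pairs only
models = {pair: make_toy_model(*pair) for pair in pairs}

print(len(pairs))                       # 10
print(one_vs_one_predict(2.9, models))  # 3
```

With real classifiers, each entry of models would be the binary SVM trained on the data of just those two classes.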
Once you get all the predicted labels, then you can actually use the command confusionmat to get the confusion matrix. If you want to make your own, then make a 5x5 matrix of zeros. Add a 1 to the position (actual label, predicted label) i.e. if the actual class was 2 and you predicted it as 3, then add 1 at the position (2nd row, 3rd col) in the matrix.
Several issues that I can see...
1) What you're using is not really a multi-class SVM. You're taking several different SVM models and applying them to the same test data (not really the same thing). You need to look at the documentation for svmtrain. When you use it you give it two kinds of data: the training data (parameter vectors for each training image) and the Group data (a vector of classes for the images associated with the vectors). What you get will be one SVM model which will decide between the options. (I usually use libsvm, so I'm not that familiar with MATLAB's SVM implementation, but that should be the gist of it.)
2) Your confusion matrix is derived incorrectly (see: http://en.wikipedia.org/wiki/Confusion_matrix). Start by making a 5x5 zeros matrix to hold the confusion matrix. Loop through each of your test images and let the SVM model classify the image (it should pick 1 of the five possibilities). Add 1 at the proper position of the confusion matrix. So if the image should classify as a 3 and the SVM classifies it as a 4 you should add 1 to the 3,4 position...
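The counting described above can be sketched in Python as follows, assuming class labels numbered 1 to 5. The label lists are made-up illustrative data, not results from the question.

```python
def confusion_matrix(actual, predicted, num_classes):
    """Rows are actual classes, columns are predicted classes (labels start at 1)."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for a, p in zip(actual, predicted):
        matrix[a - 1][p - 1] += 1
    return matrix


actual = [1, 2, 3, 4, 5, 3]      # hypothetical true labels
predicted = [1, 2, 4, 4, 5, 3]   # hypothetical classifier output

for row in confusion_matrix(actual, predicted, 5):
    print(row)
```

The one image of class 3 that was predicted as class 4 lands in row 3, column 4; correct predictions accumulate on the diagonal, and no entry can ever be negative.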

Picking objects from a vector with defined probabilities on matlab/octave

Is there any function on Matlab/Octave that randomly picks a value from a list accordingly to a given probability?
For example: we have the vector [1 3 7]. The function I am looking for should pick one of those numbers with probability .25 for 1, .35 for 3 and .4 for 7.
I am trying to implement it myself, but I'd like to know if there is some built-in function for the next time I need something like this.
You are looking for a function in the Statistics Toolbox called randsample. With weights, it samples k values out of n with replacement (weighted sampling without replacement is not supported). You want to select one value, which can be done as follows:
nSamplesToChoose=1;
weightVector=[0.2 0.5 0.3]; %weights sum to one so as to represent a probability distribution
yourArray=[5 6 7]; %length of the array should be the same as the length of weightVector
chosenSample=randsample(yourArray,nSamplesToChoose,true,weightVector)
P.S. I encourage you to implement this by yourself. You may refer to this question.
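For anyone implementing it by hand, one common approach is to compare a uniform random number against the cumulative probabilities. Below is a minimal Python sketch of that idea; the function name and the optional u parameter (which makes the draw reproducible) are assumptions for the example.

```python
import random
from itertools import accumulate


def pick_weighted(values, probs, u=None):
    """Return one value, chosen with the given probabilities.

    u is a number in [0, 1); if omitted, a fresh random one is drawn.
    """
    if len(values) != len(probs):
        raise ValueError("need exactly one probability per value")
    if u is None:
        u = random.random()
    for value, threshold in zip(values, accumulate(probs)):
        if u < threshold:
            return value
    return values[-1]  # guards against rounding in the cumulative sum


print(pick_weighted([1, 3, 7], [0.25, 0.35, 0.4], u=0.1))  # 1
print(pick_weighted([1, 3, 7], [0.25, 0.35, 0.4], u=0.3))  # 3
print(pick_weighted([1, 3, 7], [0.25, 0.35, 0.4], u=0.9))  # 7
```

In Python itself, random.choices(values, weights=probs, k=1) does the same job without a hand-written loop.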
What you described is like a Generalized Bernoulli Distribution. So, you can use the Multinomial Distribution to generate this data.
The MATLAB help page is here.
In your case, n=1 and p=[.25 .35 .4].
mnrnd(n,p)
will return a 1 x 3 vector which has only one non-zero element which corresponds to the random variable which should be chosen.
TL;DR Version:
To generate the required output, you can simply do dot([1 3 7], mnrnd(1,[.25 .35 .4]))

Matlab: how to find which variables from dataset could be discarded using PCA in matlab?

I am using PCA to find out which variables in my dataset are redundant due to being highly correlated with other variables. I am using the princomp MATLAB function on data previously normalized using zscore:
[coeff, PC, eigenvalues] = princomp(zscore(x))
I know that eigenvalues tell me how much variation of the dataset covers every principal component, and that coeff tells me how much of i-th original variable is in the j-th principal component (where i - rows, j - columns).
So I assumed that to find out which variables out of the original dataset are the most important and which are the least I should multiply the coeff matrix by eigenvalues - coeff values represent how much of every variable each component has and eigenvalues tell how important this component is.
So this is my full code:
[coeff, PC, eigenvalues] = princomp(zscore(x));
e = eigenvalues./sum(eigenvalues);
abs(coeff)/e
But this does not really show anything - I tried it on a following set, where variable 1 is fully correlated with variable 2 (v2 = v1 + 2):
v1 v2 v3
1 3 4
2 4 -1
4 6 9
3 5 -2
but the results of my calculations were following:
v1 0.5525
v2 0.5525
v3 0.5264
and this does not really show anything. I would expect the result for variable 2 show that it is far less important than v1 or v3.
Which of my assumptions is wrong?
EDIT I have completely reworked the answer now that I understand which assumptions were wrong.
Before explaining what doesn't work in the OP, let me make sure we have the same terminology. In principal component analysis, the goal is to obtain a coordinate transformation that separates the observations well, and that may make it easy to describe the data, i.e. the different multi-dimensional observations, in a lower-dimensional space. Observations are multidimensional when they're made up of multiple measurements. If there are fewer linearly independent observations than there are measurements, we expect at least one of the eigenvalues to be zero, because e.g. two linearly independent observation vectors in a 3D space can be described by a 2D plane.
If we have an array
x = [ 1 3 4
2 4 -1
4 6 9
3 5 -2];
that consists of four observations with three measurements each, princomp(x) will find the lower-dimensional space spanned by the four observations. Since there are two co-dependent measurements, one of the eigenvalues will be near zero, since the space of measurements is only 2D and not 3D, which is probably the result you wanted to find. Indeed, if you inspect the eigenvectors (coeff), you find that the first two components are clearly collinear:
coeff = princomp(x)
coeff =
0.10124 0.69982 0.70711
0.10124 0.69982 -0.70711
0.9897 -0.14317 1.1102e-16
Since the first two components are, in fact, pointing in opposite directions, the values of the first two components of the transformed observations are, on their own, meaningless: [1 1 25] is equivalent to [1000 1000 25].
Now, suppose we want to find out whether any measurements are linearly dependent, and we really want to use principal components for this (in real life, measurements may not be perfectly collinear, and we are interested in finding good vectors of descriptors for a machine-learning application). In that case it makes a lot more sense to consider the three measurements as "observations" and run princomp(x'). Since there are then only three "observations" but four "measurements", the fourth eigenvalue will be zero. However, since two of the observations are linearly dependent, we're left with only two non-zero eigenvalues:
eigenvalues =
24.263
3.7368
0
0
To find out which of the measurements are so highly correlated (not actually necessary if you use the eigenvector-transformed measurements as input for e.g. machine learning), the best way would be to look at the correlation between the measurements:
corr(x)
ans =
1 1 0.35675
1 1 0.35675
0.35675 0.35675 1
Unsurprisingly, each measurement is perfectly correlated with itself, and v1 is perfectly correlated with v2.
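The same check can be reproduced outside MATLAB. Here is a small Python sketch that computes the Pearson correlation for the columns of the example data; the helper name pearson is made up for the illustration.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


v1 = [1, 2, 4, 3]
v2 = [3, 4, 6, 5]   # v2 = v1 + 2
v3 = [4, -1, 9, -2]

print(round(pearson(v1, v2), 5))  # 1.0
print(round(pearson(v1, v3), 5))  # 0.35675
```

A correlation of 1 between v1 and v2 flags them as redundant, but it does not say which of the two to drop; either one carries the same information.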
EDIT2
but the eigenvalues tell us which vectors in the new space are most important (cover the most of variation) and also coefficients tell us how much of each variable is in each component. so I assume we can use this data to find out which of the original variables hold the most of variance and thus are most important (and get rid of those that represent small amount)
This works if your observations show very little variance in one measurement variable (e.g. where x = [1 2 3;1 4 22;1 25 -25;1 11 100];, and thus the first variable contributes nothing to the variance). However, with collinear measurements, both vectors hold equivalent information, and contribute equally to the variance. Thus, the eigenvectors (coefficients) are likely to be similar to one another.
In order for @agnieszka's comments to keep making sense, I have left the original points 1-4 of my answer below. Note that #3 was in response to the division of the eigenvectors by the eigenvalues, which to me didn't make a lot of sense.
1. The vectors should be in rows, not columns (each vector is an observation).
2. coeff returns the basis vectors of the principal components, and its order has little to do with the original input.
3. To see the importance of the principal components, you use eigenvalues/sum(eigenvalues).
4. If you have two collinear vectors, you can't say that the first is important and the second isn't. How do you know that it shouldn't be the other way around? If you want to test for collinearity, you should check the rank of the array instead, or call unique on normalized (i.e. norm equal to 1) vectors.