I have a node representing a random variable whith 3314 realizations and 49 dimensions each, can it be treated as a discrete variable? Each realization is a binary vector of 49 dimensions, the other nodes are observable therefore the network training is performed with learn_params. This function transforms data as follows:
local_data = data(fam, :);
if iscell(local_data)
local_data = cell2num(local_data);
end
When this transformation takes place, the data lose their structure and become just an array of 162386 = (49 * 3314) so every vector of 49 dimensions cannot be seen as an embodiment of the variable, now will be 162386 realizations and the training on network node will not be correct.
I want to know if it is really possible (right?) take this variable as discrete and what training alternative I have for not modifying the data structure.
Related
Hi I am using tensorflow at my university to try to classify steering angles of a simulation program using only the images the simulation produces.
The Steering angles are values from -1 to 1 and I separated them into 50 "buckets". So the first value of my prediction vector would mean that the predicted steering angle is between -1 and -0.96.
The following shows the classification and optimization functions I am using.
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(prediction, y))
optimizer = tf.train.AdamOptimizer(0.001).minimize(cost)
y is a vector that with 49 zeros and a single 1 for the correct bucket. My question now is.
How do I take into account if e.g. the correct bucket is at index 25, that the a prediction of 26 is much better than a prediction of 48.
I didn't post the actual network since it is just a couple of conv2d and maxpool layers with a fully connected layer at the end.
Since you are applying Cross entropy or negative log likelihood. you are penalizing the system given the predicted output and the ground truth.
So saying that your system predicted different numbers on your 50 classes output and the highest one was the class number 25 but your ground truth is class 26. So your system will take the value predicted on 26 and adapt the parameters to produce the highest number on this output the next time it sees this input.
You could do two basic things:
Change your y and prediction to be scalars in the range -1..1; make the loss function be (y-prediction)**2 or something. A very different model, but perhaps more reasonable that the one-hot.
Keep the one-hot target and loss, but have y = target*w, where w is a constant matrix, mostly zeros, 1s on the diagonal, and smaller values on the next diagonal, elements (e.g. y(i) = target(i) * 1. + target(i-1) * .5 + target(i+1) * .5 + ...); kind of gross, but it should converge to something reasonable.
I am trying to train a linear SVM on a data which has 100 dimensions. I have 80 instances for training. I train the SVM using fitcsvm function in MATLAB and check the function using predict on the training data. When I classify the training data with the SVM all the data points are being classified into only one class.
SVM = fitcsvm(votes,b,'ClassNames',unique(b)');
predict(SVM,votes);
This gives outputs as all 0's which corresponds to 0th class. b contains 1's and 0's indicating the class to which each data point belongs.
The data used, i.e. matrix votes and vector b are given the following link
Make sure you use a non-linear kernel, such as a gaussian kernel and that the parameters of the kernel are tweaked. Just as a starting point:
SVM = fitcsvm(votes,b,'KernelFunction','RBF', 'KernelScale','auto');
bp = predict(SVM,votes);
that said you should split your set in a training set and a testing set, otherwise you risk overfitting
I'm trying to learn neural net that is 289x300x1. E.g. input vector is 289 elements, 300 hidden neurons, 1 class-output.
So
net = feedforwardnet(300);
net = train(net,X,y,'useParallel','yes','showResources','yes');
gives error
Error using nn7/perfsJEJJ>calc_Y_trainPerfJeJJ (line 37) Error
detected on worker 2. Requested 87301x87301 (56.8GB) array exceeds
maximum array size preference.
X is an array of size 289x2040, type of elements is double.
y is an array of size 1x2040, type of elemetns is double.
I dont understand why matlab wants so much of memory for such small task. Weights need to be stored = 289 * 300 * 64 bytes which is ~5.5 MB.
And how to solve it.
It is probably due to a combination of a few things:
The number of neurons into your hidden layer is rather large.... are you sure 300 features / neurons is what you need? Consider breaking down the problem to fewer features... a dimensionality reduction may be fruitful, but I'm just speculating. However, from what I know, a neural network of 300 hidden neurons should be fine from experience, but I just brought this point up because that hidden neuron size is rather large.
You have too many inputs going in for training. You have 2040 points going in and that's perhaps why it's breaking. Try breaking up the dataset into chunks of a given size, then incrementally train the network for each chunk.
Let's assume that point #1 you can't fix, but you can address point #2, something like this comes to mind:
chunk_size = 200; %// Declare chunk size
num_chunks = ceil(size(X,2)/chunk_size); %// Get total number of chunks
net = feedforwardnet(300); %// Initialize NN
%// For each chunk, extract out a section of the data, then train the
%// network. Retrain on original network until we run out of data to train
for ii = 1 : num_chunks
%// Ensure cap off if we get to the chunk at the end that isn't
%// evenly divisible by the chunk size
if ii*chunk_size > size(X,2)
max_val = size(X,2);
else
max_val = ii*chunk_size;
end
%// Specify portion of data to extract
interval = (ii-1)*chunk_size + 1 : max_val;
%// Train the NN on this data
net = train(net, X(:,interval), y(interval),'useParallel','yes','showResources','yes'));
end
As such, break up your data into chunks, train your neural network on each chunk separately and update the neural network as you go. You can do this because neural networks basically implement Stochastic Gradient Descent where the parameters are updated each time a new input sample is provided.
I have created an Auto Encoder Neural Network in MATLAB. I have quite large inputs at the first layer which I have to reconstruct through the network's output layer. I cannot use the large inputs as it is,so I convert it to between [0, 1] using sigmf function of MATLAB. It gives me a values of 1.000000 for all the large values. I have tried using setting the format but it does not help.
Is there a workaround to using large values with my auto encoder?
The process of convert your inputs to the range [0,1] is called normalization, however, as you noticed, the sigmf function is not adequate for this task. This link maybe is useful to you.
Suposse that your inputs are given by a matrix of N rows and M columns, where each row represent an input pattern and each column is a feature. If your first column is:
vec =
-0.1941
-2.1384
-0.8396
1.3546
-1.0722
Then you can convert it to the range [0,1] using:
%# get max and min
maxVec = max(vec);
minVec = min(vec);
%# normalize to -1...1
vecNormalized = ((vec-minVec)./(maxVec-minVec))
vecNormalized =
0.5566
0
0.3718
1.0000
0.3052
As #Dan indicates in the comments, another option is to standarize the data. The goal of this process is to scale the inputs to have mean 0 and a variance of 1. In this case, you need to substract the mean value of the column and divide by the standard deviation:
meanVec = mean(vec);
stdVec = std(vec);
vecStandarized = (vec-meanVec)./ stdVec
vecStandarized =
0.2981
-1.2121
-0.2032
1.5011
-0.3839
Before I give you my answer, let's think a bit about the rationale behind an auto-encoder (AE):
The purpose of auto-encoder is to learn, in an unsupervised manner, something about the underlying structure of the input data. How does AE achieves this goal? If it manages to reconstruct the input signal from its output signal (that is usually of lower dimension) it means that it did not lost information and it effectively managed to learn a more compact representation.
In most examples, it is assumed, for simplicity, that both input signal and output signal ranges in [0..1]. Therefore, the same non-linearity (sigmf) is applied both for obtaining the output signal and for reconstructing back the inputs from the outputs.
Something like
output = sigmf( W*input + b ); % compute output signal
reconstruct = sigmf( W'*output + b_prime ); % notice the different constant b_prime
Then the AE learning stage tries to minimize the training error || output - reconstruct ||.
However, who said the reconstruction non-linearity must be identical to the one used for computing the output?
In your case, the assumption that inputs ranges in [0..1] does not hold. Therefore, it seems that you need to use a different non-linearity for the reconstruction. You should pick one that agrees with the actual range of you inputs.
If, for example, your input ranges in (0..inf) you may consider using exp or ().^2 as the reconstruction non-linearity. You may use polynomials of various degrees, log or whatever function you think may fit the spread of your input data.
Disclaimer: I never actually encountered such a case and have not seen this type of solution in literature. However, I believe it makes sense and at least worth trying.
I am experimenting with Matlab, set up a Narx Neural Network with the input vector consisting of 2 values, each of them is delayed 30 times, than I have a hidden sigmoid layer with 40 neurons, another one with 15 and the output layer consisting of one value with a purelin function.
I try to transfer the network to the c/c++ lib fann, so I try to understand which layer does what.
netc.b{3} = 0.2302
and netc.LW{6} gives me a vector with 15 values. When I set the values to zero by
netc.LW{6} = zeros(1,15)
And feed the network with zeros by
out = netc(con2seq([zeros(1,40);zeros(1,40)]))
I would expect only the bias to show up at the output, but I get 40 times the value 311.7813. Setting the bias on the output layer to zero I get 40 times 255.5 as output. What do I get wrong?