Is it possible to rank features based on their importance using an autoencoder? - feature-selection

I am using an autoencoder for the first time. I have learned that it reduces the dimensionality of the input data set, but I am not sure what that actually means. Does it select some specific features from the input features? Is it possible to rank the features using an autoencoder?
My data looks like this:
age height weight working_hour rest_hour Diabetic
54 152 72 8 4 0
62 159 76 7 3 0
85 157 79 7 4 1
24 153 75 8 4 0
50 153 79 8 4 1
81 154 80 7 3 1
The features are age, height, weight, working_hour and rest_hour. The target column is Diabetic. Here I have 5 features and I want to use fewer, which is why I want to apply an autoencoder to select the best features for the prediction.

Generally this is not possible with a vanilla autoencoder (AE). An AE performs a non-linear mapping to a hidden dimension and back to the original space, and this mapping cannot be interpreted in terms of individual input features. You could use constrained AEs, but I would not recommend that when you are working with AEs for the first time.
However, if you just want to reduce the input dimension, you can train an embedding: train the AE with the desired number of nodes in the bottleneck and use the output of the encoder as the input to your other algorithm.
You can split the AE into two functions: the encoder (E) and the decoder (D). Your forward propagation is then D(E(x)), where x is your input. After you have finished training the AE (with a reasonable reconstruction error!), you compute only E(x) and feed it to your other algorithm.
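For illustration, here is a minimal sketch of this encoder/decoder split in Keras (assuming tensorflow.keras is available; the layer sizes, bottleneck width and placeholder data are arbitrary assumptions, not recommendations):

```python
# Minimal autoencoder sketch: train D(E(x)), then keep only E(x) as reduced features.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_features = 5        # age, height, weight, working_hour, rest_hour
bottleneck_size = 2   # desired reduced dimension (an assumption, tune as needed)

inputs = keras.Input(shape=(n_features,))
encoded = layers.Dense(4, activation="relu")(inputs)
encoded = layers.Dense(bottleneck_size, activation="relu", name="bottleneck")(encoded)
decoded = layers.Dense(4, activation="relu")(encoded)
decoded = layers.Dense(n_features, activation="linear")(decoded)

autoencoder = keras.Model(inputs, decoded)   # D(E(x))
encoder = keras.Model(inputs, encoded)       # E(x) only
autoencoder.compile(optimizer="adam", loss="mse")

# X should be your (scaled) feature matrix of shape (n_samples, 5);
# random data is used here only as a placeholder.
X = np.random.rand(100, n_features)
autoencoder.fit(X, X, epochs=50, batch_size=16, verbose=0)

# After training with an acceptable reconstruction error,
# use the bottleneck activations as the reduced features:
X_reduced = encoder.predict(X)               # shape (n_samples, bottleneck_size)
```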
Another option would be PCA, which is basically a linear AE. You can define a maximum number of hidden dimensions and evaluate how much each of them contributes to reducing the reconstruction error (the explained variance). It is also much easier to implement, and you do not need any knowledge of TensorFlow or PyTorch.
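As a sketch of the PCA route, assuming scikit-learn is available (the number of components and the placeholder data are again arbitrary):

```python
# PCA as a linear dimensionality reduction; explained_variance_ratio_ shows
# how much each component contributes.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = np.random.rand(100, 5)                 # placeholder for the 5-feature matrix
X_scaled = StandardScaler().fit_transform(X)

pca = PCA(n_components=2)                  # keep 2 components (example value)
X_reduced = pca.fit_transform(X_scaled)

print(pca.explained_variance_ratio_)       # contribution of each component
```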


Predict Test Sample Response for SVM Regression

I am testing data samples with an SVM regression model, following the example in this MathWorks documentation (link: https://uk.mathworks.com/help/stats/compactregressionsvm.predict.html#buvytaz). In that example the training data has the same number of rows as the data to predict, and so far this seems to be required to run the prediction. What can I do if my data have a different number of rows? How can I train my support vector machine on data with a different number of samples and still be able to predict, even if this possibly leads to a larger error?
Here is a sample of the training data for the model (used with Mdl = fitrsvm) and the data that I want to predict:
ans = 10×2 table

    Training data    Data to predict
    _____________    _______________
         14               9.4833
         27               28.938
         10                7.765
         28
         22               21.054
         29               31.484
         24.5             30.306
         18.5
         32               28.225
         28
Step-by-step verification of what I wanted to do. What I did was:
1. Built a model.
2. Tested it with YFit.
3. Modified the table, and it worked.
4. Doubled the size of the table to predict, and it worked.
I had simply done something wrong before.
You can't train your model with unlabelled data, i.e. rows that have no target ("predict") value. I would suggest you simply filter out all the unlabelled data points and train the model on this subset.
Intuitively, this reflects the fact that you cannot learn anything from these data points. If I want to learn the relationship age -> income, it does not help me at all to ask someone only their age and not their income; that information is useless for answering my question.
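To make the filtering step concrete, here is an illustrative Python/pandas sketch (the original question uses MATLAB, so treat this purely as a demonstration of dropping unlabelled rows; the column names are made up):

```python
# Keep only the rows that actually have a target value before training a regressor.
import pandas as pd

df = pd.DataFrame({
    "x": [14, 27, 10, 28, 22, 29, 24.5, 18.5, 32, 28],
    "y": [9.4833, 28.938, 7.765, None, 21.054, 31.484, 30.306, None, 28.225, None],
})

labelled = df.dropna(subset=["y"])      # drop rows with no "predict" value
X_train, y_train = labelled[["x"]], labelled["y"]
# train the regressor on X_train, y_train only
```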

YOLOv3 convolutional layers count

I am really confused about the number of convolutional layers in YOLOv3!
According to the paper, it uses Darknet-53, and no further details or additions to that structure are mentioned.
However, according to AlexeyAB's build, it is composed of 106 layers!
Moreover, the towardsdatascience website claims that the additional 53 layers are added for the detection process. Does that mean the first 53 layers are only for feature extraction?
So my question is: what is the purpose of these extra 53 layers that are not mentioned in the paper? Where do they come from, and why?
According to AlexeyAB (creator of a very popular fork of Darknet), https://groups.google.com/forum/?nomobile=true#!topic/darknet/9WppEzRouMU (this link appears to be deprecated),
YOLOv3 has
75 convolutional layers + 31 other layers (shortcut, route, upsample, yolo) = 106 layers in total.
You can count the convolutional layers in the cfg file; there are 75. Also remember that YOLOv3 does detection at 3 different scales, at layers 82, 94 and 106.
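If you want to verify that count yourself, a small sketch in Python could look like this (it assumes you have a local copy of the standard yolov3.cfg from the Darknet repository):

```python
# Count the [convolutional] sections in a YOLOv3 cfg file.
with open("yolov3.cfg") as f:
    lines = [line.strip() for line in f]

conv_count = sum(1 for line in lines if line == "[convolutional]")
print(conv_count)   # should print 75 for the standard yolov3.cfg
```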
Darknet-53 is the name of the feature extractor developed by Joseph Redmon et al., and it does indeed constitute the first 53 layers of YOLOv3. The remaining 53 layers are dedicated to resizing, concatenating and upsampling the feature maps to prepare them for detection at three different scales, at layers 82, 94 and 106 respectively. The first detection layer detects the largest objects, the second the medium-sized ones, and the last layer everything that remains (in theory, at least).
I think the idea behind this hierarchical structure is that the deeper one moves into YOLOv3, the more high-level the information it is able to extract.

What is the optimal hidden units size?

Suppose we have a standard autoencoder with three layers (i.e. L1 is the input layer, L3 the output layer with #input = #output = 100, and L2 the hidden layer with 50 units). I know the interesting part of an autoencoder is the hidden layer L2: instead of passing 100 inputs to my supervised model, it feeds it with 50 inputs. What is the optimal hidden layer size? 50 works, but why not use 51, 52 or 63 hidden units? Will 51 hidden units make the supervised model perform better than 50?
Suppose now that the number of inputs is 1,000,000. If N is the number of hidden units, I don't want to test every possible value of N to find the optimal one. I assume there is at least an algorithm that avoids testing every possible value, or that eliminates some of them.
Could that question help?
There is no rule for it. Selecting the number of hidden units is essentially a matter of trial and error.
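As a sketch of what such trial and error can look like in practice (assuming Keras is available; the candidate sizes, network shape and placeholder data are arbitrary assumptions, not a recipe):

```python
# Train a small autoencoder for several candidate bottleneck sizes and
# compare the validation reconstruction error instead of testing every N.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

X = np.random.rand(1000, 100)          # placeholder for 100-dimensional inputs
candidates = [25, 50, 75]              # coarse grid instead of every possible N

results = {}
for n_hidden in candidates:
    inputs = keras.Input(shape=(X.shape[1],))
    code = layers.Dense(n_hidden, activation="relu")(inputs)
    outputs = layers.Dense(X.shape[1], activation="linear")(code)
    ae = keras.Model(inputs, outputs)
    ae.compile(optimizer="adam", loss="mse")
    history = ae.fit(X, X, validation_split=0.2, epochs=20, verbose=0)
    results[n_hidden] = history.history["val_loss"][-1]

print(results)   # pick the smallest size whose error is still acceptable
```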

Octave/Matlab: Arranging Space-Time Data in a Matrix

This is a question about coding common practice, not a specific error or other malfunctions.
I have a matrix of values of a variable that changes in space and time. What is the common practice: should the columns correspond to time values or to space values?
That is, if there is a definite common practice in the first place.
Update: Here is an example of the data in tabular form. The time vector is much longer than the space vector.
 t   y(x1)   y(x2)
 1    100     50
 2    100     50
 3    100     50
 4     99     49
 5     99     49
 6     99     49
 7     98     49
 8     98     48
 9     98     48
10     97     48
It depends on your goal and ultimately doesn't matter that much; it is more a question of convenience.
If you do care about performance, there is a slight difference. Your code achieves maximum cache efficiency when it traverses monotonically increasing memory locations. In MATLAB, data is stored column-wise (column-major), so processing data column by column results in maximum cache efficiency. Thus, if you frequently access all the data at certain time instants, store space along the columns (one column per time instant); if you frequently access all the data at certain spatial points, store time along the columns (one column per spatial point).
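As an illustration of the cache effect (written in Python/NumPy with a Fortran-ordered array to mimic MATLAB's column-major storage; the array size and timing setup are arbitrary):

```python
# Traversing contiguous columns of a column-major array is faster than
# traversing its strided rows.
import numpy as np
import timeit

A = np.asfortranarray(np.random.rand(5000, 5000))   # column-major, like MATLAB

col_time = timeit.timeit(lambda: [A[:, j].sum() for j in range(A.shape[1])], number=3)
row_time = timeit.timeit(lambda: [A[i, :].sum() for i in range(A.shape[0])], number=3)
print(f"column traversal: {col_time:.3f}s, row traversal: {row_time:.3f}s")
```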

Predicting patterns in number sequences

My problem is as follows. As inputs I have sequences of whole numbers, around 200-500 per sequence. Each number in a sequence is marked as good or bad. The first number in each sequence is always good, but whether or not subsequent numbers are still considered good is determined by the numbers that came before them. There is a mathematical function that governs how the numbers affect those that come after them, but the specifics of this function are unknown. All we know for sure is that it starts off accepting every number and then gradually starts rejecting numbers until finally every number is considered bad. Out of every sequence, only around 50 numbers will ever be accepted before this happens.
It is possible that the validity of a number is not only determined by which numbers came before it, but also by whether these numbers were themselves considered good or bad.
For example: (good numbers in bold)
4 17 8 47 52 18 13 88 92 55 8 66 76 85 36 ...
92 13 28 12 36 73 82 14 18 10 11 21 33 98 1 ...
Attempting to determine the logic behind the system through guesswork seems like an impossible task. So my question is: can a neural network be trained to predict whether a number will be good or bad? If so, approximately how many sequences would be required to train it? (Assume sequences of 200-500 numbers that are 32-bit integers.)
Since your data is sequential and there is dependency between numbers, it should be possible to train a recurrent neural network. The recurrent weights take care of the relationship between numbers.
As a general rule of thumb, the more uncorrelated input sequences you have, the better it is. This survey article can help you get started with RNN: https://arxiv.org/abs/1801.01078
This is definitely possible. #salehinejad gives a good answer, but you might want to look into specific RNNs, such as the LSTM.
LSTMs are very good at sequence prediction; you just feed the network the numbers one by one (sequentially).
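A minimal Keras sketch of this idea, treating the task as per-timestep sequence labelling (the sequence length, network sizes and random placeholder data are assumptions, not tuned values):

```python
# LSTM that labels every number in a sequence as good (1) or bad (0).
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

seq_len = 300                                     # sequences are ~200-500 numbers
# Placeholder data; real inputs should be scaled/normalised 32-bit integers.
X = np.random.randint(0, 100, size=(64, seq_len, 1)).astype("float32")
y = np.random.randint(0, 2, size=(64, seq_len, 1)).astype("float32")   # good/bad labels

model = keras.Sequential([
    layers.Input(shape=(seq_len, 1)),
    layers.LSTM(64, return_sequences=True),       # one output per timestep
    layers.TimeDistributed(layers.Dense(1, activation="sigmoid")),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=16, verbose=0)

# predictions[i, t, 0] is the probability that number t in sequence i is "good"
predictions = model.predict(X)
```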