Linear regression gives slightly better RMSE than random forest regression

RMSE of random forest regression: 85.66
RMSE of linear regression: 85.62
I have only five features and around 3,800 observations.
I read that in most cases random forest gives better results than linear regression, but here I got very similar results for the two models.
When does linear regression give a slightly better result?
What more can I examine to find out why linear regression gives a slightly better RMSE?
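For what it's worth, a minimal sketch of the kind of cross-validated comparison that can show whether a 0.04 RMSE gap is meaningful (scikit-learn; the synthetic data is a hypothetical stand-in for the real five-feature table):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Hypothetical stand-in: 3800 rows, 5 features, noisy linear target.
X, y = make_regression(n_samples=3800, n_features=5, noise=10.0, random_state=0)

for name, model in [("linear regression", LinearRegression()),
                    ("random forest", RandomForestRegressor(random_state=0))]:
    # Negated so that larger scores are better; flip the sign back for RMSE.
    rmse = -cross_val_score(model, X, y, cv=10,
                            scoring="neg_root_mean_squared_error")
    print(f"{name}: RMSE {rmse.mean():.2f} +/- {rmse.std():.2f}")
```

If the per-fold RMSE intervals of the two models overlap heavily, a 0.04 gap is indistinguishable from noise; and when the true relationship is close to linear, a linear model matching or beating a forest is expected rather than surprising.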

Related

Are word2vec embeddings the same if I re-train on the same sentences?

If I give the same sentences to a word2vec model and train it two separate times (with the same vector size, of course), do I obtain the same embeddings for the words?
There are several stochastic processes during word2vec training: first, the embeddings are randomly initialized; second, negative sampling is used to approximate the denominator of the softmax term. Only if those random processes start from the same seed will the vectors be exactly the same.
Otherwise, training will converge to totally different vectors; however, the distances between the vectors will be approximately the same.
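As a sketch of the seeded case (gensim 4.x parameter names assumed; gensim's documentation notes that a fully reproducible run also requires workers=1, to remove thread-scheduling jitter, and a fixed PYTHONHASHSEED across interpreter launches):

```python
from gensim.models import Word2Vec

sentences = [["the", "cat", "sat"], ["the", "dog", "sat"]]  # toy corpus

def train(seed):
    # workers=1 avoids nondeterministic thread scheduling; the seed fixes
    # both the random initialization and the negative sampling.
    return Word2Vec(sentences, vector_size=50, min_count=1,
                    seed=seed, workers=1, epochs=10)

m1, m2 = train(seed=42), train(seed=42)
assert (m1.wv["cat"] == m2.wv["cat"]).all()  # identical with the same seed

m3 = train(seed=7)
print((m1.wv["cat"] == m3.wv["cat"]).all())  # typically False with a new seed
```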

Linear Regression using Neural Network

I am working on a regression problem with the following sample training data.
As shown, I have an input of only 4 parameters, only one of which (Z) actually changes, so the rest carry no real information, while the output has 124 parameters, denoted O1 to O124.
Note that O1 increases at a constant rate of 20 (1000, then 1020, then 1040, ...), while O2 increases at a different but still constant rate of 30, and the same holds for all 124 outputs: each changes linearly at a constant rate.
I believed this was a trivial problem and that a very simple neural network model would reach 100% accuracy on the test data, but the results were the opposite.
I reached 100% test accuracy using a linear regressor and 99.99997% test accuracy using a KNN regressor.
I reached 41% test accuracy with a 10-layer neural network using ReLU activation, while all the other activation functions failed, and a shallow ReLU network also failed.
Using a simple neural network with a linear activation function and no hidden layers, I reached 92% on the test data.
My question is: how can I get the neural network to reach 100% on the test data, like the linear regressor?
A shallow network with linear activation is supposed to be equivalent to a linear regressor, but the results differ. Am I missing something?
If you use linear activations, a deep model is in principle the same as linear regression / an NN with one layer. E.g. for a deep NN with linear activations the prediction is given as y = W_3(W_2(W_1 x)), which can be rewritten as y = (W_3 W_2 W_1) x, i.e. y = W_4 x for a single matrix W_4, which is exactly a linear regression.
Given that, check whether your NN without a hidden layer converges to the same parameters as your linear regression. If it does not, your implementation is probably wrong. If it does, then your larger NN is probably converging to a worse solution on the test data; try different random seeds.
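A quick numeric check of that collapse (layer widths are hypothetical; the point is only that stacked linear maps multiply out to one matrix):

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 4))    # layer 1: 4 inputs -> 8 units
W2 = rng.normal(size=(8, 8))    # layer 2
W3 = rng.normal(size=(124, 8))  # output layer: 124 outputs
x = rng.normal(size=4)

deep = W3 @ (W2 @ (W1 @ x))  # three linear layers applied in sequence
W4 = W3 @ W2 @ W1            # the single equivalent weight matrix
assert np.allclose(deep, W4 @ x)
```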

How to train large dataset for classification in MATLAB

I have a large feature dataset of around 111 MB for classification, with 217,000 data points, each with 1,760,000 features. When used for SVM training in MATLAB, it takes a lot of time.
How can this data be processed in MATLAB?
It depends on what sort of SVM you are building.
As a rule of thumb, with such big feature sets you should look at linear classifiers: an SVM with a linear kernel (or no kernel), logistic regression with various regularizations, etc.
If you're training an SVM with a Gaussian kernel, the training algorithm has O(max(n, d) min(n, d)^2) complexity, where n is the number of examples and d the number of features. In your case d > n, so this ends up being O(d n^2), which is quite big.
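As an illustration of the linear-classifier route (a Python/scikit-learn sketch, since the same principle applies in MATLAB; the matrix sizes here are small hypothetical stand-ins, and a matrix this wide should be kept sparse):

```python
import numpy as np
from scipy.sparse import random as sparse_random
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
# Hypothetical sparse stand-in for the real 217,000 x 1,760,000 matrix.
X = sparse_random(1000, 50000, density=1e-4, format="csr", random_state=0)
y = rng.integers(0, 2, size=1000)

clf = LinearSVC()  # liblinear-style linear SVM: no kernel matrix is built
clf.fit(X, y)
```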

Evaluating performance of Neural Network embeddings in kNN classifier

I am solving a classification problem. I train my unsupervised neural network on a set of entities (using the skip-gram architecture).
The way I evaluate is to search for the k nearest neighbours of each point in the validation data among the training data. I take a weighted sum (with weights based on distance) of the labels of the nearest neighbours and use that as the score of each validation point.
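For reference, a minimal sketch of that evaluation with scikit-learn's distance-weighted kNN (the array names, sizes, and random data are hypothetical stand-ins, not from the post):

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
# Hypothetical stand-ins for the learned embeddings and binary labels.
train_emb, val_emb = rng.normal(size=(500, 32)), rng.normal(size=(100, 32))
train_y, val_y = rng.integers(0, 2, 500), rng.integers(0, 2, 100)

for k in (5, 20, 50):
    knn = KNeighborsClassifier(n_neighbors=k, weights="distance")
    knn.fit(train_emb, train_y)
    score = knn.predict_proba(val_emb)[:, 1]  # distance-weighted label vote
    print(k, roc_auc_score(val_y, score))
```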
Observation: as I increase the number of epochs (model 1: 600 epochs, model 2: 1400 epochs, model 3: 2000 epochs), my AUC improves at smaller values of k but saturates at similar values.
What could be a possible explanation of this behaviour?
[Reposted from CrossValidated]
To cross-check whether imbalanced classes are an issue, try fitting an SVM model. If that gives a better classification (possible if your ANN is not very deep), you may conclude that the classes should be balanced first.
Also, try some kernel functions to check whether such a transformation makes the data linearly separable.
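A minimal sketch of both checks with scikit-learn's SVC (the data and the roughly 20% positive rate are hypothetical; class_weight="balanced" reweights the rare class, and the kernel loop probes separability):

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Hypothetical embeddings with imbalanced binary labels.
X_tr, X_va = rng.normal(size=(500, 32)), rng.normal(size=(100, 32))
y_tr = (rng.random(500) < 0.2).astype(int)
y_va = (rng.random(100) < 0.2).astype(int)

for kernel in ("linear", "rbf", "poly"):
    svm = SVC(kernel=kernel, class_weight="balanced")
    svm.fit(X_tr, y_tr)
    print(kernel, roc_auc_score(y_va, svm.decision_function(X_va)))
```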

Non-linear classification vs regression with FFANN

I am trying to differentiate between two classes of data for forecasting. Basically, the dependent variables are features of a signal that I want to forecast. I want to predict whether the signal will have a positive or negative slope in the near future (one time step ahead). I have tried different time-series analyses, such as Fourier analysis, fitting with neural networks, auto-regressive models, and classification with neural nets (using patternet in MATLAB).
The function is continuous, so the most logical assumption is to use some regression-analysis tool to determine what is going to happen. However, since I only care whether the slope is going to be positive or negative, I changed the signal to a binary one (1 if the slope is positive, -1 if the slope is zero or negative).
This is by far the best result I have gotten! However, for some unknown reason a neural net designed for classification did not work (the confusion matrix showed a precision of around 50%). So I decided to try a regular feedforward neural net...
Since the neural network outputs continuous data, I didn't know what to do... But then I remembered logistic regression: since its transfer function is the logistic (sigmoid) function, bounded by 0 and 1, its output can be interpreted as a probability. So I basically did the same, defined a threshold (e.g. above 0 is 1, below 0 is -1), and voila! The precision skyrocketed! I am getting a precision of around 70-80%.
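A minimal sketch of that thresholding step (assuming a continuous network output in [-1, 1], e.g. from a tanh output unit; the function name is made up for illustration):

```python
import numpy as np

def to_labels(y_continuous, threshold=0.0):
    # Map continuous outputs to the binary classes 1 / -1.
    return np.where(y_continuous > threshold, 1, -1)

print(to_labels(np.array([0.7, -0.2, 0.01])))  # -> [ 1 -1  1]
```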
Since I am using a sigmoid transfer function, the neural network will have a continuous output just like logistic regression (in this case between -1 and 1), so I am assuming my approach is technically still regression and not classification. My question is... which is better? For my specific problem, where fitting did not give really good results and I had to convert it to a binary problem, which should give better results: classification or regression?
Should I try a different configuration of the neural net (with a different transfer function)? Should I try a support vector machine or another classification algorithm? Or should I stick with regression but define a threshold myself, just as I would with logistic regression?