Is there a way to present results from a univariate analysis of a multinomial GEE model (perhaps using the multgee package) with tbl_uvregression from Daniel Sjoberg's gtsummary package?
I found a previous answer for a multinomial model:
multinomial logistic regression results table in wide format using the gtsummary package
Would it be possible to do something like that for the multinomial GEE model, while also removing NAs from variables when running multiple univariate regressions with tbl_uvregression?
Code for running a multinomial GEE model, from the package documentation, for a nominal multinomial outcome:
library(multgee)
data(housing)
fitmod <- nomLORgee(y ~ factor(time) * sec, data = housing, id = id, repeated = time)
Suppose the linear model I want is sales = (0.4 * age) + (0.05 * income). How do I create this linear regression model in Weka without training on any data? I just want to save a model file that contains the linear relationship that I already know. No training is necessary. Is this possible in the Weka GUI or through the Java API? If so, how?
The Weka classifier MathExpressionClassifier, which comes with the ADAMS framework, allows you to do that. You only have to supply the formula after the = as the expression.
Another alternative, if you don't want to switch to ADAMS, is the mxexpression-weka-package library. However, you will need to convert the attribute names in your formula to attX (with X being the 1-based attribute index). This package is also part of ADAMS.
I am classifying gender using a KNN classifier.
I want to use an SVM classifier instead of the KNN classifier, with the same labels of 0 and 1 (0 for women and 1 for men).
I have a matrix of test examples, sample; a matrix of training examples, training; and a vector of labels for the training examples, group. I want class, a vector of labels for the test examples.
class = knnclassify(sample, training, group);
if class == 1
    x = 'Male';
else
    x = 'Female';
end
How can I change this code to find class using an SVM?
To train an SVM, you will need the Statistics and Machine Learning Toolbox.
The biggest difference from knnclassify is that with an SVM, training and classifying new labels are two separate steps.
1. Train your SVM : fitcsvm
This step teaches the classifier how to distinguish between your two classes by learning a linear separator (a weighted combination of the features) with the largest margin between positive and negative examples. All the examples you give it need ground-truth labels.
SVMs have many tunable parameters that you can adjust during the training step. There are several good tutorials in the MATLAB documentation that describe them, but for the most basic version you can just use your training examples:
model = fitcsvm(training,group);
This model will be used in the next step.
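As an illustration of the tunable parameters mentioned above (these name-value pairs exist in fitcsvm, but the values below are placeholders to show the syntax, not tuned recommendations):
model = fitcsvm(training, group, ...
    'KernelFunction', 'rbf', ...   % Gaussian kernel instead of the default linear one
    'KernelScale', 'auto', ...     % pick the kernel scale with a built-in heuristic
    'BoxConstraint', 1, ...        % regularization strength (placeholder value)
    'Standardize', true);          % z-score the features before training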
2. Classify new examples : predict
To classify your new examples, run (note that the model is the first argument)
class = predict(model, sample);
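Note that if sample has more than one row, class comes back as a vector, so the scalar if/else from the question does not generalize. A vectorized labeling sketch, assuming the same 0/1 coding:
labels = repmat({'Female'}, numel(class), 1);  % default label for class 0
labels(class == 1) = {'Male'};                 % rows the SVM predicted as class 1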
Notes:
Using your model, you can also run cross-validation, which is useful for accuracy analysis.
cvModel = crossval(model);
classError = kfoldLoss(cvModel);
You can also save your model, like any other MATLAB variable, for future use.
save('model.mat', 'model');
knnclassify comes from the Bioinformatics Toolbox. The Statistics and Machine Learning Toolbox also has a KNN model, which you train with fitcknn and classify with predict. The benefit is that you can reuse your KNN model on several sets of data, compare cross-validation results, and save it for future use.
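A minimal sketch of that workflow, using the same variable names as above:
knnModel = fitcknn(training, group);  % train a KNN model (default k = 1)
class = predict(knnModel, sample);    % classify the test examples
cvKnn = crossval(knnModel);           % 10-fold cross-validation by default
knnError = kfoldLoss(cvKnn);          % estimated misclassification rate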
Why is h2o.randomForest calculating MSE on the out-of-bag sample and during training for a multinomial classification problem?
I have also done binary classification with h2o.randomForest; there it calculated AUC on the out-of-bag sample and during training, but for multiclass classification the random forest is calculating MSE, which seems suspicious. Please see this screenshot.
My target variable was a factor with 4 levels: model1, model2, model3 and model4. In the screenshot you will also see a confusion matrix for these factors.
Can someone please explain this behaviour?
Both binomial and multinomial classification models report MSE, so you will see it in the Scoring History table for both (the highlighted training_MSE column).
H2O does not evaluate a multinomial AUC. A few evaluation methods have been proposed, but there is not yet a single widely adopted one. The pROC package discusses the method of Hand and Till, but notes that it cannot be plotted and that such results are rarely tested. Log loss and classification error are still available and are specific to classification, as each has a standard definition in the multinomial context.
There is a confusion matrix comparing your 4 factor levels, as you highlighted. Can you clarify what more you are expecting? If you were looking for four individual confusion matrices, the four-column table contains enough information to compute them.
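To illustrate that last point, here is a sketch (in MATLAB, for consistency with the other answers here, and with made-up counts) of how one-vs-rest 2x2 tables follow from a single K-by-K confusion matrix:
C = [50 3 2 1; 4 45 5 2; 1 6 48 3; 2 2 4 52];  % hypothetical 4-class counts (rows = actual)
K = size(C, 1);
for k = 1:K
    TP = C(k, k);
    FN = sum(C(k, :)) - TP;        % class k misclassified as something else
    FP = sum(C(:, k)) - TP;        % other classes misclassified as class k
    TN = sum(C(:)) - TP - FN - FP;
    binaryCM = [TP FN; FP TN]      % one-vs-rest table for class k (displayed)
end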
Can you suggest a MATLAB implementation of a multi-class classification algorithm for a large database? I tried libsvm; it is good, except for large databases, and I can't use liblinear for multi-class classification.
If you want to use liblinear for multi-class classification, you can use the one-vs-all technique. For more information, look at this.
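A minimal sketch of one-vs-all (illustrative only: it uses MATLAB's fitcsvm with a linear kernel as the binary learner, but the same loop applies to liblinear's train/predict MATLAB interface; training, group, and sample are assumed to be a feature matrix, a numeric label vector, and a test matrix, as in the earlier KNN/SVM example above):
classes = unique(group);                        % the K distinct class labels
K = numel(classes);
scores = zeros(size(sample, 1), K);
for k = 1:K
    yk = double(group == classes(k));           % class k vs. the rest
    mdl = fitcsvm(training, yk, 'KernelFunction', 'linear');
    [~, s] = predict(mdl, sample);              % s(:, 2) = score for "is class k"
    scores(:, k) = s(:, 2);
end
[~, idx] = max(scores, [], 2);                  % most confident binary model wins
class = classes(idx);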
But if you have a large database, then using an SVM is not recommended, as the runtime complexity of an SVM is O(N * N * m), where:
N = number of samples in data
m = number of features in data
So, as an alternative, you can use a neural network. You can start with nntool, available in MATLAB.
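If you prefer the command line over the nntool GUI, a minimal sketch with patternnet (assuming the class labels in group are positive integers 1..K):
net = patternnet(10);                   % one hidden layer with 10 neurons
targets = full(ind2vec(group'));        % one-hot encode the class labels
net = train(net, training', targets);   % patternnet expects observations in columns
scores = net(sample');                  % one score per class, per column
[~, class] = max(scores, [], 1);        % predicted class index per test example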
I'm using binary classification with an SVM and an MLP for financial data. My input data has 21 features, so I used dimensionality reduction methods to reduce the dimension of the data. Some dimensionality reduction methods, like stepwise regression, report the best features, so I use those features in my classification model; other methods, like PCA, transform the data to a new space, and I use, for instance, the best 60% of the reported columns (features).
The critical problem is in the phase of using the final model. For example, I used the financial data of the past year and of two years ago for today's financial position. So now I want to use past and present data to predict next year. My question is: should I apply PCA to the new input data before feeding it to my trained classification model? How can I use PCA (for example) on this data? Must I use it like before (pca(newdata…)), or are there some outputs from the original PCA that I must reuse in this phase?
More information:
This is my system structure:
I have a hybrid classification method with an optimization algorithm that selects the best features (inputs) of my model and the best parameters of my classification method, so for a classification method like an MLP it takes a long time to optimize with 21 features (besides this, I repeat every iteration of my optimization algorithm 12 times / cross section). So I want to reduce the number of features with dimensionality reduction techniques (like PCA or NLPCA, or supervised methods like LDA/FDA) before feeding them to the classification method. For example, I'm using this structure of PCA code:
[coeff,score,latent,tsquared,explained,mu] = pca(___)
After that I use the first 10 columns of the output (sorted by the PCA function) as input to my classification and optimization model. In the final phase I find the best model parameters with the best combination of inputs. For example, my raw data has 21 features; after the first phase of using PCA I keep 10 features, and after optimization of my classification model I end up with a model using the 5 best chosen features. Now I want to use this model with new data. What must I do?
Thank you so much for your kind help.
You should follow these steps:
1. With your training data, create a PCA model
2. With the PCA of your training data, train your classifier
3. Apply the first PCA model to your new data
4. With the PCA of your new data, test the classifier
Here are some code snippets for steps 1 and 3 (2 and 4 depend on your classifier):
%Step 1. Generate a PCA data model
[W, Y] = pca(data, 'VariableWeights', 'variance', 'Centered', true);
%# Getting the correct W, mean and weights of data (for future data)
W = diag(std(data))\W;
[~, mu, we] = zscore(data);
we(we==0) = 1;
%Step 3. Apply the previous data model to a new vector
%# New coordinates as principal components
x = newDataVector;
x = bsxfun(@minus, x, mu);
x = bsxfun(@rdivide, x, we);
newDataVector_PCA = x*W;
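For completeness, a minimal sketch of steps 2 and 4, assuming a binary SVM and keeping the first 10 principal components as in the question (group here is a hypothetical vector of training labels; any classifier that accepts a feature matrix works the same way):
nComp = 10;                                             % principal components kept
model = fitcsvm(Y(:, 1:nComp), group);                  % step 2: train on the PCA scores
class = predict(model, newDataVector_PCA(:, 1:nComp));  % step 4: classify projected data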