How to use the AUC package to calculate AUC (classification)

I train a model and make predictions with the following example:
=======================
The outputs are:
and:
Now I need to calculate the AUC using the AUC package, but I can't quite understand how to do it:
auc(roc(????, ?????))
Thanks
Manel

A possible solution could be (note that the AUC package's roc() expects the predicted scores first and the true labels, as a factor, second):
library(AUC)
result.roc <- roc(prediction$M, testing$Class)  # predictions first, then labels
result.auc <- auc(result.roc)                   # I get as an answer the "AUC"
plot(result.roc)                                # plot the curve
# Note: plot options such as print.thres = "best" and
# print.thres.best.method = "closest.topleft" belong to the pROC package,
# whose roc() takes the arguments in the opposite order:
# roc(testing$Class, prediction$M)
Comments on this solution are welcome!
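For comparison, the same AUC calculation can be sketched in Python with scikit-learn; the labels and scores below are made up for illustration:

```python
from sklearn.metrics import roc_auc_score

# Hypothetical true labels (0/1) and predicted probabilities for the positive class
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

# AUC = probability that a random positive scores higher than a random negative
auc_value = roc_auc_score(y_true, y_score)
print(auc_value)  # 0.75
```

Whatever the package, the key is to pass the true class labels and the continuous scores (not hard 0/1 predictions), since the ROC curve is traced by sweeping a threshold over the scores.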

How to obtain coefficient values from Spark-MLlib Linear Regression model (Scala)?

I'd like to obtain the coefficient values of a linear regression (LR) model in Spark MLlib. Here I use LinearRegressionWithSGD to build the model; you can find the sample at the following link:
https://spark.apache.org/docs/2.1.0/mllib-linear-methods.html#regression
I could get the coefficient values from Spark ML's linear regression; see the reference link below.
https://spark.apache.org/docs/2.1.0/ml-classification-regression.html#linear-regression
Please help me with this. Thanks in advance !!
Taking the first lines of model creation from the first link you sent:
import org.apache.spark.mllib.linalg.{DenseVector, Vector}

// train() already takes the data, so no extra .run(training) call is needed
val model: LinearRegressionModel =
  LinearRegressionWithSGD.train(parsedData, numIterations, stepSize)
// Here are the coefficients and intercept
val weights: Vector = model.weights
val intercept = model.intercept
// Extract the raw Array[Double] backing the dense weight vector
val weightsData: Array[Double] = weights.asInstanceOf[DenseVector].values
The last three lines extract the coefficients and the intercept. The type of weights is org.apache.spark.mllib.linalg.Vector, which is a wrapper around the Breeze DenseVector.
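The same weights-and-intercept extraction can be sketched outside Spark as a plain least-squares fit with NumPy (the data below is made up for illustration):

```python
import numpy as np

# Hypothetical data generated from y = 2*x + 1
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

# Least squares with an explicit intercept column, like model.weights / model.intercept
A = np.column_stack([x, np.ones_like(x)])
coef, intercept = np.linalg.lstsq(A, y, rcond=None)[0]
print(coef, intercept)  # ~2.0 ~1.0
```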

Why does `libsvm` in Matlab give me all-1 predictions?

I use an SVM in R and in Matlab with the same dataset.
My R code works fine and gives me reasonable predictions.
library(R.matlab)  # provides readMat()
library(e1071)     # provides svm()
matdat <- readMat(con = "data.mat")
svm.model <- svm(x = matdat$normalize.X, y = matdat$Yt)
pred <- predict(svm.model, newdata = matdat$normalize.X)
pred <- sapply(pred, function(x) ifelse(x > 0, 1, -1))
sum(pred == matdat$Yt) / length(matdat$Yt)  # fraction of correct training predictions
But my Matlab code gives me all-1 predictions on the training data.
load('data.mat')
model2 = svmtrain(Yt, normalize_X,'-s 3 -c 1 -t 2 -p 0.1');
[predicted_label,accuracy, decision_values] = svmpredict(Yt, normalize_X, model2);
I have checked the default parameters of svm from e1071, which in my opinion agree with the Matlab version.
I use version 1.6-7 of the e1071 package in R, and the latest libsvm from the official page.
So, what can I do to find the reason? Any ideas?
==== update ====
Before feeding the data to libsvm in Matlab, I apply mapstd to normalize it (this scaling is done automatically in R). After that I get the same trained model in both R and Matlab.
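mapstd standardizes each variable to zero mean and unit standard deviation; a minimal NumPy sketch of the same transform (assuming, as in mapstd, that variables are rows, and using the sample standard deviation as MATLAB does):

```python
import numpy as np

X = np.array([[1.0, 2.0, 3.0, 4.0],
              [10.0, 20.0, 30.0, 40.0]])

# Standardize each row: subtract its mean, divide by its sample std (ddof=1)
Xn = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, ddof=1, keepdims=True)
print(Xn.mean(axis=1))         # ~[0, 0]
print(Xn.std(axis=1, ddof=1))  # [1, 1]
```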
In Matlab you use the -s 3 option, which selects epsilon-SVR (regression), not classification; for C-SVC classification use -s 0.
As a starting point, don't assume anything about default parameters, just specify parameters explicitly in both R and Matlab.
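The regression-vs-classification mixup is easy to reproduce in scikit-learn, where SVR plays the role of libsvm's -s 3 and SVC the role of -s 0; a minimal sketch on made-up data:

```python
from sklearn.svm import SVC, SVR

# Tiny linearly separable toy set with labels in {-1, 1}
X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [-1, -1, 1, 1]

clf = SVC(kernel="linear").fit(X, y)  # classification: predicts class labels
reg = SVR(kernel="linear").fit(X, y)  # regression: predicts continuous values
print(clf.predict(X))  # [-1 -1  1  1]
print(reg.predict(X))  # real numbers near the labels, not exact -1/1
```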

How do I solve a function for x in Matlab?

I have this function defined:
% Enter the data that was recorded into two vectors, mass and period
mass = 0 : 200 : 1200;
period = [0.404841 0.444772 0.486921 0.522002 0.558513 0.589238 0.622942];
% Calculate a line of best fit for the data using polyfit()
p = polyfit(mass, period, 1);
fit = @(x) p(1).*x + p(2);
Now I want to solve fit(x) = 0.440086, but I can't find a way to do this. I know I could easily work it out by hand, but I want to know how to do it in the future.
If you want to solve a linear equation of the form A*x = B, you can easily solve it in MATLAB as follows:
p = [0.2 0.5];          % illustrative coefficients
constValue = 0.440086;
A = p(1);
B = constValue - p(2);  % rearranged from p(1)*x + p(2) = constValue
soln = A\B;             % x = B/A
If you want to solve a non-linear system of equations, you can use fsolve as follows (here I am showing how to use it to solve the above linear equation):
myFunSO = @(x) 0.2*x + 0.5 - 0.440086; % here you are solving f(x) - 0.440086 = 0
x = fsolve(myFunSO, 0.5)               % 0.5 is the initial guess
Both methods should give you the same solution.
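The same two approaches can be sketched in Python with SciPy, reusing the answer's illustrative p = [0.2, 0.5]:

```python
from scipy.optimize import fsolve

p = [0.2, 0.5]
target = 0.440086

# Closed form: solve p[0]*x + p[1] = target
x_closed = (target - p[1]) / p[0]

# Numerical: find a root of f(x) - target, starting from an initial guess of 0.5
x_num = fsolve(lambda x: p[0] * x + p[1] - target, 0.5)[0]
print(x_closed, x_num)  # both ~ -0.29957
```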

Fitting a curve with a nonlinear function in Matlab

I have a problem in Matlab. I need to fit a dataset with a nonlinear function like:
f = alfa*(1 + beta*zeta)^(1/3)
where alfa and beta are the coefficients to be found. I want to use the least squares method. How can I do this with the command lsqcurvefit? Or are there other ways to solve my problem?
Thanks so much.
Here there is the dataset:
zeta val
0.001141174 1.914017718
0.010606563 1.36090774
0.021610291 1.906194276
0.070026172 1.87606762
0.071438139 1.877264055
0.081679327 1.859341737
0.101181292 2.518896436
0.107877774 2.772125094
0.205038829 3.032759627
0.211802706 1.483644094
0.561521724 2.424261001
0.61500615 2.559041397
0.647249191 2.949944577
0.943396226 2.84068921
1.091107474 3.453699422
1.175260761 2.604008404
1.837813003 4.00262983
2.057613169 4.565849247
2.083333333 3.779001445
3.188521323 4.430824069
4.085801839 7.766971568
4.22832981 5.711800741
4.872107186 4.949950059
9.756097561 10.78574156
You have to use the fit function with fitType = 'Power2':
fitobject = fit(zeta2, val, 'Power2')
You can also use cftool to determine your coefficients manually, especially if you want to keep the (1/3) exponent. Maybe least squares is not the best solution for your data, as woodchips said.
Be aware that you have to substitute your zeta:
zeta2 = 1 + beta*zeta
You can then determine the coefficients as follows:
coeffvalues(fitobject)
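If you want to keep the exact form f = alfa*(1 + beta*zeta)^(1/3) instead of a generic power law, SciPy's curve_fit does the nonlinear least-squares fit directly; a sketch on synthetic data with assumed true coefficients alfa = 2, beta = 3:

```python
import numpy as np
from scipy.optimize import curve_fit

def model(zeta, alfa, beta):
    # cbrt handles a temporarily negative 1 + beta*zeta during the iterations
    return alfa * np.cbrt(1.0 + beta * zeta)

# Synthetic, noise-free data from the assumed true coefficients
zeta = np.linspace(0.1, 5.0, 20)
val = model(zeta, 2.0, 3.0)

popt, _ = curve_fit(model, zeta, val, p0=[1.0, 1.0])
print(popt)  # ~[2, 3]
```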

Gamma distribution fit error

For a classification task I want to fit a gamma distribution to two pairs of data: the distance populations within class and between class. This is to determine the theoretical False Accept and False Reject Rates.
The fit SciPy returns puzzles me, though. A plot of the data is below, where circles denote within-class distances and x-es between-class distances; the solid line is the gamma fitted to the within-class distances, the dotted line the gamma fitted to the between-class distances.
What I would have expected is that the gamma curves would peak at around ~10 and ~30, not at 0 for both. Does anyone see what's going wrong here?
This is my code:
import scipy.stats as ss
import matplotlib.pyplot as plt
pos = [7.4237931034482765, 70.522068965517235, 9.1634482758620681, 22.594137931034485, 7.3003448275862075, 6.3841379310344841, 10.693448275862071, 7.5237931034482761, 7.4079310344827594, 7.2696551724137928, 8.5551724137931036, 17.647241379310344, 7.8475862068965521, 14.397586206896554, 32.278965517241382]
neg = [32.951724137931038, 234.65724137931034, 25.530000000000001, 33.236551724137932, 258.49965517241378, 33.881724137931037, 18.853448275862071, 33.703103448275861, 33.655172413793103, 33.536551724137929, 37.950344827586207, 34.32586206896552, 42.997241379310346, 100.71379310344828, 32.875172413793102, 30.59344827586207, 19.857241379310345, 35.232758620689658, 30.822758620689655, 34.92896551724138, 29.619310344827586, 29.236551724137932, 32.668620689655171, 30.943448275862071, 30.80344827586207, 88.638965517241374, 25.518620689655172, 38.350689655172417, 27.378275862068971, 37.138620689655177, 215.63379310344828, 344.93896551724134, 225.93413793103446, 103.66758620689654, 81.92896551724138, 59.159999999999997, 463.89379310344827, 63.86827586206897, 50.453103448275861, 236.4603448275862, 273.53137931034485, 236.26103448275862, 216.26758620689654, 170.3003448275862, 340.60034482758618]
alpha1, loc1, beta1=ss.gamma.fit(pos, floc=0)
alpha2, loc2, beta2=ss.gamma.fit(neg, floc=0)
plt.plot(pos,[0.06]*len(pos),'ko')
plt.plot(neg,[0.04]*len(neg),'kx')
x = range(200)
plt.plot(x,ss.gamma.pdf(x, alpha1, scale=beta1), '-k')
plt.plot(x,ss.gamma.pdf(x, alpha2, scale=beta2), ':k')
plt.xlim((0,200))
I got the trick with floc=0 from here: "Why does the Gamma distribution in SciPy have three parameters?" But it does not always seem to force loc1 and loc2 to be 0 :/
(This is really a comment, but I want to show the plot that I get.)
Are you sure you used floc=0 in the fit method when you made the plot? If I leave it out (or if I make the mistake--as I frequently do--of using loc=0 instead of floc=0), I get a plot that looks like the one you included.
Which versions of scipy and numpy are you using?
With scipy 0.12.0 and numpy 1.7.1, your code works for me. I added a couple print statements, and I get:
alpha1 = 1.86456504055 beta1 = 8.47415903767
alpha2 = 1.17943740138 beta2 = 86.51957394
along with the plot:
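The effect of floc=0 can also be checked on synthetic data with assumed parameters; a minimal sketch:

```python
import numpy as np
import scipy.stats as ss

# Synthetic sample from an assumed gamma(shape=2, scale=10) distribution
rng = np.random.default_rng(0)
data = ss.gamma.rvs(2.0, scale=10.0, size=2000, random_state=rng)

# floc=0 pins the location parameter, so only shape and scale are estimated
shape, loc, scale = ss.gamma.fit(data, floc=0)
print(shape, loc, scale)  # loc is exactly 0; shape and scale near 2 and 10
```

If loc still comes out non-zero, double-check that the keyword is floc (fixed location), not loc, which is only an initial guess.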