I'm getting the following error when calling NbClust():
Error in NbClust(data = ds[, sapply(ds, is.numeric)], diss = NULL, distance = "euclidean", : The TSS matrix is indefinite. There must be too many missing values. The index cannot be calculated.
I called ds <- ds[complete.cases(ds),] just before running NbClust, so there are no missing values.
Any idea what's behind this error?
Thanks
I had the same issue in my research.
So I emailed Nadia Ghazzali, the package maintainer, and got an answer.
I'll attach my e-mail and her reply.
My e-mail:
Dear Nadia Ghazzali,
Hello Nadia. I have some questions about the NbClust function in the NbClust R library. I have tried googling but could not find satisfying answers. First, I'm so grateful to you for making this awesome R library. It is very helpful for my research. I tested the NbClust function with my own data, as below.
> clust <- NbClust(data, distance = "euclidean", min.nc = 2, max.nc = 10, method = 'kmeans', index = "all")
But soon an error occurred.
Error: division by zero!
Error in Indices.WBT(x = jeu, cl = cl1, P = TT, s = ss, vv = vv) : object 'scott' not found
So I ran the NbClust function line by line and found that some indices (CCC, Scott, marriot, tracecovw, tracew, friedman, and rubin) were not calculated because the object vv = 0. I'm not very familiar with algebra, so I don't know the meaning of an eigenvalue. But it seems to me that the object ss (the square root of eigenValues) should not be 0 after the product is taken.
So, here are my questions.
I assume that my data is so sparse (a lot of zero values) that sqrt(eigenValues) becomes too small. Is that right? I'm sorry I can't attach my data, but I can attach some of the eigenValues and their square roots.
> head(eigenValues)
[1] 0.039769880 0.017179826 0.007011972 0.005698736 0.005164871 0.004567238
> head(sqrt(eigenValues))
[1] 0.19942387 0.13107184 0.08373752 0.07548997 0.07186704 0.06758134
And if my assumption is right, what can I do about this problem? Is the only way to drop the 7 indices?
Thank you for reading. I'll be waiting for your reply. Best regards!
And her reply:
Dear Hansol,
Thank you for your interest. Yes, your understanding is good.
Unfortunately, the seven indices could not be applied.
Best regards,
Nadia Ghazzali
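Following that reply, one way to keep going is to call NbClust one index at a time and leave out the seven that fail. This is only a minimal sketch: the toy matrix stands in for the real data, the choice of remaining indices is arbitrary, and the lowercase names passed to index are the ones I believe NbClust accepts for these indices.

```r
library(NbClust)

# Toy data (hypothetical): two well-separated groups in two dimensions.
set.seed(1)
toy <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
             matrix(rnorm(100, mean = 4), ncol = 2))

# A few indices that are not among the seven that could not be computed
# (ccc, scott, marriot, trcovw, tracew, friedman, rubin).
indices <- c("kl", "ch", "silhouette", "db", "dunn")

# Run NbClust once per index and keep the suggested number of clusters.
best <- sapply(indices, function(ix) {
  res <- NbClust(toy, distance = "euclidean", min.nc = 2, max.nc = 6,
                 method = "kmeans", index = ix)
  unname(res$Best.nc[1])
})

print(best)
```

The named vector best then holds one suggested number of clusters per index, which can be tabulated to pick a majority value.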
#seni The cause of this error is data-related. If you look at the source code of this function:
NbClust <- function(data, diss="NULL", distance = "euclidean", min.nc=2, max.nc=15, method = "ward", index = "all", alphaBeale = 0.1)
{
    x <- 0
    min_nc <- min.nc
    max_nc <- max.nc
    jeu1 <- as.matrix(data)
    numberObsBefore <- dim(jeu1)[1]
    jeu <- na.omit(jeu1) # returns the object with incomplete cases removed
    nn <- numberObsAfter <- dim(jeu)[1]
    pp <- dim(jeu)[2]
    TT <- t(jeu) %*% jeu
    sizeEigenTT <- length(eigen(TT)$value)
    eigenValues <- eigen(TT/(nn-1))$value
    for (i in 1:sizeEigenTT)
    {
        if (eigenValues[i] < 0) {
            print(paste("There are only", numberObsAfter, "nonmissing observations out of a possible", numberObsBefore, "observations."))
            stop("The TSS matrix is indefinite. There must be too many missing values. The index cannot be calculated.")
        }
    }
I think the root cause of this error is negative eigenvalues of TT. In the code above the check runs on the data alone, before any clustering is done, so to solve the problem you must look at your data. See whether it has more columns than rows. Remove missing values, and check for issues with collinearity and multicollinearity, variance, covariance, etc.
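The checks above can be sketched in base R. This is a rough diagnostic under stated assumptions, not part of NbClust: the toy matrix X is hypothetical (fewer rows than columns, one duplicated column), and it rebuilds TT the same way the quoted source does. Eigenvalues that are zero in exact arithmetic can come out slightly negative through rounding, which is what the check in the source trips on.

```r
# Toy data (hypothetical): 5 rows, 8 columns, column 8 duplicates column 1.
set.seed(42)
X <- matrix(rnorm(5 * 8), nrow = 5, ncol = 8)
X[, 8] <- X[, 1]

n <- nrow(X)
TT <- t(X) %*% X   # same construction as in the quoted NbClust source
ev <- eigen(TT / (n - 1), symmetric = TRUE, only.values = TRUE)$values

rank_X <- qr(X)$rank           # numerical rank of the data matrix
col_var <- apply(X, 2, var)    # per-column variance

cat("rows:", n, " columns:", ncol(X), " rank:", rank_X, "\n")
cat("smallest eigenvalue:", min(ev), "\n")
cat("columns with zero variance:", which(col_var == 0), "\n")
```

If the rank is lower than the number of columns, some eigenvalues of TT are zero up to rounding error, and dropping redundant columns (or adding rows) is the thing to try first.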
For the other error, invalid clustering method, look at the source code of the method here. Look at lines 168 and 169 in the given link. You are getting this error message because the clustering method is empty:
if (is.na(method))
    stop("invalid clustering method")
So I am creating an app to find a value based on several inputs but hit an error as one of the outputs won't show.
This is the app layout: [image 1]
The problem is that the total cost section doesn't show the value when I click the calculate button. The 'Q Optimal' works just fine. [image 2]
The code associated with the button on the right side looks like this:
dm=app.MinimumDemandEditField.Value;
dM=app.MaximumDemandEditField.Value;
tm=app.MinimumLeadTimeEditField.Value;
tM=app.MaximumLeadTimeEditField.Value;
r1=app.ReorderLevelEditField.Value;
Et = 0.5*(tm+tM);
vart = 1/12*(tM-tm)^2;
Ed = 0.5*(dm+dM);
vard = 1/12*(dM-dm)^2;
ED = 1/4*(dm+dM)*(tm+tM);
varD = 1/144*(3*(dm+dM)^2*(tM-tm)^2+3*(dM-dm)^2*(tm+tM)^2+(tM-tm)^2*(dM-dm)^2);
gt = 1/(tM-tm);
fd = 1/(dM-dm);
fD = 1/((dM-dm)*(tM-tm));
f1=app.FixedCostEditField.Value;
c1=app.VariableCostEditField.Value;
h=app.HoldingCostEditField.Value;
s=50*c1;
app.ShortageCostEditField.Value = s
A1=c1+(h/Ed)*(r1-ED);
A2=fD*(r1*(tM-tm)*log(r1/(tM*dm))-(r1^2/dM)+(r1*tM)-(r1*tm)*log((dM*tM)/r1)-Et);
syms x;
f=(x-r1)*fD;
EB= int(f,r1,dM*tM);
A3=Ed*f1+h*Ed*(fD*((r1^2*tm/2)-(dm*r1/2)*(tM^2-tm^2)+(dm^2/6)*(tM^3-tm^3)-((r1^3/6*dM)-(dM*r1*tm^2/2)+(dM^2*tm^3/6)))+(fD/18)*(tM^3-tm^3)*(dM^3-dm^3)-r1*ED+Ed*s*EB);
Q=(1/h)*((Ed*(A1+h*A2-c1)+(h*(ED-r1))));
Eoh=fD*((((r1^3*tM)/2)-(((dm*r1)/2)*(tM^2-tm^2))+(((dm^2)/6)*(tM^3-tm^3))-((r1^3)/(6*dM))-((dM*r1*tm^2)/2)+((dM^2*tm^3)/6))+((Q^2)/2*Ed)-(Q*ED/Ed)+((fD/(18*Ed))*((tM^3-tm^3)*(dM^3-dm^3)))+(Q*r1/Ed)-(r1*ED/Ed));
TC= f1+c1*Q+h*Eoh+s*EB;
app.QOptimalEditField.Value = Q
app.TotalCostEditField.Value = TC
Running this gives the error: [image 3]
I suspect the problem is with my integration process. Have I missed something or is there a better way to do this?
Thank you in advance
Regards,
Kevin Renard
I solved the problem. The last error message states that the input value must be a double scalar, and the result of the integration is a symbolic expression rather than a double scalar, so I revised the integration code to:
syms x;
f=(x-r1)*fD;
EB= double(int(f,r1,dM*tM));
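Since the integrand is a first-degree polynomial in x, the integral can also be worked out by hand, which avoids the symbolic step altogether. A sketch using the names from the code above (r1, fD, dM, tM), with fD treated as a constant:

```latex
E_B = \int_{r_1}^{d_M t_M} (x - r_1)\, f_D \, dx
    = f_D \left[ \frac{(x - r_1)^2}{2} \right]_{r_1}^{d_M t_M}
    = \frac{f_D}{2}\,\bigl(d_M t_M - r_1\bigr)^2
```

In the app code this corresponds to EB = fD*(dM*tM - r1)^2/2, which is already a double scalar.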
I am working on a tree regression. Everything works fine with my code but I don't get the predicted values at all. Instead I get all values for my y variable (response variable). Here's the code:
Separating in train and test set for data
`sample = sample.split(Data, SplitRatio = .80)
train = subset(Data, sample == TRUE)
test = subset(Data, sample == FALSE)
varYTrain <- train[c(3)]
varYTest <- test[c(3)]
varXTrain <- train[c(5:27)]
varXTest <- test[c(5:27)]`
Model
`x <- cbind(varXTrain,varYTrain)
fit <- rpart(as.matrix(varYTrain) ~ ., data = x, method="class")
summary(fit)`
This part doesn't work: for some reason I don't get predictions based on the test data set
`predicted <- predict(fit, data=varXTest)
summary(predicted)`
I would also like in the end for the output to compare predicted values to real values in my dataset, can I do that?
Thank you very much, and don't hesitate to ask me a question if I am not clear enough. It's my first time posting.
Cheers
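One thing worth checking: predict() for an rpart fit takes the new data through the argument newdata. An argument named data is silently ignored, and the function then returns fitted values for the training rows. Below is a minimal sketch of the intended workflow. It uses the built-in iris data and a plain 80/20 split as stand-ins for Data and sample.split, so the column names are not those of the original data set.

```r
library(rpart)

# 80/20 train/test split on a built-in data set (stand-in for Data).
set.seed(1)
idx <- sample(nrow(iris), size = floor(0.8 * nrow(iris)))
train <- iris[idx, ]
test <- iris[-idx, ]

# Fit a classification tree; the response is named directly in the formula.
fit <- rpart(Species ~ ., data = train, method = "class")

# Predict on the held-out rows: the argument is newdata, not data.
predicted <- predict(fit, newdata = test, type = "class")

# Compare predicted classes with the real ones.
comparison <- table(actual = test$Species, predicted = predicted)
print(comparison)
```

The table gives the comparison of predicted and real values asked about, with correct predictions on the diagonal.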
I want to compute the following sum:
I have tried using the following code:
I = imread('C:\Users\Billal\Desktop\image.png');
[x,y,z]=size(I);
x=(1:x) ;
y=(1:y) ;
z=(1:z) ;
Fx=ones(size(x));
Fy=ones(size(y));
Fz=ones(size(z));
X=x*Fy';
Y=Fx*y';
Z=z*Fz';
f=I(X,Y,Z);
sum1 = sum(f(:));
[x1,y1,z1]=size(I);
total = sum1/(x1*y1*z1);
But the result is 0. I could not figure out where the problem is. I am following this tutorial:
https://www.mathworks.com/matlabcentral/newsreader/view_thread/126366
Please help me solve this problem.
You can do this in a single step:
result=1/prod(size(I))* sum(I(:));
In the end, the equation just adds up values of the whole image.
The question you link to needs to sum over the values of x and y. You don't; you only need to sum over the indices, so there is no need for all those Fx, Fy vectors.
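Written out, and assuming the image I has size X by Y by Z (symbols chosen here for illustration, not taken from the question), the one-liner computes the mean of all entries:

```latex
\text{result} = \frac{1}{X\,Y\,Z} \sum_{x=1}^{X} \sum_{y=1}^{Y} \sum_{z=1}^{Z} I(x, y, z)
```

Here prod(size(I)) plays the role of X Y Z, and sum(I(:)) is the triple sum.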
I am having problems doing a prediction with decision trees (CART).
I have this code:
training <- read.csv("pml-training.csv", header=TRUE)
set.seed(1972)
inTrain <- createDataPartition(y=training2$classe, p=0.6, list=FALSE)
wk_training <- training2[inTrain,]
wk_testing <- training2[-inTrain,]
The wk_training dataset has 11776 rows and wk_testing has 7846.
set.seed(1972)
model_dt <- train(wk_training$classe ~ ., data = wk_training, method="rpart")
print(model_dt, digits=3)
Run against wk_testing
predictions_dt <- predict(model_dt, newdata=wk_testing)
I expect predictions_dt to have 7846 rows, the same as wk_testing,
but predictions_dt has only 165 rows.
I don't know what I am doing wrong...
Can anybody help me?
Thanks in advance
If you have missing values, the predict function defaults to na.action = na.omit. You can test to see if this is the issue using na.action = na.fail. If this is the case, you might want to impute. See the preProcess option in train.
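A quick way to see how many rows would survive na.omit, sketched in base R on a hypothetical toy data frame (toy stands in for wk_testing):

```r
# Toy data frame (hypothetical) with missing values in two different rows.
toy <- data.frame(a = c(1, NA, 3, 4),
                  b = c("x", "y", NA, "z"))

n_total <- nrow(toy)                     # rows before dropping anything
n_complete <- sum(complete.cases(toy))   # rows with no NA in any column
kept <- na.omit(toy)                     # what na.omit would leave
na_per_column <- colSums(is.na(toy))     # where the NAs are

cat("rows:", n_total, " complete rows:", n_complete, "\n")
print(na_per_column)
```

If sum(complete.cases(wk_testing)) comes out as 165, missing values explain the short prediction vector, and the per-column counts show which variables to impute or drop.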