SHAP scatter plot for LightGBM classifier showing global contributions without interactions - classification

I have a binary classification problem for which I have developed a LightGBM classifier. I would like to plot global SHAP contributions for the X most important features. In the scatter plot documentation, the code
shap.plots.scatter(shap_values[:, shap_values.abs.mean(0).argsort[-1]])
produces a scatter plot of SHAP values against the values of the most important feature. How could I get the same kind of plot for some of my features?
My attempt is as follows:
import lightgbm
import shap

# refit the tuned model (RSCV is presumably a fitted RandomizedSearchCV)
clf_final = lightgbm.LGBMClassifier(**RSCV.best_estimator_.get_params())
clf_final.fit(X_train, y_train)

# compute SHAP values
explainer_LGB = shap.TreeExplainer(clf_final)
shap_values_LGB = explainer_LGB.shap_values(X_train)

shap.plots.scatter(shap_values_LGB[0:,"pet_mean_lag_t-6"])
But it produces the error message: TypeError: list indices must be integers or slices, not tuple
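The TypeError arises because, for a binary classifier, explainer_LGB.shap_values(X_train) returns a plain Python list (one array per class), and indexing a list with [0:, "pet_mean_lag_t-6"] passes a tuple to it. Below is a minimal sketch of a likely fix, assuming a recent shap version in which calling the explainer returns an Explanation object; the class-index handling and the choice of k = 3 are illustrative assumptions, not from the original question:

# calling the explainer (instead of .shap_values) returns an Explanation
# object, which supports the [rows, "feature name"] indexing that
# shap.plots.scatter expects
explanation_LGB = explainer_LGB(X_train)

# some shap versions carry one slice per class for binary classifiers;
# if so, keep the positive class (assumption: class index 1)
if len(explanation_LGB.shape) == 3:
    explanation_LGB = explanation_LGB[:, :, 1]

shap.plots.scatter(explanation_LGB[:, "pet_mean_lag_t-6"])

# plot the k most important features (illustration: k = 3), reusing the
# ordering idiom from the documentation snippet quoted above
for i in range(1, 4):
    shap.plots.scatter(
        explanation_LGB[:, explanation_LGB.abs.mean(0).argsort[-i]])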

Related

Plot Loss Function in Weka using Neuronal Perceptron

I want to plot my training and validation loss curves to visualize the model performance. How can I plot the two curves in Weka (using Java)?
I tried using SGD:
SGD p = new SGD();
p.setDontNormalize(true);
p.setDontReplaceMissing(true);
p.setEpochs(1);
p.setLearningRate(0.001);
I tried using:
System.out.println(eval.errorRate());
But I could not find out how to get the values needed to plot the curves.

How to generate 3d scatter plot with different colours for labels in MATLAB?

I have used tsne in MATLAB for dimensionality reduction of a large dataset. I have been able to generate the 2-D t-SNE scatter plot, which shows the clusters in a different colour for each label, but I am unable to do so in 3D. Referring to https://uk.mathworks.com/help/stats/tsne.html, I used the following syntax:
Here, merged_data_all is a 21392x1974 table whose last column, named FunctionalGroup, contains the labels (similar to the Fisheriris species labels in the MathWorks tsne example). Y2 is the 3-dimensional embedding, a 21392x3 double that I have successfully generated.
figure
v = double(categorical(merged_data_all.FunctionalGroup));
c = full(sparse(1:numel(v),v,ones(size(v)),numel(v),3));
scatter3(Y2(:,1),Y2(:,2),Y2(:,3),15,c,'filled')
title('3-D Embedding')
view(-50,8)
When I use this code, I get the error "Error using sparse - Index exceeds array bounds". I even tried a modified version of the code, doing something like this:
scatter3(Y(:,1), Y(:,2),Y(:,3),merged_data_all.FunctionalGroup)
In this case, I get the error "Error using scatter3 - Input arguments must be numeric, datetime or categorical". I am quite confused as to how I can produce a 3D scatter plot with 14 different colours (one for each of the 14 label types in the FunctionalGroup column of merged_data_all). Any help in this regard would be highly appreciated. Thanks

scipy kde contour to probability

I have created a bivariate Gaussian KDE using the scipy.stats kde library:
k = kde.gaussian_kde(data.T)
However, I don't understand how the contours or z values relate to the probability of a new data point falling within those contours.
(figure: generated distribution with contours)
I would like to be able to define a contour at the equivalent of p=0.001, or, put another way, a contour enclosing 99.9% of expected observations.
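One common approach (a sketch under stated assumptions, not from the original thread): draw samples from the fitted KDE, evaluate the density at those samples, and take the 0.1th percentile of those density values as the contour level, since roughly 99.9% of draws fall in regions of higher density. Assuming data is an (N, 2) array as above:

import numpy as np
from scipy.stats import kde

k = kde.gaussian_kde(data.T)

# draw from the fitted density, then evaluate the density at each draw
samples = k.resample(100000)
density_at_samples = k(samples)

# the 0.1th percentile of these densities is the level whose contour
# encloses roughly 99.9% of the probability mass
level = np.percentile(density_at_samples, 0.1)

# pass levels=[level] to matplotlib's plt.contour to draw that contour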

How to propagate error when using scipy quad on a spline of data with measurement error?

I have a data set with N points which I fit a spline to and integrate using scipy.integrate.quad. I would like to use the N associated measurement errors to put an error estimate on the final integral value.
I originally tried to use the uncertainties package but the x+/-stddev objects did not work with scipy.
def integrand(w_point, x, y):
    # call spline function to get data at arbitrary points
    f_i = spline_flux_full(x, y, w_point)
    # use spline for normalizing data at arbitrary points
    f_i_continuum = coef_continuum(w_point)
    # this is the integrand evaluated at w_point
    W_i = 1. - (f_i / f_i_continuum)
    return W_i
Have any ideas?
Synthetic datasets. You have your data points with errors. Generate 1000 synthetic datasets, with each point drawn from a normal distribution centered on the measured value and with standard deviation given by the error at that point. Fit a spline to each dataset, integrate, and repeat. Now you have 1000 values of the integral; compute the mean and standard deviation of these values.
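A minimal sketch of that recipe, assuming x, y, and y_err are 1-D NumPy arrays holding the N measurements and their errors, and using scipy's UnivariateSpline as a stand-in for the original spline (the names and the spline choice are illustrative assumptions, not from the thread):

import numpy as np
from scipy.integrate import quad
from scipy.interpolate import UnivariateSpline

n_trials = 1000
rng = np.random.default_rng()
integrals = np.empty(n_trials)

for i in range(n_trials):
    # perturb each measured point by its own error
    y_synthetic = rng.normal(y, y_err)
    # refit the spline to the perturbed data (s=0: interpolating spline)
    spline = UnivariateSpline(x, y_synthetic, s=0)
    # integrate the refitted spline over the measured range
    integrals[i], _ = quad(spline, x[0], x[-1])

integral_mean = integrals.mean()
integral_err = integrals.std()  # spread of the trials = error estimate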

How to use matlab contourf to draw two-dimensional decision boundary

I finished an SVM training and got data like X, Y. X is the feature matrix with only 2 dimensions, and Y holds the classification labels. Because the data is only two-dimensional, I would like to draw the decision boundary to show the surface of the support vectors.
I use contourf in MATLAB to do the trick, but I find it really hard to understand how to use the function.
I wrote like:
#1 try:
contourf(X);
#2 try:
contourf([X(:,1) X(:,2) Y]);
#3 try:
Z(:,:,1)=X(Y==1,:);
Z(:,:,2)=X(Y==2,:);
contourf(Z);
None of these work correctly. I checked the MATLAB help files; most examples construct Z from a function, so I really do not know how to form the correct Z matrix.
If you're using the svmtrain and svmclassify commands from Bioinformatics Toolbox, you can just use the additional input argument (...'showplot', true), and it will display a scatter plot with a decision boundary and the support vectors highlighted.
If you're using your own SVM, or a third-party tool such as libSVM, what you probably need to do is to:
Create a grid of points in your 2D input feature space using the meshgrid command
Classify those points using your trained SVM
Plot the grid of points and the classifications using contourf.
For example, in kind-of-MATLAB-but-pseudocode, assuming your input features are called X1 and X2:
numPtsInGrid = 100;
x1Range = linspace(x1lower, x1upper, numPtsInGrid);
x2Range = linspace(x2lower, x2upper, numPtsInGrid);
[X1, X2] = meshgrid(x1Range, x2Range);
% classify every grid point with your trained SVM (placeholder function)
Z = classifyWithMySVMSomehow([X1(:), X2(:)]);
% contourf expects matrices, so reshape the predictions back to grid shape
contourf(X1, X2, reshape(Z, size(X1)))
Hope that helps.
I know it's been a while, but I'll give it a try in case someone else comes across this issue.
Assume we have a 2D training set used to train an SVM model; in other words, the feature space is two-dimensional. We know that a kernel SVM model leads to a score (or decision) function of the form:
$f(x) = \sum_{i=1}^{N} a_i y_i k(x, x_i) + b$
where N is the number of support vectors, x_i is the i-th support vector, a_i is the estimated Lagrange multiplier, and y_i the associated class label. The values (scores) of the decision function in a way depict the distance of the observation x from the decision boundary.
Now assume that for every point (X,Y) in the 2D feature space we can find the corresponding score of the decision function. We can plot the results in 3D Euclidean space, where X corresponds to values of the first feature f1, Y to values of the second feature f2, and Z to the value returned by the decision function for every point (X,Y). The intersection of this 3D surface with the Z=0 plane gives us the decision boundary in the two-dimensional feature space. In other words, the decision boundary is formed by the (X,Y) points whose scores equal 0. Seems logical, right?
Now in MATLAB you can easily do that, by first creating a grid in X,Y space:
d = 0.02;
[x1Grid,x2Grid] = meshgrid(minimum_X:d:maximum_X,minimum_Y:d:maximum_Y);
d is selected according to the desired resolution of the grid.
Then for a trained model SVMModel find the scores of every grid's point:
xGrid = [x1Grid(:),x2Grid(:)];
[~,scores] = predict(SVMModel,xGrid);
Finally plot the decision boundary
figure;
contour(x1Grid,x2Grid,reshape(scores(:,2),size(x1Grid)),[0 0],'k');
Contour gives us a 2D graph where information about the 3rd dimension is depicted as solid lines in the 2D plane. These lines imply iso-response values, in other words (X,Y) points with the same Z value. In our case, the contour gives us the decision boundary.
I hope this helps make it all clearer. You can find very useful information and examples at the following links:
MATLAB's example
Representation of decision function in 3D space