SciPy sparse matrix dimension issue

I was working on a simple MultiOutputRegressor model wrapping KNeighborsRegressor. My X_train, X_test, y_train and y_test are all SciPy sparse matrices. I have 1185 features and 46 targets to predict.
from sklearn.multioutput import MultiOutputRegressor
from sklearn.neighbors import KNeighborsRegressor
kreg = MultiOutputRegressor(KNeighborsRegressor())
# fit model
kreg.fit(X_train, y_train)
>>> MultiOutputRegressor(estimator=KNeighborsRegressor())
kreg.predict(X_test)
After calling kreg.predict(X_test) I got an error; the last part of the traceback says:
~/opt/anaconda3/envs/data3/lib/python3.8/site-packages/scipy/sparse/_index.py in __getitem__(self, key)
62 return self._get_arrayXint(row, col)
63 elif isinstance(col, slice):
---> 64 raise IndexError('index results in >2 dimensions')
65 elif row.shape[1] == 1 and col.ndim == 1:
66 # special case for outer indexing
IndexError: index results in >2 dimensions
What did I do wrong?
Thanks.

It turns out the labels, i.e. y_train and y_test, should not be SciPy sparse matrices. Once I kept them as NumPy arrays, the code worked.
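A minimal sketch of that fix, assuming the targets arrive as a SciPy sparse matrix. The helper name to_dense_targets and the toy data are illustrative only and do not come from the original post. It also shows one visible difference between the two types: selecting a single target column from a sparse matrix stays two-dimensional, while the dense array gives a one-dimensional vector.

```python
import numpy as np
from scipy import sparse


def to_dense_targets(y):
    """Return y as a dense NumPy array, converting from SciPy sparse if needed."""
    if sparse.issparse(y):
        return y.toarray()
    return np.asarray(y)


# Hypothetical stand-in for y_train: 5 samples, 3 targets, stored sparse.
y_sparse = sparse.csr_matrix(np.arange(15, dtype=float).reshape(5, 3))
y_dense = to_dense_targets(y_sparse)

# Selecting one target column behaves differently for the two types.
print(y_sparse[:, 0].shape)  # (5, 1): still two-dimensional
print(y_dense[:, 0].shape)   # (5,): one-dimensional
```

With real data, the dense result would be passed to fit in place of the sparse labels, e.g. kreg.fit(X_train, to_dense_targets(y_train)); the features can stay sparse.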

Related

Why confusion matrix shows different results from random undersampling class distribution?

I have an imbalanced dataset with 17 numerical features and 3 output classes. I applied random undersampling, trained a classifier on the resampled data, and obtained the confusion matrix below.
My question: when random undersampling yields 33 samples for each class, why does the confusion matrix show more than 33?
#Raw Data Distribution
layers_counts=y.value_counts()
layers_counts
#Output
2 498
1 116
0 39
from imblearn.under_sampling import RandomUnderSampler
rus=RandomUnderSampler(sampling_strategy="not minority")
X_rus, y_rus = rus.fit_resample(Xtrain, ytrain)
y_rus.value_counts()
#Output
0 33
1 33
2 33
from sklearn.linear_model import LogisticRegression
classifier=LogisticRegression()
classifier.fit(X_rus, y_rus)
ypred=classifier.predict(Xtest)
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, ypred)
cm_df2 = pd.DataFrame(cm,
                      index=['VCS', 'VSG', 'VG'],
                      columns=['VCS', 'VSG', 'VG'])
plt.figure(figsize=(8, 6))
sns.heatmap(cm_df2, annot=True)
plt.title('Confusion Matrix')
plt.ylabel('Actual Values')
plt.xlabel('Predicted Values')
plt.show()
Since rus produced 33 samples for each class, I expected the counts in the confusion matrix to match 33, but they do not. I am confused about this point; could you help me understand?
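One likely explanation, offered as a sketch rather than a definitive diagnosis: the confusion matrix is computed from y_test and ypred, and the test set is never resampled. Each row of the matrix therefore sums to the number of true samples of that class in the test set, whatever the class counts in y_rus were. The labels and predictions below are made up for illustration, and the matrix is built with plain NumPy instead of sklearn.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical test labels: still imbalanced, untouched by undersampling.
y_test = np.repeat([0, 1, 2], [6, 30, 120])
# Hypothetical predictions from some classifier.
ypred = rng.integers(0, 3, size=y_test.size)

# Confusion matrix: rows are true classes, columns are predicted classes.
cm = np.zeros((3, 3), dtype=int)
np.add.at(cm, (y_test, ypred), 1)

# Row totals follow the test-set class counts (6, 30, 120),
# not the 33 per class of the resampled training set.
row_totals = cm.sum(axis=1)
print(row_totals)
print(np.bincount(y_test))
```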

Accessing a data matrix using indices stored at another matrix

In MATLAB, I commonly have a data matrix of size NxMxLxK which I wish to index along a specific dimension (e.g. the fourth) using an index matrix of size NxMxL with values 1..K (assume they are all in this range):
>>> size(Data)
ans =
7 22 128 40
>>> size(Ind)
ans =
7 22 128
I would like to have code without loops which achieves the following effect:
Result(i,j,k) = Data(i,j,k,Ind(i,j,k))
for all values of i,j,k in range.
You can vectorize your matrices and use sub2ind:
% create subscripts covering every combination of the first three dimensions,
% in column-major order (first dimension varying fastest) to match Ind(:):
A = repmat(1:7,1,22*128);
B = repmat(kron(1:22,ones(1,7)),1,128);
C = kron(1:128,ones(1,7*22));
Result_vec = Data(sub2ind(size(Data),A,B,C,Ind(:)'));
Result = reshape(Result_vec,7,22,128);
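For readers working in NumPy rather than MATLAB, the same gather can be sketched with np.take_along_axis. This is an analogue, not a translation of the answer above: indices are 0-based here, and the array sizes are small stand-ins chosen for the example. A plain loop is included only to check the vectorized result.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, L, K = 3, 4, 5, 6  # small stand-ins for 7, 22, 128, 40
data = rng.standard_normal((N, M, L, K))
ind = rng.integers(0, K, size=(N, M, L))  # 0-based, unlike MATLAB's 1..K

# Give ind a trailing axis so it lines up with data's 4th axis,
# gather along that axis, then drop the singleton axis again.
result = np.take_along_axis(data, ind[..., None], axis=3)[..., 0]

# Loop reference: result[i, j, k] == data[i, j, k, ind[i, j, k]]
expected = np.empty((N, M, L))
for i in range(N):
    for j in range(M):
        for k in range(L):
            expected[i, j, k] = data[i, j, k, ind[i, j, k]]

print(np.array_equal(result, expected))
```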

MatLab split character matrix

I'm quite new to programming (MATLAB) and I have a question.
I have a character matrix consisting of 500 rows and 81 columns. I would like
to transform this matrix into a vector with 500 rows, each row having 81 characters.
If I try the following:
for i = 1:length(CharMatrix)
CharVect(i) = CharMatrix(i,:)
end
it gives the error: "Subscripted assignment dimension mismatch"
What am I doing wrong?
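(Given your clarifications.) The assignment fails because CharVect(i) addresses a single element, while CharMatrix(i,:) is a 1-by-81 row. This might be the solution for you:
res = zeros(size(CharMatrix,1),1);
for i = 1:size(CharMatrix,1)
    res(i) = str2num(CharMatrix(i,:));
end
There is no need to create CharVect explicitly.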
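The same idea, sketched in NumPy for comparison: each row of a character matrix is joined into one string and converted to a number. The tiny 3-by-4 matrix is made up for illustration, and the sketch assumes every row holds the text of a single number, which is what the str2num answer implies.

```python
import numpy as np

# Hypothetical character matrix: 3 rows of 4 single characters each.
char_matrix = np.array([list("12.5"), list(" 3.0"), list("-7  ")])

# Join each row into one string, then parse it as a float.
res = np.array([float("".join(row)) for row in char_matrix])

print(char_matrix.shape)  # (3, 4)
print(res.shape)          # (3,)
```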

How can I get a SVM that has been trained on a bigger matrix to classify a different size matrix

I am training a one vs all svm classifier. I used a 200 by 459 matrix to train the classifier using VLFeat svm classifier. (http://www.vlfeat.org/matlab/vl_svmtrain.html)
[W B] = vl_svmtrain(train_image_feats', tmp', .00001);
where train_image_feats' is a 200 by 459 matrix and tmp' is the label vector, which is 1 by 459.
The above command trains the svm with no problem, but then to classify the scores obtained on the test matrix I get an error. The test matrix is obviously not of the same size as that of the training matrix.
scores(i, :) = W'*test_image_feats' + B;
where test_image_feats' is a 200 by 90 matrix and scores is a 9 by 459 matrix: 9 because there are 9 categories (labels) to classify, and 459 is the number of training images.
The above command gives the error:
Subscripted assignment dimension mismatch.
Error in svm_classify (line 56)
scores(i, :) = W'*test_image_feats' + B;
Edit: Full code added..
categories = unique(train_labels);
num_categories = length(categories);
scores = zeros([num_categories size(train_labels, 1)]); %train_labels is 459 by 1 size
for i=1:num_categories %there are 9 categories
tmp = strcmp(train_labels, categories{i});
tmp = tmp - (1-tmp);
[W B] = vl_svmtrain(train_image_feats', tmp', .00001);
scores(i, :) = W'*test_image_feats' + B;
end
predicted_categories = cell(size(train_labels));
parfor i=1:size(test_image_feats,1)
image_scores = scores(:, i);
label_index = find(image_scores==max(image_scores));
predicted_categories{i}=categories(label_index);
end
Conceptually you are training a model with 459 training samples to predict the scores of 90 test samples.
scores = zeros([num_categories size(train_labels, 1)]);
isn't right, as it gives scores the size of the training set. In fact you don't have to care about the size of the training set at all: you could train the model with 20 or 20000 images and the prediction step shouldn't be any different.
scores has to be defined with the test set in mind:
scores = zeros([num_categories size(test_labels, 1)]);
When you used 459 for both, it only worked because size(test_labels, 1) was equal to size(train_labels, 1).
The problem is not with the right-hand side of the assignment but with scores(i,:): the right-hand side is a 1-by-90 row vector (W' is 1-by-200 and test_image_feats' is 200-by-90), while scores(i,:) is a row of 459 elements, so it simply won't fit.
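The shape bookkeeping can be sketched in NumPy. This is only an illustration of the sizes involved: the weights W and bias B are random placeholders standing in for one trained one-vs-all model (they are not produced by vl_svmtrain), and the feature matrix is random as well.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_test, num_categories = 200, 90, 9

# Hypothetical test features, one row per test image.
test_image_feats = rng.standard_normal((n_test, n_features))

# Size scores by the number of TEST images, not training images.
scores = np.zeros((num_categories, n_test))
for i in range(num_categories):
    # Placeholder weights and bias for category i.
    W = rng.standard_normal(n_features)
    B = rng.standard_normal()
    # (200,) @ (200, 90) gives (90,), which fits one row of scores.
    scores[i, :] = W @ test_image_feats.T + B

# Pick the best-scoring category for each test image.
predicted = scores.argmax(axis=0)
print(scores.shape, predicted.shape)
```

The number of training images never appears in the prediction step, which is the point made above.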

Find extremum of multidimensional matrix in matlab

I am trying to find the extremum of a 3-dimensional matrix along the 2nd dimension.
I started with
[~,index] = max(abs(mat),[],2), but I don't know how to proceed from here. How is the index vector to be used together with the original matrix? Or is there a completely different solution to this problem?
To illustrate the task assume the following matrix:
mat(:,:,1) =
23 8 -4
-1 -26 46
mat(:,:,2) =
5 -27 12
2 -1 18
mat(:,:,3) =
-10 49 39
-13 -46 41
mat(:,:,4) =
30 -24 18
-40 -16 -36
The expected result would then be
ext(:,:,1) =
23
-46
ext(:,:,2) =
-27
18
ext(:,:,3) =
49
-46
ext(:,:,4) =
30
-40
I don't know how to use the index vector with mat to get the desired result ext.
1) If you want the maximum along just one dimension, say the 2nd, your variable index will have dimensions (N,1,M), where N and M are the sizes of your matrix in the first and third dimensions respectively. To remove the singleton dimension, use squeeze(): index = squeeze(index). After that, size(index) gives N,M.
2) Depending on your problem, you probably need the MATLAB function ind2sub(). First, take a slice of your matrix, then find its maximum with linear indexing, and then recover the subscripts with ind2sub(). Here is an example for a 2D matrix:
M = randn(5,5);
[C,I] = max(M(:));
[index1,index2] = ind2sub(size(M),I);
The same method can be used to find the overall maximum element of a whole 3D matrix.
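As a rough NumPy counterpart of the ind2sub step, np.unravel_index turns a flat index from argmax back into per-dimension indices. The small matrix is made up for the example, and indices are 0-based, unlike MATLAB.

```python
import numpy as np

# Hypothetical 2-by-3 matrix; its largest value is 8.0 at row 1, column 2.
M = np.array([[1.0, -9.0, 2.0],
              [4.0, 3.0, 8.0]])

flat = np.argmax(M)                         # flat index of the maximum
row, col = np.unravel_index(flat, M.shape)  # back to (row, col) indices

print(row, col, M[row, col])
```

np.unravel_index accepts a shape of any length, so the same two lines apply to a 3D array.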
Use ndgrid to generate the values along dimensions 1 and 3, and then sub2ind to combine the three indices into a linear index:
[~, jj] = max(abs(mat),[],2); %// jj: returned by max
[ii, ~, kk] = ndgrid(1:size(mat,1),1,1:size(mat,3)); %// ii, kk: all combinations
result = mat(sub2ind(size(mat), ii, jj, kk));
A fancier, one-line alternative:
result = max(complex(mat),[],2);
This works because, according to the max documentation,
For complex input A, max returns the complex number with the largest complex modulus (magnitude), computed with max(abs(A)).
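A NumPy sketch of the index-based approach (the ndgrid and sub2ind version), for comparison. It uses np.argmax on the absolute values and np.take_along_axis to pull the signed entries back out of the original array. The data are slices 2 and 3 of the matrix in the question, stacked into a 2-by-3-by-2 array; axis 1 plays the role of MATLAB's 2nd dimension.

```python
import numpy as np

# Slices mat(:,:,2) and mat(:,:,3) from the question, stacked along axis 2.
s0 = np.array([[5, -27, 12],
               [2, -1, 18]])
s1 = np.array([[-10, 49, 39],
               [-13, -46, 41]])
mat = np.stack([s0, s1], axis=2)  # shape (2, 3, 2)

# Position of the largest magnitude along axis 1, shape (2, 2).
jj = np.argmax(np.abs(mat), axis=1)

# Re-insert the reduced axis so jj lines up with mat, then gather.
ext = np.take_along_axis(mat, jj[:, None, :], axis=1)  # shape (2, 1, 2)

print(ext[:, 0, 0])  # extrema for the first slice
print(ext[:, 0, 1])  # extrema for the second slice
```

The result keeps the sign of each entry, which is what distinguishes it from simply taking max(abs(mat)).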