SHAP explanation for the inputs with different types in CNN models - shap

I have a question about using SHAP to explain the predictions of my CNN model. The model takes two inputs of different types: an image and a feature vector. I trained and tested the model with both inputs, and building the model works without problems.
When I try to use SHAP to explain the result for both inputs simultaneously, it fails. I have tried both DeepExplainer and GradientExplainer. The error I get is below:
File "", line 1, in
shap_values = explainer.shap_values([x_test[:3], feature_test[:3]])
File "C:\Users\kaz10003\AppData\Local\Continuum\anaconda3\lib\site-packages\shap\explainers\deep\__init__.py", line 119, in shap_values
return self.explainer.shap_values(X, ranked_outputs, output_rank_order)
File "C:\Users\kaz10003\AppData\Local\Continuum\anaconda3\lib\site-packages\shap\explainers\deep\deep_tf.py", line 284, in shap_values
diffs = model_output[:, l] - self.expected_value[l] - output_phis[l].sum(axis=tuple(range(1, output_phis[l].ndim)))
AttributeError: 'list' object has no attribute 'sum'
Does anybody know whether SHAP supports this kind of multi-input model? Here is my code:
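# Imports were not shown in the original post; these are assumed
import shap
from keras.layers import Input, Conv2D, MaxPooling2D, RepeatVector, Reshape, concatenate, Flatten, Dense
from keras.models import Model
n_features = 10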
input_feat = Input((n_features,))
input_tensor = Input(shape=(50,60, 1))
c3 = Conv2D(32, (3, 3), activation='relu', padding='same') (input_tensor)
c3 = Conv2D(32, (3, 3), activation='relu', padding='same') (c3)
c3 = Conv2D(32, (3, 3), activation='relu', padding='same') (c3)
p3 = MaxPooling2D((2, 2)) (c3)
f_repeat = RepeatVector(6*7)(input_feat)
f_conv = Reshape((6, 7, n_features))(f_repeat)
p3_feat = concatenate([p3, f_conv], -1)
c3 = Flatten()(p3_feat)
c3 = Dense(512)(c3)
outputs = Dense(2, activation='softmax')(c3)
model = Model(inputs=[input_tensor, input_feat], outputs=[outputs])
model.summary()
explainer = shap.GradientExplainer(model, [x_train, feature_train])
shap_values = explainer.shap_values([x_test[:3], feature_test[:3]])
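The traceback suggests that, for a two-input model, `output_phis[l]` is a plain Python list holding one attribution array per input, so calling `.sum` on it fails. The following is a minimal plain-Python sketch of the same additivity check, written so that it accepts one attribution block per input. The helper names `total` and `additivity_gap` and all the numbers are hypothetical illustrations, not SHAP's own code.

```python
def total(values):
    """Sum every number in an arbitrarily nested list."""
    if isinstance(values, (int, float)):
        return values
    return sum(total(v) for v in values)


def additivity_gap(model_output, expected_value, phis):
    """Return model_output - expected_value - (sum of all attributions).

    `phis` may be one nested list (single-input model) or a list with one
    nested list per model input (multi-input model). Both cases work
    because `total` recurses through every nesting level.
    """
    return model_output - expected_value - total(phis)


# Hypothetical attributions for one sample and one output class
image_phis = [[0.10, -0.05], [0.02, 0.03]]  # a 2x2 "image" input
feature_phis = [0.20, -0.10]                # a 2-element feature vector

# A list of per-input blocks has no .sum attribute, which matches the error
print(hasattr([image_phis, feature_phis], "sum"))

gap = additivity_gap(model_output=0.70, expected_value=0.50,
                     phis=[image_phis, feature_phis])
print(abs(gap) < 1e-9)
```

If the installed SHAP version does not handle the multi-input case in this check, summing the attributions per input as above is the kind of change that would be needed. Whether a newer release already does so is worth checking before patching anything.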

Machine Translation FFN : Dimension problem due to window size

This is my first time creating a feed-forward network (FFN). I am training it to translate French to English by word prediction.
The inputs are two arrays: a source-language window of size 2 x window_size + 1 and a target-language window of size window_size. The label has size 1.
For example, for window_size = 2:
["je","mange", "la", "pomme","avec"]
and
["I", "eat"]
So the inputs have sizes [5] and [2], which give 7 after concatenation.
Label: "the" (referring to "la" in French)
The label is converted to a one-hot encoding before being compared with yHat.
I use a unique index for each word (1 to len(vocab)) and train on the indices, not the words.
The output of the FFN is a probability distribution over the target-language vocabulary.
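To make the windowing concrete, here is a small sketch of how one training example could be built from the sentences above. It assumes a one-to-one word alignment between source and target positions, which is a simplification for illustration. The function name `make_example` and the `center` parameter are hypothetical.

```python
def make_example(source, target, center, window_size):
    """Build (source_window, target_window, label) around position `center`.

    source_window: 2 * window_size + 1 source words centred on `center`
    target_window: the window_size target words before `center`
    label:         the target word at `center`
    Assumes source and target are aligned word by word (hypothetical).
    """
    lo = center - window_size
    hi = center + window_size + 1
    if lo < 0 or hi > len(source) or center >= len(target):
        raise ValueError("window does not fit inside the sentences")
    source_window = source[lo:hi]
    target_window = target[lo:center]
    label = target[center]
    return source_window, target_window, label


source = ["je", "mange", "la", "pomme", "avec"]
target = ["I", "eat", "the"]

src_win, tgt_win, label = make_example(source, target, center=2, window_size=2)
print(src_win, tgt_win, label)
```

The two windows have lengths 5 and 2, so concatenating them gives the 7 positions mentioned above.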
The problem is that the FFN doesn't learn and the accuracy stays at 0.
When I print the sizes of y_final (the one-hot target) and yHat (the model's prediction), they have different dimensions:
yHat.size()=[512, 7, 10212]
where 512 is the batch size, 7 is the concatenated input length and 10212 is the size of the target vocabulary, while
y_final.size()= [512, 10212]
Throughout the forward method I get these sizes:
torch.Size([512, 5, 32])
torch.Size([512, 5, 64])
torch.Size([512, 5, 64])
torch.Size([512, 2, 256])
torch.Size([512, 2, 32])
torch.Size([512, 2, 64])
torch.Size([512, 2, 64])
torch.Size([512, 7, 64])
torch.Size([512, 7, 128])
torch.Size([512, 7, 10212])
Since the accuracy only increases when yHat equals y_final, I suspect that this never happens because they don't even have the same shape (3D vs 2D). Is this the problem?
Please refer to the code below, and tell me if you need any other information.
The code runs without errors.
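The 3D output follows from the fact that a linear layer only transforms the last dimension of its input. The shape-only sketch below (no PyTorch needed) traces the sizes listed above and shows one possible way to obtain a 2D output: flattening the 7 positions before the dense layers. The helpers `linear` and `flatten_from` are hypothetical stand-ins that track shapes only, and the flattened layer width of 7 * 64 = 448 is an assumption about how the network could be changed, not the only option.

```python
def linear(shape, in_features, out_features):
    """Shape after a linear layer: only the last dimension changes."""
    assert shape[-1] == in_features, "last dimension must equal in_features"
    return shape[:-1] + (out_features,)


def flatten_from(shape, start_dim=1):
    """Shape after merging all dimensions from start_dim onwards."""
    merged = 1
    for size in shape[start_dim:]:
        merged *= size
    return shape[:start_dim] + (merged,)


batch, positions, hidden, vocab = 512, 7, 64, 10212
x_cat = (batch, positions, hidden)  # shape after torch.cat(..., dim=1)

# As posted: fc2 and output are applied to each of the 7 positions separately
as_posted = linear(linear(x_cat, 64, 128), 128, vocab)

# Alternative: flatten the positions first, so there is one prediction per sample
flat = flatten_from(x_cat)
flattened = linear(linear(flat, positions * hidden, 128), 128, vocab)

print(as_posted)
print(flattened)
```

With the flattened variant the prediction has one row per sample, which lines up with a target of shape [512, 10212] or with a vector of 512 class indices.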
trainingData = TensorDataset(encoded_source_windows, encoded_target_windows, encoded_labels)
# print(trainingData)
batchsize = 512
trainingLoader = DataLoader(trainingData, batch_size=batchsize, drop_last=True)
def ffnModel(vocabSize1,vocabSize2, learningRate=0.01):
    class ffNetwork(nn.Module):
        def __init__(self):
            super().__init__()
            self.embeds_src = nn.Embedding(vocabSize1, 256)
            self.embeds_target = nn.Embedding(vocabSize2, 256)
            # input layer
            self.inputSource = nn.Linear(256, 32)
            self.inputTarget = nn.Linear(256, 32)
            # hidden layer 1
            self.fc1 = nn.Linear(32, 64)
            self.bnormS = nn.BatchNorm1d(5)
            self.bnormT = nn.BatchNorm1d(2)
            # Layer(s) after Concatenation:
            self.fc2 = nn.Linear(64,128)
            self.output = nn.Linear(128, vocabSize2)
            self.softmaaax = nn.Softmax(dim=0)
        # forward pass
        def forward(self, xSource, xTarget):
            xSource = self.embeds_src(xSource)
            xSource = F.relu(self.inputSource(xSource))
            xSource = F.relu(self.fc1(xSource))
            xSource = self.bnormS(xSource)
            xTarget = self.embeds_target(xTarget)
            xTarget = F.relu(self.inputTarget(xTarget))
            xTarget = F.relu(self.fc1(xTarget))
            xTarget = self.bnormT(xTarget)
            xCat = torch.cat((xSource, xTarget), dim=1)#dim=128 or 1 ?
            xCat = F.relu(self.fc2(xCat))
            print(xCat.size())
            xCat = self.softmaaax(self.output(xCat))
            return xCat
    # creating instance of the class
    net = ffNetwork()
    # loss function
    lossfun = nn.CrossEntropyLoss()
    # lossfun = nn.NLLLoss()
    optimizer = torch.optim.Adam(net.parameters(), lr=learningRate)
    return net, lossfun, optimizer
def trainModel(vocabSize1,vocabSize2, learningRate):
    # number of epochs
    numepochs = 64
    # create a new Model instance
    net, lossfun, optimizer = ffnModel(vocabSize1,vocabSize2, learningRate)
    # initialize losses
    losses = torch.zeros(numepochs)
    trainAcc = []
    # loop over training data batches
    batchAcc = []
    batchLoss = []
    for epochi in range(numepochs):
        #Switching on training mode
        net.train()
        # loop over training data batches
        batchAcc = []
        batchLoss = []
        for A, B, y in tqdm(trainingLoader):
            # forward pass and loss
            final_y = []
            for i in range(y.size(dim=0)):
                yy = [0] * target_vocab_length
                yy[y[i]] = 1
                final_y.append(yy)
            final_y = torch.tensor(final_y)
            yHat = net(A, B)
            loss = lossfun(yHat, final_y)
            ################
            print("\n yHat.size()")
            print(yHat.size())
            print("final_y.size()")
            print(final_y.size())
            # backprop
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            # loss from this batch
            batchLoss.append(loss.item())
            print(f'batchLoss: {loss.item()}')
            #Accuracy calculator:
            matches = torch.argmax(yHat) == final_y # booleans (false/true)
            matchesNumeric = matches.float() # convert to numbers (0/1)
            accuracyPct = 100 * torch.mean(matchesNumeric) # average and x100
            batchAcc.append(accuracyPct) # add to list of accuracies
            print(f'accuracyPct: {accuracyPct}')
        trainAcc.append(np.mean(batchAcc))
        losses[epochi] = np.mean(batchLoss)
    return trainAcc,losses,net
trainAcc,losses,net = trainModel(len(source_vocab),len(target_vocab), 0.01)
print(trainAcc)
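One detail in the accuracy calculation is worth checking: `torch.argmax(yHat)` without a `dim` argument returns a single index into the flattened tensor, and comparing that with a one-hot matrix does not count correct predictions. A plain-Python sketch of the intended computation, taking the argmax per row and comparing it with the class index of each label, could look like this. The names `argmax` and `accuracy_pct` and the toy scores are hypothetical.

```python
def argmax(row):
    """Index of the largest value in a list (first one on ties)."""
    return max(range(len(row)), key=row.__getitem__)


def accuracy_pct(scores, labels):
    """Percentage of rows whose highest-scoring class equals the label index."""
    predictions = [argmax(row) for row in scores]
    correct = sum(int(p == t) for p, t in zip(predictions, labels))
    return 100.0 * correct / len(labels)


# Toy batch: 3 samples, 3 classes; labels are class indices, not one-hot rows
scores = [
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.3, 0.3, 0.4],
]
labels = [1, 0, 1]

print(accuracy_pct(scores, labels))
```

In PyTorch terms this would correspond to taking the argmax along the class dimension of a 2D prediction and comparing it with the integer labels `y` directly, which also removes the need to build `final_y` for the accuracy step.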

One-hot vector prediction always returns the same value

My deep neural network returns the same output for every input. I tried (with no luck) different variations of:
loss
optimizer
network topology / layers types
number of epochs (1-100)
I have 3 outputs (one-hot), and for every input the outputs look like this (the values change after every training run):
4.701869785785675049e-01 4.793547391891479492e-01 2.381391078233718872e-01
This probably happens because of the highly random nature of my training data (stock prediction).
The data set is also heavily skewed towards one of the classes (that's why I used sample_weight, calculated proportionally).
I think I can rule out overfitting (it happens even with 1 epoch, and I have dropout layers).
One example of my network:
xs_conv = xs.reshape(xs.shape[0], xs.shape[1], 1)
model_conv = Sequential()
model_conv.add(Conv1D(128, 15, input_shape=(input_columns,1), activation='relu'))
model_conv.add(MaxPooling1D(pool_size=3))
model_conv.add(Dropout(0.4))
model_conv.add(Conv1D(64, 15, input_shape=(input_columns,1), activation='relu'))
model_conv.add(MaxPooling1D(pool_size=3))
model_conv.add(Dropout(0.4))
model_conv.add(Flatten())
model_conv.add(Dense(128, activation='relu'))
model_conv.add(Dropout(0.4))
model_conv.add(Dense(3, activation='sigmoid'))
model_conv.compile(loss='mean_squared_error', optimizer='nadam', metrics=['accuracy'])
model_conv.fit(xs_conv, ys, epochs=10, batch_size=16, sample_weight=sample_weight, validation_split=0.3, shuffle=True)
I would understand if the outputs were random, but what happens seems very peculiar. Any ideas?
Data: computed.csv
Whole code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from keras.layers import Input, Dense, Conv1D, Dropout, MaxPooling1D, Flatten
from keras.models import Model, Sequential
from keras import backend as K
import random
DATA_DIR = '../../Data/'
INPUT_DATA_FILE = DATA_DIR + 'computed.csv'
def get_y(row):
    profit = 0.010
    hot_one = [0,0,0]
    hot_one[0] = int(row.close_future_5 >= profit)
    hot_one[1] = int(row.close_future_5 <= -profit)
    hot_one[2] = int(row.close_future_5 < profit and row.close_future_10 > -profit)
    return hot_one
def rolling_window(window, arr):
    return [np.array(arr[i:i+window]).transpose().flatten().tolist() for i in range(0, len(arr))][0:-window+1]
def prepare_data(data, window, test_split):
    # .values replaces the deprecated DataFrame.as_matrix()
    xs1 = data.iloc[:,1:26].values
    ys1 = [get_y(row) for row in data.to_records()]
    xs = np.array(rolling_window(window, xs1)).tolist()
    ys = ys1[0:-window+1]
    zipped = list(zip(xs, ys))
    random.shuffle(zipped)
    train_size = int((1.0 - test_split) * len(data))
    xs, ys = zip(*zipped[0:train_size])
    xs_test, ys_test = zip(*zipped[train_size:])
    return np.array(xs), np.array(ys), np.array(xs_test), np.array(ys_test)
def get_sample_weight(y):
    if(y[0]): return ups_w
    elif(y[1]): return downs_w
    else: return flats_w
data = pd.read_csv(INPUT_DATA_FILE)
window = 30
test_split = .9
xs, ys, xs_test, ys_test = prepare_data(data, window, test_split)
ups_cnt = sum(y[0] for y in ys)
downs_cnt = sum(y[1] for y in ys)
flats_cnt = sum(y[0] == False and y[1] == False for y in ys)
total_cnt = ups_cnt + downs_cnt + flats_cnt
ups_w = total_cnt/ups_cnt
downs_w = total_cnt/downs_cnt
flats_w = total_cnt/flats_cnt
sample_weight = np.array([get_sample_weight(y) for y in ys])
_, input_columns = xs.shape
xs_conv = xs.reshape(xs.shape[0], xs.shape[1], 1)
model_conv = Sequential()
model_conv.add(Conv1D(128, 15, input_shape=(input_columns,1), activation='relu'))
model_conv.add(MaxPooling1D(pool_size=3))
model_conv.add(Dropout(0.4))
model_conv.add(Conv1D(64, 15, input_shape=(input_columns,1), activation='relu'))
model_conv.add(MaxPooling1D(pool_size=3))
model_conv.add(Dropout(0.4))
model_conv.add(Flatten())
model_conv.add(Dense(128, activation='relu'))
model_conv.add(Dropout(0.4))
model_conv.add(Dense(3, activation='sigmoid'))
model_conv.compile(loss='mean_squared_error', optimizer='nadam', metrics=['accuracy'])
model_conv.fit(xs_conv, ys, epochs=1, batch_size=16, sample_weight=sample_weight, validation_split=0.3, shuffle=True)
xs_test_conv = xs_test.reshape(xs_test.shape[0], xs_test.shape[1], 1)
res = model_conv.predict(xs_test_conv)
plotdata = pd.concat([pd.DataFrame(res, columns=['res_up','res_down','res_flat']), pd.DataFrame(ys_test, columns=['ys_up','ys_down','y_flat'])], axis = 1)
plotdata[['res_up', 'ys_up']][3000:3500].plot(figsize=(20,4))
plotdata[['res_down', 'ys_down']][3000:3500].plot(figsize=(20,4))
I have run your model with the attached data, and so far the biggest problem I can see is the lack of data cleaning.
For instance, there's an inf value in the .csv at line 623. After filtering all such rows out with
xs1 = xs1[np.isfinite(xs1).all(axis=1)]
... I collected some statistics over xs, namely min, max and mean. They turned out to be pretty remarkable:
-43.0049723138
32832.3333333 # !!!
0.213126234391
On average, the values are close to 0, but some are about five orders of magnitude larger. These particular rows definitely hurt the neural network, so you should either filter them out as well or come up with a clever way to normalize the features.
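As a rough illustration of both steps, the sketch below drops rows containing non-finite values and then standardizes each column to zero mean and unit variance. It uses only the standard library and made-up numbers. The function names `drop_non_finite` and `standardize_columns` are hypothetical, and z-scoring is just one of several possible normalizations.

```python
import math
import statistics


def drop_non_finite(rows):
    """Keep only rows in which every value is finite (no inf, no NaN)."""
    return [row for row in rows if all(math.isfinite(v) for v in row)]


def standardize_columns(rows):
    """Rescale each column to zero mean and unit (population) variance."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    # Fall back to 1.0 for constant columns to avoid dividing by zero
    stdevs = [statistics.pstdev(col) or 1.0 for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stdevs)]
            for row in rows]


# Made-up data: one row contains inf, one contains an extreme value
raw = [
    [0.1, 10.0],
    [0.3, 32832.0],
    [float("inf"), 5.0],
    [-0.2, 20.0],
]

clean = drop_non_finite(raw)
scaled = standardize_columns(clean)
print(len(raw), len(clean))
print(scaled)
```

Note that a single extreme value still dominates its column after z-scoring, so clipping or a log transform before scaling may be worth trying. Which one works best depends on the features.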
But even with them, the model ended up with 71-79% validation accuracy. The prediction distribution is somewhat skewed towards the 3rd class, but in general it is too diverse to call peculiar: 19% for class 1, 7% for class 2, 73% for class 3. Example test output:
[[ 1.93120316e-02 4.47684433e-04 9.97518778e-01]
[ 1.40607255e-02 2.45630667e-02 9.74113524e-01]
[ 3.07740629e-01 4.80920941e-01 2.28664145e-01]
...,
[ 5.72797097e-02 9.45571139e-02 8.07634115e-01]
[ 1.05512664e-01 8.99530351e-02 6.70437515e-01]
[ 5.24505274e-03 1.46622911e-01 9.42657173e-01]]

different clusters with same method

I am stuck on a problem with hierarchical clustering. I want to make a dendrogram and a heatmap using a correlation-based distance (d_mydata=dist(1-cor(t(mydata)))) and ward.D2 as the clustering method.
A handy feature of the pheatmap package is that it can plot the dendrogram on the left side to visualize the clusters.
The pipeline of my analysis would be this:
create the dendrogram
test how many clusters would be optimal (k)
extract the subjects in each cluster
create a heatmap
To my surprise, the dendrogram plotted in the heatmap is not the same as the one plotted before, even though the methods are the same.
So I decided to create a pheatmap coloured by the clusters previously assigned by cutree and check whether the colours correspond to the clusters in the dendrogram.
This is my code:
# Create test matrix
test = matrix(rnorm(200), 20, 10)
test[1:10, seq(1, 10, 2)] = test[1:10, seq(1, 10, 2)] + 3
test[11:20, seq(2, 10, 2)] = test[11:20, seq(2, 10, 2)] + 2
test[15:20, seq(2, 10, 2)] = test[15:20, seq(2, 10, 2)] + 4
colnames(test) = paste("Test", 1:10, sep = "")
rownames(test) = paste("Gene", 1:20, sep = "")
test<-as.data.frame(test)
# Create a dendrogram with this test matrix
dist_test<-dist(test)
hc=hclust(dist_test, method="ward.D2")
plot(hc)
dend<-as.dendrogram(hc, check=F, nodePar=list(cex = .000007),leaflab="none", cex.main=3, axes=F, adjust=F)
clus2 <- as.factor(cutree(hc, k=2)) # cut tree into 2 clusters
groups<-data.frame(clus2)
groups$id<-rownames(groups)
#-----------DATAFRAME WITH mydata AND THE CLASSIFICATION OF CLUSTERS AS FACTORS---------------------
test$id<-rownames(test)
clusters<-merge(groups, test, by.x="id")
rownames(clusters)<-clusters$id
clusters$clus2<-as.character(clusters$clus2)
clusters$clus2[clusters$clus2== "1"]= "cluster1"
clusters$clus2[clusters$clus2=="2"]<-"cluster2"
plot(dend,
main = "test",
horiz = TRUE, leaflab = "none")
d_clusters<-dist(1-cor(t(clusters[,7:10])))
hc_cl=hclust(d_clusters, method="ward.D2")
annotation_col = data.frame(
Path = factor(colnames(clusters[3:12]))
)
rownames(annotation_col) = colnames(clusters[3:12])
annotation_row = data.frame(
Group = factor(clusters$clus2)
)
rownames(annotation_row) = rownames(clusters)
# Specify colors
ann_colors = list(
Path= c(Test1="darkseagreen", Test2="lavenderblush2", Test3="lightcyan3", Test4="mediumpurple", Test5="red", Test6="blue", Test7="brown", Test8="pink", Test9="black", Test10="grey"),
Group = c(cluster1="yellow", cluster2="blue")
)
library(RColorBrewer)
cols <- colorRampPalette(brewer.pal(10, "RdYlBu"))(20)
library(pheatmap)
pheatmap(clusters[ ,3:12], color = rev(cols),
scale = "column",
kmeans_k = NA,
show_rownames = F, show_colnames = T,
main = "Heatmap CK14, CK5/6, GATA3 and FOXA1 n=492 SCALE",
clustering_method = "ward.D2",
cluster_rows = TRUE, cluster_cols = TRUE,
clustering_distance_rows = "correlation",
clustering_distance_cols = "correlation",
annotation_row = annotation_row,
annotation_col = annotation_col,
annotation_colors=ann_colors
)
Has anyone had the same issue? Am I making a stupid mistake?
Thank you in advance
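One thing that may explain the difference: in the posted code the first dendrogram is built from dist(test), a Euclidean distance on unscaled data, while the heatmap call uses scale = "column" together with the "correlation" distance. In addition, as far as I know pheatmap computes its "correlation" option as as.dist(1 - cor(t(mat))), which is not the same as dist(1 - cor(t(mat))). The sketch below, written in plain Python rather than R and with made-up data, shows that using 1 - correlation directly as the distance and taking Euclidean distances between the rows of that matrix give different numbers. The function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def one_minus_cor(rows):
    """Matrix of 1 - correlation between rows, used directly as distances."""
    return [[1.0 - pearson(r, s) for s in rows] for r in rows]


def euclidean_between_rows(matrix):
    """Euclidean distance between every pair of rows of a matrix."""
    return [[math.sqrt(sum((a - b) ** 2 for a, b in zip(r, s)))
             for s in matrix] for r in matrix]


rows = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.5],
    [4.0, 3.0, 2.0, 1.0],
]

direct = one_minus_cor(rows)              # analogous to as.dist(1 - cor(t(m)))
nested = euclidean_between_rows(direct)   # analogous to dist(1 - cor(t(m)))

print(direct[0][2], nested[0][2])
```

Since the two distance matrices differ, the resulting trees can differ too. To compare like with like, the same scaling and the same distance object would have to be used for both the standalone dendrogram and the heatmap.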

Torch: back-propagation from loss computed over a subset of the output

I have a simple convolutional neural network whose output is a single-channel 4x4 feature map. During training, the (regression) loss needs to be computed on only one of the 16 output values. The location of this value is decided after the forward pass. How do I compute the loss from just this one output while making sure all irrelevant gradients are zeroed out during back-prop?
Let's say I have the following simple model in torch:
require 'nn'
-- the input
local batch_sz = 2
local x = torch.Tensor(batch_sz, 3, 100, 100):uniform(-1,1)
-- the model
local net = nn.Sequential()
net:add(nn.SpatialConvolution(3, 128, 9, 9, 9, 9, 1, 1))
net:add(nn.SpatialConvolution(128, 1, 3, 3, 3, 3, 1, 1))
net:add(nn.Squeeze(1, 3))
print(net)
-- the loss (don't know how to employ it yet)
local loss = nn.SmoothL1Criterion()
-- forward'ing x through the network would result in a 2x4x4 output
y = net:forward(x)
print(y)
I have looked at nn.SelectTable and it seems like if I convert the output into tabular form I would be able to implement what I want?
This is my current solution. It works by splitting the output into a table, and then using nn.SelectTable():backward() to get the full gradient:
require 'nn'
-- the input
local batch_sz = 2
local x = torch.Tensor(batch_sz, 3, 100, 100):uniform(-1,1)
-- the model
local net = nn.Sequential()
net:add(nn.SpatialConvolution(3, 128, 9, 9, 9, 9, 1, 1))
net:add(nn.SpatialConvolution(128, 1, 3, 3, 3, 3, 1, 1))
net:add(nn.Squeeze(1, 3))
-- convert output into a table format
net:add(nn.View(1, -1)) -- vectorize
net:add(nn.SplitTable(1, 1)) -- split all outputs into table elements
print(net)
-- the loss
local loss = nn.SmoothL1Criterion()
-- forward'ing x through the network would result in a (2)x4x4 output
y = net:forward(x)
print(y)
-- returns the output table's index belonging to specific location
function get_sample_idx(feat_h, feat_w, smpl_idx, feat_r, feat_c)
   local idx = (smpl_idx - 1) * feat_h * feat_w
   return idx + feat_c + ((feat_r - 1) * feat_w)
end
-- I want to back-propagate the loss of this sample at this feature location
local smpl_idx = 2
local feat_r = 3
local feat_c = 4
-- get the actual index location in the output table (for a 4x4 output feature map)
local out_idx = get_sample_idx(4, 4, smpl_idx, feat_r, feat_c)
-- the (fake) ground-truth
local gt = torch.rand(1)
-- compute loss on the selected feature map location for the selected sample
local err = loss:forward(y[out_idx], gt)
-- compute loss gradient, as if there was only this one location
local dE_dy = loss:backward(y[out_idx], gt)
-- now convert into full loss gradient (zero'ing out irrelevant losses)
local full_dE_dy = nn.SelectTable(out_idx):backward(y, dE_dy)
-- do back-prop through the whole network
net:backward(x, full_dE_dy)
print("The full dE/dy")
print(table.unpack(full_dE_dy))
I would really appreciate it if somebody could point out a simpler OR more efficient method.
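One possibly simpler formulation is to skip the table conversion and build the output gradient directly: a zero tensor with the same shape as the network output, with the loss gradient written into the single selected location. The sketch below shows the idea in plain Python with nested lists rather than Torch tensors. The function names are hypothetical, indices are 0-based (so sample 2, row 3, column 4 above becomes 1, 2, 3), and the output values are made up.

```python
def smooth_l1_grad(prediction, target):
    """Derivative of the smooth L1 (Huber-style) loss w.r.t. the prediction."""
    diff = prediction - target
    if abs(diff) < 1.0:
        return diff                      # quadratic region: gradient is diff
    return 1.0 if diff > 0 else -1.0     # linear region: gradient is sign(diff)


def masked_output_grad(output, sample, row, col, target):
    """Gradient w.r.t. the full output: zero everywhere except one location."""
    grad = [[[0.0 for _ in fmap_row] for fmap_row in fmap] for fmap in output]
    grad[sample][row][col] = smooth_l1_grad(output[sample][row][col], target)
    return grad


# Made-up network output of shape 2 x 4 x 4 (batch x height x width)
output = [[[0.1 * (r + c) for c in range(4)] for r in range(4)]
          for _ in range(2)]

grad = masked_output_grad(output, sample=1, row=2, col=3, target=0.2)

nonzero = [(s, r, c)
           for s in range(2) for r in range(4) for c in range(4)
           if grad[s][r][c] != 0.0]
print(nonzero)
```

In Torch the equivalent would presumably be a zero-filled tensor shaped like the 2x4x4 output, with the single entry set from the criterion's backward result, passed to net:backward. I have not benchmarked this against the SelectTable approach.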

Target value shape in Lasagne

I am trying to train a Siamese Lasagne model in batches of 100.
The inputs are X1 (100x3x100x100) and X2 (same size), and Y is 100x1. My last layer is a Dense layer with a single output unit, since I expect a target value of 0 or 1. However, it throws an error about an unexpected dimension. Below are the code excerpts:
input1 = lasagne.layers.InputLayer(shape=(None,3, 100, 100), input_var=None)
conv1_a = lasagne.layers.Conv2DLayer(input1,
num_filters=24,
filter_size=(7, 7),
nonlinearity=lasagne.nonlinearities.rectify)
pool1_a = lasagne.layers.MaxPool2DLayer(conv1_a, pool_size=(3, 3), stride=2)
Layer 2 is the same as above.
Output Layer:
dense_b = lasagne.layers.DenseLayer(dense_a,
num_units=128,
nonlinearity=lasagne.nonlinearities.rectify)
dense_c = lasagne.layers.DenseLayer(dense_b,
num_units=1,
nonlinearity=lasagne.nonlinearities.softmax)
net_output = lasagne.layers.get_output(dense_c)
true_output = T.ivector('true_output')
The training code is below:
loss_value = train(X1_train,X2_train,Y_train.astype(np.int32))
print loss_value
ValueError: Input dimension mis-match. (input[0].shape[1] = 100,
input[1].shape[1] = 1) Apply node that caused the error:
Elemwise{Composite{((i0 * i1) + (i2 *
log1p((-i3))))}}(InplaceDimShuffle{x,0}.0, LogSoftmax.0,
Elemwise{sub,no_inplace}.0, SoftmaxWithBias.0) Toposort index: 113
Inputs types: [TensorType(int32, row), TensorType(float32, matrix),
TensorType(float64, row), TensorType(float32, matrix)] Inputs shapes:
[(1, 100), (100, 1), (1, 100), (100, 1)] Inputs strides: [(400, 4),
(4, 4), (800, 8), (4, 4)] Inputs values: ['not shown', 'not shown',
'not shown', 'not shown'] Outputs clients:
[[Sum{acc_dtype=float64}(Elemwise{Composite{((i0 * i1) + (i2 *
log1p((-i3))))}}.0)]]
Try using draw_net.py as follows:
import draw_net
dot = draw_net.get_pydot_graph(lasagne.layers.get_all_layers(your_last_layer),
verbose = True)
dot.write("test.pdf", format="pdf")
to dump the Lasagne graph in pdf format (requires graphviz to be installed)
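Separately from visualising the graph, the output layer in the excerpt (num_units=1 with a softmax nonlinearity) is worth a second look: softmax normalises across the units of a layer, so with a single unit it always returns 1 regardless of the input. A plain-Python sketch with hypothetical helper functions shows the effect next to a sigmoid, which is the more common choice for a single 0/1 output.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# With one output unit, softmax is constant while sigmoid still varies
for logit in (-3.0, 0.0, 4.5):
    print(logit, softmax([logit]), sigmoid(logit))
```

The shapes in the error, (1, 100) against (100, 1), also suggest that a vector of targets is being combined with a column of predictions. Flattening the network output to a vector before computing the loss might resolve that, though I cannot confirm it without the rest of the code.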