Implementing answer selection using a deep learning LSTM model in Keras - merge

I am trying to implement an answer selection model in Keras, based on this paper, following the flow shown below.
I understand how to implement the embedding, bi-LSTM, and pooling steps of that flow.
But how do I implement the merge step that computes the cosine similarity, and the loss function, in Keras?
The loss function is defined as
L = max{0, M - cosine(q, a+) + cosine(q, a-)}
where,
M = constant margin
q = question
a+ = correct answer
a- = wrong answer
Update 1:
After going through a few blogs, this is how I implemented it.
# build model (Keras 1.x functional API; max_len and embedding_dim are defined elsewhere)
from keras.layers import Input, LSTM, Bidirectional, Dense, Lambda, Reshape, merge
from keras.models import Model

input_question = Input(shape=(max_len, embedding_dim))
input_sentence = Input(shape=(max_len, embedding_dim))
question_lstm = Bidirectional(LSTM(64))
sentence_lstm = Bidirectional(LSTM(64))
encoded_question = question_lstm(input_question)
encoded_sentence = sentence_lstm(input_sentence)
# cosine similarity between the two encodings (Keras 1 merge with mode='cos')
cos_distance = merge([encoded_question, encoded_sentence], mode='cos', dot_axes=1)
cos_distance = Reshape((1,))(cos_distance)
cos_similarity = Lambda(lambda x: 1 - x)(cos_distance)
predictions = Dense(1, activation='sigmoid')(cos_similarity)
model = Model([input_question, input_sentence], [predictions])
model.compile('adam', 'binary_crossentropy', metrics=['accuracy'])
With the above implementation, I am still not able to figure out how to implement the hinge loss. Please help.
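One way to get the hinge loss (a minimal sketch only, reusing max_len, embedding_dim and the question_lstm / sentence_lstm encoders from the snippet above; the margin M = 0.2 is an arbitrary placeholder) is to build a training model with three inputs - question, correct answer a+, wrong answer a- - and let the graph compute the loss itself:

from keras.layers import Input, Lambda
from keras.models import Model
from keras import backend as K

M = 0.2  # margin (placeholder value)

def cosine(x):
    # cosine similarity between two batches of encoded vectors
    q, a = x
    q = K.l2_normalize(q, axis=-1)
    a = K.l2_normalize(a, axis=-1)
    return K.sum(q * a, axis=-1, keepdims=True)

input_question = Input(shape=(max_len, embedding_dim))
input_pos_answer = Input(shape=(max_len, embedding_dim))  # a+
input_neg_answer = Input(shape=(max_len, embedding_dim))  # a-

encoded_q = question_lstm(input_question)
encoded_pos = sentence_lstm(input_pos_answer)  # the answer encoder is shared
encoded_neg = sentence_lstm(input_neg_answer)

pos_sim = Lambda(cosine)([encoded_q, encoded_pos])
neg_sim = Lambda(cosine)([encoded_q, encoded_neg])

# L = max(0, M - cosine(q, a+) + cosine(q, a-))
hinge = Lambda(lambda x: K.maximum(0.0, M - x[0] + x[1]))([pos_sim, neg_sim])

train_model = Model([input_question, input_pos_answer, input_neg_answer], hinge)
# the graph already outputs the loss, so the objective just averages y_pred
train_model.compile(optimizer='adam', loss=lambda y_true, y_pred: K.mean(y_pred))

At training time you would feed (question, a+, a-) triples together with a dummy all-zero target, e.g. train_model.fit([q, a_pos, a_neg], np.zeros((len(q), 1))); at test time you score candidates with a single-branch model that outputs cosine(q, a).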

Related

How to use a trained SVM model to predict whether an image contains a car object or not

I want to use the SVM to classify whether an image contains a car or not.
I trained an SVM classifier using HOG features. Then I tried to use the classifier, so I looked up a few MathWorks tutorials.
I could not find any useful tutorial on using an SVM classifier.
I use the data set from http://cogcomp.org/Data/Car/
This is my code for the SVM classifier:
% probe one image to determine the HOG feature length
imgPos = imread(strrep(file, '*', int2str(0)));
[hog_4x4, vis4x4] = extractHOGFeatures(imgPos, 'CellSize', [4 4]);
cellSize = [4 4];
hogFeatureSize = length(hog_4x4);

% labels: rows 1-500 are positive (car), rows 501-1000 are negative
temp(1:500) = 1;
temp(501:1000) = 0;
trainingLabels = categorical(temp);

% one row of HOG features per training image (fileNum positives + fileNum negatives)
trainingFeatures = zeros(fileNum*2, hogFeatureSize, 'single');
for n = 1:500
    % posFile/negFile are file name templates where '*' is replaced by an index
    posfile = strrep(posFile, '*', int2str(n-1));
    imgPos = imread(posfile);
    trainingFeatures(n, :) = extractHOGFeatures(imgPos, 'CellSize', cellSize);

    negfile = strrep(negFile, '*', int2str(n-1));
    imgNeg = imread(negfile);
    trainingFeatures(n+500, :) = extractHOGFeatures(imgNeg, 'CellSize', cellSize);
end

classifier = fitcecoc(trainingFeatures, trainingLabels);
I want to use the classifier to detect car objects.
If possible, I want to surround each detected car object with a frame.
Any help is appreciated.
You're looking for the predict method. Extract your test data features and run the following:
predictions = predict(classifier, testFeatures);

EEG data classification with SWLDA using MATLAB

I would like to ask for your help with EEG data classification.
I am a graduate student trying to analyze EEG data.
I am currently struggling to classify ERP speller (P300) data with SWLDA in MATLAB; maybe there is something wrong in my code.
I have read several articles, but they did not cover many details.
My data sizes are as follows:
size(target) = [300 1856]
size(nontarget) = [998 1856]
Rows indicate trials and columns indicate the flattened features (I flattened the [64 29] data into a single row; I did not select an ROI).
I used the stepwisefit function in MATLAB to classify target vs. non-target.
The code is attached below.
ingredients = [targets; nontargets];
heat = [class_targets; class_nontargets]; % target: 1, non-target: -1
randomized_set = shuffle([ingredients heat]); % shuffle: user-defined row permutation

for k = 1:10 % 10-fold cross validation
    partition_factor = ceil(size(randomized_set,1) / 10);
    cv_test_idx = (k-1)*partition_factor + 1 : min(k*partition_factor, size(randomized_set,1));
    total_idx = 1:size(randomized_set,1);
    cv_train_idx = total_idx(~ismember(total_idx, cv_test_idx));

    % stepwise regression on the training folds
    ingredients = randomized_set(cv_train_idx, 1:end-1);
    heat = randomized_set(cv_train_idx, end);
    [W,SE,PVAL,INMODEL,STATS,NEXTSTEP,HISTORY] = stepwisefit(ingredients, heat, 'penter', .1);
    valid_id = find(INMODEL==1);
    v_weights = W(valid_id)';

    % evaluate on the held-out fold
    t_ingredients = randomized_set(cv_test_idx, 1:end-1);
    t_heat = randomized_set(cv_test_idx, end); % true labels for the test set
    v_features = t_ingredients(:, valid_id);
    v_weights = repmat(v_weights, size(v_features, 1), 1);
    predictor = sum(v_weights .* v_features, 2);
    m_result = predictor > 0; % class A: +1, B: 0
    t_heat(t_heat==-1) = 0;
    acc(k) = sum(m_result==t_heat) / length(m_result);
end
P.S. My code is currently very inefficient and might be bad.
My understanding is that stepwisefit computes significant coefficients at each step, and only the valid columns remain in the model.
Even though it is not LDA, for binary classification LDA and linear regression are essentially equivalent.
However, my results were close to random chance (for other binary data sets from the internet, it worked).
I think I did something wrong, and your help can correct me.
I would appreciate any suggestions or tips for implementing a classifier for the ERP speller.
Or any idea for implementing SWLDA in MATLAB code?
The name SWLDA is only used in the context of Brain-Computer Interfaces, but I bet it has another name in a more general context.
If you track down the recipe for SWLDA you will end up at the Krusienski 2006 papers ("A comparison..." and "Toward enhanced P300...") and from there at the book where stepwise regression is explained: Draper & Smith, Applied Regression Analysis, 1981. However, as far as I am aware, no paper actually gives the complete recipe for how to implement it (with all its details and secrets).
My approach was using stepwiseglm:
H = predictors;
TH = variables;
lbs = labels; % (1,2)
if (stepwiseflag)
    mdl = stepwiseglm(H', lbs'-1, 'constant', 'upper', 'linear', 'distr', 'binomial');
    if (mdl.NumEstimatedCoefficients > 1)
        inmodel = [];
        for i = 2:mdl.NumEstimatedCoefficients
            inmodel = [inmodel str2num(mdl.CoefficientNames{i}(2:end))];
        end
        H = H(inmodel,:);
        TH = TH(inmodel,:);
    end
end
lbls = classify(TH', H', lbs', 'linear');
You can also use a k-fold cross-validation approach using MATLAB's cvpartition.
c = cvpartition(lbs, 'k', 10);
opts = statset('display', 'iter');
fun = @(XT, yT, Xt, yt) ...
    (sum(~strcmp(yt, classify(Xt, XT, yT, 'linear'))));

I am training a Keras neural net. I would like to have a custom loss function given by y_true*y_pred. Is this allowed?

This is a snippet of my model:
# Keras 1.x functional API; create_base_network, encoder, Delta1 and
# Delta1_output_shape are defined elsewhere in my code
W1 = create_base_network(latent_dim)
input_a = Input(shape=(1, latent_dim))
input_b = Input(shape=(1, latent_dim))
x_a = encoder(input_a)
x_b = encoder(input_b)
processed_a = W1(x_a)
processed_b = W1(x_b)
del1 = Lambda(Delta1, output_shape=Delta1_output_shape)([processed_a, processed_b])
model = Model(input=[input_a, input_b], output=del1)

# train
rms = RMSprop()
model.compile(loss='kappa_delta_loss', optimizer=rms)
Basically, the neural net gets a (pre-trained) encoder representation of the two inputs and computes the difference in prediction values between them by passing both through an MLP. This difference is Delta1, which is the y_pred of the network. I want the loss function to be y_true*y_pred. However, when I do that, I get the error 'Invalid objective: kappa_delta_loss'.
What am I doing wrong?
You almost answered the question yourself. Create your objective function like the ones in https://github.com/fchollet/keras/blob/master/keras/objectives.py, like this:

import theano
import theano.tensor as T

epsilon = 1.0e-9

def custom_objective(y_true, y_pred):
    '''Just another crossentropy'''
    y_pred = T.clip(y_pred, epsilon, 1.0 - epsilon)
    y_pred /= y_pred.sum(axis=-1, keepdims=True)
    cce = T.nnet.categorical_crossentropy(y_pred, y_true)
    return cce

and pass it to the compile argument:

model.compile(loss=custom_objective, optimizer='adadelta')

(from https://github.com/fchollet/keras/issues/369)
So you should create your custom loss function with two arguments, the first being the target and the second your prediction.
Assuming your output (y_pred) is a scalar per sample, your custom objective could be

from keras import backend as K

def custom_objective(y_true, y_pred):
    # elementwise product of target and prediction, averaged per sample
    return K.mean(y_true * y_pred, axis=-1)

using K for the Keras backend (more generic than the Theano example above).
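With a function like this, the 'Invalid objective: kappa_delta_loss' error goes away because compile receives a callable instead of an unknown string; applied to the snippet in the question, the compile call would become, for example:

model.compile(loss=custom_objective, optimizer=rms)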

Neural Nets with PyMC3

I am trying to use PyMC3 to sample from the posterior of a set of single-hidden-layer neural nets, so that I can then convert the model to a hierarchical one, as in Radford M. Neal's thesis. My first model looks like this:
import pymc3 as pm
import theano.tensor as T

def sample(nHiddenUnts, X, Y):
    nFeatures = X.shape[1]
    with pm.Model() as model:
        # priors
        bho = pm.Normal('hiddenOutBias', mu=0, sd=100)
        who = pm.Normal('hiddenOutWeights', mu=0, sd=100, shape=(nHiddenUnts, 1))
        bih = pm.Normal('inputBias', mu=0, sd=100, shape=nHiddenUnts)
        wih = pm.Normal('inputWeights', mu=0, sd=100, shape=(nFeatures, nHiddenUnts))
        netOut = T.dot(T.nnet.sigmoid(T.dot(X, wih) + bih), who) + bho
        # likelihood
        likelihood = pm.Normal('likelihood', mu=netOut, sd=0.001, observed=Y)
        start = pm.find_MAP()
        step = pm.Metropolis()
        trace = pm.sample(100000, step, start, progressbar=True)
    return trace
In the second model, hyperpriors are added: the precisions of the noise and of the input-to-hidden and hidden-to-output weights and biases (e.g. bihTau is the precision of the input->hidden bias). The parameters of the hyperpriors are chosen so that they are broad, and the hyperpriors are log-transformed.
# Gamma hyperpriors
bhoTau, log_bhoTau = model.TransformedVar('bhoTau',
    pm.Gamma.dist(alpha=1, beta=1e-2, testval=1e-4), pm.logtransform)
WhoTau, log_WhoTau = model.TransformedVar('WhoTau',
    pm.Gamma.dist(alpha=1, beta=1e-2, testval=1e-4), pm.logtransform)
bihTau, log_bihTau = model.TransformedVar('bihTau',
    pm.Gamma.dist(alpha=1, beta=1e-2, testval=1e-4), pm.logtransform)
wihTau, log_wihTau = model.TransformedVar('wihTau',
    pm.Gamma.dist(alpha=1, beta=1e-2, testval=1e-4), pm.logtransform)
noiseTau, log_noiseTau = model.TransformedVar('noiseTau',
    pm.Gamma.dist(alpha=1, beta=1e-2, testval=1e+4), pm.logtransform)

# priors
bho = pm.Normal('hiddenOutBias', mu=0, tau=bhoTau)
who = pm.Normal('hiddenOutWeights', mu=0, tau=WhoTau, shape=(nHiddenUnts, 1))
bih = pm.Normal('inputBias', mu=0, tau=bihTau, shape=nHiddenUnts)
wih = pm.Normal('inputWeights', mu=0, tau=wihTau, shape=(nFeatures, nHiddenUnts))
...
start = pm.find_MAP()
step = pm.NUTS(scaling=start)
where bho, who, bih, and wih are the biases and weights of the hidden-to-output and input-to-hidden layers.
To check my models, 3 to 5 sample points in [0,1) were drawn from a one-dimensional toy function of the following form:
import numpy as np

def g(x):
    return np.prod(x + np.sin(2 * np.pi * x), axis=1)
The first model (with constant hyperparameters) works fine! But when I sample from the joint posterior of the hyperparameters and parameters, i.e. when I replace the priors in the first listing above with the ones in the second, neither find_MAP() nor the sampling method converges regardless of the number of samples, and the resulting ANNs do not interpolate the sample points. I then tried to integrate the hyperpriors into my model one by one. The only one that could be integrated without problems was the noise precision; if I include any of the others, the sampler does not converge to the posterior. I tried this using one step function for all the model variables, and also a combination of two separate step methods over the parameters and hyperparameters (roughly as sketched below). In all cases, and with different numbers of samples, the problem was still there.
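For illustration only, a combination of two step methods over the parameters and hyperparameters can be specified in PyMC3 roughly like this (a sketch using the variable names from the second listing; the exact split of variables between the step methods is an assumption):

with model:
    # one step method for the network weights and biases ...
    step_params = pm.NUTS(vars=[bho, who, bih, wih], scaling=start)
    # ... and another for the (log-transformed) precision hyperparameters
    step_hyper = pm.Metropolis(vars=[log_bhoTau, log_WhoTau, log_bihTau, log_wihTau, log_noiseTau])
    trace = pm.sample(100000, step=[step_params, step_hyper], start=start, progressbar=True)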

Using PCA before classification

I am using PCA to reduce the number of features before training a Random Forest. I first used around 70 principal components out of 125, which captured around 99% of the energy (according to the eigenvalues). I got much worse results after training the Random Forest on the transformed features. After that I used all the principal components and got the same results as with 70. This made no sense to me, since it is the same feature space, only in a different basis (the space has only been rotated, so that should not affect the decision boundary).
Does anyone have the idea what may be the problem here?
Here is my code
clc;
clear all;
close all;

load patches_training_256.txt
load patches_testing_256.txt

% first column holds the label, the rest are the 125 features;
% transpose so that columns are samples and rows are features
Xtr = patches_training_256(:,2:end);
Xtr = Xtr';
Ytr = patches_training_256(:,1);
Ytr = Ytr';
Xtest = patches_testing_256(:,2:end);
Xtest = Xtest';
Ytest = patches_testing_256(:,1);
Ytest = Ytest';

data_size = size(Xtr, 2);
feature_size = size(Xtr, 1);

% standardize the training data and compute its covariance matrix
mu = mean(Xtr,2);
sigma = std(Xtr,0,2);
mu_mat = repmat(mu,1,data_size);
sigma_mat = repmat(sigma,1,data_size);
cov = ((Xtr - mu_mat)./sigma_mat) * ((Xtr - mu_mat)./sigma_mat)' / data_size;

% eigendecomposition of the covariance matrix
[v d] = eig(cov);
%[U S V] = svd(((Xtr - mu_mat)./sigma_mat)');
k = 124;
%Ureduce = U(:,1:k);
%XtrReduce = ((Xtr - mu_mat)./sigma_mat) * Ureduce;

% project the standardized data onto the eigenvector basis
XtrReduce = v'*((Xtr - mu_mat)./sigma_mat);

B = TreeBagger(300, XtrReduce', Ytr', 'Prior', 'Empirical', 'NPrint', 1);

% apply the same standardization and projection to the test data
data_size_test = size(Xtest, 2);
mu_test = repmat(mu,1,data_size_test);
sigma_test = repmat(sigma,1,data_size_test);
XtestReduce = v' * ((Xtest - mu_test) ./ sigma_test);

% predict returns labels as a cell array of chars; subtract 48 ('0') to get numbers
Ypredict = predict(B,XtestReduce');
error = sum(Ytest' ~= (double(cell2mat(Ypredict)) - 48))
A Random Forest heavily depends on the choice of basis. It is not a linear model, which would be (up to normalization) rotation invariant; an RF completely changes its behaviour once you "rotate the space". The reason is that it uses decision trees as base classifiers, which analyze each feature completely independently, so as a result it fails to find any linear combination of features. Once you rotate your space you change the "meaning" of the features. There is nothing wrong with that; tree-based classifiers are simply a rather poor choice to apply after such transformations. Use feature selection methods instead (methods which select valuable features without creating any linear combinations of them). In fact, RFs themselves can be used for this task thanks to their internal "feature importance" computation; see the sketch below.
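For illustration only, a minimal sketch of that feature-selection idea, shown in Python/scikit-learn rather than MATLAB (TreeBagger exposes analogous out-of-bag permutation importances); X and y stand for the original, un-rotated training data, and the median-importance threshold is an arbitrary choice:

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# fit a forest on the original features and use its importances for selection
rf = RandomForestClassifier(n_estimators=300, random_state=0)
rf.fit(X, y)

# keep only the features whose importance is above the median importance
selector = SelectFromModel(rf, threshold='median', prefit=True)
X_selected = selector.transform(X)

# retrain on the reduced, but not rotated, feature set
rf_reduced = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_selected, y)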
There is already a MATLAB function, princomp, which does PCA for you. I would suggest not falling into numerical-error traps by implementing it yourself; they have done it for us. :)