How to extract an equation behind a trained DNN model using tf.keras

Is it possible to extract an equation behind a trained model using Keras/TensorFlow with multiple hidden layers of 64 neurons? For example, I have two input variables x1 and x2 and one output variable y. After training the DNN model, I want to build an equation of the form: y_hat = bias + w1*x1 + w2*x2
For example [as here:][1]
It does not work with more hidden layers:
model = tf.keras.models.Sequential()
model.add(normalizer)
model.add(tf.keras.layers.Dense(64, activation="relu"))
model.add(tf.keras.layers.Dense(64, activation="relu"))
model.add(tf.keras.layers.Dense(1))
weights = model.layers[-1].get_weights()[0]  # note: layers[4] is out of range; the output layer is index 3
biases = model.layers[-1].get_weights()[1]
y_hat = bias + w1*x1 + w2*x2  # but these weights act on 64 hidden activations, not directly on x1 and x2
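Since the hidden layers use ReLU, the trained network is piecewise linear: there is no single global equation y_hat = bias + w1*x1 + w2*x2, only a composition of the per-layer weights. What you can do is reproduce the forward pass by hand from the extracted weights. A minimal sketch, assuming the Sequential model above with a tf.keras.layers.Normalization layer first (names are illustrative):
import numpy as np
import tensorflow as tf
def manual_forward(model, x):
    # Recompute model(x) by hand from the extracted weights
    a = model.layers[0](x).numpy()  # apply the Normalization layer
    dense = model.layers[1:]
    for i, layer in enumerate(dense):
        W, b = layer.get_weights()  # W: (n_in, n_out), b: (n_out,)
        a = a @ W + b
        if i < len(dense) - 1:      # hidden layers use ReLU
            a = np.maximum(a, 0.0)
    return a                        # the last layer is linear
x = np.array([[1.0, 2.0]])          # [x1, x2]
print(manual_forward(model, x))     # should match model.predict(x)
A single closed-form linear equation only falls out when there are no nonlinear activations; in that case the stacked weight matrices collapse into one effective weight per input by matrix multiplication.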

Related

Why is 1 appended to the input layer of a neural network?

I am following this tutorial for making a neural network:
https://www.kaggle.com/antmarakis/another-neural-network-from-scratch
I do not understand the train part of this code, where 1 is appended to the input feature vector.
def Train(X, Y, lr, weights):
    layers = len(weights)
    for i in range(len(X)):
        x, y = X[i], Y[i]
        x = np.matrix(np.append(1, x))  # Augment feature vector
        activations = ForwardPropagation(x, weights, layers)
        weights = BackPropagation(y, activations, weights, layers)
    return weights
any help in understanding this would be appreciated.
Forward propagation involves multiplying by the weights and adding a bias term. The equation is
y = X*W + b. This can be written in a more vectorised form as y = [X, 1] * [W; b], where * stands for matrix multiplication, [X, 1] appends a 1 to the feature row vector, and [W; b] stacks the bias as an extra row of the weight matrix.
In the code, the weights and biases seem to have been combined into a single weight matrix W, and x is turned into an augmented vector by appending a one to it.
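A quick NumPy check of that identity (all names here are illustrative):
import numpy as np
rng = np.random.default_rng(0)
X = rng.standard_normal((1, 3))          # one sample, 3 features
W = rng.standard_normal((3, 2))          # weights: 3 inputs -> 2 units
b = rng.standard_normal((1, 2))          # bias row vector
y1 = X @ W + b                           # explicit bias
X_aug = np.hstack([X, np.ones((1, 1))])  # append 1 to the features
W_aug = np.vstack([W, b])                # append the bias as a last row of W
y2 = X_aug @ W_aug                       # bias absorbed into the weights
print(np.allclose(y1, y2))               # True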

LSTM Weight Matrix Interpretation

Consider the following code in Keras for building an LSTM model.
model = Sequential()
model.add(LSTM(30, input_dim=22,return_sequences=True, init = 'glorot_uniform'))
model.add(TimeDistributed(Dense(1)))
model.compile(loss='mean_squared_error', optimizer='Nadam')
model.fit(train3d, trainY3d, nb_epoch=100, batch_size=8)
As one can see, there is only one LSTM layer. I see there are 14 parameter arrays in the output. If you check this link, you will notice that the LSTM layer has 4 triples (input gate, forget gate, cell state, output gate) of W, U (parameter matrices) and b (a bias vector), and the dense layer has a further W and b.
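To see where the 14 arrays come from, you can print their shapes; a hedged sketch against the Keras 1-era model above (modern Keras instead concatenates the four gate matrices, giving 3 arrays for the LSTM):
# 4 gates x (W, U, b) for the LSTM, plus W and b for the Dense layer = 14
for layer in model.layers:
    for w in layer.get_weights():
        print(layer.name, w.shape)
# expected LSTM shapes per gate: W (22, 30), U (30, 30), b (30,);
# Dense: W (30, 1), b (1,)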
My question: for this one-layer LSTM, is there a way to attribute 100% of Y to the impact of each input feature X_i, for all i?

Hidden Markov model classifying a sequence in Matlab

I'm very new to machine learning. I've read about MATLAB's Statistics Toolbox support for hidden Markov models, and I want to classify a given sequence of signals using it. I have 3D coordinates in a [501x3] matrix P, and I want to train a model on them. Every complete trajectory ends on a specific set of points, i.e. at (0,0,0), where it reaches its target.
What is the appropriate pseudocode/approach for my scenario?
My pseudocode:
1. the 501x3 matrix P is the emission matrix, where each coordinate is a state
2. random NxN transition matrix values (but I'm confused about this)
3. generate a test sequence using the function hmmgenerate
4. train using hmmtrain(sequence, old_transition, old_emission)
5. give the final transition and emission matrices to hmmdecode with an unknown sequence to get its probability (also confusing)
EDIT 1:
In a nutshell, I want to classify 10 classes of trajectories, each of size [501x3], with an HMM. I want to sample 50 rows, i.e. [50x3], from each trajectory in order to build the model. I have Murphy's HMM toolbox for such sequences.
Here is a general outline of the approach to classifying d-dimensional sequences using hidden Markov models (a code sketch follows this outline):
1) Training:
For each class k:
prepare an HMM model. This includes initializing the following:
a transition matrix: a Q-by-Q matrix, where Q is the number of states
a vector of prior probabilities: a Q-by-1 vector
the emission model: in your case the observations are 3D points, so you could use a multivariate normal distribution (with a specified mean vector and covariance matrix) or a Gaussian mixture model (a set of MVN distributions combined with mixture coefficients)
after properly initializing the above parameters, you train the HMM model, feeding it the set of sequences belonging to this class (EM algorithm).
2) Prediction:
Next, to classify a new sequence X:
you compute the log-likelihood of the sequence under each model, log P(X|model_k)
then you pick the class that gave the highest likelihood. This is the class prediction.
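For illustration, here is a minimal sketch of that train-one-HMM-per-class, pick-the-highest-likelihood scheme in Python using hmmlearn (my choice of library for the sketch; the MATLAB toolboxes mentioned below follow the same pattern):
import numpy as np
from hmmlearn.hmm import GaussianHMM
def train_models(sequences_by_class, n_states=5):
    # one HMM per class, fit on that class's (T_i, 3) trajectories
    models = {}
    for label, seqs in sequences_by_class.items():
        X = np.vstack(seqs)               # stack all sequences of this class
        lengths = [len(s) for s in seqs]  # sequence boundaries for the EM fit
        m = GaussianHMM(n_components=n_states, covariance_type='full', n_iter=50)
        m.fit(X, lengths)                 # EM (Baum-Welch) training
        models[label] = m
    return models
def classify(models, X_new):
    # log-likelihood of the new sequence under each class model, take the argmax
    return max(models, key=lambda label: models[label].score(X_new))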
As I mentioned in the comments, the Statistics Toolbox only implements discrete-observation HMM models, so you will have to find another library or implement the code yourself. Kevin Murphy's toolboxes (HMM toolbox, BNT, PMTK3) are popular choices in this domain.
Here are some answers I posted in the past using Kevin Murphy's toolboxes:
Issue in training hidden markov model and usage for classification
Simple example/use-case for a BNT gaussian_CPD
The above answers are somewhat different from what you are trying to do here, but they're a good place to start.
The task is to build and train a hidden Markov model with the following components, here using Murphy's HMM toolbox:
O = dimensionality of each observation vector (number of coefficients)
Q = number of states
T = number of vectors in a sequence
nex = number of sequences
M = number of mixtures
Demo code (from Murphy's toolbox):
O = 8;   %Number of coefficients in a vector
T = 420; %Number of vectors in a sequence
nex = 1; %Number of sequences
M = 1;   %Number of mixtures
Q = 6;   %Number of states
data = randn(O,T,nex);
% initial guess of parameters
prior0 = normalise(rand(Q,1));
transmat0 = mk_stochastic(rand(Q,Q));
if 0
    Sigma0 = repmat(eye(O), [1 1 Q M]);
    % Initialize each mean to a random data point
    indices = randperm(T*nex);
    mu0 = reshape(data(:,indices(1:(Q*M))), [O Q M]);
    mixmat0 = mk_stochastic(rand(Q,M));
else
    [mu0, Sigma0] = mixgauss_init(Q*M, data, 'full');
    mu0 = reshape(mu0, [O Q M]);
    Sigma0 = reshape(Sigma0, [O O Q M]);
    mixmat0 = mk_stochastic(rand(Q,M));
end
[LL, prior1, transmat1, mu1, Sigma1, mixmat1] = ...
    mhmm_em(data, prior0, transmat0, mu0, Sigma0, mixmat0, 'max_iter', 5);
loglik = mhmm_logprob(data, prior1, transmat1, mu1, Sigma1, mixmat1);

Programming a Basic Neural Network from scratch in MATLAB

I have asked a few questions about neural networks on this website in the past and have gotten great answers, but I am still struggling to implement one for myself. This is quite a long question, but I am hoping that it will serve as a guide for other people creating their own basic neural networks in MATLAB, so it should be worth it.
What I have done so far could be completely wrong. I am following the online Stanford machine learning course by Professor Andrew Y. Ng and have tried to implement what he has taught to the best of my ability.
Can you please tell me if the feed forward and cost function parts of my code are correct, and where I am going wrong in the minimization (optimization) part?
I have a 2-layer feedforward neural network.
The MATLAB code for the feedforward part is:
function [ Y ] = feedforward2( X,W1,W2)
%This takes a row vector of inputs into the neural net with weight matrices W1 and W2 and returns a row vector of the outputs from the neural net
%Remember X, Y, and A can be vectors, and W1 and W2 matrices
X=transpose(X); %X needs to be a column vector
A = sigmf(W1*X,[1 0]); %Values of the first hidden layer
Y = sigmf(W2*A,[1 0]); %Output values of the network
Y = transpose(Y); %Y needs to be a row vector again
end
So, for example, a two-layer neural net with two inputs and two outputs would look a bit like this:
a1
x1 o--o--o y1 (all weights equal 1)
\/ \/
/\ /\
x2 o--o--o y2
a2
if we put in:
X=[2,3];
W1=ones(2,2);
W2=ones(2,2);
Y = feedforward2(X,W1,W2)
we get the output:
Y = [0.5,0.5]
This represents the y1 and y2 values shown in the drawing of the neural net.
The MATLAB code for the squared error cost function is:
function [ C ] = cost( W1,W2,Xtrain,Ytrain )
%This gives a value showing how close W1 and W2 are to giving a network that represents the Xtrain and Ytrain data
%It uses the squared error cost function
%The closer the cost is to zero, the better these particular weights are at representing the training data
%If the cost is zero, the weights give a network that outputs the Ytrain data when the Xtrain data is put in
M = size(Xtrain,1); %Number of training examples
oldsum = 0;
for i = 1:M
    H = feedforward2(Xtrain,W1,W2);
    temp = ( H(i) - Ytrain(i) )^2;
    Sum = temp + oldsum;
    oldsum = Sum;
end
C = (1/2*M) * Sum;
end
Example
So, for example, if the training data is:
Xtrain = [0,0;    Ytrain = [0/57;
          1,2;              3/57;
          4,1;              5/57;
          5,2;              7/57;
          3,4;              7/57;
          5,3;              8/57;
          1,5;              6/57;
          6,2;              8/57;
          2,1;              3/57;
          5,5];             10/57];
%This will be for a two-input, one-output network:
       a1
x1 o--o
    \/ \_o  y1
    /\ /
x2 o--o
       a2
We start with initial random weights:
W1 = [2,3;    W2 = [3,2]
      4,1]
If we put in:
Y= feedforward2([6,2],W1,W2)
We get
Y = 0.9933
Which is far from what the training data says it should be (8/57 = 0.1404). So the initial random weights W1 and W2 were a bad guess.
To measure exactly how good or bad a guess the random weights are, we use the cost function:
C= cost(W1,W2,Xtrain,Ytrain)
This gives the value:
C = 6.6031e+003
Minimizing the cost function
If we minimize the cost function by searching over all possible values of W1 and W2 and picking the combination with the lowest cost, we get the network that best approximates the training data.
But when I Use the code:
[W1,W2]=fminsearch(cost(W1,W2,Xtrain,Ytrain),[W1,W2])
It gives an error message: "Error using horzcat. CAT arguments dimensions are not consistent." Why am I getting this error and what can I do to fix it?
Can you please tell me if the feed forward and cost function parts of my code are correct, and where I am going wrong in the minimization (optimization) part?
Thank you!!!
Your neural network seems alright, although the kind of training you're trying to do is quite inefficient if you're training against labeled data as you're doing. In that case I would suggest looking into backpropagation.
About your error when training: your error message hints at the problem: dimensions are not consistent.
As the argument x0 to fminsearch, which is the initial guess for the optimizer, you send [W1, W2], but from what I can see these matrices don't have the same number of rows, so you can't concatenate them horizontally like that. I would suggest modifying your cost function to take a single vector as its argument and then form the weight matrices for the different layers from that one vector.
You are also not supplying the cost function correctly to fminsearch, as you are just evaluating cost with W1, W2, Xtrain and Ytrain in place.
According to the documentation (it's been years since I used MATLAB), you pass a handle to the cost function as
fminsearch(@cost, [W1; W2])
EDIT: You could express your weights and modify your code as follows:
global Xtrain
global Ytrain
W = [W1; W2];
fminsearch(@cost, W)
The cost function must be modified so that it doesn't take Xtrain and Ytrain as inputs, because fminsearch would then try to optimize those too. Modify your cost function like this:
function [ C ] = cost( W )
W1 = W(1:2,:);
W2 = W(3,:);
global Xtrain
global Ytrain
...
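For comparison, the same pack-the-weights-into-one-vector trick can be sketched in Python with scipy.optimize.minimize, whose Nelder-Mead method is the same simplex search fminsearch uses (shapes and starting weights follow the example above; an illustration, not the original MATLAB code):
import numpy as np
from scipy.optimize import minimize
def unpack(w):
    return w[:4].reshape(2, 2), w[4:6].reshape(1, 2)  # W1, W2
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))
def cost(w, Xtrain, Ytrain):
    W1, W2 = unpack(w)
    H = sigmoid(W2 @ sigmoid(W1 @ Xtrain.T)).ravel()  # feedforward, all examples
    return 0.5 * np.mean((H - Ytrain) ** 2)           # squared-error cost
Xtrain = np.array([[0,0],[1,2],[4,1],[5,2],[3,4],
                   [5,3],[1,5],[6,2],[2,1],[5,5]], dtype=float)
Ytrain = np.array([0,3,5,7,7,8,6,8,3,10], dtype=float) / 57
w0 = np.array([2,3,4,1, 3,2], dtype=float)  # W1=[2,3;4,1], W2=[3,2]
res = minimize(cost, w0, args=(Xtrain, Ytrain), method='Nelder-Mead')
print(res.x, cost(res.x, Xtrain, Ytrain))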

Matlab Neural Network gives unexpected results

I was toying around with the MATLAB neural network toolbox and I encountered some things I did not expect. My problem is specifically with a classification network with no hidden layers, only one input, and the tansig transfer function. I expect this classifier to divide a 1D dataset at some point defined by the learned input weight and bias.
First of all, I thought that the formula for computing the output y for a given input x is:
y = f(x*w + b)
with w the input weight and b the bias. What is the correct formula for calculating the output of the network?
I also expected that translating the whole dataset by a certain value (+77) would have a big effect on the bias and/or weight. But this doesn't seem to be the case. Why does translating the dataset have so little effect on the bias and weight?
This is my code:
% Generate data
p1 = randn(1, 1000);
t1(1:1000) = 1;
p2 = 3 + randn(1, 1000);
t2(1:1000) = -1;
% view data
figure, hist([p1', p2'], 100)
P = [p1 p2];
T = [t1 t2];
% train network without hidden layer
net1 = newff(P, T, [], {'tansig'}, 'trainlm');
[net1,tr] = train(net1, P, T);
% display weight and bias
w = net1.IW{1,1};
b = net1.b{1,1};
disp(w) % -6.8971
disp(b) % -0.2280
% make label decision
class_correct = 0;
outputs = net1(P);
for i = 1:length(outputs)
    % choose between -1 and 1
    if outputs(i) > 0
        outputs(i) = 1;
    else
        outputs(i) = -1;
    end
    % compare
    if T(i) == outputs(i)
        class_correct = class_correct + 1;
    end
end
% Calculate the correct classification rate (CCR)
CCR = (class_correct * 100) / length(outputs);
fprintf('CCR: %f \n', CCR);
% plot the errors
errors = gsubtract(T, outputs);
figure, plot(errors)
% I expect these to be equal and near 1
net1(0) % 0.9521
tansig(0*w + b) % -0.4680
% I expect these to be equal and near -1
net1(4) % -0.9991
tansig(4*w + b) % -1
% translate the dataset by +77
P2 = P + 77;
% train network without hidden layer
net2 = newff(P2, T, [], {'tansig'}, 'trainlm');
[net2,tr] = train(net2, P2, T);
% display weight and bias
w2 = net2.IW{1,1};
b2 = net2.b{1,1};
disp(w2) % -5.1132
disp(b2) % -0.1556
I generated an artificial dataset that is made of 2 populations with a normal distribution with a different mean. I plotted these populations in a histogram, and trained the network with it.
I calculate the Correct classification rate, which is the percentage of the correct classified instances of the whole dataset. This is somewhere around 92% so I know the classifier works.
But I expected net1(x) and tansig(x*w + b) to give the same output, and this is not the case. What is the correct formula for calculating the output of my trained network?
And I expected net1 and net2 to have different weights and/or biases, because net2 is trained on a translated version (+77) of the dataset that net1 is trained on. Why does translating the dataset have so little effect on the bias and weight?
First off, your code leaves the default MATLAB input pre-processing intact. You can check this with:
net2.inputs{1}
When I put your code in I got this:
Neural Network Input
name: 'Input'
feedbackOutput: []
processFcns: {'fixunknowns', 'removeconstantrows', 'mapminmax'}
processParams: {1x3 cell array of 2 params}
processSettings: {1x3 cell array of 3 settings}
processedRange: [1x2 double]
processedSize: 1
range: [1x2 double]
size: 1
userdata: (your custom info)
The important part is processFcns containing mapminmax. According to the docs, mapminmax will "Process matrices by mapping row minimum and maximum values to [-1 1]". That's why arbitrarily shifting your inputs had no effect: the pre-processing maps the shifted data straight back to the same [-1, 1] range.
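In other words, the formula you were missing applies the pre-processing first: roughly y = tansig(w * mapminmax(x) + b). A hedged sketch of reproducing the output by hand in Python (assuming default mapminmax settings, and ignoring output post-processing, which is the identity here since the targets already span [-1, 1]):
import numpy as np
def mapminmax_apply(x, xmin, xmax):
    # default mapminmax: map the training range [xmin, xmax] linearly to [-1, 1]
    return 2.0 * (x - xmin) / (xmax - xmin) - 1.0
def net_by_hand(x, w, b, xmin, xmax):
    return np.tanh(w * mapminmax_apply(x, xmin, xmax) + b)  # tansig(n) == tanh(n)
# with the learned w = -6.8971, b = -0.2280 and xmin, xmax taken from P,
# net_by_hand(0, w, b, P.min(), P.max()) should match net1(0)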
I assume that by "calculating the output" you mean checking the performance of your network. You can do that by first simulating your network on a dataset (see doc nnet/sim) and then checking the "correctness" with the perform function. It will use the same cost function as you did when training:
% get predictions (simulate on the inputs, not the targets)
Y = sim(net2, P2);
% see how well we did
perf = perform(net2, T, Y);
Boomshuckalucka. Hope that helps.