pytorch - model.named_parameters() returns 0 after optimizer.zero_grad() step

I am trying to store the weights of the model. The code is given below:
for step, batch in enumerate(train_dataloader):
    outputs = model(**batch)
    loss = outputs.loss
    loss = loss / args.gradient_accumulation_steps
    accelerator.backward(loss)
    progress_bar.update(1)
    progress_bar.set_postfix(loss=round(loss.item(), 3))
    del outputs
    gc.collect()
    torch.cuda.empty_cache()
    if (step+1) % args.gradient_accumulation_steps == 0 or (step+1) == len(train_dataloader):
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
        reference_gradient = [ p.grad.view(-1) if p.grad is not None else torch.zeros(p.numel()) for
                               n, p in model.named_parameters()]
        reference_gradient = torch.cat(reference_gradient)
However, reference_gradient tensor has all zeros in it. How can I save the gradients of the entire model?

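optimizer.zero_grad() clears the gradients, so the information is gone once it has run: the gradients are set to zero or, in recent PyTorch versions, to None by default, which your list comprehension then replaces with zeros. You cannot read the gradients after clearing them. Save them before calling optimizer.zero_grad().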
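As a minimal sketch of that ordering, assuming a tiny stand-in model and random data (the nn.Linear model, the tensors x and y, and the SGD settings are illustrative, not the asker's setup), the gradients can be copied right after backward() and before they are cleared:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(3, 1)  # stand-in for the real model (assumption)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(8, 3)  # illustrative random batch
y = torch.randn(8, 1)

loss = nn.functional.mse_loss(model(x), y)
loss.backward()

# Snapshot the gradients BEFORE they are cleared.
# clone() makes the copy independent of later in-place changes to p.grad.
reference_gradient = torch.cat([
    p.grad.detach().clone().view(-1) if p.grad is not None
    else torch.zeros(p.numel(), device=p.device)
    for _, p in model.named_parameters()
])

optimizer.step()
optimizer.zero_grad()

print(reference_gradient.shape)  # one entry per scalar parameter
```

In the training loop from the question, the snapshot would probably belong inside the accumulation `if` block, just before optimizer.step(), so that it captures the accumulated gradient.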

Related

pytorch linear regression given wrong results

I implemented a simple linear regression and I’m getting some poor results. Just wondering if these results are normal or I’m making some mistake.
I tried different optimizers and learning rates, I always get bad/poor results
Here is my code:
import torch
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt
from torch.autograd import Variable
class LinearRegressionPytorch(nn.Module):
    def __init__(self, input_dim=1, output_dim=1):
        super(LinearRegressionPytorch, self).__init__()
        self.linear = nn.Linear(input_dim, output_dim)
    def forward(self,x):
        x = x.view(x.size(0),-1)
        y = self.linear(x)
        return y
input_dim=1
output_dim = 1
if torch.cuda.is_available():
    model = LinearRegressionPytorch(input_dim, output_dim).cuda()
else:
    model = LinearRegressionPytorch(input_dim, output_dim)
criterium = nn.MSELoss()
l_rate =0.00001
optimizer = torch.optim.SGD(model.parameters(), lr=l_rate)
#optimizer = torch.optim.Adam(model.parameters(),lr=l_rate)
epochs = 100
#create data
x = np.random.uniform(0,10,size = 100) #np.linspace(0,10,100);
y = 6*x+5
mu = 0
sigma = 5
noise = np.random.normal(mu, sigma, len(y))
y_noise = y+noise
#pass it to pytorch
x_data = torch.from_numpy(x).float()
y_data = torch.from_numpy(y_noise).float()
if torch.cuda.is_available():
    inputs = Variable(x_data).cuda()
    target = Variable(y_data).cuda()
else:
    inputs = Variable(x_data)
    target = Variable(y_data)
for epoch in range(epochs):
    #predict data
    pred_y= model(inputs)
    #compute loss
    loss = criterium(pred_y, target)
    #zero grad and optimization
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    #if epoch % 50 == 0:
    #    print(f'epoch = {epoch}, loss = {loss.item()}')
#print params
for name, param in model.named_parameters():
    if param.requires_grad:
        print(name, param.data)
These are the poor results:
linear.weight tensor([[1.7374]], device='cuda:0')
linear.bias tensor([0.1815], device='cuda:0')
The results should be weight = 6 , bias = 5

Problem Solution
Actually, your batch size is problematic. If you have it set to one, your target needs the same shape as the outputs (which you are, correctly, reshaping with view(-1, 1)).
Your loss should be defined like this:
loss = criterium(pred_y, target.view(-1, 1))
This network is correct
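To see why the shape matters, here is a small NumPy sketch (NumPy applies the same broadcasting rules as PyTorch in this case; the array names and random values are illustrative). Subtracting a 1-D target from an (N, 1) prediction silently produces an N x N matrix instead of N residuals:

```python
import numpy as np

rng = np.random.default_rng(0)
pred = rng.normal(size=(100, 1))  # model output: one row per sample
target = rng.normal(size=100)     # 1-D target, as in the question

# Broadcasting pairs every prediction with every target: a 100 x 100 matrix.
diff_wrong = pred - target
# Matching shapes gives one residual per sample.
diff_right = pred - target.reshape(-1, 1)

print(diff_wrong.shape)  # (100, 100)
print(diff_right.shape)  # (100, 1)
```

Averaging the squared 100 x 100 matrix compares each prediction against all targets, which pulls the fit toward a poor compromise rather than the true line.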
Results
Your results will not be exactly bias = 5 (the weight will indeed move toward 6), because you are adding random noise to the target; with only 100 samples the noise shifts the fitted intercept.
If you want the bias to equal 5, remove the noise.
You should also increase the number of epochs, as your dataset is quite small and the network (a linear regression, in fact) is not very powerful. 10000, say, should be fine, and your loss should oscillate around 0 (if you change your noise to something sensible).
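One way to check what gradient descent should converge to is to compute the closed-form least-squares fit on the same kind of data. This is a sketch with an arbitrary seed, using np.polyfit instead of the PyTorch model; the exact numbers depend on the noise sample:

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for repeatability only
x = rng.uniform(0, 10, size=100)
y_noise = 6 * x + 5 + rng.normal(0, 5, size=100)

# Degree-1 least-squares fit; polyfit returns [slope, intercept].
slope, intercept = np.polyfit(x, y_noise, 1)
print(slope, intercept)
```

The slope and intercept printed here are the values a correctly configured training loop approaches; they are close to 6 and 5, not equal to them.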
Noise
You are creating multiple gaussian distributions with different variations, hence your loss would be higher. Linear regression is unable to fit your data and find sensible bias (as the optimal slope is still approximately 6 for your noise, you may try to increase multiplication of 5 to 1000 and see what weight and bias will be learned).
Style (a little offtopic)
Please read documentation about PyTorch and keep your code up to date (e.g. Variable is deprecated in favor of Tensor and rightfully so).
This part of code:
x_data = torch.from_numpy(x).float()
y_data = torch.from_numpy(y_noise).float()
if torch.cuda.is_available():
    inputs = Tensor(x_data).cuda()
    target = Tensor(y_data).cuda()
else:
    inputs = Tensor(x_data)
    target = Tensor(y_data)
Could be written succinctly like this (without much thought):
inputs = torch.from_numpy(x).float()
target = torch.from_numpy(y_noise).float()
if torch.cuda.is_available():
    inputs = inputs.cuda()
    target = target.cuda()
I know deep learning has a reputation for bad code and poor practice, but please do not help spread this approach.

MATLAB 2-Layer Neural Network from Scratch

Currently, I'm working on a simple two-layer NN (25 inputs with sigmoid, 199 outputs with softmax) from scratch for debugging purposes; specifically, I want to track some values.
My inputs are batches or, generally speaking, matrices of dimension (rows x 25), in order to fit the input layer structure. Regarding my weight matrices: all rows but the last are the weights w_ij. The last row contains the biases.
The forward method seems to work correctly but I think I have a wrong backpropagation.
My backpropagation code snippet:
%Error gradient for the softmax output
error = single(output) - single(targets);
%Error for the input layer - W21 includes w_ij
error_out_to_input = error*(W21.');
gradient_outputLayer = single(zeros(26,199));
gradient_outputLayer = single(first_layerout_zerofilled.')*single(error);
biasGrad = single(sum(error,1));
gradient_outputLayer(26:26,:) = single(biasGrad);
%InputLayer
%derivative of sigmoid o(1-o)
%1
grad = single(1);
%1-o
grad = single(grad) - single(first_layerout_zerofilled);
%o(1-o)
grad = single(first_layerout_zerofilled) .* single((grad));
%final error
grad = single(grad) .* single(error_out_to_input);
gradient_inputLayer = single(zeros(26,25));
gradient_inputLayer = single(inputs.')*single(grad);
biasGrad = single(sum(grad,1));
gradient_inputLayer(26:26,:) = single(biasGrad);
%Update
W1 = W1-gradient_inputLayer * learning_rate;
W2 = W2-gradient_outputLayer * learning_rate;
This is not a question of efficiency. I just want to be sure that my backpropagation calculates the correct gradients. I hope someone can review it.

Impact of using relu for gradient descent

What is the impact of the fact that the ReLU activation function does not have a derivative?
"How to implement the ReLU function in Numpy" implements ReLU as the maximum of 0 and each matrix element.
Does this mean that for gradient descent we do not take the derivative of the ReLU function?
Update :
From Neural network backpropagation with RELU
this text aids in understanding :
The ReLU function is defined as: For x > 0 the output is x, i.e. f(x)
= max(0,x)
So for the derivative f '(x) it's actually:
if x < 0, output is 0. if x > 0, output is 1.
The derivative f '(0) is not defined. So it's usually set to 0 or you
modify the activation function to be f(x) = max(e,x) for a small e.
Generally: A ReLU is a unit that uses the rectifier activation
function. That means it works exactly like any other hidden layer but
except tanh(x), sigmoid(x) or whatever activation you use, you'll
instead use f(x) = max(0,x).
If you have written code for a working multilayer network with sigmoid
activation it's literally 1 line of change. Nothing about forward- or
back-propagation changes algorithmically. If you haven't got the
simpler model working yet, go back and start with that first.
Otherwise your question isn't really about ReLUs but about
implementing a NN as a whole.
But this still leaves some confusion as the neural network cost function typically takes derivative of activation function, so for relu how does this impact cost function ?

The standard answer is that the input to ReLU is rarely exactly zero, see here for example, so it doesn't make any significant difference.
Specifically, for ReLU to get a zero input, the dot product of one entire row of the input to a layer with one entire column of the layer's weight matrix would have to be exactly zero. Even if you have an all-zero input sample, there should still be a bias term in the last position, so I don't really see this ever happening.
However, if you want to test for yourself, try implementing the derivative at zero as 0, 0.5, and 1 and see if anything changes.
The PyTorch docs give a simple neural network with numpy example with one hidden layer and relu activation. I have reproduced it below with a fixed random seed and three options for setting the behavior of the ReLU gradient at 0. I have also added a bias term.
import numpy as np
np.random.seed(1)  # fix the seed before any random draw
N, D_in, H, D_out = 4, 2, 30, 1
# Create random input and output data
x = np.random.randn(N, D_in)
x = np.c_[x, np.ones(x.shape[0])]  # append a column of ones as the bias term
y = np.random.randn(N, D_out)
# Randomly initialize weights
w1 = np.random.randn(D_in+1, H)
w2 = np.random.randn(H, D_out)
learning_rate = 0.002
loss_col = []
for t in range(200):
    # Forward pass: compute predicted y
    h = x.dot(w1)
    h_relu = np.maximum(h, 0)  # using ReLU as activation function
    y_pred = h_relu.dot(w2)
    # Compute and print loss
    loss = np.square(y_pred - y).sum()  # loss function
    loss_col.append(loss)
    print(t, loss, y_pred)
    # Backprop to compute gradients of w1 and w2 with respect to loss
    grad_y_pred = 2.0 * (y_pred - y)  # the last layer's error
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h_relu = grad_y_pred.dot(w2.T)  # the second layer's error
    grad_h = grad_h_relu.copy()
    grad_h[h < 0] = 0  # grad at zero = 1
    # grad_h[h <= 0] = 0  # grad at zero = 0
    # grad_h[h < 0] = 0; grad_h[h == 0] *= 0.5  # grad at zero = 0.5
    grad_w1 = x.T.dot(grad_h)
    # Update weights
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2
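The three choices for the ReLU gradient at zero can also be folded into one helper. This is a sketch; the function name relu_grad and its at_zero parameter are made up for illustration:

```python
import numpy as np

def relu_grad(h, at_zero=0.0):
    """Derivative of max(0, h); `at_zero` is the value used where h == 0."""
    grad = (h > 0).astype(float)  # 1 where h > 0, 0 elsewhere
    grad[h == 0] = at_zero        # the convention chosen for the kink
    return grad

h = np.array([-2.0, 0.0, 3.0])
print(relu_grad(h).tolist())               # [0.0, 0.0, 1.0]
print(relu_grad(h, at_zero=0.5).tolist())  # [0.0, 0.5, 1.0]
```

In a backward pass, the upstream gradient would be multiplied elementwise by relu_grad(h, at_zero=...), which makes it easy to rerun training with 0, 0.5 and 1 and compare the loss curves.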

Logistic regression in Matlab, confused about the results

I am testing out logistic regression in Matlab on 2 datasets created from the audio files:
The first set is created via wavread by extracting vectors of each file: the set is an 834 by 48116 matrix. Each training example is a 48116-element vector of the wav's frequencies.
The second set is created by extracting the frequencies of 3 formants of the vowels, where each formant (feature) has its own frequency range (for example, the F1 range is 500-1500Hz, F2 is 1500-2000Hz and so on). Each training example is a 3-vector of the wav's formants.
I am implementing the algorithm like so:
Cost function and gradient:
h = sigmoid(X*theta);
J = sum(y'*log(h) + (1-y)'*log(1-h)) * -1/m;
grad = ((h-y)'*X)/m;
theta_partial = theta;
theta_partial(1) = 0;
J = J + ((lambda/(2*m)) * (theta_partial'*theta_partial));
grad = grad + (lambda/m * theta_partial');
where X is the dataset and y is the output matrix of 8 classes.
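For readers more at home in Python, the same regularized cost and gradient might look like the following NumPy sketch. The function name lr_cost and the toy data are illustrative; it assumes X already has a leading column of ones and that y holds 0/1 labels for a single class:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lr_cost(theta, X, y, lam):
    """Regularized logistic-regression cost and gradient.

    X is (m, n+1) with a leading column of ones, y is (m,) of 0/1 labels.
    The bias term theta[0] is excluded from regularization.
    """
    m = X.shape[0]
    h = sigmoid(X @ theta)
    theta_reg = theta.copy()
    theta_reg[0] = 0.0
    J = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
    J += lam / (2 * m) * (theta_reg @ theta_reg)
    grad = X.T @ (h - y) / m + lam / m * theta_reg
    return J, grad

# Toy data: three samples, one feature plus the bias column.
X = np.array([[1.0, 0.5], [1.0, -1.5], [1.0, 2.0]])
y = np.array([1.0, 0.0, 1.0])
J, grad = lr_cost(np.zeros(2), X, y, lam=0.1)
print(J)  # equals log(2) at theta = 0, since every prediction is 0.5
```

Checking the cost at theta = 0 against log(2) is a quick sanity test before handing the function to an optimizer.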
Classifier:
initial_theta = zeros(n + 1, 1);
options = optimset('GradObj', 'on', 'MaxIter', 50);
for c = 1:num_labels,
    [theta] = fmincg(@(t)(lrCostFunction(t, X, (y==c), lambda)), initial_theta, options);
    all_theta(c, :) = theta';
end
where num_labels = 8, lambda(regularization) is 0.1
With the first set, MaxIter = 50, and I get ~99.8% classification accuracy.
With the second set and MaxIter=50, the accuracy is poor - 62.589928
I thought about increasing MaxIter to a larger value to improve the performance, however, even at a ridiculous amount of iterations, the result doesn't go higher than 66.546763. Changing of the regularization value (lambda) doesn't seem to influence the results in any better way.
What could be the problem? I am new to machine learning and I can't seem to catch what exactly causes this drastic difference. The only reason that obviously stands out for me is that the first set's examples are very long vectors, hence, larger amount of features, and the second set's examples are represented by short 3-vectors. Is this data not enough to classify the second set? If so, what can be done about it to achieve better classification results for the second set?

Export a neural network trained with MATLAB in other programming languages

I trained a neural network using the MATLAB Neural Network Toolbox, in particular using the nprtool command, which provides a simple GUI for the toolbox features and exports a net object containing information about the generated network.
In this way, I created a working neural network, that I can use as classifier, and a diagram representing it is the following:
There are 200 inputs, 20 neurons in the first hidden layer, and 2 neurons in the last layer that provide a bidimensional output.
What I want to do is to use the network in some other programming language (C#, Java, ...).
In order to solve this problem, I try to use the following code in MATLAB:
y1 = tansig(net.IW{1} * input + net.b{1});
Results = tansig(net.LW{2} * y1 + net.b{2});
Assuming that input is a monodimensional array of 200 elements, the previous code would work if net.IW{1} is a 20x200 matrix (20 neurons, 200 weights).
The problem is that I noticed that size(net.IW{1}) returns unexpected values:
>> size(net.IW{1})
ans =
20 199
I got the same problem with a network with 10000 input. In this case, the result wasn't 20x10000, but something like 20x9384 (I don't remember the exact value).
So, the question is: how can I obtain the weights of each neuron? And after that, can someone explain to me how I can use them to reproduce MATLAB's output?

I solved the problems described above, and I think it is useful to share what I've learned.
Premises
First of all, we need some definitions. Let's consider the following image, taken from [1]:
In the above figure, IW stands for input weights: they represent the weights of the neurons in Layer 1, each of which is connected to every input, as the following image shows [1]:
All the other weights are called layer weights (LW in the first figure); they connect each neuron to every output of the previous layer. In our case study, we use a network with only two layers, so we need only one LW array.
Solution of the problem
After the above introduction, we can proceed by dividing the issue into two steps:
Force the number of input weights to match the input array length
Use the weights to implement and use the trained neural network in other programming languages
A - Force the number of input weights to match the input array length
Using the nprtool, we can train our network, and at the end of the process, we can also export in the workspace some information about the entire training process. In particular, we need to export:
a MATLAB network object that represents the neural network created
the input array used to train the network
the target array used to train the network
Also, we need to generate an M-file that contains the code used by MATLAB to create the neural network, because we need to modify it and change some training options.
The following image shows how to perform these operations:
The M-code generated will be similar to the following one:
function net = create_pr_net(inputs,targets)
%CREATE_PR_NET Creates and trains a pattern recognition neural network.
%
% NET = CREATE_PR_NET(INPUTS,TARGETS) takes these arguments:
% INPUTS - RxQ matrix of Q R-element input samples
% TARGETS - SxQ matrix of Q S-element associated target samples, where
% each column contains a single 1, with all other elements set to 0.
% and returns these results:
% NET - The trained neural network
%
% For example, to solve the Iris dataset problem with this function:
%
% load iris_dataset
% net = create_pr_net(irisInputs,irisTargets);
% irisOutputs = sim(net,irisInputs);
%
% To reproduce the results you obtained in NPRTOOL:
%
% net = create_pr_net(trainingSetInput,trainingSetOutput);
% Create Network
numHiddenNeurons = 20; % Adjust as desired
net = newpr(inputs,targets,numHiddenNeurons);
net.divideParam.trainRatio = 75/100; % Adjust as desired
net.divideParam.valRatio = 15/100; % Adjust as desired
net.divideParam.testRatio = 10/100; % Adjust as desired
% Train and Apply Network
[net,tr] = train(net,inputs,targets);
outputs = sim(net,inputs);
% Plot
plotperf(tr)
plotconfusion(targets,outputs)
Before starting the training process, we need to remove all preprocessing and postprocessing functions that MATLAB executes on inputs and outputs. This can be done by adding the following lines just before the % Train and Apply Network lines:
net.inputs{1}.processFcns = {};
net.outputs{2}.processFcns = {};
After these changes to the create_pr_net() function, we can simply use it to create our final neural network:
net = create_pr_net(input, target);
where input and target are the values we exported through nprtool.
In this way, we are sure that the number of weights is equal to the length of input array. Also, this process is useful in order to simplify the porting to other programming languages.
B - Implement and use the neural network just trained in other programming languages
With these changes, we can define a function like this:
function [ Results ] = classify( net, input )
y1 = tansig(net.IW{1} * input + net.b{1});
Results = tansig(net.LW{2} * y1 + net.b{2});
end
In this code, we use the IW and LW arrays mentioned above, but also the biases b, used in the network schema by the nprtool. In this context, we don't care about the role of biases; simply, we need to use them because nprtool does it.
Now, we can use the classify() function defined above, or the sim() function equally, obtaining the same results, as shown in the following example:
>> sim(net, input(:, 1))
ans =
0.9759
-0.1867
-0.1891
>> classify(net, input(:, 1))
ans =
0.9759
-0.1867
-0.1891
Obviously, the classify() function can be read as pseudocode and then implemented in any programming language in which it is possible to define the MATLAB tansig() function [2] and the basic array operations.
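As an illustration of such a port, here is a Python/NumPy sketch of classify(). The weights are random placeholders with the shapes from the question (200 inputs, 20 hidden neurons, 2 outputs), not values exported from a trained network. tansig follows the formula given in the MATLAB documentation, which is mathematically the same as tanh:

```python
import numpy as np

def tansig(n):
    """MATLAB's documented tansig formula: 2 / (1 + exp(-2n)) - 1."""
    return 2.0 / (1.0 + np.exp(-2.0 * n)) - 1.0

def classify(IW, b1, LW, b2, x):
    """Two-layer forward pass mirroring classify(); x is a column vector."""
    y1 = tansig(IW @ x + b1)
    return tansig(LW @ y1 + b2)

# Hypothetical random weights, only to exercise the shapes:
rng = np.random.default_rng(0)
IW, b1 = rng.normal(size=(20, 200)), rng.normal(size=(20, 1))
LW, b2 = rng.normal(size=(2, 20)), rng.normal(size=(2, 1))
x = rng.normal(size=(200, 1))

n = np.linspace(-3, 3, 7)
print(np.allclose(tansig(n), np.tanh(n)))  # True
print(classify(IW, b1, LW, b2, x).shape)   # (2, 1)
```

In practice, IW, b1, LW and b2 would be filled from net.IW{1}, net.b{1}, net.LW{2} and net.b{2}, and np.tanh could replace tansig directly.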
References
[1] Howard Demuth, Mark Beale, Martin Hagan: Neural Network Toolbox 6 - User Guide, MATLAB
[2] Mathworks, tansig - Hyperbolic tangent sigmoid transfer function, MATLAB Documentation center
Additional notes
Take a look at robott's answer and Sangeun Chi's answer for more details.

Thanks to VitoShadow's and robott's answers, I can export MATLAB neural network values to other applications.
I really appreciate them, but I found some small errors in their code and want to correct them.
1) In the VitoShadow codes,
Results = tansig(net.LW{2} * y1 + net.b{2});
-> Results = net.LW{2} * y1 + net.b{2};
2) In the robott preprocessing codes,
It would be easier extracting xmax and xmin from the net variable than calculating them.
xmax = net.inputs{1}.processSettings{1}.xmax
xmin = net.inputs{1}.processSettings{1}.xmin
3) In the robott postprocessing codes,
xmax = net.outputs{2}.processSettings{1}.xmax
xmin = net.outputs{2}.processSettings{1}.xmin
Results = (ymax-ymin)*(Results-xmin)/(xmax-xmin) + ymin;
-> Results = (Results-ymin)*(xmax-xmin)/(ymax-ymin) + xmin;
You can manually check and confirm the values as follows:
p2 = mapminmax('apply', net(:, 1), net.inputs{1}.processSettings{1})
-> preprocessed data
y1 = purelin ( net.LW{2} * tansig(net.iw{1}* p2 + net.b{1}) + net.b{2})
-> Neural Network processed data
y2 = mapminmax( 'reverse' , y1, net.outputs{2}.processSettings{1})
-> postprocessed data
Reference:
http://www.mathworks.com/matlabcentral/answers/14517-processing-of-i-p-data

This is a small improvement to Vito Gentile's great answer.
If you want to use the 'mapminmax' preprocessing and postprocessing functions, you have to pay attention, because 'mapminmax' in Matlab normalizes by ROW and not by column!
This is what you need to add to the "classify" function above to keep the pre/post-processing consistent:
[m n] = size(input);
ymax = 1;
ymin = -1;
for i=1:m
xmax = max(input(i,:));
xmin = min(input(i,:));
for j=1:n
input(i,j) = (ymax-ymin)*(input(i,j)-xmin)/(xmax-xmin) + ymin;
end
end
And this at the end of the function:
ymax = 1;
ymin = 0;
xmax = 1;
xmin = -1;
Results = (ymax-ymin)*(Results-xmin)/(xmax-xmin) + ymin;
This is Matlab code, but it can be easily read as pseudocode.
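The same row-wise scaling and its inverse can be written in vectorized form. This is a NumPy sketch of the formulas above, not MATLAB's own implementation; the function names and the sample matrix are illustrative, and rows are treated as features with columns as samples:

```python
import numpy as np

def mapminmax_apply(x, xmin, xmax, ymin=-1.0, ymax=1.0):
    """Scale each row of x from [xmin, xmax] to [ymin, ymax]."""
    return (ymax - ymin) * (x - xmin) / (xmax - xmin) + ymin

def mapminmax_reverse(y, xmin, xmax, ymin=-1.0, ymax=1.0):
    """Undo mapminmax_apply with the same settings."""
    return (y - ymin) * (xmax - xmin) / (ymax - ymin) + xmin

# Rows are features, columns are samples.
x = np.array([[0.0, 5.0, 10.0],
              [100.0, 150.0, 200.0]])
xmin = x.min(axis=1, keepdims=True)  # per-row minimum
xmax = x.max(axis=1, keepdims=True)  # per-row maximum

scaled = mapminmax_apply(x, xmin, xmax)
print(scaled.tolist())  # [[-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0]]
print(np.allclose(mapminmax_reverse(scaled, xmin, xmax), x))  # True
```

When evaluating a trained network, xmin and xmax should come from the training data (or from the network's stored process settings) rather than from the new input, and a constant row (xmax equal to xmin) would need special handling to avoid division by zero.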
Hope this will be helpful!

I tried to implement a simple 2-layer NN in C++ using OpenCV and then exported the weights to Android, which worked quite well. I wrote a small script which generates a header file with the learned weights, and this is used in the following code snippet.
// Map Minimum and Maximum Input Processing Function
Mat mapminmax_apply(Mat x, Mat settings_gain, Mat settings_xoffset, double settings_ymin){
Mat y;
subtract(x, settings_xoffset, y);
multiply(y, settings_gain, y);
add(y, settings_ymin, y);
return y;
/* MATLAB CODE
y = x - settings_xoffset;
y = y .* settings_gain;
y = y + settings_ymin;
*/
}
// Sigmoid Symmetric Transfer Function
Mat transig_apply(Mat n){
Mat tempexp;
exp(-2*n, tempexp);
Mat transig_apply_result = 2 /(1 + tempexp) - 1;
return transig_apply_result;
}
// Map Minimum and Maximum Output Reverse-Processing Function
Mat mapminmax_reverse(Mat y, Mat settings_gain, Mat settings_xoffset, double settings_ymin){
Mat x;
subtract(y, settings_ymin, x);
divide(x, settings_gain, x);
add(x, settings_xoffset, x);
return x;
/* MATLAB CODE
function x = mapminmax_reverse(y,settings_gain,settings_xoffset,settings_ymin)
x = y - settings_ymin;
x = x ./ settings_gain;
x = x + settings_xoffset;
end
*/
}
Mat getNNParameter (Mat x1)
{
// convert double array to MAT
// input 1
Mat x1_step1_xoffsetM = Mat(1, 48, CV_64FC1, x1_step1_xoffset).t();
Mat x1_step1_gainM = Mat(1, 48, CV_64FC1, x1_step1_gain).t();
double x1_step1_ymin = -1;
// Layer 1
Mat b1M = Mat(1, 25, CV_64FC1, b1).t();
Mat IW1_1M = Mat(48, 25, CV_64FC1, IW1_1).t();
// Layer 2
Mat b2M = Mat(1, 48, CV_64FC1, b2).t();
Mat LW2_1M = Mat(25, 48, CV_64FC1, LW2_1).t();
// input 1
Mat y1_step1_gainM = Mat(1, 48, CV_64FC1, y1_step1_gain).t();
Mat y1_step1_xoffsetM = Mat(1, 48, CV_64FC1, y1_step1_xoffset).t();
double y1_step1_ymin = -1;
// ===== SIMULATION ========
// Input 1
Mat xp1 = mapminmax_apply(x1, x1_step1_gainM, x1_step1_xoffsetM, x1_step1_ymin);
Mat temp = b1M + IW1_1M*xp1;
// Layer 1
Mat a1M = transig_apply(temp);
// Layer 2
Mat a2M = b2M + LW2_1M*a1M;
// Output 1
Mat y1M = mapminmax_reverse(a2M, y1_step1_gainM, y1_step1_xoffsetM, y1_step1_ymin);
return y1M;
}
An example of a bias in the header could be this:
static double b2[1][48] = {
{-0.19879, 0.78254, -0.87674, -0.5827, -0.017464, 0.13143, -0.74361, 0.4645, 0.25262, 0.54249, -0.22292, -0.35605, -0.42747, 0.044744, -0.14827, -0.27354, 0.77793, -0.4511, 0.059346, 0.29589, -0.65137, -0.51788, 0.38366, -0.030243, -0.57632, 0.76785, -0.36374, 0.19446, 0.10383, -0.57989, -0.82931, 0.15301, -0.89212, -0.17296, -0.16356, 0.18946, -1.0032, 0.48846, -0.78148, 0.66608, 0.14946, 0.1972, -0.93501, 0.42523, -0.37773, -0.068266, -0.27003, 0.1196}};
Now that Google has published TensorFlow, this has become obsolete.

Hence the solution becomes (after correcting all parts)
Here I am giving a solution in Matlab, but if you have tanh() function, you may easily convert it to any programming language. It is for just showing the fields from network object and the operations you need.
Assume you have a trained ann (network object) that you want to export
Assume that the name of the trained ann is trained_ann
Here is the script for exporting and testing.
Testing script compares original network result with my_ann_evaluation() result
% Export IT
exported_ann_structure = my_ann_exporter(trained_ann);
% Run and Compare
% Works only for single INPUT vector
% Please extend it to MATRIX version by yourself
input = [12 3 5 100];
res1 = trained_ann(input')';
res2 = my_ann_evaluation(exported_ann_structure, input')';
where you need the following two functions
First my_ann_exporter:
function [ my_ann_structure ] = my_ann_exporter(trained_netw)
% Just for extracting as Structure object
my_ann_structure.input_ymax = trained_netw.inputs{1}.processSettings{1}.ymax;
my_ann_structure.input_ymin = trained_netw.inputs{1}.processSettings{1}.ymin;
my_ann_structure.input_xmax = trained_netw.inputs{1}.processSettings{1}.xmax;
my_ann_structure.input_xmin = trained_netw.inputs{1}.processSettings{1}.xmin;
my_ann_structure.IW = trained_netw.IW{1};
my_ann_structure.b1 = trained_netw.b{1};
my_ann_structure.LW = trained_netw.LW{2};
my_ann_structure.b2 = trained_netw.b{2};
my_ann_structure.output_ymax = trained_netw.outputs{2}.processSettings{1}.ymax;
my_ann_structure.output_ymin = trained_netw.outputs{2}.processSettings{1}.ymin;
my_ann_structure.output_xmax = trained_netw.outputs{2}.processSettings{1}.xmax;
my_ann_structure.output_xmin = trained_netw.outputs{2}.processSettings{1}.xmin;
end
Second my_ann_evaluation:
function [ res ] = my_ann_evaluation(my_ann_structure, input)
% Works with only single INPUT vector
% Matrix version can be implemented
ymax = my_ann_structure.input_ymax;
ymin = my_ann_structure.input_ymin;
xmax = my_ann_structure.input_xmax;
xmin = my_ann_structure.input_xmin;
input_preprocessed = (ymax-ymin) * (input-xmin) ./ (xmax-xmin) + ymin;
% Pass it through the ANN matrix multiplication
y1 = tanh(my_ann_structure.IW * input_preprocessed + my_ann_structure.b1);
y2 = my_ann_structure.LW * y1 + my_ann_structure.b2;
ymax = my_ann_structure.output_ymax;
ymin = my_ann_structure.output_ymin;
xmax = my_ann_structure.output_xmax;
xmin = my_ann_structure.output_xmin;
res = (y2-ymin) .* (xmax-xmin) /(ymax-ymin) + xmin;
end
