Am I able to calculate autograd of NN outputs wrt network inputs in Flux.jl?

I have a NN that is a function f:(t, v) -> (x,z). Am I able to calculate an autograd partial derivative df/dt? I want to use the autograd calculation in a regularization term in my loss function.
yhat = net((t,v))
#calculate current value of df/dt here
penalized_loss(yhat, y) = loss(yhat, y) + penalty(df/dt)
I want to do something like
df/dt = gradient(net,t)
but I don't know how to tell the gradient function what the input (t) is

Based on the documentation, you can use the gradient this way:
function my_custom_train!(loss, ps, data, opt)
    ps = Params(ps)
    for d in data
        gs = gradient(ps) do
            loss(d...)
        end
        update!(opt, ps, gs)
    end
end
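The gradient(ps) do ... end block is the Julian do-block idiom for passing an anonymous function as the first argument, i.e. it is equivalent to:
gradient(() -> loss(d...), ps)
gradient comes from Zygote.jl; you can read more about it in the Zygote.jl documentation.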
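Note that the loop above differentiates with respect to the parameters, while the question asks for the derivative with respect to the input t. In Zygote the explicit form gradient(f, args...) differentiates a scalar-valued f with respect to the arguments that are passed, so wrapping the network in a function of t is one likely route. The underlying idea is sketched below in plain Python with forward-mode dual numbers rather than Flux or Zygote. The Dual class, the toy net and all of its weights are hypothetical and only illustrate how d(x, z)/dt falls out of a single forward pass.

```python
import math


class Dual:
    """A number value + deriv * eps with eps**2 == 0 (forward-mode AD)."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (a * b)' = a' * b + a * b'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def tanh(d):
    """tanh for dual numbers, using d/du tanh(u) = 1 - tanh(u)**2."""
    y = math.tanh(d.value)
    return Dual(y, (1.0 - y * y) * d.deriv)


def net(t, v):
    """Hypothetical tiny network f(t, v) -> (x, z) with made-up weights."""
    h1 = tanh(0.5 * t + -0.3 * v + 0.1)
    h2 = tanh(-0.2 * t + 0.8 * v)
    x = 1.5 * h1 + -0.7 * h2
    z = 0.4 * h1 + 0.9 * h2
    return x, z


def d_net_dt(t, v):
    """Return (dx/dt, dz/dt) by seeding t with derivative 1 and v with 0."""
    x, z = net(Dual(t, 1.0), Dual(v, 0.0))
    return x.deriv, z.deriv
```

Calling d_net_dt(0.3, -1.2) returns the pair (dx/dt, dz/dt) at that input, which is the quantity a penalty term such as penalty(df/dt) would consume.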

If it is a feed-forward neural network you can use BetaML and in particular copy the getGradient function ("step 1" and "step 2") to retrieve the backward stack up to the inputs.

Related

Fitting a neural network with ReLUs to polynomial functions

Out of curiosity I am trying to fit a neural network with rectified linear units to polynomial functions.
For example, I would like to see how easy (or difficult) it is for a neural network to come up with an approximation for the function f(x) = x^2 + x. The following code should be able to do it, but seems to not learn anything. When I run
using Base.Iterators: repeated
ENV["JULIA_CUDA_SILENT"] = true
using Flux
using Flux: throttle
using Random
f(x) = x^2 + x
x_train = shuffle(1:1000)
y_train = f.(x_train)
x_train = hcat(x_train...)
m = Chain(
    Dense(1, 45, relu),
    Dense(45, 45, relu),
    Dense(45, 1),
    softmax
)
function loss(x, y)
    Flux.mse(m(x), y)
end
evalcb = () -> @show(loss(x_train, y_train))
opt = ADAM()
@show loss(x_train, y_train)
dataset = repeated((x_train, y_train), 50)
Flux.train!(loss, params(m), dataset, opt, cb = throttle(evalcb, 10))
println("Training finished")
@show m([20])
it returns
loss(x_train, y_train) = 2.0100101f14
loss(x_train, y_train) = 2.0100101f14
loss(x_train, y_train) = 2.0100101f14
Training finished
m([20]) = Float32[1.0]
Does anyone here see how I could make the network fit f(x) = x^2 + x?
There seem to be a couple of things wrong with your attempt, which mostly have to do with how you use your optimizer and treat your input -- nothing wrong with Julia or Flux. The solution provided here does learn, but is by no means optimal.
It makes no sense to have softmax output activation on a regression problem. Softmax is used in classification problems where the output(s) of your model represent probabilities and therefore should be on the interval (0,1). It is clear your polynomial has values outside this interval. It is usual to have linear output activation in regression problems like these. This means in Flux no output activation should be defined on the output layer.
The shape of your data matters. train! computes gradients for loss(d...) where d is a batch in your data. In your case a minibatch consists of 1000 samples, and this same batch is repeated 50 times. Neural nets are often trained with smaller batch sizes, but a larger sample set. In the code I provided all batches consist of different data.
For training neural nets, in general, it is advised to normalize your input. Your input takes values from 1 to 1000. My example applies a simple linear transformation to get the input data in the right range.
Normalization can also apply to the output. If the outputs are large, this can result in (too) large gradients and weight updates. Another approach is to lower the learning rate a lot.
using Flux
using Flux: @epochs
using Random
normalize(x) = x/1000
function generate_data(n)
    f(x) = x^2 + x
    xs = reduce(hcat, rand(n)*1000)
    ys = f.(xs)
    (normalize(xs), normalize(ys))
end
batch_size = 32
num_batches = 10000
# Generate a fresh batch per iteration; Iterators.repeated would reuse a single batch
data_train = [generate_data(batch_size) for _ in 1:num_batches]
data_test = generate_data(100)
model = Chain(Dense(1,40, relu), Dense(40,40, relu), Dense(40, 1))
loss(x,y) = Flux.mse(model(x), y)
opt = ADAM()
ps = Flux.params(model)
Flux.train!(loss, ps, data_train, opt , cb = () -> @show loss(data_test...))
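The remark about softmax also explains the constant output m([20]) = Float32[1.0] in the question: softmax normalizes its inputs so that they sum to one, and with a single output unit the only possible result is 1. A minimal sketch in plain Python (not Flux code; the softmax helper here is written out by hand for illustration):

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


# With one output unit the result is 1.0 whatever the pre-activation is,
# so the network can never represent values such as f(20) = 420.
single_large = softmax([420.0])
single_negative = softmax([-7.5])
```

Both single_large and single_negative are [1.0], so the loss cannot decrease no matter how the weights change.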

How to give discrete input to a plant equation?

I have a plant equation, say:
Plant = tf([0 1] ,[1 1],'InputDelay',1);
t = 1:1:100;
Now I have an input value a = 0.0552 at the time instant t = 1. I want to calculate the output of the plant at t = 1 (which should be a numeric value as well). How do I do that?
If I give the input a(1) = 0.5552 at t = 1, then y (the output) is calculated based only on a(1).
Similarly, at t = 2 my input is a(2) = 0.4481 (say), at t = 3 it is a(3) = 0.4100, and so on. How would I then be able to get the proper values y(t1, a1), y(t2, a2), ...?
You basically have a step input of value 0.0552. You can easily use the function step for this as:
Plant = tf([0 1] ,[1 1],'InputDelay',1);
t = 1:1:100;
opt = stepDataOptions;
opt.StepAmplitude = 0.0552;
step(Plant, t, opt);
That will create the following plot:
If you do not want a plot but only the response values, capture the output of step:
y=step(Plant, t, opt);
I found this after a long search: instead of using the equation in Laplace form, we have to use the differential form, which can then be solved with the Runge-Kutta method (that is, to get the output as a numerical approximation).
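The Runge-Kutta idea can be sketched as follows, in plain Python rather than MATLAB. The transfer function 1/(s+1) corresponds to dy/dt = -y + u, and the input delay of 1 shifts the input by one sample. The sketch assumes things the question does not state: time starts at 0, the state starts at 0, and each sample a(k) is held constant for one period (zero-order hold). The function names are hypothetical. In MATLAB itself, lsim is the function intended for simulating a system with an arbitrary input sequence.

```python
import math


def plant_rhs(y, u):
    """Right-hand side of dy/dt = -y + u, i.e. the plant 1/(s + 1)."""
    return -y + u


def rk4_step(f, y, u, h):
    """One classical fourth-order Runge-Kutta step with the input u held constant."""
    k1 = f(y, u)
    k2 = f(y + 0.5 * h * k1, u)
    k3 = f(y + 0.5 * h * k2, u)
    k4 = f(y + h * k3, u)
    return y + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


def exact_step(y, u, h):
    """Closed-form solution over one hold period, useful for checking RK4."""
    decay = math.exp(-h)
    return decay * y + (1.0 - decay) * u


def simulate(inputs, period=1.0, delay_periods=1, substeps=20):
    """Return the output at every sample instant, starting from y(0) = 0."""
    y = 0.0
    outputs = [y]
    h = period / substeps
    # The input delay means the plant sees zeros before the first sample arrives.
    delayed = [0.0] * delay_periods + list(inputs)
    for u in delayed[:len(inputs)]:
        for _ in range(substeps):
            y = rk4_step(plant_rhs, y, u, h)
        outputs.append(y)
    return outputs


ys = simulate([0.5552, 0.4481, 0.4100])
```

Here ys[k] is the output k periods after the start, so every input sample yields one numeric output value. Under these assumptions the effect of a(1) only shows up one period later because of the delay.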

How can I plot data to a “best fit” cos² graph in Matlab?

I’m currently a Physics student and for several weeks have been compiling data related to ‘Quantum Entanglement’. I’ve now got to a point where I have to plot my data (which should resemble a cos² graph - and does) to a sort of “best fit” cos² graph. The lab script says the following:
A more precise determination of the visibility V (this is basically how 'clean' the data is) follows from the best fit to the measured data using the function:
f(b) = (A/2) [1 - V sin((b - b_center)/P)]
Granted this probably doesn’t mean much out of context, but essentially A is the amplitude, b is an angle and P is the periodicity. Hence this is also a “wave” like the experimental data I have found.
From this I understand, as previously mentioned, I am making a “best fit” curve. However, I have been told that this isn’t possible with Excel and that the best approach is Matlab.
I know intermediate JavaScript but do not know Matlab and was hoping for some direction.
Is there a tutorial I can read for this? Is it possible for someone to go through it with me? I really have no idea what it entails, so any feed back would be greatly appreciated.
Thanks a lot!
Initial steps
I guess we should begin by getting a representation in Matlab of the function that you're trying to model. A direct translation of your formula looks like this:
function y = targetfunction(A,V,P,bc,b)
y = (A/2) * (1 - V * sin((b-bc) / P));
end
Getting hold of the data
My next step is going to be to generate some data to work with (you'll use your own data, naturally). So here's a function that generates some noisy data. Notice that I've supplied some values for the parameters.
function [y b] = generateData(npoints,noise)
A = 2;
V = 1;
P = 0.7;
bc = 0;
b = 2 * pi * rand(npoints,1);
y = targetfunction(A,V,P,bc,b) + noise * randn(npoints,1);
end
The function rand generates random points on the interval [0,1], and I multiplied those by 2*pi to get points randomly on the interval [0, 2*pi]. I then applied the target function at those points, and added a bit of noise (the function randn generates normally distributed random variables).
Fitting parameters
The most complicated function is the one that fits a model to your data. For this I use the function fminunc, which does unconstrained minimization. The routine looks like this:
function [A V P bc] = bestfit(y,b)
x0(1) = 1; %# A
x0(2) = 1; %# V
x0(3) = 0.5; %# P
x0(4) = 0; %# bc
f = @(x) norm(y - targetfunction(x(1),x(2),x(3),x(4),b));
x = fminunc(f,x0);
A = x(1);
V = x(2);
P = x(3);
bc = x(4);
end
Let's go through it line by line. To minimize a function in Matlab, it needs to take a single vector as a parameter. Therefore we have to pack our four parameters into a vector, and the first four lines build the initial guess x0 in that form. I used values that are close to, but not the same as, the ones that I used to generate the data.
Then I define the function f that I want to minimize. It takes a single argument x, which it unpacks and feeds to targetfunction, along with the points b in our dataset. Hopefully the results are close to y. We measure how far they are from y by subtracting them from y and applying the function norm, which squares every component, adds them up and takes the square root (i.e. it computes the Euclidean norm of the residual, which is the root-mean-square error scaled by the square root of the number of points).
Then I call fminunc with our function to be minimized, and the initial guess for the parameters. This uses an internal routine to find the closest match for each of the parameters, and returns them in the vector x.
Finally, I unpack the parameters from the vector x.
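To make the objective concrete, here is a minimal sketch in plain Python (not MATLAB) of the quantity handed to the minimizer. It assumes noise-free data generated from the same parameter values used in generateData, sampled on an evenly spaced grid instead of random points. The names residual_norm and rms_error are hypothetical. The sketch only evaluates the objective and does not perform the minimization that fminunc does.

```python
import math


def target_function(A, V, P, bc, b):
    """Direct translation of the model (A/2) * (1 - V * sin((b - bc) / P))."""
    return (A / 2.0) * (1.0 - V * math.sin((b - bc) / P))


def residual_norm(params, bs, ys):
    """Euclidean norm of the residual: square root of the sum of squared errors."""
    A, V, P, bc = params
    return math.sqrt(sum((y - target_function(A, V, P, bc, b)) ** 2
                         for b, y in zip(bs, ys)))


def rms_error(params, bs, ys):
    """Root-mean-square error: the residual norm divided by sqrt(N)."""
    return residual_norm(params, bs, ys) / math.sqrt(len(bs))


true_params = (2.0, 1.0, 0.7, 0.0)      # values used to generate the data
initial_guess = (1.0, 1.0, 0.5, 0.0)    # starting point given to the minimizer

bs = [2.0 * math.pi * i / 50 for i in range(50)]
ys = [target_function(*true_params, b) for b in bs]

at_truth = residual_norm(true_params, bs, ys)
at_guess = residual_norm(initial_guess, bs, ys)
```

The objective is zero at the generating parameters and positive at the initial guess, which is the gap the minimizer works to close. Because the norm and the RMS error differ only by a constant factor, both are minimized by the same parameters.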
Putting it all together
We now have all the components we need, so we just want one final function to tie them together. Here it is:
function master
%# Generate some data (you should read in your own data here)
[f b] = generateData(1000,1);
%# Find the best fitting parameters
[A V P bc] = bestfit(f,b);
%# Print them to the screen
fprintf('A = %f\n',A)
fprintf('V = %f\n',V)
fprintf('P = %f\n',P)
fprintf('bc = %f\n',bc)
%# Make plots of the data and the function we have fitted
plot(b,f,'.');
hold on
plot(sort(b),targetfunction(A,V,P,bc,sort(b)),'r','LineWidth',2)
end
If I run this function, I see this being printed to the screen:
>> master
Local minimum found.
Optimization completed because the size of the gradient is less than
the default value of the function tolerance.
A = 1.991727
V = 0.979819
P = 0.695265
bc = 0.067431
And the following plot appears:
That fit looks good enough to me. Let me know if you have any questions about anything I've done here.
I am a bit surprised as you mention f(a) and your function does not contain an a, but in general, suppose you want to plot f(x) = cos(x)^2
First determine for which values of x you want to make a plot, for example
xmin = 0;
stepsize = 1/100;
xmax = 6.5;
x = xmin:stepsize:xmax;
y = cos(x).^2;
plot(x,y)
However, note that this approach works just as well in excel, you just have to do some work to get your x values and function in the right cells.

Discrete Fourier Transform in Matlab

I have been asked to write a mixed-radix FFT in Matlab, but before that I want to implement a discrete Fourier transform in a straightforward way. So I decided to write the code according to the formula as defined on Wikipedia.
[Sorry I'm not allowed to post images yet]
http://en.wikipedia.org/wiki/Discrete_Fourier_transform
So I wrote my code as follows:
%Brute-force discrete Fourier transform
function Y = dft(X)
%Get the number of samples in X
N = numel(X);
%====================
%Preallocate the output array
Y = zeros(1, N);
%=========================================
for k = 0 : (N-1)
    st = 0; %running sum for frequency bin k
    for n = 0 : (N-1)
        t = X(n+1)*exp(-1i*2*pi*k*n/N);
        st = st + t;
    end
    Y(k+1) = st;
end
end
%=============================================
However, my code seems to be outputting a result different from the ones from this website:
http://www.random-science-tools.com/maths/FFT.htm
Can you please help me detect where exactly is the problem?
Thank you!
============
Never mind it seems that my code is correct....
By default the calculator in the web link applies a window function to the data before doing the FFT. Could that be the reason for the difference? You can turn windowing off from the drop down menu.
BTW, there is a built-in fft function in Matlab.
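The effect of windowing can be checked with a small sketch in plain Python (not MATLAB). It implements the same direct summation and compares the spectrum of a constant signal with and without a window. The Hann window is only an assumed example; the source does not say which window the web calculator applies.

```python
import cmath
import math


def dft(x):
    """Direct O(N^2) discrete Fourier transform of a sequence."""
    n_samples = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_samples)
                for n in range(n_samples))
            for k in range(n_samples)]


def hann(n_samples):
    """Symmetric Hann window (an assumed window, for illustration only)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]


signal = [1.0, 1.0, 1.0, 1.0]
spectrum = dft(signal)

# Multiplying by the window before the transform changes every bin.
windowed_signal = [s * w for s, w in zip(signal, hann(len(signal)))]
windowed_spectrum = dft(windowed_signal)
```

For this constant signal the unwindowed DC bin is 4 and all other bins vanish, while the windowed DC bin drops to 1.5. A mismatch of this kind is what you would see when comparing an unwindowed transform against a tool that applies a window by default.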

Creating and training a perceptron in MATLAB for gender classification

I am coding a perceptron to learn to categorize gender in pictures of faces. I am very very new to MATLAB, so I need a lot of help. I have a few questions:
I am trying to code for a function:
function [y] = testset(x,w)
%y = sign(sigma(x*w-threshold))
where y is the predicted results, x is the training/testing set put in as a very large matrix, and w is weight on the equation. The part after the % is what I am trying to write, but I do not know how to write this in MATLAB code. Any ideas out there?
I am trying to code a second function:
function [err] = testerror(x,w,y)
%err = sigma(max(0,-w*x*y))
w, x, and y have the same values as stated above, and err is my function of error, which I am trying to minimize through the steps of the perceptron.
I am trying to create a step in my perceptron to lower the percent of error by using gradient descent on my original equation. Does anyone know how I can increment w using gradient descent in order to minimize the error function using an if then statement?
I can put up the code I have up till now if that would help you answer these questions.
Thank you!
edit--------------------------
OK, so I am still working on the code for this, and would like to put it up when I have something more complete. My biggest question right now is:
I have the following function:
function [y] = testset(x,w)
y = sign(sum(x*w-threshold))
Now I know that I am supposed to put a threshold in, but cannot figure out what I am supposed to put in as the threshold! any ideas out there?
edit----------------------------
this is what I have so far. Changes still need to be made to it, but I would appreciate input, especially regarding structure, and advice for making the changes that need to be made!
function [y] = Perceptron_Aviva(X,w)
y = sign(sum(X*w-1));
end
function [err] = testerror(X,w,y)
err = sum(max(0,-w*X*y));
end
%function [w] = perceptron(X,Y,w_init)
%w = w_init;
%end
%------------------------------
% input samples
X = X_train;
% output class [-1,+1];
Y = y_train;
% init weigth vector
w_init = zeros(size(X,1));
w = w_init;
%---------------------------------------------
loopcounter = 0
while abs(err) > 0.1 && loopcounter < 100
for j=1:size(X,1)
approx_y(j) = Perceptron_Aviva(X(j),w(j))
err = testerror(X(j),w(j),approx_y(j))
if err > 0 %wrong (structure is correct, test is wrong)
w(j) = w(j) - 0.1 %wrong
elseif err < 0 %wrong
w(j) = w(j) + 0.1 %wrong
end
% -----------
% if sign(w'*X(:,j)) ~= Y(j) %wrong decision?
% w = w + X(:,j) * Y(j); %then add (or subtract) this point to w
end
You can read this question I asked some time ago.
It uses MATLAB code and a function perceptron:
function [w] = perceptron(X,Y,w_init)
w = w_init;
for iteration = 1 : 100 %<- in practice, use some stopping criterion!
    for ii = 1 : size(X,2) %cycle through training set
        if sign(w'*X(:,ii)) ~= Y(ii) %wrong decision?
            w = w + X(:,ii) * Y(ii); %then add (or subtract) this point to w
        end
    end
    sum(sign(w'*X)~=Y)/size(X,2) %show misclassification rate
end
and it is called from code (by @Itamar Katz) like this (random data):
% input samples
X1=[rand(1,100);rand(1,100);ones(1,100)]; % class '-1'
X2=[rand(1,100);1+rand(1,100);ones(1,100)]; % class '+1'
X=[X1,X2];
% output class [-1,+1];
Y=[-ones(1,100),ones(1,100)];
% init weight vector
w=[.5 .5 .5]';
% call perceptron
wtag=perceptron(X,Y,w);
% predict
ytag=wtag'*X;
% plot prediction over original data
figure;hold on
plot(X1(1,:),X1(2,:),'b.')
plot(X2(1,:),X2(2,:),'r.')
plot(X(1,ytag<0),X(2,ytag<0),'bo')
plot(X(1,ytag>0),X(2,ytag>0),'ro')
legend('class -1','class +1','pred -1','pred +1')
I guess this can give you an idea of how to write the functions you described.
To compute the error, compare the expected result with the actual result (class).
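The update rule used in the perceptron function above (add or subtract a misclassified point) can be sketched in plain Python rather than MATLAB. The four data points are hypothetical, and as in the random-data example the last feature is a constant 1, so its weight plays the role of the threshold (bias). The early stop when an epoch has no mistakes is an added stopping criterion, not part of the original function.

```python
def sign(value):
    """Return -1, 0 or 1, like MATLAB's sign."""
    return (value > 0) - (value < 0)


def dot(w, x):
    """Inner product of two equal-length sequences."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_perceptron(samples, labels, w_init, max_epochs=100):
    """Perceptron rule: on a wrong decision, add y * x to the weights."""
    w = list(w_init)
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            if sign(dot(w, x)) != y:
                w = [wi + xi * y for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:  # every sample is classified correctly
            break
    return w


# Hypothetical, linearly separable data; the third feature is the constant bias input.
samples = [(2.0, 2.0, 1.0), (3.0, 1.0, 1.0), (-1.0, -1.0, 1.0), (-2.0, 0.0, 1.0)]
labels = [1, 1, -1, -1]

w = train_perceptron(samples, labels, [0.0, 0.0, 0.0])
predictions = [sign(dot(w, x)) for x in samples]
```

Comparing predictions with labels gives the misclassification count, which is the error measure described above.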
Assume your dataset is X, the data points, and Y, the labels of the classes.
f=newp(X,Y)
creates a perceptron.
If you want to create an MLP then:
f=newff(X,Y,NN)
where NN is the network architecture, i.e. an array that designates the number of neurons at each hidden layer. For example
NN=[5 3 2]
will correspond to a network with 5 neurons in the first hidden layer, 3 in the second and 2 in the third.
Well, what you call the threshold is the bias in machine learning nomenclature. This should be left as an input for the user because it is used during training.
Also, I wonder why you are not using the built-in MATLAB functions, i.e. newp or newff, e.g.
ff=newp(X,Y)
Then you can set the properties of the object ff to do your job for selecting gradient descent and so on.