Why is this the correct way to do a cost function for a neural network? - matlab

So after beating my head against the wall for a few hours, I looked online for a solution to my problem, and it worked great. I just want to know what caused the issue with the way I was originally going about it.
here are some more details. The input is a 20x20px image from the MNIST datset, and there are 5000 samples, so X, or A1 is 5000x400. There are 25 nodes in the single hidden layer. The output is a one hot vector of 0-9 digits. y (not Y, which is the one hot encoding of y) is a 5000x1 vector with the value of 1-10.
Here was my original code for the cost function:
Y = zeros(m, num_labels);
for i = 1:m
Y(i, y(i)) = 1;
endfor
H = sigmoid(Theta2*[ones(1,m);sigmoid(Theta1*[ones(m, 1) X]'))
J = (1/m) * sum(sum((-Y*log(H]))' - (1-Y)*log(1-H]))')))
But then I found this:
A1 = [ones(m, 1) X];
Z2 = A1 * Theta1';
A2 = [ones(size(Z2, 1), 1) sigmoid(Z2)];
Z3 = A2*Theta2';
H = A3 = sigmoid(Z3);
J = (1/m)*sum(sum((-Y).*log(H) - (1-Y).*log(1-H), 2));
I see that this may be slightly cleaner, but what functionally causes my original code to get 304.88 and the other to get ~ 0.25? Is it the element wise multiplication?
FYI, this is the same problem as this question if you need the formal equation written out.
Thanks for any help I can get! I really want to understand where I'm going wrong

Transfer from the comments:
With a quick look, in J = (1/m) * sum(sum((-Y*log(H]))' - (1-Y)*log(1-H]))'))) there is definetely something going on with the parenthesis, but probably on how you pasted it here, not with the original code as this would throw an error when you run it. If I understand correctly and Y, H are matrices, then in your 1st version Y*log(H) is matrix multiplication while in the 2nd version Y.*log(H) is an entrywise multiplication (not matrix-multiplication, just c(i,j)=a(i,j)*b(i,j) ).
Update 1:
In regards to your question in the comment.
From the first screenshot, you represent each value yk(i) in the entry Y(i,k) of the Y matrix and each value h(x^(i))k as H(i,k). So basically, for each i,k you want to compute Y(i,k) log(H(i,k)) + (1-Y(i,k)) log(1-H(i,k)). You can do it for all the values together and store the result in matrix C. Then C = Y.*log(H) + (1-Y).*log(1-H) and each C(i,k) has the above mentioned value. This is an operation .* because you want to do the operation for each element (i,k) of each matrix (in contrast to multiplying the matrices which is totally different). Afterwards, to get the sum of all the values inside the 2D dimensional matrix C, you use the octave function sum twice: sum(sum(C)) to sum both columnwise and row-wise (or as # Irreducible suggested, just sum(C(:))).
Note there may be other errors as well.

Related

Error using sin. Not enough input arguments. Why is that so?

i am solving a mathematical problem and i can't continue do to the error.
I tried all constant with sin^2(x) yet its the same.
clear
clc
t = 1:0.5:10;
theta = linspace(0,pi,19);
x = 2*sin(theta)
y = sin^2*(theta)*(t/4)
Error using sin
Not enough input arguments.
Error in lab2t114 (line 9)
y = sin^2*(theta)*(t/4)
sin is a function so it should be called as sin(value) which in this case is sin(theta) It may help to consider writing everything in intermediate steps:
temp = sin(theta);
y = temp.^2 ...
Once this is done you can always insert lines from previous calculations into the next line, inserting parentheses to ensure order of operations doesn't mess things up. Note in this case you don't really need the parentheses.
y = (sin(theta)).^2;
Finally, Matlab has matrix wise and element wise operations. The element wise operations start with a period '.' In Matlab you can look at, for example, help .* (element wise multiplication) and help * matrix wise calculation. For a scalar, like 2 in your example, this distinction doesn't matter. However for calculating y you need element-wise operations since theta and t are vectors (and in this case you are not looking to do matrix multiplication - I think ...)
t = 1:0.5:10;
theta = linspace(0,pi,19);
x = 2*sin(theta) %times scalar so no .* needed
sin_theta = sin(theta);
sin_theta_squared = sin_theta.^2; %element wise squaring needed since sin_theta is a vector
t_4 = t/4; %divide by scalar, this doesn't need a period
y = sin_theta_squared.*t_4; %element wise multiplication since both variables are arrays
OR
y = sin(theta).^2.*(t/4);
Also note these intermediate variables are largely for learning purposes. It is best not to write actual code like this since, in this case, the last line is a lot cleaner.
EDIT: Brief note, if you fix the sin(theta) error but not the .^ or .* errors, you would get some error like "Error using * Inner matrix dimensions must agree." - this is generally an indication that you forgot to use the element-wise operators

Fix matlab code with error

I understand that there is an error with dimensions in line dr=(r-v*v/2)*dT . But I have little knowledge of Matlab. Help to fix it, please. The code is small and simple. Maybe someone will find time to look
function [optionPrice] = upAndOutCallOption(S,r,v,x,b,T,dT)
t = 0;
dr=[];
pert=[];
while (t < T) & (S < b)
t = t + dT;
dr = (r - v.*v./2).*dT;
pert = v.*sqrt( dT ).*randn();
S = S.*exp(dr + pert);
end
if S<b
% Within barrier, so price as for a European option.
optionPrice = exp(-r.*T).* max(0, S - x);
else
% Hit the barrier, so the option is withdrawn.
optionPrice = 0;
end
end
Call from another function of this kind:
for k=1:amountOfOptions
[optionPrices(k)] = upAndOutCallOption(stockPrice(k)*o,riskFreeRate(k)*o,... volatility(k)*o, strike(k)*o, barrier(k)*o, timeToExpiry(k)*o, sampleRate(k)*o);
result(k) = mean(optionPrices(k));
end
Therefore, any difficulties.
It's good that you know the problem is within dr = (r - v.*v./2).*dT;. The command itself has many possible problems which also related to dimensions:
Here you are doing element-wise multiplication (because of the .*) with matrices, which requires (in the case of your command) that r has the same number of rows AND columns as v (since because of element-wise, v.*v/2 has the same size as v).
Moreover, it is unnecessary to do element-wise division with scalar number, that means there is no need to have ./2 in Matlab.
And, finally, since it's element-wise multiplication again, the matrix (r - v.*v./2) must also have the same number of rows and columns as matrix dT.
Check here for more information about Matlab's matrix operations.

Optimizing code, removing "for loop"

I'm trying to remove outliers from a tick data series, following Brownlees & Gallo 2006 (if you may be interested).
The code works fine but given that I'm working on really long vectors (the biggest has 20m observations and after 20h it was not done computing) I was wondering how to speed it up.
What I did until now is:
I changed the time and date format to numeric double and I saw that it saves quite some time in processing and A LOT OF MEMORY.
I allocated memory for the vectors:
[n] = size(price);
x = price;
score = nan(n,'double'); %using tic and toc I saw that nan requires less time than zeros
trimmed_mean = nan(n,'double');
sd = nan(n,'double');
out_mat = nan(n,'double');
Here is the loop I'd love to remove. I read that vectorizing would speed up a lot, especially using long vectors.
for i = k+1:n
trimmed_mean(i) = trimmean(x(i-k:i-1 & i+1:i+k),10,'round'); %trimmed mean computed on the 'k' closest observations to 'i' (i is excluded)
score(i) = x(i) - trimmed_mean(i);
sd(i) = std(x(i-k:i-1 & i+1:i+k)); %same as the mean
tmp = abs(score(i)) > (alpha .* sd(i) + gamma);
out_mat(i) = tmp*1;
end
Here is what I was trying to do
trimmed_mean=trimmean(regroup_matrix,10,'round',2);
score=bsxfun(#minus,x,trimmed_mean);
sd=std(regroup_matrix,2);
temp = abs(score) > (alpha .* sd + gamma);
out_mat = temp*1;
But given that I'm totally new to Matlab, I don't know how to properly construct the matrix of neighbouring observations. I just think it should be shaped like: regroup_matrix= nan (n,2*k).
EDIT: To be specific, what I am trying to do (and I am not able to) is:
Given a column vector "x" (n,1) for each observation "i" in "x" I want to take the "k" neighbouring observations to "i" (from i-k to i-1 and from i+1 to i+k) and put these observations as rows of a matrix (n, 2*k).
EDIT 2: I made a few changes to the code and I think I am getting closer to the solution. I posted another question specific to what I think is the problem now:
Matlab: Filling up matrix rows using moving intervals from a column vector without a for loop
What I am trying to do now is:
[n] = size(price,1);
x = price;
[j1]=find(x);
matrix_left=zeros(n, k,'double');
matrix_right=zeros(n, k,'double');
toc
matrix_left(j1(k+1:end),:)=x(j1-k:j1-1);
matrix_right(j1(1:end-k),:)=x(j1+1:j1+k);
matrix_group=[matrix_left matrix_right];
trimmed_mean=trimmean(matrix_group,10,'round',2);
score=bsxfun(#minus,x,trimmed_mean);
sd=std(matrix_group,2);
temp = abs(score) > (alpha .* sd + gamma);
outmat = temp*1;
I have problems with the matrix_left and matrix_right creation.
j1, that I am using for indexing is a column vector with the indices of price's observations. The output is simply
j1=[1:1:n]
price is a column vector of double with size(n,1)
For your reshape, you can do the following:
idxArray = bsxfun(#plus,(k:n)',[-k:-1,1:k]);
reshapedArray = x(idxArray);
Thanks to Jonas that showed me the way to go I came up with this:
idxArray_left=bsxfun(#plus,(k+1:n)',[-k:-1]); %matrix with index of left neighbours observations
idxArray_fill_left=bsxfun(#plus,(1:k)',[1:k]); %for observations from 1:k I take the right neighbouring observations, this way when computing mean and standard deviations there will be no problems.
matrix_left=[idxArray_fill_left; idxArray_left]; %Just join the two matrices and I have the complete matrix of left neighbours
idxArray_right=bsxfun(#plus,(1:n-k)',[1:k]); %same thing as left but opposite.
idxArray_fill_right=bsxfun(#plus,(n-k+1:n)',[-k:-1]);
matrix_right=[idxArray_right; idxArray_fill_right];
idx_matrix=[matrix_left matrix_right]; %complete index matrix, joining left and right indices
neigh_matrix=x(idx_matrix); %exactly as proposed by Jonas, I fill up a matrix of observations from 'x', following idx_matrix indexing
trimmed_mean=trimmean(neigh_matrix,10,'round',2);
score=bsxfun(#minus,x,trimmed_mean);
sd=std(neigh_matrix,2);
temp = abs(score) > (alpha .* sd + gamma);
outmat = temp*1;
Again, thanks a lot to Jonas. You really made my day!
Thanks also to everyone that had a look to the question and tried to help!

Matlab: Finding two unknown constants/parameters in an equation

I've read up on fsolve and solve, and tried various methods of curve fitting/regression but I feel I need a bit of guidance here before I spend more time trying to make something work that might be the wrong approach.
I have a series of equations I am trying to fit to a data set (x) separately:
for example:
(a+b*c)*d = x
a*(1+b*c)*d = x
x = 1.9248
3.0137
4.0855
5.0097
5.7226
6.2064
6.4655
6.5108
6.3543
6.0065
c= 0.0200
0.2200
0.4200
0.6200
0.8200
1.0200
1.2200
1.4200
1.6200
1.8200
d = 1.2849
2.2245
3.6431
5.6553
8.3327
11.6542
15.4421
19.2852
22.4525
23.8003
I know c, d and x - they are observations. My unknowns are a and b, and should be constant.
I could do it manually for each x observation but there must be an automatic and far superior way or at least another approach.
Very grateful if I could receive some guidance. Thanks for the time!
Given your two example equations; let y=x./d, then
y = a+b*c
y = a+a*b*c
The first case is just a line, for which you can obtain a least squares fit (values for a and b) with polyfit(). In the second case, you can just say k=a*b (since these are both fitted anyway), then rewrite it as:
y = a+k*c
Which is exactly the same line as the first problem, except now b = k/a. In fact, b=b1/a is the solution to the second problem where b1 is the fit from the first problem. In short, to solve them both, you need one call to polyfit() and a couple of divisions.
Will that work for you?
I see two different equations to fit here. To spell out the code:
For (a+b*c)*d = x
p = polyfit(c, x./d, 1);
a = p(2);
b = p(1);
For a*(1+b*c)*d = x
p = polyfit(c, x./d, 1);
a = p(2);
b = p(1) / a;
No need for polyfit; this is just a linear least squares problem, which is best solved with MATLAB's slash operator:
>> ab = [ones(size(c)) c] \ (x./d)
ans =
1.411437211703194e+000 % 'a'
-7.329687661579296e-001 % 'b'
Faster, cleaner, more educative :)
And, as Emmet already said, your second equation is nothing more than a different form of your first equation, the difference being that the b in your first equation, is equal to a*b in your second one.

How can I speed up this call to quantile in Matlab?

I have a MATLAB routine with one rather obvious bottleneck. I've profiled the function, with the result that 2/3 of the computing time is used in the function levels:
The function levels takes a matrix of floats and splits each column into nLevels buckets, returning a matrix of the same size as the input, with each entry replaced by the number of the bucket it falls into.
To do this I use the quantile function to get the bucket limits, and a loop to assign the entries to buckets. Here's my implementation:
function [Y q] = levels(X,nLevels)
% "Assign each of the elements of X to an integer-valued level"
p = linspace(0, 1.0, nLevels+1);
q = quantile(X,p);
if isvector(q)
q=transpose(q);
end
Y = zeros(size(X));
for i = 1:nLevels
% "The variables g and l indicate the entries that are respectively greater than
% or less than the relevant bucket limits. The line Y(g & l) = i is assigning the
% value i to any element that falls in this bucket."
if i ~= nLevels % "The default; doesnt include upper bound"
g = bsxfun(#ge,X,q(i,:));
l = bsxfun(#lt,X,q(i+1,:));
else % "For the final level we include the upper bound"
g = bsxfun(#ge,X,q(i,:));
l = bsxfun(#le,X,q(i+1,:));
end
Y(g & l) = i;
end
Is there anything I can do to speed this up? Can the code be vectorized?
If I understand correctly, you want to know how many items fell in each bucket.
Use:
n = hist(Y,nbins)
Though I am not sure that it will help in the speedup. It is just cleaner this way.
Edit : Following the comment:
You can use the second output parameter of histc
[n,bin] = histc(...) also returns an index matrix bin. If x is a vector, n(k) = >sum(bin==k). bin is zero for out of range values. If x is an M-by-N matrix, then
How About this
function [Y q] = levels(X,nLevels)
p = linspace(0, 1.0, nLevels+1);
q = quantile(X,p);
Y = zeros(size(X));
for i = 1:numel(q)-1
Y = Y+ X>=q(i);
end
This results in the following:
>>X = [3 1 4 6 7 2];
>>[Y, q] = levels(X,2)
Y =
1 1 2 2 2 1
q =
1 3.5 7
You could also modify the logic line to ensure values are less than the start of the next bin. However, I don't think it is necessary.
I think you shoud use histc
[~,Y] = histc(X,q)
As you can see in matlab's doc:
Description
n = histc(x,edges) counts the number of values in vector x that fall
between the elements in the edges vector (which must contain
monotonically nondecreasing values). n is a length(edges) vector
containing these counts. No elements of x can be complex.
I made a couple of refinements (including one inspired by Aero Engy in another answer) that have resulted in some improvements. To test them out, I created a random matrix of a million rows and 100 columns to run the improved functions on:
>> x = randn(1000000,100);
First, I ran my unmodified code, with the following results:
Note that of the 40 seconds, around 14 of them are spent computing the quantiles - I can't expect to improve this part of the routine (I assume that Mathworks have already optimized it, though I guess that to assume makes an...)
Next, I modified the routine to the following, which should be faster and has the advantage of being fewer lines as well!
function [Y q] = levels(X,nLevels)
p = linspace(0, 1.0, nLevels+1);
q = quantile(X,p);
if isvector(q), q = transpose(q); end
Y = ones(size(X));
for i = 2:nLevels
Y = Y + bsxfun(#ge,X,q(i,:));
end
The profiling results with this code are:
So it is 15 seconds faster, which represents a 150% speedup of the portion of code that is mine, rather than MathWorks.
Finally, following a suggestion of Andrey (again in another answer) I modified the code to use the second output of the histc function, which assigns entries to bins. It doesn't treat the columns independently, so I had to loop over the columns manually, but it seems to be performing really well. Here's the code:
function [Y q] = levels(X,nLevels)
p = linspace(0,1,nLevels+1);
q = quantile(X,p);
if isvector(q), q = transpose(q); end
q(end,:) = 2 * q(end,:);
Y = zeros(size(X));
for k = 1:size(X,2)
[junk Y(:,k)] = histc(X(:,k),q(:,k));
end
And the profiling results:
We now spend only 4.3 seconds in codes outside the quantile function, which is around a 500% speedup over what I wrote originally. I've spent a bit of time writing this answer because I think it's turned into a nice example of how you can use the MATLAB profiler and StackExchange in combination to get much better performance from your code.
I'm happy with this result, although of course I'll continue to be pleased to hear other answers. At this stage the main performance increase will come from increasing the performance of the part of the code that currently calls quantile. I can't see how to do this immediately, but maybe someone else here can. Thanks again!
You can sort the columns and divide+round the inverse indexes:
function Y = levels(X,nLevels)
% "Assign each of the elements of X to an integer-valued level"
[S,IX]=sort(X);
[grid1,grid2]=ndgrid(1:size(IX,1),1:size(IX,2));
invIX=zeros(size(X));
invIX(sub2ind(size(X),IX(:),grid2(:)))=grid1;
Y=ceil(invIX/size(X,1)*nLevels);
Or you can use tiedrank:
function Y = levels(X,nLevels)
% "Assign each of the elements of X to an integer-valued level"
R=tiedrank(X);
Y=ceil(R/size(X,1)*nLevels);
Surprisingly, both these solutions are slightly slower than the quantile+histc solution.