Related
So after beating my head against the wall for a few hours, I looked online for a solution to my problem, and it worked great. I just want to know what caused the issue with the way I was originally going about it.
here are some more details. The input is a 20x20px image from the MNIST datset, and there are 5000 samples, so X, or A1 is 5000x400. There are 25 nodes in the single hidden layer. The output is a one hot vector of 0-9 digits. y (not Y, which is the one hot encoding of y) is a 5000x1 vector with the value of 1-10.
Here was my original code for the cost function:
Y = zeros(m, num_labels);
for i = 1:m
Y(i, y(i)) = 1;
endfor
H = sigmoid(Theta2*[ones(1,m);sigmoid(Theta1*[ones(m, 1) X]'))
J = (1/m) * sum(sum((-Y*log(H]))' - (1-Y)*log(1-H]))')))
But then I found this:
A1 = [ones(m, 1) X];
Z2 = A1 * Theta1';
A2 = [ones(size(Z2, 1), 1) sigmoid(Z2)];
Z3 = A2*Theta2';
H = A3 = sigmoid(Z3);
J = (1/m)*sum(sum((-Y).*log(H) - (1-Y).*log(1-H), 2));
I see that this may be slightly cleaner, but what functionally causes my original code to get 304.88 and the other to get ~ 0.25? Is it the element wise multiplication?
FYI, this is the same problem as this question if you need the formal equation written out.
Thanks for any help I can get! I really want to understand where I'm going wrong
Transfer from the comments:
With a quick look, in J = (1/m) * sum(sum((-Y*log(H]))' - (1-Y)*log(1-H]))'))) there is definetely something going on with the parenthesis, but probably on how you pasted it here, not with the original code as this would throw an error when you run it. If I understand correctly and Y, H are matrices, then in your 1st version Y*log(H) is matrix multiplication while in the 2nd version Y.*log(H) is an entrywise multiplication (not matrix-multiplication, just c(i,j)=a(i,j)*b(i,j) ).
Update 1:
In regards to your question in the comment.
From the first screenshot, you represent each value yk(i) in the entry Y(i,k) of the Y matrix and each value h(x^(i))k as H(i,k). So basically, for each i,k you want to compute Y(i,k) log(H(i,k)) + (1-Y(i,k)) log(1-H(i,k)). You can do it for all the values together and store the result in matrix C. Then C = Y.*log(H) + (1-Y).*log(1-H) and each C(i,k) has the above mentioned value. This is an operation .* because you want to do the operation for each element (i,k) of each matrix (in contrast to multiplying the matrices which is totally different). Afterwards, to get the sum of all the values inside the 2D dimensional matrix C, you use the octave function sum twice: sum(sum(C)) to sum both columnwise and row-wise (or as # Irreducible suggested, just sum(C(:))).
Note there may be other errors as well.
I am fitting an exponential decay function with lsqvurcefit in Matlab. To do this I first normalize my data because they differ several orders of magnitude. However Im not sure how to denormalize my fitted parameters.
My fitting model is s = O + A * exp(-t/T) where t and s are known and t is in the order of 10^-3 and s in the order of 10^5. So I subtract from them their mean and divide them by their standarddeviation. My goal is to find the best A, O and T that at the given times t will result most near s. However I dont know how to denormalize my resulting A O and T.
Might somebody know how to do this? I only found this question on SO about normalisation, but does not really address the same problem.
When you normalize, you must record the means and standard deviations for each of your featuers. Then you can easily use those values to denormalize.
e.g.
A = [1 4 7 2 9]';
B = 100 475 989 177 399]';
So you could just normalize right away:
An = (A - mean(A)) / std(A)
but then you can't get back to the original A. So first save the means and stds.
Am = mean(A); Bm = mean(B);
As = std(A); Bs = std(B);
An = (A - Am)/As;
Bn = (B - Bm)/Bs;
now do whatever processing you want and then to denormalize:
Ad = An*As + Am;
Bd = Bn*Bs + Bm;
I'm sure you can see that that's going to be an issue if you have a lot of features (i.e. you have to type code out for each feature, what a mission!) so lets assume your data is arranged as a matrix, data, where each sample is a row and each column is a feature. Now you can do it like this:
data = [A, B]
means = mean(data);
stds = std(data);
datanorm = bsxfun(#rdivide, bsxfun(#minus, data, means), stds);
%// Do processing on datanorm
datadenorm = bsxfun(#plus, bsxfun(#times, datanorm, stds), means);
EDIT:
After you have fit your model parameters (A,O and T) using normalized t and f then your model will expect normalized inputs and produce normalized outputs. So to use it you should first normalize t and then denormalize f.
So to find a new f by running the model on a normalized new t. So f(tn) where tn = (t - tm)/ts and tm is the mean of your training (or fitting) t set and ts the std. Then to get your correct magnitude f you must denormalize only f, so the full solution would be
f(tn)*fs + fm
So once again, all you need to do is save the mean and std you used to normalize.
I am trying to determine the (x,y,z) coordinates of a point p. What I have are the distances to 4 different points m1, m2, m3, m4 with known coordinates.
In detail: what I have is the coordinates of 4 points (m1,m2,m3,m4) and they are not in the same plane:
m1: (x1,y1,z1),
m2: (x2,y2,z2),
m3: (x3,y3,z3),
m4: (x4,y4,z4)
and the Euclidean distances form m1->p, m2->p, m3->p and m4->p which are
D1 = sqrt( (x-x1)^2 + (y-y1)^2 + (z-z1)^2);
D2 = sqrt( (x-x2)^2 + (y-y2)^2 + (z-z2)^2);
D3 = sqrt( (x-x3)^2 + (y-y3)^2 + (z-z3)^2);
D4 = sqrt( (x-x4)^2 + (y-y4)^2 + (z-z4)^2);
I am looking for (x,y,z). I tried to solve this non-linear system of 4 equations and 3 unknowns with matlab fsolve by taking the euclidean distances but didn't manage.
There are two questions:
How can I find the unknown coordinates of point p: (x,y,z)
What is the minimum number of points m with known coordinates and
distances to p that I need in order to find (x,y,z)?
EDIT:
Here is a piece of code that gives no solutions:
Lets say that the points I have are:
m1 = [ 370; 1810; 863];
m2 = [1586; 185; 1580];
m3 = [1284; 1948; 348];
m4 = [1732; 1674; 1974];
x = cat(2,m1,m2,m3,m4)';
And the distance from each point to p are
d = [1387.5; 1532.5; 1104.7; 0855.6]
From what I understood if I want to run fsolve I have to use the following:
1. Create a function
2. Call fsolve
function F = calculateED(p)
m1 = [ 370; 1810; 863];
m2 = [1586; 185; 1580];
m3 = [1284; 1948; 348];
m4 = [1732; 1674; 1974];
x = cat(2,m1,m2,m3,m4)';
d = [1387.5; 1532.5; 1104.7; 0855.6]
F = [d(1,1)^2 - (p(1)-x(1,1))^2 - (p(2)-x(1,2))^2 - (p(3)-x(1,3))^2;
d(2,1)^2 - (p(1)-x(2,1))^2 - (p(2)-x(2,2))^2 - (p(3)-x(2,3))^2;
d(3,1)^2 - (p(1)-x(3,1))^2 - (p(2)-x(3,2))^2 - (p(3)-x(3,3))^2;
d(4,1)^2 - (p(1)-x(4,1))^2 - (p(2)-x(4,2))^2 - (p(3)-x(4,3))^2;];
and then call fsolve:
p0 = [1500,1500,1189]; % initial guess
options = optimset('Algorithm',{'levenberg-marquardt',.001},'Display','iter','TolX',1e-1);
[p,Fval,exitflag] = fsolve(#calculateED,p0,options);
I am running Matlab 2011b.
Am I missing something?
How would the least squares solution be?
One note here is that m1, m2, m3, m4 and d values may not be given accurately but for an analytical solution that shouldn't be a problem.
mathematica readily numericall solves the three point problem:
p = Table[ RandomReal[{-1, 1}, {3}], {3}]
r = RandomReal[{1, 2}, {3}]
Reduce[Simplify[ Table[Norm[{x, y, z} - p[[i]]] == r[[i]] , {i, 3}],
Assumptions -> {Element[x | y | z, Reals]}], {x, y, z}, Reals]
This will typically return false as random spheres will typically not have triple intersection points.
When you have a solution you'll typically have a pair like this..
(* (x == -0.218969 && y == -0.760452 && z == -0.136958) ||
(x == 0.725312 && y == 0.466006 && z == -0.290347) *)
This somewhat surprisingly has a failrly elegent analytic solution. Its a bit involved so I'll wait to see if someone has it handy and if not and there is interest I'll try to remember the steps..
Edit, approximate solution following Dmitys least squares suggestion:
p = {{370, 1810, 863}, {1586, 185, 1580}, {1284, 1948, 348}, {1732,
1674, 1974}};
r = {1387.5, 1532.5, 1104.7, 0855.6};
solution = {x, y, z} /.
Last#FindMinimum[
Sum[(Norm[{x, y, z} - p[[i]]] - r[[i]] )^2, {i, 1, 4}] , {x, y, z}]
Table[ Norm[ solution - p[[i]]], {i, 4}]
As you see you are pretty far from exact..
(* solution point {1761.3, 1624.18, 1178.65} *)
(* solution radii: {1438.71, 1504.34, 1011.26, 797.446} *)
I'll answer the second question. Let's name the unknown point X. If you have only known point A and know the distance form X to A then X can be on a sphere with the center in A.
If you have two points A,B then X is on a circle given by the intersection of the spheres with centers in A and B (if they intersect that is).
A third point will add another sphere and the final intersection between the three spheres will give two points.
The fourth point will finaly decide which of those two points you're looking for.
This is how GPS actually works. You have to have at least three satellites. Then the GPS will guess which of the two points is the correct one, since the other one is in space, but it won't be able to tell you the altitude. Technically it should, but there are also errors, so the more satellites you "see" the less the error.
Have found this question which might be a starting point.
Take first three equation and solve i for 3 equation and 3 variables in MATLAB. After solving the equation you will get two pairs of values or we can say two set of coordinates of p.
keep each set in the 4th equation and you can find out the set which satisfies the equation is the answer
I've read up on fsolve and solve, and tried various methods of curve fitting/regression but I feel I need a bit of guidance here before I spend more time trying to make something work that might be the wrong approach.
I have a series of equations I am trying to fit to a data set (x) separately:
for example:
(a+b*c)*d = x
a*(1+b*c)*d = x
x = 1.9248
3.0137
4.0855
5.0097
5.7226
6.2064
6.4655
6.5108
6.3543
6.0065
c= 0.0200
0.2200
0.4200
0.6200
0.8200
1.0200
1.2200
1.4200
1.6200
1.8200
d = 1.2849
2.2245
3.6431
5.6553
8.3327
11.6542
15.4421
19.2852
22.4525
23.8003
I know c, d and x - they are observations. My unknowns are a and b, and should be constant.
I could do it manually for each x observation but there must be an automatic and far superior way or at least another approach.
Very grateful if I could receive some guidance. Thanks for the time!
Given your two example equations; let y=x./d, then
y = a+b*c
y = a+a*b*c
The first case is just a line, for which you can obtain a least squares fit (values for a and b) with polyfit(). In the second case, you can just say k=a*b (since these are both fitted anyway), then rewrite it as:
y = a+k*c
Which is exactly the same line as the first problem, except now b = k/a. In fact, b=b1/a is the solution to the second problem where b1 is the fit from the first problem. In short, to solve them both, you need one call to polyfit() and a couple of divisions.
Will that work for you?
I see two different equations to fit here. To spell out the code:
For (a+b*c)*d = x
p = polyfit(c, x./d, 1);
a = p(2);
b = p(1);
For a*(1+b*c)*d = x
p = polyfit(c, x./d, 1);
a = p(2);
b = p(1) / a;
No need for polyfit; this is just a linear least squares problem, which is best solved with MATLAB's slash operator:
>> ab = [ones(size(c)) c] \ (x./d)
ans =
1.411437211703194e+000 % 'a'
-7.329687661579296e-001 % 'b'
Faster, cleaner, more educative :)
And, as Emmet already said, your second equation is nothing more than a different form of your first equation, the difference being that the b in your first equation, is equal to a*b in your second one.
I have intensity points which is marked as pink in above plot, and these are stored in variable and is given as
intensity_info =[ 35.9349
46.4465
46.4790
45.7496
44.7496
43.4790
42.5430
41.4351
40.1829
37.4114
33.2724
29.5447
26.8373
24.8171
24.2724
24.2487
23.5228
23.5228
24.2048
23.7057
22.5228
22.0000
21.5210
20.7294
20.5430
20.2504
20.2943
21.0219
22.0000
23.1096
25.2961
29.3364
33.4351
37.4991
40.8904
43.2706
44.9798
47.4553
48.9324
48.6855
48.5210
47.9781
47.2285
45.5342
34.2310 ];
I also have information of point A, B and C which is calculated by :
[maxtab, mintab] = peakdet(intensity_info, 1); % maxtab has A and B information and
% mintab has C information
peakdet.m matlab code can be found here: (http://www.billauer.co.il/peakdet.html). I want to calculate point D (where there is sight increase in intensity value i.e. if we come down from point A intensity decreases but at point D there is slight increase in intensity). As seen from graph below point C can also lie in the left of point D and in this case if we come down from point B intensity decrease and at D there is slight increase in intensity. Intensity values for below graph below is given as:
intensity_info =[29.3424
39.4847
43.7934
47.4333
49.9123
51.4772
52.1189
51.6601
48.8904
45.0000
40.9561
36.5868
32.5904
31.0439
29.9982
27.9579
26.6965
26.7312
28.5631
29.3912
29.7496
29.7715
29.7294
30.2706
30.1847
29.7715
29.2943
29.5667
31.0877
33.5228
36.7496
39.7496
42.5009
45.7934
49.1847
52.2048
53.9123
54.7276
54.9781
55.0000
54.9781
54.7276
53.9342
51.4246
38.2512];
and Point A ,B and C calculated in same manner as above.
How can I calculate point D in these cases?
I'm not MATLAB literate, but if the 'tabs' are subtables, then perhaps you can manipulate them to create other subtables... something like (I repeat, illiterate)
left_of_graph = part of graph from A to C
right_of_graph = part of graph from C to B
left_delta = some fraction of the difference between A's y-value and C's y-value
right_delta = some fraction of the difference between C's y-value and B's y-value
[left_maxtab,left_mintab] = peakdet(left_of_graph,left_delta)
[right_maxtab,right_mintab] = peakdet(right_of_graph,right_delta)
I do have some experience with peak analysis, so I will say this will help, not answer, the problem. You can find all the peaks you want, but that doesn't mean the data is worth squinting at. Noise and imperfect resolution are real. Happy hunting!
PS You might also scan for all points higher than both neighbors in the whole band. That is gauranteed to miss none of the 'true' maxima, but to give you more 'false' maxima than you can count (although your data looks pretty smooth!).
The solution your looking for is a numerical method based on iterations,in your particular case bisection is the one who best suits cause others like uniform sequential search do not take an interval as an input.
This is the implementation for bisection:
function [ a, b, L ] = Bisection( f, a, b, e, d )
%[a,b] is the interval for your local maxima; e is the error for the result and d is the step(dx).
L = b - a;
while(L > e)
xa = ((a + b)/2) - d/2;
xb = ((a + b)/2) + d/2;
ya = subs(f,xa);
yb = subs(f,xb);
if(ya < yb)
a = xa;
else
b = xb;
end
L = b - a;
end
end
The method before is pretty efficient and straightforward to use, although there are others even better(in performance) like Fibonacci and Gold Section methods.
Cheers.
I have found the alternative solution. The extrema.m helped in finding Point D in above two garphs. The extrema.m can be downloaded from (http://www.mathworks.com/matlabcentral/fileexchange/12275-extrema-m-extrema2-m) and used in following way to find point D:
[ymax,imax,ymin,imin] = extrema(intensity_info);
figure;plot(x,intensity_info,x(imax),ymax,'g.',x(imin),ymin,'r.');