bsxfun implementation in solving a min. optimization task


I really need help with this one.
I have two matrices L1 and L2, both of size 500x3.
First of all, I compute the difference of every element of each column of L1 from L2 as follows:
lib1 = bsxfun(@minus, L1(:,1)',L2(:,1));
lib1=lib1(:);
lib2 = bsxfun(@minus, L1(:,2)',L2(:,2));
lib2=lib2(:);
lib3 = bsxfun(@minus, L1(:,3)',L2(:,3));
lib3=lib3(:);
LBR = [lib1 lib2 lib3];
The result is this matrix LBR. Then I have a min-problem to solve:
[d,p] = min((LBR(:,1) - var1).^2 + (LBR(:,2) - var2).^2 + (LBR(:,3) - var3).^2);
This returns the point p where the min-problem is fulfilled. Finally I can go back to my matrices L1 and L2 to find the index positions of the values which satisfy this min-problem. I did this as follows:
[minindex_alongL2, minindex_alongL1] = ind2sub(size(L1),p);
This is OK. But what I need now is:
I have to multiply LBR by a vector called alpha, i.e. take the tensor product (also called the Kronecker product) of alpha with LBR. alpha is given as follows:
alpha = 0:0.1:2;
And, this Kronecker product I have computed as follows:
val = bsxfun(@times,LBR,permute(alpha,[3 1 2]));
LBR = reshape(permute(val,[1 3 2]),size(val,1)*size(val,3),[]);
What I need now is to solve the same min-problem:
[d,p] = min((LBR(:,1) - var1).^2 + (LBR(:,2) - var2).^2 + (LBR(:,3) - var3).^2);
But this time, in addition to finding the index positions and values from L1 and L2 which satisfy this min-problem, I need to find the index position of the single value from the alpha vector which was multiplied in and which fulfills the min-problem. I have no idea how to do this, so any help will be very much appreciated!
Thanks in advance!
Ps: I can post the L1 and L2 matrices if needed.

I believe you need this correction in your code -
[minindex_alongL2, minindex_alongL1] = ind2sub([size(L2,1) size(L1,1)],p)
For the solution, you need to add the size of alpha to the index computation in the last step, as the vector whose min is calculated has the "added influence" of alpha -
[minindex_alongL2, minindex_alongL1,minindex_alongalpha] = ind2sub([size(L2,1) size(L1,1) numel(alpha)],p)
minindex_alongalpha might be of your interest.
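
To convince yourself that the three-output ind2sub call points at the right elements, here is a minimal sketch on small made-up data. The sizes, the random matrices and the target vector (standing in for var1, var2, var3) are assumptions for illustration only, and LBRa is just a hypothetical name for the alpha-expanded matrix.

```matlab
% Small made-up data; sizes and target are illustrative assumptions
L1 = rand(4,3);
L2 = rand(5,3);
alpha = 0:0.1:2;
target = [0.3 -0.2 0.1];          % stands in for var1, var2, var3

% Pairwise differences per coordinate; rows index L2, columns index L1
LBR = zeros(size(L2,1)*size(L1,1), 3);
for c = 1:3
    D = bsxfun(@minus, L1(:,c)', L2(:,c));
    LBR(:,c) = D(:);
end

% Scale every row by every alpha, stacking one block of rows per alpha
val  = bsxfun(@times, LBR, permute(alpha,[3 1 2]));
LBRa = reshape(permute(val,[1 3 2]), size(val,1)*size(val,3), []);

% Solve the min-problem and map the linear index back to three subscripts
[d,p] = min(sum(bsxfun(@minus, LBRa, target).^2, 2));
[iL2, iL1, ia] = ind2sub([size(L2,1) size(L1,1) numel(alpha)], p);

% Rebuild the winning row directly from the recovered indices
check = alpha(ia) * (L1(iL1,:) - L2(iL2,:));
```

If the indices are right, check should match LBRa(p,:) up to rounding, and alpha(ia) is the value from alpha that took part in the minimum.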

Related

Why is this the correct way to do a cost function for a neural network?

So after beating my head against the wall for a few hours, I looked online for a solution to my problem, and it worked great. I just want to know what caused the issue with the way I was originally going about it.
Here are some more details. The input is a 20x20px image from the MNIST dataset, and there are 5000 samples, so X, or A1, is 5000x400. There are 25 nodes in the single hidden layer. The output is a one-hot vector of the digits 0-9. y (not Y, which is the one-hot encoding of y) is a 5000x1 vector with values 1-10.
Here was my original code for the cost function:
Y = zeros(m, num_labels);
for i = 1:m
Y(i, y(i)) = 1;
endfor
H = sigmoid(Theta2*[ones(1,m);sigmoid(Theta1*[ones(m, 1) X]'))
J = (1/m) * sum(sum((-Y*log(H]))' - (1-Y)*log(1-H]))')))
But then I found this:
A1 = [ones(m, 1) X];
Z2 = A1 * Theta1';
A2 = [ones(size(Z2, 1), 1) sigmoid(Z2)];
Z3 = A2*Theta2';
H = A3 = sigmoid(Z3);
J = (1/m)*sum(sum((-Y).*log(H) - (1-Y).*log(1-H), 2));
I see that this may be slightly cleaner, but what functionally causes my original code to get 304.88 and the other to get ~ 0.25? Is it the element wise multiplication?
FYI, this is the same problem as this question if you need the formal equation written out.
Thanks for any help I can get! I really want to understand where I'm going wrong

Transfer from the comments:
With a quick look, in J = (1/m) * sum(sum((-Y*log(H]))' - (1-Y)*log(1-H]))'))) there is definitely something going on with the parentheses, but probably in how you pasted it here, not in the original code, as this would throw an error when you run it. If I understand correctly and Y, H are matrices, then in your 1st version Y*log(H) is matrix multiplication while in the 2nd version Y.*log(H) is an entrywise multiplication (not matrix multiplication, just c(i,j)=a(i,j)*b(i,j) ).
Update 1:
In regards to your question in the comment.
From the first screenshot, you represent each value yk(i) in the entry Y(i,k) of the Y matrix and each value h(x^(i))k as H(i,k). So basically, for each i,k you want to compute Y(i,k) log(H(i,k)) + (1-Y(i,k)) log(1-H(i,k)). You can do it for all the values together and store the result in matrix C. Then C = Y.*log(H) + (1-Y).*log(1-H) and each C(i,k) has the above mentioned value. This is an operation .* because you want to do the operation for each element (i,k) of each matrix (in contrast to multiplying the matrices, which is totally different). Afterwards, to get the sum of all the values inside the 2D matrix C, you use the Octave function sum twice: sum(sum(C)) to sum both column-wise and row-wise (or, as @Irreducible suggested, just sum(C(:))).
Note there may be other errors as well.
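
To make the difference concrete, here is a small sketch with made-up numbers (three samples, two classes; Y and H are invented for illustration, not taken from the question). It assumes the original code effectively computed a matrix product of Y with the transposed log(H), which is how I read the garbled line.

```matlab
Y = [1 0; 0 1; 1 0];               % one-hot labels, 3 samples x 2 classes
H = [0.9 0.1; 0.2 0.8; 0.7 0.3];   % made-up predictions in (0,1)
m = size(Y,1);

% Element-wise version: one term per (sample, class) pair
C = -Y.*log(H) - (1-Y).*log(1-H);  % 3x2
J_elementwise = sum(C(:)) / m;

% Matrix-product version: entry (i,j) pairs the labels of sample i
% with the predictions of sample j
M = -Y*log(H)' - (1-Y)*log(1-H)';  % 3x3
J_matrix = sum(M(:)) / m;

% Only the diagonal of M pairs each sample with its own prediction
J_diag = trace(M) / m;
```

Here J_diag agrees with J_elementwise, while J_matrix also adds up all the off-diagonal (cross-sample) terms. With 5000 samples there are far more off-diagonal than diagonal entries, which would explain a cost that is much too large.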

Optimization of matrix on matlab using fmincon

I have a 30x30 matrix as a base matrix (OD_b1), and I also have two base vectors (bg and Ag). My aim is to optimize a matrix (X) whose dimensions are 30x30 such that:
1) the squared difference between vector (bg) and vector of sum of all the columns is minimized.
2)the squared difference between vector (Ag) and vector of sum of all rows is minimized.
3)the squared difference between the elements of matrix (X) and matrix (OD_b1) is minimized.
The mathematical form of the equation is as follows:
I have tried this:
fun=@(X)transpose(bg-sum(X,2))*(bg-sum(X,2))+ (Ag-sum(X,1))*transpose(Ag-sum(X,1))+sumsqr(X_b-X);
[val,X]=fmincon(fun,OD_b1,AA,BB,Aeq,beq,LB,UB)
I don't get errors but it seems like it's stuck.
Is it because I have too many variables or is there another reason?
Thanks in advance

This is a simple, unconstrained least squares problem and hence has a simple solution that can be expressed as the solution to a linear system.
I will show you (1) the precise and efficient way to solve this and (2) how to solve with fmincon.
The precise, efficient solution:
Problem setup
Just so we're on the same page, I initialize the variables as follows:
n = 30;
Ag = randn(n, 1); % observe the dimensions
X_b = randn(n, n);
bg = randn(n, 1);
The code:
A1 = kron(ones(1,n), eye(n));
A2 = kron(eye(n), ones(1,n));
A = (A1'*A1 + A2'*A2 + eye(n^2));
b = A1'*bg + A2'*Ag + X_b(:);
x = A \ b; % solves A*x = b
Xstar = reshape(x, n, n);
Why it works:
I first reformulated your problem so the optimization variable is a vector x, not a matrix X. Observe that z = bg - sum(X,2) is equivalent to:
x = X(:); % vectorize X
A1 = kron(ones(1,n), eye(n)); % creates a special matrix that sums up
% stuff appropriately
z = bg - A1*x;
Similarly, A2 is set up so that Ag - A2*x is equivalent to (Ag'-sum(X,1))'. Your problem is then equivalent to:
minimize (over x) (bg - A1*x)'*(bg - A1*x) + (Ag - A2*x)'*(Ag - A2*x) + (y - x)'*(y-x) where y = X_b(:). That is, y is a vectorized version of X_b.
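
If the Kronecker construction looks opaque, a quick check on a small matrix may help. This is only a sanity check; magic(4) is an arbitrary test matrix chosen so that the sums are exact integers.

```matlab
n = 4;
X = magic(n);                     % any n-by-n matrix would do
x = X(:);                         % column-major vectorization

A1 = kron(ones(1,n), eye(n));     % n-by-n^2, blocks [I I ... I]
A2 = kron(eye(n), ones(1,n));     % n-by-n^2, one block of ones per column of X

rowsums = A1*x;                   % should equal sum(X,2)
colsums = A2*x;                   % should equal sum(X,1)'
```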
This problem is convex and the first order condition is a necessary and sufficient condition for the optimum. Take the derivative with respect to x and that equation will define your solution! Sample math for an almost equivalent (but slightly simpler) problem is below:
minimize(over x) (b - A*x)'*(b - A*x) + (y - x)' * (y - x)
rewriting the objective:
b'b- b'Ax - x'A'b + x'A'Ax +y'y - 2y'x+x'x
Is equivalent to:
minimize(over x) (-2 b'A - 2y'*I) x + x' ( A'A + I) * x
the first order condition is:
(A'A+I+(A'A+I)')x -2A'b-2I'y = 0
(A'A+I) x = A'b+I'y
Your problem is essentially the same. It has the first order condition:
(A1'*A1 + A2'*A2 + I)*x = A1'*bg + A2'*Ag + y
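
As a sanity check on that first order condition, the sketch below solves the linear system for a small random instance and then evaluates the gradient of the objective at the solution. The size n = 5 and the random data are arbitrary choices for illustration; fobj and g are hypothetical helper names.

```matlab
n = 5;                                    % small size for illustration
bg = randn(n,1); Ag = randn(n,1); X_b = randn(n,n);
A1 = kron(ones(1,n), eye(n));
A2 = kron(eye(n), ones(1,n));
y  = X_b(:);

% Solve the first order condition
x = (A1'*A1 + A2'*A2 + eye(n^2)) \ (A1'*bg + A2'*Ag + y);

% Objective in vectorized form
fobj = @(v) norm(bg - A1*v)^2 + norm(Ag - A2*v)^2 + norm(y - v)^2;

% Gradient of the objective, evaluated at the solution
g = -2*A1'*(bg - A1*x) - 2*A2'*(Ag - A2*x) - 2*(y - x);

% A small random perturbation should not improve the objective
x_pert = x + 1e-3*randn(n^2,1);
```

norm(g) should be at rounding level, and fobj(x_pert) should not be smaller than fobj(x).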
How to solve with fmincon
You can do the following:
f = @(X) transpose(bg-sum(X,2))*(bg-sum(X,2)) + (Ag'-sum(X,1))*transpose(Ag'-sum(X,1))+sum(sum((X_b-X).^2));
o = optimoptions('fmincon');
o.MaxFunEvals = 30000;
Xstar2 = fmincon(f,zeros(n,n),[],[],[],[],[],[],[],o);
You can then check the answers are about the same with:
normdif = norm(Xstar - Xstar2)
And you can see that gap is small, but that the linear algebra based solution is somewhat more precise:
gap = f(Xstar2) - f(Xstar)
If the fmincon approach hangs, try it with a smaller n just to gain confidence that my linear algebra based solution is more precise, way way faster etc... n = 30 is solving a 30^2 = 900 variable optimization problem: not easy. With the linear algebra approach, you can go up to n = 100 (i.e. 10000 variable problem) or even larger.

I would probably solve this as a QP using quadprog using the following reformulation (keeping the objective as simple as possible to make the problem "less nonlinear"):
min sum(i,v(i)^2)+sum(i,w(i)^2)+sum((i,j),z(i,j)^2)
v = bg - sum(c,x)
w = ag - sum(r,x)
Z = xbase-x
The QP solver is more precise (no gradients using finite differences). This approach also allows you to add additional bounds and linear equality and inequality constraints.
The other suggestion to form the first order conditions explicitly is also a good one: it also has no issue with imprecise gradients (the first order conditions are linear). I usually prefer a quadratic model because of its flexibility.
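
For reference, a minimal quadprog sketch might look like the following. Note that this is the condensed form in x only (the auxiliary variables v, w, z from the reformulation above are eliminated), that it reuses the A1 and A2 matrices from the other answer, and that the small n and random data are assumptions for illustration. quadprog needs the Optimization Toolbox, like fmincon.

```matlab
n = 5;                                    % small size for illustration
bg = randn(n,1); Ag = randn(n,1); X_b = randn(n,n);
A1 = kron(ones(1,n), eye(n));             % A1*x gives the row sums of X
A2 = kron(eye(n), ones(1,n));             % A2*x gives the column sums of X

% quadprog minimizes 0.5*x'*H*x + f'*x; the constant term is dropped
H = 2*(A1'*A1 + A2'*A2 + eye(n^2));
f = -2*(A1'*bg + A2'*Ag + X_b(:));

x_qp = quadprog(H, f);                    % unconstrained QP
X_qp = reshape(x_qp, n, n);
```

Bounds or linear constraints can then be passed through the remaining quadprog arguments (A, b, Aeq, beq, lb, ub) without changing H and f.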

Iteration of matrix-vector multiplication which stores specific index-positions

I need to solve a min distance problem. To see some of the work which has been tried, take a look at:
link: click here
I have four elements: two column vectors: alpha of dim (px1) and beta of dim (qx1). In this case p = q = 50 giving two column vectors of dim (50x1) each. They are defined as follows:
alpha = 0:0.05:2;
beta = 0:0.05:2;
and I have two matrices: L1 and L2.
L1 is composed of three column-vectors of dimension (kx1) each.
L2 is composed of three column-vectors of dimension (mx1) each.
In this case, they have equal size, meaning that k = m = 1000 giving: L1 and L2 of dim (1000x3) each. The values of these matrices are predefined.
They have, nevertheless, the following structure:
L1(kx3) = [t1(kx1) t2(kx1) t3(kx1)];
L2(mx3) = [t1(mx1) t2(mx1) t3(mx1)];
The min. distance problem I need to solve is given (mathematically) as follows:
d = min( (x-(alpha_p*t1_k - beta_q*t1_m)).^2 + (y-(alpha_p*t2_k - beta_q*t2_m)).^2 +
(z-(alpha_p*t3_k - beta_q*t3_m)).^2 )
the values x,y,z are three fixed constants.
My problem
I need to develop an iteration which can give me back the index positions from the combination of: alpha, beta, L1 and L2 which fulfills the min-distance problem from above.
I hope the formulation for the problem is clear, I have been very careful with the index notations. But if it is still not so clear... the step size for:
alpha is p = 1,...50
beta is q = 1,...50
for L1; t1, t2, t3 is k = 1,...,1000
for L2; t1, t2, t3 is m = 1,...,1000
And I need to find the index of p, index of q, index of k and index of m which gives me the min. distance to the point x,y,z.
Thanks in advance for your help!

I don't know your values so I wasn't able to check my code. I am using loops because it is the most obvious solution. Pretty sure that someone from the bsxfun-brigade ( ;-D ) will find a shorter/more effective solution.
alpha = 0:0.05:2;
beta = 0:0.05:2;
% L1 (k-by-3), L2 (m-by-3) and the constants x, y, z are assumed to exist already
idx_smallest_d = [1,1,1,1];
% start from the distance of the first combination
smallest_d = (x-(alpha(1)*L1(1,1) - beta(1)*L2(1,1)))^2 + (y-(alpha(1)*L1(1,2) - beta(1)*L2(1,2)))^2 + ...
    (z-(alpha(1)*L1(1,3) - beta(1)*L2(1,3)))^2;
for p = 1:numel(alpha)
    for q = 1:numel(beta)
        for k = 1:size(L1,1)
            for m = 1:size(L2,1)
                d = (x-(alpha(p)*L1(k,1) - beta(q)*L2(m,1)))^2 + (y-(alpha(p)*L1(k,2) - beta(q)*L2(m,2)))^2 + ...
                    (z-(alpha(p)*L1(k,3) - beta(q)*L2(m,3)))^2;
                if d < smallest_d
                    smallest_d = d;
                    idx_smallest_d = [p,q,k,m];
                end
            end
        end
    end
end
What I am doing is predefining the smallest distance as the distance of the first combination and then checking for each combination whether the distance is smaller than the previous shortest distance.
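
For what it's worth, a vectorized variant could look roughly like this. It is a sketch on small made-up data: the coarse grids, the random L1 and L2, and target (standing in for x, y, z) are assumptions for illustration. It builds a 4-D array of all squared distances and recovers the four indices with ind2sub.

```matlab
alpha = 0:0.5:2;                          % coarse grids keep the example small
beta  = 0:0.5:2;
L1 = rand(6,3);                           % made-up data
L2 = rand(7,3);
target = [0.4 0.1 -0.3];                  % stands in for x, y, z

P = numel(alpha); Q = numel(beta);
K = size(L1,1);   M = size(L2,1);

D = zeros(P,Q,K,M);                       % squared distance for every (p,q,k,m)
for c = 1:3
    A = alpha(:) * L1(:,c)';              % P-by-K, alpha(p)*L1(k,c)
    B = beta(:)  * L2(:,c)';              % Q-by-M, beta(q)*L2(m,c)
    A4 = reshape(A, [P 1 K 1]);
    B4 = reshape(B, [1 Q 1 M]);
    D = D + (target(c) - bsxfun(@minus, A4, B4)).^2;
end

[d, lin] = min(D(:));
[p, q, k, m] = ind2sub(size(D), lin);     % indices into alpha, beta, L1, L2
```

For the sizes in the question the full 4-D array has more than 10^9 entries and will most likely not fit in memory, so the same idea would have to be applied in chunks (for example one value of p at a time).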

Denormalize results of curve fit on normalized data

I am fitting an exponential decay function with lsqcurvefit in Matlab. To do this I first normalize my data because they differ by several orders of magnitude. However, I'm not sure how to denormalize my fitted parameters.
My fitting model is s = O + A * exp(-t/T) where t and s are known and t is in the order of 10^-3 and s in the order of 10^5. So I subtract their mean from them and divide them by their standard deviation. My goal is to find the best A, O and T that at the given times t will result in values nearest to s. However, I don't know how to denormalize my resulting A, O and T.
Might somebody know how to do this? I only found this question on SO about normalisation, but does not really address the same problem.

When you normalize, you must record the means and standard deviations for each of your features. Then you can easily use those values to denormalize.
e.g.
A = [1 4 7 2 9]';
B = [100 475 989 177 399]';
So you could just normalize right away:
An = (A - mean(A)) / std(A)
but then you can't get back to the original A. So first save the means and stds.
Am = mean(A); Bm = mean(B);
As = std(A); Bs = std(B);
An = (A - Am)/As;
Bn = (B - Bm)/Bs;
now do whatever processing you want and then to denormalize:
Ad = An*As + Am;
Bd = Bn*Bs + Bm;
I'm sure you can see that that's going to be an issue if you have a lot of features (i.e. you have to type code out for each feature, what a mission!) so lets assume your data is arranged as a matrix, data, where each sample is a row and each column is a feature. Now you can do it like this:
data = [A, B]
means = mean(data);
stds = std(data);
datanorm = bsxfun(@rdivide, bsxfun(@minus, data, means), stds);
%// Do processing on datanorm
datadenorm = bsxfun(@plus, bsxfun(@times, datanorm, stds), means);
EDIT:
After you have fit your model parameters (A,O and T) using normalized t and f then your model will expect normalized inputs and produce normalized outputs. So to use it you should first normalize t and then denormalize f.
So you find a new f by running the model on a normalized new t, i.e. f(tn) where tn = (t - tm)/ts, tm is the mean of your training (or fitting) t set and ts its std. Then to get f at the correct magnitude you must denormalize only f, so the full solution would be
f(tn)*fs + fm
So once again, all you need to do is save the mean and std you used to normalize.
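
If you want the parameters themselves in original units, you can substitute tn = (t - tm)/ts into the model and collect terms. For the model in the question this gives O = ss*On + sm, T = ts*Tn and A = ss*An*exp(tm/T), where the n suffix marks the fitted (normalized) values and sm, ss are the mean and std used for s. The sketch below checks this numerically; all numbers in it are made up for illustration and model is a hypothetical helper.

```matlab
t  = linspace(0, 5e-3, 50)';              % made-up time axis
tm = mean(t);  ts = std(t);
sm = 2e5;      ss = 5e4;                  % hypothetical mean/std used to normalize s

On = -0.8; An = 0.6; Tn = 1.3;            % hypothetical parameters fitted on normalized data

model = @(par, tt) par(1) + par(2)*exp(-tt/par(3));   % par = [O A T]

% Route 1: normalize t, evaluate the normalized model, denormalize the output
s1 = model([On An Tn], (t - tm)/ts)*ss + sm;

% Route 2: convert the parameters once and evaluate on the raw t
O = ss*On + sm;
T = ts*Tn;
A = ss*An*exp(tm/T);
s2 = model([O A T], t);
```

Both routes should give the same curve up to rounding.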

Matlab using interp1 to find the index?

I have an array Fa which contains values I found from a function. Is there a way to use the interp1 function in Matlab to find the index at which a specific value occurs? The tutorials I have found only show how to use interp1 to find a value in the array when the corresponding index value is known.
Example from http://www.mathworks.com/help/matlab/ref/interp1.html:
Here are two vectors representing the census years from 1900 to 1990 and the corresponding United States population in millions of people.
t = 1900:10:1990;
p = [75.995 91.972 105.711 123.203 131.669...
150.697 179.323 203.212 226.505 249.633];
The expression interp1(t,p,1975) interpolates within the census data to estimate the population in 1975. The result is
ans =
214.8585
- but I want to find the t value for 214.8585.

In some sense, you want to find roots of a function -
f(x)-val
First of all, there might be several answers. Second, since the function is piecewise linear, you can check each segment by solving the relevant linear equation.
For example, suppose that you have this data:
t = 1900:10:1990;
p = [75.995 91.972 105.711 123.203 131.669...
150.697 179.323 70.212 226.505 249.633];
And you want to find the value 140
val = 140;
figure;plot(t,p);hold on;
plot( [min(t),max(t)], [val val],'r');
You should first subtract the value of val from p,
p1 = p - val;
Now you want only the segments in which p1 sign changes, either from + -> -, or vice versa.
segments = abs(diff(sign(p1)==1));
In each of these segments, you can solve the relevant linear equation a*x+b==0, and find the root. That is the index of your value.
for i=1:numel(segments)
x(1) = t(segments(i));
x(2) = t(segments(i)+1);
y(1) = p1(segments(i));
y(2) = p1(segments(i)+1);
m = (y(2)-y(1))/(x(2)-x(1));
n = y(2) - m * x(2);
index = -n/m;
scatter(index, val ,'g');
end
And here is the result:

You can search for the value in Fa directly:
idx = Fa==value_to_find;
To find the index use find function:
find(Fa==value_to_find);
Of course, this works only if the value_to_find is present in Fa. But as I understand it, this is what you want. You do not need interp for that.
If on the other hand the value might not be present in Fa, but Fa is sorted, you can search for values larger than value_to_find and take the first such index:
find(Fa>=value_to_find,1);
If your problem is more complicated than that, look at Andrey's answer.

Andrey's solution works in principle, but the code presented here does not. The problem is with the definition of the segments, which yields a vector of 0's and 1's, whereafter the call to "t(segments(i))" results in an error (I tried to copy & paste the code - I hope I did not fail in that simple task).
I made a small change to the definition of the segments. It might be done more elegantly. Here it is:
t = 1900:10:1990;
p = [75.995 91.972 105.711 123.203 131.669...
150.697 179.323 70.212 226.505 249.633];
val = 140;
figure;plot(t,p,'.-');hold on;
plot( [min(t),max(t)], [val val],'r');
p1 = p - val;
tn = 1:length(t);
segments = tn([abs(diff(sign(p1)==1)) 0].*tn>0);
for i=1:numel(segments)
x(1) = t(segments(i));
x(2) = t(segments(i)+1);
y(1) = p1(segments(i));
y(2) = p1(segments(i)+1);
m = (y(2)-y(1))/(x(2)-x(1));
n = y(2) - m * x(2);
index = -n/m;
scatter(index, val ,'g');
end
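
The same sign-change idea can also be written without the loop. This is just a sketch using the data above; i and tc are hypothetical names, and a sample that hits val exactly would need separate handling because of the strict inequality.

```matlab
t = 1900:10:1990;
p = [75.995 91.972 105.711 123.203 131.669 ...
     150.697 179.323 70.212 226.505 249.633];
val = 140;

p1 = p - val;
% Segments whose end points lie on opposite sides of val
i = find(p1(1:end-1).*p1(2:end) < 0);
% Root of the straight line through (t(i),p1(i)) and (t(i+1),p1(i+1))
tc = t(i) - p1(i).*(t(i+1) - t(i))./(p1(i+1) - p1(i));
```

tc then holds one t value per crossing, and interp1(t,p,tc) should return val for each of them.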

Interpolate the entire function to a higher precision. Then search.
t = 1900:10:1990;
p = [75.995 91.972 105.711 123.203 131.669...
150.697 179.323 203.212 226.505 249.633];
precision = 0.5;
ti = 1900:precision:1990;
pi = interp1(t,p,ti);
Now pi holds the interpolated p values for every half year. Assuming the values always increase, you could find the year by max(ti(pi < x)) where x = 214.8585. Here pi < x creates a logical vector used to filter ti to only provide the years when p is less than x. max() is then used to take the most recent year, which will also be closest to x if the assumption that p is always increasing holds.

The answer to the most general case was given above by Andrey, and I agree with it.
For the example that you stated, a simple particular solution would be:
interp1(p,t,214.8585)
In this case you are solving for the year when a given population is known.
This approach will NOT work when there is more than one solution. If you try this with Andrey's values you will only get the first solution to the problem.
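
A quick round trip on the census data from the question illustrates the swap of arguments. This is a sketch that relies on p being strictly increasing, as it is here.

```matlab
t = 1900:10:1990;
p = [75.995 91.972 105.711 123.203 131.669 ...
     150.697 179.323 203.212 226.505 249.633];

pop  = interp1(t, p, 1975);     % forward: population for a given year
year = interp1(p, t, pop);      % inverse: year for a given population
```

year comes back as 1975, up to rounding.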
