I have done a MATLAB version of the push-relabel FIFO code (exactly like the one on wikipedia and tried it. The discharge function is exactly like wikipedia.
It works perfectly for small graphs (e.g. number of Nodes = 7). However, when I increase my graph size (i.e. number of nodes/vertices > 3500 or more) the "relabel" function runs very slowly, which is called in the "discharge" function. My graphs are huge (i.e. > 3000nodes) so I need to optimize my code.
I tried to optimize the code according to WIKIPEDIA suggestions for global relabeling/gap relabeling:
1) Make neighbour lists for each node, and let the index seen[u] be an iterator over this, instead of the range .
2) Use a gap heuristic.
I'm stuck at the first one , I don't understand what exactly I have to do since it seems there's details left out. (I made neighbor lists such that for vertex u, any connected nodes v(1..n) to u is in the neighbor list already, just not sure how to iterate with the seen[u] index).
[r,c] = find(C);
uc = unique(c);
s = struct;
for i=1:length(uc)
ind = find(c == uc(i));
s(uc(i)).n = [r(ind)];
end
AND the discharge function uses the 's' neighborhood struct list:
while (excess(u) > 0) %test if excess of current node is >0, if so...
if (seen(u) <= length(s(u).n)) %check next neighbor
v = s(u).n(seen(u));
resC = C(u,v) - F(u,v);
if ((resC > 0) && (height(u) == height(v) + 1)) %and if not all neighbours have been tried since last relabel
[C, F, excess] = push(C, F, excess, u, v); %push into untried neighbour
else
seen(u) = seen(u) + 1;
height = relabel(C, F, height, u, N);
end
else
height = relabel(C, F, height, u, N);
seen(u) = 1; %relabel start of queue
end
end
Can someone direct, show or help me please?
Related
I'm new to Matlab and want to write a program that chooses the value of a parameter (P) to minimize the difference between two vectors, where each vector is a variable in a dataframe. The first vector (call it A) is a predetermined vector of 1s and 0s, and the second vector (call it B) has each of its entries determined as an indicator function that depends on the value of the parameter P and other variables in the dataframe. For instance, let C be a third variable in the dataset, so
A = [1, 0, 0, 1, 0]
B = [x, y, z, u, v]
where x = 1 if (C[1]+10)^0.5 - P > (C[1])^0.5 and otherwise x = 0, and similarly, y = 1 if (C[2]+10)^0.5 - P > (C[2])^0.5 and otherwise y = 0, and so on.
I'm not really sure where to start with the code, except that it might be useful to use the fminsearch command. Any suggestions?
Edit: I changed the above by raising to a power, which is closer to the actual example that I have. I'm also providing a complete example in response to a comment:
Let A be as above, and let C = [10, 1, 100, 1000, 1]. Then my goal with the Matlab code would be to choose a value of P to minimize the differences between the coordinates of the vectors A and B, where B[1] = 1 if (10+10)^0.5 - P > (10)^0.5 and otherwise B[1] = 0, and similarly B[2] = 1 if (1+10)^0.5 - P > (1)^0.5 and otherwise B[2] = 0, etc. So I want to choose P to maximize the likelihood that A[1] = B[1], A[2] = B[2], etc.
I have the following setup in Matlab, where ds is the name of my dataset:
ds.B = zeros(size(ds,1),1); % empty vector to fill
for i = 1:size(ds,1)
if ((ds.C(i) + 10)^(0.5) - P > (ds.C(i))^(0.5))
ds.B(i) = 1;
else
ds.B(i) = 0;
end
end
Now I want to choose the value of P to minimize the difference between A and B. How can I do this?
EDIT: I'm also wondering how to do this when the inequality is something like (C[i]+10)^0.5 - P*D[i] > (C[i])^0.5, where D is another variable in my dataset. Now P is a scalar being multiplied rather than just added. This seems more complicated since I can't solve for P exactly. How can I solve the problem in this case?
EDIT 1: It seems fminbnd() isn't optimal, likely due to the stairstep nature of the indicator function. I've updated to test the midpoints of all the regions between indicator function flips, plus endpoints.
EDIT 2: Updated to include dataset D as a coefficient of P.
If you can package your distance calculation up in a single function based on P, you can then search for its minimum.
arraySize = 1000;
ds.A = double(rand([arraySize,1]) > 0.5);
ds.C = rand(size(ds.A));
ds.D = rand(size(ds.A));
B = #(P)double((ds.C+10).^0.5 - P.*ds.D > ds.C.^0.5);
costFcn = #(P)sqrt(sum((ds.A-B(P)).^2));
% Solving the equation (C+10)^0.5 - P*D = C^0.5 for P, and sorting the results
BCrossingPoints = sort(((ds.C+10).^0.5-ds.C.^0.5)./ds.D);
% Taking the average of each crossing point with its neighbors
BMidpoints = (BCrossingPoints(1:end-1)+BCrossingPoints(2:end))/2;
% Appending endpoints onto the midpoints
PsToTest = [BCrossingPoints(1)-0.1; BMidpoints; BCrossingPoints(end)+0.1];
% Calculate the distance from A to B at each P to test
costResult = arrayfun(costFcn,PsToTest);
% Find the minimum cost
[~,lowestCostIndex] = min(costResult);
% Find the optimum P
optimumP = PsToTest(lowestCostIndex);
ds.B = B(optimumP);
semilogx(PsToTest,costResult)
xlabel('P')
ylabel('Distance from A to B')
1.- x is assumed positive real only, because with x<0 then complex values show up.
Since no comment is made in the question it seems reasonable to assume x real and x>0 only.
As requested, P 'the parameter' a scalar, P only has 2 significant states >0 or <0, let's see how is this:
2.- The following lines generate kind-of random A and C.
Then a sweep of p is carried out and distances d1 and d2 are calculated.
d1 is euclidean distance and d2 is the absolute of the difference between A and and B converting both from binary to decimal:
N=10
% A=[1 0 0 1 0]
A=randi([0 1],1,N);
% C=[10 1 1e2 1e3 1]
C=randi([0 1e3],1,N)
p=[-1e4:1:1e4]; % parameter to optimize
B=zeros(1,numel(A));
d1=zeros(1,numel(p)); % euclidean distance
d2=zeros(1,numel(p)); % difference distance
for k1=1:1:numel(p)
B=(C+10).^.5-p(k1)>C.^.5;
d1(k1)=(sum((B-A).^2))^.5;
d2(k1)=abs(sum(A.*2.^[numel(A)-1:-1:0])-sum(B.*2.^[numel(A)-1:-1:0]));
end
figure;
plot(p,d1)
grid on
xlabel('p');title('d1')
figure
plot(p,d2)
grid on
xlabel('p');title('d2')
The only degree of freedom to optimise seems to be the sign of P regardless of |P| value.
3.- f(p,x) has either no root, or just one root, depending upon p
The threshold funtion is
if f(x)>0 then B(k)==1 else B(k)==0
this is
f(p,x)=(x+10)^.5-p-x^.5
Now
(x+10).^.5-p>x.^.5 is same as (x+10).^.5-x.^.5>p
There's a range of p that keeps f(p,x)=0 without any (real) root.
For the particular case p=0 then (x+10).^.5 and x.^.5 do not intersect (until Inf reached = there's no intersection)
figure;plot(x,(x+10).^.5,x,x.^.5);grid on
[![enter image description here][3]][3]
y2=diff((x+10).^.5-x.^.5)
figure;plot(x(2:end),y2);
grid on;xlabel('x')
title('y2=diff((x+10).^.5-x.^.5)')
[![enter image description here][3]][3]
% 005
This means the condition f(x)>0 is always true holding all bits of B=1. With B=1 then d(A,B) turns into d(A,1), a constant.
However, for a certain value of p then there's one root and f(x)>0 is always false keeping all bits of B=0.
In this case d(A,B) the cost function turns into d(A,0) and this is A itself.
4.- P as a vector
The optimization gains in degrees of freedom if instead of P scalar, P is considered as vector.
For a given x there's a value of p that switches B(k) from 0 to 1.
Any value of p below such threshold keeps B(k)=0.
Equivalently, inverting f(x) :
g(p)=(10-p^2)^2/(4*p^2)>x
Values of x below this threshold bring B closer to A because for each element of B it's flipped to the element value of A.
Therefore, it's convenient to consider P as a vector, not a ascalar, and :
For all, or as many (as possible) elements of C to meet c(k)<(10-p^2)^2/(4*p^2) in order to get C=A or
minimize d(A,C)
5.- roots of f(p,x)
syms t positive
p=[-1000:.1:1000];
zp=NaN*ones(1,numel(p));
sol=zeros(1,numel(p));
for k1=1:1:numel(p)
p(k1)
eq1=(t+10)^.5-p(k1)-t^.5-p(k1)==0;
s1=solve(eq1,t);
if ~isempty(s1)
zp(k1)=s1;
end
end
nzp=~isnan(zp);
zp(nzp)
returns
=
620.0100 151.2900 64.5344 34.2225 20.2500 12.7211
8.2451 5.4056 3.5260 2.2500 1.3753 0.7803
0.3882 0.1488 0.0278
I have a small logical array (A) of size 256x256x256 with an unknown shape of ones somewhere in the array. Also there is a smaller double array (se) of size 13x13x13. In se there is a defined cube of logical ones in the middle of the array (se).
I need to run over every logical element in A and for each logical one in A the smaller array se needs to add its ones to A. Meaning dilating the shape of A by se.
Here is what I got so far. It works, but is of very poor performance in respect to speed.
Does anyone has a suggestion of how to speed up this coding task?
[p, q, r] = size(se);
[m, n, o] = size(A);
temp = zeros(m, n, o);
for i = 1:m
for j = 1:n
for k = 1:o
if img_g(i, j, k) == 1
for s = 1:p
for t = 1:q
for u = 1:r
if se(s, t, u) == 1
c = i + s;
d = j + t;
e = k + u;
temp(c, d, e) = 1;
end
end
end
end
end
end
end
end
B = temp;
I am very grateful for any help and suggestions that improve my programming skills.
dependent on the processor you're using, you can at least use "parfor" for the outer loop (first loop).
this enables parallel computing and accelerates your performance by the number of physical kernels your processor got.
I'm not 100% sure this does what you're asking (as I'm not completely clear on what your code does), but perhaps the methodology will give some inspiration.
I've not generated a "logical" A, but a random one, and I've set a cube inside equal to 1. Similar for se. I use meshgrid to get arrays corresponding to the indices, and use a mask of logical indexing. (perhaps my mask is what you have for A in the first place?)
A = rand(255,255,255);
A(40:50, 23:33, 80:100) = 1;
mask = (A==1);
[I,J,K] = meshgrid(1:255);
se = rand(13,13,13);
se(4:6, 3:7, 2:8) = 1;
se_mask = (se==1);
[se_I, se_J, se_K] = meshgrid(1:13);
Here I'm assuming that the cube in A is far enough from any edge (say 13 spaces) so we wont get c, d or e larger than 255.
I've flattened mask into a row vector so find gives a single index ii we can use to refer to any point in A, then the original i,j and k indices are in I(ii), J(ii), and K(ii) respectively. Similar for se_I etc.
temp = zeros(255, 255, 255);
for ii=find(mask(:).')
for jj=find(se_mask(:).')
c = I(ii) + se_I(jj);
d = J(ii) + se_J(jj);
e = K(ii) + se_K(jj);
temp(c,d,e) = 1;
end % for
end % for
The matrices I, J, K, se_I, se_J and se_K are regular, so if making/storing these becomes a bottleneck you can write functions to replace them. Matrix temp might be very sparse depending on the size of your cubes of ones, so you could look into using sparse.
I've not compared the timings against your solution.
I have a list of coordinates, coord, which looks like this when plotted:
I want to remove the long string of points that goes completely from 0 to 1 from the data set, shown on this plot starting at (0, 11) and ending at (1, 11) and the other one that begins at (0, 24) and ends at (1, 28).
So far, I have tried using kmeans to group the data by height using this code:
jet = colormap('jet');
amount = 20;
step = floor(numel(jet(:,1))/amount);
idxOIarr = cell(numel(terp));
scale = 100;
for ii = 1:numel(terp)
figure;
hold on;
expandDat = [stretched{ii}(:,1), scale.*log(terp{ii}(:,2))];
[idx, cent] = kmeans(expandDat(:,1:2), amount, 'Distance', 'cityblock');
idxOIarr{ii} = idx;
for jj = 1:amount
scatter(stretched{ii}(idx == jj,1), FREQ(terp{ii}(idx == jj,2)), 10, jet(step*jj,:), 'filled');
end
end
resulting in this image: Although it does separate the higher rows quite well, it breaks the line in the middle in two and groups the line that begins at (0,20) with some data points below it.
Is there any other way to group and remove these points?
The most efficient way to solve this involves building a graph where each point is a vertex. You join points that you consider "connected" or "closed" with an edge. Thus, the graph will connected components. Now you need to look for the connected components that span the whole range from 0 to 1.
Build the graph. Finding neighbors is most efficient using an R-tree. Here are some suggestions. You can also use a k-d tree, for example. However, this is not strictly necessary, it just can get really slow without a proper spatial indexing structure, because you'll have to compare distances between each pair of points.
Given a Nx2 matrix coord, you can find the square distances between each pair:
D = sum((reshape(coord,[],1,2) - reshape(coord,1,[],2)).^2,3);
(note again that this is expensive if N is large, and in that case using an R-tree will speed things up significantly). D(i,j) is the distance between points with indices i and j (i.e. coord(i,:) and coord(j,:).
Next, build the graph, G, nodes i and j are connected if G(i,j)==1. G is a symmetric matrix:
G = D <= max_distance;
Find connected components. A connected component is just a set of nodes that you can reach from each other by following edges. You don't really need to find all connected components, you just need to find the set of points that have x=0, and starting from each, recursively visit all elements in its connected component to see if you can reach a point that has x=1.
This next code is not tested, but helpfully it gives a starting point:
start_indices = find(coord(:,1)==0); % Is exact equality appropriate here?
end_indices = find(coord(:,1)==1);
to_remove = [];
visited = false(size(coord,1), 1);
for ii=start_indices.'
% For each point with x=0, see if we can reach any of the points at x=1
[res, visited] = can_reach(ii, end_indices, G, visited);
if res
% For this point we can, remove it!
to_remove(end+1) = ii;
end
end
% Iterative function to visit all nodes in a connected component
function [res, visited] = can_reach(start, end_indices, G, visited)
visited(start) = true;
if any(start==end_indices)
% We've reach an end point, stop iterating and return true.
res = true;
return;
end
next = find(G(start,:)); % find neighbors
next(visited(next)) = []; % remove visited neighbors
for ii=next
[res, visited] = can_reach(ii, end_indices, G, visited);
if res
% Yes, we can visit an end point, stop iterating now.
return
end
end
end
I've run an experiment where a machine exerts a force on a bridge until it breaks. I need to cycle through my entire data set and calculate the toughness until I've hit the breaking point (fdValue1).
The toughness is calculated by summing up all the rectangles under my load vs. distance curve. (Basically the integral)
However, I have not been able to find a reasonable way of doing so and my current loop is an infinite loop and I'm not sure why.
%Initializing variables
bridge1Data = xlsread('Bridge1Data.xlsx', 'A2:C2971');
bridge2Data = xlsread('Bridge2Data.xlsx', 'A2:C1440');
bridge1Load = bridge1Data(:, 2);
bridge2Load = bridge2Data(:, 2);
bridge1Dist = bridge1Data(:, 3);
bridge2Dist = bridge2Data(:, 3);
[row1, col1] = size(bridge1Dist);
[row2, col2] = size(bridge2Dist);
bridge1Disp = zeros(row1, col1);
bridge2Disp = zeros(row2, col2);
fdValue1 = 0.000407350000000029;
&Main code
%Displacement
for k = 2:length(bridge1Dist)
bridge1Disp(k-1, 1) = bridge1Dist(k, 1) - bridge1Dist(k-1, 1);
end
%Max Load Bridge 1
maxLoad1 = 0;
for n = 1:length(bridge1Load)
for k = 1
if bridge1Load(n, k) > maxLoad1
maxLoad1 = bridge1Load(n, k);
end
end
end
%Cycle through data till failure, change proj data
totalRect1 = 0;
for j = 2:length(bridge1Disp)
while bridge1Disp(j, 1) ~= fdValue1
rectangle = (bridge1Disp(j, 1) - bridge1Disp(j-1, 1))*...
((bridge1Load(j, 1) + bridge1Load(j-1, 1))/2);
totalRect1 = totalRect1 + rectangle;
end
end
Basically, I make an array for the load and distance the machine pushes down on the bridge, assign a 'Failure Distance' value (fdValue) which should be used to determine when we stop calculating toughness. I then calculate displacement, calculate the maximum load. Then through the variable 'rectangle', calculate each rectangle and sum them all up in 'totalRect1', and use that to calculate the toughness by finding the area under the curve. Is anyone able to see why the loop is an infinite loop?
Thanks
The problem with the condition while bridge1Disp(j, 1) ~= fdValue1 is that you need to check for <= and not for (in)equality, since double numbers will almost never evaluate to be equal even if they seem so. To read more about this you can look here and also google for matlab double comparison. Generally it has something to do with precision issues.
Usually when checking for double equality you should use something like if abs(val-TARGET)<1E-4, and specify some tolerance which you are willing to permit.
Regardless,
You don't need to use loops for what you're trying to do. I'm guessing this comes from some C-programming habits, which are not required in MATLAB.
The 1st loop (Displacement), which computes the difference between every two adjacent elements can be replaced by the function diff like so:
bridge1Disp = diff(bridge1Dist);
The 2nd loop construct (Max Load Bridge 1), which retrieves the maximum element of bridge1Load can be replaced by the command max as follows:
maxLoad1 = max(bridge1Load);
For the last loop construct (Cycle ...) consider the functions I've mentioned above, and also find.
In the code section
%Cycle through data till failure, change proj data
totalRect1 = 0;
for j = 2:length(bridge1Disp)
while bridge1Disp(j, 1) ~= fdValue1
rectangle = (bridge1Disp(j, 1) - bridge1Disp(j-1, 1))*...
((bridge1Load(j, 1) + bridge1Load(j-1, 1))/2);
totalRect1 = totalRect1 + rectangle;
end
end
the test condition of the while loop is
bridge1Disp(j, 1) ~= fdValue1
nevertheless, in the while loop the value of bridge1Disp(j, 1) does not change so if at the first iteratiion of the while loop bridge1Disp(j, 1) is ~= fdValue1 the loop will never end.
Hope this helps.
I understand that there is an error with dimensions in line dr=(r-v*v/2)*dT . But I have little knowledge of Matlab. Help to fix it, please. The code is small and simple. Maybe someone will find time to look
function [optionPrice] = upAndOutCallOption(S,r,v,x,b,T,dT)
t = 0;
dr=[];
pert=[];
while (t < T) & (S < b)
t = t + dT;
dr = (r - v.*v./2).*dT;
pert = v.*sqrt( dT ).*randn();
S = S.*exp(dr + pert);
end
if S<b
% Within barrier, so price as for a European option.
optionPrice = exp(-r.*T).* max(0, S - x);
else
% Hit the barrier, so the option is withdrawn.
optionPrice = 0;
end
end
Call from another function of this kind:
for k=1:amountOfOptions
[optionPrices(k)] = upAndOutCallOption(stockPrice(k)*o,riskFreeRate(k)*o,... volatility(k)*o, strike(k)*o, barrier(k)*o, timeToExpiry(k)*o, sampleRate(k)*o);
result(k) = mean(optionPrices(k));
end
Therefore, any difficulties.
It's good that you know the problem is within dr = (r - v.*v./2).*dT;. The command itself has many possible problems which also related to dimensions:
Here you are doing element-wise multiplication (because of the .*) with matrices, which requires (in the case of your command) that r has the same number of rows AND columns as v (since because of element-wise, v.*v/2 has the same size as v).
Moreover, it is unnecessary to do element-wise division with scalar number, that means there is no need to have ./2 in Matlab.
And, finally, since it's element-wise multiplication again, the matrix (r - v.*v./2) must also have the same number of rows and columns as matrix dT.
Check here for more information about Matlab's matrix operations.