I'm working on an application to determine from an image the degree of alignment of a fiber network. I've read several papers on this issue and they basically do this:
Find the 2D discrete Fourier transform (DFT = F(u,v)) of the image (gray, range 0-255)
Find the Fourier Spectrum (FS = abs(F(u,v))) and the Power Spectrum (PS = FS^2)
Convert spectrum to polar coordinates and divide it into 1º intervals.
Calculate number-averaged line intensities (FI) for each interval (theta), that is, the average of all the intensities (pixels) forming "theta" degrees with respect to the horizontal axis.
Transform FI(theta) to cartesian coordinates
Cxy(theta) = [FI*cos(theta), FI*sin(theta)]
Find eigenvalues (lambda1 and lambda2) of the matrix Cxy'*Cxy
Find alignment index as alpha = 1 - lambda2/lambda1
I've implemented this in MATLAB (code below), but I'm not sure whether it is correct, since points 3 and 4 are not really clear to me (I'm getting results similar to those of the papers, but not in all cases). For instance, in point 3, does "spectrum" refer to FS or to PS? And in point 4, how should this average be done? Are all the pixels considered, even though there are more pixels along the diagonal?
rgb = imread('network.tif');%513x513 pixels
im = rgb2gray(rgb);
im = imrotate(im,-90);%since FFT space is rotated 90º
FT = fft2(im) ;
FS = abs(FT); %Fourier spectrum
PS = FS.^2; % Power spectrum
FS = fftshift(FS);
PS = fftshift(PS);
xoffset = (513-1)/2;
yoffset = (513-1)/2;
% Avoid low frequency points
x1 = 5;
y1 = 0;
% Maximum high frequency pixels
x2 = 255;
y2 = 0;
i = 0; % profile counter (without this, i defaults to the imaginary unit)
for theta = 0:pi/180:pi
% Transposed rotation matrix
Rt = [cos(theta) sin(theta);
-sin(theta) cos(theta)];
% Find radial lines necessary for improfile
xy1_rot = Rt * [x1; y1] + [xoffset; yoffset];
xy2_rot = Rt * [x2; y2] + [xoffset; yoffset];
plot([xy1_rot(1) xy2_rot(1)], ...
[xy1_rot(2) xy2_rot(2)], ...
'linestyle','none', ...
'marker','o', ...
'color','k');
prof = improfile(FS,[xy1_rot(1) xy2_rot(1)],[xy1_rot(2) xy2_rot(2)]); % sample the shifted Fourier spectrum FS along the line
i = i + 1;
FI(i) = sum(prof(:))/length(prof);
Cxy(i,:) = [FI(i)*cos(theta), FI(i)*sin(theta)];
end
C = Cxy'*Cxy;
[V,D] = eig(C)
lambda2 = D(1,1);
lambda1 = D(2,2);
alpha = 1 - lambda2/lambda1
Figure: A) original image, B) plot of log(P+1), C) polar plot of FI.
My main concern is that when I use a perfectly aligned artificial image (attached figure), I get alpha = 0.91, whereas it should be exactly 1.
Any help will be greatly appreciated.
PS: those black dots in the middle plot are just the points used by improfile.
I believe there are a couple of sources of potential error here that keep you from getting a perfect alpha value.
Discrete Fourier Transform
You have discrete imaging data, which forces you to take a discrete Fourier transform; depending on the resolution of the input data, this inevitably introduces some accuracy issues.
Binning vs. Sampling Along a Line
The way you have done the binning is that you literally drew a line (rotated by a particular angle) and sampled the image along that line using improfile, which interpolates your data along that line and thereby introduces yet another potential source of error. The default is nearest-neighbor interpolation, which in the example shown below can cause multiple "profiles" to all pick up the same points.
This was with a rotation of 1-degree off-vertical when technically you'd want those peaks to only appear for a perfectly vertical line. It is clear to see how this sort of interpolation of the Fourier spectrum can lead to a spread around the "correct" answer.
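As a rough illustration (a self-contained sketch, not part of the original code, assuming a 513x513 spectrum with its DC term at pixel (257,257)), you can count how many nearest-neighbor pixels a vertical profile and a profile 1 degree off-vertical would share:
c = 257;                                   % center pixel of the shifted spectrum
r = 5:255;                                 % radii used in the question's code
p0 = [c + r*cosd(90); c + r*sind(90)];     % vertical profile
p1 = [c + r*cosd(89); c + r*sind(89)];     % profile 1 degree off vertical
shared = intersect(round(p0)', round(p1)', 'rows');
fprintf('%d of %d sampled pixels are identical\n', size(shared,1), numel(r));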
Data Undersampling
Similar to Nyquist sampling in the Fourier domain, sampling in the spatial domain has some requirements as well.
Imagine for a second that you wanted to use 45-degree bin widths instead of 1-degree ones. Your approach would still sample along a thin line and use that sample to represent 45 degrees' worth of data. Clearly, this is a gross under-sampling of the data, and you can imagine that the result wouldn't be very accurate.
It becomes more and more of an issue the further you get from the center of the image since the data in this "bin" is really pie wedge shaped and you're approximating it with a line.
A Potential Solution
A different approach to binning would be to determine the polar coordinates (r, theta) of all pixel centers in the image, bin the theta components into 1-degree bins, and then sum (or average) all of the values that fall into each bin.
This has several advantages:
It removes the undersampling that we talked about and draws samples from the entire "pie wedge" regardless of the sampling angle.
It ensures that each pixel belongs to one and only one angular bin.
I have implemented this alternate approach in the code below with some false horizontal line data and am able to achieve an alpha value of 0.988 which I'd say is pretty good given the discrete nature of the data.
% Draw a bunch of horizontal lines
data = zeros(101);
data([5:5:end],:) = 1;
fourier = fftshift(fft2(data));
FS = abs(fourier);
PS = FS.^2;
center = fliplr(size(FS)) / 2;
[xx,yy] = meshgrid(1:size(FS,2), 1:size(FS, 1));
coords = [xx(:), yy(:)];
% De-mean coordinates to center at the middle of the image
coords = bsxfun(@minus, coords, center);
[theta, R] = cart2pol(coords(:,1), coords(:,2));
% Convert to degrees and round them to the nearest degree
degrees = mod(round(rad2deg(theta)), 360);
degreeRange = 0:359;
% Band pass to ignore high and low frequency components;
lowfreq = 5;
highfreq = size(FS,1)/2;
% Now average everything with the same degrees (sum over PS and average by the number of pixels)
for k = degreeRange
ps_integral(k+1) = mean(PS(degrees == k & R > lowfreq & R < highfreq));
fs_integral(k+1) = mean(FS(degrees == k & R > lowfreq & R < highfreq));
end
thetas = deg2rad(degreeRange);
Cxy = [ps_integral.*cos(thetas);
ps_integral.*sin(thetas)]';
C = Cxy' * Cxy;
[V,D] = eig(C);
lambda2 = D(1,1);
lambda1 = D(2,2);
alpha = 1 - lambda2/lambda1;
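As an optional sanity check (not part of the original algorithm), you can visualize the angular distribution directly; a strongly aligned network shows two sharp, opposite lobes:
polarplot(deg2rad(degreeRange), ps_integral)  % R2016a+; use polar() on older releases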
Here is a question I strangely could not find on the internet. Given a complicated curve C (i.e., a curve that you can't fit with polynomials) defined by N points and centered around x0 = (0.5, 0) (blue curve in the figure), how can I rescale the curve so that the center stays the same and the new curve lies at a constant distance d from the curve C (e.g., green curve in the figure)?
So far the only way I could find is using the MATLAB function bwdist (https://fr.mathworks.com/help/images/ref/bwdist.html), which computes the Euclidean distance map of a binary image (see code below). However, I'm constrained by the size of my matrix: a curve of 1e5 points is fine, but a matrix of size (1e5,1e5) is too big for bwdist, so the result obtained with a coarser matrix is an ugly step-wise function. The code is
%%% profile
x = linspace(0,1,1e5);
y = -(x-0.5).^2/0.5^2 + 1 - 0.5*(exp(-(x-0.5).^2/2/0.2^2) - exp(-(-0.5).^2/2/0.2^2));
%%% define mask on a region that encompasses the curve
N=512;
mask = ones(N,N);
xm = linspace(0.9*min(x),1.1*max(x),N);
ym = linspace(0.9*min(y),1.1*max(y),N);
[Xm,Ym] = meshgrid(xm,ym);
%%% project curve on mask (i.e. put 0 below curve)
% get point of mask closer to each point of y
DT = delaunayTriangulation(Xm(:),Ym(:));
vi = nearestNeighbor(DT,x',y');
[iv,jv] = ind2sub(size(mask),vi);
% put 0 at indices of mask that are below the projected curve
for p=1:length(iv)
mask(1:iv(p)-1,jv(p)) = 0;
end
%%% get euclidean distance
Ed = bwdist(logical(mask));
Ed = double(Ed);
%%% get contours of Ed at given values (i.e. distances)
cont = contour(Ed,linspace(0,1,50));
% cont has the various curves at given distances from original curve y
I should add that I first tried moving each point of curve C by a distance d along the normal to the tangent, but since the curve is non-linear this direction does not necessarily lead to the appropriate point. So at some distance the offset curve becomes discontinuous, because following the normal only guarantees the distance d from the considered point on curve C, not from the curve as a whole.
The code is
% profile
x = linspace(0,1,1e5);
y = -(x-0.5).^2/0.5^2 + 1 - 0.5*(exp(-(x-0.5).^2/2/0.2^2) - exp(-(-0.5).^2/2/0.2^2));
% create lines at Dist from original line
Dist = linspace(0,2e-1,6);
Dist = Dist(2:end);
Cdist(1).x = x;
Cdist(1).y = y;
Cdist(1).v = 0;
step = 10; % every step points compute normal to point and move points
points = [1:1:length(y)];
for d=1:length(Dist)
xd = x;
yd = y;
for p=1:length(points)
if points(p)==1
tang = [-(y(2)-y(1)) (x(2)-x(1))];
tang = tang/norm(tang);
xd(1) = xd(1) - Dist(d)*tang(1);
yd(1) = yd(1) - Dist(d)*tang(2);
elseif points(p)==length(y)
tang = [-(y(end)-y(end-1)) (x(end)-x(end-1))];
tang = tang/norm(tang);
xd(end) = xd(end) - Dist(d)*tang(1);
yd(end) = yd(end) - Dist(d)*tang(2);
else
tang = [-(y(p+1)-y(p-1)) (x(p+1)-x(p-1))];
tang = tang/norm(tang);
xd(p) = xd(p) - Dist(d)*tang(1);
yd(p) = yd(p) - Dist(d)*tang(2);
end
end
yd(yd<0)=NaN;
Cdist(d+1).x = xd;
Cdist(d+1).y = yd;
Cdist(d+1).v = Dist(d);
end
% plot
cmap=lines(10);
hold on
for c=1:length(Cdist)
plot(Cdist(c).x,Cdist(c).y,'linewidth',2,'color',cmap(c,:))
end
axis tight
axis equal
axis tight
Any idea?
What you want to do is not possible.
Scaling a curve with respect to a center point while staying at an equal distance from the original curve means that all the points on the curve move along their normal directions towards the center of scaling, and will eventually reduce to a point.
Imagine drawing the normal direction at each point on this curve and extending these lines to infinity. They should all pass through the same point, which is the center of scaling. Unfortunately, this is not the case for your curve.
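As a quick, hypothetical check of this argument (not part of the original answer), you can draw the normals at a few points of the curve from the question; for a pure scaling they would all meet at one center, which clearly does not happen here:
x = linspace(0,1,1e5);
y = -(x-0.5).^2/0.5^2 + 1 - 0.5*(exp(-(x-0.5).^2/2/0.2^2) - exp(-(-0.5).^2/2/0.2^2));
idx = round(linspace(2, numel(x)-1, 7));   % a few interior points
figure; plot(x,y,'b'); hold on; axis equal
for p = idx
    t = [x(p+1)-x(p-1), y(p+1)-y(p-1)];    % local tangent
    n = [-t(2), t(1)]/norm(t);             % unit normal
    plot(x(p)+[-0.5 0.5]*n(1), y(p)+[-0.5 0.5]*n(2), 'k--')  % a piece of the normal line
end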
I want to implement two dimensional matched filter for blood vessel extraction according to the paper "Detection of Blood Vessels in Retinal Images Using Two-Dimensional Matched Filters" by Chaudhuri et al., IEEE Trans. on Medical Imaging, 1989 (there's a PDF on the author's web site).
A brief description: a blood vessel's cross-section has a Gaussian intensity profile, and therefore I want to use a Gaussian matched filter to increase the SNR. Such a kernel may be mathematically expressed as:
K(x,y) = -exp(-x^2/(2*sigma^2)) for |x| < 3*sigma, |y| < L/2
Here L is the length of the vessel segment that is assumed to have a fixed orientation. Experimentally, sigma = 1.5 and L = 7.
My MATLAB code for this part is:
s = 1.5; %sigma
t = -3*s:3*s;
theta=0:15:165; %different rotations
%one dimensional kernel
x = 1/sqrt(6*s)*exp(-t.^2/(2*s.^2));
L=7;
%two dimensional gaussian kernel
x2 = repmat(x,L,1);
Consider the response of this filter for a pixel belonging to the background retina. Assuming the background to have constant intensity with zero mean additive Gaussian white noise, the expected value of the filter output should ideally be zero. The convolution kernel is, therefore, modified by subtracting the mean value of s(t) from the function itself. The mean value of the kernel is determined as: m = Sum(K(x,y))/(number of points).
Thus, the convolutional mask used in this algorithm is given by: K(x, y) = K(x,y) - m.
My MATLAB code:
m = sum(x2(:))/(size(x2,1)*size(x2,2));
x2 = x2-m;
A vessel may be oriented at any angle 0 < theta < 180, and the matched filter response is maximum when it is aligned at theta ± 90 (the cross-section distribution is Gaussian, not the vessel itself).
Thus we need to rotate the matched filter 12 times with 15 degree increment.
My MATLAB code is attached here but I don't get a desirable result. Any help is appreciated.
%apply rotated matched filter on image
r = {};
for k = 1:12
x3=imrotate(x2,theta(k),'crop');%figure;imagesc(x3);colormap gray;
r{k}=conv2(img,x3);
end
w=[];h = zeros(584,565);
for i = 1:565
for j = 1:584
for k = 1:12 % one response per rotated filter
w= [w ,r{k}(j,i)];
end
h(j,i)=max(abs(w));
w = [];
end
end
%show result
figure('Name','after matched filter');imagesc(h);colormap gray
For rotation I used imrotate, which seems more sensible to me, but the paper does it differently: let p = [x,y] be a discrete point in the kernel. To compute the coefficients of the rotated kernel we have [u,v] = p*Rotation_Matrix.
Rotation_Matrix=[cos(theta),sin(theta);-sin(theta),cos(theta)]
And the kernel is:
K(x,y) = -exp(-u^2/(2*s^2))
But the new kernel doesn't have a Gaussian shape anymore, whereas using imrotate preserves the Gaussian shape. So what is the benefit of using the rotation matrix?
Input image is:
Output:
Matched filtering helps increase SNR but background noise is amplified too.
Am I right to use imrotate to rotate the kernel? My main problem is with the rotation matrix: why is it used, and what is the right code to implement it?
The reason to build the filter from its analytic expression for each rotation, rather than using imrotate, is that the filter extent is not circular, so rotating brings in "new" pixel values and pushes some other pixels out of the kernel. Furthermore, rotating a kernel constructed like this one (a smooth transition along one direction, a step edge along the other) requires different interpolation methods along each dimension, which imrotate cannot do. The resulting rotated kernel will always be wrong.
Both these issues can be easily seen when displaying the kernel you make together with two rotated versions:
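One way such a display could be produced (a sketch, reusing the kernel x2 from the question's code; the layout is just for illustration):
k0  = x2;                        % kernel as constructed in the question
k15 = imrotate(x2, 15, 'crop');  % imrotate'd by 15 degrees
k30 = imrotate(x2, 30, 'crop');  % imrotate'd by 30 degrees
figure
subplot(1,3,1), imagesc(k0),  axis image, title('0 deg')
subplot(1,3,2), imagesc(k15), axis image, title('15 deg')
subplot(1,3,3), imagesc(k30), axis image, title('30 deg')
colormap gray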
This display brings an additional issue to the front: the kernel is not centered on a pixel, causing it to shift the output by half a pixel.
Note also that, when subtracting the mean, it is important that this mean be computed only over the original domain of the filter, and that any zeros used to pad this domain to a rectangular shape remain zero (these should not become negative).
The rotated kernels can be constructed as follows:
m = max(ceil(3*s),(L-1)/2);
[x,y] = meshgrid(-m:m,-m:m); % non-rotated coordinate system, contains (0,0)
t = pi/6; % angle in radian
u = cos(t)*x - sin(t)*y; % rotated coordinate system
v = sin(t)*x + cos(t)*y; % rotated coordinate system
N = (abs(u) <= 3*s) & (abs(v) <= L/2); % domain
k = exp(-u.^2/(2*s.^2)); % kernel
k = k - mean(k(N));
k(~N) = 0; % set kernel outside of domain to 0
This is the result for the three rotations used in the example above (the grey around the edges of the kernel corresponds to the value 0, the black pixels have a negative value):
Another issue is that you use conv2 with the default 'full' output shape; you should be using 'same' here so that the output of the filter matches the size of the input.
Note that, instead of computing all filter responses, and computing the max afterwards, it is much easier to compute the max as you compute each filter response. All of the above leads to the following code:
img = im2double(rgb2gray(img));
s = 1.5; %sigma
L = 7;
theta = 0:15:165; %different rotations
out = zeros(size(img));
m = max(ceil(3*s),(L-1)/2);
[x,y] = meshgrid(-m:m,-m:m); % non-rotated coordinate system, contains (0,0)
for t = theta
t = t / 180 * pi; % angle in radian
u = cos(t)*x - sin(t)*y; % rotated coordinate system
v = sin(t)*x + cos(t)*y; % rotated coordinate system
N = (abs(u) <= 3*s) & (abs(v) <= L/2); % domain
k = exp(-u.^2/(2*s.^2)); % kernel
k = k - mean(k(N));
k(~N) = 0; % set kernel outside of domain to 0
res = conv2(img,k,'same');
out = max(out,res);
end
out = out/max(out(:)); % force output to be in [0,1] interval that MATLAB likes
imwrite(out,'so_result.png')
I get the following output:
I would like to populate random points on a 2D plot, in such a way that the points fall in proximity of a "C" shaped polyline.
I managed to accomplish this for a rather simple square shaped "C":
This is how I did it:
% Marker color
c = 'k'; % Black
% Red "C" polyline
xl = [8,2,2,8];
yl = [8,8,2,2];
plot(xl,yl,'r','LineWidth',2);
hold on;
% Axis settings
axis equal;
axis([0,10,0,10]);
set(gca,'xtick',[],'ytick',[]);
step = 0.05; % Affects point quantity
coeff = 0.9; % Affects point density
% Top Horizontal segment
x = 2:step:9.5;
y = 8 + coeff*randn(size(x));
scatter(x,y,'filled','MarkerFaceColor',c);
% Vertical segment
y = 1.5:step:8.5;
x = 2 + coeff*randn(size(y));
scatter(x,y,'filled','MarkerFaceColor',c);
% Bottom Horizontal segment
x = 2:step:9.5;
y = 2 + coeff*randn(size(x));
scatter(x,y,'filled','MarkerFaceColor',c);
hold off;
As you can see in the code, for each segment of the polyline I generate the scatter point coordinates artificially using randn.
For the previous example, splitting the polyline into segments and generating the points manually is fine. However, what if I wanted to experiment with a more sophisticated "C" shape like this one:
Note that with my current approach, when the geometric complexity of the polyline increases so does the coding effort.
Before going any further, is there a better approach for this problem?
A simpler approach, which generalizes to any polyline, is to run a loop over its segments. For each segment, r is its length and m is the number of points to be placed along it (m is chosen so that the spacing closely matches the prescribed step size, deviating slightly when the step does not evenly divide the length). Note that both x and y are subject to random perturbation.
for n = 1:numel(xl)-1
r = norm([xl(n)-xl(n+1), yl(n)-yl(n+1)]);
m = round(r/step) + 1;
x = linspace(xl(n), xl(n+1), m) + coeff*randn(1,m);
y = linspace(yl(n), yl(n+1), m) + coeff*randn(1,m);
scatter(x,y,'filled','MarkerFaceColor',c);
end
Output:
A more complex example, using coeff = 0.4; and
xl = [8,4,2,2,6,8];
yl = [8,6,8,2,4,2];
If you think this point cloud is too thin near the endpoints, you can artificially lengthen the first and last segments before running the loop, as sketched below. But I don't see the need: it makes sense that the fuzzed curve thins out at the extremities.
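A possible way to do that lengthening (hypothetical helper code; ext is an arbitrary extension length in the same units as xl and yl):
ext = 1;                                                       % how far to push each end outward
d1 = [xl(1)-xl(2), yl(1)-yl(2)];             d1 = d1/norm(d1); % outward direction of the first segment
dN = [xl(end)-xl(end-1), yl(end)-yl(end-1)]; dN = dN/norm(dN); % outward direction of the last segment
xl(1)   = xl(1)   + ext*d1(1);  yl(1)   = yl(1)   + ext*d1(2);
xl(end) = xl(end) + ext*dN(1);  yl(end) = yl(end) + ext*dN(2);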
With your original approach, two places at the same distance from a line can be sampled with different probabilities, especially at the corners where two lines meet. I tried to fix this by rephrasing the random experiment. The random experiment my code performs is: "Pick a random point. Accept it with a probability proportional to normpdf(d), where d is the distance to the nearest line." This is a rejection sampling strategy.
xl = [8,4,2,2,6,8];
yl = [8,6,8,2,4,2];
resolution=50;
points_to_sample=200;
step=.5;
sigma=.4; %lower value to get points closer to the line.
xmax=(max(xl)+2);
ymax=(max(yl)+2);
dist=zeros(xmax*resolution+1,ymax*resolution+1);
x=[];
y=[];
for n = 1:numel(xl)-1
r = norm([xl(n)-xl(n+1), yl(n)-yl(n+1)]);
m = round(r/step) + 1;
x = [x,round(linspace(xl(n)*resolution+1, xl(n+1)*resolution+1, m*resolution))];
y = [y,round(linspace(yl(n)*resolution+1, yl(n+1)*resolution+1, m*resolution))];
end
%dist contains the lines:
dist(sub2ind(size(dist),x,y))=1;
%dist contains the normalized distance of each rastered pixel to the line.
dist=bwdist(dist)/resolution;
pseudo_pdf=normpdf(dist,0,sigma);
%scale up to have acceptance rate of 1 for most likely pixels.
pseudo_pdf=pseudo_pdf/max(pseudo_pdf(:));
sampled_points=zeros(0,2);
while size(sampled_points,1)<points_to_sample
%sample a random point
sx=rand*xmax;
sy=rand*ymax;
%accept it if criteria based on normal distribution matches.
if pseudo_pdf(round(sx*resolution)+1,round(sy*resolution)+1)>rand
sampled_points(end+1,:)=[sx,sy];
end
end
plot(xl,yl,'r','LineWidth',2);
hold on
scatter(sampled_points(:,1),sampled_points(:,2),'filled');
I have a time-dependent system of varying number of particles (~100k particles). In fact, each particle represents an interaction in a 3D space with a particular strength. Thus, each particle has (X,Y,Z;w) which is the coordinate plus a weight factor between 0 and 1, showing the strength of interaction in that coordinate.
Here http://pho.to/9Ztti I have uploaded 10 real-time snapshots of the system, with particles are represented as reddish small dots; the redder the dot, the stronger the interaction is.
The question is: how can one produce a 3D (spatial) density map of these particles, preferably in MATLAB, Origin Pro 9, or ImageJ? Is there a way to, say, take the average of these images based on the red-color intensity in ImageJ?
Since I have the numerical data for the particles (X,Y,Z;w), I can analyze those data in other software as well, so you are welcome to suggest any other analytical approach/software.
Any ideas/comments are welcome!
Assuming your data lives in 3D continuous space and your dataset is just a list of the 3D positions of each particle interaction, it sounds like you want to make a 4D weighted histogram. You'll have to chop the 3D space into bins and sum the weighted points in each bin over time, then plot the results in a single 3D plot where color represents the summed weighted interactions over time.
Here's an example with randomly generated particle interactions:
%% Create dataSet of random particle interactions in 3d space
for i=1:5000
if i == 1
dataSet = [rand()*100 rand()*100 rand()*100 rand() i];
else
dataSet(i,:) = [rand()*100 rand()*100 rand()*100 rand() i];
end
end
% dataSet = [x y z interactionStrength imageNumber]
xLimits = [min(dataSet(:,1)) max(dataSet(:,1))];
yLimits = [min(dataSet(:,2)) max(dataSet(:,2))];
zLimits = [min(dataSet(:,3)) max(dataSet(:,3))];
binSize = 10; % Number of bins to split each spatial dimension into
binXInterval = (xLimits(2)-xLimits(1))/binSize;
binYInterval = (yLimits(2)-yLimits(1))/binSize;
binZInterval = (zLimits(2)-zLimits(1))/binSize;
histo = [];
for i=xLimits(1)+(binSize/2):binXInterval:xLimits(2) + (binSize/2)
for j=yLimits(1)+(binSize/2):binYInterval:yLimits(2) + (binSize/2)
for k=zLimits(1)+(binSize/2):binZInterval:zLimits(2) + (binSize/2)
%% Filter out particle interactions found within the current spatial bin
idx = find((dataSet(:,1) > (i - binSize)) .* (dataSet(:,1) < i));
temp = dataSet(idx,:);
idx = find((temp(:,2) > (j - binSize)) .* (temp(:,2) < j));
temp = temp(idx,:);
idx = find((temp(:,3) > (k - binSize)) .* (temp(:,3) < k));
temp = temp(idx,:);
%% Add up all interaction strengths found within this bin
histo = [histo; i j k sum(temp(:,4))];
end
end
end
%% Remove bins with no particle interactions
idx = find(histo(:,4)>0);
histo = histo(idx,:);
numberOfImages = max(dataSet(:,5));
%% Plot result
PointSizeMultiplier = 100000;
scatter3(histo(:,1).*binXInterval + xLimits(1),histo(:,2).*binYInterval + yLimits(1),histo(:,3).*binZInterval + zLimits(1),(histo(:,4)/numberOfImages)*PointSizeMultiplier,(histo(:,4)/numberOfImages));
colormap hot;
%Size and color represent the average interaction intensity over time
4D histogram made from 10000 randomly generated particle interactions. Each axis divided into 10 bins. Size and color represent summed particle interactions in each bin over time:
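For reference, a more compact way to build this kind of weighted spatial histogram is accumarray. This is only a sketch under the same assumptions, reusing dataSet, the axis limits, and binSize (the number of bins) from the code above; discretize needs a reasonably recent MATLAB release:
edgesX = linspace(xLimits(1), xLimits(2), binSize+1);
edgesY = linspace(yLimits(1), yLimits(2), binSize+1);
edgesZ = linspace(zLimits(1), zLimits(2), binSize+1);
ix = discretize(dataSet(:,1), edgesX);   % bin index along x
iy = discretize(dataSet(:,2), edgesY);   % bin index along y
iz = discretize(dataSet(:,3), edgesZ);   % bin index along z
W  = accumarray([ix iy iz], dataSet(:,4), [binSize binSize binSize]);  % summed weights per bin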
If your system can handle the matrix in Matlab it could be as easy as
A = mean(M, 4);
Assuming M holds the 4D compilation of your images then A would be your map.
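For instance, if each snapshot is a 3-D volume of the same size, M could be assembled like this (just a sketch; vols is an assumed cell array holding one volume per snapshot):
M = cat(4, vols{:});   % stack the volumes along a 4th (time) dimension
A = mean(M, 4);        % time-averaged 3-D density map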
One way would be to use a 3D scatter (bubble) plot, with variable circle/bubble sizes, proportional to the intensity of your particle.
Here is a simulated example:
N = 1e4; % number of particles
X = randn(N,1); % randomly generated coordinates
Y = 2*randn(N,1);
Z = 0.5*randn(N,1);
S = exp(-sqrt(X.^2 + Y.^2 + Z.^2)); % bubble size vector
scatter3(X,Y,Z,S*200)
Here I have randomly generated values for X, Y and Z, while S decreases with the distance from the center of the cloud.
In your case, if we assume that the (X,Y,Z,w) values are stored in a 2D array called Particles, it would be:
X = Particles(:,1);
Y = Particles(:,2);
Z = Particles(:,3);
S = Particles(:,4);
Hope that helped.
I have a simple loglog curve as above. Is there some function in MATLAB which can fit this curve with segmented lines and show the starting and end points of these line segments? I have checked the Curve Fitting Toolbox in MATLAB; it seems to fit either a single line or some predefined functions, and I do not want to fit the curve with one line only.
If there is no direct function, any alternative that achieves the same goal is fine with me. My goal is to fit the curve with segmented lines and get the locations of the end points of these segments.
First of all, your problem is not called curve fitting. Curve fitting is when you have data, and you find the best function that describes it, in some sense. You, on the other hand, want to create a piecewise linear approximation of your function.
I suggest the following strategy:
Split manually into sections. The section size should depend on the derivative, large derivative -> small section
Sample the function at the nodes between the sections
Find a linear interpolation that passes through the points mentioned above.
Here is an example of a code that does that. You can see that the red line (interpolation) is very close to the original function, despite the small amount of sections. This happens due to the adaptive section size.
function fitLogLog()
x = 2:1000;
y = log(log(x));
% Find section sizes, by using an inverse of the approximation of the derivative
numOfSections = 20;
indexes = round(linspace(1,numel(y),numOfSections));
derivativeApprox = diff(y(indexes));
inverseDerivative = 1./derivativeApprox;
weightOfSection = inverseDerivative/sum(inverseDerivative);
totalRange = max(x(:))-min(x(:));
sectionSize = weightOfSection.* totalRange;
% The relevant nodes
xNodes = x(1) + [ 0 cumsum(sectionSize)];
yNodes = log(log(xNodes));
figure;plot(x,y);
hold on;
plot (xNodes,yNodes,'r');
scatter (xNodes,yNodes,'r');
legend('log(log(x))','adaptive linear interpolation');
end
Andrey's adaptive solution provides a more accurate overall fit. If what you want is segments of a fixed length, however, then here is something that should work, using a method that also returns a complete set of all the fitted values. Could be vectorized if speed is needed.
Nsamp = 1000; %number of data samples on x-axis
x = [1:Nsamp]; %this is your x-axis
Nlines = 5; %number of lines to fit
fx = exp(-10*x/Nsamp); %generate something like your current data, f(x)
gx = NaN(size(fx)); %this will hold your fitted lines, g(x)
joins = round(linspace(1, Nsamp, Nlines+1)); %define equally spaced breaks along the x-axis
dx = diff(x(joins)); %x-change
df = diff(fx(joins)); %f(x)-change
m = df./dx; %gradient for each section
for i = 1:Nlines
x1 = joins(i); %start point
x2 = joins(i+1); %end point
gx(x1:x2) = fx(x1) + m(i)*(0:dx(i)); %compute line segment
end
subplot(2,1,1)
h(1,:) = plot(x, fx, 'b', x, gx, 'k', joins, gx(joins), 'ro');
title('Normal Plot')
subplot(2,1,2)
h(2,:) = loglog(x, fx, 'b', x, gx, 'k', joins, gx(joins), 'ro');
title('Log Log Plot')
for ip = 1:2
subplot(2,1,ip)
set(h(ip,:), 'LineWidth', 2)
legend('Data', 'Piecewise Linear', 'Location', 'NorthEastOutside')
legend boxoff
end
This is not an exact answer to this question, but since I arrived here based on a search, I'd like to answer the related question of how to create (not fit) a piecewise linear function that is intended to represent the mean (or median, or some other function) of interval data in a scatter plot.
First, a related but more sophisticated alternative using regression, which apparently has some MATLAB code listed on the wikipedia page, is Multivariate adaptive regression splines.
The solution here is to just calculate the mean over overlapping intervals to get the points:
function [x, y] = intervalAggregate(Xdata, Ydata, aggFun, intStep, intOverlap)
% intOverlap in [0, 1); 0 for no overlap of intervals, etc.
% intStep this is the size of the interval being aggregated.
minX = min(Xdata);
maxX = max(Xdata);
minY = min(Ydata);
maxY = max(Ydata);
intInc = intOverlap*intStep; %How far we advance each iteration.
if intOverlap <= 0
intInc = intStep;
end
nInt = ceil((maxX-minX)/intInc); %Number of aggregations
parfor i = 1:nInt
xStart = minX + (i-1)*intInc;
xEnd = xStart + intStep;
intervalIndices = find((Xdata >= xStart) & (Xdata <= xEnd));
x(i) = aggFun(Xdata(intervalIndices));
y(i) = aggFun(Ydata(intervalIndices));
end
For instance, to calculate the mean over some paired X and Y data I had handy with intervals of length 0.1 having roughly 1/3 overlap with each other (see scatter image):
[x,y] = intervalAggregate(Xdat, Ydat, @mean, 0.1, 0.333)
x =
Columns 1 through 8
0.0552 0.0868 0.1170 0.1475 0.1844 0.2173 0.2498 0.2834
Columns 9 through 15
0.3182 0.3561 0.3875 0.4178 0.4494 0.4671 0.4822
y =
Columns 1 through 8
0.9992 0.9983 0.9971 0.9955 0.9927 0.9905 0.9876 0.9846
Columns 9 through 15
0.9803 0.9750 0.9707 0.9653 0.9598 0.9560 0.9537
We see that as x increases, y tends to decrease slightly. From there, it is easy enough to draw line segments and/or perform some other kind of smoothing.
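For example, the aggregated points can be overlaid on the raw scatter as a piecewise linear curve (a minimal sketch, reusing the Xdat/Ydat pairs and the x, y outputs from above):
scatter(Xdat, Ydat, 6, [0.7 0.7 0.7], 'filled'); hold on  % raw data
plot(x, y, 'r.-', 'LineWidth', 1.5)                       % piecewise linear mean from intervalAggregate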
(Note that I did not attempt to vectorize this solution; a much faster version could be written if Xdata is sorted.)