I am using Matlab to fit some data in 2 coordinates (x,y) with a poly1 curve.
The problem is that I can't find a way to make the fitting line longer.
I need it from (180, 930) to (191, 944), but instead Matlab just draws the fitting line near the data, which lies between those two coordinates.
Is there some argument to the fit command (or some preferences in the cftool) that can help me out?
Moreover, I've tried the "Adjust axes limits" option in the cftool, but it didn't help at all.
I've searched through the already asked questions, but I haven't found anything related to this.
I'm new to this program, so I'm sorry if this is a stupid question.
Thanks in advance,
Giovanni
EDIT:
The code for the first image is:
[FitUp,goodnessUP] = fit(AKaterMatrix1msDX(:,1),AKaterMatrix1msDX(:,2),'poly1')
[FitDown,goodnessDOWN] = fit(AKaterMatrix1msDX(:,1),AKaterMatrix1msDX(:,3),'poly1')
plot(FitUp,'b')
hold on
plot(FitDown,'b')
hold on
errorbar(AKaterMatrix1msDX(:,1),AKaterMatrix1msDX(:,2),AKaterMatrix1msDX(:,4),'--r')
hold on
errorbar(AKaterMatrix1msDX(:,1),AKaterMatrix1msDX(:,3),AKaterMatrix1msDX(:,4),'--r')
The code for the second is:
[FitUp,goodnessUP] = fit(AKaterMatrix1msDX(:,1),AKaterMatrix1msDX(:,2),'poly1')
[FitDown,goodnessDOWN] = fit(AKaterMatrix1msDX(:,1),AKaterMatrix1msDX(:,3),'poly1')
plot(FitDown,'b')
hold on
plot(FitUp,'b')
hold on
errorbar(AKaterMatrix1msDX(:,1),AKaterMatrix1msDX(:,2),AKaterMatrix1msDX(:,4),'--r')
hold on
errorbar(AKaterMatrix1msDX(:,1),AKaterMatrix1msDX(:,3),AKaterMatrix1msDX(:,4),'--r')
Here you can find the two fits. It appears that the first fit is not cropped, while the second, plotted after hold on, is:
https://docs.google.com/file/d/0B749BCu7mnZHaEhITUZ1YzdfVDA/edit?usp=sharing
https://docs.google.com/file/d/0B749BCu7mnZHeDVTOGRuSkktUmc/edit?usp=sharing
You just need to be careful about when and how you set hold. First, make some dummy data:
AKaterMatrix1msDX(:, 1) = 185:189;
AKaterMatrix1msDX(:, 2) = 2*rand(5, 1)+933;
AKaterMatrix1msDX(:, 3) = 2*rand(5, 1)+940;
AKaterMatrix1msDX(:, 4) = 2*rand(5, 1);
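Next, and this is the key part, set the axis limits to what you want and turn hold on before plotting, so that the fit line is drawn across the limits you chose rather than just around the data: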
figure
axis([180, 191, 930, 944]);
hold on
This does exactly what you did:
[FitUp,goodnessUP] = fit(AKaterMatrix1msDX(:,1),AKaterMatrix1msDX(:,2),'poly1')
[FitDown,goodnessDOWN] = fit(AKaterMatrix1msDX(:,1),AKaterMatrix1msDX(:,3),'poly1')
plot(FitUp,'b')
hold on
plot(FitDown,'b')
hold on
errorbar(AKaterMatrix1msDX(:,1),AKaterMatrix1msDX(:,2),AKaterMatrix1msDX(:,4),'--r')
hold on
errorbar(AKaterMatrix1msDX(:,1),AKaterMatrix1msDX(:,3),AKaterMatrix1msDX(:,4),'--r')
If you don't need a lot of fit statistics, polyfit followed by polyval can give you your fit:
X = AKaterMatrix1msDX(:,1);
Y = AKaterMatrix1msDX(:,2);
dY = AKaterMatrix1msDX(:,4);
[a,S] = polyfit(X,Y,1)  % degree 1 for a straight-line fit
extraPlotRange = 10;
newX = linspace(min(X)-extraPlotRange,max(X)+extraPlotRange,100);
[fitY,delta] = polyval(a,newX,S);  % S is needed to get the error estimate delta
plot(X,Y)
hold on
plot(newX,fitY)
plot(newX,fitY+delta,':b')
plot(newX,fitY-delta,':b')
errorbar(X,Y,dY,'--r')
hold off
This will not, unfortunately, give you the same goodness-of-fit statistics that fit returns, only delta, an estimate of the standard error of the predicted values.
The other option, if you want to stay with fit, would be to get the fit coefficients using coeffvalues. Those fit coefficients would be the same as you get from polyfit.
aUp = coeffvalues(FitUp);
aDown = coeffvalues(FitDown);
fitYup = polyval(aUp,newX);
fitYdown = polyval(aDown,newX);
etc.
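As a minimal sketch of the coeffvalues route, assuming the Curve Fitting Toolbox is available: the data values below are made up for illustration, and evaluating the fit object with feval is shown only as a cross-check against polyval.

```matlab
% Made-up illustrative data in the same range as the question
x = (185:189)';
y = [933.2; 933.9; 934.1; 934.8; 935.3];

FitUp = fit(x, y, 'poly1');          % linear fit: p1*x + p2

% Extended x range, wider than the data
newX = linspace(180, 191, 100)';

% Route 1: pull out the coefficients and evaluate them with polyval
aUp = coeffvalues(FitUp);            % [p1 p2], same order polyval expects
fitYup = polyval(aUp, newX);

% Route 2: evaluate the fit object itself on the extended range
fitYdirect = feval(FitUp, newX);

plot(x, y, 'or', newX, fitYup, 'b')
axis([180 191 930 944])
```

Both routes should give the same line, so which one to use is mostly a matter of taste.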
Apologies for the long post but this takes a bit to explain. I'm trying to make a script that finds the longest linear portion of a plot. Sample data is in a csv file here, it is stress and strain data for calculating the shear modulus of 3D printed samples. The code I have so far is the following:
x_data = [];
y_data = [];
x_data = Data(:,1);
y_data = Data(:,2);
plot(x_data,y_data);
grid on;
answer1 = questdlg('Would you like to load last attempt''s numbers?');
switch answer1
case 'Yes'
[sim_slopes,reg_data] = regr_and_longest_part(new_x_data,new_y_data,str2num(answer2{3}),str2num(answer2{2}),K);
case 'No'
disp('Take a look at the plot, find a range estimate, and press any button to continue');
pause;
prompt = {'Eliminate values ABOVE this x-value:','Eliminate values BELOW this x-value:','Size of divisions on x-axis:','Factor for similarity of slopes:'};
dlg_title = 'Point elimination';
num_lines = 1;
defaultans = {'0','0','0','0.1'};
if isempty(answer2) < 1
defaultans = {answer2{1},answer2{2},answer2{3},answer2{4}};
end
answer2 = inputdlg(prompt,dlg_title,num_lines,defaultans);
uv_of_x_range = str2num(answer2{1});
lv_of_x_range = str2num(answer2{2});
x_div_size = str2num(answer2{3});
K = str2num(answer2{4});
close all;
iB = find(x_data > str2num(answer2{1}),1,'first');
iS = find(x_data > str2num(answer2{2}),1,'first');
new_x_data = x_data(iS:iB);
new_y_data = y_data(iS:iB);
[sim_slopes, reg_data] = regr_and_longest_part(new_x_data,new_y_data,str2num(answer2{3}),str2num(answer2{2}),K);
end
[longest_section0, Midx]= max(sim_slopes(:,4)-sim_slopes(:,3));
longest_section=1+longest_section0;
long_sec_x_data_start = x_div_size*(sim_slopes(Midx,3)-1)+lv_of_x_range;
long_sec_x_data_end = x_div_size*(sim_slopes(Midx,4)-1)+lv_of_x_range;
long_sec_x_data_start_idx=find(new_x_data >= long_sec_x_data_start,1,'first');
long_sec_x_data_end_idx=find(new_x_data >= long_sec_x_data_end,1,'first');
long_sec_x_data = new_x_data(long_sec_x_data_start_idx:long_sec_x_data_end_idx);
long_sec_y_data = new_y_data(long_sec_x_data_start_idx:long_sec_x_data_end_idx);
[b_long_sec, longes_section_reg_data] = robustfit(long_sec_x_data,long_sec_y_data);
plot(long_sec_x_data,b_long_sec(1)+b_long_sec(2)*long_sec_x_data,'LineWidth',3,'LineStyle',':','Color','k');
function [sim_slopes,reg_data] = regr_and_longest_part(x_points,y_points,x_div,lv,K)
reg_data = cell(1,3);
scatter(x_points,y_points,'.');
grid on;
hold on;
uv = lv+x_div;
ii=0;
while lv <= x_points(end)
if uv > x_points(end)
uv = x_points(end);
end
ii=ii+1;
indices = find(x_points>lv & x_points<uv);
temp_x_points = x_points((indices));
temp_y_points = y_points((indices));
if length(temp_x_points) <= 2
break;
end
[b,stats] = robustfit(temp_x_points,temp_y_points);
reg_data{ii,1} = b(1);
reg_data{ii,2} = b(2);
reg_data{ii,3} = length(indices);
plot(temp_x_points,b(1)+b(2)*temp_x_points,'LineWidth',2);
lv = lv+x_div;
uv = lv+x_div;
end
sim_slopes = NaN(length(reg_data),4);
sim_slopes(1,:) = [reg_data{1,1},0,1,1];
idx=1;
for ii=2:length(reg_data)
coff =sim_slopes(idx,1);
if abs(reg_data{ii,1}-coff) <= K*coff
C=zeros(ii-sim_slopes(idx,3)+1,1);
for kk=sim_slopes(idx,3):ii
C(kk)=reg_data{kk,1};
end
sim_slopes(idx,1)=mean(C);
sim_slopes(idx,2)=std(C);
sim_slopes(idx,4)=ii;
else
idx = idx + 1;
sim_slopes(idx,1)=reg_data{ii,1};
sim_slopes(idx,2)=0;
sim_slopes(idx,3)=ii;
sim_slopes(idx,4)=ii;
end
end
end
Apologies for the code not being well optimized, I'm still relatively new to MATLAB. I did not use derivatives because my data is relatively noisy and differentiation might have made it worse.
I've managed to get the code to find the longest straight part of the plot by splitting the data up into sections of width x_div_size and then performing a robustfit on each section, the results of which are written into reg_data. The code then runs through reg_data and finds which lines have the most similar slopes, as determined by the K factor, by calculating the average of the slopes in a section of the plot and noting it in sim_slopes. It then finds the longest interval with max(sim_slopes(:,4)-sim_slopes(:,3)) and performs a regression on it to give the final answer.
The problem is that it will only consider the first straight portion that it comes across. When the data is plotted, it has a few parts where it seems straightest:
As an example, when I run the script with answer2 = {'0.2','0','0.0038','0.3'} I get the following, where the black line is the straightest part found by the code:
I have the following questions:
It's clear that from about x = 0.04 to x = 0.2 there is a long straight part and I'm not sure why the script is not finding it. Playing around with different values the script always seems to pick the first longest straight part, ignoring subsequent ones.
MATLAB complains with "Warning: Iteration limit reached." because there are more than 50 regressions to perform. Is there a way to bypass this limit on robustfit?
When generating sim_slopes there might be a section of the plot whose slope is too different from the average of the previous slopes, so it gets marked as the end of a long section. But that section is sometimes sandwiched between several other sections on either side which do have similar slopes. How can I tell the script to ignore one wayward section and continue as if it fell within the tolerance allowed by the K value?
Take a look at the Douglas-Peucker algorithm. If you think of your (x,y) values as the vertices of an (open) polygon, this algorithm will simplify it for you, such that the largest distance from the simplified polygon to the original is smaller than some threshold you can choose. The simplified polygon will be the set of straight lines. Find the pair of consecutive vertices that are furthest apart, i.e. the longest segment, and you're done.
MATLAB has an implementation in the Mapping Toolbox called reducem. You might also find an implementation on the File Exchange (but be careful, there is also really bad code on there). Or, you can roll your own, it's quite a simple algorithm.
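If you do roll your own, a minimal iterative sketch could look like the following. The data, the tolerance tol and all variable names are made up for illustration, and the distance is measured in raw data units, so for stress-strain data you would probably want to rescale x and y to comparable ranges first.

```matlab
% Made-up polyline: a ramp followed by a plateau
x = (0:10)';
y = [0 1 2 3 4 5 5 5 5 5 5]';
tol = 0.01;                      % max allowed distance from the simplified line

n = numel(x);
keep = false(n, 1);
keep([1 n]) = true;              % end points are always kept
stack = [1 n];                   % index ranges still to examine

while ~isempty(stack)
    i1 = stack(end, 1);
    i2 = stack(end, 2);
    stack(end, :) = [];
    if i2 <= i1 + 1
        continue                 % no interior points in this range
    end
    k = (i1+1:i2-1)';
    dx = x(i2) - x(i1);
    dy = y(i2) - y(i1);
    len = hypot(dx, dy);
    if len == 0
        d = hypot(x(k) - x(i1), y(k) - y(i1));
    else
        % perpendicular distance of each interior point to the chord
        d = abs(dy*(x(k) - x(i1)) - dx*(y(k) - y(i1))) / len;
    end
    [dmax, j] = max(d);
    if dmax > tol
        m = k(j);                % farthest point becomes a new vertex
        keep(m) = true;
        stack = [stack; i1 m; m i2]; %#ok<AGROW>
    end
end

v = find(keep);                                  % vertices of the simplified polyline
[~, s] = max(hypot(diff(x(v)), diff(y(v))));     % longest straight segment
longest = v([s s+1]);                            % its start and end indices
```

With noisy data, tol controls how much wiggle still counts as "straight", so it plays a role similar to your K factor.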
You can also try using the ischange function to detect changes in the intercept and slope of the data, and then extract the longest portion from that.
Using the sample data you provided, here is what I see from a basic attempt:
>> T = readtable('Data.csv');
>> T = rmmissing(T); % Remove rows with NaN
>> T = groupsummary(T,'Var1','mean'); % Average duplicate timestamps
>> [tf,slopes,intercepts] = ischange(T.mean_Var2, 'linear', 'SamplePoints', T.Var1); % find changes
>> plot(T.Var1, T.mean_Var2, T.Var1, slopes.*T.Var1 + intercepts)
which generates the plot
You should be able to extract the longest segment based on the indices given by find(tf).
You can also tune the parameters of ischange to get fewer or more segments. Adding the name-value pair 'MaxNumChanges' with a value of 4 or 5 produces more linear segments with a tighter fit to the curve, for example, which effectively removes the kink in the plot that you see.
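Extracting the longest segment from tf can be sketched roughly as below. This assumes that each true entry in tf marks the first sample of a new segment; the tf and x values here are made up so the snippet runs on its own, in practice they would come from the ischange call above.

```matlab
% Made-up stand-ins for T.Var1 and the tf output of ischange
x  = (1:10)';
tf = logical([0 0 0 1 0 0 0 0 1 0])';

% Segment boundaries: a segment starts at 1 and at every change point
starts = [1; find(tf)];
stops  = [find(tf) - 1; numel(x)];

% Pick the segment that spans the widest x range
[~, k] = max(x(stops) - x(starts));
longestIdx = starts(k):stops(k);

xLongest = x(longestIdx);        % x values of the longest linear portion
```

The slope and intercept of that portion can then be read from the other two ischange outputs at any index in longestIdx.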
Let us suppose we have the following code:
function plot_test(x,y)
x_constucted=[ones(size(x)) x];
b = regress(y,x_constucted);
y_predicted=b(1)+b(2)*x;
scatter(x,y);
hold on
plot(x,y_predicted);
theString = sprintf('y = %.3f*x+%.3f ', b(2), b(1));
text(x(1), y_predicted(1), theString, 'FontSize', 8);
end
The output of this code is the following figure:
My question is: how can I place the equation away from the line, for instance in the top-left corner? Thanks in advance.
If I understand you correctly, you want to move the printed equation out of the dots. Check out the text() function description. The first two values define the x and y position in your plot for the text.
x=1;
y=25;
To move it up, use the new variables in text(x,y,...). Hope that helps.
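If you would rather not depend on the data range, one option is to position the text in normalized axes units, where (0,0) is the bottom-left and (1,1) the top-right corner of the axes. This is only a sketch: the data are made up, and the backslash operator is used for the least-squares fit instead of regress so that no toolbox is needed.

```matlab
% Made-up data roughly following y = 2x + 3
x = (1:10)';
y = 2*x + 3 + [0.5 -0.3 0.2 -0.4 0.1 0.3 -0.2 0.4 -0.1 0.0]';

b = [ones(size(x)) x] \ y;       % b(1) = intercept, b(2) = slope

scatter(x, y);
hold on
plot(x, b(1) + b(2)*x);
theString = sprintf('y = %.3f*x+%.3f', b(2), b(1));

% Top-left corner of the axes, independent of the axis limits
t = text(0.05, 0.95, theString, 'Units', 'normalized', ...
         'VerticalAlignment', 'top', 'FontSize', 8);
hold off
```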
Some time ago I was looking for a solution to exactly the same problem. As you may know, the legend command lets you specify a Location parameter, and one of its many options is called best, described in the official Matlab documentation as follows:
Inside axes where least conflict occurs with plot data
My workaround abuses this feature in order to find the best location to place a single text annotation inside the plot. The code below uses a built-in dataset since you didn't specify what your data looks like:
load carsmall;
x = [ones(size(Horsepower)) Horsepower];
y = MPG;
b = regress(y,x);
y_hat = b(1) + b(2) .* Horsepower;
scatter(Horsepower,y);
hold on;
plot(Horsepower,y_hat);
text_at_best(sprintf('y = %.3f*x+%.3f ',b(2),b(1)),'FontSize',12);
function h = text_at_best(txt,varargin)
l = legend(txt,varargin{:},'Location','best');  % let legend choose the least-cluttered spot
t = annotation('textbox',varargin{:});
t.String = txt;
t.Position = l.Position;
t.LineStyle = 'None';
delete(l);
if nargout
h = t;
end
end
Here is the final result:
I don't know if this fits your needs, but developing an algorithm for finding a non-overlapping part of the plot in which to place a text looked like overkill to me. Despite the text being quite far from the prediction line, it's still elegant, clear and comprehensible. The same goes for an even quicker workaround, which consists of setting the regression equation as the plot title (wink wink).
I'm trying to estimate the (unknown) original datapoints that went into calculating a (known) moving average. However, I do know some of the original datapoints, and I'm not sure how to use that information.
I am using the method given in the answers here: https://stats.stackexchange.com/questions/67907/extract-data-points-from-moving-average, but in MATLAB (my code below). This method works quite well for large numbers of data points (>1000), but less well with fewer data points, as you'd expect.
window = 3;
datapoints = 150;
data = 3*rand(1,datapoints)+50;
moving_averages = [];
for i = window:size(data,2)
moving_averages(i) = mean(data(i+1-window:i));
end
length = size(moving_averages,2)+(window-1);
a = (tril(ones(length,length),window-1) - tril(ones(length,length),-1))/window;
a = a(1:length-(window-1),:);
ai = pinv(a);
daily = mtimes(ai,moving_averages');
x = 1:size(data,2);
figure(1)
hold on
plot(x,data,'Color','b');
plot(x(window:end),moving_averages(window:end),'Linewidth',2,'Color','r');
plot(x,daily(window:end),'Color','g');
hold off
axis([0 size(x,2) min(daily(window:end))-1 max(daily(window:end))+1])
legend('original data','moving average','back-calculated')
Now, say I know a smattering of the original data points. I'm having trouble figuring out how I might use that information to calculate the rest more accurately. Thank you for any assistance.
You should be able to calculate the original data exactly if, at any point, you know n-1 consecutive samples in a window of length n. In your case, if you know A, B and (A+B+C)/3, you can solve for C. Then, given (B+C+D)/3 (the next moving average), you can solve exactly for D. Rinse and repeat. This logic works going backwards too.
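The forward recursion described above can be sketched as follows. The data vector is made up, and the snippet assumes the first n-1 samples are the known ones; the same idea applies from any run of n-1 known consecutive samples.

```matlab
n = 3;                                   % window length
data = [5 2 7 4 9 1 6 3];                % made-up "unknown" original data

% Moving average over full windows only: m(i) = mean(data(i:i+n-1))
m = zeros(1, numel(data) - n + 1);
for i = 1:numel(m)
    m(i) = mean(data(i:i+n-1));
end

% Reconstruction from the first n-1 known samples
rec = nan(size(data));
rec(1:n-1) = data(1:n-1);
for i = 1:numel(m)
    % n*m(i) is the window sum; subtract the n-1 samples already known
    rec(i+n-1) = n*m(i) - sum(rec(i:i+n-2));
end

maxErr = max(abs(rec - data));           % should be at rounding-error level
```

Note that any error in the known samples or in the averages carries forward through the recursion, so with noisy inputs the least-squares approach may still be preferable.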
Here is an example with the same idea:
% the actual vector of values
a = cumsum(rand(150,1) - 0.5);
% compute moving average
win = 3; % sliding window length
idx = hankel(1:win, win:numel(a));
m = mean(a(idx));
% coefficient matrix: m(i) = sum(a(i:i+win-1))/win
A = repmat([ones(1,win) zeros(1,numel(a)-win)], numel(a)-win+1, 1);
for i=2:size(A,1)
A(i,:) = circshift(A(i-1,:), [0 1]);
end
A = A / win;
% solve linear system
%x = A \ m(:);
x = pinv(A) * m(:);
% plot and compare
subplot(211), plot(1:numel(a),a, 1:numel(m),m)
legend({'original','moving average'})
title(sprintf('length = %d, window = %d',numel(a),win))
subplot(212), plot(1:numel(a),a, 1:numel(a),x)
legend({'original','reconstructed'})
title(sprintf('error = %f',norm(x(:)-a(:))))
You can see the reconstruction error is very small, even using the data sizes in your example (150 samples with a 3-samples moving average).
There are two related things I would like to ask help with.
1) I'm trying to shift a "semi-log" chart (using semilogy) such that the new line passes through a given point on the chart, but still appears to be parallel to the original.
2) Shift the "line" exactly as in 1), but then also invert the slope.
I think that the desired results are best illustrated with an actual chart.
Given the following code:
x = [50 80];
y = [10 20];
all_x = 1:200;
P = polyfit(x, log10(y),1);
log_line = 10.^(polyval(P,all_x));
semilogy(all_x,log_line)
I obtain the following chart:
For 1), let's say I want to move the line such that it passes through point (20,10). The desired result would look something like the orange line below (please note that I added a blue dot at the (20,10) point only for reference):
For 2), I want to take the line from 1) and take an inverse of the slope, so that the final result looks like the orange line below:
Please let me know if any clarifications are needed.
EDIT: Based on Will's answer (below), the solution is as follows:
%// to shift to point (40, 10^1.5)
%// solution to 1)
log_line_offset = (10^1.5).^(log10(log_line)/log10(10^1.5) + 1-log10(log_line(40))/log10(10^1.5));
%// solution to 2)
log_line_offset_inverted = (10^1.5).^(1 + log10(log_line(40))/log10(10^1.5) - log10(log_line)/log10(10^1.5));
To do transformations described as linear operations on logarithmic axes, perform those linear transformations on the logarithm of the values and then reapply the exponentiation. So for 1):
log_line_offset = 10.^(log10(log_line) + 1-log10(log_line(20)));
And for 2):
log_line_offset_inverted = 10.^(2*log10(log_line_offset(20)) - log10(log_line_offset));
or:
log_line_offset_inverted = 10.^(1 + log10(log_line(20)) - log10(log_line));
These can then be plotted with semilogy in the same way:
semilogy(all_x,log_line,all_x, log_line_offset, all_x,log_line_offset_inverted)
I can't guarantee that this is a sensible solution for the application that you're creating these plots and their underlying data though. It seems an odd way to describe the problem, so you might be better off creating these offsets further up the chain of calculation.
For example, log_line_offset can just as easily be calculated using your original code but for an x value of [20 50], but whether that is a meaningful way to treat the data may depend on what it's supposed to represent.
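As a quick sanity check of that equivalence, here is a small sketch comparing the two routes. The names P_shift and log_line_shift are introduced here only for illustration.

```matlab
x = [50 80];
y = [10 20];
all_x = 1:200;

% Original line, fitted in log10 space
P = polyfit(x, log10(y), 1);
log_line = 10.^polyval(P, all_x);

% Route 1: shift the existing line so it passes through (20, 10)
log_line_offset = 10.^(log10(log_line) + 1 - log10(log_line(20)));

% Route 2: refit with the x values moved to [20 50]
P_shift = polyfit([20 50], log10(y), 1);
log_line_shift = 10.^polyval(P_shift, all_x);

% Largest relative difference between the two routes
relErr = max(abs(log_line_shift - log_line_offset) ./ log_line_offset);

semilogy(all_x, log_line, all_x, log_line_offset, all_x, log_line_shift, '--')
```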
I have this MATLAB code and I'm trying to implement the method explained in the top answer to this question: https://stats.stackexchange.com/questions/12546/software-package-to-solve-l-infinity-norm-linear-regression
Here is the code that I'm using, starting with the data points:
x = [
0
0.101010101010101
0.202020202020202
0.303030303030303
0.404040404040404
0.505050505050505
0.606060606060606
0.707070707070707
0.808080808080808
0.909090909090909
];
y = [
0.052993311292562
14.923120014175920
1.974502763975613
-2.205773310050583
-0.052548781318830
2.935428041987883
0.134606520161892
0.146742215922384
-0.418386565682831
1.702041272689124
];
A1 = [x,ones(length(y),1),-ones(length(y),1)];
A2 = [-x,-ones(length(y),1),-ones(length(y),1)];
A = [A1;A2];
f = [0;0;1];
linprog(f,A,[y;-y])
The point is to find the parameters (slope and intercept) of the best-fit line by minimizing the L-infinity norm of the residuals between the line and the data points. I have made the same problem work for ordinary least squares (minimizing the L-2 norm) as well as for the L-1 fit. The lines plotted from those methods fit really nicely between the data points. But I can't seem to make this L-infinity fit work no matter what I do, so I come to you for help. Any tips appreciated.
The sign of t in your inequalities is wrong. Try
A1 = [x,ones(length(y),1),-ones(length(y),1)];
A2 = [-x,-ones(length(y),1),-ones(length(y),1)];
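For completeness, here is a minimal end-to-end sketch of the whole linear program, assuming the Optimization Toolbox (linprog) is available. The unknowns are [slope; intercept; t], and the constraints encode |y - slope*x - intercept| <= t for every point. The data are made up for illustration.

```matlab
% Made-up data
x = (0:4)';
y = [0; 1.2; 1.8; 3.3; 4.0];
n = numel(y);

% slope*x + intercept - t <=  y
% -slope*x - intercept - t <= -y
A = [ x,  ones(n,1), -ones(n,1);
     -x, -ones(n,1), -ones(n,1)];
b = [y; -y];
f = [0; 0; 1];                    % minimize t, the largest absolute residual

p = linprog(f, A, b);
slope = p(1);
intercept = p(2);

% At the optimum, t should equal the largest absolute residual
maxResid = max(abs(y - (slope*x + intercept)));

plot(x, y, 'o', x, slope*x + intercept, '-')
```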