Assume y is a vector with random numbers following the distribution f(x)=sqrt(4-x^2)/(2*pi). At the moment I use the command hist(y,30). How can I plot the distribution function f(x)=sqrt(4-x^2)/(2*pi) into the same histogram?
Instead of normalizing numerically, you could also do it by finding a theoretical scaling factor as follows.
nbins = 30;
nsamples = max(size(y));
binsize = (max(y)-min(y)) / nsamples
hist(y,nbins)
hold on
x1=linspace(min(y),max(y),100);
scalefactor = nsamples * binsize
y1=scalefactor * sqrt(4-x^2)/(2*pi)
plot(x1,y1)
Update: How it works.
For any dataset that is large enough to give a good approximation to the pdf (call it f(x)), the integral of f(x) over this domain will be approximately unity. However we know that the area under any histogram is precisely equal to the total number of samples times the bin-width.
So a very simple scale factor to bring the pdf into line with the histogram is Ns*Wb, the total number of sample point times the width of the bins.
Let's take an example of another distribution function, the standard normal. To do exactly what you say you want, you do this:
nRand = 10000;
y = randn(1,nRand);
[myHist, bins] = hist(y,30);
pdf = normpdf(bins);
figure, bar(bins, myHist,1); hold on; plot(bins,pdf,'rx-'); hold off;
This is probably NOT what you actually want though. Why? You'll notice that your density function looks like a thin line at the bottom of your histogram plot. This is because a histogram is counts of numbers in bins, while a density function is normalized to integrate to one. If you have hundreds of items in a bin, there is no way that the density function will match that in scale, so you have a scaling or normalization problem. Either you have to normalize the histogram, or plot a scaled distribution function. I prefer to scale the distribution function so that my counts are sensical when I look at the histogram:
normalizedpdf = pdf/sum(pdf)*sum(myHist);
figure, bar(bins, myHist,1); hold on; plot(bins,normalizedpdf,'rx-'); hold off;
Your case is the same, except you'll use the function f(x) you specified instead of the normpdf command.
Let me add another example to the mix:
%# some normally distributed random data
data = randn(1e3,1);
%# histogram
numbins = 30;
hist(data, numbins);
h(1) = get(gca,'Children');
set(h(1), 'FaceColor',[.8 .8 1])
%# figure out how to scale the pdf (with area = 1), to the area of the histogram
[bincounts,binpos] = hist(data, numbins);
binwidth = binpos(2) - binpos(1);
histarea = binwidth*sum(bincounts);
%# fit a gaussian
[muhat,sigmahat] = normfit(data);
x = linspace(binpos(1),binpos(end),100);
y = normpdf(x, muhat, sigmahat);
h(2) = line(x, y*histarea, 'Color','b', 'LineWidth',2);
%# kernel estimator
[f,x,u] = ksdensity( data );
h(3) = line(x, f*histarea, 'Color','r', 'LineWidth',2);
legend(h, {'freq hist','fitted Gaussian','kernel estimator'})
Related
I want to create the 3D plot of the probability density functions of my variable. I have a matrix with dimensions 189x10000, where rows correspond to the time and columns are results of the simulation. Can somebody help me to create a density plot over time? I want my plot to look like this:
A = [1:185]'; % substitute for date vector
K = linspace( -20, 20, 100);
f = zeros(185,100);
xi = zeros(185,100);
r = normrnd(0,1,[185,10000]);
for i=1:185
[f(i,:),xi(i,:)] = ksdensity(r(I,:));
end
a = figure;
meshc(A, K', f')
datetick('x', 'yyyy')
view(85, 50)
set(gca, 'YLim', [-15, 10])
set(gca, 'XLim', [A(1), A(end)])
xlabel('Time')
With this code I get this:
Replace random numbers with density distribution.
If you want a finer grid, then use more points. Your actual data has 10-times as much, right? Otherwise this is as good as it gets; "improving" your plot, e.g. smooth over your data, is more data-doctoring than science.
Solution provided by Adriaan.
I have this 3D plot that I'm making but the data are not linear. This implies that on my plot, the distance between the ticks that I want to show is not equal. How can I adapt the scale of the x and y axis so that this is the case, i.e. so that the axis gets divided into equal parts with the current ticks?
I want the same ticks and tick labels, but that they just have an equal distance in between them on the axes, in stead of small between 0.1 and 0.5 and large between 1 and 5.
The current plot looks like this:
RMSEval = xlsread('RMSEvalues.xlsx');
X = RMSEval(:,1);
Y = RMSEval(:,2);
Z = RMSEval(:,3);
figure(1);
xi = linspace(min(X),max(X),30);
yi= linspace(min(Y),max(Y),30);
[XI,YI] = meshgrid(xi,yi);
ZI = griddata(X,Y,Z,XI,YI);
contourf(XI,YI,ZI);
colormap('jet');
xticks([1e-13 5e-13 1e-12 5e-12 1e-11]);
yticks([1e-18 5e-18 1e-17 5e-17 1e-16]);
colorbar;
A plot where the distance between 0.1 and 0.5 is the same as between 1 and 5 would be a log plot, specifically a log-log plot since you want it on both axes. One way to achieve this would be to transform the X and Y values of your data logarithmically and modify the tick mark labels to match the untransformed values rather than the logarithmic ones you're actually plotting.
A rough guess at a solution is below. I say a rough guess because without posting the data you're importing from the xlsx file or a paired down version of that data (as in an MWE) I can't actually test it.
RMSEval = xlsread('RMSEvalues.xlsx');
X = log(RMSEval(:,1));
Y = log(RMSEval(:,2));
Z = RMSEval(:,3);
figure(1);
xi = linspace(min(X),max(X),30);
yi= linspace(min(Y),max(Y),30);
[XI,YI] = meshgrid(xi,yi);
ZI = griddata(X,Y,Z,XI,YI);
contourf(XI,YI,ZI);
colormap('jet');
xticks(log([1e-13 5e-13 1e-12 5e-12 1e-11]));
xticklabels(cellfun(#num2str,num2cell(),'UniformOutput',false));
yticks(log([1e-18 5e-18 1e-17 5e-17 1e-16]));
yticklabels(cellfun(#num2str,num2cell([1e-18 5e-18 1e-17 5e-17 1e-16]),'UniformOutput',false));
colorbar;
I haven't used MATLAB in a while and I am stuck on a small detail. I would really appreciate it if someone could help me out!
So I am trying to plot a transfer function using a specific function called freqs but I can't figure out how I can label specific points on the graph.
b = [0 0 10.0455]; % Numerator coefficients
a = [(1/139344) (1/183.75) 1]; % Denominator coefficients
w = logspace(-3,5); % Frequency vector
freqs(b,a,w)
grid on
I want to mark values at points x=600 Hz and 7500 Hz with a marker or to be more specific, points (600,20) and (7500,-71), both of which should lie on the curve. For some reason, freqs doesn't let me do that.
freqs is very limited when you want to rely on it plotting the frequency response for you. Basically, you have no control on how to modify the graph on top of what MATLAB generates for you.
Instead, generate the output response in a vector yourself, then plot the magnitude and phase of the output yourself so that you have full control. If you specify an output when calling freqs, you will get the response of the system.
With this, you can find the magnitude of the output by abs and the phase by angle. BTW, (600,20) and (7500,-71) make absolutely no sense unless you're talking about magnitude in dB.... which I will assume is the case for the moment.
As such, we can reproduce the plot that freqs gives by the following. The key is to use semilogx to get a semi-logarithmic graph on the x-axis. On top of this, declare those points that you want to mark on the magnitude, so (600,20) and (7500,-71):
%// Your code:
b = [0 0 10.0455]; % Numerator coefficients
a = [(1/139344) (1/183.75) 1]; % Denominator coefficients
w = logspace(-3,5); % Frequency vector
%// New code
h = freqs(b,a,w); %// Output of freqs
mag = 20*log10(abs(h)); %// Magnitude in dB
pha = (180/pi)*angle(h); %// Phase in degrees
%// Declare points
wpt = [600, 7500];
mpt = [20, -71];
%// Plot the magnitude as well as markers
figure;
subplot(2,1,1);
semilogx(w, mag, wpt, mpt, 'r.');
xlabel('Frequency');
ylabel('Magnitude (dB)');
grid;
%// Plot phase
subplot(2,1,2);
semilogx(w, pha);
xlabel('Frequency');
ylabel('Phase (Degrees)');
grid;
We get this:
If you check what freqs generates for you, you'll see that we get the same thing, but the magnitude is in gain (V/V) instead of dB. If you want it in V/V, then just plot the magnitude without the 20*log10() call. Using your data, the markers I plotted are not on the graph (wpt and mpt), so adjust the points to whatever you see fit.
There are a couple issues before we attempt to answer your question. First, there is no data-point at 600Hz or 7500Hz. These frequencies fall between data-points when graphed using the freqs command. See the image below, with datatips added interactively. I copy-pasted your code to generate this data.
Second, it does not appear that either (600,20) or (7500,-71) lie on the curves, at least with the data as you entered above.
One solution is to use plot a marker on the desired position, and use a "text" object to add a string describing the point. I put together a script using your data, to generate this figure:
The code is as follows:
b = [0 0 10.0455];
a = [(1/139344) (1/183.75) 1];
w = logspace(-3,5);
freqs(b,a,w)
grid on
figureHandle = gcf;
figureChildren = get ( figureHandle , 'children' ); % The children this returns may vary.
axes1Handle = figureChildren(1);
axes2Handle = figureChildren(2);
axes1Children = get(axes1Handle,'children'); % This should be a "line" object.
axes2Children = get(axes2Handle,'children'); % This should be a "line" object.
axes1XData = get(axes1Children,'xdata');
axes1YData = get(axes1Children,'ydata');
axes2XData = get(axes2Children,'xdata');
axes2YData = get(axes2Children,'ydata');
hold(axes1Handle,'on');
plot(axes1Handle,axes1XData(40),axes1YData(40),'m*');
pointString1 = ['(',num2str(axes1XData(40)),',',num2str(axes1YData(40)),')'];
handleText1 = text(axes1XData(40),axes1YData(40),pointString1,'parent',axes1Handle);
hold(axes2Handle,'on');
plot(axes2Handle,axes2XData(40),axes2YData(40),'m*');
pointString2 = ['(',num2str(axes2XData(40)),',',num2str(axes2YData(40)),')'];
handleText2 = text(axes2XData(40),axes2YData(40),pointString2,'parent',axes2Handle);
I have a set of data with over 4000 points. I want to exclude grooves from them, ideally from the point from which they start. The data look for example like this:
The problem with this is the noise I get at the top of the plateaus. I have an idea, in which I would take an average value of the most common within some boundaries (again, ideally sth like the red line here:
and then I would construct a temporary matrix, which would fill up one by one with Y if they are less than this average. If the Y(i) would rise above average, the matrix would find its minima and compare it with the global minima. If the temporary matrix's minima wouldn't be sth like 80% of the global minima, it would be discarded as noise.
I've tried using mean(Y), interpolating and fitting it in a polynomial (the green line) - none of those method would cut it to the point I would be satisfied.
I need this to be extremely robust and it doesn't need to be quick. The top and bottom values can vary a lot, as well as the shape of the plateaus. The groove width is more or less the same.
Do you have any ideas? Again, the point is to extract the values that would make the groove.
How about a median filter?
Let's define some noisy data similar to yours, and plot it in blue:
x = .2*sin((0:9999)/1000); %// signal
x(1000:1099) = x(1000:1099) + sin((0:99)/50*pi); %// noise: spike
x(5000:5199) = x(5000:5199) - sin((0:199)/100*pi); %// noise: wider spike
x = x + .05*sin((0:9999)/10); %// noise: high-freq ripple
plot(x)
Now apply the median filter (using medfilt2 from the Image Processing Toolbox) and plot in red. The parameter k controls the filter memory. It should chosen to be large compared to noise variations, and small compared to signal variations:
k = 500; %// filter memory. Choose as needed
y = medfilt2(x,[1 k]);
hold on
plot(y, 'r', 'linewidth', 2)
In case you don't have the image processing toolbox and can't use medfilt2 a method that's more manual. Skip the extreme values, and do a curve fit with sin1 as curve type. Note that this will only work if the signal is in fact a sine wave!
x = linspace(0,3*pi,1000);
y1 = sin(x) + rand()*sin(100*x).*(mod(round(10*x),5)<3);
y2 = 20*(mod(round(5*x),5) == 0).*sin(20*x);
y = y1 + y2; %// A messy sine-wave
yy = y; %// Store the messy sine-wave
[~, idx] = sort(y);
y(idx(1:round(0.15*end))) = y(idx(round(0.15*end))); %// Flatten out the smallest values
y(idx(round(0.85*end):end)) = y(idx(round(0.85*end)));%// Flatten out the largest values
[foo goodness output] = fit(x.',y.', 'sin1'); %// Do a curve fit
plot(foo,x,y) %// Plot it
hold on
plot(x,yy,'black')
Might not be perfect, but it's a step in the right direction.
I have a simple loglog curve as above. Is there some function in Matlab which can fit this curve by segmented lines and show the starting and end points of these line segments ? I have checked the curve fitting toolbox in matlab. They seems to do curve fitting by either one line or some functions. I do not want to curve fitting by one line only.
If there is no direct function, any alternative to achieve the same goal is fine with me. My goal is to fit the curve by segmented lines and get locations of the end points of these segments .
First of all, your problem is not called curve fitting. Curve fitting is when you have data, and you find the best function that describes it, in some sense. You, on the other hand, want to create a piecewise linear approximation of your function.
I suggest the following strategy:
Split manually into sections. The section size should depend on the derivative, large derivative -> small section
Sample the function at the nodes between the sections
Find a linear interpolation that passes through the points mentioned above.
Here is an example of a code that does that. You can see that the red line (interpolation) is very close to the original function, despite the small amount of sections. This happens due to the adaptive section size.
function fitLogLog()
x = 2:1000;
y = log(log(x));
%# Find section sizes, by using an inverse of the approximation of the derivative
numOfSections = 20;
indexes = round(linspace(1,numel(y),numOfSections));
derivativeApprox = diff(y(indexes));
inverseDerivative = 1./derivativeApprox;
weightOfSection = inverseDerivative/sum(inverseDerivative);
totalRange = max(x(:))-min(x(:));
sectionSize = weightOfSection.* totalRange;
%# The relevant nodes
xNodes = x(1) + [ 0 cumsum(sectionSize)];
yNodes = log(log(xNodes));
figure;plot(x,y);
hold on;
plot (xNodes,yNodes,'r');
scatter (xNodes,yNodes,'r');
legend('log(log(x))','adaptive linear interpolation');
end
Andrey's adaptive solution provides a more accurate overall fit. If what you want is segments of a fixed length, however, then here is something that should work, using a method that also returns a complete set of all the fitted values. Could be vectorized if speed is needed.
Nsamp = 1000; %number of data samples on x-axis
x = [1:Nsamp]; %this is your x-axis
Nlines = 5; %number of lines to fit
fx = exp(-10*x/Nsamp); %generate something like your current data, f(x)
gx = NaN(size(fx)); %this will hold your fitted lines, g(x)
joins = round(linspace(1, Nsamp, Nlines+1)); %define equally spaced breaks along the x-axis
dx = diff(x(joins)); %x-change
df = diff(fx(joins)); %f(x)-change
m = df./dx; %gradient for each section
for i = 1:Nlines
x1 = joins(i); %start point
x2 = joins(i+1); %end point
gx(x1:x2) = fx(x1) + m(i)*(0:dx(i)); %compute line segment
end
subplot(2,1,1)
h(1,:) = plot(x, fx, 'b', x, gx, 'k', joins, gx(joins), 'ro');
title('Normal Plot')
subplot(2,1,2)
h(2,:) = loglog(x, fx, 'b', x, gx, 'k', joins, gx(joins), 'ro');
title('Log Log Plot')
for ip = 1:2
subplot(2,1,ip)
set(h(ip,:), 'LineWidth', 2)
legend('Data', 'Piecewise Linear', 'Location', 'NorthEastOutside')
legend boxoff
end
This is not an exact answer to this question, but since I arrived here based on a search, I'd like to answer the related question of how to create (not fit) a piecewise linear function that is intended to represent the mean (or median, or some other other function) of interval data in a scatter plot.
First, a related but more sophisticated alternative using regression, which apparently has some MATLAB code listed on the wikipedia page, is Multivariate adaptive regression splines.
The solution here is to just calculate the mean on overlapping intervals to get points
function [x, y] = intervalAggregate(Xdata, Ydata, aggFun, intStep, intOverlap)
% intOverlap in [0, 1); 0 for no overlap of intervals, etc.
% intStep this is the size of the interval being aggregated.
minX = min(Xdata);
maxX = max(Xdata);
minY = min(Ydata);
maxY = max(Ydata);
intInc = intOverlap*intStep; %How far we advance each iteraction.
if intOverlap <= 0
intInc = intStep;
end
nInt = ceil((maxX-minX)/intInc); %Number of aggregations
parfor i = 1:nInt
xStart = minX + (i-1)*intInc;
xEnd = xStart + intStep;
intervalIndices = find((Xdata >= xStart) & (Xdata <= xEnd));
x(i) = aggFun(Xdata(intervalIndices));
y(i) = aggFun(Ydata(intervalIndices));
end
For instance, to calculate the mean over some paired X and Y data I had handy with intervals of length 0.1 having roughly 1/3 overlap with each other (see scatter image):
[x,y] = intervalAggregate(Xdat, Ydat, #mean, 0.1, 0.333)
x =
Columns 1 through 8
0.0552 0.0868 0.1170 0.1475 0.1844 0.2173 0.2498 0.2834
Columns 9 through 15
0.3182 0.3561 0.3875 0.4178 0.4494 0.4671 0.4822
y =
Columns 1 through 8
0.9992 0.9983 0.9971 0.9955 0.9927 0.9905 0.9876 0.9846
Columns 9 through 15
0.9803 0.9750 0.9707 0.9653 0.9598 0.9560 0.9537
We see that as x increases, y tends to decrease slightly. From there, it is easy enough to draw line segments and/or perform some other kind of smoothing.
(Note that I did not attempt to vectorize this solution; a much faster version could be assumed if Xdata is sorted.)