findpeaks coordinates not matching X-axis coordinates (MATLAB)

I have a piece of code that is not performing as intended (at least in my view) and was hoping someone could help clarify the issue.
The code plots a histogram of my data, then applies ksdensity to smooth the data, and finally runs findpeaks to return the maximum values of the plotted curve. However, the coordinates returned for the horizontal axis do not correspond to the plotted data.
MB(A); %array with the data to be plotted
figure;
histogram(MB(A),25)
[f,xi] = ksdensity(MB(A), 'Bandwidth',10);
figure;
plot(xi,f);
[peaks,loc] = findpeaks(f)
The results from this code are:
peaks =
0.0232 0.0017
loc =
27 76
However, when I look at the plots, the horizontal coordinates of the peaks are very different from these values.
[Figures: histogram; smoothed data]
I originally thought this might be a problem of over- or underfitting, but after playing around with the values a little, the issue remained. Am I missing some basic concept? Any help would be greatly appreciated. Many thanks.

The loc output is the index of each peak in f, not its x-coordinate, so you get the x-values on the graph with:
xi(loc)
See the MATLAB help for the returned variables:
[PKS,LOCS] = findpeaks(Y) also returns the indices LOCS at which the peaks occur.
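To make the index-versus-coordinate distinction concrete, here is a small sketch on synthetic data (the two-bump curve f and the grid xi are made up, standing in for the ksdensity output). It finds interior local maxima with plain array comparisons, a simplified stand-in for findpeaks that needs no toolbox, and then converts the indices to x-values with xi(loc). With the Signal Processing Toolbox, findpeaks(f, xi) also accepts the sample positions directly and returns the locations in x units.

```matlab
% Synthetic stand-in for the ksdensity output: 100 points on a non-unit grid
xi = linspace(-50, 150, 100);
f  = exp(-xi.^2 / 200) + 0.1 * exp(-(xi - 100).^2 / 200);

% Interior local maxima: greater than the left neighbour and at least as
% large as the right one (a simplified stand-in for findpeaks)
inner = f(2:end-1);
loc = find(inner > f(1:end-2) & inner >= f(3:end)) + 1;   % indices into f
pks = f(loc);                                             % peak heights

% loc holds sample indices; the horizontal-axis positions are xi(loc)
peakX = xi(loc);
disp([loc; peakX])
```

The first row of the display is small integers (indices), the second row is close to 0 and 100, which is where the bumps actually are on the plot.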

Related

Find volume of 3d peaks in matlab

Right now I have a 3D scatter plot with peaks whose volumes I need to find. My data come from an image, so the x- and y-values indicate the pixel positions on the xy-plane, and the z-value is the value of each pixel.
Here's my scatter plot:
scatter3(x,y,z,20,z,'filled')
I am trying to find the "volume" of the peaks of the data, as drawn below:
I've tried findpeaks(), but it gives me many local maxima instead of the two prominent peaks I'm looking for. In addition, I'm really stuck on how to establish the "base" of my peaks, because my data is a scatter plot. I've also tried the convex hull and a linear surface fit, and got this:
But I'm still stuck on how to use any of these commands to establish an automated peak "base" and volume. Please let me know if you have any ideas or code segments to help me out, because I am stumped and I can't find anything on Stack Overflow. Sorry in advance if this is really unclear! Thank you so much!
Here is a suggestion for dealing with this problem:
1. Define a threshold for the z height, or define in some other way which points of the scatter are relevant (the black plane in the leftmost figure below).
2. Within the resulting points, find clusters on the X-Y plane to define the different regions to measure. You will have to set the number of clusters manually.
3. For each cluster, perform a Delaunay triangulation and use its convex hull to estimate the volume.
Here is example code for all of that:
[x,y,z] = peaks(30); % some example data
subplot 131
scatter3(x(:),y(:),z(:),[],z(:),'filled')
title('The original data')
th = 2.5; % threshold for z values
hold on
% draw the threshold as a semi-transparent black plane
surf([-3 -3 3 3],[-4 4 -4 4],ones(4)*th,'FaceColor','k',...
    'FaceAlpha',0.5)
hold off
ind = z>th; % logical index of all values of interest
X = x(ind);
Y = y(ind);
Z = z(ind);
clustNum = 3; % the number of clusters must be defined manually
T = clusterdata([X Y],clustNum); % cluster the points on the X-Y plane
subplot 132
gscatter(X,Y,T)
title('A look from above')
subplot 133
hold on
c = 'rgb'; % one colour per cluster (enough for 3 clusters)
for k = 1:max(T)
    valid = T==k;
    % calculate a triangulation of the data:
    DT = delaunayTriangulation([X(valid) Y(valid) Z(valid)]);
    [K,v] = convexHull(DT); % hull facet indices and enclosed volume v
    % plot the volume:
    trisurf(K,DT.Points(:,1),DT.Points(:,2),DT.Points(:,3),...
        'FaceColor',c(k));
    text(mean(X(valid)),mean(Y(valid)),max(Z(valid))*1.3,...
        num2str(v),'FontSize',12)
end
hold off
view([-45 40])
title('The volumes')
Note: this code uses functions from several toolboxes (for example, clusterdata and gscatter come from the Statistics and Machine Learning Toolbox). If something does not work, first make sure you have the relevant toolbox; there are alternatives to most of these functions.
Since you already have a mesh, maybe you could use the process described in https://se.mathworks.com/matlabcentral/answers/277512-how-to-find-peaks-in-3d-mesh .
If not, a linear regression in the (x,z) or (y,z) plane could give you a base from which to find the peaks.
In my experience with noisy data, selecting the peaks manually is often faster if you have a small data set to analyze. Just plot every peak with its number from findpeaks() and select the ones that are relevant to you. Interpolating to smoother data can help solve the problem in the long term (but creates problems of its own).
Another option is to search for peaks in the (x,z) and (y,z) planes, determine the amplitude of each peak over an x (or y) interval, and from there integrate over each area.
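A minimal sketch of the "linear base, then integrate" idea in base MATLAB: fit a plane under the data by least squares and sum the height above it. Two assumptions here are mine, not the answerer's: the image border contains background only, so the plane is fitted to the border pixels, and the two peaks can be separated by a hypothetical column split at x = 30. The synthetic image (tilted plane plus two Gaussian bumps) stands in for real data.

```matlab
% Synthetic stand-in for the image: tilted background plus two Gaussian peaks
[x, y] = meshgrid(1:60, 1:40);                 % pixel positions, 1-pixel spacing
z = 5 + 0.05*x + 0.02*y ...                    % tilted planar background
    + 10*exp(-((x-15).^2 + (y-20).^2)/20) ...  % left peak
    +  6*exp(-((x-45).^2 + (y-20).^2)/20);     % right peak

% Assumption: the border holds background only, so fit the base plane
% z = a + b*x + c*y to the border pixels by linear least squares
border = false(size(z));
border([1 end], :) = true;
border(:, [1 end]) = true;
A = [ones(nnz(border), 1), x(border), y(border)];
coef = A \ z(border);
baseline = coef(1) + coef(2)*x + coef(3)*y;

% Height above the base, with negative residuals clipped to zero
h = max(z - baseline, 0);

% Hypothetical split between the peaks at column 30; each pixel adds
% height * pixel area (1 here) to the volume
vol1 = sum(h(x <= 30));
vol2 = sum(h(x > 30));
```

The bumps have analytic volumes 200*pi and 120*pi, which vol1 and vol2 should reproduce closely. On real data, a threshold on h or a proper segmentation would replace the fixed column split.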

matlab scatter3 plot real and imaginary parts over frequency

I've got two arrays, ttre and ttim, which contain real and imaginary data over frequency (from 1 to 64). Their sizes look like this:
ttim 64x10100 single
ttre 64x10100 single
I can easily make a 2D scatter plot of a certain row by using the command
scatter(ttim(40,:),ttre(40,:))
Now, I would like to display all data in a 3D scatter plot where X=real values, Y=imaginary values and Z=[1...64]
I created an array for Z with the numbers 1 to 64 and replicated it to make it the same size as the other variables:
z=(1:64)'
z=repmat(z,1,10100)
result:
z 64x10100 double
When I try to make the 3D scatter plot now, I get the error "X, Y and Z must be vectors of the same length". However, as far as I understand, they are the same size.
>> scatter3(ttim,ttre,z)
Error using scatter3 (line 64)
X, Y and Z must be vectors of the same length.
I hope that someone could point me into the right direction here.
Kind regards
scatter3 needs a list of points to plot, so x, y and z should be vectors of length N, where N is the number of points you are plotting; your inputs are 64x10100 matrices. I don't know what your data represent, so unfortunately I cannot help more. Maybe scatter3(ttim(:),ttre(:),z(:)) works, but I do not recommend it for the amount of data you have (64 × 10100 = 646,400 points); it may be very slow or even crash your computer.
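A small sketch of the (:) approach, with made-up 4x5 matrices in place of the 64x10100 ones. The point is that (:) flattens every matrix in the same column-major order, so the k-th real value, imaginary value and frequency index still belong together. The thinning step is a hypothetical choice for coping with the much larger real data.

```matlab
% Small stand-ins for ttre/ttim (the real ones are 64x10100 single);
% these sizes are illustrative only
nFreq = 4;
nSamp = 5;
ttre = randn(nFreq, nSamp, 'single');
ttim = randn(nFreq, nSamp, 'single');
z = repmat((1:nFreq)', 1, nSamp);     % frequency index for every element

% (:) flattens column by column, so element k of each vector comes from
% the same (row, column) position in the original matrices
xv = ttre(:);
yv = ttim(:);
zv = z(:);

% Optional thinning for large data: keep every step-th point (hypothetical step)
step = 2;
keep = 1:step:numel(xv);
```

You could then plot with, for example, scatter3(xv(keep), yv(keep), zv(keep), 1, zv(keep), '.'), which puts the real part on X, the imaginary part on Y and the frequency index on Z.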
However, z = 1:64 may not be the best choice: it means you will have 64 layers of scattered data (like the floors of a building), and I'm not sure that's what you want.

matlab: cdfplot of relative error

The figure shown above is a cumulative distribution function (CDF) plot of the relative error; the code used to generate it is below. The relative error is defined as abs(measured-predicted)/(measured). What could be wrong, or how should I interpret it? I expected the plot to be a smooth curve.
X = load('measured.txt');
Xhat = load('predicted.txt');
idx = find(X>0);
x = X(idx);
xhat = Xhat(idx);
relativeError = abs(x-xhat)./(x);
cdfplot(relativeError);
The input data file is a 4x4 matrix with zeros on the diagonal and some unmeasured entries (represented by 0). I appreciate your help. Thanks!
The plot should be discontinuous because you are using discrete data. You are not plotting an analytic function that maps, explicitly or implicitly, x to y. Instead, all you have is a handful of points: at most 12, since the 4x4 matrix has 16 entries and the 4 zeros on the diagonal are removed by idx = find(X>0).
The empirical CDF only "grows" when a new sample is counted; otherwise it stays flat, because there is no further sample that could increase the "frequency".
You can check the example in MathWorks' cdfplot documentation to understand the concept of an "empirical CDF". Again, the CDF can only increase where you observe a sample.
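To see the stairstep behaviour without any toolbox, you can build the empirical CDF by hand: sort the samples and, at each distinct value, take the fraction of samples at or below it. A minimal sketch with five made-up error values (one of them tied):

```matlab
% Hypothetical relative errors (the question has at most 12 values)
relativeError = [0.10 0.30 0.20 0.30 0.05];
n = numel(relativeError);
s = sort(relativeError(:)).';     % sorted sample values, as a row
[u, ia] = unique(s, 'last');      % distinct values and last position of each in s
F = ia(:).' / n;                  % empirical CDF value at each distinct sample
```

stairs([u(1) u], [0 F]) then draws the step function: it is flat between samples and jumps by 1/n at each sample (2/n at the tied value 0.30), which is the same shape cdfplot draws.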
If you really want a smooth curve, either 1) add more points so that the stepped line looks smoother, or 2) fit a statistical model to whatever you are working on and plot its analytic CDF instead.
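Option 2) could look like the sketch below. The exponential model is purely an assumption for illustration, so check whether it suits your errors before relying on it: its maximum-likelihood mean is the sample mean, and its analytic CDF 1 - exp(-t/mu) gives a smooth curve on any grid you like.

```matlab
% Hypothetical relative errors (stand-in for the values in the question)
relativeError = [0.05 0.10 0.20 0.30 0.35];

% Assumed model: exponential distribution; the maximum-likelihood
% estimate of its mean parameter is the sample mean
mu = mean(relativeError);

% Analytic CDF of the fitted model, evaluated on a fine grid
t = linspace(0, max(relativeError), 200);
Fmodel = 1 - exp(-t / mu);
```

plot(t, Fmodel) gives the smooth curve; overlaying cdfplot(relativeError) on the same axes is a quick visual check of the fit.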

Interpolating irregularly spaced 3D matrix in matlab

I have a time series of temperature profiles that I want to interpolate, and my data are irregularly spaced.
Here are the specifics of the matrix:
The temperature is 30x365
The time is 1x365
Depth is 30x1
Both time and depth are irregularly spaced. How can I interpolate the data onto a regular grid?
I have looked at interp2 and TriScatteredInterp in MATLAB, but I ran into the following problems:
interp2 works only if the data are on a regular grid.
TriScatteredInterp works only with column vectors. Although time and depth are both column vectors, temperature is not.
Thanks.
interp2 does not require a regularly spaced measurement grid at all; it only requires a monotonic one. That is, the sampling positions stored in the vectors depths and times must increase (or decrease), and that's all.
Assuming this is indeed the situation* and that you want to interpolate at the regular positions** stored in the vectors rdepths and rtimes, you can do:
[JT, JD] = meshgrid(times, depths); %% The irregular measurement grid
[RT, RD] = meshgrid(rtimes, rdepths); %% The regular interpolation grid
TemperaturesOnRegularGrid = interp2(JT, JD, TemperaturesOnIrregularGrid, RT, RD);
*: If not, you can sort the rows and columns to get back to a monotonic grid.
**: In fact, interp2 places no restriction on the output grid (it can be irregular or even non-monotonic).
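The following sketch checks this on a tiny made-up grid (all values hypothetical): depths and times are unevenly spaced, the times arrive unsorted and are sorted first as in footnote *, and the temperatures are a linear function of depth and time, so linear interpolation should reproduce that function exactly on the regular grid.

```matlab
% Hypothetical measurements: depths increase, but times arrive out of order
depths = [0; 2; 5; 11; 20];            % 5x1, unevenly spaced
times  = [4 1 2 12 7 8];               % 1x6, unevenly spaced and unsorted
[JT, JD] = meshgrid(times, depths);    % 5x6 measurement grid
% Synthetic temperatures, linear in time and depth
TemperaturesOnIrregularGrid = 20 - 0.5*JD + 0.1*JT;

% Footnote (*): sort the columns so the grid becomes monotonic
[times, order] = sort(times);
TemperaturesOnIrregularGrid = TemperaturesOnIrregularGrid(:, order);
[JT, JD] = meshgrid(times, depths);

% Regular interpolation grid (inside the measured range)
rtimes  = 1:12;
rdepths = (0:5:20)';
[RT, RD] = meshgrid(rtimes, rdepths);
TemperaturesOnRegularGrid = interp2(JT, JD, TemperaturesOnIrregularGrid, RT, RD);
```

Targets outside the measured range come back as NaN by default; interp2's extrapolation argument controls that.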
I would fit a spline or polynomial to your data and then resample at regular intervals. I highly recommend the polyfitn function; in fact, anything by John D'Errico is excellent. I have used it in the past on irregularly spaced 3D data, and it worked reasonably well. If your data set has good support, which I suspect it does, this should be easy. Hope this helps!
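polyfitn itself is a File Exchange submission, so as a toolbox-free stand-in (not polyfitn's API), here is the same fit-then-resample idea with a full quadratic surface fitted by least squares using backslash. The scattered samples are hypothetical and generated from a known quadratic, so the resampled grid can be checked against it.

```matlab
% Hypothetical scattered samples (stand-ins for depth, time, temperature)
depth = [0 1 3 4 7 9 12 15 18 20 2 6]';
time  = [1 5 2 9 3 7 11 4 8 12 10 6]';
% Synthetic temperatures from a known quadratic so the fit can be checked
temp = 15 - 0.3*depth + 0.2*time + 0.01*depth.*time - 0.005*time.^2;

% Full quadratic surface in (depth, time), fitted by linear least squares
design = @(d, t) [ones(numel(d),1), d(:), t(:), d(:).^2, d(:).*t(:), t(:).^2];
coef = design(depth, time) \ temp;

% Resample the fitted surface on a regular grid
[RT, RD] = meshgrid(1:12, 0:5:20);
TempRegular = reshape(design(RD, RT) * coef, size(RD));
```

On real profiles the polynomial order is a judgment call: too low smooths away structure, too high oscillates between samples.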
Try the GridFit tool by John D'Errico on MATLAB Central. To use it, pass in your two independent variables (time and depth), the dependent data (temperature), and the regularly spaced X and Y nodes to use. gridfit expects scattered data as vectors of equal length, so first expand the time/depth grid to match the 30x365 temperature matrix. By default, the tool also applies smoothing, which copes with overlapping (or nearly overlapping) data points. If this is not desired, you can override this (and other behaviour) through a wide range of configuration options. Example code:
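% Establish regularly spaced points
num_points = 20;
time_pts = linspace(min(time),max(time),num_points);
depth_pts = linspace(min(depth),max(depth),num_points);
% Expand the 30x365 sampling grid so time, depth and temp line up as vectors
[T, D] = meshgrid(time, depth);
% Run interpolation (with smoothing); Pest is numel(depth_pts)-by-numel(time_pts)
Pest = gridfit(T(:), D(:), temp(:), time_pts, depth_pts);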

Matlab cdfplot: how to control the marker spacing

I have a Matlab figure I want to use in a paper. This figure contains multiple cdfplots.
The problem is that I cannot use markers, because they become very dense in the plot.
If I want sparser markers, I have to drop some samples from the cdfplot, which results in a different cdfplot line.
How can I add enough markers while maintaining the actual line?
One method is to get the XData/YData properties from your curves, apply solution (1) from @ephsmith (keep every nth sample), and set them back. Here is an example for one curve.
y = evrnd(0,3,100,1); % random data
% original data
subplot(1,2,1)
h = cdfplot(y);
set(h,'Marker','*','MarkerSize',8,'MarkerEdgeColor','r','LineStyle','none')
% reduced data
subplot(1,2,2)
h = cdfplot(y);
set(h,'Marker','*','MarkerSize',8,'MarkerEdgeColor','r','LineStyle','none')
xdata = get(h,'XData');
ydata = get(h,'YData');
% set both in one call so XData and YData never differ in length
set(h,'XData',xdata(1:5:end),'YData',ydata(1:5:end));
Another method is to calculate the empirical CDF separately with the ecdf function, then reduce the results before plotting with plot.
y = evrnd(0,3,100,1); % random data
[f, x] = ecdf(y);
% original data
subplot(1,2,1)
plot(x,f,'*')
% reduced data
subplot(1,2,2)
plot(x(1:5:end),f(1:5:end),'r*')
I know this is potentially unnecessary given MATLAB's built-in functions (in the Statistics Toolbox, anyway), but it may be of use to other readers who do not have access to the toolbox.
The empirical CMF (CDF) is essentially the cumulative sum of the empirical PMF. The latter is attainable in MATLAB via the hist function. In order to get a nice approximation to the empirical PMF, the number of bins must be selected appropriately. In the following example, I assume that 64 bins is good enough for your data.
% compute a histogram with 64 bins for the data points stored in y
[f,x]=hist(y,64);
% convert the frequency points in f to proportions
f = f./sum(f);
% compute the cumulative sum of the empirical PMF
cmf = cumsum(f);
Now you can choose how many points you'd like to plot by using the reduced data example given by yuk.
n = 20; % total number of markers on the curve
M_n = round(linspace(1,numel(y),n)); % indices of the marker points
% plot the whole line, plus markers at the selected data points
plot(x,y,'b-',x(M_n),y(M_n),'rs')
Very simple.
Try reducing the marker size.
x = rand(10000,1);
y = x + rand(10000,1);
plot(x,y,'b.','markersize',1);
For publishing purposes I tend to use the plot tools on the figure window. This allow you to tweak all of the plot parameters and immediately see the result.
If the problem is that you have too many data points, you can:
1. Plot using every nth sample of the data. Experiment to find an n that gives the look you want.
2. I typically fit curves to my data and add a few sparsely placed markers to the plots of the fits to differentiate the curves.
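One way to get sparse markers without touching the line, assuming the markers only need to identify each curve, is to keep the full stairstep line and place a few markers at evenly spaced x positions, evaluated on the same empirical CDF so they sit exactly on the line. Spacing them in x rather than by sample index avoids bunching where the data are dense. The data and marker count below are hypothetical.

```matlab
% Hypothetical data, dense near zero, so index-based marker picking would bunch up
y = randn(1000, 1).^3;
n = numel(y);

% Full-resolution empirical CDF (the line itself is left untouched)
ys = sort(y);
Fs = (1:n)' / n;

% Markers at evenly spaced x positions, evaluated on the same empirical CDF,
% so every marker lies exactly on the stairstep line
nMarkers = 15;                                  % hypothetical choice
xm = linspace(ys(1), ys(end), nMarkers);
fm = arrayfun(@(v) sum(y <= v), xm) / n;        % fraction of samples <= xm
```

To draw it: stairs(ys, Fs, 'b-'), then hold on and plot(xm, fm, 'rs'). Because fm is the empirical CDF evaluated at xm, each marker lands on a step of the line.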
Honestly, for publishing purposes I have always found that choosing different 'LineStyle' or 'LineWidth' properties for the lines gives much cleaner results than using different markers. This would also be a lot easier than trying to downsample your data, and for plots made with CDFPLOT I find that markers simply occlude the stairstep nature of the lines.