2D Kernel Density Estimate in Matlab - matlab

I am using this function to estimate kernel density in 2D. I am slightly confused by the parameters of this function however.
Here is an example, viewed from directly above, where density is being calculated at each point (O) in the figure. i.e: over very small areas.
I want to change the KDE function parameters so that density is computed over a larger area (for example, the area circled in red). Which parameters do I need to change? I presume it is one (or both) of these:
"n: size of the n by n grid over which the density is computed (default 2^8)"
OR:
"MIN_XY, MAX_XY: limits of the bounding box over which the density is computed". The default limits are computed as:
MAX = max(data,[],1);
MIN = min(data,[],1);
Range = MAX-MIN;
MAX_XY = MAX+Range/4;
MIN_XY = MIN-Range/4;

I have run some tests with this function and the solution is to use lower values of n. Here is a series of comparison figures, using the same dataset. The value of n is shown in the title (all other parameters are kept constant):

Related

Poisson fit curve over histogram plot

I 'd like to fit my empirical data to a poisson distribution curve.
I have the mean given value, say 2.3, and data (empirical).
def fit_poisson(data=None,network=None,mu=2.3):
sns.set_theme()
fig, ax = plt.subplots(1, 1)
x = np.arange(poisson.ppf(0.01, mu),
poisson.ppf(0.99, mu))
sns.histplot(data, stat='density')
plt.plot(x, poisson.pmf(x, mu))
It plots:
Apparently, there's is a range issue in y, here. Maybe a problem with lambda? How do I properly fit my empirical histogram to a poisson distribution curve of same mean?
Poisson random variables are discrete: their y value is "probability" not "density". But the default behavior of histplot avoids guessing that you have discrete data, and it is choosing bins with binwidth < 1 in this case.
Because density normalization forces the area of all bars to sum to 1, that means the density value for the bar containing observations of a certain value will be greater than the probability mass on that value.
There are two relevant parameters here:
stat="probability" will make the heights of the bars sum to 1, so they will match the PMF (assuming binwidth < 2, so that only one unique value appears in each bar)
discrete=True, which sets binwidth=1 (and aligns the center of each bar with integral values)
sns.histplot(data, stat='probability', discrete=True, shrink=.8)
I've also added shrink=0.8, which draws the bars a bit narrower than the binwidth; this helps emphasize the discrete nature of the data.
(Note that with discrete=True (implying binwidth=1), density and probability normalization will do the same thing so that's actually all you need, but Probability is the right y axis label to use here).

How to compute histogram using three variables in MATLAB?

I have three variables, e.g., latitude, longitude and temperature. For each latitude and longitude, I have corresponding temperature value. I want to plot latitude v/s longitude plot in 5 degree x 5 degree grid , with mean temperature value inserted in that particular grid instead of occurring frequency.
Data= [latGrid,lonGrid] = meshgrid(25:45,125:145);
T = table(latGrid(:),lonGrid(:),randi([0,35],size(latGrid(:))),...
'VariableNames',{'lat','lon','temp'});
At the end, I need it somewhat like the following image:
Sounds to me like you want to scale your grid. The easiest way to do this is to smooth and downsample.
While 2d histograms also bin values into a grid, using a histogram is not the way to find the mean of datapoints in a smooth grid. A histogram counts the occurrence of values in a set of ranges. In a 2d example, a histogram would take the input measurements [1, 3, 3, 5] and count the number of ones, the number of threes, etc. A 2d histogram will count occurrences of pairs of numbers. (You might want to use histogram to help organize a measurements taken at irregular intervals, but that would be a different question)
How to smooth and downsample without the Image Processing Toolbox
Keep your data in the 2d matrix format rather than reshaping it into a table. This makes it easier to find the neighbors of each grid location.
%% Sample Data
[latGrid,lonGrid] = meshgrid(25:45,125:145);
temp = rand(size(latGrid));
There are many tools in Matlab for smoothing matrices. If you want to have the mean of a 5x5 window. You can write a for-loop, use a convolution, or use filter2. My example uses convolution. For more on convolutional filters, I suggest the wikipedia page.
%% Mean filter with conv2
M = ones(5) ./ 25; % 5x5 mean or box blur filter
C_temp = conv2(temp, M, 'valid');
C_temp is a blurry version of the original temperature variable with a slightly smaller size because we can't accurately take the mean of the edges. The border is reduced by a frame of 2 measurements. Now, we just need to take every fifth measurement from C_temp to scale down the grid.
%% Subsample result
C_temp = C_temp(1:5:end, 1:5:end);
% Because we removed a border from C_temp, we also need to remove a border from latGrid and lonGrid
[h, w] = size(latGrid)
latGrid = latGrid(5:5:h-5, 5:5:w-5);
lonGrid = lonGrid(5:5:h-5, 5:5,w-5);
Here's what the steps look like
If you use a slightly more organized, temp variable. It's easier to see that the result is correct.
With Image Processing Toolbox
imresize has a box filter method option that is equivalent to a mean filter. However, you have to do a little calculation to find the scaling factor that is equivalent to using a 5x5 window.
C_temp = imresize(temp, scale, 'box');

Canny edge detector with a single threshold value

I'm trying to port some Matlab code to C++.
I've come across this line:
edges = edge(gray,'canny',0.1);
The output for the sample image is a completely black image. I want to reproduce the same behaviour using cv::Canny. What values should I use for low threshold and high threshold?
Sample:
Output:
In the line above you have not defined a threshold, probably it takes zero then, thus delivering a black picture. Also, you use a sigma of 0.1 which means virtually no Gauss Blur in the first Canny step. Within Matlab you can get an optimized threhold by:
[~, th] = edge(gray,'canny');
and then apply the optimized threshold th multiplied by some factor f (from my experience f should be between 1-3), you have to try out:
edges=edge(gray,'canny',f*th,'both', sigma);
sigma is sqrt(2) by default (you used 0.1 above). Following remarks:
Matlab calculated the optimized threshold as a percentile of the distribution of intensity gradients (you can see the construction of edge() if you enter "edit edge", if I remember correctly)
the above parameter th is a vector consisting of the low and high threshold. Matlab always uses low_threshold = 0.4* high_threshold

How to get FFT units for spatial 1D signal?

Note : Answers with dimensions (cy/mm) instead of (Hz) would be greatly appreciated!
I have looked into unit of fft(DFT) x axis and units on x axis after FFT , but I couldn't find exactly what I'm looking for. So, I decided to write my own question.
I have an object in the real world that has a width of 3mm that scans a bigger object of width 310mm. I can model this as a linear system with a convolution process with a rectangular function of width 3mm as follows:
g(x) = f(x) * rect(x/3mm)
where f(x) is the 310mm object, and g(x) is the output of the scanned object. In MATLAB, I'm trying to simulate this effect, so I imported the 310mm object as a high resolution array with each point in the array corresponds to 1mm.
Now, to model the rect(x/3mm) I'm thinking of two way to do it, and I'm not sure which one is correct.
1- Is to say that rect(x/3mm) is equal to 3 points of the high resolution array, then can be created as a kernel of zeros of the same size as the high resolution object with only 3 point of 1's. Now the actual spacing between the kernel point = 1mm. How can you show the proper FFT scale of this rect(x/3mm) model?
2- I can define in MATLAB
d = 3; % in mm
value = 2;
spacing = 0.01;
x = -value: spacing : value - spacing
y = rect(x/d);
the second way will provide me with a rect(x/3mm) that can be coarsely of finely created depending on spacing or value, but the first way was only 3 points. How can you show the proper FFT scale of this rect(x/3mm) model?
Which of the two methods is the correct way of doing it?

Define the width of peak in Matlab

I'm trying to find some peaks in Matlab, but the function findpeaks.m doesn't have the width option. The peaks I want to be detected are in the balls. All the detected are in the red squares. As you can see they have a low width. Any help?
here's the code I use:
[pk,lo] = findpeaks(ecg);
lo2 = zeros(size(lo));
for m = 1:length(lo) - 1
if (ecg(m) - ecg(m+1)) > 0.025
lo2(m) = lo(m);
end
end
p = find(lo2 == 0);
lo2(p) = [];
figure, plot(ecg);
hold on
plot(lo, ecg(lo), 'rs');
By the looks of it you want to characterise each peak in terms of amplitude and width, so that you can apply thresholds (or simmilar) to these values to select only those meeting your criteria (tall and thin).
One way you could do this is to fit a normal distribution to each peak, pegging the mean and amplitude to the value you have found already, and using an optimisation function to find the standard deviation (width of normal distribution).
So, you would need a function which calculates a representation of your data based on the sum of all the gaussian distributions you have, and an error function (mean squared error perhaps) then you just need to throw this into one of matlabs inbuilt optimisation/minimisation functions.
The optimal set of standard deviation parameters would give you the widths of each peak, or at least a good approximation.
Another method, based on Adiel's comment and which is perhaps more appropriate since it looks like you are working on ecg data, would be to also find the local minima (troughs) as well as the peaks. From this you could construct an approximate measure of 'thinness' by taking the x-axis distance between the troughs on either side of a given peak.
You need to define a peak width first, determine how narrow you want your peaks to be and then select them accordingly.
For instance, you can define the width of a peak as the difference between the x-coordinates at which the y-coordinates equal to half of the peak's value (see here). Another approach, (which seems more appropriate here) is to measure the gradient at fixed distances from the peak itself, and selecting the peaks accordingly. In MATLAB, you'll probably use a gradient filter for that :
g = conv(ecg, [-1 0 1], 'same'); %// Gradient filter
idx = g(lo) > thr); %// Indices of narrow peaks
lo = lo(idx);
where thr is the threshold value that you need to determine for yourself. Lower threshold values mean more tolerance for wider peaks.
You need to define what it means to be a peak of interest, and what you mean by the width of that peak. Once you do those things, you are a step ahead.
Perhaps you might locate each peak using find peaks. Then locate the troughs, one of which should lie between each pair of peaks. A trough is simply a peak of -y. Make sure you worry about the first and last peaks/troughs.
Next, define the half height points as the location midway in height between each peak and trough. This can be done using a reverse linear interpolation on the curve.
Finally, the width at half height might be simply the distance (on the x axis) between those two half height points.
Thinking pragmatically, I suppose you could use something along the lines of this simple brute-force approach:
[peaks , peakLocations] = findpeaks(+X);
[troughs, troughLocations] = findpeaks(-X);
width = zeros(size(peaks));
for ii = 1:numel(peaks)
trough_before = troughLocations( ...
find(troughLocations < peakLocations(ii), 1,'last') );
trough_after = troughLocations( ...
find(troughLocations > peakLocations(ii), 1,'first') );
width(ii) = trough_after - trough_before;
end
This will find the distance between the two troughs surrounding a peak of interest.
Use the 'MinPeakHeight' option in findpeaks() to pre-prune your data. By the looks of it, there is no automatic way to extract the peaks you want (unless you somehow have explicit indices to them). Meaning, you'll have to select them manually.
Now of course, there will be many more details that will have to be dealt with, but given the shape of your data set, I think the underlying idea here can nicely solve your problem.