How to get FFT units for spatial 1D signal? - matlab

Note: Answers with dimensions in cycles/mm (cy/mm) instead of Hz would be greatly appreciated!
I have looked into "unit of fft(DFT) x axis" and "units on x axis after FFT", but I couldn't find exactly what I'm looking for, so I decided to write my own question.
I have an object in the real world with a width of 3 mm that scans a bigger object of width 310 mm. I can model this as a linear system whose output is a convolution with a rectangular function of width 3 mm, as follows:
g(x) = f(x) * rect(x/3mm)
where f(x) is the 310 mm object and g(x) is the output of the scan. In MATLAB, I'm trying to simulate this effect, so I imported the 310 mm object as a high-resolution array in which each point corresponds to 1 mm.
Now, to model rect(x/3mm), I'm thinking of two ways to do it, and I'm not sure which one is correct.
1- Say that rect(x/3mm) is equal to 3 points of the high-resolution array; it can then be created as a kernel of zeros of the same size as the high-resolution object, with only 3 points set to 1. The actual spacing between the kernel points is then 1 mm. How can you show the proper FFT scale of this rect(x/3mm) model?
2- I can define it in MATLAB as
d = 3;          % width in mm
value = 2;
spacing = 0.01; % sample spacing in mm
x = -value : spacing : value - spacing;
y = rect(x/d);  % rect() is a user-defined rectangular function, not a MATLAB built-in
The second way provides me with a rect(x/3mm) that can be coarsely or finely sampled depending on spacing or value, whereas the first way has only 3 points. How can you show the proper FFT scale of this rect(x/3mm) model?
Which of the two methods is the correct way of doing it?
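For reference, here is a minimal sketch of how I would build a frequency axis in cycles/mm for either model (the axis depends only on the number of samples N and the sample spacing dx in mm); part of my question is whether this scaling is correct for both models:
% Sketch: frequency axis in cycles/mm for an N-point signal sampled every dx mm
dx = 1;                          % 1 mm spacing for model 1 (0.01 for model 2)
N  = 310;                        % number of samples
kernel = zeros(N,1);
kernel(154:156) = 1;             % model 1: 3 points of 1's = a 3 mm rect at 1 mm spacing
F = fftshift(fft(kernel));
f = (-N/2 : N/2-1) / (N*dx);     % frequency axis in cycles/mm (N even)
plot(f, abs(F));
xlabel('Spatial frequency (cycles/mm)');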

Related

How to remove bias when downsampling a vector in Matlab

I have a set of vectors containing some arbitrary shape, like a triangle pulse with a single maximum.
I need to downsample these vectors by an integer factor.
The position of the maxima relative to the length of the vector should stay the same.
The code below shows that when I do this, a bias = -0.0085 is introduced by the downsampling step, which should be zero on average. The bias doesn't seem to change much with the number of vectors (I tried between 200 and 800 vectors). I also tried different resampling functions like downsample and decimate, leading to the same results.
datapoints = zeros(1000,800);
for ii = 1:size(datapoints,2)
    datapoints(ii:ii+18,ii) = [1:10,9:-1:1]; % 19-sample triangle pulse
end
%downsample each column of the data
datapoints_downsampled = datapoints(1:10:end,:);
[~,maxinds_downsampled] = max(datapoints_downsampled);
[~,maxinds] = max(datapoints);
%bias needs to be zero
bias = mean(maxinds/size(datapoints,1)-maxinds_downsampled/size(datapoints_downsampled,1))
This graph shows that there is a systematic bias that does not depend on the number of vectors.
How to remove this bias? Is there a way to determine its magnitude given only one vector?
Where does it come from?
There are two main issues with the code:
Dividing the index by the length of the vector leads to a small bias: if the max is at the first element, then 1/1000 is not the same as 1/100, even though the subsampling preserved the element that contained the maximum. This needs to be corrected for by subtracting 1 before the division, and adding 1/1000 after the division.
Subsampling by a factor of 10 leads to a bias as well: since we're determining the integer location only, in 1/10 cases we preserve the location, in 4/10 cases we move the location in one direction, and in 5/10 cases we move the location in the other direction. The solution is to use an odd subsampling factor, or to determine the location of the maximum with sub-sample precision (this requires proper low-pass filtering before subsampling).
The code below is a modification of the code in the OP; it produces a scatter plot of the error vs. the location, as well as the OP's bias plot. The first plot helps identify issue #2 above. I have made the subsampling factor and the subsampling offset variables; I recommend that you play with these values to understand what is happening. I have also made the location of the maximum random to avoid a sampling bias. Note that I also use N/factor instead of size(datapoints_downsampled,1): the size of the downsampled vector is the wrong value to use if N/factor is not an integer.
N = 1000;
datapoints = zeros(N,800);
for ii = 1:size(datapoints,2)
    datapoints(randi(N-20)+(1:19),ii) = [1:10,9:-1:1]; % pulse at a random location
end
factor = 11;              % odd subsampling factor (see issue #2)
offset = round(factor/2);
datapoints_downsampled = datapoints(offset:factor:end,:);
[~,maxinds_downsampled] = max(datapoints_downsampled,[],1);
[~,maxinds] = max(datapoints,[],1);
maxpos_downsampled = (maxinds_downsampled-1)/(N/factor) + offset/N; % see issue #1
maxpos = maxinds/N;
subplot(121), scatter(maxpos, maxpos_downsampled-maxpos)
bias = cumsum(maxpos_downsampled-maxpos)./(1:size(datapoints,2)); % running mean of the error
subplot(122), plot(bias)

Finding length between a lot of elements

I have an image of a cytoskeleton. There are a lot of small objects inside, and I want to calculate the distance between all of them along every axis and get a matrix with all this data. I am trying to do this in MATLAB.
My final aim is to figure out whether there is any axis with a constant distance between the objects.
I've tried bwdist and connected components, without any luck.
Do you have any other ideas?
So, the end goal is that you want to globally stretch this image in a certain direction (linearly) so that the distances between nearest pairs end up as close together as possible, hopefully the same? Or may you do more complex stretching? (Note that with an arbitrarily complex one you can always make it work :) )
If it is a linear, global one, the distance in x' and y' is going to be a simple multiplication of the old distance in x and y, applied to every pair of points. So the final Euclidean distance will end up being sqrt((SX*X)^2 + (SY*Y)^2), with SX being the stretch in x and SY the stretch in y; X and Y are the distances in x and y between a pair of points.
If you are interested in just the "the same" part, the solution is not so difficult:
Find all objects of interest and put their X and Y coordinates in an N*2 matrix.
Calculate the distances between all pairs of objects in X and Y. You will end up with two matrices sized N*N (real, symmetric, with 0 on the diagonal, i.e. distance matrices).
Find the minimum distance (say this is between A and B).
You probably already have this. Now:
Take C. Construct N-1 transformations, each making the transformed distance C->nearestToC equal to the transformed distance A->B. It is a simple system of equations: you have X1^2*SX^2 + Y1^2*SY^2 = X2^2*SX^2 + Y2^2*SY^2.
So, first set A->B = C->A, then A->B = C->B, then A->B = C->D, etc. Make sure the transformation is normalized => SX^2 + SY^2 = 1. If it cannot be found, the only valid transformation is SX = SY = 0, which means you don't have a solution here. Obviously, SX and SY need to be real.
Note that this solution is unique except in the case where X1 = X2 and Y1 = Y2. In that case, grab some point other than C to find this transformation.
For each transformation, check the remaining points and find all of their nearest neighbours. If the distance is always the same as the A->B distance (to a given tolerance), great, you found your transformation. If not, this transformation does not work and you should continue with the next one.
If you want a transformation that minimizes variations between distances (but doesn't require them to be nearly equal), I would use some optimization method and search for a minimum; I don't know how to find an exact solution otherwise. I would pick this also in case you don't have a linear or global stretch.
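A minimal sketch of that optimization idea (my own illustration, not the poster's code): assuming the object coordinates are already in a hypothetical N*2 matrix P, and that pdist/squareform from the Statistics Toolbox are available, parameterize the normalized stretch by an angle and minimize the spread of the nearest-neighbour distances.
% nearest-neighbour distance of every point, after stretching by (SX,SY)
nnd  = @(Q) min(squareform(pdist(Q)) + diag(inf(size(Q,1),1)), [], 1);
% normalized stretch SX = cos(theta), SY = sin(theta), so SX^2 + SY^2 = 1
cost = @(theta) var(nnd(bsxfun(@times, P, [cos(theta), sin(theta)])));
thetaBest = fminsearch(cost, pi/4); % start from an isotropic stretch
SX = cos(thetaBest);
SY = sin(thetaBest);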
If I understand your question correctly, the first step is to obtain all of the objects' center-of-mass points in the image as (x,y) coordinates. Then you can easily compute all of the distances between all points. I suggest taking a look at a histogram of those distances, which may provide some information as to the nature of the distance distribution (for example, whether it is uniformly random, or whether there are any patterns).
Obtaining the center-of-mass points is not an easy task; consider transforming the image into a binary one, or some sort of background subtraction with blob detection and/or an edge detector.
For building a histogram you can use histogram.
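For example, a minimal sketch of that pipeline (assuming the Image Processing Toolbox for imbinarize/regionprops and the Statistics Toolbox for pdist; img is a hypothetical grayscale image):
bw = imbinarize(img);                % threshold to a binary image
props = regionprops(bw, 'Centroid'); % centroid of each connected object
P = vertcat(props.Centroid);         % N-by-2 matrix of (x,y) coordinates
d = pdist(P);                        % all pairwise Euclidean distances
histogram(d);                        % peaks suggest recurring spacings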

Is this the right way of projecting the training set into the eigenspace? MATLAB

I have computed PCA using the following:
function [signals,V] = pca2(data)
[M,N] = size(data);
data = reshape(data, M*N,1);
% subtract off the mean for each dimension
mn = mean(data,2);
data = bsxfun(@minus, data, mean(data,1));
% construct the matrix Y
Y = data'*data / (M*N-1);
[V, D] = eigs(Y, 10); % reduce to 10 dimensions
% project the original data
signals = data * V;
My question is:
Is "signals" is the projection of the training set into the eigenspace?
I saw in "Amir Hossein" code that "centered image vectors" that is "data" in the above code needs to be projected into the "facespace" by multiplying in the eigenspace basis's. I don't really understand why is the projection done using centered image vectors? Isn't "signals" enough for classification??
By "signals", I assume you mean to ask why we subtract the mean from the raw vector form of the image.
If you think about PCA, it is trying to give you the best direction along which the data varies most. However, as your images probably contain only positive pixel values, the data will always lie in the positive quadrant, which will mislead especially your first and most important eigenvector. You can search for more on the second moment matrix. But I will share a bad Paint image that explains it. Sorry about my drawing.
Please ignore the size of the stars.
Stars: your data
Red line: eigenvectors
As you can easily see in 2D, centering the data gives a better direction for your principal component. If you skip this step, your first eigenvector will be biased toward the mean and give poorer results.
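A small synthetic illustration of this point (my own sketch, not related to the OP's data): compare the leading eigenvector of the second moment matrix with and without centering.
rng(0);
X = [3 + randn(500,1), 5 + 0.5*randn(500,1)]; % all-positive "pixel" data, true variance along x
[Vraw, Draw] = eig(X'*X);                     % second moment matrix, no centering
[~, i1] = max(diag(Draw));
Xc = bsxfun(@minus, X, mean(X,1));            % centered data
[Vc, Dc] = eig(Xc'*Xc);                       % covariance (up to scale)
[~, i2] = max(diag(Dc));
% Without centering, the leading eigenvector points toward the mean (~[3;5] direction);
% with centering, it recovers the true direction of maximum variance (~[1;0]).
disp([Vraw(:,i1), Vc(:,i2)])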

How to find the position of a step that changes its properties as a function of position

I want to find the position of a step in my noisy data. After a few attempts, I tried using an edge detector or convolving with a matched filter to get the position of the step. The problem is that neither is accurate, because the step changes its shape as a function of position.
For example, given that my data vector is 1000 elements long, the step width will be 30 around pixel number 200 and 70 around pixel number 700, similar to the way a wave packet broadens due to dispersion. There are more properties that change somewhat, so the general question is: how can I find the position of the step? A matched filter is limited to a specific shape and will give an inaccurate position. An edge detector is sensitive to the slope and will also give an inaccurate position. What other approaches do you know? I'd be happy to learn new ideas.
Here is an example of steps that change with position, without noise or other features
And here are the same steps + noise and one additional feature (displaced for better visualization).
I would do a noise-insensitive kind of numerical differentiation and find its peaks (I don't have MATLAB at hand, so this may have typos):
n = length(x);       % x is the original data vector
m = 30;              % smallest step width you expect
y = zeros(n - m, 1); % the "numerical derivative" of x
for i = 1 + m : n
    y(i - m) = x(i) - x(i - m);
end
figure; plot(y)
% now find the peaks of y using a sliding window and thresholding
% sliding window width should be the largest step width you expect
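As an aside, the differencing loop above can be written in one vectorized MATLAB line (assuming x is a column vector):
y = x(1+m : n) - x(1 : n-m);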
This simple approach worked for me in the past.
Another method to calculate the numerical derivative is to calculate the slope at the middle of a (parabolic) Savitzky-Golay filter. You should apply the filter in a sliding window; again, its width should be the smallest step width you expect. The advantages of the SG filter are that (1) calculating the slope of the fitted parabola is easy, and (2) there will be no time offset in the derivative. Calculating the slope after applying a usual linear smoothing filter won't work, because these filters shift the smoothed signal in time, so the timing of the step will also be shifted.
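A minimal sketch of this Savitzky-Golay variant (assuming the Signal Processing Toolbox is available for sgolay; the frame length must be odd and should be near the smallest step width you expect):
order = 2;                        % parabolic fit
framelen = 31;                    % odd, ~ smallest expected step width
dt = 1;                           % sample spacing
[~, g] = sgolay(order, framelen); % g(:,2) holds the first-derivative filter
dy = conv(x, factorial(1)/(-dt)^1 * g(:,2), 'same'); % smoothed, offset-free derivative
figure; plot(dy)                  % peaks of dy mark candidate step positions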

Define the width of a peak in MATLAB

I'm trying to find some peaks in MATLAB, but the function findpeaks.m doesn't have a width option. The peaks I want detected are marked with balls; all the detected ones are in the red squares. As you can see, they have a small width. Any help?
Here's the code I use:
[pk,lo] = findpeaks(ecg);
lo2 = zeros(size(lo));
for m = 1:length(lo) - 1
    if (ecg(m) - ecg(m+1)) > 0.025
        lo2(m) = lo(m);
    end
end
p = find(lo2 == 0);
lo2(p) = [];
figure, plot(ecg);
hold on
plot(lo, ecg(lo), 'rs');
By the looks of it, you want to characterise each peak in terms of amplitude and width, so that you can apply thresholds (or similar) to these values to select only those meeting your criteria (tall and thin).
One way you could do this is to fit a normal distribution to each peak, pegging the mean and amplitude to the values you have found already, and using an optimisation function to find the standard deviation (the width of the normal distribution).
So, you would need a function which calculates a representation of your data based on the sum of all the Gaussian distributions you have, and an error function (mean squared error, perhaps); then you just need to throw this into one of MATLAB's built-in optimisation/minimisation functions.
The optimal set of standard deviation parameters would give you the widths of each peak, or at least a good approximation.
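A minimal sketch of that idea for a single peak (hypothetical variables: ecg is assumed to be a row vector, and pk and lo are one peak's value and location from findpeaks, taken as scalars):
seg   = max(lo-40,1) : min(lo+40, numel(ecg));   % window around the peak
gauss = @(s) pk * exp(-(seg - lo).^2 / (2*s^2)); % mean and amplitude pegged
mse   = @(s) mean((ecg(seg) - gauss(s)).^2);     % error function
sigma = fminsearch(mse, 5);                      % fitted width of this peak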
Another method, based on Adiel's comment, which is perhaps more appropriate since it looks like you are working with ECG data, would be to find the local minima (troughs) as well as the peaks. From these you could construct an approximate measure of 'thinness' by taking the x-axis distance between the troughs on either side of a given peak.
You need to define a peak width first: determine how narrow you want your peaks to be and then select them accordingly.
For instance, you can define the width of a peak as the difference between the x-coordinates at which the y-coordinate equals half of the peak's value (see here). Another approach (which seems more appropriate here) is to measure the gradient at fixed distances from the peak itself, and select the peaks accordingly. In MATLAB, you'll probably use a gradient filter for that:
g = conv(ecg, [-1 0 1], 'same'); %// Gradient filter
idx = g(lo) > thr;               %// Indices of narrow peaks
lo = lo(idx);
where thr is a threshold value that you need to determine yourself. Lower threshold values mean more tolerance for wider peaks.
You need to define what it means to be a peak of interest, and what you mean by the width of that peak. Once you do those things, you are a step ahead.
Perhaps you might locate each peak using findpeaks. Then locate the troughs, one of which should lie between each pair of peaks. A trough is simply a peak of -y. Make sure you worry about the first and last peaks/troughs.
Next, define the half-height points as the locations midway in height between each peak and its neighbouring troughs. These can be found using reverse linear interpolation on the curve.
Finally, the width at half height might be simply the distance (on the x axis) between those two half height points.
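A minimal sketch of that recipe for a single peak (hypothetical variables: y is the signal, pk the peak index, and t1 < pk < t2 the indices of the surrounding troughs; the flanks are assumed monotonic so that interp1 can invert them):
halfL = (y(pk) + y(t1)) / 2;          % half height on the left flank
halfR = (y(pk) + y(t2)) / 2;          % half height on the right flank
xL = interp1(y(t1:pk), t1:pk, halfL); % reverse linear interpolation, left
xR = interp1(y(pk:t2), pk:t2, halfR); % reverse linear interpolation, right
width = xR - xL;                      % width at half height, in samples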
Thinking pragmatically, I suppose you could use something along the lines of this simple brute-force approach:
[peaks, peakLocations] = findpeaks(+X);
[troughs, troughLocations] = findpeaks(-X);
width = zeros(size(peaks));
for ii = 1:numel(peaks)
    trough_before = troughLocations( ...
        find(troughLocations < peakLocations(ii), 1, 'last') );
    trough_after  = troughLocations( ...
        find(troughLocations > peakLocations(ii), 1, 'first') );
    width(ii) = trough_after - trough_before;
end
This will find the distance between the two troughs surrounding a peak of interest.
Use the 'MinPeakHeight' option in findpeaks() to pre-prune your data. By the looks of it, there is no automatic way to extract the peaks you want (unless you somehow have explicit indices for them), meaning you'll have to select them manually.
Now of course there will be many more details to deal with, but given the shape of your data set, I think the underlying idea here can nicely solve your problem.
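For example (minHt is a hypothetical height threshold you would pick for your data):
[pk, lo] = findpeaks(ecg, 'MinPeakHeight', minHt);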