Change inter quartile range(IQR) for boxchart - matlab

I had to use only boxchart for my application where I need to change Inter Quartile Range(IQR) from default range i.e. 25% for the lower and 75% is the upper quartiles respectly. I see changing of IQR in boxplot using the Outliers Tag box object possible, but for boxchart there is no much help found in different forums.
As from the help of boxchart, it is using quantile Algorithms where the upper quartile corresponds to the 0.75 quantile and the lower quartile corresponds to the 0.25 quantile default, I have statistics and machine learning toolbox,
my problem is similar to this Matlab Boxplot: Use a specific percentile as the upper whisker OR remove redundant outliers after manual upper whisker edit but with the boxchart.
Can I get any help?

As per the reply of service request to Mathworks, there is no feature available in the boxchart function to allow for adjusting percentile limits(IQR) till current i.e. R2021b release.
Hope this feature may available in the the future release.

Related

How to calculate an exponentially weighted moving average on a conditional basis?

I am using MATLAB R2020a on a MacOS. I have a signal 'cycle_periods' consisting of the cycle periods of an ECG signal on which I would like to perform an exponentially weighted mean, such that older values are less weighted than newer ones. However, I would like this to be done on an element-by-element basis such that a given element is only included in the overall weighted mean if the weighted mean with the current sample does not exceed 1.5 times or go below 0.5 times the weighted mean without the element.
I have used the dsp.MovingAverage function as shown below to calculate the weighted mean, but I am really unsure as to how to manipulate the function to include my conditions.
% Exponentially weighted moving mean for stable cycle periods
movavgExp = dsp.MovingAverage('Method', 'Exponential weighting', 'ForgettingFactor', 0.1);
mean_cycle_period_exp = movavgExp(cycle_period_stable);
I would very much appreciate any help regarding this matter, thanks in advance.

findpeaks - MINPEAKDISTANCE

I'm trying to understand how MINPEAKDISTANCE works. I returned to the documentation, here, but it wasn't very clear how this parameter works.
Can you kindly clarify it a bit?
Thanks.
Minimum peak separation Specify the minimum peak distance, or minimum
separation between peaks as a positive integer. You can use the
'MINPEAKDISTANCE' option to specify that the algorithm ignore small
peaks that occur in the neighborhood of a larger peak. When you
specify a value for 'MINPEAKDISTANCE', the algorithm initially
identifies all the peaks in the input data and sorts those peaks in
descending order. Beginning with the largest peak, the algorithm
ignores all identified peaks not separated by more than the value of
'MINPEAKDISTANCE'. Default: 1
So if you consider your peak heights as values in the "y" direction, then the separation that this is talking about is in the "x" direction. So for example look at this image (from Matlab docs and if you have the image processing toolbox you can get the data too load noisyecg.mat):
lets say you just want to identify thos 4 big distinct peaks, but not the hundreds of little peaks caused by noise, setting MINPEAKDISTANCE is a feasible way accomplish this because the noisy peaks are at a much higher frequency, i.e. they are closer to each other in the "x" direction, or have a smaller distance separating them than the big peaks do. So choosing a large enough MINPEAKDISTANCE, say 100 or 350 for example depending on what peaks you're interested in, would help you to not detect these undesired noise peaks.
Try findpeaks on this data with different MINPEAKDISTANCE values and see what you get!
If you've got noisy data, you may find that instead of one solid peak, you get lots of small ones (see the folowing image).
The important data here is when the signal is high and when it is low - you don't care about small variations in value, you only want to use one of those peaks and not look at all the smaller local ones around it. If you know the frequency of your signal (i.e. how often the peaks should occur), you can tell the function to ensure that the peaks are separated by a certain amount.
In the above example, the peak is every 15 milliseconds and lasts for 5 milliseconds, so you might set your MINPEAKDISTANCE parameter to 15 or so.

PIV Analysis, Interrogation Area of The Cross Correlation

I'm running a PIV analysis on two consecutive images taken during an experiment to get the vector field. But I would like to know, based on what criteria do I have to choose the percentage of overlap between the tow images for the cross-correlation process? 50%, 75%...? The PIVlab_GUI tool designed for MATLAB chooses a 50% overlap by default, but it allows changing it.
I just want to know the criteria based on which I can know how much overlap is best? Do the vectors become less accurate, dependent.etc, as we increase/decrease the overlap?
My book "Fluid Mechanics Measurements" does not explain how to choose the overlap amount in the cross-correlation process, and I could not find any helpful online reference.
Any help is appreciated.
I suggest you read up on spectral estimation - which is basically equivalent to cross correlation when you segment the data and average the correlation estimates calculated from each segment (the cross correlation is the inverse Fourier transform of the cross spectrum). There's a book chapter on this stuff here, but you may want to find a more complete resource if you are unclear on the basics.
A short answer: increasing the overlap will increase the frequency resolution of the spectral estimate, and give you more segments to average over; your estimate will have a lower variance. But there are diminishing statistical returns the more you increase your overlap past 50%, while the computational complexity continues to rise (more segments = more calculations). Hence most people just choose 50% and have done with it.
It's important to note that you don't get any more information by using overlapping frames, you are simply increasing the frequency resolution (or time lag resolution, for correlation) - similar to the effect of zero-padding a signal before taking its Fourier transform - and this has statistical effects due to the way estimation of this type works.

How to pick the T1 and T2 threshold values for Canopy Clustering?

I am trying to implement the Canopy clustering algorithm along with K-Means. I've done some searching online that says to use Canopy clustering to get your initial starting points to feed into K-means, the problem is, in Canopy clustering, you need to specify 2 threshold values for the canopy: T1 and T2, where points in the inner threshold are strongly tied to that canopy and the points in the wider threshold are less tied to that canopy. How are these threshold, or distances from the canopy center, determined?
Problem context:
The problem I'm trying to solve is, I have a set of numbers such as [1,30] or [1,250] with set sizes of about 50. There can be duplicate elements and they can be floating point numbers as well, such as 8, 17.5, 17.5, 23, 66, ... I want to find the optimal clusters, or subsets of the set of numbers.
So, if Canopy clustering with K-means is a good choice, then my questions still stands: how do you find the T1, T2 values?. If this is not a good choice, is there a better, simpler but effective algorithm to use?
Perhaps naively, I see the problem in terms of a sort of spectral-estimation. Suppose I have 10 vectors. I can compute the distances between all pairs. In this case I'd get 45 such distances. Plot them as a histogram in various distance ranges. E.g. 10 distances are between 0.1 and 0.2, 5 between 0.2 and 0.3 etc. and you get an idea of how the distances between vectors are distributed. From this information you can choose T1 and T2 (e.g. choose them so that you cover the distance range that is the most populated).
Of course, this is not practical for a large dataset - but you could just take a random sample or something so that you at least know the ballpark of T1 and T2. Using something like Hadoop you could do some sort of prior spectral estimation on a large number of points. If all incoming data you are trying to cluster is distributed in much the same way then you cjust need to get T1 and T2 once, then fix them as constants for all future runs.
Actually that is the big issue with Canopy Clustering. Choosing the thresholds is pretty much as difficult as the actual algorithm. In particular in high dimensions. For a 2D geographic data set, a domain expert can probably define the distance thresholds easily. But in high-dimensional data, probably the best you can do is to run k-means on a sample of your data first, then choose the distances based on this sample run.

High-pass filtering in MATLAB

Does anyone know how to use filters in MATLAB?
I am not an aficionado, so I'm not concerned with roll-off characteristics etc — I have a 1 dimensional signal vector x sampled at 100 kHz, and I want to perform a high pass filtering on it (say, rejecting anything below 10Hz) to remove the baseline drift.
There are Butterworth, Elliptical, and Chebychev filters described in the help, but no simple explanation as to how to implement.
There are several filters that can be used, and the actual choice of the filter will depend on what you're trying to achieve. Since you mentioned Butterworth, Chebyschev and Elliptical filters, I'm assuming you're looking for IIR filters in general.
Wikipedia is a good place to start reading up on the different filters and what they do. For example, Butterworth is maximally flat in the passband and the response rolls off in the stop band. In Chebyschev, you have a smooth response in either the passband (type 2) or the stop band (type 1) and larger, irregular ripples in the other and lastly, in Elliptical filters, there's ripples in both the bands. The following image is taken from wikipedia.
So in all three cases, you have to trade something for something else. In Butterworth, you get no ripples, but the frequency response roll off is slower. In the above figure, it takes from 0.4 to about 0.55 to get to half power. In Chebyschev, you get steeper roll off, but you have to allow for irregular and larger ripples in one of the bands, and in Elliptical, you get near-instant cut off, but you have ripples in both bands.
The choice of filter will depend entirely on your application. Are you trying to get a clean signal with little to no losses? Then you need something that gives you a smooth response in the passband (Butterworth/Cheby2). Are you trying to kill frequencies in the stopband, and you won't mind a minor loss in the response in the passband? Then you will need something that's smooth in the stop band (Cheby1). Do you need extremely sharp cut-off corners, i.e., anything a little beyond the passband is detrimental to your analysis? If so, you should use Elliptical filters.
The thing to remember about IIR filters is that they've got poles. Unlike FIR filters where you can increase the order of the filter with the only ramification being the filter delay, increasing the order of IIR filters will make the filter unstable. By unstable, I mean you will have poles that lie outside the unit circle. To see why this is so, you can read the wiki articles on IIR filters, especially the part on stability.
To further illustrate my point, consider the following band pass filter.
fpass=[0.05 0.2];%# passband
fstop=[0.045 0.205]; %# frequency where it rolls off to half power
Rpass=1;%# max permissible ripples in stopband (dB)
Astop=40;%# min 40dB attenuation
n=cheb2ord(fpass,fstop,Rpass,Astop);%# calculate minimum filter order to achieve these design requirements
[b,a]=cheby2(n,Astop,fstop);
Now if you look at the zero-pole diagram using zplane(b,a), you'll see that there are several poles (x) lying outside the unit circle, which makes this approach unstable.
and this is evident from the fact that the frequency response is all haywire. Use freqz(b,a) to get the following
To get a more stable filter with your exact design requirements, you'll need to use second order filters using the z-p-k method instead of b-a, in MATLAB. Here's how for the same filter as above:
[z,p,k]=cheby2(n,Astop,fstop);
[s,g]=zp2sos(z,p,k);%# create second order sections
Hd=dfilt.df2sos(s,g);%# create a dfilt object.
Now if you look at the characteristics of this filter, you'll see that all the poles lie inside the unit circle (hence stable) and matches the design requirements
The approach is similar for butter and ellip, with equivalent buttord and ellipord. The MATLAB documentation also has good examples on designing filters. You can build upon these examples and mine to design a filter according to what you want.
To use the filter on your data, you can either do filter(b,a,data) or filter(Hd,data) depending on what filter you eventually use. If you want zero phase distortion, use filtfilt. However, this does not accept dfilt objects. So to zero-phase filter with Hd, use the filtfilthd file available on the Mathworks file exchange site
EDIT
This is in response to #DarenW's comment. Smoothing and filtering are two different operations, and although they're similar in some regards (moving average is a low pass filter), you can't simply substitute one for the other unless it you can be sure that it won't be of concern in the specific application.
For example, implementing Daren's suggestion on a linear chirp signal from 0-25kHz, sampled at 100kHz, this the frequency spectrum after smoothing with a Gaussian filter
Sure, the drift close to 10Hz is almost nil. However, the operation has completely changed the nature of the frequency components in the original signal. This discrepancy comes about because they completely ignored the roll-off of the smoothing operation (see red line), and assumed that it would be flat zero. If that were true, then the subtraction would've worked. But alas, that is not the case, which is why an entire field on designing filters exists.
Create your filter - for example using [B,A] = butter(N,Wn,'high') where N is the order of the filter - if you are unsure what this is, just set it to 10. Wn is the cutoff frequency normalized between 0 and 1, with 1 corresponding to half the sample rate of the signal. If your sample rate is fs, and you want a cutoff frequency of 10 Hz, you need to set Wn = (10/(fs/2)).
You can then apply the filter by using Y = filter(B,A,X) where X is your signal. You can also look into the filtfilt function.
A cheapo way to do this kind of filtering that doesn't involve straining brain cells on design, zeros and poles and ripple and all that, is:
* Make a copy of the signal
* Smooth it. For a 100KHz signal and wanting to eliminate about 10Hz on down, you'll need to smooth over about 10,000 points. Use a Gaussian smoother, or a box smoother maybe 1/2 that width twice, or whatever is handy. (A simple box smoother of total width 10,000 used once may produce unwanted edge effects)
* Subtract the smoothed version from the original. Baseline drift will be gone.
If the original signal is spikey, you may want to use a short median filter before the big smoother.
This generalizes easily to 2D images, 3D volume data, whatever.