Determine intervals from kernel density estimation - cluster-analysis

I have a 1-dimensional data which is (t) where users spend time to complete a task. I applied kernel density estimation from http://www.mathworks.com/matlabcentral/fileexchange/14034-kernel-density-estimator to remove the outliers who spent unreasonable time. I used the following lines:
[bandwidth,density,xmesh]=kde(dur1);
plot(xmesh,density);
After applying KDE, I have a problem of defining the local minima to split the data. The following link shows how the curve looks like:
http://s23.postimg.org/6aa1748jf/kde.jpg
I expect to see three clusters, where the middle one contains the reasonable spent time. However, the curve I have got has only one peak.
I am wondering if the steps I am following are correct?

Related

Peak Detection Matlab

I'm trying to get all large peaks values of this signal :
As you can see there is one large peak followed by one smaller peak, and I want to get each value of the largest peak. I already tried this [pks1,locs1] = findpeaks(y1,'MinPeakHeight',??); but I can't find what I can write instead of the ?? knowing that the signal will not be the same every time (of course there will ever be a large+smaller peak schema but time intervals and amplitudes can change). I tried a lot of things using std(), mean(),max() but none of the combination works properly.
Any ideas on how can I solve the problem ?
You could try using the 'MinPeakDistance' keyword and enter a minimum distance between the two peaks slightly higher than the distance between the large peak and the following small peak. So for example:
[pks1,locs1] = findpeaks(y1,'MinPeakDistance',0.3);
Edit:
If the time between peaks (and the following smaller one) varies a lot you'll probably have to do some post-processing. First find all the peaks including the smaller second ones. Then in your array of peaks remove every peak which is significantly lower than its two neighbours.
You could also try fiddling with 'MinPeakProminence'.
Generally these problems require a lot of calibration for the final few percent of the algorithms accuracy, and there's no universal cure.
I also recommend having a look at all the other options in the documentation.

Kalman Filter and sudden measurements jumps

Ok here is what i need to do:
I want to do some tracking using Kalman filter(possibly adaptive).My measurements(when they are available) are very good with very small error from the real measurements. In some cases though the measurements jump to a value,completely off from the correct position i am looking for, and then after few frames the come back to their correct position.
The problem is that if my filter(not adaptive) has specific values for Measurement Noise Covariance(R) and State Error Covariance(Q) matrices the results are not very accurate,because even for these 1% of cases i have to do a compromise between R and Q.
So i decided to use an adaptive Kalman filter as they do in here: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.367.1747&rep=rep1&type=pdf
They estimate the measurement noise covariance matrix based on the innovation sequences.
Basically, they are using a moving window on previous samples and the calculate the covariance of the error between the previous measurements-prior estimations. For eg 5 past measurements and the 5 prior estimations.When a faulty measurement comes under the window, the covariance increases and thus the R increases also.
But in practice the R increases(but not enough) so in the next step the estimation is still good but just a bit towards the the faulty measurement.In the next step(because now the the previous estimation has moved a bit towards the measurement) the R becomes smaller with result the new estimation to go even closer to the measurements, and so on and so forth.
In the end after a few frames the estimations follow the faulty measurements. Here is a plot to understand better what i mean.
https://www.dropbox.com/s/rkv0tjcm4s54kv3/untitled.tif
Maybe what i am trying to do is completely wrong and can't be done with the adaptive Kalman filter.Maybe someone who has worked extensively with Kalman Filter in the past and he has faced this problem before can help.
Any idea is welcome!
Before the answer, I want to be sure I got the problem you have right.
You have measurements, some of them are good (Low measurement noise) yet others are outliers.
The problem you're having is tuning the measurement noise covariance matrix.
Practically, you tune for the good measurements.
Outliers measurements are rejected by using the Error Covariance.
If the innovation falls outside an ellipse you define using the Error Covariance Matrix the measurement is rejected.
Whenever a measurement is rejected you just apply the prediction step again and wait for another measurement.
Yes the problem is exactly this.
However i manage to solve it without the need to define any ellipse.What i was doing was correct except the fact that was not working if i had a lot of(lets say fifty) consecutive outliers.
This is normal if you think the size of your window.If it is for example only 10 samples and you have 20 outliers obviously it won't work.But for 5 consecutive outliers work perfectly.Generally i haven't used any threshold as you propose("if the innovation falls outside an ellipse") reject the measurements.I keep the measurements but in the same time when i start to have outliers the Error measurement covariance becomes very large.So the estimation is based more in previous estimation than in current measurement.
If i used your method which is indeed more logical(reject the current measurement,if it is an outlier based on a threshold) i have the problem that i have to define this threshold a priori,right?Maybe i am missing something..

Noisy signal correlation

I have two (or more) time series that I would like to correlate with one another to look for common changes e.g. both rising or both falling etc.
The problem is that the time series are all fairly noisy with relatively high standard deviations meaning it is difficult to see common features. The signals are sampled at a fairly low frequency (one point every 30s) but cover reasonable time periods 2hours +. It is often the case that the two signs are not the same length, for example 1x1hour & 1x1.5 hours.
Can anyone suggest some good correlation techniques, ideally using built in or bespoke matlab routines? I've tried auto correlation just to compare lags within a single signal but all I got back is a triangular shape with the max at 0 lag (I assume this means there is no obvious correlation except with itself?) . Cross correlation isn't much better.
Any thoughts would be greatly appreciated.
Start with a cross-covariance (xcov) instead of the cross-correlation. xcov removes the DC component (subtracts off the mean) of each data set and then does the cross-correlation. When you cross-correlate two square waves, you get a triangle wave. If you have small signals riding on a large offset, you get a triangle wave with small variations in it.
If you think there is a delay between the two signals, then I would use xcorr to calculate the delay. Since xcorr is doing an FFT of the signal, you should remove the means before calling xcorr, you may also want to consider adding a window (e.g. hanning) to reduce leakage if the data is not self-windowing.
If there is no delay between the signals or you have found and removed the delay, you could just average the two (or more) signals. The random noise should tend to average to zero and the common features will approach the true value.

Best way to extract neuronal spike times from a noisy signal / voltage meaurement

I'm a neuroscientist, and not a very good one. My colleague has kindly provided me with a noisy voltage measurements of the PY neuron of the Stomatogastric Ganglion of the lobster.
The activity of this neuron is characterised by a slow depolarised plateaux with fast spikes on top (a burst).
Both idealised and noisy versions are presented here for you to peruse at your leisure.
It's my job to extract the spike times from the noisy signal but this is so far beyond my experience level I have no idea where to begin. Fortunately, I am a total ninja at Matlab.
Could someone kindly provide me with the name of the procedure, filter or smoothing function which is best suited for this task. Or even the appropriate forum to ask such an asinine question.
Presumably, it needs to increase the signal to noise ratio? The problem here seems to be determining the difference between noise and a bona fide spike as the margin between the two is quite small.
UPDATE: 02/07/2013
I have tried the following filters in Matlab with mixed results. It's still very hard to say what is noise and what is a spike.
Lowpass Butterworth filter,
median filter,
gaussian,
moving weighted window,
moving average filter,
smooth,
sgolay filter.
This may not be an adequate response for stackoverflow - but one way of increasing a signal to noise ratio in your case is to average parts of the signal.
low pass your signal to remove noise (and spikes), and find the minima of the filtered signal (from your image, one minimum every 600 data points). Keep the indexes of each minimum,
on the noisy signal, for each minimum index, select the consecutive 700 data points. If you have 50 minima, you should have a 50 by 700 matrix,
average your matrix. You should have a 1 by 700 vector.
By averaging parts of the signal (minimum-locked potentials), you will take advantage of two properties: noise is zero-mean (well, it should be), and the signal of interest is repetitive. The first will therefore decrease as you pile up potentials, and the second will increase. With this process however, you will lose the spike times for each slow wave figure, but at least have them for blocks of 50 minima.
This technique is known in neuroscience as event-related potential (http://en.wikipedia.org/wiki/Event-related_potential). It may not fit perfectly your signal, or the result may not give nice spikes, but you may extract the spike times for some periods of interest (given the nature of your signal, I would say that you would need 5 or 10 potentials to see an emerging mean activity).
There are some toolboxes that do part of the job (but I would program it myself given the complexity of the task). These are eeglab or fieldtrip. They have a bunch of filter/decomposition options too, as well as some statistical features.

Matlab: Kmeans gives different results each time

I running kmeans in matlab on a 400x1000 matrix and for some reason whenever I run the algorithm I get different results. Below is a code example:
[idx, ~, ~, ~] = kmeans(factor_matrix, 10, 'dist','sqeuclidean','replicates',20);
For some reason, each time I run this code I get different results? any ideas?
I am using it to identify multicollinearity issues.
Thanks for the help!
The k-means implementation in MATLAB has a randomized component: the selection of initial centers. This causes different outcomes. Practically however, MATLAB runs k-means a number of times and returns you the clustering with the lowest distortion. If you're seeing wildly different clusterings each time, it may mean that your data is not amenable to the kind of clusters (spherical) that k-means looks for, and is an indication toward trying other clustering algorithms (e.g. spectral ones).
You can get deterministic behavior by passing it an initial set of centers as one of the function arguments (the start parameter). This will give you the same output clustering each time. There are several heuristics to choose the initial set of centers (e.g. K-means++).
As you can read on the wiki, k-means algorithms are generally heuristic and partially probabilistic, the one in Matlab being no exception.
This means that there is a certain random part to the algorithm (in Matlab's case, repeatedly using random starting points to find the global solution). This makes kmeans output clusters that are of good-quality-on-average. But: given the pseudo-random nature of the algorithm, you will get slightly different clusters each time -- this is normal behavior.
This is called initialization problem, as kmeans starts with random iniinital points to cluster your data. matlab selects k random points and calculates the distance of points in your data to these locations and finds new centroids to further minimize the distance. so you might get different results for centroid locations, but the answer is similar.