I'm trying to get the values of all the large peaks of this signal:
As you can see there is one large peak followed by one smaller peak, and I want to get the value of each of the large peaks. I already tried [pks1,locs1] = findpeaks(y1,'MinPeakHeight',??); but I can't figure out what to put in place of the ??, knowing that the signal will not be the same every time (of course there will always be the large+smaller peak pattern, but time intervals and amplitudes can change). I tried a lot of combinations using std(), mean() and max(), but none of them works properly.
Any ideas on how I can solve this problem?
You could try using the 'MinPeakDistance' option and specifying a minimum distance between peaks slightly larger than the distance between the large peak and the following small peak. So for example:
[pks1,locs1] = findpeaks(y1,'MinPeakDistance',0.3);
Edit:
If the time between a large peak and the following smaller one varies a lot, you'll probably have to do some post-processing. First find all the peaks, including the smaller second ones. Then, in your array of peaks, remove every peak that is significantly lower than both of its neighbours.
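A rough sketch of that post-processing step, assuming y1 is the signal vector; the 0.7 threshold factor is an arbitrary choice you would tune for your data:
% Find all peaks first, large and small alike.
[pks, locs] = findpeaks(y1);
% Drop every peak that is significantly lower than both of its neighbours.
keep = true(size(pks));
for k = 2:numel(pks)-1
    if pks(k) < 0.7*pks(k-1) && pks(k) < 0.7*pks(k+1)
        keep(k) = false;   % small in-between peak
    end
end
largePks  = pks(keep);
largeLocs = locs(keep);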
You could also try fiddling with 'MinPeakProminence'.
Generally these problems require a lot of calibration to get the final few percent of the algorithm's accuracy, and there's no universal cure.
I also recommend having a look at all the other options in the documentation.
I am generating some data whose plots are as shown below
In all the plots I get some outliers at the beginning and at the end. Currently I am truncating the first and the last 10 values. Is there a better way to handle this?
I am basically trying to automatically identify the two points shown below.
This is a fairly general problem with lots of approaches, usually you will use some a priori knowledge of the underlying system to make it tractable.
So for instance if you expect to see the pattern above - a fast drop, a linear section (up or down) and a fast rise - you could try taking the derivative of the curve and looking for large values and/or sign reversals. Perhaps it would help to bin the data first.
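A minimal sketch of the derivative idea, assuming the data is in a vector y sampled at uniform intervals; the threshold based on the median absolute derivative is an arbitrary illustrative choice:
dy  = diff(y);                       % first derivative via finite differences
thr = 5*median(abs(dy));             % arbitrary threshold; tune on real data
candidates = find(abs(dy) > thr);    % indices where the curve changes fast
% The first and last such indices roughly bracket the central section.
startIdx = candidates(1);
endIdx   = candidates(end);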
If your pattern is not so easy to define but you are expecting a linear trend you might fit the data to an appropriate class of curve using fit and then detect outliers as those whose error from the fit exceeds a given threshold.
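A sketch of the fitting approach, assuming the Curve Fitting Toolbox is available and that a straight line ('poly1') is an adequate model for the data; the 3-sigma cutoff is just an illustrative choice:
x = (1:numel(y))';                       % sample index as the x coordinate
f = fit(x, y(:), 'poly1');               % fit a line to the data
resid = y(:) - f(x);                     % residuals from the fit
outliers = abs(resid) > 3*std(resid);    % flag points far from the fit
yClean = y(~outliers);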
In either case you still have to choose thresholds - mean, variance and higher order moments can help here but you would probably have to analyse existing data (your training set) to determine the values empirically.
And perhaps, after all that, as Shai points out, you may find that lopping off the first and last ten points gives the best results for the time you spent (cf. Pareto principle).
I'm trying to understand how MINPEAKDISTANCE works. I returned to the documentation, here, but it wasn't very clear how this parameter works.
Can you kindly clarify it a bit?
Thanks.
Minimum peak separation: Specify the minimum peak distance, or minimum separation between peaks, as a positive integer. You can use the 'MINPEAKDISTANCE' option to specify that the algorithm ignore small peaks that occur in the neighborhood of a larger peak. When you specify a value for 'MINPEAKDISTANCE', the algorithm initially identifies all the peaks in the input data and sorts those peaks in descending order. Beginning with the largest peak, the algorithm ignores all identified peaks not separated by more than the value of 'MINPEAKDISTANCE'. Default: 1
So if you consider your peak heights as values in the "y" direction, then the separation this is talking about is in the "x" direction. For example, look at this image (from the Matlab docs; if you have the Signal Processing Toolbox you can get the data too with load noisyecg.mat):
Let's say you just want to identify those 4 big distinct peaks, but not the hundreds of little peaks caused by noise. Setting MINPEAKDISTANCE is a feasible way to accomplish this, because the noisy peaks occur at a much higher frequency, i.e. they are closer to each other in the "x" direction, or have a smaller distance separating them, than the big peaks do. So choosing a large enough MINPEAKDISTANCE, say 100 or 350 for example depending on which peaks you're interested in, would help you avoid detecting these undesired noise peaks.
Try findpeaks on this data with different MINPEAKDISTANCE values and see what you get!
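For example, something along these lines; the noisyecg.mat file is the one mentioned above, and the variable name noisyECG_withTrend is an assumption, so adjust it to whatever the file actually contains:
load noisyecg.mat
y = noisyECG_withTrend;                                     % assumed variable name
[pksAll, locsAll] = findpeaks(y);                           % every little noise peak
[pksBig, locsBig] = findpeaks(y, 'MINPEAKDISTANCE', 350);   % only well-separated peaks
plot(1:numel(y), y, locsBig, pksBig, 'rv')                  % mark the surviving peaks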
If you've got noisy data, you may find that instead of one solid peak, you get lots of small ones (see the following image).
The important data here is when the signal is high and when it is low - you don't care about small variations in value; you only want to use one of those peaks and not look at all the smaller local ones around it. If you know the frequency of your signal (i.e. how often the peaks should occur), you can tell the function to ensure that the peaks are separated by a certain amount.
In the above example, the peak is every 15 milliseconds and lasts for 5 milliseconds, so you might set your MINPEAKDISTANCE parameter to 15 or so.
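Note that when you call findpeaks with only the data vector, MINPEAKDISTANCE is measured in samples, not milliseconds, so you have to convert using your sampling rate; the names y and Fs below are assumptions for illustration:
Fs = 1000;                              % sampling rate in Hz (assumed)
minSeparationMs = 15;                   % expected peak spacing, from above
minDistSamples  = round(minSeparationMs/1000 * Fs);
[pks, locs] = findpeaks(y, 'MINPEAKDISTANCE', minDistSamples);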
I have two (or more) time series that I would like to correlate with one another to look for common changes e.g. both rising or both falling etc.
The problem is that the time series are all fairly noisy, with relatively high standard deviations, meaning it is difficult to see common features. The signals are sampled at a fairly low frequency (one point every 30 s) but cover reasonable time periods of 2 hours or more. It is often the case that the two signals are not the same length, for example 1 x 1 hour and 1 x 1.5 hours.
Can anyone suggest some good correlation techniques, ideally using built-in or bespoke Matlab routines? I've tried autocorrelation just to compare lags within a single signal, but all I got back is a triangular shape with the maximum at 0 lag (I assume this means there is no obvious correlation except with itself?). Cross-correlation isn't much better.
Any thoughts would be greatly appreciated.
Start with a cross-covariance (xcov) instead of the cross-correlation. xcov removes the DC component (subtracts off the mean) of each data set and then does the cross-correlation. When you cross-correlate two square waves, you get a triangle wave. If you have small signals riding on a large offset, you get a triangle wave with small variations in it.
If you think there is a delay between the two signals, then I would use xcorr to calculate the delay. Since xcorr is doing an FFT of the signal, you should remove the means before calling xcorr; you may also want to consider adding a window (e.g. hanning) to reduce leakage if the data is not self-windowing.
If there is no delay between the signals or you have found and removed the delay, you could just average the two (or more) signals. The random noise should tend to average to zero and the common features will approach the true value.
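A sketch of that workflow, assuming the two series are in vectors a and b with the same sampling interval; the variable names and the hanning window are illustrative choices:
% De-mean, and window only for the delay estimate (to reduce leakage).
a0 = a(:) - mean(a);
b0 = b(:) - mean(b);
[c, lags] = xcorr(a0 .* hanning(numel(a0)), b0 .* hanning(numel(b0)));
[~, idx]  = max(abs(c));
delay     = lags(idx);                 % positive means a lags behind b (in samples)
% Align on the estimated delay, trim to a common length, and average.
if delay > 0
    a0 = a0(delay+1:end);
elseif delay < 0
    b0 = b0(-delay+1:end);
end
n = min(numel(a0), numel(b0));
avgSignal = (a0(1:n) + b0(1:n)) / 2;   % common features reinforce, random noise averages down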
I have discrete empirical data which forms a histogram with gaps. I.e. no observations were made of certain values. However in reality those values may well occur.
This is a fig of the scatter graph.
So my question is, should I interpolate between x-axis values to make bins for the histogram? If so, what would you suggest as best practice?
Regards,
Don't do it.
With that many sample points, the probability (p-value) of getting empty bins if the distribution is smooth is quite low. There's some underlying reason they're empty, which you may want to investigate. I can think of two possibilities:
Your data actually is discrete (perhaps someone rounded off to 1 significant figure during data collection, or quantization error was significant in an ADC) and then a unit conversion caused irregular gaps. Even a conversion from .12 and .13 to 12, 13 as shown could cause this issue, if .12 is actually represented as .11111111198 inside the computer. But this would tend to double up counts in a neighboring bin and the gaps would tend to be regularly spaced, so I doubt this is the cause. (For example, if 128 trials of a Bernoulli coin-flip experiment were done for each data point, and someone recorded the percentage of heads in each series to the nearest 1%, you could multiply by 1.28/% to try to recover the actual number of heads, but there'd be 28 empty bins; see the sketch after this answer.)
Your distribution has real lobes. Because the frequency is significantly reduced following each empty bin, I favor this explanation.
But these are just starting suggestions for your own investigation.
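A quick sketch of the quantization scenario in the first possibility, simulating 128-trial Bernoulli experiments recorded as whole percentages and then rescaled; the sample count is arbitrary and binornd assumes the Statistics Toolbox:
nSeries = 10000;                               % number of experiments (illustrative)
nTrials = 128;                                 % coin flips per experiment
heads   = binornd(nTrials, 0.5, nSeries, 1);   % or: sum(rand(nTrials, nSeries) < 0.5)'
pct     = round(100 * heads / nTrials);        % recorded to the nearest 1%
recovered = pct * 1.28;                        % attempt to recover the head counts
histogram(recovered, 0:nTrials)                % regularly spaced empty bins appear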
I am trying to implement the Buddhabrot fractal. I can't understand one thing: all the implementations I inspected pick random points on the image to calculate the path of the escaping particle. Why do they do this? Why not go over all pixels?
What purpose do the random points serve? More points make better pictures, so I think going over all pixels makes the best picture - am I wrong here?
From my test data:
Working on a 400x400 picture, so there are 160,000 pixels to iterate over if I go over all of them.
Using random sampling,
the picture only starts to take shape after 1 million points. Good results show up around 1 billion random points, which takes hours to compute.
Random sampling is better than grid sampling for two main reasons. First, because grid sampling will introduce grid-like artifacts in the resulting image. Second, because grid sampling may not give you enough samples for a converged resulting image. If, after completing a grid pass, you wanted more samples, you would need to make another pass with a slightly offset grid (so as not to resample the same points) or switch to a finer grid, which may end up doing more work than is needed. Random sampling gives very smooth results, and you can stop the process as soon as the image has converged or you are satisfied with the results.
I'm the inventor of the technique so you can trust me on this. :-)
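For concreteness, here is a minimal random-sampling Buddhabrot sketch in Matlab; the image size, iteration cap, sample count and sampling region are arbitrary choices, and no symmetry or region optimisations are applied:
W = 400; H = 400; maxIter = 500; nSamples = 1e6;   % illustrative parameters
counts = zeros(H, W);
for s = 1:nSamples
    c = (rand*3.0 - 2.0) + 1i*(rand*2.4 - 1.2);    % random c in roughly [-2,1] x [-1.2,1.2]
    z = 0; orbit = complex(zeros(maxIter, 1));
    escaped = false;
    for k = 1:maxIter
        z = z*z + c;
        orbit(k) = z;
        if abs(z) > 2, escaped = true; break; end
    end
    if escaped                                     % only escaping orbits are accumulated
        for m = 1:k
            px = round((real(orbit(m)) + 2.0) / 3.0 * (W-1)) + 1;
            py = round((imag(orbit(m)) + 1.2) / 2.4 * (H-1)) + 1;
            if px >= 1 && px <= W && py >= 1 && py <= H
                counts(py, px) = counts(py, px) + 1;
            end
        end
    end
end
imagesc(sqrt(counts)); axis image; colormap(gray)  % sqrt just brightens dim regions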
The same holds for flame fractals: the Buddhabrot is about finding the "attractors",
so even if you start with a random point, it is assumed to converge quite quickly to these attracting curves. You typically avoid painting the first 10 points of the iteration or so anyway, so the starting point is not really relevant. BUT, to avoid doing the same computation twice, random sampling is much better. As mentioned, it eliminates the risk of artefacts.
But the most important feature of random sampling is that it has all levels of precision (in theory, at least). This is VERY important for fractals: they have details on all levels of precision, and hence require input from all levels as well.
While I am not 100% sure of the exact reason, I would assume it has more to do with efficiency. If you are going to iterate through every single point multiple times, it's going to waste a lot of processing cycles to get a picture which may not look a whole lot better. By doing random sampling you can reduce the amount of work that needs to be done - and given a large enough sample size, still get a result that is difficult to "differentiate" from iterating over all the pixels (from a visual point of view).
This is possibly some kind of Monte-Carlo method so yes, going over all pixels would produce the perfect result but would be horribly time consuming.
Why don't you just try it out and see what happens?
Random sampling is used to get as close as possible to the exact solution, which in cases like this cannot be calculated exactly due to the statistical nature of the problem.
You can 'go over all pixels', but since every pixel is in fact some square region with dimensions dx * dy, you would only use num_x_pixels * num_y_pixels points for your calculation and get very grainy results.
Another way would be to use a very large resolution and scale down the render after the calculation. This would give some kind of 'systematic' render where every pixel of the final render is divided into an equal number of subpixels.
I realize this is an old post, but wanted to add my thoughts based on a current project.
The problem with tying your samples to pixels, like others said:
Couples your sample grid to your view plane, making it difficult to do projections, zooms, etc
Not enough fidelity: random sampling is more efficient, as everyone else said, so you need even more samples if you want to sample using a uniform grid
You're much more likely to see grid artifacts at equivalent sample counts, whereas random sampling tends to just look grainy at low counts
However, I'm working on a GPU-accelerated version of buddhabrot, and ran into a couple issues specific to GPU code with random sampling:
I've had issues with overhead/quality of PRNGs in GPU code where I need to generate thousands of samples in parallel
Random sampling produces highly scattered traces, and the GPU really, really wants threads in a given block/warp to move together as much as possible. The performance difference for clustered vs scattered traces was extreme in my testing
Hardware support in modern GPUs for efficient atomicAdd means writes to global memory don't bottleneck GPU buddhabrot nearly as much now
Grid sampling makes it very easy to do a two-pass render, skipping blocks of sample points based on a low-res pass to find points that don't escape or don't ever touch a pixel in the view plane (a rough sketch of this follows below)
Long story short: while the GPU technically has to do more work this way for equivalent quality, it's actually faster in practice as far as I can tell, and GPUs are so fast that re-rendering is often a matter of seconds or minutes instead of hours (or even milliseconds at lower resolution/quality levels - realtime Buddhabrot is very cool)
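A rough illustration of the low-res two-pass idea mentioned above, written as plain Matlab rather than GPU code; the grid size, escape radius of 2 and iteration cap are arbitrary assumptions, and only the "doesn't escape" criterion is checked:
% Pass 1: on a coarse grid of starting points c, mark cells whose orbit escapes.
coarse = 64; maxIter = 200;
[cre, cim] = meshgrid(linspace(-2, 1, coarse), linspace(-1.2, 1.2, coarse));
c = cre + 1i*cim;
z = zeros(size(c));
escaped = false(size(c));
for k = 1:maxIter
    z = z.^2 + c;
    escaped = escaped | (abs(z) > 2);
    z(escaped) = 0;                      % keep escaped orbits from overflowing
end
% Pass 2 would sample densely only inside the escaped cells, skipping the rest.
interestingCells = find(escaped);
fprintf('%d of %d coarse cells worth sampling\n', numel(interestingCells), coarse^2);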