Averaging periodic signal - matlab

I have a signal that repeats periodically, like the one attached in the figure (the same pattern repeats 4 times). I would like to create a template of this signal by averaging the 4 repetitions. What is the best approach for my problem? I know the answer might be obvious to experts in signal processing; I have tried searching for signal folding techniques but couldn't find anything useful. I am prototyping it in Matlab.

Assuming your signal length is divisible by 4 and each repetition is 1/4 of it, simply use:
mean(reshape(signal,[],4),2)
reshape puts each repetition into its own column, and the mean is then taken across the columns.
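For anyone prototyping outside Matlab, the same idea as a NumPy sketch (note that NumPy reshapes row-major, so each repetition lands in a row rather than a column):

```python
import numpy as np

def average_periods(signal, n_reps):
    """Average a signal over n_reps equal-length repetitions.

    Equivalent to MATLAB's mean(reshape(signal, [], n_reps), 2):
    each repetition becomes one row here, then we average across
    repetitions. Assumes len(signal) is divisible by n_reps.
    """
    signal = np.asarray(signal, dtype=float)
    assert signal.size % n_reps == 0, "length must divide evenly by n_reps"
    return signal.reshape(n_reps, -1).mean(axis=0)

# Noiseless sanity check: four identical repetitions average to one copy.
template = average_periods([1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3], 4)
```

If the period is not exactly a quarter of the signal length, trim or resample first so it divides evenly.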

Related

Automatically truncating a curve to discard outliers in matlab

I am generating some data whose plots are as shown below.
In all the plots I get some outliers at the beginning and at the end. Currently I am truncating the first and the last 10 values. Is there a better way to handle this?
I am basically trying to automatically identify the two points shown below.
This is a fairly general problem with lots of approaches; usually you will use some a priori knowledge of the underlying system to make it tractable.
So for instance if you expect to see the pattern above - a fast drop, a linear section (up or down) and a fast rise - you could try taking the derivative of the curve and looking for large values and/or sign reversals. Perhaps it would help to bin the data first.
If your pattern is not so easy to define but you are expecting a linear trend you might fit the data to an appropriate class of curve using fit and then detect outliers as those whose error from the fit exceeds a given threshold.
In either case you still have to choose thresholds - mean, variance and higher order moments can help here but you would probably have to analyse existing data (your training set) to determine the values empirically.
And perhaps, after all that, as Shai points out, you may find that lopping off the first and last ten points gives the best results for the time you spent (cf. Pareto principle).
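The derivative idea above can be sketched minimally like this (Python/NumPy for illustration; `deriv_thresh` is a hypothetical tuning knob you would calibrate on your own data, e.g. from its variance):

```python
import numpy as np

def truncation_points(y, deriv_thresh):
    """Find the inclusive index range where |dy| stays below a
    threshold, i.e. where the fast initial drop has ended and the
    fast final rise has not yet begun.
    """
    dy = np.abs(np.diff(np.asarray(y, dtype=float)))
    quiet = np.flatnonzero(dy < deriv_thresh)   # indices of small steps
    if quiet.size == 0:
        return 0, len(y) - 1                    # nothing to trim
    return int(quiet[0]), int(quiet[-1]) + 1

# Fast drop, slowly varying middle, fast rise:
y = [10, 5, 1, 1.1, 1.0, 0.9, 1.0, 5, 10]
start, end = truncation_points(y, deriv_thresh=1.0)
# keep y[start:end + 1], discard the steep ends
```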

how to speed up Matlab nested for loops when I cannot vectorize the calculations?

I have three big 3D arrays of the same size [41*141*12403], named in the Matlab code below alpha, beta and ni. From them I need to calculate another 3D array of the same size, obtained elementwise from the original matrices through a calculation that combines an infinite sum and a definite integral, using the value of each element. It therefore seems inevitable to use several nested loops for this calculation. The code has already been running for several hours(!) and is still in the first iteration of the outer loop (which needs to be performed 41 times; by my estimate the program would have to run for more than two years!). I don't know how to optimize the code. Please help!
The code I use:
z_len = size(KELDYSH_PARAM_r_z_t, 1); % 41 rows
r_len = size(KELDYSH_PARAM_r_z_t, 2); % 141 columns
t_len = size(KELDYSH_PARAM_r_z_t, 3); % 12403 slices
sumRes = zeros(z_len, r_len, t_len);
for z_ind = 1:z_len
    z_ind % in order to track the advancement of the calculation
    for r_ind = 1:r_len
        for t_ind = 1:t_len
            sumCurrent = 0;
            sumPrevious = inf;
            s = 0;
            while abs(sumPrevious - sumCurrent) > 1e-6
                kapa = kapa_0 + s; % some scalar
                x_of_w = (beta(z_ind,r_ind,t_ind) .* ...
                    (kapa - ni(z_ind,r_ind,t_ind))).^0.5;
                sumPrevious = sumCurrent;
                sumCurrent = sumCurrent + exp(-alpha(z_ind,r_ind,t_ind) .* ...
                    (kapa - ni(z_ind,r_ind,t_ind))) .* (x_of_w.^(2*abs(m)+1)/2) .* ...
                    w_m_integral(x_of_w, m);
                s = s + 1;
            end
            sumRes(z_ind,r_ind,t_ind) = sumCurrent;
        end
    end
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function res = w_m_integral(x_of_w, m)
    res = quad(@integrandFun, 0, 1, 1e-6);
    function y = integrandFun(t)
        y = exp(-x_of_w^2*t) .* t.^abs(m) ./ (1-t).^0.5;
    end
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Option 1 - more vectorising
It's a pretty complex model you're working with and not all the terms are explained, but some parts can still be further vectorised. Your alpha, beta and ni matrices are presumably static and precomputed? Your s value is a scalar and kapa could be one too, so you can probably precompute the x_of_w matrix in one go as well. This would give you a slight speedup on its own, though you'd be spending memory to get it - 71 million points is doable these days but calls for an awful lot of hardware. Doing it once for each of your 41 rows would reduce the burden neatly.
That leaves the integral itself. The quad function doesn't accept vector inputs - it would be a nightmare wouldn't it? - and neither does integral, which Mathworks are recommending you use instead. But if your integration limits are the same in each case then why not do the integral the old-fashioned way? Compute a matrix for the value of the integrand at 1, compute another matrix for the value of the integrand at 0 and then take the difference.
Then you can write a single loop that computes the integral for the whole input space then tests the convergence for all the matrix elements. Make a mask that notes the ones that have not converged and recalculate those with the increased s. Repeat until all have converged (or you hit a threshold for iterations).
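That single-loop, converge-by-mask structure can be sketched like this (Python/NumPy for brevity; the `term` callback is a stand-in for the exponential-times-integral term, which is an assumption here, not the questioner's actual physics):

```python
import numpy as np

def masked_series_sum(term, shape, tol=1e-6, max_iter=1000):
    """Accumulate an elementwise infinite series over a whole array.

    term(s, mask) returns the s-th series term for the elements
    selected by the boolean mask. Elements whose last term fell
    below tol are dropped from the active mask, so converged
    points stop costing any work.
    """
    total = np.zeros(shape)
    active = np.ones(shape, dtype=bool)
    for s in range(max_iter):
        t = term(s, active)
        total[active] += t
        # keep iterating only where the last term was still large
        still = np.abs(t) > tol
        idx = np.flatnonzero(active)
        active.flat[idx[~still]] = False
        if not active.any():
            break
    return total

# Toy example: the geometric series 1/2^s converges to 2 everywhere.
res = masked_series_sum(lambda s, m: np.full(m.sum(), 0.5 ** s),
                        shape=(2, 3))
```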
Option 2 - parallelise it
It used to be the case that Matlab was much faster with vectorised operations than with loops. I can't find a source for it now, but I believe recent versions have become a lot faster with for loops too, so depending on the resources you have available you might get better results by parallelising the code you currently have. That will need a bit of refactoring as well - the big problems are the overhead of copying data to the workers (which you can fix by chopping the inputs into chunks and feeding each worker only the relevant one) and the restrictions parfor places on certain variables, usually ones which span the whole space. Again, chopping them up helps.
But if you have a 2 year runtime you will need a factor of at least 100 I'm guessing, so that means a cluster! If you're at a university or somewhere where you might be able to get a few days on a 500-core cluster then go for that...
If you can write the integral in a closed form then it might be amenable to GPU computation. Those things can do certain classes of computation very fast but you have to be able to parallelise the job and reduce the actual computation to something basic comprised mainly of addition and multiplication. The CUDA libraries have done a lot of the legwork and matlab has an interface to them so have a read about those.
Option 3 - reduce the scope
Finally, if neither of the above results in sufficient speedups, you may have to reduce the scope of your calculation. Trim the input space as much as you can and perhaps accept a looser convergence threshold: if you know how many iterations you typically need inside the innermost while loop (the one with the s counter in it), relaxing the convergence criterion will reduce that iteration count and could speed things up considerably. The profiler can help you see where the time is being spent.
The bottom line though is that 71 million points are going to take some time to compute. You can optimise the computation only so far, the odds are that for a problem of this size you will have to throw hardware at it.

Noisy signal correlation

I have two (or more) time series that I would like to correlate with one another to look for common changes e.g. both rising or both falling etc.
The problem is that the time series are all fairly noisy, with relatively high standard deviations, so it is difficult to see common features. The signals are sampled at a fairly low frequency (one point every 30 s) but cover reasonable time periods, 2 hours or more. It is often the case that the two signals are not the same length, for example one of 1 hour and one of 1.5 hours.
Can anyone suggest some good correlation techniques, ideally using built-in or bespoke Matlab routines? I've tried autocorrelation just to compare lags within a single signal, but all I got back was a triangular shape with the maximum at 0 lag (I assume this means there is no obvious correlation except with itself?). Cross-correlation isn't much better.
Any thoughts would be greatly appreciated.
Start with a cross-covariance (xcov) instead of the cross-correlation. xcov removes the DC component (subtracts off the mean) of each data set and then does the cross-correlation. When you cross-correlate two square waves, you get a triangle wave. If you have small signals riding on a large offset, you get a triangle wave with small variations in it.
If you think there is a delay between the two signals, then I would use xcorr to calculate the delay. Since xcorr is doing an FFT of the signal, you should remove the means before calling xcorr, you may also want to consider adding a window (e.g. hanning) to reduce leakage if the data is not self-windowing.
If there is no delay between the signals or you have found and removed the delay, you could just average the two (or more) signals. The random noise should tend to average to zero and the common features will approach the true value.
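The xcov-then-find-the-delay recipe above is just mean removal followed by cross-correlation; a minimal NumPy stand-in (not a full replacement for Matlab's xcov/xcorr):

```python
import numpy as np

def xcov_delay(x, y):
    """Remove each signal's mean (what xcov does) and cross-correlate.

    Returns (lags, covariance); the lag of the covariance peak
    estimates the delay of y relative to x (positive lag: y lags x).
    """
    x = np.asarray(x, dtype=float) - np.mean(x)
    y = np.asarray(y, dtype=float) - np.mean(y)
    c = np.correlate(y, x, mode="full")        # covariance at every lag
    lags = np.arange(-(len(x) - 1), len(y))
    return lags, c

# y is the same bump as x, delayed by two samples:
x = [0, 0, 1, 2, 1, 0, 0, 0]
y = [0, 0, 0, 0, 1, 2, 1, 0]
lags, c = xcov_delay(x, y)
delay = int(lags[np.argmax(c)])
```

Without the mean removal, the constant offsets dominate and you get exactly the big triangle described above.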

audio pattern matching in matlab

Can someone please give me an idea about this problem in Matlab?
I have 4 .wav files that contain the chirping of birds, each representing a different bird. Given an input .wav file, I need to decide which bird it is. I know I have to compare frequency spectra to get to the solution, but I don't quite know how I should use a spectrogram to help me get there.
P.S. I know what spectrogram does and have plotted quite a few .wav files with it.
There are several methods for pattern recognition problems like the one you are describing.
You can use a frequency analysis like FFT with the matlab function
S = SPECTROGRAM(X,WINDOW,NOVERLAP)
In SPECTROGRAM you define the time window of the signal to be analysed in the variable WINDOW. You can use a rectangular window (e.g. WINDOW = [1 1 1 1 1 1 1 ... 1]) with the number of values equal to the desired length, but there are many other windows to choose from: hanning, hamming, blackman. You should use the one that suits your problem best. NOVERLAP is the number of samples by which consecutive windows overlap, so each window advances by length(WINDOW) - NOVERLAP samples per step.
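To make WINDOW and NOVERLAP concrete, here is a hand-rolled spectrogram sketch (Python/NumPy for illustration; a simplification of Matlab's spectrogram, not a drop-in replacement):

```python
import numpy as np

def spectrogram(x, window, noverlap):
    """Minimal spectrogram: slide a window over x, FFT each frame.

    window is the window vector (e.g. np.hanning(256), or np.ones(256)
    for a rectangular window); noverlap is the number of samples
    consecutive frames share, as in MATLAB's spectrogram().
    """
    x = np.asarray(x, dtype=float)
    nwin = len(window)
    hop = nwin - noverlap                 # how far each frame advances
    frames = []
    for start in range(0, len(x) - nwin + 1, hop):
        frame = x[start:start + nwin] * window
        frames.append(np.abs(np.fft.rfft(frame)))
    return np.array(frames).T             # freq bins x time frames

# A pure 50 Hz tone sampled at 1 kHz: energy lands in one frequency bin
# (bin spacing is fs/len(window) = 5 Hz, so 50 Hz is bin 10).
fs = 1000
t = np.arange(fs) / fs
S = spectrogram(np.sin(2 * np.pi * 50 * t), np.hanning(200), 100)
peak_bin = int(np.argmax(S[:, 0]))
```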
Besides this approach, wavelet transform is also a good technique to solve your problem. Matlab also have a good toolbox to apply discrete and continuous wavelets.
You might try to solve the problem with Deep Belief Networks
Here are some articles that might be helpful:
Audio Feature Extraction with Restricted Boltzmann Machines
Unsupervised feature learning for audio classification
To summarize the idea: instead of manually tuning the features, employ RBMs or an autoencoder to extract features (bases) which represent the observed audio samples, and then run a learning algorithm on them.
You will need more than 4 audio samples to train a DBN, but it is worth trying, as the approach has shown promising results in the past.
This tutorial might also be helpful.
This may prove to be a complicated problem. As a starting point, I advise you to divide each recording into fixed-length frames, e.g. 20 ms with 10 ms overlap, then take the FFT of each frame and extract a few maximum-energy frequency values per frame. As a last step, compare the frame frequencies with each other and determine the result by selecting the maximum correlation.
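A hedged sketch of that recipe (Python/NumPy for illustration; the names `peak_freq_track` and `similarity` are made up here, and a real classifier would compare against all four reference birds):

```python
import numpy as np

def peak_freq_track(x, fs, frame_ms=20, hop_ms=10):
    """Per-frame dominant frequency in Hz: 20 ms frames with 10 ms
    overlap, one FFT peak per frame -- the feature suggested above.
    """
    x = np.asarray(x, dtype=float)
    nwin = int(fs * frame_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    freqs = np.fft.rfftfreq(nwin, d=1 / fs)
    track = []
    for start in range(0, len(x) - nwin + 1, hop):
        spectrum = np.abs(np.fft.rfft(x[start:start + nwin]))
        track.append(freqs[np.argmax(spectrum)])
    return np.array(track)

def similarity(track_a, track_b):
    """Normalised correlation of two frequency tracks over their
    common length; the reference with the highest score wins.
    """
    n = min(len(track_a), len(track_b))
    a = track_a[:n] - track_a[:n].mean()
    b = track_b[:n] - track_b[:n].mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom else 0.0

# A rising chirp and a noisy copy of it should match closely:
fs = 8000
t = np.arange(fs) / fs
chirp_a = np.sin(2 * np.pi * (500 + 300 * t) * t)
noisy_a = chirp_a + 0.1 * np.random.RandomState(0).randn(fs)
score = similarity(peak_freq_track(chirp_a, fs), peak_freq_track(noisy_a, fs))
```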

Distance to nearest palindrome

I'd like an algorithm that provides some measure of how symmetrical a string is. Looking through previous questions, I found one on finding the number of letters that need to be added to a string to turn it into a palindrome. This is close to what I'm looking for, but too restrictive in the set of allowable editing operations.
My motivation is that I'd like to make an improved version of a video I put on Youtube called "Numbers are Colorful". The video shows Golden Ratio bases and a couple of other related systems using irrational bases. Surprisingly, one system is completely symmetrical to begin with, but the others exhibit partial symmetry which I would like to highlight.
Are you looking for repetition or symmetry? So far I have seen no example that points to symmetry only repetition. 1001010.0010101 is not symmetrical. They are related by a circular shift, i.e. take the first set of digits [1001010], shift it to the left by 1 [0010101] and now you have the right side.
Unless you make it clear what you are trying to identify, this question is too poorly defined to give a sensible answer. If you really mean symmetrical, show me an example of symmetry. You might as well mean "I can see some interesting pattern here" which is so poorly defined it's difficult to quantify.
That said, digital signal processing is the sort of area you might look into for identifying interesting patterns. For example, if you are looking for repetition then I suggest you attempt to use an algorithm designed for detecting repeating patterns.
Consider the digits in your number to be an input signal. Perform frequency analysis on this signal to detect repeating sections of numbers. If you have a strong repeating component in your series of digits this should relate to a strong frequency component in your analysis. You can measure the strength of this pattern from identifying the fundamental frequency by performing the Fourier transform, and summing all of the harmonics for the most significant frequency bin. Divide this by the total energy of the signal and this will give you a measure between 0 and 1 for how "repetitive" the signal is, and will also identify the periodicity of the signal. You may be better off using time-domain algorithms like Autocorrelation, AMDF, or the YIN estimator. (Particularly AMDF)
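For the repetition measure, the time-domain route can be sketched like this (Python/NumPy, treating the digits as a signal as suggested; the normalisation by zero-lag energy is one of several reasonable choices):

```python
import numpy as np

def repetition_score(digits):
    """Periodicity strength via autocorrelation: the height of the
    largest non-zero-lag autocorrelation peak relative to the
    zero-lag energy. Returns (score in [0, 1], estimated period).
    """
    x = np.asarray(digits, dtype=float)
    x = x - x.mean()                              # remove the DC offset
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]   # lags 0..N-1
    if ac[0] == 0:
        return 0.0, 0                             # constant input
    lag = 1 + int(np.argmax(ac[1:]))              # best non-zero lag
    return float(ac[lag] / ac[0]), lag

# A digit string repeating with period 3 scores high at lag 3:
score, period = repetition_score([1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0])
```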
A similar approach can be adopted if you want actual symmetry (i.e. the numbers are still very similar when you reverse them). Take your input number, create a new signal by reversing it, and then measure their "sameness" at each discrete phase. If your number is N digits long, consider padding it with 0's to length 2N before comparing the signal with its reversed self, to allow for matches lying outside the length of the number.
The time-domain techniques are more likely to work because they are not affected so much by discontinuities. They do literally compare "sameness" of a signal by either computing the difference of all the points at each phase or multiplying the numbers together at each phase. In the subtraction case you hope to get to 0 when they are similar. In the multiplication case you hope to get a peak in the function when the numbers are back in phase. They are however more prone to noise (which in this context means the numbers which aren't quite right).
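The reversed-copy comparison can be sketched as follows (Python/NumPy; this uses the multiplicative flavour, with the 'full' correlation playing the role of the zero-padding suggested above):

```python
import numpy as np

def symmetry_score(digits):
    """Slide a reversed copy of the digit string across the original
    and return the best normalised match over all phases: 1.0 for a
    perfect palindrome, lower for less symmetrical strings.
    """
    x = np.asarray(digits, dtype=float)
    x = x - x.mean()                      # remove the DC offset first
    energy = (x * x).sum()
    if energy == 0:
        return 1.0                        # constant string: trivially symmetric
    c = np.correlate(x, x[::-1], mode="full")   # all phases of the overlap
    return float(c.max() / energy)

palindrome = symmetry_score([1, 2, 3, 2, 1])   # perfectly symmetric
lopsided = symmetry_score([1, 2, 3, 4, 5])     # monotone, not symmetric
```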