How to identify timestamps (indices) of multiple threshold crossings in continuous data - matlab

From an audio stream vector in Matlab I am trying to identify the time of onset and finish of audible events that occur multiple times within the time series data.
I am very much a novice with Matlab. I have written code that identifies the peak and location of each event, but I also need the start of the event relative to a user-defined threshold, which is crossed several tens of milliseconds before the peak.
Here is the code I am using at the moment:
function [emg] = calcPeaks(EMG, thresh)
%Rectify and downsample data
emg = resample(abs(hilbert(EMG)),1000,10000);
%Low Pass Filter
[b,a]=butter(8,0.01,'low');
emg=filtfilt(b,a,emg);
%Plot the processed vector
plot (emg); hold on;
%Find maximum for each Peak and Location
[pks,locs] = findpeaks(emg(1:end-2000),'minpeakheight',thresh);
plot(locs, emg(locs), 'ko'); hold on;
%Find Crossings above threshold
[FindCross] = find(emg(1:end-2000) > thresh);
[Gaps] = find(diff(FindCross)> thresh);
plot(FindCross, emg(FindCross), 'ro');
plot(Gaps, emg(Gaps), 'bo');
I tried to post an image of the data but I don't have enough reputation :(

This should get you close to what you want (although using the same thresh for both is probably not what you intend):
[FindCross] = find(emg(1:end-2000) > thresh); %thresh is your minimum volume
[Gaps] = find(diff(FindCross)> thresh2); % thresh2 is related to the timing of events
However, note that this only finds gaps between areas which are above your noise level threshold, so won't locate the first event (presuming at start of data you are below the threshold).
A simple way to do this sort of thing is to threshold and then use diff to look for rising and falling edges in the thresholded data.
emg2 = emg > thresh; %emg2 = 1 and 0 for event / non event
demg = diff(emg2); % contains 0, -1, 1
rise = find(demg>0)+1; %+1 because of how diff works
fall = find(demg<0);
rise should then contain the positions where emg goes from below threshold to above threshold. If the data is sufficiently noisy, this could contain false positives, so you may want to then filter those results with additional criteria - e.g. check that after the rise the data stays above threshold for some minimum period.
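The minimum-duration check mentioned above could look something like the sketch below. The signal values and the minDur parameter are made up for illustration, and the zero padding around the thresholded vector is an addition so that events touching either end of the data are also caught.

```matlab
% Synthetic stand-in for the processed signal (values are illustrative only)
emg = [0 0 5 0 0 0 6 7 8 7 6 0 0 0 7 8 9 0 0];
thresh = 4;
minDur = 3;                     % hypothetical minimum event length, in samples

above = emg(:).' > thresh;      % force a row vector; 1 inside an event, 0 outside
d = diff([0 above 0]);          % pad with zeros so edge events produce a rise/fall
rise = find(d > 0);             % first sample above threshold of each event
fall = find(d < 0) - 1;         % last sample above threshold of each event

% Discard events that stay above threshold for fewer than minDur samples
keep = (fall - rise + 1) >= minDur;
rise = rise(keep);
fall = fall(keep);
```

With this data the single-sample spike at index 3 is rejected, leaving events at 7 to 11 and 15 to 17. Because of the padding, no +1 offset is needed on rise here.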
The problem with doing it by the method you're using to find gaps is the following. Presume your data looks like this, where 0 is below threshold and 1 above threshold: 000111000111000. That is, our first event starts at index 4 and finishes at index 6, and the second starts at index 10 and ends at index 12.
emgT = find(emg > thresh);
This finds all the places where our data = 1, so emgT = [4,5,6,10,11,12]
emgD = diff(emgT);
This takes the difference between emgT(n+1), and emgT(n) - since there's no n+1 for the final datapoint, the output is one smaller than emgT. Our output is [1 1 4 1 1] - that is, it will find the gap between the two events, but not the gap between the start of the file and the first event, or the gap between the last event and the end of the file.
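If you prefer to stay with the find/diff-on-indices approach, one way to patch it is to add the first and last above-threshold index by hand. This is only a sketch on the example sequence above, and it assumes at least one sample is above threshold (otherwise emgT is empty and the indexing fails).

```matlab
x = [0 0 0 1 1 1 0 0 0 1 1 1 0 0 0];   % the example above, 1 = above threshold
emgT = find(x);                         % [4 5 6 10 11 12]
emgD = diff(emgT);                      % [1 1 4 1 1]
gapPos = find(emgD > 1);                % 3: a jump follows emgT(3)

% The first event starts at emgT(1); every other start follows a gap
starts = emgT([1, gapPos + 1]);
% Every gap is preceded by an event end; the last event ends at emgT(end)
stops  = emgT([gapPos, numel(emgT)]);
```

Here starts is [4 10] and stops is [6 12], matching the event boundaries described in the example.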

Related

Animated plot of infectious disease spread with for loop (Matlab)

I'm a beginner in Matlab and I'm trying to model the spread of an infectious disease using Matlab. However, I encounter some problems.
At first, I define the matrices that need to be filled and their initial status:
diseasematrix=zeros(20,20);
inirow=10;
inicol=10;
diseasematrix(inirow,inicol)=1; % The first place where a sick person is
infectionmatrix=zeros(20,20); % Infected people, initially all 0
healthymatrix=round(rand(20,20)*100); % Initial healthy population (randomly)
Rate=0.0001; % Rate of spread
Now, I want to make a plot where the spread of the disease is shown, using a for loop. But I'm stuck here...
for t=1:365
Zneighboursum=zeros(size(diseasematrix));
out_ZT = calc_ZT(Zneighboursum, diseasematrix);
infectionmatrix(t) = round((Rate).*(out_ZT));
diseasematrix(t) = diseasematrix(t-1) + infectionmatrix(t-1);
healthymatrix(t) = healthymatrix(t-1) - infectionmatrix(t-1);
imagesc(diseasematrix(t));
title(sprintf('Day %i',t));
drawnow;
end
This basically says that the infectionmatrix is calculated using the formula in the loop, and the diseasematrix is calculated by adding the sick people of the previous time step to the infected people of the previous time step. The healthy people that remain are calculated by subtracting the infected people from the healthy people of the previous time step. The variable out_ZT is the output of a function I made:
function [ZT] = calc_ZT(Zneighboursum, diseasematrix)
Zneighboursum = Zneighboursum + circshift(diseasematrix,[1 0]);
Zneighboursum = Zneighboursum + circshift(diseasematrix,[0 1]);
ZT=Zneighboursum;
end
This is to quantify the number of sick people around a central cell.
However, the result is not what I want. The plot does not evolve dynamically and the values don't seem to be right. Can anyone help me?
Thanks in advance!
There are several problems with the code:
1) (Rate).*(out_ZT) is not actually an error: .* accepts a scalar and a matrix, so it gives the same result as Rate*out_ZT. A single * is simply the more common way to write a scalar-matrix product.
2) infectionmatrix, diseasematrix and healthymatrix are all 2-dimensional matrices, so indexing them with (t) addresses a single element rather than a time step. To keep every time step in memory you would need 3-dimensional arrays. But since you don't use the stored history later, you can just overwrite the old values.
3) You pass the infection values through round(). With such a small Rate the product is always below 0.5, so the result is always rounded to zero.
4) The value for Rate was too low to see any result, so I increased it to 0.01.
5) (Just a cautionary point) healthymatrix is updated but never used in the infection calculation.
The code for the function is fine, so after debugging according to what I perceived, here's the code:
diseasematrix=zeros(20,20);
inirow=10;
inicol=10;
diseasematrix(inirow,inicol)=1; % The first place where a sick person is
infectionmatrix=zeros(20,20); % Infected people, initially all 0
healthymatrix=round(rand(20,20)*100); % Initial healthy population (randomly)
Rate=0.01;
for t=1:365
Zneighboursum=zeros(size(diseasematrix));
out_ZT = calc_ZT(Zneighboursum, diseasematrix);
infectionmatrix = (Rate*out_ZT);
diseasematrix = diseasematrix + infectionmatrix;
healthymatrix = healthymatrix - infectionmatrix;
imagesc(diseasematrix);
title(sprintf('Day %i',t));
drawnow;
end
There are several problems:
1) If you want to keep every time step, you need a 3D array,
so you have to replace myvariable(t) by myvariable(:,:,t);
2) Why did you use round? If you round a value < 0.5 the result will be 0, so nothing will change in your loop.
3) You need to define the initial condition (t=1) and then start your loop with t = 2.
diseasematrix=zeros(20,20);
inirow=10;
inicol=10;
diseasematrix(inirow,inicol)=1; % The first place where a sick person is
infectionmatrix =zeros(20,20); % Infected people, initially all 0
healthymatrix=round(rand(20,20)*100); % Initial healthy population (randomly)
Rate=0.01; % Rate of spread
for t=2:365
Zneighboursum=zeros(size(diseasematrix,1),size(diseasematrix,2));
out_ZT = calc_ZT(Zneighboursum, diseasematrix(:,:,t-1));
infectionmatrix(:,:,t) = (Rate).*(out_ZT);
diseasematrix(:,:,t) = diseasematrix(:,:,t-1) + infectionmatrix(:,:,t-1);
healthymatrix(:,:,t) = healthymatrix(:,:,t-1) - infectionmatrix(:,:,t-1);
imagesc(diseasematrix(:,:,t));
title(sprintf('Day %i',t));
drawnow;
end
IMPORTANT: circshift wraps the matrix around its edges, so cells on one border are treated as neighbours of the opposite border. Keep this boundary effect in mind.
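To see the wrap-around concretely, here is a small sketch on a 5x5 grid. It sums all four direct neighbours, which is an assumption about the intended neighbourhood (the calc_ZT in the question only shifts in two directions), and it uses conv2 as one possible non-wrapping alternative.

```matlab
D = zeros(5,5);
D(1,3) = 1;                          % one sick cell on the top edge

% Four-neighbour sum with circshift: the grid wraps around its edges
Nwrap = circshift(D,[1 0]) + circshift(D,[-1 0]) ...
      + circshift(D,[0 1]) + circshift(D,[0 -1]);

% Same neighbourhood with conv2: cells outside the grid count as zero
kernel = [0 1 0; 1 0 1; 0 1 0];
Nflat = conv2(D, kernel, 'same');
```

Nwrap(5,3) is 1 because the bottom row sees the top row as its neighbour, while Nflat(5,3) stays 0. Which behaviour is appropriate depends on whether the modelled region is meant to be periodic.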

Remove noise from a rectangular wave matlab

I have some recordings (from 16:00 to 16:00 the next day) where ones indicate some kind of noise and zeros indicate quiet moments. The following code tries to replicate these recordings.
dt = datenum('00:02:00','HH:MM:ss') - datenum('00:01:00','HH:MM:ss');
time_begin = datenum('00:00:00','HH:MM:ss');
time_end = datenum('24:00:00','HH:MM:ss');
time = repmat(cellstr(datestr(time_begin:dt:time_end,'HH:MM:ss')),2,1);
loudness = ones(1,numel(time));
quiet_start = [1489 1737];
quiet_end = [1603 1906];
for i = 1: numel(quiet_start)
loudness(quiet_start(i):quiet_end(i))=0;
end
time = time(961:2400);
loudness = loudness(961:2400);
figure
plot(loudness)
ylim([0 3])
I know that in the interval from 16:00 to 16:00 the next day there should be only 1 "bout" of zeros. Here (if you plot loudness) you can see that there are 2 bouts of zeros.
I have 2 possibilities:
remove one of the two bouts of zeros
remove the bout of ones in the middle
Is there any measure that I can use to make this decision? I.e., do I make a bigger error converting ones to zeros or vice versa?
There are 2 (or more) bouts of zeros because of some errors in the recordings, but for sure there should be only one. I would like to remove the bouts in a way that modifies the data as little as possible. For instance, in this case I would remove the first bout of zeros since it is the smallest, but what should I do if there are more than 2 bouts? Is there any algorithm that deals with this kind of problem?
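One possible heuristic, sketched here on made-up data, is to locate every bout of zeros, keep only the longest one, and count how many samples had to be flipped. Using the number of flipped samples as the error measure is an assumption, not an established criterion for this kind of recording.

```matlab
% Synthetic stand-in for the loudness vector (values are illustrative only)
loudness = ones(1,40);
loudness(5:9)   = 0;      % short bout, 5 samples
loudness(20:34) = 0;      % long bout, 15 samples

% Locate every bout of zeros via rising/falling edges of the quiet mask
isQuiet   = (loudness == 0);
d         = diff([0 isQuiet 0]);
boutStart = find(d == 1);
boutEnd   = find(d == -1) - 1;
boutLen   = boutEnd - boutStart + 1;

% Keep only the longest bout and set everything else to one
[~, longest] = max(boutLen);
cleaned = ones(size(loudness));
cleaned(boutStart(longest):boutEnd(longest)) = 0;

% Number of samples that were changed by this choice
changed = sum(cleaned ~= loudness);
```

Here changed is 5, whereas filling the stretch of ones between the two bouts would flip 10 samples (indices 10 to 19), so under this measure removing the short bout alters the data less. The same code works unchanged for more than two bouts.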

How to process multiple mics input Audio stream in matlab in real time

I need to get 3 input audio streams into Matlab simultaneously using 3 USB mics and find the highest amplitude in real time.
This is the initialization of the mics:
mic1= dsp.AudioRecorder('DeviceName','الميكروفون (10- USB PnP Sound Device)', 'SampleRate', 48000, 'NumChannels', 1);
mic2= dsp.AudioRecorder('DeviceName','الميكروفون (9- USB PnP Sound Device)', 'SampleRate', 48000, 'NumChannels', 1);
mic3= dsp.AudioRecorder('DeviceName','الميكروفون (8- USB PnP Sound Device)', 'SampleRate', 48000, 'NumChannels', 1);
frame1=step(mic1);
frame2=step(mic2);
frame3=step(mic3);
What are the next steps?
By examining your code, you have three audio signals with one channel each. If I understand correctly, you want to find the loudest sound made by any of the signals over time. However, you can't do this truly in real time, as step for the AudioRecorder can only capture one frame at a time. Because you're capturing from all three audio devices, and the only way to capture audio is with step, you'll have to call step serially for each signal that you have.
Therefore, you'll need to capture all three audio signals separately, and then do your analysis. You'll also want to clip the sound after a certain point, so perhaps something like 5 seconds. Therefore, you'd do something like:
time_end = 5;
%// Capture audio signal 1
tic;
frame1 = [];
while toc < time_end
audio_in = step(mic1);
frame1 = [frame1; audio_in(:)];
end
%// Capture audio signal 2
tic;
frame2 = [];
while toc < time_end
audio_in = step(mic2);
frame2 = [frame2; audio_in(:)];
end
%// Capture audio signal 3
tic;
frame3 = [];
while toc < time_end
audio_in = step(mic3);
frame3 = [frame3; audio_in(:)];
end
After this point, because the sounds will probably all be uneven length, you'll want to zero pad all of them so they all match the same length. After this, it's a matter of first finding the maximum amplitude for each sample for all three signals, and then finding the maximum out of all of this.
I'm not quite sure how the signals are shaped... if they are row or column vectors, so let's just make sure they're all column vectors. Then, use max and operate along the columns and find the maximum for each point in time, then find the maximum out of all of these.
Therefore:
%// Find lengths for all three signals
l1 = numel(frame1);
l2 = numel(frame2);
l3 = numel(frame3);
max_length = max([l1, l2, l3]);
%// Zero pad signals to make same length
frame1_pad = zeros(max_length,1);
frame2_pad = zeros(max_length,1);
frame3_pad = zeros(max_length,1);
frame1_pad(1:l1) = frame1;
frame2_pad(1:l2) = frame2;
frame3_pad(1:l3) = frame3;
%// Find maximum among each sample
max_signal = max([frame1_pad, frame2_pad, frame3_pad], [], 2); %// dim 2: compare the three signals at each sample
%// Find the maximum amplitude overall and location
[max_amplitude, loc] = max(max_signal);
max_amplitude will contain the highest value found across all three signals, and loc will tell you the position in the array where it was found. If you want the actual time it occurred, multiply loc by your sampling period (1/48000). Bear in mind that loc is 1-indexed rather than 0-indexed, so you need to subtract 1 before multiplying by the sampling period.
Therefore:
time_it_happened = (loc-1)*(1/48000);
time_it_happened will contain the time at which the highest amplitude occurred.
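If you also want to know which microphone produced the peak, the second output of max can be used. The sketch below uses short made-up frames instead of real recordings, and it takes abs of the samples on the assumption that "amplitude" means magnitude regardless of sign.

```matlab
fs = 48000;                          % sample rate from the question

% Made-up recordings of unequal length (column vectors)
frame1 = [0.1; -0.2; 0.3];
frame2 = [0.0;  0.9; 0.1; 0.2];
frame3 = [-0.5; 0.4];

% Zero pad into one matrix, one column per microphone
n = max([numel(frame1), numel(frame2), numel(frame3)]);
padded = zeros(n, 3);
padded(1:numel(frame1), 1) = frame1;
padded(1:numel(frame2), 2) = frame2;
padded(1:numel(frame3), 3) = frame3;

% Per-sample maximum across microphones, and which microphone it came from
[perSampleMax, micIdx] = max(abs(padded), [], 2);

% Overall peak, its sample index, its source and its time in seconds
[peakAmp, loc] = max(perSampleMax);
loudestMic = micIdx(loc);
peakTime   = (loc - 1) / fs;
```

For these values the peak of 0.9 comes from the second microphone at sample 2.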
Good luck!

A moving average with different functions and varying time-frames

I have a matrix time-series data for 8 variables with about 2500 points (~10 years of mon-fri) and would like to calculate the mean, variance, skewness and kurtosis on a 'moving average' basis.
Let's say frames = [100 252 504 756]. I would like to calculate the four functions above over each of the (time-)frames, on a daily basis, so the return for day 300 in the case of the 100-day frame would be [mean variance skewness kurtosis] from the period day 201 to day 300 (100 days in total), and so on.
I know this means I would get an array output, and that the first frame-length number of days would be NaNs, but I can't figure out the required indexing to get this done...
This is an interesting question because I think the optimal solution is different for the mean than it is for the other sample statistics.
I've provided a simulation example below that you can work through.
First, choose some arbitrary parameters and simulate some data:
%#Set some arbitrary parameters
T = 100; N = 5;
WindowLength = 10;
%#Simulate some data
X = randn(T, N);
For the mean, use filter to obtain a moving average:
MeanMA = filter(ones(1, WindowLength) / WindowLength, 1, X);
MeanMA(1:WindowLength-1, :) = nan;
I had originally thought to solve this problem using conv as follows:
MeanMA = nan(T, N);
for n = 1:N
MeanMA(WindowLength:T, n) = conv(X(:, n), ones(WindowLength, 1), 'valid');
end
MeanMA = (1/WindowLength) * MeanMA;
But as @PhilGoddard pointed out in the comments, the filter approach avoids the need for the loop.
Also note that I've chosen to make the dates in the output matrix correspond to the dates in X so in later work you can use the same subscripts for both. Thus, the first WindowLength-1 observations in MeanMA will be nan.
For the variance, I can't see how to use either filter or conv or even a running sum to make things more efficient, so instead I perform the calculation manually at each iteration:
VarianceMA = nan(T, N);
for t = WindowLength:T
VarianceMA(t, :) = var(X(t-WindowLength+1:t, :));
end
We could speed things up slightly by exploiting the fact that we have already calculated the mean moving average. Simply replace the within loop line in the above with:
VarianceMA(t, :) = (1/(WindowLength-1)) * sum((bsxfun(@minus, X(t-WindowLength+1:t, :), MeanMA(t, :))).^2);
However, I doubt this will make much difference.
If anyone else can see a clever way to use filter or conv to get the moving window variance I'd be very interested to see it.
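One candidate, offered as a sketch rather than a tested recommendation, is the identity var = (sum(x.^2) - sum(x)^2/W)/(W-1): both sums are running sums, so each can come from a single filter call. The variable names below are illustrative, and the formula can lose precision through cancellation when the mean is large relative to the spread.

```matlab
T = 100; N = 5; WindowLength = 10;
X = randn(T, N);

k  = ones(1, WindowLength);
S1 = filter(k, 1, X);          % running sum of x over the window
S2 = filter(k, 1, X.^2);       % running sum of x.^2 over the window

% Sample variance from the two running sums
VarianceFilt = (S2 - S1.^2 / WindowLength) / (WindowLength - 1);
VarianceFilt(1:WindowLength-1, :) = nan;   % incomplete windows

% Reference: the explicit loop from above, for comparison
VarianceLoop = nan(T, N);
for t = WindowLength:T
    VarianceLoop(t, :) = var(X(t-WindowLength+1:t, :));
end
diffs  = abs(VarianceFilt(WindowLength:T, :) - VarianceLoop(WindowLength:T, :));
maxErr = max(diffs(:));
```

On standard normal data like this, maxErr should be at the level of floating-point rounding.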
I leave the case of skewness and kurtosis to the OP, since they are essentially just the same as the variance example, but with the appropriate function.
A final point: if you were converting the above into a general function, you could pass in an anonymous function as one of the arguments, then you would have a moving average routine that works for arbitrary choice of transformations.
Final, final point: For a sequence of window lengths, simply loop over the entire code block for each window length.
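A rough sketch of those last two points combined might look like this. The frames and stats values are placeholders; skewness and kurtosis handles could be added to the cell array if the Statistics Toolbox is available. Each handle is assumed to operate column-wise and return one value per column.

```matlab
T = 60; N = 2;
X = randn(T, N);
frames = [5 10];                 % hypothetical window lengths
stats  = {@mean, @var};          % any column-wise function handle

% out(t, n, s, w): statistic s of series n over the window ending at day t,
% for window length frames(w); rows before a full window stay NaN
out = nan(T, N, numel(stats), numel(frames));
for w = 1:numel(frames)
    W = frames(w);
    for t = W:T
        block = X(t-W+1:t, :);
        for s = 1:numel(stats)
            out(t, :, s, w) = stats{s}(block);
        end
    end
end
```

For example, out(10, 1, 1, 2) holds the mean of the first series over days 1 to 10.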
I have managed to produce a solution, which only uses basic functions within MATLAB and can also be expanded to include other functions, (for finance: e.g. a moving Sharpe Ratio, or a moving Sortino Ratio). The code below shows this and contains hopefully sufficient commentary.
I am using a time series of Hedge Fund data, with ca. 10 years worth of daily returns (which were checked to be stationary - not shown in the code). Unfortunately I haven't got the corresponding dates in the example so the x-axis in the plots would be 'no. of days'.
% start by importing the data you need - here it is a selection out of an
% excel spreadsheet
returnsHF = xlsread('HFRXIndices_Final.xlsx','EquityHedgeMarketNeutral','D1:D2742');
% two years to be used for the moving average. (250 business days in one year)
window = 500;
% create zero-matrices to fill with the MA values at each point in time.
mean_avg = zeros(length(returnsHF)-window+1,1);
st_dev = zeros(length(returnsHF)-window+1,1);
skew = zeros(length(returnsHF)-window+1,1);
kurt = zeros(length(returnsHF)-window+1,1);
% Now work through the time-series with each of the functions (one can add
% any other functions required), assigning the values to the zero-matrices
for count = window:length(returnsHF)
% This is the most tricky part of the script, the indexing in this section
% The TwoYearReturn is what is shifted along one period at a time with the
% for-loop.
TwoYearReturn = returnsHF(count-window+1:count);
mean_avg(count-window+1) = mean(TwoYearReturn);
st_dev(count-window+1) = std(TwoYearReturn);
skew(count-window+1) = skewness(TwoYearReturn);
kurt(count-window +1) = kurtosis(TwoYearReturn);
end
% Plot the MAs
subplot(4,1,1), plot(mean_avg)
title('2yr mean')
subplot(4,1,2), plot(st_dev)
title('2yr stdv')
subplot(4,1,3), plot(skew)
title('2yr skewness')
subplot(4,1,4), plot(kurt)
title('2yr kurtosis')

Finding the highest peak above threshold only

if (pbcg(k+M) > pbcg(k-1+M) && pbcg(k+M) > pbcg(k+1+M) && pbcg(k+M) > threshold)
peaks_y(Counter) = pbcg(k+M);
peaks_x(Counter) = k + M;
py = peaks_y(Counter);
px = peaks_x(Counter);
plot(px,py,'ro');
Counter = (Counter + 1)-1;
fid = fopen('y1.txt','a');
fprintf(fid, '%d\t%f\n', px, py);
fclose(fid);
end
end
This code previously had no trouble finding the peak.
The main condition for finding the single peak is this:
if (pbcg(k+M) > pbcg(k-1+M) && pbcg(k+M) > pbcg(k+1+M) && pbcg(k+M) > threshold)
But right now it shows me every peak above the threshold instead of only the highest one.
UPDATE: What if the highest peak consists of 4 nodes that share the same value?
EDIT:
If multiple peaks with the same value appear, I will take the one in the middle and plot it.
For example, given [1,1,1,4,4,4,2,2,2]
I will take the '4' at the 5th position, so the marker sits at the center of the plateau in the graph you see.
It will be much faster and much more readable to use the built-in max function, and then test if the max value is larger than the threshold.
[C,I] = max(pbcg);
if C > threshold
...
%// I is the index of the maximal value, and C is the maximal value.
end
As an alternative solution, you could use the built-in function findpeaks, which offers several methods for detecting peaks within a given signal. Among those methods you may call
findPeaks = findpeaks(data,'threshold',threshold_resolution);
The only limit I see is that findpeaks is only available with the Signal Processing Toolbox.
EDIT
In case of multiple peaks over the defined threshold, I would just call max to figure the highest peak, as follows
max(peaks);
Assuming you have a vector with peaks pbcg
Here is how you can get the middle one:
highestPeakValue = max(pbcg)
f = find(pbcg == highestPeakValue);
middleHighestPeakLocation = f(ceil(length(f)/2))
Note that you can still make it more robust for cases where there are no peaks, and you can adjust the behavior when the number of tied peaks is even and there are two middle candidates (as written, ceil picks the first of the two).
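A slightly more defensive version, sketched under the assumption that an empty result is an acceptable way to signal "no peak above threshold", could be:

```matlab
pbcg = [1 1 1 4 4 4 2 2 2];      % example from the question
threshold = 3;                   % hypothetical threshold value

peakVal = max(pbcg);
if isempty(pbcg) || peakVal <= threshold
    midLoc = [];                 % nothing above the threshold
else
    f = find(pbcg == peakVal);          % all positions sharing the maximum
    midLoc = f(ceil(numel(f)/2));       % middle one (first of two if even)
end
```

For the example vector this gives midLoc = 5, the middle '4'. Note that find collects every position equal to the maximum, so if the same value occurs in two separate plateaus the "middle" is taken over both of them.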