How do I create a sliding window using a for loop? - matlab

I have neural data collected across 16 different channels. This data was recorded over a 30 second period.
Over a 10s period (from 20 - 30s), I want to record the number of neural data points that are greater than or equal to a specified threshold. I would like to do this according to bins of 0.001s.
I am using MATLAB 2019b.
My code so far looks like this:
t1 = 20;
t2 = 30;
ind1 = find(tim_trl>=t1, 1);
ind2 = find(tim_trl>=t2, 1);
time1 = tim_trl(ind1:ind2); %10s window
sampRate = 24414; %sampling freq (Hz), samples per sec
muaWindow = 0.001; %1ms window
binWidth = round(muaWindow*sampRate); %samples per 1ms window
threshold = 0.018;
for jj = 1:16 %ch
data = AbData(ind1:ind2, jj); %10 sec of data
for kk = 1:10000
abDataBin = data(1:binWidth,jj); %data in 1 bin
dataThreshold = find(abDataBin >= threshold); %find data points >= threshold
mua(kk,jj) = sum(dataThreshold); %number of data pts over threshold per ch
end
end
So far, I'm just having a bit of trouble at this point:
abDataBin = data(1:binWidth,jj); %data in 1 bin
When I run the loop, the data in bin 1 gets overwritten, rather than shift to bin 2, 3...10000. I'd appreciate any feedback on fixing this.
Many thanks.

You forgot to use the running variable as index to access your data. Try
% create data with 16 channels
AbData = rand(10000,16);
binWidth = 24;
threshold = 0.001;
for channel=1:16
data = AbData(2001:3000,channel);
counter = 1; % needed for mua indexing
% looping over the bin starting indeces
for window=1:binWidth:length(data)-(binWidth)
% access data in one bin
bindata = data(window:window+binWidth);
% calculate ms above threshold
mua(counter, channel) = sum(bindata >= threshold);
counter = counter+1;
end
end
EDIT:
your data variable is of dimension nx1, therefore doenst need the column indexing with jj

Related

How to find 3 hourly averages?

In the below code, each y is of 8 min duration. Once I generate the Powerr matrix I have what I Need but How do i do 3 hour averages? Should i do within the for loop or once I generate the Powerr matrix can i still be able to find out the 3 hourly averages of my data? thanks in advance.
tic
profile on
warning('off','MATLAB:audiovideo:wavread:functionToBeRemoved');
neglect_the_first_few= 5;
lastfile = 10;
Powerr = zeros(2049,10);
count =[];
parfor count = neglect_the_first_few + 1 :lastfile
% time_spectrum(timeindex) = timecount;
warning('off','MATLAB:audiovideo:wavread:functionToBeRemoved');
% infiles = find(timestamp >= timecount & timestamp <= (timecount + timestep));
% if ~isempty(infiles)
% if isnan(time_spectrum)
% time_spectrum(timeindex) = NaN
% end
y= [];
% for inputfile = infiles'
fn = [directory '/001/' file_names(count,:)];
[y,fs,nbits] = wavread(fn,'native');
disp(['sampling freq is ',num2str(fs),' hz. length of y is ',num2str(size(y)),'filenumber is ',num2str(count)])
% end
y = y- mean(y);
Y = double(y).*(1.49365435131571e-07);%CONVERSION - volts per bit for 24bit channel
% clear y;
nooverlap =[];%number of overlapped segments set to zero on Nov20th 2017
[Poww,fr] = pwelch(Y,hanning_window,nooverlap,nfft,fs);%no overlap taken into account%
Powerr(:,count) = Poww;
end
fr is independent from the parfor iteration count. So,this variable did not get transferred to the host from workers. Write fr(count)=...
You've done it correctly for Powerr.

calculate histogram of 2 variables

I have a vector containining the speed of 200 walks:
a = 50;
b = 100;
speed = (b-a).*rand(200,1) + a;
and another vector contaning the number of steps for each walk:
a = 8;
b = 100;
steps = (b-a).*rand(200,1) + a;
I would like to create a histogram plot with on the x axis the speeds and on the y axes the sum of the steps of each speed.
What I do is the following but I guess there is a more elegant way to do this:
unique_speed = unique(speed);
y_unique_speed = zeros(size(unique_speed));
for i = 1 : numel(unique_speed)
speed_idx = unique_speed(i);
idx = speed==speed_idx ;
y_unique_speed (i) = sum(steps (idx));
end
First you need to discretize your speed variable. Unlike in the other answer, the following allows you to choose arbitrary step size, e.g. I've chosen 1.5. Note that the last bin edge should be strictly more than the maximum data point, hence I used max(speed)+binstep. You can then use histc to determine which bin each pairs falls into, and accumarray to determine total number of steps in each bin. Finally, use bar to plot.
binstep = 1.5;
binranges = (min(speed):binstep:max(speed)+binstep)';
[~, ind] = histc(speed, binranges);
bincounts = accumarray(ind, steps, size(binranges));
hFig = figure(); axh = axes('Parent', hFig); hold(axh, 'all'); grid(axh, 'on');
bar(axh, binranges, bincounts); axis(axh, 'tight');
First, bin the speed data to discrete values:
sspeed = ceil(speed);
and then accumulate the various step sizes to each bin:
numsteps = accumarray(sspeed, steps);
plot(numsteps)

plotting volume-time graph of .wav file

I'm trying to get volume-time graph of .wav file. First, I recorded sound (patient exhalations) via android as .wav file, but when I read this .wav file in MATLAB it has negative values. What is the meaning of negative values? Second, MATLAB experts could you please check if the code below does the same as written in my comments? Also another question. Y = fft(WindowArray);
p = abs(Y).^2;
I took the power of values returned from fft...is that correct and what is the goal of this step??
[data, fs] = wavread('newF2');
% read exhalation audio wav file (1 channel, mono)
% frequency is 44100 HZ
% windows of 0.1 s and overlap of 0.05 seconds
WINDOW_SIZE = fs*0.1; %4410 = fs*0.1
array_size = length(data); % array size of data
numOfPeaks = (array_size/(WINDOW_SIZE/2)) - 1;
step = floor(WINDOW_SIZE/2); %step size used in loop
transformed = data;
start =1;
k = 1;
t = 1;
g = 1;
o = 1;
% performing fft on each window and finding the peak of windows
while(((start+WINDOW_SIZE)-1)<=array_size)
j=1;
i =start;
while(j<=WINDOW_SIZE)
WindowArray(j) = transformed(i);
j = j+1;
i = i +1;
end
Y = fft(WindowArray);
p = abs(Y).^2; %power
[a, b] = max(abs(Y)); % find max a and its indices b
[m, i] = max(p); %the maximum of the power m and its indices i
maximum(g) = m;
index(t) = i;
power(o) = a;
indexP(g) = b;
start = start + step;
k = k+1;
t = t+1;
g = g+1;
o=o+1;
end
% low pass filter
% filtering noise: ignor frequencies that are less than 5% of maximum frequency
for u=1:length(maximum)
M = max(maximum); %highest value in the array
Accept = 0.05* M;
if(maximum(u) > Accept)
maximum = maximum(u:length(maximum));
break;
end
end
% preparing the time of the graph,
% Location of the Peak flow rates are estimated
TotalTime = (numOfPeaks * 0.1);
time1 = [0:0.1:TotalTime];
if(length(maximum) > ceil(numOfPeaks));
maximum = maximum(1:ceil(numOfPeaks));
end
time = time1(1:length(maximum));
% plotting frequency-time graph
figure(1);
plot(time, maximum);
ylabel('Frequency');
xlabel('Time (in seconds)');
% plotting volume-time graph
figure(2);
plot(time, cumsum(maximum)); % integration over time to get volume
ylabel('Volume');
xlabel('Time (in seconds)');
(I only answer the part of the question which I understood)
Per default Matlab normalizes the audio wave to - 1...1 range. Use the native option if you want the integer data.
First, in your code it should be p = abs(Y)**2, this is the proper way to square the values returned from the FFT. The reason why you take the absolute value of the FFT return values is because those number are complex numbers with a Real and Imaginary part, therefore the absolute value (or modulus) of an imaginary number is the magnitude of that number. The goal of taking the power could be for potentially obtaining an RMS value (root mean squared) of your overall amplitude values, but you could also have something else in mind. When you say volume-time I assume you want decibels, so try something like this:
def plot_signal(file_name):
sampFreq, snd = wavfile.read(file_name)
snd = snd / (2.**15) #Convert sound array to floating point values
#Floating point values range from -1 to 1
s1 = snd[:,0] #left channel
s2 = snd[:,1] #right channel
timeArray = arange(0, len(snd), 1)
timeArray = timeArray / sampFreq
timeArray = timeArray * 1000 #scale to milliseconds
timeArray2 = arange(0, len(snd), 1)
timeArray2 = timeArray2 / sampFreq
timeArray2 = timeArray2 * 1000 #scale to milliseconds
n = len(s1)
p = fft(s1) # take the fourier transform
m = len(s2)
p2 = fft(s2)
nUniquePts = ceil((n+1)/2.0)
p = p[0:nUniquePts]
p = abs(p)
mUniquePts = ceil((m+1)/2.0)
p2 = p2[0:mUniquePts]
p2 = abs(p2)
'''
Left Channel
'''
p = p / float(n) # scale by the number of points so that
# the magnitude does not depend on the length
# of the signal or on its sampling frequency
p = p**2 # square it to get the power
# multiply by two (see technical document for details)
# odd nfft excludes Nyquist point
if n % 2 > 0: # we've got odd number of points fft
p[1:len(p)] = p[1:len(p)] * 2
else:
p[1:len(p) -1] = p[1:len(p) - 1] * 2 # we've got even number of points fft
plt.plot(timeArray, 10*log10(p), color='k')
plt.xlabel('Time (ms)')
plt.ylabel('LeftChannel_Power (dB)')
plt.show()
'''
Right Channel
'''
p2 = p2 / float(m) # scale by the number of points so that
# the magnitude does not depend on the length
# of the signal or on its sampling frequency
p2 = p2**2 # square it to get the power
# multiply by two (see technical document for details)
# odd nfft excludes Nyquist point
if m % 2 > 0: # we've got odd number of points fft
p2[1:len(p2)] = p2[1:len(p2)] * 2
else:
p2[1:len(p2) -1] = p2[1:len(p2) - 1] * 2 # we've got even number of points fft
plt.plot(timeArray2, 10*log10(p2), color='k')
plt.xlabel('Time (ms)')
plt.ylabel('RightChannel_Power (dB)')
plt.show()
I hope this helps.

MeanShift Clustering on dataset

I have a numeric dataset and I want to cluster data with a non-parametric algorithm. Basically, I would like to cluster without specifying the number of clusters for the input. I am using this code that I accessed through the MathWorks File Exchange network which implements the Mean Shift algorithm. However, I don't Know how to adapt my data to this code as my dataset has dimensions 516 x 19.
function [clustCent,data2cluster,cluster2dataCell] =MeanShiftCluster(dataPts,bandWidth,plotFlag)
%UNTITLED2 Summary of this function goes here
% Detailed explanation goes here
%perform MeanShift Clustering of data using a flat kernel
%
% ---INPUT---
% dataPts - input data, (numDim x numPts)
% bandWidth - is bandwidth parameter (scalar)
% plotFlag - display output if 2 or 3 D (logical)
% ---OUTPUT---
% clustCent - is locations of cluster centers (numDim x numClust)
% data2cluster - for every data point which cluster it belongs to (numPts)
% cluster2dataCell - for every cluster which points are in it (numClust)
%
% Bryan Feldman 02/24/06
% MeanShift first appears in
% K. Funkunaga and L.D. Hosteler, "The Estimation of the Gradient of a
% Density Function, with Applications in Pattern Recognition"
%*** Check input ****
if nargin < 2
error('no bandwidth specified')
end
if nargin < 3
plotFlag = true;
plotFlag = false;
end
%**** Initialize stuff ***
%[numPts,numDim] = size(dataPts);
[numDim,numPts] = size(dataPts);
numClust = 0;
bandSq = bandWidth^2;
initPtInds = 1:numPts
maxPos = max(dataPts,[],2); %biggest size in each dimension
minPos = min(dataPts,[],2); %smallest size in each dimension
boundBox = maxPos-minPos; %bounding box size
sizeSpace = norm(boundBox); %indicator of size of data space
stopThresh = 1e-3*bandWidth; %when mean has converged
clustCent = []; %center of clust
beenVisitedFlag = zeros(1,numPts,'uint8'); %track if a points been seen already
numInitPts = numPts %number of points to posibaly use as initilization points
clusterVotes = zeros(1,numPts,'uint16'); %used to resolve conflicts on cluster membership
while numInitPts
tempInd = ceil( (numInitPts-1e-6)*rand) %pick a random seed point
stInd = initPtInds(tempInd) %use this point as start of mean
myMean = dataPts(:,stInd); % intilize mean to this points location
myMembers = []; % points that will get added to this cluster
thisClusterVotes = zeros(1,numPts,'uint16'); %used to resolve conflicts on cluster membership
while 1 %loop untill convergence
sqDistToAll = sum((repmat(myMean,1,numPts) - dataPts).^2); %dist squared from mean to all points still active
inInds = find(sqDistToAll < bandSq); %points within bandWidth
thisClusterVotes(inInds) = thisClusterVotes(inInds)+1; %add a vote for all the in points belonging to this cluster
myOldMean = myMean; %save the old mean
myMean = mean(dataPts(:,inInds),2); %compute the new mean
myMembers = [myMembers inInds]; %add any point within bandWidth to the cluster
beenVisitedFlag(myMembers) = 1; %mark that these points have been visited
%*** plot stuff ****
if plotFlag
figure(12345),clf,hold on
if numDim == 2
plot(dataPts(1,:),dataPts(2,:),'.')
plot(dataPts(1,myMembers),dataPts(2,myMembers),'ys')
plot(myMean(1),myMean(2),'go')
plot(myOldMean(1),myOldMean(2),'rd')
pause
end
end
%**** if mean doesnt move much stop this cluster ***
if norm(myMean-myOldMean) < stopThresh
%check for merge posibilities
mergeWith = 0;
for cN = 1:numClust
distToOther = norm(myMean-clustCent(:,cN)); %distance from posible new clust max to old clust max
if distToOther < bandWidth/2 %if its within bandwidth/2 merge new and old
mergeWith = cN;
break;
end
end
if mergeWith > 0 % something to merge
clustCent(:,mergeWith) = 0.5*(myMean+clustCent(:,mergeWith)); %record the max as the mean of the two merged (I know biased twoards new ones)
%clustMembsCell{mergeWith} = unique([clustMembsCell{mergeWith} myMembers]); %record which points inside
clusterVotes(mergeWith,:) = clusterVotes(mergeWith,:) + thisClusterVotes; %add these votes to the merged cluster
else %its a new cluster
numClust = numClust+1 %increment clusters
clustCent(:,numClust) = myMean; %record the mean
%clustMembsCell{numClust} = myMembers; %store my members
clusterVotes(numClust,:) = thisClusterVotes;
end
break;
end
end
initPtInds = find(beenVisitedFlag == 0); %we can initialize with any of the points not yet visited
numInitPts = length(initPtInds); %number of active points in set
end
[val,data2cluster] = max(clusterVotes,[],1); %a point belongs to the cluster with the most votes
%*** If they want the cluster2data cell find it for them
if nargout > 2
cluster2dataCell = cell(numClust,1);
for cN = 1:numClust
myMembers = find(data2cluster == cN);
cluster2dataCell{cN} = myMembers;
end
end
This is the test code I am using to try and get the Mean Shift program to work:
clear
profile on
nPtsPerClust = 250;
nClust = 3;
totalNumPts = nPtsPerClust*nClust;
m(:,1) = [1 1];
m(:,2) = [-1 -1];
m(:,3) = [1 -1];
var = .6;
bandwidth = .75;
clustMed = [];
%clustCent;
x = var*randn(2,nPtsPerClust*nClust);
%*** build the point set
for i = 1:nClust
x(:,1+(i-1)*nPtsPerClust:(i)*nPtsPerClust) = x(:,1+(i-1)*nPtsPerClust:(i)*nPtsPerClust) + repmat(m(:,i),1,nPtsPerClust);
end
tic
[clustCent,point2cluster,clustMembsCell] = MeanShiftCluster(x,bandwidth);
toc
numClust = length(clustMembsCell)
figure(10),clf,hold on
cVec = 'bgrcmykbgrcmykbgrcmykbgrcmyk';%, cVec = [cVec cVec];
for k = 1:min(numClust,length(cVec))
myMembers = clustMembsCell{k};
myClustCen = clustCent(:,k);
plot(x(1,myMembers),x(2,myMembers),[cVec(k) '.'])
plot(myClustCen(1),myClustCen(2),'o','MarkerEdgeColor','k','MarkerFaceColor',cVec(k), 'MarkerSize',10)
end
title(['no shifting, numClust:' int2str(numClust)])
The test script generates random data X. In my case. I want to use the matrix D of size 516 x 19 but I am not sure how to adapt my data to this function. The function is returning results that are not agreeing with my understanding of the algorithm.
Does anyone know how to do this?

How do I index codistributed arrays in a spmd block

I am doing a very large calculation (atmospheric absorption) that has a lot of individual narrow peaks that all get added up at the end. For each peak, I have pre-calculated the range over which the value of the peak shape function is above my chosen threshold, and I am then going line by line and adding the peaks to my spectrum. A minimum example is given below:
X = 1:1e7;
K = numel(a); % count the number of peaks I have.
spectrum = zeros(size(X));
for k = 1:K
grid = X >= rng(1,k) & X <= rng(2,k);
spectrum(grid) = spectrum(grid) + peakfn(X(grid),a(k),b(k),c(k)]);
end
Here, each peak has some parameters that define the position and shape (a,b,c), and a range over which to do the calculation (rng). This works great, and on my machine it benchmarks at around 220 seconds to do a complete data set. However, I have a 4 core machine and I would eventually like to run this on a cluster, so I'd like to parallelize it and make it scaleable.
Because each loop relies on the results of the previous iteration, I cannot use parfor, so I am taking my first step into learning how to use spmd blocks. My first try looked like this:
X = 1:1e7;
cores = matlabpool('size');
K = numel(a);
spectrum = zeros(size(X),cores);
spmd
n = labindex:cores:K
N = numel(n);
for k = 1:N
grid = X >= rng(1,n(k)) & X <= rng(2,n(k));
spectrum(grid,labindex) = spectrum(grid,labindex) + peakfn(X(grid),a(n(k)),b(n(k)),c(n(k))]);
end
end
finalSpectrum = sum(spectrum,2);
This almost works. The program crashes at the last line because spectrum is of type Composite, and the documentation for 2013a is spotty on how to turn Composite data into a matrix (cell2mat does not work). This also does not scale well because the more cores I have, the larger the matrix is, and that large matrix has to get copied to each worker, which then ignores most of the data. Question 1: how do I turn a Composite data type into a useable array?
The second thing I tried was to use a codistributed array.
spmd
spectrum = codistributed.zeros(K,cores);
disp(size(getLocalPart(spectrum)))
end
This tells me that each worker has a single vector of size [K 1], which I believe is what I want, but when I try to then meld the above methods
spmd
spectrum = codistributed.zeros(K,cores);
n = labindex:cores:K
N = numel(n);
for k = 1:N
grid = X >= rng(1,n(k)) & X <= rng(2,n(k));
spectrum(grid) = spectrum(grid) + peakfn(X(grid),a(n(k)),b(n(k)),c(n(k))]); end
finalSpectrum = gather(spectrum);
end
finalSpectrum = sum(finalSpectrum,2);
I get Matrix dimensions must agree errors. Since it's in a parallel block, I can't use my normal debugging crutch of stepping through the loop and seeing what the size of each block is at each point to see what's going on. Question 2: what is the proper way to index into and out of a codistributed array in an spmd block?
Regarding question#1, the Composite variable in the client basically refers to a non-distributed variant array stored on the workers. You can access the array from each worker by {}-indexing using its corresponding labindex (e.g: spectrum{1}, spectrum{2}, ..).
For your code that would be: finalSpectrum = sum(cat(2,spectrum{:}), 2);
Now I tried this problem myself using random data. Below are three implementations to compare (see here to understand the difference between distributed and nondistributed arrays). First we start with the common data:
len = 100; % spectrum length
K = 10; % number of peaks
X = 1:len;
% random position and shape parameters
a = rand(1,K); b = rand(1,K); c = rand(1,K);
% random peak ranges (lower/upper thresholds)
ranges = sort(randi([1 len], [2 K]));
% dummy peakfn() function
fcn = #(x,a,b,c) x+a+b+c;
% prepare a pool of MATLAB workers
matlabpool open
1) Serial for-loop:
spectrum = zeros(size(X));
for i=1:size(ranges,2)
r = ranges(:,i);
idx = (r(1) <= X & X <= r(2));
spectrum(idx) = spectrum(idx) + fcn(X(idx), a(i), b(i), c(i));
end
s1 = spectrum;
clear spectrum i r idx
2) SPMD with Composite array
spmd
spectrum = zeros(1,len);
ind = labindex:numlabs:K;
for i=1:numel(ind)
r = ranges(:,ind(i));
idx = (r(1) <= X & X <= r(2));
spectrum(idx) = spectrum(idx) + ...
feval(fcn, X(idx), a(ind(i)), b(ind(i)), c(ind(i)));
end
end
s2 = sum(vertcat(spectrum{:}));
clear spectrum i r idx ind
3) SPMD with co-distributed array
spmd
spectrum = zeros(numlabs, len, codistributor('1d',1));
ind = labindex:numlabs:K;
for i=1:numel(ind)
r = ranges(:,ind(i));
idx = (r(1) <= X & X <= r(2));
spectrum(labindex,idx) = spectrum(labindex,idx) + ...
feval(fcn, X(idx), a(ind(i)), b(ind(i)), c(ind(i)));
end
end
s3 = sum(gather(spectrum));
clear spectrum i r idx ind
All three results should be equal (to within an acceptably small margin of error)
>> max([max(s1-s2), max(s1-s3), max(s2-s3)])
ans =
2.8422e-14