I am trying to detect high amplitude events and remove them along with rows above and below. I have the following code which does this in part but not fully and I'm not sure where the error is. I have commented out the the audioread function and added randi to allow reproducible results. Code:
%[data, fs] = audioread("noise.wav");
%t1 = linspace(0, (numel(data)-1)/fs, numel(data));
rng(1)
data = randi(10,1000,1);
threshold = 5;
clear_range = 10; %rows/samples
data = clearRange(data, threshold, clear_range);
%t1 = linspace(0, (numel(data)-1)/fs, numel(data));
%plot(t1, data);
plot(data)
function [data] = clearRange(data, threshold, clear_range, compare_column)
% data: matrix of values to clean
% threshold: value to compare values against
% clear_range: number of rows to delete
% compare_column: column to check for value to compare against threshold
if nargin < 4
compare_column = 1;
end
for i = 1:length(data)
if i > length(data)
break
end
if data(i,compare_column) > threshold
data(max(1, i-clear_range):min(length(data), i+clear_range),:) = [];
end
end
end
I think the main problem with your code is that you modify data while looping over it. This means, you delete peaks (or high amplitude events in your words) in rows with an index greater than i, so that they cannot be taken into account in following iterations.
E.g. consider peaks in rows with indices 4 and 6, which should cause that rows up to index 16 are removed (with a value of clear_range equal to 10). However, when i is equal to 4, you remove rows up to index 14. Consequently, you also remove the peak at position 6, so that it is not taken into account in further iterations.
In general, it is easier to rely on MATLAB's matrix/array operations instead of using loops.
Please find below a possible solution with explanations inline.
clc;
% I adjusted inputs to get a minimal example
data = randi(10,30,1);
threshold = 9;
rangeToClear = 1;
columnToCompare = 1;
dataOut = clearRange(data, threshold, rangeToClear, columnToCompare );
disp('In:')
disp( data' );
disp('Out:')
disp( dataOut' ); % Plot for cross-check
function data = clearRange(data, threshold, rangeToClear, columnToCompare)
% rowsWithPeak is 1-D logical array showing where columnToCompare is greater than the threshold
rowsWithPeak = data( :, columnToCompare ) > threshold;
% kernel is a column vector of ones of size Nx1, where N is the number of rows
% that should be removed around a peak
kernel = ones( 2*rangeToClear+1, 1 );
% rowsToRemove is a column vector being greater than one at row indices
% that should be removed from the data. To obtain rowsToRemove,
% we convolute rowsWithPeak with the kernel. The argument 'same' to
% the conv2 function, specifies that rowsToRemove will have the same
% size as rowsWithPeak.
rowsToRemove = conv2( rowsWithPeak, kernel, 'same' );
% rowsToRemoveLogical is a logical array being one, where rowsToRemove is greater than 0.
% Note that, rowsToRemoveLogical = rowsToRemove > 0 would also work here
rowsToRemoveLogical = logical( rowsToRemove);
% Finally, we use rowsToRemoveLogical to mask positions of the rows that should be removed.
data( rowsToRemoveLogical, : ) = [];
end
I would like to find certain indices of a variable called "heading". I want to find the indices referring to the value in "heading" between 0 and 20, 20 and 40, 40 and 60, etc. I then want to extract other variables depending on these indices, such as speed. I have written this, but I guess is inefficient, so would like to but it into some kind of loop or indexing algorithm.
heading_index1 = find(heading>0 & heading<=20);
heading_index2 = find(heading>20 & heading<=40);
heading_index3 = find(heading>40 & heading<=60);
etc
speed1 = speed(heading_index1);
speed2 = speed(heading_index2);
speed3 = speed(heading_index3);
etc
it is more efficient if you use logical-indexing directly (so no find command in between). If you want to automatize the whole thing, you can use a struct or cells:
% random vectors to create a minimum working example
heading = rand(100,1)*100;
speed = rand(size(heading));
% limit vector
lim = [0 20 40 60];
% declare storage structures
S = struct();
C = cell(1,length(lim)-1);
% loop through limits
for i = 1:length(lim)-1
% logical vector
lg = heading > lim(i) & heading <= lim(i+1);
% index (its faster to use logical vectors for indexing than integers)
chunk = speed(lg);
% assign to struct
fld = num2str(i,'F%d');
S.(fld) = chunk;
% assign to cell
C{i} = chunk;
end
You can choose if you like the structure-way or the cell-way. It shouldn't make a difference in terms of memory space.
Personally, I prefer the struct as I can define names, which I can interpret but it is also a bit tedious to create the field name rather than just index a cell
I have a data, which may be simulated in the following way:
N = 10^6;%10^8;
K = 10^4;%10^6;
subs = randi([1 K],N,1);
M = [randn(N,5) subs];
M(M<-1.2) = nan;
In other words, it is a matrix, where the last row is subscripts.
Now I want to calculate nanmean() for each subscript. Also I want to save number of rows for each subscript. I have a 'dummy' code for this:
uniqueSubs = unique(M(:,6));
avM = nan(numel(uniqueSubs),6);
for iSub = 1:numel(uniqueSubs)
tmpM = M(M(:,6)==uniqueSubs(iSub),1:5);
avM(iSub,:) = [nanmean(tmpM,1) size(tmpM,1)];
end
The problem is, that it is too slow. I want it to work for N = 10^8 and K = 10^6 (see commented part in the definition of these variables.
How can I find the mean of the data in a faster way?
This sounds like a perfect job for findgroups and splitapply.
% Find groups in the final column
G = findgroups(M(:,6));
% function to apply per group
fcn = #(group) [mean(group, 1, 'omitnan'), size(group, 1)];
% Use splitapply to apply fcn to each group in M(:,1:5)
result = splitapply(fcn, M(:, 1:5), G);
% Check
assert(isequaln(result, avM));
M = sortrows(M,6); % sort the data per subscript
IDX = diff(M(:,6)); % find where the subscript changes
tmp = find(IDX);
tmp = [0 ;tmp;size(M,1)]; % add start and end of data
for iSub= 2:numel(tmp)
% Calculate the mean over just a single subscript, store in iSub-1
avM2(iSub-1,:) = [nanmean(M(tmp(iSub-1)+1:tmp(iSub),1:5),1) tmp(iSub)-tmp(iSub-1)];tmp(iSub-1)];
end
This is some 60 times faster than your original code on my computer. The speed-up mainly comes from presorting the data and then finding all locations where the subscript changes. That way you do not have to traverse the full array each time to find the correct subscripts, but rather you only check what's necessary each iteration. You thus calculate the mean over ~100 rows, instead of first having to check in 1,000,000 rows whether each row is needed that iteration or not.
Thus: in the original you check numel(uniqueSubs), 10,000 in this case, whether all N, 1,000,000 here, numbers belong to a certain category, which results in 10^12 checks. The proposed code sorts the rows (sorting is NlogN, thus 6,000,000 here), and then loop once over the full array without additional checks.
For completion, here is the original code, along with my version, and it shows the two are the same:
N = 10^6;%10^8;
K = 10^4;%10^6;
subs = randi([1 K],N,1);
M = [randn(N,5) subs];
M(M<-1.2) = nan;
uniqueSubs = unique(M(:,6));
%% zlon's original code
avM = nan(numel(uniqueSubs),7); % add the subscript for comparison later
tic
uniqueSubs = unique(M(:,6));
for iSub = 1:numel(uniqueSubs)
tmpM = M(M(:,6)==uniqueSubs(iSub),1:5);
avM(iSub,:) = [nanmean(tmpM,1) size(tmpM,1) uniqueSubs(iSub)];
end
toc
%%%%% End of zlon's code
avM = sortrows(avM,7); % Sort for comparison
%% Start of Adriaan's code
avM2 = nan(numel(uniqueSubs),6);
tic
M = sortrows(M,6);
IDX = diff(M(:,6));
tmp = find(IDX);
tmp = [0 ;tmp;size(M,1)];
for iSub = 2:numel(tmp)
avM2(iSub-1,:) = [nanmean(M(tmp(iSub-1)+1:tmp(iSub),1:5),1) tmp(iSub)-tmp(iSub-1)];
end
toc %tic/toc should not be used for accurate timing, this is just for order of magnitude
%%%% End of Adriaan's code
all(avM(:,1:6) == avM2) % Do the comparison
% End of script
% Output
Elapsed time is 58.561347 seconds.
Elapsed time is 0.843124 seconds. % ~70 times faster
ans =
1×6 logical array
1 1 1 1 1 1 % i.e. the matrices are equal to one another
Elements of a column matrix of non-sequential numbers (sourceData) should have their values incremented if their index positions lie between certain values as defined in a second column matrix (triggerIndices) which lists the indices sequentially.
This can be easily done with a for-loop but can it be done in a vectorized way?
%// Generation of example data follows
sourceData = randi(1e3,100,1);
%// sourceData = 1:1:1000; %// Would show more clearly what is happening
triggerIndices = randperm(length(sourceData),15);
triggerIndices = sort(triggerIndices);
%// End of example data generation
%// Code to be vectorized follows
increment = 75;
addOn = 100;
for index = 1:1:length(triggerIndices)-1
sourceData(triggerIndices(index):1:triggerIndices(index+1)-1) = ...
sourceData(triggerIndices(index):1:triggerIndices(index+1)-1) + addOn;
addOn = addOn + increment;
end
sourceData(triggerIndices(end):1:end) = ....
sourceData(triggerIndices(end):1:end) + addOn;
%// End of code to be vectorized
How about replacing everything with:
vals = sparse(triggerIndices, 1, increment, numel(sourceData), 1);
vals(triggerIndices(1)) = addOn;
sourceData(:) = sourceData(:) + cumsum(vals);
This is basically a variant of run-length decoding shown here.
Please help me to improve the following Matlab code to improve execution time.
Actually I want to make a random matrix (size [8,12,10]), and on every row, only have integer values between 1 and 12. I want the random matrix to have the sum of elements which has value (1,2,3,4) per column to equal 2.
The following code will make things more clear, but it is very slow.
Can anyone give me a suggestion??
clc
clear all
jum_kel=8
jum_bag=12
uk_pop=10
for ii=1:uk_pop;
for a=1:jum_kel
krom(a,:,ii)=randperm(jum_bag); %batasan tidak boleh satu kelompok melakukan lebih dari satu aktivitas dalam satu waktu
end
end
for ii=1:uk_pop;
gab1(:,:,ii) = sum(krom(:,:,ii)==1)
gab2(:,:,ii) = sum(krom(:,:,ii)==2)
gab3(:,:,ii) = sum(krom(:,:,ii)==3)
gab4(:,:,ii) = sum(krom(:,:,ii)==4)
end
for jj=1:uk_pop;
gabh1(:,:,jj)=numel(find(gab1(:,:,jj)~=2& gab1(:,:,jj)~=0))
gabh2(:,:,jj)=numel(find(gab2(:,:,jj)~=2& gab2(:,:,jj)~=0))
gabh3(:,:,jj)=numel(find(gab3(:,:,jj)~=2& gab3(:,:,jj)~=0))
gabh4(:,:,jj)=numel(find(gab4(:,:,jj)~=2& gab4(:,:,jj)~=0))
end
for ii=1:uk_pop;
tot(:,:,ii)=gabh1(:,:,ii)+gabh2(:,:,ii)+gabh3(:,:,ii)+gabh4(:,:,ii)
end
for ii=1:uk_pop;
while tot(:,:,ii)~=0;
for a=1:jum_kel
krom(a,:,ii)=randperm(jum_bag); %batasan tidak boleh satu kelompok melakukan lebih dari satu aktivitas dalam satu waktu
end
gabb1 = sum(krom(:,:,ii)==1)
gabb2 = sum(krom(:,:,ii)==2)
gabb3 = sum(krom(:,:,ii)==3)
gabb4 = sum(krom(:,:,ii)==4)
gabbh1=numel(find(gabb1~=2& gabb1~=0));
gabbh2=numel(find(gabb2~=2& gabb2~=0));
gabbh3=numel(find(gabb3~=2& gabb3~=0));
gabbh4=numel(find(gabb4~=2& gabb4~=0));
tot(:,:,ii)=gabbh1+gabbh2+gabbh3+gabbh4;
end
end
Some general suggestions:
Name variables in English. Give a short explanation if it is not immediately clear,
what they are indented for. What is jum_bag for example? For me uk_pop is music style.
Write comments in English, even if you develop source code only for yourself.
If you ever have to share your code with a foreigner, you will spend a lot of time
explaining or re-translating. I would like to know for example, what
%batasan tidak boleh means. Probably, you describe here that this is only a quick
hack but that someone should really check this again, before going into production.
Specific to your code:
Its really easy to confuse gab1 with gabh1 or gabb1.
For me, krom is too similar to the built-in function kron. In fact, I first
thought that you are computing lots of tensor products.
gab1 .. gab4 are probably best combined into an array or into a cell, e.g. you
could use
gab = cell(1, 4);
for ii = ...
gab{1}(:,:,ii) = sum(krom(:,:,ii)==1);
gab{2}(:,:,ii) = sum(krom(:,:,ii)==2);
gab{3}(:,:,ii) = sum(krom(:,:,ii)==3);
gab{4}(:,:,ii) = sum(krom(:,:,ii)==4);
end
The advantage is that you can re-write the comparsisons with another loop.
It also helps when computing gabh1, gabb1 and tot later on.
If you further introduce a variable like highestNumberToCompare, you only have to
make one change, when you certainly find out that its important to check, if the
elements are equal to 5 and 6, too.
Add a semicolon at the end of every command. Having too much output is annoying and
also slow.
The numel(find(gabb1 ~= 2 & gabb1 ~= 0)) is better expressed as
sum(gabb1(:) ~= 2 & gabb1(:) ~= 0). A find is not needed because you do not care
about the indices but only about the number of indices, which is equal to the number
of true's.
And of course: This code
for ii=1:uk_pop
gab1(:,:,ii) = sum(krom(:,:,ii)==1)
end
is really, really slow. In every iteration, you increase the size of the gab1
array, which means that you have to i) allocate more memory, ii) copy the old matrix
and iii) write the new row. This is much faster, if you set the size of the
gab1 array in front of the loop:
gab1 = zeros(... final size ...);
for ii=1:uk_pop
gab1(:,:,ii) = sum(krom(:,:,ii)==1)
end
Probably, you should also re-think the size and shape of gab1. I don't think, you
need a 3D array here, because sum() already reduces one dimension (if krom is
3D the output of sum() is at most 2D).
Probably, you can skip the loop at all and use a simple sum(krom==1, 3) instead.
However, in every case you should be really aware of the size and shape of your
results.
Edit inspired by Rody Oldenhuis:
As Rody pointed out, the 'problem' with your code is that its highly unlikely (though
not impossible) that you create a matrix which fulfills your constraints by assigning
the numbers randomly. The code below creates a matrix temp with the following characteristics:
The numbers 1 .. maxNumber appear either twice per column or not at all.
All rows are a random permutation of the numbers 1 .. B, where B is equal to
the length of a row (i.e. the number of columns).
Finally, the temp matrix is used to fill a 3D array called result. I hope, you can adapt it to your needs.
clear all;
A = 8; B = 12; C = 10;
% The numbers [1 .. maxNumber] have to appear exactly twice in a
% column or not at all.
maxNumber = 4;
result = zeros(A, B, C);
for ii = 1 : C
temp = zeros(A, B);
for number = 1 : maxNumber
forbiddenRows = zeros(1, A);
forbiddenColumns = zeros(1, A/2);
for count = 1 : A/2
illegalIndices = true;
while illegalIndices
illegalIndices = false;
% Draw a column which has not been used for this number.
randomColumn = randi(B);
while any(ismember(forbiddenColumns, randomColumn))
randomColumn = randi(B);
end
% Draw two rows which have not been used for this number.
randomRows = randi(A, 1, 2);
while randomRows(1) == randomRows(2) ...
|| any(ismember(forbiddenRows, randomRows))
randomRows = randi(A, 1, 2);
end
% Make sure not to overwrite previous non-zeros.
if any(temp(randomRows, randomColumn))
illegalIndices = true;
continue;
end
end
% Mark the rows and column as forbidden for this number.
forbiddenColumns(count) = randomColumn;
forbiddenRows((count - 1) * 2 + (1:2)) = randomRows;
temp(randomRows, randomColumn) = number;
end
end
% Now every row contains the numbers [1 .. maxNumber] by
% construction. Fill the zeros with a permutation of the
% interval [maxNumber + 1 .. B].
for count = 1 : A
mask = temp(count, :) == 0;
temp(count, mask) = maxNumber + randperm(B - maxNumber);
end
% Store this page.
result(:,:,ii) = temp;
end
OK, the code below will improve the timing significantly. It's not perfect yet, it can all be optimized a lot further.
But, before I do so: I think what you want is fundamentally impossible.
So you want
all rows contain the numbers 1 through 12, in a random permutation
any value between 1 and 4 must be present either twice or not at all in any column
I have a hunch this is impossible (that's why your code never completes), but let me think about this a bit more.
Anyway, my 5-minute-and-obvious-improvements-only-version:
clc
clear all
jum_kel = 8;
jum_bag = 12;
uk_pop = 10;
A = jum_kel; % renamed to make language independent
B = jum_bag; % and a lot shorter for readability
C = uk_pop;
krom = zeros(A, B, C);
for ii = 1:C;
for a = 1:A
krom(a,:,ii) = randperm(B);
end
end
gab1 = sum(krom == 1);
gab2 = sum(krom == 2);
gab3 = sum(krom == 3);
gab4 = sum(krom == 4);
gabh1 = sum( gab1 ~= 2 & gab1 ~= 0 );
gabh2 = sum( gab2 ~= 2 & gab2 ~= 0 );
gabh3 = sum( gab3 ~= 2 & gab3 ~= 0 );
gabh4 = sum( gab4 ~= 2 & gab4 ~= 0 );
tot = gabh1+gabh2+gabh3+gabh4;
for ii = 1:C
ii
while tot(:,:,ii) ~= 0
for a = 1:A
krom(a,:,ii) = randperm(B);
end
gabb1 = sum(krom(:,:,ii) == 1);
gabb2 = sum(krom(:,:,ii) == 2);
gabb3 = sum(krom(:,:,ii) == 3);
gabb4 = sum(krom(:,:,ii) == 4);
gabbh1 = sum(gabb1 ~= 2 & gabb1 ~= 0)
gabbh2 = sum(gabb2 ~= 2 & gabb2 ~= 0);
gabbh3 = sum(gabb3 ~= 2 & gabb3 ~= 0);
gabbh4 = sum(gabb4 ~= 2 & gabb4 ~= 0);
tot(:,:,ii) = gabbh1+gabbh2+gabbh3+gabbh4;
end
end