Fast way to detect one of the value pairs in matlab array - matlab

My script generates some arrays. I have a list of value pairs that must not appear as adjacent elements in an array. Pairs are symmetric: if I don't like the pair [1 2], then the pair [2 1] is also bad. To detect "bad" arrays I use the following approach:
%% SAMPLE DATA
Pair2Find=[1,2;4,6;7,10]; % value pairs to detect
Seq=randi(10,1,10000); % array where detect pairs
%% DETECTION
for iPair=1:size(Pair2Find,1)
idx=find(or(Seq(1:end-1)==Pair2Find(iPair,1)&Seq(2:end)==Pair2Find(iPair,2),...
Seq(1:end-1)==Pair2Find(iPair,2)&Seq(2:end)==Pair2Find(iPair,1)));
if (~isempty(idx))
display('Bad array')
break
end
end
Everything works fine, but it is the bottleneck of my program.
Could you help me improve the quality and speed of this code?

pairs = [1 2; 4 6; 7 10];
seq = randi(10,1,10000);
for i = 1:size(pairs,1)
pair = pairs(i,:);
res = strfind(seq,pair);
if (~isempty(res))
disp('Bad array!');
break;
end
pair = fliplr(pair);
res = strfind(seq,pair);
if (~isempty(res))
disp('Bad array!');
break;
end
end
If your pairs matrix is very big, you can also reduce your loop time (a little bit) by flipping all the pairs once, before the loop:
pairs = [1 2; 4 6; 7 10];
pairs_flip = fliplr(pairs);
seq = randi(10,1,10000);
for i = 1:size(pairs,1)
res = strfind(seq,pairs(i,:));
if (~isempty(res))
disp('Bad array!');
break;
end
res = strfind(seq,pairs_flip(i,:));
if (~isempty(res))
disp('Bad array!');
break;
end
end

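If you're able to reshape your vector into an n x 2 array, you can do this pretty simply with set operations like intersect. Note that the reshape splits the sequence into non-overlapping pairs (elements 1-2, 3-4, ...), so a bad pair that straddles two rows, such as elements 2-3, is not detected. The rows of Pair2Find must also already be sorted, as they are here, to match the sorted rows of Seq. You can create a helper function that detects whether any bad pairs are present and returns a boolean that you can use for further logic.
For example: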
Pair2Find = [1,2;4,6;7,10]; % value pairs to detect
Seq = randi(10,1,1000000); % array where detect pairs
Seq = reshape(Seq, 2, []).'; % Reshape to a 2 column array
test = isbad(Seq, Pair2Find);
function [outbool] = isbad(Seq, Pair2Find)
sortSeq = sort(Seq, 2); % Sort the pairs
uniquepairs = unique(sortSeq, 'rows'); % Find unique pairs
test = intersect(Pair2Find, uniquepairs, 'rows'); % Find pair intersection
outbool = ~isempty(test);
end
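If every adjacent pair has to be checked (elements 1-2, 2-3, 3-4, ...), rather than only the non-overlapping rows produced by a reshape, the same sort-and-compare idea can be applied to overlapping pairs. The following is a minimal sketch under that assumption; the names badPairs, seq, adjacent and isBad are illustrative and not taken from the answers above, and ismember with the 'rows' option is used in place of intersect.

```matlab
badPairs = [1 2; 4 6; 7 10];        % value pairs to detect (symmetric)
seq = [3 1 5 2 10 7 4];             % small made-up example sequence

% Build every overlapping neighbour pair: (s1,s2), (s2,s3), ...
adjacent = [seq(1:end-1).', seq(2:end).'];

% Sort within each row so that [2 1] and [1 2] compare as equal
adjacent = sort(adjacent, 2);
badSorted = sort(badPairs, 2);

% True if at least one neighbour pair is on the bad list
isBad = any(ismember(adjacent, badSorted, 'rows'));
if isBad
    disp('Bad array');
end
```

In this example the neighbours 10 and 7 match the bad pair [7 10], so isBad is true.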

Related

How can I vectorize the loops of this function in Octave?

I want to vectorize the for-loops of this function so that I can then parallelize it in Octave. Can these for-loops be vectorized? Thank you very much in advance!
I attach the code of the function, with comments marking the start and end of each for-loop and if-else.
function [par]=pem_v(tsm,pr)
% tsm and pr are arrays of N by n. % par is an array of N by 8
tss=[27:0.5:32];
tc=[20:0.01:29];
N=size(tsm,1);
% main-loop
for ii=1:N
% I extract the rows in each loop because each one represents a sample
sst=tsm(ii,:); sst=sst'; %then I convert each sample to column vectors
pre=pr(ii,:); pre=pre';
% main-condition
if isnan(nanmean(sst))==1;
par(ii,1:8)=NaN;
else
% first sub-loop
for k=1:length(tss);
idxx=find(sst>=tss(k)-0.25 & sst<=tss(k)+0.25);
out(k)=prctile(pre(idxx),90);
end
% end first sub-loop
tp90=tss(find(max(out)==out));
% second sub-loop
for j=1:length(tc)
cond1=find(sst>=tc(j) & sst<=tp90);
cond2=find(sst>=tp90);
pem=zeros(length(sst),1);
A=[sst(cond1),ones(length(cond1),1)];
B=regress(pre(cond1),A);
pt90=B(1)*(tp90-tc(j));
AA=[(sst(cond2)-tp90)];
BB=regress(pre(cond2)-pt90,AA);
pem(cond1)=max(0,B(1)*(sst(cond1)-tc(j)));
pem(cond2)=max(0,(BB(1)*(sst(cond2)-tp90))+pt90);
clear A B AA BB;
E(j)=sqrt(nansum((pem-pre).^2)/length(pre));
clear pem;
end
% end second sub-loop
tcc=tc(find(E==min(E)));
% sub-condition
if(isempty(tcc)==1);
par(ii,1:8)=NaN;
else
cond1=find(sst>=tcc & sst<=tp90);
cond2=find(sst>=tp90);
pem=zeros(length(sst),1);
A=[sst(cond1),ones(length(cond1),1)];
B=regress(pre(cond1),A);
pt90=B(1)*(tp90-tcc);
AA=[sst(cond2)-tp90];
BB=regress(pre(cond2)-pt90,AA);
pem(cond1)=max(0,B(1)*(sst(cond1)-tcc));
pem(cond2)=max(0,(BB(1)*(sst(cond2)-tp90))+pt90);
RMSE=sqrt(nansum((pem-pre).^2)/length(pre));
% outputs
par(ii,1)=tcc;
par(ii,2)=tp90;
par(ii,3)=B(1);
par(ii,4)=BB(1);
par(ii,5)=RMSE;
par(ii,6)=nanmean(sst);
par(ii,7)=nanmean(pre);
par(ii,8)=nanmean(pem);
end
% end sub-condition
clear pem pre sst RMSE BB B tp90 tcc
end
% end main-condition
end
% end main-loop
You haven't given any example inputs, so I've created some like so:
N = 5; n = 800;
tsm = rand(N,n)*5+27; pr = rand(N,n);
Then, before you even consider vectorising your code, you should keep 4 things in mind:
Avoid calculating the same thing (like the size of a vector) every loop; do it once before looping.
Pre-allocate arrays where possible (declare them as zeros/NaNs etc).
Don't use find to convert logical indices into linear indices; there is no need, and it will slow down your code.
Don't repeatedly use clear, especially many times within loops. It is slow! Instead, use pre-allocation to ensure the variables are as you expect each loop.
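As a small illustration of the points about find and pre-allocation, here is a sketch with made-up data (the variable names and values are assumptions for the example only). It shows that a logical mask selects the same elements as find, and how an output vector can be pre-allocated before a loop.

```matlab
sst = [26.5 27.1 28.4 NaN 29.9];   % made-up sample values
pre = [1 2 3 4 5];

% Logical mask used directly as an index
mask = (sst >= 27 & sst <= 29);
viaMask = pre(mask);

% Same selection through find: identical result, one extra step
viaFind = pre(find(mask));

% Pre-allocate the output, then fill it in the loop
edges = [26 27 28 29 30];
nBins = numel(edges) - 1;          % computed once, outside the loop
counts = zeros(1, nBins);
for k = 1:nBins
    counts(k) = nnz(sst >= edges(k) & sst < edges(k+1));
end
```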
Using the above random inputs, and taking account of these 4 things, the below code is ~65% quicker than your code. Note: this is without even doing any vectorising!
function [par]=pem_v(tsm,pr)
% tsm and pr are arrays of N by n.
% par is an array of N by 8
tss=[27:0.5:32];
tc=[20:0.01:29];
N=size(tsm,1);
% Transpose once here instead of every loop
tsm = tsm';
pr = pr';
% Pre-allocate memory for output 'par'
par = NaN(N, 8);
% Don't compute these every loop, do it before the loop.
% numel simpler than length for vectors, and size is clearer still
ntss = numel(tss);
nsst = size(tsm,1);
ntc = numel(tc);
npr = size(pr, 1);
for ii=1:N
% Extract the columns in each loop because each one represents a sample
sst=tsm(:,ii);
pre=pr(:,ii);
% main-condition. Previously isnan(nanmean(sst))==1, but that's only true if all(isnan(sst))
% We don't need to assign par(ii,1:8)=NaN since we initialised par to a matrix of NaNs
if ~all(isnan(sst));
% first sub-loop, initialise 'out' first
out = zeros(1, ntss);
for k=1:ntss;
% Don't use FIND on an indexing vector. Use the logical index raw, it's quicker
idxx = (sst>=tss(k)-0.25 & sst<=tss(k)+0.25);
% We need a check that some values of idxx are true, otherwise prctile will error.
if nnz(idxx) > 0
out(k) = prctile(pre(idxx), 90);
end
end
% Again, no need for FIND, just reduces speed. This is a theme...
tp90=tss(max(out)==out);
% Pre-allocate the error vector before filling it in the loop
E = zeros(1, ntc);
for jj=1:ntc
cond1 = (sst>=tc(jj) & sst<=tp90);
cond2 = (sst>=tp90);
% Use nnz (number of non-zeros) instead of length, since cond1 is now a logical vector over all elements
A = [sst(cond1),ones(nnz(cond1),1)];
B = regress(pre(cond1), A);
pt90 = B(1)*(tp90-tc(jj));
AA = [(sst(cond2)-tp90)];
BB = regress(pre(cond2)-pt90,AA);
pem=zeros(nsst,1);
pem(cond1) = max(0, B(1)*(sst(cond1)-tc(jj)));
pem(cond2) = max(0, (BB(1)*(sst(cond2)-tp90))+pt90);
E(jj) = sqrt(nansum((pem-pre).^2)/npr);
end
tcc = tc(E==min(E));
if ~isempty(tcc);
cond1 = (sst>=tcc & sst<=tp90);
cond2 = (sst>=tp90);
A = [sst(cond1),ones(nnz(cond1),1)];
B = regress(pre(cond1),A);
pt90 = B(1)*(tp90-tcc);
AA = [sst(cond2)-tp90];
BB = regress(pre(cond2)-pt90,AA);
pem = zeros(length(sst),1);
pem(cond1) = max(0, B(1)*(sst(cond1)-tcc));
pem(cond2) = max(0, (BB(1)*(sst(cond2)-tp90))+pt90);
RMSE = sqrt(nansum((pem-pre).^2)/npr);
% Outputs, which we might as well assign all at once!
par(ii,:)=[tcc, tp90, B(1), BB(1), RMSE, ...
nanmean(sst), nanmean(pre), nanmean(pem)];
end
end
end

average bins along a dimension of a nd array in matlab

To compute the mean of every bin along one dimension of an N-D array in MATLAB, for example, to average every 10 elements along dim 4 of a 4-D array:
x = reshape(1:30*30*20*300,30,30,20,300);
n = 10;
m = size(x,4)/n;
y = nan(30,30,20,m);
for ii = 1 : m
y(:,:,:,ii) = mean(x(:,:,:,(1:n)+(ii-1)*n),4);
end
It looks a bit silly. I think there must be a better way to average the bins?
Besides, is it possible to make the script applicable to the general case, namely an array with an arbitrary number of dimensions, averaged along an arbitrary dimension?
For the second part of your question you can use this:
x = reshape(1:30*30*20*300,30,30,20,300);
dim = 4;
n = 10;
m = size(x,dim)/n;
sz = size(x);
sz(dim) = m; % output size: same as x, except along dim
y = nan(sz);
idx1 = repmat({':'},1,ndims(x));
idx2 = repmat({':'},1,ndims(x));
for ii = 1 : m
idx1{dim} = ii;
idx2{dim} = (1:n)+(ii-1)*n;
y(idx1{:}) = mean(x(idx2{:}),dim);
end
For the first part of the question, here is an alternative using cumsum and diff, but it may not be better than the loop solution:
function y = slicedmean(x,slice_size,dim)
s = cumsum(x,dim);
idx1 = repmat({':'},1,ndims(x));
idx2 = repmat({':'},1,ndims(x));
idx1{dim} = slice_size;
idx2{dim} = slice_size:slice_size:size(x,dim);
y = cat(dim,s(idx1{:}),diff(s(idx2{:}),[],dim))/slice_size;
end
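Another common way to average fixed-size bins, not used in the answers here, is to split the chosen dimension into two with reshape and take the mean over the new inner one. A minimal sketch, assuming size(x,dim) is an exact multiple of n (the variable names and the small example array are illustrative):

```matlab
x = reshape(1:2*3*12, 2, 3, 12);   % small example array
dim = 3;                           % dimension to average along
n = 4;                             % bin size
sz = size(x);
m = sz(dim) / n;                   % number of bins (must be an integer)

% Split dimension dim into an inner dimension of length n
% followed by an outer dimension of length m
xr = reshape(x, [sz(1:dim-1), n, m, sz(dim+1:end)]);

% Average over the inner dimension, which now sits at position dim
y = mean(xr, dim);

% Remove the singleton dimension left behind by mean
szOut = sz;
szOut(dim) = m;
y = reshape(y, szOut);
```

Because MATLAB stores arrays in column-major order, consecutive elements along dim end up in the same bin, so no loop or index bookkeeping is needed.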
Here is a generic solution using the accumarray function. I haven't tested how fast it is, and there might be some room for improvement.
Basically, accumarray groups the values in x according to a customized matrix of subscripts, with one column per dimension:
x = reshape(1:30*30*20*300,30,30,20,300);
s = size(x);
% parameters for averaging
dimAv = 4;
n = 10;
% get linear index
ix = (1:numel(x))';
% transform them to a matrix of index per dimension
% this is a customized version of ind2sub
pcum = [1 cumprod(s(1:end-1))];
sub = zeros(numel(ix),numel(s));
for i = numel(s):-1:1,
ixtmp = rem(ix-1, pcum(i)) + 1;
sub(:,i) = (ix - ixtmp)/pcum(i) + 1;
ix = ixtmp;
end
% correct index for the given dimension
sub(:,dimAv) = floor((sub(:,dimAv)-1)/n)+1;
% run the accumarray to compute the average
sout = s;
sout(dimAv) = ceil(sout(dimAv)/n);
y = accumarray(sub,x(:), sout, @mean);
If you need a faster and more memory-efficient operation, you'll have to write your own MEX function. It shouldn't be too difficult, I think!
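To see the grouping idea in isolation, here is a one-dimensional sketch with made-up data: each element gets a bin number, and accumarray applies mean to every bin. The variable names are illustrative only.

```matlab
v = 1:12;                                    % made-up data
n = 4;                                       % bin size
% Bin number of each element: 1 1 1 1 2 2 2 2 3 3 3 3
bin = ceil((1:numel(v)).' / n);
% One mean per bin; [] lets accumarray choose the output size
binMeans = accumarray(bin, v(:), [], @mean);
```

The N-D version does the same thing, except that the bin number replaces the subscript along the averaged dimension only.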

How can we use nchoosek() to get all the combinations of the rows of a matrix?

If we have a vector v of numbers (say 1 to 5), we can use nchoosek(v,2) to get all the combinations having two elements. But this function does not directly give us the combinations of the rows of a matrix. I want to use it to get all the combinations of rows of a matrix.
Here's one way to do it:
function p = q47204269(inMat)
% Input handling:
if nargin == 0 || isempty(inMat)
inMat = magic(5);
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
rowsCell = num2cell(inMat,2);
nRows = size(inMat,1);
p = cell(nRows,1);
for indR = 1:nRows
r = nchoosek(1:nRows,indR);
p{indR} = cell2mat(reshape(rowsCell(r.',:).',indR,1,[]));
end
See also:
The perms function, as it might come in handy in what you're doing.
This question.
With a matrix A (square or otherwise):
v = 1:size(A,1);
a = nchoosek(v,2); % each row of a holds one pair of row indices
nPairs = size(a,1);
B = zeros(2,size(A,2),nPairs);
for i = 1:nPairs
B(:,:,i) = A(a(i,:),:);
end
Each layer of array B is a 2-row matrix holding one combination of rows from A.
Not the most readable answer, but just for the sake of a one-liner :-)
A = randn(5,3); % example matrix
N = 2; % number of rows to pick each time
result = permute(reshape(A(nchoosek(1:size(A,1), N).', :), N, [], size(A,2)), [1 3 2]);
The result is a 3D array in which each third-dimension slice gives one of the N-row submatrices of A.
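To make the layout of that 3D array concrete, here is the same expression applied to a tiny made-up matrix, with two slices pulled out for inspection. The names idx, firstSlice and lastSlice are illustrative.

```matlab
A = [1 2; 3 4; 5 6];                  % tiny example matrix
N = 2;                                % number of rows to pick each time
idx = nchoosek(1:size(A,1), N);       % row-index combinations, one per row
result = permute(reshape(A(idx.', :), N, [], size(A,2)), [1 3 2]);

% Slice k of result holds the rows of A listed in idx(k,:)
firstSlice = result(:,:,1);           % rows 1 and 2 of A
lastSlice = result(:,:,end);          % rows 2 and 3 of A
```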

MATLAB Morse Code diff and find functions

We are trying to write a function that takes arr and counts how many 0's and 1's appear in sequence. The output should be a 2D array where column 1 is how many appear in sequence and column 2 is which token it is (0 or 1).  Our function below
function [token] = tokenizeSignal(arr)
matA = diff(find(diff([log_vector;-1])));
addA = zeros(size(matA, 1),1);
matA = [matA, addA];
matB = diff(find(diff([log_vector;0])));
addB = ones(size(matB, 1), 1);
matB = [matB, addB];
[nRowsA, nCols] = size(matA);
nRowsB = size(matB, 1);
AB = zeros(nRowsA + nRowsB, nCols);
AB(1:2:end, :) = matA;
AB(2:2:end, :) = matB;
token = AB;
works with
arr = [0; 0; 0; 1; 1; 1; 0];
but nothing else because it adds random integers into the matrix. Why does it do this and how can I fix it?
Here is code that takes any array arr and produces what you want:
% input checking/processing
% ... convert the input into a column vector
arr = arr(:);
% ... check that the input is nonempty and numeric
if ~isnumeric(arr), error('Bad input'); end
if isempty(arr), error('Bad input'); end
% determine the starting indices of each sequence in arr
I = [1 ; find(diff(arr)) + 1];
% determine the values of each of these sequences
values = arr(I);
% determine the length of each of these sequences
value_counts = [diff(I) ; length(arr) - max(I) + 1];
% produce the output
token = [value_counts, values];
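As a quick check, the same steps can be run on a short made-up signal. This sketch computes the run lengths in a slightly different but equivalent way, by appending a one-past-the-end index before calling diff; the names starts and lens are illustrative.

```matlab
arr = [1 1 0 0 0 1 0 0];                 % made-up signal
arr = arr(:);                            % force a column vector

% Index where each run begins: position 1, plus every position after a change
starts = [1; find(diff(arr)) + 1];

% Run lengths are the gaps between consecutive starts,
% with numel(arr)+1 closing the final run
lens = diff([starts; numel(arr) + 1]);

% Column 1: run length, column 2: the value (0 or 1) of that run
token = [lens, arr(starts)];
% token is [2 1; 3 0; 1 1; 2 0]
```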

Smallest N elements of an array with their location

I have an array called Acc_Std with 1 row and 222 columns.
I need the smallest 100 values in that array, together with their original locations.
I have written this code, but it doesn't work:
for Col = 1:222
[Std_Cont, Std_Loc] = min(Acc_Std(:));
Sort_Std_Cont(Col,1) = Std_Cont;
Sort_Std_Loc(Col,1) = Std_Loc;
Acc_Std(Std_Loc) = []; % Here is the problem in my code
end
Use both outputs of sort:
% Example data
Acc_Std = randi(10, 1,10);
% Extract the smallest N elements
N = 3;
% Sort, while saving the original indices
[B, indices] = sort(Acc_Std);
% Now extract the N smallest elements
smallest_N = B(1:N);
% Check that they are indeed located at the
% indices returned by sort()
isequal(smallest_N, Acc_Std(indices(1:N)))
Result of executing this little script:
ans =
1
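As for why the loop in the question misbehaves: deleting an element with [] shifts every later element one position to the left, so the locations returned by min in later iterations refer to the shortened array, not to the original one. If a loop is preferred, one option is to mask each found minimum with Inf instead of deleting it. A minimal sketch with made-up data (the names work, vals and locs are illustrative):

```matlab
Acc_Std = [0.9 0.2 0.7 0.1 0.5];   % made-up data
N = 3;                             % how many of the smallest values to extract

work = Acc_Std;                    % working copy, so the original stays intact
vals = zeros(N, 1);                % pre-allocated outputs
locs = zeros(N, 1);
for k = 1:N
    [vals(k), locs(k)] = min(work);
    work(locs(k)) = Inf;           % mask instead of deleting, so indices stay valid
end
```

For anything but a very small N, the sort-based version is simpler and avoids scanning the array repeatedly.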