Matlab: Convert elements larger (smaller) than 1 (-1) into a sequence of 1 (-1) - matlab

UPDATE: I've done some testing, and the solution of Jonas is the fastest for a range of different size input vectors. In particular, as angainor points out, the solution scales up to large sizes incredibly well - an important test, as it is usually the large size problems that prompt us to pose these kinds of questions on SO. Thanks to both Jonas and tmpearce for your solutions - based on the efficiency of the solution for large size problems I'm giving the answer tick to Jonas.
My Question: I have this column vector:
Vec = [0; 1; 2; -1; -3; 0; 0; 2; 1; -1];
I would like to convert every element greater than one into a sequence of ones that has length equal to the value of the element. Similarly, I want to convert every element less than minus one into a sequence of minus ones. Thus my output vector should look like this:
VecLong = [0; 1; 1; 1; -1; -1; -1; -1; 0; 0; 1; 1; 1; -1];
Note that each 2 has been changed into two 1's, while the -3 has been changed into three -1's. Currently, I solve the problem like this:
VecTemp = Vec;
VecTemp(VecTemp == 0) = 1;
VecLong = NaN(sum(abs(VecTemp)), 1);
c = 1;
for n = 1:length(Vec)
if abs(Vec(n)) <= 1
VecLong(c) = Vec(n);
c = c + 1;
else
VecLong(c:c + abs(Vec(n)) - 1) = sign(Vec(n));
c = c + abs(Vec(n));
end
end
This doesn't feel very elegant. Can anyone suggest a better method? Note: You can assume that Vec will contain only integer values. Thanks in advance for all suggestions.

You can use the good old cumsum-approach to repeating the entries properly. Note that I'm assigning a few temporary variables that you can get rid of, if you want to put everything into one line.
%# create a list of values to repeat
signVec = sign(Vec);
%# create a list of corresponding indices that repeat
%# as often as the value in signVec has to be repeated
tmp = max(abs(Vec),1); %# max: zeros have to be repeated once
index = zeros(sum(tmp),1);
index([1;cumsum(tmp(1:end-1))+1])=1; %# assign ones as pivots for cumsum
index = cumsum(index); %# create repeating indices
%# repeat
out = signVec(index);
out'
out =
0 1 1 1 -1 -1 -1 -1 0 0 1 1 1 -1
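On MATLAB R2015a or later, the same repeat-by-count idea can also be sketched with the built-in repelem. This is not part of the answer above, and the names counts and out2 are only illustrative.

```matlab
% Sketch: repeat each sign value max(|value|,1) times using repelem (R2015a+).
Vec = [0; 1; 2; -1; -3; 0; 0; 2; 1; -1];
counts = max(abs(Vec), 1);          % zeros still occupy one slot in the output
out2 = repelem(sign(Vec), counts);  % column vector of length sum(counts)
```

The cumsum version remains the option for releases that do not ship repelem.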

Edit: I thought of another (slightly obscure) but shorter way to do this, and it is faster than the loop you've got.
tic;
for rep=1:100000
%# original loop-based solution
end
toc
Elapsed time is 2.768822 seconds.
%# bsxfun-based indexing alternative
tic;
for rep=1:100000
TempVec=abs(Vec);TempVec(Vec==0)=1;
LongVec = sign(Vec(sum(bsxfun(@gt,1:sum(TempVec),cumsum(TempVec)))+1));
end
toc
Elapsed time is 1.798339 seconds.
This answer scales pretty well too, compared to the original - at least, to a point. There's a performance sweet spot.
Vec = repmat(OrigVec,10,1);
%# test with 100,000 loops
%# loop-based solution:
Elapsed time is 19.005226 seconds.
%# bsxfun-based solution:
Elapsed time is 4.411316 seconds.
Vec = repmat(OrigVec,1000,1);
%# test with 1,000 loops - 100,000 would be horribly slow
%# loop-based solution:
Elapsed time is 18.105728 seconds.
%# bsxfun-based solution:
Elapsed time is 98.699396 seconds.
bsxfun is expanding the vector into a matrix, then collapsing it with sum. With very large vectors this is needlessly memory heavy compared to the loop, so it ends up losing. Before then though, it does quite well.
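To make the memory cost concrete, here is a small sketch of the intermediate matrix for the 10-element example from the question. The names M and idx are illustrative and do not appear in the answer.

```matlab
% Sketch: inspect the matrix that bsxfun builds before sum collapses it.
Vec = [0; 1; 2; -1; -3; 0; 0; 2; 1; -1];
TempVec = max(abs(Vec), 1);
% Compare every output position (row vector) with every running total (column vector).
M = bsxfun(@gt, 1:sum(TempVec), cumsum(TempVec));  % numel(Vec)-by-sum(TempVec) logical
idx = sum(M, 1) + 1;                               % source index for each output position
LongVec = sign(Vec(idx));
```

Here M is only 10-by-14, but its size is numel(Vec) times the output length, which is what hurts for the repmat(OrigVec,1000,1) case.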
Original, slow answer:
Here's a one-liner:
out=cell2mat(arrayfun(@(x) repmat(((x>0)*2)-1+(x==0),max(1,abs(x)),1),Vec,'uni',0));
out' =
0 1 1 1 -1 -1 -1 -1 0 0 1 1 1 -1
What's going on:
((x>0)*2)-1 + (x==0) %# if an integer is >0, make it a 1, <0 becomes -1, 0 stays 0
max(1,abs(x)) %# figure out how many times to replicate the value
arrayfun(@(x) (the above stuff), Vec, 'uni', 0) %# apply the function
%# to each element in the array, generating a cell array output
cell2mat( (the above stuff) ) %# convert back to a matrix

Related

How to get indexes of logical matrix without using find in matlab?

Let's assume my matrix A is the output of a comparison function, i.e. a logical matrix containing only 0s and 1s. For a small matrix of size 3*4, we might have something like:
A =
1 1 0 0
0 0 1 0
0 0 1 1
Now, I am generating another matrix B which is of the same size as A, but its rows are filled with indexes of A and any leftover values in each row are set to zero.
B =
1 2 0 0
3 0 0 0
3 4 0 0
Currently, I am using find function on each row of A to get matrix B. Complete code can be written as:
A=[1,1,0,0;0,0,1,0;0,0,1,1];
[rows,columns]=size(A);
B=zeros(rows,columns);
for i=1:rows
currRow=find(A(i,:));
B(i,1:length(currRow))=currRow;
end
For large matrices, the find function takes most of the calculation time according to the Matlab Profiler. Is there any way to generate matrix B faster?
Note:
Matrix A has more than 1000 columns in each row, but there are never more than 50 non-zero elements per row. Here, I am making matrix B the same size as A, but B could have far fewer columns.
I would suggest using parfor, but the overhead is too much here, and there are more issues with it, so it is not a good solution.
rows = 5e5;
cols = 1000;
A = rand(rows, cols) < 0.050;
I = uint16(1:cols);
B = zeros(size(A), 'uint16');
% [r,c] = find(A);
tic
for i=1:rows
% currRow = find(A(i,:));
currRow = I(A(i,:));
B(i,1:length(currRow)) = currRow;
end
toc
@Cris suggests replacing find with an indexing operation. It improves the performance by about 10%.
Apparently, there is no better optimization as long as B is required in the specific form you describe. I suggest using [r,c] = find(A); if the indices are not required in matrix form.
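If a loop-free way of building B is still wanted, one possible sketch relies on sort keeping equal elements in their original order (a stable sort). This is not from the answers above, and it has not been timed against the loop, so no speed-up is claimed; s and order are illustrative names.

```matlab
% Sketch: build B without a loop, assuming sort is stable for 'descend'.
A = logical([1,1,0,0; 0,0,1,0; 0,0,1,1]);
% A stable descending sort moves the ones of each row to the front
% while keeping their original column order.
[s, order] = sort(double(A), 2, 'descend');
B = order .* s;   % keep column indices where a one was, zero elsewhere
```

For the problem sizes in the question, B could then be trimmed to its first 50 columns.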

Clean MATLAB time series data

I have a Matlab time series data set, which consist of a signal that can only be 1 or 0. How can I get rid of all the values except for the changing ones?
For example:
1
1
1
0
1
0
0
0
should ideally result in
1
0
1
0
while keeping the correct time values as well of course.
Thing is, that I need to find the frequency of the signal. The time should be measured from 0->1 to the next time 0->1 occurs. The smallest time / highest frequency is what I need in the end.
Thanks!
You can use the getsamples method to get a time series which contains a subset of the original samples. What remains is to identify the indices where the time series changes; for this purpose you can use diff and logical indexing:
ts = timeseries([1 1 1 0 1 0 0 0],1:8)
ts.getsamples([true;squeeze(diff(ts.Data)) ~= 0])
A simple and clever call to diff should be sufficient:
>> A = [1; 1; 1; 0; 1; 0; 0; 0];
>> B = A(diff([-Inf; A]) ~= 0)
B =
1
0
1
0
The code is quite simple. diff finds pairs of differences in an array. Concretely, given an array A, the output is of the following structure:
B = [A(2) - A(1), A(3) - A(2), ..., A(N) - A(N-1)];
N is the total length of the signal, so the result has length N-1. As such, a trick that you can use is to prepend -Inf (or some large non-zero value) to the array A, so that the difference between this prepended element and the actual first element of A is non-zero. That is what diff([-Inf; A]) does. The next step is to check where the differences are non-zero. Wherever there is a non-zero difference, that is a position you want to keep because a change occurred there. This produces a logical array, and the last step is to use it to index into your array A and thus get the result.
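The intermediate values for the example signal can be written out as a short sketch; d and mask are illustrative names.

```matlab
% Sketch: intermediate results of the diff-based filter on the example signal.
A = [1; 1; 1; 0; 1; 0; 0; 0];
d = diff([-Inf; A]);   % [Inf; 0; 0; -1; 1; -1; 0; 0]
mask = d ~= 0;         % true at the first sample and at every change
B = A(mask);           % [1; 0; 1; 0]
```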
This only extracts the signal values you need, however. If you'd also like the times at which the changes occur, suppose you have a time vector t that is as long as your signal stored in A. You would first record the logical vector in a separate variable, then index into both your time array and the signal array to extract what you need (original idea from user dfri):
ind = diff([-Inf; A]) ~= 0;
times = t(ind);
B = A(ind);
You can make use of diff and logical to save the results as a logical array, which is then used as an index filter on your data (say t for time and y for boolean values):
%// example
t = 0:0.01:0.07;
y = [1,1,1,0,1,0,0,0];
%// find indices to keep
keep = [true logical(diff(y))];
%// truncated data
tTrunc = t(keep)
yTrunc = y(keep)
with the results for the example as follows
tTrunc =
0 0.0300 0.0400 0.0500
yTrunc =
1 0 1 0
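For the frequency part of the question (the time from one 0->1 transition to the next), a possible sketch follows. The signal below is made up for illustration so that it contains several rising edges, and riseIdx, periods and fMax are illustrative names.

```matlab
% Sketch: shortest time between consecutive 0 -> 1 transitions.
t = 0:0.01:0.07;
y = [0,1,1,0,1,0,0,1];              % hypothetical signal with three rising edges
riseIdx = find(diff(y) == 1) + 1;   % samples where the signal goes 0 -> 1
riseTimes = t(riseIdx);
periods = diff(riseTimes);          % time between consecutive rising edges
fMax = 1 / min(periods);            % highest frequency = shortest period
```

If the signal has fewer than two rising edges, periods is empty and no frequency can be derived.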

How to generate this matrix in matlab

H matrix is n-by-n, n=10000. I can use loop to generate this matrix in matlab. I just wonder if there are any methods that can do this without looping in matlab.
You can see that the upper-right portion of the matrix consists of 1/sqrt(j*(j-1)) in column j, the diagonal elements are -(j-1)/sqrt(j*(j-1)), the first column consists of 1/sqrt(n), and the rest of the elements are zero.
We can generate a full matrix whose first column is all 1/sqrt(n) and whose remaining columns are filled with 1/sqrt(j*(j-1)); then we'll need to modify the matrix to include the rest of what you want.
As such, let's concentrate on the elements that start from row 2, column 2 as these follow a pattern. Once we're done, we can construct the other things that build up the final matrix.
x = 2:n;
Hsmall = repmat([1./sqrt(x.*(x-1))], n-1, 1);
Next, we will tackle the diagonal elements:
Hsmall(logical(eye(n-1))) = -(x-1)./sqrt(x.*(x-1));
Now, let's zero the rest of the elements:
Hsmall(tril(logical(ones(n-1)),-1)) = 0;
Now that we're done, let's create a new matrix that pieces all of this together:
H = [1/sqrt(n) 1./sqrt(x.*(x-1)); repmat(1/sqrt(n), n-1, 1) Hsmall];
Therefore, the full code is:
x = 2:n;
Hsmall = repmat([1./sqrt(x.*(x-1))], n-1, 1);
Hsmall(logical(eye(n-1))) = -(x-1)./sqrt(x.*(x-1));
Hsmall(tril(logical(ones(n-1)),-1)) = 0;
H = [1/sqrt(n) 1./sqrt(x.*(x-1)); repmat(1/sqrt(n), n-1, 1) Hsmall];
Here's an example with n = 6:
>> H
H =
Columns 1 through 3
0.408248290463863 0.707106781186547 0.408248290463863
0.408248290463863 -0.707106781186547 0.408248290463863
0.408248290463863 0 -0.816496580927726
0.408248290463863 0 0
0.408248290463863 0 0
0.408248290463863 0 0
Columns 4 through 6
0.288675134594813 0.223606797749979 0.182574185835055
0.288675134594813 0.223606797749979 0.182574185835055
0.288675134594813 0.223606797749979 0.182574185835055
-0.866025403784439 0.223606797749979 0.182574185835055
0 -0.894427190999916 0.182574185835055
0 0 -0.912870929175277
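As a cross-check on the n = 6 output, the same matrix can be sketched with triu and diag. This is a different construction from the code above, and H2, j and u are illustrative names.

```matlab
% Sketch: alternative construction of the same matrix via triu and diag.
n = 6;
j = 2:n;
u = 1 ./ sqrt(j .* (j - 1));          % above-diagonal value of columns 2..n
H2 = triu(repmat([0, u], n, 1), 1);   % strictly upper triangle, constant per column
H2 = H2 + diag([0, -(j - 1) .* u]);   % diagonal entries of columns 2..n
H2(:, 1) = 1 / sqrt(n);               % first column
```

Like the repmat version, this allocates full n-by-n temporaries, so it is meant for checking small cases rather than for speed.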
Since you are working with a pretty large n value of 10000, you might want to squeeze out as much performance as possible.
Going with that, you can use an efficient approach based on cumsum -
%// Values to be set in each column for the upper triangular region
upper_tri = 1./sqrt([1:n].*(0:n-1));
%// Diagonal indices
diag_idx = [1:n+1:n*n];
%// Setup output array
out = zeros(n,n);
%// Set the first row of output array with upper triangular values
out(1,:) = upper_tri;
%// Set the diagonal elements with the negative triangular values.
%// The intention here is to perform CUMSUM across each column later on,
%// thus there would be zeros beyond the diagonal positions for each column
out(diag_idx) = -upper_tri;
%// Set the first element of output array with n^(-1/2)
out(1) = 1/sqrt(n);
%// Finally, perform CUMSUM as suggested earlier
out = cumsum(out,1);
%// Set the diagonal elements with the actually expected values
out(diag_idx(2:end)) = upper_tri(2:end).*[-1:-1:-(n-1)];
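The cumsum trick can be seen in isolation on a tiny case. The values in v are arbitrary placeholders rather than the actual matrix entries, and T is an illustrative name.

```matlab
% Sketch: +v in the first row and -v on the diagonal, then a column-wise cumsum.
n = 4;
v = [10 20 30 40];
T = zeros(n);
T(1, :) = v;          % +v in the first row
T(1:n+1:n*n) = -v;    % -v on the diagonal cancels the running sum from there on
T(1) = v(1);          % the first column keeps its value in every row
T = cumsum(T, 1);     % each column: v above the diagonal, zeros from the diagonal down
```

After this step only the diagonal still needs its final values, which is what the last assignment in the answer does.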
Runtime Tests
(I) With n = 10000, the runtime at my end was - Elapsed time is 0.457543 seconds.
(II) Now, as the final performance-squeezing practice, you can replace the pre-allocation step for out with a faster pre-allocation scheme as described on the Undocumented Matlab blog. Thus, the pre-allocation step would look like this -
out(n,n) = 0;
The runtime with this edited code was - Elapsed time is 0.400399 seconds.
(III) The runtime for n = 10000 with the other answer by @rayryeng yielded - Elapsed time is 1.306339 seconds.

MATLAB: efficient generation of a large integer matrix of multi-indices

Let d and p be two integers. I need to generate a large matrix A of integers, having d columns and N=nchoosek(d+p,p) rows. Note that nchoosek(d+p,p) increases quickly with d and p, so it's very important that I can generate A quickly. The rows of A are all the multi-indices with components from 0 to p, such that the sum of the components is less than or equal to p. This means that, if d=3 and p=3, then A is an [N=nchoosek(3+3,3)=20x3] matrix with the following structure:
A=[0 0 0;
1 0 0;
0 1 0;
0 0 1;
2 0 0;
1 1 0;
1 0 1;
0 2 0;
0 1 1;
0 0 2;
3 0 0;
2 1 0;
2 0 1;
1 2 0;
1 1 1;
1 0 2;
0 3 0;
0 2 1;
0 1 2;
0 0 3]
It is not indispensable to follow exactly the row ordering I used, although it would make my life easier (for those interested, it's called graded lexicographical ordering and it's described here:
http://en.wikipedia.org/wiki/Monomial_order).
In case you are curious about the origin of this weird matrix, let me know!
Solution using nchoosek and diff
The following solution is based on this clever answer by Mark Dickinson.
function degrees = monomialDegrees(numVars, maxDegree)
if numVars==1
degrees = (0:maxDegree).';
return;
end
degrees = cell(maxDegree+1,1);
k = numVars;
for n = 0:maxDegree
dividers = flipud(nchoosek(1:(n+k-1), k-1));
degrees{n+1} = [dividers(:,1), diff(dividers,1,2), (n+k)-dividers(:,end)]-1;
end
degrees = cell2mat(degrees);
You can get your matrix by calling monomialDegrees(d,p).
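One pass of the loop body can be traced by hand as a sketch. The case below (n = 1, k = 3) was picked for illustration, and block is an illustrative name.

```matlab
% Sketch: one iteration of the divider construction for total degree n = 1, k = 3 variables.
n = 1; k = 3;
dividers = flipud(nchoosek(1:(n+k-1), k-1));   % [2 3; 1 3; 1 2]
% Gaps before, between and after the dividers, minus one, give the exponents.
block = [dividers(:,1), diff(dividers,1,2), (n+k)-dividers(:,end)] - 1;
```

The result is the 3-by-3 identity, which corresponds to rows 2 to 4 of the matrix A in the question.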
Solution using nchoosek and accumarray/histc
This approach is based on the following idea: There is a bijection between all k-multicombinations and the matrix we are looking for. The multicombinations give the positions, where the entries should be added. For example the multicombination [1,1,1,1,3] will be mapped to [4,0,1], as there are four 1s, and one 3. This can be either converted using accumarray or histc. Here is the accumarray-approach:
function degrees = monomialDegrees(numVars, maxDegree)
if numVars==1
degrees = (0:maxDegree).';
return;
end
degrees = cell(maxDegree+1,1);
degrees{1} = zeros(1,numVars);
for n = 1:maxDegree
pos = nmultichoosek(1:numVars, n);
degrees{n+1} = accumarray([reshape((1:size(pos,1)).'*ones(1,n),[],1),pos(:)],1);
end
degrees = cell2mat(degrees);
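The mapping from a multicombination to a row of degrees can be checked on the example [1,1,1,1,3] mentioned above; pos and deg are illustrative names.

```matlab
% Sketch: map one multicombination to its degree vector with accumarray.
numVars = 3;
pos = [1 1 1 1 3];                            % four 1s and one 3
deg = accumarray(pos(:), 1, [numVars 1]).';   % count how often each variable occurs
```

Passing the size [numVars 1] keeps trailing zeros when the highest variable index does not occur.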
And here the alternative using histc:
function degrees = monomialDegrees(numVars, maxDegree)
if numVars==1
degrees = (0:maxDegree).';
return;
end
degrees = cell(maxDegree+1,1);
degrees(1:2) = {zeros(1,numVars); eye(numVars);};
for n = 2:maxDegree
pos = nmultichoosek(1:numVars, n);
degrees{n+1} = histc(pos.',1:numVars).';
end
degrees = cell2mat(degrees(1:maxDegree+1));
Both use the following function to generate multicombinations:
function combs = nmultichoosek(values, k)
if numel(values)==1
n = values;
combs = nchoosek(n+k-1,k);
else
n = numel(values);
combs = bsxfun(@minus, nchoosek(1:n+k-1,k), 0:k-1);
combs = reshape(values(combs),[],k);
end
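The index shift used in nmultichoosek can be sketched on a small case (n = 3 values, k = 2 picks, chosen for illustration); raw is an illustrative name.

```matlab
% Sketch: turn ordinary combinations into combinations with repetition.
n = 3; k = 2;
raw = nchoosek(1:n+k-1, k);           % strictly increasing pairs drawn from 1..4
combs = bsxfun(@minus, raw, 0:k-1);   % subtracting [0 1] makes repeats possible
```

Each row of combs is now a non-decreasing pair from 1..3, and there are nchoosek(n+k-1,k) = 6 of them.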
Benchmarking:
Benchmarking the above codes yields that the diff-solution is faster if your numVars is low and maxDegree high. If numVars is higher than maxDegree, then the histc solution will be faster.
Old approach:
This is an alternative to Dennis' approach of dec2base, which has a limit on the maximum base. It is still a lot slower than the above solutions.
function degrees = monomialDegrees(numVars, maxDegree)
Cs = cell(1,numVars);
[Cs{:}] = ndgrid(0:maxDegree);
degrees = reshape(cat(numVars+1, Cs{:}),(maxDegree+1)^numVars,[]);
degrees = degrees(sum(degrees,2)<=maxDegree,:);
I would solve it this way:
ncols=d;
colsum=p;
base=(0:colsum)';
v=@(dm)permute(base,[dm:-1:1]);
M=bsxfun(@plus,base,v(2));
for idx=3:ncols
M=bsxfun(@plus,M,v(idx));
end
L=M<=colsum;
A=cell(1,ncols);
[A{:}]=ind2sub(size(L),find(L));
a=cell2mat(A);
%subtract 1 because 1 based indexing but base starts at 0
a=a-1+min(base);
It builds up a d-dimensional array which contains the sums of the components. The efficiency of this code depends on sum(L(:))/numel(L); this quotient tells you how much of the created array is actually used for solutions. If this gets low for your input, there probably exists a better solution.
Here is a very easy way to do it:
L = dec2base(0:4^3-1,4);
idx=sum(num2str(L)-'0',2)<=3;
L(idx,:)
I think the first line can be very time efficient for creating a list of candidates, but unfortunately I don't know how to reduce the list in an efficient way after that.
So the second line works, but could use improvement performance wise.
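One way the filtering step might be written for general d and p is sketched below. It assumes p+1 <= 10 so that every digit is a character between '0' and '9', and digs is an illustrative name.

```matlab
% Sketch: dec2base candidates filtered numerically, assuming p+1 <= 10.
d = 3; p = 3;
digs = dec2base(0:(p+1)^d-1, p+1) - '0';   % char digits converted to numbers
A = digs(sum(digs, 2) <= p, :);            % keep rows whose components sum to at most p
```

The candidate list still has (p+1)^d rows, so this only pays off while that number stays small.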

Find the average value between each element of the array and its immediate neighbor

Suppose I have a matrix a which is 1 x n, and I want to find the average value between each element of a and its neighbors.
What's a smart way to do this?
EX:
If
a=[0 1 2 1 0 1];
Then the "average value matrix" is:
b=[0.5 1 1.33 1 0.67 0.5];
Where the first entry of b is:
b(1) = (0+1)/2 = 0.5
b(2) = (0+1+2)/3 = 1
etc.
I would suggest doing the middle as vector ops and handling the edge conditions as scalars.
b=zeros(size(a));
b(2:end-1)=(a(1:end-2)+a(2:end-1)+a(3:end))/3;
b(1)=(a(1)+a(2))/2;
b(end)=(a(end-1)+a(end))/2;
If you get into bigger averages...
% scale and sum elements with a sliding window 3 long.
b=conv(a,[1,1,1]/3)
%
% remove the tails
b=b(2:end-1)
%
% and rescale the edge cases.
b(1)=b(1)*3/2
b(end)=b(end)*3/2
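For larger windows the edge rescaling can be avoided by dividing by the number of samples each window actually covers. This is a sketch of that variation, not the answerer's code; w, num and cnt are illustrative names.

```matlab
% Sketch: windowed sum divided by the per-position sample count.
a = [0 1 2 1 0 1];
w = ones(1, 3);                         % window of 3; other odd lengths work the same way
num = conv(a, w, 'same');               % windowed sums
cnt = conv(ones(size(a)), w, 'same');   % how many samples each window actually covers
b = num ./ cnt;
```

Because cnt is computed with the same window, the edges are handled without special cases.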
I compared the first method above (vector), the convolution method, and the hankel method suggested by RDizzl3. (Sorry Luis, I don't have the Statistics package, though I expect the nanmean method to be slower due to the amount of condition checking.) The comparison was with a random vector a of length 10000, to make the timing significant. b was initialized to a zeros matrix of the correct size before these timings were done. The hankel matrix (h) of the correct size was precomputed before these timings as well.
% hankel method
tic; b(1)=mean(a([1,2])); b(2:(n-1))=mean(a(h),2); b(n)=mean(a([n-1,n])); toc
Elapsed time is 0.001698 seconds.
% convolution method
tic; c=conv(a,[1,1,1]/3) ; b=c(2:(2+n-1)); b(1)=b(1)*3/2; b(n)=b(n)*3/2; toc;
Elapsed time is 0.000339 seconds.
% vector method
tic; b(1)=mean(a([1,2])) ; b(2:(n-1))=(a(1:(n-2))+a(2:(n-1))+a(3:n))/3;b(n)=mean(a([n-1,n])); toc
Elapsed time is 0.000914 seconds.
I repeated the above 3 more times and sorted the results,
hankel convolution vector
9.2500e-04 3.3900e-04 7.2600e-04
1.3820e-03 5.2600e-04 8.7100e-04
1.6980e-03 5.5200e-04 9.1400e-04
2.1570e-03 5.5300e-04 2.6390e-03
I am a little surprised, I didn't expect the efficiency of the convolution approach to come out till larger window sizes. But it consistently did the best here.
Note that if you are using smaller data sets these timings probably aren't appropriate. I wouldn't at all be surprised if the hankel approach works better if the interest is in large numbers of shorter length vectors.
You can use this:
a=[0 1 2 1 0 1];
n = numel(a);
h = hankel(1:(n-2),(n-2):n);
b(1) = mean(a([1 2]))
b(2:(n-1)) = mean(a(h),2);
b(n) = mean(a([n-1 n]))
This will return the vector:
b = [0.5000 1.0000 1.3333 1.0000 0.6667 0.5000]
This takes the elements from the vector a and finds the average for its neighbors, so:
b(1) = (0+1)/2 = 0.5
b(2) = (0+1+2)/3 = 1
b(3) = (1+2+1)/3 = 1.3333
b(4) = (2+1+0)/3 = 1
b(5) = (1+0+1)/3 = 0.6667
b(6) = (0+1)/2 = 0.5 % last element
a = [0 1 2 1 0 1]; %// data
n = 1; %// how many neighbours to consider on each side
a2 = [NaN(1,n) a NaN(1,n)]; %// pad with NaN's (which will be ignored by nanmean)
b = arrayfun(@(k) nanmean(a2(k-n:k+n)), n+1:n+numel(a)); %// apply a
%// sliding-window mean ignoring NaN's
The easiest way is to use the smooth filter:
output=smooth(A,3,'moving');
where 3 is the window size (it should be an odd value).
Check the documentation for the smooth function:
https://www.mathworks.com/help/curvefit/smooth.html
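Without the Curve Fitting Toolbox, and assuming MATLAB R2016a or later, movmean is another built-in that could be used; this sketch is not taken from any of the answers above.

```matlab
% Sketch: moving average of window 3 with movmean (R2016a+).
a = [0 1 2 1 0 1];
b = movmean(a, 3);   % default 'Endpoints','shrink' averages only the samples that exist
```

With the default endpoint handling, the first and last entries are the mean of two elements, which matches the values worked out in the question.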