Addressing multiple ranges via indices in a vector - matlab

To begin, this problem is easily solvable with a for-loop. However, I'm trying to force/teach myself to think vector-wise to take advantage of what Matlab does best.
Simplified, here is the problem explanation:
I have a vector with data in it.
I have a 2xN array of start/stop indices that represent ranges of interesting data in the vector.
I want to perform calculations on each of those ranges, resulting in a number (N results, corresponding to each start/stop range.)
In code, here's a pseudoexample of what I'd like to have at the end:
A = 1:10000;
startIndicies = [5 100 1000];
stopIndicies = [10 200 5000];
...
calculatedResults = [func(A(5:10)) func(A(100:200)) func(A(1000:5000))]
The length of A, and of the start/stop index array is variable.
Like I said, I can easily solve this with a for loop. However since could be used with a large data set, I'd like to know if there's a good solution without a for loop.

Here is one possible solution, although, I won't call it a fully vectorized solution, rather a one liner one.
out = cellfun(#(i,j) fun(A(i:j)), num2cell(startIndicies), num2cell(stopIndicies) );
or, if you plan to have homogeneous outputs,
out = arrayfun(#(i,j) fun(A(i:j)), startIndicies, stopIndicies);

Related

Matlab vectorization of multiple embedded for loops

Suppose you have 5 vectors: v_1, v_2, v_3, v_4 and v_5. These vectors each contain a range of values from a minimum to a maximum. So for example:
v_1 = minimum_value:step:maximum_value;
Each of these vectors uses the same step size but has a different minimum and maximum value. Thus they are each of a different length.
A function F(v_1, v_2, v_3, v_4, v_5) is dependant on these vectors and can use any combination of the elements within them. (Apologies for the poor explanation). I am trying to find the maximum value of F and record the values which resulted in it. My current approach has been to use multiple embedded for loops as shown to work out the function for every combination of the vectors elements:
% Set the temp value to a small value
temp = 0;
% For every combination of the five vectors use the equation. If the result
% is greater than the one calculated previously, store it along with the values
% (postitions) of elements within the vectors
for a=1:length(v_1)
for b=1:length(v_2)
for c=1:length(v_3)
for d=1:length(v_4)
for e=1:length(v_5)
% The function is a combination of trigonometrics, summations,
% multiplications etc..
Result = F(v_1(a), v_2(b), v_3(c), v_4(d), v_5(e))
% If the value of Result is greater that the previous value,
% store it and record the values of 'a','b','c','d' and 'e'
if Result > temp;
temp = Result;
f = a;
g = b;
h = c;
i = d;
j = e;
end
end
end
end
end
end
This gets incredibly slow, for small step sizes. If there are around 100 elements in each vector the number of combinations is around 100*100*100*100*100. This is a problem as I need small step values to get a suitably converged answer.
I was wondering if it was possible to speed this up using Vectorization, or any other method. I was also looking at generating the combinations prior to the calculation but this seemed even slower than my current method. I haven't used Matlab for a long time but just looking at the number of embedded for loops makes me think that this can definitely be sped up. Thank you for the suggestions.
No matter how you generate your parameter combination, you will end up calling your function F 100^5 times. The easiest solution would be to use parfor instead in order to exploit multi-core calculation. If you do that, you should store the calculation results and find the maximum after the loop, because your current approach would not be thread-safe.
Having said that and not knowing anything about your actual problem, I would advise you to implement a more structured approach, like first finding a coarse solution with a bigger step size and narrowing it down successivley by reducing the min/max values of your parameter intervals. What you have currently is the absolute brute-force method which will never be very effective.

Permuting a vector efficiently in matlab

I want to make 1000 random permutations of a vector in matlab. I do it like this
% vector is A
num_A = length(A);
for i=1:1000
n = randperm(num_A);
A = A(n); % This is one permutation
end
This takes like 73 seconds. Is there any way to do it more efficiently?
Problem 1 - Overwriting the original vector inside loop
Each time A = A(n); will overwrite A, the input vector, with a new permutation. This might be reasonable since anyway you don't need the order but all the elements in A. However, it's extremely inefficient because you have to re-write a million-element array in every iteration.
Solution: Store the permutation into a new variable -
B(ii, :) = A(n);
Problem 2 - Using i as iterator
We at Stackoverflow are always telling serious Matlab users that using i and j as interators in loops is absolutely a bad idea. Check this answer to see why it makes your code slow, and check other answers in that page for why it's bad.
Solution - use ii instead of i.
Problem 3 - Using unneccessary for loop
Actually you can avoid this for loop at all since the iterations are not related to each other, and it will be faster if you allow Matlab do parallel computing.
Solution - use arrayfun to generate 1000 results at once.
Final solution
Use arrayfun to generate 1000 x num_A indices. I think (didn't confirm) it's faster than directly accessing A.
n = cell2mat(arrayfun(#(x) randperm(num_A), 1:1000', 'UniformOutput', false)');
Then store all 1000 permutations at once, into a new variable.
B = A(n);
I found this code pretty attractive. You can replace randperm with Shuffle. Example code -
B = Shuffle(repmat(A, 1000, 1), 2);
A = perms(num_A)
A = A(1:1000)
Perms returns all the different permutations, just take the first 1000 permutations.

Vectorize matlab code to map nearest values in two arrays

I have two lists of timestamps and I'm trying to create a map between them that uses the imu_ts as the true time and tries to find the nearest vicon_ts value to it. The output is a 3xd matrix where the first row is the imu_ts index, the third row is the unix time at that index, and the second row is the index of the closest vicon_ts value above the timestamp in the same column.
Here's my code so far and it works, but it's really slow. I'm not sure how to vectorize it.
function tmap = sync_times(imu_ts, vicon_ts)
tstart = max(vicon_ts(1), imu_ts(1));
tstop = min(vicon_ts(end), imu_ts(end));
%trim imu data to
tmap(1,:) = find(imu_ts >= tstart & imu_ts <= tstop);
tmap(3,:) = imu_ts(tmap(1,:));%Use imu_ts as ground truth
%Find nearest indecies in vicon data and map
vic_t = 1;
for i = 1:size(tmap,2)
%
while(vicon_ts(vic_t) < tmap(3,i))
vic_t = vic_t + 1;
end
tmap(2,i) = vic_t;
end
The timestamps are already sorted in ascending order, so this is essentially an O(n) operation but because it's looped it runs slowly. Any vectorized ways to do the same thing?
Edit
It appears to be running faster than I expected or first measured, so this is no longer a critical issue. But I would be interested to see if there are any good solutions to this problem.
Have a look at knnsearch in MATLAB. Use cityblock distance and also put an additional constraint that the data point in vicon_ts should be less than its neighbour in imu_ts. If it is not then take the next index. This is required because cityblock takes absolute distance. Another option (and preferred) is to write your custom distance function.
I believe that your current method is sound, and I would not try and vectorize any further. Vectorization can actually be harmful when you are trying to optimize some inner loops, especially when you know more about the context of your data (e.g. it is sorted) than the Mathworks engineers can know.
Things that I typically look for when I need to optimize some piece of code liek this are:
All arrays are pre-allocated (this is the biggest driver of performance)
Fast inner loops use simple code (Matlab does pretty effective JIT on basic commands, but must interpret others.)
Take advantage of any special data features that you have, e.g. use sort appropriate algorithms and early exit conditions from some loops.
You're already doing all this. I recommend no change.
A good start might be to get rid of the while, try something like:
for i = 1:size(tmap,2)
C = max(0,tmap(3,:)-vicon_ts(i));
tmap(2,i) = find(C==min(C));
end

Create a variable of a specific length and populate it with 0's and 1's

I am trying to use MATLAB in order to simulate a communications encoding and decoding mechanism. Hence all of the data will be 0's or 1's.
Initially I created a vector of a specific length and populated with 0's and 1's using
source_data = rand(1,8192)<.7;
For encoding I need to perform XOR operations multiple times which I was able to do without any issue.
For the decoding operation I need to implement the Gaussian Elimination method to solve the set of equations where I realized this vector representation is not very helpful. I tried to use strcat to append multiple 0's and 1's to a variable a using a for loop:
for i=1:8192
if(mod(i,2)==0)
a = strcat(a,'0');
else
a = strcat(a,'1');
end
i = i+1;
disp(i);
end
when I tried length(a) after this I found that the length was 16384, which is twice 8192. I am not sure where I am going wrong or how best to tackle this.
Did you reinitialize a before the example code? Sounds like you ran it twice without clearing a in between, or started with a already 8192 long.
Growing an array in a loop like this in Matlab is inefficient. You can usually find a vectorized way to do stuff like this. In your case, to get an 8192-long array of alternating ones and zeros, you can just do this.
len = 8192;
a = double(mod(1:len,2) == 0);
And logicals might be more suited to your code, so you could skip the double() call.
There are probably a few answer/questions here. Firstly, how can one go from an arbitrary vector containing {0,1} elements to a string? One way would be to use cellfun with the converter num2str:
dataDbl = rand(1,8192)<.7; %see the original question
dataStr = cellfun(#num2str, num2cell(dataDbl));
Note that cellfun concatenates uniform outputs.

What's the best way to iterate through columns of a matrix?

I want to apply a function to all columns in a matrix with MATLAB. For example, I'd like to be able to call smooth on every column of a matrix, instead of having smooth treat the matrix as a vector (which is the default behaviour if you call smooth(matrix)).
I'm sure there must be a more idiomatic way to do this, but I can't find it, so I've defined a map_column function:
function result = map_column(m, func)
result = m;
for col = 1:size(m,2)
result(:,col) = func(m(:,col));
end
end
which I can call with:
smoothed = map_column(input, #(c) (smooth(c, 9)));
Is there anything wrong with this code? How could I improve it?
The MATLAB "for" statement actually loops over the columns of whatever's supplied - normally, this just results in a sequence of scalars since the vector passed into for (as in your example above) is a row vector. This means that you can rewrite the above code like this:
function result = map_column(m, func)
result = [];
for m_col = m
result = horzcat(result, func(m_col));
end
If func does not return a column vector, then you can add something like
f = func(m_col);
result = horzcat(result, f(:));
to force it into a column.
Your solution is fine.
Note that horizcat exacts a substantial performance penalty for large matrices. It makes the code be O(N^2) instead of O(N). For a 100x10,000 matrix, your implementation takes 2.6s on my machine, the horizcat one takes 64.5s. For a 100x5000 matrix, the horizcat implementation takes 15.7s.
If you wanted, you could generalize your function a little and make it be able to iterate over the final dimension or even over arbitrary dimensions (not just columns).
Maybe you could always transform the matrix with the ' operator and then transform the result back.
smoothed = smooth(input', 9)';
That at least works with the fft function.
A way to cause an implicit loop across the columns of a matrix is to use cellfun. That is, you must first convert the matrix to a cell array, each cell will hold one column. Then call cellfun. For example:
A = randn(10,5);
See that here I've computed the standard deviation for each column.
cellfun(#std,mat2cell(A,size(A,1),ones(1,size(A,2))))
ans =
0.78681 1.1473 0.89789 0.66635 1.3482
Of course, many functions in MATLAB are already set up to work on rows or columns of an array as the user indicates. This is true of std of course, but this is a convenient way to test that cellfun worked successfully.
std(A,[],1)
ans =
0.78681 1.1473 0.89789 0.66635 1.3482
Don't forget to preallocate the result matrix if you are dealing with large matrices. Otherwise your CPU will spend lots of cycles repeatedly re-allocating the matrix every time it adds a new row/column.
If this is a common use-case for your function, it would perhaps be a good idea to make the function iterate through the columns automatically if the input is not a vector.
This doesn't exactly solve your problem but it would simplify the functions' usage. In that case, the output should be a matrix, too.
You can also transform the matrix to one long column by using m(:,:) = m(:). However, it depends on your function if this would make sense.