Efficient representation of low-complexity integer vectors - matlab

Let V be a vector of integers, and let L be the length of V.
Now, suppose that the number N of distinct values in V is much smaller than L.
One may also assume that V is sorted, so that it can be construed as the concatenation of N consecutive constant "blocks".
Lastly, one may assume that, once initialized, V is henceforth read-only (aka immutable).
(In the case I'm working with at the moment, L is between 106 and 107, and N is about 20.)
It is wasteful to store such low-complexity data in a standard MATLAB L-long vector. Does MATLAB have any built-in1 data structure that
has the same interface as a regular vector (e.g. one can read its k-th element with the expression V(k), its last element with V(end), ranges of locations with V(p:q), etc.).
uses much less storage space than L × the size of an integer.
?
BTW, the problem here is reminiscent to that of sparse-array representation, but not quite the same (at least AFAICT).
OK, here's my solution, based on gariepy's answer:
block_sizes = [5, 4, 3, 2];
block_values = 1000 + (1:numel(block_sizes));
interpolation_table = [0 cumsum(block_sizes)];
V = #(i) interp1(interpolation_table, [NaN block_values], i, 'next');
V(0)
ans =
NaN
V(1:5)
ans =
1001 1001 1001 1001 1001
V(6:9)
ans =
1002 1002 1002 1002
V(10:12)
ans =
1003 1003 1003
V(13:14)
ans =
1004 1004
V(15)
ans =
NaN
It has a tiny wart, though:
V(end)
ans =
1001
(It would have been better if it raised an exception when given end as arguments, rather than give a completely crazy answer.)
1 Of course, I know that I can always can try to roll my own implementation of such a thing, but I prefer not to re-invent wheels if I can avoid it.

One possible method to represent this data is an interpolation table.
Assume vec is your length L vector. First, count the number of occurrences:
[num_occurrences, y_values] = hist(vec, unique(vec));
Then, build the interpolation representation:
interp_table = zeros(1,length(y_values) + 1);
interp_table(1) = 1;
y_values(end+1) = y_values(end) + 1; % Need to create a "fake" value at the end of the interpolation table
for i = 2:length(y_values)
interp_table(i) = interp_table(i-1) + num_occurrences(i-1);
end
Finally, define a function handle to give you the "array-like" access you want.
my_fun = #(x) interp1(interp_table, y_values, x, 'previous');
Example:
>> vec = [1 1 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 ];
>> my_fun(1)
ans =
1
>> my_fun(2)
ans =
1
>> my_fun(3)
ans =
2
>> my_fun(6)
ans =
2
>> my_fun(7)
ans =
2
>> my_fun(8)
ans =
3
>> my_fun(17)
ans =
3
>> my_fun(18) %% This is vec(L+1), so it should never be needed
ans =
4
>> my_fun(19) %% Above index L+1, values are not defined
ans =
NaN
The examples demonstrate a minor caveat: values above my_fun(L) should not be used, where L is the length of the original vector that is being represented by the interpolation table. So this gives you array-like access, though you cannot directly calculate the "length" of this interpolation table.
EDIT: note that you CAN do ranges with this interpolation function:
>> my_fun(1:17)
ans =
Columns 1 through 15
1 1 2 2 2 2 2 3 3 3 3 3 3 3 3
Columns 16 through 17
3 3

Related

Create Non Zero elements of Matrix in vector form in Matlab

I have a Matrix of size M by N in which each row has some zero entries. I want to create M row vectors such that each of the vector contains the non zero elements of each row. For example if I have the following Matrix
A=[0 0 0 5;0 0 4 6;0 1 2 3;9 10 2 3]
I want four different row vectors of the following form
[5]
[4 6]
[1 2 3]
[9 10 2 3]
This can be done with accumarray using an anonymous function as fourth input argument. To make sure that the results are in the same order as in A, the grouping values used as first input should be sorted. This requires using (a linearized version of) A transposed as second input.
ind = repmat((1:size(A,2)).',1,size(A,2)).';
B = A.';
result = accumarray(ind(:), B(:), [], #(x){nonzeros(x).'});
With A = [0 0 0 5; 0 0 4 6; 0 1 2 3; 9 10 2 3]; this gives
result{1} =
5
result{2} =
4 6
result{3} =
1 2 3
result{4} =
9 10 2 3
Since Matlab doesn't support non-rectangular double arrays, you'll want to settle on a cell array. One quick way to get the desired output is to combine arrayfun with logical indexing:
nonZeroVectors = arrayfun(#(k) A(k,A(k,:)~=0),1:size(A,1),'UniformOutput',false);
I used the ('UniformOutput',false) name-value pair for the reasons indicated in the documentation (I'll note that the pair ('uni',0) also works, but I prefer verbosity). This input produces a cell array with the entries
>> nonZerosVectors{:}
ans =
5
ans =
4 6
ans =
1 2 3
ans =
9 10 2 3

creating diagonal matrix with selected elements

I have a 4x5 matrix called A from which I want to select randomly 3 rows, then 4 random columns and then select those elements which coincide in those selected rows and columns so that I have 12 selected elements.Then I want to create a diagonal matrix called B which will have entries either 1 or 0 so that multiplication of that B matrix with reshaped A matrix (20x1) will give me those selected 12 elements of A.
How can I create that B matrix? Here is my code:
A=1:20;
A=reshape(A,4,5);
Mr=4;
Ma=3;
Na=4;
Nr=5;
M=Ma*Mr;
[S1,S2]=size(A);
N=S1*S2;
y2=zeros(size(A));
k1=randperm(S1);
k1=k1(1:Ma);
k2=randperm(S2);
k2=k2(1:Mr);
y2(k1,k2)=A(k1,k2);
It's a little hard to understand what you want and your code isn't much help but I think I've a solution for you.
I create a matrix (vector) of zeros of the same size as A and then use bsxfun to determine the indexes in this vector (which will be the diagonal of B) that should be 1.
>> A = reshape(1:20, 4, 5);
>> R = [1 2 3]; % Random rows
>> C = [2 3 4 5]; % Random columns
>> B = zeros(size(A));
>> B(bsxfun(#plus, C, size(A, 1)*(R-1).')) = 1;
>> B = diag(B(:));
>> V = B*A(:);
>> V = V(V ~= 0)
V =
2
3
4
5
6
7
8
9
10
11
12
13
Note: There is no need for B = diag(B(:)); we could have simply used element by element multiplication in Matlab.
>> V = B(:).*A(:);
>> V = V(V ~= 0)
Note: This may be overly complex or very poorly put together and there is probably a better way of doing it. It's my first real attempt at using bsxfun on my own.
Here is a hack but since you are creating y2 you might as well just use it instead of creating the useless B matrix. The bsxfun answer is much better.
A=1:20;
A=reshape(A,4,5);
Mr=4;
Ma=3;
Na=4;
Nr=5;
M=Ma*Mr;
[S1,S2]=size(A);
N=S1*S2;
y2=zeros(size(A));
k1=randperm(S1);
k1=k1(1:Ma);
k2=randperm(S2);
k2=k2(1:Mr);
y2(k1,k2)=A(k1,k2);
idx = reshape(y2 ~= 0, numel(y2), []);
B = diag(idx);
% "diagonal" matrix 12x20
B = B(all(B==0,2),:) = [];
output = B * A(:)
output =
1
3
4
9
11
12
13
15
16
17
19
20
y2 from example.
y2 =
1 0 9 13 17
0 0 0 0 0
3 0 11 15 19
4 0 12 16 20

maximum value column with the condition of another column [duplicate]

I need to find the maximum among values with same labels, in matlab, and I am trying to avoid using for loops.
Specifically, I have an array L of labels and an array V of values, same size. I need to produce an array S which contains, for each value of L, the maximum value of V. An example will explain better:
L = [1,1,1,2,2,2,3,3,3,4,4,4,1,2,3,4]
V = [5,4,3,2,1,2,3,4,5,6,7,8,9,8,7,6]
Then, the values of the output array S will be:
s(1) = 9 (the values V(i) such that L(i) == 1 are: 5,4,3,9 -> max = 9)
s(2) = 8 (the values V(i) such that L(i) == 2 are: 2,1,2,8 -> max = 8)
s(3) = 7 (the values V(i) such that L(i) == 3 are: 3,4,5,7 -> max = 7)
s(4) = 8 (the values V(i) such that L(i) == 4 are: 6,7,8,6 -> max = 8)
this can be trivially implemented by traversing the arrays L and V with a for loop, but in Matlab for loops are slow, so I was looking for a faster solution. Any idea?
This is a standard job for accumarray.
Three cases need to be considered, with increasing generality:
Integer labels.
Integer labels, specify fill value.
Remove gaps; or non-integer labels. General case.
Integer labels
You can just use
S = accumarray(L(:), V(:), [], #max).';
In your example, this gives
>> L = [1 1 1 2 2 2 3 3 3 4 4 4 1 2 3 7];
>> V = [5 4 3 2 1 2 3 4 5 6 7 8 9 8 7 6];
>> S = accumarray(L(:), V(:), [], #max).'
S =
9 8 7 8
Integer labels, specify fill value
If there are gaps between integers in L, the above will give a 0 result for the non-existing labels. If you want to change that fill value (for example to NaN), use a fifth input argument in acccumarray:
S = accumarray(L(:), V(:), [], #max, NaN).';
Example:
>> L = [1 1 1 2 2 2 3 3 3 4 4 4 1 2 3 7]; %// last element changed
>> V = [5 4 3 2 1 2 3 4 5 6 7 8 9 8 7 6]; %// same as in your example
>> S = accumarray(L(:), V(:), [], #max, NaN).'
S =
9 8 7 8 NaN NaN 6
Remove gaps; or non-integer labels. General case
When the gaps between integer labels are large, using a fill value may be inefficient. In that case you may want to get only the meaningful values in S, without fill values, i.e.skip non-existing labels. Also, it may be the case that L doesn't necessarily contain integers.
These two issues are solved by applying unique to the labels before using accumarray:
[~, ~, Li] = unique(L); %// transform L into consecutive integers
S = accumarray(Li(:), V(:), [], #max, NaN).';
Example:
>> L = [1.5 1.5 1.5 2 2 2 3 3 3 4 4 4 1 2 3 7.8]; %// note: non-integer values
>> V = [5 4 3 2 1 2 3 4 5 6 7 8 9 8 7 6 ]; %// same as in your example
>> [~, ~, Li] = unique(L); %// transform L into consecutive integers
>> S = accumarray(Li(:), V(:), [], #max, NaN).'
S =
9 5 8 7 8 6
helper=[L.', V.'];
helper=sortrows(helper,-2);
[~,idx,~]=unique(helper(:,1));
S=helper(idx,2);
What I do is: I join the two arrays as columns. Then I sort them regarding second column with biggest element first. Then I get the idx of the unique Values in L before I return the corresponding Values from V.
The solution from Luis Mendo is faster. But as far as I see his solution doesn't work if there is a zero,negative value or a noninteger inside L:
Luis solution: Elapsed time is 0.722189 seconds.
My solution: Elapsed time is 2.575943 seconds.
I used:
L= ceil(rand(1,500)*10);
V= ceil(rand(1,500)*250);
and ran the code 10000 times.

Matlab: avoid for loops to find the maximum among values with same labels

I need to find the maximum among values with same labels, in matlab, and I am trying to avoid using for loops.
Specifically, I have an array L of labels and an array V of values, same size. I need to produce an array S which contains, for each value of L, the maximum value of V. An example will explain better:
L = [1,1,1,2,2,2,3,3,3,4,4,4,1,2,3,4]
V = [5,4,3,2,1,2,3,4,5,6,7,8,9,8,7,6]
Then, the values of the output array S will be:
s(1) = 9 (the values V(i) such that L(i) == 1 are: 5,4,3,9 -> max = 9)
s(2) = 8 (the values V(i) such that L(i) == 2 are: 2,1,2,8 -> max = 8)
s(3) = 7 (the values V(i) such that L(i) == 3 are: 3,4,5,7 -> max = 7)
s(4) = 8 (the values V(i) such that L(i) == 4 are: 6,7,8,6 -> max = 8)
this can be trivially implemented by traversing the arrays L and V with a for loop, but in Matlab for loops are slow, so I was looking for a faster solution. Any idea?
This is a standard job for accumarray.
Three cases need to be considered, with increasing generality:
Integer labels.
Integer labels, specify fill value.
Remove gaps; or non-integer labels. General case.
Integer labels
You can just use
S = accumarray(L(:), V(:), [], #max).';
In your example, this gives
>> L = [1 1 1 2 2 2 3 3 3 4 4 4 1 2 3 7];
>> V = [5 4 3 2 1 2 3 4 5 6 7 8 9 8 7 6];
>> S = accumarray(L(:), V(:), [], #max).'
S =
9 8 7 8
Integer labels, specify fill value
If there are gaps between integers in L, the above will give a 0 result for the non-existing labels. If you want to change that fill value (for example to NaN), use a fifth input argument in acccumarray:
S = accumarray(L(:), V(:), [], #max, NaN).';
Example:
>> L = [1 1 1 2 2 2 3 3 3 4 4 4 1 2 3 7]; %// last element changed
>> V = [5 4 3 2 1 2 3 4 5 6 7 8 9 8 7 6]; %// same as in your example
>> S = accumarray(L(:), V(:), [], #max, NaN).'
S =
9 8 7 8 NaN NaN 6
Remove gaps; or non-integer labels. General case
When the gaps between integer labels are large, using a fill value may be inefficient. In that case you may want to get only the meaningful values in S, without fill values, i.e.skip non-existing labels. Also, it may be the case that L doesn't necessarily contain integers.
These two issues are solved by applying unique to the labels before using accumarray:
[~, ~, Li] = unique(L); %// transform L into consecutive integers
S = accumarray(Li(:), V(:), [], #max, NaN).';
Example:
>> L = [1.5 1.5 1.5 2 2 2 3 3 3 4 4 4 1 2 3 7.8]; %// note: non-integer values
>> V = [5 4 3 2 1 2 3 4 5 6 7 8 9 8 7 6 ]; %// same as in your example
>> [~, ~, Li] = unique(L); %// transform L into consecutive integers
>> S = accumarray(Li(:), V(:), [], #max, NaN).'
S =
9 5 8 7 8 6
helper=[L.', V.'];
helper=sortrows(helper,-2);
[~,idx,~]=unique(helper(:,1));
S=helper(idx,2);
What I do is: I join the two arrays as columns. Then I sort them regarding second column with biggest element first. Then I get the idx of the unique Values in L before I return the corresponding Values from V.
The solution from Luis Mendo is faster. But as far as I see his solution doesn't work if there is a zero,negative value or a noninteger inside L:
Luis solution: Elapsed time is 0.722189 seconds.
My solution: Elapsed time is 2.575943 seconds.
I used:
L= ceil(rand(1,500)*10);
V= ceil(rand(1,500)*250);
and ran the code 10000 times.

Removing third dimension of matrix

Lets say I have matrix such that A(:,:,1)=[1,2,3;2,3,4], A(:,:,2)=[3,4,5;4,5,6].
How is the easiest way of accessing and plotting the vectors (1,2,3),(2,3,4),(3,4,5),(4,5,6). I tried creating B=[A(:,:,1);A(:,:,2)], but i need a procedure to arbitrary number of A's.
Hope this isn't trivial and I've formulated myself satisfactory.
You should think 'vertically'. This will allow you to use colon indexing:
>> A(:,:,1) = [1,2,3;2,3,4].'; %'// NOTE: transpose of your original
>> A(:,:,2) = [3,4,5;4,5,6].'; %'// NOTE: transpose of your original
>> A(:,:)
ans =
1 2 3 4
2 3 4 5
3 4 5 6
The colon indexing with two colons works for any dimension A:
>> A(:,:,:,:,1,1) = [1 2 3; 2 3 4].'; %'
>> A(:,:,:,:,2,1) = [3 4 5; 4 5 6].'; %'
>> A(:,:,:,:,1,2) = [5 6 7; 6 7 8].'; %'
>> A(:,:,:,:,2,2) = [7 8 9; 8 9 0].'; %'
>> A(:,:)
ans =
1 2 3 4 5 6 7 8
2 3 4 5 6 7 8 9
3 4 5 6 7 8 9 0
Colon indexing in MATLAB is quite interesting and really powerful once you master it. For example, if you use fewer colons than there are dimensions in the array (like above), MATLAB will automatically concatenate the remainder of the data along the dimension equal to the colon count.
So, if A has 48 dimensions, but you index with just 2 colons: you'll get a 2D array, that is the concatenation of the remaining 46 dimensions along the 2nd dimension.
In general: if A has N dimensions, but you index with just M ≤ N colons: you'll get an M-D array, that is the concatenation of the remaining N-M dimensions along the Mth dimension.
So as long as you are free to define your A to contain vectors on the columns rather than the rows (you should advise everyone to do this, as virtually everything in MATLAB is a bit faster that way), I think this is the fastest and most elegant way to do what you want.
If not, well, then just reshape like Dan :)
Assuming the order does not matter, here is how you can do it for vectors of length 3:
B = reshape(shiftdim(A,2), [], 3)
plot(B')
For vectors of arbitrary dimensions, replace 3 by size(A,2)