Pad cell array with whitespace and rearrange - matlab

I have a 2D cell-array (A = 2x3) containing numerical vectors of unequal length, in this form:
1x3 1x4 1x2
1x7 1x8 1x3
*Size of A (in both dimensions) can be variable
I want to pad each vector with whitespace {' '} to equalise their lengths to lens = max(max(cellfun('length',A))); (in this case, all vectors will become 1x8 in size) and then rearrange the cell array into the form below so that it can be converted to a columnar table using cell2table (using sample data):
4 1 2 1 3 4
8 5 8 4 7 9
10 12 11 5 [] 11
[] 13 21 7 [] []
[] 15 [] 11 [] []
[] 18 [] 23 [] []
[] 21 [] 29 [] []
[] [] [] 32 [] []
[ ] = Whitespace
i.e. columns are in the order A{1,1}, A{2,1}, A{1,2}, A{2,2}, A{1,3} and A{2,3}.
If A = 4x3, the first five columns after the rearrangement would be A{1,1}, A{2,1}, A{3,1}, A{4,1} and A{1,2}.

My version of MATLAB (R2013a) does not have cell2table so, like Stewie Griffin, I'm not sure which exact format you need for the conversion.
I am also not sure that padding vectors of double with whitespace is such a good idea. Strings and doubles are not convenient to mix, especially if you just want cell array columns of a homogeneous type (as opposed to columns where each element is a cell). It means you have to:
convert your numbers to strings first (e.g. char arrays).
since each column will be a char array, its rows need to be homogeneous in dimension, so you have to find the longest string and make them all the same length.
finally, pad your char array columns with the necessary number of whitespace rows.
One way to do that requires multiple cellfun calls to probe for all the information we need before we can actually do the padding/reshaping:
%// get the length of the longest vector
Lmax = max(max(cell2mat(cellfun( @numel , A , 'uni',0)))) ;
%// get the maximum number of digits (floor(log10)+1 also handles exact powers of 10)
n = max(max(cell2mat(cellfun( @(x) max(floor(log10(x)))+1 , A , 'uni',0))))
%// prepare string format based on "n"
fmt = sprintf('%%0%dd',n) ;
%// pad columns with necessary number of whitespace
b = cellfun( @(c) [num2str(c(:),fmt) ; repmat(' ', Lmax-numel(c),n)], A ,'uni',0 ) ;
%// reshape to get final desired result
b = b(:).'
b =
[8x2 char] [8x2 char] [8x2 char] [8x2 char] [8x2 char] [8x2 char]
Note that a call to str2num on that would yield your original cell array (almost, less a reshape operation), as str2num will ignore (return empty) the whitespace entries.
>> bf = cellfun( @str2num , b,'un',0 )
bf =
[3x1 double] [7x1 double] [4x1 double] [8x1 double] [2x1 double] [3x1 double]
If I was dealing with numbers, I would definitely prefer padding with a numeric type (also makes the operation slightly easier). Here's an example padding with 'NaN's:
%// get the length of the longest vector
Lmax = max(max(cell2mat(cellfun( @numel , A , 'un',0)))) ;
%// pad columns with necessary number of NaN
b = cellfun( @(c) [c(:) ; NaN(Lmax-numel(c),1)], A ,'un',0 ) ;
%// reshape to get final desired result
b = b(:).'
b =
[8x1 double] [8x1 double] [8x1 double] [8x1 double] [8x1 double] [8x1 double]
If you do not like operating with NaNs, you could choose a numeric value which is not among the possible values of your dataset. For example if all your values are supposed to be positive integers, -1 is a good indicator of a special value.
%// choose your NULL value indicator
nullNumber = -1 ;
b = cellfun( @(c) [c(:) ; zeros(Lmax-numel(c),1)+nullNumber], A ,'un',0 ) ;
b = b(:).'
cell2mat(b)
ans =
4 1 2 1 3 4
8 5 8 4 7 9
10 12 11 5 -1 11
-1 13 21 7 -1 -1
-1 15 -1 11 -1 -1
-1 18 -1 23 -1 -1
-1 21 -1 29 -1 -1
-1 -1 -1 32 -1 -1
Note:
If -1 is a possible value for your set, and you still don't want to use NaN, a value widely used in my industry (which is totally allergic to NaN) as a null indicator for all real numbers is -999.25. Unless you have a very specific application, the probability of getting exactly this value during normal operation is so infinitesimal that it is ok for most software algorithms to recognize a null value when they come across -999.25. (sometimes they use only -999 if they deal with integers only.)
Also note the use of c(:) in the cellfun calls. This makes sure that the vector in each cell is arranged as a column, regardless of its original shape (your initial vectors are actually rows, as you have them in your example).
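To connect the NaN-padded result back to the table the question asks for, here is a minimal sketch. It assumes a release that ships the table type (array2table arrived together with cell2table, so it is not available in R2013a), and it reuses the sample data from the question. The variable names padded, M and T are only illustrative.

```matlab
% Sample data from the question: a 2x3 cell array of row vectors
A = {[4 8 10],               [2 8 11 21],             [3 7]; ...
     [1 5 12 13 15 18 21],   [1 4 5 7 11 23 29 32],   [4 9 11]};

% Length of the longest vector
Lmax = max(cellfun(@numel, A(:)));

% Pad every vector to an Lmax-by-1 column with NaN
padded = cellfun(@(c) [c(:); NaN(Lmax - numel(c), 1)], A, 'UniformOutput', false);

% A(:) walks the cell array column-major, i.e. A{1,1}, A{2,1}, A{1,2}, ...
M = cell2mat(padded(:).');   % Lmax-by-numel(A) numeric matrix

% One table variable per original vector (default names M1, M2, ...)
T = array2table(M);
```

Keeping the data numeric means the table columns can still be used in arithmetic, and isnan(M) tells you which entries are padding.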

Unfortunately, I don't have time to test this, but I believe this should work if you want to do this fast and simple, without having to write explicit loops.
b = cellfun(@(c) [c, repmat(' ', 1, 197-numel(c))], A,'UniformOutput',0)
Edit:
I don't have MATLAB here, and I have never used table before, so I don't know exactly how it works. But I assume the easiest way to do this is to use the line above and, instead of padding with spaces, pad with NaNs. After that, once you have made your table with NaNs, you can replace them with something like:
B = A(:); % Straighten it out
C = cellfun(@(c) [c, NaN(1, 8-numel(c))], B,'UniformOutput',0) % 1x8 vectors
%% Create table %%
tab(isnan(tab)) = ' '; % NaN == NaN is always false, so test with isnan
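A point worth checking in any NaN-replacement step like the one above: NaN never compares equal to anything, itself included, so the mask has to come from isnan. A small sketch on a plain numeric vector (the names v, eqMask and nanMask are illustrative):

```matlab
v = [1 NaN 3];

% Comparing against NaN with == is false everywhere, even at the NaN entry
eqMask = (v == NaN);

% isnan is the reliable test
nanMask = isnan(v);

% Example use: swap the padding value for a numeric null indicator
w = v;
w(nanMask) = -1;
```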
Sorry if this didn't help. It's all I can do at the moment.

Padding a vector with a white space:
YourString = 'text here';
YourString = [YourString ' '];
in case only 1 whitespace is required. If more are needed you can loop this code to get the wanted number of spaces attached.
table itself already has the functionality to print cells.
Thanks to @StewieGriffin:
[YourString, repmat(' ',1,197-numel(YourString))]
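As a loop-free way to pad to a fixed width, here is a minimal sketch using the built-in blanks function. The target width of 20 is a hypothetical value chosen for illustration, and the max(..., 0) guard is an added assumption so that strings already longer than the target are left unchanged.

```matlab
str = 'text here';
targetLen = 20;                              % hypothetical target width

% Number of spaces still needed (never negative)
padLen = max(targetLen - numel(str), 0);

% blanks(n) returns a 1-by-n char array of spaces
padded = [str, blanks(padLen)];
```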

Related

size of inner elements of cells

I'm reading some data along with an attribute (say A, in which the first row holds ids and the second row holds their attribute values). I'd like to place this data in a cell where the first column holds the unique ids and the following columns hold their attribute values. Whenever an id has more than one attribute value, I put the extra value in the next free slot of its row. For example, I'd like to construct C:
A =
1 2 3 2
2 4 5 9
C{1}=
1 2 0
2 4 9
3 5 0
When I test the size of the inner elements of the cell, e.g.
size(C{1},2)
ans = 3
size(C{1},1)
ans = 3
size(C{1}(1,:),2)
ans = 3
All return 3, since the empty slots are filled with 0. So how do I know where to put new data (e.g. (1,5))? Should I traverse the row, or find the position of the 0 and insert there?
Thanks for any help.
Why not use a cell array for this kind of problem? How did you generate your C matrix?
Even though you have used a cell array for C, each element of C is a matrix in your case, so its dimensions have to be constant.
I have used a cell array of vectors instead, i.e. each element takes its own size based on the number of duplicates. For example, you can see that C{2,2} has two values while C{1,2} and C{3,2} have only one value each. You can easily check their sizes without checking for zeros. Note that this code still works even if some of the values are zero.
The first column of the cell array holds the identifiers, while the second column holds the values, each entry sized according to the number of duplicates.
Here is my implementation using accumarray and unique to generate C as a cell array.
Code:
C = [num2cell(unique(A(1,:).')), accumarray(A(1,:).',A(2,:).',[],@(x) {x.'})]
Your Sample Input:
A = [1 2 3 2;
2 4 5 9];
Output:
>> C
C =
[1] [ 2]
[2] [1x2 double]
[3] [ 5]
>> size(C{2,2},2)
ans =
2
>> size(C{1,2},2)
ans =
1
From the DOC
Note: If the subscripts in subs are not sorted with respect to their linear indices, then accumarray might not always preserve the order of the data in val when it passes them to fun. In the unusual case that fun requires that its input values be in the same order as they appear in val, sort the indices in subs with respect to the linear indices of the output.
Another Example:
Input:
A = [1 2 1 2 3 1;
2 4 5 9 4 8];
Output:
C =
[1] [1x3 double]
[2] [1x2 double]
[3] [ 4]
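The note quoted from the documentation matters if the values within each id should keep their original order. A minimal sketch of one way to follow it: sort the subscripts first (MATLAB's sort is stable, so ties keep their input order) and reorder the values to match. The last lines show how a new pair such as (1,5) could then be appended without searching for zeros. The names ids, grp, order and row are illustrative.

```matlab
A = [1 2 3 2;
     2 4 5 9];

% Map each id to a group number 1..numel(ids)
[ids, ~, grp] = unique(A(1,:).');

% Sort the subscripts and reorder the values the same way
[grpSorted, order] = sort(grp);
vals = A(2, order).';

% Collect the values of each id into its own row vector
C = [num2cell(ids), accumarray(grpSorted, vals, [], @(x) {x.'})];

% Append a new (id, value) pair, here (1,5), to the matching row
row = find([C{:,1}] == 1);
C{row,2}(end+1) = 5;
```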
Hope this helps!!

Find intersection of two cell arrays in MATLAB

I have two cell arrays, C and D, which contain numerical data (but some cells are empty). The data inside each cell may be a 2D array. I want to find the intersection of each cell in C with each cell in D.
How can I do this?
For example, if the size of C and D is 10-by-10:
C= [ {1 2 } ,{ 3 4},.... etc]
D = [ { 1 34 7} , {2 5},... etc]
Out = c intersect D
out= [ { 1} , {},.... etc]
>> C = {1 [2 3 4; 5 6 7] [] [] 5};
>> D = {1:2 3:5 6 7:9 []};
>> R = cellfun(@(c, d) intersect(c(:), d(:)), C, D, 'uniformoutput', 0);
>> R{:}
ans =
1
ans =
3
4
5
ans =
Empty matrix: 1-by-0
ans =
Empty matrix: 0-by-1
ans =
Empty matrix: 1-by-0
If your data is purely numeric (each cell either contains a single numeric value or is empty), I suggest converting it to a numeric array and using the intersect function. It is easy to represent missing values as NaN.
To convert to double:
tmp = {1, 2, 3, 4, 5, []};
% // Getting rid of the empties
index_empties = cellfun(@isempty, tmp);
tmp(index_empties) = {NaN};
% // converting to double
tmp_double = cellfun(@double, tmp);
Cells are there to make it easy to build a vector with non-homogeneous data types (strings and numbers, for example). It is quite common to see people using cells to store numbers. While this might be valid in some cases, using cells to store homogeneous data wastes memory and complicates some operations. For example, you cannot easily sum two cell vectors holding numeric data, while summing two double vectors is trivial.
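One property makes NaN convenient as the missing-value marker here: intersect treats NaN values as distinct from each other, so a NaN placeholder in one array never matches a NaN placeholder in the other. A minimal sketch, where the second vector y is hypothetical data made up for the illustration:

```matlab
tmp = {1, 2, 3, 4, 5, []};

% Replace empties by NaN, then convert to a plain double vector
tmp(cellfun(@isempty, tmp)) = {NaN};
x = cell2mat(tmp);            % [1 2 3 4 5 NaN]

y = [2 4 NaN 7];              % hypothetical second data set

% The NaN placeholders do not show up in the result
common = intersect(x, y);
```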

How do I remove the elements of one vector from another?

I have a double value, A, which is
[1,4,7,6]
I also have B, which is an array that contains many more values. I have a new variable, C, which is essentially a double value of all these numbers (all of them in one cell, vs. five).
[1,4,7,6]
[2,6,9,12]
[3,1,17,13]
[5,7,13,19]
[1,5,9,15]
How do I remove the elements (not the actual values) from C? I want to end up with this.
[2,6,9,12,3,1,17,13,5,7,13,19,1,5,9,15]
How do I get this? I've used these commands:
C(A) = [];
and
C = C(setdiff(1:length(C),A));
The problem is that when I run that command, I get this instead of what I want.
[4,7,2,12,3,1,17,13,5,7,13,19,1,5,9,15]
Clearly that isn't the same as what I have. It's throwing off the rest of my results and I need to fix this specific issue.
Thanks in advance :)
EDIT:
So I figured out that it's spewing the CORRECT numbers out, just in the wrong order. I have to sort it in order for it to work correctly. This is a problem because it causes the next command to be non-functional because the ismember command has issues with the removal (I don't know why, I'm still working on it).
Double array case
If B is a double array, you can use setdiff with 'rows' and 'stable' options, like so -
C = reshape(setdiff(B,A,'rows','stable').',1,[])
With ismember, you can perform the same operation, like so -
C = reshape(B(~ismember(B,A,'rows'),:).',1,[])
You can also use a bsxfun approach as suggested by @Amro -
C = reshape(B(~all(bsxfun(@eq, B, A),2),:).',1,[])
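To see why the attempts in the question went wrong, it helps to run both side by side. In the sketch below (using the data from the question, with B assumed to be a 5x4 double array), C(A) = [] treats the values of A as positions, so it deletes the 1st, 4th, 6th and 7th elements of the flattened data instead of the row that equals A. The names flat, wrong and keep are illustrative.

```matlab
A = [1 4 7 6];
B = [1 4 7 6; 2 6 9 12; 3 1 17 13; 5 7 13 19; 1 5 9 15];

% Flatten B row by row, as in the question
flat = reshape(B.', 1, []);

% What the question tried: A is used as a list of indices
wrong = flat;
wrong(A) = [];

% Row-wise comparison: drop the rows of B that are equal to A
keep = ~ismember(B, A, 'rows');
C = reshape(B(keep, :).', 1, []);
```

Here wrong starts with 4 7 2 12, which is exactly the unwanted output reported in the question, while C holds the sixteen values that were asked for.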
Cell array case
If B is a cell array with number of elements in each cell equal to the number of elements in A, then you can firstly convert it to a double array - B = vertcat(B{:}) and then use either of the above mentioned tools.
Or you can use a cellfun based approach that avoids conversion to a double array, like so -
excl_rows = B(~cellfun(@(x1,x2) isequal(x1,x2), B, repmat({A},size(B,1),1)),:)
C = horzcat(excl_rows{:})
Or another cellfun based approach that avoids repmat -
exclB = B(~cellfun(@(x1) isequal(x1,A), B),:)
C = horzcat(exclB{:})
Example with explanation -
%// Inputs
A = [1,4,7,6]
B = {[1,4,7,6]
[2,6,9,12]
[3,1,17,13]
[5,7,13,19]
[1,5,9,15]}
%// Compare each cell of B with A for equality.
%// The output must be a binary array where one would be for cells that have
%// elements same as A and zero otherwise.
ind = cellfun(@(x1) isequal(x1,A), B)
%// Thus, ~ind would be a binary array where one would represent unequal
%// cells that are to be selected in B for the final output.
exclB = B(~ind)
%// exclB is still a cell array with the cells that are different from A.
%// So, concatenate the elements from exclB into a vector as requested.
C = horzcat(exclB{:})
Output -
A =
1 4 7 6
B =
[1x4 double]
[1x4 double]
[1x4 double]
[1x4 double]
[1x4 double]
ind =
1
0
0
0
0
exclB =
[1x4 double]
[1x4 double]
[1x4 double]
[1x4 double]
C =
2 6 9 12 3 1 17 13 5 7 13 19 1 5 9 15

Cell array or matrix for different element's sizes per iteration

No code just visually:
Iteration i result j1 result j2
1 10 15 20 15 25 2
2 5 8
. . . . .
. . . . . . . .
i j1 with length(x), x=0:100 j2 with length == 1
edit for better representation:
[10 15 20 15 25] [1] (i=1)
[5] [2] (i=2)
Matrix(i) = [ [. . . . . . . ] [3] ]
[..] [.]
[j1 = size (x)] [j2 size 1 * 1] (i=100)
so the Matrix dimension is: i (rows) * 2 (columns)
(e.g. for i = 1, j1 with size(x) goes in column 1 of row 1, and j2 with size 1 goes in column 2 of row 1)
I want to save each iteration's results to a matrix in order to use them for comparison.
Can this be done with a matrix, or is it better done with a cell array? Please write an example for reference.
Thanks in advance.
I would go with a cell array for a cleaner, more intuitive implementation, with fewer constraints.
nIterations = 500;
J = cell(nIterations, 2);
for i=1:nIterations
length_x = randi(100); % random size of J1
J{i,1} = randi(100, length_x, 1); % J1
J{i,2} = randi(i); % J2
end
In addition you get some extra benefits such as:
Access an element along and within the cell array
J{10, 1}; J{10, 2};
Append/modify within each element without changing the overall structure
J{10, 1} = [J{10, 1}; 0]
Append to the array (adding iterations), like in a normal array
J{end+1, 1} = 1; J{end, 2} = 1
Apply functions in each entry (vector) using cellfun
length_J = cellfun(@length, J(:,1)); % get numel/length of J1
mean_J = cellfun(@mean, J(:,1)); % get mean of J1
EDIT: scroll to profiling for a comparison (which shows the cell implementation winning).
You can do it with a matrix which has i rows and 101 columns (values of j1 in the first 100, filled up with NaNs (*) when necessary, then the value of j2),
so that you can do easy comparisons, given that it is an unambiguous representation. That is, using 101 columns you make sure the j1's do not end with a NaN.
(*){NaN's or 0's depending on which one is more convenient}
You could also do 102 columns where the first column gives the length of j1, then comes the value of j1 followed by the NaN's, then the value of j2.
Say j1=[3 1 10 5], j2=2, then the corresponding row is [4 3 1 10 5 NaN ... NaN 2].
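The 102-column layout can be sketched as a pack/unpack pair. This is only an illustration of the row format described above, with maxLen = 100 as the assumed upper bound on the length of j1; the names row, n, j1Back and j2Back are hypothetical.

```matlab
maxLen = 100;            % assumed upper bound on numel(j1)
j1 = [3 1 10 5];
j2 = 2;

% Pack: [length of j1, j1 itself, NaN padding, j2]
row = [numel(j1), j1, NaN(1, maxLen - numel(j1)), j2];

% Unpack: the stored length tells where j1 ends
n = row(1);
j1Back = row(2:n+1);
j2Back = row(end);
```

Storing the length up front means a j1 that legitimately contains NaN or 0 can still be recovered unambiguously.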
The benefit of this matrix-approach is
it should be faster than cells (on not too large number of rows) since Matlab is very good at handling fixed-size matrices.
Also, basic operations (like comparisons) are slightly easier to program. (you only have to compare two vectors, you can do multiple comparisons on the same line.)
The drawbacks of the matrix approach are
you cannot easily append to j1 (well, a bit easier when you do the 102 column-approach),
there is a limit on the size of j1. (In this case, 100.)
All in all, cells are slower in general and possibly a bit more lengthy to program with, but more flexible.
I hope this points you to the right direction.
EDIT:
Third approach with 2 matrices:
j1results = zeros( n_iterations, maxlen_j1 );
j2results = zeros( n_iterations, 1);
Then the computation goes like so:
[j1results(k,:), j2results(k)] = compute(k);
where the compute is a function that returns two different values.
PROFILING:
function [J1,J2] = compute(k)
J1 = zeros(1,100); %this is necessary
% some dummy assignments
len = randi(100,1);
J1(1:len) = k*ones(1,len);
J2 = k;
end
function res = compute_cell(k) % for the cell-solution
res = cell(1,2);
len = randi(100,1);
res{1} = k*ones(1,len);
res{2} = k;
end
n=100000;
tic;
J12 = cell(n,2);
for i=1:n
J12{i}=compute_cell(i);
end
toc
tic;
J1 = zeros(n,100);
J2 = zeros(n,1);
for i=1:n
[J1(i,:), J2(i)] = compute(i);
end
toc
Result:
Elapsed time is 2.437634 seconds.
Elapsed time is 2.741491 seconds.
(Also profiled with a len distribution of UNI[50,100], where the matrix implementation's disadvantage of allocating unnecessary memory is less dominant; the picture stays the same.)
Bottom line: surprisingly, profiling says the cell implementation beats the matrix implementation in every aspect.
From your graphic representation one can see that you need an N_Rows x 2 structure without a regular pattern (indeed there is a kind of stochasticity in it), so you must store your data in a way that supports such irregularity. Thus, cells are the natural solution, as others say.
As far as I can see, the elements you need to insert in your structure, although stochastic, are independent of one another, so you can still fruitfully vectorize.
You have 2 stochastic contributions, independent of one another:
the number of elements in the first column of your structure is random;
the elements in your structure are random.
Let us consider the contributions separately:
you have N_Rows rows, each with a variable number of elements. Let's say that there are at most N_El of them (i.e. N_El is an upper bound for the number of entries per row). Let's generate the number of elements per row with
elem_N = randi(N_El, [N_Rows 1]);
you have to generate exactly sum(elem_N) random numbers (for point 2) that will be distributed among the rows after being partitioned according to elem_N.
Here is the final code I suggest
N_ROW = 20;
N_EL = 10;
MAX_int = 20; %maximum random integer in each row
elem_N = randi(N_EL,[N_ROW , 1]); % elements per line stochasticity
elements = randi(MAX_int, [1 sum(elem_N)]); % elements value stochasticity
%cutoff points of the vector "elements" among the rows of the structure
%(each start index is the previous end index + 1, so rows do not share an element)
cutoffs = mat2cell(...
[[1 ; cumsum(elem_N(1:end-1))+1] cumsum(elem_N)]...
,ones(N_ROW,1),[2]);
%result:
res = [cellfun(@(idx) elements( idx(1):idx(2) ) , cutoffs , 'UniformOutput', false ) ,...
num2cell( randi(MAX_int,[1 N_ROW])')];
Result
res =
[1x9 double] [20]
[1x3 double] [12]
[1x5 double] [ 7]
[1x8 double] [20]
[1x11 double] [18]
[1x7 double] [ 4]
[1x11 double] [ 1]
[1x4 double] [15]
[1x5 double] [18]
where
res{1,1}
ans =
15 13 2 3 20 10 1 2 3
res{2,1}
ans =
3 18 10
and so on...
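The partitioning step can also be written without explicit cutoff indices, since mat2cell accepts the per-row counts directly. A minimal sketch with small fixed inputs so the result is easy to check by eye (elem_N and elements below are hypothetical values, not random draws):

```matlab
elem_N   = [3; 1; 2];            % hypothetical number of elements per row
elements = [15 13 2 7 4 9];      % hypothetical values, numel == sum(elem_N)

% Split the 1-by-6 vector into consecutive chunks of 3, 1 and 2 elements,
% then transpose the resulting 1-by-3 cell into a column of cells
parts = mat2cell(elements, 1, elem_N.').';
```

The same call works unchanged with the randi-generated elem_N and elements from the code above.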

Difference between [] and [1x0] in MATLAB

I have a loop in MATLAB that fills a cell array in my workspace (2011b, Windows 7, 64 bit) with the following entries:
my_array =
[1x219 uint16]
[ 138]
[1x0 uint16] <---- row #3
[1x2 uint16]
[1x0 uint16]
[] <---- row #6
[ 210]
[1x7 uint16]
[1x0 uint16]
[1x4 uint16]
[1x0 uint16]
[ 280]
[]
[]
[ 293]
[ 295]
[1x2 uint16]
[ 298]
[1x0 uint16]
[1x8 uint16]
[1x5 uint16]
Note that some entries hold [], as in row #6, while others hold [1x0] items, as in row #3.
Is there any difference between them? (other than the fact that MATLAB displays them differently). Any differences in how MATLAB represents them in memory?
If the difference is only about how MATLAB internally represents them, why should the programmer be aware of this difference ? (i.e. why display them differently?). Is it a (harmless) bug? or is there any benefit in knowing that such arrays are represented differently?
In most cases (see below for an exception) there is no real difference. Both are considered "empty", since at least one dimension has a size of 0. However, I wouldn't call this a bug, since as a programmer you may want to see this information in some cases.
Say, for example, you have a 2-D matrix and you want to index some rows and some columns to extract into a smaller matrix:
>> M = magic(4) %# Create a 4-by-4 matrix
M =
16 2 3 13
5 11 10 8
9 7 6 12
4 14 15 1
>> rowIndex = [1 3]; %# A set of row indices
>> columnIndex = []; %# A set of column indices, which happen to be empty
>> subM = M(rowIndex,columnIndex)
subM =
Empty matrix: 2-by-0
Note that the empty result still tells you some information, specifically that you tried to index 2 rows from the original matrix. If the result just showed [], you wouldn't know if it was empty because your row indices were empty, or your column indices were empty, or both.
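For the two kinds of entries in the question's cell array, the difference can be inspected directly. The sketch below builds one of each by hand; that the [] entries come from a plain [] assignment is an assumption, since the loop that filled my_array is not shown.

```matlab
e1 = zeros(1, 0, 'uint16');   % displays as [1x0 uint16], like row #3
e2 = [];                      % displays as [], like row #6

% Both count as empty
bothEmpty = isempty(e1) && isempty(e2);

% but they differ in size and in class
sz1 = size(e1);
sz2 = size(e2);
cls1 = class(e1);
cls2 = class(e2);
```

So code that only calls isempty or numel sees no difference, while code that checks size or class does.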
The Caveat...
There are some cases when an empty matrix defined as [] (i.e. all of its dimensions are 0) may give you different results than an empty matrix that still has some non-zero dimensions. For example, matrix multiplication can give you different (and somewhat non-intuitive) results when dealing with different kinds of empty matrices. Let's consider these 3 empty matrices:
>> a = zeros(1,0); %# A 1-by-0 empty matrix
>> b = zeros(0,1); %# A 0-by-1 empty matrix
>> c = []; %# A 0-by-0 empty matrix
Now, let's try multiplying these together in different ways:
>> b*a
ans =
[] %# We get a 0-by-0 empty matrix. OK, makes sense.
>> a*b
ans =
0 %# We get a 1-by-1 matrix of zeroes! Wah?!
>> a*c
ans =
Empty matrix: 1-by-0 %# We get back the same empty matrix as a.
>> c*b
ans =
Empty matrix: 0-by-1 %# We get back the same empty matrix as b.
>> b*c
??? Error using ==> mtimes
Inner matrix dimensions must agree. %# The second dimension of the first
%# argument has to match the first
%# dimension of the second argument
%# when multiplying matrices.
Getting a non-empty matrix by multiplying two empty matrices is probably enough to make your head hurt, but it kinda makes sense since the result still doesn't really contain anything (i.e. it has a value of 0).
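One way to read the a*b result: the single entry of a 1-by-0 times 0-by-1 product is a sum over zero terms, and MATLAB defines an empty sum as 0. A short sketch checking this against sum (the names inner, outer and emptySum are illustrative):

```matlab
a = zeros(1,0);     % 1-by-0
b = zeros(0,1);     % 0-by-1

inner = a*b;        % (1x0)*(0x1) gives a 1-by-1 result
outer = b*a;        % (0x1)*(1x0) gives a 0-by-0 result

% The sum of no elements is 0, which is the value stored in inner
emptySum = sum(a);
```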
When concatenating matrices, the common dimension has to match.
It's not currently an error if it doesn't match when one of the operands is empty, but you do get a nasty warning that future versions might be more strict.
Examples:
>> [ones(1,2);zeros(0,9)]
Warning: Concatenation involves an empty array with an incorrect number of columns.
This may not be allowed in a future release.
ans =
1 1
>> [ones(2,1),zeros(9,0)]
Warning: Concatenation involves an empty array with an incorrect number of rows.
This may not be allowed in a future release.
ans =
1
1
Another difference is in the internal representation of the two versions of empty, especially when it comes to bundling objects of the same class together in an array.
Say you have a dummy class:
classdef A < handle
%A Summary of this class goes here
% Detailed explanation goes here
properties
end
methods
end
end
If you try to start an array from empty and grow it into an array of A objects:
clear all
clc
% Try to use the default [] for an array of A objects.
my_array = [];
my_array(1) = A;
Then you get:
??? The following error occurred converting from A to double:
Error using ==> double
Conversion to double from A is not possible.
Error in ==> main2 at 6
my_array(1) = A;
But if you do:
% Now try to use the class dependent empty for an array of A objects.
my_array = A.empty;
my_array(1) = A;
Then all is fine.
I hope this adds to the explanations given before.
If concatenation and multiplication are not enough to worry about, there is still looping. Here are two ways to observe the difference:
1. Loop over the variable size
for t = 1:size(zeros(0,0),1); % Or simply []
'no'
end
for t = 1:size(zeros(1,0),1); % A 1-by-0 empty matrix
'yes'
end
This prints 'yes'; if you replace size by length, it does not print anything at all.
If this is not a surprise, perhaps the next one will be.
2. Iterating an empty matrix using a for loop
for t = [] %// Iterate an empty 0x0 matrix
1
end
for t = ones(1, 0) %// Iterate an empty 1x0 matrix
2
end
for t = ones(0, 1) %// Iterate an empty 0x1 matrix
3
end
Will print:
ans =
3
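The rule behind this output is that for iterates over the columns of its argument, so the number of passes equals size(x,2) even when every column is empty. A small sketch that counts the passes for a few empty shapes (the 0-by-3 case is an extra example added here for illustration):

```matlab
shapes = {[], ones(1,0), ones(0,1), ones(0,3)};
iters  = zeros(1, numel(shapes));

for k = 1:numel(shapes)
    for t = shapes{k}              % one pass per column, t may be empty
        iters(k) = iters(k) + 1;
    end
end

% Number of columns of each shape, for comparison with iters
cols = cellfun(@(x) size(x, 2), shapes);
```

Guarding such loops with isempty, or looping over 1:numel(x) instead, avoids the surprise.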
To conclude with a concise answer to both of your questions:
Yes there is definitely a difference between them
Indeed I believe the programmer will benefit from being aware of this difference as the difference may produce unexpected results