labelling chunks of data consecutively - matlab

I have data contained in 4195X1 double called z1. I would like to extract data in 120 chunks and label them z1_1_120, z1_120_240, z1_240_360, etc using matlab. I would like to both extract them and also label them in this manner in a loop. Here's what I've done so far, but am unsure as to how to proceed:
load(z1)
for i = 1:4195
q=z1(i);
q1(i,:)=q;
q2=q1(1:120:end);
end
z1=q2(1:end);

As Daniel commented, don't do it!
If you want the kind of labeling you described, you can use either a table or a struct.
Read the following post:
The solution is easy: Don't do this. It is a shot in your knee
The following solution creates a Table and a Struct.
The Table needs some padding, because all columns must be the same length (and 4195 is not a multiple of 120).
The code is a bit complicated because I tried to solve it without for loops (to make the solution more efficient [and more interesting]).
I hope you manage to follow the code and learn from it...
Here is a code sample (explanations are in the comments):
% Fill z1 with sequential numbers (just for testing)
z1 = (1:4195)';
z1_len = length(z1);
% Compute length remainder from 120
len_mod120 = mod(z1_len, 120);
pad_len = mod(120 - len_mod120, 120);
% If length of z1 is not a multiple of 120, add zeros padding at the end.
pad_z1 = padarray(z1, pad_len, 0, 'post'); % After padding, length of pad_z1 is 4200
% Reshape pad_z1 into a matrix where each column is 120 elements
% Z1(:, 1) gets z1(1:120), Z1(:, 2) gets z1(121:240)...
Z1 = reshape(pad_z1, [], length(pad_z1)/120);
% Build naming indices
name_idx = zeros(1, 2*length(pad_z1)/120);
name_idx(1:2:end) = 1:120:length(pad_z1); %First naming index: 1 121 241 361 ...
name_idx(2:2:end) = name_idx(1:2:end) + 120-1; %Second naming index: 120 240 360 480
% String of elements names separated by space
str_names = sprintf('z1_%d_%d ', name_idx); % 'z1_1_120 z1_121_240 z1_241_360 z1_361_480 ...
% Build cell array of names
var_names = split(str_names(1:end-1)); %{'z1_1_120'}, {'z1_121_240'}, {'z1_241_360'}
% Build table, where each column is 120 elements, and column names are 'z1_1_120' 'z1_121_240' 'z1_241_360'
% A table is useful, if you don't care about the zero padding we added at the end
% https://www.mathworks.com/matlabcentral/answers/376985-how-to-convert-string-to-variable-name
T = array2table(Z1, 'VariableNames', var_names);
% Convert table to struct:
% S.z1_1_120 holds first 120 elements, S.z1_121_240 holds next 120 elements...
S = table2struct(T, 'ToScalar', true);
% Fix the last field in the struct - should contain only 115 elements (not 120)
S = rmfield(S, var_names{end}); % Remove the last field from the struct
% last_field_name = 'z1_4081_4195'
last_field_name = sprintf('z1_%d_%d', name_idx(end-1), z1_len);
% Add the last field (only 115 elements)
S.(last_field_name) = z1(end-len_mod120+1:end);
Table T:
T =
120×35 table
z1_1_120 z1_121_240 z1_241_360 z1_361_480 ...
________ __________ __________ __________
1 121 241 361
2 122 242 362
3 123 243 363
4 124 244 364
5 125 245 365
6 126 246 366
... ... ... ...
Example for accessing the first element: T.z1_1_120(1)
Struct S:
S =
struct with fields:
z1_1_120: [120×1 double]
z1_121_240: [120×1 double]
z1_241_360: [120×1 double]
z1_361_480: [120×1 double]
z1_481_600: [120×1 double]
z1_601_720: [120×1 double]
z1_721_840: [120×1 double]
...
z1_4081_4195: [115×1 double]
Example for accessing the first element: S.z1_1_120(1)
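Since the question asked for a loop, here is a minimal loop-based sketch that builds the same kind of struct with dynamic field names. It is only an illustration under the same test data as above; the variable names chunk, first, last and name are my own, and it avoids the zero padding by letting the last chunk be shorter.

```matlab
z1 = (1:4195)';                  % test data, same as in the answer above
chunk = 120;                     % chunk length (assumed from the question)
S = struct();
for first = 1:chunk:numel(z1)
    last = min(first + chunk - 1, numel(z1));  % the last chunk may be shorter
    name = sprintf('z1_%d_%d', first, last);   % e.g. 'z1_1_120'
    S.(name) = z1(first:last);                 % dynamic field name
end
```

With this version the last field is created directly with its true length, so no field has to be removed and re-added afterwards.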

Related

Conditional indexing of MatLab structure array by function output

I am currently organizing heterogeneous data in structure arrays in MatLab, e.g.,
patient.name = 'John Doe';
patient.billing = 127;
patient.test = [79 75 73 180 178 177.5; 220 210 205 79 75 73; 180 178 177.5 20 210 205;];
patient(2).name = 'Ann Lane';
patient(2).billing = 28.50;
patient(2).test = [68 70 68; 118 118 119; 172 170 169; 220 210 205];
Let's assume I want to do some more advanced indexing, I want to look at the size of the test field of each patient. These fields all have heterogeneous sizes, which is why I want to use a different struct for each patient.
I want to do something along the lines of this:
%This does not work
disp(patient([size(patient.test,1)]>3))
For example, check whether the array of patient.test has more than 3 rows and use the resulting boolean array to index the entire structure array. I assume my syntax is simply wrong, but I have not found examples on how to do it properly. Help would be appreciated!
patient.test will give a comma-separated list of the field's contents. You can collect that list into a cell array and use cellfun to apply the size function to the contents of each cell:
>> sz = cellfun(@size, {patient.test}, 'UniformOutput', false);
>> celldisp(sz)
sz{1} =
3 6
sz{2} =
4 3
If you only want to display the sizes, you can use cellfun to apply an anonymous function that does that:
>> cellfun(@(c) disp(size(c)), {patient.test})
3 6
4 3
To obtain an index based on the size of the field:
>> ind = cellfun(@(c) size(c,1)>3, {patient.test})
ind =
1×2 logical array
0 1
and then
patient_selected = patient(ind);
or, if you prefer it in a single line,
patient_selected = patient(cellfun(@(c) size(c,1)>3, {patient.test}));
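As a possible alternative, arrayfun can iterate over the elements of the struct array directly instead of going through a cell array of field contents. This is only a sketch using the patient data from the question; the variable p in the anonymous function is my own naming.

```matlab
% Struct array from the question
patient(1).name = 'John Doe';
patient(1).billing = 127;
patient(1).test = [79 75 73 180 178 177.5; 220 210 205 79 75 73; 180 178 177.5 20 210 205];
patient(2).name = 'Ann Lane';
patient(2).billing = 28.50;
patient(2).test = [68 70 68; 118 118 119; 172 170 169; 220 210 205];

% One logical value per struct element: true if test has more than 3 rows
ind = arrayfun(@(p) size(p.test, 1) > 3, patient);
patient_selected = patient(ind);
```

Both forms produce the same logical index; which one reads better is largely a matter of taste.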

Pivot Table with Additional Columns

Let's have a 2D double array such as:
% Data: ID, Index, Weight, Category
A0=[1 1121 204 1;...
2 2212 112 1;...
3 2212 483 3;...
4 4334 233 1;...
5 4334 359 2;...
6 4334 122 3 ];
I need to pivot / group by the rows with the highest Weight for each given Index, which can be achieved with any Pivot Table | Group By functionality (such as pivottable, SQL GROUP BY or MS Excel PivotTable)
% Current Result
A1=pivottable(A0,[2],[],[3],{@max}); % Pivot Table
A1=cell2mat(A1); % Convert to array
>>A1=[1121 204;...
2212 483;...
4334 359 ]
How should I proceed if I need to recover also the ID and the Category columns?
% Required Result
>>A1=[1 1121 204 1;...
3 2212 483 3;...
5 4334 359 2 ];
The syntax is Matlab, but a solution in another language (Java, SQL) is acceptable, since it can be transcribed into Matlab.
You can use splitapply with an anonymous function as follows.
grouping_col = 2; % Grouping column
maximize_col = 3; % Column to maximize
[~, ~, group_label] = unique(A0(:,grouping_col));
result = splitapply(@(x) {x(x(:,maximize_col)==max(x(:,maximize_col)),:)}, A0, group_label);
result = cell2mat(result); % convert to matrix
How it works: the anonymous function @(x) {x(x(:,maximize_col)==max(···),:)} is called by splitapply once for each group. The function is provided as input a submatrix containing all rows with the same value of the column with index grouping_col. What this function then does is keep all rows that maximize the column with index maximize_col, and pack that into a cell. The result is then converted to matrix form by cell2mat.
With the above solution, if there are several maximizing rows for each group all of them are produced. To keep only the first one, replace the last line by
result = cell2mat(cellfun(@(c) c(1,:), result, 'uniformoutput', false));
How it works: this uses cellfun to apply the anonymous function @(c) c(1,:) to the content of each cell. The function simply keeps the first row. Alternatively, to keep the last row use @(c) c(end,:). The result is then converted to matrix form using cell2mat again.
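If splitapply is not available, a different technique (not the one described above) is to sort the rows and keep the first row of each group. The following is a sketch under the assumption that one row per Index is wanted; the names sorted and first_idx are my own.

```matlab
% Data: ID, Index, Weight, Category (from the question)
A0 = [1 1121 204 1; ...
      2 2212 112 1; ...
      3 2212 483 3; ...
      4 4334 233 1; ...
      5 4334 359 2; ...
      6 4334 122 3];

% Sort by Index ascending (column 2), then by Weight descending (column 3)
sorted = sortrows(A0, [2 -3]);
% First occurrence of each Index is now its maximum-Weight row
[~, first_idx] = unique(sorted(:, 2), 'first');
result = sorted(first_idx, :);
```

Unlike the splitapply version, this keeps exactly one row per group even when several rows share the maximum Weight.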

Finding the max number of a column in MATLAB, which allocates to a specific string

Consider that I have a table of such type in MATLAB:
Location String Number
1 a 26
1 b 361
2 c 28
2 a 45
3 a 78
4 b 82
I would like to create a script which returns only 3 rows, which would include the largest Number for each string. So in this case the table returned would be the following:
Location String Number
3 a 78
1 b 361
2 c 28
The actual table that I want to tackle is much greater, though I wrote this like that for simplicity. Any ideas on how this task can be tackled? Thank you in advance for your time!
You could use splitapply, with an id for each row.
Please see the comments for details...
% Assign unique ID to each row
tbl.id = (1:size(tbl,1))';
% Get groups of the different strings
g = findgroups(tbl.String);
% create function which gets id of max within each group
% f must take arguments corresponding to each splitapply table column
f = @(num,id) id(find(num == max(num), 1));
% Use splitapply to apply the function f to all different groups
idx = splitapply( f, tbl(:,{'Number','id'}), g );
% Collect rows
outTbl = tbl(idx, {'Location', 'String', 'Number'});
>> outTbl =
Location String Number
3 'a' 78
1 'b' 361
2 'c' 28
Or just a simple loop. This loop is only over the unique values of String so should be pretty quick.
u = unique(tbl.String);
c = cell(numel(u), size(tbl,2));
for ii = 1:numel(u)
temp = tbl(strcmp(tbl.String, u{ii}),:);
[~, idx] = max(temp.Number);
c(ii,:) = table2cell(temp(idx,:));
end
outTbl = cell2table(c, 'VariableNames', tbl.Properties.VariableNames);
Here is my idea for finding the max value of each string.
Create a vector of all your strings and include them only one time. Something like:
strs=['a','b','c'];
Then create a vector that will store maximum value of each string:
n=length(strs);
max_values=zeros(1,n);
Now loop over all rows of the table, compare each row's Number with the current maximum for its string, and replace the maximum if the row's value is larger (this assumes tbl.String is a cell array of single characters):
for i = 1:height(tbl)
    m = find(strs == tbl.String{i}); % Index into max_values for this row's string
    if max_values(m) < tbl.Number(i) % Number in the i-th row of the table
        max_values(m) = tbl.Number(i);
    end
end
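To try the approaches above, the example table has to exist first. The sketch below builds it from the question's data (assuming String is stored as a cell array of char vectors) and also shows one more variant based on sorting rather than grouping; sorted and first_idx are names I made up for illustration.

```matlab
% Example table from the question
Location = [1; 1; 2; 2; 3; 4];
String   = {'a'; 'b'; 'c'; 'a'; 'a'; 'b'};
Number   = [26; 361; 28; 45; 78; 82];
tbl = table(Location, String, Number);

% Sort by Number, largest first, so the first row seen for each
% string is the one with its maximum Number
sorted = sortrows(tbl, 'Number', 'descend');
[~, first_idx] = unique(sorted.String, 'stable');
% Keep those rows and order the result by String
outTbl = sortrows(sorted(first_idx, :), 'String');
```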

Pad cell array with whitespace and rearrange

I have a 2D cell-array (A = 2x3) containing numerical vectors of unequal length, in this form:
1x3 1x4 1x2
1x7 1x8 1x3
*Size of A (in both dimensions) can be variable
I want to pad each vector with whitespace {' '} to equalise their lengths to lens = max(max(cellfun('length',A)));- in this case, all vectors will become 1x8 in size - and then subsequently rearrange the cell array into this form so that it can be converted to a columnar table using cell2table (using sample data):
4 1 2 1 3 4
8 5 8 4 7 9
10 12 11 5 [] 11
[] 13 21 7 [] []
[] 15 [] 11 [] []
[] 18 [] 23 [] []
[] 21 [] 29 [] []
[] [] [] 32 [] []
[ ] = Whitespace
i.e. columns are in the order A{1,1}, A{2,1}, A{1,2}, A{2,2}, A{1,3} and A{2,3}.
If A = 4x3, the first five columns after the rearrangement would be A{1,1}, A{2,1}, A{3,1}, A{4,1} and A{1,2}.
My version of Matlab (R2013a) does not have cell2table so like Stewie Griffin I'm not sure which exact format you need for the conversion.
I am also not sure that padding vectors of double with whitespace is such a good idea. Strings and doubles are not convenient to mix, especially if in your case you just want cell array columns of homogeneous type (as opposed to columns where each element would be a cell). It means you have to:
convert your numbers to strings first (e.g. char arrays).
since each column will be a char array, its rows need to be homogeneous in dimension, so you have to find the longest string and make them all the same length.
finally, pad your char array columns with the necessary number of whitespace characters.
One way to do that requires multiple cellfun calls to probe for all the information we need before we can actually do the padding/reshaping:
%// get the length of the longest vector
Lmax = max(max(cell2mat(cellfun( @numel , A , 'uni',0)))) ;
%// get the maximum order of magnitude
n = max(max(cell2mat(cellfun( @(x) max(ceil(log10(x))) , A , 'uni',0))))
%// prepare string format based on "n"
fmt = sprintf('%%0%dd',n) ;
%// pad columns with necessary number of whitespace
b = cellfun( @(c) [num2str(c(:),fmt) ; repmat(' ', Lmax-numel(c),n)], A ,'uni',0 ) ;
%// reshape to get final desired result
b = b(:).'
b =
[8x2 char] [8x2 char] [8x2 char] [8x2 char] [8x2 char] [8x2 char]
Note that a call to str2num on that would yield your original cell array (almost, less a reshape operation), as str2num will ignore (return empty) the whitespace entries.
>> bf = cellfun( @str2num , b,'un',0 )
bf =
[3x1 double] [7x1 double] [4x1 double] [8x1 double] [2x1 double] [3x1 double]
If I was dealing with numbers, I would definitely prefer padding with a numeric type (also makes the operation slightly easier). Here's an example padding with 'NaN's:
%// get the length of the longest vector
Lmax = max(max(cell2mat(cellfun( @numel , A , 'un',0)))) ;
%// pad columns with necessary number of NaN
b = cellfun( @(c) [c(:) ; NaN(Lmax-numel(c),1)], A ,'un',0 ) ;
%// reshape to get final desired result
b = b(:).'
b =
[8x1 double] [8x1 double] [8x1 double] [8x1 double] [8x1 double] [8x1 double]
If you do not like operating with NaNs, you could choose a numeric value which is not among the possible values of your dataset. For example if all your values are supposed to be positive integers, -1 is a good indicator of a special value.
%// choose your NULL value indicator
nullNumber = -1 ;
b = cellfun( @(c) [c.' ; zeros(Lmax-numel(c),1)+nullNumber], A ,'un',0 ) ;
b = b(:).'
cell2mat(b)
ans =
4 1 2 1 3 4
8 5 8 4 7 9
10 12 11 5 -1 11
-1 13 21 7 -1 -1
-1 15 -1 11 -1 -1
-1 18 -1 23 -1 -1
-1 21 -1 29 -1 -1
-1 -1 -1 32 -1 -1
Note:
If -1 is a possible value for your set, and you still don't want to use NaN, a value widely used in my industry (which is totally allergic to NaN) as a null indicator for all real numbers is -999.25. Unless you have a very specific application, the probability of getting exactly this value during normal operation is so infinitesimal that it is ok for most software algorithms to recognize a null value when they come across -999.25. (sometimes they use only -999 if they deal with integers only.)
Also note the use of c(:) in the cellfun calls. This makes sure that the vector in each cell will be arranged as a column, regardless of its original shape (your initial vectors are actually row vectors, as you have them in your example).
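To make the NaN variant reproducible end to end, here is a self-contained sketch. The cell array A is reconstructed from the sample output in the question, so treat it as an assumption about the original data; M is a name I introduced for the final matrix.

```matlab
% Sample data reconstructed from the question's expected output (assumed)
A = {[4 8 10],                [2 8 11 21],              [3 7]; ...
     [1 5 12 13 15 18 21],    [1 4 5 7 11 23 29 32],    [4 9 11]};

% Length of the longest vector
Lmax = max(cellfun(@numel, A(:)));
% A(:).' lists the cells in column-major order: A{1,1}, A{2,1}, A{1,2}, ...
b = cellfun(@(c) [c(:); NaN(Lmax - numel(c), 1)], A(:).', 'UniformOutput', false);
% Concatenate the padded columns into one Lmax-by-numel(A) matrix
M = cell2mat(b);
```

On releases that have tables, array2table(M) should then give the columnar table, with NaN marking the padded entries.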
Unfortunately, I don't have time to test this, but I believe this should work if you want to do this fast and simple, without having to write explicit loops.
b = cellfun(@(c) [c, repmat(' ', 1, 197-numel(c))], a,'UniformOutput',0)
Edit:
I don't have MATLAB here, and I have never used table before, so I don't know exactly how it works. But, I assume the easiest way to do this is to use the line above, but instead of trying to pad with spaces, pad it with NaNs. After that, when you have made your table with NaNs, you can do something like:
So:
B = A(:); % Straighten it out
C = cellfun(@(c) [c, repmat(NaN, 1, 8-numel(c))], B,'UniformOutput',0) % 1x8 vectors
%% Create table %%
tab(tab == NaN) = ' ';
Sorry if this didn't help. It's all I can do at the moment.
Padding a vector with a white space:
YourString = 'text here';
YourString = [YourString ' '];
in case only 1 whitespace is required. If more are needed you can loop this code to get the wanted number of spaces attached.
table itself already has the functionality to print cells.
Thanks to @StewieGriffin:
[YourString, repmat(' ',1,197-numel(YourString))]

Dimension mismatch of matrix

I am not able to save the value of BB in Bv.
MATLAB returns this error:
Subscripted assignment dimension mismatch.
Please help me to do it.
X=[1 6 9 5; 6 36 54 30; 9 54 81 40; 5 30 40 25]
[N1,dim1]=size(X) ;
for i=1:N1
bb=X(i:end,1)*X(i,i:end);
BB=bb(triu(true(size(bb))))
Bv(i,:)=BB(:);
end
As @Rashid suggests, use cell arrays instead of numeric arrays. The beauty of cell arrays is that they can store matrices of different types and sizes in one storage unit. A cell array is much like a structure, but with indices to easily call entries.
X=[1 6 9 5; 6 36 54 30; 9 54 81 40; 5 30 40 25];
for ii=1:size(X,1)
bb=X(ii:end,1)*X(ii,ii:end);
BB=bb(triu(true(size(bb))))
Bv{ii,:}=BB(:);
end
Note that I also changed your loop index to ii instead of i. i is the imaginary unit, and to prevent errors it's better not to overwrite built-in functions.
Just an example of how a cell array stores different data types and sizes:
A = magic(2); % 2x2 double
B = uint8(magic(3)); % 3x3 uint8
C = 'hello world'; % string
YourCell{1} = A;
YourCell{2} = B;
YourCell{3} = C;
YourCell =
[2x2 double] [3x3 uint8] 'hello world'
The same but now as a structure:
YourStruct.magic2double = A;
YourStruct.magic3uint8 = B;
YourStruct.MyString = C;
YourStruct =
magic2double: [2x2 double]
magic3uint8: [3x3 uint8]
MyString: 'hello world'
The cell and structure contain the same information, but for information in the cell you call YourCell{ii}, whilst in the structure you must call YourStruct.variablename. The cell can be accessed by indexing, the structure cannot. For the structure however you can use easy names to remember what you stored in each element, whilst that's impossible for the cell.
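To see why the original numeric assignment Bv(i,:)=BB(:) fails, it may help to look at how many elements each iteration produces. The sketch below stores the results in a preallocated cell array and counts them; lens is a name I added for illustration.

```matlab
X = [1 6 9 5; 6 36 54 30; 9 54 81 40; 5 30 40 25];
N1 = size(X, 1);
Bv = cell(N1, 1);                        % one cell per iteration
for ii = 1:N1
    bb = X(ii:end, 1) * X(ii, ii:end);   % square outer product, shrinks with ii
    Bv{ii} = bb(triu(true(size(bb))));   % upper-triangular entries as a column
end
lens = cellfun(@numel, Bv);              % number of elements per iteration
```

Because the number of elements differs from one iteration to the next, the results cannot be stacked as rows of one numeric matrix without padding, which is why a cell array fits here.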