Regarding storage issue for mixed-type value matrix - matlab

There has a loop in my program, and during each iteration an ID will be generated. I want to store these IDs into a two dimensional array, i.e., A. The first column of A stores the iteration number, i.e., A(1,1) = 1 and A(2,1) = 2. The second column of A stores the ID generated during each iteration, i.e., A(1,2) stores the ID generated during the first iteration. The tricky part is that these IDs can be either a numerical value or a string. For instance, A(1,2) = 12345; A(2,2) = abcde
Which kind of data structure should I use to store this mixed-value matrix?

You have two good options, a cell array or an array of structures.
To use a cell array you need to use braces:
A{1,1} = 1;
A{2,1} = 2;
A{1,2} = 12345;
A{2,2} = 'abcd';
You cannot use most vectorized code with cell arrays, although you can convert numeric subsets to numeric arrays, for example:
col1 = cell2mat(A(:,1));
To use an array of structures, you need to define fields. This has the advantage that you can name your columns of data.
A(1).iteration = 1;
A(2).iteration = 2;
A(1).result = 12345;
A(2).result = 'abcd';
To access a single row of data, use A(1), like this
>> A(1)
ans =
iteration: 1
result: 12345
To access a column of data, use brackets or braces
>> [A.iteration] %This results a numeric array, or an error if not possible
ans =
1 2
>> {A.result} %This returns a cell array, as discussed above.
ans =
[12345] 'abcd'
Which option you use depends on the nature of your task and what method is more suitable to your style. I usually start with a cell array, and eventually convert to an array of structs to take advantage of the named fields.

Related

Extracting field from cell array of structure in matlab

I have a cell array (let's say size 10) where each cell is a structure with the same fields. Let's say they all have a field name x.
Is there a way to retreive in a vector the value of the field x for all the structure in the cell array? I would expect the function to return a vector of size 10 with in position 1, the value of the field x of the structure in cell 1 etc etc...
EDIT 1:
The structure in the cell array have 1 field which is the same for all but some others which are different.
First convert your cell array of structures, c, (with identical field names in the same order) to a structure array:
c = cell2mat(c)
Then, depending on the data types and sizes of the elements of the field, you may be able to use
[c.x]
to extract your vector of field x values in the "standard" way.
It is also possible that you can skip the conversion step and use cellfun(#(e)e.x, c) to do the extraction in one go.
The below code creates a cell array of structures, and extracts field 'x' of each structure to a vector v.
%create a cell array of structures
s1.a = 'hello';
s1.x = 1;
s2.a = 'world';
s2.x = 2;
c{1} = s1;
c{2} = s2;
v = zeros(1,2);
%extract to vector
for idx=1:size(c,2)
v(1,idx) = c{idx}.x;
end
Let's say you have
c = {s1, s2, s3, ...., sn};
where common field is 'field_1', then you have two options
Use cell2mat.
cc = cell2mat(c); % which converts your cell array of structs into an array of structs
value = [cc.field_1]; % if values are number
or
value = {cc.field_1}; % if values are characters, for example
Another option is to use cellfun.
If the field values are characters, you should set "UniformOutput" to "false"
value = cellfun(#(x) x.field_1, c, 'UniformOutput', false)
The first option is better. Also, try to avoid using cell/cellfun/arrayfun whenever you can, vectors are way faster and even a plain for loop is more effecient

Do I always need to use a cell array to assign multiple values to a struct array?

I've got a nested struct array like
A(1).B(1).var1 = 1;
A(1).B(2).var1 = 2;
Now I want to change the values of var1 to using the elements of the vector x = [3; 4] for each of the respective values.
The result should be
A(1).B(1).var1 = 3;
A(1).B(2).var1 = 4;
I have tried
% Error : Scalar structure required for this assignment.
A(1).B.var1 = x;
% Error : Insufficient number of outputs from right hand side of equal sign to satisfy assignment.
[A(1).B.var1] = x(:);
Curiously, if x is a cell array, the second syntax works
x = {3, 4};
[A(1).B.var1] = x{:};
Luckily, it's not too complicated to convert my numeric vector to a cell array using mat2cell, but is that the only way to do this assignment without a for loop?
What's the correct syntax for multiple assignment to a nested struct array? Can I use numeric vectors or do I have to use cell arrays?
The statement
[A(1).B.var1] = x{:};
is shorthand for
[A(1).B.var1] = deal(x{:});
(see the documentation for deal).
Thus you can also write
[A(1).B.var1] = deal(3,4);
I'm not aware of any other way to assign different values to a field in a struct array in a single command.
If your values are in a numeric array, you can easily convert it to a cell array using num2cell (which is simpler than the mat2cell you found).
data = [3,4];
tmp = num2cell(data);
[A(1).B.var1] = tmp{:};
In general, struct arrays are rather awkward to use for cases like this. If you can, I would recommend that you store your data in normal numeric arrays, which make it easier to manipulate many elements at the same time. If you insist on using a struct array (which is convenient for certain situations), simply use a for loop:
data = [3,4];
for ii = 1:length(A(1).B)
A(1).B(ii).var1 = data(ii);
end
The other alternative is to use table.

cellfun with two arrays of indices

I have one big cell with N by 1 dimension. Each row is either a string or a double. A string is a variable name and the sequential doubles are its values until the next string (another variable name). For example:
data = {
var_name1;
val1;
val2;
val3;
val4;
val5;
var_name2;
val1;
val2;
var_name3;
val1;
val2;
val3;
val4;
val5;
val6;
val7}
and so on. I want to separate the data cell into three cells; {var_name and it's 5 values}, {var_name and it's 2 values}, {var_name and it's 7 values}. I try not to loop as much as possible and have found that vectorization along with cellfun works really well. Is it possible? The data cell has close to million rows.
I believe the following should do what you're after. The main pieces are to use cumsum to work out which name each row corresponds to, and then accumarray to build up lists per name.
% Make some data
data = {'a'; 1; 2; 3;
'b'; 4; 5;
'c'; 6; 7; 8; 9;
'd';
'e'; 10; 11; 12};
% Which elements are the names?
isName = cellfun(#ischar, data);
% Use CUMSUM to work out for each row, which name it corresponds to
whichName = cumsum(isName);
% Pick out only the values from 'data', and filter 'whichName'
% for just the values
justVals = data(~isName);
whichName = whichName(~isName);
% Use ACCUMARRAY to build up lists per name. Note that the function
% used by ACCUMARRAY must return something scalar from a column of
% values, so we return a scalar cell containing a row-vector
% of those values
listPerName = accumarray(whichName, cell2mat(justVals), [], #(x) {x.'});
% All that remains is to prepend the name to each cell. This ends
% up with each row of output being a cell like {'a', [1 2 3]}.
% It's simple to make the output be {'a', 1, 2, 3} by adding
% a call to NUM2CELL on 'v' in the anonymous function.
nameAndVals = cellfun(#(n, v) [{n}, v], data(isName), listPerName, ...
'UniformOutput', false);
cellfun is for applying a function to each element of a cell.
When you pass multiple arguments to cellfun like that, it takes the ith argument of data, indx_first, and indx_last, and uses each of them in the anonymous function. Substituting those variables in, your function evaluates to x(y : z), for each element x in data. In other words, you're doing data{i}(y : z), i.e., indexing the actual elements of the cell array, rather than indexing the cell array itself. I don't think that's what you want. Really you want data{y : z}, for each (y, z) pair given by corresponding elements in indx_first and indx_last, right?
If that's indeed the case, I don't see a vectorized way to solve your problem, because each of the "variables" has different size. But you do know how many variables you have, which is the size of indx_first. So I'd pre-allocate and then loop, like so:
>> vars = cell(length(indx_first), 2);
>> for i = 1:length(vars)
vars{i, 1} = data{indx_first(i) - 1}; % store variable name in first column
vars{i, 2} = [data{indx_first(i) : indx_last(i)}]; % store data in last column
end
At the end of this, you'll have a cell array with 2 columns. The first column in each row is the name of the variable. The second is the actual data. I.e.
{'var_name1', [val1 val2 val3 val4 val5];
'var_name2', [val1 val2];
.
.
.

How to convert char to number in Matlab

I am having trouble converting a character variable to a number in Matlab.
Each cell in the char variable contains one of two possible words. I need to convert word_one (for example) to represent '1', and word_two to represent '2'.
Is there a command that will let me do this?
So far I've tried:
%First I converted 'Word' from cell to char
Word = char(Word);
Word(Word == 'Word_one') = '1';
Word(Word == 'Word_two') = '2';
However, I get the:
Error using ==
Matrix dimensions must agree.
When I try to include the first letter only (ie. 'W'), it only changes the first letter in the full word (ie. 1ord_one).
Is there an easy way to do this?
Thanks for your help - any advice is much appreciated!
Use ismember:
possibleWords = {'Word_one', 'Word_two'}; %// template: words corresponding to 1, 2, ...
words = {'Word_two', 'Word_one', 'Word_two'}; %// data: words you need to convert
[~, result] = ismember(words, possibleWords);
In this example,
result =
2 1 2
If you need more flexibility, you can specify the value corresponding to each word:
possibleWords = {'Word_one', 'Word_two'}; %// template: words corresponding to 1, 2, ...
correspondingValues = [1.1, 2.2]; %// template: value corresponding to each word
words = {'Word_two', 'Word_one', 'Word_two'}; %// data: words you need to convert
[~, ind] = ismember(words, possibleWords);
result = correspondingValues(ind);
which gives
result =
2.2000 1.1000 2.2000
Looks like there are a couple of potential issues here.
Use strcmp() (string compare) in place of your current equivalence statement. Comparing strings using == compares element by element and returns a logical vector (where here you want a single logical value). String comparison, strcmp(), will compare the entire strings instead and return a single value.
It's also probably not necessary for you to convert your cell array. You can maintain the cell array structure and address each cell individually.
Try something along the lines of the following snippet.
for i = 1:length(Word)
if strcmp(Word{i},'Word_one')
Word{i} = '1';
elseif strcmp(Word{i},'Word_two')
Word{i} = '2';
end
end
There are a number of ways to solve this problem. Here's my approach.
% define your words
words = {'word_one','word_two','word_two','word_one','word_one'};
% define a function to get the indexes of the words of interest
getindex = #(c, y) cellfun(#(x) strcmp(x,y), c);
% replace 'word_one' with '1'
words(getindex(words, 'word_one'))={'1'};
% replace 'word_two' with '2'
words(getindex(words, 'word_two'))={'2'};
words =
'1' '2' '2' '1' '1'
You can use short n simple unique -
input_cellarr = {'Word_two','Word_one','Word_two','Word_two','Word_one','Word_one'}
[~,~,out] = unique(input_cellarr)
Sample run -
input_cellarr =
'Word_two' 'Word_one' 'Word_two' 'Word_two' 'Word_one' 'Word_one'
out =
2
1
2
2
1
1
Explanation: unique works here because it will produce an ascending order sorted array with numeric arrays. Now, when used on cell arrays, that ascending order translates to alphabetical order sorting. Thus, unique(input_cellarr) would always have {'Word_one' , 'Word_two'} because one is alphabetically higher up than two. Therefore the out indices would always have the first unique ID as 1 for 'Word_one' and the second ID as 2 for 'Word_two'.

Huffman encoding using cell array

I have some problems in final part of Huffman encoding.
Currently I have my coding table in cell array
code =
{
[1,1] = 000
[1,2] = 001
[1,3] = 010
[1,4] = 011
[1,5] = 100
...
}
Where second index represent ascii character in my other cell array
huffman_tree =
{
[1,1] = A
[1,2] = B
[1,3] = C
[1,4] = D
[1,5] = E
...
}
I'm using following code for encoding input to output:
output= [];
for i=1:length(input)
x = findInArray(huffman_tree, input(i));
output= [output code(x)];
end
function [index] = findInArray(array, searched)
index = -1;
for i=1:length(array)
if array{i} == searched
index = i;
end
end
end
At this point my code is O(n^2) or even worse. I'm having problem with large input where
length(input) = 1000000
There must be some faster way to transform input with my coding table to output.
Because you're using cell arrays, that's going to be inherently slow so you have no choice but to iterate over each cell. However, I can provide some suggestions to help speed things up. What you can do is use strcmp to compare strings. I'm assuming that each character in your cell array is represented as a one character string. strcmp has the ability to take an individual string and compare itself to a cell array of strings. The output will be an array that is the same size as the cell array of strings and give you a logical true if the input string matches a position in the cell array and false otherwise.
Because your Huffman dictionary will contain a unique set of characters, you will only get one possible match per character. Therefore, we can use this logical array output to index the codebook directly to retrieve the corresponding code you want. Logical indexing works by supplying a logical vector that is the same length as the vector of interest and it retrieves those values whose corresponding positions are true. Therefore, if we only had one true value in the logical vector with the rest of the elements being false, this means that we would get just the corresponding element we desire and nothing else.
Therefore, we can change your code to do this. Note that I've changed the loop counter i to idx because it has actually been shown that using i as a loop counter slows down your code by a slight amount. See this post by Shai Bagon for more details: Using i and j as variables in Matlab. Also, I've changed the length call to numel... mainly because I don't like using length.... just a personal choice though.
output = [];
for idx = 1:numel(input)
output = [output code(strcmp(input(idx), huffman_tree))];
end
Give the above a whirl and see if it performs any faster. For one thing, this will escape using an additional for loop for searching for a match as strcmp is very efficiently implemented, so the above code won't be O(n^2), but could be slightly better than quadratic.