I'm new to MATLAB and I'm struggling to comprehend the subtleties between array-wise and element wise operations. I'm working with a large dataset and I've found the simplest methods aren't always the fastest. I have a very large Cell Array of strings, like in this simplified example:
% A vertical array of same-length strings
CellArrayOfStrings = {'aaa123'; 'bbb123'; 'ccc123'; 'ddd123'};
I'm trying to extract an array of substrings, for example:
'a1'
'b1'
'c1'
'd1'
I'm happy enough with an element-wise reference like this:
% Simple element-wise substring operation
MySubString = CellArrayOfStrings{2}(3:4); % Expected result is 'b1'
But I can't work out the notation to reference them all in one go, like this:
% Desired result is 'a1','b1','c1','d1'
MyArrayOfSubStrings = CellArrayOfStrings{:}(3:4); % Incorrect notation!
I know that Matlab is capable of performing very fast array-wise operations, such as strcat, so I was hoping for a technique that works at a similar speed:
% An array-wise operation which works quickly
tic
speedTest = strcat(CellArrayOfStrings,'hello');
toc % About 2 seconds on my machine with >500K array elements
All the for loops and functions which use behind-the-scenes iteration I have tried run too slowly with my dataset. Is there some array-wise notation that would do this? Would somebody be able to correct my understanding of element-wise and array-wise operations?! Many thanks!
I can't work out the notation to reference them all in one go, like this:
MyArrayOfSubStrings = CellArrayOfStrings{:}(3:4); % Incorrect notation!
This is because curly braces ({}) return a comma-separated list, which is equivalent to writing the contents of these cells in the following way:
c{1}, c{2}, and so on....
When the subscript index refers to only one element, MATLAB's syntax allows to use parentheses (()) after the curly braces and further extract a sub-array (a substring in your case). However, this syntax is prohibited when the comma separated lists contains multiple items.
So what are the alternatives?
Use a for loop:
MyArrayOfSubStrings = char(zeros(numel(CellArrayOfStrings), 2));
for k = 1:size(MyArrayOfSubStrings, 1)
MyArrayOfSubStrings(k, :) = CellArrayOfStrings{k}(3:4);
end
Use cellfun (a slight variant of Dang Khoa's answer):
MyArrayOfSubStrings = cellfun(#(x){x(3:4)}, CellArrayOfStrings);
MyArrayOfSubStrings = vertcat(MyArrayOfSubStrings{:});
If your original cell array contains strings of a fixed length, you can follow Dan's suggestion and convert the cell array into an array of strings (a matrix of characters), reshape it and extract the desired columns:
MyArrayOfSubStrings =vertcat(CellArrayOfStrings{:});
MyArrayOfSubStrings = MyArrayOfSubStrings(:, 3:4);
Employ more complicated methods, such as regular expressions:
MyArrayOfSubStrings = regexprep(CellArrayOfStrings, '^..(..).*', '$1');
MyArrayOfSubStrings = vertcat(MyArrayOfSubStrings{:});
There are plenty solutions to pick from, just pick the one that fits you most :) I think that with MATLAB's JIT acceleration, a simple loop would be sufficient in most cases.
Also note that in all my suggestions the obtained cell array of substrings cell is converted into an array of strings (a matrix). This is just for the sake of the example; obviously you can keep the substrings stored in a cell array, should you decide so.
cellfun operates on every element of a cell array, so you could do something like this:
>> CellArrayOfStrings = {'aaa123'; 'bbb123'; 'ccc123'; 'ddd123'};
>> MyArrayofSubstrings = cellfun(#(str) str(3:4), CellArrayOfStrings, 'UniformOutput', false)
MyArrayofSubstrings =
'a1'
'b1'
'c1'
'd1'
If you wanted a matrix of strings instead of a cell array whose elements are the strings, use char on MyArrayOfSubstrings. Note that this is only allowed when each string is the same length.
You can do this:
C = {'aaa123'; 'bbb123'; 'ccc123'; 'ddd123'}
t = reshape([C{:}], 6, [])'
t(:, 3:4)
But only if your strings are all of equal length I'm afraid.
You can use char to convert them to a character array, do the indexing and convert it back to cell array
A = char(CellArrayOfStrings);
B = cellstr(A(:,3:4));
Note that if strings are of different lengths, char pads them with spaces at the end to create the array. Therefore if you index for a column that is beyond the length of one of the short strings you may receive some space characters.
Related
Is it possible to use cellfun with a conditional. For example, I have a 144x53 cell array, where the first four columns are of type string, the rest are floats. However, among the numbers, there are empty cells. I wonder if it is possible to use cellfun(#(x)sqrt(x), cellarray) with my array. As it is know, its not possible due to strings and empty cells. Otherwise, this is the solution that I use,
for n = 1:length(results)
for k = 1:length(results(1,:))
if ~isstr(results{n,k})
results{n, k} = sqrt(results{n,k});
end
end
end
Otherwise, is it possible to do a vectorization here?
You can create a logical array by checking if each element is numeric. And then use this to perform your cellfun operation on the subset of the cell array that contains numeric data.
C = {1, 2, 'string', 4};
% Logical array that is TRUE when the element is numeric
is_number = cellfun(#isnumeric, C);
% Perform this operation and replace only the numberic values
C(is_number) = cellfun(#sqrt, C(is_number), 'UniformOutput', 0);
% 1 1.4142 'string' 2
As pointed out by #excaza, you may also consider leaving it as a loop as it is more performant on newer versions of MATLAB (R2015b and newer).
I'd like to use fprintf to show code execution progress in the command window.
I've got a N x 1 array of structures, let's call it myStructure. Each element has the fields name and data. I'd like to print the name side by side with the number of data points, like such:
name1 number1
name2 number2
name3 number3
name4 number4
...
I can use repmat N times along with fprintf. The problem with that is that all the numbers have to come in between the names in a cell array C.
fprintf(repmat('%s\t%d',N,1),C{:})
I can use cellfun to get the names and number of datapoints.
names = {myStucture.name};
numpoints = cellfun(#numel,{myStructure.data});
However I'm not sure how to get this into a cell array with alternating elements for C to make the fprintf work.
Is there a way to do this? Is there a better way to get fprintf to behave as I desire?
You're very close. What I would do is change your cellfun call so that the output is a cell array instead of a numeric array. Use the 'UniformOutput' flag and set this to 0 or false.
When you're done, make a new cell array where both the name cell array and the size cell array are stacked on top of each other. You can then call fprintf once.
% Save the names in a cell array
A = {myStructure.name};
% Save the sizes in another cell array
B = cellfun(#numel, {myStructure.data}, 'UniformOutput', 0);
% Create a master cell array where the first row are the names
% and the second row are the sizes
out = [A; B];
% Print out the elements side-by-side
fprintf('%s\t%d\n', out{:});
The trick with the third line of code is that when you unroll the cell array using {:}, this creates a comma-separated list unrolled in column-major format, and so doing out{:} actually gives you:
A{1}, B{1}, A{2}, B{2}, ..., A{n}, B{n}
... which provides the interleaving you need. Therefore, providing this order into fprintf coincides with the format specifiers that are specified and thus gives you what you need. That's why it's important to stack the cell arrays so that each column gives the information you need.
Minor Note
Of course one should never forget that one of the easiest ways to tackle your problem is to just use a simple for loop. Even though for loops are considered bad practice, their performance has come a long way throughout MATLAB's evolution.
Simply put, just do this:
for ii = 1 : numel(myStructure)
fprintf('%s\t%d\n', myStructure(ii).name, numel(myStructure(ii).data));
end
The above code is arguably more readable in comparison to what we did above with cell arrays. You're accessing the structure directly rather than having to create intermediate variables for the purpose of calling fprintf once.
Example Run
Here's an example of this running. Using the data shown below:
clear myStructure;
myStructure(1).name = 'hello';
myStructure(1).data = rand(5,1);
myStructure(2).name = 'hi';
myStructure(2).data = zeros(3,3);
myStructure(3).name = 'huh';
myStructure(3).data = ones(6,4);
I get the following output after running the printing code:
hello 5
hi 9
huh 24
We can see that the sizes are correct as the first element in the structure is simply a random 5 element vector, the second element is a 3 x 3 = 9 zeroes matrix while the last element is a 6 x 4 = 24 ones matrix.
I constructed my vector as such:
v = ['asdf'; 'qwer'; 'zxcv'];
I just wanted to take the first 2 characters, and I wrote a simple cellfun like so:
A = cellfun(#(x) x(1:2), v, 'UniformOutput', false);
However, it says:
error: cellfun: C must be a cell array
How should I extract the first 2 characters of each string?
That's because v is not a cell array. Turn it into one:
v = {'asdf'; 'qwer'; 'zxcv'};
If you can't use cell arrays, do what Divakar suggested and turn v into one by using cellstr:
v = ['asdf', 'qwer', 'zxcv'];
v_cell = cellstr(v);
If you want to escape the temporary variable, supply the call with v directly into cellfun:
A = cellfun(#(x) x(1:2), cellstr(v), 'UniformOutput', false);
If you want to un-cell the cell array, use cell2mat:
Aout = cell2mat(A);
I question the efficiency of the above though. If you just want to extract the first two characters of your cell array then turn it back into a character array, why don't you simply index the first two columns of all of the rows in the original character array? The use of cellfun adds unnecessary overhead when simple indexing would do the trick. Indexing is much more readable in this instance than using cellfun, which adds a layer of obfuscation.
Aout = v(:,1:2);
I have a set of strings vals, for example:
vals = {'AD', 'BC'}
I also have a struct info, inside of which are structs nested in fields corresponding to the elements in the array vals (that would be 'AD' and 'BC' in this example), each in turn storing a number in a field named lastcontract.
I can use a for loop to extract lastcontract for each of the vals like this:
for index = 1:length(vals)
info.(vals{index}).lastcontract
end
I'd like to find a way of doing this without a loop if at all possible, but I'm not having luck. I tried:
info.(vals{1:2}).lastcontract
without success. I think arrayfun may be the appropriate way, but I can't figure out the right syntax.
It is actually possible here to manage without an explicit loop (nor arrayfun/cellfun):
C = struct2cell(info); %// Convert to cell array
idx = ismember(fieldnames(info), vals); %// Find fields
C = [C{idx}]; %// Flatten to structure array
result = [C.lastcontract]; %// Extract values
P.S
cellfun would be more appropriate here than arrayfun, because you iterate vals (a cell array). For the sake of practice, here's a solution with cellfun:
result = cellfun(#(x)info.(x).lastcontract, vals);
If I have an array (of unknown length until runtime), is there a way to call a function with each element of the array as a separate parameter?
Like so:
foo = #(varargin) sum(cell2mat(varargin));
bar = [3,4,5];
foo(*bar) == foo(3,4,5)
Context: I have a list of indices to an n-d array, Q. What I want is something like Q(a,b,:), but I only have [a,b]. Since I don't know n, I can't just hard-code the indexing.
There is no operator in MATLAB that will do that. However, if your indices (i.e. bar in your example) were stored in a cell array, then you could do this:
bar = {3,4,5}; %# Cell array instead of standard array
foo(bar{:}); %# Pass the contents of each cell as a separate argument
The {:} creates a comma-separated list from a cell array. That's probably the closest thing you can get to the "operator" form you have in your example, aside from overriding one of the existing operators (illustrated here and here) so that it generates a comma-separated list from a standard array, or creating your own class to store your indices and defining how the existing operators operate for it (neither option for the faint of heart!).
For your specific example of indexing an arbitrary N-D array, you could also compute a linear index from your subscripted indices using the sub2ind function (as detailed here and here), but you might end up doing more work than you would for my comma-separated list solution above. Another alternative is to compute the linear index yourself, which would sidestep converting to a cell array and use only matrix/vector operations. Here's an example:
% Precompute these somewhere:
scale = cumprod(size(Q)).'; %'
scale = [1; scale(1:end-1)];
shift = [0 ones(1, ndims(Q)-1)];
% Then compute a linear index like this:
indices = [3 4 5];
linearIndex = (indices-shift)*scale;
Q(linearIndex) % Equivalent to Q(3,4,5)