String comparison on cell array of strings with Matlab Coder

String comparison on cell array of strings with Matlab Coder - matlab

I am trying to use the MATLAB Coder toolbox to convert the following code into C:
function [idx] = list_iterator(compare, list)
idx = nan(length(list));
for j = 1:length(list)
idx(j) = strcmp(compare, list{j});
end
list is an N x 1 cell array of strings and compare is a string. The code basically compares each element of list to compare and returns 1 if the two are the same and 0 otherwise. (I'm doing this to speed up execution because N can be quite large - around 10 to 20 million elements.)
When I run codegen list_iterator in the Command Window, I get the following error:
Type of input argument 'compare' for function 'list_iterator'
not specified. Use -args or preconditioning statements to specify
input types.
More information
Error in ==> list_iterator Line: 1
Column: 18
Code generation failed: View Error Report
Error using codegen
I know I'm supposed to specify the types of the inputs when using codegen, but I'm not sure how to do this for a cell array of strings, the elements of which can be of different length. The string compare can also have different lengths depending on the function call.

You can use the function coder.typeof to specify variable-size inputs to codegen. From what I've understood of your example, something like:
>> compare = coder.typeof('a',[1,Inf])
compare =
coder.PrimitiveType
1×:inf char
>> list = coder.typeof({compare}, [Inf,1])
list =
coder.CellType
:inf×1 homogeneous cell
base: 1×:inf char
>> codegen list_iterator.m -args {compare, list}
seems appropriate.
If you check out the MATLAB Coder App, that provides a graphical means of specifying these complicated inputs. From there you can export this to a build script to see the corresponding command line APIs:
https://www.mathworks.com/help/coder/ug/generate-a-matlab-script-to-build-a-project.html?searchHighlight=build%20script&s_tid=doc_srchtitle
Note that when I tried this example with codegen, the resulting MEX was not faster than MATLAB. One reason this can happen is because the body of the function is fairly simple but a large amount of data is transferred from MATLAB to the generated code and back. As a result, this data transfer overhead can dominate the execution time. Moving more of your code to generated MEX may improve this.
Thinking about the performance unrelated to codegen, should you use idx = false(length(list),1); rather than idx = nan(length(list));? The former is a Nx1 logical vector while the latter is an NxN double matrix where we only write the fist column in list_iterator.
With your original code and the inputs compare = 'abcd'; list = repmat({'abcd';'a';'b'},1000,1); this gives the time:
>> timeit(#()list_iterator(compareIn, listIn))
ans =
0.0257
Modifying your code to return a vector scales that down:
function [idx] = list_iterator(compare, list)
idx = false(length(list),1);
for j = 1:length(list)
idx(j) = strcmp(compare, list{j});
end
>> timeit(#()list_iterator(compareIn, listIn))
ans =
0.0014
You can also call strcmp with a cell and char array which makes the code faster still:
function [idx] = list_iterator(compare, list)
idx = strcmp(compare, list);
>> timeit(#()list_iterator(compareIn, listIn))
ans =
2.1695e-05

Related

Write a function file that returns the sum of the positive components and the sum of the negative components of the input vector

Question: Write a function file that returns the sum of the positive components and the sum of the negative components of the input vector.
This problem has to be done in MATLAB but I am completely new in MATLAB? Can anyone give idea how to do this?
Attempts:
V= input(Enter a vector)
function [Ps, Ns] = mysmallfunction(V)
Ps== sum(V(V>0));
Ns= sum(V(V<0));
end
I don't know whether it will work or not.

You pretty much had it. Below is a script that'll guide you through passing the arguments and calling the function. A small issue was the double == for the assignment of Ps within the function (simply use one = unless it's for a conditional statement). To call/test the function simply use the line [Ps, Ns] = mysmallfunction(V); above your function definition (alternatively can put function definitions in separate scripts).
V = input("Enter a vector: ");
%Function call%
[Ps, Ns] = mysmallfunction(V);
%Printing the results to the command window%
fprintf("The positive numbers sum up to %f\n",Ps);
fprintf("The negative numbers sum up to %f\n",Ns);
%*******************************************************************************************************%
%FUNCTION DEFINITION (this can alternatively go in a separate script named: mysmallfunction.m)%
%*******************************************************************************************************%
function [Ps, Ns] = mysmallfunction(V)
Ps = sum(V(V>0));
Ns = sum(V(V<0));
end
Command Window (sample input)
Important to include the square brackets, [] in this case when inputting the vector.
Enter a vector: [1 2 3 -2 -5]

Make vector of elements less than each element of another vector

I have a vector, v, of N positive integers whose values I do not know ahead of time. I would like to construct another vector, a, where the values in this new vector are determined by the values in v according to the following rules:
- The elements in a are all integers up to and including the value of each element in v
- 0 entries are included only once, but positive integers appear twice in a row
For example, if v is [1,0,2] then a should be: [0,1,1,0,0,1,1,2,2].
Is there a way to do this without just doing a for-loop with lots of if statements?
I've written the code in loop format but would like a vectorized function to handle it.

The classical version of your problem is to create a vector a with the concatenation of 1:n(i) where n(i) is the ith entry in a vector b, e.g.
b = [1,4,2];
gives a vector a
a = [1,1,2,3,4,1,2];
This problem is solved using cumsum on a vector ones(1,sum(b)) but resetting the sum at the points 1+cumsum(b(1:end-1)) corresponding to where the next sequence starts.
To solve your specific problem, we can do something similar. As you need two entries per step, we use a vector 0.5 * ones(1,sum(b*2+1)) together with floor. As you in addition only want the entry 0 to occur once, we will just have to start each sequence at 0.5 instead of at 0 (which would yield floor([0,0.5,...]) = [0,0,...]).
So in total we have something like
% construct the list of 0.5s
a = 0.5*ones(1,sum(b*2+1))
% Reset the sum where a new sequence should start
a(cumsum(b(1:end-1)*2+1)+1) =a(cumsum(b(1:end-1)*2+1)+1)*2 -(b(1:end-1)+1)
% Cumulate it and find the floor
a = floor(cumsum(a))
Note that all operations here are vectorised!
Benchmark:
You can do a benchmark using the following code
function SO()
b =randi([0,100],[1,1000]);
t1 = timeit(#() Nicky(b));
t2 = timeit(#() Recursive(b));
t3 = timeit(#() oneliner(b));
if all(Nicky(b) == Recursive(b)) && all(Recursive(b) == oneliner(b))
disp("All methods give the same result")
else
disp("Something wrong!")
end
disp("Vectorised time: "+t1+"s")
disp("Recursive time: "+t2+"s")
disp("One-Liner time: "+t3+"s")
end
function [a] = Nicky(b)
a = 0.5*ones(1,sum(b*2+1));
a(cumsum(b(1:end-1)*2+1)+1) =a(cumsum(b(1:end-1)*2+1)+1)*2 -(b(1:end-1)+1);
a = floor(cumsum(a));
end
function out=Recursive(arr)
out=myfun(arr);
function local_out=myfun(arr)
if isscalar(arr)
if arr
local_out=sort([0,1:arr,1:arr]); % this is faster
else
local_out=0;
end
else
local_out=[myfun(arr(1:end-1)),myfun(arr(end))];
end
end
end
function b = oneliner(a)
b = cell2mat(arrayfun(#(x)sort([0,1:x,1:x]),a,'UniformOutput',false));
end
Which gives me
All methods give the same result
Vectorised time: 0.00083574s
Recursive time: 0.0074404s
One-Liner time: 0.0099933s
So the vectorised one is indeed the fastest, by a factor approximately 10.

This can be done with a one-liner using eval:
a = eval(['[' sprintf('sort([0 1:%i 1:%i]) ',[v(:) v(:)]') ']']);
Here is another solution that does not use eval. Not sure what is intended by "vectorized function" but the following code is compact and can be easily made into a function:
a = [];
for i = 1:numel(v)
a = [a sort([0 1:v(i) 1:v(i)])];
end

Is there a way to do this without just doing a for loop with lots of if statements?
Sure. How about recursion? Of course, there is no guarantee that Matlab has tail call optimization.
For example, in a file named filename.m
function out=filename(arr)
out=myfun(in);
function local_out=myfun(arr)
if isscalar(arr)
if arr
local_out=sort([0,1:arr,1:arr]); % this is faster
else
local_out=0;
end
else
local_out=[myfun(arr(1:end-1)),myfun(arr(end))];
end
end
end
in cmd, type
input=[1,0,2];
filename(input);
You can take off the parent function. I added it just hoping Matlab can spot the recursion within filename.m and optimize for it.
would like a vectorized function to handle it.
Sure. Although I don't see the point of vectorizing in such a unique puzzle that is not generalizable to other applications. I also don't foresee a performance boost.
For example, assuming input is 1-by-N. In cmd, type
input=[1,0,2];
cell2mat(arrayfun(#(x)sort([0,1:x,1:x]),input,'UniformOutput',false)
Benchmark
In R2018a
>> clear all
>> in=randi([0,100],[1,100]); N=10000;
>> T=zeros(N,1);tic; for i=1:N; filename(in) ;T(i)=toc;end; mean(T),
ans =
1.5647
>> T=zeros(N,1);tic; for i=1:N; cell2mat(arrayfun(#(x)sort([0,1:x,1:x]),in,'UniformOutput',false)); T(i)=toc;end; mean(T),
ans =
3.8699
Ofc, I tested with a few more different inputs. The 'vectorized' method is always about twice as long.
Conclusion: Recursion is faster.

Why does MATLAB throw a "too many output arguments" error when I overload subsref (subscripted reference)?

As a toy example, I have a class that simply wraps a vector or matrix in an object and includes a timestamp of when it was created. I'm trying to overload subsref so that
() referencing works exactly as it does with the standard vector and matrix types
{} referencing works in exactly the same way as () referencing (nothing to do with cells in other words)
. referencing allows me to access the private properties of the object and other fields that aren't technically properties.
Code:
classdef TimeStampValue
properties (Access = private)
time;
values;
end
methods
%% Constructor
function x = TimeStampValue(values)
x.time = now();
x.values = values;
end
%% Subscripted reference
function x = subsref(B, S)
switch S.type
case '()'
v = builtin('subsref', B.values, S);
x = TimeStampValue(v);
case '{}'
S.type = '()';
v = builtin('subsref', B.values, S);
x = TimeStampValue(v);
case '.'
switch S.subs
case 'time'
x = B.time;
case 'values'
x = B.values;
case 'datestr'
x = datestr(B.time);
end
end
end
function disp(x)
fprintf('\t%d\n', x.time)
disp(x.values)
end
end
end
However brace {} referencing doesn't work. I run this code
clear all
x = TimeStampValue(magic(3));
x{1:2}
and I get this error:
Error using TimeStampValue/subsref
Too many output arguments.
Error in main (line 3)
x{1:2}
MException.last gives me this info:
identifier: 'MATLAB:maxlhs'
message: 'Too many output arguments.'
cause: {0x1 cell}
stack: [1x1 struct]
which isn't helpful because the only thing in the exception stack is the file containing three lines of code that I ran above.
I placed a breakpoint on the first line of the switch statement in subsref but MATLAB never reaches it.
Whats the deal here? Both () and . referencing work as you would expect, so why doesn't {} referencing work?

When overloading the curly braces {} to return a different number of output arguments than usual, it is also necessary to overload numel to return the intended number (1, in this case). UPDATE: As of R2015b, the new function numArgumentsFromSubscript was created to be overloaded instead of numel. The issue remains the same, but this function should be overloaded instead of numel as I describe in the original answer below. See also the page "Modify nargout and nargin for Indexing Methods". Excerpt:
When a class overloads numArgumentsFromSubscript, MATLAB calls this method instead of numel to compute the number of arguments expected for subsref nargout and subsasgn nargin.
If classes do not overload numArgumentsFromSubscript, MATLAB calls numel to compute the values of nargout or nargin.
More explanation of the underlying issue (need to specify number of output arguments) follows.
Original answer (use numArgumentsFromSubscript instead of numel for R2015b+)
To handle the possibility of a comma separated list of output arguments when indexing with curly braces, MATLAB calls numel to determine the number of output arguments from the size of the input indexes (according to this MathWorks answer). If the number of output arguments in the definition of overloaded subsref is inconsistent with (i.e. less than) the number provided by numel, you get the "Too many output arguments" error. As stated by MathWorks:
Therefore, to allow curly brace indexing into your object while returning a number of arguments INCONSISTENT with the size of the input, you will need to overload the NUMEL function inside your class directory.
Since x{1:2} normally provides two outputs (X{1},X{2}), the definition function x = subsref(B, S) is incompatible for this input. The solution is to include in the class a simple numel method to overload the builtin function, as follows:
function n = numel(varargin)
n = 1;
end
Now the {} indexing works as intended, mimicking ():
>> clear all % needed to reset the class definition
>> x = TimeStampValue(magic(3));
>> x(1:2)
ans =
7.355996e+05
8 3
>> x{1:2}
ans =
7.355996e+05
8 3
However, overloading curly braces in this manner is apparently a "specific type of code that we [MathWorks] did not expect customers to be writing". MathWorks recommends:
If you are designing your class to output only one argument, it is not recommended that you use curly brace indexing that requires you to overload NUMEL. Instead, it is recommended you use smooth brace () indexing.
UPDATE: Interestingly, the R2015b release notes state:
Before MATLAB release R2015b, MATLAB incorrectly computed the number of arguments expected for outputs from subsref and inputs to subsasgn for some indexing expressions that return or assign to a comma-separated list.
With release R2015b, MATLAB correctly computes the values of nargout and nargin according to the number of arguments required by the indexing expression.
So perhaps this is now fixed?
An alternative solution that comes to mind is to change function x = subsref(B, S) to function varargout = subsref(B, S) and adding varargout=cell(1,numel(B)); varargout{1} = x;. As Amro noted in comments, pre-allocating the cell is necessary to avoid an error about an unassigned argument.

I just ran into the same problem. What's even worse, is that the number of output arguments is enforced to be equal to what numel() returns not only for the curly braces {}, but also for the dot . operation.
This means that if numel() is overridden to return the usual prod(size(obj)), it becomes impossible to access any properties of the underlying object (such as x.time in the above example), as subsref() is then expected to return multiple outputs.
But if numel() just returns 1 instead, it does not match prod(size(obj)), which is what most code working with numeric values or based on reshape() expects. In fact, the MATLAB editor's balloon help immediately suggests that 'NUMEL(x) is usually faster than PROD(SIZE(x))', which suggest that they are equivalent, but apparently are not.
A possible solution would be to make numel() return prod(size(obj)) and write explicit getter/setter functions for all these properties, e.g.,
x.get_time()
in the example above. This seems to work, because method calls apparently get resolved before subsref() gets called. But then if one of the properties is a matrix it cannot be directly indexed any more because Matlab doesn't understand chained indexing, i.e., instead of writing
x.matrix(1,:)
one would have to write
m = x.get_matrix();
m(1,:)
which is ugly to say the least.
This is starting to get a bit frustrating. I still hope I've just overlooked something obvious, I can't believe that this is how it's supposed to work.

This solution seems to work in 2014b (but not entirely certain why)
classdef TestClass < handle
methods
function n = numel(~,varargin)
n = 1;
end
function varargout = subsref(input,S)
varargout = builtin('subsref',input,S);
end
function out = twoOutputs(~)
out = {}; out{1} = 2; out{2} = 3;
end
end
end
Then via the command window
>> testClass = TestClass();
>> [a,b] = testClass.twoOutouts()
a =
2
b =
3

I am working on a class to handle polynomials and polynomial matrices. I was having the same dificulty because I want different behaviors for the '.' indexing in the cases of scalar polynomials and polynomial matrices.
In my case I want P.coef to return a vector of coefficients if P is a scalar polynomial. If P is a polynomial matrix, P.coef must return a cell array of the same size of P, in which the cell {i,j} contains the coefficient vector of the polynomial P(i,j).
The problem appeared when P.coef was used with a matrix. My desired behavior returns only one object as an answer, but Matlab is expecting the function to return numel(P) objects.
I found a very simple solution. When declaring subsref, I used one mandatory output and a varargout:
function [R,varargout] = subsref(P,S)
The body of the function defines R as needed, according to my design. And at the very end of the function I added:
varargout(1:nargout-1) = cell(1,nargout-1);
To just return empty matrices as the extra outputs that Matlab wants.
This should create no problem if the function is always called with a single output argument, e.g., as in R = P.coef. If the function is called without assigning, the user will see numel(P)-1 empty matrices, which is really not a big deal. Anyway, the user is warned about this in the function help.

Bitwise commands such as Bitor with many inputs?

Matlab takes only two inputs with bitwise commands such as bitor. bitor(1,2) returns 3 and bitor(1,2,4) does not return 7 but:
ASSUMEDTYPE must be an integer type name.
Currently, I create for-loops to basically create a bitwise command to take as many inputs as needed. I feel the for-loops for this kind of thing is reinvention of the wheel.
Is there some easy way of creating the bitwise operations with many inputs?
Currently an example with some random numbers, must return 127
indices=[1,23,45,7,89,100];
indicesOR=0;
for term=1:length(indices)
indicesOR=bitor(indicesOR,indices(term));
end

If you don't mind getting strings involved (may be slow):
indicesOR = bin2dec(char(double(any(dec2bin(indices)-'0'))+'0'));
This uses dec2bin to convert to strings of '0' and '1'; converts to numbers 0 and 1 by subtracting '0'; applys an "or" operation column-wise using any; and then converts back.
For AND (instead of OR): replace any by all:
indicesAND = bin2dec(char(double(all(dec2bin(indices)-'0'))+'0'));
For XOR: use rem(sum(...),2):
indicesXOR = bin2dec(char(double(rem(sum(dec2bin(indices)-'0'),2))+'0'))
EDIT: I just found out about functions de2bi and bi2de (Communications Toolbox), which avoid using strings. However, they seem to be slower!
indicesOR = bi2de(double(any(de2bi(indices))));
indicesAND = bi2de(double(all(de2bi(indices))));
indicesXOR = bi2de(double(rem(sum((de2bi(indices))));
Another approach is to define a recursive function, exploiting the fact that AND, OR, XOR operations are (bit-wise) associative, that is, x OR y OR z equals (x OR y) OR z. The operation to be applied is passed as a function handle.
function r = bafun(v, f)
%BAFUN Binary Associative Function
% r = BAFUN(v,f)
% v is a vector with at least two elements
% f is a handle to a function that operates on two numbers
if numel(v) > 2
r = bafun([f(v(1), v(2)) v(3:end)], f);
else
r = f(v(1), v(2));
end
Example:
>> bafun([1,23,45,7,89,100], #bitor)
ans =
127

Merging function handles in MATLAB

I'm currently coding a simulation in MATLAB and need some help in regards to an issue that I've been having.
I'm working on a problem where I have n separate anonymous function handles fi stored in cell array, each of which accepts a 1×1 numeric array xi and returns a 1×1 numeric array yi.
I'm trying to combine each of these anonymous function handles into a single anonymous function handle that accepts a single n×1 numeric array X and returns a single n×1 numeric array Y, where the i-th elements of X and Y are xi and yi = fi (xi), respectively.
As an example, let n = 2 and f_1, f_2 be two function handles that input and output 1×1 arrays and are stored in a cell array named functions:
f_1 = #(x_1) x_1^2
f_2 = #(x_2) x_2^3
functions = {f_1, f_2}
In this example, I basically need to be able to use f_1 and f_2 to construct a function handle F that inputs and outputs a 2×1 numeric array, like so:
F = #(x) [f_1(x(1,1)); f_2(x(2,1))]
The question is how to generalize this for an arbitrary number n of such functions.

It is difficult to define such a function using the inline #()-anonymous
syntax (because of the requirement for the function’s body to be
an expression).
However, it is possible to define a regular function that runs over
the items of a given vector and applies the functions from a given
cell array to those items:
function y = apply_funcs(f, x)
assert(length(f) == length(x));
y = x;
for i = 1 : length(f)
y(i) = feval(f{i}, x(i));
end
end
If it is necessary to pass this function as an argument to another
one, just reference it by taking its #-handle:
F = #apply_funcs

This can be solved using a solution I provided to a similar previous question, although there will be some differences regarding how you format the input arguments. You can achieve what you want using the functions CELLFUN and FEVAL to evaluate your anonymous functions in one line, and the function NUM2CELL to convert your input vector to a cell array to be used by CELLFUN:
f_1 = #(x_1) x_1^2; %# First anonymous function
f_2 = #(x_2) x_2^3; %# Second anonymous function
fcnArray = {f_1; f_2}; %# Cell array of function handles
F = #(x) cellfun(#feval,fcnArray(:),num2cell(x(:)));
Note that I used the name fcnArray for the cell array of function handles, since the name functions is already used for the built-in function FUNCTIONS. The colon operator (:) is used to turn fcnArray and the input argument x into column vectors if they aren't already. This ensures that the output is a column vector.
And here are a few test cases:
>> F([2;2])
ans =
4
8
>> F([1;3])
ans =
1
27

#you can try
f=#(x)[x(1)^2;x(2)^3]
>>f([1,2])
ans =
1
8
>>f([2,3])
ans =
4
27

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

String comparison on cell array of strings with Matlab Coder - matlab

Related

Write a function file that returns the sum of the positive components and the sum of the negative components of the input vector

Make vector of elements less than each element of another vector

Why does MATLAB throw a "too many output arguments" error when I overload subsref (subscripted reference)?

Bitwise commands such as Bitor with many inputs?

Merging function handles in MATLAB

Categories

Resources