What is 'cat' param used for in TreeBagger method - matlab

I am following the tutorial and am trying to implement TreeBagger Method. I have a question since I cannot understand part of the code.
b = TreeBagger(nTrees,X,Y,'oobpred','on','cat',6,'minleaf',leaf(ii));
Can anyone tell me what 'cat' is and the number 6 please?

The constructor for TreeBagger:
% In addition to the optional arguments above, this method accepts all
% optional CLASSREGTREE arguments with the exception of 'minparent'.
% Refer to the documentation for CLASSREGTREE for more detail.
'cat' is not one of the valid input pairs for TreeBagger, so it must be an input for CLASSREGTREE. Looking at the input pairs for classregtree, the only input pair close to 'cat' is 'categorical,' which says:
% 'categorical' Vector of indices of the columns of X that are to be
% treated as unordered categorical variables
If you look at statgetargs.m, specifically this line:
i = strmatch(lower(pname),pnames);
It will allow any arguments as long as the first portion is spelled correctly. pnames will contain a cell array of valid strings (one of them will be 'categorical') while pname will contain a string to compare pnames with (eventually, this will contain 'cat'). If you enter only the first portion of the input string, it will still work. I.e. for me this works:
EDU>> a = TreeBagger(nTrees,X,Y,'oobpr','on','cat',6,'minle',leaf(ii));
EDU>> b = TreeBagger(nTrees,X,Y,'oobpred','on','cat',6,'minleaf',leaf(ii));
EDU>> isequal(a,b)
ans =
1
It doesn't work if 'cat' is changed because it stores 'cat' explicitly as it's spelled under TreeArgs. Regardless, 'cat' is being treated as 'categorical' for classregtree.

cat is being treated as an abbreviation of the categorical input parameter of classregtree, and it specifies that the sixth variable in X should be treated as categorical.

Related

Splice list of structs into arguments for function call - Matlab

I want to splice a list of arguments to pass to a function. For a vector I know that I can use num2cell and call the cell with curly braces (see this question), but in my case the list I want to splice originally has structs and I need to access one of their attributes. For example:
austen = struct('ids', ids, 'matrix', matrix);
% ... more structs defined here
authors = [austen, dickens, melville, twain];
% the function call I want to do is something like
tmp = num2cell(authors);
% myFunction defined using varargin
[a,b] = myFunction(tmp{:}.ids);
The example above does not work because Matlab expected ONE output from the curly braces and it's receiving 4, one for each author. I also tried defining my list of arguments as a cell array in the first place
indexes = {austen.ids, dickens.ids, melville.ids, twain.ids};
[a,b] = myFunction(indexes{:});
but the problem with this is that myFunction is taking the union and intersection of the vectors ids and I get the following error:
Error using vertcat
The following error occurred converting from double to struct:
Conversion to struct from double is not possible.
Error in union>unionR2012a (line 192)
c = unique([a;b],order);
Error in union (line 89)
[varargout{1:nlhs}] = unionR2012a(varargin{:});
What is the correct way for doing this? The problem is that I will have tens of authors and I don't want to pass al of them to myFunction by hand.
As #kedarps rightly pointed out I need to use struct2cell instead of num2cell. The following code does the trick
tmp = struct2cell(authors);
[a, b] = myFunction(tmp{1,:,:}); %ids is the first entry of the structs
I had never heard about struct2cell before! It doesn't even show up in the See also of help num2cell! It would be amazing to have an apropos function like Julia's....

Creating a function with variable number of inputs?

I am trying to define the following function in MATLAB:
file = #(var1,var2,var3,var4) ['var1=' num2str(var1) 'var2=' num2str(var2) 'var3=' num2str(var3) 'var4=' num2str(var4)'];
However, I want the function to expand as I add more parameters; if I wanted to add the variable vark, I want the function to be:
file = #(var1,var2,var3,var4,vark) ['var1=' num2str(var1) 'var2=' num2str(var2) 'var3=' num2str(var3) 'var4=' num2str(var4) 'vark=' num2str(vark)'];
Is there a systematic way to do this?
Use fprintf with varargin for this:
f = #(varargin) fprintf('var%i= %i\n', [(1:numel(varargin));[varargin{:}]])
f(5,6,7,88)
var1= 5
var2= 6
var3= 7
var4= 88
The format I've used is: 'var%i= %i\n'. This means it will first write var then %i says it should input an integer. Thereafter it should write = followed by a new number: %i and a newline \n.
It will choose the integer in odd positions for var%i and integers in the even positions for the actual number. Since the linear index in MATLAB goes column for column we place the vector [1 2 3 4 5 ...] on top, and the content of the variable in the second row.
By the way: If you actually want it on the format you specified in the question, skip the \n:
f = #(varargin) fprintf('var%i= %i', [(1:numel(varargin));[varargin{:}]])
f(6,12,3,15,5553)
var1= 6var2= 12var3= 3var4= 15var5= 5553
Also, you can change the second %i to floats (%f), doubles (%d) etc.
If you want to use actual variable names var1, var2, var3, ... in your input then I can only say one thing: Don't! It's a horrible idea. Use cells, structs, or anything else than numbered variable names.
Just to be crytsal clear: Don't use the output from this in MATLAB in combination with eval! eval is evil. The Mathworks actually warns you about this in the official documentation!
How about calling the function as many times as the number of parameters? I wrote this considering the specific form of the character string returned by your function where k is assumed to be the index of the 'kth' variable to be entered. Array var can be the list of your numeric parameters.
file=#(var,i)[strcat('var',num2str(i),'=') num2str(var) ];
var=[2,3,4,5];
str='';
for i=1:length(var);
str=strcat(str,file(var(i),i));
end
If you want a function to accept a flexible number of input arguments, you need varargin.
In case you want the final string to be composed of the names of your variables as in your workspace, I found no way, since you need varargin and then it looks impossible. But if you are fine with having var1, var2 in your string, you can define this function and then use it:
function str = strgen(varargin)
str = '';
for ii = 1:numel(varargin);
str = sprintf('%s var%d = %s', str, ii, num2str(varargin{ii}));
end
str = str(2:end); % to remove the initial blank space
It is also compatible with strings. Testing it:
% A = pi;
% B = 'Hello!';
strgen(A, B)
ans =
var1 = 3.1416 var2 = Hello!

Correct use of tilde operator for input arguments

Function:
My MATLAB function has one output and several input arguments, most of which are optional, i.e.:
output=MyFunction(arg1,arg2,opt1,opt2,...,optN)
What I want to do:
I'd like to give only arg1, arg2 and the last optional input argument optN to the function. I used the tilde operator as follows:
output=MyFunction(str1,str2,~,~,...,true)
Undesired result:
That gives the following error message:
Error: Expression or statement is incorrect--possibly unbalanced (, {, or [.
The error points to the comma after the first tilde, but I don't know what to make of it to be honest.
Problem identification:
I use MATLAB 2013b, which supports the tilde operator.
According to MATLAB's documentation the above function call should work:
You can ignore any number of function inputs, in any position in the argument list. Separate consecutive tildes with a comma...
I guess there are a few workarounds, such as using '' or [] as inputs, but I'd really like to understand how to correctly use '~' because actually leaving inputs out allows me to use exist() when checking the input arguments of a function.
If you need any further info from me, please let me know.
Thank you very much!
The tilde is only for function declaration. Matlab's mlint recommends to replace unused arguments by ~. The result is a function declared like this function output = MyFunction(a, b, ~, c). This is a very bad practice.
Since you have a function where the parameters are optional, you must call the function with empty arguments output=MyFunction(str1,str2,[],[],...,true).
A better way to do it is to declare the function with the varargin argument and prepare your function for the different inputs:
function output = MyFunction(varargin)
if nargin == 1
% Do something for 1 input
elseif nargin == 2
% Do something for 3 inputs
elseif nargin == 3
% Do something for 3 inputs
else
error('incorrect number of input arguments')
end
It is even possible to declare your function as follows:
function output = MyFunction(arg1, arg2, varargin)
The declaration above will tell Matlab that you are expecting at least two parameters.
See the documentation of nargin here.
... and the documentation of varargin here
To have variable number of inputs, use varargin. Use it together with nargin.
Example:
function varlist2(X,Y,varargin)
fprintf('Total number of inputs = %d\n',nargin);
nVarargs = length(varargin);
fprintf('Inputs in varargin(%d):\n',nVarargs)
for k = 1:nVarargs
fprintf(' %d\n', varargin{k})
end

How to get a comma-separated list directly from table.Properties.VariableNames?

Example:
>> A = table({}, {}, {}, {}, {}, ...
'VariableNames', {'Foo', 'Bar', 'Baz', 'Frobozz', 'Quux'});
>> vn = A.Properties.VariableNames;
>> isequal(vn, A.Properties.VariableNames)
ans =
1
So far so good, but even though vn and A.Properties.VariableNames appear to be the same, they behave very differently when one attempts to get a "comma-separated list" from them (using {:}):
>> {'Frobnitz', vn{:}}
ans =
'Frobnitz' 'Foo' 'Bar' 'Baz' 'Frobozz' 'Quux'
>> {'Frobnitz', A.Properties.VariableNames{:}}
ans =
'Frobnitz' 'Foo'
Is there a way to get a "comma-separated list" from A.Properties.VariableNames directly (that is, without having to create an intermediate variable like vn)?
(Also, is there a more reliable function than isequal to test for equality of cell arrays? In the example above vn and A.Properties.VariableNames are clearly not equal enough!)
For those who don't have a version of MATLAB that supports the (rather new) table objects, it's the same story if one uses dataset objects (from the Statistics toolbox) instead. The example above would then translate to:
clear('A', 'vn');
A = dataset({}, {}, {}, {}, {}, ...
'VarNames', {'Foo', 'Bar', 'Baz', 'Frobozz', 'Quux'});
vn = A.Properties.VarNames;
isequal(vn, A.Properties.VarNames)
{'Frobnitz', vn{:}}
{'Frobnitz', A.Properties.VarNames{:}}
(Note the change from VariableNames to VarNames; output omitted: it's identical to the output shown above):
There's no problem with isequal. vn and A.Properties.VariableNames are in fact equal. The problem is something else...
If you type help dataset.subsref, you will get an explanation of why this is happening which should be the same explanation as for the table class:
LIMITATIONS:
Subscripting expressions such as A.CellVar{1:2}, A.StructVar(1:2).field,
or A.Properties.ObsNames{1:2} are valid, but result in subsref
returning multiple outputs in the form of a comma-separated list. If
you explicitly assign to output arguments on the LHS of an assignment,
for example, [cellval1,cellval2] = A.CellVar{1:2}, those variables will
receive the corresponding values. However, if there are no output
arguments, only the first output in the comma-separated list is
returned.
In short, when you invoke the line A.Properties.VarNames{:}, you are making a call to the dataset.subsref method and the curly-brace subscript {:} is being passed to it right along with the other . subscripts, as opposed to being applied separately after the call to the dataset.subsref method.
Because of this, it doesn't look like you can get a comma-separated list directly from A without using an intermediate variable. However, if your goal (as in your example) is to concatenate the strings together with another string into a new cell array, you could do this:
>> [{'Frobnitz'} A.Properties.VarNames]
ans =
'Frobnitz' 'Foo' 'Bar' 'Baz' 'Frobozz' 'Quux'
No, I don't think there is anything you can do except create the temporary variable vn. It's long been a troubling shortcoming of user-defined classes that they cannot do comma separated list expansion. I do find it strange, though, that TMW chose to implement the table class in the user-defined class framework.
As for isequal, there is no issue there. The behavior you see has nothing to do with vn and A.Properties.VariableNames not being equal.

MATLAB "bug" (or really weird behavior) with structs and empty cell arrays

I have no idea what's going on here. I'm using R2006b. Any chance someone out there with a newer version could test to see if they get the same behavior, before I file a bug report?
code: (bug1.m)
function bug1
S = struct('nothing',{},'something',{});
add_something(S, 'boing'); % does what I expect
add_something(S.something,'test'); % weird behavior
end
function add_something(X,str)
disp('X=');
disp(X);
disp('str=');
disp(str);
end
output:
>> bug1
X=
str=
boing
X=
test
str=
??? Input argument "str" is undefined.
Error in ==> bug1>add_something at 11
disp(str);
Error in ==> bug1 at 4
add_something(S.something,'test');
It looks like the emptiness/nothingness of S.something allows it to shift the arguments for a function call. This seems like Very Bad Behavior. In the short term I want to find away around it (I'm trying to make a function that adds items to an initially empty cell array that's a member of a structure).
Edit:
Corollary question: so there's no way to construct a struct literal containing any empty cell arrays?
As you already discovered yourself, this isn't a bug but a "feature". In other words, it is the normal behavior of the STRUCT function. If you pass empty cell arrays as field values to STRUCT, it assumes you want an empty structure array with the given field names.
>> s=struct('a',{},'b',{})
s =
0x0 struct array with fields:
a
b
To pass an empty cell array as an actual field value, you would do the following:
>> s = struct('a',{{}},'b',{{}})
s =
a: {}
b: {}
Incidentally, any time you want to set a field value to a cell array using STRUCT requires that you encompass it in another cell array. For example, this creates a single structure element with fields that contain a cell array and a vector:
>> s = struct('strings',{{'hello','yes'}},'lengths',[5 3])
s =
strings: {'hello' 'yes'}
lengths: [5 3]
But this creates an array of two structure elements, distributing the cell array but replicating the vector:
>> s = struct('strings',{'hello','yes'},'lengths',[5 3])
s =
1x2 struct array with fields:
strings
lengths
>> s(1)
ans =
strings: 'hello'
lengths: [5 3]
>> s(2)
ans =
strings: 'yes'
lengths: [5 3]
ARGH... I think I found the answer. struct() has multiple behaviors, including:
Note If any of the values fields is
an empty cell array {}, the MATLAB
software creates an empty structure
array in which all fields are also
empty.
and apparently if you pass a member of a 0x0 structure as an argument, it's like some kind of empty phantom that doesn't really show up in the argument list. (that's still probably a bug)
bug2.m:
function bug2(arg1, arg2)
disp(sprintf('number of arguments = %d\narg1 = ', nargin));
disp(arg1);
test case:
>> nothing = struct('something',{})
nothing =
0x0 struct array with fields:
something
>> bug2(nothing,'there')
number of arguments = 2
arg1 =
>> bug2(nothing.something,'there')
number of arguments = 1
arg1 =
there
This behaviour persists in 2008b, and is in fact not really a bug (although i wouldn't say the designers intended for it):
When you step into add_something(S,'boing') and watch the first argument (say by selecting it and pressing F9), you'd get some output relating to the empty structure S.
Step into add_something(S.something,'test') and watch the first argument, and you'd see it's in fact interpreted as 'test' !
The syntax struct.fieldname is designed to return an object of type 'comma separated list'. Functions in matlab are designed to receive an object of this exact type: the argument names are given to the values in the list, in the order they are passed. In your case, since the first argument is an empty list, the comma-separated-list the function receives starts really at the second value you pass - namely, 'test'.
Output is identical in R2008b:
>> bug1
X=
str=
boing
X=
test
str=
??? Input argument "str" is undefined.
Error in ==> bug1>add_something at 11
disp(str);
Error in ==> bug1 at 4
add_something(S.something,'test'); % weird behavior