Backwards compatible contains() - matlab

I'm writing a set of functions that will be used by colleagues who use older versions of MATLAB (R2015a/R2015b). In one of my functions I use contains(), which was only introduced in R2016b and is therefore not backward compatible. I'd like to provide a workaround, but I'm not quite sure how to go about it. The particular issue I'm dealing with is as follows:
files = {'/some/path/sub001file','/some/path/sub002file','/some/path/sub003file'};
subjects = {'sub001','sub003'};
files = files(contains(files,subjects))
I'm looking for a way to replace the third line with one that will run on MATLAB R2015a and later and produce identical output. As an aside, since this is a rather small operation, readability matters more to me than computational efficiency.

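It's a bit convoluted, but the following will work:
% Keep a file if at least one subject occurs in its name (strfind returns [] when there is no match)
idx = cellfun(@(c)~all(cellfun(@(d)isempty(strfind(c,d)),subjects)),files);
files = files(idx);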

Related

What are the differences between mlint and checkcode in MATLAB?

MATLAB provides two functions to check code for errors: mlint and checkcode.
What are the main differences between them, and why does the MATLAB help say that mlint is not recommended and checkcode should be used instead?
checkcode is just a new name for mlint.
About six or seven years ago, MathWorks decided that for reasons of brand and product integrity they would prefer it if people thought of MATLAB (including the language, the IDE, the graphics, the libraries etc) as a single entity called MATLAB, rather than separable things.
They realised that they had been contributing to the issue by referring (in code, comments, and some marketing material) to the underlying language as "M", which might give the impression that MATLAB was just a wrapper around the "M" language.
They went through the product and purged pretty much every reference to "M", and the mlint command was one of those cases.
However, they have many customers who rely on the existence of the command mlint, and wouldn't want to update their code. So mlint still exists for backward compatibility, but it's deliberately unadvertised, and its help/doc just says that it's no longer recommended, and that you should use checkcode instead.
In modern versions of MATLAB, if you type edit mlint, you'll see that it literally just calls checkcode under the hood.
The functionality is the same as it always has been, it's just a name change. Nevertheless, if you're starting a new project, you should use checkcode, as eventually all those legacy customers will have finally upgraded things, and at that point MathWorks may well decide to finally remove mlint entirely.

What's the point of using "map()" for two elements in Perl?

I've seen code where there are just two rather static elements to be mapped such as time intervals with start and end dates, yet map() is being used rather than explicit code for mapping, e.g.
{ map { ... } qw(start end) } # vs.
{ start => ..., end => ... }
Which way is preferable, and why?
The map form may be less concise but looks more functional (as in functional programming), so I guess that's why it may be preferred over explicit code and is perhaps more DRY.
However, it looks less legible to me because there is more logic going on behind the scenes, and mapping should also be less efficient because it invokes calls and consists of more atomic operations.
EDIT
There is a conflicting goal in programming: KISS (keep it { pick 2 from: small, simple, stupid }). Using map slightly complicates code.
Assuming you're not just setting both items to the same constant or something similarly trivial, I would expect the map version to be more concise.
IMO, the main point in favor of the map version is that you know the same process will be used to produce both values. Not only for the sake of DRY, but also because it eliminates any concern that one might have a subtle change which the other doesn't.
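To make that concrete, here is a minimal sketch of the two forms from the question. The `%raw` input hash and the `normalize_date` helper are hypothetical stand-ins for whatever "..." does in real code. Both forms build the same hash, but the map version states the transformation once, so the start and end values cannot quietly drift apart.

```perl
use strict;
use warnings;

# Hypothetical input: raw date strings keyed by field name.
my %raw = ( start => '2015-01-05', end => '2015-02-10' );

# Hypothetical transformation shared by both fields: strip the hyphens.
sub normalize_date {
    my ($date) = @_;
    ( my $compact = $date ) =~ tr/-//d;
    return $compact;
}

# Explicit form: the call is written out once per key.
my %explicit = (
    start => normalize_date( $raw{start} ),
    end   => normalize_date( $raw{end} ),
);

# map form: one expression applied to every key in the list.
my %mapped = map { $_ => normalize_date( $raw{$_} ) } qw(start end);

printf "%s: %s\n", $_, $mapped{$_} for qw(start end);
```

If the transformation ever changes, the explicit form needs two matching edits; the map form needs one.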
As for the performance concern... If your use case is sufficiently performance-sensitive for any potential difference to matter, then you shouldn't be using Perl in the first place. Switching to well-written C (not C#, not C++, not Objective C - just plain C) will have a far greater performance impact than micro-optimizing whether you assign two values individually vs. using a loop to set them. But the odds of your use case being that sensitive are approximately zero anyhow.
There is a principle of coding known as DRY. Don't Repeat Yourself.
It asserts that:
Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.
And that can be interpreted as condensing duplicate typing with (things like) map/for.
I use idioms like the one you've quoted when I'm trying to expand some text - for example:
my @defs = map { "DEF:$_=$source_file:$_:MAX" } qw ( read write );
This generates me some DEF lines for rrdtool.
I'm doing it this way because in some cases I've got considerably longer lists of 'things I want to define' and want to be consistent. (Sometimes I have, say, 10 similar lines that differ only by a single word.)
But also because the explicit alternative looks like this:
my @defs = ( "DEF:read=$source_file:read:MAX",
             "DEF:write=$source_file:write:MAX" );
There's not much in it for two elements, and I'd suggest it's as much a matter of style as anything. However, if you've got more than that, it quickly becomes very beneficial because you can change the single line - say you've got a different file location? Want to swap MAX for AVERAGE?
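Here is a sketch of that "change a single line" point, with the consolidation function pulled out into a variable. The RRD path, the data-source names and the `$cf` variable are hypothetical values for illustration. Swapping MAX for AVERAGE, or adding a third data source, touches exactly one line.

```perl
use strict;
use warnings;

# Hypothetical values: an RRD file path and the data sources to define.
my $source_file = '/var/lib/rrd/disk.rrd';
my @sources     = qw(read write);
my $cf          = 'AVERAGE';    # consolidation function, set in one place

# A single template produces one DEF string per data source.
my @defs = map { "DEF:$_=$source_file:$_:$cf" } @sources;

print "$_\n" for @defs;
</test>
```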
It's also quite shockingly easy to go 'punctuation blind' when looking at a long sequence of similar statements, where someone's typo-ed and added a , where it should be . or similar.
And ... you probably don't lose a great deal in terms of readability. I will acknowledge, though, that this is something of a style point, because while map is pretty amazing, it can make for some rather hard-to-read code if you're not careful.
Also to specifically address:
mapping should also be less efficient because it invokes calls and consists of more atomic operations.
A wise man once said:
premature optimization is the root of all evil
Don't think about the efficiency of a statement - look at the legibility/readability. Compilers are pretty clever. Most "obvious" optimisations, they already make for you. Processors are also pretty fast. Your limiting factor in most code isn't the amount of CPU cycles you need, it's IO throughput and memory footprint. So don't worry about it - write clear code.
And if there's a performance critical demand on your code, you should be using a code profiler to look at where you gain the most efficiency for your effort at refactoring. You may end up with less clear code in doing so (sometimes) but that's a more clear tradeoff.

How to test for recent-enough version of MATLAB?

A function I want to implement needs to know whether the current version of MATLAB is at least as recent as R2014a.
Is there a robust, supported way to perform this check?
(With "robust, supported" I mean to say that I'm not interested in fragile hacks such as parsing the string returned by version, etc.)
BTW, in this case, the reason I want this check is to know that I can use the function matlab.lang.makeUniqueStrings. If there were a robust, supported way to check for the availability of this function, I'd use it instead of testing that the current MATLAB is recent enough. Unfortunately, there doesn't seem to be such a check: exist returns false for every variant I can come up with for the name of this function. Again, I can think of fragile hacks to mimic a proper test (e.g. which('matlab.lang.makeUniqueStrings')), but they're hardly better than the version-testing hacks I alluded to above.
The best solution I have found is to run the command using matlab.lang.makeUniqueStrings within a try-catch block. This is still a fragile hack, because MATLAB does not offer a robust, built-in way to catch specific exceptions!
IOW, it's all about choosing the least awful hack. Testing that the current version is recent enough (even if this test is a fragile hack) at least has the virtue of being general enough to stick in some function, and at least contain the proliferation of fragile, hacky code.
I would use the verLessThan function:
verLessThan('matlab', '8.3')
This will return true (1) if the current version you are using is older than 8.3 (R2014a) and false (0) otherwise. No string parsing required.
You could then use it like so:
if ~verLessThan('matlab', '8.3')
% Run code using matlab.lang.makeUniqueStrings
end
If you only need to care about fairly recent versions, use the verLessThan command. However, verLessThan was introduced in about 2006a or so; if you need to support versions older than that, you will need to use the output of the version command.
Alternatively, you can robustly test for the existence of matlab.lang.makeUniqueStrings. First, use m = meta.package.fromName('matlab.lang') to retrieve a meta.package object referring to the package. If m is empty, the package does not exist. Assuming m is not empty, check the FunctionList property of m to see whether makeUniqueStrings is present. (There's also a ClassList property.)
Finally, MATLAB does offer a way to catch specific exceptions. Instead of a simple catch, use catch myError. The variable myError will be an object of type MException, available within the catch block. You can test the identifier and message properties of the exception, and handle different exceptions appropriately, including rethrowing unhandled ones.
You can use the MATLAB command version for your test:
['Release R' version('-release')]
Sample run:
>> ['Release R' version('-release')]
ans =
Release R2012a
Check whether your MATLAB release is R2014a (note that version('-release') returns the release without the leading R, as the sample run shows):
strcmp(version('-release'),'2014a')
The above command returns 1 if the release is exactly R2014a, and 0 otherwise. It does not detect releases newer than R2014a.
The best way is to use the version command, and parse the string appropriately.
[v d] = version
Take a look at the output from R2014a, and set your values appropriately.
An example of what Sam meant:
try
    %// call to matlab.lang.makeUniqueStrings
catch ME
    %// (use regexp here to include support for Octave)
    if strcmpi(ME.identifier, 'MATLAB:undefinedVarOrClass')
        error('yourFcn:someID',...
            'matlab.lang.makeUniqueStrings is not supported on your version of MATLAB.');
    else
        throw(ME);
    end
end
Robust until The MathWorks changes the ID string.
As a final remark: checking for features is not sufficient: what if The MathWorks decides to change the function signature? Or the output argument list? Or ..?
There is no really robust method in a language that is itself not robust. Be as robust as the language allows you, but no more.
Testing for a version number is rarely a good idea. You should always check for features, and never for versions, if you really want robustness (and at the same time portability).
What will happen if one of the features you need is removed from a future version of Matlab? Or the way it works changes? (This is far more common than one would expect.) Or if someone wants to use your code in a Matlab-compatible system that does have the features your code requires?
There are some autoconf macros related to Matlab around (although I have never used one). Or you can write your own simple checks in the Matlab language.

What was the original reason for MATLAB's one function = one file and why is it still so?

What was the original reason for MATLAB's one (primary) function = one file, and why is it still so, after so many years of development?
What are the advantages of this approach, compared to its disadvantages (people put too many things in functions and scripts, when they should obviously be separated ... resulting in loss of code clarity)?
Matlab's scheme of loading one class/function per file seems to match Java's choice in this matter. I am betting that there were other technical reasons, such as speeding up the parser, when it was introduced in the 1980s. This scheme was chosen by Java to discourage extremely large files with everything stuffed inside, which has been the primary argument for any language I've seen using one-class-per-file semantics.
However, forcing one-class-per-file semantics doesn't stop mega files: KPIB is a perfect example of a complicated, horrifically long function/class file (though a quite useful mega file). So the one-class-per-file system is more a way of making the user aware of code abstraction than a functionally useful mechanism.
A positive result of Matlab's one-function/class-per-file system is that it's very easy to see what functions are available with a quick glance at a project directory. Additionally, many of the names have to be descriptive enough to differentiate them from other files, so naming serves as a minor form of documentation as a side effect.
In the end I don't think there are strong arguments for or against one-class-per-file, as it's usually just a minor semantic change to go from one to the other (unless your code is in a horribly unorganized state, in which case you should be shamed into fixing it).
EDIT!
I fixed the bad reference to Matlab adopting Java's one-class-per-file system: after more research, it appears that both developers adopted this style independently (or rather, didn't state that the other language influenced their decision). This is especially likely since Matlab didn't bundle Java until 2000.
I don't think there is any advantage. But you can put as many functions as you need in a single file.
For example:
classdef UTILS
    methods (Static)
        function help
            % prints help for all functions
            disp(char(methods(mfilename, '-full')));
        end
        function func_01()
        end
        function func_02()
        end
        % ...more functions
    end
end
I find it very neat.
>> UTILS.help
obj UTILS
Static func_01
Static func_02
Static help
>> UTILS.func_01()

How can I represent sets in Perl?

I would like to represent a set in Perl. What I usually do is use a hash with some dummy value, e.g.:
my %hash=();
$hash{"element1"}=1;
$hash{"element5"}=1;
Then use if (defined $hash{$element_name}) to decide whether an element is in the set.
Is this a common practice? Any suggestions on improving this?
Also, should I use defined or exists?
Thank you
Yes, building hash sets that way is a common idiom. Note that:
my @keys = qw/a b c d/;
my %hash;
@hash{@keys} = ();
is preferable to using 1 as the value because undef takes up significantly less space. This also forces you to use exists (which is the right choice anyway).
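A short sketch of that idiom, which also shows why exists is the right membership test here: every value is undef, so defined reports false even for keys that are in the set. The variable names are illustrative only.

```perl
use strict;
use warnings;

my @keys = qw(a b c d);
my %set;
@set{@keys} = ();    # hash slice: adds every key, each with an undef value

# Every value is undef, so membership has to be tested with exists.
my $has_b     = exists  $set{b} ? 1 : 0;
my $has_z     = exists  $set{z} ? 1 : 0;
my $defined_b = defined $set{b} ? 1 : 0;    # false even though b is a member

delete $set{a};                              # remove an element
my @members = sort keys %set;                # ('b', 'c', 'd')

print "b: $has_b, z: $has_z, defined(b): $defined_b, members: @members\n";
```

With 1 as the value, both tests would agree; with undef values, only exists gives the right answer.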
Use one of the many Set modules on CPAN. Judging from your example, Set::Light or Set::Scalar seem appropriate.
I can defend this advice with the usual arguments in favor of CPAN (disregarding possible synergy effects).
How can we know that look-up is all that is needed, both now and in the future? Experience teaches that even the simplest programs expand and sprawl. Using a module would anticipate that.
An API is much nicer for maintenance, and for people who need to read and understand the code in general, than an ad-hoc implementation, as it allows one to think about partial problems at different levels of abstraction.
Related to that, if it turns out that the overhead is undesirable, it is easy to go from a module to a simple implementation by removing indirections or paring down data structures and source code. But if one needs more features, going the other way around is moderately more difficult.
CPAN modules are already tested and, to some extent, thoroughly debugged; their APIs may also have been improved over time, whereas with an ad-hoc implementation, programmers usually implement the first design that comes to mind.
It rarely turns out that picking a module at the beginning is the wrong choice.
That's how I've always done it. I would tend to use exists rather than defined but they should both work in this context.