What was the original reason for MATLAB's one function = one file and why is it still so? - matlab

What was the original reason for MATLAB's one (primary) function = one file, and why is it still so, after so many years of development?
What are the advantages of this approach, compared to its disadvantages (people put too many things in functions and scripts, when they should obviously be separated ... resulting in loss of code clarity)?

Matlab's schema of loading one class/function per file seems to match Java's choice in this matter. I am betting that there were other technical reasons for speeding up the parser in when it was introduced the 1980's. This schema was chosen by Java to discourage extremely large files with everything stuffed inside, which has been the primary argument for any language I've seen using one-file class symantics.
However, forcing one class per file semantics doesn't stop mega files -- KPIB is a perfect example of a complicated, horrifically long function/class file (though a quite useful maga file). So the one class file system is a way of trying to make the user aware about code abstraction more than a functionally useful mechanism.
A positive result of the one function/class file system of Matlab is that it's very easy to know what functions are available at a quick glance of a project directory. Additionally many of the names had to be made descriptive enough to differentiate them from other files, so naming as a minor form of documentation is present as a side effect.
In the end I don't think there are strong arguments for or against one file classes as it's usually just a minor semantically change to go from onw to the other (unless your code is in a horribly unorganized state... in which case you should be shamed into fixing it).
EDIT!
I fixed the bad reference to Matlab adopting Java's one class file system -- after more research it appears that both developers adopted this style independently (or rather didn't specify that the other language influenced their decision). This is especially true since Matlab didn't bundle Java until 2000.

I don't think there any advantage. But you can put as many functions as you need in a single file.
For example:
classdef UTILS
methods (Static)
function help
% prints help for all functions
disp(char(methods(mfilename, '-full')));
end
function func_01()
end
function func_02()
end
% ...more functions
end
end
I find it very neat.
>> UTILS.help
obj UTILS
Static func_01
Static func_02
Static help
>> UTILS.func_01()

Related

Speed advantage of defining function bodies in classdef file?

In C++ (at least as of a decade ago), there was speed advantage in defining the body of a class method in the header file, where the class is defined. No function call overhead was suffered because, in the compilation process, the invocation of such functions was replaced by the code in the body of the function. Subsquently, all source level optimizations (and all optimizations beyond source level) could be brought to bear.
Is there an analogous advantage to putting the body of class methods in the classdef file itself rather than in a separate m-file? I'm speaking specifically about the case where one defines a #myclass/myclass.m, with method m-files in the directory #myclass. The two options I'm considering is to have the code for the body of a method mymethod put into the classdef in #myclass/myclass.m versus being in a separate file #myclass/mymethod.m.
However, an very related auxiliary question would be how those two options compare with having everything defined in a myclass.m file, with no folder #myclass.
Please note that I have previously posted this to usenet
Summarizing the comments as the answer to this question: Using an #classFolder folder containing separate method m-files is faster than having a single m file containing the entireties of the function definitions in the classdef. This is the case even though OOP in general has sped up in 2015b.
I find this a happy answer because I see great value in separating the code implementation of a class's methods from the class definition itself. That's the whole idea of separating interface from implementation. I can look at the classdef and see only a map of the class rather than have those key information elements completely dispersed by the deluge of code that accompanies implementation.
It's just too bad that this doesn't work so well for weakly typed languages. What's listed in the classdef is just member names (properties or methods) with no specification of what class they are. So not as much information as in a strongly typed language. In fact, very little info about what the class, its properties, and its methods really are. Furthermore, there is nothing to ensure that the actual method implementation even complies with the argument list in the classdef. These kind of details helped prevent development errors in a strongly typed language, especially when one's body of classes get large.

Restrict MATLAB to call functions from same folder as running file

I am writing a script (lets call it main.m) which calls a function that I wrote (lets call it myfunc.m). It seems I have a few of these myfunc.m functions at different locations on my MATLAB path.
I would like to somehow restrict matlab to only look within the same folder where my main.m class is, when looking for custom functions.
So for example if I have in
C:\example\main.m
C:\example\myfunc.m
and
C:\asd\main.m
C:\asd\myfunc.m
and I open main.m in folder example, when it comes to the call of myfunc.m, it can ONLY call a function within folder C:\example\. Same goes for if I run main.m in folder C:\asd\.
I hope this makes sense, thanks.
In the short term, a fairly quick solution would be for you to make your myfunc.m files into private functions that are ahead in terms of precedence compared to regular functions, and can only be called by functions in the same parent folder.
Simply place your myfunc.m files in a folder called private:
C:\example\main.m
C:\example\private\myfunc.m
and
C:\asd\main.m
C:\asd\private\myfunc.m
Now example\private\myfunc.m is only callable by things in the folder example, and \asd\private\myfunc.m is only callable by things in the folder asd. In addition, they are higher in precedence than other functions, so you can ensure that the right one always gets called.
Longer term, you might benefit from taking a look at some of the other more extensive ways that MATLAB provides for managing namespace conflicts, such as subfunctions, object-oriented programming, and packages.
Subfunctions are extremely simple to get the hang of. Packages are not at all complicated but require a bit of thought about how to organize your code (which is usually well worth it). OO programming is a much larger change in typical programming style, but for larger applications is pretty essential.

Design - When to create new functions?

This is a general design question not relating to any language. I'm a bit torn between going for minimum code or optimum organization.
I'll use my current project as an example. I have a bunch of tabs on a form that perform different functions. Lets say Tab 1 reads in a file with a specific layout, tab 2 exports a file to a specific location, etc. The problem I'm running into now is that I need these tabs to do something slightly different based on the contents of a variable. If it contains a 1 I may need to use Layout A and perform some extra concatenation, if it contains a 2 I may need to use Layout B and do no concatenation but add two integer fields, etc. There could be 10+ codes that I will be looking at.
Is it more preferable to create an individual path for each code early on, or attempt to create a single path that branches out only when absolutely required.
Creating an individual path for each code would allow my code to be extremely easy to follow at a glance, which in turn will help me out later on down the road when debugging or making changes. The downside to this is that I will increase the amount of code written by calling some of the same functions in multiple places (for example, steps 3, 5, and 9 for every single code may be exactly the same.
Creating a single path that would branch out only when required will be a bit messier and more difficult to follow at a glance, but I would create less code by placing conditionals only at steps that are unique.
I realize that this may be a case-by-case decision, but in general, if you were handed a previously built program to work on, which would you prefer?
Edit: I've drawn some simple images to help express it. Codes 1/2/3 are the variables and the lines under them represent the paths they would take. All of these steps need to be performed in a linear chronological fashion, so there would be a function to essentially just call other functions in the proper order.
Different Paths
Single Path
Creating a single path that would
branch out only when required will be
a bit messier and more difficult to
follow at a glance, but I would create
less code by placing conditionals only
at steps that are unique.
Im not buying this statement. There is a level of finesse when deciding when to write new functions. Functions should be as simple and reusable as possible (but no simpler). The correct answer is almost never 'one big file that does a lot of branching'.
Less LOC (lines of code) should not be the goal. Readability and maintainability should be the goal. When you create functions, the names should be self documenting. If you have a large block of code, it is good to do something like
function doSomethingComplicated() {
stepOne();
stepTwo();
// and so on
}
where the function names are self documenting. Not only will the code be more readable, you will make it easier to unit test each segment of the code in isolation.
For the case where you will have a lot of methods that call the same exact methods, you can use good OO design and design patterns to minimize the number of functions that do the same thing. This is in reference to your statement "The downside to this is that I will increase the amount of code written by calling some of the same functions in multiple places (for example, steps 3, 5, and 9 for every single code may be exactly the same."
The biggest danger in starting with one big block of code is that it will never actually get refactored into smaller units. Just start down the right path to begin with....
EDIT --
for your picture, I would create a base-class with all of the common methods that are used. The base class would be abstract, with an abstract method. Subclasses would implement the abstract method and use the common functions they need. Of course, replace 'abstract' with whatever your language of choice provides.
You should always err on the side of generalization, with the only exception being early prototyping (where throughput of generating working stuff is majorly impacted by designing correct abstractions/generalizations). having said that, you should NEVER leave that mess of non-generalized cloned branches past the early prototype stage, as it leads to messy hard to maintain code (if you are doing almost the same thing 3 different times, and need to change that thing, you're almost sure to forget to change 1 out of 3).
Again it's hard to specifically answer such an open ended question, but I believe you don't have to sacrifice one for the other.
OOP techniques solves this issue by allowing you to encapsulate the reusable portions of your code and generate child classes to handle object specific behaviors.
Personally I think you might (if possible by your API) create inherited forms, create them on fly on master form (with tabs), pass agruments and embed in tab container.
When to inherit form and when to decide to use arguments (code) to show/hide/add/remove functionality is up to you, yet master form should contain only decisions and argument passing and embeddable forms just plain functionality - this way you can separate organisation from implementation.

Is it naughty to have a large utility file?

In my C project I have quite a large utils.c file. It is really full of many utilities of different sorts. I feel a bit naughty just stuffing different miscellaneous functions in there. For example it has some utilities related to low level stuff such as a lowercase() function, and it also has some quite sophisticated utilities such as converting to/from different colour formats.
My question is, is it very naughty to have such a large utils.c with many different types of utilities in it? Should I break it up into many different kinds of utility files? Such as graphics_utils.c and so on What do you think?
Breaking them up into separate files based on categories (ie graphics, strings, etc.) will lead to better organization, making it easier to locate certain pieces of code, having smaller files to go through, instead of just one large file.
You want to break it up, not just for organizational reasons, but because you will have many other files that depend on this one. Because everything will depend on this file, it makes this one file difficult to change because it might cause widespread breakage.
http://ifacethoughts.net/2006/04/15/stable-dependencies-principle/
If it's just you that will EVER maintain the stuff, it's a matter of when the complexity gets to the point where you find yourself searching for things. That would be the time to refactor and reorganize (there's a cost to reorganize, just as there's a cost to not reorganize).
If it's POSSIBLE that anyone else will maintain a project that includes your utils, you have to consider THEIR pain point when deciding when to reorganize. Theirs is MUCH lower than yours.
I tend to break them up into various sub-utils as you say (graphics_utils) when it becomes appropriate.
Break it up. Stuff will be easier to find, easier to reuse, easier to refactor, easier to unit test. I recently needed to get a set of ISO-8601 date handling methods out of a ginormous Java utility class of static methods, and it was really hard to find the 5% of the code I needed.
It is definitely not kosher, because the next guy coming through your code won't know where to look for anything. Break it up by function, and your coworkers will thank you!
Another advantage that comes from breaking up the file into separates is that when you place it under source control, you can have finer grained control. This really is useful if you have bits that are tweaked/extended/specialised frequently, and other bits that are relatively stable.
Another point: You should organize your code, i. e. break it up in smaller modules and categorize it, because at some point in time you will end up writing a second and third function for the same thing, simply for the reason that you wont find that function that you knew it was there, but you don't remember it's name.
I've got a (rather large) project with such a module and there is programming logic for which there are up to 5-6 implementations (for the same thing).
Like everyone else I would break them up. But I tend to use Extension Methods now, so I would have one class (and one file) per class being extended (e.g. StringExtensions, SqlDataReaderExtensions, etc). I find this tends to break up the utility methods nicely.

Does it matter if there are unused functions I put into a big CoolFunctions.h / CoolFunctions.m file that's included everywhere in my project?

I want to create a big file for all cool functions I find somehow reusable and useful, and put them all into that single file. Well, for the beginning I don't have many, so it's not worth thinking much about making several files, I guess. I would use pragma marks to separate them visually.
But the question: Would those unused methods bother in any way? Would my application explode or have less performance? Or is the compiler / linker clever enough to know that function A and B are not needed, and thus does not copy their "code" into my resulting app?
This sounds like an absolute architectural and maintenance nightmare. As a matter of practice, you should never make a huge blob file with a random set of methods you find useful. Add the methods to the appropriate classes or categories. See here for information on the blob anti-pattern, which is what you are doing here.
To directly answer your question: no, methods that are never called will not affect the performance of your app.
No, they won't directly affect your app. Keep in mind though, all that unused code is going to make your functions file harder to read and maintain. Plus, writing functions you're not actually using at the moment makes it easy to introduce bugs that aren't going to become apparent until much later on when you start using those functions, which can be very confusing because you've forgotten how they're written and will probably assume they're correct because you haven't touched them in so long.
Also, in an object oriented language like Objective-C global functions should really only be used for exceptional, very reusable cases. In most instances, you should be writing methods in classes instead. I might have one or two global functions in my apps, usually related to debugging, but typically nothing else.
So no, it's not going to hurt anything, but I'd still avoid it and focus on writing the code you need now, at this very moment.
The code would still be compiled and linked into the project, it just wouldn't be used by your code, meaning your resultant executable will be larger.
I'd probably split the functions into seperate files, depending on the common areas they are to address, so I'd have a library of image functions separate from a library of string manipulation functions, then include whichever are pertinent to the project in hand.
I don't think having unused functions in the .h file will hurt you in any way. If you compile all the corresponding .m files containing the unused functions in your build target, then you will end up making a bigger executable than is required. Same goes for if you include the code via static libraries.
If you do use a function but you didn't include the right .m file or library, then you'll get a link error.