i and j are very popular variable names (see e.g., this question and this one).
For example, in loops:
for i=1:10,
% Do something...
end
As indices into a matrix:
mat(i, j) = 4;
Why shouldn't they be used as variable names in MATLAB?
Because i and j are both functions denoting the imaginary unit:
http://www.mathworks.co.uk/help/matlab/ref/i.html
http://www.mathworks.co.uk/help/matlab/ref/j.html
So a variable called i or j will override them, potentially silently breaking code that does complex maths.
Possible solutions include using ii and jj as loop variables instead, or using 1i whenever i is required to represent the imaginary unit.
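A minimal sketch of the failure mode described above, assuming a loop has left i defined in the workspace. The names z_shadowed and z_literal are made up for illustration:

```matlab
% A loop variable named i stays defined after the loop ends.
for i = 1:3
    % Do something...
end

% i is now the number 3, not the imaginary unit.
z_shadowed = 3 + 4*i;   % evaluates to 15, a real number

% The literal form (1i, 4i, ...) cannot be shadowed by a variable.
z_literal = 3 + 4i;     % complex: real part 3, imaginary part 4
```

No error or warning is raised for z_shadowed, which is why the breakage can go unnoticed.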
It is good practice to avoid i and j variables to prevent confusion about them being variables or the imaginary unit.
Personally, however, I use i and j as variables quite often as the index of short loops. To avoid problems in my own code, I follow another good practice regarding i and j: don't use them to denote imaginary numbers. In fact, MATLAB's own documentation states:
For speed and improved robustness, you can replace complex i and j by 1i.
So rather than avoiding two very commonly used variable names because of a potential conflict, I'm explicit about imaginary numbers. It also makes my code more clear. Anytime I see 1i, I know that it represents sqrt(-1) because it could not possibly be a variable.
In old versions of MATLAB, there used to be a good reason to avoid the use of i and j as variable names - early versions of the MATLAB JIT were not clever enough to tell whether you were using them as variables or as imaginary units, and would therefore turn off many otherwise possible optimizations.
Your code would therefore get slower just by the very presence of i and j as variables, and would speed up if you changed them to something else. That's why, if you read through much MathWorks code, you'll see ii and jj used fairly widely as loop indices. For a while, MathWorks might even have unofficially advised people to do that themselves (although they always officially advise people to program for elegance/maintainability rather than to whatever the current JIT does, as it's a moving target each version).
But that's rather a long time ago, and nowadays it's a bit of a "zombie" issue that is really much less important than many people still think, but refuses to die.
In any recent version, it's really a personal preference whether to use i and j as variable names or not. If you do a lot of work with complex numbers, you may want to avoid i and j as variables, to avoid any small potential risk of confusion (although you may also/instead want to only use 1i or 1j for even less confusion, and a little better performance).
On the other hand, in my typical work I never deal with complex numbers, and I find my code more readable if I feel free to use i and j as loop indices.
I see a lot of answers here that say It is not recommended... without saying who's doing that recommending. Here's the extent of MathWorks' actual recommendations, from the current release documentation for i:
Since i is a function, it can be overridden and used as a variable. However, it is best to avoid using i and j for variable names if you intend to use them in complex arithmetic. [...] For speed and improved robustness, you can replace complex i and j by 1i.
As described in other answers, the use of i in general code is not recommended for two reasons:
If you want to use the imaginary number, it can be confused with or overwritten by an index
If you use it as an index it can overwrite or be confused with the imaginary number
As suggested: 1i and ii are recommended. However, though these are both fine deviations from i, it is not very nice to use both of these alternatives together.
Here is an example why (personally) I don't like it:
val2 = val + i % 1
val2 = val + ii % 2
val2 = val + 1i % 3
One will not easily be misread for two or three, but two and three resemble each other.
Therefore my personal recommendation would be: In case you sometimes work with complex code, always use 1i combined with a different loop variable.
Examples of single-letter indices, for when you don't use many loop variables and letters suffice: t, u, k and p
Examples of longer indices: i_loop, step, walk, and t_now
Of course this is a matter of personal taste as well, but it should not be hard to find indices to use that have a clear meaning without growing too long.
It was pointed out that 1i is an acceptable and unambiguous way to write sqrt(-1), and that as such there is no need to avoid using i. Then again, as Dennis pointed out, it can be hard to see the difference between 1i and ii. My suggestion: use 1j as the imaginary constant where possible. It's the same trick that electrical engineers employ - they use j for sqrt(-1) because i is already taken for current.
Personally I never use i and j; I use ii and jj as shorthand indexing variables, (and kk, ll, mm, ...) and 1j when I need to use complex numbers.
Confusion with the imaginary unit has been well covered here, but there are some other more prosaic reasons why these and other single-letter variable names are sometimes discouraged.
MATLAB specifically: if you're using coder to generate C++ source from your MATLAB code (don't, it's horrible) then you are explicitly warned not to reuse variables because of potential typing clashes.
Generally, and depending on your IDE, a single-letter variable name can cause havoc with highlighters and search/replace. MATLAB doesn't suffer from this and I believe Visual Studio hasn't had a problem for some time, but the C/C++ coding standards like MISRA, etc. tend to advise against them.
For my part I avoid all single-letter variables, despite the obvious advantages for directly implementing mathematical sources. It takes a little extra effort the first few hundred times you do it, but after that you stop noticing, and the advantages when you or some other poor soul come to read your code are legion.
By default i and j stand for the imaginary unit. So from MATLAB's point of view, using i as a variable is somehow like using 1 as a variable.
Any non-trivial code contains multiple for loops, and best practices recommend using a descriptive name that indicates the variable's purpose and scope. Since time immemorial (unless it's a 5-10 line script that I am not going to save), I have always used variable names like idxTask, idxAnotherTask and idxSubTask.
Or at the very least I double the first letter of the array being indexed, e.g. ss to index subjectList, tt to index taskList, but not ii or jj, which don't help me effortlessly identify which array they are indexing among my multiple for loops.
Unless you are a very confused user I think there is very little risk in using variable names i and j and I use them regularly. I haven't seen any official indication that this practice should be avoided.
While it's true that shadowing the imaginary unit could cause some confusion in some context as mentioned in other posts, overall I simply don't see it as a major issue. There are far more confusing things you can do in MATLAB, take for instance defining false=true
In my opinion the only time you should probably avoid them is if your code specifically deals with imaginary numbers.
Recently, in R2016b a feature was added to MATLAB, which is causing a lot of headaches in the school where I teach.
Nowadays formulae that would traditionally be considered illegal, or at least shady maths, are executed successfully:
[1, 2] + [3, 4]' -> [4, 5; 5, 6]
[1, 2]' + [3, 4, 5] -> [4, 5, 6; 5, 6, 7]
So adding a row vector to a column vector is treated as an addition of two matrices one can get from repeating the vectors up to the "suitable" dimensions. In older versions this would have produced an error message informing that the addition of matrices with different dimensions is not possible.
I think asking why is a bit broad, although if you do know why, I'd love to know. Instead I will ask, is there a way to disable this functionality? To novice programmers this is a world of hurt when the conventional mathematics doesn't seem to be in line and the resulting matrix often goes unnoticed causing errors only later on.
I can not see this being a useful part of MATLAB syntax and behavior, as it requires too much interpretation, reading into the intention of the programmer. repmat is there for a reason, and a dedicated function could be introduced to accommodate for the need of this thing.
As mentioned by @PhelypeOleinik, this is (since R2016b) a core part of the language, and for good reasons, as detailed in the blog post referred to.
However, if you REALLY want to disable it...
Make a folder somewhere on your path, called @double.
In this folder, make a file plus.m.
In the file, put something like the following:
function out = plus(in1, in2)
% do some things here
out = builtin('plus', in1, in2);  % fall through to the built-in addition
Where I have a comment above, you can put whatever code you like: which could include code that checks the inputs for the "size-compatibility" rules you want, and errors if it doesn't meet them.
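As a rough sketch of what such a check might look like, here is one possible "size-compatibility" rule: the operands must have the same size, or one of them must be a scalar (the pre-R2016b behaviour). The helper name sameSizeOrScalar is hypothetical, and it is written as an anonymous function so it can be tried on its own without shadowing anything:

```matlab
% Hypothetical rule: same size, or at least one scalar operand.
sameSizeOrScalar = @(a, b) isscalar(a) || isscalar(b) || ...
                           isequal(size(a), size(b));

okSame   = sameSizeOrScalar([1, 2], [3, 4]);    % true: both are 1-by-2
okScalar = sameSizeOrScalar([1, 2], 5);         % true: scalar operand
okMixed  = sameSizeOrScalar([1, 2], [3, 4]');   % false: row plus column

% Inside the overriding plus.m, the same test could guard the builtin call:
%   if ~(isscalar(in1) || isscalar(in2) || isequal(size(in1), size(in2)))
%       error('plus:sizeMismatch', 'Operands must have the same size.');
%   end
```

Keeping the rule in one small predicate makes it easy to reuse for minus, times and the other operators.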
Do something similar for the functions minus, times, ldivide, rdivide, power, and other functions you want to modify.
PS please don't actually do this, the developers worked very hard to implement implicit expansion, and they'll cry if they see you disabling it like this...
This feature was introduced in Matlab R2016b. In older versions, this expansion had to be done either with repmat or with bsxfun. Newer versions feature this implicit expansion of dimensions to vectorize the calculation.
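To make the comparison concrete, here is a small sketch of the same row-plus-column addition written the three ways mentioned above. The variable names are illustrative only, and the last line assumes R2016b or newer:

```matlab
r = [1, 2];    % 1-by-2 row vector
c = [3; 4];    % 2-by-1 column vector

% Explicit replication: both operands become 2-by-2 before adding.
viaRepmat = repmat(r, numel(c), 1) + repmat(c, 1, numel(r));

% bsxfun expands singleton dimensions without building the copies.
viaBsxfun = bsxfun(@plus, r, c);

% Implicit expansion (R2016b and later): the operator does it directly.
viaImplicit = r + c;
```

All three produce the matrix [4, 5; 5, 6].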
In this blog post, Steve Eddins from MathWorks says that:
Other people thought that the new operator behavior was not sufficiently based on linear algebra notation. However, instead of thinking of MATLAB as a purely linear algebra notation, it is more accurate to think of MATLAB as being a matrix and array computation notation.
and it really does make sense in a computational context. I can say that for my uses, this implicit expansion does make things easier very often.
Of course, seeing this from the point of view of algebra, it doesn't make sense. But if you think about it, most computer language notation wouldn't make sense.
And since this is now part of the language, it shouldn't be possible to disable the feature (until Yair Altman tries to do so :P).
Matlab offers multiple algorithms for solving Linear Programs.
For example Matlab R2012b offers: 'active-set', 'trust-region-reflective', 'interior-point', 'interior-point-convex', 'levenberg-marquardt', 'trust-region-dogleg', 'lm-line-search', or 'sqp'.
But other versions of Matlab support different algorithms.
I would like to run a loop over all algorithms that are supported by the user's MATLAB version, ordered in the same way as MATLAB's own recommendation order.
I would like to implement something like this:
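i=1;
x=[];
while (isempty(x))
% Here_I_need_a_list_of_Algorithms is the list this question asks for
options=optimset(options,'Algorithm',Here_I_need_a_list_of_Algorithms(i));
x = linprog(f,A,b,Aeq,beq,lb,ub,x0,options);
i=i+1; % move on to the next algorithm
end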
In 99% of cases this code should be equivalent to
x = linprog(f,A,b,Aeq,beq,lb,ub,x0,options);
but sometimes the algorithm gives back an empty array because of numerical problems (exitflag -4). If there is a chance that one of the other algorithms can find a solution I would like to try them too.
So my question is:
Is there a way to automatically get a list of all linprog algorithms that are supported by the installed MATLAB version, ordered the way MATLAB recommends them?
I think looping through all algorithms can make sense in other scenarios too. For example, when you need very precise data and have a lot of time, you could run them all and then evaluate which gives the best results.
Or one might want to loop through all algorithms to find which algorithm is best for LPs with a certain structure.
There's no automatic way to do this as far as I know. If you really want to do it, the easiest thing to do would be to go to the online documentation, and check through previous versions (online documentation is available for old versions, not just the most recent release), and construct some variables like this:
r2012balgos = {'active-set', 'trust-region-reflective', 'interior-point', 'interior-point-convex', 'levenberg-marquardt', 'trust-region-dogleg', 'lm-line-search', 'sqp'};
...
r2017aalgos = {...};
v = ver('matlab');
switch v.Release
case '(R2012b)'
algos = r2012balgos;
....
case '(R2017a)'
algos = r2017aalgos;
end
% loop through each of the algorithms
Seems boring, but it should only take you about 30 minutes.
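A variation on the same idea, sketched with a lookup table instead of a switch. It assumes version('-release') returns the release string without parentheses (for example '2012b'); the table name algosByRelease is made up, and only the R2012b list from the question is filled in:

```matlab
% Release string of the running MATLAB, e.g. '2012b'.
rel = version('-release');

% Hand-maintained table: release -> algorithm names in preferred order.
algosByRelease = containers.Map('KeyType', 'char', 'ValueType', 'any');
algosByRelease('2012b') = {'active-set', 'trust-region-reflective', ...
    'interior-point', 'interior-point-convex', 'levenberg-marquardt', ...
    'trust-region-dogleg', 'lm-line-search', 'sqp'};
% Add further releases here after checking their documentation.

if isKey(algosByRelease, rel)
    algos = algosByRelease(rel);
else
    % Unlisted release: use an empty list so the caller can simply
    % keep the solver's default algorithm.
    algos = {};
end
```

The explicit fallback avoids leaving algos undefined on a release that is missing from the table.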
There's a reason MathWorks aren't making this as easy as you might hope, though, because what you're asking for isn't a great idea.
It is possible to construct artificial problems where one algorithm finds a solution and the others don't. But in practice, typically if the recommended algorithm doesn't find a solution this doesn't indicate that you should switch algorithms, it indicates that your problem wasn't well-formulated, and you should consider modifying it, perhaps by modifying some constraints, or reformulating the objective function.
And after all, why stop with just looping through the alternative algorithms? Why not also loop through lots of values for other options such as constraint tolerances, optimality tolerances, maximum number of function evaluations, etc.? These may have just as much likelihood of affecting things as a choice of algorithm. And soon you're running an optimisation algorithm to search through the space of meta-parameters for your original optimisation.
That's not a great plan - probably better to just choose one of the recommended algorithms, stick to that, and if things don't work out then focus on improving your formulation of the problems rather than over-tweaking the optimisation itself.
I have a, b and A.
a = some expression 1
b = some expression 2
A = a + b
vs
A = some expression 1 + some expression 2
In my code, there are not just a and b but a lot of those. By using the latter method, i.e. summing all the expressions directly in A without creating variables first, my program runs about 1 s faster out of a total of about 11 s, so it drops from 11 s to about 10 s. This is confirmed after a lot of tests. Is this just due to not creating variables first? Does not creating variables first lead to faster computation?
I need to run a lot of for loops and an ODE solver for long computations. Variables are calculated and created inside the loop. If I can get about a 10% decrease, that is good.
In general (not just MATLAB).
In your first scenario, these additional steps are required, which do not apply to the second scenario.
When a variable is created, memory needs to be allocated where the value for the variable can be stored.
When a value is assigned to that variable, that value needs to be written to the variable's space in memory.
When the calculation is requested, the value for each variable needs to be retrieved from memory.
Many compilers optimize away these additional overheads by using various techniques, but many interpreted languages do not. (This is not a hard and fast rule though, there are smart interpreted languages and stupid compiled ones).
I do not know exactly how the internals of MATLAB work, but I do think it is interpreted, which means that the additional steps will likely incur additional overhead.
The problem with your second scenario, though, is that it is less readable and maintainable in the long run. It is easier to read computations and intermediate steps when variable names are used. The trick is to balance performance and readability.
I'm not sure how much of a difference it would make in terms of performance, but I don't think it would be a sizeable difference. Maybe a few hundredths of a second.
You can test it for yourself by using the tic and toc functions.
tic
a = some expression 1
b = some expression 2
A = a + b
toc
VS
tic
A = some expression 1 + some expression 2
toc
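A runnable version of that comparison might look like the sketch below. The sin and cos calls are only placeholders standing in for "some expression 1" and "some expression 2", and a single tic/toc pair is noisy, so it is worth repeating the measurement several times:

```matlab
x = rand(1e6, 1);   % placeholder input data

% Variant 1: named intermediate variables.
tic
a = sin(x);
b = cos(x);
A1 = a + b;
tNamed = toc;

% Variant 2: everything in one expression.
tic
A2 = sin(x) + cos(x);
tInline = toc;

fprintf('named: %.4f s, inline: %.4f s\n', tNamed, tInline);
```

Both variants compute the same values, so any difference is purely in the time taken.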
As mentioned in the other answer, readability is the main difference. You want to keep your code as simple as possible so that if there is a problem you know exactly where it is and hopefully why there was a problem!
I am looking at refactoring some very complex code which is a subsystem of a project I have at work. Part of my examination of this code is that it is incredibly complex, and contains a lot of inputs, intermediate values and outputs depending on some core business logic.
I want to redesign this code to be easier to maintain as well as executing a hell of a lot faster, so to start off with I have been trying to look at each of the parameters and their dependencies on each other. This has led to quite a large and tangled graph and I would like a mechanism for simplifying this graph.
A while back I came across a technique in a book about SOA design called "Matrix Design Decomposition" which uses a matrix of outputs and the dependencies they have on the inputs, applies some form of matrix algebra and can generate Business Process diagrams for those dependencies.
I know there is a web tool available at http://www.designdecomposition.com/ however it is limited in the number of input/output dependencies you can have. I have tried looking around for the algorithmic source for this tool (so I could attempt to implement it myself without the size limitation), however I have had no luck.
Does anybody know a similar technique that I could use? Currently I am even considering taking the dependency matrix and applying some Genetic Algorithms to see if evolution can come up with a simpler workflow...
Cheers,
Aidos
EDIT:
I will explain the motivation:
The original code was written for a system which computed all of the values (about 60) every time the user performed an operation (adding, removing or modifying certain properties of an item). This code was written over ten years ago and is definitely showing signs of age: others have added more complex calculations into the system and now we are getting completely unreasonable performance (up to 2 minutes before control is returned to the user). It has been decided to detach the calculations from the user actions and provide a button to "recalculate" the values.
My problem arises because there are so many calculations that are going on and they are based on the assumption that all of the required data will be available for their computation - now when I try to re-implement the calculations I keep encountering problems because I haven't got the result for a different calculation that this calculation relies on.
This is where I want to use the matrix-decomposition approach. The MD approach allows me to specify all of the inputs and outputs and gives me the "simplest" workflow that I can use for generating all of the outputs.
I can then use this "workflow" to know the precedence of the calculations I need to perform to get the same result without generating any exceptions. It also shows me which parts of the calculation system I can parallelise and where the fork and join points will be (I won't worry about that part just yet). At the moment all I have is an insanely large matrix with lots of dependencies showing in it, with no idea where to start.
I will elaborate from my comment a little more:
I don't want to use the solution from the EA process in the actual program. I want to take the dependency matrix and decompose it into modules that I will then code manually - this is purely a design aid - I am just interested in what the inputs/outputs for these modules will be. Basically a representation of the complex interdependencies between these calculations, as well as some idea of precedence.
Say I have A requires B and C. D requires A and E. F requires B, A and E. I want to effectively partition the problem space from a complex set of dependencies into a "workflow" that I can examine to get a better understanding. Once I have this understanding I can come up with a better design / implementation that is still human readable, so for the example I know I need B, C and E first, then A, then D and F.
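For what it's worth, the precedence part of this is a topological ordering of the dependency graph. A minimal sketch for the A to F example, assuming MATLAB's digraph and toposort (available since R2015b) and treating each "X requires Y" as an edge from Y to X:

```matlab
% One edge per dependency, pointing from prerequisite to dependent.
src = {'B', 'C', 'A', 'E', 'B', 'A', 'E'};
dst = {'A', 'A', 'D', 'D', 'F', 'F', 'F'};
G = digraph(src, dst);

% A cycle would mean no valid calculation order exists.
hasValidOrder = isdag(G);

% Node names in an order where every prerequisite comes first.
order = G.Nodes.Name(toposort(G));

% Helper: position of a calculation in that order.
pos = @(name) find(strcmp(order, name));
```

The order is not unique (D and F can be swapped, for instance), but B, C and E always come before the calculations that need them, and A always comes before D and F.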
--
I know this seems kind of strange. If you take a look at the website I linked to before, the matrix-based decomposition there should give you some understanding of what I am thinking of...
kquinn, If it's the piece of code I think he's referring to (I used to work there), it's already a black box solution that no human can understand as is. He's not looking to make it more complicated, less in fact. What he's trying to achieve is a whole heap of interlinked calculations.
What currently happens, is that whenever anything changes, it's an avalanche of events which cause a whole bunch of calculations to fire off, which in turn causes a whole bunch more events which continues on until finally it reaches a state of equilibrium.
What I assume he wants to do is find the dependencies for those outlying calculations and work in from there so they can be rewritten, and find a way to stop calculations from happening for the sake of it rather than because they need to.
I can't offer much advice in regards to simplifying the graph, as unfortunately it's not something I have much experience in. That said, I would start looking for those outlying calculations which have no dependencies, and just traverse the graph from there. Start building up a new framework that includes the core business logic of each calculation in the simplest possible way, and refactor the crap out of it along the way.
If this is, as you say, "core business logic", then you really don't want to be screwing around with fancy decompositions and evolutionary algorithms that produce a "black box" solution that no one in the world understands or is capable of modifying. I would be very surprised if any of these techniques actually yielded any useful result; the human brain is still incomprehensibly more capable than any machine at untangling complicated relationships.
What you want to do is traditional refactoring: clean up the individual procedures, streamlining them and merging them where possible. Your goal is to make the code clear, so your successor doesn't have to go through the same process.
What language are you using?
Your problem should be pretty easy to model using Java Executors and Future<> tasks, but a similar framework is perhaps available on your chosen platform as well?
Also, if I understand this correctly, you want to generate a critical path for a large set of interdependent calculations -- is that something done dynamically, or do you "just" need a static analysis?
Regarding an algorithmic solution: pick up the closest copy of your numerical analysis textbook and refresh your memory on singular value decompositions and LU factorization; I'm guessing off the top of my head that this is what lies behind the tool you linked to.
EDIT: Since you're using Java, I'll give a brief outline of a proposal:
-> Use a thread pool executor to parallelize all calculations easily
-> Solve interdependencies with an object map of Future<> or FutureTask<> instances, i.e. if your variables are A, B and C, where A = B + C, do something like this:
static final Map<String, FutureTask<Integer>> mapping = ...
static final ThreadPoolExecutor threadpool = ...
FutureTask<Integer> a = new FutureTask<Integer>(new Callable<Integer>() {
    public Integer call() throws Exception {
        // get() blocks until the task this value depends on has finished
        Integer b = mapping.get("B").get();
        Integer c = mapping.get("C").get();
        return b + c;
    }
});
FutureTask<Integer> b = new FutureTask<Integer>(...);
FutureTask<Integer> c = new FutureTask<Integer>(...);
// register each task under its own name
mapping.put("A", a);
mapping.put("B", b);
mapping.put("C", c);
for ( FutureTask<Integer> task : mapping.values() )
    threadpool.execute(task);
Now, if I'm not totally off (and I may very well be, it was a while since I worked in Java), you should be able to solve the apparent deadlock problem by tuning the thread pool size, or use a growing thread pool. (You still have to make sure that there are no interdependent tasks though, such as if A = B + C, and B = A + 1...)
If the black-box is linear you can discover all the coefficients by simply concatenating many vectors of input and many vectors of output.
You have input x[i] and output y[i]. Create a matrix Y whose columns are y[0], y[1], ... y[n], and a matrix X whose columns are x[0], x[1], ..., x[n]. There will be a transformation Y = T * X, so you can determine T = Y * inverse(X), provided X is square and invertible (i.e. you have as many linearly independent inputs as the input dimension).
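A small sketch of that recovery in MATLAB, using a made-up 2-by-2 matrix Ttrue as a stand-in for the unknown black box. The right-division operator solves T * X = Y without forming the inverse explicitly:

```matlab
Ttrue = [2, 1; 0, 3];   % stand-in for the unknown linear black box
X = [1, 2; 3, 4];       % columns are the probe inputs x[0], x[1]
Y = Ttrue * X;          % columns are the observed outputs y[0], y[1]

% Solve T * X = Y for T (equivalent to Y * inv(X) for invertible X).
T = Y / X;

recoveryError = norm(T - Ttrue);
```

With more probe inputs than dimensions, the same Y / X expression gives a least-squares fit instead of an exact solve.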
But since you said it is complex, I bet it is not linear. If you still want a general framework, you can use a factor graph:
https://ieeexplore.ieee.org/document/910572
I would be curious if you can do this.
What I think is easier is to understand the code and rewrite it using the best practices.