When indexing dictionary in depth I've found different results in the same (as I think) constructions:
q)d:`a`b!(1 2 3;4 5 6)
q)d[`a`b;0]
1 4
q)d[`a`b]0
1 2 3
Why is this happening? How q understands and distinguishes two different cases? Before this I was confident that, for example, calling dyadic function f[a;b] and f[a]b are the same. And now I am not sure even about this.
To index at depth you either need semi colons separating your arguments, or use the dot. Your second example,
d[`a`b] 0
Is taking the 2 lists from the dictionary values and then indexing to return the first.
While
d[`a`b;0]
or
d .(`a`b;0)
Is taking the 2 lists, and then indexing at depth, taking the first element of each, due to the semi colon/dot
When you call a dyadic function it is expecting two parameters, passing one inside the square brackets creates a projection, which is basically using an implicit semi colon, so
f[a]b
is the same as
f[a;]b
which is the same as
f[a;b]
The result of
f[a]
is a projection which is expecting another argument, so
f[a] b
evaluates f[a], then passes argument b to this function, with usual function application via juxtaposition
Your dictionary indexing example does not create a projection, and hence the indexing is not expecting any more arguments, so the first indexing
d[`a`b]
is evaluated immediately to give a result, and then the second index is applied to this result.
It would work the same for a monadic function
q){5+til x}[5] 2
7
Like the top level dictionary index, the application is carried out and then the result is indexed, as only one argument was expected, with no projection involved
EDIT - Adam beat me to it!
I don't think you can consider a function invocation f[a;b] or f[a]b as equivalent to indexing. f[a]b for a function is a projection but you can't project indexing in the same way. A function has a fixed valence, aka fixed number of inputs, but indexing can be done at any depth.
If you take your dictionary and fabricate it to have more depth, you can see that you can keeping indexing deeper and deeper:
q)d:{`a`b!2#enlist value x}/[1;d]
q)d[`a`b;1;1]
5 5
q)d:{`a`b!2#enlist value x}/[2;d]
q)d[`a`b;1;1;1;1]
5 5
q)d:{`a`b!2#enlist value x}/[2;d]
q)d[`a`b;1;1;1;1;1;1]
5 5
Yet you can still index just at the top level d[`a`b]. So the interpreter has to decide if its indexing at the top level, aka d # `a`b or indexing at depth d . (`a`b;0).
To avoid confusion it indexes at top level if you supply one level of indexing and indexes at depth if you supply more than one level of indexing. Thus no projections (at least not in the same manner).
And as mentioned above, functions don't have this ambiguity because they have a fixed number of parameters and so they can be projected.
What's happening here is that d[`a`b] has the depth/valence as d. So when you apply d[`a`b]0 the zero is not indexing at depth. You get expected results if you don't index multiple values of your dictionary:
q)d[`a`b;0]~d[`a`b][0]
0b
q)d[`a;0]~d[`a][0]
1b
q)d[`b;0]~d[`b][0]
1b
This is more clear if you instead consider a 2x3 matrix which has identical behavior to your original example
q)M:(1 2 3;4 5 6)
q)M[0 1;0]
1 4
q)M[0 1][0]
1 2 3
Indexing any one row results in a simple vector
q)type M[0]
7h
q)type M[1]
7h
But indexing more than one row results in a matrix:
q)type M[0 1]
0h
In fact, indexing both rows results in the same exact matrix
q)M~M[0 1]
1b
So we should expect
q)M[0]~M[0 1][0]
1b
as we see above.
None of this should have an impact on calling dyadic functions, since supplying one parameter explicitly results in a function projection and therefore the valence is always reduced.
q)type {2+x*y}
100h
q)type {2+x*y}[10]
104h
Related
I've come across some unexpected behaviour in matlab that I can't make sense of when performing vectorised assignment:
>> q=4;
>> q(q==[1,3,4,5,7,8])
The logical indices contain a true value outside of the array bounds.
>> q(q==[1,3,4,5,7,8])=1
q =
4 0 1
Why does the command q(q==[1,3,4,5,7,8]) result in an error, but the command q(q==[1,3,4,5,7,8])=1 work? And how does it arrive at 4 0 1 being the output?
The difference between q(i) and q(i)=a is that the former must produce the value of an array element; if i is out of bounds, MATLAB chooses to give an error rather than invent a value (good choice IMO). And the latter must write a value to an array element; if i is out of bounds, MATLAB chooses to extend the array so that it is large enough to be able to write to that location (this has also proven to be a good choice, it is useful and used extensively in code). Numeric arrays are extended by adding zeros.
In your specific case, q==[1,3,4,5,7,8] is the logical array [0,0,1,0,0,0]. This means that you are trying to index i=3. Since q has a single value, reading at index 3 is out of bounds, but we can write there. q is padded to size 3 by adding zeros, and then the value 1 is written to the third element.
The following error occurs quite frequently:
Subscript indices must either be real positive integers or logicals
I have found many questions about this but not one with a really generic answer. Hence I would like to have the general solution for dealing with this problem.
Subscript indices must either be real positive integers or logicals
In nearly all cases this error is caused by one of two reasons. Fortunately there is an easy check for this.
First of all make sure you are at the line where the error occurs, this can usually be achieved by using dbstop if error before you run your function or script. Now we can check for the first problem:
1. Somewhere an invalid index is used to access a variable
Find every variable, and see how they are being indexed. A variable being indexed is typically in one of these forms:
variableName(index,index)
variableName{index,index}
variableName{indices}(indices)
Now simply look at the stuff between the brackets, and select every index. Then hit f9 to evaluate the result and check whether it is a real positive integer or logical. Visual inspection is usually sufficient (remember that acceptable values are in true,false or 1,2,3,... BUT NOT 0) , but for a large matrix you can use things like isequal(index, round(index)), or more exactly isequal(x, max(1,round(abs(x)))) to check for real positive integers. To check the class you can use class(index) which should return 'logical' if the values are all 'true' or 'false'.
Make sure to check evaluate every index, even those that look unusual as per the example below. If all indices check out, you are probably facing the second problem:
2. A function name has been overshadowed by a user defined variable
MATLAB functions often have very intuitive names. This is convenient, but sometimes results in accidentally overloading (builtin) functions, i.e. creating a variable with the same name as a function for example you could go max = 9 and for the rest of you script/function Matlab will consider max to be a variable instead of the function max so you will get this error if you try something like max([1 8 0 3 7]) because instead of return the maximum value of that vector, Matlab now assumes you are trying to index the variable max and 0 is an invalid index.
In order to check which variables you have you can look at the workspace. However if you are looking for a systematic approach here is one:
For every letter or word that is followed by brackets () and has not been confirmed to have proper indices in step 1. Check whether it is actually a variable. This can easily be done by using which.
Examples
Simple occurrence of invalid index
a = 1;
b = 2;
c = 3;
a(b/c)
Here we will evaluate b/c and find that it is not a nicely rounded number.
Complicated occurrence of invalid index
a = 1;
b = 2;
c = 3;
d = 1:10;
a(b+mean(d(cell2mat({b}):c)))
I recommend working inside out. So first evaluate the most inner variable being indexed: d. It turns out that cell2mat({b}):c, nicely evaluates to integers. Then evaluate b+mean(d(cell2mat({b}):c)) and find that we don't have an integer or logical as index to a.
Here we will evaluate b/c and find that it is not a nicely rounded number.
Overloaded a function
which mean
% some directory\filename.m
You should see something like this to actually confirm that something is a function.
a = 1:4;
b=0:0.1:1;
mean(a) = 2.5;
mean(b);
Here we see that mean has accidentally been assigned to. Now we get:
which mean
% mean is a variable.
In Matlab (and most other programming languages) the multiplication sign must always be written. While in your math class you probably learned that you can write write a(a+a) instead of a*(a+a), this is not the same in matlab. The first is an indexing or function call, while the second is a multiplication.
>> a=0
a =
0
>> a*(a+a)
ans =
0
>> a(a+a)
Subscript indices must either be real
positive integers or logicals.
Answers to this question so far focused on the sources of this error, which is great. But it is important to understand the powerful yet very intuitive feature of matrix indexing in Matlab. Hence how indexing works and what is a valid index would help avoid this error in the first place by using valid indices.
At its core, given an array A of length n, there are two ways of indexing it.
Linear indexing: with subset of integers from 1 : n (duplicates allowed). 0 is not allowed, as Matlab arrays are 1-based, unless you use the method below. For higher-dimensional arrays, multiple subscripts are internally converted into a linear index, although in an efficient and transparent manner.
Logical indexing:wherein you use a n-length array of 0s and 1s, to pick those elements where indexing is true. In this case, unique(index) must have only 0 and 1.
So a valid indexing array into another array with n number of elements ca be:
entirely logical of the same size, or
linear with subsets of integers from 1:n
Keeping this in mind, invalid indexing error occurs when you mix the two types of indexing: one or more zeros occur in your linearly indexing array, or you mix 0s and 1s with anything other than 0s and 1s :)
There is tons of material online to learn this including this one:
http://www.mathworks.com/company/newsletters/articles/matrix-indexing-in-matlab.html
Short version:
The function passed as the fourth argument to accumarray sometimes gets called with arguments that are not consistent with specifications encoded the first argument to accumarray.
As a result, functions used as arguments to accumarray must test for what are, in effect, anomalous conditions.
The question is: how can an a 1-expression anonymous function test for such anomalous conditions? And more generally: how can write anonymous functions that are robust to accumarray's undocumented behavior?
Full version:
The code below is a drastically distilled version of a problem that ate up most of my workday today.
First some definitions:
idxs = [1:3 1:3 1:3]';
vals0 = [1 4 6 3 5 7 6 Inf 2]';
vals1 = [1 Inf 6 3 5 7 6 4 2]';
anon = #(x) max(x(~isinf(x)));
Note vals1 is obtained from vals0 by swapping elements 2 and 8. The "anonymous" function anon computes the maximum among the non-infinite elements of its input.
Given these definitions, the two calls below
accumarray(idxs, vals0, [], anon)
accumarray(idxs, vals1, [], anon)
which differ only in their second argument (vals0 vs vals1), should produce identical results, since the difference between vals0 and vals1 affects only the ordering of the values in the argument to one of the calls to anon, and the result of this function is insensitive to the ordering of elements in its argument.
As it turns out the first of these two expressions evaluates normally and produces the right result1:
>> accumarray(idxs, vals0, [], anon)
ans =
6
5
7
The second one, however, fails with:
>> accumarray(idxs, vals1, [], anon)
Error using accumarray
The function '#(x)max(x(~isinf(x)))' returned a non-scalar value.
To troubleshoot this problem, all I could come up with2 was to write a separate function (in its own file, of course, "the MATLAB way")
function out = kluge(x)
global ncalls;
ncalls = ncalls + 1;
y = ~isinf(x);
if any(y)
out = max(x(y));
else
{ncalls x}
out = NaN;
end
end
...and ran the following:
>> global ncalls;
>> ncalls = int8(0); accumarray(idxs, vals0, [], #kluge)
ans =
6
5
7
>> ncalls = int8(0); accumarray(idxs, vals1, [], #kluge)
ans =
[2] [Inf]
ans =
6
5
7
As one can see from the output of the last call to accumarray above, the argument to the second call to the kluge callback was the array [Int]. This tells me beyond any doubt that accumarray is not behaving as documented3 (since idxs specifies no arrays of length 1 to be passed to accumarray's function argument).
In fact, from this and other tests I determined that, contrary to what I expected, the function passed to accumarray is called more than max(idxs) (= 3) times; in the expressions involving kluge above it's called 5 times.
The problem here is that if one cannot rely on how accumarray's function argument will actually be called, then the only way to make this function argument robust is to include in it a lot of extra code to perform the necessary checks. This almost certainly will require that the function have multiple statements, which rules out anonymous functions. (E.g. the function kluge above is robust more robust than anon, but I don't know how to fit into an anonymous function.) Not being able to use anonymous functions with accumarray greatly reduces its utility.
So my question is:
how to specify anonymous functions that can be robust arguments to accumarray?
1 I have removed blank lines from MATLAB's typical over-padding in all the MATLAB output shown in this post.
2 I welcome comments with any other troubleshooting suggestions you may have; troubleshooting this problem was a lot harder than it should be.
3
In particular, see items number 1 through 5 right after the line "The function processes the input as follows:".
Short answer
The fourth input argument of accumarray, anon in this case, must return a scalar for any input.
Long answer (and discussion about index sorting)
Consider the output when the indexes are sorted:
>> [idxsSorted,sortInds] = sort(idxs)
>> accumarray(idxsSorted, vals0(sortInds), [], anon)
ans =
6
5
7
>> accumarray(idxsSorted, vals1(sortInds), [], anon)
ans =
6
5
7
Now, all the documentation has to say about this is the following:
If the subscripts in subs are not sorted, fun should not depend on the order of the values in its input data.
How does this relate the trouble with anon? It is a clue, as this forces anon to be called for the complete set of values for a given idx rather than a subset/subarray, as Luis Mendo suggested.
Consider how accumarray would work for a non-sorted list of indexes and values:
>> [idxs vals0 vals1]
ans =
1 1 1
2 4 Inf
3 6 6
1 3 3
2 5 5
3 7 7
1 6 6
2 Inf 4
3 2 2
For both vals0 and vals1, the Inf belongs to the set where idxs equals 2. Since idxs is not sorted, it does not process all values for idxs=2 in one shot, at first. The actual algorithm (implementation) is opaque, but it seems to start by assuming that idxs is sorted, processing each single-valued block of the first argument. This is verifiable by putting a breakpoint in fun, the function reference by fourth input argument. When it encounters a 1 in idxs for the second time, it seems to start over, but with subsequent calls to fun containing all the values for a given index. Presumably accumarray calls some implementation of unique to fully-segment idxs (incidentally, order is not preserved). As kjo suggests, this is the point where accumarray actually processes the inputs as described in the documentation, following steps 1-5 here ("Find out how many unique indices there are..."). As a result, it crashes for vals1, when anon(Inf) is called, but not for vals0, which instead calls anon(4) on the first try.
However, even if it followed those steps exactly on the first go, it would not necessarily be robust if a complete subarray of values contained just Infs (consider that anon([Inf Inf Inf]) returns an empty matrix too). It is a requirement, although an understated one, that fun must return a scalar. What is not clear from the documentation is that it must return a scalar, for any inputs, not just what is expected based on the high-level description of the algorithm.
Workaround:
anon = #(x) max([x(~isinf(x));-Inf]);
The documentation does not say that anon is called only with the whole set1 of vals corresponding to each value of idx as its input. As seen in your example, it does get called with subsets thereof.
So the way to make anon robust seems to be: make sure it gives a scalar output when its input is any subset of vals (or maybe just any subset of each set with same-idx value). In your case, anon(inf) does not return a scalar.
1 It's actually an array, of course, but I think it's easier to describe this in terms of sets (and subsets).
In MATLAB, we can operate on both rows and columns of a matrix. What does it exactly mean by "row major" or "column major"?
It is important to understand that MATLAB stores data in column-major order, so you know what happens when you apply the colon operator without any commas:
>> M = magic(3)
M =
8 1 6
3 5 7
4 9 2
>> M(:)
ans =
8
3
4
1
5
9
6
7
2
I tend to think "MATLAB goes down, then across". This makes it easy to reshape and permute arrays without scrambling your data. It's also necessary in order to grasp linear indexing (e.g. M(4)).
For example, a common way to obtain a column vector inline from some expression that generates an array is:
reshape(<array expression>,[],1)
As with (:) this stacks all the columns on top of each other into single column vector, for all data in any higher dimensions.
But this nifty syntactic trick lets you avoid an extra line of code.
In MATLAB, arrays are stored in column major order.
It means that when you have a multi-dimensional array, its 1D representation in memory is such that leftmost indices change faster.
It's called column major order because for a 2D array (matrix), the first (leftmost) index is typically the row index, so since it changes faster than the second (next to the right) index, the 1D representation of the matrix is memory correspond to the concatenation of the columns of the matrix.
Suppose I have a vector B=[1 1 2 2] and A=[5 6 7 4] in the form of B says the numbers in the A that are need to be summed up. That is we need to sum 5 and 6 as the first entry of the result array and sum 7 and 4 as the second entry. If B is [1 2 1 2] then first element of the result is 5+7 and second element is 6+4.
How could I do it in Matlab in generic sense?
A fexible and general approach would be to use accumarray().
accumarray(B',A')
The function accumulates the values in A into the positions specified by B.
Since the documentation is not simple to understand I will summarize why it is flexible. You can:
choose your accumulating function (sum by default)
specify the positions as a set of coordinates for accumulation into ND arrays
preset the dimension of the accumulated array (by default it expands to max position)
pad with custom values the non accumulated positions (pads with 0 by default)
set the accumulated array to sparse, thus potential avoiding out of memory
[sum(A(1:2:end));sum(A(2:2:end))]