Output of unique function in Matlab - matlab

I am using the unique function in Matlab and I am confused about the output of such a function.
Consider the following simple code
rng default
T=randn(232,50); %232*50
equalorder=randsample(232,80802,true); %80802*1
T_extended=T(equalorder,:); %80802*50
By construction, I expect T_extended to have 232 unique rows. Indeed,
S=size(unique(T_extended,'rows'),1); %232
Now, consider the specific T and equalorder arrays that are produced by some code of mine (T and equalorder are uploaded here
https://filebin.net/603zn7mt2efzq91c
unfortunately my code is too long to be reproduced and I think that the issue may be numerical). Let's apply the code above to these arrays:
clear
load matrices %T, equalorder
T_extended=T(equalorder,:);
However, if I do
S=size(unique(T_extended,'rows'),1);
I get S=4696 and not S=232. Why?

The code or data necessary to reproduce the problem should be included in the question itself, as external links may stop working in the future. In this case, however, it was easy to identify the pattern that causes the problem (see below), so the question together with this answer should be self-contained.
In your linked example, T contains NaN at entry (216,37):
>> T(216,37)
ans =
NaN
(and this is the only such entry):
>> nnz(isnan(T))
ans =
1
By design, NaN values are not equal to each other. So when computing unique(T_extended, 'rows'), all rows of T_extended that correspond to the original 216-th row of T are counted as being different. This is what causes the count of unique rows to increase. If you don't consider the 37th column (which is the only one that contains NaN) you get the expected result:
>> S=size(unique(T_extended(:,[1:36 38:end]),'rows'),1)
S =
232
Let's count how many times a NaN entry appears in T_extended:
>> nnz(isnan(T_extended))
ans =
4465
(Of course, this happens because):
>> sum(equalorder==216)
ans =
4465
This means that the count of unique rows is increased by 4465 - 1 when each repetition of the row containing NaN is counted as a different row. And 4465 - 1 + 232 is 4696, which is the result you get.
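As a small, self-contained illustration of the same effect (a sketch with made-up data, not the arrays from the question), the following shows the inflated count and one possible workaround. The workaround assumes that the placeholder value Inf does not otherwise occur in the data, which you would need to check for your own arrays.

```matlab
% Toy data: 3 distinct rows, one of which contains a NaN
T = [1 2; 3 NaN; 5 6];
idx = [1 2 2 3 2 1]';          % row 2 (the NaN row) is selected 3 times
T_ext = T(idx,:);

% unique treats every NaN as distinct, so each copy of row 2 counts separately
nRaw = size(unique(T_ext,'rows'),1);      % 2 + 3 = 5

% Possible workaround: replace NaN by a placeholder before calling unique
% (assumption: Inf does not appear anywhere else in the data)
T_tmp = T_ext;
T_tmp(isnan(T_tmp)) = Inf;
nFixed = size(unique(T_tmp,'rows'),1);    % 3
```

The raw count is 2 ordinary rows plus 3 copies of the NaN row, which mirrors the 232 - 1 + 4465 arithmetic above.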

Related

Why do I get different result in different versions of MATLAB (2016 vs 2021)?

Why do I get different results when running the same code in different versions of MATLAB (2016 vs 2021) for sum(b.*x1), where b is single and x1 is double? How can I avoid such differences between MATLAB versions?
MATLAB v.2021:
sum(b.*x1)
ans =
single
-0.0013286
MATLAB 2016
sum(b.*x1)
ans =
single
-0.0013283
In R2017b, they changed the behavior of sum for single-precision floats, and in R2020b they made the same changes for other data types too.
The change speeds up the computation, and improves accuracy by reducing the rounding errors. Simply put, previously the algorithm would just run through the array in sequence, adding up the values. The new behavior computes the sum over smaller portions of the array, and then adds up those results. This is more precise because the running total can become a very large number, and adding smaller numbers to it causes more rounding in those smaller numbers. The speed improvement comes from loop unrolling: the loop now steps over, say, 8 values at a time, and in the loop body, 8 running totals are computed (they don’t specify the number they use, the 8 here is an example).
Thus, your newer result is a better approximation to the sum of your array than the old one.
For more details (a better explanation of the new algorithm and the reason for the change), see this blog post.
Regarding how to avoid the difference: you could implement your own sum function, and use that instead of the builtin one. I would suggest writing it as a MEX-file for efficiency. However, do make sure you match the newer behavior of the builtin sum, as that is the better approximation.
Here is an example of the problem. Let's create an array with N+1 elements, where the first one has a value of N and the rest have a value of 1.
N = 1e8;
a = ones(N+1,1,'single');
a(1) = N;
The sum over this array is expected to be 2*N. If we set N large enough w.r.t. the data type, I see this in R2017a (before the change):
>> sum(a)
ans =
single
150331648
And I see this in R2018b (after the change for single-precision sum):
>> sum(a)
ans =
single
199998976
Both implementations make rounding errors here, but one is obviously much, much closer to the expected result (2e8, or 200000000).
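To make the idea of several running totals concrete, here is a minimal sketch of a hand-written sum that does not depend on the builtin sum. The function name blockedSum and the default of 8 accumulators are my own choices for illustration. This is not MathWorks' algorithm, and it is not guaranteed to reproduce the result of either MATLAB version bit for bit; it only gives you a summation whose behavior you control.

```matlab
function s = blockedSum(a, nAcc)
% blockedSum  Hypothetical helper: sum a vector using nAcc running totals.
% Keeps the class of the input (e.g. single), like the builtin sum does.
    if nargin < 2
        nAcc = 8;                       % illustrative value, not MATLAB's
    end
    a = a(:);
    n = numel(a);
    nFull = floor(n / nAcc) * nAcc;     % part that fills complete blocks
    acc = zeros(nAcc, 1, 'like', a);    % one running total per lane
    for k = 1:nAcc:nFull
        acc = acc + a(k:k+nAcc-1);      % each lane adds one element per step
    end
    s = zeros(1, 'like', a);
    for j = 1:nAcc
        s = s + acc(j);                 % combine the partial totals
    end
    for k = nFull+1:n
        s = s + a(k);                   % leftover elements, if any
    end
end
```

Save it as blockedSum.m and call it as blockedSum(b.*x1). A plain MATLAB loop like this will be slow for large arrays, which is why a MEX-file is the better choice in practice.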

Empty objects in MATLAB [duplicate]

In many cases I have seen that MATLAB returns empty objects, and if you look at their size, it is something like 1 x 0 or 0 x 1.
An example is the following piece of code :
img = zeros(256); % Create a square zero image of dimension 256 X 256
regions = detectMSERFeatures(img);
size(regions)
If you look at the size of regions, it will be 0 x 1. My questions are the following. Some of these questions may overlap.
What is the meaning of such dimensions ?
What can be said about the memory layout of such objects ? The reason I am asking about memory layout is because MATLAB allows you to write the following statement: temp = zeros(1,0);
Why can't MATLAB simply return an empty constant like NULL in such cases instead of returning weirdoes of size 1 x 0 ?
Arrays in MATLAB can have any of their dimensions of size zero - I guess that may seem odd initially, but they're just arrays like any other.
You can create them directly:
>> a = double.empty(2,0,3,0,2)
a =
Empty array: 2-by-0-by-3-by-0-by-2
or using other array creation functions such as zeros, ones, rand and so on.
Note that, as is obvious from the above, empty arrays still have a class - you can create them with double.empty, uint8.empty, logical.empty and so on. The same is also true for user-defined classes.
It's very useful to have such arrays, rather than just a NULL element. Without them, you would need to spend a lot of programming effort to check for edge cases where you had a NULL rather than an array, and you wouldn't be able to distinguish between arrays that were NULL because they had no rows, and arrays that were NULL because they had no columns.
In addition, they're useful for initializing arrays. For example, let's say you have an array that needs to start empty but get filled later, and you know that it's always going to have three rows but a variable number of columns. You can then initialize it as double.empty(3,0), and you know that your initial value will always pass any checks on the number of rows your array has. That wouldn't work if you initialized it to [] (which is zero by zero), or to a NULL element.
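A minimal sketch of that initialization pattern (the variable name A and the values are made up for illustration):

```matlab
% Start with zero columns, but with the row count already fixed at 3
A = double.empty(3,0);
nRowsAtStart = size(A,1);      % 3, even though A has no elements

% Fill it later, one column at a time
A(:,end+1) = [1; 2; 3];
A(:,end+1) = [4; 5; 6];
finalSize = size(A);           % [3 2]
```

A check such as size(A,1) == 3 passes from the very first line, which would not be the case if A had been initialized to [].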
Finally, you can also multiply them in the same way as non-empty arrays. It may be surprising to you that:
>> a = double.empty(2,0)
a =
Empty matrix: 2-by-0
>> b = double.empty(0,3)
b =
Empty matrix: 0-by-3
>> a*b
ans =
0 0 0
0 0 0
but if you think it through, it's just a logical and necessary application/extension of the regular rules for matrix multiplication.
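One way to think it through, as a short sketch: each entry of a*b is a sum over the inner dimension, and here that sum has zero terms, so it is 0.

```matlab
a = double.empty(2,0);
b = double.empty(0,3);
c = a*b;                        % outer dimensions survive: 2-by-3

% Each entry of c is a sum over zero terms, and an empty sum is 0
emptySum = sum(zeros(1,0));     % 0
sameAsZeros = isequal(c, zeros(2,3));
```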
As to how they're stored in memory - again, they're stored just like regular MATLAB arrays. I can't recall the exact details (look in the documentation for mxArray), but it's basically a header giving the dimensions (some of which may be zero), followed by a list of the elements in column-major order (which in this case is an empty list).

`accumarray` makes anomalous calls to its function argument

Short version:
The function passed as the fourth argument to accumarray sometimes gets called with arguments that are not consistent with the specifications encoded in the first argument to accumarray.
As a result, functions used as arguments to accumarray must test for what are, in effect, anomalous conditions.
The question is: how can a one-expression anonymous function test for such anomalous conditions? And more generally: how can one write anonymous functions that are robust to accumarray's undocumented behavior?
Full version:
The code below is a drastically distilled version of a problem that ate up most of my workday today.
First some definitions:
idxs = [1:3 1:3 1:3]';
vals0 = [1 4 6 3 5 7 6 Inf 2]';
vals1 = [1 Inf 6 3 5 7 6 4 2]';
anon = @(x) max(x(~isinf(x)));
Note vals1 is obtained from vals0 by swapping elements 2 and 8. The "anonymous" function anon computes the maximum among the non-infinite elements of its input.
Given these definitions, the two calls below
accumarray(idxs, vals0, [], anon)
accumarray(idxs, vals1, [], anon)
which differ only in their second argument (vals0 vs vals1), should produce identical results, since the difference between vals0 and vals1 affects only the ordering of the values in the argument to one of the calls to anon, and the result of this function is insensitive to the ordering of elements in its argument.
As it turns out the first of these two expressions evaluates normally and produces the right result1:
>> accumarray(idxs, vals0, [], anon)
ans =
6
5
7
The second one, however, fails with:
>> accumarray(idxs, vals1, [], anon)
Error using accumarray
The function '@(x)max(x(~isinf(x)))' returned a non-scalar value.
To troubleshoot this problem, all I could come up with2 was to write a separate function (in its own file, of course, "the MATLAB way")
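function out = kluge(x)
    global ncalls;
    ncalls = ncalls + 1;      % count how often accumarray calls us
    y = ~isinf(x);
    if any(y)
        out = max(x(y));      % maximum of the non-infinite elements
    else
        {ncalls x}            % display the call number and the offending input
        out = NaN;            % still return a scalar
    end
end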
...and ran the following:
>> global ncalls;
>> ncalls = int8(0); accumarray(idxs, vals0, [], @kluge)
ans =
6
5
7
>> ncalls = int8(0); accumarray(idxs, vals1, [], @kluge)
ans =
[2] [Inf]
ans =
6
5
7
As one can see from the output of the last call to accumarray above, the argument to the second call to the kluge callback was the array [Inf]. This tells me beyond any doubt that accumarray is not behaving as documented3 (since idxs specifies no arrays of length 1 to be passed to accumarray's function argument).
In fact, from this and other tests I determined that, contrary to what I expected, the function passed to accumarray is called more than max(idxs) (= 3) times; in the expressions involving kluge above it's called 5 times.
The problem here is that if one cannot rely on how accumarray's function argument will actually be called, then the only way to make this function argument robust is to include in it a lot of extra code to perform the necessary checks. This almost certainly will require that the function have multiple statements, which rules out anonymous functions. (E.g. the function kluge above is more robust than anon, but I don't know how to fit it into an anonymous function.) Not being able to use anonymous functions with accumarray greatly reduces its utility.
So my question is:
how to specify anonymous functions that can be robust arguments to accumarray?
1 I have removed blank lines from MATLAB's typical over-padding in all the MATLAB output shown in this post.
2 I welcome comments with any other troubleshooting suggestions you may have; troubleshooting this problem was a lot harder than it should be.
3 In particular, see items number 1 through 5 right after the line "The function processes the input as follows:".
Short answer
The fourth input argument of accumarray, anon in this case, must return a scalar for any input.
Long answer (and discussion about index sorting)
Consider the output when the indexes are sorted:
>> [idxsSorted,sortInds] = sort(idxs)
>> accumarray(idxsSorted, vals0(sortInds), [], anon)
ans =
6
5
7
>> accumarray(idxsSorted, vals1(sortInds), [], anon)
ans =
6
5
7
Now, all the documentation has to say about this is the following:
If the subscripts in subs are not sorted, fun should not depend on the order of the values in its input data.
How does this relate to the trouble with anon? It is a clue, as sorting forces anon to be called with the complete set of values for a given idx rather than a subset/subarray, as Luis Mendo suggested.
Consider how accumarray would work for a non-sorted list of indexes and values:
>> [idxs vals0 vals1]
ans =
1 1 1
2 4 Inf
3 6 6
1 3 3
2 5 5
3 7 7
1 6 6
2 Inf 4
3 2 2
For both vals0 and vals1, the Inf belongs to the set where idxs equals 2. Since idxs is not sorted, it does not process all values for idxs=2 in one shot, at first. The actual algorithm (implementation) is opaque, but it seems to start by assuming that idxs is sorted, processing each single-valued block of the first argument. This is verifiable by putting a breakpoint in fun, the function referenced by the fourth input argument. When it encounters a 1 in idxs for the second time, it seems to start over, but with subsequent calls to fun containing all the values for a given index. Presumably accumarray calls some implementation of unique to fully segment idxs (incidentally, order is not preserved). As kjo suggests, this is the point where accumarray actually processes the inputs as described in the documentation, following steps 1-5 here ("Find out how many unique indices there are..."). As a result, it crashes for vals1, when anon(Inf) is called, but not for vals0, which instead calls anon(4) on the first try.
However, even if it followed those steps exactly on the first go, it would not necessarily be robust if a complete subarray of values contained just Infs (consider that anon([Inf Inf Inf]) returns an empty matrix too). It is a requirement, although an understated one, that fun must return a scalar. What is not clear from the documentation is that it must return a scalar, for any inputs, not just what is expected based on the high-level description of the algorithm.
Workaround:
anon = @(x) max([x(~isinf(x));-Inf]);
The documentation does not say that anon is called only with the whole set1 of vals corresponding to each value of idx as its input. As seen in your example, it does get called with subsets thereof.
So the way to make anon robust seems to be: make sure it gives a scalar output when its input is any subset of vals (or maybe just any subset of each set with same-idx value). In your case, anon(inf) does not return a scalar.
1 It's actually an array, of course, but I think it's easier to describe this in terms of sets (and subsets).
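To check the "scalar for any subset" requirement directly, here is a short sketch that calls such a function on the problematic inputs by hand and then through accumarray. The name anonRobust is mine, and the inputs are column vectors because that is how the values appear in this example. Note the design consequence: a group consisting only of Inf values yields -Inf rather than an error.

```matlab
idxs  = [1:3 1:3 1:3]';
vals1 = [1 Inf 6 3 5 7 6 4 2]';

% Appending -Inf guarantees that max never receives an empty input
anonRobust = @(x) max([x(~isinf(x)); -Inf]);

r1 = anonRobust(Inf);            % -Inf, a scalar (the original anon returned empty)
r2 = anonRobust([Inf; Inf]);     % -Inf
r3 = anonRobust([4; Inf; 5]);    % 5

% The call that failed in the question now gets a scalar from every call
out = accumarray(idxs, vals1, [], anonRobust);
```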

matlab: understanding matlab behavior

Could somebody explain the following code snippet? I have no background in computer science or programming and just recently became aware of Matlab. I understand the preallocation part from data=ceil(rand(7,5)*10)... to ...N*(N-1)/2).
I need to understand every aspect of how matlab processes the code from kk=0 to the end. Also, the reasons why the code is codified in that manner. There's no need to explain the function of: bsxfun(@minus), just how it operates in the scheme of the code.
data=ceil(rand(7,5)*10);
N = size(data,2);
b=cell(N-1,1);
c=NaN(size(data,1),N*(N-1)/2);
kk=0;
for ii=1:N-1
    b{ii} = bsxfun(@minus,data(:,ii),data(:,ii+1:end));
    c(:,kk+(1:N-ii)) = bsxfun(@minus,data(:,ii),data(:,ii+1:end));
    kk=kk+N-ii;
end
Start at zero
kk=0;
Loop with ii going from 1 up to N-1, incrementing by 1 every iteration. Type 1:10 in the command line of matlab and you'll see that it outputs 1 2 3 4 5 6 7 8 9 10. This colon operator is a very important operator to understand in matlab.
for ii=1:N-1
b{ii} = ... this just stores a matrix in the next element of the cell vector b. Cell arrays can hold anything in each of their elements, this is necessary as in this case each iteration is creating a matrix with one fewer column than the previous iteration.
data(:,ii) --> just get the iith column of the matrix data (: means get all the rows)
data(:, ii + 1:end) means get a subset of the matrix data consisting of all the rows but only of columns that appear after column ii
bsxfun(@minus, data(:,ii), data(:,ii+1:end)) --> from the single column data(:,ii), subtract each column of the matrix data(:,ii+1:end) in turn (that is, data(:,ii) minus each later column)
b{ii} = bsxfun(@minus,data(:,ii),data(:,ii+1:end));
%This does the same thing as the line above but instead of storing the resulting matrix of the loop in a separate cell of a cell array, this is appending the original array with the new matrix. Note that the new matrix will have the same number of rows each time but one fewer column, so this appends as new columns.
%c(:,kk + (1:N-ii)) = .... --> So 1:(N-ii) produces the numbers 1 up to the number of columns in the result of this iteration. In matlab, you can index an array using another array. So for example try this in the command line of matlab: a = [0 0 0 0 0]; a([1 3 5]) = 1. The result you should see is a = 1 0 1 0 1. but you can also extend a matrix like this so for example now type a(6) = 2. The result: a = 1 0 1 0 1 2. So by using c(:, 1:N-ii) we are indexing all the rows of c and also the right number of columns (in order). Adding the kk is just offsetting it so that we do not overwrite our previous results.
c(:,kk+(1:N-ii)) = bsxfun(@minus,data(:,ii),data(:,ii+1:end));
Now we just increment kk by the number of new columns we added so that in the next iteration, c is appended at the end.
kk=kk+N-ii;
end;
I suggest that you put a breakpoint in this code and step through it line by line and look at how the variables change in matlab. To do this click on the little dashed line next to kk=0; in the mfile, you will see a red dot appear there, and then run the code. The code will only execute as far as the dot, you are now in debug mode. If you hover over a variable in debug mode matlab will show its contents in a tool tip. For a really big variable check it out in the workspace. Now step through the code line by line and use my explanations above to make sure you understand how each line is changing each variable. For more complex lines like b{ii} = bsxfun(@minus,data(:,ii),data(:,ii+1:end)); you should highlight code snippets and run these in the command line to see what each part is doing, so for example run data(:,ii) to see what that does and then try data(:,ii+1:end) or even just ii+1:end (well, in that case it won't work, replace end with size(data, 2)). Debugging is the best way to understand code that confuses you.
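If you want to see only the column bookkeeping, without the data, here is a small sketch that prints which columns of c each iteration writes to. It uses N = 5 to match the question, and the variable cols is introduced just for the printout.

```matlab
N = 5;                 % number of columns in data, as in the question
kk = 0;                % number of columns of c filled so far
for ii = 1:N-1
    cols = kk + (1:N-ii);      % columns of c written in this iteration
    fprintf('ii=%d writes columns %s\n', ii, mat2str(cols));
    kk = kk + N - ii;          % move the offset past the columns just written
end
% After the loop, kk equals the total number of column pairs
totalPairs = N*(N-1)/2;        % 10
```

Each iteration writes one fewer column than the previous one, and at the end kk matches the N*(N-1)/2 columns that were preallocated for c.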
bsxfun(@minus,A,B)
is almost the same as
A-B
The difference is that the bsxfun version will handle inputs of different size: In each dimension (“direction,” if you find it easier to think about that way), if one of the inputs is scalar and the other one a vector, the scalar one will simply be repeated sufficiently often.
http://www.mathworks.com/help/techdoc/ref/bsxfun.html
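A tiny worked example of that expansion, with made-up numbers, using the same column-versus-matrix shape as in the question's loop:

```matlab
data = [ 1  2  4;
        10 20 40];

col  = data(:,1);          % 2-by-1
rest = data(:,2:end);      % 2-by-2

% col is repeated across the columns of rest, then subtracted element-wise
d = bsxfun(@minus, col, rest);
% d = [ 1-2   1-4;
%      10-20 10-40] = [-1 -3; -10 -30]
```

Since R2016b, MATLAB expands such inputs implicitly, so col - rest gives the same result there; bsxfun is still needed on older releases.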

matlab percentage change between cells

I'm a newbie to Matlab and just stumped how to do a simple task that can be easily performed in excel. I'm simply trying to get the percent change between cells in a matrix. I would like to create a for loop for this task. The data is setup in the following format:
DAY1 DAY2 DAY3...DAY 100
SUBJECT RESULTS
I could only perform getting the percent change between two data points. How would I conduct it if across multiple days and multiple subjects? And please provide explanation
Thanks a bunch
FOR EXAMPLE, FOR DAY 1 SUBJECT1(RESULT=1), SUBJECT2(RESULT=4), SUBJECT3(RESULT=5), DAY 2 SUBJECT1(RESULT=2), SUBJECT2(RESULT=8), SUBJECT3(RESULT=10), DAY 3 SUBJECT1(RESULT=1), SUBJECT2(RESULT=4), SUBJECT3(RESULT=5).
I WANT THE PERCENT CHANGE SO OUTPUT WILL BE DAY 2 SUBJECT1(RESULT=100%), SUBJECT2(RESULT=100%), SUBJECT3(RESULT=100%). DAY3 SUBJECT1(RESULT=50%), SUBJECT2(RESULT=50%), SUBJECT3(RESULT=50%)
updated:
Hi thanks for responding guys. sorry for the confusion. zebediah49 is pretty close to what I'm looking for. My data is for example a 10 x 10 double. I merely wanted to get the percentage change from column to column. For example, if I want the percentage change from rows 1 through 10 on all columns (from columns 2:10). I would like the code to function for any matrix dimension (e.g., 1000 x 1000 double) zebediah49 could you explain the code you posted? thanks
updated2:
zebediah49,
(data(1:end,100)- data(1:end,99))./data(1:end,99)
output=[data(:,2:end)-data(:,1:end-1)]./data(:,1:end-1)*100;
Observing the code above, How would I go about modifying it so that column 100 is used as the index against all of the other columns(1-99)? If I change the code to the following:
(data(1:end,100)- data(1:end,:))./data(1:end,:)
matlab is unable because of exceeding matrix dimensions. How would I go about implementing that?
UPDATE 3
zebediah49,
Worked perfectly!!! Originally I created a new variable for the index and repmat the index to match the matrices which was not a good idea. It took forever to replicate when dealing with large numbers.
Thanks for your contribution once again.
Thanks Chris for your contribution too!!! I was looking more on how to address and manipulate arrays within a matrix.
It's matlab; you don't actually want a loop.
output=input(2:end,:)./input(1:end-1,:)*100;
will probably do roughly what you want. Since you didn't give anything about your matlab structure, you may have to change index order, etc. in order to make it work.
If it's not obvious, that line defines output as a matrix consisting of the input matrix, divided by the input matrix shifted by one row. The ./ operator is important, because it means that you will divide each element by its corresponding one, as opposed to doing matrix division.
EDIT: further explanation was requested:
I assumed you wanted % change of the form 1->1->2->3->1 to be 100%, 200%, 150%, 33%.
The other form can be obtained by subtracting 100%.
input(2:end,:) will grab a sub-matrix, where the first row is cut off. (I put the time along the first dimension... if you want it the other way it would be input(:,2:end).)
Matlab is 1-indexed, and lets you use the special value end to refer to the last element.
Thus, end-1 is the second-last.
The point here is that element (i) of this matrix is element (i+1) of the original.
input(1:end-1,:), like the above, will also grab a sub-matrix, except that it's missing the last row.
I then divide element (i) of the first sub-matrix by element (i) of the second, which is element (i+1) of the original divided by element (i). Because of how I picked out the sub-matrices, they now line up.
As a semi-graphical demonstration, using my above numbers:
input: [1 1 2 3 1]
input(2:end): [1 2 3 1]
input(1:end-1): [1 1 2 3]
When I do the division, it's first/first, second/second, etc.
input(2:end,:)./input(1:end-1,:):
[1 2 3 1 ]
./ [1 1 2 3 ]
---------------------
== [1.0 2.0 1.5 0.3]
The extra index set to (:) means that it will do that procedure across all of the other dimension.
EDIT2: Revised question: How do I exclude a row, and keep it as an index.
You say you tried something to the effect of (data(1:end,100)- data(1:end,:))./data(1:end,:). Matlab will not like this, because the element-by-element operators need them to be the same size. If you wanted it to only work on the 100th column, setting the second index to be 100 instead of : would do that.
I would, instead, suggest setting the first to be the index, and the rest to be data.
Thus, the data is processed by cutting off the first:
output=[data(2:end,2:end)-data(2:end,1:end-1)]./data(2:end,1:end-1)*100;
(Note that (:) is shorthand for (1:end), but MATLAB does not let you leave out only one endpoint of a range: something like 2: is a syntax error, so write 2:end.)
However, you will probably still want the indices back, in which case you will need to append that row back on top:
output=[data(1,1:end-1); (data(2:end,2:end)-data(2:end,1:end-1))./data(2:end,1:end-1)*100];
This is probably not how you should be doing it though-- keep data in one matrix, and time or whatever else in a separate array. That makes it much easier to do stuff like this to data, without having to worry about excluding time. It's especially nice when graphing.
Oh, and one more thing:
(data(:,2:end)-data(:,1:end-1))./data(:,1:end-1)*100;
is identically equivalent to
data(:,2:end)./data(:,1:end-1)*100-100;
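For the case raised in the second update, where one column serves as the reference for all the others, a possible sketch is below. It assumes the reference is the last column and that none of the other entries is zero; the names ref and others and the numbers are made up for illustration. bsxfun does the expansion, so no repmat is needed.

```matlab
data = [1 2 4;
        2 4 8];

ref    = data(:,end);          % reference column (the last one here)
others = data(:,1:end-1);      % every column except the reference

% (ref - others) ./ others * 100, with ref expanded across the columns
pct = bsxfun(@rdivide, bsxfun(@minus, ref, others), others) * 100;
% pct = [(4-1)/1 (4-2)/2; (8-2)/2 (8-4)/4] * 100 = [300 100; 300 100]
```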
Assuming zebediah49 guessed right in the comment above and you want
1 4 5
2 8 10
1 4 5
to turn into
1 1 1
-.5 -.5 -.5
then try this:
data = [1,4,5; 2,8,10; 1,4,5];
changes_absolute = diff(data);
changes_absolute./data(1:end-1,:)
ans =
1.0000 1.0000 1.0000
-0.5000 -0.5000 -0.5000
You don't need the intermediate variable, you can directly write diff(data)./data(1:end-1,:). I just thought the above might be easier to read. Getting from that result to percentage numbers is left as an exercise to the reader. :-)
Oh, and if you really want 50%, not -50%, just use abs around the final line.
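For completeness, one way the pieces could be put together on the example data from the question (a sketch; the variable names pct and pctAbs are mine):

```matlab
data = [1,4,5; 2,8,10; 1,4,5];       % rows are days, columns are subjects

% Change from each day to the next, relative to the earlier day, in percent
pct = diff(data) ./ data(1:end-1,:) * 100;
% pct = [ 100  100  100;
%         -50  -50  -50]

% Magnitude only, if the sign of the change does not matter
pctAbs = abs(pct);
```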