I have a one-column array that I would like to sort in descending order. The array has values ranging from 0 to approximately 10^14. When I sort the array with
sorted = sort(A,'descend')
and try to look at the ten largest values, I find that
sorted(1:10)
gives me
1.0e+14 *
5.1093
0.0000
0.0000
0.0000
etc. Displaying just these first few large entries makes it look as though there's only one non-zero element, which isn't true.
However, if I skip the first couple of entries, which are far greater than the rest of the array elements, I get
sorted(8:10) =
2.9754
2.4182
2.0799
Why does displaying these first few large array elements cause all others to be displayed as zero?
The first number in your list is a scale factor that multiplies the entire array that follows. Because of the difference in magnitude of the elements, you will need to play with the format settings. Try
format long
or
format long e
to see a better representation
You may find sprintf or num2str more useful for printing numbers at arbitrary precision.
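For example, a minimal sketch (assuming sorted is the descending-sorted vector from above) that prints the ten largest values at full precision, one per line, without the shared scale factor:
sprintf('%.15g\n', sorted(1:10)) % each of the ten largest values on its own line, up to 15 significant digits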
See this related SO question.
Relevant Matlab docs:
http://www.mathworks.com/help/techdoc/ref/format.html
http://www.mathworks.com/help/techdoc/ref/num2str.html
http://www.mathworks.com/help/techdoc/ref/sprintf.html
They're displayed as zero because they are essentially zero when compared to the first element (that is, they're much much smaller than 10^14). But even if they're displayed as zero, they are NOT zero. Try typing in sorted(3): the result shouldn't be zero. (Edit: you already showed this above).
It's basically an issue of precision. Typing format long might also make this clearer.
I am working on code in a least-squares non-negative solution recovery context in Matlab, and (with no more detail, because it's not that important for this question) I need to know the number of non-zero elements in my matrices and arrays.
The function nnz in Matlab does exactly what I want, but I need more information about what Matlab considers a "zero element": is it only 0 itself, or could it also be a numerical zero like 1e-16 or less?
Does anybody have this information about the nnz function? I couldn't find the original source code.
Thanks.
PS: I am not an expert in Matlab, so please accept my apologies if this is a really simple task.
I tried "open nnz" in Matlab, but I only got a small script of commented code lines...
Since nnz counts everything that isn't an exact zero (i.e. 1e-100 is non-zero), you just have to apply a relational operator to your data first to find how many values exceed some tolerance around zero. For a matrix A:
n = nnz(abs(A) > 1e-16);
Also, this discussion of floating-point comparison might be of interest to you.
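For instance, a quick check with a hypothetical toy matrix shows the difference:
A = [0 1e-100 1e-16; 2 0 3]; % toy data, not from the question
nnz(A) % 4: anything that isn't an exact zero counts, even 1e-100
nnz(abs(A) > 1e-16) % 2: only values whose magnitude exceeds the tolerance count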
You can add in a tolerance by doing something like:
nnz(abs(myarray)>tol);
This will create a binary array that is 1 when abs(myarray)>tol and 0 otherwise and then count the number of non-zero entries.
So I'm working with a sparse matrix and I have to find out different information about a very big one (10^6 in size), and I need to find the mean of the outlinks. Just to be sure, by mean I'm referring to, e.g., (3+4+5)/3 = 4, where 4 is the mean.
I thought of something like this:
[row,col] = find(A(:,2),1,'first')
and then I would do 1/numberInThatIndex or something similar, since it's an S-matrix (pretty sure it's called that).
And I would iterate column by column, but for some reason it's not giving me the first number in each column: if I do find(A(:,1),1,'first') it does give me the first in the first column, but not the first in the second column when I change it to A(:,2).
I'd also need something to store that index so I can access the value; I thought of a 2xN vector, but I guess that's not the best idea. I mean, find is going to give me the index, but I need the value at that index, and then store it or show it. Not sure if I'm explaining myself properly, but I'm trying, sorry about that.
Just to be clear: for both A(:,1) and A(:,2) it gives me indices from the first column, and I do not want that; I want the first element found in each column, so I can calculate the mean from the number at that index.
edit: all right, it seems like that does indeed work, but when I was checking the results I was entering 3817 instead of 3871, which was the given answer, and so I found a 0 when I wanted something that's not a zero. Not sure if I should delete all of this.
To solve your problem, you can do the following:
numberNonZerosPerColumn = sum(S~=0,1);
inversePerColumn = 1./numberNonZerosPerColumn;
inversePerColumn(numberNonZerosPerColumn == 0) = NaN; % 1/0 gives Inf in Matlab, so mark empty columns as NaN explicitly
meanValue = nanmean(inversePerColumn);
Count the number of nonzero elements in every column n(i)
Compute the values v(i) that are stored there, which are defined by v(i) := 1/n(i)
Take the mean of those values where n(i) is not zero (i.e. summing all those values where v(i) is not NaN and dividing by the number of columns that contain at least one nonzero entry)
If you want to treat columns without any nonzero entry as v(i):= 0, but still use them in your mean, you can use:
numberNonZerosPerColumn = sum(S~=0,1);
inversePerColumn = 1./numberNonZerosPerColumn;
inversePerColumn(numberNonZerosPerColumn == 0) = 0; % columns with no nonzeros count as 0 rather than Inf
meanValue = sum(inversePerColumn)/size(S,2);
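As a quick sanity check, here is a hypothetical toy S (not from the question) whose columns contain 1, 2, 1 and 0 nonzero entries:
S = sparse([0 0.5 1 0; 1 0.5 0 0]);
n = sum(S~=0,1); % n = [1 2 1 0]
v = 1./n; % v = [1 0.5 1 Inf]
v(n == 0) = NaN;
nanmean(v) % (1 + 0.5 + 1)/3, ignoring the empty column
v(n == 0) = 0;
sum(v)/size(S,2) % 2.5/4, counting the empty column as 0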
I have made an array of doubles and when I want to use the find command to search for the indices of specific values in the array, this yields an empty matrix which is not what I want. I assume the problem lies in the precision of the values and/or decimal places that are not shown in the readout of the array.
command:
peaks=find(y1==0.8236)
array readout:
y1 =
Columns 1 through 11
0.2000 0.5280 0.8224 0.4820 0.8239 0.4787 0.8235 0.4796 0.8236 0.4794 0.8236
Columns 12 through 20
0.4794 0.8236 0.4794 0.8236 0.4794 0.8236 0.4794 0.8236 0.4794
output:
peaks =
Empty matrix: 1-by-0
I tried using the command
format short
but I guess this only truncates the displayed values and not the actual values in the array.
How can I use the find command to give an array of indices?
By default, each element of a numerical matrix in Matlab is stored as a double-precision floating point number. As you surmise in the question, format short and format long merely alter the displayed format, rather than the actual values of the numbers.
So if y1 is created using something like y1 = rand(100, 1), and you want to find particular elements in y1 using find, you'll need to know the exact value of the element you're looking for to floating point double precision - which depending on your application is probably non-trivial. Certainly, peaks=find(y1==0.8236) will return the empty matrix if y1 only contains values like 0.823622378...
So, how to get around this problem? It depends on your application. One approach is to truncate all the values in y1 to a given precision that you want to work in. Funnily enough, a SO matlab question on exactly this topic attracted two good answers about 12 hours ago, see here for more.
If you do decide to go down this route, I would recommend something like this:
a = 1e-4; %# Define the level of precision
y1Round = round((1/a) * y1); %# Round to that precision, leaving y1Round in integer form
Index = find(y1Round == SomeValue); %# Perform the find operation
Note that I use the find command with y1Round in integer form. This is because integers are stored exactly when using floating point double, so you won't need to worry about floating point precision.
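For instance, to look for the question's value of 0.8236 with this approach, the target itself has to be put into the same rounded integer form (a hypothetical one-liner continuing the snippet above):
Index = find(y1Round == round(0.8236 * (1/a))); %# scale and round the target the same way as y1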
An alternative approach to this problem would be to use find with some tolerance for error, for example:
Index = find(abs(y1 - SomeValue) < Tolerance);
Which path you choose is up to you. However, before adopting either of these approaches, I would have a good hard look at your application and see if it can be reformulated in some way such that you don't need to search for specific "real" numbers from among a set of "reals". That would be the most ideal outcome.
EDIT: The code advocated in the other two answers to this question is neater than my second approach - so I've altered it accordingly.
Testing for equality with floating-point numbers is almost always a bad idea. What you probably want to do is test to see which numbers are close enough to the target value:
peaks = find( abs( y - .8236 ) < .0001 );
The problem is indeed with the precision. The array that you see displayed is not the actual array, as the actual array has more digits for each of the numbers. Changing the format just changes the way in which the array is displayed, so it doesn't solve the problem.
You have two options, either modify the array or modify what you are looking for. It is probably better to modify what you are looking for, since then you are not changing the actual values.
So instead of looking for equality, you can look for proximity (so the difference between the number you are searching for and the number in the array is at most some small epsilon):
peaks = find( abs(y1-0.8236) < epsilon )
In general, when you are dealing with floats, always try to avoid exact comparisons and use some error thresholds, since the representation of these numbers is limited so they are often stored with small inaccuracies.
I am having an issue viewing double data in the Matlab console. I am importing a matrix from my data file. The value at a particular row and column is 1.543, but in the console, when I use disp(x), where x is the imported matrix, that element shows as 1.0e+03 * 0.0002. However, when I access that particular element using disp(x(25,25)), where 25 and 25 are the row and column numbers, it shows 1.543. So I am confused; any clarification? It is just that when I print the whole matrix it shows as 1.0e+03 * 0.0002.
The following command should fix it. It is only a display issue; the actual values in the matrix are not affected:
format shortG
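For example (hypothetical values, roughly mimicking the situation in the question):
x = [1.543 12345.6];
format short
disp(x) % the wide range forces a common scale factor, so 1.543 appears as a tiny multiple of it
format shortG
disp(x) % each element is shown in its own best-fitting format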
That happens due to the high dynamic range of your data.
Try for example :
x = [10^-10 10^10];
disp(x);
The result is:
1.0e+010 *
0.0000 1.0000
It looks like the first value is zero, but it isn't; it is just almost zero compared to the second one. That is not surprising: add the small value to the big one, then subtract the big one, and you get zero. That is due to floating-point arithmetic. The following expression is true:
isequal( (x(1)+x(2)) - x(2) , 0)
What can be done?
1) A really high dynamic range can cause trouble in any kind of computation. Try to understand where it comes from, and solve the problem in a broader context.
2) You can try to set
format long
It can improve the display in some cases.
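If the shared scale factor itself is what bothers you, printing the values explicitly sidesteps the display format entirely (a minimal sketch):
x = [10^-10 10^10];
fprintf('%g\n', x); % prints 1e-10 and 1e+10, each in its own notation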
I have two arrays of data that I'm trying to amalgamate. One contains actual latencies from an experiment in the first column (e.g. 0.345, 0.455... never more than 3 decimal places), along with other data from that experiment. The other contains what is effectively a 'look up' list of latencies ranging from 0.001 to 0.500 in 0.001 increments, along with other pieces of data. Both data sets are X-by-Y doubles.
What I'm trying to do is something like...
for i = 1:length(actual_latency)
row = find(predicted_data(:,1) == actual_latency(i))
full_set(i,1:4) = [actual_latency(i) other_info(i) predicted_info(row,2) ...
predicted_info(row,3)];
end
...in order to find the relevant row in predicted_data where the look-up latency corresponds to the actual latency. I then use this to create an amalgamated data set, full_set.
I figured this would be really simple, but the find function keeps failing, returning an empty matrix when looking for an actual latency that I know is in predicted_data(:,1) (as I've double-checked during debugging).
Moreover, if I replace find with a for loop to do the same job, I get a similar error. It doesn't appear to be systematic - using different participant data sets throws it up in different places.
Furthermore, during debugging mode, if I use find to try and find a hard-coded value of actual_latency, it doesn't always work. Sometimes yes, sometimes no.
I'm really scratching my head over this, so if anyone has any ideas about what might be going on, I'd be really grateful.
You are likely running into a problem with floating point comparisons when you do the following:
predicted_data(:,1) == actual_latency(i)
Even though your numbers appear to only have three decimal places of precision, they may still differ by very small amounts that are not being displayed, thus giving you an empty matrix since FIND can't get an exact match.
One feature of floating point numbers is that certain values can't be represented exactly, because they can't be written as a finite sum of powers of 2. This occurs with the numbers 0.1 and 0.001. If you repeatedly add or multiply one of these numbers, you can see some unexpected behavior. Amro pointed out one example in his comment: 0.3 is not exactly equal to 3*0.1. This can also be illustrated by creating your look-up list of latencies in two different ways. You can use the normal colon syntax:
vec1 = 0.001:0.001:0.5;
Or you can use LINSPACE:
vec2 = linspace(0.001,0.5,500);
You'd think these two vectors would be equal to one another, but think again!:
>> isequal(vec1,vec2)
ans =
0 %# FALSE!
This is because the two methods create the vectors by performing successive additions or multiplications of 0.001 in different ways, giving ever so slightly different values for some entries in the vector. You can take a look at this technical solution for more details.
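You can also check just how small those discrepancies are (the exact numbers may vary with your Matlab version):
vec1 = 0.001:0.001:0.5;
vec2 = linspace(0.001,0.5,500);
max(abs(vec1 - vec2)) % tiny but nonzero, on the order of eps
find(vec1 ~= vec2, 1) % index of the first entry where the two vectors differ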
When comparing floating point numbers, you should therefore do your comparisons using some tolerance. For example, this finds the indices of entries in the look-up list that are within 0.0001 of your actual latency:
tolerance = 0.0001;
for i = 1:length(actual_latency)
row = find(abs(predicted_data(:,1) - actual_latency(i)) < tolerance);
...
The topic of floating point comparison is also covered in this related question.
You may try to do the following:
row = find(abs(predicted_data(:,1) - actual_latency(i)) < eps);
eps is the floating-point relative accuracy (the distance from 1.0 to the next larger double). In practice you may need a larger tolerance than eps, as in the answer above.
Have you tried using a tolerance rather than == ?