Consider the following example:
Bathymetry = [0,4134066;
3,3817906;
6,3343666;
9,2978725;
12,2742092;
14,2584337;
16,2415355;
18,2228054;
20,2040753;
23,1761373;
26,1514085];
Depth = [0;1;2;3;5;8;10;11.6;15];
newDepth = min(Bathymetry(:,1)):0.1:max(Bathymetry(:,1));
From this I want to find which column of 'newDepth' corresponds to 'Depth'. For example:
dd = find(newDepth==Depth(1))
dd =
1
This shows that Depth == 0 is located in the first column of newDepth. When I apply this to all of the entries of 'Depth':
for i = 1:length(Depth);
dd(i) = find(newDepth == Depth(i));
end
I receive an error:
Improper assignment with rectangular empty matrix.
Initially I couldn't understand why, but by looking at the array for newDepth, especially column 117 where newDepth == 11.6, I noticed that the value isn't equal to 11.6 but to 11.600000000000001, and is therefore different from Depth(8). How can I fix this? And why does MATLAB not just write the value as 11.6? Nowhere have I specified that it should include the .000000000000001.
This is because there is no exact representation of 0.1 in binary. Read the wiki for more background. In binary, representing 0.1 is something like trying to write out all the decimals of one-third:
1/3 == 0.333333333333333333...
it will never be exact, no matter how many 3's you add.
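To see the stored value directly, here is a small sketch in Python, whose float is the same IEEE 754 double that MATLAB uses by default. The variable names are mine, and computing the grid point as 116 * 0.1 is only an assumed stand-in for however the 117th element of newDepth is actually produced.

```python
from decimal import Decimal

# Decimal(float) shows the exact binary value that is actually stored.
exact_tenth = Decimal(0.1)
print(exact_tenth)            # slightly larger than 0.1

# Stand-in for the 117th grid point (assumption: computed as 116 * 0.1).
grid_point = 116 * 0.1
print(repr(grid_point))
print(grid_point == 11.6)     # False: the two doubles differ in the last bit
```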
For this (and many other) reasons, I'd suggest you do not use == (which is a very stringent demand), but rather use
for ii = 1:length(Depth);
[~,dd(ii)] = min( abs(newDepth-Depth(ii)) );
end
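The same nearest-match idea can be sketched in Python (the floats behave identically). nearest_index, new_depth and depth are illustrative names, and the grid is rebuilt here as k * 0.1, which is an assumption about how the values arise.

```python
def nearest_index(values, target):
    """Return the index of the element of values closest to target."""
    return min(range(len(values)), key=lambda k: abs(values[k] - target))

# Hypothetical stand-ins for newDepth and Depth from the question.
new_depth = [k * 0.1 for k in range(261)]          # 0, 0.1, ..., 26.0
depth = [0, 1, 2, 3, 5, 8, 10, 11.6, 15]

dd = [nearest_index(new_depth, d) for d in depth]
print(dd)  # zero-based indices, so MATLAB's column 117 appears as 116
```

Because this always returns the closest element, it never comes back empty, but it will also happily return a match for a depth that is nowhere near the grid, so a sanity check on the distance may be worth adding.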
This problem is to do with floating point arithmetic, which is quite complicated. I recommend you google it and read a bit; there is plenty out there explaining it. Here is a good start: http://blogs.mathworks.com/loren/2006/08/23/a-glimpse-into-floating-point-accuracy/
To solve it for your case I would suggest rounding
newDepth = round(newDepth * 10) / 10
The 11.600000000000001 is because the number 11.6 is not exactly representable in binary floating point notation. This is to do with the way the hardware works rather than any limitation of Matlab.
You want to change your compare to something like
dd(i) = find(abs(newDepth - Depth(i))<.0000001);
I use mod() to compare if a number's 0.01 digit is 2 or not.
if mod(5.02*100, 10) == 2
...
end
The result is that mod(5.02*100, 10) == 2 returns 0.
However, if I use mod(1.02*100, 10) == 2 or mod(20.02*100, 10) == 2, it returns 1.
The result of mod(5.02*100, 10) - 2 is
ans =
-5.6843e-14
Could this be a bug in MATLAB?
The version I use is R2013a (version 8.1.0).
This is not a bug in MATLAB. It is a limitation of floating point arithmetic and of the conversion between binary and decimal numbers. Even a simple decimal number such as 0.1 cannot be exactly represented as a binary floating point number with finite precision.
Computer floating point arithmetic is typically not exact. Although we are used to dealing with numbers in decimal format (base10), computers store and process numbers in binary format (base2). The IEEE standard for double precision floating point representation (see http://en.wikipedia.org/wiki/Double-precision_floating-point_format, what MATLAB uses) specifies the use of 64 bits to represent a binary number. 1 bit is used for the sign, 52 bits are used for the mantissa (the actual digits of the number), and 11 bits are used for the exponent and its sign (which specifies where the decimal place goes).
When you enter a number into MATLAB, it is immediately converted to binary representation for all manipulations and arithmetic and then converted back to decimal for display and output.
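To make the 1 + 11 + 52 bit layout concrete, here is a rough Python sketch that pulls the three fields out of a double. double_fields is a name I made up; it relies only on the standard struct module.

```python
import struct

def double_fields(x):
    """Split a double into (sign, biased exponent, 52-bit mantissa field)."""
    # Reinterpret the 8 bytes of the double as one unsigned 64-bit integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF       # stored with a bias of 1023
    mantissa = bits & ((1 << 52) - 1)     # fraction bits after the implicit 1
    return sign, exponent, mantissa

sign, exponent, mantissa = double_fields(5.02)
print(sign, exponent - 1023, format(mantissa, "052b"))
```

For 5.02 the unbiased exponent comes out as 2, matching the e2 in the binary expansion used in this answer.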
Here's what happens in your example:
Convert to binary (keeping only up to 52 digits):
5.02 => 1.01000001010001111010111000010100011110101110000101e2
100 => 1.1001e6
10 => 1.01e3
2 => 1.0e1
Perform multiplication:
1.01000001010001111010111000010100011110101110000101 e2
x 1.1001 e6
--------------------------------------------------------------
0.000101000001010001111010111000010100011110101110000101
0.101000001010001111010111000010100011110101110000101
+ 1.01000001010001111010111000010100011110101110000101
-------------------------------------------------------------
1.111101011111111111111111111111111111111111111111111101e8
Cutting off at 52 digits gives 1.111101011111111111111111111111111111111111111111111e8
Note that this is not the same as 1.11110110e8 which would be 502.
Perform modulo operation: (there may actually be additional error here depending on what algorithm is used within the mod() function)
mod( 1.111101011111111111111111111111111111111111111111111e8, 1.01e3) = 1.111111111111111111111111111111111111111111100000000e0
The error is exactly -2^-44, which is about -5.6843e-14. The conversion between decimal and binary and the rounding due to finite precision have caused a small error. In some cases you get lucky, the rounding errors cancel out, and you still get the 'right' answer, which is why you got what you expected for mod(1.02*100, 10). In general, though, you cannot rely on this.
To use mod() correctly to test the particular digit of a number, use round() to round it to the nearest whole number and compensate for floating point error.
mod(round(5.02*100), 10) == 2
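The same numbers can be checked in Python, which uses the same double arithmetic. This is only a sketch to confirm the size of the error and the effect of rounding first; product is my own variable name.

```python
product = 5.02 * 100

print(repr(product))                 # just below 502
print(product - 502 == -2.0 ** -44)  # True: off by one unit in the last place
print(product % 10 == 2)             # False
print(round(product) % 10 == 2)      # True
```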
What you're encountering is a floating point error or artifact, like the commenters say. This is not a Matlab bug; it's just how floating point values work. You'd get the same results in C or Java. Floating point values are "approximate" types, so exact equality comparisons using == without some rounding or tolerance are prone to error.
>> isequal(1.02*100, 102)
ans =
1
>> isequal(5.02*100, 502)
ans =
0
It's not the case that 5.02 is the only number this happens for; several around 0 are affected. Here's an example that picks out several of them.
x = 1.02:1000.02;
ix = mod(x .* 100, 10) ~= 2;
disp(x(ix))
To understand the details of what's going on here (and in many other situations you'll encounter working with floats), have a read through the Wikipedia entry for "floating point", or my favorite article on it, "What Every Computer Scientist Should Know About Floating-Point Arithmetic". (That title is hyperbole; this article goes deep and I don't understand half of it. But it's a great resource.) This stuff is particularly relevant to Matlab because Matlab does everything in floating point by default.
I am writing a program which goes through FITS files with photometry and looks for stars given in a .dat file.
One of the steps is computing distances between two given stars using ephem.separation()
It works well. However, from time to time separation returns angles like 1389660529:33:00.8
import ephem
import math
star = ['21:45:15.00', '65:49:24.0']
first_coo = ['21:45:15.00', '65:49:24.0']
check = ephem.FixedBody()
check._ra = ephem.hours(star[0])
check._dec = ephem.degrees(star[1])
check.compute()
# star is a list with coordinates, strings in form %s:%s:%s
first = ephem.FixedBody()
first._ra = ephem.hours(first_coo[0])
first._dec = ephem.degrees(first_coo[1])
first.compute()
sep = math.degrees(float(ephem.separation(check,first)))
print(sep)
It occurs randomly. Has anybody encountered such behaviour?
I search for 18 stars in 212 files, which makes 3816 cycles. Might that have something to do with it?
UPDATE: I have released a new PyEphem 3.7.5.2 that fixes this special case of comparing an angle to itself.
Your code sample has two interesting features:
first, it contains a slight bug that I thought at first might be behind the problem;
and, second, I was wrong that your code was the problem because your code
does indeed expose a flaw in the separation() function
when it is asked how far a position is from itself!
The bug in your own code is that calling compute() and asking about .ra and .dec
returns those coordinates in the coordinate system of the very moment that you are
calling compute() for — so your two compute() calls are returning coordinates
in two different coordinate systems that are very slightly different,
and so the resulting positions cannot be meaningfully compared with separation()
because separation() requires two coordinates that are in the same coordinate system.
To fix this problem, choose a single now moment to use as your equinox epoch:
now = ephem.now()
...
check.compute(epoch=now)
...
first.compute(epoch=now)
That will give you coordinates that can be meaningfully compared.
Now, on to the problem in PyEphem itself!
When two copies of the same position are provided to separation(), it goes ahead and tries to find a distance between them anyway, and winds up doing a calculation that amounts to:
acos(sin(angle) ** 2.0 + cos(angle) ** 2.0)
which should remind us of the standard Euclidean distance formula but with an acos() around it instead of a sqrt(). The problem is that while in theory the value inside of the parens should always be 1.0, the rounding inside of IEEE floating point math does not always produce squares that sum to exactly 1.0. Instead, it sometimes produces a value that is a bit high or a bit low.
When the result is a bit below 1.0, separation() will return a very slight separation for the coordinates, even though they are actually “the same coordinate.”
When the result exceeds 1.0 by a little bit, separation() will return not-a-number (nan) on my system — and, I will bet, returns that huge return value that you are seeing printed out on yours — because cos() cannot, by definition, return a number greater than 1.0, so there is no answer acos() can return when asked “what angle returns this value that is greater than one?”
I have created a bug report in GitHub and will fix this for the next version of PyEphem:
https://github.com/brandon-rhodes/pyephem/issues/31
Meanwhile, your code will have to avoid calling separation() for two positions that are actually the same position — could you use an if statement with two == comparisons between ra and dec to detect such cases and just use the value 0.0 for the separation instead?
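For illustration, a common general-purpose guard against this kind of domain error is to clamp the argument before calling acos(). This is only a sketch of that technique using the standard math module. I am not claiming it is what PyEphem does internally, and safe_acos is a name I made up.

```python
import math

def safe_acos(x):
    """acos that tolerates arguments a hair outside [-1, 1]."""
    return math.acos(max(-1.0, min(1.0, x)))

slightly_high = 1.0 + 2.0 ** -52   # smallest double above 1.0
try:
    math.acos(slightly_high)
except ValueError as err:
    print("plain acos fails:", err)

print(safe_acos(slightly_high))    # 0.0
```

Clamping hides arguments that are wildly out of range as well, so it is probably wise to use it only where the input is known to be a rounded value that should lie in [-1, 1].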
I have made an array of doubles and when I want to use the find command to search for the indices of specific values in the array, this yields an empty matrix which is not what I want. I assume the problem lies in the precision of the values and/or decimal places that are not shown in the readout of the array.
command:
peaks=find(y1==0.8236)
array readout:
y1 =
Columns 1 through 11
0.2000 0.5280 0.8224 0.4820 0.8239 0.4787 0.8235 0.4796 0.8236 0.4794 0.8236
Columns 12 through 20
0.4794 0.8236 0.4794 0.8236 0.4794 0.8236 0.4794 0.8236 0.4794
output:
peaks =
Empty matrix: 1-by-0
I tried using the command
format short
but I guess this only truncates the displayed values and not the actual values in the array.
How can I use the find command to give an array of indices?
By default, each element of a numerical matrix in Matlab is stored using floating point double precision. As you surmise in the question format short and format long merely alter the displayed format, rather than the actual format of the numbers.
So if y1 is created using something like y1 = rand(100, 1), and you want to find particular elements in y1 using find, you'll need to know the exact value of the element you're looking for to floating point double precision - which depending on your application is probably non-trivial. Certainly, peaks=find(y1==0.8236) will return the empty matrix if y1 only contains values like 0.823622378...
So, how to get around this problem? It depends on your application. One approach is to truncate all the values in y1 to a given precision that you want to work in. Funnily enough, a SO matlab question on exactly this topic attracted two good answers about 12 hours ago, see here for more.
If you do decide to go down this route, I would recommend something like this:
a = 1e-4 %# Define level of precision
y1Round = round((1/a) * y1); %# Round to appropriate precision, and leave y1 in integer form
Index = find(y1Round == SomeValue); %# Perform the find operation
Note that I use the find command with y1Round in integer form. This is because integers are stored exactly when using floating point double, so you won't need to worry about floating point precision.
An alternative approach to this problem would be to use find with some tolerance for error, for example:
Index = find(abs(y1 - SomeValue) < Tolerance);
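Here is how the two approaches might look side by side in Python. The values in y1 are hypothetical full-precision numbers that I invented (the question only shows the rounded display), and step and tolerance are assumed settings.

```python
# Hypothetical full-precision values; the display would show them as 0.8236 etc.
y1 = [0.2, 0.528, 0.82241, 0.48203, 0.82388, 0.47871, 0.82362237, 0.47941]
target = 0.8236

# Approach 1: scale to integers, then compare exactly.
step = 1e-4
scaled = [round(v / step) for v in y1]
index_rounded = [k for k, v in enumerate(scaled) if v == round(target / step)]

# Approach 2: compare with a tolerance.
tolerance = 1e-4
index_tol = [k for k, v in enumerate(y1) if abs(v - target) < tolerance]

print(index_rounded, index_tol)  # zero-based indices
```

The two approaches agree on this data but need not agree in general: rounding matches anything within half a step of the target, while the tolerance test here matches anything within a full tolerance.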
Which path you choose is up to you. However, before adopting either of these approaches, I would have a good hard look at your application and see if it can be reformulated in some way such that you don't need to search for specific "real" numbers from among a set of "reals". That would be the most ideal outcome.
EDIT: The code advocated in the other two answers to this question is neater than my second approach - so I've altered it accordingly.
Testing for equality with floating-point numbers is almost always a bad idea. What you probably want to do is test to see which numbers are close enough to the target value:
peaks = find( abs( y1 - .8236 ) < .0001 );
The problem is indeed with the precision. The array that you see displayed is not the actual array, as the actual array has more digits for each of the numbers. Changing the format just changes the way in which the array is displayed, so it doesn't solve the problem.
You have two options, either modify the array or modify what you are looking for. It is probably better to modify what you are looking for, since then you are not changing the actual values.
So instead of looking for equality, you can look for proximity (so the difference between the number you are searching for and the number in the array is at most some small epsilon):
peaks = find( abs(y1-0.8236) < epsilon )
In general, when you are dealing with floats, always try to avoid exact comparisons and use some error thresholds, since the representation of these numbers is limited so they are often stored with small inaccuracies.
I'm trying to convert very long binary strings, often greater than 52 bits into numbers. I cannot have a fixed lookahead window because I am doing this to calculate a version of Lempel-Ziv complexity for neural data.
When I try to convert any long string, bin2dec throws an error that the binary string must be 52 bits or less.
Is there a way to get around this size limitation?
bin2dec throws that error because a double is not capable of storing that much precision. Your very question asks an impossibility. You have two choices: store the value in something other than a floating point value, or throw away some precision before you convert.
Or describe more completely what you're trying to accomplish.
EDITING:
Based on your additional information, I am even more certain that converting to floating point is not what you want to do. If you want to reduce the storage size to something more efficient, convert to a vector of bytes (uint8), which is as dense as you can get. Just split the binary string into N rows of 8 digits each, using reshape. This seems to be an accepted approach for biological data.
str = char((rand(1, 100)>0.5) + '0'); % test data
data = uint8(bin2dec(reshape(str(1:end-mod(end,8)), 8, []).')); % reshape fills column-wise, so build 8-by-N and transpose to get one 8-bit group per row
In this code, I toss any bits that don't divide evenly into 8. Or, skip the uint8 step and just perform your processing on the resulting vector, where each double-precision float represents one 8-bit word from your sequence.
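The same packing idea, sketched in Python for comparison. pack_bits is a hypothetical helper name; like the MATLAB line, it drops any trailing bits that do not fill a whole byte.

```python
def pack_bits(bit_string):
    """Pack a string of '0'/'1' characters into bytes, dropping leftover bits."""
    usable = len(bit_string) - len(bit_string) % 8
    return bytes(int(bit_string[k:k + 8], 2) for k in range(0, usable, 8))

packed = pack_bits("00000001" + "11111111" + "101")
print(list(packed))  # [1, 255]
```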
You could roll your own implementation:
len = 60;
string = [];
for i = 1:len
string = [string sprintf('%d', randi([0 1]))];
end
% error
% bin2dec(string);
% roll your own...
value = 0;
for i = length(string):-1:1
value = value + str2num(string(i))*2^(length(string)-i);
end
I'm just looping through the string and adding to some value. At the end, value will contain the decimal value of the string. Does this work for you?
Note: This solution is slow. You can speed it up a bit by preallocating the string, which I did on my own machine. Also, it's going to have issues if your number gets up to 1e6 digits. At that point, you need variable precision arithmetic to keep track of it. And adding that to the calculation really slowed things down. If I were you, I'd strongly consider compiling this from a .mex file if you need the functionality in MATLAB.
Credit is due to @aardvarkk, but here's a sped-up version of his algorithm (roughly 100x faster):
N=100;
strbin = char(randi(2,1,N)+'0'-1);
pows2 = 2.^(N-1:-1:0);
value=pows2*(strbin-'0')';
double's range goes only up to 1.79769e+308 which is 2^1024 give or take. From there on, value will be Inf or NaN. So you still need to find another way storing the resulting number.
A final pro on this algorithm: you can cache pows2 for a large number and then use a piece of it for any new strbin of length N:
Nmax = 1e8; % already 700MB for pows2, watch out!
pows2 = 2.^(Nmax-1:-1:0);
and then use
value = pows2(Nmax-N+1:end)*(strbin-'0')';
Solution to matlab's numeric upper bound
There's a tool on the File Exchange called vpi: http://www.mathworks.com/matlabcentral/fileexchange/22725
It allows you to use really big integers (2^5000? no prob). It's only slower (a lot) in calculating everything, I don't suggest using my method above with this. But hey, you can't have everything!
Download the package, addpath it and the following might work:
N=3000;
strbin = char(randi(2,1,N)+'0'-1);
binvals=strbin-'0';
val=0;
twopow=vpi(1);
for ii=1:N
val=val+twopow*binvals(N-ii+1);
twopow=twopow*2;
end
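For comparison, here is the same accumulate-as-you-go idea in Python, whose built-in integers are arbitrary precision and so play the role that vpi plays above. bits_to_int is my own name, and this variant walks the string from the most significant bit rather than from the end.

```python
def bits_to_int(bit_string):
    """Horner-style accumulation; Python ints never overflow or round."""
    value = 0
    for ch in bit_string:
        value = value * 2 + (ch == "1")
    return value

long_bits = "1" * 60
exact = bits_to_int(long_bits)
print(exact == 2 ** 60 - 1)        # True
print(int(float(exact)) == exact)  # False: a double cannot hold all 60 bits
```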
I have two arrays of data that I'm trying to amalgamate. One contains actual latencies from an experiment in the first column (e.g. 0.345, 0.455... never more than 3 decimal places), along with other data from that experiment. The other contains what is effectively a 'look up' list of latencies ranging from 0.001 to 0.500 in 0.001 increments, along with other pieces of data. Both data sets are X-by-Y doubles.
What I'm trying to do is something like...
for i = 1:length(actual_latency)
row = find(predicted_data(:,1) == actual_latency(i))
full_set(i,1:4) = [actual_latency(i) other_info(i) predicted_info(row,2) ...
predicted_info(row,3)];
end
...in order to find the relevant row in predicted_data where the look up latency corresponds to the actual latency. I then use this to create an amalgamated data set, full_set.
I figured this would be really simple, but the find function keeps failing by throwing up an empty matrix when looking for an actual latency that I know is in predicted_data(:,1) (as I've double-checked during debugging).
Moreover, if I replace find with a for loop to do the same job, I get a similar error. It doesn't appear to be systematic - using different participant data sets throws it up in different places.
Furthermore, during debugging mode, if I use find to try and find a hard-coded value of actual_latency, it doesn't always work. Sometimes yes, sometimes no.
I'm really scratching my head over this, so if anyone has any ideas about what might be going on, I'd be really grateful.
You are likely running into a problem with floating point comparisons when you do the following:
predicted_data(:,1) == actual_latency(i)
Even though your numbers appear to only have three decimal places of precision, they may still differ by very small amounts that are not being displayed, thus giving you an empty matrix since FIND can't get an exact match.
One feature of floating point numbers is that certain numbers can't be exactly represented, since they can't be written as a finite sum of powers of 2. This occurs with the numbers 0.1 and 0.001. If you repeatedly add or multiply one of these numbers you can see some unexpected behavior. Amro pointed out one example in his comment: 0.3 is not exactly equal to 3*0.1. This can also be illustrated by creating your look-up list of latencies in two different ways. You can use the normal colon syntax:
vec1 = 0.001:0.001:0.5;
Or you can use LINSPACE:
vec2 = linspace(0.001,0.5,500);
You'd think these two vectors would be equal to one another, but think again!:
>> isequal(vec1,vec2)
ans =
0 %# FALSE!
This is because the two methods create the vectors by performing successive additions or multiplications of 0.001 in different ways, giving ever so slightly different values for some entries in the vector. You can take a look at this technical solution for more details.
When comparing floating point numbers, you should therefore do your comparisons using some tolerance. For example, this finds the indices of entries in the look-up list that are within 0.0001 of your actual latency:
tolerance = 0.0001;
for i = 1:length(actual_latency)
row = find(abs(predicted_data(:,1) - actual_latency(i)) < tolerance);
...
The topic of floating point comparison is also covered in this related question.
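As a rough cross-check outside MATLAB, the same tolerance-based look-up can be sketched in Python. lookup and find_row are hypothetical names, and building the list as k * 0.001 is an assumption about how the look-up latencies were generated.

```python
import math

# Hypothetical look-up latencies 0.001, 0.002, ..., 0.500.
lookup = [k * 0.001 for k in range(1, 501)]

def find_row(latencies, value, tolerance=1e-4):
    """Indices of entries within tolerance of value (zero-based)."""
    return [k for k, v in enumerate(latencies)
            if math.isclose(v, value, rel_tol=0.0, abs_tol=tolerance)]

print(3 * 0.1 == 0.3)            # False
print(find_row(lookup, 0.345))   # [344]
```

The tolerance should stay well below half the grid spacing (0.0005 here) so that a latency can never match two neighbouring rows.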
You may try to do the following:
row = find(abs(predicted_data(:,1) - actual_latency(i)) < eps)
eps is the floating-point relative accuracy, that is, the distance from 1.0 to the next larger double.
Have you tried using a tolerance rather than == ?