I am writing a program which goes through FITS files with photometry and looks for stars given in a .dat file.
One of the steps is computing distances between two given stars using ephem.separation().
It works well. However, from time to time separation returns angles like 1389660529:33:00.8
import ephem
import math
star = ['21:45:15.00', '65:49:24.0']
first_coo = ['21:45:15.00', '65:49:24.0']
check = ephem.FixedBody()
check._ra = ephem.hours(star[0])
check._dec = ephem.degrees(star[1])
check.compute()
# star is a list with coordinates, strings in form %s:%s:%s
first = ephem.FixedBody()
first._ra = ephem.hours(first_coo[0])
first._dec = ephem.degrees(first_coo[1])
first.compute()
sep = math.degrees(float(ephem.separation(check,first)))
print(sep)
It occurs randomly. Has anybody encountered such behaviour?
I search for 18 stars in 212 files, which makes 3816 cycles. Might that have something to do with it?
UPDATE: I have released a new PyEphem 3.7.5.2 that fixes this special case of comparing an angle to itself.
Your code sample has two interesting features:
first, it contains a slight bug that I thought at first might be behind the problem;
and, second, I was wrong that your code was the problem because your code
does indeed expose a flaw in the separation() function
when it is asked how far a position is from itself!
The bug in your own code is that calling compute() and asking about .ra and .dec
returns those coordinates in the coordinate system of the very moment that you are
calling compute() for — so your two compute() calls are returning coordinates
in two different coordinate systems that are very slightly different,
and so the resulting positions cannot be meaningfully compared with separation()
because separation() requires two coordinates that are in the same coordinate system.
To fix this problem, choose a single moment, now, to use as your equinox epoch:
now = ephem.now()
...
check.compute(epoch=now)
...
first.compute(epoch=now)
That will give you coordinates that can be meaningfully compared.
Now, on to the problem in PyEphem itself!
When two copies of the same position are provided to separation(), it goes ahead and tries to find a distance between them anyway, and winds up doing a calculation that amounts to:
acos(sin(angle) ** 2.0 + cos(angle) ** 2.0)
which should remind us of the standard Euclidean distance formula but with an acos() around it instead of a sqrt(). The problem is that while in theory the value inside of the parens should always be 1.0, the rounding inside of IEEE floating point math does not always produce squares that sum to exactly 1.0. Instead, it sometimes produces a value that is a bit high or a bit low.
When the result is a bit below 1.0, separation() will return a very slight separation for the coordinates, even though they are actually “the same coordinate.”
When the result exceeds 1.0 by a little bit, separation() will return not-a-number (nan) on my system — and, I will bet, returns that huge return value that you are seeing printed out on yours — because cos() cannot, by definition, return a number greater than 1.0, so there is no answer acos() can return when asked “what angle returns this value that is greater than one?”
I have created a bug report in GitHub and will fix this for the next version of PyEphem:
https://github.com/brandon-rhodes/pyephem/issues/31
Meanwhile, your code will have to avoid calling separation() for two positions that are actually the same position — could you use an if statement with two == comparisons between ra and dec to detect such cases and just use the value 0.0 for the separation instead?
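The workaround suggested above can be sketched in plain Python. This is a minimal illustration, not PyEphem's actual implementation: the helper names safe_acos and angular_separation are hypothetical, the inputs are assumed to be in radians, and the formula is the spherical law of cosines.

```python
import math

def safe_acos(x):
    """acos() that tolerates arguments a rounding error outside [-1, 1]."""
    # Clamp first, so 1.0000000000000002 is treated as 1.0 instead of failing.
    return math.acos(max(-1.0, min(1.0, x)))

def angular_separation(ra1, dec1, ra2, dec2):
    """Angle in radians between two positions given in radians (hypothetical helper)."""
    # Short-circuit the degenerate case: a position compared with itself.
    if ra1 == ra2 and dec1 == dec2:
        return 0.0
    cos_sep = (math.sin(dec1) * math.sin(dec2)
               + math.cos(dec1) * math.cos(dec2) * math.cos(ra1 - ra2))
    return safe_acos(cos_sep)

# The star from the question: RA 21:45:15.00 (hours), Dec 65:49:24.0 (degrees).
ra = math.radians(15.0 * (21 + 45 / 60 + 15.0 / 3600))
dec = math.radians(65 + 49 / 60 + 24.0 / 3600)
print(angular_separation(ra, dec, ra, dec))
print(safe_acos(1.0000000000000002))
```

The equality check handles identical inputs, while the clamp protects against nearly identical inputs whose cosine rounds to just above 1.0.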
There are many mathematical programs out there out of which some are able to solve calculus-based problems, GeoGebra, Qalculate! to name a few.
How are those programs able to solve calculus-based problems which humans need to evaluate using a long procedure?
For example, the problem:
It takes a lot of steps for humans to solve this problem as shown here on Quora.
How can those mathematical programs solve them with such a good accuracy?
The Church-Turing thesis implies that anything a human being can calculate can be calculated by any Turing-equivalent system of computation - including programs running on computers. That is to say, if we can solve the problem (or calculate an approximate answer that meets some criteria) then a computer program can be made to do the same thing. Let's consider a simpler example:
f(x) = x
a = Integral(f, 0, 1)
A human being presented with this problem has two options:
try to compute the antiderivative using some procedure, then use procedures to evaluate the definite integral over the supplied range
use some numerical method to calculate an approximate value for the definite integral which meets some criteria for closeness to the true value
In either case, human beings have a set of tools that allow them to do this:
recognize that f(x) is a polynomial in x. There are rules for constructing the antiderivatives of polynomials. Specifically, each term ax^b in the polynomial can be converted to (a/(b+1))x^(b+1) and then an arbitrary constant c added to the end. We then say ∫f(x)dx = (1/2)x^2 + c. Now that we have the antiderivative, we have a procedure for computing the definite integral over a range: evaluate the antiderivative at the high value, then subtract from that the result of evaluating it at the low value (the constant c cancels). This gives ((1/2)1^2) - ((1/2)0^2) = 1/2 - 0 = 1/2.
decide that for our purposes a Riemann sum with dx=1/10 is sufficient and that we'll take the midpoint value. We get 10 rectangles with base 1/10 and heights 1/20, 3/20, 5/20, 7/20, 9/20, 11/20, 13/20, 15/20, 17/20 and 19/20, respectively. The areas are 1/200, 3/200, 5/200, 7/200, 9/200, 11/200, 13/200, 15/200, 17/200 and 19/200. The sum of these is (1+3+5+7+9+11+13+15+17+19)/200 = 100/200 = 1/2. We happened to get the exact answer since we used the midpoint value and evaluated the definite integral of a linear function; in general, we'd have been close but not exact.
The only difficulty is in adequately specifying the procedure human beings use to solve these problems in various ways. Once specified, computers are perfectly capable of doing them. And make no mistake, human beings have a procedure - conscious or subconscious - for doing these problems reliably.
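Both procedures described above can be written down as short Python functions. This is only a sketch of the idea, not how GeoGebra or Qalculate! actually work internally. The function names are made up for illustration, polynomials are assumed to be stored as coefficient lists (coeffs[b] is the coefficient of x^b), and Fraction is used so the arithmetic is exact.

```python
from fractions import Fraction

def antiderivative(coeffs):
    """Power rule: a*x^b becomes (a/(b+1))*x^(b+1); the constant c is taken as 0."""
    return [Fraction(0)] + [Fraction(a) / (b + 1) for b, a in enumerate(coeffs)]

def evaluate(coeffs, x):
    """Evaluate the polynomial with the given coefficients at x."""
    return sum(Fraction(a) * Fraction(x) ** b for b, a in enumerate(coeffs))

def integrate_exact(coeffs, low, high):
    """Definite integral via the antiderivative: F(high) - F(low)."""
    big_f = antiderivative(coeffs)
    return evaluate(big_f, high) - evaluate(big_f, low)

def midpoint_riemann(f, low, high, n):
    """Approximate the integral with n rectangles evaluated at their midpoints."""
    dx = Fraction(high - low, n)
    return sum(f(low + (k + Fraction(1, 2)) * dx) for k in range(n)) * dx

# f(x) = x on [0, 1], the example from the text.
print(integrate_exact([0, 1], 0, 1))               # symbolic route
print(midpoint_riemann(lambda x: x, 0, 1, 10))     # numerical route, dx = 1/10
```

Both calls print 1/2 for this linear function; for a general integrand the midpoint sum would only be close to the exact value.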
I am using postgis's ST_LineLocatePoint to find out the closest point on a LineString to the given Point, and using ST_LineInterpolatePoint to extract a Point from the returned float number.
ST_LineLocatePoint Query:
SELECT ST_AsText(ST_LineInterpolatePoint(foo.the_line,
ST_LineLocatePoint(foo.the_line,
ST_GeomFromText('POINT(12.962315 77.584841)')))) AS g
FROM (
SELECT ST_GeomFromText('LINESTRING(12.96145 77.58408,12.96219 77.58447,12.96302 77.58489,
12.96316 77.58496,12.96348 77.58511)') AS the_line
) AS foo;
Output:
g
------------------------------------------
POINT(12.9624389808159 77.5845959902924)
Which exactly lies on the linestring I have passed. Demonstration is displayed here.
But when I check whether this point lies on the linestring using ST_Contains, it always returns false, even though the point lies on it.
ST_Contains Query:
SELECT ST_Contains(ST_GeomFromText('LINESTRING(12.96145 77.58408,12.96219 77.58447,
12.96302 77.58489, 12.96316 77.58496, 12.96348 77.58511)'),
ST_GeomFromText('POINT(12.9624389808159 77.5845959902924)'));
Output
st_contains
-------------
f
I don't understand what I am doing wrong. Can anyone help me with this?
Postgresql : 9.4
postgis : 2.1
reference: ST_LineLocatePoint, ST_Contains
I am not getting where I am doing wrong.
I think you're doing fine... I had the same issue some time ago... I used ST_ClosestPoint to locate a point on a linestring and then tried to cut the linestring at that point, but I couldn't.
Following the documentation:
ST_ClosestPoint — Returns the 2-dimensional point on g1 that is
closest to g2. This is the first point of the shortest line.
So I got a situation where one function says "this point is on the line", and other functions say "ok, but I can't cut because your point is not on the line"... I was as confused as you are now...
In my case the solution was to draw another line which intersects the first line 'exactly' at the given point; after that the first line could be cut.
After some research I found the issue was the rounding of coordinates when they are computed and written out. I explained it to myself like this: by definition a line is infinitely thin and a point is infinitely small (they have no area), so they can easily miss each other. That is only my reasoning, though, and I'm not sure it is right. I advise you to use ST_Intersects with a very small ST_Buffer, or ST_DWithin with a very small distance.
To be sure that your point lies on a line, it has to be part of that line's definition (e.g. for LINESTRING(0 0, 5 5), the points (0 0) and (5 5)). An example with POINT(3 3) works because its coordinates are computed without any rounding.
This is actually a really common question (most likely a duplicate, but I'm too lazy to find it.)
The issue is related to numerical precision, where the Point is not exactly on the LineString, but is within a very small distance of it. Sort of like how SELECT sin(pi()) is not exactly zero.
Rather than using DE-9IM spatial predicates (like Contains, or Covers, etc.) which normally expect exact noding, it is more robust to use distance-based techniques like ST_DWithin with a small distance threshold. For example:
SELECT ST_Distance(the_point, the_line),
ST_Covers(the_line, the_point),
ST_DWithin(the_point, the_line, 1e-10)
FROM (
SELECT 'POINT(12.9624389808159 77.5845959902924)'::geometry AS the_point,
'LINESTRING(12.96145 77.58408,12.96219 77.58447,12.96302 77.58489,12.96316 77.58496,12.96348 77.58511)'::geometry AS the_line
) AS foo;
-[ RECORD 1 ]----------------------
st_distance | 1.58882185807825e-014
st_covers | f
st_dwithin | t
Here you can see that ST_DWithin indicates that the point is within a very small distance of the line, so it effectively contains the point.
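The distance-with-threshold idea can also be sketched outside the database. The Python below is an illustration of the geometry only, not of how PostGIS implements ST_DWithin; point_segment_distance and within_distance are hypothetical names, and the coordinates are the ones from the question.

```python
import math

def point_segment_distance(p, a, b):
    """Planar distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the parameter to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def within_distance(p, line, tolerance):
    """True if p is within tolerance of any segment of the polyline."""
    return any(point_segment_distance(p, a, b) <= tolerance
               for a, b in zip(line, line[1:]))

line = [(12.96145, 77.58408), (12.96219, 77.58447), (12.96302, 77.58489),
        (12.96316, 77.58496), (12.96348, 77.58511)]
point = (12.9624389808159, 77.5845959902924)

print(min(point_segment_distance(point, a, b) for a, b in zip(line, line[1:])))
print(within_distance(point, line, 1e-10))
```

The printed distance is tiny but generally not exactly zero, which is why an exact predicate can fail while the tolerance test succeeds. The threshold 1e-10 is taken from the query above; a suitable value depends on the units and precision of your data.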
ST_Contains() only returns true if the geometry to test lies within the supplied geometry. In your case the point has to lie within the linestring and this is always false since a linestring does not have an interior.
You should use the ST_Covers() function instead: true if no point of the geometry to test (your point) lies outside the supplied geometry (your linestring).
Consider the following example:
Bathymetry = [0,4134066;
3,3817906;
6,3343666;
9,2978725;
12,2742092;
14,2584337;
16,2415355;
18,2228054;
20,2040753;
23,1761373;
26,1514085];
Depth = [0;1;2;3;5;8;10;11.6;15];
newDepth = min(Bathymetry(:,1)):0.1:max(Bathymetry(:,1));
From this I want to find which column of 'newDepth' corresponds to 'Depth'. For example:
dd = find(newDepth==Depth(1))
dd =
1
Showing that Depth == 0, is located in the first column of newDepth. When I apply this to all of the entries of 'Depth'
for i = 1:length(Depth);
dd(i) = find(newDepth == Depth(i));
end
I receive an error:
Improper assignment with rectangular empty matrix.
Initially I couldn't understand why, but by looking at the array for newDepth, especially column 117 where newDepth == 11.6, I noticed that the value isn't equal to 11.6 but to 11.600000000000001, and is thus different from Depth(8). How can I fix this? And why does MATLAB not just write the value as 11.6? Nowhere have I specified that it should include the .000000000000001.
This is because there is no exact representation of 0.1 in binary. Read the wiki for more background. In binary, representing 0.1 is something like trying to write out all the decimals of one-third:
1/3 == 0.333333333333333333...
it will never be exact, no matter how many 3's you add.
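The same effect can be inspected in any language that uses IEEE double precision. As a small illustration in Python (not MATLAB), the standard library can show the exact value that is actually stored when you write 0.1:

```python
from decimal import Decimal
from fractions import Fraction

# The double closest to 0.1, written out exactly in decimal and as a ratio.
print(Decimal(0.1))
print(Fraction(0.1))

# The stored value is a fraction with a power-of-two denominator, not 1/10.
print(Fraction(0.1) == Fraction(1, 10))

# Small representation errors surface in ordinary arithmetic.
print(0.1 + 0.2 == 0.3)
```

Both comparisons print False: the stored value is slightly larger than one tenth, and the errors do not cancel when such values are added.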
For this reason (and many others), I'd suggest you do not use == (which is a very stringent demand), but rather use
for ii = 1:length(Depth);
[~,dd(ii)] = min( abs(newDepth-Depth(ii)) );
end
This problem is to do with floating point arithmetic, which is quite complicated. I recommend you google it and read a bit; there is plenty out there explaining it. Here is a good start: http://blogs.mathworks.com/loren/2006/08/23/a-glimpse-into-floating-point-accuracy/
To solve it for your case I would suggest rounding
newDepth = round(newDepth * 10) / 10
The 11.600000000000001 is because the number 11.6 is not exactly representable in binary floating point notation. This is to do with the way the hardware works rather than any limitation of Matlab.
You want to change your compare to something like
dd(i) = find(abs(newDepth - Depth(i))<.0000001);
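For comparison, here is a sketch of the nearest-value and tolerance approaches in Python. It is an analogue of the MATLAB snippets above rather than a translation: the grid is built as k * 0.1, which is an assumption and not necessarily how MATLAB's colon operator computes it, indices are 0-based, and nearest_index and tolerant_index are hypothetical helper names.

```python
import math

new_depth = [k * 0.1 for k in range(261)]     # roughly 0.0, 0.1, ..., 26.0
depth = [0, 1, 2, 3, 5, 8, 10, 11.6, 15]

def nearest_index(values, target):
    """Index of the entry closest to target (like min(abs(...)) in MATLAB)."""
    return min(range(len(values)), key=lambda i: abs(values[i] - target))

def tolerant_index(values, target, abs_tol=1e-9):
    """Index of the first entry within abs_tol of target, or None if there is none."""
    for i, v in enumerate(values):
        if math.isclose(v, target, rel_tol=0.0, abs_tol=abs_tol):
            return i
    return None

print(new_depth[116] == 11.6)                 # exact comparison misses the entry
print([nearest_index(new_depth, d) for d in depth])
print(tolerant_index(new_depth, 11.6))
```

The nearest-value search always returns something, even for a target that is not on the grid, while the tolerance search can report that no match exists. Which behaviour is preferable depends on whether a missing value should be treated as an error.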
My Question - part 1: What is the best way to test if a floating point number is an "integer" (in Matlab)?
My current solution for part 1: Obviously, isinteger is out, since this tests the type of an element, rather than the value, so currently, I solve the problem like this:
abs(round(X) - X) <= sqrt(eps(X))
But perhaps there is a more native Matlab method?
My Question - part 2: If my current solution really is the best way, then I was wondering if there is a general tolerance that is recommended? As you can see from above, I use sqrt(eps(X)), but I don't really have any good reason for this. Perhaps I should just use eps(X), or maybe 5 * eps(X)? Any suggestions would be most welcome.
An Example: In Matlab, sqrt(2)^2 == 2 returns False. But in practice, we might want that logical condition to return True. One can achieve this using the method described above, since sqrt(2)^2 actually equals 2 + eps(2) (i.e. well within the tolerance of sqrt(eps(2))). But does this mean I should always use eps(X) as my tolerance, or is there good reason to use a larger tolerance, such as 5 * eps(X), or sqrt(eps(X))?
UPDATE (2012-10-31): @FakeDIY pointed out that my question is partially a duplicate of this SO question (apologies, not sure how I missed it in my initial search). Given this I'd like to emphasize the "tolerance" part of the question (which is not covered in that link), i.e. is eps(X) a sensible tolerance, or should I use something larger, like 5 * eps(X), and if so, why?
UPDATE (2012-11-01): Thanks everyone for the responses. I've +1'ed all three answers as I feel they all contribute meaningfully to various aspects of the question. I'm giving the answer tick to Eric Postpischil as that answer really nailed the tolerance part of the question well (and it has the most upvotes at this point in time).
No, there is no general tolerance that is recommended, and there cannot be.
The difference between a computed result and a mathematically ideal result is a function of the operations that produced the computed result. Because those operations are specific to each application, there is no general rule for testing any property of a computed result.
To design a proper test, you must determine what errors may have occurred during computation, determine bounds on the resulting error in the computed result, and test whether the computed result differs from the ideal result (perhaps the nearest integer) by less than those bounds. You must also decide whether those bounds are sufficiently small to satisfy your application’s requirements. (Using a relaxed test that accepts as an integer something that is not an integer decreases false negatives [incorrect rejections of a result as an integer where the ideal result would be an integer] but increases false positives [incorrect acceptances of a result as an integer where the ideal result would not be an integer].)
(Note that it can even be the case that testing as if the error bounds were zero can produce false positives: It is possible a computation produces a result that is exactly an integer when the ideal result is not an integer, so any error tolerance, even zero, will falsely report this result is an integer. If this is unacceptable for your application, then, in such a case, the computations must be redesigned.)
Not only is it impossible to state, without specific knowledge of the application, a numerical tolerance that may be used; it is also impossible to state whether the tolerance should be absolute, should be relative to the computed value or to a target value, should be measured in ULPs (units of least precision), or should be set in some other manner. This is because errors may be introduced into computations in a variety of ways. For example, if there is a small relative error in a, and a and b are close in value, then a-b has a large relative error. Additionally, if c is large, then (a-b)*c has a large absolute error.
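To make the ULP-based option concrete, here is a small Python sketch (Python 3.9 or later, for math.ulp, which plays the role of MATLAB's eps(X)). The function name is_near_integer and its default of one ULP are arbitrary choices for illustration; as argued above, the right bound has to come from an error analysis of your own computation.

```python
import math

def is_near_integer(x, ulps=1):
    """True if x is within `ulps` units in the last place of the nearest integer."""
    return abs(x - round(x)) <= ulps * math.ulp(x)

x = math.sqrt(2) ** 2
print(x == 2)                                    # exact test rejects it
print(is_near_integer(x))                        # one-ULP test accepts it
print(is_near_integer(15.0000000001, ulps=4))    # 1e-10 away is far more than 4 ULPs
```

A tolerance of a few ULPs only suits values that come from a handful of well-conditioned operations. After a subtraction of nearly equal numbers, for instance, the error can be many orders of magnitude larger than one ULP of the result.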
It's probably not the most efficient method but I would use mod for this:
a = 15.0000000000;
b = mod(a,1.0)
c = 15.0000000001;
d = mod(c,1.0)
returns b = 0 and d = 1.0000e-010
There are a number of other alternatives suggested here:
How do I test for integers in MATLAB?
I like the idea of comparing (x == floor(x)) too.
1) I have historically used your method with a simple tolerance, eps(X). The mod methods interested me though, so I benchmarked a couple using Steve Eddins timeit function.
f = @() abs(X - round(X)) <= eps(X);
g = @() X == round(X);
h = @() ~mod(X,1);
For single values, like X=1.0, yours appears to be fastest:
timeit(f) = 7.3635e-006
timeit(g) = 9.9677e-006
timeit(h) = 9.9214e-006
For vectors though, like X = 1:0.01:100, the other methods are faster (though round still beats mod):
timeit(f) = 0.00076636
timeit(g) = 0.00028182
timeit(h) = 0.00040539
2) The error bound is really problem dependent. Other answers cover this much better than I am able to.
I have two arrays of data that I'm trying to amalgamate. One contains actual latencies from an experiment in the first column (e.g. 0.345, 0.455... never more than 3 decimal places), along with other data from that experiment. The other contains what is effectively a 'look up' list of latencies ranging from 0.001 to 0.500 in 0.001 increments, along with other pieces of data. Both data sets are X-by-Y doubles.
What I'm trying to do is something like...
for i = 1:length(actual_latency)
row = find(predicted_data(:,1) == actual_latency(i))
full_set(i,1:4) = [actual_latency(i) other_info(i) predicted_info(row,2) ...
predicted_info(row,3)];
end
...in order to find the relevant row in predicted_data where the look up latency corresponds to the actual latency. I then use this to create an amalgamated data set, full_set.
I figured this would be really simple, but the find function keeps failing by throwing up an empty matrix when looking for an actual latency that I know is in predicted_data(:,1) (as I've double-checked during debugging).
Moreover, if I replace find with a for loop to do the same job, I get a similar error. It doesn't appear to be systematic - using different participant data sets throws it up in different places.
Furthermore, during debugging mode, if I use find to try and find a hard-coded value of actual_latency, it doesn't always work. Sometimes yes, sometimes no.
I'm really scratching my head over this, so if anyone has any ideas about what might be going on, I'd be really grateful.
You are likely running into a problem with floating point comparisons when you do the following:
predicted_data(:,1) == actual_latency(i)
Even though your numbers appear to only have three decimal places of precision, they may still differ by very small amounts that are not being displayed, thus giving you an empty matrix since FIND can't get an exact match.
One feature of floating point numbers is that certain numbers can't be exactly represented, since they can't be written as a finite sum of integer powers of 2. This occurs with the numbers 0.1 and 0.001. If you repeatedly add or multiply one of these numbers you can see some unexpected behavior. Amro pointed out one example in his comment: 0.3 is not exactly equal to 3*0.1. This can also be illustrated by creating your look-up list of latencies in two different ways. You can use the normal colon syntax:
vec1 = 0.001:0.001:0.5;
Or you can use LINSPACE:
vec2 = linspace(0.001,0.5,500);
You'd think these two vectors would be equal to one another, but think again!:
>> isequal(vec1,vec2)
ans =
0 %# FALSE!
This is because the two methods create the vectors by performing successive additions or multiplications of 0.001 in different ways, giving ever so slightly different values for some entries in the vector. You can take a look at this technical solution for more details.
When comparing floating point numbers, you should therefore do your comparisons using some tolerance. For example, this finds the indices of entries in the look-up list that are within 0.0001 of your actual latency:
tolerance = 0.0001;
for i = 1:length(actual_latency)
row = find(abs(predicted_data(:,1) - actual_latency(i)) < tolerance);
...
The topic of floating point comparison is also covered in this related question.
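Since the latencies in the question never have more than 3 decimal places, another option is to avoid floating point comparison altogether by converting each latency to an integer number of milliseconds before matching. The Python sketch below illustrates that idea. It is a different technique from the tolerance loop above, and the names to_millis, index_by_ms and find_row are hypothetical.

```python
def to_millis(latency):
    """Round a latency in seconds to an integer number of milliseconds."""
    return round(latency * 1000)

# Look-up latencies 0.001, 0.002, ..., 0.500, built by multiplication.
predicted = [k * 0.001 for k in range(1, 501)]

# Map integer milliseconds to the row (0-based) in the look-up list.
index_by_ms = {to_millis(v): row for row, v in enumerate(predicted)}

def find_row(actual_latency):
    """Row of the look-up entry matching actual_latency, or None if absent."""
    return index_by_ms.get(to_millis(actual_latency))

print(3 * 0.1 == 0.3)      # the exact comparison mentioned above fails
print(find_row(0.345))     # integer keys match regardless of tiny rounding errors
print(find_row(0.7))       # outside the table
```

Integer keys compare exactly, so the match no longer depends on how each list was generated. This only works because the data has a known, fixed resolution; for arbitrary real values a tolerance is still needed.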
You may try to do the following:
row = find(abs(predicted_data(:,1) - actual_latency(i)) < eps)
EPS is the floating-point relative accuracy.
Have you tried using a tolerance rather than == ?