Postgresql earthdistance - earth_box with radius - postgresql

Can you please explain this behaviour of the earth_box function, or tell me what I'm doing wrong?
data used
40.749276, -73.985643 = Empire State Building - is in my table
40.689266, -74.044512 = Statue of Liberty - my current position in the select - 8324 m from the Empire State Building
my table
=> select id, latitude, longitude, title from requests;
id | latitude | longitude | title
----+-----------+------------+-----------------------
1 | 40.749276 | -73.985643 | Empire State Building
distance from Empire State Building to Statue of Liberty
=> SELECT id, latitude, longitude, title, earth_distance(ll_to_earth(40.689266, -74.044512), ll_to_earth(latitude, longitude)) as distance_from_current_location FROM requests ORDER BY distance_from_current_location ASC;
id | latitude | longitude | title | distance_from_current_location
----+-----------+------------+-----------------------+--------------------------------
1 | 40.749276 | -73.985643 | Empire State Building | 8324.42998846164
My current position is the Statue of Liberty, which is more than 8000 m from the Empire State Building, but
the select returns the row with id 1 even when the radius is only 5558 m! Can you explain this behaviour, or tell me what is wrong?
=> SELECT id,latitude,longitude,title FROM requests WHERE earth_box(ll_to_earth(40.689266, -74.044512), 5558) @> ll_to_earth(requests.latitude, requests.longitude);
id | latitude | longitude | title
----+-----------+------------+-----------------------
1 | 40.749276 | -73.985643 | Empire State Building
versions of extensions and postgresql
=> \dx
List of installed extensions
Name | Version | Schema | Description
---------------+---------+------------+--------------------------------------------------------------
cube | 1.0 | public | data type for multidimensional cubes
earthdistance | 1.0 | public | calculate great-circle distances on the surface of the Earth
plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
=> select version();
version
--------------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 9.4beta2 on x86_64-apple-darwin13.3.0, compiled by Apple LLVM version 5.1 (clang-503.0.40) (based on LLVM 3.4svn), 64-bit
thank you
noe

The problem here is that earth_box gets Statute Miles.
8324.42998846164 meters are near 5.172560986623845 statute miles
Unit Converter
The solution: convert the radius into Statute Miles units
earth_box(ll_to_earth(40.689266, -74.044512), 5558/1.609) //doesn't return results
earth_box(ll_to_earth(40.689266, -74.044512), 9000/1.609) //does.

As per the doc, by default, the radius is expressed in meters.
I just tested this, and it looks from the documentation that you need to use both earth_box and earth_distance together in your WHERE clause to get correct results.
Also in doc in earth_box function description it says:
Some points in this box are further than the specified great circle
distance from the location, so a second check using earth_distance
should be included in the query.
So the following will return results
SELECT earth_distance(ll_to_earth(40.749276, -73.985643), ll_to_earth(40.689266,-74.044512)) distance
FROM (SELECT 1) test
WHERE
(earth_box(ll_to_earth(40.749276, -73.985643), 9000) @> ll_to_earth(40.689266,-74.044512))
AND earth_distance(ll_to_earth(40.749276, -73.985643), ll_to_earth(40.689266,-74.044512)) <= 9000
order by distance desc
but this won't as the actual distance is about 8324.429988461638 meters
SELECT earth_distance(ll_to_earth(40.749276, -73.985643), ll_to_earth(40.689266,-74.044512)) distance
FROM (SELECT 1) test
WHERE
(earth_box(ll_to_earth(40.749276, -73.985643), 6000) @> ll_to_earth(40.689266,-74.044512))
AND earth_distance(ll_to_earth(40.749276, -73.985643), ll_to_earth(40.689266,-74.044512)) <= 6000
order by distance desc
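Why the box check alone can match a point that lies beyond the radius can be sketched outside the database. The sketch below is only an approximation of what earthdistance does: it assumes that ll_to_earth places a point on a sphere of radius 6378168 m and that earth_box is an axis-aligned cube around that point whose half-side is the chord length for the given radius. The Python helper names are made up for illustration.

```python
import math

EARTH_RADIUS_M = 6378168.0  # assumed to match earthdistance's earth()

def ll_to_xyz(lat_deg, lon_deg):
    """Latitude/longitude in degrees -> 3D point on the sphere, in meters."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (
        EARTH_RADIUS_M * math.cos(lat) * math.cos(lon),
        EARTH_RADIUS_M * math.cos(lat) * math.sin(lon),
        EARTH_RADIUS_M * math.sin(lat),
    )

def chord_length(great_circle_m):
    """Straight-line distance through the sphere for a given surface distance."""
    return 2 * EARTH_RADIUS_M * math.sin(great_circle_m / (2 * EARTH_RADIUS_M))

def great_circle_distance(a, b):
    """Surface distance between two 3D points on the sphere."""
    chord = math.dist(a, b)
    return 2 * EARTH_RADIUS_M * math.asin(chord / (2 * EARTH_RADIUS_M))

def in_box(center, point, radius_m):
    """Cube test: every coordinate may differ by at most the chord length."""
    half_side = chord_length(radius_m)
    return all(abs(c - p) <= half_side for c, p in zip(center, point))

statue = ll_to_xyz(40.689266, -74.044512)
empire = ll_to_xyz(40.749276, -73.985643)

per_axis = [abs(s - e) for s, e in zip(statue, empire)]
distance = great_circle_distance(statue, empire)

print("per-axis offsets (m):", [round(d) for d in per_axis])
print("great-circle distance (m):", round(distance, 2))
print("inside 6000 m box:", in_box(statue, empire, 6000))
print("within 6000 m:", distance <= 6000)
```

No single axis offset exceeds the half-side of the cube, so the box test passes, while the combined distance is larger than the radius. That is consistent with the documentation's advice to add earth_distance as a second check.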

Related

Polygon intersection fails after an st_wrapx operation

I have a query that conditionally uses st_wrapx followed by an intersect. I've observed that the intersect fails unexpectedly even when the 'wrapped' polygon is identical to the 'non wrapped' version.
The following snippet illustrates the issue:
with
input(geom) as (
select st_polyfromtext('POLYGON((-71. 42.,-72. 43.,-71. 43.,-71. 42.))')
),
wrap(geom,wrapped_geom) as (
select geom,
st_wrapx(st_shiftlongitude(geom), 180, -360)
from input
)
select
st_asewkt(geom),
st_asewkt(wrapped_geom),
st_asewkt(geom) = st_asewkt(wrapped_geom) wkt_are_equal,
st_asewkb(geom) = st_asewkb(wrapped_geom) wkb_are_equal,
st_intersects(geom,wrapped_geom) intersects,
st_intersects(st_asewkt(geom)::geometry,st_asewkt(wrapped_geom)::geometry) recast_intersects
from wrap;
With the following results:
-[ RECORD 1 ]-----+---------------------------------------
st_asewkt | POLYGON((-71 42,-72 43,-71 43,-71 42))
st_asewkt | POLYGON((-71 42,-72 43,-71 43,-71 42))
wkt_are_equal | t
wkb_are_equal | t
intersects | f
recast_intersects | t
note that:
the WKB and WKT for the two geometries are identical
the intersect operation returns false
after round-tripping through WKT, the intersect returns true
This is the version I have:
POSTGIS="2.5.5" [EXTENSION] PGSQL="110" GEOS="3.7.3-CAPI-1.11.3 b50468f" PROJ="Rel. 5.2.0, September 15th, 2018" GDAL="GDAL 2.4.4, released 2020/01/08" LIBXML="2.9.11" LIBJSON="0.13.1" LIBPROTOBUF="1.3.3" RASTER

PostGIS returns record as datatype. This is unexpected

I have this query
WITH buffered AS (
SELECT
ST_Buffer(geom , 10, 'endcap=round join=round') AS geom,
id
FROM line),
hexagons AS (
SELECT
ST_HexagonGrid(10, buffered.geom) AS hex,
buffered.id
FROM buffered
) SELECT * FROM hexagons;
This gives the datatype record in the column hex. This is unexpected. I expect geometry as a datatype. Why is that?
According to the documentation, the function ST_HexagonGrid returns a setof record. These records, however, contain a geometry attribute called geom, so to access the geometry of the record you have to wrap the column in parentheses () and reference the attribute with a dot ., e.g.
SELECT (hex).geom FROM hexagons;
or just fetch all attributes using * (in this case, i, j and geom):
SELECT (hex).* FROM hexagons;
Demo (PostGIS 3.1):
WITH j (hex) AS (
SELECT
ST_HexagonGrid(
10,ST_Buffer('LINESTRING(-105.55 41.11,-115.48 37.16,-109.29 29.38,-98.34 27.13)',1))
)
SELECT ST_AsText((hex).geom,2) FROM j;
st_astext
----------------------------------------------------------------------------------------
POLYGON((-130 34.64,-125 25.98,-115 25.98,-110 34.64,-115 43.3,-125 43.3,-130 34.64))
POLYGON((-115 25.98,-110 17.32,-100 17.32,-95 25.98,-100 34.64,-110 34.64,-115 25.98))
POLYGON((-115 43.3,-110 34.64,-100 34.64,-95 43.3,-100 51.96,-110 51.96,-115 43.3))
POLYGON((-100 34.64,-95 25.98,-85 25.98,-80 34.64,-85 43.3,-95 43.3,-100 34.64))
As ST_HexagonGrid returns a setof record, you can access the record attributes using a LATERAL join, or just call the function in the FROM clause:
SELECT i,j,ST_AsText(geom,2) FROM
ST_HexagonGrid(
10,ST_Buffer('LINESTRING(-105.55 41.11,-115.48 37.16,-109.29 29.38,-98.34 27.13)',1));
i | j | st_astext
----+---+----------------------------------------------------------------------------------------
-8 | 2 | POLYGON((-130 34.64,-125 25.98,-115 25.98,-110 34.64,-115 43.3,-125 43.3,-130 34.64))
-7 | 1 | POLYGON((-115 25.98,-110 17.32,-100 17.32,-95 25.98,-100 34.64,-110 34.64,-115 25.98))
-7 | 2 | POLYGON((-115 43.3,-110 34.64,-100 34.64,-95 43.3,-100 51.96,-110 51.96,-115 43.3))
-6 | 2 | POLYGON((-100 34.64,-95 25.98,-85 25.98,-80 34.64,-85 43.3,-95 43.3,-100 34.64))
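How the i and j attributes relate to each geom can be sketched in a few lines. The layout below is inferred only from the demo output above (size 10, flat-topped hexagons, odd columns shifted by half a cell); it is not taken from the PostGIS source, and hexagon_cell is a hypothetical helper name.

```python
import math

def hexagon_cell(i, j, size):
    """Vertices of the flat-topped hexagon at grid index (i, j).

    Layout inferred from the demo output, not from the PostGIS implementation.
    """
    height = size * math.sqrt(3)
    cx = 1.5 * size * i
    cy = height * (j + 0.5 * (i % 2))  # odd columns sit half a cell higher
    ring = [
        (cx - size, cy),
        (cx - size / 2, cy - height / 2),
        (cx + size / 2, cy - height / 2),
        (cx + size, cy),
        (cx + size / 2, cy + height / 2),
        (cx - size / 2, cy + height / 2),
        (cx - size, cy),  # close the ring
    ]
    return [(round(x, 2), round(y, 2)) for x, y in ring]

for i, j in [(-8, 2), (-7, 1), (-7, 2), (-6, 2)]:
    print(i, j, hexagon_cell(i, j, 10))
```

For the four (i, j) pairs in the result set, this reproduces the coordinates shown in the st_astext column.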
Further reading: How to divide world into cells (grid)

Creating polygon geometry from text field the same table in PostGiS

I have a table like this
Table "public.zone_polygons"
Column | Type |
-----------+-------------------------+
id | integer |
zone_id | integer |
zone_name | text |
zone_path | text |
geom | geometry(Geometry,4326) |
Each zone_path has a list of lat longs as text in this format
75.2323 30.7423,
75.3432 30.5344,
75.5423 30.2342,
75.9123 30.3122,
75.2323 30.7423
I am trying to generate a geometry using the zone_path values using the below query.
update zone_polygons set geom=ST_SetSRID(ST_MakePolygon(ST_GeomFromText('LINESTRING(zone_path)')), 4326);
I get the below error
ERROR: parse error - invalid geometry
HINT: "LINESTRING(zo" <-- parse error at position 13 within geometry
Is there a way in PostGIS to use one of the fields to create the geometry?
I believe you have a typo and the coordinates are in Long-Lat order (India), not Lat-Long (middle of the Barents Sea). PostGIS expects coordinates as Long-Lat, so if the input list is indeed in Lat-Long, it would need to be swapped. You can either fix the source or use ST_FlipCoordinates.
Since the coordinates are saved in a column, you would need to concatenate the LINESTRING( and the column content (not name) using 'LINESTRING(' || zone_path || ')'
with src as (select '75.2323 30.7423, 75.3432 30.5344, 75.5423 30.2342, 75.9123 30.3122, 75.2323 30.7423' zone_path)
SELECT ST_ASTEXT(
ST_SetSRID(
ST_MakePolygon(
ST_GeomFromText('LINESTRING(' || zone_path || ')')), 4326))
FROM src;
--> POLYGON((75.2323 30.7423,75.3432 30.5344,75.5423 30.2342,75.9123 30.3122,75.2323 30.7423))
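The same concatenation can be prototyped outside the database, which also makes it easy to check that the ring is closed before running the update. This is a minimal sketch; zone_path_to_polygon_wkt and its flip parameter are hypothetical names, and the coordinates are kept as strings so the text is not reformatted.

```python
def zone_path_to_polygon_wkt(zone_path, flip=False):
    """Build a POLYGON WKT string from 'x y, x y, ...' text.

    flip=True swaps each pair, for input that was stored as 'lat lon'.
    """
    points = []
    for pair in zone_path.split(","):
        first, second = pair.split()
        points.append((second, first) if flip else (first, second))
    # A polygon ring needs at least 4 points and must end where it starts.
    if len(points) < 4 or points[0] != points[-1]:
        raise ValueError("ring must have at least 4 points and be closed")
    return "POLYGON((" + ",".join(f"{x} {y}" for x, y in points) + "))"

path = "75.2323 30.7423, 75.3432 30.5344, 75.5423 30.2342, 75.9123 30.3122, 75.2323 30.7423"
print(zone_path_to_polygon_wkt(path))
```

A string built this way could be passed to ST_GeomFromText with the SRID as the second argument.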

Sphinx search with filtering by coordinates

I have this query:
select id, post_category_name , title, description,WEIGHT(),
geodist(50.95, 24.69, latitude, longitude) dist
from serv1 where match('@(title,description) searchText') and dist < 2000000000000;
In my DB the post has latitude 50.85 and longitude 24.69.
The result gives a distance of 893641, but the real distance is 11119.49 meters.
I also tried converting the input coordinates to radians, but the distance is still not correct.
What am I doing wrong? Thank you in advance.
Try
geodist(50.95, 24.69, latitude, longitude, {in=deg}) dist
(note {in=deg})
It returns the number close to the one you're expecting:
mysql> select geodist(50.95, 24.69, 50.85, 24.69, {in=deg});
+-----------------------------------------------+
| geodist(50.95, 24.69, 50.85, 24.69, {in=deg}) |
+-----------------------------------------------+
| 11124.928711 |
+-----------------------------------------------+
1 row in set (0.00 sec)
Based on the code in one of your other questions, I would do something like
sql_query = select p.id, ... , \
RADIANS(l.Latitude) as latitude, RADIANS(l.Longitude) as longitude FROM ...
using the MySQL function to convert the stored degree value to radians for the attribute.
The...
sql_attr_float = latitude
sql_attr_float = longitude
would be unchanged.
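The effect of mixing up degrees and radians can be checked with a plain haversine calculation. This is a sketch under assumptions: it uses a mean Earth radius of 6371000 m and the standard haversine formula, which is not necessarily what Sphinx uses internally, so the wrong value is not expected to match the 893641 reported in the question.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; an assumption, not Sphinx's constant

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters; all inputs must be in radians."""
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

coords_deg = (50.95, 24.69, 50.85, 24.69)

# Correct: convert the degree values to radians first.
correct = haversine_m(*(math.radians(v) for v in coords_deg))
# Wrong: degree values passed as if they were already radians.
wrong = haversine_m(*coords_deg)

print("converted to radians:", round(correct, 2))
print("degrees treated as radians:", round(wrong, 2))
```

Both the query arguments and the stored attributes have to use the same unit, which is why either {in=deg} or converting the attributes with RADIANS() fixes the result.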

EXISTS(select 1 from t1) vs EXISTS(select * from t1) [duplicate]

I used to write my EXISTS checks like this:
IF EXISTS (SELECT * FROM TABLE WHERE Columns=@Filters)
BEGIN
UPDATE TABLE SET ColumnsX=ValuesX WHERE Columns=@Filters
END
One of the DBAs in a previous life told me that when I write an EXISTS clause, I should use SELECT 1 instead of SELECT *
IF EXISTS (SELECT 1 FROM TABLE WHERE Columns=@Filters)
BEGIN
UPDATE TABLE SET ColumnsX=ValuesX WHERE Columns=@Filters
END
Does this really make a difference?
No, SQL Server is smart and knows it is being used for an EXISTS, and returns NO DATA to the system.
Quoth Microsoft:
http://technet.microsoft.com/en-us/library/ms189259.aspx?ppud=4
The select list of a subquery
introduced by EXISTS almost always
consists of an asterisk (*). There is
no reason to list column names because
you are just testing whether rows that
meet the conditions specified in the
subquery exist.
To check yourself, try running the following:
SELECT whatever
FROM yourtable
WHERE EXISTS( SELECT 1/0
FROM someothertable
WHERE a_valid_clause )
If it was actually doing something with the SELECT list, it would throw a div by zero error. It doesn't.
EDIT: Note, the SQL Standard actually talks about this.
ANSI SQL 1992 Standard, pg 191 http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt
3) Case:
a) If the <select list> "*" is simply contained in a <subquery> that
is immediately contained in an <exists predicate>, then the <select list> is
equivalent to a <value expression>
that is an arbitrary <literal>.
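The logical equivalence described by the standard can be observed with any SQL engine at hand. The sketch below uses Python's built-in sqlite3 module purely for convenience; SQLite is not SQL Server, so it only illustrates that the select list does not change the EXISTS result and says nothing about SQL Server's compilation cost. The table and helper names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (id INTEGER PRIMARY KEY, payload TEXT)")
conn.execute("INSERT INTO t1 (payload) VALUES ('a'), ('b')")

def exists_with(select_list, where):
    """Return the EXISTS result (1 or 0) for the given subquery select list."""
    sql = f"SELECT EXISTS (SELECT {select_list} FROM t1 WHERE {where})"
    return conn.execute(sql).fetchone()[0]

# One predicate that matches a row and one that matches nothing.
for where in ("id = 1", "id = 99"):
    print(where, [exists_with(s, where) for s in ("*", "1", "payload")])
```

For each predicate, the three select lists give the same answer, since only the presence of a row matters.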
The reason for this misconception is presumably because of the belief that it will end up reading all columns. It is easy to see that this is not the case.
CREATE TABLE T
(
X INT PRIMARY KEY,
Y INT,
Z CHAR(8000)
)
CREATE NONCLUSTERED INDEX NarrowIndex ON T(Y)
IF EXISTS (SELECT * FROM T)
PRINT 'Y'
Gives plan
This shows that SQL Server was able to use the narrowest index available to check the result despite the fact that the index does not include all columns. The index access is under a semi join operator which means that it can stop scanning as soon as the first row is returned.
So it is clear the above belief is wrong.
However Conor Cunningham from the Query Optimiser team explains here that he typically uses SELECT 1 in this case as it can make a minor performance difference in the compilation of the query.
The QP will take and expand all *'s
early in the pipeline and bind them to
objects (in this case, the list of
columns). It will then remove
unneeded columns due to the nature of
the query.
So for a simple EXISTS subquery like
this:
SELECT col1 FROM MyTable WHERE EXISTS (SELECT * FROM Table2 WHERE MyTable.col1=Table2.col2)
The * will be
expanded to some potentially big
column list and then it will be
determined that the semantics of the
EXISTS does not require any of those
columns, so basically all of them can
be removed.
"SELECT 1" will avoid having to
examine any unneeded metadata for that
table during query compilation.
However, at runtime the two forms of
the query will be identical and will
have identical runtimes.
I tested four possible ways of expressing this query on an empty table with various numbers of columns. SELECT 1 vs SELECT * vs SELECT Primary_Key vs SELECT Other_Not_Null_Column.
I ran the queries in a loop using OPTION (RECOMPILE) and measured the average number of executions per second. Results below
+-------------+----------+---------+---------+--------------+
| Num of Cols | * | 1 | PK | Not Null col |
+-------------+----------+---------+---------+--------------+
| 2 | 2043.5 | 2043.25 | 2073.5 | 2067.5 |
| 4 | 2038.75 | 2041.25 | 2067.5 | 2067.5 |
| 8 | 2015.75 | 2017 | 2059.75 | 2059 |
| 16 | 2005.75 | 2005.25 | 2025.25 | 2035.75 |
| 32 | 1963.25 | 1967.25 | 2001.25 | 1992.75 |
| 64 | 1903 | 1904 | 1936.25 | 1939.75 |
| 128 | 1778.75 | 1779.75 | 1799 | 1806.75 |
| 256 | 1530.75 | 1526.5 | 1542.75 | 1541.25 |
| 512 | 1195 | 1189.75 | 1203.75 | 1198.5 |
| 1024 | 694.75 | 697 | 699 | 699.25 |
+-------------+----------+---------+---------+--------------+
| Total | 17169.25 | 17171 | 17408 | 17408 |
+-------------+----------+---------+---------+--------------+
As can be seen there is no consistent winner between SELECT 1 and SELECT * and the difference between the two approaches is negligible. The SELECT Not Null col and SELECT PK do appear slightly faster though.
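To put numbers on "negligible" and "slightly faster", the relative differences can be computed directly from the table above. The figures are copied from that table; the pct_faster helper is a made-up name.

```python
# Executions per second with OPTION (RECOMPILE), copied from the table above.
# Fields: number of table columns, SELECT *, SELECT 1, SELECT PK, SELECT not-null column.
RESULTS = [
    (2, 2043.5, 2043.25, 2073.5, 2067.5),
    (4, 2038.75, 2041.25, 2067.5, 2067.5),
    (8, 2015.75, 2017, 2059.75, 2059),
    (16, 2005.75, 2005.25, 2025.25, 2035.75),
    (32, 1963.25, 1967.25, 2001.25, 1992.75),
    (64, 1903, 1904, 1936.25, 1939.75),
    (128, 1778.75, 1779.75, 1799, 1806.75),
    (256, 1530.75, 1526.5, 1542.75, 1541.25),
    (512, 1195, 1189.75, 1203.75, 1198.5),
    (1024, 694.75, 697, 699, 699.25),
]

def pct_faster(candidate, baseline):
    """Relative difference of candidate over baseline, in percent."""
    return 100.0 * (candidate - baseline) / baseline

totals = [sum(row[k] for row in RESULTS) for k in range(1, 5)]
star_total, one_total, pk_total, notnull_total = totals

print("totals:", totals)
print("SELECT 1 vs *: %.3f%%" % pct_faster(one_total, star_total))
print("SELECT PK vs *: %.2f%%" % pct_faster(pk_total, star_total))

# Per-row advantage of SELECT PK over SELECT * for the smallest and largest table.
pk_gain_first = pct_faster(RESULTS[0][3], RESULTS[0][1])
pk_gain_last = pct_faster(RESULTS[-1][3], RESULTS[-1][1])
print("PK vs * at 2 cols: %.2f%%, at 1024 cols: %.2f%%" % (pk_gain_first, pk_gain_last))
```

The totals match the Total row of the table, and the gap between SELECT 1 and SELECT * is a small fraction of the gap to the PK and not-null variants.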
All four of the queries degrade in performance as the number of columns in the table increases.
As the table is empty this relationship does seem only explicable by the amount of column metadata. For COUNT(1) it is easy to see that this gets rewritten to COUNT(*) at some point in the process from the below.
SET SHOWPLAN_TEXT ON;
GO
SELECT COUNT(1)
FROM master..spt_values
Which gives the following plan
|--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1004],0)))
|--Stream Aggregate(DEFINE:([Expr1004]=Count(*)))
|--Index Scan(OBJECT:([master].[dbo].[spt_values].[ix2_spt_values_nu_nc]))
Attaching a debugger to the SQL Server process and randomly breaking whilst executing the below
DECLARE @V int
WHILE (1=1)
SELECT @V=1 WHERE EXISTS (SELECT 1 FROM ##T) OPTION(RECOMPILE)
I found that in the cases where the table has 1,024 columns, most of the time the call stack looks something like the below, indicating that it is indeed spending a large proportion of the time loading column metadata even when SELECT 1 is used. (For the case where the table has 1 column, randomly breaking didn't hit this bit of the call stack in 10 attempts.)
sqlservr.exe!CMEDAccess::GetProxyBaseIntnl() - 0x1e2c79 bytes
sqlservr.exe!CMEDProxyRelation::GetColumn() + 0x57 bytes
sqlservr.exe!CAlgTableMetadata::LoadColumns() + 0x256 bytes
sqlservr.exe!CAlgTableMetadata::Bind() + 0x15c bytes
sqlservr.exe!CRelOp_Get::BindTree() + 0x98 bytes
sqlservr.exe!COptExpr::BindTree() + 0x58 bytes
sqlservr.exe!CRelOp_FromList::BindTree() + 0x5c bytes
sqlservr.exe!COptExpr::BindTree() + 0x58 bytes
sqlservr.exe!CRelOp_QuerySpec::BindTree() + 0xbe bytes
sqlservr.exe!COptExpr::BindTree() + 0x58 bytes
sqlservr.exe!CScaOp_Exists::BindScalarTree() + 0x72 bytes
... Lines omitted ...
msvcr80.dll!_threadstartex(void * ptd=0x0031d888) Line 326 + 0x5 bytes C
kernel32.dll!_BaseThreadStart@8() + 0x37 bytes
This manual profiling attempt is backed up by the VS 2012 code profiler which shows a very different selection of functions consuming the compilation time for the two cases (Top 15 Functions 1024 columns vs Top 15 Functions 1 column).
Both the SELECT 1 and SELECT * versions wind up checking column permissions and fail if the user is not granted access to all columns in the table.
An example I cribbed from a conversation on the heap
CREATE USER blat WITHOUT LOGIN;
GO
CREATE TABLE dbo.T
(
X INT PRIMARY KEY,
Y INT,
Z CHAR(8000)
)
GO
GRANT SELECT ON dbo.T TO blat;
DENY SELECT ON dbo.T(Z) TO blat;
GO
EXECUTE AS USER = 'blat';
GO
SELECT 1
WHERE EXISTS (SELECT 1
FROM T);
/* ↑↑↑↑
Fails unexpectedly with
The SELECT permission was denied on the column 'Z' of the
object 'T', database 'tempdb', schema 'dbo'.*/
GO
REVERT;
DROP USER blat
DROP TABLE T
So one might speculate that the minor apparent difference when using SELECT some_not_null_col is that it only winds up checking permissions on that specific column (though still loads the metadata for all). However this doesn't seem to fit with the facts as the percentage difference between the two approaches if anything gets smaller as the number of columns in the underlying table increases.
In any event I won't be rushing out and changing all my queries to this form as the difference is very minor and only apparent during query compilation. Removing the OPTION (RECOMPILE) so that subsequent executions can use a cached plan gave the following.
+-------------+-----------+------------+-----------+--------------+
| Num of Cols | * | 1 | PK | Not Null col |
+-------------+-----------+------------+-----------+--------------+
| 2 | 144933.25 | 145292 | 146029.25 | 143973.5 |
| 4 | 146084 | 146633.5 | 146018.75 | 146581.25 |
| 8 | 143145.25 | 144393.25 | 145723.5 | 144790.25 |
| 16 | 145191.75 | 145174 | 144755.5 | 146666.75 |
| 32 | 144624 | 145483.75 | 143531 | 145366.25 |
| 64 | 145459.25 | 146175.75 | 147174.25 | 146622.5 |
| 128 | 145625.75 | 143823.25 | 144132 | 144739.25 |
| 256 | 145380.75 | 147224 | 146203.25 | 147078.75 |
| 512 | 146045 | 145609.25 | 145149.25 | 144335.5 |
| 1024 | 148280 | 148076 | 145593.25 | 146534.75 |
+-------------+-----------+------------+-----------+--------------+
| Total | 1454769 | 1457884.75 | 1454310 | 1456688.75 |
+-------------+-----------+------------+-----------+--------------+
The test script I used can be found here
Best way to know is to performance test both versions and check out the execution plan for both versions. Pick a table with lots of columns.
There is no difference in SQL Server and it has never been a problem in SQL Server. The optimizer knows that they are the same. If you look at the execution plans, you will see that they are identical.
Personally I find it very, very hard to believe that they don't optimize to the same query plan. But the only way to know in your particular situation is to test it. If you do, please report back!
Not any real difference but there might be a very small performance hit. As a rule of thumb you should not ask for more data than you need.