ST_Covers: Geography Implementation is not working, Geometry Implementation taking too long - postgresql

I have a query that uses the geography implementation of the ST_Covers function:
SELECT ST_Covers(ST_GeoGraphyFromText('MULTIPOLYGON(((179 -89,179 89,-179 89,-179 -89,179 -89)))'),ST_GeographyFromText('POINT(20 30)'));
I expect this query to return true, but it returns false. I don't know what's wrong with PostGIS (or with this query).
When I switch from the geography implementation to the geometry one and rearrange the query like this:
SELECT ST_Covers(ST_ASTEXT(ST_GeoGraphyFromText('MULTIPOLYGON(((179 -89,179 89,-179 89,-179 -89,179 -89)))')),text('POINT(20 30)'));
it works as it should and returns true.
I could settle for the second query, but it takes far too long when the database is large.
Can someone please tell me either:
how to make query 1 work correctly (returning true, as intended), or
how to make query 2 fast on large tables?
(Please do not suggest that I remove *ST_GeoGraphyFromText('MULTIPOLYGON(((179 -89,179 89,-179 89,-179 -89,179 -89)*, because it only stands in for geographic data that will come from a table column.)
Other points for which query 1 doesn't work include (5 5), (10 10), (-10 -10), and many more.

The first query fails because you are using a geography type with a polygon more than 180° wide. With something more realistic, such as 'MULTIPOLYGON(((100 0,100 50,0 50,0 0,100 0)))', it returns TRUE.
There is no direct way to find the maximum diameter of the exterior rings of MultiPolygon geography types, but you can attempt to hunt down these particular cases with something like:
SELECT ST_XMax(geog::geometry) - ST_XMin(geog::geometry) AS width,
ST_YMax(geog::geometry) - ST_YMin(geog::geometry) AS height
FROM polygons;
Examine the rows with a width greater than 180, and check whether each of their parts is also wider than 180. If so, they should be regarded as invalid geographies.
The only reason the second query returns TRUE is that ST_AsText converts to WKT, which is then reinterpreted as WKB geometry types (so it implicitly calls ST_Covers(geometry, geometry) rather than ST_Covers(geography, geography)). This query is slow because it converts from WKB to WKT and back to WKB, with possible loss of precision between conversions. A faster version is to cast the geography column to geometry using ::geometry, e.g.:
SELECT ST_Covers(geog::geometry, ST_SetSRID(ST_MakePoint(20, 30), 4326))
FROM polygons;
Geometry types use simple "flat earth" Cartesian logic for ST_Covers, which is why you see the TRUE you expect. Geography types use "round earth" logic, which relies on more complicated spherical calculations, but the result is easy to understand if you have a globe handy.
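To see why the geography version returns FALSE, it helps to follow the edges the way the sphere does. The following Python sketch is only an illustration, not how PostGIS computes ST_Covers: it assumes every edge goes the shorter way around in longitude (which is what a great-circle edge between these non-antipodal vertices does) and compares that with the planar min/max width that the geometry cast sees. The helper names are hypothetical, and only longitudes are examined.

```python
def short_way_delta(lon_from: float, lon_to: float) -> float:
    """Signed change in longitude along the shorter way round, in [-180, 180)."""
    return (lon_to - lon_from + 180.0) % 360.0 - 180.0


def naive_width(ring):
    """Width as a flat-earth geometry sees it: max(lon) - min(lon)."""
    lons = [lon for lon, _lat in ring]
    return max(lons) - min(lons)


def short_way_span(ring):
    """(west, east) longitudes swept when every edge takes the shorter way round.

    Longitudes are "unwrapped", so the interval may extend past 180.
    """
    lon = float(ring[0][0])
    unwrapped = [lon]
    for (lon_a, _), (lon_b, _) in zip(ring, ring[1:]):
        lon += short_way_delta(lon_a, lon_b)
        unwrapped.append(lon)
    return min(unwrapped), max(unwrapped)


def lon_in_span(lon: float, west: float, east: float) -> bool:
    """True if `lon` lies in the (possibly antimeridian-crossing) span."""
    return (lon - west) % 360.0 <= east - west


# Exterior ring of the MULTIPOLYGON from the question, as (lon, lat) pairs.
ring = [(179, -89), (179, 89), (-179, 89), (-179, -89), (179, -89)]

west, east = short_way_span(ring)
print(naive_width(ring))            # 358
print(west, east, east - west)      # 179.0 181.0 2.0
print(lon_in_span(20, west, east))  # False
```

On the sphere the ring is therefore a 2°-wide sliver straddling the antimeridian, and POINT(20 30) is nowhere near it; the planar width of 358 is exactly what the width/height query above flags.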

Errors converting Geometry to Geography

I am getting an error trying to convert data from a Geometry field to a geography field in a separate table.
INSERT INTO PIGeoData
([ID], [geo_name], [geo_wkt] ,[port_geography_binary] )
SELECT [id], [name] ,[wkt], GEOGRAPHY::STGeomFromWKB(em_ports.geom.STAsBinary(),4326)
FROM [guest].[em_ports]
where ID < 4548 and ID not in (select ID from PIGeoData)
This is the error I get:
Msg 6522, Level 16, State 1, Line 1
A .NET Framework error occurred during execution of user-defined routine or aggregate "geography":
Microsoft.SqlServer.Types.GLArgumentException: 24205: The specified input does not represent a valid geography instance because it exceeds a single hemisphere. Each geography instance must fit inside a single hemisphere. A common reason for this error is that a polygon has the wrong ring orientation. To create a larger than hemisphere geography instance, upgrade the version of SQL Server and change the database compatibility level to at least 110.
Microsoft.SqlServer.Types.GLArgumentException:
at Microsoft.SqlServer.Types.GLNativeMethods.ThrowExceptionForHr(GL_HResult errorCode)
at Microsoft.SqlServer.Types.GLNativeMethods.GeodeticIsValid(GeoData& g, Double eccentricity, Boolean forceKatmai)
at Microsoft.SqlServer.Types.SqlGeography.IsValidExpensive(Boolean forceKatmai)
at Microsoft.SqlServer.Types.SqlGeography..ctor(GeoData g, Int32 srid)
at Microsoft.SqlServer.Types.SqlGeography.GeographyFromBinary(OpenGisType type, SqlBytes wkbGeography, Int32 srid)
I get the same message if I try to convert from WKT using
,GEOGRAPHY::STGeomFromText(wkt,4326)
Both these formats come from the MS documentation here
But if I copy the polygon data from the WKT and paste it into a query like this:
declare @sGeo geography
declare @sWKT varchar(max)
select @sWKT = wkt from guest.em_ports where wkt like '%POLYGON ((73.50667 4.181667,73.50667 4.21,73.48 4.21,73.48 4.1783333,73.50667 4.181667,73.50667 4.181667))%'
set @sGeo = geography::STPolyFromText (@sWKT, 4326 )
Update PIGeoData
Set PortBoundaries = @sGeo
Where wkt like '%POLYGON ((73.50667 4.181667,73.50667 4.21,73.48 4.21,73.48 4.1783333,73.50667 4.181667,73.50667 4.181667))%'
That works.
So I moved all the non-geo data to the new table and started going through record by record to see which WKT was failing:
I used this query:
Update PIGeoData
Set port_geography_binary = GEOGRAPHY::STGeomFromText(geo_wkt,4326)
where port_geography_binary is null and ID = <xyz>
where <xyz> was each individual record ID.
These WKT values succeeded:
POLYGON ((-135.31197 59.451653,-135.32457 59.45799,-135.32996 59.454834,-135.36717 59.455154,-135.36452 59.449005,-135.36488 59.43996,-135.36697 59.43817,-135.33139 59.438065,-135.31197 59.451653,-135.31197 59.451653))
POLYGON ((-4.524549 48.365623,-4.518855 48.361416,-4.4854136 48.367413,-4.436236 48.381382,-4.420772 48.39644,-4.431077 48.398525,-4.4376454 48.393867,-4.438626 48.38611,-4.4559207 48.390007,-4.470995 48.387226,-4.4933248 48.384468,-4.499816 48.38401,-4.512855 48.3754,-4.524549 48.365623,-4.524549 48.365623))
These WKT values failed:
POLYGON ((-8.788489 37.773106,-8.989748 37.785244,-9.11148 37.93065,-9.01401 38.13953,-8.993956 38.30128,-9.266149 38.264282,-9.382366 38.33244,-9.435615 38.54836,-9.656681 38.602306,-9.683701 38.883057,-9.1720295 39.00796,-8.444215 39.550682,-8.213643 39.355015,-8.537656 38.037514,-8.712016 37.782127,-8.788489 37.773106))
POLYGON ((-119.71587 34.396824,-119.69837 34.410378,-119.67453 34.41837,-119.62994 34.420082,-119.63012 34.380177,-119.62986 34.3551,-119.71534 34.355022,-119.71587 34.396824,-119.71587 34.396824))
There is nothing obvious to me in the data. Can anyone help with why these records and data are failing?
TIA
The relevant part of the error message is "A common reason for this error is that a polygon has the wrong ring orientation."
The polygons that have failed are in clockwise order.
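One way to check a ring's orientation before sending it to SQL Server is the shoelace (signed area) formula, treating longitude as x and latitude as y: a negative result means clockwise. This is a planar Python sketch with hypothetical helper names; it is adequate for small polygons like these, but it is not a substitute for SQL Server's own validation.

```python
def parse_polygon_ring(wkt: str):
    """(x, y) vertices of a single-ring 'POLYGON ((x y, ...))' string."""
    inner = wkt[wkt.index("((") + 2 : wkt.rindex("))")]
    return [tuple(float(v) for v in point.split()) for point in inner.split(",")]


def signed_area(ring):
    """Planar signed area of a closed ring; positive means counter-clockwise.

    Vertices are shifted relative to the first one to reduce floating-point
    cancellation; the shift does not change the result.
    """
    x0, y0 = ring[0]
    total = 0.0
    for (xa, ya), (xb, yb) in zip(ring, ring[1:]):
        total += (xa - x0) * (yb - y0) - (xb - x0) * (ya - y0)
    return total / 2.0


def is_clockwise(ring) -> bool:
    return signed_area(ring) < 0


# One failing and one succeeding polygon from the question.
FAILED = "POLYGON ((-119.71587 34.396824,-119.69837 34.410378,-119.67453 34.41837,-119.62994 34.420082,-119.63012 34.380177,-119.62986 34.3551,-119.71534 34.355022,-119.71587 34.396824,-119.71587 34.396824))"
SUCCEEDED = "POLYGON ((-135.31197 59.451653,-135.32457 59.45799,-135.32996 59.454834,-135.36717 59.455154,-135.36452 59.449005,-135.36488 59.43996,-135.36697 59.43817,-135.33139 59.438065,-135.31197 59.451653,-135.31197 59.451653))"

print(is_clockwise(parse_polygon_ring(FAILED)))     # True
print(is_clockwise(parse_polygon_ring(SUCCEEDED)))  # False
```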
To convert them to counter-clockwise order, you can use something like this:
DECLARE @t VARCHAR(MAX)='POLYGON ((-119.71587 34.396824,-119.69837 34.410378,-119.67453 34.41837,-119.62994 34.420082,-119.63012 34.380177,-119.62986 34.3551,-119.71534 34.355022,-119.71587 34.396824,-119.71587 34.396824))'
DECLARE @x XML=REPLACE(REPLACE(REPLACE(@t,'POLYGON ((','<root><p>'),'))','</p></root>'),',','</p><p>')
DECLARE @r VARCHAR(MAX)='POLYGON (('+STUFF((
SELECT ','+q.Point
FROM (
SELECT n.value('.','varchar(50)') AS Point, ROW_NUMBER() OVER (ORDER BY t.n) AS Position
FROM @x.nodes('/root/p') t(n)
) q ORDER BY q.Position DESC
FOR XML PATH(''), TYPE
).value('.','VARCHAR(MAX)'),1,1,'')+'))'
DECLARE @g GEOGRAPHY=GEOGRAPHY::STGeomFromText(@r,4326)
SELECT @g, @g.ToString()
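The T-SQL above reverses the ring by turning the WKT into XML nodes and reassembling them in descending order. The same string manipulation is easier to follow in a short Python sketch (the function name is hypothetical). Like the T-SQL, it assumes a single-ring polygon written exactly as 'POLYGON ((...))', so polygons with holes or MULTIPOLYGONs would need more work.

```python
def reverse_polygon_ring(wkt: str) -> str:
    """Return the same single-ring polygon with its vertex order reversed."""
    prefix, suffix = "POLYGON ((", "))"
    if not (wkt.startswith(prefix) and wkt.endswith(suffix)):
        raise ValueError("expected a single-ring 'POLYGON ((...))' string")
    points = [p.strip() for p in wkt[len(prefix):-len(suffix)].split(",")]
    # The first and last points are identical, so the reversed ring stays closed.
    return prefix + ",".join(reversed(points)) + suffix


square_cw = "POLYGON ((0 0,0 1,1 1,1 0,0 0))"
print(reverse_polygon_ring(square_cw))  # POLYGON ((0 0,1 0,1 1,0 1,0 0))
```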
Later edit:
There is a convention that says that a polygon should always be represented in counter-clockwise order. Imagine that you have a polygon in the shape of the equator; without this convention it would not be clear if the polygon represents the northern hemisphere or the southern hemisphere. See Spatial Data Types Overview in the Microsoft SQL Server documentation for details.
Additionally, when the compatibility level is 100 or below, SQL Server requires each geography instance to fit inside a single hemisphere. If you are using SQL Server 2012 or later and you choose compatibility level 110 or higher, you can avoid the error message, but the polygon will then represent the entire area outside the region you would normally think it represents.
If you use compatibility level 100 or below, you could use TRY/CATCH to detect the error and, if it happens, try reversing the polygon.
If you use compatibility level 110 or later, you can use STArea() to check whether the polygon has a surface much bigger or much smaller than one hemisphere. If the area approaches 510100000000000 square meters (which is approximately the area of the entire Earth), you should reverse the polygon.
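The compatibility-level-110 check reduces to a simple threshold. This Python sketch assumes the area value comes from STArea() on the geography instance; using half of the Earth's surface as the cutoff is our own assumption, based on the idea that a correctly oriented port boundary should cover far less than a hemisphere.

```python
# Approximate surface area of the Earth in square meters, as quoted above.
EARTH_AREA_M2 = 510_100_000_000_000


def should_reverse(area_m2: float, earth_area_m2: float = EARTH_AREA_M2) -> bool:
    """Heuristic: a ring enclosing more than a hemisphere was most likely
    written in the wrong orientation, so its vertex order should be reversed."""
    return area_m2 > earth_area_m2 / 2


print(should_reverse(1.0e7))                  # False: about 10 km², plausible
print(should_reverse(EARTH_AREA_M2 - 1.0e7))  # True: everything except 10 km²
```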

Using ST_Intersects with ST_MakePoint with no SET_SRID

I'm using WHERE ST_Intersects(ST_SetSRID(ST_MakePoint($1, $2)::geography, 4326), geog) to find a point within a geography field (named geog in the example query).
For reasons I can't quite figure out*, ST_SetSRID sometimes causes issues; removing it from the query makes these issues go away. I'd like to remove ST_SetSRID from the query, but I can't find anywhere that explains what SRID ST_Intersects will use.
geog has an SRID of 4326. Will ST_Intersects just use that, or is it going to assume no coordinate system and give me results that differ from those I get when using ST_SetSRID?
* In case you are curious, the issue has something to do with prepared transactions, Node.js, and the minimum connection pool size. With a minimum of 1 connection in the pool, after 4-6 queries the next query takes 15-30 seconds (it usually takes about 100 ms). With a minimum of 2 connections, it takes about 8-10 queries before issues occur; with 5, about 25 queries (and so on). I feel like I'm taking Crazy Pills.
ST_SetSRID returns a geometry, not a geography. You generally don't need to set the SRID for geography, since it assumes a default of 4326, so I suggest not using it (unless you have a different ellipsoid or something). (But if you are working with geometry, setting the SRID with ST_SetSRID is mandatory.)
Furthermore, ST_Intersects is defined for both geometry and geography types. Depending on whether you used ST_SetSRID, it picks either:
ST_Intersects(geometry, geometry); or
ST_Intersects(geography, geography)
You can explicitly choose one of the two functions by casting each parameter:
ST_Intersects(ST_SetSRID(ST_MakePoint($1, $2), 4326)::geography, geog::geography)
(note I've moved the first ::geography to outside ST_SetSRID, so it sets an SRID then casts it as a geography). Or equivalently:
ST_Intersects(ST_MakePoint($1, $2)::geography, geog::geography)
As for the actual performance of the two versions of ST_Intersects, it depends on whether geog is indexed as a geometry or as a geography.

Problems with spatial join

I am new to SQL and am attempting to use it to speed up spatial analysis on a set of ~1.2 million trips from a CSV that contains the lat and lon of the pickup and dropoff points.
What I am trying to do in plain English is:
select all trips that start in the area of interest (loaded into my database as a shapefile) into one table
select all trips that end in the area of interest into another
perform a spatial join between these points and a shapefile of census tracts (which contains neighborhood names)
count by neighborhood name to list the most frequent origins/destinations of trips to/from the area of interest.
The code I am working with is below. (If it's helpful, NTA, or neighborhood tabulation area, is the neighborhood name that I want to display in my table at the end of this operation.)
--Select all trips that end in project area
SELECT *
INTO end_PA
FROM trips, projarea
WHERE ST_Intersects(trips.dropoff, projarea.geom);
--for trips that end in project area - index by NTA of pick up point
ALTER TABLE end_PA ADD COLUMN GID SERIAL;
CREATE TABLE points_ct_end AS
SELECT nyct2010.ntacode as ct_nta, end_PA.gid as point_id
from nyct2010, end_PA WHERE ST_Intersects(nyct2010.geom , end_PA.pickup);
--Count most common NTA
--return count for each NTA as a CSV
copy(
select ct_nta, count(*) AS count from points_ct_end
group by ct_nta
order by count desc)
to 'C://TaxiData//Analysis//Trips_Arriving_LM.csv' DELIMITER ',' CSV HEADER;
However, I am having problems from the very start - ST_Intersects does not return any points within the area of interest!
Troubleshooting solutions I have tried thus far:
My first thought is that the points weren't in the correct SRID. When I created the 'dropoff' point I set the SRID to 4326. I tried both using ST_SetSRID to change the projection of both data sets to 4326 and manually reprojecting the shapefiles to 4326 in ArcMap, but neither worked.
I plotted a small sample of the points from the 'trips' data set in ArcMap to ensure they were correctly projected and overlapped the ProjArea shapefile. They are.
I imported the multipoint shapefile this created into my geo database to test if that worked with ST_Intersects. Nope.
I tried using ST_Within. This threw the error message:
ERROR: function st_within(character varying, geometry) does not exist
....
HINT: No function matches the given name and argument types. You
might need to add explicit type casts.
I am using Big SQL and Postgres.
Thanks!!
My first thought is that the points weren't in the correct SRID. When I created the 'dropoff' point I set the SRID to 4326. I tried both using ST_SetSRID to change the projection of both data sets to 4326 and manually reprojecting the shapefiles to 4326 in ArcMap, but neither worked.
ST_SetSRID doesn't change the projection (it doesn't reproject). It only changes the SRID recorded with the geometry; the coordinate values stay exactly the same. That can completely break things if the previous SRID matched the input data. You likely wanted ST_Transform().
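To make the difference concrete, here is a Python sketch. Points are plain (x, y, srid) tuples, a hypothetical stand-in for PostGIS geometries; set_srid mirrors what ST_SetSRID does (relabel only), and the transform uses the spherical Web Mercator (EPSG:3857) formulas as one example of what a real reprojection such as ST_Transform(geom, 3857) does to the numbers. PostGIS delegates real transformations to PROJ, so treat this only as an illustration.

```python
import math

WEB_MERCATOR_RADIUS = 6378137.0  # sphere radius used by EPSG:3857


def set_srid(point, srid):
    """Like ST_SetSRID: attach a different SRID, leave the coordinates alone."""
    x, y, _old_srid = point
    return (x, y, srid)


def transform_4326_to_3857(point):
    """Like ST_Transform(geom, 3857) for a lon/lat point: recompute x and y."""
    lon, lat, srid = point
    if srid != 4326:
        raise ValueError("expected a lon/lat point with SRID 4326")
    x = WEB_MERCATOR_RADIUS * math.radians(lon)
    y = WEB_MERCATOR_RADIUS * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return (x, y, 3857)


p = (-73.9857, 40.7484, 4326)     # an illustrative lon/lat point
print(set_srid(p, 3857))          # (-73.9857, 40.7484, 3857): same numbers, new label
print(transform_4326_to_3857(p))  # x and y are now in meters, not degrees
```

By the same logic, if one of the shapefiles was stored in a projected system and was relabelled with ST_SetSRID(..., 4326), its coordinates would still be in meters or feet while the trip points are in degrees, and ST_Intersects would find nothing.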
There isn't enough information here to troubleshoot this problem. However, we can answer this:
ERROR: function st_within(character varying, geometry) does not exist
This simply means the first argument is not a geometry (it is character varying). Beyond that, we can't say much because you haven't shown the query you tried with ST_Within().
Your syntax for ST_Intersects() looks to be right. But, there simply isn't enough information provided to help. Show some schema and sample data.

Postgres PostGIS: st_intersects returns false but only for first record returned, even if hard coded

Using Postgres 9.2
I have a strange issue. To simplify it:
I have some line data. The query in question uses st_intersects to determine whether the lines overlap a polygon. Both the lines and the polygon are stored in a 3D representation, with the z-axis being 0. This is for geospatial data.
In this case, I have a line whose start and end points are the same. Two records have seemingly the same value: the X, Y and Z components of the start and end points are the same. Comparing the two lines using =~, they are equal. Using =, they are equal. Using st_equals, the result is false, but comparing the components that make up the lines, the values all seem to be equal, including the binary representations by visual inspection.
When I do st_intersects(my_line, some_polygon), one record returns true and the other false, even though the line values for both records appear identical. I didn't create the original values, so I don't know how they were created. There is a vehicle associated with each record, and for whatever reason, one of the vehicles has this problem for several of its records.
If I change the function from st_intersects to the presumably more expensive st_3dintersects, they both return true as expected, and the problem goes away. The polygon being compared against is quite large, and this affects several records with different points, so it's unlikely we're hitting some fringe rounding error. Using st_force2d doesn't help either.
Any ideas why I might be seeing the behavior that I'm seeing?
Here's the EWKT of the line, with the coordinates changed:
SRID=4326;LINESTRING(-85.6600021 30.7976979 0,-85.6600021 30.7976979 0)
Both records have this exact same value for ST_AsEWKT, and yet one of them returns false for st_intersects(my_line, the_poly) and the other returns true. Even if I hard-code the EWKT value, I still see this discrepancy:
ST_Intersects(
ST_GeomFromEWKT('SRID=4326;LINESTRING(-85.6600021 30.7976979 0,-85.6600021 30.7976979 0)'),
x.geom
)
It seems like it is always affecting the very first record in the result set and no other records. If I change everything else in my query, this always returns false for the first record and true for all the subsequent records.
Edit:
Further investigation shows that the linestring is not valid because its start and end are the same point. Applying st_makevalid fixes it by turning it into a point. Apparently the invalid linestring is evaluated inconsistently.
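The repair described in the edit can be sketched in Python: a LINESTRING whose vertices are all the same point has zero length, and collapsing it to a POINT is what the question reports ST_MakeValid doing. The helper below is hypothetical and works on plain coordinate tuples; inside the database, ST_IsValid and ST_IsValidReason are the usual way to find such rows.

```python
def collapse_degenerate_linestring(coords):
    """Return ('POINT', c) if every vertex is the same point c, otherwise
    ('LINESTRING', coords) unchanged. Coordinates are (x, y[, z]) tuples."""
    if not coords:
        raise ValueError("a linestring needs at least one vertex")
    if all(c == coords[0] for c in coords):
        return ("POINT", coords[0])
    return ("LINESTRING", list(coords))


degenerate = [(-85.6600021, 30.7976979, 0.0), (-85.6600021, 30.7976979, 0.0)]
print(collapse_degenerate_linestring(degenerate))
# ('POINT', (-85.6600021, 30.7976979, 0.0))
```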
It's most likely that the coordinates differ beyond the printed decimal places: the differences show up in the WKB, but the WKT formatting hides them. Here is an example:
SELECT ST_AsEWKT(A) AS wkt_a, ST_AsEWKT(B) AS wkt_b,
ST_AsEWKT(A) = ST_AsEWKT(B) AS wkt_are_equal,
A::text = B::text AS wkb_are_equal,
ST_Intersects(A, B), ST_Distance(A, B),
ST_Distance(A, B) < 1e-12 AS pretty_much_intersect
FROM (
SELECT
'01010000A0E6100000A5B272793D6A55C07E96F8ED35CC3E400000000000000000'::geometry AS A,
'01010000A0E6100000A5B272793D6A55C07F96F8ED35CC3E400000000000000000'::geometry AS B
) f;
-[ RECORD 1 ]---------+------------------------------------------
wkt_a | SRID=4326;POINT(-85.6600021 30.7976979 0)
wkt_b | SRID=4326;POINT(-85.6600021 30.7976979 0)
wkt_are_equal | t
wkb_are_equal | f
st_intersects | f
st_distance | 3.5527136788005e-015
pretty_much_intersect | t
So you can see that the WKT are equal, but the WKB are not. There is a tiny distance between the two, therefore ST_Intersects will return false, as these predicate functions require exact noding.
A more robust way to find geometries that essentially intersect is to test whether the distance is within a small tolerance, as demonstrated by the last column. Another option is ST_Snap.
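You can confirm where the difference hides by decoding the two hex EWKB values from the example. This Python sketch (the function name is ours) handles only the little-endian POINT Z with SRID layout used there: a byte-order byte, a type word with the Z and SRID flags set, the SRID, then three doubles. It shows that the two Y values are adjacent doubles, one unit in the last place apart, which matches the 3.55e-15 distance above, and that a tolerance test like the last column treats them as the same point. It needs Python 3.9+ for math.nextafter.

```python
import math
import struct

EWKB_Z_FLAG = 0x80000000
EWKB_SRID_FLAG = 0x20000000
WKB_POINT = 1


def decode_ewkb_point_z(hex_ewkb: str):
    """Return (srid, (x, y, z)) for a little-endian EWKB POINT Z with SRID."""
    raw = bytes.fromhex(hex_ewkb)
    if raw[0] != 1:
        raise ValueError("only little-endian (NDR) EWKB is handled here")
    (type_word,) = struct.unpack_from("<I", raw, 1)
    if type_word != (WKB_POINT | EWKB_Z_FLAG | EWKB_SRID_FLAG):
        raise ValueError(f"unexpected EWKB type word {type_word:#010x}")
    (srid,) = struct.unpack_from("<I", raw, 5)
    x, y, z = struct.unpack_from("<3d", raw, 9)
    return srid, (x, y, z)


srid_a, a = decode_ewkb_point_z("01010000A0E6100000A5B272793D6A55C07E96F8ED35CC3E400000000000000000")
srid_b, b = decode_ewkb_point_z("01010000A0E6100000A5B272793D6A55C07F96F8ED35CC3E400000000000000000")

print(srid_a, srid_b)                                        # 4326 4326
print(f"{a[1]:.7f}", f"{b[1]:.7f}")                          # 30.7976979 30.7976979
print(a[1] == b[1], math.nextafter(a[1], math.inf) == b[1])  # False True
print(math.dist(a, b) < 1e-12)                               # True
```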
Now just seeing the invalid geometries in the question, my answer is to not use invalid geometries!
Behaviour is reproduced here:
DROP TABLE IF EXISTS invalid;
CREATE TEMP TABLE invalid(id integer primary key, geom geometry);
INSERT INTO invalid(id, geom) VALUES
(1, 'LINESTRING(-85.6600021 30.7976979,-85.6600021 30.7976979)'),
(2, 'LINESTRING(-85.6600021 30.7976979,-85.6600021 30.7976979)'),
(3, 'LINESTRING(-85.6600021 30.7976979,-85.6600021 30.7976979)');
SELECT id, ST_Intersects(
ST_GeomFromEWKT('LINESTRING(-85.6600021 30.7976979,-85.6600021 30.7976979)'), x.geom)
FROM invalid x;
id | st_intersects
----+---------------
1 | f
2 | t
3 | t
(3 rows)
It's hard to tell from the description what is going on; please just provide data. For example:
\set linestring ST_GeomFromEWKT($$SRID=4326;LINESTRING(-85.6600021 30.7976979 0,-80.6600021 30.7976979 0)$$)
\set point ST_GeomFromEWKT($$SRID=4326;POINT(-85.6600021 30.7976979 0)$$)
SELECT ST_Intersects( :linestring, :point ) AS linestringPoint
, ST_Intersects( :linestring, :linestring ) AS linestringLinestring;

T-SQL speed comparison between LEFT() vs. LIKE operator

I'm creating result paging based on the first letter of a certain nvarchar column, rather than the usual paging by number of results.
Now I'm faced with a choice: filter the results using the LIKE operator or the equality (=) operator.
select *
from table
where name like @firstletter + '%'
vs.
select *
from table
where left(name, 1) = @firstletter
I've tried searching the net for a speed comparison between the two, but it's hard to find any results, since most search results are about LEFT JOINs and not the LEFT function.
"Left" vs "Like": one should always use "Like" when possible where indexes exist, because a "Like" with a fixed prefix is not a function call and can therefore use any index you have on the column.
"Left", on the other hand, is a function applied to the column, so SQL Server cannot seek on an index for it. This web page describes the usage differences with some examples. What this means is that SQL Server has to evaluate the function for every row it reads.
"Substring" and other similar functions are also culprits.
Your best bet would be to measure the performance on real production data rather than trying to guess (or ask us). That's because performance can sometimes depend on the data you're processing, although in this case it seems unlikely (but I don't know that, hence why you should check).
If this is a query you will be doing a lot, you should consider another (indexed) column which contains the lowercased first letter of name and have it set by an insert/update trigger.
This will, at the cost of a minimal storage increase, make this query blindingly fast:
select * from table where name_first_char_lower = @firstletter
That's because most databases are read far more often than written, and this amortises the cost of the calculation (done only on writes) across all reads.
It introduces redundant data but it's okay to do that for performance as long as you understand (and mitigate, as in this suggestion) the consequences and need the extra performance.
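The trigger-maintained column trades a little work on every write for cheap reads. As an illustration, this Python sketch (a toy, hypothetical class, not SQL Server behavior) keeps a lowercased-first-letter lookup up to date on each insert or update, the way the trigger would, so reading one "page" of names is a direct lookup instead of a scan.

```python
from collections import defaultdict


class NameTable:
    """Toy table with a derived key maintained on every write, like a trigger."""

    def __init__(self):
        self.rows = {}                    # row id -> name
        self.by_first = defaultdict(set)  # lowercased first character -> row ids

    @staticmethod
    def _first_char_lower(name):
        return name[:1].lower()

    def upsert(self, row_id, name):
        """Insert or update a row and keep the derived key in sync."""
        old = self.rows.get(row_id)
        if old is not None:
            self.by_first[self._first_char_lower(old)].discard(row_id)
        self.rows[row_id] = name
        self.by_first[self._first_char_lower(name)].add(row_id)

    def page(self, first_letter):
        """Read path: names starting with `first_letter` (case-insensitive)."""
        ids = self.by_first.get(first_letter.lower(), set())
        return sorted(self.rows[i] for i in ids)


t = NameTable()
t.upsert(1, "Alice")
t.upsert(2, "bob")
t.upsert(3, "Bill")
print(t.page("B"))  # ['Bill', 'bob']
t.upsert(3, "Carol")
print(t.page("b"))  # ['bob']
```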
I had a similar question, and ran tests on both. Here is my code.
where (VOUCHER like 'PCNSF%'
or voucher like 'PCLTF%'
or VOUCHER like 'PCACH%'
or VOUCHER like 'PCWP%'
or voucher like 'PCINT%')
Returned 1434 rows in 1 min 51 seconds.
vs
where (LEFT(VOUCHER,5) = 'PCNSF'
or LEFT(VOUCHER,5)='PCLTF'
or LEFT(VOUCHER,5) = 'PCACH'
or LEFT(VOUCHER,4)='PCWP'
or LEFT (VOUCHER,5) ='PCINT')
Returned 1434 rows in 1 min 27 seconds
With my data, the LEFT version was faster. As an aside, my overall query does hit some indexes.
I would always suggest using the LIKE operator when the search column is indexed. I tested the above in my production environment with select count(column_name) from table_name where left(column_name,3)='AAA' OR left(column_name,3)='ABA' OR ... up to 9 OR clauses. The count returned 7301477 records in 4 seconds with LEFT and 1 second with LIKE, i.e. where column_name like 'AAA%' OR column_name like 'ABA%' OR ... up to 9 LIKE clauses.
Calling a function on a column in the WHERE clause is not best practice. Refer to http://blog.sqlauthority.com/2013/03/12/sql-server-avoid-using-function-in-where-clause-scan-to-seek/
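The scan-versus-seek point can be illustrated by treating an index on name as a sorted list. This Python sketch (hypothetical helpers, using plain code-point ordering rather than a SQL Server collation) finds every name with a given prefix using two binary searches, much like an index range seek, while the LEFT() version has to test every row.

```python
from bisect import bisect_left


def prefix_seek(sorted_names, prefix):
    """Index-seek analogue of WHERE name LIKE @prefix + '%'.

    Every string starting with `prefix` sorts at or after `prefix` and before
    `prefix` with its last character bumped by one, so two binary searches
    bound the matching range without looking at the other rows.
    """
    if not prefix:
        return list(sorted_names)
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return sorted_names[bisect_left(sorted_names, prefix):bisect_left(sorted_names, upper)]


def left_scan(names, prefix):
    """Scan analogue of WHERE LEFT(name, n) = @prefix: test every row."""
    return [n for n in names if n[:len(prefix)] == prefix]


names = sorted(["apple", "b", "banana", "Bob", "blueberry", "cherry"])
print(prefix_seek(names, "b"))  # ['b', 'banana', 'blueberry']
print(left_scan(names, "b"))    # ['b', 'banana', 'blueberry']
```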
Entity Framework Core users
You can use EF.Functions.Like(columnName, searchString + "%") instead of columnName.StartsWith(...), and you'll get just a LIKE in the generated SQL instead of all this 'LEFT' craziness!
Depending on your needs, you will probably need to preprocess searchString, for example to escape LIKE wildcard characters such as % and _ that it may contain.
See also https://github.com/aspnet/EntityFrameworkCore/issues/7429
This function isn't present in Entity Framework (non core) EntityFunctions so I'm not sure how to do it for EF6.
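One common form of the searchString preprocessing mentioned above is escaping the characters LIKE treats as wildcards, so that a search string such as 50%_off matches literally. For EF Core you would do this in C#; the Python sketch below (hypothetical function name) only shows the string transformation, assuming SQL Server's convention of wrapping a wildcard in square brackets to make it literal.

```python
def like_prefix_pattern(search: str) -> str:
    """Pattern for SQL Server LIKE that matches values starting with `search`
    literally: '[', '%' and '_' are wrapped in brackets, then '%' is appended."""
    escaped = (
        search.replace("[", "[[]")  # first, because the next two insert '['
        .replace("%", "[%]")
        .replace("_", "[_]")
    )
    return escaped + "%"


print(like_prefix_pattern("50%_off"))  # 50[%][_]off%
```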