I'm using WHERE ST_Intersects(ST_SetSRID(ST_MakePoint($1, $2)::geography, 4326), geog) to find a point within a geography field (named geog in the example query).
For reasons I can't quite figure out*, ST_SetSRID sometimes causes issues, removing it from the query makes these issues go away. I'd like to remove ST_SetSRID from the query but can't find anywhere that explains what SRID ST_Intersects will use.
geog has an SRID of 4326. Will ST_Intersects just use that or is going to assume no coordinate system and give me results that differ than when using ST_SetSRID?
* In case you are curious the issue has something to do with prepared transactions, nodejs, and the minimum connection pool. For 1 minimum connections in the pool, after 4-6 queries the next query will take 15-30 seconds (which usually takes about 100ms). For 2 min connections it takes about 8-10 queries before issues occur, for 5 min, about 25 queries (and so on). I feel like I'm taking Crazy Pills.
ST_SetSRID returns a geometry, not a geography. You generally don't need to set the SRID for geography, since it assumes a default of 4326, so I suggest not using it (unless you have a different ellipsoid or something). (But if you are working with geometry, ST_SRID is mandatory).
Furthermore, ST_Intersects implicitly operates on either geometry or geography types. Depending if you used ST_SetSRID or not, it will pick either:
ST_Intersects(geometry, geometry); or
ST_Intersects(geography, geography)
You can explicitly choose the one of the operators by casting each parameter:
ST_Intersects(ST_SetSRID(ST_MakePoint($1, $2), 4326)::geography, geog::geography)
(note I've moved the first ::geography to outside ST_SetSRID, so it sets an SRID then casts it as a geography). Or equivalently:
ST_Intersects(ST_MakePoint($1, $2)::geography, geog::geography)
As for the actual performance of the two intersects spatial operators, this depends if you have an index on either geometry or geography types for geog.
Related
I have a system with a large number of tables that contain historical data. Each table has a ts_from and ts_to column which are of type timestamptz. These represent the time period in which the data for a particular row was valid.
These columns are indexed.
If I want to query all rows that were valid at a particular timestamp, it is trivial to write the ts_from <= #at_timestamp AND ts_to >= #at_timestamp WHERE clause to utilitise the index.
However, I wanted to create a function called Temporal.at which would take the #at_timestamp column and the ts_from / ts_to columns and do this by hiding the complexity of the comparison from the query that uses it. You might think this is trivial, but I would also like to extend the concept to create a function called Temporal.between which would take a #from_timestamp and #to_timestamp and select all rows that were valid between those two periods. That function would not be trivial, as one would have to check where rows partially overlap the period rather than always being fully enclosed by it.
The issue is this: I have written these functions but they do not cause the index to be used. The query performance is woefully slow on the history tables, some of which have hundreds of millions of rows.
The questions therefore are:
a) Is there a way to write these functions so that we can be sure the indexes will be used?
b) Am I going about this completely the wrong way and is there a better way to proceed?
This is complicated if you model ts_from and ts_to as two different timestamp columns. Instead, you should use a range type: tstzrange. Then everything will become simple:
for containment in an interval, use #at_timestamp <# from_to
for interval overlap, use tstzinterval(#from_timestamp, #to_timestamp) && from_to
Both queries can be supported by a GiST index on the range column.
I am getting an error trying to convert data from a Geometry field to a geography field in a separate table.
INSERT INTO PIGeoData
([ID], [geo_name], [geo_wkt] ,[port_geography_binary] )
SELECT [id], [name] ,[wkt], GEOGRAPHY::STGeomFromWKB(em_ports.geom.STAsBinary(),4326)
FROM [guest].[em_ports]
where ID < 4548 and ID not in (select ID from PIGeoData)
The error I get is this
Msg 6522, Level 16, State 1, Line 1
A .NET Framework error occurred during execution of user-defined routine or aggregate "geography":
Microsoft.SqlServer.Types.GLArgumentException: 24205: The specified input does not represent a valid geography instance because it exceeds a single hemisphere. Each geography instance must fit inside a single hemisphere. A common reason for this error is that a polygon has the wrong ring orientation. To create a larger than hemisphere geography instance, upgrade the version of SQL Server and change the database compatibility level to at least 110.
Microsoft.SqlServer.Types.GLArgumentException:
at Microsoft.SqlServer.Types.GLNativeMethods.ThrowExceptionForHr(GL_HResult errorCode)
at Microsoft.SqlServer.Types.GLNativeMethods.GeodeticIsValid(GeoData& g, Double eccentricity, Boolean forceKatmai)
at Microsoft.SqlServer.Types.SqlGeography.IsValidExpensive(Boolean forceKatmai)
at Microsoft.SqlServer.Types.SqlGeography..ctor(GeoData g, Int32 srid)
at Microsoft.SqlServer.Types.SqlGeography.GeographyFromBinary(OpenGisType type, SqlBytes wkbGeography, Int32 srid)
I get the same message if I try to convert from WKT using
,GEOGRAPHY::STGeomFromText(wkt,4326)
Both these formats come from the MS documentation here
But if I copy the polygon data from the wkt and paste it into a query like this
declare #sGeo geography
declare #sWKT varchar(max)
select #sWKT = wkt from guest.em_ports where wkt like '%POLYGON ((73.50667 4.181667,73.50667 4.21,73.48 4.21,73.48 4.1783333,73.50667 4.181667,73.50667 4.181667))%'
set #sGeo = geography::STPolyFromText (#sWKT, 4326 )
Update PIGeoData
Set PortBoundaries = #sGeo
Where wkt like '%POLYGON ((73.50667 4.181667,73.50667 4.21,73.48 4.21,73.48 4.1783333,73.50667 4.181667,73.50667 4.181667))%'
that works.
So I moved all the non-geo data to the new table and started going through record by record to see which WKT was failing:
I used this query
Update PIGeoData
Set port_geography_binary = GEOGRAPHY::STGeomFromText(geo_wkt,4326)
where port_geography_binary is null and ID = <xyz>
where xyz was individual record ids
These WKT values succeeded
POLYGON ((-135.31197 59.451653,-135.32457 59.45799,-135.32996 59.454834,-135.36717 59.455154,-135.36452 59.449005,-135.36488 59.43996,-135.36697 59.43817,-135.33139 59.438065,-135.31197 59.451653,-135.31197 59.451653))
POLYGON ((-4.524549 48.365623,-4.518855 48.361416,-4.4854136 48.367413,-4.436236 48.381382,-4.420772 48.39644,-4.431077 48.398525,-4.4376454 48.393867,-4.438626 48.38611,-4.4559207 48.390007,-4.470995 48.387226,-4.4933248 48.384468,-4.499816 48.38401,-4.512855 48.3754,-4.524549 48.365623,-4.524549 48.365623))
These WKT values failed
POLYGON ((-8.788489 37.773106,-8.989748 37.785244,-9.11148 37.93065,-9.01401 38.13953,-8.993956 38.30128,-9.266149 38.264282,-9.382366 38.33244,-9.435615 38.54836,-9.656681 38.602306,-9.683701 38.883057,-9.1720295 39.00796,-8.444215 39.550682,-8.213643 39.355015,-8.537656 38.037514,-8.712016 37.782127,-8.788489 37.773106))
POLYGON ((-119.71587 34.396824,-119.69837 34.410378,-119.67453 34.41837,-119.62994 34.420082,-119.63012 34.380177,-119.62986 34.3551,-119.71534 34.355022,-119.71587 34.396824,-119.71587 34.396824))
There is nothing obvious to me in the data. Can anyone help with why these records and data are failing?
TIA
The relevant part of the error message is "A common reason for this error is that a polygon has the wrong ring orientation."
The polygons that have failed are in clockwise order.
To convert them to counter-clockwise order, you can use something like this:
DECLARE #t VARCHAR(MAX)='POLYGON ((-119.71587 34.396824,-119.69837 34.410378,-119.67453 34.41837,-119.62994 34.420082,-119.63012 34.380177,-119.62986 34.3551,-119.71534 34.355022,-119.71587 34.396824,-119.71587 34.396824))'
DECLARE #x XML=REPLACE(REPLACE(REPLACE(#t,'POLYGON ((','<root><p>'),'))','</p></root>'),',','</p><p>')
DECLARE #r VARCHAR(MAX)='POLYGON (('+STUFF((
SELECT ','+q.Point
FROM (
SELECT n.value('.','varchar(50)') AS Point, ROW_NUMBER() OVER (ORDER BY t.n) AS Position
FROM #x.nodes('/root/p') t(n)
) q ORDER BY q.Position DESC
FOR XML PATH(''), TYPE
).value('.','VARCHAR(MAX)'),1,1,'')+'))'
DECLARE #g GEOGRAPHY=GEOGRAPHY::STGeomFromText(#r,4326)
SELECT #g, #g.ToString()
Later edit:
There is a convention that says that a polygon should always be represented in counter-clockwise order. Imagine that you have a polygon in the shape of the equator; without this convention it would not be clear if the polygon represents the northern hemisphere or the southern hemisphere. See Spatial Data Types Overview in the Microsoft SQL Server documentation for details.
Additionally, there is a limitation in SQL Server when the compatibility level is 100 or below that each geography instance must fit inside a single hemisphere. If you are using SQL Server 2012 or later and you choose to use at least compatibility level 110, you can avoid the error message, but the polygon would represent the entire area that is outside of what you would normally think that the polygon represents.
If you use compatibility level is 100 or below, you could use a TRY/CATCH to detect the error and if it happens you should try reversing the polygon.
If you use compatibility level 110 or later, you can try to use STArea() to check if the polygon has a surface which is much bigger or much smaller than one hemisphere. If the area approaches 510100000000000 square meters (which approximately the area of the entire earth) then you should reverse the polygon.
I am new to sql, and attempting to use it to speed up spatial analysis on a set of ~1.2 million trips from a csv that contains the lat and lon for pickup and dropoff points.
What I am trying to do in plain English is:
select all trips that start in the area of interest (loaded into my database as a shapefile) into one table
select all trips that end in the area of interest into another
-perform a spatial join between these points and a shapefile of census tracks (which contains neighborhood names)
count by neighborhood name to list the most frequent origins/destination of trips to/ from the area of interest.
The code I am working with is below (If its helpful, NTA or neighborhood tabulation area, is the neighborhood name which I want to display in my table at the end of this operation) :
--Select all trips that end in project area
SELECT *
INTO end_PA
FROM trips, projarea
WHERE ST_Intersects(trips.dropoff, projarea.geom);
--for trips that end in project area - index by NTA of pick up point
ALTER TABLE end_PA ADD COLUMN GID SERIAL;
CREATE TABLE points_ct_end AS
SELECT nyct2010.ntacode as ct_nta, end_PA.gid as point_id
from nyct2010, end_PA WHERE ST_Intersects(nyct2010.geom , end_PA.pickup);
--Count most common NTA
--return count for each NAT as a csv
copy(
select count(ct_nta) from points_ct_end
group by ct_nta
order by count desc)
to 'C://TaxiData//Analysis//Trips_Arriving_LM.csv' DELIMITER ',' CSV HEADER;
However, I am having problems from the very start - ST_Intersects does not return any points within the area of interest!
Troubleshooting solutions I have tried thus far:
My first thought is that the points weren't in the correct SRID. When I created the 'dropoff' point I set the SRID to 4326. I tried both using ST_SetSRID to change the projection of both data sets to 4326, and manually re projecting the shapefiles to 4326 in ArcMap - but neither worked.
I plotted a small sample of the points from the 'trips' data set in Arc Map to ensure they were correctly projected and overlapping with the ProjArea shapefile. They are.
I imported the multipoint shapefile this created into my geo database to test if that worked with ST_Intersects. Nope.
I tried using ST_Within. This threw the error message:
ERROR: function st_within(character varying, geometry) does not exist
....
HINT: No function matches the given name and argument types. You
might need to add explicit type casts.
I am using Big SQL and postgres
Thanks!!
My first thought is that the points weren't in the correct SRID. When I created the 'dropoff' point I set the SRID to 4326. I tried both using ST_SetSRID to change the projection of both data sets to 4326, and manually re projecting the shapefiles to 4326 in ArcMap - but neither worked.
ST_SetSRID doesn't change the projection (reproject). It just changes the internal representation. This can totally screw everything up if the previous SRID matched the input data. You likely wanted ST_Transform().
There isn't enough information here to trouble shoot this problem. However, we can answer this...
ERROR: function st_within(character varying, geometry) does not exist
This simply means the first argument is not a geometery. Of course, we can't do anything with that at all because we don't have your query that you tried with ST_Within().
Your syntax for ST_Intersects() looks to be right. But, there simply isn't enough information provided to help. Show some schema and sample data.
I have a query: (This query uses geographical implementation of ST_Covers function)
SELECT ST_Covers(ST_GeoGraphyFromText('MULTIPOLYGON(((179 -89,179 89,-179 89,-179 -89,179 -89)))'),ST_GeographyFromText('POINT(20 30)'));
When I run this query it should return true but it returns false. I don't know whats wrong with PostGIS (or with this query)
and when i changes geographical implementation with geometrical one, and rearrange the query to be like below:
SELECT ST_Covers(ST_ASTEXT(ST_GeoGraphyFromText('MULTIPOLYGON(((179 -89,179 89,-179 89,-179 -89,179 -89)))')),text('POINT(20 30)'));
it works as it should, returning true:
I can use below query to be content but Problem is when database is too large it takes too much time
Please can someone tell me
how to make query 1 to work right (as intended, returning true), or
how to make query 2 work fast with large tables
(please do not suggest that i should remove *ST_GeoGraphyFromText('MULTIPOLYGON(((179 -89,179 89,-179 89,-179 -89,179 -89)* because it only represents geographical data that will be replaced by data from the column of table )
other values with which query 1 dont work are (5 5) (10 10) (-10 -10) and much more
The first query fails because you are using a geography type with something >180° wide. If it were something more realistic, such as 'MULTIPOLYGON(((100 0,100 50,0 50,0 0,100 0)))', it will return TRUE.
There is no direct way to find the maximum diameter of the exterior rings of the MultiPolygon geography types, but you can attempt to hunt these particular cases down with something like:
SELECT ST_XMax(geog::geometry) - ST_XMin(geog::geometry) AS width,
ST_YMax(geog::geometry) - ST_YMin(geog::geometry) AS height
FROM polygons
examine the ones that are > 180, and see if each of the parts are also > 180. If so, these should be regarded as invalid geographies.
The only reason why the second query returns TRUE is because ST_AsText converts to WKT, which is then re-interpreted back to WKB as geometry types (and implicitly calls ST_Covers(geometry, geometry), rather than ST_Covers(geography, geography)). This query is slow since it converts from WKB to WKT to WKB, with possible loss of precision in between conversions. A faster version of this is to cast the geography column to geometry using ::geometry, e.g.:
SELECT ST_Covers(geog::geometry, ST_SetSRID(ST_MakePoint(20, 30), 4326))
FROM polygons
Geometry types use simple "flat earth" Cartesian logic for ST_Covers, which is why you see TRUE for what you expect. Geography types use a different "round earth" logic, which uses more complicated spherical logic, but is easy to see if you have a globe handy.
I'm creating result paging based on first letter of certain nvarchar column and not the usual one, that usually pages on number of results.
And I'm not faced with a challenge whether to filter results using LIKE operator or equality (=) operator.
select *
from table
where name like #firstletter + '%'
vs.
select *
from table
where left(name, 1) = #firstletter
I've tried searching the net for speed comparison between the two, but it's hard to find any results, since most search results are related to LEFT JOINs and not LEFT function.
"Left" vs "Like" -- one should always use "Like" when possible where indexes are implemented because "Like" is not a function and therefore can utilize any indexes you may have on the data.
"Left", on the other hand, is function, and therefore cannot make use of indexes. This web page describes the usage differences with some examples. What this means is SQL server has to evaluate the function for every record that's returned.
"Substring" and other similar functions are also culprits.
Your best bet would be to measure the performance on real production data rather than trying to guess (or ask us). That's because performance can sometimes depend on the data you're processing, although in this case it seems unlikely (but I don't know that, hence why you should check).
If this is a query you will be doing a lot, you should consider another (indexed) column which contains the lowercased first letter of name and have it set by an insert/update trigger.
This will, at the cost of a minimal storage increase, make this query blindingly fast:
select * from table where name_first_char_lower = #firstletter
That's because most database are read far more often than written, and this will amortise the cost of the calculation (done only for writes) across all reads.
It introduces redundant data but it's okay to do that for performance as long as you understand (and mitigate, as in this suggestion) the consequences and need the extra performance.
I had a similar question, and ran tests on both. Here is my code.
where (VOUCHER like 'PCNSF%'
or voucher like 'PCLTF%'
or VOUCHER like 'PCACH%'
or VOUCHER like 'PCWP%'
or voucher like 'PCINT%')
Returned 1434 rows in 1 min 51 seconds.
vs
where (LEFT(VOUCHER,5) = 'PCNSF'
or LEFT(VOUCHER,5)='PCLTF'
or LEFT(VOUCHER,5) = 'PCACH'
or LEFT(VOUCHER,4)='PCWP'
or LEFT (VOUCHER,5) ='PCINT')
Returned 1434 rows in 1 min 27 seconds
My data is faster with the left 5. As an aside my overall query does hit some indexes.
I would always suggest to use like operator when the search column contains index. I tested the above query in my production environment with select count(column_name) from table_name where left(column_name,3)='AAA' OR left(column_name,3)= 'ABA' OR ... up to 9 OR clauses. My count displays 7301477 records with 4 secs in left and 1 second in like i.e where column_name like 'AAA%' OR Column_Name like 'ABA%' or ... up to 9 like clauses.
Calling a function in where clause is not a best practice. Refer http://blog.sqlauthority.com/2013/03/12/sql-server-avoid-using-function-in-where-clause-scan-to-seek/
Entity Framework Core users
You can use EF.Functions.Like(columnName, searchString + "%") instead of columnName.startsWith(...) and you'll get just a LIKE function in the generated SQL instead of all this 'LEFT' craziness!
Depending upon your needs you will probably need to preprocess searchString.
See also https://github.com/aspnet/EntityFrameworkCore/issues/7429
This function isn't present in Entity Framework (non core) EntityFunctions so I'm not sure how to do it for EF6.