I have a use case where I'm given four sets of geospatial points that represent four rectangles. I have a table with a point (just a latitude and a longitude). My task is to check whether the point in the table lies within any of the four rectangles.
This should be done in PySpark. I tried this using UDFs, but it takes a long time because the main table contains a lot of rows. Could anyone help me solve this problem efficiently in PySpark?
Right now I use Shapely to create the Points and Polygons.
We can think of each rectangle as the space bounded by (min_latitude, max_latitude) and (min_longitude, max_longitude). Let's assume your point of interest is (lat, lon). For each rectangle you then need to check whether min_latitude <= lat <= max_latitude and min_longitude <= lon <= max_longitude. This can be done with native Spark functions; no UDF is required. Also, before performing these operations, you can select only the required columns (dataframe.select(cols...)) from your original dataframe to discard redundant information.
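As a rough illustration of the check described above, here is a minimal plain-Python sketch. It assumes the rectangles are axis-aligned in latitude/longitude and do not cross the antimeridian; the names bounds_from_corners and in_any_rectangle are hypothetical. In PySpark the same comparisons would be written as column expressions (for example with Column.between, combined with & and |) instead of a UDF.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]                  # (lat, lon)
Bounds = Tuple[float, float, float, float]   # (min_lat, max_lat, min_lon, max_lon)


def bounds_from_corners(corners: Iterable[Point]) -> Bounds:
    """Reduce the corner points of an axis-aligned rectangle to min/max bounds."""
    lats, lons = zip(*corners)
    return min(lats), max(lats), min(lons), max(lons)


def in_any_rectangle(lat: float, lon: float, rectangles: List[Bounds]) -> bool:
    """Return True if (lat, lon) falls inside at least one rectangle."""
    return any(
        min_lat <= lat <= max_lat and min_lon <= lon <= max_lon
        for min_lat, max_lat, min_lon, max_lon in rectangles
    )


# Hypothetical example data: one rectangle given by its four corners.
rectangles = [
    bounds_from_corners([(10.0, 20.0), (10.0, 30.0), (15.0, 30.0), (15.0, 20.0)])
]
inside = in_any_rectangle(12.0, 25.0, rectangles)
outside = in_any_rectangle(16.0, 25.0, rectangles)
```

Because each test is only four comparisons per rectangle, it maps directly onto built-in column operations and avoids the per-row Python overhead of a UDF.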
Related
I'm using Postgres v13.
I couldn't find a clear example of how to achieve this basic calculation. I'm totally confused about how to handle geometry and geography points.
I have a table that stores points in the format geography(Point, 4326) alongside their timestamp.
I need to obtain the total distance in meters between timestamps A and B, and I need it to be super specific using spherical calculations. To be clear, there may be N points.
So far I've been using this query, but the distance is way off for long distances and I don't understand if there is any difference in creating a line using geometry points or geography points:
SELECT ST_Length(ST_MakeLine(lh.position::geometry order by report_time), TRUE)
FROM location_history AS lh
WHERE lh.device_id = 1
AND lh.report_time BETWEEN '2022-10-10T13:25:00.000Z' AND '2022-10-11T13:25:00.000Z'
GROUP BY lh.device_id;
Does this query make sense? ST_MakeLine only accepts geometry points and confuses me. Is there another way of creating a line with geography points?
ST_Distance is used in every example I could find, but it just compares 2 points!
Thanks!
I have to store logical 3D coordinates of objects in a Postgres database. Each object typically has 50-1000 points and will probably never exceed 10000.
My intention is to use a column of type real[][] in Postgres.
I also looked at the PostGIS extension and wonder whether it is a suitable solution, but I could not answer several questions myself:
1. Which spatial reference should I use? I only need logical x, y, z coordinates. Could I specify a left-handed or right-handed coordinate system? This is the part that confuses me most.
2. How should I organize the data? A line geometry seems the natural way to me.
3. Would it be possible to find the distance between two points in the array (line geometry)?
It would be natural to use the PostGIS geometry(pointz)[] as data type, an array of three-dimensional points.
Here is an example that shows a constant of that type and calculates the distance between the points:
WITH x(p) AS (
SELECT '{POINT Z (1 2 3):POINT Z (3 0 2)}'::geometry(pointz)[]
)
SELECT st_3ddistance(p[1], p[2]) FROM x;
st_3ddistance
---------------
3
(1 row)
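For comparison, the same calculation can be sketched outside the database. This is only an illustration in plain Python, assuming the points are stored as (x, y, z) tuples; the name distance_3d is hypothetical.

```python
import math

# The two points from the array constant above, as (x, y, z) tuples.
points = [(1.0, 2.0, 3.0), (3.0, 0.0, 2.0)]


def distance_3d(a, b):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


# sqrt(2^2 + 2^2 + 1^2) = sqrt(9) = 3.0, matching the st_3ddistance result.
d = distance_3d(points[0], points[1])
```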
I know that it might be a dumb question, but I've been searching for some time and can't find a proper answer.
I have a PostgreSQL database with PostGIS installed. In one table I have entries with lon and lat (let's assume the columns are place, lon, lat).
What should I add to this table, and/or what procedure can I use, to be able to calculate the distance between those places in meters?
I've read that it is necessary to know the SRID of a place to be able to calculate distances. Is it possible to not know/use it and still calculate the distance in meters based only on lon and lat?
Short answer:
Just convert your x,y values on the fly using ST_MakePoint (mind the overhead!) and calculate the distance from a given point. The default SRS will be WGS84:
SELECT ST_Distance(ST_MakePoint(lon,lat)::GEOGRAPHY,
ST_MakePoint(23.73,37.99)::GEOGRAPHY) FROM places;
Using GEOGRAPHY you will get the result in meters, while using GEOMETRY will give it in degrees. Of course, knowing the SRS of coordinate pairs is imperative for calculating distances, but if you have control of the data quality and the coordinates are consistent (in this case, omitting the SRS), there is not much to worry about. It will start getting tricky if you're planning to perform operations using external data whose SRS you don't know and which might differ from yours.
Long answer:
Well, if you're using PostGIS you shouldn't be storing x,y in separate columns in the first place. You can easily add a geometry / geography column by doing something like this.
This is your table ...
CREATE TABLE places (place TEXT, lon NUMERIC, lat NUMERIC);
Containing the following data ..
INSERT INTO places VALUES ('Budva',18.84,42.92),
('Ohrid',20.80,41.14);
Here is how you add a geography type column:
ALTER TABLE places ADD COLUMN geo GEOGRAPHY;
Once your column is added, this is how you convert your coordinates to geography / geometry and update your table:
UPDATE places SET geo = ST_MakePoint(lon,lat);
To compute the distance you just need to use the function ST_Distance, as follows (distance in meters):
SELECT ST_Distance(geo,ST_MakePoint(23.73,37.99)) FROM places;
st_distance
-----------------
686560.16822422
430876.07368955
(2 rows)
If you have your location parameter in WKT, you can also use:
SELECT ST_Distance(geo,'POINT(23.73 37.99)') FROM places;
st_distance
-----------------
686560.16822422
430876.07368955
(2 rows)
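As a rough cross-check of distances like these without PostGIS, a great-circle distance can be computed with the haversine formula. The sketch below is plain Python and assumes a spherical Earth with a mean radius of 6371008.8 m, so it will not match the spheroid-based GEOGRAPHY result exactly; the function name haversine_m is hypothetical.

```python
import math

EARTH_RADIUS_M = 6371008.8  # assumed mean Earth radius in meters (sphere, not spheroid)


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Budva from the sample data against the reference point used in the queries.
d = haversine_m(18.84, 42.92, 23.73, 37.99)
```

The spherical result should land within about 1% of the 686560 m that ST_Distance reports for the first row; the remaining difference comes from the spheroid that GEOGRAPHY uses.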
I am new to PostGIS. I am looking to have a simple bounded (-200 < x, y, z < 200) data set of 1,000,000 points on a plain XYZ graph. The only queries I need are a fast K nearest neighbors search and a search for all neighbors whose distance is less than N. It seems that PostGIS has a LOT of extra features that I do not need.
What SRID do I need? One that is not concerned with feet or meters.
Am I right that I need to use the function
ST_3DDistance to query for the K nearest neighbors with LIMIT K? or with a maximum distance of N.
To add a column, I need to use SELECT AddGeometryColumn ('my_schema','my_spatial_table','geom_c',4326,'POINT',3, false);. Is that correct?
What is the difference between a 3D point and a PointZ?
Will AddGeometryColumn ensure that my distance query is fast?
Is PostGIS the right choice for my use case? The rest of my DB is already integrated with PostgreSQL
Thanks!
What SRID do I need? One that is not concerned with feet or meters.
You don't "need" an SRID. If your data is in a coordinate system, find the right SRID; otherwise, use 0.
Am I right that I need to use the function ST_3DDistance to query for the K nearest neighbors with LIMIT K? or with a maximum distance of N.
Yes, you're right.
To add a column, I need to use SELECT AddGeometryColumn ('my_schema','my_spatial_table','geom_c',4326,'POINT',3, false);. Is that correct?
Yes, but I'd use 0 for srid, instead of 4326 (that is for degrees).
What is the difference between a 3D point and a PointZ?
PointZ is a 3d Point.
Will AddGeometryColumn ensure that my distance query is fast?
AddGeometryColumn will just add some constraints to the table, ensuring that the geometries you insert are coherent with the column definition.
I don't think you need it, but you could try adding an index to your geometry column using CREATE INDEX index_name ON schema.table USING gist (geom_col);
Is PostGIS the right choice for my use case? The rest of my DB is already integrated with PostgreSQL
I think it is the easiest way, not necessarily the "right" one.
You could also implement a distance function without PostGIS, storing the three coordinates in three numeric fields.
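To make the last suggestion concrete, here is a minimal plain-Python sketch of both queries over points held as (x, y, z) tuples. It is a brute-force linear scan, not an indexed search, so it only illustrates the logic; the names dist3, k_nearest and within_radius are hypothetical.

```python
import heapq
import math


def dist3(a, b):
    """Euclidean distance between two (x, y, z) points."""
    return math.sqrt((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 + (a[2] - b[2]) ** 2)


def k_nearest(points, query, k):
    """The k points closest to query, nearest first (linear scan)."""
    return heapq.nsmallest(k, points, key=lambda p: dist3(p, query))


def within_radius(points, query, n):
    """All points whose distance to query is strictly less than n."""
    return [p for p in points if dist3(p, query) < n]


# Hypothetical sample data inside the -200..200 bounds from the question.
points = [(0, 0, 0), (1, 0, 0), (0, 5, 0), (10, 10, 10)]
nearest_two = k_nearest(points, (0, 0, 0), 2)
close_points = within_radius(points, (0, 0, 0), 6)
```

Every query here touches all points, which is why a spatial index (such as the GiST index mentioned above) matters once the table holds a million rows.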
I've read several questions + answers here on SO about this topic, but I can't understand which is the common way (if there is one...) to find all the points within a "circle" having a certain radius, centered on a given point.
In particular I found two ways that seem the most convincing:
select id, point
from my_table
where st_Distance(point, st_PointFromText('POINT(-116.768347 33.911404)', 4326)) < 10000;
and:
select id, point
from my_table
where st_Within(point, st_Buffer(st_PointFromText('POINT(-116.768347 33.911404)', 4326), 10000));
Which is the most efficient way to query my database? Is there some other option to consider?
Creating a buffer to find the points is a definite no-no because of (1) the overhead of creating the geometry that represents the buffer, and (2) the point-in-polygon calculation is much less efficient than a simple distance calculation.
You are obviously working with (longitude, latitude) data, so you should convert that to an appropriate Cartesian coordinate system which has the same unit of measure as your distance of 10,000. If that distance is in meters, then you could also cast the point from the table to geography and calculate directly on the (long, lat) coordinates. Since you only want to identify the points that are within the specified distance, you could use the ST_DWithin() function with calculation on the sphere for added speed (don't do this at very high latitudes or with very long distances):
SELECT id, point
FROM my_table
WHERE ST_DWithin(point::geography,
ST_GeogFromText('POINT(-116.768347 33.911404)'),
10000, false);
I have used following query
SELECT *, ACOS(SIN(latitude) * SIN(Lat) + COS(latitude) * COS(Lat) * COS(longitude - Long)) * 6380 AS distance FROM Table_tab WHERE ACOS(SIN(latitude) * SIN(Lat) + COS(latitude) * COS(Lat) * COS(longitude - Long)) * 6380 < 10
In the above query, latitude and longitude come from the database, and Lat and Long are the coordinates of the point we want to search around. The trigonometric functions expect radians, so values stored in degrees need to be converted first (for example with RADIANS()).
How it works: it calculates the distance (in km) between every point in the database and the search point, checks whether that distance is less than 10 km, and returns all coordinates within 10 km.
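The formula behind this query is the spherical law of cosines. A minimal plain-Python sketch of it follows, assuming inputs in degrees and the same Earth radius of 6380 km that the query uses; the function name law_of_cosines_km is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6380.0  # same radius as in the query above


def law_of_cosines_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km via the spherical law of cosines (degrees in)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon1 - lon2)
    cos_angle = math.sin(p1) * math.sin(p2) + math.cos(p1) * math.cos(p2) * math.cos(dlon)
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.acos(cos_angle) * EARTH_RADIUS_KM


# One degree of latitude along a meridian.
one_degree_km = law_of_cosines_km(0.0, 0.0, 1.0, 0.0)
```

The clamp is a precaution that the SQL version lacks: for two nearly identical points the cosine can come out marginally above 1, which makes ACOS fail or return NULL depending on the database.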
I do not know how PostGIS does it best, but in general:
Depending on your data, it might be best to first search in a square bounding box (one that contains the search circle) in order to eliminate a lot of candidates. This should be extremely fast, as you can use simple range operators on lon/lat, which are ideally indexed properly for this.
In a second step, search using the radius.
Also, if the maximum number of points you need is relatively low and you know you have a lot of candidates, you may simply make a first "optimistic" attempt with a box inside your circle; if you find enough points, you are done!
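The two-step search described above can be sketched in plain Python. This is an illustration only: it assumes a spherical Earth of radius 6371 km, a search area away from the poles and the antimeridian, and points stored as (lat, lon) tuples in degrees. The names bounding_box, haversine_km and points_within are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def bounding_box(lat, lon, radius_km):
    """Lat/lon box that fully contains the circle (not valid near the poles)."""
    delta = radius_km / EARTH_RADIUS_KM  # angular radius in radians
    dlat = math.degrees(delta)
    dlon = math.degrees(math.asin(math.sin(delta) / math.cos(math.radians(lat))))
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def points_within(points, lat, lon, radius_km):
    """Step 1: cheap box filter. Step 2: exact radius test on the survivors."""
    min_lat, max_lat, min_lon, max_lon = bounding_box(lat, lon, radius_km)
    candidates = [p for p in points
                  if min_lat <= p[0] <= max_lat and min_lon <= p[1] <= max_lon]
    return [p for p in candidates if haversine_km(lat, lon, p[0], p[1]) <= radius_km]


# Hypothetical sample points around the search center used earlier in the thread.
center = (33.911404, -116.768347)
points = [center, (33.95, -116.77), (33.99, -116.67), (40.0, -100.0)]
hits = points_within(points, center[0], center[1], 10.0)
```

In this sample the third point passes the box filter but fails the radius test, which is exactly the case the second step exists for. In a database, the box filter corresponds to range conditions on indexed lat and lon columns.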