Find minimum distance and update column with PostGIS - postgresql

I need help calculating the distance between two points in my PostGIS database.
The goal is to find, for each row in the "dots" table, the distance to the closest point in the "reflayers" table and save it in meters in the "dist_from_ref" column.
The dots table structure is:
CREATE TABLE dots
(
dot_id INT,
site_id INT,
latitude float ( 6 ),
longitude float ( 6 ),
rsrp float ( 6 ),
dist INT,
project_id INT,
dist_from_site INT,
geom geometry,
dist_from_ref INT
);
The reflayer structure is:
CREATE TABLE reflayers
(
layer_name varchar,
latitude float ( 6 ),
longitude float ( 6 ) ,
geom geometry
);
Does anyone have a solution that can update the "dist_from_ref" column with the minimal distance the query can find?
Edit:
UPDATE dots d
SET dist_from_ref = 100 * ROUND (1000 * ST_Distance(d.geom, r.geom))
FROM reflayers r
WHERE d.dist_from_ref IS NULL
AND r.geom = (SELECT r.geom
FROM reflayers r
ORDER BY ST_Distance(d.geom, r.geom) ASC LIMIT 1);
This query updates the column the way I want, but it gets stuck on my PostGIS server with 60K rows.
I ran it on 70 rows and it worked fine. Any suggestions to improve it?
Text table
dot_id | site_id | latitude | longitude | rsrp | project_id | dist_from_site | dist_from_ref | geom
--------+---------+-----------+-----------+--------+------------+----------------+---------------+----------------------------------------------------
1 | 42047 | 31.902857 | 34.919445 | -90.9 | 1 | 21 | 7200 | 0101000020E6100000F5F6E7A221E73F4041BCAE5FB0754140
2 | 42047 | 31.902857 | 34.919445 | -89.5 | 1 | 21 | 7200 | 0101000020E6100000F5F6E7A221E73F4041BCAE5FB0754140
3 | 42047 | 31.902857 | 34.919445 | -89.5 | 1 | 21 | 7200 | 0101000020E6100000F5F6E7A221E73F4041BCAE5FB0754140

Place the subquery in the SELECT clause and correlate it with each row of the outer query, e.g.
SELECT *,(
SELECT min(ST_Distance(d.geom, r.geom))
FROM reflayers r) as distance
FROM dots d;
To update, do the same:
UPDATE dots SET dist_from_ref = (
SELECT min(ST_Distance(dots.geom, r.geom))
FROM reflayers r);
Note: Depending on the table size, this operation can become very time-consuming. Since there is no key to join the two tables on, the query scans every record of reflayers for every record of dots in order to find the closest distance.
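For larger tables, a nearest-neighbour (KNN) search is usually a better fit than computing min() over every pair. The following is only a sketch under some assumptions: both geom columns hold lon/lat points in SRID 4326 (as the sample geom values suggest), the server runs PostGIS 2.2+ on PostgreSQL 9.5+, and reflayers_geom_gist is a made-up index name. Note that <-> on geometry orders by planar distance in degrees, so in rare cases the neighbour it picks may not be the true geodesic nearest; the stored value is computed on geography and is therefore in meters.

```sql
-- Assumption: geom is SRID 4326 (lon/lat). The index name is hypothetical.
CREATE INDEX IF NOT EXISTS reflayers_geom_gist
    ON reflayers USING GIST (geom);

-- For each dot, pick the nearest reference point with the KNN operator <->,
-- then store the geodesic distance to it, rounded to whole meters.
UPDATE dots d
SET dist_from_ref = (
    SELECT round(ST_Distance(d.geom::geography, r.geom::geography))::int
    FROM reflayers r
    ORDER BY d.geom <-> r.geom
    LIMIT 1
)
WHERE d.dist_from_ref IS NULL;
```

It is worth checking with EXPLAIN that the subquery actually uses the GiST index before running it on all 60K rows.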

Related

PostGIS detect crossing with timestamp

I need to store a series of GPS points with timestamps in the database (traces of various vehicles).
Initially I wanted to write something on my own, but that would involve a bit more computational power than just having it come as the result of a single query.
I explored a bit and came across PostGIS, but I'm not sure if it's suitable or possible to solve this problem.
The idea is to check whether two vehicles passed each other at the same time.
I have a table with the coordinates, each coordinate is in a separate row, and each row has a timestamp associated with it.
The table has the following schema (vehicle_id, latitude, longitude, timestamp).
So, given multiple coordinates of a vehicle, I need to check whether it has crossed paths with other vehicles at the same time. I found that I could use ST_MakeLine to create a line string from a sequence of GPS points, and saw that there are different intersection functions, but those require the coordinates to match perfectly. Here the offset may be, say, 30 meters, and the timestamp has to be taken into account.
Any answer would help.
Thanks
If I understood your use case correctly, I believe you don't need to create LineStrings to check whether trajectories intersect or get close at a certain point in time.
Data Sample:
CREATE TABLE t (vehicle_id INT, longitude NUMERIC, latitude NUMERIC, ts TIMESTAMP);
INSERT INTO t VALUES (1,1,1.1111,'2019-05-01 15:30:00'),
(1,1,2.1112,'2019-05-01 15:40:00'),
(1,1,3.1111,'2019-05-01 15:50:00'),
(2,2,2.1111,'2019-05-01 15:30:00'),
(2,1,2.1111,'2019-05-01 15:40:00'),
(2,1,4.1111,'2019-05-01 15:05:00');
As you can see in the sample data above, vehicle_id 1 and 2 are close (less than 30m) to each other at 2019-05-01 15:40:00, which can be found using a query like this:
SELECT
t1.vehicle_id,t2.vehicle_id,t1.ts,
ST_AsText(ST_MakePoint(t1.longitude,t1.latitude)::GEOGRAPHY) AS p1,
ST_AsText(ST_MakePoint(t2.longitude,t2.latitude)::GEOGRAPHY) AS p2,
ST_Distance(
ST_MakePoint(t1.longitude,t1.latitude)::GEOGRAPHY,
ST_MakePoint(t2.longitude,t2.latitude)::GEOGRAPHY) AS distance
FROM t t1, t t2
WHERE
t1.vehicle_id <> t2.vehicle_id AND
t1.ts = t2.ts AND
ST_Distance(
ST_MakePoint(t1.longitude,t1.latitude)::GEOGRAPHY,
ST_MakePoint(t2.longitude,t2.latitude)::GEOGRAPHY) <= 30
vehicle_id | vehicle_id | ts | p1 | p2 | distance
------------+------------+---------------------+-----------------+-----------------+-------------
1 | 2 | 2019-05-01 15:40:00 | POINT(1 2.1112) | POINT(1 2.1111) | 11.05757826
2 | 1 | 2019-05-01 15:40:00 | POINT(1 2.1111) | POINT(1 2.1112) | 11.05757826
(2 rows)
As you can see, the result is sort of duplicated, since 1 is close to 2 and 2 is close to 1 at the same time. You can correct this using DISTINCT ON(), but since I'm not familiar with your data I guess you'd better adjust this yourself.
Note that the data type is GEOGRAPHY and not GEOMETRY. That's because ST_Distance over geometries returns distances in the units of the coordinate system (degrees for longitude/latitude), whereas with geography the result is in meters.
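One possible way to avoid the mirrored rows is to require t1.vehicle_id < t2.vehicle_id instead of <>, so each pair is reported once. The sketch below also swaps the ST_Distance(...) <= 30 filter for ST_DWithin, which is the usual PostGIS predicate for "within N meters" on geography. It assumes the original table t with longitude and latitude columns; the aliases vehicle_a and vehicle_b are only illustrative names.

```sql
-- Each close pair appears once because of the "<" comparison on vehicle_id.
SELECT t1.vehicle_id AS vehicle_a,
       t2.vehicle_id AS vehicle_b,
       t1.ts
FROM t t1
JOIN t t2
  ON t1.ts = t2.ts
 AND t1.vehicle_id < t2.vehicle_id
WHERE ST_DWithin(
        ST_SetSRID(ST_MakePoint(t1.longitude, t1.latitude), 4326)::geography,
        ST_SetSRID(ST_MakePoint(t2.longitude, t2.latitude), 4326)::geography,
        30);  -- distance threshold in meters
```

Requiring exactly equal timestamps only works if all vehicles report on the same schedule; with real GPS traces a tolerance on ts would probably be needed.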
EDIT: To address a question mentioned in comments.
To avoid the overhead of having to create geography values at execution time, you might want to store the coordinates as geography in the first place. In that case the table would look like this:
CREATE TABLE t (vehicle_id INT, geom GEOGRAPHY, ts TIMESTAMP);
And you could populate it like this.
INSERT INTO t (vehicle_id, geom, ts)
VALUES (1,ST_MakePoint(1,1.1111),'2019-05-01 15:30:00');
In case you want to avoid having to populate the table again, you might want to just move the data to another column and get rid (if you wish) of latitude and longitude:
ALTER TABLE t ADD COLUMN geom GEOGRAPHY;
UPDATE t SET geom = ST_MakePoint(longitude,latitude);
ALTER TABLE t DROP COLUMN longitude, DROP COLUMN latitude;
CREATE INDEX idx_point ON t USING GIST(geom);
SELECT vehicle_id,ts,ST_AsText(geom) FROM t;
vehicle_id | ts | st_astext
------------+---------------------+-----------------
1 | 2019-05-01 15:30:00 | POINT(1 1.1111)
1 | 2019-05-01 15:40:00 | POINT(1 2.1112)
1 | 2019-05-01 15:50:00 | POINT(1 3.1111)
2 | 2019-05-01 15:30:00 | POINT(2 2.1111)
2 | 2019-05-01 15:40:00 | POINT(1 2.1111)
2 | 2019-05-01 15:05:00 | POINT(1 4.1111)
(6 rows)

How to calculate the nearest neighbor distance for 10000 points in a table

I am using PostgreSQL with the PostGIS extension.
I am able to compare one point with this query:
SELECT st_distance(geom, 'SRID=4326;POINT(12.601828337172 50.5173393068512)'::geometry) as d
FROM pointst1
ORDER BY d
but I want to compare each point not to one fixed point but to a whole column of points. I also want to do this with some sort of indexing so that it is computationally cheap, rather than 10000x10000 comparisons as in a cross join of the table with itself.
Create table:
create table pointst1
(
id integer not null
constraint pointst1_id_pk
primary key,
geom geometry(Point, 4326)
);
create unique index pointst1_id_uindex
on pointst1 (id);
create index geomidx
on pointst1 (geom);
Edit:
Refined query (comparing 10000 points with their nearest neighbor, but getting the point itself as the result, at distance 0, instead of the next nearest point):
select points.*,
p1.id as p1_id,
ST_Distance(geography(p1.geom), geography(points.geom)) as distance
from
(select distinct on(p2.geom)*
from pointst1 p2
where p2.id is not null) as points
cross join lateral
(select id, geom
from pointst1
order by points.geom <-> geom
limit 1) as p1;
Your query is already calculating the distance from the given geometry to all records in the table pointst1.
Considering these values ..
INSERT INTO pointst1 VALUES (1,'SRID=4326;POINT(16.19 48.21)'),
(2,'SRID=4326;POINT(18.96 47.50)'),
(3,'SRID=4326;POINT(13.47 52.52)'),
(4,'SRID=4326;POINT(-3.70 40.39)');
... if you run your query, it will already calculate the distance from all points in the table:
SELECT ST_Distance(geom, 'SRID=4326;POINT(12.6018 50.5173)'::geometry) as d
FROM pointst1
ORDER BY d
d
------------------
2.1827914536208
4.26600662563949
7.03781262396208
19.1914274750473
(4 rows)
Change your index to GIST, which is the most suitable for geometry data:
create index geomidx on pointst1 using GIST (geom);
Just note that an index won't speed up this query of yours, since you're doing a full scan. But as soon as you start playing more in the where clause, you might see some improvement.
EDIT:
WITH j AS (SELECT id AS id2, geom AS geom2 FROM pointst1)
SELECT id,j.id2,ST_Distance(geom, j.geom2) AS d
FROM pointst1,j
WHERE id <> j.id2
ORDER BY id,id2
id | id2 | d
----+-----+------------------
1 | 2 | 2.85954541841881
1 | 3 | 5.0965184194703
1 | 4 | 21.3720495039666
2 | 1 | 2.85954541841881
2 | 3 | 7.43911957156222
2 | 4 | 23.7492673571207
3 | 1 | 5.0965184194703
3 | 2 | 7.43911957156222
3 | 4 | 21.0225069865609
4 | 1 | 21.3720495039666
4 | 2 | 23.7492673571207
4 | 3 | 21.0225069865609
(12 rows)
Removing duplicate distances:
SELECT DISTINCT ON(d) * FROM (
WITH j AS (SELECT id AS id2, geom AS geom2 FROM pointst1)
SELECT id,j.id2,ST_Distance(geom, j.geom2) AS d
FROM pointst1,j
WHERE id <> j.id2
ORDER BY id,id2) AS j
id | id2 | d
----+-----+------------------
1 | 2 | 2.85954541841881
3 | 1 | 5.0965184194703
3 | 2 | 7.43911957156222
4 | 3 | 21.0225069865609
4 | 1 | 21.3720495039666
2 | 4 | 23.7492673571207
(6 rows)
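For the refined query in the question, which returns each point itself at distance 0, one option is to exclude the point's own id inside the lateral subquery. This is a sketch that assumes the GiST index on geom suggested above and SRID 4326 points; the aliases p, q, nn and distance_m are illustrative.

```sql
-- For every point p, find the single nearest other point via the KNN operator.
SELECT p.id,
       nn.id AS nearest_id,
       ST_Distance(p.geom::geography, nn.geom::geography) AS distance_m
FROM pointst1 p
CROSS JOIN LATERAL (
    SELECT q.id, q.geom
    FROM pointst1 q
    WHERE q.id <> p.id          -- skip the point itself
    ORDER BY p.geom <-> q.geom  -- nearest first
    LIMIT 1
) AS nn;
```

This runs one index-assisted lookup per point instead of comparing all 10000x10000 pairs.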

Remove duplicate with lower Date from SQL result

I have following table:
CREATE TABLE Kundendaten (
beschreiben_knr INTEGER REFERENCES Kunde(knr) DEFERRABLE INITIALLY DEFERRED,
erstelldatum DATE,
anschrift VARCHAR(40),
sonderrabat INTEGER,
PRIMARY KEY (erstelldatum, beschreiben_knr)
);
If I run this query:
select * from Kundendaten ORDER BY erstelldatum DESC;
I get:
beschreiben_knr | erstelldatum | anschrift | sonderrabat
-----------------+--------------+---------------+-------------
1 | 2015-11-01 | Winkelgasse 5 | 0
2 | 2015-11-01 | Badeteich 7 | 10
3 | 2015-11-01 | Senfgasse 7 | 15
1 | 2015-10-30 | Sonnenweg 3 | 5
But I need only the entry with the latest date if there is more than one. In this case the last row should not appear.
How can I achieve this in PostgreSQL?
You want something like WHERE erstelldatum = MAX(erstelldatum), but that doesn't work because aggregate functions are not allowed in a WHERE clause. You can use a subquery to get the newest date instead.
SELECT *
FROM Kundendaten
WHERE erstelldatum = (
SELECT MAX(erstelldatum) FROM Kundendaten
);
(SQL Fiddle)
Postgres will optimize that subquery so it is only run once, but you'll want to make sure erstelldatum is indexed.
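If the intent is instead to keep the newest row per customer (so that a customer whose latest entry is older than the overall newest date still shows up), DISTINCT ON is a common PostgreSQL idiom for that. A minimal sketch, assuming beschreiben_knr identifies the customer:

```sql
-- Keeps one row per beschreiben_knr: the one with the latest erstelldatum.
SELECT DISTINCT ON (beschreiben_knr) *
FROM Kundendaten
ORDER BY beschreiben_knr, erstelldatum DESC;
```

With the sample data this also returns the three rows dated 2015-11-01 and drops the 2015-10-30 row for customer 1.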

way to calculate sum of two corresponded columns in postgresql

I have two tables called "incoming" and "orders", and I want to create a view called "stock" that is produced from the data in incoming and orders.
CREATE TABLE incoming
(
id serial NOT NULL,
model integer,
size integer,
color integer,
price real,
quanity integer,
CONSTRAINT pk PRIMARY KEY (id),
CONSTRAINT "incoming_model_size_color_key" UNIQUE (model, size, color)
);
CREATE TABLE orders
(
id serial NOT NULL,
model integer,
size integer,
color integer,
price real,
quanity integer,
Comenttext text,
CONSTRAINT pk_orders PRIMARY KEY (id)
);
For now I have this dirty solution:
CREATE OR REPLACE VIEW stock AS
WITH total_orders AS (
SELECT orders.model,
orders.size,
orders.color,
sum(orders.quanity) AS sum
FROM orders
GROUP BY orders.color, orders.size, orders.model
)
SELECT incoming.model,
incoming.size,
incoming.color,
incoming.quanity - (( SELECT
CASE count(*)
WHEN 1 THEN ( SELECT total_orders_1.sum
FROM total_orders total_orders_1
WHERE incoming.model = total_orders_1.model AND incoming.size = total_orders_1.size)
ELSE 0::bigint
END AS "case"
FROM total_orders
WHERE incoming.model = total_orders.model AND incoming.size=total_orders.size)) AS quanity
FROM incoming;
How can I write this in a clearer and simpler way?
Examples:
select * from incoming
id | model | size | color | price | quanity
----+-------+------+-------+-------+--------
1 | 1 | 6 | 5 | 550 | 15
2 | 1 | 5 | 5 | 800 | 20
select * from orders
id | model | size | color | price | quanity |
----+-------+------+-------+-------+---------+
1 | 1 | 6 | 5 | 1000 | 1 |
2 | 1 | 6 | 5 | 1000 | 2 | -- sum is 3
select * from stock
model | size | color | quanity
-------+------+-------+----------
1 | 6 | 5 | 12 --= 15 - 3 !! excellent
1 | 5 | 5 | 20 -- has no orders yet
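You just need to left join on the aggregated orders (note that the queries below spell the column quantity, whereas the schema in the question spells it quanity):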
select i.model, i.size, i.color, i.quantity,
o.qty as ordered,
i.quantity - coalesce(o.qty, 0) as quantity_on_stock
from incoming i
left join (
select model, size, color, sum(quantity) as qty
from orders
group by model, size, color
) o on (o.model, o.size, o.color) = (i.model, i.size, i.color);
SQLFiddle: http://sqlfiddle.com/#!15/7fbec/2
When using your CTE as base, then you wind up with this:
WITH total_orders AS (
SELECT orders.model,
orders.size,
orders.color,
sum(orders.quantity) AS sum
FROM orders
GROUP BY color, size, model
)
SELECT i.model,
i.size,
i.color,
i.quantity - coalesce(tot.sum, 0) AS quanity
FROM incoming i
LEFT JOIN total_orders tot on (tot.model, tot.size, tot.color) = (i.model, i.size, i.color);
Whether the CTE or the derived table (the first solution) is faster is something you need to test.
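Since the question asks for a view, the left join can be wrapped directly in the view definition. This sketch uses the column spelling from the question's schema (quanity) and keeps the same four output columns as the original view, which CREATE OR REPLACE VIEW requires; the alias qty is illustrative.

```sql
CREATE OR REPLACE VIEW stock AS
SELECT i.model,
       i.size,
       i.color,
       -- coalesce turns "no orders yet" (NULL) into 0 before subtracting
       i.quanity - coalesce(o.qty, 0) AS quanity
FROM incoming i
LEFT JOIN (
    SELECT model, size, color, sum(quanity) AS qty
    FROM orders
    GROUP BY model, size, color
) o ON (o.model, o.size, o.color) = (i.model, i.size, i.color);
```

Unlike the original view, this matches on color as well as model and size.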

How do I get min, median and max from my query in postgresql?

I have written a query in which one column is a month. From that I have to get min month, max month, and median month. Below is my query.
select ext.employee,
pl.fromdate,
ext.FULL_INC as full_inc,
prevExt.FULL_INC as prevInc,
(extract(year from age (pl.fromdate))*12 +extract(month from age (pl.fromdate))) as month,
case
when prevExt.FULL_INC is not null then (ext.FULL_INC -coalesce(prevExt.FULL_INC,0))
else 0
end as difference,
(case when prevExt.FULL_INC is not null then (ext.FULL_INC - prevExt.FULL_INC) / prevExt.FULL_INC*100 else 0 end) as percent
from pl_payroll pl
inner join pl_extpayfile ext
on pl.cid = ext.payrollid
and ext.FULL_INC is not null
left outer join pl_extpayfile prevExt
on prevExt.employee = ext.employee
and prevExt.cid = (select max (cid) from pl_extpayfile
where employee = prevExt.employee
and payrollid = (
select max(p.cid)
from pl_extpayfile,
pl_payroll p
where p.cid = payrollid
and pl_extpayfile.employee = prevExt.employee
and p.fromdate < pl.fromdate
))
and coalesce(prevExt.FULL_INC, 0) > 0
where ext.employee = 17
and (exists (
select employee
from pl_extpayfile preext
where preext.employee = ext.employee
and preext.FULL_INC <> ext.FULL_INC
and payrollid in (
select cid
from pl_payroll
where cid = (
select max(p.cid)
from pl_extpayfile,
pl_payroll p
where p.cid = payrollid
and pl_extpayfile.employee = preext.employee
and p.fromdate < pl.fromdate
)
)
)
or not exists (
select employee
from pl_extpayfile fext,
pl_payroll p
where fext.employee = ext.employee
and p.cid = fext.payrollid
and p.fromdate < pl.fromdate
and fext.FULL_INC > 0
)
)
order by employee,
ext.payrollid desc
If that is not possible, is it at least possible to get the max month and min month?
To calculate the median in PostgreSQL, simply take the 50th percentile (no need to add extra functions or anything):
SELECT PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY x) FROM t;
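The same call can sit next to min and max in one query. A small sketch using the placeholder table t and column x from the line above (the output aliases are illustrative); percentile_cont is available from PostgreSQL 9.4 onwards:

```sql
SELECT min(x) AS min_x,
       percentile_cont(0.5) WITHIN GROUP (ORDER BY x) AS median_x,
       max(x) AS max_x
FROM t;
```

To apply this to the question, the original query would go into a subquery or CTE in place of t, with x replaced by its month column.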
You want the aggregate functions named min and max. See the PostgreSQL documentation and tutorial:
http://www.postgresql.org/docs/current/static/tutorial-agg.html
http://www.postgresql.org/docs/current/static/functions-aggregate.html
There's no built-in aggregate named median in PostgreSQL (versions before 9.4 also lack percentile_cont); however, one has been implemented and contributed to the wiki:
http://wiki.postgresql.org/wiki/Aggregate_Median
It's used the same way as min and max once you've loaded it. Being written in PL/PgSQL it'll be a fair bit slower, but there's even a C version there that you could adapt if speed was vital.
UPDATE After comment:
It sounds like you want to show the statistical aggregates alongside the individual results. You can't do this with a plain aggregate function because you can't reference columns not in the GROUP BY in the result list.
You will need to fetch the stats from subqueries, or use your aggregates as window functions.
Given dummy data:
CREATE TABLE dummystats ( depname text, empno integer, salary integer );
INSERT INTO dummystats(depname,empno,salary) VALUES
('develop',11,5200),
('develop',7,4200),
('personell',2,5555),
('mgmt',1,9999999);
... and after adding the median aggregate from the PG wiki:
You can do this with an ordinary aggregate:
regress=# SELECT min(salary), max(salary), median(salary) FROM dummystats;
min | max | median
------+---------+----------------------
4200 | 9999999 | 5377.5000000000000000
(1 row)
but not this:
regress=# SELECT depname, empno, min(salary), max(salary), median(salary)
regress-# FROM dummystats;
ERROR: column "dummystats.depname" must appear in the GROUP BY clause or be used in an aggregate function
because it doesn't make sense in the aggregation model to show the averages alongside individual values. You can show groups:
regress=# SELECT depname, min(salary), max(salary), median(salary)
regress-# FROM dummystats GROUP BY depname;
depname | min | max | median
-----------+---------+---------+-----------------------
personell | 5555 | 5555 | 5555.0000000000000000
develop | 4200 | 5200 | 4700.0000000000000000
mgmt | 9999999 | 9999999 | 9999999.000000000000
(3 rows)
... but it sounds like you want the individual values. For that, you must use a window, a feature new in PostgreSQL 8.4.
regress=# SELECT depname, empno,
min(salary) OVER (),
max(salary) OVER (),
median(salary) OVER ()
FROM dummystats;
depname | empno | min | max | median
-----------+-------+------+---------+-----------------------
develop | 11 | 4200 | 9999999 | 5377.5000000000000000
develop | 7 | 4200 | 9999999 | 5377.5000000000000000
personell | 2 | 4200 | 9999999 | 5377.5000000000000000
mgmt | 1 | 4200 | 9999999 | 5377.5000000000000000
(4 rows)
See also:
http://www.postgresql.org/docs/current/static/tutorial-window.html
http://www.postgresql.org/docs/current/static/functions-window.html
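Without the wiki's median aggregate, a similar result can be obtained with the built-in percentile_cont. Ordered-set aggregates like percentile_cont cannot be used with OVER (), so this sketch computes the statistics once in a subquery and cross joins them onto every row of the dummystats table from above; the alias s and the *_salary column names are illustrative.

```sql
SELECT d.depname,
       d.empno,
       s.min_salary,
       s.max_salary,
       s.median_salary
FROM dummystats d
CROSS JOIN (
    -- One-row subquery holding the overall statistics
    SELECT min(salary) AS min_salary,
           max(salary) AS max_salary,
           percentile_cont(0.5) WITHIN GROUP (ORDER BY salary) AS median_salary
    FROM dummystats
) s;
```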
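One more option for the median (t and x are placeholders for your table and column; for an even number of rows this returns the upper of the two middle values):
SELECT x
FROM t
ORDER BY x
LIMIT 1 OFFSET (SELECT count(*) FROM t) / 2;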
To find the median:
For instance, suppose the table has 6000 rows. The median is the middle value, so first take the first half of the rows in sorted order. Half of 6000 is 3000; take 3001 rows so that both middle values are included.
SELECT *
FROM (SELECT column_name
FROM Table_name
ORDER BY column_name
LIMIT 3001) AS Table1
ORDER BY column_name DESC -- reversing the order (Z-A) puts the last rows first, so
-- LIMIT 2 returns the two middle values, i.e. the 3000th and 3001st rows
-- of the 6000
LIMIT 2;