I need to store a series of GPS points with timestamps in the database (traces of various vehicles).
Initially I wanted to write something on my own, but it would involve more computational power than just having it come as the result of a single query.
I explored a bit and came across PostGIS, but I'm not sure if it's suitable or possible to solve this problem.
The idea is to check whether two vehicles passed each other at the same time.
I have a table with the coordinates, each coordinate is in a separate row, and each row has a timestamp associated with it.
The table has the following schema: (vehicle_id, latitude, longitude, timestamp).
So, given multiple coordinates of a vehicle, I need to check whether it has crossed paths with other vehicles at the same time. I found that I could use ST_MakeLine to create a LineString from a sequence of GPS points, and saw that there are various intersection functions, but those require the coordinates to match exactly, whereas here the offset may be, say, 30 meters, and the timestamp has to be taken into account.
Any answer would help.
Thanks
If I understood your use case correctly, I believe you don't need to create LineStrings to check whether your trajectories intersect or get close to each other at a certain point in time.
Data Sample:
CREATE TABLE t (vehicle_id INT, longitude NUMERIC, latitude NUMERIC, ts TIMESTAMP);
INSERT INTO t VALUES (1,1,1.1111,'2019-05-01 15:30:00'),
(1,1,2.1112,'2019-05-01 15:40:00'),
(1,1,3.1111,'2019-05-01 15:50:00'),
(2,2,2.1111,'2019-05-01 15:30:00'),
(2,1,2.1111,'2019-05-01 15:40:00'),
(2,1,4.1111,'2019-05-01 15:05:00');
As you can see in the sample data above, vehicle_id 1 and 2 are close (less than 30m) to each other at 2019-05-01 15:40:00, which can be found using a query like this:
SELECT
t1.vehicle_id,t2.vehicle_id,t1.ts,
ST_AsText(ST_MakePoint(t1.longitude,t1.latitude)::GEOGRAPHY) AS p1,
ST_AsText(ST_MakePoint(t2.longitude,t2.latitude)::GEOGRAPHY) AS p2,
ST_Distance(
ST_MakePoint(t1.longitude,t1.latitude)::GEOGRAPHY,
ST_MakePoint(t2.longitude,t2.latitude)::GEOGRAPHY) AS distance
FROM t t1, t t2
WHERE
t1.vehicle_id <> t2.vehicle_id AND
t1.ts = t2.ts AND
ST_Distance(
ST_MakePoint(t1.longitude,t1.latitude)::GEOGRAPHY,
ST_MakePoint(t2.longitude,t2.latitude)::GEOGRAPHY) <= 30
vehicle_id | vehicle_id | ts | p1 | p2 | distance
------------+------------+---------------------+-----------------+-----------------+-------------
1 | 2 | 2019-05-01 15:40:00 | POINT(1 2.1112) | POINT(1 2.1111) | 11.05757826
2 | 1 | 2019-05-01 15:40:00 | POINT(1 2.1111) | POINT(1 2.1112) | 11.05757826
(2 rows)
As you can see, the result is somewhat duplicated, since 1 is close to 2 and 2 is close to 1 at the same time. You can correct this using DISTINCT ON(), but since I'm not familiar with your data, you may prefer to adjust this yourself.
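As an alternative to DISTINCT ON(), one simple way to report each pair only once (just a sketch of the idea, assuming vehicle_id values are comparable integers) is to require an ordering between the two ids:
SELECT
    t1.vehicle_id, t2.vehicle_id, t1.ts,
    ST_Distance(
        ST_MakePoint(t1.longitude,t1.latitude)::GEOGRAPHY,
        ST_MakePoint(t2.longitude,t2.latitude)::GEOGRAPHY) AS distance
FROM t t1, t t2
WHERE
    t1.vehicle_id < t2.vehicle_id AND   -- '<' instead of '<>', so each pair appears once
    t1.ts = t2.ts AND
    ST_Distance(
        ST_MakePoint(t1.longitude,t1.latitude)::GEOGRAPHY,
        ST_MakePoint(t2.longitude,t2.latitude)::GEOGRAPHY) <= 30;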
Note that the data type is GEOGRAPHY and not GEOMETRY. That's because distances returned by ST_Distance over geometries are in the unit of the coordinate system (degrees here), while over geographies they are in meters.
EDIT: To address a question raised in the comments.
To avoid the overhead of creating geography values at execution time, you might want to store the coordinates as geography in the first place. In that case the table would look like this:
CREATE TABLE t (vehicle_id INT, geom GEOGRAPHY, ts TIMESTAMP);
And you could populate it like this:
INSERT INTO t (vehicle_id, geom, ts)
VALUES (1,ST_MakePoint(1,1.1111),'2019-05-01 15:30:00');
In case you want to avoid having to populate the table again, you might want to just move the data to another column and get rid (if you wish) of latitude and longitude:
ALTER TABLE t ADD COLUMN geom GEOGRAPHY;
UPDATE t SET geom = ST_MakePoint(longitude,latitude);
ALTER TABLE t DROP COLUMN longitude, DROP COLUMN latitude;
CREATE INDEX idx_point ON t USING GIST(geom);
SELECT vehicle_id,ts,ST_AsText(geom) FROM t;
vehicle_id | ts | st_astext
------------+---------------------+-----------------
1 | 2019-05-01 15:30:00 | POINT(1 1.1111)
1 | 2019-05-01 15:40:00 | POINT(1 2.1112)
1 | 2019-05-01 15:50:00 | POINT(1 3.1111)
2 | 2019-05-01 15:30:00 | POINT(2 2.1111)
2 | 2019-05-01 15:40:00 | POINT(1 2.1111)
2 | 2019-05-01 15:05:00 | POINT(1 4.1111)
(6 rows)
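With the coordinates stored as geography and the GiST index in place, the proximity check can be run directly against geom. This is just a sketch of how the earlier query might look; ST_DWithin is used for the 30 m filter so the index can be exploited:
SELECT
    t1.vehicle_id, t2.vehicle_id, t1.ts,
    ST_Distance(t1.geom, t2.geom) AS distance
FROM t t1, t t2
WHERE
    t1.vehicle_id <> t2.vehicle_id AND
    t1.ts = t2.ts AND
    ST_DWithin(t1.geom, t2.geom, 30);  -- 30 meters on geography, index-assisted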
Let's say I have this table (balances) schema and data:
+----+---------+------------+
| id | balance | createdAt |
+----+---------+------------+
| 1 | 10 | 2021-11-18 |
| 2 | 12 | 2021-11-16 |
| 3 | 6 | 2021-11-04 |
+----+---------+------------+
To retrieve the last 7 days of balances, I would do something like this:
SELECT * FROM "balances" WHERE "createdAt" BETWEEN '2021-11-12T10:04:17.488Z' AND '2021-11-19T09:04:17.488Z' ORDER BY "createdAt" ASC
This will give me 2 records (IDs 1 & 2), which is fine. However, what I'm looking to do, probably with a second query, is to grab the record that comes just before that result set by createdAt date, since my query is ordered by createdAt. Is there a way to do this with PG?
So whatever time range I use, I would also retrieve the record that is at position n-1 relative to the result set.
To obtain the record you want, you may use a LIMIT query:
SELECT *
FROM balances
WHERE "createdAt" < '2021-11-12T10:04:17.488Z'
ORDER BY "createdAt" DESC
LIMIT 1;
This assumes there is no edge case of ties on createdAt at the boundary. If there are ties, they can be broken by adding more columns (e.g. id) to the ORDER BY clause.
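If you'd rather fetch the range and its preceding record in a single round trip, the two queries can be combined with UNION ALL. This is only a sketch, reusing the boundary timestamps from the question:
(SELECT * FROM balances
 WHERE "createdAt" < '2021-11-12T10:04:17.488Z'
 ORDER BY "createdAt" DESC
 LIMIT 1)
UNION ALL
(SELECT * FROM balances
 WHERE "createdAt" BETWEEN '2021-11-12T10:04:17.488Z' AND '2021-11-19T09:04:17.488Z')
ORDER BY "createdAt" ASC;  -- the final ORDER BY sorts the combined result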
I need help in my PostGIS database to calculate the distance between two points.
The goal is to find, for each row in the "Dots" table, the distance to the closest point in the "reflayers" points table and save it in meters in the "dist_from_ref" column.
The dots table structure is:
CREATE TABLE dots
(
dot_id INT,
site_id INT,
latitude float ( 6 ),
longitude float ( 6 ),
rsrp float ( 6 ),
dist INT,
project_id INT,
dist_from_site INT,
geom geometry,
dist_from_ref INT
);
The reflayers table structure is:
CREATE TABLE reflayers
(
layer_name varchar,
latitude float ( 6 ),
longitude float ( 6 ) ,
geom geometry
);
(Screenshots of the Dots and Reflayer tables omitted.)
Does anyone have a solution that can update the "dist_from_ref" column with the minimal distance the query can find?
Edit:
UPDATE dots d
SET dist_from_ref = 100 * ROUND (1000 * ST_Distance(d.geom, r.geom))
FROM reflayers r
WHERE d.dist_from_ref IS NULL
AND r.geom = (SELECT r.geom
FROM reflayers r
ORDER BY ST_Distance(d.geom, r.geom) ASC LIMIT 1);
This query updates the column as I want, but it gets stuck on my PostGIS server with 60K rows.
I used it on 70 rows and it worked fine. Any suggestions to improve it?
Before and after (screenshots of the dots table omitted); a text sample of the table:
dot_id | site_id | latitude | longitude | rsrp | project_id | dist_from_site | dist_from_ref | geom
--------+---------+-----------+-----------+--------+------------+----------------+---------------+----------------------------------------------------
1 | 42047 | 31.902857 | 34.919445 | -90.9 | 1 | 21 | 7200 | 0101000020E6100000F5F6E7A221E73F4041BCAE5FB0754140
2 | 42047 | 31.902857 | 34.919445 | -89.5 | 1 | 21 | 7200 | 0101000020E6100000F5F6E7A221E73F4041BCAE5FB0754140
3 | 42047 | 31.902857 | 34.919445 | -89.5 | 1 | 21 | 7200 | 0101000020E6100000F5F6E7A221E73F4041BCAE5FB0754140
Place the subquery in the SELECT clause and correlate it with each row of the outer query, e.g.
SELECT *,(
SELECT min(ST_Distance(d.geom, r.geom))
FROM reflayers r) as distance
FROM dots d;
To update, just do the same:
UPDATE dots SET dist_from_ref = (
SELECT min(ST_Distance(dots.geom, r.geom))
FROM reflayers r)
Note: Depending on the table sizes, this operation can become very time consuming. Since there is no join condition between the two tables, the query has to scan every single record of reflayers for every single record of dots in order to find the closest distance.
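If that becomes too slow, a common alternative is a LATERAL join with the KNN ordering operator <->, which can use a GiST index on reflayers.geom to fetch the nearest candidate first. This is only a sketch, assuming dot_id is unique; note that with plain lon/lat geometry both the ordering and ST_Distance are in degrees, so cast both sides to geography if you need meters:
CREATE INDEX IF NOT EXISTS idx_reflayers_geom ON reflayers USING GIST(geom);

UPDATE dots d
SET dist_from_ref = nearest.dist
FROM dots d2
CROSS JOIN LATERAL (
    SELECT ST_Distance(d2.geom, r.geom) AS dist
    FROM reflayers r
    ORDER BY r.geom <-> d2.geom   -- KNN order, served by the GiST index
    LIMIT 1
) AS nearest
WHERE d.dot_id = d2.dot_id
  AND d.dist_from_ref IS NULL;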
I have two tables (table1 and table2) with three columns: id, value and geometry. The geometries are point features.
I want to do a join between both tables where the resulting table contains for each point of table1, the minimum distance to a point of table2, the value of table1 and the value of the corresponding point of table2.
I tried the following code but, logically, this gives for each point of table1 the distance to each point of table2. However, I cannot leave v2 out of the GROUP BY clause. How can I get the table I want?
SELECT t1.value AS v1,
t2.value AS v2,
       MIN(st_distance(t1.geometry, t2.geometry)) AS dis
FROM table1 t1, table2 t2
GROUP BY v1, v2
For simplicity I took integer values and their differences instead of the distance between points (but it works exactly the same way: just swap the subtraction for the st_distance function):
demo:db<>fiddle
SELECT DISTINCT ON (v1.point)
v1.point,
v2.point,
abs(v1.point - v2.point)
FROM
table1 v1
CROSS JOIN table2 v2
ORDER BY v1.point, abs(v1.point - v2.point)
My tables:
table1.point: 1, 2, 4, 8, 16
table2.point: 2, 3, 5, 7, 11, 13
The result:
| point | point | abs |
|-------|-------|-----|
| 1 | 2 | 1 |
| 2 | 2 | 0 |
| 4 | 3 | 1 |
| 8 | 7 | 1 |
| 16 | 13 | 3 |
Explanation:
You have to calculate all differences to know which one is the smallest. That's the reason for the CROSS JOIN. Now you can ORDER BY the points of table1 and the differences (or distances). Notice the abs() function: This makes all negative values positive. Otherwise difference -42 would be taken instead of +1.
DISTINCT ON (v1.point) takes the first ordered row for each v1.point.
Notice:
Because of the CROSS JOIN and the heavy math in st_distance, this can be really slow for huge data sets!
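Applied to your actual tables (just a sketch, assuming the columns are named id, value and geometry as described in the question), the same pattern looks like this:
SELECT DISTINCT ON (t1.id)
    t1.id,
    t1.value AS v1,
    t2.value AS v2,
    ST_Distance(t1.geometry, t2.geometry) AS dist
FROM table1 t1
CROSS JOIN table2 t2
ORDER BY t1.id, ST_Distance(t1.geometry, t2.geometry);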
So, I have data that looks something like this
User_Object | filesize | created_date | deleted_date
row 1 | 40 | May 10 | Aug 20
row 2 | 10 | June 3 | Null
row 3 | 20 | Nov 8 | Null
I'm building statistics to record user data usage for graphing based on time-based datapoints. However, I'm having difficulty developing a query that takes, for each row, the sum of all rows created before it, but only counting rows that still existed at the time of that row's creation. Before taking this step to incorporate deleted values, I had a simple naive query like this:
SELECT User_Object.id, User_Object.created, SUM(filesize) OVER (ORDER BY User_Object.created) AS sum_data_used
FROM User_Object
JOIN user ON User_Object.user_id = user.id
WHERE user.id = $1
However, I want to alter this somehow so that there's a conditional in the window function: it should only sum rows created before this User_Object whose deleted date (if any) is not also before this User_Object's creation.
This incorrect syntax illustrates what I want to do:
SELECT User_Object.id, User_Object.created,
SUM(CASE WHEN NOT window_function_row.deleted
OR window_function_row.deleted > User_Object.created
THEN filesize ELSE 0)
OVER (ORDER BY User_Object.created) AS sum_data_used
FROM User_Object
JOIN user ON User_Object.user_id = user.id
WHERE user.id = $1
When this query runs on the data that I have, it should output something like:
id | created | sum_data_used|
1 | May 10 | 40
2 | June 3 | 50
3 | Nov 8 | 30
Something along these lines may work for you:
SELECT a.user_id
,MIN(a.created_date) AS created_date
,SUM(b.filesize) AS sum_data_used
FROM user_object a
JOIN user_object b ON (b.user_id <= a.user_id
AND COALESCE(b.deleted_date, a.created_date) >= a.created_date)
GROUP BY a.user_id
ORDER BY a.user_id
For each row, self-join, matching ids that are lower or equal and whose dates overlap. It will be expensive, because each row needs to look through the entire table to calculate the filesize result; there is no cumulative operation taking place here. But I'm not sure there is a way around that.
Example table definition:
create table user_object(user_id int, filesize int, created_date date, deleted_date date);
Data:
1;40;2016-05-10;2016-08-29
2;10;2016-06-03;<NULL>
3;20;2016-11-08;<NULL>
Result:
1;2016-05-10;40
2;2016-06-03;50
3;2016-11-08;30
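For completeness, here is the sample data above as INSERT statements (a sketch; the NULLs stand for rows that were never deleted):
INSERT INTO user_object (user_id, filesize, created_date, deleted_date) VALUES
    (1, 40, '2016-05-10', '2016-08-29'),
    (2, 10, '2016-06-03', NULL),
    (3, 20, '2016-11-08', NULL);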
I have a table like the following:
X | Y | Z | node
----------------
1 | 2 | 3 | 100
2 | 2 | 3 |
2 | 2 | 4 |
2 | 2 | 5 | 200
3 | 2 | 5 |
4 | 2 | 5 |
5 | 2 | 5 | 300
X, Y, Z are 3D space coordinates of some points, a curve passes through all the corresponding points from the first row to the last row. I need to calculate the curve length between two adjacent points whose "node" column aren't null.
It would be great if I could directly insert the result into another table that has three columns: "first_node", "second_node", "curve_length".
I don't need to interpolate extra points into the curve; I just need to accumulate the lengths of all the straight lines. For example, to calculate the curve length between node 100 and 200, I need to sum the lengths of 3 straight lines: (1,2,3)<->(2,2,3), (2,2,3)<->(2,2,4), (2,2,4)<->(2,2,5)
EDIT
The table has an ID column, which is in increasing order from the first row to the last row.
To get a previous value in SQL, use the lag window function, e.g.
SELECT
x,
lag(x) OVER (ORDER BY id) as prev_x, ...
FROM ...
ORDER BY id;
That lets you get the previous and next points in 3-D space for a given segment. From there you can trivially calculate the line segment length using regular geometric maths.
You'll now have the lengths of each segment (sqlfiddle query). You can use this as input into other queries, using SELECT ... FROM (SELECT ...) subqueries or a CTE (WITH ....) term.
It turns out to be pretty awkward to go from the node segment lengths to node-to-node lengths. You need to create a table that spans the null entries, using a recursive CTE or with a window function.
I ended up with this monstrosity:
SELECT
array_agg(from_id) AS seg_ids,
    -- 'max' is used here like 'coalesce' for an aggregate,
-- since non-null is greater than null
max(from_node) AS from_node,
max(to_node) AS to_node,
sum(seg_length) AS seg_length
FROM (
-- lengths of all sub-segments with the null last segment
-- removed and a partition counter added
SELECT
*,
-- A running counter that increments when the
-- node ID changes. Allows us to group by series
-- of nodes in the outer query.
sum(CASE WHEN from_node IS NULL THEN 0 ELSE 1 END) OVER (ORDER BY from_id) AS partition_id
FROM
(
-- lengths of all sub-segments
SELECT
id AS from_id,
lead(id, 1) OVER (ORDER BY id) AS to_id,
-- length of sub-segment
sqrt(
(x - lead(x, 1) OVER (ORDER BY id)) ^ 2 +
(y - lead(y, 1) OVER (ORDER BY id)) ^ 2 +
(z - lead(z, 1) OVER (ORDER BY id)) ^ 2
) AS seg_length,
node AS from_node,
lead(node, 1) OVER (ORDER BY id) AS to_node
FROM
Table1
) sub
-- filter out the last row
WHERE to_id IS NOT NULL
) seglengths
-- Group into series of sub-segments between two nodes
GROUP BY partition_id;
Credit to How do I efficiently select the previous non-null value? for the partition trick.
Result:
seg_ids | from_node | to_node | seg_length
---------+-----------+---------+------------
{1,2,3} | 100 | 200 | 3
{4,5,6} | 200 | 300 | 3
(2 rows)
To insert directly into another table, use INSERT INTO ... SELECT ....
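For example (a sketch, where node_lengths is a hypothetical target table with the three columns from the question), the query above can be wrapped directly:
-- Hypothetical target table for the results.
CREATE TABLE node_lengths (first_node INT, second_node INT, curve_length DOUBLE PRECISION);

INSERT INTO node_lengths (first_node, second_node, curve_length)
SELECT
    max(from_node),
    max(to_node),
    sum(seg_length)
FROM (
    SELECT *,
        sum(CASE WHEN from_node IS NULL THEN 0 ELSE 1 END)
            OVER (ORDER BY from_id) AS partition_id
    FROM (
        SELECT
            id AS from_id,
            lead(id, 1) OVER (ORDER BY id) AS to_id,
            sqrt(
                (x - lead(x, 1) OVER (ORDER BY id)) ^ 2 +
                (y - lead(y, 1) OVER (ORDER BY id)) ^ 2 +
                (z - lead(z, 1) OVER (ORDER BY id)) ^ 2
            ) AS seg_length,
            node AS from_node,
            lead(node, 1) OVER (ORDER BY id) AS to_node
        FROM Table1
    ) sub
    WHERE to_id IS NOT NULL
) seglengths
GROUP BY partition_id;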