Calculate length of a series of line segments - postgresql

I have a table like the following:
X | Y | Z | node
----------------
1 | 2 | 3 | 100
2 | 2 | 3 |
2 | 2 | 4 |
2 | 2 | 5 | 200
3 | 2 | 5 |
4 | 2 | 5 |
5 | 2 | 5 | 300
X, Y, Z are the 3D coordinates of some points; a curve passes through all of the points in order, from the first row to the last row. I need to calculate the curve length between each pair of adjacent points whose "node" column is not null.
It would be great if I could insert the result directly into another table that has three columns: "first_node", "second_node", "curve_length".
I don't need to interpolate extra points into the curve; I just need to sum the lengths of the straight lines. For example, to calculate the curve length between node 100 and 200, I need to sum the lengths of 3 straight lines: (1,2,3)<->(2,2,3), (2,2,3)<->(2,2,4), (2,2,4)<->(2,2,5)
EDIT
The table has an ID column, which is in increasing order from the first row to the last row.

To get a previous value in SQL, use the lag window function, e.g.
SELECT
x,
lag(x) OVER (ORDER BY id) as prev_x, ...
FROM ...
ORDER BY id;
That lets you get the previous and next points in 3-D space for a given segment. From there you can trivially calculate the line segment length using regular geometric maths.
You'll now have the lengths of each segment (sqlfiddle query). You can use this as input into other queries, using SELECT ... FROM (SELECT ...) subqueries or a CTE (WITH ....) term.
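As a rough illustration of the same "pair each row with its predecessor" idea outside SQL, here is a minimal Python sketch. The coordinates come from the question; the id values 1 to 7 and the names rows and segment_lengths are assumptions made for the example.

```python
import math

# (id, x, y, z, node) rows from the question; the ids 1..7 are assumed.
rows = [
    (1, 1, 2, 3, 100),
    (2, 2, 2, 3, None),
    (3, 2, 2, 4, None),
    (4, 2, 2, 5, 200),
    (5, 3, 2, 5, None),
    (6, 4, 2, 5, None),
    (7, 5, 2, 5, 300),
]

def segment_lengths(rows):
    """Return (from_id, to_id, length) for each pair of consecutive rows."""
    ordered = sorted(rows, key=lambda r: r[0])  # same role as ORDER BY id
    result = []
    # zip(ordered, ordered[1:]) pairs every row with the next one,
    # which is what lag()/lead() do in the SQL version.
    for prev, cur in zip(ordered, ordered[1:]):
        length = math.dist(prev[1:4], cur[1:4])  # 3-D Euclidean distance
        result.append((prev[0], cur[0], length))
    return result

for from_id, to_id, length in segment_lengths(rows):
    print(from_id, to_id, length)
```

Every segment in the sample data has length 1.0, since consecutive points differ by one unit along a single axis.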
It turns out to be pretty awkward to go from the per-segment lengths to node-to-node lengths. You need to group the segments that span the null entries, using either a recursive CTE or a window function.
I landed up with this monstrosity:
SELECT
array_agg(from_id) AS seg_ids,
-- 'max' is used here like 'coalesce' for an aggregate,
-- since non-null is greater than null
max(from_node) AS from_node,
max(to_node) AS to_node,
sum(seg_length) AS seg_length
FROM (
-- lengths of all sub-segments with the null last segment
-- removed and a partition counter added
SELECT
*,
-- A running counter that increments when the
-- node ID changes. Allows us to group by series
-- of nodes in the outer query.
sum(CASE WHEN from_node IS NULL THEN 0 ELSE 1 END) OVER (ORDER BY from_id) AS partition_id
FROM
(
-- lengths of all sub-segments
SELECT
id AS from_id,
lead(id, 1) OVER (ORDER BY id) AS to_id,
-- length of sub-segment
sqrt(
(x - lead(x, 1) OVER (ORDER BY id)) ^ 2 +
(y - lead(y, 1) OVER (ORDER BY id)) ^ 2 +
(z - lead(z, 1) OVER (ORDER BY id)) ^ 2
) AS seg_length,
node AS from_node,
lead(node, 1) OVER (ORDER BY id) AS to_node
FROM
Table1
) sub
-- filter out the last row
WHERE to_id IS NOT NULL
) seglengths
-- Group into series of sub-segments between two nodes
GROUP BY partition_id;
Credit to How do I efficiently select the previous non-null value? for the partition trick.
Result:
seg_ids | from_node | to_node | seg_length
---------+---------+---------+------------
{1,2,3} | 100 | 200 | 3
{4,5,6} | 200 | 300 | 3
(2 rows)
To insert directly into another table, use INSERT INTO ... SELECT ....
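For comparison, the node-to-node totals can also be sketched as a single pass in Python. This is a plain loop, not the window-function grouping used in the query above; the function name node_to_node_lengths and the id values are assumptions for the example.

```python
import math

# (id, x, y, z, node) rows from the question; the ids 1..7 are assumed.
rows = [
    (1, 1, 2, 3, 100),
    (2, 2, 2, 3, None),
    (3, 2, 2, 4, None),
    (4, 2, 2, 5, 200),
    (5, 3, 2, 5, None),
    (6, 4, 2, 5, None),
    (7, 5, 2, 5, 300),
]

def node_to_node_lengths(rows):
    """Return (first_node, second_node, curve_length) tuples."""
    ordered = sorted(rows, key=lambda r: r[0])
    results = []
    start_node = None   # last non-null node seen so far
    running = 0.0       # length accumulated since start_node
    prev = None
    for row in ordered:
        # Only accumulate once a starting node has been seen.
        if prev is not None and start_node is not None:
            running += math.dist(prev[1:4], row[1:4])
        node = row[4]
        if node is not None:
            if start_node is not None:
                results.append((start_node, node, running))
            # This node closes one stretch and opens the next.
            start_node = node
            running = 0.0
        prev = row
    return results

print(node_to_node_lengths(rows))
```

For the sample data this yields (100, 200, 3.0) and (200, 300, 3.0), which agrees with the query result. Rows before the first non-null node are ignored in this sketch.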

Related

PostgreSQL how to generate a partition row_number() with certain numbers overridden

I have an unusual problem I'm trying to solve with SQL where I need to generate sequential numbers for partitioned rows but override specific numbers with values from the data, while not breaking the sequence (unless the override causes a number to be used greater than the number of rows present).
I feel I might be able to achieve this by selecting the rows where I need to override the generated sequence value and the rows I don't need to override the value, then unioning them together and somehow using coalesce to get the desired dynamically generated sequence value, or maybe there's some way I can utilise recursive.
I've not been able to solve this problem yet, but I've put together a SQL Fiddle which provides a simplified version:
http://sqlfiddle.com/#!17/236b5/5
The desired_dynamic_number is what I'm trying to generate and the generated_dynamic_number is my current work-in-progress attempt.
Any pointers around the best way to achieve the desired_dynamic_number values dynamically?
Update:
I'm almost there using lag:
http://sqlfiddle.com/#!17/236b5/24
step-by-step demo:db<>fiddle
SELECT
*,
COALESCE( -- 3
first_value(override_as_number) OVER w -- 2
, 1
)
+ row_number() OVER w - 1 -- 4, 5
FROM (
SELECT
*,
SUM( -- 1
CASE WHEN override_as_number IS NOT NULL THEN 1 ELSE 0 END
) OVER (PARTITION BY grouped_by ORDER BY secondary_order_by)
as grouped
FROM sample
) s
WINDOW w AS (PARTITION BY grouped_by, grouped ORDER BY secondary_order_by)
1. Create a new subpartition within each partition: the cumulative sum produces a unique group id for every run of records that starts with a non-NULL override_as_number and is followed by NULL records. For instance, (AAA, d) to (AAA, f) belong to the same subpartition/group.
2. first_value() gives the first value of each subpartition.
3. The COALESCE ensures a non-NULL result from first_value() if the partition starts with a NULL record.
4. row_number() - 1 creates a row count within a subpartition, starting at 0.
5. Adding the first_value() of a subpartition to the row count creates your result: the non-NULL record that starts a subpartition keeps its own value (row count 0), the first following NULL record gets that value + 1, and so forth.
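The steps above can be mirrored in a short Python sketch, which may help when checking the logic by hand. The rows are taken from the sample output further down in this thread; the names sample and dynamic_numbers are made up for the example, and the loop is only an approximation of what the window functions do, not a translation of the query.

```python
from itertools import groupby

# (grouped_by, secondary_order_by, override_as_number)
sample = [
    ("AAA", "a", 1), ("AAA", "b", None), ("AAA", "c", 3), ("AAA", "d", 3),
    ("AAA", "e", None), ("AAA", "f", None), ("AAA", "g", 999),
    ("XYZ", "a", None),
    ("ZZZ", "a", None), ("ZZZ", "b", None),
]

def dynamic_numbers(rows):
    """Return the generated number for each row, in sorted order."""
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    result = []
    for _, partition in groupby(ordered, key=lambda r: r[0]):
        base = 1      # COALESCE(first_value(...), 1)
        offset = -1   # becomes row_number() - 1 within a subpartition
        for _, _, override in partition:
            if override is not None:
                # A non-NULL override starts a new subpartition.
                base, offset = override, 0
            else:
                offset += 1
            result.append(base + offset)
    return result

print(dynamic_numbers(sample))
```

For the sample rows this gives 1, 2, 3, 3, 4, 5, 999 for AAA and 1, 1, 2 for XYZ and ZZZ, the same as the desired_dynamic_number column.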
The query below gives the exact result for the sample data, but you should verify it against all the combinations in your own data
select c.*,COALESCE(c.override_as_number,c.act) as final FROM
(
select b.*, dense_rank() over(partition by grouped_by order by grouped_by, actual) as act from
(
select a.*,COALESCE(override_as_number,row_num) as actual FROM
(
select grouped_by , secondary_order_by ,
dense_rank() over ( partition by grouped_by order by grouped_by, secondary_order_by ) as row_num
,override_as_number,desired_dynamic_number from fiddle
) a
) b
) c ;
column "final" is the result
grouped_by | secondary_order_by | row_num | override_as_number | desired_dynamic_number | actual | act | final
------------+--------------------+---------+--------------------+------------------------+--------+-----+-------
AAA | a | 1 | 1 | 1 | 1 | 1 | 1
AAA | b | 2 | | 2 | 2 | 2 | 2
AAA | c | 3 | 3 | 3 | 3 | 3 | 3
AAA | d | 4 | 3 | 3 | 3 | 3 | 3
AAA | e | 5 | | 4 | 5 | 4 | 4
AAA | f | 6 | | 5 | 6 | 5 | 5
AAA | g | 7 | 999 | 999 | 999 | 6 | 999
XYZ | a | 1 | | 1 | 1 | 1 | 1
ZZZ | a | 1 | | 1 | 1 | 1 | 1
ZZZ | b | 2 | | 2 | 2 | 2 | 2
(10 rows)
Hope this helps!
The real world problem I was trying to solve did not have a nicely ordered secondary_order_by column, instead it would be something a bit more randomised (a created timestamp).
For the benefit of people who stumble across this question with a similar problem to solve, a colleague solved this problem using a cartesian join, whose solution I'm posting below. The solution is Snowflake SQL, which should be possible to adapt to Postgres. It does fall down on higher override_as_number values, though, unless the 1000 in from table(generator(rowcount => 1000)) is increased to something suitably high.
The SQL:
with tally_table as (
select row_number() over (order by seq4()) as gen_list
from table(generator(rowcount => 1000))
),
base as (
select *,
IFF(override_as_number IS NULL, row_number() OVER(PARTITION BY grouped_by, override_as_number order by random),override_as_number) as rownum
from "SANDPIT"."TEST"."SAMPLEDATA" order by grouped_by,override_as_number,random
) --select * from base order by grouped_by,random;
,
cart_product as (
select *
from tally_table cross join (Select distinct grouped_by from base ) as distinct_grouped_by
) --select * from cart_product;
,
filter_product as (
select *,
row_number() OVER(partition by cart_product.grouped_by order by cart_product.grouped_by,gen_list) as seq_order
from cart_product
where CONCAT(grouped_by,'~',gen_list) NOT IN (select concat(grouped_by,'~',override_as_number) from base where override_as_number is not null)
) --select * from try2 order by 2,3 ;
select base.grouped_by,
base.random,
base.override_as_number,
base.answer, -- This is hard coded as test data
IFF(override_as_number is null, gen_list, seq_order) as computed_answer
from base inner join filter_product on base.rownum = filter_product.seq_order and base.grouped_by = filter_product.grouped_by
order by base.grouped_by,
random;
In the end I went for a simpler solution using a temporary table and cursor to inject override_as_number values and shuffle other numbers.

Retain only 3 highest positive and negative records in a table

I am new to databases and postgres as such.
I have a table called names which has 2 columns name and value which gets updated every x seconds with new name value pairs. My requirement is to retain only 3 positive and 3 negative values at any point of time and delete the rest of the rows during each table update.
I use the following query to delete the old rows and retain the 3 positive and 3 negative values ordered by value.
delete from names
using (select *,
row_number() over (partition by value > 0, value < 0 order by value desc) as rn
from names ) w
where w.rn >=3
I am skeptical about using a conditional like value > 0 in a partition clause. Is this approach correct?
For example,
A table like this prior to delete :
name | value
--------------
test | 10
test1 | 11
test1 | 12
test1 | 13
test4 | -1
test4 | -2
My table after delete should look like :
name | value
--------------
test1 | 13
test1 | 12
test1 | 11
test4 | -1
test4 | -2
demo:db<>fiddle
This generally works as expected: value > 0 splits the values into all numbers > 0 and all numbers <= 0, and ORDER BY value orders each of these two groups as expected.
So, the only thing I would change:
row_number() over (partition by value >= 0 order by value desc)
remove: , value < 0 (Because: why would you split the positive values into negative and other? You don't have any negative numbers in your positive group and vice versa.)
change: value > 0 to value >= 0, so that a 0 is ranked last in the non-negative group instead of first in the negative group
For deleting: If you want to keep the top 3 values of each direction:
you should change w.rn >= 3 to w.rn > 3 (so that the 3rd element is kept as well)
you need to connect the subquery with the table records; without a join condition, every row in names is deleted as soon as any row in w passes the rn filter. In real cases you should use id columns for that. In your example you could take the value column: where n.value = w.value AND w.rn > 3
So, finally:
delete from names n
using (select *,
row_number() over (partition by value >= 0 order by value desc) as rn
from names ) w
where n.value = w.value AND w.rn > 3
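To sanity-check which rows should survive, the ranking can be reproduced in a few lines of Python. This is only a sketch of the selection logic (partition by value >= 0, order descending, keep the first 3), not of the DELETE itself; the names names_rows and rows_to_keep are invented for the example, and rows with equal values are not treated specially.

```python
# (name, value) rows from the question's example table
names_rows = [
    ("test", 10), ("test1", 11), ("test1", 12), ("test1", 13),
    ("test4", -1), ("test4", -2),
]

def rows_to_keep(rows, keep=3):
    """Return the rows whose rank within their sign group is <= keep."""
    # Two partitions, as in PARTITION BY value >= 0
    non_negative = [r for r in rows if r[1] >= 0]
    negative = [r for r in rows if r[1] < 0]
    kept = []
    for group in (non_negative, negative):
        # ORDER BY value DESC, then keep rows with rn <= keep
        ranked = sorted(group, key=lambda r: r[1], reverse=True)
        kept.extend(ranked[:keep])
    return kept

print(rows_to_keep(names_rows))
```

With the example table this keeps 13, 12, 11, -1 and -2 and drops the row with value 10, which is the table the question expects after the delete.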
If it's not a hard requirement to delete the other rows, you could instead select only the rows you're interested in:
WITH largest AS (
SELECT name, value
FROM names
ORDER BY value DESC
LIMIT 3),
smallest AS (
SELECT name, value
FROM names
ORDER BY value ASC
LIMIT 3)
SELECT * FROM largest
UNION
SELECT * FROM smallest
ORDER BY value DESC

PostGIS minimum distance of two sets including other variables from both tables

I have two tables (table1 and table2) with three columns: id, value and geometry. The geometries are point features.
I want to do a join between both tables where the resulting table contains for each point of table1, the minimum distance to a point of table2, the value of table1 and the value of the corresponding point of table2.
I tried the following code, but logically, this gives for each point of table1 the distance to each point of table2. However, I cannot leave v2 out of the group by clause. How can I get the table I want?
SELECT t1.value AS v1,
t2.value AS v2,
MIN(st_distance(t1.geometry, t2.geometry)) AS dis
FROM table1 t1, table2 t2
GROUP BY v1, v2
For simplicity I took integer values and their differences instead of the distance between points (but it works exactly the same way: just replace the subtraction with the st_distance function):
demo:db<>fiddle
SELECT DISTINCT ON (v1.point)
v1.point,
v2.point,
abs(v1.point - v2.point)
FROM
table1 v1
CROSS JOIN table2 v2
ORDER BY v1.point, abs(v1.point - v2.point)
My tables:
table1.point: 1, 2, 4, 8, 16
table2.point: 2, 3, 5, 7, 11, 13
The result:
| point | point | abs |
|-------|-------|-----|
| 1 | 2 | 1 |
| 2 | 2 | 0 |
| 4 | 3 | 1 |
| 8 | 7 | 1 |
| 16 | 13 | 3 |
Explanation:
You have to calculate all differences to know which one is the smallest. That's the reason for the CROSS JOIN. Now you can ORDER BY the points of table1 and the differences (or distances). Notice the abs() function: This makes all negative values positive. Otherwise difference -42 would be taken instead of +1.
DISTINCT ON (v1.point) takes the first ordered row for each v1.point.
Notice:
Because of the CROSS JOIN and the heavy mathematics in st_distance it could be really slow for huge data sets!
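The "sort by distance, keep the first row per point" idea behind DISTINCT ON can be sketched in Python with the same integer data. The function name nearest_matches is made up for the example, and, like the CROSS JOIN, it compares every pair, so it says nothing about performance.

```python
table1 = [1, 2, 4, 8, 16]
table2 = [2, 3, 5, 7, 11, 13]

def nearest_matches(left, right):
    """For each value in left, return (value, nearest value in right, difference)."""
    result = []
    for a in sorted(left):
        # min() scans all candidates, like ordering the cross join by
        # abs(difference) and keeping only the first row per left value.
        b = min(right, key=lambda p: abs(a - p))
        result.append((a, b, abs(a - b)))
    return result

for row in nearest_matches(table1, table2):
    print(row)
```

Note that 4 is equally close to 3 and 5. Python's min() returns the first candidate it sees, which is 3 here; in SQL the winner of such a tie is not determined unless you add a further ORDER BY column.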

How to calculate the nearest neighbor distance for 10000 points in a table

I am using PostgreSQL and I am using PostGIS extension.
I am able to compare one point with this query:
SELECT st_distance(geom, 'SRID=4326;POINT(12.601828337172 50.5173393068512)'::geometry) as d
FROM pointst1
ORDER BY d
but I want to compare not to one fixed point but to a column of points. And I want to do this with some sort of indexing so that it is computationally cheap and not 10000x10000 like a cross join within that table.
Create table:
create table pointst1
(
id integer not null
constraint pointst1_id_pk
primary key,
geom geometry(Point, 4326)
);
create unique index pointst1_id_uindex
on pointst1 (id);
create index geomidx
on pointst1 (geom);
Edit:
Refined query (comparing 10000 points with their nearest neighbor, but it returns the point itself, with distance 0, rather than the next nearest point):
select points.*,
p1.id as p1_id,
ST_Distance(geography(p1.geom), geography(points.geom)) as distance
from
(select distinct on(p2.geom)*
from pointst1 p2
where p2.id is not null) as points
cross join lateral
(select id, geom
from pointst1
order by points.geom <-> geom
limit 1) as p1;
Your query is already calculating the distance from the given geometry to all records in the table pointst1.
Considering these values ..
INSERT INTO pointst1 VALUES (1,'SRID=4326;POINT(16.19 48.21)'),
(2,'SRID=4326;POINT(18.96 47.50)'),
(3,'SRID=4326;POINT(13.47 52.52)'),
(4,'SRID=4326;POINT(-3.70 40.39)');
... if you run your query, it will already calculate the distance from all points in the table:
SELECT ST_Distance(geom, 'SRID=4326;POINT(12.6018 50.5173)'::geometry) as d
FROM pointst1
ORDER BY d
d
------------------
2.1827914536208
4.26600662563949
7.03781262396208
19.1914274750473
(4 rows)
Change your index to GIST, which is the most suitable for geometry data:
create index geomidx on pointst1 using GIST (geom);
Just note that an index won't speed up this query of yours, since you're doing a full scan. But as soon as you start playing more in the where clause, you might see some improvement.
EDIT:
WITH j AS (SELECT id AS id2, geom AS geom2 FROM pointst1)
SELECT id,j.id2,ST_Distance(geom, j.geom2) AS d
FROM pointst1,j
WHERE id <> j.id2
ORDER BY id,id2
id | id2 | d
----+-----+------------------
1 | 2 | 2.85954541841881
1 | 3 | 5.0965184194703
1 | 4 | 21.3720495039666
2 | 1 | 2.85954541841881
2 | 3 | 7.43911957156222
2 | 4 | 23.7492673571207
3 | 1 | 5.0965184194703
3 | 2 | 7.43911957156222
3 | 4 | 21.0225069865609
4 | 1 | 21.3720495039666
4 | 2 | 23.7492673571207
4 | 3 | 21.0225069865609
(12 rows)
Removing duplicate distances:
SELECT DISTINCT ON(d) * FROM (
WITH j AS (SELECT id AS id2, geom AS geom2 FROM pointst1)
SELECT id,j.id2,ST_Distance(geom, j.geom2) AS d
FROM pointst1,j
WHERE id <> j.id2
ORDER BY id,id2) AS j
id | id2 | d
----+-----+------------------
1 | 2 | 2.85954541841881
3 | 1 | 5.0965184194703
3 | 2 | 7.43911957156222
4 | 3 | 21.0225069865609
4 | 1 | 21.3720495039666
2 | 4 | 23.7492673571207
(6 rows)
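If what you are after is one nearest neighbour per point, with the point itself excluded, the intended result can be sketched in Python using the four sample points above. This is a brute-force illustration only: it compares every pair, uses plain planar distance on the coordinates (degrees, as in the ST_Distance output above, not metres), and the names points and nearest_neighbours are assumptions for the example.

```python
import math

# id -> (lon, lat), from the INSERT statement above
points = {
    1: (16.19, 48.21),
    2: (18.96, 47.50),
    3: (13.47, 52.52),
    4: (-3.70, 40.39),
}

def nearest_neighbours(points):
    """Map each id to (nearest other id, planar distance)."""
    result = {}
    for pid, xy in points.items():
        # Exclude the point itself, the equivalent of WHERE id <> id2.
        candidates = [(oid, oxy) for oid, oxy in points.items() if oid != pid]
        other_id, other_xy = min(candidates, key=lambda item: math.dist(xy, item[1]))
        result[pid] = (other_id, math.dist(xy, other_xy))
    return result

for pid, (other_id, d) in nearest_neighbours(points).items():
    print(pid, other_id, d)
```

For the sample points, 1 and 2 are each other's nearest neighbours, the nearest neighbour of 3 is 1, and the nearest neighbour of 4 is 3, consistent with the distance table above.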

How to create a flag that increments by 1 based on conditions

How can I create a flag by looking at values of consecutive variables?
For example, in the table(image) below,
For row#1, flag takes the value 1;
For row#2 onwards it checks:
If variable1 = lag(variable2) and variable2 = lag(variable1), then flag = lag(flag); else the flag increments by 1.
In this case, the condition doesn't match, therefore the flag takes the value 2.
For row#3:
Since it matches the above condition, the flag stays at 2.
For row#4: Even though it matches the above condition, the flag changes to 3, as the previous 2 rows (row#2 & row#3) have already been matched.
And so on..
The final flag will look like:
Bear in mind that your input data needs a defined order to implement a "moving flag" with 2-row-based aggregation. For this answer's sake I've added a row_number() function to generate the order in which your sample data is given.
Test data
create table flagtest( var1 text, var2 text);
insert into flagtest(var1,var2) values
('T','Z'),('B','A'),('A','B'),('B','A'),('A','B'),('A','B'),
('A','B'),('B','A'),('C','D'),('E','F'),('F','E'),('M','N');
Code
-- fourth part
select var1, var2, sum(change_flag_2_based) over (order by ordcol) as flag
from( -- third part
select *,
case when
lag(change_flag) over (order by ordcol) = 0
and lag(change_flag, 2) over (order by ordcol) = 1
then 1 else change_flag
end as change_flag_2_based
from ( -- second part
select
var1, var2, ordcol,
case when
var1 = lag(var2) over (order by ordcol) and
var2 = lag(var1) over (order by ordcol)
then 0 else 1
end as change_flag
from ( -- first part
select var1, var2, row_number() over () as ordcol
from flagtest
) ordered_data
) prep_aggr_all
) prep_aggr_two_rows_based;
How does it work?
The first part provides a column for ordering the input data in the later window functions. In practice this would be an existing column in your table; the example uses the row_number() window function to generate such a numerical order. Note that row_number() over () has no ORDER BY, so the order it produces is not guaranteed.
The second part marks each row with an indicator, 1 or 0, of whether the flag should change at that row, based on the cross-equality check between the two variables of the current and the previous row. This is not a 2-based pair aggregation (yet).
The third part compares against the change indicators of the two previous rows: if the row 1 behind doesn't change the flag and the row 2 behind does, that pair is already complete, so the current row is marked as flag-changing (2-row-based flag).
The fourth part is a running sum over those indicators, which produces the final flags.
Output
var1 | var2 | flag
------+------+------
T | Z | 1
B | A | 2
A | B | 2
B | A | 3
A | B | 3
A | B | 4
A | B | 5
B | A | 5
C | D | 6
E | F | 7
F | E | 7
M | N | 8
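To trace the four parts by hand, the same logic can be written as a small Python sketch. The list position plays the role of ordcol; the names flagtest and flags are chosen for the example, and this is a rough mirror of the query for checking purposes rather than a replacement for it.

```python
# (var1, var2) rows in the order they were inserted
flagtest = [
    ("T", "Z"), ("B", "A"), ("A", "B"), ("B", "A"), ("A", "B"), ("A", "B"),
    ("A", "B"), ("B", "A"), ("C", "D"), ("E", "F"), ("F", "E"), ("M", "N"),
]

def flags(rows):
    """Return the flag for each row, following the four parts of the query."""
    # Second part: 0 if the row is the swapped copy of the previous row, else 1.
    change = []
    for i, (v1, v2) in enumerate(rows):
        swapped = i > 0 and v1 == rows[i - 1][1] and v2 == rows[i - 1][0]
        change.append(0 if swapped else 1)
    # Third part: a pair is complete when the previous row matched (0)
    # and the row before it started a group (1), so force a change here.
    adjusted = []
    for i, c in enumerate(change):
        pair_done = i >= 2 and change[i - 1] == 0 and change[i - 2] == 1
        adjusted.append(1 if pair_done else c)
    # Fourth part: running sum of the adjusted indicators.
    result, total = [], 0
    for a in adjusted:
        total += a
        result.append(total)
    return result

print(flags(flagtest))
```

For the test data this prints 1, 2, 2, 3, 3, 4, 5, 5, 6, 7, 7, 8, the same flags as in the output above.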