PostGIS: intersections of a set of collinear line segments, with counts (PostgreSQL)

I have a set of collinear line segments (they may be mutually disjoint, contained in one another, or overlapping).
I want to make a new set of line segments where the segments are disjoint or touching (not overlapping), and each line segment has a count of the original line segments that cover it.
For example, suppose the original set is (drawn non-collinearly for illustration):
A-----------------------B
        C---------------------------D
            E-----F
                                      G-------------H
                                                    I-------J
The desired new set would be:
A-------C---E-----F-----B-----------D G-------------H-------J
    1     2    3     2        1              1          1
(only the point coordinates matter; the new set does not share point objects with the old set)
How can I achieve this with PostGIS?
Related question: suppose I start with a table of line segments that are not all collinear. How do I write the entire query that groups the collinear segments together and then applies the solution to my first question?
Thanks for any help!

Setup (for later queries):
create table lines (
id serial primary key,
label text not null,
line_data geometry(linestring) not null
);
insert into lines(label, line_data)
values ('A-B', ST_MakeLine(ST_MakePoint(-3, -6), ST_MakePoint( 1, 2))),
('D-C', ST_MakeLine(ST_MakePoint( 2, 4), ST_MakePoint(-2, -4))),
('E-F', ST_MakeLine(ST_MakePoint(-1, -2), ST_MakePoint( 0, 0))),
('G-H', ST_MakeLine(ST_MakePoint( 3, 6), ST_MakePoint( 4, 8))),
('I-J', ST_MakeLine(ST_MakePoint( 4, 8), ST_MakePoint( 5, 10))),
('P-L', ST_MakeLine(ST_MakePoint( 1, 0), ST_MakePoint( 2, 2))),
('X-Y', ST_MakeLine(ST_MakePoint( 2, 2), ST_MakePoint( 0, 4)));
Notes:
I purposely switched your D and C points to demonstrate the need for vector negation
The P-L line is parallel to your example lines (but not collinear with them)
The X-Y line has nothing to do with the others
The solutions below will obviously not work when you have linestrings with more than 2 points that do not all lie on the same line (i.e., when a single linestring is not straight).
The ST_Union aggregate function can split your collinear linestrings. You'll just need to calculate how many of the original lines contain each resulting piece.
However, grouping by collinearity is not that simple. I did not find any out-of-the-box solution for this, but you can calculate it (this will not calculate counts yet):
select string_agg(label, ','), ST_AsText(ST_Multi(ST_Union(line_data)))
from lines
group by (
select case
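when ST_SRID(s) <> ST_SRID(e) then row(ST_SRID(s), s, null::double precision)
when ST_X(s) = ST_X(e) then row(ST_SRID(s), ST_SetSRID(ST_MakePoint(0.0, 1.0), ST_SRID(s)), ST_X(s)) -- vertical: direction (0, 1) plus the x-intercept
when ST_Y(s) = ST_Y(e) then row(ST_SRID(s), ST_SetSRID(ST_MakePoint(1.0, 0.0), ST_SRID(s)), ST_Y(s)) -- horizontal: direction (1, 0) plus the y-intercept (vertical x=1 and horizontal y=1 no longer share a key)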
else (
select row(
ST_SRID(s),
(select case
when ST_Y(rv) < 0
then ST_SetSRID(ST_MakePoint(-ST_X(rv), -ST_Y(rv)), ST_SRID(s))
else rv
end), -- normalized vector (negated when necessary, but same for all parallel lines)
(ST_X(e) * ST_Y(s) - ST_X(s) * ST_Y(e)) / (ST_X(e) - ST_X(s)) -- solution of the linear equation at x = 0 (the y-intercept)
)
from coalesce(1.0 / nullif(ST_Distance(s, e), 0), 0) dmi, -- distance's multiplicative inverse
ST_TransScale(e, -ST_X(s), -ST_Y(s), dmi, dmi) rv -- raw vector (translated and scaled)
)
end
from ST_StartPoint(line_data) s,
ST_EndPoint(line_data) e
)
will produce:
X-Y | MULTILINESTRING((2 2,0 4))
P-L | MULTILINESTRING((1 0,2 2))
E-F,A-B,I-J,G-H,D-C | MULTILINESTRING((-3 -6,-2 -4),(-2 -4,-1 -2),(-1 -2,0 0),(0 0,1 2),(2 4,1 2),(3 6,4 8),(4 8,5 10))
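To make the grouping idea easier to follow outside SQL, here is a plain-Python sketch of the same kind of key. It is an illustration under assumptions, not a port of the query: instead of the y-intercept it uses the perpendicular offset ux*y - uy*x (constant for every point on the line), so vertical lines need no special case, and it rounds to 9 decimals (an arbitrary choice) so that tiny floating-point differences between collinear segments do not split a group. The names collinear_key and group_collinear are hypothetical.

```python
import math
from collections import defaultdict


def collinear_key(start, end, ndigits=9):
    """Hashable key shared by all segments lying on the same infinite line."""
    (x1, y1), (x2, y2) = start, end
    length = math.hypot(x2 - x1, y2 - y1)
    if length == 0:
        raise ValueError("zero-length segment has no direction")
    ux, uy = (x2 - x1) / length, (y2 - y1) / length
    # Negate so A-B and B-A (like the swapped D-C) get the same direction.
    if uy < 0 or (uy == 0 and ux < 0):
        ux, uy = -ux, -uy
    # Perpendicular offset of the line from the origin; equal for every point on it.
    offset = ux * y1 - uy * x1
    # Rounding (an assumption) absorbs floating-point noise between segments.
    return (round(ux, ndigits), round(uy, ndigits), round(offset, ndigits))


def group_collinear(segments):
    """Return sorted label lists, one per line, for (label, start, end) tuples."""
    groups = defaultdict(list)
    for label, start, end in segments:
        groups[collinear_key(start, end)].append(label)
    return sorted(sorted(labels) for labels in groups.values())


# Same data as the `lines` table in the setup.
segments = [
    ("A-B", (-3, -6), (1, 2)),
    ("D-C", (2, 4), (-2, -4)),
    ("E-F", (-1, -2), (0, 0)),
    ("G-H", (3, 6), (4, 8)),
    ("I-J", (4, 8), (5, 10)),
    ("P-L", (1, 0), (2, 2)),
    ("X-Y", (2, 2), (0, 4)),
]
print(group_collinear(segments))
# [['A-B', 'D-C', 'E-F', 'G-H', 'I-J'], ['P-L'], ['X-Y']]
```

The grouping matches the SQL result above: the parallel P-L line lands in its own group because its offset differs.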
To calculate the counts, join your original data again, matching each split piece to the original lines that contain it (ST_Contains):
select ST_AsText(splitted_line), count(line_data)
from (select ST_Multi(ST_Union(line_data)) ml
from lines
group by (
select case
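when ST_SRID(s) <> ST_SRID(e) then row(ST_SRID(s), s, null::double precision)
when ST_X(s) = ST_X(e) then row(ST_SRID(s), ST_SetSRID(ST_MakePoint(0.0, 1.0), ST_SRID(s)), ST_X(s)) -- vertical: direction (0, 1) plus the x-intercept
when ST_Y(s) = ST_Y(e) then row(ST_SRID(s), ST_SetSRID(ST_MakePoint(1.0, 0.0), ST_SRID(s)), ST_Y(s)) -- horizontal: direction (1, 0) plus the y-intercept (vertical x=1 and horizontal y=1 no longer share a key)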
else (
select row(
ST_SRID(s),
(select case
when ST_Y(rv) < 0
then ST_SetSRID(ST_MakePoint(-ST_X(rv), -ST_Y(rv)), ST_SRID(s))
else rv
end), -- normalized vector (negated when necessary, but same for all parallel lines)
(ST_X(e) * ST_Y(s) - ST_X(s) * ST_Y(e)) / (ST_X(e) - ST_X(s)) -- solution of the linear equation at x = 0 (the y-intercept)
)
from coalesce(1.0 / nullif(ST_Distance(s, e), 0), 0) dmi, -- distance's multiplicative inverse
ST_TransScale(e, -ST_X(s), -ST_Y(s), dmi, dmi) rv -- raw vector (translated and scaled)
)
end
from ST_StartPoint(line_data) s,
ST_EndPoint(line_data) e)) al,
generate_series(1, ST_NumGeometries(ml)) i,
ST_GeometryN(ml, i) splitted_line
left join lines on ST_Contains(line_data, splitted_line)
group by splitted_line
will return:
LINESTRING(-3 -6,-2 -4) | 1
LINESTRING(-2 -4,-1 -2) | 2
LINESTRING(-1 -2,0 0) | 3
LINESTRING(0 0,1 2) | 2
LINESTRING(2 2,0 4) | 1
LINESTRING(1 0,2 2) | 1
LINESTRING(2 4,1 2) | 1
LINESTRING(3 6,4 8) | 1
LINESTRING(4 8,5 10) | 1
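For a single group of collinear segments, the split-and-count step can also be sketched in plain Python. This is a swapped-in technique, not what ST_Union does internally: it projects every endpoint onto the line's direction (the unnormalized direction of the first segment, so integer inputs stay exact), cuts the line at every distinct endpoint, and counts how many original segments cover each piece, skipping gaps. It assumes the input is already one collinear group; split_with_counts is a hypothetical name.

```python
def split_with_counts(segments):
    """Split collinear ((x1, y1), (x2, y2)) segments into pieces with cover counts.

    Assumes all segments lie on ONE common line.
    """
    (x1, y1), (x2, y2) = segments[0]
    dx, dy = x2 - x1, y2 - y1  # shared direction, not normalized

    def position(p):
        # Scalar position along the line; monotonic, so it orders the points.
        return dx * p[0] + dy * p[1]

    point_at = {}   # position -> original endpoint coordinates
    intervals = []  # (low, high) position range of each original segment
    for a, b in segments:
        ta, tb = position(a), position(b)
        point_at[ta], point_at[tb] = a, b
        intervals.append((min(ta, tb), max(ta, tb)))

    cuts = sorted(point_at)
    pieces = []
    for lo, hi in zip(cuts, cuts[1:]):
        count = sum(1 for s, e in intervals if s <= lo and hi <= e)
        if count:  # count 0 means a gap between disjoint segments
            pieces.append((point_at[lo], point_at[hi], count))
    return pieces


# The collinear group A-B, D-C, E-F, G-H, I-J from the setup.
line_group = [((-3, -6), (1, 2)), ((2, 4), (-2, -4)), ((-1, -2), (0, 0)),
              ((3, 6), (4, 8)), ((4, 8), (5, 10))]
for start, end, count in split_with_counts(line_group):
    print(start, end, count)
# (-3, -6) (-2, -4) 1
# (-2, -4) (-1, -2) 2
# (-1, -2) (0, 0) 3
# (0, 0) (1, 2) 2
# (1, 2) (2, 4) 1
# (3, 6) (4, 8) 1
# (4, 8) (5, 10) 1
```

The counting loop is quadratic per group; a sweep over sorted +1/-1 events would scale better, but the simple version keeps the logic visible.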

Related

Apply groupBy in a UDF based on an incrementing function (PySpark)

I have the following function:
import copy
rn = 0
def check_vals(x, y):
    global rn
    if y is not None and int(x) + 1 == int(y):
        return rn + 1
    else:
        # Copy the current counter value before incrementing it.
        res = copy.copy(rn)
        # Increment so that the next value will start from +1
        rn += 1
        # Return the same value as we want to group using this
        return res + 1
    return 0
#pandas_udf(IntegerType(), functionType=PandasUDFType.GROUPED_AGG)
def check_final(x, y):
    return lambda x, y: check_vals(x, y)
I need to apply this function to the following DataFrame:
index initial_range final_range
1 1 299
1 300 499
1 500 699
1 800 1000
2 10 99
2 100 199
So I need the following output:
index min_val max_val
1 1 699
1 800 1000
2 10 199
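In other words, within each index, consecutive ranges should be merged as long as the sequence is unbroken (the next initial_range equals the previous final_range + 1). Each merged range takes min(initial_range) and max(final_range), and a new range starts whenever the sequence breaks; that is the groupBy I am trying to apply.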
I tried:
w = Window.partitionBy('index').orderBy(sf.col('initial_range'))
df = (df.withColumn('nextRange', sf.lead('initial_range').over(w))
.fillna(0,subset=['nextRange'])
.groupBy('index')
.agg(check_final("final_range", "nextRange").alias('check_1'))
.withColumn('min_val', sf.min("initial_range").over(Window.partitionBy("check_1")))
.withColumn('max_val', sf.max("final_range").over(Window.partitionBy("check_1")))
)
But it didn't work.
Can anyone help me?
I think the pure Spark SQL API can solve this without any UDF, which could otherwise hurt your Spark performance. Two window functions are enough:
from pyspark.sql import Window
from pyspark.sql import functions as func
df.withColumn(
'next_row_initial_diff', func.col('initial_range')-func.lag('final_range', 1).over(Window.partitionBy('index').orderBy('initial_range'))
).withColumn(
'group', func.sum(
func.when(func.col('next_row_initial_diff').isNull()|(func.col('next_row_initial_diff')==1), func.lit(0))
.otherwise(func.lit(1))
).over(
Window.partitionBy('index').orderBy('initial_range')
)
).groupBy(
'group', 'index'
).agg(
func.min('initial_range').alias('min_val'),
func.max('final_range').alias('max_val')
).drop(
'group'
).show(100, False)
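Column next_row_initial_diff: just like the lead you used, it shifts rows (here with lag, bringing in the previous row's final_range) so we can check whether the current row continues the sequence (a difference of 1).
Column group: a running sum of the sequence breaks within each index partition, so every contiguous run gets its own group number.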
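The same gaps-and-islands logic can be checked in plain Python without Spark. This sketch mirrors the two window columns: rows are sorted by index and initial_range, a new run starts whenever initial_range minus the previous row's final_range is not 1, and each run keeps min(initial_range) and max(final_range). merge_contiguous is a hypothetical helper for illustration only, not a Spark API.

```python
from itertools import groupby


def merge_contiguous(rows):
    """Merge (index, initial_range, final_range) rows into contiguous runs."""
    merged = []
    for idx, group in groupby(sorted(rows), key=lambda r: r[0]):
        prev_final = None  # plays the role of lag('final_range')
        for _, initial, final in group:
            if prev_final is not None and initial - prev_final == 1:
                # Sequence continues: extend the current run.
                _, run_min, run_max = merged[-1]
                merged[-1] = (idx, run_min, max(run_max, final))
            else:
                # First row of the index, or the sequence broke: new group.
                merged.append((idx, initial, final))
            prev_final = final
    return merged


rows = [(1, 1, 299), (1, 300, 499), (1, 500, 699),
        (1, 800, 1000), (2, 10, 99), (2, 100, 199)]
print(merge_contiguous(rows))
# [(1, 1, 699), (1, 800, 1000), (2, 10, 199)]
```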

Polars Dataframe: Apply MinMaxScaler to a column with condition

I am trying to perform the following operation in Polars.
Values in column B below 80 should be scaled between 1 and 4, whereas anything 80 or above should be set to 5.
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
df_pandas = pd.DataFrame(
{
"A": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
"B": [50, 300, 80, 12, 105, 78, 66, 42, 61.5, 35],
}
)
test_scaler = MinMaxScaler(feature_range=(1,4)) # from sklearn.preprocessing
df_pandas.loc[df_pandas['B']<80, 'Test'] = test_scaler.fit_transform(df_pandas.loc[df_pandas['B']<80, "B"].values.reshape(-1,1))
df_pandas = df_pandas.fillna(5)
This is what I did with Polars:
import numpy as np
import polars as pl
df = pl.from_pandas(df_pandas)
# dt is a dictionary
dt = df.filter(
pl.col('B')<80
).to_dict(as_series=False)
below_80 = list(dt.keys())
dt_scale = list(
test_scaler.fit_transform(
np.array(dt['B']).reshape(-1,1)
).reshape(-1) # reshape back to one dimensional
)
# reassign to dictionary dt
dt['B'] = dt_scale
dt_scale_df = pl.DataFrame(dt)
dt_scale_df
dummy = df.join(
dt_scale_df, how="left", on="A"
).fill_null(5)
dummy = dummy.rename({"B_right": "Test"})
Result:
A
B
Test
1
50.0
2.727273
2
300.0
5.000000
3
80.0
5.000000
4
12.0
1.000000
5
105.0
5.000000
6
78.0
4.000000
7
66.0
3.454545
8
42.0
2.363636
9
61.5
3.250000
10
35.0
2.045455
Is there a better approach for this?
Alright, here are 3 examples that should help; the last one is the one I would prefer.
Because you only want to apply your scaler to part of a column, we should make sure we send only that part of the data to the scaler. This can be done with:
window function over a partition
partition_by
when -> then -> otherwise + min_max expression
Window function over partition
This requires a Python function that is applied over the partitions. In the function itself we then have to check which partition we are in and handle it accordingly.
df = pl.from_pandas(df_pandas)
min_max_sc = MinMaxScaler((1, 4))
def my_scaler(s: pl.Series) -> pl.Series:
    # the partition where B < 80 is False holds values >= 80 (80 itself included)
    if s.len() > 0 and s[0] >= 80:
        out = (s * 0 + 5)
    else:
        out = pl.Series(min_max_sc.fit_transform(s.to_numpy().reshape(-1, 1)).flatten())
    # ensure all types are the same
    return out.cast(pl.Float64)
df.with_column(
    pl.col("B").apply(my_scaler).over(pl.col("B") < 80).alias("Test")
)
partition_by
This partitions the original DataFrame into a dictionary holding the different partitions. We then modify only the partitions that need it.
parts = (df
.with_column((pl.col("B") < 80).alias("part"))
.partition_by("part", as_dict=True)
)
parts[True] = parts[True].with_column(
pl.col("B").map(
lambda s: pl.Series(min_max_sc.fit_transform(s.to_numpy().reshape(-1, 1)).flatten())
).alias("Test")
)
parts[False] = parts[False].with_column(
pl.lit(5.0).alias("Test")
)
pl.concat([df for df in parts.values()]).select(pl.all().exclude("part"))
when -> then -> otherwise + min_max expression
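This is the one I like best. We can write a function that builds a Polars expression implementing the min-max scaling you need. This should have the best performance, because the whole computation stays within Polars expressions instead of calling back into Python.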
def min_max_scaler(col: str, predicate: pl.Expr):
    x = pl.col(col)
    x_min = x.filter(predicate).min()
    x_max = x.filter(predicate).max()
    # * 3 + 1 to set scale between 1 - 4
    return (x - x_min) / (x_max - x_min) * 3 + 1
predicate = pl.col("B") < 80
df.with_column(
    pl.when(predicate)
    .then(min_max_scaler("B", predicate))
    .otherwise(5).alias("Test")
)
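To sanity-check the expression-based approach, here is the same arithmetic in plain Python, with no Polars or scikit-learn involved. It assumes at least two distinct values below the threshold (otherwise max - min is zero); conditional_min_max is a hypothetical name. It reproduces the Test column from the question.

```python
def conditional_min_max(values, threshold=80, low=1.0, high=4.0, fill=5.0):
    """Scale values below `threshold` into [low, high]; set all others to `fill`."""
    below = [v for v in values if v < threshold]
    # Min and max come from the selected part only, like x.filter(predicate).
    v_min, v_max = min(below), max(below)
    scaled = []
    for v in values:
        if v < threshold:
            # Same formula as min_max_scaler: (x - min) / (max - min) * 3 + 1
            scaled.append((v - v_min) / (v_max - v_min) * (high - low) + low)
        else:
            scaled.append(fill)
    return scaled


b = [50, 300, 80, 12, 105, 78, 66, 42, 61.5, 35]
print([round(v, 6) for v in conditional_min_max(b)])
# [2.727273, 5.0, 5.0, 1.0, 5.0, 4.0, 3.454545, 2.363636, 3.25, 2.045455]
```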

PostGIS make buffer on LINESTRING Z to have a POLYGON Z

I have several LINESTRING Z geometries in PostgreSQL, and they look like this:
LINESTRING Z (1 2 1,1 1 4)
I want to make a buffer around this linestring so that I can get a POLYGON Z geometry for a later export to DXF.
I tried this:
select st_astext(st_buffer('LINESTRING Z (1 2 1,1 1 4)'::geometry, 2)) as geom;
and it gives me:
POLYGON((3 1,2.96157056080646 0.609819355967741,2.84775906502257 0.234633135269818,2.66293922460509 -0.111140466039206,2.41421356237309 -0.414213562373096,2.1111404660392 -0.662939224605091,1.76536686473018 -0.847759065022574,1.39018064403226 -0.961570560806461,1 -1,0.609819355967745 -0.961570560806461,0.234633135269822 -0.847759065022574,-0.111140466039202 -0.662939224605092,-0.414213562373094 -0.414213562373096,-0.662939224605089 -0.111140466039207,-0.847759065022572 0.234633135269818,-0.96157056080646 0.609819355967739,-1 1,-1 2,-0.96157056080646 2.39018064403226,-0.847759065022572 2.76536686473018,-0.662939224605089 3.11114046603921,-0.414213562373094 3.4142135623731,-0.111140466039203 3.66293922460509,0.234633135269821 3.84775906502257,0.609819355967744 3.96157056080646,1 4,1.39018064403226 3.96157056080646,1.76536686473018 3.84775906502257,2.1111404660392 3.66293922460509,2.41421356237309 3.4142135623731,2.66293922460509 3.11114046603921,2.84775906502257 2.76536686473018,2.96157056080646 2.39018064403226,3 2,3 1))
(1 row)
which is a 2D POLYGON, not a POLYGON Z.
How can I make it 3D?
I'm not totally sure what you want to achieve, but did you take a look at ST_Force3D?
SELECT
ST_AsText(
ST_Force3D(
ST_Buffer('LINESTRING Z (1 2 1,1 1 4)'::GEOMETRY, 2)));
It will return a POLYGON Z geometry:
POLYGON Z ((3 1 0,2.96157056080646 0.609819355967741 0,2.84775906502257 0.234633135269818 0,2.66293922460509 -0.111140466039206 0,2.41421356237309 -0.414213562373096 0,2.1111404660392 -0.662939224605091 0,1.76536686473018 -0.847759065022574 0,1.39018064403226 -0.961570560806461 0,1 -1 0,0.609819355967745 -0.961570560806461 0,0.234633135269822 -0.847759065022574 0,-0.111140466039202 -0.662939224605092 0,-0.414213562373094 -0.414213562373096 0,-0.662939224605089 -0.111140466039207 0,-0.847759065022572 0.234633135269818 0,-0.96157056080646 0.609819355967739 0,-1 1 0,-1 2 0,-0.96157056080646 2.39018064403226 0,-0.847759065022572 2.76536686473018 0,-0.662939224605089 3.11114046603921 0,-0.414213562373094 3.4142135623731 0,-0.111140466039203 3.66293922460509 0,0.234633135269821 3.84775906502257 0,0.609819355967744 3.96157056080646 0,1 4 0,1.39018064403226 3.96157056080646 0,1.76536686473018 3.84775906502257 0,2.1111404660392 3.66293922460509 0,2.41421356237309 3.4142135623731 0,2.66293922460509 3.11114046603921 0,2.84775906502257 2.76536686473018 0,2.96157056080646 2.39018064403226 0,3 2 0,3 1 0))
The function ST_Buffer discards the Z dimension, as stated in the documentation:
... This function ignores the third dimension (z) and will always give a
2-d buffer even when presented with a 3d-geometry.
EDIT:
This query creates a flat buffer whose Z value is the average Z of the vertices of the given LINESTRING Z:
WITH j AS (
SELECT
ST_DumpPoints(
ST_Buffer('LINESTRING Z (1 2 1,1 1 4)'::GEOMETRY, 2)
) AS pt,
(SELECT AVG(z) AS avg_z
FROM (SELECT ST_Z((ST_DumpPoints('LINESTRING Z (1 2 1,1 1 4)'::GEOMETRY)).geom) AS z) AS z) AS lsz
)
SELECT ST_AsText(
ST_MakePolygon(ST_MakeLine(ST_MakePoint(ST_X((pt).geom),ST_Y((pt).geom),lsz) ORDER BY (pt).path)))
FROM j
GROUP BY lsz;
POLYGON Z ((3 1 2.5,2.96157056080646 0.609819355967741 2.5,2.84775906502257 0.234633135269818 2.5,2.66293922460509 -0.111140466039206 2.5,2.41421356237309 -0.414213562373096 2.5,2.1111404660392 -0.662939224605091 2.5,1.76536686473018 -0.847759065022574 2.5,1.39018064403226 -0.961570560806461 2.5,1 -1 2.5,0.609819355967745 -0.961570560806461 2.5,0.234633135269822 -0.847759065022574 2.5,-0.111140466039202 -0.662939224605092 2.5,-0.414213562373094 -0.414213562373096 2.5,-0.662939224605089 -0.111140466039207 2.5,-0.847759065022572 0.234633135269818 2.5,-0.96157056080646 0.609819355967739 2.5,-1 1 2.5,-1 2 2.5,-0.96157056080646 2.39018064403226 2.5,-0.847759065022572 2.76536686473018 2.5,-0.662939224605089 3.11114046603921 2.5,-0.414213562373094 3.4142135623731 2.5,-0.111140466039203 3.66293922460509 2.5,0.234633135269821 3.84775906502257 2.5,0.609819355967744 3.96157056080646 2.5,1 4 2.5,1.39018064403226 3.96157056080646 2.5,1.76536686473018 3.84775906502257 2.5,2.1111404660392 3.66293922460509 2.5,2.41421356237309 3.4142135623731 2.5,2.66293922460509 3.11114046603921 2.5,2.84775906502257 2.76536686473018 2.5,2.96157056080646 2.39018064403226 2.5,3 2 2.5,3 1 2.5))
(1 row)
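The arithmetic behind that query is small: average the Z values of the linestring's vertices (a plain vertex average, as AVG does here, not weighted by segment length) and attach it to every vertex of the 2D buffer ring. A plain-Python sketch with hypothetical helper names, using a shortened ring instead of the full buffer for readability:

```python
def add_mean_z(ring_xy, line_xyz):
    """Return the 2D ring with every vertex lifted to the mean Z of line_xyz."""
    mean_z = sum(z for _, _, z in line_xyz) / len(line_xyz)
    return [(x, y, mean_z) for x, y in ring_xy]


def polygon_z_wkt(ring_xyz):
    """Format a single-ring POLYGON Z as WKT (15 significant digits)."""
    coords = ",".join(f"{x:.15g} {y:.15g} {z:.15g}" for x, y, z in ring_xyz)
    return f"POLYGON Z (({coords}))"


line = [(1, 2, 1), (1, 1, 4)]  # LINESTRING Z (1 2 1,1 1 4)
ring = [(3, 1), (1, -1), (-1, 1), (1, 4), (3, 1)]  # a few buffer vertices, closed
print(polygon_z_wkt(add_mean_z(ring, line)))
# POLYGON Z ((3 1 2.5,1 -1 2.5,-1 1 2.5,1 4 2.5,3 1 2.5))
```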

Querying polygons that contain 4 points

I always get the same 4 points, and I would like to query whether the polygon defined by a multipoint contains those 4 points. I'm using PostGIS and Postgres.
I'm also using OGR/GDAL for this. Could someone provide the SQL query for that?
This checks if the points (1 1), (2 2), (3 3), and (4 4) all lie inside the polygon defined by (0 0), (10 0), (10 10), (0 10) and (0 0):
SELECT st_contains(
st_polygon(
st_linefrommultipoint(
st_mpointfromtext(
'MULTIPOINT(0 0, 10 0, 10 10, 0 10, 0 0)'
)
),
0
),
st_mpointfromtext(
'MULTIPOINT(1 1, 2 2, 3 3, 4 4)'
)
);
So to find all multipoints that satisfy the criterion, you could use something like this:
SELECT id
FROM multipoints
WHERE st_contains(
st_polygon(
st_addpoint(
st_linefrommultipoint(
multipoints.geom
),
st_startpoint(
st_linefrommultipoint(
multipoints.geom
)
),
-1
),
st_srid(multipoints.geom)
),
st_mpointfromtext(
'MULTIPOINT(1 1, 2 2, 3 3, 4 4)',
8307
)
);
This assumes that the stored multipoints do not already form a closed ring (i.e., the first point is not repeated as the last one), because the query appends the start point to close the ring.
I used SRID 8307 in my example, replace it with the one you need.
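The two steps of that query (close the ring by appending the start point, then test containment) can be sketched in plain Python to check expected results. This uses even-odd ray casting, a swapped-in technique that is stricter than ST_Contains: it requires every point to be inside, and points exactly on the boundary may go either way, whereas ST_Contains tolerates boundary points as long as one point is in the interior. All names are hypothetical.

```python
def close_ring(points):
    """Append the first point unless the ring is already closed (cf. ST_AddPoint(..., -1))."""
    return points if points[0] == points[-1] else points + [points[0]]


def point_in_ring(point, ring):
    """Even-odd ray casting on a closed ring; boundary points are not special-cased."""
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray through the point
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def ring_contains_all(ring_points, points):
    ring = close_ring(list(ring_points))
    return all(point_in_ring(p, ring) for p in points)


square = [(0, 0), (10, 0), (10, 10), (0, 10)]  # unclosed, like a stored multipoint
print(ring_contains_all(square, [(1, 1), (2, 2), (3, 3), (4, 4)]))  # True
print(ring_contains_all(square, [(1, 1), (11, 1)]))  # False
```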

How to check 3D coordinates against 3D bounding box in PostGIS?

I would have imagined the obvious query was:
postgres=# SELECT ST_GeomFromText( 'POINT( 1 2 3 )' ) &&&
'BOX3D( -5 -5 -5, 5 5 5 )'::box3d;
But this results in
?column?
----------
f
As opposed to t.
The query seems to lose the z-coordinate from the bounding box completely. This also results in the following issue where a bounding box ranging from z=1 to z=2 will return t for a point at z=0:
galaxymap=# SELECT ST_GeomFromText( 'POINT( 0 0 0 )' ) &&&
'BOX3D( -1 -1 1, 1 1 2 )'::box3d;
?column?
----------
t
(1 row)
After an hour of googling I finally happened upon an e-mail conversation on the postgis-devel mailing list.
Our boxes are all broken.
There should be somewhere a wiki page or ticker or something about
options to improve the situation.
The suggested workaround seems to be using lines (or bounding diagonals, which I didn't try):
SELECT ST_MakePoint( 1, 2, 3 ) &&& ST_MakeLine(
ST_MakePoint( -10, -10, -10 ), ST_MakePoint( 10, 10, 10 ) );
?column?
----------
t
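For reference, the result the question expects from &&& (n-D bounding-box intersection) is just a per-axis overlap test. A plain-Python sketch with hypothetical names, useful for checking what the SQL should return; it says nothing about how PostGIS stores or compares boxes internally:

```python
def boxes_intersect_3d(a_min, a_max, b_min, b_max):
    """True if two axis-aligned 3D boxes overlap on every axis (bounds inclusive)."""
    return all(alo <= bhi and blo <= ahi
               for alo, ahi, blo, bhi in zip(a_min, a_max, b_min, b_max))


def point_in_box_3d(point, box_min, box_max):
    # A point is a degenerate box whose min and max corners coincide.
    return boxes_intersect_3d(point, point, box_min, box_max)


print(point_in_box_3d((1, 2, 3), (-5, -5, -5), (5, 5, 5)))  # True
print(point_in_box_3d((0, 0, 0), (-1, -1, 1), (1, 1, 2)))   # False: z=0 is below 1
```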