Is there a PostGIS function to conditionally merge linestring geometries to the neighboring ones? - postgresql

I have a multilinestring table in my PostGIS database (Postgres 11), which I have converted to linestrings, checking the validity (ST_IsValid()) of the new linestring geometries.
create table my_line_tbl as
select
  gid gid_multi,
  adm_code, t_count,
  st_length((st_dump(st_linemerge(geom))).geom)::int len,
  (st_dump(st_linemerge(geom))).geom geom
from
  my_multiline_tbl
order by gid;

alter table my_line_tbl add column id serial primary key not null;
The first 10 rows look like this:
id, gid_multi, adm_code, t_count, len, geom
1, 1, 30, 5242, 407, LINESTRING(...)
2, 1, 30, 3421, 561, LINESTRING(...)
3, 2, 50, 5248, 3, LINESTRING(...)
4, 2, 50, 1458, 3, LINESTRING(...)
5, 2, 60, 2541, 28, LINESTRING(...)
6, 2, 30, 3325, 4, LINESTRING(...)
7, 2, 20, 1142, 5, LINESTRING(...)
8, 2, 30, 1425, 7, LINESTRING(...)
9, 3, 30, 2254, 4, LINESTRING(...)
10, 3, 50, 2254, 50, LINESTRING(...)
I am trying to develop the following logic:

- Find all <= 10 m segments and merge each into a neighboring (previous or next) segment > 10 m.
- If several <= 10 m segments lie next to each other, merge them together to make segments > 10 m (minimum length: > 10 m).
- In case of intersections, merge any <= 10 m segment into the longest neighboring geometry.
I thought of using SQL window functions to check the length (st_length()) of succeeding geometries (lead(id) over()) and then merging them, but the problem with this approach is that successive IDs are not necessarily next to each other (they do not intersect, st_intersects()).
My code attempt (dynamic SQL) is below, where I try to separate the <= 10 m and > 10 m geometries.
with lt10mseg as (
  select id, gid_multi, len, geom lt10m_geom
  from my_line_tbl
  where len <= 10
), gt10mseg as (
  select id, gid_multi, len, geom gt10m_geom
  from my_line_tbl
  where len > 10
)
select st_intersects(lt10m_geom, gt10m_geom)
from lt10mseg, gt10mseg
order by lt10mseg.id;
Any help/suggestions (dynamic SQL/PLPGSQL) on how to continue developing the above logic? The ultimate goal is to get rid of the <= 10 m segments by merging them into their neighbors.
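One possible greedy interpretation of these rules can be prototyped outside the database before committing to PL/pgSQL. In this plain-Python sketch (the function name and threshold are my own, not PostGIS), consecutive touching segments are modeled purely by their lengths, and merging into a neighbor is modeled as summing lengths into the longer of the two neighbors:

```python
MIN_LEN = 10  # merge threshold in meters

def merge_short_segments(lengths):
    """Greedily absorb segments <= MIN_LEN into their longer neighbor.

    `lengths` models consecutive, touching segments of one original
    multilinestring. Each pass finds the first short segment and adds
    its length to the longer of its previous/next neighbor, so runs of
    short segments collapse together until everything exceeds MIN_LEN.
    """
    segs = list(lengths)
    while len(segs) > 1:
        short_idx = next((k for k, s in enumerate(segs) if s <= MIN_LEN), None)
        if short_idx is None:
            break
        i = short_idx
        left = segs[i - 1] if i > 0 else -1
        right = segs[i + 1] if i < len(segs) - 1 else -1
        j = i - 1 if left >= right else i + 1   # longer neighbor wins
        short = segs.pop(i)
        if j > i:
            j -= 1  # indices after i shifted down by the pop
        segs[j] += short
    return segs

# Lengths taken from the sample rows of gid_multi = 1 and 2:
print(merge_short_segments([407, 3, 3, 28, 4, 5, 7]))  # → [413, 44]
```

Total length is preserved (407+3+3+28+4+5+7 = 413+44 = 457); the real implementation would additionally need st_intersects() to confirm adjacency and ST_LineMerge/ST_Union to build the merged geometry.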


How to approach summing up individual sets of columns per id in postgreSQL?

Task: I need to sum up relevant values from a JSON for a specific id. How can I accomplish this in PostgreSQL?
I receive post insights from Facebook's Graph API, and each row contains a cell with a JSON listing countries, with their two-letter abbreviation and the corresponding watchtime in ms from that country.
post_id | date | watchtime_per_country
107_[pageID] | 2022-09-01 | ** see json below **
The second part is a table that contains the relevant countries for each page_id:

page_id | target country
P01 | Germany (DE)
P01 | Italy (IT)
P02 | Mozambique (MZ)
P02 | Colombia (CO)
Now I would like to get the sum of:

Germany (DE): 162 and Japan (JP): 24 --> 186 for P01
Mozambique (MZ): 3 and Colombia (CO): 6 --> 9 for P02
So far I have unnested the JSON and unpacked all of the roughly 250 possible country values into separate columns, but I am not sure whether this is a good approach. After that, I am not sure how to build those sums in a flexible, efficient way, or whether it is possible at all in PostgreSQL.
Does anyone have an idea?
**** json ****
{"Brazil (BR)": 9210, "Germany (DE)": 162, "Portugal (PT)": 68, "Japan (JP)": 24, "United States (US)": 17, "Italy (IT)": 13, "France (FR)": 9, "United Kingdom (GB)": 8, "Netherlands (NL)": 6, "Belgium (BE)": 6, "Colombia (CO)": 6, "Austria (AT)": 5, "Sweden (SE)": 4, "Canada (CA)": 4, "Argentina (AR)": 3, "Mozambique (MZ)": 3, "Angola (AO)": 3, "Switzerland (CH)": 2, "Saudi Arabia (SA)": 2, "New Zealand (NZ)": 2, "Norway (NO)": 2, "Indonesia (ID)": 2, "Denmark (DK)": 2, "United Arab Emirates (AE)": 2, "Russia (RU)": 2, "Spain (ES)": 1, "China (CN)": 1, "Israel (IL)": 1, "Chile (CL)": 0, "Bulgaria (BG)": 0, "Australia (AU)": 0, "Cape Verde (CV)": 0, "Ireland (IE)": 0, "Egypt (EG)": 0, "Luxembourg (LU)": 0, "Bolivia (BO)": 0, "Paraguay (PY)": 0, "Uruguay (UY)": 0, "Czech Republic (CZ)": 0, "Hungary (HU)": 0, "Finland (FI)": 0, "Algeria (DZ)": 0, "Peru (PE)": 0, "Mexico (MX)": 0, "Guinea-Bissau (GW)": 0}
You have a couple of ways you can go. If you will do little else with the post insights, you can get the page sums directly by processing the JSON.
Your later comment indicates there may be more. In that case, unpacking the JSON into a single table is the way to go; that is data normalization.
One very slight correction: the two-character code is not a Microsoft coding for the country. It is the ISO 3166-1 alpha-2 code (defined in ISO 3166-1) (yes, Microsoft uses it).
Either way, the first step is to extract the keys from the JSON, then use those keys to extract the values. Then JOIN the relevant_countries table on the alpha-2 code.
with work_insights (jkey, country_watchtime) as (
  select json_object_keys(country_watchtime), country_watchtime
  from insights_data_stage
), watch_insights (cntry, alpha2, watchtime) as (
  select trim(replace(substring(jkey, '^.*\('), '(', ''))
       , upper(trim(replace(replace(substring(jkey, '\(.*\)'), '(', ''), ')', '')))
       , (country_watchtime ->> jkey)::numeric
  from work_insights
), relevant_codes (page_id, alpha2) as (
  select page_id, substring(substring(target_country, '\(..\)'), 2, 2) alpha2
  from relevant_countries
)
select rc.page_id, sum(watchtime) watchtime
from relevant_codes rc
join watch_insights wi on (wi.alpha2 = rc.alpha2)
where rc.page_id in ('P01', 'P02')
group by rc.page_id
order by rc.page_id;
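The core of the query is just "map each JSON key to its alpha-2 code, then sum the relevant values per page". A plain-Python sketch of that logic (using a trimmed sample of the posted JSON; variable names are my own) makes it easy to verify the arithmetic:

```python
import json
import re

# Trimmed sample of the watchtime JSON from the question
watchtime_json = ('{"Brazil (BR)": 9210, "Germany (DE)": 162, '
                  '"Italy (IT)": 13, "Colombia (CO)": 6, "Mozambique (MZ)": 3}')

# page_id -> relevant country rows, as in the relevant_countries table
relevant_countries = [
    ("P01", "Germany (DE)"), ("P01", "Italy (IT)"),
    ("P02", "Mozambique (MZ)"), ("P02", "Colombia (CO)"),
]

def alpha2(label):
    """Extract the ISO 3166-1 alpha-2 code from a 'Name (XX)' label."""
    return re.search(r'\((..)\)', label).group(1)

# Index watchtimes by alpha-2 code, then sum per page over relevant countries
watchtime = {alpha2(k): v for k, v in json.loads(watchtime_json).items()}
sums = {}
for page_id, country in relevant_countries:
    sums[page_id] = sums.get(page_id, 0) + watchtime.get(alpha2(country), 0)

print(sums)  # → {'P01': 175, 'P02': 9}
```

Note that P01 comes out as 162 + 13 = 175, i.e. Germany + Italy as listed in relevant_countries, not the Germany + Japan = 186 stated in the expected results; this is the same mismatch the Update at the end of the answer points out.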
For the normalization process you need a country table (which you already said you have) and another table for the normalized insights data. Populating it begins with the same parsing as above, but develops a column for each value. Once that table is created, you JOIN it with relevant_countries. (See the demo containing both.) Note: I normalized the relevant_countries table.
select rc.page_id, sum(pi.watchtime) watchtime
from post_insights pi
join relevant_countries_rev rc on (rc.alpha2 = pi.alpha2)
group by rc.page_id
order by rc.page_id;
Update: The results for P01 do not match your expected results. Your expectations indicate to sum Germany and Japan, but your relevant_countries table indicates Germany and Italy.

How to check if multilinestring really is multilinestring?

I have a huge database with a road network, and the geometry type is MULTILINESTRING. I would like to filter out the MULTILINESTRINGs with topological errors. Both the lines on the left side and those on the right side are one record each, made of two lines. On the right side they connect, so that doesn't really bother me; I can merge them later without a topological error. However, on the left side they don't connect, but they are still one record.
What I've tried so far:
SELECT gid
FROM myschema.roads
WHERE (
  NOT ST_Equals(ST_Endpoint(ST_GeometryN(the_geom, 1)), ST_Startpoint(ST_GeometryN(the_geom, 2)))
  AND NOT ST_Equals(ST_Endpoint(ST_GeometryN(the_geom, 2)), ST_Startpoint(ST_GeometryN(the_geom, 1)))
)
If I could say that the MULTILINESTRINGs are made up of at most two lines, it would work, I assume. Unfortunately some of them are made up of 10-20 lines, and I cannot be sure that the line parts follow each other in ascending or descending order. So extending my SQL script is not an option, in my opinion.
(I'm using QGIS with a PostGIS database, but I also possess ArcMap.)
If you're simply looking for a way to identify which MultiLineStrings contain more than one line, you can use ST_LineMerge, then ST_Dump, and count the returned LineStrings. If a geometry contains non-continuous lines, the query will return a count bigger than 1, e.g.
WITH j (geom) AS (
VALUES ('MULTILINESTRING((10 10, 20 20, 10 40),(40 40, 30 30, 40 20, 30 10))'),
('MULTILINESTRING((10 10, 20 20, 10 40),(10 40, 30 30, 40 20, 30 10))'))
SELECT geom,(SELECT count(*) FROM ST_Dump(ST_LineMerge(geom)))
FROM j;
geom | count
---------------------------------------------------------------------+-------
MULTILINESTRING((10 10, 20 20, 10 40),(40 40, 30 30, 40 20, 30 10)) | 2
MULTILINESTRING((10 10, 20 20, 10 40),(10 40, 30 30, 40 20, 30 10)) | 1
(2 rows)
Another alternative is to use ST_NumGeometries after applying ST_LineMerge, e.g.
WITH j (geom) AS (
VALUES ('MULTILINESTRING((10 10, 20 20, 10 40),(40 40, 30 30, 40 20, 30 10))'),
('MULTILINESTRING((10 10, 20 20, 10 40),(10 40, 30 30, 40 20, 30 10))'))
SELECT geom,ST_NumGeometries(ST_LineMerge(geom)) AS count
FROM j;
geom | count
---------------------------------------------------------------------+-------
MULTILINESTRING((10 10, 20 20, 10 40),(40 40, 30 30, 40 20, 30 10)) | 2
MULTILINESTRING((10 10, 20 20, 10 40),(10 40, 30 30, 40 20, 30 10)) | 1
(2 rows)
You could use this function to check if the multilinestring is connected:
CREATE OR REPLACE FUNCTION is_connected(g geometry(MultiLineString)) RETURNS boolean
LANGUAGE plpgsql AS
$$DECLARE
  i integer;
  point geometry := NULL;
  part geometry;
BEGIN
  FOR i IN 1..ST_NumGeometries(g) LOOP
    part := ST_GeometryN(g, i);
    -- point is NULL on the first iteration, so the first part is never rejected
    IF point IS NOT NULL AND NOT ST_Equals(point, ST_Startpoint(part)) THEN
      RETURN FALSE;
    END IF;
    point := ST_Endpoint(part);
  END LOOP;
  RETURN TRUE;
END;$$;
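The same sequential check is easy to prototype outside the database. In this plain-Python sketch (not PostGIS), parts are modeled as lists of (x, y) vertex tuples; as in the PL/pgSQL function, only consecutive parts are compared, so a connected chain whose parts are listed out of order would still report False:

```python
def is_connected(parts):
    """Return True if each part starts where the previous one ends.

    `parts` models the members of a multilinestring as lists of
    (x, y) vertex tuples. Mirrors the PL/pgSQL loop: compare the start
    of each part with the end of the part before it.
    """
    prev_end = None
    for part in parts:
        if prev_end is not None and part[0] != prev_end:
            return False
        prev_end = part[-1]
    return True

# The two multilinestrings from the earlier count examples:
broken = [[(10, 10), (20, 20), (10, 40)],
          [(40, 40), (30, 30), (40, 20), (30, 10)]]
joined = [[(10, 10), (20, 20), (10, 40)],
          [(10, 40), (30, 30), (40, 20), (30, 10)]]
print(is_connected(broken), is_connected(joined))  # → False True
```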

Multiple Cases For Same Result Column

Have a table where one of the columns has all of the info I need for a report.
I want to substring certain portions of this column into a column in this report, but the problem is that the values in this column come in three varying character lengths.
Example:
Row1: 20180101_ABC_12
Row2: 20180102_DEFG_23
Row3: 20180103_HIJKL_45
In this particular example I want the middle portion (e.g. ABC) to be a column called 'Initials'; the problem is that I am using CASE logic for each LEN. I am not sure how else to achieve this.
My sample query below. It pulls all of the possible options, but as separate columns. What would I need to do to have these 3 options pull into one column, let's call it 'Initials'?
Thanks
SELECT
FileName
, CASE WHEN LEN(FileName) = 10 THEN SUBSTRING(FileName, 10, 3) ELSE NULL END
, CASE WHEN LEN(FileName) = 11 THEN SUBSTRING(FileName, 10, 4) ELSE NULL END
, CASE WHEN LEN(FileName) = 12 THEN SUBSTRING(FileName, 10, 5) ELSE NULL END
FROM File
In Tableau, you would accomplish this using a calculated field.
Initials:
CASE LEN(FileName)
WHEN 10 THEN SUBSTRING(FileName, 10, 3)
WHEN 11 THEN SUBSTRING(FileName, 10, 4)
WHEN 12 THEN SUBSTRING(FileName, 10, 5)
END
Or maybe
SUBSTRING(FileName
,10
,CASE LEN(FileName)
WHEN 10 THEN 3
WHEN 11 THEN 4
WHEN 12 THEN 5
END
)
But barring the more technical aspects, this can be solved with math (assuming your data is limited to lengths 10, 11, and 12, or that the pattern holds):
SUBSTRING(FileName
,10
,LEN(FileName)-7
)
You need 1 CASE expression covering every possible case, not 3 separate ones, because each one creates a new column:
SELECT
FileName
, CASE LEN(FileName)
WHEN 10 THEN SUBSTRING(FileName, 10, 3)
WHEN 11 THEN SUBSTRING(FileName, 10, 4)
WHEN 12 THEN SUBSTRING(FileName, 10, 5)
ELSE NULL
END AS Initials
FROM File
Another way to get everything between the two _ characters:
SELECT
FileName
, substring(
left(FileName, len(FileName) - charindex('_', reverse(FileName) + '_')),
charindex('_', FileName) + 1,
len(FileName)
) AS Initials
FROM File
but from your logic I assume that the values in the FileName column follow the same pattern:
<9 digits>_<Initials>_<2 digits>
If this is the case then you can get what you want like this:
SELECT
FileName
, substring(FileName, 10, len(FileName) - 12) AS Initials
FROM File
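All of the variants above extract the same middle token: whatever sits between the two underscores. That makes the logic easy to sanity-check outside SQL; a minimal Python sketch (the function name is my own):

```python
def initials(filename):
    """Return the middle token of a 'DATE_INITIALS_NN' file name."""
    # Splitting on '_' sidesteps the length arithmetic entirely
    return filename.split('_')[1]

for f in ("20180101_ABC_12", "20180102_DEFG_23", "20180103_HIJKL_45"):
    print(f, "->", initials(f))
```

This prints ABC, DEFG, and HIJKL for the three sample rows, which is exactly what the single-CASE and math-based SQL answers produce.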

postgres 10 - convert categorical column to presence absence matrix

I'd like to create a new table like so:
original table:
site_id, site_period
1, period_a
2, period_b
2, period_c
3, period_d
4, period_a
4, period_b
desired table:
site_id, period_a, period_b, period_c, period_d
1, 1, 0, 0, 0
2, 0, 1, 1, 0
3, 0, 0, 0, 1
4, 1, 1, 0, 0
This is probably a duplicate question as this is a relatively simple problem, but I didn't know what vocabulary to use to describe it to find a solution. I'm familiar with coding logic, but not terribly comfortable with sql queries. Thanks!
You can use CREATE TABLE ... AS SELECT and conditional aggregation.
CREATE TABLE desiredtable AS
SELECT site_id,
       count(CASE site_period WHEN 'period_a' THEN 1 END) period_a,
       ...
       count(CASE site_period WHEN 'period_d' THEN 1 END) period_d
FROM originaltable
GROUP BY site_id;
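The conditional aggregation is just a pivot: one output row per site_id, one 1/0 column per period. A small plain-Python sketch of the same transformation (variable names are my own) shows the mechanics on the sample data:

```python
# (site_id, site_period) rows from the original table
rows = [(1, "period_a"), (2, "period_b"), (2, "period_c"),
        (3, "period_d"), (4, "period_a"), (4, "period_b")]
periods = ["period_a", "period_b", "period_c", "period_d"]

# Build the presence/absence matrix: default every period to 0,
# then flip to 1 wherever a (site_id, period) pair exists.
matrix = {}
for site_id, period in rows:
    matrix.setdefault(site_id, {p: 0 for p in periods})[period] = 1

for site_id in sorted(matrix):
    print(site_id, [matrix[site_id][p] for p in periods])
```

This reproduces the desired table (site 2 gets 0,1,1,0 and so on). Note that count(CASE ...) in the SQL version would return counts above 1 if a site/period pair appeared twice; wrap it in a CASE or use min(..., 1) semantics if strict 1/0 output matters for duplicated rows.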

What does the exclude_nodata_value argument to ST_DumpValues do?

Could anyone explain what the exclude_nodata_value argument to ST_DumpValues does?
For example, given the following:
WITH
-- Create a 4x4 raster with each value set to 8 and NODATA set to -99.
tbl_1 AS (
  SELECT
    ST_AddBand(
      ST_MakeEmptyRaster(4, 4, 0, 0, 1, -1, 0, 0, 4326),
      1, '32BF', 8, -99
    ) AS rast
),
-- Set the values in rows 1 and 2 to -99.
tbl_2 AS (
  SELECT
    ST_SetValues(
      rast, 1, 1, 1, 4, 2, -99, FALSE
    ) AS rast
  FROM tbl_1
)
Why does the following select statement return NULLs in the first two rows:
SELECT ST_DumpValues(rast, 1, TRUE) AS cell_values FROM tbl_2;
Like this:
{{NULL,NULL,NULL,NULL},{NULL,NULL,NULL,NULL},{8,8,8,8},{8,8,8,8}}
But the following select statement returns -99s?
SELECT ST_DumpValues(rast, 1, FALSE) AS cell_values FROM tbl_2;
Like this:
{{-99,-99,-99,-99},{-99,-99,-99,-99},{8,8,8,8},{8,8,8,8}}
Clearly, with both statements the first two rows really contain -99s. However, in the first case (exclude_nodata_value = TRUE) these values have been masked (but not replaced) by NULLs.
Thanks for any help. The subtle differences between NULL and NODATA within PostGIS have been driving me crazy for several days.
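The masking interpretation above can be pictured with a small sketch. This is plain Python, not PostGIS: it simply models a band whose NODATA value is -99 and shows how the exclude flag changes what is reported without changing what is stored:

```python
NODATA = -99

# 4x4 band from the example: rows 1-2 set to the NODATA value, rest 8
band = [[-99] * 4, [-99] * 4, [8] * 4, [8] * 4]

def dump_values(band, exclude_nodata_value):
    """Report cell values, masking NODATA cells as None when asked.

    The stored band is never modified; exclusion only affects the
    reported values, which models NULLs appearing in ST_DumpValues
    output while the raster still holds -99.
    """
    if not exclude_nodata_value:
        return [row[:] for row in band]
    return [[None if v == NODATA else v for v in row] for row in band]

print(dump_values(band, True)[0])   # → [None, None, None, None]
print(dump_values(band, False)[0])  # → [-99, -99, -99, -99]
```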