PostGIS equivalent of ArcMap Union - postgresql

What is the equivalent in PostGIS / PostgreSQL of the "Union" operation in ArcMap?
Say you have two shapefiles with two features each (PostGIS equivalent: two tables, each with two rows of polygon geometries).
After the Union operation, the result would be one shapefile with 7 features (PostGIS equivalent: a table with 7 rows of geometries).
I've looked at ST_Intersection, ST_Union and ST_Collect but can't find the right combination. Your help is much appreciated.
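The answers below assume the input polygons already live in PostGIS tables. For the first query, a minimal, hypothetical rectangles table (the name and geometries are illustrative assumptions, not part of the original answer) could be set up like this:
-- Hypothetical sample input: overlapping polygons gathered into one table
CREATE TABLE rectangles (id serial PRIMARY KEY, geom geometry(Polygon));
INSERT INTO rectangles (geom) VALUES
    ('POLYGON((0 0, 2 0, 2 2, 0 2, 0 0))'),
    ('POLYGON((1 1, 3 1, 3 3, 1 3, 1 1))');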

Here is a working query based on this answer from gis.stackexchange:
Read it from a) to d):
-- d) Extract the path number and the geom from the geometry dump
SELECT
    (dump).path[1] AS id,
    (dump).geom
FROM (
    -- c) Polygonize the unioned rings (returns a GEOMETRYCOLLECTION)
    --    and dump it to return individual geometries
    SELECT ST_Dump(ST_Polygonize(geom)) AS dump
    FROM (
        -- b) Union all rings into one big geometry
        SELECT ST_Union(geom) AS geom
        FROM (
            -- a) First get the exterior ring from all geoms
            SELECT ST_ExteriorRing(geom) AS geom
            FROM rectangles
        ) a
    ) b
) c;
Result:

Many thanks to Michael Entin for the approach below:
-- input data
WITH polys1 AS (
    SELECT 1 df1, ST_GeogFromText('Polygon((0 0, 2 0, 2 2, 0 2, 0 0))') g
    UNION ALL
    SELECT 2, ST_GeogFromText('Polygon((2 2, 4 2, 4 4, 2 4, 2 2))')
),
polys2 AS (
    SELECT 1 df2, ST_GeogFromText('Polygon((1 1, 3 1, 3 3, 1 3, 1 1))') g
    UNION ALL
    SELECT 2, ST_GeogFromText('Polygon((3 3, 5 3, 5 5, 3 5, 3 3))')
),
-- left and right unions
union1 AS (
    SELECT ST_UNION_AGG(g) FROM polys1
),
union2 AS (
    SELECT ST_UNION_AGG(g) FROM polys2
),
-- various combinations of intersections
pairs AS (
    SELECT df1, df2, ST_INTERSECTION(a.g, b.g) g
    FROM polys1 a, polys2 b
    WHERE ST_INTERSECTS(a.g, b.g)
    UNION ALL
    SELECT df1, NULL, ST_DIFFERENCE(g, (SELECT * FROM union2)) g FROM polys1
    UNION ALL
    SELECT NULL, df2, ST_DIFFERENCE(g, (SELECT * FROM union1)) g FROM polys2
)
SELECT * FROM pairs WHERE NOT ST_IsEmpty(g);
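Note that the query above uses BigQuery-style geography functions (ST_GeogFromText, ST_UNION_AGG). A rough PostGIS translation on the same sample polygons, sketched here as an assumption rather than part of the original answer, could look like this:
WITH polys1 AS (
    SELECT 1 AS df1, 'POLYGON((0 0, 2 0, 2 2, 0 2, 0 0))'::geometry AS g
    UNION ALL
    SELECT 2, 'POLYGON((2 2, 4 2, 4 4, 2 4, 2 2))'::geometry
),
polys2 AS (
    SELECT 1 AS df2, 'POLYGON((1 1, 3 1, 3 3, 1 3, 1 1))'::geometry AS g
    UNION ALL
    SELECT 2, 'POLYGON((3 3, 5 3, 5 5, 3 5, 3 3))'::geometry
),
-- ST_Union is the PostGIS aggregate equivalent of ST_UNION_AGG
union1 AS (SELECT ST_Union(g) AS g FROM polys1),
union2 AS (SELECT ST_Union(g) AS g FROM polys2),
-- overlapping pieces carry both ids; leftovers carry only one
pairs AS (
    SELECT df1, df2, ST_Intersection(a.g, b.g) AS g
    FROM polys1 a JOIN polys2 b ON ST_Intersects(a.g, b.g)
    UNION ALL
    SELECT df1, NULL, ST_Difference(g, (SELECT g FROM union2)) FROM polys1
    UNION ALL
    SELECT NULL, df2, ST_Difference(g, (SELECT g FROM union1)) FROM polys2
)
SELECT * FROM pairs WHERE NOT ST_IsEmpty(g);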

Related

Convert jsonb in PostgreSQL to rows without cycle

I have a json array stored in my postgres database. The first table "Orders" looks like this:
order_id, basket_items_id
1, {1,2}
2, {3}
3, {1,2,3,1}
Second table "Items" looks like this:
item_id, price
1,5
2,3
3,20
I've already tried loading the data with multiple SQL statements and selecting the different jsonb records, but that is not a silver bullet.
SELECT
sum(price)
FROM orders
INNER JOIN items on
orders.basket_items_id = items.item_id
WHERE order_id = 3;
I want to get this as output:
order_id, basket_items_id, price
1, 1, 5
1, 2, 3
2, 3, 20
3, 1, 5
3, 2, 3
3, 3, 20
3, 1, 5
or this:
order_id, sum(price)
1, 8
2, 20
3, 33
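A minimal setup to reproduce the tables shown above (the DDL is an assumption; it presumes basket_items_id is stored as a jsonb array):
CREATE TABLE items (item_id int PRIMARY KEY, price numeric);
INSERT INTO items VALUES (1, 5), (2, 3), (3, 20);

CREATE TABLE orders (order_id int PRIMARY KEY, basket_items_id jsonb);
INSERT INTO orders VALUES
    (1, '[1, 2]'),
    (2, '[3]'),
    (3, '[1, 2, 3, 1]');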
demo:db<>fiddle
SELECT
o.order_id,
elems.value::int as basket_items_id,
i.price
FROM
orders o, jsonb_array_elements_text(basket_items_id) as elems
LEFT JOIN items i
ON i.item_id = elems.value::int
ORDER BY 1,2,3
jsonb_array_elements_text expands the jsonb array into one row per element. With this you are able to join against your second table directly.
Since the expanded array gives you text elements, you have to cast them to integers using ::int.
Of course you can GROUP and SUM aggregate this as well:
SELECT
o.order_id,
SUM(i.price)
FROM
orders o, jsonb_array_elements_text(basket_items_id) as elems
LEFT JOIN items i
ON i.item_id = elems.value::int
GROUP BY o.order_id
ORDER BY 1
Is your orders.basket_items_id column of type jsonb or int[]?
If the type is jsonb you can use jsonb_array_elements_text to expand the column:
SELECT
o.order_id,
o.basket_item_id,
items.price
FROM
(
SELECT
order_id,
jsonb_array_elements_text(basket_items_id)::int basket_item_id
FROM
orders
) o
JOIN
items ON o.basket_item_id = items.item_id
ORDER BY
1, 2, 3;
See this DB-Fiddle.
If the type is int[] (array of integers), you can run a similar query with the unnest function:
SELECT
o.order_id,
o.basket_item_id,
items.price
FROM
(
SELECT
order_id,
unnest(basket_items_id) basket_item_id
FROM
orders
) o
JOIN
items ON o.basket_item_id = items.item_id
ORDER BY
1, 2, 3;
See this DB-fiddle.
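For completeness, the int[] variant can be tried against an alternative, hypothetical schema such as:
-- Same data, but with the basket stored as an integer array instead of jsonb
CREATE TABLE orders_arr (order_id int PRIMARY KEY, basket_items_id int[]);
INSERT INTO orders_arr VALUES
    (1, '{1,2}'),
    (2, '{3}'),
    (3, '{1,2,3,1}');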

Divide table rows into chunks in Postgres with ST_DWithin limit

I have a table of linestrings that I want to divide into chunks, where each chunk holds no more than a given number of ids and contains only lines that are within a certain distance of each other.
For example, I have a table with 14 rows:
create table lines ( id integer primary key, geom geometry(linestring) );
insert into lines (id, geom) values ( 1, 'LINESTRING(0 0, 0 1)');
insert into lines (id, geom) values ( 2, 'LINESTRING(0 1, 1 1)');
insert into lines (id, geom) values ( 3, 'LINESTRING(1 1, 1 2)');
insert into lines (id, geom) values ( 4, 'LINESTRING(1 2, 2 2)');
insert into lines (id, geom) values ( 11, 'LINESTRING(2 2, 2 3)');
insert into lines (id, geom) values ( 12, 'LINESTRING(2 3, 3 3)');
insert into lines (id, geom) values ( 13, 'LINESTRING(3 3, 3 4)');
insert into lines (id, geom) values ( 14, 'LINESTRING(3 4, 4 4)');
create index lines_gix on lines using gist(geom);
I want to split it into chunks of 3 ids each, where the lines are within 2 meters of each other or of the first line in the chunk.
The result I am trying to get from this example is:
| Chunk No.| Id chunk list |
|----------|----------------|
| 1 | 1, 2, 3 |
| 2 | 4, 5, 6 |
| 3 | 7, 8, 9 |
| 4 | 10, 11, 12 |
| 5 | 13, 14 |
I tried to use ST_ClusterWithin, but when lines are close to each other it returns all of them in one cluster rather than splitting them into chunks.
I also tried some WITH RECURSIVE magic like the one in the answer provided by Paul Ramsey here, but I don't know how to modify the query to return a limited, grouped id list.
I am not sure if this is the best possible answer, so if anyone has a better method or knows how to improve it, feel free to update it. With a little modification of Paul's answer, I've managed to create the following queries that do what I asked for.
-- Create function for easier interaction
CREATE OR REPLACE FUNCTION find_connected(integer, double precision, integer, integer[])
returns integer[] AS
$$
WITH RECURSIVE lines_r AS -- recursion lets the query consume its own output, continually appending to the result and reusing it inside the query
(SELECT ARRAY[id] AS idlist,
geom, id
FROM lines
WHERE id = $1
UNION ALL
SELECT array_append(lines_r.idlist, lines.id) AS idlist, -- append id list to array
lines.geom AS geom, -- keep geometry
lines.id AS id -- keep source table id
FROM (SELECT * FROM lines WHERE NOT $4 #> array[id]) lines, lines_r -- from source table and recursive table
WHERE ST_DWITHIN(lines.geom, lines_r.geom, $2) -- where lines are within 2 meters
AND NOT lines_r.idlist #> ARRAY[lines.id] -- recursive id list array not contain lines array
AND array_length(idlist, 1) <= $3
)
SELECT idlist
FROM lines_r WHERE array_length(idlist, 1) <= $3 ORDER BY array_length(idlist, 1) DESC LIMIT 1;
$$
LANGUAGE 'sql';
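As a quick sanity check, the helper can be called on its own (a sketch; the exact array returned may vary because ties of equal length are broken by LIMIT 1):
-- args: start id, distance, max ids per chunk, ids already assigned to earlier chunks
SELECT find_connected(1, 2, 3, ARRAY[1]);  -- e.g. {1,2,3} on the sample rows above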
-- Create id chunks
WITH RECURSIVE groups_r AS (
(SELECT find_connected(id, 2, 3, ARRAY[id]) AS idlist, find_connected(id, 2, 3, ARRAY[id]) AS grouplist, id
FROM lines WHERE id = 1)
UNION ALL
(SELECT array_cat(groups_r.idlist, find_connected(lines.id, 2, 3, groups_r.idlist)) AS idlist,
find_connected(lines.id, 2, 3, groups_r.idlist) AS grouplist,
lines.id
FROM lines,
groups_r
WHERE NOT groups_r.idlist #> ARRAY[lines.id]
LIMIT 1))
SELECT
-- (SELECT array_agg(DISTINCT x) FROM unnest(idlist) t (x)) idlist, -- left for better understanding what is happening
row_number() OVER () chunk_id,
(SELECT array_agg(DISTINCT x) FROM unnest(grouplist) t (x)) grouplist,
id input_line_id
FROM groups_r;
The only problem is that performance gets quite poor as the number of ids per chunk increases. For a table with 300 rows and 20 ids per chunk, execution time is around 15 minutes, even with indexes on the geometry and id columns.

Summarizing Only Rows with given criteria

Hi all!
Given the following table structure
DECLARE #TempTable TABLE
(
idProduct INT,
Layers INT,
LayersOnPallet INT,
id INT IDENTITY(1, 1) NOT NULL,
Summarized BIT NOT NULL DEFAULT(0)
)
and the following insert statement which generates test data
INSERT INTO #TempTable(idProduct, Layers, LayersOnPallet)
SELECT 1, 2, 4
UNION ALL
SELECT 1, 2, 4
UNION ALL
SELECT 1, 1, 4
UNION ALL
SELECT 2, 2, 4
I would like to summarize only those rows (by Layers only) that share the same idProduct and whose sum of Layers equals LayersOnPallet.
A picture is worth a thousand words:
From the picture above, you can see that only the first two rows were summarized, because both have the same idProduct and their sum(Layers) equals LayersOnPallet.
How can I achieve this? Is there any way to do this with selects only (not with a WHILE loop)?
Thank you!
Perhaps this will do the trick. Note my comments:
-- your sample data
DECLARE #TempTable TABLE
(
idProduct INT,
Layers INT,
LayersOnPallet INT,
id INT IDENTITY(1, 1) NOT NULL,
Summarized BIT NOT NULL DEFAULT(0)
)
INSERT INTO #TempTable(idProduct, Layers, LayersOnPallet)
SELECT 1, 2, 4 UNION ALL
SELECT 1, 2, 4 UNION ALL
SELECT 1, 1, 4 UNION ALL
SELECT 2, 2, 4;
-- an intermediate temp table used for processing
IF OBJECT_ID('tempdb..#processing') IS NOT NULL DROP TABLE #processing;
-- let's populate the #processing table with duplicates
SELECT
idProduct,
Layers,
LayersOnPallet,
rCount = COUNT(*)
INTO #processing
FROM #tempTable
GROUP BY
idProduct,
Layers,
LayersOnPallet
HAVING COUNT(*) > 1;
-- Remove the duplicates
DELETE t
FROM #TempTable t
JOIN #processing p
ON p.idProduct = t.idProduct
AND p.Layers = t.Layers
AND p.LayersOnPallet = t.LayersOnPallet
-- Add the new, updated record
INSERT #TempTable
SELECT
idProduct,
Layers * rCount,
LayersOnPallet, 1
FROM #processing;
DROP TABLE #processing; -- cleanup
-- Final output
SELECT idProduct, Layers, LayersOnPallet, Summarized
FROM #TempTable;
Results:
idProduct Layers LayersOnPallet Summarized
----------- ----------- -------------- ----------
1 4 4 1
1 1 4 0
2 2 4 0

Is Last Observation Carried Forward (LOCF) implemented in PostgreSQL?

Is the data imputation method Last Observation Carried Forward (LOCF) implemented in PostgreSQL?
If not, how could I implement this method?
The following code assumes a table tbl with columns a, b (keys), t (time) and v (value to locf impute):
create or replace function locf_s(a float, b float)
returns float
language sql
as '
select coalesce(b, a)
';
drop aggregate if exists locf(float);
CREATE AGGREGATE locf(FLOAT) (
SFUNC = locf_s,
STYPE = FLOAT
);
select a,b,t,v,
locf(v) over (PARTITION by a,b ORDER by t) as v_locf
from tbl
order by a,b,t
;
(SQLFiddle)
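A small, hypothetical data set to try the aggregate on (the table and values are assumptions matching the column layout described above):
CREATE TABLE tbl (a float, b float, t int, v float);
INSERT INTO tbl VALUES
    (1, 1, 1, 10), (1, 1, 2, NULL), (1, 1, 3, 30), (1, 1, 4, NULL);
-- v_locf for this partition reads 10, 10, 30, 30: each NULL repeats the last non-null v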
For a tutorial: "LOCF and Linear Imputation with PostgreSQL"
I based this table and data directly on the table in the linked article.
create table test (
unit integer not null
check (unit >= 1),
obs_time integer not null
check (obs_time >= 1),
obs_value numeric(5, 1),
primary key (unit, obs_time)
);
insert into test values
(1, 1, 3.8), (1, 2, 3.1), (1, 3, 2.0),
(2, 1, 4.1), (2, 2, 3.5), (2, 3, 3.8), (2, 4, 2.4), (2, 5, 2.8), (2, 6, 3.0),
(3, 1, 2.7), (3, 2, 2.4), (3, 3, 2.9), (3, 4, 3.5);
For the six observations in the linked article we need all the possible combinations of "unit" and "obs_time".
select distinct unit, times.obs_time
from test
cross join (select generate_series(1, 6) obs_time) times;
unit obs_time
--
1 1
1 2
1 3
1 4
1 5
1 6
2 1
. . .
3 6
We also need to know which row has the last observed value in it for each unit.
select unit, max(obs_time) obs_time
from test
group by unit
order by unit;
unit obs_time
--
1 3
2 6
3 4
Knowing those two sets, we can join and coalesce to get the last observation and carry it forward.
with unit_times as (
select distinct unit, times.obs_time
from test
cross join (select generate_series(1, 6) obs_time) times
), last_obs_time as (
select unit, max(obs_time) obs_time
from test
group by unit
)
select t1.unit, t1.obs_time,
coalesce(t2.obs_value, (select obs_value
from test
inner join last_obs_time
on test.unit = last_obs_time.unit
and test.obs_time = last_obs_time.obs_time
where test.unit = t1.unit)) obs_value
from unit_times t1
left join test t2
on t1.unit = t2.unit and t1.obs_time = t2.obs_time
order by t1.unit, t1.obs_time;
unit obs_time obs_value
--
1 1 3.8
1 2 3.1
1 3 2.0
1 4 2.0
1 5 2.0
1 6 2.0
2 1 4.1
. . .
3 4 3.5
3 5 3.5
3 6 3.5
To get the same visual output as the linked article shows, use the crosstab() function in the tablefunc module. You could also do that manipulation with application code.
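A sketch of that crosstab() pivot, assuming the LOCF query above has been saved as a view named locf_result with columns unit, obs_time, obs_value (the view name is an assumption):
CREATE EXTENSION IF NOT EXISTS tablefunc;

SELECT *
FROM crosstab(
    'SELECT unit, obs_time, obs_value FROM locf_result ORDER BY 1, 2',
    'SELECT generate_series(1, 6)'
) AS ct (unit int, t1 numeric, t2 numeric, t3 numeric,
         t4 numeric, t5 numeric, t6 numeric);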

Hierarchical query rollup

I have the following table:
parent_id child_id child_class
1 2 1
1 3 1
1 4 2
2 5 2
2 6 2
Parent_id represents a folder id. Child id represents either a child folder (where child_class=1) or child file (where child_class=2).
I'd like to get a bottom-up rollup counter of files only (child_class=2) in the following way: for example, if C is a leaf folder (no child folders) with 5 files, and B is a parent folder of C that has 4 files of its own, the counter on C should say 5 and the counter on B should say 9 (5 from C plus 4 files in B), and so forth recursively going bottom up, taking sibling folders into consideration.
In the example above I expect the results below (notice 3 is a child folder with no files in it):
parent_id FilesCounter
3 0
2 2
1 3
I'd prefer a SQL query for performance, but a function is also possible.
I tried mixing a hierarchical query with ROLLUP (SQL Server 2008 R2) with no success so far.
Please advise.
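To make the answers below easy to reproduce, the sample rows from the question can be loaded like this (the DDL is an assumption; the answers target SQL Server):
CREATE TABLE tbl (parent_id int, child_id int, child_class int);
INSERT INTO tbl (parent_id, child_id, child_class) VALUES
    (1, 2, 1), (1, 3, 1), (1, 4, 2),
    (2, 5, 2), (2, 6, 2);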
This CTE should do the trick... Here is the SQLFiddle.
SELECT parent_id, child_id, child_class,
(SELECT COUNT(*) FROM tbl a WHERE a.parent_id = e.parent_id AND child_class <> 1) AS child_count
INTO tbl2
FROM tbl e
;WITH CTE (parent_id, child_id, child_class, child_count)
AS
(
-- Start with leaf nodes
SELECT parent_id, child_id, child_class, child_count
FROM tbl2
WHERE child_id NOT IN (SELECT parent_id from tbl)
UNION ALL
-- Recursively go up the chain
SELECT e.parent_id, e.child_id, e.child_class, e.child_count + d.child_count
FROM tbl2 e
INNER JOIN CTE AS d
ON e.child_id = d.parent_id
)
-- Statement that executes the CTE
SELECT FOLDERS.parent_id, max(ISNULL(child_count,0)) FilesCounter
FROM (SELECT parent_id FROM tbl2 WHERE parent_id NOT IN (select child_id from tbl2)
UNION
SELECT child_id FROM tbl2 WHERE child_class = 1) FOLDERS
LEFT JOIN CTE ON FOLDERS.parent_id = CTE.parent_id
GROUP BY FOLDERS.parent_id
Zak's answer was close, but the root folder did not roll up correctly. The following does the job:
with par_child as (
select 1 as parent_id, 2 as child_id, 1 as child_class
union all select 1, 3, 1
union all select 1, 4, 2
union all select 2, 5, 1
union all select 2, 6, 2
union all select 2, 10, 2
union all select 3, 11, 2
union all select 3, 7 , 2
union all select 5, 8 , 2
union all select 5, 9 , 2
union all select 5, 12, 1
union all select 5, 13, 1
)
, child_cnt as
(
select parent_id as root_parent_id, parent_id, child_id, child_class, 1 as lvl from par_child union all
select cc.root_parent_id, pc.parent_id, pc.child_id, pc.child_class, cc.lvl + 1 as lvl from
par_child pc join child_cnt cc on (pc.parent_id=cc.child_id)
),
distinct_folders as (
select distinct child_id as folder_id from par_child where child_class=1
)
select root_parent_id, count(child_id) as cnt from child_cnt where child_class=2 group by root_parent_id
union all
select folder_id, 0 from distinct_folders df where not exists (select 1 from par_child pc where df.folder_id=pc.parent_id)