Choose "strongest" intersected area - postgresql

I have a materialized view which is the result of a spatial join using st_intersect of two polygon layers, table1 and table2. A feature of table1 can be intersected by several polygons of table2. This is how I create the mview:
SELECT g.field1,
att.ogc_fid,
st_intersection(g.geom, att.geom) AS intersect_geom,
st_area(g.geom) AS geom_area,
st_area(st_intersection(g.geom, att.geom)) AS intersect_area
FROM table1 g
JOIN table2 att ON g.geom && att.geom;
field1  | ogc_fid | intersect_geom | geom_area | intersect_area
aa12345 | 1       | 123123         | 123131    | 1313123414
aa12345 | 3       | 1              | 1         | 1
bb12345 | 2       | 4124141        | 13141     | 14415151
bb12345 | 1       | 1243141414     | 1231313   | 13131323
From this mview I want to pick just the strongest intersected area and join it to a description coming from table2. I have tried the code below:
select a.*, b.desc
from table1 a
left join lateral
(
select desc
table2
where table2.ogc_fid= table1.ogc_fid
order by (intersect_area/geom_area) DESC NULLS LAST
limit 1
) b
field1  | ogc_fid | intersect_geom | geom_area | intersect_area | desc
aa12345 | 1       | 123123         | 123131    | 1313123414     | desc for 1
bb12345 | 2       | 4124141        | 13141     | 14415151       | desc for 2
But the results here are not the expected ones. I went through other threads, but I'm stuck on getting just one result (the strongest) and creating a table of those strongest intersections, so that each feature in table1 has only its strongest intersection.

If I understood you right, you have done the hard bit already. You just need to pick the one record per field from the view and join with table2... So try this:
SELECT DISTINCT ON (field1) field1, m.ogc_fid, b.desc FROM
mview AS m
INNER JOIN table2 AS b ON b.ogc_fid = m.ogc_fid
ORDER BY field1, (intersect_area/geom_area) DESC
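To see the "one strongest row per field1" idea run end to end, here is a minimal sketch using Python's built-in sqlite3 module. SQLite has no DISTINCT ON, so the sketch swaps in ROW_NUMBER() with PARTITION BY, which picks the same row per group (it assumes SQLite 3.25 or newer for window functions). The table contents, the plain numeric area columns, and the column name descr (used instead of desc) are made up for illustration.

```python
import sqlite3

# Hypothetical stand-in for the materialized view; areas are made-up numbers.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mview (field1 TEXT, ogc_fid INTEGER, geom_area REAL, intersect_area REAL);
CREATE TABLE table2 (ogc_fid INTEGER PRIMARY KEY, descr TEXT);
INSERT INTO mview VALUES
  ('aa12345', 1, 100.0, 80.0),
  ('aa12345', 3, 100.0, 20.0),
  ('bb12345', 2,  50.0, 10.0),
  ('bb12345', 1,  50.0, 40.0);
INSERT INTO table2 VALUES (1, 'desc for 1'), (2, 'desc for 2'), (3, 'desc for 3');
""")

# Rank the intersections of each feature by covered share, keep rank 1 only.
strongest = conn.execute("""
SELECT field1, ogc_fid, descr
FROM (
    SELECT m.field1, m.ogc_fid, b.descr,
           ROW_NUMBER() OVER (
               PARTITION BY m.field1
               ORDER BY m.intersect_area / m.geom_area DESC
           ) AS rn
    FROM mview AS m
    JOIN table2 AS b ON b.ogc_fid = m.ogc_fid
) AS ranked
WHERE rn = 1
ORDER BY field1
""").fetchall()

for row in strongest:
    print(row)
```

In PostgreSQL itself the DISTINCT ON form above is usually shorter; the window-function form is just the portable equivalent.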

Related

Anyway to UNION 3 columns with 2 in Spark SQL

spark.sql("""(SELECT DISTINCT game_id,winner as player_name from chess_wc_history_game_info WHERE winner!='draw' GROUP BY game_id,event,winner)
UNION
((SELECT game_id, player as player_name FROM chess_wc_history_moves WHERE black_queen_count=0 AND color='Black')
UNION
(SELECT game_id, player as player_name FROM chess_wc_history_moves WHERE white_queen_count=0 AND color='White'))""").show()
This outputs:
| game_id| player_name|
| -------------------|--------------------|
|61b784cc-cdab-496...| Morozevich,A|
|39a6b655-19d8-419...| Karpov, Anatoly|
|a744139e-aff8-4d3...| Tal, Mihail|
|e945781f-92a2-4fb...| Sargissian,G|
|f9307e55-3eff-477...| Adams,Mi|
|0230130d-ee51-4f9...| Barua, Dibyendu|
|3d34d86e-216e-41f...| Tiviakov, Sergei|
Expected Output:
| game_id| player_name|event |
| -------------------|--------------------|---------------|
|61b784cc-cdab-496...| Morozevich,A| Event names |
|39a6b655-19d8-419...| Karpov, Anatoly| |
|a744139e-aff8-4d3...| Tal, Mihail| |
|e945781f-92a2-4fb...| Sargissian,G| |
|f9307e55-3eff-477...| Adams,Mi| |
However, adding the event column, which is in the chess_wc_history_game_info table, makes the query invalid: the error says that UNION can only be performed on inputs with the same number of columns, whereas the first SELECT has 3 columns and the second has 2. Is there any way I can SELECT all 3 at once without changing the results? (game_id, event, winner)
From the names of your tables, it seems that you can get the event name from joining to the chess_wc_history_game_info table. See the SQL below for a solution based on this.
However, it also seems that the first SELECT statement will have data that overlaps with the second two SELECT statements. Do you need to UNION any data at all, or can you just select from chess_wc_history_game_info and ignore the other table?
This approach uses a LEFT join so that if no matching event info is found, the event will just be null.
spark.sql(
"""(SELECT DISTINCT game_id,winner as player_name, event from chess_wc_history_game_info WHERE winner!='draw' GROUP BY game_id,event,winner)
UNION
((SELECT m.game_id, player as player_name, event FROM chess_wc_history_moves m LEFT JOIN chess_wc_history_game_info i on m.game_id = i.game_id WHERE black_queen_count=0 AND color='Black')
UNION
(SELECT m.game_id, player as player_name, event FROM chess_wc_history_moves m LEFT JOIN chess_wc_history_game_info i on m.game_id = i.game_id WHERE white_queen_count=0 AND color='White'))"""
).show()
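The same shape can be tried without a Spark cluster. The sketch below uses Python's sqlite3 module with shortened, hypothetical table names (game_info, moves) and made-up rows; it also folds the two moves queries into one WHERE with OR, which gives the same rows because UNION removes duplicates anyway. The point it shows is that every branch of the UNION returns three columns, with the LEFT JOIN supplying event (or NULL when there is no match).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE game_info (game_id TEXT, event TEXT, winner TEXT);
CREATE TABLE moves (game_id TEXT, player TEXT, color TEXT,
                    black_queen_count INTEGER, white_queen_count INTEGER);
INSERT INTO game_info VALUES ('g1', 'Event 1', 'Player A'), ('g2', 'Event 2', 'draw');
INSERT INTO moves VALUES
  ('g2', 'Player B', 'Black', 0, 1),
  ('g3', 'Player C', 'White', 1, 0);  -- g3 has no row in game_info
""")

rows = conn.execute("""
SELECT game_id, winner AS player_name, event
FROM game_info
WHERE winner != 'draw'
UNION
SELECT m.game_id, m.player, i.event
FROM moves AS m
LEFT JOIN game_info AS i ON i.game_id = m.game_id
WHERE (m.black_queen_count = 0 AND m.color = 'Black')
   OR (m.white_queen_count = 0 AND m.color = 'White')
ORDER BY 1
""").fetchall()

for row in rows:
    print(row)
```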

T-SQL select all IDs that have value A and B

I'm trying to find all IDs in TableA that are mentioned by a set of records in TableB, where that set is defined in TableC. I've got as far as a set of INNER JOINs that gives me the following result:
TableA.ID | TableB.Code
-----------------------
1 | A
1 | B
2 | A
3 | B
I want to select only the ID where in this case there is an entry for both A and B, but where the values A and B are based on another Query.
I figured this should be possible with a GROUP BY TableA.ID and HAVING = ALL(Subquery on table C).
But that is returning no values.
Since you did not post your original query, I will assume it is inside a CTE. Assuming this, the query you want is something along these lines:
SELECT ID
FROM cte
WHERE Code IN ('A', 'B')
GROUP BY ID
HAVING COUNT(DISTINCT Code) = 2;
It's an extremely poor question, but you probably need to compare distinct counts against table C:
SELECT a.ID
FROM TableA a
GROUP BY a.ID
HAVING COUNT(DISTINCT a.Code) = (SELECT COUNT(*) FROM TableC)
We're guessing though.
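To make the guess concrete, here is a small sketch in Python with sqlite3 of the "count the distinct matches and compare with the size of the required set" pattern. The table joined stands in for the question's INNER JOIN result and table_c for the set of required codes; both names and their single-column layout are assumptions, since the original schema was not posted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE joined (id INTEGER, code TEXT);   -- stands in for the INNER JOIN result
CREATE TABLE table_c (code TEXT PRIMARY KEY);  -- the required set of codes
INSERT INTO joined VALUES (1, 'A'), (1, 'B'), (2, 'A'), (3, 'B');
INSERT INTO table_c VALUES ('A'), ('B');
""")

# Keep only codes that are in the required set, then demand that an ID
# has matched as many distinct codes as the set contains.
ids = [row[0] for row in conn.execute("""
SELECT j.id
FROM joined AS j
JOIN table_c AS c ON c.code = j.code
GROUP BY j.id
HAVING COUNT(DISTINCT j.code) = (SELECT COUNT(*) FROM table_c)
ORDER BY j.id
""")]

print(ids)
```

Because the required codes come from table_c rather than a hard-coded list, the same query keeps working when the set changes.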

Cascading sum hierarchy using recursive cte

I'm trying to write a recursive CTE in Postgres but I can't wrap my head around it. Performance shouldn't be an issue, since there are only 50 items in TABLE 1.
TABLE 1 (expense):
id | parent_id | name
------------------------------
1 | null | A
2 | null | B
3 | 1 | C
4 | 1 | D
TABLE 2 (expense_amount):
ref_id | amount
-------------------------------
3 | 500
4 | 200
Expected Result:
id, name, amount
-------------------------------
1 | A | 700
2 | B | 0
3 | C | 500
4 | D | 200
Query
WITH RECURSIVE cte AS (
SELECT
expenses.id,
name,
parent_id,
expense_amount.total
FROM expenses
WHERE expenses.parent_id IS NULL
LEFT JOIN expense_amount ON expense_amount.expense_id = expenses.id
UNION ALL
SELECT
expenses.id,
expenses.name,
expenses.parent_id,
expense_amount.total
FROM cte
JOIN expenses ON expenses.parent_id = cte.id
LEFT JOIN expense_amount ON expense_amount.expense_id = expenses.id
)
SELECT
id,
SUM(amount)
FROM cte
GROUP BY 1
ORDER BY 1
Results
id | sum
--------------------
1 | null
2 | null
3 | 500
4 | 200
You can do a conditional sum() for only the root row:
with recursive tree as (
select id, parent_id, name, id as root_id
from expense
where parent_id is null
union all
select c.id, c.parent_id, c.name, p.root_id
from expense c
join tree p on c.parent_id = p.id
)
select e.id,
e.name,
e.root_id,
case
when e.id = e.root_id then sum(ea.amount) over (partition by root_id)
else amount
end as amount
from tree e
left join expense_amount ea on e.id = ea.ref_id
order by id;
I prefer doing the recursive part first, then join the related tables to the result of the recursive query, but you could do the join to the expense_amount also inside the CTE.
Online example: http://rextester.com/TGQUX53703
However, the above only aggregates on the top-level parent, not for any intermediate non-leaf rows.
If you want to see intermediate aggregates as well, this gets a bit more complicated (and is probably not very scalable for large results, but you said your tables aren't that big):
with recursive tree as (
select id, parent_id, name, 1 as level, concat('/', id) as path, null::numeric as amount
from expense
where parent_id is null
union all
select c.id, c.parent_id, c.name, p.level + 1, concat(p.path, '/', c.id), ea.amount
from expense c
join tree p on c.parent_id = p.id
left join expense_amount ea on ea.ref_id = c.id
)
select e.id,
lpad(' ', (e.level - 1) * 2, ' ')||e.name as name,
e.amount as element_amount,
(select sum(amount)
from tree t
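-- match the node itself or its descendants; the '/' separator keeps /1 from also matching /10
where t.path = e.path or t.path like e.path||'/%') as sub_tree_amount,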
e.path
from tree e
order by path;
Online example: http://rextester.com/MCE96740
The query builds up a path of all IDs belonging to a (sub)tree and then uses a scalar sub-select to get all child rows belonging to a node. That sub-select is what will make this quite slow as soon as the result of the recursive query can't be kept in memory.
I used the level column to create a "visual" display of the tree structure; this helps me debug the statement and understand the result better. If you need the real name of an element in your program you would obviously use just e.name instead of prepending blanks to it.
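The path-based subtree sum can also be tried locally. This is a sketch in Python with sqlite3 (which supports WITH RECURSIVE), using the sample rows from the question. Two details are my own choices, not part of the queries above: each path ends with a trailing '/', so that a prefix match on '/1/' cannot accidentally pick up '/10/', and COALESCE turns a missing sum into 0 to match the expected result for B.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE expense (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT);
CREATE TABLE expense_amount (ref_id INTEGER, amount INTEGER);
INSERT INTO expense VALUES (1, NULL, 'A'), (2, NULL, 'B'), (3, 1, 'C'), (4, 1, 'D');
INSERT INTO expense_amount VALUES (3, 500), (4, 200);
""")

totals = conn.execute("""
WITH RECURSIVE tree(id, name, path) AS (
    -- root rows start a path such as '/1/'
    SELECT id, name, '/' || id || '/'
    FROM expense
    WHERE parent_id IS NULL
    UNION ALL
    -- each child appends its own id to the parent's path
    SELECT c.id, c.name, p.path || c.id || '/'
    FROM expense AS c
    JOIN tree AS p ON c.parent_id = p.id
)
SELECT e.id, e.name,
       -- sum the amounts of every row whose path starts with this row's path
       (SELECT COALESCE(SUM(ea.amount), 0)
        FROM tree AS t
        LEFT JOIN expense_amount AS ea ON ea.ref_id = t.id
        WHERE t.path LIKE e.path || '%') AS amount
FROM tree AS e
ORDER BY e.id
""").fetchall()

for row in totals:
    print(row)
```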
I could not get your query to work for some reason. Here's my attempt, which works without recursion for the particular table you provided (parent-child, no grandchildren).
--- step 1: get parent-child data together
with parent_child as(
select t.*, amount
from
(select e.id, f.name as name,
coalesce(f.name, e.name) as pname
from expense e
left join expense f
on e.parent_id = f.id) t
left join expense_amount ea
on ea.ref_id = t.id
)
--- final step is to group by id, name
select id, pname, sum(amount)
from
(-- step 2: group by parent name and find corresponding amount
-- returns A, B
select e.id, t.pname, t.amount
from expense e
join (select pname, sum(amount) as amount
from parent_child
group by 1) t
on t.pname = e.name
-- step 3: to get C, D we union and get corresponding columns
-- results in all rows and corresponding value
union
select id, name, amount
from expense e
left join expense_amount ea
on e.id = ea.ref_id
) t
group by 1, 2
order by 1;

Join on a query returns more than one row

I have a query
SELECT id_anything FROM table1 JOIN table2 USING (id_tables)
Now I have the following situation:
If the join returns two rows from table2, I want to show the id_anything from table1 (1 row only),
and if the join returns 1 row from table2, I want to show id_anything from table2.
PS: id_anything in table2 holds different values.
Example data:
table1
id_tables | id_anything
1 | 1
table2
id_tables | id_anything
1 | 10
1 | 100
Return expected: 1
First, bring the values you may want to return, and the basis for deciding which one to return, together into one row.
SELECT table1.id_tables, table1.id_anything AS table1_id, MIN(table2.id_anything) AS table2_id, COUNT(*)
FROM table1 JOIN table2 USING (id_tables)
GROUP BY table1.id_tables, table1.id_anything
The aggregate function you use doesn't really matter since you'll only be using the value if there is only one.
You can then pick the relevant value:
WITH join_summary AS (
SELECT table1.id_tables, table1.id_anything AS table1_id, MIN(table2.id_anything) AS table2_id, COUNT(*) AS match_count
FROM table1 JOIN table2 USING (id_tables)
GROUP BY table1.id_tables, table1.id_anything
)
SELECT id_tables, CASE WHEN (match_count > 1) THEN table1_id ELSE table2_id END AS id_anything
FROM join_summary
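A quick way to check the CASE logic is to run it on the example data. The sketch below uses Python's sqlite3 module; the rows for id_tables = 2 are an extra, made-up case added to show the single-match branch, and the join is written with ON rather than USING only to keep the column references explicit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id_tables INTEGER, id_anything INTEGER);
CREATE TABLE table2 (id_tables INTEGER, id_anything INTEGER);
INSERT INTO table1 VALUES (1, 1), (2, 2);
INSERT INTO table2 VALUES (1, 10), (1, 100), (2, 20);  -- id 2 is a made-up single match
""")

rows = conn.execute("""
WITH join_summary AS (
    SELECT t1.id_tables,
           t1.id_anything      AS table1_id,
           MIN(t2.id_anything) AS table2_id,
           COUNT(*)            AS match_count
    FROM table1 AS t1
    JOIN table2 AS t2 ON t2.id_tables = t1.id_tables
    GROUP BY t1.id_tables, t1.id_anything
)
SELECT id_tables,
       CASE WHEN match_count > 1 THEN table1_id ELSE table2_id END AS id_anything
FROM join_summary
ORDER BY id_tables
""").fetchall()

for row in rows:
    print(row)
```

With two matches in table2, id_tables 1 falls back to the value from table1; with a single match, id_tables 2 takes the value from table2.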

Query to get row from one table, else random row from another

tblUserProfile - I have a table which holds all the Profile Info (too many fields)
tblMonthlyProfiles - Another table which has just the ProfileID in it (the idea is that this table holds 2 profileids which sometimes become monthly profiles (on selection))
Now when I need to show monthly profiles, I simply do a select from this tblMonthlyProfiles and Join with tblUserProfile to get all valid info.
If there are no rows in tblMonthlyProfile, then monthly profile section is not displayed.
Now the requirement is to ALWAYS show Monthly Profiles. If there are no rows in monthlyProfiles, it should pick up 2 random profiles from tblUserProfile. If there is only one row in monthlyProfiles, it should pick up only one random row from tblUserProfile.
What is the best way to do all this in one single query ?
I thought something like this
select top 2 * from tblUserProfile P
LEFT OUTER JOIN tblMonthlyProfiles M
on M.profileid = P.profileid
ORder by NEWID()
But this always gives me 2 random rows from tblProfile. How can I solve this ?
Try something like this:
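SELECT TOP 2 Field1, Field2, Field3, FinalOrder FROM
(
-- monthly profiles come first
select P.Field1, P.Field2, P.Field3, 1 AS FinalOrder from tblUserProfile P JOIN tblMonthlyProfiles M on M.profileid = P.profileid
UNION ALL
-- then up to 2 random non-monthly profiles; the derived table makes TOP and ORDER BY NEWID() apply to this branch only
select R.Field1, R.Field2, R.Field3, 2 AS FinalOrder from
(select top 2 P.Field1, P.Field2, P.Field3 from tblUserProfile P LEFT OUTER JOIN tblMonthlyProfiles M on M.profileid = P.profileid where M.profileid IS NULL ORDER BY NEWID()) AS R
) AS Candidates
ORDER BY FinalOrder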
The idea being to pick two monthly profiles (if that many exist) and then 2 random profiles (as you correctly did) and then UNION them. You'll have between 2 and 4 records at that point. Grab the top two. FinalOrder column is an easy way to make sure that you try and get the monthly's first.
If you have control of the table structure, you might save yourself some trouble by simply adding a boolean (bit) field IsMonthlyProfile to the UserProfile table. Then it's a single-table query: select the TOP 2 rows with ORDER BY IsMonthlyProfile DESC, NEWID().
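The "monthly first, random for the rest" ordering can be sketched without changing the table structure by deriving the flag from a LEFT JOIN. The example below is in Python with sqlite3, so it swaps in RANDOM() and LIMIT for T-SQL's NEWID() and TOP; the table names, column names, and the helper pick_profiles are hypothetical.

```python
import sqlite3

def pick_profiles(monthly_ids):
    """Return 2 (profile_id, is_monthly) rows: monthly ones first, then random fill."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE user_profile (profile_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE monthly_profile (profile_id INTEGER PRIMARY KEY);
    INSERT INTO user_profile VALUES (1,'p1'), (2,'p2'), (3,'p3'), (4,'p4'), (5,'p5');
    """)
    conn.executemany("INSERT INTO monthly_profile VALUES (?)",
                     [(i,) for i in monthly_ids])
    rows = conn.execute("""
    SELECT p.profile_id, m.profile_id IS NOT NULL AS is_monthly
    FROM user_profile AS p
    LEFT JOIN monthly_profile AS m ON m.profile_id = p.profile_id
    ORDER BY is_monthly DESC, RANDOM()  -- monthly rows sort first, the rest are shuffled
    LIMIT 2
    """).fetchall()
    conn.close()
    return rows

print(pick_profiles([]))      # two random profiles
print(pick_profiles([3]))     # profile 3 plus one random other profile
print(pick_profiles([2, 4]))  # both monthly profiles
```

Sorting on the flag before the random key is what fixes the original attempt, where ORDER BY NEWID() alone shuffled monthly and non-monthly rows together.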
In SQL 2000+ compliant syntax you could do something like:
Select ...
From (
Select TOP 2 ...
From tblUserProfile As UP
Where Not Exists( Select 1 From tblMonthlyProfile As MP1 )
Order By NewId()
) As RandomProfile
Union All
Select MP....
From tblUserProfile As UP
Join tblMonthlyProfile As MP
On MP.ProfileId = UP.ProfileId
Where ( Select Count(*) From tblMonthlyProfile As MP1 ) >= 1
Union All
Select ...
From (
Select TOP 1 ...
From tblUserProfile As UP
Where ( Select Count(*) From tblMonthlyProfile As MP1 ) = 1
Order By NewId()
) As RandomProfile
Using SQL 2005+ CTE you can do:
With
TwoRandomProfiles As
(
Select TOP 2 ..., ROW_NUMBER() OVER ( ORDER BY UP.ProfileID ) As Num
From tblUserProfile As UP
Order By NewId()
)
Select MP.Col1, ...
From tblUserProfile As UP
Join tblMonthlyProfile As MP
On MP.ProfileId = UP.ProfileId
Where ( Select Count(*) From tblMonthlyProfile As MP1 ) >= 1
Union All
Select ...
From TwoRandomProfiles
Where Not Exists( Select 1 From tblMonthlyProfile As MP1 )
Union All
Select ...
From TwoRandomProfiles
Where ( Select Count(*) From tblMonthlyProfile As MP1 ) = 1
And Num = 1
The CTE has the advantage of only querying for the random profiles once and the use of the ROW_NUMBER() column.
Obviously, in all the UNION statements the number and type of the columns must match.