Query table with multiple joined values - postgresql

I've created a query that joins six tables:
SELECT a.accession, b.value, c.name, d.description, e.value, f.seqlen, f.residues
FROM chado.dbxref a inner join chado.dbxrefprop b on a.dbxref_id = b.dbxref_id
inner join chado.biomaterial d on b.dbxref_id = d.dbxref_id
inner join chado.feature f on d.dbxref_id = f.dbxref_id
inner join chado.biomaterialprop e on d.biomaterial_id = e.biomaterial_id
inner join chado.contact c on d.biosourceprovider_id = c.contact_id;
The output:
I'm currently working with a PostgreSQL schema called Chado (http://gmod.org/wiki/Chado_Tables). My attempts to comply with the preexisting schema have led me to deposit multiple joined values within the same table (two different values within the dbxrefprop table, three different values within the biomaterialprop table). Querying the database results in a substantial amount of redundant output. Is there a way for me to reduce output redundancy by modifying my query statement? Ideally, I'd like the output to resemble the following:
test001 | GB0101 | source011 | Faaberg,K.; Lyoo,K.; Korol,D.M. | serum | T1 | Iowa, USA | 01 Jan 2005 | 1234 | AUGAACGCCUUGCAUUACUAUGACUAUGAUU

Working query statement:
SELECT a.accession, string_agg(distinct b.value, ' | ' ORDER BY b.value) AS bvalue_list, c.name, d.description, string_agg(distinct e.value, ' | ' ORDER BY e.value) AS evalue_list, f.seqlen, f.residues
FROM chado.dbxref a INNER JOIN chado.dbxrefprop b ON a.dbxref_id = b.dbxref_id
INNER JOIN chado.biomaterial d ON b.dbxref_id = d.dbxref_id
INNER JOIN chado.feature f ON d.dbxref_id = f.dbxref_id
INNER JOIN chado.biomaterialprop e ON d.biomaterial_id = e.biomaterial_id
INNER JOIN chado.contact c ON d.biosourceprovider_id = c.contact_id
GROUP BY a.accession, c.name, d.description, f.seqlen, f.residues;
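An alternative sketch, not part of the original post: aggregate each property table in its own derived table before joining, so the two sets of property rows never multiply each other and DISTINCT is not needed. It assumes that each dbxref_id and each biomaterial_id should produce one list, and that duplicate values within a property table do not need to be collapsed. The aliases p, bvalue_list and evalue_list are only illustrative.

```sql
SELECT a.accession, b.bvalue_list, c.name, d.description,
       e.evalue_list, f.seqlen, f.residues
FROM chado.dbxref a
INNER JOIN (
    -- one row per dbxref_id, with all of its property values in one string
    SELECT p.dbxref_id,
           string_agg(p.value, ' | ' ORDER BY p.value) AS bvalue_list
    FROM chado.dbxrefprop p
    GROUP BY p.dbxref_id
) b ON a.dbxref_id = b.dbxref_id
INNER JOIN chado.biomaterial d ON a.dbxref_id = d.dbxref_id
INNER JOIN chado.feature f ON d.dbxref_id = f.dbxref_id
INNER JOIN (
    -- one row per biomaterial_id, with all of its property values in one string
    SELECT p.biomaterial_id,
           string_agg(p.value, ' | ' ORDER BY p.value) AS evalue_list
    FROM chado.biomaterialprop p
    GROUP BY p.biomaterial_id
) e ON d.biomaterial_id = e.biomaterial_id
INNER JOIN chado.contact c ON d.biosourceprovider_id = c.contact_id;
```

Unlike the GROUP BY version, this returns one row per joined dbxref/biomaterial/feature combination, so two biomaterials that share the same accession, name and description would stay on separate rows.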

Related

Find a difference between 2 tables

I want to check that the poi_equipement table (a relationship table) matches the data in the data table, i.e. a two-way check.
https://dbfiddle.uk/gFMjbIpX
The check should detect that wc (in poi_equipement) is extra, because it is not present in the data table, and that hotel is missing from poi_equipement, because it appears only in the data table.
I don't understand why the EXCEPT query below returns only hotel.
I want it to return both hotel and wc.
select object from data where subject = 'url1'
except
select subject from poi_equipement inner join equipement on poi_equipement.equipement_id = equipement.id;
Ideally, I want to know whether a difference is in poi_equipement, in data, or in both tables.
A full outer join will do:
with params as (
select 'url1' as subject),
data_object as (
select d.object
from data d
join params prm
on d.subject = prm.subject),
equipment_subject as (
select e.subject
from poi_equipement pe
join poi p
on pe.poi_id = p.id
join equipement e
on pe.equipement_id = e.id
join params prm
on p.id_url = prm.subject)
select d.object as data,
e.subject as poi_equipment
from data_object d
full outer
join equipment_subject e
on d.object = e.subject
where d.object is null
or e.subject is null;
Result:
data |poi_equipment|
-----+-------------+
hotel| |
|wc |
You can remove the WHERE clause if you also need to see which items are in both places.
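On why the original attempt returned only hotel: EXCEPT is one-directional, so it keeps rows from the first query that are absent from the second and never reports rows that exist only in the second. A minimal sketch that runs EXCEPT in both directions, reusing the table and column names from the full outer join above (the output names item and found_in are only illustrative):

```sql
with data_object as (
    select d.object as item
    from data d
    where d.subject = 'url1'
),
equipment_subject as (
    select e.subject as item
    from poi_equipement pe
    join poi p on pe.poi_id = p.id
    join equipement e on pe.equipement_id = e.id
    where p.id_url = 'url1'
)
-- items present in data but not linked in poi_equipement
select item, 'only in data' as found_in
from (select item from data_object
      except
      select item from equipment_subject) missing_links
union all
-- items linked in poi_equipement but absent from data
select item, 'only in poi_equipement' as found_in
from (select item from equipment_subject
      except
      select item from data_object) extra_links;
```

With the sample data from the question this should report hotel as only in data and wc as only in poi_equipement, the same information as the full outer join but in one column with a label.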

SQL check value from another table

I've got multiple tables.
I wrote my query like this:
SELECT a.creation, b.caseno, c.instanceno
FROM TableB b
JOIN TableA a
ON a.caseno = b.caseno
JOIN TableC c
ON c.caseno = b.caseno
WHERE a.creation BETWEEN '2021-01-01' AND '2021-12-31'
I've also got TableD, which contains the following columns:
| InstanceNo | Position | Creation | TaskNo |
The idea is to add a new column (result) to my query.
If the instance from c.instanceno exists in TableD and taskno is 30 or 20, I would like d.creation, but only for the row with the max(position).
If not, NULL is enough for the result column.
SELECT a.creation, b.caseno, c.instanceno, d.creation
FROM TableB b
JOIN TableA a
ON a.caseno = b.caseno
JOIN TableC c
ON c.caseno = b.caseno
LEFT JOIN (SELECT MAX(position) position, instanceno, creation, taskno FROM TableD GROUP BY instanceno, creation, taskno) d
ON d.instanceno = c.instanceno
AND d.taskno in (20,30)
WHERE a.creation BETWEEN '2021-01-01' AND '2021-12-31'
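One thing to watch in the query above: the derived table groups by instanceno, creation and taskno, so it can return several rows per instance rather than only the row holding the highest position. A hedged sketch using ROW_NUMBER(), assuming the database supports window functions and that "max(position)" means the highest position among the rows with taskno 20 or 30 (the aliases td, rn and result are only illustrative):

```sql
SELECT a.creation, b.caseno, c.instanceno, d.creation AS result
FROM TableB b
JOIN TableA a
  ON a.caseno = b.caseno
JOIN TableC c
  ON c.caseno = b.caseno
LEFT JOIN (
    -- rank the qualifying TableD rows per instance, highest position first
    SELECT td.instanceno,
           td.creation,
           ROW_NUMBER() OVER (PARTITION BY td.instanceno
                              ORDER BY td.position DESC) AS rn
    FROM TableD td
    WHERE td.taskno IN (20, 30)
) d
  ON d.instanceno = c.instanceno
 AND d.rn = 1
WHERE a.creation BETWEEN '2021-01-01' AND '2021-12-31';
```

Because the filter on rn sits in the ON clause of the LEFT JOIN, instances without a matching TableD row are kept and get NULL in result.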

POSTGRESQL : Combining two query result with different columns but same number of rows

I'm currently trying to combine the results of two queries into one.
The two queries return the same number of rows and are grouped by the same field, but they have different columns.
This works:
SELECT Distinct
MAX(d.libelle) AS libelle_dpt,
MAX(d.code_dpt) AS code_dep,
MAX(r.libelle) AS libelle_region,
MAX(pd.pop_dep) AS nb_habitants,MAX(ls.nb_canton) AS nb_canton
FROM election_2015.commune c
LEFT JOIN (
SELECT MAX(c.code_canton) AS code_canton,Count(distinct c.code_canton) AS nb_canton
FROM election_2015.commune co
JOIN election_2015.departement d
ON co.code_dpt = d.code_dpt
JOIN election_2015.canton c
ON c.code_canton = co.code_canton
GROUP BY d.code_dpt
ORDER BY d.code_dpt ASC
) ls
ON ls.code_canton = c.code_canton
JOIN election_2015.departement d
ON d.code_dpt = c.code_dpt
JOIN election_2015.region r
ON d.code_region = r.code_region
JOIN election.popgent_all pd
ON pd.dep = d.code_dpt
GROUP BY d.code_dpt
But I was wondering if there is another way to do this, maybe something like a union, but row by row?
Something like this (not working because the queries don't have the same number of columns):
SELECT Distinct
MAX(d.libelle) AS libelle_dpt,
MAX(d.id) AS id_dep,MAX(d.code_dpt) AS code_dep,
MAX(r.libelle) AS libelle_region,
MAX(pd.pop_dep) AS nb_habitants
FROM election_2015.commune c
LEFT JOIN election_2015.departement d
ON d.code_dpt = c.code_dpt
LEFT JOIN election_2015.region r
ON d.code_region = r.code_region
LEFT JOIN election.popgent_all pd
ON pd.dep = d.code_dpt
GROUP BY d.code_dpt
UNION
SELECT Count(distinct c.code_canton) AS nb_canton FROM election_2015.commune co
JOIN election_2015.departement d
ON co.code_dpt = d.code_dpt
JOIN election_2015.canton c
ON c.code_canton = co.code_canton
GROUP BY d.code_dpt
ORDER BY election_2015.departement.code_dpt ASC
Thanks for any help.
Alexandre
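A possible approach, sketched under assumptions: UNION stacks result sets vertically, so it cannot place columns side by side. Two aggregated queries that share a grouping key can instead be written as derived tables and joined on that key. The sketch below reuses the tables and columns from the question and assumes code_dpt uniquely identifies a department; the aliases dep and ct are only illustrative.

```sql
SELECT dep.libelle_dpt, dep.code_dep, dep.libelle_region,
       dep.nb_habitants, ct.nb_canton
FROM (
    -- department-level attributes, one row per department
    SELECT d.code_dpt      AS code_dep,
           MAX(d.libelle)  AS libelle_dpt,
           MAX(r.libelle)  AS libelle_region,
           MAX(pd.pop_dep) AS nb_habitants
    FROM election_2015.departement d
    JOIN election_2015.region r ON d.code_region = r.code_region
    JOIN election.popgent_all pd ON pd.dep = d.code_dpt
    GROUP BY d.code_dpt
) dep
JOIN (
    -- number of distinct cantons, one row per department
    SELECT co.code_dpt,
           COUNT(DISTINCT c.code_canton) AS nb_canton
    FROM election_2015.commune co
    JOIN election_2015.canton c ON c.code_canton = co.code_canton
    GROUP BY co.code_dpt
) ct ON ct.code_dpt = dep.code_dep
ORDER BY dep.code_dep;
```

Joining on the key rather than relying on row order means the two halves line up even if one query returns its groups in a different order.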

Cascading sum hierarchy using recursive cte

I'm trying to write a recursive CTE in Postgres, but I can't wrap my head around it. Performance shouldn't be an issue, since there are only 50 items in TABLE 1.
TABLE 1 (expense):
id | parent_id | name
------------------------------
1 | null | A
2 | null | B
3 | 1 | C
4 | 1 | D
TABLE 2 (expense_amount):
ref_id | amount
-------------------------------
3 | 500
4 | 200
Expected Result:
id | name | amount
-------------------------------
1 | A | 700
2 | B | 0
3 | C | 500
4 | D | 200
Query
WITH RECURSIVE cte AS (
SELECT
expenses.id,
name,
parent_id,
expense_amount.total
FROM expenses
WHERE expenses.parent_id IS NULL
LEFT JOIN expense_amount ON expense_amount.expense_id = expenses.id
UNION ALL
SELECT
expenses.id,
expenses.name,
expenses.parent_id,
expense_amount.total
FROM cte
JOIN expenses ON expenses.parent_id = cte.id
LEFT JOIN expense_amount ON expense_amount.expense_id = expenses.id
)
SELECT
id,
SUM(amount)
FROM cte
GROUP BY 1
ORDER BY 1
Results
id | sum
--------------------
1 | null
2 | null
3 | 500
4 | 200
You can do a conditional sum() for only the root row:
with recursive tree as (
select id, parent_id, name, id as root_id
from expense
where parent_id is null
union all
select c.id, c.parent_id, c.name, p.root_id
from expense c
join tree p on c.parent_id = p.id
)
select e.id,
e.name,
e.root_id,
case
when e.id = e.root_id then sum(ea.amount) over (partition by root_id)
else amount
end as amount
from tree e
left join expense_amount ea on e.id = ea.ref_id
order by id;
I prefer doing the recursive part first, then join the related tables to the result of the recursive query, but you could do the join to the expense_amount also inside the CTE.
Online example: http://rextester.com/TGQUX53703
However, the above only aggregates on the top-level parent, not for any intermediate non-leaf rows.
If you want to see intermediate aggregates as well, this gets a bit more complicated (and is probably not very scalable for large results, but you said your tables aren't that big):
with recursive tree as (
select id, parent_id, name, 1 as level, concat('/', id) as path, null::numeric as amount
from expense
where parent_id is null
union all
select c.id, c.parent_id, c.name, p.level + 1, concat(p.path, '/', c.id), ea.amount
from expense c
join tree p on c.parent_id = p.id
left join expense_amount ea on ea.ref_id = c.id
)
select e.id,
lpad(' ', (e.level - 1) * 2, ' ')||e.name as name,
e.amount as element_amount,
(select sum(amount)
from tree t
where t.path = e.path or t.path like e.path||'/%') as sub_tree_amount,
e.path
from tree e
order by path;
Online example: http://rextester.com/MCE96740
The query builds up a path of all IDs belonging to a (sub)tree and then uses a scalar sub-select to get all child rows belonging to a node. That sub-select is what will make this quite slow as soon as the result of the recursive query can't be kept in memory.
I used the level column to create a "visual" display of the tree structure. This helps me debug the statement and understand the result better. If you need the real name of an element in your program, you would obviously use only e.name instead of prepending it with blanks.
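As a further sketch, not from the original answer: the subtree totals can also be computed without building path strings, by letting the recursive CTE produce every (ancestor, descendant) pair and then summing the amounts of all descendants per ancestor. It uses the expense and expense_amount tables as described in the question; the names closure, ancestor_id and descendant_id are only illustrative.

```sql
with recursive closure as (
    -- every row counts as its own ancestor
    select id as ancestor_id, id as descendant_id
    from expense
    union all
    -- extend each pair by one level: children of the current descendant
    select c.ancestor_id, e.id
    from closure c
    join expense e on e.parent_id = c.descendant_id
)
select e.id,
       e.name,
       coalesce(sum(ea.amount), 0) as amount
from expense e
join closure c on c.ancestor_id = e.id
left join expense_amount ea on ea.ref_id = c.descendant_id
group by e.id, e.name
order by e.id;
```

For the sample data this gives 700 for A, 0 for B, 500 for C and 200 for D. The coalesce() turns the missing total for B into 0, as in the expected result. Like the path-based query, it produces one row per ancestor/descendant pair, so it is best suited to small trees.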
I could not get your query to work for some reason. Here's my attempt that works for the particular table you provided (parent-child, no grandchild) without recursion. SQL Fiddle
--- step 1: get parent-child data together
with parent_child as(
select t.*, amount
from
(select e.id, f.name as name,
coalesce(f.name, e.name) as pname
from expense e
left join expense f
on e.parent_id = f.id) t
left join expense_amount ea
on ea.ref_id = t.id
)
--- final step is to group by id, name
select id, pname, sum(amount)
from
(-- step 2: group by parent name and find corresponding amount
-- returns A, B
select e.id, t.pname, t.amount
from expense e
join (select pname, sum(amount) as amount
from parent_child
group by 1) t
on t.pname = e.name
-- step 3: to get C, D we union and get corresponding columns
-- results in all rows and corresponding value
union
select id, name, amount
from expense e
left join expense_amount ea
on e.id = ea.ref_id
) t
group by 1, 2
order by 1;

Update using left join in netezza

I need to perform a left join of two tables in Netezza during an update. How can I achieve this? A left join with three tables works, but not with two tables.
UPDATE table_1
SET c2 = t2.c2
FROM
table_1 t1
LEFT JOIN table_2 t2
ON t1.c1=t2.c1
LEFT JOIN table_3 t3
ON t2.c1=t3.c1
This works, but
UPDATE table_1
SET c2 = t2.c2
FROM table_1 t1
LEFT JOIN table_2 t2
ON t1.c1=t2.c1
this one fails with an error saying something like it is trying to update multiple columns.
Thanks,
Manirathinam.
When performing an UPDATE TABLE with a join in Netezza, it's important to understand that the table being updated is always implicitly INNER JOINed with the FROM list. This behavior is documented here.
Your code is actually joining table_1 to itself (one copy with no alias, and one with t1 as an alias). Since there is no join criteria between those two versions of table_1, you are getting a cross join which is providing multiple rows that are trying to update table_1.
The best way to tackle an UPDATE with an OUTER join is to employ a subselect like this:
TESTDB.ADMIN(ADMIN)=> select * from table_1 order by c1;
C1 | C2
----+----
1 | 1
2 | 2
3 | 3
(3 rows)
TESTDB.ADMIN(ADMIN)=> select * from table_2 order by c1;
C1 | C2
----+----
1 | 10
3 | 30
(2 rows)
TESTDB.ADMIN(ADMIN)=> UPDATE table_1 t1
SET t1.c2 = foo.c2
FROM (
SELECT t1a.c1,
t2.c2
FROM table_1 t1a
LEFT JOIN table_2 t2
ON t1a.c1 = t2.c1
)
foo
WHERE t1.c1 = foo.c1;
UPDATE 3
TESTDB.ADMIN(ADMIN)=> select * from table_1 order by c1;
C1 | C2
----+----
1 | 10
2 |
3 | 30
(3 rows)