PostgreSQL query not returning the proper results

This is my table structure:
learning_paths: id, name, version, created_at, updated_at
learning_path_levels: id, name, learning_path_id, order, created_at, updated_at
learning_path_level_nodes: id, name, description, documentation_links, evaluation_methodology, learning_path_level_id, created_at, updated_at
learning_path_node_users: id, learning_path_level_node_id, user_id, evaluated_by, evaluated_at, is_successful, created_at, updated_at
I'm writing a query to retrieve the learning path name, the number of levels each learning path has, the pending and completed nodes per level for the user, and the total number of nodes per level.
I have the following query:
select learning_paths."name",
sum(case when learning_path_node_users.is_successful and learning_path_node_users.user_id is not null then 1 else 0 end) as completed_nodes,
sum(case when learning_path_node_users.is_successful = false or learning_path_node_users.user_id is null then 1 else 0 end) as pending_nodes,
count(learning_path_levels.id) as total_levels,
count(*) as total_nodes
from learning_path_level_nodes
inner join learning_path_levels on learning_path_levels.id = learning_path_level_nodes.learning_path_level_id
inner join learning_paths on learning_paths.id = learning_path_levels.learning_path_id
left join learning_path_node_users on learning_path_node_users.learning_path_level_node_id = learning_path_level_nodes.id
group by learning_paths."name"
which returns:
name            completed_nodes pending_nodes total_levels total_nodes
Devops          5               3             8            8
QA              0               1             1            1
Project manager 3               3             6            6
AI              0               5             5            5
Everything is correct except the level count. For example, for Devops it should be 2 but the query returns 8, and for Project manager it should be 2 but it returns 6.
The pattern I see is that it returns the number of nodes as the number of levels.
How can I fix this?
I'd really appreciate any help or suggestions, as I've been struggling with this.
Thanks in advance.
EDIT: As per your suggestion, I'm attaching a fiddle with the tables and data.
https://dbfiddle.uk/?rdbms=postgres_14&fiddle=f29676ff7051686a28de96928db1e3a6

While I don't get the exact results you want, I think you want to add a distinct to your count for the total levels:
select
lp.name,
sum(case when u.is_successful and u.user_id is not null then 1 else 0 end) as completed_nodes,
sum(case when u.is_successful = false or u.user_id is null then 1 else 0 end) as pending_nodes,
count(distinct lpl.id) as total_levels, -- added "distinct"
array_agg (lpl.id) as level_detail, -- debugging aid
count(*) as total_nodes
from
learning_path_level_nodes n
join learning_path_levels lpl on lpl.id = n.learning_path_level_id
join learning_paths lp on lp.id = lpl.learning_path_id
left join learning_path_node_users u on u.learning_path_level_node_id = n.id
group by
lp.name
To help expose the rationale, I added the field level_detail, which lists the level id behind every joined row. Each level id appears once per node, which is why a plain count matches the node count. You can remove that field once the results are what you want.
If it's not what you expect, perhaps you can explain, or show by example, what I might be missing.
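The difference between count(column) and count(distinct column) after a join can be checked on a tiny data set. The following is a minimal sketch, not the asker's actual data: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, and the rows (one learning path, two levels, three nodes) are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table learning_path_levels (id integer primary key, learning_path_id integer);
    create table learning_path_level_nodes (id integer primary key, learning_path_level_id integer);
    -- hypothetical data: one learning path, two levels, three nodes
    insert into learning_path_levels values (1, 1), (2, 1);
    insert into learning_path_level_nodes values (10, 1), (11, 1), (12, 2);
""")

# The join yields one row per node, so count(lpl.id) counts nodes, not levels.
row = conn.execute("""
    select count(lpl.id)          as levels_per_row,
           count(distinct lpl.id) as distinct_levels,
           count(*)               as total_nodes
    from learning_path_level_nodes n
    join learning_path_levels lpl on lpl.id = n.learning_path_level_id
""").fetchone()

print(row)  # (levels_per_row, distinct_levels, total_nodes)
```

Only the distinct count reflects the two levels; the plain count follows the number of joined rows.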

Related

How to include and exclude ids in one query in PostgreSQL

I use PostgreSQL 13.3.
I'm trying to work out how to include and exclude ids in the same query.
I have include_system_ids [1,5] and exclude_system_ids [3].
There's one big table, records, and a system_records table:
record system_id
1 1
1 5
1 3
2 1
2 5
If a record has an excluded identifier, it should not appear in the final selection. I made several attempts but didn't get the result I need.
Expected result: record with id 2
Actual result: 1, 2
My attempt:
select r.id from records r
left join (select record_id from system_records
where system_id in (1,5)
) include_ids on r.id = include_ids
left join (select record_id from system_records
where system_id not in (3)
) exclude_ids on r.id = exclude_ids.id
Honestly, I don't understand how to do this.
Can anyone help?
Maybe this query could be a solution:
-- "order by" inside string_agg keeps the aggregated string deterministic
with x as (select record, string_agg(system_id::varchar, ',' order by system_id) as sys_id from records group by record)
select records.*
from records,x
where records.record = x.record
and x.sys_id = '1,5'
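Comparing the aggregated string to '1,5' only matches records whose systems are exactly 1 and 5. A different technique, group by with conditional sums in having, handles the include and exclude lists separately. This is a sketch under assumptions: a record qualifies if it has at least one included system and no excluded one, the table is called system_records with the columns shown in the question, and Python's built-in sqlite3 stands in for PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table system_records (record integer, system_id integer);
    -- sample rows from the question
    insert into system_records values (1, 1), (1, 5), (1, 3), (2, 1), (2, 5);
""")

rows = conn.execute("""
    select record
    from system_records
    group by record
    -- at least one included system ...
    having sum(case when system_id in (1, 5) then 1 else 0 end) > 0
    -- ... and no excluded system
       and sum(case when system_id in (3) then 1 else 0 end) = 0
    order by record
""").fetchall()

print(rows)  # only record 2 remains, because record 1 also has system 3
```

If every included system must be present rather than any one of them, the first condition would need to count distinct included ids instead.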

Postgres distinct query in two columns

I want to write a Postgres query. For every distinct combination of career_id and uid, I should return the entire row that has the maximum time.
This is the sample data
id time career_id uid content
1 100 10000 5 Abc
2 300 6 7 xyz
3 200 10000 5 wxv
4 150 6 7 hgr
Ans:
id time career_id uid content
2 300 6 7 xyz
3 200 10000 5 wxv
This can be done using distinct on () in Postgres:
select distinct on (career_id, uid) *
from the_table
order by career_id, uid, "time" desc;
You can use CTEs for this. Something like this should work:
WITH cte_max_value AS (
SELECT
career_id,
uid,
max("time") as max_time
FROM mytable
GROUP BY career_id, uid
)
SELECT DISTINCT t.*
FROM mytable AS t
INNER JOIN cte_max_value AS cmv
ON t.uid = cmv.uid AND t.career_id = cmv.career_id AND t.time = cmv.max_time
The CTE gives you all the unique combinations of career_id and uid with the relevant maximum time; the inner join then joins the entire rows to this. Note that if two rows share the same maximum time for the same combination of career_id and uid, both rows are returned.
If you don't want that, you will need a strategy to break the tie.
Edit: The solution proposed by a_hrose_with_name is far nicer, and unless you need some level of compatibility with other servers (sadly, syntax varies) you should use that instead.
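The CTE approach and its tie behaviour can be tried on the sample data from the question. This is a minimal sketch that uses Python's built-in sqlite3 as a stand-in for PostgreSQL (SQLite has no distinct on, so only the portable CTE variant is shown); the extra row with id 5 is hypothetical and exists only to produce a tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table mytable (id integer, "time" integer, career_id integer, uid integer, content text);
    insert into mytable values
        (1, 100, 10000, 5, 'Abc'),
        (2, 300, 6, 7, 'xyz'),
        (3, 200, 10000, 5, 'wxv'),
        (4, 150, 6, 7, 'hgr');
""")

query = """
    with cte_max_value as (
        select career_id, uid, max("time") as max_time
        from mytable
        group by career_id, uid
    )
    select t.*
    from mytable as t
    join cte_max_value as cmv
      on t.uid = cmv.uid and t.career_id = cmv.career_id and t."time" = cmv.max_time
    order by t.id
"""
rows = conn.execute(query).fetchall()
print(rows)  # one row per (career_id, uid): ids 2 and 3

# Hypothetical extra row that ties with id 2 on the maximum time
conn.execute("insert into mytable values (5, 300, 6, 7, 'tie')")
rows_with_tie = conn.execute(query).fetchall()
print(rows_with_tie)  # both tied rows (ids 2 and 5) come back for (6, 7)
```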

Query where ALL associated records have attribute value X

I have a query that counts associated records AND associated status
SELECT
po.id,
SUM(CASE WHEN s.shipment_status='CLOSED' THEN 1 ELSE 0 END) as closed,
COUNT(*) as shipment_count
FROM orders as po
JOIN shipments as s ON s.order_id = po.id
GROUP BY po.id
I am attempting to query all orders, where all the shipments are CLOSED.
Essentially looking at the above, just returning when closed = shipment_count .
If I add an AND clause to the join then it will simply limit the number of shipments.
I figured this out with a HAVING clause, which doesn't use the select attrs.
SELECT
po.id
FROM orders as po
JOIN shipments as s ON s.order_id = po.id
GROUP BY po.id
HAVING SUM(CASE WHEN s.shipment_status='CLOSED' THEN 1 ELSE 0 END) = COUNT(*)
Leaving my answer up in case it helps others. The community may come up with better answers.
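The HAVING approach can be checked on a small data set. This is a sketch with made-up rows (order 1 has only closed shipments, order 2 has one open shipment), run through Python's built-in sqlite3 as a stand-in for the real database; the shipments.id column is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table orders (id integer primary key);
    create table shipments (id integer primary key, order_id integer, shipment_status text);
    -- hypothetical data: order 1 is fully closed, order 2 has one open shipment
    insert into orders values (1), (2);
    insert into shipments values
        (1, 1, 'CLOSED'), (2, 1, 'CLOSED'),
        (3, 2, 'CLOSED'), (4, 2, 'OPEN');
""")

# Keep an order only when its closed-shipment count equals its shipment count.
rows = conn.execute("""
    select po.id
    from orders as po
    join shipments as s on s.order_id = po.id
    group by po.id
    having sum(case when s.shipment_status = 'CLOSED' then 1 else 0 end) = count(*)
""").fetchall()

print(rows)  # only order 1 qualifies
```

Because this is an inner join, orders without any shipments never appear in the result.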

Obtaining certain value from child table

I have two tables:
Customer (Parent)
Agreement (Child)
I need to display a certain value in my query if one of the many agreement statusID's has a certain value (e.g. written off). The join between the two tables is CustomerID.
So if a customer has 3 agreements and 2 agreements have a statusID of 1 and one has 5, I need to display a certain value. I only want to return one row in this query rather than the 3 which would occur in a typical join
Any suggestions?
select
Customer.CustomerID,
max(case when StatusId = 1 then 1 else 0 end) as HasStatus1,
max(case when StatusId = 2 then 1 else 0 end) as HasStatus2
--etc.
from Customer
left join Agreement on Agreement.CustomerID = Customer.CustomerID
group by Customer.CustomerID
This will return a single row per customer due to the group by, with flags indicating whether they have any agreements in each status of concern. If you're looking this up for a single CustomerID, you'd add a where clause, and you could remove the group by as well (you'd then have to remove CustomerID from the result set too).
Taking into account your comment, you'd want something like:
;with grouped as (
select
Customer.CustomerID,
max(case when StatusId = 1 then 1 else 0 end) as HasStatus1,
max(case when StatusId = 2 then 1 else 0 end) as HasStatus2,
max(case when StatusId = 5 then 1 else 0 end) as HasStatus5
--etc.
from Customer
left join Agreement on Agreement.CustomerID = Customer.CustomerID
group by Customer.CustomerID
)
select
CustomerID,
case
when HasStatus5 = 1 then 5
when (HasStatus1 = 1 or HasStatus2 = 1) and <no other status> then 1
--etc.
else <Can't return StatusId here because there might be more than one... so whatever your default actually is> end as Result
from grouped
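A runnable version of the flag pattern may make the one-row-per-customer behaviour easier to see. This is a sketch under assumptions: Python's built-in sqlite3 stands in for the real database, the AgreementID column and all rows are hypothetical, the priority rule (5 beats 1) is simplified, and null is used as a placeholder default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Customer (CustomerID integer primary key);
    create table Agreement (AgreementID integer primary key, CustomerID integer, StatusId integer);
    -- hypothetical data: customer 1 has statuses 1, 1, 5; customer 2 has status 1;
    -- customer 3 has no agreements at all
    insert into Customer values (1), (2), (3);
    insert into Agreement values (1, 1, 1), (2, 1, 1), (3, 1, 5), (4, 2, 1);
""")

rows = conn.execute("""
    with grouped as (
        select c.CustomerID,
               max(case when a.StatusId = 1 then 1 else 0 end) as HasStatus1,
               max(case when a.StatusId = 5 then 1 else 0 end) as HasStatus5
        from Customer c
        left join Agreement a on a.CustomerID = c.CustomerID
        group by c.CustomerID
    )
    select CustomerID,
           case when HasStatus5 = 1 then 5
                when HasStatus1 = 1 then 1
                else null  -- placeholder default, replace with your own
           end as Result
    from grouped
    order by CustomerID
""").fetchall()

print(rows)  # one row per customer, even for customer 1 with three agreements
```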

Restricting duplicate results in grouped result set without using distinct

I am attempting to create a query that returns a list of specific entity records without returning any duplicated entries from the entityID field. The query cannot use DISTINCT because the list is being passed to a reporting engine that doesn't understand result sets containing more than the entityID, and DISTINCT requires all the ORDER BY fields to be returned.
The result set cannot contain duplicate entityIDs because the reporting engine also cannot process a report for the same entity twice in the same run. I have found out the hard way that temporary tables aren't supported as well.
The entries need to be sorted in the query because the report engine only allows sorting on the entity_header level, and I need to sort based on the report.status. Thankfully the report engine honors the order in which you return the results.
The tables are as follows:
entity_header
=================================================
entityID(pk) Location active name
1 LOCATION1 0 name1
2 LOCATION1 0 name2
3 LOCATION2 0 name3
4 LOCATION3 0 name4
5 LOCATION2 1 name5
6 LOCATION2 0 name6
report
========================================================
startdate entityID(fk) status reportID(pk)
03-10-2013 1 running 1
03-12-2013 2 running 2
03-10-2013 1 stopped 3
03-10-2013 3 stopped 4
03-12-2013 4 running 5
03-10-2013 5 stopped 6
03-12-2013 6 running 7
Here is the query I've got so far, and it is almost what I need:
SELECT eh.entityID
FROM entity_header eh
INNER JOIN report r on r.entityID = eh.entityID
WHERE r.startdate between getdate()-7.5 and getdate()
AND eh.active = 0
AND eh.location in ('LOCATION1','LOCATION2')
AND r.status is not null
AND eh.name is not null
GROUP BY eh.entityID, r.status, eh.name
ORDER BY r.status, eh.name;
I would appreciate any advice this community can offer. I will do my best to provide any additional information required.
Here is a working sample that runs on MS SQL Server only.
I am using rank() to number the rows of each entityID, ordered by status and name. The result is saved as list.
For each entityID, list is 1 on its first row and higher on any further rows.
Filtering with where a.list = 1 therefore keeps one row per entityID.
ORDER BY a.ut, a.en sorts the results; ut and en carry the status and name through from the inner query.
SELECT a.entityID FROM (
SELECT distinct TOP (100) PERCENT eh.entityID,
rank() over(PARTITION BY eh.entityID ORDER BY r.status, eh.name) as list,
r.status ut, eh.name en
FROM report AS r INNER JOIN entity_header as eh ON r.entityID = eh.entityID
WHERE (r.startdate BETWEEN GETDATE() - 7.5 AND GETDATE()) AND (eh.active = 0)
AND (eh.location IN ('LOCATION1', 'LOCATION2'))
ORDER BY r.status, eh.name
) AS a
where a.list = 1
ORDER BY a.ut, a.en
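The same idea can be tried outside SQL Server. The sketch below swaps in row_number() for rank() (row_number() never produces ties within a partition), drops the date filter, and loads the sample rows from the question into Python's built-in sqlite3; it assumes a SQLite build with window function support (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table entity_header (entityID integer primary key, location text, active integer, name text);
    create table report (reportID integer primary key, entityID integer, status text);
    insert into entity_header values
        (1, 'LOCATION1', 0, 'name1'), (2, 'LOCATION1', 0, 'name2'),
        (3, 'LOCATION2', 0, 'name3'), (4, 'LOCATION3', 0, 'name4'),
        (5, 'LOCATION2', 1, 'name5'), (6, 'LOCATION2', 0, 'name6');
    insert into report values
        (1, 1, 'running'), (2, 2, 'running'), (3, 1, 'stopped'), (4, 3, 'stopped'),
        (5, 4, 'running'), (6, 5, 'stopped'), (7, 6, 'running');
""")

rows = conn.execute("""
    select a.entityID
    from (
        select eh.entityID, r.status, eh.name,
               -- number each entity's rows; the first row by status, name gets 1
               row_number() over (partition by eh.entityID
                                  order by r.status, eh.name) as rn
        from report r
        join entity_header eh on eh.entityID = r.entityID
        where eh.active = 0
          and eh.location in ('LOCATION1', 'LOCATION2')
    ) as a
    where a.rn = 1
    order by a.status, a.name
""").fetchall()

print(rows)  # each entityID appears once, sorted by status and then name
```

Entity 1 has both a running and a stopped report, but only its first row by status survives the rn = 1 filter.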