Restricting duplicate results in grouped result set without using distinct - group-by

I am attempting to create a query that returns a list of specific entity records without returning any duplicated entries from the entityID field. The query cannot use DISTINCT because the list is being passed to a reporting engine that doesn't understand result sets containing more than the entityID, and DISTINCT requires all the ORDER BY fields to be returned.
The result set cannot contain duplicate entityIDs because the reporting engine also cannot process a report for the same entity twice in the same run. I have also found out the hard way that temporary tables aren't supported either.
The entries need to be sorted in the query because the report engine only allows sorting on the entity_header level, and I need to sort based on the report.status. Thankfully the report engine honors the order in which you return the results.
The tables are as follows:
entity_header
=================================================
entityID(pk) Location active name
1 LOCATION1 0 name1
2 LOCATION1 0 name2
3 LOCATION2 0 name3
4 LOCATION3 0 name4
5 LOCATION2 1 name5
6 LOCATION2 0 name6
report
========================================================
startdate entityID(fk) status reportID(pk)
03-10-2013 1 running 1
03-12-2013 2 running 2
03-10-2013 1 stopped 3
03-10-2013 3 stopped 4
03-12-2013 4 running 5
03-10-2013 5 stopped 6
03-12-2013 6 running 7
Here is the query I've got so far, and it is almost what I need:
SELECT eh.entityID
FROM entity_header eh
INNER JOIN report r on r.entityID = eh.entityID
WHERE r.startdate between getdate()-7.5 and getdate()
AND eh.active = 0
AND eh.location in ('LOCATION1','LOCATION2')
AND r.status is not null
AND eh.name is not null
GROUP BY eh.entityID, r.status, eh.name
ORDER BY r.status, eh.name;
I would appreciate any advice this community can offer. I will do my best to provide any additional information required.

Here is a working sample; it runs on MS SQL Server only.
rank() numbers the rows of each entityID in order of status and name, and the result is exposed as list.
A row with list = 1 is therefore the first row for its entityID.
Filtering on a.list = 1 keeps one row per entity, which removes the duplicates.
ORDER BY a.ut, a.en sorts the results; ut and en are aliases for r.status and eh.name.
SELECT a.entityID FROM (
SELECT distinct TOP (100) PERCENT eh.entityID,
rank() over(PARTITION BY eh.entityID ORDER BY r.status, eh.name) as list,
r.status ut, eh.name en
FROM report AS r INNER JOIN entity_header as eh ON r.entityID = eh.entityID
WHERE (r.startdate BETWEEN GETDATE() - 7.5 AND GETDATE()) AND (eh.active = 0)
AND (eh.location IN ('LOCATION1', 'LOCATION2'))
ORDER BY r.status, eh.name
) AS a
where a.list = 1
ORDER BY a.ut, a.en
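
The same idea can be tried outside SQL Server. The following is a minimal sketch, assuming Python's bundled sqlite3 module with SQLite 3.25 or later (for window functions). It swaps rank() for ROW_NUMBER(), which guarantees exactly one row per entity even when two report rows tie on status, and it omits the startdate filter because the sample dates are fixed.

```python
import sqlite3

# In-memory database loaded with the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entity_header (entityID INTEGER PRIMARY KEY, location TEXT, active INTEGER, name TEXT);
    CREATE TABLE report (startdate TEXT, entityID INTEGER, status TEXT, reportID INTEGER PRIMARY KEY);
    INSERT INTO entity_header VALUES
        (1, 'LOCATION1', 0, 'name1'), (2, 'LOCATION1', 0, 'name2'),
        (3, 'LOCATION2', 0, 'name3'), (4, 'LOCATION3', 0, 'name4'),
        (5, 'LOCATION2', 1, 'name5'), (6, 'LOCATION2', 0, 'name6');
    INSERT INTO report VALUES
        ('03-10-2013', 1, 'running', 1), ('03-12-2013', 2, 'running', 2),
        ('03-10-2013', 1, 'stopped', 3), ('03-10-2013', 3, 'stopped', 4),
        ('03-12-2013', 4, 'running', 5), ('03-10-2013', 5, 'stopped', 6),
        ('03-12-2013', 6, 'running', 7);
""")

# Number each entity's rows by status and name, then keep only the first one.
# The startdate filter from the question is left out (assumption: not needed for the demo).
query = """
    SELECT a.entityID
    FROM (
        SELECT eh.entityID, r.status AS ut, eh.name AS en,
               ROW_NUMBER() OVER (PARTITION BY eh.entityID
                                  ORDER BY r.status, eh.name) AS list
        FROM report AS r
        INNER JOIN entity_header AS eh ON r.entityID = eh.entityID
        WHERE eh.active = 0
          AND eh.location IN ('LOCATION1', 'LOCATION2')
    ) AS a
    WHERE a.list = 1
    ORDER BY a.ut, a.en
"""
entity_ids = [row[0] for row in conn.execute(query)]
print(entity_ids)
```

Only the entityID column is returned, yet the rows still arrive sorted by status and then name, which is what the reporting engine needs.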

PostgreSQL Query not returning the proper results

So this is my table structure
learning_paths
id
name
version
created_at
updated_at
learning_path_levels
id
name
learning_path_id
order
created_at
updated_at
learning_path_level_nodes
id
name
description
documentation_links
evaluation_methodology
learning_path_level_id
created_at
updated_at
learning_path_node_users
id
learning_path_level_node_id
user_id
evaluated_by
evaluated_at
is_successful
created_at
updated_at
I'm writing a query to retrieve the learning path name, the number of levels each learning path has, the pending and completed nodes per level for the user, and the total number of nodes per level.
I have the following query:
select learning_paths."name",
sum(case when learning_path_node_users.is_successful and learning_path_node_users.user_id is not null then 1 else 0 end) as completed_nodes,
sum(case when learning_path_node_users.is_successful = false or learning_path_node_users.user_id is null then 1 else 0 end) as pending_nodes,
count(learning_path_levels.id) as total_levels,
count(*) as total_nodes
from learning_path_level_nodes
inner join learning_path_levels on learning_path_levels.id = learning_path_level_nodes.learning_path_level_id
inner join learning_paths on learning_paths.id = learning_path_levels.learning_path_id
left join learning_path_node_users on learning_path_node_users.learning_path_level_node_id = learning_path_level_nodes.id
group by learning_paths."name"
which returns:
name completed_nodes pending_nodes total_levels total_nodes
Devops 5 3 8 8
QA 0 1 1 1
Project manager 3 3 6 6
AI 0 5 5 5
Everything is correct except for the levels count.
For example, for Devops it should be 2, and it is returning 8.
For Project manager it should be 2, and it is returning 6.
A pattern I see is that it returns the number of nodes as the number of levels.
How can I fix this?
I'd really appreciate any help or suggestions, as I've been struggling with this.
Thanks in advance.
EDIT: As per your suggestion, I'm attaching a fiddle with the tables and data.
https://dbfiddle.uk/?rdbms=postgres_14&fiddle=f29676ff7051686a28de96928db1e3a6
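While I don't get the exact results you want, I think you want to add a distinct to your count for the total levels. The joins produce one row per node, so counting the level id without distinct counts each level once for every node it contains, which is why the number matches total_nodes: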
select
lp.name,
sum(case when u.is_successful and u.user_id is not null then 1 else 0 end) as completed_nodes,
sum(case when u.is_successful = false or u.user_id is null then 1 else 0 end) as pending_nodes,
count(distinct lpl.id) as total_levels, -- added "distinct"
array_agg (lpl.id) as level_detail, -- debugging aid
count(*) as total_nodes
from
learning_path_level_nodes n
join learning_path_levels lpl on lpl.id = n.learning_path_level_id
join learning_paths lp on lp.id = lpl.learning_path_id
left join learning_path_node_users u on u.learning_path_level_node_id = n.id
group by
lp.name
To help expose the rationale, I added the field level_detail as a debugging aid; it shows why the results are what they are. You can remove it once the results are what you want.
If it's not what you expect, perhaps you can explain or give an example of what I might be missing.
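
To see the difference between the two counts in isolation, here is a minimal sketch using Python's sqlite3 module. The table names follow the question, but the rows are made up for illustration and the learning_path_node_users join is left out to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE learning_paths (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE learning_path_levels (id INTEGER PRIMARY KEY, name TEXT, learning_path_id INTEGER);
    CREATE TABLE learning_path_level_nodes (id INTEGER PRIMARY KEY, name TEXT, learning_path_level_id INTEGER);
    -- Hypothetical data: one path, two levels, three nodes.
    INSERT INTO learning_paths VALUES (1, 'Devops');
    INSERT INTO learning_path_levels VALUES (1, 'Level 1', 1), (2, 'Level 2', 1);
    INSERT INTO learning_path_level_nodes VALUES (1, 'n1', 1), (2, 'n2', 1), (3, 'n3', 2);
""")

# The join yields one row per node, so a plain count of the level id
# equals the node count; count(distinct ...) counts each level once.
query = """
    SELECT lp.name,
           count(lpl.id)          AS levels_without_distinct,
           count(DISTINCT lpl.id) AS total_levels,
           count(*)               AS total_nodes
    FROM learning_path_level_nodes AS n
    JOIN learning_path_levels AS lpl ON lpl.id = n.learning_path_level_id
    JOIN learning_paths AS lp ON lp.id = lpl.learning_path_id
    GROUP BY lp.name
"""
result = conn.execute(query).fetchall()
print(result)
```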

join 2 tables with different dates into one date column

I have two tables: a_table and b_table. They contain closing records and checkout records, that for each customer can be performed on different dates. I would like to combine these 2 tables together, so that there is only one date field, one customer field, one close and one check field.
a_table
time_modified customer_name
2021-05-03 Ben
2021-05-08 Ben
2021-07-10 Jerry
b_table
time_modified account_id
2021-05-06 Ben
2021-07-08 Jerry
2021-07-12 Jerry
Expected result
date account_id_a close check
2021-05-03 Ben 1 0
2021-05-06 Ben 0 1
2021-05-08 Ben 1 0
2021-07-08 Jerry 0 1
2021-07-10 Jerry 1 0
2021-07-12 Jerry 0 1
The query so far:
with a_table as (
select rz.time_modified::date, rz.customer_name,
case when rz.time_modified::date is not null then 1 else 0 end as close
from schema.rz
),
b_table as (
select bo.time_modified::date, bo.customer_name,
case when bo.time_modified::date is not null then 1 else 0 end as check
from schema.bo
)
SELECT (CURRENT_DATE::TIMESTAMP - (i * interval '1 day'))::date as date,
a.*, b.*
FROM generate_series(1,2847) i
left join a_table a
on a.time_modified = i.date
left join b_table b
on b.time_modified = i.date
The query above returns:
SQL Error [500310] [0A000]: [Amazon](500310) Invalid operation: Specified types or functions (one per INFO message) not supported on Redshift tables.;
You just need to do a union rather than a join.
A join combines two tables side by side into wider rows, whereas a union appends the rows of the second table to those of the first.
First off, the error you are getting is due to the use of the generate_series() function in a query where its results need to be combined with table data. generate_series() is a leader-node-only function and its results cannot be used on compute nodes. You will need to generate the number series you want in another way. See How to Generate Date Series in Redshift for possible ways to do this.
I'm not sure I follow your query entirely, but it seems like you want to UNION the tables and not JOIN them. You haven't defined what rz and bo are, so it is a bit confusing. However, a UNION plus some calculation for close and check seems like the way to go.
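
As a rough illustration of the UNION approach, here is a minimal sketch run against SQLite through Python's sqlite3 module rather than Redshift. The aliases event_date, customer, is_close and is_check are hypothetical names chosen for the demo, and the GROUP BY with MAX is an added assumption so that a close and a checkout on the same day for the same customer collapse into one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a_table (time_modified TEXT, customer_name TEXT);
    CREATE TABLE b_table (time_modified TEXT, account_id TEXT);
    INSERT INTO a_table VALUES ('2021-05-03', 'Ben'), ('2021-05-08', 'Ben'), ('2021-07-10', 'Jerry');
    INSERT INTO b_table VALUES ('2021-05-06', 'Ben'), ('2021-07-08', 'Jerry'), ('2021-07-12', 'Jerry');
""")

# Stack both tables with UNION ALL, flagging where each row came from,
# then merge rows that share a date and customer.
query = """
    SELECT event_date, customer,
           MAX(is_close) AS "close",
           MAX(is_check) AS "check"
    FROM (
        SELECT time_modified AS event_date, customer_name AS customer,
               1 AS is_close, 0 AS is_check
        FROM a_table
        UNION ALL
        SELECT time_modified, account_id, 0, 1
        FROM b_table
    ) AS events
    GROUP BY event_date, customer
    ORDER BY event_date, customer
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

No date series is needed here, because only dates that actually occur in one of the tables appear in the result.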

Postgres distinct query in two columns

I want to write a Postgres query. For every distinct combination of (career_id, uid) I should return the entire row that has the max time.
This is the sample data
id time career_id uid content
1 100 10000 5 Abc
2 300 6 7 xyz
3 200 10000 5 wxv
4 150 6 7 hgr
Expected result:
id time career_id uid content
2 300 6 7 xyz
3 200 10000 5 wxv
This can be done using distinct on () in Postgres:
select distinct on (career_id, uid) *
from the_table
order by career_id, uid, "time" desc;
You can use CTEs for this. Something like this should work:
WITH cte_max_value AS (
SELECT
career_id,
uid,
max("time") as max_time
FROM mytable
GROUP BY career_id, uid
)
SELECT DISTINCT t.*
FROM mytable AS t
INNER JOIN cte_max_value AS cmv
ON t.uid = cmv.uid AND t.career_id = cmv.career_id AND t.time = cmv.max_time
The CTE gives you all the unique combinations of career_id and uid with the relevant maximum time; the inner join then joins the entire rows to this. Note that if two rows share the same maximum time for the same combination of career_id and uid, both rows will be returned.
If you don't want that you will need to find a strategy to resolve this.
Edit: The solution proposed by a_hrose_with_name is far nicer, and unless you need some level of compatibility with other servers (sadly, syntax varies) you should use that instead.
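
One possible strategy for resolving ties, sketched here as an assumption rather than as part of either answer, is ROW_NUMBER() with an extra tie-breaker column. The sketch runs on SQLite 3.25 or later through Python's sqlite3 module; the same query shape should also work in Postgres. Breaking ties on id is an arbitrary choice made for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE the_table (id INTEGER, "time" INTEGER, career_id INTEGER, uid INTEGER, content TEXT);
    INSERT INTO the_table VALUES
        (1, 100, 10000, 5, 'Abc'),
        (2, 300, 6, 7, 'xyz'),
        (3, 200, 10000, 5, 'wxv'),
        (4, 150, 6, 7, 'hgr');
""")

# Rank the rows of each (career_id, uid) pair by time, newest first.
# "id DESC" is a hypothetical tie-breaker so that exactly one row gets rn = 1.
query = """
    SELECT id, "time", career_id, uid, content
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY career_id, uid
                                  ORDER BY "time" DESC, id DESC) AS rn
        FROM the_table AS t
    ) AS ranked
    WHERE rn = 1
    ORDER BY id
"""
latest_rows = conn.execute(query).fetchall()
print(latest_rows)
```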

DB2 - update increment based on timestamp

After a complex operation (some database merging) I have a table that needs to be updated based on timestamp.
JobsTable
Id Time_stamp Resource RunNumber
121 1 A 1
122 2 A 1
123 3 B 1
124 4 B 1
125 5 A 2
The point is to Update the RunNumber column incrementally for each resource based on timestamp. So in the end the expected result is:
Id Time_stamp Resource RunNumber
121 1 A 1
122 2 A 2 //changed
123 3 B 1
124 4 B 2 //changed
125 5 A 3 //changed
I tried doing this in multiple ways. Since a DB2 UPDATE does not support JOIN or WITH, I tried something like:
update JOBSTABLE JT
SET RunNumber =
(SELECT RunNumber
FROM (Select ID, ROW_NUMBER() OVER (ORDER BY TIME_STAMP ) RunNumber from JobsTable, ORDER BY TIME_STAMP) AS AAA
WHERE AAA.ID = JT.ID)
WHERE ID = ?
Error:
Assignment of a NULL value to a NOT NULL column "TBSPACEID=6, TABLEID=16, COLNO=2" is not allowed.. SQLCODE=-407, SQLSTATE=23502, DRIVER=3.64.82 SQL Code: -407, SQL State: 23502
Is this even possible? (I am aiming at doing this operation in a single query rather than using Cursors, etc..)
Thank you
Firstly, your subselect has a syntax error, which tells me it's not the exact statement that you are trying to run. The error message is pretty clear -- in your actual statement the subselect sometimes returns NULL.
Secondly, you should probably be numbering rows within a partition by resource.
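Thirdly, the filter on id has to stay outside the subselect that does the numbering; if it sits next to ROW_NUMBER(), the window only ever sees one row and every RunNumber comes out as 1. Based on the statement you published:
update JOBSTABLE JT
SET RunNumber =
(SELECT AAA.RunNumber
FROM (SELECT ID, ROW_NUMBER() OVER (PARTITION BY RESOURCE ORDER BY TIME_STAMP) AS RunNumber
FROM JobsTable) AS AAA
WHERE AAA.ID = JT.ID)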
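
The partitioned numbering can be checked against the sample data with a small sketch. It uses SQLite 3.25 or later through Python's sqlite3 module instead of DB2, so treat it as an illustration of the logic, not as verified DB2 syntax. The derived-table alias numbered and the column alias rn are names invented for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE JobsTable (
        Id INTEGER PRIMARY KEY,
        Time_stamp INTEGER,
        Resource TEXT,
        RunNumber INTEGER NOT NULL
    );
    INSERT INTO JobsTable VALUES
        (121, 1, 'A', 1), (122, 2, 'A', 1), (123, 3, 'B', 1),
        (124, 4, 'B', 1), (125, 5, 'A', 2);
""")

# Number all rows per resource first, then pick the number for the row
# being updated. The Id filter sits outside the window computation.
conn.execute("""
    UPDATE JobsTable
    SET RunNumber = (
        SELECT numbered.rn
        FROM (
            SELECT Id,
                   ROW_NUMBER() OVER (PARTITION BY Resource
                                      ORDER BY Time_stamp) AS rn
            FROM JobsTable
        ) AS numbered
        WHERE numbered.Id = JobsTable.Id
    )
""")
jobs = conn.execute(
    "SELECT Id, Time_stamp, Resource, RunNumber FROM JobsTable ORDER BY Id"
).fetchall()
for job in jobs:
    print(job)
```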

SSRS 2005 column chart: show series label missing when data count is zero

I have a pretty simple chart with a likely common issue. I've searched for several hours on the interweb but only get so far in finding a similar situation.
The basics of what I'm pulling are a created_by, a person_id and a risk score.
the risk score can be:
1 VERY LOW
2 LOW
3 MODERATE STABLE
4 MODERATE AT RISK
5 HIGH
6 VERY HIGH
I want to get a headcount of persons at each risk score and display a risk count even if there is a count of 0 for that risk score but SSRS 2005 likes to suppress zero counts.
I've tried this in the point labels
=IIF(IsNothing(count(Fields!person_id.value)),0,count(Fields!person_id.value))
Ex: I'm missing values for "1 LOW" as the creator does not have any "1 LOW" they've assigned risk scores for.
The chart I get omits those columns, but I'd like to have a column even when a risk score doesn't exist in the returned results.
@Nathan
Example scenario:
select professor.name, grades.score, student.person_id
from student
inner join grades on student.person_id = grades.person_id
inner join professor on student.professor_id = professor.professor_id
where
student.professor_id = @professor
Not all students are necessarily in the grades table.
I have a =Count(Fields!person_id.Value) for my data points & series is grouped on =Fields!score.Value
If there were a bunch of A, B and D grades but no Cs or Fs, how would I show labels for potentially non-existent counts?
In your example, the problem is that no results are returned for grades that are not linked to any students. To solve this ideally there would be a table in your source system which listed all the possible values of "score" (e.g. A - F) and you would join this into your query such that at least one row was returned for each possible value.
If such a table doesn't exist and the possible score values are known and static, then you could manually create a list of them in your query. In the example below I create a subquery that returns a combination of all professors and all possible scores (A - F) and then LEFT join this to the grades and students tables (left join means that the professor/score rows will be returned even if no students have those scores in the "grades" table).
SELECT
professor.name
, professorgrades.score
, student.person_id
FROM
(
SELECT professor_id, score
FROM professor
CROSS JOIN
(
SELECT 'A' AS score
UNION
SELECT 'B'
UNION
SELECT 'C'
UNION
SELECT 'D'
UNION
SELECT 'E'
UNION
SELECT 'F'
) availablegrades
) professorgrades
INNER JOIN professor ON professorgrades.professor_id = professor.professor_id
LEFT JOIN grades ON professorgrades.score = grades.score
LEFT JOIN student ON grades.person_id = student.person_id AND
professorgrades.professor_id = student.professor_id
WHERE professorgrades.professor_id = 1
See a live example of how this works here: SQLFIDDLE
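
For readers without access to the fiddle, here is a minimal sketch of the same cross-join scaffold, run on SQLite through Python's sqlite3 module. The professor, student and grades rows are made up for illustration, and a COUNT per score is added to stand in for what the chart's Count() expression would compute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE professor (professor_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE student (person_id INTEGER PRIMARY KEY, professor_id INTEGER);
    CREATE TABLE grades (person_id INTEGER, score TEXT);
    -- Hypothetical data: three students, grades A, A and D.
    INSERT INTO professor VALUES (1, 'Prof One');
    INSERT INTO student VALUES (10, 1), (11, 1), (12, 1);
    INSERT INTO grades VALUES (10, 'A'), (11, 'A'), (12, 'D');
""")

# Build every professor/score combination, then LEFT JOIN the real data
# so that scores nobody received still produce a row with a count of 0.
query = """
    SELECT pg.score, COUNT(s.person_id) AS headcount
    FROM (
        SELECT p.professor_id, ag.score
        FROM professor AS p
        CROSS JOIN (
            SELECT 'A' AS score UNION ALL SELECT 'B' UNION ALL SELECT 'C'
            UNION ALL SELECT 'D' UNION ALL SELECT 'E' UNION ALL SELECT 'F'
        ) AS ag
    ) AS pg
    LEFT JOIN grades AS g ON g.score = pg.score
    LEFT JOIN student AS s ON s.person_id = g.person_id
                          AND s.professor_id = pg.professor_id
    WHERE pg.professor_id = 1
    GROUP BY pg.score
    ORDER BY pg.score
"""
headcounts = conn.execute(query).fetchall()
print(headcounts)
```

COUNT(s.person_id) ignores NULLs, so the scaffold rows with no matching student contribute zero instead of disappearing.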
SELECT RS.RiskScoreId, RS.Description, SUM(DT.RiskCount) AS RiskCount
FROM (
SELECT RiskScoreId, 1 AS RiskCount
FROM People
UNION ALL
SELECT RiskScoreId, 0 AS RiskCount
FROM RiskScores
) DT
INNER JOIN RiskScores RS ON RS.RiskScoreId = DT.RiskScoreId
GROUP BY RS.RiskScoreId, RS.Description
ORDER BY RS.RiskScoreId
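
The query above pads the data with one zero-valued row per risk score, so that every score survives the GROUP BY. A minimal sketch of it running on SQLite through Python's sqlite3 module follows. The table and column names are taken from the query, the descriptions come from the question, and the People rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE RiskScores (RiskScoreId INTEGER PRIMARY KEY, Description TEXT);
    CREATE TABLE People (person_id INTEGER PRIMARY KEY, RiskScoreId INTEGER);
    INSERT INTO RiskScores VALUES
        (1, 'VERY LOW'), (2, 'LOW'), (3, 'MODERATE STABLE'),
        (4, 'MODERATE AT RISK'), (5, 'HIGH'), (6, 'VERY HIGH');
    -- Hypothetical people: nobody has score 1, 4 or 6.
    INSERT INTO People VALUES (100, 2), (101, 2), (102, 3), (103, 5);
""")

# Each person contributes 1 and each risk score contributes an extra 0,
# so the sum per score is the headcount, including scores with no people.
query = """
    SELECT RS.RiskScoreId, RS.Description, SUM(DT.RiskCount) AS RiskCount
    FROM (
        SELECT RiskScoreId, 1 AS RiskCount FROM People
        UNION ALL
        SELECT RiskScoreId, 0 AS RiskCount FROM RiskScores
    ) AS DT
    INNER JOIN RiskScores AS RS ON RS.RiskScoreId = DT.RiskScoreId
    GROUP BY RS.RiskScoreId, RS.Description
    ORDER BY RS.RiskScoreId
"""
risk_counts = conn.execute(query).fetchall()
for row in risk_counts:
    print(row)
```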