Finding exact matches to a requested set of values - db2

Hi, I'm facing a challenge. There is a table called progress:
User_id | Assessment_id
-----------------------
1 | Test_1
2 | Test_1
3 | Test_1
1 | Test_2
2 | Test_2
1 | Test_3
3 | Test_3
I need to pull out the user_ids that have completed only Test_1 and Test_2 (i.e. User_id 2). The input parameter would be the list of assessment ids.
Edit:
I want those who have completed all the assessments on the list, but no others.
User 3 did not complete Test_2, and so is excluded.
User 1 completed an extra test, and is also excluded.
Only User 2 has completed exactly those assessments requested.

You don't need a complicated join or even subqueries. Simply use the INTERSECT operator:
select user_id from progress where assessment_id = 'Test_1'
intersect
select user_id from progress where assessment_id = 'Test_2'
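To see what the INTERSECT query returns on the sample data, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for DB2 (that substitution is an assumption; the SQL itself is standard). On its own, INTERSECT finds the users who completed both tests, which still includes user 1. The second query adds an EXCEPT step, which is an addition for illustration and not part of the answer above, to drop anyone who also took a test outside the list.

```python
import sqlite3

# In-memory SQLite database standing in for DB2 (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE progress (user_id INTEGER, assessment_id TEXT)")
conn.executemany(
    "INSERT INTO progress VALUES (?, ?)",
    [(1, "Test_1"), (2, "Test_1"), (3, "Test_1"),
     (1, "Test_2"), (2, "Test_2"),
     (1, "Test_3"), (3, "Test_3")],
)

# The INTERSECT query from the answer: users who completed both tests.
both = [row[0] for row in conn.execute("""
    SELECT user_id FROM progress WHERE assessment_id = 'Test_1'
    INTERSECT
    SELECT user_id FROM progress WHERE assessment_id = 'Test_2'
    ORDER BY user_id
""")]

# Added EXCEPT step (not in the answer): remove users with any other test.
exact = [row[0] for row in conn.execute("""
    SELECT user_id FROM progress WHERE assessment_id = 'Test_1'
    INTERSECT
    SELECT user_id FROM progress WHERE assessment_id = 'Test_2'
    EXCEPT
    SELECT user_id FROM progress
    WHERE assessment_id NOT IN ('Test_1', 'Test_2')
    ORDER BY user_id
""")]

print(both)   # [1, 2]
print(exact)  # [2]
```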

I interpreted your question to mean that you want users who have completed all of the tests in your assessment list, but not any other tests. I'll use a technique called common table expressions so that you can follow step by step, but it is all one query statement.
Let's say you supply your assessment list as rows in a table called Checktests. We can count those values to find out how many tests are needed.
If we use a LEFT OUTER JOIN, then columns from the right-side table will be null wherever there is no match. So the test_matched column will be null if an assessment the user took is not on your list. COUNT() ignores null values, so we can use it to count how many of a user's tests were on the list, and then compare that with the total number of tests the user took. A user qualifies when both counts equal the number of tests needed.
with x as
  (select count(assessment_id) as tests_needed
   from checktests
  ),
dtl as
  (select p.user_id,
          p.assessment_id as test_taken,
          c.assessment_id as test_matched
   from progress p
   left join checktests c on p.assessment_id = c.assessment_id
  ),
y as
  (select user_id,
          count(test_taken)   as all_tests,
          count(test_matched) as wanted_tests -- count() ignores nulls
   from dtl
   group by user_id
  )
select user_id
from y
join x on y.wanted_tests = x.tests_needed
where y.wanted_tests = y.all_tests;
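As a rough check, the counting query above can be run against the question's sample data. This sketch uses Python's built-in sqlite3 module in place of DB2 (an assumption), with checktests holding the two requested assessments.

```python
import sqlite3

# SQLite stands in for DB2 here; table contents follow the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE progress (user_id INTEGER, assessment_id TEXT)")
conn.execute("CREATE TABLE checktests (assessment_id TEXT)")
conn.executemany(
    "INSERT INTO progress VALUES (?, ?)",
    [(1, "Test_1"), (2, "Test_1"), (3, "Test_1"),
     (1, "Test_2"), (2, "Test_2"),
     (1, "Test_3"), (3, "Test_3")],
)
conn.executemany("INSERT INTO checktests VALUES (?)",
                 [("Test_1",), ("Test_2",)])

exact_users = [row[0] for row in conn.execute("""
    WITH x AS (
        SELECT COUNT(assessment_id) AS tests_needed FROM checktests
    ),
    dtl AS (
        SELECT p.user_id,
               p.assessment_id AS test_taken,
               c.assessment_id AS test_matched   -- NULL when not on the list
        FROM progress p
        LEFT JOIN checktests c ON p.assessment_id = c.assessment_id
    ),
    y AS (
        SELECT user_id,
               COUNT(test_taken)   AS all_tests,
               COUNT(test_matched) AS wanted_tests  -- COUNT() ignores NULLs
        FROM dtl
        GROUP BY user_id
    )
    SELECT user_id
    FROM y
    JOIN x ON y.wanted_tests = x.tests_needed
    WHERE y.wanted_tests = y.all_tests
    ORDER BY user_id
""")]

print(exact_users)  # [2]
```

User 1 has all_tests = 3 but wanted_tests = 2, and user 3 has wanted_tests = 1, so only user 2 passes both comparisons.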

Related

Query users on filter applied to a one-to-many relationship table postgresql

We currently have a users table with a one-to-many relationship on a table called steps. Each user can have either four steps or seven steps. The steps table schema is as follows:
id | user_id | order | status
-----------------------------
# | # |1-7/1-4| 0 or 1
I am trying to query all of the users who have a status of 1 on all of their steps. So if they have either 4 or 7 steps, they must all have a status of 1.
I tried a join with a check on step 4 (since a step cannot be complete without the previous one being complete as well) but this has issues if someone with 7 steps completed step 4 but not 7.
select u.first_name, u.last_name, u.email, date(s.updated_at) as completed_date
from users u
join steps s on u.id = s.user_id
where s.order = 4 and s.status = 1;

The bool_and aggregate function identifies the users whose steps all have status = 1, whatever the number of steps.
The array_agg aggregate function can then find the updated_at date associated with the last step of each user, by ordering the dates by the order column, descending, and taking the first element [1] of the resulting array:
SELECT u.first_name, u.last_name, u.email
     , s.completed_date
FROM users u
INNER JOIN
  ( SELECT user_id
         -- "order" is a reserved word, so it must be quoted when used as a column name
         , (array_agg(updated_at ORDER BY "order" DESC))[1] :: date AS completed_date
    FROM steps
    GROUP BY user_id
    HAVING bool_and(status :: boolean) -- keep only users whose steps all have status = 1
  ) AS s
ON u.id = s.user_id
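The "every step must be complete" filter can be sketched outside PostgreSQL too. SQLite has no bool_and, so this example swaps in HAVING MIN(status) = 1, which is equivalent when status is only ever 0 or 1. The sample rows are made up for illustration: user 2 has seven steps and finished step 4 but not step 7, the case that broke the original query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "order" is quoted because it is a reserved word.
conn.execute('CREATE TABLE steps (user_id INTEGER, "order" INTEGER, status INTEGER)')

# Hypothetical data: user 1 has 4 steps (all done), user 2 has 7 steps
# (only the first 4 done), user 3 has 7 steps (all done).
rows = [(1, n, 1) for n in range(1, 5)]
rows += [(2, n, 1 if n <= 4 else 0) for n in range(1, 8)]
rows += [(3, n, 1) for n in range(1, 8)]
conn.executemany("INSERT INTO steps VALUES (?, ?, ?)", rows)

# MIN(status) = 1 plays the role of bool_and(status::boolean):
# it holds only when no step of the user has status 0.
completed_users = [row[0] for row in conn.execute("""
    SELECT user_id
    FROM steps
    GROUP BY user_id
    HAVING MIN(status) = 1
    ORDER BY user_id
""")]

print(completed_users)  # [1, 3]
```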

How to sum children occurrences from a joining table in Postgres?

I need to count how many consultants are using a skill through a joining table (consultant_skills), and the challenge is to sum the children occurrences to the parents recursively.
Here's the reproduction of what I'm trying to accomplish. The current results are:
skill_id | count
2 | 2
3 | 1
5 | 1
6 | 1
But I need to compute the count to the parents recursively, where the expected result would be:
skill_id | count
1 | 2
2 | 2
3 | 1
4 | 2
5 | 2
6 | 1
Does anyone know how can I do that?

Sqlfiddle Solution
You need to use WITH RECURSIVE, as Mike suggests. His answer is useful, especially in its use of distinct to eliminate redundant counts for consultants, but it doesn't produce the exact results you're looking for.
See the working solution in the sqlfiddle above. I believe this is what you are looking for:
WITH RECURSIVE results(skill_id, parent_id, consultant_id)
AS (
SELECT skills.id as skill_id, parent_id, consultant_id
FROM consultant_skills
JOIN skills on skill_id = skills.id
UNION ALL
SELECT skills.id as skill_id, skills.parent_id as parent_id, consultant_id
FROM results
JOIN skills on results.parent_id = skills.id
)
SELECT skill_id, count(distinct consultant_id) from results
GROUP BY skill_id
ORDER BY skill_id
What is happening in the query below the UNION ALL is that we're recursively joining the skills table to itself, but rotating in the previous parent id as the new skill id, and using the new parent id on each iteration. The recursion stops because eventually the parent id is NULL and there is no JOIN because it's an INNER join. Hope that makes sense.
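Here is a small sketch of that recursion using Python's built-in sqlite3 module in place of Postgres. The question's actual tables were only linked, so the skill hierarchy and consultant assignments below are assumptions, chosen so that the direct counts and the rolled-up counts match the two result tables in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE skills (id INTEGER PRIMARY KEY, parent_id INTEGER);
    CREATE TABLE consultant_skills (consultant_id INTEGER, skill_id INTEGER);

    -- Assumed hierarchy: 1 is the parent of 2 and 3; 4 -> 5 -> 6 is a chain.
    INSERT INTO skills VALUES (1, NULL), (2, 1), (3, 1), (4, NULL), (5, 4), (6, 5);

    -- Assumed assignments giving direct counts 2:2, 3:1, 5:1, 6:1.
    INSERT INTO consultant_skills VALUES (1, 2), (2, 2), (1, 3), (1, 5), (2, 6);
""")

counts = conn.execute("""
    WITH RECURSIVE results(skill_id, parent_id, consultant_id) AS (
        -- Anchor: each consultant's directly assigned skills.
        SELECT s.id, s.parent_id, cs.consultant_id
        FROM consultant_skills cs
        JOIN skills s ON cs.skill_id = s.id
        UNION ALL
        -- Step: credit the same consultant to the parent skill.
        -- Stops when parent_id is NULL, because the inner join finds no row.
        SELECT s.id, s.parent_id, r.consultant_id
        FROM results r
        JOIN skills s ON r.parent_id = s.id
    )
    SELECT skill_id, COUNT(DISTINCT consultant_id)
    FROM results
    GROUP BY skill_id
    ORDER BY skill_id
""").fetchall()

print(counts)  # [(1, 2), (2, 2), (3, 1), (4, 2), (5, 2), (6, 1)]
```

COUNT(DISTINCT ...) matters here: consultant 1 reaches skill 1 through both skill 2 and skill 3, but should be counted once.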

Getting NULL values in JOINED table with LIMIT

There are many similar questions which I've learned from, but my result set isn't returning the expected results.
My Objective:
Build a query that will return a result set containing all rows in table demo1 with user_id = "admin", and the only row of table demo2 with user_id = "admin". Each row in demo2 has a unique user_id so there's always only one row with "admin" as user_id.
However, I don't want demo2 data to wastefully repeat on every subsequent row of demo1. I only want the first row of the result set to contain demo2 data as non-null values. Null values for demo2 columns should only be returned for rows 2+ in the result set.
Current Status:
Right now my query is returning the appropriate columns (all demo1 and all demo2) but
all the data returned from demo2 is null.
Demo1:
id user_id product quantity warehouse
1 admin phone 3 A
2 admin desk 1 D
3 k45 chair 5 B
Demo2:
id user_id employee job country
1 admin james tech usa
2 c39 cindy tech spain
Query:
SELECT *
from demo1
left join (SELECT * FROM demo2 WHERE demo2.user_id = 'X' LIMIT 1) X
on (demo1.user_id = x.user_id)
WHERE demo1.user_id = 'admin'
Rationale:
The subquery's LIMIT 1 was my attempt to retrieve demo2 values for row 1 only, thinking the rest would be null. Instead, all values are null.
Current Result:
id user_id product quantity warehouse id employee job country
1 admin phone 3 A null null null null
2 admin desk 1 D null null null null
Desired Result:
id user_id product quantity warehouse id employee job country
1 admin phone 3 A 1 james tech usa
2 admin desk 1 D null null null null
I've tried substituting left join for left inner join, right join, full join, but nothing returns the desired result.

Your join is going to bring through ANY records that satisfy the join condition for your two tables. There is no changing that.
But you could suppress subsequent records in your result set from displaying the matching demo2 record that satisfied the join condition AFTER it's joined:
SELECT demo1.id,
       demo1.user_id,
       demo1.product,
       demo1.quantity,
       demo1.warehouse,
       CASE WHEN ROW_NUMBER() OVER (PARTITION BY demo1.user_id ORDER BY demo1.id) = 1 THEN demo2.id END AS demo2_id,
       CASE WHEN ROW_NUMBER() OVER (PARTITION BY demo1.user_id ORDER BY demo1.id) = 1 THEN demo2.employee END AS demo2_employee,
       CASE WHEN ROW_NUMBER() OVER (PARTITION BY demo1.user_id ORDER BY demo1.id) = 1 THEN demo2.job END AS demo2_job,
       CASE WHEN ROW_NUMBER() OVER (PARTITION BY demo1.user_id ORDER BY demo1.id) = 1 THEN demo2.country END AS demo2_country
FROM demo1
LEFT JOIN demo2
       ON demo1.user_id = demo2.user_id
      AND demo2.user_id = 'X'
WHERE demo1.user_id = 'admin'
That's just a quick rewrite of your original SQL with the additional CASE expressions included.
That being said, this SQL will produce no values for demo2, since demo2.user_id can't satisfy both conditions in this query:
1. Equal demo1.user_id (the join condition), which the WHERE predicate fixes to 'admin'.
2. Also hold the value 'X'.
It's either admin and satisfies your first join condition but fails your second, or it's X and satisfies your second condition but not your first. Dropping the AND demo2.user_id = 'X' condition lets the admin row of demo2 match.
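Here is a sketch of the ROW_NUMBER approach with the conflicting 'X' condition left out, run on the question's sample data. It uses Python's built-in sqlite3 module rather than the asker's database (an assumption), needs an SQLite build with window function support (3.25 or later), and computes the row number once in a subquery instead of repeating it per column. Only two of the demo2 columns are shown to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE demo1 (id INTEGER, user_id TEXT, product TEXT,
                        quantity INTEGER, warehouse TEXT);
    CREATE TABLE demo2 (id INTEGER, user_id TEXT, employee TEXT,
                        job TEXT, country TEXT);
    INSERT INTO demo1 VALUES (1, 'admin', 'phone', 3, 'A'),
                             (2, 'admin', 'desk',  1, 'D'),
                             (3, 'k45',   'chair', 5, 'B');
    INSERT INTO demo2 VALUES (1, 'admin', 'james', 'tech', 'usa'),
                             (2, 'c39',   'cindy', 'tech', 'spain');
""")

rows = conn.execute("""
    SELECT d1.id,
           d1.product,
           -- Show demo2 values only on the first demo1 row of each user.
           CASE WHEN d1.rn = 1 THEN d2.id END       AS demo2_id,
           CASE WHEN d1.rn = 1 THEN d2.employee END AS demo2_employee
    FROM (
        SELECT demo1.*,
               ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY id) AS rn
        FROM demo1
        WHERE user_id = 'admin'
    ) AS d1
    LEFT JOIN demo2 AS d2 ON d1.user_id = d2.user_id
    ORDER BY d1.id
""").fetchall()

print(rows)  # [(1, 'phone', 1, 'james'), (2, 'desk', None, None)]
```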

Here is another nice approach:
sqlfiddle

Does String Value Exists in a List of Strings | Redshift Query

I have some interesting data I'm trying to query, but I cannot get the syntax correct. I have a temporary table (temp_id), which I've filled with the id values I care about. In this example it is only two ids.
CREATE TEMPORARY TABLE temp_id (id bigint PRIMARY KEY);
INSERT INTO temp_id (id) VALUES ( 1 ), ( 2 );
I have another table in production (let's call it foo) which holds multiple ids in a single cell. The ids column looks like this (below), with the ids stored as a single string separated by "|":
ids
-----------
1|9|3|4|5
6|5|6|9|7
NULL
2|5|6|9|7
9|11|12|99
I want to evaluate each cell in foo.ids and see if any of the ids in it match the ones in my temp_id table.
Expected output
ids |does_match
-----------------------
1|9|3|4|5 |true
6|5|6|9|7 |false
NULL |false
2|5|6|9|7 |true
9|11|12|99 |false
So far I've come up with this, but I can't seem to return anything. Instead of trying to create a new column does_match I tried to filter within the WHERE statement. However, the issue is I cannot figure out how to evaluate all the id values in my temp table to the string blob full of the ids in foo.
SELECT
ids,
FROM foo
WHERE ids = ANY(SELECT LISTAGG(id, ' | ') FROM temp_ids)
Any suggestions would be helpful.
Cheers,

This would work, although I'm not sure about performance:
SELECT
    ids
FROM foo
JOIN temp_id
  ON '|'||foo.ids||'|' LIKE '%|'||temp_id.id::varchar||'|%'
You wrap the ids list in a pair of additional separators, so you can always search for |id|, including for the first and the last number.
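The separator-wrapping trick can be tried on the sample data with Python's built-in sqlite3 module standing in for Redshift (an assumption). To produce the does_match column for every row rather than only the matching rows, this sketch moves the LIKE test into an EXISTS subquery, which is a variation on the join above rather than the same query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temp_id (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE foo (ids TEXT)")
conn.executemany("INSERT INTO temp_id VALUES (?)", [(1,), (2,)])
conn.executemany(
    "INSERT INTO foo VALUES (?)",
    [("1|9|3|4|5",), ("6|5|6|9|7",), (None,), ("2|5|6|9|7",), ("9|11|12|99",)],
)

# Wrapping both sides in '|' means '|1|' cannot match inside '|11|'.
# A NULL ids value makes the LIKE result NULL, so EXISTS finds no row.
matches = conn.execute("""
    SELECT f.ids,
           EXISTS (
               SELECT 1
               FROM temp_id t
               WHERE '|' || f.ids || '|' LIKE '%|' || t.id || '|%'
           ) AS does_match
    FROM foo f
    ORDER BY f.rowid
""").fetchall()

for ids, does_match in matches:
    print(ids, bool(does_match))
```

SQLite returns the EXISTS result as 1 or 0, hence the bool() conversion when printing.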

The following SQL (I know it's a bit of a hack) returns exactly what you expect as output. I tested it with your sample data, but I don't know how it would behave on your real data, so try it and let me know.
with seq as ( -- a sequence CTE standing in for Postgres' unnest
    select 1 as i union all -- assumes at most 10 ids per ids field;
                            -- feel free to extend this part
    select 2 union all
    select 3 union all
    select 4 union all
    select 5 union all
    select 6 union all
    select 7 union all
    select 8 union all
    select 9 union all
    select 10)
select distinct ids,
       case -- since I can't do a max on a boolean field, I used two cases
            -- for 1s and 0s and converted them to boolean
           when max(case
                        when t.id in (
                            select split_part(ids, '|', seq.i) as tt
                            from seq
                            -- the pipe is escaped so it is counted literally, not read as regex alternation
                            join foo f on seq.i <= REGEXP_COUNT(ids, '\\|') + 1
                            where tt != '' and k.ids = f.ids)
                        then 1
                        else 0
                    end) = 1
           then true
           else false
       end as does_match
from temp_id t, foo k -- alias k is referenced inside the subquery
group by 1
Please let me know if this works for you!

How to find the last descendant (that matches other criteria) in a linear “ancestor-descendant” relationship

This question is based on the following question, but with an additional requirement: PostgreSQL: How to find the last descendant in a linear "ancestor-descendant" relationship
Basically, what I need is a PostgreSQL statement that finds the last descendant in a linear “ancestor-descendant” relationship that matches additional criteria.
Example:
Here the content of table "RELATIONSHIP_TABLE":
id | id_ancestor | id_entry | bool_flag
---------------------------------------
1 | null | a | false
2 | 1 | a | false
3 | 2 | a | true
4 | 3 | a | false
5 | null | b | true
6 | null | c | false
7 | 6 | c | false
Every record within a particular hierarchy has the same "id_entry"
There are 3 different “ancestor-descendant” relationships in this example:
1. 1 <- 2 <- 3 <- 4
2. 5
3. 6 <- 7
Question PostgreSQL: How to find the last descendant in a linear "ancestor-descendant" relationship shows how to find the last record of each relationship. In the example above:
1. 4
2. 5
3. 7
So, what I need this time is the last descendant by "id_entry" whose "bool_flag" is set to true. In the example above:
1. 3
2. 5
3. <empty result>
Does anyone know a solution?
Thanks in advance :)
QStormDS

Graphs, trees, chains, etc represented as edge lists are usually good uses for recursive common table expressions - i.e. WITH RECURSIVE queries.
Something like:
WITH RECURSIVE walk(id, id_ancestor, id_entry, bool_flag, id_root, generation) AS (
SELECT id, id_ancestor, id_entry, bool_flag, id, 0
FROM RELATIONSHIP_TABLE
WHERE id_ancestor IS NULL
UNION ALL
SELECT x.id, x.id_ancestor, x.id_entry, x.bool_flag, walk.id_root, walk.generation + 1
FROM RELATIONSHIP_TABLE x INNER JOIN walk ON x.id_ancestor = walk.id
)
SELECT
id_entry, id_root, id
FROM (
SELECT
id, id_entry, bool_flag, id_root, generation,
max(CASE WHEN bool_flag THEN generation END ) OVER w as max_enabled_generation
FROM walk
WINDOW w AS (PARTITION BY id_root ORDER BY generation ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
) x
WHERE generation = max_enabled_generation;
... though it feels like there really should be a better way to do this than tracking how many generations we've walked down each path.
If id_entry is common for all members of a tree, you can avoid needing to track id_root. You should create a UNIQUE constraint on (id_entry, id) and a foreign key constraint on FOREIGN KEY (id_entry, id_ancestor) REFERENCES (id_entry, id) to make sure that the ordering is consistent, then use:
WITH RECURSIVE walk(id, id_ancestor, id_entry, bool_flag, generation) AS (
SELECT id, id_ancestor, id_entry, bool_flag, 0
FROM RELATIONSHIP_TABLE
WHERE id_ancestor IS NULL
UNION ALL
SELECT x.id, x.id_ancestor, x.id_entry, x.bool_flag, walk.generation + 1
FROM RELATIONSHIP_TABLE x INNER JOIN walk ON x.id_ancestor = walk.id
)
SELECT
id_entry, id
FROM (
SELECT
id, id_entry, bool_flag, generation,
max(CASE WHEN bool_flag THEN generation END ) OVER w as max_enabled_generation
FROM walk
WINDOW w AS (PARTITION BY id_entry ORDER BY generation ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
) x
WHERE generation = max_enabled_generation;
Since this gives you a table of final descendants matched up with root parents, you can just filter with a regular WHERE clause now; just append AND bool_flag. If you instead want to exclude chains that have bool_flag set to false at any point along the way, you can add WHERE bool_flag in the RECURSIVE query's join.
SQLFiddle example: http://sqlfiddle.com/#!12/92a64/3
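For a quick local check of the generation-tracking idea, here is a sketch using Python's built-in sqlite3 module in place of PostgreSQL (an assumption; it needs SQLite 3.25 or later for window functions). It is simplified relative to the queries above: the window covers the whole id_entry partition instead of a CURRENT ROW ... UNBOUNDED FOLLOWING frame, and booleans are stored as 0/1 because SQLite has no boolean type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE relationship_table
                (id INTEGER, id_ancestor INTEGER, id_entry TEXT, bool_flag INTEGER)""")
conn.executemany(
    "INSERT INTO relationship_table VALUES (?, ?, ?, ?)",
    [(1, None, "a", 0), (2, 1, "a", 0), (3, 2, "a", 1), (4, 3, "a", 0),
     (5, None, "b", 1),
     (6, None, "c", 0), (7, 6, "c", 0)],
)

last_flagged = conn.execute("""
    WITH RECURSIVE walk(id, id_ancestor, id_entry, bool_flag, generation) AS (
        -- Anchor: chain roots are generation 0.
        SELECT id, id_ancestor, id_entry, bool_flag, 0
        FROM relationship_table
        WHERE id_ancestor IS NULL
        UNION ALL
        -- Step: each descendant is one generation deeper than its ancestor.
        SELECT x.id, x.id_ancestor, x.id_entry, x.bool_flag, walk.generation + 1
        FROM relationship_table x
        JOIN walk ON x.id_ancestor = walk.id
    )
    SELECT id_entry, id
    FROM (
        SELECT id, id_entry, generation,
               -- Deepest flagged generation in the chain; NULL if none is flagged.
               MAX(CASE WHEN bool_flag THEN generation END)
                   OVER (PARTITION BY id_entry) AS max_flagged_generation
        FROM walk
    ) AS w
    WHERE generation = max_flagged_generation
    ORDER BY id
""").fetchall()

print(last_flagged)  # [('a', 3), ('b', 5)]
```

Chain c has no flagged row, so its max_flagged_generation is NULL and it drops out, matching the empty result the question expects.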

WITH RECURSIVE tail AS (
SELECT id AS opa
, id, bool_flag FROM boolshit
WHERE bool_flag = True
UNION ALL
SELECT t.opa AS opa
, b.id, b.bool_flag FROM boolshit b
JOIN tail t ON b.id_ancestor = t.id
)
SELECT *
FROM boolshit bs
WHERE bs.bool_flag = True
AND NOT EXISTS (
SELECT * FROM tail t
WHERE t.opa = bs.id
AND t.id <> bs.id
AND t.bool_flag = True
);
Explanation: select all records that have the bool_flag set,
EXCEPT those that have offspring (direct or indirect) that have the bool_flag set, too. This effectively picks the last record of the chain that has the flag set.
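The NOT EXISTS approach can be sketched the same way. This version uses Python's built-in sqlite3 module instead of PostgreSQL (an assumption), the question's RELATIONSHIP_TABLE name rather than the answer's table name, 0/1 in place of booleans, and root_id as a hypothetical name for the column the answer calls opa.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE relationship_table
                (id INTEGER, id_ancestor INTEGER, id_entry TEXT, bool_flag INTEGER)""")
conn.executemany(
    "INSERT INTO relationship_table VALUES (?, ?, ?, ?)",
    [(1, None, "a", 0), (2, 1, "a", 0), (3, 2, "a", 1), (4, 3, "a", 0),
     (5, None, "b", 1),
     (6, None, "c", 0), (7, 6, "c", 0)],
)

last_with_flag = conn.execute("""
    WITH RECURSIVE tail(root_id, id, bool_flag) AS (
        -- Anchor: start a walk from every flagged record.
        SELECT id, id, bool_flag
        FROM relationship_table
        WHERE bool_flag = 1
        UNION ALL
        -- Step: follow the chain down to all descendants of that record.
        SELECT t.root_id, r.id, r.bool_flag
        FROM relationship_table r
        JOIN tail t ON r.id_ancestor = t.id
    )
    SELECT r.id_entry, r.id
    FROM relationship_table r
    WHERE r.bool_flag = 1
      AND NOT EXISTS (          -- no flagged descendant further down
          SELECT 1 FROM tail t
          WHERE t.root_id = r.id
            AND t.id <> r.id
            AND t.bool_flag = 1
      )
    ORDER BY r.id
""").fetchall()

print(last_with_flag)  # [('a', 3), ('b', 5)]
```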
