How to use 'Distinct' for just one column? - postgresql

I have a query checking the visits from some "locations" table I have. If the user signed up with a referral of "emp" or "oth", their first visit shouldn't count but the second visit and forward should count.
I'm trying to get a count of those "first visits" per location. Whenever they do a visit, I get a record on which location it was.
The problem is that my query counts correctly within a single location, but some users have visits at different locations. So instead of counting one visit at the location of the first visit, it adds one for every location where the user has made a visit.
This is my query
SELECT COUNT(DISTINCT CASE WHEN customer.ref IN ('emp', 'oth') THEN customer.id END) as visit_count, locations.name as location FROM locations
LEFT JOIN visits ON locations.location_name = visits.location_visit_name
LEFT JOIN customer ON customer.id = visits.customer_id
WHERE locations.active = true
GROUP BY locations.location_name, locations.id;
The results I'm getting are
visit_count | locations
-------------------------
7 | Loc 1
3 | Loc 2
1 | Loc 3
How it should be:
visit_count | locations
-------------------------
6 | Loc 1
2 | Loc 2
1 | Loc 3
Because 2 of these people have visits at both locations, it's counting one for each location. I think the DISTINCT is also being applied per location, when it should apply only to the count of customer.id.
Is there a way I can add something to my query to just grab the location for the first visit, without caring they have done other visits on other locations?

If I followed you correctly, you want to count only the first visit of each customer, broken down by location.
One solution would be to use a correlated subquery in the on clause of the relevant join to filter on first customer visits. Assuming that column visits(visit_date) stores the date of each visit, you could do:
select
    count(c.id) visit_count,
    l.name as location
from locations l
left join visits v
    on l.location_name = v.location_visit_name
    and v.visit_date = (
        select min(v1.visit_date)
        from visits v1
        where v1.customer_id = v.customer_id
    )
left join customer c
    on c.id = v.customer_id
    and c.ref in ('emp', 'oth')
where l.active = true
group by l.location_name, l.id;
Side notes:
properly filtering on the first visit per customer avoids the need for distinct in the count() aggregate function
table aliases make the query more concise and easier to understand; I recommend using them in all queries
the filter on customer(ref) is better placed in the on clause of the join than as a conditional count criterion; keeping it out of the where clause preserves locations that have no matching customers
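To see the first-visit filter in action, here is a minimal sketch that runs a query of the same shape against an in-memory SQLite database through Python's sqlite3 module. The sample rows, the visit_date column, and the use of SQLite instead of PostgreSQL are all assumptions made for illustration; it selects l.location_name to keep the toy schema small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE locations (id INTEGER PRIMARY KEY, location_name TEXT, active INTEGER);
CREATE TABLE customer (id INTEGER PRIMARY KEY, ref TEXT);
CREATE TABLE visits (customer_id INTEGER, location_visit_name TEXT, visit_date TEXT);
INSERT INTO locations VALUES (1, 'Loc 1', 1), (2, 'Loc 2', 1), (3, 'Loc 3', 1);
INSERT INTO customer VALUES (1, 'emp'), (2, 'oth'), (3, 'web');
-- customer 1 visits Loc 1 first, then Loc 2 (hypothetical sample data)
INSERT INTO visits VALUES
    (1, 'Loc 1', '2020-01-01'),
    (1, 'Loc 2', '2020-01-05'),
    (2, 'Loc 2', '2020-01-02'),
    (3, 'Loc 1', '2020-01-03');
""")

query = """
SELECT l.location_name, COUNT(c.id) AS visit_count
FROM locations l
LEFT JOIN visits v
    ON l.location_name = v.location_visit_name
    -- keep only each customer's earliest visit
    AND v.visit_date = (
        SELECT MIN(v1.visit_date)
        FROM visits v1
        WHERE v1.customer_id = v.customer_id
    )
LEFT JOIN customer c
    ON c.id = v.customer_id
    AND c.ref IN ('emp', 'oth')
WHERE l.active = 1
GROUP BY l.id, l.location_name
ORDER BY l.id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Customer 1 is counted only at Loc 1, and Loc 3 still appears with a count of 0 because both filters live in the join conditions rather than in the where clause.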

Try moving the CASE condition into the WHERE clause:
SELECT COUNT( distinct customer.id) as visit_count
, locations.name as location
FROM locations
LEFT JOIN visits ON locations.location_name = visits.location_visit_name
LEFT JOIN customer ON customer.id = visits.customer_id
WHERE locations.active = true
AND customer.ref IN ('emp', 'oth')
GROUP BY locations.location_name;

Related

Query users on filter applied to a one-to-many relationship table postgresql

We currently have a users table with a one-to-many relationship on a table called steps. Each user can have either four steps or seven steps. The steps table schema is as follows:
id | user_id | order | status
-----------------------------
# | # |1-7/1-4| 0 or 1
I am trying to query all of the users who have a status of 1 on all of their steps. So if they have either 4 or 7 steps, they must all have a status of 1.
I tried a join with a check on step 4 (since a step cannot be complete without the previous one being complete as well) but this has issues if someone with 7 steps completed step 4 but not 7.
select u.first_name, u.last_name, u.email, date(s.updated_at) as completed_date
from users u
join steps s on u.id = s.user_id
where s.order = 4 and s.status = 1;
The bool_and aggregate function should help you to identify the users with all their steps at status = 1 whatever the number of steps.
Then the array_agg aggregate function can help to find the updated_at date associated with the last step of each user: order the dates by "order" DESC and select the first element of the resulting array with [1]:
SELECT u.first_name, u.last_name, u.email
, s.completed_date
FROM users u
INNER JOIN
( SELECT user_id
, (array_agg(updated_at ORDER BY "order" DESC))[1] :: date as completed_date -- "order" is a reserved word and must be quoted
FROM steps
GROUP BY user_id
HAVING bool_and(status :: boolean) -- filter the users with all their steps status = 1
) AS s
ON u.id = s.user_id

Union which excludes values from the first table

The original problem I am attempting to solve is that I need to show all rows from a specific "joined" table. However, these are sometimes blank with no totals and normally would not show (think categories and counts for each).
So what I am attempting to do is union to a "0 value" data set to show all categories. However when I do the union it shows a 0 value row, as well as the normal data. Here is an example..
SELECT category_name, COUNT(files_number)
FROM files
LEFT JOIN categories ON categories.category_id = files.category_id
UNION
SELECT category_name, 0
FROM categories
This will give me a result set that looks similar to this:
category_name | value
----------------------
open file | 0
open file | 23
closed file | 0
Is there any way to remove duplicate zero value entries? Please note there is also a complex WHERE clause in the actual query, so avoiding duplication of it is preferred.
I don't get why you are doing both a left join and a union.
You can remove the duplicates by wrapping your query in a CTE and grouping by category:
;with cte
as
(
SELECT category_name, COUNT(files_number) AS aggcol
FROM files
LEFT JOIN categories ON categories.category_id = files.category_id
GROUP BY category_name
UNION
SELECT category_name, 0
FROM categories
)
select category_name, sum(aggcol) as value
from cte
group by category_name
One way is to select all categories from the categories table, and LEFT JOIN onto the file counts (grouped by category_id).
SELECT c.category_name, ISNULL(fc.FileCount, 0) AS FileCount
FROM categories c
LEFT JOIN (
SELECT category_id, COUNT(files_number) AS FileCount
FROM files
GROUP BY category_id
) fc ON c.category_id = fc.category_id
Edit
If you want to reverse the query, you could do it something like this, using a RIGHT OUTER JOIN - so every category from categories table is returned, regardless of if there are any files for it:
SELECT c.category_name, COUNT(f.category_id) AS FileCount
FROM files f
RIGHT JOIN categories c ON c.category_id = f.category_id
GROUP BY c.category_name
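As a quick check of the "start from categories and LEFT JOIN the counts" idea, here is a small sketch using Python's sqlite3 module. The sample rows are made up, SQLite stands in for whatever database the question uses, and COALESCE is used in place of ISNULL because it is the portable spelling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (category_id INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE files (files_number INTEGER, category_id INTEGER);
INSERT INTO categories VALUES (1, 'open file'), (2, 'closed file');
-- three files, all in the first category (hypothetical sample data)
INSERT INTO files VALUES (101, 1), (102, 1), (103, 1);
""")

query = """
SELECT c.category_name, COALESCE(fc.FileCount, 0) AS FileCount
FROM categories c
LEFT JOIN (
    SELECT category_id, COUNT(files_number) AS FileCount
    FROM files
    GROUP BY category_id
) fc ON c.category_id = fc.category_id
ORDER BY c.category_id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Each category appears exactly once, with 0 for categories that have no files, so no UNION with a zero-value data set is needed.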

Finding exact matches to a requested set of values

Hi I'm facing a challenge. There is a table progress.
User_id | Assesment_id
-----------------------
1 | Test_1
2 | Test_1
3 | Test_1
1 | Test_2
2 | Test_2
1 | Test_3
3 | Test_3
I need to pull out the user_id who have completed only Test_1 & test_2 (i.e User_id:2). The input parameters would be the list of Assesment id.
Edit:
I want those who have completed all the assessments on the list, but no others.
User 3 did not complete Test_2, and so is excluded.
User 1 completed an extra test, and is also excluded.
Only User 2 has completed exactly those assessments requested.
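You don't need a complicated join or even subqueries to find the users who completed both tests. Simply use the INTERSECT operator (note that this also returns users who completed additional tests, such as user 1):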
select user_id from progress where assessment_id = 'Test_1'
intersect
select user_id from progress where assessment_id = 'Test_2'
I interpreted your question to mean that you want users who have completed all of the tests in your assessment list, but not any other tests. I'll use a technique called common table expressions so that you can follow step by step, but it is all one query statement.
Let's say you supply your assessment list as rows in a table called Checktests. We can count those values to find out how many tests are needed.
If we use a LEFT OUTER JOIN, then values from the right-side table will be null wherever there is no match. So the test_matched column will be null if an assessment is not on your list. COUNT() ignores null values, so we can use this to find out how many of the tests a user took were on the list, and then compare this to the number of all tests the user took.
with x as
(select count(assessment_id) as tests_needed
from checktests
),
dtl as
(select p.user_id,
p.assessment_id as test_taken,
c.assessment_id as test_matched
from progress p
left join checktests c on p.assessment_id = c.assessment_id
),
y as
(select user_id,
count(test_taken) as all_tests,
count(test_matched) as wanted_tests -- count() ignores nulls
from dtl
group by user_id
)
select user_id
from y
join x on y.wanted_tests = x.tests_needed
where y.wanted_tests = y.all_tests ;
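The CTE query above can be tried against the sample data from the question. This sketch uses Python's sqlite3 module with an in-memory database; the checktests table holding the requested list is the assumption introduced in the answer, and SQLite is only a stand-in for the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE progress (user_id INTEGER, assessment_id TEXT);
CREATE TABLE checktests (assessment_id TEXT);
INSERT INTO progress VALUES
    (1, 'Test_1'), (2, 'Test_1'), (3, 'Test_1'),
    (1, 'Test_2'), (2, 'Test_2'),
    (1, 'Test_3'), (3, 'Test_3');
-- the requested list of assessments
INSERT INTO checktests VALUES ('Test_1'), ('Test_2');
""")

query = """
WITH x AS (
    SELECT COUNT(assessment_id) AS tests_needed FROM checktests
),
dtl AS (
    SELECT p.user_id,
           p.assessment_id AS test_taken,
           c.assessment_id AS test_matched
    FROM progress p
    LEFT JOIN checktests c ON p.assessment_id = c.assessment_id
),
y AS (
    SELECT user_id,
           COUNT(test_taken) AS all_tests,
           COUNT(test_matched) AS wanted_tests  -- COUNT() ignores nulls
    FROM dtl
    GROUP BY user_id
)
SELECT user_id
FROM y
JOIN x ON y.wanted_tests = x.tests_needed
WHERE y.wanted_tests = y.all_tests
"""
user_ids = [row[0] for row in conn.execute(query)]
print(user_ids)
```

User 1 is dropped because all_tests (3) exceeds wanted_tests (2), and user 3 is dropped because wanted_tests (1) falls short of tests_needed (2).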

SSRS 2005 column chart: show series label missing when data count is zero

I have a pretty simple chart with a likely common issue. I've searched for several hours on the interweb but only get so far in finding a similar situation.
The basics of what I'm pulling are created_by, person_id and a risk score.
the risk score can be:
1 VERY LOW
2 LOW
3 MODERATE STABLE
4 MODERATE AT RISK
5 HIGH
6 VERY HIGH
I want to get a headcount of persons at each risk score and display a risk count even if there is a count of 0 for that risk score but SSRS 2005 likes to suppress zero counts.
I've tried this in the point labels
=IIF(IsNothing(count(Fields!person_id.value)),0,count(Fields!person_id.value))
Ex: I'm missing values for "1 LOW" as the creator does not have any "1 LOW" they've assigned risk scores for.
*here's a screenshot of what I get but I'd like to have a column even for a count when it still doesn't exist in the returned results.
@Nathan
Example scenario:
select professor.name, grades.score, student.person_id
from student
inner join grades on student.person_id = grades.person_id
inner join professor on student.professor_id = professor.professor_id
where
student.professor_id = @professor
Not all students are necessarily in the grades table.
I have a =Count(Fields!person_id.Value) for my data points & series is grouped on =Fields!score.Value
If there were a bunch of A,B,D grades but no C & F's how would I show labels for potentially non-existent counts
In your example, the problem is that no results are returned for grades that are not linked to any students. To solve this ideally there would be a table in your source system which listed all the possible values of "score" (e.g. A - F) and you would join this into your query such that at least one row was returned for each possible value.
If such a table doesn't exist and the possible score values are known and static, then you could manually create a list of them in your query. In the example below I create a subquery that returns a combination of all professors and all possible scores (A - F) and then LEFT join this to the grades and students tables (left join means that the professor/score rows will be returned even if no students have those scores in the "grades" table).
SELECT
professor.name
, professorgrades.score
, student.person_id
FROM
(
SELECT professor_id, score
FROM professor
CROSS JOIN
(
SELECT 'A' AS score
UNION
SELECT 'B'
UNION
SELECT 'C'
UNION
SELECT 'D'
UNION
SELECT 'E'
UNION
SELECT 'F'
) availablegrades
) professorgrades
INNER JOIN professor ON professorgrades.professor_id = professor.professor_id
LEFT JOIN grades ON professorgrades.score = grades.score
LEFT JOIN student ON grades.person_id = student.person_id AND
professorgrades.professor_id = student.professor_id
WHERE professorgrades.professor_id = 1
See a live example of how this works here: SQLFIDDLE
SELECT RS.RiskScoreId, RS.Description, SUM(DT.RiskCount) AS RiskCount
FROM (
SELECT RiskScoreId, 1 AS RiskCount
FROM People
UNION ALL
SELECT RiskScoreId, 0 AS RiskCount
FROM RiskScores
) DT
INNER JOIN RiskScores RS ON RS.RiskScoreId = DT.RiskScoreId
GROUP BY RS.RiskScoreId, RS.Description
ORDER BY RS.RiskScoreId
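The UNION ALL query above pads every risk score with a zero row so that SUM() still yields a value for scores nobody has. A minimal sketch of that behavior, assuming a People table that carries a RiskScoreId column and a RiskScores lookup table as in the query (the sample rows are made up, and SQLite via Python's sqlite3 stands in for SQL Server):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE RiskScores (RiskScoreId INTEGER PRIMARY KEY, Description TEXT);
CREATE TABLE People (person_id INTEGER PRIMARY KEY, RiskScoreId INTEGER);
INSERT INTO RiskScores VALUES (1, 'VERY LOW'), (2, 'LOW'), (3, 'MODERATE STABLE');
-- nobody has risk score 2 (hypothetical sample data)
INSERT INTO People VALUES (10, 1), (11, 1), (12, 3);
""")

query = """
SELECT RS.RiskScoreId, RS.Description, SUM(DT.RiskCount) AS RiskCount
FROM (
    SELECT RiskScoreId, 1 AS RiskCount FROM People      -- one per person
    UNION ALL
    SELECT RiskScoreId, 0 AS RiskCount FROM RiskScores  -- zero padding row per score
) DT
INNER JOIN RiskScores RS ON RS.RiskScoreId = DT.RiskScoreId
GROUP BY RS.RiskScoreId, RS.Description
ORDER BY RS.RiskScoreId
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Because the dataset now contains a row for every score, the chart has a group to draw even when the count is 0.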

postgresql query with join and limit

I have a table Gift. Gifts can have many gift_images via a table association. I am trying to return a limited number of gifts with a certain privacy level that have at least one gift_images association.
In essence, I want to return the: gift entry with its FIRST associated gift_image (gift_image should be sorted by a position value it has, with position 1 being the FIRST). Gifts without a gift_image associated should be ignored.
This is what I have, but it's definitely not working.
SELECT gifts.* FROM gifts LEFT JOIN gift_images ON gifts.id = gift_images.gift_id WHERE gifts.privacy = 2 ORDER BY gift_images.position ASC LIMIT 10
Any help?
If you want to ignore gifts without images, you should use an INNER JOIN instead of the LEFT JOIN. In addition, for the query to be meaningful, you should select some field from gift_images in addition to fields from gift.
If all gifts that have gift images have an image for which position = 1, this query should do:
SELECT gifts.*, gift_images.*
FROM gifts
INNER JOIN gift_images
ON gifts.id = gift_images.gift_id
WHERE gifts.privacy = 2
AND gift_images.position = 1
LIMIT 10
Otherwise, you could join against the lowest position per gift:
SELECT gifts.*, gift_images.*
FROM gifts
INNER JOIN (SELECT gift_id, MIN(position) AS min_position
            FROM gift_images
            GROUP BY gift_id) AS positions
ON positions.gift_id = gifts.id
INNER JOIN gift_images
ON gift_images.gift_id = gifts.id
AND gift_images.position = positions.min_position
WHERE gifts.privacy = 2
LIMIT 10
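Here is a small sketch of the "lowest position per gift" join, run through Python's sqlite3 module. The url column and the sample rows are hypothetical, SQLite stands in for PostgreSQL, and the privacy filter and limit from the question are included.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gifts (id INTEGER PRIMARY KEY, privacy INTEGER);
CREATE TABLE gift_images (gift_id INTEGER, position INTEGER, url TEXT);
-- gift 2 has no images; gift 3 has the wrong privacy level
INSERT INTO gifts VALUES (1, 2), (2, 2), (3, 1);
-- gift 1 has no image at position 1, so MIN(position) is needed
INSERT INTO gift_images VALUES (1, 3, 'a3.png'), (1, 2, 'a2.png'), (3, 1, 'c1.png');
""")

query = """
SELECT gifts.id, gift_images.position, gift_images.url
FROM gifts
INNER JOIN (SELECT gift_id, MIN(position) AS min_position
            FROM gift_images
            GROUP BY gift_id) AS positions
    ON positions.gift_id = gifts.id
INNER JOIN gift_images
    ON gift_images.gift_id = gifts.id
    AND gift_images.position = positions.min_position
WHERE gifts.privacy = 2
ORDER BY gifts.id
LIMIT 10
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Only gift 1 comes back, paired with its lowest-positioned image; the inner joins drop the gift without images, and the WHERE clause drops the gift with a different privacy level.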