Postgres distinct query in two columns

Postgres distinct query in two columns - postgresql

I want to write a postgres query. For every distinct combination of (career-id and uid) I should return the entire row which has max time.
This is the sample data
id time career_id uid content
1 100 10000 5 Abc
2 300 6 7 xyz
3 200 10000 5 wxv
4 150 6 7 hgr
Ans:
id time career_id uid content
2 300 6 7 xyz
3 200 10000 5 wxv

this can be done using distinct on () in Postgres
select distinct on (career_id, uid) *
from the_table
order by career_id, uid, "time" desc;

You can use CTE's for this. Something like this should work:
WITH cte_max_value AS (
SELECT
career_id,
uid,
max("time") as max_time
FROM mytable
GROUP BY career_id, uid
)
SELECT DISTINCT t.*
FROM mytable AS t
INNER JOIN cte_max_value AS cmv
ON t.uid = cmv.uid AND t.career_id = cmv.career_id AND t.time = cmv.max_time
The CTE gives you all the unique combinations of career_id and uid with the relevant maximum time, the inner join then joins the entire rows into this. I'm using if you get two rows with the same maximum time for the same combination of career_id and uid you will get two rows returned.
If you don't want that you will need to find a strategy to resolve this.
Edit: Also the proposed solution by a_hrose_with_name's solution is far nicer and unless you need some level of compatibility with other servers (sadly syntax varies) you should use that instead.

Related

I want to select 2 data from database which durations less than 150

I have a problem with my SQL command. I want to select 2 movies which 2 movies sum of durations less than 150 I wrote this SQL command:
Select
movie_title,Sum(movie_time) as sum_movie
From
movie_movie
Group By
movie_title
Having
Sum(movie_time)<100
Order By
sum_movie DESC

You can get two movies with minimum movie_time values with order by movie_time ASC limit 2 in CTE, and then use that in the condition.
with two_min_movie as (
select *
from movie_movie
order by movie_time ASC limit 2
)
select *
from two_min_movie
where (select sum(movie_time) from two_min_movie) < 150
Demo in DBfiddle

Postgres : Need distinct records count

I have a table with duplicate entries and the objective is to get the distinct entries based on the latest time stamp.
In my case 'serial_no' will have duplicate entries but I select unique entries based on the latest time stamp.
Below query is giving me the unique results with the latest time stamp.
But my concern is I need to get the total of unique entries.
For example assume my table has 40 entries overall. With the below query I am able to get 20 unique rows based on the serial number.
But the 'total' is returned as 40 instead of 20.
Any help on this pls?
SELECT
*
FROM
(
SELECT
DISTINCT ON (serial_no) id,
serial_no,
name,
timestamp,
COUNT(*) OVER() as total
FROM
product_info
INNER JOIN my.account ON id = accountid
WHERE
lower(name) = 'hello'
ORDER BY
serial_no,
timestamp DESC OFFSET 0
LIMIT
10
) AS my_info
ORDER BY
serial_no asc
product_info table intially has this data
serial_no name timestamp
11212 pulp12 2018-06-01 20:00:01
11213 mango 2018-06-01 17:00:01
11214 grapes 2018-06-02 04:00:01
11215 orange 2018-06-02 07:05:30
11212 pulp12 2018-06-03 14:00:01
11213 mango 2018-06-03 13:00:00
After the distict query I got all unique results based on the latest
timestamp:
serial_no name timestamp total
11212 pulp12 2018-06-03 14:00:01 6
11213 mango 2018-06-03 13:00:00 6
11214 grapes 2018-06-02 04:00:01 6
11215 orange 2018-06-02 07:05:30 6
But total is appearing as 6 . I wanted the total to be 4 since it has
only 4 unique entries.
I am not sure how to modify my existing query to get this desired
result.

Postgres supports COUNT(DISTINCT column_name), so if I have understood your request, using that instead of COUNT(*) will work, and you can drop the OVER.

What you could do is move the window function to a higher level select statement. This is because window function is evaluated before distinct on and limit clauses are applied. Also, you can not include DISTINCT keyword within window functions - it has not been implemented yet (as of Postgres 9.6).
SELECT
*,
COUNT(*) OVER() as total -- here
FROM
(
SELECT
DISTINCT ON (serial_no) id,
serial_no,
name,
timestamp
FROM
product_info
INNER JOIN my.account ON id = accountid
WHERE
lower(name) = 'hello'
ORDER BY
serial_no,
timestamp DESC
LIMIT
10
) AS my_info
Additionally, offset is not required there and one more sorting is also superfluous. I've removed these.
Another way would be to include a computed column in the select clause but this would not be as fast as it would require one more scan of the table. This is obviously assuming that your total is strictly connected to your resultset and not what's beyond that being stored in the table, but gets filtered out.

select count(*), serial_no from product_info group by serial_no
will give you the number of duplicates for each serial number
The most mindless way of incorporating that information would be to join in a sub query
SELECT
*
FROM
(
SELECT
DISTINCT ON (serial_no) id,
serial_no,
name,
timestamp,
COUNT(*) OVER() as total
FROM
product_info
INNER JOIN my.account ON id = accountid
WHERE
lower(name) = 'hello'
ORDER BY
serial_no,
timestamp DESC OFFSET 0
LIMIT
10
) AS my_info
join (select count(*) as counts, serial_no from product_info group by serial_no) as X
on X.serial_no = my_info.serial_no
ORDER BY
serial_no asc

Counting Number of Users Whose Average is Greater than X in Postgres

I am trying to find out the number of users who have scored an average of 80 or higher. I am using Having in my query but it is not returning the count of number of rows.
The Schema looks like:
Results
user
test_no
question_no
score
My Query:
SELECT "user" FROM results WHERE (score >0) GROUP BY "user"
HAVING (sum(score) / count(distinct(test_no))) >= 80;
I get:
user
2
4
8
(3 rows)
Instead I would like to get 3 (number of rows) as the output. If I do count("user"), I get the count of number of tests for each user.
I understand this is related to use Group By but I need it for my Having clause. Any suggestions how I can do this is appreciated.
Update: Here is some sample data: http://pastebin.com/k1nH5Wzh (-1 means unanswered)
Thanks!

The query you found is good. Some minor simplifications:
SELECT count(*) AS ct
FROM (
SELECT 1
FROM result
WHERE score > 0
GROUP BY user_id
HAVING (sum(score) / count(DISTINCT test_no)) >= 80
) sub
DISTINCT does not require parentheses.
You can SELECT a constant value in the subquery. The value is irrelevant, since you are only going to count the rows. Slightly shorter and cheaper.
Don't use the reserved word user as column name. That's asking for trouble. I am using user_id instead.

I am not sure if this is an efficient way to do it but this seems to be working.
SELECT COUNT(*) FROM
(SELECT "user" FROM results WHERE (score >0) GROUP BY "user"
HAVING (sum(score) / count(distinct(test_no))) >= 80)) q1;

Restricting duplicate results in grouped result set without using distinct

I am attempting to create a query that returns a list of specific entity records without returning any duplicated entries from the entityID field. The query cannot use DISTINCT because the list is being passed to a reporting engine that doesn't understand result sets containing more than the entityID, and DISTINCT requires all the ORDER BY fields to be returned.
The result set cannot contain duplicate entityIDs because the reporting engine also cannot process a report for the same entity twice in the same run. I have found out the hard way that temporary tables aren't supported as well.
The entries need to be sorted in the query because the report engine only allows sorting on the entity_header level, and I need to sort based on the report.status. Thankfully the report engine honors the order in which you return the results.
The tables are as follows:
entity_header
=================================================
entityID(pk) Location active name
1 LOCATION1 0 name1
2 LOCATION1 0 name2
3 LOCATION2 0 name3
4 LOCATION3 0 name4
5 LOCATION2 1 name5
6 LOCATION2 0 name6
report
========================================================
startdate entityID(fk) status reportID(pk)
03-10-2013 1 running 1
03-12-2013 2 running 2
03-10-2013 1 stopped 3
03-10-2013 3 stopped 4
03-12-2013 4 running 5
03-10-2013 5 stopped 6
03-12-2013 6 running 7
Here is the query I've got so far, and it is almost what I need:
SELECT entity_header.entityID
FROM entity_header eh
INNER JOIN report r on r.entityID = eh.entityID
WHERE r.startdate between getdate()-7.5 and getdate()
AND eh.active = 0
AND eh.location in ('LOCATION1','LOCATION2')
AND r.status is not null
AND eh.name is not null
GROUP BY eh.entityID, r.status, eh.name
ORDER BY r.status, eh.name;
I would appreciate any advice this community can offer. I will do my best to provide any additional information required.

Here is a working sample that runs on ms SQL only.
I am using the rank() to count the number of times entityID appears in the results. Saved as list.
The list will contain an integer value of the number of times the entityID occurs.
Using where a.list = 1, filters the results.
Using ORDER BY a.ut, a.en, sorts the results. The ut and en are used to sort.
SELECT a.entityID FROM (
SELECT distinct TOP (100) PERCENT eh.entityID,
rank() over(PARTITION BY eh.entityID ORDER BY r.status, eh.name) as list,
r.status ut, eh.name en
FROM report AS r INNER JOIN entity_header as eh ON r.entityID = eh.entityID
WHERE (r.startdate BETWEEN GETDATE() - 7.5 AND GETDATE()) AND (eh.active = 0)
AND (eh.location IN ('LOCATION1', 'LOCATION2'))
ORDER BY r.status, eh.name
) AS a
where a.list = 1
ORDER BY a.ut, a.en

Get total count per ID change

how can I get a total count of sheets per change of sheet
example:
select sheetID,
..
from SomeTable
results look something like this:
sheetID
-----------
1000
1000
1000
1000
3000
3000
3000
so I want something like this:
select sheetID,
count(sheetID) as TotalsheetCount
from SomeTable
I just don't know how to break the count up per change of sheetID.
So I'd end up with this essentially:
sheetID TotalsheetCount
-------- -----------
1000 4
1000 4
1000 4
1000 4
3000 3
3000 3
3000 3
so 4 is because there are 4 1000s, 3 because there are 3 3000s. I am wanting to repeat the total count for that sheetID for each row, even though it's repeating, I want to provide that.
UPDATE, here's what I did per the replies but I'm getting way too many results now as compoared to the count where I did not add that partition count before
select MainTable.sheetID,
COUNT(SomeTable.sheetID)OVER(PARTITION BY SomeTable.sheetID) AS TotalSheetCount
table2.SomeField1,
table2.SomeField1
from MainTable
join (select distinct Sales.SalesKey from SomeLongTableName_Sales) sales on sales.SheetKey = MainTable.sheetKey
left outer join Site on MainTable.SiteKey = Site.SiteKey
join Calendar on sales.Date >= Calendar.StartDate
and sales.Date < Calendar.EndDate
group by SomeTable.sheetID
the joins and stuff is more realistic to my real query but formatted for this post to hide real field and table names.

You probably want to use a GROUP BY:
SELECT sheetID, COUNT(sheetID) AS TotalsheetCount
FROM dbo.SomeTable
GROUP BY sheetID
I am wanting to repeat the total count for that sheetID for each row,
even though it's repeating, I want to provide that
If you're using at least SQL-Server 2005, you can use a CTE with COUNT + OVER-clause, otherwise use a sub-query:
WITH CTE AS
(
SELECT sheetID,
COUNT(sheetID)OVER(PARTITION BY sheetID) AS TotalsheetCount
FROM SomeTable
)
SELECT sheetID, TotalsheetCount FROM CTE

Use the GROUP BY clause in a subquery to select the counts:
SELECT sheetID,
count(sheetID) as TotalsheetCount
FROM SomeTable
GROUP BY sheetID
This would make your whole query look like this:
SELECT t.sheetID,
counts.TotalsheetCount
FROM SomeTable t,
(SELECT sheetID, count(sheetID) as TotalsheetCount FROM SomeTable GROUP BY sheetID) counts
WHERE t.sheetID = counts.sheetID

It looks like you need a group-by expression:
select sheetID,
count(*) as TotalsheetCount
from SomeTable
group by sheetID
Is that it?
DC

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse