PostgreSQL: Select distinct on limit 100

There's probably a simple solution to this, but how do I select, say, 100 or 1000 distinct avclassfamily values in the following query?
Ideally there would be a command like 'select distinct ON (1000) (avclassfamily)', but there isn't.
cursor.execute("select count(*),date_trunc( 'year', first_seen ) from (select DISTINCT ON (avclassfamily) * from malwarehashesandstrings) as p where first_seen is not null and behaviouralbinary is true and origindataset != 'MalRec' and origindataset != 'Ember Benign' and origindataset is not null group by date_trunc( 'year', first_seen );")
Please let me know if you have a solution or if there's anything I can clarify.
Edit: As requested, a simpler version of the query:
select DISTINCT (avclassfamily) from malwarehashesandstrings;
What I'm after is a way to make that return only 100 distinct values.
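One way to approach this is to combine DISTINCT with LIMIT, since LIMIT is applied after duplicates are removed. Below is a minimal sketch, not a definitive answer: it uses Python's sqlite3 module with an in-memory table as a stand-in for the real PostgreSQL database, and the sample family names and the value of n are made up for illustration.

```python
import sqlite3

# In-memory stand-in for the real table (sample data is hypothetical)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE malwarehashesandstrings (avclassfamily TEXT)")
families = ["zeus", "emotet", "zeus", "mirai", "emotet", "conficker"]
conn.executemany(
    "INSERT INTO malwarehashesandstrings VALUES (?)",
    [(f,) for f in families],
)

n = 3  # would be 100 or 1000 in the question

# DISTINCT removes duplicates first; LIMIT then keeps n of the remaining values.
# ORDER BY makes the choice of which n values deterministic.
rows = conn.execute(
    "SELECT DISTINCT avclassfamily FROM malwarehashesandstrings "
    "ORDER BY avclassfamily LIMIT ?",
    (n,),
).fetchall()
distinct_families = [r[0] for r in rows]
print(distinct_families)
```

In PostgreSQL the same idea should carry over to the original subquery: adding ORDER BY avclassfamily LIMIT 100 after the DISTINCT ON (avclassfamily) select keeps one row for each of the first 100 families.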

Related

PostgreSQL: aggregate expression on a subquery

A total beginner's question: I wanted to run a sub-query with a GROUP BY clause, and then find the row with the maximum value in the result. I have built an expression like the one below:
SELECT agg.facid, agg.Slots
FROM
(SELECT facid AS facid, SUM(slots) AS Slots FROM cd.bookings
GROUP BY facid
ORDER BY SUM(slots) DESC) AS agg
WHERE agg.Slots = (SELECT MAX(Slots) FROM agg);
In my mind, this should first create a 2-column table with facid and SUM(slots) values, and then by addressing these columns as agg.facid and agg.Slots I should get only the row with max value in "Slots". However, instead I am getting this error:
ERROR: relation "agg" does not exist
LINE 6: WHERE agg.Slots = (SELECT MAX(Slots) FROM agg);
This is probably something very simple, so I am sorry in advance for a silly problem ;)
I am working on PostgreSQL 10, with pgAdmin 4.
Use a Common Table Expression:
WITH agg AS (
SELECT facid AS facid, SUM(slots) AS Slots
FROM cd.bookings
GROUP BY facid
)
SELECT agg.facid, agg.Slots
FROM agg
WHERE agg.Slots = (SELECT MAX(Slots) FROM agg);
After a bit more research, I found a solution that is clean enough for my liking, using a Common Table Expression:
WITH sum AS (SELECT facid, SUM(slots) AS Slots FROM cd.bookings GROUP BY facid)
SELECT facid, Slots
FROM sum
WHERE Slots = (SELECT MAX(Slots) FROM sum);
The first line declares a CTE, which can then be referenced both by the main query and by the sub-query that calculates the maximum of the aggregated Slots column.
Hope it helps anyone interested.
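For anyone who wants to try the CTE approach without a PostgreSQL server, here is a small sketch using Python's sqlite3 module. The table is named bookings rather than cd.bookings because SQLite has no schemas here, and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bookings (facid INTEGER, slots INTEGER)")
# Hypothetical sample data: facility 0 totals 5, facility 1 totals 8, facility 2 totals 1
conn.executemany(
    "INSERT INTO bookings VALUES (?, ?)",
    [(0, 2), (0, 3), (1, 4), (1, 4), (2, 1)],
)

query = """
WITH agg AS (
    SELECT facid, SUM(slots) AS total_slots
    FROM bookings
    GROUP BY facid
)
SELECT facid, total_slots
FROM agg
WHERE total_slots = (SELECT MAX(total_slots) FROM agg)
"""
# The CTE can be referenced twice: once in FROM, once in the scalar sub-query
top = conn.execute(query).fetchall()
print(top)  # [(1, 8)]
```

Unlike a derived table in FROM, the CTE name is visible to the nested sub-query, which is why the original "relation does not exist" error goes away.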
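Does this do what you are looking for? Note that the HAVING clause has to compare each group's sum against the largest sum over all groups, so the maximum comes from a sub-query:
SELECT
facid,
SUM(slots)
FROM cd.bookings
GROUP BY
facid
HAVING SUM(slots) = (
SELECT MAX(total)
FROM (SELECT SUM(slots) AS total FROM cd.bookings GROUP BY facid) AS totals
)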

How to divide counts on a single table?

This is Postgres 8.x, specifically Redshift
I have a table that I'm querying to return a single value, which is the result of a simple division. The table's grain is roughly user_id | campaign_title.
The division is the count of rows where campaign_title is ilike %completed%, divided by the count of distinct user_ids.
I have the numerator and denominator queries written out, but I'm honestly confused about how to combine them.
Numerator:
select count(*) as num_completed
from public.reward
where campaign_title ilike '%completion%'
;
Denominator:
select count(distinct(user_id))
from public.reward
The straightforward solution, just divide one by the other:
select (select count(*) as num_completed
from public.reward
where campaign_title ilike '%completion%')
/
(select count(distinct user_id) from public.reward);
The slightly more complicated but faster solution:
select count(case when campaign_title ilike '%completion%' then 1 end)
/
count(distinct user_id)
from public.reward;
The expression count(case when campaign_title ilike '%completion%' then 1 end) will only count rows that meet the condition specified in the when clause.
Unrelated but:
distinct is not a function. Writing distinct(user_id) is useless. And - in the case of Postgres - it can actually get you into trouble if you keep thinking of distinct as a function, because the expression (column_one, column_2) is something different in Postgres than the list of columns: column_one, column_2
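One thing to watch with either version: both counts are integers, and in Postgres dividing two integers truncates the result, so a ratio below 1 comes back as 0 unless one side is cast. The sketch below illustrates this with Python's sqlite3 module as a stand-in, which truncates integer division the same way. The sample rows are invented, and SQLite's LIKE (case-insensitive for ASCII) is used here in place of ilike.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reward (user_id INTEGER, campaign_title TEXT)")
# Hypothetical sample data: 2 completion rows, 4 distinct users
conn.executemany(
    "INSERT INTO reward VALUES (?, ?)",
    [
        (1, "Course Completion"),
        (1, "Signup"),
        (2, "Signup"),
        (3, "course completion bonus"),
        (4, "Signup"),
    ],
)

int_ratio, real_ratio = conn.execute("""
    SELECT
        -- integer / integer: the fractional part is discarded
        COUNT(CASE WHEN campaign_title LIKE '%completion%' THEN 1 END)
            / COUNT(DISTINCT user_id),
        -- casting the numerator first gives a fractional result
        CAST(COUNT(CASE WHEN campaign_title LIKE '%completion%' THEN 1 END) AS REAL)
            / COUNT(DISTINCT user_id)
    FROM reward
""").fetchone()
print(int_ratio, real_ratio)  # 0 0.5
```

In Postgres the equivalent cast would be something like count(...)::numeric or multiplying by 1.0 before dividing.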

Simple SELECT, but adding JOIN returns too many rows

The query below returns 9,817 records. Now I want to SELECT one more field from another table. See the 2 lines that are commented out, where I've simply selected this additional field and added a JOIN to bring in the new column. With these lines added, the query returns 649,200 records and I can't figure out why! I guess something is wrong with my WHERE criteria in conjunction with the JOIN. Please help, thanks.
SELECT DISTINCT dbo.IMPORT_DOCUMENTS.ITEMID, BEGDOC, BATCHID
--, dbo.CATEGORY_COLLECTION_CATEGORY_RESULTS.CATEGORY_ID
FROM IMPORT_DOCUMENTS
--JOIN dbo.CATEGORY_COLLECTION_CATEGORY_RESULTS ON dbo.CATEGORY_COLLECTION_CATEGORY_RESULTS.ITEMID = dbo.IMPORT_DOCUMENTS.ITEMID
WHERE (BATCHID LIKE 'IC0%' OR BATCHID LIKE 'LP0%')
AND dbo.IMPORT_DOCUMENTS.ITEMID IN
(SELECT dbo.CATEGORY_COLLECTION_CATEGORY_RESULTS.ITEMID FROM
CATEGORY_COLLECTION_CATEGORY_RESULTS
WHERE SCORE >= .7 AND SCORE <= .75 AND CATEGORY_ID IN(
SELECT CATEGORY_ID FROM CATEGORY_COLLECTION_CATS WHERE COLLECTION_ID IN (11,16))
AND Sample_Id > 0)
AND dbo.IMPORT_DOCUMENTS.ITEMID NOT IN
(SELECT ASSIGNMENT_FOLDER_DOCUMENTS.Item_Id FROM ASSIGNMENT_FOLDER_DOCUMENTS)
One possible reason is that one of your tables holds data at a finer grain than your join key. For example, there may be multiple records per item id, so the same item id is repeated once for every matching row. Without knowing the data, I would rewrite the query as below. If the output is not what you're looking for, convert it into a SELECT within a SELECT.
Hope this helps.
Try this SQL:
SELECT DISTINCT a.ITEMID, a.BEGDOC, a.BATCHID, b.CATEGORY_ID
FROM IMPORT_DOCUMENTS a
JOIN (SELECT DISTINCT ITEMID, CATEGORY_ID
FROM CATEGORY_COLLECTION_CATEGORY_RESULTS
WHERE SCORE >= .7 AND SCORE <= .75
AND CATEGORY_ID IN (SELECT DISTINCT CATEGORY_ID FROM CATEGORY_COLLECTION_CATS WHERE COLLECTION_ID IN (11,16))
AND Sample_Id > 0) b ON a.ITEMID = b.ITEMID
WHERE (a.BATCHID LIKE 'IC0%' OR a.BATCHID LIKE 'LP0%')
AND a.ITEMID NOT IN (SELECT DISTINCT Item_Id FROM ASSIGNMENT_FOLDER_DOCUMENTS)
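To see why the row count jumps, it may help to reproduce the effect on a tiny dataset. The sketch below uses Python's sqlite3 module with two simplified, hypothetical tables (documents and category_results stand in for IMPORT_DOCUMENTS and CATEGORY_COLLECTION_CATEGORY_RESULTS). An IN filter only tests whether a match exists, while a JOIN emits one row per match, and DISTINCT cannot collapse them once CATEGORY_ID is in the select list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE documents (itemid INTEGER, batchid TEXT);
    CREATE TABLE category_results (itemid INTEGER, category_id INTEGER);
    INSERT INTO documents VALUES (1, 'IC001'), (2, 'LP002');
    -- item 1 has three category rows, item 2 has one
    INSERT INTO category_results VALUES (1, 10), (1, 11), (1, 12), (2, 10);
""")

# Filtering with IN: one row per document, however many matches exist
filtered = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT DISTINCT d.itemid, d.batchid
        FROM documents d
        WHERE d.itemid IN (SELECT itemid FROM category_results)
    )
""").fetchone()[0]

# Joining and selecting category_id: one row per (document, category) pair
joined = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT DISTINCT d.itemid, d.batchid, c.category_id
        FROM documents d
        JOIN category_results c ON c.itemid = d.itemid
    )
""").fetchone()[0]

print(filtered, joined)  # 2 4
```

Note also that in the original query the SCORE and COLLECTION_ID conditions live inside the IN sub-query, so an unfiltered JOIN would pull in every category row for each matching item, not just the ones that satisfied those conditions.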

Two different group by clauses in one query?

First time posting here and a newbie to SQL. I'm not exactly sure how to word this, but I'll try my best.
I have a query:
select report_month, employee_id, split_bonus,sum(salary) FROM empsal
where report_month IN('2010-12-01','2010-11-01','2010-07-01','2010-04-01','2010-09-01','2010-10-01','2010-08-01')
AND employee_id IN('100','101','102','103','104','105','106','107')
group by report_month, employee_id, split_bonus;
Now, to the result of this query, I want to add a new column split_bonus_cumulative that is essentially equivalent to adding sum(split_bonus) to the select clause, except that for this column the group by should only contain report_month and employee_id.
Can anyone show me how to do this with a single query? Thanks in advance.
Try:
SELECT
report_month,
employee_id,
SUM(split_bonus),
SUM(salary)
FROM
empsal
WHERE
report_month IN('2010-12-01','2010-11-01','2010-07-01','2010-04-01','2010-09-01','2010-10-01','2010-08-01')
AND
employee_id IN('100','101','102','103','104','105','106','107')
GROUP BY
report_month,
employee_id;
Assuming you're using Postgres, you might also find window functions useful:
http://www.postgresql.org/docs/9.0/static/tutorial-window.html
Unless I'm mistaken, you want something that resembles the following:
select report_month, employee_id, salary, split_bonus,
sum(salary) over w as sum_salary,
sum(split_bonus) over w as sum_bonus
from empsal
where ...
window w as (partition by employee_id);
CTEs are also convenient:
http://www.postgresql.org/docs/9.0/static/queries-with.html
WITH
rows as (
SELECT foo.*
FROM foo
WHERE ...
),
report1 as (
SELECT aggregates
FROM rows
WHERE ...
),
report2 as (
SELECT aggregates
FROM rows
WHERE ...
)
SELECT *
FROM report1, report2, ...
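As a concrete instance of the CTE pattern sketched above, the following shows one way the two groupings could be combined in a single query. It is only a sketch: it runs against an in-memory SQLite database through Python's sqlite3 module rather than Postgres, the CTE names (filtered, by_bonus, by_employee) are made up, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE empsal (report_month TEXT, employee_id TEXT, "
    "split_bonus INTEGER, salary INTEGER)"
)
# Hypothetical sample data; employee 999 is excluded by the filter
conn.executemany(
    "INSERT INTO empsal VALUES (?, ?, ?, ?)",
    [
        ("2010-11-01", "100", 10, 1000),
        ("2010-11-01", "100", 10, 500),
        ("2010-11-01", "100", 20, 700),
        ("2010-12-01", "100", 10, 900),
        ("2010-11-01", "999", 5, 100),
    ],
)

query = """
WITH filtered AS (
    SELECT * FROM empsal
    WHERE employee_id IN ('100', '101')
),
by_bonus AS (       -- first grouping: month, employee, split_bonus
    SELECT report_month, employee_id, split_bonus, SUM(salary) AS sum_salary
    FROM filtered
    GROUP BY report_month, employee_id, split_bonus
),
by_employee AS (    -- second grouping: month and employee only
    SELECT report_month, employee_id,
           SUM(split_bonus) AS split_bonus_cumulative
    FROM filtered
    GROUP BY report_month, employee_id
)
SELECT b.report_month, b.employee_id, b.split_bonus,
       b.sum_salary, e.split_bonus_cumulative
FROM by_bonus b
JOIN by_employee e
  ON e.report_month = b.report_month AND e.employee_id = b.employee_id
ORDER BY b.report_month, b.employee_id, b.split_bonus
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Each output row keeps the finer grouping and carries the coarser total alongside it, so the November rows for employee 100 both show a cumulative bonus of 40.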

Find Missing Records

My SQL is a bit rusty. I have a query that returns, for example, 10 rows, but there are 15 values in my where clause. How do I identify the 5 that were not found?
Of course I could dump the results into MS Excel, but how do I do it in SQL?
It's a simple query:
select * from customers where username in ("21051", "21052"...
There is nothing else in the where clause. It returns 10 rows, but there are 15 usernames in the where clause.
To identify the missing rows:
SELECT *
FROM (SELECT '21051' username
UNION ALL SELECT '21052'
UNION ALL SELECT '21053'
UNION ALL SELECT '21054') u
WHERE u.username NOT IN (SELECT c.username FROM customers c)
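If the list of usernames comes from application code, the same idea can be built programmatically. The sketch below uses Python's sqlite3 module and an in-memory customers table with made-up rows. It swaps NOT IN for NOT EXISTS, which is an assumption on my part about what is safest: NOT IN returns no rows at all if customers.username contains a NULL, while NOT EXISTS is not affected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (username TEXT)")
# Hypothetical data: only two of the four wanted usernames exist
conn.executemany("INSERT INTO customers VALUES (?)", [("21051",), ("21053",)])

wanted = ["21051", "21052", "21053", "21054"]

# Build one "SELECT ? AS username" per value, glued together with UNION ALL,
# so the lookup list becomes a derived table we can filter against.
lookup = " UNION ALL ".join("SELECT ? AS username" for _ in wanted)
query = f"""
SELECT u.username
FROM ({lookup}) AS u
WHERE NOT EXISTS (
    SELECT 1 FROM customers c WHERE c.username = u.username
)
ORDER BY u.username
"""
missing = [row[0] for row in conn.execute(query, wanted)]
print(missing)  # ['21052', '21054']
```

Only the placeholders are interpolated into the SQL text; the values themselves are passed as bound parameters.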