count(1) and DISTINCT behaviour in Postgres

Imagine a table:
name age
John 20
Sam 60
Dave 30
John 15
I want to count the distinct names, so I query the table like so:
SELECT COUNT(1), DISTINCT(name)
FROM table
GROUP BY 2
But I get:
ERROR: syntax error at or near "DISTINCT"
Position: 18
But when I use:
SELECT DISTINCT(name), COUNT(1)
FROM table
GROUP BY 1
I do get what's expected:
John 2
Sam 1
Dave 1
Is there a reason why the first query is not working or am I making a mistake somewhere?

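DISTINCT is not a function. It is a keyword that must come directly after SELECT and applies to the whole select list, which is why the first query is a syntax error, while the second is parsed as SELECT DISTINCT with (name) as a parenthesized expression.
The DISTINCT here is not required anyway. GROUP BY already means 'group by a distinct set of values', so: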
SELECT COUNT(*), name
FROM table
GROUP BY name;
This will give you the result I think you want.
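To see the difference between per-name counts and the number of distinct names, here is a minimal sketch. It assumes an in-memory SQLite database as a stand-in for PostgreSQL, and the table name people is hypothetical (table itself is a reserved word).

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("John", 20), ("Sam", 60), ("Dave", 30), ("John", 15)],
)

# One row per name, with the number of rows in each group.
per_name = dict(
    conn.execute("SELECT name, COUNT(*) FROM people GROUP BY name").fetchall()
)

# A single number: how many different names exist in the table.
distinct_names = conn.execute(
    "SELECT COUNT(DISTINCT name) FROM people"
).fetchone()[0]

print(per_name)
print(distinct_names)
```

If only the number of different names is wanted, COUNT(DISTINCT name) is the form where DISTINCT legitimately appears inside the parentheses of an aggregate.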

Related

How do I write a query in SQL with an exclusive?

Using the SELECT COUNT and BETWEEN operators I also need to exclude two integers.
SELECT COUNT(*)
FROM purchases
WHERE user_id BETWEEN 10 AND 50
I need to exclude 20 and 30. I’ve tried using NOT, NOT IN and NULL. What am I missing?
Just add another condition:
SELECT COUNT(*)
FROM purchases
WHERE user_id BETWEEN 10 AND 50
AND user_id NOT IN (20,30);
Does adding "AND user_id NOT IN (20,30)" not work? You could also try adding "AND user_id <> 20 AND user_id <> 30".
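Both suggestions can be checked on made-up data. The sketch below assumes an in-memory SQLite database as a stand-in for PostgreSQL and a hypothetical purchases table holding one row per user_id from 5 to 55.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (user_id INTEGER)")
# Made-up data: exactly one purchase for each user_id in 5..55.
conn.executemany("INSERT INTO purchases VALUES (?)", [(i,) for i in range(5, 56)])

def count(where_clause):
    """Return COUNT(*) from purchases for the given WHERE clause."""
    sql = "SELECT COUNT(*) FROM purchases WHERE " + where_clause
    return conn.execute(sql).fetchone()[0]

between_only = count("user_id BETWEEN 10 AND 50")
with_not_in = count("user_id BETWEEN 10 AND 50 AND user_id NOT IN (20, 30)")
with_not_equal = count(
    "user_id BETWEEN 10 AND 50 AND user_id <> 20 AND user_id <> 30"
)

print(between_only, with_not_in, with_not_equal)
```

BETWEEN is inclusive on both ends, so the first count covers 41 ids here, and both exclusion forms drop exactly two of them.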

Hierarchical count query in postgresql

I have a simple hierarchical table (analogous to employee/manager), and I want to show counts of subordinates by parent node.
Consider this example from this article:
WITH RECURSIVE subordinates AS (
SELECT
employee_id,
manager_id,
full_name
FROM
employees
WHERE
employee_id = 2
UNION
SELECT
e.employee_id,
e.manager_id,
e.full_name
FROM
employees e
INNER JOIN subordinates s ON s.employee_id = e.manager_id
) SELECT
*
FROM
subordinates;
What I need to do is generate output like this:
id full_name subordinate_count
---- --------- -----------------
1 Alice 42
2 Bob 18
3 Charlie 4
Let's say Alice is the CEO and Charlie is a low level manager.
It seems like you have to hard-code a clause in the first half of the union query to get a hierarchical query to work. I've tried several approaches but nothing is working. Thanks in advance to anyone that can help.
You can try wrapping this query in an outer query and grouping by full_name with a count.
For example:
select full_name,count(*)
from ("your recursive query") outer_query
group by outer_query.full_name;
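One possible way to avoid hard-coding a starting employee is to seed the recursion with every direct (manager, employee) pair and let it extend each pair downwards, then count the pairs per manager. This is only a sketch under assumptions: it runs on an in-memory SQLite database as a stand-in for PostgreSQL, the CTE name chain and the sample rows are made up, and UNION ALL presumes the hierarchy contains no cycles.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (employee_id INTEGER, manager_id INTEGER, full_name TEXT)"
)
# Made-up hierarchy: Alice > (Bob, Frank), Bob > (Charlie, Dana), Charlie > Eve.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        (1, None, "Alice"),
        (2, 1, "Bob"),
        (3, 2, "Charlie"),
        (4, 2, "Dana"),
        (5, 3, "Eve"),
        (6, 1, "Frank"),
    ],
)

counts = conn.execute("""
    WITH RECURSIVE chain(manager_id, employee_id) AS (
        -- Seed: every direct manager/employee pair, no hard-coded root.
        SELECT manager_id, employee_id
        FROM employees
        WHERE manager_id IS NOT NULL
        UNION ALL
        -- Step: anyone reporting to a known subordinate is also a subordinate.
        SELECT c.manager_id, e.employee_id
        FROM chain c
        JOIN employees e ON e.manager_id = c.employee_id
    )
    SELECT m.employee_id, m.full_name, COUNT(c.employee_id) AS subordinate_count
    FROM employees m
    LEFT JOIN chain c ON c.manager_id = m.employee_id
    GROUP BY m.employee_id, m.full_name
    ORDER BY m.employee_id
""").fetchall()

for row in counts:
    print(row)
```

The LEFT JOIN keeps employees without subordinates in the output with a count of 0; switching to an inner join would list managers only.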

PostgreSQL: a variation of rows to columns

PostgreSQL version 9.3.
Best explained with an example:
So I have 2 tables:
Books table:
book_id name
1 Aragorn
2 Harry Potter
3 The Great Gatsby
4 Book name, with a comma
User IDs to book IDs table:
user_id book_id
31 1
31 2
32 3
34 1
34 4
And I would like to show each user his/her books, so something like this:
user_id book_names
31 Aragorn,Harry Potter
32 The Great Gatsby
34 Aragorn,Book name, with a comma
Basically, each user gets his/her books separated by commas.
How can I achieve this in an efficient way?
If you are using Postgres version 8.4 or later, then you have array_agg() at your disposal. One option is to aggregate over the user books table by user_id and then use array_agg() to generate the CSV list of books for each user.
SELECT t1.user_id,
array_to_string(array_agg(t2.name), ',') AS book_names
FROM user_books t1
INNER JOIN books t2
ON t1.book_id = t2.book_id
GROUP BY t1.user_id
In Postgres 9.0 and above, you could use the following to aggregate book names into a CSV list:
string_agg(t2.name, ',' order by t2.name)

Getting group by attribute in nested query

I am trying to find the most frequent value in a postgresql table. The problem is that I also want to "group by" in that table and only get the most frequent from the values that have the same name.
So I have the following query:
select name,
(SELECT value FROM table where name=name GROUP BY value ORDER BY COUNT(*) DESC limit 1)
as mfq from table group by name;
So, I am using where name=name, trying to get the outside group by attribute "name", but it doesn't seem to work. Any ideas on how to do it?
Edit: for example in the following table:
name value
a 3
a 3
a 3
b 2
b 2
I want to get:
name value
a 3
b 2
but the above statement gives:
name value
a 3
b 3
instead, since where doesn't work correctly.
There is a dedicated function in PostgreSQL for this case: the mode() ordered-set aggregate:
select name, mode() within group (order by value) mode_value
from table
group by name;
which returns the most frequent input value (arbitrarily choosing the first one if there are multiple equally-frequent results) -- which is the same behavior as with your order by count(*) desc limit 1.
It is available from PostgreSQL 9.4+.
http://rextester.com/GHGJH15037
If you want your query to work, you need table aliases. Table aliases and qualified column names are always a good idea:
select t.name,
(select t2.value
from table t2
where t2.name = t.name
group by t2.value
order by COUNT(*) desc
limit 1
) as mfq
from table t
group by t.name;
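The effect of the aliases can be reproduced on the sample data from the question. The sketch below assumes an in-memory SQLite database as a stand-in for PostgreSQL and a hypothetical table name samples. Without aliases, name = name compares the inner table's column with itself, so the subquery is not correlated with the outer row at all.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (name TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO samples VALUES (?, ?)",
    [("a", 3), ("a", 3), ("a", 3), ("b", 2), ("b", 2)],
)

# Unqualified: both sides of name = name resolve to the inner table,
# so every group gets the most frequent value of the whole table.
unqualified = conn.execute("""
    SELECT name,
           (SELECT value FROM samples WHERE name = name
            GROUP BY value ORDER BY COUNT(*) DESC LIMIT 1) AS mfq
    FROM samples
    GROUP BY name
    ORDER BY name
""").fetchall()

# Qualified: t2.name = t.name ties the subquery to the outer group.
qualified = conn.execute("""
    SELECT t.name,
           (SELECT t2.value FROM samples t2 WHERE t2.name = t.name
            GROUP BY t2.value ORDER BY COUNT(*) DESC LIMIT 1) AS mfq
    FROM samples t
    GROUP BY t.name
    ORDER BY t.name
""").fetchall()

print(unqualified)
print(qualified)
```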

distinct key word only for one column

I'm using PostgreSQL as my database, and I'm stuck getting the desired results from a query.
What I have in my table is something like the following:
nid date_start date_end
1 20 25
1 20 25
2 23 26
2 23 26
What I want is the following:
nid date_start date_end
1 20 25
2 23 26
For that I used SELECT DISTINCT nid,date_start,date_end from table_1, but this returns duplicate entries. How can I get distinct nids with the corresponding date_start and date_end?
Can anyone help me with this?
Thanks a lot!
Based on your sample data and sample output, your query should work fine. I'll assume your sample input/output is not accurate.
If you want to get distinct values of a certain column, along with values from other corresponding columns, then you need to determine WHICH value from the corresponding columns to display (your question and query would otherwise not make sense). For this you need to use aggregates and group by. For example:
SELECT
nid,
MAX(date_start),
MAX(date_end)
FROM
table_1
GROUP BY
nid
That query should work unless you are selecting more columns.
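A small sketch of the difference, assuming an in-memory SQLite database as a stand-in for PostgreSQL. The extra row (2, 24, 27) is made up to represent an nid that appears with different dates, which is the situation where SELECT DISTINCT over all three columns still returns that nid more than once.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_1 (nid INTEGER, date_start INTEGER, date_end INTEGER)"
)
conn.executemany(
    "INSERT INTO table_1 VALUES (?, ?, ?)",
    # Sample rows from the question plus one made-up row with different dates.
    [(1, 20, 25), (1, 20, 25), (2, 23, 26), (2, 23, 26), (2, 24, 27)],
)

# DISTINCT removes fully identical rows only, so nid 2 still appears twice.
distinct_rows = sorted(
    conn.execute("SELECT DISTINCT nid, date_start, date_end FROM table_1").fetchall()
)

# GROUP BY nid returns one row per nid; MAX picks a value for each other column.
grouped_rows = conn.execute("""
    SELECT nid, MAX(date_start), MAX(date_end)
    FROM table_1
    GROUP BY nid
    ORDER BY nid
""").fetchall()

print(distinct_rows)
print(grouped_rows)
```

Each MAX is evaluated independently, so the start and end dates in a grouped row are not guaranteed to come from the same source row.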
Or maybe you are getting the same nid with a different start and/or end date
Try distinct on:
select distinct on (col1) col1, col2 from table;
DISTINCT can't result in duplicate entries. Removing duplicates is exactly what it does.
Is your posted data incorrect? Exactly what are your data and output?