MySQL Select if field is unique or null - group-by

Sorry, I can't find an example anywhere, mainly because I can't think of any other way to explain it that doesn't include DISTINCT or UNIQUE (which I've found to be misleading terms in SQL).
I need to select unique values AND null values from one table.
FLAVOURS:
id | name | flavour
--------------------------
1 | mark | chocolate
2 | cindy | chocolate
3 | rick |
4 | dave |
5 | jenn | vanilla
6 | sammy | strawberry
7 | cindy | chocolate
8 | rick |
9 | dave |
10 | jenn | caramel
11 | sammy | strawberry
I want the kids who have a unique flavour (vanilla, caramel) and the kids who don't have any flavour.
I don't want the kids with duplicate flavours (chocolate, strawberry).
My searches for help always return an answer for how to GROUP BY, UNIQUE and DISTINCT for chocolate and strawberry. That's not what I want. I don't want any repeated terms in a field - I want everything else.
What is the proper MySQL select statement for this?
Thanks!

You can use HAVING to select just some of the groups, so to select the groups where there is only one flavor, you use:
SELECT * FROM my_table GROUP BY flavour HAVING COUNT(*) = 1
If you then want to select those users that have NULL entries, you use
SELECT * FROM my_table WHERE flavour IS NULL
and if you combine them, you get all entries that either have a unique flavor, or NULL.
SELECT * FROM my_table GROUP BY flavour HAVING COUNT(*) = 1 AND flavour IS NOT NULL
UNION
SELECT * FROM my_table WHERE flavour IS NULL
I added the flavour IS NOT NULL condition to ensure that a NULL flavour is not picked by the first query when it happens to occur only once, which would duplicate a row from the second query.
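To sanity-check this against the sample data, here is a sketch using Python's sqlite3 module instead of MySQL (SQLite, like legacy MySQL, tolerates selecting a non-grouped column alongside GROUP BY, returning an arbitrary row per group; the table name flavours matches the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE flavours (id INTEGER, name TEXT, flavour TEXT);
INSERT INTO flavours VALUES
  (1,'mark','chocolate'), (2,'cindy','chocolate'),
  (3,'rick',NULL), (4,'dave',NULL),
  (5,'jenn','vanilla'), (6,'sammy','strawberry'),
  (7,'cindy','chocolate'), (8,'rick',NULL),
  (9,'dave',NULL), (10,'jenn','caramel'),
  (11,'sammy','strawberry');
""")

# Groups with exactly one member (and a real flavour), unioned with the NULL rows.
rows = conn.execute("""
  SELECT name, flavour FROM flavours
  GROUP BY flavour HAVING COUNT(*) = 1 AND flavour IS NOT NULL
  UNION
  SELECT name, flavour FROM flavours WHERE flavour IS NULL
""").fetchall()
print(rows)
```

UNION (as opposed to UNION ALL) also deduplicates the repeated rick/dave NULL rows, so each kid appears once.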

I don't have a database to hand, but you should be able to use a query along the lines of.
SELECT name FROM FLAVOURS WHERE flavour IN ( SELECT flavour FROM FLAVOURS GROUP BY flavour HAVING COUNT(flavour) = 1 ) OR flavour IS NULL;
I apologise if this isn't quite right, but hopefully is a good start.

You need a self-join that looks for duplicates, and then you need to veto those duplicates by keeping only the rows where no match was found (that's the WHERE t2.flavor IS NULL). Then you're doing something completely different, looking for nulls in the original table, with the second line of the WHERE clause (OR t1.flavor IS NULL).
SELECT DISTINCT t1.name, t1.flavor
FROM tablename t1
LEFT JOIN tablename t2
ON t2.flavor = t1.flavor AND t2.ID <> t1.ID
WHERE t2.flavor IS NULL
OR t1.flavor IS NULL
I hope this helps.
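The anti-join version can be checked the same way; this sketch uses Python's sqlite3 (column spelled flavour to match the question's table, rather than the answer's flavor):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE flavours (id INTEGER, name TEXT, flavour TEXT);
INSERT INTO flavours VALUES
  (1,'mark','chocolate'), (2,'cindy','chocolate'),
  (3,'rick',NULL), (4,'dave',NULL),
  (5,'jenn','vanilla'), (6,'sammy','strawberry'),
  (7,'cindy','chocolate'), (8,'rick',NULL),
  (9,'dave',NULL), (10,'jenn','caramel'),
  (11,'sammy','strawberry');
""")

# Left-join each row to any OTHER row with the same flavour; rows with no such
# match (t2.flavour IS NULL) are the unique ones. Rows whose own flavour is NULL
# never match the join condition, so the OR clause picks them up explicitly.
rows = conn.execute("""
  SELECT DISTINCT t1.name, t1.flavour
  FROM flavours t1
  LEFT JOIN flavours t2
    ON t2.flavour = t1.flavour AND t2.id <> t1.id
  WHERE t2.flavour IS NULL
     OR t1.flavour IS NULL
""").fetchall()
print(rows)
```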

Related

Why does using DISTINCT ON () at different points in a query return different (unintuitive) results?

I’m querying from a table that has repeated uuids, and I want to remove duplicates. I also want to exclude some irrelevant data, which requires joining on another table. I can remove duplicates and then exclude the irrelevant data, or I can switch the order and exclude first, then deduplicate. Intuitively, I’d expect that, if anything, removing duplicates and then joining should produce more rows than joining and then removing duplicates, but I’m seeing the opposite. What am I missing here?
In this one, I remove duplicates in the first subquery and filter in the second, and I get 500k rows:
with tbl1 as (
  select distinct on (uuid) uuid, foreign_key
  from original_data
  where date > some_date
),
tbl2 as (
  select uuid
  from tbl1
  left join other_data
    on tbl1.foreign_key = other_data.id
  where other_data.category <> something
)
select * from tbl2
If I filter then remove duplicates, I get 550k rows:
with tbl1 as (
  select uuid, foreign_key
  from original_data
  where date > some_date
),
tbl2 as (
  select uuid
  from tbl1
  left join other_data
    on tbl1.foreign_key = other_data.id
  where other_data.category <> something
),
tbl3 as (
  select distinct on (uuid) uuid
  from tbl2
)
select * from tbl3
Is there an explanation here?
Does original_data.foreign_key lack a foreign key constraint referencing other_data.id, allowing foreign_key values that don't link to any id in other_data?
Is the other_data.category or original_data.foreign_key column missing a NOT NULL constraint?
In either of these cases Postgres would filter out all records with
a missing link (foreign_key is null)
a broken link (foreign_key doesn't match any id in other_data)
a link to an other_data record with a category set to null
in both of your approaches - regardless of whether they're duplicates or not - because other_data.category <> something evaluates to null for them, which does not satisfy the WHERE clause. That, combined with the missing ORDER BY causing DISTINCT ON to drop a different duplicate each time, could result in the first approach dropping exactly the duplicates that then get filtered out in tbl2, while the second approach filters before deduplicating.
Example:
pgsql122=# select * from original_data;
uuid | foreign_key | comment
------+-------------+---------------------------------------------------
1 | 1 | correct, non-duplicate record with a correct link
3 | 2 | duplicate record with a broken link
3 | 1 | duplicate record with a correct link
4 | null | duplicate record with a missing link
4 | 1 | duplicate record with a correct link
5 | 3 | duplicate record with a correct link, but a null category behind it
5 | 1 | duplicate record with a correct link
6 | null | correct, non-duplicate record with a missing link
7 | 2 | correct, non-duplicate record with a broken link
8 | 3 | correct, non-duplicate record with a correct link, but a null category behind it
pgsql122=# select * from other_data;
id | category
----+----------
1 | a
3 | null
Both of your approaches keep uuid 1 and eliminate uuid 6, 7 and 8 even though they're unique.
Your first approach randomly keeps between 0 and 3 out of the 3 pairs of duplicates (uuid 3, 4 and 5), depending on which one in each pair gets discarded by DISTINCT ON.
Your second approach always keeps one record for each uuid 3, 4 and 5. Each clone with missing link, a broken link or a link with a null category behind it is already gone by the time you discard duplicates.
As @a_horse_with_no_name suggested, ORDER BY should make DISTINCT ON consistent and predictable, but only as long as records vary in the columns used for ordering. It also won't help if you have other issues, like the ones described above.
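The core of the explanation is that a NULL comparison in SQL yields NULL, not FALSE, and WHERE only keeps rows where the condition is TRUE. A minimal demonstration using Python's sqlite3 (the same three-valued logic applies in Postgres):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# NULL <> 'something' evaluates to NULL, not TRUE or FALSE.
row = conn.execute("SELECT NULL <> 'something'").fetchone()
print(row)  # (None,)

# A WHERE clause keeps only rows where the condition is TRUE,
# so the NULL row is silently dropped.
kept = conn.execute("""
  SELECT v FROM (SELECT 'a' AS v UNION ALL SELECT NULL)
  WHERE v <> 'b'
""").fetchall()
print(kept)
```

This is why every record with a missing or broken link (category resolving to NULL after the left join) vanishes from both approaches, and why you need something like `category IS DISTINCT FROM something` in Postgres if you want NULLs to survive the filter.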

Build a list of grouped values

I'm new to this page and this is the first time I've posted a question; sorry for anything wrong. The question may be old, but I just can't find any answer for SQL Anywhere.
I have a table like
Order | Mark
======|========
1 | AA
2 | BB
1 | CC
2 | DD
1 | EE
I want to have result as following
Order | Mark
1 | AA,CC,EE
2 | BB,DD
My current SQL is
Select Order, Cast(Mark as NVARCHAR(20))
From #Order
Group by Order
and it just give me with result completely the same with the original table.
Any idea for this?
You can use the ASA LIST() aggregate function (untested; you might need to enclose the Order column name in quotes, as it is also a reserved word):
SELECT Order, LIST( Mark )
FROM #Order
GROUP BY Order;
You can customize the separator character and order if you need.
Note: it is rather a bad idea to
name your table or column after a regular SQL keyword (ORDER BY)
use the same name for a column and a table (Order)
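SQL Anywhere's LIST() has no direct equivalent here to test against, but SQLite's group_concat() aggregate does the same job, so the shape of the query can be sketched with Python's sqlite3 (table renamed to orders, since # temp-table names are ASA/T-SQL specific; note "Order" must be quoted as a reserved word):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders ("Order" INTEGER, Mark TEXT);
INSERT INTO orders VALUES (1,'AA'), (2,'BB'), (1,'CC'), (2,'DD'), (1,'EE');
""")

# group_concat plays the role of ASA's LIST(): one comma-separated
# string of Mark values per "Order" group.
rows = conn.execute("""
  SELECT "Order", group_concat(Mark, ',')
  FROM orders
  GROUP BY "Order"
""").fetchall()
print(rows)
```

One caveat: SQLite does not guarantee the concatenation order within a group, whereas ASA's LIST() accepts an ORDER BY inside the call if you need a deterministic order.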

RE: Greatest Value with column name

Greatest value of multiple columns with column name?
I was reading the question above (link above) and the "ACCEPTED" answer (which seems correct), and I have several questions concerning this answer.
(Sorry, I have to create a new post; I don't have a high enough reputation to comment on the old post, as it seems very old.)
Questions
My first question is: what is the significance of "@var_max_val := "? I reran the query without it and everything ran fine.
My second question is: can someone explain how this achieves its desired result:
CASE @var_max_val WHEN col1 THEN 'col1'
WHEN col2 THEN 'col2'
...
END AS max_value_column_name
My third question is as follows:
It seems that in this "case" statement he manually has to write a line of code ("when x then y") for every column in the table. This is fine if you have 1-5 columns. But what if you had 10,000? How would you go about it?
PS: I might be violating some forum rules in this post, do let me know if I am.
Thank you for reading, and thank you for your time!
The linked question is about MySQL, so it does not apply to PostgreSQL (e.g. the @var_max_val syntax is specific to MySQL). To accomplish the same thing in PostgreSQL you can use a LATERAL subquery. For example, suppose that you have the following table and sample data:
CREATE TABLE t(col1 int, col2 int, col3 int);
INSERT INTO t VALUES (1,2,3), (5,8,6);
Then you can identify the maximum column for each row with the following query:
SELECT *
FROM t, LATERAL (
  VALUES ('col1', col1), ('col2', col2), ('col3', col3)
  ORDER BY 2 DESC
  LIMIT 1
) l(maxcolname, maxcolval);
which produces the following output:
col1 | col2 | col3 | maxcolname | maxcolval
------+------+------+------------+-----------
1 | 2 | 3 | col3 | 3
5 | 8 | 6 | col2 | 8
I think this solution is much more elegant than the one presented in the linked question for MySQL.
As for having to manually write the code, unfortunately, I do not think you can avoid that.
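For what it's worth, the CASE technique from the accepted MySQL answer can be reproduced without user variables in engines whose max() accepts multiple scalar arguments. SQLite is one such engine, so here is a sketch using Python's sqlite3 against the same sample data (this is SQLite's scalar max(X,Y,...), not the aggregate MAX of Postgres or MySQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (col1 INT, col2 INT, col3 INT);
INSERT INTO t VALUES (1,2,3), (5,8,6);
""")

# Scalar max(col1, col2, col3) replaces the @var_max_val variable;
# CASE then maps the winning value back to its column name.
rows = conn.execute("""
  SELECT col1, col2, col3,
         CASE MAX(col1, col2, col3)
           WHEN col1 THEN 'col1'
           WHEN col2 THEN 'col2'
           ELSE 'col3'
         END AS maxcolname,
         MAX(col1, col2, col3) AS maxcolval
  FROM t
""").fetchall()
print(rows)
```

Like the original answer, this still requires writing one WHEN arm per column; only the jsonb approach below the LATERAL answer avoids enumerating columns by hand.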
In Postgres 9.5 you can use jsonb functions to get column names. In this case you do not have to write all the column names manually. The solution needs a primary key (or a unique column) for proper grouping:
create table a_table(id serial primary key, col1 int, col2 int, col3 int);
insert into a_table (col1, col2, col3) values (1,2,3), (5,8,6);
select distinct on(id) id, key, value
from a_table t, jsonb_each(to_jsonb(t))
where key <> 'id'
order by id, value desc;
id | key | value
----+------+-------
1 | col3 | 3
2 | col2 | 8
(2 rows)

Checking if time ranges overlap in PostgreSQL without separate Start and End fields

I have a table in Postgres which has room bookings stored as:
Room_id | Meet_time |
123456 | [9:00 - 10:00) |
I want to check if room 123456 is free from 9:30 to 10:00. How would I check that?
You can check overlapping of ranges with the && operator (see the documentation).
So to check all rooms which are free for given timerange, you can use this query:
select *
from <your table> as T
where not <given timerange> && T."Meet_time"
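Outside Postgres, the test the && operator performs on half-open ranges like [9:00, 10:00) is easy to state directly: two intervals overlap exactly when each one starts before the other ends. A plain-Python sketch (function name overlaps is mine, not a library API):

```python
from datetime import time

def overlaps(start1, end1, start2, end2):
    """Half-open intervals [start1, end1) and [start2, end2) overlap
    iff each interval starts before the other one ends."""
    return start1 < end2 and start2 < end1

booking = (time(9, 0), time(10, 0))
requested = (time(9, 30), time(10, 0))
print(overlaps(*booking, *requested))  # True: the room is not free

# Half-open semantics: back-to-back bookings do not overlap.
print(overlaps(time(9, 0), time(10, 0), time(10, 0), time(11, 0)))  # False
```

The half-open convention matters: it lets a 10:00 booking start the instant a [9:00, 10:00) booking ends, which matches how Postgres range types with a `)` upper bound behave.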

Fetching a table with another table's "conditions"

I have two tables like this:
Table Name: users
emx | userid
---------------
1 | 1
2 | 2
and another table called bodies
id | emx | text
--------------------------
1 | 1 | Hello
2 | 2 | How are you?
As you can see, the bodies table has an emx column that matches the id numbers in the users table. Currently, when I want to fetch the messages that contain Hello, I search bodies, collect the emx numbers, and then fetch the matching rows from users with those numbers. So I am running 2 SQL queries to find it.
All I want to do is make this happen in 1 SQL query.
I tried some queries that weren't correct, and I also tried JOIN. No luck yet. I just want to fetch the users rows whose message in the bodies table contains 'Hello'.
Note: I am using PostgreSQL 9.1.3.
Any idea / help is appreciated.
Read docs on how to join tables.
Try this:
SELECT u.emx, u.userid, b.id, b.text
FROM bodies b
JOIN users u USING (emx)
WHERE b.text ~ 'Hello';
This is how I'd do the join. I've left out the exact containment test.
SELECT users.userid
FROM users JOIN bodies ON (users.emx = bodies.emx)
WHERE ⌜true if bodies.text contains ?⌟
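Either form collapses the two lookups into one query. A runnable sketch using Python's sqlite3 rather than PostgreSQL, with a plain LIKE '%Hello%' standing in for the containment test (the answer above used the Postgres regex operator ~ instead):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (emx INT, userid INT);
INSERT INTO users VALUES (1, 1), (2, 2);
CREATE TABLE bodies (id INT, emx INT, text TEXT);
INSERT INTO bodies VALUES (1, 1, 'Hello'), (2, 2, 'How are you?');
""")

# One query: join users to bodies on emx, filter on the message text.
rows = conn.execute("""
  SELECT u.userid, b.text
  FROM users u
  JOIN bodies b ON u.emx = b.emx
  WHERE b.text LIKE '%Hello%'
""").fetchall()
print(rows)
```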