Build a list of grouped values - group-by

I'm new to this site and this is the first time I've posted a question, so sorry if I get anything wrong. The question may be old, but I just can't find any answer for SQL Anywhere.
I have a table like:
Order | Mark
------+------
1     | AA
2     | BB
1     | CC
2     | DD
1     | EE
I want the result to look like the following:
Order | Mark
------+----------
1     | AA,CC,EE
2     | BB,DD
My current SQL is
Select Order, Cast(Mark as NVARCHAR(20))
From #Order
Group by Order
and it just gives me a result identical to the original table.
Any idea for this?

You can use the ASA LIST() aggregate function (untested; you may need to enclose the Order column name in quotes, as it is also a reserved word):
SELECT Order, LIST( Mark )
FROM #Order
GROUP BY Order;
You can customize the separator character and the ordering if you need to.
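For example, a sorted, semicolon-separated list might look like this (an untested sketch; SQL Anywhere's LIST() takes an optional delimiter string and an ORDER BY clause):
SELECT "Order", LIST( Mark, ';' ORDER BY Mark )  -- ';' overrides the default comma
FROM #Order
GROUP BY "Order";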
Note: it is rather a bad idea to:
- name a table or column after a regular SQL keyword (ORDER BY)
- use the same name for a column and its table (Order)


Why does using DISTINCT ON () at different points in a query return different (unintuitive) results?

I’m querying from a table that has repeated uuids, and I want to remove duplicates. I also want to exclude some irrelevant data which requires joining on another table. I can remove duplicates and then exclude irrelevant data, or I can switch the order and exclude then remove duplicates. Intuitively, I feel like if anything, removing duplicates then joining should produce more rows than joining and then removing duplicates, but that is the opposite of what I’m seeing. What am I missing here?
In this one, I remove duplicates in the first subquery and filter in the second, and I get 500k rows:
with tbl1 as (
    select distinct on (uuid) uuid, foreign_key
    from original_data
    where date > some_date
),
tbl2 as (
    select uuid
    from tbl1
    left join other_data
        on tbl1.foreign_key = other_data.id
    where other_data.category <> something
)
select * from tbl2
If I filter then remove duplicates, I get 550k rows:
with tbl1 as (
    select uuid, foreign_key
    from original_data
    where date > some_date
),
tbl2 as (
    select uuid
    from tbl1
    left join other_data
        on tbl1.foreign_key = other_data.id
    where other_data.category <> something
),
tbl3 as (
    select distinct on (uuid) uuid
    from tbl2
)
select * from tbl3
Is there an explanation here?
Does original_data.foreign_key have a foreign key constraint referencing other_data.id, allowing foreign_keys that don't link to any id in other_data?
Is other_data.category or original_data.foreign_key missing a NOT NULL constraint?
In either of these cases Postgres would, in both of your approaches, filter out all records with:
- a missing link (foreign_key is null)
- a broken link (foreign_key doesn't match any id in other_data)
- a link to an other_data record whose category is null
regardless of whether they're duplicates or not, because other_data.category <> something evaluates to null for them, which does not satisfy the WHERE clause. That, combined with the missing ORDER BY causing DISTINCT ON to drop a different duplicate each time, could result in dropping the duplicates that then get filtered out in tbl2 in the first approach, but not in the second.
Example:
pgsql122=# select * from original_data;
uuid | foreign_key | comment
------+-------------+---------------------------------------------------
1 | 1 | correct, non-duplicate record with a correct link
3 | 2 | duplicate record with a broken link
3 | 1 | duplicate record with a correct link
4 | null | duplicate record with a missing link
4 | 1 | duplicate record with a correct link
5 | 3 | duplicate record with a correct link, but a null category behind it
5 | 1 | duplicate record with a correct link
6 | null | correct, non-duplicate record with a missing link
7 | 2 | correct, non-duplicate record with a broken link
8 | 3 | correct, non-duplicate record with a correct link, but a null category behind it
pgsql122=# select * from other_data;
id | category
----+----------
1 | a
3 | null
Both of your approaches keep uuid 1 and eliminate uuid 6, 7 and 8 even though they're unique.
Your first approach randomly keeps between 0 and 3 out of the 3 pairs of duplicates (uuid 3, 4 and 5), depending on which one in each pair gets discarded by DISTINCT ON.
Your second approach always keeps one record for each of uuids 3, 4 and 5. Each copy with a missing link, a broken link, or a link to a null category is already gone by the time you discard duplicates.
As @a_horse_with_no_name suggested, ORDER BY should make DISTINCT ON consistent and predictable, but only as long as records vary on the columns used for ordering. It also won't help if you have other issues, like the ones suggested above.
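For instance, a deterministic version of the first CTE might look like this (a sketch; order by whatever tie-breaker defines which duplicate should win):
select distinct on (uuid) uuid, foreign_key
from original_data
where date > some_date
order by uuid, foreign_key nulls last;  -- the same row wins within each uuid group on every run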

RE: Greatest Value with column name

Greatest value of multiple columns with column name?
I was reading the question linked above and the "ACCEPTED" answer (which seems correct), and I have several questions concerning this answer.
(Sorry I have to create a new post; I don't have a high enough reputation to comment on the old post, as it is very old.)
Questions
My first question is: what is the significance of "@var_max_val := "? I reran the query without it and everything ran fine.
My second question is: can someone explain how this achieves its desired result:
CASE @var_max_val WHEN col1 THEN 'col1'
WHEN col2 THEN 'col2'
...
END AS max_value_column_name
My third question is as follows:
It seems that in this CASE statement the author has to manually write a line of code ("when x then y") for every column in the table. This is fine if you have 1-5 columns, but what if you had 10,000? How would you go about it?
PS: I might be violating some forum rules in this post; do let me know if I am.
Thank you for reading, and thank you for your time!
The linked question is about MySQL, so it does not apply to PostgreSQL (e.g. the @var_max_val syntax is specific to MySQL). To accomplish the same thing in PostgreSQL you can use a LATERAL subquery. For example, suppose that you have the following table and sample data:
CREATE TABLE t(col1 int, col2 int, col3 int);
INSERT INTO t VALUES (1,2,3), (5,8,6);
Then you can identify the maximum column for each row with the following query:
SELECT *
FROM t, LATERAL (
    VALUES ('col1', col1), ('col2', col2), ('col3', col3)
    ORDER BY 2 DESC
    LIMIT 1
) l(maxcolname, maxcolval);
which produces the following output:
col1 | col2 | col3 | maxcolname | maxcolval
------+------+------+------------+-----------
1 | 2 | 3 | col3 | 3
5 | 8 | 6 | col2 | 8
I think this solution is much more elegant than the one presented in the linked question for MySQL.
As for having to manually write the code, unfortunately, I do not think you can avoid that.
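(One hypothetical workaround for very wide tables: generate the text of that VALUES list from the catalog instead of typing it. The table name 't' and the 'col%' pattern below are placeholders for your own schema.)
-- emits the text of the VALUES list for all matching columns of t
select string_agg(format('(%L, %I)', column_name, column_name),
                  ', ' order by ordinal_position)
from information_schema.columns
where table_name = 't'
  and column_name like 'col%';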
In Postgres 9.5 you can use jsonb functions to get the column names, so you do not have to write out all the column names manually. The solution needs a primary key (or a unique column) for proper grouping:
create table a_table(id serial primary key, col1 int, col2 int, col3 int);
insert into a_table (col1, col2, col3) values (1,2,3), (5,8,6);
select distinct on(id) id, key, value
from a_table t, jsonb_each(to_jsonb(t))
where key <> 'id'
order by id, value desc;
id | key | value
----+------+-------
1 | col3 | 3
2 | col2 | 8
(2 rows)

MySQL Select if field is unique or null

Sorry, I can't find an example anywhere, mainly because I can't think of any other way to explain it that doesn't include DISTINCT or UNIQUE (which I've found to be misleading terms in SQL).
I need to select unique values AND null values from one table.
FLAVOURS:
id | name  | flavour
---+-------+------------
 1 | mark  | chocolate
 2 | cindy | chocolate
 3 | rick  |
 4 | dave  |
 5 | jenn  | vanilla
 6 | sammy | strawberry
 7 | cindy | chocolate
 8 | rick  |
 9 | dave  |
10 | jenn  | caramel
11 | sammy | strawberry
I want the kids who have a unique flavour (vanilla, caramel) and the kids who don't have any flavour.
I don't want the kids with duplicate flavours (chocolate, strawberry).
My searches for help always return an answer for how to GROUP BY, UNIQUE and DISTINCT for chocolate and strawberry. That's not what I want. I don't want any repeated terms in a field - I want everything else.
What is the proper MySQL select statement for this?
Thanks!
You can use HAVING to select just some of the groups, so to select the groups where there is only one flavor, you use:
SELECT * from my_table GROUP BY flavour HAVING COUNT(*) = 1
If you then want to select those users that have NULL entries, you use
SELECT * FROM my_table WHERE flavour IS NULL
and if you combine them, you get all entries that either have a unique flavor, or NULL.
SELECT * from my_table GROUP BY flavour HAVING COUNT(*) = 1 AND flavour IS NOT NULL
UNION
SELECT * FROM my_table WHERE flavour IS NULL
I added the "flavour IS NOT NULL" just to ensure that a NULL flavour is not picked if it happens to be the only one of its kind, which would generate a duplicate.
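Against the sample FLAVOURS data above, the combined query should return something like this (row order may vary):
id | name | flavour
---+------+---------
 5 | jenn | vanilla
10 | jenn | caramel
 3 | rick |
 4 | dave |
 8 | rick |
 9 | dave |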
I don't have a database to hand, but you should be able to use a query along the lines of:
SELECT name FROM FLAVOURS
WHERE flavour IN ( SELECT flavour
                   FROM FLAVOURS
                   GROUP BY flavour
                   HAVING COUNT(flavour) = 1 )
   OR flavour IS NULL;
I apologise if this isn't quite right, but hopefully is a good start.
You need a self-join that looks for duplicates, and then you need to veto those duplicates by looking for cases where there was no match (that's the WHERE t2.flavor IS NULL). Then you're doing something completely different, looking for nulls in the original table, with the second line in the WHERE clause (OR t1.flavor IS NULL).
SELECT DISTINCT t1.name, t1.flavor
FROM tablename t1
LEFT JOIN tablename t2
ON t2.flavor = t1.flavor AND t2.ID <> t1.ID
WHERE t2.flavor IS NULL
OR t1.flavor IS NULL
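Against the FLAVOURS sample above (adapting tablename and the flavor spelling to your actual schema), this should return something like:
name | flavor
-----+---------
jenn | vanilla
jenn | caramel
rick |
dave |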
I hope this helps.

phantom "name" column?

I start simple:
hoops=# select * from core_school limit 3;
id | school_name | nickname
----+------------------+----------
1 | Marshall |
2 | Ohio |
3 | Houston |
(3 rows)
Let's introduce an intentional error:
hoops=# select name from core_school;
ERROR: column "name" does not exist
LINE 1: select name from core_school;
But why does this work? (with an unexpected result!):
hoops=# select core_school.name from core_school limit 3;
name
-----------------
(1,Marshall,"")
(2,Ohio,"")
(3,Houston,"")
(3 rows)
Where did the "name" column come from in the third query?
This is PostgreSQL's autocast feature, which allows calling function(argument) as argument.function.
What you are really calling is
SELECT NAME(core_school)
FROM core_school
Compare to this:
SELECT (1::int).exp
--
2.71828182845905
which is quite self-explanatory.
This "feature" very often leads to confusion and will (finally) be removed in 9.1.
Maybe you have a different version of Postgres than I do. (I've got 8.3.7.) But I don't have any such "phantom" name column.
If you simply say "select core_school from core_school" you'll get one line of output for each row in the table, with that line consisting of a composite (row) value containing all the columns of the table. That's what you're seeing.
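That is (a sketch against the sample table above; the alias is just for clarity):
hoops=# select core_school as whole_row from core_school limit 3;
whole_row
-----------------
(1,Marshall,"")
(2,Ohio,"")
(3,Houston,"")
(3 rows)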
Oh, I notice that you're getting a column name of "name". Maybe you didn't really put a period between "core_school" and "name" but a space, and now "name" is an alias for the column. (My Postgres installation requires the word "as" to make an alias for a column name, but some databases do not require this, so maybe there's an option in Postgres somewhere for compatibility.)

TSQL Select comma list to rows

How do I turn a comma-separated list stored in one field into rows, displayed in a column?
For example,
ID | Colour
------------
1 | 1,2,3,4,5
to:
ID | Colour
------------
1 | 1
1 | 2
1 | 3
1 | 4
1 | 5
The usual way to solve this is to create a split function. You can grab one from Google; for example, this one from SQL Team. Once you have created the function, you can use it like:
create table colours (id int, colour varchar(255))
insert colours values (1, '1,2,3,4,5')

select colours.id
     , split.data
from colours
cross apply dbo.Split(colours.colour, ',') as split
This prints:
id data
1 1
1 2
1 3
1 4
1 5
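If you'd rather not fetch one, a minimal loop-based splitter along these lines should work (a sketch only; the SQL Team version linked above is more robust, and loop-based splitters are slow on long strings):
CREATE FUNCTION dbo.Split (@list VARCHAR(8000), @delim CHAR(1))
RETURNS @out TABLE (data VARCHAR(8000))
AS
BEGIN
    DECLARE @pos INT
    SET @pos = CHARINDEX(@delim, @list)
    WHILE @pos > 0
    BEGIN
        INSERT @out VALUES (LEFT(@list, @pos - 1))        -- element before the delimiter
        SET @list = SUBSTRING(@list, @pos + 1, LEN(@list)) -- drop it and keep going
        SET @pos = CHARINDEX(@delim, @list)
    END
    INSERT @out VALUES (@list)                            -- trailing element
    RETURN
END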
Another possible workaround is to use XML (assuming you are working with SQL Server 2005 or greater):
DECLARE @s TABLE
(
    ID INT
    , COLOUR VARCHAR(MAX)
)
INSERT INTO @s
VALUES ( 1, '1,2,3,4,5' )

SELECT s.ID, T.Colour.value('.', 'int') AS Colour
FROM ( SELECT ID
            , CONVERT(XML, '<row>' + REPLACE(Colour, ',', '</row><row>') + '</row>') AS Colour
       FROM @s a
     ) s
CROSS APPLY s.Colour.nodes('row') AS T(Colour)
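(On SQL Server 2016 and later, the built-in STRING_SPLIT function makes both the custom function and the XML trick unnecessary; it needs database compatibility level 130 or higher:)
SELECT c.id, s.value AS colour
FROM colours c
CROSS APPLY STRING_SPLIT(c.colour, ',') s;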
I know this is an older post but thought I'd add an update. Tally-table and cteTally-based splitters all have a major problem: they use concatenated delimiters, and that kills their speed when the elements get wider and the strings get longer.
I've fixed that problem and wrote an article about it, which may be found at the following URL: http://www.sqlservercentral.com/articles/Tally+Table/72993/
The new method blows the doors off of all While Loop, Recursive CTE, and XML methods for VARCHAR(8000).
I'll also tell you that a fellow by the name of "Peter" made an improvement even to that code (in the discussion for the article). The article is still interesting and I'll be updating the attachments with Peter's enhancements in the next day or two. Between my major enhancement and the tweak Peter made, I don't believe you'll find a faster T-SQL-only solution for splitting VARCHAR(8000). I've also solved the problem for this breed of splitters for VARCHAR(MAX) and am in the process of writing an article for that, as well.