RE: Greatest Value with column name - postgresql

Greatest value of multiple columns with column name?
I was reading the question above (link above) and its accepted answer (which seems correct), and I have several questions about that answer.
(Sorry I have to create a new post; I don't have a high enough reputation to comment on the old post, as it is very old.)
Questions
My first question is: what is the significance of "@var_max_val :="? I reran the query without it and everything ran fine.
My second question is: can someone explain how this achieves its desired result:
CASE @var_max_val WHEN col1 THEN 'col1'
WHEN col2 THEN 'col2'
...
END AS max_value_column_name
My third question is as follows:
It seems that in this CASE statement he has to manually write a line of code ("WHEN x THEN y") for every column in the table. This is fine if you have 1-5 columns, but what if you had 10,000? How would you go about it?
PS: I might be violating some forum rules in this post, do let me know if I am.
Thank you for reading, and thank you for your time!

The linked question is about MySQL, so it does not apply to PostgreSQL (e.g. the @var_max_val := syntax is specific to MySQL). To accomplish the same thing in PostgreSQL you can use a LATERAL subquery. For example, suppose that you have the following table and sample data:
CREATE TABLE t(col1 int, col2 int, col3 int);
INSERT INTO t VALUES (1,2,3), (5,8,6);
Then you can identify the maximum column for each row with the following query:
SELECT *
FROM t, LATERAL (
  VALUES ('col1', col1), ('col2', col2), ('col3', col3)
  ORDER BY 2 DESC
  LIMIT 1
) l(maxcolname, maxcolval);
which produces the following output:
 col1 | col2 | col3 | maxcolname | maxcolval
------+------+------+------------+-----------
    1 |    2 |    3 | col3       |         3
    5 |    8 |    6 | col2       |         8
I think this solution is much more elegant than the one presented in the linked question for MySQL.
As for having to manually write the code, unfortunately, I do not think you can avoid that.
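Side note (not part of the answer above): if you only need the greatest value itself, without the column name, PostgreSQL's built-in GREATEST is enough. A minimal sketch against the same sample table t:
-- returns only the per-row maximum value, not the column it came from
SELECT *, GREATEST(col1, col2, col3) AS maxcolval
FROM t;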

In Postgres 9.5 you can use jsonb functions to get the column names, so you do not have to write out all the column names manually. The solution needs a primary key (or a unique column) for proper grouping:
create table a_table(id serial primary key, col1 int, col2 int, col3 int);
insert into a_table (col1, col2, col3) values (1,2,3), (5,8,6);
select distinct on(id) id, key, value
from a_table t, jsonb_each(to_jsonb(t))
where key <> 'id'
order by id, value desc;
 id | key  | value
----+------+-------
  1 | col3 |     3
  2 | col2 |     8
(2 rows)
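A small variation on the same idea (untested sketch): strip the id key with the jsonb "-" operator before expanding, instead of filtering it out in the WHERE clause:
-- to_jsonb(t) - 'id' removes the id key, so no WHERE filter is needed
select distinct on(id) id, key, value
from a_table t, jsonb_each(to_jsonb(t) - 'id')
order by id, value desc;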

Related

Why does using DISTINCT ON () at different points in a query return different (unintuitive) results?

I’m querying from a table that has repeated uuids, and I want to remove duplicates. I also want to exclude some irrelevant data which requires joining on another table. I can remove duplicates and then exclude irrelevant data, or I can switch the order and exclude then remove duplicates. Intuitively, I feel like if anything, removing duplicates then joining should produce more rows than joining and then removing duplicates, but that is the opposite of what I’m seeing. What am I missing here?
In this one, I remove duplicates in the first subquery and filter in the second, and I get 500k rows:
with tbl1 as (
  select distinct on (uuid) uuid, foreign_key
  from original_data
  where date > some_date
),
tbl2 as (
  select uuid
  from tbl1
  left join other_data
    on tbl1.foreign_key = other_data.id
  where other_data.category <> something
)
select * from tbl2
If I filter then remove duplicates, I get 550k rows:
with tbl1 as (
  select uuid, foreign_key
  from original_data
  where date > some_date
),
tbl2 as (
  select uuid
  from tbl1
  left join other_data
    on tbl1.foreign_key = other_data.id
  where other_data.category <> something
),
tbl3 as (
  select distinct on (uuid) uuid
  from tbl2
)
select * from tbl3
Is there an explanation here?
Is original_data.foreign_key perhaps missing a foreign key constraint referencing other_data.id, allowing foreign_key values that don't link to any id in other_data?
Is the other_data.category or original_data.foreign_key column perhaps missing a NOT NULL constraint?
In either of these cases Postgres would filter out all records with
a missing link (foreign_key = null)
a broken link (foreign_key doesn't match any id in other_data)
a link to an other_data record with a category set to null
in both of your approaches - regardless of whether they're duplicates or not - because other_data.category <> something evaluates to null for them, which does not satisfy the WHERE clause. That, combined with the missing ORDER BY causing DISTINCT ON to drop a different duplicate each time, could result in dropping the duplicates that then get filtered out in tbl2 in the first approach, but not in the second.
Example:
pgsql122=# select * from original_data;
 uuid | foreign_key | comment
------+-------------+-----------------------------------------------------------------------------------
    1 |           1 | correct, non-duplicate record with a correct link
    3 |           2 | duplicate record with a broken link
    3 |           1 | duplicate record with a correct link
    4 |        null | duplicate record with a missing link
    4 |           1 | duplicate record with a correct link
    5 |           3 | duplicate record with a correct link, but a null category behind it
    5 |           1 | duplicate record with a correct link
    6 |        null | correct, non-duplicate record with a missing link
    7 |           2 | correct, non-duplicate record with a broken link
    8 |           3 | correct, non-duplicate record with a correct link, but a null category behind it
pgsql122=# select * from other_data;
 id | category
----+----------
  1 | a
  3 | null
Both of your approaches keep uuid 1 and eliminate uuid 6, 7 and 8 even though they're unique.
Your first approach randomly keeps between 0 and 3 out of the 3 pairs of duplicates (uuid 3, 4 and 5), depending on which one in each pair gets discarded by DISTINCT ON.
Your second approach always keeps one record for each of uuid 3, 4 and 5. Each duplicate with a missing link, a broken link, or a link to a null category is already gone by the time you discard duplicates.
As @a_horse_with_no_name suggested, ORDER BY should make DISTINCT ON consistent and predictable, but only as long as the records vary on the columns used for ordering. It also won't help if you have other issues, like the ones described above.
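If those rows should be kept rather than silently dropped, a hedged sketch (reusing the table and column names from the question) would be to use IS DISTINCT FROM, which treats NULL as an ordinary value, and to give DISTINCT ON an explicit ORDER BY:
with tbl1 as (
  select distinct on (uuid) uuid, foreign_key
  from original_data
  where date > some_date
  order by uuid, foreign_key nulls last  -- deterministic pick per uuid
)
select tbl1.uuid
from tbl1
left join other_data
  on tbl1.foreign_key = other_data.id
-- keeps missing links, broken links and null categories, drops only real matches
where other_data.category is distinct from something;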

PostgreSQL, CREATE TABLE AS with predefined column(s)

For the first time I have found a very handy way of importing "last year's data" into "this year's data".
This works well:
DROP TABLE IF EXISTS mytable;
CREATE TABLE mytable AS
SELECT col1, col2, col3, col4
FROM dblink('host=localhost port=xxxx user=xxxx password=xxxx dbname=mylastyeardb',
            'SELECT col1, col2, col3, col4
             FROM mytable
             WHERE TRIM(col1)<>'''' ')
     AS x(col1 text, col2 text, col3 text, col4 text);
ALTER TABLE mytable ADD COLUMN cols_id SERIAL PRIMARY KEY;
Since 'cols_id' from the old table is not appropriate for the new table, maybe some experienced users know how to set up the table in CREATE TABLE AS so that it has 'cols_id' as a (serial) primary key, nicely ordered and as the first column. Maybe that way I can avoid the second (ALTER) command?
Any other advice for the situation shown will be welcome too.
You either CREATE TABLE, defining its structure (with all the handy shortcuts and options in one statement), or CREATE TABLE AS SELECT, [partially] "inheriting" the structure. Thus, if you want a primary key, you will need an ALTER TABLE anyway...
To put id as the first column in one statement, you can simply use a dummy value, e.g. a sequential number:
t=# create table s as select row_number() over() as id,chr(n) from generate_series(197,200) n;
SELECT 4
t=# select * from s;
id | chr
----+-----
1 | Å
2 | Æ
3 | Ç
4 | È
(4 rows)
Of course after that you still need to create a sequence, assign its nextval as the default for the id column, and add a primary key on it. Which makes it even more statements than you have at the moment...
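For illustration, an untested sketch of those extra statements (assuming the table s created above) - roughly what SERIAL would have set up for you:
CREATE SEQUENCE s_id_seq OWNED BY s.id;
ALTER TABLE s ALTER COLUMN id SET DEFAULT nextval('s_id_seq');
SELECT setval('s_id_seq', (SELECT max(id) FROM s));  -- continue after the existing rows
ALTER TABLE s ADD PRIMARY KEY (id);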

Build a list of grouped values

I'm new to this page and this is the first time I've posted a question, so sorry for anything wrong. The question may be old, but I just can't find any answer for SQL Anywhere.
I have a table like
Order | Mark
======|========
1 | AA
2 | BB
1 | CC
2 | DD
1 | EE
I want to have a result like the following
Order | Mark
1 | AA,CC,EE
2 | BB,DD
My current SQL is
Select Order, Cast(Mark as NVARCHAR(20))
From #Order
Group by Order
and it just gives me a result that is completely the same as the original table.
Any idea for this?
You can use the ASA LIST() aggregate function (untested; you might need to enclose the Order column name in quotes, as it is also a reserved word):
SELECT Order, LIST( Mark )
FROM #Order
GROUP BY Order;
You can customize the separator character and the ordering if you need to.
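For example (untested sketch, with Order quoted since it is a reserved word), an explicit separator and ordering would look like:
SELECT "Order", LIST(Mark, ',' ORDER BY Mark)
FROM #Order
GROUP BY "Order";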
Note: it is rather a bad idea to
name your table or column like a regular SQL keyword (ORDER BY)
use the same name for both a column and a table (Order)

Grouping data in postgresql

If I have a table with multiple entries with the same name, I want to group only the name, i.e. show as many rows as are present in the table, but the name should appear only once while the other data still shows on its own rows, i.e. for the other rows the name should be blank:
table               expected result
----------------    ------------------
col1  col2          col1  col2
a     5             a     5
a     6                   6
a     8                   8
b     3             b     3
b     4                   4
I'm using PostgreSQL 9.2.
You could use row_number to determine the first occurrence of each group, and from there, it's just a case away from not displaying it:
SELECT CASE rn WHEN 1 THEN col1 ELSE NULL END AS col1, col2
FROM (SELECT col1,
             col2,
             ROW_NUMBER() OVER (PARTITION BY col1
                                ORDER BY col2 ASC) AS rn
      FROM my_table) t
ORDER BY t.col1, t.col2
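An alternative sketch (not from the answer above, against the same assumed table my_table): compare each row with the previous one via lag() instead of numbering the rows:
-- blank out col1 whenever it matches the previous row's col1
SELECT CASE WHEN col1 = lag(col1) OVER (ORDER BY col1, col2) THEN NULL ELSE col1 END AS col1,
       col2
FROM my_table
ORDER BY my_table.col1, my_table.col2;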
Firstly I need to say that I do not have experience in PostgreSQL, just some basic SQL knowledge. It is not right to change the data in the original table itself; what you want is a 'view' of the data. Usually such things are done after the data set is returned to the client: it is really a question of how to display the data (a presentation concern), and it should be handled on the client side rather than in the SQL query. But if you do want to burden the server with such things, I would do the following: create a copy of the table (it can be a temp table), then clear the values in col1 that are not the first within a subsequent select ordering the records by col2. By the way, your table does not have a primary key, so you will have a problem implementing that, since you can't identify the parent record within the subsequent select.
So the idea of achieving what you need on the client side (via a data cursor), traversing the records one by one, has even more points in its favour.

TSQL Select comma list to rows

How do I turn a comma-separated list stored in a single row into multiple rows, displayed in a column?
For example,
ID | Colour
------------
1 | 1,2,3,4,5
to:
ID | Colour
------------
1 | 1
1 | 2
1 | 3
1 | 4
1 | 5
The usual way to solve this is to create a split function. You can grab one from Google, for example this one from SQL Team. Once you have created the function, you can use it like:
create table colours (id int, colour varchar(255))
insert colours values (1, '1,2,3,4,5')

select colours.id
     , split.data
from colours
cross apply dbo.Split(colours.colour, ',') as split
This prints:
id data
1 1
1 2
1 3
1 4
1 5
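As a side note (not part of the original answer): on SQL Server 2016 or later, the built-in STRING_SPLIT function makes the custom split function unnecessary, though it does not guarantee the order of the returned elements:
select colours.id
     , s.value
from colours
cross apply string_split(colours.colour, ',') as s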
Another possible workaround is to use XML (assuming you are working with SQL Server 2005 or greater):
DECLARE @s TABLE
(
    ID INT
    , COLOUR VARCHAR(MAX)
)

INSERT INTO @s
VALUES ( 1, '1,2,3,4,5' )

SELECT s.ID, T.Colour.value('.', 'int') AS Colour
FROM ( SELECT ID
            , CONVERT(XML, '<row>' + REPLACE(Colour, ',', '</row><row>') + '</row>') AS Colour
       FROM @s a
     ) s
CROSS APPLY s.Colour.nodes('row') AS T(Colour)
I know this is an older post but thought I'd add an update. Tally Table and cteTally-based splitters all have a major problem: they use concatenated delimiters, and that kills their speed when the elements get wider and the strings get longer.
I've fixed that problem and wrote an article about it, which may be found at the following URL: http://www.sqlservercentral.com/articles/Tally+Table/72993/
The new method blows the doors off of all While Loop, Recursive CTE, and XML methods for VARCHAR(8000).
I'll also tell you that a fellow by the name of "Peter" made an improvement even to that code (in the discussion for the article). The article is still interesting and I'll be updating the attachments with Peter's enhancements in the next day or two. Between my major enhancement and the tweak Peter made, I don't believe you'll find a faster T-SQL-only solution for splitting VARCHAR(8000). I've also solved the problem for this breed of splitters for VARCHAR(MAX) and am in the process of writing an article for that as well.