Updating Group Number based on swapped records in PostgreSQL

I have two columns, like below:
Ref Comp
A B
B A
My data looks like this, with the values swapped between rows. I need to give both records the same group number in a separate column, as shown below, because in our case the two records are equivalent. Please suggest a solution.
GROUP REF COMP
1 A B
1 B A

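You can use the window function dense_rank(). In the over(...) clause use only an order by (no partition by), ordering on least(ref,comp), greatest(ref,comp), so that a row and its swapped twin get the same sort key and therefore the same rank: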
select dense_rank() over( order by least(ref,comp), greatest(ref,comp) ) as "Group"
, ref
, comp
from <your_table>
order by "Group", least(ref,comp);
For the demo, I added a couple of additional data rows. I seldom trust a test result set that contains only one basic item, in this case a single "Group".
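To see the normalized-pair trick in action, here is a minimal sketch that runs the same query through Python's built-in sqlite3 module. It assumes SQLite 3.25 or newer (needed for window functions). SQLite has no LEAST/GREATEST, so its two-argument scalar min()/max() stand in for them here. The table name swaps, the alias grp, and every row beyond the A/B pair are hypothetical.

```python
import sqlite3

# Hypothetical data: (A, B)/(B, A) is the pair from the question,
# the other rows are made-up extras so more than one group appears.
rows = [("A", "B"), ("B", "A"), ("C", "D"), ("D", "C"), ("E", "F")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE swaps (ref TEXT, comp TEXT)")
conn.executemany("INSERT INTO swaps VALUES (?, ?)", rows)

# Two-argument min()/max() are scalar functions in SQLite and play the
# role of PostgreSQL's least()/greatest(): (B, A) normalizes to (A, B).
query = """
SELECT dense_rank() OVER (ORDER BY min(ref, comp), max(ref, comp)) AS grp,
       ref,
       comp
FROM swaps
ORDER BY grp, ref
"""
result = conn.execute(query).fetchall()
for grp, ref, comp in result:
    print(grp, ref, comp)
# -> 1 A B
# -> 1 B A
# -> 2 C D
# -> 2 D C
# -> 3 E F
```

Because dense_rank leaves no gaps, the group numbers stay consecutive no matter how many rows share a key.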

Related

How to limit to just one result per condition when looking through multiple OR/IN conditions in the WHERE clause (Postgresql)

For Example:
SELECT * FROM Customers
WHERE Country IN ('Germany', 'France', 'UK')
I want to LIMIT 1 for each of the countries in my IN clause, so that I see a total of only 3 rows: one customer per country (1 German, 1 French, 1 UK). Is there a simple way to do that?
Normally, a simple GROUP BY would suffice for this type of problem; however, because you want to include ALL of the columns in the result, we can use the ROW_NUMBER() window function to provide a value to filter on.
As a general rule it is important to specify the column to sort on (ORDER BY) for all windowing or paged queries to make the result repeatable.
As no schema has been supplied, I have used Name as the field to sort on for the window, please update that (or the question) with any other field you would like, the PK is a good candidate if you have nothing else to go on.
SELECT * FROM
(
SELECT *
, ROW_NUMBER() OVER(PARTITION BY Country ORDER BY Name) AS _rn
FROM Customers
WHERE Country IN ('Germany', 'France', 'UK')
) AS t
WHERE _rn = 1
The PARTITION BY makes ROW_NUMBER restart at 1 for each distinct Country value, so selecting only the rows whose row number (aliased as _rn) is 1 returns exactly one row per country. (PostgreSQL before version 16 requires the alias on the subquery in FROM, hence the AS t.)
The Country filter could have been placed in the outer query instead, if you prefer, but ROW_NUMBER() can only be used in the SELECT or ORDER BY clauses of a query, so to filter on it we are forced to wrap the query in a subquery (or a CTE).
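Here is a minimal, runnable sketch of the same pattern using Python's sqlite3 module (SQLite 3.25+ for window functions). The Customers rows are hypothetical, since the question gave no schema; Name is used for ordering exactly as in the query above, and the final ORDER BY is added only to make the output deterministic.

```python
import sqlite3

# Hypothetical Customers data; "USA" is there to show the IN filter working.
customers = [
    ("Anna", "Germany"), ("Bernd", "Germany"),
    ("Claire", "France"), ("Denis", "France"),
    ("Emma", "UK"), ("Frank", "USA"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (Name TEXT, Country TEXT)")
conn.executemany("INSERT INTO Customers VALUES (?, ?)", customers)

# Number the rows within each country by Name, then keep row 1 of each.
query = """
SELECT Name, Country FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY Country ORDER BY Name) AS _rn
    FROM Customers
    WHERE Country IN ('Germany', 'France', 'UK')
) AS t
WHERE _rn = 1
ORDER BY Country
"""
result = conn.execute(query).fetchall()
print(result)
# -> [('Claire', 'France'), ('Anna', 'Germany'), ('Emma', 'UK')]
```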

Top N rows by group in ClickHouse

What is the proper way to query top N rows by group in ClickHouse?
Let's take the example of a table tbl with columns id2, id4, v3 and N=2.
I tried the following
SELECT
id2,
id4,
v3 AS v3
FROM tbl
GROUP BY
id2,
id4
ORDER BY v3 DESC
LIMIT 2 BY
id2,
id4
but I get this error:
Received exception from server (version 19.3.4):
Code: 215. DB::Exception: Received from localhost:9000, 127.0.0.1. DB::Exception: Column v3 is not under aggregate function and not in GROUP BY..
I could put v3 into the GROUP BY, and that does seem to work, but grouping by a metric is not efficient.
There is the any aggregate function, but we actually want all values (limited to 2 by the LIMIT BY clause), not just any one value, so it doesn't seem to be the proper solution here:
SELECT
id2,
id4,
any(v3) AS v3
FROM tbl
GROUP BY
id2,
id4
ORDER BY v3 DESC
LIMIT 2 BY
id2,
id4
You can use aggregate functions like this:
SELECT
id2,
id4,
arrayJoin(arraySlice(arrayReverseSort(groupArray(v3)), 1, 2)) v3
FROM tbl
GROUP BY
id2,
id4
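To make the array pipeline concrete, here is a rough Python equivalent of what that query does for each (id2, id4) group. It illustrates the logic only and is not ClickHouse code; the sample rows and the function name top_n_per_group are hypothetical.

```python
from collections import defaultdict

# Hypothetical (id2, id4, v3) rows standing in for tbl.
rows = [
    ("a", 1, 5), ("a", 1, 9), ("a", 1, 7),
    ("b", 2, 3), ("b", 2, 8),
    ("c", 3, 4),
]


def top_n_per_group(rows, n=2):
    """Return up to n largest v3 values per (id2, id4), one row per value."""
    # groupArray(v3): collect every v3 for each (id2, id4) group
    groups = defaultdict(list)
    for id2, id4, v3 in rows:
        groups[(id2, id4)].append(v3)

    result = []
    for (id2, id4), values in groups.items():
        # arrayReverseSort, then arraySlice(..., 1, n): keep the n largest
        top = sorted(values, reverse=True)[:n]
        # arrayJoin: expand the array back into one row per element
        result.extend((id2, id4, v3) for v3 in top)
    return result


print(top_n_per_group(rows))
# -> [('a', 1, 9), ('a', 1, 7), ('b', 2, 8), ('b', 2, 3), ('c', 3, 4)]
```

Note that ClickHouse's arraySlice uses a 1-based offset, which is why arraySlice(..., 1, 2) corresponds to the Python slice [:2].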
You can also do it the way you would do it in "normal" SQL as described in this thread
While Vladimir's solution works for many cases, it didn't work for mine. I have a table that looks like this:
column | group by
-------+---------
A      | Yes
B      | Yes
C      | No
Now imagine that column A identifies the user and column B stands for whatever action a user could do, e.g. on your website or in your online game. Column C is how often the user has done this particular action. Vladimir's solution would let me get columns A and C, but not the action the user has done (column B), meaning I would know how often a user has done something, but not what.
The reason is that it doesn't make sense to group by both A and B: every row would be its own group, so you can't find the top K rows, because every group has only one member and the result is just the table you query against. If instead you group only by A, you can apply Vladimir's solution but you get only columns A and C. You can't output column B because it's not part of the GROUP BY, as explained above.
If you want the top 2 (or top 5, or top 100) actions per user, you might look for a solution like this:
SELECT rs.id2, rs.id4, rs.v3
FROM (
SELECT id2, id4, v3, row_number()
OVER (PARTITION BY id2, id4 ORDER BY v3 DESC) AS Rank
FROM tbl
) rs WHERE Rank <= 2
Note: on ClickHouse versions where window functions are still experimental, you have to set allow_experimental_window_functions = 1 to use this.
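The row_number approach is standard SQL, so it can be tried outside ClickHouse as well. This sketch runs it in SQLite through Python's sqlite3 module (SQLite 3.25+ is needed for window functions). The data is hypothetical, the alias Rank is renamed rn here (a cosmetic change), and the final ORDER BY only makes the output deterministic.

```python
import sqlite3

# Hypothetical rows for tbl(id2, id4, v3).
rows = [
    ("a", 1, 5), ("a", 1, 9), ("a", 1, 7),
    ("b", 2, 3), ("b", 2, 8),
    ("c", 3, 4),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id2 TEXT, id4 INTEGER, v3 INTEGER)")
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", rows)

# Number rows inside each (id2, id4) partition by descending v3,
# then keep the first two of every partition.
query = """
SELECT rs.id2, rs.id4, rs.v3
FROM (
    SELECT id2, id4, v3,
           row_number() OVER (PARTITION BY id2, id4 ORDER BY v3 DESC) AS rn
    FROM tbl
) rs
WHERE rn <= 2
ORDER BY rs.id2, rs.id4, rs.v3 DESC
"""
result = conn.execute(query).fetchall()
print(result)
# -> [('a', 1, 9), ('a', 1, 7), ('b', 2, 8), ('b', 2, 3), ('c', 3, 4)]
```

Unlike the groupArray version, the inner query can carry any other columns along (such as the action column B from the example above), because nothing is aggregated.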

How can I combine two PIVOTs that use different aggregate elements and the same spreading/grouping elements into a single row per ID?

I couldn't find an exact duplicate question, so please point me to one if you know of one.
https://i.stack.imgur.com/Xjmca.jpg
See the screenshot (sorry for the link; I don't have enough rep to embed images). In the table I have ID, Cat, Awd, and Xmit.
I want a resultset where each row is a distinct ID plus the aggregate Awd and Xmit amounts for each Cat (so four add'l columns per ID).
Currently I'm using two CTEs, one to aggregate each of Awd and Xmit. Both make use of the PIVOT operator, using Cat to spread and ID to group. After each CTE does its thing, I'm INNER JOINing them on ID.
WITH CTE1 (ID, P_Awd, G_Awd) AS (
SELECT ...
FROM Table
PIVOT (SUM(Awd) FOR Cat IN ([P], [G])) AS pvt1
),
CTE2 ([same as CTE1 but replace "Awd" with "Xmit"])
SELECT CTE1.ID, P_Awd, P_Xmit, G_Awd, G_Xmit
FROM CTE1 INNER JOIN CTE2 ON CTE1.ID = CTE2.ID
The output of this (greatly simplified) is two rows per ID, with each row holding the resultset of one CTE or the other.
What am I overlooking? Am I overcomplicating this?
Here is one method, via a CROSS APPLY.
This assumes you don't need dynamic SQL.
Example
Select *
From (
Select ID
,B.*
From YourTable A
Cross Apply ( values (cat+'_Awd',Awd)
,(cat+'_Xmit',Xmit)
) B(Item,Value)
) src
Pivot (sum(Value) for Item in ([P_Awd],[P_XMit],[G_Awd],[G_XMit]) ) pvt
Returns (limited set; it's best not to use images for sample data)
ID P_Awd P_XMit G_Awd G_XMit
1 1000 500 1000 0
2 2000 1500 500 500
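The CROSS APPLY answer works in two stages: unpivot each row into (ID, Item, Value) rows, then pivot those back out with SUM, so only ID remains as the grouping column. This plain-Python sketch mirrors the two stages to show why one row per ID comes out. The sample rows are hypothetical (the real data was only in a screenshot), chosen so the totals match the result shown above; as with PIVOT, an Item with no source rows comes out as NULL (None).

```python
# Hypothetical (ID, Cat, Awd, Xmit) rows; the real data was only in a screenshot.
rows = [
    (1, "P", 600, 500), (1, "P", 400, 0), (1, "G", 1000, 0),
    (2, "P", 2000, 1500), (2, "G", 500, 500),
]

# The column list from: Pivot (sum(Value) for Item in ([P_Awd], ...))
ITEMS = ["P_Awd", "P_Xmit", "G_Awd", "G_Xmit"]


def unpivot(rows):
    # Like CROSS APPLY (VALUES (cat+'_Awd', Awd), (cat+'_Xmit', Xmit)):
    # every source row becomes two (ID, Item, Value) rows.
    for id_, cat, awd, xmit in rows:
        yield id_, cat + "_Awd", awd
        yield id_, cat + "_Xmit", xmit


def pivot(triples):
    # Like PIVOT (SUM(Value) FOR Item IN (...)): one output row per ID.
    table = {}
    for id_, item, value in triples:
        if item not in ITEMS:
            continue  # PIVOT ignores values not named in its IN list
        row = table.setdefault(id_, dict.fromkeys(ITEMS))  # None plays SQL NULL
        row[item] = value if row[item] is None else row[item] + value
    return [(id_, *(table[id_][k] for k in ITEMS)) for id_ in sorted(table)]


for line in pivot(unpivot(rows)):
    print(line)
# -> (1, 1000, 500, 1000, 0)
# -> (2, 2000, 1500, 500, 500)
```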

Condensing Left Join Result Set into one row

I have a SQL query which, due to a couple of LEFT JOINs, currently returns multiple rows:
Id Action Group
12345 NULL NULL
12345 ADD NULL
12345 NULL ABC Group
How do I go about condensing these 3 rows into one e.g.
12345 ADD ABC Group
The constraints of your question are a bit unclear. If for every ID, there is only one possible non-NULL value of the other columns, you could use MIN or MAX to pull those values out. For example, put the query with the LEFT JOINS into a CTE, then do
SELECT ID, MAX(col2),MAX(col3)
FROM CTE
GROUP BY ID
If there is the potential for multiple non-NULL values per column, you will need to be more specific about what you would want the output to look like.
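A small sketch of the MAX approach, run in SQLite through Python's sqlite3 module. The table joined stands in for the CTE holding the LEFT JOIN output, and the column names action_name and group_name are hypothetical substitutes for Action and Group (GROUP is a reserved word). It relies on MAX ignoring NULLs, so each column keeps its single non-NULL value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "joined" stands in for the CTE that holds the LEFT JOIN output.
conn.execute("CREATE TABLE joined (id INTEGER, action_name TEXT, group_name TEXT)")
conn.executemany(
    "INSERT INTO joined VALUES (?, ?, ?)",
    [(12345, None, None), (12345, "ADD", None), (12345, None, "ABC Group")],
)

# MAX skips NULLs, so each column collapses to its one non-NULL value.
query = """
SELECT id, MAX(action_name) AS action_name, MAX(group_name) AS group_name
FROM joined
GROUP BY id
"""
result = conn.execute(query).fetchall()
print(result)
# -> [(12345, 'ADD', 'ABC Group')]
```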

Calculate Mode - "Highest frequency row" DB2

What would be the most efficient way to calculate the mode across tables with joins in DB2?
I am trying to get the value with the highest frequency (count) for a given column (ID, a candidate key for the joined table) on a given date.
The idea is to get the most common value from the table, which has different values for some accounts (for the same ID and date). We need to make it unique for use in another table.
You can use common table expressions (CTEs), introduced by WITH, to break the logic down into steps. First we build the summary rows, then we assign a ranking to the rows within each group, then we pick out the ones with the highest count of records.
Let's say we want to know which flavor of each item sells the most frequently on each date (perhaps assuming a record is quantity one).
WITH s as
(
SELECT itemID, saleDate, flavor, count(*) as tally
FROM sales
GROUP BY itemID, saleDate, flavor
), r as
(
SELECT itemID, saleDate, flavor, tally,
RANK() OVER (PARTITION BY itemID, saleDate ORDER BY tally desc) as pri
FROM s
)
SELECT itemID, saleDate, flavor, tally
FROM r
WHERE pri = 1
Here the names "s" and "r" refer to the result sets of their respective CTEs. These names can then be used to represent a table in another part of the statement.
The pri column holds the RANK() of the tally value of each summary row from the first CTE "s", within the window of itemID and saleDate. Tally is sorted descending because we want the largest value first, and that value gets a RANK() of 1. Then in the main SELECT we simply pick the summary records that were first in their partition.
By using RANK() or DENSE_RANK() we could get back multiple flavors for an itemID and saleDate if they are tied for first place. This could be avoided by replacing RANK() with ROW_NUMBER(), but that would arbitrarily pick one of the tied flavors as the winner, which may not be the correct answer for the problem at hand.
If we had a sales quantity column in the table, we could replace COUNT(*) with SUM(salesqty) and find what had sold the most units.
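The tie behavior described above can be checked with a small sketch that runs the two CTEs in SQLite through Python's sqlite3 module (SQLite 3.25+), swapping only the ranking function. This is not DB2, and the sales rows are hypothetical; item 2 deliberately has two flavors tied for first place.

```python
import sqlite3

# Hypothetical sales rows (one row = one unit sold).
sales = [
    (1, "2024-01-01", "vanilla"), (1, "2024-01-01", "vanilla"),
    (1, "2024-01-01", "chocolate"),
    (2, "2024-01-01", "mint"), (2, "2024-01-01", "lemon"),
]

MODE_QUERY = """
WITH s AS (
    SELECT itemID, saleDate, flavor, COUNT(*) AS tally
    FROM sales
    GROUP BY itemID, saleDate, flavor
), r AS (
    SELECT itemID, saleDate, flavor, tally,
           {rank_fn}() OVER (PARTITION BY itemID, saleDate ORDER BY tally DESC) AS pri
    FROM s
)
SELECT itemID, saleDate, flavor, tally
FROM r
WHERE pri = 1
ORDER BY itemID, flavor
"""


def mode_rows(rank_fn):
    # Only allow known ranking functions, since the name is pasted into SQL.
    assert rank_fn in ("RANK", "DENSE_RANK", "ROW_NUMBER")
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (itemID INTEGER, saleDate TEXT, flavor TEXT)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", sales)
    return conn.execute(MODE_QUERY.format(rank_fn=rank_fn)).fetchall()


print(mode_rows("RANK"))
# -> [(1, '2024-01-01', 'vanilla', 2), (2, '2024-01-01', 'lemon', 1), (2, '2024-01-01', 'mint', 1)]
print(mode_rows("ROW_NUMBER"))
```

With RANK both tied flavors for item 2 come back; with ROW_NUMBER only one of them does, and which one is not guaranteed.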