Getting array aggregate of all modes for a group by result - postgresql

I have about 600k rows of, let's say, owner names (varchar) and pet types (also varchar). For each owner name I'd like an array with the most frequent pet type they have (or several pet types, if they tie for most frequent).
An example:
*owner, pet type*
alice, cat
alice, dog
bob, fish
bob, cat
bob, fish
eve, cat
eve, dog
eve, cat
eve, dog
Expected output:
alice, [cat, dog]
bob, [fish]
eve, [cat, dog]
My feeling is that this is some combination of 'distinct on' in an inner query with array_agg on an outer query to do the array aggregation - but I just can't get it right.

You can do this by combining window functions and grouping:
select owner, array_agg(pet order by pet)
from (
select owner, pet, dense_rank() over (partition by owner order by count(*) desc) as rnk
from pet
group by owner, pet
) t
where rnk = 1
group by owner
order by owner;
Online example: http://rextester.com/MTFIQ24341
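To make the logic of the query above easier to follow, here is a minimal Python sketch of the same idea: count rows per (owner, pet), then keep every pet whose count ties for the owner's maximum, which is what `dense_rank() ... = 1` selects. The function name `modes_per_owner` and the in-memory `rows` list are assumptions for illustration only; they are not part of the original answer.

```python
from collections import Counter, defaultdict

# Sample data from the question: (owner, pet type)
rows = [
    ("alice", "cat"), ("alice", "dog"),
    ("bob", "fish"), ("bob", "cat"), ("bob", "fish"),
    ("eve", "cat"), ("eve", "dog"), ("eve", "cat"), ("eve", "dog"),
]

def modes_per_owner(rows):
    """Return {owner: sorted list of that owner's most frequent pet types}."""
    counts = defaultdict(Counter)
    for owner, pet in rows:
        counts[owner][pet] += 1          # GROUP BY owner, pet with count(*)
    result = {}
    for owner, counter in counts.items():
        top = max(counter.values())      # the count that dense_rank() ranks first
        # keep all ties, sorted like array_agg(pet order by pet)
        result[owner] = sorted(p for p, n in counter.items() if n == top)
    return result

print(modes_per_owner(rows))
# {'alice': ['cat', 'dog'], 'bob': ['fish'], 'eve': ['cat', 'dog']}
```

Using `dense_rank()` rather than `row_number()` in the SQL is what keeps ties; the `n == top` filter plays that role here.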

with data as (
select 'alice' as owner, 'cat' pet_type
union all select 'alice' as owner, 'dog' pet_type
union all select 'bob' as owner, 'fish' pet_type
union all select 'bob' as owner, 'cat' pet_type
union all select 'bob' as owner, 'fish' pet_type
union all select 'eve' as owner, 'cat' pet_type
union all select 'eve' as owner, 'dog' pet_type
union all select 'eve' as owner, 'cat' pet_type
union all select 'eve' as owner, 'dog' pet_type
) , getMaxPet as (select owner , pet_type
from data d1
group by owner,pet_type
having count(pet_type) = (select max(pet_count) from (select count(pet_type) as pet_count
from data d2
where
d1.owner = d2.owner
group by owner,pet_type ) a ) )
select owner , array_agg(pet_type)
from getMaxPet
group by owner
Try this. The main logic is to count each pet type per owner, and then keep the pet types whose count equals that owner's maximum count.

Related

Using array_agg with multiple DISTINCT Columns

In this query, I'm listing all users in organization 123 but I also want a column showing which other teams they are on across all organizations.
My query right now will give me the team names but I'd also like to get the team id as well. The DISTINCT is necessary because the user may have different roles on the same team.
Bonus points if I can sort the teams by when the user was given a role, which currently gives an error as I have it now.
SELECT
users.*,
(
SELECT
to_json(array_agg(DISTINCT teams.name ORDER BY teams.name))
FROM roles r
INNER JOIN user_roles ur ON ur.role_id=r.id AND ur.user_id=users.id
INNER JOIN teams ON r.team_id=teams.id
-- ORDER BY r.created_at
) teams
FROM users
INNER JOIN user_roles ON user_roles.user_id=users.id
INNER JOIN roles ON roles.id = user_roles.role_id
WHERE roles.type = 'admin' AND roles.organization_id = 123
GROUP BY users.id
This returns:
name | teams
John Smith | ['Team 1', 'Team 2']
Jane Doe | ['Team 2', 'Team 3']
What I'd like to return is the team name with its primary key id:
name | teams
John Smith | {1: 'Team 1', 2: 'Team 2'}
Jane Doe | {2: 'Team 2', 3: 'Team 3'}
EDIT
Or better yet:
name | teams
John Smith | [{id: 1, name: 'Team 1'}, {id: 2, name: 'Team 2'}]
Jane Doe | [{id: 2, name: 'Team 2'}, {id: 3, name: 'Team 3'}]
Considering that your query is working fine, replace the following section of your query mentioned in question:
SELECT
to_json(array_agg(DISTINCT teams.name ORDER BY teams.name))
FROM roles r
INNER JOIN user_roles ur ON ur.role_id=r.id AND ur.user_id=users.id
INNER JOIN teams ON r.team_id=teams.id
-- ORDER BY r.created_at
with
(SELECT
json_object(array_agg(id::text order by created_at desc),
array_agg(name order by created_at desc)) from
( SELECT
DISTINCT on (teams.id) teams.id, teams.name , r.created_at
FROM roles r
INNER JOIN user_roles ur ON ur.role_id=r.id AND ur.user_id=users.id
INNER JOIN teams ON r.team_id=teams.id
ORDER BY teams.id, r.created_at -- DISTINCT ON expressions must lead the ORDER BY
)tab)
Here is how I ended up solving it, building off of @akhilesh's answer.
SELECT
json_object(
array_agg(id :: text ORDER BY created_at DESC),
array_agg(name ORDER BY created_at DESC)
)
FROM
(
SELECT
*
FROM
(
SELECT
teams.id,
teams.name,
MAX(ur.created_at) created_at
FROM
roles r
INNER JOIN user_roles ur ON ur.role_id = r.id AND ur.user_id = users.id
INNER JOIN teams ON r.team_id=teams.id
GROUP BY
teams.id
) T
ORDER BY
T.created_at DESC
) teams
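As a rough illustration of what the queries above compute, the following Python sketch deduplicates teams, keeps the most recent `created_at` per team, sorts newest first, and emits the list-of-objects shape from the question's edit. The sample rows, the ISO date strings, and the function name `teams_json` are assumptions made up for this sketch; the real schema may differ.

```python
import json

# Hypothetical (team_id, team_name, created_at) rows for a single user.
# ISO-8601 date strings are used so that string comparison matches date order.
role_rows = [
    (1, "Team 1", "2020-01-05"),
    (2, "Team 2", "2020-03-01"),
    (1, "Team 1", "2020-02-10"),  # same team, a second role added later
]

def teams_json(role_rows):
    """Return a JSON array of {id, name} objects, newest role first."""
    latest = {}
    for team_id, name, created_at in role_rows:
        prev = latest.get(team_id)
        # MAX(created_at) ... GROUP BY teams.id
        if prev is None or created_at > prev[1]:
            latest[team_id] = (name, created_at)
    # ORDER BY created_at DESC
    ordered = sorted(latest.items(), key=lambda kv: kv[1][1], reverse=True)
    return json.dumps([{"id": tid, "name": name} for tid, (name, _) in ordered])

print(teams_json(role_rows))
# [{"id": 2, "name": "Team 2"}, {"id": 1, "name": "Team 1"}]
```

Grouping by team before ordering is the step that removes the conflict between DISTINCT and sorting by a column that is not in the output.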

Conversion of tabular data to JSON in Redshift

I am unable to figure out how to convert tabular data to JSON format and store it in another table in Redshift. For example, I have a "DEMO" table with four columns: pid,stid,item_id,trans_id.
For each combination of pid,stid,item_id there exist many trans_ids.
pid stid item_id trans_id :
1 , AB , P1 , T1
1 , AB , P1 , T2
1 , AB , P1 , T3
1 , AB , P1 , T4
2 , ABC , P2 , T5
2 , ABC , P2 , T6
2 , ABC , P2 , T7
2 , ABC , P2 , T8
I want to store this data in another table called "SAMPLE" as:
pid stid item_id trans_id
1 , AB , P1 , {"key1":"T1", "key2":"T2", "key3":"T3", "key4":"T4"}
2 , ABC , P2 , {"key1":"T5", "key2":"T6", "key3":"T7", "key4":"T8"}
I am unable to figure out how to load the data from "DEMO" to "SAMPLE" in JSON format only for column "trans_id" using a SQL query in Redshift. I don't want to use any intermediate files.
There is a LISTAGG aggregate function that lets you concatenate text values within groups. You can use it to build JSON objects as strings:
SELECT
pid
,stid
,item_id
,'{'||listagg(
'"key'||rn::varchar||'":"'||trans_id::varchar||'"'
,',') within group (order by rn)
||'}' as trans_json
FROM (
SELECT *, row_number() over (partition by pid,stid,item_id order by trans_id) as rn
FROM "DEMO"
) t
GROUP BY 1,2,3;
As a side note, in this particular case a comma-separated list of transaction IDs might work better: you can fetch the element at a given position with split_part, without needing keyN keys:
WITH tran_arrays as (
SELECT
pid
,stid
,item_id
,listagg(trans_id::varchar,',') within group (order by trans_id) as tran_array
FROM "DEMO"
GROUP BY 1,2,3
)
SELECT *
,split_part(tran_array,',',1) as first_element
FROM tran_arrays;
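For comparison, here is a small Python sketch of the transformation the LISTAGG query performs: group by (pid, stid, item_id), number the transactions within each group, and serialize them as a JSON object with key1, key2, and so on. The function name `to_json_rows` and the in-memory `demo` list are illustrative assumptions, and `json.dumps` is used here instead of string concatenation so that quoting and escaping are handled by the library.

```python
import json
from itertools import groupby

# Sample rows from the question: (pid, stid, item_id, trans_id)
demo = [
    (1, "AB", "P1", "T1"), (1, "AB", "P1", "T2"),
    (1, "AB", "P1", "T3"), (1, "AB", "P1", "T4"),
    (2, "ABC", "P2", "T5"), (2, "ABC", "P2", "T6"),
    (2, "ABC", "P2", "T7"), (2, "ABC", "P2", "T8"),
]

def to_json_rows(rows):
    """Return one (pid, stid, item_id, json_text) tuple per group."""
    out = []
    # Sorting the tuples orders by pid, stid, item_id, then trans_id.
    for group, members in groupby(sorted(rows), key=lambda r: r[:3]):
        # row_number() over (partition by ... order by trans_id)
        trans = {f"key{i}": r[3] for i, r in enumerate(members, start=1)}
        out.append(group + (json.dumps(trans),))
    return out

for row in to_json_rows(demo):
    print(row)
```

The key numbers restart at 1 for every group, which mirrors the PARTITION BY in the window function.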
This is very similar to the existing answer, with slight differences. The example runs on an Oracle database. I put the work into it and wanted to share it in case it helps someone else.
/* Oracle Example */
WITH demo_data AS
(
SELECT 1 AS pid, 'AB' AS stid, 'P1' AS item_id, 'T1' AS trans_id FROM dual UNION ALL
SELECT 1 AS pid, 'AB' AS stid, 'P1' AS item_id, 'T2' AS trans_id FROM dual UNION ALL
SELECT 1 AS pid, 'AB' AS stid, 'P1' AS item_id, 'T3' AS trans_id FROM dual UNION ALL
SELECT 1 AS pid, 'AB' AS stid, 'P1' AS item_id, 'T4' AS trans_id FROM dual UNION ALL
SELECT 2 AS pid, 'ABC' AS stid, 'P2' AS item_id, 'T5' AS trans_id FROM dual UNION ALL
SELECT 2 AS pid, 'ABC' AS stid, 'P2' AS item_id, 'T6' AS trans_id FROM dual UNION ALL
SELECT 2 AS pid, 'ABC' AS stid, 'P2' AS item_id, 'T7' AS trans_id FROM dual UNION ALL
SELECT 2 AS pid, 'ABC' AS stid, 'P2' AS item_id, 'T8' AS trans_id FROM dual
)
, transformData AS
(
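-- number the keys within each group so every group starts at key1
SELECT pid, stid, item_id, trans_id,
ROW_NUMBER() OVER (PARTITION BY pid, stid, item_id ORDER BY trans_id) AS keyNum
FROM demo_data
)
SELECT pid, stid, item_id
, '{'||
LISTAGG(CHR(34)||'key'||keynum||CHR(34)||':'||CHR(34)||trans_id||CHR(34), ',')
WITHIN GROUP (ORDER BY keynum)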
||'}' AS trans_id
FROM transformData
GROUP BY pid, stid, item_id
;
Output will look like this:

Postgres ORDER BY a CASE with DISTINCT in select

I have a query like the one below that fails with the error 'SELECT DISTINCT, ORDER BY expressions must appear in select list':
select distinct name
from fruits
order by case
when name = 'mango' then 1
else 2
end
The query should return 4 records, say
apple, mango, pear and grape
How can I make sure I always get mango as the first record, with the rest following? I tried using the CASE expression, but was not able to get the desired result. Any ideas will be appreciated.
I believe this should accomplish what you describe as needing.
select distinct
name,
case name when 'mango' then 1 else 2 end as fruitOrder
from fruits
order by
fruitOrder
If you need to always have 'mango' in first position, no matter the other rows, this could be a way:
with fruits(name) as (
select 'apple' union all
select 'mango' union all
select 'pear' union all
select 'grape'
)
select name
from fruits
order by case
when name = 'mango' then 1
else 2
end
If you need to add a DISTINCT, this should work:
select distinct name,
case
when name = 'mango' then 1
else 2
end orderCol
from fruits
order by orderCol
This will give you 'Mango' followed by the others in order:
WITH get_rows AS
(SELECT DISTINCT item_type
FROM the_item)
SELECT item_type
FROM
(SELECT 1 as seq, item_type
FROM get_rows
WHERE item_type = 'Mango'
UNION ALL
SELECT 2 as seq, item_type
FROM get_rows
WHERE item_type <> 'Mango') s
ORDER BY seq, item_type
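The answers above share one pattern: deduplicate first, then sort by an expression. A minimal way to try it is sketched below with Python's built-in sqlite3 module, used purely as a stand-in for PostgreSQL. SQLite does not raise the same error as PostgreSQL for the original query, so this only demonstrates the deduplicate-then-order pattern; the duplicate 'mango' row and the alias `d` are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fruits (name TEXT)")
conn.executemany(
    "INSERT INTO fruits VALUES (?)",
    [("apple",), ("mango",), ("pear",), ("grape",), ("mango",)],  # one duplicate
)

# DISTINCT runs in the derived table; the outer ORDER BY can then use any
# expression, because the outer SELECT is no longer a SELECT DISTINCT.
query = """
SELECT name
FROM (SELECT DISTINCT name FROM fruits) AS d
ORDER BY CASE WHEN name = 'mango' THEN 1 ELSE 2 END, name
"""
names = [row[0] for row in conn.execute(query)]
print(names)  # ['mango', 'apple', 'grape', 'pear']
```

The second sort key, `name`, makes the order of the remaining fruits deterministic instead of leaving it to the planner.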

Union Select Distinct syntax?

I have a huge table that contains both shipping address information and billing address information. I can get unique shipping and billing addresses in two separate tables with the following:
SELECT DISTINCT ShipToName, ShipToAddress1, ShipToAddress2, ShipToAddress3, ShipToCity, ShipToZipCode
FROM Orders
ORDER BY Orders.ShipToName
SELECT DISTINCT BillToName, BillToAddress1, BillToAddress2, BillToAddress3, BillToCity, BillToZipCode
FROM Orders
ORDER BY Orders.BillToName
How can I get the distinct intersection of the two? I am unsure of the syntax.
something like this?
SELECT DISTINCT
toname, addr1, addr2, addr3, city, zip
FROM
(SELECT DISTINCT
ShipToName AS toName,
ShipToAddress1 AS addr1,
ShipToAddress2 AS addr2,
ShipToAddress3 AS addr3,
ShipToCity AS city,
ShipToZipCode AS zip
FROM
Orders
UNION ALL
SELECT DISTINCT
BillToName AS toName,
BillToAddress1 AS addr1,
BillToAddress2 AS addr2,
BillToAddress3 AS addr3,
BillToCity AS city,
BillToZipCode AS zip
FROM
Orders) o
ORDER BY ToName
You say "Intersection" but you accepted the Union answer so I guess you just want the UNION DISTINCT. No need for derived tables and the three DISTINCT. You can use the simple:
SELECT
ShipToName AS Name,
ShipToAddress1 AS Address1,
ShipToAddress2 AS Address2,
ShipToAddress3 AS Address3,
ShipToCity AS City,
ShipToZipCode AS ZipCode
FROM
Orders
UNION --- UNION means UNION DISTINCT
SELECT
BillToName,
BillToAddress1,
BillToAddress2,
BillToAddress3,
BillToCity,
BillToZipCode
FROM
Orders
ORDER BY
Name ;
You can join both sets on all fields and this will return the records that match:
SELECT *
FROM Orders o1
INNER JOIN Orders o2
ON o1.ShipToName = o2.BillToName
AND o1.ShipToAddress1 = o2.BillToAddress1
AND o1.ShipToAddress2 = o2.BillToAddress2
AND o1.ShipToAddress3 = o2.BillToAddress3
AND o1.ShipToCity = o2.BillToCity
AND o1.ShipToZipCode = o2.BillToZipCode
Or you should be able to use INTERSECT:
SELECT ShipToName, ShipToAddress1, ShipToAddress2, ShipToAddress3, ShipToCity, ShipToZipCode
FROM Orders
INTERSECT
SELECT BillToName, BillToAddress1, BillToAddress2, BillToAddress3, BillToCity, BillToZipCode
FROM Orders
Or even a UNION query (UNION removes duplicates between two sets of data):
SELECT ShipToName, ShipToAddress1, ShipToAddress2, ShipToAddress3, ShipToCity, ShipToZipCode
FROM Orders
UNION
SELECT BillToName, BillToAddress1, BillToAddress2, BillToAddress3, BillToCity, BillToZipCode
FROM Orders
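Since the thread goes back and forth between "intersection" and "union", a small runnable comparison may help. The sketch below uses Python's sqlite3 module as a stand-in database and a cut-down, hypothetical `Orders` table with only name and city columns; the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced, hypothetical version of the Orders table from the question.
conn.execute(
    "CREATE TABLE Orders (ShipToName TEXT, ShipToCity TEXT,"
    " BillToName TEXT, BillToCity TEXT)"
)
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?, ?)", [
    ("Ann", "Oslo", "Ann", "Oslo"),
    ("Bob", "Rome", "Cat", "Lima"),
    ("Cat", "Lima", "Ann", "Oslo"),
])

def combine(set_operator):
    """Combine shipping and billing addresses with the given set operator."""
    sql = f"""
    SELECT ShipToName AS Name, ShipToCity AS City FROM Orders
    {set_operator}
    SELECT BillToName, BillToCity FROM Orders
    ORDER BY Name
    """
    return conn.execute(sql).fetchall()

union_rows = combine("UNION")          # every distinct address from either side
intersect_rows = combine("INTERSECT")  # only addresses used as both
print(union_rows)      # [('Ann', 'Oslo'), ('Bob', 'Rome'), ('Cat', 'Lima')]
print(intersect_rows)  # [('Ann', 'Oslo'), ('Cat', 'Lima')]
```

Both operators remove duplicates on their own, so no extra DISTINCT is needed in either branch.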

Make a column values header for rest of columns using TSQL

I have the following table:
ID | Group | Type | Product
1 | Dairy | Milk | Fresh Milk
2 | Dairy | Butter | Butter Cream
3 | Beverage | Coke | Coca cola
4 | Beverage | Diet | Dew
5 | Beverage | Juice | Fresh Juice
I need the following output/query result:
ID | Group | Type | Product
1 | Dairy | |
1 | | Milk | Fresh Milk
2 | | Butter | Butter Cream
2 | Beverage | |
1 | | Coke | Coca cola
2 | | Diet | Dew
3 | | Juice | Fresh Juice
For the sample above a hard-coded script can do the job, but I am looking for a dynamic script that works for any number of groups. I have no idea how it can be done, so I do not have a sample query yet. I need ideas or examples that at least point me in the right direction. PIVOT looks like a close option but does not seem to fit this case fully.
Here's a possible way. It basically unions the "Group-Headers" and the "Group-Items". The difficulty was to order them correctly.
WITH CTE AS
(
SELECT ID,[Group],Type,Product,
ROW_NUMBER() OVER (PARTITION BY [Group] Order By ID)AS RN
FROM Drink
)
SELECT ID,[Group],Type,Product
FROM(
SELECT RN AS ID,[Group],[Id]AS OriginalId,'' As Type,'' As Product, 0 AS RN, 'Group' As RowType
FROM CTE WHERE RN = 1
UNION ALL
SELECT RN AS ID,'' AS [Group],[Id]AS OriginalId,Type,Product, RN, 'Item' As RowType
FROM CTE
)X
ORDER BY OriginalId ASC
, CASE WHEN RowType='Group' THEN 0 ELSE 1 END ASC
, RN ASC
Here's a demo-fiddle: http://sqlfiddle.com/#!6/ed6ca/2/0
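The header-plus-items idea can also be sketched outside SQL. The Python version below produces the layout requested in the question: one numbered header row per group, followed by that group's items numbered from 1. It assumes that rows of the same group are contiguous when sorted by ID, as they are in the sample data; the function name `with_group_headers` is hypothetical.

```python
from itertools import groupby

# Sample rows from the question: (ID, Group, Type, Product)
rows = [
    (1, "Dairy", "Milk", "Fresh Milk"),
    (2, "Dairy", "Butter", "Butter Cream"),
    (3, "Beverage", "Coke", "Coca cola"),
    (4, "Beverage", "Diet", "Dew"),
    (5, "Beverage", "Juice", "Fresh Juice"),
]

def with_group_headers(rows):
    """Return a header row per group followed by its numbered item rows."""
    out = []
    ordered = sorted(rows, key=lambda r: r[0])  # ORDER BY ID
    grouped = groupby(ordered, key=lambda r: r[1])
    for group_no, (group, items) in enumerate(grouped, start=1):
        out.append((group_no, group, "", ""))   # the "Group-Header" row
        # ROW_NUMBER() OVER (PARTITION BY [Group] ORDER BY ID)
        for rn, (_id, _grp, type_, product) in enumerate(items, start=1):
            out.append((rn, "", type_, product))  # the "Group-Item" rows
    return out

for line in with_group_headers(rows):
    print(line)
```

Emitting the header immediately before its items avoids the ordering problem that the SQL versions have to solve with extra sort columns.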
A slightly simplified approach:
With Groups As
(
Select Distinct Min(Id) As Id, [Group], '' As [Type], '' As Product
From dbo.Source
Group By [Group]
)
Select Coalesce(Cast(Z.Id As varchar(10)),'') As Id
, Coalesce(Z.[Group],'') As [Group]
, Z.[Type], Z.Product
From (
Select Id As Sort, Id, [Group], [Type], Product
From Groups
Union All
Select G.Id, Null, Null, S.[Type], S.Product
From dbo.Source As S
Join Groups As G
On G.[Group] = S.[Group]
) As Z
Order By Sort, Case When Z.Id Is Null Then 1 Else 0 End -- header row before its items
It should be noted that the use of Coalesce is purely for aesthetic reasons. You could simply return null in these cases.
And an approach with ROW_NUMBER:
IF OBJECT_ID('dbo.grouprows') IS NOT NULL DROP TABLE dbo.grouprows;
CREATE TABLE dbo.grouprows(
ID INT,
Grp NVARCHAR(MAX),
Type NVARCHAR(MAX),
Product NVARCHAR(MAX)
);
INSERT INTO dbo.grouprows VALUES
(1,'Dairy','Milk','Fresh Milk'),
(2,'Dairy','Butter','Butter Cream'),
(3,'Beverage','Coke','Coca cola'),
(4,'Beverage','Diet','Dew'),
(5,'Beverage','Juice','Fresh Juice');
SELECT
CASE WHEN gg = 0 THEN dr1 END GrpId,
CASE WHEN gg = 1 THEN rn1 END TypeId,
ISNULL(Grp,'')Grp,
CASE WHEN gg = 1 THEN Type ELSE '' END Type,
CASE WHEN gg = 1 THEN Product ELSE '' END Product
FROM(
SELECT *,
DENSE_RANK()OVER(ORDER BY Grp DESC) dr1
FROM(
SELECT *,
ROW_NUMBER()OVER(PARTITION BY Grp ORDER BY type,gg) rn1,
ROW_NUMBER()OVER(ORDER BY type,gg) rn0
FROM(
SELECT Grp,Type,Product, GROUPING(Grp) gg, GROUPING(type) tg FROM dbo.grouprows
GROUP BY Product, Type, Grp
WITH ROLLUP
)X1
WHERE tg = 0
)X2
WHERE gg=1 OR rn1 = 1
)X3
ORDER BY rn0