Aggregate function to keep a specific value, depending on other columns - PostgreSQL

I have data of the following format
id_A id_B val
--------------------------------
1 1 1
1 2 2
2 1 3
2 3 4
Is there a nice way to group by id_A while keeping the value of the line where id_A = id_B?
The reason I need to aggregate is that if there is no such line, I want the average.
The result should look like this:
id_A val
-----------------
1 1
2 3.5
I've come up with the following, but that case looks ugly and hacky to me.
Select id_A,
Coalesce(
avg(case when id_A = id_B then val else null end),
avg(val)
) as value
From myTable
Group by id_A;

With PostgreSQL 9.4+ you can use the FILTER clause on aggregate functions. Something like this:
Select id_A,
Coalesce(
avg(val) filter(where id_A = id_B),
avg(val)
) as value
From myTable
Group by id_A;
Details here: http://www.postgresql.org/docs/current/static/sql-expressions.html
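To check the COALESCE fallback against the sample data, here is a minimal sketch that runs the CASE-based form of the query. It assumes an in-memory SQLite database (via Python's sqlite3 module) as a stand-in for PostgreSQL, and it uses the CASE form rather than FILTER because FILTER support depends on the engine version.

```python
import sqlite3

# In-memory SQLite database used purely for illustration (assumption: not PostgreSQL)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (id_A INTEGER, id_B INTEGER, val REAL)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 2, 2), (2, 1, 3), (2, 3, 4)],
)

# AVG ignores NULLs, so the CASE-based average is NULL only when the group
# has no row with id_A = id_B; COALESCE then falls back to the plain average.
query = """
SELECT id_A,
       COALESCE(AVG(CASE WHEN id_A = id_B THEN val END), AVG(val)) AS value
FROM myTable
GROUP BY id_A
ORDER BY id_A
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 1.0), (2, 3.5)]
```

Group 1 keeps the value of its id_A = id_B row, while group 2 has no such row and gets the average of 3 and 4.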

Related

Preserve the order by ids in postgresql with DISTINCT

I have a query, which returns a simple list of numbers:
SELECT unnest(c) FROM t ORDER BY f LIMIT 10;
And it goes like
1
1
3
4
2
3
5
1
5
6
3
2
I want to keep the result unique, but also preserve order:
1
3
2
4
5
6
select distinct(id) from (select ...) as c;
does not work, because it uses HashAggregate, which breaks the order (and processes all rows to return just 10?). I tried GROUP BY; it also uses HashAggregate on the whole table(?) and then sorts and returns the 10 required rows.
Is it possible to do this efficiently on the DB side? Or should I just read the rows from my first query in my application and filter the stream there?
WITH ORDINALITY is your friend for preserving the order.
select val
from unnest('{1,1,3,4,2,3,5,1,5,6,3,2}'::int[]) with ordinality t(val, ord)
group by val
order by min(ord); -- the first time that this item appeared
val
1
3
4
2
5
6
Or it may make sense to define this function:
create function arr_unique(arr anyarray)
returns anyarray language sql immutable as
$$
select array_agg(val order by ord)
from
(
select val, min(ord) ord
from unnest(arr) with ordinality t(val, ord)
group by val
) t;
$$;
select elem
from (
select
elem, elem_no, row_no, row_number() over (partition by elem order by row_no) as occurence_no
from (
select elem, elem_no, row_number() over () as row_no from t, unnest(c) WITH ORDINALITY a(elem, elem_no)
) A
) B
where occurence_no = 1
order by row_no
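For the application-side alternative raised in the question (filtering the stream of rows after reading them), a minimal sketch could look like the following. The function name unique_in_order is hypothetical, and the list stands in for rows fetched from the first query.

```python
def unique_in_order(values):
    """Yield each value the first time it is seen, preserving input order."""
    seen = set()
    for value in values:
        if value not in seen:
            seen.add(value)
            yield value


# Stand-in for the rows returned by the original query
stream = [1, 1, 3, 4, 2, 3, 5, 1, 5, 6, 3, 2]
result = list(unique_in_order(stream))
print(result)  # [1, 3, 4, 2, 5, 6]
```

Because it is a generator, wrapping it in itertools.islice would stop reading as soon as enough distinct values have been seen.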

MAX() usage in GROUP BY with non-numeric column

I have a table similar to the following
UserId | ActionType
--------------------
1 | Create
2 | Read
1 | Edit
2 | Create
3 | Read
I want to find the "highest" action that a user has done, with the following hierarchy Create > Edit > Read. Running the desired query should return
UserId | ActionType
-------------------
1 | Create
2 | Create
3 | Read
Is there a way to leverage MAX() in HIVE to do this? My structure looks like the following very basic query but I'm unsure how to compute the above ActionType column.
SELECT UserId, ??? FROM UserActions GROUP BY UserId;
I think possible solutions are CASE statements in the GROUP BY or converting the values into numeric values, such as (Read => 0, Edit => 1, Create => 2) and then doing a GROUP BY, but I am hoping there is a more elegant solution.
Thanks!
I don't know if HiveQL supports subqueries, but this is the idea in plain SQL:
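SELECT DISTINCT
a.UserId,
a.ActionType
FROM
UserActions a
WHERE
a.ActionType = (
-- pick the user's highest-ranked action (Create > Edit > Read)
SELECT
c.ActionType
FROM
UserActions c
WHERE
c.UserId = a.UserId
ORDER BY
CASE c.ActionType WHEN 'Create' THEN 3 WHEN 'Edit' THEN 2 WHEN 'Read' THEN 1 ELSE 0 END DESC
LIMIT 1 -- TOP 1 or FETCH FIRST 1 ROW ONLY in some dialects
)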
Below would be the query in Hive.
select
t1.userId,
case min(case t1.actionType when 'Create' then 1
when 'Edit' then 2
when 'Read' then 3
else 100 end) -- lower rank = higher action
when 1 then 'Create'
when 2 then 'Edit'
when 3 then 'Read' end as highestAction
from mytable t1 group by t1.userId
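The "convert to numeric values, aggregate, convert back" idea from the question can be checked against the sample data with a small sketch. It assumes an in-memory SQLite database through Python's sqlite3 module rather than Hive, and the alias HighestAction is a name chosen here for illustration.

```python
import sqlite3

# In-memory SQLite database used purely for illustration (assumption: not Hive)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE UserActions (UserId INTEGER, ActionType TEXT)")
conn.executemany(
    "INSERT INTO UserActions VALUES (?, ?)",
    [(1, "Create"), (2, "Read"), (1, "Edit"), (2, "Create"), (3, "Read")],
)

# Inner CASE maps each action to a rank, MAX picks the highest rank per user,
# and the outer CASE maps that rank back to the action name.
query = """
SELECT UserId,
       CASE MAX(CASE ActionType
                    WHEN 'Create' THEN 2
                    WHEN 'Edit'   THEN 1
                    WHEN 'Read'   THEN 0
                END)
           WHEN 2 THEN 'Create'
           WHEN 1 THEN 'Edit'
           WHEN 0 THEN 'Read'
       END AS HighestAction
FROM UserActions
GROUP BY UserId
ORDER BY UserId
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 'Create'), (2, 'Create'), (3, 'Read')]
```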

How to convert timestamp to numbers

Suppose I have a table like this:
Id Types Timestamp
1 A 2014-02-04 00:00:00
2 A 2014-02-05 00:00:00
1 A 2014-02-05 03:59:00
3 C 2014-05-06 03:59:00
1 B 2014-02-04 03:00:00
2 D 2014-02-05 00:40:00
I would like the output to be like this:
Id 1 2 3 4 5 etc
1 A B A C D ...
2 A D NULL NULL NULL
3 C NULL NULL NULL NULL
Is it possible to have the timestamp determine the order of the types?
Thanks for any hints.
Preliminary comments:
A SQL query can only return a predefined number of columns. IMHO, the best you can get is the values concatenated into an array.
I have named your input table MyTable, renamed the column Timestamp to MyTimestamp to avoid a conflict with the type keyword of the same name, and called the Types column Type.
You have put C and D in the ID = 1 row of your output. I will treat that as a typo (they do not belong to ID = 1).
WITH RECURSIVE ConcatAndOrder(ID, MyResult, RowNumForOrder, RowCountForOrder) AS (
SELECT ID, ARRAY[Type], RowNumForOrder, RowCountForOrder
FROM IndexedTable
WHERE RowNumForOrder = 1
UNION ALL
SELECT I.ID, MyResult || I.Type, I.RowNumForOrder, I.RowCountForOrder
FROM IndexedTable I
JOIN ConcatAndOrder C on I.ID = C.ID and I.RowNumForOrder = C.RowNumForOrder + 1
), IndexedTable(ID, Type, RowNumForOrder, RowCountForOrder) AS (
SELECT ID, Type,
row_number() OVER (PARTITION BY ID ORDER BY MyTimestamp),
count(*) OVER (PARTITION BY ID)
FROM MyTable
)
SELECT ID, MyResult
FROM ConcatAndOrder
WHERE RowNumForOrder = RowCountForOrder
ORDER BY ID
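The result this query is aiming for (one array of types per ID, ordered by timestamp) can be illustrated outside the database with a short sketch. The function name types_by_time is hypothetical, and the rows are the sample data from the question.

```python
from collections import defaultdict
from datetime import datetime

# Sample rows from the question: (Id, Type, Timestamp)
rows = [
    (1, "A", "2014-02-04 00:00:00"),
    (2, "A", "2014-02-05 00:00:00"),
    (1, "A", "2014-02-05 03:59:00"),
    (3, "C", "2014-05-06 03:59:00"),
    (1, "B", "2014-02-04 03:00:00"),
    (2, "D", "2014-02-05 00:40:00"),
]


def types_by_time(rows):
    """Group types by id, with each group's types ordered by timestamp."""
    ordered = sorted(
        rows,
        key=lambda r: (r[0], datetime.strptime(r[2], "%Y-%m-%d %H:%M:%S")),
    )
    grouped = defaultdict(list)
    for row_id, type_, _timestamp in ordered:
        grouped[row_id].append(type_)
    return dict(grouped)


pivot = types_by_time(rows)
print(pivot)  # {1: ['A', 'B', 'A'], 2: ['A', 'D'], 3: ['C']}
```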

TSQL: Inserting missing records into table

I am stuck at this T-SQL query.
I have table below
Age SectioName Cost
---------------------
1 Section1 100
2 Section1 200
1 Section2 500
3 Section2 100
4 Section2 200
Let's say each section can have at most 5 Ages. In the table above some Ages are missing. How do I insert the missing Ages for each section (possibly without using a cursor)? The cost should be zero for the missing Ages.
So after the insertion the table should look like
Age SectioName Cost
---------------------
1 Section1 100
2 Section1 200
3 Section1 0
4 Section1 0
5 Section1 0
1 Section2 500
2 Section2 0
3 Section2 100
4 Section2 200
5 Section2 0
EDIT1
I should have been clearer in my question. The maximum age is a dynamic value. It could be 5, 6, 10 or some other value, but it will always be less than 25.
I think I got it
;WITH tally AS
(
SELECT 1 AS r
UNION ALL
SELECT r + 1 AS r
FROM tally
WHERE r < 5 -- this value could be dynamic now
)
select n.r, t.SectionName, 0 as Cost
from (select distinct SectionName from TempFormsSectionValues) t
cross join
(select ta.r FROM tally ta) n
where not exists
(select * from TempFormsSectionValues where YearsAgo = n.r and SectionName = t.SectionName)
order by t.SectionName, n.r
You can use this query to select missing value:
select n.num, t.SectioName, 0 as Cost
from (select distinct SectioName from table1) t
cross join
(select 1 as num union select 2 union select 3 union select 4 union select 5) n
where not exists
(select * from table1 where table1.age = n.num and table1.SectioName = t.SectioName)
It creates a Cartesian product of sections and the numbers 1 to 5 and then selects those combinations that don't exist yet. You can then use this query as the source of an insert into your table.
SQL Fiddle (it has order by added to check the results easier but it's not necessary for inserting).
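As a rough end-to-end check of this approach with a dynamic maximum, here is a sketch that generates the numbers with a recursive CTE, selects the missing (Age, SectioName) pairs, and inserts them. It assumes an in-memory SQLite database through Python's sqlite3 module rather than SQL Server, and max_age is a hypothetical parameter.

```python
import sqlite3

# In-memory SQLite database used purely for illustration (assumption: not SQL Server)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (Age INTEGER, SectioName TEXT, Cost INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [(1, "Section1", 100), (2, "Section1", 200),
     (1, "Section2", 500), (3, "Section2", 100), (4, "Section2", 200)],
)

max_age = 5  # hypothetical upper bound; the question says it is dynamic

# Cartesian product of sections and 1..max_age, minus the pairs that already exist
missing = conn.execute(
    """
    WITH RECURSIVE tally(num) AS (
        SELECT 1
        UNION ALL
        SELECT num + 1 FROM tally WHERE num < ?
    )
    SELECT n.num, t.SectioName, 0 AS Cost
    FROM (SELECT DISTINCT SectioName FROM table1) AS t
    CROSS JOIN tally AS n
    WHERE NOT EXISTS (
        SELECT 1 FROM table1 AS x
        WHERE x.Age = n.num AND x.SectioName = t.SectioName
    )
    """,
    (max_age,),
).fetchall()

conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", missing)

final = conn.execute(
    "SELECT Age, SectioName, Cost FROM table1 ORDER BY SectioName, Age"
).fetchall()
for row in final:
    print(row)
```

With the sample data this inserts five zero-cost rows, so each section ends up with Ages 1 through 5.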
Use below query to generate missing rows
SELECT t1.Age,t1.Section,ISNULL(t2.Cost,0) as Cost
FROM
(
SELECT 1 as Age,'Section1' as Section,0 as Cost
UNION
SELECT 2,'Section1',0
UNION
SELECT 3,'Section1',0
UNION
SELECT 4,'Section1',0
UNION
SELECT 5,'Section1',0
UNION
SELECT 1,'Section2',0
UNION
SELECT 2,'Section2',0
UNION
SELECT 3,'Section2',0
UNION
SELECT 4,'Section2',0
UNION
SELECT 5,'Section2',0
) as t1
LEFT JOIN test t2
ON t1.Age=t2.Age AND t1.Section=t2.Section
ORDER BY Section,Age
SQL Fiddle
You can use the above result set to insert the missing rows, using the EXCEPT operator to exclude rows that already exist in the table:
INSERT INTO test
SELECT t1.Age,t1.Section,ISNULL(t2.Cost,0) as Cost
FROM
(
SELECT 1 as Age,'Section1' as Section,0 as Cost
UNION
SELECT 2,'Section1',0
UNION
SELECT 3,'Section1',0
UNION
SELECT 4,'Section1',0
UNION
SELECT 5,'Section1',0
UNION
SELECT 1,'Section2',0
UNION
SELECT 2,'Section2',0
UNION
SELECT 3,'Section2',0
UNION
SELECT 4,'Section2',0
UNION
SELECT 5,'Section2',0
) as t1
LEFT JOIN test t2
ON t1.Age=t2.Age AND t1.Section=t2.Section
EXCEPT
SELECT Age,Section,Cost
FROM test;
SELECT * FROM test
ORDER BY Section,Age
http://www.sqlfiddle.com/#!3/d9035/11

Self join to lowest occurrence of group

I have a problem in T-SQL that I find difficult to solve.
I have a table with groups of records, grouped by key1 and key2. I order each group chronologically by date. For each record, I want to see if there existed a record before (within the group and with lower date) for which the field "datafield" forms an allowed combination with the current record's "datafield". For the allowed combinations, I have a table called AllowedCombinationsTable.
I wrote the following code to achieve it:
WITH Source AS (
SELECT key1, key2, datafield, date1,
ROW_NUMBER() OVER(PARTITION BY key1, key2 ORDER BY date1 ASC) AS dateorder
FROM table
)
SELECT L.key1, L.key2, L.datafield, DC.datafield2
FROM Source AS L
LEFT JOIN AllowedDataCombinationsTable DC
ON D.datafield1 = L.datafield
LEFT JOIN Source AS R
ON R.Key1 = L.Key1
AND R.Key2 = L.Key2
AND R.dateorder < L.dateorder
AND DC.datafield2 = L.datafield
-- AND "pick the one record with lowest dateorder"
Now for each of these possible combination records, I want to pick the first one (see placeholder in code). How can I do it most efficiently?
EDIT: OK let's say for the source, only showing group (1, 1):
**Key1 Key2 Datafield Date DateOrder**
1 1 "Horse" 1-Jan-2010 1
1 1 "Horse" 2-Jan-2010 2
1 1 "Sheep" 3-Jan-2010 3
1 1 "Dog" 4-Jan-2010 4
1 1 "Cat" 5-Jan-2010 5
AllowedCombinationsTable:
**Datafield1 Datafield**
Cat Sheep (and Sheep Cat)
Cat Horse (and Horse Cat)
Dog Horse (and Horse Dog)
After my join I have now:
**Key1 Key2 Datafield Date DateOrder JoinedCombination JoinedCombinationDateOrder**
1 1 "Horse" 1-Jan-2010 1 NULL NULL
1 1 "Horse" 2-Jan-2010 2 NULL NULL
1 1 "Sheep" 3-Jan-2010 3 NULL NULL
1 1 "Dog" 4-Jan-2010 4 "Horse" 1
1 1 "Dog" 4-Jan-2010 4 "Horse" 2
1 1 "Cat" 5-Jan-2010 5 "Horse" 1
1 1 "Cat" 5-Jan-2010 5 "Horse" 2
1 1 "Cat" 5-Jan-2010 5 "Sheep" 3
I want to display only the first "Horse" for record 4 "Dog", and also only the first "Horse" for record 5 "Cat".
Get it? ;)
I think this may do it--don't have data set up to test the query with. Check the comments for rationale.
WITH Source AS (
SELECT key1, key2, datafield, date1,
ROW_NUMBER() OVER(PARTITION BY key1, key2 ORDER BY date1 ASC) AS dateorder
FROM table
)
SELECT L.key1, L.key2, L.datafield, DC.datafield2
FROM Source AS L
LEFT JOIN AllowedDataCombinationsTable DC
ON DC.datafield1 = L.datafield -- DC Alias
LEFT JOIN Source AS R
ON R.Key1 = L.Key1
AND R.Key2 = L.Key2
AND DC.datafield2 = R.datafield -- Changed alias from L to R
AND R.dateorder = 1 -- Pick out lowest one
AND R.dateorder < L.dateorder -- Make sure it's not the same one
Well, I don't use WITH or OVER, so this is a different approach.. I might be over-simplifying something, but without having the data in front of me this is what I came up with:
SELECT distinct a.Key1, a.Key2, a.Datafield,
ISNULL(b.Datafield,'') as Datafield1,
ISNULL(b.Date,a.Date) as [Date],
MIN(a.DateOrder) as DateOrder
FROM Source a
LEFT JOIN Source b
ON a.Key1 = b.Key1
AND a.Key2 = b.Key2
AND a.Dateorder <> b.Dateorder
LEFT JOIN AllowedDataCombinationsTable c
ON a.Datafield = c.Datafield
AND b.Datafield = c.Datafield1
GROUP BY a.Key1, a.Key2, a.Datafield, ISNULL(b.Datafield,''), ISNULL(b.Date,a.Date)
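One way to express "pick the matching record with the lowest dateorder" is a correlated MIN subquery. The sketch below is only an illustration of that idea on the sample group (1, 1): it assumes an in-memory SQLite database through Python's sqlite3 module rather than SQL Server, stores dateorder as a precomputed column instead of using ROW_NUMBER(), lists both directions of each allowed pair explicitly, and uses the hypothetical alias first_match_order.

```python
import sqlite3

# In-memory SQLite database used purely for illustration (assumption: not SQL Server)
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Source (key1 INTEGER, key2 INTEGER, datafield TEXT, dateorder INTEGER);
INSERT INTO Source VALUES
    (1, 1, 'Horse', 1), (1, 1, 'Horse', 2), (1, 1, 'Sheep', 3),
    (1, 1, 'Dog', 4), (1, 1, 'Cat', 5);
CREATE TABLE AllowedDataCombinationsTable (datafield1 TEXT, datafield2 TEXT);
INSERT INTO AllowedDataCombinationsTable VALUES
    ('Cat', 'Sheep'), ('Sheep', 'Cat'), ('Cat', 'Horse'),
    ('Horse', 'Cat'), ('Dog', 'Horse'), ('Horse', 'Dog');
""")

# For each record L, find the lowest dateorder among earlier records R in the
# same group whose datafield forms an allowed combination with L.datafield.
query = """
SELECT L.datafield,
       L.dateorder,
       (SELECT MIN(R.dateorder)
        FROM Source AS R
        JOIN AllowedDataCombinationsTable AS DC
          ON DC.datafield2 = R.datafield
        WHERE DC.datafield1 = L.datafield
          AND R.key1 = L.key1
          AND R.key2 = L.key2
          AND R.dateorder < L.dateorder) AS first_match_order
FROM Source AS L
ORDER BY L.dateorder
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

On this data, Dog and Cat both resolve to the first Horse (dateorder 1), and the other records get NULL. Unlike a fixed R.dateorder = 1 condition, the MIN subquery also covers groups where the earliest allowed match is not the first record.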