Average of grouping columns - postgresql

My table is something like this
| id | amount | food       |
|----|--------|------------|
| 1  | 5      | banana     |
| 1  | 4      | strawberry |
| 2  | 2      | banana     |
| 2  | 7      | orange     |
| 2  | 8      | strawberry |
| 3  | 10     | lime       |
| 3  | 12     | banana     |
What I want is a table that displays each food together with the average of its amount across the IDs it appears in.
The table should look something like this I think:
| food       | avg |
|------------|-----|
| banana     | 6.3 |
| strawberry | 6   |
| orange     | 7   |
| lime       | 10  |
I'm not really sure how to do this. If I just use avg(amount), it averages the whole amount column.

Did you try GROUP BY?
SELECT food, AVG(amount) "avg"
FROM table1
GROUP BY food
Output:
| food       | avg               |
|------------|-------------------|
| lime       | 10                |
| orange     | 7                 |
| strawberry | 6                 |
| banana     | 6.333333333333333 |
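To check the GROUP BY result without a PostgreSQL server, here is a minimal sketch that runs the same kind of query through Python's built-in sqlite3 module. SQLite is only a stand-in for PostgreSQL here, and the column types are assumptions; the table name table1 is taken from the answer above.

```python
import sqlite3

# In-memory SQLite database as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, amount INTEGER, food TEXT)")
conn.executemany(
    "INSERT INTO table1 (id, amount, food) VALUES (?, ?, ?)",
    [
        (1, 5, "banana"),
        (1, 4, "strawberry"),
        (2, 2, "banana"),
        (2, 7, "orange"),
        (2, 8, "strawberry"),
        (3, 10, "lime"),
        (3, 12, "banana"),
    ],
)

# GROUP BY food makes AVG run once per food instead of over the whole column.
averages = dict(
    conn.execute("SELECT food, AVG(amount) FROM table1 GROUP BY food")
)
conn.close()

for food, avg in sorted(averages.items()):
    print(f"{food}: {avg:.2f}")
```

Without the GROUP BY clause, AVG(amount) would return a single value for all seven rows, which is the behaviour the question describes.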


How to count rows after the occurrence of a value by group (postgresql)

I have for example the following table:
Name | Day | Healthy
:--- | --: | :------
Jon | 1 | No
Jon | 2 | Yes
Jon | 3 | Yes
Jon | 4 | Yes
Jon | 5 | No
Mary | 1 | Yes
Mary | 2 | No
Mary | 3 | Yes
Mary | 4 | No
Mary | 5 | Yes
I want to add a column which counts the number of following days after day X a person was healthy:
Name | Day | Healthy | Number of days the person was healthy after day X (incl.)
:--- | --: | :------ | --:
Jon | 1 | No | 3
Jon | 2 | Yes | 3
Jon | 3 | Yes | 2
Jon | 4 | Yes | 1
Jon | 5 | No | 0
Mary | 1 | Yes | 3
Mary | 2 | No | 2
Mary | 3 | Yes | 2
Mary | 4 | No | 1
Mary | 5 | Yes | 1
Is it possible to use some sort of window function to create such a column? Thanks a lot for the help!
There are a couple of ways to do this with a window function. One is to order by day descending and use the default window. The other is to specify the window from the current row to the end of the partition.
This example casts the boolean healthy to an int so that it can be summed. If your table holds literal Yes and No strings instead, you can use sum((healthy = 'Yes')::int) over (...) to achieve the same thing.
select name, day,
       sum(healthy::int)
         over (partition by name
               order by day
               rows between current row
                        and unbounded following) as num_subsequent_health_days
from my_table;
name | day | num_subsequent_health_days
:--- | --: | -------------------------:
Jon | 1 | 3
Jon | 2 | 3
Jon | 3 | 2
Jon | 4 | 1
Jon | 5 | 0
Mary | 1 | 3
Mary | 2 | 2
Mary | 3 | 2
Mary | 4 | 1
Mary | 5 | 1
I assume your relation has the following schema:
CREATE TABLE test(name text, day int, healthy boolean);
Then this should produce the desired result:
SELECT name, day,
       sum(mapped) OVER (PARTITION BY name ORDER BY day DESC
                         RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM (SELECT name, day, CASE WHEN healthy THEN 1 ELSE 0 END AS mapped
      FROM test) sub
ORDER BY name, day;
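Both window definitions above compute a running sum that starts at the last day of each person and moves backwards. The following Python sketch reproduces that logic on the sample data from the question; the function name healthy_days_from and the tuple layout are illustrative assumptions, not part of either answer.

```python
from collections import defaultdict

# Sample rows from the question: (name, day, healthy).
rows = [
    ("Jon", 1, False), ("Jon", 2, True), ("Jon", 3, True),
    ("Jon", 4, True), ("Jon", 5, False),
    ("Mary", 1, True), ("Mary", 2, False), ("Mary", 3, True),
    ("Mary", 4, False), ("Mary", 5, True),
]


def healthy_days_from(rows):
    """Return {(name, day): healthy days from that day onward, inclusive}."""
    by_name = defaultdict(list)
    for name, day, healthy in rows:
        by_name[name].append((day, healthy))

    result = {}
    for name, days in by_name.items():
        running = 0
        # Walk each partition from the last day backwards and accumulate,
        # which is what ORDER BY day DESC with a running frame does.
        for day, healthy in sorted(days, reverse=True):
            running += int(healthy)
            result[(name, day)] = running
    return result


counts = healthy_days_from(rows)
for (name, day), n in sorted(counts.items()):
    print(name, day, n)
```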

kdb+/q: Fast vector update given a list of keys and values to be updated

Given a list of ids/keys and a set of corresponding values for a constant column:
q)ikeys: 1 2 3 5;
q)ivals: 100 100.5 101.5 99.5;
What is the fastest way to update the `toupd column in the following table so that the rows matching the given ikeys take the new values in ivals? That is,
q) show tab;
ikeys | `toupd `noupd
------|--------------
1 | 0.5 1
2 | 100.5 2
3 | 500.5 4
4 | 400.5 8
5 | 400.5 16
6 | 600.5 32
7 | 700.5 64
is updated to:
q) show restab;
ikeys | `toupd `noupd
------|--------------
1 | 100 1
2 | 100.5 2
3 | 101.5 4
4 | 400.5 8
5 | 99.5 16
6 | 600.5 32
7 | 700.5 64
Furthermore, is there a canonical method for updating multiple columns in this manner?
Thanks.
A dot amend is another approach which more easily generalises to more than one column. It can also take advantage of amend-in-place which would be the most memory efficient approach as it doesn't create a duplicate copy of the table in memory (assumes global).
ikeys:1 2 3 5
ivals:100 100.5 101.5 99.5
tab:([ikeys:1+til 7]toupd:.5 100.5 500.5 400.5 400.5 600.5 700.5;noupd:1 2 4 8 16 32 64)
q).[tab;(([]ikeys);`toupd);:;ivals]
ikeys| toupd noupd
-----| -----------
1 | 100 1
2 | 100.5 2
3 | 101.5 4
4 | 400.5 8
5 | 99.5 16
6 | 600.5 32
7 | 700.5 64
/amend in place
.[`tab;(([]ikeys);`toupd);:;ivals]
/generalise to two columns
q).[tab;(([]ikeys);`toupd`noupd);:;flip(ivals;1000 2000 3000 4000)]
ikeys| toupd noupd
-----| -----------
1 | 100 1000
2 | 100.5 2000
3 | 101.5 3000
4 | 400.5 8
5 | 99.5 4000
6 | 600.5 32
7 | 700.5 64
/you could amend in place here too
.[`tab;(([]ikeys);`toupd`noupd);:;flip(ivals;1000 2000 3000 4000)]
Here are two different ways of doing it.
tab lj ([ikeys] toupd: ivals)
or
m: ikeys
update toupd: ivals from tab where ikeys in m
I'm sure there are plenty more ways. If you want to find out which is fastest for your purpose (and your data), try using q)\t:1000 yourCodeHere for large tables and see which suits you best.
As for the canonical way to update multiple columns, I imagine it would be the update statement, but it's a matter of personal preference; use whichever is fastest.
A dictionary is also a common method of updating values given a mapping. Indexing the dictionary with the ikeys column gives the new values and then we fill in nulls with the old toupd column values.
q)show d:ikeys!ivals
1| 100
2| 100.5
3| 101.5
5| 99.5
q)update toupd:toupd^d ikeys from tab
ikeys| toupd noupd
-----| -----------
1 | 100 1
2 | 100.5 2
3 | 101.5 4
4 | 400.5 8
5 | 99.5 16
6 | 600.5 32
7 | 700.5 64
It is also worth noting that the update with the where clause is not guaranteed to work in all cases, e.g. if the mapping contains keys that do not appear in the ikeys column of the table.
q)m:ikeys:1 2 3 5 7 11
q)ivals:100 100.5 101.5 99.5 100 100
q)update toupd: ivals from tab where ikeys in m
'length
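For readers less familiar with q, the dictionary-and-fill idea (look up each key in the mapping, fall back to the old value when the key is missing) can be sketched in Python. The row-dict representation of the table and the helper name update_from_mapping are assumptions made for illustration; this says nothing about relative speed in kdb+.

```python
# Keys and replacement values from the question.
ikeys = [1, 2, 3, 5]
ivals = [100.0, 100.5, 101.5, 99.5]

# The table as a list of row dicts (a hypothetical Python representation).
tab = [
    {"ikeys": k, "toupd": t, "noupd": n}
    for k, t, n in zip(
        range(1, 8),
        [0.5, 100.5, 500.5, 400.5, 400.5, 600.5, 700.5],
        [1, 2, 4, 8, 16, 32, 64],
    )
]


def update_from_mapping(table, mapping, column):
    """Return a copy of table with column replaced wherever the key is mapped."""
    return [
        # dict.get falls back to the old value, like filling nulls with toupd.
        {**row, column: mapping.get(row["ikeys"], row[column])}
        for row in table
    ]


d = dict(zip(ikeys, ivals))
restab = update_from_mapping(tab, d, "toupd")

# Extra mapping keys are harmless here: 7 is in the table, 11 is not.
extra = update_from_mapping(tab, {**d, 7: 100.0, 11: 100.0}, "toupd")
```

Because the lookup is done per row, a mapping with keys that never occur in the table does not raise a length mismatch in this sketch.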

Unpivot data in PostgreSQL

I have a table in PostgreSQL with the below values,
empid hyderabad bangalore mumbai chennai
1 20 30 40 50
2 10 20 30 40
And my output should be like below
empid city nos
1 hyderabad 20
1 bangalore 30
1 mumbai 40
1 chennai 50
2 hyderabad 10
2 bangalore 20
2 mumbai 30
2 chennai 40
How can I do this unpivot in PostgreSQL?
You can use a lateral join:
select t.empid, x.city, x.nos
from the_table t
cross join lateral (
values
('hyderabad', t.hyderabad),
('bangalore', t.bangalore),
('mumbai', t.mumbai),
('chennai', t.chennai)
) as x(city, nos)
order by t.empid, x.city;
Or this one, which is simpler to read and is plain SQL:
WITH
input(empid,hyderabad,bangalore,mumbai,chennai) AS (
SELECT 1,20,30,40,50
UNION ALL SELECT 2,10,20,30,40
)
,
i(i) AS (
SELECT 1
UNION ALL SELECT 2
UNION ALL SELECT 3
UNION ALL SELECT 4
)
SELECT
empid
, CASE i
WHEN 1 THEN 'hyderabad'
WHEN 2 THEN 'bangalore'
WHEN 3 THEN 'mumbai'
WHEN 4 THEN 'chennai'
ELSE 'unknown'
END AS city
, CASE i
WHEN 1 THEN hyderabad
WHEN 2 THEN bangalore
WHEN 3 THEN mumbai
WHEN 4 THEN chennai
ELSE NULL::INT
END AS nos
FROM input CROSS JOIN i
ORDER BY empid,i;
-- out empid | city | nos
-- out -------+-----------+------
-- out 1 | hyderabad | 20
-- out 1 | bangalore | 30
-- out 1 | mumbai | 40
-- out 1 | chennai | 50
-- out 2 | hyderabad | 10
-- out 2 | bangalore | 20
-- out 2 | mumbai | 30
-- out 2 | chennai | 40
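Both queries pair every input row with each of the four city columns and emit one output row per pair. A minimal Python sketch of that same cross-join idea follows; the helper name unpivot and the dict-per-row layout are assumptions for illustration.

```python
# Wide rows from the question, one dict per employee.
wide = [
    {"empid": 1, "hyderabad": 20, "bangalore": 30, "mumbai": 40, "chennai": 50},
    {"empid": 2, "hyderabad": 10, "bangalore": 20, "mumbai": 30, "chennai": 40},
]
cities = ["hyderabad", "bangalore", "mumbai", "chennai"]


def unpivot(rows, id_column, value_columns):
    """Emit one (id, column name, value) tuple per value column of each row."""
    return [
        (row[id_column], column, row[column])
        for row in rows
        for column in value_columns
    ]


long_rows = unpivot(wide, "empid", cities)
for empid, city, nos in long_rows:
    print(empid, city, nos)
```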

kdb+ equivalent of SQL's rank() and dense_rank()

Has anyone ever had to simulate the results of SQL's rank(), dense_rank(), and row_number() in kdb+? Here is some SQL to demonstrate the features. If anyone has a specific solution, perhaps I could work on generalising it to support multiple partition and order by columns, and post back on this site.
CREATE TABLE student(course VARCHAR(10), mark int, name varchar(10));
INSERT INTO student VALUES
('Maths', 60, 'Thulile'),
('Maths', 60, 'Pritha'),
('Maths', 70, 'Voitto'),
('Maths', 55, 'Chun'),
('Biology', 60, 'Bilal'),
('Biology', 70, 'Roger');
SELECT
RANK() OVER (PARTITION BY course ORDER BY mark DESC) AS rank,
DENSE_RANK() OVER (PARTITION BY course ORDER BY mark DESC) AS dense_rank,
ROW_NUMBER() OVER (PARTITION BY course ORDER BY mark DESC) AS row_num,
course, mark, name
FROM student ORDER BY course, mark DESC;
+------+------------+---------+---------+------+---------+
| rank | dense_rank | row_num | course | mark | name |
+------+------------+---------+---------+------+---------+
| 1 | 1 | 1 | Biology | 70 | Roger |
| 2 | 2 | 2 | Biology | 60 | Bilal |
| 1 | 1 | 1 | Maths | 70 | Voitto |
| 2 | 2 | 2 | Maths | 60 | Thulile |
| 2 | 2 | 3 | Maths | 60 | Pritha |
| 4 | 3 | 4 | Maths | 55 | Chun |
+------+------------+---------+---------+------+---------+
Here is some kdb+ to generate the equivalent student table:
student:([] course:`Maths`Maths`Maths`Maths`Biology`Biology;
mark:60 60 70 55 60 70;
name:`Thulile`Pritha`Voitto`Chun`Bilal`Roger)
Thank you!
If you sort the table initially by course and mark:
student:`course xasc `mark xdesc ([] course:`Maths`Maths`Maths`Maths`Biology`Biology;mark:60 60 70 55 60 70;name:`Thulile`Pritha`Voitto`Chun`Bilal`Roger)
course mark name
--------------------
Biology 70 Roger
Biology 60 Bilal
Maths 70 Voitto
Maths 60 Thulile
Maths 60 Pritha
Maths 55 Chun
Then you can use something like the below to achieve your output:
update rank_sql:first row_num by course,mark from update dense_rank:1+where count each (where differ mark)cut mark,row_num:1+rank i by course from student
course mark name dense_rank row_num rank_sql
------------------------------------------------
Biology 70 Roger 1 1 1
Biology 60 Bilal 2 2 2
Maths 70 Voitto 1 1 1
Maths 60 Thulile 2 2 2
Maths 60 Pritha 2 3 2
Maths 55 Chun 3 4 4
This solution uses rank and the virtual index column if you would like to read up further on these.
For a table already ordered by the target columns:
q) dense_sql:{sums differ x}
q) rank_sql:{raze #'[(1_deltas b),1;b:1+where differ x]}
q) row_sql:{1+til count x}
q) student:`course xasc `mark xdesc ([] course:`Maths`Maths`Maths`Maths`Biology`Biology;mark:60 60 70 55 60 70;name:`Thulile`Pritha`Voitto`Chun`Bilal`Roger)
q)update row_num:row_sql mark,rank_s:rank_sql mark,dense_s:dense_sql mark by course from student
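The idea behind these three functions is that, once the marks of a course are sorted, every ranking can be derived from where the value changes. The Python sketch below mirrors that idea for a single, already sorted list; the function names are illustrative, and it is not a line-by-line translation of the q code. It also handles ties at the end of the list.

```python
def row_number(sorted_values):
    """1-based position of each value, like ROW_NUMBER()."""
    return list(range(1, len(sorted_values) + 1))


def dense_rank(sorted_values):
    """Running count of value changes, like q's `sums differ x`."""
    ranks = []
    for i, value in enumerate(sorted_values):
        if i == 0 or value != sorted_values[i - 1]:
            ranks.append(ranks[-1] + 1 if ranks else 1)
        else:
            ranks.append(ranks[-1])
    return ranks


def rank(sorted_values):
    """Position of the first value of each run of ties, like RANK()."""
    ranks = []
    for i, value in enumerate(sorted_values):
        if i == 0 or value != sorted_values[i - 1]:
            ranks.append(i + 1)
        else:
            ranks.append(ranks[-1])
    return ranks


maths = [70, 60, 60, 55]  # Maths marks from the question, sorted descending
print(row_number(maths), rank(maths), dense_rank(maths))
```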
This is what I can think of for now:
Note: The rank function in kdb ranks in ascending order, so I created the functions below.
I would not xdesc the table, as I can just apply desc to the column vector.
q)denseF
{((desc distinct x)?x)+1}
q)rankF
{((desc x)?x)+1}
q)update dense_rank:denseF mark,rank_rank:rankF mark,row_num:1+rank i by course from student
course  mark name    dense_rank rank_rank row_num
-------------------------------------------------
Maths   60   Thulile 2          2         1
Maths   60   Pritha  2          2         2
Maths   70   Voitto  1          1         3
Maths   55   Chun    3          4         4
Biology 60   Bilal   2          2         1
Biology 70   Roger   1          1         2
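The two lambdas above rank a value by looking up its first position in the descending (or descending distinct) list of marks, so the table does not need to be sorted first. A rough Python equivalent, grouped by course, might look like this; the helper name add_ranks and the tuple layout are assumptions for illustration.

```python
from collections import defaultdict

# Student rows from the question, in their original (unsorted) order.
student = [
    ("Maths", 60, "Thulile"), ("Maths", 60, "Pritha"),
    ("Maths", 70, "Voitto"), ("Maths", 55, "Chun"),
    ("Biology", 60, "Bilal"), ("Biology", 70, "Roger"),
]


def add_ranks(rows):
    """Append (dense_rank, rank) per course, ranking by mark descending."""
    marks_by_course = defaultdict(list)
    for course, mark, _name in rows:
        marks_by_course[course].append(mark)

    ranked = []
    for course, mark, name in rows:
        marks = marks_by_course[course]
        ordered = sorted(marks, reverse=True)         # like `desc x`
        distinct = sorted(set(marks), reverse=True)   # like `desc distinct x`
        # list.index returns the first match, like q's `?` (find).
        ranked.append(
            (course, mark, name, distinct.index(mark) + 1, ordered.index(mark) + 1)
        )
    return ranked


for row in add_ranks(student):
    print(row)
```

Sorting inside the loop keeps the sketch short; for large groups the sorted lists would be worth computing once per course.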

How to change row position in sorted list in postgres?

I have a fruits table whose rows are ordered by index:
SELECT index, title FROM fruits ORDER BY index;
 index | title
-------+------------
     1 | Orange
     2 | Apple
     3 | Banana
     4 | Cherry
     5 | Strawberry
On the client side I have a sortable list where the user can drag items to change their order.
I need a query that changes this order.
For example, I want to move the Apple row after Cherry.
The result after the query should be:
 index | title
-------+------------
     1 | Orange
     2 | Banana
     3 | Cherry
     4 | Apple
     5 | Strawberry
How can I implement this?