How to calculate an average and label a subquery - PostgreSQL

I have two tables, "book" and "authorCollection". Because a book may have multiple authors, I want to get the average number of authors per book in table "book" for books published in or after the year 2000.
For example:
Table Book:

key  year
---------
1    2000
2    2001
3    2002
4    1999

Table authorCollection:

key  author
-----------
1    Tom
1    John
1    Alex
1    Mary
2    Alex
3    Tony
4    Mary
The result should be (4 + 1 + 1) / 3 = 2 (key 4 was published before 2000, so it is excluded).
I wrote the following query, but it is not right. I need to get the number of rows from the subquery, but I cannot give it the label "b". How can I solve this problem and get the average number of authors? I am also still confused about what "COUNT(*) AS count" means... Thanks.
SELECT COUNT(*) AS count, b.COUNT(*) AS total
FROM A
WHERE key IN (SELECT key
              FROM Book
              WHERE year >= 2000
             ) b
GROUP BY key;

First, count the number of authors per key in a subquery. Next, aggregate the needed values:
select avg(coalesce(ct, 0))
from book b
left join (
    select key, count(*) as ct
    from authorcollection
    group by 1
) a using (key)
where year >= 2000;
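The left join plus coalesce(ct, 0) matters here: a book from 2000 or later that has no rows in authorcollection still contributes a 0 to the average instead of being silently dropped.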

An alternative that also handles the 'division by zero' case (the cast to numeric avoids integer division):
select case when count(distinct book.key) = 0
            then null
            else count(authorCollection.key)::numeric / count(distinct book.key)
       end as avg_after_2000
from book
left join authorCollection on book.key = authorCollection.key
where book.year >= 2000;
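For reference, a minimal setup (assuming PostgreSQL and the table and column names from the question) to try both queries against the sample data:

create table book (key int, year int);
create table authorCollection (key int, author text);

insert into book values (1, 2000), (2, 2001), (3, 2002), (4, 1999);
insert into authorCollection values
    (1, 'Tom'), (1, 'John'), (1, 'Alex'), (1, 'Mary'),
    (2, 'Alex'), (3, 'Tony'), (4, 'Mary');

Both queries then return 2, matching the (4 + 1 + 1) / 3 computation above.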

Related

PostgreSQL: average over a limit of dates with GROUP BY

I have a table like this:

item_id  date        number
---------------------------
1        2000-01-01  100
1        2003-03-08  50
1        2004-04-21  10
1        2004-12-11  10
1        2010-03-03  10
2        2000-06-29  1
2        2002-05-22  2
2        2002-07-06  3
2        2008-10-20  4
I'm trying to get the average for each unique item_id over its last 3 dates.
It's difficult because there are missing dates in between, so a range of hardcoded dates doesn't always work.
I expect a result like:

item_id  MyAverage
------------------
1        10
2        3
I don't really know how to do this. Currently I manage to do it for one item, but I have trouble extending it to multiple items:
SELECT AVG(MyAverage.number)
FROM (
    SELECT date, number
    FROM item_list
    WHERE item_id = 1
    ORDER BY date DESC
    LIMIT 3
) AS MyAverage;
My main problem is generalising the "DESC LIMIT 3" over a group by id.
My attempt:
SELECT item_id, AVG(MyAverage.number)
FROM (
    SELECT item_id, date, number
    FROM item_list
    ORDER BY date DESC
    LIMIT 3
) AS MyAverage
GROUP BY item_id;
The LIMIT is messing things up there: it keeps only the 3 most recent rows overall, not 3 per item_id.
I have made it "work" using BETWEEN date AND date, but that is not what I want, because I need a limit, not hardcoded dates.
Can anybody help?
You can use row_number() to number each id's records from 1 upward, ordered by date descending, and then filter for rows 1 to 3.
SELECT x.item_id,
       avg(x.number)
FROM (SELECT il.item_id,
             il.number,
             row_number() OVER (PARTITION BY il.item_id
                                ORDER BY il.date DESC) AS rn
      FROM item_list il) x
WHERE x.rn BETWEEN 1 AND 3
GROUP BY x.item_id;
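To sanity-check this, here is a minimal setup assuming the item_list table from the question:

create table item_list (item_id int, date date, number int);

insert into item_list values
    (1, '2000-01-01', 100), (1, '2003-03-08', 50), (1, '2004-04-21', 10),
    (1, '2004-12-11', 10), (1, '2010-03-03', 10),
    (2, '2000-06-29', 1), (2, '2002-05-22', 2),
    (2, '2002-07-06', 3), (2, '2008-10-20', 4);

The three most recent rows are 10, 10, 10 for item_id 1 and 4, 3, 2 for item_id 2, so the query returns averages of 10 and 3, matching the expected result.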

Replace content in 'order' column with sequential numbers

I have an 'order' column in a table in a Postgres database that has a lot of missing numbers in the sequence. I am having a problem figuring out how to replace the numbers currently in the column with new ones that are incremental (see examples).
What I have:

id  order  name
---------------
1   50     Anna
2   13     John
3   2      Bruce
4   5      David

What I want:

id  order  name
---------------
1   4      Anna
2   3      John
3   1      Bruce
4   2      David
The row containing the lowest order number in the old version of the column should get the new order number '1', the next after that should get '2' etc.
You can use the window function row_number() to calculate the new numbers. The result of that can be used in an update statement:
update the_table
set "order" = t.rn
from (
    select id, row_number() over (order by "order") as rn
    from the_table
) t
where t.id = the_table.id;
This assumes that id is the primary key of that table.
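Note that "order" must be double-quoted throughout, since ORDER is a reserved word in SQL: left unquoted, both set order = ... and over (order by order) would be syntax errors.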

kdb+/q: query data from one table based on data from another table without a join

I'm new to kdb+/q, and the following is my question. I really hope someone who is an expert in kdb can help me out.
I have two tables. Table t1 has two attributes, tp_time and id, and looks like this:
tp_time                  id
---------------------------
2018.06.25T00:07:15.822  1
2018.06.25T00:07:45.823  3
2018.06.25T00:09:01.963  8
...
Table t2 has three attributes: tp_time, id, and price.
Each id has many prices at different tp_times, so table t2 is really large. It looks like the following:
tp_time                  id  price
----------------------------------
2018.06.25T00:05:99.999  1   10.87
2018.06.25T00:06:05.823  1   10.88
2018.06.25T00:06:18.999  1   10.88
...
2018.06.25T17:39:20.999  1   10.99
2018.06.25T17:39:23.999  1   10.99
2018.06.25T17:39:24.999  1   10.99
...
2018.06.25T01:39:39.999  2   10.99
2018.06.25T01:39:41.999  2   10.99
2018.06.25T01:39:45.999  2   10.99
...
What I'm trying to do is, for each row in table t1, find its price at the nearest time and its price approximately 5 seconds later. For example, for the first row in table t1:
2018.06.25T00:07:15.822 1
The price at the nearest time is 10.87, and the price around 5 seconds later is 10.88. My expected output table looks like the following:
tp_time                  id  price_1     price_2
------------------------------------------------
2018.06.25T00:07:15.822  1   10.87       10.88
2018.06.25T00:07:45.823  3   SOME_PRICE  SOME_PRICE
2018.06.25T00:09:01.963  8   SOME_PRICE  SOME_PRICE
...
The thing is, I cannot join t1 and t2 directly, because table t2 is so large that I would kill the server. I've tried something like ... where tp_time within (time1; time2), but I'm not sure how to deal with the time1 and time2 variables.
Could someone give me some help with this question? Thanks so much!
I'd recommend organizing the quote table t2 by applying the proper attributes so that the join generates results quickly.
Since you are looking for the prevailing price and the price about 5 seconds later, you will need wj (window join) for this.
The general syntax is:

wj[w;c;t;(q;(f0;c0);(f1;c1))]

w      - pairs of begin and end times, one pair per row of t
c      - names of the columns to join on
t & q  - unkeyed tables; q should be sorted by `id`time with `p# applied to id
f0, f1 - aggregation functions applied to columns c0 and c1

In your case, t2 should be sorted by `id`tp_time with `p# applied to id.
q)t2:update `g#id from `id`tp_time xasc ([] tp_time:`time$10:20:30 + asc -10?10; id:10?3; price:10?10.)
q)t1:([] tp_time:`time$10:20:30 + asc -3?5; id:1 1 1)
q)select from t2 where id=1
tp_time      id price
------------------------
10:20:31.000 1  4.410662
10:20:32.000 1  5.473385
10:20:38.000 1  1.247049
q)wj[(`second$0 5)+\:t1.tp_time;`id`tp_time;t1;(t2;(first;`price);(last;`price))]
tp_time      id price    price
--------------------------------
10:20:30.000 1  4.410662 5.473385
10:20:31.000 1  4.410662 5.473385
10:20:34.000 1  5.473385 1.247049   / price at the 32nd second & the 38th second
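To unpack the wj call: (`second$0 5)+\:t1.tp_time builds the window argument w as a pair of time lists, giving each t1 row a window from its tp_time to 5 seconds later. Within each window, (first;`price) returns the price prevailing at the window's open (wj also considers the last quote on or before the window start), and (last;`price) returns the last price seen by the window's end. This way the large t2 is never fully joined against t1, only scanned per window.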

Distinct Count after Sum

I am looking to do a count after aggregation. Basically, I want to total up the Inventory per Item with a sum, and then count, for each employee, how many of their items have a non-zero total.
For the data below, Jack and Jimmy would each have a count of 1, Sam would have a count of 2, and Steve would have a count of 0. I could easily do this in SQL on the back end, but I also want users to be able to apply a date parameter: if they shifted the date to only 1/1/17, Sam would have a count of 1 and everyone else would have 0. Any help would be much appreciated!
Data:

Emp    Item      Inventory  Date
--------------------------------
Sam    Crackers   1         1/1/2017
Jack   Crackers   1         1/1/2017
Jack   Crackers  -1         2/1/2017
Jimmy  Crackers  -2         1/1/2017
Sam    Apples     1         1/1/2017
Steve  Apples    -1         1/1/2017
Sam    Cheese     1         1/1/2017
With Date >= '1/1/17':

Emp    NonZeroCount
-------------------
Sam    2
Jack   1
Jimmy  1
Steve  0

With Date = '1/1/17':

Emp    NonZeroCount
-------------------
Sam    1
Jack   0
Jimmy  0
Steve  0
The SQL I envision it replacing:
Create Table #Test (
    Empl Varchar(50),
    Item Varchar(50),
    Inventory Int,
    Date Date
)

Declare @DateParam Date
Set @DateParam = '1/1/17'

Insert Into #Test (Empl, Item, Inventory, Date)
Values
    ('Sam',   'Crackers',  1, '1/1/2017'),
    ('Jack',  'Crackers',  1, '1/1/2017'),
    ('Jack',  'Crackers', -1, '2/1/2017'),
    ('Jimmy', 'Crackers', -2, '1/1/2017'),
    ('Sam',   'Apples',    1, '1/1/2017'),
    ('Steve', 'Apples',   -1, '1/1/2017'),
    ('Sam',   'Cheese',    1, '1/1/2017');

Select Item, Sum(Inventory) As Total
Into #badItems
From #Test
Where Date >= @DateParam
Group By Item
Having Sum(Inventory) <> 0

Select T.Empl, Count(Distinct BI.Item) As NonZeroCount
From #Test T
Inner Join #badItems BI On BI.Item = T.Item
Group By T.Empl
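One caveat in the SQL above: the Inner Join drops employees who have no bad items at all, so Steve would simply be missing from the result rather than showing a count of 0. If the zero rows matter, a Left Join variant (my adjustment, not part of the original question) keeps him, because Count(Distinct BI.Item) ignores the NULLs it produces:

Select T.Empl, Count(Distinct BI.Item) As NonZeroCount
From #Test T
Left Join #badItems BI On BI.Item = T.Item
Group By T.Empl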
This is a good case for creating a set in Tableau.
Select the Item field in the data pane on the left, and right click to create a set based on that field. Name it Bad Items, and define it using the following formula on the Condition tab, which assumes you've defined a parameter named [DateParam] of type Date.
sum(if [Date] >= [DateParam] then [Inventory] end) <> 0
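The condition works because the IF has no ELSE branch: rows dated before [DateParam] contribute NULL, which SUM ignores, so an item enters the set exactly when its date-filtered inventory total is non-zero.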
You can then use the set on the filter shelf, row shelf, in calculations or combine with other sets as desired.
P.S. I used an alias to display the text "Bad Items" instead of "In" in the table, and set a manual default sort order for the Emp field (in case you are trying to reproduce this exactly).

TSQL Join to get all records from table A for each record in table B?

I have two tables:
PeriodId  Period    (Periods table)
--------  ------
1         Week 1
2         Week 2
3         Week 3

EmpId  PeriodId  ApprovedDate    (Worked table)
-----  --------  ------------
1      1         Null
1      2         2/28/2013
2      2         2/28/2013
I am trying to write a query that results in this:
EmpId  Period  Worked  ApprovedDate
-----  ------  ------  ------------
1      Week 1  Yes     Null
1      Week 2  Yes     2/28/2013
1      Week 3  No      Null
2      Week 1  No      Null
2      Week 2  Yes     2/28/2013
2      Week 3  No      Null
The idea is that I need each Period from the Periods table for each Emp. If there is no record in the Worked table, then 'No' is placed in the Worked field.
What does the TSQL look like to get this result?
(Note: if it helps I also have access to an Employee table that has EmpId and LastName for each employee. For performance reasons I'm hoping not to need this but if I do then so be it.)
You should be able to use the following:
select p.empid,
       p.period,
       case when w.PeriodId is not null then 'Yes'
            else 'No'
       end as Worked,
       w.ApprovedDate
from (
    select p.periodid, p.period, e.empid
    from periods p
    cross join (select distinct EmpId from worked) e
) p
left join worked w
    on p.periodid = w.periodid
    and p.empid = w.empid
order by p.empid;
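The derived table p is the key step: the cross join pairs every period with every distinct employee found in Worked, and the left join then attaches the real Worked rows where they exist, so missing combinations surface as 'No'. Note that select distinct EmpId from worked only lists employees who worked at least once; if employees with no Worked rows at all should also appear, swap that subquery for the Employee table mentioned in the question (select EmpId from Employee).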