Combining results of CTEs - PostgreSQL

I have several CTEs. CTE1A counts the number of type A shops in area 1, CTE1B counts the number of type B shops in area 1, and so on up to CTE1D. Similarly, CTE2B counts the number of type B shops in area 2, and so on. The shop_types CTE selects all types of shops: A, B, C, D. How can I display a table that shows, for each area (column), how many shops of each type there are (rows)?
For example:
  1 2 3 4 5
A 0 7 4 0 0
B 2 3 8 2 9
C 8 5 8 1 6
D 7 1 5 4 3
The database has 2 tables:
Table regions: shop_id, region_id
Table shops: shop_id, shop_type
WITH
shop_types AS (
    SELECT DISTINCT shops.shop_type AS type
    FROM shops
    WHERE shops.shop_type != '-9999'
      AND shops.shop_type != 'Other'),
cte1A AS (
    SELECT regions.region_id, COUNT(regions.shop_id) AS shops_number, shops.shop_type
    FROM regions
    RIGHT JOIN shops ON shops.shop_id = regions.shop_id
    WHERE regions.region_id = 1
      AND shops.shop_type = 'A'
    GROUP BY shops.shop_type, regions.region_id)
SELECT * FROM cte1A

I'm not entirely sure I understand what you are after, but it seems you are looking for something like this:
select sh.shop_type,
count(case when r.region_id = 1 then 1 end) as region_1_count,
count(case when r.region_id = 2 then 1 end) as region_2_count,
count(case when r.region_id = 3 then 1 end) as region_3_count
from shops sh
left join regions r on r.shop_id = sh.shop_id
group by sh.shop_type
order by sh.shop_type;
You need to add one CASE expression for each region you want to have in the output.
If you are using Postgres 9.4, you can replace the CASE expressions with a FILTER condition, which (I think) makes the intention a bit easier to understand:
count(*) filter (where r.region_id = 1) as region_1_count,
count(*) filter (where r.region_id = 2) as region_2_count,
...
SQLFiddle: http://sqlfiddle.com/#!1/98391/1
And before you ask: no you can't make the number of columns "dynamic" based on a select statement. The column list for a query must be defined before the statement is actually executed.
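If you truly need a cross-tab result, the crosstab() function from the tablefunc extension can reshape the output, but it does not get around this restriction either: the output column list still has to be hard-coded. A sketch, assuming shop_type is text and only regions 1 to 3 exist (requires CREATE EXTENSION tablefunc first):
SELECT *
FROM crosstab(
    $$SELECT sh.shop_type, r.region_id, count(*)
      FROM shops sh
      JOIN regions r ON r.shop_id = sh.shop_id
      GROUP BY 1, 2
      ORDER BY 1, 2$$,
    $$SELECT generate_series(1, 3)$$  -- one row per output category (region)
) AS ct (shop_type text, region_1 bigint, region_2 bigint, region_3 bigint);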

Related

PostgreSQL group by and count on specific condition

I have the following tables (example)
Analyze_Line
id  game_id  bet_result  game_type
1   1        WIN         0
2   2        LOSE        0
3   3        WIN         0
4   4        LOSE        0
5   5        LOSE        0
6   6        WIN         0
Game
id  league_id  home_team_id  away_team_id
1   1          1             2
2   2          2             3
3   3          3             4
4   1          1             2
5   2          2             3
6   3          3             4
Required Data:
league_id  WIN  LOSE  GameCnt
1          1    1     2
2          0    2     2
3          2    0     2
The Analyze_Line table is joined with the Game table, and I can easily get GameCnt by grouping by league_id, but I am not sure how to calculate the WIN count and LOSE count from bet_result.
You can use conditional expressions inside aggregate functions to split the win and lose bet results per league:
select
    g.league_id,
    -- else 0 so that leagues with no wins (or no losses) show 0 instead of null
    sum(case when a.bet_result = 'WIN' then 1 else 0 end) as win,
    sum(case when a.bet_result = 'LOSE' then 1 else 0 end) as lose,
    count(*) as gamecnt
from game g
inner join analyze_line a on g.id = a.game_id
group by g.league_id
Since there is no mention of the PostgreSQL version, I can't recommend using the FILTER clause (PostgreSQL specific), since it might not work for you.
Adding to Kamil's answer - PostgreSQL introduced the FILTER clause in version 9.4, released about eight years ago (December 2014). At this point, I think it's safe enough to use in answers. IMHO, it's a tad more elegant than summing over a CASE expression, but it does have the drawback of being PostgreSQL-specific syntax, and thus not portable:
SELECT g.league_id,
COUNT(*) FILTER (WHERE a.bet_result = 'WIN') AS win,
COUNT(*) FILTER (WHERE a.bet_result = 'LOSE') AS lose,
COUNT(*) AS gamecnt
FROM game g
JOIN analyze_line a ON g.id = a.game_id
GROUP BY g.league_id

SQL Renumbering index after group by

I have the following input table:
Seq  Group  GroupSequence
1    0
2    4      A
3    4      B
4    4      C
5    0
6    6      A
7    6      B
8    0
Output table is:
Line  NewSeq  GroupSequence
1     1
2     2       A
3     2       B
4     2       C
5     3
6     4       A
7     4       B
8     5
The rules for the input table are:
Any positive integer in the Group column indicates that the rows are grouped together. The entire field may be NULL or blank. A NULL or 0 indicates that the row is processed on its own. In the above example there are two groups and three 'single' rows.
The GroupSequence column is a single character that sorts within the group. NULL, blank, 'A', 'B', 'C', 'D' are the only characters allowed.
If Group has a positive integer, there must be an alphabetic character in GroupSequence.
I need a query that creates the output table with a new column that sequences as shown.
External apps need to iterate through this table in either Line or NewSeq order (same order, different values).
I've tried variations on GROUP BY, PARTITION BY, OVER(), etc. with no success.
Any help much appreciated.
Perhaps this will help
The only trick here is Flg, which indicates the start of a new group sequence (values will be 1 or 0). Then it is a small matter of summing Flg via a window function.
Edit - Updated Flg method
Example
Declare @YourTable Table ([Seq] int, [Group] int, [GroupSequence] varchar(50))
Insert Into @YourTable Values
 (1,0,null)
,(2,4,'A')
,(3,4,'B')
,(4,4,'C')
,(5,0,null)
,(6,6,'A')
,(7,6,'B')
,(8,0,null)

Select Line = Row_Number() over (Order by Seq)
      ,NewSeq = Sum(Flg) over (Order By Seq)
      ,GroupSequence
 From (
        Select *
              -- Flg = 1 whenever the Group value changes from the previous row
              ,Flg = case when [Group] = lag([Group],1) over (Order by Seq) then 0 else 1 end
         From @YourTable
      ) A
 Order By Line
Returns
Line  NewSeq  GroupSequence
1     1       NULL
2     2       A
3     2       B
4     2       C
5     3       NULL
6     4       A
7     4       B
8     5       NULL
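The same gaps-and-islands approach translates directly to PostgreSQL. A sketch, assuming a table your_table(seq, grp, group_sequence), with grp standing in for the reserved word Group:
select row_number() over (order by seq) as line,
       sum(flg) over (order by seq)     as newseq,  -- running sum of the change flags
       group_sequence
from (
    select seq,
           group_sequence,
           case when grp = lag(grp) over (order by seq)
                then 0 else 1 end as flg
    from your_table
) t
order by line;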

Column of counts for time intervals

I want to construct a table with a column that tracks how many times an id appears in a given week: if the id appears once it is given a 1, if it appears twice it is given a 2, but if it appears more than two times it is given a 0.
id date
a 2015-11-10
a 2015-11-25
a 2015-11-09
b 2015-11-10
b 2015-11-09
a 2015-11-05
b 2015-11-23
b 2015-11-28
b 2015-12-04
a 2015-11-10
b 2015-12-04
a 2015-12-07
a 2015-12-09
c 2015-11-30
a 2015-12-06
c 2015-10-31
c 2015-11-04
b 2015-12-01
a 2015-10-30
a 2015-12-14
The one-week intervals are given as follows:
1 - 2015-10-30 to 2015-11-05
2 - 2015-11-06 to 2015-11-12
3 - 2015-11-13 to 2015-11-19
4 - 2015-11-20 to 2015-11-26
5 - 2015-11-27 to 2015-12-03
6 - 2015-12-04 to 2015-12-10
7 - 2015-12-11 to 2015-12-17
The table should look like this.
id interval count
a 1 2
b 1 0
c 1 2
a 2 0
b 2 2
c 2 0
a 3 0
b 3 0
c 3 0
a 4 1
b 4 1
c 4 0
a 5 0
b 5 2
c 5 1
a 6 0
b 6 2
c 6 0
a 7 1
b 7 0
c 7 0
The interval column doesn't have to be there; I simply added it for clarity.
I am new to SQL and am unsure how to break the dates into intervals. The only thing I have is grouping by date and counting:
Select id, date, count(*) as frequency
from data_1
group by id, date
having count(*) <= 2;
Looking at just the data you provided, this does the trick:
SELECT v.id,
i.interval,
coalesce((CASE WHEN sub.cnt < 3 THEN sub.cnt ELSE 0 END), 0) AS count
FROM (VALUES('a'), ('b'), ('c')) v(id)
CROSS JOIN generate_series(1, 7) i(interval)
LEFT JOIN (
SELECT id, ((date - '2015-10-30')/7 + 1)::int AS interval, count(*) AS cnt
FROM my_table
GROUP BY 1, 2) sub USING (id, interval)
ORDER BY 2, 1;
A few words of explanation:
You have three id values which are here recreated with a VALUES clause. If you have many more or don't know beforehand which id's to enumerate, you can always replace the VALUES clause with a sub-query.
You provide a specific date range over 7 weeks. Since you might have weeks where a certain id is not present you need to generate a series of the interval values and CROSS JOIN that to the id values above. This yields the 21 rows you are looking for.
Then you calculate the occurrences of ids in intervals. You can subtract a date from another date which will give you the number of days in between. So subtract the date of the row from the earliest date, divide that by 7 to get the interval period, add 1 to make the interval 1-based and convert to integer. You can then convert counts of > 2 to 0 and NULL to 0 with a combination of CASE and coalesce().
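For example, for the row (a, 2015-11-10), with 2015-10-30 as the earliest date:
SELECT ('2015-11-10'::date - '2015-10-30'::date) / 7 + 1 AS interval_no;
-- 11 days / 7 = 1 (integer division), + 1 = interval 2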
The query outputs the interval too, otherwise you will have no clue what the data refers to. Optionally, you can turn this into a column which shows the date range of the interval.
More flexible solution
If you have more ids and a larger date range, you can use the version below, which first determines the distinct ids and the date range. Note that the interval is now 0-based to make the calculations easier. Not that it matters much, because instead of the interval number, the corresponding date range is displayed.
WITH mi AS (
SELECT min(date) AS min, ((max(date) - min(date))/7)::int AS intv FROM my_table)
SELECT v.id,
to_char((mi.min + i.intv * 7)::timestamp, 'YYYY-mm-dd') || ' - ' ||
to_char((mi.min + i.intv * 7 + 6)::timestamp, 'YYYY-mm-dd') AS period,
coalesce((CASE WHEN sub.cnt < 3 THEN sub.cnt ELSE 0 END), 0) AS count
FROM mi,
(SELECT DISTINCT id FROM my_table) v
CROSS JOIN LATERAL generate_series(0, mi.intv) i(intv)
LEFT JOIN LATERAL (
SELECT id, ((date - mi.min)/7)::int AS intv, count(*) AS cnt
FROM my_table
GROUP BY 1, 2) sub USING (id, intv)
ORDER BY 2, 1;
SQLFiddle with both solutions.
Assuming you have a table of all users, this will do the trick.
select
users.id,
interval_table.id,
CASE
WHEN count(log_table.user_id)>2 THEN 0
ELSE count(log_table.user_id)
END
from users
cross join interval_table
left outer join log_table
on users.id = log_table.user_id
and log_table.event_date >= interval_table.start_interval
and log_table.event_date < interval_table.stop_interval
group by users.id, interval_table.id
order by interval_table.id, users.id
Check it out: http://sqlfiddle.com/#!15/1a822/21
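The query above presupposes an interval_table(id, start_interval, stop_interval); its exact shape is an assumption. A sketch of how it could be populated for the seven weeks in the question using generate_series, with stop_interval as an exclusive upper bound to match the < comparison in the join:
create table interval_table as
select row_number() over (order by d) as id,
       d::date                        as start_interval,
       (d + interval '7 days')::date  as stop_interval  -- exclusive upper bound
from generate_series(timestamp '2015-10-30',
                     timestamp '2015-12-11',
                     interval '7 days') as g(d);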

How to optimize query

I have the same problem as mentioned in "In SQL, how to select the top 2 rows for each group". The answer is working fine, but it takes too much time. How can I optimize this query?
Example:
sample_table
act_id  act_cnt
1 1
2 1
3 1
4 1
5 1
6 3
7 3
8 3
9 4
a 4
b 4
c 4
d 4
e 4
Now I want to group it (or use some other way) and select 2 rows from each group. Sample output:
act_id  act_cnt
1 1
2 1
6 3
7 3
9 4
a 4
I am new to SQL. How can I do it?
The answer you linked to uses an inefficient workaround for MySQL's lack of window functions.
Using a window function is most probably much faster as you only need to read the table once:
select name,
score
from (
select name,
score,
dense_rank() over (partition by name order by score desc) as rnk
from the_table
) t
where rnk <= 2;
SQLFiddle: http://sqlfiddle.com/#!15/b0198/1
Having an index on (name, score) should speed up this query.
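For example (the_table being the placeholder name used above):
create index on the_table (name, score desc);  -- matches the partition/order of the window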
Edit after the question (and the problem) has been changed
select act_id,
act_cnt
from (
select act_id,
act_cnt,
row_number() over (partition by act_cnt order by act_id) as rn
from sample_table
) t
where rn <= 2;
New SQLFiddle: http://sqlfiddle.com/#!15/fc44b/1

T-SQL table variable data order

I have a UDF which returns a table variable, like:
--
--
RETURNS @ElementTable TABLE
(
ElementID INT IDENTITY(1,1) PRIMARY KEY NOT NULL,
ElementValue VARCHAR(MAX)
)
AS
--
--
Is the order of data in this table variable guaranteed to be the same as the order in which the data is inserted into it? E.g. if I issue
INSERT INTO @ElementTable(ElementValue) VALUES ('1')
INSERT INTO @ElementTable(ElementValue) VALUES ('2')
INSERT INTO @ElementTable(ElementValue) VALUES ('3')
I expect the data will always be returned in that order when I say
select ElementValue from @ElementTable --Here I don't use order by
EDIT:
If the order is not guaranteed, then the following query
SELECT T1.ElementValue, T2.ElementValue FROM dbo.MyFunc() T1
Cross Apply dbo.MyFunc() T2
order by t1.elementid
will not consistently produce the 3x3 matrix of combinations
1 1
1 2
1 3
2 1
2 2
2 3
3 1
3 2
3 3
Is there any possibility that it could be like
1 2
1 1
1 3
2 3
2 2
2 1
3 1
3 2
3 3
How can I achieve this using my function above?
No, the order is not guaranteed to be the same.
Unless, of course, you are using ORDER BY. Then it is guaranteed to be the same.
Given your update, you obtain it in the obvious way - you ask the system to give you the results in the order you want:
SELECT T1.ElementValue,T2.ElementValue FROM dbo.MyFunc() T1
Cross join dbo.MyFunc() T2
order by t1.elementid, t2.elementid
You are guaranteed that, if you're using inefficient single-row inserts within your UDF, the IDENTITY values will match the order in which the individual INSERT statements were specified.
Order is not guaranteed.
But if all you want is simply to get your records back in the same order you inserted them, then just order by your primary key. Since you already have that field set up as an auto-increment, it should suffice.
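For example, with the function from the question:
select ElementValue from dbo.MyFunc() order by ElementID;  -- ElementID is the IDENTITY primary key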
...or use a deterministic function
SELECT TOP 9
       M1 = (ROW_NUMBER() OVER(ORDER BY id) + 2) / 3,
       M2 = (ROW_NUMBER() OVER(ORDER BY id) + 2) % 3 + 1
FROM sysobjects
M1 M2
1 1
1 2
1 3
2 1
2 2
2 3
3 1
3 2
3 3