What will be the SQL to get the results by applying group by on specific matching string of column value? - postgresql

What will be the SQL to get the results by applying the group by on a specific matching string of column value?
Table: results ( this is not the physical table but the result of specific queries )
rname count1 count2
Avg-1 2 2
Avg-1 1 1
Avg-2 2 2
Avg-1 1 1
Zen-3 2 2
Zen/D 2 1
QA/C 3 1
QA/D 2 1
The expected output is:
rname count1 count2
Avg 6 6
Zen 4 3
QA 5 2
In expected output count1 is sum of all count1 of rname which match the string 'Avg', 'Zen' and 'QA' respectively. Same for count2.
What will be the SQL can you please give me some directions?

demo:db<>fiddle
SELECT
(regexp_split_to_array(rname,'[-/]'))[1],
SUM(count1) AS count1,
SUM(count2) AS count2
FROM
mytable
GROUP BY 1
regexp_split_to_array(rname,'[-/]') splits the rname value at the - or the / character. Taking the first part ([1]) gives you Avg, Zen or QA
Group hy this result (using the column index 1)
SUM up the values

Related

PySpark subtract last row from first row in a group

I want to use window function to partition by ID and have the last row of each group to be subtracted from the first row and create a separate column with the output. What is the cleanest way to achieve that result?
ID col1
1 1
1 2
1 4
2 1
2 1
2 6
3 5
3 5
3 7
Desired output:
ID col1 col2
1 1 3
1 2 3
1 4 3
2 1 5
2 1 5
2 6 5
3 5 2
3 5 2
3 7 2
Code below
w=Window.partitionBy('ID').orderBy('col1').rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)
df.withColumn('out', last('col1').over(w)-first('col1').over(w)).show()
Sounds like you’re defining the “first” row as the row with the minimum value of col1 in the group, and the “last” row as the row with maximum value of col1 in the group. To compute them, you can use the MIN and MAX window functions:
SELECT
ID,
col1,
(MAX(col1) OVER (PARTITION BY ID)) - (MIN(col1) OVER (PARTITION BY ID)) AS col2
FROM
...
If you’re defining “first” and “last” row somehow differently (e.g., in terms of some timestamp), you can use the more general FIRST_VALUE and LAST_VALUE window functions:
SELECT
ID,
col1,
(LAST_VALUE(col1) OVER (PARTITION BY ID ORDER BY col1 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING))
-
(FIRST_VALUE(col1) OVER (PARTITION BY ID ORDER BY col1 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING))
AS col2
FROM
...
The two snippets above are equivalent, but the latter is more general: you can specify ordering by a different column and/or you can modify the window specification.

Map the column value against same sequences in PostgreSQL

I want to write a query as per below input and output
Input :-
Num Sr_no Exp_no
NULL 1 1
NULL 2 1
ABC_1 3 1
NULL 4 1
NULL 1 2
NULL 2 2
ABC_2 3 2
NULL 4 4
Expected Output:-
Num Sr_no Exp_no
ABC_1 1 1
ABC_1 2 1
ABC_1 3 1
ABC_1 4 1
ABC_2 1 2
ABC_2 2 2
ABC_2 3 2
ABC_2 4 4
As there is no details in question, this answer is on below assumptions
you want to fill num field based on exp_no grouping.
Assuming there is only one value in a exp_no group.
Try this:
with cte as
(
select distinct on (num,exp_no) num, exp_no
from test
where num is not null
order by 1)
select
coalesce(t1.num, cte.num),
t1.sr_no,
t1.exp_no
from test t1 left join cte on t1.exp_no=cte.exp_no
DEMO

SQL Renumbering index after group by

I have the following input table:
Seq Group GroupSequence
1 0
2 4 A
3 4 B
4 4 C
5 0
6 6 A
7 6 B
8 0
Output table is:
Line NewSeq GroupSequence
1 1
2 2 A
3 2 B
4 2 C
5 3
6 4 A
7 4 B
8 5
The rules for the input table are:
Any positive integer in the Group column indicates that the rows are grouped together. The entire field may be NULL or blank. A null or 0 indicates that the row is processed on its own. In the above example there are two groups and three 'single' rows.
the GroupSequence column is a single character that sorts within the group. NULL, blank, 'A', 'B' 'C' 'D' are the only characters allowed.
if Group has a positive integer, there must be alphabetic character in GroupSequence.
I need a query that creates the output table with a new column that sequences as shown.
External apps needs to iterate through this table in either Line or NewSeq order(same order, different values)
I've tried variations on GROUP BY, PARTITION BY, OVER(), etc. WITH no success.
Any help much appreciated.
Perhaps this will help
The only trick here is Flg which will indicate a new Group Sequence (values will be 1 or 0). Then it is a small matter to sum(Flg) via a window function.
Edit - Updated Flg method
Example
Declare #YourTable Table ([Seq] int,[Group] int,[GroupSequence] varchar(50))
Insert Into #YourTable Values
(1,0,null)
,(2,4,'A')
,(3,4,'B')
,(4,4,'C')
,(5,0,null)
,(6,6,'A')
,(7,6,'B')
,(8,0,null)
Select Line = Row_Number() over (Order by Seq)
,NewSeq = Sum(Flg) over (Order By Seq)
,GroupSequence
From (
Select *
,Flg = case when [Group] = lag([Group],1) over (Order by Seq) then 0 else 1 end
From #YourTable
) A
Order By Line
Returns
Line NewSeq GroupSequence
1 1 NULL
2 2 A
3 2 B
4 2 C
5 3 NULL
6 4 A
7 4 B
8 5 NULL

Remove duplicates based on only 1 column

My data is in the following format:
rep_id user_id other non-duplicated data
1 1 ...
1 2 ...
2 3 ...
3 4 ...
3 5 ...
I am trying to achieve a column for deduped_rep with 0/1 such that only first rep id across the associated users has a 1 and rest have 0.
Expected result:
rep_id user_id deduped_rep
1 1 1
1 2 0
2 3 1
3 4 1
3 5 0
For reference, in Excel, I would use the following formula:
IF(SUMPRODUCT(($A$2:$A2=A2)*($A$2:$A2=A2))>1,0,1)
I know there is the FIXED() LoD calculation http://kb.tableau.com/articles/howto/removing-duplicate-data-with-lod-calculations, but I only see use cases of it deduplicating based on another column. However, mine are distinct.
Define a field first_reg_date_per_rep_id as
{ fixed rep_id : min(registration_date) }
The define a field is_first_reg_date? as
registration_date = first_reg_date_per_rep_id
You can use that last Boolean field to distinguish the first record for each rep_id from later ones
try this query
select
rep_id,
user_id,
row_number() over(partition by rep_id order by rep_id,user_id) deduped_rep
from
table

combining results of CTEs

I have several CTEs. CTE1A counts number of type A shops in area 1. CTE1B counts number of type B shops in area 1 and so on up to CTE1D. Similarly, CTE2B counts number of type B shops in area 2 and so on. shop_types CTE selects all types of shops: A,B,C,D. How to display a table that shows for each area (column) how many shops of each type there is (rows).
For example:
1 2 3 4 5
A 0 7 4 0 0
B 2 3 8 2 9
C 8 5 8 1 6
D 7 1 5 4 3
Database has 2 tables:
Table regions: shop_id, region_id
Table shops: shop_id, shop_type
WITH
shop_types AS (SELECT DISTINCT shops.shop_type AS type FROM shops WHERE shops.shop_type!='-9999' AND shops.shop_type!='Other'),
cte1A AS (
SELECT regions.region_id, COUNT(regions.shop_id) AS shops_number, shops.shop_type
FROM regions
RIGHT JOIN shops
ON shops.shop_id=regions.shop_id
WHERE regions.region_id=1
AND shops.shop_type='A'
GROUP BY shops.shop_type,regions.region_id)
SELECT * FROM cte1A
I'm not entirely sure I understand why you are after, but it seems you are looking for something like this:
select sh.shop_type,
count(case when r.region_id = 1 then 1 end) as region_1_count,
count(case when r.region_id = 2 then 1 end) as region_2_count,
count(case when r.region_id = 3 then 1 end) as region_3_count
from shops sh
left join regions r on r.shop_id = sh.shop_id
group by sh.shop_type
order by sh.shop_type;
You need to add one case statement for each region you want to have in the output.
If you are using Postgres 9.4 you can replace the case statements using a filter condition which kind of makes the intention a bit easier to understand (I think)
count(*) filter (where r.region_id = 1) as region_1_count,
count(*) filter (where r.region_id = 2) as region_2_count,
...
SQLFiddle: http://sqlfiddle.com/#!1/98391/1
And before you ask: no you can't make the number of columns "dynamic" based on a select statement. The column list for a query must be defined before the statement is actually executed.