Count and list out unique combination by group - tsql

I am trying to write a SQL query that produces the following result, but I can't get my head around this particular scenario.
I want to find every distinct combination of roles held by a user, count how many users hold each combination, and list the role names next to the count.
Here is the example table:
USER ROLE
AAA Report
AAA Enquiry
AAA Manager
BBB Report
BBB Enquiry
BBB Manager
CCC Enquiry
CCC Report
DDD Report
EEE Report
EEE Enquiry
EEE Admin
FFF Report
FFF Enquiry
GGG Report
GGG Enquiry
GGG Manager
GGG PAYROLL
HHH Report
III Report
III Enquiry
AAA and BBB both have the role combination "Report-Enquiry-Manager", so that combination has a count of 2.
CCC, FFF and III have Enquiry and Report (in any order), so that combination has a count of 3.
DDD and HHH have only the Report role.
Therefore, the desired output would be
COUNT ROLE-COMBINATION
2 Report-Enquiry-Manager
3 Enquiry-Report
2 Report
1 Report-Enquiry-Admin
1 Report-Enquiry-Manager-PAYROLL
Could someone point me in the right direction, please?
Thanks heaps.

You can use GROUP BY to get the count, and STUFF with FOR XML PATH to build the '-'-separated list of roles.
create table test
(
[USER] varchar(10)
,[Role] varchar(100)
)
insert into test values
('AAA','Report')
,('AAA','Enquiry')
,('AAA','Manager')
,('BBB','Report')
,('BBB','Enquiry')
,('BBB','Manager')
,('CCC','Enquiry')
,('CCC','Report')
,('DDD','Report')
,('EEE','Report')
,('EEE','Enquiry')
,('EEE','Admin')
,('FFF','Report')
,('FFF','Enquiry')
,('GGG','Report')
,('GGG','Enquiry')
,('GGG','Manager')
,('GGG','PAYROLL')
,('HHH','Report')
,('III','Report')
,('III','Enquiry')
-- Outer query: count how many users share each role combination
select count(1) as [COUNT], [Role-Combination] from
(
-- Inner query: one row per user, with that user's roles joined by '-'
select t.[User]
,STUFF((SELECT
'-' + cm.[role] AS [text()]
FROM
test cm
WHERE
cm.[user] = t.[user]
order by cm.[role] desc -- a fixed order makes equal role sets produce equal strings
FOR XML PATH('')
, root('user'), TYPE).value('.','varchar(max)'), 1, 1, '' ) AS [Role-Combination]
from test t
GROUP BY t.[User]
) result
group by result.[Role-Combination]
DROP TABLE test

CREATE TABLE #USER -- temp table; DECLARE is only valid for table variables such as @USER
(
xUSER VARCHAR(30),
xROLE VARCHAR(30)
)
INSERT INTO #USER (xUSER,xROLE)
VALUES
('AAA','Report'),
('AAA','Enquiry'),
('AAA','Manager'),
('BBB','Report'),
('BBB','Enquiry'),
('BBB','Manager'),
('CCC','Enquiry'),
('CCC','Report'),
('DDD','Report'),
('EEE','Report'),
('EEE','Enquiry'),
('EEE','Admin'),
('FFF','Report'),
('FFF','Enquiry'),
('GGG','Report'),
('GGG','Enquiry'),
('GGG','Manager'),
('GGG','PAYROLL'),
('HHH','Report'),
('III','Report'),
('III','Enquiry')
; WITH x AS
(
SELECT DISTINCT
xUSER,
STUFF((
SELECT '-' + xRole -- no space after '-', so STUFF(..., 1, 1, '') removes the whole leading separator
FROM #User
WHERE xUser = a.xUSer
ORDER BY xRole
FOR XML PATH ('')
), 1, 1, '') Groups
FROM #User a
)
SELECT
[Cnt] = COUNT(xUser), Groups as [Role-Combination]
FROM x
GROUP BY Groups
ORDER BY [Cnt] DESC
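Both answers rely on the same idea: build one canonical string per user by joining that user's roles in a fixed order, then group by that string. As a rough sketch of that idea outside the database (plain Python, not T-SQL; the function name `count_role_combinations` is made up for illustration), using the sample rows from the question:

```python
from collections import Counter, defaultdict

# Sample rows copied from the question: (user, role)
rows = [
    ("AAA", "Report"), ("AAA", "Enquiry"), ("AAA", "Manager"),
    ("BBB", "Report"), ("BBB", "Enquiry"), ("BBB", "Manager"),
    ("CCC", "Enquiry"), ("CCC", "Report"),
    ("DDD", "Report"),
    ("EEE", "Report"), ("EEE", "Enquiry"), ("EEE", "Admin"),
    ("FFF", "Report"), ("FFF", "Enquiry"),
    ("GGG", "Report"), ("GGG", "Enquiry"), ("GGG", "Manager"), ("GGG", "PAYROLL"),
    ("HHH", "Report"),
    ("III", "Report"), ("III", "Enquiry"),
]


def count_role_combinations(rows):
    """Map each role combination to the number of users holding exactly that set."""
    roles_by_user = defaultdict(set)
    for user, role in rows:
        roles_by_user[user].add(role)
    # Sorting gives "Enquiry, Report" and "Report, Enquiry" the same key,
    # which is what ORDER BY inside the FOR XML PATH subquery does in SQL.
    return Counter("-".join(sorted(roles)) for roles in roles_by_user.values())


combos = count_role_combinations(rows)
for combo, n in combos.most_common():
    print(n, combo)
```

Because the roles are sorted before joining, the combination names come out in alphabetical order (for example "Enquiry-Report") rather than in the order shown in the question, but the counts per combination are the same.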

Related

Remove duplicates based on condition and keep oldest element

Currently, the table is sorted in ascending order by row_number. I need help removing duplicates based on two conditions.
If an org_id has a row whose stage is online, I want to keep that row. If there are several such rows, it doesn't matter which one is kept.
If an org_id has no row with stage online, I want to keep the row with row_number = 1, which is the oldest element.
sales_id org_id stage row_number
ccc_123 ccc off-line 1
ccc_123 ccc off-line 2
ccc_123 ccc online 3
abc_123 abc off-line 1
abc_123 abc power-off 2
zzz_123 zzz power-off 1
So the table should look like this afterwards:
sales_id org_id stage
ccc_123 ccc online
abc_123 abc off-line
zzz_123 zzz power-off
I would use a CASE expression to override row_num for rows with stage = 'online' so that they sort first, and then use ROW_NUMBER() to keep only the first row in each group.
http://sqlfiddle.com/#!17/1421b/5
create table sales_stage (
sales_id varchar,
org_id varchar,
stage varchar,
row_num int);
insert into sales_stage (sales_id, org_id, stage, row_num) values
('ccc_123', 'ccc', 'off-line', 1),
('ccc_123', 'ccc', 'off-line', 2),
('ccc_123', 'ccc', 'online', 3),
('abc_123', 'abc', 'off-line', 1),
('abc_123', 'abc', 'power-off', 2),
('zzz_123', 'zzz', 'power-off', 1);
SELECT
sales_id, org_id, stage
FROM
(
SELECT
sales_id, org_id, stage,
ROW_NUMBER() OVER(PARTITION BY sales_id, org_id ORDER BY row_num) as rn
FROM (
SELECT sales_id, org_id, stage,
CASE WHEN stage='online' THEN -999 ELSE row_num END as row_num
FROM sales_stage
) x
) y
WHERE rn = 1
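The CASE expression works as a sort key: rows with stage 'online' are forced to the front of their group, and everything else keeps its original order. A minimal sketch of the same selection rule in plain Python (not PostgreSQL; the function name `pick_rows` is made up for illustration), using the sample rows from the question:

```python
# Sample rows from the question: (sales_id, org_id, stage, row_num)
rows = [
    ("ccc_123", "ccc", "off-line", 1),
    ("ccc_123", "ccc", "off-line", 2),
    ("ccc_123", "ccc", "online", 3),
    ("abc_123", "abc", "off-line", 1),
    ("abc_123", "abc", "power-off", 2),
    ("zzz_123", "zzz", "power-off", 1),
]


def pick_rows(rows):
    """Keep one row per (sales_id, org_id): an 'online' row if any, else the lowest row_num."""
    best = {}
    for sales_id, org_id, stage, row_num in rows:
        # False sorts before True, so 'online' rows win; row_num breaks ties.
        sort_key = (stage != "online", row_num)
        group = (sales_id, org_id)
        if group not in best or sort_key < best[group][0]:
            best[group] = (sort_key, (sales_id, org_id, stage))
    return [row for _, row in best.values()]


result = pick_rows(rows)
for row in result:
    print(row)
```

The two-part key avoids relying on a sentinel such as -999 being smaller than every real row_num. In SQL the equivalent would be to order by the online flag first and by row_num second.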

Postgresql Partition - Funtion calling in Select Query - is slow

Our system is SaaS-based, and we use ClientID as a filter for all data fetching.
The DB load depends on the size of the company, so we partitioned the tables by ClientID.
Example: before partitioning
clienttable
clientid clientname clientaddress
1 ABC ...
2 EMN ...
3 XYZ ...
employeetable
clientid employeeid employeename
1 123 AAA
1 124 BBB
2 125 CCC
2 126 DDD
3 127 EEEE
jobtable
clientid jobid jobname
1 234 YTR
1 235 DER
2 236 SWE
3 237 VFT
3 238 GHJ
Example: after partitioning
clienttable
clientid clientname clientaddress
1 ABC ...
2 EMN ...
3 XYZ ...
employeetable
employeetable_1
clientid employeeid employeename
1 123 AAA
1 124 BBB
employeetable_2
clientid employeeid employeename
2 125 CCC
2 126 DDD
employeetable_3
clientid employeeid employeename
3 127 EEE
jobtable
jobtable_1
clientid jobid jobname
1 234 YTR
1 235 DER
jobtable_2
clientid jobid jobname
2 236 SWE
jobtable_3
clientid jobid jobname
3 237 VFT
3 238 GHJ
When we write select queries:
Select employeeid,employeename from employeetable where clientid=2;
This query runs faster after partitioning. The problem is that we also have some user-defined functions that compute values from the data:
CREATE OR REPLACE FUNCTION GET_JOB_COUNT(NUMERIC, NUMERIC) RETURNS NUMERIC AS $BODY$
DECLARE
p_client_id ALIAS FOR $1;
p_employee_id ALIAS FOR $2;
v_is_count NUMERIC := 0;
BEGIN
SELECT COUNT(JOB_ID) INTO v_is_count FROM JOBTABLE where CLIENTID=p_client_id AND CREATEDBY=p_employee_id;
RETURN v_is_count;
END; $BODY$
LANGUAGE plpgsql;
Select employeeid,employeename,GET_JOB_COUNT(2,employeeid) from employeetable where clientid=2;
This query is slow after partitioning. Does this mean that the GET_JOB_COUNT function scans every partition?
If that is the problem, does it mean we can't use functions like this in a SELECT query after partitioning?
The function will be called once for each and every row from the employeetable (that is selected through the WHERE clause). I doubt you can improve the performance in any significant way using that approach.
It's better to do the aggregation (=count) for all rows at once, rather than for each row separately:
select e.employeeid, employeename, j.cnt
from employeetable e
left join (
-- count the jobs once per (clientid, createdby) instead of once per employee row
select clientid, createdby, count(job_id) as cnt
from jobtable
group by clientid, createdby
) j on j.clientid = e.clientid and j.createdby = e.employeeid
where e.clientid = 2;
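To check that the set-based form returns the same counts as the per-row lookup, here is a small sketch that runs both against an in-memory SQLite database from Python. It is not PostgreSQL and says nothing about partition performance; the `createdby` values and the extra job row 239 are made up for the test, and `COUNT(*)` stands in for `count(job_id)`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employeetable (clientid INTEGER, employeeid INTEGER, employeename TEXT);
    CREATE TABLE jobtable (clientid INTEGER, jobid INTEGER, jobname TEXT, createdby INTEGER);
    INSERT INTO employeetable VALUES (2, 125, 'CCC'), (2, 126, 'DDD'), (3, 127, 'EEE');
    -- createdby values and job 239 are invented for this sketch
    INSERT INTO jobtable VALUES (2, 236, 'SWE', 125), (2, 239, 'ABC', 125), (3, 237, 'VFT', 127);
""")

# Per-row lookup: one correlated count per employee row, like calling the function.
per_row = conn.execute("""
    SELECT e.employeeid,
           (SELECT COUNT(*) FROM jobtable j
             WHERE j.clientid = e.clientid AND j.createdby = e.employeeid) AS cnt
    FROM employeetable e
    WHERE e.clientid = 2
    ORDER BY e.employeeid
""").fetchall()

# Set-based: aggregate once, then join. COALESCE turns "no jobs" (NULL) into 0.
joined = conn.execute("""
    SELECT e.employeeid, COALESCE(j.cnt, 0) AS cnt
    FROM employeetable e
    LEFT JOIN (
        SELECT clientid, createdby, COUNT(*) AS cnt
        FROM jobtable
        GROUP BY clientid, createdby
    ) j ON j.clientid = e.clientid AND j.createdby = e.employeeid
    WHERE e.clientid = 2
    ORDER BY e.employeeid
""").fetchall()

print(per_row)
print(joined)
```

One detail worth noting: with a LEFT JOIN, an employee without any jobs gets NULL rather than 0, so a COALESCE is needed if the result should match what the function returned.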
Another option to try is to use a lateral join to eliminate rows from the jobtable early - I am not sure if the optimizer is smart enough for that in the query above. So you can try this as an alternative:
select e.employeeid, employeename, j.cnt
from employeetable e
left join lateral (
select count(jt.job_id) as cnt
from jobtable jt
where jt.clientid = e.clientid
and jt.createdby = e.employeeid
) j on true
where e.clientid = 2;
If you really do want to stick with the function, maybe making it a SQL function helps the optimizer. It at least removes the overhead of calling PL/pgSQL code:
CREATE OR REPLACE FUNCTION get_job_count(p_client_id numeric, p_employee_id numeric)
returns bigint
as
$body$
SELECT COUNT(JOB_ID)
FROM JOBTABLE
where CLIENTID = p_client_id
AND CREATEDBY = p_employee_id;
$body$
LANGUAGE sql
stable
parallel safe;
But I doubt that you will see a substantial improvement from that.
As a side note: using numeric for an "ID" column seems like a rather strange choice. Why aren't you using int or bigint for that?

Removing duplicates based on one value

customer id name Pay_type
1111 aaaa regular
1111 aaaa late
1111 aaaa regular
1111 aaaa regular
2222 bbbb regular
2222 bbbb regular
2222 bbbb regular
3333 cccc regular
3333 cccc late
4444 dddd regular
4444 dddd regular
I have a SQL query that gives me the above result, and I want to remove all rows for any customer that has a late payment (pay_type = 'late').
The output needs to be:
customer id name Pay_type
2222 bbbb regular
2222 bbbb regular
2222 bbbb regular
4444 dddd regular
4444 dddd regular
select
distinct a.customer_id,
a.name,
pay_type
from table a
left join table b on a.customer_id= b.id
left join table c on c.id = b.pay_id
where b.status = 'Done'
I'd do this as an anti-join:
select *
from table a
where not exists (
select null
from table b
where
a.customer_id = b.customer_id and
b.pay_type = 'late'
)
This has advantages over a distinct or a "not in" approach in that it will stop looking after it finds a match. This should work efficiently for both large and small datasets.
Any solution that uses distinct would have to evaluate the entire dataset and then remove dupes.
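A runnable sketch of the anti-join on the sample data, using an in-memory SQLite database from Python. The single table name `payments` is an assumption made for the sketch; the real query joins three tables whose structure is not shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "payments" is a hypothetical table holding the result set shown in the question.
conn.execute("CREATE TABLE payments (customer_id INTEGER, name TEXT, pay_type TEXT)")
conn.executemany(
    "INSERT INTO payments VALUES (?, ?, ?)",
    [
        (1111, "aaaa", "regular"), (1111, "aaaa", "late"),
        (1111, "aaaa", "regular"), (1111, "aaaa", "regular"),
        (2222, "bbbb", "regular"), (2222, "bbbb", "regular"), (2222, "bbbb", "regular"),
        (3333, "cccc", "regular"), (3333, "cccc", "late"),
        (4444, "dddd", "regular"), (4444, "dddd", "regular"),
    ],
)

# Keep a row only if the same customer has no 'late' row at all.
kept = conn.execute("""
    SELECT a.customer_id, a.name, a.pay_type
    FROM payments a
    WHERE NOT EXISTS (
        SELECT 1
        FROM payments b
        WHERE b.customer_id = a.customer_id
          AND b.pay_type = 'late'
    )
    ORDER BY a.customer_id
""").fetchall()

for row in kept:
    print(row)
```

Customers 1111 and 3333 disappear entirely, including their 'regular' rows, while the duplicate rows for 2222 and 4444 are preserved because no DISTINCT is involved.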
I'm not sure exactly what your tables look like, but you could do something like:
WHERE customer_id NOT IN (
SELECT customer_id
FROM table_with_customer_and_pay_type
WHERE pay_type = 'late'
GROUP BY customer_id )
Common Table Expression variation:
WITH orig_result_set AS (
select
distinct a.customer_id,
a.name,
pay_type
from table a
left join table b on a.customer_id= b.id
left join table c on c.id = b.pay_id
where b.status = 'Done'
),
exclude_late_payments AS (
SELECT DISTINCT customer_id
FROM orig_result_set
WHERE pay_type = 'late'
),
on_time_payments AS (
SELECT customer_id,
name,
pay_type
FROM orig_result_set
WHERE customer_id NOT IN (SELECT customer_id FROM exclude_late_payments)
)
SELECT *
FROM on_time_payments

How to Use Count as a Criteria in PostgreSQL

I have an existing table1 which contains "account", "tax_year" and other fields. I want to create a table2 with the records from table1 for which the frequency of CONCAT(account, tax_year) is 1 and which meet the WHERE clause.
For instance, if table1 looks like below:
account year
aaa 2014
bbb 2016
bbb 2016
ddd 2014
ddd 2014
ddd 2015
Table2 should be:
account year
aaa 2014
ddd 2015
Here is my script:
DROP TABLE IF EXISTS table2;
CREATE TABLE table2 AS
SELECT
account::text,
tax_year::text,
building_number,
imprv_type,
building_style_code,
quality,
quality_description,
date_erected,
yr_remodel,
actual_area,
heat_area,
gross_area,
CONCAT(account, tax_year) AS unq
FROM table1
WHERE imprv_type=1001 and date_erected>0 and date_erected IS NOT NULL and quality IS NOT NULL and quality_description IS NOT NULL and yr_remodel>0 and yr_remodel IS NOT NULL and heat_area>0 and heat_area IS NOT NULL
GROUP BY account,
tax_year,
building_number,
imprv_type,
building_style_code,
quality,
quality_description,
date_erected,
yr_remodel,
actual_area,
heat_area,
gross_area,
unq
HAVING COUNT(unq)=1;
I've spent two days on it but still can't figure out how to get it right. Thank you in advance for your help!
The proper way to use count of pairs (account, tax_year) in table1:
select account, tax_year
from table1
where imprv_type=1001 -- and many more...
group by account, tax_year
having count(*) = 1;
so you should try:
create table table2 as
select *
from table1
where (account, tax_year) in (
select account, tax_year
from table1
where imprv_type=1001 -- and many more...
group by account, tax_year
having count(*) = 1
);
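As a quick check of the GROUP BY ... HAVING COUNT(*) = 1 step, here is a sketch that runs it on the sample rows in an in-memory SQLite database from Python. It is not PostgreSQL, and the other columns and the WHERE conditions from the question are left out to keep the example small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (account TEXT, tax_year INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [("aaa", 2014), ("bbb", 2016), ("bbb", 2016),
     ("ddd", 2014), ("ddd", 2014), ("ddd", 2015)],
)

# Group by the key pair only; grouping by every selected column would hide duplicates
# that differ in any other column.
unique_pairs = conn.execute("""
    SELECT account, tax_year
    FROM table1
    GROUP BY account, tax_year
    HAVING COUNT(*) = 1
    ORDER BY account, tax_year
""").fetchall()

print(unique_pairs)
```

The pairs returned by this query are the ones to feed into the `in (...)` filter when creating table2.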
COUNT(*) = 1 is equivalent to NOT EXISTS (another row with the same key fields):
SELECT
account, tax_year
-- ... maybe more fields ...
FROM table1 t1
WHERE NOT EXISTS ( SELECT *
FROM table1 nx
WHERE nx.account = t1.account -- same key field(s)
AND nx.tax_year = t1.tax_year
AND nx.ctid <> t1.ctid -- but a different row!
);
Note: I replaced the COUNT(CONCAT(account, tax_year)) concatenation of key fields with a composite match key.
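One reason to prefer a composite key over a concatenated one is that concatenation can make two different pairs look identical when the fields are not fixed-width. A tiny Python illustration with made-up values (with four-digit tax years this particular collision is unlikely, but the composite key avoids the question altogether):

```python
# Two different (account, tax_year) pairs; hypothetical values chosen to collide.
pair_a = ("12", "014")
pair_b = ("120", "14")

# Concatenation loses the boundary between the two fields.
concatenated_equal = "".join(pair_a) == "".join(pair_b)

# A composite (tuple) comparison keeps the fields separate.
composite_equal = pair_a == pair_b

print(concatenated_equal, composite_equal)
```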

sql script to export table's column data to another table's column

Let's say I have two tables, Table_A and Table_B, with the same schema but different data.
Table_A
--------
ID(p_key) Number(p_key) Column3 Column4
-----------------------------------------------------
ID1 1 AAA BBB
ID1 2 CCC DDD
ID2 1 EEE FFF
ID2 2 GGG HHH
Table_B
--------
ID(p_key) Number(p_key) Column3 Column4
-----------------------------------------------------
ID1 1 AAA_1 BBB_1
ID1 2 CCC_1 DDD_1
ID2 1 EEE_1 FFF_1
ID2 2 GGG_1 HHH_1
I want to export (overwrite) Table_B's Column3 data into Table_A's Column3 wherever the ID and Number columns are equal.
After running the script, Table_A's data should be:
Table_A
--------
ID(p_key) Number(p_key) Column3 Column4
-----------------------------------------------------
ID1 1 AAA_1 BBB
ID1 2 CCC_1 DDD
ID2 1 EEE_1 FFF
ID2 2 GGG_1 HHH
How can I do this using a SQL script only?
I use MS SQL Server 2008 R2.
UPDATE TBLA
SET TBLA.Column3 = TBLB.Column3 --, TBLA.Column4 = TBLB.Column4 if you want
FROM
Table_A AS TBLA
-- INNER JOIN so that rows without a match in Table_B are left untouched
INNER JOIN Table_B AS TBLB ON (TBLB.ID = TBLA.ID AND TBLB.[Number] = TBLA.[Number])
Please note that the join columns (ID and Number, i.e. the primary key) must be unique in both tables, as primary keys are. To be sure, since I don't know your exact table structure, first run a SELECT with the same join, and only turn it into the UPDATE once the result set looks correct.
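The preview-then-update workflow can be tried out on the sample data with an in-memory SQLite database from Python. This is a sketch, not SQL Server: SQLite's syntax differs from T-SQL's UPDATE ... FROM, so the update is written with correlated subqueries, which is an assumption of this example rather than the statement above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table_A (ID TEXT, Number INTEGER, Column3 TEXT, Column4 TEXT,
                          PRIMARY KEY (ID, Number));
    CREATE TABLE Table_B (ID TEXT, Number INTEGER, Column3 TEXT, Column4 TEXT,
                          PRIMARY KEY (ID, Number));
    INSERT INTO Table_A VALUES
        ('ID1', 1, 'AAA', 'BBB'), ('ID1', 2, 'CCC', 'DDD'),
        ('ID2', 1, 'EEE', 'FFF'), ('ID2', 2, 'GGG', 'HHH');
    INSERT INTO Table_B VALUES
        ('ID1', 1, 'AAA_1', 'BBB_1'), ('ID1', 2, 'CCC_1', 'DDD_1'),
        ('ID2', 1, 'EEE_1', 'FFF_1'), ('ID2', 2, 'GGG_1', 'HHH_1');
""")

# Step 1: preview which rows would change, and to what.
preview = conn.execute("""
    SELECT a.ID, a.Number, a.Column3 AS old_value, b.Column3 AS new_value
    FROM Table_A a
    JOIN Table_B b ON b.ID = a.ID AND b.Number = a.Number
    ORDER BY a.ID, a.Number
""").fetchall()

# Step 2: apply the update. EXISTS leaves Table_A rows without a match untouched.
conn.execute("""
    UPDATE Table_A
    SET Column3 = (SELECT b.Column3 FROM Table_B b
                   WHERE b.ID = Table_A.ID AND b.Number = Table_A.Number)
    WHERE EXISTS (SELECT 1 FROM Table_B b
                  WHERE b.ID = Table_A.ID AND b.Number = Table_A.Number)
""")

after = conn.execute(
    "SELECT ID, Number, Column3, Column4 FROM Table_A ORDER BY ID, Number"
).fetchall()

for row in after:
    print(row)
```

Only Column3 changes; Column4 keeps Table_A's original values, which matches the expected result in the question.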