Sum of amounts of latest id by date PySpark

I have data like this in a DataFrame:
CommsId Id Amount Date
85 1 10 07/10/2020
72 1 15 09/09/2021
85 1 25 09/09/2021
70 1 30 09/09/2021
72 1 -15 05/11/2020
70 1 -30 05/11/2020
For each date, I want to find the sum of the amounts of the latest record of each CommsId as of that date.
Expected output is as below:
Date Sum_Amount Id
07/10/2020 10 1
09/09/2021 70 1
05/11/2021 25 1
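Here is a minimal plain-Python sketch of one reading of this requirement. It is not PySpark code. It assumes two things: dates are in dd/MM/yyyy format, and "latest CommsId as of the date" means the most recent Amount of every CommsId dated on or before that date, summed per Id. Under this reading the 05/11/2020 total comes out as -35 rather than the 25 shown above, so treat it as a starting point for pinning down the rule. In PySpark the same idea would usually be expressed with a join plus a window function, which is not shown here.

```python
from collections import defaultdict
from datetime import date, datetime

# Rows from the question: (CommsId, Id, Amount, Date).
# Assumption: Date is in dd/MM/yyyy format.
ROWS = [
    (85, 1, 10, "07/10/2020"),
    (72, 1, 15, "09/09/2021"),
    (85, 1, 25, "09/09/2021"),
    (70, 1, 30, "09/09/2021"),
    (72, 1, -15, "05/11/2020"),
    (70, 1, -30, "05/11/2020"),
]


def parse_date(text: str) -> date:
    return datetime.strptime(text, "%d/%m/%Y").date()


def latest_sum_by_date(rows):
    """Return {(Id, date): sum of the latest Amount of each CommsId dated on or before that date}."""
    by_id = defaultdict(list)
    for comms_id, id_, amount, text in rows:
        by_id[id_].append((parse_date(text), comms_id, amount))

    result = {}
    for id_, entries in by_id.items():
        latest = {}  # CommsId -> Amount of its most recent row seen so far
        for day, comms_id, amount in sorted(entries):  # chronological order
            latest[comms_id] = amount
            # Later rows on the same day overwrite this, so the last write holds the full total.
            result[(id_, day)] = sum(latest.values())
    return result


for (id_, day), total in sorted(latest_sum_by_date(ROWS).items()):
    print(day.strftime("%d/%m/%Y"), total, id_)
```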

Related

Trying to partition to remove rows where two columns don't match

How can I filter out rows within a group that do not have matching values in two columns?
I have a table A like this:
CODE US_ID US_PRICE NON_US_ID NON_US_PRICE
5109 57 10 75 10
0206 85 11 58 11
0206 85 15 33 14
0206 85 41 22 70
T100 20 10 49 NULL
T100 20 38 64 38
Within each CODE group, I want to check whether US_PRICE = NON_US_PRICE and, if so, remove that row from the resulting table.
I tried:
SELECT *,
CASE WHEN US_PRICE != NON_US_PRICE OVER (PARTITION BY CODE) END
FROM A;
but I think I am missing something when I try to partition by CODE.
I want the resulting table to look like this:
CODE US_ID US_PRICE NON_US_ID NON_US_PRICE
0206 85 15 33 14
0206 85 41 22 70
T100 20 10 49 NULL
For the provided sample, a simple WHERE clause produces this result, and no window function is needed:
SELECT *
FROM A
WHERE US_PRICE IS DISTINCT FROM NON_US_PRICE;
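Unlike the != operator, IS DISTINCT FROM handles NULLs. 10 != NULL evaluates to NULL, so the T100 row with a NULL NON_US_PRICE would be filtered out. 10 IS DISTINCT FROM NULL is true, so that row is kept.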
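Here is a minimal sketch that checks the NULL behaviour on the sample table, using Python's built-in sqlite3 module. SQLite is only a stand-in here. It spells the NULL-safe inequality `IS NOT`, although recent SQLite releases also accept `IS DISTINCT FROM`. The in-memory table reuses the question's column names and stores CODE as text so that the leading zero in 0206 is kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE A (CODE TEXT, US_ID INT, US_PRICE INT, NON_US_ID INT, NON_US_PRICE INT)"
)
conn.executemany(
    "INSERT INTO A VALUES (?, ?, ?, ?, ?)",
    [
        ("5109", 57, 10, 75, 10),
        ("0206", 85, 11, 58, 11),
        ("0206", 85, 15, 33, 14),
        ("0206", 85, 41, 22, 70),
        ("T100", 20, 10, 49, None),  # NULL NON_US_PRICE
        ("T100", 20, 38, 64, 38),
    ],
)

# NULL-safe comparison: 10 IS NOT NULL-value is true, so the T100 row is kept.
null_safe = conn.execute(
    "SELECT * FROM A WHERE US_PRICE IS NOT NON_US_PRICE ORDER BY rowid"
).fetchall()

# Plain != yields NULL when either side is NULL, so the T100 row is dropped.
plain = conn.execute(
    "SELECT * FROM A WHERE US_PRICE != NON_US_PRICE ORDER BY rowid"
).fetchall()

print(null_safe)
print(plain)
```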

TSQL Select TOP and Distinct from one table into a TEMP table

I have the following table:
Data nr1 nr2 nr3 nr4 nr5 nr6
2020-09-12 6 15 36 42 67 78
2020-09-10 46 48 67 78 80 87
2020-09-08 23 27 28 31 69 89
2020-09-05 7 14 27 56 72 83
2020-09-03 16 17 38 39 68 84
2020-09-01 10 22 28 45 48 71
2020-08-29 1 3 35 42 55 61
2020-08-27 37 49 52 53 75 87
2020-08-25 15 24 31 70 83 84
2020-08-22 7 12 45 47 73 87
2020-08-20 7 17 30 39 41 67
2020-08-18 13 22 28 58 65 77
2020-08-17 5 9 26 62 77 79
2020-08-13 4 5 49 57 66 75
2020-08-11 7 9 38 68 78 80
2020-08-08 6 16 22 55 58 83
2020-08-06 21 37 40 46 69 80
2020-08-04 5 19 21 25 45 82
2020-08-01 4 14 17 18 26 45
2020-07-30 4 15 19 26 28 55
2020-07-28 23 45 49 71 80 82
2020-07-25 18 30 42 70 78 80
2020-07-23 10 29 37 49 56 57
2020-07-21 4 34 46 54 55 62
2020-07-18 18 33 49 76 80 84
I have to do the following tasks:
1. Select all distinct numbers from the table above into a #TEMP table with a single column, DistinctNumbers. Some numbers are repeated across rows and columns.
2. Select into another #TEMP table all numbers in the range 1 to 99 that do not appear in the original table.
What is the best way of accomplishing these two tasks?
You should unpivot the original table first:
1. Unpivot the original table into a #temp table.
2. Now you have all numbers in one column.
3. Use a WHILE loop from 1 to 99 and insert the counter into a #RESULT table when it is not in #temp (the unpivoted table).
SELECT DISTINCT num AS DistinctNumbers
INTO #TEMP_DISTINCT_NUMBERS
FROM ORIGINAL_TABLE
UNPIVOT (
    num FOR PivotColumn IN (nr1, nr2, nr3, nr4, nr5, nr6)
) AS UNPIVOT_TABLE;

CREATE TABLE #RESULT (NUM INT);
DECLARE @COUNTER INT = 1;
WHILE (@COUNTER <= 99)
BEGIN
    -- keep the counter value only if it never appears in the original table
    INSERT INTO #RESULT
    SELECT @COUNTER
    WHERE @COUNTER NOT IN (SELECT DistinctNumbers FROM #TEMP_DISTINCT_NUMBERS);
    SET @COUNTER = @COUNTER + 1;
END

SELECT * FROM #RESULT;
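The three steps above can be mirrored in plain Python as a quick sanity check of the expected output. This is only a sketch. It loads just the first five sample rows, and a loop over `range(1, 100)` stands in for the T-SQL `WHILE` counter.

```python
# First five sample rows: (Data, nr1, nr2, nr3, nr4, nr5, nr6)
ROWS = [
    ("2020-09-12", 6, 15, 36, 42, 67, 78),
    ("2020-09-10", 46, 48, 67, 78, 80, 87),
    ("2020-09-08", 23, 27, 28, 31, 69, 89),
    ("2020-09-05", 7, 14, 27, 56, 72, 83),
    ("2020-09-03", 16, 17, 38, 39, 68, 84),
]

# Steps 1-2: unpivot the six nr columns into one column and keep the distinct values.
distinct_numbers = sorted({n for row in ROWS for n in row[1:]})

# Step 3: walk the counter from 1 to 99 and keep every value that never appears.
present = set(distinct_numbers)
result = [counter for counter in range(1, 100) if counter not in present]

print(distinct_numbers)
print(result)
```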
You can try this:
;WITH tally
AS (SELECT 1 AS num
UNION ALL
SELECT num + 1
FROM tally
WHERE num < 99)
SELECT DISTINCT tally.num
FROM tally
LEFT JOIN
( SELECT num FROM #dataset --your dataset
CROSS APPLY (VALUES (nr1),(nr2),(nr3),(nr4),(nr5),(nr6)) AS B (num)
) AS dataset
ON tally.num = dataset.num
WHERE dataset.num IS NULL
What the code above does:
1. Creates [tally], a recursive common table expression with the sequence 1 to 99.
2. Left joins tally with your unpivoted dataset ...
Test it here: https://rextester.com/YEB57637
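You can try the tally-table anti-join locally with Python's sqlite3 module. SQLite has no UNPIVOT or CROSS APPLY, so this sketch assumes a UNION ALL of the six nr columns serves as the unpivot. The table name `dataset` is hypothetical, and only the first five sample rows are loaded.

```python
import sqlite3

# First five sample rows: (Data, nr1, nr2, nr3, nr4, nr5, nr6)
ROWS = [
    ("2020-09-12", 6, 15, 36, 42, 67, 78),
    ("2020-09-10", 46, 48, 67, 78, 80, 87),
    ("2020-09-08", 23, 27, 28, 31, 69, 89),
    ("2020-09-05", 7, 14, 27, 56, 72, 83),
    ("2020-09-03", 16, 17, 38, 39, 68, 84),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dataset (Data TEXT, nr1 INT, nr2 INT, nr3 INT, nr4 INT, nr5 INT, nr6 INT)"
)
conn.executemany("INSERT INTO dataset VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)

# SQLite has no UNPIVOT / CROSS APPLY, so unpivot with one SELECT per column.
UNPIVOTED = " UNION ALL ".join(f"SELECT nr{i} AS num FROM dataset" for i in range(1, 7))

# Task 1: distinct numbers across all six columns.
distinct_numbers = [row[0] for row in conn.execute(
    f"SELECT DISTINCT num FROM ({UNPIVOTED}) AS u ORDER BY num"
)]

# Task 2: recursive tally 1..99, anti-joined against the unpivoted numbers.
missing = [row[0] for row in conn.execute(f"""
    WITH RECURSIVE tally(num) AS (
        SELECT 1
        UNION ALL
        SELECT num + 1 FROM tally WHERE num < 99
    )
    SELECT tally.num
    FROM tally
    LEFT JOIN ({UNPIVOTED}) AS d ON tally.num = d.num
    WHERE d.num IS NULL
    ORDER BY tally.num
""")]

print(distinct_numbers)
print(missing)
```

A tally number that matches a value appearing several times produces several joined rows, but the `WHERE d.num IS NULL` filter discards all of them. Each unmatched number therefore appears exactly once.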

postgres find age range with no of minutes of different user to watch channels

I have two tables with 1000 records, shown below.
My first table is the USER table:
ID Name DateOfBirth
1 John 1980-11-20 00:00:00.000
2 Denial 1940-04-10 00:00:00.000
3 Binney 1995-12-25 00:00:00.000
4 Sara 1960-11-20 00:00:00.000
5 Poma 1980-11-20 00:00:00.000
6 Cameroon 1980-11-20 00:00:00.000
.....
.....
And my second table is CHANNEL_WATCH_DURATION_BY_USER:
userid duration channelname
1 100 SAB
2 200 zee Tv
1 400 axn
2 0 star 1
3 800 star 2
3 700 star 3
4 200 star 4
.....
.....
I need to write a PostgreSQL query that displays, for each channel, the watch duration of each age group:
under 18 20-30 age 30-40 age channel
10 40 100 star 1
20 0 200 star 2
30 79 0 zee
40 80 30 axn
.....
.....
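SELECT
SUM(CASE WHEN age_years < 18 THEN duration ELSE 0 END) AS under_18,
SUM(CASE WHEN age_years >= 20 AND age_years < 30 THEN duration ELSE 0 END) AS age_20_to_30,
SUM(CASE WHEN age_years >= 30 AND age_years < 40 THEN duration ELSE 0 END) AS age_30_to_40,
channelname AS channel
FROM (
SELECT b.channelname, b.duration, date_part('year', age(a.DateOfBirth)) AS age_years
FROM "USER" a -- USER is a reserved word in PostgreSQL, so the table name is quoted
JOIN CHANNEL_WATCH_DURATION_BY_USER b ON a.ID = b.userid
) AS t
GROUP BY channelname;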
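Here is a plain-Python sketch of the intended aggregation: total duration per channel and age group. It can be used to check a query's output on the sample rows. It makes three assumptions. Ages are computed against a fixed, hypothetical `as_of` date instead of today. The groups are half-open, so ages 18-19 and 40+ are not reported. The bucket names are made up for illustration.

```python
from datetime import date

# Sample rows from the question (only the users that appear in the watch table).
USERS = {1: date(1980, 11, 20), 2: date(1940, 4, 10), 3: date(1995, 12, 25), 4: date(1960, 11, 20)}
WATCH = [(1, 100, "SAB"), (2, 200, "zee Tv"), (1, 400, "axn"), (2, 0, "star 1"),
         (3, 800, "star 2"), (3, 700, "star 3"), (4, 200, "star 4")]
BUCKETS = ("under_18", "age_20_to_30", "age_30_to_40")  # hypothetical names


def age_in_years(born: date, as_of: date) -> int:
    # Subtract one year if this year's birthday has not happened yet.
    return as_of.year - born.year - ((as_of.month, as_of.day) < (born.month, born.day))


def bucket(age: int):
    # Half-open ranges so that a 30-year-old lands in exactly one group (assumption).
    if age < 18:
        return "under_18"
    if 20 <= age < 30:
        return "age_20_to_30"
    if 30 <= age < 40:
        return "age_30_to_40"
    return None  # ages 18-19 and 40+ are not reported


def duration_by_channel(users, watch, as_of: date):
    totals = {}
    for user_id, duration, channel in watch:
        row = totals.setdefault(channel, dict.fromkeys(BUCKETS, 0))
        group = bucket(age_in_years(users[user_id], as_of))
        if group is not None:
            row[group] += duration
    return totals


for channel, row in duration_by_channel(USERS, WATCH, date(2010, 6, 1)).items():
    print(row["under_18"], row["age_20_to_30"], row["age_30_to_40"], channel)
```

With `as_of = date(2010, 6, 1)`, John (29) contributes to age_20_to_30 for SAB and axn. Binney (14) contributes to under_18 for star 2 and star 3.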

Flatten one column keeping others in POSTGRESQL

I created a view that currently returns data like this:
practice_name message_type message_count
CHC ALOG_SYNC 1
CHC BULKNT 0
CHC PIE_SYNC 1
CHC PPRV_SYNC 1
CHC SYNC_PRACT 3
CHC SYNC_PROV 9
CHC SYNC_WTXT 3
CHC SYNC_XYZ 0
Midtown ALOG_SYNC 0
Midtown BULKNT 0
Midtown PIE_SYNC 0
Midtown PPRV_SYNC 0
Midtown SYNC_PRACT 3
Midtown SYNC_PROV 0
Midtown SYNC_WTXT 3
Midtown SYNC_XYZ 0
NextGen MedicalPractice ALOG_SYNC 0
NextGen MedicalPractice BULKNT 1
NextGen MedicalPractice PIE_SYNC 0
NextGen MedicalPractice PPRV_SYNC 0
NextGen MedicalPractice SYNC_PRACT 3
NextGen MedicalPractice SYNC_PROV 591
NextGen MedicalPractice SYNC_WTXT 3
NextGen MedicalPractice SYNC_XYZ 0
My View:
CREATE OR REPLACE VIEW sha.sha_export_queue_view AS
SELECT q3.practice_name,
q3.message_type,
q3.share_site_org_key,
COALESCE(q2.message_count, '0'::text) AS message_count
FROM ( SELECT q1.practice_name,
mt.message_type,
q1.share_site_org_key
FROM sha.message_types mt,
( SELECT DISTINCT jsonb_array_elements((ai.result_json -> 'Patient Portal Operational Information'::text) -> 'nxmd_export contents by message type'::text) ->> 'Practice Name'::text AS practice_name,
ai.share_site_org_key
FROM sha.sha_share_site_view ssv
LEFT JOIN ( SELECT mytable2.assessment_id,
mytable2.result_json,
mytable2.share_site_org_key,
mytable2.rnk
FROM ( SELECT assessment_info.assessment_id,
assessment_info.result_json,
assessment_info.share_site_org_key,
dense_rank() OVER (PARTITION BY assessment_info.share_site_org_key ORDER BY assessment_info.modified_datetime DESC) AS rnk
FROM sha.assessment_info
WHERE assessment_info.assessment_id = 8::numeric) mytable2
WHERE mytable2.rnk = 1) ai ON ssv.share_site_org_key = ai.share_site_org_key) q1) q3
LEFT JOIN ( SELECT jsonb_array_elements((ai.result_json -> 'Patient Portal Operational Information'::text) -> 'nxmd_export contents by message type'::text) ->> 'Practice Name'::text AS practice_name,
jsonb_array_elements((ai.result_json -> 'Patient Portal Operational Information'::text) -> 'nxmd_export contents by message type'::text) ->> 'Message Type'::text AS message_type,
jsonb_array_elements((ai.result_json -> 'Patient Portal Operational Information'::text) -> 'nxmd_export contents by message type'::text) ->> 'Message Count'::text AS message_count
FROM sha.sha_share_site_view ssv
LEFT JOIN ( SELECT mytable2.assessment_id,
mytable2.result_json,
mytable2.share_site_org_key,
mytable2.rnk
FROM ( SELECT assessment_info.assessment_id,
assessment_info.result_json,
assessment_info.share_site_org_key,
dense_rank() OVER (PARTITION BY assessment_info.share_site_org_key ORDER BY assessment_info.modified_datetime DESC) AS rnk
FROM sha.assessment_info
WHERE assessment_info.assessment_id = 8::numeric) mytable2
WHERE mytable2.rnk = 1) ai ON ssv.share_site_org_key = ai.share_site_org_key) q2 ON q3.message_type::text = q2.message_type AND q3.practice_name = q2.practice_name
ORDER BY q3.practice_name;
I want the second column (message_type) to be flattened into one column per message type:
Practice Time Stamp <<message type 1>> <<message type 2>> <<message type 3>> <<message type 4 >> <<message type 5>> <<message type 6>> <<message type 7>> <<message type 8>>
Practice Name 1 21-12-2016 10:00 23 25 27 29 31 33 35 37
Practice Name 2 21-12-2016 10:00 24 26 28 30 32 34 36 38
Practice Name 3 21-12-2016 13:00 25 27 29 31 33 35 37 39
Practice Name 4 21-12-2016 13:00 26 28 30 32 34 36 38 40
Practice Name 5 24-12-2016 13:00 27 29 31 33 35 37 39 41
Practice Name 6 27-12-2016 13:00 28 30 32 34 36 38 40 42
Practice Name 7 30-12-2016 13:00 29 31 33 35 37 39 41 43
Practice Name 8 02-01-2017 13:00 30 32 34 36 38 40 42 44
Practice Name 1 05-01-2017 13:00 31 33 35 37 39 41 43 45
Practice Name 2 08-01-2017 13:00 32 34 36 38 40 42 44 46
Practice Name 3 11-01-2017 13:00 33 35 37 39 41 43 45 47
Is there any way I can achieve that?
Sorry for the little alignment issue.
The values are the counts for the corresponding message types.
Sample query (based on the idea in the comments on the question):
SELECT
practice_name,
-- message_count is text in the view, so cast it before summing
sum(CASE WHEN message_type = 'ALOG_SYNC' THEN message_count::int ELSE 0 END) AS "ALOG_SYNC",
sum(CASE WHEN message_type = 'BULKNT' THEN message_count::int ELSE 0 END) AS "BULKNT",
...
FROM sha.sha_export_queue_view -- or <your from + where clause>
GROUP BY 1
Your query could probably also be optimized.
Alternatively, you can use the crosstab function from the tablefunc extension (https://www.postgresql.org/docs/current/static/tablefunc.html)
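You can try the conditional-aggregation pattern on the sample rows with Python's sqlite3 module. In this sketch the table `export_queue` is a hypothetical stand-in for the view's output, and message_count is stored as text, as in the view. The column list is generated from the distinct message types rather than typed by hand.

```python
import sqlite3

# Sample rows from the question: (practice_name, message_type, message_count)
ROWS = [
    ("CHC", "ALOG_SYNC", "1"), ("CHC", "BULKNT", "0"), ("CHC", "PIE_SYNC", "1"),
    ("CHC", "PPRV_SYNC", "1"), ("CHC", "SYNC_PRACT", "3"), ("CHC", "SYNC_PROV", "9"),
    ("CHC", "SYNC_WTXT", "3"), ("CHC", "SYNC_XYZ", "0"),
    ("Midtown", "ALOG_SYNC", "0"), ("Midtown", "BULKNT", "0"), ("Midtown", "PIE_SYNC", "0"),
    ("Midtown", "PPRV_SYNC", "0"), ("Midtown", "SYNC_PRACT", "3"), ("Midtown", "SYNC_PROV", "0"),
    ("Midtown", "SYNC_WTXT", "3"), ("Midtown", "SYNC_XYZ", "0"),
    ("NextGen MedicalPractice", "ALOG_SYNC", "0"), ("NextGen MedicalPractice", "BULKNT", "1"),
    ("NextGen MedicalPractice", "PIE_SYNC", "0"), ("NextGen MedicalPractice", "PPRV_SYNC", "0"),
    ("NextGen MedicalPractice", "SYNC_PRACT", "3"), ("NextGen MedicalPractice", "SYNC_PROV", "591"),
    ("NextGen MedicalPractice", "SYNC_WTXT", "3"), ("NextGen MedicalPractice", "SYNC_XYZ", "0"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE export_queue (practice_name TEXT, message_type TEXT, message_count TEXT)")
conn.executemany("INSERT INTO export_queue VALUES (?, ?, ?)", ROWS)

message_types = [row[0] for row in conn.execute(
    "SELECT DISTINCT message_type FROM export_queue ORDER BY message_type"
)]

# One SUM(CASE ...) column per message type. The names come from the table itself,
# so interpolating them is acceptable for this sketch; untrusted input would need validation.
columns = ",\n    ".join(
    f"SUM(CASE WHEN message_type = '{mt}' "
    f"THEN CAST(message_count AS INTEGER) ELSE 0 END) AS \"{mt}\""
    for mt in message_types
)
pivoted = conn.execute(
    f"SELECT practice_name,\n    {columns}\n"
    "FROM export_queue GROUP BY practice_name ORDER BY practice_name"
).fetchall()

print(["practice_name", *message_types])
for row in pivoted:
    print(row)
```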

Subselect and Max

Alright, I've been trying to conceptualize this for the better part of the afternoon and still cannot figure out how to structure this subselect.
The data that I need to report are ages for a given student major, grouped by the past 3 fiscal years. Each fiscal year has 3 semesters (summer, fall, spring). I need my query grouped on the FiscalYear and AgeRange fields, counting the distinct student IDs.
I currently have this for my SQL statement:
Select COUNT(distinct StuID), AgeRange, FiscalYear
from tblStatic
where Campus like 'World%' and (enrl_act like 'REG%' or enrl_act like 'SCH%')
and StuMaj = 'LAWSC' and FiscalYear IN ('09/10', '10/11', '11/12')
group by FiscalYear, AgeRange
order by FiscalYear, AgeRange
So this is all fine and dandy, except that it doesn't match my headcount of students for the fiscal year. The reason is that people may cross over into another age range during the fiscal year, which adds them to my count twice.
How can I use a subselect to resolve these duplicate entries? I have been trying to use my semester field, taking the MAX to find the latest semester in a fiscal year for a given student.
Data Sample:
Count AgeRange FiscalYear
3 1 to 19 09/10
20 20 to 23 09/10
60 24 to 29 09/10
96 30 to 39 09/10
34 40 to 49 09/10
14 50 to 59 09/10
3 60+ 09/10
2 1 to 19 10/11
24 20 to 23 10/11
73 24 to 29 10/11
109 30 to 39 10/11
43 40 to 49 10/11
11 50 to 59 10/11
2 60+ 10/11
1 1 to 19 11/12
17 20 to 23 11/12
75 24 to 29 11/12
123 30 to 39 11/12
44 40 to 49 11/12
14 50 to 59 11/12
2 60+ 11/12
Solution (just got this working; it produces headcounts that match what they are supposed to be):
Select COUNT(distinct S.StuID), AR.AgeRange, S.FiscalYear
from tblStatic S
INNER JOIN
( Select S.StuID, MIN(AgeRange) as AgeRange
From tblStatic S
Group By S.StuID) AR on S.StuID=AR.StuID
where Campus like 'World%' and (enrl_act like 'REG%' or
enrl_act like 'SCH%')
and StuMaj = 'LAWSC' and FiscalYear IN ('09/10', '10/11', '11/12')
group by S.FiscalYear, AR.AgeRange
order by S.FiscalYear, AR.AgeRange
Replace each student's age range with the maximum (or minimum, if you like) age range for that fiscal year, then count them:
;WITH sourceData AS (
    SELECT
        StuID,
        -- MAX on the text labels works here because labels such as '1 to 19', '20 to 23', ..., '60+' sort in age order
        MaxAgeRangeThisFiscalYear = MAX(AgeRange) OVER (PARTITION BY StuID, FiscalYear),
        FiscalYear
    FROM tblStatic
    WHERE Campus LIKE 'World%'
        AND (enrl_act LIKE 'REG%' OR enrl_act LIKE 'SCH%')
        AND StuMaj = 'LAWSC'
        AND FiscalYear IN ('09/10', '10/11', '11/12')
)
SELECT
    FiscalYear,
    AgeRange = MaxAgeRangeThisFiscalYear,
    [Count] = COUNT(DISTINCT StuID)
FROM sourceData
GROUP BY
    FiscalYear,
    MaxAgeRangeThisFiscalYear
ORDER BY
    FiscalYear,
    MaxAgeRangeThisFiscalYear;
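Here is a small Python sketch of the difference between the original count and the max-per-fiscal-year approach. The enrolment rows and student IDs are hypothetical. AGE_RANGES gives an explicit order so the code does not rely on string comparison.

```python
from collections import defaultdict

AGE_RANGES = ["1 to 19", "20 to 23", "24 to 29", "30 to 39", "40 to 49", "50 to 59", "60+"]
RANK = {label: i for i, label in enumerate(AGE_RANGES)}

# Hypothetical enrolment rows: (StuID, FiscalYear, Semester, AgeRange)
ROWS = [
    (101, "09/10", "Summer", "20 to 23"),
    (101, "09/10", "Spring", "24 to 29"),  # student 101 crosses into a new age range
    (102, "09/10", "Fall", "24 to 29"),
    (103, "10/11", "Fall", "30 to 39"),
]


def naive_counts(rows):
    """COUNT(DISTINCT StuID) grouped by (FiscalYear, AgeRange): a crossover student is counted twice."""
    students = defaultdict(set)
    for stu, fy, _semester, age_range in rows:
        students[(fy, age_range)].add(stu)
    return {key: len(ids) for key, ids in students.items()}


def headcounts(rows):
    """Keep each student's highest age range per fiscal year, then count students per range."""
    top = {}
    for stu, fy, _semester, age_range in rows:
        key = (stu, fy)
        if key not in top or RANK[age_range] > RANK[top[key]]:
            top[key] = age_range
    counts = defaultdict(int)
    for (_stu, fy), age_range in top.items():
        counts[(fy, age_range)] += 1
    return dict(counts)


print(naive_counts(ROWS))
print(headcounts(ROWS))
```

With these rows, the naive grouping reports three entries for 09/10 although only two students enrolled. The max-per-year version reports two.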