Collapsing data in SAS with PROC SQL - group-by

I've been trying unsuccessfully for some time now to collapse a data set using PROC SQL with GROUP BY and was wondering if I could get some help. Here is an example of what I am trying to do. Suppose we have the following data:
id year parent_id age
"01" 1990 "23" 17
"01" 1991 "23" 18
"01" 1992 "23" 19
"02" 1978 "18" 24
"02" 1979 "18" 25
that we want to collapse by id, keeping the row with the minimum age across years, to get the following dataset:
id year parent_id age
"01" 1990 "23" 17
"02" 1978 "18" 24
I tried something along the lines of
proc sql;
CREATE TABLE output_tablename as
SELECT DISTINCT id, year, parent_id, min(age) as age
FROM input_tablename
GROUP BY id;
quit;
to no avail.

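You can use the HAVING clause to pick only the records where age = min(age). PROC SQL remerges the per-id minimum back onto every row, so HAVING keeps the youngest row for each id (if two rows tie on the minimum age, both are kept):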
proc sql;
create table want as
select * from have
group by ID
having age=min(age);
quit;
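SQLite does not remerge summary statistics the way PROC SQL does, so a minimal sketch of the same idea in Python's sqlite3 uses the portable form: compute the minimum age per id in a subquery and join it back. The table name `have` and the rows are taken from the question; everything else is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE have (id TEXT, year INTEGER, parent_id TEXT, age INTEGER)")
conn.executemany("INSERT INTO have VALUES (?, ?, ?, ?)", [
    ("01", 1990, "23", 17), ("01", 1991, "23", 18), ("01", 1992, "23", 19),
    ("02", 1978, "18", 24), ("02", 1979, "18", 25),
])

# Portable equivalent of "group by id having age = min(age)":
# find the minimum age per id, then keep the rows that match it.
want = conn.execute("""
    SELECT h.id, h.year, h.parent_id, h.age
    FROM have AS h
    JOIN (SELECT id, MIN(age) AS min_age FROM have GROUP BY id) AS m
      ON h.id = m.id AND h.age = m.min_age
    ORDER BY h.id
""").fetchall()

print(want)  # [('01', 1990, '23', 17), ('02', 1978, '18', 24)]
```

Like the HAVING version, this returns every tied row when two rows share the minimum age.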
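PROC SORT option: sort by ascending age within id so the youngest row comes first, then NODUPKEY keeps only the first row for each id.
proc sort data=have; by id age;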
run;
proc sort data=have nodupkey out=want;
by id;
run;
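The two PROC SORT steps amount to "sort, then keep the first row per key". A minimal Python sketch of that pattern on the question's rows (the variable names are illustrative, not SAS):

```python
from itertools import groupby
from operator import itemgetter

# Rows from the question: (id, year, parent_id, age).
have = [
    ("01", 1990, "23", 17), ("01", 1991, "23", 18), ("01", 1992, "23", 19),
    ("02", 1978, "18", 24), ("02", 1979, "18", 25),
]

# Step 1, like "proc sort; by id age;": youngest row first within each id.
# Python's sort is stable, so tied rows keep their original order.
sorted_rows = sorted(have, key=lambda r: (r[0], r[3]))

# Step 2, like "proc sort nodupkey; by id;": keep only the first row per id.
want = [next(group) for _, group in groupby(sorted_rows, key=itemgetter(0))]

print(want)  # [('01', 1990, '23', 17), ('02', 1978, '18', 24)]
```

Unlike the HAVING approach, this keeps exactly one row per id even when ages tie.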

Related

Mixing DISTINCT with GROUP BY in Postgres

I am trying to get a list of all months in a specified year that
have at least 2 unique rows based on their date,
while ignoring specific column values.
Where I got to is:
SELECT DATE_PART('month', "orderDate") AS month, count(*)
FROM public."Orders"
WHERE "companyId" = 00001 AND "orderNumber" != 1 and DATE_PART('year', ("orderDate")) = '2020' AND "orderNumber" IS NOT NULL
GROUP BY month
HAVING COUNT ("orderDate") > 2
The HAVING COUNT clause sort of works in place of DISTINCT, insofar as I can be reasonably sure that it filters the data as required.
However, being able to use DISTINCT based on a given date within a month would return a more reliable result. Is this possible with Postgres?
Sample data from the "orderDate" column of the table:
"2018-12-17 20:32:00+00"
"2019-02-26 14:38:00+00"
"2020-07-26 10:19:00+00"
"2020-10-13 19:15:00+00"
"2020-10-26 16:42:00+00"
"2020-10-26 19:41:00+00"
"2020-11-19 20:21:00+00"
"2020-11-19 21:22:00+00"
"2020-11-23 21:10:00+00"
"2021-01-02 12:51:00+00"
Without the HAVING COUNT clause, this produces:
month count
7 1
10 2
11 3
Month 7 can be discarded easily, as it has only 1 record.
Month 10 is the issue: it has two records, but from the data above, those records are from the same day. Similarly, month 11 only has 2 distinct days.
The output should therefore ideally be:
month count
11 2
We have only two distinct dates in the 2020 data, and they are from month 11 (November).
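I think you just want to take the distinct count of dates for each month. Casting the timestamp to date makes several orders on the same day count once, and ">= 2" keeps the months with at least two distinct days. Note that the camelCase identifiers must stay double-quoted, as in your query, or Postgres folds them to lower case:
SELECT
DATE_PART('month', "orderDate") AS month,
COUNT(DISTINCT "orderDate"::date) AS count
FROM public."Orders"
WHERE
"companyId" = 1 AND
"orderNumber" != 1 AND
DATE_PART('year', "orderDate") = 2020
GROUP BY
DATE_PART('month', "orderDate")
HAVING
COUNT(DISTINCT "orderDate"::date) >= 2;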
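A minimal sketch of the distinct-days-per-month idea, using Python's sqlite3 as a stand-in for Postgres: SQLite's date() plays the role of ::date and strftime() the role of DATE_PART. Assumptions: the "+00" offsets are dropped from the sample values (treating them all as UTC), and the question's companyId/orderNumber filters are not reproduced because those columns' values aren't shown. With only the timestamps, October also has two distinct days (the 13th and the 26th) and so qualifies here.

```python
import sqlite3

# Sample "orderDate" values from the question, "+00" offset removed (assumed UTC).
SAMPLE = [
    "2018-12-17 20:32:00", "2019-02-26 14:38:00", "2020-07-26 10:19:00",
    "2020-10-13 19:15:00", "2020-10-26 16:42:00", "2020-10-26 19:41:00",
    "2020-11-19 20:21:00", "2020-11-19 21:22:00", "2020-11-23 21:10:00",
    "2021-01-02 12:51:00",
]

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "Orders" ("orderDate" TEXT)')
conn.executemany('INSERT INTO "Orders" VALUES (?)', [(ts,) for ts in SAMPLE])

# Raw row count vs. distinct calendar days, per month of 2020.
per_month = conn.execute("""
    SELECT CAST(strftime('%m', "orderDate") AS INTEGER) AS month,
           COUNT(*) AS n_orders,
           COUNT(DISTINCT date("orderDate")) AS n_days
    FROM "Orders"
    WHERE strftime('%Y', "orderDate") = '2020'
    GROUP BY month
    ORDER BY month
""").fetchall()

# Keep only the months with at least two distinct days.
qualifying = [row[0] for row in conn.execute("""
    SELECT CAST(strftime('%m', "orderDate") AS INTEGER) AS month
    FROM "Orders"
    WHERE strftime('%Y', "orderDate") = '2020'
    GROUP BY month
    HAVING COUNT(DISTINCT date("orderDate")) >= 2
    ORDER BY month
""")]

print(per_month)   # [(7, 1, 1), (10, 3, 2), (11, 3, 2)]
print(qualifying)  # [10, 11]
```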

I need to find the number of users that were invoiced for an amount greater than 0 in the previous month and were not invoiced in the current month

I need to find the number of users that were invoiced for an amount greater than 0 in the previous month and were not invoiced in the current month. This calculation is to be done for 12 months in a single query. The output should be as below.
Month Count
01/07/2019 50
01/08/2019 34
01/09/2019 23
01/10/2019 98
01/11/2019 10
01/12/2019 5
01/01/2020 32
01/02/2020 65
01/03/2020 23
01/04/2020 12
01/05/2020 64
01/06/2020 54
01/07/2020 78
I am able to get the value only for one month. I want to get it for all months in a single query.
This is my current query:
SELECT COUNT(DISTINCT TWO_MONTHS_AGO.USER_ID), TWO_MONTHS_AGO.MONTH AS INVOICE_MONTH
FROM (
SELECT USER_ID, LAST_DAY(invoice_ct_dt) AS MONTH
FROM table_a
WHERE invoice_amt > 0
AND LAST_DAY(invoice_ct_dt) = ADD_MONTHS(LAST_DAY(CURRENT_DATE - 1), - 2)
GROUP BY user_id, LAST_DAY(invoice_ct_dt)
) AS TWO_MONTHS_AGO
LEFT JOIN (
SELECT user_id, LAST_DAY(invoice_ct_dt) AS MONTH
FROM table_a
WHERE LAST_DAY(invoice_ct_dt) = ADD_MONTHS(LAST_DAY(CURRENT_DATE - 1), - 1)
GROUP BY user_id, LAST_DAY(invoice_ct_dt)
) AS ONE_MONTH_AGO ON TWO_MONTHS_AGO.USER_ID = ONE_MONTH_AGO.USER_ID
WHERE ONE_MONTH_AGO.USER_ID IS NULL
GROUP BY TWO_MONTHS_AGO.MONTH;
Thank you in advance.
Lona
There are probably lots of different approaches, but the way I would do it is as follows:
1. Summarise data by user and month for the last 13 months (you need the 12 months plus the month before the first of them).
2. Compare "this" month (that has data) to the "next" month and select the records where there is no "next" month data.
3. Summarise this dataset by month, counting distinct user IDs.
For example, assuming a table created as follows:
create table INVOICE_DATA (
USERID varchar(4),
INVOICE_DT date,
INVOICE_AMT NUMBER(10,2)
);
The following query should give you what you want. You may need to adjust it depending on whether your calculation includes this month or only runs up to the end of last month:
--Summarise data by user and month
WITH MONTH_SUMMARY AS
(
SELECT USERID
,TO_CHAR(INVOICE_DT,'YYYY-MM') "INVOICE_MONTH"
,TO_CHAR(ADD_MONTHS(INVOICE_DT,1),'YYYY-MM') "NEXT_MONTH"
,SUM(INVOICE_AMT) "MONTHLY_TOTAL"
FROM INVOICE_DATA
WHERE INVOICE_DT >= TRUNC(ADD_MONTHS(current_date(),-13),'MONTH') -- Last 13 months of data
GROUP BY 1,2,3
),
--Get data for users with invoices in this month but not the next month
USER_DATA AS
(
SELECT USERID, INVOICE_MONTH, MONTHLY_TOTAL
FROM MONTH_SUMMARY MS_THIS
WHERE NOT EXISTS
(
SELECT USERID
FROM MONTH_SUMMARY MS_NEXT
WHERE
MS_THIS.USERID = MS_NEXT.USERID AND
MS_THIS.NEXT_MONTH = MS_NEXT.INVOICE_MONTH
)
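AND MS_THIS.MONTHLY_TOTAL > 0 -- Only users invoiced for an amount greater than 0 in this month
AND MS_THIS.INVOICE_MONTH < TO_CHAR(current_date(),'YYYY-MM') -- Don't include the current month, as there is no next month to compare it to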
)
SELECT INVOICE_MONTH, COUNT(DISTINCT USERID) "USER_COUNT"
FROM USER_DATA
GROUP BY INVOICE_MONTH
ORDER BY INVOICE_MONTH
;
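The month-summary plus NOT EXISTS approach can be tried out locally with Python's sqlite3. This is a minimal sketch under several assumptions: the invoice rows are hypothetical, SQLite's strftime()/date() stand in for TO_CHAR/ADD_MONTHS, and a fixed `:current_month` parameter replaces current_date() so the result is reproducible. The next month is computed from the start of the month, because SQLite normalises overflowing dates (adding one month to 2020-01-31 gives 2020-03-02).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoice_data (userid TEXT, invoice_dt TEXT, invoice_amt REAL)")
# Hypothetical invoices, chosen to exercise each rule.
conn.executemany("INSERT INTO invoice_data VALUES (?, ?, ?)", [
    ("u1", "2020-01-15", 10.0), ("u1", "2020-02-03", 5.0), ("u1", "2020-03-09", 3.0),
    ("u2", "2020-01-20", 7.5),   # invoiced in Jan only
    ("u3", "2020-01-11", 0.0),   # zero amount: never counted
    ("u4", "2020-02-14", 12.0),  # invoiced in Feb only
    ("u5", "2020-01-31", 4.0), ("u5", "2020-02-27", 2.0),
])

QUERY = """
WITH month_summary AS (
    -- one row per user and month, plus the month that follows it
    SELECT userid,
           strftime('%Y-%m', invoice_dt) AS invoice_month,
           strftime('%Y-%m', date(invoice_dt, 'start of month', '+1 month')) AS next_month,
           SUM(invoice_amt) AS monthly_total
    FROM invoice_data
    GROUP BY userid, invoice_month, next_month
),
user_data AS (
    -- invoiced > 0 in a month, with no invoice at all in the next month
    SELECT ms_this.userid, ms_this.invoice_month
    FROM month_summary AS ms_this
    WHERE ms_this.monthly_total > 0
      AND ms_this.invoice_month < :current_month
      AND NOT EXISTS (
          SELECT 1
          FROM month_summary AS ms_next
          WHERE ms_next.userid = ms_this.userid
            AND ms_next.invoice_month = ms_this.next_month
      )
)
SELECT invoice_month, COUNT(DISTINCT userid) AS user_count
FROM user_data
GROUP BY invoice_month
ORDER BY invoice_month
"""

result = conn.execute(QUERY, {"current_month": "2020-03"}).fetchall()
print(result)  # [('2020-01', 1), ('2020-02', 2)]
```

January counts only u2 (u1 and u5 were invoiced again in February, u3's total is 0); February counts u4 and u5; March is the "current" month and is excluded.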

Find whether a date range contains any date from a list of dates?

I am trying to find whether any of the dates in public_holidays fall between start_date and end_date.
It would look something like this:
select
id, name, start_date, end_date,
case when (public_holidays = true) then number - 1 else number end as find_real_number
from my_table
Sample Data:
ID Name Start_date End_Date Numbers
1 Mike 3/9/2020 4/9/2020 67
2 Rick 3/1/2020 3/6/2020 34
3 Simm 3/24/2020 3/28/2020 98
4 Lisa 3/27/2020 4/5/2020 103
5 Rosy 3/9/2020 4/9/2020 23
And some sample expected results:
ID Name Start_date End_Date Numbers
1 Mike 3/9/2020 4/9/2020 66
2 Rick 3/1/2020 3/6/2020 34
3 Simm 3/24/2020 3/28/2020 98
4 Lisa 3/27/2020 4/5/2020 102
5 Rosy 3/9/2020 4/9/2020 23
Because we assume 1 April is a public holiday, the numbers in rows 1 and 4 are reduced by 1.
And here is the sample public holidays view I created:
Public_holidays Dates
April fools 04/01/2020
Labour Day 05/01/2020
Random Day 07/24/2020
However, because I am building the query in Metabase, I am not allowed to create a table. All I did was create a view with 2 columns, 'Public Holidays' and 'Dates'.
Could anyone give me a suggestion on how to do this? Thanks.
Try something like this:
SELECT id, name, start_date, end_date,
numbers - ( SELECT COUNT(*) FROM holidays
WHERE dates BETWEEN t.start_date AND t.end_date ) AS numbers
FROM my_table AS t
This assumes that your holidays are in a table/view named holidays. It counts the holidays between the start and end dates (BETWEEN is inclusive of both ends) and subtracts that count from numbers in my_table.
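To check the correlated-subquery approach against the sample data, here is a minimal sketch in Python's sqlite3. Assumptions: the dates are rewritten as ISO YYYY-MM-DD text so that BETWEEN compares them correctly as strings, and the holidays view is modelled as an ordinary table named holidays.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, name TEXT, start_date TEXT, end_date TEXT, numbers INTEGER)")
conn.execute("CREATE TABLE holidays (public_holidays TEXT, dates TEXT)")

# Sample rows from the question, dates converted to ISO format.
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?, ?)", [
    (1, "Mike", "2020-03-09", "2020-04-09", 67),
    (2, "Rick", "2020-03-01", "2020-03-06", 34),
    (3, "Simm", "2020-03-24", "2020-03-28", 98),
    (4, "Lisa", "2020-03-27", "2020-04-05", 103),
    (5, "Rosy", "2020-03-09", "2020-04-09", 23),
])
conn.executemany("INSERT INTO holidays VALUES (?, ?)", [
    ("April fools", "2020-04-01"), ("Labour Day", "2020-05-01"), ("Random Day", "2020-07-24"),
])

# Subtract the number of holidays that fall inside each row's date range.
rows = conn.execute("""
    SELECT t.id, t.name,
           t.numbers - (SELECT COUNT(*) FROM holidays AS h
                        WHERE h.dates BETWEEN t.start_date AND t.end_date) AS numbers
    FROM my_table AS t
    ORDER BY t.id
""").fetchall()

print(rows)  # [(1, 'Mike', 66), (2, 'Rick', 34), (3, 'Simm', 98), (4, 'Lisa', 102), (5, 'Rosy', 22)]
```

Row 5 (Rosy) has the same date range as row 1, so the query also subtracts 1 there; the question's expected value of 23 for that row appears to be a slip.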
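I think you want to check whether a public holiday falls between the start and end dates, so you should compare dates as below (this assumes a public_holiday_date value is available on each row, for example by joining my_table to the holidays view):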
select
id, name, start_date, end_date,
case when (CAST(start_date as date) <= CAST(public_holiday_date as date)
and CAST(public_holiday_date as date) <= CAST(end_date as date))
then number - 1 else number end as find_real_number
from my_table
or
select
id, name, start_date, end_date,
case when CAST(public_holiday_date as date)
between CAST(start_date as date) and CAST(end_date as date)
then number - 1 else number end as find_real_number
from my_table

SAS: Separate date_from & date_to into separate lines

I've got an example like this:
data date_table;
stop;
length id $32.;
length name $32.;
length date_from date_to 8.;
format date_from date_to datetime19.;
run;
proc sql;
insert into date_table
values ('1', 'Mark', '13Jun2019 08:39:00'dt, '13Jun2019 11:39:00'dt)
values ('2', 'Bart', '13Jun2019 13:39:00'dt, '13Jun2019 17:39:00'dt);
quit;
I need some smart join (maybe with a separate hour mapping table) to split each date_from/date_to interval into separate lines, one per hour.
What I have tried so far is a mapping table and a join like:
proc sql;
create table testing as
select t1.id,
t1.name,
t1.date_from,
t1.date_to
from DATE_TABLE t1 inner join
WORK.CAL_TIME t2 on t1.date_from >= t2.Time and
t1.date_to <= t2.Time;
quit;
But of course the result is an empty table, because the dates don't join. I could truncate date_from and date_to to full hours, but such a join still doesn't work.
Help.
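Looks like you are comparing apples (DATETIME) with oranges (TIME). A SAS datetime value counts seconds since midnight on 1 January 1960, while a time value counts seconds since midnight, so the orders of magnitude of those numbers are totally different: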
data _null_;
dt = '13Jun2019 08:39:00'dt ;
tm = '08:00't ;
put (dt tm) (=comma20.);
run;
The log shows:
dt=1,876,034,340 tm=28,800
You probably just want to compare the time of day part of your datetime values to your time values. Also round your start times down and your end times up to the hour.
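To make the two scales concrete, here is a minimal Python sketch that assumes SAS's conventions (datetime = seconds since 1960-01-01 00:00:00, time = seconds since midnight). The helpers sas_datetime, timepart, hour_floor and hour_ceil are hypothetical stand-ins for the SAS functions and rounding described above, not SAS APIs.

```python
from datetime import datetime

SAS_EPOCH = datetime(1960, 1, 1)  # SAS datetime values count seconds from here
SECONDS_PER_DAY = 86_400
SECONDS_PER_HOUR = 3_600

def sas_datetime(dt):
    """Seconds since 1960-01-01 00:00:00, i.e. the numeric SAS datetime value."""
    return int((dt - SAS_EPOCH).total_seconds())

def timepart(sas_dt):
    """Time-of-day part of a SAS datetime value, in seconds since midnight."""
    return sas_dt % SECONDS_PER_DAY

def hour_floor(t):
    """Round a time value down to the full hour."""
    return t - t % SECONDS_PER_HOUR

def hour_ceil(t):
    """Round a time value up to the full hour."""
    return -(-t // SECONDS_PER_HOUR) * SECONDS_PER_HOUR

dt = sas_datetime(datetime(2019, 6, 13, 8, 39))
tm = 8 * SECONDS_PER_HOUR  # '08:00't

print(dt, tm)                    # 1876034340 28800
print(timepart(dt))              # 31140 (08:39:00)
print(hour_floor(timepart(dt)))  # 28800 (08:00:00)
print(hour_ceil(timepart(sas_datetime(datetime(2019, 6, 13, 11, 39)))))  # 43200 (12:00:00)
```

Once the time-of-day part is extracted, both sides of the comparison are on the same scale as the CAL_TIME values.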
data date_table;
length id name $32 date_from date_to 8;
format date_from date_to datetime19.;
input id name (date:) (:datetime.);
cards;
1 Mark 13Jun2019:08:39:00 13Jun2019:11:39:00
2 Bart 13Jun2019:13:39:00 13Jun2019:17:39:00
;
data cal_time;
do time='08:00't to '21:00't by '01:00't ;
output;
end;
format time time5.;
run;
proc sql;
create table testing as
select t1.id
, t1.name
, max(t1.date_from,dhms(datepart(t1.date_from),0,0,t2.time))
as datetime_from format=datetime19.
, min(t1.date_to,dhms(datepart(t1.date_to),0,0,t2.time+'01:00't))
as datetime_to format=datetime19.
, t2.time
from DATE_TABLE t1
inner join WORK.CAL_TIME t2
on t2.time between intnx('hour',timepart(t1.date_from),0,'b')
and intnx('hour',timepart(t1.date_to),0,'e')
;
quit;
Result
Obs id name datetime_from datetime_to time
1 1 Mark 13JUN2019:08:39:00 13JUN2019:09:00:00 8:00
2 1 Mark 13JUN2019:09:00:00 13JUN2019:10:00:00 9:00
3 1 Mark 13JUN2019:10:00:00 13JUN2019:11:00:00 10:00
4 1 Mark 13JUN2019:11:00:00 13JUN2019:11:39:00 11:00
5 2 Bart 13JUN2019:13:39:00 13JUN2019:14:00:00 13:00
6 2 Bart 13JUN2019:14:00:00 13JUN2019:15:00:00 14:00
7 2 Bart 13JUN2019:15:00:00 13JUN2019:16:00:00 15:00
8 2 Bart 13JUN2019:16:00:00 13JUN2019:17:00:00 16:00
9 2 Bart 13JUN2019:17:00:00 13JUN2019:17:39:00 17:00
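The same splitting logic outside SQL, as a minimal Python sketch: split_by_hour is a hypothetical helper that rounds the start down to the hour, steps one hour at a time, and clips each slot to the original interval, which is what the max()/min() expressions in the PROC SQL answer do.

```python
from datetime import datetime, timedelta

def split_by_hour(start, end):
    """Cut [start, end] into pieces at each full-hour boundary."""
    hour = start.replace(minute=0, second=0, microsecond=0)  # round start down
    while hour < end:
        next_hour = hour + timedelta(hours=1)
        # clip the one-hour slot to the original interval
        yield max(start, hour), min(end, next_hour), hour.time()
        hour = next_hour

ROWS = [
    ("1", "Mark", datetime(2019, 6, 13, 8, 39), datetime(2019, 6, 13, 11, 39)),
    ("2", "Bart", datetime(2019, 6, 13, 13, 39), datetime(2019, 6, 13, 17, 39)),
]

testing = [(rid, name, a, b, slot)
           for rid, name, start, end in ROWS
           for a, b, slot in split_by_hour(start, end)]

for rid, name, a, b, slot in testing:
    print(rid, name, a, b, slot.strftime("%H:%M"))
```

This produces the same nine rows as the result above. One small difference: because the loop stops once the slot start reaches the end time, an end time exactly on the hour should not produce an empty final slice, whereas the hour-end ('e') join condition can.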

Not getting desired format from Oracle query

I am trying to fetch data from the database in the format below:
Month Count
----- -----
201208 124
201209 0
201210 56
201211 25
201212 0
201301 184
201302 0
In the database I have entries like:
Month Count
----- -----
201206 56
201208 124
201210 56
201211 25
201301 184
201304 49
Below is my query:
SELECT MONTH, Count
FROM TABLE_NAME
WHERE MONTH BETWEEN 201208 AND 201302
AND ID = 'X'
Output:
Month Count
----- -----
201208 124
201210 56
201211 25
201301 184
Can anyone help me get the data in the desired format?
First, generate the full sequence of months between these dates; in Oracle you can do this with CONNECT BY LEVEL. Then just JOIN this sequence with your table:
SELECT MonthSeq.MONTH,
NVL(Count,0) Count
FROM TABLE_NAME
RIGHT JOIN
(
SELECT
TO_CHAR(ADD_MONTHS(TO_DATE('201208','YYYYMM'),
(ROWNUM-1))
,'YYYYMM') MONTH
FROM DUAL
CONNECT BY LEVEL<=
MONTHS_BETWEEN(TO_DATE('201302','YYYYMM') ,
TO_DATE('201208','YYYYMM'))+1
) MonthSeq
ON TABLE_NAME.MONTH=MonthSeq.MONTH
ORDER BY MonthSeq.MONTH
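A minimal sketch of the "generate the months, then outer-join" idea in Python's sqlite3. SQLite has no CONNECT BY, so a recursive CTE generates the month sequence instead, and because older SQLite versions lack RIGHT JOIN the sequence is placed on the left of a LEFT JOIN, which is equivalent. Assumptions: months are stored as integers such as 201208, and the count column is renamed cnt to avoid the COUNT keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (month INTEGER, cnt INTEGER)")
# Entries from the question.
conn.executemany("INSERT INTO table_name VALUES (?, ?)", [
    (201206, 56), (201208, 124), (201210, 56),
    (201211, 25), (201301, 184), (201304, 49),
])

rows = conn.execute("""
    WITH RECURSIVE month_seq(month_start) AS (
        SELECT date('2012-08-01')
        UNION ALL
        SELECT date(month_start, '+1 month')
        FROM month_seq
        WHERE month_start < date('2013-02-01')
    )
    -- every generated month appears; missing months get a count of 0
    SELECT CAST(strftime('%Y%m', s.month_start) AS INTEGER) AS month,
           COALESCE(t.cnt, 0) AS cnt
    FROM month_seq AS s
    LEFT JOIN table_name AS t
      ON t.month = CAST(strftime('%Y%m', s.month_start) AS INTEGER)
    ORDER BY s.month_start
""").fetchall()

for month, cnt in rows:
    print(month, cnt)
```

The seven printed rows match the desired output in the question, including the zero rows for 201209, 201212 and 201302.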
Update:
Your query from the comment should look like the following. Move the WHERE condition into the JOIN's ON clause: if you put it in WHERE, the months with no matching rows (whose X columns are NULL) are filtered out, so you don't get the rows with zero counts.
SELECT MonthSeq.MONTH,
NVL(SUM(TOTAL_SESSIONS),0) AS SESSIONS
FROM X
RIGHT JOIN
(
SELECT
TO_CHAR(ADD_MONTHS(TO_DATE('201208','YYYYMM'),
(ROWNUM-1))
,'YYYYMM') MONTH
FROM DUAL
CONNECT BY LEVEL<=
MONTHS_BETWEEN(TO_DATE('201302','YYYYMM') ,
TO_DATE('201208','YYYYMM'))+1
) MonthSeq
ON X.MONTH=MonthSeq.MONTH and X.acct_id = 'ABCD'
GROUP BY MonthSeq.MONTH
ORDER BY MonthSeq.MONTH
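The difference between filtering in ON and in WHERE is easy to see on a small example. This minimal sketch uses Python's sqlite3 with a recursive CTE in place of CONNECT BY and LEFT JOIN in place of RIGHT JOIN; the rows in table x, including the second account WXYZ, are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE x (month INTEGER, acct_id TEXT, total_sessions INTEGER)")
# Hypothetical rows: two ABCD rows in 201208, only another account in 201209.
conn.executemany("INSERT INTO x VALUES (?, ?, ?)", [
    (201208, "ABCD", 100), (201208, "ABCD", 24),
    (201209, "WXYZ", 99),
    (201210, "ABCD", 56),
])

SEQ = """
WITH RECURSIVE month_seq(month_start) AS (
    SELECT date('2012-08-01')
    UNION ALL
    SELECT date(month_start, '+1 month') FROM month_seq
    WHERE month_start < date('2012-10-01')
)
"""

# Account filter in the ON clause: months without ABCD rows keep 0 sessions.
on_filter = conn.execute(SEQ + """
SELECT CAST(strftime('%Y%m', s.month_start) AS INTEGER) AS month,
       COALESCE(SUM(x.total_sessions), 0) AS sessions
FROM month_seq AS s
LEFT JOIN x ON x.month = CAST(strftime('%Y%m', s.month_start) AS INTEGER)
           AND x.acct_id = 'ABCD'
GROUP BY s.month_start
ORDER BY s.month_start
""").fetchall()

# Same filter in WHERE: the 201209 row only matched WXYZ, so it is dropped.
where_filter = conn.execute(SEQ + """
SELECT CAST(strftime('%Y%m', s.month_start) AS INTEGER) AS month,
       COALESCE(SUM(x.total_sessions), 0) AS sessions
FROM month_seq AS s
LEFT JOIN x ON x.month = CAST(strftime('%Y%m', s.month_start) AS INTEGER)
WHERE x.acct_id = 'ABCD'
GROUP BY s.month_start
ORDER BY s.month_start
""").fetchall()

print(on_filter)     # [(201208, 124), (201209, 0), (201210, 56)]
print(where_filter)  # [(201208, 124), (201210, 56)]
```

The GROUP BY is needed because SUM aggregates the ABCD rows that share a month.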
You need to use the TO_DATE function to convert the month field to DATE format. Try like this:
SELECT TO_CHAR(TO_DATE(MONTH, 'YYYYMM'), 'YYYYMM') month, count
FROM TABLE_NAME
WHERE TO_DATE(month, 'YYYYMM') BETWEEN TO_DATE('201208', 'YYYYMM') AND TO_DATE('201302', 'YYYYMM')
AND id = 'X'
ORDER BY TO_DATE(month, 'YYYYMM');