Re-assign value based on dates - date

I need to compare the dates and reassign the values to two new variables by ID.
If there are two dates for the same id, then:
If the date is the earlier one, its status should be assigned to "earlier status".
If the date is the later one, its status should be assigned to "current status".
If there is only one date for the id, its status should be assigned to "current status", and "earlier status" should be missing.
If there are more than two dates for the id, the values for the middle dates should be ignored; only the earliest and the most recent values are used.
Any thoughts? Much appreciated!
This is the code that I have tried:
data origin;
input id date mmddyy8. status;
datalines;
1 1/1/2010 0
1 1/1/2011 1
2 2/2/2002 1
3 3/3/2003 1
3 2/5/2010 0
4 1/1/2000 0
4 1/1/2003 0
4 1/1/2005 1
;
run;
proc print; format date yymmdd8.; run;
proc sort data=origin out=a1;
by id date;
run;
data need; set a1;
if first.date then EarlierStatus=status;
else if last.date then CurrentStatus=status;
by id;
run;
proc print; format date yymmdd8.; run;

So, a couple of things. First, note a few corrections to your code, in particular the colon (:) modifier on the informat, which is critical if you're going to mix list-style input with an informat.
Second, you need to retain EarlierStatus. Otherwise it gets reset to missing on each data step iteration.
Third, you need to use first.id, not first.date (and similarly for last). first.id means "this is the first row for a new value of id". first.date is how you'd say it in English ("the first date for that id"), but it is not what the variable means.
Finally, you need a couple more tests to set your variables the way you describe.
data origin;
input id date :mmddyy10. status;
format date mmddyy10.;
datalines;
1 1/1/2010 0
1 1/1/2011 1
2 2/2/2002 1
3 3/3/2003 1
3 2/5/2010 0
4 1/1/2000 0
4 1/1/2003 0
4 1/1/2005 1
;
run;
proc sort data=origin out=a1;
by id date;
run;
data need;
set a1;
by id;
retain EarlierStatus;
if first.id then call missing(EarlierStatus); *first time through for an ID, clear EarlierStatus;
if first.id and not last.id then EarlierStatus=status; *if it is first time for the id, but not ONLY time, then set EarlierStatus;
else if last.id then CurrentStatus=status; *if it is last time for the id, then set CurrentStatus;
if last.id then output; *and if it is last time for the id, then output;
run;
The if/else logic there could be written slightly differently, depending on exactly how you want things to behave; I was trying to keep the tests direct so it is clear how they relate to each other.
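For readers who want to check the first/last logic outside SAS, here is a minimal sketch of the same idea in Python. It is only an illustration, not part of the answer's method: the function name earlier_and_current and the tuple layout are made up for this example, and dates are written as ISO strings so that plain sorting is chronological.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows as (id, date, status); ISO date strings sort chronologically.
rows = [
    (1, "2010-01-01", 0), (1, "2011-01-01", 1),
    (2, "2002-02-02", 1),
    (3, "2003-03-03", 1), (3, "2010-02-05", 0),
    (4, "2000-01-01", 0), (4, "2003-01-01", 0), (4, "2005-01-01", 1),
]

def earlier_and_current(rows):
    """Return {id: (earlier_status, current_status)}.

    earlier_status is None when the id has only one row (hypothetical helper).
    """
    result = {}
    ordered = sorted(rows, key=itemgetter(0, 1))  # same role as PROC SORT; BY id date
    for key, group in groupby(ordered, key=itemgetter(0)):
        group = list(group)
        first, last = group[0], group[-1]          # first.id row and last.id row
        earlier = first[2] if len(group) > 1 else None
        result[key] = (earlier, last[2])           # middle rows are ignored
    return result

statuses = earlier_and_current(rows)
```

Each id ends up with one pair, which mirrors outputting only on last.id in the data step.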

This proc sql will get what you want:
proc sql;
create table need as
select distinct
t1.id,
t2.EarlierStatus,
t1.CurrentStatus
from (select distinct
id,
date,
status as CurrentStatus
from origin
group by id
having date=max(date)) as t1
left join (select distinct
id,
date,
status as EarlierStatus
from origin
group by id
having date = min(date) and date ~= max(date)) as t2 on t1.id=t2.id;
quit;
The above code has two subqueries. In the first subquery, you keep only the row with the max date for each id and rename status to CurrentStatus. In the second subquery, you keep only the row with the min date for each id, provided it is not also the max date, and rename status to EarlierStatus. So if your origin table has only one date for an id, that date is both the min and the max, and the row is dropped from the second subquery. Then you perform a left join between the first and the second subqueries, pulling EarlierStatus from the second into the first. If no EarlierStatus is found, it is set to missing.

Related

Create date between consecutive dates

I hope you can assist.
I have a SAS data set which has two columns, ID and Date which looks like this:
In some instances, the date column skips a month. I need code that creates the missing date for each ID; e.g., for AY273 it should create the date 2022/11/20, and for WG163, 2022/12/15.
You can merge the data with itself shifted one observation forward (to get a lead value) and loop across that range.
Example:
data have;
input id $ date yymmdd10.;
format date yymmdd10.;
datalines;
AAAAA 2021-11-20
AY273 2022-10-20
AY273 2022-12-20
AY273 2023-01-20
WG163 2022-10-15
WG163 2022-11-15
WG163 2023-01-15
ZZZZZ 2022-01-15
;
data want(keep=id date fillflag);
merge have have(rename=(date=leaddate id=leadid) firstobs=2);
if id eq leadid then
do while (intck('month',date,leaddate) > 0);
output;
date = intnx('month',date,1,'sameday');
fillflag = 1;
end;
else
output;
run;
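The lead-and-loop idea can also be sketched in Python, as a rough cross-check of the logic rather than a replacement for the data step. The helpers add_month, months_between and fill_months are names invented for this sketch; add_month approximates INTNX with 'sameday' by clamping to the end of shorter months, and months_between counts month boundaries the way INTCK('month', ...) does by default.

```python
import calendar
from datetime import date

def add_month(d):
    """Advance one month, keeping the day of month where possible (clamped to month end)."""
    year, month = (d.year + 1, 1) if d.month == 12 else (d.year, d.month + 1)
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

def months_between(a, b):
    """Count month boundaries crossed from a to b."""
    return (b.year - a.year) * 12 + (b.month - a.month)

def fill_months(rows):
    """rows: list of (id, date) sorted by id then date. Yields (id, date, filled)."""
    for i, (rid, d) in enumerate(rows):
        # Look one row ahead; this plays the role of the shifted copy in the merge.
        lead_id, lead_date = rows[i + 1] if i + 1 < len(rows) else (None, None)
        yield rid, d, False
        if lead_id == rid:
            nxt = add_month(d)
            while months_between(nxt, lead_date) > 0:
                yield rid, nxt, True   # a month that was missing in the input
                nxt = add_month(nxt)

have = [
    ("AAAAA", date(2021, 11, 20)),
    ("AY273", date(2022, 10, 20)), ("AY273", date(2022, 12, 20)), ("AY273", date(2023, 1, 20)),
    ("WG163", date(2022, 10, 15)), ("WG163", date(2022, 11, 15)), ("WG163", date(2023, 1, 15)),
    ("ZZZZZ", date(2022, 1, 15)),
]
filled = list(fill_months(have))
```

The third element of each tuple corresponds to fillflag in the SAS version.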
Try this
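data WANT (drop = this_date last_date i);
set HAVE(rename=(date = this_date));
by id;
last_date = lag(this_date);
if first.id then do;
date = this_date;
output;
end;
else do i = intck('month', last_date, this_date) - 1 to 0 by -1; *step by calendar month, since a fixed 30-day step drifts (e.g. 2022-12-16 instead of 2022-12-15);
date = intnx('month', this_date, -i, 'sameday');
output;
end;
format date yymmdd10.;
run;
proc sort;
by id date;
run;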
If it does not work, I will correct it.

SAS 9.4 How to calculate the number of days until next record

Using SAS I want to be able to calculate the number of days between two dates where the value is the number of days until the next record.
The required output will be:
Date Num Days
10/09/2020 1
11/09/2020 1
12/09/2020 1
14/09/2020 2
15/09/2020 1
16/09/2020 1
17/09/2020 1
18/09/2020 1
20/09/2020 2
I have tried using LAG and RETAIN but just can't get it to work.
Any advice and suggestions would be really appreciated.
If you sort the data by descending DATE then it is easier, because then you just need to look backwards to find the next date. So you can use the LAG() function (or DIF(), with the sign flipped).
data want;
set have;
by descending date;
num_days = lag(date) - date; *the previous row holds the next (later) date, so this is positive;
run;
To simulate a "lead" function you can set another copy of the data skipping the first observation.
data want;
set have ;
set have(firstobs=2 keep=date rename=(date=next_date)) have(obs=1 drop=_all_);
num_days = next_date - date;
run;
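As a language-neutral illustration of the "lead" trick, here is a small Python sketch that pairs each date with the one after it. The function name days_until_next is invented for this example, and the last date gets None because it has no following record.

```python
from datetime import date

# The dates from the question (September 2020).
dates = [date(2020, 9, d) for d in (10, 11, 12, 14, 15, 16, 17, 18, 20)]

def days_until_next(dates):
    """Pair each date with the gap in days to the following date."""
    # zip(dates, dates[1:]) lines each date up with its successor,
    # which is what reading a second copy with firstobs=2 achieves.
    gaps = [(nxt - cur).days for cur, nxt in zip(dates, dates[1:])]
    return list(zip(dates, gaps + [None]))

result = days_until_next(dates)
```

The input is assumed to be sorted in ascending date order, as in the second SAS example.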

removing day portion of date variable for time series SAS

I'm having some frustration with dates in SAS.
I am using proc forecast and am trying to make my dates evenly spaced. I did some pre-processing with proc sql to get my counts by month, but my dates are incorrect.
Though my dataset looks good (b/c I used format MONYY.) the actual value of that variable is wrong.
date year month count
Jan10 2010 1 100
Feb10 2010 2 494
...
..
.
The Date value is actually the full SAS representation of the date (18267), meaning that it includes the day count.
Do I need to convert the variable to a string and back to a date, or is there a quick proc I can run?
My goal is to use the date variable with proc forecast so I only want Month and year.
Thanks for any help!
You can't define a SAS date variable that excludes the day: a SAS date is stored as the number of days elapsed since 1jan1960.
What you can do is hide the day with a format like monyy., but the underlying number will always contain that information.
Maybe you can use the interval=month option in proc forecast?
Please add some detail about the problem you're encountering with the forecast procedure.
EDIT: check this example:
data past;
keep date sales;
format date monyy5.;
lu = 0;
n = 25;
do i = -10 to n;
u = .7 * lu + .2 * rannor(1234);
lu = u;
sales = 10 + .10 * i + u;
date = intnx( 'month', '1jul1991'd, i - n );
if i > 0 then output;
end;
run;
proc forecast data=past interval=month lead=10 out=pred;
var sales;
id date;
run;
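To make the point about the underlying number concrete, here is a short Python sketch of how a SAS date value maps to a calendar date, and how a value can be normalised to the first day of its month (roughly what INTNX('month', value, 0) does in SAS). The helper names are made up for this illustration.

```python
from datetime import date, timedelta

SAS_EPOCH = date(1960, 1, 1)  # SAS date values count days from this date

def from_sas_date(n):
    """Convert a SAS date value (days since 1 Jan 1960) to a calendar date."""
    return SAS_EPOCH + timedelta(days=n)

def to_sas_date(d):
    """Convert a calendar date to a SAS date value."""
    return (d - SAS_EPOCH).days

def month_start(n):
    """Normalise a SAS date value to the first day of its month."""
    return to_sas_date(from_sas_date(n).replace(day=1))

# The value quoted in the question still carries a day of month.
original = from_sas_date(18267)
aligned = month_start(18267)
```

Aligning every value to the first of its month gives evenly spaced monthly dates while the variable stays a genuine date.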

TSQL update Datetime with Random Value between 2 Dates

What's the easiest way to update a table that contains a DATETIME column on TSQL with RANDOM value between 2 dates?
I see various posts related to that, but their random values are really sequential when you ORDER BY the date after the update.
Assumptions
First assume that you have a database containing a table with a start datetime column and an end datetime column, which together define a datetime range:
CREATE DATABASE StackOverflow11387226;
GO
USE StackOverflow11387226;
GO
CREATE TABLE DateTimeRanges (
StartDateTime DATETIME NOT NULL,
EndDateTime DATETIME NOT NULL
);
GO
ALTER TABLE DateTimeRanges
ADD CONSTRAINT CK_PositiveRange CHECK (EndDateTime > StartDateTime);
And assume that the table contains some data:
INSERT INTO DateTimeRanges (
StartDateTime,
EndDateTime
)
VALUES
('2012-07-09 00:30', '2012-07-09 01:30'),
('2012-01-01 00:00', '2013-01-01 00:00'),
('1988-07-25 22:30', '2012-07-09 00:30');
GO
Method
The following SELECT statement returns the start datetime, the end datetime, and a pseudorandom datetime with minute precision that is greater than or equal to the start datetime and less than the end datetime:
SELECT
StartDateTime,
EndDateTime,
DATEADD(
MINUTE,
ABS(CHECKSUM(NEWID())) % DATEDIFF(MINUTE, StartDateTime, EndDateTime) + DATEDIFF(MINUTE, 0, StartDateTime),
0
) AS RandomDateTime
FROM DateTimeRanges;
Result
Because the NEWID() function is nondeterministic, this will return a different result set for every execution. Here is the result set I generated just now:
StartDateTime EndDateTime RandomDateTime
----------------------- ----------------------- -----------------------
2012-07-09 00:30:00.000 2012-07-09 01:30:00.000 2012-07-09 00:44:00.000
2012-01-01 00:00:00.000 2013-01-01 00:00:00.000 2012-09-08 20:41:00.000
1988-07-25 22:30:00.000 2012-07-09 00:30:00.000 1996-01-05 23:48:00.000
All the values in the column RandomDateTime lie between the values in columns StartDateTime and EndDateTime.
Explanation
This technique for generating random values is due to Jeff Moden. He wrote a great article on SQL Server Central about data generation. Read it for a more thorough explanation. Registration is required, but it's well worth it.
The idea is to generate a random offset from the start datetime, and add the offset to the start datetime to get a new datetime in between the start datetime and the end datetime.
The expression DATEDIFF(MINUTE, StartDateTime, EndDateTime) represents the total number of minutes between the start datetime and the end datetime. The offset must be less than this value.
The expression ABS(CHECKSUM(NEWID())) generates an independent random non-negative integer for every row. The expression can have any value from 0 to 2,147,483,647. This expression modulo the first expression gives a valid offset in minutes.
The expression DATEDIFF(MINUTE, 0, StartDateTime) represents the total number of minutes between a reference datetime of 0, which is shorthand for '1900-01-01 00:00:00.000', and the start datetime. The value of the reference datetime does not matter, but it matters that the same reference datetime is used throughout the expression. Add this to the offset to get the total number of minutes between the reference datetime and the random datetime.
The encapsulating DATEADD function converts this to a datetime value by adding the number of minutes produced by the previous expression to the reference datetime.
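The arithmetic described above can be traced step by step in a small Python sketch. This is only an illustration of the calculation, not T-SQL: random_datetime is a name invented here, a random integer stands in for ABS(CHECKSUM(NEWID())), and the inputs are assumed to be minute-aligned with end later than start, as in the sample table.

```python
import random
from datetime import datetime, timedelta

REFERENCE = datetime(1900, 1, 1)  # the datetime that T-SQL treats as 0

def random_datetime(start, end, rng=random):
    """Return a minute-precision datetime in [start, end).

    Assumes start and end are minute-aligned and end > start.
    """
    span = int((end - start).total_seconds() // 60)        # minutes in the range
    offset = rng.randrange(2**31) % span                   # random offset, 0 <= offset < span
    base = int((start - REFERENCE).total_seconds() // 60)  # minutes from reference to start
    return REFERENCE + timedelta(minutes=base + offset)    # convert back to a datetime

sample = random_datetime(datetime(2012, 7, 9, 0, 30), datetime(2012, 7, 9, 1, 30))
```

Because the offset is taken modulo the span, it can be 0 but never equals the span, which is why the result can equal the start but never the end.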
You can use RAND for this:
select cast(cast(RAND()*100000 as int) as datetime)
from here
Sql-Fiddle looks quite good: http://sqlfiddle.com/#!3/b9e44/2/0

T-sql IF Condition date evaluation

I have a simple question regarding T-SQL. I have a stored procedure which calls a function which returns a date. I want to use an IF condition to compare today's date with the function's returned date, and return data if they match.
Any ideas on the best way to handle this? I am learning T-SQL at the moment and I am more familiar with logical conditions from using C#.
ALTER FUNCTION [dbo].[monday_new_period](@p_date as datetime) -- Parameter to find current date
RETURNS datetime
BEGIN
-- 1 find the year and period given the current date
-- create parameters to store period and year of given date
declare @p_date_period int, @p_date_period_year int
-- assign the values to the period and year parameters
select
@p_date_period=period,
@p_date_period_year = [year]
from client_week_uk where @p_date between start_dt and end_dt
-- 2 determine the first monday given the period and year, by adding days to the first day of the period
-- this only works on the assumption a period lasts a least one week
-- create parameter to store the first day of the period
declare @p_start_date_for_period_x datetime
select @p_start_date_for_period_x = min(start_dt)
from client_week_uk where period = @p_date_period and [year] = @p_date_period_year
-- create parameter to store result
declare @p_result datetime
-- add x days to the first day to get a monday
select @p_result = dateadd(d,
case datename(dw, @p_start_date_for_period_x)
when 'Monday' then 0
when 'Tuesday' then 6
when 'Wednesday' then 5
when 'Thursday' then 4
when 'Friday' then 3
when 'Saturday' then 2
when 'Sunday' then 1 end,
@p_start_date_for_period_x)
Return @p_result
END
ALTER PROCEDURE [dbo].[usp_data_to_retrieve]
-- Add the parameters for the stored procedure here
AS
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;
IF monday_new_period(dbo.trimdate(getutcdate()) = getutcdate()
BEGIN
-- SQL GOES HERE --
END
Thanks!!
I assume you are working on SQL Server 2008 or later (the DATE type used below requires it). See the documentation of the IF and CASE keywords for more details.
CREATE FUNCTION dbo.GetSomeDate()
RETURNS datetime
AS
BEGIN
RETURN '2012-03-05 13:12:14'
END
GO
IF CAST(GETDATE() AS DATE) = CAST(dbo.GetSomeDate() AS DATE)
BEGIN
PRINT 'The same date'
END
ELSE
BEGIN
PRINT 'Different dates'
END
-- in the select query
SELECT CASE WHEN CAST(GETDATE() AS DATE) = CAST(dbo.GetSomeDate() AS DATE) THEN 1 ELSE 0 END AS IsTheSame
This is the basic syntax for a T-SQL IF and a date compare.
If you are comparing just the date portion for equality you will need to use:
select dateadd(dd,0, datediff(dd,0, getDate()))
This snippet will effectively set the time portion to 00:00:00 so you can compare just dates. So in use it will look something like this.
IF dateadd(dd,0, datediff(dd,0, dbo.fn_yourFunction())) = dateadd(dd,0, datediff(dd,0, GETDATE()))
BEGIN
SELECT * FROM SOMEDATA
END
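The date-only comparison above works by counting whole days from a reference date and converting that count back to a datetime, which drops the time of day. A minimal Python sketch of the same idea follows; strip_time and same_day are hypothetical names used only for this illustration.

```python
from datetime import datetime, timedelta

REFERENCE = datetime(1900, 1, 1)  # the datetime that T-SQL treats as 0

def strip_time(dt):
    """Set the time portion to 00:00:00 by round-tripping through a whole-day count."""
    whole_days = (dt - REFERENCE).days              # whole days since the reference
    return REFERENCE + timedelta(days=whole_days)   # back to a datetime at midnight

def same_day(a, b):
    """Compare two datetimes on their date portion only."""
    return strip_time(a) == strip_time(b)

morning = datetime(2012, 3, 5, 8, 0, 0)
afternoon = datetime(2012, 3, 5, 13, 12, 14)
matches = same_day(morning, afternoon)
```

Comparing the raw datetimes directly would almost never be equal, because the time portions differ.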
Hope that helps!