SPSS Modeler - Date function for last week's date range

I wanted to know if it is possible to use a date function for last week's date range. Currently I'm using the method below.
Thanks.
datetime_date('Sale.Date') >= datetime_date(2016,04,18)
and
datetime_date('Sale.Date') <= datetime_date(2016,04,24)

Not a great solution, but:
--datetime_date(datetime_year(#TODAY), datetime_month(#TODAY), datetime_day(#TODAY)-7)
This will work for all but the first 7 days of each month.
You could also get a rough estimate by calculating the following (note that these are steps, not a single expression: each intermediate result must be pasted into its place in the final formula).
--a = date_in_years(#TODAY) - 7/365
--year = a - (a mod 1) + 1900
--month = ((a mod 1) - ((a mod 1) mod (1/12))) * 12
and so on for the day, then substitute these values into:
--datetime_date(year,month,day)
OR you can use SQL to calculate a 7-day difference. That would be accurate and easy but requires the correct setup.
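To see what the day-of-month hack above is trying to approximate: proper date arithmetic subtracts 7 days and rolls over month and year boundaries automatically. A minimal sketch of the logic in Python (not CLEM syntax, purely for illustration):

from datetime import date, timedelta

today = date.today()
week_ago = today - timedelta(days=7)   # rolls over month/year boundaries correctly

# "past 7 days" reading of last week: keep sale dates in [week_ago, today]
def in_past_7_days(sale_date):
    return week_ago <= sale_date <= today

print(in_past_7_days(date(2016, 4, 20)))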

It is a little unclear what you are looking for exactly; what do you mean by "last week"? Is that the 7 days prior to today or is it the last calendar week? If it's the calendar week, which definition are you using? What day of the week does a week start? When does the first week of the year start?
Unfortunately, these definitions vary between different parts of the world and even between different uses within a country.
Dates in SPSS Modeler are all represented as the number of seconds since Jan. 1, 1900, so if you are looking for the dates of the past 7 days, the calculation is fairly trivial: you can use the datetime_in_seconds() function to get the numeric representation of any date or timestamp.
If you are looking for calendar weeks, things get a little more complicated, but I should be able to help with that, too, if you can answer the above questions.
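For the calendar-week reading (the hard-coded range in the question, 2016-04-18 to 2016-04-24, is a Monday-to-Sunday week), the boundary arithmetic looks like this in Python; again, only a sketch of the logic, not Modeler syntax:

from datetime import date, timedelta

today = date.today()
this_monday = today - timedelta(days=today.weekday())   # Monday of the current week
last_monday = this_monday - timedelta(days=7)            # start of last calendar week
last_sunday = this_monday - timedelta(days=1)            # end of last calendar week
# keep records where last_monday <= Sale.Date <= last_sunday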

For SPSS Modeler you can use a Derive node; I see two options:
A)
Derive node: date_weeks_difference(date1, date2)
Then use a Select node to keep only the records where the derived field equals 1.
B)
Or you can use a function inside the Derive Node, creating a dummy variable:
if date_weeks_difference(date1,date2)= 1 then 1 else 0 endif

Cyclic transformation of dates

I would like to use the day of the year in a machine learning model. As the day of the year is not continuous (day 365 of 2019 is followed by day 1 of 2020), I am thinking of performing a cyclic (sine or cosine) transformation, following this link.
However, within a year the values of the new transformed variable are not unique; for example, the value 0.5 occurs twice in the same year, see the figures below.
I need to be able to use the day of the year both in model training and in prediction. A value of 0.5 from the sine transformation can correspond to either 31.01.2019 or 31.05.2019, so using the value 0.5 can be confusing for the model.
Is it possible to make the model differentiate between the two occurrences of 0.5 within the same year?
I am modelling the distribution of a species using the Maxent software. The species data are daily and continuous over 20 years. I need the model to capture the signal of the day or the season without using either of them explicitly as a categorical variable.
Thanks
EDIT1
Based on furcifer's comment below. However, I find the incremental modelling approach not useful for my application. It solves the issue of a consistent difference between subsequent days, e.g. 30.12.2018, 31.12.2018, and 01.01.2019, but it is no different from counting the number of days from a certain reference day (with weight = 1). Having much higher values on the same date in 2019 than in 2014 does not make ecological sense. I hope that interannual changes will be captured by the daily environmental conditions used (the explanatory variables). The reason I need to use the day in the model is to capture the seasonal trend in the distribution of a migratory species, without the explicit use of month or season as a categorical variable. To predict suitable habitats for today, the prediction needs to depend not only on today's environmental conditions but also on the day of the year.
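For reference, the cyclic encoding used in approaches like the linked one normally includes both a sine and a cosine term; either one alone repeats within the year, but the (sin, cos) pair is unique for each day. A minimal numpy/pandas sketch of that pair:

import numpy as np
import pandas as pd

doy = pd.Series(range(1, 366))                  # day of year
features = pd.DataFrame({
    "sin_doy": np.sin(2 * np.pi * doy / 365),   # repeats: different days can share the same value
    "cos_doy": np.cos(2 * np.pi * doy / 365),   # but the (sin, cos) pair is unambiguous
})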
This is a common problem, but I'm not sure if there is a perfect solution. One thing I would note is that there are two things that you might want to model with your date variable:
Seasonal effects
Season-independent trends and autocorrelation
For seasonal effects, the cyclic transformation is sometimes used for linear models, but I don't see the sense for ML models - with enough data, you would expect a nice connection at the edges, so what's the problem? I think the posts you link to are a distraction, or at least they do not properly explain why and when a cyclic transformation is useful. I would just use dYear to model the seasonal effect.
However, the discontinuity might be a problem for modelling trends / autocorrelation / variation in the time series that is not seasonal, or common between years. For that reason, I would add an absolute date to the model, so use
y = dYear + dAbsolute + otherPredictors
A well-tuned ML model should be able to do the rest, with the usual caveats, and if you have enough data.
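A minimal pandas sketch of that feature setup (the data frame and column names here are hypothetical and not tied to Maxent):

import pandas as pd

df = pd.DataFrame({"date": pd.date_range("2000-01-01", "2019-12-31", freq="D")})
df["dYear"] = df["date"].dt.dayofyear                      # seasonal feature, 1-366
df["dAbsolute"] = (df["date"] - df["date"].min()).dt.days  # season-independent trend feature
# df[["dYear", "dAbsolute"]] then goes into the model alongside the other predictors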
This may not be the right choice depending on your needs; there are two approaches that come to mind.
Incremental modeling
In this case, the dates are modeled in a linear fashion, so, say, 12 Dec 2018 < 12 Dec 2019.
For this you just need some form of transformation function that converts dates to numeric values.
As there are many dates that need to be converted to a numeric representation, the first thing to make sure is that the numeric output preserves the same ordering as the dates themselves (as Lukas mentioned). The easiest way to do this is to weight each unit (weight_year > weight_month > weight_day).
def date2num(date_time):
    # 'dd-mm-yyyy' string -> yyyymmdd integer; the weights must strictly separate
    # the fields (year > month > day) so that numeric order matches date order
    d, m, y = date_time.split('-')
    return int(y) * 10000 + int(m) * 100 + int(d)
Now, it's important to normalize the numeric values.
import numpy as np

# convert every date string to its numeric representation
date_features = []
for d in list(df['date_time']):
    date_features.append(date2num(d))
date_features = np.array(date_features)

# min-max normalize to the [0, 1] range
date_features_normalized = (date_features - np.min(date_features)) / (np.max(date_features) - np.min(date_features))
Using the day, month, and year as separate features. So, instead of considering the date as a whole, we split it apart. The motivation is that there may be some relation between the output and a specific day, month, etc.; for example, maybe the output suddenly increases in the summer season (specific months), or maybe on weekends (specific days).
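A minimal sketch of that segregation, assuming the same df['date_time'] column in 'dd-mm-yyyy' format as in the snippet above:

import pandas as pd

dates = pd.to_datetime(df["date_time"], format="%d-%m-%Y")
df["day"] = dates.dt.day
df["month"] = dates.dt.month
df["year"] = dates.dt.year
df["weekday"] = dates.dt.weekday   # 0 = Monday; useful for weekend effects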

Power BI - Calculating a price/cost until a variable end date

Let's say I have a table of projects/programs/subscriptions/etc.; it doesn't really matter, as long as there is a price or cost per some amount of time. My table includes at least the following columns:
[ProjectName]
[StartDate]
[EndDate]
[CostPerDay]
I'm trying to allow the user to choose another date (slicer I assume?) and display the cost of each and all projects up to that date. Is this possible?
Edit: After the first responses I realize the original question was very poorly worded. Sorry about that. I've reworded it and I'll explain more here:
I am not trying to filter the programs by end date. I'm trying to sum a cost up until the end date OR slicer date, whichever is earlier.
Here's a short example table:
So we can also think of it as a Gantt chart like this:
Now imagine sliding a vertical line along that chart. I want to see the total cost up to that date.
I'm sure it will have to do with counting days between start date and the slicer date, then multiplying by cost. But how do we not include days after the end date of each project? Or it may be easier to do a range slicer with a min and max date, but again not counting days before or after each project.
To word it differently: can I input a date range, count the days that each project has in common with that range, and (the simple part) multiply days by cost?
It looks like what you need is a table visualization with a slicer for the end date. Once the table and slicer are created, you can click on the small down-facing arrow in the slicer and choose "Before". If you want both the start date and the end date in the equation, then you would have to add one more slicer for the start date. If this is not what you are looking for, kindly provide additional details so that we can help you out.
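For the overlap arithmetic described in the question (count the days each project has in common with the chosen range, then multiply by cost per day), here is a minimal Python/pandas sketch with made-up data; it only illustrates the logic that a DAX measure would need to reproduce:

import pandas as pd

projects = pd.DataFrame({
    "ProjectName": ["A", "B"],
    "StartDate": pd.to_datetime(["2021-01-01", "2021-02-15"]),
    "EndDate": pd.to_datetime(["2021-03-31", "2021-06-30"]),
    "CostPerDay": [100.0, 250.0],
})

def cost_up_to(projects, slicer_date):
    # cost accrued by each project from its start up to min(EndDate, slicer date)
    slicer_date = pd.Timestamp(slicer_date)
    effective_end = projects["EndDate"].where(projects["EndDate"] <= slicer_date, slicer_date)
    # whether the last day counts as a full day (inclusive vs exclusive) is a business rule
    days = (effective_end - projects["StartDate"]).dt.days.clip(lower=0)
    return projects.assign(CostToDate=days * projects["CostPerDay"])

print(cost_up_to(projects, "2021-03-01"))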

Formula to add days to a Gregorian date

I was looking at Tomohiko Sakamoto's weekday calculator. It's a formula to calculate the day-of-week directly given year, month, day. That made me wonder what other neat date calculation shortcuts exist.
In particular, given an input date as (in_year, in_month, in_day) and a number of days N to add, what's a formula for returning the output (out_year, out_month, out_day)? Is there a well-known trick like the algorithm above?
One way would be to convert the input to a Julian day (a count of days since 4713 BC), add N to it, and then convert back. There are formulas for conversion in both directions. But the combined formula would be quite unwieldy. Is there a simplified version?
Perhaps there is even a formula to move forward or back by a certain number of weekdays.
This question isn't "how do I do date arithmetic in my favourite programming language?" I know how to call the date library to perform these operations. It's more curiosity and the hope of starting a collection of cool date algorithms.
Some of the answers in "Algorithm to add or subtract days from a date?" will be relevant here. In particular, http://howardhinnant.github.io/date_algorithms.html gives code to convert (y, m, d) to a count of days and back again. Those two routines, run back to back, would be pretty fast.
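For reference, here is a straightforward Python port of the two routines on that page (proleptic Gregorian calendar, day 0 = 1970-01-01); adding N days is then just a round trip through the day count:

def days_from_civil(y, m, d):
    # days since 1970-01-01
    y -= 1 if m <= 2 else 0                       # treat Jan/Feb as part of the previous year
    era = y // 400                                # Python floor division handles negative years
    yoe = y - era * 400                           # [0, 399]
    doy = (153 * (m + (-3 if m > 2 else 9)) + 2) // 5 + d - 1   # [0, 365], March-based
    doe = yoe * 365 + yoe // 4 - yoe // 100 + doy                # [0, 146096]
    return era * 146097 + doe - 719468

def civil_from_days(z):
    # inverse of days_from_civil: day count -> (year, month, day)
    z += 719468
    era = z // 146097
    doe = z - era * 146097                                           # [0, 146096]
    yoe = (doe - doe // 1460 + doe // 36524 - doe // 146096) // 365  # [0, 399]
    y = yoe + era * 400
    doy = doe - (365 * yoe + yoe // 4 - yoe // 100)                  # [0, 365]
    mp = (5 * doy + 2) // 153                                        # [0, 11], March = 0
    d = doy - (153 * mp + 2) // 5 + 1
    m = mp + 3 if mp < 10 else mp - 9
    return (y + (1 if m <= 2 else 0), m, d)

def add_days(y, m, d, n):
    return civil_from_days(days_from_civil(y, m, d) + n)

print(add_days(2016, 4, 18, 6))   # (2016, 4, 24)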

Tableau: graphically show compounded leadtimes

I have a chart that shows the number of departures for a given 15 minute interval as seen here.
I need to compound these counts backwards for one hour. For example, the 3 departures shown at 11:00 need to also be represented in the 10:00, 10:15, 10:30, and 10:45 columns. When completed, the 10:00 column would have a total of 6 departures (10:15 -> 6, 10:30 -> 5, 10:45 -> 4, 11:00 -> 4).
I have done this via VBA in Excel, but am now needing to replicate the chart in Tableau and have been beating my head against it for about two weeks now. I'd love to hear any and all suggestions.
You can use a Cartesian join against a large enough date range of your choosing to, in effect, resample your data and add the additional time intervals you desire.
For example, if you have a month's worth of data (min date to max date = 30 days), then you have 30 * 24 * 4 = 2,880 fifteen-minute intervals.
Create all those intervals in a separate data sheet
Add a bogus column with the value "link" to all rows
Create the same bogus column in your actual data source
Join the two sheets together on the link column
Create a calculated field that is something along the following:
[Interval] <= [Flight Time] AND [Interval] >= DATEADD('hour',-1,[Flight Time])
This calculated field will evaluate to TRUE when the interval time is within one hour before the flight time. You can then drag this field onto your filter shelf and select TRUE value only. Effectively your [Interval] field becomes your new date field.
I would recommend adding that filter to the context and applying it across the entire data source. Before you add this filter you'll have 2,880 times the amount of data, so be sure to use a live view first. Be careful with extracts using Cartesian joins, as you could potentially be extracting more than you bargained for.
See the following link for different techniques on how to do this and on re-sampling dates in general in Tableau.
https://community.tableau.com/thread/151387
Depending on the size of your data (and if a live view is not necessary), it is often easier and more efficient to do this type of pre-processing outside of Tableau, in SQL or with something like Python's pandas library.
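For that pre-processing route, a minimal pandas sketch of the same backward one-hour compounding, using made-up departure times (the column name is hypothetical):

import pandas as pd

flights = pd.DataFrame({"flight_time": pd.to_datetime([
    "2019-05-01 10:20", "2019-05-01 10:40",
    "2019-05-01 11:00", "2019-05-01 11:00", "2019-05-01 11:00",
])})

# 15-minute interval grid spanning the data
grid = pd.date_range(flights["flight_time"].min().floor("15min"),
                     flights["flight_time"].max(), freq="15min")

# for each interval start t, count departures with t <= flight_time <= t + 1 hour,
# mirroring the Tableau calculated field above
counts = [((flights["flight_time"] >= t) &
           (flights["flight_time"] <= t + pd.Timedelta(hours=1))).sum() for t in grid]
result = pd.DataFrame({"interval": grid, "departures_next_hour": counts})
print(result)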
Here is another solution, provided on the Tableau Community Forum. I have not tried tyvich's solution yet, but I know this one got me where I needed to go. Please follow the link to see the solution using moving table calculations.
https://community.tableau.com/thread/251154

Calculating the difference between two dates using age and extract gives differing results in Postgresql

I'm using Postgresql (on Amazon Redshift), and I need to calculate the difference between two dates and then use that value in a formula to compute a ratio, so the date difference needs to be translated to a numeric value, preferably a float or double precision.
I have two dates: 1/1/2017 and 1/1/2014. I need to find the difference between these two dates in number of days.
When I use the age function I get 1080 days:
select age('2017-01-01','2014-01-01')
However, since age returns an interval and I need to work with a numeric result, I am using EXTRACT to convert the final value. I chose epoch since I wasn't able to find any other value for EXTRACT that would yield the number of time units between the two dates. This formula yields 1095.75 days (the divisor is the number of seconds in a day):
select extract(epoch from age('2017-01-01','2014-01-01'))/86400
Why am I getting a difference of 19.75 days when using age vs using extract?
Did you try
select '2017-01-01'::date - '2014-01-01'::date;
The difference between two dates is the number of days, as an integer.
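As a sanity check outside the database: the exact calendar difference, which the date subtraction above returns, is 1096 days, so neither 1080 nor 1095.75 is the exact count. In Python:

from datetime import date

delta = date(2017, 1, 1) - date(2014, 1, 1)
print(delta.days)   # 1096 = 365 + 365 + 366 (2016 is a leap year)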
1080 is the figure you would get if every month were 30 days long (36 months * 30 days = 1080), as it would be if you used justify_days (either explicitly or if the DBMS called it implicitly). You don't say how you're arriving at 1080, since I believe the interval would normally just print as something like '3 years', but that seems the most likely cause.
1095.75 seems the more correct figure, being 365.25 days multiplied by three years.
Out of those two, I would go with the latter method.
Although, as pointed out at http://www.postgresql.org/docs/8.1/static/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT, calculating the difference between two date types should yield the number of days:
select dtend - dtstart from somewhere
The Redshift release notes say they recently released a MONTHS_BETWEEN function, which looks similar to Oracle's MONTHS_BETWEEN function, if that's what you're looking for: http://docs.aws.amazon.com/redshift/latest/dg/r_MONTHS_BETWEEN_function.html