Create a new column with the value of last year in Pyspark - pyspark

I have a dataset that contains dates and, for each date, a value and the specific unit the value refers to. What I need is a new column that gives me the value for that unit exactly one year earlier.
I want to do this in PySpark, but so far I have been unsuccessful.
Example
Time Unit Value Value_lastYear
21-12-2022 1 3 5
21-12-2021 1 5 8
21-12-2022 2 6 7
Does anybody have a good idea?

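You could subtract 365 days from the date like so (assuming F is pyspark.sql.functions and Time has already been parsed to a date, e.g. with F.to_date(F.col('Time'), 'dd-MM-yyyy')). Note that 365 days is not a full calendar year when a leap day falls in between; F.add_months(F.col('Time'), -12) avoids that. You can then join the DataFrame to itself on Unit and on new_time = Time to pull in last year's Value.
df = df.withColumn('new_time', F.date_sub(F.col('Time'), 365))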
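To make the lookup logic concrete, here is a minimal sketch in plain Python (no Spark) of what the shift plus self-join has to do. The names rows, one_year_before, lookup and result are made up for illustration, and the sample rows are the three from the question; this is not the PySpark API, just the same idea on a small list.

```python
from datetime import date, datetime

# Sample rows from the question: (Time, Unit, Value)
rows = [
    ("21-12-2022", 1, 3),
    ("21-12-2021", 1, 5),
    ("21-12-2022", 2, 6),
]

def one_year_before(d: date) -> date:
    """Same calendar day one year earlier; 29 Feb falls back to 28 Feb (an assumption)."""
    try:
        return d.replace(year=d.year - 1)
    except ValueError:
        return d.replace(year=d.year - 1, day=28)

# Parse the dd-MM-yyyy strings into real dates first.
parsed = [(datetime.strptime(t, "%d-%m-%Y").date(), unit, value) for t, unit, value in rows]

# Index every value by (date, unit); this plays the role of the right side of a self-join.
lookup = {(d, unit): value for d, unit, value in parsed}

# Left-join style lookup: None when there is no row one year earlier for that unit.
result = [(d, unit, value, lookup.get((one_year_before(d), unit))) for d, unit, value in parsed]
```

In Spark the dictionary lookup would correspond to a left join of the DataFrame with itself on Unit and the shifted date, so rows without a match one year earlier end up with a null Value_lastYear.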

Related

Date difference between pairs of dates per ID

I have data with 2 columns, in the following format:
ID Date
1 1/1/2020
1 27/7/2020
1 15/3/2021
2 18/1/2020
3 1/1/2020
3 3/8/2020
3 18/9/2021
2 23/8/2020
2 30/2/2021
Now I would like to create a calculated field in Tableau that finds, per ID, the difference between consecutive dates, in any unit, e.g. days.
For example, for ID 1 the difference between the first two dates is 208 calendar days, and the difference between the second and third dates for the same ID is 231 days.
A table calc like the following should do if you get the partitioning, addressing and ordering right, for example by setting "Compute Using" to Date.
If first() < 0 then min([Date]) - lookup(min([Date]), -1) end
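As a cross-check outside Tableau, the same "previous date within the same ID" logic can be sketched in plain Python. The variable names are hypothetical, and only the rows for IDs 1 and 3 from the question are used.

```python
from collections import defaultdict
from datetime import datetime

# (ID, Date) pairs for IDs 1 and 3, copied from the question (day/month/year).
rows = [
    (1, "1/1/2020"), (1, "27/7/2020"), (1, "15/3/2021"),
    (3, "1/1/2020"), (3, "3/8/2020"), (3, "18/9/2021"),
]

# Partition the dates by ID, like the partitioning of the table calc.
dates_by_id = defaultdict(list)
for id_, text in rows:
    dates_by_id[id_].append(datetime.strptime(text, "%d/%m/%Y").date())

# Within each ID, order by date and subtract the previous date from the current one.
gaps_by_id = {}
for id_, dates in dates_by_id.items():
    dates.sort()
    gaps_by_id[id_] = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
```

For ID 1 this gives 208 and 231 days, matching the figures in the question. The first date of each ID has no predecessor, which is the case the first() < 0 test skips.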

finding difference between 2 months in spark sql

I'm trying to use the months_between function in Spark SQL to find the difference in months between two dates, but I don't want the days within the months to be taken into account. For example:
I have these 2 dates:
28-1-2021 and 4-4-2021. I'm getting a difference of 2.2, but I want the value to be 3.
Another two dates:
7-1-2021 and 18-3-2021. I'm getting a difference of 2.36, but I want the value to be 2.
I tried using the round function, but it's not accurate, since for some dates I need to round a number up and for other dates I need to round the same number down, as in the examples above.
The function I'm using is months_between((date1),(date2)).
Looks like you want the number of months regardless of dates.
In that case, you can combine trunc and months_between.
trunc will truncate to the unit specified by the format, so using the unit=month, you will get the first day of the month.
months_between(trunc(date1, 'month'), trunc(date2, 'month'))
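Truncating both dates to the first of their month means only the year and month parts matter, so the result is a whole number. A small plain-Python sketch of that arithmetic, with a hypothetical helper name, checked against the two date pairs from the question:

```python
from datetime import date

def whole_months_between(later: date, earlier: date) -> int:
    """Month difference that ignores the day of the month entirely."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)

# The two pairs from the question (day-month-year).
first_pair = whole_months_between(date(2021, 4, 4), date(2021, 1, 28))   # January to April
second_pair = whole_months_between(date(2021, 3, 18), date(2021, 1, 7))  # January to March
```

As with months_between, the order of the arguments decides the sign: passing the earlier date first gives a negative number.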

How to convert 365 days to one year in postgresql?

How do I convert 365 days to equal 1 year in PostgreSQL, in a new column within a query?
Let's say we have three columns:
Department
Employee
Days_of_Service
How can I create an extra column where 365 days = 1 year? I would assume this would be a float if the days were over and/or under 365 days. Feel free to explain what this process is called, I would love to better understand it for future queries.
The data in Days_of_Service is just an INT (i.e. 1 day = 1)
We can assume the original code is:
SELECT
Department
, Employee
, Days_of_Service
, SOLUTION AS years_of_service -- Basically, 365 days should = 1 year in this column
FROM employee_list
I cannot find anything about unit conversions for PostgreSQL, for this specific situation.
Since a year does not consist of exactly 365 days, your best bet is to divide the number of days by the length of a tropical year in days:
days_of_service / 365.242189 AS years_of_service
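The arithmetic itself is just a division by a constant. A tiny Python sketch (the function and constant names are made up for illustration) shows the effect of using the tropical-year length rather than a flat 365:

```python
# Length of a tropical year in days, as used in the answer above.
DAYS_PER_TROPICAL_YEAR = 365.242189

def days_to_years(days_of_service: float) -> float:
    """Convert a day count to fractional years of service."""
    return days_of_service / DAYS_PER_TROPICAL_YEAR

one_calendar_year = days_to_years(365)   # slightly below 1.0
two_years_ish = days_to_years(730)       # slightly below 2.0
```

If you need exactly 1.0 for 365 days, as the question literally asks, divide by 365.0 instead and accept the small drift over leap years.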

Spotfire Text to Integer for Dates

I am attempting to load time series data from an Excel spreadsheet into Spotfire. In my spreadsheet there is a separate column for year (Spotfire sees it as an integer) and for month (Spotfire sees it as text, since it is in the three-letter abbreviation format, i.e. January is JAN). I am trying to avoid changing the data in Excel and would like to do all of my work in Spotfire, as this will be updated periodically. How do I link these columns in Spotfire so that I can plot a variable over a time frame?
Click Insert > Insert Calculated Column... Make sure you have the right data table selected. In the Expression field type:
Date([year],
case when [month]="JAN" then 1
when [month]="FEB" then 2
when [month]="MAR" then 3
when [month]="APR" then 4
when [month]="MAY" then 5
when [month]="JUN" then 6
when [month]="JUL" then 7
when [month]="AUG" then 8
when [month]="SEP" then 9
when [month]="OCT" then 10
when [month]="NOV" then 11
when [month]="DEC" then 12 end,
1)
I would name it something like "monthdate". Note that each date will have the day equal to 1. If you also have the day in your data, just put that column in the formula above instead of the last 1.
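If it helps to see the mapping outside Spotfire, the case expression is equivalent to a lookup table from abbreviation to month number. A plain-Python sketch with hypothetical names (the upper-casing is an extra assumption, the expression above matches upper-case text only):

```python
from datetime import date

# Same abbreviation-to-number mapping as the case expression.
MONTH_NUMBERS = {
    "JAN": 1, "FEB": 2, "MAR": 3, "APR": 4, "MAY": 5, "JUN": 6,
    "JUL": 7, "AUG": 8, "SEP": 9, "OCT": 10, "NOV": 11, "DEC": 12,
}

def month_date(year: int, month_abbr: str, day: int = 1) -> date:
    """Build a date from an integer year and a three-letter month; day defaults to 1."""
    return date(year, MONTH_NUMBERS[month_abbr.upper()], day)

example = month_date(2020, "JAN")
```

An unknown abbreviation raises a KeyError here, whereas the case expression without an else branch would yield an empty month instead.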

SAS MDX check if time member belongs to 3 last periods

In MDX I need to define a measure that is calculated for all months except the last N months.
For a measure that becomes NULL in the last 2 months, I did this:
DEFINE Member '[Cube].[Measures].[my measure]' AS
'iif([DateDimension].[DateHierarchy].CURRENTMEMBER is
[DateDimension].[DateHierarchy].[All Months from hierarchy].LastChild.LastChild.Lag(1)
OR [DateDimension].[DateHierarchy].CURRENTMEMBER is
[DateDimension].[DateHierarchy].[All Months from hierarchy].LastChild.LastChild,
NULL,[Measures].[Measure XXX])';
This works fine, but now I need to create lots of measures that should be NULLed in the last 2, 4, 6 and 12 months. The above solution would work but would be very messy, so my question is:
Is there an MDX function or operator that allows me to do something like this:
[DateDimension].[DateHierarchy].CURRENTMEMBER between STH
OR
[DateDimension].[DateHierarchy].CURRENTMEMBER >=
[DateDimension].[DateHierarchy].lastChild.lastChild.lag(N)
?
I checked the GT (greater than) operator, but it only works for comparing measures.