PySpark: How to use Dataset in Python

I am not sure if my title is correct, so please let me know if it's misleading.
I have defined a list of Sales objects, each holding a year and an amount in euros, as shown in the code below.
from dataclasses import dataclass

@dataclass
class Sales:
    year: int = 0
    euros: int = 0

    def __init__(self, year: int, euros: int):
        self.year = year
        self.euros = euros

salesList = [Sales(2015, 325), Sales(2016, 100), Sales(2017, 15), Sales(2018, 1000),
             Sales(2019, 50), Sales(2020, 750), Sales(2021, 950), Sales(2022, 400)]
I'm trying to print the euro amount for the year 2019 using this syntax:
print(f"Sales for 2019: {sales2019.euros}")
I'm not sure how to use a Dataset in PySpark: if I call createDataFrame like this: spark.createDataFrame(salesList), it doesn't seem to be able to handle the data as objects.
How should I add the data, or what should I do, in order to use the print syntax shown above?
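One possible way to get there, sketched in plain Python so that it runs without a Spark session: turn each object into a plain record, then pick out the 2019 entry. The Sales class mirrors the one in the question; the helper to_records and the spark variable mentioned in the comments are assumptions for illustration, not part of the original post.

```python
from dataclasses import dataclass


@dataclass
class Sales:
    year: int = 0
    euros: int = 0


sales_list = [Sales(2015, 325), Sales(2016, 100), Sales(2017, 15), Sales(2018, 1000),
              Sales(2019, 50), Sales(2020, 750), Sales(2021, 950), Sales(2022, 400)]


def to_records(items):
    """Hypothetical helper: convert objects to dicts of their fields."""
    return [vars(item) for item in items]


# Plain records like these are the kind of input createDataFrame can infer
# a schema from, e.g. spark.createDataFrame(to_records(sales_list)),
# assuming an existing SparkSession named spark.
records = to_records(sales_list)

# Without Spark, the 2019 entry can be picked straight from the list.
sales2019 = next(s for s in sales_list if s.year == 2019)
print(f"Sales for 2019: {sales2019.euros}")  # Sales for 2019: 50
```

On the Spark side, df.filter(df.year == 2019).first() returns a Row, and Row fields support attribute access, so the same print syntax should carry over to a DataFrame built from these records.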

Related

Kotlin: Getting the difference between two dates (now and previous date)

Sorry if similar questions have been asked too many times, but it seems that there's one or more issues with every answer I find.
I have a date in the form of a String: Ex.: "04112005"
This is a date. 4th of November, 2005.
I want to get the difference, in years and days, between the current date and this date.
The code I have so far gets the years and just subtracts them:
fun getAlderFraFodselsdato(bDate: String): String {
    val bYr: Int = getBirthYearFromBirthDate(bDate)
    var cYr: Int = Integer.parseInt(SimpleDateFormat("yyyy").format(Date()))
    return (cYr-bYr).toString()
}
However, this is naturally quite inaccurate, since the month and day aren't included.
I've tried several approaches to create Date, LocalDate, SimpleDate etc. objects and use these to calculate the difference. But for some reason I haven't gotten any of them to work.
I need to create a Date (or similar) object for the current year, month and day. Then I need to create the same kind of object from a string containing a day, month and year ("04112005"). Then I need to get the difference between these, in years, months and days.
All hints are appreciated.
I would use java.time.LocalDate for parsing the string and for today's date, along with java.time.Period, which calculates the period between two LocalDates for you.
See this example:
import java.time.LocalDate
import java.time.Period
import java.time.format.DateTimeFormatter

fun main(args: Array<String>) {
    // parse the date with a suitable formatter
    val from = LocalDate.parse("04112005", DateTimeFormatter.ofPattern("ddMMyyyy"))
    // get today's date
    val today = LocalDate.now()
    // calculate the period between those two
    val period = Period.between(from, today)
    // and print it in a human-readable way
    println("The difference between " + from.format(DateTimeFormatter.ISO_LOCAL_DATE)
        + " and " + today.format(DateTimeFormatter.ISO_LOCAL_DATE) + " is "
        + period.getYears() + " years, " + period.getMonths() + " months and "
        + period.getDays() + " days")
}
With today being 2020-02-21, the output is:
The difference between 2005-11-04 and 2020-02-21 is 14 years, 3 months and 17 days
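For comparison, the same calendar-aware arithmetic can be sketched in Python, whose standard library has no direct counterpart to java.time.Period. The names add_months and period_between are assumptions for illustration; the sketch assumes the start date is not after the end date and counts whole months first, then the leftover days.

```python
import calendar
from datetime import date, datetime


def add_months(d: date, months: int) -> date:
    """Add whole months, clamping the day to the end of the target month."""
    index = d.year * 12 + (d.month - 1) + months
    year, month = divmod(index, 12)
    last_day = calendar.monthrange(year, month + 1)[1]
    return date(year, month + 1, min(d.day, last_day))


def period_between(start: date, end: date) -> tuple:
    """Return (years, months, days) from start to end; assumes start <= end."""
    total_months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        # The last month is not complete yet, so do not count it.
        total_months -= 1
    days = (end - add_months(start, total_months)).days
    years, months = divmod(total_months, 12)
    return years, months, days


birth = datetime.strptime("04112005", "%d%m%Y").date()
print(period_between(birth, date(2020, 2, 21)))  # (14, 3, 17)
```

The printed tuple matches the 14 years, 3 months and 17 days reported above for the same pair of dates.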
This works below API level 26. It uses Joda-Time (DateTime, DateTimeFormat, Period).
Dates come in many formats; just pass the date format together with the start date and end date, and the function returns the result. Look up the available date format patterns if you need a different one.
tvDifferenceDateResult.text = getDateDifference(
    "12 November, 2008",
    "31 August, 2021",
    "dd MMMM, yyyy")
General method to calculate date difference
import org.joda.time.DateTime
import org.joda.time.Period
import org.joda.time.format.DateTimeFormat
import org.joda.time.format.DateTimeFormatter

fun getDateDifference(fromDate: String, toDate: String, formater: String): String {
    val fmt: DateTimeFormatter = DateTimeFormat.forPattern(formater)
    val mDate1: DateTime = fmt.parseDateTime(fromDate)
    val mDate2: DateTime = fmt.parseDateTime(toDate)
    val period = Period(mDate1, mDate2)
    // the period gives us years, months, weeks and days
    // period.days is between 0 and 6
    // to report days instead of weeks,
    // multiply the weeks by 7, add the days, and add 1
    val mDays: Int = period.days + (period.weeks * 7) + 1
    return "Year: ${period.years}\nMonth: ${period.months}\nDay: $mDays"
}
For the legacy Date functions below API 26, without enabling desugaring of java.time.* (available from Android Gradle plugin 4.0), use:
import java.text.SimpleDateFormat
import java.util.Locale

fun getLegacyDateDifference(
    fromDate: String,
    toDate: String,
    formatter: String = "yyyy-MM-dd HH:mm:ss",
    locale: Locale = Locale.getDefault()
): Map<String, Long> {
    val fmt = SimpleDateFormat(formatter, locale)
    val bgn = fmt.parse(fromDate)
    val end = fmt.parse(toDate)
    val milliseconds = end.time - bgn.time
    val days = milliseconds / 1000 / 3600 / 24
    val hours = milliseconds / 1000 / 3600
    // 60 seconds per minute (the original divided by 3600 here)
    val minutes = milliseconds / 1000 / 60
    val seconds = milliseconds / 1000
    val weeks = days.div(7)
    return mapOf("days" to days, "hours" to hours, "minutes" to minutes, "seconds" to seconds, "weeks" to weeks)
}
The answers above that use the java.time.* API are much cleaner and more accurate, though.

PowerBI end of month

I'm using monthly data and trying to display YoY% calculations.
However, I think my code is not robust to the different end-of-month dates caused by leap years.
Value YoY% 2 =
VAR START_DATE = DATEADD('DATA'[Date], -12, MONTH)
RETURN
DIVIDE(SUM(DATA[Value]), CALCULATE(SUM(DATA[Value]),START_DATE))-1
I'm very much a Power BI novice. Thank you for your help.
Try the following:
Value YoY% 2 =
VAR Curr_Year = YEAR(SELECTEDVALUE(DATA[Date]))
VAR Last_Year = Curr_Year - 1
RETURN
    DIVIDE(
        CALCULATE(SUM(DATA[Value]), FILTER(DATA, YEAR(DATA[Date]) = Curr_Year)),
        CALCULATE(SUM(DATA[Value]), FILTER(DATA, YEAR(DATA[Date]) = Last_Year))
    ) - 1
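The measure above compares whole calendar years, so month-end dates do not have to line up between a leap year and a normal year. A minimal sketch of the same arithmetic in Python, using made-up monthly figures and a hypothetical helper name yoy_change (neither comes from the original post):

```python
from collections import defaultdict
from datetime import date

# Made-up month-end values, purely for illustration.
rows = [
    (date(2019, 2, 28), 100.0),
    (date(2019, 3, 31), 100.0),
    (date(2020, 2, 29), 120.0),  # leap-year month end
    (date(2020, 3, 31), 130.0),
]


def yoy_change(rows, year):
    """Return total(year) / total(year - 1) - 1, or None without a prior total."""
    totals = defaultdict(float)
    for day, value in rows:
        totals[day.year] += value
    previous = totals.get(year - 1)
    if not previous:
        return None
    return totals[year] / previous - 1


print(yoy_change(rows, 2020))  # 0.25
```

Returning None when there is no prior-year total plays the role that DIVIDE plays in DAX, which yields a blank instead of an error when the denominator is zero or missing.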

Extra milliseconds from Calendar.getInstance.getTimeInMillis

So, I had an innocent little snippet like
import java.util.Calendar
import java.sql.Timestamp
val cal = Calendar.getInstance
cal.set(1968, Calendar.APRIL, 25, 0, 45, 0)
val time = cal.getTimeInMillis
new java.sql.Timestamp(time)
new java.sql.Date(time).formatted("%1$tY-%1$tm-%1$te %1$tH:%1$tM:%1$tS.%1$tL")
and it kept showing me non-zero millisecond values. I was trying to construct a SQL timestamp in a test to compare to the output of a function, so it was a bit of a problem. I couldn't set an exact time -- there's no Calendar.setMillis API.
Or you could use the more recent and up-to-date java.time library.
import java.time.LocalDateTime
import java.time.ZoneId
val time = LocalDateTime.of(1968, 4, 25, 0, 45, 0) //date/time of interest
  .atZone(ZoneId.systemDefault()) //this time zone
  .toInstant().toEpochMilli() //in milliseconds
new java.sql.Timestamp(time)
//res0: java.sql.Timestamp = 1968-04-25 00:45:00.0
new java.sql.Date(time).formatted("%1$tY-%1$tm-%1$te %1$tH:%1$tM:%1$tS.%1$tL")
//res1: String = 1968-04-25 00:45:00.000
Calendar.getInstance always contains the current time. Calling set on it afterwards overrides the year, month, day, hour, minute and second, but not the milliseconds. You have to zero those out yourself, for example with cal.set(Calendar.MILLISECOND, 0) or by truncating the value as follows.
import java.util.Calendar
import java.sql.Timestamp
val cal = Calendar.getInstance
cal.set(1968, Calendar.APRIL, 25, 0, 45, 0)
val millis = cal.getTimeInMillis
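// floorMod keeps the remainder non-negative, so a date before 1970 (negative millis) is cut down to its own second, not pushed to the next one
val time = millis - Math.floorMod(millis, 1000L)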
new java.sql.Timestamp(time)
new java.sql.Date(time).formatted("%1$tY-%1$tm-%1$te %1$tH:%1$tM:%1$tS.%1$tL")
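Truncating to whole seconds with a plain remainder needs care for dates before 1970, where the epoch value is negative: a remainder that takes the sign of the dividend (as the % operator does on the JVM) pushes the value to the next second instead of cutting the milliseconds off. A small Python sketch of the difference; the helper names and the sample epoch value are assumptions for illustration:

```python
def truncated_remainder(a: int, b: int) -> int:
    """Remainder with the sign of the dividend, as % behaves on the JVM."""
    r = abs(a) % b
    return r if a >= 0 else -r


def drop_millis_floor(millis: int) -> int:
    """Cut off the millisecond part using a floored remainder (Python's %)."""
    return millis - millis % 1000


def drop_millis_truncated(millis: int) -> int:
    """Same formula with a sign-following remainder; wrong for negatives."""
    return millis - truncated_remainder(millis, 1000)


# A made-up pre-1970 instant: a whole second plus 123 stray milliseconds.
whole_second = -53_219_700 * 1000
millis = whole_second + 123

print(drop_millis_floor(millis) == whole_second)             # True
print(drop_millis_truncated(millis) == whole_second + 1000)  # True
```

For instants after 1970 both variants agree, which is why the problem only shows up with historical dates such as the 1968 one in the question.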

How to get Min, Max and Length between dates for each year?

I have an RDD of type RDD[String]; here is part of it as an example:
1990,1990-07-08
1994,1994-06-18
1994,1994-06-18
1994,1994-06-22
1994,1994-06-22
1994,1994-06-26
1994,1994-06-26
1954,1954-06-20
2002,2002-06-26
1954,1954-06-23
2002,2002-06-29
1954,1954-06-16
2002,2002-06-30
...
result:
(1982,52)
(2006,64)
(1962,32)
(1966,32)
(1986,52)
(2002,64)
(1994,52)
(1974,38)
(1990,52)
(2010,64)
(1978,38)
(1954,26)
(2014,64)
(1958,35)
(1998,64)
(1970,32)
I group it nicely, but my problem is the v.size part: I do not know how to calculate that length.
Just to put it in perspective, here are the expected results:
It is not a mistake that 2002 appears twice, but ignore that.
Define the date format:
val formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd")
and an ordering:
implicit val localDateOrdering: Ordering[LocalDate] = Ordering.by(_.toEpochDay)
Create a function that receives v and returns MAX(date_of_matching_year) - MIN(date_of_matching_year) = LENGTH (in days):
def f(v: Iterable[Array[String]]): Int = {
  // each element is a split line; index 1 holds the date string
  val parsedDates = v.map(arr => LocalDate.parse(arr(1), formatter))
  parsedDates.max.getDayOfYear - parsedDates.min.getDayOfYear
}
Then replace v.size with f(v).
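The same idea as a short Python sketch, run on a few of the sample lines from the question. The function name length_per_year is an assumption; it groups the dates by year and returns the number of days between the earliest and the latest date in each group:

```python
from collections import defaultdict
from datetime import date

lines = [
    "1990,1990-07-08",
    "1994,1994-06-18",
    "1994,1994-06-22",
    "1994,1994-06-26",
    "1954,1954-06-20",
    "1954,1954-06-23",
    "1954,1954-06-16",
    "2002,2002-06-26",
    "2002,2002-06-30",
]


def length_per_year(lines):
    """Map each year to (max date - min date) in days."""
    dates_by_year = defaultdict(list)
    for line in lines:
        year, day = line.split(",")
        dates_by_year[int(year)].append(date.fromisoformat(day))
    return {year: (max(ds) - min(ds)).days for year, ds in dates_by_year.items()}


print(length_per_year(lines))
# {1990: 0, 1994: 8, 1954: 7, 2002: 4}
```

Subtracting the dates directly, rather than their day-of-year values, would also stay correct if a group ever spanned a year boundary.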

Get Every Tuesday of the month with Coldfusion

I'm currently working with jquery FullCalendar plugin to create a specific calendar.
One of my tasks I have to work out is how to get any given specific day for the month.
I'm currently using Coldfusion 10 for the server side so I'm wondering is there any specific way of getting every instance of a Tuesday into an array of dates?
Ideally I would like to do this on the server side and populate the calendar plugin.
My issue is primarily trying to source every specific day of a calendar month.
Any advice greatly appreciated.
The firstXDayOfMonth() UDF on CFLib allows you to find the first occurrence of a given day of the week in a given month. From there you just need to loop from that date, adding 7 days each iteration, until the month is no longer the selected month.
theMonth = month(now());
startDate = firstXDayOfMonth(3, theMonth, year(now()));
tuesdays = [];
for (date=startDate; month(date) == theMonth; date +=7){
    arrayAppend(tuesdays, dateAdd("s",0, date)); // this just converts date from a number back to a date
}
writeDump(tuesdays);
Update:
Actually the approach for that UDF on CFLib is terrible. Use this variation instead:
function firstXDayOfMonth(dayOfWeek,month,year){
    var firstOfMonth = createDate(year, month,1);
    var dowOfFirst = dayOfWeek(firstOfMonth);
    var daysToAdd = (7 - (dowOfFirst - dayOfWeek)) MOD 7;
    var dow = dateAdd("d", daysToAdd, firstOfMonth);
    return dow;
}
I'll update the UDF on CFLib a bit later: I need to write some decent unit tests for it first, and am a bit busy at the moment.
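The "find the first occurrence, then step by 7" approach can be sketched in Python as follows. The function name weekdays_in_month is an assumption, and note that Python numbers weekdays from Monday = 0 (so Tuesday is 1), unlike ColdFusion's Sunday = 1:

```python
import calendar
from datetime import date, timedelta


def weekdays_in_month(year: int, month: int, weekday: int) -> list:
    """All dates in the month that fall on weekday (Monday is 0)."""
    first = date(year, month, 1)
    # Days from the 1st to the first occurrence of the target weekday.
    offset = (weekday - first.weekday()) % 7
    days_in_month = calendar.monthrange(year, month)[1]
    return [first + timedelta(days=d) for d in range(offset, days_in_month, 7)]


tuesdays = weekdays_in_month(2019, 1, calendar.TUESDAY)
print([d.day for d in tuesdays])  # [1, 8, 15, 22, 29]
```

The modulo step does the same job as the daysToAdd calculation in the UDF above: it gives the number of days from the 1st of the month to the first matching weekday.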
The Short Version:
At this time, there is not a function in CF that gets all the Tuesdays. But here's an easy way to do it:
// assuming a year and month are defined already
var firstDayOfMonth = createDate( year, month, 1 );
var targetDayOfWeek = 3; // Tuesday is 3 if Sunday is 1
var dayOfWeekArray = []; // This is the outcome.
// loop through each day of the month adding the target days to the array.
for( i = 1; i LTE daysInMonth( firstDayOfMonth ); i++){
    var loopingDate = createDate( year, month, i );
    if( dayOfWeek( loopingDate ) == targetDayOfWeek ){
        ArrayAppend( dayOfWeekArray, loopingDate );
    }
}
dayOfWeekArray is an array of every Tuesday of a month.
More Detail:
Your title and post seem to conflict as far as what you're looking for, so I'm going to stick with the title, since that's why I came here...
Here's what you can do to find all the Tuesdays in a month:
1. Create a date object.
2. Loop through the days in the target month using the date object.
3. If the current day is a Tuesday, add it to an array.
4. Boom, you've got all the Tuesdays of the month in an array.
Here's the code I used (cfscript):
// assuming a year and month are defined already
var firstDayOfMonth = createDate( year, month, 1 );
var dayOfWeekArray = [];
var targetDayOfWeek = 3; // Tuesday is 3 if Sunday is 1. Do a quick writeDump in the loop if you're not sure.
for( i = 1; i LTE daysInMonth( firstDayOfMonth ); i++){
    var loopingDate = createDate( year, month, i );
    if( dayOfWeek( loopingDate ) == targetDayOfWeek ){
        ArrayAppend( dayOfWeekArray, datePart( "d", loopingDate ) );
        // ArrayAppend( dayOfWeekArray, loopingDate ); - use this if you'd rather have the whole date object
    }
}
This gives you dayOfWeekArray, which holds the day of the month of each Tuesday in a particular month. For instance, for this month (Jan 2019) it will be [1, 8, 15, 22, 29]. You can change this to hold the entire date object if you want, which is what I did in the short version at the top.