Iterate over dates range (the scala way) - scala

Given a start and an end date I would like to iterate on it by day using a foreach, map or similar function. Something like
(DateTime.now to DateTime.now + 5.day by 1.day).foreach(println)
I am using https://github.com/nscala-time/nscala-time, but I get returned a joda Interval object if I use the syntax above, which I suspect is also not a range of dates, but a sort of range of milliseconds.
EDIT: The question is obsolete. As advised on the joda homepage, if you are using java 8 you should start with or migrate to java.time.

You may use plusDays:
val now = DateTime.now
(0 until 5).map(now.plusDays(_)).foreach(println)
Given start and end dates:
import org.joda.time.Days
val start = DateTime.now.minusDays(5)
val end = DateTime.now.plusDays(5)
val daysCount = Days.daysBetween(start, end).getDays()
(0 until daysCount).map(start.plusDays(_)).foreach(println)

For just iterating by day, I do:
Iterator.iterate(start) { _ + 1.day }.takeWhile(_.isBefore(end))
This has proven to be useful enough that I have a small helper object to provide an implicit and allow for a type transformation:
object IntervalIterators {
implicit class ImplicitIterator(val interval: Interval) extends AnyVal {
def iterateBy(step: Period): Iterator[DateTime] = Iterator.iterate(interval.start) { _ + step }
.takeWhile(_.isBefore(interval.end))
def iterateBy[A](step: Period, transform: DateTime => A): Iterator[A] = iterateBy(step).map(transform)
def iterateByDay: Iterator[LocalDate] = iterateBy(1.day, { _.toLocalDate })
def iterateByHour: Iterator[DateTime] = iterateBy(1.hour)
}
}
Sample usage:
import IntervalIterators._
(DateTime.now to 5.day.from(DateTime.now)).iterateByDay // Iterator[LocalDate]
(30.minutes.ago to 1.hour.from(DateTime.now)).iterateBy(1.second) // Iterator[DateTime], broken down by second

Solution with java.time API using Scala
Necessary import and initialization
import java.time.temporal.ChronoUnit
import java.time.temporal.ChronoField.EPOCH_DAY
import java.time.{LocalDate, Period}
val now = LocalDate.now
val daysTill = 5
Create List of LocalDate for sample duration
(0 to daysTill)
.map(days => now.plusDays(days))
.foreach(println)
Iterate over specific dates between start and end using toEpochDay or getLong(ChronoField.EPOCH_DAY)
//Extract the duration
val endDay = now.plusDays(daysTill)
val startDay = now
val duration = endDay.getLong(EPOCH_DAY) - startDay.getLong(EPOCH_DAY)
/* This code does not give desired results as trudolf pointed
val duration = Period
.between(now, now.plusDays(daysTill))
.get(ChronoUnit.DAYS)
*/
//Create list for the duration
(0 to duration)
.map(days => now.plusDays(days))
.foreach(println)

This answer fixes the issue of mrsrinivas answer, that .get(ChronoUnits.DAYS) returns only the days part of the duration, and not the total number of days.
Necessary import and initialization
import java.time.temporal.ChronoUnit
import java.time.{LocalDate, Period}
Note how above answer would lead to wrong result (total number of days is 117)
scala> Period.between(start, end)
res6: java.time.Period = P3M26D
scala> Period.between(start, end).get(ChronoUnit.DAYS)
res7: Long = 26
Iterate over specific dates between start and end
val start = LocalDate.of(2018, 1, 5)
val end = LocalDate.of(2018, 5, 1)
// Create List of `LocalDate` for the period between start and end date
val dates: IndexedSeq[LocalDate] = (0L to (end.toEpochDay - start.toEpochDay))
.map(days => start.plusDays(days))
dates.foreach(println)

you can use something like that:
object Test extends App {
private val startDate: DateTime = DateTime.now()
private val endDate: DateTime = DateTime.now().plusDays(5)
private val interval: Interval = new Interval(startDate, endDate)
Stream.from(0,1)
.takeWhile(index => interval.contains(startDate.plusDays(index)))
.foreach(index => println(startDate.plusDays(index)))
}

In this case, the Scala way is the Java way:
When running Scala on Java 9+, we can use java.time.LocalDate::datesUntil:
import java.time.LocalDate
import collection.JavaConverters._
// val start = LocalDate.of(2019, 1, 29)
// val end = LocalDate.of(2018, 2, 2)
start.datesUntil(end).iterator.asScala
// Iterator[java.time.LocalDate] = <iterator> (2019-01-29, 2019-01-30, 2019-01-31, 2019-02-01)
And if the last date is to be included:
start.datesUntil(end.plusDays(1)).iterator.asScala
// 2019-01-29, 2019-01-30, 2019-01-31, 2019-02-01, 2019-02-02

import java.util.{Calendar, Date}
import scala.annotation.tailrec
/** Gets date list between two dates
*
* #param startDate Start date
* #param endDate End date
* #return List of dates from startDate to endDate
*/
def getDateRange(startDate: Date, endDate: Date): List[Date] = {
#tailrec
def addDate(acc: List[Date], startDate: Date, endDate: Date): List[Date] = {
if (startDate.after(endDate)) acc
else addDate(endDate :: acc, startDate, addDays(endDate, -1))
}
addDate(List(), startDate, endDate)
}
/** Adds a date offset to the given date
*
* #param date ==> Date
* #param amount ==> Offset (can be negative)
* #return ==> New date
*/
def addDays(date: Date, amount: Int): Date = {
val cal = Calendar.getInstance()
cal.setTime(date)
cal.add(Calendar.DATE, amount)
cal.getTime
}

Related

Spark scala UDF in DataFrames is not working

I have defined a function to convert Epoch time to CET and using that function after wrapping as UDF in Spark dataFrame. It is throwing error and not allowing me to use it. Please find below my code.
Function used to convert Epoch time to CET:
import java.text.SimpleDateFormat
import java.util.{Calendar, Date, TimeZone}
import java.util.concurrent.TimeUnit
def convertNanoEpochToDateTime(
d: Long,
f: String = "dd/MM/yyyy HH:mm:ss.SSS",
z: String = "CET",
msPrecision: Int = 9
): String = {
val sdf = new SimpleDateFormat(f)
sdf.setTimeZone(TimeZone.getTimeZone(z))
val date = new Date((d / Math.pow(10, 9).toLong) * 1000L)
val stringTime = sdf.format(date)
if (f.contains(".S")) {
val lng = d.toString.length
val milliSecondsStr = d.toString.substring(lng-9,lng)
stringTime.substring(0, stringTime.lastIndexOf(".") + 1) + milliSecondsStr.substring(0,msPrecision)
}
else stringTime
}
val epochToDateTime = udf(convertNanoEpochToDateTime _)
Below given Spark DataFrame uses the above defined UDF for converting Epoch time to CET
val df2 = df1.select($"messageID",$"messageIndex",epochToDateTime($"messageTimestamp").as("messageTimestamp"))
I am getting the below shown error, when I run the code
Any idea how am I supposed to proceed in this scenario ?
The spark optimizer execution tells you that your function is not a Function1, that means that it is not a function that accepts one parameter. You have a function with four input parameters. And, although you may think that in Scala you are allowed to call that function with only one parameter because you have default values for the other three, it seems that Catalyst does not work in this way, so you will need to change the definition of your function to something like:
def convertNanoEpochToDateTime(
f: String = "dd/MM/yyyy HH:mm:ss.SSS"
)(z: String = "CET")(msPrecision: Int = 9)(d: Long): String
or
def convertNanoEpochToDateTime(f: String)(z: String)(msPrecision: Int)(d: Long): String
and put the default values in the udf creation:
val epochToDateTime = udf(
convertNanoEpochToDateTime("dd/MM/yyyy HH:mm:ss.SSS")("CET")(9) _
)
and try to define the SimpleDateFormat as a static transient value out of the function.
I found why the error is due to and resolved it. The problem is when I wrap the scala function as UDF, its expecting 4 parameters, but I was passing only one parameter. Now, I removed 3 parameters from the function and took those values inside the function itself, since they are constant values. Now in Spark Dataframe, I am calling the function with only 1 parameter and it works perfectly fine.
import java.text.SimpleDateFormat
import java.util.{Calendar, Date, TimeZone}
import java.util.concurrent.TimeUnit
def convertNanoEpochToDateTime(
d: Long
): String = {
val f: String = "dd/MM/yyyy HH:mm:ss.SSS"
val z: String = "CET"
val msPrecision: Int = 9
val sdf = new SimpleDateFormat(f)
sdf.setTimeZone(TimeZone.getTimeZone(z))
val date = new Date((d / Math.pow(10, 9).toLong) * 1000L)
val stringTime = sdf.format(date)
if (f.contains(".S")) {
val lng = d.toString.length
val milliSecondsStr = d.toString.substring(lng-9,lng)
stringTime.substring(0, stringTime.lastIndexOf(".") + 1) + milliSecondsStr.substring(0,msPrecision)
}
else stringTime
}
val epochToDateTime = udf(convertNanoEpochToDateTime _)
import spark.implicits._
val df1 = List(1659962673251388155L,1659962673251388155L,1659962673251388155L,1659962673251388155L).toDF("epochTime")
val df2 = df1.select(epochToDateTime($"epochTime"))

Populating a array of date tuples

i am trying to pass a list of date ranges needs to be in the below format.
val predicates =
Array(
“2021-05-16” → “2021-05-17”,
“2021-05-18” → “2021-05-19”,
“2021-05-20” → “2021-05-21”)
I am then using map to create a range of conditions that will be passed to the jdbc method.
val predicates =
Array(
“2021-05-16” → “2021-05-17”,
“2021-05-18” → “2021-05-19”,
“2021-05-20” → “2021-05-21”
).map { case (start, end) =>
s"cast(NEW_DT as date) >= date ‘$start’ AND cast(NEW_DT as date) <= date ‘$end’"
}
The process will need to run daily and i need to dynamically populate these values as i cannot use the hard coded way.
Need help in how i can return these values from a method with incrementing start_date and end_date tuples that can generate like above.I had a mere idea like below but as i am new to scala not able to figure out. Please help
def predicateRange(start_date: String, end_date: String): Array[(String,String)] = {
// iterate over the date values and add + 1 to both start and end and return the tuple
}
This assumes that every range is the same duration, and that each date range starts the next day after the end of the previous range.
import java.time.LocalDate
import java.time.format.DateTimeFormatter
def dateRanges(start: String
,rangeLen: Int
,ranges: Int): Array[(String,String)] = {
val startDate =
LocalDate.parse(start, DateTimeFormatter.ofPattern("yyyy-MM-dd"))
Array.iterate(startDate -> startDate.plusDays(rangeLen), ranges){
case (_, end) => end.plusDays(1) -> end.plusDays(rangeLen+1)
}.map{case (s,e) => (s.toString, e.toString)}
}
usage:
dateRanges("2021-05-16", 1, 3)
//res0: Array[(String, String)] = Array((2021-05-16,2021-05-17), (2021-05-18,2021-05-19), (2021-05-20,2021-05-21))
You can use following method to generate your tuple array,
import java.time.LocalDate
import java.time.format.DateTimeFormatter
def generateArray3(startDateString: String, endDateString: String): Array[(String, String)] = {
val dateFormatter = DateTimeFormatter.ISO_LOCAL_DATE
val startDate = LocalDate.parse(startDateString)
val endDate = LocalDate.parse(endDateString)
val daysCount = startDate.until(endDate).getDays
val dateStringTuples = Array.tabulate(daysCount)(i => {
val firstDate = startDate.plusDays(i)
val secondDate = startDate.plusDays(i + 1)
(dateFormatter.format(firstDate), dateFormatter.format(secondDate))
})
dateStringTuples
}
Usage:
println("--------------------------")
generateArray("2021-02-27", "2021-03-02").foreach(println)
println("--------------------------")
generateArray("2021-05-27", "2021-06-02").foreach(println)
println("--------------------------")
generateArray("2021-12-27", "2022-01-02").foreach(println)
println("--------------------------")
output :
--------------------------
(2021-02-27,2021-02-28)
(2021-02-28,2021-03-01)
(2021-03-01,2021-03-02)
--------------------------
(2021-05-27,2021-05-28)
(2021-05-28,2021-05-29)
(2021-05-29,2021-05-30)
(2021-05-30,2021-05-31)
(2021-05-31,2021-06-01)
(2021-06-01,2021-06-02)
--------------------------
(2021-12-27,2021-12-28)
(2021-12-28,2021-12-29)
(2021-12-29,2021-12-30)
(2021-12-30,2021-12-31)
(2021-12-31,2022-01-01)
(2022-01-01,2022-01-02)
--------------------------

Retrieve days from System.currentTimeMillis() - bug

I tried to retrieve a certain number of days from the current date; it works until 24 days but for 25 days it add days instead of retrieving ...
Does anyone of you understand why?
import java.sql.Timestamp
val retrieve_days: ((Int) => java.sql.Timestamp) = (nb_days: Int) => {
new java.sql.Timestamp(System.currentTimeMillis()- nb_days*24*60*60*1000)
}
val today = retrieve_days(0)
// today: java.sql.Timestamp = 2019-03-01 15:43:41.418
val minus1d = retrieve_days(1)
// minus1d: java.sql.Timestamp = 2019-02-28 15:43:41.958
val minus10d = retrieve_days(10)
// minus10d: java.sql.Timestamp = 2019-02-19 15:43:42.502
val minus20d = retrieve_days(20)
// minus20d: java.sql.Timestamp = 2019-02-09 15:43:42.895
val minus23d = retrieve_days(23)
//minus23d: java.sql.Timestamp = 2019-02-06 15:43:43.245
val minus24d = retrieve_days(24)
//minus24d: java.sql.Timestamp = 2019-02-05 15:43:43.577
val minus25d = retrieve_days(25)
//minus25d: java.sql.Timestamp = 2019-03-26 08:46:31.817
Thank you!
The calculation for 25 days is too large for the Int type:
25 * 24 * 60 * 60 * 1000
// mathematical value is 2160000000
Int.MaxValue
// 2147483647
// we can see the calculation is larger, this is why it becomes negative, therefore *adding* more time.
What you can do to remedy this is change it to a Long to get the appropriate calculation, this can be done by adding a .toLong at the appropriate place:
val retrieve_days: Int => java.sql.Timestamp = (nb_days: Int) => {
new java.sql.Timestamp(System.currentTimeMillis()- nb_days.toLong*24*60*60*1000)
}
val minus25d = retrieve_days(25)
// minus25d: java.sql.Timestamp = 2019-02-04 10:09:48.613

How to calculate 'n' days interval date in functional style?

I am trying to calculae the interval of 'n' days from start date to end date.Function signature would have start_date,end_date, interval as argument which return a map with list of start, end days of given intervals.
Example: start_date:2018-01-01 , End_date : 2018-02-20 interval: 20
Expected Output:
2018-01-01 , 2018-01-20 (20 days)
2018-01-21 , 2018-02-09 (20 days)
2018-02-09 , 2018-01-20 (remaining)
I tried to write in scala but i dont feel it's a proper functional style of writing.
case class DateContainer(period: String, range: (LocalDate, LocalDate))
def generateDates(startDate: String, endDate: String,interval:Int): Unit = {
import java.time._
var lstDDateContainer = List[DateContainer]()
var start = LocalDate.parse(startDate)
val end = LocalDate.parse(endDate)
import java.time.temporal._
var futureMonth = ChronoUnit.DAYS.addTo(start, interval)
var i = 1
while (end.isAfter(futureMonth)) {
lstDDateContainer = DateContainer("P" + i, (start, futureMonth)):: lstDDateContainer
start=futureMonth
futureMonth = ChronoUnit.DAYS.addTo(futureMonth, interval)
i += 1
}
lstDDateContainer= DateContainer("P" + i, (start, end))::lstDDateContainer
lstDDateContainer.foreach(println)
}
generateDates("2018-01-01", "2018-02-20",20)
Could anyone help me to write in a functional style.
I offer a solution that produces a slightly different result than given in the question but could be easily modified to get the desired answer:
//Preliminaries
val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd")
val startDate ="2018-01-01"
val endDate = "2018-02-21"
val interval = 20L
val d1 = LocalDate.parse(startDate, fmt)
val d2 = LocalDate.parse(endDate, fmt)
//The main code
Stream.continually(interval)
.scanLeft((d1, d1.minusDays(1), interval)) ((x,y) => {
val finDate = x._2.plusDays(y)
if(finDate.isAfter(d2))
(x._2.plusDays(1), d2, ChronoUnit.DAYS.between(x._2, d2))
else
(x._2.plusDays(1), x._2.plusDays(y), y)
}).takeWhile(d => d._3 > 0).drop(1).toList
Result:
(2018-01-01,2018-01-20,20)
(2018-01-21,2018-02-09,20)
(2018-02-10,2018-02-21,12)
The idea is to scan a 3-tuple through a stream of interval and stop when no more days are remaining.
Use the java.time library to generate the dates and Stream.iterate() to generate the sequence of intervals.
import java.time.LocalDate
def generateDates( startDate :LocalDate
, endDate :LocalDate
, dayInterval :Int ) :Unit = {
val intervals =
Stream.iterate((startDate, startDate plusDays dayInterval-1)){
case (_,lastDate) =>
val nextDate = lastDate plusDays dayInterval
(lastDate plusDays 1, if (nextDate isAfter endDate) endDate
else nextDate)
}.takeWhile(_._1 isBefore endDate)
println(intervals.mkString("\n"))
}
usage:
generateDates(LocalDate.parse("2018-01-01"), LocalDate.parse("2018-02-20"), 20)
// (2018-01-01,2018-01-20)
// (2018-01-21,2018-02-09)
// (2018-02-10,2018-02-20)
Something like (Untested):
def dates(startDate: LocalDate, endDate: LocalDate, dayInterval: Int): List[(LocalDate, LocalDate, Int)] = {
if(startDate.isAfter(endDate)) {
Nil
}
else {
val nextStart = startDate.plusDays(dayInterval)
if(nextStart.isAfter(startDate)) {
List((startDate, endDate, ChronoUnit.DAYS.between(startDate, endDate)))
}
else {
(startDate, nextStart, dayInterval) :: dates(nextStart, endDate, dayInterval)
}
}
}
If your'e open to using Joda for date-time manipulations, here's what I use
import org.joda.time.{DateTime, Days}
// given from & to dates, find no of days elapsed in between (integer)
def getDaysInBetween(from: DateTime, to: DateTime): Int = Days.daysBetween(from, to).getDays
def getDateSegments(from: DateTime, to: DateTime, interval: Int): Seq[(DateTime, DateTime)] = {
// no of days between from & to dates
val days: Int = DateTimeUtils.getDaysInBetween(from, to) + 1
// no of segments (date ranges) between to & from dates
val segments: Int = days / interval
// (remaining) no of days in last range
val remainder: Int = days % interval
// last date-range
val remainderRanges: Seq[(DateTime, DateTime)] =
if (remainder != 0) from -> from.plusDays(remainder - 1) :: Nil
else Nil
// all (remaining) date-ranges + last date-range
(0 until segments).map { segment: Int =>
to.minusDays(segment * interval + interval - 1) -> to.minusDays(segment * interval)
} ++ remainderRanges
}

How to automate the creation of String elements with datetime

This thread arises from my previous question. I need to create Seq[String] that contains paths as String elements, however now I also need to add numbers 7,8,...-22 after a date. Also I cannot use LocalDate as it was suggested in the answer to the above-cited question:
path/file_2017-May-1-7
path/file_2017-May-1-8
...
path/file_2017-May-1-22
path/file_2017-April-30-7
path/file_2017-April-30-8
...
path/file_2017-April-30-22
..
I am searching for a flexible solution. My current solution implies the manual definition of dates yyyy-MMM-dd. However it is not efficient if I need to include more than 2 dates, e.g. 10 or 100. Moreover filePathsList is currently Set[Seq[String]] and I don't know how to convert it into Seq[String].
val formatter = new SimpleDateFormat("yyyy-MMM-dd")
val currDay = Calendar.getInstance
currDay.add(Calendar.DATE, -1)
val day_1_ago = currDay.getTime
currDay.add(Calendar.DATE, -1)
val day_2_ago = currDay.getTime
val dates = Set(formatter.format(day_1_ago),formatter.format(day_2_ago))
val filePathsList = dates.map(date => {
val list: Seq.empty[String]
for (num <- 7 to 22) {
list :+ s"path/file_$date-$num" + "
}
list
})
Here is how I was able to achieve what you outlined, adjust the days val to configure the amount of days you care about:
import java.text.SimpleDateFormat
import java.util.Calendar
val currDay = Calendar.getInstance
val days = 5
val dates = currDay.getTime +: List.fill(days){
currDay.add(Calendar.DATE, -1)
currDay.getTime
}
val formatter = new SimpleDateFormat("yyyy-MMM-dd")
val filePathsList = for {
date <- dates
num <- 7 to 22
} yield s"path/file_${formatter.format(date)}-$num"