I have a date key such as 20170501, which is in yyyyMMdd format. How can I get the date x days before it in Scala?
This is what I have in my program:
val runDate = 20170501
Now I want the date that is, say, 30 days before this one.
Using Scala/JVM/Java 8...
scala> import java.time._
import java.time._
scala> import java.time.format._
import java.time.format._
scala> val formatter = DateTimeFormatter.ofPattern("yyyyMMdd")
formatter: java.time.format.DateTimeFormatter = Value(YearOfEra,4,19,EXCEEDS_PAD)Value(MonthOfYear,2)Value(DayOfMonth,2)
scala> val runDate = 20170501
runDate: Int = 20170501
scala> val runDay = LocalDate.parse(runDate.toString, formatter)
runDay: java.time.LocalDate = 2017-05-01
scala> val runDayMinus30 = runDay.minusDays(30)
runDayMinus30: java.time.LocalDate = 2017-04-01
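If the result is needed as an Int key again, the steps above can be wrapped in a small helper. This is only a minimal sketch; the name daysBack is hypothetical and not part of any library:

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Hypothetical helper: shifts a yyyyMMdd integer key back by `days` days
// and returns the result as a yyyyMMdd integer key.
def daysBack(dateKey: Int, days: Long): Int = {
  val fmt = DateTimeFormatter.ofPattern("yyyyMMdd")
  LocalDate.parse(dateKey.toString, fmt).minusDays(days).format(fmt).toInt
}

println(daysBack(20170501, 30)) // 20170401
```

Parsing to a LocalDate first means month lengths and leap years are handled by java.time rather than by integer arithmetic on the key.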
You can also use the Joda-Time API, which has convenient methods such as
date.minusMonths
date.minusYears
date.minusDays
date.minusHours
date.minusMinutes
Here is a simple example using the Joda-Time API:
import org.joda.time.format.DateTimeFormat
val dtf = DateTimeFormat.forPattern("yyyyMMdd")
val dt= "20170531"
val date = dtf.parseDateTime(dt)
println(date.minusDays(30))
Output:
2017-05-01T00:00:00.000+05:45
For this you need to use a UDF that creates a DateTime object from your input with the pattern "yyyyMMdd" and then does the date arithmetic on it.
Hope this helps!
I want to print data of employees who joined before 1991. Below is my sample data:
69062,FRANK,ANALYST,5646,1991-12-03,3100.00,,2001
63679,SANDRINE,CLERK,69062,1990-12-18,900.00,,2001
Initial RDD for loading data:
val rdd=sc.textFile("file:////home/hduser/Desktop/Employees/employees.txt").filter(p=>{p!=null && p.trim.length>0})
UDF for converting string column to date column:
def convertStringToDate(s: String): Date = {
val dateFormat = new SimpleDateFormat("yyyy-MM-dd")
dateFormat.parse(s)
}
Mapping each and every column to its datatype:
val dateRdd=rdd.map(_.split(",")).map(p=>(if(p(0).length >0 )p(0).toLong else 0L,p(1),p(2),if(p(3).length > 0)p(3).toLong else 0L,convertStringToDate(p(4)),if(p(5).length >0)p(5).toDouble else 0D,if(p(6).length > 0)p(6).toDouble else 0D,if(p(7).length> 0)p(7).toInt else 0))
Now I get data in tuples as below:
(69062,FRANK,ANALYST,5646,Tue Dec 03 00:00:00 IST 1991,3100.0,0.0,2001)
(63679,SANDRINE,CLERK,69062,Tue Dec 18 00:00:00 IST 1990,900.0,0.0,2001)
Now when I execute command I am getting below error:
scala> dateRdd.map(p=>(!(p._5.before("1991")))).foreach(println)
<console>:36: error: type mismatch;
found : String("1991")
required: java.util.Date
dateRdd.map(p=>(!(p._5.before("1991")))).foreach(println)
^
So where am I going wrong ???
Since you are working with RDDs rather than DataFrames, and your dates are plain yyyy-MM-dd strings, a simple string comparison is enough, because zero-padded ISO dates sort lexicographically in chronological order:
val rdd = sc.parallelize(Seq((69062,"FRANK","ANALYST",5646, "1991-12-03",3100.00,2001),(63679,"SANDRINE","CLERK",69062,"1990-12-18",900.00,2001)))
rdd.filter(p=>(p._5 < "1991-01-01")).foreach(println)
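To see why the string comparison is safe, here is a minimal sketch that uses a plain Scala collection in place of the RDD (the sample values are taken from the question, plus one boundary value added for illustration):

```scala
// Plain Scala collection standing in for the RDD; the predicate is the same.
// "1991-01-01" is an extra boundary value added for illustration.
val joined = Seq("1991-12-03", "1990-12-18", "1991-01-01")

// Zero-padded yyyy-MM-dd strings sort lexicographically in date order,
// so a plain string comparison is enough for "before 1991".
val before1991 = joined.filter(_ < "1991-01-01")

println(before1991) // List(1990-12-18)
```

This only holds while every date string is zero-padded and in year-month-day order; for any other layout, parse to a date type first.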
There is no need to convert the date with the legacy SimpleDateFormat; use java.time instead. Since the fifth column (index 4) is already in the ISO format that LocalDate.parse expects, you can simply filter the RDD like this:
val rdd=spark.sparkContext.textFile("in\\employees.txt").filter( x => {val y = x.split(","); java.time.LocalDate.parse(y(4)).isBefore(java.time.LocalDate.parse("1991-01-01")) } )
Running
rdd.collect.foreach(println)
gives the result below:
63679,SANDRINE,CLERK,69062,1990-12-18,900.00,,2001
Hope this answers your question.
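LocalDate.parse throws on a blank or malformed field, so if the file may contain such rows, the predicate can be guarded. A minimal sketch, shown on a plain Seq rather than an RDD; the helper name joinedBefore and the third sample row are made up for illustration:

```scala
import java.time.LocalDate
import scala.util.Try

val cutoff = LocalDate.parse("1991-01-01")

// Hypothetical helper: true only when the fifth field exists, parses as an
// ISO date, and is before the cutoff. Unparseable rows are simply skipped.
def joinedBefore(line: String, cutoff: LocalDate): Boolean = {
  val fields = line.split(",", -1) // -1 keeps trailing empty fields
  fields.length > 4 &&
    Try(LocalDate.parse(fields(4).trim)).toOption.exists(_.isBefore(cutoff))
}

val lines = Seq(
  "69062,FRANK,ANALYST,5646,1991-12-03,3100.00,,2001",
  "63679,SANDRINE,CLERK,69062,1990-12-18,900.00,,2001",
  "00000,NODATE,CLERK,69062,,900.00,,2001" // made-up row with a blank date
)

val kept = lines.filter(joinedBefore(_, cutoff))
kept.foreach(println)
```

The same function could be passed to an RDD filter, since it only depends on its arguments.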
EDIT1:
Using Java 7 and the SimpleDateFormat class:
import java.util.Date
import java.text.SimpleDateFormat
import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark._
import org.apache.spark.sql.types._
import org.apache.spark.sql._
object DTCheck {
  def main(args: Array[String]): Unit = {
    def convertStringToDate(s: String): Date = {
      val dateFormat = new SimpleDateFormat("yyyy-MM-dd")
      dateFormat.parse(s)
    }
    Logger.getLogger("org").setLevel(Level.ERROR)
    val spark = SparkSession.builder().appName("Employee < 1991").master("local[*]").getOrCreate()
    val sdf = new SimpleDateFormat("yyyy-MM-dd")
    val dt_1991 = sdf.parse("1991-01-01")
    import spark.implicits._
    val rdd = spark.sparkContext.textFile("in\\employees.txt").filter( x => { val y = x.split(","); convertStringToDate(y(4)).before(dt_1991) } )
    rdd.collect.foreach(println)
  }
}
How can I convert dd/mm/yyyy to yyyymmdd format, and also
dd/m/yyyy to yyyymmdd format, using Joda-Time in Scala?
I am using this dependency:
"joda-time" % "joda-time" % "2.9.9",
This answer already answers your question in Java, but here it is translated into Scala:
import org.joda.time.DateTime
import org.joda.time.format.{DateTimeFormat, DateTimeFormatter}
// define original date format
val originalFormat: DateTimeFormatter = DateTimeFormat.forPattern("dd/MM/yyyy")
val input: DateTime = originalFormat.parseDateTime("02/09/2017")
// define new format
val newFormat: DateTimeFormatter = DateTimeFormat.forPattern("yyyyMMdd")
val output: String = newFormat.print(input) // 20170902
This code already copes with a missing leading 0 in your date (i.e. it treats 02/9/2017 and 2/9/2017 as the same thing). It will not guess missing digits of the year though, so 2/9/17 is output as 00170902 instead of 20170902.
As the answer I linked to earlier mentions though, you can just use java.time to do the same thing:
import java.time.LocalDate
import java.time.format.DateTimeFormatter
val originalFormat = DateTimeFormatter.ofPattern("dd/MM/yyyy")
val input = LocalDate.parse("02/09/2017", originalFormat)
val newFormat = DateTimeFormatter.ofPattern("yyyyMMdd")
val output = input.format(newFormat)
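One difference to be aware of with java.time: the pattern letters "dd" and "MM" require exactly two digits when parsing, so an input with a single-digit day or month is rejected. A minimal sketch of the behaviour, using "d/M/yyyy" as one possible pattern that accepts both forms (the value names are illustrative):

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import scala.util.Try

val twoDigit = DateTimeFormatter.ofPattern("dd/MM/yyyy")
val flexible = DateTimeFormatter.ofPattern("d/M/yyyy")
val target   = DateTimeFormatter.ofPattern("yyyyMMdd")

// "dd" and "MM" need exactly two digits, so this parse fails with an exception.
val strictResult = Try(LocalDate.parse("2/9/2017", twoDigit))
println(strictResult.isFailure) // true

// "d" and "M" accept one or two digits.
val a = LocalDate.parse("2/9/2017", flexible).format(target)
val b = LocalDate.parse("02/09/2017", flexible).format(target)
println(a) // 20170902
println(b) // 20170902
```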
import org.joda.time.DateTime
import org.joda.time.format._
// Use MM for month-of-year; lowercase mm means minute-of-hour
val fmt = DateTimeFormat.forPattern("dd/MM/yyyy")
val dt = fmt.parseDateTime("02/02/2017")
val fmt2 = DateTimeFormat.forPattern("yyyyMMdd")
fmt2.print(dt)
I am creating a date in Scala.
val dafo = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm'Z'")
val tz = TimeZone.getTimeZone("UTC")
dafo.setTimeZone(tz)
val endTime = dafo.format(new Date())
How can I get yesterday's date instead of today's date?
Here is how you get yesterday's date/time and format it using java.time:
import java.time.{ZonedDateTime, ZoneId}
import java.time.format.DateTimeFormatter
val yesterday = ZonedDateTime.now(ZoneId.of("UTC")).minusDays(1)
val formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm'Z'")
val result = formatter format yesterday
println(result)
You can use Calendar:
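import java.text.SimpleDateFormat
import java.util.{Calendar, TimeZone}
val dafo = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm'Z'")
val tz = TimeZone.getTimeZone("UTC")
dafo.setTimeZone(tz)
val calendar = Calendar.getInstance()
// Step back one day from the current instant
calendar.add(Calendar.DATE, -1)
dafo.format(calendar.getTime)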
JSR-310 implementation:
import java.time.OffsetDateTime
import java.time.format.DateTimeFormatter
DateTimeFormatter.ISO_INSTANT.format(OffsetDateTime.now().minusDays(1L))
Here is another alternative. I think it's a cleaner version.
val today = java.time.LocalDate.now
today: java.time.LocalDate = 2019-12-10
val yesterday= java.time.LocalDate.now.minusDays(1)
yesterday: java.time.LocalDate = 2019-12-09
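Note that LocalDate.now without arguments uses the JVM's default time zone. If "yesterday" should be evaluated in UTC, as in the question, a Clock can be passed in. A minimal sketch; the helper name yesterdayAt is hypothetical:

```scala
import java.time.{Clock, Instant, LocalDate, ZoneOffset}

// Hypothetical helper: yesterday's date as seen by the given clock.
def yesterdayAt(clock: Clock): LocalDate = LocalDate.now(clock).minusDays(1)

// Evaluate "today" in UTC rather than in the JVM default zone.
println(yesterdayAt(Clock.systemUTC()))

// A fixed clock makes the result reproducible, e.g. in tests.
val fixed = Clock.fixed(Instant.parse("2019-12-10T00:30:00Z"), ZoneOffset.UTC)
println(yesterdayAt(fixed)) // 2019-12-09
```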
I want to get for example a string of the current time: "20180122_101043". How can I do this?
I can create a val cal = Calendar.getInstance() but I'm not sure what to do with it after.
LocalDateTime is what you might want to use:
scala> import java.time.LocalDateTime
import java.time.LocalDateTime
scala> LocalDateTime.now()
res60: java.time.LocalDateTime = 2018-01-22T01:21:03.048
If you don't want the default LocalDateTime format, which is basically ISO format without zone info, you can apply a DateTimeFormatter as below:
scala> import java.time.format.DateTimeFormatter
import java.time.format.DateTimeFormatter
scala> DateTimeFormatter.ofPattern("yyyy-MM-dd_HH:mm").format(LocalDateTime.now)
res61: String = 2018-01-22_01:21
Related resource - How to parse/format dates with LocalDateTime? (Java 8)
Calendar is not the best choice here. Use:
java.util.Date + java.text.SimpleDateFormat if you have Java 7 or below
new SimpleDateFormat("yyyyMMdd_HHmmss").format(new Date)
java.time.LocalDateTime + java.time.format.DateTimeFormatter for Java 8+
LocalDateTime.now.format(DateTimeFormatter.ofPattern("yyyyMMdd_HHmmss"))
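The case of the year letter matters in these patterns: lowercase y is the calendar year, while uppercase Y is the week-based year, and the two can differ in the days around New Year. A small sketch of the difference (Locale.US is passed here only to make the week rules explicit; the value names are illustrative):

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import java.util.Locale

val lastDayOf2018 = LocalDate.of(2018, 12, 31)

// 'y' is the calendar year; 'Y' is the week-based year, which follows
// the week rules of the formatter's locale.
val calendarYear  = DateTimeFormatter.ofPattern("yyyyMMdd", Locale.US).format(lastDayOf2018)
val weekBasedYear = DateTimeFormatter.ofPattern("YYYYMMdd", Locale.US).format(lastDayOf2018)

println(calendarYear)  // 20181231
println(weekBasedYear) // 20191231
```

For a timestamp string like 20180122_101043, lowercase yyyy is therefore the safer choice.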
You can make use of Java 8 Date/Time API:
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
val format = "yyyyMMdd_HHmmss"
val dtf = DateTimeFormatter.ofPattern(format)
val ldt = LocalDateTime.of(2018, 1, 22, 10, 10, 43) // 20180122_101043
ldt.format(dtf)
To get the current time, use LocalDateTime.now().
You may use the java.time library. For instance:
import java.time
val s: String = time.LocalDateTime.now().toString
println(s)
gives 2021-04-23T14:13:01.163869. Then you need to manipulate the string, e.g.
time.LocalDateTime.now().toString.replace("T","_").replace("-","").replace(":","")
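for 20210423_141639.769222. Note that the fractional seconds are kept, and that toString omits the seconds and the fraction when they are zero, so the result does not always have the same width.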
Suppose my date is 21/05/2017 (dd/MM/yyyy format); then the output should be SUN.
How can I get the day given a date?
import java.time.LocalDate
import java.time.format.DateTimeFormatter
val df = DateTimeFormatter.ofPattern("dd/MM/yyyy")
val dayOfWeek = LocalDate.parse("21/05/2017",df).getDayOfWeek
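getDayOfWeek returns a java.time.DayOfWeek enum value (SUNDAY here). To get the three-letter form asked for in the question, one option is getDisplayName. A minimal sketch, assuming English names are wanted:

```scala
import java.time.LocalDate
import java.time.format.{DateTimeFormatter, TextStyle}
import java.util.Locale

val df = DateTimeFormatter.ofPattern("dd/MM/yyyy")
val dayOfWeek = LocalDate.parse("21/05/2017", df).getDayOfWeek // SUNDAY

// Short English name, upper-cased to match the requested "SUN" style.
val shortName = dayOfWeek.getDisplayName(TextStyle.SHORT, Locale.ENGLISH).toUpperCase(Locale.ENGLISH)

println(shortName) // SUN
```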
You can use SimpleDateFormat as illustrated below:
import java.util.Calendar
import java.text.SimpleDateFormat
val now = Calendar.getInstance.getTime
val date = new SimpleDateFormat("yyyy-MM-dd")
date.format(now)
res1: String = 2017-05-20
val dowInt = new SimpleDateFormat("u")
dowInt.format(now)
res2: String = 6
val dowText = new SimpleDateFormat("E")
dowText.format(now)
res3: String = Sat
[UPDATE]
As per comments below, please note that SimpleDateFormat isn't thread-safe.
You can also use nscala-time, a Scala wrapper for Joda-Time, which is a date-time library with a ton of functionality.
import com.github.nscala_time.time.Imports._
val someDate = (new DateTime).withYear(2017)
.withMonthOfYear(5)
.withDayOfMonth(21)
someDate.getDayOfWeek
res7: Int = 7
Now you need to know the mapping between integer days of the week and names. Here, 7 corresponds to Sunday.
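If pulling in java.time is acceptable, its DayOfWeek enum can do this mapping, since it uses the same ISO numbering (1 = Monday through 7 = Sunday). This is a sketch that swaps in java.time for the lookup rather than using Joda-Time's own API:

```scala
import java.time.DayOfWeek

// Assumption: the integer uses ISO numbering (1 = Monday ... 7 = Sunday),
// which matches the value 7 shown above for a Sunday.
val dayName = DayOfWeek.of(7)

println(dayName) // SUNDAY
```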