Date conversion to epoch timestamp - Scala

I am looking to convert a date to the previous day at 12:00 AM (epoch time) in Spark.
val dateInformat = DateTimeFormatter.ofPattern("MM-dd-yyyy")
val batchpartitiondate = LocalDate.parse("10-14-2022", dateInformat)
batchpartitiondate: java.time.LocalDate = 2022-10-14
batchpartitiondate should be converted to epoch time (1665619200).
For example:
The input date in the spark-submit argument is 10-15-2022.
I need the output as epoch time (1665705600), i.e. GMT: Friday, October 14, 12:00:00 AM.
If I give the input as 10-14-2022, it should give the output as epoch time (1665619200), i.e. GMT: Thursday, October 13, 12:00:00 AM.

Does this achieve what you are looking to do?
import java.time.{LocalDate, ZoneId, ZoneOffset}
import java.time.format.DateTimeFormatter

val dateInFormat = DateTimeFormatter.ofPattern("MM-dd-yyyy")
val batchPartitionDate = LocalDate.parse("10-14-2022", dateInFormat)
val alteredDateTime = batchPartitionDate.minusDays(1).atStartOfDay()

// Current (system default) zone
{
  val zone = ZoneId.systemDefault()
  val instant = alteredDateTime.atZone(zone).toInstant
  val epochMillis = instant.toEpochMilli
  val epochSeconds = instant.getEpochSecond
}

// UTC, or you can specify the appropriate time zone instead of UTC
{
  val zone = ZoneOffset.UTC
  val instant = alteredDateTime.toInstant(zone)
  val epochMillis = instant.toEpochMilli
  val epochSeconds = instant.getEpochSecond // 1665619200 for the example above
}
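
For the spark-submit argument case, the same steps collapse into one expression. A minimal sketch (the helper name is mine, and it assumes the argument arrives as a plain MM-dd-yyyy string):

import java.time.{LocalDate, ZoneOffset}
import java.time.format.DateTimeFormatter

// Hypothetical helper: epoch seconds of the previous day's midnight UTC.
def toPrevDayEpochSeconds(arg: String): Long =
  LocalDate.parse(arg, DateTimeFormatter.ofPattern("MM-dd-yyyy"))
    .minusDays(1)
    .atStartOfDay(ZoneOffset.UTC)
    .toInstant
    .getEpochSecond

toPrevDayEpochSeconds("10-15-2022") // 1665705600
toPrevDayEpochSeconds("10-14-2022") // 1665619200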

Related

First and Last Date of Current and Previous Quarter For a Given Date in Scala

I'm writing code for automatic execution once a month, which will take the date from the computer as the start and end date for the execution.
import org.joda.time.{DateTime, DateTimeZone}
import org.joda.time.format.DateTimeFormat

val today = new DateTime().withZone(DateTimeZone.forID("Asia/Kolkata"))
val start = today.toString(DateTimeFormat.forPattern("yyyy-MM-dd"))
val end = today.toString(DateTimeFormat.forPattern("yyyy-MM-dd"))
if (start <= end) { // runs once per monthly execution
  Data(today, today)
}

def Data(start: DateTime, end: DateTime): Unit = {
  val start_temp = start
  val end_temp = end
  // First day of the previous month
  val start_temp_1 = start_temp.minusMonths(1).withDayOfMonth(1)
  val start_date_monthly = start_temp_1.toString(DateTimeFormat.forPattern("yyyy-MM-dd"))
  println("Last Month Start Date: " + start_date_monthly)
  // Last day of the previous month: first day of the current month minus one day
  val end_temp_1 = end_temp.withDayOfMonth(1)
  val end_temp_2 = end_temp_1.minusDays(1)
  val end_date_monthly = end_temp_2.toString(DateTimeFormat.forPattern("yyyy-MM-dd"))
  println("Last Month End Date: " + end_date_monthly)
}
The date in the variable today needs to be converted to the first and last date of:
1. The previous month from today
2. The current and previous quarter from today
I was able to do the first one as displayed in the Data function.
Suppose today gets the following date - 2022-01-27T14:25:26.374+05:30
The above function returns
Last Month Start Date: 2021-12-01
Last Month End Date: 2021-12-31
How can I achieve this for the second one, i.e. the first and last dates of the current and previous quarter?
Ex - today gets the following date - 2022-01-27T14:25:26.374+05:30
I need to return 2021-10-01 and 2021-12-31 for the previous quarter
And 2022-01-01 and 2022-03-31 for the current quarter.
Certain solutions suggest using Spark SQL & Dataframes but that isn't applicable in this situation.
Is there a direct way to do this, as done in the month case? Or is a UDF the only option here?
The following may do it. It would probably have to be a UDF if it needs to be applied to every row of the DataFrame.
import java.time.temporal.IsoFields
import java.time.temporal.IsoFields.QUARTER_OF_YEAR
import java.time.{LocalDate, YearMonth}

def printQuarterBeginAndEnd(localDate: LocalDate): Unit = {
  val currentQuarter = localDate.get(QUARTER_OF_YEAR)
  val currentYear = localDate.getYear
  // The quarter's first month is (quarter - 1) * 3 + 1; its last month is quarter * 3
  val startOfQuarter = YearMonth.of(currentYear, (currentQuarter - 1) * 3 + 1).atDay(1)
  val endOfQuarter = YearMonth.of(currentYear, currentQuarter * 3).atEndOfMonth()
  println(s"Start $startOfQuarter ends $endOfQuarter")
}
printQuarterBeginAndEnd(LocalDate.now())
printQuarterBeginAndEnd(LocalDate.now().minus(1, IsoFields.QUARTER_YEARS))
Prints the following:
Start 2022-01-01 ends 2022-03-31
Start 2021-10-01 ends 2021-12-31
And for an explicit date:
scala> printQuarterBeginAndEnd(LocalDate.parse("2021-01-01"))
Start 2021-01-01 ends 2021-03-31
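
Since the question's own code uses Joda-Time, the same quarter arithmetic can be expressed there too. A sketch under that assumption (the helper name is mine; the quarter is derived from the month number alone):

import org.joda.time.DateTime
import org.joda.time.format.DateTimeFormat

// The first month of the quarter containing `date` is ((month - 1) / 3) * 3 + 1.
def quarterBounds(date: DateTime): (String, String) = {
  val firstMonth = ((date.getMonthOfYear - 1) / 3) * 3 + 1
  val start = date.withDayOfMonth(1).withMonthOfYear(firstMonth)
  val end = start.plusMonths(3).minusDays(1) // last day of the quarter
  val fmt = DateTimeFormat.forPattern("yyyy-MM-dd")
  (start.toString(fmt), end.toString(fmt))
}

val today = new DateTime(2022, 1, 27, 0, 0)
quarterBounds(today)                // (2022-01-01,2022-03-31), current quarter
quarterBounds(today.minusMonths(3)) // (2021-10-01,2021-12-31), previous quarter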

Convert date to another format Scala Spark

I am reading a CSV that contains two date formats:
dd-MMM-yyyy hh:mm:ss -> 13-Dec-2019 17:10:00
dd/MM/yyyy hh:mm -> 11/02/2020 17:33
I am trying to transform all dates of the first format into the second, but I can't find a good solution. I am trying this:
val pr_date = readeve.withColumn("Date",
  when(to_date(col("Date"), "dd-MMM-yyyy hh:mm:ss").isNotNull,
    to_date(col("Date"), "dd/MM/yyyy hh:mm")))
pr_date.show(25)
And I get the entire Date column as null values.
I also tried this function:
def to_date_(col: Column,
             formats: Seq[String] = Seq("dd-MMM-yyyy hh:mm:ss", "dd/MM/yyyy hh:mm")) = {
  coalesce(formats.map(f => to_date(col, f)): _*)
}
val p2 = readeve.withColumn("Date", to_date_(readeve.col("Date"))).show(125)
And for the first date format I get nulls too.
What am I doing wrong? (I am new to Scala and Spark.)
Scala version: 2.11.7
Spark version: 2.4.3
Try the code below. Note that an hour like 17 needs HH (hour of day, 0-23), not hh (clock hour, 1-12). Also use to_timestamp instead of to_date, because you want to keep the time.
val pr_date = readeve.withColumn(
  "Date",
  coalesce(
    date_format(to_timestamp(col("Date"), "dd-MMM-yyyy HH:mm:ss"), "dd/MM/yyyy HH:mm"),
    date_format(to_timestamp(col("Date"), "dd/MM/yyyy HH:mm"), "dd/MM/yyyy HH:mm")
  )
)
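
For a quick check, a minimal runnable sketch with an in-memory DataFrame (the readeve name and single Date column mirror the question; an English-locale JVM is assumed for parsing the MMM month abbreviation):

import org.apache.spark.sql.functions._
import spark.implicits._ // spark is the SparkSession (available in spark-shell)

val readeve = Seq("13-Dec-2019 17:10:00", "11/02/2020 17:33").toDF("Date")
val pr_date = readeve.withColumn(
  "Date",
  coalesce(
    date_format(to_timestamp(col("Date"), "dd-MMM-yyyy HH:mm:ss"), "dd/MM/yyyy HH:mm"),
    date_format(to_timestamp(col("Date"), "dd/MM/yyyy HH:mm"), "dd/MM/yyyy HH:mm")
  )
)
pr_date.show(false)
// +----------------+
// |Date            |
// +----------------+
// |13/12/2019 17:10|
// |11/02/2020 17:33|
// +----------------+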

Convert timestamp column from UTC to EST in spark scala

I have a column in a Spark DataFrame of timestamp type with a date format like '2019-06-13T11:39:10.244Z'.
My goal is to convert this column into EST time (subtracting 4 hours) while keeping the same format.
I tried the from_utc_timestamp API, but it seems to convert the UTC time to my local timezone (+5:30) and add it to the timestamp, then subtract 4 hours from it. I tried Joda-Time, but for some reason it adds 33 days to the EST time.
input = 2019-06-13T11:39:10.244Z
Using the from_utc_timestamp API:
val tDf = df.withColumn("newTimeCol", to_utc_timestamp(col("timeCol"), "America/New_York"))
output = 2019-06-13T13:09:10.244Z+5:30
Using the Joda-Time package:
val coder: (String => String) = (arg: String) => {
  new DateTime(arg, DateTimeZone.UTC).minusHours(4).toString("yyyy-mm-dd'T'HH:mm:s.SS'Z'")
}
val sqlfunc = udf(coder)
val tDf = df.withColumn("newTime", sqlfunc(col("_c20")))
output = 2019-39-13T07:39:10.244Z
desired output = 2019-06-13T07:39:10.244Z
Kindly advise how I should proceed. Thanks in advance.
There is a typo in your format string when creating the output.
Your format string should be yyyy-MM-dd'T'HH:mm:ss.SSS'Z' but it is yyyy-mm-dd'T'HH:mm:s.SS'Z'.
mm is the format character for minutes, while MM is the format character for months (ss.SSS likewise gives zero-padded seconds and full milliseconds, matching the desired output). You can check all format characters in the Joda-Time DateTimeFormat documentation.
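
Applying that fix to the question's UDF, a sketch (df and _c20 are from the question; note that subtracting a fixed 4 hours encodes EDT, and proper EST/EDT handling would convert via the America/New_York zone instead):

import org.apache.spark.sql.functions.{col, udf}
import org.joda.time.{DateTime, DateTimeZone}

// MM for months (mm would be minutes); ss.SSS for zero-padded seconds
// and full milliseconds.
val coder: String => String = (arg: String) =>
  new DateTime(arg, DateTimeZone.UTC)
    .minusHours(4)
    .toString("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'")

val sqlfunc = udf(coder)
val tDf = df.withColumn("newTime", sqlfunc(col("_c20")))
// 2019-06-13T11:39:10.244Z -> 2019-06-13T07:39:10.244Z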

How to convert a DateTime with milliseconds into epoch time with milliseconds

I have data in a Hive table in the below format.
2019-11-21 18:19:15.817
I wrote a SQL query as below to convert the above column value into epoch format.
val newDF = spark.sql(f"""select TRIM(id) as ID, unix_timestamp(sig_ts) as SIG_TS from table""")
And I am getting the output column SIG_TS as 1574360296, which does not include milliseconds.
How to get the epoch timestamp of a date with milliseconds?
Simple way: create a UDF, since Spark's built-in unix_timestamp function truncates to seconds.
import java.sql.Timestamp
import org.apache.spark.sql.functions.{udf, unix_timestamp}
import spark.implicits._

val fullTimestampUDF = udf { t: Timestamp => t.getTime } // epoch milliseconds
val df = Seq("2019-11-21 18:19:15.817").toDF("sig_ts")
  .withColumn("sig_ts_ut", unix_timestamp($"sig_ts"))
  .withColumn("sig_ts_ut_long", fullTimestampUDF($"sig_ts"))
df.show(false)
+-----------------------+----------+--------------+
|sig_ts |sig_ts_ut |sig_ts_ut_long|
+-----------------------+----------+--------------+
|2019-11-21 18:19:15.817|1574356755|1574356755817 |
+-----------------------+----------+--------------+
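
A UDF-free alternative also works; a sketch relying on Spark casting a timestamp to double as fractional epoch seconds (round guards against floating-point error before the final truncation):

import org.apache.spark.sql.functions.{col, round}

// Cast to timestamp, then to double (epoch seconds with a fractional part),
// scale to milliseconds, round, and cast down to a long.
val withMillis = df.withColumn(
  "sig_ts_millis",
  round(col("sig_ts").cast("timestamp").cast("double") * 1000).cast("long")
)
withMillis.select("sig_ts", "sig_ts_millis").show(false)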

How to generate unique date ranges

I would like to generate unique date ranges between the current date and, say, 2050.
val start_date = "2017-03-21"
val end_date = "2050-03-21"
Not sure how I can create a function for it. Any input here, please. The difference between the start and end dates can be anything.
A unique date range means the function would never return a date range it has already returned.
I have this solution in mind (pseudocode):
var start_date = "2017-03-21"
var end_date = "2050-03-21"
while (start_date <= end_date) {
  end_date = start_date + 1
  return (start_date, end_date) // emit a one-day range
}
start_date = start_date + 1 // advance so no range repeats
We will use the java.time.LocalDate and temporal.ChronoUnit imports to achieve this:
scala> import java.time.LocalDate
import java.time.LocalDate
scala> import java.time.temporal.ChronoUnit
import java.time.temporal.ChronoUnit
scala> val startDate = LocalDate.parse("2017-03-21")
startDate: java.time.LocalDate = 2017-03-21
scala> val endDate = LocalDate.parse("2050-03-21")
endDate: java.time.LocalDate = 2050-03-21
scala> val dateAmount = 5
dateAmount: Int = 5
scala> val randomDates = List.fill(dateAmount) {
  val randomAmt = ChronoUnit.DAYS.between(startDate, endDate) * math.random() // a random number of days within the given limits
  startDate.plusDays(randomAmt.toInt) // a date offset by that random amount; will not go beyond endDate
}
randomDates: List[java.time.LocalDate] = List(2049-03-16, 2025-12-30, 2042-04-20, 2027-03-14, 2031-03-15)
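
One caveat: random dates can repeat, so the above does not by itself guarantee uniqueness. If the function must never return the same range twice, a minimal sketch (assuming one-day ranges, as in the question's pseudocode) is an iterator of consecutive ranges:

import java.time.LocalDate

// Lazily yields consecutive, never-repeating one-day ranges
// from startDate up to endDate.
val startDate = LocalDate.parse("2017-03-21")
val endDate = LocalDate.parse("2050-03-21")

val ranges: Iterator[(LocalDate, LocalDate)] =
  Iterator.iterate(startDate)(_.plusDays(1))
    .takeWhile(d => d.isBefore(endDate))
    .map(d => (d, d.plusDays(1)))

ranges.take(3).foreach(println)
// (2017-03-21,2017-03-22)
// (2017-03-22,2017-03-23)
// (2017-03-23,2017-03-24)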