Parsing a date in Scala

I'm trying to parse the date string 2020-10-20 19:36:00 using this Scala code.
val DATE_FORMAT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
val date = LocalDate.parse("2020-10-20 19:36:00", DATE_FORMAT).atStartOfDay(ZoneId.systemDefault())
However, when I println(date) I get this line: 2020-10-20T00:00+02:00[Europe/Stockholm]
which only includes the date, without hours, minutes, etc. What method should I use instead to obtain a ZonedDateTime object containing all the information in my date string?

I solved this issue using this piece of code.
val DATE_FORMAT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss").withZone(ZoneId.systemDefault())
val date = ZonedDateTime.parse("2020-10-20 19:36:00", DATE_FORMAT)
I attached the ZoneId to the formatter with withZone, making it possible to use the ZonedDateTime.parse(...) method.
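An equivalent approach, if you'd rather not attach the zone to the formatter, is to parse into a LocalDateTime first and zone it afterwards; a minimal sketch (not from the original answer):

```scala
import java.time.{LocalDateTime, ZoneId, ZonedDateTime}
import java.time.format.DateTimeFormatter

val formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
// Parse the zone-less string as a LocalDateTime, then attach the zone
val zoned: ZonedDateTime =
  LocalDateTime.parse("2020-10-20 19:36:00", formatter)
    .atZone(ZoneId.systemDefault())
```

Both versions produce a ZonedDateTime that keeps the time-of-day from the input string.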

Related

Get current date as string in databricks using scala

I want to get current date in Scala as a String. For example, today current date is 5th Jan. I want to store it as a new variable dynamically as below.
val currdate: String = "20220105"
When I use val currdate = Calendar.getInstance.getTime I don't get output in the desired format above.
This is how it's done using the contemporary java.time library.
import java.time.LocalDate
import java.time.format.DateTimeFormatter
val currdate: String =
LocalDate.now.format(DateTimeFormatter.ofPattern("yyyyMMdd"))
Older utilities like Calendar and SimpleDateFormat still work (mostly) but should be avoided.
Why do you need it as String?
For a Spark query you could use java.sql.Timestamp directly.
This is how you get it:
import java.sql.Timestamp
import java.text.SimpleDateFormat
import java.time.Instant
val now: Timestamp =
Timestamp.from(Instant.now())
If you really want a formatted String:
val asString =
new SimpleDateFormat("yyyyMMdd").format(now)
SimpleDateFormat is old and not thread-safe but should do the job.
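If thread-safety is a concern, the same Timestamp can be formatted with the immutable, thread-safe DateTimeFormatter instead of SimpleDateFormat; a minimal sketch:

```scala
import java.sql.Timestamp
import java.time.{Instant, ZoneId}
import java.time.format.DateTimeFormatter

val now: Timestamp = Timestamp.from(Instant.now())
// DateTimeFormatter is immutable and safe to share across threads
val fmt = DateTimeFormatter.ofPattern("yyyyMMdd")
val asString: String = now.toInstant.atZone(ZoneId.systemDefault()).format(fmt)
```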

Converting TZ timestamp string to a given format in UTC using spark and scala

I have a column called lastModified containing String values like the one below, representing time in GMT.
"2019-06-24T15:36:16.000Z"
I want to format this string to the format yyyy-MM-dd HH:mm:ss in Spark using Scala. To achieve this, I created a dataframe with a new column "ConvertedTS":
df.withColumn("ConvertedTS", date_format(to_utc_timestamp(to_timestamp(col("lastModified"), "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"), "America/New_York"), "yyyy-MM-dd HH:MM:SS").cast(StringType))
However, this gives incorrect time. The machine I am running this from is in the America/New_York timezone.
I am basically looking for formatting the result of below statement in yyyy-MM-dd HH:mm:ss
df.withColumn("LastModifiedTS", col("lastModified"))
One of the ways that is currently working for me is udf but as udfs are not recommended, I was looking for more of a direct expression that I can use.
import java.text.SimpleDateFormat
import java.util.TimeZone
import org.apache.spark.sql.functions.{col, udf}

val convertToTimestamp = (logTimestamp: String) => {
  println("logTimestamp: " + logTimestamp)
  var newDate = ""
  try {
    val sourceFormat = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSXXX")
    sourceFormat.setTimeZone(TimeZone.getTimeZone("GMT"))
    val convertedDate = sourceFormat.parse(logTimestamp)
    val destFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
    destFormat.setTimeZone(TimeZone.getTimeZone("GMT"))
    newDate = destFormat.format(convertedDate)
    println("newDate: " + newDate)
  } catch {
    case e: Exception => e.printStackTrace()
  }
  newDate
}
// register for SQL
EdlSparkObjects.sparkSession.sqlContext.udf.register("convertToTimestamp", convertToTimestamp)
// register for Scala
def convertToTimestampUDF = udf(convertToTimestamp)
df.withColumn("LastModifiedTS", convertToTimestampUDF(col("lastModified")))
Thanks for help and guidance.
You're almost there with your first withColumn attempt. It just contains an incorrect date format string, yyyy-MM-dd HH:MM:SS: MM means month-of-year and SS means fraction-of-second, whereas you want mm (minute) and ss (second). Also, cast(StringType) is unnecessary since date_format already returns a StringType column. Below is sample code with the corrected date format:
import org.apache.spark.sql.functions._
import spark.implicits._
val df = Seq(
(1, "2019-06-24T15:36:16.000Z"),
(2, "2019-07-13T16:25:27.000Z")
).toDF("id", "lastModified")
df.withColumn("ConvertedTS", date_format(to_utc_timestamp(to_timestamp(
  $"lastModified", "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"), "America/New_York"), "yyyy-MM-dd HH:mm:ss")
).show(false)
// +---+------------------------+-------------------+
// |id |lastModified |ConvertedTS |
// +---+------------------------+-------------------+
// |1 |2019-06-24T15:36:16.000Z|2019-06-24 19:36:16|
// |2 |2019-07-13T16:25:27.000Z|2019-07-13 20:25:27|
// +---+------------------------+-------------------+

java.lang.RuntimeException: Unsupported literal type class org.joda.time.DateTime

I am working on a project where I use a library that is new to me, although I have used it in other projects without any problems:
org.joda.time.DateTime
So I work with Scala, and run the project as a job on Databricks.
scalaVersion := "2.11.12"
The code, where the exception comes from - according to my investigation so far ^^ - is the following:
var lastEndTime = config.getState("some parameters")
val timespanStart: Long = lastEndTime // last query ending time
var timespanEnd: Long = (System.currentTimeMillis / 1000) - (60*840) // 14 hours ago
val start = new DateTime(timespanStart * 1000)
val end = new DateTime(timespanEnd * 1000)
val date = DateTime.now()
Where the getState() function returns 1483228800 as a Long value.
EDIT: I use the start and end dates in filtering while building a dataframe. I compare columns (timestamp type) with these values!
val df2 = df
  .where(col("column_name").isNotNull)
  .where(col("column_name") > start &&
    col("column_name") <= end)
The error I get:
ERROR Uncaught throwable from user code: java.lang.RuntimeException:
Unsupported literal type class org.joda.time.DateTime
2017-01-01T00:00:00.000Z
I am not sure I actually understand how and why this is an error, so every kind of help is more than welcome!! Thank you a lot in advance!!
This is a common problem when people start to work with Spark SQL. Spark SQL has its own types, and you need to work with them if you want to take advantage of the Dataframe API. In your example, you cannot compare a Dataframe column value obtained with a Spark SQL function like "col" against a DateTime object directly unless you use a UDF.
If you want to make your comparison using the Spark SQL functions, you can take a look at this post, where you can find the differences between using Dates and Timestamps with Spark Dataframes.
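If the column is actually a Spark TimestampType, a UDF-free fix (a sketch, not part of the original answer) is to convert the Joda instants to java.sql.Timestamp, which Spark SQL does accept as a literal:

```scala
import java.sql.Timestamp

// From a Joda DateTime you would take start.getMillis; here we use the raw
// epoch seconds from the question (1483228800 = 2017-01-01T00:00:00Z)
val startTs = new Timestamp(1483228800L * 1000)

// Spark SQL accepts java.sql.Timestamp literals, so the filter becomes
// (assuming the original dataframe and a TimestampType column):
// df.where(col("column_name") > lit(startTs) && col("column_name") <= lit(endTs))
```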
If you (for any reason) need to use Joda you will inevitably need to build your UDF:
import org.apache.spark.sql.DataFrame
import org.joda.time.DateTime
import org.joda.time.format.{DateTimeFormat, DateTimeFormatter}

object JodaFormater {
  val formatter: DateTimeFormatter = DateTimeFormat.forPattern("dd/MM/yyyy HH:mm:ss")
}

object testJoda {

  import org.apache.spark.sql.functions.{udf, col}
  import JodaFormater._

  // Note: the upper bound must compare against end, not start
  def your_joda_compare_udf = (start: DateTime) => (end: DateTime) => udf { str: String =>
    val dt: DateTime = formatter.parseDateTime(str)
    dt.isAfter(start.getMillis) && dt.isBefore(end.getMillis)
  }

  def main(args: Array[String]): Unit = {
    val start: DateTime = ???
    val end: DateTime = ???
    // Your dataframe with your date as StringType
    val df: DataFrame = ???
    df.where(your_joda_compare_udf(start)(end)(col("your_date")))
  }
}
Note that using this implementation implies some overhead (memory and GC) because of the conversion from StringType to a Joda DateTime object, so you should use the Spark SQL functions whenever you can. In some posts you can read that UDFs are black boxes because Spark cannot optimize their execution, but sometimes they help.

Converting String timestamp to Scala Timestamp with Timezone throws Unparseable exception

I am working with Timestamps in String format and trying to convert them to a Timestamp value, but it throws an exception.
Can anyone tell me what I am doing wrong here?
import java.sql.Timestamp
import java.text.SimpleDateFormat

val s = "2017-12-14T09:54:52.662-06:00"
val format = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'")
val ts = new Timestamp(format.parse(s).getTime)
Throws: java.text.ParseException: Unparseable date: "2017-12-14T09:54:52.662-06:00"
That's because you used the wrong pattern. Try printing format.format(new Date()) and you will see.
Right pattern:
val format = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSXXX")
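Alternatively, the modern java.time API parses ISO-8601 offsets out of the box, with no explicit pattern; a minimal sketch using OffsetDateTime (not part of the original answer):

```scala
import java.sql.Timestamp
import java.time.OffsetDateTime

// The default parser is ISO_OFFSET_DATE_TIME, which handles "-06:00" natively
val odt = OffsetDateTime.parse("2017-12-14T09:54:52.662-06:00")
val ts = Timestamp.from(odt.toInstant)
```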

Date conversion

I have a date variable
var date: Date = new Date()
then I have converted this date to String:
var dateStr = date.toString()
Now I need to convert this String back to a Date.
I have tried both:
1:
var stringToDate: Date = dateStr.asInstanceOf[Date]
and 2:
var stringToDate: Date = new SimpleDateFormat("dd.MM.yyyy").parse(dateStr)
But in both case I got the error:
java.lang.ClassCastException:
java.lang.String cannot be cast to java.util.Date
I see a couple of problems in your code, but this works fine:
scala> val format = new java.text.SimpleDateFormat("dd-MM-yyyy")
format: java.text.SimpleDateFormat = java.text.SimpleDateFormat@9586200
scala> format.format(new java.util.Date())
res4: java.lang.String = 21-03-2011
scala> format.parse("21-03-2011")
res5: java.util.Date = Mon Mar 21 00:00:00 CET 2011
Starting Scala 2.11, targeting Java 8, the java.time Date Time API can be used:
import java.time.LocalDate
import java.time.format.DateTimeFormatter
val dtf = DateTimeFormatter.ofPattern("dd-MM-yyyy")
LocalDate.now().format(dtf) // "06-07-2018"
LocalDate.parse("06-07-2018", dtf) // java.time.LocalDate = 2018-07-06
Note that:
This is part of the standard library (no need for third party dependencies)
This is meant to replace the old java.util.Date/SimpleDateFormat api.
This is also supposed to replace the widely used joda-time library:
Note that from Java SE 8 onwards, users are asked to migrate to java.time (JSR-310) - a core part of the JDK which replaces this project.
And by association nscala-time which is a wrapper around joda-time.
Your first try gives a ClassCastException because you cannot cast a String to a Date. The second try does not use the format that Date.toString() prints: the toString method of java.util.Date returns a String in the format specified in its javadoc, EEE MMM dd HH:mm:ss zzz yyyy.
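If you really must round-trip through Date.toString(), that javadoc format can be parsed back with a matching pattern; a sketch (note Locale.ENGLISH for the day/month names, and that toString drops the milliseconds):

```scala
import java.text.SimpleDateFormat
import java.util.{Date, Locale}

val d = new Date()
val s = d.toString // e.g. "Mon Mar 21 16:01:19 CET 2011"
// Pattern matching java.util.Date.toString(); Locale.ENGLISH for EEE/MMM names
val fmt = new SimpleDateFormat("EEE MMM dd HH:mm:ss zzz yyyy", Locale.ENGLISH)
val back: Date = fmt.parse(s)
```

In practice, though, storing the original Date (or an ISO string) is far more robust than parsing toString output.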
Using nscala-time, the following worked for me:
import com.github.nscala_time.time._
import com.github.nscala_time.time.Imports._
val yesterday = (DateTime.now - 1.days).toString(StaticDateTimeFormat.forPattern("yyyyMMdd"))