I have an Excel reader that puts its results into Spark DataFrames, and I am having problems parsing the timestamps.
My timestamps are strings like Wed Dec 08 10:49:59 CET 2021. I was using spark-sql 2.4.5 and everything worked fine until I recently upgraded to 3.1.2.
Here is some minimal code:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, to_timestamp}
val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._ // needed for Seq(...).toDF
val ts: String = "Wed Dec 08 20:49:59 CET 2021"
val oldfmt: String = "E MMM dd HH:mm:ss z yyyy"
val ttdf = Seq(ts)
  .toDF("theTimestampColumn")
  .withColumn("parsedTime", to_timestamp(col("theTimestampColumn"), fmt = oldfmt))
ttdf.show()
Running this code with Spark 2.4.5 works as expected and produces the following output:
+--------------------+-------------------+
| theTimestampColumn| parsedTime|
+--------------------+-------------------+
|Wed Dec 08 20:49:...|2021-12-08 20:49:59|
+--------------------+-------------------+
Executing the same code with Spark 3.1.2 instead results in the following error:
Exception in thread "main" org.apache.spark.SparkUpgradeException:
You may get a different result due to the upgrading of Spark 3.0:
Fail to recognize 'E MMM dd HH:mm:ss z yyyy' pattern in the DateTimeFormatter.
1) You can set spark.sql.legacy.timeParserPolicy to LEGACY to restore the behavior before Spark 3.0.
2) You can form a valid datetime pattern with the guide from https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html
That page doesn't help me, and I can't find any mistake in my format string.
The symbol E represents the day-of-week as text, like Tue or Tuesday.
The symbol M represents the month-of-year, like 7, 07, Jul or July. The symbols H, m, s and y are hour, minute, second and year, respectively. The symbol z denotes the time-zone name, like Pacific Standard Time or PST.
Am I missing something obvious here?
Any help will be really appreciated. Thank you in advance.
You can use E only for datetime formatting, not for parsing, as stated in the datetime pattern documentation:
Symbols of ‘E’, ‘F’, ‘q’ and ‘Q’ can only be used for datetime formatting, e.g. date_format. They are not allowed used for datetime parsing, e.g. to_timestamp.
If you want to restore the behavior of Spark versions before 3.0, you can set the spark.sql.legacy.timeParserPolicy option to LEGACY:
sparkSession.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")
If you don't want to change the Spark configuration, you can instead remove the characters representing the day with the substr SQL function and drop E from the pattern:
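import org.apache.spark.sql.functions.{col, to_timestamp, expr}
import sparkSession.implicits._ // needed for Seq(...).toDF; uses the same sparkSession as above
val ts: String = "Wed Dec 08 20:49:59 CET 2021"
// Pattern without 'E': the day-of-week is stripped before parsing
val fmt: String = "MMM dd HH:mm:ss z yyyy"
val ttdf = Seq(ts)
  .toDF("theTimestampColumn")
  // substr is 1-based, so position 5 skips "Wed " and keeps "Dec 08 20:49:59 CET 2021"
  .withColumn("preparedTimestamp", expr("substr(theTimestampColumn, 5, length(theTimestampColumn))"))
  .withColumn("parsedTime", to_timestamp(col("preparedTimestamp"), fmt = fmt))
  .drop("preparedTimestamp")
ttdf.show()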
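If the strings can be cleaned before they ever reach Spark (for example inside the Excel reader), the same strip-the-weekday idea also works with plain java.time. This is a sketch under assumptions, not Spark's own parser: it assumes every value starts with a three-letter English day name plus a space, and that java.time recognizes the zone abbreviation (CET here). The object and method names are hypothetical.

```scala
import java.time.ZonedDateTime
import java.time.format.DateTimeFormatter
import java.util.Locale

object StripWeekday {
  // Same pattern as the Spark workaround: no 'E', because the weekday is removed first.
  // Locale.ENGLISH so "Dec" parses regardless of the JVM's default locale.
  private val fmt = DateTimeFormatter.ofPattern("MMM dd HH:mm:ss z yyyy", Locale.ENGLISH)

  // Hypothetical helper: drops the first 4 characters ("Wed "),
  // mirroring substr(theTimestampColumn, 5, ...) in Spark SQL
  def parse(raw: String): ZonedDateTime =
    ZonedDateTime.parse(raw.substring(4), fmt)

  def main(args: Array[String]): Unit =
    println(parse("Wed Dec 08 20:49:59 CET 2021"))
}
```

Parsing into ZonedDateTime keeps the CET offset (+01:00 in December), so the value can later be converted to UTC or to any other zone.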
Related
I am using
@JsonFormat(shape = JsonFormat.Shape.STRING, pattern = "dd-MM-yyyy hh:mm:ss", timezone = "Asia/Kolkata")
private Timestamp startDateTime;
to store a timestamp that comes in the JSON as a string.
The problem is that it converts times between 12 PM and 1 PM into 00 AM.
E.g. 2021-10-25 12:30:00 gets converted to 2021-10-25 00:30:00.
Any help will be appreciated.
Thank you.
The root cause of the problem is that you have used h instead of H. Note that h is used for 12-hour time format (i.e. time with AM/PM marker) while H is used for 24-hour time format. So, the solution to your problem is to change the format to dd-MM-yyyy HH:mm:ss.
If you want the AM/PM marker in the time, change the format to dd-MM-yyyy hh:mm:ss a.
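To see the difference in isolation, the sketch below parses the same string with each pattern using java.text.SimpleDateFormat, which is, as far as I know, what Jackson builds from a @JsonFormat pattern on a java.util.Date/Timestamp field (treat that link as an assumption). The input is written as 25-10-2021 so it matches the dd-MM-yyyy pattern; the object and method names are hypothetical.

```scala
import java.text.SimpleDateFormat
import java.util.{Locale, TimeZone}

object HourPatternDemo {
  private val zone = TimeZone.getTimeZone("Asia/Kolkata")

  // Parse `input` with `pattern`, then print it back in an unambiguous 24-hour form
  def reparse(input: String, pattern: String): String = {
    val parser = new SimpleDateFormat(pattern, Locale.ENGLISH) // English so "PM" is recognized
    parser.setTimeZone(zone)
    val printer = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.ENGLISH)
    printer.setTimeZone(zone)
    printer.format(parser.parse(input))
  }

  def main(args: Array[String]): Unit = {
    // 'hh' is hour in AM/PM (1-12); with no 'a' marker the time defaults to AM, so 12 becomes 00
    println(reparse("25-10-2021 12:30:00", "dd-MM-yyyy hh:mm:ss"))      // 2021-10-25 00:30:00
    // 'HH' is hour of day (0-23)
    println(reparse("25-10-2021 12:30:00", "dd-MM-yyyy HH:mm:ss"))      // 2021-10-25 12:30:00
    // 'hh' together with 'a' reads the AM/PM marker
    println(reparse("25-10-2021 12:30:00 PM", "dd-MM-yyyy hh:mm:ss a")) // 2021-10-25 12:30:00
  }
}
```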
I am not able to parse a date in the following format: 'February 4, 2020, 3:15:14 PM GMT-6'
I tried specifying a format, but with no luck.
With no format specified I get the warning:
Deprecation warning: value provided is not in a recognized RFC2822 or ISO format.
How do I get a parsed date from this string?
Thanks.
Unfortunately, Moment doesn't have a parsing token for offsets like GMT-6. The offset must be at least two digits to work correctly with the Z token. You can use a regex replace to alter your string before parsing.
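var input = 'February 4, 2020, 3:15:14 PM GMT-6';
// Pad a one-digit offset hour (GMT-6 -> GMT-06) so the Z token can read it
var adjusted = input.replace(/GMT([+-])(\d)(?!\d)/, function (match, sign, hour) {
  return 'GMT' + sign + '0' + hour;
});
var m = moment.parseZone(adjusted, 'MMMM D, YYYY, h:mm:ss A [GMT]Z');
m.format() //=> "2020-02-04T15:15:14-06:00"
(The regex only touches the text right after GMT, and the lookahead (?!\d) leaves offsets that already have two digits, such as GMT-10, unchanged.)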
I am trying to extract dates from a string and compare them. I am new to Scala. The string is: Some(Date: Tue, 14 Aug 2018 20:57:42 GMT)Some(Last-Modified: Tue, 14 Aug 2018 20:57:24 GMT) I wish to compare Date and Last-Modified.
Extract the Dates if working with Option
There are several Scala wrappers for the Java Time API but the example below just uses the Java API directly.
val someDate: Option[String] = Some("Date: Tue, 14 Aug 2018 20:57:42 GMT")
val someLastMod: Option[String] = Some("Last-Modified: Tue, 14 Aug 2018 20:57:24 GMT")
Then we extract the meaningful date substrings, i.e. we remove the "Date: " and "Last-Modified: " prefixes:
val dateStr = someDate.get.split("^[\\w\\-]+:")(1).trim
val lastModStr = someLastMod.get.split("^[\\w\\-]+:")(1).trim
Note that the code above calls get, which assumes you can guarantee you will always have a Some and never a None (get on a None throws an exception). Read up on working with Option in Scala if this point is unclear.
Extract the Dates if working with a String
val data = "Some(Date: Tue, 14 Aug 2018 20:57:42 GMT)Some(Last-Modified: Tue, 14 Aug 2018 20:57:24 GMT)"
First we extract just the date strings we are interested in. The following expression uses split to create an array of strings, filters out any empty strings, and finally maps over what's left, using take to remove the trailing parenthesis ). Note that take(29) assumes every date is exactly 29 characters long, i.e. has a two-digit day.
val dates = data.split("Some\\([\\w\\-]+:*\\s").filter(_.nonEmpty).map(_.take(29))
// dates: Array[String] = Array(Tue, 14 Aug 2018 20:57:42 GMT, Tue, 14 Aug 2018 20:57:24 GMT)
Now we extract each date string from the array.
val dateStr = dates(0)
val lastModStr = dates(1)
Now use the Java Time API to do comparisons.
First, import the java.time packages.
import java.time._
import java.time.format._
Now create a formatter to match your DateTime pattern in order to convert the Strings to LocalDateTime instances.
val formatter = DateTimeFormatter.ofPattern("EEE, d MMM yyyy HH:mm:ss z", java.util.Locale.ENGLISH) // English day/month names regardless of the default locale
val date = LocalDateTime.parse(dateStr, formatter)
val lastMod = LocalDateTime.parse(lastModStr, formatter)
Do some comparisons using the LocalDateTime API.
date.isBefore(lastMod) // false: Date (20:57:42) is 18 seconds after Last-Modified (20:57:24)
date.isAfter(lastMod)  // true
Check out the LocalDateTime docs for more ways to compare them.
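Date and Last-Modified are HTTP header dates, and java.time ships a predefined DateTimeFormatter.RFC_1123_DATE_TIME for exactly that layout. A sketch, assuming the strings have already been extracted as above: parsing into ZonedDateTime keeps the GMT offset instead of discarding it, and Duration.between gives the gap directly. The object name is hypothetical.

```scala
import java.time.{Duration, ZonedDateTime}
import java.time.format.DateTimeFormatter

object CompareHttpDates {
  // Built-in formatter for "Tue, 14 Aug 2018 20:57:42 GMT"-style dates (English names are fixed)
  private val fmt = DateTimeFormatter.RFC_1123_DATE_TIME

  def parse(s: String): ZonedDateTime = ZonedDateTime.parse(s, fmt)

  def main(args: Array[String]): Unit = {
    val date = parse("Tue, 14 Aug 2018 20:57:42 GMT")
    val lastMod = parse("Tue, 14 Aug 2018 20:57:24 GMT")
    println(date.isAfter(lastMod))          // true
    println(Duration.between(lastMod, date)) // PT18S
  }
}
```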
Consider this
Will the dates always be in the same pattern? If not, you will need to decide how to handle the other patterns; otherwise you will run into runtime exceptions (DateTimeParseException). Read more in the docs.
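One way to handle several patterns without exceptions escaping is to try a list of formatters in order and keep the first success. This is a minimal sketch: the two formatters in the list are placeholders (swap in whatever layouts your data really uses), and the object and method names are hypothetical. Try catches the DateTimeParseException thrown by each failed attempt.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
import java.util.Locale
import scala.util.Try

object FlexibleParse {
  // Hypothetical list of accepted layouts, tried in order
  private val formatters = Seq(
    DateTimeFormatter.ofPattern("EEE, d MMM yyyy HH:mm:ss z", Locale.ENGLISH),
    DateTimeFormatter.ISO_LOCAL_DATE_TIME
  )

  // Returns the first successful parse, or None if no formatter matches
  def parseAny(s: String): Option[LocalDateTime] =
    formatters.iterator
      .map(f => Try(LocalDateTime.parse(s.trim, f)).toOption)
      .collectFirst { case Some(dt) => dt }
}
```

Because the iterator is lazy, formatters after the first match are never tried.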
Are you really trying to parse data that looks like this?
val badString = "Some(Date: Tue, 14 Aug 2018 20:57:42 GMT)Some(Last-Modified: Tue, 14 Aug 2018 20:57:24 GMT)"
Whoever thought that might be a reasonable way to represent data should go back to school (grade school). But it can be done. First let's try to segregate the data elements we're interested in, and remove some of the cruft along the way.
val inArray :Array[String] = badString.split("Some[^:]+: ")
//inArray: Array[String] = Array("", "Tue, 14 Aug 2018 20:57:42 GMT)", "Tue, 14 Aug 2018 20:57:24 GMT)")
Next we need to describe the date/time format that we're dealing with. Note that we have to account for a trailing paren ) in the data.
import java.time.format.DateTimeFormatter
import java.util.Locale
val dtFormatter = DateTimeFormatter.ofPattern("E, dd MMM yyyy HH:mm:ss z)", Locale.ENGLISH) // ')' matches the trailing paren
Now we can turn all the good data into Java LocalDateTime elements. Any Array elements that don't match the DateTimeFormatter pattern are removed.
import scala.util.Try
import java.time.LocalDateTime
val dates: Array[LocalDateTime] = inArray.flatMap { dateStr =>
  // a failed parse becomes None, which flatMap drops
  Try(LocalDateTime.parse(dateStr.trim, dtFormatter)).toOption
}
So now you can extract the dates, if any, from the dates array and compare them using the LocalDateTime API.
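Putting the pieces together, a sketch of that last step: parse the raw string as above, then pattern-match on the array so that the comparison only happens when exactly two dates were found. The object and method names are hypothetical, and the result is an Option[Boolean] so that "dates missing" is distinguishable from "not after".

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
import java.util.Locale
import scala.util.Try

object CompareHeaderDates {
  // Pattern includes the trailing ')' left over from the split
  private val dtFormatter =
    DateTimeFormatter.ofPattern("E, dd MMM yyyy HH:mm:ss z)", Locale.ENGLISH)

  def parseDates(raw: String): Array[LocalDateTime] =
    raw.split("Some[^:]+: ").flatMap(s => Try(LocalDateTime.parse(s.trim, dtFormatter)).toOption)

  // Some(true) if Date is later than Last-Modified; None unless exactly two dates were parsed
  def dateIsAfterLastModified(raw: String): Option[Boolean] =
    parseDates(raw) match {
      case Array(date, lastMod) => Some(date.isAfter(lastMod))
      case _                    => None
    }
}
```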
What I am encountering is quite peculiar.
My Code:
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
val aa = "2017-01-17 01:33:00"
val bb = "04:33"
val hour = bb.substring(0, bb.indexOf(":"))
val mins = bb.substring(bb.indexOf(":") + 1, bb.length())
val negatedmins = "-" + mins
val ecoffsethour = hour.toLong
val ecoffsetmins = negatedmins.toLong
println(aa)
val datetimeformatter = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
val txn_post_date_hkt_date_parsed = LocalDateTime.parse(aa, datetimeformatter)
println(txn_post_date_hkt_date_parsed)
val minushours = txn_post_date_hkt_date_parsed.minusHours(ecoffsethour)
println(minushours)
val minusmins = minushours.minusMinutes(ecoffsetmins)
println(minusmins)
val offsetPostdateDiff = minusmins.toString().replace("T", " ")
println(offsetPostdateDiff)
Output:
2017-01-17 01:33:00
2017-01-17T01:33
2017-01-16T21:33
2017-01-16T22:06
2017-01-16 22:06
In the same code, I change only the value of aa to 2017-01-17 01:33:44.
Now the output is:
2017-01-17 01:33:44
2017-01-17T01:33:44
2017-01-16T21:33:44
2017-01-16T22:06:44
2017-01-16 22:06:44
Why does the first run not include the seconds field?
My requirement is that the output should always be in "yyyy-MM-dd HH:mm:ss" format.
I'm quite new to Scala. Please enlighten me.
Default format is ISO 8601
The java.time classes use the standard ISO 8601 formats by default when parsing/generating strings that represent date-time values.
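The standard format for a local date-time is what you are seeing, with the T in the middle: YYYY-MM-DDTHH:MM:SS.SSSSSSSSS. LocalDateTime's toString uses the shortest form that still represents the full value, so seconds (and fractional seconds) that are zero are omitted: 01:33:00 prints as 01:33, while 01:33:44 keeps its seconds. That is why the seconds appear in your second run but not in the first.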
LocalDateTime ldt = LocalDateTime.now( ZoneId.of( "America/Montreal" ) ) ;
String output = ldt.toString() ;
2017-01-23T12:34:56.789
Your call println( txn_post_date_hkt_date_parsed ) is implicitly calling the built-in toString method on the LocalDateTime object, and thereby asking for the standard ISO 8601 format with the T.
println( txn_post_date_hkt_date_parsed.toString() )
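To get the yyyy-MM-dd HH:mm:ss output the question asks for, format explicitly with a DateTimeFormatter instead of relying on toString. A minimal Scala sketch; the object and method names are hypothetical, and shift hard-codes the question's adjustment (minusHours(4), then minusMinutes(-33), which is the same as plusMinutes(33)).

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

object KeepSeconds {
  private val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

  // Parse, apply the question's shift, and format explicitly so seconds are always printed
  def shift(input: String): String =
    LocalDateTime.parse(input, fmt).minusHours(4).plusMinutes(33).format(fmt)

  def main(args: Array[String]): Unit = {
    val ldt = LocalDateTime.parse("2017-01-17 01:33:00", fmt)
    println(ldt)                          // 2017-01-17T01:33 (toString drops zero seconds)
    println(ldt.format(fmt))              // 2017-01-17 01:33:00
    println(shift("2017-01-17 01:33:00")) // 2017-01-16 22:06:00
  }
}
```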
Offsets
On an unrelated note, you are working too hard. The java.time classes handle offsets. I do not understand why you want an offset of such an odd number (four hours and thirty-three minutes), but so be it.
Here is your code revised, but in Java syntax.
import java.time.*;
import java.time.format.DateTimeFormatter;
String input = "2017-01-17 01:33:00" ;
DateTimeFormatter f = DateTimeFormatter.ofPattern( "yyyy-MM-dd HH:mm:ss" ) ;
LocalDateTime ldt = LocalDateTime.parse( input , f ) ;
OffsetDateTime utc = ldt.atOffset( ZoneOffset.UTC ) ;
ZoneOffset offset = ZoneOffset.of( "-04:33" ) ; // Behind UTC by four hours and thirty-three minutes.
OffsetDateTime odt = utc.withOffsetSameInstant( offset ) ;
You can see this code run live at IdeOne.com. Notice how the wall-clock time at your offset-from-UTC is on the previous date: the same moment in history, the same point on the timeline, viewed through two different wall-clock times (UTC, and four hours and thirty-three minutes behind).
The Z on the end is standard ISO 8601 notation, short for Zulu and meaning UTC.
input: 2017-01-17 01:33:00
ldt.toString(): 2017-01-17T01:33
utc.toString(): 2017-01-17T01:33Z
odt.toString(): 2017-01-16T21:00-04:33
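Since the question is in Scala, here is the same approach translated to Scala, with the result formatted explicitly so the seconds always appear. Like the Java version, it assumes the input string is in UTC; the object and method names are hypothetical.

```scala
import java.time.{LocalDateTime, OffsetDateTime, ZoneOffset}
import java.time.format.DateTimeFormatter

object OffsetShift {
  private val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

  // Treat the input as UTC, view the same instant at the given offset, print with seconds
  def atOffset(input: String, offsetId: String): String = {
    val utc: OffsetDateTime = LocalDateTime.parse(input, fmt).atOffset(ZoneOffset.UTC)
    utc.withOffsetSameInstant(ZoneOffset.of(offsetId)).format(fmt)
  }

  def main(args: Array[String]): Unit =
    println(atOffset("2017-01-17 01:33:00", "-04:33")) // 2017-01-16 21:00:00
}
```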
It's usually better to specify explicitly the format in which you want the output.
So, instead of
println(datetime)
you can do something like this:
println(datetimeformat.print(datetime))
Good luck!
Edit: Change made to make the 2 expressions exactly equivalent
We can assign a String, an Int, a Long, etc. to a variable, like these:
var str:String="This is String"
var inte:Int=1
Is it possible to do the same with a date, like this?
var dat:Date=new Date(22/05/2013)
But the output is
Thu Jan 01 05:30:00 IST 1970
How do I assign a fixed date to a variable?
scala> 22/05/2013
res0: Int = 0
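You are calling the Date constructor with an Int argument: 22/05/2013 is integer division, which evaluates to 0, as the REPL shows. That argument is the number of milliseconds since the standard base time known as "the epoch", namely January 1, 1970, 00:00:00 GMT, so you get the epoch itself, printed in your local time zone (IST is GMT+5:30, hence 05:30:00).
You should use DateFormat.parse instead, since the other Date constructors (those taking year/month/day values or a String) are deprecated.
From the question, I can't tell exactly what you are trying to achieve.
Perhaps this is what you are looking for: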
import java.util.Date
import java.text.SimpleDateFormat
val format = new SimpleDateFormat("dd/MM/yyyy")
var date = format.parse("22/05/2013")
// date : java.util.Date = Wed May 22 00:00:00 IST 2013
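On Java 8 or later, java.time.LocalDate is an alternative for a plain calendar date: it has no time-of-day or time zone, so no IST shift shows up when it is printed. A minimal sketch; the object name is hypothetical.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

object StaticDate {
  // A fixed date built directly from its parts (month is 1-based,
  // unlike the deprecated java.util.Date(int, int, int) constructor)
  val fromParts: LocalDate = LocalDate.of(2013, 5, 22)

  // The same date parsed from text with an explicit pattern
  val parsed: LocalDate = LocalDate.parse("22/05/2013", DateTimeFormatter.ofPattern("dd/MM/yyyy"))

  def main(args: Array[String]): Unit =
    println(parsed) // 2013-05-22
}
```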