How to match dates through fromJson(toJson(date)) with specs2 - scala

I am stuck on the following problem: I want to write a specs2 specification to assert that my to and from JSON transformations are symmetrical. However, I get an error on Joda DateTime dates.
'2012-04-17T00:04:00.000+02:00' is not equal to '2012-04-17T00:04:00.000+02:00'. Values have the same string representation but possibly different types like List[Int] and List[String] (TimeSpecs.scala:18)
Here is a minimal spec demonstrating the problem:
import org.joda.time.DateTime
import org.specs2.mutable.Specification

class TimeSpecs extends Specification {
  "joda and specs2" should {
    "play nice" in {
      val date = DateTime.parse("2012-04-17T00:04:00+0200")
      val date2 = DateTime.parse("2012-04-17T00:04:00+0200")
      date === date2
    }
    "play nice through play json transform" in {
      import play.api.libs.json._
      import play.api.libs.json.Json._
      val date = DateTime.parse("2012-04-17T00:04:00+0200")
      val jsDate = toJson(date)
      val date2 = jsDate.as[DateTime]
      date === date2
    }
  }
}
How should I compare date and date2 in the second test? They represent the same instant, but specs2 doesn't seem to see that :(
--- edit
"manually" inspecting the type at runtime with date.getClass.getCanonicalName returns org.joda.time.Datetime as expected
import org.joda.time.DateTime
import org.specs2.mutable.Specification

class TimeSpecs extends Specification {
  "joda and specs2" should {
    "play nice" in {
      val date = DateTime.parse("2012-04-17T00:04:00+0200")
      val date2 = DateTime.parse("2012-04-17T00:04:00+0200")
      date === date2
    }
    "play nice through play json transform" in {
      import play.api.libs.json._
      import play.api.libs.json.Json._
      val date: DateTime = DateTime.parse("2012-04-17T00:04:00+0200")
      val jsDate = toJson(date)
      val date2: DateTime = jsDate.as[DateTime]
      println(date.getClass.getCanonicalName)  // prints org.joda.time.DateTime
      println(date2.getClass.getCanonicalName) // prints org.joda.time.DateTime
      date === date2
    }
  }
}
Using DateTime#isEqual does kind of work, but I lose the benefit of fluent matchers and the useful error messages they bring. Additionally, what I am actually trying to compare are case class instances which happen to contain dates, not the dates themselves.
Using
date should beEqualTo(date2)
yields the same error as ===

The problem is that Joda-Time defines a very strict equals which takes the date's Chronology into account (DateTime#getChronology). The isEqual method proposed by Kim Stebel ignores the Chronology.
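To illustrate the difference, here is a quick sketch (using an arbitrary instant, not taken from the spec above): two DateTimes representing the same millisecond but built with different chronologies are not equal, yet isEqual accepts them.

import org.joda.time.{DateTime, DateTimeZone}

// Same instant, two different chronologies (time zones):
val utc = new DateTime(0L, DateTimeZone.UTC)
val paris = new DateTime(0L, DateTimeZone.forID("Europe/Paris"))

utc == paris       // false: equals also compares the Chronology
utc.isEqual(paris) // true: isEqual only compares the underlying instant (millis)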
From there on, there are two possibilities. The first is defining custom Reads and Writes for Play, then parsing the dates with the same pattern, as in the following example:
import org.joda.time.DateTime
import org.joda.time.format.DateTimeFormat
import org.specs2.mutable.Specification

class TimeSpecs extends Specification {
  val pattern = "yyyy-MM-dd'T'HH:mm:ssZZ"

  "joda and specs2" should {
    "play nice" in {
      val date = DateTime.parse("2012-04-17T00:04:00+0200", DateTimeFormat.forPattern(pattern))
      val date2 = DateTime.parse("2012-04-17T00:04:00+0200", DateTimeFormat.forPattern(pattern))
      date === date2
    }
    "play nice through play json transform" in {
      import play.api.libs.json.Json._
      // play2 custom write
      implicit def customJodaWrite = play.api.libs.json.Writes.jodaDateWrites(pattern)
      // play2 custom read
      implicit def customJodaRead = play.api.libs.json.Reads.jodaDateReads(pattern)
      // make sure you parse the initial date with the same pattern
      val date: DateTime = DateTime.parse("2012-04-17T00:04:00+0200", DateTimeFormat.forPattern(pattern))
      val jsDate = toJson(date)
      val date2: DateTime = jsDate.as[DateTime]
      println(date.getClass.getCanonicalName)
      println(date2.getClass.getCanonicalName)
      println(jsDate)
      date should beEqualTo(date2)
    }
  }
}
Play 2.1 defaults to parsing (and writing to JSON) the date as a Unix timestamp in milliseconds, without timezone information. When parsing back from the Unix timestamp, it interprets it in the local computer's timezone (in my case Europe/Paris), hence the need for custom Reads/Writes.
Joda uses a specific formatter when parse is called without a formatter argument, and it doesn't seem possible to recreate that formatter from a pattern string alone (I haven't found a way to activate DateTimeFormatter#withOffsetParsed through a pattern string).
Another possibility would be to define a custom specs2 matcher for Joda-Time which uses isEqual instead of equals (see the sketch below).
Since I don't want the Unix epoch in my JSON anyway, I'll stick with the custom Play transformers.
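For reference, such a matcher could look roughly like this (a sketch only; beSameInstantAs is my own name, not something specs2 provides):

import org.joda.time.DateTime
import org.specs2.matcher.{Expectable, Matcher}

// Compares only the instant, ignoring the Chronology.
class BeSameInstantAs(expected: DateTime) extends Matcher[DateTime] {
  def apply[S <: DateTime](actual: Expectable[S]) =
    result(actual.value.isEqual(expected),
      s"${actual.description} is the same instant as $expected",
      s"${actual.description} is not the same instant as $expected",
      actual)
}

def beSameInstantAs(expected: DateTime) = new BeSameInstantAs(expected)

// usage inside a specification: date must beSameInstantAs(date2)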

Related

Subtract Months from YYYYMM date in Scala

I am trying to subtract months from YYYYMM format.
import java.text.SimpleDateFormat
val date = 202012
val dt_format = new SimpleDateFormat("YYYYMM")
val formattedDate = dt_format.format(date)
new DateTime(formattedDate).minusMonths(3).toDate();
Expected output:
202012 - 3 months = 202009,
202012 - 14 months = 201910
But it did not work as expected. Please help!
Among the standard date/time types, YearMonth seems to be the most appropriate for the given use case:
import java.time.format.DateTimeFormatter
import java.time.YearMonth
val format = DateTimeFormatter.ofPattern("yyyyMM")
YearMonth.parse("197001", format).minusMonths(13) // 1968-12
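Applied to the inputs from the question (reusing the format value above) and formatting back to a yyyyMM string:

YearMonth.parse("202012", format).minusMonths(3).format(format)  // "202009"
YearMonth.parse("202012", format).minusMonths(14).format(format) // "201910"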
This solution uses the functionality in java.time, available since Java 8. I would have preferred a solution that did not require adjusting the input so that it could be (forcefully) parsed into a LocalDate (so that plusMonths could be used), but at least it works.
Probably a simple regex could get the job done. ;-)
import java.time.format.DateTimeFormatter
import java.time.LocalDate
val inFmt = DateTimeFormatter.ofPattern("yyyyMMdd")
val outFmt = DateTimeFormatter.ofPattern("yyyyMM")
def plusMonths(string: String, months: Int): String =
  LocalDate.parse(s"${string}01", inFmt).plusMonths(months).format(outFmt)
assert(plusMonths("202012", -3) == "202009")
assert(plusMonths("202012", -14) == "201910")
You can play around with this code here on Scastie.

How to get creation date of a file in a Scala dataframe

How to print the date of a file in Scala is explained here.
My question is how I can get a variable containing this information that can be returned as a column of a dataframe. None of the conversions I would expect to be allowed actually are.
My code (using Scala 2.11):
import org.apache.spark.sql.functions._
import java.nio.file.{Files, Paths} // Needed for file time
import java.nio.file.attribute.BasicFileAttributes
import java.util.Date

def GetFileTimeFunc(pathStr: String): String = {
  // From: https://stackoverflow.com/questions/47453193/how-to-get-creation-date-of-a-file-using-scala
  val FileTime = Files.readAttributes(Paths.get(pathStr), classOf[BasicFileAttributes]).creationTime
  val JavaDate = Date.from(FileTime.toInstant)
  JavaDate.toString()
}

@transient val GetFileTime = udf(GetFileTimeFunc _)

val filePath = "dbfs:/mnt/myData/" // location of data
val file_df = dbutils.fs.ls(filePath).toDF // Output columns are $"path", $"name", and $"size"
  .withColumn("FileTimeCreated", GetFileTime($"path"))

display(file_df) //.select("name", "size"))
Output:
SparkException: Failed to execute user defined function($anonfun$2: (string) => string)
For some reason, Instant is not allowed as a column type, so I cannot use it as a return type. The same goes for FileTime, JavaDate, etc.
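For what it's worth, Spark does support java.sql.Timestamp as a UDF return type (it maps to TimestampType), so a sketch along these lines keeps a proper timestamp column instead of a String. This is my own variant, and it still assumes the path is readable with java.nio on the executors:

import java.nio.file.{Files, Paths}
import java.nio.file.attribute.BasicFileAttributes
import java.sql.Timestamp
import org.apache.spark.sql.functions.udf

// Return java.sql.Timestamp, which Spark encodes as TimestampType.
def fileCreationTime(pathStr: String): Timestamp = {
  val created = Files.readAttributes(Paths.get(pathStr), classOf[BasicFileAttributes]).creationTime
  new Timestamp(created.toMillis)
}

val fileCreationTimeUdf = udf(fileCreationTime _)
// usage: df.withColumn("FileTimeCreated", fileCreationTimeUdf($"path"))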

java.text.ParseException: Unparseable date: "Some(2014-05-14T14:40:25.950)"

I need to fetch the date from a file.
Below is my Spark program:
import org.apache.spark.sql.SparkSession
import scala.xml.XML
import java.text.SimpleDateFormat

object Active6Month {
  def main(args: Array[String]) {
    val format = new SimpleDateFormat("yyyy-MM-dd'T'hh:mm:ss.SSS")
    val format1 = new SimpleDateFormat("yyyy-MM")
    val spark = SparkSession.builder.appName("Active6Months").master("local").getOrCreate()
    val data = spark.read.textFile("D:\\BGH\\StackOverFlow\\Posts.xml").rdd
    val date = data.filter { line =>
      line.toString().trim().startsWith("<row")
    }.filter { line =>
      line.contains("PostTypeId=\"1\"")
    }.map { line =>
      val xml = XML.loadString(line)
      val closedDate = format1.format(format.parse(xml.attribute("ClosedDate").toString())).toString()
      (closedDate, 1)
    }.reduceByKey(_ + _)
    date.foreach(println)
    spark.stop
  }
}
And I am getting this error:
java.text.ParseException: Unparseable date: "Some(2014-05-14T14:40:25.950)"
The format of the date in the file is perfect, i.e.:
CreationDate="2014-05-13T23:58:30.457"
But in the error it shows the string "Some" attached to it.
And my other question is: why does the same thing work in the code below?
val date = data.filter { line =>
  line.toString().trim().startsWith("<row")
}.filter { line =>
  line.contains("PostTypeId=\"1\"")
}.flatMap { line =>
  val xml = XML.loadString(line)
  xml.attribute("ClosedDate")
}.map { line =>
  (format1.format(format.parse(line.toString())).toString(), 1)
}.reduceByKey(_ + _)
My guess is that xml.attribute("ClosedDate").toString() is actually returning a string containing Some attached to it. Have you debugged that to make sure?
Maybe you shouldn't use toString(), but instead get the attribute value by using the proper method.
Or you can do it the "ugly" way and include "Some" in the pattern:
val format = new SimpleDateFormat("'Some('yyyy-MM-dd'T'hh:mm:ss.SSS')'")
Your second approach works because (and that's a guess, since I don't code in Scala) the xml.attribute("ClosedDate") method probably returns an object, and calling toString() on this object returns the string with "Some" attached to it (why? ask the API authors). But when you use map on this object, it binds the line variable to the correct value (without the "Some" part).
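In Scala terms, xml.attribute("ClosedDate") returns an Option, and calling toString on an Option produces the "Some(...)" wrapper. Here is a sketch of the "proper method" hinted at above, reusing the question's format and format1 and a hard-coded row for illustration:

import java.text.SimpleDateFormat
import scala.xml.XML

val format = new SimpleDateFormat("yyyy-MM-dd'T'hh:mm:ss.SSS")
val format1 = new SimpleDateFormat("yyyy-MM")

val xml = XML.loadString("""<row Id="1" PostTypeId="1" ClosedDate="2014-05-14T14:40:25.950"/>""")

// Read the attribute value itself instead of toString-ing the Option around it:
val closedDate = (xml \ "@ClosedDate").text   // "2014-05-14T14:40:25.950"
format1.format(format.parse(closedDate))      // "2014-05"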

Spark UDF thread safety

I'm using Spark to take a dataframe containing a column of dates, and create 3 new columns containing the time in days, weeks, and months between the date in the column and today.
My concern is around the use of SimpleDateFormat, which isn't thread safe. Ordinarily without Spark this would be fine since it's a local variable, but with Spark's lazy evaluation, is sharing a single SimpleDateFormat instance over multiple UDFs likely to cause an issue?
def calcTimeDifference(...) {
  val sdf = new SimpleDateFormat(dateFormat)

  val dayDifference = udf { (x: String) =>
    math.abs(Days.daysBetween(new DateTime(sdf.parse(x)), presentDate).getDays)
  }
  output = output.withColumn("days", dayDifference(myCol))

  val weekDifference = udf { (x: String) =>
    math.abs(Weeks.weeksBetween(new DateTime(sdf.parse(x)), presentDate).getWeeks)
  }
  output = output.withColumn("weeks", weekDifference(myCol))

  val monthDifference = udf { (x: String) =>
    math.abs(Months.monthsBetween(new DateTime(sdf.parse(x)), presentDate).getMonths)
  }
  output = output.withColumn("months", monthDifference(myCol))
}
I don't think it's safe; as we know, SimpleDateFormat is not thread-safe.
So I prefer this approach for using SimpleDateFormat in Spark if you need it:
import java.text.SimpleDateFormat
import java.util.SimpleTimeZone

/**
 * Thread-safe SimpleDateFormat for Spark.
 */
object ThreadSafeFormat extends ThreadLocal[SimpleDateFormat] {
  override def initialValue(): SimpleDateFormat = {
    val dateFormat = new SimpleDateFormat("yyyy-MM-dd:H")
    // if you need UTC times, set the UTC timezone (raw offset 0)
    val utcTimeZone = new SimpleTimeZone(0, "UTC")
    dateFormat.setTimeZone(utcTimeZone)
    dateFormat
  }
}
Then use ThreadSafeFormat.get() to obtain a thread-safe SimpleDateFormat wherever you need one, for example as sketched below.
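A usage sketch (my own wiring, with presentDate standing in for the value from the question): calling get() inside the UDF body means every executor thread lazily initializes and reuses its own SimpleDateFormat.

import org.apache.spark.sql.functions.udf
import org.joda.time.{DateTime, Days}

val presentDate = DateTime.now()

// Each invocation parses with the calling thread's own SimpleDateFormat instance.
val dayDifference = udf { (x: String) =>
  math.abs(Days.daysBetween(new DateTime(ThreadSafeFormat.get().parse(x)), presentDate).getDays)
}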

In Scala I can have reference to a private type via implicit conversion

I've found this interesting behaviour in the nscala_time package (the Scala version of Joda-Time):
import com.github.nscala_time.time.Imports._
import com.github.nscala_time.time.DurationBuilder
object tests {
  val x = 3 seconds
  //> x is of type com.github.nscala_time.time.DurationBuilder

  val xx: DurationBuilder = 3 seconds
  //> fails to compile:
  //  class DurationBuilder in package time cannot be accessed in package com.github.nscala_time.time
}
What I'm trying to achieve is an implicit conversion from the nscala_time Duration to scala.concurrent.duration.Duration.
I need this because I'm using RxScala and nscala_time in one application.
// e.g. the following should be implicitly converted
// to the nscala_time Duration first,
// then to scala.concurrent.duration.Duration
3 seconds
nscala_time offers a rich time & date API for my application, while I'm using RxScala in the same class for GUI responsiveness.
You can download a simple project to play around: https://dl.dropboxusercontent.com/u/9958045/implicit_vs_private.zip
From scala-user group: It's a known issue https://issues.scala-lang.org/browse/SI-1800
Perhaps you can use an implicit conversion? (BTW, Duration in nscala is essentially org.joda.time.Duration.)
scala> import com.github.nscala_time.time.Imports._
import com.github.nscala_time.time.Imports._

scala> implicit class DurationHelper(d: org.joda.time.Duration) {
     |   def toScalaDuration = scala.concurrent.duration.Duration.apply(d.getMillis, scala.concurrent.duration.MILLISECONDS)
     | }
defined class DurationHelper

scala> val d = RichInt(3).seconds.toDuration
// the toDuration method is defined for com.github.nscala_time.time.DurationBuilder
d: org.joda.time.Duration = PT3S

scala> def exfun(d: scala.concurrent.duration.Duration) = d.toString
exfun: (d: scala.concurrent.duration.Duration)String

scala> exfun(d.toScalaDuration)
res41: String = 3000 milliseconds

(Not using import scala.concurrent.duration._ here, to avoid name clashes with the joda/nscala stuff.)
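If a genuine implicit conversion is wanted, so that a Joda Duration is accepted wherever a scala.concurrent.duration.Duration is expected, a sketch along the same lines could look like this (jodaToScalaDuration is my own name):

import scala.concurrent.duration.{Duration => ScalaDuration, MILLISECONDS}

// Implicit view: org.joda.time.Duration => scala.concurrent.duration.Duration
implicit def jodaToScalaDuration(d: org.joda.time.Duration): ScalaDuration =
  ScalaDuration(d.getMillis, MILLISECONDS)

def exfun(d: ScalaDuration) = d.toString
exfun(new org.joda.time.Duration(3000L))  // "3000 milliseconds"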