Value of date changes in scala when reading from yaml file - scala

I have a YAML file with this data:
time:
- days: 50
- date: 2020-02-30
I am reading both values in my Scala program:
val t = yml.get("time").asInstanceOf[util.ArrayList[util.LinkedHashMap[String, Any]]]
val date = t.get(1)
The output of date is {date=Sat Feb 29 19:00:00 EST 2020}.
If I instead put date: 2020-02-25 in the YAML file, the output is {date=Mon Feb 24 19:00:00 EST 2020} rather than {date=Tue Feb 25 19:00:00 EST 2020}.
Why is it always reducing the date value by 1?
Any help is highly appreciated.
PS - I want to validate the date; that is why the input is date: 2020-02-30
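For what it's worth, here is a minimal sketch of the effect, assuming SnakeYAML is the parser behind yml: the YAML timestamp resolver turns 2020-02-25 into a java.util.Date at midnight UTC, and Date.toString then renders that instant in the JVM's default zone (EST, i.e. UTC-5), which is why the printed day appears one day earlier.
import java.text.SimpleDateFormat
import java.util.{Date, TimeZone}
import org.yaml.snakeyaml.Yaml

val doc =
  """time:
    |  - days: 50
    |  - date: 2020-02-25
    |""".stripMargin

// SnakeYAML resolves the unquoted scalar as a timestamp: midnight UTC of that day.
val yml  = new Yaml().load[java.util.Map[String, Any]](doc)
val time = yml.get("time").asInstanceOf[java.util.List[java.util.Map[String, Any]]]
val date = time.get(1).get("date").asInstanceOf[Date]

println(date) // on an EST JVM: Mon Feb 24 19:00:00 EST 2020

// Formatting the same instant in UTC recovers the day written in the file.
val utc = new SimpleDateFormat("yyyy-MM-dd")
utc.setTimeZone(TimeZone.getTimeZone("UTC"))
println(utc.format(date)) // 2020-02-25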

Related

Search data field by year in Janusgraph

I have a 'Date' property on my 'Patent' node class that is formatted like this:
==>Sun Jan 28 00:08:00 UTC 2007
==>Tue Jan 27 00:10:00 UTC 1987
==>Wed Jan 10 00:04:00 UTC 2001
==>Sun Jan 17 00:08:00 UTC 2010
==>Tue Jan 05 00:10:00 UTC 2010
==>Thu Jan 28 00:09:00 UTC 2010
==>Wed Jan 04 00:09:00 UTC 2012
==>Wed Jan 09 00:12:00 UTC 2008
==>Wed Jan 24 00:04:00 UTC 2018
It is stored as java.util.Date in the database.
Is there a way to search this field to return all the 'Patents' for a particular year?
I tried variations of g.V().has("Patent", "date", 2000).values(), but it returns neither results nor an error message.
Is there a way to search this property field by year or do I need to create a separate property that just contains year?
You do not need to create a separate property for the year. JanusGraph recognizes the Date data type and can filter by date values.
gremlin> dateOfBirth1 = new GregorianCalendar(2000, 5, 6).getTime()
==>Tue Jun 06 00:00:00 MDT 2000
gremlin> g.addV("person").property("name", "Person 1").property("dateOfBirth", dateOfBirth1)
==>v[4144]
gremlin> dateOfBirth2 = new GregorianCalendar(2001, 5, 6).getTime()
==>Wed Jun 06 00:00:00 MDT 2001
gremlin> g.addV("person").property("name", "Person 2").property("dateOfBirth", dateOfBirth2)
==>v[4328]
gremlin> dateOfBirthFrom = new GregorianCalendar(2000, 0, 1).getTime()
==>Sat Jan 01 00:00:00 MST 2000
gremlin> dateOfBirthTo = new GregorianCalendar(2001, 0, 1).getTime()
==>Mon Jan 01 00:00:00 MST 2001
gremlin> g.V().hasLabel("person").
......1> has("dateOfBirth", gte(dateOfBirthFrom)).
......2> has("dateOfBirth", lt(dateOfBirthTo)).
......3> values("name")
==>Person 1
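For completeness, the same year filter can be expressed from Scala along these lines; a sketch, assuming the Apache TinkerPop Java API is on the classpath and g is an already-connected GraphTraversalSource:
import java.util.GregorianCalendar
import org.apache.tinkerpop.gremlin.process.traversal.P

// GregorianCalendar months are zero-based, so (2000, 0, 1) is 1 Jan 2000.
val from = new GregorianCalendar(2000, 0, 1).getTime
val to   = new GregorianCalendar(2001, 0, 1).getTime

// Half-open range [from, to) keeps every date that falls within the year 2000.
val names = g.V().hasLabel("person").
  has("dateOfBirth", P.gte(from)).
  has("dateOfBirth", P.lt(to)).
  values[String]("name").
  toList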

convert Tue Jul 07 2020 12:30:42 to timestamp scala/spark

I need to convert Tue Jul 07 2020 12:30:42 to a timestamp with Scala for Spark.
The expected result is: 2020-07-07 12:30:42
Any idea how to do this, please?
You can use the to_timestamp function.
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY") // Spark 3.0 only
df.withColumn("date", to_timestamp('string, "E MMM dd yyyy HH:mm:ss"))
.show(false)
+------------------------+-------------------+
|string |date |
+------------------------+-------------------+
|Tue Jul 07 2020 12:30:42|2020-07-07 12:30:42|
+------------------------+-------------------+
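Put end to end, a self-contained sketch of the same approach (assuming spark is an existing SparkSession; per the note above, the legacy parser setting only matters on Spark 3.x):
import org.apache.spark.sql.functions.to_timestamp
import spark.implicits._

spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY") // Spark 3.x only

val df = Seq("Tue Jul 07 2020 12:30:42").toDF("string")

df.withColumn("date", to_timestamp($"string", "E MMM dd yyyy HH:mm:ss"))
  .show(false)
// +------------------------+-------------------+
// |string                  |date               |
// +------------------------+-------------------+
// |Tue Jul 07 2020 12:30:42|2020-07-07 12:30:42|
// +------------------------+-------------------+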

Converting CDT timestamp into UTC format in spark scala

My DataFrame, myDF, is like below:
DATE_TIME
Wed Sep 6 15:24:27 CDT 2017
Wed Sep 6 15:30:05 CDT 2017
Expected output format:
2017-09-06 15:24:27
2017-09-06 15:30:05
I need to convert the DATE_TIME timestamp to UTC.
I tried the code below in a Databricks notebook, but it's not working.
%scala
val df = Seq(("Wed Sep 6 15:24:27 CDT 2017")).toDF("times")
df.withColumn("times2",date_format(to_timestamp('times,"ddd MMM dd hh:mm:ss CDT yyyy"),"yyyy-MM-dd HH:mm:ss")).show(false)
+---------------------------+------+
|times                      |times2|
+---------------------------+------+
|Wed Sep 6 15:24:27 CDT 2017|null  |
+---------------------------+------+
I think we need to remove the leading Wed from your string and then use the to_timestamp() function.
Example:
df.show(false)
/*
+---------------------------+
|times |
+---------------------------+
|Wed Sep 6 15:24:27 CDT 2017|
+---------------------------+
*/
df.withColumn("times2",expr("""to_timestamp(substring(times,5,length(times)),"MMM d HH:mm:ss z yyyy")""")).
show(false)
/*
+---------------------------+-------------------+
|times |times2 |
+---------------------------+-------------------+
|Wed Sep 6 15:24:27 CDT 2017|2017-09-06 15:24:27|
+---------------------------+-------------------+
*/
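If the result must actually be rendered in UTC (15:24:27 CDT corresponds to 20:24:27 UTC), one option is to change the Spark session time zone before showing; this is a sketch, and it relies on the z token actually applying the CDT offset during parsing:
import org.apache.spark.sql.functions.expr

// show() renders timestamps in the session time zone, so switch it to UTC.
spark.conf.set("spark.sql.session.timeZone", "UTC")

df.withColumn("times2",
    expr("""to_timestamp(substring(times,5,length(times)),"MMM d HH:mm:ss z yyyy")""")).
  show(false)
// Expected: Wed Sep 6 15:24:27 CDT 2017 displayed as 2017-09-06 20:24:27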

converting specific string format to date in sparksql

I have a column that contains the following date as a string: Sat Sep 14 09:54:30 UTC 2019. I am not familiar with this format at all.
I need to convert it to a date or timestamp, just something I can compare against with a precision of one day.
This can help you get the timestamp from your string; you can then work with the days from it using Spark SQL (2.x):
spark.sql("""SELECT from_utc_timestamp(from_unixtime(unix_timestamp("Sat Sep 14 09:54:30 UTC 2019","EEE MMM dd HH:mm:ss zzz yyyy") ),"IST")as timestamp""").show()
+-------------------+
| timestamp|
+-------------------+
|2019-09-14 20:54:30|
+-------------------+
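Since only day-level precision is needed for the comparison, wrapping the same expression in to_date strips the time part; a sketch with the same Spark SQL functions, which should give 2019-09-14 in most session time zones:
spark.sql("""SELECT to_date(from_unixtime(unix_timestamp("Sat Sep 14 09:54:30 UTC 2019","EEE MMM dd HH:mm:ss zzz yyyy"))) AS day""").show()
+----------+
|       day|
+----------+
|2019-09-14|
+----------+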

How to find Minimum in a sequence

val A = ... // tuples of Medication(patientid, date, medicine)
val B = A.groupBy(x => x.patientid)
An example of B is shown below. I now need to find the minimum date; how do I do that in Scala?
(
478009505-01,
CompactBuffer(
Some(Medication(478009505-01,Fri Jun 12 10:30:00 EDT 2009,glimepiride)),
Some(Medication(478009505-01,Fri Jun 12 10:30:00 EDT 2009,glimepiride)),
Some(Medication(478009505-01,Fri Jun 12 10:30:00 EDT 2009,glimepiride))
)
)
Making some assumptions on the types:
case class Medication(id: Int, date: String, medicine: String)
val l = List(
Some(Medication(478009505, "Fri Jun 12 10:30:00 EDT 2010", "glimepiride")),
Some(Medication(478009505, "Fri Jun 12 10:30:00 EDT 2008", "glimepiride")),
None,
Some(Medication(478009505, "Fri Jun 12 10:30:00 EDT 2011", "glimepiride"))
)
You can use a for comprehension to extract all the dates, then get the min with minBy:
import java.text.SimpleDateFormat
import java.util.Date

val format = new SimpleDateFormat("EEE MMM dd HH:mm:ss zzz yyyy")
def createDateTime(s: String) = new Date(format.parse(s).getTime)
val dates = for {
optMed <- l // foreach item
med <- optMed // if it contains some value
} yield createDateTime(med.date) // create a comparable date
dates.minBy(_.getTime) // get the minimum date
The result is the oldest date (2008-06-12).
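Applied to the grouped structure B from the question, a sketch along the same lines, assuming B has been collected into an in-memory Map[String, Seq[Option[Medication]]] (with a pair RDD the same body would go inside mapValues):
val earliestPerPatient: Map[String, java.util.Date] =
  B.map { case (patientId, meds) =>
    val dates = for {
      optMed <- meds   // each element of the buffer
      med    <- optMed // keep only the defined medications
    } yield createDateTime(med.date)
    patientId -> dates.minBy(_.getTime) // assumes at least one defined entry per patient
  }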