How to find the minimum in a sequence - Scala

val A = ... // tuples of Medication(patientid, date, medicine)
val B = A.groupBy(x => x.patientid)
B would look like the example below. Now I need to find the minimum date - how do I do that in Scala?
(
478009505-01,
CompactBuffer(
Some(Medication(478009505-01,Fri Jun 12 10:30:00 EDT 2009,glimepiride)),
Some(Medication(478009505-01,Fri Jun 12 10:30:00 EDT 2009,glimepiride)),
Some(Medication(478009505-01,Fri Jun 12 10:30:00 EDT 2009,glimepiride))
)
)

Making some assumptions on the types:
case class Medication(id: Int, date: String, medicine: String)
val l = List(
  Some(Medication(478009505, "Fri Jun 12 10:30:00 EDT 2010", "glimepiride")),
  Some(Medication(478009505, "Fri Jun 12 10:30:00 EDT 2008", "glimepiride")),
  None,
  Some(Medication(478009505, "Fri Jun 12 10:30:00 EDT 2011", "glimepiride"))
)
You can use a for comprehension to extract all the dates, then get the min with minBy:
import java.text.SimpleDateFormat
import java.util.Date

val format = new SimpleDateFormat("EEE MMM dd HH:mm:ss zzz yyyy")

// format.parse already returns a java.util.Date
def createDateTime(s: String): Date = format.parse(s)

val dates = for {
  optMed <- l       // for each item
  med    <- optMed  // if it contains some value
} yield createDateTime(med.date) // create a comparable date

dates.minBy(_.getTime) // get the minimum date
The result is the oldest date (2008-06-12).
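The same extraction works per patient group as well. A minimal sketch, assuming B is the grouped structure from the question (patientid -> collection of Option[Medication]) and reusing createDateTime from above; the name minDatePerPatient is illustrative:
// Each value of B is one patient's Option[Medication] entries
// (the CompactBuffer shown in the question).
val minDatePerPatient = B.mapValues { meds =>
  val dates = for {
    optMed <- meds   // each Option[Medication] in the group
    med    <- optMed // keep only the defined entries
  } yield createDateTime(med.date)
  dates.minBy(_.getTime) // earliest date for this patient
}
Note that minBy throws on an empty collection, so this assumes each group still contains at least one defined entry after the Nones are dropped.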

Related

How to convert "Wed Sep 07 11:11:19 GMT+05:30 2022" String to DateTime object in Flutter?

I'm using the intl package to parse the String.
import 'package:intl/intl.dart';
...
String date = "Wed Sep 07 11:11:19 GMT+05:30 2022";
DateFormat formatter = DateFormat("EEE MMM dd HH:mm:ss zXXX yyyy");
DateTime formattedDateTime = formatter.parse(date);
But I'm getting an Exception:
FormatException: Trying to read XXX from Wed Sep 07 11:11:19 GMT+05:30 2022 at position 24
Tested the date format with this tool.

Search date field by year in JanusGraph

I have a 'Date' property on my 'Patent' node class that is formatted like this:
==>Sun Jan 28 00:08:00 UTC 2007
==>Tue Jan 27 00:10:00 UTC 1987
==>Wed Jan 10 00:04:00 UTC 2001
==>Sun Jan 17 00:08:00 UTC 2010
==>Tue Jan 05 00:10:00 UTC 2010
==>Thu Jan 28 00:09:00 UTC 2010
==>Wed Jan 04 00:09:00 UTC 2012
==>Wed Jan 09 00:12:00 UTC 2008
==>Wed Jan 24 00:04:00 UTC 2018
And is stored as class java.util.Date in the database.
Is there a way to search this field to return all the 'Patents' for a particular year?
I tried variations of g.V().has("Patent", "date", 2000).values(), but it returns neither results nor an error message.
Is there a way to search this property field by year or do I need to create a separate property that just contains year?
You do not need to create a separate property for the year. JanusGraph recognizes the Date data type and can filter by date values.
gremlin> dateOfBirth1 = new GregorianCalendar(2000, 5, 6).getTime()
==>Tue Jun 06 00:00:00 MDT 2000
gremlin> g.addV("person").property("name", "Person 1").property("dateOfBirth", dateOfBirth1)
==>v[4144]
gremlin> dateOfBirth2 = new GregorianCalendar(2001, 5, 6).getTime()
==>Wed Jun 06 00:00:00 MDT 2001
gremlin> g.addV("person").property("name", "Person 2").property("dateOfBirth", dateOfBirth2)
==>v[4328]
gremlin> dateOfBirthFrom = new GregorianCalendar(2000, 0, 1).getTime()
==>Sat Jan 01 00:00:00 MST 2000
gremlin> dateOfBirthTo = new GregorianCalendar(2001, 0, 1).getTime()
==>Mon Jan 01 00:00:00 MST 2001
gremlin> g.V().hasLabel("person").
......1> has("dateOfBirth", gte(dateOfBirthFrom)).
......2> has("dateOfBirth", lt(dateOfBirthTo)).
......3> values("name")
==>Person 1
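The same range predicates should carry over to the asker's Patent vertices. A minimal sketch of the year-2000 boundaries, with the corresponding traversal (following the pattern above) in the comment:
import java.util.GregorianCalendar

// Months are 0-based, so (2000, 0, 1) is Jan 1 2000.
val from = new GregorianCalendar(2000, 0, 1).getTime
val to   = new GregorianCalendar(2001, 0, 1).getTime

// g.V().hasLabel("Patent").
//   has("date", gte(from)).
//   has("date", lt(to)).
//   valueMap()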

Convert Tue Jul 07 2019 12:30:42 to timestamp in Scala/Spark

I need to convert Tue Jul 07 2020 12:30:42 to a timestamp with Scala for Spark.
The expected result would be: 2020-07-07 12:30:42
Any idea how to do this, please?
You can use the to_timestamp function:
import org.apache.spark.sql.functions.to_timestamp
import spark.implicits._ // for the 'string column syntax

// Spark 3.0+ only: fall back to the legacy SimpleDateFormat parser.
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")

df.withColumn("date", to_timestamp('string, "E MMM dd yyyy HH:mm:ss"))
  .show(false)
+------------------------+-------------------+
|string |date |
+------------------------+-------------------+
|Tue Jul 07 2020 12:30:42|2020-07-07 12:30:42|
+------------------------+-------------------+
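For a self-contained run, here is a minimal sketch; the local SparkSession and the column name string are assumptions chosen to match the snippet above:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.to_timestamp

val spark = SparkSession.builder().appName("demo").master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq("Tue Jul 07 2020 12:30:42").toDF("string")

// Only needed on Spark 3.0+, where the new parser rejects this pattern.
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")

df.withColumn("date", to_timestamp($"string", "E MMM dd yyyy HH:mm:ss"))
  .show(false) // prints the table shown above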

Unexpected date when converting string to timestamp in pyspark

The following example:
import pyspark.sql.functions as F
df = sqlContext.createDataFrame([('Feb 4 1997 10:30:00',), ('Jan 14 2000 13:33:00',), ('Jan 13 2020 01:20:12',)], ['t'])
ts_format = "MMM dd YYYY HH:mm:ss"
df.select(df.t,
          F.to_timestamp(df.t, ts_format),
          F.date_format(F.current_timestamp(), ts_format))\
  .show(truncate=False)
Outputs:
+--------------------+-----------------------------------------+------------------------------------------------------+
|t |to_timestamp(`t`, 'MMM dd YYYY HH:mm:ss')|date_format(current_timestamp(), MMM dd YYYY HH:mm:ss)|
+--------------------+-----------------------------------------+------------------------------------------------------+
|Feb 4 1997 10:30:00 |1996-12-29 10:30:00 |Jan 22 2020 14:38:28 |
|Jan 14 2000 13:33:00|1999-12-26 13:33:00 |Jan 22 2020 14:38:28 |
|Jan 22 2020 14:29:12|2019-12-29 14:29:12 |Jan 22 2020 14:38:28 |
+--------------------+-----------------------------------------+------------------------------------------------------+
Question:
The conversion from current_timestamp() to a string works with the given format. Why doesn't the other direction (string to timestamp)?
Notes:
pyspark 2.4.4 docs point to the SimpleDateFormat patterns
Changing the year's format to lowercase fixed the issue: in SimpleDateFormat patterns, uppercase YYYY is the week-based year while lowercase yyyy is the calendar year, so parsing a plain date with YYYY snaps to the start of the week-year (hence the late-December results above).
ts_format = "MMM dd yyyy HH:mm:ss"

In an RDD, how do you apply a function like MIN/MAX to an Iterable class?

I would like to find an efficient way to apply functions to an RDD. Here is what I am trying to do:
I have defined the following classes:
case class Type(Key: String, category: String, event: String, date: java.util.Date, value: Double)
case class Type2(Key: String, Mdate: java.util.Date, label: Double)
Then I loaded the RDDs:
val TypeRDD: RDD[Type] = types.map(s => s.split(","))
  .map(s => Type(s(0), s(1), s(2), dateFormat.parse(s(3)), s(4).toDouble))
val Type2RDD: RDD[Type2] = types2.map(s => s.split(","))
  .map(s => Type2(s(0), dateFormat.parse(s(1)), s(2).toDouble))
Then I try to create two new RDDs - one where Type.Key matches Type2.Key, and one where Type.Key does not appear in Type2:
val grpType = TypeRDD.groupBy(_.Key)
val grpType2 = Type2RDD.groupBy(_.Key)

// get the data whose Key does not exist in Type2, returning the values from grpType
val tempresult = grpType fullOuterJoin grpType2
val result = tempresult.filter(_._2._2.isEmpty).map(_._2._1)

// get the data where Type.Key == Type2.Key
val result2 = grpType.join(grpType2).map(_._2)
UPDATED:
typeRDD =
(19,EVENT1,TEST1,Sun Aug 21 00:00:00 EDT 3396,1.0)
(19,EVENT1,TEST1,Sun Aug 21 00:00:00 EDT 3396,1.0)
(19,EVENT2,TEST2,Sun Aug 21 00:00:00 EDT 3396,1.0)
(19,EVENT3,TEST3,Sun Aug 21 00:00:00 EDT 3396,1.0)
(19,EVENT3,TEST3,Sun Aug 21 00:00:00 EDT 3396,1.0)
(21,EVENT3,TEST3,Sun Aug 21 00:00:00 EDT 3396,1.0)
(21,EVENT3,TEST3,Sun Aug 21 00:00:00 EDT 3396,1.0)
(24,EVENT2,TEST2,Sun Aug 21 00:00:00 EDT 3396,1.0)
(24,EVENT2,TEST2,Sun Aug 21 00:00:00 EDT 3396,1.0)
(40,EVENT1,TEST1,Sun Aug 21 00:00:00 EDT 3396,1.0)
type2RDD =
(24,Wed Dec 22 00:00:00 EST 3080,1.0)
(40,Wed Jan 22 00:00:00 EST 3080,1.0)
For RESULT 1, I would like to get the following:
(19,EVENT1,TEST1,Sun Aug 21 00:00:00 EDT 3396,1.0)
(19,EVENT1,TEST1,Sun Aug 21 00:00:00 EDT 3396,1.0)
(19,EVENT2,TEST2,Sun Aug 21 00:00:00 EDT 3396,1.0)
(19,EVENT3,TEST3,Sun Aug 21 00:00:00 EDT 3396,1.0)
(19,EVENT3,TEST3,Sun Aug 21 00:00:00 EDT 3396,1.0)
(21,EVENT3,TEST3,Sun Aug 21 00:00:00 EDT 3396,1.0)
(21,EVENT3,TEST3,Sun Aug 21 00:00:00 EDT 3396,1.0)
For RESULT 2:
(24,EVENT2,TEST2,Sun Aug 21 00:00:00 EDT 3396,1.0)
(24,EVENT2,TEST2,Sun Aug 21 00:00:00 EDT 3396,1.0)
(40,EVENT1,TEST1,Sun Aug 21 00:00:00 EDT 3396,1.0)
And then I want to count the number of events per key.
RESULT 1:
19 EVENT1 2
19 EVENT2 1
19 EVENT3 2
21 EVENT3 2
RESULT 2:
24 EVENT2 2
40 EVENT1 1
Then I want to get the min/max/avg of the event counts:
RESULT 1 MIN EVENT COUNT = 1
RESULT 1 MAX EVENT COUNT = 5
RESULT 1 AVG EVENT COUNT = 10/4 = 2.5
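One way to get the per-key event counts and their min/max/avg, as a minimal sketch over the corrected result RDD from above; the flattening step and the names records and counts are assumptions:
import org.apache.spark.rdd.RDD

// result is an RDD[Option[Iterable[Type]]] after the fullOuterJoin/filter,
// so flatten it back to individual records first.
val records: RDD[Type] = result.flatMap(_.toSeq.flatten)

// Count events per (Key, event) pair, e.g. (19, EVENT1) -> 2.
val counts: RDD[((String, String), Int)] =
  records.map(t => ((t.Key, t.event), 1)).reduceByKey(_ + _)

// Min/max/avg over the per-key-and-event counts.
val values   = counts.values
val minCount = values.min()
val maxCount = values.max()
val avgCount = values.sum() / values.count()
The same three lines applied to result2 give the RESULT 2 statistics.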