I'm using the intl package to parse the date String.
import 'package:intl/intl.dart';
...
String date = "Wed Sep 07 11:11:19 GMT+05:30 2022";
DateFormat formatter = DateFormat("EEE MMM dd HH:mm:ss zXXX yyyy");
DateTime formattedDateTime = formatter.parse(date);
But I'm getting an Exception:
FormatException: Trying to read XXX from Wed Sep 07 11:11:19 GMT+05:30 2022 at position 24
I tested the date format with this tool.
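For what it's worth, the same string does parse on the JVM with the legacy SimpleDateFormat, whose 'z' (general time zone) token accepts "GMT+05:30"-style offsets. A minimal Scala sketch, purely as a cross-check of the format rather than a Dart fix:
import java.text.SimpleDateFormat
import java.util.Locale

// 'zzz' (general time zone) accepts zone names as well as GMT+hh:mm offsets.
val fmt = new SimpleDateFormat("EEE MMM dd HH:mm:ss zzz yyyy", Locale.ENGLISH)
val parsed = fmt.parse("Wed Sep 07 11:11:19 GMT+05:30 2022")
println(parsed)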
I have a 'Date' property on my 'Patent' node class that is formatted like this:
==>Sun Jan 28 00:08:00 UTC 2007
==>Tue Jan 27 00:10:00 UTC 1987
==>Wed Jan 10 00:04:00 UTC 2001
==>Sun Jan 17 00:08:00 UTC 2010
==>Tue Jan 05 00:10:00 UTC 2010
==>Thu Jan 28 00:09:00 UTC 2010
==>Wed Jan 04 00:09:00 UTC 2012
==>Wed Jan 09 00:12:00 UTC 2008
==>Wed Jan 24 00:04:00 UTC 2018
It is stored as class java.util.Date in the database.
Is there a way to search this field to return all the 'Patents' for a particular year?
I tried variations of g.V().has("Patent", "date", 2000).values(), but they return neither results nor an error message.
Is there a way to search this property field by year or do I need to create a separate property that just contains year?
You do not need to create a separate property for the year. JanusGraph recognizes the Date data type and can filter by date values, so you can query with a date range covering the year. For example, in the Gremlin console:
gremlin> dateOfBirth1 = new GregorianCalendar(2000, 5, 6).getTime()
==>Tue Jun 06 00:00:00 MDT 2000
gremlin> g.addV("person").property("name", "Person 1").property("dateOfBirth", dateOfBirth1)
==>v[4144]
gremlin> dateOfBirth2 = new GregorianCalendar(2001, 5, 6).getTime()
==>Wed Jun 06 00:00:00 MDT 2001
gremlin> g.addV("person").property("name", "Person 2").property("dateOfBirth", dateOfBirth2)
==>v[4328]
gremlin> dateOfBirthFrom = new GregorianCalendar(2000, 0, 1).getTime()
==>Sat Jan 01 00:00:00 MST 2000
gremlin> dateOfBirthTo = new GregorianCalendar(2001, 0, 1).getTime()
==>Mon Jan 01 00:00:00 MST 2001
gremlin> g.V().hasLabel("person").
......1> has("dateOfBirth", gte(dateOfBirthFrom)).
......2> has("dateOfBirth", lt(dateOfBirthTo)).
......3> values("name")
==>Person 1
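Applied to the Patent vertices from the question, the same range filter can be written against the TinkerPop Java/Scala API. A sketch, assuming a GraphTraversalSource named g and that the property key is "date":
import java.util.GregorianCalendar
import org.apache.tinkerpop.gremlin.process.traversal.P

// g: a GraphTraversalSource, e.g. graph.traversal()
// Year boundaries as java.util.Date values (months are 0-based in GregorianCalendar).
val from = new GregorianCalendar(2000, 0, 1).getTime  // 2000-01-01, inclusive
val to   = new GregorianCalendar(2001, 0, 1).getTime  // 2001-01-01, exclusive

// All Patent vertices whose date falls inside the year 2000.
val patents2000 = g.V().hasLabel("Patent")
  .has("date", P.gte(from))
  .has("date", P.lt(to))
  .toList()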
I need to convert Tue Jul 07 2020 12:30:42 to a timestamp with Scala for Spark.
So the expected result would be: 2020-07-07 12:30:42
Any idea how to do this, please?
You can use the to_timestamp function.
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY") // Spark 3.0 only: the new parser accepts 'E' (day-of-week) for formatting but not for parsing
df.withColumn("date", to_timestamp('string, "E MMM dd yyyy HH:mm:ss"))
  .show(false)
+------------------------+-------------------+
|string |date |
+------------------------+-------------------+
|Tue Jul 07 2020 12:30:42|2020-07-07 12:30:42|
+------------------------+-------------------+
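If you would rather not enable the legacy parser on Spark 3, one alternative (a sketch, assuming the same df with a column named string as above) is to strip the leading day-of-week token so the pattern no longer needs 'E':
import org.apache.spark.sql.functions.{col, regexp_replace, to_timestamp}

// Drop the leading "Tue " before parsing; the remaining pattern is parse-safe in Spark 3.
df.withColumn("date",
    to_timestamp(regexp_replace(col("string"), "^\\w+ ", ""), "MMM dd yyyy HH:mm:ss"))
  .show(false)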
The following example:
import pyspark.sql.functions as F
df = sqlContext.createDataFrame([('Feb 4 1997 10:30:00',), ('Jan 14 2000 13:33:00',), ('Jan 13 2020 01:20:12',)], ['t'])
ts_format = "MMM dd YYYY HH:mm:ss"
df.select(df.t,
          F.to_timestamp(df.t, ts_format),
          F.date_format(F.current_timestamp(), ts_format))\
    .show(truncate=False)
Outputs:
+--------------------+-----------------------------------------+------------------------------------------------------+
|t |to_timestamp(`t`, 'MMM dd YYYY HH:mm:ss')|date_format(current_timestamp(), MMM dd YYYY HH:mm:ss)|
+--------------------+-----------------------------------------+------------------------------------------------------+
|Feb 4 1997 10:30:00 |1996-12-29 10:30:00 |Jan 22 2020 14:38:28 |
|Jan 14 2000 13:33:00|1999-12-26 13:33:00 |Jan 22 2020 14:38:28 |
|Jan 22 2020 14:29:12|2019-12-29 14:29:12 |Jan 22 2020 14:38:28 |
+--------------------+-----------------------------------------+------------------------------------------------------+
Question:
The conversion from current_timestamp() to string works with the given format. Why doesn't the other way (String to Timestamp) work?
Notes:
The pyspark 2.4.4 docs point to SimpleDateFormat patterns.
Changing the year pattern to lowercase fixed the issue: uppercase 'Y' is SimpleDateFormat's week-based year, while lowercase 'y' is the calendar year.
ts_format = "MMM dd yyyy HH:mm:ss"
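With only the week-based year supplied, the legacy parser resolves the date from the week fields and effectively ignores the month and day, which is why the parsed values land in late December of the previous year. A small Scala sketch of the difference, using the same SimpleDateFormat patterns (exact output depends on the default locale's week rules and time zone):
import java.text.SimpleDateFormat
import java.util.Locale

val input = "Feb 4 1997 10:30:00"

// 'YYYY' is the week-based year; with no week-of-year given, parsing snaps
// to the start of week 1 of that week year (locale-dependent).
val weekYear = new SimpleDateFormat("MMM dd YYYY HH:mm:ss", Locale.US)
// 'yyyy' is the ordinary calendar year.
val calYear = new SimpleDateFormat("MMM dd yyyy HH:mm:ss", Locale.US)

println(weekYear.parse(input)) // e.g. Sun Dec 29 10:30:00 ... 1996
println(calYear.parse(input))  // Tue Feb 04 10:30:00 ... 1997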
I would like to find out an efficient way to apply a function to an RDD.
Here is what I am trying to do:
I have defined the following class:
case class Type(Key: String, category: String, event: String, date: java.util.Date, value: Double)
case class Type2(Key: String, Mdate: java.util.Date, label: Double)
Then I loaded the RDDs:
val TypeRDD: RDD[Type] = types.map(s=>s.split(",")).map(s=>Type(s(0), s(1), s(2),dateFormat.parse(s(3).asInstanceOf[String]), s(4).toDouble))
val Type2RDD: RDD[Type2] = types2.map(s=>s.split(",")).map(s=>Type2(s(0), dateFormat.parse(s(1).asInstanceOf[String]), s(2).toDouble))
Then I try to create two new RDDs: one where Type.Key == Type2.Key and another where Type.Key is not in Type2.
val grpType = TypeRDD.groupBy(_.Key)
val grpType2 = Type2RDD.groupBy(_.Key)
// get the data whose Key does not exist in Type2 and return the grouped values from grpType
val tempresult = grpType fullOuterJoin grpType2
val result = tempresult.filter(_._2._2.isEmpty).map(_._2._1)
// get the data where Type.Key == Type2.Key
val result2 = grpType.join(grpType2).map(_._2)
UPDATED:
typeRDD =
(19,EVENT1,TEST1,Sun Aug 21 00:00:00 EDT 3396,1.0)
(19,EVENT1,TEST1,Sun Aug 21 00:00:00 EDT 3396,1.0)
(19,EVENT2,TEST2,Sun Aug 21 00:00:00 EDT 3396,1.0)
(19,EVENT3,TEST3,Sun Aug 21 00:00:00 EDT 3396,1.0)
(19,EVENT3,TEST3,Sun Aug 21 00:00:00 EDT 3396,1.0)
(21,EVENT3,TEST3,Sun Aug 21 00:00:00 EDT 3396,1.0)
(21,EVENT3,TEST3,Sun Aug 21 00:00:00 EDT 3396,1.0)
(24,EVENT2,TEST2,Sun Aug 21 00:00:00 EDT 3396,1.0)
(24,EVENT2,TEST2,Sun Aug 21 00:00:00 EDT 3396,1.0)
(40,EVENT1,TEST1,Sun Aug 21 00:00:00 EDT 3396,1.0)
type2RDD =
(24,Wed Dec 22 00:00:00 EST 3080,1.0)
(40,Wed Jan 22 00:00:00 EST 3080,1.0)
So for result 1, I would like to get the following:
(19,EVENT1,TEST1,Sun Aug 21 00:00:00 EDT 3396,1.0)
(19,EVENT1,TEST1,Sun Aug 21 00:00:00 EDT 3396,1.0)
(19,EVENT2,TEST2,Sun Aug 21 00:00:00 EDT 3396,1.0)
(19,EVENT3,TEST3,Sun Aug 21 00:00:00 EDT 3396,1.0)
(19,EVENT3,TEST3,Sun Aug 21 00:00:00 EDT 3396,1.0)
(21,EVENT3,TEST3,Sun Aug 21 00:00:00 EDT 3396,1.0)
(21,EVENT3,TEST3,Sun Aug 21 00:00:00 EDT 3396,1.0)
For result 2:
(24,EVENT2,TEST2,Sun Aug 21 00:00:00 EDT 3396,1.0)
(24,EVENT2,TEST2,Sun Aug 21 00:00:00 EDT 3396,1.0)
(40,EVENT1,TEST1,Sun Aug 21 00:00:00 EDT 3396,1.0)
I then want to count the number of events per key and event type.
Result 1:
19 EVENT1 2
19 EVENT2 1
19 EVENT3 2
21 EVENT3 2
Result 2:
24 EVENT2 2
40 EVENT1 1
Then I want to get the min/max/avg of the event counts:
Result 1 min event count = 1
Result 1 max event count = 5
Result 1 avg event count = 10/4 = 2.5
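A sketch of one way to get there with plain pair-RDD operations, assuming typeRDD: RDD[Type] and type2RDD: RDD[Type2] as shown in the update (subtractByKey/join in place of the groupBy + fullOuterJoin attempt; adjust the grouping key if you want the counts per key only rather than per key and event):
import org.apache.spark.rdd.RDD

// Key both RDDs by the join key.
val keyed = typeRDD.map(t => (t.Key, t))
val keys2 = type2RDD.map(t2 => (t2.Key, ())).distinct() // distinct avoids duplicating rows in the join

// Result 1: Type records whose Key does NOT appear in Type2.
val result1: RDD[Type] = keyed.subtractByKey(keys2).values

// Result 2: Type records whose Key DOES appear in Type2.
val result2: RDD[Type] = keyed.join(keys2).map(_._2._1)

// Count events per (Key, event), e.g. (19, EVENT1) -> 2.
val counts1 = result1.map(t => ((t.Key, t.event), 1L)).reduceByKey(_ + _)
val counts2 = result2.map(t => ((t.Key, t.event), 1L)).reduceByKey(_ + _)

// Min / max / avg over the per-(Key, event) counts of result 1.
val c = counts1.values
val minCount = c.min()
val maxCount = c.max()
val avgCount = c.sum() / c.count()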