Convert Spark.sql timestamp to java.time.Instant in Scala

Very simple question: I need to convert a timestamp column in a Spark DataFrame to java.time.Instant.

Here is how you can convert it to java.time.Instant (this needs import spark.implicits._ in scope for the java.sql.Timestamp encoder):
import spark.implicits._

val time1 = spark
  .sql("...")
  .as[java.sql.Timestamp]
  .first()
  .toInstant
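If you need the conversion for every row rather than just the first one, here is a minimal sketch (the DataFrame df and the column name ts are illustrative, not from the question): collect the column as java.sql.Timestamp and map to java.time.Instant on the driver.
import spark.implicits._

// Collect the timestamp column and convert each value to an Instant
val instants: Array[java.time.Instant] = df
  .select($"ts")
  .as[java.sql.Timestamp]
  .collect()
  .map(_.toInstant)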

Related

How to format datatype to TimestampType in spark DataFrame - Scala

I'm trying to cast the column type to TimestampType, for which the value is in the format "11/14/2022 4:48:24 PM". However, when I display the results, I see the values as null.
Here is the sample code that I'm using to cast the timestamp field.
val messages = df.withColumn("Offset", $"Offset".cast(LongType))
.withColumn("Time(readable)", $"EnqueuedTimeUtc".cast(TimestampType))
.withColumn("Body", $"Body".cast(StringType))
.select("Offset", "Time(readable)", "Body")
display(messages)
Is there any other way I can try to avoid the null values?
Instead of casting to TimestampType, you can use the to_timestamp function and provide the time format explicitly, like so:
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._
import spark.implicits._
val time_df = Seq((62536, "11/14/2022 4:48:24 PM"), (62537, "12/14/2022 4:48:24 PM")).toDF("Offset", "Time")
val messages = time_df
.withColumn("Offset", $"Offset".cast(LongType))
.withColumn("Time(readable)", to_timestamp($"Time", "MM/dd/yyyy h:mm:ss a"))
.select("Offset", "Time(readable)")
messages.show(false)
+------+-------------------+
|Offset|Time(readable) |
+------+-------------------+
|62536 |2022-11-14 16:48:24|
|62537 |2022-12-14 16:48:24|
+------+-------------------+
messages: org.apache.spark.sql.DataFrame = [Offset: bigint, Time(readable): timestamp]
One thing to remember is that you will have to set a Spark configuration to allow the legacy time parser policy:
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")
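If you prefer to keep everything in SQL, the same session configuration can also be set with a SET statement (a hedged alternative, not from the original answer; both forms affect only the current session):
spark.sql("SET spark.sql.legacy.timeParserPolicy=LEGACY")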

How to convert a DateTime with milliseconds into epoch time with milliseconds

I have data in a Hive table in the below format.
2019-11-21 18:19:15.817
I wrote a SQL query, as below, to get the above column value into epoch format.
val newDF = spark.sql(f"""select TRIM(id) as ID, unix_timestamp(sig_ts) as SIG_TS from table""")
I am getting the output column SIG_TS as 1574360296, which does not have milliseconds.
How to get the epoch timestamp of a date with milliseconds?
Simple way: create a UDF, since Spark's built-in unix_timestamp function truncates at seconds.
import java.sql.Timestamp
import org.apache.spark.sql.functions.{udf, unix_timestamp}
import spark.implicits._

// getTime returns epoch milliseconds, so nothing is truncated
val fullTimestampUDF = udf { t: Timestamp => t.getTime }

val df = Seq("2019-11-21 18:19:15.817").toDF("sig_ts")
  .withColumn("sig_ts_ut", unix_timestamp($"sig_ts"))
  .withColumn("sig_ts_ut_long", fullTimestampUDF($"sig_ts"))
df.show(false)
+-----------------------+----------+--------------+
|sig_ts |sig_ts_ut |sig_ts_ut_long|
+-----------------------+----------+--------------+
|2019-11-21 18:19:15.817|1574356755|1574356755817 |
+-----------------------+----------+--------------+
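If you would rather avoid a UDF, a hedged alternative (assuming the string parses with the default timestamp format, as in the output above) is to combine the whole seconds from unix_timestamp with the millisecond part extracted by date_format; newer Spark versions (3.1+) also provide a unix_millis SQL function for this.
import org.apache.spark.sql.functions.{col, date_format, unix_timestamp}

// seconds * 1000 plus the "SSS" millisecond component
val withMillis = df.withColumn(
  "sig_ts_millis",
  unix_timestamp(col("sig_ts")) * 1000 + date_format(col("sig_ts"), "SSS").cast("long")
)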

Convert epochmilli to DDMMYYYY - Spark Scala

I have a dataframe with one of the columns containing timestamps represented in epoch milliseconds (column type is long), and I need to convert them to a column with DDMMYY using withColumn.
Something like:
1528102439 ---> 040618
How do I achieve this?
val df_DateConverted = df.withColumn("Date",from_unixtime(df.col("timestamp").divide(1000),"ddMMyy"))
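For reference, a self-contained sketch of the same approach (the sample value and column name are illustrative): from_unixtime expects seconds, so the epoch-millisecond value is divided by 1000 before formatting.
import org.apache.spark.sql.functions.from_unixtime
import spark.implicits._

val df = Seq(1528102439000L).toDF("timestamp")
val df_DateConverted = df.withColumn("Date",
  from_unixtime(df.col("timestamp").divide(1000), "ddMMyy"))
df_DateConverted.show()  // 1528102439000 -> 040618 (exact date depends on session time zone)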

How to retrieve last 24-hours data from Spark DataFrame (Scala)?

I want to retrieve the last 24-hours data from my DataFrame.
val data = spark.read.parquet(path_to_parquet_file)
data.createOrReplaceTempView("table")
var df = spark.sql("SELECT datetime, product_PK FROM table WHERE datetime BETWEEN (datetime - 24*3600000) AND datetime")
However, I do not know how to convert datetime to milliseconds using Spark SQL (Spark 2.2.0 and Scala 2.11).
I can do it using DataFrame, but don't know how to merge everything together:
import org.apache.spark.sql.functions.unix_timestamp
df = df.withColumn("unix_timestamp",unix_timestamp(col("datetime"))).drop("datetime")
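One hedged way to merge everything together (not from the original thread; assumes datetime is a timestamp, or a string that unix_timestamp can parse) is to compare each row's unix timestamp against the current time minus 24 hours:
import org.apache.spark.sql.functions.{col, current_timestamp, unix_timestamp}

// Keep only rows whose datetime falls within the last 24 hours
val last24h = data.filter(
  unix_timestamp(col("datetime")) >= unix_timestamp(current_timestamp()) - 24 * 3600
)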

Extract value from scala TimeStampType

I have a SchemaRDD created from a Hive query
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
val rdd = sqlContext.sql("Select * from mytime")
My RDD contains the following schema
StructField(id,StringType,true)
StructField(t,TimestampType,true)
We have our own custom database and want to save the TimestampType as a string, but I could not find a way to extract the value and save it as a string.
Can you help? Thanks!
What happens if you change your query to:
SELECT id, cast(t as STRING) from mytime
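A hedged sketch of what that looks like with the question's HiveContext, plus an alternative that keeps the TimestampType and converts per row afterwards (assuming id is the first column and t the second, as in the schema shown):
// Cast inside the query so the column comes back as a string
val asStringCol = sqlContext.sql("SELECT id, cast(t AS STRING) AS t FROM mytime")

// Or convert after the fact by pulling the value out of each Row
val asStrings = sqlContext.sql("Select * from mytime")
  .map(row => (row.getString(0), row(1).toString))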