In PySpark SQL, I have a unix timestamp column that is a long. I tried using the following, but the output was not correct.
from_unixtime(col("firstAvailableDateTimeUnix"), "yyyy-MM-dd HH:mm:ss")
from_unixtime expects seconds, but 1651484635297 is in milliseconds, so divide by 1000 first:
import org.apache.spark.sql.functions._
val df = Seq(1651484635297L).toDF("firstAvailableDateTimeUnix").withColumn(
  "goodDate",
  from_unixtime(col("firstAvailableDateTimeUnix") / 1000, "yyyy-MM-dd HH:mm:ss")
)
df.show(false)
// +--------------------------+-------------------+
// |firstAvailableDateTimeUnix|goodDate |
// +--------------------------+-------------------+
// |1651484635297 |2022-05-02 12:43:55|
// +--------------------------+-------------------+
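Since the question is PySpark, the same fix there would look like this (a sketch, assuming the same millisecond column):
from pyspark.sql.functions import col, from_unixtime
df = spark.createDataFrame([(1651484635297,)], ["firstAvailableDateTimeUnix"])
df = df.withColumn("goodDate", from_unixtime(col("firstAvailableDateTimeUnix") / 1000, "yyyy-MM-dd HH:mm:ss"))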
All you need to do is divide by 1000, cast to timestamp, and then use date_format to format your timestamp:
date_format((col("firstAvailableDateTimeUnix") / 1000).cast('timestamp'), "yyyy-MM-dd HH:mm:ss")
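For example (a sketch, assuming the same millisecond column as above):
from pyspark.sql.functions import col, date_format
df.withColumn("goodDate", date_format((col("firstAvailableDateTimeUnix") / 1000).cast("timestamp"), "yyyy-MM-dd HH:mm:ss")).show(truncate=False)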
Related
I have a dataframe with a string datetime column.
I am converting it to timestamp, but the values are changing.
Following is my code; can anyone help me convert it without changing the values?
from pyspark.sql.functions import to_timestamp
df = spark.createDataFrame(
    data=[("1", "2020-04-06 15:06:16 +00:00")],
    schema=["id", "input_timestamp"])
df.printSchema()
# Parse the timestamp string into a TimestampType column
df = df.withColumn("timestamp", to_timestamp("input_timestamp"))
# Cast the TimestampType column back to a string for display
df.withColumn('timestamp_string',
              to_timestamp('timestamp').cast('string')) \
  .show(truncate=False)
This is the output:
+---+--------------------------+-------------------+-------------------+
|id |input_timestamp |timestamp |timestamp_string |
+---+--------------------------+-------------------+-------------------+
|1 |2020-04-06 15:06:16 +00:00|2020-04-06 08:06:16|2020-04-06 08:06:16|
+---+--------------------------+-------------------+-------------------+
I want to know why the hour is changing from 15 to 8, and how I can prevent it.
I believe to_timestamp is converting the timestamp value to your local time zone, since you have +00:00 in your data.
Try to pass the format to to_timestamp() function.
Example:
from pyspark.sql.functions import col, to_timestamp
df.withColumn("timestamp", to_timestamp(col("input_timestamp"), "yyyy-MM-dd HH:mm:ss +00:00")).show(10, False)
#+---+--------------------------+-------------------+
#|id |input_timestamp |timestamp |
#+---+--------------------------+-------------------+
#|1 |2020-04-06 15:06:16 +00:00|2020-04-06 15:06:16|
#+---+--------------------------+-------------------+
from pyspark.sql.functions import to_utc_timestamp
df = spark.createDataFrame(
    data=[('1', '2020-04-06 15:06:16 +00:00')],
    schema=['id', 'input_timestamp'])
df.printSchema()
df = df.withColumn('timestamp',
                   to_utc_timestamp('input_timestamp', your_local_timezone))
df.withColumn('timestamp_string', df.timestamp.cast('string')).show(truncate=False)
Replace your_local_timezone with the actual value.
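Alternatively, if you simply want Spark to stop rendering timestamps in your local zone, you can set the session time zone to UTC (a common approach, shown here as a sketch):
spark.conf.set("spark.sql.session.timeZone", "UTC")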
I have read a csv file and made a dataframe where the timestamp column is in the format "11/12/2020 3:01".
How do I convert the data of that particular timestamp column into the "yyyy-mm-dd hh:mm:ss.ssssss" format?
import org.apache.spark.sql.functions._
import spark.implicits._ // for the $"col" syntax, if not in spark-shell

df.withColumn("timestamp_col",
  date_format(
    unix_timestamp($"timestamp_col", "dd/MM/yyyy H:mm").cast("timestamp"),
    "yyyy-MM-dd HH:mm:ss.SSSSSS"
  )
)
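If you are working in PySpark instead, the equivalent would be (a sketch, assuming the column is named timestamp_col):
from pyspark.sql.functions import date_format, unix_timestamp
df.withColumn("timestamp_col", date_format(unix_timestamp("timestamp_col", "dd/MM/yyyy H:mm").cast("timestamp"), "yyyy-MM-dd HH:mm:ss.SSSSSS"))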
Watch for .strftime in the Python documentation:
https://docs.python.org/3/library/datetime.html
timestamp.strftime("%d.%m.%Y")
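For example, to reshape the "11/12/2020 3:01" string from the question with plain Python (a sketch using strptime/strftime):
from datetime import datetime
dt = datetime.strptime("11/12/2020 3:01", "%d/%m/%Y %H:%M")
print(dt.strftime("%Y-%m-%d %H:%M:%S.%f"))  # 2020-12-11 03:01:00.000000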
How do you convert a timestamp column to epoch seconds?
var df = sc.parallelize(Seq("2018-07-01T00:00:00Z")).toDF("date_string")
df = df.withColumn("timestamp", $"date_string".cast("timestamp"))
df.show(false)
DataFrame:
+--------------------+---------------------+
|date_string |timestamp |
+--------------------+---------------------+
|2018-07-01T00:00:00Z|2018-07-01 00:00:00.0|
+--------------------+---------------------+
If you have a timestamp, you can cast it to a long to get the epoch seconds:
df = df.withColumn("epoch_seconds", $"timestamp".cast("long"))
df.show(false)
DataFrame:
+--------------------+---------------------+-------------+
|date_string |timestamp |epoch_seconds|
+--------------------+---------------------+-------------+
|2018-07-01T00:00:00Z|2018-07-01 00:00:00.0|1530403200 |
+--------------------+---------------------+-------------+
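The same cast works in PySpark (a sketch):
from pyspark.sql.functions import col
df = df.withColumn("epoch_seconds", col("timestamp").cast("long"))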
Use unix_timestamp from org.apache.spark.sql.functions. It can convert from a timestamp column, or from a string column where it is possible to specify the format. From the documentation:
public static Column unix_timestamp(Column s)
Converts time string in format yyyy-MM-dd HH:mm:ss to Unix timestamp (in seconds), using the default timezone and the default locale, return null if fail.
public static Column unix_timestamp(Column s, String p)
Convert time string with given pattern (see http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html) to Unix time stamp (in seconds), return null if fail.
Use as follows:
import org.apache.spark.sql.functions._
df.withColumn("epoch_seconds", unix_timestamp($"timestamp"))
or if the column is a string with other format:
df.withColumn("epoch_seconds", unix_timestamp($"date_string", "yyyy-MM-dd'T'HH:mm:ss'Z'")))
It can be easily done with the unix_timestamp function in Spark SQL, like this:
spark.sql("SELECT unix_timestamp(inv_time) AS time_as_long FROM agg_counts LIMIT 10").show()
Hope this helps.
You can use the unix_timestamp function and cast the result into any datatype.
Example:
import org.apache.spark.sql.types.LongType
val df1 = df.select(unix_timestamp($"date_string", "yyyy-MM-dd'T'HH:mm:ss'Z'").cast(LongType).as("epoch_seconds"))
I have the following DataFrame:
+----------+-------------------+
| timestamp| created|
+----------+-------------------+
|1519858893|2018-03-01 00:01:33|
|1519858950|2018-03-01 00:02:30|
|1519859900|2018-03-01 00:18:20|
|1519859900|2018-03-01 00:18:20|
+----------+-------------------+
How do I create the timestamp correctly?
I was able to create a timestamp column which is an epoch timestamp, but the dates do not coincide:
df.withColumn("timestamp",unix_timestamp($"created"))
For example, 1519858893 points to 2018-02-28.
Just use the date_format and to_utc_timestamp inbuilt functions:
import org.apache.spark.sql.functions._
df.withColumn("timestamp", to_utc_timestamp(date_format(col("created"), "yyyy-MM-dd"), "Asia/Kathmandu"))
Try the below code:
import org.apache.spark.sql.types.DateType
df.withColumn("dateColumn", df("timestamp").cast(DateType))
You can check one solution here https://stackoverflow.com/a/46595413
To elaborate more on that: when the dataframe has different formats of timestamps/dates in a string column, you can do this -
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.DateType

val df = spark.sparkContext.parallelize(Seq("2020-04-21 10:43:12.000Z", "20-04-2019 10:34:12", "11-30-2019 10:34:12", "2020-05-21 21:32:43", "20-04-2019", "2020-04-21")).toDF("ts")
def strToDate(col: Column): Column = {
  val formats: Seq[String] = Seq("dd-MM-yyyy HH:mm:ss", "yyyy-MM-dd HH:mm:ss", "dd-MM-yyyy", "yyyy-MM-dd")
  // coalesce keeps the first format that parses to a non-null date
  coalesce(formats.map(f => to_timestamp(col, f).cast(DateType)): _*)
}
val formattedDF = df.withColumn("dt", strToDate(df.col("ts")))
formattedDF.show()
+--------------------+----------+
| ts| dt|
+--------------------+----------+
|2020-04-21 10:43:...|2020-04-21|
| 20-04-2019 10:34:12|2019-04-20|
| 2020-05-21 21:32:43|2020-05-21|
| 20-04-2019|2019-04-20|
| 2020-04-21|2020-04-21|
+--------------------+----------+
Note: this code assumes that the data does not contain any values in the formats MM-dd-yyyy or MM-dd-yyyy HH:mm:ss.
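A PySpark version of the same coalesce-over-formats idea might look like this (a sketch; str_to_date is a hypothetical helper name):
from pyspark.sql.functions import coalesce, to_date
def str_to_date(c, formats=("dd-MM-yyyy HH:mm:ss", "yyyy-MM-dd HH:mm:ss", "dd-MM-yyyy", "yyyy-MM-dd")):
    # coalesce keeps the first format that parses to a non-null date
    return coalesce(*[to_date(c, f) for f in formats])
formattedDF = df.withColumn("dt", str_to_date("ts"))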
How do we convert the datetime string 2018-02-07 00:45 into a Spark SQL timestamp? I tried
to_timestamp('2018-02-07 00:45', 'yyyy-MM-dd HH:mm')
and
date_format('2018-02-07 00:45', 'y-MM-dd hh:mm').cast(TimestampType())
and both did not work.
This is in PySpark.
from pyspark.sql import Row
from pyspark.sql.functions import to_timestamp
(sc
 .parallelize([Row(dt='2016_08_21 11_31_08')])
 .toDF()
 .withColumn("parsed", to_timestamp("dt", "yyyy_MM_dd HH_mm_ss"))
 .show(1, False))
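One caveat: Spark 3.x switched to a new datetime parser, and some patterns that worked on Spark 2.x now throw an upgrade exception. If you hit that with any of the snippets above, you can restore the old behaviour with the following setting (a real config flag, though whether you need it depends on your Spark version):
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")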