Unix timestamp granularity changed to hours instead of milliseconds - scala

I have a Spark DataFrame with a column timestamp. I need to create an event_hour column in unix_timestamp format out of it. The current issue is that timestamp is in unix_timestamp format with a granularity of milliseconds, while I need a granularity of hours.
Current values for timestamp:
1653192037
1653192026
1653192025
1653192024
1653192023
1653192022
Expected values:
1653192000
1653195600
1653199200
1653202800
How can I achieve that using Spark functions?
I've already tried converting it to a timestamp and then formatting it, but I got null as the result:
inputDf
  .withColumn("event_hour", unix_timestamp(date_format($"timestamp".cast(TimestampType), "MM-dd-yyyy HH")))

A (not very explicit but) efficient way would be to use a modulus operation with 3600 (as 3600 seconds = 1 hour):
timestamp_hour = timestamp_second - (timestamp_second % 3600)
This assumes you are manipulating the data as a numeric type.
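A minimal Spark sketch of that modulus truncation, assuming the inputDf and timestamp column from the question and that timestamp holds epoch seconds as a numeric type:

import org.apache.spark.sql.functions.col

// Subtracting the remainder of division by 3600 drops the minutes and seconds,
// leaving the epoch second of the start of the hour.
val withEventHour = inputDf
  .withColumn("event_hour", col("timestamp") - (col("timestamp") % 3600))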

You can use the DateUtils API:
import org.apache.commons.lang3.time.DateUtils
import java.util.Calendar

val epochTimestamp_hour: Long = DateUtils.truncate(Timestamp_column, Calendar.HOUR).getTime()
The steps are: create a new column of type timestamp, then use that column to truncate the timestamp to epochTimestamp_hour.
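A rough sketch of how that could look inside Spark, assuming the inputDf and timestamp column from the question; the UDF name truncateToHour and the column names event_ts and event_hour_ms are hypothetical:

import java.util.Calendar
import org.apache.commons.lang3.time.DateUtils
import org.apache.spark.sql.functions.{col, udf}

// Truncate a java.sql.Timestamp to the start of its hour and return epoch milliseconds.
val truncateToHour = udf { ts: java.sql.Timestamp =>
  DateUtils.truncate(ts, Calendar.HOUR).getTime
}

val result = inputDf
  .withColumn("event_ts", col("timestamp").cast("timestamp"))   // new column of type timestamp
  .withColumn("event_hour_ms", truncateToHour(col("event_ts"))) // truncated to the hour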

Related

Convert using unixtimestamp to Date

I have a dataframe with a column containing a date like 1632838270314, as an example.
I want to convert it to a date like 'yyyy-MM-dd'. I have this so far but it doesn't work:
date = df['createdOn'].cast(StringType())
df = df.withColumn('date_key',unix_timestamp(date),'yyyy-MM-dd').cast("date"))
createdOn is the field that derives the date_key
The method unix_timestamp() is for converting a timestamp or date string into the number of seconds since 01-01-1970 ("epoch"). I understand that you want to do the opposite.
Your example value "1632838270314" seems to be milliseconds since epoch.
Here you can simply cast it after converting from milliseconds to seconds:
from pyspark.sql import Row, functions as F
df = sql_context.createDataFrame([
    Row(unix_in_ms=1632838270314),
])
(
    df
    .withColumn('timestamp_type', (F.col('unix_in_ms')/1e3).cast('timestamp'))
    .withColumn('date_type', F.to_date('timestamp_type'))
    .withColumn('string_type', F.col('date_type').cast('string'))
    .withColumn('date_to_unix_in_s', F.unix_timestamp('string_type', 'yyyy-MM-dd'))
    .show(truncate=False)
)
# Output
+-------------+-----------------------+----------+-----------+-----------------+
|unix_in_ms   |timestamp_type         |date_type |string_type|date_to_unix_in_s|
+-------------+-----------------------+----------+-----------+-----------------+
|1632838270314|2021-09-28 16:11:10.314|2021-09-28|2021-09-28 |1632780000       |
+-------------+-----------------------+----------+-----------+-----------------+
You can combine the conversion into a single command:
df.withColumn('date_key', F.to_date((F.col('unix_in_ms')/1e3).cast('timestamp')).cast('string'))
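In the Scala API the same one-step conversion looks roughly like this (a sketch, assuming a DataFrame df with the long column createdOn from the question):

import org.apache.spark.sql.functions.{col, to_date}

// Divide by 1000 to go from epoch milliseconds to seconds, cast to timestamp,
// take the date part and render it as a 'yyyy-MM-dd' string.
val withDateKey = df.withColumn(
  "date_key",
  to_date((col("createdOn") / 1000).cast("timestamp")).cast("string")
)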

How to run the postgres query with date as input on the column with timestamp in long format

I want to query a Postgres database table which has a column with the timestamp stored as long milliseconds, but I have the time in the date format "yyyy-MM-dd HH:mm:ssZ". How can I convert this date format to long milliseconds to run the query?
You can either convert your long value to a proper timestamp:
select *
from the_table
where to_timestamp(the_millisecond_column / 1000) = timestamp '2020-10-05 07:42'
Or extract the seconds from the timestamp value :
select *
from the_table
where the_millisecond_column = extract(epoch from timestamp '2020-10-05 07:42') * 1000
The better solution, however, is to convert that column to a proper timestamp column, to avoid the constant conversion between milliseconds and proper timestamp values.

Scala/Java joda.time not converting date in 24 hours format

I am trying to convert a long UTC value into the "yyyy-MM-dd HH:mm:ss" pattern. I expect my data to be formatted on a 24-hour scale and in GMT. My code passes all the test cases, and I push the data into the database using the jar that is newly built with this code:
dbRecord("order_dt_utc") = if (orderTs.isDefined) Some(new DateTime(orderTs.get, DateTimeZone.UTC).toString("yyyy-MM-dd HH:mm:ss")) else None
Now, when I query my database, I find that the data is still being formatted on a 12-hour scale. The query:
SELECT order_id, order_dt, order_dt_utc, order_ts_utc, from_unixtime(order_ts_utc/1000) FROM order_items where order_dt >= '2018-08-01' AND order_dt <= '2018-08-02' ORDER BY order_dt_utc LIMIT 1000;
And you can see that the values do not match between the columns from_unixtime(order_ts_utc/1000) and order_dt_utc.
I am not able to figure the reason for this behaviour.
To convert the time zone, first use the function:
CONVERT_TZ(dateobj, oldtz, newtz)
After that, use the date_format function:
date_format(from_unixtime(order_ts_utc), '%Y-%m-%d %H:%i:%s');
to format the time on a 00-23 (24-hour) scale.
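On the Scala/joda-time side, the pattern in the question already uses HH, which is the 0-23 hour of day; a small sketch contrasting it with hh (1-12), using a hypothetical epoch-millisecond value:

import org.joda.time.{DateTime, DateTimeZone}

val orderTs = 1533110400123L  // hypothetical epoch milliseconds

// HH formats the hour of day on a 0-23 scale; hh uses a 1-12 clock-hour scale.
val utc24h = new DateTime(orderTs, DateTimeZone.UTC).toString("yyyy-MM-dd HH:mm:ss")
val utc12h = new DateTime(orderTs, DateTimeZone.UTC).toString("yyyy-MM-dd hh:mm:ss")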

Convert string epoch to string timestamp in Scala

I have a column ORDER_DATE with an epoch timestamp as a string. How can I convert this column, with a string like str = "1536309236032" (epoch time), to a string in the format 2018-09-07T14:03:56.032Z in Scala?
Currently I am using:
from_unixtime(input.col(ORDER_DATE), "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'")
but this converts incorrectly to 50668-08-21 01:10:00.000. Here, it inflates the year and appends 000 for the milliseconds.
I don't want to divide by 1000, as we would like to keep the result in milliseconds.
In the documentation, the definition of from_unixtime is as follows:
Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone in the given format.
It uses seconds and is thus not compatible with milliseconds, which is why the result is wrong. To convert the epoch timestamp while keeping the millisecond information, you can use concat:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{concat, from_unixtime, length, lit}

val spark = SparkSession.builder.getOrCreate()
import spark.implicits._

df.withColumn("time", concat(
  from_unixtime($"ORDER_DATE"/1000, "yyyy-MM-dd'T'HH:mm:ss."),
  $"ORDER_DATE".substr(length($"ORDER_DATE")-2, length($"ORDER_DATE")),
  lit("Z")))
This works since the last 3 digits of the epoch timestamp are the same as those in the wanted result.
I got the idea from @Shaido and did something similar. Finally, this solved the issue for me:
input.withColumn("time",
  concat(from_unixtime(input.col("ORDER_DATE")/1000, "yyyy-MM-dd'T'HH:mm:ss"),
    typedLit("."), substring(input.col("ORDER_DATE"), 11, 3), typedLit("Z")))

Convert String timestamp to total minutes

I have a Unix timestamp as a String and I would like to extract the hour and minutes in order to convert this timestamp into total minutes.
val timestamp = "1469768809"
It would be straightforward if the timestamp were not a String (e.g. using timestamp.get(Calendar.HOUR_OF_DAY)). However, I don't know how to deal with a String.
Looks like you have epoch time in seconds. Convert the string to an Int, then do the necessary calculation:
val minutes = "1469768809".toInt / 60
Unless what you actually want is a datetime, in which case you should look into one of the date/time libraries for Scala, e.g. nscala-time.
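If the goal is the hour and minute of the day (as the Calendar.HOUR_OF_DAY hint suggests) rather than minutes since the epoch, here is a small java.time sketch; the choice of UTC is an assumption:

import java.time.{Instant, ZoneOffset}

val timestamp = "1469768809"

// Parse the epoch seconds and look at the wall-clock time in UTC.
val time = Instant.ofEpochSecond(timestamp.toLong).atZone(ZoneOffset.UTC)

val hour         = time.getHour        // 0-23
val minute       = time.getMinute      // 0-59
val minutesOfDay = hour * 60 + minute  // total minutes elapsed so far in the day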