Convert string epoch to string timestamp in Scala

I have a column ORDER_DATE holding an epoch timestamp as a string. How can I convert this column, with values like "1536309236032" (epoch time in milliseconds), to a string in the format 2018-09-07T14:03:56.032Z in Scala?
Currently I am using:
from_unixtime(input.col(ORDER_DATE), "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'")
but this converts incorrectly to 50668-08-21 01:10:00.000. Here, it inflates the year (the millisecond value is being interpreted as seconds) and always outputs 000 for the milliseconds.
I don't want to simply divide by 1000, as we would like to keep the result in milliseconds.

In the documentation, the definition of from_unixtime is as follows:
Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone in the given format.
It uses seconds and is thus not compatible with milliseconds, which is why the result is wrong. To convert the epoch timestamp while keeping the millisecond information, you can use concat:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder.getOrCreate()
import spark.implicits._

df.withColumn("time", concat(
  from_unixtime($"ORDER_DATE" / 1000, "yyyy-MM-dd'T'HH:mm:ss."),          // date and time down to whole seconds
  $"ORDER_DATE".substr(length($"ORDER_DATE") - 2, length($"ORDER_DATE")), // last 3 characters = milliseconds
  lit("Z")))
This works because the last 3 digits of the epoch timestamp are the same as the milliseconds in the wanted result.

I got the idea from @Shaido and did something similar. Finally, this solved the issue for me:
input.withColumn("time",
  concat(
    from_unixtime(input.col("ORDER_DATE") / 1000, "yyyy-MM-dd'T'HH:mm:ss"),
    typedLit("."),
    substring(input.col("ORDER_DATE"), 11, 3),  // characters 11-13 of the 13-digit epoch string = milliseconds
    typedLit("Z")))

Related

Unix timestamp granularity changed to hours instead of milliseconds

I have a Spark data frame with the column timestamp. I need to create event_hour in unix_timestamp format out of this column. The current issue is that the timestamp is in unix_timestamp format with a granularity of seconds, while I need a granularity of hours.
Current values for timestamp:
1653192037
1653192026
1653192025
1653192024
1653192023
1653192022
Expected values:
1653192000
1653195600
1653199200
1653202800
How can I achieve that using Spark functions?
I've already tried converting it to a timestamp and then formatting it, but I got null as the result:
inputDf
.withColumn("event_hour", unix_timestamp(date_format($"timestamp".cast(TimestampType), "MM-dd-yyyy HH")))
A (not very explicit, but) efficient way is to use the modulus operation with 3600 (since 3600 seconds = 1 hour):
timestamp_hour = timestamp_second - (timestamp_second % 3600)
This assumes you are manipulating the data as a numeric type.
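A minimal sketch of how this could look with the DataFrame API, assuming timestamp is already a numeric column of epoch seconds (inputDf is the frame from the question):
import org.apache.spark.sql.functions.col

// subtract the remainder so each value snaps back to the start of its hour
val withHour = inputDf.withColumn(
  "event_hour",
  col("timestamp") - (col("timestamp") % 3600))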
You can use the DateUtils API:
import java.util.Calendar;
import org.apache.commons.lang3.time.DateUtils;
Long epochTimestamp_hour = DateUtils.truncate(Timestamp_column, Calendar.HOUR).getTime();
create a new column of type timestamp
use that column to truncate the timestamp to epochTimestamp_hour (see the sketch below)
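One possible way to wire those two steps into Spark (a sketch, not from the original answer; the UDF name and the conversion back to epoch seconds are assumptions):
import java.sql.Timestamp
import java.util.Calendar
import org.apache.commons.lang3.time.DateUtils
import org.apache.spark.sql.functions.{col, udf}

// build a java.sql.Timestamp from epoch seconds, truncate it to the hour,
// and convert the result back to epoch seconds
val truncateToHour = udf { epochSeconds: Long =>
  DateUtils.truncate(new Timestamp(epochSeconds * 1000L), Calendar.HOUR).getTime / 1000L
}

val withHour = inputDf.withColumn("event_hour", truncateToHour(col("timestamp")))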

Convert using unixtimestamp to Date

I have a field in a dataframe, a column with values like 1632838270314, for example.
I want to convert it to a date like 'yyyy-MM-dd'. I have this so far, but it doesn't work:
date = df['createdOn'].cast(StringType())
df = df.withColumn('date_key',unix_timestamp(date),'yyyy-MM-dd').cast("date"))
createdOn is the field that date_key is derived from.
The method unix_timestamp() is for converting a timestamp or date string into the number of seconds since 01-01-1970 ("epoch"). I understand that you want to do the opposite.
Your example value "1632838270314" seems to be milliseconds since epoch.
Here you can simply cast it after converting from milliseconds to seconds:
from pyspark.sql import Row
from pyspark.sql import functions as F

df = sql_context.createDataFrame([
    Row(unix_in_ms=1632838270314),
])
(
    df
    .withColumn('timestamp_type', (F.col('unix_in_ms')/1e3).cast('timestamp'))
    .withColumn('date_type', F.to_date('timestamp_type'))
    .withColumn('string_type', F.col('date_type').cast('string'))
    .withColumn('date_to_unix_in_s', F.unix_timestamp('string_type', 'yyyy-MM-dd'))
    .show(truncate=False)
)
# Output
+-------------+-----------------------+----------+-----------+-----------------+
|unix_in_ms |timestamp_type |date_type |string_type|date_to_unix_in_s|
+-------------+-----------------------+----------+-----------+-----------------+
|1632838270314|2021-09-28 16:11:10.314|2021-09-28|2021-09-28 |1632780000 |
+-------------+-----------------------+----------+-----------+-----------------+
You can combine the conversion into a single command:
df.withColumn('date_key', F.to_date((F.col('unix_in_ms')/1e3).cast('timestamp')).cast('string'))
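Since the rest of this page is Scala, a rough Scala equivalent of that one-liner might look like this (a sketch; the column names follow the question):
import org.apache.spark.sql.functions.{col, to_date}

// milliseconds -> seconds -> timestamp -> date -> string
val result = df.withColumn("date_key",
  to_date((col("createdOn") / 1000).cast("timestamp")).cast("string"))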

Scala/Java joda.time not converting date in 24 hours format

I am trying to convert a long UTC value into a "yyyy-MM-dd HH:mm:ss" formatted pattern. I expect my data to be formatted on a 24-hour scale and in GMT. My code passes all the test cases, and I push the data into the database using the jar newly built with this code:
dbRecord("order_dt_utc") = if (orderTs.isDefined) Some(new DateTime(orderTs.get, DateTimeZone.UTC).toString("yyyy-MM-dd HH:mm:ss")) else None
and now, when I query my database, I find that the data is still rendered on a 12-hour scale. The query:
SELECT order_id, order_dt, order_dt_utc, order_ts_utc, from_unixtime(order_ts_utc/1000) FROM order_items where order_dt >= '2018-08-01' AND order_dt <= '2018-08-02' ORDER BY order_dt_utc LIMIT 1000;
And you can see that the values do not match between the columns from_unixtime(order_ts_utc/1000) and order_dt_utc.
I am not able to figure out the reason for this behaviour.
To convert the time zone, first use the function:
CONVERT_TZ(dateobj, oldtz, newtz)
After that, use the date_format function:
date_format(from_unixtime(order_ts_utc), '%Y-%m-%d %H:%i:%s');
to format the time with hours on a 00-23 scale (%H is the 24-hour hour field).
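For what it's worth, on the Scala/Joda-Time side the pattern "HH" used in the question already formats hours on a 24-hour clock; a minimal standalone check (the epoch value below is made up for illustration):
import org.joda.time.{DateTime, DateTimeZone}

// "HH" is the 24-hour field; "hh" would give a 12-hour clock
val orderTs = 1533147900000L  // hypothetical epoch milliseconds
val formatted = new DateTime(orderTs, DateTimeZone.UTC).toString("yyyy-MM-dd HH:mm:ss")
println(formatted)  // 2018-08-01 18:25:00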

Always get "1970" when extracting a year from timestamp

I have a timestamp like "1461819600". Then I execute this code in a distributed environment as val campaign_startdate_year: String = Utils.getYear(campaign_startdate_timestamp).toString
The problem is that I always get the same year, 1970. What might be the reason for it?
import com.github.nscala_time.time.Imports._

def getYear(timestamp: Any): Int = {
  var dt = 2017
  if (!timestamp.toString.isEmpty) {
    dt = new DateTime(timestamp.toString.toLong).getYear // toLong should be multiplied by 1000 to get a millisecond value
  }
  dt
}
The same issue occurs when I want to get the day of the month: I get 17 instead of 28.
def getDay(timestamp: Any): Int = {
  var dt = 1
  if (!timestamp.toString.isEmpty) {
    dt = new DateTime(timestamp.toString.toLong).getDayOfYear
  }
  dt
}
The timestamp you have is a number of seconds since 01-01-1970, 00:00:00 UTC.
Java (and Scala) usually use timestamps that are a number of milliseconds since 01-01-1970, 00:00:00 UTC.
In other words, you need to multiply the number by 1000.
The timestamp that you have seems to be in seconds since the epoch (i.e. a Unix timestamp). Java time utilities expect the timestamp to be in milliseconds.
Just multiply that value by 1000 and you should get the expected results.
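For example, a corrected getYear along those lines could look like this (a sketch; the fallback value follows the original code):
import com.github.nscala_time.time.Imports._

def getYear(timestamp: Any): Int = {
  val s = timestamp.toString
  if (s.nonEmpty) new DateTime(s.toLong * 1000L).getYear // seconds -> milliseconds
  else 2017
}

getYear("1461819600") // 2016
Note that getDay in the question uses getDayOfYear; if the expected value 28 is meant to be the day of the month, getDayOfMonth seems to be the field to use after the same multiplication.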
You can rely either on Spark SQL functions, which include some date utilities (get year/month/day, add day/month), or use the JodaTime library to have more control over Date and DateTime, as in my answer here: How to replace in values in spark dataframes after recalculations?
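For the Spark SQL route, something along these lines could work (a sketch; the column name follows the question, the DataFrame name is assumed):
import org.apache.spark.sql.functions.{col, dayofmonth, from_unixtime, year}

val withParts = df
  .withColumn("start_ts", from_unixtime(col("campaign_startdate_timestamp"))) // seconds -> "yyyy-MM-dd HH:mm:ss"
  .withColumn("start_year", year(col("start_ts")))
  .withColumn("start_day", dayofmonth(col("start_ts")))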

Convert String timestamp to total minutes

I have a Unix timestamp as a String and I would like to extract the hour and minutes in order to convert this timestamp into total minutes.
val timestamp = "1469768809"
It would be straightforward if timestamp were not a String (i.e. using timestamp.get(Calendar.HOUR_OF_DAY)). However, I don't know how to deal with a String.
Looks like you have epoch time in seconds. Convert the string to an Int, then do the necessary calculation:
val minutes = "1469768809".toInt / 60
Unless what you actually want is a datetime, in which case you should look into some of the date/time libraries for Scala, e.g. nscala-time.
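If the goal is the hour-and-minute reading the question hints at (minutes since midnight rather than minutes since the epoch), here is a sketch using java.util.Calendar, assuming UTC as the time zone:
import java.util.{Calendar, TimeZone}

val timestamp = "1469768809"
val cal = Calendar.getInstance(TimeZone.getTimeZone("UTC"))
cal.setTimeInMillis(timestamp.toLong * 1000L) // seconds -> milliseconds

val minutesOfDay = cal.get(Calendar.HOUR_OF_DAY) * 60 + cal.get(Calendar.MINUTE)
// for this value the UTC time of day is 05:06:49, so minutesOfDay = 5 * 60 + 6 = 306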