Convert String timestamp to total minutes - scala

I have a Unix timestamp as a String and I would like to extract the hour and minutes in order to convert this timestamp into total minutes.
val timestamp = "1469768809"
It would be straightforward if timestamp were not a String (e.g. using timestamp.get(Calendar.HOUR_OF_DAY) on a Calendar). However, I don't know how to deal with a String.

Looks like you have epoch time in seconds. Convert the string to a Long (an Int would overflow for timestamps after January 2038), then do the necessary arithmetic:
val minutes = "1469768809".toLong / 60
Unless what you actually want is a datetime, in which case you should look into one of the date/time libraries for Scala, e.g. nscala-time.
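If what you actually need is the hour and minute (e.g. minutes since midnight) rather than total minutes since the epoch, a minimal sketch using java.time, assuming the value should be read as UTC:

import java.time.{Instant, ZoneOffset}

val timestamp = "1469768809"
// Interpret the string as epoch seconds and view it as a UTC wall-clock time.
val time = Instant.ofEpochSecond(timestamp.toLong).atZone(ZoneOffset.UTC)
val minutesSinceMidnight = time.getHour * 60 + time.getMinute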

Related

Unix timestamp granularity changed to hours instead of milliseconds

I have a Spark data frame with the column timestamp. I need to create event_hour in unix_timestamp format out of this column. The current issue is that the timestamp has a granularity of seconds, while I need a granularity of hours.
Current values for timestamp:
1653192037
1653192026
1653192025
1653192024
1653192023
1653192022
Expected values:
1653192000
1653195600
1653199200
1653202800
How can I achieve that using Spark functions?
I've already tried converting it to a timestamp and then formatting it, but I got null as the result:
inputDf
  .withColumn("event_hour", unix_timestamp(date_format($"timestamp".cast(TimestampType), "MM-dd-yyyy HH")))
(The attempt above returns null because unix_timestamp without an explicit format argument expects yyyy-MM-dd HH:mm:ss, not MM-dd-yyyy HH.) A not very explicit, but efficient, way is a modulo operation with 3600 (since 3600 seconds = 1 hour):
timestamp_hour = timestamp_second - (timestamp_second % 3600)
This assumes you are manipulating data as numeric.
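In Spark, that arithmetic maps directly onto column expressions. A minimal sketch, reusing inputDf and the timestamp column from the question:

import org.apache.spark.sql.functions.col

// Subtract the remainder after dividing by 3600 to land on the hour boundary.
val withEventHour = inputDf.withColumn(
  "event_hour",
  col("timestamp") - (col("timestamp") % 3600))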
You can use the DateUtils API:
import org.apache.commons.lang3.time.DateUtils;
import java.util.Calendar;

Long epochTimestamp_hour = DateUtils.truncate(Timestamp_column, Calendar.HOUR).getTime();
create a new column of type timestamp
use that column to truncate the timestamp to epochTimestamp_hour
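Since DateUtils operates on java.util.Date values rather than Spark columns, one way to apply it row by row is a UDF. A sketch, assuming the column holds epoch seconds and reusing the inputDf name from the question:

import java.util.{Calendar, Date}
import org.apache.commons.lang3.time.DateUtils
import org.apache.spark.sql.functions.{col, udf}

// Truncate epoch seconds to the start of the hour via DateUtils.
// Note: DateUtils.truncate uses the JVM default time zone, so zones with a
// non-whole-hour offset (e.g. +05:30) will not match a pure UTC truncation.
val truncateToHour = udf { (epochSeconds: Long) =>
  DateUtils.truncate(new Date(epochSeconds * 1000L), Calendar.HOUR).getTime / 1000L
}
val withEventHour = inputDf.withColumn("event_hour", truncateToHour(col("timestamp")))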

How to run the postgres query with date as input on the column with timestamp in long format

I want to query a Postgres database table that has a column with the timestamp stored as a long in milliseconds. But I have the time as a date string in the format "yyyy-MM-dd HH:mm:ssZ". How can I convert this date format to long milliseconds in order to run the query?
You can either convert your long value to a proper timestamp:
select *
from the_table
where to_timestamp(the_millisecond_column / 1000) = timestamp '2020-10-05 07:42'
Or extract the epoch seconds from the timestamp value and scale to milliseconds:
select *
from the_table
where the_millisecond_column = extract(epoch from timestamp '2020-10-05 07:42') * 1000
The better solution, however, is to convert that column to a proper timestamp column, to avoid the constant conversion between milliseconds and proper timestamp values.
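If you would rather compute the millisecond value on the client and pass it as a parameter, a minimal Scala sketch using java.time (the pattern comes from the question; the input value is illustrative, with Z standing for an offset such as +0000):

import java.time.OffsetDateTime
import java.time.format.DateTimeFormatter

// Parse "yyyy-MM-dd HH:mm:ssZ" and convert it to epoch milliseconds
// for comparison against the long column.
val formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ssZ")
val millis = OffsetDateTime.parse("2020-10-05 07:42:00+0000", formatter)
  .toInstant
  .toEpochMilli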

Scala/Java joda.time not converting date in 24 hours format

I am trying to convert a long UTC value into the "yyyy-MM-dd HH:mm:ss" pattern. I expect the data to be converted on a 24-hour scale and in GMT. My code passes all the test cases, and I push the data into the database using the jar newly built with this code:
dbRecord("order_dt_utc") = if (orderTs.isDefined) Some(new DateTime(orderTs.get, DateTimeZone.UTC).toString("yyyy-MM-dd HH:mm:ss")) else None
And now, when I query my database, I find that the data is still being rendered on a 12-hour scale. The query:
SELECT order_id, order_dt, order_dt_utc, order_ts_utc, from_unixtime(order_ts_utc/1000) FROM order_items where order_dt >= '2018-08-01' AND order_dt <= '2018-08-02' ORDER BY order_dt_utc LIMIT 1000;
As you can see, the values in the columns from_unixtime(order_ts_utc/1000) and order_dt_utc do not match. I am not able to figure out the reason for this behaviour.
First convert the time zone using:
CONVERT_TZ(dateobj, oldtz, newtz)
After that, use the date_format function:
date_format(from_unixtime(order_ts_utc / 1000), '%Y-%m-%d %H:%i:%s');
to format your time to the 00-23 range. (Note: order_ts_utc is stored in milliseconds, hence the division by 1000, just as in the query above.)
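On the Scala side, a quick sanity check that the Joda-Time pattern itself is 24-hour: HH renders 00-23, whereas a lowercase hh would give the 12-hour 01-12 range. The epoch value below is illustrative:

import org.joda.time.{DateTime, DateTimeZone}

// 2018-08-01T13:00:00Z in epoch milliseconds (illustrative value)
val millis = 1533128400000L
val formatted = new DateTime(millis, DateTimeZone.UTC).toString("yyyy-MM-dd HH:mm:ss")
// "2018-08-01 13:00:00" -- an afternoon hour rendered as 13, not 01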

Convert string epoch to string timestamp in Scala

I have a column ORDER_DATE with an epoch timestamp as a string. How can I convert this column, holding strings like str = "1536309236032" (epoch time in milliseconds), to a string with the format 2018-09-07T14:03:56.032Z in Scala?
Currently I am using:
from_unixtime(input.col(ORDER_DATE), "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'")
but this converts incorrectly to 50668-08-21 01:10:00.000. It overshoots the year and pads the milliseconds with 000, because the value is treated as seconds. I don't want to simply divide by 1000, as we would like to keep the milliseconds in the result.
In the documentation, the definition of from_unixtime is as follows:
Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone in the given format.
It expects seconds and is thus not compatible with milliseconds, which is why the result is wrong. To convert the epoch timestamp while keeping the millisecond information, you can use concat:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder.getOrCreate()
import spark.implicits._

df.withColumn("time", concat(
  from_unixtime($"ORDER_DATE" / 1000, "yyyy-MM-dd'T'HH:mm:ss."),
  $"ORDER_DATE".substr(length($"ORDER_DATE") - 2, length($"ORDER_DATE")),
  lit("Z")))
This works because the last three digits of the epoch timestamp are the same as the milliseconds in the wanted result.
I got the idea from @Shaido and did something similar. Finally, this solved the issue for me:
input.withColumn("time",
  concat(
    from_unixtime(input.col("ORDER_DATE") / 1000, "yyyy-MM-dd'T'HH:mm:ss"),
    typedLit("."),
    substring(input.col("ORDER_DATE"), 11, 3),
    typedLit("Z")))
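An alternative sketch without the string surgery, using java.time instead of Spark functions: parse the epoch milliseconds directly and format with millisecond precision (the input string comes from the question):

import java.time.Instant
import java.time.ZoneOffset
import java.time.format.DateTimeFormatter

val str = "1536309236032"
// Instant.ofEpochMilli keeps the milliseconds; SSS renders them in the output.
val formatted = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'")
  .withZone(ZoneOffset.UTC)
  .format(Instant.ofEpochMilli(str.toLong))
// "2018-09-07T14:03:56.032Z"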

Always get "1970" when extracting a year from timestamp

I have a timestamp like "1461819600". Then I execute this code in a distributed environment as val campaign_startdate_year: String = Utils.getYear(campaign_startdate_timestamp).toString
The problem is that I always get the same year, 1970. What might be the reason for it?
import com.github.nscala_time.time.Imports._

def getYear(timestamp: Any): Int = {
  var dt = 2017
  if (!timestamp.toString.isEmpty) {
    dt = new DateTime(timestamp.toString.toLong).getYear // toLong should be multiplied by 1000 to get a millisecond value
  }
  dt
}
The same issue occurs when I want to get the day of the month: I get 17 instead of 28.
def getDay(timestamp: Any): Int = {
  var dt = 1
  if (!timestamp.toString.isEmpty) {
    dt = new DateTime(timestamp.toString.toLong).getDayOfYear
  }
  dt
}
The timestamp you have is a number of seconds since 01-01-1970, 00:00:00 UTC.
Java (and Scala) usually use timestamps that are a number of milliseconds since 01-01-1970, 00:00:00 UTC.
In other words, you need to multiply the number by 1000.
The timestamp that you have seems to be in seconds since the epoch (i.e. a Unix timestamp). Java time utilities expect the timestamp to be in milliseconds.
Just multiply that value by 1000 and you should get the expected results.
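Applying that fix to the helper above, a minimal sketch (keeping the original fallback value):

import com.github.nscala_time.time.Imports._

def getYear(timestamp: Any): Int =
  if (timestamp.toString.nonEmpty)
    new DateTime(timestamp.toString.toLong * 1000L).getYear // seconds -> milliseconds
  else 2017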
You can rely either on Spark SQL functions, which have some date utilities (get year/month/day, add day/month), or use the JodaTime library to have more control over Date and DateTime, as in my answer here: How to replace in values in spark dataframes after recalculations?
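A sketch of the Spark SQL route (from_unixtime expects epoch seconds, so no multiplication is needed there; df and the column name are reused from the question):

import org.apache.spark.sql.functions.{col, dayofmonth, from_unixtime, year}

// from_unixtime yields a "yyyy-MM-dd HH:mm:ss" string that
// year()/dayofmonth() can consume directly.
val withParts = df
  .withColumn("ts", from_unixtime(col("campaign_startdate_timestamp")))
  .withColumn("year", year(col("ts")))
  .withColumn("day_of_month", dayofmonth(col("ts")))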