Unix timestamp in Hive gives the same result for different dates

I am using Hive 1.2.1000.
I am working on conversion to unix timestamps. I'm trying to convert dates with the format:
dd/mm/yyyy hh:mm:ss
to a unix timestamp, hence:
unix_timestamp(date,"dd-mm-yyyy hh:mm:ss")
has been used.
More precisely, I have run the following code:
select unix_timestamp(regexp_replace('09/06/2012 04:02:32',"/","-"),"dd-mm-yyyy hh:mm:ss")
which seems to work; in fact, the result is 1326081752.
I have also noticed that I have N distinct dates, but the number of distinct unix timestamps over these dates is M, with M < N.
Doing some manipulation of the data, I have seen that there are different dates with the same unix timestamp.
Hence I dug deeper and found a lot of such pairs, for example
09/06/2012 04:02:32
and
09/12/2012 04:02:32
Now, if I try to run the following code:
select
'09/06/2012 04:02:32',
regexp_replace('09/06/2012 04:02:32',"/","-"),
unix_timestamp(regexp_replace('09/06/2012 04:02:32',"/","-"),"dd-mm-yyyy hh:mm:ss"),
unix_timestamp('09-06-2012 04:02:32',"dd-mm-yyyy hh:mm:ss")
UNION ALL
select '09/12/2012 04:02:32',
regexp_replace('09/12/2012 04:02:32',"/","-"),
unix_timestamp(regexp_replace('09/12/2012 04:02:32',"/","-"),"dd-mm-yyyy hh:mm:ss"),
unix_timestamp('09-12-2012 04:02:32',"dd-mm-yyyy hh:mm:ss")
That's the output:
09/06/2012 04:02:32   09-06-2012 04:02:32   1326081752   1326081752
09/12/2012 04:02:32   09-12-2012 04:02:32   1326081752   1326081752
The timestamps are clearly the same.
This result extends to all dates that are identical except for the value in the mm (month) position.
Could you explain why?
Thanks in advance,
Manuel
P.S. I have also tried dates in other formats, for example:
select '2012-06-09 04:02:32', unix_timestamp(regexp_replace('2012-06-09 04:02:32',"/","-"),"yyyy-mm-dd hh:mm:ss")
UNION ALL
select '2012-12-09 04:02:32', unix_timestamp(regexp_replace('2012-12-09 04:02:32',"/","-"),"yyyy-mm-dd hh:mm:ss")
But the result is the same.

The problem was that mm stands for minutes in the Java SimpleDateFormat patterns Hive uses, while months are MM. With dd-mm-yyyy hh:mm:ss, the month digits are parsed as minutes and then overwritten by the actual minutes field, so the month is effectively ignored and every date collapses into January. Using
dd-MM-yyyy hh:mm:ss
instead of
dd-mm-yyyy hh:mm:ss
was the solution to the problem.
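As a minimal sketch (reusing the example dates from the question), the corrected pattern now tells the two rows apart; HH is used for the hour as well, since in SimpleDateFormat hh is the 12-hour clock and HH the 24-hour clock:
select
unix_timestamp('09-06-2012 04:02:32',"dd-MM-yyyy HH:mm:ss"), -- 9 June 2012
unix_timestamp('09-12-2012 04:02:32',"dd-MM-yyyy HH:mm:ss")  -- 9 December 2012
The two results now differ, because the month digits are actually parsed as a month instead of being discarded.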

Related

pyspark: How to filter rows based on the HH:mm:ss portion of a timestamp column

I have a dataframe in pyspark with a string timestamp column in the following format:
"11/21/2018 07:21:49 PM"
This is a 12-hour format (with AM/PM).
I want to filter the rows of the dataframe based on only the time portion of this string timestamp, regardless of the date. For example, I want to keep all rows that fall between the hours of 2:00 PM and 4:00 PM inclusive.
I tried the code below to extract the HH:mm:ss part and use the between function, but it is not working.
# Grabbing only the time portion from the datetime column
import pyspark.sql.functions as F
time_format = "HH:mm:ss"
split_col = F.split(df['datetime'], ' ')
df = df.withColumn('Time', F.concat(split_col.getItem(1), F.lit(' '), split_col.getItem(2)))
df = df.withColumn('Timestamp', F.from_unixtime(F.unix_timestamp('Time', format=time_format)))
df.filter(F.col("Timestamp").between('14:00:00', '16:00:00')).show()
Any ideas on how to filter rows based only on the HH:mm:ss portion of a timestamp column, regardless of the actual date, would be much appreciated.
Format your timestamp to HH:mm:ss, then filter using the between clause.
Example:
df=spark.createDataFrame([("11/21/2018 07:21:49 PM",),("11/22/2018 04:21:49 PM",),("11/23/2018 12:21:49 PM",)],["ts"])
from pyspark.sql.functions import *
df.withColumn("tt",from_unixtime(unix_timestamp(col("ts"),"MM/dd/yyyy hh:mm:ss a"),"HH:mm:ss")).\
filter(col("tt").between("12:00","16:00")).\
show()
#+----------------------+--------+
#|ts |tt |
#+----------------------+--------+
#|11/23/2018 12:21:49 PM|12:21:49|
#+----------------------+--------+

Unsure of date format 1200819 but need to convert to 08-19-20

The column is an nvarchar(20).
select [shipment_posted_date_arch]
FROM [RxIntegrity].[dbo].[DiscrepancyReport_Receipts]
When I pull this, the value in that column looks like 1200819. I need to convert it to a normal date.
This seemed to work:
cast(convert(nvarchar(20), (19000000 + CONVERT(int, shipment_posted_date_arch))) as date)
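This works because 1200819 appears to be the CYYMMDD layout (a century flag, then year, month, day) used by some AS/400-style systems: adding 19000000 turns it into 20200819, which SQL Server can cast to a date directly. A minimal sketch of the arithmetic, using a hypothetical variable:
declare @raw nvarchar(20) = '1200819';
-- 19000000 + 1200819 = 20200819, i.e. yyyyMMdd
select cast(convert(nvarchar(20), 19000000 + convert(int, @raw)) as date); -- 2020-08-19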

UNIXTIMEFORMAT formula issue

I am trying to convert a timestamp value in my data to 'yyyy-mm-dd HH:mm:ss' using the UNIXTIMEFORMAT formula, but it is giving a wrong result.
Unix time 1592574691 should translate to 2020-06-19 17:51:31, but Dataprep is converting it to 1970-22-19 10:22:54.
Your click_time is in seconds, but UNIXTIMEFORMAT expects milliseconds. Try multiplying your column by 1000:
UNIXTIMEFORMAT($col * 1000, 'yyyy-mm-dd HH:mm:ss')
Note also the 22 in the month position of 1970-22-19: as in the Hive question above, mm means minutes, so the month token should probably be MM, i.e. 'yyyy-MM-dd HH:mm:ss'.

pyspark: Convert string to date format without hour, minute and second

Hello, I would like to convert a string date to date format:
for example from 190424 to 2019-01-24
I tried this code:
from pyspark.sql.functions import from_unixtime, unix_timestamp

tx_wd_df = tx_wd_df.select(
    'dateTransmission',
    from_unixtime(unix_timestamp('dateTransmission', 'yymmdd')).alias('dateTransmissionDATE')
)
But I got this format: 2019-01-24 00:04:00
I would like only 2019-01-24
Any idea please?
Thanks
You can simply use to_date(). It parses the string with the given format and returns only the date, discarding the time component.
import pyspark.sql.functions as F
date_column = "dateTransmission"
# MM because mm in Java Simple Date Format is minutes, and MM is months
date_format = "yyMMdd"
df = df.withColumn(date_column, F.to_date(F.col(date_column), date_format))
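The same conversion can be sketched in Spark SQL; note that with the corrected yyMMdd pattern, '190424' parses to 2019-04-24 (not 2019-01-24), because 04 is now read as the month rather than as minutes:
select to_date('190424', 'yyMMdd'); -- 2019-04-24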

Scala/Java joda.time not converting date in 24 hours format

I am trying to convert a long UTC value into the "yyyy-MM-dd HH:mm:ss" pattern. I expect my data to be converted on a 24-hour scale and in GMT. My code passes all the test cases, and I push the data into the database using a jar freshly built with this code:
dbRecord("order_dt_utc") = if (orderTs.isDefined) Some(new DateTime(orderTs.get, DateTimeZone.UTC).toString("yyyy-MM-dd HH:mm:ss")) else None
Now, when I query my database, I find that the data is still being formatted on a 12-hour scale. The query:
SELECT order_id, order_dt, order_dt_utc, order_ts_utc, from_unixtime(order_ts_utc/1000) FROM order_items where order_dt >= '2018-08-01' AND order_dt <= '2018-08-02' ORDER BY order_dt_utc LIMIT 1000;
You can see that the values in the from_unixtime(order_ts_utc/1000) and order_dt_utc columns do not match.
I am not able to figure out the reason for this behaviour.
To convert the time zone, first use the function:
CONVERT_TZ(dateobj, oldtz, newtz)
After that, use the date_format function:
date_format(from_unixtime(order_ts_utc / 1000), '%Y-%m-%d %H:%i:%s');
%H formats the hour on the 00-23 range (%h would give 01-12).
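Putting the two together, a minimal sketch (assuming the table and columns from the question, that order_ts_utc is in milliseconds, that the session time zone is UTC, that the MySQL time zone tables are loaded, and with 'America/New_York' as a placeholder target zone):
SELECT order_id,
       date_format(convert_tz(from_unixtime(order_ts_utc / 1000), 'UTC', 'America/New_York'), '%Y-%m-%d %H:%i:%s') AS order_dt_local
FROM order_items
LIMIT 10;
Since %H is the 24-hour specifier, the formatted hours run from 00 to 23 regardless of AM/PM.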