How to generate current_timestamp() without timezone in PySpark?

I am trying to get the current_timestamp in a column in my dataframe. I am using the code below for that.
df_new = df.withColumn('LOAD_DATE_TIME', F.current_timestamp())
But this code generates LOAD_DATE_TIME in the format below when exported to a CSV file.
2019-11-19T16:59:44.000+05:30
I don't want the timezone part; I want the datetime in the format below.
2019-11-19 16:59:44
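One common approach (a sketch, not taken from the thread) is to format the timestamp as a string with date_format, whose pattern letters follow Java's SimpleDateFormat. The PySpark call can't run standalone here, so it is shown as a comment; the plain-Python snippet illustrates the same target pattern:

```python
# PySpark sketch (assumes F is pyspark.sql.functions):
#   df_new = df.withColumn(
#       'LOAD_DATE_TIME',
#       F.date_format(F.current_timestamp(), 'yyyy-MM-dd HH:mm:ss'))
#
# Plain-Python illustration of the target format (no timezone suffix):
from datetime import datetime

stamp = datetime(2019, 11, 19, 16, 59, 44)
formatted = stamp.strftime("%Y-%m-%d %H:%M:%S")
print(formatted)  # 2019-11-19 16:59:44
```

Note the trade-off: after date_format the column is a string, not a timestamp, which is usually what you want when exporting to CSV anyway.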

Related

PySpark: converting the GDATU date format to the normal date format yyyy-mm-dd?

I am wondering how to convert the GDATU fields from the TCURR table to the normal date format yyyy-mm-dd using PySpark.
I tried creating a new column using from_unixtime, but it doesn't seem right.
df = df.withColumn('GDATU_NEW', F.from_unixtime('GDATU', 'yyyy-mm-dd'))
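Two things worth checking here (my observations, not from the thread): in Spark's pattern syntax, lowercase mm means minutes, so 'yyyy-mm-dd' is almost certainly not the intended pattern (month is MM); and from_unixtime only makes sense if GDATU holds epoch seconds. If GDATU is instead a yyyymmdd-style date string, parsing it directly is closer. A plain-Python sketch of that parse, with an assumed yyyymmdd input:

```python
# Hypothetical GDATU value, assumed to be a yyyymmdd string:
from datetime import datetime

gdatu = "20191119"
parsed = datetime.strptime(gdatu, "%Y%m%d").strftime("%Y-%m-%d")
print(parsed)  # 2019-11-19

# PySpark equivalent (sketch): F.to_date(F.col('GDATU'), 'yyyyMMdd')
```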

Hive date (field name - trans_dt) conversion from "YYYYMMDD" to "YYYY-MM-DD" (field name - dt) and querying based on the newly created field

I am trying to query data for yesterday (or last 24 hrs) in Hive but the filter trans_dt>=date_sub(current_date(),1) only works when the trans_dt format is "YYYY-MM-DD". The current format of trans_dt is "YYYYMMDD" and so I am converting it to "YYYY-MM-DD" by using the below concat command.
select concat(substring(trans_dt,1,4),'-',substring(trans_dt,5,2),'-',substring(trans_dt,7,2)) as dt from xyz
But now I am not sure how to add the ">=date_sub(current_date(),1)" condition on the newly created field "dt". It's failing as "dt" is not a field in the table.
I am new to this querying in Hive and am not sure if I am totally off and there is an easier way? Please advise. All my online searches have only told me that there is no direct function in Hive to convert the date format from "YYYYMMDD" to "YYYY-MM-DD".
Have you tried this one?
CAST(CAST(trans_dt AS TIMESTAMP(0)) AS DATE format 'YYYY-MM-DD') as TRANS_DATE
Using CAST can be a solution for changing the date format.
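A separate wrinkle in the question (my note, not part of the answer above): most SQL engines, Hive included, don't let a WHERE clause reference a column alias defined in the same SELECT, which is why filtering on "dt" fails. Wrapping the conversion in a subquery sidesteps that. A sketch, with the table and column names taken from the question:

```python
# Hive SQL sketch, kept as a string so this snippet is self-contained:
query = """
SELECT dt
FROM (
  SELECT concat(substring(trans_dt,1,4), '-',
                substring(trans_dt,5,2), '-',
                substring(trans_dt,7,2)) AS dt
  FROM xyz
) t
WHERE dt >= date_sub(current_date(), 1)
"""

# Plain-Python illustration of the same YYYYMMDD -> YYYY-MM-DD slicing:
trans_dt = "20191119"
dt = f"{trans_dt[0:4]}-{trans_dt[4:6]}-{trans_dt[6:8]}"
print(dt)  # 2019-11-19
```

The string comparison in the filter works because yyyy-MM-dd strings sort lexicographically in date order.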

How would I convert spark scala dataframe column to datetime?

Say I have a dataframe with two columns, both of which need to be converted to datetime format. However, the current formatting of the columns varies from row to row, and when I apply the to_date method, I get all nulls returned.
Here's a screenshot of the format....
The code I tried is:
date_subset.select(col("InsertDate"), to_date(col("InsertDate")).as("to_date")).show()
which returned all nulls.
Your datetime is not in the default format, so you should supply the format:
to_date(col("InsertDate"), "MM/dd/yyyy HH:mm")
I don't know which part is the month and which is the day, but you can adjust the pattern accordingly.
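For reference (my illustration, with a made-up value since the screenshot isn't reproduced here), the Java-style pattern "MM/dd/yyyy HH:mm" corresponds to Python's "%m/%d/%Y %H:%M":

```python
from datetime import datetime

# Hypothetical InsertDate value matching the "MM/dd/yyyy HH:mm" pattern:
raw = "11/19/2019 16:59"
parsed = datetime.strptime(raw, "%m/%d/%Y %H:%M")
print(parsed.date())  # 2019-11-19
```

If the format genuinely varies from row to row, one pattern alone will still leave nulls for the non-matching rows; coalescing several to_date attempts with different patterns is one way around that.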

pyspark converting unix time to date

I am using the following code to convert a column of unix time values into dates in pyspark:
transactions3 = transactions2.withColumn('date', transactions2['time'].cast('date'))
The column transactions2['time'] contains the unix time values. However, the column date which I create here has no values in it (date = None for all rows). Any idea why this would be?
Use from_unixtime, e.g. expr("from_unixtime(time)"), where time is the column holding the epoch seconds.
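The reason the plain cast yields nulls (my understanding, not stated in the answer): casting a numeric epoch value straight to date doesn't interpret it as seconds since 1970 the way from_unixtime does. A plain-Python sketch of the conversion, with an assumed epoch value:

```python
from datetime import datetime, timezone

# Assumed unix-time value (seconds since the epoch):
time_val = 1574121600
as_date = datetime.fromtimestamp(time_val, tz=timezone.utc).strftime("%Y-%m-%d")
print(as_date)  # 2019-11-19

# PySpark equivalent (sketch):
#   transactions2.withColumn('date', F.expr("to_date(from_unixtime(time))"))
```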

Date Format Conversion in Hive

I'm very new to sql/hive. At first, I loaded a txt file into hive using:
drop table if exists Tran_data;
create table Tran_data(tran_time string,
resort string, settled double)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n';
Load data local inpath 'C:\Users\me\Documents\transaction_data.txt' into table Tran_Data;
The variable tran_time in the txt file looks like this: 10-APR-2014 15:01. After loading this Tran_data table, I tried to convert tran_time to a "standard" format so that I can join this table to another table, using tran_time as the join key. The desired date format is 'yyyymmdd'. I searched online resources and found this: unix_timestamp(substr(tran_time,1,11),'dd-MMM-yyyy')
So essentially, I'm doing this: unix_timestamp('10-APR-2014','dd-MMM-yyyy'). However, the output is "NULL".
So my question is: how to convert the date format to a "standard" format, and then further convert it to 'yyyymmdd' format?
from_unixtime(unix_timestamp('20150101' ,'yyyyMMdd'), 'yyyy-MM-dd')
My current Hive Version: Hive 0.12.0-cdh5.1.5
I converted datetime in first column to date in second column using the below hive date functions. Hope this helps!
select inp_dt, from_unixtime(unix_timestamp(substr(inp_dt,0,11),'dd-MMM-yyyy')) as todateformat from table;
inp_dt                 todateformat
12-Mar-2015 07:24:55   2015-03-12 00:00:00
The unix_timestamp function will convert a given string date to a unix timestamp in seconds, but not from a format like dd-mm-yyyy.
You would need to write your own custom UDF to convert a given string date to the format you need, as present Hive does not have a predefined function for it. We have the to_date function to convert a timestamp to a date; the remaining unix_timestamp variants won't help with your problem.
select from_unixtime(unix_timestamp('01032018' ,'MMddyyyy'), 'yyyyMMdd');
input format (MMddyyyy): 01032018
output after query (yyyyMMdd): 20180103
To help someone in the future:
The following function should work as it worked in my case
to_date(from_unixtime(unix_timestamp('10-APR-2014','dd-MMM-yyyy')))
unix_timestamp('2014-05-01','yyyy-MM-dd') will work; your input string should be in the format yyyy-MM-dd or yyyy-MM-dd HH:mm:ss for Hive.
Whereas if you try with '01-MAY-2014', Hive won't understand it as a date string.
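To tie the answers above together: the 'dd-MMM-yyyy' pattern maps to Python's "%d-%b-%Y" (which accepts the upper-case month abbreviation), and the desired yyyymmdd output is then one strftime away. A plain-Python sketch, using the tran_time value from the question:

```python
from datetime import datetime

tran_time = "10-APR-2014 15:01"
# Parse the first 11 characters ('10-APR-2014'), then emit yyyymmdd:
standard = datetime.strptime(tran_time[:11], "%d-%b-%Y").strftime("%Y%m%d")
print(standard)  # 20140410
```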