How would I convert spark scala dataframe column to datetime?

Say I have a dataframe with two columns, both of which need to be converted to datetime format. However, the current formatting of the columns varies from row to row, and when I apply the to_date method, I get all nulls returned.
Here's a screenshot of the format. The code I tried is:
date_subset.select(col("InsertDate"),to_date(col("InsertDate")).as("to_date")).show()
which returned all nulls.

Your datetime is not in the default format, so you should give the format explicitly:
to_date(col("InsertDate"), "MM/dd/yyyy HH:mm")
I don't know which part is the month and which is the day, but you can do it this way.
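Applied to the original select, a minimal sketch looks like this (assuming the month comes first; swap MM and dd if not):
import org.apache.spark.sql.functions.{col, to_date}

date_subset.select(
  col("InsertDate"),
  // an explicit pattern keeps to_date from returning null on non-default formats
  to_date(col("InsertDate"), "MM/dd/yyyy HH:mm").as("to_date")
).show()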

Related

Converting string to date, datetime or Int in Mapping dataflow

I have a parquet file with start_date and end_date columns,
formatted like this:
01-Jan-2021
I've tried every combination of the toDate, toString, and toInteger conversion functions, but I still get nulls returned when viewing the data (see image).
I would like to see the result in two ways: YYYYMMDD as an integer column and YYYY-MM-DD as a date column.
eg 01012021 and 01-01-2021
I'm sure the default format has caused this issue.
Thanks
First, for the date formatter, you need to tell ADF what each part of your string represents. Use dd-MMM-yyyy for your format. Then use a string formatter to manipulate the output, like so: toString(toDate('01-Jan-2021', 'dd-MMM-yyyy'), 'yyyy-MM-dd')
For the integer representation: toInteger(toString(toDate('01-Jan-2021', 'dd-MMM-yyyy'), 'yyyyMMdd'))
Ah, you say *"I would like to see the result in two ways YYYYMMDD as integer column and YYYY-MM-DD as Date columns. eg 01012021 and 01-01-2021"* Do you want YYYYMMDD or DDMMYYYY? Your example is in the latter format.
Anyway, here are expressions you could use in a derived column transformation (note that in data flow expressions MM means month while mm means minute):
start_date_toInteger: toString(toDate(substring(start_date,1,11), 'dd-MMM-yyyy'), 'yyyyMMdd')
start_date_toDate: toString(toDate(substring(start_date,1,11), 'dd-MMM-yyyy'), 'yyyy-MM-dd')
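For reference, the same transformation sketched with Spark Scala DataFrame functions (mapping data flows execute on Spark; df and the start_date column name are assumptions taken from the question):
import org.apache.spark.sql.functions._

df.withColumn("parsed", to_date(substring(col("start_date"), 1, 11), "dd-MMM-yyyy"))
  // integer column in yyyyMMdd form, e.g. 20210101
  .withColumn("start_date_toInteger", date_format(col("parsed"), "yyyyMMdd").cast("int"))
  // a true date column; Spark renders dates as yyyy-MM-dd
  .withColumn("start_date_toDate", col("parsed"))
  .drop("parsed")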

If a column has dates in multiple formats, get the last date of the month for a specific date format

I have a Spark data frame with two columns (SEQ: Integer, MAIN_DATE: Date):
+---+-----------+
|SEQ|  MAIN_DATE|
+---+-----------+
|  1|16-JAN-2020|
|  2|   FEB-2017|
+---+-----------+
Now I want to add a column based on the condition that if the format of MAIN_DATE is "MMM-YYYY" it should be converted to the last day of that month, so that the new data frame looks like the END_DATE column in the output further below.
Any suggestion will be much appreciated.
You can use Spark's when/otherwise methods in order to operate differently on each date format found in the MAIN_DATE column.
More specifically, you can match the MMM-yyyy values of the column based on the field's string length (since we know those values always have 8 characters) as the condition in when, and then:
use to_date to convert the string value to a valid date based on a format we give as an argument, and
use last_day to get the last day of the month that each date in MAIN_DATE refers to.
As for the "regular" rows with the dd-MMM-yyyy date format, a plain to_date conversion inside the otherwise method is sufficient.
After that, all that's left to do is convert the dates back to the desired dd-MMM-yyyy format (because to_date produces dates in the yyyy-MM-dd format).
This is the solution in Scala (split into two withColumn calls to make it more readable, instead of a one-liner):
df.withColumn("END_DATE",
when(length(col("MAIN_DATE")).equalTo(8), last_day(to_date(col("MAIN_DATE"), "MMM-yyyy")))
.otherwise(to_date(col("MAIN_DATE"), "dd-MMM-yyyy")))
.withColumn("END_DATE", date_format(col("END_DATE"), "dd-MMM-yyyy"))
This is what the resulting df DataFrame will look like:
+---+-----------+-----------+
|SEQ|  MAIN_DATE|   END_DATE|
+---+-----------+-----------+
|  1|16-JAN-2020|16-Jan-2020|
|  2|   FEB-2017|28-Feb-2017|
+---+-----------+-----------+

In Snowflake, how to convert one date format to another, from YYYYMMDD to YYYY-MON-DD

I have table ABC with a column Z of datatype DATE. The format of the data is YYYYMMDD. Now I am looking to convert the above format to YYYY-MON-DD format. Can someone help?
You can use to_char
TO_CHAR(Z,'YYYY-MON-DD')
Depending on what the purpose of the reformatting is, you can either explicitly cast it to a VARCHAR/CHAR and define the format, or you can change your display format to however you'd like to see all dates:
ALTER SESSION SET DATE_OUTPUT_FORMAT = 'YYYY-MON-DD';
It's important to understand that if the data is in a DATE field, then it is stored as a date, and the format of the date is dependent on your viewing preferences, not how it is stored.
Since the value of the date field is stored as a number, you have to convert it to date.
ALTER SESSION SET DATE_OUTPUT_FORMAT = 'YYYY-MON-DD';
select to_date(to_char( z ), 'YYYYMMDD');
(adding this answer to summarize and resolve the question - since the clues and answers are scattered through comments)
The question stated that column Z is of type DATE, but it really seems to be a NUMBER.
So before parsing a number like 20201017 into a date, you first need to convert it to a STRING.
Once the original number is parsed to a date, it can be represented as a new string formatted as desired.
WITH data AS (
  SELECT 20201017 AS z
)
SELECT TO_CHAR(TO_DATE(TO_CHAR(z), 'YYYYMMDD'), 'YYYY-MON-DD')
FROM data;
# 2020-Oct-17

pyspark converting unix time to date

I am using the following code to convert a column of unix time values into dates in pyspark:
transactions3=transactions2.withColumn('date', transactions2['time'].cast('date'))
The column transactions2['time'] contains the unix time values. However, the column date that I create here has no values in it (date = None for all rows). Any idea why this would be?
Casting epoch seconds straight to date gives null; use from_unixtime instead, e.g. expr("from_unixtime(time)"), then cast the result to date.
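A fuller sketch of the fix (written with the Scala API used elsewhere on this page; the PySpark functions have the same names, and this assumes the time column holds epoch seconds):
import org.apache.spark.sql.functions.{col, from_unixtime, to_date}

// from_unixtime turns epoch seconds into a timestamp string,
// which to_date then truncates to a date
val transactions3 = transactions2.withColumn(
  "date", to_date(from_unixtime(col("time"))))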

Date Format Conversion in Hive

I'm very new to SQL/Hive. At first, I loaded a txt file into Hive using:
drop table if exists Tran_data;
create table Tran_data(tran_time string,
resort string, settled double)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n';
Load data local inpath 'C:\Users\me\Documents\transaction_data.txt' into table Tran_Data;
The tran_time variable in the txt file looks like this: 10-APR-2014 15:01. After loading the Tran_data table, I tried to convert tran_time to a "standard" format so that I can join this table to another table using tran_time as the join key. The desired date format is 'yyyymmdd'. I searched online resources and found this: unix_timestamp(substr(tran_time,1,11),'dd-MMM-yyyy')
So essentially, I'm doing this: unix_timestamp('10-APR-2014','dd-MMM-yyyy'). However, the output is "NULL".
So my question is: how to convert the date format to a "standard" format, and then further convert it to 'yyyymmdd' format?
from_unixtime(unix_timestamp('20150101', 'yyyyMMdd'), 'yyyy-MM-dd')
My current Hive Version: Hive 0.12.0-cdh5.1.5
I converted the datetime in the first column to a date in the second column using the Hive date functions below. Hope this helps!
select inp_dt, from_unixtime(unix_timestamp(substr(inp_dt,0,11),'dd-MMM-yyyy')) as todateformat from table;
inp_dt                 todateformat
12-Mar-2015 07:24:55   2015-03-12 00:00:00
The unix_timestamp function will convert a given string date to a unix timestamp in seconds, but not one in a format like dd-mm-yyyy.
You would need to write your own custom UDF to convert a given string date to the format you need, as Hive presently has no predefined function for this. There is a to_date function to convert a timestamp to a date, but the remaining unix_timestamp variants won't help with your problem.
select from_unixtime(unix_timestamp('01032018', 'MMddyyyy'), 'yyyyMMdd');
input format (MMddyyyy): 01032018
output after query (yyyyMMdd): 20180103
To help someone in the future:
The following function should work, as it did in my case:
to_date(from_unixtime(unix_timestamp('10-APR-2014','dd-MMM-yyyy')))
unix_timestamp('2014-05-01') will work without an explicit pattern, because your input string has to be in the yyyy-MM-dd or yyyy-MM-dd HH:mm:ss format for Hive to parse it by default.
Whereas if you try with '01-MAY-2014', Hive won't understand it as a date string.
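For completeness, the same dd-MMM-yyyy to yyyymmdd conversion can be sketched with Spark Scala DataFrame functions (df and the tran_time column name are assumptions matching the table loaded above; on Spark 3, upper-case month abbreviations like APR may need spark.sql.legacy.timeParserPolicy=LEGACY to parse):
import org.apache.spark.sql.functions._

// parse the leading dd-MMM-yyyy portion, then re-render it as yyyyMMdd
df.withColumn("tran_date",
  date_format(to_date(substring(col("tran_time"), 1, 11), "dd-MMM-yyyy"), "yyyyMMdd"))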