I have a Parquet file with start_date and end_date columns, formatted like this:
01-Jan-2021
I've tried every combination of the toDate, toString, and toInteger conversion functions, but I still get nulls returned when viewing the data.
I would like to see the result in two ways: YYYYMMDD as an integer column and YYYY-MM-DD as a date column, e.g. 01012021 and 01-01-2021.
I'm sure the default format has caused this issue.
Thanks
First, for the date formatter, you need to tell ADF what each part of your string represents. Use dd-MMM-yyyy for your format. Then use a string formatter to manipulate the output, like this: toString(toDate('01-Jan-2021', 'dd-MMM-yyyy'), 'yyyy-MM-dd')
For the integer representation: toInteger(toString(toDate('01-Jan-2021', 'dd-MMM-yyyy'), 'yyyyMMdd'))
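In a Derived Column transformation you would apply the same pattern to the column itself rather than a literal. A minimal sketch, assuming the source column is named start_date and using hypothetical output column names:
start_date_int : toInteger(toString(toDate(start_date, 'dd-MMM-yyyy'), 'yyyyMMdd'))
start_date_fmt : toString(toDate(start_date, 'dd-MMM-yyyy'), 'yyyy-MM-dd')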
Ah, you say *"I would like to see the result in two ways: YYYYMMDD as an integer column and YYYY-MM-DD as a date column, e.g. 01012021 and 01-01-2021."* Do you want YYYYMMDD or ddMMyyyy, because your example is in the latter format.
Anyway, please see below the expressions you could use:
With my source loaded, use a Derived Column transformation and edit the expressions:
start_date_toInteger : toInteger(toString(toDate(substring(start_date,1,11), 'dd-MMM-yyyy'), 'yyyyMMdd'))
start_date_toDate : toString(toDate(substring(start_date,1,11), 'dd-MMM-yyyy'), 'yyyy-MM-dd')
Final results: the data preview shows the two new columns.
I have a column which contains dates as strings, but in many formats like dd/MM/yy, dd/MMM/yyyy, etc. I am using the following code to convert all strings to one specific date format (yyyy-MM-dd) in Hive:
select
from_unixtime(unix_timestamp('31/02/2021','dd/MM/yyyy'),'yyyy-MM-dd')
but this gives me 2021-03-03 in Hive.
Is there any other way to identify such invalid dates and return null?
Assume you recognized the format correctly, it is exactly 'dd/MM/yyyy', and the date is an invalid one, '31/02/2021'.
In such a case the unix_timestamp function will roll the date over into the next month, and there is no way to change its behavior. But you can check whether the date, double-converted from the original string to a timestamp and back to the original format, is the same. If it is not the same, the date is invalid.
case
-- check double-converted date is the same as original string
when from_unixtime(unix_timestamp(date_col,'dd/MM/yyyy'),'dd/MM/yyyy') = date_col
--convert to yyyy-MM-dd if the date is valid
then from_unixtime(unix_timestamp(date_col,'dd/MM/yyyy'),'yyyy-MM-dd')
else null -- null if invalid date
end as date_converted
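A minimal, self-contained query using this check (the table name my_table is hypothetical; date_col holds the 'dd/MM/yyyy' strings):
select date_col,
       case
         when from_unixtime(unix_timestamp(date_col,'dd/MM/yyyy'),'dd/MM/yyyy') = date_col
           then from_unixtime(unix_timestamp(date_col,'dd/MM/yyyy'),'yyyy-MM-dd')
         else null -- e.g. '31/02/2021' round-trips to '03/03/2021', so it becomes null
       end as date_converted
from my_table;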
I have a Spark data frame having two columns (SEQ: Integer, MAIN_DATE: Date), as:
Now I want to add a column based on the condition that if the format of MAIN_DATE is "MMM-YYYY" then it should be converted to the last day of the month, and the new data frame should look like this:
Any suggestion will be much appreciated.
You can use Spark's when/otherwise methods in order to operate differently for each different date format of the MAIN_DATE column.
More specifically, you can simply match the MMM-yyyy date format values of the column based on the field's string length (since we know that those values always have 8 characters) as a condition in when, and then:
use to_date to convert the String value to a valid date based on a format we give as an argument, and
use last_day to get the last day of the month that each date in MAIN_DATE refers to.
As for the "regular" rows with the dd-MMM-yyyy date format, just a to_date conversion would be sufficient within the otherwise method.
After that, all there's left to do is to convert the dates back to the desired dd-MMM-yyyy format (because to_date converts a given date to the yyyy-MM-dd format).
This is the solution in Scala (split into two withColumn calls to make it more readable, instead of a one-liner):
import org.apache.spark.sql.functions.{col, date_format, last_day, length, to_date, when}

val result = df.withColumn("END_DATE",
    when(length(col("MAIN_DATE")).equalTo(8), last_day(to_date(col("MAIN_DATE"), "MMM-yyyy")))
      .otherwise(to_date(col("MAIN_DATE"), "dd-MMM-yyyy")))
  .withColumn("END_DATE", date_format(col("END_DATE"), "dd-MMM-yyyy"))
This is what the resulting DataFrame will look like:
+---+-----------+-----------+
|SEQ| MAIN_DATE| END_DATE|
+---+-----------+-----------+
| 1|16-JAN-2020|16-Jan-2020|
| 2| FEB-2017|28-Feb-2017|
+---+-----------+-----------+
I have a table ABC in which I have a column Z of datatype DATE. The format of the data is YYYYMMDD. Now I am looking to convert the above format to YYYY-MON-DD format. Can someone help?
You can use to_char:
TO_CHAR(Z,'YYYY-MON-DD')
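For example, a minimal query, assuming the table and column names from the question:
select to_char(z, 'YYYY-MON-DD') as z_formatted
from abc;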
Depending on what the purpose of the reformatting is, you can either explicitly cast it to a VARCHAR/CHAR and define the format, or you can change your display format to however you'd like to see all dates:
ALTER SESSION SET DATE_OUTPUT_FORMAT = 'YYYY-MON-DD';
It's important to understand that if the data is in a DATE field, then it is stored as a date, and the format of the date is dependent on your viewing preferences, not how it is stored.
Since the value of the date field is stored as a number, you have to convert it to date.
ALTER SESSION SET DATE_OUTPUT_FORMAT = 'YYYY-MON-DD';
select to_date(to_char(z), 'YYYYMMDD') from abc;
(adding this answer to summarize and resolve the question - since the clues and answers are scattered through comments)
The question stated that column Z is of type DATE, but it really seems to be a NUMBER.
Then, before parsing a number like 20201017 into a date, you first need to transform it into a STRING.
Once the original number is parsed to a date, it can be represented as a new string formatted as desired.
WITH data AS (
SELECT 20201017 AS z
)
SELECT TO_CHAR(TO_DATE(TO_CHAR(z), 'YYYYMMDD'), 'YYYY-MON-DD')
FROM data;
-- 2020-Oct-17
Say I have a dataframe with two columns, both of which need to be converted to datetime format. However, the current formatting of the columns varies from row to row, and when I apply the to_date method, I get all nulls returned.
Here's a screenshot of the format....
the code I tried is...
date_subset.select(col("InsertDate"),to_date(col("InsertDate")).as("to_date")).show()
which returned
Your datetime is not in the default format, so you should specify the format.
to_date(col("InsertDate"), "MM/dd/yyyy HH:mm")
I don't know which one is the month and which is the day, but you can handle it this way.
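Since the formatting varies from row to row, one possible approach (a sketch, not a definitive fix) is to try each expected pattern and keep the first one that parses; to_date returns null when a pattern does not match, so coalesce picks the first successful conversion. The second pattern below is only an assumed alternative:
import org.apache.spark.sql.functions.{coalesce, col, to_date}

date_subset.select(
  col("InsertDate"),
  coalesce(
    to_date(col("InsertDate"), "MM/dd/yyyy HH:mm"),  // format suggested above
    to_date(col("InsertDate"), "dd/MM/yyyy HH:mm")   // assumed day-first variant
  ).as("to_date")
).show()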
I'm very new to SQL/Hive. At first, I loaded a txt file into Hive using:
drop table if exists Tran_data;
create table Tran_data(tran_time string,
resort string, settled double)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n';
Load data local inpath 'C:\Users\me\Documents\transaction_data.txt' into table Tran_Data;
The variable tran_time in the txt file looks like this: 10-APR-2014 15:01. After loading the Tran_data table, I tried to convert tran_time to a "standard" format so that I can join this table to another table using tran_time as the join key. The desired date format is 'yyyymmdd'. I searched online resources and found this: unix_timestamp(substr(tran_time,1,11),'dd-MMM-yyyy')
So essentially, I'm doing this: unix_timestamp('10-APR-2014','dd-MMM-yyyy'). However, the output is "NULL".
So my question is: how to convert the date format to a "standard" format, and then further convert it to 'yyyymmdd' format?
from_unixtime(unix_timestamp('20150101' ,'yyyyMMdd'), 'yyyy-MM-dd')
My current Hive Version: Hive 0.12.0-cdh5.1.5
I converted the datetime in the first column to a date in the second column using the Hive date functions below. Hope this helps!
select inp_dt, from_unixtime(unix_timestamp(substr(inp_dt,0,11),'dd-MMM-yyyy')) as todateformat from table;
inp_dt todateformat
12-Mar-2015 07:24:55 2015-03-12 00:00:00
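If you need the yyyymmdd format the question asked for, the same pair of functions can emit it directly. A sketch, assuming the Tran_data table and tran_time column from the question:
select tran_time,
       from_unixtime(unix_timestamp(substr(tran_time, 1, 11), 'dd-MMM-yyyy'), 'yyyyMMdd') as tran_yyyymmdd
from Tran_data;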
The unix_timestamp function will convert a given string date to a Unix timestamp in seconds, but not to a format like dd-MM-yyyy.
You need to write your own custom UDF to convert a given string date to the format that you need, as Hive does not currently have a predefined function for this. We have the to_date function to convert a timestamp to a date; the remaining unix_timestamp variants won't help with your problem.
select from_unixtime(unix_timestamp('01032018' ,'MMddyyyy'), 'yyyyMMdd');
Input format: MMddyyyy (01032018)
Output after the query: yyyyMMdd (20180103)
To help someone in the future:
The following expression should work, as it worked in my case:
to_date(from_unixtime(unix_timestamp('10-APR-2014','dd-MMM-yyyy')))
unix_timestamp('2014-05-01', 'yyyy-MM-dd') will work; your input string should be in the format Hive expects, yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.
Whereas if you pass '01-MAY-2014' without a matching pattern, Hive won't understand it as a date string.