extract dates and times from string in Redshift - amazon-redshift

I have a column like the one below. The last two sets of numbers are the date and time. I want to create a date-time column by extracting these values from the column.
1002206391240385-sponsoredProducts-SameDayPull-20190627-012313.json
I started by extracting the date, but it does not give me what I need:
Select regexp_substr('1002206391240385-sponsoredProducts-SameDayPull-20190627-012313.json','-[\\d{8}]-')

This substring call extracts the date-time part from your string:
SELECT substring(col_name,regexp_instr(col_name,'-',1,regexp_count(col_name,'-')-1)+1,
regexp_instr(col_name,'.json',1)-regexp_instr(col_name,'-',1,regexp_count(col_name,'-')-1)-1)
regexp_count counts how many hyphens are in the string,
regexp_instr gives the position of a given hyphen, and
substring returns the part starting just after the second-to-last hyphen up to .json.
To test, I used:
WITH test(col_name) AS (
SELECT '1002206391240385-sponsoredProducts-SameDayPull-20190627-012313.json'::TEXT
)
SELECT col_name,
substring(col_name,regexp_instr(col_name,'-',1,regexp_count(col_name,'-')-1)+1,
regexp_instr(col_name,'.json',1)-regexp_instr(col_name,'-',1,regexp_count(col_name,'-')-1)-1) datetime
FROM test
Output is
col_name datetime
1002206391240385-sponsoredProducts-SameDayPull-20190627-012313.json 20190627-012313
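Outside Redshift, the same extraction can be sketched with a single regular expression; the pattern below assumes the filename always ends in an eight-digit date, a hyphen, a six-digit time, and ".json":

```python
import re

def extract_datetime_part(filename: str) -> str:
    # Grab the "YYYYMMDD-HHMMSS" chunk sitting just before the ".json" suffix.
    match = re.search(r"(\d{8}-\d{6})\.json$", filename)
    if match is None:
        raise ValueError(f"no datetime found in {filename!r}")
    return match.group(1)

print(extract_datetime_part(
    "1002206391240385-sponsoredProducts-SameDayPull-20190627-012313.json"
))  # 20190627-012313
```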

As an alternative, and if the filename format is consistent, you could use a non-regex solution e.g. extract the part of the filename string that contains the date and then use TO_TIMESTAMP with a format string to extract the date and time:
SELECT TO_TIMESTAMP(RIGHT('1002206391240385-sponsoredProducts-SameDayPull-20190627-012313.json', 20), 'YYYYMMDD-HH24MISS.json') AS extracted_datetime
which returns
extracted_datetime |
----------------------|
2019-06-27 01:23:13+00|
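The fixed-width idea translates directly to Python, where strptime can also match ".json" as a literal in the format string (a sketch, assuming the last 20 characters are always "YYYYMMDD-HHMMSS.json"):

```python
from datetime import datetime

filename = "1002206391240385-sponsoredProducts-SameDayPull-20190627-012313.json"
# Take the last 20 characters and parse them with one format string;
# the ".json" suffix is matched literally, like in the Redshift format above.
ts = datetime.strptime(filename[-20:], "%Y%m%d-%H%M%S.json")
print(ts)  # 2019-06-27 01:23:13
```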

Related

How to convert a string date in 'yyyy-m-dd' to 'yyyy-mm-dd' in a Hive query?

I searched up and down but couldn't find anything that works.
I have a date that is stored as a string in this format: '2021-9-01' so there are no leading zeros in the month column. This is an issue when trying to select a max date as it interprets September to be greater than October.
Any time I run something that tries to convert this, it literally never finishes. I can pull back one row with select * from..., but this fails to complete:
select unix_timestamp(bad_date, 'yyyy-m-dd') from mytable
I'm using Hive, so I'm not sure how to make this conversion work so that October (this month) actually shows up as the max date.
The correct pattern for month is MM (or a single M, which also accepts non-padded values); mm means minutes.
from_unixtime(unix_timestamp(bad_date, 'yyyy-M-dd'),'yyyy-MM-dd')
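The parse-and-reformat fix can be sketched outside Hive as well; Python's %m, like Hive's single M, tolerates a non-padded month on input:

```python
from datetime import datetime

# Parse the non-padded date, then re-format it with zero-padded parts.
fixed = datetime.strptime("2021-9-01", "%Y-%m-%d").strftime("%Y-%m-%d")
print(fixed)  # 2021-09-01
```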
One more method is to split and concatenate with lpad:
select concat_ws('-', splitted[0], lpad(splitted[1], 2, '0'), splitted[2])
from
(
select split('2021-9-01','-') as splitted
) s
Result:
2021-09-01
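The split-and-pad idea is easy to mirror in plain Python, with zfill playing the role of lpad:

```python
def pad_month(bad_date: str) -> str:
    # Split on "-", left-pad the month to two digits, and rejoin.
    year, month, day = bad_date.split("-")
    return f"{year}-{month.zfill(2)}-{day}"

print(pad_month("2021-9-01"))  # 2021-09-01
```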

Converting string to date , datetime or Int in Mapping dataflow

I have a parquet file with a start_date and end_date columns
Formatted like this
01-Jan-2021
I've tried every combination of the toDate, toString, and toInteger conversion functions, but I still get nulls returned when viewing the data.
I would like to see the result in two ways: YYYYMMDD as an integer column and YYYY-MM-DD as a date column.
eg 01012021 and 01-01-2021
I'm sure the default format has caused this issue.
Thanks
First, for the date formatter, you need to tell ADF what each part of your string represents. Use dd-MMM-yyyy as your format. Then use a string formatter to manipulate the output, as such: toString(toDate('01-Jan-2021', 'dd-MMM-yyyy'), 'yyyy-MM-dd')
For the integer representation: toInteger(toString(toDate('01-Jan-2021', 'dd-MMM-yyyy'), 'yyyyMMdd'))
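Roughly the same two conversions can be sketched in Python (%d-%b-%Y is the Python spelling of dd-MMM-yyyy; month-name parsing assumes an English locale):

```python
from datetime import datetime

# Parse "dd-MMM-yyyy", then re-format as a date string and as a yyyyMMdd integer.
parsed = datetime.strptime("01-Jan-2021", "%d-%b-%Y")
as_date_string = parsed.strftime("%Y-%m-%d")
as_integer = int(parsed.strftime("%Y%m%d"))
print(as_date_string, as_integer)  # 2021-01-01 20210101
```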
Ah, you say *"I would like to see the result in two ways: YYYYMMDD as an integer column and YYYY-MM-DD as a date column. eg 01012021 and 01-01-2021"*. Do you want YYYYMMDD or dd-mm-yyyy? Your example is in the latter format.
Anyway, in a derived column, you could use the expressions below:
start_date_toInteger: toInteger(toString(toDate(substring(start_date,1,11), 'dd-MMM-yyyy'), 'yyyyMMdd'))
start_date_toDate: toString(toDate(substring(start_date,1,11), 'dd-MMM-yyyy'), 'yyyy-MM-dd')

If column having dates in multiple format, Get last date of month for specific date format

I have a spark data frame having two columns (SEQ - Integer, MAIN_DATE - Date) as:
Now I want to add a column based on the condition that, if the format of MAIN_DATE is "MMM-YYYY", it should be converted to the last day of that month, so that the new data frame looks like this:
Any suggestion will be much appreciated.
You can use Spark's when/otherwise methods in order to operate differently for each different date format of the MAIN_DATE column.
More specifically, you can match the MMM-yyyy date format values of the column based on the field's string length (since we know those values always have 8 characters) as the condition in when, and then:
use to_date to convert the String value to a valid date based on a format we give as an argument, and
use last_day to get the last day of the month that each such date in MAIN_DATE refers to.
As for the "regular" rows with the dd-MMM-yyyy date format, just a to_date conversion would be sufficient within the otherwise method.
After that, all there's left to do is to convert the dates back to the desired dd-MMM-yyyy format (because to_date converts a given date to the yyyy-MM-dd format).
This is the solution in Scala (split into two withColumn calls for readability, instead of a one-liner):
df.withColumn("END_DATE",
when(length(col("MAIN_DATE")).equalTo(8), last_day(to_date(col("MAIN_DATE"), "MMM-yyyy")))
.otherwise(to_date(col("MAIN_DATE"), "dd-MMM-yyyy")))
.withColumn("END_DATE", date_format(col("END_DATE"), "dd-MMM-yyyy"))
This is what the resulting df DataFrame will look like:
+---+-----------+-----------+
|SEQ| MAIN_DATE| END_DATE|
+---+-----------+-----------+
| 1|16-JAN-2020|16-Jan-2020|
| 2| FEB-2017|28-Feb-2017|
+---+-----------+-----------+
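The when/otherwise dispatch on string length can be sketched in plain Python; monthrange stands in for last_day, and title() normalizes the upper-case month names so %b can parse them:

```python
from calendar import monthrange
from datetime import date, datetime

def end_date(main_date: str) -> date:
    # Short "MMM-yyyy" values are 8 characters long, e.g. "FEB-2017":
    # parse them and snap to the last day of that month.
    if len(main_date) == 8:
        d = datetime.strptime(main_date.title(), "%b-%Y").date()
        return d.replace(day=monthrange(d.year, d.month)[1])
    # Otherwise, a regular "dd-MMM-yyyy" value like "16-JAN-2020".
    return datetime.strptime(main_date.title(), "%d-%b-%Y").date()

print(end_date("16-JAN-2020"))  # 2020-01-16
print(end_date("FEB-2017"))     # 2017-02-28
```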

compare extracted date with today() in excel

Column 1: I have this date-time format in one column: 2018-10-08T04:30:23Z
Column 3: I extracted the date with the formula =LEFT(A11,10) and changed the column format to Date.
Column 4: TODAY(), just to make sure both date columns match.
Now, when I want to compare both dates:
Column 5: =IF(C11=D11,TRUE(),FALSE())
It does not work. What did I do wrong?
One option using formulas only would be to use Excel's DATE function, which takes three parameters:
=DATE(YEAR, MONTH, DAY)
Use the following formula to extract a date from your timestamp:
=DATE(LEFT(A1,4), MID(A1,6,2), MID(A1,9,2))
This assumes that the timestamp is in cell A1, with the format in your question. Now, comparing this date value against TODAY() should work, provided the original timestamp is also from today.
Should be worth trying:
=1*LEFT(A1,10)=TODAY()
May depend upon your configuration. Without the format conversion (the 1*), you are trying to compare text (all string functions return text) with a number (dates are stored as numbers in Excel).
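The same extract-then-compare idea, outside Excel: take the first 10 characters of the ISO timestamp, parse them as a real date, and compare with today's date.

```python
from datetime import date, datetime

stamp = "2018-10-08T04:30:23Z"
# The first 10 characters are the "YYYY-MM-DD" date part.
stamp_date = datetime.strptime(stamp[:10], "%Y-%m-%d").date()
print(stamp_date)                  # 2018-10-08
print(stamp_date == date.today())  # True only when run on that day
```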

Reading/using format YYYYMMDD and "0" values in date field

We have a system feed that's changing... it's currently a Julian date and is converting to YYYYMMDD, although for blank values they're feeding in "0". Not sure what format it's coming in as...
First, I took out the 0's by doing "if XXXfield = 0 then XXXfield = ' '", which returns a "." for that record. Then I tried "format XXXfield YYMMDD8." and that returns blanks for anything with a date. I'm not creating the table, just reading it in. How can I get the date to be a real date, with no "0" values for blanks, in a format I can use in SAS (i.e. XXXfield >= Xdate)? Thanks in advance for your advice!
Sample Data (one blank and 4 with values):
reporting_date
0
20141122
20130604
20130626
20140930
The format of reporting_date is BEST12. according to your comment below.
data work.have;
input reporting_date BEST12.;
datalines;
0
20141122
20130604
20130626
20140930
;
run;
So, one way to make SAS interpret reporting_date as an actual date is to temporarily convert it to a string and then read it back with YYMMDD10. as the informat.
proc sql;
create table work.want as
select input(put(t1.reporting_date, 8.), YYMMDD10.) format=YYMMDDn8. as reporting_date
from work.have t1;
quit;
In this case,
put converts it to a string with eight characters as the width,
input will convert the string to an actual SAS date, and
format will display the date in a human readable way.
The third bullet is optional, and YYMMDDn8. is just one of many date and time formats available. The conversion also takes care of rows with 0 as the value and transforms them to missing (.).
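The put/input round-trip above can be sketched in Python: a numeric YYYYMMDD value becomes a real date, and 0 becomes a missing value (None standing in for SAS's .):

```python
from datetime import date, datetime
from typing import Optional

def to_date(reporting_date: int) -> Optional[date]:
    # 0 marks a blank in the feed; map it to missing.
    if reporting_date == 0:
        return None
    # Otherwise parse the digits as YYYYMMDD.
    return datetime.strptime(str(reporting_date), "%Y%m%d").date()

print([to_date(v) for v in [0, 20141122, 20130604]])
# [None, datetime.date(2014, 11, 22), datetime.date(2013, 6, 4)]
```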
Now, when you have the column as a SAS date, you are ready to filter it:
proc sql;
create table work.filter as
select t1.reporting_date
from work.want t1
where t1.reporting_date > '01JUL2013'd;
quit;
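The filter step is the same idea in any language; assuming the values have already been converted to real dates (with blanks as None), it is a simple comparison that skips the missing rows:

```python
from datetime import date

dates = [None, date(2014, 11, 22), date(2013, 6, 4),
         date(2013, 6, 26), date(2014, 9, 30)]
# Keep only non-missing dates after 01JUL2013.
filtered = [d for d in dates if d is not None and d > date(2013, 7, 1)]
print(filtered)  # [datetime.date(2014, 11, 22), datetime.date(2014, 9, 30)]
```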
But instead of making these workarounds, you should look into changing the format in the job where you read this column into the SAS dataset.