Regular Expression for date extraction - pyspark

I have a file named Yonder_CompetitionEntries_20210928080000 and I want to extract 20210928, i.e. the year, month and day.
So far I have this, but it's not working. The file has a csv.gz extension.
date_key = """RIGHT(regexp_replace(regexp_replace(filename,'.gz',''),'.csv',''), 8)"""

What about doing it without regex?
Split on _ and take the last element, then take the first 8 characters of that string.
date_key = """substring(element_at(split(filename, '_'), -1), 1, 8)"""
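For readers who want to check the logic outside Spark, here is a minimal plain-Python sketch of the same split-and-slice idea, next to a regex variant. The function names are hypothetical, and the regex assumes the date is the first run of 8 digits after an underscore.

```python
import re
from typing import Optional


def date_key_by_split(filename: str) -> str:
    """Take the text after the last underscore and keep its first 8 characters."""
    return filename.split("_")[-1][:8]


def date_key_by_regex(filename: str) -> Optional[str]:
    """Return the first run of 8 digits that follows an underscore, if any."""
    match = re.search(r"_(\d{8})", filename)
    return match.group(1) if match else None


name = "Yonder_CompetitionEntries_20210928080000.csv.gz"
print(date_key_by_split(name))  # 20210928
print(date_key_by_regex(name))  # 20210928
```

Because only the first 8 characters after the last underscore are kept, the .csv.gz extension does not need to be stripped first.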

Syntax for Combining BETWEEN and LIKE

My query has a syntax error. Here it is:
SELECT *
FROM aqua.reading
WHERE
CAST(reading.pres_date AS VARCHAR)
BETWEEN LIKE '2022-10-18%' AND LIKE '2022-10-18%'
It fails with:
ERROR: type "like" does not exist
LINE 1: ... WHERE CAST(reading.pres_date AS VARCHAR) BETWEEN LIKE '2022...
^
SQL state: 42704
Character: 77
I am trying to get all the data, which has a timestamp with time zone, and filter it by a date range.
Don't compare dates (or timestamps) as strings. Compare them to proper date (or timestamp) values. Given the fact that you use the same "date" but with a wildcard at the end, I am assuming(!) that pres_date is in fact a timestamp column and you want to find all rows with a given date regardless of the time value of the timestamp.
The best approach is to use a range query with >= ("greater than or equal to") on the lower value and < ("strictly less than") on the next day:
SELECT *
FROM aqua.reading
WHERE reading.pres_date >= DATE '2022-10-18'
AND reading.pres_date < DATE '2022-10-19'
Alternatively you can cast the timestamp to a date and use the = operator if you really want to pick just one day:
SELECT *
FROM aqua.reading
WHERE cast(reading.pres_date as DATE) = DATE '2022-10-18'
However, that will not make use of a potential index on pres_date, so it is likely to be slower than the range query from the first solution.
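The half-open range (start inclusive, next day exclusive) can be illustrated in plain Python. This is only a sketch of the comparison logic, not of how the database evaluates the query; the function names are hypothetical and the timestamps are naive (no time zone) for simplicity.

```python
from datetime import date, datetime, time, timedelta
from typing import Iterable, List, Tuple


def day_bounds(day: date) -> Tuple[datetime, datetime]:
    """Return midnight of `day` and midnight of the following day."""
    start = datetime.combine(day, time.min)
    return start, start + timedelta(days=1)


def rows_on_day(timestamps: Iterable[datetime], day: date) -> List[datetime]:
    """Keep timestamps with start <= ts < end, mirroring the >= / < range query."""
    start, end = day_bounds(day)
    return [ts for ts in timestamps if start <= ts < end]


readings = [
    datetime(2022, 10, 17, 23, 59, 59),
    datetime(2022, 10, 18, 0, 0, 0),
    datetime(2022, 10, 18, 23, 59, 59),
    datetime(2022, 10, 19, 0, 0, 0),
]
print(rows_on_day(readings, date(2022, 10, 18)))
```

Using a strict upper bound on the next day avoids having to guess the last representable instant of the day (23:59:59.999...).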

Pyspark convert string type date into dd-mm-yyyy format

Using pyspark 2.4.0
I have a date column in the dataframe as follows:
I need to convert it into DD-MM-YYYY format. I have tried a few solutions, including the following code, but it returns null values:
df_students_2 = df_students.withColumn(
'new_date',
F.to_date(
F.unix_timestamp('dt', '%B %d, %Y').cast('timestamp')))
Note that the dt column contains several different date formats. It would be easier to convert if the whole column used one format, but since the dataframe is big it is not feasible to go through each value and change it to a single format. For future readers, I am also including the following code, in which I tried to loop over the two date formats, but it did not succeed.
def to_date_(col, formats=(datetime.strptime(col,"%B %d, %Y"), \
                           datetime.strptime(col,"%d %B %Y"), "null")):
    return F.coalesce(*[F.to_date(col, f) for f in formats])
Any ideas?
Try this. It is implemented in Scala, but can be done in PySpark with minimal changes.
// I've put the example formats, but just replace this list with expected formats in the dt column
val dt_formats= Seq("dd-MMM-yyyy", "MMM-dd-yyyy", "yyyy-MM-dd","MM/dd/yy","dd-MM-yy","dd-MM-yyyy","yyyy/MM/dd","dd/MM/yyyy")
val newDF = df_students.withColumn("new_date", coalesce(dt_formats.map(fmt => to_date($"dt", fmt)):_*))
Try this; it should work:
from pyspark.sql.functions import to_date
df = spark.createDataFrame([("Mar 25, 1991",), ("May 1, 2020",)],['date_str'])
df.select(to_date(df.date_str, 'MMM d, yyyy').alias('dt')).collect()
[Row(dt=datetime.date(1991, 3, 25)), Row(dt=datetime.date(2020, 5, 1))]
See also: Datetime Patterns for Formatting and Parsing
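The "try each format and keep the first one that parses" idea behind the coalesce approach can be sketched in plain Python with datetime.strptime. This is not Spark code: the function name and the format list are assumptions for illustration, and Python uses strptime directives (%B, %d, %Y) where Spark uses Java-style patterns (MMM, d, yyyy).

```python
from datetime import datetime
from typing import Optional, Sequence

# Hypothetical list of formats expected in the column; replace with the real ones.
FORMATS = ("%B %d, %Y", "%d %B %Y", "%b %d, %Y")


def to_dd_mm_yyyy(text: str, formats: Sequence[str] = FORMATS) -> Optional[str]:
    """Return the date as DD-MM-YYYY using the first format that parses, else None."""
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt).strftime("%d-%m-%Y")
        except ValueError:
            continue  # this format did not match, try the next one
    return None


for raw in ("March 25, 1991", "25 March 1991", "May 1, 2020", "not a date"):
    print(raw, "->", to_dd_mm_yyyy(raw))
```

Returning None for unparseable values mirrors how coalesce over to_date yields null when no format matches.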

Talend date and time combine

I combine two columns: date and time. When I pass the date and time hard-coded it works fine, but when I pass them through a column it throws the error:
Unparseable date: "05/05/1992"
I already tried this:
MaterialCodeCSV.xdate == null ?
TalendDate.parseDate("yyyy-MM-dd HH:mm:ss", TalendDate.getDate("yyyy-MM-dd HH:mm:ss")) :
TalendDate.parseDateLocale("yyyy/mm/dd HH:mm:ss",MaterialCodeCSV.xdate.toString() + MaterialCodeCSV.xtime.toString(),"EN");
Java code in Talend:
Date handling can be a bit tricky when the wrong data types are used. I assume you want to fill a field which is a Date. There are several errors in this approach:
MaterialCodeCSV.xdate == null ?
TalendDate.parseDate("yyyy-MM-dd HH:mm:ss", TalendDate.getDate("yyyy-MM-dd HH:mm:ss")) :
TalendDate.parseDateLocale("yyyy/mm/dd H:mm:ss",MaterialCodeCSV.xdate.toString()+ MaterialCodeCSV.xtime.toString(),"EN");
If MaterialCodeCSV.xdate == null, you create a date and instantly parse it again? That seems unnecessarily complex and inefficient. Change this to TalendDate.getCurrentDate().
Then, if xdate is not null, you just concatenate xdate and xtime, call toString() and try to parse the result. Again, this seems unnecessarily complex. If I assume that xdate and xtime are already Date fields, you could write it as this: MaterialCodeCSV.xdate + MaterialCodeCSV.xtime.
If both are String fields, you have to make sure that xdate is formatted as yyyy/MM/dd and xtime as HH:mm:ss. Then you can drop .toString().
Also, if both are String fields, you have to add a space between them: MaterialCodeCSV.xdate + ' ' + MaterialCodeCSV.xtime
Additionally, in the first case you parse with yyyy-MM-dd HH:mm:ss. In the second case you parse with yyyy/mm/dd H:mm:ss. This reads "year/minute/day". Also there is only one hour digit, not allowing anything after 9:59:59 o'clock to be parsed. Correctly you should use yyyy/MM/dd HH:mm:ss.
So, to conclude, it should look like this (assuming you are using correctly formatted String fields for xdate and xtime):
MaterialCodeCSV.xdate == null ?
TalendDate.getCurrentDate() :
TalendDate.parseDateLocale("yyyy/MM/dd HH:mm:ss", MaterialCodeCSV.xdate + ' ' + MaterialCodeCSV.xtime,"EN");
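The same combine-then-parse logic can be sketched in plain Python to make the pattern requirements concrete. This is an analogue, not Talend code: the function name is hypothetical, and note that Python's directives have the opposite case convention to Java's (%m is month and %M is minute, whereas in Java MM is month and mm is minute).

```python
from datetime import datetime
from typing import Optional


def combine_date_time(xdate: Optional[str], xtime: Optional[str]) -> datetime:
    """Parse 'yyyy/MM/dd' + ' ' + 'HH:mm:ss'; fall back to the current time if xdate is missing."""
    if xdate is None:
        return datetime.now()
    # The space between the two parts must match the space in the pattern.
    return datetime.strptime(f"{xdate} {xtime}", "%Y/%m/%d %H:%M:%S")


print(combine_date_time("1992/05/05", "13:45:00"))

# A value that does not match the pattern fails, much like "Unparseable date" in Java.
try:
    combine_date_time("05/05/1992", "13:45:00")
except ValueError as err:
    print("parse failed:", err)
```

The second call shows why the input layout and the pattern have to agree: a date written as 05/05/1992 cannot be read with a year-first pattern.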

Crystal Report-Convert date string (with day of week) to date format

I'm new to Crystal Reports. I have a date in string format like 2015-03-25 (Wed) and I want to convert it to a date format like 03/25/2015. I tried CDate and DateValue, but both returned a "bad date string format" error. Any suggestions for converting such a date string to a proper date format?
If you have a DateTime field in Crystal Reports, you will see Date and Time tab option on the Format Editor when you right click on the field and select Format Field menu item. From the Date and Time tab, you may select the desired format and select OK.
It is recommended to store values in the data type you intend to use.
For example, if you store money or decimal values as strings, you may not be able to use them to their full extent: auto sum and other features tied to the intended data type may be unavailable.
You do not need to do anything in code; Crystal Reports has a built-in facility for this kind of simple formatting.
@utility, you are close to the answer.
In the last option, Custom Format, go to the Date tab and set the format as described here:
http://www.c-sharpcorner.com/UploadFile/mahesh/DateFormatInCR06132007092248AM/DateFormatInCR.aspx
Update: sorry, the answer above only works if you have a valid date string.
In your case, an arbitrary string needs to be converted into another date format. There are two options. In both cases you have to extract the date, format it as needed, and then recombine it with the rest of the string.
One option, which you have already done, is on the Crystal Reports side: grab the date, format it and concatenate. This is slower, because it has to be processed for each row.
The other option is on the SQL Server side, and it is faster than the first:
declare @t nvarchar(16) = '2015-03-25 (Wed)'
--get the actual date
select SUBSTRING(@t, 1, charindex('(', @t) - 1)
--the result above is a character type, so first convert it into a date and then into the other format
select cast(SUBSTRING(@t, 1, charindex('(', @t) - 1) as date) --convert into date
select convert(varchar(15), cast(SUBSTRING(@t, 1, charindex('(', @t) - 1) as date), 103) --convert into dd/mm/yyyy format
--The statements above are for your understanding; only the line below needs to be executed.
--Note that datename(dw, getdate()) returns the weekday name of the current date.
select convert(varchar(15), cast(SUBSTRING(@t, 1, charindex('(', @t) - 1) as date), 103) + ' ' + datename(dw, getdate())
I suggest going with the SQL Server side.
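The extract-then-reformat steps can also be sketched in plain Python, which may help when checking the expected output for a given input. This is an illustration only, with a hypothetical function name; it assumes the string always starts with a yyyy-MM-dd date followed by a weekday in parentheses, and it produces the MM/DD/YYYY layout asked for in the question.

```python
from datetime import datetime


def reformat_date_with_weekday(text: str) -> str:
    """Drop the '(Wed)'-style suffix, parse the date part and return it as MM/DD/YYYY."""
    date_part = text.split("(")[0].strip()  # '2015-03-25 (Wed)' -> '2015-03-25'
    return datetime.strptime(date_part, "%Y-%m-%d").strftime("%m/%d/%Y")


print(reformat_date_with_weekday("2015-03-25 (Wed)"))  # 03/25/2015
```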

How to write expression to convert yyyymm to mm-yyyy in ssrs?

I have a table with a YearMonth column (e.g. 201302).
I created a YearMonth parameter.
Now I would like to write an expression in SSRS so that every time the report runs, the month and year look like this: Feb-2013.
Could you please tell me why the following expression did not work?
Thanks
=CStr(Right(MonthName(Month(Parameters!NLR_YearMonth.Value)),3))
+ "-" +
Left(Year(Parameters!NLR_YearMonth.Value),4)
Try this:
=Format(DateValue(MonthName(Right(Parameters!NLR_YearMonth.Value, 2))
+ "," +
Left(Parameters!NLR_YearMonth.Value,4)), "Y")
The expression works as follows:
Cutting the string to get the month
Converting the month to its name using MonthName
Creating a comma separator between the month and the year
Cutting the string to get the year
Converting the resulting string to a Date object using the DateValue function
Formatting the Date to fit your needs
The results should be:
February, 2013
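The same slice-and-format steps can be sketched in plain Python for comparison. This is not SSRS code: the function name is hypothetical, the input is assumed to be a 6-character yyyyMM string, and the month names assume an English locale.

```python
from datetime import date


def year_month_label(year_month: str, fmt: str = "%b-%Y") -> str:
    """Turn a 'yyyyMM' string into a label by building a date on the 1st of that month."""
    first_day = date(int(year_month[:4]), int(year_month[4:6]), 1)
    return first_day.strftime(fmt)


print(year_month_label("201302"))            # Feb-2013
print(year_month_label("201302", "%B, %Y"))  # February, 2013
```

The default format gives the abbreviated form requested in the question, while the second call reproduces the long form shown in the answer.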