How to convert Date column format in case class of Scala? - scala

I am using Scala with Spark. I have two similar CSV files with 10 columns each. The only difference is the format of the Date column:
1st file Date format: yyyy-MM-dd
2nd file Date format: dd-MM-yyyy
The objective is to create a separate schema RDD for each file and finally merge both RDDs.
For the first case class, I used Date.valueOf [java.sql.Date] in the case class mapping, with no issues.
The problem is with the Date format of the 2nd file.
I used the same Date.valueOf mapping, but it throws an error about the date format.
How can I map the dates in the second file to the same yyyy-MM-dd format as the first? Please assist.

Use java.util.Date with SimpleDateFormat:
import java.text.SimpleDateFormat
val sDate1 = "31/12/1998"
val date1 = new SimpleDateFormat("dd/MM/yyyy").parse(sDate1)
Result:
sDate1: String = 31/12/1998
date1: java.util.Date = Thu Dec 31 00:00:00 CET 1998
To change the output to a common string format, format the parsed date with a second pattern:
val date2 = new SimpleDateFormat("yyyy/MM/dd")
date2.format(date1)
Result:
res1: String = 1998/12/31
For the files in the question, the patterns would use dashes instead of slashes ("dd-MM-yyyy" and "yyyy-MM-dd").
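Since the question maps the column to java.sql.Date, the same idea can be sketched with java.time, so that both files end up with the same type. This is only a sketch under assumptions: Record, eventDate and DateParsing are hypothetical names standing in for the asker's 10-column case class and mapping code, which the question does not show.

```scala
import java.sql.Date
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Hypothetical stand-in for the asker's case class (the real one has 10 columns).
final case class Record(id: Int, eventDate: Date)

object DateParsing {
  // Pattern of the second file; DateTimeFormatter is immutable and thread-safe.
  private val dayFirst = DateTimeFormatter.ofPattern("dd-MM-yyyy")

  // First file: already yyyy-MM-dd, which Date.valueOf accepts directly.
  def fromIso(s: String): Date = Date.valueOf(s.trim)

  // Second file: parse dd-MM-yyyy first, then convert to java.sql.Date.
  def fromDayFirst(s: String): Date =
    Date.valueOf(LocalDate.parse(s.trim, dayFirst))
}
```

In the mapping for the second file, the Date.valueOf call on the raw string would then be replaced by DateParsing.fromDayFirst, so both RDDs carry the same java.sql.Date values and can be merged.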

Related

Spark scala: obtaining weekday from utcstamp (function works for specific date, not for entire column)

I have a scala / spark dataframe, with one column named "utcstamp" with values of the following format: 2018-12-12 21:15:00
I want to obtain a new column with the week day, and inspired by this question in the forum, used the following code:
import java.util.Calendar
import java.text.SimpleDateFormat
val dowText = new SimpleDateFormat("E")
df = df.withColumn("weekday" , dowText.format(df.select(col("utcstamp"))))
However, I get the following error:
<console>:58: error: type mismatch;
found : String
required: org.apache.spark.sql.Column
When I apply this to a specific date (like in the link provided) it works; I just can't apply it to the whole column.
Can anyone help me with this? An alternative way of converting a UTC column into a weekday would also do for me.
You can use the dayofweek function of Spark SQL, which gives you a number from 1 to 7, for Sunday to Saturday:
val df2 = df.withColumn("weekday", dayofweek(col("utcstamp").cast("timestamp")))
Or, if you want words (Sun-Sat) instead:
val df2 = df.withColumn("weekday", date_format(col("utcstamp").cast("timestamp"), "EEE"))
You can simply get the day of the week with date_format, using the pattern "E" or "EEEE" (e.g. Sun or Sunday):
df.withColumn("weekday", date_format(to_timestamp($"utcstamp"), "E"))
If you want the day of the week as a numeric value, use the dayofweek function, which is available from Spark 2.3 onwards.
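To sanity-check what these column functions should return for a single value, the same calculation can be sketched on the plain JVM without Spark. WeekdayCheck and its method names are made up for this illustration; the Sunday-first numbering follows the 1-7 description of dayofweek given above.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
import java.util.Locale

object WeekdayCheck {
  // Format of the utcstamp values in the question, e.g. 2018-12-12 21:15:00.
  private val stamp = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
  private val shortName = DateTimeFormatter.ofPattern("EEE", Locale.ENGLISH)

  // java.time numbers Monday=1 .. Sunday=7; shift to Sunday=1 .. Saturday=7.
  def sundayFirstNumber(utcstamp: String): Int =
    LocalDateTime.parse(utcstamp, stamp).getDayOfWeek.getValue % 7 + 1

  // Abbreviated English day name, comparable to the "EEE" pattern.
  def shortDayName(utcstamp: String): String =
    LocalDateTime.parse(utcstamp, stamp).format(shortName)
}
```

For the sample value 2018-12-12 21:15:00, which falls on a Wednesday, this yields 4 and "Wed".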

Pyspark convert string type date into dd-mm-yyyy format

Using pyspark 2.4.0
I have the date column in the dataframe as follows:
I need to convert it into DD-MM-YYYY format. I have tried a few solutions, including the following code, but it returns null values:
df_students_2 = df_students.withColumn(
    'new_date',
    F.to_date(
        F.unix_timestamp('dt', '%B %d, %Y').cast('timestamp')))
Note that there are different date formats in the dt column. It would be easier if I could put the whole column into one format first, but since the dataframe is big it is not possible to go through each value and change it by hand. For future readers, I am also including the following code, where I tried to loop over the 2 date formats, but it did not succeed.
def to_date_(col, formats=(datetime.strptime(col,"%B %d, %Y"), \
                           datetime.strptime(col,"%d %B %Y"), "null")):
    return F.coalesce(*[F.to_date(col, f) for f in formats])
Any ideas?
Try this. It is implemented in Scala, but the same can be done in PySpark with minimal changes:
// I've put the example formats, but just replace this list with expected formats in the dt column
val dt_formats= Seq("dd-MMM-yyyy", "MMM-dd-yyyy", "yyyy-MM-dd","MM/dd/yy","dd-MM-yy","dd-MM-yyyy","yyyy/MM/dd","dd/MM/yyyy")
val newDF = df_students.withColumn("new_date", coalesce(dt_formats.map(fmt => to_date($"dt", fmt)):_*))
Try this; it should work:
from pyspark.sql.functions import to_date
df = spark.createDataFrame([("Mar 25, 1991",), ("May 1, 2020",)],['date_str'])
df.select(to_date(df.date_str, 'MMM d, yyyy').alias('dt')).collect()
[Row(dt=datetime.date(1991, 3, 25)), Row(dt=datetime.date(2020, 5, 1))]
See also: Datetime Patterns for Formatting and Parsing
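The "try each format and keep the first that parses" idea behind the coalesce answer can also be sketched on a single string with java.time. This is not the Spark API, just the same fallback logic in plain Scala. MultiFormatDate is a hypothetical name, and the two patterns are assumed equivalents of the asker's %B %d, %Y and %d %B %Y attempts.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import java.util.Locale
import scala.util.Try

object MultiFormatDate {
  // Assumed Java-style equivalents of "%B %d, %Y" and "%d %B %Y".
  private val formats: Seq[DateTimeFormatter] =
    Seq("MMMM d, yyyy", "d MMMM yyyy")
      .map(p => DateTimeFormatter.ofPattern(p, Locale.ENGLISH))

  // Try the formats in order; the first successful parse wins, otherwise None.
  def parse(text: String): Option[LocalDate] =
    formats.view
      .flatMap(f => Try(LocalDate.parse(text.trim, f)).toOption)
      .headOption
}
```

A None result plays the role of the null that coalesce returns when no format matches. Note that the patterns are Java-style letters, not the strptime-style % codes used in the question's attempts.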

R: Converting timestamp to date

I have a timestamp pulled from MongoDB
example: 2007-01-01 01:00:00
I need it to be a simple date: 2007-01-01
I have been looking at: Convert UNIX epoch to Date object
but I am having a hard time with the formatting.
Using this source:
Convert column in data.frame to date
I had to check whether the class I was trying to convert was one that as.Date is able to take.
It turned out to be a data.frame.
I used the following to convert it:
dat_dump %>%
  mutate(date = as.Date(date, format = "%Y-%m-%d"))

pyspark: Convert string to date format without hour, minute and second

Hello, I would like to convert a string date to date format,
for example from 190424 to 2019-01-24.
I tried this code:
tx_wd_df = tx_wd_df.select(
    'dateTransmission',
    from_unixtime(unix_timestamp('dateTransmission', 'yymmdd')).alias('dateTransmissionDATE')
)
But I got this format: 2019-01-24 00:04:00
I would like only 2019-01-24.
Any ideas, please?
Thanks
tx_wd_df.show(truncate=False)
You can simply use to_date(). It parses the string with the given format and returns a date, so the time portion is dropped. The format string must match the input date format.
import pyspark.sql.functions as F
date_column = "dateTransmission"
# MM because mm in Java Simple Date Format is minutes, and MM is months
date_format = "yyMMdd"
df = df.withColumn(date_column, F.to_date(F.col(date_column), date_format))
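The MM versus mm distinction comes from Java-style pattern letters, so it can be checked on one value with java.time outside Spark. This is a sketch under that assumption; CompactDate and its method names are invented for the illustration.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import scala.util.Try

object CompactDate {
  // MM is month-of-year; lowercase mm would be minute-of-hour.
  private val withMonths = DateTimeFormatter.ofPattern("yyMMdd")
  private val withMinutes = DateTimeFormatter.ofPattern("yymmdd")

  // Parses values such as 190424 as year, month, day.
  def parse(s: String): LocalDate = LocalDate.parse(s, withMonths)

  // With mm the text carries no month, so no LocalDate can be built from it.
  def parseWithMinutes(s: String): Try[LocalDate] =
    Try(LocalDate.parse(s, withMinutes))
}
```

Here parse("190424") gives 2019-04-24, while the mm variant fails instead of silently defaulting the month to January, as happened with unix_timestamp in the question's output.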

ParseException: Unparseable date: "04 December"

I have the birthday date 04 December and I want to save it as 04-12 in the database. For that I do this:
val birthday = theForm.field("birthday") //String
val date = new java.text.SimpleDateFormat("dd-mm", Locale.ENGLISH).parse(birthday)
But I get the error: ParseException: Unparseable date: "04 December"
Any ideas? Thanks!
Your date format string is not correct. Try it as "dd MMMM". The javadocs for SimpleDateFormat are pretty comprehensive for format possibilities:
http://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html
It seems like you want to parse it as one format and then re-format it into another. For this you can use two separate SimpleDateFormat instances, one with "dd MMMM" for parsing the 04 December format and one with "dd-MM" for re-formatting into the format you want to save to your db. The code would look like this:
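import java.text.SimpleDateFormat
import java.util.Locale
// Parse "04 December" with one format, then re-format it as "04-12".
val date = new SimpleDateFormat("dd MMMM", Locale.ENGLISH).parse(birthday)
val dbDateString = new SimpleDateFormat("dd-MM", Locale.ENGLISH).format(date)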
What about:
import java.time._
val birthDay = MonthDay.parse("--12-04")
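MonthDay.parse with no formatter expects the ISO form --MM-dd, so it will not accept "04 December" directly. A possible way to combine this with the question's input is to pass an explicit formatter. This is a minimal sketch; Birthday and toDbString are hypothetical names.

```scala
import java.time.MonthDay
import java.time.format.DateTimeFormatter
import java.util.Locale

object Birthday {
  // Input looks like "04 December"; English month names are assumed.
  private val input = DateTimeFormatter.ofPattern("dd MMMM", Locale.ENGLISH)
  // Target representation for the database, e.g. "04-12".
  private val stored = DateTimeFormatter.ofPattern("dd-MM")

  // Parse day and month only (no year involved), then re-format.
  def toDbString(text: String): String =
    MonthDay.parse(text.trim, input).format(stored)
}
```

Birthday.toDbString("04 December") returns "04-12". Unlike the SimpleDateFormat route, no year is filled in along the way.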