Is there any way to get current timestamp in Date format - scala

Is there any way to get the current timestamp in date format in Scala? I need to create a date histogram, and new Date() gives the time in seconds, not in dd-mm-yyyy hh:mm format.

Try to use the classes from the java.time package:
import java.time._
import java.time.format._
val format = DateTimeFormatter.ofPattern("dd-MM-yyyy HH:mm:ss")
LocalDateTime.now().format(format)
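Since the question asks for dd-mm-yyyy hh:mm (no seconds), here is a minimal variant of the same idea with the pattern trimmed to minutes; the minuteFormat name is just for illustration:
import java.time._
import java.time.format._
// Same approach, but without the seconds component
val minuteFormat = DateTimeFormatter.ofPattern("dd-MM-yyyy HH:mm")
LocalDateTime.now().format(minuteFormat) // e.g. "04-10-2018 14:30"
Each formatted string then identifies one minute-level bucket for the histogram.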

Related

How to convert a Date column format in a Scala case class?

I am using Scala Spark. I have two similar CSV files with 10 columns. The one difference is the Date column format:
1st file Date format: yyyy-MM-dd
2nd file Date format: dd-MM-yyyy
The objective is to create a separate schema RDD for each file and finally merge both RDDs.
For the first case class, I used Date.valueOf [java.sql.Date] in the case class mapping. No issues there.
I am having trouble with the 2nd file's Date format: I used the same Date.valueOf mapping, but it throws an error about the date format.
How can I map the date format in the second file to the 1st format, yyyy-MM-dd? Please assist.
Use java.util.Date with java.text.SimpleDateFormat:
import java.text.SimpleDateFormat
val sDate1 = "31/12/1998"
val date1 = new SimpleDateFormat("dd/MM/yyyy").parse(sDate1)
Result:
sDate1: String = 31/12/1998
date1: java.util.Date = Thu Dec 31 00:00:00 CET 1998
To change the output to a common string format, format the parsed date with a second SimpleDateFormat:
val date2 = new SimpleDateFormat("yyyy/MM/dd")
date2.format(date1)
Result:
res1: String = 1998/12/31
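Applied to the question itself, a minimal sketch (the normalizeDate helper is hypothetical, not from the original answer): parse the second file's dd-MM-yyyy strings with SimpleDateFormat, re-render them as yyyy-MM-dd, and only then hand them to Date.valueOf in the case class mapping.
import java.text.SimpleDateFormat
// Hypothetical helper: normalize the second file's dd-MM-yyyy strings so
// that Date.valueOf accepts them, matching the first file's mapping
def normalizeDate(raw: String): java.sql.Date = {
  val in  = new SimpleDateFormat("dd-MM-yyyy") // second file's format
  val out = new SimpleDateFormat("yyyy-MM-dd") // format Date.valueOf expects
  java.sql.Date.valueOf(out.format(in.parse(raw)))
}
Usage: normalizeDate("31-12-1998") returns java.sql.Date.valueOf("1998-12-31"). Equivalently, new java.sql.Date(in.parse(raw).getTime) skips the second formatting pass.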

How to convert a timestamp column into the preferred format dd/MM/yyyy?

I have a column YDate in the form yyyy-MM-dd HH:mm:ss (timestamp type) but would like to convert it to dd/MM/yyyy.
I tried this:
df = df.withColumn('YDate',F.to_date(F.col('YDate'),'dd/MM/yyyy'))
but I still get yyyy-MM-dd.
How can I effectively do this?
Use date_format instead:
df = df.withColumn('YDate',F.date_format(F.col('YDate'),'dd/MM/yyyy'))
to_date converts from the given format, while date_format converts into the given format.
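To make the direction of each function concrete, a small sketch in the Scala API (the pyspark calls behave identically), reusing the YDate timestamp column from the question; the alias names are just for illustration:
import org.apache.spark.sql.functions._
df.select(
  to_date(col("YDate")).alias("as_date"),                    // DateType, e.g. 2015-12-28
  date_format(col("YDate"), "dd/MM/yyyy").alias("as_string") // StringType, e.g. 28/12/2015
)
Note that date_format returns a plain string column; that is what a fixed dd/MM/yyyy rendering requires, because DateType values carry no display format of their own.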
You can use the date_format function from the pyspark.sql.functions module.
For more information about date formats, you can refer to the Date Format documentation.
Below is a code snippet to solve your use case.
from pyspark.sql import functions as F
df = spark.createDataFrame([('2015-12-28 23:59:59',)], ['YDate'])
df = df.withColumn('YDate', F.date_format('YDate', 'dd/MM/yyyy'))

Pyspark: Output to csv -- Timestamp format is different

I am working with a dataset with the following Timestamp format: yyyy-MM-dd HH:mm:ss
When I output the data to csv the format changes to something like this: 2019-04-29T00:15:00.000Z
Is there any way to keep the original format, like 2019-04-29 00:15:00?
Do I need to convert that column to string and then push it to csv?
I am saving my file to csv like so:
df.coalesce(1).write.format("com.databricks.spark.csv"
).mode('overwrite'
).option("header", "true"
).save("date_fix.csv")
Alternative
Spark >= 2.0.0
set option("timestampFormat", "yyyy-MM-dd HH:mm:ss") for format("csv")
df.coalesce(1).write.format("csv"
).mode('overwrite'
).option("header", "true"
).option("timestampFormat", "yyyy-MM-dd HH:mm:ss"
).save("date_fix.csv")
As per the documentation:
timestampFormat (default yyyy-MM-dd'T'HH:mm:ss.SSSXXX): sets the string that indicates a timestamp format. Custom date formats follow the formats at java.text.SimpleDateFormat. This applies to timestamp type.
Spark < 2.0.0
set option("dateFormat", "yyyy-MM-dd HH:mm:ss") for format("csv")
df.coalesce(1).write.format("com.databricks.spark.csv"
).mode('overwrite'
).option("header", "true"
).option("dateFormat", "yyyy-MM-dd HH:mm:ss"
).save("date_fix.csv")
As per the documentation:
dateFormat: specifies a string that indicates the date format to use when reading dates or timestamps. Custom date formats follow the formats at java.text.SimpleDateFormat. This applies to both DateType and TimestampType. By default, it is null which means trying to parse times and date by java.sql.Timestamp.valueOf() and java.sql.Date.valueOf()
Ref: the spark-csv README.
Yes, that's correct. The easiest way to achieve this is using pyspark.sql.functions.date_format such as:
import pyspark.sql.functions as f
df.withColumn(
"date_column_formatted",
f.date_format(f.col("timestamp"), "yyyy-MM-dd HH:mm:ss")
)
More info about it here https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.functions.date_format.
Hope this helps!

pyspark: Convert string to date format without minute, second and hour

Hello, I would like to convert a string date to date format:
for example, from 190424 to 2019-01-24.
I tried with this code:
tx_wd_df = tx_wd_df.select(
'dateTransmission',
from_unixtime(unix_timestamp('dateTransmission', 'yymmdd')).alias('dateTransmissionDATE')
)
tx_wd_df.show(truncate=False)
But I got this format: 2019-01-24 00:04:00
I would like only 2019-01-24.
Any idea please?
Thanks
You can simply use to_date(). It parses the string according to the given format and discards the time portion entirely, keeping only the date.
import pyspark.sql.functions as F
date_column = "dateTransmission"
# MM because mm in Java Simple Date Format is minutes, and MM is months
date_format = "yyMMdd"
df = df.withColumn(date_column, F.to_date(F.col(date_column), date_format))
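A quick check of the expected result, sketched in the Scala API (pyspark behaves the same, and a SparkSession named spark is assumed); note that with yyMMdd the string 190424 parses as 2019-04-24, since 04 is read as the month:
import spark.implicits._
import org.apache.spark.sql.functions._
val sample = Seq("190424").toDF("dateTransmission")
sample.select(to_date(col("dateTransmission"), "yyMMdd").alias("parsed")).show()
// +----------+
// |    parsed|
// +----------+
// |2019-04-24|
// +----------+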

Joda DateTime: how to set the timezone on the parsed DateTime?

I'm using Scala with the Play Framework. I want to parse a string containing a simple date and declare that it's in UTC. So if I have 2018-10-04, I want to get 2018-10-04T00:00:00.000Z.
With this:
DateTime.parse("2018-10-04", DateTimeFormat.forPattern("yyyy-mm-dd")).withZone(DateTimeZone.UTC)
I keep getting 2018-10-03T22:00:00.000Z because I'm in a +2 timezone. How do I just say that it's already in UTC?
One way is to use LocalDatetime to initially ignore the timezone:
scala> LocalDateTime.parse("2018-10-04", DateTimeFormat.forPattern("yyyy-MM-dd"))
res11: org.joda.time.LocalDateTime = 2018-10-04T00:00:00.000
then you can use toDateTime to make this UTC:
scala> LocalDateTime.parse("2018-10-04", DateTimeFormat.forPattern("yyyy-MM-dd")).toDateTime(DateTimeZone.UTC)
res12: org.joda.time.DateTime = 2018-10-04T00:00:00.000Z
Also: You should use MM (month) rather than mm (minute).
First, the imports.
import org.joda.time.{DateTime, DateTimeZone}
import org.joda.time.format.DateTimeFormat
Since the time part is fixed, we can append a constant suffix for it.
val timeString = "T00:00:00.000Z"
We use string interpolation to append this suffix to the incoming dates:
DateTime.parse(s"2018-10-04$timeString", DateTimeFormat.forPattern("yyyy-mm-dd'T'HH:mm:ss.SSSZ")).withZone(DateTimeZone.UTC)