I have loaded data into a Scala DataFrame, and I have a field of type String in the format "20201208140823"; as you can see, it is a date and time.
How can I convert it to a date in the format dd-mm-yyyy hh24:mi:ss? I searched the web and could not find an appropriate function or answer. Can anyone help?
last_edit is the field.
%scala
import org.apache.spark.sql.functions._
val DF_MelbParkBayInfo = spark.sql("select the_geom,marker_id,meter_id,bay_id, last_edit , rd_seg_id,rd_seg_dsc from temp_MelbParkBayInfo")
DF_MelbParkBayInfo:org.apache.spark.sql.DataFrame
the_geom:string
marker_id:string
meter_id:string
bay_id:string
last_edit:string
rd_seg_id:string
rd_seg_dsc:string
Spark has built-in functions to achieve the required result:
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._
import org.apache.spark.sql.functions._
spark.sparkContext.setLogLevel("ERROR")
// Sample dataframe
val df = Seq("20201208140823","20211210140823").toDF("last_edit")
// Method#1
df.withColumn("last_edit",
date_format(to_timestamp('last_edit,"yyyyMMddHHmmss"),"dd-MM-yyyy HH:mm:ss"))
.show()
// Method#2
df.withColumn("last_edit", from_unixtime(unix_timestamp('last_edit,"yyyyMMddHHmmss"),"dd-MM-yyyy HH:mm:ss"))
.show()
+-------------------+
| last_edit|
+-------------------+
|08-12-2020 14:08:23|
|10-12-2021 14:08:23|
+-------------------+
This is how we do it in Java and I believe you should be able to use/adapt it in Scala.
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;
public class Main {
public static void main(String[] args) {
String strDateTime = "20201208140823";
DateTimeFormatter dtfInput = DateTimeFormatter.ofPattern("uuuuMMddHHmmss", Locale.ENGLISH);
LocalDateTime ldt = LocalDateTime.parse(strDateTime, dtfInput);
System.out.println(ldt);
// Formatted
DateTimeFormatter dtfOutput = DateTimeFormatter.ofPattern("dd-MM-uuuu HH:mm:ss", Locale.ENGLISH);
String formatted = dtfOutput.format(ldt);
System.out.println(formatted);
}
}
Output:
2020-12-08T14:08:23
08-12-2020 14:08:23
Learn more about the modern Date-Time API from Trail: Date Time.
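For readers who want the Scala form, here is a minimal sketch of the same java.time approach. The helper name reformatLastEdit is hypothetical and only used for illustration; the patterns are the ones from the Java code above.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
import java.util.Locale

// Same patterns as the Java example: compact input, dd-MM-uuuu HH:mm:ss output.
val inputFormat  = DateTimeFormatter.ofPattern("uuuuMMddHHmmss", Locale.ENGLISH)
val outputFormat = DateTimeFormatter.ofPattern("dd-MM-uuuu HH:mm:ss", Locale.ENGLISH)

// reformatLastEdit is a hypothetical helper name, not part of any library.
def reformatLastEdit(raw: String): String =
  LocalDateTime.parse(raw, inputFormat).format(outputFormat)

println(reformatLastEdit("20201208140823")) // 08-12-2020 14:08:23
```

A function like this could be wrapped in a Spark UDF if needed, although the built-in column functions are usually the simpler choice for a DataFrame.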
How can I convert dd/mm/yyyy to yyyymmdd format, and also
dd/m/yyyy to yyyymmdd format, using Joda-Time in Scala?
I am using this dependency:
"joda-time" % "joda-time" % "2.9.9",
This answer already answers your question in Java, but here it is translated into Scala:
import org.joda.time.DateTime
import org.joda.time.format.{DateTimeFormat, DateTimeFormatter}
// define original date format
val originalFormat: DateTimeFormatter = DateTimeFormat.forPattern("dd/MM/yyyy")
val input: DateTime = originalFormat.parseDateTime("02/09/2017")
// define new format
val newFormat: DateTimeFormatter = DateTimeFormat.forPattern("yyyyMMdd")
val output: String = newFormat.print(input) // 20170902
This code already accounts for a missing leading 0 in your date (i.e. it treats 02/9/2017 and 2/9/2017 as the same thing). It will not guess missing parts of the year, though, so 2/9/17 will be output as 00170902 instead of 20170902.
As the answer I linked to earlier mentions, you can also use java.time to do the same conversion:
import java.time.LocalDate
import java.time.format.DateTimeFormatter
val originalFormat = DateTimeFormatter.ofPattern("dd/MM/yyyy")
val input = LocalDate.parse("02/09/2017", originalFormat)
val newFormat = DateTimeFormatter.ofPattern("yyyyMMdd")
val output = input.format(newFormat)
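One difference worth noting with java.time: the two-letter fields in "dd/MM/yyyy" require exactly two digits, so an input such as 2/9/2017 is rejected. A sketch that accepts both forms, assuming single-letter day and month fields are acceptable for your data (toCompact is a hypothetical helper name):

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Single-letter d and M accept one or two digits when parsing.
val lenientInput  = DateTimeFormatter.ofPattern("d/M/yyyy")
val compactOutput = DateTimeFormatter.ofPattern("yyyyMMdd")

// toCompact is a hypothetical helper name used only for this example.
def toCompact(raw: String): String =
  LocalDate.parse(raw, lenientInput).format(compactOutput)

println(toCompact("02/09/2017")) // 20170902
println(toCompact("2/9/2017"))   // 20170902
```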
import org.joda.time.DateTime
import org.joda.time.format._
// Use uppercase MM for the month; lowercase mm means minute-of-hour
val fmt = DateTimeFormat.forPattern("dd/MM/yyyy")
val dt = fmt.parseDateTime("02/02/2017")
val fmt2 = DateTimeFormat.forPattern("yyyyMMdd")
fmt2.print(dt) // 20170202
I have a data frame with a column of Unix timestamps (e.g. 1435655706000), and I want to convert it to a date with format 'yyyy-MM-dd'. I've tried nscala-time but it doesn't work.
val time_col = sqlc.sql("select ts from mr").map(_(0).toString.toDateTime)
time_col.collect().foreach(println)
and I got error:
java.lang.IllegalArgumentException: Invalid format: "1435655706000" is malformed at "6000"
Here it is using Scala DataFrame functions: from_unixtime and to_date
// NOTE: divide by 1000 required if milliseconds
// e.g. 1446846655609 -> 2015-11-06 21:50:55 -> 2015-11-06
mr.select(to_date(from_unixtime($"ts" / 1000)))
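Since Spark 1.5, there is a built-in function for doing that. from_unixtime expects seconds, so divide millisecond values by 1000, and use lowercase yyyy (uppercase YYYY is the week-based year):
val df = sqlContext.sql("select from_unixtime(cast(ts / 1000 as bigint), 'yyyy-MM-dd') as `ts` from mr")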
Please check Spark 1.5.2 API Doc for more info.
You need to import the following libraries:
import org.joda.time.{DateTime, DateTimeZone}
import org.joda.time.format.DateTimeFormat
val stri = new DateTime(timeInMillisec).toString("yyyy/MM/dd")
Or, adjusting to your case (read the value as a Long, since epoch milliseconds overflow an Int):
val time_col = sqlContext.sql("select ts from mr")
.map(line => new DateTime(line(0).toString.toLong).toString("yyyy/MM/dd"))
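There could be another way, adding the seconds to the epoch (this assumes threshold holds the epoch milliseconds):
import com.github.nscala_time.time.Imports._
// Start from the epoch, not from the current time
val date = (new DateTime(0L) + ((threshold.toDouble)/1000).toInt.seconds )
.toString("yyyy/MM/dd")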
Hope this helps :)
You needn't convert to String before applying toDateTime with nscala_time:
import com.github.nscala_time.time.Imports._
scala> 1435655706000L.toDateTime
res4: org.joda.time.DateTime = 2015-06-30T09:15:06.000Z
I have solved this issue using the joda-time library by mapping on the DataFrame and converting the DateTime into a String :
import org.joda.time._
val time_col = sqlContext.sql("select ts from mr")
.map(line => new DateTime(line(0)).toString("yyyy-MM-dd"))
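If you would rather not depend on Joda-Time, the same conversion can be sketched with java.time from the JDK. The helper name epochMillisToDay is hypothetical, and rendering the date in UTC is an assumption; substitute your own zone if the day boundary matters.

```scala
import java.time.{Instant, ZoneOffset}
import java.time.format.DateTimeFormatter

val dayFormat = DateTimeFormatter.ofPattern("yyyy-MM-dd")

// epochMillisToDay is a hypothetical helper; UTC is an assumption.
def epochMillisToDay(millis: Long): String =
  Instant.ofEpochMilli(millis)   // the value is already in milliseconds, so no division
    .atZone(ZoneOffset.UTC)
    .toLocalDate
    .format(dayFormat)

println(epochMillisToDay(1435655706000L)) // 2015-06-30
```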
You can use the following syntax in Java:
input.select("timestamp")
.withColumn("date", date_format(col("timestamp").$div(1000).cast(DataTypes.TimestampType), "yyyyMMdd").cast(DataTypes.IntegerType))
What you can do is:
input.withColumn("time", concat(from_unixtime(input.col("COL_WITH_UNIX_TIME")/1000,
"yyyy-MM-dd'T'HH:mm:ss"), typedLit("."), substring(input.col("COL_WITH_UNIX_TIME"), 11, 3),
typedLit("Z")))
Here, time is the name of the new column and COL_WITH_UNIX_TIME is the name of the column you want to convert. This keeps the milliseconds, making your data more precise, in the form "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'".
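To check what that target format looks like for a single value outside Spark, here is a small java.time sketch. The helper name epochMillisToIso is hypothetical. The formatter is pinned to UTC so that the literal Z suffix is accurate; from_unixtime renders in the session time zone, so the Spark expression above only matches this output when that zone is UTC.

```scala
import java.time.{Instant, ZoneOffset}
import java.time.format.DateTimeFormatter

// Pin the formatter to UTC so the trailing literal Z is truthful.
val isoMillis = DateTimeFormatter
  .ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'")
  .withZone(ZoneOffset.UTC)

// epochMillisToIso is a hypothetical helper name used only for this example.
def epochMillisToIso(millis: Long): String =
  isoMillis.format(Instant.ofEpochMilli(millis))

println(epochMillisToIso(1446846655609L)) // 2015-11-06T21:50:55.609Z
```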
I have a date key such as 20170501, which is in yyyyMMdd format. How can we get a date x days back from this date in Scala?
This is what I have in my program:
val runDate = 20170501
Now I want the date, say, 30 days back from this date.
Using Scala/JVM/Java 8...
scala> import java.time._
import java.time._
scala> import java.time.format._
import java.time.format._
scala> val formatter = DateTimeFormatter.ofPattern("yyyyMMdd")
formatter: java.time.format.DateTimeFormatter = Value(YearOfEra,4,19,EXCEEDS_PAD)Value(MonthOfYear,2)Value(DayOfMonth,2)
scala> val runDate = 20170501
runDate: Int = 20170501
scala> val runDay = LocalDate.parse(runDate.toString, formatter)
runDay: java.time.LocalDate = 2017-05-01
scala> val runDayMinus30 = runDay.minusDays(30)
runDayMinus30: java.time.LocalDate = 2017-04-01
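If the result should be an Int key again rather than a LocalDate, the steps above can be folded into one function. This is a sketch only; daysBefore is a hypothetical name.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

val keyFormat = DateTimeFormatter.ofPattern("yyyyMMdd")

// daysBefore is a hypothetical helper: Int key in, Int key out.
def daysBefore(dateKey: Int, days: Long): Int =
  LocalDate.parse(dateKey.toString, keyFormat)
    .minusDays(days)
    .format(keyFormat)
    .toInt

println(daysBefore(20170501, 30)) // 20170401
```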
You can also use the Joda-Time API, which has really good functions like
date.minusMonths
date.minusYears
date.minusDays
date.minusHours
date.minusMinutes
Here is a simple example using the Joda-Time API:
import org.joda.time.format.DateTimeFormat
val dtf = DateTimeFormat.forPattern("yyyyMMdd")
val dt= "20170531"
val date = dtf.parseDateTime(dt)
println(date.minusDays(30))
Output:
2017-05-01T00:00:00.000+05:45
For this you need to use a UDF, create a DateTime object with your input format "yyyyMMdd", and do the operations.
Hope this helps!
Suppose my date is 21/05/2017; then the output should be SUN.
How can I get the day of the week for a given date?
import java.time.LocalDate
import java.time.format.DateTimeFormatter
val df = DateTimeFormatter.ofPattern("dd/MM/yyyy")
val dayOfWeek = LocalDate.parse("21/05/2017",df).getDayOfWeek
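getDayOfWeek returns a DayOfWeek enum value (SUNDAY here), not the text SUN. A sketch of getting the short upper-case form asked for, assuming English names are wanted (shortDayName is a hypothetical helper name):

```scala
import java.time.LocalDate
import java.time.format.{DateTimeFormatter, TextStyle}
import java.util.Locale

val dayInputFormat = DateTimeFormatter.ofPattern("dd/MM/yyyy")

// shortDayName is a hypothetical helper; Locale.ENGLISH is an assumption.
def shortDayName(raw: String): String =
  LocalDate.parse(raw, dayInputFormat)
    .getDayOfWeek
    .getDisplayName(TextStyle.SHORT, Locale.ENGLISH) // "Sun"
    .toUpperCase(Locale.ENGLISH)

println(shortDayName("21/05/2017")) // SUN
```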
You can use SimpleDateFormat as illustrated below:
import java.util.Calendar
import java.text.SimpleDateFormat
val now = Calendar.getInstance.getTime
val date = new SimpleDateFormat("yyyy-MM-dd")
date.format(now)
res1: String = 2017-05-20
val dowInt = new SimpleDateFormat("u")
dowInt.format(now)
res2: String = 6
val dowText = new SimpleDateFormat("E")
dowText.format(now)
res3: String = Sat
[UPDATE]
As per comments below, please note that SimpleDateFormat isn't thread-safe.
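If thread safety is a concern, java.time.format.DateTimeFormatter is immutable and can be shared between threads. A rough equivalent of the snippet above, using a fixed date instead of the current time so the output is reproducible. Note that in DateTimeFormatter patterns the letter u means year rather than day number, so the numeric day comes from getDayOfWeek.getValue instead.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import java.util.Locale

// Immutable formatters: safe to keep in shared vals.
val isoDay  = DateTimeFormatter.ofPattern("yyyy-MM-dd")
val dayText = DateTimeFormatter.ofPattern("E", Locale.ENGLISH)

// Fixed sample date (the one shown in the output above).
val sample = LocalDate.of(2017, 5, 20)

println(sample.format(isoDay))        // 2017-05-20
println(sample.getDayOfWeek.getValue) // 6
println(sample.format(dayText))       // Sat
```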
You can also use nscala-time, a Scala wrapper for Joda-Time, which is a Java date-time library with a ton of functionality.
import com.github.nscala_time.time.Imports._
val someDate = (new DateTime).withYear(2017)
.withMonthOfYear(5)
.withDayOfMonth(21)
someDate.getDayOfWeek
res7: Int = 7
Now you need to know the mapping between integer days of the week and names. Joda-Time follows ISO 8601 numbering, where 1 is Monday and 7 is Sunday, so here 7 corresponds to Sunday.
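One way to avoid writing that mapping by hand is java.time.DayOfWeek from the JDK, which uses the same ISO numbering. A small sketch; dayName is a hypothetical helper name and English names are an assumption.

```scala
import java.time.DayOfWeek
import java.time.format.TextStyle
import java.util.Locale

// ISO 8601 numbering: 1 = Monday ... 7 = Sunday.
// dayName is a hypothetical helper name used only for this example.
def dayName(isoDayNumber: Int): String =
  DayOfWeek.of(isoDayNumber).getDisplayName(TextStyle.FULL, Locale.ENGLISH)

println(dayName(7)) // Sunday
```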