Subtracting a DateTime from a DateTime in Scala

I'm relatively new to both Scala and Joda-Time, but have been pretty impressed with both. I'm trying to figure out if there is a more elegant way to do some date arithmetic. Here's a method:
import org.joda.time.Period
private def calcDuration(): String = {
  val p = new Period(calcCloseTime.toInstant.getMillis - calcOpenTime.toInstant.getMillis)
  val s: String = p.getHours.toString + ":" + p.getMinutes.toString +
    ":" + p.getSeconds.toString
  return s
}
I convert everything to a string because I am putting it into MongoDB and I'm not sure how to serialize a Joda Duration or Period. If someone knows how, I would really appreciate the answer.
Anyway, the calcCloseTime and calcOpenTime methods return DateTime objects. Converting them to Instants is the best way I found to get the difference. Is there a better way?
Another side question: when the hours, minutes or seconds are single digits, the resulting string is not zero-filled. Is there a straightforward way to make that string look like HH:MM:SS?
Thanks,
John

Period formatting is done by the PeriodFormatter class. You can use a default one, or construct your own using PeriodFormatterBuilder. Setting the builder up properly takes a bit more code, but you can use it like so:
scala> import org.joda.time._
import org.joda.time._
scala> import org.joda.time.format._
import org.joda.time.format._
scala> val d1 = new DateTime(2010,1,1,10,5,1,0)
d1: org.joda.time.DateTime = 2010-01-01T10:05:01.000+01:00
scala> val d2 = new DateTime(2010,1,1,13,7,2,0)
d2: org.joda.time.DateTime = 2010-01-01T13:07:02.000+01:00
scala> val p = new Period(d1, d2)
p: org.joda.time.Period = PT3H2M1S
scala> val hms = new PeriodFormatterBuilder().minimumPrintedDigits(2).printZeroAlways().appendHours().appendSeparator(":").appendMinutes().appendSuffix(":").appendSeconds().toFormatter
hms: org.joda.time.format.PeriodFormatter = org.joda.time.format.PeriodFormatter@4d2125
scala> hms print p
res0: java.lang.String = 03:02:01
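If you only need the zero padding and would rather not set up a PeriodFormatterBuilder, Scala's f interpolator can do it directly. This is a minimal sketch, assuming you already have the difference in milliseconds (as in the question's getMillis subtraction); formatHms is a hypothetical helper name, and the hours are deliberately not wrapped at 24.

```scala
// Hypothetical helper: formats a non-negative millisecond difference as HH:MM:SS.
// Hours are total hours (not wrapped at 24), so a 26-hour gap prints as "26:00:00".
def formatHms(millis: Long): String = {
  val totalSeconds = millis / 1000
  val hours = totalSeconds / 3600
  val minutes = (totalSeconds % 3600) / 60
  val seconds = totalSeconds % 60
  f"$hours%02d:$minutes%02d:$seconds%02d" // %02d pads each field to two digits
}

// 3h 2m 1s, the same gap as d1 -> d2 in the REPL session above
println(formatHms((3 * 3600 + 2 * 60 + 1) * 1000L)) // prints 03:02:01
```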
Be aware that whole days (and larger units) in the period are not taken into account by this formatter:
scala> val p2 = new Period(new LocalDate(2010,1,1), new LocalDate(2010,1,2))
p2: org.joda.time.Period = P1D
scala> hms print p2
res1: java.lang.String = 00:00:00
so if you need to handle those as well, you would also need to add the required fields (days, weeks, maybe years) to the formatter.
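As an alternative that swaps Joda-Time for java.time (JDK 8+), which this answer does not use: java.time.Duration measures plain elapsed time, so a full day simply shows up as 24 hours instead of being dropped. A sketch under that assumption; hms is a hypothetical helper name, and UTC is used only so the result does not depend on the local time zone.

```scala
import java.time.{Duration, LocalDateTime, ZoneOffset}

// Formats a java.time.Duration as HH:MM:SS; whole days are folded into the hour count
def hms(d: Duration): String =
  f"${d.toHours}%02d:${d.toMinutes % 60}%02d:${d.getSeconds % 60}%02d"

// One full day apart, like the LocalDate example above
val start = LocalDateTime.of(2010, 1, 1, 0, 0).toInstant(ZoneOffset.UTC)
val end   = LocalDateTime.of(2010, 1, 2, 0, 0).toInstant(ZoneOffset.UTC)
println(hms(Duration.between(start, end))) // prints 24:00:00
```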

You might want to take a look at Jorge Ortiz's wrapper for Joda-Time, scala-time for something that's a bit nicer to work with in Scala.
You should then be able to use something like
(calcOpenTime to calcCloseTime).millis
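scala-time's own API is not reproduced here. As a rough illustration of how a `to` operator like this can be layered onto a Java date type, here is a sketch using an implicit class over java.time.Instant; TimeSyntax, RichInstant and its `to` method are hypothetical names, not part of scala-time or java.time.

```scala
import java.time.{Duration, Instant}

object TimeSyntax {
  // Hypothetical enrichment: `a to b` yields the java.time.Duration from a to b
  implicit class RichInstant(val start: Instant) {
    def to(end: Instant): Duration = Duration.between(start, end)
  }
}
import TimeSyntax._

val open  = Instant.parse("2010-01-01T10:05:01Z")
val close = Instant.parse("2010-01-01T13:07:02Z")
println((open to close).toMillis) // prints 10921000
```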

Does this link help?
How do I calculate the difference between two dates?
This question has more than one answer! If you just want the number of whole days between two dates, then you can use the new Days class in version 1.4 of Joda-Time.
Days d = Days.daysBetween(startDate, endDate);
int days = d.getDays();
This method, and other static methods on the Days class have been designed to operate well with the JDK5 static import facility.
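For comparison, the closest java.time equivalent of Days.daysBetween is ChronoUnit.DAYS.between. This is a swap to the JDK 8+ API, not something the quoted Joda-Time text describes; a minimal sketch with example dates:

```scala
import java.time.LocalDate
import java.time.temporal.ChronoUnit

val startDate = LocalDate.of(2010, 1, 1)
val endDate   = LocalDate.of(2010, 2, 19)
// Number of whole days between the two dates (the end date is exclusive)
val days: Long = ChronoUnit.DAYS.between(startDate, endDate)
println(days) // prints 49
```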
If however you want to calculate the number of days, weeks, months and years between the two dates, then you need a Period. By default, this will split the difference between the two datetimes into parts, such as "1 month, 2 weeks, 4 days and 7 hours".
Period p = new Period(startDate, endDate);
You can control which fields get extracted using a PeriodType.
Period p = new Period(startDate, endDate, PeriodType.yearMonthDay());
This example will not return any weeks or time fields, so the previous example becomes "1 month and 18 days".
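In java.time (JDK 8+, a different library from the Joda-Time classes above), Period.between on two LocalDates always splits the gap into years, months and days with no weeks field, which roughly corresponds to the yearMonthDay() period type. A sketch, with dates chosen to match the "1 month and 18 days" result:

```scala
import java.time.{LocalDate, Period}

// java.time.Period has only years, months and days
val p = Period.between(LocalDate.of(2010, 1, 1), LocalDate.of(2010, 2, 19))
println(p)                                               // prints P1M18D
println(s"${p.getMonths} month(s), ${p.getDays} day(s)") // prints 1 month(s), 18 day(s)
```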

Related

Add seconds and epochs to obtain datetime

Beginner learner here, trying to add an array of integers (which are meant to be seconds) to an array of Epochs:
Sample input:
AddSeconds = [3,4]
TimeEpoch = [1575165652000, 1576424223000] // Which are 2019-12-01 02:00:52 and 2019-12-15 15:37:03
Desired output:
endDate = [2019-12-01 02:00:55, 2019-12-15 15:37:07]
I need to convert the TimeEpoch values to dates in "yyyy-MM-dd HH:mm:ss" format
I need to add "AddSeconds" to the obtained dates
Thanks!
You can do this (I changed your variables to start with a lower-case letter, because Groovy guesses that upper-case variables are actually class names, which can cause confusion):
addSeconds = [3,4]
timeEpoch = [1575165652000, 1576424223000]
import java.time.*
import java.time.format.*
def formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss") // HH = 24-hour clock; hh would print 15:37 as 03:37
def datesAsStrings = [addSeconds, timeEpoch]
.transpose()
.collect { a, t -> Instant.ofEpochMilli(t).plusSeconds(a).atZone(ZoneId.systemDefault()).toLocalDateTime() }
.collect { d -> d.format(formatter) }
datesAsStrings.each { println it }
That takes your two lists, and joins them together with transpose():
[ [3, 1575165652000], [4, 1576424223000] ]
Then for each of these, we create an Instant, add the seconds, and convert it to a LocalDateTime using the current system time zone. You need to consider time zones 😉 (the sample epochs match the times in the question when read as UTC).
Then we convert them to the String format you wanted, and print each of them out.
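The same steps in Scala, as a sketch: java.time again, but with the zone pinned to UTC (an assumption, chosen because the sample epochs correspond exactly to the listed times in UTC) and HH for the 24-hour clock.

```scala
import java.time.{Instant, ZoneOffset}
import java.time.format.DateTimeFormatter

val addSeconds = List(3, 4)
val timeEpoch  = List(1575165652000L, 1576424223000L)

// HH = hour of day (00-23); the zone is needed to format an Instant
val formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss").withZone(ZoneOffset.UTC)

// Pair each epoch with its seconds, add them, then format
val endDates: List[String] = timeEpoch.zip(addSeconds).map { case (millis, secs) =>
  formatter.format(Instant.ofEpochMilli(millis).plusSeconds(secs.toLong))
}
endDates.foreach(println)
// 2019-12-01 02:00:55
// 2019-12-15 15:37:07
```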

Get date out of year and day of year from a value - Scala

I have a 6-digit value from which I have to get the date in Scala. For example, if the value is 119003, then the output should be:
1 = century prefix 20
19 = year 2019
003 = January 3
The output should be 2019/01/03.
I have tried to split the value first and then get the date, but I am not sure how to proceed as I am new to Scala.
I think you'll have to do the century calculations manually. After that you can let the java.time library do all the rest.
import java.time.LocalDate
import java.time.format.DateTimeFormatter
val in = "119003"
val cent = in.head.asDigit + 19
val res = LocalDate.parse(s"$cent${in.tail}", DateTimeFormatter.ofPattern("yyyyDDD"))
.format(DateTimeFormatter.ofPattern("yyyy/MM/dd"))
//res: String = 2019/01/03
The Date class of Java 1.0 used 1900-based years, so 119 would mean 2019, for example. This use was deprecated already in Java 1.1 more than 20 years ago, so it’s surprising to see it survive into Scala.
When you say 6-digit value, I take it to be a number (not a string).
The answer by jwvh is correct. My variant would look like this (sorry about the Java code; please translate it yourself):
import java.time.LocalDate;
int value = 119003;
int year1900based = value / 1000;
int dayOfYear = value % 1000;
LocalDate date = LocalDate.ofYearDay(year1900based + 1900, dayOfYear);
System.out.println(date);
Output:
2019-01-03
If you’ve got a string, I would slice it into two parts only, 119 and 003 (not three parts as in your comment). Parse each into an int and proceed as above.
If you need 2019/01/03 format in your output, use a DateTimeFormatter for that. Inside your program, do keep the LocalDate, not a String.
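Since the answer asks for a translation, here is one possible Scala version of the same idea, covering both an Int and a String input; dateFromInt and dateFromString are hypothetical names, and the string variant slices the input into the two parts suggested above.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// 119003 -> years since 1900 = 119, day of year = 3
def dateFromInt(value: Int): LocalDate =
  LocalDate.ofYearDay(value / 1000 + 1900, value % 1000)

// "119003" -> "119" and "003", parsed as ints and handled the same way
def dateFromString(s: String): LocalDate = {
  val (yearPart, dayPart) = s.splitAt(3)
  LocalDate.ofYearDay(yearPart.toInt + 1900, dayPart.toInt)
}

// Keep the LocalDate internally; format only for output
val outFormat = DateTimeFormatter.ofPattern("yyyy/MM/dd")
val date = dateFromInt(119003)
println(date)                   // prints 2019-01-03
println(date.format(outFormat)) // prints 2019/01/03
```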

What's the simplest way to get a Spark DataFrame from arbitrary Array Data in Scala?

I've been breaking my head about this one for a couple of days now. It feels like it should be intuitively easy... Really hope someone can help!
I've built an org.nd4j.linalg.api.ndarray.INDArray of word occurrence from some semi-structured data like this:
import org.nd4j.linalg.factory.Nd4j
import org.nd4s.Implicits._
import scala.collection.mutable.ArrayBuffer
val docMap = collection.mutable.Map[Int, Map[Int, Int]]() // of the form Map(phrase -> Map(phrasePosition -> word))
val words = ArrayBuffer("word_1","word_2","word_3",..."word_n")
val windows = ArrayBuffer("$phrase,$phrasePosition_1","$phrase,$phrasePosition_2",..."$phrase,$phrasePosition_n")
var matrix = Nd4j.create(windows.length*words.length).reshape(windows.length,words.length)
for (row <- 0 until windows.length) {
  for (column <- 0 until words.length) {
    // +1 to (row, column) if word occurs at phrase, phrasePosition indicated by window_n.
  }
}
val finalmatrix = matrix.T.dot(matrix) // to get co-occurrence matrix
So far so good...
Downstream of this point I need to integrate the data into an existing pipeline in Spark, and use that implementation of PCA etc., so I need to create a DataFrame, or at least an RDD. If I knew the number of words and/or windows in advance I could do something like:
case class Row(window : String, word_1 : Double, word_2 : Double, ...etc)
val dfSeq = ArrayBuffer[Row]()
for (row <- 0 until windows.length){
dfSeq += Row(windows(row),matrix.get(NDArrayIndex.point(row), NDArrayIndex.all()))
}
sc.parallelize(dfSeq).toDF("window","word_1","word_2",...etc)
but the number of windows and words is determined at runtime. I'm looking for a windows × words org.apache.spark.sql.DataFrame as output; the input is a windows × words org.nd4j.linalg.api.ndarray.INDArray.
Thanks in advance for any help you can offer.
OK, so after several days' work it looks like the simple answer is: there isn't one. In fact, it looks like trying to use Nd4j in this context at all is a bad idea, for several reasons:
It's (really) hard to get data out of the native INDArray format once you've put it in.
Even using something like Guava, the .data() method brings everything on-heap, which will quickly become expensive.
You've got the added hassle of having to compile an assembly jar or use HDFS etc. to handle the library itself.
I also considered using Breeze, which may actually provide a viable solution but carries some of the same problems and can't be used on distributed data structures.
Unfortunately, using native Spark / Scala datatypes, although easier once you know how, is - for someone like me coming from Python + numpy + pandas heaven at least - painfully convoluted and ugly.
Nevertheless, I did implement this solution successfully:
import org.apache.spark.mllib.linalg.{Vectors,Vector,Matrix,DenseMatrix,DenseVector}
import org.apache.spark.mllib.linalg.distributed.RowMatrix
//first make a pseudo-matrix from Scala Array[Double] (a Seq of mutable rows):
val rowSeq = Seq.fill(windows.length)(Array.fill(words.length)(0d))
//iterate through 'rows' and 'columns' to fill it:
for (row <- 0 until windows.length){
  for (column <- 0 until words.length){
    // rowSeq(row)(column) += 1 if word occurs at phrase, phrasePosition indicated by window_n.
  }
}
//create Spark DenseMatrix (it stores values column-major, hence transpose before flatten)
val values : Array[Double] = rowSeq.transpose.flatten.toArray
val matrix = new DenseMatrix(windows.length,words.length,values)
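To see why the rows are transposed before flattening: Spark's local org.apache.spark.mllib.linalg.DenseMatrix stores its values in column-major order, with entry (i, j) at index i + j * numRows. A plain-Scala sketch, without Spark, that checks the flattened array against that index formula (it uses a Seq of Seqs instead of the answer's Seq of Arrays, to keep it simple):

```scala
// A 2 x 3 example matrix, stored as a Seq of rows
val rowSeq: Seq[Seq[Double]] = Seq(Seq(1.0, 2.0, 3.0), Seq(4.0, 5.0, 6.0))
val numRows = rowSeq.length
val numCols = rowSeq.head.length

// Transposing turns the rows into columns, so flattening then yields column-major order
val values: Array[Double] = rowSeq.transpose.flatten.toArray

// Check that entry (i, j) sits at index i + j * numRows, the column-major layout
val columnMajor = (for {
  i <- 0 until numRows
  j <- 0 until numCols
} yield values(i + j * numRows) == rowSeq(i)(j)).forall(identity)

println(values.mkString(", ")) // prints 1.0, 4.0, 2.0, 5.0, 3.0, 6.0
println(columnMajor)           // prints true
```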
One of the main operations that I needed Nd4J for was matrix.T.dot(matrix) but it turns out that you can't multiply 2 matrices of Type org.apache.spark.mllib.linalg.DenseMatrix together, one of them (A) has to be a org.apache.spark.mllib.linalg.distributed.RowMatrix and - you guessed it - you can't call matrix.transpose() on a RowMatrix, only on a DenseMatrix! Since it's not really relevant to the question, I'll leave that part out, except to explain that what comes out of that step is a RowMatrix. Credit is also due here and here for the final part of the solution:
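For small data that fits in memory, what matrix.T.dot(matrix) computes (the co-occurrence matrix AᵀA) can also be written directly in plain Scala, with no Nd4j or Spark involved. This is only a local sketch of the product, not the distributed approach this answer uses; gram is a hypothetical name, and the example data is made up.

```scala
// A is windows x words, stored row-major as Array[Array[Double]].
// Returns the words x words matrix A^T A: entry (j, k) = sum over rows r of A(r)(j) * A(r)(k).
def gram(a: Array[Array[Double]]): Array[Array[Double]] = {
  val cols = if (a.isEmpty) 0 else a(0).length
  Array.tabulate(cols, cols) { (j, k) =>
    a.iterator.map(row => row(j) * row(k)).sum
  }
}

val a = Array(
  Array(1.0, 0.0, 1.0), // window 1: words 1 and 3 occur
  Array(0.0, 1.0, 1.0)  // window 2: words 2 and 3 occur
)
gram(a).foreach(r => println(r.mkString(" ")))
// 1.0 0.0 1.0
// 0.0 1.0 1.0
// 1.0 1.0 2.0
```

Because AᵀA is symmetric, the result reads the same row-wise and column-wise.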
import spark.implicits._ // needed for toDF and the $"..." column syntax
val rowMatrix: RowMatrix = transposeAndDotDenseMatrix(matrix) // helper not shown (see above)
// get DataFrame from RowMatrix via DenseMatrix
// collect() returns the rows row-major while DenseMatrix expects column-major;
// that should be harmless here only because matrix.T.dot(matrix) is symmetric
val newdense = new DenseMatrix(rowMatrix.numRows().toInt,rowMatrix.numCols().toInt,rowMatrix.rows.collect().flatMap(x => x.toArray)) // the call to collect() here is undesirable...
val matrixRows = newdense.rowIter.toSeq.map(_.toArray)
val df = spark.sparkContext.parallelize(matrixRows).toDF("Rows")
// then separate columns:
val df2 = (0 until words.length).foldLeft(df)((acc, num) =>
  acc.withColumn(words(num), $"Rows".getItem(num)))
  .drop("Rows")
Would love to hear improvements and suggestions on this, thanks.

java.util.Calendar.add() function in Scala

import java.util.Calendar
val now = Calendar.getInstance()
val toDt = now.get(Calendar.MONTH)
val fromDt = now.add(Calendar.MONTH, -6)
I am trying to fetch the date 6 months ago using the code above, but it looks like now.add returns Unit.
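val fromDt = now.clone().asInstanceOf[Calendar]
fromDt.add(Calendar.MONTH, -6) // `add` returns Unit but changes fromDt's internal state, so you can use `fromDt` directly afterwards, as in the `println` below
println(fromDt.get(Calendar.MONTH))
> 7
(The printed value depends on the current date; Calendar.MONTH is 0-based, so 7 means August.)
You can clone the calendar and add months to the clone, leaving `now` untouched.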
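If java.util.Calendar isn't a requirement, java.time (JDK 8+) sidesteps the Unit problem: LocalDate is immutable, and minusMonths returns a new date instead of mutating the receiver. A sketch; the fixed date is only there to make the output reproducible (you would use LocalDate.now() in practice).

```scala
import java.time.LocalDate

val now = LocalDate.of(2020, 8, 31)  // fixed for reproducibility; LocalDate.now() in real code
val fromDt = now.minusMonths(6)      // returns a new LocalDate; `now` is unchanged
println(fromDt)                      // prints 2020-02-29 (day clamped to the end of February)
println(fromDt.getMonthValue)        // prints 2 (1-based, unlike Calendar.MONTH)
```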

Type parameterized arithmetic?

Trying to think of a way to subtract 5 minutes from 2 hours.
It doesn't make sense to subtract 5 from 2, because we end up with -3 generic time units, which is useless. But if "hour" is a subtype of "minute", we could convert 2 hours to 120 minutes, and yield 115 minutes, or 1 hour and 55 minutes.
Similarly, if we want to add 5 apples to 5 oranges, we cannot evaluate this in terms of apples, but might expect to end up with 10 fruit.
It seems that in the above examples, and generally when using a number as an adjective, the integers need to be parameterized by the type of object they are describing. I think it would be very useful if instead of declaring
val hours = 2
val minutes = 5
you could do something like
val hours = 2[Hour]
val minutes = 5[Minute]
val result = hours - minutes
assert (result == 115[Minute])
Does anything like this exist, would it be useful, and is it something that could be implemented?
EDIT: to clarify, the time example above is just a random example I thought up. My question is more whether in general the idea of parameterized Numerics is a useful concept, just as you have parameterized Lists etc. (The answer might be "no", I don't know!)
You can accomplish this by having two classes for Hours and Minutes, along with an implicit conversion function from hours to minutes:
import scala.language.implicitConversions
trait TimeUnit
case class Hour(num: Int) extends TimeUnit
case class Minute(num: Int) extends TimeUnit {
  def - (sub: Minute) = Minute(num - sub.num)
}
implicit def hour2Minute(hour: Hour): Minute = Minute(hour.num * 60)
This allows you to do something like
val h = Hour(2) - Minute(30) // returns Minute(90)
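On the more general question of numbers parameterized by a unit, one common Scala pattern is a phantom type parameter plus a type class that knows how to convert each unit. A minimal sketch, where Units, Qty, MinutesPer, Hour and Minute are all hypothetical names (distinct from the case classes above), and only hours and minutes are supported:

```scala
object Units {
  // Phantom unit types: never instantiated, used only as type parameters
  sealed trait Hour
  sealed trait Minute

  final case class Qty[U](value: Int)

  // Type class: how many minutes one unit of U is worth
  trait MinutesPer[U] { def factor: Int }
  object MinutesPer {
    implicit val hour: MinutesPer[Hour] = new MinutesPer[Hour] { val factor = 60 }
    implicit val minute: MinutesPer[Minute] = new MinutesPer[Minute] { val factor = 1 }
  }

  implicit class QtyOps[A](a: Qty[A])(implicit fa: MinutesPer[A]) {
    // Subtracting any two supported time units yields a result in minutes
    def -[B](b: Qty[B])(implicit fb: MinutesPer[B]): Qty[Minute] =
      Qty[Minute](a.value * fa.factor - b.value * fb.factor)
  }
}
import Units._

val result = Qty[Hour](2) - Qty[Minute](5)
println(result) // prints Qty(115)
```

A Qty of some other unit (apples, say) would not compile in a subtraction, because there is no MinutesPer instance for it.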
You can find some examples for this in the lift framework (spec).
import net.liftweb.utils.TimeHelpers._
3.minutes == 6 * 30.seconds
(Note: it seems the numbers need to be in a reasonable range for the comparison to be correct, e.g. no more than 60 seconds.)
You might try scala-time, which is a wrapper around Joda Time and makes it a bit more idiomatic for Scala, including some DSL to do time period computations, similar to what Brian Agnew suggested in his answer.
For instance,
2.hours + 45.minutes + 10.seconds
creates a Joda Period.
It seems to me a DSL would be of use here. So you could write
2.hours - 5.minutes
and the appropriate conversions would take place to convert 2 hours into an Hours object (value 2), etc.
Lots of resources exist describing Scala's DSL capabilities, e.g. see this from O'Reilly.
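Scala's standard library already ships a small DSL of exactly this shape for durations, in scala.concurrent.duration. It isn't necessarily the DSL this answer had in mind, but it shows `2.hours - 5.minutes` working out of the box:

```scala
import scala.concurrent.duration._

// DurationInt enriches Int with .hours, .minutes, .seconds, ...
val result = 2.hours - 5.minutes
println(result.toMinutes)      // prints 115
println(result == 115.minutes) // prints true
```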