Calculate average in Scala

I'm fetching a session from the database; it contains a result, which in turn contains dimensions. Now I'm trying to calculate an average over the dimensions:
sessionService.findById(sessionId).map {
  case Some(session) =>
    val result = session.result.getOrElse(Seq.empty)
    for (dimension <- result.dimensions) {
      var test += dimension.average
    }
    Ok(Json.toJson(session)).as("application/json")
  case None => NotFound(Json.toJson("Not found"))
}
but I get this error:
UPDATE:
When trying:
var test = 0
for (dimension <- result.dimensions) {
  test += dimension.average
}
I get this error:

var test += dimension.average
is invalid syntax: you can't declare and increment a variable in the same statement... it just doesn't make sense.
You probably meant something like
var test = 0
for (dimension <- result.dimensions) {
  test += dimension.average
}
By the way, have you considered a different, more functional approach?
val test = result.dimensions.foldLeft(0.0)(_ + _.average)
About the update: the problem is with getOrElse(Seq.empty). Falling back from the result to a Seq leaves you with their common supertype, which has no dimensions member.
You can try something like
sessionService.findById(sessionId).map {
  // adjust the Session pattern to your actual case class fields
  case Some(Session(_, _, Some(result), _)) =>
    val test = result.dimensions.foldLeft(0.0)(_ + _.average)
    Ok(Json.toJson(test))
  case _ =>
    NotFound(Json.toJson("Not found"))
}
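Since the goal is an average, here is a minimal self-contained sketch. The Dimension/Result/Session shapes are assumptions, since the question doesn't show the real case classes:
// Hypothetical shapes -- the real case classes aren't shown in the question.
case class Dimension(average: Double)
case class Result(dimensions: Seq[Dimension])
case class Session(result: Option[Result])

// Average of all dimension averages, 0.0 when there are none.
def overallAverage(session: Session): Double = {
  val averages = session.result.toSeq.flatMap(_.dimensions).map(_.average)
  if (averages.isEmpty) 0.0 else averages.sum / averages.size
}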

Scala: using calculations from a pattern match's guard (if) in the body

I'm using pattern matching in Scala a lot. I often need to do some calculation in the guard, and sometimes it is pretty expensive. Is there any way to bind the calculated value to a separate val?
// I want to use the result of prettyExpensiveFunc in the body safely
people.collect {
  case ...
  case Some(Right((x, y))) if prettyExpensiveFunc(x, y) > 0 => prettyExpensiveFunc(x, y)
}
// ideally something like this could be helpful, but it doesn't compile:
people.collect {
  case ...
  case Some(Right((x, y))) if { val z = prettyExpensiveFunc(x, y); z > 0 } => z
}
// this solution works, but it isn't safe for some `Seq` types and is risky when more cases are used.
var cache: Int = 0
people.collect {
  case ...
  case Some(Right((x, y))) if { cache = prettyExpensiveFunc(x, y); cache > 0 } => cache
}
Is there any better solution?
PS: the example is simplified, and I don't expect answers showing that I don't need pattern matching here.
You can use cats.Eval to make expensive calculations lazy and memoized: create the Evals in .map, and extract .value (calculated at most once, and only if needed) in .collect:
import cats.Eval

values.map { value =>
  val expensiveCheck1 = Eval.later { prettyExpensiveFunc(value) }
  val expensiveCheck2 = Eval.later { anotherExpensiveFunc(value) }
  (value, expensiveCheck1, expensiveCheck2)
}.collect {
  case (value, lazyResult1, _) if lazyResult1.value > 0 => ...
  case (value, _, lazyResult2) if lazyResult2.value > 0 => ...
  case (value, lazyResult1, lazyResult2) if lazyResult1.value > lazyResult2.value => ...
  ...
}
I don't see a way of doing what you want without some implementation of lazy evaluation, and if you have to use one, you might as well use an existing one instead of rolling your own.
EDIT: just in case you haven't noticed, you aren't losing the ability to pattern match by using a tuple here:
values.map {
  // original value -> lazily evaluated, memoized expensive calculation
  case a @ Some(Right((x, y))) => a -> Some(Eval.later(prettyExpensiveFunc(x, y)))
  case a => a -> None
}.collect {
  // match on type and calculation
  ...
  case (Some(Right((x, y))), Some(lazyResult)) if lazyResult.value > 0 => ...
  ...
}
Why not run the function first for every element and then work with a tuple?
Seq(1, 2, 3, 4, 5).map(e => (e, prettyExpensiveFunc(e))).collect {
  case ...
  case (x, y) if y > 0 => y
}
I tried writing my own matchers, and the effect is somewhat OK, but not perfect: my matcher is untyped, and it is a bit ugly to make it fully typed.
class Matcher[T, E](f: PartialFunction[T, E]) {
  def unapply(z: T): Option[E] = if (f.isDefinedAt(z)) Some(f(z)) else None
}

def newMatcherAny[E](f: PartialFunction[Any, E]) = new Matcher(f)
def newMatcher[T, E](f: PartialFunction[T, E]) = new Matcher(f)

def prettyExpensiveFunc(x: Int) = { println(s"-- prettyExpensiveFunc($x)"); x % 2 + x * x }

val x = Seq(
  Some(Right(22)),
  Some(Right(10)),
  Some(Left("Oh now")),
  None
)

val PersonAgeRank = newMatcherAny { case Some(Right(x: Int)) => (x, prettyExpensiveFunc(x)) }

x.collect {
  case PersonAgeRank(age, rank) if rank > 100 => println("age:" + age + " rank:" + rank)
}
https://scalafiddle.io/sf/hFbcAqH/3
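For what it's worth, the fully typed variant via the newMatcher helper above is only slightly more verbose. A sketch, assuming x: Seq[Option[Either[String, Int]]] as in the snippet:
// Sketch: the same matcher, typed end-to-end with the newMatcher helper.
val PersonAgeRankTyped = newMatcher[Option[Either[String, Int]], (Int, Int)] {
  case Some(Right(x)) => (x, prettyExpensiveFunc(x))
}

x.collect {
  case PersonAgeRankTyped(age, rank) if rank > 100 => println("age:" + age + " rank:" + rank)
}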

How do you flatten a sequence of sequences in a DBIOAction in Slick?

Hey guys, I'm new to Slick. How can I flatten this sequence of sequences, so that the method returns the commented-out type?
def insertIfNotExists(mapCountryStates: Map[String, Iterable[StateUtil]]): Future[Seq[Seq[StateTable]]] /*: Future[Seq[StateTable]]*/ = {
  val interaction = DBIO.sequence(mapCountryStates.toSeq.map { case (alpha2Country, statesUtil) =>
    val codes = statesUtil.map(_.alpha3Code)
    for {
      countryId <- Countries.filter(_.alpha2Code === alpha2Country).map(_.id).result.head
      existing <- States.filter(s => (s.alpha3Code inSet codes) && s.countryId === countryId).result
      stateTables = statesUtil.map(x => StateTable(0L, x.name, x.alpha3Code, countryId))
      statesInserted <- StatesInsertQuery ++= stateTables.filter(s => !existing.exists(x => x.alpha3Code == s.alpha3Code && x.countryId == s.countryId))
    } yield existing ++ statesInserted
  })
  db.run(interaction.transactionally)
}
If I add .flatten here:
val interaction = DBIO.sequence(...).flatten
or here:
db.run(interaction.flatten.transactionally)
I get this error:
[error] Cannot prove that Seq[Seq[StateRepository.this.StateTableMapping#TableElementType]] <:< slick.dbio.DBIOAction[R2,S2,E2].
The IDE does not detect it as an error, but the build fails when the application compiles. UPDATE: I've updated my definition to use DBIO.fold:
It looks like you might be after DBIO.fold. This provides a way to take a number of actions and reduce them down to a single value. In this case, your single value is a Seq[StateTable] built from a Seq[Seq[StateTable]].
A sketch of how this could look might be...
def insertIfNotExists(...): DBIO[Seq[StateTable]] = {
  val interaction: Seq[DBIO[Seq[StateTable]]] = ...
  val startingPoint: Seq[StateTable] = Seq.empty
  DBIO.fold(interaction, startingPoint) {
    (total, list) => total ++ list
  }
}
It looks like the types will line up using fold. Hope it's of some use in your case.
There's some more information about fold in Chapter 4 of Essential Slick.
A viable solution is to flatten the sequence once the Future has completed:
def insertIfNotExists(mapCountryStates: Map[String, Iterable[StateUtil]]): Future[Seq[StateTable]] = {
  val interaction = DBIO.sequence(mapCountryStates.toSeq.map { case (alpha2Country, statesUtil) =>
    val codes = statesUtil.map(_.alpha3Code)
    for {
      countryId <- Countries.filter(_.alpha2Code === alpha2Country).map(_.id).result.head
      existing <- States.filter(s => (s.alpha3Code inSet codes) && s.countryId === countryId).result
      stateTables = statesUtil.map(x => StateTable(0L, x.name, x.alpha3Code, countryId))
      statesInserted <- StatesInsertQuery ++= stateTables.filter(s => !existing.exists(x => x.alpha3Code == s.alpha3Code && x.countryId == s.countryId))
    } yield existing ++ statesInserted
  })
  db.run(interaction.transactionally).map(_.flatten)
}
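Alternatively, you can flatten at the action level instead of on the Future, so db.run already returns Future[Seq[StateTable]]. A sketch, assuming an implicit ExecutionContext is in scope (DBIO.map requires one):
// Sketch: flatten inside the DBIO action rather than on the resulting Future.
val flatInteraction: DBIO[Seq[StateTable]] = interaction.map(_.flatten)
db.run(flatInteraction.transactionally)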

Use of Scala Loan pattern in Success Case

I'm following Alvin Alexander's tutorial on the Loan pattern.
Here is the code I use:
val year = 2016

val nationalData = {
  val source = io.Source.fromFile(s"resources/Babynames/names/yob$year.txt")
  // getLines() is an Iterator[String]; split(",") gives an Array
  // .toArray and .toSeq are slow compared to .toSet; .toSeq gives a "Stream Closed" error
  val names = source.getLines().filter(_.nonEmpty).map(_.split(",")(0)).toSet
  source.close()
  names
  // println(names.mkString(","))
}
println("Names " + nationalData)

val info = for (stateFile <- new java.io.File("resources/Babynames/namesbystate").list(); if stateFile.endsWith(".TXT")) yield {
  val source = io.Source.fromFile("resources/Babynames/namesbystate/" + stateFile)
  val names = source.getLines().filter(_.nonEmpty).map(_.split(",")).
    filter(a => a(2).toInt == year).map(a => a(3)).toArray // .toSet
  source.close()
  (stateFile.take(2), names)
}
println(info(0)._2.size + " names from state " + info(0)._1)
println(info(1)._2.size + " names from state " + info(1)._1)

for ((state, sname) <- info) {
  println("State: " + state + " Coverage of name in " + year + " " +
    sname.count(n => nationalData.contains(n)).toDouble / nationalData.size) // Set doesn't have a length method
}
This is how I applied readTextFile and readTextFileWithTry to the above code, to learn and experiment with the Loan pattern:
import scala.io.Source.fromFile
import scala.language.reflectiveCalls // for the structural type bound below
import scala.util.{Try, Success, Failure}

def using[A <: { def close(): Unit }, B](resource: A)(f: A => B): B =
  try {
    f(resource)
  } finally {
    resource.close()
  }

def readTextFile(filename: String): Option[List[String]] = {
  try {
    val lines = using(fromFile(filename)) { source =>
      (for (line <- source.getLines()) yield line).toList
    }
    Some(lines)
  } catch {
    case e: Exception => None
  }
}

def readTextFileWithTry(filename: String): Try[List[String]] = {
  Try {
    val lines = using(fromFile(filename)) { source =>
      (for (line <- source.getLines()) yield line).toList
    }
    lines
  }
}
val year = 2016

val data = readTextFile(s"resources/Babynames/names/yob$year.txt") match {
  case Some(lines) =>
    val n = lines.filter(_.nonEmpty).map(_.split(",")(0)).toSet
    println(n)
  case None => println("couldn't read file")
}

val data1 = readTextFileWithTry("resources/Babynames/namesbystate")
data1 match {
  case Success(lines) =>
    val info = for (stateFile <- lines; if stateFile.endsWith(".TXT")) yield {
      val source = fromFile("resources/Babynames/namesbystate/" + stateFile)
      val names = source.getLines().filter(_.nonEmpty).map(_.split(",")).
        filter(a => a(2).toInt == year).map(a => a(3)).toArray // .toSet
      println(names)
      (stateFile.take(2), names)
    }
  case Failure(e) => println("Failed, message is: " + e)
}
But in the second case, readTextFileWithTry, I get the following error:
Failed, message is: java.io.FileNotFoundException: resources\Babynames\namesbystate (Access is denied)
From what I understand from SO, I guess the reason for the failure is that I am trying to open the same file on each iteration of the for loop.
Apart from that, I have a few concerns about my usage:
Is this a good way to use the pattern? Can someone show me how to use Try in multiple places?
I tried changing the return type of readTextFileWithTry to Option[A] or a Scala collection (Set/Map), so I could apply higher-order functions to it later, but I couldn't make it work. I'm not sure whether that is good practice.
How can I use higher-order functions in the Success case? There are multiple operations there, so the code block keeps growing, and I can't use any value bound inside the Success case from outside it.
Can someone help me to understand?
I think your problem has nothing to do with "I am trying to open the same file on each iteration of the for loop"; it is actually the same as in the accepted answer.
Unfortunately, you didn't provide a stack trace, so it is not clear on which line this happens. I would guess that the failing call is
val data1 = readTextFileWithTry("resources/Babynames/namesbystate")
And looking at your first code sample:
val info = for (stateFile <- new java.io.File("resources/Babynames/namesbystate").list(); if stateFile.endsWith(".TXT")) yield {
it looks like the path "resources/Babynames/namesbystate" points to a directory. But in your second example you are trying to read it as a file, and that is the reason for the error. It comes from the fact that your readTextFileWithTry is not a valid substitute for the java.io.File.list call. And File.list doesn't need a wrapper, because it doesn't use any intermediate closeable/disposable resource.
P.S. It might make more sense to use File.list(FilenameFilter filter) instead of the if stateFile.endsWith(".TXT") guard.
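Putting those two points together, a sketch that reuses readTextFileWithTry from above (the FilenameFilter replaces the endsWith guard):
import java.io.{File, FilenameFilter}
import scala.util.Try

// List the directory with File.list (no loan needed: nothing to close),
// then loan each individual file to readTextFileWithTry.
val txtOnly = new FilenameFilter {
  def accept(dir: File, name: String): Boolean = name.endsWith(".TXT")
}

val perState: Seq[(String, Try[List[String]])] =
  new File("resources/Babynames/namesbystate").list(txtOnly).toSeq.map { stateFile =>
    stateFile.take(2) -> readTextFileWithTry("resources/Babynames/namesbystate/" + stateFile)
  }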

Apache Spark: dealing with Option/Some/None in RDDs

I'm mapping over an HBase table, generating one RDD element per HBase row. However, sometimes the row has bad data (throwing a NullPointerException in the parsing code), in which case I just want to skip it.
I have my initial mapper return an Option to indicate that it returns 0 or 1 elements, then filter for Some, then get the contained value:
// myRDD is RDD[(ImmutableBytesWritable, Result)]
val output = myRDD.
  map( tuple => getData(tuple._2) ).
  filter( { case Some(y) => true; case None => false } ).
  map( _.get )
  // ... more RDD operations with the good data

def getData(r: Result) = {
  val key = r.getRow
  var id = "(unk)"
  var x = -1L
  try {
    id = Bytes.toString(key, 0, 11)
    x = Long.MaxValue - Bytes.toLong(key, 11)
    // ... more code that might throw exceptions
    Some( ( id, ( List(x),
      // more stuff ...
    ) ) )
  } catch {
    case e: NullPointerException =>
      logWarning("Skipping id=" + id + ", x=" + x + "; \n" + e)
      None
  }
}
Is there a more idiomatic way to do this that's shorter? I feel like this looks pretty messy, both in getData() and in the map/filter/map dance I'm doing.
Perhaps a flatMap could work (generate 0 or 1 items in a Seq), but I don't want it to flatten the tuples I'm creating in the map function, just eliminate empties.
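(Note: flatMap over an Option does exactly this. An Option is treated as a collection of zero or one elements, so only the Option layer is flattened and the tuples stay intact. A minimal sketch against the code above:)
// None rows disappear; Some rows are unwrapped, tuples left intact.
val output = myRDD.flatMap(tuple => getData(tuple._2))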
An alternative, and often overlooked, way would be using collect(pf: PartialFunction), which 'selects' or 'collects' the elements of the RDD for which the partial function is defined.
The code would look like this:
val output = myRDD.map(tuple => getData(tuple._2)).collect { case Success(t) => t }

def getData(r: Result): Try[(String, List[Long])] = Try {
  val key = r.getRow
  val id = Bytes.toString(key, 0, 11)
  val x = Long.MaxValue - Bytes.toLong(key, 11)
  (id, List(x))
}
If you change your getData to return a scala.util.Try then you can simplify your transformations considerably. Something like this could work:
def getData(r: Result) = {
  val key = r.getRow
  var id = "(unk)"
  var x = -1L
  val tr = util.Try {
    id = Bytes.toString(key, 0, 11)
    x = Long.MaxValue - Bytes.toLong(key, 11)
    // ... more code that might throw exceptions
    ( id, ( List(x)
      // more stuff ...
    ) )
  }
  tr.failed.foreach(e => logWarning("Skipping id=" + id + ", x=" + x + "; \n" + e))
  tr
}
Then your transform could start like so:
myRDD.
  flatMap(tuple => getData(tuple._2).toOption)
If your Try is a Failure, it will be turned into a None via toOption and then removed as part of the flatMap logic. From that point on, the transform only works with the successful cases: whatever underlying type getData returns, without the wrapping (i.e. no Option).
If you are ok with dropping the data then you can just use mapPartitions. Here is a sample:
import scala.util._

val mixedData = sc.parallelize(List(1, 2, 3, 4, 0))
mixedData.mapPartitions { x =>
  val foo = for (y <- x) yield {
    Try(1 / y)
  }
  for { goodVals <- foo.partition(_.isSuccess)._1 }
    yield goodVals.get
}
If you want to see the bad values, then you can use an accumulator or just log as you have been.
Your code would look something like this:
val output = myRDD.
  mapPartitions( tupleIter => getCleanData(tupleIter) )
  // ... more RDD operations with the good data

def getCleanData(iter: Iterator[???]) = {
  val triedData = getDataInTry(iter)
  for { goodVals <- triedData.partition(_.isSuccess)._1 }
    yield goodVals.get
}

def getDataInTry(iter: Iterator[???]) = {
  for (r <- iter) yield {
    Try {
      val key = r._2.getRow
      var id = "(unk)"
      var x = -1L
      id = Bytes.toString(key, 0, 11)
      x = Long.MaxValue - Bytes.toLong(key, 11)
      // ... more code that might throw exceptions
      (id, List(x))
    }
  }
}
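Following up on the accumulator suggestion above, a sketch on the same toy data (assuming Spark 2.x's sc.longAccumulator; older versions used sc.accumulator):
import scala.util.{Failure, Success, Try}

// Count bad rows with an accumulator while dropping them from the RDD.
val badRows = sc.longAccumulator("badRows")
val cleaned = mixedData.flatMap { y =>
  Try(1 / y) match {
    case Success(v) => Some(v)
    case Failure(_) => badRows.add(1); None
  }
}
cleaned.count() // accumulators are only populated once an action runs
println(s"bad rows: ${badRows.value}")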

How can I speed up flatten?

I have this method:
val reportsWithCalculatedUsage = time("Calculate USAGE") {
  reportsHavingCalculatedCounter.flatten.flatten.toList
    .groupBy(_._2.product)
    .mapValues(_.map(_._2))
    .mapValues { list =>
      list.foldLeft(List[ReportDataHelper]()) {
        case (Nil, head) =>
          List(head)
        case (tail, head) =>
          val previous = tail.head
          val current = head.copy(
            usage = if (head.machine == previous.machine) head.counter - previous.counter else head.usage)
          current :: tail
      }.reverse
    }
}
Where reportsHavingCalculatedCounter is of type:
scala.collection.immutable.Iterable[scala.collection.immutable.IndexedSeq[scala.collection.immutable.Map[String, com.agilexs.machinexs.logic.ReportDataHelper]]]
This code works correctly. The problem is that the maps inside reportsHavingCalculatedCounter hold about 50,000 ReportDataHelper values in total, and the flatten.flatten takes about 15s.
I've also tried two flatMaps, but that takes almost the same time. Is there any way to improve this? (Please ignore the foldLeft and reverse; the issue is still present if I remove them. The two flattens are the most time-consuming part.)
UPDATE: I've tried a different scenario:
import scala.collection.mutable.ArrayBuffer

val reportsHavingCalculatedCounter2: Seq[ReportDataHelper] = time("Counter2") {
  val builder = new ArrayBuffer[ReportDataHelper](50000)
  var c = 0
  reportsHavingCalculatedCounter.foreach { v =>
    v.foreach { v =>
      v.values.foreach { v =>
        c += 1
        builder += v
      }
    }
  }
  println("Count: " + c)
  builder.result
}
And it takes: Counter2 (15.075s).
I can't believe Scala is this slow. The slowest part is v.values.foreach.
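For what it's worth, a lazy single-pass flatten (a sketch, assuming the nested Iterable[IndexedSeq[Map[...]]] shape above) avoids allocating the two intermediate collections that flatten.flatten builds; whether it helps depends on where the time actually goes:
// Sketch: flatten the nesting in one lazy pass over iterators,
// materializing only the final List.
val flattened: List[ReportDataHelper] =
  reportsHavingCalculatedCounter.iterator
    .flatMap(_.iterator)
    .flatMap(_.valuesIterator)
    .toList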