How can I speed up flatten? - scala

I have this method:
val reportsWithCalculatedUsage = time("Calculate USAGE") {
  reportsHavingCalculatedCounter.flatten.flatten.toList
    .groupBy(_._2.product)
    .mapValues(_.map(_._2)) mapValues { list =>
      list.foldLeft(List[ReportDataHelper]()) {
        case (Nil, head) =>
          List(head)
        case (tail, head) =>
          val previous = tail.head
          val current = head copy (
            usage = if (head.machine == previous.machine) head.counter - previous.counter else head.usage)
          current :: tail
      } reverse
    }
}
Where reportsHavingCalculatedCounter is of type:
val reportsHavingCalculatedCounter: scala.collection.immutable.Iterable[scala.collection.immutable.IndexedSeq[scala.collection.immutable.Map[String, com.agilexs.machinexs.logic.ReportDataHelper]]]
This code works perfectly. The problem is that the maps inside reportsHavingCalculatedCounter hold about 50,000 ReportDataHelper values in total, and the flatten.flatten alone takes about 15s to process.
I've also tried two flatMaps instead, but that is almost as time-consuming. Is there any way to improve this? (Please ignore the foldLeft and reverse; the issue remains if I remove them. The two flattens are by far the most expensive part.)
UPDATE: I've tried a different scenario:
import scala.collection.mutable.ArrayBuffer

val reportsHavingCalculatedCounter2: Seq[ReportDataHelper] = time("Counter2") {
  val builder = new ArrayBuffer[ReportDataHelper](50000)
  var c = 0
  reportsHavingCalculatedCounter.foreach { seq =>
    seq.foreach { map =>
      map.values.foreach { helper =>
        c += 1
        builder += helper
      }
    }
  }
  println("Count: " + c)
  builder.result
}
And it takes: Counter2 (15.075s).
I can't believe that Scala is this slow. The slowest part is the innermost map.values.foreach.
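One direction worth trying (a sketch only, not verified against this data set): chain iterators instead of calling flatten twice, so no intermediate Iterable or List is materialized before the groupBy:
val allPairs: List[(String, ReportDataHelper)] =
  reportsHavingCalculatedCounter.iterator
    .flatMap(_.iterator)   // each IndexedSeq of maps
    .flatMap(_.iterator)   // each Map's (String, ReportDataHelper) entries
    .toList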

Related

Scala count number of times function returns each value, functionally

I want to count up the number of times that a function f returns each value in its range (0 to f_max, inclusive) when applied to a given list l, and return the result as an array, in Scala.
Currently, I accomplish this as follows:
def count[A](l: List[A]): Array[Int] = {
  val arr = new Array[Int](f_max + 1)
  l.foreach { el =>
    arr(f(el)) += 1
  }
  return arr
}
So arr(n) is the number of times that f returns n when applied to each element of l. This works; however, it is imperative in style, and I am wondering if there is a clean way to do this purely functionally.
Thank you
how about a more general approach:
def count[InType, ResultType](l: Seq[InType], f: InType => ResultType): Map[ResultType, Int] = {
  l.view                         // create a view so we don't create new collections after each step
    .map(f)                      // apply your function to every item in the original sequence
    .groupBy(x => x)             // group the returned values
    .map(x => x._1 -> x._2.size) // count returned values
}

val f = (i: Int) => i
count(Seq(1, 2, 3, 4, 5, 6, 6, 6, 4, 2), f)
A purely functional version of the same fixed-size approach folds over an immutable Vector:
l.foldLeft(Vector.fill(f_max + 1)(0)) { (acc, el) =>
  val result = f(el)
  acc.updated(result, acc(result) + 1)
}
Alternatively, a good balance of performance and external purity would be:
def count[A](l: List[A]): Vector[Int] = {
  val arr = l.foldLeft(Array.fill(f_max + 1)(0)) { (acc, el) =>
    acc(f(el)) += 1
    acc // return the accumulator; the bare update alone returns Unit and would not typecheck
  }
  arr.toVector
}
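For illustration, with a hypothetical f and f_max (not from the question), either version counts occurrences like this:
val f: Int => Int = _ % 3     // hypothetical function with range 0 to 2
val f_max = 2
count(List(1, 2, 3, 4, 5, 6)) // Vector(2, 2, 2): each residue occurs twice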

Calculate average in scala

I'm getting a session from the database; the session contains a result, which in turn contains dimensions. Now I'm trying to calculate an average over the dimensions:
sessionService.findById(sessionId).map {
  case Some(session) =>
    val result = session.result.getOrElse(Seq.empty)
    for (dimension <- result.dimensions) {
      var test += dimension.average
    }
    Ok(Json.toJson(session)).as("application/json")
  case None => NotFound(Json.toJson("Not found"))
}
but I get this error:
UPDATE:
When trying
var test = 0
for (dimension <- result.dimensions) {
  test += dimension.average
}
I get this error:
var test += dimension.average
is invalid syntax: you can't simultaneously declare and increment a variable; it just doesn't make sense.
You probably meant something like
var test = 0
for (dimension <- result.dimensions) {
  test += dimension.average
}
By the way, have you considered a different, more functional approach?
val test = result.dimensions.foldLeft(0.0)(_ + _.average) // foldLeft, since reduce would need a (Dimension, Dimension) => Dimension function
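And if what you actually need is the average rather than the sum, a minimal sketch (assuming average is a Double):
val dims = result.dimensions
val avg = if (dims.isEmpty) 0.0 else dims.map(_.average).sum / dims.size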
About the update: the problem is with getOrElse(Seq.empty). Falling back to a Seq forces the compiler to infer a common supertype of Result and Seq, which no longer has a dimensions member.
You can try something like
sessionService.findById(sessionId).map {
  case Some(Session(_, _, Some(result), _)) =>
    result.dimensions.foldLeft(0.0) { case (sum, d) => sum + d.average }
  case _ => // covers both a missing session and a session without a result
    NotFound(Json.toJson("Not found"))
}

Scala Stream/Iterator that Generates Excel Column Names?

I would like a Scala Stream/Iterator that generates Excel column names.
e.g. the first would be 'A', the second would be 'B', and onwards to 'AA' and beyond.
I have a function (shown below) that does it from an index, but it seems wasteful to generate from an index each time when all I'll ever be doing is generating the names in order. In practice this isn't a problem, so I'm fine using this method, but I thought I would ask in case anyone has something nicer.
val charArray = ('A' to 'Z').toArray
def indexToExcelColumnName(i: Int): String = {
  if (i < 0) {
    ""
  } else {
    indexToExcelColumnName((i / 26) - 1) + charArray(i % 26)
  }
}
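For reference, the index-based version behaves like this:
indexToExcelColumnName(0)   // "A"
indexToExcelColumnName(25)  // "Z"
indexToExcelColumnName(26)  // "AA"
indexToExcelColumnName(701) // "ZZ"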
Something like this?
class ExcelColumnIterator extends Iterator[String] {
  private var currentColumnName = "A"

  private def nextColumn(str: String): String = str.last match {
    case 'Z' if str.length == 1 => "AA"
    case 'Z' => nextColumn(str.init) + 'A'
    case c => str.init + (c + 1).toChar
  }

  override def hasNext = true
  override def next() = {
    val t = currentColumnName
    currentColumnName = nextColumn(currentColumnName)
    t
  }
}
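A quick usage sketch:
new ExcelColumnIterator().take(28).toList
// List(A, B, ..., Z, AA, AB)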
First I'd write something generating names of a fixed size.
val namesOfLength: Int => Iterator[String] = {
  case 1 => ('A' to 'Z').iterator.map(_.toString)
  case n => ('A' to 'Z').iterator.flatMap(a => namesOfLength(n - 1).map(a + _))
}
or
def namesOfLength(n: Int) =
  (1 until n).foldLeft[Iterable[String]](('A' to 'Z').view.map(_.toString)) {
    case (it, _) => ('A' to 'Z').view.flatMap(a => it.map(a + _))
  }
Then chain them together.
Iterator.iterate(1)(_ + 1).flatMap(namesOfLength).take(100).toStream.force
Here's a one-liner solution:
Stream.iterate(List(""))(_.flatMap(s => ('A' to 'Z').map(s + _))).flatten.tail
If you'd prefer to get an Iterator out, substitute Iterator.iterate for Stream.iterate and drop(1) for tail.
And here's an alternate solution you might find amusing:
Stream.from(0)
.map(n => Integer.toString(n, 36))
.map(_.toUpperCase)
.filterNot(_.exists(_.isDigit))
😜

workaround for prepending to a LinkedHashMap in Scala?

I have a LinkedHashMap which I've been using in a typical way: adding new key-value
pairs to the end, and accessing them in order of insertion. However, now I have a
special case where I need to add pairs to the "head" of the map. I think there's
some functionality inside the LinkedHashMap source for doing this, but it has private
accessibility.
I have a solution where I create a new map, add the pair, then add all the old mappings.
In Java syntax:
newMap.put(newKey, newValue)
newMap.putAll(this.map)
this.map = newMap
It works. But the problem here is that I then need to make my main data structure
(this.map) a var rather than a val.
Can anyone think of a nicer solution? Note that I definitely need the fast lookup
functionality provided by a Map collection. The performance of prepending is not
such a big deal.
More generally, as a Scala developer how hard would you fight to avoid a var
in a case like this, assuming there's no foreseeable need for concurrency?
Would you create your own version of LinkedHashMap? Looks like a hassle frankly.
This will work but is not especially nice either:
import scala.collection.mutable.LinkedHashMap
def prepend[K, V](map: LinkedHashMap[K, V], kv: (K, V)) = {
  val copy = map.toList // toList keeps insertion order; an immutable Map would not for larger maps
  map.clear
  map += kv
  map ++= copy
}

val map = LinkedHashMap('b -> 2)
prepend(map, 'a -> 1)
map == LinkedHashMap('a -> 1, 'b -> 2)
Have you taken a look at the code of LinkedHashMap? The class has a field firstEntry, and a quick peek at updateLinkedEntries suggests it should be relatively easy to create a subclass of LinkedHashMap that adds just a prepend method and an updateLinkedEntriesPrepend, giving the behavior you need, e.g. (not tested):
private def updateLinkedEntriesPrepend(e: Entry) {
  if (firstEntry == null) { firstEntry = e; lastEntry = e }
  else {
    val oldFirstEntry = firstEntry
    firstEntry = e
    firstEntry.later = oldFirstEntry
    oldFirstEntry.earlier = e
  }
}
Here is a sample implementation I threw together real quick (that is, not thoroughly tested!):
class MyLinkedHashMap[A, B] extends LinkedHashMap[A, B] {

  def prepend(key: A, value: B): Option[B] = {
    val e = findEntry(key)
    if (e == null) {
      val e = new Entry(key, value)
      addEntry(e)
      updateLinkedEntriesPrepend(e)
      None
    } else {
      // The key already exists, so we might as well call LinkedHashMap#put
      put(key, value)
    }
  }

  private def updateLinkedEntriesPrepend(e: Entry) {
    if (firstEntry == null) { firstEntry = e; lastEntry = e }
    else {
      val oldFirstEntry = firstEntry
      firstEntry = e
      firstEntry.later = oldFirstEntry
      oldFirstEntry.earlier = firstEntry
    }
  }
}
Tested like this:
object Main {
  def main(args: Array[String]) {
    val x = new MyLinkedHashMap[String, Int]()
    x.prepend("foo", 5)
    x.prepend("bar", 6)
    x.prepend("olol", 12)
    x.foreach(x => println("x:" + x._1 + " y: " + x._2))
  }
}
Which, on Scala 2.9.0 (yeah, need to update) results in
x:olol y: 12
x:bar y: 6
x:foo y: 5
A quick benchmark shows orders of magnitude of performance difference between the extended built-in class and the "map rewrite" approach (I used the code from Debilski's answer as "ExternalMethod" and mine as "BuiltIn"):
     benchmark length        us linear runtime
ExternalMethod     10   1218.44 =
ExternalMethod    100   1250.28 =
ExternalMethod   1000  19453.59 =
ExternalMethod  10000 349297.25 ==============================
       BuiltIn     10      3.10 =
       BuiltIn    100      2.48 =
       BuiltIn   1000      2.38 =
       BuiltIn  10000      3.28 =
The benchmark code:
def timeExternalMethod(reps: Int) = {
  var r = reps
  while (r > 0) {
    for (i <- 1 to 100) prepend(map, (i, i))
    r -= 1
  }
}

def timeBuiltIn(reps: Int) = {
  var r = reps
  while (r > 0) {
    for (i <- 1 to 100) map.prepend(i, i)
    r -= 1
  }
}
Using a Scala benchmarking template.

Tune Nested Loop in Scala

I was wondering if I can tune the following Scala code:
def removeDuplicates(listOfTuple: List[(Class1, Class2)]): List[(Class1, Class2)] = {
  var listNoDuplicates: List[(Class1, Class2)] = Nil
  for (outerIndex <- 0 until listOfTuple.size) {
    if (outerIndex != listOfTuple.size - 1)
      for (innerIndex <- outerIndex + 1 until listOfTuple.size) {
        if (listOfTuple(outerIndex)._1.flag.equals(listOfTuple(innerIndex)._1.flag))
          listNoDuplicates = listOfTuple(outerIndex) :: listNoDuplicates
      }
  }
  listNoDuplicates
}
Usually if you have something looking like:
var accumulator: A = new A
for (b <- collection) {
  accumulator = update(accumulator, b)
}
val result = accumulator
it can be converted into something like:
val result = collection.foldLeft(new A) { (acc, b) => update(acc, b) }
So here we can first use a map to force the uniqueness of flags. Supposing the flag has a type F:
val result = listOfTuples.foldLeft(Map[F, (ClassA, ClassB)]()) {
  (map, tuple) => map + (tuple._1.flag -> tuple)
}
Then the remaining tuples can be extracted from the map and converted to a list:
val uniqList = result.values.toList
This will keep the last tuple encountered; if you want to keep the first one, replace foldLeft by foldRight and swap the arguments of the lambda.
Example:
case class ClassA(flag: Int)
case class ClassB(value: Int)

val listOfTuples =
  List((ClassA(1), ClassB(2)), (ClassA(3), ClassB(4)), (ClassA(1), ClassB(-1)))

val result = listOfTuples.foldRight(Map[Int, (ClassA, ClassB)]()) {
  (tuple, map) => map + (tuple._1.flag -> tuple)
}
val uniqList = result.values.toList
// uniqList: List((ClassA(1),ClassB(2)), (ClassA(3),ClassB(4)))
Edit: If you need to retain the order of the initial list, use instead:
val uniqList = listOfTuples.filter(result.values.toSet)
This compiles, but as I can't test it, it's hard to say if it does "The Right Thing" (tm):
def removeDuplicates(listOfTuple: List[(Class1, Class2)]): List[(Class1, Class2)] =
  (for {
    outerIndex <- 0 until listOfTuple.size
    if outerIndex != listOfTuple.size - 1
    innerIndex <- outerIndex + 1 until listOfTuple.size
    if listOfTuple(outerIndex)._1.flag == listOfTuple(innerIndex)._1.flag
  } yield listOfTuple(outerIndex)).reverse.toList
Note that you can use == instead of equals (use eq if you need reference equality).
BTW: https://codereview.stackexchange.com/ is better suited for this type of question.
Do not use indexes with lists (like listOfTuple(i)); indexed access on a List has very lousy performance. So, some alternatives...
The easiest:
import scala.collection.immutable.SortedSet
def removeDuplicates(listOfTuple: List[(Class1, Class2)]): List[(Class1, Class2)] =
  SortedSet(listOfTuple: _*)(Ordering by (_._1.flag)).toList
This will preserve the last element of the list. If you want it to preserve the first element, pass listOfTuple.reverse instead. Because of the sorting, performance is, at best, O(n log n). So, here's a faster way, using a mutable HashSet:
def removeDuplicates(listOfTuple: List[(Class1, Class2)]): List[(Class1, Class2)] = {
  // Produce a hash set to detect the duplicates
  import scala.collection.mutable.HashSet
  val seen = HashSet[Flag]()
  // now fold
  listOfTuple.foldLeft(Nil: List[(Class1, Class2)]) {
    case (acc, el) =>
      val result = if (seen(el._1.flag)) acc else el :: acc
      seen += el._1.flag
      result
  }.reverse
}
One can avoid using a mutable HashSet in two ways:
- Make seen a var, so that it can be updated.
- Pass the set along with the list being created in the fold (sketched below). The case then becomes:
case ((seen, acc), el) =>
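A minimal sketch of that second, fully immutable variant (still assuming a Flag type for the flags, as above):
def removeDuplicates(listOfTuple: List[(Class1, Class2)]): List[(Class1, Class2)] =
  listOfTuple.foldLeft((Set.empty[Flag], List.empty[(Class1, Class2)])) {
    case ((seen, acc), el) =>
      if (seen(el._1.flag)) (seen, acc)   // flag already seen: drop the element
      else (seen + el._1.flag, el :: acc) // record the flag, keep the element
  }._2.reverse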