Scala most efficient operator to add to a list

I'm building a Scala list by adding elements to it incrementally. What is the most efficient "add" operator, in terms of CPU and resource consumption?
For example, from List(1,2,3) we want to create a list of tuples of consecutive numbers, giving the result List((1,2), (2,3)).
Method 1 - using :+ operator
def createConsecutiveNumPair1[T](inList: List[T]) : List[(T, T)] = {
  var listResult = List[(T, T)]()
  var prev = inList(0)
  for (curr <- inList.tail)
  {
    listResult = listResult :+ (prev, curr)
    prev = curr
  }
  listResult
}
Method 2 - using ::= operator
def createConsecutiveNumPair2[T](inList: List[T]) : List[(T, T)] = {
  var listResult = List[(T, T)]()
  var prev = inList(0)
  for (curr <- inList.tail)
  {
    listResult ::= (prev, curr)
    prev = curr
  }
  listResult
}
TEST
scala> val l1 = List(1,2,3)
l1: List[Int] = List(1, 2, 3)
scala> createConsecutiveNumPair1(l1)
res77: List[(Int, Int)] = List((1,2), (2,3))
scala> createConsecutiveNumPair2(l1)
res78: List[(Int, Int)] = List((2,3), (1,2))
QUESTION: which operator has the lowest CPU and resource consumption? I would also appreciate suggestions for a better Scala way to rewrite the methods above.

The problem with your first version is that it appends, which is O(n) on List, so the algorithm as a whole is O(n^2).
It is very efficient to prepend to a List, because that runs in constant time, O(1), which is what your second version does. You could prepend and then do a single reverse at the end to make the results of the two methods equal, which keeps the whole thing roughly O(n).
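A minimal sketch of that prepend-then-reverse rewrite, assuming (as the original methods do) that the input list is non-empty:
def createConsecutiveNumPair3[T](inList: List[T]): List[(T, T)] = {
  var listResult = List[(T, T)]()
  var prev = inList.head
  for (curr <- inList.tail)
  {
    listResult = (prev, curr) :: listResult // O(1) prepend
    prev = curr
  }
  listResult.reverse // one O(n) pass to restore the original order
}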
However, there is already a nice method in the library that does what you want: sliding. The parameter of sliding defines the size of each window.
This would give you a List[List[Int]]:
List(1,2,3,4).sliding(2).toList //List(List(1,2), List(2,3), List(3,4))
If you insist on tuples, you can additionally use collect or map. Be aware that map will throw an exception when the list only has one element.
List(1,2,3,4).sliding(2).collect{
  case List(a,b) => (a,b)
}.toList //List((1,2), (2,3), (3,4))
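For completeness, the map variant mentioned above would look like this; it is fine for lists with at least two elements, but a single-element list makes the pattern fail with a MatchError:
List(1,2,3,4).sliding(2).map{
  case List(a,b) => (a,b)
}.toList //List((1,2), (2,3), (3,4))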

These methods often just call each other (though check the implementation to be sure). List is a singly-linked list, optimized for accessing the head rather than the tail, so adding elements to the front rather than the end is much more efficient. If you want to access / add elements at the end of the list, it's better to use Vector instead.
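As a rough sketch of that last point, the same loop could accumulate into a Vector, where :+ (append) is effectively constant time, so no reverse step is needed (again assuming a non-empty input):
def createConsecutiveNumPairVec[T](inList: List[T]): Vector[(T, T)] = {
  var result = Vector.empty[(T, T)]
  var prev = inList.head
  for (curr <- inList.tail)
  {
    result = result :+ ((prev, curr)) // effectively O(1) append on Vector
    prev = curr
  }
  result
}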
(As always, if you're asking the question at all you should have automated tooling to be able to tell you the answer. If you're not using a profiler that tells you which parts of your app are slow, it's not worth spending time on this kind of microoptimization - you're almost certainly optimizing the wrong part).

Related

What's the diff between reduceLeft and reduceRight in Scala?

val list = List(1, 0, 0, 1, 1, 1)
val sum1 = list reduceLeft {_ + _}
val sum2 = list reduceRight {_ + _}
println { sum1 == sum2 }
In my snippet sum1 = sum2 = 4, so the order does not matter here.
When do they produce the same result
As Lionel already pointed out, reduceLeft and reduceRight only produce the same result if the function you are using to combine the elements is associative (this isn't always true, see my note at the bottom). For instance when running reduceLeft and reduceRight on Seq(1,2,3) with the function (a: Int, b: Int) => a - b you get a different result.
scala> Seq(1,2,3)
res0: Seq[Int] = List(1, 2, 3)
scala> res0.reduceLeft(_ - _)
res5: Int = -4
scala> res0.reduceRight(_ - _)
res6: Int = 2
Why this happens can be made clear if we look at how each of the functions is applied over the list.
For reduceRight this is what the calls look like if we were to unwrap them.
(1 - (2 - 3))
(1 - (-1))
2
For reduceLeft the sequence is built up starting from the left,
((1 - 2) - 3)
((-1) - 3)
(-4)
Tail Recursion
Further, because reduceLeft is implemented using tail recursion, it will not stack overflow when operating on very large collections. reduceRight is not tail recursive, so given a collection of large enough size, it will produce a stack overflow.
For instance, on my machine if I run the following I get an Out of Memory error,
scala> (0 to 100000000).reduceRight(_ - _)
java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.lang.Integer.valueOf(Integer.java:832)
at scala.runtime.BoxesRunTime.boxToInteger(BoxesRunTime.java:65)
at scala.collection.immutable.Range.apply(Range.scala:61)
at scala.collection.IndexedSeqLike$Elements.next(IndexedSeqLike.scala:65)
at scala.collection.Iterator$class.foreach(Iterator.scala:742)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
at scala.collection.TraversableOnce$class.reversed(TraversableOnce.scala:99)
at scala.collection.AbstractIterator.reversed(Iterator.scala:1194)
at scala.collection.TraversableOnce$class.reduceRight(TraversableOnce.scala:197)
at scala.collection.AbstractIterator.reduceRight(Iterator.scala:1194)
at scala.collection.IterableLike$class.reduceRight(IterableLike.scala:85)
at scala.collection.AbstractIterable.reduceRight(Iterable.scala:54)
... 20 elided
But if I compute with reduceLeft I don't get the OOM,
scala> (0 to 100000000).reduceLeft(_ - _)
res16: Int = -987459712
You might get slightly different results on your system, depending on your JVM default memory settings.
Prefer left versions
So, because of tail recursion, if you know that reduceLeft and reduceRight will produce the same value, you should prefer the reduceLeft variant. This generally holds true of the other left/right functions, such as foldRight and foldLeft (which are just more general versions of reduceRight and reduceLeft).
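To make the left/right distinction concrete for the fold variants (a small illustration, not from the original answers), the start value and combining direction are explicit:
val xs = List(1, 2, 3)
xs.foldLeft(0)(_ - _)  // ((0 - 1) - 2) - 3 == -6
xs.foldRight(0)(_ - _) // 1 - (2 - (3 - 0)) == 2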
When do they really always produce the same result
A small note about reduceLeft and reduceRight and the Associative Property of the function you are using. I said that reduceRight and reduceLeft only produce the same results if the operator is associative. This isn't always true for all collection types. That is somewhat of another topic though, so consult the ScalaDoc, but in short the function you are reducing with needs to be both commutative and associative in order to get the same results for all collection types.
reduceLeft doesn't always produce the same result as reduceRight. Consider a non-associative operation, such as subtraction, on your collection.
Assuming the same result, performance is one obvious difference
See the Scala collections performance characteristics documentation.
The List data structure is built with constant-time access to the head and tail. Iterating backwards will perform worse for large lists.
The best way to understand the difference is to read the source code in library/scala/collection/LinearSeqOptimized.scala:
def reduceLeft[B >: A](f: (B, A) => B): B =
  ......
  tail.foldLeft[B](head)(f)

def reduceRight[B >: A](op: (A, B) => B): B =
  ......
  op(head, tail.reduceRight(op))

def foldLeft[B](z: B)(f: (B, A) => B): B = {
  var acc = z
  var these = this
  while (!these.isEmpty) {
    acc = f(acc, these.head)
    these = these.tail
  }
  acc
}
The above shows the key parts of the code: you can see that reduceLeft is based on foldLeft, while reduceRight is implemented via (non-tail) recursion.
I would expect reduceLeft to have better performance.
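If right-to-left semantics are really needed on a long List, one common workaround (a sketch, not taken from the answers above; reduceRightViaLeft is a hypothetical helper) is to reverse first and then fold left with the argument order flipped, which stays tail recursive:
def reduceRightViaLeft[A](xs: List[A])(op: (A, A) => A): A =
  xs.reverse.reduceLeft((acc, x) => op(x, acc))

reduceRightViaLeft(List(1, 2, 3))(_ - _) // 2, same as List(1, 2, 3).reduceRight(_ - _)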

Scala efficient set inclusion detection

Consider a collection of tuples where the first item is a set, for instance:
val xs = Seq(
  ((1 to 5).toSet ++ Set(9), "apple"),
  ((15 to 17).toSet, "pear"),
  ((21 to 30).toSet, "grape"))
Given a value x: Int, how can the corresponding second item be identified efficiently? (The real use case involves thousands of sets.)
For val x = 22 the result would be Some("grape"), for val x = 19 the result would be None.
Note: values in each set are not necessarily consecutive.
Note: the sets do not overlap (the intersection of any two sets is empty).
It depends on your use case, but given that you're concerned with efficiency, I assume you're going to do a lot of lookups.
I also assume you have a single xs and look things up in it many times.
Preprocess xs into a map of Int->String
val xsMap = (xs flatMap { case (s, v) => s.map((_,v))}).toMap[Int, String]
Then it's trivial (and O(1)) to look up elements
xsMap.get(22) //> res0: Option[String] = Some(grape)
xsMap.get(19) //> res1: Option[String] = None
What about:
xs.find(_._1.contains(x)).map(_._2)

Scala - increasing prefix of a sequence

I was wondering what is the most elegant way of getting the increasing prefix of a given sequence. My idea is as follows, but it is neither purely functional nor particularly elegant:
val sequence = Seq(1,2,3,1,2,3,4,5,6)
var currentElement = sequence.head - 1
val increasingPrefix = sequence.takeWhile(e =>
  if (e > currentElement) {
    currentElement = e
    true
  } else
    false)
The result of the above is:
List(1,2,3)
You can take your solution, @Samlik, and effectively zip in the currentElement variable, but then map it out when you're done with it.
sequence.take(1) ++ sequence.zip(sequence.drop(1)).
  takeWhile({case (a, b) => a < b}).map({case (a, b) => b})
Also works with infinite sequences:
val sequence = Seq(1, 2, 3).toStream ++ Stream.from(1)
sequence is now an infinite Stream, but we can peek at the first 10 items:
scala> sequence.take(10).toList
res: List[Int] = List(1, 2, 3, 1, 2, 3, 4, 5, 6, 7)
Now, using the above snippet:
val prefix = sequence.take(1) ++ sequence.zip(sequence.drop(1)).
  takeWhile({case (a, b) => a < b}).map({case (a, b) => b})
Again, prefix is a Stream, but not infinite.
scala> prefix.toList
res: List[Int] = List(1, 2, 3)
N.b.: This does not handle the cases when sequence is empty, or when the prefix is also infinite.
If by elegant you mean concise and self-explanatory, it's probably something like the following:
sequence.inits.dropWhile(xs => xs != xs.sorted).next
inits gives us an iterator that returns the prefixes longest-first. We drop all the ones that aren't sorted and take the next one.
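On the sample sequence from the question this gives the expected prefix:
val sequence = Seq(1,2,3,1,2,3,4,5,6)
sequence.inits.dropWhile(xs => xs != xs.sorted).next
// Seq[Int] = List(1, 2, 3)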
If you don't want to do all that sorting, you can write something like this:
sequence.scanLeft(Some(Int.MinValue): Option[Int]) {
  case (Some(last), i) if i > last => Some(i)
  case _ => None
}.tail.flatten
If the performance of this operation is really important, though (it probably isn't), you'll want to use something more imperative, since this solution still traverses the entire collection (twice).
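One way such an imperative, single-pass version might look (just a sketch; increasingPrefix is a hypothetical helper name, and an empty input is assumed to yield an empty prefix):
def increasingPrefix(xs: Seq[Int]): List[Int] = {
  val builder = List.newBuilder[Int]
  val it = xs.iterator
  var prev = Option.empty[Int] // last element kept, if any
  var done = false
  while (!done && it.hasNext) {
    val x = it.next()
    if (prev.forall(_ < x)) { builder += x; prev = Some(x) }
    else done = true // stop at the first non-increasing element
  }
  builder.result()
}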
And, another way to skin the cat:
val sequence = Seq(1,2,3,1,2,3,4,5,6)
sequence.head :: sequence
  .sliding(2)
  .takeWhile{case List(a,b) => a <= b}
  .map(_(1)).toList
// List[Int] = List(1, 2, 3)
I will interpret elegance as the solution that most closely resembles the way we humans think about the problem, although an extremely efficient algorithm could also be a form of elegance.
val sequence = List(1,2,3,2,3,45,5)
val increasingPrefix = takeWhile(sequence, _ < _)
I believe this code snippet captures the way most of us probably think about the solution to this problem.
This of course requires defining takeWhile:
/**
 * Takes elements from a sequence by applying a predicate over two elements at a time.
 * @param xs The sequence to take elements from
 * @param f The predicate that operates over two elements at a time
 * @return For a non-empty input this always returns a sequence with at least one
 *         element, as the first element is assumed to satisfy the predicate, there
 *         being no previous element to provide the predicate with.
 */
def takeWhile[A](xs: Seq[A], f: (A, A) => Boolean): Seq[A] = {
  // function that operates over tuples and returns true when the predicate does not hold
  val not = f.tupled.andThen(!_)
  // Maybe one day our languages will be better than this... (dependent types anyone?)
  val twos = xs.sliding(2).collect { case Seq(one, two) => (one, two) }
  val indexOfBreak = twos.indexWhere(not)
  // twos has one less element than xs, so we compensate for that; indexWhere
  // returns -1 when the predicate holds throughout, in which case we keep everything
  if (indexOfBreak < 0) xs else xs.take(indexOfBreak + 1)
}

Scala lists and splitting

I have a homework assignment that has us take a list and split it into two parts, where the elements in the first part are no greater than p and the elements in the second part are greater than p. So it's like a quicksort, except we can't use any sorting. I really need some tips on how to go about this. I know I'm using cases, but I'm not familiar with how the List class works in Scala. Below is what I have so far, but I'm not sure how to go about the splitting into the two lists.
def split(p:Int, xs:List[Int]): List[Int] = {
  xs match {
    case Nil => (Nil, Nil)
    case head :: tail =>
  }
First off, you want split to return a pair of lists, so the return type needs to be (List[Int], List[Int]). However, working with pairs and lists together can often mean decomposing return values frequently. You may want to have an auxiliary function do the heavy lifting for you.
For instance, your auxiliary function might carry two extra lists, initially empty, and build up their contents until the input list is empty. The result would then be the pair of lists.
The next thing you have to decide in your recursive function design is, "What is the key decision?" In your case, it is "value no greater than p". That leads to the following code:
def split(p:Int, xs: List[Int]): (List[Int], List[Int]) = {
  def splitAux(r: List[Int], ngt: List[Int], gt: List[Int]): (List[Int], List[Int]) =
    r match {
      case Nil => (ngt, gt)
      case head :: rest if head <= p =>
        splitAux(rest, head :: ngt, gt)
      case head :: rest if head > p =>
        splitAux(rest, ngt, head :: gt)
    }
  val (ngt, gt) = splitAux(xs, List(), List())
  (ngt.reverse, gt.reverse)
}
The reversing step isn't strictly necessary, but probably is least surprising. Similarly, the second guard predicate makes explicit the path being taken.
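For instance (sample values chosen here just for illustration):
split(3, List(5, 1, 4, 2, 3)) // (List(1, 2, 3), List(5, 4))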
However, there is a much simpler way: use builtin functionality.
def split(p:Int, xs: List[Int]): (List[Int], List[Int]) = {
  (xs.filter(_ <= p), xs.filter(_ > p))
}
filter extracts only those items meeting the criterion. This solution walks the list twice, but since the previous solution has a reverse step, it is effectively doing that anyway.
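If the two passes bother you, the standard library's partition does the same split in a single traversal (a small aside, not part of the original answers):
def split(p: Int, xs: List[Int]): (List[Int], List[Int]) =
  xs.partition(_ <= p)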

Extract elements from one list that aren't in another

Simply, I have two lists and I need to extract the new elements added to one of them.
I have the following
val x = List(1,2,3)
val y = List(1,2,4)
val existing = x.map(xInstance => {
  if (!y.exists(yInstance => yInstance == xInstance))
    xInstance
})
Result: existing: List[AnyVal] = List((), (), 3)
I need to remove everything except the actual numbers, at minimum cost.
Pick a suitable data structure, and life becomes a lot easier.
scala> x.toSet -- y
res1: scala.collection.immutable.Set[Int] = Set(3)
Also beware that:
if (condition) expr1
Is shorthand for:
if (condition) expr1 else ()
Using the result of this, which will usually have the static type Any or AnyVal, is almost always an error. It's only appropriate for side-effects:
if (condition) buffer += 1
if (condition) sys.error("boom!")
retronym's solution is okay IF you don't have repeated elements and you don't care about the order. However, you don't indicate that this is so.
Hence it's probably going to be most efficient to convert y to a set (not x). We'll only need to traverse the list once and will have fast O(log(n)) access to the set.
All you need is
x filterNot y.toSet
// res1: List[Int] = List(3)
edit:
also, there's a built-in method that is even easier:
x diff y
(I had a look at the implementation; it looks pretty efficient, using a HashMap to count occurrences.)
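For example, unlike the Set-based approach, diff respects order and multiplicity:
List(1, 2, 2, 3, 3) diff List(2, 3) // List(1, 2, 3)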
The easy way is to use filter instead, so there's nothing to remove:
val existing :List[Int] =
  x.filter(xInstance => !y.exists(yInstance => yInstance == xInstance))
val existing = x.filter(d => !y.exists(_ == d))
Returns
existing: List[Int] = List(3)