Scala: Safe access to index in a List[DataFrame] - scala

I receive a List[DataFrame] and I want to store each df in a variable. Some values always exist in the list:
val routes = dataframes(0)
val stops = dataframes(1)
But other ones may also come so the size list is variable.
How could I perform a safely access to a index of list that may be out of bounds? I thought that with Some() and handling the result it would works:
val fare_attributes : Option[DataFrame] = Some(dataframes(10))
fare_attributes match {
case Some(fare) => upload())
println("fare_attributes uploaded")
case None => println("No fare_attributes found")
}
But I receive: java.lang.IndexOutOfBoundsException: 2

You can use .lift on your list:
val fare_attributes : Option[DataFrame] = dataframes.lift(10)

I think you will have to rely on checking the length of the list before accessing the indexed value. You may want to implement some wrapper function to do so. So that it isn't done repeatedly.
You can be a bit "elegant" about it with currying with two parameter lists. So that your code is a bit concise. Here's a sample which you may improve.
def safeList(list: List[Int])(index: Int): Int = {
if (index < list.length) list(index)
else 0
}
val x = List(1, 2 ,3 )
val y = safeList(x)(_)
val a = y(0) // returns 1
val b = y(1) // returns 2
val c = y(4) // returns 0

Related

Scala Closures - how these closures work as parameters to high order functions

I had the following code snippet and somehow i got it working . However, I didnt understand how these lines work
val x = List.range(1, 10)
val evens1 = x.filter((i: Int) => i % 2 == 0) // I understand this , closure is passed as a function variable to filter function
val isEven = (i: Int) => i % 2 == 0
val evens = x.filter(i =>isEven(i)) // this is working, but i didnt understand this
val evens1 = x.filter(isEven(i))//why this is not working, how we can make this work
val evens2 = x.filter(isEven)//this works
So for all cases remember that filter is a method on List which accepts a function as an argument.
val evens = x.filter((i: Int) => i % 2 == 0)
This is creating a function using the lambda syntax.
PS: This can be simplified as:
val evens = x.filter(i => i % 2 == 0)
// No need to provide the type of i, it is inferred by the context.
You may even simplify it even more like this:
val evens = x.filter(_ % 2 == 0)
// No need to name the parameter if it is only used once.
But I personally don't recommend abusing the _ syntax, since it can be confusing.
val isEven = (i: Int) => i % 2 == 0
This creates a function using the lambda syntax and stores it in the isEven variable.
val evens = x.filter(i => isEven(i))
This is exactly the same as the first one; a lambda. Which is this case is redundant since the body of the lambda is just a function call.
That is why you can also do this:
val evens = x.filter(isEven)
Finally, this:
val evens1 = x.filter(isEven(i))
Won't work, since this is a compile error because there is no variable i defined anywhere; and even if there would be one then this would fail since filter expects an Int => Boolean and isEven(i) would return a plain Boolean
how we can make this work
Is not possible to make this to work since what you want is not valid syntax.
...
Well, you may do this:
val evens = x.filter(isEven(_))
// Which would be equivalent to:
val evens = x.filter(i => isEven(i))
Or maybe you may do something like this:
def createPredicateEqualsTo(to: Int): Int => Boolean =
i => i == to
val allTwos = x.filter(createPredicateEqualsTo(2))
But, this is way different to what you had since now I have a method that returns a function and I am calling it with a constant to create a predicate which will be passed down to filter

TapeEquilibrium ScalaCheck

I have been trying to code some scalacheck property to verify the Codility TapeEquilibrium problem. For those who do not know the problem, see the following link: https://app.codility.com/programmers/lessons/3-time_complexity/tape_equilibrium/.
I coded the following yet incomplete code.
test("Lesson 3 property"){
val left = Gen.choose(-1000, 1000).sample.get
val right = Gen.choose(-1000, 1000).sample.get
val expectedSum = Math.abs(left - right)
val leftArray = Gen.listOfN(???, left) retryUntil (_.sum == left)
val rightArray = Gen.listOfN(???, right) retryUntil (_.sum == right)
val property = forAll(leftArray, rightArray){ (r: List[Int], l: List[Int]) =>
val array = (r ++ l).toArray
Lesson3.solution3(array) == expectedSum
}
property.check()
}
The idea is as follows. I choose two random numbers (values left and right) and calculate its absolute difference. Then, my idea is to generate two arrays. Each array will be random numbers whose sum will be either "left" or "right". Then by concatenating these array, I should be able to verify this property.
My issue is then generating the leftArray and rightArray. This itself is a complex problem and I would have to code a solution for this. Therefore, writing this property seems over-complicated.
Is there any way to code this? Is coding this property an overkill?
Best.
My issue is then generating the leftArray and rightArray
One way to generate these arrays or (lists), is to provide a generator of nonEmptyList whose elements sum is equal to a given number, in other word, something defined by method like this:
import org.scalacheck.{Gen, Properties}
import org.scalacheck.Prop.forAll
def listOfSumGen(expectedSum: Int): Gen[List[Int]] = ???
That verifies the property:
forAll(Gen.choose(-1000, 1000)){ sum: Int =>
forAll(listOfSumGen(sum)){ listOfSum: List[Int] =>
(listOfSum.sum == sum) && listOfSum.nonEmpty
}
}
To build such a list only poses a constraint on one element of the list, so basically here is a way to generate:
Generate list
The extra constrained element, will be given by the expectedSum - the sum of list
Insert the constrained element into a random index of the list (because obviously any permutation of the list would work)
So we get:
def listOfSumGen(expectedSum: Int): Gen[List[Int]] =
for {
list <- Gen.listOf(Gen.choose(-1000,1000))
constrainedElement = expectedSum - list.sum
index <- Gen.oneOf(0 to list.length)
} yield list.patch(index, List(constrainedElement), 0)
Now we the above generator, leftArray and rightArray could be define as follows:
val leftArray = listOfSumGen(left)
val rightArray = listOfSumGen(right)
However, I think that the overall approach of the property described is incorrect, as it builds an array where a specific partition of the array equals the expectedSum but this doesn't ensure that another partition of the array would produce a smaller sum.
Here is a counter-example run-through:
val left = Gen.choose(-1000, 1000).sample.get // --> 4
val right = Gen.choose(-1000, 1000).sample.get // --> 9
val expectedSum = Math.abs(left - right) // --> |4 - 9| = 5
val leftArray = listOfSumGen(left) // Let's assume one of its sample would provide List(3,1) (whose sum equals 4)
val rightArray = listOfSumGen(right)// Let's assume one of its sample would provide List(2,4,3) (whose sum equals 9)
val property = forAll(leftArray, rightArray){ (l: List[Int], r: List[Int]) =>
// l = List(3,1)
// r = List(2,4,3)
val array = (l ++ r).toArray // --> Array(3,1,2,4,3) which is the array from the given example in the exercise
Lesson3.solution3(array) == expectedSum
// According to the example Lesson3.solution3(array) equals 1 which is different from 5
}
Here is an example of a correct property that essentially applies the definition:
def tapeDifference(index: Int, array: Array[Int]): Int = {
val (left, right) = array.splitAt(index)
Math.abs(left.sum - right.sum)
}
forAll(Gen.nonEmptyListOf(Gen.choose(-1000,1000))) { list: List[Int] =>
val array = list.toArray
forAll(Gen.oneOf(array.indices)) { index =>
Lesson3.solution3(array) <= tapeDifference(index, array)
}
}
This property definition might collides with the way the actual solution has been implemented (which is one of the potential pitfall of scalacheck), however, that would be a slow / inefficient solution hence this would be more a way to check an optimized and fast implementation against slow and correct implementation (see this presentation)
Try this with c# :
using System;
using System.Collections.Generic;
using System.Linq;
private static int TapeEquilibrium(int[] A)
{
var sumA = A.Sum();
var size = A.Length;
var take = 0;
var res = new List<int>();
for (int i = 1; i < size; i++)
{
take = take + A[i-1];
var resp = Math.Abs((sumA - take) - take);
res.Add(resp);
if (resp == 0) return resp;
}
return res.Min();
}

Efficient way to fold list in scala, while avoiding allocations and vars

I have a bunch of items in a list, and I need to analyze the content to find out how many of them are "complete". I started out with partition, but then realized that I didn't need to two lists back, so I switched to a fold:
val counts = groupRows.foldLeft( (0,0) )( (pair, row) =>
if(row.time == 0) (pair._1+1,pair._2)
else (pair._1, pair._2+1)
)
but I have a lot of rows to go through for a lot of parallel users, and it is causing a lot of GC activity (assumption on my part...the GC could be from other things, but I suspect this since I understand it will allocate a new tuple on every item folded).
for the time being, I've rewritten this as
var complete = 0
var incomplete = 0
list.foreach(row => if(row.time != 0) complete += 1 else incomplete += 1)
which fixes the GC, but introduces vars.
I was wondering if there was a way of doing this without using vars while also not abusing the GC?
EDIT:
Hard call on the answers I've received. A var implementation seems to be considerably faster on large lists (like by 40%) than even a tail-recursive optimized version that is more functional but should be equivalent.
The first answer from dhg seems to be on-par with the performance of the tail-recursive one, implying that the size pass is super-efficient...in fact, when optimized it runs very slightly faster than the tail-recursive one on my hardware.
The cleanest two-pass solution is probably to just use the built-in count method:
val complete = groupRows.count(_.time == 0)
val counts = (complete, groupRows.size - complete)
But you can do it in one pass if you use partition on an iterator:
val (complete, incomplete) = groupRows.iterator.partition(_.time == 0)
val counts = (complete.size, incomplete.size)
This works because the new returned iterators are linked behind the scenes and calling next on one will cause it to move the original iterator forward until it finds a matching element, but it remembers the non-matching elements for the other iterator so that they don't need to be recomputed.
Example of the one-pass solution:
scala> val groupRows = List(Row(0), Row(1), Row(1), Row(0), Row(0)).view.map{x => println(x); x}
scala> val (complete, incomplete) = groupRows.iterator.partition(_.time == 0)
Row(0)
Row(1)
complete: Iterator[Row] = non-empty iterator
incomplete: Iterator[Row] = non-empty iterator
scala> val counts = (complete.size, incomplete.size)
Row(1)
Row(0)
Row(0)
counts: (Int, Int) = (3,2)
I see you've already accepted an answer, but you rightly mention that that solution will traverse the list twice. The way to do it efficiently is with recursion.
def counts(xs: List[...], complete: Int = 0, incomplete: Int = 0): (Int,Int) =
xs match {
case Nil => (complete, incomplete)
case row :: tail =>
if (row.time == 0) counts(tail, complete + 1, incomplete)
else counts(tail, complete, incomplete + 1)
}
This is effectively just a customized fold, except we use 2 accumulators which are just Ints (primitives) instead of tuples (reference types). It should also be just as efficient a while-loop with vars - in fact, the bytecode should be identical.
Maybe it's just me, but I prefer using the various specialized folds (.size, .exists, .sum, .product) if they are available. I find it clearer and less error-prone than the heavy-duty power of general folds.
val complete = groupRows.view.filter(_.time==0).size
(complete, groupRows.length - complete)
How about this one? No import tax.
import scala.collection.generic.CanBuildFrom
import scala.collection.Traversable
import scala.collection.mutable.Builder
case class Count(n: Int, total: Int) {
def not = total - n
}
object Count {
implicit def cbf[A]: CanBuildFrom[Traversable[A], Boolean, Count] = new CanBuildFrom[Traversable[A], Boolean, Count] {
def apply(): Builder[Boolean, Count] = new Counter
def apply(from: Traversable[A]): Builder[Boolean, Count] = apply()
}
}
class Counter extends Builder[Boolean, Count] {
var n = 0
var ttl = 0
override def +=(b: Boolean) = { if (b) n += 1; ttl += 1; this }
override def clear() { n = 0 ; ttl = 0 }
override def result = Count(n, ttl)
}
object Counting extends App {
val vs = List(4, 17, 12, 21, 9, 24, 11)
val res: Count = vs map (_ % 2 == 0)
Console println s"${vs} have ${res.n} evens out of ${res.total}; ${res.not} were odd."
val res2: Count = vs collect { case i if i % 2 == 0 => i > 10 }
Console println s"${vs} have ${res2.n} evens over 10 out of ${res2.total}; ${res2.not} were smaller."
}
OK, inspired by the answers above, but really wanting to only pass over the list once and avoid GC, I decided that, in the face of a lack of direct API support, I would add this to my central library code:
class RichList[T](private val theList: List[T]) {
def partitionCount(f: T => Boolean): (Int, Int) = {
var matched = 0
var unmatched = 0
theList.foreach(r => { if (f(r)) matched += 1 else unmatched += 1 })
(matched, unmatched)
}
}
object RichList {
implicit def apply[T](list: List[T]): RichList[T] = new RichList(list)
}
Then in my application code (if I've imported the implicit), I can write var-free expressions:
val (complete, incomplete) = groupRows.partitionCount(_.time != 0)
and get what I want: an optimized GC-friendly routine that prevents me from polluting the rest of the program with vars.
However, I then saw Luigi's benchmark, and updated it to:
Use a longer list so that multiple passes on the list were more obvious in the numbers
Use a boolean function in all cases, so that we are comparing things fairly
http://pastebin.com/2XmrnrrB
The var implementation is definitely considerably faster, even though Luigi's routine should be identical (as one would expect with optimized tail recursion). Surprisingly, dhg's dual-pass original is just as fast (slightly faster if compiler optimization is on) as the tail-recursive one. I do not understand why.
It is slightly tidier to use a mutable accumulator pattern, like so, especially if you can re-use your accumulator:
case class Accum(var complete = 0, var incomplete = 0) {
def inc(compl: Boolean): this.type = {
if (compl) complete += 1 else incomplete += 1
this
}
}
val counts = groupRows.foldLeft( Accum() ){ (a, row) => a.inc( row.time == 0 ) }
If you really want to, you can hide your vars as private; if not, you still are a lot more self-contained than the pattern with vars.
You could just calculate it using the difference like so:
def counts(groupRows: List[Row]) = {
val complete = groupRows.foldLeft(0){ (pair, row) =>
if(row.time == 0) pair + 1 else pair
}
(complete, groupRows.length - complete)
}

How to yield a single element from for loop in scala?

Much like this question:
Functional code for looping with early exit
Say the code is
def findFirst[T](objects: List[T]):T = {
for (obj <- objects) {
if (expensiveFunc(obj) != null) return /*???*/ Some(obj)
}
None
}
How to yield a single element from a for loop like this in scala?
I do not want to use find, as proposed in the original question, i am curious about if and how it could be implemented using the for loop.
* UPDATE *
First, thanks for all the comments, but i guess i was not clear in the question. I am shooting for something like this:
val seven = for {
x <- 1 to 10
if x == 7
} return x
And that does not compile. The two errors are:
- return outside method definition
- method main has return statement; needs result type
I know find() would be better in this case, i am just learning and exploring the language. And in a more complex case with several iterators, i think finding with for can actually be usefull.
Thanks commenters, i'll start a bounty to make up for the bad posing of the question :)
If you want to use a for loop, which uses a nicer syntax than chained invocations of .find, .filter, etc., there is a neat trick. Instead of iterating over strict collections like list, iterate over lazy ones like iterators or streams. If you're starting with a strict collection, make it lazy with, e.g. .toIterator.
Let's see an example.
First let's define a "noisy" int, that will show us when it is invoked
def noisyInt(i : Int) = () => { println("Getting %d!".format(i)); i }
Now let's fill a list with some of these:
val l = List(1, 2, 3, 4).map(noisyInt)
We want to look for the first element which is even.
val r1 = for(e <- l; val v = e() ; if v % 2 == 0) yield v
The above line results in:
Getting 1!
Getting 2!
Getting 3!
Getting 4!
r1: List[Int] = List(2, 4)
...meaning that all elements were accessed. That makes sense, given that the resulting list contains all even numbers. Let's iterate over an iterator this time:
val r2 = (for(e <- l.toIterator; val v = e() ; if v % 2 == 0) yield v)
This results in:
Getting 1!
Getting 2!
r2: Iterator[Int] = non-empty iterator
Notice that the loop was executed only up to the point were it could figure out whether the result was an empty or non-empty iterator.
To get the first result, you can now simply call r2.next.
If you want a result of an Option type, use:
if(r2.hasNext) Some(r2.next) else None
Edit Your second example in this encoding is just:
val seven = (for {
x <- (1 to 10).toIterator
if x == 7
} yield x).next
...of course, you should be sure that there is always at least a solution if you're going to use .next. Alternatively, use headOption, defined for all Traversables, to get an Option[Int].
You can turn your list into a stream, so that any filters that the for-loop contains are only evaluated on-demand. However, yielding from the stream will always return a stream, and what you want is I suppose an option, so, as a final step you can check whether the resulting stream has at least one element, and return its head as a option. The headOption function does exactly that.
def findFirst[T](objects: List[T], expensiveFunc: T => Boolean): Option[T] =
(for (obj <- objects.toStream if expensiveFunc(obj)) yield obj).headOption
Why not do exactly what you sketched above, that is, return from the loop early? If you are interested in what Scala actually does under the hood, run your code with -print. Scala desugares the loop into a foreach and then uses an exception to leave the foreach prematurely.
So what you are trying to do is to break out a loop after your condition is satisfied. Answer here might be what you are looking for. How do I break out of a loop in Scala?.
Overall, for comprehension in Scala is translated into map, flatmap and filter operations. So it will not be possible to break out of these functions unless you throw an exception.
If you are wondering, this is how find is implemented in LineerSeqOptimized.scala; which List inherits
override /*IterableLike*/
def find(p: A => Boolean): Option[A] = {
var these = this
while (!these.isEmpty) {
if (p(these.head)) return Some(these.head)
these = these.tail
}
None
}
This is a horrible hack. But it would get you the result you wished for.
Idiomatically you'd use a Stream or View and just compute the parts you need.
def findFirst[T](objects: List[T]): T = {
def expensiveFunc(o : T) = // unclear what should be returned here
case class MissusedException(val data: T) extends Exception
try {
(for (obj <- objects) {
if (expensiveFunc(obj) != null) throw new MissusedException(obj)
})
objects.head // T must be returned from loop, dummy
} catch {
case MissusedException(obj) => obj
}
}
Why not something like
object Main {
def main(args: Array[String]): Unit = {
val seven = (for (
x <- 1 to 10
if x == 7
) yield x).headOption
}
}
Variable seven will be an Option holding Some(value) if value satisfies condition
I hope to help you.
I think ... no 'return' impl.
object TakeWhileLoop extends App {
println("first non-null: " + func(Seq(null, null, "x", "y", "z")))
def func[T](seq: Seq[T]): T = if (seq.isEmpty) null.asInstanceOf[T] else
seq(seq.takeWhile(_ == null).size)
}
object OptionLoop extends App {
println("first non-null: " + func(Seq(null, null, "x", "y", "z")))
def func[T](seq: Seq[T], index: Int = 0): T = if (seq.isEmpty) null.asInstanceOf[T] else
Option(seq(index)) getOrElse func(seq, index + 1)
}
object WhileLoop extends App {
println("first non-null: " + func(Seq(null, null, "x", "y", "z")))
def func[T](seq: Seq[T]): T = if (seq.isEmpty) null.asInstanceOf[T] else {
var i = 0
def obj = seq(i)
while (obj == null)
i += 1
obj
}
}
objects iterator filter { obj => (expensiveFunc(obj) != null } next
The trick is to get some lazy evaluated view on the colelction, either an iterator or a Stream, or objects.view. The filter will only execute as far as needed.

Tune Nested Loop in Scala

I was wondering if I can tune the following Scala code :
def removeDuplicates(listOfTuple: List[(Class1,Class2)]): List[(Class1,Class2)] = {
var listNoDuplicates: List[(Class1, Class2)] = Nil
for (outerIndex <- 0 until listOfTuple.size) {
if (outerIndex != listOfTuple.size - 1)
for (innerIndex <- outerIndex + 1 until listOfTuple.size) {
if (listOfTuple(i)._1.flag.equals(listOfTuple(j)._1.flag))
listNoDuplicates = listOfTuple(i) :: listNoDuplicates
}
}
listNoDuplicates
}
Usually if you have someting looking like:
var accumulator: A = new A
for( b <- collection ) {
accumulator = update(accumulator, b)
}
val result = accumulator
can be converted in something like:
val result = collection.foldLeft( new A ){ (acc,b) => update( acc, b ) }
So here we can first use a map to force the unicity of flags. Supposing the flag has a type F:
val result = listOfTuples.foldLeft( Map[F,(ClassA,ClassB)] ){
( map, tuple ) => map + ( tuple._1.flag -> tuple )
}
Then the remaining tuples can be extracted from the map and converted to a list:
val uniqList = map.values.toList
It will keep the last tuple encoutered, if you want to keep the first one, replace foldLeft by foldRight, and invert the argument of the lambda.
Example:
case class ClassA( flag: Int )
case class ClassB( value: Int )
val listOfTuples =
List( (ClassA(1),ClassB(2)), (ClassA(3),ClassB(4)), (ClassA(1),ClassB(-1)) )
val result = listOfTuples.foldRight( Map[Int,(ClassA,ClassB)]() ) {
( tuple, map ) => map + ( tuple._1.flag -> tuple )
}
val uniqList = result.values.toList
//uniqList: List((ClassA(1),ClassB(2)), (ClassA(3),ClassB(4)))
Edit: If you need to retain the order of the initial list, use instead:
val uniqList = listOfTuples.filter( result.values.toSet )
This compiles, but as I can't test it it's hard to say if it does "The Right Thing" (tm):
def removeDuplicates(listOfTuple: List[(Class1,Class2)]): List[(Class1,Class2)] =
(for {outerIndex <- 0 until listOfTuple.size
if outerIndex != listOfTuple.size - 1
innerIndex <- outerIndex + 1 until listOfTuple.size
if listOfTuple(i)._1.flag == listOfTuple(j)._1.flag
} yield listOfTuple(i)).reverse.toList
Note that you can use == instead of equals (use eq if you need reference equality).
BTW: https://codereview.stackexchange.com/ is better suited for this type of question.
Do not use index with lists (like listOfTuple(i)). Index on lists have very lousy performance. So, some ways...
The easiest:
def removeDuplicates(listOfTuple: List[(Class1,Class2)]): List[(Class1,Class2)] =
SortedSet(listOfTuple: _*)(Ordering by (_._1.flag)).toList
This will preserve the last element of the list. If you want it to preserve the first element, pass listOfTuple.reverse instead. Because of the sorting, performance is, at best, O(nlogn). So, here's a faster way, using a mutable HashSet:
def removeDuplicates(listOfTuple: List[(Class1,Class2)]): List[(Class1,Class2)] = {
// Produce a hash map to find the duplicates
import scala.collection.mutable.HashSet
val seen = HashSet[Flag]()
// now fold
listOfTuple.foldLeft(Nil: List[(Class1,Class2)]) {
case (acc, el) =>
val result = if (seen(el._1.flag)) acc else el :: acc
seen += el._1.flag
result
}.reverse
}
One can avoid using a mutable HashSet in two ways:
Make seen a var, so that it can be updated.
Pass the set along with the list being created in the fold. The case then becomes:
case ((seen, acc), el) =>