Scala - Use of .indexOf() and .indexWhere() - scala

I have a tuple like the following:
(Age, List(19,17,11,3,2))
and I would like to get the position of the first element where their position in the list is greater than their value. To do this I tried to use .indexOf() and .indexWhere() but I probably can't find exactly the right syntax and so I keep getting:
value indexWhere is not a member of org.apache.spark.rdd.RDD[(String,
Iterable[Int])]
My code so far is:
val test =("Age", List(19,17,11,3,2))
test.indexWhere(_.2(_)<=_.2(_).indexOf(_.2(_)) )
I also searched the documentation here with no result: http://www.scala-lang.org/api/current/index.html#scala.collection.immutable.List

If you want to perform this for each element in an RDD, you can use RDD's mapValues (which would only map the right-hand-side of the tuple) and pass a function that uses indexWhere:
rdd.mapValues(_.zipWithIndex.indexWhere { case (v, i) => i+1 > v} + 1)
Notes:
Your example seems wrong, if you want the last matching item it should be 5 (position of 2) and not 4
You did not define what should be done when no item matches your condition, e.g. for List(0,0,0) - in this case the result would be 0 but not sure that's what you need

Related

Scala combination function issue

I have a input file like this:
The Works of Shakespeare, by William Shakespeare
Language: English
and I want to use flatMap with the combinations method to get the K-V pairs per line.
This is what I do:
var pairs = input.flatMap{line =>
line.split("[\\s*$&#/\"'\\,.:;?!\\[\\(){}<>~\\-_]+")
.filter(_.matches("[A-Za-z]+"))
.combinations(2)
.toSeq
.map{ case array => array(0) -> array(1)}
}
I got 17 pairs after this, but missed 2 of them: (by,shakespeare) and (william,shakespeare). I think there might be something wrong with the last word of the first sentence, but I don't know how to solve it, can anyone tell me?
The combinations method will not give duplicates even if the values are in the opposite order. So the values you are missing already appear in the solution in the other order.
This code will create all ordered pairs of words in the text.
for {
line <- input
t <- line.split("""\W+""").tails if t.length > 1
a = t.head
b <- t.tail
} yield a -> b
Here is the description of the tails method:
Iterates over the tails of this traversable collection. The first value will be this traversable collection and the final one will be an empty traversable collection, with the intervening values the results of successive applications of tail.

Scala. Need for loop where the iterations return a growing list

I have a function that takes a value and returns a list of pairs, pairUp.
and a key set, flightPass.keys
I want to write a for loop that runs pairUp for each value of flightPass.keys, and returns a big list of all these returned values.
val result:List[(Int, Int)] = pairUp(flightPass.keys.toSeq(0)).toList
for (flight<- flightPass.keys.toSeq.drop(1))
{val result:List[(Int, Int)] = result ++ pairUp(flight).toList}
I've tried a few different variations on this, always getting the error:
<console>:23: error: forward reference extends over definition of value result
for (flight<- flightPass.keys.toSeq.drop(1)) {val result:List[(Int, Int)] = result ++ pairUp(flight).toList}
^
I feel like this should work in other languages, so what am I doing wrong here?
First off, you've defined result as a val, which means it is immutable and can't be modified.
So if you want to apply "pairUp for each value of flightPass.keys", why not map()?
val result = flightPass.keys.map(pairUp) //add .toList if needed
A Scala method which converts a List of values into a List of Lists and then reduces them to a single List is called flatMap which is short for map then flatten. You would use it like this:
flightPass.keys.toSeq.flatMap(k => pairUp(k))
This will take each 'key' from flightPass.keys and pass it to pairUp (the mapping part), then take the resulting Lists from each call to pairUp and 'flatten' them, resulting in a single joined list.

Check a condition within a foreach in scala

Is there a way to check a condition within a foreach loop in scala. The example is that I want to go through an array of integers and do some arbitrary math on the positive numbers.
val arr = Array(-1,2,3,0,-7,4) //The array
//What I want to do but doesn't work
arr.foreach{if(/*This condition is true*/)
{/*Do some math and print the answer*/}}
//Expected answer for division by six is 0.333333333 \n 0.5 \n 0.666666667
//Which is 2/6, 3/6 and 4/6 respectively as they appear in the array
I know how to do it with a normal for loop and if statement but I want to use this because I want to get away from java.
Thanks
foreach function brings every item in the list/array one by one, you should set it to a variable before to use it.
For example:
arr.foreach( variable_name => {
if(/*This condition is true*/){
/*Do some math and print the answer*/
}
})
The argument to foreach is a function, taking one argument, and returning a Unit. The argument is current element of the list, as it has been pointed out in other answers. You can just give it a name, and reference it as you would any other variable.
arr.foreach { x => if(x > 0) println(x/6.0) }
It is generally better and more idiomatic to split your logic into a chain of simpler "atomic" transformations rather than putting everything into one long function:
arr
.iterator
.filter(_ > 0)
.map(_ / 6.0)
.foreach(println)
The underscore _ above is shorthand for the function argument. You can use it in short functions when you only need to reference the argument once, and a few other conditions are satisfied. The last line doesn't need to pass the argument to println, because println itself is a function, being passed to foreach. I could write it as .foreach(println(_)) or .foreach(x => println(x)), it would do the same thing, but is technically a little different: this form creates an anonymous function like def foo(x: Double) { println(x) } and passes it to foreach as an argument, the way I wrote it originally, just passes println itself as an argument.
Also, note a call to .iterator in the beginning. Everything would work the same way if you take it out. The difference is that iterators are lazy. The way it is written, the code will take first argument from the array, send it through filter, if it returns false, it'll stop, and go back to the second element, if filter returns true, it'll send that element to map, then print it out, then go back, grab the next element etc.
Without .iterator call, it'd work differently: first, it would run the entire array through filter, and create a new array, containing only positive numbers, then, it'd run that new array through map, and create a new one, with the numbers divided by 6, then it'd go through this last array to print out the values. Using .iterator makes it more efficient by avoiding all the intermediate copies.
First, you'll want to use map() instead of foreach() because map() returns a result whereas foreach() does not and can only be used for side effects (which should be avoided when possible).
As has been pointed out, you can filter() before the map(), or you can combine them using collect().
arr.collect{case x if x > 0 => x/6.0}
// res0: Array[Double] = Array(0.3333333333333333, 0.5, 0.6666666666666666)
Use the filter function before using the foreach.
arr.filter(_ > 0).foreach { value => ... }
var list: ListSet[Int] = ListSet(-1, -5, -3, 8, 7, 9, 4, 6, 2, 1, 0)
list.filter(p => p > 5).foreach(f => {
print(f + " ")
})
Output : 8 7 9 6
Just do a filter and a map.
Don't forget that scala consider the array you want as Array[Int], so if you apply /6, you gonna have 0, ensure the cast by add .toDouble
val arr = Array(-1,2,3,0,-7,4)
val res = arr.filter(_>0).map(_.toDouble/6)
res.foreach(println)
Result:
0.3333333333333333
0.5
0.6666666666666666

Spark Group By Key to (Key,List) Pair

I am trying to group some data by key where the value would be a list:
Sample data:
A 1
A 2
B 1
B 2
Expected result:
(A,(1,2))
(B,(1,2))
I am able to do this with the following code:
data.groupByKey().mapValues(List(_))
The problem is that when I then try to do a Map operation like the following:
groupedData.map((k,v) => (k,v(0)))
It tells me I have the wrong number of parameters.
If I try:
groupedData.map(s => (s(0),s(1)))
It tells me that "(Any,List(Iterable(Any)) does not take parameters"
No clue what I am doing wrong. Is my grouping wrong? What would be a better way to do this?
Scala answers only please. Thanks!!
You're almost there. Just replace List(_) with _.toList
data.groupByKey.mapValues(_.toList)
When you write an anonymous inline function of the form
ARGS => OPERATION
the entire part before the arrow (=>) is taken as the argument list. So, in the case of
(k, v) => ...
the interpreter takes that to mean a function that takes two arguments. In your case, however, you have a single argument which happens to be a tuple (here, a Tuple2, or a Pair - more fully, you appear to have a list of Pair[Any,List[Any]]). There are a couple of ways to get around this. First, you can use the sugared form of representing a pair, wrapped in an extra set of parentheses to show that this is the single expected argument for the function:
((x, y)) => ...
or, you can write the anonymous function in the form of a partial function that matches on tuples:
groupedData.map( case (k,v) => (k,v(0)) )
Finally, you can simply go with a single specified argument, as per your last attempt, but - realising it is a tuple - reference the specific field(s) within the tuple that you need:
groupedData.map(s => (s._2(0),s._2(1))) // The key is s._1, and the value list is s._2

scala: accumulate a var from collection in a functional manner (that is, no vars)

this is a newbie question
I have the following code:
var total = 0L
docs.foreach(total += _.length)
in docs I have a collection of objects with the .length property
I'd like something like:
val total = docs.[someScalaMethod](0, (element, acum) => acum + element.length )
I mean, a method that iterates each element passing an accumulator variable...
The first zero I pass should be the initial value of the accumulator var..
How can it be achieved?
This called a fold. It's almost exactly what you stated:
docs.foldLeft(0)((accum, element) => accum + element.length)
for the version that traverses the collection from left to right (usually preferable; right to left is foldRight, and 2.9 has a fold that can start anywhere, but has limitations on how it can transform the type).
Once you get used to this, there is a short-hand version of fold left, where the accumulator goes on the left (think of it being pushed from left to right through the list), and you use placeholders for the variable names since you only use them once each: (0 /: docs)(_ + _.length)
docs map { _.length } reduce { _ + _ }
or (the thx goes to Luigi Plinge)
docs.map(_.length).sum
Here is a Scalaz version:
docs.foldMap(_.length)
This is equivalent to docs.map(_.length).sum but takes only one pass. Also, works with all monoids.