How do I group collection elements using specific grouping condition?

By example:
For a list = (1,2,3,4,5,6,7,8,9)
I want to group elements of this list, based on the following condition:
If x is divisible by 4, sum it up with adjacent elements
Expected result = (1, 2, 12, 6, 24)
In Java, I'd iterate over the collection using a for loop:
List<Integer> out = new ArrayList<Integer>();
for (int i = 0; i < in.size() - 2; i++)
{
    if (in.get(i+1) % 4 == 0) {
        out.add(in.get(i) + in.get(i+1) + in.get(i+2));
        i = i + 2;
    }
    else {
        out.add(in.get(i));
    }
}
Unfortunately, in Scala, I cannot do i = i + 2, because the loop index is immutable. Do I have to use a while loop for this purpose, or is there some clever functional way?

Just like in Java, you have to iterate over each element. However, instead of updating the result object each step, you return a new object each step based on the result of the previous step and the value.
The problem in your post can be solved with a fold. You give a result to start with and then apply each element to that result to get the final result.
val in = List(1,2,3,4,5,6,7,8,9)
in.sliding(3).foldRight(List.empty[Int]){ (row, result) =>
if ((row(0) % 4) == 0) {
result
} else if ((row(1) % 4) == 0) {
(row(0) + row(1) + row(2)) :: result
} else {
row(0) :: result
}
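}
Sliding gives a "sliding view" over the List, for example (1,2,3), (2,3,4), … The fold already returns a List, so no trailing toList is needed. I used foldRight here because it is more efficient to add a new element to the beginning of a List than to the end. Note that, as written, the element right after a multiple of 4 is still emitted on its own, so the example yields List(1, 2, 12, 5, 6, 24).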
You could also write a recursive function yourself to solve this problem, which applies the same principle.
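A minimal sketch of such a recursive function is shown here. The name groupAroundMultiplesOf4 is made up for illustration, and it assumes that a multiple of 4 without two neighbours (at either end of the list) is simply kept as it is, which the question does not specify.

```scala
import scala.annotation.tailrec

// Hypothetical helper: sums each multiple of 4 with both of its neighbours.
def groupAroundMultiplesOf4(in: List[Int]): List[Int] = {
  @tailrec
  def loop(rest: List[Int], acc: List[Int]): List[Int] = rest match {
    // a multiple of 4 in the middle of a triple: emit the sum and skip all three
    case a :: b :: c :: tail if b % 4 == 0 => loop(tail, (a + b + c) :: acc)
    // otherwise emit the head unchanged and move one element forward
    case a :: tail => loop(tail, a :: acc)
    // results were prepended, so reverse them once at the end
    case Nil => acc.reverse
  }
  loop(in, Nil)
}

val grouped = groupAroundMultiplesOf4(List(1, 2, 3, 4, 5, 6, 7, 8, 9))
println(grouped) // List(1, 2, 12, 6, 24)
```

Skipping all three elements after a match plays the role of i = i + 2 in the Java loop, without needing a mutable index.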

Related

Find last index where an element should be inserted in order to maintain order

I am trying to implement the searchsorted algorithm (side=right) in Scala,
that is, find the last index where an element should be inserted in order to maintain the sorting order.
For example:
val arr = List(1,2,2,3,4,4,6,6) // assume the list is sorted; it can also contain negative elements (still sorted)
if elem = 2, output index = 3
if elem = 3, output index = 4
if elem = 5, output index = 6
if elem = -3, output index = 0
I came up with this, iterating through each element in the list, and checking the smallest diff
val a = List(1,1,2,2,3,3,4,5,5)
val elem = 1
val (_, index) = a
.foldLeft(Int.MaxValue, 0) {
case ((smallestDiff, index), currNum) =>
val currDiff = (elem - currNum).abs
if (currDiff > smallestDiff) (smallestDiff, index)
else (currDiff, index + 1)
}
This works fine for the first two examples, but breaks completely for the last two.

indexWhere() can do most of the heavy lifting.
def insertHere(ns:Seq[Int], elem:Int):Int = {
val idx = ns.indexWhere(_ > elem)
if (idx < 0) ns.length
else idx
}
insertHere( List(1,2,2,3,4,4,6,6), 2) // 3
insertHere( Seq(1,2,2,3,4,4,6,6), 3) // 4
insertHere( Array(1,2,2,3,4,4,6,6), 5) // 6
insertHere(Vector(1,2,2,3,4,4,6,6),-3) // 0
insertHere( List(1,2,2,3,4,4,6,6),12) // 8
When elem should be appended at the end, this returns ns.length, an index one past the last element.
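Since the input is sorted, a binary search can find the same index in logarithmic time. The following is only a sketch: insertHereBinary is a made-up name, and it assumes an IndexedSeq (such as Vector or a wrapped Array) so that ns(mid) is cheap. On a List the indexed access would cancel out the benefit.

```scala
import scala.annotation.tailrec

// Hypothetical name; assumes `ns` is sorted ascending and supports fast indexing.
def insertHereBinary(ns: IndexedSeq[Int], elem: Int): Int = {
  @tailrec
  def loop(lo: Int, hi: Int): Int =
    if (lo >= hi) lo
    else {
      val mid = lo + (hi - lo) / 2
      if (ns(mid) <= elem) loop(mid + 1, hi) // insertion point is to the right of mid
      else loop(lo, mid)                     // insertion point is at mid or to its left
    }
  loop(0, ns.length)
}

val sorted = Vector(1, 2, 2, 3, 4, 4, 6, 6)
println(insertHereBinary(sorted, 2))  // 3
println(insertHereBinary(sorted, 5))  // 6
println(insertHereBinary(sorted, -3)) // 0
println(insertHereBinary(sorted, 12)) // 8
```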

List whose elements depend on the previous ones

Suppose I have a list of increasing integers. If the difference of 2 consecutive numbers is less than a threshold, then we index them by the same number, starting from 0. Otherwise, we increase the index by 1.
For example: for the list (1,2,5,7,8,11,15,16,20) and the threshold = 3, the output will be: (0, 0, 1, 1, 1, 2, 3, 3, 4).
Here is my code:
val listToGroup = List(1,2,5,7,8,11,15,16,20)
val diff_list = listToGroup.sliding(2,1).map{case List(i, j) => j-i}.toList
val thres = 2
var j=0
val output_ = for(i <- diff_list.indices) yield {
if (diff_list(i) > thres ) {
j += 1
}
j
}
val output = List.concat(List(0), output_)
I'm new to Scala and I feel the list is not used efficiently. How can this code be improved?

You can avoid the mutable variable by using scanLeft, which gives more idiomatic code:
val output = diff_list.scanLeft(0) { (count, i) =>
if (i > thres) count + 1
else count
}
Your code shows some constructs that are usually avoided in Scala but common when coming from procedural languages. For example, for(i <- diff_list.indices) ... diff_list(i) can be replaced with for(i <- diff_list).
Other than that, your approach is fine: you need to traverse the list anyway. One caveat is that diff_list(i) on a List takes time linear in i, so the indexed loop is quadratic overall, while iterating over the elements directly is O(N). I would worry more about style and readability here.
Here is how I would rewrite the whole code more naturally in Scala:
val listToGroup = List(1,2,5,7,8,11,15,16,20)
val thres = 2
val output = listToGroup.zip(listToGroup.drop(1)).scanLeft(0) { case (count, (i, j)) =>
if (j - i > thres) count + 1
else count
}
My adjustments to your code:
I use scanLeft to perform the result collection construction
I prefer x.zip(x.drop(1)) over x.sliding(2, 1) (constructing tuples seems a bit more efficient than constructing collections). You could also use x.zip(x.tail), but that does not handle empty x
I avoid the temporary result diff_list

val listToGroup = List(1, 2, 5, 7, 8, 11, 15, 16, 20)
val thres = 2
listToGroup
.sliding(2)
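.scanLeft(0)((a, b) => { if (b.tail.head - b.head > thres) a + 1 else a })
.toList
You don't need a mutable variable; you can achieve the same with scanLeft. The initial 0 that scanLeft emits is the index of the first element, so the result has the same length as the input.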

How to Map Partial Elements in Scala/Spark

I have a list of integers:
val mylist = List(1, 2, 3, 4)
What I want to do is map the elements of mylist that are even numbers and multiply them by 2.
Maybe the code should be:
mylist.map{ case x%2==2 => x*2 }
I expect the result to be List(4, 8), but it's not. What is the correct code?
I know I could achieve this by using filter + map:
mylist.filter(_%2 == 0).map(_*2)
but is there some way to do this using only map()?

map does not reduce the number of elements in a transformation.
filter + map is the right approach.
But if a single method is needed, use collect:
mylist.collect{ case x if x % 2 == 0 => 2 * x }
Edit:
withFilter + map is more efficient than filter + map, as withFilter does not create an intermediate collection (i.e. it works lazily):
mylist.withFilter(_ % 2 == 0).map(_ * 2)
which is the same as a for comprehension:
for { e <- mylist if (e % 2 == 0) } yield 2 * e

Scala List.filter with two conditions, applied only once

Don't know if this is possible, but I have some code like this:
val list = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
val evens = list.filter { e => e % 2 == 0 }
if(someCondition) {
val result = evens.filter { e => e % 3 == 0 }
} else {
val result = evens.filter { e => e % 5 == 0 }
}
But I don't want to iterate over all elements twice, so is there a way that I can create a "generic pick-all-the-evens numbers on this collection" and apply some other function, so that it would only iterate once?

If you turn list into a lazy collection, such as an Iterator, then you can apply all the filter operations (or other things like map etc) in one pass:
val list = (1 to 12).toList
val doubleFiltered: List[Int] =
list.iterator
.filter(_ % 2 == 0)
.filter(_ % 3 == 0)
.toList
println(doubleFiltered)
When you convert the collection to an Iterator with .iterator, Scala will keep track of the operations to be performed (here, two filters), but will wait to perform them until the result is actually accessed (here, via the call to .toList).
So I might rewrite your code like this:
val list = (1 to 12).toList
val evens = list.iterator.filter(_ % 2 == 0)
val result =
if(someCondition)
evens.filter(_ % 3 == 0)
else
evens.filter(_ % 5 == 0)
result foreach println
Depending on exactly what you want to do, you might want an Iterator, a Stream, or a View. They are all lazily computed (so the one-pass aspect will apply), but they differ on things like whether they can be iterated over multiple times (Stream and View) or whether they keep the computed value around for later access (Stream).
To really see these different lazy behaviors, try running this bit of code and set <OPERATION> to either toList, iterator, view, or toStream:
val result =
(1 to 12).<OPERATION>
.filter { e => println("filter 1: " + e); e % 2 == 0 }
.filter { e => println("filter 2: " + e); e % 3 == 0 }
result foreach println
result foreach println
Here's the behavior you will see:
List (or any other non-lazy collection): Each filter requires a separate iteration through the collection. The resulting filtered collection is stored in memory so that each foreach can just display it.
Iterator: Both filters and the first foreach are done in a single iteration. The second foreach does nothing since the Iterator has been consumed. Results are not stored in memory.
View: Both foreach calls result in their own single-pass iteration over the collection to perform the filters. Results are not stored in memory.
Stream: Both filters and the first foreach are done in a single iteration. The resulting filtered collection is stored in memory so that each foreach can just display it.
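If reading interleaved println output is awkward, the same behaviors can be checked by counting predicate calls. This is a small sketch, assuming Scala 2.13 or later; the names calls, evensView and it are made up for the illustration.

```scala
// Count how often the predicate runs when a view is traversed twice.
var calls = 0
val evensView = (1 to 12).view.filter { e => calls += 1; e % 2 == 0 }
val callsBefore = calls          // building the view evaluates nothing
evensView.foreach(_ => ())
evensView.foreach(_ => ())
val callsAfterTwoPasses = calls  // the predicate ran again on the second pass

// An iterator applies both filters in one pass, but only once.
val it = (1 to 12).iterator.filter(_ % 2 == 0).filter(_ % 3 == 0)
val firstPass = it.toList        // List(6, 12)
val secondPass = it.toList       // empty: the iterator is already consumed

println(s"$callsBefore $callsAfterTwoPasses $firstPass $secondPass")
```

The view re-runs its filter on every traversal, while the consumed iterator yields nothing the second time, which matches the View and Iterator rows above.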

You could use function composition. someCondition here is only called once, when deciding which function to compose with:
def modN(n: Int)(xs: List[Int]) = xs filter (_ % n == 0)
val f = modN(2) _ andThen (if (someCondition) modN(3) else modN(5))
val result = f(list)
(This doesn't do what you want - it still traverses the list twice)
Just do this:
val f: Int => Boolean = if (someCondition) { _ % 3 == 0 } else { _ % 5 == 0 }
val result = list filter (x => x % 2 == 0 && f(x))
or maybe better:
val n = if (someCondition) 3 else 5
val result = list filter (x => x % 2 == 0 && x % n == 0)

Wouldn't this work:
list.filter{e => e % 2 == 0 && (if (someCondition) e % 3 == 0 else e % 5 == 0)}
also FYI e % 2 == 0 is going to give you all the even numbers, unless you're naming the val odds for another reason.

You can just write both conditions in the filter:
val list = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
val someCondition = true
val result =
  if (someCondition) list.filter { e => e % 2 == 0 && e % 3 == 0 }
  else list.filter { e => e % 2 == 0 && e % 5 == 0 }

How do I break out of a loop in Scala?

How do I break out of a loop?
var largest=0
for(i<-999 to 1 by -1) {
for (j<-i to 1 by -1) {
val product=i*j
if (largest>product)
// I want to break out here
else
if(product.toString.equals(product.toString.reverse))
largest=largest max product
}
}
How do I turn nested for loops into tail recursion?
From Scala Talk at FOSDEM 2009 http://www.slideshare.net/Odersky/fosdem-2009-1013261
on the 22nd page:
Break and continue
Scala does not have them. Why?
They are a bit imperative; better use many smaller functions
Issue how to interact with closures.
They are not needed!
What is the explanation?

You have three (or so) options to break out of loops.
Suppose you want to sum numbers until the total is greater than 1000. You try
var sum = 0
for (i <- 0 to 1000) sum += i
except you want to stop when (sum > 1000).
What to do? There are several options.
(1a) Use some construct that includes a conditional that you test.
var sum = 0
(0 to 1000).iterator.takeWhile(_ => sum < 1000).foreach(i => sum+=i)
(warning--this depends on details of how the takeWhile test and the foreach are interleaved during evaluation, and probably shouldn't be used in practice!).
(1b) Use tail recursion instead of a for loop, taking advantage of how easy it is to write a new method in Scala:
var sum = 0
def addTo(i: Int, max: Int): Unit = {
sum += i; if (sum < max) addTo(i+1,max)
}
addTo(0,1000)
(1c) Fall back to using a while loop
var sum = 0
var i = 0
while (i <= 1000 && sum <= 1000) { sum += i; i += 1 }
(2) Throw an exception.
object AllDone extends Exception { }
var sum = 0
try {
for (i <- 0 to 1000) { sum += i; if (sum>=1000) throw AllDone }
} catch {
case AllDone =>
}
(2a) In Scala 2.8+ this is already pre-packaged in scala.util.control.Breaks using syntax that looks a lot like your familiar old break from C/Java:
import scala.util.control.Breaks._
var sum = 0
breakable { for (i <- 0 to 1000) {
sum += i
if (sum >= 1000) break
} }
(3) Put the code into a method and use return.
var sum = 0
def findSum(): Unit = { for (i <- 0 to 1000) { sum += i; if (sum>=1000) return } }
findSum()
This is intentionally made not-too-easy for at least three reasons I can think of. First, in large code blocks, it's easy to overlook "continue" and "break" statements, or to think you're breaking out of more or less than you really are, or to need to break two loops which you can't do easily anyway--so the standard usage, while handy, has its problems, and thus you should try to structure your code a different way. Second, Scala has all sorts of nestings that you probably don't even notice, so if you could break out of things, you'd probably be surprised by where the code flow ended up (especially with closures). Third, most of Scala's "loops" aren't actually normal loops--they're method calls that have their own loop, or they are recursion which may or may not actually be a loop--and although they act looplike, it's hard to come up with a consistent way to know what "break" and the like should do. So, to be consistent, the wiser thing to do is not to have a "break" at all.
Note: There are functional equivalents of all of these where you return the value of sum rather than mutate it in place. These are more idiomatic Scala. However, the logic remains the same. (return becomes return x, etc.).
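As a rough illustration of that note, here is a sketch of two value-returning variants. The names sumUntil, total and viaScan are invented for the example; the stopping rule (stop once the sum reaches 1000) follows options (2) and (3) above.

```scala
import scala.annotation.tailrec

// Counterpart of (1b): the running total is a parameter instead of a var.
def sumUntil(last: Int, limit: Int): Int = {
  @tailrec
  def loop(i: Int, sum: Int): Int =
    if (i > last || sum >= limit) sum   // the "break" is just returning the value
    else loop(i + 1, sum + i)
  loop(0, 0)
}

val total = sumUntil(1000, 1000)

// Lazy running sums: take the first total that reaches the limit, if any.
val viaScan = (0 to 1000).iterator.scanLeft(0)(_ + _).find(_ >= 1000)

println(total)   // 1035
println(viaScan) // Some(1035)
```

Neither version mutates anything; the early exit becomes a base case in the recursion or a find on a lazy sequence.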

This has changed in Scala 2.8 which has a mechanism for using breaks. You can now do the following:
import scala.util.control.Breaks._
var largest = 0
// pass a function to the breakable method
breakable {
for (i<-999 to 1 by -1; j <- i to 1 by -1) {
val product = i * j
if (largest > product) {
break // BREAK!!
}
else if (product.toString.equals(product.toString.reverse)) {
largest = largest max product
}
}
}

It is never a good idea to break out of a for-loop. If you are using a for-loop it means that you know how many times you want to iterate. Use a while-loop with 2 conditions.
For example:
val length = 1000
var i = 0
var sum = 0
var done = false
while (i <= length && !done) {
  sum += i
  i += 1
  if (sum > 1000) {
    done = true
  }
}

To add another way to Rex Kerr's answer:
(1c) You can also use a guard in your loop:
var sum = 0
for (i <- 0 to 1000 ; if sum<1000) sum += i

In Scala we can simply do this:
scala> import util.control.Breaks._
scala> object TestBreak {
def main(args : Array[String]) {
breakable {
for (i <- 1 to 10) {
println(i)
if (i == 5)
break;
} } } }
output :
scala> TestBreak.main(Array())
1
2
3
4
5

Since there is no break in Scala yet, you could try to solve this problem by using a return statement. To do that, you need to put your inner loop into a function; otherwise the return would skip the whole loop.
Scala 2.8 however includes a way to break
http://www.scala-lang.org/api/rc/scala/util/control/Breaks.html

Here is an approach that uses an Iterator to generate the values of a range as we iterate, up to a breaking condition, instead of first generating the whole range and then iterating over it (inspired by @RexKerr's use of Stream):
var sum = 0
for ( i <- Iterator.from(1).takeWhile( _ => sum < 1000) ) sum += i

// import following package
import scala.util.control._
// create a Breaks object as follows
val loop = new Breaks;
// Keep the loop inside breakable as follows
loop.breakable{
// Loop will go here
for(...){
....
// Break will go here
loop.break;
}
}
Use the Breaks class:
http://www.tutorialspoint.com/scala/scala_break_statement.htm

Just use a while loop:
var (i, sum) = (0, 0)
while (sum < 1000) {
sum += i
i += 1
}

Here is a tail-recursive version. Compared to the for-comprehensions it is a bit cryptic, admittedly, but I'd say it's functional :)
import scala.annotation.tailrec
def run(start:Int) = {
  @tailrec
  def tr(i:Int, largest:Int):Int = tr1(i, i, largest) match {
    case x if i > 1 => tr(i-1, x)
    case _ => largest
  }
  @tailrec
  def tr1(i:Int,j:Int, largest:Int):Int = i*j match {
    case x if x < largest || j < 2 => largest
    case x if x.toString.equals(x.toString.reverse) => tr1(i, j-1, x)
    case _ => tr1(i, j-1, largest)
  }
  tr(start, 0)
}
As you can see, the tr function is the counterpart of the outer for-comprehension, and tr1 of the inner one. Suggestions for optimizing my version are welcome.

Close to your solution would be this:
var largest = 0
for (i <- 999 to 1 by -1;
j <- i to 1 by -1;
product = i * j;
if (largest <= product && product.toString.reverse.equals (product.toString.reverse.reverse)))
largest = product
println (largest)
The j-iteration is done without a new scope, and the product computation as well as the condition are part of the for-statement (not a good name for it, but I can't find a better one). The condition is inverted, which is fast enough for this problem size; maybe you gain something from a break in larger loops.
String.reverse implicitly converts to RichString, which is why I do 2 extra reverses. :) A more mathematical approach might be more elegant.

I am new to Scala, but how about this to avoid throwing exceptions and repeating methods:
object awhile {
def apply(condition: () => Boolean, action: () => breakwhen): Unit = {
while (condition()) {
action() match {
case breakwhen(true) => return ;
case _ => { };
}
}
}
}
case class breakwhen(break:Boolean);
Use it like this:
var i = 0
awhile(() => i < 20, () => {
i = i + 1
breakwhen(i == 5)
});
println(i)
If you don’t want to break:
awhile(() => i < 20, () => {
i = i + 1
breakwhen(false)
});

The third-party breakable package is one possible alternative
https://github.com/erikerlandson/breakable
Example code:
scala> import com.manyangled.breakable._
import com.manyangled.breakable._
scala> val bkb2 = for {
| (x, xLab) <- Stream.from(0).breakable // create breakable sequence with a method
| (y, yLab) <- breakable(Stream.from(0)) // create with a function
| if (x % 2 == 1) continue(xLab) // continue to next in outer "x" loop
| if (y % 2 == 0) continue(yLab) // continue to next in inner "y" loop
| if (x > 10) break(xLab) // break the outer "x" loop
| if (y > x) break(yLab) // break the inner "y" loop
| } yield (x, y)
bkb2: com.manyangled.breakable.Breakable[(Int, Int)] = com.manyangled.breakable.Breakable@34dc53d2
scala> bkb2.toVector
res0: Vector[(Int, Int)] = Vector((2,1), (4,1), (4,3), (6,1), (6,3), (6,5), (8,1), (8,3), (8,5), (8,7), (10,1), (10,3), (10,5), (10,7), (10,9))

import scala.util.control._
object demo_brk_963
{
def main(args: Array[String])
{
var a = 0;
var b = 0;
val numList1 = List(1,2,3,4,5,6,7,8,9,10);
val numList2 = List(11,12,13);
val outer = new Breaks; //object for break
val inner = new Breaks; //object for break
outer.breakable // Outer Block
{
for( a <- numList1)
{
println( "Value of a: " + a);
inner.breakable // Inner Block
{
for( b <- numList2)
{
println( "Value of b: " + b);
if( b == 12 )
{
println( "break-INNER;");
inner.break;
}
}
} // inner breakable
if( a == 6 )
{
println( "break-OUTER;");
outer.break;
}
}
} // outer breakable.
}
}
This is the basic way to break out of a loop with the Breaks class: declare the loop as breakable and call break on the matching Breaks object.

Ironically, the Scala break in scala.util.control.Breaks is implemented with an exception:
def break(): Nothing = { throw breakException }
The best advice is: DO NOT use break, continue and goto! IMO they are all the same bad practice, a source of all kinds of problems (and heated discussions), and ultimately "considered harmful". With block-structured code, the breaks in this example are superfluous as well.
Our Edsger W. Dijkstra† wrote:
The quality of programmers is a decreasing function of the density of go to statements in the programs they produce.

I ran into a situation like the code below:
for(id<-0 to 99) {
try {
var symbol = ctx.read("$.stocks[" + id + "].symbol").toString
var name = ctx.read("$.stocks[" + id + "].name").toString
stocklist(symbol) = name
}catch {
case ex: com.jayway.jsonpath.PathNotFoundException=>{break}
}
}
I am using a Java library whose ctx.read throws an exception when it finds nothing.
I was stuck: I have to break out of the loop when that exception is thrown, but scala.util.control.Breaks.break itself uses an exception to break the loop, and since it was called in the catch block, it was caught.
My ugly workaround: run the loop once to count the real length, then use that count for a second loop.
Leaving break out of Scala is not that great when you are using some Java libraries.
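One way to sidestep the interaction between break and try/catch altogether is to turn the exception into a value with scala.util.Try and stop at the first failure. This is only a sketch: readStock and data are hypothetical stand-ins for the ctx.read calls, and note that Try captures every non-fatal exception, not just PathNotFoundException.

```scala
import scala.util.Try

// Hypothetical stand-in for ctx.read: throws once the index runs past the data.
val data = Vector("AAA" -> "Alpha", "BBB" -> "Beta")
def readStock(id: Int): (String, String) =
  if (id < data.length) data(id)
  else throw new NoSuchElementException(s"no stock at index $id")

val stocks: Map[String, String] =
  (0 to 99).iterator
    .map(id => Try(readStock(id))) // capture the exception as a Failure value
    .takeWhile(_.isSuccess)        // stop at the first failed read
    .map(_.get)
    .toMap

println(stocks)
```

Because the iterator is lazy, nothing is read past the first failing index, so no second loop is needed.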

A clever use of the collection's find method will do the trick.
var largest = 0
lazy val ij =
for (i <- 999 to 1 by -1; j <- i to 1 by -1) yield (i, j)
val largest_ij = ij.find { case(i,j) =>
val product = i * j
if (product.toString == product.toString.reverse)
largest = largest max product
largest > product
}
println(largest_ij.get)
println(largest)

Below is code to break out of nested loops in a simple way:
import scala.util.control.Breaks.{break, breakable}
object RecurringCharacter {
  def main(args: Array[String]): Unit = {
    val str = "nileshshinde"
    // break() unwinds to the enclosing breakable; without it, scala.util.control.BreakControl escapes from main
    breakable {
      for (i <- 0 until str.length) {
        for (j <- i + 1 until str.length) {
          if (str(i) == str(j)) {
            println("First repeated character " + str(i))
            break()
          }
        }
      }
    }
  }
}

I don't know how much Scala style has changed in the past 9 years, but I found it interesting that most of the existing answers use vars, or hard to read recursion. The key to exiting early is to use a lazy collection to generate your possible candidates, then check for the condition separately. To generate the products:
val products = for {
i <- (999 to 1 by -1).view
j <- (i to 1 by -1).view
} yield (i*j)
Then to find the first palindrome from that view without generating every combination:
val palindromes = products filter {p => p.toString == p.toString.reverse}
palindromes.head
To find the largest palindrome (although the laziness doesn't buy you much because you have to check the entire list anyway):
palindromes.max
Your original code is actually checking for the first palindrome that is larger than a subsequent product, which is the same as checking for the first palindrome except in a weird boundary condition which I don't think you intended. The products are not strictly monotonically decreasing. For example, 998*998 is greater than 999*997, but appears much later in the loops.
Anyway, the advantage of the separated lazy generation and condition check is you write it pretty much like it is using the entire list, but it only generates as much as you need. You sort of get the best of both worlds.