Combine multiple sequential entries in Scala/Spark - scala

I have an array of numbers separated by comma as shown:
a:{108,109,110,112,114,115,116,118}
I need the output something like this:
a:{108-110, 112, 114-116, 118}
I am trying to group the continuous numbers with "-" in between.
For example, 108,109,110 are continuous numbers, so I get 108-110. 112 is separate entry; 114,115,116 again represents a sequence, so I get 114-116. 118 is separate and treated as such.
I am doing this in Spark. I wrote the following code:
import scala.collection.mutable.ArrayBuffer
def Sample(x:String):ArrayBuffer[String]={
val x1 = x.split(",")
var a:Int = 0
var present=""
var next:Int = 0
var yrTemp = ""
var yrAr= ArrayBuffer[String]()
var che:Int = 0
var storeV = ""
var p:Int = 0
var q:Int = 0
var count:Int = 1
while(a < x1.length)
{
yrTemp = x1(a)
if(x1.length == 1)
{
yrAr+=x1(a)
}
else
if(a < x1.length - 1)
{
present = x1(a)
if(che == 0)
{
storeV = present
}
p = x1(a).toInt
q = x1(a+1).toInt
if(p == q)
{
yrTemp = yrTemp
che = 1
}
else
if(p != q)
{
yrTemp = storeV + "-" + present
che = 0
yrAr+=yrTemp
}
}
else
if(a == x1.length-1)
{
present = x1(a)
yrTemp = present
che = 0
yrAr+=yrTemp
}
a = a+1
}
yrAr
}
val SampleUDF = udf(Sample(_:String))
I am getting the output as follows:
a:{108-108, 109-109, 110-110, 112, 114-114, 115-115, 116-116, 118}
I am not able to figure out where I am going wrong. Can you please help me correct this? TIA.

Here's another way:
def rangeToString(a: Int, b: Int) = if (a == b) s"$a" else s"$a-$b"
def reduce(xs: Seq[Int], min: Int, max: Int, ranges: Seq[String]): Seq[String] = xs match {
case y +: ys if (y - max <= 1) => reduce(ys, min, y, ranges)
case y +: ys => reduce(ys, y, y, ranges :+ rangeToString(min, max))
case Seq() => ranges :+ rangeToString(min, max)
}
def output(xs: Array[Int]) = reduce(xs, xs.head, xs.head, Vector())//.toArray
Which you can test:
println(output(Array(108,109,110,112,114,115,116,118)))
// Vector(108-110, 112, 114-116, 118)
Basically this is a tail recursive function - i.e. you take your "variables" as the input, then it calls itself with updated "variables" on each loop. So here xs is your array, min and max are integers used to keep track of the lowest and highest numbers so far, and ranges is the output sequence of Strings that gets added to when required.
The first pattern (y being the first element, and ys being the rest of the sequence - because that's how the +: extractor works) is matched if there's at least one element (ys can be an empty list) and it follows on from the previous maximum.
The second is if it doesn't follow on, and needs to reset the minimum and add the completed range to the output.
The third case is where we've got to the end of the input and just output the result, rather than calling the loop again.
Internet karma points to anyone who can work out how to eliminate the duplication of ranges :+ rangeToString(min, max)!
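As a side note on the original loop: it tests p == q, which is never true for distinct values, so every number closes its own range (hence 108-108, 109-109, ...). A minimal sketch of the same grouping done with foldLeft over (min, max) pairs follows; it assumes the input is already sorted, as the recursive version does, and the name combineRanges is hypothetical. Formatting each range only once at the end also avoids the duplicated rangeToString call.

```scala
// Hypothetical helper: groups consecutive numbers in a comma-separated string.
// Assumes the numbers are sorted in ascending order.
def combineRanges(csv: String): Seq[String] = {
  val nums = csv.split(",").map(_.trim).filter(_.nonEmpty).map(_.toInt).toList
  // The accumulator holds (min, max) pairs, most recent range first
  val ranges = nums.foldLeft(List.empty[(Int, Int)]) {
    case ((lo, hi) :: rest, n) if n - hi <= 1 => (lo, n) :: rest // extend current range
    case (acc, n)                             => (n, n) :: acc   // start a new range
  }
  // Restore input order and format each range exactly once
  ranges.reverse.map { case (lo, hi) => if (lo == hi) s"$lo" else s"$lo-$hi" }
}

println(combineRanges("108,109,110,112,114,115,116,118"))
// List(108-110, 112, 114-116, 118)
```

Because it takes and returns plain Scala values, a function like this could be wrapped with udf(...) in the same way as Sample in the question.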

Here is a solution:
def combineConsecutive(s: String): Seq[String] = {
val ints: List[Int] = s.split(',').map(_.toInt).toList.reverse
ints
.drop(1)
.foldLeft(List(List(ints.head)))((acc, e) => if ((acc.head.head - e) <= 1)
(e :: acc.head) :: acc.tail
else
List(e) :: acc)
.map(group => if (group.size > 1) group.min + "-" + group.max else group.head.toString)
}
val in = "108,109,110,112,114,115,116,118"
val result = combineConsecutive(in)
println(result) // List(108-110, 112, 114-116, 118)
This solution partly uses code from this question: Grouping list items by comparing them with their neighbors

Related

Facing Issues in Recursion of Perfect Number Problem

I've been working on a Scala recursion problem. My approach is to write the program with loops first and then convert the loop into a recursive solution.
So I have written the following code to find the perfect number using loops.
def isPerfect(n: Int): Boolean = {
var sum = 1
// Find all divisors and add them
var i = 2
while ( {
i * i <= n
}) {
if (n % i == 0) if (i * i != n) sum = sum + i + n / i
else sum = sum + i
i += 1
}
// If sum of divisors is equal to
// n, then n is a perfect number
if (sum == n && n != 1) return true
false
}
Here is my attempt to convert it into a recursive solution. But I'm getting the incorrect result.
def isPerfect(n: Int): Boolean = {
var sum = 1
// Find all divisors and add them
var i = 2
def loop(i:Int, n:Int): Any ={
if(n%i == 0) if (i * i != n) return sum + i + n / i
else
return loop(i+1, sum+i)
}
val sum_ = loop(2, n)
// If sum of divisors is equal to
// n, then n is a perfect number
if (sum_ == n && n != 1) return true
false
}
Thank you in advance.
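Here is a tail-recursive solution:
import scala.annotation.tailrec
def isPerfectNumber(n: Int): Boolean = {
  // Collect the proper divisors of n, counting d down from n - 1 to 1
  @tailrec def loop(d: Int, acc: List[Int]): List[Int] = {
    if (d == 1) 1 :: acc
    else if (n % d == 0) loop(d - 1, d :: acc)
    else loop(d - 1, acc)
  }
  // Guard n > 1: for smaller n the loop would start below the d == 1 base case
  n > 1 && loop(n - 1, Nil).sum == n
}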
As a side-note, functions that have side-effects such as state mutation scoped locally are still considered pure functions as long as the mutation is not visible externally, hence having while loops in such functions might be acceptable.
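For comparison, the square-root-bounded while loop from the question can be translated fairly directly into tail recursion by turning the mutable i and sum into parameters. This is a minimal sketch, not the answerer's method; the name isPerfectSqrt is hypothetical.

```scala
import scala.annotation.tailrec

// Hypothetical direct translation of the question's while loop
def isPerfectSqrt(n: Int): Boolean = {
  @tailrec
  def loop(i: Int, sum: Int): Int =
    if (i.toLong * i > n) sum                          // loop condition i * i <= n failed
    else if (n % i != 0) loop(i + 1, sum)              // i is not a divisor
    else if (i * i != n) loop(i + 1, sum + i + n / i)  // add both i and its cofactor
    else loop(i + 1, sum + i)                          // i is the square root: add once
  n != 1 && loop(2, 1) == n
}

println((1 to 500).filter(isPerfectSqrt)) // Vector(6, 28, 496)
```

The recursive attempt in the question goes wrong because it passes sum + i in the position of n and returns as soon as it finds the first divisor, instead of carrying the running sum through every step.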

Longest alphabetical Order sub String from a String in Scala

I need to write the logic below in Scala.
I have a string, say 'abcsfdhdefghihqwtpqr',
and need to print the longest substring of it that is in alphabetical order.
For the string above, the substrings in alphabetical order are
abc, defghi and pqr, and the longest is defghi, so the result should be defghi.
How can I write this logic in Scala?
Below is the code I have written:
def main(args: Array[String]): Unit = {
val setofletters: String = "aaakkcccccczz"
/* 15 */
val output: Int = runLongestIndex(setofletters)
println("Longest run that first appeared in index:" + output)
}
def runLongestIndex(setofletters: String): Int = {
var ctr: Int = 1
var output: Int = 0
var j: Int = 0
for (i <- 0 until setofletters.length - 1) {
j = i
while (i < setofletters.length - 1 &&
setofletters.charAt(i) == setofletters.charAt(i + 1)) {
{ i += 1; i - 1 }
{ ctr += 1; ctr - 1 }
}
if (ctr > output) {
output = j
}
ctr = 1
}
output
}
}
but I am getting the error: += is not a member of Int
Can you help me change the code to resolve this error?
Your code uses many mutable variables and doesn't look very Scala-like at all.
Here's a different approach.
val str = "abcsfdhdefghihqwtpqr"
List.unfold(str) { s =>
if (s.lengthIs > 1) {
val pairs = s.sliding(2)
val (a, b) = s.splitAt(pairs.indexWhere(p => p(0) > p(1))+1)
if (a.isEmpty) Option(b, "")
else Option(a, b)
}
else if (s.nonEmpty) Option(s, "")
else None
}.maxBy(_.length) //res0: String = defghi
Note: unfold() is newly available with Scala 2.13.
I don't have 2.11 installed, but this should work.
val str = "abcsfdhdefghihqwtpqr"
assert(str.nonEmpty)
(str.head+str).sliding(2).foldRight(""::Nil){
case (p, hd::tl) =>
if (p(0) > p(1)) "" :: p(1) + hd :: tl
else p(1) + hd :: tl
case _ => Nil //just to suppress the warning
}.maxBy(_.length) //res1: String = defghi
Explanation
sliding(2) - Pair up each letter with its neighbor: "ab","bc","cs", etc.
foldRight - Process each pair from right to left (end to start).
""::Nil - The accumulator will be a List[String] starting with a single element of an empty string. (Could also have been written as List("").)
case (p, hd::tl) - Put the current pair of letters to be processed into the variable p. Split the accumulator into its head and tail parts.
p(1) + hd :: tl - The 2nd letter of the pair is always added (pre-pended) to the current head of the accumulator. If the two letters are not in alphabetical order then a new, empty, head element is also added to the accumulator.
str.head+str - Because only the 2nd letter of each pair is being added to the accumulator, we have to make an adjustment so that the 1st letter of the original string is also included.
maxBy(_.length) - Pretty easy to understand. Comment this out to see the result of the foldRight operation.
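The same idea can also be sketched with a left fold, which avoids the str.head+str adjustment by looking at the last character of the current run instead of at pairs. This is an illustrative variant under the same "each letter is not smaller than the previous one" rule used above; the name longestAlphabeticalRun is hypothetical.

```scala
// Hypothetical helper: longest run of non-decreasing characters
def longestAlphabeticalRun(s: String): String = {
  // Runs are accumulated most recent first
  val runs = s.foldLeft(List.empty[String]) {
    case (run :: rest, c) if run.last <= c => (run + c) :: rest // still in order: extend
    case (acc, c)                          => c.toString :: acc // order broken: new run
  }
  if (runs.isEmpty) "" else runs.reverse.maxBy(_.length)
}

println(longestAlphabeticalRun("abcsfdhdefghihqwtpqr")) // defghi
```

Note that under this rule "abcs" counts as one run, since s comes after c; defghi is still the longest.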

Scala: transform a collection, yielding 0..many elements on each iteration

Given a collection in Scala, I'd like to traverse this collection and for each object I'd like to emit (yield) from 0 to multiple elements that should be joined together into a new collection.
For example, I expect something like this:
val input = Range(0, 15)
val output = input.somefancymapfunction((x) => {
if (x % 3 == 0)
yield(s"${x}/3")
if (x % 5 == 0)
yield(s"${x}/5")
})
to build an output collection that will contain
(0/3, 0/5, 3/3, 5/5, 6/3, 9/3, 10/5, 12/3)
Basically, I want a superset of what filter (1 → 0..1) and map (1 → 1) allows to do: mapping (1 → 0..n).
Solutions I've tried
Imperative solutions
Obviously, it's possible to do so in a non-functional manner, like:
val output = mutable.ListBuffer[String]()
input.foreach((x) => {
if (x % 3 == 0)
output += s"${x}/3"
if (x % 5 == 0)
output += s"${x}/5"
})
Flatmap solutions
I know of flatMap, but it either:
1) becomes really ugly if we're talking about arbitrary number of output elements:
val output = input.flatMap((x) => {
val v1 = if (x % 3 == 0) {
Some(s"${x}/3")
} else {
None
}
val v2 = if (x % 5 == 0) {
Some(s"${x}/5")
} else {
None
}
List(v1, v2).flatten
})
2) requires usage of mutable collections inside it:
val output = input.flatMap((x) => {
val r = ListBuffer[String]()
if (x % 3 == 0)
r += s"${x}/3"
if (x % 5 == 0)
r += s"${x}/5"
r
})
which is actually even worse than using a mutable collection from the very beginning, or
3) requires major logic overhaul:
val output = input.flatMap((x) => {
if (x % 3 == 0) {
if (x % 5 == 0) {
List(s"${x}/3", s"${x}/5")
} else {
List(s"${x}/3")
}
} else if (x % 5 == 0) {
List(s"${x}/5")
} else {
List()
}
})
which, IMHO, also looks ugly and requires duplicating the generating code.
Roll-your-own-map-function
Last, but not least, I can roll my own function of that kind:
def myMultiOutputMap[T, R](coll: TraversableOnce[T], func: (T, ListBuffer[R]) => Unit): List[R] = {
val out = ListBuffer[R]()
coll.foreach((x) => func.apply(x, out))
out.toList
}
which can be used almost like I want:
val output = myMultiOutputMap[Int, String](input, (x, out) => {
if (x % 3 == 0)
out += s"${x}/3"
if (x % 5 == 0)
out += s"${x}/5"
})
Am I really overlooking something and there's no such functionality in standard Scala collection libraries?
Similar questions
This question bears some similarity to Can I yield or map one element into many in Scala? — but that question discusses 1 element → 3 elements mapping, and I want 1 element → arbitrary number of elements mapping.
Final note
Please note that this is not the question about division / divisors, such conditions are included purely for illustrative purposes.
Rather than having a separate case for each divisor, put them in a container and iterate over them in a for comprehension:
val output = for {
n <- input
d <- Seq(3, 5)
if n % d == 0
} yield s"$n/$d"
Or equivalently in a collect nested in a flatMap:
val output = input.flatMap { n =>
Seq(3, 5).collect {
case d if n % d == 0 => s"$n/$d"
}
}
In the more general case where the different cases may have different logic, you can put each case in a separate partial function and iterate over the partial functions:
val output = for {
n <- input
f <- Seq[PartialFunction[Int, String]](
{case x if x % 3 == 0 => s"$x/3"},
{case x if x % 5 == 0 => s"$x/5"})
if f.isDefinedAt(n)
} yield f(n)
You can also use some functional library (e.g. scalaz) to express this:
import scalaz._, Scalaz._
def divisibleBy(byWhat: Int)(what: Int): List[String] =
(what % byWhat == 0).option(s"$what/$byWhat").toList
(0 to 15) flatMap (divisibleBy(3) _ |+| divisibleBy(5))
This uses the semigroup append operation |+|. For Lists this operation means a simple list concatenation. So for functions Int => List[String], this append operation will produce a function that runs both functions and appends their results.
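To make the effect of |+| concrete without scalaz, here is a plain-Scala sketch of what "run both functions and append their results" amounts to. The combine helper is hypothetical and only stands in for the semigroup append on functions; it is not scalaz's implementation.

```scala
// Returns a function producing zero or one element for each input
def divisibleBy(byWhat: Int): Int => List[String] =
  what => if (what % byWhat == 0) List(s"$what/$byWhat") else Nil

// Hypothetical stand-in for |+| on functions: apply both and concatenate
def combine[A, B](f: A => List[B], g: A => List[B]): A => List[B] =
  a => f(a) ++ g(a)

val both: Int => List[String] = combine(divisibleBy(3), divisibleBy(5))
val combinedOutput: List[String] = (0 until 15).toList.flatMap(both)

println(combinedOutput)
// List(0/3, 0/5, 3/3, 5/5, 6/3, 9/3, 10/5, 12/3)
```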
If you have a complex computation during which you sometimes need to add elements to a global accumulator, you can use a popular approach called the Writer monad.
The setup in Scala is somewhat bulky, but the results are extremely composable thanks to the Monad interface:
import scalaz.Writer
import scalaz.syntax.writer._
import scalaz.syntax.monad._
import scalaz.std.vector._
import scalaz.syntax.traverse._
type Messages[T] = Writer[Vector[String], T]
def yieldW(a: String): Messages[Unit] = Vector(a).tell
val output = Vector.range(0, 15).traverse { n =>
yieldW(s"$n / 3").whenM(n % 3 == 0) >>
yieldW(s"$n / 5").whenM(n % 5 == 0)
}.run._1
Here is my proposal for a custom function; it might be nicer with the pimp-my-library pattern:
def fancyMap[A, B](list: TraversableOnce[A])(fs: (A => Boolean, A => B)*) = {
def valuesForElement(elem: A) = fs collect { case (predicate, mapper) if predicate(elem) => mapper(elem) }
list flatMap valuesForElement
}
fancyMap[Int, String](0 to 15)((_ % 3 == 0, _ + "/3"), (_ % 5 == 0, _ + "/5"))
You can try collect:
val input = Range(0,15)
val output = input.flatMap { x =>
List(3,5) collect { case n if (x%n == 0) => s"${x}/${n}" }
}
System.out.println(output)
I would use a fold:
val input = Range(0, 15)
val output = input.foldLeft(List[String]()) {
case (acc, value) =>
val acc1 = if (value % 3 == 0) s"$value/3" :: acc else acc
val acc2 = if (value % 5 == 0) s"$value/5" :: acc1 else acc1
acc2
}.reverse
output contains
List(0/3, 0/5, 3/3, 5/5, 6/3, 9/3, 10/5, 12/3)
A fold takes an accumulator (acc), a collection, and a function. The function is called with the initial value of the accumulator, in this case an empty List[String], and each value of the collection in turn. The function should return the updated accumulator.
On each iteration, we take the growing accumulator and, if the inside if statements are true, prepend the calculation to the new accumulator. The function finally returns the updated accumulator.
When the fold is done, it returns the final accumulator, but unfortunately, it is in reverse order. We simply reverse the accumulator with .reverse.
There is a nice paper on folds: A tutorial on the universality and expressiveness of fold, by Graham Hutton.

Scala - can 'for-yield' clause yields nothing for some condition?

In Scala language, I want to write a function that yields odd numbers within a given range. The function prints some log when iterating even numbers. The first version of the function is:
def getOdds(N: Int): Traversable[Int] = {
val list = new mutable.MutableList[Int]
for (n <- 0 until N) {
if (n % 2 == 1) {
list += n
} else {
println("skip even number " + n)
}
}
return list
}
If I omit printing logs, the implementation becomes very simple:
def getOddsWithoutPrint(N: Int) =
for (n <- 0 until N if (n % 2 == 1)) yield n
However, I don't want to miss the logging part. How do I rewrite the first version more compactly? It would be great if it can be rewritten similar to this:
def IWantToDoSomethingSimilar(N: Int) =
for (n <- 0 until N) if (n % 2 == 1) yield n else println("skip even number " + n)
def IWantToDoSomethingSimilar(N: Int) =
for {
n <- 0 until N
if n % 2 != 0 || { println("skip even number " + n); false }
} yield n
Using filter instead of a for expression would be slightly simpler though.
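A minimal sketch of the filter variant mentioned above: the predicate logs as a side effect and then returns whether to keep the element. The name getOddsWithLog is hypothetical.

```scala
def getOddsWithLog(n: Int): Seq[Int] =
  (0 until n).filter { i =>
    val odd = i % 2 == 1
    if (!odd) println("skip even number " + i) // log the elements being dropped
    odd
  }

println(getOddsWithLog(6)) // logs 0, 2 and 4, then prints Vector(1, 3, 5)
```

As with the for version, the log lines appear in iteration order because the range is processed strictly, one element at a time.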
If you want to keep the sequentiality of your processing (handling odds and evens in order, not separately), you can use something like this (edited):
def IWantToDoSomethingSimilar(N: Int) =
(for (n <- (0 until N)) yield {
if (n % 2 == 1) {
Option(n)
} else {
println("skip even number " + n)
None
}
// Flatten transforms the Seq[Option[Int]] into Seq[Int]
}).flatten
EDIT, following the same concept, a shorter solution :
def IWantToDoSomethingSimilar(N: Int) =
(0 until N) map {
case n if n % 2 == 0 => println("skip even number "+ n)
case n => n
} collect {case i:Int => i}
If you want to dig into a functional approach, something like the following is a good starting point.
First some common definitions:
// use scalaz 7
import scalaz._, Scalaz._
// transforms a function returning either E or B into a
// function returning an optional B and optionally writing a log of type E
def logged[A, E, B, F[_]](f: A => E \/ B)(
implicit FM: Monoid[F[E]], FP: Pointed[F]): (A => Writer[F[E], Option[B]]) =
(a: A) => f(a).fold(
e => Writer(FP.point(e), None),
b => Writer(FM.zero, Some(b)))
// helper for fixing the log storage format to List
def listLogged[A, E, B](f: A => E \/ B) = logged[A, E, B, List](f)
// shorthand for a String logger with List storage
type W[+A] = Writer[List[String], A]
Now all you have to do is write your filtering function:
def keepOdd(n: Int): String \/ Int =
if (n % 2 == 1) \/.right(n) else \/.left(n + " was even")
You can try it instantly:
scala> List(5, 6) map(keepOdd)
res0: List[scalaz.\/[String,Int]] = List(\/-(5), -\/(6 was even))
Then you can use the traverse function to apply your function to a list of inputs, and collect both the logs written and the results:
scala> val x = List(5, 6).traverse[W, Option[Int]](listLogged(keepOdd))
x: W[List[Option[Int]]] = scalaz.WriterTFunctions$$anon$26@503d0400
// unwrap the results
scala> x.run
res11: (List[String], List[Option[Int]]) = (List(6 was even),List(Some(5), None))
// we may even drop the None-s from the output
scala> val (logs, results) = x.map(_.flatten).run
logs: List[String] = List(6 was even)
results: List[Int] = List(5)
I don't think this can be done easily with a for comprehension. But you could use partition.
def getOdds(N:Int) = {
val (evens, odds) = 0 until N partition { x => x % 2 == 0 }
evens foreach { x => println("skipping " + x) }
odds
}
EDIT: To avoid printing the log messages after the partitioning is done, you can change the first line of the method like this:
val (evens, odds) = (0 until N).view.partition { x => x % 2 == 0 }

Integer partitioning in Scala

Given n (say 3 people) and s (say $100), we'd like to partition s among n people.
So we need all possible n-tuples that sum to s.
My Scala code below:
def weights(n:Int,s:Int):List[List[Int]] = {
List.concat( (0 to s).toList.map(List.fill(n)(_)).flatten, (0 to s).toList).
combinations(n).filter(_.sum==s).map(_.permutations.toList).toList.flatten
}
println(weights(3,100))
This works for small values of n. ( n=1, 2, 3 or 4).
Beyond n=4, it takes a very long time, practically unusable.
I'm looking for ways to rework my code using lazy evaluation/ Stream.
My requirements: it must work for n up to 10.
Warning : The problem gets really big really fast. My results from Matlab -
---For s =100, n = 1 thru 5 results are ---
n=1 :1 combinations
n=2 :101 combinations
n=3 :5151 combinations
n=4 :176851 combinations
n=5: 4598126 combinations
---
You need dynamic programming, or memoization. Same concept, anyway.
Let's say you have to divide s among n. Recursively, that's defined like this:
def permutations(s: Int, n: Int): List[List[Int]] = n match {
case 0 => Nil
case 1 => List(List(s))
case _ => (0 to s).toList flatMap (x => permutations(s - x, n - 1) map (x :: _))
}
Now, this will STILL be slow as hell, but there's a catch here... you don't need to recompute permutations(s, n) for numbers you have already computed. So you can do this instead:
val memoP = collection.mutable.Map.empty[(Int, Int), List[List[Int]]]
def permutations(s: Int, n: Int): List[List[Int]] = {
def permutationsWithHead(x: Int) = permutations(s - x, n - 1) map (x :: _)
n match {
case 0 => Nil
case 1 => List(List(s))
case _ =>
memoP getOrElseUpdate ((s, n),
(0 to s).toList flatMap permutationsWithHead)
}
}
And this can be even further improved, because it will compute every permutation. You only need to compute every combination, and then permute that without recomputing.
To compute every combination, we can change the code like this:
val memoC = collection.mutable.Map.empty[(Int, Int, Int), List[List[Int]]]
def combinations(s: Int, n: Int, min: Int = 0): List[List[Int]] = {
def combinationsWithHead(x: Int) = combinations(s - x, n - 1, x) map (x :: _)
n match {
case 0 => Nil
case 1 => List(List(s))
case _ =>
memoC getOrElseUpdate ((s, n, min),
(min to s / 2).toList flatMap combinationsWithHead)
}
}
Running combinations(100, 10) is still slow, given the sheer number of combinations alone. The permutations for each combination can be obtained simply by calling .permutations on the combination.
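Since the question asks about lazy evaluation, the plain recursive definition of permutations above can also be expressed with Iterator, so that tuples are produced one at a time instead of being materialised as one large List. This is only a sketch of that idea (the name tuples is hypothetical); it saves memory but does not reduce the number of results to walk through.

```scala
// Lazily enumerates all n-tuples of non-negative integers that sum to s
def tuples(s: Int, n: Int): Iterator[List[Int]] =
  if (n <= 0) Iterator.empty
  else if (n == 1) Iterator.single(List(s))
  else (0 to s).iterator.flatMap(x => tuples(s - x, n - 1).map(x :: _))

println(tuples(3, 2).toList)  // List(List(0, 3), List(1, 2), List(2, 1), List(3, 0))
println(tuples(100, 3).size)  // 5151
```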
Here's a quick and dirty Stream solution:
def weights(n: Int, s: Int) = (1 until s).foldLeft(Stream(Nil: List[Int])) {
(a, _) => a.flatMap(c => Stream.range(0, n - c.sum + 1).map(_ :: c))
}.map(c => (n - c.sum) :: c)
It works for n = 6 in about 15 seconds on my machine (note that in this version the first argument is the total and the second is the number of parts):
scala> var x = 0
scala> weights(100, 6).foreach(_ => x += 1)
scala> x
res81: Int = 96560646
As a side note: by the time you get to n = 10, there are 4,263,421,511,271 of these things. That's going to take days just to stream through.
My solution to this problem, which can compute up to n = 6:
object Partition {
implicit def i2p(n: Int): Partition = new Partition(n)
def main(args : Array[String]) : Unit = {
for(n <- 1 to 6) println(100.partitions(n).size)
}
}
class Partition(n: Int){
def partitions(m: Int):Iterator[List[Int]] = new Iterator[List[Int]] {
val nums = Array.ofDim[Int](m)
nums(0) = n
var hasNext = m > 0 && n > 0
override def next: List[Int] = {
if(hasNext){
val result = nums.toList
var idx = 0
while(idx < m-1 && nums(idx) == 0) idx = idx + 1
if(idx == m-1) hasNext = false
else {
nums(idx+1) = nums(idx+1) + 1
nums(0) = nums(idx) - 1
if(idx != 0) nums(idx) = 0
}
result
}
else Iterator.empty.next
}
}
}
1
101
5151
176851
4598126
96560646
However, if only the number of possible n-tuples is needed, we can compute it directly:
val pt: (Int,Int) => BigInt = {
val buf = collection.mutable.Map[(Int,Int),BigInt]()
(s,n) => buf.getOrElseUpdate((s,n),
if(n == 0 && s > 0) BigInt(0)
else if(s == 0) BigInt(1)
else (0 to s).map{k => pt(s-k,n-1)}.sum
)
}
for(n <- 1 to 20) printf("%2d :%s%n",n,pt(100,n).toString)
1 :1
2 :101
3 :5151
4 :176851
5 :4598126
6 :96560646
7 :1705904746
8 :26075972546
9 :352025629371
10 :4263421511271
11 :46897636623981
12 :473239787751081
13 :4416904685676756
14 :38393094575497956
15 :312629484400483356
16 :2396826047070372396
17 :17376988841260199871
18 :119594570260437846171
19 :784008849485092547121
20 :4910371215196105953021
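The counts listed above agree with the standard stars-and-bars closed form: the number of n-tuples of non-negative integers summing to s is the binomial coefficient C(s + n - 1, n - 1). A small sketch that computes it without memoisation (countTuples is a hypothetical name):

```scala
// C(s + n - 1, n - 1), built up as C(s + i, i) for i = 1 .. n - 1.
// Each intermediate value is itself a binomial coefficient, so the division is exact.
def countTuples(s: Int, n: Int): BigInt =
  (1 to n - 1).foldLeft(BigInt(1)) { (acc, i) => acc * (s + i) / i }

for (n <- 1 to 10) println(f"$n%2d :${countTuples(100, n)}")
```

For s = 100 this reproduces the first rows of the table, for example 101 for n = 2 and 4263421511271 for n = 10.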