Does Scala have a statement equivalent to ML's "as" construct? - scala

In ML, one can assign names for each element of a matched pattern:
fun findPair n nil = NONE
| findPair n (head as (n1, _))::rest =
if n = n1 then (SOME head) else (findPair n rest)
In this code, I defined an alias for the first pair of the list and matched the contents of the pair. Is there an equivalent construct in Scala?

You can do variable binding with the # symbol, e.g.:
scala> val wholeList # List(x, _*) = List(1,2,3)
wholeList: List[Int] = List(1, 2, 3)
x: Int = 1

I'm sure you'll get a more complete answer later as I'm not sure how to write it recursively like your example, but maybe this variation would work for you:
scala> val pairs = List((1, "a"), (2, "b"), (3, "c"))
pairs: List[(Int, String)] = List((1,a), (2,b), (3,c))
scala> val n = 2
n: Int = 2
scala> pairs find {e => e._1 == n}
res0: Option[(Int, String)] = Some((2,b))
OK, next attempt at direct translation. How about this?
scala> def findPair[A, B](n: A, p: List[Tuple2[A, B]]): Option[Tuple2[A, B]] = p match {
| case Nil => None
| case head::rest if head._1 == n => Some(head)
| case _::rest => findPair(n, rest)
| }
findPair: [A, B](n: A, p: List[(A, B)])Option[(A, B)]

Related

How to process cogroup values?

I am cogrouping two RDDs and I want to process its values. That is,
rdd1.cogroup(rdd2)
as a result of this cogrouping I get results as below:
(ion,(CompactBuffer(100772C121, 100772C111, 6666666666),CompactBuffer(100772C121)))
Considering this result I would like to obtain all distinct pairs. e.g.
For the key 'ion'
100772C121 - 100772C111
100772C121 - 666666666
100772C111 - 666666666
How can I do this in scala?
You could try something like the following:
(l1 ++ l2).distinct.combinations(2).map { case Seq(x, y) => (x, y) }.toList
You would need to update l1 and l2 for your CompactBuffer fields. When I tried this locally, I get this (which is what I believe you want):
scala> val l1 = List("100772C121", "100772C111", "6666666666")
l1: List[String] = List(100772C121, 100772C111, 6666666666)
scala> val l2 = List("100772C121")
l2: List[String] = List(100772C121)
scala> val combine = (l1 ++ l2).distinct.combinations(2).map { case Seq(x, y) => (x, y) }.toList
combine: List[(String, String)] = List((100772C121,100772C111), (100772C121,6666666666), (100772C111,6666666666))
If you would like all of these pairs on separate rows, you can enclose this logic within a flatMap.
EDIT: Added steps per your example above.
scala> val rdd1 = sc.parallelize(Array(("ion", "100772C121"), ("ion", "100772C111"), ("ion", "6666666666")))
rdd1: org.apache.spark.rdd.RDD[(String, String)] = ParallelCollectionRDD[0] at parallelize at <console>:12
scala> val rdd2 = sc.parallelize(Array(("ion", "100772C121")))
rdd2: org.apache.spark.rdd.RDD[(String, String)] = ParallelCollectionRDD[1] at parallelize at <console>:12
scala> val cgrp = rdd1.cogroup(rdd2).flatMap {
| case (key: String, (l1: Iterable[String], l2: Iterable[String])) =>
| (l1.toSeq ++ l2.toSeq).distinct.combinations(2).map { case Seq(x, y) => (x, y) }.toList
| }
cgrp: org.apache.spark.rdd.RDD[(String, String)] = FlatMappedRDD[4] at flatMap at <console>:16
scala> cgrp.foreach(println)
...
(100772C121,100772C111)
(100772C121,6666666666)
(100772C111,6666666666)
EDIT 2: Updated again per your use case.
scala> val cgrp = rdd1.cogroup(rdd2).flatMap {
| case (key: String, (l1: Iterable[String], l2: Iterable[String])) =>
| for { e1 <- l1.toSeq; e2 <- l2.toSeq; if (e1 != e2) }
| yield if (e1 > e2) ((e1, e2), 1) else ((e2, e1), 1)
| }.reduceByKey(_ + _)
...
((6666666666,100772C121),2)
((6666666666,100772C111),1)
((100772C121,100772C111),1)

Best way to implement "zipLongest" in Scala

I need to implement a "zipLongest" function in Scala; that is, combine two sequences together as pairs, and if one is longer than the other, use a default value. (Unlike the standard zip method, which will just truncate to the shortest sequence.)
I've implemented it directly as follows:
def zipLongest[T](xs: Seq[T], ys: Seq[T], default: T): Seq[(T, T)] = (xs, ys) match {
case (Seq(), Seq()) => Seq()
case (Seq(), y +: rest) => (default, y) +: zipLongest(Seq(), rest, default)
case (x +: rest, Seq()) => (x, default) +: zipLongest(rest, Seq(), default)
case (x +: restX, y +: restY) => (x, y) +: zipLongest(restX, restY, default)
}
Is there a better way to do it?
Use zipAll :
scala> val l1 = List(1,2,3)
l1: List[Int] = List(1, 2, 3)
scala> val l2 = List("a","b")
l2: List[String] = List(a, b)
scala> l1.zipAll(l2,0,".")
res0: List[(Int, String)] = List((1,a), (2,b), (3,.))
If you want to use the same default value for the first and second seq :
scala> def zipLongest[T](xs:Seq[T], ys:Seq[T], default:T) = xs.zipAll(ys, default, default)
zipLongest: [T](xs: Seq[T], ys: Seq[T], default: T)Seq[(T, T)]
scala> val l3 = List(4,5,6,7)
l3: List[Int] = List(4, 5, 6, 7)
scala> zipLongest(l1,l3,0)
res1: Seq[(Int, Int)] = List((1,4), (2,5), (3,6), (0,7))
You can do this as a oneliner:
xs.padTo(ys.length, x).zip(ys.padTo(xs.length, y))

How to generate the power set of a set in Scala

I have a Set of items of some type and want to generate its power set.
I searched the web and couldn't find any Scala code that adresses this specific task.
This is what I came up with. It allows you to restrict the cardinality of the sets produced by the length parameter.
def power[T](set: Set[T], length: Int) = {
var res = Set[Set[T]]()
res ++= set.map(Set(_))
for (i <- 1 until length)
res = res.map(x => set.map(x + _)).flatten
res
}
This will not include the empty set. To accomplish this you would have to change the last line of the method simply to res + Set()
Any suggestions how this can be accomplished in a more functional style?
Looks like no-one knew about it back in July, but there's a built-in method: subsets.
scala> Set(1,2,3).subsets foreach println
Set()
Set(1)
Set(2)
Set(3)
Set(1, 2)
Set(1, 3)
Set(2, 3)
Set(1, 2, 3)
Notice that if you have a set S and another set T where T = S ∪ {x} (i.e. T is S with one element added) then the powerset of T - P(T) - can be expressed in terms of P(S) and x as follows:
P(T) = P(S) ∪ { p ∪ {x} | p ∈ P(S) }
That is, you can define the powerset recursively (notice how this gives you the size of the powerset for free - i.e. adding 1-element doubles the size of the powerset). So, you can do this tail-recursively in scala as follows:
scala> def power[A](t: Set[A]): Set[Set[A]] = {
| #annotation.tailrec
| def pwr(t: Set[A], ps: Set[Set[A]]): Set[Set[A]] =
| if (t.isEmpty) ps
| else pwr(t.tail, ps ++ (ps map (_ + t.head)))
|
| pwr(t, Set(Set.empty[A])) //Powerset of ∅ is {∅}
| }
power: [A](t: Set[A])Set[Set[A]]
Then:
scala> power(Set(1, 2, 3))
res2: Set[Set[Int]] = Set(Set(1, 2, 3), Set(2, 3), Set(), Set(3), Set(2), Set(1), Set(1, 3), Set(1, 2))
It actually looks much nicer doing the same with a List (i.e. a recursive ADT):
scala> def power[A](s: List[A]): List[List[A]] = {
| #annotation.tailrec
| def pwr(s: List[A], acc: List[List[A]]): List[List[A]] = s match {
| case Nil => acc
| case a :: as => pwr(as, acc ::: (acc map (a :: _)))
| }
| pwr(s, Nil :: Nil)
| }
power: [A](s: List[A])List[List[A]]
Here's one of the more interesting ways to write it:
import scalaz._, Scalaz._
def powerSet[A](xs: List[A]) = xs filterM (_ => true :: false :: Nil)
Which works as expected:
scala> powerSet(List(1, 2, 3)) foreach println
List(1, 2, 3)
List(1, 2)
List(1, 3)
List(1)
List(2, 3)
List(2)
List(3)
List()
See for example this discussion thread for an explanation of how it works.
(And as debilski notes in the comments, ListW also pimps powerset onto List, but that's no fun.)
Use the built-in combinations function:
val xs = Seq(1,2,3)
(0 to xs.size) flatMap xs.combinations
// Vector(List(), List(1), List(2), List(3), List(1, 2), List(1, 3), List(2, 3),
// List(1, 2, 3))
Note, I cheated and used a Seq, because for reasons unknown, combinations is defined on SeqLike. So with a set, you need to convert to/from a Seq:
val xs = Set(1,2,3)
(0 to xs.size).flatMap(xs.toSeq.combinations).map(_.toSet).toSet
//Set(Set(1, 2, 3), Set(2, 3), Set(), Set(3), Set(2), Set(1), Set(1, 3),
//Set(1, 2))
Can be as simple as:
def powerSet[A](xs: Seq[A]): Seq[Seq[A]] =
xs.foldLeft(Seq(Seq[A]())) {(sets, set) => sets ++ sets.map(_ :+ set)}
Recursive implementation:
def powerSet[A](xs: Seq[A]): Seq[Seq[A]] = {
def go(xsRemaining: Seq[A], sets: Seq[Seq[A]]): Seq[Seq[A]] = xsRemaining match {
case Nil => sets
case y :: ys => go(ys, sets ++ sets.map(_ :+ y))
}
go(xs, Seq[Seq[A]](Seq[A]()))
}
All the other answers seemed a bit complicated, here is a simple function:
def powerSet (l:List[_]) : List[List[Any]] =
l match {
case Nil => List(List())
case x::xs =>
var a = powerSet(xs)
a.map(n => n:::List(x)):::a
}
so
powerSet(List('a','b','c'))
will produce the following result
res0: List[List[Any]] = List(List(c, b, a), List(b, a), List(c, a), List(a), List(c, b), List(b), List(c), List())
Here's another (lazy) version... since we're collecting ways of computing the power set, I thought I'd add it:
def powerset[A](s: Seq[A]) =
Iterator.range(0, 1 << s.length).map(i =>
Iterator.range(0, s.length).withFilter(j =>
(i >> j) % 2 == 1
).map(s)
)
Here's a simple, recursive solution using a helper function:
def concatElemToList[A](a: A, list: List[A]): List[Any] = (a,list) match {
case (x, Nil) => List(List(x))
case (x, ((h:List[_]) :: t)) => (x :: h) :: concatElemToList(x, t)
case (x, (h::t)) => List(x, h) :: concatElemToList(x, t)
}
def powerSetRec[A] (a: List[A]): List[Any] = a match {
case Nil => List()
case (h::t) => powerSetRec(t) ++ concatElemToList(h, powerSetRec (t))
}
so the call of
powerSetRec(List("a", "b", "c"))
will give the result
List(List(c), List(b, c), List(b), List(a, c), List(a, b, c), List(a, b), List(a))

Cartesian product of two lists

Given a map where a digit is associated to several characters
scala> val conversion = Map("0" -> List("A", "B"), "1" -> List("C", "D"))
conversion: scala.collection.immutable.Map[java.lang.String,List[java.lang.String]] =
Map(0 -> List(A, B), 1 -> List(C, D))
I want to generate all possible character sequences based on a sequence of digits. Examples:
"00" -> List("AA", "AB", "BA", "BB")
"01" -> List("AC", "AD", "BC", "BD")
I can do this with for comprehensions
scala> val number = "011"
number: java.lang.String = 011
Create a sequence of possible characters per index
scala> val values = number map { case c => conversion(c.toString) }
values: scala.collection.immutable.IndexedSeq[List[java.lang.String]] =
Vector(List(A, B), List(C, D), List(C, D))
Generate all the possible character sequences
scala> for {
| a <- values(0)
| b <- values(1)
| c <- values(2)
| } yield a+b+c
res13: List[java.lang.String] = List(ACC, ACD, ADC, ADD, BCC, BCD, BDC, BDD)
Here things get ugly and it will only work for sequences of three digits. Is there any way to achieve the same result for any sequence length?
The following suggestion is not using a for-comprehension. But I don't think it's a good idea after all, because as you noticed you'd be tied to a certain length of your cartesian product.
scala> def cartesianProduct[T](xss: List[List[T]]): List[List[T]] = xss match {
| case Nil => List(Nil)
| case h :: t => for(xh <- h; xt <- cartesianProduct(t)) yield xh :: xt
| }
cartesianProduct: [T](xss: List[List[T]])List[List[T]]
scala> val conversion = Map('0' -> List("A", "B"), '1' -> List("C", "D"))
conversion: scala.collection.immutable.Map[Char,List[java.lang.String]] = Map(0 -> List(A, B), 1 -> List(C, D))
scala> cartesianProduct("01".map(conversion).toList)
res9: List[List[java.lang.String]] = List(List(A, C), List(A, D), List(B, C), List(B, D))
Why not tail-recursive?
Note that above recursive function is not tail-recursive. This isn't a problem, as xss will be short unless you have a lot of singleton lists in xss. This is the case, because the size of the result grows exponentially with the number of non-singleton elements of xss.
I could come up with this:
val conversion = Map('0' -> Seq("A", "B"), '1' -> Seq("C", "D"))
def permut(str: Seq[Char]): Seq[String] = str match {
case Seq() => Seq.empty
case Seq(c) => conversion(c)
case Seq(head, tail # _*) =>
val t = permut(tail)
conversion(head).flatMap(pre => t.map(pre + _))
}
permut("011")
I just did that as follows and it works
def cross(a:IndexedSeq[Tree], b:IndexedSeq[Tree]) = {
a.map (p => b.map( o => (p,o))).flatten
}
Don't see the $Tree type that am dealing it works for arbitrary collections too..

Scala Get First and Last elements of List using Pattern Matching

I am doing a pattern matching on a list. Is there anyway I can access the first and last element of the list to compare?
I want to do something like..
case List(x, _*, y) if(x == y) => true
or
case x :: _* :: y =>
or something similar...
where x and y are first and last elements of the list..
How can I do that.. any Ideas?
Use the standard :+ and +: extractors from the scala.collection package
ORIGINAL ANSWER
Define a custom extractor object.
object :+ {
def unapply[A](l: List[A]): Option[(List[A], A)] = {
if(l.isEmpty)
None
else
Some(l.init, l.last)
}
}
Can be used as:
val first :: (l :+ last) = List(3, 89, 11, 29, 90)
println(first + " " + l + " " + last) // prints 3 List(89, 11, 29) 90
(For your case: case x :: (_ :+ y) if(x == y) => true)
In case you missed the obvious:
case list # (head :: tail) if head == list.last => true
The head::tail part is there so you don’t match on the empty list.
simply:
case head +: _ :+ last =>
for example:
scala> val items = Seq("ham", "spam", "eggs")
items: Seq[String] = List(ham, spam, eggs)
scala> items match {
| case head +: _ :+ last => Some((head, last))
| case List(head) => Some((head, head))
| case _ => None
| }
res0: Option[(String, String)] = Some((ham,eggs))
Lets understand the concept related to this question, there is a difference between '::', '+:' and ':+':
1st Operator:
'::' - It is right associative operator which works specially for lists
scala> val a :: b :: c = List(1,2,3,4)
a: Int = 1
b: Int = 2
c: List[Int] = List(3, 4)
2nd Operator:
'+:' - It is also right associative operator but it works on seq which is more general than just list.
scala> val a +: b +: c = List(1,2,3,4)
a: Int = 1
b: Int = 2
c: List[Int] = List(3, 4)
3rd Operator:
':+' - It is also left associative operator but it works on seq which is more general than just list
scala> val a :+ b :+ c = List(1,2,3,4)
a: List[Int] = List(1, 2)
b: Int = 3
c: Int = 4
The associativity of an operator is determined by the operator’s last character. Operators ending in a colon ‘:’ are right-associative. All other operators are left-associative.
A left-associative binary operation e1;op;e2 is interpreted as e1.op(e2)
If op is right-associative, the same operation is interpreted as { val x=e1; e2.op(x) }, where x is a fresh name.
Now comes answer for your question:
So now if you need to get first and last element from the list, please use following code
scala> val firstElement +: b :+ lastElement = List(1,2,3,4)
firstElement: Int = 1
b: List[Int] = List(2, 3)
lastElement: Int = 4