Scala regex and for comprehension - scala

I am trying to reason about how a for comprehension works, because it is doing something different from what I expect. I read several answers, the most relevant of which is 'Scala "<-" for comprehension'. However, I am still perplexed.
The following code works as expected. It prints lines where the values matched by two different Regexes are not equal (one for the value in a session cookie and another for the value in the GET args, just to give context):
file.getLines().foreach { line =>
val whidSession: String = rWhidSession.findAllMatchIn(line) flatMap {m => m.group(1)} mkString ""
val whidArg: String = rWhidArg.findAllMatchIn(line) flatMap {m => m.group(1)} mkString ""
if(whidSession != whidArg) println(line)
}
The following is the problematic code, which iterates on the letters within the matching strings, thus printing the line as many times as there are different letters in the two values:
/**
* This would compare letters, regardless of the use of mkString.. even without the flatMap step.
*/
val whidTuples = for {
line <- file.getLines().toList
whidSession <- rWhidSession.findAllMatchIn(line) flatMap {m => m.group(1) mkString ""}
whidArg <- rWhidEOL.findAllMatchIn(line) flatMap {m => m.group(1) mkString ""} if whidArg != whidSession
} yield line
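The character-by-character behaviour described above can be reproduced in isolation. This is only a minimal sketch: the pattern `rId` and the sample `line` are hypothetical stand-ins for the regexes and log lines in the question, and `.toList` makes explicit the String-to-sequence conversion that the question's flatMap relies on implicitly.

```scala
import scala.util.matching.Regex

// Hypothetical pattern and input, for illustration only.
val rId: Regex = """id=(\w+)""".r
val line: String = "GET /page?id=abc"

// flatMap treats each captured String as a sequence of Chars and flattens it,
// so a later `<-` generator would see one Char at a time.
val chars: List[Char] =
  rId.findAllMatchIn(line).flatMap(m => m.group(1).toList).toList

// map keeps each captured group as a whole String.
val groups: List[String] =
  rId.findAllMatchIn(line).map(m => m.group(1)).toList
```

With `map`, a generator such as `whidSession <- ...` would bind the whole captured value rather than a single letter.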

To check that corresponding matches are equal:
scala> val ss = "foo/foo" :: "bar/bar" :: "foo/bar" :: Nil
ss: List[String] = List(foo/foo, bar/bar, foo/bar)
scala> val ra = "(.*)/.*".r ; val rb = ".*/(.*)".r
ra: scala.util.matching.Regex = (.*)/.*
rb: scala.util.matching.Regex = .*/(.*)
scala> for (s <- ss; ra(x) = s; rb(y) = s if x != y) yield s
res0: List[String] = List(foo/bar)
but allow multiple matches on a line:
scala> val ss = "foo/foo" :: "bar/bar" :: "baz/baz foo/bar" :: Nil
ss: List[String] = List(foo/foo, bar/bar, baz/baz foo/bar)
this would still compare the first matches:
scala> val ra = """(\w*)/\w*""".r.unanchored ; val rb = """\w*/(\w*)""".r.unanchored
ra: scala.util.matching.UnanchoredRegex = (\w*)/\w*
rb: scala.util.matching.UnanchoredRegex = \w*/(\w*)
scala> for (s <- ss; ra(x) = s; rb(y) = s if x != y) yield s
res2: List[String] = List()
so compare all matches:
scala> val ra = """(\w*)/\w*""".r ; val rb = """\w*/(\w*)""".r
ra: scala.util.matching.Regex = (\w*)/\w*
rb: scala.util.matching.Regex = \w*/(\w*)
scala> for (s <- ss; ma <- ra findAllMatchIn s; mb <- rb findAllMatchIn s; ra(x) = ma; rb(y) = mb if x != y) yield s
res3: List[String] = List(baz/baz foo/bar, baz/baz foo/bar, baz/baz foo/bar)
or
scala> for (s <- ss; (ma, mb) <- (ra findAllMatchIn s) zip (rb findAllMatchIn s); ra(x) = ma; rb(y) = mb if x != y) yield s
res4: List[String] = List(baz/baz foo/bar)
scala> for (s <- ss; (ra(x), rb(y)) <- (ra findAllMatchIn s) zip (rb findAllMatchIn s) if x != y) yield s
res5: List[String] = List(baz/baz foo/bar)
where the match ra(x) = ma should not be re-evaluating the regex but just doing ma group 1.
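A small sketch of that last remark, assuming (as I understand the standard library) that Regex can be used as an extractor on a Match it produced. The names `viaPattern` and `viaGroup` are only for illustration.

```scala
import scala.util.matching.Regex

val ra: Regex = """(\w*)/\w*""".r
val ma: Regex.Match = ra.findFirstMatchIn("baz/baz foo/bar").get

// Extract from the existing Match using the regex that produced it.
val viaPattern: String = ma match { case ra(x) => x }

// Read the first capture group directly.
val viaGroup: String = ma.group(1)
```

Both values should hold the same captured text, which is why the two forms are interchangeable in the comprehensions above.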

Related

Type mismatch in Scala's for-comprehension

I have tried to define a recursive Scala function that looks something like this:
def doSomething: (List[List[(Int, Int)]], List[(Int, Int)], Int, Int) => List[Int] =
(als, rs, d, n) =>
if (n == 0) {
for (entry <- rs if (entry._1 == d)) yield entry._2
} else {
for (entry <- rs; adj <- als(entry._1)) yield doSomething(als, rs.::((adj._1, adj._2 + entry._2)), d, n - 1)
}
Now, the compiler tells me:
<console>:17: error: type mismatch;
found : List[List[Int]]
required: List[Int]
for (entry <- rs; adj <- als(entry._1)) yield doSomething(als, rs.::((adj._1, adj._2 + entry._2)), d, n - 1)
^
I cannot figure out what the problem is. I'm sure that I'm using <- correctly. On the other hand, I'm a Scala newbie coming from the Java world...
Regarding the types of the input:
als : List[List[(Int,Int)]],
rs : List[(Int,Int)],
d and n : Int
The compiler error appears as soon as I tell IntelliJ to send my code to the Scala console.
When you yield an A when iterating on a List, you return a List[A]. doSomething returns a List[Int], so by yielding that you return a List[List[Int]]. You can unroll that like this:
def doSomethingElse(als: List[List[(Int, Int)]], rs: List[(Int, Int)], d: Int, n: Int): List[Int] =
if (n == 0) {
for ((k, v) <- rs if k == d) yield v
} else {
for {
(k, v) <- rs
(adjk, adjv) <- als(k)
item <- doSomethingElse(als, (adjk, adjv + v) :: rs, d, n - 1)
} yield item
}
Notice that I also used method notation for brevity, destructured the pairs, and leveraged the right-associativity of methods whose names end in : for readability. Feel free to use whatever convention you prefer, but I don't really see a reason for having a method that returns a constant function (you may want to just use a val to declare it).
As a further note, you are using random access on a linear sequence (als(k)), you may want to consider an indexed sequence (like a Vector). More info on the complexity characteristics of the Scala Collection API can be found here.
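To see the nesting rule on its own, here is a minimal sketch. The helper `neighbours` is hypothetical and stands in for any function that already returns a List, such as the recursive call above.

```scala
// Hypothetical helper that returns a List, like the recursive call does.
def neighbours(n: Int): List[Int] = List(n, n + 1)

// Yielding a List from a comprehension over a List nests the result.
val nested: List[List[Int]] =
  for (n <- List(1, 10)) yield neighbours(n)

// Adding one more generator flattens it, because the outer generator
// desugars to flatMap instead of map.
val flat: List[Int] =
  for {
    n <- List(1, 10)
    m <- neighbours(n)
  } yield m
```

The second form is the same trick as the `item <- doSomethingElse(...)` generator.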
For testing purposes, I created some sample data that matches the input types:
val als = List(List((1,2), (3,4)), List((1,2), (3,4)), List((1,2), (3,4)))
//als: List[List[(Int, Int)]] = List(List((1,2), (3,4)), List((1,2), (3,4)), List((1,2), (3,4)))
val rs = List((1,2), (2,3))
//rs: List[(Int, Int)] = List((1,2), (2,3))
val d = 1
//d: Int = 1
val n = 3
//n: Int = 3
In your doSomething function, when n == 0, you are doing
for (entry <- rs if (entry._1 == d)) yield entry._2
//res0: List[Int] = List(2)
You can see that the return type is List[Int]
In the else branch, you are calling doSomething recursively.
I created a dummy version of your doSomething method, because your definition does not name its input parameters:
def dosomething(nn: Int)={
for (entry <- rs if (entry._1 == d)) yield entry._2
}
and I call the method recursively as
for (entry <- rs; adj <- als(entry._1)) yield dosomething(0)
//res1: List[List[Int]] = List(List(2), List(2), List(2), List(2))
Clearly, you can see that the nested for comprehension returns List[List[Int]].
And that's what the compiler is telling you:
error: type mismatch;
found : List[List[Int]]
required: List[Int]
I hope the answer is helpful

Pattern Matching Function Call in Scala

I originally posted this question on Code Review, but it does not seem to fit there, so I'll re-ask here. Please tell me if it does not fit here either, and where I should post this kind of question. Thanks.
I am a newbie in Scala and functional programming. I want to call a function several times, with combinations of parameters based on two variables. Basically, what I am doing right now is this:
def someFunction(a: Int, b: Int): Future[Int] = ???
val value1 = true
val value2 = false
(value1, value2) match {
case (true, true) =>
val res1 = someFunction(0, 0)
val res2 = someFunction(0, 1)
val res3 = someFunction(1, 0)
val res4 = someFunction(1, 1)
for {
r1 <- res1
r2 <- res2
r3 <- res3
r4 <- res4
} yield r1 + r2 + r3 + r4
case (true, false) =>
val res1 = someFunction(0, 0)
val res2 = someFunction(1, 0)
for {
r1 <- res1
r2 <- res2
} yield r1 + r2
case (false, true) =>
val res1 = someFunction(0, 0)
val res2 = someFunction(0, 1)
for {
r1 <- res1
r2 <- res2
} yield r1 + r2
case (false, false) =>
for { r1 <- someFunction(0, 0) } yield r1
}
I am not satisfied with the above code, as it is repetitive and hard to read and maintain. Is there a better way to do this? I've tried to search for how to combine function calls by pattern matching on values like this, but found nothing to work with. It looks like I don't know the right term for this.
Any help would be appreciated, and feel free to change the title if there's a better wording.
Thanks before :)
An easier way could be to pregenerate a sequence of argument tuples:
val arguments = for {
arg1 <- 0 to (if (value1) 1 else 0)
arg2 <- 0 to (if (value2) 1 else 0)
} yield (arg1, arg2)
Then you can combine function executions on the arguments with Future.traverse to get a Future of the sequence of results, and then sum the results:
Future.traverse(arguments)(Function.tupled(someFunction)).map(_.sum)
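Putting the two pieces together, a runnable sketch might look like the following. The body of `someFunction` and the wrapper `sumFor` are assumptions made for illustration (the question leaves `someFunction` as `???`), and the global ExecutionContext is used only to keep the example short.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical stand-in for the real someFunction.
def someFunction(a: Int, b: Int): Future[Int] = Future(a * 10 + b)

// Hypothetical wrapper: build the argument pairs, run them all, sum the results.
def sumFor(value1: Boolean, value2: Boolean): Future[Int] = {
  val arguments = for {
    arg1 <- 0 to (if (value1) 1 else 0)
    arg2 <- 0 to (if (value2) 1 else 0)
  } yield (arg1, arg2)

  val results: Future[IndexedSeq[Int]] =
    Future.traverse(arguments) { case (a, b) => someFunction(a, b) }
  results.map(_.sum)
}

// Blocking here only to inspect the result in a script or REPL.
val total: Int = Await.result(sumFor(value1 = true, value2 = false), 5.seconds)
```

All four branches of the original match collapse into the single call to `sumFor`.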
I think this should solve your problem:
def someFunction(x: Int, y: Int): Future[Int] = ???
def someFunctionTupled: ((Int, Int)) => Future[Int] = (someFunction _).tupled // Same as someFunction but you can pass in a tuple here
def genParamList(b: Boolean) = if (b)
List(0, 1)
else
List(0)
val value1 = true
val value2 = false
val l1 = genParamList(value1)
val l2 = genParamList(value2)
// Combine the two parameter lists by constructing the cartesian product
val allParams = l1.foldLeft(List[(Int, Int)]()){
case (acc, elem) => acc ++ l2.map((elem, _))
}
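// A List[Future[Int]] has no sum, so turn it into a Future[List[Int]] first
// (this needs an implicit ExecutionContext in scope)
Future.sequence(allParams.map(someFunctionTupled)).map(_.sum)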
The above code will result in a Future[Int] which is the sum of all results of someFunction applied to the elements of the allParams list.

How to process cogroup values?

I am cogrouping two RDDs and I want to process its values. That is,
rdd1.cogroup(rdd2)
as a result of this cogrouping I get results as below:
(ion,(CompactBuffer(100772C121, 100772C111, 6666666666),CompactBuffer(100772C121)))
Considering this result I would like to obtain all distinct pairs. e.g.
For the key 'ion'
100772C121 - 100772C111
100772C121 - 6666666666
100772C111 - 6666666666
How can I do this in scala?
You could try something like the following:
(l1 ++ l2).distinct.combinations(2).map { case Seq(x, y) => (x, y) }.toList
You would need to update l1 and l2 for your CompactBuffer fields. When I tried this locally, I get this (which is what I believe you want):
scala> val l1 = List("100772C121", "100772C111", "6666666666")
l1: List[String] = List(100772C121, 100772C111, 6666666666)
scala> val l2 = List("100772C121")
l2: List[String] = List(100772C121)
scala> val combine = (l1 ++ l2).distinct.combinations(2).map { case Seq(x, y) => (x, y) }.toList
combine: List[(String, String)] = List((100772C121,100772C111), (100772C121,6666666666), (100772C111,6666666666))
If you would like all of these pairs on separate rows, you can enclose this logic within a flatMap.
EDIT: Added steps per your example above.
scala> val rdd1 = sc.parallelize(Array(("ion", "100772C121"), ("ion", "100772C111"), ("ion", "6666666666")))
rdd1: org.apache.spark.rdd.RDD[(String, String)] = ParallelCollectionRDD[0] at parallelize at <console>:12
scala> val rdd2 = sc.parallelize(Array(("ion", "100772C121")))
rdd2: org.apache.spark.rdd.RDD[(String, String)] = ParallelCollectionRDD[1] at parallelize at <console>:12
scala> val cgrp = rdd1.cogroup(rdd2).flatMap {
| case (key: String, (l1: Iterable[String], l2: Iterable[String])) =>
| (l1.toSeq ++ l2.toSeq).distinct.combinations(2).map { case Seq(x, y) => (x, y) }.toList
| }
cgrp: org.apache.spark.rdd.RDD[(String, String)] = FlatMappedRDD[4] at flatMap at <console>:16
scala> cgrp.foreach(println)
...
(100772C121,100772C111)
(100772C121,6666666666)
(100772C111,6666666666)
EDIT 2: Updated again per your use case.
scala> val cgrp = rdd1.cogroup(rdd2).flatMap {
| case (key: String, (l1: Iterable[String], l2: Iterable[String])) =>
| for { e1 <- l1.toSeq; e2 <- l2.toSeq; if (e1 != e2) }
| yield if (e1 > e2) ((e1, e2), 1) else ((e2, e1), 1)
| }.reduceByKey(_ + _)
...
((6666666666,100772C121),2)
((6666666666,100772C111),1)
((100772C121,100772C111),1)

Convert iterative two sum k to functional

I have this code in Python that finds all pairs of numbers in an array that sum to k:
def two_sum_k(array, k):
seen = set()
out = set()
for v in array:
if k - v in seen:
out.add((min(v, k-v), max(v, k-v)))
seen.add(v)
return out
Can anyone help me convert this to Scala (in a functional style)? Also with linear complexity.
I think this is a classic case of when a for-comprehension can provide additional clarity
scala> def algo(xs: IndexedSeq[Int], target: Int) =
| for {
| i <- 0 until xs.length
| j <- (i + 1) until xs.length if xs(i) + xs(j) == target
| }
| yield xs(i) -> xs(j)
algo: (xs: IndexedSeq[Int], target: Int)scala.collection.immutable.IndexedSeq[(Int, Int)]
Using it:
scala> algo(1 to 20, 15)
res0: scala.collection.immutable.IndexedSeq[(Int, Int)] = Vector((1,14), (2,13), (3,12), (4,11), (5,10), (6,9), (7,8))
I think it also doesn't suffer from the problems that your algorithm has, although it examines every pair of indices and is therefore quadratic rather than linear.
I'm not sure this is the clearest, but folds usually do the trick:
def two_sum_k(xs: Seq[Int], k: Int) = {
xs.foldLeft((Set[Int](),Set[(Int,Int)]())){ case ((seen,out),v) =>
(seen+v, if (seen contains k-v) out+((v min k-v, v max k-v)) else out)
}._2
}
You could just filter for (k-x <= x) by only using those x as first element, which aren't bigger than k/2:
def two_sum_k (xs: List[Int], k: Int): List [(Int, Int)] =
xs.filter (x => (x <= k/2)).
filter (x => (xs contains k-x) && (x != k-x || xs.indexOf (x) != xs.lastIndexOf (x))).
map (x => (x, k-x)).distinct
My first filter on line 3 was just filter (x => xs contains k-x), which failed, as pointed out in the comment by Someone Else. Now it's more complicated and doesn't find (4, 4) when 4 occurs only once:
scala> li
res6: List[Int] = List(2, 3, 3, 4, 5, 5)
scala> two_sum_k (li, 8)
res7: List[(Int, Int)] = List((3,5))
def twoSumK(xs: List[Int], k: Int): List[(Int, Int)] = {
val tuples = xs.iterator map { x => (x, k-x) }
val potentialValues = tuples map { case (a, b) => (a min b) -> (a max b) }
val values = potentialValues filter { case (a, b) => (xs contains a) && (xs contains b) } // after min/max, either side may be k-x
values.toSet.toList
}
Well, a direct translation would be this:
import scala.collection.mutable
def twoSumK[T : Numeric](array: Array[T], k: T) = {
val num = implicitly[Numeric[T]]
import num._
val seen = mutable.HashSet[T]()
val out: mutable.Set[(T, T)] = mutable.HashSet[(T, T)]()
for (v <- array) {
if (seen contains k - v) out += min(v, k - v) -> max(v, k - v)
seen += v
}
out
}
One clever way of doing it would be this:
def twoSumK[T : Numeric](array: Array[T], k: T) = {
val num = implicitly[Numeric[T]]
import num._
// One can write all the rest as a one-liner
val s1 = array.toSet
val s2 = s1 map (k -)
val s3 = s1 intersect s2
s3 map (v => min(v, k - v) -> max(v, k - v))
}
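One edge case to keep in mind with set-based versions: when k is exactly twice some value v, the pair (v, v) is reported even if v occurs only once, whereas the Python original needs a second occurrence. A rough sketch of one way to handle that, counting occurrences first, is below. The name `twoSumKCounted` is hypothetical, and the version is restricted to Int to stay short.

```scala
// Hypothetical variant that counts occurrences so (v, v) needs two copies of v.
def twoSumKCounted(xs: Seq[Int], k: Int): Set[(Int, Int)] = {
  // Number of occurrences of each distinct value.
  val counts: Map[Int, Int] =
    xs.groupBy(identity).map { case (v, vs) => v -> vs.size }

  counts.keySet.collect {
    // Two different values: emit each pair once, smaller value first.
    case v if v < k - v && counts.contains(k - v) => (v, k - v)
    // Same value twice: only valid when it really occurs at least twice.
    case v if v == k - v && counts(v) >= 2 => (v, v)
  }
}
```

Building the count map and scanning its keys are each a single pass, so with hash-based maps this should stay close to linear in the input size.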
This does the trick:
def two_sum_k(xs: List[Int], k: Int): List[(Int, Int)] ={
xs.map(a=>xs.map(b=>(b,a+b)).filter(_._2 == k).map(b=>(b._1,a))).flatten.collect{case (a,b)=>if(a>b){(b,a)}else{(a,b)}}.distinct
}

Scala Get First and Last elements of List using Pattern Matching

I am doing pattern matching on a list. Is there any way I can access the first and last elements of the list to compare them?
I want to do something like..
case List(x, _*, y) if(x == y) => true
or
case x :: _* :: y =>
or something similar...
where x and y are first and last elements of the list..
How can I do that.. any Ideas?
Use the standard :+ and +: extractors from the scala.collection package
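A minimal sketch with the standard extractors, assuming Scala 2.13 or later. The function name `sameEnds` is hypothetical, and this version treats lists with fewer than two elements as a non-match.

```scala
// Hypothetical helper: true when the first and last elements are equal.
def sameEnds[A](xs: List[A]): Boolean = xs match {
  case first +: _ :+ last => first == last // needs at least two elements
  case _                  => false
}
```

If a single-element list should count as having equal ends, an extra `case List(_) => true` before the fallback would cover it.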
ORIGINAL ANSWER
Define a custom extractor object.
object :+ {
def unapply[A](l: List[A]): Option[(List[A], A)] = {
if(l.isEmpty)
None
else
Some((l.init, l.last))
}
}
Can be used as:
val first :: (l :+ last) = List(3, 89, 11, 29, 90)
println(first + " " + l + " " + last) // prints 3 List(89, 11, 29) 90
(For your case: case x :: (_ :+ y) if(x == y) => true)
In case you missed the obvious:
case list @ (head :: tail) if head == list.last => true
The head::tail part is there so you don’t match on the empty list.
simply:
case head +: _ :+ last =>
for example:
scala> val items = Seq("ham", "spam", "eggs")
items: Seq[String] = List(ham, spam, eggs)
scala> items match {
| case head +: _ :+ last => Some((head, last))
| case List(head) => Some((head, head))
| case _ => None
| }
res0: Option[(String, String)] = Some((ham,eggs))
Let's understand the concept behind this question. There is a difference between '::', '+:' and ':+':
1st Operator:
'::' - A right-associative operator that works specifically on lists.
scala> val a :: b :: c = List(1,2,3,4)
a: Int = 1
b: Int = 2
c: List[Int] = List(3, 4)
2nd Operator:
'+:' - Also a right-associative operator, but it works on Seq, which is more general than List.
scala> val a +: b +: c = List(1,2,3,4)
a: Int = 1
b: Int = 2
c: List[Int] = List(3, 4)
3rd Operator:
':+' - A left-associative operator. Like '+:', it works on Seq, which is more general than List.
scala> val a :+ b :+ c = List(1,2,3,4)
a: List[Int] = List(1, 2)
b: Int = 3
c: Int = 4
The associativity of an operator is determined by the operator’s last character. Operators ending in a colon ‘:’ are right-associative. All other operators are left-associative.
A left-associative binary operation e1 op e2 is interpreted as e1.op(e2).
If op is right-associative, the same operation is interpreted as { val x=e1; e2.op(x) }, where x is a fresh name.
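The two rules can be checked directly by writing the same expressions in operator form and in explicit method-call form. This is just an illustrative sketch, and the value names are made up.

```scala
// Right-associative (name ends in ':'): the method is called on the right operand.
val viaOperator: List[Int] = 1 :: 2 :: Nil
val viaMethod: List[Int] = Nil.::(2).::(1)

// Left-associative (name does not end in ':'): the method is called on the left operand.
val appended: List[Int] = List(1, 2) :+ 3
val appendedViaMethod: List[Int] = List(1, 2).:+(3)

// +: ends in ':', so it is also called on the right operand.
val prepended: List[Int] = 0 +: List(1, 2)
val prependedViaMethod: List[Int] = List(1, 2).+:(0)
```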
Now comes answer for your question:
So now if you need to get first and last element from the list, please use following code
scala> val firstElement +: b :+ lastElement = List(1,2,3,4)
firstElement: Int = 1
b: List[Int] = List(2, 3)
lastElement: Int = 4