scala partial string match - scala

I have a question about List of Strings partial Matching to a List of Strings (intersect I guess).
List1:List = [a,b,c,d,e,f]
List2:Iterable[String] = [a b,e f,g h,x y]
I want to take any element or combination of elements in List1 that also happens to be in List2, and replace it with the element in List2, for example, [a,b] are in List1, List 2 contains element [a b], in this case, [a,b] in List1 will be replaced with [a b]. The result for List 1 should be:
List1result = [a b,c,d,e f]
I've tried intersect, which would return [a b, e f]

OK, I edited my answer after the comment below; I think I understand the question now.
Take each element of the second list, split it into a list of elements, and use containsSlice to filter the values.
containsSlice returns true only if the elements of the slice appear in the first list contiguously and in the same order.
val lst1 = List("a","b","c","d","e","f")
val lst2 = List("a b","e f","g h","x y")
lst2.filter{ pair =>
val xss = pair.split(" ")
lst1.containsSlice(xss)
}
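The filter above returns only the matching phrases (a b, e f), not the merged list the question asks for. A minimal sketch of one way to get [a b, c, d, e f] follows. It assumes that phrases in the second list are space-separated and must match adjacent elements of the first list; MergePhrases and merge are hypothetical names, not part of any library.

```scala
import scala.annotation.tailrec

object MergePhrases {
  // Hypothetical helper: walks `tokens` left to right and, wherever a phrase
  // (split on spaces) matches the upcoming tokens, emits the phrase instead
  // of the individual tokens.
  def merge(tokens: List[String], phrases: Iterable[String]): List[String] = {
    // Pre-split each phrase; drop empty ones and try longer phrases first.
    val split = phrases.toList
      .map(p => p -> p.split(" ").toList)
      .filter(_._2.nonEmpty)
      .sortBy(-_._2.length)

    @tailrec
    def loop(rest: List[String], acc: List[String]): List[String] = rest match {
      case Nil => acc.reverse
      case head :: tail =>
        split.find { case (_, parts) => rest.startsWith(parts) } match {
          // A phrase matches here: consume its tokens and emit the phrase.
          case Some((phrase, parts)) => loop(rest.drop(parts.length), phrase :: acc)
          // No phrase matches: keep the single token.
          case None => loop(tail, head :: acc)
        }
    }

    loop(tokens, Nil)
  }
}
```

With the lists from the question, MergePhrases.merge(lst1, lst2) keeps the original order of lst1 and yields List(a b, c, d, e f).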

You can try something like this :
val l1 = List("a", "b", "c", "d", "e", "f")
val l2 = List("a b", "e f", "g h", "x y")
l2.foldLeft(List[String]()) { case (x, y) => if (l1.containsSlice(y.split(" "))) x :+ y else x} ++
l1.filterNot(x=>l2.flatten.filter(_ != ' ').contains(x.toCharArray.head))
l1: List[String] = List(a, b, c, d, e, f)
l2: List[String] = List(a b, e f, g h, x y)
res0: List[String] = List(a b, e f, c, d)

Related

Loop over a List getting an increasing number of elements each time

Let's say I have a list that looks like
{A, B, C, D, E}
And I want to loop over this list, getting an increasing number of elements each time, so each iteration would look like:
Iteration 1: {A}
Iteration 2: {A, B}
Iteration 3: {A, B, C}
Iteration 4: {A, B, C, D}
Iteration 5: {A, B, C, D, E}
Currently I am accomplishing this with:
(1 to list.size).foreach( n => {
val elements = list.take(n)
// Do something with elements
})
But that feels messy. Is there a more 'scala' way of accomplishing this behavior?
You could use list.inits :
scala> List(1,2,3,4,5).inits.foreach(println)
List(1, 2, 3, 4, 5)
List(1, 2, 3, 4)
List(1, 2, 3)
List(1, 2)
List(1)
List()
To get your desired output, you would need to create a list from the iterator, reverse it, and take the tail to omit the empty list:
scala> List(1,2,3,4,5).inits.toList.reverse.tail.foreach(println)
List(1)
List(1, 2)
List(1, 2, 3)
List(1, 2, 3, 4)
List(1, 2, 3, 4, 5)
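As an alternative to reversing the result of inits, the prefixes can be built in increasing order directly with scanLeft. This is only a sketch of a different technique than the one above; Prefixes and prefixes are hypothetical names.

```scala
object Prefixes {
  // Builds every non-empty prefix of `xs`, shortest first.
  // scanLeft emits the initial empty Vector as its first element, so it is dropped with tail.
  def prefixes[A](xs: List[A]): List[Vector[A]] =
    xs.scanLeft(Vector.empty[A])(_ :+ _).tail
}
```

For example, Prefixes.prefixes(List("A", "B", "C")) gives Vector(A), Vector(A, B), Vector(A, B, C). A Vector is used as the accumulator because appending to the end of a List is linear in its length.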
You can use foldLeft with a linked list to accumulate the elements.
However, this will reverse the order, so you'll need to call the .reverse function if you really care about the order, which wouldn't be efficient.
list.foldLeft(Nil : List[String]){(l, n) => {
val elements = n :: l
println(elements)
elements
}}
Output:
List(A)
List(B, A)
List(C, B, A)
List(D, C, B, A)
List(E, D, C, B, A)
Here's a version that preserves order but uses a ListBuffer, which isn't great
val elems = ListBuffer[String]()
list.foreach{ s =>
elems += s
println(elems)
}
The same, but with a fold
list.foldLeft(ListBuffer[String]()){(elems, s) =>
elems += s
println(elems)
elems
}
Output:
ListBuffer(A)
ListBuffer(A, B)
ListBuffer(A, B, C)
ListBuffer(A, B, C, D)
ListBuffer(A, B, C, D, E)
Here is a recursive version. You will need to reverse the list first.
import scala.annotation.tailrec
@tailrec
def doIt(l: List[Int], acc: List[List[Int]] = Nil): List[List[Int]] = l match {
case Nil => acc
case h :: t => doIt(t, List(h) :: acc.map(l => h :: l))
}
doIt(List(1,2,3).reverse).foreach(println)
// output
List(1)
List(1, 2)
List(1, 2, 3)

MapReduce example in Scala

I have this problem in Scala for homework.
The idea I have had, but have not been able to implement successfully, is:
Iterate through each word; if the word is basketball, take the next word and add it to a map. Reduce by key, and sort from highest to lowest.
Unfortunately, I do not know how to take the next word in a list of words.
For example, I would like to do something like this:
val lines = spark.textFile("basketball_words_only.txt") // process lines in file
// split into individual words
val words = lines.flatMap(line => line.split(" "))
var listBuff = new ListBuffer[String]() // a list Buffer to hold each following word
val it = Iterator(words)
while (it.hasNext) {
listBuff += it.next().next() // <-- this is what I would like to do
}
val follows = listBuff.map(word => (word, 1))
val count = follows.reduceByKey((x, y) => x + y) // another issue as I cannot reduceByKey with a listBuffer
val sort = count.sortBy(_._2,false,1)
val result2 = sort.collect()
for (i <- 0 to result2.length - 1) {
printf("%s follows %d times\n", result2(i)._1, result2(i)._2);
}
Any help would be appreciated
You can get the max count for the first word in all distinct word pairs in a few steps:
1. Strip punctuation and split the content into words, which get lowercased
2. Use sliding(2) to create an array of word pairs
3. Use reduceByKey to count occurrences of distinct word pairs
4. Use reduceByKey again to capture the word pairs with the max count for the first word
Sample code as follows:
import org.apache.spark.sql.functions._
import org.apache.spark.mllib.rdd.RDDFunctions._
val wordPairCountRDD = sc.textFile("/path/to/textfile").
flatMap( _.split("""[\s,.;:!?]+""") ).
map( _.toLowerCase ).
sliding(2).
map{ case Array(w1, w2) => ((w1, w2), 1) }.
reduceByKey( _ + _ )
val wordPairMaxRDD = wordPairCountRDD.
map{ case ((w1, w2), c) => (w1, (w2, c)) }.
reduceByKey( (acc, x) =>
if (x._2 > acc._2) (x._1, x._2) else acc
).
map{ case (w1, (w2, c)) => ((w1, w2), c) }
[UPDATE]
If you only need the word pair counts to be sorted (in descending order) per your revised requirement, you can skip step 4 and use sortBy on wordPairCountRDD:
wordPairCountRDD.
sortBy( z => (z._2, z._1._1, z._1._2), ascending = false )
This is from https://spark.apache.org/examples.html:
val counts = textFile.flatMap(line => line.split(" "))
.map(word => (word, 1))
.reduceByKey(_ + _)
As you can see it counts the occurrence of individual words because the key-value pairs are of the form (word, 1). Which part do you need to change to count combinations of words?
This might help you: http://daily-scala.blogspot.com/2009/11/iteratorsliding.html
Well, my text uses "b" instead of "basketball" and "a", "c" for other words.
scala> val r = scala.util.Random
scala> val s = (1 to 20).map (i => List("a", "b", "c")(r.nextInt (3))).mkString (" ")
s: String = c a c b a b a a b c a b b c c a b b c b
The result is obtained with split, sliding, filter, map, groupBy, map and sortBy:
scala> val counts = s.split (" ").sliding (2).filter (_(0) == "b").map (_(1)).toList.groupBy (_(0)).map { case (c: Char, l: List[String]) => (c, l.size)}.toList.sortBy (-_._2)
counts: List[(Char, Int)] = List((c,3), (b,2), (a,2))
In small steps, sliding:
scala> val counts = s.split (" ").sliding (2).toList
counts: List[Array[String]] = List(Array(c, a), Array(a, c), Array(c, b), Array(b, a), Array(a, b), Array(b, a), Array(a, a), Array(a, b), Array(b, c), Array(c, a), Array(a, b), Array(b, b), Array(b, c), Array(c, c), Array(c, a), Array(a, b), Array(b, b), Array(b, c), Array(c, b))
filter:
scala> val counts = s.split (" ").sliding (2).filter (_(0) == "b").toList
counts: List[Array[String]] = List(Array(b, a), Array(b, a), Array(b, c), Array(b, b), Array(b, c), Array(b, b), Array(b, c))
map (_(1)) (takes the second element of each pair, i.e. the array element at index 1)
scala> val counts = s.split (" ").sliding (2).filter (_(0) == "b").map (_(1)).toList
counts: List[String] = List(a, a, c, b, c, b, c)
groupBy (_(0))
scala> val counts = s.split (" ").sliding (2).filter (_(0) == "b").map (_(1)).toList.groupBy (_(0))
counts: scala.collection.immutable.Map[Char,List[String]] = Map(b -> List(b, b), a -> List(a, a), c -> List(c, c, c))
to size of List:
scala> val counts = s.split (" ").sliding (2).filter (_(0) == "b").map (_(1)).toList.groupBy (_(0)).map { case (c: Char, l: List[String]) => (c, l.size)}
counts: scala.collection.immutable.Map[Char,Int] = Map(b -> 2, a -> 2, c -> 3)
Finally sort descending:
scala> val counts = s.split (" ").sliding (2).filter (_(0) == "b").map (_(1)).toList.groupBy (_(0)).map { case (c: Char, l: List[String]) => (c, l.size)}.toList.sortBy (-_._2)
counts: List[(Char, Int)] = List((c,3), (b,2), (a,2))
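Note that groupBy (_(0)) above groups by the first character of each word, which works here only because every word is a single letter. A sketch of the same pipeline for whole words, using plain Scala collections rather than Spark, could look like this; FollowerCount and followers are hypothetical names, and ties are broken alphabetically as an arbitrary choice.

```scala
object FollowerCount {
  // Counts which words directly follow `target` in `text`,
  // sorted by descending count, then alphabetically.
  def followers(text: String, target: String): List[(String, Int)] =
    text.trim.split("\\s+").toList
      .sliding(2)                                      // adjacent word pairs
      .collect { case List(`target`, next) => next }   // keep the word after each target
      .toList
      .groupBy(identity)                               // group by the whole word
      .map { case (word, occurrences) => (word, occurrences.size) }
      .toList
      .sortBy { case (word, count) => (-count, word) }
}
```

For the sample string s above, FollowerCount.followers(s, "b") gives List((c,3), (a,2), (b,2)).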

List[List[String]] in Scala

We have a list of strings, and I am grouping them with the program below.
input: val k = List("a", "a", "a", "a", "b", "c", "c", "a", "a", "d", "e", "e", "e", "e")
output: List(List(a, a, a, a), List(b), List(c, c), List(a, a), List(d), List(e, e, e, e))
Program:
def pack(ls:List[String]):List[List[String]]={
val (a,next) = ls span {_ == ls.head}
if ((next) == Nil) List(a)
else a :: pack(next)
}
However, when I use the List cons operator, I get the output shown below.
Input:
val a =List("a", "a", "a", "a")
val b = List ("b")
val c = List ("c", "c" )
val a1 = List("a", "a")
val d = List("d")
val e = List( "e", "e", "e", "e")
List(a::b::c::a1::d::e)
output:
List(List(List(a, a, a, a), List(b), List(c, c), List(a, a), List(d), e, e, e, e))
Is there any way I can get the output below with a single expression in Scala?
List(List(a, a, a, a), List(b), List(c, c), List(a, a), List(d), List(e, e, e, e))
scala> a::b::c::a1::d::List(e)
res0: List[List[String]] = List(List(a, a, a, a), List(b), List(c, c), List(a, a), List(d), List(e, e, e, e))
The cons operator prepends an item to a list - so construct a List around the last one if you want to then prepend all the other items one by one.
The easiest way to think about this is noticing the types:
To construct a List[List[String]], the cons operator expects a List[String] on the left and a List[List[String]] on the right, and produces a new List[List[String]]: the left side is an item of the resulting list, and the right side is a list with the same type as the expected result
When you write d::e, you're doing List[String] :: List[String], which already means you're not going to produce a List[List[String]] - so the right-hand side must be "wrapped" with a list to get the types right: d::List(e)
Prepending the other items follows the same rule - prepending List[String]s to a List[List[String]]
Once this is done - you get the expected result, without having to wrap the entire result with another list
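The typing rule above can be checked with a small sketch. The values are shortened versions of the lists in the question, and ConsDemo is a hypothetical name.

```scala
object ConsDemo {
  val a = List("a", "a")
  val b = List("b")
  val e = List("e", "e")

  // Right-hand side is already a List[List[String]], so each :: prepends one item.
  val wrapped: List[List[String]] = a :: b :: List(e)

  // Equivalent: terminate the chain with Nil instead of wrapping the last list.
  val withNil: List[List[String]] = a :: b :: e :: Nil

  // Without the wrapper, b is prepended to the *elements* of e,
  // so the result mixes lists and strings: List(List(a, a), List(b), e, e).
  val mixed = a :: b :: e
}
```

Here wrapped and withNil are equal and have three elements, while mixed has four elements and a much less useful element type.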
If you are planning to achieve the end result using ::, adding a Nil at the end to the cons operation could yield the desired result.
a::b::c::a1::d::e::Nil
or you could wrap the last element in a List as @Tzach Zohar has mentioned.
a::b::c::a1::d::List(e)
Otherwise use
List(a,b,c,a1,d,e)
Yes. If you really want to use that syntax:
List(a::b::c::a1::d::e::Nil: _*)
You need the : _* at the end because otherwise you are passing a single element (of type List) to List.apply() and it is not interpreting it as a sequence, which is why you get List[List[List]] instead of the desired List[List].
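A short sketch of the difference, using two small lists as stand-ins for the ones in the question (VarargsDemo is a hypothetical name):

```scala
object VarargsDemo {
  val a = List("a")
  val e = List("e", "e")

  // The whole chain is passed as ONE argument: a list containing a single List[List[String]].
  val nested: List[List[List[String]]] = List(a :: e :: Nil)

  // With `: _*` the chain is expanded into separate arguments of List.apply.
  val spread: List[List[String]] = List(a :: e :: Nil: _*)
}
```

nested has exactly one element, while spread is the same as List(a, e).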

Cartesian product of two lists

Given a map where a digit is associated to several characters
scala> val conversion = Map("0" -> List("A", "B"), "1" -> List("C", "D"))
conversion: scala.collection.immutable.Map[java.lang.String,List[java.lang.String]] =
Map(0 -> List(A, B), 1 -> List(C, D))
I want to generate all possible character sequences based on a sequence of digits. Examples:
"00" -> List("AA", "AB", "BA", "BB")
"01" -> List("AC", "AD", "BC", "BD")
I can do this with for comprehensions
scala> val number = "011"
number: java.lang.String = 011
Create a sequence of possible characters per index
scala> val values = number map { case c => conversion(c.toString) }
values: scala.collection.immutable.IndexedSeq[List[java.lang.String]] =
Vector(List(A, B), List(C, D), List(C, D))
Generate all the possible character sequences
scala> for {
| a <- values(0)
| b <- values(1)
| c <- values(2)
| } yield a+b+c
res13: List[java.lang.String] = List(ACC, ACD, ADC, ADD, BCC, BCD, BDC, BDD)
Here things get ugly and it will only work for sequences of three digits. Is there any way to achieve the same result for any sequence length?
The following suggestion does not use a fixed chain of generators in a for-comprehension. As you noticed, that approach ties you to a certain length of your cartesian product, so I don't think it is a good idea.
scala> def cartesianProduct[T](xss: List[List[T]]): List[List[T]] = xss match {
| case Nil => List(Nil)
| case h :: t => for(xh <- h; xt <- cartesianProduct(t)) yield xh :: xt
| }
cartesianProduct: [T](xss: List[List[T]])List[List[T]]
scala> val conversion = Map('0' -> List("A", "B"), '1' -> List("C", "D"))
conversion: scala.collection.immutable.Map[Char,List[java.lang.String]] = Map(0 -> List(A, B), 1 -> List(C, D))
scala> cartesianProduct("01".map(conversion).toList)
res9: List[List[java.lang.String]] = List(List(A, C), List(A, D), List(B, C), List(B, D))
Why not tail-recursive?
Note that the above recursive function is not tail-recursive. This isn't a problem in practice: xss will be short unless it contains a lot of singleton lists, because the size of the result grows exponentially with the number of non-singleton elements of xss.
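If the explicit recursion is a concern, the same product can be expressed as a fold over the outer list. This is a sketch of an alternative formulation, not the answer's original code; Cartesian and product are hypothetical names.

```scala
object Cartesian {
  // Starts from the product of zero lists (a single empty combination) and,
  // going from the last inner list to the first, prepends each of its elements
  // to every combination built so far.
  def product[T](xss: List[List[T]]): List[List[T]] =
    xss.foldRight(List(List.empty[T])) { (xs, acc) =>
      for (x <- xs; rest <- acc) yield x :: rest
    }
}
```

Cartesian.product(List(List("A", "B"), List("C", "D"))) gives List(List(A, C), List(A, D), List(B, C), List(B, D)), the same result and order as cartesianProduct above.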
I could come up with this:
val conversion = Map('0' -> Seq("A", "B"), '1' -> Seq("C", "D"))
def permut(str: Seq[Char]): Seq[String] = str match {
case Seq() => Seq.empty
case Seq(c) => conversion(c)
case Seq(head, tail @ _*) =>
val t = permut(tail)
conversion(head).flatMap(pre => t.map(pre + _))
}
permut("011")
I just did that as follows and it works
def cross(a:IndexedSeq[Tree], b:IndexedSeq[Tree]) = {
a.map (p => b.map( o => (p,o))).flatten
}
Ignore the Tree type I am dealing with; this works for arbitrary collections too.
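A generic version of the same idea might look like the sketch below, where the type parameters A and B replace the Tree type; CrossDemo is a hypothetical name.

```scala
object CrossDemo {
  // Pairs every element of `as` with every element of `bs`,
  // in the order of `as` first, then `bs`.
  def cross[A, B](as: Seq[A], bs: Seq[B]): Seq[(A, B)] =
    for (a <- as; b <- bs) yield (a, b)
}
```

The for-comprehension desugars to as.flatMap(a => bs.map(b => (a, b))), which is equivalent to the map followed by flatten above.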

Expand a Set[Set[String]] into Cartesian Product in Scala

I have the following set of sets. I don't know ahead of time how long it will be.
val sets = Set(Set("a","b","c"), Set("1","2"), Set("S","T"))
I would like to expand it into a cartesian product:
Set("a&1&S", "a&1&T", "a&2&S", ..., "c&2&T")
How would you do that?
I think I figured out how to do that.
def combine(acc:Set[String], set:Set[String]) = for (a <- acc; s <- set) yield {
a + "&" + s
}
val expanded = sets.reduceLeft(combine)
expanded: scala.collection.immutable.Set[java.lang.String] = Set(b&2&T, a&1&S,
a&1&T, b&1&S, b&1&T, c&1&T, a&2&T, c&1&S, c&2&T, a&2&S, c&2&S, b&2&S)
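One caveat: reduceLeft throws an exception when the outer collection is empty, and iterating a Set of Sets does not guarantee the order in which the parts are joined. A sketch that addresses both, assuming the parts are passed as an ordered sequence of sets (SafeCombine, expand and the sep parameter are hypothetical):

```scala
object SafeCombine {
  // Joins one element from each set with `sep`, in the order the sets are given.
  // Returns an empty set when no sets are supplied.
  def expand(sets: Seq[Set[String]], sep: String = "&"): Set[String] =
    sets.reduceLeftOption { (acc, set) =>
      for (a <- acc; s <- set) yield a + sep + s
    }.getOrElse(Set.empty[String])
}
```

SafeCombine.expand(Seq(Set("a", "b"), Set("1", "2"))) gives the set containing a&1, a&2, b&1 and b&2.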
Nice question. Here's one way:
scala> val seqs = Seq(Seq("a","b","c"), Seq("1","2"), Seq("S","T"))
seqs: Seq[Seq[java.lang.String]] = List(List(a, b, c), List(1, 2), List(S, T))
scala> val seqs2 = seqs.map(_.map(Seq(_)))
seqs2: Seq[Seq[Seq[java.lang.String]]] = List(List(List(a), List(b), List(c)), List(List(1), List(2)), List(List(S), List(T)))
scala> val combined = seqs2.reduceLeft((xs, ys) => for {x <- xs; y <- ys} yield x ++ y)
combined: Seq[Seq[java.lang.String]] = List(List(a, 1, S), List(a, 1, T), List(a, 2, S), List(a, 2, T), List(b, 1, S), List(b, 1, T), List(b, 2, S), List(b, 2, T), List(c, 1, S), List(c, 1, T), List(c, 2, S), List(c, 2, T))
scala> combined.map(_.mkString("&"))
res11: Seq[String] = List(a&1&S, a&1&T, a&2&S, a&2&T, b&1&S, b&1&T, b&2&S, b&2&T, c&1&S, c&1&T, c&2&S, c&2&T)
Came after the battle ;) but here is another one:
sets.reduceLeft((s0,s1)=>s0.flatMap(a=>s1.map(a+"&"+_)))
Expanding on dsg's answer, you can write it more clearly (I think) this way, if you don't mind the curried function:
def combine[A](f: A => A => A)(xs:Iterable[Iterable[A]]) =
xs reduceLeft { (x, y) => x.view flatMap { y map f(_) } }
Another alternative (slightly longer, but much more readable):
def combine[A](f: (A, A) => A)(xs:Iterable[Iterable[A]]) =
xs reduceLeft { (x, y) => for (a <- x.view; b <- y) yield f(a, b) }
Usage:
combine[String](a => b => a + "&" + b)(sets) // curried version
combine[String](_ + "&" + _)(sets) // uncurried version
Expanding on @Patrick's answer.
Now it's more general and lazier:
def combine[A](f:(A, A) => A)(xs:Iterable[Iterable[A]]) =
xs.reduceLeft { (x, y) => x.view.flatMap {a => y.map(f(a, _)) } }
Having it be lazy allows you to save space, since you don't store the exponentially many items in the expanded set; instead, you generate them on the fly. But, if you actually want the full set, you can still get it like so:
val expanded = combine{(x:String, y:String) => x + "&" + y}(sets).toSet