I'm trying to learn Scala and tried to write a sequence comprehension that extracts unigrams, bigrams and trigrams from a sequence. E.g., [1,2,3,4] should be transformed to (not Scala syntax)
[1; _,1; _,_,1; 2; 1,2; _,1,2; 3; 2,3; 1,2,3; 4; 3,4; 2,3,4]
In Scala 2.8, I tried the following:
def trigrams(tokens : Seq[T]) = {
var t1 : Option[T] = None
var t2 : Option[T] = None
for (t3 <- tokens) {
yield t3
yield (t2,t3)
yield (t1,t2,Some(t3))
t1 = t2
t2 = t3
}
}
But this doesn't compile as, apparently, only one yield is allowed in a for-comprehension (no block statements either). Is there any other elegant way to get the same behavior, with only one pass over the data?
You can't have multiple yields in a for loop because for loops are syntactic sugar for the map (or flatMap) operations:
for (i <- collection) yield( func(i) )
translates into
collection map {i => func(i)}
Without a yield at all
for (i <- collection) func(i)
translates into
collection foreach {i => func(i)}
So the entire body of the for loop is turned into a single closure, and the presence of the yield keyword determines whether the function called on the collection is map or foreach (or flatMap). Because of this translation, the following are forbidden:
Using imperative statements next to a yield to determine what will be yielded.
Using multiple yields
(Not to mention that your proposed verison will return a List[Any] because the tuples and the 1-gram are all of different types. You probably want to get a List[List[Int]] instead)
Try the following instead (which put the n-grams in the order they appear):
val basis = List(1,2,3,4)
val slidingIterators = 1 to 4 map (basis sliding _)
for {onegram <- basis
ngram <- slidingIterators if ngram.hasNext}
yield (ngram.next)
or
val basis = List(1,2,3,4)
val slidingIterators = 1 to 4 map (basis sliding _)
val first=slidingIterators head
val buf=new ListBuffer[List[Int]]
while (first.hasNext)
for (i <- slidingIterators)
if (i.hasNext)
buf += i.next
If you prefer the n-grams to be in length order, try:
val basis = List(1,2,3,4)
1 to 4 flatMap { basis sliding _ toList }
scala> val basis = List(1, 2, 3, 4)
basis: List[Int] = List(1, 2, 3, 4)
scala> val nGrams = (basis sliding 1).toList ::: (basis sliding 2).toList ::: (basis sliding 3).toList
nGrams: List[List[Int]] = ...
scala> nGrams foreach (println _)
List(1)
List(2)
List(3)
List(4)
List(1, 2)
List(2, 3)
List(3, 4)
List(1, 2, 3)
List(2, 3, 4)
I guess I should have given this more thought.
def trigrams(tokens : Seq[T]) : Seq[(Option[T],Option[T],T)] = {
var t1 : Option[T] = None
var t2 : Option[T] = None
for (t3 <- tokens)
yield {
val tri = (t1,t2,t3)
t1 = t2
t2 = Some(t3)
tri
}
}
Then extract the unigrams and bigrams from the trigrams. But can anyone explain to me why 'multi-yields' are not permitted, and if there's any other way to achieve their effect?
val basis = List(1, 2, 3, 4)
val nGrams = basis.map(x => (x)) ::: (for (a <- basis; b <- basis) yield (a, b)) ::: (for (a <- basis; b <- basis; c <- basis) yield (a, b, c))
nGrams: List[Any] = ...
nGrams foreach (println(_))
1
2
3
4
(1,1)
(1,2)
(1,3)
(1,4)
(2,1)
(2,2)
(2,3)
(2,4)
(3,1)
(3,2)
(3,3)
(3,4)
(4,1)
(4,2)
(4,3)
(4,4)
(1,1,1)
(1,1,2)
(1,1,3)
(1,1,4)
(1,2,1)
(1,2,2)
(1,2,3)
(1,2,4)
(1,3,1)
(1,3,2)
(1,3,3)
(1,3,4)
(1,4,1)
(1,4,2)
(1,4,3)
(1,4,4)
(2,1,1)
(2,1,2)
(2,1,3)
(2,1,4)
(2,2,1)
(2,2,2)
(2,2,3)
(2,2,4)
(2,3,1)
(2,3,2)
(2,3,3)
(2,3,4)
(2,4,1)
(2,4,2)
(2,4,3)
(2,4,4)
(3,1,1)
(3,1,2)
(3,1,3)
(3,1,4)
(3,2,1)
(3,2,2)
(3,2,3)
(3,2,4)
(3,3,1)
(3,3,2)
(3,3,3)
(3,3,4)
(3,4,1)
(3,4,2)
(3,4,3)
(3,4,4)
(4,1,1)
(4,1,2)
(4,1,3)
(4,1,4)
(4,2,1)
(4,2,2)
(4,2,3)
(4,2,4)
(4,3,1)
(4,3,2)
(4,3,3)
(4,3,4)
(4,4,1)
(4,4,2)
(4,4,3)
(4,4,4)
You could try a functional version without assignments:
def trigrams[T](tokens : Seq[T]) = {
val s1 = tokens.map { Some(_) }
val s2 = None +: s1
val s3 = None +: s2
s1 zip s2 zip s3 map {
case ((t1, t2), t3) => (List(t1), List(t1, t2), List(t1, t2, t3))
}
}
Related
I have two lists, namely
val a = List(1,2,3)
val b = List(4,5)
I want to perform N to N bipartite mapping and want to get output
List((1,4),(1,5),(2,4),(2,5),(3,4),(3,5))
How can I do this?
Assuming that B = List(4,5), then you can use for comprehensions to achieve your goal:
val A = List(1,2,3)
val B = List(4,5)
val result = for(a <- A; b <- B) yield {
(a, b)
}
The output is
result:List[(Int, Int)] = List((1,4), (1,5), (2,4), (2,5), (3,4), (3,5))
Consider also
a.flatMap(x => b.map(y => (x,y)))
though not so concise as a for comprehension.
Given a List of Int and variable X of Int type . What is the best in Scala functional way to retain only those values in the List (starting from beginning of list) such that sum of list values is less than equal to variable.
This is pretty close to a one-liner:
def takeWhileLessThan(x: Int)(l: List[Int]): List[Int] =
l.scan(0)(_ + _).tail.zip(l).takeWhile(_._1 <= x).map(_._2)
Let's break that into smaller pieces.
First you use scan to create a list of cumulative sums. Here's how it works on a small example:
scala> List(1, 2, 3, 4).scan(0)(_ + _)
res0: List[Int] = List(0, 1, 3, 6, 10)
Note that the result includes the initial value, which is why we take the tail in our implementation.
scala> List(1, 2, 3, 4).scan(0)(_ + _).tail
res1: List[Int] = List(1, 3, 6, 10)
Now we zip the entire thing against the original list. Taking our example again, this looks like the following:
scala> List(1, 2, 3, 4).scan(0)(_ + _).tail.zip(List(1, 2, 3, 4))
res2: List[(Int, Int)] = List((1,1), (3,2), (6,3), (10,4))
Now we can use takeWhile to take as many values as we can from this list before the cumulative sum is greater than our target. Let's say our target is 5 in our example:
scala> res2.takeWhile(_._1 <= 5)
res3: List[(Int, Int)] = List((1,1), (3,2))
This is almost what we want—we just need to get rid of the cumulative sums:
scala> res2.takeWhile(_._1 <= 5).map(_._2)
res4: List[Int] = List(1, 2)
And we're done. It's worth noting that this isn't very efficient, since it computes the cumulative sums for the entire list, etc. The implementation could be optimized in various ways, but as it stands it's probably the simplest purely functional way to do this in Scala (and in most cases the performance won't be a problem, anyway).
In addition to Travis' answer (and for the sake of completeness), you can always implement these type of operations as a foldLeft:
def takeWhileLessThanOrEqualTo(maxSum: Int)(list: Seq[Int]): Seq[Int] = {
// Tuple3: the sum of elements so far; the accumulated list; have we went over x, or in other words are we finished yet
val startingState = (0, Seq.empty[Int], false)
val (_, accumulatedNumbers, _) = list.foldLeft(startingState) {
case ((sum, accumulator, finished), nextNumber) =>
if(!finished) {
if (sum + nextNumber > maxSum) (sum, accumulator, true) // We are over the sum limit, finish
else (sum + nextNumber, accumulator :+ nextNumber, false) // We are still under the limit, add it to the list and sum
} else (sum, accumulator, finished) // We are in a finished state, just keep iterating over the list
}
accumulatedNumbers
}
This only iterates over the list once, so it should be more efficient, but is more complicated and requires a bit of reading code to understand.
I will go with something like this, which is more functional and should be efficient.
def takeSumLessThan(x:Int,l:List[Int]): List[Int] = (x,l) match {
case (_ , List()) => List()
case (x, _) if x<= 0 => List()
case (x, lh :: lt) => lh :: takeSumLessThan(x-lh,lt)
}
Edit 1 : Adding tail recursion and implicit for shorter call notation
import scala.annotation.tailrec
implicit class MyList(l:List[Int]) {
def takeSumLessThan(x:Int) = {
#tailrec
def f(x:Int,l:List[Int],acc:List[Int]) : List[Int] = (x,l) match {
case (_,List()) => acc
case (x, _ ) if x <= 0 => acc
case (x, lh :: lt ) => f(x-lh,lt,acc ++ List(lh))
}
f(x,l,Nil)
}
}
Now you can use this like
List(1,2,3,4,5,6,7,8).takeSumLessThan(10)
Let's say I want to create all possible combinations of letters "a" and "b". For combinations of length 2 using for-comprehensions it will be:
for {
x <- Seq("a", "b")
y <- Seq("a", "b")
} yield x + y
And for combinations of length 3 it will be:
for {
x <- Seq("a", "b")
y <- Seq("a", "b")
z <- Seq("a", "b")
} yield x + y + z
Pretty similar. Is this possible to abstract this pattern and write generic function?
I can think of such signature:
def genericCombine[A,B](length: Int, elements: Seq[A])(reducer: Seq[A] => B): Seq[B]
How can I parametrize number of generators used in for comprehension?
This is more like permutations with replacement than combinations, and a recursive implementation is fairly straightforward:
def select(n: Int)(input: List[String]): List[String] =
if (n == 1) input else for {
c <- input
s <- select(n - 1)(input)
} yield c + s
Which works as expected:
scala> select(2)(List("a", "b"))
res0: List[String] = List(aa, ab, ba, bb)
scala> select(3)(List("a", "b"))
res1: List[String] = List(aaa, aab, aba, abb, baa, bab, bba, bbb)
(You should of course check for invalid input in a real application.)
This question discusses how to interleave two lists in an alternating fashion, i.e. intercalate them.
What is the inverse of "intercalate" called?
Is there an idiomatic way to implement this in Scala?
The topic is discussed on this Haskell IRC session.
Possibilities include "deintercalate", "extracalate", "ubercalate", "outercalate", and "chocolate" ;-)
Assuming we go for "extracalate", it can be implemented as a fold:
def extracalate[A](a: List[A]) =
a.foldRight((List[A](), List[A]())){ case (b, (a1,a2)) => (b :: a2, a1) }
For example:
val mary = List("Mary", "had", "a", "little", "lamb")
extracalate(mary)
//> (List(Mary, a, lamb),List(had, little)
Note that the original lists can only be reconstructed if either:
the input lists were the same length, or
the first list was 1 longer than the second list
The second case actually turns out to be useful for the geohashing algorithm, where the latitude bits and longitude bits are intercalated, but there may be an odd number of bits.
Note also that the definition of intercalate in the linked question is different from the definition in the Haskell libraries, which intersperses a list in between a list of lists!
Update: As for any fold, we supply a starting value and a function to apply to each value of the input list. This function modifies the starting value and passes it to the next step of the fold.
Here, we start with a pair of empty output lists: (List[A](), List[A]())
Then for each element in the input list, we add it onto the front of one of the output lists using cons ::. However, we also swap the order of the two output lists , each time the function is invoked; (a1, a2) becomes (b :: a2, a1). This divides the input list between the two output lists in alternating fashion. Because it's a right fold, we start at the end of the input list, which is necessary to get each output list in the correct order. Proceeding from the starting value to the final value, we would get:
([], [])
([lamb], [])
([little],[lamb])
([a, lamb],[little])
([had, little],[a, lamb])
([Mary, a, lamb],[had, little])
Also, using standard methods
val mary = List("Mary", "had", "a", "little", "lamb")
//> mary : List[String] = List(Mary, had, a, little, lamb)
val (f, s) = mary.zipWithIndex.partition(_._2 % 2 == 0)
//> f : List[(String, Int)] = List((Mary,0), (a,2), (lamb,4))
//| s : List[(String, Int)] = List((had,1), (little,3))
(f.unzip._1, s.unzip._1)
//> res0: (List[String], List[String]) = (List(Mary, a, lamb),List(had, little))
Not really recommending it, though, the fold will beat it hands down on performance
Skinning the cat another way
val g = mary.zipWithIndex.groupBy(_._2 % 2)
//> g : scala.collection.immutable.Map[Int,List[(String, Int)]] = Map(1 -> List
//| ((had,1), (little,3)), 0 -> List((Mary,0), (a,2), (lamb,4)))
(g(0).unzip._1, g(1).unzip._1)
//> res1: (List[String], List[String]) = (List(Mary, a, lamb),List(had, little))
Also going to be slow
I think it's inferior to #DNA's answer as it's more code and it requires passing through the list twice.
scala> list
res27: List[Int] = List(1, 2, 3, 4, 5)
scala> val first = list.zipWithIndex.filter( x => x._1 % 2 == 1).map(x => x._2)
first: List[Int] = List(0, 2, 4)
scala> val second = list.zipWithIndex.filter( x => x._1 % 2 == 0).map(x => x._2)
second: List[Int] = List(1, 3)
scala> (first, second)
res28: (List[Int], List[Int]) = (List(0, 2, 4),List(1, 3))
In ML, one can assign names for each element of a matched pattern:
fun findPair n nil = NONE
| findPair n (head as (n1, _))::rest =
if n = n1 then (SOME head) else (findPair n rest)
In this code, I defined an alias for the first pair of the list and matched the contents of the pair. Is there an equivalent construct in Scala?
You can do variable binding with the # symbol, e.g.:
scala> val wholeList # List(x, _*) = List(1,2,3)
wholeList: List[Int] = List(1, 2, 3)
x: Int = 1
I'm sure you'll get a more complete answer later as I'm not sure how to write it recursively like your example, but maybe this variation would work for you:
scala> val pairs = List((1, "a"), (2, "b"), (3, "c"))
pairs: List[(Int, String)] = List((1,a), (2,b), (3,c))
scala> val n = 2
n: Int = 2
scala> pairs find {e => e._1 == n}
res0: Option[(Int, String)] = Some((2,b))
OK, next attempt at direct translation. How about this?
scala> def findPair[A, B](n: A, p: List[Tuple2[A, B]]): Option[Tuple2[A, B]] = p match {
| case Nil => None
| case head::rest if head._1 == n => Some(head)
| case _::rest => findPair(n, rest)
| }
findPair: [A, B](n: A, p: List[(A, B)])Option[(A, B)]