I want to create a Scala sequence comprising tuples. The input is a text file like this:
A
B
C
D
E
I'm looking for an elegant way to construct "lagged" tuples like this:
(A, B), (B, C), (C, D), (D, E)
The easiest way to do this is with tail and zip:
val xs = Seq('A', 'B', 'C', 'D', 'E')
xs zip xs.tail
If efficiency is a concern (i.e. you don't want to create an extra intermediate sequence by calling tail, and the Seq you use is not a List, so tail takes O(n)), then you can use a view:
xs zip xs.view.tail
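One edge case with the snippets above: tail throws on an empty sequence, so they fail for empty input. A minimal sketch that avoids this by using drop(1) instead (the object and method names here are only illustrative):

```scala
object Lagged {
  // drop(1) returns an empty sequence for empty input, whereas tail throws
  def pairs[A](xs: Seq[A]): Seq[(A, A)] = xs.zip(xs.drop(1))
}

// Lagged.pairs(Seq('A', 'B', 'C', 'D', 'E'))
// gives (A,B), (B,C), (C,D), (D,E); empty and one-element inputs give no pairs
```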
I'm not quite sure how elegant it is, but this will work for at least all lists of more than 1 element:
val l = List('A,'B,'C,'D,'E,'F)
val tupled = l.sliding(2).map{case x :: y :: Nil => (x,y)}
tupled.toList
// res8: List[(Symbol, Symbol)] = List(('A,'B), ('B,'C), ('C,'D), ('D,'E), ('E,'F))
If you want something more elegant than that, I'd advise you look at Shapeless for nice ways to convert between lists and tuples.
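If the MatchError on short lists is a concern, the sliding approach can probably be made safe by swapping map for collect, which simply skips windows that don't have exactly two elements. A small sketch (the names are illustrative, not from the answer above):

```scala
object SlidingPairs {
  // collect keeps only windows with exactly two elements, so a one-element
  // list yields no pairs instead of throwing a MatchError
  def pairs[A](xs: List[A]): List[(A, A)] =
    xs.sliding(2).collect { case List(x, y) => (x, y) }.toList
}
```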
Given two Lists, the element-wise product of the lists and the sum of the resulting list can be calculated in the following way:
(List1 , List2).zipped.foldLeft(0.0) { case (a, (b, c)) => a + b * c }
How can I perform this operation for two iterators in Scala in an optimal and fast way?
(iterator1 zip iterator2).foldLeft(0.0) { case (a, (b, c)) => a + b * c }
is okay I suppose. If you want to squeeze the last bit of performance out of it, use arrays and a while loop.
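For reference, the arrays-and-while-loop version might look roughly like this. It is only a sketch: it assumes the data has already been materialised into Array[Double], and it stops at the shorter array to mirror what zip does; the object and method names are made up for illustration.

```scala
object DotProduct {
  // Primitive arrays plus a while loop avoid boxing and tuple allocation
  def dot(xs: Array[Double], ys: Array[Double]): Double = {
    val n = math.min(xs.length, ys.length) // assumption: truncate like zip does
    var i = 0
    var acc = 0.0
    while (i < n) {
      acc += xs(i) * ys(i)
      i += 1
    }
    acc
  }
}
```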
You can use this piece of code that should work with any collection and any numeric type.
It tries to be efficient by doing everything in one traversal. However, as @Martijn said, if you need the most efficient solution then just use plain Arrays of a primitive type like Int or Double and a while loop.
def dotProduct[N : Numeric](l1: IterableOnce[N], l2: IterableOnce[N]): N =
  l1.iterator.zip(l2).map {
    case (x, y) => Numeric[N].times(x, y)
  }.sum
(note: this code is intended for 2.13+, for 2.12- you may use Iterable instead of IterableOnce)
I have a paired RDD that looks like
(a1, (a2, a3))
(b1, (b2, b3))
...
I want to flatten the values to obtain
(a1, a2, a3)
(b1, b2, b3)
...
Currently I'm doing
rddData.map(x => (x._1, x._2._1, x._2._2))
Is there a better way of performing the conversion? The above solution gets ugly if value contains many elements instead of just 2.
When I'm trying to avoid all the ugly underscore number stuff that comes with tuple manipulation I like to use case notation:
rddData.map { case (a, (b, c)) => (a, b, c) }
You can also give your variables meaningful names to make your code self-documenting, and the use of curly braces means you have fewer nested parentheses.
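To illustrate how this scales to deeper nesting, here is a small sketch that uses a plain Seq as a stand-in for the RDD (the map call has the same shape; the data and field names are invented for the example):

```scala
object FlattenPairs {
  // A plain Seq stands in for the RDD here
  val rows: Seq[(String, ((Int, Int), Double))] =
    Seq(("a", ((1, 2), 3.0)), ("b", ((4, 5), 6.0)))

  // The pattern mirrors the nesting, and each part gets a readable name
  val flat: Seq[(String, Int, Int, Double)] =
    rows.map { case (key, ((x, y), score)) => (key, x, y, score) }
}
```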
EDIT:
The map { case ... } pattern is pretty compact and can be used for surprisingly deeply nested tuples as long as the structure is known at compile time. If you absolutely, positively cannot know the structure of the tuple at compile time, then here is some hacky, slow code that, probably, can flatten any arbitrarily nested tuple... as long as there are no more than 22 elements in total (Scala's largest tuple class is Tuple22). It works by recursively converting each element of the tuple to a list, flatMap-ing it to a single list, then using scary reflection to convert the list back into a tuple as seen here.
// Recursively flatten a (possibly nested) tuple into a flat list of its elements
def flatten(b: Product): List[Any] = {
  b.productIterator.toList.flatMap {
    case x: Product => flatten(x)
    case y: Any => List(y)
  }
}
// Build a TupleN from a list via reflection; fails for sizes outside 1 to 22
def toTuple(as: List[Any]): Product = {
  val tupleClass = Class.forName("scala.Tuple" + as.size)
  tupleClass.getConstructors.apply(0).newInstance(as.map(_.asInstanceOf[AnyRef]): _*).asInstanceOf[Product]
}
rddData.map(t => toTuple(flatten(t)))
There is no better way. The 1st answer is equivalent to:
val abc2 = xyz.map{ case (k, v) => (k, v._1, v._2) }
which is equivalent to your own example.
If I want to see if each element in a list corresponds correctly to the element at the same index in another list, could I use forall to do this? For example something like
val p=List(2,4,6)
val q=List(1,2,3)
p.forall(x=>x==q(x)/2)
I understand that the x isn't an index of q, and that's the problem I'm having. Is there any way to make this work?
The most idiomatic way to handle this situation would be to zip the two lists:
scala> p.zip(q).forall { case (x, y) => x == y * 2 }
res0: Boolean = true
You could also use zipped, which can be slightly more efficient in some situations, as well as letting you be a bit more concise (or maybe just obfuscated):
scala> (p, q).zipped.forall(_ == _ * 2)
res1: Boolean = true
Note that both of these solutions will silently ignore extra elements if the lists don't have the same length, which may or may not be what you want.
Your best bet is probably to use zip:
p.zip(q).forall{case (fst, snd) => fst == snd * 2}
Sequences from the Scala collection library have a corresponds method which does exactly what you need:
p.corresponds(q)(_ == _ * 2)
It will return false if p and q are of different length.
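To make the length behaviour concrete, here is a short comparison of the approaches from the answers above. It assumes Scala 2.13+, where lazyZip is the replacement for the deprecated zipped; the object and value names are only illustrative.

```scala
object Pairwise {
  val p = List(2, 4, 6)
  val q = List(1, 2, 3)
  val longer = List(2, 4, 6, 8) // one extra element compared to q

  val viaZip = p.zip(q).forall { case (x, y) => x == y * 2 }
  val viaLazyZip = p.lazyZip(q).forall(_ == _ * 2)
  val viaCorresponds = p.corresponds(q)(_ == _ * 2)

  // zip silently drops the unmatched 8; corresponds also compares lengths
  val zipIgnoresExtra = longer.zip(q).forall { case (x, y) => x == y * 2 }
  val correspondsChecksLength = longer.corresponds(q)(_ == _ * 2)
}
```

Here the first four values are true and only correspondsChecksLength is false.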
Scala offers a List#flatten method for going from List[Option[A]] to List[A].
scala> val list = List(Some(10), None)
list: List[Option[Int]] = List(Some(10), None)
scala> list.flatten
res11: List[Int] = List(10)
I attempted to implement it in Haskell:
flatten :: [Maybe a] -> [a]
flatten xs = map g $ xs >>= f
f :: Maybe a -> [Maybe a]
f x = case x of Just _  -> [x]
                Nothing -> []
-- partial function!
g :: Maybe a -> a
g (Just x) = x
However I don't like the fact that g is a partial, i.e. non-total, function.
Is there a total way to write such flatten function?
Your flatten is the same as catMaybes, which is defined like this:
catMaybes :: [Maybe a] -> [a]
catMaybes ls = [x | Just x <- ls]
The special syntax Just x <- ls in a list comprehension means to draw an element from ls and discard it if it is not a Just. Otherwise assign x by pattern matching the value against Just x.
A slight modification of the code you have will do the trick:
flatten :: [Maybe a] -> [a]
flatten xs = xs >>= f
f :: Maybe a -> [a]
f x = case x of Just j  -> [j]
                Nothing -> []
If we extract the value inside of the Just constructor in f, we avoid g altogether.
Incidentally, f already exists as maybeToList and flatten is called catMaybes, both in Data.Maybe.
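For comparison with the Scala side of the question, the same idea can be sketched there too: Option#toList plays the role of maybeToList, and flatMap plays the role of >>=. This is just an illustration with made-up names, not something from the answers above.

```scala
object OptionFlatten {
  val list: List[Option[Int]] = List(Some(10), None, Some(3))

  val viaFlatten = list.flatten                       // the built-in from the question
  val viaToList  = list.flatMap(_.toList)             // analogue of xs >>= maybeToList
  val viaCollect = list.collect { case Some(x) => x } // analogue of [x | Just x <- ls]
}
```

All three produce List(10, 3), and none of them needs a partial function.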
One could quite easily write a simple recursive function which goes through a list and discards all the Nothings. Here's how I'd do it as a recursive function:
flatten :: [Maybe a] -> [a]
flatten [] = []
flatten (Nothing : xs) = flatten xs
flatten (Just x : xs) = x : flatten xs
However, it may be clearer to write it as a fold:
flatten :: [Maybe a] -> [a]
flatten = foldr go []
  where go Nothing xs = xs
        go (Just x) xs = x : xs
Or, we could use a blindingly elegant solution thanks to @user2407038, which I'd recommend playing around with in GHCi to work out the individual functions' jobs:
flatten :: [Maybe a] -> [a]
flatten = (=<<) (maybe [] (:[]))
And its faster, folded brother:
flatten :: [Maybe a] -> [a]
flatten = foldr (maybe id (:)) []
Your solution is halfway there. My suggestion is to rewrite your function f to use pattern matching (like my temporary go function), and enclose it in a where clause to keep relevant functions in one place. You've got to remember the differences in function syntax between Scala and Haskell.
The big problem you're having is you don't know the differences I've mentioned. Your g function can use pattern matching with multiple patterns:
g :: Maybe a -> [a]
g (Just x) = [x]
g Nothing = []
There you go: your g function is now what you call 'total', though more accurately, it would be said to have exhaustive patterns.
You can find more about function syntax here.
Suppose I'm doing something like the following:
val a = complicatedChainOfSteps("c")
val b = complicatedChainOfSteps("d")
I'm interested in writing code like the following (to reduce code and copy/paste errors):
val Seq(a, b) = Seq("c", "d").map(complicatedChainOfSteps(_))
but having the compiler ensure that the number of elements matches, so the following don't compile:
val Seq(a, b) = Seq("c", "d", "e").map(s => s + s)
val Seq(a, b) = Seq("c").map(s => s + s)
I know that using tuples instead to ensure that the number of elements matches works when performing multiple assignment (e.g., val (a, b) = ("c", "d")), but you cannot map over tuples (which makes sense because they have heterogeneous types).
I also know I can just define my own types for a sequence of 2 elements, a sequence of 3 elements, or whatever, but is there a convenient built-in way of doing this? If not, what's the simplest way to define a type that is a sequence of a specific number of elements?