I'm trying to calculate the vector product between two vector using the map and reduce functions.
Let's see what happens in the REPL of Scala:
First of all I define 2 vectors with same length
scala> val v1 = Array(1,4,5,2)
v1: Array[Int] = Array(1, 4, 5, 2)
scala> val v2 = Array (3,5,1,5)
v2: Array[Int] = Array(3, 5, 1, 5)
Now I create a new array vecZip using the zip function
scala> val vecZip = v1 zip v2
vecZip: Array[(Int, Int)] = Array((1,3), (4,5), (5,1), (2,5))
Now I'd like to apply the reduce method
(to do the product of each tuple) for each element of this array.
I thought this:
val vecToSum = vecZip.map(x=>(List(x).reduce(_*_)))
I want to get a list (vecToSum) where apply the reduce method to calculate the total result. However I get this error:
scala> val vecToSum = vecZip.map(x=>(List(x).reduce(_*_)))
<console>:10: error: value * is not a member of (Int, Int)
val vecToSum = vecZip.map(x=>(List(x).reduce(_*_)))
^
You just need to call map and multiply the tuples values with each other, like this:
val vecToSum = vecZip.map(x => x._1 * x._2)
vecToSum is a List of tuples, so x is a Tuple of (Int, Int). Therefore if you call List(x).reduce(...), you're creating a List with the only value being the tuple, so that's not really what you want.
What your code is actually trying to do is it creates a list of a single tuple element, and then tries to reduce it. It would never work this way, as there is nothing to reduce - there is already single element in a list - a tuple.
Instead you need to map your vecZip array elements (tuples) via multiplying their elements:
vecZip.map { case (x, y) => x * y }
You don't need to reduce here. Reducing an Array[(Int, Int)] would mean performing some associative binary operation on all tuples inside the array. Note that it could be performing the operation on the first couple of tuples, then on the result of that and the third tuple, then on the result of that and the fourth tuple etc. but also, due to associativity, it could perform the operation on first and second tuple, simultaneously on third and fourth tuple, and then on their results etc., which is nice for parallelization (and frameworks such as Spark rely on it heavily)).
For example you could sum all first elements and all second elements of each tuple:
val reduced = vecZip.reduce((pair1, pair2) => (pair1._1 + pair2._1, pair1._2 + pair2._2))
What you want however is to simply map each tuple into the product of its elements:
val vecToSum = vecZip.map { case (x, y) => x * y }
Note that I used the partial function (see that case over there) in order to perform pattern matching on the tuple; without the partial function it would look like this:
val vecToSum = vecZip.map(tuple => tuple._1 * tuple._2)
Does anyone know how to create a lazy iterator in scala?
For example, I want to iterate through instantiating each element. After passing, I want the instance to die / be removed from memory.
If I declare an iterator like so:
val xs = Iterator(
(0 to 10000).toArray,
(0 to 10).toArray,
(0 to 10000000000).toArray)
It creates the arrays when xs is declared. This can be proven like so:
def f(name: String) = {
val x = (0 to 10000).toArray
println("f: " + name)
x
}
val xs = Iterator(f("1"),f("2"),f("3"))
which prints:
scala> val xs = Iterator(f("1"),f("2"),f("3"))
f: 1
f: 2
f: 3
xs: Iterator[Array[Int]] = non-empty iterator
Anyone have any ideas?
Streams are not suitable because elements remain in memory.
Note: I am using an Array as an example, but I want it to work with any type.
Scala collections have a view method which produces a lazy equivalent of the collection. So instead of (0 to 10000).toArray, use (0 to 10000).view. This way, there will be no array created in the memory. See also https://stackoverflow.com/a/6996166/90874, https://stackoverflow.com/a/4799832/90874, https://stackoverflow.com/a/4511365/90874 etc.
Use one of Iterator factory methods which accepts call-by-name parameter.
For your first example you can do one of this:
val xs1 = Iterator.fill(3)((0 to 10000).toArray)
val xs2 = Iterator.tabulate(3)(_ => (0 to 10000).toArray)
val xs3 = Iterator.continually((0 to 10000).toArray).take(3)
Arrays won't be allocated until you need them.
In case you need different expressions for each element, you can create separate iterators and concatenate them:
val iter = Iterator.fill(1)(f("1")) ++
Iterator.fill(1)(f("2")) ++
Iterator.fill(1)(f("3"))
Simply, I have two lists and I need to extract the new elements added to one of them.
I have the following
val x = List(1,2,3)
val y = List(1,2,4)
val existing :List[Int]= x.map(xInstance => {
if (!y.exists(yInstance =>
yInstance == xInstance))
xInstance
})
Result :existing: List[AnyVal] = List((), (), 3)
I need to remove all other elements except the numbers with the minimum cost.
Pick a suitable data structure, and life becomes a lot easier.
scala> x.toSet -- y
res1: scala.collection.immutable.Set[Int] = Set(3)
Also beware that:
if (condition) expr1
Is shorthand for:
if (condition) expr1 else ()
Using the result of this, which will usually have the static type Any or AnyVal is almost always an error. It's only appropriate for side-effects:
if (condition) buffer += 1
if (condition) sys.error("boom!")
retronym's solution is okay IF you don't have repeated elements that and you don't care about the order. However you don't indicate that this is so.
Hence it's probably going to be most efficient to convert y to a set (not x). We'll only need to traverse the list once and will have fast O(log(n)) access to the set.
All you need is
x filterNot y.toSet
// res1: List[Int] = List(3)
edit:
also, there's a built-in method that is even easier:
x diff y
(I had a look at the implementation; it looks pretty efficient, using a HashMap to count ocurrences.)
The easy way is to use filter instead so there's nothing to remove;
val existing :List[Int] =
x.filter(xInstance => !y.exists(yInstance => yInstance == xInstance))
val existing = x.filter(d => !y.exists(_ == d))
Returns
existing: List[Int] = List(3)
In scala, what is a good way to loop through a linked list(scala.collection.mutable.LinkedList) of objects? For example, I want to have 'for' loop traverse through each object on the linked list and process it.
With foreach:
Welcome to Scala version 2.8.0.final (Java HotSpot(TM) Client VM, Java 1.6.0_21).
Type in expressions to have them evaluated.
Type :help for more information.
scala> val ll = scala.collection.mutable.LinkedList[Int](1,2,3)
ll: scala.collection.mutable.LinkedList[Int] = LinkedList(1, 2, 3)
scala> ll.foreach(i => println(i * 2))
2
4
6
or, if your processing of each object returns a new value, use map:
scala> ll.map(_ * 2)
res3: scala.collection.mutable.LinkedList[Int] = LinkedList(2, 4, 6)
Some people prefer for comprehensions instead of foreach and map. They look like this:
scala> for (i <- ll) println(i)
1
2
3
scala> for (i <- ll) yield i * 2
res5: scala.collection.mutable.LinkedList[Int] = LinkedList(2, 4, 6)
To expand on the previous answer...
for, foreach and map are all higher-order functions - they can all take a function as an argument, so starting here:
val list = List(1,2,3)
list.foreach(i => println(i * 2))
You have a number of ways that you can make the code more declarative in nature, and cleaner at the same time.
First, you don't really need to use the name - i - for each member of the collection, you can use _ as a placeholder instead:
list.foreach(println(_ * 2))
You can also separate the logic out into a distinct method, and continue to use placeholder syntax:
def printTimesTwo(i:Int) = println(i * 2)
list.foreach(printTimesTwo(_))
Even cleaner, just pass the raw function without specifying parameters (look ma, no placeholders!)
list.foreach(printTimesTwo)
And to take it to a logical conclusion, this can be made cleaner still by using infix syntax. Which I show here working with a standard library method. Note: you could even use a method imported from a java library, if you wanted:
list foreach println
This thinking extends to anonymous functions and partially-applied functions and also to the map operation:
// "2 *" creates an anonymous function that will double its one-and-only argument
list map { 2 * }
For-comprehensions aren't really very useful when working at this level, they just add boilerplate. But they do come into their own when working with deeper nested structures:
//a list of lists, print out all the numbers
val grid = List(List(1, 2, 3), List(4, 5, 6), List(7, 8, 9))
grid foreach { _ foreach println } //hmm, could get confusing
for(line <- grid; cell <- line) println(cell) //that's clearer
I didn't need the yield keyword there, as nothing is being returned. But if I wanted to get back a list of Strings (un-nested):
for(line <- grid; cell <- line) yield { cell.toString }
With lots of generators, you'll want to split them over multiple lines:
for {
listOfGrids <- someMasterCollection
grid <- listOfGrids
line <- grid
cell <- line
} yield {
cell.toString
}
I understand Ruby and Python's yield. What does Scala's yield do?
I think the accepted answer is great, but it seems many people have failed to grasp some fundamental points.
First, Scala's for comprehensions are equivalent to Haskell's do notation, and it is nothing more than a syntactic sugar for composition of multiple monadic operations. As this statement will most likely not help anyone who needs help, let's try again… :-)
Scala's for comprehensions is syntactic sugar for composition of multiple operations with map, flatMap and filter. Or foreach. Scala actually translates a for-expression into calls to those methods, so any class providing them, or a subset of them, can be used with for comprehensions.
First, let's talk about the translations. There are very simple rules:
This
for(x <- c1; y <- c2; z <-c3) {...}
is translated into
c1.foreach(x => c2.foreach(y => c3.foreach(z => {...})))
This
for(x <- c1; y <- c2; z <- c3) yield {...}
is translated into
c1.flatMap(x => c2.flatMap(y => c3.map(z => {...})))
This
for(x <- c; if cond) yield {...}
is translated on Scala 2.7 into
c.filter(x => cond).map(x => {...})
or, on Scala 2.8, into
c.withFilter(x => cond).map(x => {...})
with a fallback into the former if method withFilter is not available but filter is. Please see the section below for more information on this.
This
for(x <- c; y = ...) yield {...}
is translated into
c.map(x => (x, ...)).map((x,y) => {...})
When you look at very simple for comprehensions, the map/foreach alternatives look, indeed, better. Once you start composing them, though, you can easily get lost in parenthesis and nesting levels. When that happens, for comprehensions are usually much clearer.
I'll show one simple example, and intentionally omit any explanation. You can decide which syntax was easier to understand.
l.flatMap(sl => sl.filter(el => el > 0).map(el => el.toString.length))
or
for {
sl <- l
el <- sl
if el > 0
} yield el.toString.length
withFilter
Scala 2.8 introduced a method called withFilter, whose main difference is that, instead of returning a new, filtered, collection, it filters on-demand. The filter method has its behavior defined based on the strictness of the collection. To understand this better, let's take a look at some Scala 2.7 with List (strict) and Stream (non-strict):
scala> var found = false
found: Boolean = false
scala> List.range(1,10).filter(_ % 2 == 1 && !found).foreach(x => if (x == 5) found = true else println(x))
1
3
7
9
scala> found = false
found: Boolean = false
scala> Stream.range(1,10).filter(_ % 2 == 1 && !found).foreach(x => if (x == 5) found = true else println(x))
1
3
The difference happens because filter is immediately applied with List, returning a list of odds -- since found is false. Only then foreach is executed, but, by this time, changing found is meaningless, as filter has already executed.
In the case of Stream, the condition is not immediatelly applied. Instead, as each element is requested by foreach, filter tests the condition, which enables foreach to influence it through found. Just to make it clear, here is the equivalent for-comprehension code:
for (x <- List.range(1, 10); if x % 2 == 1 && !found)
if (x == 5) found = true else println(x)
for (x <- Stream.range(1, 10); if x % 2 == 1 && !found)
if (x == 5) found = true else println(x)
This caused many problems, because people expected the if to be considered on-demand, instead of being applied to the whole collection beforehand.
Scala 2.8 introduced withFilter, which is always non-strict, no matter the strictness of the collection. The following example shows List with both methods on Scala 2.8:
scala> var found = false
found: Boolean = false
scala> List.range(1,10).filter(_ % 2 == 1 && !found).foreach(x => if (x == 5) found = true else println(x))
1
3
7
9
scala> found = false
found: Boolean = false
scala> List.range(1,10).withFilter(_ % 2 == 1 && !found).foreach(x => if (x == 5) found = true else println(x))
1
3
This produces the result most people expect, without changing how filter behaves. As a side note, Range was changed from non-strict to strict between Scala 2.7 and Scala 2.8.
It is used in sequence comprehensions (like Python's list-comprehensions and generators, where you may use yield too).
It is applied in combination with for and writes a new element into the resulting sequence.
Simple example (from scala-lang)
/** Turn command line arguments to uppercase */
object Main {
def main(args: Array[String]) {
val res = for (a <- args) yield a.toUpperCase
println("Arguments: " + res.toString)
}
}
The corresponding expression in F# would be
[ for a in args -> a.toUpperCase ]
or
from a in args select a.toUpperCase
in Linq.
Ruby's yield has a different effect.
Yes, as Earwicker said, it's pretty much the equivalent to LINQ's select and has very little to do with Ruby's and Python's yield. Basically, where in C# you would write
from ... select ???
in Scala you have instead
for ... yield ???
It's also important to understand that for-comprehensions don't just work with sequences, but with any type which defines certain methods, just like LINQ:
If your type defines just map, it allows for-expressions consisting of a
single generator.
If it defines flatMap as well as map, it allows for-expressions consisting
of several generators.
If it defines foreach, it allows for-loops without yield (both with single and multiple generators).
If it defines filter, it allows for-filter expressions starting with an if
in the for expression.
Unless you get a better answer from a Scala user (which I'm not), here's my understanding.
It only appears as part of an expression beginning with for, which states how to generate a new list from an existing list.
Something like:
var doubled = for (n <- original) yield n * 2
So there's one output item for each input (although I believe there's a way of dropping duplicates).
This is quite different from the "imperative continuations" enabled by yield in other languages, where it provides a way to generate a list of any length, from some imperative code with almost any structure.
(If you're familiar with C#, it's closer to LINQ's select operator than it is to yield return).
Consider the following for-comprehension
val A = for (i <- Int.MinValue to Int.MaxValue; if i > 3) yield i
It may be helpful to read it out loud as follows
"For each integer i, if it is greater than 3, then yield (produce) i and add it to the list A."
In terms of mathematical set-builder notation, the above for-comprehension is analogous to
which may be read as
"For each integer , if it is greater than , then it is a member of the set ."
or alternatively as
" is the set of all integers , such that each is greater than ."
The keyword yield in Scala is simply syntactic sugar which can be easily replaced by a map, as Daniel Sobral already explained in detail.
On the other hand, yield is absolutely misleading if you are looking for generators (or continuations) similar to those in Python. See this SO thread for more information: What is the preferred way to implement 'yield' in Scala?
Yield is similar to for loop which has a buffer that we cannot see and for each increment, it keeps adding next item to the buffer. When the for loop finishes running, it would return the collection of all the yielded values. Yield can be used as simple arithmetic operators or even in combination with arrays.
Here are two simple examples for your better understanding
scala>for (i <- 1 to 5) yield i * 3
res: scala.collection.immutable.IndexedSeq[Int] = Vector(3, 6, 9, 12, 15)
scala> val nums = Seq(1,2,3)
nums: Seq[Int] = List(1, 2, 3)
scala> val letters = Seq('a', 'b', 'c')
letters: Seq[Char] = List(a, b, c)
scala> val res = for {
| n <- nums
| c <- letters
| } yield (n, c)
res: Seq[(Int, Char)] = List((1,a), (1,b), (1,c), (2,a), (2,b), (2,c), (3,a), (3,b), (3,c))
Hope this helps!!
val aList = List( 1,2,3,4,5 )
val res3 = for ( al <- aList if al > 3 ) yield al + 1
val res4 = aList.filter(_ > 3).map(_ + 1)
println( res3 )
println( res4 )
These two pieces of code are equivalent.
val res3 = for (al <- aList) yield al + 1 > 3
val res4 = aList.map( _+ 1 > 3 )
println( res3 )
println( res4 )
These two pieces of code are also equivalent.
Map is as flexible as yield and vice-versa.
val doubledNums = for (n <- nums) yield n * 2
val ucNames = for (name <- names) yield name.capitalize
Notice that both of those for-expressions use the yield keyword:
Using yield after for is the “secret sauce” that says, “I want to yield a new collection from the existing collection that I’m iterating over in the for-expression, using the algorithm shown.”
taken from here
According to the Scala documentation, it clearly says "yield a new collection from the existing collection".
Another Scala documentation says, "Scala offers a lightweight notation for expressing sequence comprehensions. Comprehensions have the form for (enums) yield e, where enums refers to a semicolon-separated list of enumerators. An enumerator is either a generator which introduces new variables, or it is a filter. "
yield is more flexible than map(), see example below
val aList = List( 1,2,3,4,5 )
val res3 = for ( al <- aList if al > 3 ) yield al + 1
val res4 = aList.map( _+ 1 > 3 )
println( res3 )
println( res4 )
yield will print result like: List(5, 6), which is good
while map() will return result like: List(false, false, true, true, true), which probably is not what you intend.