Scala AST: avoid traversal of Tree children

I am working on a scalac plugin in which I traverse the AST. Right now I use a foreach loop over unit.body, which I get from the passed Global value. The problem is that, because this foreach traversal is recursive, I visit each Tree and also all of its children. I do not want to visit the children, since I am using pattern matching to match function calls.
So, for example, if I traverse the statement:
x.do(arg1)(arg2)(arg3)
I will get the following things in my traversal:
1. x.do(arg1)(arg2)(arg3)
2. x.do(arg1)(arg2)
3. x.do(arg1)
4. x.do
5. x
6. do
7. arg1
8. arg2
9. arg3
Here I listed the Tree objects in order of traversal. If I use the Apply case class to match against each of these, I get a match for 1, 2, and 3, while I really only want 1. I thought about a context-sensitive solution (such as checking against the previously traversed Tree), but it is not consistent enough for my purposes. I am also not looking for specific methods, which would make matching easier, but for method calls in general, and I cannot allow these children to be counted.
Is there a better pattern I can match function calls with when writing a scalac plugin, or a better way to do the traversal, so that I do not run into this recursion issue?

I found the answer myself:
Instead of using the intuitive
for (tree <- treeObject), where treeObject: Tree
you can use the children member of each Tree object and do the following:
def func(treeObject: Tree): Unit = {
  for (tree <- treeObject.children) {
    if (/* your condition for further traversal */)
      func(tree) // RECURSE into this child
    else
      someValue // do your work on this child and do not descend further
  }
}
With this approach you can set a stopping condition for the traversal. Say you only want to look at each method invocation. A method invocation can be matched using the Apply case class, so you check whether the current tree matches that pattern. If it does, you do your work on it and return. If it does not, you traverse the Tree further by recursing, to find possible method invocations further "down".
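The stopping condition described above can be sketched without the compiler on the classpath. The block below is only an illustration: Tree, Ident, Select, Apply and Block are hypothetical stand-ins written for this sketch (they mimic the shape of the scalac classes but are not the compiler API), and outermostCalls is an assumed helper name. Note that it deliberately does not look for calls nested inside the arguments of a matched call.

```scala
object OutermostCallsSketch {
  // Hypothetical mini-AST standing in for scalac's Tree; not the compiler API.
  sealed trait Tree { def children: List[Tree] }
  final case class Ident(name: String) extends Tree {
    def children: List[Tree] = Nil
  }
  final case class Select(qualifier: Tree, name: String) extends Tree {
    def children: List[Tree] = List(qualifier)
  }
  final case class Apply(fun: Tree, args: List[Tree]) extends Tree {
    def children: List[Tree] = fun :: args
  }
  final case class Block(stats: List[Tree]) extends Tree {
    def children: List[Tree] = stats
  }

  // Collect only the outermost Apply nodes: once a node matches,
  // return it and do not descend into its children.
  def outermostCalls(tree: Tree): List[Tree] = tree match {
    case call: Apply => List(call)
    case other       => other.children.flatMap(outermostCalls)
  }

  // x.do(arg1)(arg2)(arg3), modelled as three nested Apply nodes.
  val call: Tree =
    Apply(
      Apply(
        Apply(Select(Ident("x"), "do"), List(Ident("arg1"))),
        List(Ident("arg2"))),
      List(Ident("arg3")))

  val body: Tree = Block(List(call, Ident("y")))

  // Only the full call is reported, not the two inner Apply nodes.
  val found: List[Tree] = outermostCalls(body)
}
```

With real compiler trees the same shape should carry over, since Tree exposes children and Apply can be used as a pattern, but that part is untested here.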

Why is the map function inherently parallel?

I was reading the following presentation:
http://www.idt.mdh.se/kurser/DVA201/slides/parallel-4up.pdf
and the author claims that the map function is built very well for parallelism (specifically he supports his claim on page 3 or slides 9 and 10).
If one were given the problem of increasing each value of a list by 1, I can see how looping through the list imperatively would require an index value to change and hence cause potential race condition problems. But I'm curious how the map function better allows a programmer to successfully code in parallel.
Is it due to the way map is recursively defined? So each function call can be thrown to a different thread?
I'm hoping someone can provide some specifics, thanks!
The map function applies the same pure function to n elements in a collection and aggregates the results. It doesn't matter the order in which you apply the function to the members of the collection because by definition the return value of the function is purely dependent upon the input.
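As a rough illustration of that independence, here is a minimal sketch that runs each application of a pure function in its own Future and then reassembles the results in the original order. parMap and demo are hypothetical names chosen for this sketch, the 10 second timeout is arbitrary, and this is not how the standard library implements map.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ParallelMapSketch {
  // Hypothetical helper: each f(x) runs in its own Future. Because f is
  // pure, the order in which the Futures finish does not affect the result.
  def parMap[A, B](xs: List[A])(f: A => B): List[B] =
    Await.result(Future.traverse(xs)(x => Future(f(x))), 10.seconds)

  // Ordinary sequential map, for comparison.
  def sequential(): List[Int] = List(1, 2, 3, 4).map(_ + 1)

  // The same computation with every element handled concurrently.
  def demo(): List[Int] = parMap(List(1, 2, 3, 4))(_ + 1)
}
```

Future.traverse keeps the results in input order, so both versions produce the same list even though the individual applications may complete in any order.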
The others already explained that the standard map implementation isn't parallel.
But in Scala, since you tagged it, you can get the parallel version as simply as
val list = ... // some list
list.par.map(x => ...) // instead of list.map(x => ...)
See also Parallel Collections Overview and documentation for ParIterable and other types in the scala.collection.parallel package.
You can find the implementation of the parallel map in https://github.com/scala/scala/blob/v2.12.1/src/library/scala/collection/parallel/ParIterableLike.scala, if you want (look for def map and class Map). It requires very non-trivial infrastructure and certainly isn't just taking the recursive definition of sequential map and parallelizing it.
If one had defined map via a loop how would that break down?
The slides give F# parallel arrays as the example at the end. At https://github.com/fsharp/fsharp/blob/master/src/fsharp/FSharp.Core/array.fs#L266 you can see that the non-parallel implementation there is a loop:
let inline map (mapping: 'T -> 'U) (array:'T[]) =
    checkNonNull "array" array
    let res : 'U[] = Microsoft.FSharp.Primitives.Basics.Array.zeroCreateUnchecked array.Length
    for i = 0 to res.Length-1 do
        res.[i] <- mapping array.[i]
    res

Traversing Scalaz Tree

I'm trying to understand the scalaz tree structure and am having some difficulty!
First I've defined a tree:
val tree: Tree[Int] =
  1.node(
    2.leaf,
    3.node(
      4.leaf,
      5.leaf))
So far using TreeLoc I've worked out how to find the first element that matches some predicate. E.g. to find the first node where the value is 3:
tree.loc.find(x => x.getLabel == 3)
My next challenge was to try and find all nodes that match some predicate. For example I would like to find all leaf nodes (which should be pretty easy using TreeLoc and isLeaf). Unfortunately I can't for the life of me work out how to walk the tree to do this.
Edit: Sorry I don't think I was clear enough in my original question. To be clear I want to walk the tree in such a way that I have information about the Node available to me. Flatten, foldRight etc just allow me to operate on [Int] whereas I want to be able to operate on Tree[Int] (or TreeLoc[Int]).
Looking at how find is implemented in scalaz, I suggest implementing something like:
implicit class FilterTreeLoc[A](treeLoc: TreeLoc[A]) {
  def filter(p: TreeLoc[A] => Boolean): Stream[TreeLoc[A]] =
    Cobind[TreeLoc].cojoin(treeLoc).tree.flatten.filter(p)
}
It behaves like find, but it gives you back a Stream[TreeLoc[A]] instead of an Option[TreeLoc[A]].
You can use this as tree.loc.filter(_.isLeaf) and tree.loc.filter(_.getLabel == 3).
Note: the implicit class can obviously be avoided if you prefer to declare this as a plain method instead.
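The idea behind cojoin, namely turning a tree of values into a collection of positions that each still know their own subtree, can be sketched without scalaz. Node and subtrees below are hypothetical stand-ins written for this illustration, not the scalaz Tree or TreeLoc API, and a subtree carries less context than a real TreeLoc (no parent or siblings).

```scala
object SubtreeFilterSketch {
  // Hypothetical minimal rose tree, standing in for scalaz.Tree.
  final case class Node[A](label: A, children: List[Node[A]] = Nil) {
    def isLeaf: Boolean = children.isEmpty

    // Every subtree in pre-order, including this node itself.
    // This plays the role of cojoin followed by flatten.
    def subtrees: List[Node[A]] = this :: children.flatMap(_.subtrees)
  }

  // Same shape as the tree in the question.
  val tree: Node[Int] =
    Node(1, List(Node(2), Node(3, List(Node(4), Node(5)))))

  // Filter on whole nodes rather than on bare Int labels.
  val leaves: List[Int] = tree.subtrees.filter(_.isLeaf).map(_.label)
  val threes: List[Node[Int]] = tree.subtrees.filter(_.label == 3)
}
```

Because the predicate receives a node rather than a label, structural questions such as isLeaf can be answered while filtering.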

When should we not use "for loops" in scala?

Twitter's Effective Scala says:
"for provides both succinct and natural expression for looping and aggregation. It is especially useful when flattening many sequences. The syntax of for belies the underlying mechanism as it allocates and dispatches closures. This can lead to both unexpected costs and semantics; for example
for (item <- container) {
  if (item != 2)
    return
}
may cause a runtime error if the container delays computation, making the return nonlocal!
For these reasons, it is often preferrable to call foreach, flatMap, map, and filter directly — but do use fors when they clarify."
I don't understand why there can be runtime errors here.
The Twitter manual is warning that the code inside of the for may be bound inside a closure, which is (in this case) a function running inside of an execution environment that is built to capture the local environment present at the call site. You can read more about closures here - http://en.wikipedia.org/wiki/Closure_%28computer_programming%29.
So the for construct may package all of the code inside the loop into a separate function, bind the particular item and the local environment to the closure (so that all references to variables outside of the loop are still valid inside the function), and then execute that function somewhere else, i.e. in another, hidden call frame.
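This becomes a problem for your return, which is supposed to immediately exit the surrounding method. Inside a closure, Scala implements such a nonlocal return by throwing a scala.runtime.NonLocalReturnControl exception that the surrounding method catches. But if the container delays computation and the closure only runs after the surrounding method has already finished, nothing is left to catch that exception, and it surfaces as a runtime error.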
Simon has a very nice answer regarding the return problem. I would like to add that, in general, using for loops is simply bad Scala style when foreach, map, etc. would do. The exception is when you want nested foreach-, map- or flatMap-like behaviour; in that case a for loop, or more precisely a "for comprehension", is appropriate.
E.g.
// Good, better than a for loop
myList.foreach(println)
// OK, but the for comprehension might be easier to understand
(1 to N).flatMap(i => (1 to M).map((i, _)))
// Equivalent to the above, and some say easier to read
for {
  i <- (1 to N)
  j <- (1 to M)
} yield (i, j)

Why `Source.fromFile(...).getLines()` is empty after I've iterated over it?

It was quite a surprise for me that (line <- lines) is so devastating! It completely exhausts the lines iterator. So running the following snippet will make size = 0:
val lines = Source.fromFile(args(0)).getLines()
var cnt = 0
for (line <- lines) {
  cnt = readLines(line, cnt)
}
val size = lines.size
Is it a normal Scala practice to have well-hidden side-effects like this?
Source.getLines() returns an iterator. For every iterator, if you invoke a bulk operation such as foreach above, or map, take, toList, etc., then the iterator is no longer in a usable state.
That is the contract for Iterators and, more generally, classes that inherit TraversableOnce.
It is of particular importance to note that, unless stated otherwise, one should never use an iterator after calling a method on it. The two most important exceptions are also the sole abstract methods: next and hasNext.
This is not the case for classes that inherit Traversable -- for those you can invoke the bulk traversal operations as many times as you want.
Source.getLines() returns an Iterator, and walking through an Iterator will mutate it. This is made quite clear in the Scala documentation:
An iterator is mutable: most operations on it change its state. While it is often used to iterate through the elements of a collection, it can also be used without being backed by any collection (see constructors on the companion object).
It is of particular importance to note that, unless stated otherwise, one should never use an iterator after calling a method on it. The two most important exceptions are also the sole abstract methods: next and hasNext.
Using for notation is just syntactic sugar for calling map, flatMap and foreach methods on the Iterator, which again have quite clear documentation stating not to use the iterator:
Reuse: After calling this method, one should discard the iterator it was called on, and use only the iterator that was returned. Using the old iterator is undefined, subject to change, and may result in changes to the new iterator as well.
Scala generally aims to be a 'pragmatic' language - mutation and side effects are allowed for performance and inter-operability reasons, although not encouraged. To call it 'well-hidden' is, however, something of a stretch.
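A minimal sketch of both the behaviour and one common way around it, assuming the input fits in memory: materialize the iterator once with toList and work with the resulting List. freshLines is a hypothetical stand-in for Source.fromFile(...).getLines(), so that the example runs without a file.

```scala
object IteratorReuseSketch {
  // Hypothetical stand-in for Source.fromFile(...).getLines(),
  // which likewise returns an Iterator[String].
  def freshLines(): Iterator[String] = Iterator("alpha", "beta", "gamma")

  // Walking the iterator with a for loop consumes it.
  val consumed: Iterator[String] = freshLines()
  var seen = 0
  for (line <- consumed) seen += 1
  // hasNext is one of the few methods that is still safe to call afterwards.
  val hasMore: Boolean = consumed.hasNext

  // Materializing first gives a List that can be traversed repeatedly.
  val lines: List[String] = freshLines().toList
  var cnt = 0
  for (line <- lines) cnt += line.length
  val size: Int = lines.size
}
```

For large files, counting inside the single pass is usually the better choice, since toList holds every line in memory.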

Examples of using some Scala Option methods

I have read the blog post recommended to me here. Now I wonder what some of those methods are useful for. Can you show examples of using forall (as opposed to foreach) and toList of Option?
map: Allows you to transform a value "inside" an Option, as you probably already know for Lists. This operation makes Option a functor (you can say "endofunctor" if you want to scare your colleagues)
flatMap: Option is actually a monad, and flatMap makes it one (together with something like a constructor for a single value). This method can be used if you have a function which turns a value into an Option, but the value you have is already "wrapped" in an Option, so flatMap saves you the unwrapping before applying the function. E.g. if you have an Option[Map[K,V]], you can write mapOption.flatMap(_.get(key)). If you used a simple map here, you would get an Option[Option[V]], but with flatMap you get an Option[V]. This method is cooler than you might think, as it allows you to chain functions together in a very flexible way (which is one reason why Haskell loves monads).
flatten: If you have a value of type Option[Option[T]], flatten turns it into an Option[T]. It is the same as flatMap(identity(_)).
orElse: If you have several alternatives wrapped in Options, and you want the first one that actually holds a value, you can chain these alternatives with orElse: steakOption.orElse(hamburgerOption).orElse(saladOption)
getOrElse: Get the value out of the Option, but specify a default value if it is empty, e.g. nameOption.getOrElse("unknown").
foreach: Do something with the value inside, if it exists.
isDefined, isEmpty: Determine if this Option holds a value.
forall, exists: Tests if a given predicate holds for the value. forall is the same as option.map(test(_)).getOrElse(true), exists is the same, just with false as default.
toList: Surprise, it converts the Option to a List.
Many of the methods on Option may be there more for the sake of uniformity (with collections) rather than for their usefulness, as they are all very small functions and so do not spare much effort, yet they serve a purpose, and their meanings are clear once you are familiar with the collection framework (as is often said, Option is like a list which cannot have more than one element).
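The methods listed above can be tried out with a few small values. This is only a sketch: the settings map and all value names are made up for illustration.

```scala
object OptionMethodsSketch {
  // Hypothetical data: an optional map of settings.
  val settings: Option[Map[String, Int]] = Some(Map("port" -> 8080))

  // map leaves a nested Option, flatMap does not.
  val nested: Option[Option[Int]] = settings.map(_.get("port"))
  val port: Option[Int] = settings.flatMap(_.get("port"))

  // flatten removes one level of nesting.
  val flattened: Option[Int] = nested.flatten

  // getOrElse supplies a default when the key is missing.
  val host: Int = settings.flatMap(_.get("host")).getOrElse(-1)

  // orElse picks the first alternative that holds a value.
  val steak: Option[String] = None
  val meal: Option[String] = steak.orElse(Some("hamburger")).orElse(Some("salad"))

  // On an empty Option, forall is true and exists is false.
  val none: Option[Int] = None
  val emptyForall: Boolean = none.forall(_ > 0)
  val emptyExists: Boolean = none.exists(_ > 0)

  // toList yields a list with zero or one element.
  val one: List[Int] = Some(1).toList
  val zero: List[Int] = none.toList
}
```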
forall checks a property of the value inside an option. If there is no value, the check passes. For example, if in a car rental you are allowed one additionalDriver: Option[Person], you can do
additionalDriver.forall(_.hasDrivingLicense)
exactly the same thing that you would do if several additional drivers were allowed and you had a list.
toList may be a useful conversion. Suppose you have options: List[Option[T]], and you want to get a List[T] with the values of all of the options that are Some. You can do
for (option <- options; value <- option.toList) yield value
(or better, options.flatMap(_.toList))
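A short sketch with made-up data, showing that the for comprehension and the flatMap version agree. The third variant, calling flatten directly on the list, is an addition for comparison and relies on the standard implicit conversion from Option to Iterable.

```scala
object OptionToListSketch {
  // Hypothetical input: some entries are present, some are missing.
  val options: List[Option[Int]] = List(Some(1), None, Some(3))

  // The for comprehension from the text.
  val viaFor: List[Int] =
    for (option <- options; value <- option.toList) yield value

  // The shorter flatMap form.
  val viaFlatMap: List[Int] = options.flatMap(_.toList)

  // flatten treats each Option as a collection of zero or one element.
  val viaFlatten: List[Int] = options.flatten
}
```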
I have one practical example of toList method. You can find it in scaldi (my Scala dependency injection framework) in Module.scala at line 72:
https://github.com/OlegIlyenko/scaldi/blob/f3697ecaa5d6e96c5486db024efca2d3cdb04a65/src/main/scala/scaldi/Module.scala#L72
In this context the getBindings method can return either Nil or a List with only one element. I can retrieve it as an Option with discoverBinding. I find it convenient to be able to convert the Option to a List (that is either empty or has one element) with the toList method.