How can I use Stream to traverse a tree in Scala?

I have a simple file system abstraction:
trait PathItem { val label: String }
case class PathEnd(label: String, uri: String) extends PathItem
case class PathDirectory(
label: String = "",
contents: List[PathItem] = List.empty[PathItem]
) extends PathItem
With this structure I can build up an arbitrarily complex tree of subdirectories (PathDirectory) and files (PathEnd).
How could I use Scala Streams to extract a list of the "files", something like this:
getFileStream( rootDir ).foreach( f => println(f.uri) )
getFileStream( rootDir ).find( _.uri == "someTargetURI" )
// where getFileStream creates a Stream[PathEnd] given a starting rootDir
Passing through the tree like this would be kinda cool, but I'm not understanding how to create a Stream for this from the scaladoc.
(I know I can just write a simple recursive function, but I'm trying to grok Streams here.)

As mentioned in the comments, you can treat a Stream essentially the same way you would a List and still get the desired lazily evaluated sequence. One solution:
def fileStream(p: PathItem): Stream[PathEnd] = {
p match {
case pe: PathEnd => Stream(pe)
case pd: PathDirectory => pd.contents.toStream.flatMap(fileStream)
}
}
Note the flatMap to avoid creating a Stream of Stream instances.
Test:
scala> val pd = PathDirectory("root",List(
PathDirectory("src",List(PathDirectory("main",List(PathEnd("file.scala","file.uri"))))),
PathDirectory("test",List(PathDirectory("main",List(PathEnd("test.scala","test.uri")))))))
scala> fileStream(pd).foreach(println)
PathEnd(file.scala,file.uri)
PathEnd(test.scala,test.uri)
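To see the laziness at work, here is a minimal sketch that repeats the definitions above and adds a counter. The visited counter, the LazyTraversalDemo object, and the sample tree are illustrative assumptions, not part of the original code. Note also that Stream is deprecated since Scala 2.13 in favour of LazyList, which supports the same flatMap-based approach.

```scala
trait PathItem { val label: String }
case class PathEnd(label: String, uri: String) extends PathItem
case class PathDirectory(
  label: String = "",
  contents: List[PathItem] = List.empty[PathItem]
) extends PathItem

object LazyTraversalDemo {
  // Hypothetical counter: incremented each time the traversal reaches a file.
  var visited = 0

  def fileStream(p: PathItem): Stream[PathEnd] = p match {
    case pe: PathEnd       => visited += 1; Stream(pe)
    case pd: PathDirectory => pd.contents.toStream.flatMap(fileStream)
  }

  // Sample tree with three files (illustrative data).
  val root = PathDirectory("root", List(
    PathDirectory("src", List(PathEnd("a.scala", "a.uri"), PathEnd("b.scala", "b.uri"))),
    PathDirectory("test", List(PathEnd("c.scala", "c.uri")))
  ))

  // find stops at the first match, so the rest of the tree is never forced.
  val found: Option[PathEnd] = fileStream(root).find(_.uri == "a.uri")
}
```

After found is evaluated, visited should be smaller than the total number of files, because the tail of the stream was never forced.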

Related

Scala: for-comprehension for chain of operations

I have a task to transform the following code-block:
val instance = instanceFactory.create
val result = instance.ackForResult
to for-comprehension expression.
As for-comprehension leans on enumeration of elements, I tried to get around it with wrapper class:
case class InstanceFactoryWrapper(value:InstanceFactory) {
def map(f: InstanceFactory => Instance): Instance
= value.create()
}
where the map method must handle only one element and return a single result: Instance.
I tested this approach with this expression:
for {
mediationApi <- InstanceFactoryWrapper(instanceFactoryWrapper)
}
But it doesn't work: IDEA recommends that I use foreach in this part. But "foreach" doesn't return anything, as opposed to map.
What am I doing wrong?
Simply put, when working with List/Option/Either or other standard types, for-comprehensions are useful for rewriting nested map/flatMap/withFilter calls as a flat sequence of steps.
Use custom classes in for-comprehension
But what about your own classes or other 3rd party ones?
You need to implement monadic operations in order to use them in for-comprehensions.
The bare minimum: map and flatMap.
Take the following example with a custom Config class:
case class Config[T](content: T) {
def flatMap[S](f: T => Config[S]): Config[S] =
f(content)
def map[S](f: T => S): Config[S] =
this.copy(content = f(content))
}
for {
first <- Config("..")
_ = println("Going through a test")
second <- Config(first + "..")
third <- Config(second + "..")
} yield third
This is how you enable for-comprehension.
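As a rough illustration of what the compiler does with such an expression, the sketch below writes the same chain once as a for-comprehension and once as explicit flatMap/map calls. The DesugarDemo object and the value names are assumptions made for this example, and the println line is left out to keep the translation short.

```scala
final case class Config[T](content: T) {
  def flatMap[S](f: T => Config[S]): Config[S] = f(content)
  def map[S](f: T => S): Config[S] = Config(f(content))
}

object DesugarDemo {
  // The for-comprehension form.
  val viaFor: Config[String] =
    for {
      first  <- Config("..")
      second <- Config(first + "..")
      third  <- Config(second + "..")
    } yield third

  // Roughly what it is rewritten to: every generator except the
  // last becomes flatMap, and the last one becomes map.
  val viaCalls: Config[String] =
    Config("..").flatMap { first =>
      Config(first + "..").flatMap { second =>
        Config(second + "..").map(third => third)
      }
    }
}
```

Both values hold the same Config, which is why defining map and flatMap is enough for this kind of chain.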

Write a class support for yield keywords in Scala

How can I make a class support the for keyword in Scala?
e.g:
class A(data: String) {
...
}
val a = A("I'm A")
for {
data <- a
} yield {
data
}
Thanks
The compiler rewrites every for comprehension into its constituent method calls: map(), flatMap(), withFilter(), foreach(). That's why the usual syntax rules don't apply between the braces of a for comprehension: for example, you can't declare a variable in the standard fashion (val x = 2; you write x = 2 instead), and you can't drop in a bare println() statement.
In your example, this will work.
class A(data: String) {
def map[B](f: (String) => B) = f(data)
}
val a = new A("I'm A")
for {
data <- a
} yield {
data
} // res0: String = I'm A
But note that if you have multiple generators (the <- is a generator) then only the final one is turned into a map() call. The previous generators are all flatMap() calls.
If your for comprehension includes an if condition then you'll need a withFilter() as well.
I recommend avoiding for comprehensions until you have a good feel for how they work.
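To make the role of each method concrete, here is a small sketch with a hypothetical Bag wrapper (not from the question) that implements all three. For simplicity its withFilter filters eagerly, whereas the standard collections return a lazy view.

```scala
// Hypothetical wrapper around a List, for illustration only.
final case class Bag[A](items: List[A]) {
  def map[B](f: A => B): Bag[B] = Bag(items.map(f))
  def flatMap[B](f: A => Bag[B]): Bag[B] = Bag(items.flatMap(a => f(a).items))
  // Simplified: a strict filter instead of a lazy view.
  def withFilter(p: A => Boolean): Bag[A] = Bag(items.filter(p))
}

object BagDemo {
  val pairs: Bag[(Int, String)] =
    for {
      n <- Bag(List(1, 2, 3))   // not the last generator: flatMap
      if n != 2                 // guard: withFilter
      s <- Bag(List("a", "b"))  // last generator: map
    } yield (n, s)
}
```

Removing flatMap or withFilter from Bag makes the comprehension fail to compile, which is a quick way to check which method each line relies on.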

Pattern matching and RDDs

I have a very simple (n00b) question but I'm somehow stuck. I'm trying to read a set of files in Spark with wholeTextFiles and want to return an RDD[LogEntry], where LogEntry is just a case class. I want to end up with an RDD of valid entries and I need to use a regular expression to extract the parameters for my case class. When an entry is not valid I do not want the extractor logic to fail but simply write an entry in a log. For that I use LazyLogging.
object LogProcessors extends LazyLogging {
def extractLogs(sc: SparkContext, path: String, numPartitions: Int = 5): RDD[Option[CleaningLogEntry]] = {
val pattern = "<some pattern>".r
val logs = sc.wholeTextFiles(path, numPartitions)
val entries = logs.map(fileContent => {
val file = fileContent._1
val content = fileContent._2
content.split("\\r?\\n").map(line => line match {
case pattern(dt, ev, seq) => Some(LogEntry(<...>))
case _ => logger.error(s"Cannot parse $file: $line"); None
})
})
That gives me an RDD[Array[Option[LogEntry]]]. Is there a neat way to end up with an RDD of the LogEntrys? I'm somehow missing it.
I was thinking about using Try instead, but I'm not sure if that's any better.
Thoughts greatly appreciated.
To get rid of the Array, simply replace the map call with flatMap: flatMap treats the collection returned for each record (a TraversableOnce[T], which an Array converts to implicitly) as separate records of type T.
To get rid of the Option, collect only the successful ones: entries.collect { case Some(entry) => entry }.
Note that this collect(p: PartialFunction) overload (which performs something equivalent to a map and a filter combined) is very different from collect() (which sends all data to the driver).
Altogether, this would be something like:
def extractLogs(sc: SparkContext, path: String, numPartitions: Int = 5): RDD[LogEntry] = {
val pattern = "<some pattern>".r
val logs = sc.wholeTextFiles(path, numPartitions)
val entries = logs.flatMap(fileContent => {
val file = fileContent._1
val content = fileContent._2
content.split("\\r?\\n").map(line => line match {
case pattern(dt, ev, seq) => Some(LogEntry(<...>))
case _ => logger.error(s"Cannot parse $file: $line"); None
})
})
entries.collect { case Some(entry) => entry }
}
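The same flatMap-then-collect shape can be tried without a Spark cluster. The sketch below uses a plain Seq of (file, content) pairs as a stand-in for the RDD returned by wholeTextFiles. The LogEntry fields, the regular expression, and the use of Console.err instead of a logger are all assumptions made for this example.

```scala
object LogParsingSketch {
  // Hypothetical entry shape; the real case class is not shown in the question.
  final case class LogEntry(date: String, event: String, seq: Int)

  // Hypothetical line format: "<date> <event> <sequence number>".
  private val pattern = """(\S+) (\S+) (\d+)""".r

  def parse(files: Seq[(String, String)]): Seq[LogEntry] = {
    // flatMap flattens the per-file sequences into one sequence of Options.
    val entries = files.flatMap { case (file, content) =>
      content.split("\\r?\\n").toSeq.map {
        case pattern(dt, ev, seq) => Some(LogEntry(dt, ev, seq.toInt))
        case line =>
          Console.err.println(s"Cannot parse $file: $line")
          None
      }
    }
    // collect with a partial function keeps only the successful parses.
    entries.collect { case Some(entry) => entry }
  }
}
```

Calling LogParsingSketch.parse on a file whose content mixes valid and invalid lines returns only the valid entries and reports the rest on standard error.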

How to match against the pattern of a partial function's case definition in a Scala macro?

As part of a macro, I want to manipulate the case definitions of a partial function.
To do so, I use a Transformer to manipulate the case definitions of the partial function and a Traverser to inspect the patterns of the case definitions:
def myMatchImpl[A: c.WeakTypeTag, B: c.WeakTypeTag](c: Context)
(expr: c.Expr[A])(patterns: c.Expr[PartialFunction[A, B]]): c.Expr[B] = {
import c.universe._
val transformer = new Transformer {
override def transformCaseDefs(trees: List[CaseDef]) = trees map {
case caseDef @ CaseDef(pattern, guard , body) => {
// println(show(pattern))
val traverser = new Traverser {
override def traverse(tree: Tree) = tree match {
// match against a specific pattern
}
}
traverser.traverse(pattern)
}
}
}
val transformedPartialFunction = transformer.transform(patterns.tree)
c.Expr[B](q"$transformedPartialFunction($expr)")
}
Now let us assume the interesting data I want to match against is represented by the class Data (which is part of the object Example):
case class Data(x: Int, y: String)
When now invoking the macro on the example below
abstract class Foo
case class Bar(data: Data) extends Foo
case class Baz(string: String, data: Data) extends Foo
def test(foo: Foo) = myMatch(foo){
case Bar(Data(x,y)) => y
case Baz(_, Data(x,y)) => y
}
the patterns of the case definitions of the partial function are transformed by the compiler as follows (the Foo, Bar, and Baz classes are members of the object Example, too):
(data: Example.Data)Example.Bar((x: Int, y: String)Example.Data((x @ _), (y @ _)))
(string: String, data: Example.Data)Example.Baz(_, (x: Int, y: String)Example.Data((x @ _), (y @ _)))
This is the result of printing the patterns as hinted in the macro above (using show). The raw abstract syntax trees (printed using showRaw) look like this:
Apply(TypeTree().setOriginal(Select(This(newTypeName("Example")), Example.Bar)), List(Apply(TypeTree().setOriginal(Select(This(newTypeName("Example")), Example.Data)), List(Bind(newTermName("x"), Ident(nme.WILDCARD)), Bind(newTermName("y"), Ident(nme.WILDCARD))))))
Apply(TypeTree().setOriginal(Select(This(newTypeName("Example")), Example.Baz)), List(Ident(nme.WILDCARD), Apply(TypeTree().setOriginal(Select(This(newTypeName("Example")), Example.Data)), List(Bind(newTermName("x"), Ident(nme.WILDCARD)), Bind(newTermName("y"), Ident(nme.WILDCARD))))))
How do I write a pattern-quote which matches against these trees?
First of all, there is a special flavor of quasiquotes specifically for CaseDefs called cq:
override def transformCaseDefs(trees: List[CaseDef]) = trees map {
case caseDef @ cq"$pattern if $guard => $body" => ...
}
Secondly, you should use pq to deconstruct patterns:
pattern match {
case pq"$name @ $nested" => ...
case pq"$extractor($arg1, $arg2: _*)" => ...
...
}
If you are interested in the internals of the trees that are used for pattern matching, they are created by patvarTransformer, defined in TreeBuilder.scala.
On the other hand, if you are working with UnApply trees (which are produced after typechecking), I have bad news for you: quasiquotes currently don't support them. Follow SI-7789 to get notified when this is fixed.
After Den Shabalin pointed out that quasiquotes can't be used in this particular setting, I managed to find a pattern which matches against the patterns of a partial function's case definitions.
The key problem is that the constructor we want to match against (in our example, Data) is stored in the TypeTree of the Apply node. Matching against a tree wrapped up in a TypeTree is a bit tricky, since the only extractor of this class (TypeTree()) isn't very helpful for this particular task. Instead we have to select the wrapped tree using the original method:
override def transform(tree: Tree) = tree match {
case Apply(constructor @ TypeTree(), args) => constructor.original match {
case Select(_, sym) if (sym == newTermName("Data")) => ...
}
}
In our use case the wrapped up tree is a Select node and we can now check if the symbol of this node is the one we are looking for.

traverse collection of type "Any" in Scala

I would like to traverse a collection resulting from the Scala JSON toolkit at github.
The problem is that the JsonParser returns "Any" so I am wondering how I can avoid the following error:
"Value foreach is not a member of Any".
val json = Json.parse(urls)
for(l <- json) {...}
object Json {
def parse(s: String): Any = (new JsonParser).parse(s)
}
You will have to do pattern matching to traverse the structures returned from the parser.
/*
* (untested)
*/
def printThem(a: Any): Unit = {
  a match {
    case l: List[_] =>
      println("List:")
      l foreach printThem
    case m: Map[_, _] =>
      for ((k, v) <- m) {
        print("%s -> " format k)
        printThem(v)
      }
    case x =>
      println(x)
  }
}
val json = Json.parse(urls)
printThem(json)
You might have more luck using the lift-json parser, available at: http://github.com/lift/lift/tree/master/framework/lift-base/lift-json/
It has a much richer type-safe DSL available, and (despite the name) can be used completely standalone outside of the Lift framework.
If you are sure that there will only ever be one type, you can use the following cast:
for (l <- json.asInstanceOf[List[List[String]]]) {...}
Otherwise, pattern match on all expected cases.
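As a variation on the pattern-matching approach, the sketch below collects the leaf values of a nested structure instead of printing them. It assumes the parser produces only List and Map containers. The AnyTraversal object and the leaves function are names invented for this example and are not part of the JSON toolkit.

```scala
object AnyTraversal {
  // Recursively gathers every non-container value found in nested Lists and Maps.
  def leaves(a: Any): List[Any] = a match {
    case l: List[_]   => l.flatMap(leaves)
    case m: Map[_, _] => m.values.toList.flatMap(leaves)
    case x            => List(x)
  }
}
```

With this in place, something like AnyTraversal.leaves(json).foreach(println) iterates over the values without casting the parser result to a specific collection type.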