Relationship between ScalaMeta, ScalaFix, and SemanticDB

I have the following information:
Scalameta: can produce ASTs from a source file.
SemanticDB: contains information about the symbols in a parsed source file.
ScalaFix: is built on Scalameta and SemanticDB, so it can traverse ASTs and access symbol information.
Loading a source file using ScalaMeta is as easy as the following:
import scala.meta._

val path = java.nio.file.Paths.get("path to source file")
val bytes = java.nio.file.Files.readAllBytes(path)
val text = new String(bytes, "UTF-8")
val input = Input.VirtualFile(path.toString, text)
val tree = input.parse[Source].get
As you can observe from the above code snippet, ScalaMeta parses the source file as type Source.
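For instance (a minimal sketch; the choice of Defn.Def is purely illustrative), the resulting Source tree can already be traversed with collect, much like the ScalaFix rule below traverses doc.tree:
// Hypothetical example: collect the names of all method definitions in the parsed tree.
val methodNames: List[String] = tree.collect { case defn: Defn.Def => defn.name.value }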
Now consider the below code snippet where ScalaFix uses a tree of type SemanticDocument:
import scalafix.v1._
import scala.meta._

class NamedLiteralArguments extends SemanticRule("NamedLiteralArguments") {
  override def fix(implicit doc: SemanticDocument): Patch = {
    doc.tree
      .collect {
        case Term.Apply(fun, args) =>
          args.zipWithIndex.collect {
            case (t @ Lit.Boolean(_), i) =>
              fun.symbol.info match {
                case Some(info) =>
                  info.signature match {
                    case method: MethodSignature if method.parameterLists.nonEmpty =>
                      val parameter = method.parameterLists.head(i)
                      val parameterName = parameter.displayName
                      Patch.addLeft(t, s"$parameterName = ")
                    case _ =>
                      // Do nothing, the symbol is not a method
                      Patch.empty
                  }
                case None =>
                  // Do nothing, we don't have information about this symbol.
                  Patch.empty
              }
          }
      }
      .flatten
      .asPatch
  }
}
Inspecting the two code snippets above shows that ScalaMeta can parse a Scala source into a Source, while ScalaFix works with an implicit SemanticDocument. The SemanticDocument has a tree field, implemented by ScalaMeta, which yields a traversable AST just like the one produced by parsing the source as Source. This shows the relationship between ScalaMeta and ScalaFix. However, my concern is that I need to load Scala source code and use ScalaFix on it to access symbol.info, but the ScalaFix documentation does not show how to do this.
When I attempt to load the source file as a SemanticDocument instead of Source in the first code snippet:
val tree = input.parse[SemanticDocument].get
I get an error saying that no implicits were found for parameter parse in parse[SemanticDocument]. Also note that trying to use symbol.info in the first code snippet produces errors about missing implicits. This is not the case with the second code snippet, where the doc parameter is an implicit of type SemanticDocument.
So how does ScalaFix load source files as a SemanticDocument?

Related

How to return successfully parsed rows converted into my case class

I have a file where each row is a JSON array.
I am reading each line of the file, converting each row into a JSON array, and then converting each element into a case class using spray-json.
I have this so far:
for (line <- source.getLines().take(10)) {
  val jsonArr = line.parseJson.convertTo[JsArray]
  for (ele <- jsonArr.elements) {
    val tryUser = Try(ele.convertTo[User])
  }
}
How could I convert this entire process into a single line statement?
val users: Seq[User] = source.getLines.take(10).map(line => line.parseJson.convertTo[JsonArray].elements.map(ele => Try(ele.convertTo[User])
The error is:
found : Iterator[Nothing]
Note: I used Scala 2.13.6 for all my examples.
There is a lot to unpack in these few lines of code. First of all, I'll share some code that we can use to generate some meaningful input to play around with.
object User {
  import scala.util.Random
  private def randomId: Int = Random.nextInt(900000) + 100000
  private def randomName: String = Iterator
    .continually(Random.nextInt(26) + 'a')
    .map(_.toChar)
    .take(6)
    .mkString
  def randomJson(): String = s"""{"id":$randomId,"name":"$randomName"}"""
  def randomJsonArray(size: Int): String =
    Iterator.continually(randomJson()).take(size).mkString("[", ",", "]")
}
final case class User(id: Int, name: String)

import scala.util.{Try, Success, Failure}
import spray.json._
import DefaultJsonProtocol._

implicit val UserFormat = jsonFormat2(User.apply)
This is just some scaffolding to define some User domain object and come up with a way to generate a JSON representation of an array of such objects so that we can then use a JSON library (spray-json in this case) to parse it back into what we want.
Now, going back to your question. This is a possible way to massage your data into its parsed representation. It may not fit 100% of what you are trying to do, but there's some nuance in the data types involved and how they work:
val parsedUsers: Iterator[Try[User]] =
  for {
    line <- Iterator.continually(User.randomJsonArray(4)).take(10)
    element <- line.parseJson.convertTo[JsArray].elements
  } yield Try(element.convertTo[User])
First difference: notice that I use the for comprehension in a form in which the "outcome" of an iteration is not a side effect (for (something) { do something }) but an actual value (for (something) yield { return a value }).
Second difference: I explicitly asked for an Iterator[Try[User]] rather than a Seq[User]. We can go very down into a rabbit hole on the topic of why the types are what they are here, but the simple explanation is that a for ... yield expression:
returns the same type as the one in the first line of the generation -- if you start with a val ns: Iterator[Int]; for (n <- ns) ... you'll get an Iterator at the end
if you nest generators, they need to be of the same type as the "outermost" one
You can read more on for comprehensions on the Tour of Scala and the Scala Book.
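As a small illustration of that point (hypothetical values, not taken from the question):
val ns: Iterator[Int] = Iterator(1, 2, 3)
val doubled: Iterator[Int] = for (n <- ns) yield n * 2     // Iterator in, Iterator out

val xs: List[Int] = List(1, 2, 3)
val labels: List[String] = for (x <- xs) yield s"item-$x"   // List in, List out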
One possible way of consuming this is the following:
for (user <- parsedUsers) {
  user match {
    case Success(user) => println(s"parsed object $user")
    case Failure(error) => println(s"ERROR: '${error.getMessage}'")
  }
}
As for how to turn this into a "one liner": for comprehensions are syntactic sugar applied by the compiler, which turns every nested generator into a flatMap call and the final one into a map, as in the following example (which yields a result equivalent to the for comprehension above and is very close to what the compiler does automatically):
val parsedUsers: Iterator[Try[User]] = Iterator
.continually(User.randomJsonArray(4))
.take(10)
.flatMap(line =>
line.parseJson
.convertTo[JsArray]
.elements
.map(element => Try(element.convertTo[User]))
)
One note that I would like to add is that you should be mindful of readability. Some teams prefer for comprehensions, others prefer manually rolling out their own flatMap/map chains. Coder's discretion is advised.
You can play around with this code here on Scastie (and here is the version with the flatMap/map calls).

Scala - What is type Input and what do Input.Source and Input.Offset mean?

I read this question: Link
And this is a block of code from its accepted answer:
/** A parser that matches a regex string */
implicit def regex(r: Regex): Parser[String] = new Parser[String] {
  def apply(in: Input) = {
    val source = in.source
    val offset = in.offset
    val start = handleWhiteSpace(source, offset)
    (r findPrefixMatchOf (source.subSequence(start, source.length))) match {
      case Some(matched) =>
        Success(source.subSequence(start, start + matched.end).toString,
          in.drop(start + matched.end - offset))
      case None =>
        Failure("string matching regex `"+r+"' expected but `"+in.first+"' found", in.drop(start - offset))
    }
  }
}
I don't understand some parts of the code:
The code between curly braces looks like it is defining a new class, although before it there is "new Parser[String]", which I understood creates a new instance of class Parser[String].
In the code there is a function apply taking a parameter of type Input, but I didn't find any such class in the Scaladoc, nor its members source and offset.
Can you explain those parts to me?
Input is a type alias in Parsers from the scala-parser-combinators library.
At the time that answer was written, parser combinators were still in the scala standard library. They have been removed as of Scala 2.11.
The doc for Parsers (and also for RegexParsers, which I'd guess is more specifically what's being used here) says type Input = Reader[Elem], so Reader is the type that has source and offset fields.
new Parser { ... } defines an anonymous class that extends Parser[String].
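To make both points concrete, here is a minimal sketch (the parser name and the ';' logic are made up for illustration) that extends RegexParsers, creates an anonymous subclass of Parser[String], and uses the source and offset members of Input:
import scala.util.parsing.combinator.RegexParsers

object Demo extends RegexParsers {
  // new Parser[String] { ... } creates an anonymous class extending Parser[String]
  val upToSemicolon: Parser[String] = new Parser[String] {
    def apply(in: Input): ParseResult[String] = {
      val source = in.source                                    // the full character sequence being parsed
      val offset = in.offset                                    // current position within that sequence
      val rest = source.subSequence(offset, source.length).toString
      val semi = rest.indexOf(';')
      if (semi >= 0) Success(rest.take(semi), in.drop(semi))    // consume up to (not including) the ';'
      else Failure("no ';' found", in)
    }
  }
}

// Usage: Demo.parse(Demo.upToSemicolon, "hello; world") yields Success("hello", ...)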

How to invoke the Scala compiler programmatically?

I want my Scala code to take a Scala class as input, compile and execute that class. How can I programmatically invoke a Scala compiler? I will be using the latest Scala version, i.e. 2.10.
ToolBox
I think the proper way of invoking the Scala compiler is doing it via the Reflection API documented in the Overview. Specifically, the 'Tree Creation via parse on ToolBoxes' section in 'Symbols, Trees, and Types' talks about parsing a String into a Tree using a ToolBox. You can then invoke eval() etc.
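A minimal sketch of that route (assuming scala-compiler and scala-reflect are on the classpath; the snippet string is just an example):
import scala.reflect.runtime.{currentMirror => cm}
import scala.tools.reflect.ToolBox

val toolbox = cm.mkToolBox()
// parse a source string into a Tree, then compile and evaluate it
val tree = toolbox.parse("""class Test { def greet = "Hello World!" }; new Test().greet""")
val result = toolbox.eval(tree)   // result: Any = "Hello World!"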
scala.tools.nsc.Global
But as Shyamendra Solanki wrote, in reality you can drive scalac's Global to get more done. I've written CompilerMatcher so I can compile generated code with sample code to do integration tests for example.
scala.tools.nsc.IMain
You can invoke the REPL IMain to evaluate the code (this is also available in the above CompilerMatcher if you want something that works with Scala 2.10):
val main = new IMain(s) {                              // s: a scala.tools.nsc.Settings instance defined elsewhere
  def lastReq = prevRequestList.last
}
main.compileSources(files.map(toSourceFile(_)): _*)    // files: the source files to compile
code map { c => main.interpret(c) match {              // code: the snippet(s) to evaluate, defined elsewhere
  case IR.Error => sys.error("Error interpreting %s" format (c))
  case _ =>
}}
val holder = allCatch opt {
  main.lastReq.lineRep.call("$result")                 // grab the value of the last evaluated expression
}
This was demonstrated in Embedding the Scala Interpreter post by Josh Suereth back in 2009.
The class to be compiled and run (in file test.scala)
class Test {
  println("Hello World!")
}

// compileAndRun.scala (in same directory)
import scala.tools.nsc._
import java.io._

val g = new Global(new Settings())
val run = new g.Run
run.compile(List("test.scala")) // invoke compiler. it creates Test.class.

val classLoader = new java.net.URLClassLoader(
  Array(new File(".").toURI.toURL), // Using current directory.
  this.getClass.getClassLoader)
val clazz = classLoader.loadClass("Test") // load class
clazz.newInstance // create an instance, which will print Hello World.

Inference Labeled LDA/pLDA [Topic Modelling Toolbox]

I have been trying to get the code for inference from a trained Labeled LDA model and pLDA working, using the TMT toolbox (Stanford NLP Group).
I have gone through the examples provided in the following links:
http://nlp.stanford.edu/software/tmt/tmt-0.3/
http://nlp.stanford.edu/software/tmt/tmt-0.4/
Here is the code I'm trying for labeled LDA inference
val modelPath = file("llda-cvb0-59ea15c7-31-61406081-75faccf7");
val model = LoadCVB0LabeledLDA(modelPath);
val source = CSVFile("pubmed-oa-subset.csv") ~> IDColumn(1);
val text = {
  source ~>                          // read from the source file
  Column(4) ~>                       // select column containing text
  TokenizeWith(model.tokenizer.get)  // tokenize with model's tokenizer
}
val labels = {
  source ~>                          // read from the source file
  Column(2) ~>                       // take column two, the year
  TokenizeWith(WhitespaceTokenizer())
}
val outputPath = file(modelPath, source.meta[java.io.File].getName.replaceAll(".csv", ""));
val dataset = LabeledLDADataset(text, labels, model.termIndex, model.topicIndex);
val perDocTopicDistributions = InferCVB0LabeledLDADocumentTopicDistributions(model, dataset);
val perDocTermTopicDistributions = EstimateLabeledLDAPerWordTopicDistributions(model, dataset, perDocTopicDistributions);
TSVFile(outputPath + "-word-topic-distributions.tsv").write({
  for ((terms, (dId, dists)) <- text.iterator zip perDocTermTopicDistributions.iterator) yield {
    require(terms.id == dId);
    (terms.id,
      for ((term, dist) <- (terms.value zip dists)) yield {
        term + " " + dist.activeIterator.map({
          case (topic, prob) => model.topicIndex.get.get(topic) + ":" + prob
        }).mkString(" ");
      });
  }
});
Error
found : scalanlp.collection.LazyIterable[(String, Array[Double])]
required: Iterable[(String, scalala.collection.sparse.SparseArray[Double])]
EstimateLabeledLDAPerWordTopicDistributions(model, dataset, perDocTopicDistributions);
I understand it's a type mismatch error, but I don't know how to resolve it in Scala.
Basically, I don't understand how I should extract
1. the per-doc topic distribution
2. the per-doc label distribution
after the output of the infer command.
Please help.
The same applies in the case of pLDA: I reach the inference command and after that I am clueless about what to do with it.
The Scala type system is much more complex than the Java one, and understanding it will make you a better programmer. The problem lies here:
val perDocTermTopicDistributions = EstimateLabeledLDAPerWordTopicDistributions(model, dataset, perDocTopicDistributions);
because one of model, dataset, or perDocTopicDistributions is of type:
scalanlp.collection.LazyIterable[(String, Array[Double])]
while EstimateLabeledLDAPerWordTopicDistributions.apply expects a
Iterable[(String, scalala.collection.sparse.SparseArray[Double])]
The best way to investigate these type errors is to look at the ScalaDoc (for example, the one for tmt is here: http://nlp.stanford.edu/software/tmt/tmt-0.4/api/#package ) and, if you cannot easily find where the problem lies, you should make the types of your variables explicit in your code, like the following:
val perDocTopicDistributions: LazyIterable[(String, Array[Double])] = InferCVB0LabeledLDADocumentTopicDistributions(model, dataset)
If we look together at the ScalaDoc of edu.stanford.nlp.tmt.stage:
def EstimateLabeledLDAPerWordTopicDistributions(model: edu.stanford.nlp.tmt.model.llda.LabeledLDA[_, _, _], dataset: Iterable[LabeledLDADocumentParams], perDocTopicDistributions: Iterable[(String, SparseArray[Double])]): LazyIterable[(String, Array[SparseArray[Double]])]

def InferCVB0LabeledLDADocumentTopicDistributions(model: CVB0LabeledLDA, dataset: Iterable[LabeledLDADocumentParams]): LazyIterable[(String, Array[Double])]
It now should be clear to you that the return of InferCVB0LabeledLDADocumentTopicDistributions cannot be used directly to feed EstimateLabeledLDAPerWordTopicDistributions.
I have never used Stanford NLP, but this is by design how the API works, so you only need to convert your scalanlp.collection.LazyIterable[(String, Array[Double])] into an Iterable[(String, scalala.collection.sparse.SparseArray[Double])] before calling the function.
If you look at the ScalaDoc for how to do this conversion, it's pretty simple. Inside the stage package, in package.scala, I can read import scalanlp.collection.LazyIterable;
so I know where to look, and in fact http://www.scalanlp.org/docs/core/data/#scalanlp.collection.LazyIterable has a toIterable method which turns a LazyIterable into an Iterable. You still have to transform the internal Array into a SparseArray.
Again, I look into package.scala for the stage package inside tmt and I see import scalala.collection.sparse.SparseArray; so I look at the scalala documentation:
http://www.scalanlp.org/docs/scalala/0.4.1-SNAPSHOT/#scalala.collection.sparse.SparseArray
It turns out that the constructors seem complicated to me, so it sounds like I should look into the companion object for a factory method. The method I am looking for is there, and it's called apply, as usual in Scala.
def apply[T](values: T*)(implicit arg0: ClassManifest[T], arg1: DefaultArrayValue[T]): SparseArray[T]
By using this, you can write a function with the following signature:
def f: Array[Double] => SparseArray[Double]
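For example, a minimal sketch relying only on the apply factory quoted above (assuming the required implicits for Double are in scope):
// Build a SparseArray from a plain Array by splatting its values into the factory method.
def f(values: Array[Double]): SparseArray[Double] = SparseArray(values: _*)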
Once this is done, you can turn the result of InferCVB0LabeledLDADocumentTopicDistributions into a non-lazy Iterable of SparseArrays with one line of code:
result.toIterable.map { case (name, values) => (name, f(values)) }

Scala: Read a file line by line, match each line to construct a case class, and add the case class to a List

I have been trying to learn Scala in my free time and to write code in a more functional manner; anyway, I have hit a problem trying to write what I thought would be a simple tool for reading formatted name data out of a file and creating a List of case classes containing that data. This does not work, however:
object NameDataUtil {
  /**
   * reads a data file containing formatted name data into a list containing case classes.
   *
   * @param fileName name of the file (with extension) to read the data from.
   * @return a list of case classes containing the data from the specified file.
   */
  def readInFile(fileName: String): List[NameDatum] = {
    val names: List[NameDatum] = Source.fromURL(getClass.getResource("/" + fileName)).getLines() foreach { line =>
      line.trim.split("\\s+") match {
        case Array(name, freq, cumFreq, rank) => NameDatum(name, freq.toDouble, cumFreq.toDouble, rank.toInt)
      }
    }
    names
  }
}
Any help is greatly appreciated. Thanks!
Replace foreach with map and you'll get an Iterator[NameDatum]. Add a toList after getLines() to get a List[NameDatum] instead, though it's probably better to use toStream and work with a Stream[NameDatum] instead, if you expect the file to be too big to fit all in memory.
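Put together, a minimal sketch of the corrected method following that suggestion (keeping the original NameDatum fields) might look like this:
import scala.io.Source

def readInFile(fileName: String): List[NameDatum] =
  Source.fromURL(getClass.getResource("/" + fileName))
    .getLines()
    .map { line =>
      line.trim.split("\\s+") match {
        case Array(name, freq, cumFreq, rank) =>
          NameDatum(name, freq.toDouble, cumFreq.toDouble, rank.toInt)
      }
    }
    .toList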
I think the problem is that you're using foreach, which has a return type of Unit; see the signature of foreach at the following:
http://www.scala-lang.org/api/current/index.html#scala.collection.Iterator
def foreach (f: (A) ⇒ Unit): Unit
What would do the trick is map; you can see its method signature (at the same link) is
def map [B] (f: (A) ⇒ B): Iterator[B]
So when you perform your match on each line, you will map the type A, which is the type of line (i.e., String), to the type B, which is your NameDatum.
Also, something you might find interesting is that you can use a regex in a match; this would be an alternative to using split (see the sketch below). Check out the technique at the following:
http://ikaisays.com/2009/04/04/using-pattern-matching-with-regular-expressions-in-scala/
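For example, a minimal sketch of that technique applied to the same line format (the regex groups are an assumption about the data layout):
// Regex with four capture groups used as an extractor in pattern matching (illustrative only).
val NameLine = """(\S+)\s+(\S+)\s+(\S+)\s+(\d+)""".r

def parseLine(line: String): Option[NameDatum] = line.trim match {
  case NameLine(name, freq, cumFreq, rank) =>
    Some(NameDatum(name, freq.toDouble, cumFreq.toDouble, rank.toInt))
  case _ => None
}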