Scala's compiler lexer

I'm trying to get a list of tokens (I'm most interested in keywords) and their positions for a given Scala source file.
I think there is a lexer utility inside the Scala compiler, but I can't find it. Can you point me in the right direction?

A simple lexer for a Scala-like language is provided in the standard library.
A small utility program which tokenizes Scala source using the same lexer as the compiler does lives here

Scalariform has an accurate Scala lexer you can use:
import scalariform.lexer._
val tokens = ScalaLexer.rawTokenise("class A", forgiveErrors = true)
// filter keeps every matching token; find would return only the first match as an Option
val keywords = tokens.filter(_.tokenType.isKeyword)
val comments = tokens.filter(_.tokenType.isComment)

Parser combinators might help with what you are trying to achieve here, especially if you are later interested in more than just keyword parsing.

Related

Functional Style reading large csv file in Scala

I am new to functional-style programming and Scala, so my question might seem a bit primitive.
Is there a specific way to read a CSV file in Scala using a functional style? Also, how are inner joins performed when combining two CSV files in Scala using a functional style?
I know Spark and generally use data frames, but I don't have any idea how to do this in plain Scala, and I'm finding it tough to search on Google since I don't know much about it. Also, if anyone knows good links on functional-style programming in Scala, that would be a great help.
The question is indeed too broad.
"Is there a specific way to read csv file in scala using functional style?"
So far I don't know of a royal road to parsing CSVs completely hassle-free.
CSV parsing includes
going through the input line-by-line
understanding what to do with the (optional) header
accurately parsing each line according to the CSV specification
turning line parts into a business object
I recommend that you
turn your input into Iterator[String]
split each line into parts using a library of your choice (e.g. opencsv)
manually create a desired domain object from the line parts
Here is a simple example (which ignores error handling and a potential header):
import java.io.FileInputStream
import scala.io.Source
import com.opencsv.CSVParserBuilder

case class Person(name: String, street: String)
// opencsv parser configured for comma-separated input
val lineParser = new CSVParserBuilder().withSeparator(',').build()
// getLines() is lazy, so the file is read line by line
val lines: Iterator[String] = Source.fromInputStream(new FileInputStream("file.csv")).getLines()
val parsedObjects: Iterator[Person] = lines.map { line =>
  val parts: Array[String] = lineParser.parseLine(line)
  Person(parts(0), parts(1))
}
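The inner-join part of the question can be handled with plain collections once both files are parsed. The following is only a minimal sketch under stated assumptions: the Customer, Order and CustomerOrder types, the column layout, and the naive splitLine helper are all hypothetical, and splitLine is not a real CSV parser (it does not handle quoted fields), so substitute a library parser for real data.

```scala
// Hypothetical domain types; the fields are assumptions for illustration only.
final case class Customer(id: String, name: String)
final case class Order(customerId: String, item: String)
final case class CustomerOrder(name: String, item: String)

// Naive split on commas: NOT a full CSV parser (no support for quoted fields).
def splitLine(line: String): Array[String] = line.split(",", -1).map(_.trim)

// Lines that do not have exactly two columns are silently skipped by collect.
def parseCustomers(lines: Iterator[String]): List[Customer] =
  lines.map(splitLine).collect { case Array(id, name) => Customer(id, name) }.toList

def parseOrders(lines: Iterator[String]): List[Order] =
  lines.map(splitLine).collect { case Array(cid, item) => Order(cid, item) }.toList

// Inner join: index one side by key, then keep only the rows that have a match.
def innerJoin(customers: List[Customer], orders: List[Order]): List[CustomerOrder] = {
  val ordersByCustomer: Map[String, List[Order]] = orders.groupBy(_.customerId)
  for {
    c <- customers
    o <- ordersByCustomer.getOrElse(c.id, Nil)
  } yield CustomerOrder(c.name, o.item)
}

// In-memory lines stand in for the two files here.
val customers = parseCustomers(Iterator("1,Ann", "2,Bob", "3,Cy"))
val orders = parseOrders(Iterator("1,book", "3,lamp", "4,desk"))
val joined = innerJoin(customers, orders)
```

Building the Map first means each customer is matched with a lookup instead of a scan over all orders. Customers without orders (Bob) and orders without a customer (key 4) drop out, which is what makes this an inner join.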

Variables documentation in Scala

I'm new to Scala (I come from Java) and I would like to know what the correct way is to generate documentation for class variables.
For example, if I have the following code:
class MyClass (bar:bar) {
val foo = bar
def function {
...
...
}
}
What's the correct way to create the documentation for the variable foo? Do I just add the comment right before the declaration? Isn't it a bit confusing?
Thanks!
You can use Javadoc in Scala. But Scala also introduces its own documentation generator called Scaladoc. This is what is used to generate the standard language documentation.
In general, Scaladoc follows similar conventions to Javadoc but introduces new features. You can read more about Scaladoc comment style here.
There is no difference between Java and Scala comments. You can choose any documentation strategy you used in Java.
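As a minimal sketch of what that looks like in practice, the Scaladoc comment goes directly before the declaration it documents, as the question guessed. The class below is a hypothetical variant of the one in the question: the String type for bar and the shout method are assumptions added only so the example compiles.

```scala
/** A hypothetical class used only to illustrate where Scaladoc comments go.
  *
  * @param bar the value this instance wraps
  */
class MyClass(bar: String) {

  /** A copy of the constructor argument, exposed as a public member. */
  val foo: String = bar

  /** Returns `foo` in upper case.
    *
    * @return the upper-cased value of `foo`
    */
  def shout: String = foo.toUpperCase
}
```

Constructor parameters are documented with @param tags in the class comment, while each val or def gets its own comment immediately above it.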

How Scalding DSL translates into regular Scala code?

Please help me find out how the Scalding DSL translates into regular Scala code.
https://github.com/twitter/scalding/wiki/Fields-based-API-Reference#sortBy
For example:
val fasterBirds = birds.map('speed -> 'doubledSpeed) { speed : Int => speed * 2 }
Questions:
What conventions do I need to follow to add my own functions to Scalding's map, reduce, groupBy, sort and scanLeft?
How does Scalding translate expressions on fields like 'inpFld -> 'outFld into Scala code?
What data structures/functions does the Scalding translator create? Where can I find them in the Scalding source code?
Thanks!
That IS regular Scala code. One strength of Scala lies in its extensibility. The syntax is flexible enough that a library can define methods and operators that read like a domain-specific language. This is especially helpful when using underlying libraries.
Scalding's domain-specific language doesn't translate so much as allow you to defer application of code until the appropriate time. The tick character (') means that the following set of characters is a symbol, a built-in datatype (scala.Symbol). The -> operator is syntactic sugar for building a pair: 'a -> 'b is the same value as the tuple ('a, 'b), but visually it imparts the concept of "translation" or "from this to that".
The domain-specific language you are looking at doesn't create structures, although it does create a functor (a function object). In this case it is seen by the Java Virtual Machine as a Function1[Type, Type] instance, which has an apply method that takes its argument and returns a result calculated by the provided code.
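To make this concrete, here is a minimal sketch of how such a call can be built from ordinary Scala. It is not Scalding's implementation: the Pipe class and its map method below are hypothetical stand-ins that only mimic the shape of the call in the question. Symbol("speed") is used instead of the literal 'speed because symbol literals are deprecated in newer Scala versions; in Scala 2 the two denote the same value.

```scala
// NOT Scalding's real code: `Pipe` and its `map` are hypothetical stand-ins.
final case class Pipe(rows: List[Map[Symbol, Any]]) {
  // Two parameter lists: first the (input, output) field pair, then the function.
  def map[A, B](fields: (Symbol, Symbol))(fn: A => B): Pipe = {
    val (in, out) = fields
    // Read the input field, apply fn, and store the result under the output field.
    Pipe(rows.map(row => row + (out -> fn(row(in).asInstanceOf[A]))))
  }
}

val birds = Pipe(List(Map(Symbol("speed") -> 10), Map(Symbol("speed") -> 25)))

// The arrow only builds a Tuple2 of two Symbols.
val fields: (Symbol, Symbol) = Symbol("speed") -> Symbol("doubledSpeed")

// The braces pass an ordinary function literal, i.e. a Function1[Int, Int].
val fasterBirds = birds.map(fields) { (speed: Int) => speed * 2 }
```

Under these assumptions, adding your own operation amounts to writing another method with the same shape: a field pair in the first parameter list and a plain function in the second.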

How can I generate my own ScalaSig?

I've dynamically defined a Scala class, but in order to use it "properly" it needs to have a ScalaSig.
So, how might I generate a ScalaSig outside of normal compilation? Perhaps from a tree? Maybe like:
import scala.reflect.runtime.universe._
import scala.tools.reflect.ToolBox
val tb = runtimeMirror(getClass.getClassLoader).mkToolBox()
val classDef = """class MyRecord(x: String)"""
val tree = showRaw(tb.parse(classDef))
But where does the pickler come in?
Thanks for any advice
-Julian
Artisanal-Pickle-Maker will reproduce a Scala pickled signature byte-for-byte (see restrictions).
Tapping into the compiler's pickler phase, as well as reuse of the Pickler's code, proved too challenging, so instead I used PickleBuffer, ShowPickled and a whole lotta diff -y to figure out how to generate arbitrary pickled Scala sigs.

Metaphone or Soundex for Scala

I have found Apache's implementation of Soundex and Metaphone in Java, but I would prefer to keep the text comparison libraries I am using in Scala only, if possible. Google searches have yielded nothing useful for finding either of these algorithms in Scala.
Worst case scenario I can translate these algorithms into Scala but that is less than ideal.
http://commons.apache.org/codec/
You are looking for Stringmetric from https://stackoverflow.com/users/554647/rocky-madden :
https://github.com/rockymadden/stringmetric
Not to answer my own question or anything, but a viable option would be to use a Java library and create some companion objects in Scala to expose it more appropriately and to allow the code to document itself more effectively.
// Metaphone wrapper object for org.apache.commons.codec.language.Metaphone in /lib/commons-codec-1.7
object Metaphone {
  // Fully qualified so the Java class cannot be confused with this object of the same name
  val metaphone = new org.apache.commons.codec.language.Metaphone
  metaphone.setMaxCodeLen(5)
  def encode(str: String): String = metaphone.encode(str)
}
Implementation:
val str_meta = Metaphone encode "Starbucks"