Can I use Scala Macros to internalise an external DSL? - scala

I would like to implement an external DSL such as SQL in Scala using Macros. I have already seen papers on how to implement internal DSLs with Scala. Also, I've recently written an article about how this can be done in Java, myself.
Now, internal DSLs always feel a bit clumsy as they have to be implemented and used in the host language (e.g. Scala) and adhere to the host language's syntax constraints. That's why I'm hoping that Scala Macros will allow to internalise an external DSL without any such constraints. However, I don't fully understand Scala Macros and how far I can go with them. I've seen that SLICK and also a much less-known library called sqltyped have started using Macros, but SLICK uses a "Scalaesque" syntax for querying, which isn't really SQL, whereas sqltyped uses Macros to parse SQL strings (which can be done without Macros, too). Also, the various examples given on the Scala website are too trivial for what I'm trying to do
My question is:
Given an example external DSL defined as some BNF grammar like this:
MyGrammar ::= (
'SOME-KEYWORD' 'OPTION'?
(
( 'CHOICE-1' 'ARG-1'+ )
| ( 'CHOICE-2' 'ARG-2' )
)
)
Can I implement the above grammar using Scala Macros to allow for client programs like this? Or are Scala Macros not powerful enough to implement such a DSL?
// This function would take a Scala compile-checked argument and produce an AST
// of some sort, that I can further process
def evaluate(args: MyGrammar): MyGrammarEvaluated = ...
// These expressions produce a valid result, as the argument is valid according
// to my grammar
val result1 = evaluate(SOME-KEYWORD CHOICE-1 ARG-1 ARG-1)
val result2 = evaluate(SOME-KEYWORD CHOICE-2 ARG-2)
val result3 = evaluate(SOME-KEYWORD OPTION CHOICE-1 ARG-1 ARG-1)
val result4 = evaluate(SOME-KEYWORD OPTION CHOICE-2 ARG-2)
// These expressions produce a compilation error, as the argument is invalid
// according to my grammar
val result5 = evaluate(SOME-KEYWORD CHOICE-1)
val result6 = evaluate(SOME-KEYWORD CHOICE-2 ARG-2 ARG-2)
Note, I'm not interested in solutions that parse strings, the way sqltyped does

It's been some time since this question was answered by paradigmatic, but I've just stumbled upon it and thought it's worth extending.
An internalized DSL must indeed be valid Scala code with all the names defined before macro expansion, however one can overcome this restriction with a carefully designed syntax and Dynamics.
Let's say we wanted to create a simple, silly DSL allowing us to introduce people in a classy way. It might look like this:
people {
introduce John please
introduce Frank and Lilly please
}
We would like to translate (as part of compilation) the above code to an object (of a class derived for example from class People) containing definitions of fields of type Person for every introduced person - something like this:
new People {
val john: Person = new Person("John")
val frank: Person = new Person("Frank")
val lilly: Person = new Person("Lilly")
}
To make it possible we need to define some artificial objects and classes having two purposes: defining grammar (somewhat...) and tricking the compiler into accepting undefined names (like John or Lilly).
import scala.language.dynamics
trait AllowedAfterName
object and extends Dynamic with AllowedAfterName {
def applyDynamic(personName: String)(arg: AllowedAfterName): AllowedAfterName = this
}
object please extends AllowedAfterName
object introduce extends Dynamic {
def applyDynamic(personName: String)(arg: AllowedAfterName): and.type = and
}
These dummy definitions make our DSL code legal - the compiler translates it to the below code before proceeding to macro expansion:
people {
introduce.applyDynamic("John")(please)
introduce.applyDynamic("Frank")(and).applyDynamic("Lilly")(please)
}
Do we need this ugly and seemingly redundant please? One could probably come up with a nicer syntax, for example using Scala's postfix operator notation (language.postfixOps), but that gets tricky due to semicolon inference (you can try it yourself in the REPL console or IntelliJ's Scala Worksheet). It's easiest to just interlace keywords with undefined names.
As we've made the syntax legal, we can process the block with a macro:
def people[A](block: A): People = macro Macros.impl[A]
class Macros(val c: whitebox.Context) {
import c.universe._
def impl[A](block: c.Tree) = {
val introductions = block.children
def getNames(t: c.Tree): List[String] = t match {
case q"applyDynamic($name)(and).$rest" =>
name :: getNames(q"$rest")
case q"applyDynamic($name)(please)" =>
List(name)
}
val names = introductions flatMap getNames
val defs = names map { n =>
val varName = TermName(n.toLowerCase())
q"val $varName: Person = new Person($n)"
}
c.Expr[People](q"new People { ..$defs }")
}
}
The macro finds all the introduced names by pattern matching against the expanded dynamic calls and generates the desired output code. Notice that the macro must be whitebox in order to be allowed to return an expression of a type derived from the one declared in the signature.

I don't think so. The expression you pass to a macro must be a valid Scala expression and identifiers should be defined.

Related

Scala How to create an Array of defs/Objects and the call those defs in a foreach?

I have a bunch of Scala objects with def's that do a bunch of processing
Foo\CatProcessing (def processing)
Foo\DogProcessing (def processing)
Foo\BirdProcessing (def processing)
Then I have a my main def that will call all of the individual Foo\obj defProcessing. Passing in common parameter values and such
I am trying to put all the list of objects into an Array or List, and then do a 'Foreach' to loop through the list passing in the parameter values or such. ie
foreach(object in objList){
object.Processing(parametmers)
}
Coming from C#, I could do this via binders or the like, so who would I manage this is in Scala?
for (obj <- objList) {
obj.processing(parameters) // `object` is a reserved keyword in Scala
}
or
objList.foreach(obj => obj.processing(parameters))
They are actually the same thing, the former being "syntactic sugar" for the latter.
In the second case, you can bind the only parameter of the anonymous function passed to the foreach function to _, resulting in the following
objList.foreach(_.processing(parameters))
for comprehensions in Scala can be quite expressive and go beyond simple iteration, if you're curious you can read more about it here.
Since you are coming from C#, if by any chance you have had any exposure to LINQ you will find yourself at home with the Scala Collection API. The official documentation is quite extensive in this regard and you can read more about it here.
As it came up in the comments following my reply, you also need the objects you want to iterate to:
have a common type that
exposes the processing method
Alternatively, Scala allows to use structural typing but that relies on runtime reflection and it's unlikely something you really need or want in this case.
You can achieve it by having a common trait for your objects, as in the following example:
trait Processing {
def processing(): Unit
}
final class CatProcessing extends Processing {
def processing(): Unit = println("cat")
}
final class DogProcessing extends Processing {
def processing(): Unit = println("dog")
}
final class BirdProcessing extends Processing {
def processing(): Unit = println("bird")
}
val cat = new CatProcessing
val dog = new DogProcessing
val bird = new BirdProcessing
for (process <- List(cat, dog, bird)) {
process.processing()
}
You can run the code above and play around with it here on Scastie.
Using a Map instead, you can do it as such. (wonder if this works through other types of lists)
val test = Map("foobar" -> CatProcessing)
test.values.foreach(
(movie) => movie.processing(spark)
)

Building Dynamic Expressions with Scala3 macro

In Scala2's macro, you could dynamically construct expressions by using Context.parse and so on.
def generateExpr[T: c.WeakTypeTag](target: C#Symbol): C#Expr[T] = {
if (target.isModule) {
c.Expr[T](c.parse(target.fullName))
} else {
c.Expr[T] {
c.parse(
s"""new ${target.fullName}(${
val a = target.asClass.primaryConstructor.asMethod
val b = a.paramLists
target.asClass.primaryConstructor.asMethod.paramLists
.collect {
case curry if !curry.exists(_.isImplicit) =>
curry.map { param => s"inject[${param.typeSignature.toString}]" }.mkString(",")
}
.mkString(")(")
}) with MixIn"""
)
}
}
}
Even if the pre-composition syntax literal refers to a symbol that is not in the current classpath, the compilation will succeed if it can be resolved in the module that calls macro.
However, there is no such feature in the released Scala3.
My goal is to use
to refer to a symbol that is not in the classpath of a module using macro.
I want to use new $T(???) with Trait from the type information.
Are these no longer feasible?
=====
https://github.com/lampepfl/dotty/discussions/12590
As mentioned in this issue, the deployment of variable length parameters to Trees is also unknown. These will be relevant as dynamic AST construction.
Those are use cases that we explicitly did not want to support since they cause language fragmentation, with many possible dialects supported by macros. I am sure there are good use cases
for this, but overall I believe allowing this would be detrimental to the language ecosystem as a whole.
https://contributors.scala-lang.org/t/compatibility-required-for-migration-from-scala2-macro/5100/2?u=giiita

How to compile/eval a Scala expression at runtime?

New to Scala and looking for pointers to an idiomatic solution, if there is one.
I'd like to have arbitrary user-supplied Scala functions (which are allowed to reference functions/classes I have defined in my code) applied to some data.
For example: I have foo(s: String): String and bar(s: String): String functions defined in my myprog.scala. The user runs my program like this:
$ scala myprog data.txt --func='(s: Str) => foo(bar(s)).reverse'
This would run line by line through the data file and emit the result of applying the user-specified function to that line.
For extra points, can I ensure that there are no side-effects in the user-defined function? If not, can I restrict the function to use only a restricted subset of functions (which I can assure to be safe)?
#kenjiyoshida has a nice gist that shows how to eval Scala code. Note that when using Eval from that gist, not specifying a return value will result in a runtime failure when Scala defaults to inferring Nothing.
scala> Eval("println(\"Hello\")")
Hello
java.lang.ClassCastException: scala.runtime.BoxedUnit cannot be cast to scala.runtime.Nothing$
... 42 elided
vs
scala> Eval[Unit]("println(\"Hello\")")
Hello
It nicely handles whatever's in scope as well.
object Thing {
val thing: Int = 5
}
object Eval {
def apply[A](string: String): A = {
val toolbox = currentMirror.mkToolBox()
val tree = toolbox.parse(string)
toolbox.eval(tree).asInstanceOf[A]
}
def fromFile[A](file: File): A =
apply(scala.io.Source.fromFile(file).mkString(""))
def fromFileName[A](file: String): A =
fromFile(new File(file))
}
object Thing2 {
val thing2 = Eval[Int]("Thing.thing") // 5
}
Twitter's util package used to have util-eval, but that seems to have been deprecated now (and also triggers a compiler bug when compiled).
As for the second part of your question, the answer seems to be no. Even if you disable default Predef and imports yourself, a user can always get to those functions with the fully qualified package name. You could perhaps use Scala's scala.tools.reflect.ToolBox to first parse your string and then compare against a whitelist, before passing to eval, but at that point things could get pretty hairy since you'll be manually writing code to sanitize the Scala AST (or at the very least reject dangerous input). It definitely doesn't seem to be an "idiomatic solution."
This should be possible by using the standard Java JSR 223 Scripting Engine
see https://issues.scala-lang.org/browse/SI-874
(also mentions using scala.tools.nsc.Interpreter but not sure this is still available)
import javax.script.*;
ScriptEngine e = new ScriptEngineManager().getEngineByName("scala");
e.getContext().setAttribute("label", new Integer(4), ScriptContext.ENGINE_SCOPE);
try {
engine.eval("println(2+label)");
} catch (ScriptException ex) {
ex.printStackTrace();
}

Why do you need Arbitraries in scalacheck?

I wonder why Arbitrary is needed because automated property testing requires property definition, like
val prop = forAll(v: T => check that property holds for v)
and value v generator. The user guide says that you can create custom generators for custom types (a generator for trees is exemplified). Yet, it does not explain why do you need arbitraries on top of that.
Here is a piece of manual
implicit lazy val arbBool: Arbitrary[Boolean] = Arbitrary(oneOf(true, false))
To get support for your own type T you need to define an implicit def
or val of type Arbitrary[T]. Use the factory method Arbitrary(...) to
create the Arbitrary instance. This method takes one parameter of type
Gen[T] and returns an instance of Arbitrary[T].
It clearly says that we need Arbitrary on top of Gen. Justification for arbitrary is not satisfactory, though
The arbitrary generator is the generator used by ScalaCheck when it
generates values for property parameters.
IMO, to use the generators, you need to import them rather than wrapping them into arbitraries! Otherwise, one can argue that we need to wrap arbitraries also into something else to make them usable (and so on ad infinitum wrapping the wrappers endlessly).
You can also explain how does arbitrary[Int] convert argument type into generator. It is very curious and I feel that these are related questions.
forAll { v: T => ... } is implemented with the help of Scala implicits. That means that the generator for the type T is found implicitly instead of being explicitly specified by the caller.
Scala implicits are convenient, but they can also be troublesome if you're not sure what implicit values or conversions currently are in scope. By using a specific type (Arbitrary) for doing implicit lookups, ScalaCheck tries to constrain the negative impacts of using implicits (this use also makes it similar to Haskell typeclasses that are familiar for some users).
So, you are entirely correct that Arbitrary is not really needed. The same effect could have been achieved through implicit Gen[T] values, arguably with a bit more implicit scoping confusion.
As an end-user, you should think of Arbitrary[T] as the default generator for the type T. You can (through scoping) define and use multiple Arbitrary[T] instances, but I wouldn't recommend it. Instead, just skip Arbitrary and specify your generators explicitly:
val myGen1: Gen[T] = ...
val mygen2: Gen[T] = ...
val prop1 = forAll(myGen1) { t => ... }
val prop2 = forAll(myGen2) { t => ... }
arbitrary[Int] works just like forAll { n: Int => ... }, it just looks up the implicit Arbitrary[Int] instance and uses its generator. The implementation is simple:
def arbitrary[T](implicit a: Arbitrary[T]): Gen[T] = a.arbitrary
The implementation of Arbitrary might also be helpful here:
sealed abstract class Arbitrary[T] {
val arbitrary: Gen[T]
}
ScalaCheck has been ported from the Haskell QuickCheck library. In Haskell type-classes only allow one instance for a given type, forcing you into this sort of separation.
In Scala though, there isn't such a constraint and it would be possible to simplify the library. My guess is that, ScalaCheck being (initially written as) a 1-1 mapping of QuickCheck, makes it easier for Haskellers to jump into Scala :)
Here is the Haskell definition of Arbitrary
class Arbitrary a where
-- | A generator for values of the given type.
arbitrary :: Gen a
And Gen
newtype Gen a
As you can see they have a very different semantic, Arbitrary being a type class, and Gen a wrapper with a bunch of combinators to build them.
I agree that the argument of "limiting the scope through semantic" is a bit vague and does not seem to be taken seriously when it comes to organizing the code: the Arbitrary class sometimes simply delegates to Gen instances as in
/** Arbirtrary instance of Calendar */
implicit lazy val arbCalendar: Arbitrary[java.util.Calendar] =
Arbitrary(Gen.calendar)
and sometimes defines its own generator
/** Arbitrary BigInt */
implicit lazy val arbBigInt: Arbitrary[BigInt] = {
val long: Gen[Long] =
Gen.choose(Long.MinValue, Long.MaxValue).map(x => if (x == 0) 1L else x)
val gen1: Gen[BigInt] = for { x <- long } yield BigInt(x)
/* ... */
Arbitrary(frequency((5, gen0), (5, gen1), (4, gen2), (3, gen3), (2, gen4)))
}
So in effect this leads to code duplication (each default Gen being mirrored by an Arbitrary) and some confusion (why isn't Arbitrary[BigInt] not wrapping a default Gen[BigInt]?).
My reading of that is that you might need to have multiple instances of Gen, so Arbitrary is used to "flag" the one that you want ScalaCheck to use?

Is it possible to simplify Scala method arguments declaration using macros?

Methods are often declared with obvious parameter names, e.g.
def myMethod(s: String, image: BufferedImage, mesh: Mesh) { ... }
Parameter names correspond to parameter types.
1) "s" is often used for String
2) "i" for Int
3) lowercased class name for one word named classes (Mesh -> mesh)
4) lowercased last word from class name for long class names (BufferedImage -> image)
(Of course, it would not be convenient for ALL methods and arguments. Of course, somebody would prefer other rules…)
Scala macros are intended to generate some expressions in code. I would like to write some specific macros to convert to correct Scala expressions something like this:
// "arguments interpolation" style
// like string interpolation
def myMethod s[String, BufferedImage, Mesh]
{ /* code using vars "s", "image", "mesh" */ }
// or even better:
mydef myMethod[String, BufferedImage, Mesh]
{ /* code using vars "s", "image", "mesh" */ }
Is it possible?
Currently it is not possible and probably it will never be. Macros can not introduce their own syntax - they must be represented through valid Scala code (which can be executed at compile time) and, too, they must generate valid Scala code (better say a valid Scala AST).
Both of your shown examples are not valid Scala code, thus Macros can not handle them. Nevertheless, the current nightly build of Macro Paradise includes untyped macros. They allow to write Scala code which is typechecked after they are expanded, this means it is possible to write:
forM({i = 0; i < 10; i += 1}) {
println(i)
}
Notice, that the curly braces inside of the first parameter list are needed because, although the code is not typechecked when one writes it, it must represent a valid Scala AST.
The implementation of this macro looks like this:
def forM(header: _)(body: _) = macro __forM
def __forM(c: Context)(header: c.Tree)(body: c.Tree): c.Tree = {
import c.universe._
header match {
case Block(
List(
Assign(Ident(TermName(name)), Literal(Constant(start))),
Apply(Select(Ident(TermName(name2)), TermName(comparison)), List(Literal(Constant(end))))
),
Apply(Select(Ident(TermName(name3)), TermName(incrementation)), List(Literal(Constant(inc))))
) =>
// here one can generate the behavior of the loop
// but omit full implementation for clarity now ...
}
}
Instead of an already typechecked expression, the macro expects only a tree, that is typechecked after the expansion. The method call itself expects two parameter lists, whose parameter types can be delayed after the expansion phase if one uses an underscore.
Currently there is a little bit of documentation available but because it is extremely beta a lot of things will probably change in future.
With type macros it is possible to write something like this:
object O extends M {
// traverse the body of O to find what you want
}
type M(x: _) = macro __M
def __M(c: Context)(x: c.Tree): c.Tree = {
// omit full implementation for clarity ...
}
This is nice in order to delay the typechecking of the whole body because it allows to to cool things...
Macros that can change Scalas syntax are not planned at the moment and are probably not a good idea. I can't say if they will happen one day only future can tell us this.
Aside from the "why" (no really, why do you want to do that?), the answer is no, because as far as I know macros cannot (in their current state) generate methods or types, only expressions.