Traversing a Z3Context using the z3.scala.dsl API - scala

I'm using the Scala^Z3 tool for a small library that (among other things) prints the constraints of a Z3Context in LaTeX format. While it's possible to traverse the Z3AST and convert the expressions to LaTeX by string comparison, it would be much nicer to use the object structure of the z3.scala.dsl package. Is there a way to obtain a z3.scala.dsl.Tree from a Z3AST?

It's true that the DSL is currently "write only", in that you can use it to create trees and ship them to Z3 but not to read them back.
The standard way to read Z3 trees is to use getASTKind and getDeclKind from Z3Context. The classes that represent the results are Z3ASTKind and Z3DeclKind respectively. (Since most trees are applications, the latter is where most of the information is).
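A rough sketch of such a traversal, producing LaTeX by pattern matching on the AST kinds instead of by string comparison. The constructor and operator names below (Z3AppAST, Z3NumeralIntAST, OpAnd, ...) are what recent ScalaZ3 versions use, but check the Z3ASTKind/Z3DeclKind definitions in your version:

import z3.scala._

object LatexPrinter {
  def toLatex(z3: Z3Context, ast: Z3AST): String = z3.getASTKind(ast) match {
    case Z3AppAST(decl, args) =>
      val sub = args.map(a => toLatex(z3, a))
      z3.getDeclKind(decl) match {
        case OpAnd => sub.mkString("(", " \\land ", ")")
        case OpOr  => sub.mkString("(", " \\lor ", ")")
        case OpNot => "\\neg " + sub.head
        case OpEq  => sub.mkString(" = ")
        case OpAdd => sub.mkString("(", " + ", ")")
        case _     =>  // fall back to the declaration itself (covers constants and uninterpreted functions)
          if (sub.isEmpty) decl.toString else decl.toString + sub.mkString("(", ", ", ")")
      }
    case Z3NumeralIntAST(Some(value)) => value.toString
    case _                            => ast.toString   // bound variables, quantifiers, ...
  }
}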

It looks like the way to do this is to create the original constraints using z3.scala.dsl, then add each one with Z3Context.assertCnstr(tree: Tree[BoolSort]). This way I keep the whole DSL tree for easy transformation to LaTeX. For some reason the examples on the Scala^Z3 website assemble the AST without using the DSL at all, so this alternative wasn't obvious.
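A sketch of that approach, using only the assertCnstr(tree: Tree[BoolSort]) overload quoted above; the bookkeeping class and the eventual toLatex rendering are your own code:

import z3.scala._
import z3.scala.dsl._

// Keep each DSL tree for pretty-printing and hand the same tree to Z3.
class TrackedConstraints(z3: Z3Context) {
  private var trees = List.empty[(String, Tree[BoolSort])]

  def add(name: String, tree: Tree[BoolSort]): Unit = {
    trees ::= (name -> tree)
    z3.assertCnstr(tree)   // the overload that accepts a dsl.Tree[BoolSort]
  }

  // Later: render every constraint from its DSL tree, e.g. all.map { case (n, t) => n -> toLatex(t) }
  def all: List[(String, Tree[BoolSort])] = trees.reverse
}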

Related

Is there a way to parse SMT-LIB2 strings through the CVC4 C++ API?

I have a program that can dynamically generate expressions in SMT-LIB format and I am trying to connect these expressions to CVC4 to test satisfiability and get the models. I am wondering if there is a convenient way to parse these strings through the CVC4 C++ API or if it would be best to just store the generated SMT-LIB code in a file and redirect input to the cvc4 executable.
A cursory look at their API doesn't reveal anything obvious, so I don't think they support this mode of operation. In general, loading such statements "on the fly" is tricky, since an expression by itself doesn't make much sense: You'd have to be in a context that has all the relevant sorts defined, along with all the definitions that your expressions rely on, including the selection of the proper logic. That is, for instance, why the corresponding function in z3 has extra arguments: https://z3prover.github.io/api/html/classz3_1_1context.html#af2b9bef14b4f338c7bdd79a1bb155a0f
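For comparison, the z3 counterpart referenced above can be driven from Scala through the Java bindings; the extra parameters carry exactly that surrounding context (sorts and declarations), and passing null is fine when the string declares everything it needs itself. A minimal sketch:

import com.microsoft.z3.Context

object ParseSmtLib {
  def main(args: Array[String]): Unit = {
    val ctx = new Context()
    // Sort names/sorts and declaration names/declarations would go in place of the nulls
    // if the expressions referred to symbols defined elsewhere.
    val assertions = ctx.parseSMTLIB2String(
      "(declare-const x Int) (assert (> x 0)) (assert (< x 10))",
      null, null, null, null)   // the parsed assertions (an array of BoolExpr in recent releases)
    assertions.foreach(println)
    ctx.close()
  }
}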
Having said that, your best bet might be to ask directly at https://github.com/CVC4/CVC4/issues to see if they have something similar.

Text Preprocessing in Spark-Scala

I want to apply a preprocessing phase to a large amount of text data in Spark with Scala, such as lemmatization, stop-word removal (using TF-IDF) and POS tagging. Is there any way to implement these in Spark/Scala?
For example, here is one sample of my data:
The perfect fit for my iPod photo. Great sound for a great price. I use it everywhere. it is very usefulness for me.
after preprocessing:
perfect fit iPod photo great sound great price use everywhere very useful
and the tokens have POS tags, e.g. (iPod,NN) (photo,NN)
There is a POS tagging library (sista, from the University of Arizona); is it applicable in Spark?
Anything is possible. The question is what YOUR preferred way of doing this would be.
For example, do you have a stop word dictionary that works for you (it could just simply be a Set), or would you want to run TF-IDF to automatically pick the stop words (note that this would require some supervision, such as picking the threshold at which the word would be considered a stop word). You can provide the dictionary, and Spark's MLLib already comes with TF-IDF.
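To make both routes concrete, here is a minimal sketch against the RDD-based MLlib API; the toy documents, the stop-word set and the idea of picking a cutoff are all placeholders for your own choices:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.feature.{HashingTF, IDF}
import org.apache.spark.rdd.RDD

object StopWordSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("preproc").setMaster("local[*]"))

    // Toy data standing in for your reviews, already tokenized.
    val docs: RDD[Seq[String]] = sc.parallelize(Seq(
      Seq("the", "perfect", "fit", "for", "my", "ipod", "photo"),
      Seq("great", "sound", "for", "a", "great", "price")))

    // Dictionary route: a plain Set is often all you need.
    val stopWords = Set("the", "for", "my", "a", "i", "it", "is")
    val filtered = docs.map(_.filterNot(stopWords))

    // TF-IDF route: terms that appear in most documents get a low IDF weight.
    // Note that HashingTF hashes terms, so mapping weights back to the original words
    // takes extra bookkeeping, and the cutoff below which a term counts as a stop word is yours to pick.
    val tf = new HashingTF().transform(docs)
    tf.cache()
    val tfidf = new IDF().fit(tf).transform(tf)

    filtered.collect().foreach(println)
    sc.stop()
  }
}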
The POS tags step is tricky. Most NLP libraries on the JVM (e.g. Stanford CoreNLP) don't implement java.io.Serializable, but you can perform the map step using them, e.g.
myRdd.map(functionToEmitPOSTags)
On the other hand, don't emit an RDD that contains non-serializable classes from that NLP library, since steps such as collect(), saveAsNewAPIHadoopFile, etc. will fail. Also to reduce headaches with serialization, use Kryo instead of the default Java serialization. There are numerous posts about this issue if you google around, but see here and here.
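A pattern that sidesteps most of this: turn on Kryo in the SparkConf, build the (non-serializable) tagger once per partition inside mapPartitions, and emit only plain serializable values such as (word, tag) strings. The Tagger trait and factory below are stand-ins for whichever NLP library you end up using:

import org.apache.spark.{SparkConf, SparkContext}

// Stand-in for a real tagger (Stanford CoreNLP, Epic, the Sista wrapper, ...).
trait Tagger { def tag(sentence: String): Seq[(String, String)] }
object NlpTaggerFactory {
  def create(): Tagger = new Tagger {
    def tag(sentence: String): Seq[(String, String)] =
      sentence.split("\\s+").toSeq.map(w => (w, "NN"))   // dummy tags, for illustration only
  }
}

object PosTagSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("pos-tagging")
      .setMaster("local[*]")
      // Kryo instead of the default Java serialization, as recommended above.
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    val sc = new SparkContext(conf)

    val reviews = sc.parallelize(Seq("The perfect fit for my iPod photo ."))

    val tagged = reviews.mapPartitions { sentences =>
      // Built lazily here, on the executor, so the non-serializable object is never shipped from the driver.
      val tagger = NlpTaggerFactory.create()
      // Emit only serializable data (Strings, tuples), never the library's own classes.
      sentences.map(s => tagger.tag(s).map { case (w, pos) => s"($w,$pos)" }.mkString(" "))
    }

    tagged.collect().foreach(println)
    sc.stop()
  }
}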
Once you figure out the serialization issues, you need to figure out which NLP library to use to generate the POS tags. There are plenty of those, e.g. Stanford CoreNLP, LingPipe and Mallet for Java, Epic for Scala, etc. Note that you can of course use the Java NLP libraries with Scala, including with wrappers such as the University of Arizona's Sista wrapper around Stanford CoreNLP, etc.
Also, why didn't your example lower-case the processed text? That's pretty much the first thing I would do. If you have special cases such as iPod, you could apply the lower-casing except in those cases. In general, though, I would lower-case everything. If you're removing punctuation, you should probably first split the text into sentences (split on the period using regex, etc.). If you're removing punctuation in general, that can of course be done using regex.
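For the lower-casing, sentence splitting and punctuation removal, plain Scala string handling is enough; the sketch below uses exactly the naive period-based split just mentioned:

val text = "The perfect fit for my iPod photo. Great sound for a great price."

// Naive sentence split on ., ! and ?, then lower-case and strip punctuation per sentence.
val sentences = text.split("""(?<=[.!?])\s+""").toSeq
val cleaned: Seq[Seq[String]] =
  sentences.map(_.toLowerCase.replaceAll("""\p{Punct}""", " ").split("""\s+""").toSeq.filter(_.nonEmpty))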
How deeply do you want to stem? For example, the Porter stemmer (there are implementations in every NLP library) stems so deeply that "universe" and "university" become the same resulting stem. Do you really want that? There are less aggressive stemmers out there, depending on your use case. Also, why use stemming if you can use lemmatization, i.e. splitting the word into the grammatical prefix, root and suffix (e.g. walked = walk (root) + ed (suffix)). The roots would then give you better results than stems in most cases. Most NLP libraries that I mentioned above do that.
Also, what's your distinction between a stop word and a non-useful word? For example, you removed the pronoun in the subject form "I" and the possessive form "my," but not the object form "me." I recommend picking up an NLP textbook like "Speech and Language Processing" by Jurafsky and Martin (for the ambitious), or just reading one of the engineering-centered books about NLP tools such as LingPipe for Java, NLTK for Python, etc., to get a good overview of the terminology, the steps in an NLP pipeline, etc.
There is no built-in NLP capability in Apache Spark. You would have to implement it for yourself, perhaps based on a non-distributed NLP library, as described in marekinfo's excellent answer.
I would suggest you take a look at Spark's ML Pipeline API. You may not get everything out of the box yet, but you can build your own capabilities and use the pipeline as a framework.
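With a recent Spark version, that DataFrame-based pipeline might look roughly like this (tokenization, stop-word removal and TF-IDF are built in; POS tagging and lemmatization would still be custom stages or a plain map as discussed above):

import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.{HashingTF, IDF, RegexTokenizer, StopWordsRemover}
import org.apache.spark.sql.SparkSession

object PipelineSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("nlp-pipeline").master("local[*]").getOrCreate()
    import spark.implicits._

    val df = Seq("The perfect fit for my iPod photo. Great sound for a great price.").toDF("text")

    val tokenizer = new RegexTokenizer().setInputCol("text").setOutputCol("tokens").setPattern("\\W+")
    val remover   = new StopWordsRemover().setInputCol("tokens").setOutputCol("filtered")
    val tf        = new HashingTF().setInputCol("filtered").setOutputCol("rawFeatures")
    val idf       = new IDF().setInputCol("rawFeatures").setOutputCol("features")

    val model = new Pipeline().setStages(Array(tokenizer, remover, tf, idf)).fit(df)
    model.transform(df).select("filtered", "features").show(truncate = false)

    spark.stop()
  }
}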

How is Perl useful as a metadata tool?

In The Pragmatic Programmer:
Normally, you can simply hide a third-party product behind a well-defined, abstract interface. In fact, we've always been able to do so on any project we've worked on. But suppose you couldn't isolate it that cleanly. What if you had to sprinkle certain statements liberally throughout the code? Put that requirement in metadata, and use some automatic mechanism, such as Aspects (see page 39) or Perl, to insert the necessary statements into the code itself.
Here the author is referring to Aspect Oriented Programming and Perl as tools that support "automatic mechanisms" for inserting metadata.
In my mind I envision some type of run-time injection of code. How does Perl allow for "automatic mechanisms" for inserting metadata?
Skip ahead to the section on Code Generators. The author provides a number of examples of processing input files to generate code, including this one:
Another example of melding environments using code generators happens when different programming languages are used in the same application. In order to communicate, each code base will need some information in common: data structures, message formats, and field names, for example. Rather than duplicate this information, use a code generator. Sometimes you can parse the information out of the source files of one language and use it to generate code in a second language. Often, though, it is simpler to express it in a simpler, language-neutral representation and generate the code for both languages, as shown in Figure 3.4 on the following page. Also see the answer to Exercise 13 on page 286 for an example of how to separate the parsing of the flat file representation from code generation.
The answer to Exercise 13 is a set of Perl programs used to generate C and Pascal data structures from a common input file.
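The mechanism doesn't depend on the generator being written in Perl. As a toy illustration of the Exercise 13 idea (in Scala here, with an invented one-field-per-line input format such as "id:int" or "name:string[30]"):

import scala.io.Source

// Read a language-neutral flat file and emit equivalent C and Pascal declarations.
object StructGen {
  def main(args: Array[String]): Unit = {
    val lines  = Source.fromFile(args(0)).getLines().filter(_.nonEmpty).toList
    val fields = lines.map { l => val Array(n, t) = l.split(":"); (n.trim, t.trim) }

    val cDecl = fields.map {
      case (n, "int")                        => s"  int $n;"
      case (n, t) if t.startsWith("string[") => s"  char $n[${len(t)}];"
    }.mkString("typedef struct {\n", "\n", "\n} Record;")

    val pascalDecl = fields.map {
      case (n, "int")                        => s"    $n : Integer;"
      case (n, t) if t.startsWith("string[") => s"    $n : String[${len(t)}];"
    }.mkString("type Record = record\n", "\n", "\n  end;")

    println(cDecl)
    println(pascalDecl)
  }

  private def len(t: String): String = t.stripPrefix("string[").stripSuffix("]")
}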

Is there any suitable way to generate a [jpg, png etc.] syntax diagram (and/or AST) directly from Scala Parser Combinators?

The only ways I am aware of aren't "direct":
converting to ANTLR format and using its own visualizer
VisualLangLab, which seems to require an entire mouse-click "rewrite"
implementing a converter myself (which would be fun, but time-consuming)
second link below
Related:
comparison
wrapper
a 3rd party attempt
The second link suggests debugging by adding an implicit method to the parsers:
// This goes inside a trait that mixes in scala.util.parsing.combinator.Parsers
// (e.g. your RegexParsers object), so that Parser and log are in scope.
implicit def toLogged(name: String) = new {
  def !!![T](p: Parser[T]): Parser[T] = log(p)(name)
}
Maybe an AST would be more feasible/useful, but the question remains similar.
I might have misunderstood your question.
Scala parser combinators are used to parse strings into instances of types that you can use (either custom or built-in). The result is a structure of Scala instances that you decide on; this could be anything.
You could create a parser that parses your arbitrary string into instances of a well-known Java structure, for example Ecore.
Without a use case it's hard to suggest the best road for your problem. Maybe Xtext can help you: http://www.eclipse.org/Xtext/. Xtext has quite a few built-in features; however, it's an Eclipse plugin and you might need something else.
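To make "the result is whatever structure you decide" concrete: a tiny arithmetic parser that builds a case-class AST, plus a renderer that walks the AST and emits Graphviz DOT, which is one pragmatic way to get a picture out of the combinators. The AST types and the DOT output are of course my own choices, not something the library provides:

import scala.util.parsing.combinator.RegexParsers

sealed trait Expr
case class Num(value: Int) extends Expr
case class Add(left: Expr, right: Expr) extends Expr
case class Mul(left: Expr, right: Expr) extends Expr

object ExprParser extends RegexParsers {
  def num: Parser[Expr]    = """\d+""".r ^^ (s => Num(s.toInt))
  def factor: Parser[Expr] = num | "(" ~> expr <~ ")"
  def term: Parser[Expr]   = factor ~ rep("*" ~> factor) ^^ { case f ~ fs => fs.foldLeft(f)((a, b) => Mul(a, b)) }
  def expr: Parser[Expr]   = term ~ rep("+" ~> term)     ^^ { case t ~ ts => ts.foldLeft(t)((a, b) => Add(a, b)) }
}

object ToDot {
  // Walk the AST and emit Graphviz DOT; pipe the output through `dot -Tpng` for an image.
  def apply(root: Expr): String = {
    var counter = 0
    val sb = new StringBuilder("digraph AST {\n")
    def go(e: Expr): String = {
      counter += 1
      val id = s"n$counter"
      def branch(label: String, l: Expr, r: Expr): Unit = {
        sb.append(s"  $id [label=\"$label\"];\n")
        val (li, ri) = (go(l), go(r))
        sb.append(s"  $id -> $li;\n  $id -> $ri;\n")
      }
      e match {
        case Num(v)    => sb.append(s"  $id [label=\"$v\"];\n")
        case Add(l, r) => branch("+", l, r)
        case Mul(l, r) => branch("*", l, r)
      }
      id
    }
    go(root)
    sb.append("}").toString
  }
}

Feeding ExprParser.parseAll(ExprParser.expr, "1 + 2 * (3 + 4)").get into ToDot and running the output through Graphviz then produces the AST as a PNG.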

Environment properties files in scala project

Just starting to learn Scala for a new project. I've got to the point where I would like to define different properties files for the different environments the app is going to run on, ideally in a similar way to Rails: very lightweight, just one properties file per environment, loaded based on its name. I don't really care if it's a Java properties file, YAML or Scala code.
In the spirit of not reinventing the wheel I've been looking to see if there is some accepted standard Scala way of doing this, but I can't find one. I've found a few similar but not identical questions here where people suggest using system properties in the startup script, but this feels like it would end up being a nightmare.
I could obviously implement it if needs be but feels like the sort of thing that should already exist. So - does it?
I'm using sbt if that makes a difference.
I know of Configgy. Also, Akka/Play 2.0 will be using Config, which looks nice too. See the blog post about the latter.
Basically, Configgy has been used for a while now, but has been deprecated, while Config will be all-new. However, having Config as the default Typesafe Stack configuration tool will probably make it the preferred tool for that pretty fast.
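For what it's worth, the usual per-environment setup with Config is one HOCON file per environment that includes a shared base file, selected at startup via the config.resource system property (so there is only one property, set once in the start script). A minimal sketch; the file and key names are just examples:

// build.sbt (the question mentions sbt):
//   libraryDependencies += "com.typesafe" % "config" % "1.4.2"

import com.typesafe.config.ConfigFactory

object AppConfig {
  // ConfigFactory.load() reads application.conf from the classpath by default;
  // starting the app with -Dconfig.resource=production.conf swaps in another file.
  private val config = ConfigFactory.load()

  val dbUrl: String  = config.getString("app.db.url")
  val poolSize: Int  = config.getInt("app.db.pool-size")
}

// src/main/resources/application.conf   (shared defaults)
//   app.db.url = "jdbc:h2:mem:dev"
//   app.db.pool-size = 5
//
// src/main/resources/production.conf
//   include "application"
//   app.db.url = "jdbc:postgresql://prod-host/app"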
I have written a Configgy replacement called Configrity. It can use different input formats (like YAML), it's immutable, supports functional patterns and uses type classes to automatically convert the values to the desired type.
I have written BeeConfig, a replacement for java.util.Properties, except that it is a Scala API and uses UTF-8 encoded configuration files. It supports string interpolation, chaining and a bunch of other features, but its main objective is simplicity.
Bitbucket | Blog post
Rick