What is the official specification for PMML substring handling of long strings?

Given a substring definition of
<Apply function="substring">
  <FieldRef field="Input"/>
  <Constant>1</Constant>
  <Constant>2</Constant>
</Apply>
What does the official specification say will happen if the string "helloworld" is the input?
Is it not allowed, or should something else occur?

Please refer to the specification of the PMML built-in function "substring", which is based on the XQuery built-in function "substring".
In Java, your expression translates to the following: input.substring((1 - 1), (1 - 1) + 2), which for the input "helloworld" returns "he".
The important thing to notice is that in PMML and XQuery the indexing of strings starts from position 1 (not 0). Also, there is no such thing as StringIndexOutOfBoundsException when working with this function. If you are interested in obtaining the remainder of a string, then you can pass an arbitrarily large number as the length argument.
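As a rough illustration of the 1-based indexing and the absence of out-of-bounds errors, here is a small Python sketch of the XQuery fn:substring rules. It assumes PMML follows those rules unchanged, including rounding of non-integer arguments (ties round toward positive infinity); the function name pmml_substring is hypothetical, and NaN or infinite arguments are not handled.

```python
import math


def xquery_round(x: float) -> int:
    """Round like XPath fn:round: nearest integer, ties toward +infinity."""
    return math.floor(x + 0.5)


def pmml_substring(text: str, start: float, length: float) -> str:
    """Sketch of XQuery fn:substring semantics (1-based, never raises).

    Keeps the characters at 1-based positions p with
    round(start) <= p < round(start) + round(length).
    """
    first = xquery_round(start)
    end = first + xquery_round(length)  # exclusive end, still 1-based
    # Clip to the positions that actually exist: 1 .. len(text)
    lo = max(first, 1)
    hi = min(end, len(text) + 1)
    if hi <= lo:
        return ""  # nothing selected; no exception is raised
    return text[lo - 1:hi - 1]  # convert to Python's 0-based slicing


if __name__ == "__main__":
    print(pmml_substring("helloworld", 1, 2))
    print(pmml_substring("helloworld", 6, 1_000_000))
```

With this sketch, start 1 and length 2 select "he", while a very large length simply returns the remainder of the string ("world" when starting at position 6), matching the behaviour described above.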

Extract Types/Classnames from flat Modelica code

I was wondering whether there is already a way to extract all variables AND their corresponding types (i.e. class names) from flat Modelica code.
For example, given this extract from a flattened Modelica model:
constant Integer nSurfaces = 8;
constant Integer construction1.nLayers(min = 1.0) = 2 "Number of layers of the construction";
parameter Modelica.SIunits.Length construction1.thickness[construction1.nLayers]= {0.2, 0.1} "Thickness of each construction layer";
Here, the desired output would be something like:
nSurfaces, Integer, constant;
construction1.nLayers, Integer, constant;
construction1.thickness[construction1.nLayers], Modelica.SIunits.Length, parameter;
Ideally, construction1.thickness would produce two lines (one per layer, since construction1.nLayers = 2).
I know that it is possible to get a list of the variables used from dsin.txt, which is produced when a model is translated. But so far I have not found an existing way to get the corresponding types, and I would really like to avoid writing my own parser :-).
You could try generating the file modelDescription.xml, as defined by the FMI standard. It contains a lot of information, and XML should be easier to parse; Python, for example, has a couple of XML parsing packages.
If you are using Dymola, you just set the flag Advanced.FMI.GenerateModelDescriptionInterface2 = true to generate the model description file.
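A minimal Python sketch of the XML route, using the standard-library xml.etree.ElementTree module. It assumes an FMI 2.0 modelDescription.xml, in which each variable is a ScalarVariable element with causality and variability attributes and one child element (Real, Integer, ...) that may carry the Modelica type name in its optional declaredType attribute. SAMPLE is a hand-written, trimmed excerpt for illustration only (not real Dymola output, and not schema-valid), and whether constants such as nSurfaces are exported at all depends on the tool. The helper list_variables is hypothetical.

```python
import xml.etree.ElementTree as ET

# Child elements of an FMI 2.0 ScalarVariable that carry the base type
BASE_TYPES = ("Real", "Integer", "Boolean", "String", "Enumeration")

# Hand-written, trimmed excerpt (assumption: mirrors the question's model)
SAMPLE = """<fmiModelDescription fmiVersion="2.0" modelName="Example" guid="{0}">
  <ModelVariables>
    <ScalarVariable name="nSurfaces" valueReference="1" causality="local" variability="constant">
      <Integer start="8"/>
    </ScalarVariable>
    <ScalarVariable name="construction1.nLayers" valueReference="2" causality="local" variability="constant">
      <Integer start="2"/>
    </ScalarVariable>
    <ScalarVariable name="construction1.thickness[1]" valueReference="3" causality="parameter" variability="fixed">
      <Real declaredType="Modelica.SIunits.Length" start="0.2"/>
    </ScalarVariable>
    <ScalarVariable name="construction1.thickness[2]" valueReference="4" causality="parameter" variability="fixed">
      <Real declaredType="Modelica.SIunits.Length" start="0.1"/>
    </ScalarVariable>
  </ModelVariables>
</fmiModelDescription>"""


def list_variables(xml_text: str):
    """Return (name, type, causality, variability) for each ScalarVariable."""
    root = ET.fromstring(xml_text)
    rows = []
    for var in root.iter("ScalarVariable"):
        type_elem = next((child for child in var if child.tag in BASE_TYPES), None)
        if type_elem is None:
            continue  # unexpected content: skip rather than guess
        # Prefer the Modelica type name (declaredType) over the base type
        type_name = type_elem.get("declaredType", type_elem.tag)
        rows.append((
            var.get("name"),
            type_name,
            var.get("causality", "local"),         # FMI 2.0 default
            var.get("variability", "continuous"),  # FMI 2.0 default
        ))
    return rows


if __name__ == "__main__":
    for name, type_name, causality, variability in list_variables(SAMPLE):
        print(f"{name}, {type_name}, {causality}, {variability}")
```

Because FMI 2.0 only has scalar variables, arrays such as construction1.thickness typically appear once per element, which matches the "two lines" wish in the question; variability="constant" and causality="parameter" roughly correspond to the constant/parameter labels asked for.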
A second idea is to let the compiler/tool parse the Modelica file for you, since it needs to do that anyway; try searching for AST (abstract syntax tree). In Dymola, this is available through the ModelManagement library, and also through the Python interface.
A third idea is to use one of the available Modelica parsers; for example, have a look at:
https://github.com/lbl-srg/modelica-json
https://hackage.haskell.org/package/modelicaparser
https://github.com/xie-dongping/modparc
https://github.com/pymoca/pymoca
https://github.com/pymola/pymola/tree/master/src/pymola
Fourth, if none of that works, you still do not have to write a full parser: you could use ANTLR with an existing grammar file (look for e.g. modelica.g4).

How does the Scalding DSL translate into regular Scala code?

Please help me find out how the Scalding DSL translates into regular Scala code.
https://github.com/twitter/scalding/wiki/Fields-based-API-Reference#sortBy
For example:
val fasterBirds = birds.map('speed -> 'doubledSpeed) { speed : Int => speed * 2 }
Questions:
What conventions do I need to follow to add my own functions to Scalding's map, reduce, groupBy, sort and scanLeft?
How does Scalding translate expressions on fields like 'inpFld -> 'outFld into Scala code?
What data structures/functions does the Scalding translator create? Where can I find them in the Scalding source code?
Thanks!
That IS regular Scala code. One strength of Scala lies in its extensibility: its syntax lets the programmer extend the language to create domain-specific languages. This is especially helpful when wrapping underlying libraries.
A Scala DSL like this isn't translated so much as it lets you defer applying code until the appropriate time. The tick character (') marks the following identifier as a Symbol, a built-in data type. The -> operator (provided by ArrowAssoc in Predef) simply builds a pair, so 'speed -> 'doubledSpeed means the same as ('speed, 'doubledSpeed); visually, though, it conveys the concept of "translation", or "from this to that".
The DSL you are looking at doesn't create special data structures, although it does create what looks like a functor: the block { speed : Int => speed * 2 } is an anonymous function, which the Java Virtual Machine sees as a Function1[Int, Int] instance whose apply method takes its argument and returns the result calculated by the provided code.

Specification and correct use of (boolean) URI matrix parameters (and making them optional when using CXF/JAXB)?

I was wondering whether the "proper" use of URI/URL matrix parameters was ever defined in a specification, such as an RFC or a W3C recommendation.
In particular, I just joined a project where we use matrix parameters and a Java framework to implement a REST service. One of the matrix parameters of our REST service is a boolean, much like ;sortByDate=true.
What bugs me about this one is that the Java framework we use apparently insists that boolean parameters are always passed in (i.e. you cannot make them optional or omit them, probably because they are converted to the primitive Java boolean type). I think that's a bit odd...
I have to double-check which framework we use tomorrow (I think it is JAXB), but in the meantime I wondered whether matrix parameters are defined in an official specification somewhere, and whether such a specification mentions boolean parameters.
So far I have found a hint (though no mention of boolean matrix parameters) in Appendix B.2.2 of the W3C's "HTML 4.01 Specification":
We recommend that HTTP server implementors, and in particular, CGI implementors support the use of ";" in place of "&" to save authors the trouble of escaping "&" characters in this manner.
And the "Web Application Description Language" (WADL) specification states:
Boolean matrix parameters are represented as: ';' name when value is true and are omitted from identifier when value is false
What I haven't found is "the" specification for matrix parameters. Is there one? Does it mention how boolean matrix parameters should be used? If not, is there an established best practice?
And, as a bonus question: can you omit boolean URL matrix parameters when using CXF (JAXB), or do you always have to specify them?
Cheers! :)
Update: We're using CXF (which apparently uses JAXB under the hood...)
RFC 3986 describes matrix parameters without explicitly naming them. Quoting https://www.rfc-editor.org/rfc/rfc3986#section-3.3:
For example, the semicolon (";") and equals ("=") reserved characters are often used to delimit parameters and parameter values applicable to that segment. The comma (",") reserved character is often used for similar purposes. For example, one URI producer might use a segment such as "name;v=1.1" to indicate a reference to version 1.1 of "name", whereas another might use a segment such as "name,1.1" to indicate the same.
I hope this helps.
I think that this answer does a good job of explaining the purpose of matrix parameters:
https://stackoverflow.com/a/5602678
You can use the Boolean wrapper class to support an optional boolean value. The values true and false will be mapped to the correct boolean values.
@MatrixParam("sortByDate") Boolean sortByDate
It will be null if the param is not present. Note that JAXB doesn't apply when dealing with JAX-RS parameters.
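For illustration only, here is a minimal Python sketch of splitting one path segment into RFC 3986-style parameters, treating a bare name as true per the WADL convention quoted in the question, and falling back to a default when a boolean is absent (similar in spirit to the null-when-missing Boolean wrapper). This is not CXF or JAX-RS code; parse_matrix_segment and matrix_bool are hypothetical helpers.

```python
from urllib.parse import unquote


def parse_matrix_segment(segment: str):
    """Split a path segment like "books;sortByDate=true;desc" into
    (name, params). Bare parameters follow the WADL boolean convention
    (present means true)."""
    name, *raw_params = segment.split(";")
    params = {}
    for raw in raw_params:
        if not raw:
            continue  # tolerate empty parameters such as "a;;b"
        key, sep, value = raw.partition("=")
        params[unquote(key)] = unquote(value) if sep else True
    return unquote(name), params


def matrix_bool(params: dict, key: str, default: bool = False) -> bool:
    """Read an optional boolean matrix parameter; absent means default."""
    value = params.get(key, default)
    if isinstance(value, bool):
        return value
    return value.lower() == "true"


if __name__ == "__main__":
    name, params = parse_matrix_segment("books;sortByDate=true;desc")
    print(name, params)
    print(matrix_bool(params, "sortByDate"), matrix_bool(params, "missing"))
```

The design choice here is that omission never raises: an absent parameter yields the caller's default, while both ";flag" and ";flag=true" read as true.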

Simplify a boolean expression in terms of variable occurrences

How can I simplify a given boolean expression with many variables (>10) so that the number of occurrences of each variable is minimized?
In my scenario, the value of a variable has to be considered ephemeral, that is, it has to be recomputed for each access (while still being static, of course). I therefore need to minimize the number of times a variable has to be evaluated before trying to solve the function.
Consider the function
f(A,B,C,D,E,F) = (ABC)+(ABCD)+(ABEF)
Recursively applying the distributive and absorption laws, one arrives at
f'(A,B,C,E,F) = AB(C+(EF))
I'm now wondering if there is an algorithm or method to solve this task in minimal runtime.
Using only Quine-McCluskey in the example above gives
f'(A,B,C,E,F) = (ABEF) + (ABC)
which is not optimal for my case. Is it safe to assume that simplifying with QM first and then using algebra as above to reduce further is optimal?
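To compare candidate forms, a small Python sketch can confirm that the three expressions above are equivalent over all 2^6 inputs and count how often each variable occurs. It does not search for an optimal form; it only verifies and measures candidates. The helper names (equivalent, occurrences) are hypothetical, and the counter assumes single-letter variable names as in the example.

```python
from itertools import product


def f(A, B, C, D, E, F):
    # Original: (ABC)+(ABCD)+(ABEF)
    return (A and B and C) or (A and B and C and D) or (A and B and E and F)


def f_qm(A, B, C, D, E, F):
    # Two-level Quine-McCluskey result: (ABEF)+(ABC)
    return (A and B and E and F) or (A and B and C)


def f_factored(A, B, C, D, E, F):
    # Multi-level form from the distributive/absorption laws: AB(C+(EF))
    return A and B and (C or (E and F))


def equivalent(g, h, n=6):
    """Exhaustively compare two n-input boolean functions."""
    return all(bool(g(*bits)) == bool(h(*bits))
               for bits in product([False, True], repeat=n))


def occurrences(expr: str) -> dict:
    """Count occurrences of each single-letter variable in an expression."""
    return {v: expr.count(v) for v in sorted({c for c in expr if c.isalpha()})}


if __name__ == "__main__":
    print(equivalent(f, f_qm), equivalent(f, f_factored))
    print(occurrences("(ABEF)+(ABC)"))
    print(occurrences("AB(C+(EF))"))
```

For this example the factored form uses every variable once, whereas the two-level QM result reads A and B twice, which is why a sum-of-products minimizer alone does not target this cost measure.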
I usually use Wolfram Alpha for this sort of thing.
Try Logic Friday 1
It features multi-level design of boolean circuits.
For your example, input and output look as follows:
You can use an online boolean expression calculator like https://www.dcode.fr/boolean-expressions-calculator
You can refer to the question "Any good boolean expression simplifiers out there?"; it will definitely help.

Finding info on Scala operators

I'm reading http://debasishg.blogspot.com/2008/04/external-dsls-made-easy-with-scala.html and I am trying to find info on the "<~" operator, for example:
def trans = "(" ~> repsep(trans_spec, ",") <~ ")"
My reasonable guess is that it has something to do with the product ("~") operator along with lists?
What does it do?
In the future, how do I look up operators like that? Googling "<~", for example, is no good.
EDIT:
Found the "<~" info in Scala combinator parsers - distinguish between number strings and variable strings
Question 2 remains
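Regarding Question 1: in Scala's parser combinators, p ~> q runs p and then q and keeps only q's result, while p <~ q keeps only p's result. So "(" ~> repsep(trans_spec, ",") <~ ")" yields just the list of trans_spec results, with the parentheses discarded. The following Python sketch mimics those semantics with hand-rolled combinators; it is an analogy, not the Scala library, and lit, keep_right, keep_left, repsep and word (standing in for trans_spec) are hypothetical names.

```python
# A parser is a function (text, pos) -> (value, new_pos), or None on failure.

def lit(s):
    """Match the literal string s."""
    def parse(text, pos):
        return (s, pos + len(s)) if text.startswith(s, pos) else None
    return parse


def keep_right(p, q):
    """Analog of Scala's p ~> q: run p then q, keep only q's result."""
    def parse(text, pos):
        r1 = p(text, pos)
        return None if r1 is None else q(text, r1[1])
    return parse


def keep_left(p, q):
    """Analog of Scala's p <~ q: run p then q, keep only p's result."""
    def parse(text, pos):
        r1 = p(text, pos)
        if r1 is None:
            return None
        r2 = q(text, r1[1])
        return None if r2 is None else (r1[0], r2[1])
    return parse


def repsep(p, sep):
    """Zero or more p separated by sep; returns the list of p's results."""
    def parse(text, pos):
        first = p(text, pos)
        if first is None:
            return [], pos
        values, pos = [first[0]], first[1]
        while True:
            s = sep(text, pos)
            if s is None:
                return values, pos
            nxt = p(text, s[1])
            if nxt is None:
                return values, pos  # do not consume a trailing separator
            values.append(nxt[0])
            pos = nxt[1]
    return parse


def word(text, pos):
    """Hypothetical stand-in for trans_spec: one alphanumeric word."""
    end = pos
    while end < len(text) and text[end].isalnum():
        end += 1
    return (text[pos:end], end) if end > pos else None


# "(" ~> repsep(trans_spec, ",") <~ ")"
trans = keep_left(keep_right(lit("("), repsep(word, lit(","))), lit(")"))

if __name__ == "__main__":
    print(trans("(a,b,c)", 0))
```

Only the list of words survives; the "(" and ")" are matched but their results are thrown away, which is exactly the role of ~> and <~ in the grammar from the blog post.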
On Question 2: unfortunately, that is one disadvantage of Scala allowing non-alphabetic characters in names; such operators are not easily found by search engines. Your best bet is simply to check the Scaladocs of whatever code is in scope.
Regarding Question 2, there is an upcoming (time frame unknown to me) addition to the ScalaDoc processor that will produce a cross-reference index, allowing you to look up method and field names and see which classes declare or define them.
You can get a preview of this (not integrated with the ScalaDocs, but useful nonetheless) here: ScalaDoc Name Index