How do you get InputColumn names from the Model? - pyspark

For example, take OneHotEncoderModel, but the same applies to anything in the pyspark.ml.feature package. When you use OneHotEncoderEstimator you have the option to set the inputCols; in fact, you must pass inputCols and outputCols to the constructor.
After you create the corresponding model from the estimator, you cannot retrieve the value of inputCols anymore. There is no method like getInputCols() to give you that from the model. If you use getParam("inputCols"), it just gives you the Param description, not its value.
If you look at the serialized model (the metadata file), the value for this param (inputCols) is actually written out. See the example below:
{"class":"org.apache.spark.ml.feature.OneHotEncoderModel","timestamp":1548215172466,"sparkVersion":"2.4.0","uid":"OneHotEncoderEstimator_c5fcbebe4045","paramMap":{"inputCols":["workclass-tmp"],"outputCols":["workclass-encoded"]},"defaultParamMap":{"handleInvalid":"error","dropLast":true}}
However I'm looking for a way to get that from the API.

Correction to my earlier answer: the right method is called getOrDefault. For instance:
model.getOrDefault("inputCols")
There is also this undocumented way of getting at those values:
model._paramMap[model.inputCols]
or
model._paramMap[model.params["inputCols"]]
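Putting it together, a minimal sketch (the DataFrame df and the column names are made up for illustration; this assumes Spark 2.4, where the estimator is still called OneHotEncoderEstimator, renamed to OneHotEncoder in Spark 3.x):
from pyspark.ml.feature import OneHotEncoderEstimator

encoder = OneHotEncoderEstimator(inputCols=["workclass-tmp"],
                                 outputCols=["workclass-encoded"])
model = encoder.fit(df)  # df is assumed to contain the numeric index column "workclass-tmp"

model.getOrDefault("inputCols")  # ['workclass-tmp']
model.getParam("inputCols")      # only the Param description, not its value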

Related

Apache AGE - Creating Functions With Multiple Parameters

I was looking inside the create_vlabel function and noted that, to get the graph_name and label_name, the code uses graph_name = PG_GETARG_NAME(0) and label_name = PG_GETARG_NAME(1). Since these two variables are also passed as parameters, I was thinking that, if I wanted to add one more parameter to this function, I would need to use PG_GETARG_NAME(2) to get that parameter and use it in the function's logic. Is my assumption correct, or do I need to do more tweaks?
You are correct, but you also need to change the function signature in the "age--1.2.0.sql" file, updating the arguments:
CREATE FUNCTION ag_catalog.create_vlabel(graph_name name, label_name name, new_argument new_argument_type)
RETURNS void
LANGUAGE c
AS 'MODULE_PATHNAME';
Note that all arguments arrive as Datum values, and PG_GETARG_NAME converts the Datum at the given index to a Name. If you need an argument as an int32, for example, you should use PG_GETARG_INT32(index_of_the_argument); for C strings, PG_GETARG_CSTRING(n); and so on.
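For illustration, a hedged sketch of what the C side could look like with a third argument (the argument name and its use are hypothetical, not actual AGE code):
#include "postgres.h"
#include "fmgr.h"

PG_FUNCTION_INFO_V1(create_vlabel);

Datum create_vlabel(PG_FUNCTION_ARGS)
{
    Name graph_name   = PG_GETARG_NAME(0);
    Name label_name   = PG_GETARG_NAME(1);
    Name new_argument = PG_GETARG_NAME(2);  /* the newly added third parameter */

    /* ... existing create_vlabel logic, extended to use new_argument ... */

    PG_RETURN_VOID();
}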
Yes, your assumption is correct. If you want to add an additional parameter to the create_vlabel function in PostgreSQL, you can retrieve the value of the third argument using PG_GETARG_NAME(2). Keep in mind that you may need to make additional modifications to the function's logic to handle the new parameter correctly.
The answers given by Fahad Zaheer and Marco Souza are correct, but you can also create a VARIADIC function, which lets you take any number of arguments; one drawback is that you have to check the argument types yourself. You can find more information here. You can also check the many Apache AGE functions made this way, e.g. agtype_to_int2.
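For reference, a rough sketch of what such a variadic C function could look like (the function name is illustrative, not actual AGE code; with VARIADIC "any" you must inspect each argument's type yourself):
#include "postgres.h"
#include "fmgr.h"
#include "catalog/pg_type.h"

PG_FUNCTION_INFO_V1(my_variadic_func);

Datum my_variadic_func(PG_FUNCTION_ARGS)
{
    int nargs = PG_NARGS();  /* number of arguments actually passed */

    for (int i = 0; i < nargs; i++)
    {
        /* Ask the function-call machinery for the declared type of argument i */
        Oid argtype = get_fn_expr_argtype(fcinfo->flinfo, i);

        if (argtype == INT4OID)
        {
            int32 value = PG_GETARG_INT32(i);
            /* ... use value ... */
        }
        /* ... handle the other accepted types, or raise an error ... */
    }

    PG_RETURN_VOID();
}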

Losing path dependent type when extracting value from Try in scala

I'm working with scalax to generate a graph of my Spark operations, using a custom library of mine that generates the graph. Let me show a sample:
val DAGWithoutGet = createGraphFromOps(ops)
val DAGWithGet = createGraphFromOps(ops).get
The return type of DAGWithoutGet is
scala.util.Try[scalax.collection.Graph[typeA, scalax.collection.GraphEdge.DiEdge]],
and, for DAGWithGet, it is
scalax.collection.Graph[typeA, scalax.collection.GraphEdge.DiEdge].
Here, typeA is a project-related class representing a single Spark operation, not relevant for the context of this question. (For context only: what my custom library does is, essentially, generate a map of dependencies between those operations, creating a big Map object and calling Graph(myBigMap: _*) to generate the graph.)
As far as I know, calling .get at this point of my code or later should not make any difference, but that is not what I'm seeing.
Calling DAGWithoutGet.get.nodes has a return type of scalax.collection.Graph[typeA,DiEdge]#NodeSetT,
while calling DAGWithGet.nodes returns DAGWithGet.NodeSetT.
When I extract one of those nodes (using the .find method), I receive scalax.collection.Graph[typeA,DiEdge]#NodeT and DAGWithGet.NodeT types, respectively. Much to my dismay, even the methods available in each case are different - I cannot use pathTo (which happens to be what I want) or withSubgraph on the former, only on the latter.
My question, then, after this relatively complex example, is: what is going on here? Why does extracting the value from the Try at different moments lead to different types, one path-dependent and the other not? Or, if that reading isn't correct, what am I missing?
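For reference, a minimal sketch of the two situations (OpA stands in for typeA; the comments describe what the compiler infers):
import scala.util.Try
import scalax.collection.Graph
import scalax.collection.GraphEdge.DiEdge

val tryDag: Try[Graph[OpA, DiEdge]] = createGraphFromOps(ops)

val dag = tryDag.get       // `dag` is a stable identifier
val n1  = dag.nodes        // inferred as dag.NodeSetT (path-dependent)

val n2  = tryDag.get.nodes // `tryDag.get` is a method call, not a stable path,
                           // so only the projection Graph[OpA, DiEdge]#NodeSetT fits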

Extract Types/Classnames from flat Modelica code

I was wondering whether there is already a way to extract, from flat Modelica code, all variables AND their corresponding types (class names, respectively).
For example:
Given an extract from a flattened Modelica model:
constant Integer nSurfaces = 8;
constant Integer construction1.nLayers(min = 1.0) = 2 "Number of layers of the construction";
parameter Modelica.SIunits.Length construction1.thickness[construction1.nLayers]= {0.2, 0.1} "Thickness of each construction layer";
Here, the wanted output would be something like:
nSurfaces, Integer, constant;
construction1.nLayers, Integer, constant;
construction1.thickness[construction1.nLayers], Modelica.SIunits.Length, parameter
Ideally, for construction1.thickness there would be two lines (one per element, i.e. construction1.nLayers lines).
I know that it is possible to get a list of the used variables from the dsin.txt that is produced while translating a model. But so far I have not found an existing way to get the corresponding types. And I really would like to avoid writing my own parser :-).
You could try to generate the file modelDescription.xml as defined by the FMI standard. It contains a ton of information and XML should be easier to parse, e.g. python has a couple of xml parsing/reading packages.
If you are using Dymola you just set the flag Advanced.FMI.GenerateModelDescriptionInterface2 = true to generate the model description file.
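As a rough sketch, pulling names, types and variability out of an FMI 2.0 modelDescription.xml with Python could look like this (the file path is an assumption; declaredType is only present when the variable uses a declared type such as Modelica.SIunits.Length):
import xml.etree.ElementTree as ET

root = ET.parse("modelDescription.xml").getroot()
for sv in root.iter("ScalarVariable"):
    for child in sv:
        if child.tag in ("Real", "Integer", "Boolean", "String", "Enumeration"):
            # Fall back to the base type tag when no declaredType is given
            type_name = child.get("declaredType", child.tag)
            print(sv.get("name"), type_name, sv.get("variability", ""), sep=", ")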
A second idea could be to let the compiler/tool parse the Modelica file for you, since they need to do that anyway; try searching for AST (abstract syntax tree). In Dymola, this is available through the ModelManagement library, and also through the Python interface.
A third idea could be to use one of the available Modelica parsers, e.g. have a look at:
https://github.com/lbl-srg/modelica-json
https://hackage.haskell.org/package/modelicaparser
https://github.com/xie-dongping/modparc
https://github.com/pymoca/pymoca
https://github.com/pymola/pymola/tree/master/src/pymola
Fourth, if all of that does not work, you still do not have to write a full parser: you could use ANTLR with an existing grammar file (look for e.g. modelica.g4).

Tell IPython to use an object's `__str__` instead of `__repr__` for output

By default, when IPython displays an object, it seems to use __repr__.
__repr__ is supposed to produce a unique string which could be used to reconstruct an object, given the right environment.
This is distinct from __str__, which is supposed to produce human-readable output.
Now suppose we've written a particular class and we'd like IPython to produce human readable output by default (i.e. without explicitly calling print or __str__).
We don't want to fudge it by making our class's __repr__ do __str__'s job.
That would be breaking the rules.
Is there a way to tell IPython to invoke __str__ by default for a particular class?
This is certainly possible; you just need to implement the instance method _repr_pretty_(self, p, cycle). This is described in the documentation for IPython.lib.pretty. Its implementation could look something like this:
class MyObject:
    def _repr_pretty_(self, p, cycle):
        p.text(str(self) if not cycle else '...')
The p parameter is an instance of IPython.lib.pretty.PrettyPrinter, whose methods you should use to output the text representation of the object you're formatting. Usually you will use p.text(text) which just adds the given text verbatim to the formatted representation, but you can do things like starting and ending groups if your class represents a collection.
The cycle parameter is a boolean that indicates whether a reference cycle is detected - that is, whether you're trying to format the object twice in the same call stack (which leads to an infinite loop). It may or may not be necessary to consider it depending on what kind of object you're using, but it doesn't hurt.
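For instance, with a hypothetical Point class, the effect in an IPython session would look like this:
In [1]: class Point:
   ...:     def __init__(self, x, y):
   ...:         self.x, self.y = x, y
   ...:     def __str__(self):
   ...:         return f"({self.x}, {self.y})"
   ...:     def _repr_pretty_(self, p, cycle):
   ...:         p.text(str(self) if not cycle else '...')

In [2]: Point(1, 2)
Out[2]: (1, 2)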
As a bonus, if you want to do this for a class whose code you don't have access to (or, more accurately, don't want to) modify, or if you just want to make a temporary change for testing, you can use the IPython display formatter's for_type method, as shown in this example of customizing int display. In your case, you would use
get_ipython().display_formatter.formatters['text/plain'].for_type(
    MyObject,
    lambda obj, p, cycle: p.text(str(obj) if not cycle else '...')
)
with MyObject of course representing the type you want to customize the printing of. Note that the lambda function carries the same signature as _repr_pretty_, and works the same way.

Using module annotations as method attributes

So here's an interesting question: when I look at the documentation for module attributes in Elixir, i.e. http://elixir-lang.org/getting-started/module-attributes.html, it mentions at the bottom that they can be used as method annotations, as in ExUnit.
Unfortunately there is basically no information on how to achieve this, and looking through the ExUnit code has just got me lost. It seems like I would need to determine the method closest to the attribute to say that they are associated in some way (I could be wrong though).
Any idea where I might look to learn about this?
It works like this. Look at the source code of ExUnit.Case.
First, look into the __using__ macro, since it is invoked first when you use the module in a test case. In particular, note here:
Enum.each [:ex_unit_tests, :tag, :describetag, :moduletag, :ex_unit_registered],
  &Module.register_attribute(__MODULE__, &1, accumulate: true)
This registers @tag and a bunch more attributes as accumulated. Read the docs of Module.register_attribute/3, and you will see this means that every time the attribute is set, the value gets appended to the list of previously set values.
Then note the test/3 macro, particularly here:
quote bind_quoted: [var: var, contents: contents, message: message] do
  name = ExUnit.Case.register_test(__ENV__, :test, message, [])
  def unquote(name)(unquote(var)), do: unquote(contents)
end
Note the call to ExUnit.Case.register_test/4. Looking at it, especially here:
tag = Module.delete_attribute(mod, :tag)
It fetches the tags accumulated up to this point and deletes them. Then, having the tags and the name of the test, it invokes (here)
test = %ExUnit.Test{name: name, case: mod, tags: tags}
Module.put_attribute(mod, :ex_unit_tests, test)
which saves the test, along with its tags, inside another attribute.
And at last, note here:
@doc false
defmacro __before_compile__(_) do
  quote do
    def __ex_unit__(:case) do
      %ExUnit.TestCase{name: __MODULE__, tests: @ex_unit_tests}
    end
  end
end
The function __ex_unit__/1 is called in ExUnit.Runner.run_case/3 to get the information about the tests inside each case.
You see the point? Use an accumulated attribute; inside your macro, call a function that always gets the current value of the attribute and clears it; then do whatever you want with that value, because you know it always belongs to the place where the macro was called.
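To make the pattern concrete, here is a minimal, self-contained sketch of the same mechanism (module, attribute and function names are invented for illustration; this is not ExUnit code):
defmodule Annotated do
  defmacro __using__(_opts) do
    quote do
      import Annotated, only: [deftagged: 2]
      @before_compile Annotated
      # Accumulate values instead of overwriting them on every `@label ...`
      Module.register_attribute(__MODULE__, :label, accumulate: true)
      Module.register_attribute(__MODULE__, :tagged, accumulate: true)
    end
  end

  defmacro deftagged(name, do: body) do
    quote do
      # Read the @label values accumulated above this definition, then clear them
      labels = Module.delete_attribute(__MODULE__, :label)
      Module.put_attribute(__MODULE__, :tagged, {unquote(name), labels})
      def unquote(name)(), do: unquote(body)
    end
  end

  defmacro __before_compile__(_env) do
    quote do
      def __annotations__, do: @tagged
    end
  end
end

defmodule Sample do
  use Annotated

  @label :slow
  deftagged :work do
    :done
  end
end

Sample.__annotations__()  # => [work: [:slow]]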
I hope it was clear enough, drop a comment if you need more explanation.
PS. I just read the source code to find this out. It was exciting to know how it works.