CrateDB create custom analyzer

I'm trying to create a custom analyzer, following the syntax from https://crate.io/docs/crate/reference/en/latest/sql/statements/create-analyzer.html. However, when I attempt to create the following:
create analyzer FullAddressAnalyzer (TOKENIZER ngram with (min_gram = 2, max_gram = 10))
I get the error SQLActionException[SQLParseException: tokenizer name 'ngram' is reserved]. This baffles me, as the documentation explains that you can use parameters for ngram, so the error doesn't seem to make any sense.

Seems like the CrateDB SQL reference documentation is not correct for this case.
When creating a custom analyzer with a parameterized tokenizer, one must use a custom name for the tokenizer while defining the type of the tokenizer. Example:
create analyzer full_address_analyzer (TOKENIZER my_ngram with (type = ngram, min_gram = 2, max_gram =10)])
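For reference, a minimal sketch of how such an analyzer would then be used in a fulltext index (the table and column names here are invented for illustration):

create table addresses (
    full_address text,
    index full_address_ft using fulltext (full_address) with (analyzer = 'full_address_analyzer')
);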

How do you get InputColumn names from the Model?

For example, take OneHotEncoderModel, but this applies to anything in the pyspark.ml.feature package. When you use OneHotEncoderEstimator you have the option to set the inputCols. In fact, you must pass inputCols and outputCols to the constructor.
After you create the corresponding model from the estimator, you cannot retrieve the value of inputCols anymore. There is no method like getInputCols() to give you that from the model. If you use getParam("inputCols"), it just gives you the Param description, not its value.
If you look at the serialized model (the metadata file) the value for this param (inputCols) is actually written out. See example below:
{"class":"org.apache.spark.ml.feature.OneHotEncoderModel","timestamp":1548215172466,"sparkVersion":"2.4.0","uid":"OneHotEncoderEstimator_c5fcbebe4045","paramMap":{"inputCols":["workclass-tmp"],"outputCols":["workclass-encoded"]},"defaultParamMap":{"handleInvalid":"error","dropLast":true}}
However I'm looking for a way to get that from the API.
Correction to my earlier answer: the right method is getOrDefault. For instance:
model.getOrDefault("inputCols")
My original answer used an undocumented way of getting to those values:
model._paramMap[model.inputCols]
or
model._paramMap[model.params["inputCols"]]
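For instance, a minimal sketch of the whole round trip, assuming Spark 2.4 (where the estimator class is OneHotEncoderEstimator) and an invented column name:

# Fit the estimator, then read inputCols back from the fitted model.
from pyspark.sql import SparkSession
from pyspark.ml.feature import OneHotEncoderEstimator

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(0.0,), (1.0,), (2.0,)], ["workclass-tmp"])

estimator = OneHotEncoderEstimator(inputCols=["workclass-tmp"],
                                   outputCols=["workclass-encoded"])
model = estimator.fit(df)

print(model.getOrDefault("inputCols"))   # ['workclass-tmp']
# The undocumented variant reads the same underlying param map:
print(model._paramMap[model.inputCols])  # ['workclass-tmp']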

Replacement for deprecated PostgresDataType.JSON?

I'm using JOOQ with PostgreSQL, and trying to implement a query like this:
INSERT INTO dest_table (id,name,custom_data)
SELECT key as id,
nameproperty as name,
CONCAT('{"propertyA": "',property_a,'", "propertyB": "',property_b,'","propertyC": "',property_c,'"}')::json as custom_data
FROM source_table
The concatenation/JSON bit is what I'm here to ask about. I actually have managed to get it working, but only by using this (Kotlin):
val concatBits = mutableListOf<Field<Any>>()
... build up various bits of the concatenation ...
val concatField = concat(*(concatBits.toTypedArray())).cast(PostgresDataType.JSON)
It concerns me that PostgresDataType is deprecated. The documentation says I should use SQLDataType instead, but it has no JSON value.
What's the recommended way to do this?
EDIT: a bit more information ...
I'm building the query like this:
val innerSelectFields = listOf(
    field("key").`as`(DEST_TABLE.ID),
    field("nameproperty").`as`(DEST_TABLE.NAME),
    concatField.`as`(DEST_TABLE.CUSTOM_DATA)
)
val innerSelect = dslContext
    .select(innerSelectFields)
    .from(table("source_table"))
val insertInto = dslContext
    .insertInto(DEST_TABLE)
    .select(innerSelect)
The initial query I posted is slightly misleading, as the resulting SQL from this code doesn't have the (id, name, custom_data) part.
Also, in case it matters, "source_table" is a temporary table, created during runtime, so there are no autogenerated classes for it.
jOOQ currently doesn't support the JSON data type out of the box. The main reason is that it is unclear which Java type to bind a JSON data structure to, as the JDK doesn't have a standard type for this, and jOOQ will not prefer one third-party library over another.
The currently recommended approach is to create your own custom data type binding for your preferred third party JSON library:
https://www.jooq.org/doc/latest/manual/code-generation/custom-data-type-bindings
In that case, you will no longer need to explicitly cast your bind variable to some JSON type, because your binding will take care of that transparently.
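To make that concrete, here is a hedged Kotlin sketch of such a binding, modeled on the manual's example but binding PostgreSQL json to a plain String instead of a third-party JSON tree type (the class name is invented; only the Binding SPI itself is jOOQ API):

import java.sql.SQLFeatureNotSupportedException
import java.sql.Types
import org.jooq.Binding
import org.jooq.BindingGetResultSetContext
import org.jooq.BindingGetSQLOutputContext
import org.jooq.BindingGetStatementContext
import org.jooq.BindingRegisterContext
import org.jooq.BindingSQLContext
import org.jooq.BindingSetSQLOutputContext
import org.jooq.BindingSetStatementContext
import org.jooq.Converter
import org.jooq.conf.ParamType
import org.jooq.impl.DSL

// Binds the PostgreSQL json type (database type Any) to a plain String user type.
class PostgresJsonAsStringBinding : Binding<Any, String> {

    override fun converter(): Converter<Any, String> = object : Converter<Any, String> {
        override fun from(databaseObject: Any?): String? = databaseObject?.toString()
        override fun to(userObject: String?): Any? = userObject
        override fun fromType(): Class<Any> = Any::class.java
        override fun toType(): Class<String> = String::class.java
    }

    // Render the bind variable (or inlined literal) with a ::json cast appended.
    override fun sql(ctx: BindingSQLContext<String>) {
        if (ctx.render().paramType() == ParamType.INLINED)
            ctx.render().visit(DSL.inline(ctx.convert(converter()).value())).sql("::json")
        else
            ctx.render().sql(ctx.variable()).sql("::json")
    }

    override fun register(ctx: BindingRegisterContext<String>) {
        ctx.statement().registerOutParameter(ctx.index(), Types.VARCHAR)
    }

    override fun set(ctx: BindingSetStatementContext<String>) {
        ctx.statement().setString(ctx.index(), ctx.convert(converter()).value()?.toString())
    }

    override fun get(ctx: BindingGetResultSetContext<String>) {
        ctx.convert(converter()).value(ctx.resultSet().getString(ctx.index()))
    }

    override fun get(ctx: BindingGetStatementContext<String>) {
        ctx.convert(converter()).value(ctx.statement().getString(ctx.index()))
    }

    // JSON via SQLInput/SQLOutput isn't supported by the PostgreSQL JDBC driver.
    override fun set(ctx: BindingSetSQLOutputContext<String>) = throw SQLFeatureNotSupportedException()
    override fun get(ctx: BindingGetSQLOutputContext<String>) = throw SQLFeatureNotSupportedException()
}

You could then declare the type once, e.g. val jsonType = SQLDataType.VARCHAR.asConvertedDataType(PostgresJsonAsStringBinding()), and attach it to your fields. For the INSERT ... SELECT above, where the JSON value is a SQL expression rather than a bind variable, plain SQL templating also avoids the deprecated constant: field("{0}::json", String::class.java, concat(*concatBits.toTypedArray())).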

User defined postgresql types using Npgsql from F#

We use postgresql's features to the maximum to ease our development effort. We make heavy use of custom types (user defined types) in postgresql; most of our functions and stored procedures either take them as input parameters or return them.
We would like to make use of them from F#'s SqlDataProvider. That means we should somehow be able to tell F# how to map an F# user type to a PostgreSQL user type. In other words:
Postgresql has our defined user type post_user_defined
F# has our defined user type fsharp_user_defined
We should instruct Npgsql to somehow perform this mapping. My research so far points me to two approaches, and neither of them is completely clear to me. Any help is appreciated.
Approach 1
The NpgsqlTypes namespace has a pre-defined set of PostgreSQL types mapped to .NET out of the box. Some of them are classes, others structures. Say I would like to use PostgreSQL's built-in type point, which Npgsql maps to .NET via NpgsqlPoint. I can map this to an application-specific data structure like this:
let point (x,y) = NpgsqlTypes.NpgsqlPoint(x,y)
(From PostgreSQLTests.fsx)
In this case, postgresql point and NpgsqlPoint (.NET) are already defined. Now I would like to do the same for my custom type.
Suppose the user defined postgresql composite is
create type product_t as ( name text, product_type text);
And the application data structure (F#) is the record
type product_f = { name: string; ptype: string }
or a tuple
type product_f = string * string
How do I tell Npgsql to make use of my type when passed as a parameter to postgresql functions/procedures? It looks like I will need to use NpgsqlTypes.NpgsqlDbType.Composite or Npgsql.PostgresCompositeType, which doesn't have a public constructor.
I am at a dead end here!
Approach 2
Taking a cue from this post, I could create a custom type, register it with MapCompositeGlobally, and use it to pass to postgresql functions. So, here I try my hand at it.
On Postgresql side, the type and functions are respectively
CREATE TYPE product_t AS
(name text,
product_type text)
and
CREATE FUNCTION func_product(p product_t) RETURNS void AS
And from my application in F#
type PgProductType(Name: string, ProductType: string) =
    member this.Name = Name
    member this.ProductType = ProductType
    new() = PgProductType("", "")
Npgsql.NpgsqlConnection.MapCompositeGlobally<PgProductType>("product_t",null)
and then
type Provider = SqlDataProvider
let ctx = Provider.GetDataContext()
let prd = new PgProductType("F#Product","")
ctx.Functions.FuncProduct.Invoke(prd);;
ctx.Functions.FuncIproduct.Invoke(prd);;
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
stdin(29,1): error FS0501: The member or object constructor 'Invoke' takes 0 argument(s) but is here given 1. The required signature is 'SqlDataProvider<...>.dataContext.Functions.FuncIproduct.Result.Invoke() : Unit'.
It's strange that the error reports that constructor 'Invoke' takes 0 argument(s) but is here given 1. The F# side of things is completely blind to the argument that the postgresql function takes. It does recognize that the function FuncIproduct exists, but it is blind to the arguments it takes.
Regarding your 1st approach: as you've understood, NpgsqlTypes contains some types which Npgsql supports out of the box - but these are only PostgreSQL built-in types. You cannot add a new type there without changing Npgsql's source code, which isn't something you want to do.
Also, you should understand the difference between user-defined types (which PostgreSQL calls "composite") and totally independent types such as point. The latter are full types (similar to int4), with their own custom binary representation, while the former aren't.
Your 2nd approach is the right one - Npgsql comes with full support for PostgreSQL composite types. I have no idea how SqlDataProvider functions - I'm assuming this is an F#-specific type provider - but once you've properly mapped your composite via MapCompositeGlobally, Npgsql allows you to write it transparently by setting an NpgsqlParameter's Value to an instance of PgProductType. It may be worth trying to get it working with plain Npgsql, without the type provider, first.
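For what it's worth, here is a hedged F# sketch of that suggestion using plain Npgsql, assuming an Npgsql 3.x-style global mapping (the connection string and values are invented):

open Npgsql

// Mutable properties so Npgsql can map composite fields by name
// (the default name translator maps ProductType -> product_type).
type ProductT() =
    member val Name = "" with get, set
    member val ProductType = "" with get, set

NpgsqlConnection.MapCompositeGlobally<ProductT>("product_t")

let callFuncProduct () =
    use conn = new NpgsqlConnection("Host=localhost;Database=mydb;Username=me")
    conn.Open()
    use cmd = new NpgsqlCommand("SELECT func_product(@p)", conn)
    // With the global mapping in place, Npgsql serializes ProductT as product_t.
    cmd.Parameters.AddWithValue("p", ProductT(Name = "F#Product", ProductType = "widget")) |> ignore
    cmd.ExecuteNonQuery() |> ignore

If that works, the remaining problem is purely about how SqlDataProvider generates the function signatures.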

Zend\db\sql - prepareStatementForSqlObject - still need to bind or worry about sql injection?

I'm using ZF 2.4, and for this example, Zend\Db\Sql. Do I need to worry about SQL injection, or do I still need to quote() or escape anything if I already use prepareStatementForSqlObject()? Does the example below already use bind variables?
https://framework.zend.com/manual/2.4/en/modules/zend.db.sql.html
use Zend\Db\Sql\Sql;
$sql = new Sql($adapter);
$select = $sql->select();
$select->from('foo');
$select->where(array('id' => $id));
$statement = $sql->prepareStatementForSqlObject($select);
$results = $statement->execute();
The Select class will cleverly check your predicate(s) and add them in a safe manner to the query to prevent SQL injection. I'd recommend you take a look at the source yourself, so I'll point you to the process and the classes responsible for this in the latest ZF version.
Predicate Processing
Take a look at the class PredicateSet. The method \Zend\Db\Sql\Predicate\PredicateSet::addPredicates determines the best way to handle your predicate based on its type. In your case you are using an associative array. Every item in that array will be checked and processed based on its type:
If an abstraction replacement character (question mark) is found, it will be turned into an Expression.
If the value is NULL, an IS NULL check will be performed on the column found in the key: WHERE key IS NULL.
If the value is an array, an IN check will be performed on the column found in the key: WHERE key IN (arrayVal1, arrayVal2, ...).
Otherwise, the predicate will be a new Operator of the type 'equals': WHERE key = value.
In each case the final predicate added to the Select will implement PredicateInterface. The sketch below makes these four cases concrete.
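A hedged sketch with invented column names; note that in every case the scalar values end up as bound parameters, never interpolated into the SQL string:

$select->where(array(
    'id'        => $id,            // Operator: WHERE "id" = ? (value is bound)
    'status'    => null,           // IsNull: WHERE "status" IS NULL
    'type'      => array(1, 2, 3), // In: WHERE "type" IN (?, ?, ?) (values bound)
    'score > ?' => 10,             // Expression: your placeholder, value bound
));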
Preparing the statement
The method \Zend\Db\Sql\Sql::prepareStatementForSqlObject instructs its adapter (e.g. PDO) to create a statement that will be prepared. From here it gets a little bit more complicated.
\Zend\Db\Sql\Sql is where the real magic happens: in the method createSqlFromSpecificationAndParameters, the function vsprintf is used to build the query strings.
Note: please consider using the new docs.framework.zend.com website from now on. That website is leading when it comes to documentation of the latest version.

Use of option helper in Play Framework 2.0 templates

I'm trying to use views.html.helper.select (see its documentation). I don't know Scala, so I'm using Java. I need to pass an object of type Seq[(String)(String)] to the template, right? Something like:
@(fooForm: Form[Foo])(optionValues: Seq[(String)(String)])
@import helper._
@form(routes.foo) {
    @select(field = myForm("selectField"), options = optionValues)
}
I don't know how to create Seq[(String)(String)] in Java. I need to fill this collection with pairs (id, title) from my enum class.
Can somebody show me some example of how to use the select helper?
I found this thread on the users group, but Kevin's answer didn't help me a lot.
The right type is Seq[(String, String)]. It means a sequence of pairs of Strings. In Scala there is a way to define pairs using the arrow: a -> b == (a, b). So you could write e.g.:
@select(field = myForm("selectField"), options = Seq("foo" -> "Foo", "bar" -> "Bar"))
But there is another helper, as shown in the documentation, to build the sequence of select options: options. So you can rewrite the above code as:
@select(myForm("selectField"), options("foo" -> "Foo", "bar" -> "Bar"))
In the case where your option values are the same as their labels, you can shorten the code even further:
@select(myForm("selectField"), options(List("Foo", "Bar")))
(note: in Play 2.0.4, options(List("Foo", "Bar")) doesn't compile, so you can use options(Seq("Foo", "Bar")) instead)
To fill the options from Java code, the most convenient way is to use either the overloaded options function taking a java.util.List<String> as parameter (in this case the option values will be the same as their labels) or the overloaded function taking a java.util.Map<String, String>.
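For instance, a hedged Java sketch (MyEnum, getTitle(), and the view name are invented) that builds the (id, title) map from an enum and passes it to the template:

import java.util.LinkedHashMap;
import java.util.Map;
import play.mvc.Controller;
import play.mvc.Result;

public class FooController extends Controller {

    // Build (id, title) pairs from the enum; LinkedHashMap keeps their order.
    static Map<String, String> optionValues() {
        Map<String, String> options = new LinkedHashMap<String, String>();
        for (MyEnum e : MyEnum.values()) {
            options.put(e.name(), e.getTitle());
        }
        return options;
    }

    public static Result show() {
        return ok(views.html.fooView.render(form(Foo.class), optionValues()));
    }
}

Declare the template's second parameter as java.util.Map[String, String], and the options helper overload for Java maps will accept it directly: @select(fooForm("selectField"), options(optionValues)).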