Using the Amazon Deequ library, I'm trying to build a function that takes three parameters: the check object, a string telling which constraint needs to be run, and another string that provides the constraint criteria. I have a bunch of checks that I want to read from a MySQL table. My intention is to iterate through all the checks that I get from the MySQL table, build a check object using the function I described above, and run the checks on a source DataFrame.
Here is an example of Amazon Deequ:
https://towardsdatascience.com/automated-data-quality-testing-at-scale-using-apache-spark-93bb1e2c5cd0
So the function call looks something like this:
var _check = build_check_object_function(check_object, "hasSize", "10000")
This function should add a new hasSize check to the check_object and return that.
The part where I'm stuck is how to translate the hasSize string to the hasSize function.
var _check = Check(CheckLevel.Error, "Data Validation Check")
val listOfFunctions = _check.getClass.getMethods.filter(!_.getName().contains('$'))
for (function <- listOfFunctions) {
  if (function.getName().toLowerCase().contains(row(2).asInstanceOf[String].toLowerCase())) {
    _check = _check.function(row(3))
  } else {
    println("Not a match")
  }
}
Here is the error that I'm getting
<console>:38: error: value function is not a member of com.amazon.deequ.checks.Check
if( function.getName().toLowerCase().contains(row(2).asInstanceOf[String].toLowerCase())) {_check = _check.function(row(3))
You can either use runtime reflection or build a thin translation layer between your database and the deequ declarations.
I would suggest you go with translating database constraint/check strings explicitly to deequ declarations, e.g.:
if (constraint == "hasSize") {
  // as a Constraint
  Constraint.sizeConstraint(_ <= 10)
  // as a Check
  Check(CheckLevel.Error, "name").hasSize(_ <= 10)
}
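For illustration, here is a minimal sketch of such a translation layer, shaped like the build_check_object_function from the question (the two supported constraint names and the criteria parsing are illustrative assumptions; extend the match with whichever deequ methods you need):

import com.amazon.deequ.checks.{Check, CheckLevel}

def build_check_object_function(check: Check, constraintName: String, criteria: String): Check =
  constraintName match {
    case "hasSize"    => check.hasSize(_ == criteria.toLong) // criteria holds the expected size, e.g. "10000"
    case "isComplete" => check.isComplete(criteria)          // criteria holds a column name
    case other        => throw new IllegalArgumentException(s"Unsupported constraint: $other")
  }

Iterating over the rows read from the MySQL table then becomes a simple loop that threads the Check through each call (checkRows is a hypothetical name for your query result):

var _check = Check(CheckLevel.Error, "Data Validation Check")
for (row <- checkRows) {
  _check = build_check_object_function(_check, row(2).asInstanceOf[String], row(3).asInstanceOf[String])
}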
Whilst performing a WHERE clause operation on a custom Postgres "object type", I ended up with the following PSQLException.
Language: Kotlin.
ORM Library: Ktorm ORM.
Exception
org.postgresql.util.PSQLException: ERROR: operator does not exist: rate = character varying
Hint: No operator matches the given name and argument types. You might need to add explicit type casts.
I have followed the official Ktorm guides here, but there is no mention of custom Postgres types. Any pointers/help would be highly appreciated. See the code below to reproduce:
Thank you.
Example test that would produce the above exception
internal class SuppliersInstanceDAOTest {
    @Test
    fun shouldReturnInstanceSequence() {
        val database = Database.connect("jdbc:postgresql://localhost:5432/mydb", user = "postgres", password = "superpassword")
        val instanceDate: LocalDate = LocalDate.of(2019, 4, 1)
        database.withSchemaTransaction("suppliers") {
            database.from(SuppliersInstanceTable)
                .select(SuppliersInstanceTable.instanceSeq)
                .whereWithConditions {
                    // The following line causes "ERROR: operator does not exist: rate = character varying"
                    it += SuppliersInstanceTable.rate eq Rate.DAILY
                }.asIterable()
                .first()
                .getInt(1)
        }
    }
}
Schema
-- Note the special custom enum object type here that I cannot do anything about
CREATE TYPE suppliers.rate AS ENUM
('Daily', 'Byweekly');
CREATE TABLE suppliers.instance
(
rate suppliers.rate NOT NULL,
instance_value integer NOT NULL
)
TABLESPACE pg_default;
Kotlin's Ktorm entities and bindings
enum class Rate(val value: String) {
    DAILY("Daily"),
    BIWEEKLY("Byweekly")
}

interface SuppliersInstance : Entity<SuppliersInstance> {
    companion object : Entity.Factory<SuppliersInstance>()
    val rate: Rate
    val instanceSeq: Int
}

object SuppliersInstanceTable : Table<SuppliersInstance>("instance") {
    val rate = enum("rate", typeRef<Rate>()).primaryKey().bindTo { it.rate } // <-- Suspect
    //val rate = enum<Rate>("rate", typeRef()).primaryKey().bindTo { it.rate } // Failed too
    val instanceSeq = int("instance_value").primaryKey().bindTo { it.instanceSeq }
}
After seeking help from the maintainers of Ktorm, it turns out there is support in newer versions of Ktorm for native PostgreSQL enum object types. In my case I needed pgEnum instead of the default Ktorm enum function, which converts enums to varchar and causes the type clash in PostgreSQL:
Reference here for pgEnum
However, note that at the time of writing Ktorm's pgEnum function is only available in v3.2.x+. The latest version available in the JCenter and Maven Central repositories is v3.1.0. This is because there is also a group name change from me.liuwj.ktorm to org.ktorm in the latest versions, so upgrading also means changing the group name in your dependencies to match the new coordinates in the Maven repositories. This was a seamless upgrade for my project, and the new pgEnum worked in my use case.
For my code examples above, this would mean swapping this:

object SuppliersInstanceTable : Table<SuppliersInstance>("instance") {
    val rate = enum("rate", typeRef<Rate>()).primaryKey().bindTo { it.rate } // <---
    ...
}

for this:

object SuppliersInstanceTable : Table<SuppliersInstance>("instance") {
    val rate = pgEnum<Rate>("rate").primaryKey().bindTo { it.rate } // <---
    ...
}
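Concretely, the upgrade amounts to swapping the Maven coordinates in your build script. A sketch (3.2.0 stands in for whatever 3.2.x release you pick, and ktorm-support-postgresql is the module that ships pgEnum):

// build.gradle.kts
// Old coordinates (up to v3.1.0):
// implementation("me.liuwj.ktorm:ktorm-support-postgresql:3.1.0")
// New coordinates (v3.2.x+, required for pgEnum):
implementation("org.ktorm:ktorm-support-postgresql:3.2.0")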
I would like a generic function that converts the result of a SQL query to JSON, building the JSON string manually (or using an external library). For that to happen, I need to be able to enumerate the columns in a row dynamically.
let rows = client
    .query("select * from ExampleTable;", &[])
    .await?;

// This is how you read a string if you know the first column is a string type.
let this_value: &str = rows[0].get(0);
Dynamic types are possible with Rust, but not with the tokio-postgres library API.
The row.get function of tokio-postgres is designed to require generic inference according to the source code
Without the right API, how can I enumerate rows and columns?
You need to enumerate the rows and columns; while enumerating you can get the column reference, and from that the PostgreSQL type. With the type information it's possible to have conditional logic that chooses different sub-functions to both: i) get the strongly typed variable; and ii) convert it to a JSON value.
for row in rows.iter() {
    for (col_index, column) in row.columns().iter().enumerate() {
        let col_type: String = column.type_().to_string();
        if col_type == "int4" { // i32
            let value: i32 = row.get(col_index);
            return value.to_string();
        } else if col_type == "text" {
            let value: &str = row.get(col_index);
            return value.to_string(); // TODO: escape characters
        } else {
            // TODO: more type support
            // TODO: raise an error for unsupported types
        }
    }
}
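Building on that, here is a minimal sketch of a generic row-to-JSON converter (it assumes the serde_json crate; the handled type names are only a starting set):

use serde_json::{Map, Value};
use tokio_postgres::Row;

// Convert one row into a JSON object by dispatching on the column's
// PostgreSQL type name and pulling a strongly typed value out of the row.
// Note: row.get::<_, T>(i) panics on NULL; use Option<T> for nullable columns.
fn row_to_json(row: &Row) -> Value {
    let mut obj = Map::new();
    for (i, column) in row.columns().iter().enumerate() {
        let value = match column.type_().name() {
            "int4" => Value::from(row.get::<_, i32>(i)),
            "int8" => Value::from(row.get::<_, i64>(i)),
            "text" | "varchar" => Value::from(row.get::<_, &str>(i)),
            "bool" => Value::from(row.get::<_, bool>(i)),
            _ => Value::Null, // TODO: more type support
        };
        obj.insert(column.name().to_string(), value);
    }
    Value::Object(obj)
}

As a bonus, serde_json takes care of string escaping, which removes the "escape characters" TODO above.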
Bonus tips for tokio-postgres code maintainers
Ideally, tokio-postgres would include a direct API that returns a dyn Any type. The internals of row.rs already use the database column type information to confirm that the supplied generic type is valid. Ideally a new API would use that internal column information directly via an improved FromSql API, but a simpler middle ground exists:
It would be possible to add an extra function layer in row.rs that uses the same column-type conditional logic as this answer and then leverages the existing get function. If a user such as myself needs to handle this kind of conditional logic, I also need to maintain that code whenever new types are handled by tokio-postgres; this kind of logic should therefore live inside the library, where it can be better maintained.
My goal here is to retrieve the Board entity upon insert. If the entity exists then I just want to return the existing object (which coincides with the argument of the add method). Otherwise I'd like to return the new row inserted in the database.
I am using Play 2.7 with Slick 3.2 and MySQL 5.7.
The implementation is based on this answer which is more than insightful.
Also from Essential Slick
exec(messages returning messages +=
Message("Dave", "So... what do we do now?"))
DAO code
@Singleton
class SlickDao @Inject()(db: Database, implicit val playDefaultContext: ExecutionContext) extends MyDao {

  override def add(board: Board): Future[Board] = {
    val insert = Boards
      .filter(b => b.id === board.id).exists.result.flatMap { exists =>
        if (!exists) Boards returning Boards += board
        else DBIO.successful(board) // no-op - return specified board
      }.transactionally
    db.run(insert)
  }
}
EDIT: I also tried replacing the += part with

Boards returning Boards.map(_.id) into { (b, boardId) => b.copy(id = boardId) } += board

and this does not work either.
The table definition is the following:
object Board {

  val Boards: TableQuery[BoardTable] = TableQuery[BoardTable]

  class BoardTable(tag: Tag) extends Table[BoardRow](tag, "BOARDS") {
    // columns
    def id = column[String]("ID", O.Length(128))
    def x = column[String]("X")
    def y = column[Option[Int]]("Y")

    // foreign key definitions
    // .....

    // primary key definitions
    def pk = primaryKey("PK_BOARDS", (id, y))

    // default projection
    def * = (id, x, y).mapTo[BoardRow]
  }
}
I would expect that there would be a new row in the table, but although the exists query gets executed
select exists(select `ID`, `X`, `Y`
from `BOARDS`
where ((`ID` = '92f10c23-2087-409a-9c4f-eb2d4d6c841f'));
and the result is false, there is no insert.
Nor is there any logging in the database that any insert statements were received (I am referring to the general_log file).
So, first of all, the problem with the query execution was a mishandling of the futures that the DAO produced. I was assigning the insert statement to a future, but this future was never submitted to an execution context. My bad, even more so because I did not mention it in the description of the problem.
But when this was fixed, I could see the actual error in the logs of my application. The stack trace was the following:
slick.SlickException: This DBMS allows only a single column to be returned from an INSERT, and that column must be an AutoInc column.
at slick.jdbc.JdbcStatementBuilderComponent$JdbcCompiledInsert.buildReturnColumns(JdbcStatementBuilderComponent.scala:67)
at slick.jdbc.JdbcActionComponent$ReturningInsertActionComposerImpl.x$17$lzycompute(JdbcActionComponent.scala:659)
at slick.jdbc.JdbcActionComponent$ReturningInsertActionComposerImpl.x$17(JdbcActionComponent.scala:659)
at slick.jdbc.JdbcActionComponent$ReturningInsertActionComposerImpl.keyColumns$lzycompute(JdbcActionComponent.scala:659)
at slick.jdbc.JdbcActionComponent$ReturningInsertActionComposerImpl.keyColumns(JdbcActionComponent.scala:659)
So this is a MySQL thing at its core. I had to redesign my schema to make this retrieval-after-insert possible. The redesign includes the introduction of a dedicated primary key (completely unrelated to the business logic) which is also an AutoInc column, as the stack trace prescribes.
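For reference, a minimal sketch of what that redesign could look like (assuming BoardRow gains a pk: Long field; the column name "PK" is made up):

class BoardTable(tag: Tag) extends Table[BoardRow](tag, "BOARDS") {
  def pk = column[Long]("PK", O.PrimaryKey, O.AutoInc) // dedicated surrogate key
  def id = column[String]("ID", O.Length(128))
  def x = column[String]("X")
  def y = column[Option[Int]]("Y")
  def * = (pk, id, x, y).mapTo[BoardRow]
}

// MySQL can return the single AutoInc column, so this variant works:
def add(board: BoardRow): Future[BoardRow] =
  db.run((Boards returning Boards.map(_.pk) into ((b, newPk) => b.copy(pk = newPk))) += board)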
In the end the solution became too involved, so I decided instead to use the actual argument of the add method as the return value if the insert was successful. The implementation of the add method ended up being something like this:
override def add(board: Board): Future[Board] = {
  db.run(Boards.insertOrUpdate(board).map(_ => board))
}
while there was some appropriate Future error handling in the controller which was invoking the underlying repo.
If you're lucky enough not to be using MySQL with Slick, I suppose you might be able to do this without a dedicated AutoInc primary key. If not, then I suppose this is a one-way road.
I am using Groovy Sql in Grails with named parameters to get results from a Postgres DB. My statement is generated dynamically, i.e. concatenated to become the final statement, with the params being added to a map as I go along.
sqlWhere += " AND bar = :namedParam1"
paramsMap.namedParam1 = "blah"
For readability, I am using the Groovy string syntax, which allows me to write my SQL statement over multiple lines, like this:
sql = """
SELECT *
FROM foo
WHERE 1=1
${sqlWhere}
"""
The expression is evaluated as a string containing the linebreaks as \n:
SELECT *\n ...
This is not a problem when I pass params like this
results = sql.rows(sqlString, paramsMap)
but it does become one if paramsMap is empty (which happens since AND bar = :namedParam1 is not always concatenated into the query). I then get an error
org.postgresql.util.PSQLException: No hstore extension installed
which does not really seem to relate to the true nature of the problem. I have for now fixed this with an if...else
if (paramsMap.size() > 0) {
    results = sql.rows(sqlString, paramsMap)
} else {
    results = sql.rows(sqlString.replace('\n', ' '))
}
But this seems a bit weird (especially since it does not work if I use the replace in the if-branch as well).
My question is: why do I really get this error message, and is there a better way to prevent it from occurring?
It's certainly a bug in the groovy.sql.Sql implementation. The rows() method can't deal with an empty map passed as params. As a workaround, you can test for that and pass an empty list instead.
def paramsMap = [:]
...
if (paramsMap.isEmpty())
    paramsMap = []
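Applied to the query code above, the call site can then stay uniform. A sketch:

// Sql.rows(String, List) copes with an empty list, while Sql.rows(String, Map)
// trips over an empty map, so normalize the params before the call:
def params = paramsMap.isEmpty() ? [] : paramsMap
results = sql.rows(sqlString, params)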
Issue created at https://issues.apache.org/jira/browse/GROOVY-8082
I'm not sure why this is causing me an issue, but I'm using Orient 2.1.19 and found this in 2.1.12 as well. We are building some hooks to implement a method of encryption. I know 2.2 implements some encryption, but we had some further requirements.
Anyway, we have hooks for onRecordAfterRead, onRecordBeforeCreate and onRecordBeforeUpdate. They work fine for most statements, but with the hook in place, running a query that sets a link property using a subquery in an insert fails. Here's an example query:
create EDGE eThisEdge
  from (select from vVertex where thisproperty = 'this')
  to (select from vVertex where thatProperty = 'that')
  set current = (select from lookupCurrent where displayCurrentPast = 'Current');
Running this query gives me the error:
com.orientechnologies.orient.core.exception.OValidationException: The field 'eThisEdge.current' has been declared as LINK but the value is not a record or a record-id.
It's some issue with the way a subquery is run during an insert, though, because if I run the insert without setting any properties and then run an update to set the properties, that works. I'd hate to have to rewrite all of our inserts for our base data and our code just as a workaround for this, and it seems like I'm just missing something here.
Has anyone seen this kind of issue with hooks as well?
The biggest issue seems to surround the onRecordBeforeCreate code. We are trying to have a generic hook that encrypts strings in our database. Here are the basics of the onRecordBeforeCreate method:
public RESULT onRecordBeforeCreate(ODocument oDocument) {
    RESULT changed = RESULT.RECORD_NOT_CHANGED;
    try {
        if (classIsCipherable(oDocument)) {
            // Encrypt every non-null STRING field on the document
            for (String field : oDocument.fieldNames()) {
                if (oDocument.fieldType(field) != null && oDocument.fieldType(field) == OType.STRING && oDocument.field(field) != null) {
                    oDocument.field(field, crypto.encrypt(oDocument.field(field).toString()));
                    changed = RESULT.RECORD_CHANGED;
                }
            }
        }
        return changed;
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}
Is there anything there that looks like an obvious reason I'd have issues running a create edge statement that sets a property that is a link?
The query select from lookupCurrent where displayCurrentPast = 'Current' returns more than one element; you must use a LINKLIST or LINKSET instead of a LINK.
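Two possible ways out, sketched below; both are assumptions to verify against your OrientDB version (in particular, whether a LINK accepts a single-element subquery result):

-- Option 1: widen the property so it can hold multiple links
ALTER PROPERTY eThisEdge.current TYPE LINKSET

-- Option 2: constrain the subquery to a single record
create EDGE eThisEdge
  from (select from vVertex where thisproperty = 'this')
  to (select from vVertex where thatProperty = 'that')
  set current = (select from lookupCurrent where displayCurrentPast = 'Current' limit 1);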