Scala: getting a type mismatch error when using the substring Spark SQL function

The error occurs at (instr(col("manuscriptpolicy_ext_vehicledescription"), "Registration No.:") + 17), since Spark detects the type as Column and not Int, while the substring function expects substring(string, int, int). I've tried casting it to Int, but it is still detected as a Column. Am I doing something that's causing it to be detected as a column?
val registrationNumber = pc_policy_df.select(col("vehicledescription"),
  when(col("subproduct_ext") === "SpecialRiskOwnDamage" &&
       instr(col("vehicledescription"), "Registration No.:") === 0, "")
  .when(col("subproduct_ext") === "SpecialRiskOwnDamage" &&
        instr(col("vehicledescription"), "Registration No.:") > 0,
        trim(substring(col("vehicledescription"),
          (instr(col("manuscriptpolicy_ext_vehicledescription"), "Registration No.:") + 17),
          locate(";", col("vehicledescription"),
            instr(col("vehicledescription"), "Registration No.:") + 17) -
            (instr(col("vehicledescription"), "Registration No.:") + 17))))
  .otherwise("registrationnumber"))
  .as("R_NUMBER")

Your input to the trim function should look more like:
trim(
  col("vehicledescription")
    .substr(
      instr(col("manuscriptpolicy_ext_vehicledescription"), "Registration No.:") + lit(17),
      locate(";", col("vehicledescription"))))
The substr method is defined on Column and accepts Columns as inputs, which is what you want here.
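For reference, a minimal sketch of the difference, assuming the column is named vehicledescription as in the question and that 17 is the length of "Registration No.:":
import org.apache.spark.sql.functions.{col, instr, lit, locate, trim}

// substring(str, pos, len) takes plain Ints for pos and len, so a Column expression
// such as instr(...) + 17 cannot be passed there. Column.substr accepts Columns,
// so the start and length can be computed per row from other columns.
val start = instr(col("vehicledescription"), "Registration No.:") + lit(17)
val len = locate(";", col("vehicledescription")) - start
val registration = trim(col("vehicledescription").substr(start, len))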


ConstraintSuggestionRunner not taking up columns enclosed with backticks

I am currently importing a dataset from an Excel sheet which has a column name containing a dot, like "abc.xyz".
I went through a couple of Stack Overflow questions, and they say that such column names can be referenced by enclosing them in backticks, like `abc.xyz`. So I renamed every column name containing a dot to the same name enclosed in backticks:
df.columns.foreach(item => {
  if (item.contains(".")) {
    df.withColumnRenamed(item, s"`$item`")
  }
})
Now, when I pass this DataFrame to the ConstraintSuggestionRunner like this:
val suggestionResult = ConstraintSuggestionRunner()
.onData(df)
.addConstraintRules(Rules.DEFAULT)
.setKLLParameters(KLLParameters(sketchSize = 2048, shrinkingFactor = 0.64, numberOfBuckets = 10))
.run()
I get errors like:
ERROR Main: org.apache.spark.sql.AnalysisException: cannot resolve
'`abc.xyz`' given input columns:
How can I resolve this error?
The escaping ought to be handled inside Deequ, but that issue is still open. What you did here adds the backticks as part of the column names; it does not escape them.
You can instead replace the dots with another character, such as an underscore _, and then pass the DataFrame with the renamed columns to the ConstraintSuggestionRunner:
val df1 = df.toDF(df.columns.map(_.replaceAll("[.]+", "_")):_*)
val suggestionResult = ConstraintSuggestionRunner()
.onData(df1)
.addConstraintRules(Rules.DEFAULT)
.setKLLParameters(KLLParameters(sketchSize = 2048, shrinkingFactor = 0.64, numberOfBuckets = 10))
.run()
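If you want to keep the withColumnRenamed approach from the question, note that it returns a new DataFrame rather than modifying df in place, so the renamed frame has to be carried along, for example with foldLeft. A sketch of that variant:
val df1 = df.columns.foldLeft(df) { (acc, name) =>
  // Replace the dots instead of wrapping the name in backticks, threading the
  // renamed DataFrame through each step of the fold.
  if (name.contains(".")) acc.withColumnRenamed(name, name.replaceAll("[.]+", "_"))
  else acc
}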

found String, required (String, String, String, Int) when appending tuples in Scala

I have three ListBuffers of equal length:
devicenamelist: ListBuffer[String]
datelist: ListBuffer[String]
wordcountssortedlistbuf: ListBuffer[(String, Int)]
Now I need to combine them into the format
ListBuffer[(String, String, String, Int)]
I tried the following:
var sortedrecords = scala.collection.mutable.ListBuffer[(String, String, String, Int)]()
for (i <- 0 to devicenamelist.length) {
  sortedrecords += (devicenamelist(i), datelist(i), wordcountssortedlistbuf(i)._1, wordcountssortedlistbuf(i)._2)
}
It gives me the following error:
[error] found   : String
[error] required: (String, String, String, Int)
How is the append operation above producing only a single String, when my intention was to create a (String, String, String, Int)? Am I missing something?
Thanks
You are missing a set of parentheses in your += line: without the extra pair, the four values are treated as four separate elements to append, and each element is expected to be a (String, String, String, Int), hence the error. But, please, don't do that; it hurts my eyes to see someone write Scala like this.
Try something like this instead:
val sortedrecords = devicenamelist.zip(datelist).zip(wordcountssortedlistbuf)
.map { case ((devicename, date), (word, count)) =>
(devicename, date, word, count)
}
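A quick check with made-up sample data (the values are purely illustrative):
import scala.collection.mutable.ListBuffer

val devicenamelist = ListBuffer("dev1", "dev2")
val datelist = ListBuffer("2021-01-01", "2021-01-02")
val wordcountssortedlistbuf = ListBuffer(("error", 5), ("warn", 3))

// zip pairs the buffers element by element; map flattens the nested pairs into 4-tuples.
val sortedrecords = devicenamelist.zip(datelist).zip(wordcountssortedlistbuf)
  .map { case ((devicename, date), (word, count)) => (devicename, date, word, count) }
// sortedrecords: ListBuffer((dev1,2021-01-01,error,5), (dev2,2021-01-02,warn,3))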

How to pass a null value received on msg.req.query to msg.payload

I am developing an application using dashDB on Bluemix and Node-RED. My PHP application calls a web service that invokes the Node-RED flow. Whenever my PHP function invokes the node to insert into the table and the field GEO_ID is null, the application fails. I understand the issue: it seems the third parameter is not supplied. I have tried checking the parameter beforehand and passing something like NULL, but it still does not work.
See the code:
msg.account_id = msg.req.query.account_id;
msg.user_id = msg.req.query.user_id;
msg.geo_id=msg.req.query.geo_id;
msg.payload = "INSERT INTO ACCOUNT_USER (ACCOUNT_ID, USER_ID, GEO_ID) VALUES (?,?,?) ";
return msg;
And on the dashDB node I have set the parameters as below:
msg.account_id,msg.user_id,msg.geo_id
The third parameter, geo_id, is the issue. I have tried something like the code below:
if(msg.req.query.geo_id===null){msg.geo_id=null}
or
if(msg.req.query.geo_id===null){msg.geo_id="null"}
The error I got is the one below:
dashDB query node: Error: [IBM][CLI Driver][DB2/LINUXX8664] SQL0420N Invalid character found in a character string argument of the function "DECIMAL". SQLSTATE=22018
I would really appreciate it if someone could help me with this.
Thanks,
Eduardo Diogo Garcia
Is it possible that msg.req.query.geo_id is set to an empty string?
In that case neither if statement above would get executed, and you would be trying to insert an empty string into a DECIMAL column. Maybe try something like this:
if (! msg.req.query.geo_id || msg.req.query.geo_id == '') {
msg.geo_id = null;
}

Anorm String Interpolation not replacing variables

We are using Scala Play, and I am trying to ensure that all SQL queries use Anorm's string interpolation. It works for some queries, but many do not actually replace the variables before the query executes.
import anorm.SQL
import anorm.SqlStringInterpolation
object SecureFile {
  val table = "secure_file"
  val pk = "secure_file_idx"
  ...

  // This method works exactly as I would hope
  def insert(secureFile: SecureFile): Option[Long] = {
    DBExec { implicit connection =>
      SQL"""
        INSERT INTO secure_file (
          subscriber_idx,
          mime_type,
          file_size_bytes,
          portal_msg_idx
        ) VALUES (
          ${secureFile.subscriberIdx},
          ${secureFile.mimeType},
          ${secureFile.fileSizeBytes},
          ${secureFile.portalMsgIdx}
        )
      """ executeInsert()
    }
  }

  def delete(secureFileIdx: Long): Int = {
    DBExec { implicit connection =>
      // Prints correct values
      println(s"table: ${table} pk: ${pk} secureFileIdx: ${secureFileIdx} ")
      // Does not work
      SQL"""
        DELETE FROM $table WHERE ${pk} = ${secureFileIdx}
      """.executeUpdate()
      // Works, but unsafe
      val query = s"DELETE FROM ${table} WHERE ${pk} = ${secureFileIdx}"
      SQL(query).executeUpdate()
    }
  }
  ....
}
Over in the PostgreSQL logs, it's clear that the delete statement has not acquired the correct values:
2015-01-09 17:23:03 MST ERROR: syntax error at or near "$1" at character 23
2015-01-09 17:23:03 MST STATEMENT: DELETE FROM $1 WHERE $2 = $3
2015-01-09 17:23:03 MST LOG: execute S_1: ROLLBACK
I've tried many variations of execute, executeUpdate, and executeQuery with similar results. For the moment, we are using basic string replacement but of course this is bad because it's not using PreparedStatements.
For anyone else sitting on this page scratching their head and wondering what they might be missing...
SQL("select * from mytable where id = $id")
is NOT the same as
SQL"select * from mytable where id = $id"
The former does not do String interpolation whereas the latter does.
This is easily overlooked in the aforementioned docs, as all the samples provided just happen to have a (non-related) closing parenthesis on them (like this sentence does).
Anorm string interpolation was introduced to pass parameters (e.g. SQL"Select * From Test Where id = $x"), with the interpolation arguments (e.g. $x) set on the underlying PreparedStatement with the proper type conversion (see the use cases on https://www.playframework.com/documentation/2.3.x/ScalaAnorm ).
The next Anorm release will also have the #$foo syntax to mix parameter interpolation with standard string interpolation. This will allow you to write DELETE FROM #$table WHERE #${pk} = ${secureFileIdx} and have it executed as DELETE FROM foo WHERE bar = ? (if the value of table is "foo" and pk is "bar"), with secureFileIdx passed as a parameter. See the related pull request.
Until the next revision is released, you can build Anorm from its master sources to benefit from this change.
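With a build that includes that change, the delete method from the question could be written roughly like this (a sketch, not tested against a released version):
def delete(secureFileIdx: Long): Int = {
  DBExec { implicit connection =>
    // #$table and #$pk are spliced into the SQL text as-is, while
    // $secureFileIdx is still bound as a PreparedStatement parameter.
    SQL"DELETE FROM #$table WHERE #$pk = $secureFileIdx".executeUpdate()
  }
}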

How to further improve error messages in Scala parser-combinator based parsers?

I've coded a parser based on Scala parser combinators:
class SxmlParser extends RegexParsers with ImplicitConversions with PackratParsers {
  [...]
  lazy val document: PackratParser[AstNodeDocument] =
    ((procinst | element | comment | cdata | whitespace | text)*) ^^ {
      AstNodeDocument(_)
    }
  [...]
}

object SxmlParser {
  def parse(text: String): AstNodeDocument = {
    var ast = AstNodeDocument()
    val parser = new SxmlParser()
    val result = parser.parseAll(parser.document, new CharArrayReader(text.toArray))
    result match {
      case parser.Success(x, _) => ast = x
      case parser.NoSuccess(err, next) =>
        tool.die("failed to parse SXML input " +
          "(line " + next.pos.line + ", column " + next.pos.column + "):\n" +
          err + "\n" +
          next.pos.longString)
    }
    ast
  }
}
Usually the resulting parsing error messages are rather nice, but sometimes the message is just:
sxml: ERROR: failed to parse SXML input (line 32, column 1):
`"' expected but `' found
^
This happens if a quote character is not closed and the parser reaches the EOT. What I would like to see here is (1) which production the parser was in when it expected the '"' (I have multiple ones) and (2) where in the input this production started parsing (which would indicate where the opening quote is in the input). Does anybody know how I can improve the error messages and include more information about the actual internal parsing state when the error happens (perhaps something like a production-rule stack trace, or whatever can reasonably be given here to better identify the error location)? BTW, the "line 32, column 1" above is actually the EOT position and hence of no use here, of course.
I don't know yet how to deal with (1), but I was also looking for (2) when I found this webpage:
https://wiki.scala-lang.org/plugins/viewsource/viewpagesrc.action?pageId=917624
I'm just copying the information:
A useful enhancement is to record the input position (line number and column number) of the significant tokens. To do this, you must do three things:
Make each output type extend scala.util.parsing.input.Positional
Invoke the Parsers.positioned() combinator
Use a text source that records line and column positions
and
Finally, ensure that the source tracks positions. For streams, you can simply use scala.util.parsing.input.StreamReader; for Strings, use scala.util.parsing.input.CharArrayReader.
I'm currently playing with it, so I'll try to add a simple example later.
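A minimal sketch of what those three steps might look like (the grammar and node names here are illustrative, not taken from the SXML parser above):
import scala.util.parsing.combinator.RegexParsers
import scala.util.parsing.input.Positional

// 1. The output type extends Positional so a position can be attached to it.
case class Ident(name: String) extends Positional

object PositionedParser extends RegexParsers {
  // 2. positioned() records the start position of the matched input in the node.
  val ident: Parser[Ident] = positioned("""[a-zA-Z]+""".r ^^ { name => Ident(name) })

  def main(args: Array[String]): Unit = {
    // 3. parseAll on a String uses a reader that tracks line and column positions.
    parseAll(ident, "hello") match {
      case Success(node, _) => println(s"parsed ${node.name} at ${node.pos}")
      case other            => println(other)
    }
  }
}
The recorded node.pos can then be included in your own error or log messages.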
In such cases you may use err, failure, and ~! with production rules designed specifically to match the error.
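For example, a sketch of a dedicated string-literal rule that commits with ~! after the opening quote and reports a specific message via err (the rule names are illustrative):
import scala.util.parsing.combinator.RegexParsers

object QuotedParser extends RegexParsers {
  // ~! commits the parser once the opening quote has matched; if the closing
  // quote is then missing, err reports a message specific to this production
  // instead of the generic "expected but not found" message at end of input.
  val quoted: Parser[String] =
    "\"" ~! ("""[^"]*""".r <~ ("\"" | err("unterminated string literal: missing closing quote"))) ^^ {
      case _ ~ body => body
    }

  def main(args: Array[String]): Unit = {
    println(parseAll(quoted, "\"hello\"")) // parses and yields hello
    println(parseAll(quoted, "\"hello"))   // error: unterminated string literal: missing closing quote
  }
}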