Escape string interpolation in anorm - scala

I want to insert the literal '${a}' into a table using anorm 2.5.2, which means I want to execute the bare SQL query
INSERT INTO `db`.`table` (`a`) VALUES ('${a}');
without using any anorm / string interpolation. When I try to do the following
SQL("INSERT INTO `db`.`table` (`a`) VALUES ('${a}');").execute()
I get an anorm.Sql$MissingParameter: Missing parameter value exception because it tries to use anorm interpolation on ${a} but no value a is available in the scope.
How to escape the anorm / string interpolations $... and ${...}?
Escape a dollar sign in string interpolation doesn't seem to work here.

You can make ${a} the value of a parameter, i.e.
SQL("""INSERT INTO db.table (a) VALUES ({x})""").on("x" -> s"$${a}")
(s"$${a}" is the way to write "${a}" without getting a warning about possible missing interpolators).
The same can be written equivalently as
val lit = s"$${a}"
SQL"""INSERT INTO db.table (a) VALUES ($lit)"""
The below will probably work, but I am not sure:
SQL"INSERT INTO db.table (a) VALUES ('$${a}')"
It may also be worth asking if it's intentional behavior or a bug: when talking about parametrized SQL queries, it doesn't make sense to have a parameter inside '.
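The first suggestion above (pass the literal as a parameter value rather than embedding it in the SQL text) is not specific to Anorm. As a rough illustration only, here is a minimal Python sketch that uses the standard-library sqlite3 module as a stand-in for the real database. The table name t and the parameter name x are made up for the example.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in for the real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a TEXT)")

# The text we want stored verbatim, dollar sign and braces included.
literal = "${a}"

# The SQL text contains only a named placeholder; the literal travels
# separately as a bound value, so nothing tries to interpolate it.
conn.execute("INSERT INTO t (a) VALUES (:x)", {"x": literal})

stored = conn.execute("SELECT a FROM t").fetchone()[0]
print(stored)
```

Because the value never becomes part of the statement text, no escaping of $ or of the surrounding quotes is needed.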

Checking if a range contains a value broken in PQexecPrepared (works in psql)

I have this (rather ugly, generated) prepared statement to fetch some game data. I try to check if a value ($3) is contained in spawn_level_range (which is an int4range) by doing $3<@quests.spawn_level_range:
SELECT quests.id,
quests.base_attack,
quests.base_strg,
quests.base_accy,
quests.base_hp,
quests.name,
quests.task,
quests.image_url,
quests.spawn_chance
FROM quests
WHERE (((quests.server_id=$1)
AND ((quests.channel_id='all') OR (quests.channel_id=$2)))
AND ($3<@quests.spawn_level_range))
ORDER BY RANDOM()
LIMIT 1;
This exact query works perfectly when pasted into psql when I prepend:
prepare test (varchar, varchar, int) AS
then run it with:
execute test('669105577238069249', '682205516667158549', 1);
However, for some reason, it just does not work in libpq.
When running the statement with PQexecPrepared, it raises the error:
ERROR: malformed range literal: "1"
DETAIL: Missing left parenthesis or bracket.
(note that 1 is what I'm trying to bind $3 to)
It seems like it's trying to interpret $3 as a range (rather than an integer) – which seems like a bug to me.
In your prepared statement, you explicitly declare the third parameter to be an integer.
In your PQprepare call (that you didn't show) you must have neglected to set the paramTypes argument to indicate the types of the parameters, so they are all unknown to PostgreSQL, and it infers the data type from the context.
Now there are two <@ operators for ranges:
anyrange <@ anyrange
anyelement <@ anyrange
Not knowing which one you want, PostgreSQL's data type resolution rules prefer the operator that has the same data type on both sides. So $3 is resolved as a range, and the value "1" then fails to parse as a range literal.
There are two possible solutions:
specify the correct type in the paramTypes argument of PQprepare
add an explicit type cast to the query: CAST ($3 AS integer)

What's the meaning of "$" in Dataset's operators (like select or filter)?

I am a bit confused about using $ to reference columns in DataFrame operators like select or filter.
The following statements work:
df.select("app", "renders").show
df.select($"app", $"renders").show
But, only the first statement in the following works:
df.filter("renders = 265").show // <-- this works
df.filter($"renders" = 265).show // <-- this does not work (!) Why?!
However, this again works:
df.filter($"renders" > 265).show
Basically, what is this $ in DataFrame's operators and when/how should I use it?
Implicits are a major feature of the Scala language that take a lot of different forms--like implicit classes as we will see shortly. They have different purposes, and they all come with varying levels of debate regarding how useful or dangerous they are. Ultimately though, implicits generally come down to simply having the compiler convert one class to another when you bring them into scope.
Why does this matter? Because in Spark there is an implicit class called StringToColumn that endows a StringContext with additional functionality. As you can see, StringToColumn adds the $ method to the Scala class StringContext. This method produces a ColumnName, which extends Column.
The end result of all this is that the $ method allows you to treat the name of a column, represented as a String, as if it were the Column itself. Implicits, when used wisely, can produce convenient conversions like this to make development easier.
So let's use this to understand what you found:
df.select("app","renders").show -- succeeds because select takes multiple Strings
df.select($"app",$"renders").show -- succeeds because select takes multiple Columns that result after the implicit conversions are applied
df.filter("renders = 265").show -- succeeds because Spark supports SQL-like filters
df.filter($"renders" = 265).show -- fails because $"renders" is of type Column after implicit conversion, and Columns use the custom === operator for equality (unlike the case in SQL).
df.filter($"renders" > 265).show -- succeeds because you're using a Column after implicit conversion and > is a function on Column.
$ is a way to convert a string to the column with that name.
Both options of select work originally because select can receive either a column or a string.
When you do the filter, $"renders" = 265 is an attempt at assigning a number to the column. > on the other hand is a comparison method. You should be using === instead of =.
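To make the "operators are just methods on Column" point concrete, here is a toy model in Python. It is not Spark's API: the names Column, Expr, col and eq are invented for illustration. It only shows the general idea that a comparison on a column object builds an expression to be evaluated later, rather than computing a boolean on the spot.

```python
class Expr:
    """A deferred filter expression, kept here as plain text."""

    def __init__(self, text):
        self.text = text

    def __repr__(self):
        return f"Expr({self.text!r})"


class Column:
    """Hypothetical stand-in for a column reference."""

    def __init__(self, name):
        self.name = name

    def __gt__(self, other):
        # `column > value` builds an expression instead of returning a bool.
        return Expr(f"{self.name} > {other!r}")

    def eq(self, other):
        # Plays the role that === plays in the answers above:
        # a dedicated equality method that returns an expression.
        return Expr(f"{self.name} = {other!r}")


def col(name):
    # Plays the role of $"name": turn a string into a column object.
    return Column(name)


greater = col("renders") > 265
equal = col("renders").eq(265)
print(greater, equal)
```

In this toy, writing col("renders") = 265 is rejected by Python as an attempt to assign to a function call, which is loosely analogous to why the = form fails in the question.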

Anorm: WHERE condition, conditionally

Consider a repository/DAO method like this, which works great:
def countReports(customerId: Long, createdSince: ZonedDateTime) =
DB.withConnection {
implicit c =>
SQL"""SELECT COUNT(*)
FROM report
WHERE customer_id = $customerId
AND created >= $createdSince
""".as(scalar[Int].single)
}
But what if the method is defined with optional parameters:
def countReports(customerId: Option[Long], createdSince: Option[ZonedDateTime])
Point being, if either optional argument is present, use it in filtering the results (as shown above), and otherwise (in case it is None) simply leave out the corresponding WHERE condition.
What's the simplest way to write this method with optional WHERE conditions? As an Anorm newbie I was struggling to find an example of this, but I suppose there must be some sensible way to do it (that is, without duplicating the SQL for each combination of present/missing arguments).
Note that the java.time.ZonedDateTime instance maps perfectly and automatically into Postgres timestamptz when used inside the Anorm SQL call. (Trying to extract the WHERE condition as a string, outside SQL, created with normal string interpolation did not work; toString produces a representation not understood by the database.)
Play 2.4.4
One approach is to set up filter clauses such as
val customerClause =
if (customerId.isEmpty) ""
else " and customer_id={customerId}"
then substitute these into your SQL:
SQL(s"""
select count(*)
from report
where true
$customerClause
$createdClause
""")
.on('customerId -> customerId,
'createdSince -> createdSince)
.as(scalar[Int].singleOpt).getOrElse(0)
Using {variable} as opposed to $variable is I think preferable as it reduces the risk of SQL injection attacks where someone potentially calls your method with a malicious string. Anorm doesn't mind if you have additional symbols that aren't referenced in the SQL (i.e. if a clause string is empty). Lastly, depending on the database(?), a count might return no rows, so I use singleOpt rather than single.
I'm curious as to what other answers you receive.
Edit: Anorm interpolation (i.e. SQL"...", an interpolation implementation beyond Scala's s"...", f"..." and raw"...") was introduced to allow the use of $variable as equivalent to {variable} with .on. And from Play 2.4, Scala and Anorm interpolation can be mixed using $ for Anorm (SQL parameter/variable) and #$ for Scala (plain string). And indeed this works well, as long as the Scala interpolated string does not contain references to an SQL parameter. The only way, in 2.4.4, I could find to use a variable in a Scala interpolated string when using Anorm interpolation, was:
val limitClause = if (nameFilter == "") "" else s"where name>'$nameFilter'"
SQL"select * from tab #$limitClause order by name"
But this is vulnerable to SQL injection (e.g. a string like it's will cause a runtime syntax exception). So, in the case of variables inside interpolated strings, it seems it is necessary to use the "traditional" .on approach with only Scala interpolation:
val limitClause = if (nameFilter == "") "" else "where name>{nameFilter}"
SQL(s"select * from tab $limitClause order by name").on('nameFilter -> nameFilter)
Perhaps in the future Anorm interpolation could be extended to parse the interpolated string for variables?
Edit2: I'm finding there are some tables where the number of attributes that might or might not be included in the query changes from time to time. For these cases I'm defining a context class, e.g. CustomerContext. In this case class there are lazy vals for the different clauses that affect the sql. Callers of the sql method must supply a CustomerContext, and the sql will then have inclusions such as ${context.createdClause} and so on. This helps give a consistency, as I end up using the context in other places (such as total record count for paging, etc.).
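The clause-building idea in this answer can be sketched outside Anorm as well. Below is a minimal Python version using the standard-library sqlite3 module as a stand-in database. The function name count_reports_dynamic, the sample rows, and the use of ISO date strings instead of real timestamps are all assumptions made for the example. Only the clause text is concatenated into the SQL; the values are always bound as named parameters.

```python
import sqlite3


def count_reports_dynamic(conn, customer_id=None, created_since=None):
    """Count reports, adding a WHERE clause only for arguments that are set."""
    clauses = []
    params = {}
    if customer_id is not None:
        clauses.append("AND customer_id = :customer_id")
        params["customer_id"] = customer_id
    if created_since is not None:
        clauses.append("AND created >= :created_since")
        params["created_since"] = created_since
    # "WHERE 1 = 1" lets every optional clause start with AND.
    sql = "SELECT COUNT(*) FROM report WHERE 1 = 1 " + " ".join(clauses)
    return conn.execute(sql, params).fetchone()[0]


# Sample data (hypothetical); dates are ISO strings so they sort correctly.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE report (customer_id INTEGER, created TEXT)")
conn.executemany(
    "INSERT INTO report VALUES (?, ?)",
    [(1, "2015-01-01"), (1, "2016-01-01"), (2, "2016-06-01")],
)

print(count_reports_dynamic(conn))
print(count_reports_dynamic(conn, customer_id=1))
print(count_reports_dynamic(conn, created_since="2016-01-01"))
```

Building the parameter map alongside the clauses means the statement never receives a value it does not reference, which sidesteps the question of whether a given driver tolerates unused parameters.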
Finally got this simpler approach posted by Joel Arnold to work in my example case, also with ZonedDateTime!
def countReports(customerId: Option[Long], createdSince: Option[ZonedDateTime]) =
DB.withConnection {
implicit c =>
SQL( """
SELECT count(*) FROM report
WHERE ({customerId} is null or customer_id = {customerId})
AND ({created}::timestamptz is null or created >= {created})
""")
.on('customerId -> customerId, 'created -> createdSince)
.as(scalar[Int].singleOpt).getOrElse(0)
}
The tricky part is having to use {created}::timestamptz in the null check. As Joel commented, this is needed to work around a PostgreSQL driver issue.
Apparently the cast is needed only for timestamp types, and the simpler way ({customerId} is null) works with everything else. Also, comment if you know whether other databases require something like this, or if this is a Postgres-only peculiarity.
(While wwkudu's approach also works fine, this definitely is cleaner, as you can see comparing them side to side in a full example.)
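For comparison, the "parameter is null or column matches" pattern can be tried in a few lines of Python with the standard-library sqlite3 module. This is a sketch under assumptions: SQLite stands in for Postgres, the function name count_reports_nullable and the sample rows are invented, and dates are ISO strings. SQLite needs no cast in the null check, so this does not reproduce the timestamptz workaround mentioned above.

```python
import sqlite3

# One fixed statement; each filter is switched off when its parameter is NULL.
COUNT_SQL = """
SELECT COUNT(*) FROM report
WHERE (:customer_id IS NULL OR customer_id = :customer_id)
  AND (:created IS NULL OR created >= :created)
"""


def count_reports_nullable(conn, customer_id=None, created_since=None):
    """Count reports; a None argument is bound as NULL and disables its filter."""
    params = {"customer_id": customer_id, "created": created_since}
    return conn.execute(COUNT_SQL, params).fetchone()[0]


# Sample data (hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE report (customer_id INTEGER, created TEXT)")
conn.executemany(
    "INSERT INTO report VALUES (?, ?)",
    [(1, "2015-01-01"), (1, "2016-01-01"), (2, "2016-06-01")],
)

print(count_reports_nullable(conn))
print(count_reports_nullable(conn, customer_id=2))
```

The SQL text never changes here, which is the main attraction over assembling clauses: there is one statement to read and one statement for the database to prepare.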

Is there a convention for named arguments in a function in PostgreSQL

I come from a SQL Server background where the '@' symbol is used/encouraged in stored procedures. This is useful because you can easily see what is a column and what is a value. For example:
CREATE PROCEDURE Foo
    @Bar VARCHAR(10),
    @Baz INT
AS
BEGIN
    INSERT INTO MyTable (
        Bar,
        Baz)
    VALUES (
        @Bar,
        @Baz)
END
I know that I can just use ordinal position but some of our stored procs have 20 or so parameters and the named parameter makes it much more legible IMO.
Is there some sort of convention that the PostgreSQL community uses for a prefix? I tried to find out exactly what the rules were for named parameters but my Googling didn't yield anything.
Parameter identifiers follow the same rules as other identifiers:
http://www.postgresql.org/docs/current/static/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS
http://www.postgresql.org/docs/current/static/xfunc-sql.html#XFUNC-SQL-FUNCTION-ARGUMENTS
It is common to start a parameter identifier with an underscore _ and I think it makes sense although it is not a convention.
It is also possible to avoid ambiguity by qualifying the identifier with the function name
my_function.my_parameter

Disable implicit conversion of quoted values to integer

# CREATE TABLE foo ( id serial, val integer );
CREATE TABLE
# INSERT INTO foo (val) VALUES ('1'), (2);
INSERT 0 2
# SELECT * FROM foo WHERE id='1';
id | val
----+-----
1 | 1
(1 row)
Here, both on insert and on selection postgres implicitly converts quoted strings to integral types rather than raise a type error, unless the quoted value is very specifically typed as a varchar:
# INSERT INTO foo (val) VALUES (varchar '1');
ERROR: column "val" is of type integer
but expression is of type character varying
LINE 1: INSERT INTO foo (val) VALUES (varchar '1');
^
HINT: You will need to rewrite or cast the expression.
The issue here is for dynamically typed languages without implicit conversions (e.g. Ruby or Python)
a quoted value maps to a string
an integer maps to an integer
those are not compatible so depending on the connecting application's architecture this behavior may lead to incoherent caches and the like
Is there a way to disable it and force quoted values to always be varchars (unless explicitly converted)?
edit: because people apparently focus on the irrelevant, these queries come from parameterized statements, psycopg2 will convert strings to quoted values and quoted values back to strings, so the mismatch exists regardless of access method, that's a red herring. here's the exact same thing with parameterised statements:
import psycopg2.extensions

with psycopg2.connect(dbname='postgres') as cn:
    cn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
    with cn.cursor() as cx:
        cx.execute("DROP DATABASE IF EXISTS test")
        cx.execute("CREATE DATABASE test")
with psycopg2.connect(dbname='test') as cn:
    with cn.cursor() as cx:
        cx.execute("CREATE TABLE foo ( id serial, val integer )")
        cx.execute("INSERT INTO foo (val) VALUES (%s), (%s)",
                   (1, '2'))
        cx.execute("SELECT * FROM foo WHERE id=%s",
                   ('1',))
        print(cx.fetchall())
which outputs:
[(1, 1)]
No, you cannot disable implicit conversion of quoted literals to any target type. PostgreSQL considers such literals to be of unknown type unless overridden by a cast or literal type-specifier, and will convert from unknown to any type. There is no cast from unknown to a type in pg_cast; it's implicit. So you can't drop it.
As far as I know, PostgreSQL is following the SQL spec by accepting quoted literals as integers.
To PostgreSQL's type engine, 1 is an integer, and '1' is an unknown that's type-inferred to an integer if passed to an integer function, operator, or field. You cannot disable type inference from unknown or force unknown to be treated as text without hacking the parser / query planner directly.
What you should be doing is using parameterised statements instead of substituting literals into SQL. You won't have this issue if you do so, because the client-side type is known or can be specified. That certainly works with Ruby (Pg gem); it doesn't work how I thought for psycopg2, see below.
Update after question clarification: In the narrow case being described here, psycopg2's client-side parameterised statements, while correct, do not produce the result the original poster desires. Running the demo in the update shows that psycopg2 isn't using PostgreSQL's v3 bind/execute protocol, it's using the simple query protocol and doing parameter substitution locally. So while you're using parameterised statements in Python, you're not using parameterised statements in PostgreSQL. I was mistaken above in saying that parameterised statements in psycopg2 would resolve this issue.
The demo runs this SQL, from the PostgreSQL logs:
< 2014-07-07 18:17:24.450 WST >LOG: statement: INSERT INTO foo (val) VALUES (1), ('2')
< 2014-07-07 18:17:24.451 WST >LOG: statement: SELECT * FROM foo WHERE id='1'
Note the lack of placeholder parameters. They're substituted client-side.
So if you want psycopg2 to be stricter, you'll have to adapt the client side framework.
psycopg2 is extensible, so that should be pretty practical - you need to override the type handlers for str, unicode and integer (or, in Python3, bytes, str and integer) using psycopg2.extras, per adapting new types. There's even an FAQ entry about overriding psycopg2's handling of float as an example: http://initd.org/psycopg/docs/faq.html#faq-float
The naïve approach won't work though, because of infinite recursion:
def adapt_str_strict(thestr):
    return psycopg2.extensions.AsIs('TEXT ' + psycopg2.extensions.adapt(thestr))

psycopg2.extensions.register_adapter(str, adapt_str_strict)
so you need to bypass type adapter registration to call the original underlying adapter for str. This will, though it's ugly:
def adapt_str_strict(thestr):
    return psycopg2.extensions.AsIs('TEXT ' + str(psycopg2.extensions.QuotedString(thestr)))

psycopg2.extensions.register_adapter(str, adapt_str_strict)
Run your demo with that and you get:
psycopg2.ProgrammingError: parameter $1 of type text cannot be coerced to the expected type integer
HINT: You will need to rewrite or cast the expression.
(BTW, using server-side PREPARE and EXECUTE won't work, because you'll just suffer the same typing issues when passing values to EXECUTE via psycopg2).
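To see why client-side substitution produces the statements shown in the log, and what the TEXT prefix changes, here is a toy model in plain Python. It is not psycopg2's implementation: quote_literal, quote_literal_strict and render are hypothetical helpers that only imitate the visible effect (integers rendered bare, strings rendered as quoted literals).

```python
def quote_literal(value):
    """Render a Python value as an SQL literal (toy rules: int and str only)."""
    if isinstance(value, bool):
        raise TypeError("booleans are out of scope for this sketch")
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str):
        # Double any embedded single quotes, then wrap in single quotes.
        return "'" + value.replace("'", "''") + "'"
    raise TypeError(f"unsupported type: {type(value).__name__}")


def quote_literal_strict(value):
    """Same as quote_literal, but strings get an explicit TEXT type prefix."""
    if isinstance(value, str):
        return "TEXT " + quote_literal(value)
    return quote_literal(value)


def render(query, params, quote=quote_literal):
    """Substitute %s placeholders on the client, before anything is sent."""
    return query % tuple(quote(p) for p in params)


inserted = render("INSERT INTO foo (val) VALUES (%s), (%s)", (1, "2"))
loose = render("SELECT * FROM foo WHERE id=%s", ("1",))
strict = render("SELECT * FROM foo WHERE id=%s", ("1",), quote=quote_literal_strict)
print(inserted)
print(loose)
print(strict)
```

With the default quoting the server receives an untyped quoted literal, which it is free to infer as an integer. With the strict variant the literal arrives explicitly typed as text, which is what lets the server raise a type error instead.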