Can I get Ecto to log raw SQL? - postgresql

I am building an Ecto query like this:
from item in query,
where: like(item.description, ^"%#{text}%")
I'm concerned that this allows SQL injection in text. Before trying to fix that, I want to see how the query is actually sent to the database.
If I inspect the query or look at what is logged, I see some SQL, but it's not valid.
For instance, inspecting the query shows me this:
{"SELECT i0.\"id\", i0.\"store_id\", i0.\"title\", i0.\"description\"
FROM \"items\" AS i0 WHERE (i0.\"description\" LIKE $1)",
["%foo%"]}
When I pass this query to Repo.all, it logs this:
SELECT i0."id", i0."store_id", i0."title", i0."description"
FROM "items" AS i0 WHERE (i0."description" LIKE $1) ["%foo%"]
But if I copy and paste that into psql, PostgreSQL gives me an error:
ERROR: 42P02: there is no parameter $1
It seems as though Ecto may actually be doing a parameterized query, like this:
PREPARE bydesc(text) AS SELECT i0."id",
i0."store_id", i0."title", i0."description"
FROM "items" AS i0 WHERE (i0."description" LIKE $1);
EXECUTE bydesc('%foo%');
If so, I think that would prevent SQL injection. But I'm just guessing that this is what Ecto does.
How can I see the actual SQL that Ecto is executing?

Ecto uses only prepared statements. When you use Ecto's query syntax, introducing SQL injection is not possible: the query syntax verifies at compile time that no SQL injection can happen.
Showing exactly the queries executed might be difficult for a couple of reasons:
Postgrex (and hence Ecto) uses the PostgreSQL binary protocol (instead of the more common, but less efficient, text protocol), so the PREPARE query never actually exists as a string.
In most cases all you would see is one initial PREPARE 64237612638712636123(...) AS ... and later a lot of EXECUTE 64237612638712636123(...), which isn't that helpful; trying to relate one to the other would be horrible.
In my experience, most software of this kind logs the prepared statements instead of the raw queries, since that is much more helpful for understanding the behaviour of the system.
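If you want to see what such a prepared statement looks like by hand, PostgreSQL exposes the pg_prepared_statements view. It only lists statements prepared in your own session, so you won't see Ecto's statements from psql, but it illustrates the mechanism. A sketch, reusing the names from the question:
PREPARE bydesc(text) AS
SELECT id, description FROM items WHERE description LIKE $1;
-- lists this session's prepared statements, with their parameter types
SELECT name, statement, parameter_types FROM pg_prepared_statements;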

Yes, that is the exact SQL being executed by Ecto (internally it uses prepared queries via the db_connection package), and no SQL injection is possible in that code. You can verify this by turning on logging of all executed SQL queries, changing log_statement to all in postgresql.conf:
...
log_statement = 'all'
...
and then restarting PostgreSQL and running a query. For the following queries:
Repo.get(Post, 1)
Repo.get(Post, 2)
this is logged:
LOG: execute ecto_818: SELECT p0."id", p0."title", p0."user_id", p0."inserted_at", p0."updated_at" FROM "posts" AS p0 WHERE (p0."id" = $1)
DETAIL: parameters: $1 = '1'
LOG: execute ecto_818: SELECT p0."id", p0."title", p0."user_id", p0."inserted_at", p0."updated_at" FROM "posts" AS p0 WHERE (p0."id" = $1)
DETAIL: parameters: $1 = '2'
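If you'd rather not edit postgresql.conf by hand, the same setting can be changed from psql as a superuser; log_statement only needs a configuration reload, not a full restart. A sketch:
ALTER SYSTEM SET log_statement = 'all';  -- persisted to postgresql.auto.conf
SELECT pg_reload_conf();                 -- picks up the change without a restart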

Related

How to detect whether a PostgreSQL function is utilizing an index on the tables

I have created a PL/pgSQL table-returning function that executes a SELECT statement and uses the input parameter in the WHERE clause of the query.
I build the statement dynamically and execute it like this: EXECUTE sqlStmt USING empID;
sqlStmt is a variable of data type text that holds the SELECT query, which joins 3 tables.
When I execute that query in pgAdmin and analyze it, I can see that an 'Index Scan' is used on the tables, as expected. However, when I do EXPLAIN ANALYZE SELECT * from fn_getDetails(12), the output just says "Function Scan".
How do I know whether the table indexes are being used? Other SO answers suggest the auto_explain module, but it did not provide details of the statements in the function body, and I am unable to use PREPARE inside my function body.
Executing the SELECT statement directly takes almost the same time as calling the function, within a couple of milliseconds, but how can I know whether the index was used?
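For reference, a minimal sketch of the kind of function being described; the table and column names here are made up, and only the EXECUTE ... USING part is from the original post:
CREATE OR REPLACE FUNCTION fn_getDetails(empID integer)
RETURNS TABLE (emp_name text, dept_name text)
LANGUAGE plpgsql AS
$$
DECLARE
  sqlStmt text := 'SELECT e.name, d.name
                   FROM employees e
                   JOIN departments d ON d.id = e.dept_id
                   WHERE e.id = $1';
BEGIN
  -- dynamic SQL is planned at run time, invisible to a plain EXPLAIN on the function call
  RETURN QUERY EXECUTE sqlStmt USING empID;
END
$$;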
auto_explain will certainly provide the requested information.
Set the following parameters:
shared_preload_libraries = 'auto_explain' # requires a restart
auto_explain.log_min_duration = 0 # log all statements
auto_explain.log_nested_statements = on # log statements in functions too
The last parameter is required for tracking SQL statements inside functions.
Because of the change to shared_preload_libraries, you need to restart the database to activate the module.
Of course, testing whether the index is used in a query on a small table won't give you a reliable result. You need about as much test data as you expect to have in reality.
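If you have superuser access and only want a quick interactive check, the module can also be loaded for the current session, without touching shared_preload_libraries or restarting. A sketch:
LOAD 'auto_explain';                          -- session-only load (superuser)
SET auto_explain.log_min_duration = 0;        -- log plans for every statement
SET auto_explain.log_nested_statements = on;  -- include statements inside the function
SELECT * FROM fn_getDetails(12);              -- the inner SELECT's plan now appears in the server log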

How can I execute an "Explain" statement with a prepared query in PostgreSQL that has bind parameters?

I would like to be able to execute an explain statement on a query that has bind parameters. For example:
EXPLAIN SELECT * FROM metasyntax WHERE id = $1;
When I attempt to execute this, I get the following error:
ERROR: bind message supplies 0 parameters, but prepared statement "" requires 1
I understand it's telling me that it wants me to supply a value for the query, but I may not necessarily know one. Other SQL dialects, such as Oracle, will generate an explain plan without needing a value for the parameter.
Is it possible to get an explain plan without binding to an actual value? Thanks!
Assuming the parameter is an integer:
PREPARE x(integer) AS SELECT * FROM metasyntax WHERE id = $1;
Then run the following six times, where “42” is a representative value:
EXPLAIN (ANALYZE, BUFFERS) EXECUTE x(42);
Normally PostgreSQL will switch to a generic plan for the sixth run, and you will see a plan that contains the placeholder $1.
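On PostgreSQL 12 or later you can skip the warm-up executions and request the generic plan directly. A sketch:
SET plan_cache_mode = force_generic_plan;  -- PostgreSQL 12+
PREPARE x(integer) AS SELECT * FROM metasyntax WHERE id = $1;
EXPLAIN EXECUTE x(42);                     -- shows the generic plan with $1 on the first run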
No.
The optimiser is allowed to change the query plan based on the parameter. Imagine if the parameter was null - it's obvious that no rows will be returned, so the DB may return an empty rowset instantly.
Just use a representative value.

How to log queries inside PL/pgSQL with inlined parameter values

When a statement in my PL/pgSQL function (Postgres 9.6) is run, I can see the query on one line and all the parameters on another line, i.e. a two-line log entry. Something like:
LOG: execute <unnamed>: SELECT * FROM table WHERE field1=$1 AND field2=$2 ...
DETAIL: parameters: $1 = '-767197682', $2 = '234324' ....
Is it possible to log the entire query to pg_log WITH the parameters already replaced in the query, on a SINGLE line?
This would make it much easier to copy/paste the query to reproduce it in another terminal, especially for queries with dozens of parameters.
The reason behind this behaviour: PL/pgSQL treats SQL statements as prepared statements internally.
First of all: with default settings, there is no logging of SQL statements inside PL/pgSQL functions at all. Are you using auto_explain?
Postgres query plan of a UDF invocation written in pgpsql
For the first couple of invocations in the same session, the SPI manager (Server Programming Interface) generates a fresh execution plan based on actual parameter values. Any kind of logging should report parameter values inline.
Postgres keeps track, and after a couple of invocations in the current session, if execution plans don't seem sensitive to actual parameter values, it starts reusing a generic, cached plan. Then you should see the generic plan of a prepared statement with $n parameters (as in the question).
Details in the chapter "Plan Caching" in the manual.
You can observe the effect with a simple demo. In the same session (not necessarily same transaction):
CREATE TEMP TABLE tbl AS
SELECT id FROM generate_series(1, 100) id;
PREPARE prep1(int) AS
SELECT min(id) FROM tbl WHERE id > $1;
EXPLAIN EXECUTE prep1(3); -- 1st execution
You'll see the actual value:
Filter: (id > 3)
EXECUTE prep1(1); -- several more executions
EXECUTE prep1(2);
EXECUTE prep1(3);
EXECUTE prep1(4);
EXECUTE prep1(5);
EXPLAIN EXECUTE prep1(3);
Now you'll see a $n parameter:
Filter: (id > $1)
So you can get the query with parameter values inlined on the first couple of invocations in the current session.
Or you can use dynamic SQL with EXECUTE, because, per documentation:
Also, there is no plan caching for commands executed via EXECUTE. Instead, the command is always planned each time the statement is run. Thus the command string can be dynamically created within the function to perform actions on different tables and columns.
That can actually affect performance, of course.
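If you control the function, one workaround along those lines is to build the statement as dynamic SQL with format() and log the finished string yourself, which gives a single copy/paste-ready line. A sketch against the demo table tbl above; the function name is made up:
CREATE OR REPLACE FUNCTION log_and_run(_min int)
RETURNS int
LANGUAGE plpgsql AS
$$
DECLARE
  _sql    text;
  _result int;
BEGIN
  -- %L quotes the value as a literal, so the logged SQL runs as-is
  _sql := format('SELECT min(id) FROM tbl WHERE id > %L::int', _min);
  RAISE LOG 'running: %', _sql;  -- one line in the server log, parameters inlined
  EXECUTE _sql INTO _result;
  RETURN _result;
END
$$;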
Related:
PostgreSQL Stored Procedure Performance

How can I log `PREPARE` statements in PostgreSQL?

I'm using a database tool (Elixir's Ecto) which uses prepared statements for most PostgreSQL queries. I want to see exactly how and when it does that.
I found the correct Postgres config file, postgresql.conf, by running SHOW config_file; in psql. I edited it to include
log_statement = 'all'
as Dogbert suggested here.
According to the PostgreSQL 9.6 docs, this setting should cause PREPARE statements to be logged.
After restarting PostgreSQL, I can tail -f its log file (the one specified by the -r flag when running the postgres executable) and I do see entries like this:
LOG: execute ecto_728: SELECT i0."id", i0."store_id", i0."title", i0."description"
FROM "items" AS i0 WHERE (i0."description" LIKE $1)
DETAIL: parameters: $1 = '%foo%'
This corresponds to a query (albeit done over binary protocol, I think) like
EXECUTE ecto_728('%foo%');
However, I don't see the original PREPARE that created ecto_728.
I tried dropping and recreating the database. Afterwards, the same query is executed as ecto_578, so it seems that the original prepared statement was dropped with the database and a new one was created.
But when I search the PostgreSQL log for ecto_578, I only see it being executed, not created.
How can I see PREPARE statements in the PostgreSQL log?
As you mentioned, your queries are being prepared via the extended query protocol, which is distinct from a PREPARE statement. And according to the docs for log_statement:
For clients using extended query protocol, logging occurs when an
Execute message is received
(Which is to say that logging does not occur when a Parse or Bind message is received.)
However, if you set log_min_duration_statement = 0, then:
For clients using extended query protocol, durations of the Parse,
Bind, and Execute steps are logged independently
Enabling both of these settings together will give you two log entries per Execute (one from log_statement when the message is received, and another from log_min_duration_statement once execution is complete).
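So to capture everything, both settings can be enabled together in postgresql.conf (expect verbose logs); a sketch:
log_statement = 'all'            # logs each statement when its Execute message arrives
log_min_duration_statement = 0   # also logs Parse, Bind and Execute durations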
Nick's answer was correct; I'm just answering to add what I learned by trying it.
First, I was able to see three separate actions in the log: a parse to create the prepared statement, a bind to give the parameters for it, and an execute to have the database actually carry it out and return results. This is described in the PostgreSQL docs for the "extended query protocol".
LOG: duration: 0.170 ms parse ecto_918: SELECT i0."id", i0."store_id",
i0."title", i0."description"
FROM "items" AS i0 WHERE (i0."description" LIKE $1)
LOG: duration: 0.094 ms bind ecto_918: SELECT i0."id", i0."store_id",
i0."title", i0."description"
FROM "items" AS i0 WHERE (i0."description" LIKE $1)
DETAIL: parameters: $1 = '%priceless%'
LOG: execute ecto_918: SELECT i0."id", i0."store_id",
i0."title", i0."description"
FROM "items" AS i0 WHERE (i0."description" LIKE $1)
DETAIL: parameters: $1 = '%priceless%'
This output was generated during a run of some automated tests. On subsequent runs, I saw the same query with a different name, e.g. ecto_1573. This was without dropping the database or even restarting the PostgreSQL process. The docs say that
If successfully created, a named prepared-statement object lasts till
the end of the current session, unless explicitly destroyed.
So these statements have to be recreated for each session, and my test suite presumably starts a new session on each run.

How to guard execution when 'USE' not supported in SQL

When I'm sketching out SQL statements, I keep a file of all the queries I have used to analyse my live data. Each time I write a new statement or group of statements at the end of the file, I select them and click 'Execute' to see the results. I'm paranoid that I may forget the selection step and accidentally run all the queries in the file sequentially, so I head the file with the line
USE FakeDatabase
so that the queries will fail, since they will run against a non-existent database. But no; instead I get the error
USE statement is not supported to switch between databases
(N.B. I am using SQL Server Management Studio v17.0 RC1 against a v12 Azure SQL Server database.)
What T-SQL statement can I use to prevent further execution of T-SQL statements in a file?
USE is not supported in Azure. You can try the approach below, but there may be other options depending on your use case.
Replace USE FakeDatabase with this statement:
if db_name() <> 'FakeDatabase'
  return;
You could, instead, put something like this in each script:
IF @@SERVERNAME <> 'Not-Really-My-Server'
BEGIN
  raiserror('Database Name Not Set', 20, -1) with log
END
-- Rest of my query...
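One caveat when choosing between the two approaches (my own observation, not from the answers above): return only exits the current batch, so in a file split by GO separators the db_name() guard must be repeated at the top of every batch, whereas raiserror with severity 20 and WITH LOG terminates the connection entirely and covers the whole file. A sketch with a made-up query:
if db_name() <> 'FakeDatabase'
  raiserror('Wrong database', 20, -1) with log
go

-- this second batch never runs; the connection was terminated above
select count(*) from dbo.Items;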