I wanted to perform a conditional insert in PostgreSQL. Something like:
INSERT INTO {TABLE_NAME} (user_id, data) values ('{user_id}', '{data}')
WHERE not exists(select 1 from files where user_id='{user_id}' and data->'userType'='Type1')
Unfortunately, insert and where does not cooperate in PostGreSQL. What could be a suitable syntax for my query? I was considering ON CONFLICT, but couldn't find the syntax for using it with JSON object. (Data in the example)
Is it possible?
Rewrite the VALUES part to a SELECT, then you can use a WHERE condition:
INSERT INTO { TABLE_NAME } ( user_id, data )
SELECT
user_id,
data
FROM
( VALUES ( '{user_id}', '{data}' ) ) sub ( user_id, data )
WHERE
NOT EXISTS (
SELECT 1
FROM files
WHERE user_id = '{user_id}'
AND data -> 'userType' = 'Type1'
);
But, there is NO guarantee that the WHERE condition works! Another transaction that has not been committed yet, is invisible to this query. This could lead to data quality issues.
You can use INSERT ... SELECT ... WHERE ....
INSERT INTO elbat
(user_id,
data)
SELECT 'abc',
'xyz'
WHERE NOT EXISTS (SELECT *
FROM files
WHERE user_id = 'abc'
AND data->>'userType' = 'Type1')
And it looks like you're creating the query in a host language. Don't use string concatenation or interpolation for getting the values in it. That's error prone and makes your application vulnerable to SQL injection attacks. Look up how to use parameterized queries in your host language. Very likely for the table name parameters cannot be used. You need some other method of either whitelisting the names or properly quoting them.
Related
We are using PostgREST to automatically generate a REST API for a Postgres database. Our primary keys have an external representation that's different from how we store them internally. For simplicity's sake lets pretend the ids are stored as integers but we represent them as hexadecimal strings outwardly.
It's simple enough to get PostgREST to convert to the external representation for read operations:
CREATE DOMAIN hexid AS bigint;
CREATE TABLE fruits (
fruit_id hexid PRIMARY KEY,
name text
);
CREATE OR REPLACE VIEW api_fruits AS
SELECT to_hex(fruit_id) as fruit_id, name FROM fruits;
INSERT INTO fruits(fruit_id, name) VALUES('51955', 'avocado');
PostgREST generates the expected representation when we GET api_fruits:
[
{
"fruit_id": "caf3",
"name": "avocado"
}
]
But that's about as far as we get with this solution. It's a one way transformation so we won't be able to POST/PATCH records this way. The way PostgREST works is to transform such requests into equivalent INSERT and UPDATE statements. But this view with its custom formatting is not updatable. This is what would happen if we tried:
ERROR: cannot insert into column "fruit_id" of view "api_fruits"
DETAIL: View columns that are not columns of their base relation are not updatable.
STATEMENT: WITH pgrst_source AS (WITH pgrst_payload AS (SELECT $1::json AS json_data), pgrst_body AS ( SELECT CASE WHEN json_typeof(json_data) = 'array' THEN json_data ELSE json_build_array(json_data) END AS val FROM pgrst_payload) INSERT INTO "api_x"."api_fruits"("fruit_id", "name") SELECT "fruit_id", "name" FROM json_populate_recordset (null::"api_x"."api_fruits", (SELECT val FROM pgrst_body)) _ RETURNING "api_x"."api_fruits".*) SELECT '' AS total_result_set, pg_catalog.count(_postgrest_t) AS page_total, CASE WHEN pg_catalog.count(_postgrest_t) = 1 THEN coalesce((
WITH data AS (SELECT row_to_json(_) AS row FROM pgrst_source AS _ LIMIT 1)
SELECT array_agg(json_data.key || '=eq.' || json_data.value)
FROM data CROSS JOIN json_each_text(data.row) AS json_data
WHERE json_data.key IN ('')
), array[]::text[]) ELSE array[]::text[] END AS header, '' AS body, nullif(current_setting('response.headers', true), '') AS response_headers, nullif(current_setting('response.status', true), '') AS response_status FROM (SELECT * FROM pgrst_source) _postgrest_t
We can't INSERT into "View columns that are not columns of their base relation".
The obvious workaround is to serve fruit_id as a straight column, just an integer. With some post and preprocessing at the nginx level we can hex encode it there (and hex decode incoming ids). I'm wondering if we can do better than that though. For large API operations, re-encoding the JSON will use a lot of memory and CPU time and it seems so unnecessary.
It would have been great to be able to use a custom CREATE CAST to take the incoming hexadecimal strings and turn them back into integers, something like this:
CREATE CAST (json AS hexid) WITH FUNCTION json_to_hexid AS ASSIGNMENT;
But alas custom casts are ignored on CREATE DOMAIN types. And we can't make a true custom column type because our cloud Postgres host (Google Cloud SQL) doesn't allow custom extensions.
It feels like some combination of INSTEAD OF triggers or rules could work. But when using query parameters to filter results using query parameters (e.g. select a fruit by id), I don't think there's an appropriate trigger to use. INSTEAD OF doesn't work for straight SELECT does it?
For example I've tested doing something like this to take care of INSERT and allow POST with PostgREST. It works:
CREATE OR REPLACE FUNCTION api_fruits_insert()
RETURNS trigger AS
$$
BEGIN
INSERT INTO fruits(fruit_id, name) VALUES (('x' || lpad(NEW.fruit_id, 16, '0'))::bit(64)::bigint::hexid, NEW.name);
RETURN NEW;
END
$$ LANGUAGE 'plpgsql';
CREATE TRIGGER api_fruits_insert
INSTEAD OF INSERT
ON api_fruits
FOR EACH ROW
EXECUTE PROCEDURE api_fruits_insert();
The trouble is in the WHERE clause. Let's PATCH api_fruits?fruit_id=in.(7b,caf3) with {"name": "pear"}. This works out of the box since the name column is updatable but look at the query:
WITH pgrst_source AS (WITH pgrst_payload AS (SELECT $1::json AS json_data), pgrst_body AS ( SELECT CASE WHEN json_typeof(json_data) = 'array' THEN json_data ELSE json_build_array(json_data) END AS val FROM pgrst_payload) UPDATE "api_x"."api_fruits" SET "name" = _."name" FROM (SELECT * FROM json_populate_recordset (null::"api_x"."api_fruits" , (SELECT val FROM pgrst_body) )) _ WHERE "api_x"."api_fruits"."fruit_id" = ANY ($2) RETURNING 1) SELECT '' AS total_result_set, pg_catalog.count(_postgrest_t) AS page_total, array[]::text[] AS header, '' AS body, nullif(current_setting('response.headers', true), '') AS response_headers, nullif(current_setting('response.status', true), '') AS response_status FROM (SELECT * FROM pgrst_source) _postgrest_t
DETAIL: parameters: $1 = '{
"name": "pear"
}', $2 = '{7b,caf3}'
So we have essentially UPDATE api_fruits SET name='berry' WHERE fruit_id IN ('7b', 'caf3');. Surprisingly this works but it's a full table scan so Postgres can evaluate to_hex(fruit_id) for each row looking for matches. The same happens if we try to GET a record by fruit_id. How would we rewrite the WHERE clauses?
It really feels like some combination of just the right Postgres and PostgREST features should be able to get us to a point where it's all happening in Postgres without nginx's help and without excessive complexity. Any ideas?
Working on a new TSQL Stored Procedure, I am wanting to get all rows where values in a specific column don't start with any of a specific set of 2 character substrings.
The general idea is:
SELECT * FROM table WHERE value NOT LIKE 's1%' AND value NOT LIKE 's2%' AND value NOT LIKE 's3%'.
The catch is that I am trying to make it dynamic so that the specific substrings can be pulled from another table in the database, which can have more values added to it.
While I have never used the IN operator before, I think something along these lines should do what I am looking for, however, I don't think it is possible to use wildcards with IN, so I might not be able to compare just the substrings.
SELECT * FROM table WHERE value NOT IN (SELECT substrings FROM subTable)
To get around that limitation, I am trying to do something like this:
SELECT * FROM table WHERE SUBSTRING(value, 1, 2) NOT IN (SELECT Prefix FROM subTable WHERE Prefix IS NOT NULL)
but I'm not sure this is right, or if it is the most efficient way to do this. My preference is to do this in a Stored Procedure, but if that isn't feasible or efficient I'm also open to building the query dynamically in C#.
Here's an option. Load values you want to filter to a table, left outer join and use PATINDEX().
DECLARE #FilterValues TABLE
(
[FilterValue] NVARCHAR(10)
);
--Table with values we want filter on.
INSERT INTO #FilterValues (
[FilterValue]
)
VALUES ( N's1' )
, ( N's2' )
, ( N's3' );
DECLARE #TestData TABLE
(
[TestValues] NVARCHAR(100)
);
--Load some test data
INSERT INTO #TestData (
[TestValues]
)
VALUES ( N's1 Test Data' )
, ( N's2 Test Data' )
, ( N's3 Test Data' )
, ( N'test data not filtered out' )
, ( N'test data not filtered out 1' );
SELECT a.*
FROM #TestData [a]
LEFT OUTER JOIN #FilterValues [b]
ON PATINDEX([b].[FilterValue] + '%', [a].[TestValues]) > 0
WHERE [b].[FilterValue] IS NULL;
I have a very complex CTE and I would like to insert the result into a physical table.
Is the following valid?
INSERT INTO dbo.prf_BatchItemAdditionalAPartyNos
(
BatchID,
AccountNo,
APartyNo,
SourceRowID
)
WITH tab (
-- some query
)
SELECT * FROM tab
I am thinking of using a function to create this CTE which will allow me to reuse. Any thoughts?
You need to put the CTE first and then combine the INSERT INTO with your select statement. Also, the "AS" keyword following the CTE's name is not optional:
WITH tab AS (
bla bla
)
INSERT INTO dbo.prf_BatchItemAdditionalAPartyNos (
BatchID,
AccountNo,
APartyNo,
SourceRowID
)
SELECT * FROM tab
Please note that the code assumes that the CTE will return exactly four fields and that those fields are matching in order and type with those specified in the INSERT statement.
If that is not the case, just replace the "SELECT *" with a specific select of the fields that you require.
As for your question on using a function, I would say "it depends". If you are putting the data in a table just because of performance reasons, and the speed is acceptable when using it through a function, then I'd consider function to be an option.
On the other hand, if you need to use the result of the CTE in several different queries, and speed is already an issue, I'd go for a table (either regular, or temp).
WITH common_table_expression (Transact-SQL)
The WITH clause for Common Table Expressions go at the top.
Wrapping every insert in a CTE has the benefit of visually segregating the query logic from the column mapping.
Spot the mistake:
WITH _INSERT_ AS (
SELECT
[BatchID] = blah
,[APartyNo] = blahblah
,[SourceRowID] = blahblahblah
FROM Table1 AS t1
)
INSERT Table2
([BatchID], [SourceRowID], [APartyNo])
SELECT [BatchID], [APartyNo], [SourceRowID]
FROM _INSERT_
Same mistake:
INSERT Table2 (
[BatchID]
,[SourceRowID]
,[APartyNo]
)
SELECT
[BatchID] = blah
,[APartyNo] = blahblah
,[SourceRowID] = blahblahblah
FROM Table1 AS t1
A few lines of boilerplate make it extremely easy to verify the code inserts the right number of columns in the right order, even with a very large number of columns. Your future self will thank you later.
Yep:
WITH tab (
bla bla
)
INSERT INTO dbo.prf_BatchItemAdditionalAPartyNos ( BatchID, AccountNo,
APartyNo,
SourceRowID)
SELECT * FROM tab
Note that this is for SQL Server, which supports multiple CTEs:
WITH x AS (), y AS () INSERT INTO z (a, b, c) SELECT a, b, c FROM y
Teradata allows only one CTE and the syntax is as your example.
Late to the party here, but for my purposes I wanted to be able to run the code the user inputted and store in a temp table. Using oracle no such issues.. the insert is at the start of the statement before the with clause.
For this to work in sql server, the following worked:
INSERT into #stagetable execute (#InputSql)
(so the select statement #inputsql can start as a with clause).
I'd like to generate a single sql query to mass-insert a series of rows that don't exist on a table. My current setup makes a new query for each record insertion similar to the solution detailed in WHERE NOT EXISTS in PostgreSQL gives syntax error, but I'd like to move this to a single query to optimize performance since my current setup could generate several hundred queries at a time. Right now I'm trying something like the example I've added below:
INSERT INTO users (first_name, last_name, uid)
SELECT ( 'John', 'Doe', '3sldkjfksjd'), ( 'Jane', 'Doe', 'adslkejkdsjfds')
WHERE NOT EXISTS (
SELECT * FROM users WHERE uid IN ('3sldkjfksjd', 'adslkejkdsjfds')
)
Postgres returns the following error:
PG::Error: ERROR: INSERT has more target columns than expressions
The problem is that PostgresQL doesn't seem to want to take a series of values when using SELECT. Conversely, I can make the insertions using VALUES, but I can't then prevent duplicates from being generated using WHERE NOT EXISTS.
http://www.techonthenet.com/postgresql/insert.php suggests in the section EXAMPLE - USING SUB-SELECT that multiple records should be insertable from another referenced table using SELECT, so I'm wondering why I can't seem to pass in a series of values to insert. The values I'm passing are coming from an external API, so I need to generate the values to insert by hand.
Your select is not doing what you think it does.
The most compact version in PostgreSQL would be something like this:
with data(first_name, last_name, uid) as (
values
( 'John', 'Doe', '3sldkjfksjd'),
( 'Jane', 'Doe', 'adslkejkdsjfds')
)
insert into users (first_name, last_name, uid)
select d.first_name, d.last_name, d.uid
from data d
where not exists (select 1
from users u2
where u2.uid = d.uid);
Which is pretty much equivalent to:
insert into users (first_name, last_name, uid)
select d.first_name, d.last_name, d.uid
from (
select 'John' as first_name, 'Doe' as last_name, '3sldkjfksjd' as uid
union all
select 'Jane', 'Doe', 'adslkejkdsjfds'
) as d
where not exists (select 1
from users u2
where u2.uid = d.uid);
a_horse_with_no_name's answer actually has a syntax error, missing a final closing right parens, but other than that is the correct way to do this.
Update:
For anyone coming to this with a situation like mine, if you have columns that need to be type cast (for instance timestamps or uuids or jsonb in PG 9.5), you must declare that in the values you pass to the query:
-- insert multiple if not exists
-- where another_column_name is of type uuid, with strings cast as uuids
-- where created_at and updated_at is of type timestamp, with strings cast as timestamps
WITH data (id, some_column_name, another_column_name, created_at, updated_at) AS (
VALUES
(<id value>, <some_column_name_value>, 'a5fa7660-8273-4ffd-b832-d94f081a4661'::uuid, '2016-06-13T12:15:27.552-07:00'::timestamp, '2016-06-13T12:15:27.879-07:00'::timestamp),
(<id value>, <some_column_name_value>, 'b9b17117-1e90-45c5-8f62-d03412d407dd'::uuid, '2016-06-13T12:08:17.683-07:00'::timestamp, '2016-06-13T12:08:17.801-07:00'::timestamp)
)
INSERT INTO table_name (id, some_column_name, another_column_name, created_at, updated_at)
SELECT d.id, d.survey_id, d.arrival_uuid, d.gf_created_at, d.gf_updated_at
FROM data d
WHERE NOT EXISTS (SELECT 1 FROM table_name t WHERE t.id = d.id);
a_horse_with_no_name's answer saved me today on a project, but had to make these tweaks to make it perfect.
I have a very complex CTE and I would like to insert the result into a physical table.
Is the following valid?
INSERT INTO dbo.prf_BatchItemAdditionalAPartyNos
(
BatchID,
AccountNo,
APartyNo,
SourceRowID
)
WITH tab (
-- some query
)
SELECT * FROM tab
I am thinking of using a function to create this CTE which will allow me to reuse. Any thoughts?
You need to put the CTE first and then combine the INSERT INTO with your select statement. Also, the "AS" keyword following the CTE's name is not optional:
WITH tab AS (
bla bla
)
INSERT INTO dbo.prf_BatchItemAdditionalAPartyNos (
BatchID,
AccountNo,
APartyNo,
SourceRowID
)
SELECT * FROM tab
Please note that the code assumes that the CTE will return exactly four fields and that those fields are matching in order and type with those specified in the INSERT statement.
If that is not the case, just replace the "SELECT *" with a specific select of the fields that you require.
As for your question on using a function, I would say "it depends". If you are putting the data in a table just because of performance reasons, and the speed is acceptable when using it through a function, then I'd consider function to be an option.
On the other hand, if you need to use the result of the CTE in several different queries, and speed is already an issue, I'd go for a table (either regular, or temp).
WITH common_table_expression (Transact-SQL)
The WITH clause for Common Table Expressions go at the top.
Wrapping every insert in a CTE has the benefit of visually segregating the query logic from the column mapping.
Spot the mistake:
WITH _INSERT_ AS (
SELECT
[BatchID] = blah
,[APartyNo] = blahblah
,[SourceRowID] = blahblahblah
FROM Table1 AS t1
)
INSERT Table2
([BatchID], [SourceRowID], [APartyNo])
SELECT [BatchID], [APartyNo], [SourceRowID]
FROM _INSERT_
Same mistake:
INSERT Table2 (
[BatchID]
,[SourceRowID]
,[APartyNo]
)
SELECT
[BatchID] = blah
,[APartyNo] = blahblah
,[SourceRowID] = blahblahblah
FROM Table1 AS t1
A few lines of boilerplate make it extremely easy to verify the code inserts the right number of columns in the right order, even with a very large number of columns. Your future self will thank you later.
Yep:
WITH tab (
bla bla
)
INSERT INTO dbo.prf_BatchItemAdditionalAPartyNos ( BatchID, AccountNo,
APartyNo,
SourceRowID)
SELECT * FROM tab
Note that this is for SQL Server, which supports multiple CTEs:
WITH x AS (), y AS () INSERT INTO z (a, b, c) SELECT a, b, c FROM y
Teradata allows only one CTE and the syntax is as your example.
Late to the party here, but for my purposes I wanted to be able to run the code the user inputted and store in a temp table. Using oracle no such issues.. the insert is at the start of the statement before the with clause.
For this to work in sql server, the following worked:
INSERT into #stagetable execute (#InputSql)
(so the select statement #inputsql can start as a with clause).