Insert multiple rows where not exists in PostgreSQL

I'd like to generate a single SQL query to mass-insert a series of rows that don't already exist in a table. My current setup makes a new query for each record insertion, similar to the solution detailed in "WHERE NOT EXISTS in PostgreSQL gives syntax error". I'd like to move this to a single query to improve performance, since my current setup can generate several hundred queries at a time. Right now I'm trying something like the example below:
INSERT INTO users (first_name, last_name, uid)
SELECT ( 'John', 'Doe', '3sldkjfksjd'), ( 'Jane', 'Doe', 'adslkejkdsjfds')
WHERE NOT EXISTS (
SELECT * FROM users WHERE uid IN ('3sldkjfksjd', 'adslkejkdsjfds')
)
Postgres returns the following error:
PG::Error: ERROR: INSERT has more target columns than expressions
The problem is that PostgreSQL doesn't seem to accept a series of values when using SELECT. I can make the insertions using VALUES instead, but then I can't prevent duplicates with WHERE NOT EXISTS.
http://www.techonthenet.com/postgresql/insert.php suggests, in the section EXAMPLE - USING SUB-SELECT, that multiple records can be inserted from another referenced table using SELECT. So I'm wondering why I can't pass in a series of values to insert. The values I'm passing come from an external API, so I need to generate the values to insert by hand.

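Your SELECT is not doing what you think it does. SELECT ('John', 'Doe', '3sldkjfksjd'), ('Jane', 'Doe', 'adslkejkdsjfds') returns one row with two columns, each holding a row value, so INSERT receives two expressions for three target columns.
The most compact version in PostgreSQL would be something like this: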
with data(first_name, last_name, uid) as (
values
( 'John', 'Doe', '3sldkjfksjd'),
( 'Jane', 'Doe', 'adslkejkdsjfds')
)
insert into users (first_name, last_name, uid)
select d.first_name, d.last_name, d.uid
from data d
where not exists (select 1
from users u2
where u2.uid = d.uid);
Which is pretty much equivalent to:
insert into users (first_name, last_name, uid)
select d.first_name, d.last_name, d.uid
from (
select 'John' as first_name, 'Doe' as last_name, '3sldkjfksjd' as uid
union all
select 'Jane', 'Doe', 'adslkejkdsjfds'
) as d
where not exists (select 1
from users u2
where u2.uid = d.uid);
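Since the values come from an external API, the VALUES list has to be built at run time. Below is a minimal sketch of that. It assumes Python's built-in sqlite3 module as a stand-in for a PostgreSQL driver: SQLite accepts the same WITH ... VALUES ... INSERT ... WHERE NOT EXISTS shape, but uses ? placeholders where psycopg2 uses %s. The helper name insert_missing_users is hypothetical.

```python
import sqlite3


def insert_missing_users(conn, rows):
    """Insert (first_name, last_name, uid) tuples whose uid is not in users yet.

    Hypothetical helper: one statement for the whole batch, with one
    placeholder group per row so the values are bound, never interpolated.
    Returns the number of rows actually inserted.
    """
    if not rows:
        return 0
    groups = ", ".join(["(?, ?, ?)"] * len(rows))
    sql = (
        f"WITH data(first_name, last_name, uid) AS (VALUES {groups}) "
        "INSERT INTO users (first_name, last_name, uid) "
        "SELECT d.first_name, d.last_name, d.uid FROM data d "
        "WHERE NOT EXISTS (SELECT 1 FROM users u2 WHERE u2.uid = d.uid)"
    )
    params = [value for row in rows for value in row]  # flatten the row tuples
    before = conn.total_changes
    conn.execute(sql, params)
    return conn.total_changes - before


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (first_name TEXT, last_name TEXT, uid TEXT PRIMARY KEY)")
    conn.execute("INSERT INTO users VALUES ('John', 'Doe', '3sldkjfksjd')")
    batch = [("John", "Doe", "3sldkjfksjd"), ("Jane", "Doe", "adslkejkdsjfds")]
    print(insert_missing_users(conn, batch))  # prints 1 (only Jane is new)
```

Like the SQL in the answer, this does not filter duplicates inside the batch itself. In PostgreSQL, two rows with the same uid in one batch would both pass the NOT EXISTS check. Very large batches may also need to be split into chunks, because the number of bound parameters per statement is capped.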

a_horse_with_no_name's answer actually had a syntax error (a missing final closing parenthesis), but other than that it is the correct way to do this.
Update:
For anyone coming to this with a situation like mine: if you have columns that need to be type cast (for instance timestamps, uuids, or jsonb in PG 9.5), you must declare the casts in the values you pass to the query:
-- insert multiple if not exists
-- where another_column_name is of type uuid, with strings cast as uuids
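-- where created_at and updated_at are of type timestamp, with strings cast as timestamps
-- (for timestamptz columns cast with ::timestamptz instead; a ::timestamp cast silently ignores the -07:00 offset)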
WITH data (id, some_column_name, another_column_name, created_at, updated_at) AS (
VALUES
(<id value>, <some_column_name_value>, 'a5fa7660-8273-4ffd-b832-d94f081a4661'::uuid, '2016-06-13T12:15:27.552-07:00'::timestamp, '2016-06-13T12:15:27.879-07:00'::timestamp),
(<id value>, <some_column_name_value>, 'b9b17117-1e90-45c5-8f62-d03412d407dd'::uuid, '2016-06-13T12:08:17.683-07:00'::timestamp, '2016-06-13T12:08:17.801-07:00'::timestamp)
)
INSERT INTO table_name (id, some_column_name, another_column_name, created_at, updated_at)
SELECT d.id, d.some_column_name, d.another_column_name, d.created_at, d.updated_at
FROM data d
WHERE NOT EXISTS (SELECT 1 FROM table_name t WHERE t.id = d.id);
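The same statement, casts included, can be generated when the batch size is only known at run time. Below is a sketch of a hypothetical helper, build_insert_missing_sql. It emits PostgreSQL text with psycopg2-style %s placeholders and appends ::type to each placeholder of a column listed in casts. Table and column names are interpolated directly, so they are assumed to come from a trusted whitelist.

```python
def build_insert_missing_sql(table, columns, casts, key, n_rows):
    """Build a PostgreSQL statement that inserts only rows whose key is new.

    Hypothetical helper. columns: column names in VALUES order; casts: maps a
    column name to the PostgreSQL type its placeholder is cast to (e.g. "uuid");
    key: column used in the NOT EXISTS check; n_rows: rows in the batch.
    Identifiers are interpolated as-is, so they must come from a whitelist.
    """
    if n_rows < 1:
        raise ValueError("n_rows must be at least 1")
    # "%s::uuid" for cast columns, plain "%s" otherwise (psycopg2 placeholders)
    placeholders = [f"%s::{casts[c]}" if c in casts else "%s" for c in columns]
    row = "(" + ", ".join(placeholders) + ")"
    col_list = ", ".join(columns)
    select_list = ", ".join(f"d.{c}" for c in columns)
    values = ",\n    ".join([row] * n_rows)
    return (
        f"WITH data ({col_list}) AS (\n"
        f"  VALUES\n    {values}\n"
        ")\n"
        f"INSERT INTO {table} ({col_list})\n"
        f"SELECT {select_list}\n"
        "FROM data d\n"
        f"WHERE NOT EXISTS (SELECT 1 FROM {table} t WHERE t.{key} = d.{key});"
    )


if __name__ == "__main__":
    print(build_insert_missing_sql(
        "table_name",
        ["id", "another_column_name", "created_at"],
        {"another_column_name": "uuid", "created_at": "timestamp"},
        "id",
        2,
    ))
```

To bind every placeholder, pass the string to psycopg2's cursor.execute. The second argument is a flat list of values, row after row, in column order.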
a_horse_with_no_name's answer saved me today on a project, but I had to make these tweaks to make it work perfectly.

Related

PostgreSQL - JSON function

row_to_json and json_build_object take their input in different formats.
For row_to_json the input is:
select ... <table.field> "<json_key>"
And for json_build_object:
"<json_key>", <table.field>
That's surprising to me. Oracle uses similar JSON functions with similar input. This may look like a minor thing, but it makes it harder for me to automate the generation of SELECT statements for complex/nested structures.
Consider this simple example:
CREATE TABLE "users" (
id SERIAL NOT NULL,
name VARCHAR(100),
email_address VARCHAR(150),
PRIMARY KEY(id)
);
INSERT INTO "users" ("id", "name", "email_address")
VALUES (1, 'user1', 'user1#mail.com'), (2, 'user2', 'user2#mail.com');
To fetch the name and the email, where email is a nested object:
{"user_name":"user1","email":{"email_address":"user1#mail.com"}}
The query would be:
select row_to_json(users) from
(select users.name "user_name" ,
(json_build_object( 'email_address' , users.email_address )) email
from users ) users;
I would like to use the same order of fields in both cases.
The input would then be either:
<table.field> "<json_key>" OR "<json_key>" <table.field>
I tried to flip the order using "AS" but it doesn't work:
select row_to_json(users) from
(select users.name "user_name" ,
(json_build_object( users.email_address as 'email_address' )) email
from users ) users;
Maybe by using only row_to_json() or only json_build_object()?
Or something else? (But no LEFT JOIN.)
row_to_json takes an entire row tuple (a record) as input, such as you get from a subquery. If you don't use those, or don't like row_to_json, don't use it. It's mainly a convenience if you use SELECT * and don't want to specify column/field names.
I recommend using json_build_object with explicit field names, especially if you want to rename them anyway. Your query can and should be written as
SELECT json_build_object(
'user_name', users.name,
'email', json_build_object(
'email_address', users.email_address
)
) AS json_result
FROM users;
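json_build_object always takes alternating key, value arguments. That makes generating it for nested structures mostly a matter of walking a nested mapping. Below is a minimal sketch with a hypothetical helper name, json_build_object_sql. The values are treated as trusted SQL expressions and are not escaped; only the keys are quoted as string literals.

```python
def json_build_object_sql(spec):
    """Render a (possibly nested) dict as a PostgreSQL json_build_object call.

    Hypothetical helper. Keys become quoted SQL string literals; values are
    either SQL expressions such as "users.name" (inserted verbatim, so they
    must be trusted) or nested dicts, which become nested json_build_object calls.
    """
    args = []
    for key, value in spec.items():
        quoted_key = "'" + key.replace("'", "''") + "'"  # standard SQL quoting
        if isinstance(value, dict):
            value = json_build_object_sql(value)
        # json_build_object expects the key first, then the value
        args.append(f"{quoted_key}, {value}")
    return "json_build_object(" + ", ".join(args) + ")"


if __name__ == "__main__":
    spec = {"user_name": "users.name", "email": {"email_address": "users.email_address"}}
    print(f"SELECT {json_build_object_sql(spec)} AS json_result FROM users;")
```

For the spec in the usage example, this produces the same expression as the query above.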

Can PostgreSQL ON CONFLICT be combined with JSON objects?

I wanted to perform a conditional insert in PostgreSQL. Something like:
INSERT INTO {TABLE_NAME} (user_id, data) values ('{user_id}', '{data}')
WHERE not exists(select 1 from files where user_id='{user_id}' and data->'userType'='Type1')
Unfortunately, INSERT ... VALUES and WHERE don't work together in PostgreSQL. What would be a suitable syntax for my query? I was considering ON CONFLICT, but couldn't find the syntax for using it with a JSON object (data in the example).
Is it possible?
Rewrite the VALUES part to a SELECT, then you can use a WHERE condition:
INSERT INTO { TABLE_NAME } ( user_id, data )
SELECT
user_id,
data
FROM
( VALUES ( '{user_id}', '{data}'::jsonb ) ) sub ( user_id, data ) -- ::jsonb assumes data is a jsonb column
WHERE
NOT EXISTS (
SELECT 1
FROM files
WHERE user_id = '{user_id}'
AND data ->> 'userType' = 'Type1'
);
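But there is NO guarantee that the WHERE condition prevents duplicates! Rows inserted by another transaction that has not committed yet are invisible to this query. Two concurrent sessions can therefore both pass the NOT EXISTS check and both insert, which could lead to data quality issues.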
You can use INSERT ... SELECT ... WHERE ....
INSERT INTO elbat
(user_id,
data)
SELECT 'abc',
'xyz'
WHERE NOT EXISTS (SELECT *
FROM files
WHERE user_id = 'abc'
AND data->>'userType' = 'Type1');
It also looks like you're building the query in a host language. Don't use string concatenation or interpolation to get the values into it. That's error prone and makes your application vulnerable to SQL injection attacks. Look up how to use parameterized queries in your host language. Parameters most likely cannot be used for the table name, so you need some other way to either whitelist the names or quote them properly.
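Below is a sketch of the parameterized version suggested above. It assumes Python's sqlite3 module as a stand-in, so ? replaces psycopg2's %s and json_extract(data, '$.userType') replaces data->>'userType'. The ALLOWED_TABLES whitelist and the insert_unless_type1 name are hypothetical. The table name is checked against the whitelist because it cannot be bound as a parameter. The race condition described in the other answer still applies.

```python
import json
import sqlite3

# Hypothetical whitelist: a table name cannot be a bound parameter, so only
# names from this fixed set are ever interpolated into the SQL text.
ALLOWED_TABLES = {"elbat"}


def insert_unless_type1(conn, table, user_id, data):
    """Insert (user_id, data) unless files already has a Type1 row for user_id.

    SQLite stand-in: ? replaces psycopg2's %s, and
    json_extract(data, '$.userType') replaces PostgreSQL's data->>'userType'.
    Returns the number of rows inserted (0 or 1).
    """
    if table not in ALLOWED_TABLES:
        raise ValueError(f"table not allowed: {table!r}")
    sql = (
        f"INSERT INTO {table} (user_id, data) "
        "SELECT ?, ? "
        "WHERE NOT EXISTS (SELECT 1 FROM files "
        "WHERE user_id = ? AND json_extract(data, '$.userType') = 'Type1')"
    )
    before = conn.total_changes
    # All values are bound as parameters; json.dumps serializes the dict.
    conn.execute(sql, (user_id, json.dumps(data), user_id))
    return conn.total_changes - before
```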

Using ANY with raw data works, but not with a subquery

I just can't figure out why this query works:
SELECT id, name, organization_id
FROM facilities
WHERE organization_id = ANY(
'{abc-xyz-123,678-ght-nmp}'
)
But this query fails with the error: operator does not exist: uuid = uuid[]
SELECT id, name, organization_id
FROM facilities
WHERE organization_id = ANY(
SELECT organization_ids
FROM admins
WHERE id = 'jkl-iop-345'
)
This happens even though the subquery
SELECT organization_ids
FROM admins
WHERE id = 'jkl-iop-345'
returns exactly {abc-xyz-123,678-ght-nmp}.
I'm using postgres (PostgreSQL) 13.3
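The subquery produces one row that contains an array.
If you use = ANY (SELECT ...), organization_id is compared with each row of the result set, so conceptually you end up comparing against
{{abc-xyz-123,678-ght-nmp}}
which is an array of arrays. Its only element is itself a uuid[], hence uuid = uuid[].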
You probably want
SELECT id, name, organization_id
FROM facilities
WHERE EXISTS (SELECT 1 FROM admins
WHERE admins.id = 'jkl-iop-345'
AND facilities.organization_id = ANY (admins.organization_ids)
);
As a side remark, storing references to other tables in an array, JSON or other composite data type is an exceptionally bad idea. A normalized schema with a junction table would serve you better.
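Below is a sketch of the normalized layout the answer recommends. It assumes a hypothetical junction table admin_organizations and uses Python's sqlite3 as a stand-in, with TEXT instead of uuid. With one row per admin/organization pair, the subquery returns plain ids. IN (or = ANY in PostgreSQL) then compares like with like.

```python
import sqlite3

# Hypothetical normalized schema: one admin_organizations row per
# (admin, organization) pair instead of an organization_ids array on admins.
SCHEMA = """
CREATE TABLE admins (id TEXT PRIMARY KEY);
CREATE TABLE facilities (id TEXT PRIMARY KEY, name TEXT, organization_id TEXT);
CREATE TABLE admin_organizations (
    admin_id TEXT NOT NULL REFERENCES admins (id),
    organization_id TEXT NOT NULL,
    PRIMARY KEY (admin_id, organization_id)
);
"""


def facilities_for_admin(conn, admin_id):
    """Return (id, name, organization_id) rows for the admin's organizations."""
    return conn.execute(
        "SELECT f.id, f.name, f.organization_id "
        "FROM facilities f "
        "WHERE f.organization_id IN ("
        "SELECT ao.organization_id FROM admin_organizations ao "
        "WHERE ao.admin_id = ?"
        ") ORDER BY f.id",
        (admin_id,),
    ).fetchall()
```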

Perform search based on separate table data in PostgreSQL

I have two tables that look like this:
deals
id vendorid name
1 52 '25% Off'
2 34 '10% Off'
-
vendors
id name
52 'Walmart'
34 'Home Depot'
I'm trying to do a search of the database that also searches the vendor data linked through vendorid.
I was able to get the vendor data into the query like this:
SELECT *,
(SELECT json_agg(vendors.*) FROM vendors WHERE deals.vendorid = vendors.id) as vendor
FROM deals
Now I want to add a search to it. Here is something I tried:
SELECT *,
(SELECT json_agg(vendors.*) FROM vendors WHERE deals.vendorid = vendors.id) as vendor
FROM deals
WHERE vendor.name ILIKE '%wal%'
In the above example, the user would be starting a search for Walmart, but PostgreSQL says that the column vendor.name does not exist. What would be the correct way to do this?
The output I'm expecting from the above query is:
[
{
id: 1,
vendorid: 52,
name: '25% Off',
vendor: [
{
id: 52,
name: 'Walmart'
}
]
}
]
I was able to get it done by using this:
SELECT
*,
(SELECT json_agg(vendors.*) FROM vendors WHERE deals.vendorid = vendors.id) as vendor
FROM
deals
where (SELECT json_agg(vendors.*) FROM vendors WHERE deals.vendorid = vendors.id).name ILIKE '%wal%'
It's pretty ugly, but it gets the job done. I'm not really familiar with PostgreSQL, but you can store it in a variable somehow.
The query you need
drop table if exists deals;
drop table if exists vendors;
create table if not exists deals(
id serial,
vendorid int not null,
name varchar not null
);
create table if not exists vendors(
id serial,
name varchar not null
);
insert into deals values
(1,52,'25% off'),
(2,34,'10% off');
insert into vendors values
(52,'Walmart'),
(34,'Home Depot');
select
d.id,
d.vendorid,
v.name,
json_agg(v.*)
from
vendors v
left join
deals d on d.vendorid = v.id
where
--v.id=52 and
v.name ILIKE '%wal%'
group by
d.id,
d.vendorid,
v.name;
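The original attempt fails because vendor is an output-column alias, and output aliases are not visible in WHERE. The filter has to reference the vendors table itself. Below is a sketch that keeps the original output shape (one row per deal plus a vendor array) and moves the filter into an EXISTS. It runs on Python's sqlite3 as a stand-in: json_group_array/json_object play the role of json_agg, and SQLite's LIKE, which is case-insensitive for ASCII by default, plays the role of ILIKE. The function name search_deals_by_vendor is hypothetical.

```python
import json
import sqlite3


def search_deals_by_vendor(conn, term):
    """Return deals whose vendor name contains term, each with a vendor JSON array.

    SQLite stand-in: json_group_array/json_object replace PostgreSQL's
    json_agg, and LIKE replaces ILIKE. The filter lives in an EXISTS on
    vendors because the select-list alias "vendor" is not visible in WHERE.
    """
    rows = conn.execute(
        "SELECT d.id, d.vendorid, d.name, "
        "  (SELECT json_group_array(json_object('id', v.id, 'name', v.name)) "
        "   FROM vendors v WHERE v.id = d.vendorid) AS vendor "
        "FROM deals d "
        "WHERE EXISTS (SELECT 1 FROM vendors v "
        "              WHERE v.id = d.vendorid AND v.name LIKE ?) "
        "ORDER BY d.id",
        ("%" + term + "%",),
    ).fetchall()
    return [
        {"id": i, "vendorid": vid, "name": name, "vendor": json.loads(vendor)}
        for i, vid, name, vendor in rows
    ]
```

In PostgreSQL, the same query would keep json_agg(v.*) in the select list and use v.name ILIKE '%wal%' inside the EXISTS. The % and _ characters in the search term are not escaped here.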
If you are running PostgreSQL 9.3 or later, you can also use a great feature it provides: full text search.
Enjoy the power of Postgres.

DB2 loses track of table after getting 3 levels deep into a subquery

With this standard authors/books setup:
CREATE TABLE authors (
id int NOT NULL,
name varchar(255) NOT NULL
);
CREATE TABLE books (
id int NOT NULL,
name varchar(255) NOT NULL,
author_id int NOT NULL,
sold int NOT NULL
);
INSERT INTO authors VALUES (1, 'author 1');
INSERT INTO authors VALUES (2, 'author 2');
INSERT INTO books VALUES (1, 'book 1', 1, 10);
INSERT INTO books VALUES (2, 'book 2', 1, 5);
INSERT INTO books VALUES (3, 'book 3', 2, 7);
this query somehow doesn't work:
SELECT
(
SELECT
count(*)
FROM
(
SELECT
books.name
FROM
books
WHERE
books.author_id = authors.id
AND books.sold > (
SELECT
avg(sold)
FROM
books
WHERE
books.author_id <> authors.id
)
) AS t
) AS good_selling_books
FROM
authors
WHERE
authors.id = 1
The error message is:
SQL0204N "AUTHORS.ID" is an undefined name. SQLSTATE=42704
It looks like DB2 loses track of the outermost table after getting 3 levels deep into a subquery?
NOTE: This is just a fabricated query so it may not make much sense (and can be easily rewritten to only have 2 levels nesting which works fine). I just want to confirm if DB2 indeed has such a glaring shortcoming.
I just found the solution, which is rather simple. DB2 has a LATERAL keyword, which is needed for such a query to work, e.g.
SELECT
(
SELECT
count(*)
FROM
LATERAL( -- this is the only change
SELECT
books.name
FROM
books
WHERE
books.author_id = authors.id
AND books.sold > (
SELECT
avg(sold)
FROM
books
WHERE
books.author_id <> authors.id
)
) AS t
) AS good_selling_books
FROM
authors
WHERE
authors.id = 1
The solution came from this blog post https://www.ibm.com/developerworks/mydeveloperworks/blogs/SQLTips4DB2LUW/entry/scoping_rules_in_db2125?lang=en, where the author also noticed the same shortcoming in DB2:
But DB2 also didn't jump two levels up to S.c1. I suppose it could but, alas, it does not.
The problem is the innermost query.
You can't just compare to authors.id without having authors in the FROM clause.
This also fails in MySQL with the exact same error:
ERROR 1054 (42S22): Unknown column 'authors.id' in 'where clause'
I suspect that the query is indeed incorrect. I believe you need a reference to the authors table in the FROM clause of the innermost query. I've been doing a lot of NoSQL stuff lately, so my SQL skills are a little dusty, but I'm pretty sure that an inner query cannot reach out to other tables.
Rewrite the query using joins instead of nested queries if you can. Nested queries tend to optimize poorly anyway (unsubstantiated for DB2, but true for MySQL 5.1 at least).
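Below is a sketch of two rewrites of the sample query. The first flattens it to two levels of nesting, which the question's NOTE says DB2 accepts. The second uses only joins and a CTE. Both run against SQLite, swapped in only because it ships with Python; they are untested on DB2. For author 1, both should return 1: the other authors' books average 7 sold, and only book 1 (10 sold) beats that.

```python
import sqlite3

SETUP = """
CREATE TABLE authors (id INTEGER NOT NULL, name VARCHAR(255) NOT NULL);
CREATE TABLE books (id INTEGER NOT NULL, name VARCHAR(255) NOT NULL,
                    author_id INTEGER NOT NULL, sold INTEGER NOT NULL);
INSERT INTO authors VALUES (1, 'author 1'), (2, 'author 2');
INSERT INTO books VALUES (1, 'book 1', 1, 10), (2, 'book 2', 1, 5), (3, 'book 3', 2, 7);
"""

# Two levels of nesting: the correlated reference a.id sits at most two
# subqueries deep instead of inside a derived table.
TWO_LEVEL = """
SELECT (SELECT count(*)
        FROM books b
        WHERE b.author_id = a.id
          AND b.sold > (SELECT avg(o.sold) FROM books o WHERE o.author_id <> a.id)
       ) AS good_selling_books
FROM authors a
WHERE a.id = 1
"""

# Join-based version: compute, per author, the average sales of everyone
# else's books, then count the author's books above that average.
JOINED = """
WITH other_avg AS (
    SELECT a.id AS author_id, avg(o.sold) AS avg_sold
    FROM authors a
    JOIN books o ON o.author_id <> a.id
    GROUP BY a.id
)
SELECT count(b.id) AS good_selling_books
FROM authors a
JOIN other_avg x ON x.author_id = a.id
LEFT JOIN books b ON b.author_id = a.id AND b.sold > x.avg_sold
WHERE a.id = 1
"""


def good_selling_books(sql):
    """Run one of the queries above on a fresh in-memory copy of the sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    return conn.execute(sql).fetchone()[0]


if __name__ == "__main__":
    print(good_selling_books(TWO_LEVEL), good_selling_books(JOINED))
```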