There's a different format for row_to_json and json_build_object.
For row_to_json the input is:
select ... <table.field> "<json_key>"
And for the json_build_object:
"<json_key" , <table.field>
That's surprising to me. Oracle uses similar JSON functions with similar input. This may look like a minor thing, but it makes it harder for me to automate the generation of SELECT statements for complex/nested structures.
Consider this simple example:
CREATE TABLE "users" (
id SERIAL NOT NULL,
name VARCHAR(100),
email_address VARCHAR(150),
PRIMARY KEY(id)
);
INSERT INTO "users" ("id", "name", "email_address")
VALUES (1, 'user1', 'user1#mail.com'), (2, 'user2', 'user2#mail.com');
To fetch the name and the email, where the email is a nested JSON object:
{"user_name":"user1","email":{"email_address":"user1#mail.com"}}
The query would be:
select row_to_json(users) from
(select users.name "user_name" ,
(json_build_object( 'email_address' , users.email_address )) email
from users ) users;
I would like to use the same order of fields in both cases.
The input would be:
<table.field> "<json_key" OR "<json_key" <table.field>
I tried to flip the order using "AS" but it doesn't work:
select row_to_json(users) from
(select users.name "user_name" ,
(json_build_object( users.email_address as 'email_address' )) email
from users ) users;
Maybe by using only row_to_json() or json_build_object()?
Or anything else? (But no LEFT JOIN.)
row_to_json() takes an entire row (a record) as input, such as you get from a subquery. If you don't use those, or don't like row_to_json(), simply don't use it. It is mainly a convenience when you use SELECT * and don't want to spell out column/field names.
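For example, against the users table above, this returns each whole row as a JSON object, with the column names as keys (output shown approximately):

SELECT row_to_json(u) FROM users u;

{"id":1,"name":"user1","email_address":"user1#mail.com"}
{"id":2,"name":"user2","email_address":"user2#mail.com"}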
I recommend using json_build_object() with explicit field names, especially if you want to rename them anyway. Your query can and should be written as:
SELECT json_build_object(
'user_name', users.name,
'email', json_build_object(
'email_address', users.email_address
)
) AS json_result
FROM users;
I wanted to perform a conditional insert in PostgreSQL. Something like:
INSERT INTO {TABLE_NAME} (user_id, data) values ('{user_id}', '{data}')
WHERE not exists(select 1 from files where user_id='{user_id}' and data->'userType'='Type1')
Unfortunately, INSERT and WHERE do not cooperate in PostgreSQL. What would be a suitable syntax for my query? I was considering ON CONFLICT, but couldn't find the syntax for using it with a JSON object (the data column in the example).
Is it possible?
Rewrite the VALUES part to a SELECT, then you can use a WHERE condition:
INSERT INTO { TABLE_NAME } ( user_id, data )
SELECT
user_id,
data
FROM
( VALUES ( '{user_id}', '{data}' ) ) sub ( user_id, data )
WHERE
NOT EXISTS (
SELECT 1
FROM files
WHERE user_id = '{user_id}'
        AND data ->> 'userType' = 'Type1'
);
But there is no guarantee that the WHERE condition works: another transaction that has not yet been committed is invisible to this query. This could lead to data quality issues.
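If the rule you actually want to enforce is "at most one row per user_id", a race-safe alternative is a unique constraint plus ON CONFLICT (a sketch, assuming the target table is files and that such a constraint fits your data model):

ALTER TABLE files ADD CONSTRAINT files_user_id_key UNIQUE (user_id);

INSERT INTO files (user_id, data)
VALUES ('{user_id}', '{data}')
ON CONFLICT (user_id) DO NOTHING;

Note that ON CONFLICT only works against a unique (or exclusion) constraint; it cannot check an arbitrary condition on the JSON column, so a condition like data ->> 'userType' = 'Type1' still has to live in a WHERE clause as shown above.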
You can use INSERT ... SELECT ... WHERE ....
INSERT INTO elbat
(user_id,
data)
SELECT 'abc',
'xyz'
WHERE NOT EXISTS (SELECT *
FROM files
WHERE user_id = 'abc'
AND data->>'userType' = 'Type1')
And it looks like you're creating the query in a host language. Don't use string concatenation or interpolation to get the values into it; that's error prone and makes your application vulnerable to SQL injection attacks. Look up how to use parameterized queries in your host language. Most likely parameters cannot be used for the table name, so you need some other method, such as whitelisting the name or properly quoting it.
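As a rough sketch, with PostgreSQL-style positional placeholders the statement would look something like this (the exact placeholder syntax, $1/$2 here, depends on your driver, and the ::jsonb cast assumes the data column is jsonb):

INSERT INTO elbat (user_id, data)
SELECT $1, $2::jsonb
WHERE NOT EXISTS (SELECT *
                  FROM files
                  WHERE user_id = $1
                  AND data->>'userType' = 'Type1');

The driver then sends the two values separately, so they are never spliced into the SQL text.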
It's all in the title. Documentation has something like this:
SELECT *
FROM crosstab('...') AS ct(row_name text, category_1 text, category_2 text);
I have two tables, lab_tests and lab_tests_results. All of the lab_tests_results rows are tied to the primary key id integer in the lab_tests table. I'm trying to make a pivot table where the lab tests (identified by an integer) are row headers and the respective results are in the table. I can't get around a syntax error at or around the integer.
Is this possible with the current set up? Am I missing something in the documentation? Or do I need to perform an inner join of sorts to make the categories strings? Or modify the lab_tests_results table to use a text identifier for the lab tests?
Thanks for the help, all. Much appreciated.
Edit: Got it figured out with the help of Dmitry. He had the data layout figured out, but I was unclear on what kind of output I needed. I was trying to get the pivot table to be based on batch_id numbers in the lab_tests_results table. I had to hammer out the base query and cast the data types.
SELECT *
FROM crosstab('SELECT lab_tests_results.batch_id, lab_tests.test_name, lab_tests_results.test_result::FLOAT
FROM lab_tests_results, lab_tests
WHERE lab_tests.id=lab_tests_results.lab_test AND (lab_tests.test_name LIKE ''Test Name 1'' OR lab_tests.test_name LIKE ''Test Name 2'')
ORDER BY 1,2'
) AS final_result(batch_id VARCHAR, test_name_1 FLOAT, test_name_2 FLOAT);
This provides a pivot table from the lab_tests_results table like below:
batch_id |test_name_1 |test_name_2
---------------------------------------
batch1 | result1 | <null>
batch2 | result2 | result3
If I understand correctly your tables look something like this:
CREATE TABLE lab_tests (
id INTEGER PRIMARY KEY,
name VARCHAR(500)
);
CREATE TABLE lab_tests_results (
id INTEGER PRIMARY KEY,
lab_tests_id INTEGER REFERENCES lab_tests (id),
result TEXT
);
And your data looks something like this:
INSERT INTO lab_tests (id, name)
VALUES (1, 'test1'),
(2, 'test2');
INSERT INTO lab_tests_results (id, lab_tests_id, result)
VALUES (1,1,'result1'),
(2,1,'result2'),
(3,2,'result3'),
(4,2,'result4'),
(5,2,'result5');
First of all, crosstab() is part of the tablefunc extension, so you need to enable it:
CREATE EXTENSION tablefunc;
This needs to be run once per database, as per this answer.
The final query will look like this:
SELECT *
FROM crosstab(
'SELECT lt.name::TEXT, lt.id, ltr.result
FROM lab_tests AS lt
JOIN lab_tests_results ltr ON ltr.lab_tests_id = lt.id'
) AS ct(test_name text, result_1 text, result_2 text, result_3 text);
Explanation:
The crosstab() function takes the text of a query that should return 3 columns: (1) a row name, (2) a category, and (3) a value. The wrapping query just selects everything crosstab() returns and defines the list of output columns (the part after AS): first the row name (test_name) and then the value columns (result_1, result_2, result_3). In my query I'll get up to 3 results per test. If a test has more than 3 results I won't see them; if it has fewer than 3 I'll get nulls.
The result for this query is:
test_name |result_1 |result_2 |result_3
---------------------------------------
test1 |result1 |result2 |<null>
test2 |result3 |result4 |result5
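One caveat: the tablefunc documentation recommends that the source query always specify ORDER BY 1,2, so that all rows belonging to the same test arrive together; without it, the results for one test can end up split across several output rows. A safer version of the call above would be:

SELECT *
FROM crosstab(
    'SELECT lt.name::TEXT, lt.id, ltr.result
     FROM lab_tests AS lt
     JOIN lab_tests_results ltr ON ltr.lab_tests_id = lt.id
     ORDER BY 1, 2'
) AS ct(test_name text, result_1 text, result_2 text, result_3 text);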
I'm getting to grips with the JSONB functionality in Postgres >= 9.5 (and loving it) but have hit a stumbling block. I've read about the ability to concatenate JSON fields, so '{"a":1}' || '{"b":2}' creates {"a":1,"b":2}, but I'd like to do this across the same field in multiple rows, e.g.:
select row_concat_??(data) from table where field = 'value'
I've discovered the jsonb_object_agg function which sounds like it would be what I want, except that the docs show it taking multiple arguments, and I only have one.
Any ideas how I would do this? jsonb_agg creates an array successfully so it feels like I'm really close.
After some digging around with custom aggregates in Postgres, I have the following:
DROP AGGREGATE IF EXISTS jsonb_merge(jsonb);
CREATE AGGREGATE jsonb_merge(jsonb) (
    SFUNC = jsonb_concat,
    STYPE = jsonb,
    INITCOND = '{}'
);
Which is then usable as:
SELECT group_id, jsonb_merge(data) FROM table GROUP BY group_id
Use jsonb_each():
with data(js) as (
values
('{"a": 1}'::jsonb),
('{"b": 2}')
)
select jsonb_object_agg(key, value)
from data
cross join lateral jsonb_each(js);
jsonb_object_agg
------------------
{"a": 1, "b": 2}
(1 row)
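Note that jsonb keeps only one value per key, so if the same key appears in more than one row, jsonb_object_agg() (like the || operator) ends up with a single value for it: the one aggregated last wins, which in practice depends on the row order. For example, feeding '{"a": 1}' and '{"a": 2, "b": 3}' through the query above yields {"a": 2, "b": 3}.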
I'd like to generate a single sql query to mass-insert a series of rows that don't exist on a table. My current setup makes a new query for each record insertion similar to the solution detailed in WHERE NOT EXISTS in PostgreSQL gives syntax error, but I'd like to move this to a single query to optimize performance since my current setup could generate several hundred queries at a time. Right now I'm trying something like the example I've added below:
INSERT INTO users (first_name, last_name, uid)
SELECT ( 'John', 'Doe', '3sldkjfksjd'), ( 'Jane', 'Doe', 'adslkejkdsjfds')
WHERE NOT EXISTS (
SELECT * FROM users WHERE uid IN ('3sldkjfksjd', 'adslkejkdsjfds')
)
Postgres returns the following error:
PG::Error: ERROR: INSERT has more target columns than expressions
The problem is that PostgreSQL doesn't seem to want to take a series of values when using SELECT. Conversely, I can make the insertions using VALUES, but then I can't prevent duplicates from being generated using WHERE NOT EXISTS.
http://www.techonthenet.com/postgresql/insert.php suggests in the section EXAMPLE - USING SUB-SELECT that multiple records should be insertable from another referenced table using SELECT, so I'm wondering why I can't seem to pass in a series of values to insert. The values I'm passing are coming from an external API, so I need to generate the values to insert by hand.
Your select is not doing what you think it does.
The most compact version in PostgreSQL would be something like this:
with data(first_name, last_name, uid) as (
values
( 'John', 'Doe', '3sldkjfksjd'),
( 'Jane', 'Doe', 'adslkejkdsjfds')
)
insert into users (first_name, last_name, uid)
select d.first_name, d.last_name, d.uid
from data d
where not exists (select 1
from users u2
where u2.uid = d.uid);
Which is pretty much equivalent to:
insert into users (first_name, last_name, uid)
select d.first_name, d.last_name, d.uid
from (
select 'John' as first_name, 'Doe' as last_name, '3sldkjfksjd' as uid
union all
select 'Jane', 'Doe', 'adslkejkdsjfds'
) as d
where not exists (select 1
from users u2
where u2.uid = d.uid);
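Note that, like any INSERT ... SELECT ... WHERE NOT EXISTS, this check is not safe against concurrent inserts of the same uid. If there is (or can be) a unique constraint on uid, a simpler, race-safe variant on PostgreSQL 9.5+ is to keep the plain VALUES list and add an ON CONFLICT clause:

insert into users (first_name, last_name, uid)
values
  ('John', 'Doe', '3sldkjfksjd'),
  ('Jane', 'Doe', 'adslkejkdsjfds')
on conflict (uid) do nothing;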
a_horse_with_no_name's answer originally had a syntax error (a missing final closing parenthesis), but other than that it is the correct way to do this.
Update:
For anyone coming to this with a situation like mine, if you have columns that need to be type cast (for instance timestamps or uuids or jsonb in PG 9.5), you must declare that in the values you pass to the query:
-- insert multiple if not exists
-- where another_column_name is of type uuid, with strings cast as uuids
-- where created_at and updated_at is of type timestamp, with strings cast as timestamps
WITH data (id, some_column_name, another_column_name, created_at, updated_at) AS (
VALUES
(<id value>, <some_column_name_value>, 'a5fa7660-8273-4ffd-b832-d94f081a4661'::uuid, '2016-06-13T12:15:27.552-07:00'::timestamp, '2016-06-13T12:15:27.879-07:00'::timestamp),
(<id value>, <some_column_name_value>, 'b9b17117-1e90-45c5-8f62-d03412d407dd'::uuid, '2016-06-13T12:08:17.683-07:00'::timestamp, '2016-06-13T12:08:17.801-07:00'::timestamp)
)
INSERT INTO table_name (id, some_column_name, another_column_name, created_at, updated_at)
SELECT d.id, d.some_column_name, d.another_column_name, d.created_at, d.updated_at
FROM data d
WHERE NOT EXISTS (SELECT 1 FROM table_name t WHERE t.id = d.id);
a_horse_with_no_name's answer saved me today on a project, but had to make these tweaks to make it perfect.
With this standard authors/books setup:
CREATE TABLE authors (
  id int NOT NULL,
  name varchar(255) NOT NULL
);

CREATE TABLE books (
  id int NOT NULL,
  name varchar(255) NOT NULL,
  author_id int NOT NULL,
  sold int NOT NULL
);

INSERT INTO authors VALUES (1, 'author 1');
INSERT INTO authors VALUES (2, 'author 2');
INSERT INTO books VALUES (1, 'book 1', 1, 10);
INSERT INTO books VALUES (2, 'book 2', 1, 5);
INSERT INTO books VALUES (3, 'book 3', 2, 7);
this query somehow doesn't work:
SELECT
(
SELECT
count(*)
FROM
(
SELECT
books.name
FROM
books
WHERE
books.author_id = authors.id
AND books.sold > (
SELECT
avg(sold)
FROM
books
WHERE
books.author_id <> authors.id
)
) AS t
) AS good_selling_books
FROM
authors
WHERE
authors.id = 1
The error message is:
SQL0204N "AUTHORS.ID" is an undefined name. SQLSTATE=42704
It looks like DB2 loses track of the outermost table after getting 3 levels deep into a subquery?
NOTE: This is just a fabricated query so it may not make much sense (and can be easily rewritten to only have 2 levels nesting which works fine). I just want to confirm if DB2 indeed has such a glaring shortcoming.
Just found the solution which is rather simple. DB2 has this LATERAL keyword which is needed for such query to work, e.g.
SELECT
(
SELECT
count(*)
FROM
LATERAL( -- this is the only change
SELECT
books.name
FROM
books
WHERE
books.author_id = authors.id
AND books.sold > (
SELECT
avg(sold)
FROM
books
WHERE
books.author_id <> authors.id
)
) AS t
) AS good_selling_books
FROM
authors
WHERE
authors.id = 1
The solution came from this blog post https://www.ibm.com/developerworks/mydeveloperworks/blogs/SQLTips4DB2LUW/entry/scoping_rules_in_db2125?lang=en, where the author also noticed the same shortcoming in DB2:
But DB2 also didn't jump two levels up to S.c1. I suppose it could but, alas, it does not.
The problem is the innermost query.
You can't just compare to authors.id without having authors in the FROM clause.
This also fails in MySQL with an equivalent error:
ERROR 1054 (42S22): Unknown column 'authors.id' in 'where clause'
I would suspect that the query is indeed incorrect. I believe you need a reference to the authors table in the FROM clause of the innermost query. I've been doing a lot of NoSQL work lately, so my SQL skills are a little dusty, but I'm pretty sure an inner query cannot reach out to tables that many levels up.
Rewrite the query using joins instead of nested queries if you can; see the sketch below. Deeply nested queries tend to optimize poorly anyway (unsubstantiated for DB2, but true in MySQL 5.1 at least).
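For the fabricated example, a join-based version with only two levels of nesting (a sketch in standard SQL, not verified on DB2) could look like this:

SELECT a.id, COUNT(*) AS good_selling_books
FROM authors a
JOIN books b
  ON b.author_id = a.id
WHERE a.id = 1
  AND b.sold > (SELECT AVG(o.sold)
                FROM books o
                WHERE o.author_id <> a.id)
GROUP BY a.id;

Unlike the original query, this returns no row (rather than a count of 0) for an author with no qualifying books.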