Better/shorter way to check non-existence of a JSONB key - postgresql

Where data is a JSONB column, I want to check whether a key exists. Normally I use:
SELECT id FROM users WHERE length(data->>'fingerprint_id') IS NULL;
Is there a better/shorter way to do this? The alternatives I tried give an error or an incorrect result:
> SELECT id FROM users WHERE data->>'fingerprint_id' IS NULL;
ERROR: operator does not exist: jsonb ->> boolean
LINE 1: SELECT id FROM users WHERE data->>'fingerprint_id' IS NULL;
^
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.
> SELECT id FROM users WHERE data->>'fingerprint_id' = '';
id
----
(0 rows)

There is an explicit operator (?) for this purpose:
where data_jsonb ? 'key'
But be aware that this might cause some DB abstraction layers (e.g. JDBC) to falsely recognize ? as an input parameter's placeholder.
As a workaround, you could call the jsonb_exists(jsonb, text) function directly (but your code will then depend on an undocumented function), or define a similar operator for it, like in this answer.
More details about the (data ->> 'key') IS NULL syntax can be found here.

Apparently, I just need to enclose the expression in parentheses before IS NULL:
SELECT id FROM users WHERE (data->>'fingerprint_id') IS NULL;
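Putting the options together, a short sketch (table and key names from the question; note the two forms answer slightly different questions):

```sql
-- Key exists, regardless of its value (including a JSON null value):
SELECT id FROM users WHERE data ? 'fingerprint_id';

-- Key is absent:
SELECT id FROM users WHERE NOT data ? 'fingerprint_id';

-- Key is absent OR its value is JSON null (the ->> form from the question):
SELECT id FROM users WHERE (data->>'fingerprint_id') IS NULL;
```

The difference matters when a key is stored with a JSON null value: data ? 'key' is still true for such a row, while (data->>'key') IS NULL treats it the same as a missing key.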

Related

Postgres: getting "... is out of range for type integer" when using NULLIF

For context, this issue occurred in a Go program I am writing using the default postgres database driver.
I have been building a service to talk to a postgres database which has a table similar to the one listed below:
CREATE TABLE object (
id SERIAL PRIMARY KEY NOT NULL,
name VARCHAR(255) UNIQUE,
some_other_id BIGINT UNIQUE
...
);
I have created some endpoints for this item including an "Install" endpoint which effectively acts as an upsert function like so:
INSERT INTO object (name, some_other_id)
VALUES ($1, $2)
ON CONFLICT (name) DO UPDATE SET
some_other_id = COALESCE(NULLIF($2, 0), object.some_other_id)
I also have an "Update" endpoint with an underlying query like so:
UPDATE object
SET some_other_id = COALESCE(NULLIF($2, 0), object.some_other_id)
WHERE name = $1
The problem:
Whenever I run the update query I always run into the error, referencing the field "some_other_id":
pq: value "1010101010144" is out of range for type integer
However, this error never occurs with the "upsert" version of the query, even when the row already exists in the database (when it has to evaluate the COALESCE expression). I have been able to prevent this error by updating the COALESCE expression as follows:
COALESCE(NULLIF($2, CAST(0 AS BIGINT)), object.some_other_id)
But as it never occurs with the first query, I wondered whether this inconsistency comes from me doing something wrong or something I don't understand. Also, what is the best practice here: should I be casting all values?
I am definitely passing in a 64 bit integer to the query for "some_other_id", and the first query works with the Go implementation even without the explicit type cast.
If any more information (or Go implementation) is required then please let me know, many thanks in advance! (:
Edit:
To eliminate confusion, the queries are being executed directly in Go code like so:
res, err := s.db.ExecContext(ctx,
    `UPDATE object SET some_other_id = COALESCE(NULLIF($2, 0), object.some_other_id) WHERE name = $1`,
    "a name",
    1010101010144,
)
Both queries are executed in exactly the same way.
Edit: Also corrected parameter (from $51 to $2) in my current workaround.
I would also like to take this opportunity to note that the query does work with my proposed fix, which suggests that the issue is me confusing Postgres with the types in the NULLIF expression? There is no stored procedure asking for an INTEGER argument in between my code and the database, at least none that I have written.
This has to do with how the postgres parser resolves types for the parameters. I don't know how exactly it's implemented, but given the observed behaviour, I would assume that the INSERT query doesn't fail because it is clear from (name,some_other_id) VALUES ($1,$2) that the $2 parameter should have the same type as the target some_other_id column, which is of type int8. This type information is then also used in the NULLIF expression of the DO UPDATE SET part of the query.
You can also test this assumption by using (name) VALUES ($1) in the INSERT and you'll see that the NULLIF expression in DO UPDATE SET will then fail the same way as it does in the UPDATE query.
So the UPDATE query fails because there is not enough context for the parser to infer the accurate type of the $2 parameter. The "closest" thing that the parser can use to infer the type of $2 is the NULLIF call expression, specifically it uses the type of the second argument of the call expression, i.e. 0, which is of type int4, and it then uses that type information for the first argument, i.e. $2.
To avoid this issue, you should use an explicit type cast with any parameter where the type cannot be inferred accurately. i.e. use NULLIF($2::int8, 0).
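You can observe the inference described above with a prepared statement; this is a sketch against the object table from the question (statement names are illustrative):

```sql
-- PREPARE with no explicit parameter type list lets the parser infer
-- the types from context, just like the driver's parameterized query:
PREPARE upd AS
  UPDATE object
  SET some_other_id = COALESCE(NULLIF($2, 0), object.some_other_id)
  WHERE name = $1;

-- pg_prepared_statements reports the inferred types; $2 should come
-- out as integer (inferred from the literal 0), not bigint:
SELECT parameter_types FROM pg_prepared_statements WHERE name = 'upd';

-- With an explicit cast the ambiguity goes away:
PREPARE upd_fixed AS
  UPDATE object
  SET some_other_id = COALESCE(NULLIF($2::int8, 0), object.some_other_id)
  WHERE name = $1;
```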
COALESCE(NULLIF($51, CAST(0 AS BIGINT)), object.some_other_id)
Fifty-one? Really?
pq: value "1010101010144" is out of range for type integer
Pay attention, the data type in the error message is an integer, not bigint.
I think the reason for the error lies outside the code shown. So I take out a magic crystal ball and make a pass with my hands.
an "Install" endpoint which effectively acts as an upsert function like so
I also have an "Update" endpoint
By "endpoint", do you mean a PostgreSQL function (stored procedure)? I think yes.
Also, $1 and $2 look like PostgreSQL function arguments.
The magic crystal ball says: you have two PostgreSQL functions with different argument data types:
"Install" endpoint has $2 function argument as a bigint data type. It looks like CREATE FUNCTION Install(VARCHAR(255), bigint)
"Update" endpoint has $2 function argument as an integer data type, not bigint. It looks like CREATE FUNCTION Update(VARCHAR(255), integer).
Finally, I would rewrite your condition more readably:
UPDATE object
SET some_other_id =
    CASE
        WHEN $2 = 0 THEN object.some_other_id
        ELSE $2
    END
WHERE name = $1

How to use sub-queries correctly inside a Postgresql query

I'm having trouble resetting sequences as automatically as possible.
I'm trying to run the following query from phpPgAdmin:
SELECT SETVAL('course_subjects_seq', (SELECT MAX(subject_id) FROM course_subjects));
Somehow this query returns:
> HINT: No function matches the given name and argument types. You might need to add explicit type casts.
pointing to the first SELECT SETVAL
The following query gives the same error:
SELECT setval("course_subjects_seq", COALESCE((SELECT MAX(subject_id) FROM course_subjects), 1))
Can anyone point out what I am doing wrong?
Fixed by doing so:
The setval function requires regclass and bigint arguments (plus an optional boolean), therefore I added the type casts:
SELECT setval('course_subjects_seq'::regclass, COALESCE((SELECT MAX(subject_id) FROM course_subjects)::bigint, 1));
::regclass
and ::bigint
You don't need a subquery at all here. Can be a single SELECT:
SELECT setval(pg_get_serial_sequence('course_subjects', 'subject_id')
, COALESCE(max(subject_id) + 1, 1)
, false) -- not called yet
FROM course_subjects;
Assuming subject_id is a serial column, pg_get_serial_sequence() is useful so you don't have to know the sequence name (which is an implementation detail, really).
A SELECT with an aggregate function like max() always returns a single row, even if the underlying table has no rows. The value is NULL in that case, which is why you have COALESCE in there.
But if you call setval() with 1, the next number returned by the sequence will be 2, not 1, since that value is considered to have been called already. There is an overloaded variant of setval() with a third, boolean parameter, is_called, which makes it possible to actually start from 1 in this case, as demonstrated.
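The is_called flag can be illustrated directly (sequence name from the question):

```sql
SELECT setval('course_subjects_seq', 1, false); -- next nextval() returns 1
SELECT setval('course_subjects_seq', 1, true);  -- next nextval() returns 2
SELECT setval('course_subjects_seq', 1);        -- same as is_called = true
```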
Related:
How to reset postgres' primary key sequence when it falls out of sync?

Query array field for multiple values

I have a roles field in a Users table of type array. I need to be able to get all users that have both the role 'admin' and the role 'member'. For now I use this query:
select * from users where 'member' = ANY(roles) and 'admin' = ANY(roles);
I was wondering if there was a cleaner way to do this.
Use the array-contained-by operator <@:
select * from users where ARRAY['member','admin']::varchar[] <@ roles;
That'll let you index the lookup too.
(Correction per @bereal; I misread the question)
Or, if you meant users that have at least one of the rights, use the array-overlap operator &&:
select * from users where ARRAY['member','admin']::varchar[] && roles;
Also, since your input turns out to be varchar[] (you didn't show your table definition), you must cast the array literal to varchar[] too, as there's no implicit cast between array types.
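For completeness, a sketch of both variants plus a GIN index that can serve these lookups (assuming the users table with a varchar[] roles column from the question):

```sql
-- Users having BOTH roles ("roles contains member and admin"):
SELECT * FROM users WHERE roles @> ARRAY['member','admin']::varchar[];

-- Users having AT LEAST ONE of the roles:
SELECT * FROM users WHERE roles && ARRAY['member','admin']::varchar[];

-- A GIN index supports the @>, <@ and && operators on arrays:
CREATE INDEX users_roles_gin ON users USING gin (roles);
```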

select an item with a null comparison

I'm not using a full on DB Abstraction library, and am using raw sql templates in psycopg2 that look like this :
SELECT id FROM table WHERE message = %(message)s ;
The ideal query to retrieve my intended results looks something like this :
SELECT id FROM table WHERE message = 'a3cbb207' ;
SELECT id FROM table WHERE message IS NULL ;
Unfortunately... the obvious problem is that my NULL comparisons come out looking like this:
SELECT id FROM table WHERE message = NULL ;
... which is not the correct comparison - and doesn't give me the intended result set.
My actual queries are much more complex than the illustration above, so I can't change them easily. (Changing them would be the correct solution, I agree; I'm looking for an emergency fix right now.)
Does anyone know of a workaround, so I can keep the same single templates going until a proper fix is in place? I tried to get coalesce and/or cast to work, but struck out.
What you want is IS NOT DISTINCT FROM.
SELECT id FROM table WHERE message IS NOT DISTINCT FROM 'the text';
SELECT id FROM table WHERE message IS NOT DISTINCT FROM NULL;
NULL IS NOT DISTINCT FROM NULL is true, not NULL, so it's like = but with different NULL comparison semantics. Great in trigger functions.
AFAIK you can't use IS DISTINCT FROM for index lookups, though, so be careful there. It can be better to use separate tests for NULL and for the value.
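The NULL-comparison semantics can be checked standalone:

```sql
SELECT NULL = NULL;                     -- NULL (unknown), not true
SELECT NULL IS NOT DISTINCT FROM NULL;  -- true
SELECT 'a' = NULL;                      -- NULL
SELECT 'a' IS NOT DISTINCT FROM NULL;   -- false
```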
You can try writing your query clause as follows:
WHERE (message = %(message)s OR (%(message)s IS NULL AND message IS NULL))
It's a bit rough, but it means "select the rows that match my parameter, or all the rows where message is null if my parameter is null". It should do the trick.
Unfortunately, NULL does not equal anything (not even another NULL), as NULL is intended to represent an unknown value. Your best bet is to change your templates to handle this correctly.
If it's possible that you can pass in separate values for the left and right operand in your template, one way to still use an equal sign would be:
SELECT id FROM table WHERE true = (message is null);

How to make "case-insensitive" query in Postgresql?

Is there any way to write case-insensitive queries in PostgreSQL? E.g. I want the following 3 queries to return the same result:
SELECT id FROM groups where name='administrator'
SELECT id FROM groups where name='ADMINISTRATOR'
SELECT id FROM groups where name='Administrator'
Use the LOWER function to convert the strings to lower case before comparing.
Try this:
SELECT id
FROM groups
WHERE LOWER(name)=LOWER('Administrator')
Use ILIKE instead of LIKE:
SELECT id FROM groups WHERE name ILIKE 'Administrator'
The most common approach is to either lowercase or uppercase the search string and the data. But there are two problems with that.
It works in English, but not in all languages. (Maybe not even in most languages.) Not every lowercase letter has a corresponding uppercase letter, and not every uppercase letter has a corresponding lowercase letter.
Using functions like lower() and upper() will give you a sequential scan; it can't use indexes. On my test system, using lower() takes about 2000 times longer than a query that can use an index. (Test data has a little over 100k rows.)
There are at least three less frequently used solutions that might be more effective.
Use the citext module, which mostly mimics the behavior of a case-insensitive data type. Having loaded that module, you can create a case-insensitive index with CREATE INDEX ON groups ((name::citext)); (note the extra parentheses required around an expression in an index definition). But see below.
Use a case-insensitive collation. This is set when you initialize a database. Using a case-insensitive collation means you can accept just about any format from client code, and you'll still return useful results. (It also means you can't do case-sensitive queries. Duh.)
Create a functional index. Create a lowercase index by using CREATE INDEX ON groups (LOWER(name));. Having done that, you can take advantage of the index with queries like SELECT id FROM groups WHERE LOWER(name) = LOWER('ADMINISTRATOR'); or SELECT id FROM groups WHERE LOWER(name) = 'administrator';. You have to remember to use LOWER(), though.
The citext module doesn't provide a true case-insensitive data type. Instead, it behaves as if each string were lowercased. That is, it behaves as if you had called lower() on each string, as in number 3 above. The advantage is that programmers don't have to remember to lowercase strings. But you need to read the sections "String Comparison Behavior" and "Limitations" in the docs before you decide to use citext.
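A minimal citext sketch (assuming you are able to alter the column's type; table and column names from the question):

```sql
CREATE EXTENSION IF NOT EXISTS citext;
ALTER TABLE groups ALTER COLUMN name TYPE citext;

-- All three spellings from the question now match the same row,
-- and a plain btree index on name can serve the lookup:
SELECT id FROM groups WHERE name = 'ADMINISTRATOR';
```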
You can use ILIKE, e.g.:
SELECT id FROM groups where name ILIKE 'administrator'
You can also read up on the ILIKE keyword. It can be quite useful at times, albeit it does not conform to the SQL standard. See here for more information: http://www.postgresql.org/docs/9.2/static/functions-matching.html
You could also use POSIX regular expressions, like
SELECT id FROM groups where name ~* 'administrator'
SELECT 'asd' ~* 'AsD' returns t
use ILIKE
select id from groups where name ILIKE 'administrator';
If you're coming from an Express.js background and name is a variable, use:
select id from groups where name ILIKE $1;
Using ~* also gives you INSTR-like substring matching:
SELECT id FROM groups WHERE name ~* 'adm'
returns rows whose name contains or equals 'adm'.
ILIKE works in this case:
SELECT id
FROM groups
WHERE name ILIKE 'Administrator'
For a case-insensitive parameterized query, you can use the following syntax:
"select * from article where upper(content) LIKE upper('%' || $1 || '%')"
-- Install the citext (case-insensitive text) extension
create extension citext;
-- Make a request
select 'Thomas'::citext in ('thomas', 'tiago');
select name from users where name::citext in ('thomas', 'tiago');
If you want to ignore not only upper/lower case but also diacritics, you can implement your own function:
CREATE EXTENSION unaccent;
CREATE OR REPLACE FUNCTION lower_unaccent(input text)
RETURNS text
LANGUAGE plpgsql
AS $function$
BEGIN
    RETURN lower(unaccent(input));
END;
$function$;
The call is then:
select lower_unaccent('Hôtel')
>> 'hotel'
A tested approach is using ~*, as in the example below:
SELECT id FROM groups WHERE name ~* 'administrator'
select id from groups where name in ('administrator', 'ADMINISTRATOR', 'Administrator')