PostgreSQL - Comparison operator with character varying - Exclude values

I want to query a PostgreSQL table with comparison operators. This table has two character varying columns.
Table
CREATE TABLE IF NOT EXISTS test.test
(
    scope character varying COLLATE pg_catalog."default",
    project_code character varying COLLATE pg_catalog."default"
)
Values
INSERT INTO test.test(scope, project_code) VALUES (NULL, 'AA');
INSERT INTO test.test(scope, project_code) VALUES ('A', 'AA');
When I want to query values with project_code = 'AA' and scope = 'A', I write:
SELECT * FROM test.test WHERE project_code LIKE 'AA' AND scope LIKE 'A';
It returns one row; the result is OK.
But when I try to query values with project_code = 'AA' and scope having any value other than 'A', I write:
SELECT * FROM test.test WHERE project_code LIKE 'AA' AND scope NOT LIKE 'A';
It doesn't return any results, but I have a row that matches. How can this be explained, and how should I write this query?
I tried other comparison operators (<> and !=) with the same result. I'm using PostgreSQL 13.6.

You need to use a NULL-safe comparison operator. The SQL standard defines the IS DISTINCT FROM operator as the NULL-safe version of <>, and Postgres supports this:
SELECT *
FROM test.test
WHERE project_code = 'AA'
AND scope IS DISTINCT FROM 'A';
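To see the difference:
SELECT NULL <> 'A';                -- NULL, so the row is filtered out
SELECT NULL IS DISTINCT FROM 'A';  -- true, so the row is returned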

Most operations involving NULL return NULL. For example,
SELECT NULL LIKE 'A', NULL NOT LIKE 'A'
returns (NULL, NULL). Handling the NULL case explicitly helps:
SELECT *
FROM test.test
WHERE project_code LIKE 'AA'
  AND (scope IS NULL OR scope NOT LIKE 'A')
The solution offered by @a_horse_with_no_name is more elegant; this one may still be useful when using wildcards with the LIKE operator.

select null like 'a' is true;     -- returns false
select null not like 'a' is true; -- returns false
select null like 'a';             -- returns null
select null not like 'a';         -- returns null
From the documentation (https://www.postgresql.org/docs/current/functions-matching.html):
If pattern does not contain percent signs or underscores, then the
pattern only represents the string itself; in that case LIKE acts like
the equals operator. An underscore (_) in pattern stands for (matches)
any single character; a percent sign (%) matches any sequence of zero
or more characters.
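So a pattern like 'AA' or 'A' that contains no wildcards behaves exactly like the equals operator, including its NULL handling:
SELECT 'AA' LIKE 'AA';  -- true, same as 'AA' = 'AA'
SELECT 'AA' LIKE 'A%';  -- true, % matches any sequence of characters
SELECT NULL LIKE '%';   -- NULL, a NULL input yields NULL even for a match-all pattern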

Related

Postgres query for IN(NULL, 'test') does not work

When I want to match a column that has certain string values or is NULL, I assumed I could do something like this:
SELECT * FROM table_name WHERE column_name IN (NULL, 'someTest', 'someOtherTest');
But it does not return the rows where column_name is set to NULL. Is this documented anywhere? Why does it not work?
You can't compare NULL values using = (which is what IN is doing).
Quote from the manual:
Ordinary comparison operators yield null (signifying “unknown”), not true or false, when either input is null. For example, 7 = NULL yields null, as does 7 <> NULL
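For example:
SELECT 7 = NULL, 7 <> NULL, NULL = NULL;  -- all three yield NULL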
You need to add a check for NULL explicitly:
SELECT *
FROM table_name
WHERE (column_name IN ('someTest', 'someOtherTest') OR column_name IS NULL);
NULL and the empty string ('') are considered different in Postgres, unlike Oracle.
The query can be modified as:
SELECT *
FROM table_name
WHERE (column_name IN ('someTest', 'someOtherTest', '', ' ') OR
column_name IS NULL);

Concatenate string instead of just replacing it

I have a table with standard columns where I want to perform regular INSERTs.
But one of the columns is of type varchar with special semantics. It's a string that's supposed to behave as a set of strings, where the elements of the set are separated by commas.
E.g. if one row has in that varchar column the value fish,sheep,dove, and I insert the string ,fish,eagle, I want the result to be fish,sheep,dove,eagle (i.e. eagle gets added to the set, but fish doesn't because it's already in the set).
I have here this Postgres code that does the "set concatenation" that I want:
SELECT string_agg(unnest, ',') AS x FROM (SELECT DISTINCT unnest(string_to_array('fish,sheep,dove' || ',fish,eagle', ','))) AS x;
But I can't figure out how to apply this logic to insertions.
What I want is something like:
CREATE TABLE IF NOT EXISTS t00(
    userid int8 PRIMARY KEY,
    a int8,
    b varchar);
INSERT INTO t00 (userid,a,b) VALUES (0,1,'fish,sheep,dove');
INSERT INTO t00 (userid,a,b) VALUES (0,1,',fish,eagle')
ON CONFLICT (userid)
DO UPDATE SET
a = EXCLUDED.a,
b = SELECT string_agg(unnest, ',') AS x FROM (SELECT DISTINCT unnest(string_to_array(t00.b || EXCLUDED.b, ','))) AS x;
How can I achieve something like that?
Storing comma separated values is a huge mistake to begin with. But if you really want to make your life harder than it needs to be, you might want to create a function that merges two comma separated lists:
create function merge_lists(p_one text, p_two text)
  returns text
as
$$
  select string_agg(item, ',')
  from (
    select e.item
    from unnest(string_to_array(p_one, ',')) as e(item)
    where e.item <> '' --< necessary because of the leading , in your data
    union
    select t.item
    from unnest(string_to_array(p_two, ',')) t(item)
    where t.item <> ''
  ) t;
$$
language sql;
If you are using Postgres 14 or later, unnest(string_to_array(..., ',')) can be replaced with string_to_table(..., ','), as sketched below.
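A minimal sketch of that variant, assuming Postgres 14+ (the name merge_lists_v14 is just for illustration):
create function merge_lists_v14(p_one text, p_two text)
  returns text
as
$$
  select string_agg(item, ',')
  from (
    select item
    from string_to_table(p_one, ',') as o(item)
    where item <> ''  -- skip empty elements caused by a leading comma
    union
    select item
    from string_to_table(p_two, ',') as t(item)
    where item <> ''
  ) t;
$$
language sql;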
Then your INSERT statement gets a bit simpler:
INSERT INTO t00 (userid,a,b) VALUES (0,1,',fish,eagle')
ON CONFLICT (userid)
DO UPDATE SET
a = EXCLUDED.a,
b = merge_lists(excluded.b, t00.b);
I think I was only missing parentheses around the SELECT statement:
INSERT INTO t00 (userid,a,b) VALUES (0,1,',fish,eagle')
ON CONFLICT (userid)
DO UPDATE SET
a = EXCLUDED.a,
b = (SELECT string_agg(unnest, ',') AS x FROM (SELECT DISTINCT unnest(string_to_array(t00.b || EXCLUDED.b, ','))) AS x);
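As a quick check (the element order produced by DISTINCT is not guaranteed, so the list may come back in a different order):
SELECT * FROM t00;
-- userid | a | b
-- -------+---+------------------------
--      0 | 1 | dove,eagle,fish,sheep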

How to use queried table name in subquery

I'm trying to query field names as well as their maximum length in their corresponding table with a single query - is it at all possible? I've read about correlated subqueries, but I couldn't get the desired result.
Here is the query I have so far:
select T1.RDB$FIELD_NAME, T2.RDB$FIELD_NAME, T2.RDB$RELATION_NAME as tabName, T1.RDB$CHARACTER_SET_ID, T1.RDB$FIELD_LENGTH,
(select max(char_length(T2.RDB$FIELD_NAME))
FROM tabName as MaxLength)
from RDB$FIELDS T1, RDB$RELATION_FIELDS T2
The above doesn't work because, of course, the subquery tries to find a table literally named "tabName". My guess is that I should use some kind of join, but my SQL skills are very limited in this matter.
The origin of the request is that I want to apply this script to transform all my non-UTF8 fields to UTF8, but I run into "string truncation" issues, as I have a few VARCHAR(8192) fields. Usually none of the fields would actually use these 8192 chars, but I'd rather make sure before truncating.
What you're trying to do cannot be done this way. It looks like you want to obtain the actual maximum length of fields in tables, but you cannot dynamically reference table and column names like this; being able to do that would be a SQL injection heaven. In addition, your use of a SQL-89 cross join instead of an inner join (preferably in SQL-92 style) causes other problems, as you will combine fields incorrectly (as a Cartesian product).
Instead you need to write PSQL to dynamically build and execute the statement to obtain the lengths (using EXECUTE BLOCK (or a stored procedure) and EXECUTE STATEMENT).
For example, something like this:
execute block
  returns (
    table_name varchar(63) character set unicode_fss,
    column_name varchar(63) character set unicode_fss,
    type varchar(10),
    length smallint,
    charset_name varchar(63) character set unicode_fss,
    collation_name varchar(63) character set unicode_fss,
    max_length smallint)
as
begin
  for select
        trim(rrf.RDB$RELATION_NAME) as table_name,
        trim(rrf.RDB$FIELD_NAME) as column_name,
        case rf.RDB$FIELD_TYPE when 14 then 'CHAR' when 37 then 'VARCHAR' end as type,
        coalesce(rf.RDB$CHARACTER_LENGTH, rf.RDB$FIELD_LENGTH / rcs.RDB$BYTES_PER_CHARACTER) as length,
        trim(rcs.RDB$CHARACTER_SET_NAME) as charset_name,
        trim(rc.RDB$COLLATION_NAME) as collation_name
      from RDB$RELATIONS rr
      inner join RDB$RELATION_FIELDS rrf
        on rrf.RDB$RELATION_NAME = rr.RDB$RELATION_NAME
      inner join RDB$FIELDS rf
        on rf.RDB$FIELD_NAME = rrf.RDB$FIELD_SOURCE
      inner join RDB$CHARACTER_SETS rcs
        on rcs.RDB$CHARACTER_SET_ID = rf.RDB$CHARACTER_SET_ID
      left join RDB$COLLATIONS rc
        on rc.RDB$CHARACTER_SET_ID = rf.RDB$CHARACTER_SET_ID
        and rc.RDB$COLLATION_ID = rf.RDB$COLLATION_ID
        and rc.RDB$COLLATION_NAME <> rcs.RDB$DEFAULT_COLLATE_NAME
      where coalesce(rr.RDB$RELATION_TYPE, 0) = 0 and coalesce(rr.RDB$SYSTEM_FLAG, 0) = 0
        and rf.RDB$FIELD_TYPE in (14 /* char */, 37 /* varchar */)
      into table_name, column_name, type, length, charset_name, collation_name
  do
  begin
    execute statement 'select max(character_length("' || replace(column_name, '"', '""') || '")) from "' || replace(table_name, '"', '""') || '"'
      into max_length;
    suspend;
  end
end
As an aside, the maximum length of a VARCHAR of character set UTF8 is 8191, not 8192.

Why does atttypmod differ from character_maximum_length?

I'm converting some information_schema queries to system catalog queries and I'm getting different results for character maximum length.
SELECT column_name,
data_type ,
character_maximum_length AS "maxlen"
FROM information_schema.columns
WHERE table_name = 'x'
returns the results I expect, e.g.:
city character varying 255
company character varying 1000
The equivalent catalog query
SELECT attname,
atttypid::regtype AS datatype,
NULLIF(atttypmod, -1) AS maxlen
FROM pg_attribute
WHERE CAST(attrelid::regclass AS varchar) = 'x'
AND attnum > 0
AND NOT attisdropped
Seems to return every length + 4:
city character varying 259
company character varying 1004
Why the difference? Is it safe to always simply subtract 4 from the result?
You could say it's safe to subtract 4 from the result for the types char and varchar. What the information_schema.columns view does under the hood is call the function information_schema._pg_char_max_length (this is your difference, since you don't), whose body is:
CREATE OR REPLACE FUNCTION information_schema._pg_char_max_length(typid oid, typmod integer)
RETURNS integer
LANGUAGE sql
IMMUTABLE PARALLEL SAFE STRICT
AS $function$SELECT
CASE WHEN $2 = -1 /* default typmod */
THEN null
WHEN $1 IN (1042, 1043) /* char, varchar */
THEN $2 - 4
WHEN $1 IN (1560, 1562) /* bit, varbit */
THEN $2
ELSE null
END$function$
That said, for chars and varchars it always subtracts 4.
Strictly speaking, this makes your query not equivalent: to replicate the view it would need to establish the typid of the column and wrap the value in that function to return proper values, because more things come into play than just a fixed offset. If you wish to simplify, you can reuse the function directly (it won't be bulletproof, though):
SELECT attname,
atttypid::regtype AS datatype,
NULLIF(information_schema._pg_char_max_length(atttypid, atttypmod), -1) AS maxlen
FROM pg_attribute
WHERE CAST(attrelid::regclass AS varchar) = 'x'
AND attnum > 0
AND NOT attisdropped
This should do it for you. Should you wish to investigate the matter further, refer to the view definition of information_schema.columns.
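As a quick sanity check, using a table like the one from the question:
CREATE TABLE x (city varchar(255));
SELECT atttypmod,
       information_schema._pg_char_max_length(atttypid, atttypmod) AS maxlen
FROM pg_attribute
WHERE attrelid = 'x'::regclass
  AND attname = 'city';
-- atttypmod = 259, maxlen = 255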

Merging/concatenating JSON(B) columns in a query

Using Postgres 9.4, I am looking for a way to merge two (or more) json or jsonb columns in a query. Consider the following table as an example:
id | json1 | json2
----------------------------------------
1 | {'a':'b'} | {'c':'d'}
2 | {'a1':'b2'} | {'f':{'g' : 'h'}}
Is it possible to have the query return the following:
id | json
----------------------------------------
1 | {'a':'b', 'c':'d'}
2 | {'a1':'b2', 'f':{'g' : 'h'}}
Unfortunately, I can't define a function as described here. Is this possible with a "traditional" query?
In Postgres 9.5+ you can merge JSONB like this:
select json1 || json2;
Or, if it's JSON, coerce to JSONB if necessary:
select json1::jsonb || json2::jsonb;
Or:
select COALESCE(json1::jsonb||json2::jsonb, json1::jsonb, json2::jsonb);
(Otherwise, any NULL value in json1 or json2 makes the whole result NULL)
For example:
select data || '{"foo":"bar"}'::jsonb from photos limit 1;
?column?
----------------------------------------------------------------------
{"foo": "bar", "preview_url": "https://unsplash.it/500/720/123"}
Kudos to @MattZukowski for pointing this out in a comment.
Here is the complete list of built-in functions that can be used to create json objects in PostgreSQL: http://www.postgresql.org/docs/9.4/static/functions-json.html
row_to_json and json_object do not allow you to define your own keys, so they can't be used here
json_build_object expects you to know in advance how many keys and values your object will have; that's the case in your example, but it should not be the case in the real world
json_object looks like a good tool to tackle this problem, but it forces us to cast our values to text, so we can't use this one either
Well... OK, so we can't use any classic functions.
Let's take a look at some aggregate functions and hope for the best... http://www.postgresql.org/docs/9.4/static/functions-aggregate.html
json_object_agg is the only aggregate function that builds objects, so that's our only chance to tackle this problem. The trick here is to find the correct way to feed the json_object_agg function.
Here are my test table and data:
CREATE TABLE test (
id SERIAL PRIMARY KEY,
json1 JSONB,
json2 JSONB
);
INSERT INTO test (json1, json2) VALUES
('{"a":"b", "c":"d"}', '{"e":"f"}'),
('{"a1":"b2"}', '{"f":{"g" : "h"}}');
And after some trial and error, here is a query you can use to merge json1 and json2 in PostgreSQL 9.4:
WITH all_json_key_value AS (
SELECT id, t1.key, t1.value FROM test, jsonb_each(json1) as t1
UNION
SELECT id, t1.key, t1.value FROM test, jsonb_each(json2) as t1
)
SELECT id, json_object_agg(key, value)
FROM all_json_key_value
GROUP BY id
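With the test data above, this should return something like the following (key order inside the objects is not guaranteed):
 id |          json_object_agg
----+-------------------------------------
  1 | { "a" : "b", "c" : "d", "e" : "f" }
  2 | { "a1" : "b2", "f" : {"g" : "h"} }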
For PostgreSQL 9.5+, look at Zubin's answer.
This function merges nested json objects:
create or replace function jsonb_merge(CurrentData jsonb,newData jsonb)
returns jsonb
language sql
immutable
as $jsonb_merge_func$
select case jsonb_typeof(CurrentData)
when 'object' then case jsonb_typeof(newData)
when 'object' then (
select jsonb_object_agg(k, case
when e2.v is null then e1.v
when e1.v is null then e2.v
when e1.v = e2.v then e1.v
else jsonb_merge(e1.v, e2.v)
end)
from jsonb_each(CurrentData) e1(k, v)
full join jsonb_each(newData) e2(k, v) using (k)
)
else newData
end
when 'array' then CurrentData || newData
else newData
end
$jsonb_merge_func$;
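For example, nested objects are merged key by key:
select jsonb_merge('{"a": {"b": 1}}'::jsonb, '{"a": {"c": 2}}'::jsonb);
-- {"a": {"b": 1, "c": 2}}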
Looks like nobody has proposed this kind of solution yet, so here's my take, using a custom aggregate function. Note that the transition function must exist before the aggregate that references it, and CREATE OR REPLACE AGGREGATE requires Postgres 12+:
create or replace function jsonb_concat(a jsonb, b jsonb) returns jsonb
as 'select $1 || $2'
language sql
immutable
parallel safe
;
create or replace aggregate jsonb_merge_agg(jsonb)
(
sfunc = jsonb_concat,
stype = jsonb,
initcond = '{}'
);
Note: this uses ||, which replaces existing values at the same path instead of deeply merging them.
Now jsonb_merge_agg is accessible like so:
select jsonb_merge_agg(some_col) from some_table group by something;
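For instance, with inline sample data:
select jsonb_merge_agg(j)
from (values ('{"a": 1}'::jsonb), ('{"b": 2}'::jsonb)) v(j);
-- {"a": 1, "b": 2}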
You can also transform json into text, concatenate, replace, and convert back to json. Using the same data from Clément you can do:
SELECT replace(
(json1::text || json2::text),
'}{',
', ')::json
FROM test
You could also concatenate all json1 values into a single json with:
SELECT regexp_replace(
array_agg((json1))::text,
'}"(,)"{|\\| |^{"|"}$',
'\1',
'g'
)::json
FROM test
This is a very old solution; since 9.4 you should use json_object_agg and the simple || concatenation operator. Keeping it here just for reference.
Although this question was answered some time ago, note that when json1 and json2 contain the same key, the key appears twice in the document, which does not seem to be best practice.
Therefore you can use this jsonb_merge function with PostgreSQL 9.5:
CREATE OR REPLACE FUNCTION jsonb_merge(jsonb1 JSONB, jsonb2 JSONB)
RETURNS JSONB AS $$
DECLARE
result JSONB;
v RECORD;
BEGIN
result = (
SELECT json_object_agg(KEY,value)
FROM
(SELECT jsonb_object_keys(jsonb1) AS KEY,
1::int AS jsb,
jsonb1 -> jsonb_object_keys(jsonb1) AS value
UNION SELECT jsonb_object_keys(jsonb2) AS KEY,
2::int AS jsb,
jsonb2 -> jsonb_object_keys(jsonb2) AS value ) AS t1
);
RETURN result;
END;
$$ LANGUAGE plpgsql;
The following query returns the concatenated jsonb columns, where the keys in json2 are dominant over the keys in json1:
select id, jsonb_merge(json1, json2) from test
FYI, if someone's using jsonb in >= 9.5 and they only care about top-level elements being merged without duplicate keys, then it's as easy as using the || operator:
select '{"a1": "b2"}'::jsonb || '{"f":{"g" : "h"}}'::jsonb;
?column?
-----------------------------
{"a1": "b2", "f": {"g": "h"}}
(1 row)
Try this if you're having an issue merging two JSON objects:
select table.attributes::jsonb || json_build_object('foo',1,'bar',2)::jsonb FROM table where table.x='y';
CREATE OR REPLACE FUNCTION jsonb_merge(pCurrentData jsonb, pMergeData jsonb, pExcludeKeys text[])
RETURNS jsonb IMMUTABLE LANGUAGE sql
AS $$
SELECT json_object_agg(key,value)::jsonb
FROM (
WITH to_merge AS (
SELECT * FROM jsonb_each(pMergeData)
)
SELECT *
FROM jsonb_each(pCurrentData)
WHERE key NOT IN (SELECT key FROM to_merge)
AND ( pExcludeKeys ISNULL OR key <> ALL(pExcludeKeys))
UNION ALL
SELECT * FROM to_merge
) t;
$$;
SELECT jsonb_merge('{"a": 1, "b": 9, "c": 3, "e":5}'::jsonb, '{"b": 2, "d": 4}'::jsonb, '{"c","e"}'::text[]) as jsonb
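With those inputs the call should return {"a": 1, "b": 2, "d": 4}: "c" and "e" are excluded, and "b" is taken from the merge data.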
This works well as an alternative to || when a recursive deep merge is required (found here):
create or replace function jsonb_merge_recurse(orig jsonb, delta jsonb)
returns jsonb language sql as $$
select
jsonb_object_agg(
coalesce(keyOrig, keyDelta),
case
when valOrig isnull then valDelta
when valDelta isnull then valOrig
when (jsonb_typeof(valOrig) <> 'object' or jsonb_typeof(valDelta) <> 'object') then valDelta
else jsonb_merge_recurse(valOrig, valDelta)
end
)
from jsonb_each(orig) e1(keyOrig, valOrig)
full join jsonb_each(delta) e2(keyDelta, valDelta) on keyOrig = keyDelta
$$;
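Usage, for example:
select jsonb_merge_recurse(
  '{"a": {"b": 1, "c": 2}}'::jsonb,
  '{"a": {"c": 3}, "d": 4}'::jsonb
);
-- {"a": {"b": 1, "c": 3}, "d": 4}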