What is PostgreSQL syntax for complex array literals? (e.g. circle[])

The PostgreSQL docs state that, though some literals have special syntax, the generic syntax for literal values looks like type 'data', 'data'::type, or CAST('data' AS type). For instance, one could write the integer 16 as 16 or as '16'::int. Dollar-quoting is also allowed, so $$16$$::int works as well.
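As a quick check in psql, all of these spellings produce the same integer value:
select 16 as plain, '16'::int as quoted, cast('16' as int) as sql_cast, $$16$$::int as dollar_quoted;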
For some types, such as circle, the generic syntax is (as far as I can tell) the only way to write a literal. Four syntaxes are listed for circle: <(x, y), r>, ((x, y), r), (x, y), r, and x, y, r; however, none of these seem to work when written directly:
y=> create table t (c circle);
CREATE TABLE
y=> insert into t (c) values ( <(1,2),3> );
ERROR: syntax error at or near "<"
LINE 1: insert into t (c) values ( <(1,2),3> );
^
y=> insert into t (c) values ( ((1,2),3) );
ERROR: column "c" is of type circle but expression is of type record
LINE 1: insert into t (c) values ( ((1,2),3) );
^
HINT: You will need to rewrite or cast the expression.
y=> insert into t (c) values ( (1,2),3 );
ERROR: INSERT has more expressions than target columns
LINE 1: insert into t (c) values ( (1,2),3 );
^
y=> insert into t (c) values ( 1,2,3 );
ERROR: INSERT has more expressions than target columns
LINE 1: insert into t (c) values ( 1,2,3 );
^
In contrast, the generic syntax works fine:
y=> insert into t (c) values ( '1,2,3' );
INSERT 0 1
y=> select c from t;
c
-----------
<(1,2),3>
Also documented is the array syntax, which is comma-delimited and curly-brace-enclosed. So the integers 1 through 5 may be written, for instance, as '{1, 2, 3, 4, 5}'::int[].
Unfortunately, the array syntax does not seem to support elements written in the generic syntax: $${ 1 }$$::int[] is accepted, but $${ '1'::int }$$::int[] is rejected, as is $${ '1' }$$::int[].
Because array literal syntax does not accept generic-form syntax for elements, and because the generic syntax seems to be the only way to write a circle literal, it appears to be impossible to write a circle[] literal in PostgreSQL.
y=> select $${ }$$::circle[];
circle
--------
{}
(1 row)
y=> select $${ '<(1, 2), 3>'::circle }$$::circle[];
ERROR: invalid input syntax for type circle: "'<(1"
LINE 1: select $${ '<(1, 2), 3>'::circle }$$::circle[];
Is this indeed the case?
Note: yes, it is crucial that I write a circle[] literal specifically. The reason is that my use-case is specifying values in prepared statements in a PostgreSQL client, and, as the PostgreSQL protocol documentation says (emphasis mine):
Parameter data types can be specified by OID; if not given, the parser attempts to infer the data types in the same way as it would do for untyped literal string constants.
This means that, for instance, ARRAY[ '<(1,2),3>'::circle ] is not a valid solution, as it does not use literal syntax.
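To be clear, the constructor form works fine when typed interactively; it is just an expression, not a literal, which is why it cannot be used here:
select ARRAY[ '<(1,2),3>'::circle ];  -- works, but is an expression, not literal syntax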

Turns out the solution is simple: escape the commas!
y=> select '{ <(1\,2)\,3>, <(4\,5)\,6> }'::circle[];
circle
---------------------------
{"<(1,2),3>","<(4,5),6>"}
(1 row)
Another option is to double-quote the elements:
y=> select '{ "<(1,2),3>", "<(4,5),6>" }'::circle[];
circle
---------------------------
{"<(1,2),3>","<(4,5),6>"}
(1 row)
Credit to sehrope on GitHub for telling me this. Quote from the relevant part of the PostgreSQL docs:
[...] when writing an array value you can use double quotes around any individual array element. You must do so if the element value would otherwise confuse the array-value parser. For example, elements containing curly braces, commas [...], double quotes, backslashes, [etc.] must be double-quoted. [...] Alternatively, you can avoid quotes and use backslash-escaping to protect all data characters that would otherwise be taken as array syntax.
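A small demonstration of both rules with text[] (assuming the default standard_conforming_strings = on, so the backslashes reach the array parser rather than the string parser):
select '{"a\"b", "c\\d"}'::text[];
-- result: {"a\"b","c\\d"}, i.e. two elements containing a double quote and a backslash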

Related

SQLAlchemy IN_ - trouble with leading zeroes

In my SQLAlchemy (sqlalchemy = "^1.4.36") query I have a clause:
.filter( some_model.some_field[2].in_(['item1', 'item2']) )
where some_field is jsonb and the value of some_field in the db is formatted like this:
["something","something","123"]
or
["something","something","0123"]
note: some_field[2] is always a digits-only double-quoted string, sometimes with leading zeroes and sometimes without them.
The query works fine for cases like this:
.filter( some_model.some_field[2].in_(['123', '345']) )
and fails when the values in the in_ clause have leading zeroes:
e.g. .filter( some_model.some_field[2].in_(['0123', '0345']) ) fails.
The error it gives:
cursor.execute(statement, parameters)
psycopg2.errors.InvalidTextRepresentation: invalid input syntax for type json
LINE 3: ...d_on) = 2 AND (app_cache.value_metadata -> 2) IN ('0123'
                                                              ^
DETAIL: Token "0123" is invalid.
Again, in the case of '123' (or any string of digits without leading zero) instead of '0123' the error is not thrown.
What is wrong with having leading zeroes for the strings in the list of in_ clause? Thanks.
UPDATE: basically, SQLAlchemy's in_ assumes int input and fails accordingly. There must be some reasoning behind this behavior, but I can't tell what it is. I removed that filter from the query and did the filtering of the output in Python code afterwards.
The problem here is that the values in the IN clause are being interpreted by PostgreSQL as JSON representations of integers, and an integer with a leading zero is not valid JSON.
The IN clause has a value of type jsonb on the left-hand side. The values on the right-hand side are not explicitly typed, so Postgres tries to find the best match that will allow them to be compared with a jsonb value. That type is jsonb, so Postgres attempts to cast the values to jsonb. This works for values without a leading zero, because a digit string without leading zeroes is a valid JSON representation of an integer:
test# select '123'::jsonb;
jsonb
═══════
123
(1 row)
but it doesn't work for values with leading zeroes, because they are not valid JSON:
test# select '0123'::jsonb;
ERROR: invalid input syntax for type json
LINE 1: select '0123'::jsonb;
^
DETAIL: Token "0123" is invalid.
CONTEXT: JSON data, line 1: 0123
Assuming that you expect some_field[2].in_(['123', '345']) and some_field[2].in_(['0123', '345']) to match ["something","something","123"] and ["something","something","0123"] respectively, you can either serialise the values to JSON yourself:
some_field[2].in_([json.dumps(x) for x in ['0123', '345']])
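This works because json.dumps renders each string with its double quotes, and a quoted digit string is valid JSON even with a leading zero:
select '"0123"'::jsonb;  -- valid: a JSON string
select '0123'::jsonb;    -- fails: a JSON number may not have a leading zero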
or use the contained_by operator (<@ in PostgreSQL) to test whether some_field[2] is present in the list of values:
some_field[2].contained_by(['0123', '345'])
or cast some_field[2] to text (that is, use the ->> operator) so that the values are compared as text, not JSON:
some_field[2].astext.in_(['0123', '345'])
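A minimal psql sketch of why the text comparison sidesteps the problem (both queries return true):
select ('["something","something","0123"]'::jsonb ->  2) = '"0123"'::jsonb;  -- compare as jsonb: the right side must be valid JSON
select ('["something","something","0123"]'::jsonb ->> 2) = '0123';           -- compare as text: no JSON parsing involved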

Cast to int instead of decimal?

I have a field that contains up to 9 comma-separated values, each of which has a string value and a numeric value separated by a colon. After parsing them, some of the values between 0 and 1 come back as an integer rather than the numeric they were cast to. The problem is obviously related to the data type, but I am unsure what is causing it or how to fix it. The problem only exists in the CASE statement; the split_part function seems to work perfectly.
Things I have tried:
nvl(split_part(one,':',2),0) => COALESCE types text and integer cannot be matched
nvl(split_part(one,':',2)::numeric,0) => invalid input syntax for type numeric
numerous other cast/convert variations
(CASE WHEN split_part(one,':',2) = '' THEN 0::numeric ELSE split_part(one,':',2)::numeric END)::numeric => runs, but returns an int value of 0
When using the split_part function outside of the CASE statement, it works correctly. However, I need the result to be zero for null values.
split_part(one,':',2) => 0.02068278096187390979 (expected result)
When running the code above I get zero but expect 0.02068278096187390979
Field "one" has the following value 'xyz: 0.02068278096187390979' before the split_part function.
EXAMPLE:
create table test(one varchar);
insert into test values ('XYZ: 0.50000000000000000000');

select
    one,
    split_part(one,':',2) as correct_value_for_those_that_are_not_null,
    case
        when split_part(one,':',2) = '' then null
        else split_part(one,':',2)::numeric
    end::numeric as this_one_is_the_problem
from test;
However, I need the result to be zero for null values.
Your example does not deal with NULL values at all, though; it only addresses the empty string ('').
To replace either with 0 reliably, efficiently and without casting issues:
SELECT part1
     , CASE WHEN part2 <> '' THEN part2::numeric ELSE numeric '0' END AS part2
FROM  (
    SELECT split_part(one, ':', 1) AS part1
         , split_part(one, ':', 2) AS part2
    FROM   test
    ) sub;
See:
Best way to check for "empty or null value"
Also note that all branches of a SQL CASE expression must agree on a common data type. There have been minor adjustments to the logic that determines the resulting type in past releases, so the version of Postgres may play a role in corner cases. I don't recall the details now.
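If you want to see which type Postgres resolves for a given CASE expression, pg_typeof helps; for instance, mixing an integer branch with a numeric branch resolves to numeric:
select pg_typeof(case when true then 1 else 0.5 end);  -- numeric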
nvl() is not a Postgres function. You probably meant COALESCE. The manual:
This SQL-standard function provides capabilities similar to NVL and IFNULL, which are used in some other database systems.
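For completeness, an equivalent COALESCE-based spelling of the query above (a sketch against the same test table; NULLIF folds the empty string into NULL before the cast, so no casting error can occur):
SELECT COALESCE(NULLIF(split_part(one, ':', 2), '')::numeric, numeric '0') AS part2
FROM   test;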

postgres `order by` argument type

What is the argument type for the order by clause in Postgresql?
I came across a very strange behaviour (using Postgresql 9.5). Namely, the query
select * from unnest(array[1,4,3,2]) as x order by 1;
produces 1,2,3,4 as expected. However the query
select * from unnest(array[1,4,3,2]) as x order by 1::int;
produces 1,4,3,2, which seems strange. Similarly, whenever I replace 1::int with some function (e.g. greatest(0,1)) or even a CASE expression, the results come back unordered (contrary to what I would expect).
So which type should an argument of order by have, and how do I get the expected behaviour?
This is expected (and documented) behaviour:
A sort_expression can also be the column label or number of an output column
So the expression:
order by 1
sorts by the first column of the result set (as defined by the SQL standard)
However the expression:
order by 1::int
sorts by the constant value 1; it's essentially the same as:
order by 'foo'::text
With a constant value in the ORDER BY, every row has the same sort key, so the rows aren't really sorted.
To sort by an expression, just use that:
order by
  case
    when some_column = 'foo' then 1
    when some_column = 'bar' then 2
    else 3
  end
The above sorts the result based on the result of the case expression.
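Applied to the query from the question, ordering by the output column expression itself produces the expected result:
select * from unnest(array[1,4,3,2]) as x order by x;  -- returns 1, 2, 3, 4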
Actually I have a function with an integer argument which indicates the column to be used in the order by clause.
When all columns are of the same type, this can work:
SELECT ....
ORDER BY
  CASE function_to_get_a_column_number()
    WHEN 1 THEN column1
    WHEN 2 THEN column2
    .....
    WHEN 1235 THEN column1235
  END
If columns are of different types, you can try:
SELECT ....
ORDER BY
  CASE function_to_get_a_column_number()
    WHEN 1 THEN column1::varchar
    WHEN 2 THEN column2::varchar
    .....
    WHEN 1235 THEN column1235::varchar
  END
But these "workarounds" are horrible. You need some other approach than the function returning a column number.
Maybe dynamic SQL?
I would say that dynamic SQL (thanks @kordirko and the others for the hints) is the best solution to the problem I originally had in mind:
create temp table my_data (
    id  serial,
    val text
);

insert into my_data(id, val)
values (default, 'a'), (default, 'c'), (default, 'd'), (default, 'b');

create function fetch_my_data(col text)
returns setof my_data as
$f$
begin
    return query execute $$
        select * from my_data
        order by $$ || quote_ident(col);
end
$f$ language plpgsql;

select * from fetch_my_data('val'); -- order by val
select * from fetch_my_data('id');  -- order by id
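For example, fetch_my_data('val') should return the rows ordered alphabetically by val:
 id | val
----+-----
  1 | a
  4 | b
  2 | c
  3 | d
(4 rows)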
In the beginning I thought this could be achieved by using a CASE expression as the argument of the ORDER BY clause, i.e. the sort_expression. And here comes the tricky part that confused me: when the sort_expression is a kind of identifier (the name or the number of an output column), the corresponding column is used when ordering the results. But when the sort_expression is some value, the results are ordered by that value itself (computed for each row). This is @a_horse_with_no_name's answer rephrased.
So when I queried ... order by 1::int, in effect I assigned the value 1 to each row and then tried to sort a set of ones, which is clearly useless.
There are some workarounds without dynamic queries, but they require writing more code and do not seem to have any significant advantages.

remove non-numeric characters in a column (character varying), postgresql (9.3.5)

I need to remove non-numeric characters in a column (character varying) and keep the numeric values in PostgreSQL 9.3.5.
Examples:
1) "ggg" => ""
2) "3,0 kg" => "3,0"
3) "15 kg." => "15"
4) ...
There are a few problem cases; some values look like:
1) "2x3,25"
2) "96+109"
3) ...
These need to remain as they are (i.e. when the non-numeric characters appear between numeric characters, do nothing).
Using regexp_replace is simpler:
# select regexp_replace('test1234test45abc', '[^0-9]+', '', 'g');
regexp_replace
----------------
123445
(1 row)
The ^ means not, so any character that is not in the range 0-9 will be replaced with an empty string, ''.
The 'g' is a flag that means all matches will be replaced, not just the first match.
For modifying strings in PostgreSQL, take a look at the String Functions and Operators section of the documentation. The function substring(string from pattern) uses POSIX regular expressions for pattern matching and works well for removing different characters from your string.
(Note that the VALUES clause inside the parentheses just provides the example material; you can replace it with any SELECT statement or table that provides the data):
SELECT substring(column1 from '(([0-9]+.*)*[0-9]+)'), column1
FROM (VALUES
    ('ggg'),
    ('3,0 kg'),
    ('15 kg.'),
    ('2x3,25'),
    ('96+109')
) strings;
The regular expression explained in parts:
[0-9]+ - the string has at least one number, example: '789'
[0-9]+.* - the string has at least one number followed by anything, example: '12smth'
([0-9]+.*)* - the previous pattern repeated zero or more times, example: '12smth22smth'
(([0-9]+.*)*[0-9]+) - the previous pattern zero or more times, with at least one number at the end, example: '12smth22smth345'
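For reference, the query above should produce (the first row is NULL because 'ggg' contains no digits):
 substring | column1
-----------+---------
           | ggg
 3,0       | 3,0 kg
 15        | 15 kg.
 2x3,25    | 2x3,25
 96+109    | 96+109
(5 rows)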

invalid input syntax for integer: "9Na_(2)SO_(4)"

I'm trying to insert an alphanumeric value into a table:
INSERT INTO solution (solution, nextsolution) VALUES
('9Na_(2)SO_(4)', NULL), ('2Ni(OH)_(3)', (SELECT id FROM solution WHERE solution='9Na_(2)SO_(4)' & nextsolution=null));
solution is of type text and nextsolution is an integer. Unfortunately, PostgreSQL doesn't accept the WHERE clause. It gives me the error:
ERROR: invalid input syntax for integer: "9Na_(2)SO_(4)"
LINE 9: ...OH)_(3)', (SELECT id FROM solution WHERE solution='9Na_(2)SO...
How can I solve this?
The issue is that the expression in the WHERE clause, '9Na_(2)SO_(4)' & nextsolution=null, tries to perform a bitwise AND (&) on the string, which won't work (and probably isn't what you want anyway).
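You can reproduce the misparse in isolation: & resolves to the integer bitwise operator, so Postgres tries to cast the untyped string literal to integer, producing exactly the reported error (a minimal sketch):
select 5 & 3;                -- fine: bitwise AND of two integers, returns 1
select '9Na_(2)SO_(4)' & 2;  -- fails: invalid input syntax for integer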
Looking at your query, I think what you want is to first insert the value '9Na_(2)SO_(4)' and then insert the value '2Ni(OH)_(3)' with the id of the previously inserted row.
You need to do this as two statements and use a different syntax. This should do what you want:
INSERT INTO solution (solution, nextsolution) VALUES (
    '9Na_(2)SO_(4)',
    NULL
);

INSERT INTO solution (solution, nextsolution) VALUES (
    '2Ni(OH)_(3)',
    (SELECT id FROM solution WHERE solution = '9Na_(2)SO_(4)' AND nextsolution IS NULL)
);
You need to use AND instead of & to combine the conditions in your WHERE clause; an ampersand (&) is used for bitwise operations.