What are good ways to add a constraint to PostgreSQL to check that exactly one column (from a set of columns) contains a non-null value?
Update: It is likely that I want to use a check expression as detailed in Create Table and Alter Table.
Update: I'm looking through the available functions.
Update: Just for background, here is the Rails validation logic I'm currently using:
validate :multi_column_validation
def multi_column_validation
n = 0
n += 1 if column_1
n += 1 if column_2
n += 1 if column_3
unless 1 == n
errors.add(:base, "Exactly one column from " +
"column_1, column_2, column_3 must be present")
end
end
To be clear, I'm looking for PSQL, not Ruby, here. I just wanted to show the logic I'm using since it is more compact than enumerating all "truth table" possibilities.
Since PostgreSQL 9.6 you have the num_nonnulls and num_nulls comparison functions that accept any number of VARIADIC arguments.
For example, this would make sure exactly one of the three columns is not null.
ALTER TABLE your_table
ADD CONSTRAINT chk_only_one_is_not_null CHECK (num_nonnulls(col1, col2, col3) = 1);
History & References
The PostgreSQL 9.6.0 Release Notes from 2016-09-29 say:
Add variadic functions num_nulls() and num_nonnulls() that count the number of their arguments that are null or non-null (Marko Tiikkaja)
On 2015-08-12, Marko Tiikkaja proposed this feature on the pgsql-hacker mailing list:
I'd like to suggest $SUBJECT for inclusion in Postgres 9.6. I'm sure everyone would've found it useful at some point in their lives, and the fact that it can't be properly implemented in any language other than C I think speaks for the fact that we as a project should provide it.
A quick and dirty proof of concept (patch attached):
=# select count_nulls(null::int, null::text, 17, 'bar');
count_nulls
-------------
2
(1 row)
Its natural habitat would be CHECK constraints, e.g:
CHECK (count_nulls(a,b,c) IN (0, 3))
Avid code historians can follow more discussion from that point. :)
2021-09-17 update: As of today, gerardnll provides a better answer (the best IMO):
"Since PostgreSQL 9.6 you have the num_nonnulls and num_nulls comparison functions that accept any number of VARIADIC arguments."
In order to help people find the cleanest solution, I recommend you upvote gerardnll's answer.
(FYI, I'm the same person who asked the original question.)
Here is my original answer from 2013
Here is an elegant two column solution according to the "constraint -- one or the other column not null" PostgreSQL message board:
ALTER TABLE my_table ADD CONSTRAINT my_constraint CHECK (
(column_1 IS NULL) != (column_2 IS NULL));
(But the above approach is not generalizable to three or more columns.)
If you have three or more columns, you can use the truth table approach illustrated by a_horse_with_no_name. However, I consider the following to be easier to maintain because you don't have to type out the logical combinations:
ALTER TABLE my_table
ADD CONSTRAINT my_constraint CHECK (
(CASE WHEN column_1 IS NULL THEN 0 ELSE 1 END) +
(CASE WHEN column_2 IS NULL THEN 0 ELSE 1 END) +
(CASE WHEN column_3 IS NULL THEN 0 ELSE 1 END) = 1;
To compact this, it would be useful to create a custom function so that the CASE WHEN column_k IS NULL THEN 0 ELSE 1 END boilerplate could be removed, leaving something like:
(non_null_count(column_1) +
non_null_count(column_2) +
non_null_count(column_3)) = 1
That may be as compact as PSQL will allow (?). That said, I'd prefer to get to this kind of syntax if possible:
non_null_count(column_1, column_2, column_3) = 1
I think the most clean and generic solution is to create a function to count the null values from some arguments. For that you can use the pseudo-type anyarray and a SQL function like that:
CREATE FUNCTION count_not_nulls(p_array anyarray)
RETURNS BIGINT AS
$$
SELECT count(x) FROM unnest($1) AS x
$$ LANGUAGE SQL IMMUTABLE;
With that function, you can create your CHECK CONSTRAINT as:
ALTER TABLE your_table
ADD chk_only_one_is_not_null CHECK(count_not_nulls(array[col1, col2, col3]) = 1);
This will only work if the columns are of the same data type. If it's not the case, you can cast them, as text for instance (as you just care for the null case):
ALTER TABLE your_table
ADD chk_only_one_is_not_null CHECK(count_not_nulls(array[col1::text, col2::text, col3::text]) = 1);
As well remembered by #muistooshort, you can create the function with variadic arguments, which makes it clear to call:
CREATE FUNCTION count_not_nulls(variadic p_array anyarray)
RETURNS BIGINT AS
$$
SELECT count(x) FROM unnest($1) AS x
$$ LANGUAGE SQL IMMUTABLE;
ALTER TABLE your_table
ADD chk_only_one_is_not_null CHECK(count_not_nulls(col1, col2, col3) = 1);
As hinted by mu is too short:
alter table t
add constraint only_one_null check (
(col1 is not null)::integer + (col2 is not null)::integer = 1
)
A bit clumsy, but should do the trick:
create table foo
(
col1 integer,
col2 integer,
col3 integer,
constraint one_is_not_null check
( (col1 is not null and col2 is null and col3 is null)
or (col1 is null and col2 is not null and col3 is null)
or (col1 is null and col2 is null and col3 is not null)
)
)
Here's a solution using the built-in array functions:
ALTER TABLE your_table
ADD chk_only_one_is_not_null CHECK (array_length(array_remove(ARRAY[col1, col2, col3], NULL), 1) = 1);
Related
I have two columns in table col1, col2, they both are unique indexed (col1 is unique and so is col2).
I need at insert into this table, use ON CONFLICT syntax and update other columns, but I can't use both column in conflict_targetclause.
It works:
INSERT INTO table
...
ON CONFLICT ( col1 )
DO UPDATE
SET
-- update needed columns here
But how to do this for several columns, something like this:
...
ON CONFLICT ( col1, col2 )
DO UPDATE
SET
....
ON CONFLICT requires a unique index* to do the conflict detection. So you just need to create a unique index on both columns:
t=# create table t (id integer, a text, b text);
CREATE TABLE
t=# create unique index idx_t_id_a on t (id, a);
CREATE INDEX
t=# insert into t values (1, 'a', 'foo');
INSERT 0 1
t=# insert into t values (1, 'a', 'bar') on conflict (id, a) do update set b = 'bar';
INSERT 0 1
t=# select * from t;
id | a | b
----+---+-----
1 | a | bar
* In addition to unique indexes, you can also use exclusion constraints. These are a bit more general than unique constraints. Suppose your table had columns for id and valid_time (and valid_time is a tsrange), and you wanted to allow duplicate ids, but not for overlapping time periods. A unique constraint won't help you, but with an exclusion constraint you can say "exclude new records if their id equals an old id and also their valid_time overlaps its valid_time."
A sample table and data
CREATE TABLE dupes(col1 int primary key, col2 int, col3 text,
CONSTRAINT col2_unique UNIQUE (col2)
);
INSERT INTO dupes values(1,1,'a'),(2,2,'b');
Reproducing the problem
INSERT INTO dupes values(3,2,'c')
ON CONFLICT (col1) DO UPDATE SET col3 = 'c', col2 = 2
Let's call this Q1. The result is
ERROR: duplicate key value violates unique constraint "col2_unique"
DETAIL: Key (col2)=(2) already exists.
What the documentation says
conflict_target can perform unique index inference. When performing
inference, it consists of one or more index_column_name columns and/or
index_expression expressions, and an optional index_predicate. All
table_name unique indexes that, without regard to order, contain
exactly the conflict_target-specified columns/expressions are inferred
(chosen) as arbiter indexes. If an index_predicate is specified, it
must, as a further requirement for inference, satisfy arbiter indexes.
This gives the impression that the following query should work, but it does not because it would actually require a together unique index on col1 and col2. However such an index would not guarantee that col1 and col2 would be unique individually which is one of the OP's requirements.
INSERT INTO dupes values(3,2,'c')
ON CONFLICT (col1,col2) DO UPDATE SET col3 = 'c', col2 = 2
Let's call this query Q2 (this fails with a syntax error)
Why?
Postgresql behaves this way is because what should happen when a conflict occurs on the second column is not well defined. There are number of possibilities. For example in the above Q1 query, should postgresql update col1 when there is a conflict on col2? But what if that leads to another conflict on col1? how is postgresql expected to handle that?
A solution
A solution is to combine ON CONFLICT with old fashioned UPSERT.
CREATE OR REPLACE FUNCTION merge_db(key1 INT, key2 INT, data TEXT) RETURNS VOID AS
$$
BEGIN
LOOP
-- first try to update the key
UPDATE dupes SET col3 = data WHERE col1 = key1 and col2 = key2;
IF found THEN
RETURN;
END IF;
-- not there, so try to insert the key
-- if someone else inserts the same key concurrently, or key2
-- already exists in col2,
-- we could get a unique-key failure
BEGIN
INSERT INTO dupes VALUES (key1, key2, data) ON CONFLICT (col1) DO UPDATE SET col3 = data;
RETURN;
EXCEPTION WHEN unique_violation THEN
BEGIN
INSERT INTO dupes VALUES (key1, key2, data) ON CONFLICT (col2) DO UPDATE SET col3 = data;
RETURN;
EXCEPTION WHEN unique_violation THEN
-- Do nothing, and loop to try the UPDATE again.
END;
END;
END LOOP;
END;
$$
LANGUAGE plpgsql;
You would need to modify the logic of this stored function so that it updates the columns exactly the way you want it to. Invoke it like
SELECT merge_db(3,2,'c');
SELECT merge_db(1,2,'d');
In nowadays is (seems) impossible. Neither the last version of the ON CONFLICT syntax permits to repeat the clause, nor with CTE is possible: not is possible to breack the INSERT from ON CONFLICT to add more conflict-targets.
If you are using postgres 9.5, you can use the EXCLUDED space.
Example taken from What's new in PostgreSQL 9.5:
INSERT INTO user_logins (username, logins)
VALUES ('Naomi',1),('James',1)
ON CONFLICT (username)
DO UPDATE SET logins = user_logins.logins + EXCLUDED.logins;
Vlad got the right idea.
First you have to create a table unique constraint on the columns col1, col2 Then once you do that you can do the following:
INSERT INTO dupes values(3,2,'c')
ON CONFLICT ON CONSTRAINT dupes_pkey
DO UPDATE SET col3 = 'c', col2 = 2
ON CONFLICT ( col1, col2 )
DO UPDATE
SET
works fine. but you should not update col1, col2 in the SET section.
Create a constraint (foreign index, for example).
OR/AND
Look at existing constraints (\d in psq).
Use ON CONSTRAINT(constraint_name) in the INSERT clause.
You can typically (I would think) generate a statement with only one on conflict that specifies the one and only constraint that is of relevance, for the thing you are inserting.
Because typically, only one constraint is the "relevant" one, at a time. (If many, then I'm wondering if something is weird / oddly-designed, hmm.)
Example:
(License: Not CC0, only CC-By)
// there're these unique constraints:
// unique (site_id, people_id, page_id)
// unique (site_id, people_id, pages_in_whole_site)
// unique (site_id, people_id, pages_in_category_id)
// and only *one* of page-id, category-id, whole-site-true/false
// can be specified. So only one constraint is "active", at a time.
val thingColumnName = thingColumnName(notfificationPreference)
val insertStatement = s"""
insert into page_notf_prefs (
site_id,
people_id,
notf_level,
page_id,
pages_in_whole_site,
pages_in_category_id)
values (?, ?, ?, ?, ?, ?)
-- There can be only one on-conflict clause.
on conflict (site_id, people_id, $thingColumnName) <—— look
do update set
notf_level = excluded.notf_level
"""
val values = List(
siteId.asAnyRef,
notfPref.peopleId.asAnyRef,
notfPref.notfLevel.toInt.asAnyRef,
// Only one of these is non-null:
notfPref.pageId.orNullVarchar,
if (notfPref.wholeSite) true.asAnyRef else NullBoolean,
notfPref.pagesInCategoryId.orNullInt)
runUpdateSingleRow(insertStatement, values)
And:
private def thingColumnName(notfPref: PageNotfPref): String =
if (notfPref.pageId.isDefined)
"page_id"
else if (notfPref.pagesInCategoryId.isDefined)
"pages_in_category_id"
else if (notfPref.wholeSite)
"pages_in_whole_site"
else
die("TyE2ABK057")
The on conflict clause is dynamically generated, depending on what I'm trying to do. If I'm inserting a notification preference, for a page — then there can be a unique conflict, on the site_id, people_id, page_id constraint. And if I'm configuring notification prefs, for a category — then instead I know that the constraint that can get violated, is site_id, people_id, category_id.
So I can, and fairly likely you too, in your case?, generate the correct on conflict (... columns ), because I know what I want to do, and then I know which single one of the many unique constraints, is the one that can get violated.
Kind of hacky but I solved this by concatenating the two values from col1 and col2 into a new column, col3 (kind of like an index of the two) and compared against that. This only works if you need it to match BOTH col1 and col2.
INSERT INTO table
...
ON CONFLICT ( col3 )
DO UPDATE
SET
-- update needed columns here
Where col3 = the concatenation of the values from col1 and col2.
I get I am late to the party but for the people looking for answers I found this:
here
INSERT INTO tbl_Employee
VALUES (6,'Noor')
ON CONFLICT (EmpID,EmpName)
DO NOTHING;
ON CONFLICT is very clumsy solution, run
UPDATE dupes SET key1=$1, key2=$2 where key3=$3
if rowcount > 0
INSERT dupes (key1, key2, key3) values ($1,$2,$3);
works on Oracle, Postgres and all other database
I have two columns in table col1, col2, they both are unique indexed (col1 is unique and so is col2).
I need at insert into this table, use ON CONFLICT syntax and update other columns, but I can't use both column in conflict_targetclause.
It works:
INSERT INTO table
...
ON CONFLICT ( col1 )
DO UPDATE
SET
-- update needed columns here
But how to do this for several columns, something like this:
...
ON CONFLICT ( col1, col2 )
DO UPDATE
SET
....
ON CONFLICT requires a unique index* to do the conflict detection. So you just need to create a unique index on both columns:
t=# create table t (id integer, a text, b text);
CREATE TABLE
t=# create unique index idx_t_id_a on t (id, a);
CREATE INDEX
t=# insert into t values (1, 'a', 'foo');
INSERT 0 1
t=# insert into t values (1, 'a', 'bar') on conflict (id, a) do update set b = 'bar';
INSERT 0 1
t=# select * from t;
id | a | b
----+---+-----
1 | a | bar
* In addition to unique indexes, you can also use exclusion constraints. These are a bit more general than unique constraints. Suppose your table had columns for id and valid_time (and valid_time is a tsrange), and you wanted to allow duplicate ids, but not for overlapping time periods. A unique constraint won't help you, but with an exclusion constraint you can say "exclude new records if their id equals an old id and also their valid_time overlaps its valid_time."
A sample table and data
CREATE TABLE dupes(col1 int primary key, col2 int, col3 text,
CONSTRAINT col2_unique UNIQUE (col2)
);
INSERT INTO dupes values(1,1,'a'),(2,2,'b');
Reproducing the problem
INSERT INTO dupes values(3,2,'c')
ON CONFLICT (col1) DO UPDATE SET col3 = 'c', col2 = 2
Let's call this Q1. The result is
ERROR: duplicate key value violates unique constraint "col2_unique"
DETAIL: Key (col2)=(2) already exists.
What the documentation says
conflict_target can perform unique index inference. When performing
inference, it consists of one or more index_column_name columns and/or
index_expression expressions, and an optional index_predicate. All
table_name unique indexes that, without regard to order, contain
exactly the conflict_target-specified columns/expressions are inferred
(chosen) as arbiter indexes. If an index_predicate is specified, it
must, as a further requirement for inference, satisfy arbiter indexes.
This gives the impression that the following query should work, but it does not because it would actually require a together unique index on col1 and col2. However such an index would not guarantee that col1 and col2 would be unique individually which is one of the OP's requirements.
INSERT INTO dupes values(3,2,'c')
ON CONFLICT (col1,col2) DO UPDATE SET col3 = 'c', col2 = 2
Let's call this query Q2 (this fails with a syntax error)
Why?
Postgresql behaves this way is because what should happen when a conflict occurs on the second column is not well defined. There are number of possibilities. For example in the above Q1 query, should postgresql update col1 when there is a conflict on col2? But what if that leads to another conflict on col1? how is postgresql expected to handle that?
A solution
A solution is to combine ON CONFLICT with old fashioned UPSERT.
CREATE OR REPLACE FUNCTION merge_db(key1 INT, key2 INT, data TEXT) RETURNS VOID AS
$$
BEGIN
LOOP
-- first try to update the key
UPDATE dupes SET col3 = data WHERE col1 = key1 and col2 = key2;
IF found THEN
RETURN;
END IF;
-- not there, so try to insert the key
-- if someone else inserts the same key concurrently, or key2
-- already exists in col2,
-- we could get a unique-key failure
BEGIN
INSERT INTO dupes VALUES (key1, key2, data) ON CONFLICT (col1) DO UPDATE SET col3 = data;
RETURN;
EXCEPTION WHEN unique_violation THEN
BEGIN
INSERT INTO dupes VALUES (key1, key2, data) ON CONFLICT (col2) DO UPDATE SET col3 = data;
RETURN;
EXCEPTION WHEN unique_violation THEN
-- Do nothing, and loop to try the UPDATE again.
END;
END;
END LOOP;
END;
$$
LANGUAGE plpgsql;
You would need to modify the logic of this stored function so that it updates the columns exactly the way you want it to. Invoke it like
SELECT merge_db(3,2,'c');
SELECT merge_db(1,2,'d');
In nowadays is (seems) impossible. Neither the last version of the ON CONFLICT syntax permits to repeat the clause, nor with CTE is possible: not is possible to breack the INSERT from ON CONFLICT to add more conflict-targets.
If you are using postgres 9.5, you can use the EXCLUDED space.
Example taken from What's new in PostgreSQL 9.5:
INSERT INTO user_logins (username, logins)
VALUES ('Naomi',1),('James',1)
ON CONFLICT (username)
DO UPDATE SET logins = user_logins.logins + EXCLUDED.logins;
Vlad got the right idea.
First you have to create a table unique constraint on the columns col1, col2 Then once you do that you can do the following:
INSERT INTO dupes values(3,2,'c')
ON CONFLICT ON CONSTRAINT dupes_pkey
DO UPDATE SET col3 = 'c', col2 = 2
ON CONFLICT ( col1, col2 )
DO UPDATE
SET
works fine. but you should not update col1, col2 in the SET section.
Create a constraint (foreign index, for example).
OR/AND
Look at existing constraints (\d in psq).
Use ON CONSTRAINT(constraint_name) in the INSERT clause.
You can typically (I would think) generate a statement with only one on conflict that specifies the one and only constraint that is of relevance, for the thing you are inserting.
Because typically, only one constraint is the "relevant" one, at a time. (If many, then I'm wondering if something is weird / oddly-designed, hmm.)
Example:
(License: Not CC0, only CC-By)
// there're these unique constraints:
// unique (site_id, people_id, page_id)
// unique (site_id, people_id, pages_in_whole_site)
// unique (site_id, people_id, pages_in_category_id)
// and only *one* of page-id, category-id, whole-site-true/false
// can be specified. So only one constraint is "active", at a time.
val thingColumnName = thingColumnName(notfificationPreference)
val insertStatement = s"""
insert into page_notf_prefs (
site_id,
people_id,
notf_level,
page_id,
pages_in_whole_site,
pages_in_category_id)
values (?, ?, ?, ?, ?, ?)
-- There can be only one on-conflict clause.
on conflict (site_id, people_id, $thingColumnName) <—— look
do update set
notf_level = excluded.notf_level
"""
val values = List(
siteId.asAnyRef,
notfPref.peopleId.asAnyRef,
notfPref.notfLevel.toInt.asAnyRef,
// Only one of these is non-null:
notfPref.pageId.orNullVarchar,
if (notfPref.wholeSite) true.asAnyRef else NullBoolean,
notfPref.pagesInCategoryId.orNullInt)
runUpdateSingleRow(insertStatement, values)
And:
private def thingColumnName(notfPref: PageNotfPref): String =
if (notfPref.pageId.isDefined)
"page_id"
else if (notfPref.pagesInCategoryId.isDefined)
"pages_in_category_id"
else if (notfPref.wholeSite)
"pages_in_whole_site"
else
die("TyE2ABK057")
The on conflict clause is dynamically generated, depending on what I'm trying to do. If I'm inserting a notification preference, for a page — then there can be a unique conflict, on the site_id, people_id, page_id constraint. And if I'm configuring notification prefs, for a category — then instead I know that the constraint that can get violated, is site_id, people_id, category_id.
So I can, and fairly likely you too, in your case?, generate the correct on conflict (... columns ), because I know what I want to do, and then I know which single one of the many unique constraints, is the one that can get violated.
Kind of hacky but I solved this by concatenating the two values from col1 and col2 into a new column, col3 (kind of like an index of the two) and compared against that. This only works if you need it to match BOTH col1 and col2.
INSERT INTO table
...
ON CONFLICT ( col3 )
DO UPDATE
SET
-- update needed columns here
Where col3 = the concatenation of the values from col1 and col2.
I get I am late to the party but for the people looking for answers I found this:
here
INSERT INTO tbl_Employee
VALUES (6,'Noor')
ON CONFLICT (EmpID,EmpName)
DO NOTHING;
ON CONFLICT is very clumsy solution, run
UPDATE dupes SET key1=$1, key2=$2 where key3=$3
if rowcount > 0
INSERT dupes (key1, key2, key3) values ($1,$2,$3);
works on Oracle, Postgres and all other database
The end result of what I am after is a query that calls a function and that function returns a set of records that are in their own separate fields. I can do this but the results of the function are all in one field.
ie: http://i.stack.imgur.com/ETLCL.png and the results I am after are: http://i.stack.imgur.com/wqRQ9.png
Here's the code to create the table
CREATE TABLE tbl_1_hm
(
tbl_1_hm_id bigserial NOT NULL,
tbl_1_hm_f1 VARCHAR (250),
tbl_1_hm_f2 INTEGER,
CONSTRAINT tbl_1_hm PRIMARY KEY (tbl_1_hm_id)
)
-- do that for a few times to get some data
INSERT INTO tbl_1_hm (tbl_1_hm_f1, tbl_1_hm_f2)
VALUES ('hello', 1);
CREATE OR REPLACE FUNCTION proc_1_hm(id BIGINT)
RETURNS TABLE(tbl_1_hm_f1 VARCHAR (250), tbl_1_hm_f2 int AS $$
SELECT tbl_1_hm_f1, tbl_1_hm_f2
FROM tbl_1_hm
WHERE tbl_1_hm_id = id
$$ LANGUAGE SQL;
--And here is the current query I am running for my results:
SELECT t1.tbl_1_hm_id, proc_1_hm(t1.tbl_1_hm_id) AS t3
FROM tbl_1_hm AS t1
Thanks for having a read. Please if you want to haggle about the semantics of what I am doing by hitting the same table twice or my naming convention --> this is a simplified test.
When a function returns a set of records, you should treat it as a table source:
SELECT t1.tbl_1_hm_id, t3.*
FROM tbl_1_hm AS t1, proc_1_hm(t1.tbl_1_hm_id) AS t3;
Note that functions are implicitly using a LATERAL join (scroll down to sub-sections 4 and 5) so you can use fields from tables listed previously without having to specify an explicit JOIN condition.
How do I join 2 sets of records solely based on the default order?
So if I have a table x(col(1,2,3,4,5,6,7)) and another table z(col(a,b,c,d,e,f,g))
it will return
c1 c2
-- --
1 a
2 b
3 c
4 d
5 e
6 f
7 g
Actually, I wanted to join a pair of one dimensional arrays from parameters and treat them like columns from a table.
Sample code:
CREATE OR REPLACE FUNCTION "Test"(timestamp without time zone[],
timestamp without time zone[])
RETURNS refcursor AS
$BODY$
DECLARE
curr refcursor;
BEGIN
OPEN curr FOR
SELECT DISTINCT "Start" AS x, "End" AS y, COUNT("A"."id")
FROM UNNEST($1) "Start"
INNER JOIN
(
SELECT "End", ROW_NUMBER() OVER(ORDER BY ("End")) rn
FROM UNNEST($2) "End" ORDER BY ("End")
) "End" ON ROW_NUMBER() OVER(ORDER BY ("Start")) = "End".rn
LEFT JOIN "A" ON ("A"."date" BETWEEN x AND y)
GROUP BY 1,2
ORDER BY "Start";
return curr;
END
$BODY$
Now, to answer the real question that was revealed in comments, which appears to be something like:
Given two arrays 'a' and 'b', how do I pair up their elements so I can get the element pairs as column aliases in a query?
There are a couple of ways to tackle this:
If and only if the arrays are of equal length, use multiple unnest functions in the SELECT clause (a deprecated approach that should only be used for backward compatibility);
Use generate_subscripts to loop over the arrays;
Use generate_series over subqueries against array_lower and array_upper to emulate generate_subscripts if you need to support versions too old to have generate_subscripts;
Relying on the order that unnest returns tuples in and hoping - like in my other answer and as shown below. It'll work, but it's not guaranteed to work in future versions.
Use the WITH ORDINALITY functionality added in PostgreSQL 9.4 (see also its first posting) to get a row number for unnest when 9.4 comes out.
Use multiple-array UNNEST, which is SQL-standard but which PostgreSQL doesn't support yet.
So, say we have function arraypair with array parameters a and b:
CREATE OR REPLACE FUNCTION arraypair (a integer[], b text[])
RETURNS TABLE (col_a integer, col_b text) AS $$
-- blah code here blah
$$ LANGUAGE whatever IMMUTABLE;
and it's invoked as:
SELECT * FROM arraypair( ARRAY[1,2,3,4,5,6,7], ARRAY['a','b','c','d','e','f','g'] );
possible function definitions would be:
SRF-in-SELECT (deprecated)
CREATE OR REPLACE FUNCTION arraypair (a integer[], b text[])
RETURNS TABLE (col_a integer, col_b text) AS $$
SELECT unnest(a), unnest(b);
$$ LANGUAGE sql IMMUTABLE;
Will produce bizarre and unexpected results if the arrays aren't equal in length; see the documentation on set returning functions and their non-standard use in the SELECT list to learn why, and what exactly happens.
generate_subscripts
This is likely the safest option:
CREATE OR REPLACE FUNCTION arraypair (a integer[], b text[])
RETURNS TABLE (col_a integer, col_b text) AS $$
SELECT
a[i], b[i]
FROM generate_subscripts(CASE WHEN array_length(a,1) >= array_length(b,1) THEN a::text[] ELSE b::text[] END, 1) i;
$$ LANGUAGE sql IMMUTABLE;
If the arrays are of unequal length, as written it'll return null elements for the shorter, so it works like a full outer join. Reverse the sense of the case to get an inner-join like effect. The function assumes the arrays are one-dimensional and that they start at index 1. If an entire array argument is NULL then the function returns NULL.
A more generalized version would be written in PL/PgSQL and would check array_ndims(a) = 1, check array_lower(a, 1) = 1, test for null arrays, etc. I'll leave that to you.
Hoping for pair-wise returns:
This isn't guaranteed to work, but does with PostgreSQL's current query executor:
CREATE OR REPLACE FUNCTION arraypair (a integer[], b text[])
RETURNS TABLE (col_a integer, col_b text) AS $$
WITH
rn_c1(rn, col) AS (
SELECT row_number() OVER (), c1.col
FROM unnest(a) c1(col)
),
rn_c2(rn, col) AS (
SELECT row_number() OVER (), c2.col
FROM unnest(b) c2(col)
)
SELECT
rn_c1.col AS c1,
rn_c2.col AS c2
FROM rn_c1
INNER JOIN rn_c2 ON (rn_c1.rn = rn_c2.rn);
$$ LANGUAGE sql IMMUTABLE;
I would consider using generate_subscripts much safer.
Multi-argument unnest:
This should work, but doesn't because PostgreSQL's unnest doesn't accept multiple input arrays (yet):
SELECT * FROM unnest(a,b);
select x.c1, z.c2
from
x
inner join
(
select
c2,
row_number() over(order by c2) rn
from z
order by c2
) z on x.c1 = z.rn
order by x.c1
If x.c1 is not 1,2,3... you can do the same that was done with z
The middle order by is not necessary as pointed by Erwin. I tested it like this:
create table t (i integer);
insert into t
select ceil(random() * 100000)
from generate_series(1, 100000);
select
i,
row_number() over(order by i) rn
from t
;
And i comes out ordered. Before this simple test which I never executed I though it would be possible that the rows would be numbered in any order.
By "default order" it sounds like you probably mean the order in which the rows are returned by select * from tablename without an ORDER BY.
If so, this ordering is undefined. The database can return rows in any order that it feels like. You'll find that if you UPDATE a row, it probably moves to a different position in the table.
If you're stuck in a situation where you assumed tables had an order and they don't, you can as a recovery option add a row number based on the on-disk ordering of the tuples within the table:
select row_number() OVER (), *
from the_table
order by ctid
If the output looks right, I recommend that you CREATE TABLE a new table with an extra field, then do an INSERT INTO ... SELECT to insert the data ordered by ctid, then ALTER TABLE ... RENAME the tables and finally fix any foreign key references so they point to the new table.
ctid can be changed by autovacuum, UPDATE, CLUSTER, etc, so it is not something you should ever be using in applications. I'm using it here only because it sounds like you don't have any real ordering or identifier key.
If you need to pair up rows based on their on-disk ordering (an unreliable and unsafe thing to do as noted above), you could per this SQLFiddle try:
WITH
rn_c1(rn, col) AS (
SELECT row_number() OVER (ORDER BY ctid), c1.col
FROM c1
),
rn_c2(rn, col) AS (
SELECT row_number() OVER (ORDER BY ctid), c2.col
FROM c2
)
SELECT
rn_c1.col AS c1,
rn_c2.col AS c2
FROM rn_c1
INNER JOIN rn_c2 ON (rn_c1.rn = rn_c2.rn);
but never rely on this in a production app. If you're really stuck you can use this with CREATE TABLE AS to construct a new table that you can start with when you're working on recovering data from a DB that lacks a required key, but that's about it.
The same approach given above might work with an empty window clause () instead of (ORDER BY ctid) when using sets that lack a ctid, like interim results from functions. It's even less safe then though, and should be a matter of last resort only.
(See also this newer related answer: https://stackoverflow.com/a/17762282/398670)
I have a POstgreSQL 8.4.
I have a table and i want to find a string in one row (character varying datatype) of this table using substring (character varying datatype) returned by subquery:
SELECT uchastki.kadnum
FROM uchastki
WHERE kadnum LIKE (
SELECT str
FROM test
WHERE str IS NOT NULL)
But get a error
ERROR: more than one row returned by a subquery used as an expression
In field test.str i have strings like 66:07:21 01 001 in uchastki.kadnum 66:07:21 01 001:27.
How to find substring using results of subquery?
UPDATE
Table test:
CREATE TABLE test
(
id serial NOT NULL,
str character varying(255)
)
WITH (
OIDS=FALSE
);
ALTER TABLE test OWNER TO postgres;
Table uchastki:
CREATE TABLE uchastki
(
fid serial NOT NULL,
the_geom geometry,
id_uch integer,
num_opora character varying,
kod_lep integer,
kadnum character varying,
sq real,
kod_type_opora character varying,
num_f11s integer,
num_opisanie character varying,
CONSTRAINT uchastki_pkey PRIMARY KEY (fid),
CONSTRAINT enforce_dims_the_geom CHECK (st_ndims(the_geom) = 2)
)
WITH (
OIDS=FALSE
);
ALTER TABLE uchastki OWNER TO postgres;
Use like any :
SELECT uchastki.kadnum
FROM uchastki
WHERE kadnum LIKE ANY(
SELECT str
FROM test
WHERE str IS NOT NULL)
Or perhaps:
SELECT uchastki.kadnum
FROM uchastki
WHERE kadnum LIKE ANY(
SELECT '%' || str || '%'
FROM test
WHERE str IS NOT NULL)
this is a nice feature, You can use different operators, for example = any (select ... ), or <> all (select...).
I'm going to take a wild stab in the dark and assume you mean that you want to match a string Sa from table A against one or more other strings S1 .. Sn from table B to find out if any of the other strings in S1 .. Sn is a substring of Sa.
A simple example to show what I mean (hint, hint):
Given:
CREATE TABLE tableA (string_a text);
INSERT INTO tableA(string_a) VALUES
('the manual is great'), ('Chicken chicken chicken'), ('bork');
CREATE TABLE tableB(candidate_str text);
INSERT INTO tableB(candidate_str) VALUES
('man'),('great'),('chicken');
I want the result set:
the manual is great
chicken chicken chicken
because the manual is great has man and great in it; and because chicken chicken chicken has chicken in it. There is no need to show the substring(s) that matched. bork doesn't match any substring so it is not found.
Here's a SQLFiddle with the sample data.
If so, shamelessly stealing #maniek's excellent suggestion, you would use:
SELECT string_a
FROM tableA
WHERE string_a LIKE ANY (SELECT '%'||candidate_str||'%' FROM tableB);
(Vote for #maniek please, I'm just illustrating how to clearly explain - I hope - what you want to achieve, sample data, etc).
(Note: This answer was written before further discussion clarified the poster's actual intentions)
It would appear highly likely that there is more than one str in test where str IS NOT NULL. That's why more than one row is returned by the subquery used as an expression, and, thus, why the statement fails.
Run the subquery stand-alone to see what it returns and you'll see. Perhaps you intended it to be a correlated subquery but forgot the outer column-reference? Or perhaps there's a column also called str in the outer table and you meant to write:
SELECT uchastki.kadnum
FROM uchastki
WHERE kadnum LIKE (
SELECT test.str
FROM test
WHERE uchastki.str IS NOT NULL)
?
(Hint: Consistently using table aliases on column references helps to avoid name-clash confusion).