I would like to select rows in a table in which a certain number of items in an array column meet a comparison condition (>= n). Is this possible without using unnest?
unnest() is a natural way to count filtered elements in an array.
However, you can hide this in an SQL function like this:
create or replace function number_of_elements(arr int[], val int)
returns bigint language sql
as $$
select count(*)
from unnest(arr) e
where e > val;
$$;
with test(id, arr) as (
values
(1, array[1,2,3,4]),
(2, array[3,4,5,6]))
select id, arr, number_of_elements(arr, 3)
from test;
id | arr | number_of_elements
----+-----------+--------------------
1 | {1,2,3,4} | 1
2 | {3,4,5,6} | 3
(2 rows)
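To select rows rather than just display the count, the same function can be used in the WHERE clause. This is a minimal sketch that assumes the number_of_elements function defined above; the threshold of 2 is only an illustrative value for n.

```sql
-- Assumes number_of_elements(arr, val) from the answer above.
with test(id, arr) as (
    values
        (1, array[1,2,3,4]),
        (2, array[3,4,5,6]))
select id, arr
from test
where number_of_elements(arr, 3) >= 2;  -- keep rows with at least 2 elements > 3
```

With the sample data, only the row with id 2 should qualify, since it has three elements greater than 3.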
I have a set of records in a table, and some of them have an invalid date. I want to ignore those invalid records and run a check against the rest. I wrote the query below, but it does not work.
select * from tbl_name i
where is_date(i.dob) and i.dob::date > CURRENT_DATE;
I learned that SQL doesn't short-circuit, so the query also evaluates the invalid records and ends up failing with a date/time out of range error. Please help me alter this query so that I can eliminate the invalid dates and run the date comparison on valid dates only.
There is no guarantee of short-circuiting in Postgres, neither in a "plain" WHERE clause nor when using a derived table (from (select ...) where ...). One way to force the evaluation in two steps would be a materialized common table expression:
with data as materialized (
select *
from tbl_name i
where is_date(i.dob)
)
select *
from data
where dob::date > CURRENT_DATE;
The materialized keyword (available since Postgres 12) prevents the optimizer from pushing the condition of the outer query into the CTE.
Obviously this assumes that is_date() will never return false positives.
Use CASE in the WHERE clause to distinguish a valid date from an invalid one: run the > comparison for a valid date, otherwise return FALSE.
create or replace function is_date(s varchar) returns boolean as $$
begin
if s is null then
return false;
end if;
perform s::date;
return true;
exception when others then
return false;
end;
$$ language plpgsql;
create table date_str (id integer, dt_str varchar);
insert into date_str values (1, '2022-11-02'), (2, '1234'), (3, '2022-12-03');
insert into date_str values (4, 'xyz'), (5, '2022-01-01'), (6, '2023-02-02');
select * from date_str;
id | dt_str
----+------------
1 | 2022-11-02
2 | 1234
3 | 2022-12-03
4 | xyz
5 | 2022-01-01
6 | 2023-02-02
select current_date;
current_date
--------------
11/02/2022
SELECT
*
FROM
date_str
WHERE
CASE WHEN is_date (dt_str) = 't' THEN
dt_str::date > CURRENT_DATE
ELSE
FALSE
END;
id | dt_str
----+------------
3 | 2022-12-03
6 | 2023-02-02
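On PostgreSQL 16 or later, the built-in pg_input_is_valid function could stand in for the hand-written is_date helper. The following is a sketch under that version assumption, reusing the date_str table from the example above; it has not been checked against the sample output.

```sql
-- Assumes PostgreSQL 16+ and the date_str table from the example above.
-- pg_input_is_valid(text, type_name) reports whether the text is valid input
-- for the given type without raising an error.
SELECT *
FROM date_str
WHERE CASE WHEN pg_input_is_valid(dt_str, 'date')
           THEN dt_str::date > CURRENT_DATE
           ELSE FALSE
      END;
```

The CASE expression is kept so that the cast is only attempted for strings that passed the validity test.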
I have a column that has two types of values A or B. I need to find the count of values that come after the first occurrence of A.
eg
column
B
B
B
A
B
A
B
The result in this case would be 4, as there are 4 entries after the first occurrence of A (including that A).
You can use a subquery to get the id (or the value of whatever column you are using to order the rows) of the first 'A'.
CREATE TABLE t (
id serial,
col char(1));
insert into t (col) values
('B'),
('B'),
('A'),
('B'),
('A'),
('B');
select
count(*) NUM
from t
where id >=
(select MIN(id)
from t
where col = 'A');
| num |
| --: |
| 4 |
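The same count can also be obtained in a single pass with a window aggregate. This is a sketch that assumes the table t created above, with id defining the row order; the alias seen_a is only an illustrative name.

```sql
-- Assumes the table t(id, col) from the answer above.
select count(*) as num
from (
    -- seen_a becomes true at the first 'A' and stays true for all later rows
    select bool_or(col = 'A') over (order by id) as seen_a
    from t
) s
where seen_a;
```

For the six sample rows this should again return 4, because the running bool_or is false for the first two rows and true from the first 'A' onwards.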
I need a function to insert rows because the default value of one column (serialno) should be the same as the PK id.
I have defined table:
CREATE SEQUENCE some_table_id_seq
INCREMENT 1
START 1
MINVALUE 1
MAXVALUE 9223372036854775807
CACHE 1;
CREATE TABLE some_table
(
id bigint NOT NULL DEFAULT nextval('some_table_id_seq'::regclass),
itemid integer NOT NULL,
serialno bigint,
CONSTRAINT stockitem_pkey PRIMARY KEY (id),
CONSTRAINT stockitem_serialno_key UNIQUE (serialno)
);
and a function to insert a given number of rows:
CREATE OR REPLACE FUNCTION insert_item(itemid int, count int DEFAULT 1) RETURNS SETOF bigint AS
$func$
DECLARE
ids bigint[] DEFAULT '{}';
id bigint;
BEGIN
FOR counter IN 1..count LOOP
id := NEXTVAL( 'some_table_id_seq' );
INSERT INTO some_table (id, itemid, serialno) VALUES (id, itemid, id);
ids := array_append(ids, id);
END LOOP;
RETURN QUERY SELECT unnest(ids);
END
$func$
LANGUAGE plpgsql;
And inserting with it works fine:
$ select insert_item(123, 10);
insert_item
-------------
1
2
3
4
5
6
7
8
9
10
(10 rows)
$ select * from some_table;
id | itemid | serialno
----+--------+----------
1 | 123 | 1
2 | 123 | 2
3 | 123 | 3
4 | 123 | 4
5 | 123 | 5
6 | 123 | 6
7 | 123 | 7
8 | 123 | 8
9 | 123 | 9
10 | 123 | 10
(10 rows)
But if I want to use function insert_item as subquery, it seems not to work anymore:
$ select id, itemid from some_table where id in (select insert_item(123, 10));
id | itemid
----+--------
(0 rows)
I created a dummy function, insert_dumb, to test in a subquery:
CREATE OR REPLACE FUNCTION insert_dumb(itemid int, count int DEFAULT 1) RETURNS SETOF bigint AS
$func$
DECLARE
ids bigint[] DEFAULT '{}';
BEGIN
FOR counter IN 1..count LOOP
ids := array_append(ids, counter::bigint);
END LOOP;
RETURN QUERY SELECT unnest(ids);
END
$func$
LANGUAGE plpgsql;
and this works in a subquery as expected:
$ select id, itemid from some_table where id in (select insert_dumb(123, 10));
id | itemid
----+--------
1 | 123
2 | 123
3 | 123
4 | 123
5 | 123
6 | 123
7 | 123
8 | 123
9 | 123
10 | 123
(10 rows)
Why does the insert_item function not insert new rows when called in a subquery? I tried adding a raise notice to the loop, and it runs as expected, reporting a new id every time (and advancing the sequence), but no new rows are added to the table.
I made all the setup available as fiddle
I am using Postgres 11 on Ubuntu.
EDIT
Of course, I left out my real reason, and it shows...
I need the insert_item function to return the ids so that I can use it in an update statement, like:
update some_table set some_text = 'x' where id in (select insert_item(123, 10));
And an addition to the why-question: it is understandable that I get no ids in return (because they share the same snapshot), but the function runs all the needed INSERTs without affecting the table. Shouldn't those rows be available in the next query?
The problem is that the subquery and the surrounding query share the same snapshot, that is, they see the same state of the database. Hence the outer query cannot see the rows inserted by the inner query.
See the documentation (which explains that in the context of WITH, although it also applies here):
The sub-statements in WITH are executed concurrently with each other and with the main query. Therefore, when using data-modifying statements in WITH, the order in which the specified updates actually happen is unpredictable. All the statements are executed with the same snapshot (see Chapter 13), so they cannot “see” one another's effects on the target tables.
In addition, there is a second problem with your approach: if you run EXPLAIN (ANALYZE) on your statement, you will find that the subquery is not executed at all! Since the table is empty, there is no id, and running the subquery is not necessary to calculate the (empty) result.
You will have to run that in two different statements. Or, better, do it in a different fashion: updating a row that you just inserted is unnecessarily wasteful.
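One way to avoid both the second statement and the follow-up UPDATE is a data-modifying CTE with RETURNING, since the outer query can read the rows that RETURNING hands back even though it cannot see them in the table. This is a sketch that assumes the some_table and some_table_id_seq definitions from the question; new_ids and ins are illustrative names.

```sql
-- Sketch: insert ten rows for itemid 123 and return them in one statement.
with new_ids as (
    -- draw each id once so that id and serialno receive the same value
    select nextval('some_table_id_seq') as new_id
    from generate_series(1, 10)
), ins as (
    insert into some_table (id, itemid, serialno)
    select new_id, 123, new_id
    from new_ids
    returning id, itemid
)
select id, itemid
from ins;
```

Any other column values that would have been set by the later UPDATE can be supplied directly in the INSERT's select list instead.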
Laurenz explained the visibility problem, but you don't need the subquery at all if you rewrite your function to return the actual table rows rather than just the IDs:
CREATE OR REPLACE FUNCTION insert_item(itemid int, count int DEFAULT 1)
RETURNS setof some_table
AS
$func$
INSERT INTO some_table (id, itemid, serialno)
select NEXTVAL( 'some_table_id_seq' ), itemid, currval('some_table_id_seq')
from generate_series(1,count)
returning *;
$func$
LANGUAGE sql;
Then you can use it like this:
select id, itemid
from insert_item(123, 10);
And you get the complete inserted rows.
How to add a unique index on text array column.
I have a column in my Postgres table which contains sections.
+----+-----------+
| id | sections |
|----|-----------|
| 1 |['A', 'B'] |
+----+-----------+
| 2 |['A', 'A'] |
+----+-----------+
As you can see for id 2 I can insert two sections with the same text. I do not want to add duplicate text.
I do not want duplicate sections in my column.
Is there a way I can add an index on a text array?
I saw examples for int arrays but can't find anything for text arrays.
I do not want to create a new function. I want to use an existing function in Postgres.
You can append to the sections column and then unnest with distinct to drop duplicate elements, like this:
update class set sections = array(
select distinct unnest(
array_append(
(select sections from class where id = 2), 'A')))
where id = 2;
I like arrays and do not always find it good to normalize tables :-)
-- Returns true when the array contains no duplicate elements.
CREATE OR REPLACE FUNCTION is_unique(a int[]) RETURNS bool AS $f$
SELECT array_upper(a, 1) = array_upper(
(
SELECT array_agg(DISTINCT u)
FROM unnest(a) AS u
), 1);
$f$ LANGUAGE sql;
CREATE TEMP TABLE a (a int[], CHECK (is_unique(a)));
Test it:
# INSERT INTO a VALUES (ARRAY[1]);
INSERT 0 1
# INSERT INTO a VALUES (ARRAY[1, 2]);
INSERT 0 1
# INSERT INTO a VALUES (ARRAY[1, 1]);
ERROR: new row for relation "a" violates check constraint "a_a_check"
DETAIL: Failing row contains ({1,1}).
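Since the question is about a text array, the same idea can be sketched for text[]. This is only an illustration: has_unique_elements and class_demo are hypothetical names, and cardinality() requires PostgreSQL 9.4 or later.

```sql
-- Hypothetical helper: true when no element of the text array repeats.
-- count(distinct ...) ignores NULLs, so an array containing NULL elements
-- is rejected as well; a NULL array yields NULL, which a CHECK accepts.
create or replace function has_unique_elements(a text[]) returns bool as $f$
    select cardinality(a) = (select count(distinct u) from unnest(a) as u);
$f$ language sql immutable;

-- Hypothetical demo table mirroring the one in the question.
create temp table class_demo (
    id int,
    sections text[],
    check (has_unique_elements(sections))
);

insert into class_demo values (1, array['A', 'B']);  -- accepted
insert into class_demo values (2, array['A', 'A']);  -- violates the check constraint
```

A CHECK constraint like this only rejects duplicates within one row's array; it does not enforce uniqueness across rows the way a unique index would.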
Here is what I ideally want. Imagine that I have a table with the column A.
I want to do:
SELECT A, func(A) FROM table
and for the output to have say 4 columns.
Is there any way to do this? I have seen things on custom types or whatever that let you sort of get a result that would look like
A,(B,C,D)
But it would be really great if I could have that one function return multiple columns without any more finagling.
Is there anything that can do something like this?
If the function func returns only 1 row with 3 values, such as:
CREATE OR REPLACE FUNCTION func
(
input_val integer,
OUT output_val1 integer,
OUT output_val2 integer,
OUT output_val3 integer
)
AS $$
BEGIN
output_val1 := input_val + 1;
output_val2 := input_val + 2;
output_val3 := input_val + 3;
END;
$$ LANGUAGE plpgsql;
and you then execute SELECT a, func(a) FROM table1 you'll get:
a | func
integer | record
========|==========
1 | (2, 3, 4)
2 | (3, 4, 5)
3 | (4, 5, 6)
but, if you execute:
SELECT a, (f).output_val1, (f).output_val2, (f).output_val3
FROM (SELECT a, func(a) AS f FROM table1) AS x
you'll get:
a | output_val1 | output_val2 | output_val3
integer | integer | integer | integer
========|=============|=============|=============
1 | 2 | 3 | 4
2 | 3 | 4 | 5
3 | 4 | 5 | 6
or, using CTE (Common Table Expressions), if you execute:
WITH temp AS (SELECT a, func(a) AS f FROM table1)
SELECT a, (f).output_val1, (f).output_val2, (f).output_val3 FROM temp
you'll also get:
a | output_val1 | output_val2 | output_val3
integer | integer | integer | integer
========|=============|=============|=============
1 | 2 | 3 | 4
2 | 3 | 4 | 5
3 | 4 | 5 | 6
Note: you may also use the following queries to obtain the same results:
SELECT a, (f).*
FROM (SELECT a, func(a) AS f FROM table1) AS x
or
WITH temp AS (SELECT a, func(a) AS f FROM table1)
SELECT a, (f).* FROM temp
I agree with bambam's answer but would like to point out that JackPDouglas's more succinct syntax SELECT a, (func(a)).* FROM table1, from my tests, would actually execute the function once for each column returned whereas the CTE expression will only execute the function once. So the CTE expression is preferred if the function takes a long time to execute.
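On PostgreSQL 9.3 or later, a LATERAL join is another way to expand the result into columns while calling the function once per input row. This is a sketch that assumes table1(a integer) and the func definition with OUT parameters shown above; the aliases t and f are illustrative.

```sql
-- Assumes table1(a integer) and func(integer) with three OUT parameters.
SELECT t.a, f.output_val1, f.output_val2, f.output_val3
FROM table1 AS t
CROSS JOIN LATERAL func(t.a) AS f;  -- func is evaluated once for each row of table1
```

Because the function call sits in the FROM clause, its output columns can be referenced directly without the (f).column syntax.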
If the function always returns 3 columns, you can do something like this:
CREATE TYPE sometype AS (b INT, c TEXT, d TEXT);
CREATE OR REPLACE FUNCTION func(a TEXT) RETURNS SETOF sometype AS $$
BEGIN
RETURN QUERY EXECUTE format('SELECT b, c, d FROM %I', a); -- %I quotes the table name as an identifier
END;
$$ LANGUAGE plpgsql;
SELECT a, (f).b, (f).c, (f).d
FROM (SELECT a, func(a) AS f FROM table) x;
If you can access the table from within a view, you could perhaps create a view along these lines:
CREATE VIEW v AS
SELECT 'tab1' AS a, b, c, d FROM tab1 WHERE 'tab1' IN (SELECT a FROM table)
UNION
SELECT 'tab2' AS a, b, c, d FROM tab2 WHERE 'tab2' IN (SELECT a FROM table)
UNION
SELECT 'tab3' AS a, b, c, d FROM tab3 WHERE 'tab3' IN (SELECT a FROM table);
Then it's just a SELECT * FROM v. But again, this looks like a case where table inheritance could be used.
I think you will want to return a single record, with multiple columns? In that case you can use the return-type RECORD for example. This will allow you to return an anonymous variable with as many columns as you want. You can find more information about all the different variables here:
http://www.postgresql.org/docs/9.0/static/plpgsql-declarations.html
And about return types:
http://www.postgresql.org/docs/9.0/static/xfunc-sql.html#XFUNC-OUTPUT-PARAMETERS
If you want to return multiple records with multiple columns, first check and see if you have to use a stored procedure for this. It might be an option to just use a VIEW (and query it with a WHERE-clause) instead. If that's not a good option, there is the possibility of returning a TABLE from a stored procedure in version 9.0.