Rownum in PostgreSQL

Is there any way to simulate ROWNUM in PostgreSQL?

In PostgreSQL 8.4 and later you can use the row_number() window function:
SELECT
row_number() OVER (ORDER BY col1) AS i,
e.col1,
e.col2,
...
FROM ...
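Since a window function cannot appear directly in a WHERE clause, filtering on the generated number takes a subquery. A minimal sketch mirroring Oracle's WHERE rownum <= 1000 (tbl and col1 are placeholder names):

SELECT *
FROM (
    SELECT row_number() OVER (ORDER BY col1) AS i, t.*
    FROM tbl t
) numbered
WHERE i <= 1000;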

PostgreSQL has LIMIT.
Oracle's code:
select *
from
tbl
where rownum <= 1000;
The same in PostgreSQL:
select *
from
tbl
limit 1000

I have just tested, on Postgres 9.1, a solution which is close to Oracle's ROWNUM:
select row_number() over() as id, t.*
from information_schema.tables t;

If you just want a number to come back, try this.
create temp sequence temp_seq;
SELECT inline_v1.ROWNUM, inline_v1.c1
FROM
(
select nextval('temp_seq') as ROWNUM, c1
from sometable
) inline_v1;
You can add an ORDER BY to the inline_v1 SQL so your ROWNUM has some sequential meaning to your data. Put the ORDER BY in the inner query and call nextval() in the outer one; otherwise nextval() may be evaluated before the sort and the numbers will not follow the ordering.
SELECT nextval('temp_seq') as ROWNUM, c1
FROM (
select c1
from sometable
ORDER BY c1 desc
) ordered;
Might not be the fastest, but it's an option if you really do need them.

If you have a unique key, you may use COUNT(*) OVER (ORDER BY unique_key) AS ROWNUM. The key must be unique because, with the default window frame, rows with equal keys are peers and would all receive the same number.
SELECT t.*, count(*) OVER (ORDER BY k) AS ROWNUM
FROM yourtable t;
| k | n | rownum |
|---|-------|--------|
| a | TEST1 | 1 |
| b | TEST2 | 2 |
| c | TEST2 | 3 |
| d | TEST4 | 4 |
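To see why uniqueness matters, compare count(*) OVER with row_number() on duplicate keys (a self-contained sketch using a VALUES list):

SELECT k,
       count(*) OVER (ORDER BY k) AS cnt_rownum,   -- ties share a number: 2, 2, 3
       row_number() OVER (ORDER BY k) AS rownum    -- always distinct: 1, 2, 3
FROM (VALUES ('a'), ('a'), ('b')) AS t(k);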

I think it's possible to mimic Oracle's ROWNUM using temporary sequences.
create or replace function rownum_seq() returns text as $$
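-- note: uuid_generate_v4() requires the uuid-ossp extension
-- (on PG 13+ the built-in gen_random_uuid() could be used instead)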
select concat('seq_rownum_',replace(uuid_generate_v4()::text,'-','_'));
$$ language sql immutable;
create or replace function rownum(r record, v_seq_name text default rownum_seq()) returns bigint as $$
declare
begin
return nextval(v_seq_name);
exception when undefined_table then
execute concat('create temporary sequence ',v_seq_name,' minvalue 1 increment by 1');
return nextval(v_seq_name);
end;
$$ language plpgsql volatile;
Demo:
select ccy_code,rownum(a.*) from (select ccy_code from currency order by ccy_code desc) a where rownum(a.*)<10;
Gives:
ZWD 1
ZMK 2
ZBH 3
ZAR 4
YUN 5
YER 6
XXX 7
XPT 8
XPF 9
Explanations:
Function rownum_seq() is immutable, so PG calls it only once per query and we get the same unique sequence name (even if the function is referenced thousands of times in the same query).
Function rownum() is volatile, so PG calls it for every row (even in a WHERE clause).
Without the r record parameter (which is unused), the function rownum() could be evaluated too early. That's the tricky point. Imagine the following rownum() function:
create or replace function rownum(v_seq_name text default rownum_seq()) returns bigint as $$
declare
begin
return nextval(v_seq_name);
exception when undefined_table then
execute concat('create temporary sequence ',v_seq_name,' minvalue 1 increment by 1');
return nextval(v_seq_name);
end;
$$ language plpgsql volatile;
explain select ccy_code,rownum() from (select ccy_code from currency order by ccy_code desc) a where rownum()<10
Sort (cost=56.41..56.57 rows=65 width=4)
Sort Key: currency.ccy_code DESC
-> Seq Scan on currency (cost=0.00..54.45 rows=65 width=4)
Filter: (rownum('649aec1a-d512-4af0-87d8-23e8d8a9d982'::text) < 10)
PG applies the filter before the sort. Damned!
With the unused record parameter, we force PG to sort before filtering:
explain select * from (select ccy_code from currency order by ccy_code desc) a where rownum(a.*)<10;
Subquery Scan on a (cost=12.42..64.36 rows=65 width=4)
Filter: (rownum(a.*, 'seq_rownum_43b5c67f_dd64_4191_b29c_372061c848d6'::text) < 10)
-> Sort (cost=12.42..12.91 rows=196 width=4)
Sort Key: currency.ccy_code DESC
-> Seq Scan on currency (cost=0.00..4.96 rows=196 width=4)
Pros:
works as an expression or in a WHERE clause
easy to use: just pass the first record.* you have in the FROM
Cons:
a temporary sequence is created for each rownum() call site, but it is removed when the session ends.
performance (open question: row_number() over () versus nextval())
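For comparison, the same result using row_number() instead of a sequence (a sketch against the same currency table used above):

select * from (
    select ccy_code, row_number() over () as rownum
    from (select ccy_code from currency order by ccy_code desc) a
) b
where rownum < 10;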

PostgreSQL does not have a direct equivalent of Oracle's ROWNUM.
In many cases you can achieve the same result by using LIMIT and OFFSET in your query.

Use the LIMIT clause with an OFFSET of the desired row number minus 1. So if you want row number 8, use:
limit 1 offset 7
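For example, to fetch the 8th row of a hypothetical tbl (remember that without an ORDER BY the row numbering is arbitrary):

select *
from tbl
order by col1
limit 1 offset 7;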

Related

PostgreSQL strange behavior with update trigger

I have a table1: id int, id_2 int, date timestamp, vec float[]
And a table2: id int, vec float[]
My goal is to create a trigger on update of table1 which takes the last 10 rows (by date) for each id_2, averages the vectors over the first axis (10 x N -> N), and writes the result to table2 under id = id_2.
My code:
CREATE OR REPLACE FUNCTION public.foo()
RETURNS trigger
LANGUAGE plpgsql
AS $function$
BEGIN
WITH rows AS (
SELECT DISTINCT t1.id_2, t2.id, t2.vec, t2.date, DENSE_RANK() OVER (PARTITION BY t1.id_2 ORDER BY t2.date desc) AS counter
FROM new_table AS t1
LEFT JOIN table1 t2 ON t1.id_2 = t2.id_2
),
elements_average AS (
SELECT id_2, AVG(unnest::float) AS av
FROM rows,
unnest(vec) WITH ORDINALITY
WHERE counter < 11
GROUP BY id_2, ORDINALITY
ORDER BY ORDINALITY
),
avr AS (
SELECT id_2, array_agg(av::float) AS averages
FROM elements_average
GROUP BY id_2
)
UPDATE table2 SET vec = averages FROM avr WHERE table2.id = avr.id_2;
RETURN NULL;
END;
$function$
;
CREATE TRIGGER foo_trigger
AFTER UPDATE ON public.table1
REFERENCING NEW TABLE AS new_table
FOR EACH STATEMENT EXECUTE FUNCTION foo();
The problem: when I update a few rows in table1 with different id_2 values in one transaction, the value written to table2 becomes wrong; it is not the average.
What's even stranger is that this code gives correct values in the same situation:
...
avr AS (
SELECT id_2, array_agg(av::float) AS averages
FROM elements_average
GROUP BY id_2
),
strange_thing AS (
SELECT * from elements_average
)
UPDATE table2 SET vec = averages FROM avr WHERE table2.id = avr.id_2;
RETURN NULL;
END;
$function$
;
So a small, meaningless and unimportant SELECT changes the behavior of the function. Is this a bug in Postgres, or my fault?

PostgreSQL dynamic dataset aggregation

I'm trying to aggregate multiple datasets where I have to be able to create dynamic rules for how to aggregate the data on a per-id basis. Below is a quick approach I wrote up that seems to do what I intended. Is there a more performant (and preferably safe) way of doing this? tbl1...n will be larger in size than agg_rule. I'm running Postgres 13.0; if there are new features coming in 14 that will help, those could also be of interest. I am only interested in coalescing and summing like below, if that simplifies the problem.
CREATE TABLE tbl1 AS
SELECT 1 id
, 1 seq
, 1 val;
CREATE TABLE tbl2 AS
SELECT 1 id
, 1 seq
, 1 val;
CREATE TABLE tbl3 AS
SELECT 1 id
, 1 seq
, 1 val;
CREATE TABLE agg_rule AS
SELECT 1 id
, 'coalesce(tbl1.val, tbl2.val + tbl3.val)' expr;
CREATE OR REPLACE FUNCTION eval(_id INTEGER, _seq INTEGER)
RETURNS INTEGER
LANGUAGE plpgsql
AS $$
DECLARE _res INTEGER;
BEGIN
EXECUTE 'SELECT ' || (
SELECT expr
FROM agg_rule
WHERE id = _id
) || '
FROM tbl1
FULL JOIN tbl2 USING (id, seq)
FULL JOIN tbl3 USING (id, seq)
WHERE id = ' || _id || '
AND seq = ' || _seq || ';' INTO _res;
RETURN _res;
END
$$;
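One note on safety: concatenating _id and _seq into the query string invites SQL injection if those arguments ever come from user input. A sketch of the same function (hypothetically named eval_safe) passing the parameters via EXECUTE ... USING instead; the expression from agg_rule still has to be trusted, since it is spliced into the SQL as-is:

CREATE OR REPLACE FUNCTION eval_safe(_id INTEGER, _seq INTEGER)
RETURNS INTEGER
LANGUAGE plpgsql
AS $$
DECLARE _res INTEGER;
BEGIN
EXECUTE 'SELECT ' || (
    SELECT expr
    FROM agg_rule
    WHERE id = _id
) || '
FROM tbl1
FULL JOIN tbl2 USING (id, seq)
FULL JOIN tbl3 USING (id, seq)
WHERE id = $1
AND seq = $2'
INTO _res
USING _id, _seq;  -- $1 and $2 in the dynamic SQL bind to these values
RETURN _res;
END
$$;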

Return the number of rows with a WHERE condition after an INSERT INTO?

I have a table that records which users have joined which event (as in, an IRL event).
I have set up a server query that lets a user join an event.
It goes like this:
INSERT INTO participations
VALUES(:usr,:event_id)
I want that statement to also return the number of people who have joined the same event as the user. How do I proceed? If possible in one SQL statement.
Thanks
You can use a common table expression like this to execute it as one query.
with insert_tbl_statement as (
insert into tbl values (4, 1) returning event_id
)
select (count(*) + 1) as event_count from tbl where event_id = (select event_id from insert_tbl_statement);
see demo http://rextester.com/BUF16406
You can use a function. I've set up an example below, but keep in mind you must add 1 to the final count because the transaction hasn't been committed yet.
create table tbl(id int, event_id int);
insert into tbl values (1, 2),(2, 2),(3, 3);
create function new_tbl(id int, event_id int)
returns bigint as $$
insert into tbl values ($1, $2);
select count(*) + 1 from tbl where event_id = $2;
$$ language sql;
select new_tbl(4, 2);
| new_tbl |
| ------: |
| 4 |

PostgreSQL - return most common value for all columns in a table

I've got a table with a lot of columns in it and I want to run a query to find the most common value in each column.
Ordinarily for a single column, I'd run something like:
SELECT country
FROM users
GROUP BY country
ORDER BY count(*) DESC
LIMIT 1
Does PostgreSQL have a built in function for doing this or can anyone suggest a query I could run to achieve this?
Using the same query, for more than one column you should do:
SELECT *
FROM
(
SELECT country
FROM users
GROUP BY 1
ORDER BY count(*) DESC
LIMIT 1
) country
,(
SELECT city
FROM users
GROUP BY 1
ORDER BY count(*) DESC
LIMIT 1
) city
This works for any type and will return all the values in the same row, with the columns keeping their original names.
For more columns, just add more subqueries:
,(
SELECT someOtherColumn
FROM users
GROUP BY 1
ORDER BY count(*) DESC
LIMIT 1
) someOtherColumn
Edit:
You could achieve this with window functions as well, but it would not be better in performance or readability.
Starting from PG 9.4 there is an aggregate function for this:
mode() WITHIN GROUP (ORDER BY sort_expression)
It returns the most frequent input value (arbitrarily choosing the first one if there are multiple equally-frequent results).
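For example, against the users table from the question, this reads the table once and returns one row with the most common value per column:

SELECT mode() WITHIN GROUP (ORDER BY country) AS country,
       mode() WITHIN GROUP (ORDER BY city) AS city
FROM users;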
And for earlier versions, you could create one...
CREATE OR REPLACE FUNCTION mode_array(anyarray)
RETURNS anyelement AS
$BODY$
SELECT a FROM unnest($1) a GROUP BY 1 ORDER BY COUNT(1) DESC, 1 LIMIT 1;
$BODY$
LANGUAGE SQL IMMUTABLE;
CREATE AGGREGATE mode(anyelement)(
SFUNC = array_append, --Function to call for each row. Just builds the array
STYPE = anyarray,
FINALFUNC = mode_array, --Function to call after everything has been added to array
INITCOND = '{}'--Initialize an empty array when starting
) ;
Usage: SELECT mode(column) FROM table;
If I were doing this, I'd write a query like this one. Note that in PostgreSQL each branch needs its own parentheses so the ORDER BY and LIMIT apply per branch:
(SELECT 'country', country
FROM users
GROUP BY country
ORDER BY count(*) DESC
LIMIT 1)
UNION ALL
(SELECT 'city', city
FROM users
GROUP BY city
ORDER BY count(*) DESC
LIMIT 1)
-- etc.
It should be noted this only works if all the columns are of compatible types. If they are not, you'll probably need a different solution.
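One simple workaround when the types differ is to cast every value to text (a sketch, assuming an integer age column):

(SELECT 'country', country::text
FROM users
GROUP BY country
ORDER BY count(*) DESC
LIMIT 1)
UNION ALL
(SELECT 'age', age::text
FROM users
GROUP BY age
ORDER BY count(*) DESC
LIMIT 1);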
This window function version will read the users table and the computed table once each. The correlated subquery version will read the users table once for each of the columns. If the columns are many, as in the OP's case, my guess is that this version is faster.
select distinct on (country_count, age_count) *
from (
select
country,
count(*) over(partition by country) as country_count,
age,
count(*) over(partition by age) as age_count
from users
) s
order by country_count desc, age_count desc
limit 1

Duplicate single database record

Hello, what is the easiest way to duplicate a DB record in the same table?
My problem is that the table where I am doing this has many columns, 100+, and I don't like how the solution looks. Here is what I do (this is inside a plpgsql function):
...
1. duplicate record
INSERT INTO history
(SELECT NEXTVAL('history_id_seq'), col_1, col_2, ... , col_100
FROM history
WHERE history_id = 1234
ORDER BY datetime DESC
LIMIT 1)
RETURNING
history_id INTO new_history_id;
2. update some columns
UPDATE history
SET
col_5 = 'test_5',
col_23 = 'test_23',
datetime = CURRENT_TIMESTAMP
WHERE history_id = new_history_id;
Here are the problems I am attempting to solve:
Listing all these 100+ columns looks lame.
When a new column is eventually added, the function has to be updated too.
On separate DB instances the column order might differ, which would cause the function to fail.
I am not sure if I can list them once more (solving issue 3) like insert into <table> (<columns_list>) values (<query>), but then the query looks even uglier.
I would like to achieve something like 'insert into ', but this seems impossible: the unique primary key constraint will raise a duplication error.
Any suggestions?
Thanks in advance for your time.
This isn't pretty or particularly optimized, but there are a couple of ways to go about this. Ideally, you might want to do this all in an UPDATE trigger, though you could implement a duplication function something like this:
-- create source table
CREATE TABLE history (history_id serial not null primary key, col_2 int, col_3 int, col_4 int, datetime timestamptz default now());
-- add some data
INSERT INTO history (col_2, col_3, col_4)
SELECT g, g * 10, g * 100 FROM generate_series(1, 100) AS g;
-- function to duplicate record
CREATE OR REPLACE FUNCTION fn_history_duplicate(p_history_id integer) RETURNS SETOF history AS
$BODY$
DECLARE
cols text;
insert_statement text;
BEGIN
-- build list of columns
SELECT array_to_string(array_agg(column_name::name), ',') INTO cols
FROM information_schema.columns
WHERE (table_schema, table_name) = ('public', 'history')
AND column_name <> 'history_id';
-- build insert statement
insert_statement := 'INSERT INTO history (' || cols || ') SELECT ' || cols || ' FROM history WHERE history_id = $1 RETURNING *';
-- execute statement
RETURN QUERY EXECUTE insert_statement USING p_history_id;
RETURN;
END;
$BODY$
LANGUAGE 'plpgsql';
-- test
SELECT * FROM fn_history_duplicate(1);
history_id | col_2 | col_3 | col_4 | datetime
------------+-------+-------+-------+-------------------------------
101 | 1 | 10 | 100 | 2013-04-15 14:56:11.131507+00
(1 row)
As I noted in my original comment, you might also take a look at the colnames extension as an alternative to querying the information schema.
You don't need the UPDATE anyway; you can supply the constant values directly in the SELECT statement:
INSERT INTO history
SELECT NEXTVAL('history_id_seq'),
col_1,
col_2,
col_3,
col_4,
'test_5',
...
'test_23',
...,
col_100
FROM history
WHERE history_id = 1234
ORDER BY datetime DESC
LIMIT 1
RETURNING history_id INTO new_history_id;
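If listing the columns at all is the pain point, a generic sketch using jsonb can duplicate a row without naming any columns. This assumes the history table and its history_id_seq sequence as above; any columns to override are supplied as extra jsonb keys:

INSERT INTO history
SELECT (jsonb_populate_record(
            NULL::history,
            to_jsonb(h) || jsonb_build_object(
                'history_id', nextval('history_id_seq'),  -- replace the primary key
                'datetime', now()                         -- override further columns as needed
            )
        )).*
FROM history h
WHERE history_id = 1234
ORDER BY datetime DESC
LIMIT 1;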