How to insert after the last row in a table? - postgresql

I have this table below named roombooking:
I wrote this function that inserts a new row into roombooking (don't mind the details, just the hotelbookingID):
CREATE OR REPLACE FUNCTION my_function(startdate date, enddate date, idForHotel integer)
RETURNS void AS
$$
BEGIN
    INSERT INTO roombooking("hotelbookingID", "roomID", "bookedforpersonID",
                            checkin, checkout, rate)
    SELECT rb."hotelbookingID", r."idRoom", p."idPerson",
           startdate - integer '20', startdate - integer '10', rr.rate
    FROM (SELECT "hotelbookingID" FROM roombooking
          WHERE "hotelbookingID" =
                (SELECT "hotelbookingID"
                 FROM roombooking
                 ORDER BY "hotelbookingID" DESC
                 LIMIT 1) + 1) rb,
         (SELECT "idRoom" FROM room
          WHERE "idHotel" = idForHotel) r,
         (SELECT "idPerson" FROM person
          ORDER BY random()
          LIMIT 1) p,
         (SELECT rate FROM roomrate
          WHERE "idHotel" = idForHotel) rr;
END;
$$
LANGUAGE plpgsql;
The problem here is that I want to insert after the last row, based on the last hotelbookingID (they are in ascending order).
My function works, but I guess it can't find the last row in order to perform the insertion after it. (I think the problem can be spotted here:
SELECT "hotelbookingID" FROM roombooking
WHERE "hotelbookingID"=
(select "hotelbookingID"
from roombooking
order by "hotelbookingID" desc
limit 1)+1)
Any help would be valuable. Thank you.

Any approach that uses a subquery to find the maximum existing id is doomed to suffer from race conditions: if two such INSERTs are running concurrently, they will end up with the same number.
Use an identity column:
ALTER TABLE roombooking
    ALTER "hotelbookingID" ADD GENERATED ALWAYS AS IDENTITY (START 100000);
where 100000 is a value greater than the current maximum "hotelbookingID" in the table.
Then all you have to do is not insert anything into "hotelbookingID", and the column will be populated automatically.
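A minimal, self-contained sketch of how that behaves (toy table and values, not from the question):

CREATE TABLE demo_booking (
    "hotelbookingID" bigint GENERATED ALWAYS AS IDENTITY (START 100000),
    checkin  date,
    checkout date
);

-- The identity column is simply left out of the INSERT:
INSERT INTO demo_booking (checkin, checkout)
VALUES (date '2023-01-10', date '2023-01-20');
-- "hotelbookingID" gets 100000 automatically; the next row gets 100001.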

That WHERE condition makes no sense. There is no row in the roombooking table whose id is 1 + the largest id in the roombooking table.
You simply want to add 1 to the inserted value:
INSERT INTO roombooking("hotelbookingID", …)
SELECT rb."hotelbookingID" + 1, …
-- ^^^^
FROM (
SELECT "hotelbookingID"
FROM roombooking
ORDER BY "hotelbookingID" DESC
LIMIT 1
) rb,
…
That said, I would recommend simply using a sequence instead (if you don't care about occasional gaps). If you really need continuous numbering, I wouldn't use ORDER BY + LIMIT though. Just use an aggregate, and consider the case where the table is still empty:
INSERT INTO roombooking("hotelbookingID", …)
VALUES ( COALESCE((SELECT max("hotelbookingID") FROM roombooking), 0) + 1, …);
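A sketch of the sequence route mentioned above (the sequence name is an assumption; it is attached to the existing column and primed to the current maximum):

CREATE SEQUENCE roombooking_hbid_seq OWNED BY roombooking."hotelbookingID";
-- Start the sequence just past the highest existing id (pass false as a third
-- argument to setval if you need it to start at exactly 1 on an empty table).
SELECT setval('roombooking_hbid_seq',
              COALESCE((SELECT max("hotelbookingID") FROM roombooking), 1));
ALTER TABLE roombooking
    ALTER "hotelbookingID" SET DEFAULT nextval('roombooking_hbid_seq');
-- Subsequent INSERTs can then omit "hotelbookingID" entirely.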

Related

Postgres 14 delete with count in where clause

I wanted to delete all records except the one with the highest value, so I did:
CREATE TABLE code (
    id    SERIAL,
    name  VARCHAR(255) NOT NULL,
    value int NOT NULL
);
INSERT INTO code (name, value) VALUES ('name', 1);
INSERT INTO code (name, value) VALUES ('name', 2);
INSERT INTO code (name, value) VALUES ('name', 3);
INSERT INTO code (name, value) VALUES ('name1', 3);
INSERT INTO code (name, value) VALUES ('name2', 1);
INSERT INTO code (name, value) VALUES ('name2', 3);
For example, I want to delete all records except the one with the highest value in the value column, per name. I am expecting to get this result:
name 3
name1 3
name2 3
I tried doing
DELETE FROM code where value != (select MAX(value) value from code where count(code) > 1)
But I'm getting an error like:
ERROR: aggregate functions are not allowed in WHERE
LINE 1: ...value != (select MAX(value) value from code where count(code...
Combining everyone's ideas with this pattern:
SELECT dept, SUM(expense) FROM records
WHERE ROW(year, dept) IN (SELECT x, y FROM otherTable)
GROUP BY dept;
I was able to build the query I wanted.
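(The final query is not shown in the post; judging from the ROW(...) IN pattern above and the expected output, it was presumably something like this sketch:)

DELETE FROM code
WHERE ROW(name, value) NOT IN (SELECT name, max(value)
                               FROM code
                               GROUP BY name);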
Your query makes no sense. Try this:
DELETE FROM code
WHERE value <> (SELECT value
                FROM (SELECT count(*) AS count,
                             value
                      FROM code
                      GROUP BY value) AS q
                ORDER BY count DESC
                FETCH FIRST 1 ROWS ONLY);
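As for the error itself: aggregates are allowed in the select list and in HAVING, but never directly in WHERE. The legal counterpart of the count() filter from the question looks like this:

SELECT value
FROM code
GROUP BY value
HAVING count(*) > 1;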
The fast and easy solution would be:
BEGIN;
SELECT name, max(value) INTO TEMP t FROM code GROUP BY 1;
TRUNCATE code;
INSERT INTO code(name, value) SELECT * FROM t;
END;
Or you can do it like this:
BEGIN;
DELETE FROM code USING (SELECT name, max(value) FROM code GROUP BY 1) a
WHERE code.name = a.name AND code.value != a.max;
END;

Check if the row is the last one when looping through result set in PL/pgSQL

Is something like this possible?
FOR row_var IN SELECT * FROM my_table LOOP
    -- ...
    IF is_last_row THEN
        -- do something...
    END IF;
END LOOP;
The only thing that comes to mind is to select a count of the rows and compare it with row_number() in the loop.
There is a very cheap and simple way. Your row variable row_var still holds the last row after the loop ends. Just use it then:
FOR row_var IN
SELECT * FROM my_table ORDER BY ???
LOOP
-- do something for every row here
END LOOP;
-- do something with row_var for the last row here
Aside from that, there is often a more efficient solution with plain SQL, depending on the undisclosed details of your use case ...
If you select rows ordered by a unique key, querying last_id should be cheaper than count() with row_number(), e.g.:
last_id := (select id from test order by id desc limit 1);
for rec in
    select * from test order by id
loop
    if rec.id = last_id then
        ...
That way you avoid the count() query and the row_number() bookkeeping inside the loop.
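A self-contained sketch of this approach (assuming a table test with an integer id column, as in the snippet above):

DO $$
DECLARE
    last_id int;
    rec     record;
BEGIN
    last_id := (SELECT id FROM test ORDER BY id DESC LIMIT 1);
    FOR rec IN SELECT * FROM test ORDER BY id LOOP
        IF rec.id = last_id THEN
            RAISE NOTICE 'last row: id = %', rec.id;  -- fires only for the final row
        END IF;
    END LOOP;
END
$$;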
If we assume you have a primary key on table my_table in column id, you can add one column with a window function:
FOR row_var IN
    SELECT lead(id) OVER () IS NULL AS is_last_row, *
    FROM my_table
LOOP
    -- ...
    IF row_var.is_last_row THEN
        -- do something...
    END IF;
END LOOP;
If you have a composite PK, or no PK at all on that table, you may use:
SELECT lead(r) OVER () IS NULL AS is_last_row, *
FROM (
    SELECT 1 AS r, * FROM ruch.s_icd10
) sub
as the LOOP query. That's the only idea I have.
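Putting the window-function idea together, a minimal runnable sketch (again assuming my_table has an id column; the window gets an ORDER BY so that "last" is well defined):

DO $$
DECLARE
    row_var record;
BEGIN
    FOR row_var IN
        SELECT t.*, lead(t.id) OVER (ORDER BY t.id) IS NULL AS is_last_row
        FROM my_table t
        ORDER BY t.id
    LOOP
        IF row_var.is_last_row THEN
            RAISE NOTICE 'last row reached: id = %', row_var.id;
        END IF;
    END LOOP;
END
$$;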

PL/pgSQL function to randomly select an id

Goal:
pre-populate a table with a list of sequential ids, from e.g. 1 to 1,000,000. The table has an additional column that is nullable. NULL values are marked as unassigned and non-NULL values are marked as assigned.
have a function I can call that asks for x number of randomly chosen ids from the table which have not been assigned.
This is for something quite specific and while I understand there are different ways of doing this, I'd like to know if there's a solution to the flaw in this particular implementation.
I have something that partially works, but I'm wondering where the flaw in the function is.
Here's the table:
CREATE SEQUENCE accounts_seq MINVALUE 700000000001 NO MAXVALUE;
CREATE TABLE accounts (
    id     BIGINT PRIMARY KEY DEFAULT nextval('accounts_seq'),
    client VARCHAR(25),
    UNIQUE (id, client)
);
This function gen_account_ids is just a one-time setup to pre-populate the table with a fixed number of rows, all marked as unassigned.
/*
  This function will insert new rows into the accounts table with ids being
  generated by a sequence, and client being NULL. A NULL client indicates
  the account has not yet been assigned.
*/
CREATE OR REPLACE FUNCTION gen_account_ids(bigint)
RETURNS INT AS $gen_account_ids$
DECLARE
    -- count is the number of new accounts you want generated
    count ALIAS FOR $1;
    -- rowcount is returned as the number of rows inserted
    rowcount int;
BEGIN
    INSERT INTO accounts(client) SELECT NULL FROM generate_series(1, count);
    GET DIAGNOSTICS rowcount = ROW_COUNT;
    RETURN rowcount;
END;
$gen_account_ids$ LANGUAGE plpgsql;
So, I use this to pre-populate the table with, say 1000 records:
SELECT gen_account_ids(1000);
The next function, assign, is meant to randomly select an unassigned id (unassigned means the client column is null) and update it with a client value so it becomes assigned. It returns the number of rows affected.
It works sometimes, but I do believe there are collisions occurring -- which is why I tried DISTINCT, but it often returns fewer than the desired number of rows. For example, SELECT assign(100, 'foo'); might return 95 rows instead of the desired 100.
How can I modify this to make it always return exactly the desired number of rows?
/*
  This will assign ids to a client randomly.
  #param int is the number of account numbers to generate
  #param varchar(10) is a string descriptor for the client
  #returns the number of rows affected -- should be the same as the input int
  Call it like this: `SELECT * FROM assign(100, 'FOO')`
*/
CREATE OR REPLACE FUNCTION assign(INT, VARCHAR(10))
RETURNS INT AS $$
DECLARE
    total      ALIAS FOR $1;
    clientname ALIAS FOR $2;
    rowcount   int;
BEGIN
    UPDATE accounts SET client = clientname WHERE id IN (
        SELECT DISTINCT trunc(random() * (
                   (SELECT max(id) FROM accounts WHERE client IS NULL) -
                   (SELECT min(id) FROM accounts WHERE client IS NULL)) +
                   (SELECT min(id) FROM accounts WHERE client IS NULL))
        FROM generate_series(1, total));
    GET DIAGNOSTICS rowcount = ROW_COUNT;
    RETURN rowcount;
END;
$$ LANGUAGE plpgsql;
This is loosely based on an approach where you can do something like SELECT trunc(random() * (100 - 1) + 1) FROM generate_series(1, 5);, which will select 5 random numbers between 1 and 100.
My goal is to do something similar where I select a random id between the min and max unassigned rows, and mark it for update.
This isn't the best answer because it does involve full table scans, but in my situation I'm not concerned about the performance, and it works. This is based off @CraigRinger's reference to the blog post on getting random tuples.
I'd be generally interested in hearing about other (perhaps better) solutions -- and am specifically curious about why the original solution falls short, and about what @klin devised as well.
So, here's my brute force random order solution:
-- generate a million unassigned rows with null client column
insert into accounts(client) select null from generate_series(1, 1000000);
-- assign 1000 random rows to client 'foo'
update accounts set client = 'foo' where id in
(select id from accounts where client is null order by random() limit 1000);
Because the ids of a random subset of rows are not consecutive, select a random row_number() instead of a random id.
with nulls as (       -- base query
    select id
    from accounts
    where client is null
),
randoms as (          -- calculate a random int in range 1..count(nulls.*)
    select trunc(random() * (count(*) - 1) + 1)::int as random_value
    from nulls
),
row_numbers as (      -- add row numbers to nulls
    select id, row_number() over (order by id) as rn
    from nulls
)
select id
from row_numbers, randoms
where rn = random_value;   -- random row number
A function is not necessary here, but you can easily place the query in a function body if needed.
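For example, a sketch of such a wrapper (the function name random_unassigned_id is an assumption, not from the question):

CREATE OR REPLACE FUNCTION random_unassigned_id()
RETURNS bigint AS $func$
    WITH nulls AS (
        SELECT id FROM accounts WHERE client IS NULL
    ),
    randoms AS (
        SELECT trunc(random() * (count(*) - 1) + 1)::int AS random_value
        FROM nulls
    ),
    row_numbers AS (
        SELECT id, row_number() OVER (ORDER BY id) AS rn
        FROM nulls
    )
    SELECT id
    FROM row_numbers, randoms
    WHERE rn = random_value;
$func$ LANGUAGE sql VOLATILE;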
This query updates 5 random rows with null client.
update accounts
set client = 'new value'   -- <-- clientname
where id in (
    with nulls as (       -- base query
        select id
        from accounts
        where client is null
    ),
    randoms as (          -- calculate random ints in range 1..count(nulls.*)
        select i, trunc(random() * (count(*) - 1) + 1)::int as random_value
        from nulls
        cross join generate_series(1, 5) i   -- <-- total
        group by 1
    ),
    row_numbers as (      -- add row numbers to nulls in order by id
        select id, row_number() over (order by id) as rn
        from nulls
    )
    select id
    from row_numbers, randoms
    where rn = random_value   -- random row number
)
However, there is no certainty that the query will update exactly 5 rows, because
select trunc(random() * (max_value - 1) + 1)::int
from generate_series(1, n)
is not a correct way to generate n distinct random values. The probability of repetitions increases with the quotient n / max_value.
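The repetition problem is easy to demonstrate: drawing 1000 times with the formula above and max_value = 1000 typically yields only around 630 distinct values, not 1000 (the numbers here are illustrative):

SELECT count(DISTINCT trunc(random() * (1000 - 1) + 1)::int) AS distinct_values
FROM generate_series(1, 1000);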

PostgreSQL - return most common value for all columns in a table

I've got a table with a lot of columns in it and I want to run a query to find the most common value in each column.
Ordinarily for a single column, I'd run something like:
SELECT country
FROM users
GROUP BY country
ORDER BY count(*) DESC
LIMIT 1
Does PostgreSQL have a built in function for doing this or can anyone suggest a query I could run to achieve this?
Using the same query, for more than one column you should do:
SELECT *
FROM (
    SELECT country
    FROM users
    GROUP BY 1
    ORDER BY count(*) DESC
    LIMIT 1
) country,
(
    SELECT city
    FROM users
    GROUP BY 1
    ORDER BY count(*) DESC
    LIMIT 1
) city
This works for any type and will return all the values in the same row, with the columns keeping their original names.
For more columns, just add more subqueries:
, (
    SELECT someOtherColumn
    FROM users
    GROUP BY 1
    ORDER BY count(*) DESC
    LIMIT 1
) someOtherColumn
Edit:
You could achieve this with window functions as well, but it would not be better in performance or in readability.
Starting with PostgreSQL 9.4 there is an aggregate function for this:
mode() WITHIN GROUP (ORDER BY sort_expression)
returns the most frequent input value (arbitrarily choosing the first one if there are multiple equally-frequent results)
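For the multi-column case this makes the query a one-liner, e.g. (columns as in the question):

SELECT mode() WITHIN GROUP (ORDER BY country) AS country,
       mode() WITHIN GROUP (ORDER BY city)    AS city
FROM users;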
And for earlier versions, you could create one...
CREATE OR REPLACE FUNCTION mode_array(anyarray)
RETURNS anyelement AS
$BODY$
    SELECT a FROM unnest($1) a GROUP BY 1 ORDER BY count(1) DESC, 1 LIMIT 1;
$BODY$
LANGUAGE SQL IMMUTABLE;

CREATE AGGREGATE mode(anyelement) (
    SFUNC = array_append,    -- function to call for each row; just builds the array
    STYPE = anyarray,
    FINALFUNC = mode_array,  -- function to call after everything has been added to the array
    INITCOND = '{}'          -- initialize with an empty array when starting
);
Usage: SELECT mode(column) FROM table;
If I were doing this, I'd write a query like this one:
(SELECT 'country', country
 FROM users
 GROUP BY country
 ORDER BY count(*) DESC
 LIMIT 1)
UNION ALL
(SELECT 'city', city
 FROM users
 GROUP BY city
 ORDER BY count(*) DESC
 LIMIT 1)
-- etc.
(The parentheses are required so that each ORDER BY ... LIMIT 1 applies to its own branch rather than to the whole UNION.)
It should be noted that this only works if all the columns are of compatible types. If they are not, you'll probably need a different solution.
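If the types are not compatible, one workaround (a sketch; the age column is borrowed from the window-function answer below) is to cast every value to text:

(SELECT 'country' AS col, country::text AS most_common
 FROM users
 GROUP BY country
 ORDER BY count(*) DESC
 LIMIT 1)
UNION ALL
(SELECT 'age', age::text
 FROM users
 GROUP BY age
 ORDER BY count(*) DESC
 LIMIT 1);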
This window function version will read the users table and the computed table once each. The correlated subquery version will read the users table once for each of the columns. If the columns are many, as in the OP's case, my guess is that this is faster.
select distinct on (country_count, age_count) *
from (
    select
        country,
        count(*) over (partition by country) as country_count,
        age,
        count(*) over (partition by age) as age_count
    from users
) s
order by country_count desc, age_count desc
limit 1

In SQL Server 2000, how to delete the specified rows in a table that does not have a primary key?

Let's say we have a table with some data in it.
IF OBJECT_ID('dbo.table1') IS NOT NULL
BEGIN
    DROP TABLE dbo.table1;
END
CREATE TABLE table1 (data INT);
---------------------------------------------------------------------
-- Generating testing data
---------------------------------------------------------------------
INSERT INTO dbo.table1(data)
SELECT 100
UNION ALL SELECT 200
UNION ALL SELECT NULL
UNION ALL SELECT 400
UNION ALL SELECT 400
UNION ALL SELECT 500
UNION ALL SELECT NULL;
How to delete the 2nd, 5th, and 6th records in the table? The order is defined by the following query.
SELECT data
FROM dbo.table1
ORDER BY data DESC;
Note, this is in SQL Server 2000 environment.
Thanks.
In short, you need something in the table to indicate sequence. The "2nd row" is a non-sequitur when there is nothing that enforces sequence. However, a possible solution might be (toy example => toy solution):
If object_id('tempdb..#NumberedData') Is Not Null
    Drop Table #NumberedData

Create Table #NumberedData
    (
    Id int not null identity(1,1) primary key clustered
    , data int null
    )

Insert #NumberedData( data )
SELECT 100
UNION ALL SELECT 200
UNION ALL SELECT NULL
UNION ALL SELECT 400
UNION ALL SELECT 400
UNION ALL SELECT 500
UNION ALL SELECT NULL

Begin Tran

Delete table1

Insert table1( data )
Select data
From #NumberedData
Where Id Not In (2, 5, 6)

If @@Error = 0
    Commit Tran
Else
    Rollback Tran
Obviously, this type of solution is not guaranteed to work exactly as you want but the concept is the best you will get. In essence, you stuff your rows into a table with an identity column and use that to identify the rows to remove. Removing the rows entails emptying the original table and re-populating with only the rows you want. Without a unique key of some kind, there just is no clean way of handling this problem.
As you are probably aware, you can do this in later versions using row_number very straightforwardly.
delete t from
(select ROW_NUMBER() over (order by data) r from table1) t
where r in (2,5,6)
Even without that, it is possible to use the undocumented %%LOCKRES%% function to differentiate between two identical rows:
SELECT data, %%LOCKRES%%
FROM dbo.table1
I don't think that's available in SQL Server 2000 though.
In SQL, sets don't have order, but cursors do, so you could use something like the below. NB: I was expecting to be able to use DELETE ... WHERE CURRENT OF, but that relies on a PK, so the code to delete a row is not as simple as I was hoping for.
In the event that the data to be deleted is a duplicate, there is no guarantee that it will delete the same row as CURRENT OF would have. However, in this eventuality the ordering of the tied rows is arbitrary anyway, so whichever row is deleted could equally well have been given that row number in the cursor ordering.
DECLARE @RowsToDelete TABLE
(
    rowidx INT PRIMARY KEY
)

INSERT INTO @RowsToDelete SELECT 2 UNION SELECT 5 UNION SELECT 6

DECLARE @PrevRowIdx int
DECLARE @CurrentRowIdx int
DECLARE @Offset int
SET @CurrentRowIdx = 1
DECLARE @data int

DECLARE ordered_cursor SCROLL CURSOR FOR
SELECT data
FROM dbo.table1
ORDER BY data

OPEN ordered_cursor
FETCH NEXT FROM ordered_cursor INTO @data

WHILE EXISTS(SELECT * FROM @RowsToDelete)
BEGIN
    SET @PrevRowIdx = @CurrentRowIdx
    SET @CurrentRowIdx = (SELECT TOP 1 rowidx FROM @RowsToDelete ORDER BY rowidx)
    SET @Offset = @CurrentRowIdx - @PrevRowIdx
    DELETE FROM @RowsToDelete WHERE rowidx = @CurrentRowIdx
    FETCH RELATIVE @Offset FROM ordered_cursor INTO @data
    /* Can't use DELETE ... WHERE CURRENT OF, as that requires a PK */
    SET ROWCOUNT 1
    DELETE FROM dbo.table1 WHERE (data = @data OR (data IS NULL AND @data IS NULL))
    SET ROWCOUNT 0
END

CLOSE ordered_cursor
DEALLOCATE ordered_cursor
To perform any action on a set of rows (such as deleting them), you need to know what identifies those rows.
So, you have to come up with criteria that identifies the rows you want to delete.
Providing a toy example, like the one above, is not particularly useful.
You plan ahead, and if you anticipate that this is possible, you add a surrogate key column or some such.
In general, you make sure you don't create tables without PKs.
It's like asking: "Say I don't look both directions before crossing the road and I step in front of a bus..."