How to create multiple temp tables using records from a CTE that I need to call multiple times in Postgres plpgsql Procedure? - postgresql

UPDATE:
I am using the CTE because I am using a LOOP to loop in batches of 10000.
I am already using a CTE expression within a plpgsql Procedure to grab some Foreign Keys from (1) specific table, we can call it master_table. I created a brand new table, we can call this table table_with_fks, in my DDL statements so this table holds the FKs I am fetching and saving.
I later take these FKs from my table_with_fks and JOIN on my other tables in my database to get the entire original record (the full record with all columns from its corresponding table) and insert it into an archive table.
I have an awesome lucid chart I drew that might make what I say down below make much more sense:
My CTE example:
LOOP
EXIT WHEN some_condition;
WITH fk_list_cte AS (
SELECT mt.fk1, mt.fk2, mt.fk3, mt.fk4
FROM master_table mt
WHERE mt.created_date < now() - interval '365' // archive record if >= 1 year old
LIMIT 10000
)
INSERT INTO table_with_fks (SELECT * FROM fk_list_cte);
commit;
END LOOP;
Now, I have (4) other Procedures that JOIN on each FK in this table_with_fks with its parent table that it references. I do this because as I said, I only got the FK at first, and I don't have all the original columns for the record. So I will do something like
LOOP
EXIT WHEN some_condition;
WITH full_record_cte AS (
SELECT *
FROM table_with_fks fks
JOIN parent_table1 pt1
ON fks.fk1 = pt1.id
LIMIT 10000),
INSERT INTO (select * from full_record_cte);
commit;
END LOOP;
NOW, what I want to do, is instead of having to RE-JOIN 4 times later on these FK's that are found in my table_with_fks, I want to use the first CTE fk_list_cte to JOIN on the parent tables right away and grab the full record from each (4) tables and put it in some TEMP postgres table. I think I will need (4) unique TEMP tables, as I don't know how it would work if I combine all their data into one BIG table, because each table has different data/different columns.
Is there a way to use the original CTE fk_list_cte and call it multiple times in succession and CREATE 4 TEMP tables right after, that all use the original CTE? example:
LOOP
EXIT WHEN some_condition;
WITH fk_list_cte AS (
SELECT mt.fk1, mt.fk2, mt.fk3, mt.fk4
FROM master_table mt
WHERE mt.created_date < now() - interval '365' // archive record if >= 1 year old
LIMIT 10000
),
WITH fetch_fk1_original_record_from_parent AS (
SELECT *
FROM fk_list_cte cte
JOIN parent_table1 pt1
ON cte.fk1 = pt1.id
),
WITH fetch_fk2_original_record_from_parent AS (
SELECT *
FROM fk_list_cte cte
JOIN parent_table2 pt2
ON cte.fk2 = pt2.id
),
WITH fetch_fk3_original_record_from_parent AS (
SELECT *
FROM fk_list_cte cte
JOIN parent_table3 pt3
ON cte.fk3 = pt3.id
),
WITH fetch_fk4_original_record_from_parent AS (
SELECT *
FROM fk_list_cte cte
JOIN parent_table4 pt4
ON cte.fk4 = pt4.id
),
CREATE TEMPORARY TABLE fk1_tmp_tbl AS (
SELECT *
FROM fetch_fk1_original_record_from_parent
)
CREATE TEMPORARY TABLE fk2_tmp_tbl AS (
SELECT *
FROM fetch_fk2_original_record_from_parent
)
CREATE TEMPORARY TABLE fk3_tmp_tbl AS (
SELECT *
FROM fetch_fk3_original_record_from_parent
)
CREATE TEMPORARY TABLE fk4_tmp_tbl AS (
SELECT *
FROM fetch_fk4_original_record_from_parent
);
END LOOP;
I know the 4 CREATE TEMPORARY TABLE statements definitely won't work, (can I create 4 temp tables simultaneously/at once?) . Does anyone see the logic of what I am trying to do here and can help me?

Related

Sum and average total columns in PostgreSQL

I'm using this query to find duplicate dates but not sure how to sum each duplicate dates, average it and remove duplicate dates.
DB Schema
date_time
datapoint_1
datapoint_2
SQL Query
SELECT date_time, COUNT(date_time)
FROM MYTABLE
GROUP BY date_time
HAVING COUNT(date_time) > 1
ORDER BY COUNT(date_time)
I would create a new table to replace the old one. That is easier and might even perform better:
CREATE TABLE mytable2 (LIKE mytable);
INSERT INTO mytable2 (date_time, datapoint_1, datapoint_2)
SELECT m.date_time, avg(m.datapoint_1), avg(m.datapoint_2)
FROM mytable AS m
GROUP BY m.date_time;
Then you can drop mytable and rename mytable2 to replace it.
To prevent new rows from creating duplicates, you could change the way you insert data:
-- to keep track of counts
ALTER TABLE mytable ADD numval integer DEFAULT 1;
-- to prevent duplicates
ALTER TABLE mytable ADD UNIQUE (date_time);
-- to insert new rows
INSERT INTO mytable (date_time, datapoint_1, datapoint_2)
VALUES ('2021-06-30', 42.0, -34.9)
ON CONFLICT (date_time)
DO UPDATE SET numval = mytable.numval + 1,
datapoint_1 = mytable.datapoint_1 + excluded.datapoint_1,
datapoint_2 = mytable.datapoint_2 + excluded.datapoint_2;
-- to select the averages
SELECT date_time,
datapoint_1 / numval AS datapoint_1,
datapoint_2 / numval AS datapoint_2
FROM mytable;
When you use GROUP BY you can also use aggregate functions to reduce multiple lines to a single one (COUNT, that you used is one of such functions). In your case the query would be:
SELECT date_time, avg(datapoint_1), avg(datapoint_2)
FROM MYTABLE
GROUP BY date_time
For every distinct date_time you will get a single row with the average of datapoint_1 and datapoint_2.

Temp table intermittent issue

We used to have a really badly performing CTE that used to take 3-5 mins to execute. I have modified the CTE and used a function with temp tables to accomplish the same task. Now the new function runs in less than 5 secs.
This is what I did:
From
WITH CTE1 AS (
SELECT ...
FROM ...
),
CTE2 AS (
SELECT ...
FROM ...
),
CTEn AS (
SELECT ...
FROM ...
)
SELECT A,B,C,D
FROM CTE1
JOIN CTE2 ON ...
JOIN CTEn ON ...;
TO
CREATE OR REPLACE FUNCTION FUNC_ABC(a integer)
RETURNS TABLE(A integer, B integer, C integer, D integer)
LANGUAGE plpgsql
AS $function$
DECLARE
x ALIAS for $1;
BEGIN
DROP TABLE IF EXISTS CTE1;
DROP TABLE IF EXISTS CTE2;
DROP TABLE IF EXISTS CTEn;
CREATE TEMP TABLE CTE1 AS
( SELECT ...
FROM ...
);
CREATE TEMP TABLE CTE2 AS
( SELECT ...
FROM ...);
CREATE TEMP TABLE CTEn AS
( SELECT ...
FROM ...);
CREATE INDEX ix_cte1 ON CTE1(A);
CREATE INDEX ix_cte2 ON CTE2(B);
CREATE INDEX ix_cten ON CTEn(C);
CREATE INDEX ix_cten ON CTEn(D);
RETURNS QUERY SELECT A,B,C,D
FROM CTE1
JOIN CTE2 ON ...
JOIN CTEn ON ...
END;
$function$
;
As I stated above, the function pretty fast. The reason behind adding "DROP TABLE" is that, within a transaction, this function can be executed any number of times. But, intermittently, we see an error like:
ERROR: must be owner of relation CTE1
I am not able to reproduce this error. And there is only one user that runs this function. No other user has permissions to execute this function.
I couldn't think of a scenario when this would fail. Any thoughts of insights will be appreciated.

In SQL Server 2000, how to delete the specified rows in a table that does not have a primary key?

Let's say we have a table with some data in it.
IF OBJECT_ID('dbo.table1') IS NOT NULL
BEGIN
DROP TABLE dbo.table1;
END
CREATE TABLE table1 ( DATA INT );
---------------------------------------------------------------------
-- Generating testing data
---------------------------------------------------------------------
INSERT INTO dbo.table1(data)
SELECT 100
UNION ALL
SELECT 200
UNION ALL
SELECT NULL
UNION ALL
SELECT 400
UNION ALL
SELECT 400
UNION ALL
SELECT 500
UNION ALL
SELECT NULL;
How to delete the 2nd, 5th, 6th records in the table? The order is defined by the following query.
SELECT data
FROM dbo.table1
ORDER BY data DESC;
Note, this is in SQL Server 2000 environment.
Thanks.
In short, you need something in the table to indicate sequence. The "2nd row" is a non-sequitur when there is nothing that enforces sequence. However, a possible solution might be (toy example => toy solution):
If object_id('tempdb..#NumberedData') Is Not Null
Drop Table #NumberedData
Create Table #NumberedData
(
Id int not null identity(1,1) primary key clustered
, data int null
)
Insert #NumberedData( data )
SELECT 100
UNION ALL SELECT 200
UNION ALL SELECT NULL
UNION ALL SELECT 400
UNION ALL SELECT 400
UNION ALL SELECT 500
UNION ALL SELECT NULL
Begin Tran
Delete table1
Insert table1( data )
Select data
From #NumberedData
Where Id Not In(2,5,6)
If ##Error <> 0
Commit Tran
Else
Rollback Tran
Obviously, this type of solution is not guaranteed to work exactly as you want but the concept is the best you will get. In essence, you stuff your rows into a table with an identity column and use that to identify the rows to remove. Removing the rows entails emptying the original table and re-populating with only the rows you want. Without a unique key of some kind, there just is no clean way of handling this problem.
As you are probably aware you can do this in later versions using row_number very straightforwardly.
delete t from
(select ROW_NUMBER() over (order by data) r from table1) t
where r in (2,5,6)
Even without that it is possible to use the undocumented %%LOCKRES%% function to differentiate between 2 identical rows
SELECT data,%%LOCKRES%%
FROM dbo.table1`
I don't think that's available in SQL Server 2000 though.
In SQL Sets don't have order but cursors do so you could use something like the below. NB: I was expecting to be able to use DELETE ... WHERE CURRENT OF but that relies on a PK so the code to delete a row is not as simple as I was hoping for.
In the event that the data to be deleted is a duplicate then there is no guarantee that it will delete the same row as CURRENT OF would have. However in this eventuality the ordering of the tied rows is arbitrary anyway so whichever row is deleted could equally well have been given that row number in the cursor ordering.
DECLARE #RowsToDelete TABLE
(
rowidx INT PRIMARY KEY
)
INSERT INTO #RowsToDelete SELECT 2 UNION SELECT 5 UNION SELECT 6
DECLARE #PrevRowIdx int
DECLARE #CurrentRowIdx int
DECLARE #Offset int
SET #CurrentRowIdx = 1
DECLARE #data int
DECLARE ordered_cursor SCROLL CURSOR FOR
SELECT data
FROM dbo.table1
ORDER BY data
OPEN ordered_cursor
FETCH NEXT FROM ordered_cursor INTO #data
WHILE EXISTS(SELECT * FROM #RowsToDelete)
BEGIN
SET #PrevRowIdx = #CurrentRowIdx
SET #CurrentRowIdx = (SELECT TOP 1 rowidx FROM #RowsToDelete ORDER BY rowidx)
SET #Offset = #CurrentRowIdx - #PrevRowIdx
DELETE FROM #RowsToDelete WHERE rowidx = #CurrentRowIdx
FETCH RELATIVE #Offset FROM ordered_cursor INTO #data
/*Can't use DELETE ... WHERE CURRENT OF as here that requires a PK*/
SET ROWCOUNT 1
DELETE FROM dbo.table1 WHERE (data=#data OR data IS NULL OR #data IS NULL)
SET ROWCOUNT 0
END
CLOSE ordered_cursor
DEALLOCATE ordered_cursor
To perform any action on a set of rows (such as deleting them), you need to know what identifies those rows.
So, you have to come up with criteria that identifies the rows you want to delete.
Providing a toy example, like the one above, is not particularly useful.
You plan ahead and if you anticipate this is possible you add a surrogate key column or some such.
In general you make sure you don't create tables without PK's.
It's like asking "Say I don't look both directions before crossing the road and I step in front of a bus."

tsql - using internal stored procedure as parameter is where clause

I'm trying to build a stored procedure that makes use of another stored procedure. Taking its result and using it as part of its where clause, from some reason I receive an error:
Invalid object name 'dbo.GetSuitableCategories'.
Here is a copy of the code:
select distinct top 6 * from
(
SELECT TOP 100 *
FROM [dbo].[products] products
where products.categoryId in
(select top 10 categories.categoryid from
[dbo].[GetSuitableCategories]
(
-- #Age
-- ,#Sex
-- ,#Event
1,
1,
1
) categories
ORDER BY NEWID()
)
--and products.Price <=#priceRange
ORDER BY NEWID()
)as d
union
select * from
(
select TOP 1 * FROM [dbo].[products] competingproducts
where competingproducts.categoryId =-2
--and competingproducts.Price <=#priceRange
ORDER BY NEWID()
) as d
and here is [dbo].[GetSuitableCategories] :
if (#gender =0)
begin
select * from categoryTable categories
where categories.gender =3
end
else
begin
select * from categoryTable categories
where categories.gender = #gender
or categories.gender =3
end
I would use an inline table valued user defined function. Or simply code it inline is no re-use is required
CREATE dbo.GetSuitableCategories
(
--parameters
)
RETURNS TABLE
AS
RETURN (
select * from categoryTable categories
where categories.gender IN (3, #gender)
)
Some points though:
I assume categoryTable has no gender = 0
Do you have 3 genders in your categoryTable? :-)
Why do pass in 3 parameters but only use 1? See below please
Does #sex map to #gender?
If you have extra processing on the 3 parameters, then you'll need a multi statement table valued functions but beware these can be slow
You can't use the results of a stored procedure directly in a select statement
You'll either have to output the results into a temp table, or make the sproc into a table valued function to do what you doing.
I think this is valid, but I'm doing this from memory
create table #tmp (blah, blah)
Insert into #tmp
exec dbo.sprocName

Delete N oldest entries in table

How to delete N oldest entries. I'm limited Sybase. I need to write a stored procedure which would accept a number X and then leave only X newest entries in the table.
For example:
Say ID is auto incremented. The smaller it is, the older this entry is.
ID Text
=========
1 ASD
2 DSA
3 HJK
4 OIU
I need a procedure which would be executed like this.
execute CleanUp 2
and the result will be
ID Text
=========
3 HJK
4 OIU
Note: SQL Server syntax, but should work
Delete from TableName where ID in
(select top N ID from TableName order by ID )
If you want N to be a parameter you will have to construct the statement string and execute it
declare #query varchar(4000)
set #query = 'Delete from TableName where ID in '
set #query = #query + '(select top ' + #N + ' ID from TableName order by ID )'
exec sp_executesql #query
I Like Eduardo's option best as it's the simplest solution, but since Sergej mentions it is quite slow, here's an alternative solution:
Create a stored procedure that does the following:
Create a temp table with the same structure as the original table.
Insert the top N rows into the temp table.
Truncate the original table.
Copy the rows from the temp table back to the original table.
Generally this will be much faster, especially if you have lots of rows in the table.
If you have a clustered index on id, it's safe to execute a delete top query.
delete top 2 from TableName;
I know this is an old question but this can be done without constructing the statement as the top answer say, using a CTE :
WITH MyCTE AS
(
SELECT Field1, Field2, ROW_NUMBER() OVER (ORDER BY Field1 ASC) AS RowNum
FROM MyTable
WHERE Field2 = #WhatIWant
)
DELETE FROM MyCTE WHERE RowNum <= #NbRowsToDelete;