How to create temporary tables inside a postgres function - postgresql

The problem is that by the time I reach the UPDATE block, I no longer have access to the data that was in subquery.
I tried many variations of creating a temporary table and doing a SELECT INTO from deleted_rows instead of making subquery part of the WITH statement, but nothing I tried was accepted, and it especially rejected creating a table after the initial WITH clause.
CREATE OR REPLACE FUNCTION public.aggregate_userviews(
)
RETURNS text
LANGUAGE 'plpgsql'
COST 100
VOLATILE
AS $BODY$BEGIN
WITH deleted_rows AS (
DELETE FROM user_details_views
WHERE ts < (timezone('UTC', now() - interval '5 minutes')) RETURNING *
), subquery AS (SELECT DISTINCT username, DATE(ts) as day_of_month, COUNT(id) AS user_views
FROM deleted_rows
GROUP BY username, day_of_month
ORDER BY day_of_month ASC)
INSERT INTO analytics_summary ( username, day_of_month, user_views)
SELECT username, day_of_month, user_views
FROM subquery
ON CONFLICT (username ,day_of_month)
DO UPDATE SET user_views = analytics_summary.user_views + excluded.user_views;
UPDATE user_details u
SET view_count = u.view_count + subquery.user_views
FROM subquery
WHERE u.username=subquery.username;
RETURN NULL;
END;$BODY$;
If I remove the UPDATE statement it works perfectly. I could probably use a trigger to do the update, but I would rather not if I am not far off from a solution with what I have.

Got it: I had to create the temp table above the WITH, fill it before the first insert, and then use the temp table for the following two blocks of code, like this:
CREATE OR REPLACE FUNCTION public.aggregate_userviews(
)
RETURNS text
LANGUAGE 'plpgsql'
COST 100
VOLATILE
AS $BODY$BEGIN
create temporary table temp_userviews_table (username varchar, day_of_month date, user_views int);
WITH deleted_rows AS (
DELETE FROM user_details_views
WHERE ts < (timezone('UTC', now() - interval '5 minutes')) RETURNING *
), subquery AS (SELECT DISTINCT username, DATE(ts) as day_of_month, COUNT(id) AS user_views
FROM deleted_rows
GROUP BY username, day_of_month
ORDER BY day_of_month ASC)
INSERT INTO temp_userviews_table (username, day_of_month, user_views)
SELECT username, day_of_month, user_views
FROM subquery;
INSERT INTO analytics_summary ( username, day_of_month, user_views)
SELECT username, day_of_month, user_views
FROM temp_userviews_table
ON CONFLICT (username ,day_of_month)
DO UPDATE SET user_views = analytics_summary.user_views + excluded.user_views;
UPDATE user_details u
SET view_count = u.view_count + temp_userviews_table.user_views
FROM temp_userviews_table
WHERE u.username=temp_userviews_table.username;
drop table temp_userviews_table;
RETURN NULL;
END;$BODY$;
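For comparison, the same work can probably be done without a temp table by chaining data-modifying CTEs, since a CTE stays visible to every part of the one statement it belongs to. This is a minimal sketch, not tested against the asker's schema. The function name aggregate_userviews_single_stmt and the CTE names per_day, upserted and per_user are hypothetical. It also sums the views per username before the UPDATE, on the assumption that a user with deleted rows on more than one day should have all of them added to view_count.

```sql
-- Hypothetical name, so it does not clash with the existing function.
CREATE OR REPLACE FUNCTION public.aggregate_userviews_single_stmt()
RETURNS void
LANGUAGE plpgsql
AS $BODY$
BEGIN
    WITH deleted_rows AS (
        -- Remove the old rows and hand them to the next CTE.
        DELETE FROM user_details_views
        WHERE ts < timezone('UTC', now() - interval '5 minutes')
        RETURNING *
    ), per_day AS (
        -- One row per user and day.
        SELECT username, DATE(ts) AS day_of_month, COUNT(id) AS user_views
        FROM deleted_rows
        GROUP BY username, DATE(ts)
    ), upserted AS (
        -- Runs even though the final UPDATE never reads from it.
        INSERT INTO analytics_summary (username, day_of_month, user_views)
        SELECT username, day_of_month, user_views
        FROM per_day
        ON CONFLICT (username, day_of_month)
        DO UPDATE SET user_views = analytics_summary.user_views + excluded.user_views
    )
    UPDATE user_details u
    SET view_count = u.view_count + per_user.user_views
    FROM (
        -- Assumption: collapse to one row per user so each user is updated once.
        SELECT username, SUM(user_views)::bigint AS user_views
        FROM per_day
        GROUP BY username
    ) AS per_user
    WHERE u.username = per_user.username;
END;
$BODY$;
```

If the temp table is kept instead, declaring it with ON COMMIT DROP would remove the need for the explicit drop table at the end.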

Related

Trigger taking time to insert data in postgres (column count 300)

I have created a trigger, and it takes a long time when inserting multiple records.
Inserting 1 or 2 records works fine. But with more than 1000 records it is not fast: the query has been running for 2 hours.
The table below has only 15 columns. My actual table has 300 columns.
Is there any other way to insert multiple records into the table with the trigger?
Table
create table patients (
id serial,
name character varying (50),
daily varchar (8),
month varchar (6),
quarter varchar (6),
registration_date timestamp,
age integer,
address text,
country text,
city text,
phone_number integer,
Education text,
Occupation text,
Marital_Status text,"E-mail" text
);
trigger function
CREATE OR REPLACE FUNCTION update_data_after_insert_data_into_patients()
RETURNS trigger AS
$$BEGIN
update patients t1
set quarter=t2.quarter
from (SELECT (extract(year from registration_date)::text || 'Q' || extract(quarter from registration_date)::text) as quarter,registration_date
from patients) t2 where t1.registration_date =t2.registration_date;
update patients t1
set month=t2.month
from (select (extract(year from registration_date)::text || '' || to_char(registration_date,'MM')) as month,registration_date
from patients) t2 where t1.registration_date =t2.registration_date;
update patients t1
set daily=t2.daily
from (select extract(year from registration_date) || '' ||to_char(registration_date,'MM') || '' || to_char(registration_date,'DD') as daily,registration_date
from patients) t2 where t1.registration_date =t2.registration_date;
RETURN new;
END;
$$ LANGUAGE plpgsql;
Trigger definition
create TRIGGER trigger_update_data_after_insert_patients
AFTER insert ON patients
FOR EACH ROW
EXECUTE PROCEDURE update_data_after_insert_data_into_patients();
insert multiple records into patients table
INSERT INTO public.patients
("name", daily, "month", quarter, registration_date, age, address, country, city, phone_number, education, occupation, marital_status, "E-mail")
VALUES('Adam', '20221215', '202212', '2022Q4', '2022-08-17 19:01:10-08', 24, '', '', '', 1245578, '', '', '', '');
select statement
select * from patients;
Your trigger updates every row in the table three times for each inserted row (the self-join on registration_date matches every row with itself) - just to calculate those generated columns.
You can do this more efficiently by assigning the generated values to the NEW record in a BEFORE trigger.
CREATE OR REPLACE FUNCTION update_data_after_insert_data_into_patients()
RETURNS trigger AS
$$
BEGIN
new.quarter := to_char(new.registration_date, 'yyyy"Q"q');
new.month := to_char(new.registration_date, 'yyyymm');
new.daily := to_char(new.registration_date, 'yyyymmdd');
RETURN new;
END;
$$
LANGUAGE plpgsql;
create TRIGGER trigger_update_data_after_insert_patients
BEFORE insert ON patients
FOR EACH ROW
EXECUTE PROCEDURE update_data_after_insert_data_into_patients();
However I don't see the need to store these calculated values when you can easily format the registration_date when retrieving the data. I would get rid of those columns and the trigger and create a VIEW that does the formatting.
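A minimal sketch of such a view follows. The view name patients_with_periods is hypothetical, and only some of the columns from the question's table are listed; the rest would be added the same way. It assumes the formatted values should look like the ones in the question ('20221215', '202212', '2022Q4').

```sql
-- Hypothetical view name; the period columns are computed on read,
-- so no trigger and no stored daily/month/quarter columns are needed.
CREATE VIEW patients_with_periods AS
SELECT
    p.id,
    p.name,
    p.registration_date,
    to_char(p.registration_date, 'YYYYMMDD') AS daily,
    to_char(p.registration_date, 'YYYYMM')   AS month,
    to_char(p.registration_date, 'YYYY"Q"Q') AS quarter,
    p.age,
    p.country,
    p.city
FROM patients p;

-- Usage: filter on a formatted value as if it were a stored column.
SELECT id, name, quarter
FROM patients_with_periods
WHERE quarter = '2022Q3';
```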

How to check content loading status on a static database?

We have a static database we constantly update with loader scripts. These loader scripts get current information from third party sources, clean it and upload it to database.
I have already made some SQL scripts to ensure schemas and tables required exists. Now I'd like to check that each table has the expected row count.
I did something like this:
select case when count(*) = <someNumber>
then 'someSchema.someTable OK'
else 'someSchema.someTable BAD row count' end
from someSchema.someTable;
But doing these kind of queries for ~300 tables is cumbersome.
Now I was thinking maybe there's a way to have a table like:
create table expected_row_count (
schema_name varchar,
table_name varchar,
row_count bigint
);
And somehow test all listed tables and only output the ones that fail the count check. But I'm kind of lost now... Should I try to write a function? Can a table like this be used to build queries and execute them?
Whole credit goes to @a_horse_with_no_name; I'm posting a reply for completeness:
Check row count
First let's create some data to test the query:
create schema if not exists data;
create table if not exists data.test1 (nothing int);
create table if not exists data.test2 (nothing int);
insert into data.test1 (nothing)
(select random() from generate_series(1, 28));
insert into data.test2 (nothing)
(select random() from generate_series(1, 55));
create table if not exists public.expected_row_count (
table_schema varchar not null default '',
table_name varchar not null default '',
row_count bigint not null default 0
);
insert into public.expected_row_count (table_schema, table_name, row_count) values
('data', 'test1', (select count(*) from data.test1)),
('data', 'test2', (select count(*) from data.test2))
;
Now the query to check the data:
select * from (
select
table_schema,
table_name,
(xpath('/row/cnt/text()', xml_count))[1]::text::int as row_count
from (
select
table_schema,
table_name,
query_to_xml(format('select count(*) as cnt from %I.%I', table_schema, table_name), false, true, '') as xml_count
from information_schema.tables
where table_schema = 'data' --<< change here for the schema you want
) infs ) as r
inner join expected_row_count erc
on r.table_schema = erc.table_schema
and r.table_name = erc.table_name
and r.row_count != erc.row_count
;
The query above should return an empty result if all counts are OK, and otherwise the tables whose counts differ. To check it, change the expected count for some table in expected_row_count and re-run the query. For example:
update expected_row_count set row_count = 666 where table_name = 'test1';
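On the question of whether a function could do this: a PL/pgSQL function can loop over expected_row_count and build each count query with format() and EXECUTE. This is a rough sketch under that assumption. The function name check_row_counts and its output column names are hypothetical. Unlike the query above, it is driven by expected_row_count, so a listed table that does not exist raises an error instead of being silently skipped.

```sql
CREATE OR REPLACE FUNCTION public.check_row_counts()
RETURNS TABLE (schema_name text, tbl_name text, expected_count bigint, actual_count bigint)
LANGUAGE plpgsql
AS $$
DECLARE
    rec record;
BEGIN
    FOR rec IN
        SELECT erc.table_schema, erc.table_name, erc.row_count
        FROM public.expected_row_count AS erc
        ORDER BY erc.table_schema, erc.table_name
    LOOP
        -- %I quotes the identifiers, so unusual schema or table names stay safe.
        EXECUTE format('SELECT count(*) FROM %I.%I', rec.table_schema, rec.table_name)
        INTO actual_count;

        -- Only report tables whose actual count differs from the expected one.
        IF actual_count <> rec.row_count THEN
            schema_name := rec.table_schema;
            tbl_name := rec.table_name;
            expected_count := rec.row_count;
            RETURN NEXT;
        END IF;
    END LOOP;
END;
$$;

-- Usage: returns no rows when every table matches.
SELECT * FROM public.check_row_counts();
```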

Create table with stored function postgresql

I have a query for create table like below
DROP TABLE IF EXISTS BAJUL;
CREATE TABLE BAJUL AS (
SELECT dt_trx, row_number() OVER (ORDER BY dt_trx DESC) AS row_number
FROM stock_trx_idx
WHERE dt_trx BETWEEN '2017-01-01' AND '2017-02-28'
GROUP BY 1
ORDER BY 1 DESC);
How can I create the above table with a stored function in PostgreSQL?
I tried the script below:
CREATE OR REPLACE FUNCTION my_function (dt1 DATE, dt2 DATE)
RETURNS VOID AS
$func$
BEGIN
EXECUTE format('
DROP TABLE IF EXISTS tblq;
CREATE TABLE IF NOT EXISTS tblq AS(
SELECT dt_trx, row_number() OVER (ORDER BY dt_trx DESC) AS row_number
FROM stock_trx
WHERE dt_trx BETWEEN dt1 AND dt2
GROUP BY 1
ORDER BY 1 DESC
)' );
END
$func$ LANGUAGE plpgsql;
But when I try to execute the function like below
SELECT my_function ('2017-01-01', '2017-02-28');
I got error --> ERROR: column "dt1" does not exist
Would like to seek your help.
Thanks & rgds,
Bayu
Use
format('CREATE ... WHERE dt_trx BETWEEN %L AND %L ...', dt1, dt2)
Your error is obvious. In the SELECT statement,
SELECT dt_trx, row_number() OVER (ORDER BY dt_trx DESC) AS row_number
FROM stock_trx
WHERE dt_trx BETWEEN dt1 AND dt2
GROUP BY 1
ORDER BY 1 DESC
The dt1 column doesn't exist: inside the string passed to EXECUTE, dt1 is just text, so PostgreSQL looks for a column with that name. You never told it to use your variable. Try concatenating your variables into the string.
By the way, you can drop the ORDER BY if you're creating a table with that statement.
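Putting both answers together, the function might look like the sketch below. It keeps the asker's names (my_function, tblq, stock_trx), runs the DROP as a plain statement, and uses format() with %L so the two dates are inserted as properly quoted literals. Treat it as an untested outline rather than a drop-in replacement.

```sql
CREATE OR REPLACE FUNCTION my_function(dt1 date, dt2 date)
RETURNS void
LANGUAGE plpgsql
AS $func$
BEGIN
    -- No variables involved, so this does not need dynamic SQL.
    DROP TABLE IF EXISTS tblq;

    -- %L turns each date into a quoted literal such as '2017-01-01'.
    EXECUTE format(
        'CREATE TABLE tblq AS
         SELECT dt_trx, row_number() OVER (ORDER BY dt_trx DESC) AS row_number
         FROM stock_trx
         WHERE dt_trx BETWEEN %L AND %L
         GROUP BY 1',
        dt1, dt2);
END
$func$;

-- Usage, as in the question:
SELECT my_function('2017-01-01', '2017-02-28');
```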

Multi-INSERT with unchangeable param

Is there any way to INSERT multiple rows that all share one value taken from the DB, so that the value cannot change between rows?
I thought about WITH but without success:
WITH t as (SELECT date_trunc('hour', NOW()))
INSERT INTO my_table(ID, TIME) VALUES (1,t),(2,t);
No need for the CTE, just use a plain SELECT as the source for the insert:
insert into my_table (id, time)
select i, date_trunc('hour', NOW())
from generate_series(1,2) i;
If you really want the CTE, you need to select from it in the values clause:
WITH t as (
SELECT date_trunc('hour', NOW()) hour_t
)
INSERT INTO my_table(ID, TIME)
VALUES
(1, (select hour_t from t)),
(2, (select hour_t from t));
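A third variant, sketched here with the same my_table and hour_t names, keeps the CTE but joins it to the list of ids, so the scalar subquery does not have to be repeated for every row. The alias v is hypothetical. Note that now() returns the transaction start time, so it is already constant within a single statement.

```sql
WITH t AS (
    SELECT date_trunc('hour', now()) AS hour_t
)
INSERT INTO my_table (id, time)
SELECT v.id, t.hour_t
FROM (VALUES (1), (2)) AS v(id)
CROSS JOIN t;  -- t has exactly one row, so each id gets the same hour_t
```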

How to write a multi-parameter CTE script?

I am trying to write a TSQL script for an SSRS report that uses a CTE to select records based on the parameters chosen. I'm looking for the most efficient way to do this, either all in TSQL and/or SSRS. I have 4 parameters which can be set to NULL (All values) or one specific value. Then in my CTE, I have the following line:
ROW_NUMBER() over(partition by G.[program_providing_service],G.people_id
order by G.[actual_date] desc) as rowID
This above CTE is for the case when Program is NULL and People is not null. My 4 parameters are:
Program, Facility, Staff, and People.
So I only want to partition values when they are NULL. Currently I implement this by one CTE depending on the parameter values. For example, if they choose NULL for all parameters except People, then this CTE would look like:
ROW_NUMBER() over(partition by G.people_id
order by G.[actual_date] desc) as rowID
Or if all 5 parameters are null:
ROW_NUMBER() over(partition by G.[program_providing_service], G.[site_providing_service], G.staff_id, G.people_id
order by G.[actual_date] desc) as rowID
If they do not choose NULL for any of the 4 parameters, then I probably do not need to partition by any field since I just want the top 1 record ordered by actual_date descending. This is what my CTE looks like:
;with cte as
(
Select distinct
G.[actual_date],
G.[site_providing_service],
p.[program_name],
G.[staff_id],
G.program_providing_service,
ROW_NUMBER() over(partition by G.[program_providing_service],G.people_id
order by G.[actual_date] desc) as rowID
From
event_log_rv G With (NoLock)
WHERE
...
AND (@ClientID Is Null OR [people_id]=@ClientID)
AND (@StaffID Is Null OR [staff_id] = @StaffID)
AND (@FacilityID Is Null OR [site_providing_service] = @FacilityID)
AND (@ProgramID Is Null OR [program_providing_service] = @ProgramID)
and (@SupervisorID is NULL OR staff_id in (select staff_id from #supervisors))
)
SELECT
[actual_date],
[site_providing_service],
[program_name],
[staff_id],
program_providing_service,
people_id,
rowID
FROM cte WHERE rowid = 1
ORDER BY [Client_FullName]
where the ROW_NUMBER line varies depending on the parameters chosen. Currently I have 5 IF statements in this TSQL script that look like:
IF @ProgramID IS NOT NULL AND @ClientID IS NULL
BEGIN
...
END
with one CTE in each of these IF statements:
IF @FacilityID IS NOT NULL AND @ClientID IS NULL
BEGIN
...
END
IF @ProgramID IS NOT NULL AND @ClientID IS NULL
BEGIN
...
END
IF @StaffID IS NOT NULL AND @ClientID IS NULL
BEGIN
...
END
IF @ClientID IS NOT NULL
BEGIN
...
END
How can I code for all possible options, whether they choose NULL or specific values?
OMG.... it took me a long time to understand what you want to do. There is some contradiction in your description. Please revisit it. You said you only want to partition on values when they are NULL; then you also said that when they choose NULL for all parameters except People, you partition on people....
Whichever way you want it, partitioning on 'null' or on 'not null', you can construct dynamic SQL to achieve this instead of adding a lot of [if...else].
The following code is pseudo, definitely not tested. It is just a hint. It makes one assumption: your parameters have a fixed priority in the partition order, for example, if Program is null (or not null), Program comes first.
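declare @sql nvarchar(max)
declare @partition nvarchar(max) = ''
-- Partition only on the columns whose parameter is NULL ("All values").
-- Parameter names follow the question (@ProgramID, @FacilityID, @StaffID, @ClientID).
if (@ProgramID is null)
set @partition = @partition + 'G.[program_providing_service],'
if (@FacilityID is null)
set @partition = @partition + 'G.[site_providing_service],'
if (@StaffID is null)
set @partition = @partition + 'G.staff_id,'
if (@ClientID is null)
set @partition = @partition + 'G.people_id,'
-- Drop the trailing comma; if no parameter is NULL, leave out PARTITION BY entirely.
if (len(@partition) > 0)
set @partition = 'partition by ' + left(@partition, len(@partition) - 1)
set @sql = N'
;with cte as
(
Select distinct
G.[actual_date],
G.[site_providing_service],
p.[program_name],
G.[staff_id],
G.program_providing_service,
ROW_NUMBER() over(' + @partition + N'
order by G.[actual_date] desc) as rowID
From
event_log_rv G With (NoLock)
WHERE
...
AND (@ClientID Is Null OR [people_id]=@ClientID)
AND (@StaffID Is Null OR [staff_id] = @StaffID)
AND (@FacilityID Is Null OR [site_providing_service] = @FacilityID)
AND (@ProgramID Is Null OR [program_providing_service] = @ProgramID)
and (@SupervisorID is NULL OR staff_id in (select staff_id from #supervisors))
)
SELECT
[actual_date],
[site_providing_service],
[program_name],
[staff_id],
program_providing_service,
people_id,
rowID
FROM cte WHERE rowid = 1
ORDER BY [Client_FullName]
'
-- exec(@sql) cannot see the outer variables, so pass them in explicitly.
-- The int types are an assumption; use the real parameter types.
exec sp_executesql @sql,
N'@ClientID int, @StaffID int, @FacilityID int, @ProgramID int, @SupervisorID int',
@ClientID, @StaffID, @FacilityID, @ProgramID, @SupervisorID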
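The same "partition only on the columns whose parameter is NULL" rule can probably also be written without dynamic SQL, by wrapping each partition column in a CASE expression that collapses to NULL when its parameter is set. The sketch below is self-contained T-SQL with parameters written in the usual @name form. The table variable @event_log, its sample rows, and the int parameter types are all made up for illustration and stand in for event_log_rv.

```sql
-- Hypothetical parameter values and sample data, for illustration only.
DECLARE @ProgramID int = NULL, @FacilityID int = NULL,
        @StaffID int = NULL, @ClientID int = 7;

DECLARE @event_log TABLE (
    actual_date date,
    program_providing_service int,
    site_providing_service int,
    staff_id int,
    people_id int
);

INSERT INTO @event_log VALUES
    ('2017-01-05', 1, 10, 100, 7),
    ('2017-02-01', 1, 10, 100, 7),
    ('2017-01-20', 2, 11, 101, 7);

WITH cte AS (
    SELECT G.*,
           ROW_NUMBER() OVER (
               -- A column takes part in the partition only when its parameter is NULL;
               -- otherwise the CASE yields NULL for every row and has no effect.
               PARTITION BY
                   CASE WHEN @ProgramID  IS NULL THEN G.program_providing_service END,
                   CASE WHEN @FacilityID IS NULL THEN G.site_providing_service END,
                   CASE WHEN @StaffID    IS NULL THEN G.staff_id END,
                   CASE WHEN @ClientID   IS NULL THEN G.people_id END
               ORDER BY G.actual_date DESC
           ) AS rowID
    FROM @event_log G
    WHERE (@ClientID   IS NULL OR G.people_id = @ClientID)
      AND (@StaffID    IS NULL OR G.staff_id = @StaffID)
      AND (@FacilityID IS NULL OR G.site_providing_service = @FacilityID)
      AND (@ProgramID  IS NULL OR G.program_providing_service = @ProgramID)
)
SELECT * FROM cte WHERE rowID = 1;
```

When every parameter has a value, all four CASE expressions are NULL, the whole result is a single partition, and rowID = 1 picks the one most recent row. The trade-off against dynamic SQL is that the optimizer sees one generic plan for all parameter combinations.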