Postgres refresh materialized view doesn't persist new data

I have an AWS lambda function that refreshes a Postgres view. The data appears to refresh for the current connection, but does not persist for future connections.
from psycopg2 import extensions

kwargs = {'async': 0}
conn = extensions.connection(dsn, **kwargs)
cur = conn.cursor()

cur.execute("select max(updated_at), count(*) from my_view;")
last_updated = cur.fetchone()
print("LambdaRefreshView: Starting to refresh, last updated {}".format(last_updated))

cur.execute("REFRESH MATERIALIZED VIEW my_view;")

cur.execute("select max(updated_at), count(*) from my_view;")
last_updated = cur.fetchone()
print("LambdaRefreshView: Finished refreshing, last updated {}".format(last_updated))
outputs:
LambdaRefreshView: Starting to refresh, last updated (2021-09-24 21:31:46, 299906)
LambdaRefreshView: Finished refreshing, last updated (2021-09-29 15:37:37, 302511)
but then I run it again, and the first select query still shows old data. The view doesn't "stay" refreshed with the latest data:
LambdaRefreshView: Starting to refresh, last updated (2021-09-24 21:31:46, 299906)
LambdaRefreshView: Finished refreshing, last updated (2021-09-29 15:37:37, 302514)
I've tried different variations of the refresh command, with and without CONCURRENTLY and WITH DATA, but it has no effect.
Any ideas? Python 3.6.14, psycopg2 2.9.1, PostgreSQL 9.6.22
If I run REFRESH MATERIALIZED VIEW CONCURRENTLY my_view; from psql, with the same DSN/user/password, it refreshes properly! And re-running from Lambda shows the updated data:
psql:
postgres=> select max(updated_at), count(*) from my_view;
max | count
-------------------------------+--------
2021-09-24 21:31:46.262262+00 | 299906
(1 row)
postgres=> REFRESH MATERIALIZED VIEW CONCURRENTLY my_view;
REFRESH MATERIALIZED VIEW
postgres=> select max(updated_at), count(*) from my_view;
max | count
-------------------------------+--------
2021-09-29 15:40:53.123746+00 | 302515
(1 row)
And the Lambda function sees this new, fresh data:
LambdaRefreshView: Starting to refresh, last updated (2021-09-29 15:40:53, 302515)
LambdaRefreshView: Finished refreshing, last updated (2021-09-29 15:40:53, 302515)
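For anyone else comparing notes: psycopg2 does not autocommit by default, so every statement the Lambda runs sits in one implicit transaction that is rolled back when the connection is discarded, which would produce exactly this "refreshed for this connection, stale for the next one" behaviour. A minimal sketch of the commit (or autocommit) variant, assuming the same dsn as in the snippet above:

import psycopg2

conn = psycopg2.connect(dsn)        # dsn as above
try:
    with conn.cursor() as cur:
        cur.execute("REFRESH MATERIALIZED VIEW my_view;")
    conn.commit()                   # without this, the refresh is rolled back when the connection closes
finally:
    conn.close()

(psql autocommits each statement by default, which would explain why the manual REFRESH stuck.)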

Related

SQL Error: ERROR: not all tokens processed

I am getting the error below in Postgres while executing insert and delete queries. I have around 50 insert and 50 delete statements. When they are executed, I get this error:
SQL Error: ERROR: not all tokens processed
The error is not consistent. For example, one run fails on the 20th delete statement; the next time the same queries are executed, the 25th delete statement fails. When those statements are executed on their own, there is no failure.
I am not sure whether this is a database load issue or an infrastructure issue. Any suggestion would be helpful.
Below is the query:
WITH del_table_1 AS
(
    delete from table_1
    where to_date('01-'||col1,'DD-mm-YYYY') < current_date-1
    RETURNING *
)
update control_table
set deleted_count = cnt, status = 'Completed',
    update_user_id = 'User', update_datetime = current_date
from (select 'Table1' as table_name, count(*) as cnt from del_table_1) aa
where control_table.table_name = aa.table_name
  and control_table.table_name = 'Table1'
  and control_table.status = 'Pending';

Update query is not working inside a function, but the same query works when run manually

I am creating a function in which the update command is used two times in a row; the first update works but the second one does not.
I have tried execute format() for the second update, but it still does not work.
While running the function the second update does not take effect, but when I run the same update command manually, the table gets updated.
The code is as follows:
update edmonton.weekly_pmt_report
set permit_number = pmt.prnum
from (select permit_details,
             split_part(permit_details, '-', 1) as prnum
      from edmonton.weekly_pmt_report) as pmt
where edmonton.weekly_pmt_report.permit_details = pmt.permit_details;

execute format('update edmonton.weekly_pmt_report
                set address = ds_dt.adr,
                    job_description = ds_dt.job,
                    applicant = ds_dt.apnt
                from (select split_part(per_num, ''-'', 1) as job_id,
                             job_des as job, addr as adr, applic as apnt
                      from edmonton.descriptive_details) as ds_dt
                where edmonton.weekly_pmt_report.permit_number = ds_dt.job_id');
It turned out the second update was working after all: it only matched about 400 of the 1,000 rows, so the rows left with NULL in those columns sorted to the top, which made it look as if nothing had been updated...
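A quick sanity check for this kind of "silently working" update is to count how many rows actually match the join before concluding the statement failed. A sketch reusing the table and column names from the snippet above (the DSN is a placeholder, and FILTER needs PostgreSQL 9.4 or later):

import psycopg2

conn = psycopg2.connect("dbname=edmonton")   # placeholder DSN
with conn, conn.cursor() as cur:
    cur.execute("""
        select count(*) filter (where d.job_id is not null) as matched,
               count(*) filter (where d.job_id is null)     as unmatched
        from edmonton.weekly_pmt_report r
        left join (select split_part(per_num, '-', 1) as job_id
                   from edmonton.descriptive_details) d
               on r.permit_number = d.job_id
    """)
    print(cur.fetchone())   # e.g. (400, 600) would explain why the NULLs sort to the top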

Tracking amount of bytes written to temp tables

With PostgreSQL 9.5, I would like to track the total amount of bytes written (since DB cluster start) to:
1. WAL
2. temp files
3. temp tables
For 1.:
select
    pg_size_pretty(archived_count * 16*1024*1024) wal_bytes,
    (now() - stats_reset)::text uptime
from pg_stat_archiver;
For 2.:
select
(now() - stats_reset)::text uptime,
pg_size_pretty(temp_bytes) temp_bytes
from pg_stat_database where datname = 'mydb';
How do I get 3.?
In response to a comment below, I did some tests to check where temp tables are actually written.
First, the DB parameter temp_buffers is at 8GB on this cluster:
select pg_size_pretty(setting::bigint*8192) from pg_settings
where name = 'temp_buffers';
-- "8192 MB"
Let's create a temp table:
drop table if exists foo;
create temp table foo as
select random() from generate_series(1, 1000000000);
-- Query returned successfully: 1000000000 rows affected, 10:22 minutes execution time.
Check the PostgreSQL backend pid and OID of the created temp table:
select pg_backend_pid(), 'pg_temp.foo'::regclass::oid;
-- 46573;398695055
Check the RSS size of the backend process:
~$ grep VmRSS /proc/46573/status
VmRSS: 9246276 kB
As can be seen, this is only slightly above the 8GB set with temp_buffers.
The data inserted into the temp table is, however, written out immediately, and it goes to the normal tablespace directories, not to temp files:
select * from pg_relation_filepath('pg_temp.foo')
-- "base/16416/t3_398695055"
Here is the number of files and amount written:
with temp_table_files as
(
select * from pg_ls_dir('base/16416/') fn
where fn like 't3_398695055%'
)
select
count(*) as cnt,
pg_size_pretty(sum((pg_stat_file('base/16416/' || fn)).size)) as size
from temp_table_files;
-- 34;"34 GB"
And finally verify that the set of temp files owned by this backend PID is indeed empty:
with temp_files_per_pid as
(
with temp_files as
(
select
temp_file,
(regexp_replace(temp_file, $r$^pgsql_tmp(\d+)\..*$$r$, $rr$\1$rr$, 'g'))::int as pid,
(pg_stat_file('base/pgsql_tmp/' || temp_file)).size as size
from pg_ls_dir('base/pgsql_tmp') temp_file
)
select pid, pg_size_pretty(sum(size)) from temp_files group by pid order by pid
)
select * from temp_files_per_pid where pid = 46573;
Returns nothing.
What is also "interesting": after dropping the temp table
DROP TABLE foo;
the RSS of the backend process does not go down:
~$ grep VmRSS /proc/46573/status
VmRSS: 9254544 kB
Doing the following will also not free the RSS again:
RESET ALL;
DEALLOCATE ALL;
DISCARD TEMP;
As far as I know, there is no dedicated statistic for temp tables. A temp table is buffered in session (process) memory up to temp_buffers (8MB by default); once those buffers are full, further pages are written out to disk, to the temp table's relation files (as your test shows) rather than to pgsql_tmp temp files.
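Lacking a cumulative counter, the closest workaround I can think of is to poll the on-disk size of all temp-table relation files and track the deltas yourself, generalising the per-table check above. A rough sketch (point-in-time only, not bytes written since cluster start; the DSN is a placeholder, and pg_ls_dir/pg_stat_file need superuser on 9.5):

import psycopg2

conn = psycopg2.connect("dbname=mydb")   # placeholder DSN
with conn, conn.cursor() as cur:
    cur.execute("""
        select coalesce(pg_size_pretty(
                   sum((pg_stat_file('base/' || db.oid || '/' || fn)).size)), '0 bytes')
        from pg_database db,
             pg_ls_dir('base/' || db.oid::text) fn
        where db.datname = current_database()
          and fn ~ '^t[0-9]+_[0-9]+'   -- temp relations are named t<backendid>_<filenode>
    """)
    print("temp-table files on disk:", cur.fetchone()[0])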

RedShift copy command return

Can we get the number of rows inserted through the copy command? Some records might fail, so what is the number of records that were successfully inserted?
I have a file with JSON objects in Amazon S3 and am trying to load the data into Redshift using the copy command. How do I know how many records were inserted successfully and how many failed?
Loading some example data:
db=# copy test from 's3://bucket/data' credentials '' maxerror 5;
INFO: Load into table 'test' completed, 4 record(s) loaded successfully.
COPY
db=# copy test from 's3://bucket/err_data' credentials '' maxerror 5;
INFO: Load into table 'test' completed, 1 record(s) loaded successfully.
INFO: Load into table 'test' completed, 2 record(s) could not be loaded. Check 'stl_load_errors' system table for details.
COPY
Then the following query:
with _successful_loads as (
select
stl_load_commits.query
, listagg(trim(filename), ', ') within group(order by trim(filename)) as filenames
from stl_load_commits
left join stl_query using(query)
left join stl_utilitytext using(xid)
where rtrim("text") = 'COMMIT'
group by query
),
_unsuccessful_loads as (
select
query
, count(1) as errors
from stl_load_errors
group by query
)
select
query
, filenames
, sum(stl_insert.rows) as rows_loaded
, max(_unsuccessful_loads.errors) as rows_not_loaded
from stl_insert
inner join _successful_loads using(query)
left join _unsuccessful_loads using(query)
group by query, filenames
order by query, filenames
;
Giving:
query | filenames | rows_loaded | rows_not_loaded
-------+------------------------------------------------+-------------+-----------------
45597 | s3://bucket/err_data.json | 1 | 2
45611 | s3://bucket/data1.json, s3://bucket/data2.json | 4 |
(2 rows)
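If you also want to know why the rejected records failed, stl_load_errors (the table referenced in the INFO message above) can be queried directly. A sketch using psycopg2, which also speaks to Redshift; the connection parameters are placeholders:

import psycopg2

# Placeholder connection details; Redshift accepts Postgres-protocol clients.
conn = psycopg2.connect(host="my-cluster.example.com", port=5439,
                        dbname="db", user="user", password="secret")
with conn, conn.cursor() as cur:
    cur.execute("""
        select query, trim(filename), line_number, trim(colname), trim(err_reason)
        from stl_load_errors
        order by starttime desc
        limit 10
    """)
    for row in cur.fetchall():
        print(row)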

Deadlock between select and truncate (postgresql)

Table output_values_center1 (and some others) inherits from output_values. Periodically I truncate output_values_center1 and load new data (in one transaction). During that time a user can request some data and gets an error message. Why does this happen at all (the select query requests only one record), and how can I avoid it? The error is:
2010-05-19 14:43:17 UTC ERROR: deadlock detected
2010-05-19 14:43:17 UTC DETAIL: Process 25972 waits for AccessShareLock on relation 2495092 of database 16385; blocked by process 26102.
Process 26102 waits for AccessExclusiveLock on relation 2494865 of database 16385; blocked by process 25972.
Process 25972: SELECT * FROM "output_values" WHERE ("output_values".id = 122312) LIMIT 1
Process 26102: TRUNCATE TABLE "output_values_center1"
"TRUNCATE acquires an ACCESS EXCLUSIVE lock on each table it operates on, which blocks all other concurrent operations on the table. If concurrent access to a table is required, then the DELETE command should be used instead."
It isn't obvious, if you only look at the TRUNCATE "manpage" quoted above, why querying the parent table involves its descendant at all. The following excerpt from the "manpage" for the SELECT command clarifies it:
"If ONLY is specified, only that table is scanned. If ONLY is not specified, the table and any descendant tables are scanned."
I'd try something like this for truncating (sketched here in Python with psycopg2, but any client will do):
import time
import psycopg2

NOWAIT_TIMES = 100          # attempts that fail fast instead of waiting for the lock
SLEEPTIME_SECS = 0.1        # pause between attempts

conn = psycopg2.connect(dsn)            # connect however you normally do

attempt = 0
while True:
    try:
        with conn:                      # one transaction per attempt, committed on success
            with conn.cursor() as cur:
                if attempt < NOWAIT_TIMES:
                    # fail fast instead of queueing behind readers
                    cur.execute("LOCK TABLE output_values IN ACCESS EXCLUSIVE MODE NOWAIT")
                else:
                    # we tried NOWAIT_TIMES times and failed, so now wait for the lock
                    cur.execute("LOCK TABLE output_values IN ACCESS EXCLUSIVE MODE")
                cur.execute("TRUNCATE TABLE output_values_center1")
        break                           # truncate committed
    except psycopg2.errors.LockNotAvailable:
        # NOWAIT failed; the transaction was rolled back, so wait a bit and retry
        time.sleep(SLEEPTIME_SECS)
        attempt += 1
This way you'll be safe from deadlocks, because the parent table is locked exclusively before the child is truncated. A user's query will just block for a fraction of a second while the truncate runs and will resume automatically after the commit.
But be prepared that this can take several seconds on a busy server: while the table is in use, the NOWAIT lock attempt will fail and be retried.