I have a PostgreSQL 10.6 server running on an openSUSE Linux distribution, and I just set up pgpool-II to let me cache queries. It works mostly fine, but for unknown reasons I sometimes get this warning message:
WARNING: memcache: adding table oid maps, failed to create directory:"/var/log/pgpool/oiddir". error:"No such file or directory"
I already created the directory, changed its owner to the user that runs the pgpool server, and gave that user read, write, and execute permissions on it.
This message appears when a query is not yet cached, and it doesn't seem to have any impact, i.e. the query is cached as it should be, and if I run it again the result is pulled directly from the cache.
But I also have another problem, and I don't know if it's related to the first one:
When I write big queries (using JOINs, subqueries, lots of conditions, etc.), pgpool-II will not cache their results (or caches them but doesn't use them, I have no idea which), even though the results are not big (fewer than 500 rows). Also, in this case, I don't get the "oid" warning message. I tried raising the various shared-memory limits pgpool-II is allowed to use (see the documentation), but as expected it changed nothing, because when pgpool-II fails to cache a query due to a lack of available shared memory, it is supposed to log a message like this one:
LOG: pid 13756: pool_add_temp_query_cache: data size exceeds memqcache_maxcache. current:4095 requested:111 memq_maxcache:4096
But in my case, I don't get any message. Examples of both cases below.
1st problem
Simple query, result is cached, "oid" warning:
SELECT *
FROM some_table
WARNING: memcache: adding table oid maps, failed to create directory:"/var/log/pgpool/oiddir". error:"No such file or directory"
-- Doing it a second time will just give me the cached result without any warning, as expected
2nd problem
Complex query, result is not cached (or cached and not used), no warning/error message:
SELECT A.geom, A.id, to_char(C.timestamp, 'DD/MM/YY') as date, C.timestamp::time as time, ROUND(C.value) as value
FROM segments A, lines B, ( SELECT DISTINCT ON (B.id) B.id, A.timestamp, ROUND(A.nb1+A.nb2+A.nb3) as value
FROM records A
CROSS JOIN LATERAL (SELECT *
FROM points B
WHERE A.id = B.id
AND A.direction = B.direction
ORDER BY A.position <-> B.geom
LIMIT 1) AS B
ORDER BY B.id, A.timestamp DESC) AS C
WHERE A.id = B.id
AND B.id = C.id
AND A.direction = B.direction
AND B.direction = C.direction
-- Doing it a second time will just query the PostgreSQL server again directly, instead of pulling the result from the cache as it should if the result had been cached.
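For reference, one way to check whether a result actually landed in the cache (assuming a pgpool-II release that supports the SHOW POOL_CACHE command) is to look at the cache statistics:

SHOW POOL_CACHE;
-- reports counters such as num_cache_hits and num_selects;
-- if num_cache_hits does not grow when the query is re-run, the result was not served from the cache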
Your first issue looks like a permissions problem on the oiddir: pgpool can't create the directories it needs for caching. Depending on how you start pgpool (through pgpoolAdmin or the command line), you need to:
Create the oiddir folder: mkdir /var/log/pgpool/oiddir
Give permissions of the folder to pgpool
Assuming pgpoolAdmin starts pgpool:
chown -R _www /var/log/pgpool/oiddir (I use pgpool on macOS, so the Apache user is "_www"; change accordingly)
Assuming you start pgpool from command line:
chown -R postgres /var/log/pgpool/oiddir (or you can use your current user instead of postgres)
Your second issue looks like a consequence of the first: results can't be cached because the oiddir isn't created. Fixing the first issue should allow you to cache.
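For completeness, a minimal sketch of the query-cache block in pgpool.conf that this setup depends on (parameter names are from the pgpool-II documentation; the values below are placeholders to adapt):

memory_cache_enabled = on
memqcache_method = 'shmem'                   # cache in shared memory
memqcache_oiddir = '/var/log/pgpool/oiddir'  # must exist and be writable by the user running pgpool
memqcache_total_size = 67108864              # total shared memory for the cache, in bytes
memqcache_maxcache = 409600                  # maximum size of a single cached result, in bytes

It is worth double-checking that memqcache_oiddir points at the directory created above; if it points somewhere pgpool cannot create, the "failed to create directory" warning would keep appearing.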
Related
I use macOS and I am having an issue with a full refresh on a large table. During the run it appears as if it hangs, and there is no query running in Redshift. It does not happen with smaller tables, and it does not happen if I run an incremental. This table used to be smaller and I was able to run a full refresh as long as I specified the table. Now that it is bigger I seem to be running into this issue. There are 6 tables that this model is dependent on. It is almost as if the command isn't being sent. Any suggestions?
There is no error because it just doesn't run. Other team members running this on Windows and macOS expect it to finish in 10 minutes. It has currently been 30 minutes, but I have let it sit a lot longer than that.
My command is
dbt run --models +fct_mymodel --full-refresh --vars "run_date_start: 2020-06-01"
Thank you
The Redshift UI usually shows only the long-running queries. I ran into similar problems, and they were caused by locks on some tables; in our case the locks came from uncommitted explicit transactions (BEGIN without COMMIT or ROLLBACK).
Run this query to see current transactions and their locks:
select a.txn_owner, a.txn_db, a.xid, a.pid, a.txn_start, a.lock_mode,
       a.relation as table_id,
       nvl(trim(c."name"), d.relname) as tablename,
       a.granted,
       b.pid as blocking_pid,
       datediff(s,a.txn_start,getdate())/86400||' days '
         ||datediff(s,a.txn_start,getdate())%86400/3600||' hrs '
         ||datediff(s,a.txn_start,getdate())%3600/60||' mins '
         ||datediff(s,a.txn_start,getdate())%60||' secs' as txn_duration
from svv_transactions a
left join (select pid,relation,granted from pg_locks group by 1,2,3) b
on a.relation=b.relation and a.granted='f' and b.granted='t'
left join (select * from stv_tbl_perm where slice=0) c
on a.relation=c.id
left join pg_class d on a.relation=d.oid
where a.relation is not null;
Read the AWS knowledge base entry for more details: https://aws.amazon.com/premiumsupport/knowledge-center/prevent-locks-blocking-queries-redshift/
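If the query above turns up a stale blocking transaction, a sketch of how to clear it (pg_terminate_backend is available in Redshift; double-check the pid before terminating anything):

select pg_terminate_backend(12345);  -- replace 12345 with the blocking_pid reported above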
I have a process which is creating thousands of temporary tables a day to import data into a system.
It is using the form of:
create temp table if not exists test_table_temp as
select * from test_table where 1=0;
This very quickly creates a lot of dead rows in pg_attribute, as it is constantly creating lots of new columns for these tables and deleting them shortly afterwards. I have seen solutions elsewhere that suggest using on commit delete rows; however, this does not appear to have the desired effect either.
To test the above, you can create two separate sessions on a test database. In one of them, check:
select count(*)
from pg_catalog.pg_attribute;
and also note down the value for n_dead_tup from:
select n_dead_tup
from pg_stat_sys_tables
where relname = 'pg_attribute';
On the other one, create a temp table (you will need another table to select from):
create temp table if not exists test_table_temp on commit delete rows as
select * from test_table where 1=0;
The count query for pg_attribute immediately goes up, even before we reach the commit. Upon closing the session that created the temp table, the pg_attribute count goes down, but n_dead_tup goes up, suggesting that vacuuming is still required.
I guess my real question is: have I missed something above, or is the only way of dealing with this issue to vacuum aggressively and take the performance hit that comes with it?
Thanks for any responses in advance.
No, you have understood the situation correctly.
You either need to make autovacuum more aggressive, or you need to use fewer temporary tables.
Unfortunately you cannot change the storage parameters on a catalog table – at least not in a supported fashion that will survive an upgrade – so you will have to do so for the whole cluster.
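As an illustration, a sketch of cluster-wide postgresql.conf settings that make autovacuum more aggressive (the parameter names are standard; the values are only illustrative starting points):

autovacuum_naptime = 15s                  # check for vacuum work more often (default 1min)
autovacuum_vacuum_scale_factor = 0.01     # vacuum once ~1% of a table's rows are dead (default 0.2)
autovacuum_vacuum_threshold = 500         # plus this many dead rows (default 50)
autovacuum_vacuum_cost_limit = 1000       # let each worker do more work per cycle (default falls back to vacuum_cost_limit)

Because these apply cluster-wide, they also affect your regular tables; the other option mentioned above is to create the temp table once per session and TRUNCATE and refill it, instead of creating thousands of new ones.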
I have received the following email from Heroku:
The database DATABASE_URL on Heroku app [redacted] has
exceeded its allocated storage capacity. Immediate action is required.
The database contains 12,858 rows, exceeding the Hobby-dev plan limit
of 10,000. INSERT privileges to the database will be automatically
revoked in 7 days. This will cause service failures in most
applications dependent on this database.
To avoid a disruption to your service, migrate the database to a Hobby
Basic ($9/month) or higher database plan:
https://hello.heroku.com/upgrade-postgres-c#upgrading-with-pg-copy
If you are unable to upgrade the database, you should reduce the
number of records stored in it.
My Postgres database had a single table with 5,693 rows at the time I received this email, which does not match the 12,858 rows mentioned in the email. What am I missing here?
It is perhaps worth mentioning that my DB also has a view of the table mentioned above, which Heroku might be adding to the count (despite not being an actual table), doubling the row count from 5693 to 11386, which still does not match the 12858 mentioned in the email, but it is closer.
TL;DR: the rows in views DO factor into the total row count, even when the view is not materialized, despite the fact that views do not store data.
I ran heroku pg:info and saw the line:
Rows: 12858/10000 (Above limits, access disruption imminent)
I then dropped the view I mentioned in the original post, and ran heroku pg:info again:
Rows: 5767/10000 (Above limits, access disruption imminent)
So it seems indeed that views DO get counted in the total row count, which seems rather silly, since views don't actually store any data.
I also don't know why the (Above limits, access disruption imminent) string is still present after reducing the row number below the 10000 limit, but after running heroku pg:info again a minute later, I got
Rows: 5767/10000 (In compliance)
so apparently the compliance flag is not updated at the same time as the row number.
What's even stranger is that when I later re-created the same view that I had dropped and ran heroku pg:info again, the row count did not double back up to ~11,000; it stayed at ~5,500.
It is useful to note that the following SQL command will display the row counts of the various objects in the database:
select table_schema,
table_name,
(xpath('/row/cnt/text()', xml_count))[1]::text::int as row_count
from (
select table_name, table_schema,
query_to_xml(format('select count(*) as cnt from %I.%I', table_schema, table_name), false, true, '') as xml_count
from information_schema.tables
where table_schema = 'public' --<< change here for the schema you want
) t
the above query was copy-pasted from here
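As a lighter-weight alternative (a sketch; these are the statistics collector's estimates rather than exact counts), per-table row estimates can be read from pg_stat_user_tables:

select relname, n_live_tup
from pg_stat_user_tables
order by n_live_tup desc;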
It sounds as if your PostgreSQL database usage was measured at two different moments: the first with higher values (12,858 rows, which is over the free limit, measured at Heroku) and the second with lower values (5,693 rows, which would be within the free limit; perhaps measured in your local environment?).
Anyway, first things first: take a look at your PostgreSQL database at Heroku. This can be done in two ways:
Connect your local Heroku CLI to your app and check the info of the related PostgreSQL database (see the example below this list)
How to see the database-related info
Log in to the Heroku web GUI and check the size and row counts there
Heroku Postgres
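A minimal example of the CLI route (the app name is a placeholder):

heroku pg:info --app your-app-name
# look for the "Rows:" line, e.g. "Rows: 12858/10000 (Above limits, access disruption imminent)"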
The background behind Heroku's database monitoring is explained here:
Monitoring Heroku Postgres
At my work, I needed to build a new join table in a postgresql database that involved doing a lot of computations on two existing tables. The process was supposed to take a long time so I set it up to run over the weekend before I left on Friday. Now, I want to check to see if the query finished or not.
How can I check if an INSERT command has finished while not being at the computer I ran it on? (No, I don't know how many rows it was supposed to add.)
Select * from pg_stat_activity where state not ilike 'idle%' and query ilike 'insert%'
This will return all non-idle sessions where the query begins with insert. If your query does not show up in this list, then it is no longer running.
pg_stat_activity doc
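If you also want to see how long a still-running INSERT has been going, a sketch based on the same view:

select pid, now() - query_start as runtime, state, query
from pg_stat_activity
where query ilike 'insert%'
  and state not ilike 'idle%';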
You can have a look at the pg_stat_activity view, which contains all database connections, including the active query, owner, etc.
At https://gist.github.com/rgreenjr/3637525 there is a copyable example of what such a query could look like.
I want to perform a simple DROP VIEW ... but it hangs.
I have run the query SELECT * FROM pg_locks WHERE NOT granted, taken from this page on Lock Monitoring.
However the following query they suggest returns no results:
SELECT bl.pid AS blocked_pid,
a.usename AS blocked_user,
kl.pid AS blocking_pid,
ka.usename AS blocking_user,
a.query AS blocked_statement
FROM pg_catalog.pg_locks bl
JOIN pg_catalog.pg_stat_activity a ON a.pid = bl.pid
JOIN pg_catalog.pg_locks kl ON kl.transactionid = bl.transactionid AND kl.pid != bl.pid
JOIN pg_catalog.pg_stat_activity ka ON ka.pid = kl.pid
WHERE NOT bl.granted;
Where should I look now?
Finally I figured out what was wrong. Here are the steps to find the root cause:
Solution
Step 1 : List requested locks not granted
select * from pg_locks where not granted;
In my case, an attempt to lock the view I want to drop with mode AccessExclusiveLock was not granted. This is why my DROP VIEW ... hangs.
Step 2 : Find which other process(es) held a conflicting lock
select * from pg_locks where relation = <oid_of_view>
Here I list all processes holding or trying to acquire a lock on my view. I found two processes: the one that wants to drop the view and... another one.
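If you need the view's oid for the query above, a quick way to get it (the view name is a placeholder):

select 'my_view'::regclass::oid;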
Step 3 : Find out what other process(es) is/are doing now
select xact_start,query_start,backend_start,state_change,state from pg_stat_activity where pid in (<list_of_other_process(es)_pid>);
I had only one process holding a lock in my case. Surprisingly, its state was: idle in transaction
I was not able to drop the view because another process was idle in transaction. I simply killed it to solve my issue. For example, if the pid was 8484, and supposing my PostgreSQL server runs on a Linux box, then in the shell I executed the following command:
$ kill -9 8484
Discussion
If you face a similar issue, you can quickly find out what's going on by reproducing steps 1, 2, and 3. You may need to customize step 2 in order to find the other conflicting process(es).
References
Lock Monitoring
Lock Dependency Information
View Postgresql Locks
I had a similar problem but the accepted answer didn't work for me as I do not have admin access to kill any process. Instead, this is how I managed to solve the problem:
Issue SELECT * FROM pg_stat_activity; to get the stats about the PostgreSQL activities.
In the query column, look for the queries that read from that view. You may choose to narrow down your search by only looking at the rows related to your user (the usename column) or at query_start if you know when the issue emerged. There could be more than one row associated with your unwanted view.
Identify all the pids from those rows, plug each one into SELECT pg_terminate_backend(<pid>); (in place of <pid>), and run them one by one.
Now you should be able to drop your view.
Please note that as you terminate the backend processes using pg_terminate_backend(), you may face some errors. The reason is that terminating one process may automatically end others, so some of the identified PIDs might no longer be valid by the time you get to them.
As a summary, this solution from the comments worked for me:
Step 1: Find the pid:
select * from pg_stat_activity
where pid in
(select pid from pg_locks
where relation =
(select relation from pg_locks where not granted));
Step 2: kill pid:
kill -9 pid
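If you don't have shell access to the database host, the same pid can also be terminated from SQL (replace <pid> with the value found in step 1):

select pg_terminate_backend(<pid>);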