Find unused tables in Amazon RDS (Postgres)

Find unused tables in Amazon RDS (Postgres) - postgresql

In an effort to do some basic housekeeping on our Amazon RDS (Postgresql) instance, my team hopes to drop unused or rarely used tables from our database. In Redshift, I used the stl_query table to determine which tables were accessed frequently enough to remain.
The problem is, I can't seem to figure out an equivalent strategy for Postgres. I tried checking the log files in the console, but these don't appear to have the correct info.
Aside from searching our code base for references to used tables, is there a good strategy to find unused / infrequently used tables in Postgres? If sufficient logs exist, I am willing to write some sort of parsing script to get the necessary data - I just need to find a good source.

It turns out the statistics I need live in the statistics collector views, specifically pg_stat_user_tables.
This is the query I was able to find infrequently accessed tables:
SELECT
relname,
schemaname
FROM
pg_stat_user_tables
WHERE
(idx_tup_fetch + seq_tup_read) < 5; --access threshold

Related

Amazon RDS Postgresql snapshot preserves schema but loses all data

Using AWS RDS console I created a snapshot backup of a Postgresql v11 database containing multiple schemas. I then created a new instance from the backup. The process seemed to work fine without error. However, upon inspection of the data in the new instance, I noticed that in only one of my schemas the data was not preserved. The schema structure, tables, indexes, constraints, etc looked fine, but every table was empty (select count(*) from schema.table was 0 for every table in the schema). All other schemas looked fine and contained the expected data. I looked everywhere (could not find help for this online) and tried many tests myself (changing roles, rebuilding the schema, privileges, much more) while attempting to solve this issue. What would cause my snapshots to preserve the entire schema structure, but lose all of the data itself?

I finally realized that the only difference between the problem schema and the other was that all tables in the problem schema had been created with the 'UNLOGGED' keyword. This was done to increase write speed for millions of rows inserted when the schema was first built. However, when a snapshot is created/restored as described above, the process depends on the WAL files that are written with normal (logged) tables to restore the data. To fix my problem I simply altered all of the tables and set them to be logged (alter table schema.table set logged). After this, snapshots worked fine. For anyone else in the future that is doing something similar, should unlogged tables be needed for initial mass population of data to get better write speed, it would be a good to changed them to be logged after initial data population (if you plan on using snapshots or replications or similar). Side note, pg_dump/pg_restore does still work for unlogged tables.

Is there an alternative to temporary tables I can use on a Hot Standby copy?

as suggested here's a TL;DR bit. I'm looking for an alternative to temporary tables I could use on a Hot Standby copy of a database. Is there anything or do I have to re-write everything and try and do it all in subqueries?
When I joined our company last year, our ERP was hosted locally and although I didn’t have admin access to the Postgres database I at least had read write access to the tables.
I wrote a number of reports (using SQL Command option in Crystal Reports)/SQL scripts that use temporary tables however, we’ve just migrated to a hosted version of the ERP and rather than access the live database we have been given access to a Hot Standby copy, mainly due to load balancing issues.
Unfortunately, the software company didn’t warn us that this would be the case or that it would be read only access. I found this out when I was testing some scripts when obviously anything with a temporary table failed.
I use temporary tables for things like storing dates and bank holiday information, holding temporary calculations and so on.
So I'm looking for an alternative to temporary tables I could use on the Hot Standby copy, or do I have to re-write everything and try and do it all in subqueries?
I’ve looked at using CTE (WITH) but the scope is far too small as I’d need access throughout the script.
Then I thought maybe I could read the data from the Hot Standby but create temporary tables in a different database/schema, but I don’t think that’s viable. If it is I might have to speak to the software house to be given access to another database/schema. postgres_fdw would seem the most likely candidate as you can update the external table but I can't see anywhere about dropping and creating tables.
I’ve only been using Postgres since last July having previously used MSSQL which I could probably have used a table variable, but I can’t find an equivalent for that.
I've tried looking at the Postgres documentation but, to some embarrassment, I do find a lot of documentation hard to follow without relatable examples, so I might well have missed something.
Sorry for the long post!
Thanks

In DBeaver, how can I run an SQL union query from two different connections..?

We recently migrated a large DB2 database to a new server. It got trimmed a lot in the migration, for instance 10 years of data chopped down to 3, to name a few. But now I find that I need certain data from the old server until after tax season.
How can I run a UNION query in DBeaver that pulls data from two different connections..? What's the proper syntax of the table identifiers in the FROM and JOIN keywords..?
I use DBeaver for my regular SQL work, and I cannot determine how to span a UNION query across two different connections. However, I also use Microsoft Access, and I easily did it there with two Pass-Through queries that are fed to a native Microsoft Access union query.
But how to do it in DBeaver..? I can't understand how to use two connections at the same time.
For instance, here are my connections:
And I need something like this...
SELECT *
FROM ASP7.F_CERTOB.LDHIST
UNION
SELECT *
FROM OLD.VIPDTAB.LDHIST
...but I get the following error, to which I say "No kidding! That's what I want!", lol... =-)
SQL Error [56023]: [SQL0512] Statement references objects in multiple databases.
How can this be done..?

This is not a feature of DBeaver. DBeaver can only access the data that the DB gives it, and this is restricted to a single connection at a time (save for import/export operations). This feature is being considered for development, so keep an eye out for this answer to be outdated sometime in 2019.
You can export data from your OLD database and import it into ASP7 using DBeaver (although vendor tools for this are typically more efficient for this). Then you can do your union as suggested.
Many RDBMS offer a way to logically access foreign databases as if they were local, in which case DBeaver would then be able to access the data from the OLD database (as far as DBeaver is concerned in this situation, all the data is coming from a single connection). In Postgres, for example, one can use a foreign data wrapper to access foreign data.
I'm not familiar with DB2, but a quick Google search suggests that you can set up foreign connections within DB2 using nicknames or three-part-names.

If you check this github issue:
https://github.com/dbeaver/dbeaver/issues/3605
The way to solve this is to create a task and execute it in different connections:
https://github.com/dbeaver/dbeaver/issues/3605#issuecomment-590405154

Linux directories still exist for tablespace after conversion, PostgreSQL will not drop due to "not empty error"

I have been looking through old questions but have not found a question that is close enough to my issue. If anyone has seen this issue elsewhere, please advise.
My situation. A past DBA created user-defined tablespaces in PostgreSQL. We are planning to convert from PostgreSQL 9.0.23 to 9.4.6. However, we see no advantage to continuing to manage user-defined tablespaces and are in the process of converting them to the "pg_default" tablespaces.
This worked successfully for 217 tablespaces. However with 3 of the tablespaces, after the conversion, we were not able to drop the old user-defined tablespace due to a "tablespace not empty" error. Queries of the 3 tablespaces showed no objects in them. Queries of pg_default showed the objects formerly in the user-defined tablespaces now in pg_default. An examination of the rhel directories where the old user-defined tablespace data resided shows that the directory structure still exists in those locations. Those structures were deleted in the other 217 tablespaces.
An article that discusses this problem in some degree is: How can I tell what is in a Postgresql tablespace? with the following advice:
“Prowl around in there to see if something real is in your way. I'd be
really cautious about trying to delete anything in there apart from using
PostgreSQL's interface, though.”
(I was able to drop the tablespaces in a test environment by deleting these linux directories. However, in our production instance, we want to drop the tablespaces from inside the PostgreSQL interface. Has anyone seen this problem and had a solution?

No merge tables in MariaDB - options for very large tables?

I've been happily using MySQl for years, and have followed the MariahDB fork with interest.
The server for one of my projects is reaching end of life and needs to be rehosted - likely to CentOS 7, which includes MariahDB
One of my concerns is the lack of the merge table feature, which I use extensively. We have a very large (at least by my standards) data set with on the order of 100M records/20 GB (with most data compressed) and growing. I've split this into read only compressed myisam "archive" tables organized by data epoch, and a regular myisam table for current data and inserts. I then span all of these with a merge table.
The software working against this database is then written such that it figures out which table to retrieve data from for the timespan in question, and if the timespan spans multiple tables, it queries the overlying merge table.
This does a few things for me:
Queries are much faster against the smaller tables - unfortunately, the index needed for the most typical query, and preventing duplicate records is relatively complicated
Frees the user from having to query multiple tables and assemble the results when a query spans multiple tables
Allowing > 90% of the data to reside in the compressed tables saves alot of disk space
I can back up the archive tables once - this saves tremendous time, bandwidth and storage on the nightly backups
An suggestions for how to handle this without merge tables? Does any other table type offer the compressed, read-only option that myisam does?
I'm thinking we may have to go with separate tables, and live with the additional complication and changing all the code in the multiple projects which use this database.

MariaDB 10 introduced the CONNECT storage engine that does a lot of different things. One of the table types it provides is TBL, which is basically an expansion of the MERGE table type. The TBL CONNECT type is currently read only, but you should be able to just insert into the base tables as needed. This is probably your best option but I'm not very familiar with the CONNECT engine in general and you will need to do a bit of experimentation to decide if it will work.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse