How to discover what a PostgreSQL table's usage is? - postgresql

Our team is working on a PostgreSQL database with lots of tables and views and no referential constraints. The project is undocumented, and there appears to be a great number of unused/temporary/duplicate tables and views dirtying the schema.
We need to discover what database objects have real value and are actually used and accessed.
My initial thought was to query the catalog/'data dictionary'.
Is it possible to query the PostgreSQL catalog to find an object's last query time?
Any thoughts, alternative approaches, or tool ideas?

Check the Statistics Collector

Not sure about the last query time, but you can adjust your postgresql.conf to log all SQL:
log_min_duration_statement = 0
That will at least give you an idea of current activity.

Reset the statistics with pg_stat_reset(), then check the statistics views such as pg_stat_user_tables to see where activity is showing up.
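For example, a minimal sketch (after letting the workload run for a while since the reset; the zero-scan threshold and column choices are just a starting point):
-- User tables with no sequential or index scans since the last
-- pg_stat_reset(); likely candidates for unused objects.
SELECT relname, seq_scan, idx_scan, n_tup_ins, n_tup_upd, n_tup_del
FROM pg_stat_user_tables
WHERE coalesce(seq_scan, 0) + coalesce(idx_scan, 0) = 0
ORDER BY relname;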

Related

DB2 updated rows since last check

I want to periodically export data from db2 and load it in another database for analysis.
In order to do this, I need to know which rows have been inserted or updated since the last time I exported from a given table.
A simple solution would be to add a timestamp column to every table and use that as a reference, but I don't have such a timestamp at the moment, and I would like to avoid adding one if possible.
Is there any other way to find the rows that were added or updated after a given time (or something else that would solve my issue)?
There is an easy option for a timestamp in Db2 (for LUW) called
ROW CHANGE TIMESTAMP
This is managed by Db2 and can be defined as IMPLICITLY HIDDEN, so existing SELECT * queries will not retrieve the new column and incur extra costs.
Check out the Db2 CREATE TABLE documentation
This functionality was originally added for optimistic locking but can be used for such situations as well.
There is a similar concept for Db2 for z/OS - you will have to check that out yourself, as I have not tried it.
Of course there are other ways to solve this, such as replication.
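A rough sketch of the idea (table and column names are invented; check the CREATE TABLE/ALTER TABLE docs for the exact rules on adding such a column to an existing table):
-- Add a Db2-maintained row change timestamp, hidden from SELECT *:
ALTER TABLE mytable
  ADD COLUMN rct TIMESTAMP NOT NULL
      GENERATED ALWAYS FOR EACH ROW ON UPDATE AS ROW CHANGE TIMESTAMP
      IMPLICITLY HIDDEN;
-- Export only rows changed since the last run:
SELECT * FROM mytable
WHERE ROW CHANGE TIMESTAMP FOR mytable > :last_export_ts;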
That is not possible if you do not have a timestamp column. With a timestamp, you can tell which rows are new or modified.
You can also use the Time Travel feature to get the new values, but that also implies a timestamp column.
Another option is to put the tables in append mode and then fetch the rows after a given one. However, this is not reliable after a reorg, and it affects performance and space utilisation.
One possible option is SQL replication, but that needs extra staging tables.
Finally, you can read the logs with the db2ReadLog API, but that requires custom development. Simply applying the archived logs to the new database is also possible, but that database will remain in rollforward-pending state.

DB2 System Tables Logs?

I'm using DB2 LUW. I'm working on looking at how often data is being changed or added into our database, and I was curious if there was a system table that I might be able to find this information?
It depends a bit on what you mean. You will, for example, find the number of commits and rollbacks in sysibmadm.snapdb, and rows written per table in sysibmadm.snaptab. If you take snapshots of those on a regular basis, you get an idea of how often data is updated. Was that what you had in mind, or are you looking for something else?
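As a small sketch against those administrative views (this assumes the relevant monitor switches are enabled):
-- Tables ordered by write activity since database activation:
SELECT tabschema, tabname, rows_written
FROM sysibmadm.snaptab
ORDER BY rows_written DESC
FETCH FIRST 20 ROWS ONLY;
-- Database-wide commit/rollback counters:
SELECT commit_sql_stmts, rollback_sql_stmts
FROM sysibmadm.snapdb;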

SQL Developer returns query results on one computer but not on another

I can run a query on views in SQL Developer 3.1.07 and it returns the results I expect. My co-worker, who is in Mexico using the same user, can connect to the same database, sees the same views, runs the same query and gets no results, even from a simple "select * from VIEWNAME" query. The column headers display, but no data. If he selects a view from the connections window and selects the DATA tab no data displays. This user does not have access to any tables on this specific database.
I'm not certain he is running the same version of SQL Developer, but it's not far off. I have checked every setting in SQL Developer that I think could be the issue, but I see no significant difference between his settings and mine.
Connecting to another database allows him to access data in both tables and views.
Any thoughts on what we're missing?
I know I'm a few years late, but check whether the underlying view filters on something localisation-based! I just had this issue, and it turned out to be a statement like the following that was causing it:
SELECT *
FROM sometable
WHERE language = userenv('LANG')
Copy the JDBC folder from your Oracle home over to your co-worker's machine. We had the same issue, and replacing the JDBC folder fixed it.
I faced the same issue, which was resolved when I checked the 'Skip NLS settings' box. My query returned zero results at first, but when I ran the same query again, I could see the table rows.
Since your co-worker is in a different country, the NLS (language-related) settings are most probably the culprit here.
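To compare the two sessions, a quick check using standard Oracle dictionary views (nothing SQL Developer-specific) can help:
-- What language does each client session report?
SELECT userenv('LANG') FROM dual;
-- Full session NLS settings, for a side-by-side diff:
SELECT parameter, value FROM nls_session_parameters;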
I was facing the same issue. It turned out that an update to the database from my SQL Developer session had not been committed to the main database; that's why the query returned results in my SQL Developer but empty results from AWS. When I chatted with the DBA, he could see stale data. After I committed the change from my SQL Developer, the database was actually updated.

ALTER query very slow on tiny table in PostgreSQL

I've got PostgreSQL 9.2 and a tiny database with just a bit of seed data for a website that I'm working on.
The following query seems to run forever:
ALTER TABLE diagnose_bodypart ADD description text NOT NULL;
diagnose_bodypart is a table with less than 10 rows. I've let the query run for over a minute with no results. What could be the problem? Any recommendations for debugging this?
Adding a column does not require rewriting the table (unless you specify a DEFAULT). Absent any locks, it is a quick operation; pg_locks is the place to check, as Craig pointed out.
In general, the most likely cause is long-running transactions. I would look at which workflows are hitting these tables and how long their transactions stay open. Locks of this sort are typically transactional, so committing those transactions will usually fix the problem.
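A minimal sketch of that check (9.2-era catalogs; join pg_locks to pg_stat_activity to see who holds or is waiting for locks on the table):
-- Sessions holding or waiting for locks on the table being altered:
SELECT l.pid, l.mode, l.granted, a.state, a.query
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE l.relation = 'diagnose_bodypart'::regclass;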

How to prevent Write Ahead Logging on just one table in PostgreSQL?

I am considering log-shipping of Write Ahead Logs (WAL) in PostgreSQL to create a warm-standby database. However I have one table in the database that receives a huge amount of INSERT/DELETEs each day, but which I don't care about protecting the data in it. To reduce the amount of WALs produced I was wondering, is there a way to prevent any activity on one table from being recorded in the WALs?
Ran across this old question, which now has a better answer. Postgres 9.1 introduced "Unlogged Tables", which are tables that don't log their DML changes to WAL. See the docs for more info, but at least now there is a solution for this problem.
See Waiting for 9.1 - UNLOGGED tables by depesz, and the 9.1 docs.
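A tiny sketch (the table name is invented); because changes skip WAL, an unlogged table is truncated after a crash and its contents are not shipped to a standby:
-- High-churn table whose contents we can afford to lose:
CREATE UNLOGGED TABLE scratch_events (
    id      bigserial PRIMARY KEY,
    payload text
);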
Unfortunately, I don't believe there is. The WAL logging operates on the page level, which is much lower than the table level and doesn't even know which page holds data from which table. In fact, the WAL files don't even know which pages belong to which database.
You might consider moving your high activity table to a completely different instance of PostgreSQL. This seems drastic, but I can't think of another way off the top of my head to avoid having that activity show up in your WAL files.
To offer one option to my own question: there are temp tables - "temporary tables are automatically dropped at the end of a session, or optionally at the end of the current transaction (see ON COMMIT below)" - which I think don't generate WAL. Even so, this might not be ideal, as the table creation and design will have to live in the code.
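A sketch of that idea (names invented; temporary tables are indeed skipped by WAL logging):
-- Session-local table; dropped automatically when the session ends,
-- rows cleared at each commit:
CREATE TEMPORARY TABLE work_queue (
    id      bigint,
    payload text
) ON COMMIT DELETE ROWS;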
I'd consider memcached for use-cases like this. You can even spread the load over a bunch of cheap machines too.