Determining relations hit by a query - postgresql

I have a PostgreSQL query constructed by a templating mechanism. What I want to do is determine the relations actually hit by the query when it is run and record them in a relation. So this is a very rudimentary lineage problem. Simply looking at the relation names appearing in the query (or parsing the query) would not easily solve the problem, as the queries are somewhat complex and the templating mechanism inserts expressions like WHERE FALSE.
I can of course do it by using EXPLAIN on the query and inserting the relation names I find manually. However, this has two drawbacks:
EXPLAIN actually runs the query. Unfortunately, running the query takes a lot of time, so it is not ideal to run it twice: once for the result and once for the EXPLAIN.
It is manual.
After reading a few documents I found out that one can log the result of an EXPLAIN automatically to a CSV file and read it back into a relation. But, as far as I understand, this means logging everything to the CSV, which is not an option for me. Also, automatic logging seems to be triggered only when the execution takes longer than a predetermined threshold, and I want to do this for a few specific queries, not for all time-consuming ones.
PS: This does not need to be implemented fully at the database layer. For instance, once I have the result of EXPLAIN in a relation, I can parse it and extract the relations it hits at the application layer.

EXPLAIN does not execute the query (only EXPLAIN ANALYZE does).
You can run EXPLAIN (FORMAT JSON) SELECT ..., which will return the execution plan as JSON. Simply extract all "Relation Name" attributes, and you have a list of the tables scanned.
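Since the PS allows doing the parsing at the database layer, here is a minimal sketch that captures the plan and records the relations it touches in one PL/pgSQL block. It assumes PostgreSQL 12+ for jsonb_path_query; the SELECT being explained, some_table, and the query_relations audit table are placeholders:

    DO $$
    DECLARE
        plan json;
    BEGIN
        -- Plan the query without executing it (no ANALYZE option).
        EXECUTE 'EXPLAIN (FORMAT JSON) SELECT * FROM some_table WHERE false'
            INTO plan;

        -- Pull every "Relation Name" out of the plan tree, at any depth,
        -- and record the distinct names in the audit table.
        INSERT INTO query_relations (relname)
        SELECT DISTINCT r #>> '{}'   -- unwrap the jsonb string to text
        FROM jsonb_path_query(plan::jsonb, '$.**."Relation Name"') AS r;
    END
    $$;

A useful side effect for the WHERE FALSE case: relations the planner prunes away entirely never appear in the plan, so this reflects what the query would actually touch rather than what its text mentions.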

Related

PostgreSQL: See query results as scan progresses, rather than wait till end and show all

For a simple select query like select column_name from table_name on a very large table, is it possible to have the output provided as the scan of the table progresses?
If I abort the command after some time, I expect to get at least the output the select has produced thus far.
Think of cat, which I believe won't wait until it has read the whole file.
Does MySQL or another RDBMS support this?
PostgreSQL always streams the result to the client, and usually it is the client library that collects the whole result set before returning it to the user.
The C API libpq supports this through its single-row mode (PQsetSingleRowMode). The main disadvantage of this approach is that you could get a run-time error after you have already received some rows, so that's a case you'd have to handle.
The traditional way to receive a query result in parts is to use a cursor and fetch results from it. This is a technique supported by all client APIs.
Cursors are probably what you are looking for, and they are supported in some fashion by every RDBMS I know of.
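In plain SQL the cursor pattern looks like this (a sketch; the table and column names come from the question, and the cursor name is arbitrary):

    BEGIN;  -- in PostgreSQL a cursor lives inside a transaction

    DECLARE scan_cur CURSOR FOR
        SELECT column_name FROM table_name;

    FETCH 1000 FROM scan_cur;  -- first batch arrives without waiting for the full scan
    FETCH 1000 FROM scan_cur;  -- repeat until FETCH returns fewer rows than requested

    CLOSE scan_cur;
    COMMIT;

Aborting partway through is just CLOSE (or ROLLBACK); the batches already fetched have been delivered, which gives you the cat-like behavior asked about.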

Getting the most used queries in MongoDB

I'd like to analyze our db and create better indices for it.
Because our app is very complex and we don't know which parts of it are used most, I'd like to see the most frequent read queries that hit our db.
That would make it very easy for me to analyze and create the right indices for them.
Any ideas on how to do that?
You can enable database profiling for this.
Get the details here: https://docs.mongodb.com/v3.2/tutorial/manage-the-database-profiler/
Alternatively, a simpler way would be to use mongostat (details here: https://docs.mongodb.com/v3.2/administration/monitoring/), which captures and returns the counts of database operations by type (e.g. insert, query, update, delete, etc.).
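One detail worth knowing if you go the profiler route: at level 1 only operations slower than the slowms threshold are recorded, while db.setProfilingLevel(2) records every operation. Either way, the captured operations land in the system.profile collection of that database, which you can query and sort (for example by namespace or by millis) to find the read queries that matter most for indexing.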

Describe Impala Query Metadata

Is there a way of getting the metadata of a query?
I can use DESCRIBE, but this only applies to tables. I don't really want to have to create a table from the query and get the metadata of that table, as that would be unnecessarily expensive even if I limited the result rows.
I'm using impala shell to output queries to delimited files (usually only a couple of hundred rows) which are sometimes needed to be imported into an Access database.
I'd like to know the data types as then I can make Access use the correct data types rather than defaulting to string.
The answer, thanks to @SamsonScharfrichter, is:
CREATE VIEW xxxx AS, then DESCRIBE xxxx, then DROP VIEW xxxx.
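Spelled out as a sketch (the view name is arbitrary and the SELECT stands in for your real query):

    CREATE VIEW tmp_meta AS
        SELECT id, amount, created_date FROM some_table;

    DESCRIBE tmp_meta;   -- lists each result column with its name and data type

    DROP VIEW tmp_meta;

This is cheap because CREATE VIEW only stores the query definition in the metastore; no rows are computed until the view is queried, and DESCRIBE does not query it.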

Generate UPDATE queries from results of SELECT queries

I'm looking for an easy way to create UPDATE queries based on the results of certain SELECT queries. The purpose of this is to create a private configuration file that I'm planning to run after I revert my database from a "public" backup.
For example, assuming that I have a table named setting with the following table structure:
| id_setting | name | value | module |
and a query such as:
select * from setting where module = 'voip'
Based on the results of these queries, I would like to generate INSERT/UPDATE statements that are ultimately stored in my configuration script.
Any idea how to achieve this in a generic way?
PS. I know I can concatenate parts of SQL together, but I feel that this approach is too time-consuming.
The closest thing in pgAdmin is the query tool (see http://www.pgadmin.org/docs/1.16/query.html). This would not take your SELECT statements and turn them into UPDATE statements, but you can build queries graphically if you don't want to parse and concatenate.
If this is going to be a big, repetitive task, I would look at writing a Perl script to parse a query and rewrite it as needed; this would require some inside knowledge. It isn't clear what you want to do regarding updating the values, so you'd have to design your solution around that. More likely you would want to write a functional API (a UDF) to do what you want, and then write calls to that, probably not in a config file directly (since it is not clear you can trust that) but through an interface.
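Since this is PostgreSQL (given pgAdmin), one way to avoid hand-concatenating is to let SQL generate the statements itself: format() with the %L specifier quote-escapes each value. A sketch against the setting table from the question:

    SELECT format(
        'UPDATE setting SET value = %L WHERE name = %L AND module = %L;',
        value, name, module
    ) AS stmt
    FROM setting
    WHERE module = 'voip';

Run it in psql with -t (tuples only) and \o private_config.sql to write the generated statements straight into the configuration script.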

Is it possible to query data from tables by object_id?

I was wondering whether it is possible to query tables by specifying their object_id instead of table names in SELECT statements.
The reason for this is that some tables are created dynamically, and their structure (and names) are not known beforehand, yet I would like to be able to write sprocs that can query these tables and work on their content.
I know I can create dynamic statements and execute them, but maybe there are better ways, and I would be grateful if someone could share how to approach this.
Thanks.
You have to query sys.columns and build a dynamic query based on that.
There are no better ways: SQL isn't designed for ad hoc or unknown structures.
I've never worked on an application in 20 years where I didn't know what my data looked like. Either your data is persisted, or, if it's transient, it should be in XML or JSON or some such format.
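That said, a minimal sketch of the dynamic approach described above (SQL Server syntax; the object_id value is a placeholder):

    DECLARE @object_id int = 123456789;  -- hypothetical id of the dynamically created table

    DECLARE @sql nvarchar(max) =
        N'SELECT * FROM '
        + QUOTENAME(OBJECT_SCHEMA_NAME(@object_id))
        + N'.' + QUOTENAME(OBJECT_NAME(@object_id));

    EXEC sp_executesql @sql;

QUOTENAME guards against odd or malicious table names, and the column list can be built the same way from sys.columns if SELECT * is too loose.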