Is there anybody who can help me with making a query with the following functionality?
Let's have a simple statement like:
SELECT relname FROM pg_catalog.pg_class WHERE relkind = 'r';
This will produce a nice result with a single column - the names of all tables.
Now let's imagine that one of the tables is named "table1". If we execute:
SELECT count(*) FROM table1;
we will get the number of rows of the table "table1".
Now the real question: how can these two queries be unified into a single query whose result has two columns, the name of the table and its number of rows? Written in pseudo-SQL it would be something like this:
SELECT relname, (SELECT count(*) FROM relname::[as table name]) FROM pg_catalog.pg_class WHERE relkind = 'r';
And here is an example: if there are 3 tables in the database named table1, table2, and table3, and they have respectively 20, 30, and 40 rows, the query result should look like this:
----------------
|relname | rows|
|--------+-----|
|table1  |   20|
|table2  |   30|
|table3  |   40|
----------------
Thanks to everyone who is willing to help ;-)
P.S. Yes I know that the table name is not schema-qualified ;-) Let's hope that all tables in the database have unique names ;-)
(Corrected typos from rename to relname in last query)
EDIT1: The question is not about "how can I find the number of rows in a table". What I'm asking is: how to build a query with 2 selects, where the second select uses as its FROM the value of a column from the result of the first select.
EDIT2: As @jdigital suggested, I've tried dynamic querying and it does the job, but it can be used only in PL/pgSQL, so it doesn't fit my needs. In addition I tried the PREPARE and EXECUTE statements; yet again, not working. Anyway, I'll stick with the two-queries approach. But I'm damn sure that PostgreSQL is capable of this ....
With PL/pgSQL (PostgreSQL's procedural language), you can execute dynamic queries by building a string and then executing it as SQL. Note that this is Postgres-specific, but other databases are likely to have something equivalent. More generally, if you are willing to go beyond SQL, you can do this with any programming language (or shell/cmd script).
By the way, you'll get better results searching for "postgres dynamic query" since "nested select" has a different meaning.
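To make the dynamic-query approach concrete, here is a minimal PL/pgSQL sketch; the function name row_counts and its column names are my own invention, and format() with %I quotes each identifier safely:

CREATE OR REPLACE FUNCTION row_counts()
RETURNS TABLE (table_name name, row_count bigint)
LANGUAGE plpgsql AS
$$
DECLARE
    r record;
BEGIN
    FOR r IN SELECT c.relname FROM pg_catalog.pg_class c WHERE c.relkind = 'r'
    LOOP
        table_name := r.relname;
        -- build and run "SELECT count(*) FROM <table>" for each relname
        EXECUTE format('SELECT count(*) FROM %I', r.relname) INTO row_count;
        RETURN NEXT;
    END LOOP;
END;
$$;

SELECT * FROM row_counts();

And since the OP suspects plain SQL can do it: one pure-SQL workaround (my addition, not part of the original answer) abuses query_to_xml() to run a query built from each relname and then extracts the count with xpath():

SELECT c.relname,
       (xpath('/row/cnt/text()',
              query_to_xml(format('SELECT count(*) AS cnt FROM %I', c.relname),
                           false, true, '')))[1]::text::bigint AS rows
FROM pg_catalog.pg_class c
WHERE c.relkind = 'r';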
Related
I am doing a query on a very large data set and I am using WITH (CTE) syntax. This seems to take a while, and I was reading online that temp tables could be faster to use in these cases. Can someone advise me on which direction to go? In the CTE we join to a lot of tables, then we filter on the CTE result.
Only interested in Postgres answers.
What version of PostgreSQL are you using? CTEs perform differently in PostgreSQL versions 11 and older than in versions 12 and above.
In PostgreSQL 11 and older, CTEs are optimization fences: outer query restrictions are not passed into the CTE. The database evaluates the query inside the CTE and caches the results (i.e., materializes them), and outer WHERE clauses are applied only later, when the outer query is processed. This means either a full table scan or a full index scan is performed, which results in horrible performance on large tables. To avoid this, apply as many filters as possible in the WHERE clause inside the CTE:
WITH UserRecord AS (SELECT * FROM Users WHERE Id = 100)
SELECT * FROM UserRecord;
PostgreSQL 12 addresses this problem by introducing the MATERIALIZED and NOT MATERIALIZED keywords, which let us control whether the CTE is materialized:
WITH AllUsers AS NOT MATERIALIZED (SELECT * FROM Users)
SELECT * FROM AllUsers WHERE Id = 100;
Note: Text and code examples are taken from my book Migrating your SQL Server Workloads to PostgreSQL
Summary:
PostgreSQL 11 and older: use a subquery (see the sketch below)
PostgreSQL 12 and above: use a CTE with the NOT MATERIALIZED clause
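For the PostgreSQL 11 route, the subquery rewrite simply inlines the CTE body so the planner can push the outer filter down into the scan; a sketch against the same Users table:

-- Equivalent to the CTE above, but the planner can apply Id = 100
-- while scanning Users instead of materializing the whole table first.
SELECT * FROM (SELECT * FROM Users) AS AllUsers WHERE Id = 100;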
My follow-up is more than I can fit in a comment... so understand this may not be an answer to the OP per se.
Take the following query, which uses a CTE:
with sales as (
select item, sum (qty) as sales_qty, sum (revenue) as sales_revenue
from sales_data
where country = 'USA'
group by item
),
inventory as (
select item, sum (on_hand_qty) as inventory_qty
from inventory_data
where country = 'USA' and on_hand_qty != 0
group by item
)
select
a.item, a.description, s.sales_qty, s.sales_revenue,
i.inventory_qty, i.inventory_qty * a.cost as inventory_cost
from
all_items a
left join sales s on
a.item = s.item
left join inventory i on
a.item = i.item
There are times when I cannot explain why a query like this runs slower than I would expect. Sometimes, simply materializing the CTEs makes it run better, as expected. Other times it does not, but when I do this:
drop table if exists sales;
drop table if exists inventory;
create temporary table sales as
select item, sum (qty) as sales_qty, sum (revenue) as sales_revenue
from sales_data
where country = 'USA'
group by item;
create temporary table inventory as
select item, sum (on_hand_qty) as inventory_qty
from inventory_data
where country = 'USA' and on_hand_qty != 0
group by item;
select
a.item, a.description, s.sales_qty, s.sales_revenue,
i.inventory_qty, i.inventory_qty * a.cost as inventory_cost
from
all_items a
left join sales s on
a.item = s.item
left join inventory i on
a.item = i.item;
Suddenly all is right in the world.
Temp tables in PostgreSQL do not persist across sessions: both the data and the table definition are dropped automatically when the session ends. Within a session, though, a leftover table from earlier in the same session can collide with a new create, which is why to be safe I always drop first:
drop table if exists sales;
And use "if exists" to avoid any errors about the object not existing.
I rarely use these in common queries, for the simple reason that they are not as portable as a single SQL statement (you can't give the final query to another user without also giving them the temp-table setup). My most common use case is when I am processing within a procedure/function:
create procedure sales_and_inventory()
language plpgsql
as
$BODY$
BEGIN
create temp table sales...
insert into sales_inventory
select ...
drop table sales;
END;
$BODY$
Hopefully this helps.
Also, to answer your question on indexes: typically I don't add any, but nothing says that's always the right answer. If I put data into a temp table, I assume I'm going to use all or most of it. That said, if you plan to query it multiple times with conditions where an index makes sense, then by all means do it.
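If you do decide an index is worthwhile, it works like an index on any other table; a quick sketch reusing the names above (the analyze is deliberate, since autovacuum never touches temp tables):

create temporary table sales as
select item, sum (qty) as sales_qty
from sales_data
where country = 'USA'
group by item;

create index on sales (item);  -- supports repeated lookups by item
analyze sales;                 -- collect statistics by hand for the planner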
Hello. I am trying to use tsvector to count the frequencies of terms. I think I am almost there, but I cannot find a way to obtain the terms from the tsvector structure. What I have done, after creating the tsvector column, is:
select term_tsv, count(*) as count
from (select unnest(term_tsv) as term_tsv from document_tsv) t
group by term_tsv
order by count desc;
the result is like:
term_tsv    | count
------------+-------
(3,{9},{D}) |     1
I am lost, not knowing what kind of expression the parentheses represent. Can anybody tell me how to extract the term from this? Thank you.
I figured out that something like the following lists the top 10 most frequent entries; it is given in the official manual.
SELECT * FROM ts_stat('SELECT vector FROM apod')
ORDER BY nentry DESC, ndoc DESC, word
LIMIT 10;
Just for the record.
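For completeness, the parenthesized value in the question is the composite row returned by unnest(tsvector), whose columns are lexeme, positions, and weights (available since PostgreSQL 9.6). A sketch using the question's own table and column names to pull the lexeme out directly:

SELECT u.lexeme, count(*) AS count
FROM document_tsv, unnest(term_tsv) AS u
GROUP BY u.lexeme
ORDER BY count DESC;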
I don't understand why the following doesn't fail. How does the subquery have access to a column from a different table at the higher level?
drop table if exists temp_a;
create temp table temp_a as
(
select 1 as col_a
);
drop table if exists temp_b;
create temp table temp_b as
(
select 2 as col_b
);
select col_a from temp_a where col_a in (select col_a from temp_b);
/*why doesn't this fail?*/
The following queries fail, as I would expect them to.
select col_a from temp_b;
/*ERROR: column "col_a" does not exist*/
select * from temp_a cross join (select col_a from temp_b) as sq;
/*ERROR: column "col_a" does not exist
*HINT: There is a column named "col_a" in table "temp_a", but it cannot be referenced from this part of the query.*/
I know about the LATERAL keyword (link, link), but I'm not using LATERAL here. Also, this query succeeds even in pre-9.3 versions of Postgres (before the LATERAL keyword was introduced).
Here's a sqlfiddle: http://sqlfiddle.com/#!10/09f62/5/0
Thank you for any insights.
Although this feature might be confusing, without it several types of queries would be more difficult, slower, or impossible to write in SQL. This feature is called a "correlated subquery", and the correlation can serve a similar function to a join.
For example: Consider this statement
select first_name, last_name from users u
where exists (select * from orders o where o.user_id=u.user_id)
This query will get the names of all the users who have ever placed an order. Now, I know you can get that info using a join to the orders table, but you'd also have to use DISTINCT, which internally requires a sort and would likely perform a tad worse than this query. You could also produce a similar query with a GROUP BY.
Here's a better example that's pretty practical, and not just for performance reasons. Suppose you want to delete all users who have no orders and no tickets.
delete from users u where
not exists (select * from orders o where o.user_id = u.user_id)
and not exists (select * from tickets t where t.user_id = u.user_id)
One very important thing to note is that you should fully qualify or alias your table names when doing this or you might wind up with a typo that completely messes up the query and silently "just works" while returning bad data.
The following is an example of what NOT to do.
select * from users
where exists (select * from product where last_updated_by=user_id)
This looks just fine until you look at the tables and realize that the table "product" has no "last_updated_by" field while the users table does, so the query returns the wrong data. Add the alias and the query will fail, because no "last_updated_by" column exists in product.
I hope this has given you some examples that show how to use this feature. I use correlated subqueries all the time in UPDATE and DELETE statements (as well as in SELECTs), but I often find an absolute need for them in UPDATEs and DELETEs.
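Applied to the query in the question: temp_b has no col_a, so the unqualified col_a inside the subquery resolves outward to temp_a.col_a, which makes it a correlated subquery. A rewrite with the resolution made explicit shows why it silently "just works":

-- What the original query effectively means: the IN list contains
-- temp_a.col_a once per row of temp_b, so the predicate is true
-- whenever temp_b is non-empty.
select col_a from temp_a
where col_a in (select temp_a.col_a from temp_b);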
I am getting this error in pg in production mode, but it's working fine in sqlite3 in development mode.
ActiveRecord::StatementInvalid in ManagementController#index
PG::Error: ERROR: column "estates.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT "estates".* FROM "estates" WHERE "estates"."Mgmt" = ...
^
: SELECT "estates".* FROM "estates" WHERE "estates"."Mgmt" = 'Mazzey' GROUP BY user_id
@myestate = Estate.where(:Mgmt => current_user.Company).group(:user_id).all
If user_id is the PRIMARY KEY then you need to upgrade PostgreSQL; newer versions will correctly handle grouping by the primary key.
If user_id is neither unique nor the primary key for the 'estates' relation in question, then this query doesn't make much sense, since PostgreSQL has no way to know which value to return for each column of estates where multiple rows share the same user_id. You must use an aggregate function that expresses what you want, like min, max, avg, string_agg, array_agg, etc or add the column(s) of interest to the GROUP BY.
Alternately you can rephrase the query to use DISTINCT ON and an ORDER BY if you really do want to pick a somewhat arbitrary row, though I really doubt it's possible to express that via ActiveRecord.
Some databases - including SQLite and MySQL - will just pick an arbitrary row. This is considered incorrect and unsafe by the PostgreSQL team, so PostgreSQL follows the SQL standard and considers such queries to be errors.
If you have:
col1 | col2
-----+-----
fred |   42
bob  |    9
fred |   44
fred |   99
and you do:
SELECT col1, col2 FROM mytable GROUP BY col1;
then it's obvious that you should get the row:
bob 9
but what about the result for fred? There is no single correct answer to pick, so the database will refuse to execute such unsafe queries. If you wanted the greatest col2 for any col1 you'd use the max aggregate:
SELECT col1, max(col2) AS max_col2 FROM mytable GROUP BY col1;
I recently moved from MySQL to PostgreSQL and encountered the same issue. Just for reference, the best approach I've found is to use DISTINCT ON as suggested in this SO answer:
Elegant PostgreSQL Group by for Ruby on Rails / ActiveRecord
This will let you get one record for each unique value in your chosen column that matches the other query conditions:
MyModel.where(:some_col => value).select("DISTINCT ON (unique_col) *")
I prefer DISTINCT ON because I can still get all the other column values in the row. DISTINCT alone will only return the value of that specific column.
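For reference, in raw SQL DISTINCT ON requires an ORDER BY whose leading expressions match the DISTINCT ON list; a sketch against the estates table from the question (using id as the tie-breaker is my assumption):

SELECT DISTINCT ON (user_id) *
FROM estates
WHERE "Mgmt" = 'Mazzey'
ORDER BY user_id, id;  -- first row per user_id wins; id breaks ties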
After often receiving this error myself, I realised that Rails (I am using Rails 4) automatically adds an ORDER BY id at the end of your grouping query. This often results in the error above, so make sure you append your own .order(:group_by_column) at the end of your Rails query. You will then have something like this:
@problems = Problem.select('problems.username, sum(problems.weight) as weight_sum').group('problems.username').order('problems.username')
@myestate1 = Estate.where(:Mgmt => current_user.Company)
@myestate = @myestate1.select("DISTINCT(user_id)")
This is what I did.
I have seen that PostgreSQL keeps several statistics per table OID (like the number of inserts, and so on).
What I am looking for is some kind of SELECT query or something that would sum up everything for every table.
It would be something like:
SELECT SUM(pg_stat_get_db_tuples_returned(SELECT oid FROM pg_class));
Or something like that. I would appreciate any help here.
Do you mean this:
SELECT sum(pg_stat_get_db_tuples_returned(oid))
FROM pg_class;
Something like this, with your choice of stats function, aggregates the stats function on all tables except temporary and system catalog tables:
SELECT
sum(pg_stat_get_db_tuples_returned(c.oid))
FROM pg_catalog.pg_class c
INNER JOIN pg_namespace n ON (c.relnamespace = n.oid)
WHERE NOT (n.nspname LIKE ANY(ARRAY['pg_temp%','pg_catalog','information_schema']));
Note that pg_toast schemas are included in this, as I presume you want your stats to include any TOAST side-tables. If you don't, add pg_toast% to the exclusions.
Edit: I was using the construct:
(quote_ident(n.nspname)||'.'||quote_ident(c.relname))::regclass
to get the table oid, but that's just silly when it's right there in pg_class; it's ridiculously roundabout as shown by a_horse_with_no_name.
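If you want a per-table breakdown rather than a single sum, drop the aggregate and keep the join. Note that pg_stat_get_tuples_returned(oid) is the per-table counterpart of the per-database function used above; the substitution is mine, so double-check it matches the statistic you are after:

SELECT n.nspname, c.relname,
       pg_stat_get_tuples_returned(c.oid) AS tuples_returned
FROM pg_catalog.pg_class c
INNER JOIN pg_namespace n ON (c.relnamespace = n.oid)
WHERE c.relkind = 'r'
  AND NOT (n.nspname LIKE ANY(ARRAY['pg_temp%','pg_catalog','information_schema']))
ORDER BY tuples_returned DESC;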