Finding out the size of a continuous aggregate - postgresql

I have a hypertable with a couple million rows. I can select its size just fine using the following:
SELECT pg_size_pretty( pg_total_relation_size('towns') );
I also have a continuous aggregate for that hypertable:
CREATE MATERIALIZED VIEW towns_income
WITH (timescaledb.continuous, timescaledb.materialized_only=true) AS
SELECT time_bucket(INTERVAL '1 minute', timestamp) AS bucket,
/* random query */
FROM towns
GROUP BY bucket, town
WITH NO DATA;
I've refreshed the view and the data shows up as expected. However, I can't figure out how much space this new view takes up.
SELECT pg_size_pretty( pg_total_relation_size('towns_income') ); returns 0 bytes, which I know isn't correct. I thought that maybe the total relation size for towns would increase, but that also stays the same. Am I missing something? I've tried hypertable_size as well, with no success, since the materialized view isn't technically a hypertable.

The following SQL can help :)
SELECT view_name,
       hypertable_size(format('%I.%I', materialization_hypertable_schema, materialization_hypertable_name)::regclass)
FROM timescaledb_information.continuous_aggregates;
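If you want the human-readable output that pg_size_pretty gives for plain tables, you can wrap the call. A minimal variation of the query above (filtering on towns_income from the question is just for illustration):
SELECT view_name,
       pg_size_pretty(hypertable_size(format('%I.%I', materialization_hypertable_schema, materialization_hypertable_name)::regclass)) AS size
FROM timescaledb_information.continuous_aggregates
WHERE view_name = 'towns_income';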

Another approach would be the following SQL query:
SELECT pg_size_pretty(SUM(total_bytes)), aggs.view_name
FROM "_timescaledb_internal".hypertable_chunk_local_size hcls,
     timescaledb_information.continuous_aggregates aggs
WHERE hcls.table_name = aggs.materialization_hypertable_name
GROUP BY aggs.view_name;


Pivot function without manually typing values in `for in`?

The Redshift documentation provides an example of using the PIVOT function.
SELECT *
FROM (SELECT partname, price FROM part) PIVOT (
AVG(price) FOR partname IN ('prop', 'rudder', 'wing')
);
I would like to use pivot() without having to manually specify each value of partname. I want all parts. I tried:
SELECT *
FROM (SELECT partname, price FROM part) PIVOT (
AVG(price) FOR partname);
That gave an error. Then I tried:
SELECT *
FROM (SELECT partname, price FROM part) PIVOT (
AVG(price) FOR partname IN (select distinct partname from part)
);
That also threw an error.
How can I tell Redshift to include all values of partname in the pivot?
I don't think this can be done in a simple single query: it would mean the query compiler has to plan the query without knowing how many output columns will be produced, and I don't think it can do that.
You can do this in multiple queries: use one query to create the list of partnames, then use that list to "generate" a second query that populates the IN list. So something needs to issue the first query and generate the second. That can be code external to Redshift (lots of options) or a stored procedure in Redshift. This code, wherever it lives, should account for Redshift's maximum column limit of 1,600.
The Redshift docs are fairly good on the topic of dynamic SQL for stored procedures. The EXECUTE statement will be used to fire off the second query in a stored procedure. See: https://docs.aws.amazon.com/redshift/latest/dg/c_PLpgSQL-statements.html
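As a rough sketch of that stored-procedure route (the procedure name, the pivoted temp table, and the use of LISTAGG/QUOTE_LITERAL to build the IN list are my own choices, not from the question):
-- Build the IN list from the data, then EXECUTE the generated pivot.
CREATE OR REPLACE PROCEDURE pivot_all_parts()
AS $$
DECLARE
    in_list VARCHAR(65535);
BEGIN
    -- Produces 'prop', 'rudder', 'wing', ... from the distinct partnames.
    SELECT LISTAGG(DISTINCT QUOTE_LITERAL(partname), ', ')
      INTO in_list
      FROM part;

    -- Generate and fire off the second query.
    EXECUTE 'CREATE TEMP TABLE pivoted AS '
         || 'SELECT * FROM (SELECT partname, price FROM part) '
         || 'PIVOT (AVG(price) FOR partname IN (' || in_list || '))';
END;
$$ LANGUAGE plpgsql;

-- Usage, in the same session:
CALL pivot_all_parts();
SELECT * FROM pivoted;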

redshift nulling columns when joining another table

table_1 has 35 columns, table_2 has 20 columns
The query is:
SELECT table1.*,
       table2.f1,
       ...
       table2.f20
FROM public.table_1 AS table1
LEFT JOIN public.table_2 AS table2
    ON table1.id = table2.id
   AND table1.arrival_time::date <= table2.end_date::date
   AND table2.activity_date < table2.end_date
;
This works: I expect 469 rows to be returned and that's what I get. However, several fields from table_1 are displayed as null instead of the values in the table.
These fields are NOT part of the join.
Due to IP concerns I can't provide the full details of the tables. Every field in table_1 and table_2 is varchar (don't ask me why a timestamp is stored as a varchar; it's a long story that I have no control over).
This query WORKS in RDS PostgreSQL!
Any ideas why it has a problem in Redshift?
Well, color me confused.
table_1 is data from two sources joined together, and I didn't even think to look at the sources. It turns out the linked source had no data for one value.
Just goes to show that when you're looking at a piece of the data, you need to look HARD at all the data.
Now I'm off to find a better source for the missing data.
Thanks for your time!
James

Db2 sql for partition by range select

I am trying to get my head around Db2 partition stuff.
Select a.*, max(a.bloo)
over (
partition by range (a.bloo) (starting '2014-4-20' ending '2015-1-1')
)
as maxmax from (
select * from someTable
) a
I get SQLCODE -104 for this, and I cannot decipher the docs.
You are mixing up two different things: table partitioning, which is a physical characteristic of a table, and OLAP (window) functions, which provide logical grouping of records in a query.
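For contrast, a PARTITION BY RANGE clause with STARTING/ENDING bounds belongs in the table's DDL, roughly like this (a sketch only; the column type and the EVERY interval are assumptions):
CREATE TABLE someTable (
    bloo DATE NOT NULL
)
PARTITION BY RANGE (bloo)
(STARTING '2014-04-20' ENDING '2015-01-01' EVERY 3 MONTHS);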
I guess what you wanted was something like
SELECT
    a.*,
    MAX(a.bloo) OVER ( PARTITION BY a.bloo ) AS maxmax
FROM someTable a
WHERE
    a.bloo BETWEEN '2014-4-20' AND '2015-1-1'
However, without knowing what you wanted to achieve in the first place it's impossible to give you a definitive answer. You may want to publish some sample data and the desired output.

Amazon redshift query planner

I'm facing a situation with Amazon Redshift that I haven't been able to explain to myself yet: the query planner seems unable to handle the same table appearing in the subqueries of two derived tables in a join.
I have essentially four tables, Source_A, Source_B, Target_1, and Target_2, and a query like
SELECT sa.a, sa.b, sb.c, sb.d FROM
(
SELECT a, b FROM Source_A WHERE date > (SELECT MAX(date) FROM Target_1)
) sa
INNER JOIN
(
SELECT c, d FROM Source_B WHERE date > (SELECT MAX(date) FROM Target_2)
) sb
ON sa.a = sb.c
The query works fine as long as Target_1 and Target_2 are different tables. If I change the query so that Target_2 is the same table as Target_1, something happens: the query starts to take about ten times longer, and the performance monitor shows that all the extra time is spent with only the leader node active.
When I EXPLAIN both variants I see practically no difference in the output; all the steps are the same. The difference is that EXPLAIN itself takes seconds for one and almost half an hour for the other, the one where the target tables are the same.
So, to summarise what I think I have observed: in a join, if I use the same table in a subquery of each derived table, the query planner goes nuts.
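One way to sidestep this (an untested sketch, not from the original post) would be to materialise the watermark once into a temp table, so each derived table's subquery only touches a one-row relation:
CREATE TEMP TABLE watermark AS SELECT MAX(date) AS max_date FROM Target_1;

SELECT sa.a, sa.b, sb.c, sb.d
FROM (SELECT a, b FROM Source_A WHERE date > (SELECT max_date FROM watermark)) sa
INNER JOIN (SELECT c, d FROM Source_B WHERE date > (SELECT max_date FROM watermark)) sb
ON sa.a = sb.c;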

SQL limit query

I'm having an issue with limiting a SQL query. I'm using SQL Server 2000, so I can't use features like ROW_NUMBER(), CTEs, or OFFSET ... FETCH.
I have tried the SELECT TOP n ... FROM approach, excluding the already-shown results, but that way the query is very slow because sometimes my query fetches more than 10,000 records.
Also I have tried the following approach:
SELECT * FROM (
SELECT DISTINCT TOP 100 PERCENT im.name, im.location, im.image,
( SELECT COUNT(DISTINCT i.id) FROM images AS i WHERE i.id <= im.id ) AS recordnum
FROM images AS im
ORDER BY im.location ASC, im.name ASC
) AS tmp
WHERE recordnum BETWEEN 5 AND 15
Same problem here, plus an issue because I couldn't add an ORDER BY in the subquery for recordnum. I have placed both solutions in a stored procedure, but query execution is still very slow.
So my question is:
Is there an efficient way to limit the query to pull 20 records per page in SQL 2000 for large amounts of data, i.e. more than 10,000 rows?
Thanks.
Now the subquery is only run once. The WHERE im2.id IS NULL condition skips the first 40 rows:
SELECT TOP 25 im1.*
FROM images im1
LEFT JOIN ( SELECT TOP 40 id FROM images ORDER BY id ) im2
    ON im1.id = im2.id
WHERE im2.id IS NULL
ORDER BY im1.id
Query-wise, there is no great-performing way. If performance is critical and the data will always be grouped/ordered the same way, you could add an int column, set its value by trigger based on the grouping/ordering, and index it (sketched below). Reads should then be extremely fast; writes will be a bit slower.
Also, make sure you have an index on the id column of the images table.
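A minimal sketch of that indexed-column approach in SQL Server 2000 syntax (the column, index, and ordering are assumptions, and the trigger that keeps rownum current on insert/update/delete is omitted):
ALTER TABLE images ADD rownum INT NULL

-- One-time backfill: number rows in the paging order.
-- Assumes (location, name) is unique; ties would share a number otherwise.
UPDATE im
SET rownum = ( SELECT COUNT(*) FROM images i
               WHERE i.location < im.location
                  OR (i.location = im.location AND i.name <= im.name) )
FROM images im

CREATE INDEX ix_images_rownum ON images (rownum)

-- Page 3 at 20 rows per page:
SELECT * FROM images WHERE rownum BETWEEN 41 AND 60 ORDER BY rownum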