Postgres Crosstab Dynamic Number of Columns - postgresql

In Postgres 9.4, I have a table like this:
id  extra_col  days  value
--  ---------  ----  -----
1   rev        0     4
1   rev        30    5
2   cost       60    6
I want this pivoted result:
id  extra_col  0   30  60
--  ---------  --  --  --
1   rev        4   5
2   cost               6
This is simple enough with a crosstab, but I have the following requirements:
The days column is dynamic: sometimes in increments of 1, 2, 3 (days), sometimes 0, 30, 60 (accounting months), and sometimes 360, 720 (accounting years).
The range of days is dynamic (e.g., 0..500 days versus 1..10 days).
The first two columns are static (id and extra_col).
The return type of all the dynamic columns stays the same (in this example, integer).
Here are the solutions I've explored, none of which work for me, for the following reasons:
Automatically creating pivot table column names in PostgreSQL - requires two trips to the database.
Using crosstab_hash - is not dynamic.
Of all the solutions I've explored, the only one that does this in one trip to the database seems to require running the same query three times. Is there a way to store the query as a CTE within the crosstab function?
SELECT *
FROM
CROSSTAB(
--QUERY--,
$$--RUN QUERY AGAIN TO GET NUMBER OF COLUMNS--$$
)
as ct (
--RUN QUERY AGAIN AND CREATE STRING OF COLUMNS WITH TYPE--
)

Every solution based on built-in functionality needs to know the number of output columns in advance, because the PostgreSQL planner requires it. There is a workaround based on cursors; it is the only way to get a truly dynamic result from Postgres.
The example is relatively long and hard to read (SQL really doesn't support cross-tabulation), so I will not reproduce the code from the blog here: http://okbob.blogspot.cz/2008/08/using-cursors-for-generating-cross.html
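To make the "column list must be known in advance" point concrete, here is a minimal Python sketch that builds the crosstab statement text once the distinct day values are known. It does not remove the extra round trip the question wants to avoid; it only shows what has to be spelled out before the planner sees the query. The table name my_table and the helper names are hypothetical, and the id/extra_col types are assumed to be int and text.

```python
def quote_ident(name: str) -> str:
    """Double-quote an SQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_crosstab_sql(days, value_type="int"):
    """Return the text of a two-argument crosstab() call, one column per day.

    my_table is a placeholder name; id, extra_col, days and value are the
    columns from the question. The day values must already be known here.
    """
    days = sorted({int(d) for d in days})
    # Source query: row name, extra column, category, value (in that order).
    source = "SELECT id, extra_col, days, value FROM my_table ORDER BY 1"
    # Category query: one row per output column.
    categories = "VALUES " + ", ".join(f"({d})" for d in days)
    # Column definition list: two static columns plus one per day value.
    columns = ", ".join(
        ["id int", "extra_col text"]
        + [f"{quote_ident(str(d))} {value_type}" for d in days]
    )
    return (
        "SELECT * FROM crosstab(\n"
        f"  $${source}$$,\n"
        f"  $${categories}$$\n"
        f") AS ct ({columns})"
    )


if __name__ == "__main__":
    print(build_crosstab_sql([0, 30, 60]))
```

The generated text still has to be sent to the server as a second statement, which is exactly the limitation described above.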

Related

How to find the difference between two table sizes in postgres using shell script

I have one table named 'Table_size_details' in my database which stores the size of the tables in my Postgres database, every Friday.
Date        Table_Name  Table_size  Growth_Difference  Growth_Percentage
----------  ----------  ----------  -----------------  -----------------
20-08-2021  Demo        1.2 GB
13-08-2021  Demo        578 MB
I have been given the task of adding two more columns, 'Growth_Difference' and 'Growth_Percentage'. In the 'Growth_Difference' column I need to store the difference between the current table size (1.2 GB) and the previous week's table size (578 MB), displayed in MB. I also need to find the growth percentage between the current table size and the previous week's table size.
I have been asked to develop this as a shell script.
Table_size_old=`psql -d abc -At -c "SELECT Table_size_details from abc order by Date desc limit 1;"`
Table_size_new=`psql -At -c "SELECT pg_size_pretty(pg_total_relation_size('Table_size_details'));"`
growth_table=`expr $Table_size_new - $Table_size_old;`
I used the logic above to find the difference between the new and old table sizes, but I get "expr: syntax error" on the growth_table line. I believe it's because I'm trying to subtract 578 MB from 1.2 GB, which are strings rather than numbers.
I'm new to shell scripting. Could anyone help me find a solution?
I appreciate your help in advance.
There is no need to attempt this in a shell script. Furthermore, since both growth columns are computed values, there is no need to store them. This can be done in a single query or, perhaps even better, in a view built on that query. Getting what you are looking for is then a simple select from that view.
First, however, do not store the size as a string made of a number and a unit code. Store a single numeric value in one constant unit, converting every value to that unit. For example, with GB as the constant unit, 578 MB would be stored as 0.578 (taking 1 GB as 1000 MB); that way no unit conversion is needed at query time. With that done (or added as a new column), create a view as follows:
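As a rough illustration of why the string values break the arithmetic and what "one constant unit" means, here is a small Python sketch that converts strings such as '1.2 GB' into a plain number of MB. The function name and the unit table are assumptions made for this example; the multipliers are binary (1 GB = 1024 MB), which is an assumption about how the sizes were produced.

```python
import re

# Multipliers to MB. Binary units are assumed here (1 GB = 1024 MB).
UNIT_TO_MB = {
    "bytes": 1 / 1024**2,
    "kB": 1 / 1024,
    "MB": 1.0,
    "GB": 1024.0,
    "TB": 1024.0**2,
}


def size_to_mb(text: str) -> float:
    """Convert a size string such as '578 MB' to a float number of MB."""
    match = re.fullmatch(r"\s*([0-9]+(?:\.[0-9]+)?)\s*([A-Za-z]+)\s*", text)
    if match is None:
        raise ValueError(f"not a size string: {text!r}")
    number, unit = match.groups()
    if unit not in UNIT_TO_MB:
        raise ValueError(f"unknown unit: {unit!r}")
    return float(number) * UNIT_TO_MB[unit]


if __name__ == "__main__":
    old_mb = size_to_mb("578 MB")
    new_mb = size_to_mb("1.2 GB")
    # Once both values are plain numbers, subtraction is straightforward.
    print(f"growth: {new_mb - old_mb:.1f} MB")
```

Storing the numeric value in the table from the start avoids this parsing step entirely.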
create or replace view table_size_growth as
select table_name
, run_date
, size_in_gb
, Round( (size_in_gb - gb_last_week)::numeric,6) growth_in_gb
, case when gb_last_week < 0.0000001 -- set to desired precision
then null::double precision
else round((100 * (size_in_gb - gb_last_week)/abs(gb_last_week))::numeric,6)
end growth_in_pct
from (select ts.*, lag(ts.size_in_gb) over( partition by ts.table_name
order by ts.run_date) gb_last_week
from table_size_details ts
) s
order by table_name, run_date;
Your script (or any other client) now needs only a single query: select * from table_size_growth. Note: this returns every week for every table you are capturing; add a where clause as needed.
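For readers less familiar with lag(), the following Python sketch mirrors what the view computes for a single table: each row is compared with the previous row in date order. It is only an illustration of the logic, not a replacement for the view; the function name and the sample figures are made up for the example.

```python
def growth_rows(sizes_gb):
    """Mimic the view for one table.

    sizes_gb: list of (run_date, size_in_gb) already sorted by run_date.
    Returns (run_date, size_in_gb, growth_in_gb, growth_in_pct) tuples.
    """
    rows = []
    previous = None  # plays the role of lag(size_in_gb)
    for run_date, size in sizes_gb:
        if previous is None:
            # First week: nothing to compare with, like NULL in SQL.
            growth, pct = None, None
        else:
            growth = round(size - previous, 6)
            if previous < 0.0000001:
                pct = None  # avoid dividing by (almost) zero
            else:
                pct = round(100 * (size - previous) / abs(previous), 6)
        rows.append((run_date, size, growth, pct))
        previous = size
    return rows


if __name__ == "__main__":
    for row in growth_rows([("2021-08-13", 0.578), ("2021-08-20", 1.2)]):
        print(row)
```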

TimescaleDB: Understanding the return values after creating hypertable and the creation of chunks after populating the hypertable

I have an existing table in my database named price (has 264 rows) and I converted it into a hypertable price_hypertable doing:
CREATE TABLE price_hypertable (LIKE price INCLUDING DEFAULTS INCLUDING CONSTRAINTS EXCLUDING INDEXES);
SELECT create_hypertable('price_hypertable', 'start');
and the output it gave me is as follows:
create_hypertable
-------------------------------
(4,public,price_hypertable,t)
(1 row)
The next thing I did was to populate the price_hypertable as follows:
insert into price_hypertable select * from price;
And I got the following output:
INSERT 0 264
Now, I wanted to check the chunks created, for which I did:
select public.show_chunks('price_hypertable');
and the output I got:
show_chunks
----------------------------------------
_timescaledb_internal._hyper_4_3_chunk
_timescaledb_internal._hyper_4_4_chunk
(2 rows)
When I do:
select * from _timescaledb_internal._hyper_4_3_chunk;
select * from _timescaledb_internal._hyper_4_4_chunk ;
I see that the 264 entries are split as follows:
_timescaledb_internal._hyper_4_3_chunk has 98 rows
_timescaledb_internal._hyper_4_4_chunk has 166 rows
I have a few questions about these steps and their outputs:
Can someone please explain to me what do the values 4 and t represent, when I did
SELECT create_hypertable('price_hypertable', 'start');?
After populating the price_hypertable, the data was automatically split into chunks, but of different size. Why does this happen? Why wasn't the data just split in half (132 rows in each chunk instead of 98 and 166)?
Any help is appreciated. Thanks
For the first question, it is easier to see what they represent by executing create_hypertable as
SELECT * FROM create_hypertable('price_hypertable', 'start');
This gives something like:
 hypertable_id | schema_name |    table_name    | created
---------------+-------------+------------------+---------
             4 | public      | price_hypertable | t
For the second question, TmTron already answered. The rows are sorted into buckets based on their time values, and those values are not necessarily evenly spaced. Nothing automatically picks an interval that would balance the row counts across buckets.
You can find information about the return values in the API documentation on create_hypertable, which also discusses the parameter chunk_time_interval that can be used to set the chunk size.
Related to your second question:
When you don't specify chunk_time_interval explicitly, the default is 7 days: see create-hypertable, Best Practices.
So the number of rows in each chunk depends on the distribution of your data (according to your start date-time column).
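A simplified Python model of this bucketing may help. It assumes chunks are fixed-width 7-day time buckets aligned to the Unix epoch, which is a simplification made for the example rather than a statement about TimescaleDB internals. The timestamps are invented (hourly rows starting 2021-01-02 22:00 UTC) purely to show how 264 rows can land unevenly in two buckets.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Assumed alignment point for the buckets in this simplified model.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def chunk_start(ts, interval=timedelta(days=7)):
    """Return the start of the fixed-width bucket that contains ts."""
    index = (ts - EPOCH) // interval  # whole intervals since EPOCH
    return EPOCH + index * interval


def rows_per_chunk(timestamps, interval=timedelta(days=7)):
    """Count how many timestamps fall into each bucket."""
    return Counter(chunk_start(ts, interval) for ts in timestamps)


if __name__ == "__main__":
    # Invented data: 264 hourly rows, roughly 11 days of readings.
    first = datetime(2021, 1, 2, 22, tzinfo=timezone.utc)
    stamps = [first + timedelta(hours=h) for h in range(264)]
    for start, count in sorted(rows_per_chunk(stamps).items()):
        print(start.date(), count)
```

The split depends only on where the bucket boundary falls relative to the data, not on the total row count, so an even 132/132 split would be a coincidence.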

Optimal use of LIKE on indexed column

I have a large table (+- 1 million rows, 7 columns including the primary key). The table contains two columns (ie: symbol_01 and symbol_02) that are indexed and used for querying. This table contains rows such as:
id  symbol_01  symbol_02  value_01  value_02
1   aaa        bbb        12        15
2   bbb        aaa        12        15
3   ccc        ddd        20        50
4   ddd        ccc        20        50
As per the example rows 1 and 2 are identical except that symbol_01 and symbol_02 are swapped but they have the same values for value_01 and value_02. That is true once again with row 3 and 4. This is the case for the entire table, there are essentially two rows for each combination of symbol_01+symbol_02.
I need to figure out a better way of handling this to get rid of the duplication. So far the solution I am considering is to just have one column called symbol which would be a combination of the two symbols, so the table would be as follows:
id  symbol     value_01  value_02
1   ,aaa,bbb,  12        15
2   ,ccc,ddd,  20        50
This would cut the number of rows in half. As a side note, every value in the symbol column will be unique. Results always need to be queried for using both symbols, so I would do:
select value_01, value_02
from my_table
where symbol like '%,aaa,%' and symbol like '%,bbb,%'
This would work but my question is around performance. This is still going to be a big table (and will get bigger soon). So my question is, is this the best solution for this scenario given that symbol will be indexed, every symbol combination will be unique, and I will need to use LIKE to query results.
Is there a better way to do this? I'm not sure how well LIKE performs, but I don't see an alternative.
There's no high performance solution, because your problem is shoehorning multiple values into one column.
Create a child table (with a foreign key to your current/main table) to separately hold all the individual values you want to search on, index that column and your query will be simple and fast.
With this index:
create index symbol_index on t (
least(symbol_01, symbol_02),
greatest(symbol_01, symbol_02)
)
The query would be:
select *
from t
where
least(symbol_01, symbol_02) = least('aaa', 'bbb')
and
greatest(symbol_01, symbol_02) = greatest('aaa', 'bbb')
Or simply delete the duplicates:
delete from t
using (
select distinct on (
greatest(symbol_01, symbol_02),
least(symbol_01, symbol_02),
value_01, value_02
) id
from t
order by
greatest(symbol_01, symbol_02),
least(symbol_01, symbol_02),
value_01, value_02
) s
where t.id = s.id
Depending on the columns' semantics, it might be better to normalize the table as suggested by @Bohemian.
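The idea behind the least()/greatest() index can be sketched outside the database as well: order the two symbols so that (aaa, bbb) and (bbb, aaa) map to the same key, then look up by exact match instead of pattern matching. The Python below is only an illustration with hypothetical helper names and the sample rows from the question.

```python
def pair_key(symbol_a, symbol_b):
    """Order-independent key, analogous to (least(a, b), greatest(a, b))."""
    return (min(symbol_a, symbol_b), max(symbol_a, symbol_b))


def build_index(rows):
    """Map each unordered symbol pair to the first row seen for it."""
    index = {}
    for row in rows:
        index.setdefault(pair_key(row["symbol_01"], row["symbol_02"]), row)
    return index


def lookup(index, symbol_a, symbol_b):
    """Exact-match lookup that works regardless of argument order."""
    return index.get(pair_key(symbol_a, symbol_b))


if __name__ == "__main__":
    rows = [
        {"id": 1, "symbol_01": "aaa", "symbol_02": "bbb", "value_01": 12, "value_02": 15},
        {"id": 2, "symbol_01": "bbb", "symbol_02": "aaa", "value_01": 12, "value_02": 15},
        {"id": 3, "symbol_01": "ccc", "symbol_02": "ddd", "value_01": 20, "value_02": 50},
        {"id": 4, "symbol_01": "ddd", "symbol_02": "ccc", "value_01": 20, "value_02": 50},
    ]
    index = build_index(rows)
    print(lookup(index, "bbb", "aaa"))
```

Because the key is compared with equality, an ordinary B-tree index on the two expressions can serve the query, which a LIKE pattern with a leading % cannot use.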

How to duplicate partition content?

I'm trying to set up a testing environment for performance testing. We currently have a table with 8 million records, and we want to duplicate these records for 30 days.
In other words:
- Table 1
--Partition1(8 million records)
--Partition2(0 records)
.
.
--Partition30(0 records)
Now I want to take the 8 million records in Partition1 and duplicate them across the rest of partitions, the only difference that they have is a column that contains a DATE. This column should vary 1 day in each copy.
Partition1(DATE)
Partition2(DATE+1)
Partition3(DATE+2)
And so on.
The last restrictions are that there are 2 indexes in the original table and they must be preserved in the copies and Oracle DB is 10g.
How can I duplicate this content?
Thanks!
It seems to me to be as simple as running as efficient an insert as possible.
Probably if you cross-join the existing data to a list of integers, 1 .. 29, then you can generate the new dates you need.
insert /*+ append */ into ...
with list_of_numbers as (
select rownum day_add
from dual
connect by level <= 29)
select date_col + day_add, ...
from ...,
list_of_numbers;
You might want to set NOLOGGING on the table, since this is test data.
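The cross join against the numbers 1..29 can be pictured with a few lines of Python: every source row is emitted once per offset, with the date shifted by that many days. This is only a sketch of the logic on in-memory tuples; the function name and the (date, payload) row shape are assumptions for the example.

```python
from datetime import date, timedelta


def duplicate_with_offsets(rows, copies=29):
    """Yield each (row_date, payload) once per offset 1..copies.

    Equivalent in spirit to cross-joining the rows with a list of
    integers and adding the integer to the date column.
    """
    rows = list(rows)  # allow iterating the source once per offset
    for day_add in range(1, copies + 1):
        for row_date, payload in rows:
            yield (row_date + timedelta(days=day_add), payload)


if __name__ == "__main__":
    source = [(date(2020, 1, 1), "a"), (date(2020, 1, 1), "b")]
    copies = list(duplicate_with_offsets(source))
    print(len(copies), copies[0], copies[-1])
```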

T-SQL - CROSS APPLY to a PIVOT? (using pivot with a table-valued function)?

I have a table-valued function, basically a split-type function, that returns up to 4 rows per string of data.
So I run:
select * from dbo.split('a','1,a15,b20,c40;2,a25,d30;3,e50')
I get:
Seq Data
1 15
2 25
However, my end data needs to look like
15 25
so I do a pivot.
select [1],[2],[3],[4]
from dbo.split('a','1,a15,b20,c40;2,a25,d30;3,e50')
pivot (max(data) for seq in ([1],[2],[3],[4]))
as pivottable
which works as expected:
1 2
--- ---
15 25
HOWEVER, that's great for one row. I now need to do it for several hundred records at once. My thought is to do a CROSS APPLY, but not sure how to combine a CROSS APPLY and a PIVOT.
(yes, obviously the easy answer is to write a modified version that returns 4 columns, but that's not a great option for other reasons)
Any help greatly appreciated.
And the reason I'm doing this: the current query uses a scalar-valued version of SPLIT, called 12 times within the same SELECT against the same million rows (where the data string is 500+ bytes).
As far as I know, that would require it to scan the same 500 bytes * 1,000,000 rows 12 times.
This is how you use cross apply. Assume table1 is your table and Line is the field in your table that you want to split; the first argument 'a' is the one from your example call.
SELECT * FROM table1 as a
cross apply dbo.split('a', a.Line) as b
pivot (max(data) for seq in ([1],[2],[3],[4])) as p
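What the combined cross apply and pivot produces can be sketched in Python: the split output for all source rows is a list of (row id, seq, data) triples, and the pivot turns each row id's triples into one record with a slot per seq value. The function name and the sample triples are hypothetical; max() stands in for the max(data) aggregate.

```python
def pivot_by_seq(rows, seqs=(1, 2, 3, 4)):
    """Pivot (row_id, seq, data) triples into one list per row_id.

    Each output list has one slot per value in seqs; missing slots are
    None. Duplicate (row_id, seq) pairs keep the largest data value.
    """
    cells_by_row = {}
    for row_id, seq, data in rows:
        cells = cells_by_row.setdefault(row_id, {})
        if seq in seqs:
            current = cells.get(seq)
            cells[seq] = data if current is None else max(current, data)
    return {
        row_id: [cells.get(seq) for seq in seqs]
        for row_id, cells in cells_by_row.items()
    }


if __name__ == "__main__":
    # Invented split output for two source rows (ids 10 and 11).
    triples = [(10, 1, "15"), (10, 2, "25"), (11, 1, "40")]
    print(pivot_by_seq(triples))
```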