Postgres: Analyze after creating index or before

Postgres: Analyze after creating index or before - postgresql

I've got Postgres 11 version and production server.
A procedure is rather slow and query (a piece of code in procedure) is something like that:
create temp table tmp_pos_source as
with ... (
...
)
, cte_emp as (
...
)
, cte_all as (
...
)
select ...
from cte_all;
analyze tmp_pos_source;
The query is slow and I want to create an index to improve speed.
create index idx_pos_obj_id on tmp_pos_source(pos_obj_id);
Where should I put it? After command ANALYZE or before?

It doesn't matter. The only time when it helps to ANALYZE a table after creating an index is when the index is on an expression rather than on a plain column. The reason is that PostgreSQL automatically collects statistics for each column, but statistics on expressions are only collected if there is an index or extended statistics on the expression.

Related

can't btree index function that returns text

I've created a function to index a certain value from another table.
basically, i'm querying these activities, often filtered on the context of the activityplans table.
this is my function:
CREATE or replace FUNCTION get_activity_context(parentact uuid, parentplan uuid) RETURNS TEXT
LANGUAGE sql IMMUTABLE AS
$$
SELECT CASE
WHEN $2 is not null THEN
(select LOWER((("context")::json->>'href')::text) from activityplans ap where ap.key = $2)
WHEN $1 is not null THEN
(select LOWER((("context")::json->>'href')::text) from activityplans ap, activities act where act."parentPlan" = ap.key AND act.key=$1)
END
$$;
the function works, when I use it, for example, like select get_activity_context("parentActivity", "parentPlan") from activities limit 10;
but when i try to create an index:
create index on activities (get_activity_context("parentActivity", "parentPlan"));
i get this:
ERROR: could not read block 0 in file "base/16402/60840": read only 0 of 8192 bytes
CONTEXT: SQL function "get_activity_context" during startup
SQL state: XX001
googling this error only bring me to database data issues etc, but i don't think this is the case. My guess is something is wrong with the function, but i can't seem to figure out what.

I don't know which relation 60840 is in your database, but it sure has a problem. Find out with
SELECT relname,relnamespace::regnamespace
FROM pg_class
WHERE relfilenode = 60840;
Anyway, that index will never work, because the function is not IMMUTABLE, no matter how you declared it. It may return a different result tomorrow. This would lead to data corruption.
An index on one table can never refer to data from another table.

My first thought:
Usually index are created like this:
CREATE INDEX id_column_idx
ON public.naleznosc USING btree
(id_column)
TABLESPACE pg_default;
but you, are trying to create them this way:
CREATE INDEX .......... on activities ...........(get_activity_context("parentActivity", "parentPlan")) ...........;
The dots showing places where you not put some things :)

How to add a date column which is 7 days later than an existing column in a Postgres table? [duplicate]

Does PostgreSQL support computed / calculated columns, like MS SQL Server? I can't find anything in the docs, but as this feature is included in many other DBMSs I thought I might be missing something.
Eg: http://msdn.microsoft.com/en-us/library/ms191250.aspx

Postgres 12 or newer
STORED generated columns are introduced with Postgres 12 - as defined in the SQL standard and implemented by some RDBMS including DB2, MySQL, and Oracle. Or the similar "computed columns" of SQL Server.
Trivial example:
CREATE TABLE tbl (
int1 int
, int2 int
, product bigint GENERATED ALWAYS AS (int1 * int2) STORED
);
fiddle
VIRTUAL generated columns may come with one of the next iterations. (Not in Postgres 15, yet).
Related:
Attribute notation for function call gives error
Postgres 11 or older
Up to Postgres 11 "generated columns" are not supported.
You can emulate VIRTUAL generated columns with a function using attribute notation (tbl.col) that looks and works much like a virtual generated column. That's a bit of a syntax oddity which exists in Postgres for historic reasons and happens to fit the case. This related answer has code examples:
Store common query as column?
The expression (looking like a column) is not included in a SELECT * FROM tbl, though. You always have to list it explicitly.
Can also be supported with a matching expression index - provided the function is IMMUTABLE. Like:
CREATE FUNCTION col(tbl) ... AS ... -- your computed expression here
CREATE INDEX ON tbl(col(tbl));
Alternatives
Alternatively, you can implement similar functionality with a VIEW, optionally coupled with expression indexes. Then SELECT * can include the generated column.
"Persisted" (STORED) computed columns can be implemented with triggers in a functionally equivalent way.
Materialized views are a related concept, implemented since Postgres 9.3.
In earlier versions one can manage MVs manually.

YES you can!! The solution should be easy, safe, and performant...
I'm new to postgresql, but it seems you can create computed columns by using an expression index, paired with a view (the view is optional, but makes makes life a bit easier).
Suppose my computation is md5(some_string_field), then I create the index as:
CREATE INDEX some_string_field_md5_index ON some_table(MD5(some_string_field));
Now, any queries that act on MD5(some_string_field) will use the index rather than computing it from scratch. For example:
SELECT MAX(some_field) FROM some_table GROUP BY MD5(some_string_field);
You can check this with explain.
However at this point you are relying on users of the table knowing exactly how to construct the column. To make life easier, you can create a VIEW onto an augmented version of the original table, adding in the computed value as a new column:
CREATE VIEW some_table_augmented AS
SELECT *, MD5(some_string_field) as some_string_field_md5 from some_table;
Now any queries using some_table_augmented will be able to use some_string_field_md5 without worrying about how it works..they just get good performance. The view doesn't copy any data from the original table, so it is good memory-wise as well as performance-wise. Note however that you can't update/insert into a view, only into the source table, but if you really want, I believe you can redirect inserts and updates to the source table using rules (I could be wrong on that last point as I've never tried it myself).
Edit: it seems if the query involves competing indices, the planner engine may sometimes not use the expression-index at all. The choice seems to be data dependant.

One way to do this is with a trigger!
CREATE TABLE computed(
one SERIAL,
two INT NOT NULL
);
CREATE OR REPLACE FUNCTION computed_two_trg()
RETURNS trigger
LANGUAGE plpgsql
SECURITY DEFINER
AS $BODY$
BEGIN
NEW.two = NEW.one * 2;
RETURN NEW;
END
$BODY$;
CREATE TRIGGER computed_500
BEFORE INSERT OR UPDATE
ON computed
FOR EACH ROW
EXECUTE PROCEDURE computed_two_trg();
The trigger is fired before the row is updated or inserted. It changes the field that we want to compute of NEW record and then it returns that record.

PostgreSQL 12 supports generated columns:
PostgreSQL 12 Beta 1 Released!
Generated Columns
PostgreSQL 12 allows the creation of generated columns that compute their values with an expression using the contents of other columns. This feature provides stored generated columns, which are computed on inserts and updates and are saved on disk. Virtual generated columns, which are computed only when a column is read as part of a query, are not implemented yet.
Generated Columns
A generated column is a special column that is always computed from other columns. Thus, it is for columns what a view is for tables.
CREATE TABLE people (
...,
height_cm numeric,
height_in numeric GENERATED ALWAYS AS (height_cm * 2.54) STORED
);
db<>fiddle demo

Well, not sure if this is what You mean but Posgres normally support "dummy" ETL syntax.
I created one empty column in table and then needed to fill it by calculated records depending on values in row.
UPDATE table01
SET column03 = column01*column02; /*e.g. for multiplication of 2 values*/
It is so dummy I suspect it is not what You are looking for.
Obviously it is not dynamic, you run it once. But no obstacle to get it into trigger.

Example on creating an empty virtual column
,(SELECT *
From (values (''))
A("virtual_col"))
Example on creating two virtual columns with values
SELECT *
From (values (45,'Completed')
, (1,'In Progress')
, (1,'Waiting')
, (1,'Loading')
) A("Count","Status")
order by "Count" desc

I have a code that works and use the term calculated, I'm not on postgresSQL pure tho we run on PADB
here is how it's used
create table some_table as
select category,
txn_type,
indiv_id,
accum_trip_flag,
max(first_true_origin) as true_origin,
max(first_true_dest ) as true_destination,
max(id) as id,
count(id) as tkts_cnt,
(case when calculated tkts_cnt=1 then 1 else 0 end) as one_way
from some_rando_table
group by 1,2,3,4 ;

A lightweight solution with Check constraint:
CREATE TABLE example (
discriminator INTEGER DEFAULT 0 NOT NULL CHECK (discriminator = 0)
);

How to load bulk data to table from table using query as quick as possible? (postgresql)

I have a large table(postgre_a) which has 0.1 billion records with 100 columns. I want to duplicate this data into the same table.
I tried to do this using sql
INSERT INTO postgre_a select i1 + 100000000, i2, ... FROM postgre_a;
However, this query is running more than 10 hours now... so I want to do this more faster. I tried to do this with copy, but I cannot find the way to use copy from statement with query.
Is there any other method can do this faster?

You cannot directly use a query in COPY FROM, but maybe you can use COPY FROM PROGRAM with a query to do what you want:
COPY postgre_a
FROM PROGRAM '/usr/pgsql-10/bin/psql -d test'
' -c ''copy (SELECT i1+ 100000000, i2, ... FROM postgre_a) TO STDOUT''';
(Of course you have to replace the path to psql and the database name with your values.)
I am not sure if that is faster than using INSERT, but it is worth a try.
You should definitely drop all indexes and constraints before the operation and recreate them afterwards.

Does Postgres support virtual columns? [duplicate]

Does PostgreSQL support computed / calculated columns, like MS SQL Server? I can't find anything in the docs, but as this feature is included in many other DBMSs I thought I might be missing something.
Eg: http://msdn.microsoft.com/en-us/library/ms191250.aspx

Postgres 12 or newer
STORED generated columns are introduced with Postgres 12 - as defined in the SQL standard and implemented by some RDBMS including DB2, MySQL, and Oracle. Or the similar "computed columns" of SQL Server.
Trivial example:
CREATE TABLE tbl (
int1 int
, int2 int
, product bigint GENERATED ALWAYS AS (int1 * int2) STORED
);
fiddle
VIRTUAL generated columns may come with one of the next iterations. (Not in Postgres 15, yet).
Related:
Attribute notation for function call gives error
Postgres 11 or older
Up to Postgres 11 "generated columns" are not supported.
You can emulate VIRTUAL generated columns with a function using attribute notation (tbl.col) that looks and works much like a virtual generated column. That's a bit of a syntax oddity which exists in Postgres for historic reasons and happens to fit the case. This related answer has code examples:
Store common query as column?
The expression (looking like a column) is not included in a SELECT * FROM tbl, though. You always have to list it explicitly.
Can also be supported with a matching expression index - provided the function is IMMUTABLE. Like:
CREATE FUNCTION col(tbl) ... AS ... -- your computed expression here
CREATE INDEX ON tbl(col(tbl));
Alternatives
Alternatively, you can implement similar functionality with a VIEW, optionally coupled with expression indexes. Then SELECT * can include the generated column.
"Persisted" (STORED) computed columns can be implemented with triggers in a functionally equivalent way.
Materialized views are a related concept, implemented since Postgres 9.3.
In earlier versions one can manage MVs manually.

YES you can!! The solution should be easy, safe, and performant...
I'm new to postgresql, but it seems you can create computed columns by using an expression index, paired with a view (the view is optional, but makes makes life a bit easier).
Suppose my computation is md5(some_string_field), then I create the index as:
CREATE INDEX some_string_field_md5_index ON some_table(MD5(some_string_field));
Now, any queries that act on MD5(some_string_field) will use the index rather than computing it from scratch. For example:
SELECT MAX(some_field) FROM some_table GROUP BY MD5(some_string_field);
You can check this with explain.
However at this point you are relying on users of the table knowing exactly how to construct the column. To make life easier, you can create a VIEW onto an augmented version of the original table, adding in the computed value as a new column:
CREATE VIEW some_table_augmented AS
SELECT *, MD5(some_string_field) as some_string_field_md5 from some_table;
Now any queries using some_table_augmented will be able to use some_string_field_md5 without worrying about how it works..they just get good performance. The view doesn't copy any data from the original table, so it is good memory-wise as well as performance-wise. Note however that you can't update/insert into a view, only into the source table, but if you really want, I believe you can redirect inserts and updates to the source table using rules (I could be wrong on that last point as I've never tried it myself).
Edit: it seems if the query involves competing indices, the planner engine may sometimes not use the expression-index at all. The choice seems to be data dependant.

One way to do this is with a trigger!
CREATE TABLE computed(
one SERIAL,
two INT NOT NULL
);
CREATE OR REPLACE FUNCTION computed_two_trg()
RETURNS trigger
LANGUAGE plpgsql
SECURITY DEFINER
AS $BODY$
BEGIN
NEW.two = NEW.one * 2;
RETURN NEW;
END
$BODY$;
CREATE TRIGGER computed_500
BEFORE INSERT OR UPDATE
ON computed
FOR EACH ROW
EXECUTE PROCEDURE computed_two_trg();
The trigger is fired before the row is updated or inserted. It changes the field that we want to compute of NEW record and then it returns that record.

PostgreSQL 12 supports generated columns:
PostgreSQL 12 Beta 1 Released!
Generated Columns
PostgreSQL 12 allows the creation of generated columns that compute their values with an expression using the contents of other columns. This feature provides stored generated columns, which are computed on inserts and updates and are saved on disk. Virtual generated columns, which are computed only when a column is read as part of a query, are not implemented yet.
Generated Columns
A generated column is a special column that is always computed from other columns. Thus, it is for columns what a view is for tables.
CREATE TABLE people (
...,
height_cm numeric,
height_in numeric GENERATED ALWAYS AS (height_cm * 2.54) STORED
);
db<>fiddle demo

Well, not sure if this is what You mean but Posgres normally support "dummy" ETL syntax.
I created one empty column in table and then needed to fill it by calculated records depending on values in row.
UPDATE table01
SET column03 = column01*column02; /*e.g. for multiplication of 2 values*/
It is so dummy I suspect it is not what You are looking for.
Obviously it is not dynamic, you run it once. But no obstacle to get it into trigger.

Example on creating an empty virtual column
,(SELECT *
From (values (''))
A("virtual_col"))
Example on creating two virtual columns with values
SELECT *
From (values (45,'Completed')
, (1,'In Progress')
, (1,'Waiting')
, (1,'Loading')
) A("Count","Status")
order by "Count" desc

I have a code that works and use the term calculated, I'm not on postgresSQL pure tho we run on PADB
here is how it's used
create table some_table as
select category,
txn_type,
indiv_id,
accum_trip_flag,
max(first_true_origin) as true_origin,
max(first_true_dest ) as true_destination,
max(id) as id,
count(id) as tkts_cnt,
(case when calculated tkts_cnt=1 then 1 else 0 end) as one_way
from some_rando_table
group by 1,2,3,4 ;

A lightweight solution with Check constraint:
CREATE TABLE example (
discriminator INTEGER DEFAULT 0 NOT NULL CHECK (discriminator = 0)
);

Sybase stored procedure - how do I create an index on a #table?

I have a stored procedure which creates and works with a temporary #table
Some of the queries would be tremendously optimized if that temporary #table would have an index created on it.
However, creating an index within the stored procedure fails:
create procedure test1 as
SELECT f1, f2, f3
INTO #table1
FROM main_table
WHERE 1 = 2
-- insert rows into #table1
create index my_idx on #table1 (f1)
SELECT f1, f2, f3 FROM #table1 (index my_idx) WHERE f1 = 11 -- "QUERY X"
When I call the above, the query plan for "QUERY X" shows a table scan.
If I simply run the code above outside the stored procedure, the messages show the following warning:
Index 'my_idx' specified as optimizer hint in the FROM clause of table '#table1' does not exist. Optimizer will choose another index instead.
This can be resolved when running ad-hoc (outside the stored procedure) by splitting the code above in two batches by addding "go" after index creation:
create index my_idx on #table1 (f1)
go
Now, "QUERY X" query plan shows the use of index "my_idx".
QUESTION: How do I mimique running the "create index" in a separate batch when it's inside the stored procedure? I can't insert a "go" there like I do with the ad-hoc copy above. Please note that I'm aware of the solution of "split up the 'QUERY X' into a separate stored procedure" and am looking for a solution that will avoid that.
P.S. If it matters, this is on Sybase 12 (ASE 12.5.4)
UPDATE:
I have been seeing several references to "schema bumping" during my Googling before posing the question. But that doesn't seem to happen in my case.
You can create a table, populate it, create an index on it and select values
from it in the same porc and have the optimizer fully cost it based on
accurate information. This is called 'schema bumping' and has been in place
since 11.5.1.

The Sybase documentation says that you create and use a temporary index in the same stored procedure:
http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.dc20023_1251/html/optimizer/X26029.htm
I think to get around this you will need to split your stored procedure into at least two parts, one to create and populate the table then build the index, and then a second one to run the select query.

I am not sure how you are getting this problem, might be in older version of Sybase, however with version 12.5.4 I tried executing the same thing as suggested by you but in my case the optimizer correctly suggested the use of index created in the stored procedure. Usually in a stored procedure we do not need to break sql into batches because else we would have been required to have a seperate batch for create table command as well.
In case we try to create index within a same batch (not in a stored procedure) we will do get the same error as specified by you above because we are trying to create an index on a table and then trying to use it within the same batch. Usually the Sybase server will compile the whole batch in one go and hence the problem. But as far as stored procedure is concerned in Sybase 12.5.4 there will be no problem.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Postgres: Analyze after creating index or before - postgresql

Related

can't btree index function that returns text

How to add a date column which is 7 days later than an existing column in a Postgres table? [duplicate]

How to load bulk data to table from table using query as quick as possible? (postgresql)

Does Postgres support virtual columns? [duplicate]

Sybase stored procedure - how do I create an index on a #table?

Categories

Resources