Does PostgreSQL support computed / calculated columns, like MS SQL Server? I can't find anything in the docs, but as this feature is included in many other DBMSs I thought I might be missing something.
Eg: http://msdn.microsoft.com/en-us/library/ms191250.aspx
Postgres 12 or newer
STORED generated columns were introduced with Postgres 12, as defined in the SQL standard and implemented by some other RDBMSs including DB2, MySQL, and Oracle; SQL Server has the similar "computed columns".
Trivial example:
CREATE TABLE tbl (
int1 int
, int2 int
, product bigint GENERATED ALWAYS AS (int1 * int2) STORED
);
fiddle
VIRTUAL generated columns may come with one of the next iterations. (Not in Postgres 15, yet).
Related:
Attribute notation for function call gives error
Postgres 11 or older
Up to Postgres 11 "generated columns" are not supported.
You can emulate VIRTUAL generated columns with a function using attribute notation (tbl.col) that looks and works much like a virtual generated column. That's a bit of a syntax oddity which exists in Postgres for historic reasons and happens to fit the case. This related answer has code examples:
Store common query as column?
The expression (looking like a column) is not included in a SELECT * FROM tbl, though. You always have to list it explicitly.
This can also be supported with a matching expression index, provided the function is IMMUTABLE. Like:
CREATE FUNCTION col(tbl) ... AS ... -- your computed expression here
CREATE INDEX ON tbl(col(tbl));
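For the example table above (assuming a plain tbl(int1, int2) without the generated column), that might look like this; a sketch, where the function name product is chosen so it reads like a column:
CREATE FUNCTION product(t tbl) RETURNS bigint
  LANGUAGE sql IMMUTABLE AS
$$SELECT t.int1::bigint * t.int2$$;

SELECT int1, int2, tbl.product FROM tbl;  -- attribute notation, resolved as product(tbl)

CREATE INDEX tbl_product_idx ON tbl (product(tbl));  -- matching expression index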
Alternatives
Alternatively, you can implement similar functionality with a VIEW, optionally coupled with expression indexes. Then SELECT * can include the generated column.
"Persisted" (STORED) computed columns can be implemented with triggers in a functionally equivalent way.
Materialized views are a related concept, implemented since Postgres 9.3.
In earlier versions one can manage MVs manually.
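For instance, for the example table above (a sketch; the view name is made up):
CREATE MATERIALIZED VIEW tbl_mv AS
SELECT int1, int2, int1 * int2 AS product
FROM tbl;

REFRESH MATERIALIZED VIEW tbl_mv;  -- recompute the stored results on demand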
YES you can!! The solution should be easy, safe, and performant...
I'm new to postgresql, but it seems you can create computed columns by using an expression index, paired with a view (the view is optional, but makes life a bit easier).
Suppose my computation is md5(some_string_field), then I create the index as:
CREATE INDEX some_string_field_md5_index ON some_table(MD5(some_string_field));
Now, any queries that act on MD5(some_string_field) will use the index rather than computing it from scratch. For example:
SELECT MAX(some_field) FROM some_table GROUP BY MD5(some_string_field);
You can check this with explain.
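For example (a sketch; whether the planner actually picks the index depends on the data and settings, as the edit at the end notes):
EXPLAIN SELECT MAX(some_field) FROM some_table GROUP BY MD5(some_string_field);
-- look for a scan using some_string_field_md5_index instead of a plain Seq Scan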
However, at this point you are relying on users of the table knowing exactly how to construct the column. To make life easier, you can create a VIEW onto an augmented version of the original table, adding in the computed value as a new column:
CREATE VIEW some_table_augmented AS
SELECT *, MD5(some_string_field) as some_string_field_md5 from some_table;
Now any queries using some_table_augmented will be able to use some_string_field_md5 without worrying about how it works; they just get good performance. The view doesn't copy any data from the original table, so it is good memory-wise as well as performance-wise. Note, however, that you can't update/insert into a view, only into the source table. If you really want to, I believe you can redirect inserts and updates to the source table using rules (I could be wrong on that last point, as I've never tried it myself).
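For what it's worth, such a rule might look like this (an untested sketch; the column list is assumed):
CREATE RULE some_table_augmented_ins AS
ON INSERT TO some_table_augmented
DO INSTEAD
INSERT INTO some_table (some_field, some_string_field)
VALUES (NEW.some_field, NEW.some_string_field);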
Edit: it seems that if the query involves competing indexes, the planner engine may sometimes not use the expression index at all. The choice seems to be data dependent.
One way to do this is with a trigger!
CREATE TABLE computed(
one SERIAL,
two INT NOT NULL
);
CREATE OR REPLACE FUNCTION computed_two_trg()
RETURNS trigger
LANGUAGE plpgsql
SECURITY DEFINER
AS $BODY$
BEGIN
NEW.two = NEW.one * 2;
RETURN NEW;
END
$BODY$;
CREATE TRIGGER computed_500
BEFORE INSERT OR UPDATE
ON computed
FOR EACH ROW
EXECUTE PROCEDURE computed_two_trg();
The trigger fires before the row is inserted or updated. It sets the field we want to compute on the NEW record and then returns that record.
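To see it in action (a hypothetical usage; the value 21 is arbitrary):
INSERT INTO computed (one) VALUES (21);  -- "two" is NOT NULL, but the trigger fills it in
SELECT one, two FROM computed;           -- 21, 42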
PostgreSQL 12 supports generated columns:
PostgreSQL 12 Beta 1 Released!
Generated Columns
PostgreSQL 12 allows the creation of generated columns that compute their values with an expression using the contents of other columns. This feature provides stored generated columns, which are computed on inserts and updates and are saved on disk. Virtual generated columns, which are computed only when a column is read as part of a query, are not implemented yet.
Generated Columns
A generated column is a special column that is always computed from other columns. Thus, it is for columns what a view is for tables.
CREATE TABLE people (
...,
height_cm numeric,
height_in numeric GENERATED ALWAYS AS (height_cm * 2.54) STORED
);
db<>fiddle demo
Well, not sure if this is what you mean, but Postgres normally supports a plain "dummy" ETL approach.
I created one empty column in a table and then needed to fill it with calculated values depending on other values in the row.
UPDATE table01
SET column03 = column01*column02; /*e.g. for multiplication of 2 values*/
It is so basic that I suspect it is not what you are looking for.
Obviously it is not dynamic; you run it once. But there is no obstacle to putting it into a trigger.
Example of creating an empty virtual column (this fragment is meant to be appended to a SELECT list):
,(SELECT *
From (values (''))
A("virtual_col"))
Example on creating two virtual columns with values
SELECT *
From (values (45,'Completed')
, (1,'In Progress')
, (1,'Waiting')
, (1,'Loading')
) A("Count","Status")
order by "Count" desc
I have code that works and uses the term CALCULATED. I'm not on pure PostgreSQL though; we run on PADB.
Here is how it's used:
create table some_table as
select category,
txn_type,
indiv_id,
accum_trip_flag,
max(first_true_origin) as true_origin,
max(first_true_dest ) as true_destination,
max(id) as id,
count(id) as tkts_cnt,
(case when calculated tkts_cnt=1 then 1 else 0 end) as one_way
from some_rando_table
group by 1,2,3,4 ;
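Plain PostgreSQL has no CALCULATED keyword, and a SELECT item cannot reference a sibling alias, so an equivalent there would repeat the aggregate, e.g.:
select category, txn_type, indiv_id, accum_trip_flag
     , count(id) as tkts_cnt
     , (case when count(id) = 1 then 1 else 0 end) as one_way
from some_rando_table
group by 1, 2, 3, 4;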
A lightweight solution with Check constraint:
CREATE TABLE example (
discriminator INTEGER DEFAULT 0 NOT NULL CHECK (discriminator = 0)
);
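The effect is a column that can only ever hold the constant 0; for example:
INSERT INTO example DEFAULT VALUES;  -- OK, discriminator defaults to 0
INSERT INTO example VALUES (1);      -- rejected by the CHECK constraint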
Related
Is the column recomputed on any update to the row (which would be rather wasteful, and I would expect a performance warning on the documentation page in that case), or is it smart enough to analyze the generation expression and regenerate the computed column only when the input column(s) have changed?
From the documentation it's rather clear:
A stored generated column is computed when it is written (inserted or updated) and occupies storage as if it were a normal column. A virtual generated column occupies no storage and is computed when it is read. Thus, a virtual generated column is similar to a view and a stored generated column is similar to a materialized view (except that it is always updated automatically).
So it seems that the GENERATED ALWAYS column is generated always.
Below is a small test case to verify.
We define an immutable function used in the formula, with pg_sleep inside, to see when the function is called:
create or replace function wait10s(x numeric)
returns numeric -- matches the type of x
as $$
select pg_sleep(10);
select x as result;
$$ language sql IMMUTABLE; -- (mis)labeled IMMUTABLE on purpose; generated columns require it
Table DDL
create table t
(col1 numeric,
col2 numeric,
gen_col numeric generated always as ( wait10s(col2) ) STORED
);
Insert
As expected, we wait 10 seconds.
insert into t (col1, col2) values (1,1);
Update of column used in formula
update t set col2 = 2;
Again, the expected wait.
Update of column NOT used in formula
update t set col1 = 2;
No wait, so it seems there is an optimization step that calls the formula only when necessary.
This makes perfect sense, but of course you should take it with care as this behavior is not documented and may change...
I have three different values in my database that represent a null: an actual null, an empty string, and a string {x:Null}. This value appears across multiple columns.
{x:Null} is normalized on the web front-end, so all these values look exactly the same although they end up ordered differently in a sort. How can I write a query that will take these values and make them actual nulls across every column and every table?
Bonus points if you can tell me how to make sure these other empty values are always inserted as nulls going forward. (Disclaimer: I have no power to grant any actual bonus points. ;)
You can query the information_schema to get a list of all tables and columns with a string type.
SELECT table_name, column_name
FROM information_schema.columns
WHERE data_type IN ('text', 'character', 'character varying')
NOTE: double-check first what values data_type has; I'm not sure if it will be character or char or something else.
Then I would write a small program to update each column in each table. Here it is sketched out in Perl.
my $sth = $dbh->prepare(q[
    SELECT table_name, column_name
    FROM information_schema.columns
    WHERE data_type IN ('text', 'character', 'character varying')
]);
$sth->execute;

while( my($table, $column) = $sth->fetchrow_array ) {
    # quote_identifier(), not quote(): these are identifiers, not string literals
    my $q_table  = $dbh->quote_identifier($table);
    my $q_column = $dbh->quote_identifier($column);
    # qq[] so the quoted identifiers interpolate; q[] would leave them literal
    $dbh->do(qq[
        UPDATE $q_table
        SET $q_column = NULL
        WHERE $q_column = '{x:Null}'
           OR $q_column = ''
    ]);
}
Be sure to quote $table and $column as identifiers, as in my sample.
Going forward, you'll have to set CONSTRAINTS on each and every column. You can use the information_schema.columns to do this as well. Something like
ALTER TABLE $q_table ADD CHECK ($q_column NOT IN ('{x:Null}', ''))
You could use a trigger to change the values to NULL, but I don't like data stores that silently change basic data for application purposes.
For new columns and tables, you'll have to remember to add that constraint. Same caveats about data_type apply.
However, it's probably a bad idea to say that no column can ever be an empty string. You might want to be a bit more selective.
Another thing to note: NULL is a funny thing; it's not true and it's not false. You might be better off deciding that an empty string is the thing to set empty values to.
I don't think this approach is maintainable. It's scribbling an application rule all over the data layer. What if you have some data that doesn't follow that rule? And it will have to be continuously maintained for any new data schema added. Perhaps instead you should put this at your ORM layer. Or write a few stored procedures to take care of this.
Using the information_schema.columns table, write a procedural language routine which iterates through all applicable tables and columns, executing an update... set *column* = NULL...where column in ('','{x:Null}'). for each eligible column.
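A minimal sketch of such a routine, as an anonymous DO block (assumes Postgres 9.1+ for format(); adjust the data_type list and filters to your schema):
DO $$
DECLARE
    rec record;
BEGIN
    FOR rec IN
        SELECT table_schema, table_name, column_name
        FROM information_schema.columns
        WHERE data_type IN ('text', 'character varying', 'character')
          AND is_updatable = 'YES'  -- skip views and other non-updatable columns
          AND table_schema NOT IN ('pg_catalog', 'information_schema')
    LOOP
        -- %I quotes identifiers; the doubled quotes produce the string literals
        EXECUTE format(
            'UPDATE %I.%I SET %I = NULL WHERE %I IN ('''', ''{x:Null}'')',
            rec.table_schema, rec.table_name, rec.column_name, rec.column_name);
    END LOOP;
END
$$;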
As for inserting these values as NULL going forward, you would have to set triggers on your tables to intercept these values and replace them with NULL.
I don't think there is any query that would do this thing for every table and every column. In principle, what you want to do is
UPDATE table SET column=NULL WHERE column='' OR column='{x:Null}';
You could try selecting data from the pg_attribute and pg_class catalogs to get the names of the tables and columns, and then automatically generating the queries. Be sure to select only those columns that contain textual data.
What if somebody has entered a genuine string '{x:Null}'? You would then change it into NULL.
However, you have made a real mistake by letting the situation get as bad as it currently is. You should always normalize data before putting it into a database.
Using Postgres 9.3:
I am attempting to automatically populate a table when an insert is performed on another table. This seems like a good use for rules, but after adding the rule to the first table, I am no longer able to perform inserts into the second table using the writable CTE. Here is an example:
CREATE TABLE foo (
id INT PRIMARY KEY
);
CREATE TABLE bar (
id INT PRIMARY KEY REFERENCES foo
);
CREATE RULE insertFoo AS ON INSERT TO foo DO INSERT INTO bar VALUES (NEW.id);
WITH a AS (SELECT * FROM (VALUES (1), (2)) b)
INSERT INTO foo SELECT * FROM a
When this is run, I get the error
"ERROR: WITH cannot be used in a query that is rewritten by rules
into multiple queries".
I have searched for that error string, but am only able to find links to the source code. I know that I can perform the above using row-level triggers instead, but it seems like I should be able to do this at the statement level. Why can I not use the writable CTE, when queries like this can (in this case) be easily re-written as:
INSERT INTO foo SELECT * FROM (VALUES (1), (2)) a
Does anyone know of another way that would accomplish what I am attempting to do other than 1) using rules, which prevents the use of "with" queries, or 2) using row-level triggers? Thanks,
TL;DR: use triggers, not rules.
Generally speaking, prefer triggers over rules, unless rules are absolutely necessary. (Which, in practice, they never are.)
Using rules introduces heaps of problems which will needlessly complicate your life down the road. You've run into one here. Another (major) one is, for instance, that the number of affected rows will correspond to that of the very last query -- if you're relying on FOUND somewhere and your query is incorrectly reporting that no rows were affected by a query, you'll be in for painful bugs.
Moreover, there's occasional talk of deprecating Postgres rules outright:
http://postgresql.nabble.com/Deprecating-RULES-td5727689.html
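For the rule in the question, a row-level trigger along these lines should do the same job (a sketch; AFTER timing so the foo row exists before bar's foreign key is checked):
CREATE FUNCTION foo_to_bar() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    INSERT INTO bar VALUES (NEW.id);
    RETURN NEW;
END
$$;

CREATE TRIGGER insert_foo
AFTER INSERT ON foo
FOR EACH ROW EXECUTE PROCEDURE foo_to_bar();
With that in place, the writable CTE from the question runs without the rewrite error.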
Like the other answer, I definitely recommend using INSTEAD OF triggers over RULEs.
However, if for some reason you don't want to change existing VIEW RULEs and still want to use WITH, you can do so by wrapping the VIEW in a stored procedure:
create function insert_foo(int) returns void as $$
insert into foo values ($1)
$$ language sql;
WITH a AS (SELECT * FROM (VALUES (1), (2)) b)
SELECT insert_foo(a.column1) from a;
This could be useful when using some legacy db through some system that wraps statements with CTEs.
We have an Oracle application that uses a standard pattern to populate surrogate keys. We have a series of extrinsic rows (that have specific values for the surrogate keys) and other rows that have intrinsic values.
We use the following Oracle trigger snippet to determine what to do with the Surrogate key on insert:
IF :NEW.SurrogateKey IS NULL THEN
SELECT SurrogateKey_SEQ.NEXTVAL INTO :NEW.SurrogateKey FROM DUAL;
END IF;
If the supplied surrogate key is null then get a value from the nominated sequence, else pass the supplied surrogate key through to the row.
I can't seem to find an easy way to do this in T-SQL. There are all sorts of approaches, but none of them uses the notion of a sequence generator the way Oracle and other SQL-92 compliant DBs do.
Anybody know of a really efficient way to do this in SQL Server T-SQL? By the way, we're using SQL Server 2008 if that's any help.
You may want to look at IDENTITY. This gives you a column for which the value will be determined when you insert the row.
This may mean that you have to insert the row, and determine the value afterwards, using SCOPE_IDENTITY().
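A minimal sketch (the table and column names are made up):
CREATE TABLE Widget (
    SurrogateKey INT IDENTITY(1,1) PRIMARY KEY,
    Name NVARCHAR(50) NOT NULL
);

INSERT INTO Widget (Name) VALUES (N'first');
SELECT SCOPE_IDENTITY();  -- the identity value generated by the insert above, in this scope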
There is also an article on simulating Oracle Sequences in SQL Server here: http://www.sqlmag.com/Articles/ArticleID/46900/46900.html?Ad=1
Identity is one approach, although it will generate unique identifiers at a per-table level.
Another approach is to use unique identifiers, in particular using NewSequentialID(), which ensures the generated id is always bigger than the last. The problem with this approach is that you are no longer dealing with integers.
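A sketch of that approach (NEWSEQUENTIALID() is only allowed as a column default; the table name is made up):
CREATE TABLE WidgetGuid (
    SurrogateKey UNIQUEIDENTIFIER NOT NULL
        DEFAULT NEWSEQUENTIALID() PRIMARY KEY,
    Name NVARCHAR(50) NOT NULL
);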
The closest way to emulate the Oracle method is to have a separate table with a counter field, and then write a stored procedure that reads this field, increments it, and returns the value (a T-SQL user-defined function won't work here, since UDFs cannot modify table data).
Here is a way to do it using a table to store your last sequence number. The stored proc is very simple; most of the stuff in there is because I'm lazy and don't like surprises should I forget something. So, here it is:
----- Create the sequence value table.
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[SequenceTbl]
(
[CurrentValue] [bigint]
) ON [PRIMARY]
GO
-----------------Create the stored procedure
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE procedure [dbo].[sp_NextInSequence](@SkipCount BigInt = 1)
AS
BEGIN
    BEGIN TRANSACTION
    DECLARE @NextInSequence BigInt;
    IF NOT EXISTS
    (
        SELECT
            CurrentValue
        FROM
            SequenceTbl
    )
        INSERT INTO SequenceTbl (CurrentValue) VALUES (0);
    SELECT TOP 1
        @NextInSequence = ISNULL(CurrentValue, 0) + 1
    FROM
        SequenceTbl WITH (HoldLock);
    UPDATE SequenceTbl WITH (UPDLOCK)
    SET CurrentValue = @NextInSequence + (@SkipCount - 1);
    COMMIT TRANSACTION
    RETURN @NextInSequence
END;
GO
--------Use the stored procedure in Sql Manager to retrieve a test value.
declare @NextInSequence BigInt
exec @NextInSequence = sp_NextInSequence;
--exec @NextInSequence = sp_NextInSequence <skipcount>;
select NextInSequence = @NextInSequence;
-----Show the current table value.
select * from SequenceTbl;
The astute will notice that there is an (optional) parameter for the stored proc. This is to allow the caller to reserve a block of IDs when the caller has more than one record that needs a unique id; using the SkipCount, the caller needs to make only a single call for however many IDs are needed.
The entire "IF NOT EXISTS...INSERT INTO..." block can be removed if you remember to insert a record when the table is created. If you also remember to insert that record with a value (your seed value, a number which will never be used as an ID), you can also remove the ISNULL(...) portion of the select and just use CurrentValue + 1.
Now, before anyone makes a comment, please note that I am a software engineer, not a dba! So, any constructive criticism concerning the use of "Top 1", "With (HoldLock)" and "With (UPDLock)" is welcome. I don't know how well this will scale but this works OK for me so far...