New table with all columns as differences between two base tables columns - postgresql

I am using Postgres 9.1 and have two tables (tab1 and tab2). I wish to create a third table (tab3) where each column is the difference between the respective columns in these tables, i.e. tab3.col1 = (tab1.col1 - tab2.col1). However, my tables tab1 and tab2 have a large number of columns. Is there an efficient way to create table tab3?
If I were to hard-code my desired output I would plan to use the code below. However, I wish to avoid this, as I have over 60 columns to create and want to avoid any hard-coding errors. The columns may not be in the same order across the two tables, but the naming is consistent across tables.
CREATE TABLE tab3 AS
SELECT a.col1_01 - b.col2_01 AS col3_01,
a.col1_02 - b.col2_02 AS col3_02,
...
...
FROM tab1 a FULL JOIN tab2 b USING (permno, datadate);

You can build the whole statement from information in the system catalog (or information schema) and execute it dynamically with a DO command. That's what I would do.
DO
$do$
BEGIN
   EXECUTE (
      SELECT 'CREATE TABLE tab3 AS
SELECT '
          || string_agg(format('a.%1$I - b.%1$I AS %1$I', attname)
                      , E'\n , ' ORDER BY attnum)
          || '
FROM tab1 a
FULL JOIN tab2 b USING (permno, datadate)'
      FROM   pg_attribute
      WHERE  attrelid = 'tab1'::regclass
      AND    attnum > 0         -- exclude system columns (neg. attnum)
      AND    NOT attisdropped   -- no dropped (dead) columns
   );
END
$do$;
Assuming that tab1 and tab2 are visible in your search_path.
Produces and executes what you requested exactly. Just replace dummy table and column names with your real names.
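For illustration, if tab1 (hypothetically) had only the key columns permno and datadate plus two value columns col_01 and col_02, the DO block above would generate and execute roughly this statement:
CREATE TABLE tab3 AS
SELECT a.permno - b.permno AS permno
 , a.datadate - b.datadate AS datadate
 , a.col_01 - b.col_01 AS col_01
 , a.col_02 - b.col_02 AS col_02
FROM tab1 a
FULL JOIN tab2 b USING (permno, datadate);
Note that the key columns themselves end up in the difference list as well; if you do not want permno and datadate subtracted, add a filter such as AND attname NOT IN ('permno', 'datadate') to the query against pg_attribute.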
Read about format() and string_agg() in the manual.
More information in these related answers:
Table name as a PostgreSQL function parameter
Prepend table name to each column in a result set in SQL? (Postgres specifically)

Related

Multiple table UPDATE using SELECT that requires JOIN using PGAdmin

I am using pgAdmin III and Postgres 9.6.
I have Postgres schemas consisting of 250 tables with various columns. Two tables have similar names and the same columns, but unique data and IDs. I want to UPDATE one column in both tables with a specific number, but choosing which rows get updated is complicated.
I have a script which correctly works on one table at a time, but I wonder if it can be done in a single script. I have tried using DO and LOOP but keep running into roadblocks.
Here's the working single-table script:
UPDATE table1
SET number = 44
WHERE table1.id IN
(SELECT table1.id FROM plan
JOIN name ON name.id = plan.nameid
JOIN type ON type.id = name.typeid
JOIN simul ON simul.id = plan.simulid
LEFT JOIN table1 ON table1.id = tableid
WHERE type.tag = 'X' AND plan.class LIKE '%Table%' AND simul.label IN
('SIMUL90','SIMUL99','SIMUL87'));
"plan.class" contains text which determines class of simul and thus either Table1 or Table2. Table1 and Table2 are of identical column structure but their IDs may be identical so UNION is out of consideration. TH other joins are to fine tune the SELECT so I get only a small subset of the table. The number to update is the same for either table.

postgres: insert rows to table with multiple records from other join tables

I am trying to insert multiple records, obtained from a join, into another table, user_to_property. In the user_to_property table, user_to_property_id is the primary key, not null, and not auto-incrementing. So I am trying to add user_to_property_id manually, incrementing it by 1.
WITH selectedData AS
( -- selection of the data that needs to be inserted
SELECT t2.user_id as userId
FROM property_lines t1
INNER JOIN user t2 ON t1.account_id = t2.account_id
)
INSERT INTO user_to_property (user_to_property_id, user_id, property_id, created_date)
VALUES ((SELECT MAX( user_to_property_id )+1 FROM user_to_property),(SELECT
selectedData.userId
FROM selectedData),3,now());
The above query gives me the below error:
ERROR: more than one row returned by a subquery used as an expression
How do I insert multiple records into a table from a join of other tables? The user_to_property table should contain a unique record per user_id and property_id, i.e. only one record for any such pair.
Typically for an Insert you use either values or select. The structure values( select...) often (generally?) just causes more trouble than it is worth, and it is never necessary. You can always select a constant or an expression. In this case, convert to just a select. For generating your ID, get the max value from your table and then just add the row_number of the row you are inserting: (see demo)
insert into user_to_property ( user_to_property_id
                             , user_id
                             , property_id
                             , created_date
                             )
with start_with(current_max_id) as
     ( select max(user_to_property_id) from user_to_property )
select current_max_id + id_incr, user_id, 3, now()
from ( select t2.user_id, row_number() over() as id_incr
       from property_lines t1
       join users t2 on t1.account_id = t2.account_id
     ) js
join start_with on true;
A couple of notes:
DO NOT use user as a table name, or for any other object name. It is a documented reserved word in both Postgres and the SQL standard (and has been since at least Postgres v7.1 and the SQL-92 standard).
You really should create another column, or change the column user_to_property_id to be auto-generated. Using max()+1, or anything based on that idea, is a virtual guarantee that you will generate duplicate keys, much to the amusement of users and developers alike. What happens in an MVCC database when 2 users run the query concurrently?
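A sketch of that idea, assuming Postgres 10 or later, that the table already contains rows, and using the column names from the question:
-- let the database generate the key instead of max()+1
ALTER TABLE user_to_property
   ALTER COLUMN user_to_property_id ADD GENERATED BY DEFAULT AS IDENTITY;
-- resync the new sequence with the existing data
SELECT setval(pg_get_serial_sequence('user_to_property', 'user_to_property_id'),
              (SELECT max(user_to_property_id) FROM user_to_property));
-- the insert then no longer needs to compute an id at all
INSERT INTO user_to_property (user_id, property_id, created_date)
SELECT t2.user_id, 3, now()
FROM   property_lines t1
JOIN   users t2 ON t1.account_id = t2.account_id;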

How to describe columns (get their names, data types, etc.) of a SQL query in PostgreSQL

I need a way to get a "description" of the columns from a SELECT query (cursor), such as their names, data types, precision, scale, etc., in PostgreSQL (or better yet PL/pgSQL).
I'm transitioning from Oracle PL/SQL, where I can get such a description using the built-in procedure dbms_sql.describe_columns. It returns an array of records, one for each column of a given (parsed) cursor.
EDB has it implemented too (https://www.enterprisedb.com/docs/en/9.0/oracompat/Postgres_Plus_Advanced_Server_Oracle_Compatibility_Guide-127.htm#P13324_681237)
An example of such a query:
select col1 from tab where col2 = :a
I need an API (or a workaround) that could be called like this (hopefully):
select query_column_description('select col1 from tab where col2 = :a');
that will return something similar to:
{{"col1","numeric"}}
Why? We build views where these queries become individual columns. For example, the view's query would look like the following:
select (select col1 from tab where col2 = t.colA)::numeric as col1
from tab_main t
http://sqlfiddle.com/#!17/21c7a/2
You can use the system catalogs (here via information_schema).
First, create a throwaway view with your query (without the WHERE clause):
create or replace view a_view as
select col1 from tab;
then select
select row_to_json(t.*)
from (
   select column_name,
          data_type
   from   information_schema.columns
   where  table_schema = 'public'
   and    table_name = 'a_view'
) as t;
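If you want something closer to the query_column_description() call asked about, the same idea can be wrapped in a function. A rough sketch (this function is not built in, and it assumes the query text contains no bind parameters such as :a):
CREATE OR REPLACE FUNCTION query_column_description(_query text)
  RETURNS TABLE (column_name text, data_type text)
  LANGUAGE plpgsql AS
$func$
BEGIN
   -- materialize the query as a temporary view, then describe it
   EXECUTE 'CREATE OR REPLACE TEMP VIEW _describe_qry AS ' || _query;
   RETURN QUERY
   SELECT c.column_name::text, c.data_type::text
   FROM   information_schema.columns c
   WHERE  c.table_name = '_describe_qry'
   AND    c.table_schema LIKE 'pg_temp%';
END
$func$;
-- usage: SELECT * FROM query_column_description('select col1 from tab');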

Update or insert with outer join in postgres

Is it possible to add a new column to an existing table from another table, using insert or update in conjunction with a full outer join?
My main table is missing some records in one column; the other table has all of those records, and I want to pull the full record set into the main table. Something like this:
UPDATE maintable
SET all_records= othertable.records
FROM
FULL JOIN othertable on maintable.col = othertable.records;
Here maintable.col has the same id as othertable.records.
I know I could simply join the tables, but I have a lot of comments in maintable that I don't want to have to copy-paste back in if possible. As I understand it, using WHERE is the equivalent of a left join, so it won't show me what I'm missing.
EDIT:
What I want is effectively a new maintable.col with all the records, which I can then pare down based on the presence of records in other columns from other tables.
Try this:
UPDATE maintable
SET all_records = o.records
FROM othertable o
WHERE maintable.col = o.records;
This is the general syntax to use in postgres when updating via a join.
HTH
EDIT
BTW you will need to change this - I used your example, but you are updating the maintable with the column used for the join! Your set needs to be something like SET missingcol = o.extracol
AMENDED GENERALISED ANSWER (following off-line chat)
To take a simplified example, suppose that you have two tables maintable and subtable, each with the same columns, but where the subtable has extra records. For both tables id is the primary key. To fill maintable with the missing records, for pre 9.5 versions of Postgres you must use the following syntax:
INSERT INTO maintable (SELECT * FROM subtable s WHERE NOT EXISTS
(SELECT 1 FROM maintable m WHERE m.id = s.id));
Since 9.5 there is a (preferred) alternative:
INSERT INTO maintable (SELECT * FROM subtable) ON CONFLICT DO NOTHING;
This is preferred because (apart from being simpler) it avoids the situation that has been known to arise in the former, where a race condition is created between the INSERT and the sub-SELECT.
Obviously when the columns are different, you need to specify in the INSERT statement which columns are inserted from which. Something like:
INSERT INTO maintable (id, ColA, ColB)
(SELECT id, ColE, ColG FROM subtable ....)
Similarly the common field might not be id in both tables. However, the simplified example should be enough to point you in the right direction.

SELECT * except nth column

Is it possible to SELECT * but without the n-th column, for example the 2nd?
I have some views that have 4 or 5 columns (each has different column names, except for the 2nd column), but I do not want to show the second column.
SELECT * -- how to prevent 2nd column to be selected?
FROM view4
WHERE col2 = 'foo';
SELECT * -- how to prevent 2nd column to be selected?
FROM view5
WHERE col2 = 'foo';
without having to list all the columns (since they all have different column names).
The real answer is that you practically just cannot (see LINK). This has been a requested feature for decades, and the developers refuse to implement it. The best practice is to list the column names instead of using *; using * is in itself a source of performance penalties anyway.
However, if you really need to, you can select the column names directly from the schema (check LINK), or, as in the example below, use two PostgreSQL built-in functions: ARRAY and ARRAY_TO_STRING. The first one transforms a query result into an array, and the second one concatenates the array components into a string. The list separator can be specified with the second parameter of ARRAY_TO_STRING:
SELECT 'SELECT ' ||
ARRAY_TO_STRING(ARRAY(SELECT COLUMN_NAME::VARCHAR(50)
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME='view4' AND
COLUMN_NAME NOT IN ('col2')
ORDER BY ORDINAL_POSITION
), ', ') || ' FROM view4';
Here strings are concatenated with the standard || operator. The COLUMN_NAME data type is information_schema.sql_identifier, hence the explicit cast to VARCHAR.
But that is not recommended either. What if you add more columns later that are not required for that query? You would start pulling more columns than you need.
What if the select is part of an insert, as in
Insert into tableA (col1, col2, col3.. coln) Select everything but 2 columns FROM tableB
The column match will be wrong and your insert will fail.
So it's possible, but I still recommend writing out every needed column for every select, even if nearly every column is required.
Conclusion:
Since you are already using a VIEW, the simplest and most reliable way is to redefine your view, listing the column names and excluding your 2nd column.
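A sketch of that, with hypothetical base-table and column names (note that CREATE OR REPLACE VIEW cannot drop a column, so the view has to be dropped and recreated):
DROP VIEW IF EXISTS view4;
CREATE VIEW view4 AS
SELECT col1, col3, col4   -- col2 intentionally left out
FROM   base_table4;       -- base_table4 is a placeholder name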
-- my table with 2 rows and 4 columns
DROP TABLE IF EXISTS t_target_table;
CREATE TEMP TABLE t_target_table AS
SELECT 1 AS id, 1 AS v1, 2 AS v2, 3 AS v3, 4 AS v4
UNION ALL
SELECT 2 AS id, 5 AS v1, -6 AS v2, 7 AS v3, 8 AS v4
;
-- my computation and stuff that I have to measure; any logic could be done here!
DROP TABLE IF EXISTS t_processing;
CREATE TEMP TABLE t_processing AS
SELECT *,
       md5(t_target_table::text) AS row_hash,
       CASE WHEN v2 < 0 THEN true ELSE false END AS has_negative_value_in_v2
FROM t_target_table
;
-- now we want to insert that stuff into t_target_table
-- this is standard
-- INSERT INTO t_target_table (id, v1, v2, v3, v4) SELECT id, v1, v2, v3, v4 FROM t_processing;
-- this is advanced ;-)
INSERT INTO t_target_table
-- the following selects only the columns that are present in the target table, and ignores the others
SELECT r.*
FROM (SELECT to_jsonb(t_processing) AS d FROM t_processing) t
JOIN LATERAL jsonb_populate_record(NULL::t_target_table, t.d) AS r ON true
;
-- WARNING: you need an object that represents the target structure; excluding just a single column is not possible
For columns col1, col2, col3 and col4 you will need to request
SELECT col1, col3, col4 FROM...
to omit the second column. Requesting
SELECT *
will get you all the columns.