PostgreSQL: moving data from a table to 3 other tables

I have a table:
TABLE_A
id_table_a
field1
field2
field3
field4
field5
I need to move the data to 3 tables, 1 parent with 2 children:
TABLE_B
id_table_b
id_table_a
field1
field2
TABLE_C
id_table_c
id_table_b
field3
field4
TABLE_D
id_table_d
id_table_b
field5
We're talking about millions of records. What would be the correct and most efficient way to do this?
I'm completely new to PostgreSQL and I've come up with this after reading the documentation:
INSERT INTO table_b (id_table_a, field1, field2) SELECT id_table_a FROM table_a, SELECT field1 FROM table_a, SELECT field2 FROM table_a;
INSERT INTO table_c (id_table_b, field3, field4) SELECT id_table_b FROM table_b, SELECT field3 FROM table_a WHERE table_b.id_table_a = table_a.id_table_a, SELECT field4 FROM table_a WHERE table_b.id_table_a = table_a.id_table_a;
INSERT INTO table_d (id_table_d, field5) SELECT id_table_c FROM table_c, SELECT field5 FROM table_a WHERE table_b.id_table_a = table_a.id_table_a;
Would this do what I need or am I missing something? Thank you.

This will not work:
INSERT INTO table_b (id_table_a, field1, field2) SELECT id_table_a FROM table_a, SELECT field1 FROM table_a, SELECT field2 FROM table_a;
because it is not valid syntax: an INSERT takes a single query that produces all the columns at once, not a comma-separated SELECT per column (and a subquery like SELECT id_table_a FROM table_a would return more than one row anyway).
You need to write it like this:
INSERT INTO table_b (id_table_a, field1, field2)
SELECT id_table_a, field1, field2 FROM table_a;

Maybe sub-optimal, but a straightforward PL/pgSQL DO block will help. Please note the RETURNING ... INTO clause. Assuming that id_table_b, id_table_c and id_table_d are auto-generated integers:
DO language plpgsql
$$
declare
    r record;
    var_id_table_b integer;
begin
    for r in select * from table_a loop
        insert into table_b (id_table_a, field1, field2)
        values (r.id_table_a, r.field1, r.field2)
        returning id_table_b into var_id_table_b;

        insert into table_c (id_table_b, field3, field4)
        values (var_id_table_b, r.field3, r.field4);

        insert into table_d (id_table_b, field5)
        values (var_id_table_b, r.field5);
    end loop;
end;
$$;
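For millions of rows, a single set-based statement usually beats a row-by-row loop. Here is a sketch using data-modifying CTEs, assuming id_table_b is auto-generated and that table_b keeps the id_table_a column (as in the layout above) so the new keys can be joined back to the source:
WITH new_b AS (
    -- insert the parents and capture both the new and the source keys
    INSERT INTO table_b (id_table_a, field1, field2)
    SELECT id_table_a, field1, field2 FROM table_a
    RETURNING id_table_b, id_table_a
),
new_c AS (
    -- first child: join the new parent keys back to the source rows
    INSERT INTO table_c (id_table_b, field3, field4)
    SELECT b.id_table_b, a.field3, a.field4
    FROM new_b b JOIN table_a a ON a.id_table_a = b.id_table_a
)
-- second child, same join; all three inserts run in one statement
INSERT INTO table_d (id_table_b, field5)
SELECT b.id_table_b, a.field5
FROM new_b b JOIN table_a a ON a.id_table_a = b.id_table_a;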

Related

Recursive CTE and multiple inserts in joined table

I'm trying to copy the nodes of a hierarchical tree and apply the changes to a joined table. I found parts of the answer in other questions, such as Postgresql copy data within the tree table for the tree copy (in my case I only copy the children, not the root) and PostgreSQL - Insert data into multiple tables simultaneously for inserting data into several tables at once, but I can't manage to combine them.
I would like to:
Generate the new nodes id from the fields table
Insert the new field ids in the data_versions table
Insert the new nodes in the fields table with the data_id from the data_versions table
Note: there is a circular reference between the fields and the data_versions tables.
Here is a working query, but without the insert into the data_versions table. It is only a shallow copy (it keeps the same data_id), while I would like a deep copy:
WITH created_data AS (
WITH RECURSIVE cte AS (
SELECT *, nextval('fields_id_seq') new_id FROM fields WHERE parent_id = :source_field_id
UNION ALL
SELECT fields.*, nextval('fields_id_seq') new_id FROM cte JOIN fields ON cte.id = fields.parent_id
)
SELECT C1.new_id, C1.name, C1.field_type, C1.data_id, C2.new_id new_parent_id
FROM cte C1 LEFT JOIN cte C2 ON C1.parent_id = C2.id
)
INSERT INTO fields (id, name, parent_id, field_type, data_id)
SELECT new_id, name, COALESCE(new_parent_id, :target_field_id), field_type, data_id FROM created_data
RETURNING id, name, parent_id, field_type, data_id;
And here is the draft query I'm working on for inserting data into the data_versions table; it fails with the error WITH clause containing a data-modifying statement must be at the top level:
WITH created_data AS (
WITH cloned_fields AS (
WITH RECURSIVE cte AS (
SELECT *, nextval('fields_id_seq') new_id FROM fields WHERE parent_id = :source_field_id
UNION ALL
SELECT fields.*, nextval('fields_id_seq') new_id FROM cte JOIN fields ON cte.id = fields.parent_id
)
SELECT C1.new_id, C1.name, C1.field_type, C1.data_id, C2.new_id new_parent_id
FROM cte C1 LEFT JOIN cte C2 ON C1.parent_id = C2.id
),
cloned_data AS (
INSERT INTO data_versions (value, author, field_id)
SELECT d.value, d.author, c.new_id
FROM cloned_fields c
INNER JOIN data_versions d ON c.data_id = d.id
RETURNING id data_id
)
SELECT cloned_fields.new_id, cloned_fields.name, cloned_fields.field_type, cloned_fields.new_parent_id, cloned_data.data_id
FROM cloned_fields
INNER JOIN cloned_data ON cloned_fields.data_id = cloned_data.id
)
INSERT INTO fields (id, name, parent_id, field_type, data_id)
SELECT new_id, name, COALESCE(new_parent_id, :target_field_id), field_type, data_id FROM created_data
RETURNING id, name, parent_id, field_type, data_id, value data;
In case other people run into the same issue: I came up with this solution some months later. The trick was to move the data-modifying CTEs to the top level, as the error message suggests; each CTE can still reference the ones declared before it:
WITH new_fields_ids AS (
WITH RECURSIVE cte AS (
SELECT *, nextval('fields_id_seq') new_id FROM fields WHERE parent_id = :source_field_id
UNION ALL
SELECT fields.*, nextval('fields_id_seq') new_id FROM cte JOIN fields ON cte.id = fields.parent_id
)
SELECT C1.new_id, C1.name, C1.field_type, C1.data_id, C2.new_id new_parent_id
FROM cte C1 LEFT JOIN cte C2 ON C1.parent_id = C2.id
),
cloned_data AS (
INSERT INTO data_versions (value, author, field_id)
SELECT d.value, d.author, c.new_id
FROM new_fields_ids c
INNER JOIN data_versions d ON c.data_id = d.id
RETURNING id AS data_id, field_id, value
),
created_data AS (
SELECT new_fields_ids.new_id, new_fields_ids.name, new_fields_ids.field_type, new_fields_ids.new_parent_id, cloned_data.data_id
FROM new_fields_ids
INNER JOIN cloned_data ON new_fields_ids.new_id = cloned_data.field_id
),
cloned_fields AS (
INSERT INTO fields (id, name, parent_id, field_type, data_id)
SELECT new_id, name, COALESCE(new_parent_id, :target_field_id), field_type, data_id FROM created_data
RETURNING id, name, parent_id, field_type, data_id
)
SELECT f.id, f.name, f.parent_id, f.field_type, f.data_id, d.value AS data FROM cloned_fields f
INNER JOIN cloned_data d ON f.id = d.field_id;
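Note that all the sub-statements of a single query run with the same snapshot, so one data-modifying CTE cannot see rows that another has just inserted; passing the new rows along through RETURNING, as done here, is the supported way to chain them.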

Issue with PK violation on insert

I have a scenario where almost all of the tables have issues with their PK identity values, as follows: inserts fail with a PK violation.
Running DBCC CHECKIDENT shows an inconsistency between the next identity value and the current one.
Does anyone have an idea why this mismatch happens on several tables?
Since this database is replicated, I'm afraid the error will propagate across the environment.
I adapted the script below to fix it, but I'm really trying to figure out the root of the problem.
/** Version 3.0 **/
if object_id('tempdb..#temp') is not null
drop table #temp
;
with cte as (
SELECT
distinct
A.TABLE_CATALOG AS CATALOG,
A.TABLE_SCHEMA AS "SCHEMA",
A.TABLE_NAME AS "TABLE",
B.COLUMN_NAME AS "COLUMN",
IDENT_SEED (A.TABLE_NAME) AS Seed,
IDENT_INCR (A.TABLE_NAME) AS Increment,
IDENT_CURRENT (A.TABLE_NAME) AS Curr_Value
, DBPS.row_count AS NumberOfRows
FROM INFORMATION_SCHEMA.TABLES A
inner join INFORMATION_SCHEMA.COLUMNS B on b.TABLE_NAME = a.TABLE_NAME and b.TABLE_SCHEMA = a.TABLE_SCHEMA
inner join sys.identity_columns IC on OBJECT_NAME (IC.object_id) = a.TABLE_NAME
inner join sys.dm_db_partition_stats DBPS ON DBPS.object_id =IC.object_id
inner join sys.indexes as IDX ON DBPS.index_id =IDX.index_id
WHERE A.TABLE_CATALOG = B.TABLE_CATALOG AND
A.TABLE_SCHEMA = B.TABLE_SCHEMA AND
A.TABLE_NAME = B.TABLE_NAME AND
COLUMNPROPERTY (OBJECT_ID (B.TABLE_NAME), B.COLUMN_NAME, 'IsIdentity') = 1 AND
OBJECTPROPERTY (OBJECT_ID (A.TABLE_NAME), 'TableHasIdentity') = 1 AND
A.TABLE_TYPE = 'BASE TABLE'
)
select 'DBCC CHECKIDENT ('''+A.[SCHEMA]+'.'+a.[TABLE]+''', reseed)' command
, ROW_NUMBER() OVER(ORDER BY a.[SCHEMA], a.[TABLE] asc) AS ID
, A.Curr_Value
, a.[TABLE]
into #temp
from cte A
ORDER BY A.[SCHEMA], A.[TABLE]
declare @i int = 1, @count int = (select max(ID) from #temp)
declare @text varchar(max) = ''
select @count = count(1) from #temp
WHILE @i <= @count
BEGIN
    SET @text = (SELECT command from #temp where ID = @i)
    EXEC (@text + ';')
    print @text
    select Curr_Value OldValue, ident_current([TABLE]) FixValue, [TABLE] from #temp where ID = @i
    SET @i = @i + 1
    SET @text = '';
END
go
Maybe someone or something with enough permissions made a mistake by reseeding? It's as simple as this:
create table testid (
id int not null identity (1,1) primary key,
data varchar (3)
)
insert into testid (data) values ('abc'),('cde')
DBCC CHECKIDENT ('testid', RESEED, 1)
insert into testid (data) values ('bad')
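After the reseed, the table's current identity value is 1, so the last insert generates id 2, which already exists, and fails with a PK violation.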

postgresql column names as variable in a subquery

table1:
id  col1  col2  col3 ...

table2:
col_id  col_name
3432    col1
5342    col2
6756    col3
Now I want to generate table3 like this:
id  col_name  col_value  col_id
Please note that col1, col2, col3, ... are not in order, so I have to query table2 to obtain each col_id (I think a pivot does not work here).
How can I do it in SQL?
It appears that you want a select like this:
SELECT t1.id,
       t2.col_name,
       CASE
           WHEN t2.col_name = 'col1' THEN t1.col1
           WHEN t2.col_name = 'col2' THEN t1.col2
           WHEN t2.col_name = 'col3' THEN t1.col3
           -- ... more columns
           ELSE NULL
       END AS col_value,
       t2.col_id
FROM table1 t1 CROSS JOIN table2 t2
You could also create a function, though this will be slower in practice:
CREATE OR REPLACE FUNCTION table1_col(id integer, name text) RETURNS text AS $$
DECLARE
    col_val text;
BEGIN
    -- %I quotes the column name as an identifier
    EXECUTE format('SELECT %I FROM table1 WHERE id = $1', name)
    INTO col_val
    USING id;
    RETURN col_val;
END;
$$ LANGUAGE plpgsql;

SELECT t1.id, t2.col_name, table1_col(t1.id, t2.col_name) AS col_value, t2.col_id
FROM table1 t1 CROSS JOIN table2 t2;
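EXECUTE re-plans the dynamic query on every call, so invoking the function once per row adds planning overhead on top of the extra lookups; that is why the CASE expression above is usually faster.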

select record from joined table if it exists

I'm working on an SQL query that should 'coalesce' the records from 2 tables: if a record exists in table2, it should take that one; otherwise it should fall back to the values in table1.
In the example, table1 and table2 have just 2 fields (id and description), but obviously in reality there could be more.
Here's a small test case:
create table table1 (id int, description nvarchar(50))
create table table2 (id int, description nvarchar(50))
insert into table1 values (1, 'record 1')
insert into table1 values (2, 'record 2')
insert into table1 values (3, 'record 3')
insert into table2 values (1, 'record 1 modified')
insert into table2 values (2, null)
The result of the query should look like this:
1, "record 1 modified"
2, null
3, "record 3"
Here's what I came up with.
select
    case when table2.id is not null then table2.id
         else table1.id
    end as Id,
    case when table2.id is not null then table2.description
         else table1.description
    end as Description
    -- etc for other fields
from table1
left join table2 on table1.id = table2.id
Is there a better way to achieve what I want? I don't think I can use coalesce since that would not select a null value from table2 if the corresponding value in table1 is not null.
How about:
SELECT t2.ID, t2.Description
FROM table2 t2
UNION ALL
SELECT t1.ID, t1.Description
FROM table1 t1
WHERE NOT EXISTS (SELECT *
FROM table2 t2
WHERE t2.ID = t1.ID)
The above query gets all the records from table 2 (including the case where description is NULL but the ID is populated), and only the records from table 1 where they don't exist in table 2.
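UNION ALL is deliberate here: the NOT EXISTS branch already guarantees the two halves are disjoint, so the duplicate-eliminating sort that a plain UNION performs would be wasted work.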
Here's an alternative:
SELECT table2.*
FROM table1
RIGHT JOIN table2
ON table1.id = table2.id
UNION
SELECT table1.*
FROM table1
FULL OUTER join table2
ON table1.id = table2.id
WHERE table1.id NOT IN (SELECT id FROM table2)
--and table2.id not in (select id from table1)
You can add in that last line if you don't want ids that are only in table2. Otherwise I guess Stuart Ainsworth's solution is better (i.e. drop all the joins)
http://sqlfiddle.com/#!3/03bab/12/0

T-SQL GROUP BY over a dynamic list

I have a table with an XML-typed column. This column contains a dynamic list of attributes that may differ between records.
I am trying to GROUP BY and COUNT over these attributes without going through the table separately for each attribute.
For example, if one record had attributes A, B and C and another had B, C and D, the GROUP BY COUNT would give A = 1, B = 2, C = 2 and D = 1.
Is there any straightforward way to do this?
EDIT in reply to Andrew's answer:
Because my knowledge of this construct is superficial at best, I had to fiddle with it to get it to do what I want. In my actual code I needed to group by the TimeRange as well, and to select only some attributes by name. I am pasting the actual query below:
WITH attributes AS (
SELECT
Timestamp,
N.a.value('@name', 'nvarchar(max)') AS AttributeName,
N.a.value('(.)[1]', 'nvarchar(max)') AS AttributeValue
FROM MyTable
CROSS APPLY AttributesXml.nodes('/Attributes/Attribute') AS N(a)
)
SELECT Datepart(dy, Timestamp), AttributeValue, COUNT(AttributeValue)
FROM attributes
WHERE AttributeName IN ('AttributeA', 'AttributeB')
GROUP BY Datepart(dy, Timestamp), AttributeValue
As a side-note: Is there any way to reduce this further?
WITH attributes AS (
SELECT a.value('(.)[1]', 'nvarchar(max)') AS attribute
FROM YourTable
CROSS APPLY YourXMLColumn.nodes('//path/to/attributes') AS N(a)
)
SELECT attribute, COUNT(attribute)
FROM attributes
GROUP BY attribute
CROSS APPLY lets you JOIN against the XML content as if it were a table. The WITH is needed because you can't use XML methods in a GROUP BY clause.
Here is a way to get the attribute data into a form you can easily work with while reducing the number of passes over the main table.
--create test data
declare #tmp table (
field1 varchar(20),
field2 varchar(20),
field3 varchar(20))
insert into #tmp (field1, field2, field3)
values ('A', 'B', 'C'),
('B', 'C', 'D')
--convert the individual fields from separate columns to one column
declare #table table(
field varchar(20))
insert into #table (field)
select field1 from #tmp
union all
select field2 from #tmp
union all
select field3 from #tmp
--run the group by and get the count
select field, count(*)
from #table
group by field
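With the test data above this returns A = 1, B = 2, C = 2 and D = 1, matching the counts from the example in the question.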