I'm facing a problem with PostgreSQL 12 logical replication and can't understand where the issue is.
In our DWH we replicate tables from two different sources (say source1 and source2), but there is one table whose replication keeps breaking. The table structures and data types are identical, except that source2 has an additional column; that column also exists in the target (DWH) table with a default value of 0, so data from source1 can be replicated as well. As far as I know, it is not a problem for the target table to have more columns than the source. The issue is that the subscription keeps this table synchronized only with source2. With source1 it synchronizes once, on ALTER SUBSCRIPTION source1 REFRESH PUBLICATION, and then it stops and no more data is replicated (the subscription keeps working for the other tables; the problem is only with this particular table).
There are no error messages in the log file or anything else that could help me resolve this myself. I tried dropping and recreating the table in the DWH, but that didn't help. There are no duplicate entries or anything else that could break replication.
select
b.subname
,c.relname
,case
when a.srsubstate = 'i' then 'initialize'
when a.srsubstate = 'd' then 'data is being copied'
when a.srsubstate = 's' then 'synchronized'
when a.srsubstate = 'r' then 'ready (normal replication)'
else null
end srsubstate
from pg_subscription_rel a
left join pg_subscription b on a.srsubid = b.oid
left join pg_class c on a.srrelid = c.oid
where c.relname ='table_name';
RESULT:
"source2" "table_name" "ready (normal replication)"
"source1" "table_name" "synchronized"
REPLICA IDENTITY for tables in source and target = INDEX
INDEX in DWH and source db's are the same: "CREATE UNIQUE INDEX table_name_idx ON public.table_name USING btree (id, country)"
I also altered the table: alter table table_name replica identity using index table_name_idx;
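To double-check that the setting took effect, relreplident in pg_class can be inspected (a quick sketch; 'i' means the replica identity is an index, 'd' default, 'f' full, 'n' nothing):
select relname, relreplident
from pg_class
where relname = 'table_name';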
I assume the connections work correctly, as other tables from both sources are replicated correctly.
PROBLEM: Data in the DWH from source1 is synchronized only once, on ALTER SUBSCRIPTION ... REFRESH PUBLICATION.
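For reference, this is roughly what I run on the DWH side to re-trigger the sync (the subscription name source1 is assumed here; copy_data = true is the default and re-copies newly added tables):
alter subscription source1 refresh publication with (copy_data = true);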
I joined two tables and fetched data using the Postgres source connector, but every time it failed with the same issue.
I ran the same query directly in Postgres and it executes without any issue. Is fetching data by joining tables not possible in Kafka?
I solved this issue by using a subquery. The problem was that when I used an alias, the aliased expression was interpreted as a whole as a column name, which caused the error. Here is my query:
select * from (
    select p."Id" as "p_Id", p."CreatedDate" as p_createddate, p."ClassId" as p_classid,
           c."Id" as c_id, c."CreatedDate" as c_createddate, c."ClassId" as c_classid
    from "PolicyIssuance" p
    left join "ClassDetails" c on p."DocumentNumber" = c."DocumentNo"
) as result
I have a process which is creating thousands of temporary tables a day to import data into a system.
It is using the form of:
create temp table if not exists test_table_temp as
select * from test_table where 1=0;
This very quickly creates a lot of dead rows in pg_attribute, since columns for these tables are constantly being created and then deleted shortly afterwards. I have seen solutions elsewhere that suggest using ON COMMIT DELETE ROWS; however, this does not appear to have the desired effect either.
To test the above, you can create two separate sessions on a test database. In one of them, check:
select count(*)
from pg_catalog.pg_attribute;
and also note down the value for n_dead_tup from:
select n_dead_tup
from pg_stat_sys_tables
where relname = 'pg_attribute';
On the other one, create a temp table (will need another table to select from):
create temp table if not exists test_table_temp on commit delete rows as
select * from test_table where 1=0;
The count query for pg_attribute immediately goes up, even before we reach the commit. Upon closing the session that created the temp table, the pg_attribute count goes down, but n_dead_tup goes up, suggesting that vacuuming is still required.
I guess my real question is have I missed something above, or is the only way of dealing with this issue vacuuming aggressively and taking the performance hit that comes with it?
Thanks for any responses in advance.
No, you have understood the situation correctly.
You either need to make autovacuum more aggressive, or you need to use fewer temporary tables.
Unfortunately you cannot change the storage parameters on a catalog table – at least not in a supported fashion that will survive an upgrade – so you will have to tune autovacuum for the whole cluster.
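A minimal sketch of what that cluster-wide tuning could look like (the values are illustrative, not recommendations; ALTER SYSTEM requires PostgreSQL 9.4 or later):
-- trigger autovacuum after ~1% of a table is dead instead of the default 20%
ALTER SYSTEM SET autovacuum_vacuum_scale_factor = 0.01;
-- check for work more often than the default 1min
ALTER SYSTEM SET autovacuum_naptime = '15s';
SELECT pg_reload_conf();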
In most common cases we have two (or more) tables in the DB, termed master (e.g. SalesOrderHeader) and child (e.g. SalesOrderDetail).
We can read records from the DB with one SELECT using an INNER JOIN and an additional WHERE constraint to reduce the volume of data loaded from the DB (using "Adapter.Fill(DataSet)"):
#"SELECT d.SalesOrderID, d.SalesOrderDetailID, d.OrderQty,
d.ProductID, d.UnitPrice
FROM Sales.SalesOrderDetail d
INNER JOIN Sales.SalesOrderHeader h
ON d.SalesOrderID = h.SalesOrderID
WHERE DATEPART(YEAR, OrderDate) = #year;"
Did I understand right that in this case we receive one table in the DataSet, without primary and foreign keys, and without the possibility to set a constraint between master and child tables?
This DataSet would be useful only for queries over the columns and records that exist in the DataSet?
And we can't use DbCommandBuilder to create the Insert, Update and Delete SqlCommands based on the SelectCommand that was used for filling the DataSet, and simply update the data in these tables in the DB?
If we want to organize local data modification in the tables using the disconnected layer of ADO.NET, we must populate the DataSet with two SELECTs:
"SELECT *
FROM Sales.SalesOrderHeader;"
"SELECT *
FROM Sales.SalesOrderDetail;"
After that we must create the primary keys for both tables, set the constraint between master and child table, and create the Insert, Update and Delete SqlCommands with DbCommandBuilder.
In that case we will be able to modify the data in these tables locally and afterwards update the records in the DB (using "Adapter.Update(DataSet)").
If we use one SelectCommand to load data into two tables in the DataSet, can we use that SelectCommand with DbCommandBuilder to create the other SqlCommands for Update, and update all tables in the DataSet with one "Adapter.Update(DataSet)", or must we create a separate adapter to update every table?
If, to save resources, I load only part of the records from a table (e.g. SalesOrderDetail, see below), do I understand right that I can then run into problems when I send new records to the DB (by Update), because new records can conflict by primary key with records that already exist in the DB (records which have a different value in the OrderDate field)?
"SELECT *
FROM Sales.SalesOrderDetail
WHERE DATEPART(YEAR, OrderDate) = #year;"
There is nothing preventing you from writing your own Insert, Update and Delete commands for your first select statement with the join. Of course, you will have to determine a way to ensure that the foreign keys exist.
Insert Into SalesOrderDetail (SalesOrderID, OrderQty, ProductID, UnitPrice) Values (@SalesOrderID, @OrderQty, @ProductID, @UnitPrice);
Update SalesOrderDetail Set OrderQty = @OrderQty Where SalesOrderDetailID = @ID;
Delete From SalesOrderDetail Where SalesOrderDetailID = @ID;
You would execute these with ADO.NET commands instead of using the adapter. I wrote the sample code in VB.NET, but I am sure it is easy to convert to C# if you prefer.
Private Sub UpdateQuantity(Quant As Integer, DetailID As Integer)
    'Associate the command with the connection and use @-prefixed parameters
    Using cn As New SqlConnection("Your connection string"),
          cmd As New SqlCommand("Update SalesOrderDetail Set OrderQty = @OrderQty Where SalesOrderDetailID = @ID;", cn)
        cmd.Parameters.Add("@OrderQty", SqlDbType.Int).Value = Quant
        cmd.Parameters.Add("@ID", SqlDbType.Int).Value = DetailID
        cn.Open()
        cmd.ExecuteNonQuery()
    End Using
End Sub
I'm on PostgreSQL 9.3. I'm the only one working on the database, and my code runs queries sequentially for unit tests.
Most of the time the following UPDATE query runs without problems, but sometimes it blocks on the PostgreSQL server. The query then seems to never end, while it normally takes only about 3 seconds.
I must point out that the query runs in a unit-test context, i.e. the data is exactly the same whether the lock happens or not, and my code is the only process that updates the data.
I know there can be lock problems in PostgreSQL when an UPDATE query joins a table to itself, all the more when a LEFT JOIN is used.
I also know that for an UPDATE the LEFT JOIN can be replaced with a NOT EXISTS query (see the sketch after the table structure below), but in my case the LEFT JOIN is much faster, because there is little data to update, while NOT EXISTS would have to visit nearly all candidate rows.
So my question is: which PostgreSQL commands (like an explicit LOCK on the table) or options (like SELECT FOR UPDATE) should I use to ensure that my query runs without a never-ending lock?
Query:
-- for each places of scenario #1 update all owners that
-- are different from scenario #0
UPDATE t_territories AS upt
SET id_owner = diff.id_owner
FROM (
-- list of owners in the source that are different from target
SELECT trg.id_place, src.id_owner
FROM t_territories AS trg
LEFT JOIN t_territories AS src
ON (src.id_scenario = 0)
AND (src.id_place = trg.id_place)
WHERE (trg.id_scenario = 1)
AND (trg.id_owner IS DISTINCT FROM src.id_owner)
-- FOR UPDATE -- bug SQL : FOR UPDATE cannot be applied to the nullable side of an outer join
) AS diff
WHERE (upt.id_scenario = 1)
AND (upt.id_place = diff.id_place)
Table structure:
CREATE TABLE t_territories
(
id_scenario integer NOT NULL,
id_place integer NOT NULL,
id_owner integer,
CONSTRAINT t_territories_pk PRIMARY KEY (id_scenario, id_place),
CONSTRAINT t_territories_fkey_owner FOREIGN KEY (id_owner)
REFERENCES t_owner (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE RESTRICT
)
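For reference, the NOT EXISTS rewrite mentioned above would look roughly like this (a sketch with equivalent logic, untested):
-- update every scenario-1 row whose owner has no matching owner in scenario 0
UPDATE t_territories AS upt
SET id_owner = (SELECT src.id_owner
                FROM t_territories AS src
                WHERE src.id_scenario = 0
                  AND src.id_place = upt.id_place)
WHERE upt.id_scenario = 1
  AND NOT EXISTS (
    SELECT 1
    FROM t_territories AS src
    WHERE src.id_scenario = 0
      AND src.id_place = upt.id_place
      AND src.id_owner IS NOT DISTINCT FROM upt.id_owner
  );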
I think your query was blocked by another query. You can find the blocking query with:
SELECT
COALESCE(blockingl.relation::regclass::text,blockingl.locktype) as locked_item,
now() - blockeda.query_start AS waiting_duration, blockeda.pid AS blocked_pid,
blockeda.query as blocked_query, blockedl.mode as blocked_mode,
blockinga.pid AS blocking_pid, blockinga.query as blocking_query,
blockingl.mode as blocking_mode
FROM pg_catalog.pg_locks blockedl
JOIN pg_stat_activity blockeda ON blockedl.pid = blockeda.pid
JOIN pg_catalog.pg_locks blockingl ON(
( (blockingl.transactionid=blockedl.transactionid) OR
(blockingl.relation=blockedl.relation AND blockingl.locktype=blockedl.locktype)
) AND blockedl.pid != blockingl.pid)
JOIN pg_stat_activity blockinga ON blockingl.pid = blockinga.pid
AND blockinga.datid = blockeda.datid
WHERE NOT blockedl.granted
AND blockinga.datname = current_database()
I found this query here: http://big-elephants.com/2013-09/exploring-query-locks-in-postgres/
You can also take an ACCESS EXCLUSIVE lock to prevent any other query from reading or writing t_territories:
LOCK t_territories IN ACCESS EXCLUSIVE MODE;
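Note that LOCK can only be used inside a transaction block, and the lock is held until that transaction ends, so the intended usage is roughly:
BEGIN;
LOCK t_territories IN ACCESS EXCLUSIVE MODE;
-- run the UPDATE here; the lock is released at COMMIT
COMMIT;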
More info about locks is here: https://www.postgresql.org/docs/9.1/static/explicit-locking.html
I'm trying to fill a table "SAMPLE" that requires ids from three other tables.
The table "SAMPLE" that needs to be filled holds the following:
id (integer, not null, pk)
code (text, not null)
subsystem_id (integer, fk)
system_id (integer, not null, fk)
manufacturer_id (integer, fk)
The current query looks like this:
insert into SAMPLE (system_id, manufacturer_id, code, subsystem_id)
values ((select id from system where initial = 'P'),
        (select id from manufacturer where name = 'nameXY'),
        'P0001',
        (select id from subsystem where code = 'NAME PATTERN'));
It is ridiculously slow, inserting 8k rows in around a minute.
I'm not sure if this is a really bad query or if my Postgres configuration is heavily messed up.
For clarification, more table information:
subsystem:
This table holds nine fixed values with a basic pattern I can access easily.
system:
This table holds four fixed values that can be identified using the "initial" attribute.
manufacturer:
This table holds the names of the manufacturers.
The "SAMPLE" table will be the only connection between those tables, so I'm not sure if I can use joins.
I'm pretty sure inserting 8k rows should be trivially fast for a database, so I'm really confused.
My specs:
Win 7 x86_64
8GB RAM
Intel i5-3470S (quad core), 2.9 GHz
Postgres is v9.3
I didn't see any load peak during the query, so I suspect something is up with my configuration. If you need information about it, let me know.
Note: it is possible that some codes or names cannot be found in the subsystem or manufacturer tables. Instead of skipping the row, I want to insert a NULL value into that column.
8000 inserts per minute is roughly 133 per second, i.e. about 7.5 ms per statement.
This is to be expected if the INSERTs happen in a loop, with each statement in its own transaction.
Each transaction commits to disk and waits for the disk to confirm that the data is written to durable storage. This is known to be slow.
Wrap the loop in a single transaction with BEGIN and END and it will run at normal speed.
Ideally you wouldn't even have a loop but a more complex query that does a single INSERT to create all the rows from their sources, if possible.
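A minimal sketch of that, using the INSERT from the question (the loop itself stays in the client):
BEGIN;
insert into SAMPLE (system_id, manufacturer_id, code, subsystem_id)
values (...);  -- all 8000 INSERTs go here, inside one transaction
COMMIT;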
I could not test it because I have no PostgreSQL installation and no database with a similar structure, but it might be faster to get the insert data from a single statement:
-- LEFT JOINs, so that a missing manufacturer or subsystem yields NULL
-- instead of dropping the row (as the note in the question requires)
INSERT INTO Sample (system_id, manufacturer_id, code, subsystem_id)
SELECT s.id AS system_id,
       m.id AS manufacturer_id,
       'P0001' AS code,
       ss.id AS subsystem_id
FROM system s
LEFT JOIN manufacturer m
    ON m.name = 'nameXY'
LEFT JOIN subsystem ss
    ON ss.code = 'NAME PATTERN'
WHERE s.initial = 'P';
I hope this works.