I have read the Knowledge Center article on how to manually resolve in-doubt transactions using the LIST INDOUBT TRANSACTIONS WITH PROMPTING command, but I want to understand:
Are these in-doubt transactions at the instance level?
Can an instance restart also resolve these in-doubt transactions?
In-doubt transactions can only happen in a distributed database scenario, i.e., with two or more databases involved in a single distributed ("global") transaction. A typical reason for an in-doubt transaction is lost network communication: the local transaction does not know whether the global/distributed transaction committed successfully.
1) The in-doubt transactions are per database and need to be resolved at that level.
2) An instance restart does not change the transaction state.
See the X/Open distributed transaction processing protocol for DB2 topic in the Knowledge Center.
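For reference, a rough sketch of what the per-database resolution looks like as a DB2 command line processor script (the database name is a placeholder, and the exact prompt actions depend on your DB2 version and the documentation you are following):
-- Connect to the affected database (not just attach to the instance);
-- "mydb" is a placeholder.
CONNECT TO mydb
-- List each in-doubt transaction and prompt for an action.
LIST INDOUBT TRANSACTIONS WITH PROMPTING
-- For every transaction listed, the prompt lets you choose an action
-- (commit, roll back, or forget); repeat this for each database that
-- has in-doubt transactions.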
I'm trying to figure out the logical replication protocol. My attention was drawn to the message "standby status update" (byte('r')). Unfortunately, the documentation does not seem to describe the expected server behavior for this command.
If I send an LSN from the past, will the server resend the transactions that lie after that LSN? Or, as far as I could determine in practice, does this message only affect the metadata of the replication slot (as shown in the pg_replication_slots table)?
That information is used to move the replication slot ahead and to provide the data that can be seen in pg_stat_replication (sent_lsn, write_lsn, flush_lsn and replay_lsn). With synchronous replication, it also provides the data that the primary has to wait for before it can report a transaction as committed.
Sending old, incorrect log sequence numbers will not change the data that the primary streams to the standby, but it might make the primary retain old WAL, because the replication slot is not moved ahead. It will also confuse monitoring. With synchronous replication, it will lead to hanging DML transactions.
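If you want to see what those reported positions affect, two catalog queries on the primary show them (column names as of PostgreSQL 10 or later; the slot and application names are whatever your setup uses):
-- Positions reported back by the standby/client via "standby status update":
SELECT application_name, sent_lsn, write_lsn, flush_lsn, replay_lsn, sync_state
FROM pg_stat_replication;

-- The replication slot is moved ahead based on the confirmed positions,
-- which controls how much WAL the primary has to keep around:
SELECT slot_name, slot_type, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots;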
While performing write/read operations against a CockroachDB database with Spring Boot, we are intermittently getting the error below. Any solution here is appreciated. Thanks.
Caused by:
org.postgresql.util.PSQLException: ERROR: restart transaction: TransactionRetryWithProtoRefreshError: ReadWithinUncertaintyIntervalError: read at time 1640760553.619962171,0 encountered previous write with future timestamp
The documentation states that this error is a sign of contention. It also suggests four ways of solving it:
Be prepared to retry on uncertainty (and other) errors, as described in client-side retry handling.
Use historical reads with SELECT ... AS OF SYSTEM TIME.
Design your schema and queries to reduce contention. For more information about how contention occurs and how to avoid it, see Understanding and avoiding transaction contention. In particular, if you are able to send all of the statements in your transaction in a single batch, CockroachDB can usually automatically retry the entire transaction for you.
If you trust your clocks, you can try lowering the --max-offset option to cockroach start, which provides an upper limit on how long a transaction can continue to restart due to uncertainty.
Did you already try these? A minimal sketch of the first two options follows.
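For illustration only (table and column names are made up; in a Spring Boot application the retry loop usually lives in your JDBC/DAO layer rather than in plain SQL):
-- Option 1: client-side retry using the special cockroach_restart savepoint.
BEGIN;
SAVEPOINT cockroach_restart;
UPDATE accounts SET balance = balance - 10 WHERE id = 1;
UPDATE accounts SET balance = balance + 10 WHERE id = 2;
RELEASE SAVEPOINT cockroach_restart;
COMMIT;
-- If any statement fails with a retryable error (SQLSTATE 40001), issue
-- ROLLBACK TO SAVEPOINT cockroach_restart and re-run the statements.

-- Option 2: a historical read runs at a fixed timestamp in the past and
-- cannot hit the uncertainty interval, at the cost of slightly stale data.
SELECT id, balance FROM accounts AS OF SYSTEM TIME '-10s';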
If we compare multiple types of replication (single-leader, multi-leader, or leaderless), single-leader replication has the potential to be linearizable. In my understanding, linearizability means that once a write completes, all later reads should return that value, or the value of a later write. Said in other words, it should give the impression that there is only one copy of the database, nothing more. So I guess: no stale reads.
PostgreSQL streaming replication has the ability to make all of its replicas synchronous using synchronous_standby_names, and it can be tuned further with the synchronous_commit option, which can be set to remote_apply so that the leader waits until the transaction is replayed on the standby (making it visible to queries). In the documentation, in the paragraph that discusses the remote_apply option, it states that this allows load balancing in simple cases with causal consistency.
A few pages back, it says this:
"Some solutions are synchronous, meaning that a data-modifying transaction is not considered committed until all servers have committed the transaction. This guarantees that a failover will not lose any data and that all load-balanced servers will return consistent results no matter which server is queried."
So I'm struggling to understand what is actually guaranteed, and what anomalies can happen if we load-balance read queries to the read replicas. Can there still be stale reads? Can it happen that I query different replicas and get different results, even though no write happened on the leader in between? My impression is yes, but I'm not really sure.
If not, how does PostgreSQL prevent stale reads? I did not find any details on how this fully works under the hood. Does it use two-phase commit, some modification of it, or some other algorithm to prevent stale reads?
If it does not provide the option of no stale reads, is there a way to accomplish that? I saw that PgPool has the option to load-balance to replicas that are behind by no more than a defined threshold, but I did not understand whether it can be configured to load-balance only to replicas that are fully caught up with the leader.
It's really confusing to me to understand whether anomalies can happen in fully synchronous replication in PostgreSQL.
I understand that a setup like this has problems with availability, but that is not a concern right now.
If you use synchronous replication with synchronous_commit = remote_apply, you can be certain that you will see the modified data on the standby as soon as the modifying transaction has committed.
Synchronous replication does not use two-phase commit: the primary server first commits locally and then simply waits for the feedback from the synchronous standby server before COMMIT returns. So the following is possible:
An observer will see the modified data on the primary before COMMIT returns, and before the data are propagated to the standby.
An observer will see the modified data on the standby before the COMMIT on the primary returns.
If the committing transaction is interrupted on the primary at the right moment before COMMIT returns, the transaction will already be committed, but only on the primary. There is always a certain time window between the moment the commit happens on the server and the moment it is reported to the client, but that window increases considerably with streaming replication.
Eventually, though, the data modification will always make it to the standby.
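For completeness, a minimal configuration sketch for that setup (the standby name is a placeholder and must match the application_name the standby uses when it connects):
-- On the primary: name the synchronous standby and reload the configuration.
ALTER SYSTEM SET synchronous_standby_names = 'standby1';  -- placeholder name
SELECT pg_reload_conf();

-- Wait for commits to be replayed (and therefore visible) on the standby;
-- this can be set globally, per session, or per transaction.
SET synchronous_commit = remote_apply;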
In my case, I have connected to another Greenplum (GP) database to import data into my PostgreSQL tables and have written Java schedulers to refresh it daily. But when I try to fetch the records every day using SQL functions, I get the error "Greenplum Database does not support REPEATABLE READ transactions". Can anyone suggest how I can load the data frequently from GP to PostgreSQL without this isolation hassle?
I know I can refresh the tables by executing
START TRANSACTION ISOLATION LEVEL SERIALIZABLE;
but I'm not able to use the same statement inside the functions because they run in transaction blocks.
Unlike Oracle database, which uses locks and latches for concurrency control, Greenplum Database (as does PostgreSQL) maintains data consistency by using a multiversion model (Multiversion Concurrency Control, MVCC). This means that while querying a database, each transaction sees a snapshot of data which protects the transaction from viewing inconsistent data that could be caused by (other) concurrent updates on the same data rows. This provides transaction isolation for each database session. In a nutshell, readers don’t block writers and writers don’t block readers. Each transaction sees a snapshot of the database rather than locking tables.
Transaction Isolation Levels
The SQL standard defines four transaction isolation levels. In Greenplum Database, you can request any of the four standard transaction isolation levels. But internally, there are only two distinct isolation levels — read committed and serializable:
read committed — When a transaction runs on this isolation level, a SELECT query sees only data committed before the query began. It never sees either uncommitted data or changes committed during query execution by concurrent transactions. However, the SELECT does see the effects of previous updates executed within its own transaction, even though they are not yet committed. In effect, a SELECT query sees a snapshot of the database as of the instant that query begins to run. Notice that two successive SELECT commands can see different data, even though they are within a single transaction, if other transactions commit changes during execution of the first SELECT. UPDATE and DELETE commands behave the same as SELECT in terms of searching for target rows. They will only find target rows that were committed as of the command start time. However, such a target row may have already been updated (or deleted or locked) by another concurrent transaction by the time it is found. The partial transaction isolation provided by read committed mode is adequate for many applications, and this mode is fast and simple to use. However, for applications that do complex queries and updates, it may be necessary to guarantee a more rigorously consistent view of the database than the read committed mode provides.
serializable — This is the strictest transaction isolation. This level emulates serial transaction execution, as if transactions had been executed one after another, serially, rather than concurrently. Applications using this level must be prepared to retry transactions due to serialization failures. When a transaction is on the serializable level, a SELECT query sees only data committed before the transaction began. It never sees either uncommitted data or changes committed during transaction execution by concurrent transactions. However, the SELECT does see the effects of previous updates executed within its own transaction, even though they are not yet committed. Successive SELECT commands within a single transaction always see the same data. UPDATE and DELETE commands behave the same as SELECT in terms of searching for target rows. They will only find target rows that were committed as of the transaction start time. However, such a target row may have already been updated (or deleted or locked) by another concurrent transaction by the time it is found. In this case, the serializable transaction will wait for the first updating transaction to commit or roll back (if it is still in progress). If the first updater rolls back, then its effects are negated and the serializable transaction can proceed with updating the originally found row. But if the first updater commits (and actually updated or deleted the row, not just locked it) then the serializable transaction will be rolled back.
read uncommitted — Treated the same as read committed in Greenplum Database.
repeatable read — Treated the same as serializable in Greenplum Database.
The default transaction isolation level in Greenplum Database is read committed. To change the isolation level for a transaction, you can declare the isolation level when you BEGIN the transaction, or else use the SET TRANSACTION command after the transaction is started.
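For example (a minimal sketch; note that SET TRANSACTION has to run before any query in the transaction, which is why it cannot be issued from inside a function that is already running in a transaction block):
-- Declare the level when starting the transaction:
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- ... your SELECTs / refresh statements ...
COMMIT;

-- Or set it immediately after the transaction has started:
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- ... your SELECTs / refresh statements ...
COMMIT;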
My web app uses ADO.NET against SQL Server 2008. Database writes happen against a primary (publisher) database, but reads are load balanced across the primary and a secondary (subscriber) database. We use SQL Server's built-in transactional replication to keep the secondary up-to-date. Most of the time, the couple of seconds of latency is not a problem.
However, I do have a case where I'd like to block until the transaction is committed at the secondary site. Blocking for a few seconds is OK, but returning a stale page to the user is not. Is there any way in ADO.NET or T-SQL to specify that I want to wait for the replication to complete? Or can I, from the publisher, check the replication status of the transaction without manually connecting to the secondary server?
[edit]
99.9% of the time, the data in the subscriber is "fresh enough". But there is one operation that invalidates it, and I can't read from the publisher every time on the off chance that it has become invalid. If I can't solve this problem with transactional replication, can you suggest an alternate architecture?
There's no such solution for SQL Server, but here's how I've worked around it in other environments.
Use three separate connection strings in your application, and choose the right one based on the needs of your query:
Realtime - Points directly at the one master server. All writes go to this connection string, and only the most mission-critical reads go here.
Near-Realtime - Points at a load balanced pool of subscribers. No writes go here, only reads. Used for the vast majority of OLTP reads.
Delayed Reporting - In your environment right now, it's going to point to the same load-balanced pool of subscribers, but down the road you can use a technology like log shipping to have a pool of servers 8-24 hours behind. These scale out really well, but the data's far behind. It's great for reporting, search, long-term history, and other non-realtime needs.
If you design your app to use those 3 connection strings from the start, scaling is a lot easier, especially in the case you're experiencing.
You are describing a synchronous mirroring situation. Replication cannot, by definition, support your requirement. Replication must wait for a transaction to commit before reading it from the log and delivering it to the distributor and from there to the subscriber, which means replication by definition has a window of opportunity for data to be out of sync.
If you have a requirement that an operation read the authoritative copy of the data, then you should make that decision in the client and ensure you read from the publisher in that case.
While you can, in theory, validate whether a certain transaction was distributed to the subscriber or not, you should not base your design on it. Transactional replication makes no latency guarantee, by design, so you cannot rely on a 'perfect day' operation mode.
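If you want to at least measure or spot-check that latency from the publisher side, transactional replication's tracer tokens can help; a rough T-SQL sketch (the publication name is a placeholder), keeping in mind this measures the replication pipeline rather than tracking your specific user transaction:
-- At the publisher, in the publication database: post a tracer token into
-- the replication stream and remember its id.
DECLARE @token_id int;
EXEC sys.sp_posttracertoken
    @publication = N'MyPublication',
    @tracer_token_id = @token_id OUTPUT;

-- Check how long that token took to reach the distributor and the subscriber(s).
EXEC sys.sp_helptracertokenhistory
    @publication = N'MyPublication',
    @tracer_id = @token_id;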