I am working on an RDBMS-agnostic transaction replicator (primarily via ODBC to start, though my personally preferred RDBMS is PostgreSQL) for guaranteeing that the data in two databases stays consistent.
This would be in a similar vein to TIBCO Rendezvous, but not targeted at Oracle, and (likely) non-commercial.
I have considered alternatives such as using a simple message queue, but if users/processes in two locales update the same object at the same time (or before a transaction can replicate), you are still left with the issue of authority and "who's right".
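To make the "who's right" problem concrete, here is a minimal, hypothetical sketch of one possible resolution policy (last-writer-wins with a fixed site-priority tiebreak); the row structure and site names are illustrative assumptions, not part of any existing replicator:

```python
from dataclasses import dataclass

# Hypothetical replicated row version; field names are illustrative only.
@dataclass
class RowVersion:
    site: str          # which database produced this version
    updated_at: float  # wall-clock timestamp of the local commit
    payload: dict      # the row's column values

# Fixed priority used only to break timestamp ties (the "authority" question).
SITE_PRIORITY = {"primary_dc": 0, "branch_office": 1}

def resolve_conflict(local: RowVersion, remote: RowVersion) -> RowVersion:
    """Last-writer-wins, falling back to site priority on equal timestamps."""
    if local.updated_at != remote.updated_at:
        return max(local, remote, key=lambda v: v.updated_at)
    return min(local, remote, key=lambda v: SITE_PRIORITY.get(v.site, 99))
```

Even a deterministic policy like this silently discards one side's write and depends on reasonably synchronized clocks, which is exactly why the question of authority matters.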
What are primary considerations to keep in mind, especially concerning the high potential for conflicts in the environment?
There are some solutions out there, but I have no idea how big the gap between reality and the marketing copy actually is.
http://symmetricds.codehaus.org/
http://www.continuent.com/solutions/tungsten-replicator
(update: 2015-03-13: does not seem to support Postgres any longer)
Related
I have a system where multiple satellites create financial transactions and need to sync up with a core server. The satellites are remote servers that run Rails apps with a local Postgres database. The core is another Rails app with its own Postgres database. The satellites and core have pretty much the same schema (but not identical). Everything is containerized (apps and database).

Very rarely, the core server does update some data that all satellites need. Currently I have one satellite, but this number will grow to a couple (I don't think more than 100 in the distant future).

There is no problem of sequence or contention between the core and the satellites. The core will never update the same transaction as any of the satellites, and no satellite will update the same transaction as any of the other satellites. Even better, the financial transactions have a UUID as the primary key.
Since this is a multi-master sync problem, I naturally came across BDR. I have the following questions:
Is BDR production ready and stable? I’m reading about several competing technologies (like Bucardo and Londiste). Will it really be part of Postgres 9.6?
Can BDR handle a disconnected model? I don't think this will happen very often, but my satellites could be disconnected for hours.
Can BDR do selective syncs? For example, I'd only want certain tables to be synced.
Could BDR handle 100 satellites?
Is BDR production ready and stable?
Yes, BDR 1.0 for BDR-Postgres 9.4 is production-ready and stable. But then I would say that since I work for 2ndQuadrant, who develop BDR.
It is not a drop-in replacement for standalone PostgreSQL that you can use without application changes though. See the overview section of the manual.
I’m reading about several competing technologies (like Bucardo and Londiste).
They're all different. Different trade-offs. There's some discussion of them in the BDR manual, but of course, take that with a grain of salt since we can hardly claim to be unbiased.
Will it really be part of Postgres 9.6?
No, definitely not. Where have you seen that claim?
There will in future be an extension released to add BDR to PostgreSQL 9.6 when it's ready (but there isn't one yet). It won't be part of PostgreSQL 9.6; it'll be something you install on top.
Can BDR handle a disconnected model? I don't think this will happen very often, but my satellites could be disconnected for hours.
Yes, it handles temporary partitions and network outages well, with some caveats around global sequences. See the manual for details.
Can BDR do selective syncs?
Yes. See the manual for replication sets.
Table structure is always replicated. So are initial table contents at the moment. But table changes can be replicated selectively, table-by-table.
For example, I'd only want certain tables to be synced.
Sure.
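As a rough sketch of how that selective replication is set up: you assign tables to named replication sets and configure peer connections to apply only those sets. The function names below are recalled from the BDR 1.0 replication-sets documentation and should be verified against the manual; the table, set, and DSN names are made up. Shown via psycopg2 from Python:

```python
import psycopg2

# Sketch only: bdr.table_set_replication_sets is recalled from the BDR 1.0
# docs; verify the exact name and signature in the manual before relying on it.
conn = psycopg2.connect("dbname=satellite_db")  # hypothetical DSN
with conn, conn.cursor() as cur:
    # Put only the tables you want synced into a named replication set.
    cur.execute(
        "SELECT bdr.table_set_replication_sets(%s::regclass, %s)",
        ("financial_transactions", ["to_core"]),
    )
    # Tables left out of the sets a peer connection applies are not
    # replicated to that peer, which is the "selective sync" behaviour.
```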
Could BDR handle 100 satellites?
Not well. It's a mesh topology that would expect every satellite to talk to every other satellite. Also, you'd have 198 backends (99 walsenders + 99 apply workers) per node. Not pretty.
You really want a star-and-hub model where each satellite only talks to the hub. That's not supported in BDR 1.0, nor is it targeted for support in BDR 2.0.
I think this is a better use case for pglogical or Londiste.
I can't really go into more detail here, since it overlaps with commercial consulting services I am involved in. The team I work with designs things like this for customers as a professional service.
How is a managed and permissioned blockchain (like the Hyperledger blockchain service offered by IBM Bluemix) different from a relational database service?
The value proposition of permissioned blockchain systems over traditional databases is simple: integrity through cryptographically signed history. What's stopping Twitter from editing my tweets and making it seem like I said something I didn't say? Little to nothing.
This is where a blockchain approach comes in. If Twitter stored tweets in a blockchain that others could copy, then any modifications that Twitter made to this chain would be caught. Blockchains preserve the integrity of the data within a database. They prevent people from cooking the books. This is of extraordinary importance and value in certain application areas.
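A toy sketch of why such edits get caught: each record commits to the hash of its predecessor, so changing any historical entry invalidates every later hash for anyone holding a copy of the chain. This is illustrative only, not any particular blockchain implementation:

```python
import hashlib
import json

def block_hash(record: dict, prev_hash: str) -> str:
    """Hash the record together with the previous block's hash."""
    body = json.dumps(record, sort_keys=True) + prev_hash
    return hashlib.sha256(body.encode()).hexdigest()

def build_chain(records):
    chain, prev = [], "0" * 64  # genesis value
    for rec in records:
        h = block_hash(rec, prev)
        chain.append({"record": rec, "prev": prev, "hash": h})
        prev = h
    return chain

def verify_chain(chain) -> bool:
    prev = "0" * 64
    for block in chain:
        if block["prev"] != prev or block_hash(block["record"], prev) != block["hash"]:
            return False
        prev = block["hash"]
    return True

chain = build_chain([{"tweet": "hello"}, {"tweet": "world"}])
chain[0]["record"]["tweet"] = "something I never said"  # history gets "edited"
assert verify_chain(chain) is False  # any copy-holder detects the edit
```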
In general, private or permissioned blockchains can be seen as a new method for ensuring consistency in a distributed database, even if that database operates in an environment of perfect trust. There is an equivalence between how a blockchain prevents two transactions from spending the same prior transaction output, and how multiversion concurrency control (MVCC) in a relational database prevents two transactions from modifying/deleting the same database row. From the perspective of the MVCC storage layer, there is no such thing as modifying a row in place.
This means that a permissioned blockchain can provide the same kind of concurrency control as MVCC, but in a distributed database which can be written to from many different locations simultaneously (multi-master replication). A blockchain is certainly not an ideal solution for all scenarios like this, but if the row size is small, transactions affect few rows, and conflicts only happen if someone is misbehaving, a permissioned blockchain can maintain provable consistency through a single hash across many nodes of a distributed database, all of which can write to the data.
When it comes to maintaining a shared database between entities with imperfect trust, permissioned blockchains have some great additional features:
The database can contain application logic in the form of constraints on valid transactions. This kind of constraint goes beyond regular database stored procedures because it cannot be circumvented under any circumstances.
The database has per-row permissions which use public key cryptography. Furthermore, every transaction presents a publicly auditable proof that its creator(s) had the right to delete/modify its prior rows.
Of course, not by coincidence, these are very relevant features for inter-company financial ledger databases. Signed commitments with immutable history are all that’s required for proof of integrity. Moreover, assuming commitments are immutable (transactions can only be reversed by adding a new commitment that reverses the actions of the previous commitment), you only need to keep track of the most recent commitment.
If the commitment signer is a known entity, a single honest "auditor" is all that's required to keep the commitment signer honest. Anyone closely watching the signer will be able to easily prove the signer modified the history.
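Here is a small illustrative sketch of that auditing idea, using Ed25519 signatures from the Python `cryptography` package (the commitment contents are placeholders): the signer publishes a signed commitment to the latest history head, and any auditor holding the public key can verify it, or prove that a later "head" contradicts it.

```python
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# The "known entity" that signs each commitment to the current history head.
signer_key = Ed25519PrivateKey.generate()
signer_pub = signer_key.public_key()

def sign_commitment(head_hash: bytes) -> bytes:
    return signer_key.sign(head_hash)

def audit(head_hash: bytes, signature: bytes) -> bool:
    """An auditor only needs the public key and the latest signed head."""
    try:
        signer_pub.verify(signature, head_hash)
        return True
    except InvalidSignature:
        return False

sig = sign_commitment(b"hash-of-latest-commitment")
assert audit(b"hash-of-latest-commitment", sig)
assert not audit(b"rewritten-history-head", sig)  # a rewrite is provable
```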
Another use case is where the permissioned participants are a limited group of cooperating parties with no particular enduring trust between them. The NASDAQ example is this use case: a known set of participants who currently remove the trust requirement through manual records (usually spreadsheets) and expensive lawyers. A blockchain-style shared database, whilst slower than an SQL DB, solves the proof-of-integrity problem in this case both faster and less expensively than the current manual/legal processes.
Further reading:
Ending the bitcoin vs blockchain debate
Blockchains vs centralized databases
Attribution: Parts of that answer were authored by Greg Slepak, Eric Lombrozo, Gideon Greenspan, and Ron OHara at Bitcoin Stack Exchange under the terms of CC BY-SA 3.0.
So, I am doing research on MongoDB, in line with upper management's decision to embrace open source, migrate our existing product database from SQL Server to MongoDB, and revamp the entire thing. Do note that our database should focus on data consistency and transactional guarantees.
And I discovered this post: Click here. A summary of the post is as follows:
MongoDB claims to be strongly consistent, but a lot of evidence recently has shown this not to be the case in certain scenarios (when network partitioning occurs, which can happen under heavy load). This means that you can potentially lose records that MongoDB has acknowledged as "successfully written".

In terms of your application, if you have a need to have transactional guarantees (meaning if you can't make a durable write, you need the transaction to fail), you should avoid MongoDB. Example scenarios where strong consistency and durability are essential include "making a deposit into a bank account" or "creating a record of birth". To put it another way, these are scenarios where you would get punched in the face by your customer if you indicated an operation succeeded and it didn't.
So, my questions are as follows:
1) To what extent is "lost data" still an issue in the current version of MongoDB?
2) What approach can be taken to ensure transactional guarantees in MongoDB?
I am pretty sure that if a company like PayPal does use MongoDB, there is certainly a way of overcoming these issues.
The references in that post have been discussed here before (for example, here is one: MongoDB: Does Write Concern guarantee the write on primary plus atleast one secondary ). No need to duplicate your question.
The blog "Aphyr" mostly uses these articles to tout its own tech (if you read the entire blog, you will realise they have their own database which they market). Every database they show loses data except their own.
2) What approach can be taken to ensure transactional guarantees in MongoDB?
I agree that you should be handling database problems in client code; if not, how is your client side ever going to remain consistent in the event of partitions?
Since you are not Harry Potter (are you?) I will say that you need to check for exceptions thrown in your client code and react to them as needed.
1) To what extent is "lost data" still an issue in the current version of MongoDB?
As for the bug he mentions in 2.4.3: he fails (as I mentioned in the linked post) to give the bug reference, so again, I still can't comment.
Besides that, 2 lost writes out of 6,000? That's less data loss than I have seen in MySQL during a partition! So not too shabby.
I have not noticed such behaviour in my own apps and, from small to extremely large sites, I have not seen anyone reproduce the benchmark-type scenarios displayed in that article; I doubt very much that you will.
I am pretty sure that if a company like PayPal does use MongoDB, there is certainly a way of overcoming these issues.
They would have tight coding to ensure consistency in distributed environments.
Of course, they would start by choosing the right tech for the situation...
Write Concern Reference
Write concern describes the level of acknowledgement requested from MongoDB for write operations to a standalone mongod or to replica sets or to sharded clusters. In sharded clusters, mongos instances will pass the write concern on to the shards.
https://docs.mongodb.org/v3.0/reference/write-concern/
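As a concrete illustration of that reference, here is a sketch using PyMongo against a hypothetical replica set: it requests majority-acknowledged, journaled writes and handles the failure cases in client code, as suggested above. The connection string, database, and collection names are made up.

```python
from pymongo import MongoClient
from pymongo.write_concern import WriteConcern
from pymongo.errors import AutoReconnect, WriteConcernError

client = MongoClient("mongodb://db1,db2,db3/?replicaSet=rs0")  # hypothetical URI
deposits = client.bank.get_collection(
    "deposits",
    write_concern=WriteConcern(w="majority", j=True, wtimeout=5000),
)

try:
    deposits.insert_one({"account": "42", "amount_cents": 10_000})
except WriteConcernError:
    # Acknowledged by the primary but not by a majority in time:
    # treat the transaction as failed (or retry/reconcile).
    raise
except AutoReconnect:
    # The primary stepped down mid-operation; the write may or may not have
    # applied, so the client must check before retrying.
    raise
```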
I am working on a trading application that depends on an Oracle DB.
The DB has crashed twice, and the business owner wants a solution in which the application keeps working even if the DB crashes.
My team leader proposed Cassandra (NoSQL) as a solution because it has no single point of failure, but this option would move us from the traditional relational model to the NoSQL model, which I consider a drawback.
My question here: is there a way to avoid a single point of DB failure with a traditional relational DBMS like MySQL, PostgreSQL, etc.?
Sounds like you just need a cluster of Oracle database instances, such as Oracle RAC, rather than a single instance.
If your solution for the Oracle server being offline is to use Cassandra, what happens if the Cassandra cluster goes down? And are you really in the situation where it makes sense to rewrite and re-architect your entire application to use a different type of data store, just to avoid downtime from Oracle? I would suspect this only makes sense for applications with huge usage and load numbers, where any downtime is going to cost serious money (and not just cause embarrassment to the business folks to their bosses).
Is there a way to avoid a single point of DB failure with traditional relational DBMS
No, that's not possible with a single instance, simply because when that one node dies, it is gone.
Any fault-tolerant system will use several nodes that replicate each other. You can still use traditional RDBMS, but you will need to configure mirroring in order for the system to tolerate a node failure.
NoSQL isn't the only possible solution. You can set up replication with MySQL:
http://dev.mysql.com/doc/refman/5.0/en/replication-solutions.html
and
http://mysql-mmm.org/
and, concerning failover discussions:
http://serverfault.com/questions/274094/automated-failover-strategy-for-master-slave-mysql-replication-why-would-this
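For the failover side of that discussion, here is a minimal monitoring sketch (using PyMySQL; host names, credentials, and the lag threshold are placeholders) of the kind of health check a failover script would run against a replica before promoting it:

```python
import pymysql

# Hypothetical replica connection details.
replica = pymysql.connect(host="replica1.example.com", user="monitor",
                          password="secret",
                          cursorclass=pymysql.cursors.DictCursor)

with replica.cursor() as cur:
    cur.execute("SHOW SLAVE STATUS")  # pre-8.0 syntax, matching the linked docs
    status = cur.fetchone()

lag = status["Seconds_Behind_Master"]
io_ok = status["Slave_IO_Running"] == "Yes"
sql_ok = status["Slave_SQL_Running"] == "Yes"

# Only consider promotion if the replication threads are healthy and lag is
# small; otherwise you risk failing over to a replica missing transactions.
eligible_for_promotion = io_ok and sql_ok and lag is not None and lag < 10
print("eligible:", eligible_for_promotion)
```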
We're designing an OLTP financial system. It should be able to support 10,000 transactions per second and have reporting features.
So we have come to the idea of using:
a NoSQL DB as our main storage
a MySQL DB (Percona Server, actually) running ETLs from the NoSQL DB to store reporting data
We're considering MongoDB and Riak for the NoSQL job. We have read that Riak scales more smoothly than MongoDB, and we would like to hear your opinion.
Which NoSQL DB would you use for an OLTP financial system?
How has your experience been scaling MongoDB/Riak?
There is no conceivable circumstance where I would use a NoSQL database for anything to do with finance. You simply don't have the data integrity or internal controls needed. Dow Jones uses SQL Server for its transactions, and if they can properly design a high-performance, high-transaction relational database, so can you. You will have to invest in some people who know what they are doing, though.
One has to think about the problem differently. The notion of transaction consistency stems from the UD (Update, Delete) in CRUD (Create, Read, Update, Delete). NoSQL DBs are CRAP (Create, Replicate, Append, Process) oriented, working by accretion of time-stamped data. With the right domain model, there is no reason that auditability and the equivalent of referential integrity can't be achieved.
The global-storage-based NoSQL databases - Caché from InterSystems and GT.M from FIS - are used extensively in financial services and have been for many years. Caché in particular is used both as the core database and for OLTP.
I can answer regarding my experience with scaling Riak.
Riak scales smoothly to the extreme. Scaling is as easy as adding nodes to the cluster, which is a very simple operation in itself. You can achieve near linear scalability by simply adding nodes. Our experience with Riak as far as scaling is concerned has been amazing.
The flip side is that it is lacking in many respects. Some examples:
You can't do something like count(*) or list keys on a production cluster. That would require a workaround if you want to do ETL from Riak into MySQL - or how would you know what to (E)xtract?
(One possible workaround would be to maintain a bucket with a known key sequence that maps to values containing the keys you inserted into your other buckets; see the sketch after this list.)
The free version of Riak comes with no management console that lets you know what's going on, and the one that's included in the Enterprise version isn't much of an improvement.
You'll need the Enterprise version if you're looking to replicate your data over a WAN (e.g. for DR / high availability). That's alright if you don't mind paying, but keep in mind that Basho's pricing is very high.
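Here is a rough sketch of the key-sequence workaround mentioned in the list above, using the official Riak Python client (bucket names and the sequence scheme are made up for illustration; check the client's API for your version):

```python
import riak

client = riak.RiakClient()  # defaults to localhost; adjust as needed
data_bucket = client.bucket("transactions")
index_bucket = client.bucket("transactions_index")

def store_with_index(seq_no: int, key: str, value: dict) -> None:
    """Write the value, then record its key under a predictable sequence key."""
    data_bucket.new(key, data=value).store()
    index_bucket.new(f"idx-{seq_no}", data={"key": key}).store()

def extract(first_seq: int, last_seq: int):
    """ETL can walk the known sequence instead of listing keys on the cluster."""
    for seq_no in range(first_seq, last_seq + 1):
        entry = index_bucket.get(f"idx-{seq_no}")
        if entry.exists:
            yield data_bucket.get(entry.data["key"]).data
```

The ETL job then walks the sequence keys it knows about rather than issuing a list-keys operation against the production cluster.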
I work with Starcounter (so I'm biased), but I think I can safely say that for a system processing financial transactions you have to worry about transaction consistency. Unfortunately, this is what the engines used for Facebook and Twitter had to give up to allow their scale-out strategy to offer performance. This is not because engines such as MongoDB or Cassandra are poorly designed; rather, it follows naturally from the CAP theorem (http://en.wikipedia.org/wiki/CAP_theorem). Simply put, changes you make in your database will overwrite other changes if they occur close in time. That is OK for status updates and new tweets, but disastrous if you deal with money or other quantities. The amounts will simply end up wrong when many reads and writes are being done in parallel. So for the throughput you need, a memory-centric NoSQL database with ACID support is probably the way to go.
You can use some NoSQL databases (Cassandra, EventStore) as storage for a financial service if you implement your app using event sourcing and concepts from DDD. I recommend reading this minibook: http://www.oreilly.com/programming/free/reactive-microservices-architecture.html
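The core idea of event sourcing that makes an append-only store such as Cassandra or EventStore workable for money is that you never update a balance in place; you append immutable events and derive state by replaying them. A minimal, store-agnostic sketch (the event shape is an assumption for illustration):

```python
from dataclasses import dataclass
import uuid

@dataclass(frozen=True)
class Event:
    event_id: str      # unique id makes re-delivery idempotent
    account: str
    amount_cents: int  # positive = deposit, negative = withdrawal

def append(log: list, account: str, amount_cents: int) -> Event:
    """Append-only write; in Cassandra this would be an INSERT keyed by account."""
    evt = Event(str(uuid.uuid4()), account, amount_cents)
    log.append(evt)
    return evt

def balance(log: list, account: str) -> int:
    """Current state is derived by replaying (folding over) the event history."""
    seen = set()
    total = 0
    for evt in log:
        if evt.account == account and evt.event_id not in seen:
            seen.add(evt.event_id)
            total += evt.amount_cents
    return total

log: list = []
append(log, "acct-1", 10_000)
append(log, "acct-1", -2_500)
assert balance(log, "acct-1") == 7_500
```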
OLTP can be achieved using NoSQL with a custom implementation. There are two things to address:
1. How are you going to achieve the ACID properties that an RDBMS gives?
2. Provide a custom blocking or non-blocking concurrency and transaction-handling mechanism.
To take you closer to a solution, look at Apache Phoenix, Apache Trafodion, or Splice Machine.
Trafodion has full ACID support over HBase; you should take a look.
Cassandra can be used for both OLTP and OLAP. Good replication and eventual data consistency put the choice in your hands, but you need to design the system properly. And after all, it's free of cost (though not free of developer effort), so give it a try.