How to process distributed transactions within PostgreSQL?

Can anyone kindly tell me how to process distributed transactions within PostgreSQL, which is also called "XA"? Are there any resources about it? Many thanks for any answer.

It looks like you are a bit confused. Generally, database systems support two notions of distributed transactions:
Native distributed transactions and
XA transactions.
Native distributed transactions generally run between different servers of the same RDBMS. Postgres supports this via the dblink_exec function. The connection to the other server is created through a so-called database link. Postgres is a bit clumsier to use here than some commercial-grade RDBMSs: you first need to install an extension to be able to use database links. In this model, however, the PostgreSQL server itself manages the transaction.
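For illustration, a minimal JDBC sketch of the dblink route could look like this (the connection strings, credentials and the audit_log table are made up; the dblink extension must be available on the local server):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class DblinkExample {
        public static void main(String[] args) throws Exception {
            // Connect to the local PostgreSQL server (hypothetical credentials).
            try (Connection local = DriverManager.getConnection(
                    "jdbc:postgresql://localhost:5432/localdb", "app", "secret");
                 Statement st = local.createStatement()) {

                // One-time setup: make the dblink functions available.
                st.execute("CREATE EXTENSION IF NOT EXISTS dblink");

                // Run a statement on a remote PostgreSQL server through dblink_exec.
                // The local server opens the remote connection and executes the SQL.
                st.execute(
                    "SELECT dblink_exec(" +
                    "'host=remotehost dbname=remotedb user=app password=secret', " +
                    "'INSERT INTO audit_log(msg) VALUES (''hello from dblink'')')");
            }
        }
    }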
XA transactions, on the other hand, are managed by an external transaction manager (TM), and each of the participating databases takes the role of an XA resource, which enlists with the transaction manager. The RDBMS can no longer decide itself when to commit a transaction; that is the task of the XA transaction manager. It uses a two-phase commit (2PC) protocol to make sure the changes are applied or rolled back in a consistent manner across the databases.
On some operating systems, such as Windows, a transaction manager is part of the OS; on others it is not. In the Java world, a transaction manager (JTA) is typically provided by the application server or a standalone library, and the corresponding data source needs to be configured to use XA.
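To make the 2PC part concrete, here is a rough sketch of the primitives a transaction manager drives against PostgreSQL (PREPARE TRANSACTION / COMMIT PREPARED). In practice you would not issue these yourself; an XA-capable data source such as PGXADataSource plus the TM handles it. The hosts, credentials and the accounts table are made up, and max_prepared_transactions must be set above zero on both servers:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class TwoPhaseCommitSketch {
        public static void main(String[] args) throws Exception {
            try (Connection db1 = DriverManager.getConnection(
                     "jdbc:postgresql://host1:5432/db1", "app", "secret");
                 Connection db2 = DriverManager.getConnection(
                     "jdbc:postgresql://host2:5432/db2", "app", "secret");
                 Statement s1 = db1.createStatement();
                 Statement s2 = db2.createStatement()) {

                // Do the work on both databases inside explicit transaction blocks
                // (statements are issued verbatim here for clarity).
                s1.execute("BEGIN");
                s2.execute("BEGIN");
                s1.execute("UPDATE accounts SET balance = balance - 100 WHERE id = 1");
                s2.execute("UPDATE accounts SET balance = balance + 100 WHERE id = 7");

                // Phase 1: each database durably records the transaction and
                // promises it can commit later.
                s1.execute("PREPARE TRANSACTION 'tx-42'");
                s2.execute("PREPARE TRANSACTION 'tx-42'");

                // Phase 2: commit everywhere only after every participant prepared.
                // On any failure the manager would issue ROLLBACK PREPARED 'tx-42'.
                s1.execute("COMMIT PREPARED 'tx-42'");
                s2.execute("COMMIT PREPARED 'tx-42'");
            }
        }
    }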

Related

Apache Ignite - Can writeThrough work with TRANSACTIONAL?

I am new to Ignite. I happened to find Ignite as an in-memory DB, and it might be a good improvement to our current systems.
Here is my situation:
1. We have an existing, huge OLTP system for online e-commerce.
2. Right now the app uses Spring Boot, and the database is Postgres (AWS).
3. The app contains thousands of SQL statements: select .. from A inner join B, inner join C …. (usually 5~10 table joins).
4. The app uses select … for update to lock entries and perform updates. Retry/timeout is configured in the app to handle concurrency.
5. The system handles online traffic (100 requests/second) as well as some backend job updates, so contention on a single record can happen every second.
Here is my purpose:
1. We wish to change as little app code as possible to integrate Ignite.
2. We plan to set up the architecture like this: App -> Ignite (in-memory DB) -> Postgres (backup DB). (The reason is that we are new to Ignite and want to avoid operational risk, so we still prefer to keep Postgres as a backup.)
Some questions
Q1. Is writeThrough not supported to work with TRANSACTIONAL?
Q2. Since we require transactions/locking (select … for update), I use CacheAtomicityMode.TRANSACTIONAL, but it seems it cannot auto-sync to Postgres (Q1). Is there a way to have TRANSACTIONAL and auto-sync to PG at the same time? Otherwise it is very troublesome; we would need to sync ourselves.
Q3. If we implement a dynamic dataSource in the app, we can switch to PG if Ignite is down, but that requires the data in PG to be the same as in Ignite. May I ask for advice on how to keep the data consistent between Postgres and Ignite?
writeThrough is supported with TRANSACTIONAL.
However, Apache Ignite does not have transactional SQL in GA currently, so you will need to use cache API transactions (get/put, etc.).
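As a rough illustration of that, a TRANSACTIONAL cache with write-through via the key-value API could be set up roughly like this. The PostgresBackedStore below is only a stub; in a real setup it would issue JDBC calls against Postgres, or you could configure Ignite's JDBC POJO store instead:

    import javax.cache.Cache;
    import javax.cache.configuration.FactoryBuilder;

    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteCache;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.cache.CacheAtomicityMode;
    import org.apache.ignite.cache.store.CacheStoreAdapter;
    import org.apache.ignite.configuration.CacheConfiguration;
    import org.apache.ignite.transactions.Transaction;
    import org.apache.ignite.transactions.TransactionConcurrency;
    import org.apache.ignite.transactions.TransactionIsolation;

    public class TransactionalWriteThroughSketch {

        // Hypothetical write-through store: in a real setup this would talk to
        // Postgres over JDBC.
        public static class PostgresBackedStore extends CacheStoreAdapter<Long, String> {
            @Override public String load(Long key) {
                // SELECT value FROM some_table WHERE id = key  (sketched only)
                return null;
            }
            @Override public void write(Cache.Entry<? extends Long, ? extends String> e) {
                // INSERT ... ON CONFLICT ... UPDATE into Postgres  (sketched only)
            }
            @Override public void delete(Object key) {
                // DELETE FROM some_table WHERE id = key  (sketched only)
            }
        }

        public static void main(String[] args) {
            try (Ignite ignite = Ignition.start()) {
                CacheConfiguration<Long, String> cfg = new CacheConfiguration<>("personCache");
                cfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
                cfg.setReadThrough(true);
                cfg.setWriteThrough(true);   // changes are pushed to the store on commit
                cfg.setCacheStoreFactory(FactoryBuilder.factoryOf(PostgresBackedStore.class));

                IgniteCache<Long, String> cache = ignite.getOrCreateCache(cfg);

                // Pessimistic + REPEATABLE_READ behaves like "select ... for update":
                // the key is locked on the first read inside the transaction.
                try (Transaction tx = ignite.transactions().txStart(
                        TransactionConcurrency.PESSIMISTIC,
                        TransactionIsolation.REPEATABLE_READ)) {
                    String current = cache.get(1L);      // acquires the lock
                    cache.put(1L, current + "-updated"); // buffered until commit
                    tx.commit();                         // write-through to Postgres here
                }
            }
        }
    }

A pessimistic REPEATABLE_READ transaction is the closest analogue to select … for update; on commit, the buffered changes are written through the store to Postgres.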

Any internal contention when using PostgreSQL pg_try_advisory_lock

I am using the PostgreSQL pg_try_advisory_lock function to coordinate a cluster of micro-services sharing the same database, to ensure that only one instance performs an ad hoc task.
Does this function contend with any internal locks PostgreSQL may be using? I am concerned that using a database-wide locking mechanism such as this may impact the system resources used within the database engine.
PostgreSQL does not use advisory locks internally, so there is no danger of blocking anything in the system by using them.
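For reference, a minimal JDBC sketch of that pattern could look like this (the connection details, the lock key and the task are made up):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class AdvisoryLockSketch {
        // Arbitrary application-chosen key identifying the ad hoc task.
        private static final long TASK_LOCK_KEY = 42L;

        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost:5432/appdb", "app", "secret")) {

                // Non-blocking attempt: returns true only for the one instance
                // that gets the lock; every other instance skips the task.
                boolean acquired;
                try (PreparedStatement ps =
                         conn.prepareStatement("SELECT pg_try_advisory_lock(?)")) {
                    ps.setLong(1, TASK_LOCK_KEY);
                    try (ResultSet rs = ps.executeQuery()) {
                        rs.next();
                        acquired = rs.getBoolean(1);
                    }
                }

                if (acquired) {
                    try {
                        runAdHocTask();   // hypothetical task
                    } finally {
                        // Session-level lock: release it explicitly (it would also
                        // be released when the connection is closed).
                        try (PreparedStatement ps =
                                 conn.prepareStatement("SELECT pg_advisory_unlock(?)")) {
                            ps.setLong(1, TASK_LOCK_KEY);
                            ps.execute();
                        }
                    }
                }
            }
        }

        private static void runAdHocTask() {
            // Placeholder for the job only one instance should perform.
        }
    }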

Data mining with postgres in production environment - is there a better way?

There is a web application which has been running for years, and during its lifetime the application has gathered a lot of user data. The data is stored in a relational DB (Postgres). Not all of this data is needed to run the application (to do the business). However, from time to time business people ask me to provide reports on this data, and this causes some problems:
sometimes these SQL queries are long-running
queries are executed against the production DB (not cool)
it is not so easy to deliver reports on a weekly or monthly basis
some parts of the data are stored in a way which is not suitable for such querying (queries are inefficient)
My idea (note that I am a developer, not a data mining specialist) for improving this whole process of delivering reports is:
create a separate DB which is regularly updated with production data
optimize how the data is stored
create a dashboard to present the reports
Question: But is there a better way? Is there another DB which is a better fit for such data analysis? Or should I look into modern data mining tools?
Thanks!
Do you really do data mining (as in: classification, clustering, anomaly detection), or is "data mining" for you any reporting on the data? In the latter case, all the "modern data mining tools" will disappoint you, because they serve a different purpose.
Have you used the indexing functionality of Postgres well? Your scenario sounds as if selection and aggregation are most of the work, and SQL databases are excellent for this - if well designed.
For example, materialized views and triggers can be used to process data into a schema more usable for your reporting.
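To sketch that idea (the table, column and view names are invented), a pre-aggregated materialized view plus a scheduled refresh might look like this from Java:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class ReportingViewSketch {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost:5432/appdb", "app", "secret");
                 Statement st = conn.createStatement()) {

                // One-time setup: pre-aggregate the data the reports need.
                st.execute(
                    "CREATE MATERIALIZED VIEW IF NOT EXISTS weekly_signups AS " +
                    "SELECT date_trunc('week', created_at) AS week, count(*) AS signups " +
                    "FROM users GROUP BY 1");

                // Helpful index on the pre-aggregated data.
                st.execute(
                    "CREATE INDEX IF NOT EXISTS weekly_signups_week_idx " +
                    "ON weekly_signups (week)");

                // Run this from a scheduled job (e.g. weekly) to bring the view up to date.
                st.execute("REFRESH MATERIALIZED VIEW weekly_signups");
            }
        }
    }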
There are a thousand ways to approach this issue, but I think that the path of least resistance for you would be Postgres replication. Check out this Postgres replication tutorial for a quick proof of concept. (There are many hits when you Google for Postgres replication, and that link is just one of them.) Here is a link documenting streaming replication from the PostgreSQL site's wiki.
I am suggesting this because it meets all of your criteria and also stays within the bounds of the technology you're familiar with. The only learning curve would be the replication part.
Replication solves your issue because it creates a second database which effectively becomes your "read-only" DB, updated via the replication process. You would keep the schema the same, but your indexing could be altered and reports/dashboards customized. This is the database you would query. Your main database would be your transactional database which serves the users, and the replicated database would serve the stakeholders.
This is a wide topic, so please do your due diligence and research it. But it's also something that can work for you and can be turned around quickly.
If you really want to try data mining with PostgreSQL, there are some tools which can be used.
The very simple way is KNIME. It is easy to install and has full-featured data mining tools. You can access your data directly from the database, process it, and save it back to the database.
The hardcore way is MADlib. It installs data mining functions written in Python and C directly in Postgres, so you can mine with SQL queries.
Both projects are stable enough to try.
For reporting, we use a non-transactional (read-only) database. We don't care about normalization. If I were you, I would use another database for reporting. I would design the tables following OLAP principles (star schema, snowflake), and use an ETL tool to dump the data periodically (maybe weekly) to the read-only database, then start creating reports.
Reports are used for decision support, so they don't have to be real-time and usually don't have to be current. In other words, it is acceptable to create reports that only cover up to last week or last month.
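To sketch the ETL idea (the database names, tables and the aggregation are invented), a minimal weekly dump from the production DB into a reporting fact table could look like this:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class WeeklyEtlSketch {
        public static void main(String[] args) throws Exception {
            try (Connection prod = DriverManager.getConnection(
                     "jdbc:postgresql://prod-host:5432/appdb", "report_ro", "secret");
                 Connection report = DriverManager.getConnection(
                     "jdbc:postgresql://report-host:5432/reportdb", "etl", "secret")) {

                report.setAutoCommit(false);

                // Extract + aggregate on the production side...
                try (PreparedStatement extract = prod.prepareStatement(
                         "SELECT date_trunc('week', created_at) AS week, count(*) AS orders " +
                         "FROM orders GROUP BY 1");
                     // ...and load into a simple fact table on the reporting side
                     // (assumes a unique constraint on week).
                     PreparedStatement load = report.prepareStatement(
                         "INSERT INTO fact_weekly_orders (week, orders) VALUES (?, ?) " +
                         "ON CONFLICT (week) DO UPDATE SET orders = EXCLUDED.orders");
                     ResultSet rs = extract.executeQuery()) {

                    while (rs.next()) {
                        load.setTimestamp(1, rs.getTimestamp("week"));
                        load.setLong(2, rs.getLong("orders"));
                        load.addBatch();
                    }
                    load.executeBatch();
                }
                report.commit();
            }
        }
    }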

How to automatically dispatch read-only transactions to slave

I would like all queries from my Spring-Hibernate application executed in a read-only transaction to be dispatched to a PostgreSQL slave and all read-write transaction queries to a master.
While using annotation-driven transactions in Spring, if the transaction is defined as read-only, the PostgreSQL driver allows only select queries to be executed, which is obvious; however, there is no mention of how the driver would behave in a master-slave configuration. For example, the MySQL driver has a replication connection class which automatically dispatches read-only transaction queries to the slave.
One solution would be to use multiple Hibernate session factories and use the one pointing to the slave for selects and the other for updates, but that would be too much manual handling. How should I be designing this?
This is a surprisingly complex question, and the answer is not simple. Keep in mind that the dispatching has to happen in a layer that knows whether a transaction is likely to be read-only or not.
The cleanest solution is probably to implement the dispatching in your middleware. This has the advantage of being a functional dispatch: we know what we are trying to do, so we dispatch accordingly. Of course, functions can create a bit of a knowledge gap regarding what only reads and what writes.
The second option is to dispatch with something like PgPool or a similar proxy. In that case, I would expect you would want to avoid server-side prepared statements, because the more knowledge you give the intermediate layer about the queries, the fewer problems you will have.
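Going back to the middleware option: in Spring terms, the usual pattern is an AbstractRoutingDataSource keyed off the transaction's read-only flag. This is only a sketch; the master and slave DataSources are assumed to be configured elsewhere:

    import java.util.HashMap;
    import java.util.Map;

    import javax.sql.DataSource;

    import org.springframework.jdbc.datasource.LazyConnectionDataSourceProxy;
    import org.springframework.jdbc.datasource.lookup.AbstractRoutingDataSource;
    import org.springframework.transaction.support.TransactionSynchronizationManager;

    /** Routes to the slave when the current Spring transaction is read-only. */
    public class ReadOnlyRoutingDataSource extends AbstractRoutingDataSource {

        @Override
        protected Object determineCurrentLookupKey() {
            return TransactionSynchronizationManager.isCurrentTransactionReadOnly()
                    ? "slave" : "master";
        }

        /**
         * Wires the routing data source; masterDs/slaveDs are assumed to be
         * plain PostgreSQL DataSources configured elsewhere. The lazy proxy is
         * important: it defers fetching the physical connection until the
         * read-only flag of the @Transactional method is already known.
         */
        public static DataSource build(DataSource masterDs, DataSource slaveDs) {
            ReadOnlyRoutingDataSource routing = new ReadOnlyRoutingDataSource();
            Map<Object, Object> targets = new HashMap<>();
            targets.put("master", masterDs);
            targets.put("slave", slaveDs);
            routing.setTargetDataSources(targets);
            routing.setDefaultTargetDataSource(masterDs);
            routing.afterPropertiesSet();
            return new LazyConnectionDataSourceProxy(routing);
        }
    }

With something like this in place, methods annotated @Transactional(readOnly = true) are served by the slave and everything else by the master, while Hibernate keeps a single session factory.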

Sybase SQLAnywhere jConnect routines?

I have a database which is part of a closed system, and the end user of the system would like me to write some reports using the data contained in a Sybase SQL Anywhere database. The system doesn't provide the reports that they are looking for, but access to the data is available by connecting to this ASA database.
The vendor of the software would likely prefer that I not update the database, and I am basically read-only, as I am just doing some reporting. All is good, the seal is not broken, the warranty is still intact, etc., etc.
My main problem is that I am using jConnect in order to read from the database, and jConnect requires some "jConnect routines" to be installed into the database. I've found that I can make this happen by just doing an "Alter Database Upgrade JConnect On", but I don't fully understand what this does and whether there are any risks associated with it.
So, my question is: does anyone know exactly what the jConnect routines are and how they are used? Is there any risk in adding these to a database? Should I be worried about this?
If the vendor wants you to write reports using jConnect, they will have to allow the installation of the jConnect tables.
These are quite safe; where I work, the DBA team installs them as a matter of course, and we run huge databases in production with no impact.
There is an alternative driver that you could use called jTDS. It's open source and supports MS SQL Server and Sybase. I'm not sure whether it requires the jConnect tables or not.
I think that the additional tables are a bit of an anachronism in this day and age.
Looking at the ASA 10 docs, there is another driver: the iAnywhere JDBC driver, which seems to go through the ODBC driver and, as such, probably will not require an alteration of the database.
On the other hand, installing the "jConnect system objects" is done by running the script scripts/jcatalog.sql... You can show it to the DBAs if you want to reassure them. It creates some procedures, tables, and variables.
The need for this script probably comes from the fact that jConnect talks to both ASE (Sybase) and iAnywhere databases, so it needs a compatibility layer installed in the database...
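For completeness, a minimal read-only reporting connection over jConnect might look like the sketch below. The driver class follows jConnect 7 conventions, and the host, port, ServiceName, credentials and the orders table are invented:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class AsaReportSketch {
        public static void main(String[] args) throws Exception {
            // jConnect driver class (jConnect 7.x); older installs use
            // com.sybase.jdbc3.jdbc.SybDriver instead.
            Class.forName("com.sybase.jdbc4.jdbc.SybDriver");

            // Hypothetical host, port and database name; ServiceName picks the
            // SQL Anywhere database on that server (2638 is the usual ASA port).
            String url = "jdbc:sybase:Tds:asa-host:2638?ServiceName=closeddb";

            try (Connection conn = DriverManager.getConnection(url, "report_user", "secret")) {
                // Reporting only: make the intent explicit.
                conn.setReadOnly(true);

                try (Statement st = conn.createStatement();
                     ResultSet rs = st.executeQuery(
                         "SELECT order_id, order_date, total FROM orders " + // hypothetical table
                         "WHERE order_date >= CURRENT DATE - 7")) {
                    while (rs.next()) {
                        System.out.printf("%s %s %s%n",
                            rs.getString(1), rs.getDate(2), rs.getBigDecimal(3));
                    }
                }
            }
        }
    }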