I have a use case to distribute data across many databases on many servers, all in postgres tables.
From any given server/db, I may need to query another server/db.
The queries are quite basic, standard selects with where clauses on standard fields.
I have currently implemented postgres_fdw (I'm using Postgres 9.5), but I think the queries are not using indexes on the remote db.
For this use case (a random node may query N other nodes), which option is likely my best choice for performance, based on how each underlying engine actually executes?
The Postgres foreign data wrapper (postgres_fdw) is newer to PostgreSQL, so it tends to be the recommended method. While the functionality in the dblink extension is similar to that of the foreign data wrapper, postgres_fdw is more SQL-standard compliant and can provide improved performance over dblink connections.
Read this article for more detailed info: Cross Database querying
My solution was simple: I upgraded to Postgres 10, and it appears to push where clauses down to the remote server.
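If you want to check the pushdown yourself, EXPLAIN (VERBOSE) prints the query that postgres_fdw ships to the remote server. A minimal sketch, assuming a hypothetical foreign table named remote_events:

    -- Check whether the WHERE clause is shipped to the remote server.
    -- "remote_events" is a placeholder for your foreign table.
    EXPLAIN (VERBOSE)
    SELECT * FROM remote_events WHERE customer_id = 42;

    -- On Postgres 10+ you should see the filter inside the remote query:
    --   Foreign Scan on public.remote_events
    --     Remote SQL: SELECT ... FROM public.remote_events
    --                 WHERE ((customer_id = 42))
    -- If the filter instead shows up as a local "Filter:" line, the
    -- remote server is scanning without the predicate (and so without
    -- its indexes).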
For a project I need two types of tables.
a hypertable (a special type of table provided by the TimescaleDB extension for PostgreSQL) for some time-series records
ordinary tables, which are not time series
Can I create a PostgreSQL TimescaleDB database and store my ordinary tables in it? Are all tables in TimescaleDB hypertables (time series)? If not, is there any overhead if I store my ordinary tables in TimescaleDB?
If I can, is there any benefit to storing my ordinary tables in a separate, ordinary PostgreSQL database?
Can I create a PostgreSQL TimescaleDB database and store my ordinary tables in it?
Absolutely... TimescaleDB is delivered as an extension to PostgreSQL and one of the biggest benefits is that you can use regular PostgreSQL tables alongside the specialist time-series tables. That includes using regular tables in SQL queries with hypertables. Standard SQL works, plus there are some additional functions that Timescale created using PostgreSQL's extensibility features.
Are all tables in TimescaleDB hypertables (time series)?
No, you have to explicitly create a table as a hypertable for it to implement TimescaleDB features. It would be worth checking out the how-to guides in the Timescale docs for full (and up to date) details.
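For illustration, a minimal sketch of what that looks like; the table and column names here are hypothetical:

    -- Ordinary PostgreSQL table, stored and queried as usual.
    CREATE TABLE devices (
        id   bigint PRIMARY KEY,
        name text NOT NULL
    );

    -- Time-series table: created as a normal table first, then
    -- explicitly converted into a hypertable.
    CREATE TABLE conditions (
        time        timestamptz NOT NULL,
        device_id   bigint REFERENCES devices (id),
        temperature double precision
    );
    SELECT create_hypertable('conditions', 'time');

    -- Regular tables and hypertables mix freely in standard SQL.
    SELECT d.name, avg(c.temperature)
    FROM   conditions c
    JOIN   devices d ON d.id = c.device_id
    GROUP  BY d.name;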
If not, is there any overhead if I store my ordinary tables in TimescaleDB?
I don't think there's a storage overhead. You might even see some performance gains, e.g. for data ingest and querying. This article may help clarify that: https://docs.timescale.com/timescaledb/latest/overview/how-does-it-compare/timescaledb-vs-postgres/
Overall, you can think of TimescaleDB as adding functionality on top of 'vanilla' PostgreSQL, so unless there's an application-design reason to keep non-time-series data in a separate database, you aren't obliged to do that.
One other point, shared by a very experienced member of our Slack community [thank you Chris]:
To have time-series data and “normal” data (normalized) in one or separate databases for us came down to something like “can we asynchronously replicate the time-series information”?
In our case we use two different pg systems, one replicating asynchronously (for TimescaleDB) and one with synchronous replication (for all other data).
Transparency: I work for Timescale
Aurora Postgres 11.9
In SQL Server we strictly follow the programming practice that every call the application makes to the database goes through a stored procedure rather than ad-hoc queries. In Oracle we haven't followed the same practice, maybe because SELECT stored procedures there require additional cursors, and so on.
Can any Postgres expert advise me what practice we should follow in Postgres in this regard, and what the pros and cons are?
In addition, in SQL Server we use "rowversion" for data sync with BI and other external modules. Is there any built-in alternative in Postgres, or do we have to do it with manual triggers?
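For what it's worth, a common trigger-based approach looks something like the sketch below; the table, sequence, and function names are all hypothetical. (Postgres also exposes a system column, xmin, but its value is a transaction id that wraps around, so it isn't a direct rowversion replacement.)

    -- A minimal trigger-based sketch of a SQL Server "rowversion"
    -- analogue; all names here are placeholders.
    CREATE SEQUENCE row_version_seq;

    CREATE TABLE customer (
        id          bigint PRIMARY KEY,
        name        text NOT NULL,
        row_version bigint NOT NULL DEFAULT nextval('row_version_seq')
    );

    CREATE FUNCTION bump_row_version() RETURNS trigger AS $$
    BEGIN
        NEW.row_version := nextval('row_version_seq');
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER customer_row_version
        BEFORE UPDATE ON customer
        FOR EACH ROW EXECUTE FUNCTION bump_row_version();

    -- A sync consumer can then poll for rows changed since its last run:
    -- SELECT * FROM customer WHERE row_version > :last_seen_version;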
I have an existing relational PostgreSQL database. A few of the tables contain very fat blobs; they would be much better off as NoSQL documents. This would significantly lighten our relational database.
So, we thought of moving those blob tables out into a NoSQL solution like CosmosDB or MongoDB. However, there are foreign-key dependencies on purely relational tables, and this complicates moving those tables out into their own database.
I have found that PostgreSQL natively supports storing documents and can be distributed. The solutions I have looked at so far are Citus and Postgres-XL. For those who have used them, how do they compare?
Has anyone encountered similar situations before? Did you separate out into a NoSQL database? Or has anyone partitioned their PostgreSQL database into relational and NoSQL parts? How did that go? What would you recommend looking out for, in hindsight?
(Citus Engineer Here)
Postgres has the JSONB column type, which is powerful and flexible. You can keep your structured tables as they are and add a jsonb column for the blob data. Test this with single-node Postgres, and if that works for you, great!
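A minimal sketch of that hybrid layout, with hypothetical table and column names:

    -- Relational columns stay relational; the document part goes in jsonb.
    CREATE TABLE orders (
        id          bigserial PRIMARY KEY,
        customer_id bigint NOT NULL,   -- ordinary relational column
        payload     jsonb              -- the former "fat blob" as a document
    );

    -- A GIN index keeps containment queries on the document fast.
    CREATE INDEX orders_payload_idx ON orders USING GIN (payload);

    -- Containment query against the document column:
    SELECT id, customer_id
    FROM   orders
    WHERE  payload @> '{"status": "shipped"}';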
If you have a problem with the scale of your data, i.e. the memory, storage, or CPU of a single machine is not enough for your workload and you cannot go bigger, then you can try scaling out with Citus or Postgres-XL.
I have no experience with Postgres-XL but Citus is pretty easy to try. There are docker images that you can use or you can create an account on Citus Cloud to try a 1-week free dev plan (it would not be suitable for benchmarking purposes).
Every RDBMS->NoSQL migration requires one of two approaches (sketched below):
1. embedding some of these dependent documents into the ones that are actually queried by the user
2. referencing dependent documents by id and inferring these relationships on read.
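A minimal sketch of the two options using jsonb documents (all table and field names are made up):

    CREATE TABLE customers (doc jsonb);
    CREATE TABLE invoices  (doc jsonb);

    -- Option 1: embed the dependent document in the parent document.
    INSERT INTO invoices (doc) VALUES
      ('{"id": 1, "total": 100, "customer": {"id": 42, "name": "Acme"}}');

    -- Option 2: reference the dependent document by id and resolve on read.
    INSERT INTO customers (doc) VALUES ('{"id": 42, "name": "Acme"}');
    INSERT INTO invoices  (doc) VALUES ('{"id": 2, "total": 100, "customer_id": 42}');

    SELECT i.doc AS invoice, c.doc AS customer
    FROM   invoices i
    JOIN   customers c
      ON   (c.doc ->> 'id')::bigint = (i.doc ->> 'customer_id')::bigint;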
Very typical, everyone does it every day, don't be afraid. BTW, you don't have to choose between Cosmos DB and MongoDB - just use Cosmos DB with the MongoDB API.
Can anyone tell me the pros & cons of using dblink in Postgres?
I use dblink to connect to multiple databases in my functions in Postgres.
dblink is a great tool and it works very well.
The main cons are:
If you run a query between two servers that are not on the same network, you will have a lot of latency and performance will be badly degraded
If you use dblink in a JOIN, a lot of rows may have to be transferred from the remote server in order to process that JOIN, which uses bandwidth and degrades performance (see the sketch below)
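To make the JOIN point concrete, here is a minimal sketch of dblink in a join; connection details and table names are placeholders:

    -- dblink runs the remote query as-is and returns a row set; the
    -- JOIN itself executes locally, so every row the remote query
    -- produces must travel over the network first.
    CREATE EXTENSION IF NOT EXISTS dblink;

    SELECT l.id, l.name, r.total
    FROM   local_accounts l
    JOIN   dblink('host=remotehost dbname=sales user=app password=secret',
                  'SELECT account_id, total FROM orders')
           AS r(account_id bigint, total numeric)
      ON   r.account_id = l.id;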
If you can use a single database for each query instead of multiple databases via dblink, that will always be the better option.
Read also this interesting thread: http://www.postgresql-archive.org/dblink-performance-td5056145.html
I have two databases, cvtl and cvtl_db, and I need to write a single query to retrieve data from table A in cvtl and table B in cvtl_db.
Postgres is throwing the error: cross-database references are not implemented
Basically you have two ways:
Older tools.
If you need to support older versions of PostgreSQL, use dblink or DBI-link. These two provide robust support for cross-db queries across a number of PostgreSQL versions. pl/proxy is another possibility.
Newer tools.
The newer approach is to use foreign data wrappers. These have more functionality (such as better transaction handling) and probably have more eyes on them in terms of support than dblink etc. do today.
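As a minimal sketch of the foreign data wrapper route for your exact case, run something like this in the cvtl database (the server name, credentials, and join key are placeholders):

    -- Query table B in cvtl_db from within cvtl via postgres_fdw.
    CREATE EXTENSION IF NOT EXISTS postgres_fdw;

    CREATE SERVER cvtl_db_srv
        FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'localhost', dbname 'cvtl_db');

    CREATE USER MAPPING FOR CURRENT_USER
        SERVER cvtl_db_srv
        OPTIONS (user 'app_user', password 'secret');

    -- Expose table B locally as a foreign table...
    IMPORT FOREIGN SCHEMA public LIMIT TO (b)
        FROM SERVER cvtl_db_srv INTO public;

    -- ...then join it with the local table A in a single query.
    SELECT a.*, b.*
    FROM   a
    JOIN   b ON b.a_id = a.id;   -- "a_id" is a placeholder join column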