How to retrieve data from two different database in POSTGRESQL - postgresql

i have two database cvtl and cvtl_db , i need to write a single query to retrieve data from table A in cvtl and table B in cvtl_db.
Postgres is throwing error: cross database reference are not implemented

Basically you have two ways:
Older tools.
If you need to support older versions of PostgreSQL, use dblink or DBI-link. These two provide robust support for cross-db queries across a number of PostgreSQL versions. pl/proxy is another possibility.
Newer tools.
The newer approach is to use foreign data wrappers. This has more functionality (such as better transaction handling) and probably has more eyes in terms of support than dblink etc do today.

Related

Have an ordinary table on a PostgreSQL TimescaleDB (timeseries) database

For a project I need two types of tables.
hypertable (which is a special type of table in PostgreSQL (in PostgreSQL TimescaleDB)) for some timeseries records
my ordinary tables which are not timeseries
Can I create a PostgreSQL TimescaleDB and store my ordinary tables on it? Are all the tables a hypertable (time series) on a PostgreSQL TimescaleDB? If no, does it have some overhead if I store my ordinary tables in PostgreSQL TimescaleDB?
If I can, does it have any benefit if I store my ordinary table on a separate ordinary PostgreSQL database?
Can I create a PostgreSQL TimescaleDB and store my ordinary tables on it?
Absolutely... TimescaleDB is delivered as an extension to PostgreSQL and one of the biggest benefits is that you can use regular PostgreSQL tables alongside the specialist time-series tables. That includes using regular tables in SQL queries with hypertables. Standard SQL works, plus there are some additional functions that Timescale created using PostgreSQL's extensibility features.
Are all the tables a hypertable (time series) on a PostgreSQL TimescaleDB?
No, you have to explicitly create a table as a hypertable for it to implement TimescaleDB features. It would be worth checking out the how-to guides in the Timescale docs for full (and up to date) details.
If no, does it have some overhead if I store my ordinary tables in PostgreSQL TimescaleDB?
I don't think there's a storage overhead. You might see some performance gains e.g. for data ingest and query performance. This article may help clarify that https://docs.timescale.com/timescaledb/latest/overview/how-does-it-compare/timescaledb-vs-postgres/
Overall you can think of TimescaleDB as providing additional functionality to 'vanilla' PostgreSQL and so unless there's a reason around application design to separate non-time-series data to a separate database then you aren't obliged to do that.
One other point, shared by a very experienced member of our Slack community [thank you Chris]:
To have time-series data and “normal” data (normalized) in one or separate databases for us came down to something like “can we asynchronously replicate the time-series information”?
In our case we use two different pg systems, one replicating asynchronously (for TimescaleDB) and one with synchronous replication (for all other data).
Transparency: I work for Timescale

In DBeaver, how can I run an SQL union query from two different connections..?

We recently migrated a large DB2 database to a new server. It got trimmed a lot in the migration, for instance 10 years of data chopped down to 3, to name a few. But now I find that I need certain data from the old server until after tax season.
How can I run a UNION query in DBeaver that pulls data from two different connections..? What's the proper syntax of the table identifiers in the FROM and JOIN keywords..?
I use DBeaver for my regular SQL work, and I cannot determine how to span a UNION query across two different connections. However, I also use Microsoft Access, and I easily did it there with two Pass-Through queries that are fed to a native Microsoft Access union query.
But how to do it in DBeaver..? I can't understand how to use two connections at the same time.
For instance, here are my connections:
And I need something like this...
SELECT *
FROM ASP7.F_CERTOB.LDHIST
UNION
SELECT *
FROM OLD.VIPDTAB.LDHIST
...but I get the following error, to which I say "No kidding! That's what I want!", lol... =-)
SQL Error [56023]: [SQL0512] Statement references objects in multiple databases.
How can this be done..?
This is not a feature of DBeaver. DBeaver can only access the data that the DB gives it, and this is restricted to a single connection at a time (save for import/export operations). This feature is being considered for development, so keep an eye out for this answer to be outdated sometime in 2019.
You can export data from your OLD database and import it into ASP7 using DBeaver (although vendor tools for this are typically more efficient for this). Then you can do your union as suggested.
Many RDBMS offer a way to logically access foreign databases as if they were local, in which case DBeaver would then be able to access the data from the OLD database (as far as DBeaver is concerned in this situation, all the data is coming from a single connection). In Postgres, for example, one can use a foreign data wrapper to access foreign data.
I'm not familiar with DB2, but a quick Google search suggests that you can set up foreign connections within DB2 using nicknames or three-part-names.
If you check this github issue:
https://github.com/dbeaver/dbeaver/issues/3605
The way to solve this is to create a task and execute it in different connections:
https://github.com/dbeaver/dbeaver/issues/3605#issuecomment-590405154

DBLINK vs Postgres_FDW, which one may provide better performance?

I have a use case to distribute data across many databases on many servers, all in postgres tables.
From any given server/db, I may need to query another server/db.
The queries are quite basic, standard selects with where clauses on standard fields.
I have currently implemented postgres_FDW, (I'm, using postgres 9.5), but I think the queries are not using indexes on the remote db.
For this use case (a random node may query N other nodes), which is likely my best performance choice based on how each underlying engine actually executes?
The Postgres foreign data wrapper (postgres_FDW) is newer to
PostgreSQL so it tends to be the recommended method. While the
functionality in the dblink extension is similar to that in the
foreign data wrapper, the Postgres foreign data wrapper is more SQL
standard compliant and can provide improved performance over dblink
connections.
Read this article for more detailed info: Cross Database queryng
My solution was simple: I upgraded to Postgres 10, and it appears to push where clauses down to the remote server.

db2look from SQL

Is it possible to get the table structure like db2look from SQL?
Or the only way is from command line? Thus, by wrapping a external stored procedure in C I could call the db2look, but that is not what I am looking for.
Clarification added later:
I want to know which tables have the non logged option from SQL.
It is possible to create the table structure from regular SQL and the public DB2 catalog - however, it is complex and requires some deeper skills.
The metadata is available in the DB2 catalog views in the SYSCAT schema. For a regular table you would first start off by looking into the values in SYSCAT.TABLES and SYSCAT.COLUMNS. From there you would need to branch off to other views depending on what table and column options you are after, whether time-travel tables, special partitioning rules, or many other options are involved.
Serge Rielau published an article on developerWorks called Backup and restore SQL schemas for DB2 Universal Database that provides a set of stored procedures that will do exactly what you're looking for.
The article is quite old (2006) so you may need to put some time in to update the procedures to be able to handle features that were added to DB2 since the date of publication, but the procedures may work for you now and are a nice jumping off point.

Upsert in Amazon RedShift without Function or Stored Procedures

As there is no support for user defined functions or stored procedures in RedShift, how can i achieve UPSERT mechanism in RedShift which is using ParAccel, a PostgreSQL 8.0.2 fork.
Currently, i'm trying to achieve UPSERT mechanism using IF...THEN...ELSE... statement
e.g:-
IF NOT EXISTS(SELECT...WHERE(SELECT..))
THEN INSERT INTO tblABC() SELECT... FROM tblXYZ
ELSE UPDATE tblABC SET.,.,.,. FROM tblXYZ WHERE...
which is giving me error. As i'm writing this code independently without including it in function or SP's.
So, is there any solution to achieve UPSERT.
Thanks
You should probably read this article on upsert by depesz. You can't rely on SERIALIABLE for this since, AFAIK, ParAccel doesn't support full serializability support like in Pg 9.1+. As outlined in that post, you can't really do what you want purely in the DB anyway.
The short version is that even on current PostgreSQL versions that support writable CTEs it's still hard. On an 8.0 based ParAccel, you're pretty much out of luck.
I'd do a staged merge. COPY the new data to a temporary table on the server, LOCK the destination table, then do an UPDATE ... FROM followed by an INSERT INTO ... SELECT. Doing the data uploads in big chunks and locking the table for the upserts is reasonably in keeping with how Redshift is used anyway.
Another approach is to externally co-ordinate the upserts via something local to your application cluster. Have all your tools communicate via an external tool where they take an "insert-intent lock" before doing an insert. You want a distributed locking tool appropriate to your system. If everything's running inside one application server, it might be as simple as a synchronized singleton object.