Is it possible to JOIN rows from two separate postgres databases?
I am working with system with couple databases in one server and sometimes I really need such a feature.
According to http://wiki.postgresql.org/wiki/FAQ
There is no way to query a database other than the current one.
Because PostgreSQL loads database-specific system catalogs, it is
uncertain how a cross-database query should even behave.
contrib/dblink allows cross-database queries using function calls. Of
course, a client can also make simultaneous connections to different
databases and merge the results on the client side.
EDIT: 3 years later (march 2014), this FAQ entry has been revised and is more helpful:
How do I perform queries using multiple databases?
There is no way to directly query a database other than the current
one. Because PostgreSQL loads database-specific system catalogs, it is
uncertain how a cross-database query should even behave.
The SQL/MED support in PostgreSQL allows a "foreign data wrapper" to
be created, linking tables in a remote database to the local database.
The remote database might be another database on the same PostgreSQL
instance, or a database half way around the world, it doesn't matter.
postgres_fdw is built-in to PostgreSQL 9.3 and includes read/write
support; a read-only version for 9.2 can be compiled and installed as
a contrib module.
contrib/dblink allows cross-database queries using function calls and
is available for much older PostgreSQL versions. Unlike postgres_fdw
it can't "push down" conditions to the remote server, so it'll often
land up fetching a lot more data than you need.
Of course, a client can also make simultaneous connections to
different databases and merge the results on the client side.
Forget about dblink!
Say hello to Postgres_FDW:
To prepare for remote access using postgres_fdw:
Install the postgres_fdw extension using CREATE EXTENSION.
Create a foreign server object, using CREATE SERVER, to represent each remote database you want to connect to. Specify connection
information, except user, and password, as options of the server
object.
Create a user mapping, using CREATE USER MAPPING, for each database user you want to allow to access each foreign server. Specify
the remote user name and password to use as user and password options
of the user mapping.
Create a foreign table, using CREATE FOREIGN TABLE or IMPORT FOREIGN SCHEMA, for each remote table you want to access. The columns
of the foreign table must match the referenced remote table. You can,
however, use table and/or column names different from the remote
table's, if you specify the correct remote names as options of the
foreign table object.
Now you need only SELECT from a foreign table to access the data
stored in its underlying remote table.
It's really useful even on large data.
Yes, it is possible to do this using dblink albeit with significant performance considerations.
The following example will require the current SQL user to have permissions on both databases. If db2 is not located on the same cluster, then you will need to replace dbname=db2 with the full connection string defined in the dblink documentation.
SELECT *
FROM table1 tb1
LEFT JOIN (
SELECT *
FROM dblink('dbname=db2','SELECT id, code FROM table2')
AS tb2(id int, code text);
) AS tb2 ON tb2.column = tb1.column;
If table2 is very large, you could have performance issues because the sub-query loads up the entire table2 before performing the join.
No you can't. You could use dblink to connect from one database to another database, but that won't help if you're looking for JOIN's.
You can't use different SCHEMA's within a single database to store all you data?
Just a few steps and You can reach the goal:
follow this reference step by step
WE HAVE BEEN CONNECTED TO DB2 WITH TABLE TBL2 AND COLUMN COL2
ALSO THERE IS DB1 WITH TBL1 AND COLUMN COL1
*** connecting to second db ie db2
Now just **copy paste the 1-7 processes** (make sure u use correct username and password and ofcourse db name)
1.**CREATE EXTENSION dblink;**
2.**SELECT pg_namespace.nspname, pg_proc.proname
FROM pg_proc, pg_namespace
WHERE pg_proc.pronamespace=pg_namespace.oid
AND pg_proc.proname LIKE '%dblink%';**
3.**SELECT dblink_connect('host=localhost user=postgres password=postgres dbname=db1');**
4.**CREATE FOREIGN DATA WRAPPER postgres VALIDATOR postgresql_fdw_validator;**
5.**CREATE SERVER postgres2 FOREIGN DATA WRAPPER postgres OPTIONS (hostaddr '127.0.0.1', dbname 'db1');**
6.**CREATE USER MAPPING FOR postgres SERVER postgres2 OPTIONS (user 'postgres', password 'postgres');**
7.**SELECT dblink_connect('postgres2');**
---Now, you can SELECT the data of Database_One from Database_Two and even join both db results:
**SELECT * FROM public.dblink
('postgres2','SELECT col1,um_name FROM public.tbl1 ')
AS DATA(um_userid INTEGER),tbl2 where DATA.col1=tbl2.col2;**
You can also Check this :[How to join two tables of different databases together in postgresql [\[working finely in version 9.4\]][1]
You need to use dblink...as araqnid mentioned above, something like this works fine:
select ST.Table_Name, ST.Column_Name, DV.Table_Name, DV.Column_Name, *
from information_schema.Columns ST
full outer join dblink('dbname=otherdatabase','select Table_Name,
Column_Name from information_schema.Columns') DV(Table_Name text,
Column_Name text)
on ST.Table_Name = DV.Table_name
and ST.Column_Name = DV.Column_Name
where ST.Column_Name is null or DV.Column_Name is NULL
You have use dblink extension of postgresql.
Reference take from this Article:
DbLink extension of PostgreSQL which is used to connect one database to another database.
Install DbLink extension.
CREATE EXTENSION dblink;
Verify DbLink:
SELECT pg_namespace.nspname, pg_proc.proname
FROM pg_proc, pg_namespace
WHERE pg_proc.pronamespace=pg_namespace.oid
AND pg_proc.proname LIKE '%dblink%';
I have already prepared full demonstration on this. Please visit my post to learn step by step for executing cross database query in Postgresql.
Cannot be done? Of course we can, without special extensions. In our case, we had to compare two tables from different database servers, e.g. ACC and PROD, hence an even harder case than from most answers. Especially because ACC and PROD are deliberately on different servers to create a barrier, so you will not easily gain enough rights to perform a GRANT USAGE ON FOREIGN SERVER.
The obvious solution is to export both tables, and import both in the same database, e.g. DEV, or your own local db, under appropriate names, e.g. table1_acc and table1_prod, or schemas like acc and prod. Then, you may JOIN those with no special problems.
Related
For a long time I have been working only with Oracle Databases and I haven't had much contact with PostgreSQL.
So now, I have a few questions for people who are closer to Postgres.
Is it possible to create a connection from Postgres to Oracle (oracle_fdw?) and perform selects on views in a different schema than the one you connected to?
Is it possible to create a connection from Postgres to Oracle (oracle_fdw?) and perform inserts on tables in the same schema as the one you connected to?
Ad 1:
Yes, certainly. Just define the foreign table as
CREATE FOREIGN TABLE view_1_r (...) SERVER ...
OPTIONS (table 'VIEW_1', schema 'USERB');
Ad 2:
Yes, certainly. Just define a foreign table on the Oracle table and insert into it. Note that bulk inserts work, but won't perform well, since there will be a round trip between PostgreSQL and Oracle for each row inserted.
Both questions indicate a general confusion between a) the Oracle user that you use to establish the connection and b) the schema of the table or view that you want to access. These things are independent: The latter is determined by the schema option of the foreign table definition, while the former is determined by the user mapping.
I'm going to guess that the answer is "no" based on the below error message (and this Google result), but is there anyway to perform a cross-database query using PostgreSQL?
databaseA=# select * from databaseB.public.someTableName;
ERROR: cross-database references are not implemented:
"databaseB.public.someTableName"
I'm working with some data that is partitioned across two databases although data is really shared between the two (userid columns in one database come from the users table in the other database). I have no idea why these are two separate databases instead of schema, but c'est la vie...
Note: As the original asker implied, if you are setting up two databases on the same machine you probably want to make two schemas instead - in that case you don't need anything special to query across them.
postgres_fdw
Use postgres_fdw (foreign data wrapper) to connect to tables in any Postgres database - local or remote.
Note that there are foreign data wrappers for other popular data sources. At this time, only postgres_fdw and file_fdw are part of the official Postgres distribution.
For Postgres versions before 9.3
Versions this old are no longer supported, but if you need to do this in a pre-2013 Postgres installation, there is a function called dblink.
I've never used it, but it is maintained and distributed with the rest of PostgreSQL. If you're using the version of PostgreSQL that came with your Linux distro, you might need to install a package called postgresql-contrib.
dblink() -- executes a query in a remote database
dblink executes a query (usually a SELECT, but it can be any SQL
statement that returns rows) in a remote database.
When two text arguments are given, the first one is first looked up as
a persistent connection's name; if found, the command is executed on
that connection. If not found, the first argument is treated as a
connection info string as for dblink_connect, and the indicated
connection is made just for the duration of this command.
one of the good example:
SELECT *
FROM table1 tb1
LEFT JOIN (
SELECT *
FROM dblink('dbname=db2','SELECT id, code FROM table2')
AS tb2(id int, code text);
) AS tb2 ON tb2.column = tb1.column;
Note: I am giving this information for future reference. Reference
I have run into this before an came to the same conclusion about cross database queries as you. What I ended up doing was using schemas to divide the table space that way I could keep the tables grouped but still query them all.
Just to add a bit more information.
There is no way to query a database other than the current one. Because PostgreSQL loads database-specific system catalogs, it is uncertain how a cross-database query should even behave.
contrib/dblink allows cross-database queries using function calls. Of course, a client can also make simultaneous connections to different databases and merge the results on the client side.
PostgreSQL FAQ
Yes, you can by using DBlink (postgresql only) and DBI-Link (allows foreign cross database queriers) and TDS_LInk which allows queries to be run against MS SQL server.
I have used DB-Link and TDS-link before with great success.
I have checked and tried to create a foreign key relationships between 2 tables in 2 different databases using both dblink and postgres_fdw but with no result.
Having read the other peoples feedback on this, for example here and here and in some other sources it looks like there is no way to do that currently:
The dblink and postgres_fdw indeed enable one to connect to and query tables in other databases, which is not possible with the standard Postgres, but they do not allow to establish foreign key relationships between tables in different databases.
If performance is important and most queries are read-only, I would suggest to replicate data over to another database. While this seems like unneeded duplication of data, it might help if indexes are required.
This can be done with simple on insert triggers which in turn call dblink to update another copy. There are also full-blown replication options (like Slony) but that's off-topic.
see https://www.cybertec-postgresql.com/en/joining-data-from-multiple-postgres-databases/ [published 2017]
These days you also have the option to use https://prestodb.io/
You can run SQL on that PrestoDB node and it will distribute the SQL query as required. It can connect to the same node twice for different databases, or it might be connecting to different nodes on different hosts.
It does not support:
DELETE
ALTER TABLE
CREATE TABLE (CREATE TABLE AS is supported)
GRANT
REVOKE
SHOW GRANTS
SHOW ROLES
SHOW ROLE GRANTS
So you should only use it for SELECT and JOIN needs. Connect directly to each database for the above needs. (It looks like you can also INSERT or UPDATE which is nice)
Client applications connect to PrestoDB primarily using JDBC, but other types of connection are possible including a Tableu compatible web API
This is an open source tool governed by the Linux Foundation and Presto Foundation.
The founding members of the Presto Foundation are: Facebook, Uber,
Twitter, and Alibaba.
The current members are: Facebook, Uber, Twitter, Alibaba, Alluxio,
Ahana, Upsolver, and Intel.
In case someone needs a more involved example on how to do cross-database queries, here's an example that cleans up the databasechangeloglock table on every database that has it:
CREATE EXTENSION IF NOT EXISTS dblink;
DO
$$
DECLARE database_name TEXT;
DECLARE conn_template TEXT;
DECLARE conn_string TEXT;
DECLARE table_exists Boolean;
BEGIN
conn_template = 'user=myuser password=mypass dbname=';
FOR database_name IN
SELECT datname FROM pg_database
WHERE datistemplate = false
LOOP
conn_string = conn_template || database_name;
table_exists = (select table_exists_ from dblink(conn_string, '(select Count(*) > 0 from information_schema.tables where table_name = ''databasechangeloglock'')') as (table_exists_ Boolean));
IF table_exists THEN
perform dblink_exec(conn_string, 'delete from databasechangeloglock');
END IF;
END LOOP;
END
$$
I tried searching for it but couldn't find out
What is the best way to copy data from Redshift to Postgresql Database ?
using Talend job/any other tool/code ,etc
anyhow i want to transfer data from Redshift to PostgreSQL database
also,you can use any third party database tool if it has similar kind of functionality.
Also,as far as I know,we can do so using AWS Data Migration Service,but not sure our source db and destination db matches that criteria or not
Can anyone please suggest something better ?
The way I do it is with a Postgres Foreign Data Wrapper and dblink,
This way, the redshift table is available directly within Postgres.
Follow the instructions here to set it up https://aws.amazon.com/blogs/big-data/join-amazon-redshift-and-amazon-rds-postgresql-with-dblink/
The important part of that link is this code:
CREATE EXTENSION postgres_fdw;
CREATE EXTENSION dblink;
CREATE SERVER foreign_server
FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (host '<amazon_redshift _ip>', port '<port>', dbname '<database_name>', sslmode 'require');
CREATE USER MAPPING FOR <rds_postgresql_username>
SERVER foreign_server
OPTIONS (user '<amazon_redshift_username>', password '<password>');
For my use case I then set up a postgres materialised view with indexes based upon that.
create materialized view if not exists your_new_view as
SELECT some,
columns,
etc
FROM dblink('foreign_server'::text, '
<the redshift sql>
'::text) t1(some bigint, columns bigint, etc character varying(50));
create unique index if not exists index1
on your_new_view (some);
create index if not exists index2
on your_new_view (columns);
Then on a regular basis I run (on postgres)
REFRESH MATERIALIZED VIEW your_new_view;
or
REFRESH MATERIALIZED VIEW CONCURRENTLY your_new_view;
In the past, I managed to transfer data from one PostgreSQL database to another by doing a pg_dump and piping the output as an SQL command to the second instance.
Amazon Redshift is based on PostgreSQL, so this method should work, too.
You can control whether pg_dump should include the DDL to create tables, or whether it should just load the data (--data-only).
See: PostgreSQL: Documentation: 8.0: pg_dump
I'm going to guess that the answer is "no" based on the below error message (and this Google result), but is there anyway to perform a cross-database query using PostgreSQL?
databaseA=# select * from databaseB.public.someTableName;
ERROR: cross-database references are not implemented:
"databaseB.public.someTableName"
I'm working with some data that is partitioned across two databases although data is really shared between the two (userid columns in one database come from the users table in the other database). I have no idea why these are two separate databases instead of schema, but c'est la vie...
Note: As the original asker implied, if you are setting up two databases on the same machine you probably want to make two schemas instead - in that case you don't need anything special to query across them.
postgres_fdw
Use postgres_fdw (foreign data wrapper) to connect to tables in any Postgres database - local or remote.
Note that there are foreign data wrappers for other popular data sources. At this time, only postgres_fdw and file_fdw are part of the official Postgres distribution.
For Postgres versions before 9.3
Versions this old are no longer supported, but if you need to do this in a pre-2013 Postgres installation, there is a function called dblink.
I've never used it, but it is maintained and distributed with the rest of PostgreSQL. If you're using the version of PostgreSQL that came with your Linux distro, you might need to install a package called postgresql-contrib.
dblink() -- executes a query in a remote database
dblink executes a query (usually a SELECT, but it can be any SQL
statement that returns rows) in a remote database.
When two text arguments are given, the first one is first looked up as
a persistent connection's name; if found, the command is executed on
that connection. If not found, the first argument is treated as a
connection info string as for dblink_connect, and the indicated
connection is made just for the duration of this command.
one of the good example:
SELECT *
FROM table1 tb1
LEFT JOIN (
SELECT *
FROM dblink('dbname=db2','SELECT id, code FROM table2')
AS tb2(id int, code text);
) AS tb2 ON tb2.column = tb1.column;
Note: I am giving this information for future reference. Reference
I have run into this before an came to the same conclusion about cross database queries as you. What I ended up doing was using schemas to divide the table space that way I could keep the tables grouped but still query them all.
Just to add a bit more information.
There is no way to query a database other than the current one. Because PostgreSQL loads database-specific system catalogs, it is uncertain how a cross-database query should even behave.
contrib/dblink allows cross-database queries using function calls. Of course, a client can also make simultaneous connections to different databases and merge the results on the client side.
PostgreSQL FAQ
Yes, you can by using DBlink (postgresql only) and DBI-Link (allows foreign cross database queriers) and TDS_LInk which allows queries to be run against MS SQL server.
I have used DB-Link and TDS-link before with great success.
I have checked and tried to create a foreign key relationships between 2 tables in 2 different databases using both dblink and postgres_fdw but with no result.
Having read the other peoples feedback on this, for example here and here and in some other sources it looks like there is no way to do that currently:
The dblink and postgres_fdw indeed enable one to connect to and query tables in other databases, which is not possible with the standard Postgres, but they do not allow to establish foreign key relationships between tables in different databases.
If performance is important and most queries are read-only, I would suggest to replicate data over to another database. While this seems like unneeded duplication of data, it might help if indexes are required.
This can be done with simple on insert triggers which in turn call dblink to update another copy. There are also full-blown replication options (like Slony) but that's off-topic.
see https://www.cybertec-postgresql.com/en/joining-data-from-multiple-postgres-databases/ [published 2017]
These days you also have the option to use https://prestodb.io/
You can run SQL on that PrestoDB node and it will distribute the SQL query as required. It can connect to the same node twice for different databases, or it might be connecting to different nodes on different hosts.
It does not support:
DELETE
ALTER TABLE
CREATE TABLE (CREATE TABLE AS is supported)
GRANT
REVOKE
SHOW GRANTS
SHOW ROLES
SHOW ROLE GRANTS
So you should only use it for SELECT and JOIN needs. Connect directly to each database for the above needs. (It looks like you can also INSERT or UPDATE which is nice)
Client applications connect to PrestoDB primarily using JDBC, but other types of connection are possible including a Tableu compatible web API
This is an open source tool governed by the Linux Foundation and Presto Foundation.
The founding members of the Presto Foundation are: Facebook, Uber,
Twitter, and Alibaba.
The current members are: Facebook, Uber, Twitter, Alibaba, Alluxio,
Ahana, Upsolver, and Intel.
In case someone needs a more involved example on how to do cross-database queries, here's an example that cleans up the databasechangeloglock table on every database that has it:
CREATE EXTENSION IF NOT EXISTS dblink;
DO
$$
DECLARE database_name TEXT;
DECLARE conn_template TEXT;
DECLARE conn_string TEXT;
DECLARE table_exists Boolean;
BEGIN
conn_template = 'user=myuser password=mypass dbname=';
FOR database_name IN
SELECT datname FROM pg_database
WHERE datistemplate = false
LOOP
conn_string = conn_template || database_name;
table_exists = (select table_exists_ from dblink(conn_string, '(select Count(*) > 0 from information_schema.tables where table_name = ''databasechangeloglock'')') as (table_exists_ Boolean));
IF table_exists THEN
perform dblink_exec(conn_string, 'delete from databasechangeloglock');
END IF;
END LOOP;
END
$$
I am new to PostgreSQL. I have 2 databases in PostgreSQL 9.0, db1 and db2, and with db2 I have read only access. I want to create a stored function that would be otherwise easily accomplished with a JOIN or a nested query, something PostgreSQL can't do across databases.
In db1, I have table1 where I can query for a set of foreign keys keys that I can use to search for records in a table2 in db2, something like:
SELECT * from db2.table2 WHERE db2.table2.primary_key IN (
SELECT db1.table1.foreign_key FROM db1.table1 WHERE
db1.table1.primary_key="whatever");
What is the best practice for doing this in Postgres? I can't use a temporary tables in db2, and passing in the foreign keys as a parameter in a stored function running in db2 doesn't seem like a good solution.
Note: the keys are all VARCHAR(11)
You'll want to look into the db_link contrib.
As an aside if you're familiar with C, there also is a cute functionality called foreign data wrappers. It allows to manipulate pretty much any source using plain SQL. Example with Twitter:
SELECT from_user, created_at, text FROM twitter WHERE q = '#postgresql';