Moving a table from a database to another - Only insert missing rows - postgresql

I have two databases that are alike, one called datastore and the other called datarestore.
datarestore is a copy of datastore which was created from a backup image. The problem is that I accidentally deleted a little too much data from datastore.
Both databases are located on different AWS instances and I typically connect to them using pgAdmin III or Python to create scripts that handle the data.
I want to copy the rows that I accidentally deleted from datastore, which still exist in datarestore, back into datastore. Does anyone have an idea how this can be achieved? Both databases contain close to 1.000.000.000 rows and are on version 9.6.
I have seen some backup/import/restore options within pgAdmin III, but I don't know how they work or whether they fit my needs. I also thought about writing a Python script, but querying my database has become pretty slow, so that does not seem to be an option either.
-----------------------------------------------------
| id (serial - auto incrementing int) | - primary key
| did (varchar) |
| sensorid (int) |
| timestamp (bigint) |
| data (json) |
| db_timestamp (bigint) |
-----------------------------------------------------

If you preserved primary keys between those databases, you could create foreign tables pointing from datarestore to datastore, check which keys are missing (for example with select pk from old_table except select pk from new_table), and fetch the missing rows through the same foreign table. This should limit the first check for missing PKs to index-only scans (plus network transfer), and fetching the missing data then becomes an index scan. If you are missing only a small part of the data, it shouldn't take long.
If you require more detailed example then I'll update my answer.
EDIT:
Example of foreign table/server usage
Those commands need to be executed on datarestore (or on datastore if you choose to push data instead of pulling it).
If you don't have the foreign data wrapper installed yet:
CREATE EXTENSION postgres_fdw;
This will create a virtual server on your datarestore host. It is just metadata pointing at the foreign server:
CREATE SERVER foreign_datastore FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (host 'foreign_hostname', dbname 'foreign_database_name',
port '5432_or_whatever_you_have_on_datastore_host');
This tells your datarestore host which user to connect as when using the fdw on server foreign_datastore. The mapping is used only when your_local_role_name is logged in on datarestore:
CREATE USER MAPPING FOR your_local_role_name SERVER foreign_datastore
OPTIONS (user 'foreign_username', password 'foreign_password');
You need to create a schema on datarestore. This is where the new foreign tables will be created:
CREATE SCHEMA schema_where_foreign_tables_will_be_created;
This will log in to the remote host and create foreign tables on datarestore, pointing to the tables on datastore. Only tables are imported this way.
No data will be copied, just the structure of the tables.
IMPORT FOREIGN SCHEMA foreign_datastore_schema_name_goes_here
FROM SERVER foreign_datastore INTO schema_where_foreign_tables_will_be_created;
This will return the list of ids that are missing in your datarestore database for this table (the foreign table lives in the local schema you imported it into):
SELECT id FROM schema_where_foreign_tables_will_be_created.table_a
EXCEPT
SELECT id FROM datarestore_schema.table_a
You can either store them in a temp table (CREATE TABLE table_a_missing_pk AS [query from above here])
or use them right away:
INSERT INTO datarestore_schema.table_a (id, did, sensorid, timestamp, data, db_timestamp)
SELECT id, did, sensorid, timestamp, data, db_timestamp
FROM schema_where_foreign_tables_will_be_created.table_a
WHERE id = ANY((
SELECT array_agg(id)
FROM (
SELECT id FROM schema_where_foreign_tables_will_be_created.table_a
EXCEPT
SELECT id FROM datarestore_schema.table_a
) sub
)::int[])
From my tests, this should push down (meaning send to the remote host) something like this:
Remote SQL: SELECT id, did, sensorid, timestamp, data, db_timestamp
FROM foreign_datastore_schema_name_goes_here.table_a WHERE ((id = ANY ($1::integer[])))
You can verify this by running EXPLAIN VERBOSE on your full query to see the plan it will execute. You should see Remote SQL in there.
In case it does not work as expected, you can instead create a temp table as mentioned earlier and make sure that this temp table is on the datastore host.
An alternative approach would be to create the foreign server on datastore, pointing to datarestore, and push data from your old database to the new one (you can insert into foreign tables). This way you won't have to worry about the list of ids not being pushed down to datastore, which would mean fetching all the data and filtering it afterwards (which would be extremely slow).
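The "find the missing ids, then insert only those rows" logic can be tried on a toy data set before running it against a billion-row table. The following is only a minimal sketch in Python: it uses the standard-library sqlite3 module as a stand-in for PostgreSQL, with two attached in-memory databases playing the roles of the table that still has the rows (source) and the table that lost them (target). The names source and target and the sample rows are made up for illustration, and nothing about postgres_fdw push-down or network cost is modelled here.

```python
import sqlite3

COLUMNS = "id, did, sensorid, timestamp, data, db_timestamp"


def make_demo_connection():
    """Create two in-memory databases, each with an empty table_a."""
    conn = sqlite3.connect(":memory:")
    for name in ("source", "target"):
        conn.execute(f"ATTACH DATABASE ':memory:' AS {name}")
        conn.execute(
            f"CREATE TABLE {name}.table_a ("
            "id INTEGER PRIMARY KEY, did TEXT, sensorid INTEGER, "
            "timestamp INTEGER, data TEXT, db_timestamp INTEGER)"
        )
    return conn


def copy_missing_rows(conn):
    """Insert into target every row whose id exists only in source."""
    cur = conn.execute(
        f"INSERT INTO target.table_a ({COLUMNS}) "
        f"SELECT {COLUMNS} FROM source.table_a "
        "WHERE id IN (SELECT id FROM source.table_a "
        "EXCEPT SELECT id FROM target.table_a)"
    )
    return cur.rowcount


conn = make_demo_connection()
rows = [(i, f"dev{i}", i % 3, 1000 + i, "{}", 2000 + i) for i in range(1, 6)]
conn.executemany("INSERT INTO source.table_a VALUES (?, ?, ?, ?, ?, ?)", rows)
# The target kept only ids 1 and 4; ids 2, 3 and 5 were "deleted".
conn.executemany(
    "INSERT INTO target.table_a VALUES (?, ?, ?, ?, ?, ?)", [rows[0], rows[3]]
)

copied = copy_missing_rows(conn)
remaining = conn.execute(
    "SELECT id FROM source.table_a EXCEPT SELECT id FROM target.table_a"
).fetchall()
print(copied, remaining)
```

Running the EXCEPT query again after the insert is a cheap way to confirm that nothing is still missing.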

Related

Is possible to use STDIN with CREATE FOREIGN TABLE?

I was looking for a STDIN option, to use a FOREIGN TABLE in a similar way to COPY... and discovered a "bug" in the Guide: there is no documentation about the options in the official sql-create-foreign-table Guide. No link, nothing:
OPTIONS ( option 'value' [, ...] )
Options to be associated with the new foreign table or one of its columns. ...
So the lack of information turned this question into two:
Is it possible to use STDIN with FOREIGN TABLE?
Where is the "OPTIONS" documentation?
Edit to add an example:
CREATE FOREIGN TABLE t1 (
aa text,
bb bigint
) SERVER files OPTIONS (
filename '/tmp/bigBigdata.csv',
format 'csv',
header 'true'
);
This is a classic ugly PostgreSQL limitation on using the filesystem, so I need a terminal solution... Imagine something with pipes on the shell, such as
psql -c "ALTER FOREIGN TABLE t1 ... STDIN; CREATE TABLE t2 AS SELECT trim(aa) as aa, bb+1 as bb FROM t1 WHERE bb>999" < /thePath/bigBigdata.csv
This is a kind of "no direct copy, only filtering a stream of data", creating a final table t2 from the filtered stream.
I think you are confused about foreign tables, I'll try to explain.
The data of a foreign table do not reside in PostgreSQL, but in an external data source (a file, a different database, etc.).
The foreign table is just a way to access these data from PostgreSQL as if they were a PostgreSQL table.
You can COPY to a foreign table FROM STDIN if the foreign data wrapper supports it, but that has nothing to do with CREATE FOREIGN TABLE. CREATE FOREIGN TABLE defines how PostgreSQL should locate the external data and what the format of the data is.
There is no documentation of the options in CREATE FOREIGN TABLE because they depend on the foreign data wrapper you are using.
Look at the documentation of the foreign data wrapper.
Your example makes clear that what you need is not a foreign table, but a temporary table into which you can COPY the raw data which you later want to modify. You cannot use file_fdw for data that resides on the client machine.
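The "load the raw data into a temporary table, then build the final table from it" pattern can be sketched as follows. This is an illustration only, written in Python with the standard-library sqlite3 and csv modules standing in for PostgreSQL and COPY ... FROM STDIN. The table name raw_t1 and the sample CSV are assumptions; the filter and the t2 definition follow the question's example.

```python
import csv
import io
import sqlite3


def load_and_filter(csv_stream):
    """Stream CSV rows into a temp table, then derive t2 from it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TEMP TABLE raw_t1 (aa TEXT, bb INTEGER)")
    # DictReader uses the header line for the keys, like header 'true'.
    reader = csv.DictReader(csv_stream)
    conn.executemany("INSERT INTO raw_t1 (aa, bb) VALUES (:aa, :bb)", reader)
    conn.execute(
        "CREATE TABLE t2 AS "
        "SELECT trim(aa) AS aa, bb + 1 AS bb FROM raw_t1 WHERE bb > 999"
    )
    return conn


# A small in-memory stand-in for the piped file; sys.stdin could be
# passed instead when the data really arrives through a shell pipe.
sample = io.StringIO("aa,bb\n  keep me ,1000\ndrop me,5\n other ,2500\n")
conn = load_and_filter(sample)
result = conn.execute("SELECT aa, bb FROM t2 ORDER BY bb").fetchall()
print(result)
```

The raw table never needs to outlive the session, which is why a temporary table fits here: only the filtered t2 is kept.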

Logging records in a postgresql database

I am having trouble thinking of a way to copy three fields out of a database and append them to another table along with the current date. Basically what I want to do is:
DB-A: ID (N9), Name (C69), Phone (N15) {and a list of other fields I don't care about}
DB-B: Date (today's date/time), Name, Address, Phone (as above)
It would be great if this was a trigger in the DB on add or update of DB-A.
Greg
Quick and dirty using postgres_fdw
CREATE EXTENSION IF NOT EXISTS postgres_fdw ;
CREATE SERVER extern_server FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'foreignserver.co.uk', port '5432', dbname 'mydb');
CREATE USER MAPPING FOR myuser SERVER extern_server OPTIONS (user 'anotheruser');
-- Creating a foreign table based on table t1 at the server described above
CREATE FOREIGN TABLE foreign_t1 (
dba INT,
name VARCHAR(9),
phone VARCHAR(15)
)
SERVER extern_server OPTIONS (schema_name 'public', table_name 't1');
--Inserting data to a new table + date
INSERT INTO t2 SELECT dba,name,phone,CURRENT_DATE FROM foreign_t1;
-- Or just retrieving what you need placing the current date as a column
SELECT dba,name,phone,CURRENT_DATE FROM foreign_t1;
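The question also asks for a trigger that logs on insert or update, which the statements above do not cover. The following is a rough sketch of that idea only, written in Python against an in-memory SQLite database so it can be run anywhere. SQLite's trigger syntax is not PostgreSQL's: in PostgreSQL you would write a function returning trigger and attach it with CREATE TRIGGER ... EXECUTE PROCEDURE. The table names db_a and db_b, the column logged_at and the sample values are assumptions, and both tables are assumed to live in the same database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE db_a (id INTEGER PRIMARY KEY, name TEXT, phone TEXT, other TEXT);
    CREATE TABLE db_b (logged_at TEXT, id INTEGER, name TEXT, phone TEXT);

    -- Log the three fields plus the current date/time on every insert.
    CREATE TRIGGER log_db_a_insert AFTER INSERT ON db_a
    BEGIN
        INSERT INTO db_b VALUES (datetime('now'), NEW.id, NEW.name, NEW.phone);
    END;

    -- Do the same on every update, using the new values.
    CREATE TRIGGER log_db_a_update AFTER UPDATE ON db_a
    BEGIN
        INSERT INTO db_b VALUES (datetime('now'), NEW.id, NEW.name, NEW.phone);
    END;
    """
)

conn.execute("INSERT INTO db_a VALUES (1, 'Greg', '5550100', 'ignored')")
conn.execute("UPDATE db_a SET phone = '5550199' WHERE id = 1")
log = conn.execute("SELECT id, name, phone FROM db_b ORDER BY rowid").fetchall()
print(log)
```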

How to Check if a Foreign Key Exists on a Specific Table in PostgreSQL

I have a foreign key named user__fk__store_id that was supposed to be created on the user table.
However, I made a mistake and instead have it created on another table I have named client.
My servers have an automated process that reads from a JSON file I created with what new tables to create, remove, etc... Every time a server needs to be upgraded with new stuff, it will run through this JSON file and run the queries it needs to.
In this case, in the json file, I'm trying to have it drop the existing incorrect foreign key constraint that was created on the client table, and recreate it correctly on the user table. So technically, it should be running these 2 queries back to back:
ALTER TABLE client DROP CONSTRAINT user__fk__store_id;
ALTER TABLE user ADD CONSTRAINT user__fk__store_id;
The problem I'm having is I can't figure out the query to run in order to see if the user__fk__store_id exists on the client table. I only know how to check if the constraint exists on any table in the database with the following:
SELECT COUNT(1) FROM pg_constraint WHERE conname='user__fk__store_id';
This would be a problem because this means every time I run my upgrade script on my servers, it will always think the constraint of that name already exists, but when it attempts to run the drop query it will error out because it can't find that constraint in the client table.
Is there a query I can run to check not just if the constraint exists, but also if it exists in a specific table?
I found the answer to my own question, I can just run the following query:
SELECT COUNT(1) FROM information_schema.table_constraints WHERE constraint_name='user__fk__store_id' AND table_name='client';
The above answer works fine, but the following returns true/false and also checks the schema:
SELECT EXISTS (SELECT 1 FROM information_schema.table_constraints
WHERE table_schema='schema_name' AND table_name='MyTable' AND
constraint_name='myTable_fkName_fkey');
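For an upgrade script, the check can be wrapped in a small helper so that each step runs only when it is still needed. The sketch below is one possible shape, not a definitive implementation: it assumes a DB-API cursor that accepts %s placeholders (as psycopg2 does), assumes the tables live in the public schema, and uses a hypothetical FakeCursor so the example runs without a database.

```python
CHECK_SQL = (
    "SELECT EXISTS (SELECT 1 FROM information_schema.table_constraints "
    "WHERE table_schema = %s AND table_name = %s AND constraint_name = %s)"
)


def constraint_exists(cursor, schema, table, constraint):
    """Return True if the named constraint exists on schema.table."""
    cursor.execute(CHECK_SQL, (schema, table, constraint))
    return bool(cursor.fetchone()[0])


def plan_fk_fix(cursor, schema, constraint):
    """Decide which of the two upgrade steps still has to run."""
    return {
        "drop_from_client": constraint_exists(cursor, schema, "client", constraint),
        "add_to_user": not constraint_exists(cursor, schema, "user", constraint),
    }


class FakeCursor:
    """Hypothetical stand-in for a real cursor; answers from a fixed set."""

    def __init__(self, existing):
        self.existing = set(existing)
        self._row = None

    def execute(self, sql, params):
        self._row = (tuple(params) in self.existing,)

    def fetchone(self):
        return self._row


# Simulate the situation from the question: the constraint sits on client.
cursor = FakeCursor({("public", "client", "user__fk__store_id")})
plan = plan_fk_fix(cursor, "public", "user__fk__store_id")
print(plan)
```

For the drop step alone, PostgreSQL also accepts ALTER TABLE client DROP CONSTRAINT IF EXISTS user__fk__store_id; which avoids the separate check.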

how to copy derby table

I am using Eclipse, Java and a Derby database. I want to experiment with changing values that rewrite one of the tables in the db. Before starting the change I would like to copy the particular table (not in code) so that I can restore the original data if necessary. So far googling and searching this site hasn't produced an answer. In Eclipse there is an option to export the db but it calls it a connection so I am not sure what would happen.
If you're not sure about how to connect to the database and issue sql statements, you will need to learn about JDBC. This is a good place to start.
If you're asking about the SQL, it's pretty straightforward. You can create a table based on a select statement.
e.g.
create table table2 as select * from table1 with no data;
Derby is a little strange in this area. You must specify the with no data, and the created table will be empty. You can then issue an insert that will populate the new table if you wish.
insert into table2 select * from table1;
The new table will not have indexes. You will need to create them if you want them. It might retain the primary key. You should check that if you're testing against it. If it doesn't retain the primary key, you should create the primary key before inserting data into the table.
In Eclipse there is an option to export the db but it calls it a connection so I am not sure what would happen.
If what Eclipse does isn't clear for you, you can just as well zip your entire database directory (content of DERBY_HOME env. variable) into an archive. The database must not be running while you make the backup.

Joining Results from Two Separate Databases

Is it possible to JOIN rows from two separate postgres databases?
I am working with system with couple databases in one server and sometimes I really need such a feature.
According to http://wiki.postgresql.org/wiki/FAQ
There is no way to query a database other than the current one.
Because PostgreSQL loads database-specific system catalogs, it is
uncertain how a cross-database query should even behave.
contrib/dblink allows cross-database queries using function calls. Of
course, a client can also make simultaneous connections to different
databases and merge the results on the client side.
EDIT: 3 years later (March 2014), this FAQ entry has been revised and is more helpful:
How do I perform queries using multiple databases?
There is no way to directly query a database other than the current
one. Because PostgreSQL loads database-specific system catalogs, it is
uncertain how a cross-database query should even behave.
The SQL/MED support in PostgreSQL allows a "foreign data wrapper" to
be created, linking tables in a remote database to the local database.
The remote database might be another database on the same PostgreSQL
instance, or a database half way around the world, it doesn't matter.
postgres_fdw is built-in to PostgreSQL 9.3 and includes read/write
support; a read-only version for 9.2 can be compiled and installed as
a contrib module.
contrib/dblink allows cross-database queries using function calls and
is available for much older PostgreSQL versions. Unlike postgres_fdw
it can't "push down" conditions to the remote server, so it'll often
land up fetching a lot more data than you need.
Of course, a client can also make simultaneous connections to
different databases and merge the results on the client side.
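The last option in the FAQ, merging on the client side, can be sketched as follows. This is only an illustration: two separate in-memory SQLite connections stand in for two PostgreSQL databases, and the table names, columns and sample rows are made up. The join itself is a plain hash join done in Python on the first column of each row.

```python
import sqlite3


def fetch_all(conn, sql):
    """Run a query on one connection and return all rows."""
    return conn.execute(sql).fetchall()


def client_side_join(left_rows, right_rows):
    """Inner hash join on the first column of each row."""
    right_by_key = {}
    for key, *rest in right_rows:
        right_by_key.setdefault(key, []).append(tuple(rest))
    joined = []
    for key, *left_rest in left_rows:
        for right_rest in right_by_key.get(key, []):
            joined.append((key, *left_rest, *right_rest))
    return joined


# Two independent connections, one per "database".
db1 = sqlite3.connect(":memory:")
db2 = sqlite3.connect(":memory:")
db1.execute("CREATE TABLE table1 (id INTEGER, name TEXT)")
db2.execute("CREATE TABLE table2 (id INTEGER, code TEXT)")
db1.executemany(
    "INSERT INTO table1 VALUES (?, ?)", [(1, "alpha"), (2, "beta"), (3, "gamma")]
)
db2.executemany(
    "INSERT INTO table2 VALUES (?, ?)", [(1, "A1"), (3, "C3"), (4, "D4")]
)

result = client_side_join(
    fetch_all(db1, "SELECT id, name FROM table1 ORDER BY id"),
    fetch_all(db2, "SELECT id, code FROM table2"),
)
print(result)
```

Both result sets are pulled into client memory in full, so this only suits tables that are small or already filtered by each query.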
Forget about dblink!
Say hello to Postgres_FDW:
To prepare for remote access using postgres_fdw:
1. Install the postgres_fdw extension using CREATE EXTENSION.
2. Create a foreign server object, using CREATE SERVER, to represent each remote database you want to connect to. Specify connection information, except user and password, as options of the server object.
3. Create a user mapping, using CREATE USER MAPPING, for each database user you want to allow to access each foreign server. Specify the remote user name and password to use as user and password options of the user mapping.
4. Create a foreign table, using CREATE FOREIGN TABLE or IMPORT FOREIGN SCHEMA, for each remote table you want to access. The columns of the foreign table must match the referenced remote table. You can, however, use table and/or column names different from the remote table's, if you specify the correct remote names as options of the foreign table object.
Now you need only SELECT from a foreign table to access the data
stored in its underlying remote table.
It's really useful even on large data.
Yes, it is possible to do this using dblink albeit with significant performance considerations.
The following example will require the current SQL user to have permissions on both databases. If db2 is not located on the same cluster, then you will need to replace dbname=db2 with the full connection string defined in the dblink documentation.
SELECT *
FROM table1 tb1
LEFT JOIN (
SELECT *
FROM dblink('dbname=db2','SELECT id, code FROM table2')
AS tb2(id int, code text)
) AS tb2 ON tb2.id = tb1.id; -- assumes table1 also has an id column to join on
If table2 is very large, you could have performance issues because the sub-query loads up the entire table2 before performing the join.
No, you can't. You could use dblink to connect from one database to another database, but that won't help if you're looking for JOINs.
Can't you use different SCHEMAs within a single database to store all your data?
Just a few steps and you can reach the goal:
Follow this reference step by step.
We are connected to db2, which has table tbl2 with column col2.
There is also db1, with table tbl1 and column col1.
Connect to the second db, i.e. db2.
Now just copy and paste steps 1-7 (make sure you use the correct username and password and, of course, db name):
1. CREATE EXTENSION dblink;
2. SELECT pg_namespace.nspname, pg_proc.proname
FROM pg_proc, pg_namespace
WHERE pg_proc.pronamespace=pg_namespace.oid
AND pg_proc.proname LIKE '%dblink%';
3. SELECT dblink_connect('host=localhost user=postgres password=postgres dbname=db1');
4. CREATE FOREIGN DATA WRAPPER postgres VALIDATOR postgresql_fdw_validator;
5. CREATE SERVER postgres2 FOREIGN DATA WRAPPER postgres OPTIONS (hostaddr '127.0.0.1', dbname 'db1');
6. CREATE USER MAPPING FOR postgres SERVER postgres2 OPTIONS (user 'postgres', password 'postgres');
7. SELECT dblink_connect('postgres2');
Now you can SELECT the data of Database_One from Database_Two and even join the results of both databases:
-- The column list after AS must match the remote SELECT; the types shown here are assumed.
SELECT * FROM public.dblink
('postgres2','SELECT col1,um_name FROM public.tbl1')
AS data(col1 INTEGER, um_name TEXT), tbl2 WHERE data.col1 = tbl2.col2;
You can also check this: How to join two tables of different databases together in postgresql [working finely in version 9.4]
You need to use dblink...as araqnid mentioned above, something like this works fine:
select ST.Table_Name, ST.Column_Name, DV.Table_Name, DV.Column_Name, *
from information_schema.Columns ST
full outer join dblink('dbname=otherdatabase','select Table_Name,
Column_Name from information_schema.Columns') DV(Table_Name text,
Column_Name text)
on ST.Table_Name = DV.Table_name
and ST.Column_Name = DV.Column_Name
where ST.Column_Name is null or DV.Column_Name is NULL
You have to use the dblink extension of PostgreSQL.
Reference taken from this article:
The dblink extension of PostgreSQL is used to connect one database to another database.
Install DbLink extension.
CREATE EXTENSION dblink;
Verify DbLink:
SELECT pg_namespace.nspname, pg_proc.proname
FROM pg_proc, pg_namespace
WHERE pg_proc.pronamespace=pg_namespace.oid
AND pg_proc.proname LIKE '%dblink%';
I have already prepared a full demonstration of this. Please visit my post to learn, step by step, how to execute a cross-database query in PostgreSQL.
Cannot be done? Of course it can, without special extensions. In our case, we had to compare two tables from different database servers, e.g. ACC and PROD, which is an even harder case than in most answers. ACC and PROD are deliberately on different servers to create a barrier, so you will not easily gain enough rights to perform a GRANT USAGE ON FOREIGN SERVER.
The obvious solution is to export both tables, and import both in the same database, e.g. DEV, or your own local db, under appropriate names, e.g. table1_acc and table1_prod, or schemas like acc and prod. Then, you may JOIN those with no special problems.