PG_DUMP to Skip Certain Items in PG_LARGEOBJECT? - postgresql

I am newer to PostgreSQL than Oracle. So, I'm going to explain what I do in Oracle and then ask if anyone knows if there is a way to do this in PostgreSQL.
I have over 300 tables in Oracle. Some of them contain LOBs. Two that I know of, JP_PDFS and JP_PRELIMPDFS, consume a ton of space to house PDFs. When I need a copy of this database as a DMP to send to someone else, I don't need the contents of these tables for most troubleshooting steps. So I can export the database as two DMP files: one with the include + content directives (structure only for those two tables) and one with the exclude directive:
expdp full=n schemas=mySchema directory=DMPs dumpfile=mySchema_PDFS_schema.dmp logfile=expdp1.log include=TABLE:"IN('JP_PDFS','JP_PRELIMPDFS')" content=metadata_only
expdp full=n schemas=mySchema directory=DMPs dumpfile=mySchema_noPDFS.dmp logfile=expdp2.log exclude=TABLE:"IN('JP_PDFS','JP_PRELIMPDFS')"
Unfortunately, in PostgreSQL all LOBs are stored in PG_LARGEOBJECT. The references/pointers to the actual LO rows that make up each LOB are stored in the aforementioned tables. But there ARE other tables with LOBs that I DO need exported to the .backup file with pg_dump.
What I want is a way to do what I do in the Oracle world with PostgreSQL. I know how to export the schemas, only, for JP_PDFS and JP_PRELIMPDFS. But, is there a way to tell pg_dump to not include the objects from PG_LARGEOBJECT for the referenced items from both JP_PDFS and JP_PRELIMPDFS?
Thanks!

No, large objects don't “belong” to anybody. Either all of them are dumped (for a dump of the complete database or if the -b option of pg_dump is used) or none are.
Large objects are cumbersome and require a special API. If the size of your binary data doesn't exceed 1GB, consider using the data type bytea for them. That is much easier to handle and will work like you expect.
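Since it is all or nothing, one partial workaround is a second, lighter dump that leaves out every large object and the row data of the two PDF tables. This does not select individual large objects; it only illustrates the options that do exist. The sketch below just assembles the command line. The database name mydb, the output file name, and the lower-case table names jp_pdfs and jp_prelimpdfs are assumptions, and --no-blobs is spelled --no-large-objects in newer pg_dump releases.

```python
import shlex


def light_dump_command(dbname, outfile, skip_data_tables):
    """Build a pg_dump command line that omits all large objects and the
    row data of the given tables (their definitions are still dumped)."""
    cmd = ["pg_dump", "--format=custom", "--no-blobs", "--file", outfile]
    for table in skip_data_tables:
        # --exclude-table-data keeps the table definition but skips its rows.
        cmd.append(f"--exclude-table-data={table}")
    cmd.append(dbname)
    return cmd


if __name__ == "__main__":
    # All names below are placeholders, not taken from a real system.
    cmd = light_dump_command(
        "mydb", "mydb_nopdfs.backup", ["jp_pdfs", "jp_prelimpdfs"]
    )
    print(shlex.join(cmd))
```

A full dump (with all large objects) would still be needed whenever the other LOBs matter.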

Related

How to replicate a Postgres DB with only a sample of the data

I'm attempting to mock a database for testing purposes. What I'd like to do is given a connection to an existing Postgres DB, retrieve the schema, limit the data pulled to 1000 rows from each table, and persist both of these components as a file which can later be imported into a local database.
pg_dump doesn't seem to fulfill my requirements, as there's no way to tell it to retrieve only a limited number of rows from tables; it's all or nothing.
COPY/\copy commands can help fill this gap. However, it doesn't seem like there's a way to copy data from multiple tables into a single file. I'd rather avoid having to create a single file per table. Is there a way to work around this?

Difference between copy/migrate/export in SQL developer

I am using Oracle SQL Developer, which has the following tools:
Database Copy, Database Export, and Migrate.
I want to move one schema and all the data in it from one server to another.
What is the difference between these options? Does anything serve what I am looking for?
Database Copy is probably what you want.
Supply two database connections, and we'll take objects and data and copy them from one database to another.
However, if your schema is large, this will be inefficient. The Copy routine does inserts, row-by-row across the jdbc connections.
Database Export takes the objects and data and offloads them to flat files. These flat files could then be used later to put in another database.
Migrate is used to take a database from SQL Server, Sybase, Teradata, Redshift, DB2, etc. to Oracle. It has an online (jdbc row-by-row) data copy and an offline (flat files for SQL Loader) data move mode. For SQL Server/Sybase, we can also translate the T-SQL stored procedures to PL/SQL.
Your solution might also lie elsewhere: Data Pump. We have a wizard for that as well, and it works great for very large schemas/databases. You'll just need access to the database server's OS so you can put the DMP files into a Database Directory.

Migrating a schema from one database to other

As part of a requirement, I need to migrate a schema from an existing database to a new schema in a different database. Part of this is already done, and now I need to compare the two schemas and change the new schema according to the gaps I find.
I am not using a tool. I tried to get the details by querying SYSCAT but did not have much success.
Any pointers on the best way to solve this?
Regards,
Ramakant
A tool really is the best way to solve this – IBM Data Studio is free and can compare schemas between databases.
Assuming you are using DB2 for Linux/UNIX/Windows, you can do a rudimentary compare by looking at selected columns in SYSCAT.TABLES and SYSCAT.COLUMNS (for table definitions), and SYSCAT.INDEXES (for indexes). Exporting this data to files and using diff may be the easiest method. However, doing this for more complex structures (tables with range or database partitioning, foreign keys, etc) will become very complex very quickly as this information is spread across a lot of different system catalog tables.
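A rough sketch of the "export and diff" idea in Python, assuming the rows of a SYSCAT.COLUMNS query have already been fetched from each database (the query text, the sample rows, and the function names are illustrative, not part of any DB2 tooling). Sorting the formatted rows first keeps export order from showing up as a false difference.

```python
import difflib

# Illustrative catalog query; run it against each database and keep the rows.
COLUMNS_QUERY = """
SELECT TABNAME, COLNAME, TYPENAME, LENGTH, SCALE, NULLS
FROM SYSCAT.COLUMNS
WHERE TABSCHEMA = ?
"""


def format_rows(rows):
    """Turn catalog rows into sorted, pipe-delimited lines so that the
    order in which rows were exported cannot cause spurious differences."""
    return sorted("|".join(str(v).strip() for v in row) for row in rows)


def diff_catalog(old_rows, new_rows, old_name="old_schema", new_name="new_schema"):
    """Return unified-diff lines between two sets of catalog rows."""
    return list(
        difflib.unified_diff(
            format_rows(old_rows),
            format_rows(new_rows),
            fromfile=old_name,
            tofile=new_name,
            lineterm="",
        )
    )


if __name__ == "__main__":
    # Made-up sample rows: the NAME column was widened in the new schema.
    old = [("EMP", "ID", "INTEGER", 4, 0, "N"), ("EMP", "NAME", "VARCHAR", 50, 0, "Y")]
    new = [("EMP", "ID", "INTEGER", 4, 0, "N"), ("EMP", "NAME", "VARCHAR", 80, 0, "Y")]
    for line in diff_catalog(old, new):
        print(line)
```

The same pattern would apply to rows from SYSCAT.TABLES and SYSCAT.INDEXES; the more complex structures mentioned above would need additional catalog views.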
An alternative method would be to extract DDL using the db2look utility. However, you can't specify the order that db2look outputs objects (db2look extracts DDL based on the objects' CREATE_TIME), so you can't extract DDL for an entire schema into a file and expect to use diff to compare. You would need to extract DDL into a separate file for each table.
Use SchemaCrawler for IBM DB2, a free open-source tool designed to produce text output that can be diffed. You can get very detailed information about your schema, including view and stored procedure definitions. All of the information that you need will be output in a single file, and can be compared very easily using a standard diff tool.
Sualeh Fatehi, SchemaCrawler
Unfortunately, company policy does not allow me to use these tools at this time, so I am writing a program using JDBC to get the details and do the comparison.

Physical location of objects in a PostgreSQL database?

I'd like to find the physical locations of tables, views, functions, and table data in PostgreSQL on Linux. In my scenario, PostgreSQL could be installed on an SD card or on a hard disk. If my tables, views, functions, and data are on the SD card, I want to find their physical locations and merge/copy them onto my hard disk whenever I want to replace the storage. I assume the database is stored as plain files.
Also, is it possible to view the contents of the files? I mean, can I access them?
Kevin and Mike already provided pointers where to find the data directory. For the physical location of a table in the file system, use:
SELECT pg_relation_filepath('my_table');
Don't mess with the files directly unless you know exactly what you are doing.
A database as a whole is represented by a subdirectory in PGDATA/base:
If you use tablespaces it gets more complicated. Read details in the chapter Database File Layout in the manual:
For each database in the cluster there is a subdirectory within
PGDATA/base, named after the database's OID in pg_database. This
subdirectory is the default location for the database's files; in
particular, its system catalogs are stored there.
...
Each table and index is stored in a separate file. For ordinary
relations, these files are named after the table or index's filenode
number, which can be found in pg_class.relfilenode.
...
The pg_relation_filepath() function shows the entire path (relative to
PGDATA) of any relation.
Bold emphasis mine.
The manual about the function pg_relation_filepath().
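To make the layout from the quoted manual text concrete, here is a small sketch that assembles the path the same way for a relation in the default tablespace. The OIDs and the data directory are made-up values; on a real system they would come from pg_database.oid, pg_class.relfilenode, and SHOW data_directory. Relations in other tablespaces live elsewhere, so pg_relation_filepath() remains the reliable source.

```python
from pathlib import PurePosixPath


def default_relation_path(database_oid, relfilenode):
    """Path relative to PGDATA for a relation in the default tablespace:
    base/<database OID>/<relfilenode>."""
    return PurePosixPath("base") / str(database_oid) / str(relfilenode)


def absolute_relation_path(data_directory, relative_path):
    """Join the data directory with a path relative to PGDATA."""
    return PurePosixPath(data_directory) / relative_path


if __name__ == "__main__":
    # 16384 and 16390 are placeholder OIDs, not values from a real cluster.
    rel = default_relation_path(16384, 16390)
    print(rel)  # base/16384/16390
    print(absolute_relation_path("/var/lib/postgresql/data", rel))
```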
The query show data_directory; will show you the main data directory. But that doesn't necessarily tell you where things are stored.
PostgreSQL lets you define new tablespaces. A tablespace is a named directory in the filesystem. PostgreSQL lets you store individual tables, indexes, and entire databases in any permissible tablespace. So if a database were created in a specific tablespace, I believe none of its objects would appear in the data directory.
For solid run-time information about where things are stored on disk, you'll probably need to query pg_database, pg_tablespace, or pg_tables from the system catalogs. Tablespace information might also be available in the information_schema views.
But for merging or copying to your hard disk, using these files is almost certainly a Bad Thing. For that kind of work, pg_dump is your friend.
If you're talking about copying the disk files as a form of backup, you should probably read this, especially the section on Continuous Archiving and Point-in-Time Recovery (PITR):
http://www.postgresql.org/docs/current/interactive/backup.html
If you're thinking about trying to directly access and interpret data in the disk files, bypassing the database management system, that is a very bad idea for a lot of reasons. For one, the storage scheme is very complex. For another, it tends to change in every new major release (issued once per year). Thirdly, the ghost of E.F. Codd will probably haunt you; see rules 8, 9, 11, and 12 of Codd's 12 rules.

suggest a postgres tool to find the difference between the schema and the data

Dear all,
Can anyone suggest a Postgres tool for Linux that finds the
differences between two given databases?
I tried apgdiff 2.3, but it only reports differences in the schema, not in the data,
and I need both!
Thanks in advance!
Comparing data is not easy, especially if your database is huge. I created a Python program that dumps a PostgreSQL schema to a file that can easily be compared with a third-party diff program: http://code.activestate.com/recipes/576557-dump-postgresql-db-schema-to-text/?in=user-186902
I think this program can be extended to dump the data of every table into separate CSV files, similar to those used by the PostgreSQL COPY command. Remember to add the same ORDER BY to the SELECT ... queries on both sides. I have created a tool that reads SELECT statements from a file and saves the results in separate files. This way I can manage which tables and fields I want to compare (not all fields can be used in ORDER BY, and not all are important to me). Such a configuration can easily be created using the "dump schema" utility.
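A minimal sketch of that idea, not the tool described above: one helper builds a COPY statement with an explicit ORDER BY, and another compares two CSV exports line by line. The table and column names are placeholders, and the identifiers are assumed to be trusted, simple names (no quoting is done).

```python
import difflib


def ordered_export_sql(table, columns, order_by):
    """Build a COPY statement that exports the chosen columns as CSV in a
    deterministic order, so two exports can be compared line by line."""
    col_list = ", ".join(columns)
    order_list = ", ".join(order_by)
    return (
        f"COPY (SELECT {col_list} FROM {table} ORDER BY {order_list}) "
        "TO STDOUT WITH (FORMAT csv, HEADER)"
    )


def diff_csv(left_text, right_text):
    """Return only the lines that differ between two CSV exports,
    prefixed with '- ' (left only) or '+ ' (right only)."""
    lines = difflib.ndiff(left_text.splitlines(), right_text.splitlines())
    return [line for line in lines if line[:2] in ("- ", "+ ")]


if __name__ == "__main__":
    # "customers" and its columns are hypothetical.
    print(ordered_export_sql("customers", ["id", "name"], ["id"]))
    left = "id,name\n1,Ann\n2,Bob\n"
    right = "id,name\n1,Ann\n2,Rob\n"
    for line in diff_csv(left, right):
        print(line)
```

Running the same statement against both databases and feeding the two outputs to diff_csv (or to any diff program) gives the data comparison; for huge tables this will be slow, as noted above.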
Check out DBSOLO. It does both object and data compares and can create a sync script based on the results. It's free to try and $99 to buy. My guess is the 99 bucks will be money well spent to avoid trying to come up with your own software to do this.
Data Compare
http://www.dbsolo.com/help/datacomp.html
Object Compare
http://www.dbsolo.com/help/compare.html
apgdiff https://www.apgdiff.com/
It's an open-source solution. I have used it before to check for differences between dumps. Quite useful.
[EDIT]
It only diffs schemas, not data.