export/import all the information of a table - db2

For a mandatory assignment of a DB2 class I'm asked to write o procedure to export "export information about all xxx, delete all xxx and import the information again." where xxx is my table.
This procedure has to be as efficiently as possible.
I'm quite stuck here, quite naively I see two options
1) write a select * from xxx; drop ...; insert; using python or something
2) using some export/import utility of db2
But I can be totally wrong, suggestions?
what I've noticed is that there are not integrity constraints.

You can do that via "export/load/set integrity". I think it is the best way if you execute that in the server.
If you use python, you will have to use a odbc driver or similar to get the data, processes, etc.
If you use python just to execute the commands, it is ok, finally, it is just a call to the database.
If you execute the process in other machine, the net use is increased, and the performance is lower.
Using import, it is just like an "insert" per row in the file which uses a lot of transaction log. Instead, the load command, puts the data diretly in the tablespace and then check the referential integrity (faster process)
Finally, if you want to extract the information very fast, you can buy the IBM InfoSphere® Optim™ High Performance Unload for DB2 for Linux, UNIX and Windows

I have had a similar task before.
The solution is simple and sweet:
A simple export to csv; and once the data has been exported, the main thing is to TRUNCATE the table with your logs being disabled and then load the data back into the table.
EXPORT TO <FileName>.CSV OF DEL SELECT * FROM <TableName>;
ALTER TABLE <TableName> ACTIVATE NOT LOGGED INITIALLY WITH EMPTY TABLE;
LOAD FROM "./<FileName>.CSV" OF DEL INSERT INTO <TableName>;

Related

DB2 Tables Not Loading when run in Batch

I have been working on a reporting database in DB2 for a month or so, and I have it setup to a pretty decent degree of what I want. I am however noticing small inconsistencies that I have not been able to work out.
Less important, but still annoying:
1) Users claim it takes two login attempts to connect, first always fails, second is a success. (Is there a recommendation for what to check for this?)
More importantly:
2) Whenever I want to refresh the data (which will be nightly), I have a script that drops and then recreates all of the tables. There are 66 tables, each ranging from 10's of records to just under 100,000 records. The data is not massive and takes about 2 minutes to run all 66 tables.
The issue is that once it says it completed, there is usually at least 3-4 tables that did not load any data in them. So the table is deleted and then created, but is empty. The log shows that the command completed successfully and if I run them independently they populate just fine.
If it helps, 95% of the commands are just CAST functions.
While I am sure I am not doing it the recommended way, is there a reason why a number of my tables are not populating? Are the commands executing too fast? Should I lag the Create after the DROP?
(This is DB2 Express-C 11.1 on Windows 2012 R2, The source DB is remote)
Example of my SQL:
DROP TABLE TEST.TIMESHEET;
CREATE TABLE TEST.TIMESHEET AS (
SELECT NAME00, CAST(TIMESHEET_ID AS INTEGER(34))TIMESHEET_ID ....
.. (for 5-50 more columns)
FROM REMOTE_DB.TIMESHEET
)WITH DATA;
It is possible to configure DB2 to tolerate certain SQL errors in nested table expressions.
https://www.ibm.com/support/knowledgecenter/en/SSEPGG_11.5.0/com.ibm.data.fluidquery.doc/topics/iiyfqetnint.html
When the federated server encounters an allowable error, the server allows the error and continues processing the remainder of the query rather than returning an error for the entire query. The result set that the federated server returns can be a partial or an empty result.
However, I assume that your REMOTE_DB.TIMESHEET is simply a nickname, and not a view with nested table expressions, and so any errors when pulling data from the source should be surfaced by DB2. Taking a look at the db2diag.log is likely the way to go - you might even be hitting a Db2 issue.
It might be useful to change your script to TRUNCATE and INSERT into your local tables and see if that helps avoid the issue.
As you say you are maybe not doing things the most efficient way. You could consider using cache tables to take a periodic copy of your remote data https://www.ibm.com/support/knowledgecenter/en/SSEPGG_11.5.0/com.ibm.data.fluidquery.doc/topics/iiyvfed_tuning_cachetbls.html

Command Line Interface (CLI) for SQLDeveloper

I rely on SQLDeveloper to edit and export a schema.
It works like a charm, and I can run import with sqlplus.
I have tried using sqlplus to generate the same schema export, with no result.
I cannot use the Oracle expdp tool, because I need an ASCII file to be able to diff it.
So the only option I have is SQLDeveloper.
I would like to automate the export (data + DDL) with a cron job on a Linux box, but I can't find a way to use SQLDeveloper from a command line to generate the export.
Any clue?
Short answer: no.
For just the schema side of things you may want checkout show create table equivalent in oracle sql which will get you the SQL source of the DDL.
Are you sure you want an ASCII file for the automated export of an entire DB though? I would be surprised if you really want to diff an entire export of a DB. This SO Answer may help a little though.
If you really want to get a full data dump plus DDL you will have to write your own script that gets the DDL as described in the first link and then select * and process each result into a sql insert.

Writing scripts for PostgreSQL to update database?

I need to write an update script that will check to see if certain tables, indexes, etc. exist in the database, and if not, create them. I've been unable to figure out how to do these checks, as I keep getting Syntax Error at IF messages when I type them into a query window in PgAdmin.
Do I have to do something like write a stored procedure in the public schema that does these updates using Pl/pgSQL and execute it to make the updates? Hopefully, I can just write a script that I can run without creating extra database objects to get the job done.
If you are on PostgreSQL 9.1, you can use CREATE TABLE ... IF NOT EXISTS
On 9.0 you can wrap your IF condition code into a DO block: http://www.postgresql.org/docs/current/static/sql-do.html
For anything before that, you will have to write a function to achieve what you want.
Have you looked into pg_tables?
select * from pg_tables;
This will return (among other things) the schemas and tables that exist in the database. Without knowing more of what you're looking for, this seems like a reasonable place to start.

In postgres is possible to dynamically build a INSERT script from a table?

Similar to SQL script to create insert script, I need to generate a list of INSERT from a table to load onto another database (sqlite), like the dump command. I do this for sync purposes.
I have limitations because this will run on a cloud server without acces to the filesystem, so I need to do this in the DB (I can do this in the app server, I'm asking if is possible to do this in the DB directly).
In the app server, I load a datatable, walk his fieldnames and datatypes and build a insert... I wonder if exist a way to do the same in the DB...
I am not entirely sure whether that helps, but you can use simple ETL tool like 'Pentaho Kettle'. I used it once for a similar function and it did not take me more than 10 min. You can also schedule the jobs. I am not sure whether it is supported in database level.
Thanks,
Shankar

Replicating data between Postgres DBs

I have a Postgres DB that is used by a chat application. The chat system often truncates these tables when they grow to big but I need this data copied to another Postgres database. I will not be truncating the tables in this DB.
How I can configure a few tables on the chat-system's database to replicate data to another Postgres database. Is there a quick way to accomplish this?
Slony can replicate only select tables, but I'm not sure how it handles truncates, and it can be a pain to configure.
You might also use something like pgpool to send copies of the insert statements to a second database.
You might modify the source of your chat application to do two writes (one to each db) when a new record is created.
You could just write a script in Perl/PHP/Python to read from one and write to another, then fire it by cron so that you're sure it gets run before truncation.
If you only copy a batch of rows every other day, you may be better off with a plain INSERT to a different schema in the same database or a different database in the same database cluster (you need something like dblink for that).
The safest / fastest solution in the same database would be a data-modifying CTE. Something along these lines:
WITH del AS (
DELETE FROM tbl
WHERE <some condition>
RETURNING *
)
INSERT INTO backup.tbl
SELECT * FROM del;
For true replication consider these official sources:
https://wiki.postgresql.org/wiki/Replication,_Clustering,_and_Connection_Pooling
https://www.postgresql.org/docs/current/runtime-config-replication.html