IBM DB2 9.7 archiving specific tables and columns - db2

I would like to periodically, e.g. once a year, archive a set of table rows from our DB2 9.7 database based on some criteria. For example, once a year, archive all EMPLOYEE rows whose creation date is more than one year old?
By archive I mean that the data is moved out of the DB schema and stored in some other location, in a retrievable format. Is this possible to do?

System doesn't need to access archived data
If your program does not need to access the archived data, then I would suggest this:
Create an export script (reference here), e.g.:
echo '===================== export started ';
values current time;
-- maybe ixf format would be better?
export to tablename.del of del
select * from tablename
where creation_date < (current date - 1 year)
;
echo '===================== export finished ';
Create a delete DB2 script, e.g.:
echo '===================== delete started ';
values current time;
delete from tablename
where creation_date < (current date - 1 year)
;
commit;
echo '===================== delete finished ';
Write a batch script which calls both scripts and copies the new export file to a safe location. The script must ensure that the delete is not run until the data has been stored safely:
db2 connect to db user xx using xxx
db2 -s -vtf export.sql || exit 1
7z a safe-location-<date-time>.7z tablename.del || exit 1
# the delete runs only if the export and the archive step both succeeded
db2 -s -vtf delete.sql
Register the batch script as a cron job to run all of this automatically.
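For example, a crontab entry along these lines (the paths are assumptions) would run the archive script once a year, on January 1st at 02:00:
# minute hour day-of-month month day-of-week  command
0 2 1 1 * /home/db2inst1/archive/archive_tablename.sh >> /home/db2inst1/archive/archive.log 2>&1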
Again, since deleting is a very sensitive operation, I would suggest having more than one backup mechanism to ensure that no data is lost (e.g. have the delete use a different timeframe, such as deleting only rows older than 1.5 years).
System should access archived data
If you need your system to access archived data, then I would suggest one of the following methods:
export / import to another database or table, then delete
a stored procedure which does a select + insert into another database or table, followed by a delete; for example, you can adapt the stored procedure in answer nr. 3 in this question
table partitioning (reference here); see the sketch below
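A rough sketch of the partitioning route in DB2 SQL; the table and column names and the yearly ranges are assumptions:
-- a partitioned copy of the table, one range per year of creation_date
CREATE TABLE employee_part (
    emp_id        INTEGER NOT NULL,
    name          VARCHAR(100),
    creation_date DATE NOT NULL
)
PARTITION BY RANGE (creation_date) (
    PARTITION p2011 STARTING '2011-01-01' ENDING '2011-12-31',
    PARTITION p2012 STARTING '2012-01-01' ENDING '2012-12-31'
);
-- once a year, roll the oldest range out into its own table; the detached
-- table can then be exported and dropped, or kept as the online archive
ALTER TABLE employee_part DETACH PARTITION p2011 INTO employee_2011;
After the DETACH, employee_2011 is an ordinary table that can be exported with the scripts above or moved to cheaper storage.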

Sure, why not? One fairly straightforward way is to write a stored procedure that basically would:
extract all records of the given table you wish to archive into a temp table,
insert those temp records into the archive table,
delete from the given table where the primary key is IN the temp table
If you wanted only a subset of columns to go into your archive, you could extract from a view containing just those columns, as long as you still capture the primary key in your temp table.
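A minimal sketch of such a procedure in DB2 SQL PL; the table and column names (EMPLOYEE, EMPLOYEE_ARCHIVE, EMP_ID, CREATION_DATE) are assumptions, and it presumes a user temporary table space exists for the declared temp table:
CREATE PROCEDURE archive_employee()
LANGUAGE SQL
BEGIN
  -- collect the keys of the rows to archive
  DECLARE GLOBAL TEMPORARY TABLE session.arch_keys (emp_id INTEGER)
    ON COMMIT PRESERVE ROWS NOT LOGGED WITH REPLACE;

  INSERT INTO session.arch_keys (emp_id)
    SELECT emp_id FROM employee
    WHERE creation_date < (CURRENT DATE - 1 YEAR);

  -- copy the full rows into the archive table
  INSERT INTO employee_archive
    SELECT e.* FROM employee e
    WHERE e.emp_id IN (SELECT emp_id FROM session.arch_keys);

  -- remove them from the live table
  DELETE FROM employee
    WHERE emp_id IN (SELECT emp_id FROM session.arch_keys);
END
When creating this through the CLP, use an alternate statement terminator (e.g. db2 -td@ -f proc.sql), since the procedure body contains semicolons.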

Related

Is there a way to filter pg_dump by timestamp for PostgreSQL?

I have a database that needs backing up, but only for specific timestamps and tables, e.g. from the first of October to the 15th of October. After looking on multiple sites, I have not found any method that suits my requirements.
Let's say I have database_A, and database_A has 15 tables. I want to be able to use pg_dump to back up 10 tables from database_A, from the 1st of October to the 15th of October, all into one file. Below is what I have managed to do, but I have not gotten the date portion working yet, as I'm not entirely sure how.
pg_dump -U postgres -t"\"table_1\"" -t"\"table_2\"" database_A > backup.csv
The above code will work if I want to back up multiple tables into one file, but it backs up each entire table, from start to end.
I would much appreciate if someone could help me with this, as I am still mostly a beginner at this. Thank you!
If the data you're copying has a column named timestamp you can use psql and the COPY command to accomplish this:
# Optional: clear existing table since COPY FROM will append data
psql -c "TRUNCATE TABLE my_table" target_db
psql -c "COPY (SELECT * FROM my_table WHERE timestamp >= '...' AND timestamp <= '...') TO STDOUT" source_db | psql -c "COPY my_table FROM STDIN" target_db
You can repeat this pattern for as many tables as necessary. I've used this approach before to copy a subset of live data into a development database and it works quite well, especially if you put the above commands into a shell script.
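A sketch of such a script; the database names, table list, timestamp column and the example date range are assumptions:
#!/bin/sh
# copy a date range of each table from source_db into target_db
START='2023-10-01'   # example dates; adjust to the range you need
END='2023-10-16'     # exclusive upper bound, i.e. up to the 15th inclusive
for tbl in table_1 table_2 table_3; do
    psql -c "TRUNCATE TABLE $tbl" target_db
    psql -c "COPY (SELECT * FROM $tbl WHERE timestamp >= '$START' AND timestamp < '$END') TO STDOUT" source_db \
        | psql -c "COPY $tbl FROM STDIN" target_db
done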

Talend - Insert/Update versus table-file-table

I have been using insert/update to update or insert a table in mysql from sql server. The job is set up as a cronjob. The job runs every 8 hours. The number of records in the source table is around 400000. Every 8 hours around 100 records might get updated or inserted.
I run the job in such a way that, at the source level, I only take the rows modified between the last run and the current run.
I have observed that it takes 30 minutes just to update/insert 100 rows.
However, another way was to dump all 400,000 rows to a file, then truncate the destination table and insert all of those records over again. This process would be done at every job run.
So, may I know why insert/update takes so much time?
Thanks
Rathi
As you said, you run the job in such a way that, at the source level, you only take the rows modified between the last run and the current run.
So just insert all these modified rows into a temp table.
Take the minimum modified date from the temp table (or use the same criteria you use to extract only the modified rows from the source) and delete the matching rows from the destination table.
Then you can insert all the rows from the temp table into the final table, as in the sketch below.
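A minimal sketch of that pattern in MySQL; the table and column names (stg_orders, dst_orders, id) are assumptions:
-- staging table shaped like the destination; in Talend this would be the
-- target of the extract of modified rows (truncate it between runs)
CREATE TABLE stg_orders LIKE dst_orders;

-- remove the destination rows that are about to be replaced
DELETE d
FROM dst_orders d
JOIN stg_orders s ON s.id = d.id;

-- re-insert the fresh versions from the staging table
INSERT INTO dst_orders
SELECT * FROM stg_orders;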
Let me know if you have any question.
Without knowing how your database is configured, it's hard to tell the exact reason, but I'd say the updates are slow because you don't have an index on your target table.
Try adding an index on your insert/update key column; it will speed things up.
Also, are you doing a commit after each insert? If so, disable autocommit and only commit on success, like this: tMysqlOutput -- OnComponentOk -- tMysqlCommit.
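For example, assuming the target table and key column are called dst_orders and id (hypothetical names):
-- index the column that the insert/update lookup matches on
CREATE INDEX idx_dst_orders_id ON dst_orders (id);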

How do I find load history of a Redshift table?

I loaded data that isn't passing QA, and need to go back to a previous day where the data matched QA results. Is there a system table that I can query to see dates where the table was previously loaded or copied from S3?
I'm mostly interested in finding the date information of previous loads.
SELECT Substring(querytxt, 5, Regexp_instr(querytxt, ' FROM') - 5),
       querytxt
FROM   stl_query s1,
       stl_load_commits s2
WHERE  s1.query = s2.query
       AND Upper(s1.querytxt) LIKE '%COPY%'
       AND Lower(s1.querytxt) LIKE '%s3://%'
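If you also want the time of each load, something like this should work (curtime and filename come from STL_LOAD_COMMITS, querytxt from STL_QUERY; the MY_TABLE name filter is an assumption):
SELECT s2.curtime AS load_time,
       s2.filename,
       s1.querytxt
FROM   stl_query s1
       JOIN stl_load_commits s2 ON s1.query = s2.query
WHERE  Upper(s1.querytxt) LIKE '%COPY%MY_TABLE%'
ORDER  BY s2.curtime DESC;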
You can use these system tables: STL_LOAD_COMMITS for all loads (COPY), and STL_LOAD_ERRORS for all load errors.
For a rollback there is no direct process. What you need to do is copy the data into some _QA_passed table before each fresh load, and rename the tables if there is an issue.
The other way, if you have a date column that you use when loading the data, is to use a DELETE query to remove the fresh data that is not good, and then run VACUUM to free up the space.
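For example (the table name, date column and date value are hypothetical):
-- remove the bad load by its load date, then reclaim the space
DELETE FROM my_table WHERE load_date = '2023-10-15';
VACUUM my_table;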

Command to read a file and execute script with psql

I am using PostgreSQL 9.0.3. I have an Excel spreadsheet with lots of data to load into couple of tables in Windows OS.
I have written a script to get the data from the input file and insert it into some 15 tables. This can't be done with COPY or import. I named the input file DATALD.
I found the psql options -d to point to the database and -f to run the SQL script, but I need to know how to feed the input file along with the script so that the data gets inserted into the tables.
For example this is what I have done:
begin
for emp in (select distinct w_name from DATALD where w_name <> 'w_name')
--insert in a loop
INSERT INTO tblemployer( id_employer, employer_name,date_created, created_by)
VALUES (employer_id,emp.w_name,now(),'SYSTEM1');
Can someone please help?
For an SQL script you must:
either have the data inlined in your script (in the same file),
or use COPY to import the data into Postgres.
I suppose you will need a temporary staging table, since the format doesn't seem to fit the target tables. Code example:
How to bulk insert only new rows in PostgreSQL
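A minimal sketch of that approach from psql, assuming the spreadsheet has been saved as CSV; the staging columns, file path, sequence name and target columns are assumptions based on the snippet in the question:
-- staging table that mirrors the CSV layout
CREATE TEMP TABLE datald (w_name text, w_address text);

-- load the CSV exported from Excel (psql meta-command, runs client-side)
\copy datald FROM 'C:/data/DATALD.csv' CSV HEADER

-- then fan the data out into the real tables
INSERT INTO tblemployer (id_employer, employer_name, date_created, created_by)
SELECT nextval('tblemployer_id_seq'), w_name, now(), 'SYSTEM1'
FROM  (SELECT DISTINCT w_name FROM datald WHERE w_name <> 'w_name') d;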
There are other options like pg_read_file(). But:
Use of these functions is restricted to superusers.
Intended for special purposes.

Postgresql, query results to new table

Windows/NET/ODBC
I would like to get query results into a new table in some handy way, so that I can see them through a data adapter, but I can't find a way to do it.
There are not many examples around at a beginner's level on this.
I don't know whether the table should be temporary or not, but after seeing the results the table is no longer needed, so I can delete it 'by hand' or it can be deleted automatically.
This is what I try:
mCmd = New OdbcCommand("CREATE TEMP TABLE temp1 ON COMMIT DROP AS " & _
"SELECT dtbl_id, name, mystr, myint, myouble FROM " & myTable & " " & _
"WHERE myFlag='1' ORDER BY dtbl_id", mCon)
n = mCmd.ExecuteNonQuery
This runs without error, and in 'n' I get the correct number of matched rows!
But with pgAdmin I can't see that table anywhere, no matter whether I look during the open transaction or after the transaction is closed.
Second, should I define the columns for the temp1 table first, or can they be created automatically from the query results (that would be nice!)?
Please give a minimal example, based on the code above, of how to get a new table filled with query results.
A shorter way to do the same thing your current code does is with CREATE TEMPORARY TABLE AS SELECT ... . See the entry for CREATE TABLE AS in the manual.
Temporary tables are not visible outside the session ("connection") that created them, they're intended as a temporary location for data that the session will use in later queries. If you want a created table to be accessible from other sessions, don't use a TEMPORARY table.
Maybe you want UNLOGGED (9.1 or newer) for data that's generated and doesn't need to be durable, but must be visible to other sessions?
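A sketch of both variants, reusing the column list from the question (mytable stands in for the table name held in myTable):
-- temporary: only visible inside the session (connection) that created it
CREATE TEMP TABLE temp1 AS
SELECT dtbl_id, name, mystr, myint, myouble
FROM   mytable
WHERE  myFlag = '1'
ORDER  BY dtbl_id;

-- unlogged (9.1 or newer): visible to other sessions such as pgAdmin,
-- but not crash-safe, so only suitable for disposable data
CREATE UNLOGGED TABLE results1 AS
SELECT dtbl_id, name, mystr, myint, myouble
FROM   mytable
WHERE  myFlag = '1'
ORDER  BY dtbl_id;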
See related: Is there a way to access temporary tables of other sessions in PostgreSQL?