I am trying to export data to a db2 database using sqoop export with
--update-mode allowinsert
--update-key some_column
But I can't determine whether DB2 supports this feature. The Sqoop export failure message is cryptic and gives no information about what actually failed, and I can't find any examples of this being used specifically with DB2 anywhere.
Any information would be greatly appreciated.
After some further attempts/research, I found that DB2 does support
--update-key id
but does not yet have support for
--update-mode allowinsert
This means you cannot do upserts to DB2 using Sqoop, but you can do updates and inserts separately. For updates, use --update-key id; for inserts, remove the argument altogether.
However, note that if your data set contains both updates and inserts and you try to do an insert run without --update-key, the export will fail. You first need to split your data into new records and updated records and then push each set separately.
Lastly, an update run against data that also contains new records will not fail; it will simply ignore the new records. They will not be inserted, even though the Sqoop stdout reports that all records were exported.
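A minimal sketch of the resulting two-pass export, assuming the records have already been split into separate HDFS directories (connection string, credentials, table name, and paths are placeholders):
# Pass 1: updates only (rows whose key already exists in the DB2 table)
sqoop export \
  --connect jdbc:db2://db2host:50000/MYDB \
  --username myuser \
  --password-file /user/me/.db2.password \
  --table MY_TABLE \
  --export-dir /data/exports/updated_records \
  --update-key id

# Pass 2: inserts only (rows whose key does not yet exist in the table)
sqoop export \
  --connect jdbc:db2://db2host:50000/MYDB \
  --username myuser \
  --password-file /user/me/.db2.password \
  --table MY_TABLE \
  --export-dir /data/exports/new_records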
I am trying to import a dump file containing a single table, c_emailnotificationtemplate, with its data, which was generated by this command:
mysqldump --host=10.88.129.238 --user=root --password client_1002
c_emailnotificationtemplate --single-transaction
--set-gtid-purged=OFF > c_emailnotificationtemplate.sql
But when I try to import c_emailnotificationtemplate.sql into my database, the database gets locked: I am not able to perform any queries and the data does not get inserted into the table.
I tried adding --skip-lock-tables to the command, but it doesn't help.
Is there any way to skip the locking that happens when I import the SQL file?
Some details:
database: client_1002
table name: c_emailnotificationtemplate
DB instance: GCP Cloud SQL
When importing data into a Cloud SQL instance, you are likely to encounter long import times, depending on the size of the file you are importing.
It's possible for queries to lock the MySQL database, causing all subsequent queries to block or time out. Connect to the database and try to execute this query:
SHOW PROCESSLIST
The first item in the list may be the one holding the lock that the subsequent items are waiting on. Check whether there are any issues with redundancies or data consistency and eliminate those.
Also check the engine status output to understand which table or item is causing the issue, and try to fix that:
SHOW ENGINE INNODB STATUS\G
To avoid locks and stuck operations, and to reduce the time each operation takes, use the Cloud SQL import or export functionality with smaller batches of data. Another way to shorten the import is to reduce the number of connections the database receives while you are importing data into your instance.
Check the Best Practices for SQL Import and Export and the import/export guide for reference.
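As a hedged sketch, a server-side Cloud SQL import of the same dump might look like this (bucket and instance names are placeholders):
# Upload the dump to Cloud Storage, then let Cloud SQL import it server-side.
gsutil cp c_emailnotificationtemplate.sql gs://my-bucket/
gcloud sql import sql my-instance \
  gs://my-bucket/c_emailnotificationtemplate.sql \
  --database=client_1002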
I am using the command below to load an Object Storage file into the DB2 table NLU_TEMP_2:
CALL SYSPROC.ADMIN_CMD('load from "S3::s3.jp-tok.objectstorage.softlayer.net::
<s3-access-key-id>::<s3-secret-access-key>::nlu-test::practice_nlu.csv"
OF DEL modified by codepage=1208 coldel0x09 method P (2) WARNINGCOUNT 1000
MESSAGES ON SERVER INSERT into DASH12811.NLU_TEMP_2(nlu)');
The above command inserts the 2nd column from the Object Storage file into the nlu column of DASH12811.NLU_TEMP_2.
I want to insert request_id from a variable as an additional column, so the target becomes DASH12811.NLU_TEMP_2(request_id, nlu).
I read in an article that statement concentrator literals can be used to dynamically pass a value. Please let me know if anyone has an idea of how to use it.
Note: I would be using this query on DB2, not Db2 Warehouse, so external tables won't work.
LOAD does not have any ability to include extra values that are not part of the load file. You can try to do things with columns that are generated by default in Db2, but it is not a good solution.
Really, you need to wait until Db2 on Cloud supports external tables.
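If you do want to experiment with that workaround, here is a rough sketch only: it assumes the REQUEST_ID column already exists on the target table, the default value is illustrative, and you would need to verify that LOAD fills omitted columns from the column default for your table definition:
-- Set a temporary default carrying the request id for this load.
ALTER TABLE DASH12811.NLU_TEMP_2
  ALTER COLUMN REQUEST_ID SET DEFAULT 'req-12345';

-- Run the same LOAD as above, listing only the NLU column; the omitted
-- REQUEST_ID column is expected to pick up the table default for each row.
CALL SYSPROC.ADMIN_CMD('load from "S3::s3.jp-tok.objectstorage.softlayer.net::
<s3-access-key-id>::<s3-secret-access-key>::nlu-test::practice_nlu.csv"
OF DEL modified by codepage=1208 coldel0x09 method P (2) WARNINGCOUNT 1000
MESSAGES ON SERVER INSERT into DASH12811.NLU_TEMP_2(nlu)');

-- Remove the temporary default afterwards.
ALTER TABLE DASH12811.NLU_TEMP_2
  ALTER COLUMN REQUEST_ID DROP DEFAULT;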
I'd like to understand whether CDC-enabled IBM IMS segments and IBM DB2 table sources would be able to provide both the before and after images of changed values (like the Oracle :OLD and :NEW values in a trigger) so that both could be used for further processing.
Note:
We are supposed to retrieve these values through Informatica PowerExchange, then process them and push them to targets.
As of now, we need to know whether we would be able to retrieve both the before-snapshot and after-snapshot values from IBM DB2 and IBM IMS (:OLD and :NEW as in Oracle triggers is not an exactly comparable example, but is mentioned just as an aid to understanding).
Any help is much appreciated. Thanks.
I don't believe CDC captures before data in its change messages, which it compiles from the DBMS log data. Its main purpose is to issue the minimum number of commands needed to replicate the data from one database to another. You'll want to take a snapshot of your replica database prior to processing the change messages if you want to preserve the state of the data such that you can query it.
Alternatively for Db2, it's probably easier to work with the temporal tables feature added in Db2 10 as that allows you to define what changes should drive a snapshot. You can then access the temporal data using a temporal SQL query.
SELECT ... FROM <table> <period-specification>
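For instance, a hedged sketch of system-period temporal queries (table, columns, and timestamps are illustrative, and it assumes the table has been defined as a system-period temporal table with an associated history table):
-- Row values as of a point in time
SELECT claim_id, claim, amount
  FROM claims FOR SYSTEM_TIME AS OF TIMESTAMP('2023-01-01-00.00.00')
 WHERE claim_id = 117;

-- All versions of a row within a window, with each version's validity period
SELECT claim_id, claim, amount, sys_start, sys_end
  FROM claims FOR SYSTEM_TIME FROM TIMESTAMP('2023-01-01-00.00.00')
                              TO TIMESTAMP('2023-07-01-00.00.00')
 WHERE claim_id = 117;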
Example trigger with OLD and NEW referencing:
CREATE TRIGGER danny117
  NO CASCADE BEFORE UPDATE ON mylib.myfile
  REFERENCING NEW AS N OLD AS O
  FOR EACH ROW
BEGIN ATOMIC
  -- Don't let the claim change and force upper case;
  -- i.e. just do something automatically on update.
  SET N.claim = UCASE(O.claim);
END
W.r.t. PowerExchange 9.1.0 and 9.6:
Before-snapshot data can't be processed via PowerExchange for DB2. I recently worked on a migration project, and I thought that, like Oracle CDC which uses SCN numbers, there should be something for DB2 to start the logger from any desired point. But to my surprise, Informatica global support confirmed that before-snapshot data can't be captured by PowerExchange.
They talked about materializing and de-materializing targets, which was beyond my knowledge at the time; later I found out they meant exporting and importing history data.
Even if you have a table with CDC enabled, you can't capture the before-snapshot data from PWX.
DB2 change capture reads from the DB2 logs, where each record carries a marker for the operation (U/I/D); that's enough for PowerExchange to proceed.
I want to periodically export data from db2 and load it in another database for analysis.
In order to do this, I would need to know which rows have been inserted/updated since the last time I've exported things from a given table.
A simple solution would probably be to add a timestamp to every table and use that as a reference, but I don't have such a TS at the moment, and I would like to avoid adding it if possible.
Is there any other solution for finding the rows which have been added/updated after a given time (or something else that would solve my issue)?
There is an easy option for a timestamp in Db2 (for LUW) called
ROW CHANGE TIMESTAMP
This is managed by Db2 and can be defined as HIDDEN, so existing SELECT * FROM queries will not retrieve the new column, which would otherwise cause extra costs.
Check out the Db2 CREATE TABLE documentation
This functionality was originally added for optimistic locking but can be used for such situations as well.
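A minimal sketch of adding and using such a column, assuming a Db2 LUW table; the table and column names are illustrative:
-- Add a hidden row-change timestamp column to an existing table.
ALTER TABLE orders
  ADD COLUMN rct TIMESTAMP NOT NULL
  GENERATED ALWAYS FOR EACH ROW ON UPDATE AS ROW CHANGE TIMESTAMP
  IMPLICITLY HIDDEN;

-- SELECT * does not return the hidden column; reference it explicitly
-- to find rows inserted or updated since the last extract.
SELECT *
  FROM orders
 WHERE rct > TIMESTAMP('2024-01-01-00.00.00');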
There is a similar concept for Db2 z/OS; you would have to check that out yourself, as I have not tried it.
Of course, there are other ways to solve it, like replication, etc.
That is not possible if you do not have a timestamp column. With a timestamp column, you can tell which rows are new or modified.
You can also use the Time Travel Query feature to get the new values, but that also implies a timestamp column.
Another option is to put the tables in append mode and then fetch the rows stored after a given one (see the sketch after this list). However, this option is not reliable after a reorg, and it affects performance and space utilisation.
One possible option is to use SQL replication, but that needs extra tables for staging.
Finally, another option is to read the logs with the db2ReadLog API, but that requires custom development. Just applying the archived logs to the new database is also possible; however, that database will then remain in roll-forward pending state.
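A very rough sketch of the append-mode idea, for illustration only: it assumes a Db2 LUW table named ORDERS, relies on appended rows receiving ever-larger row identifiers, and, as noted above, breaks after a REORG:
-- Put the table in append mode so new rows are added at the end of the table.
ALTER TABLE orders APPEND ON;

-- At the end of each extract, remember the highest row identifier seen...
SELECT MAX(RID(o)) AS last_rid FROM orders o;

-- ...and next time export only rows stored after it (123456 stands in for last_rid).
SELECT o.* FROM orders o WHERE RID(o) > 123456;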
For a mandatory assignment in a DB2 class, I'm asked to write a procedure to "export information about all xxx, delete all xxx and import the information again", where xxx is my table.
This procedure has to be as efficient as possible.
I'm quite stuck here; naively I see two options:
1) write a SELECT * FROM xxx; drop ...; insert; using Python or something
2) use some export/import utility of DB2
But I could be totally wrong. Suggestions?
What I've noticed is that there are no integrity constraints.
You can do that via "export / load / set integrity". I think it is the best way, if you execute it on the server.
If you use Python, you will have to use an ODBC driver or similar to get the data, process it, etc.
If you use Python just to execute the commands, that is fine; in the end it is just a call to the database.
If you execute the process on another machine, network usage increases and performance is lower.
Import performs what is effectively an "insert" per row in the file, which uses a lot of transaction log. The load command, instead, puts the data directly into the tablespace and then checks referential integrity (a faster process).
Finally, if you want to extract the information very fast, you can buy IBM InfoSphere® Optim™ High Performance Unload for DB2 for Linux, UNIX and Windows.
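A minimal sketch of the export / load / set integrity sequence, run from the Db2 command line processor on the server (table and file names are illustrative):
-- Export the current contents to a delimited file.
EXPORT TO /tmp/xxx.del OF DEL SELECT * FROM xxx;

-- LOAD with REPLACE empties the table and reloads the data directly into the
-- tablespace, avoiding per-row inserts and most transaction logging.
LOAD FROM /tmp/xxx.del OF DEL REPLACE INTO xxx;

-- If the table had referential or check constraints, LOAD would leave it in
-- set integrity pending state; this clears it (not strictly needed here,
-- since the assignment says there are no integrity constraints).
SET INTEGRITY FOR xxx IMMEDIATE CHECKED;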
I have had a similar task before.
The solution is simple and sweet:
Do a simple export to CSV; once the data has been exported, the main thing is to empty the table with logging disabled and then load the data back into the table.
-- Export the current contents to a delimited file
EXPORT TO <FileName>.CSV OF DEL SELECT * FROM <TableName>;
-- Empty the table without logging the deletes
ALTER TABLE <TableName> ACTIVATE NOT LOGGED INITIALLY WITH EMPTY TABLE;
-- Load the exported data back into the table
LOAD FROM "./<FileName>.CSV" OF DEL INSERT INTO <TableName>;