db2 data load runs slowly unless test data applied - db2

I have a database build and a reference data build which are loaded onto my computer. When I try to load the transaction data from a file to the database via staging tables and stored procedures it takes 20 minutes to load 10,000 records.
If I load the database build, reference data build and also my test data then loading the transaction data via the same process takes between 40-50 seconds.
I am trying to find out why the process speeds up when the test data is present. One possibility I have considered is that loading the test data first lets the database work out a better route for inserting the transaction data, but I wouldn't expect that to make this big a difference in time.
Can anyone recommend what I could do to identify the problem, or have any ideas as to what it could be?
Thanks for any help.

Related

How to process and insert millions of MongoDB records into Postgres using Talend Open Studio

I need to process millions of records coming from MongoDB and set up an ETL pipeline to insert that data into a PostgreSQL database. However, with every method I've tried, I keep getting an out-of-memory (heap space) exception. Here's what I've already tried:
Tried connecting to MongoDB using tMongoDBInput and put a tMap to process the records and output them using a connection to PostgreSQL. tMap could not handle it.
Tried to load the data into a JSON file and then read from the file to PostgreSQL. Data got loaded into JSON file but from there on got the same memory exception.
Tried increasing the RAM for the job in the settings and tried the above two methods again, still no change.
I specifically wanted to know if there's any way to stream this data or process it in batches to counter the memory issue.
Also, I know that there are some components dealing with BulkDataLoad. Could anyone please confirm whether it would be helpful here since I want to process the records before inserting and if yes, point me to the right kind of documentation to get that set up.
Thanks in advance!
As you have already tried these options, the only way I can see to meet this requirement is to break the job down into multiple sub-jobs, or to go with an incremental load based on key columns or date columns, considering that this is a one-time activity for now.
Please let me know if it helps.
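The batching idea can be illustrated with a minimal sketch. This is not Talend code: it is a language-neutral illustration in Python, and `batched`, `run_pipeline`, `transform`, and `write_batch` are hypothetical names standing in for the MongoDB cursor, the per-record processing, and the PostgreSQL writer. The point is that only one batch is held in memory at a time.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def batched(records: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield lists of at most `size` records without materializing the whole input."""
    if size <= 0:
        raise ValueError("size must be positive")
    it = iter(records)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def run_pipeline(
    source: Iterable[T],
    transform: Callable[[T], U],
    write_batch: Callable[[List[U]], None],
    batch_size: int = 1000,
) -> int:
    """Read from `source` lazily, transform each record, and write batch by batch.

    `source` is assumed to be a lazy iterable (for example a database cursor),
    and `write_batch` is assumed to perform one bulk insert per call.
    Returns the number of records processed.
    """
    total = 0
    for batch in batched(source, batch_size):
        write_batch([transform(record) for record in batch])
        total += len(batch)
    return total
```

Memory use is then bounded by `batch_size` rather than by the total number of records, provided the source itself is read lazily.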

How to cache a query response with Postgres?

I have a database that syncs completely every 2 hours. All data is dropped and populated from the main data source.
I have some queries coming from the client app that return the same response for the whole current 2-hour dataset. So if 100 clients run their apps, I have to run the query 100 times, once for each of them, even though the results don't differ.
How do I avoid running this real query against my database every time, but just keep its response somewhere and return it instead?
I think I can run this query after each sync and save to its own table then return from it.
What are other options, probably provided by Postgres itself?
You should use something like Redis to store the result of your query in memory. It comes with many clients. You can invalidate the cached result of this query when it's time to.
There are other in-memory caches, like memcached, that are easy to install and use.
Note that these are not specific to Postgres.
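The cache-and-invalidate pattern can be sketched in-process, without committing to Redis or memcached. `TTLCache` and its methods are hypothetical names for illustration, not part of any library; a real deployment would more likely use an external cache shared between application servers.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Keep computed values for `ttl_seconds`, then recompute on the next request."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        # key -> (time stored, value)
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        """Return the cached value for `key`, calling `compute` only on a miss or expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        value = compute()
        self._entries[key] = (now, value)
        return value

    def invalidate_all(self) -> None:
        """Drop every entry, e.g. right after the 2-hour sync repopulates the database."""
        self._entries.clear()
```

With a 2-hour sync, one option is `TTLCache(ttl_seconds=2 * 60 * 60)` plus a call to `invalidate_all()` from the sync job, so that clients never see results from the previous dataset.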

Is there any way in sqlmap(sql-injection testing tool) to fetch database tables without running the complete test?

When I test a URL, it takes a long time to complete the whole test and retrieve the database tables. Is there a shorter way to fetch the databases or the database tables?

getting data from DB in spring batch and store in memory

In my Spring Batch program, I am reading records from a file and checking against the DB whether the data (say, column1 from the file) already exists in table1.
Table1 is fairly small and static. Is there a way I can get all the data from table1 and store it in memory in the spring batch code? Right now for every record in the file, the select query is hitting the DB.
The file has 3 columns delimited with "|".
The file I am reading has on average 12 million records, and the job takes around 5 hours to complete.
Preload the data into memory using StepExecutionListener.beforeStep (or @BeforeStep).
Using this trick data will be loaded once before step execution.
This also works for step restarting.
I'd use caching like a standard web app. Add service caching using Spring's caching abstractions and that should take care of it IMHO.
Load the static table in JobExecutionListener.beforeJob(-) and keep it in the job context; you can then access it across multiple steps using 'Late Binding of Job and Step Attributes'.
You may refer to section 5.4 of this link: http://docs.spring.io/spring-batch/reference/html/configureStep.html
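The effect of preloading can be shown with a small language-neutral sketch. This is not Spring Batch code: it is written in Python for brevity, and `load_keys` and `new_records` are hypothetical names. It assumes column1 is the first field in each "|"-delimited line and that table1's keys fit comfortably in memory.

```python
from typing import Iterable, Iterator, List, Set


def load_keys(rows: Iterable[str]) -> Set[str]:
    """Load the small, static table once. `rows` stands in for one SELECT of column1."""
    return set(rows)


def new_records(lines: Iterable[str], known_keys: Set[str]) -> Iterator[List[str]]:
    """Yield the fields of each line whose first column is not already in the table.

    The membership test is an in-memory set lookup, so no query is issued per record.
    """
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if fields[0] not in known_keys:
            yield fields
```

The table is read once up front, which replaces roughly 12 million SELECT round trips with set lookups; in Spring Batch the equivalent load would happen in the listener callback mentioned above.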

what happens to my dataset in case of unexpected failure

I know this has been asked here, but my question is slightly different. When the DataSet was designed with the disconnected principle in mind, what feature was provided to handle unexpected termination of the application, say a power failure, a Windows hang, or a system exception leading to a restart? Say the user has entered some 100 rows and they have been modified in the DataSet alone. Usually the DataSet is written back to the database when the application closes or at regular intervals.
In the old days, when programming in VB 6.0, all interaction took place directly with the database, so each successful transaction was committed automatically. How can that be done using DataSets?
DataSets are never meant for direct access to the database; they are a disconnected model only. They are not intended to be able to recover from machine failures.
If you want to work live against the database you need to use DataReaders and issue DbCommands against the database live for changes. This of course will increase your load on the database server though.
You have to balance the two for most applications. If you know a user just entered vital data as a new row, execute an insert command to the database, and put a copy in your local cached DataSet. Then your local queries can run against the disconnected data, and inserts are stored immediately.
A DataSet can be serialized very easily, so you could implement your own regular backup to disk by using serialization of the DataSet to the filesystem. This will give you some protection, but you will have to write your own code to check for any data that your application may have saved to disk previously and so on...
You could also ignore DataSets and use SqlDataReaders and SqlCommands for the same sort of 'direct access to the database' you are describing.