I am using Postgres as my database and have a big table (more than 10,000,000 rows).
When I run the update query (update table set field=1 where id in (...), with at most 2,500 ids in the list), my system stops responding.
I set up Grafana monitoring to check which metrics spike on the DB server, but everything looks fine there.
Can someone suggest what the reason might be? I have been unable to figure out what is happening for days.
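Not part of the original post, but a minimal diagnostic sketch, assuming you can open a second session while the UPDATE appears stuck (the wait_event columns exist on PostgreSQL 9.6 and later):

SELECT pid, state, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE state <> 'idle';

-- Lock requests that have been made but not granted usually point at the blocker.
SELECT locktype, relation::regclass AS relation, pid, mode, granted
FROM pg_locks
WHERE NOT granted;

If the UPDATE shows a Lock wait event, the second query identifies which lock it is queued behind; if it is simply burning CPU with no wait event, an EXPLAIN of the UPDATE is the next thing to check.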
I need to process millions of records coming from MongoDB and build an ETL pipeline to insert that data into a PostgreSQL database. However, with every method I've tried, I keep getting an out-of-memory heap space exception. Here's what I've already tried:
Tried connecting to MongoDB using tMongoDBInput with a tMap to process the records and output them over a connection to PostgreSQL. tMap could not handle it.
Tried loading the data into a JSON file and then reading from that file into PostgreSQL. The data got loaded into the JSON file, but from there on I got the same memory exception.
Tried increasing the RAM for the job in the settings and retried the above two methods; still no change.
I specifically wanted to know if there's any way to stream this data or process it in batches to counter the memory issue.
Also, I know that there are some components dealing with BulkDataLoad. Could anyone please confirm whether that would be helpful here, since I want to process the records before inserting, and if so, point me to the right kind of documentation to get it set up?
Thanks in advance!
Since you have already tried all those options, the only way I can see to meet this requirement is to break the job down into multiple sub-jobs, or to go with an incremental load based on key columns or date columns, treating this as a one-time activity for now.
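Purely as an illustration of the incremental idea (the table and column names here are hypothetical, not from your job), each sub-job would pull only one window of data, with the same bounds applied in the tMongoDBInput query:

-- Hypothetical example: load one date window per sub-job instead of everything at once.
INSERT INTO target_table (id, payload, updated_at)
SELECT id, payload, updated_at
FROM staging_table
WHERE updated_at >= TIMESTAMP '2024-01-01 00:00:00'
  AND updated_at <  TIMESTAMP '2024-02-01 00:00:00';

Running several such windows keeps each job's working set small enough to stay within the heap.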
Please let me know if it helps.
So I want to build something like an email queue, say:
insert into messages_to_send (email, firstname, lastname, message_text)
select email, firstname, lastname, 'hello' from subscribers where list = 99
So something like this, but say I want to do 100k rows, or maybe some day a million.
It seems like the COPY command would perform better. I don't want to lock the messages_to_send table or slow down the rest of the database. Speed isn't a big issue; I just want the rows to get in there eventually, and another process will pick them up. I'm not that familiar with Postgres; maybe COPY is good for that, but I couldn't tell from reading.
So what I came up with is to insert 1,000 rows at a time (I saw someone post that that was safer), make a table dedicated to the queue and searching, and move rows to another table after the send, making sure the queue table is cleaned out regularly. I suppose that if I really want to scale I would put that table in a different kind of database than Postgres, but it should be good enough for me for now.
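A minimal sketch of that 1,000-at-a-time idea, assuming subscribers has a numeric primary key id (the schema isn't shown in the post); the calling process remembers the highest id it has copied and repeats until a batch returns no rows:

INSERT INTO messages_to_send (email, firstname, lastname, message_text)
SELECT email, firstname, lastname, 'hello'
FROM subscribers
WHERE list = 99
  AND id > :last_copied_id   -- placeholder parameter supplied by the caller
ORDER BY id
LIMIT 1000;

Each small INSERT only locks the rows it writes, so readers of messages_to_send are not blocked. COPY is mainly for moving data between a file (or client stream) and a table, so for a table-to-table move like this a plain INSERT ... SELECT is the usual tool.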
I have been working on a reporting database in DB2 for a month or so, and I have it set up to a pretty decent approximation of what I want. However, I am noticing small inconsistencies that I have not been able to work out.
Less important, but still annoying:
1) Users claim it takes two login attempts to connect: the first always fails, the second succeeds. (Is there a recommendation for what to check for this?)
More importantly:
2) Whenever I want to refresh the data (which will happen nightly), I have a script that drops and then recreates all of the tables. There are 66 tables, each ranging from tens of records to just under 100,000 records. The data is not massive, and it takes about 2 minutes to run all 66 tables.
The issue is that once the script reports it has completed, there are usually at least 3-4 tables that did not load any data. So the table is dropped and then created, but is left empty. The log shows that the commands completed successfully, and if I run them independently the tables populate just fine.
If it helps, 95% of the commands are just CAST functions.
While I am sure I am not doing this the recommended way, is there a reason why a number of my tables are not populating? Are the commands executing too fast? Should I delay the CREATE after the DROP?
(This is DB2 Express-C 11.1 on Windows 2012 R2; the source DB is remote.)
Example of my SQL:
DROP TABLE TEST.TIMESHEET;
CREATE TABLE TEST.TIMESHEET AS (
SELECT NAME00, CAST(TIMESHEET_ID AS INTEGER(34))TIMESHEET_ID ....
.. (for 5-50 more columns)
FROM REMOTE_DB.TIMESHEET
)WITH DATA;
It is possible to configure DB2 to tolerate certain SQL errors in nested table expressions.
https://www.ibm.com/support/knowledgecenter/en/SSEPGG_11.5.0/com.ibm.data.fluidquery.doc/topics/iiyfqetnint.html
When the federated server encounters an allowable error, the server allows the error and continues processing the remainder of the query rather than returning an error for the entire query. The result set that the federated server returns can be a partial or an empty result.
However, I assume that your REMOTE_DB.TIMESHEET is simply a nickname, and not a view with nested table expressions, and so any errors when pulling data from the source should be surfaced by DB2. Taking a look at the db2diag.log is likely the way to go - you might even be hitting a Db2 issue.
It might be useful to change your script to TRUNCATE and INSERT into your local tables and see if that helps avoid the issue.
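A rough sketch of that TRUNCATE-and-INSERT variant using the TIMESHEET example from the question (column list abbreviated, CAST shown only for illustration); note that in Db2 LUW, TRUNCATE generally has to be the first statement in its unit of work:

TRUNCATE TABLE TEST.TIMESHEET IMMEDIATE;

INSERT INTO TEST.TIMESHEET
SELECT NAME00, CAST(TIMESHEET_ID AS INTEGER) AS TIMESHEET_ID
       -- ... the same remaining columns and CASTs as in the CREATE above
FROM REMOTE_DB.TIMESHEET;

COMMIT;

If one of the loads still ends up empty, at least the table definition stays in place and the failing INSERT can be rerun on its own.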
As you say, you are maybe not doing things the most efficient way. You could consider using cache tables to take a periodic copy of your remote data: https://www.ibm.com/support/knowledgecenter/en/SSEPGG_11.5.0/com.ibm.data.fluidquery.doc/topics/iiyvfed_tuning_cachetbls.html
PostgreSQL 9.5.
A very small UPDATE statement uses very high CPU for a long time, as if it were hanging.
My Windows console application uses a simple UPDATE statement to update the latest time, as follows:
UPDATE META_TABLE SET latest_time = current_timestamp WHERE host = 'MY_HOST'
There are just 2 console applications that issue the SQL above.
No index on META_TABLE.
Only 1 row.
When it is hanging, no lock information.
UNLOGGED Table.
IDLE status in pg_stat_activity
Commit after UPDATE.
During the hang, I can still INSERT or DELETE data in the table above.
The issue happens about 20 minutes after starting the application.
I think it is not a SQL statement or table structure issue; probably something is wrong on the database side.
Can you suggest anything to resolve this issue?
Update
There are 2 DB connections in the console application: 1 for SELECT and 1 for DML.
I tried closing the DML connection every 2 minutes. After that, I didn't see the issue on the UPDATE!! However, the hang then happened on a SELECT statement (also a very simple SELECT).
It seems that there is some limit per session.
Now I am also closing the SELECT connection every 3 minutes and monitoring.
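Not from the original post, but one hedged guess worth checking: a single-row table that is updated very frequently accumulates dead row versions, and once the table is bloated even a one-row UPDATE on an unindexed table has to scan many pages and burns CPU. The statistics view below shows how many dead tuples Postgres thinks META_TABLE has, and a manual VACUUM cleans them up:

SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'meta_table';

VACUUM (VERBOSE) META_TABLE;

If n_dead_tup is large compared to the single live row, making autovacuum more aggressive for this table (or issuing periodic VACUUMs) would be a more direct fix than recycling the connections.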
I have a problem with no errors and nothing wrong in the logs.
I'm trying to repopulate relate tables with Enterprise Manager, but the tables aren't being loaded correctly; only the default value appears.
The mechanism I'm using is simple: if one date is greater than another, the process executes.
But even when I manually changed the dates with a SQL query, the process doesn't give me any results.
I also created a data load with the option to repopulate, and still no changes.
I even tried to apply the fix for the 10.1 version and still nothing... (https://community.microstrategy.com/s/article/KB268380-In-MicroStrategy-10-Enterprise-Manager-Lookup-tables-in)
If any of you have any ideas, I'd be grateful.