Troubleshooting an insert statement, fails without error - postgresql

I am trying to do what should be a pretty straightforward insert statement in a postgres database. It is not working, but it's also not erroring out, so I don't know how to troubleshoot.
This is the statement:
INSERT INTO my_table (col1, col2) select col1,col2 FROM my_table_temp;
There are around 200m entries in the temp table, and 50m entries in my_table. The temp table has no index or constraints, but both columns in my_table have btree indexes, and col1 has a foreign key constraint.
I ran the first query for about 20 days. Last time I tried a similar insert of around 50m, it took 3 days, so I expected it to take a while, but not a month. Moreover, my_table isn't getting longer. Queried 1 day apart, the following produces the same exact number.
select count(*) from my_table;
So it isn't inserting at all. But it also didn't error out. And looking at system resource usage, it doesn't seem to be doing much of anything at all, the process isn't drawing resources.
Looking at other running queries, nothing else that I have permissions to view is touching either table, and I'm the only one who uses them.
I'm not sure how to troubleshoot since there's no error. It's just not doing anything. Any thoughts about things that might be going wrong, or things to check, would be very helpful.

For the sake of anyone stumbling onto this question in the future:
After a lengthy discussion (see linked discussion from the comments above), the issue turned out to be related to psycopg2 buffering the query in memory.
Another useful note: inserting into a table with indices is slow, so it can help to remove them before bulk loads, and then add them again after.

in my case it was date format issue. i commented date attribute before interting to DB and it worked.

In my case it was a TRIGGER on the same table I was updating and it failed without errors.
Deactivated the trigger and the update worked flawlessly.

Related

At what point are DB2 declared global temporary tables automatically deleted...?

When are DB2 declared global temporary tables 'cleaned up' and automatically deleted by the system...? This is for DB2 on AS400 v7r3m0, with DBeaver 5.2.5 as the dev client, and MS-Access 2007 for packaged apps for the end-users.
Today I started experimenting with a DGTT, thanks to this answer. So far I'm pleased with the functionality, although I did find our more recent system version has the WITH DATA option, which is an obvious advantage.
Everything is working, but at times I receive this error:
SQL Error [42710]: [SQL0601] NEW_PKG_SHEETS_DATA in QTEMP type *FILE already exists.
The meaning of the error is obvious, but the timing is not. When I started today, I could run the query multiple times, and the error didn't occur. It seemed as if the system was cleaning up and deleting it, which is just what I was looking for. But then the error started and now it's happening with more frequency.
If I make strategic use of DROP TABLE, this resolves the error, unless the table doesn't exist, in which case I get another error. I can also disconnect/reconnect to the server from my SQL dev client, as I would expect, since that would definitely drop the session.
This IBM article about DGTTs speaks much of sessions, but not many specifics. And this article is possibly the longest command syntax I've yet encountered in the IBM documentation. I got through it, but it didn't answer the question of what decided when a DGTT is deleted.
So I would like to ask:
What are the boundaries of a session..?
I'm thinking this is probably defined by the environment in my SQL client..?
I guess the best/safest thing to do is use DROP TABLE as needed..?
Does any one have any tips, tricks, or pointers they could share..?
Below is the SQL that I'm developing. For brevity, I've excluded chunks of the WITH-AS and SELECT statements:
DROP TABLE SESSION.NEW_PKG_SHEETS ;
DECLARE GLOBAL TEMPORARY TABLE SESSION.NEW_PKG_SHEETS_DATA
AS ( WITH FIRSTDAY AS (SELECT (YEAR(CURDATE() - 4 MONTHS) * 10000) +
(MONTH(CURDATE() - 4 MONTHS) * 100) AS DATEISO
FROM SYSIBM.SYSDUMMY1
-- <VARIETY OF ADDITIONAL CTE CLAUSES>
-- <SELECT STATEMENT BELOW IS A BIT LONGER>
SELECT DAACCT AS DAACCT,
DAIDAT AS DAIDAT,
DAINV# AS DAINV,
CAST(DAITEM AS NUMERIC(6)) AS DAPACK,
CAST(0 AS NUMERIC(14)) AS UPCNUM,
DAQTY AS DAQTY
FROM DAILYTRANS
AND DAIDAT >= (SELECT DATEISO+000 FROM FIRSTDAY) -- 1ST DAY FOUR MONTHS AGO
AND DAIDAT <= (SELECT DATEISO+399 FROM FIRSTDAY) -- LAST DAY OF LAST MONTH
) WITH DATA ;
DROP TABLE SESSION.NEW_PKG_SHEETS ;
The DGTT will only get cleaned automatically up by Db2 when the connection ends successfully (connect reset or equivalent according to whatever interface to Db2 is being used ).
For both Db2 for i and Db2-LUW, consider using the WITH REPLACE clause for the DECLARE GLOBAL TEMPORARY TABLE statement. That will ensure you don't need to explicitly drop the DGTT if the session remains open but the code needs the table to be replaced at next execution whether or not the DGTT already exists.
Using that WITH REPLACE clause means you do not need to worry about issuing a DROP statement for the DGTT, unless you really want to issue a drop.
Sometimes sessions may get re-used, or a close/disconnect might not happen or might not complete, or more likely a workstation performs a retry, and in those cases the WITH REPLACE can be essential for easily avoiding runtime errors.
Note that Db2 for Z/OS (at v12) does not offer the WITH REPLACE clause for DGTT, but has instead an optional syntax on commit drop table (but this is not documented for Db2-for-i and Db2-LUW).

Postgresql insert stopped working, duplicate key value violations

About 8 months ago I used a suggestion to set up a holding table, then push to the formal table and prevent duplicate entries, per this post: Best way to prevent duplicate data on copy csv postgresql
It's been working very nicely, but today I noticed some errors and gaps in the data.
Here's my insert statement:
And here's how the index is set up:
And here's an example of the error I'm getting, although on the next chron insert, it went through.
Here's it going through fine:
I haven't noted any large changes in the data incoming. Here's what the data looks like that's coming in now:
In summary, I've noticed recent oddities with the insert statements, and success is erratic, resulting in large data gaps in the database. Thanks for any help, and I'm happy to provide more details, but I wanted to see if my information sounds like something someone else has already dealt with.
Thanks very much for any help,
S
As Gordon pointed out in his answer to your previous question, this approach only works if you have exclusive access to the table. There is a delay between the existence check and the insert itself, and if another process modifies the table during this window, you may end up with duplicates.
If you're on Postgres 9.5+, the best approach is to skip the existence check and simply use an INSERT ... ON CONFLICT DO NOTHING statement.
On earlier versions, the simplest solution (if you can afford to do so) would be to lock the table for the duration of the import. Otherwise, you can emulate ON CONFLICT DO NOTHING (albeit less efficiently) using a loop and an exception handler:
DO $$
DECLARE r RECORD;
BEGIN
FOR r IN (SELECT * FROM holding) LOOP
BEGIN
INSERT INTO ltg_data (pulsecount, intensity, time, lon, lat, ltg_geom)
VALUES (r.pulsecount, r.intensity, r.time, r.lon, r.lat, r.ltg_geom);
EXCEPTION WHEN unique_violation THEN
END;
END LOOP;
END
$$
As an aside, DELETE FROM holding is time-consuming and probably unnecessary. Create your staging table as TEMP, and it will be cleaned up automatically at the end of your session. You can easily build a table which matches the structure of your import target:
CREATE TEMP TABLE holding (LIKE ltg_data);

Best way to prevent duplicate data on copy csv postgresql

This is more of a conceptual question because I'm planning how best to achieve our goals here.
I have a postgresql/postgis table with 5 columns. I'll be inserting/appending data into the database from a csv file every 10 minutes or so via the copy command. There will likely be some duplicate rows of data, so I'd like to copy the data from the csv file to the postgresql table but prevent any duplicate entries from getting into the table from the csv file. There are three columns, where if they are all equal, that will mean the entry is a duplicate. They are "latitude", "longitude" and "time". Should I make a composite key from all three columns? If I do that, will it just throw an error upon trying to copy the csv file into the database? I'm going to be copying the csv file automatically so I would want it to go ahead and copy the rest of the file that aren't duplicates and not copy the duplicates. Is there a way to do this?
Also, I of course want it to look for duplicates in the most efficient way. I don't need to look through the whole table (which will be quite large) for duplicates...just the past 20 minutes or so via the timestamp on the row. And I've indexed the db with the time column.
Thanks for any help!
Upsert
The Answer by Linoff is correct but can simplified a bit by Postgres 9.5 new ”UPSERT“ feature (a.k.a. MERGE). That new feature is implemented in Postgres as INSERT ON CONFLICT syntax.
Rather than explicitly check for violation of the unique index, we can let the ON CONFLICT clause detect the violation. Then we DO NOTHING, meaning we abandon the effort to INSERT without bothering to attempt an UPDATE. So if we cannot insert, we just move on to next row.
We get the same results as Linoff’s code but lose the WHERE clause.
INSERT INTO bigtable(col1, … )
SELECT col1, …
FROM stagingtable st
ON CONFLICT idx_bigtable_col1_col2_col
DO NOTHING
;
I think I would take the following approach.
First, create an index on the three columns that you care about:
create unique index idx_bigtable_col1_col2_col3 on bigtable(col1, col2, col3);
Then, load the data into a staging table using copy. Finally, you can do:
insert into bigtable(col1, . . . )
select col1, . . .
from stagingtable st
where (col1, col2, col3) not in (select col1, col2, col3 from bigtable);
Assuming no other data modifications are going on, this should accomplish what you want. Checking for duplicates using the index should be ok from a performance perspective.
An alternative method is to emulates MySQL's "on duplicate key update" to ignore such records. Bill Karwin suggests implementing a rule in an answer to this question. The documentation for rules is here. Something similar could also be done with triggers.
The method posted by Basil Bourque was great, but there was a slight syntax error.
Based on the documentation, I modified it to the following, which works:
INSERT INTO bigtable(col1, … )
SELECT col1, …
FROM stagingtable st
ON CONFLICT (col1)
DO NOTHING
;

SQL Server temp tables via MS Access

Well I've been using #temp tables in standard T-SQL coding for years and thought I understood them.
However, I've been dragged into a project based in MS Access, utilizing pass-through queries, and found something that has really got me puzzled.
Though maybe it's the inner workings of Access that has me fooled !?
Here we go : Under normal usage, I understand the if I create a temp table in a Sproc, it's scope ends with the end of the SProc, and is dropped by default.
In the Access example, I found it was possible to do this in one Query:
select top(10) * into #myTemp from dbo.myTable
And then this in second separate query:
select * from #myTemp
How is this possible ?
If a temp table dies with the current session, does this mean that Access keeps a single session open, and uses that session for all Queries executed ?
Or has my fundamental understanding of scope been wrong all this time ?
Hope someone out there can help clarify what is occurring under the hood !?
Many Thanks
I found this answer of a kind of similar question:
Temp table is stored in tempdb until the connection is dropped (or in the case of a global temp tables when the last connection using it is dropped). You can also (and it is a good proctice to do so) manually drop the table when you are finished using it with a drop table statement.
I hope this helps out.

syntax for COPY in postgresql

INSERT INTO contacts_lists (contact_id, list_id)
SELECT contact_id, 67544
FROM plain_contacts
Here I want to use Copy command instead of Insert command in sql to reduce the time to insert values. I fetched the data using select operation. How can i insert it into a table using Copy command in postgresql. Could you please give an example for it?. Or any other suggestion in order to achieve the reduction of time to insert the values.
As your rows are already in the database (because you apparently can SELECT them), then using COPY will not increase the speed in any way.
To be able to use COPY you have to first write the values into a text file, which is then read into the database. But if you can SELECT them, writing to a textfile is a completely unnecessary step and will slow down your insert, not increase its speed
Your statement is as fast as it gets. The only thing that might speed it up (apart from buying a faster harddisk) is to remove any potential index on contact_lists that contains the column contact_id or list_id and re-create the index once the insert is finished.
You can find the syntax described in many places, I'm sure. One of those is this wiki article.
It looks like it would basically be:
COPY plain_contacts (contact_id, 67544) TO some_file
And
COPY contacts_lists (contact_id, list_id) FROM some_file
But I'm just reading from the resources that Google turned up. Give it a try and post back if you need help with a specific problem.