I have a table BigTable and a table LittleTable. I want to move a copy of some records from BigTable into LittleTable and then (for these records) set BigTable.ExportedFlag to T (indicating that a copy of the record has been moved to little table).
Is there any way to do this in one statement?
I know I can use a transaction to:
move the records from big table based on a where clause
update big table, setting exported to T, based on the same where clause.
I've also looked into a MERGE statement, which does not seem quite right, because I don't want to change values in little table, just move records to little table.
I've looked into an OUTPUT clause after the update statement but can't find a useful example. I don't understand why Pinal Dave is using Inserted.ID, Inserted.TEXTVal, Deleted.ID, Deleted.TEXTVal instead of Updated.TextVal. Is the update considered an insertion or deletion?
I found this post TSQL: UPDATE with INSERT INTO SELECT FROM saying "AFAIK, you cannot update two different tables with a single sql statement."
Is there a clean single statement to do this? I am looking for a correct, maintainable SQL statement. Do I have to wrap two statements in a single transaction?
You can use the OUTPUT clause as long as LittleTable meets the requirements to be the target of an OUTPUT ... INTO
UPDATE BigTable
SET ExportedFlag = 'T'
OUTPUT inserted.Col1, inserted.Col2 INTO LittleTable(Col1,Col2)
WHERE <some_criteria>
It makes no difference whether you use INSERTED or DELETED. The only column that will differ between them is the one you are updating (deleted.ExportedFlag has the before value and inserted.ExportedFlag will be 'T').
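To see that difference side by side, you can output both versions of the column. A small sketch, assuming BigTable has an Id key column and using a hypothetical ExportAudit table as the target:

UPDATE BigTable
SET ExportedFlag = 'T'
OUTPUT inserted.Id,            -- same row either way; the key is not being updated
       deleted.ExportedFlag,   -- the before value
       inserted.ExportedFlag   -- the after value, always 'T' here
INTO ExportAudit(Id, OldFlag, NewFlag)
WHERE <some_criteria>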
Related
This is more of a conceptual question because I'm planning how best to achieve our goals here.
I have a postgresql/postgis table with 5 columns. I'll be inserting/appending data into the database from a csv file every 10 minutes or so via the copy command. There will likely be some duplicate rows of data, so I'd like to copy the data from the csv file to the postgresql table but prevent any duplicate entries from getting into the table. There are three columns which, if they are all equal, mean the entry is a duplicate: "latitude", "longitude" and "time". Should I make a composite key from all three columns? If I do that, will it just throw an error upon trying to copy the csv file into the database? I'm going to be copying the csv file automatically, so I would want it to go ahead and copy the rows of the file that aren't duplicates and skip the duplicates. Is there a way to do this?
Also, I of course want it to look for duplicates in the most efficient way. I don't need to look through the whole table (which will be quite large) for duplicates...just the past 20 minutes or so via the timestamp on the row. And I've indexed the db with the time column.
Thanks for any help!
Upsert
The answer by Linoff is correct but can be simplified a bit by the new "UPSERT" feature in Postgres 9.5 (a.k.a. MERGE). That new feature is implemented in Postgres as the INSERT ... ON CONFLICT syntax.
Rather than explicitly checking for a violation of the unique index, we can let the ON CONFLICT clause detect the violation. Then we DO NOTHING, meaning we abandon the effort to INSERT without bothering to attempt an UPDATE. So if we cannot insert, we just move on to the next row.
We get the same results as Linoff’s code but lose the WHERE clause.
INSERT INTO bigtable(col1, … )
SELECT col1, …
FROM stagingtable st
ON CONFLICT idx_bigtable_col1_col2_col3
DO NOTHING
;
I think I would take the following approach.
First, create an index on the three columns that you care about:
create unique index idx_bigtable_col1_col2_col3 on bigtable(col1, col2, col3);
Then, load the data into a staging table using copy. Finally, you can do:
insert into bigtable(col1, . . . )
select col1, . . .
from stagingtable st
where (col1, col2, col3) not in (select col1, col2, col3 from bigtable);
Assuming no other data modifications are going on, this should accomplish what you want. Checking for duplicates using the index should be ok from a performance perspective.
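For completeness, the staging step might look roughly like this (a sketch; the staging table definition, file path, and the assumption that the CSV holds just those three columns are all illustrative):

-- staging table with the same layout as bigtable
create table stagingtable (like bigtable including defaults);

-- server-side load; use \copy from psql instead if the file lives on the client
copy stagingtable (col1, col2, col3)
from '/path/to/data.csv'
with (format csv, header true);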
An alternative method is to emulate MySQL's "on duplicate key update" behavior to ignore such records. Bill Karwin suggests implementing a rule in an answer to this question. The documentation for rules is here. Something similar could also be done with triggers, as sketched below.
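A trigger-based sketch of that idea (the function and trigger names are made up; returning NULL from a BEFORE INSERT trigger silently drops the row):

create function skip_duplicates() returns trigger as $$
begin
    if exists (select 1
               from bigtable
               where col1 = new.col1
                 and col2 = new.col2
                 and col3 = new.col3) then
        return null;  -- cancel this insert, keep going with the rest of the batch
    end if;
    return new;
end;
$$ language plpgsql;

create trigger trg_skip_duplicates
before insert on bigtable
for each row execute procedure skip_duplicates();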
The method posted by Basil Bourque was great, but there was a slight syntax error.
Based on the documentation, I modified it to the following, which works:
INSERT INTO bigtable(col1, … )
SELECT col1, …
FROM stagingtable st
ON CONFLICT (col1)
DO NOTHING
;
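Applied to the columns from the original question, and assuming a unique index or constraint covers exactly those three columns (the remaining column names are placeholders), the conflict target lists them together:

INSERT INTO bigtable (latitude, longitude, "time", col4, col5)
SELECT latitude, longitude, "time", col4, col5
FROM stagingtable
ON CONFLICT (latitude, longitude, "time")
DO NOTHING;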
I have a rule on a table which simply checks whether the new entry's name matches an existing row and intersects with that matching row, using st_intersects from the PostGIS library.
It seems that only some rows are NOT inserted; most get through this rule. I checked some entries manually after the insert and can confirm the rule should have blocked them.
Is something wrong with my RULE?
The table has 3 columns: id serial, name varchar(200) and way geometry(Linestring,4326).
And my RULE is as follows (excerpt from \d names)
blockduplicate AS
ON INSERT TO nameslist
WHERE (EXISTS ( SELECT 1
FROM nameslist
WHERE nameslist.name::text = new.name::text AND st_intersects(nameslist.way, new.way) = true)) DO INSTEAD NOTHING
This table simply stores lines, each with a name. Whenever another entry comes in with the same name and intersecting an existing entry with that name, it should be blocked, so that there is only one entry with a given name in the area represented by the geometry field way. After the insert I see plenty of duplicates (the name matches and st_intersects returns true when checking the way field). Why is my rule not blocking the insert?
Update: Is it because I do multiple inserts in one query? I actually insert 12000 entries in one shot with the query INSERT INTO (a,b,c) VALUES (...),(...),(...),...
Does PostgreSQL call the RULE for each value? I need to do multiple inserts otherwise it would take months to finish my inserts.
Ok, usually you are going to find triggers are cleaner than rules. Triggers are fired on each insert. Rules are macros which rewrite your SQL. There is a time and place for the use of rules but they are certainly an advanced area.
Let's look at what happens with your insert. Suppose you:
INSERT INTO nameslist
SELECT * FROM nameslist_import;
Your rule will actually rewrite the query into something like:
INSERT INTO nameslist
SELECT * FROM nameslist_import WHERE not (expression modelled on your rule);
Usually it is cleaner here to just write this into your query rather than using a rule to rewrite your query for you. This allows you to tune exactly what you want to do in each query. If you want to prevent such data from overlapping, look at exclusion constraints if they are applicable, or triggers.
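For example, the duplicate check written directly into the insert might look something like this (a sketch, reusing the hypothetical nameslist_import staging table from above):

INSERT INTO nameslist (name, way)
SELECT i.name, i.way
FROM nameslist_import i
WHERE NOT EXISTS (
    SELECT 1
    FROM nameslist n
    WHERE n.name = i.name
      AND st_intersects(n.way, i.way)
);

As with the rule, this only checks against rows already in nameslist, not against other rows arriving in the same batch.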
Hope this helps.
I want to do something like
use mydb
go
begin tran
merge dbo.aTestTarget as T
using dbo.aTestSource as S
on (T.link = S.link)
when not matched by target and (s.code like '*I%') then
-- is there a way to do this sort of thing?
insert (T.*) values (S.*)
when matched and ...
rollback tran
go
Is there some way to do this WITHOUT defining EVERY column? I have a number of tables with 20 to 50 fields.
No, there is not. Using the * syntax is a bad practice anyway because it makes for fragile code that will be hard to maintain.
However, in SSMS you can drag & drop the Columns folder under a table into the editor to get a comma-separated list of all columns for that table. That makes typing a little easier. :)
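If drag & drop is not handy, a query against the metadata views can generate the same list (a sketch; STRING_AGG requires SQL Server 2017 or later, and the schema/table names are taken from the question):

SELECT STRING_AGG(QUOTENAME(COLUMN_NAME), ', ')
           WITHIN GROUP (ORDER BY ORDINAL_POSITION)
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'dbo'
  AND TABLE_NAME = 'aTestTarget';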
I was trying to do a SELECT ... FOR ALL ENTRIES statement in ABAP. I don't have a clear idea of what SELECT ... FOR ALL ENTRIES does. Does anyone know?
Kindly have a look at the statements below:
1.
select bukrs belnr xblnr budat
from bkpf
into table it_bkpf
where belnr in s_belnr.
2.
select bukrs belnr buzei gsber zuonr wrbtr kunnr
from bseg
into table it_bseg
for all entries in it_bkpf
where belnr = it_bkpf-belnr.
Please let me know the difference between the two statements.
Siva
Some obvious differences:
Different tables
Different target fields
The 2nd select had a syntax problem: you used form instead of from (I corrected it with my edit)
Other differences:
Selection 1.) uses in in the where clause, so it works with a select-options (or a range object).
for all entries in it_bkpf means that the internal table it_bkpf contains the list of elements you want to select. Or in other words: select all entries in bseg where the field belnr matches an entry in the internal table it_bkpf.
You will get a clear answer through the ST05 transaction.
Execute the st05 transaction, choose SQL trace and activate the trace.
After that, run your code.
Enter st05 again, choose deactivate trace, then view the trace result.
There you can see the exact SQL code that is forwarded to the database server. As BSEG is a cluster table, you cannot use the intuitive header-item join to retrieve the financial document information you need. Several tables, including BSEG, are stored in a single database table, so the database server technically cannot separate the BSEG rows and find the BSEG-specific fields to make a proper join.
So you do a join-like construction on the application server instead. First you retrieve all header-related columns from the header table (BKPF). Then, when SELECT ... FOR ALL ENTRIES IN ... is executed, the application server takes small portions of the header rows (typically 5 at a time) and constructs SQL queries to retrieve the corresponding packs of items. Finally all those portions are merged into a single internal table, so you end up with only the items of the desired documents, as if you had executed a normal join.
Here's what happens, the way I understand it. The two statements are probably executed one after another:
The first statement selects a few entries from the bkpf table. These entries are stored in the internal table it_bkpf (say belnr 1, 2, 3).
Each of these entries is then used as part of the select #2. The "for all entries" matches the belnr in table bseg to those in the internal table it_bkpf from the first statement. The matching entries are then put into the internal table it_bseg.
With the example you've given, this is pretty much the same as if the where clause in SQL #2 were where belnr in s_belnr (instead of the whole for all entries). This would only make sense if you needed the contents of it_bkpf for some other purpose. Another typical situation is determining the contents of the internal table used in the for all entries clause with some program logic, instead of reading it directly from the database.
One catch with "for all entries": make sure the internal table in the for all entries clause is not empty; otherwise the whole table in the from clause would be selected.
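A typical guard looks like this (an ABAP sketch, using the variable names from the question):

IF it_bkpf IS NOT INITIAL.
  SELECT bukrs belnr buzei gsber zuonr wrbtr kunnr
    FROM bseg
    INTO TABLE it_bseg
    FOR ALL ENTRIES IN it_bkpf
    WHERE belnr = it_bkpf-belnr.
ENDIF.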
INSERT INTO contacts_lists (contact_id, list_id)
SELECT contact_id, 67544
FROM plain_contacts
Here I want to use the COPY command instead of the INSERT command in SQL to reduce the time it takes to insert values. I fetched the data using a SELECT operation. How can I insert it into a table using the COPY command in PostgreSQL? Could you please give an example? Or any other suggestion for reducing the time it takes to insert the values.
As your rows are already in the database (you apparently can SELECT them), using COPY will not increase the speed in any way.
To be able to use COPY you first have to write the values into a text file, which is then read into the database. But if you can SELECT them, writing to a text file is a completely unnecessary step and will slow down your insert, not increase its speed.
Your statement is as fast as it gets. The only thing that might speed it up (apart from buying a faster hard disk) is to remove any potential index on contacts_lists that contains the column contact_id or list_id, and re-create the index once the insert is finished.
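That might look roughly like this (the index name and definition are made up for illustration):

DROP INDEX IF EXISTS idx_contacts_lists_contact_id;

INSERT INTO contacts_lists (contact_id, list_id)
SELECT contact_id, 67544
FROM plain_contacts;

CREATE INDEX idx_contacts_lists_contact_id
    ON contacts_lists (contact_id);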
You can find the syntax described in many places, I'm sure. One of those is this wiki article.
It looks like it would basically be:
COPY (SELECT contact_id, 67544 FROM plain_contacts) TO 'some_file'
And
COPY contacts_lists (contact_id, list_id) FROM 'some_file'
But I'm just reading from the resources that Google turned up. Give it a try and post back if you need help with a specific problem.