T-SQL merge example needed to help comprehension - tsql

The following:
MERGE dbo.commissions_history AS target
USING (SELECT @amount, @requestID) AS source (amount, request)
ON (target.request = source.request)
WHEN MATCHED THEN
    UPDATE SET amount = source.amount
WHEN NOT MATCHED THEN
    INSERT (request, amount)
    VALUES (source.request, source.amount);
from https://stackoverflow.com/a/2967983/857994 is a pretty nifty way to do insert/update (and delete with some added work). I'm finding it hard to follow though even after some googling.
Can someone please:
1. explain this a little in simple terms - the MSDN documentation mutilated my brain in this case.
2. show me how it could be modified so the user can type in values for amount & request instead of having them selected from another database location?
Basically, I'd like to use this to insert/update from a C# app with information taken from XML files I'm getting. So, I need to understand how I can formulate a query manually to get my parsed data into the database with this mechanism.

If you aren't familiar with join statements then that is where you need to start. Understanding how joins work is key to the rest. Once you're familiar with joins then understanding the merge is easiest by thinking of it as a full join with instructions on what to do for rows that do or do not match.
So, using the code sample provided, let's look at the table commissions_history:
| Amount | Request | <other fields> |
---------------------------------------
| 12.00  | 1234    | <other data>   |
| 14.00  | 1235    | <other data>   |
| 15.00  | 1236    | <other data>   |
The merge statement creates a full join between a table, called the "target", and an expression that returns a table (or a result set that is logically very similar to a table, like a CTE), called the "source".
In the example given it is using variables as the source, which we'll assume have been set by the user or passed as parameters.
DECLARE @Amount Decimal = 18.00;
DECLARE @Request Int = 1234;
MERGE dbo.commissions_history AS target
USING (SELECT @amount, @requestID) AS source (amount, request)
ON (target.request = source.request)
Thought of as a join, this creates the following result set:
| Amount | Request | <other fields> | Source.Amount | Source.Request |
-----------------------------------------------------------------------
| 12.00  | 1234    | <other data>   | 18.00         | 1234           |
| 14.00  | 1235    | <other data>   | null          | null           |
| 15.00  | 1236    | <other data>   | null          | null           |
The statement then applies the instructions given for rows where a match was found:
WHEN MATCHED THEN
    UPDATE SET amount = source.amount
The resulting target table now looks like this: the row with request 1234 has had its amount updated to 18.00.
| Amount | Request | <other fields> |
---------------------------------------
| 18.00  | 1234    | <other data>   |
| 14.00  | 1235    | <other data>   |
| 15.00  | 1236    | <other data>   |
Since a match WAS found, nothing else happens. But let's say that the values from the source were like this:
DECLARE @Amount Decimal = 18.00;
DECLARE @Request Int = 1239;
The resulting join would look like this:
| Amount | Request | <other fields> | Source.Amount | Source.Request |
-----------------------------------------------------------------------
| 12.00  | 1234    | <other data>   | null          | null           |
| 14.00  | 1235    | <other data>   | null          | null           |
| 15.00  | 1236    | <other data>   | null          | null           |
| null   | null    | null           | 18.00         | 1239           |
Since a matching row was not found in the target, the statement executes the other clause:
WHEN NOT MATCHED THEN
    INSERT (request, amount)
    VALUES (source.request, source.amount);
Resulting in a target table that now looks like this:
| Amount | Request | <other fields> |
---------------------------------------
| 12.00  | 1234    | <other data>   |
| 14.00  | 1235    | <other data>   |
| 15.00  | 1236    | <other data>   |
| 18.00  | 1239    | <other data>   |
The merge statement's true potential shows when the source and target are both large tables, as it can perform a large number of updates and/or inserts with a single, simple statement.
A final note: it's important to keep in mind that NOT MATCHED defaults to the full clause NOT MATCHED BY TARGET; however, you can specify NOT MATCHED BY SOURCE in place of, or in addition to, the default clause. The merge statement supports both types of mismatch (records in the source not in the target, or records in the target not in the source, as defined by the ON clause). You can find full documentation, restrictions, and complete syntax on MSDN.
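Putting the pieces together, here is a minimal, self-contained sketch (my consolidation, not verbatim from the original post) of the upsert with user-supplied values, e.g. bound as parameters from a C# app:
-- Table and column names are taken from the question; Decimal(10, 2)
-- is an assumed precision (plain Decimal would default to scale 0).
DECLARE @Amount Decimal(10, 2) = 18.00;
DECLARE @Request Int = 1239;

MERGE dbo.commissions_history AS target
USING (SELECT @Amount, @Request) AS source (amount, request)
ON (target.request = source.request)
WHEN MATCHED THEN
    UPDATE SET amount = source.amount
WHEN NOT MATCHED THEN
    INSERT (request, amount)
    VALUES (source.request, source.amount);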

In the given answer example you've declared
DECLARE @Request Int
but then referenced it in the SQL as follows:
SELECT @amount, @requestID
The declared and referenced names must match. Another tip would be naming the variables identically to the columns they feed:
@amount vs. Amount -> @Amount & Amount
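For illustration, a minimal sketch of a consistent version (the variable names are a suggestion, not from the original):
DECLARE @Amount Decimal(10, 2) = 18.00;  -- precision is an assumption
DECLARE @Request Int = 1234;

-- The variable names now line up with the columns they feed.
SELECT @Amount AS Amount, @Request AS Request;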

Related

Insert a record for every row from one table into another using one field in postgresql

I'm trying to fill a table with data to test a system.
I have two tables
User
+----+----------+
| id | name     |
+----+----------+
| 1  | Majikaja |
| 2  | User 2   |
| 3  | Markus   |
+----+----------+
Goal
+----+------+---------+
| id | goal | user_id |
+----+------+---------+
I want to insert into goal one record for every user, only using their IDs (they have to exist), plus some fixed or random value.
I was thinking of something like this:
INSERT INTO Goal (goal, user_id) values ('Fixed value', select u.id from user u)
So it will generate:
Goal
+----+-------------+---------+
| id | goal        | user_id |
+----+-------------+---------+
| 1  | Fixed value | 1       |
| 2  | Fixed value | 2       |
| 3  | Fixed value | 3       |
+----+-------------+---------+
I could just write a simple PHP script to achieve it, but I wonder if it is possible using raw SQL only.
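For what it's worth, raw SQL handles this with the INSERT ... SELECT form; a minimal sketch, assuming the table and column names above ("user" is quoted because it is a reserved word in Postgres):
-- One goal row per existing user; the fixed value is inlined in the SELECT.
INSERT INTO goal (goal, user_id)
SELECT 'Fixed value', u.id
FROM "user" u;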

Reset column with numeric value that represents the order when destroying a row

I have a table of users that has a column called order that represents the order in which they will be elected.
So, for example, the table might look like:
| id | name | order |
|----|------|-------|
| 1  | John | 2     |
| 2  | Mike | 0     |
| 3  | Lisa | 1     |
So, say that Lisa now gets destroyed. I would like, in the same transaction that destroys Lisa, to update the table so the order stays consistent. The expected result would be:
| id | name | order |
|----|------|-------|
| 1  | John | 1     |
| 2  | Mike | 0     |
Or, if Mike were the one to be deleted, the expected result would be:
| id | name | order |
|----|------|-------|
| 1  | John | 1     |
| 3  | Lisa | 0     |
How can I do this in PostgreSQL?
If you are just deleting one row, one option uses a CTE and the RETURNING clause to then trigger an update:
-- "ord" stands in for the question's "order" column, since ORDER is a
-- reserved word and would otherwise need quoting as "order".
with del as (
    delete from mytable
    where name = 'Lisa'
    returning ord
)
update mytable
set ord = ord - 1
from del d
where mytable.ord > d.ord;
As a more general approach, I would really recommend against trying to renumber the whole table after every delete. This is inefficient, and can get tedious for multi-row deletes.
Instead, you could build a view on top of the table:
create view myview as
select id, name, row_number() over (order by ord) as ord
from mytable;
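One caveat: row_number() numbers from 1, while the question's data is 0-based. If that matters, a sketch of an adjusted view (the view name here is hypothetical):
-- Subtract 1 so numbering starts at 0, matching the question's data.
create view myview0 as
select id, name, row_number() over (order by ord) - 1 as ord
from mytable;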

Selecting the latest value for each distinct tag

I am trying to write an SQL query which will return the latest data value for each distinct tag in my table.
Currently, I select the distinct values of the tag column and afterwards iterate through them, ordering each tag's rows by timestamp and limiting to 1 to get its latest data value. These tags can be any number and may not always be posted together (one time only tag 1 can be posted; whereas other times 1, 2, 3 can).
Although it gives the expected outcome, this seems inefficient in a lot of ways, and because I don't have enough SQL experience, this was so far the only way I found to perform the task...
----------------------------------
| name | tag | timestamp | data |
----------------------------------
| aa   | 1   | 566       | 4659 |
| ab   | 2   | 567       | 4879 |
| ac   | 3   | 568       | 1346 |
| ad   | 1   | 789       | 3164 |
| ae   | 2   | 789       | 1024 |
| af   | 3   | 790       | 3346 |
----------------------------------
Therefore the expected outcome is {3164, 1024, 3346}
Currently what I'm doing is:
"select distinct tag from table"
Then I store all the distinct tag values and iterate through them programmatically using
"select data from table where '"+ tags[i] +"' in (tag) order by timestamp desc limit 1"
Thanks,
This comes close, but beware: if two rows with the same tag share a maximum timestamp, you will get duplicates in the result set.
-- "mytable" stands in for the actual table name ("table" itself is a
-- reserved word).
select t.data
from mytable t
join (
    select tag, max(timestamp) as maxtimestamp
    from mytable
    group by tag
) as latesttags
    on t.tag = latesttags.tag
    and t.timestamp = latesttags.maxtimestamp;
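If your database supports window functions, here is a sketch of an alternative that avoids the duplicate problem by keeping exactly one row per tag (ties broken arbitrarily; mytable again stands in for the real table name):
select data
from (
    select data,
           row_number() over (partition by tag order by timestamp desc) as rn
    from mytable
) ranked
where rn = 1;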

KSQL: append multiple child records to parent record

I'm trying to use KSQL (as part of confluent-5.0.0) to create a single record out of a set of parent records and child records, where every parent record has multiple child records (specifically, payment details and the parties involved in the payment). These parent/child records are linked by the parent's id. To illustrate, I'm dealing with records of roughly this structure in the source system:
payment:
| id    | currency | amount | payment_date |
|--------------------------------------------|
| pmt01 | USD      | 20000  | 2018-11-20   |
| pmt02 | USD      | 13000  | 2018-11-23   |
payment_parties:
| id    | payment_id | party_type   | party_ident | party_account |
|------------------------------------------------------------------|
| prt01 | pmt01      | sender       | XXYYZZ23    | (null)        |
| prt02 | pmt01      | intermediary | AADDEE98    | 123456789     |
| prt03 | pmt01      | receiver     | FFGGHH56    | 987654321     |
| prt04 | pmt02      | sender       | XXYYZZ23    | (null)        |
| prt05 | pmt02      | intermediary | (null)      | (null)        |
| prt06 | pmt02      | receiver     | FFGGHH56    | 987654321     |
These records are loaded, in AVRO format, onto a set of Kafka topics using Oracle Golden Gate, with one topic for every table. This means the following topics exist: src_payment and src_payment_parties. As per the way the source system functions, the timestamps of these records fall within several milliseconds.
Now, the intent is to 'flatten' these records into a single record, which will be consumed from an outgoing topic. To illustrate, for the records above, the desired output would be along these lines:
payment_flattened:
| id    | currency | amount | payment_date | sender_ident | sender_account | intermediary_ident | intermediary_account | receiver_ident | receiver_account |
|------------------------------------------------------------------------------------------------------------------------------------------------------------|
| pmt01 | USD      | 20000  | 2018-11-20   | XXYYZZ23     | (null)         | AADDEE98           | 123456789            | FFGGHH56       | 987654321        |
| pmt02 | USD      | 13000  | 2018-11-23   | XXYYZZ23     | (null)         | (null)             | (null)               | FFGGHH56       | 987654321        |
The first question I'd like to ask here is the following: How can I best achieve this combination of data from the source topics?
Of course, I have tried some actions myself. In the interest of brevity, I'll describe what I have tried to achieve appending the first of the payment parties to the payment records.
Step one: set up the source streams
Note: due to the OGG setup adding a property called 'table' to the AVRO schema, I have to specify the fields to take from the topic. Additionally, I'm not interested in the fields specifying the type of operation (e.g. insert or update).
create stream payment_stream (id varchar, currency varchar, amount double, \
payment_date varchar) with (kafka_topic='src_payment',value_format='avro');
create stream payment_parties_stream (id varchar, payment_id varchar, party_type varchar, \
party_ident varchar, party_account varchar) with (kafka_topic='src_payment_parties',\
value_format='avro');
Step two: create stream for the payment senders
Note: from what I've gathered from the documentation, and found out from experimenting, in order to be able to join the payment stream to a payment party stream, the latter needs to be partitioned by the payment id. Additionally, the only way I have gotten the join to work is by renaming the column.
create stream payment_sender_stream as select payment_id as id, party_ident, \
party_account from payment_parties_stream where party_type = 'sender' partition by id;
Step three: join two streams
Note: I'm using a left join because not all parties are present for every payment, as in the example records above, where pmt02 does not have an intermediary.
create stream payment_with_sender as select pmt.id as id, pmt.currency, pmt.amount, \
pmt.payment_date, snd.party_ident, snd.party_account from payment_stream pmt left join \
payment_sender_stream snd within 1 seconds on pmt.id = snd.id;
Now, the output I would expect from this stream is something along these lines:
ksql> select * from payment_with_sender;
rowtime | pmt01 | pmt01 | USD | 20000 | 2018-11-20 | XXYYZZ23 | null
rowtime | pmt02 | pmt02 | USD | 13000 | 2018-11-23 | XXYYZZ23 | null
Instead, the output I'm seeing is along these lines:
ksql> select * from payment_with_sender;
rowtime | pmt01 | pmt01 | USD | 20000 | 2018-11-20 | null | null
rowtime | pmt01 | pmt01 | USD | 20000 | 2018-11-20 | XXYYZZ23 | null
rowtime | pmt02 | pmt02 | USD | 13000 | 2018-11-23 | null | null
rowtime | pmt02 | pmt02 | USD | 13000 | 2018-11-23 | XXYYZZ23 | null
Hence, the second (two-part) question I'd like to ask is: Why does the left join produce these duplicate records? And can this be avoided?
Apologies for the wall of text, I tried to be as complete as possible in the description of the issue. Of course, I'd be happy to add any possible missing information, and answer questions regarding the setup to the best of my knowledge.
You're almost there :-)
WITHIN 1 SECONDS will give you results triggered from both sides of the join.
Instead, try WITHIN (0 SECONDS, 1 SECONDS). Then only records from the right side of the join will be joined to the left, and not vice versa.
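Applied to the join from the question, that would look like this (a sketch reusing the stream names above):
create stream payment_with_sender as select pmt.id as id, pmt.currency, pmt.amount, \
pmt.payment_date, snd.party_ident, snd.party_account from payment_stream pmt left join \
payment_sender_stream snd within (0 seconds, 1 seconds) on pmt.id = snd.id;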
You can read more about this pattern in the article I wrote here.
BTW if you want to work around the table reserved word issue from OGG, you can set includeTableName to false in the GG config.

Query on multiple postgres hstores combined with or

This is a hardcoded example of what I'm trying to achieve:
SELECT id FROM places
WHERE metadata->'route'='Route 23'
OR metadata->'route'='Route 22'
OR metadata->'region'='Northwest'
OR metadata->'territory'='Territory A';
The metadata column is an hstore column, and I want to build up the WHERE clause dynamically based on another query from a different table. That table could be either:
 id   | metadata
------+----------------------------
 1647 | "region"=>"Northwest"
 1648 | "route"=>"Route 23"
 1649 | "route"=>"Route 22"
 1650 | "territory"=>"Territory A"
or
 id | key       | value
----+-----------+-------------
 1  | route     | Route 23
 2  | route     | Route 22
 3  | region    | Northwest
 4  | territory | Territory A
It doesn't really matter; just whatever works to build up that WHERE clause. It could potentially have 1 to n ORs in it based on the other query.
Ended up with a solution using a distributions table with an hstore metadata column:
 id   | metadata
------+----------------------------
 1647 | "region"=>"Northwest"
 1648 | "route"=>"Route 23"
 1649 | "route"=>"Route 22"
 1650 | "territory"=>"Territory A"
Used the following join, where the @> operator checks whether places.metadata contains distributions.metadata:
SELECT places.id, places.metadata
FROM places INNER JOIN distributions
ON places.metadata @> distributions.metadata
WHERE distributions.some_other_column = something;
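As a quick illustration of the containment operator (assuming the hstore extension is installed):
-- Returns true: the left hstore contains every pair of the right one.
SELECT 'route=>"Route 23", region=>"Northwest"'::hstore @> 'route=>"Route 23"'::hstore;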