Complex match and join in PostgreSQL

I have the three tables below:
Opening transactions, closing transactions, and another one with prices (quotes).
Opening and closing transactions are mirror images of each other: if one is a BUY the other is a SELL, and they are matched by the same txn_id.
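For reference, the table definitions are roughly as follows (a minimal sketch; the column types are assumed, not part of the original data):
CREATE TABLE opening_txns (
    txn_id        varchar(10),
    txn_timestamp timestamp,
    cust_txn_type varchar(4),
    exch_txn_type varchar(4),
    currency      varchar(10),
    amount        numeric
);
CREATE TABLE closing_txns (LIKE opening_txns INCLUDING ALL);
CREATE TABLE bc_quotes (
    quote_timestamp timestamp,
    currency        varchar(10),
    unit            integer,
    quote_type      varchar(4),
    "quote"         numeric
);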
INSERT INTO opening_txns (txn_id,txn_timestamp,cust_txn_type,exch_txn_type,currency,amount) VALUES
('0001','2019-01-16 09:00:00.000','SELL','BUY','Euro',1000)
,('0002','2019-01-25 09:00:00.000','BUY','SELL','Euro',1000)
,('0003','2019-01-30 09:00:00.000','BUY','SELL','Euro',1000)
,('0004','2019-02-06 09:00:00.000','SELL','BUY','Euro',1000)
,('0005','2019-02-12 09:00:00.000','SELL','BUY','Euro',1000)
,('0006','2019-02-25 09:00:00.000','BUY','SELL','Euro',1000)
,('0007','2019-03-21 09:00:00.000','BUY','SELL','Euro',1000)
;
INSERT INTO closing_txns (txn_id,txn_timestamp,cust_txn_type,exch_txn_type,currency,amount) VALUES
('0001','2019-03-29 12:00:00.000','BUY','SELL','Euro',1000)
,('0002','2019-03-29 12:00:00.000','SELL','BUY','Euro',1000)
,('0003','2019-03-29 12:00:00.000','SELL','BUY','Euro',1000)
,('0004','2019-03-29 12:00:00.000','BUY','SELL','Euro',1000)
,('0005','2019-03-29 12:00:00.000','BUY','SELL','Euro',1000)
,('0006','2019-03-29 12:00:00.000','SELL','BUY','Euro',1000)
,('0007','2019-03-29 12:00:00.000','SELL','BUY','Euro',1000)
;
INSERT INTO bc_quotes (quote_timestamp,currency,unit,quote_type,"quote") VALUES
('2019-02-25 09:00:00.000','Euro',1,'SELL',1.1375)
,('2019-02-25 09:00:00.000','Euro',1,'BUY',1.1355)
,('2019-03-21 09:00:00.000','Euro',1,'SELL',1.1416)
,('2019-03-21 09:00:00.000','Euro',1,'BUY',1.1392)
,('2019-03-29 12:00:00.000','Euro',1,'BUY',1.1225)
,('2019-03-29 12:00:00.000','Euro',1,'SELL',1.1246)
;
I am looking for the below outcome:
txn_id
amount
sell_price (find whichever of the opening or closing txns is a SELL cust_txn; match that transaction's currency, timestamp and exch_txn_type with currency, timestamp and quote_type in the bc_quotes table and pick the quote)
buy_price (find whichever of the opening or closing txns is a BUY cust_txn; match that transaction's currency, timestamp and exch_txn_type with currency, timestamp and quote_type in the bc_quotes table and pick the quote)

My answer assumes that the columns of opening_txns and closing_txns have the same types.
Please try the following and tell me if it works for you:
WITH txns AS (
    SELECT
        txn_id,
        amount,
        currency,
        txn_timestamp,
        cust_txn_type,
        exch_txn_type
    FROM opening_txns
    UNION
    SELECT
        txn_id,
        amount,
        currency,
        txn_timestamp,
        cust_txn_type,
        exch_txn_type
    FROM closing_txns
)
SELECT
    txn_id,
    amount,
    CASE WHEN cust_txn_type = 'SELL' THEN quote ELSE NULL END AS sell_price,
    CASE WHEN cust_txn_type = 'BUY' THEN quote ELSE NULL END AS buy_price
FROM txns T
LEFT JOIN bc_quotes Q
    ON (T.currency = Q.currency AND T.txn_timestamp = Q.quote_timestamp AND T.exch_txn_type = Q.quote_type);
Explanations:
txns is a common table expression to help clarify the query.
Since opening_txns and closing_txns share the same columns,
you can use UNION to merge the two result sets into a single
txns result set (cust_txn_type and txn_timestamp are carried along so they can be used later).
Then you can use LEFT JOIN to match the rows of
txns to their respective quotes using the conditions provided in
the ON clause.
Lastly, you can use a conditional CASE expression
in the SELECT to distinguish between 'SELL' and 'BUY'
transactions: if the transaction is a 'BUY' (resp. 'SELL'), then
the buy_price (resp. sell_price) column will hold the quote and the other column will be
NULL.
The final result set has the following columns: txn_id, amount, sell_price and buy_price.
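If you would rather have a single row per txn_id with both prices filled in (as in the requested outcome), you can collapse the two rows with conditional aggregates. A sketch along the same lines, assuming PostgreSQL 9.4+ for the FILTER clause:
WITH txns AS (
    SELECT txn_id, amount, currency, txn_timestamp, cust_txn_type, exch_txn_type FROM opening_txns
    UNION ALL
    SELECT txn_id, amount, currency, txn_timestamp, cust_txn_type, exch_txn_type FROM closing_txns
)
SELECT
    T.txn_id,
    T.amount,
    MAX(Q.quote) FILTER (WHERE T.cust_txn_type = 'SELL') AS sell_price,
    MAX(Q.quote) FILTER (WHERE T.cust_txn_type = 'BUY')  AS buy_price
FROM txns T
LEFT JOIN bc_quotes Q
    ON (T.currency = Q.currency AND T.txn_timestamp = Q.quote_timestamp AND T.exch_txn_type = Q.quote_type)
GROUP BY T.txn_id, T.amount;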
I hope this helps.

Related

How to get ids of grouped by rows in postgresql and use result?

I have a table containing transactions with an amount. I want to create a batch of transactions so that the sum of the amounts of each 'group by' is negative.
My problem is to get all the ids of the rows concerned by a 'group by' where each group is validated by a sum condition.
I found many solutions which don't work for me.
The best solution I found is to query the db a first time with the 'group by' and the sum, then return the ids, and finally query the db a second time with all of them.
Here is an example of what I would like (it doesn't work!):
SELECT * FROM transaction_table transaction
WHERE transaction.id IN (
select string_agg(grouped::character varying, ',' ) from (
SELECT array_agg(transaction2.id) as grouped FROM transaction_table transaction2
WHERE transaction2.c_scte='c'
AND (same conditions)
GROUP BY
transaction2.motto ,
transaction2.accountBnf ,
transaction2.payment ,
transaction2.accountClt
HAVING sum(transaction2.amount)<0
)
);
the result of the array_agg is like:
{39758,39759}
{39757,39756,39755,39743,39727,39713}
and the string_agg is :
{39758,39759},{39757,39756,39755,39743,39727,39713}
Now I just need to use them, but I don't know how.
Unfortunately, it doesn't work because of type casting:
ERROR: operator does not exist: integer = integer[]
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.
Maybe you are looking for
SELECT id, motto, accountbnf, payment, accountclt, amount
FROM (SELECT id, motto, accountbnf, payment, accountclt, amount,
             sum(amount) OVER (PARTITION BY motto, accountbnf, payment, accountclt) AS group_total
      FROM transaction_table) AS q
WHERE group_total < 0;
The inner SELECT adds an additional column using a window function that calculates the sum for each group, and the outer query removes all results where that sum is not negative.
Finally I found this option using the unnest method. It works perfectly:
array_agg gathers the ids of each group into an array,
and unnest flattens all of those arrays back into a single list of ids.
This comes from here.
SELECT * FROM transaction_table transaction
WHERE transaction.id = ANY(
SELECT unnest(array_agg(transaction2.id)) as grouped FROM transaction_table transaction2
WHERE transaction2.c_scte='c'
AND (same conditions)
GROUP BY
transaction2.motto ,
transaction2.accountBnf ,
transaction2.payment ,
transaction2.accountClt
HAVING sum(transaction2.amount)<0
);
The problem with this solution is that Hibernate does not recognize the array_agg function.
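If array_agg is the only thing Hibernate chokes on, a query that avoids arrays entirely should give the same rows. A sketch, assuming the same grouping columns and conditions (note that a group containing NULL in one of the grouping columns would not match the row-constructor IN, whereas the id-based approach above would include it):
SELECT t.*
FROM transaction_table t
WHERE t.c_scte = 'c'
  -- AND (same conditions)
  AND (t.motto, t.accountBnf, t.payment, t.accountClt) IN (
        SELECT t2.motto, t2.accountBnf, t2.payment, t2.accountClt
        FROM transaction_table t2
        WHERE t2.c_scte = 'c'
          -- AND (same conditions)
        GROUP BY t2.motto, t2.accountBnf, t2.payment, t2.accountClt
        HAVING sum(t2.amount) < 0
  );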

In DB2 SQL, is it possible to set a variable in the SELECT statement to use multiple times..?

In DB2 SQL, is it possible to SET a variable with the contents of a returned field in the SELECT statement, to use multiple times for calculated fields and criteria further along in the same SELECT statement?
The purpose is to shrink and streamline the code, by doing a calculation once at the beginning and using it multiple times later on...including the HAVING, WHERE, and ORDER BY.
To be honest, I'm not sure this is possible in any version of SQL, much less DB2.
This is on an IBM iSeries 8202 with DB2 SQL v6, which unfortunately is not a candidate for upgrade at this time. This is a very old & messy database, which I have no control over. I must regularly include "cleanup functions" in my SQL.
To clarify the question, note the following pseudocode. Actual working code follows further below.
DECLARE smnum INTEGER --Not sure if this is correct.
SELECT
-- This is where I'm not sure what to do.
SET CAST((CASE WHEN %smnum%='' THEN '0' ELSE %smnum% END) AS INTEGER) INTO smnum,
%smnum% AS sm,
invdat,
invno,
daqty,
dapric,
dacost,
(dapric-dacost)*daqty AS profit
FROM
saleshistory
WHERE
%smNum% = 30
ORDER BY
%smnum%
Below is my actual working SQL. When adjusted for 2017 or 2016, it can return >10K rows, depending on the salesperson. The complete table has >22M rows.
That buttload of CAST((CASE... is what I wish to replace with a variable. This is not the only example of this. If I can make it work, I have many other queries that could benefit from the technique.
SELECT
CAST((CASE WHEN TRIM(DASM#)='' THEN '0' ELSE TRIM(DASM#) END) AS INTEGER) AS DASM,
DAIDAT,
DAINV# AS DAINV,
DALIN# AS DALIN,
CAST(TRIM(DAITEM) AS INTEGER) AS DAITEM,
TRIM(DABSW) AS DABSW,
TRIM(DAPCLS) AS DAPCLS,
DAQTY,
DAPRIC,
DAICOS,
DADPAL,
(DAPRIC-DAICOS+DADPAL)*DAQTY AS PROFIT
FROM
VIPDTAB.DAILYV
WHERE
CAST((CASE WHEN TRIM(DASM#)='' THEN '0' ELSE TRIM(DASM#) END) AS INTEGER)=30 AND
TRIM(DABSW)='B' AND
DAIDAT BETWEEN (YEAR(CURDATE())*10000) AND (((YEAR(CURDATE())+1)*10000)-1) AND
CAST(TRIM(DACOMP) AS INTEGER)=1
ORDER BY
CAST((CASE WHEN TRIM(DASM#)='' THEN '0' ELSE TRIM(DASM#) END) AS INTEGER),
DAIDAT,
DAINV#,
DALIN#
Just use a subquery or CTE. I can't figure out the actual logic you want, but the structure looks like this:
select . . .
from (select d.*,
(CASE . . . END) as calc_field
from VIPDTAB.DAILYV d
) d
No variable declaration is needed.
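If you prefer a CTE over the sub-query, the same idea looks roughly like this — a sketch using the column names from the question, assuming your DB2 release supports common table expressions:
WITH d AS (
    SELECT D.*,
           CAST((CASE WHEN TRIM(DASM#) = '' THEN '0' ELSE TRIM(DASM#) END) AS INTEGER) AS DASM
    FROM VIPDTAB.DAILYV D
)
SELECT DASM, DAIDAT, DAINV#, DALIN#, DAQTY, DAPRIC, DAICOS, DADPAL,
       (DAPRIC - DAICOS + DADPAL) * DAQTY AS PROFIT
FROM d
WHERE DASM = 30
ORDER BY DASM, DAIDAT, DAINV#, DALIN#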
Here is what your SQL would look like with the sub-query that Gordon suggested:
SELECT
DASM,
DAIDAT,
DAINV# AS DAINV,
DALIN# AS DALIN,
CAST(DAITEM AS INTEGER) AS DAITEM,
TRIM(DABSW) AS DABSW,
TRIM(DAPCLS) AS DAPCLS,
DAQTY,
DAPRIC,
DAICOS,
DADPAL,
(DAPRIC-DAICOS+DADPAL)*DAQTY AS PROFIT
FROM
(SELECT
D.*,
CAST((CASE WHEN D.DASM#='' THEN '0' ELSE D.DASM# END) AS INTEGER) AS DASM
FROM VIPDTAB.DAILYV D
) D
WHERE
DASM=30 AND
TRIM(DABSW)='B' AND
DAIDAT BETWEEN (YEAR(CURDATE())*10000) AND (((YEAR(CURDATE())+1)*10000)-1) AND
CAST(DACOMP AS INTEGER)=1
ORDER BY
DASM,
DAIDAT,
DAINV#,
DALIN#
Notice that I removed a lot of the trim() functions, and you could likely remove the rest. The way IBM resolves the Varchar vs. Char comparison thing is by ignoring trailing blanks. So trim(anything) = '' is the same as anything = ''. And since cast(' 123 ' as integer) = 123, I have removed trims from within the cast functions as well. In addition trim(dabsw) = 'B' is the same as dabsw = 'B' as long as the 'B' is the first character in dabsw. So you could even remove that trim if all you are concerned with is trailing blanks.
Here are some additional notes based on comments. The above paragraph is not talking about auto-trim. Fixed length fields will always return as fixed length fields, the trailing blanks will remain. But in comparisons and expressions where trailing blanks are unimportant, or even a hindrance, they are ignored. In expressions where trailing blanks are important, like concatenation, the trailing blanks are not ignored. Another thing, trim() removes both leading and trailing blanks. If you are using trim() to read a fixed length character field into a Varchar, then rtrim() is likely the better choice as it only removes the trailing blanks.
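To sanity-check the blank handling described above, you could run a one-liner against SYSIBM.SYSDUMMY1 (the built-in one-row table). It should return a row, because the comparisons ignore trailing blanks and the cast tolerates surrounding blanks:
SELECT 1
FROM SYSIBM.SYSDUMMY1
WHERE 'B   ' = 'B'
  AND CAST('  123  ' AS INTEGER) = 123
  AND TRIM('  abc  ') = 'abc'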
Also, I didn't go through your fields to make sure I got everything you need, I just used * in the sub-query. For performance, it would be best to only return the fields you need. So if you replace D.* with an actual field list, you can remove the correlation name in the from clause of the sub-query. But, the sub-query itself still needs a correlation clause.
My verification was done using IBM i v7.1.
You can encapsulate the CASE expression in a view. I even have the fancy profit calc in there for you so you can order by profit. Now the biggest issue you have is the CCSID on the view for calculated columns, but that's another question.
create or replace view VIPDTAB.DAILYVQ as
SELECT
CAST((CASE WHEN TRIM(DASM#)='' THEN '0' ELSE TRIM(DASM#) END) AS INTEGER) AS DASM,
DAIDAT,
DAINV# AS DAINV,
DALIN# AS DALIN,
CAST(TRIM(DAITEM) AS INTEGER) AS DAITEM,
TRIM(DABSW) AS DABSW,
TRIM(DAPCLS) AS DAPCLS,
DAQTY,
DAPRIC,
DAICOS,
DADPAL,
(DAPRIC-DAICOS+DADPAL)*DAQTY AS PROFIT
FROM
VIPDTAB.DAILYV
now you can
select dasm, count(*) from vipdtab.dailyvq where dasm = 0 group by dasm order by dasm
or
select * from vipdtab.dailyvq order by profit desc

Converting String to Decimal Redshift SQL

I have this Redshift SQL query. I extracted a decimal number from the comment using the REGEXP_SUBSTR function. I also need to convert it from a string to a number/decimal. Then I need to subtract that number from the total.
This is my query
SELECT sc.comment,
sm.subtotal,
to_number(REGEXP_SUBSTR(sc.comment, '[0.00-9]+..[0.00-9]+', 1),'9999999D99')
FROM "sales_memo_comments" sc INNER JOIN "sales_memo" sm ON sc.foreign_id = sm.parent_id
I tried using the "to_number" function on Redshift SQL, but its giving me the following: ERROR: invalid input syntax for type numeric: " "
This is the current output Before extracting the number refund amount from the comment column:
comment
"SAR719.00 Refund transaction executed successfully, Refund Request ID:504081288877953603 \n , Authorization Code:095542 "
"AUD52.07 Refund transaction executed successfully, Refund Request ID:6J45695858A90833"
Canceled by : ron.johnd#company.co.us
refund amount is [MYR197.41]
"Please Ignore Order refunded by Refund Request ID:5002758809696048 , Authorization Code:2587759"
OMR37.83($98.23) Refund transaction executed successfully
This is the output after using the above SQL query with REGEXP_SUBSTR; I still get some anomalies:
comment
719
52.07
.co.
197.41
5.0027621
37.83($98.23
Two questions:
How do I edit the regexp to account for the anomalies seen above?
How do I convert the extracted string to a numeric value so I can subtract it from another numeric column?
Any help would be appreciated.
Here is a way - you need to be able to test whether a string is numeric and for that you need a UDF - so just run this once to define that function
create or replace function isnumeric (aval VARCHAR(20000))
returns bool
IMMUTABLE
as $$
    -- float() accepts decimal strings like '52.07'; int() would reject them
    try:
        x = float(aval)
    except:
        return False
    else:
        return True
$$ language plpythonu;
Then, you could change your code as follows
SELECT sc.comment,
       sm.subtotal,
       to_number(
           case when isnumeric(REGEXP_SUBSTR(sc.comment, '[0.00-9]+..[0.00-9]+', 1))
                then REGEXP_SUBSTR(sc.comment, '[0.00-9]+..[0.00-9]+', 1)
                else '0' end
       ,'9999999D99')
FROM "sales_memo_comments" sc INNER JOIN "sales_memo" sm ON sc.foreign_id = sm.parent_id
My approach would be to first add two columns: one for the string length and the other counting allowed characters. From this table, you can filter to only the rows where the two match (i.e. no disallowed characters) and then just cast the remaining values to floats or decimals or whatever.
with temp as (
SELECT '719' as comment
UNION SELECT '52.07'
UNION SELECT '.co.'
UNION SELECT '197.41'
UNION SELECT '5.0027621'
UNION SELECT '37.83($98.23'
),
temp2 as (
SELECT *
,regexp_count(comment, '[0-9.]') as good_char_length
,len(comment) as str_length
FROM
temp
)
SELECT *
,comment::float as comment_as_float
FROM
temp2
WHERE
good_char_length = str_length
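Coming back to the first question, a stricter POSIX pattern that only accepts digits-dot-digits sidesteps most of the anomalies above, and NULLIF turns a non-match (empty string) into NULL so the subtraction simply yields NULL. A sketch only, not tested against every comment format; the DECIMAL(12,2) precision and the output column name are assumptions:
SELECT sc.comment,
       sm.subtotal,
       sm.subtotal - CAST(NULLIF(REGEXP_SUBSTR(sc.comment, '[0-9]+[.][0-9]+'), '') AS DECIMAL(12,2)) AS subtotal_less_refund
FROM "sales_memo_comments" sc
INNER JOIN "sales_memo" sm ON sc.foreign_id = sm.parent_id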

Using many connection.cursor()

I want to fetch data from 3 tables in a single database at once. I used 3 conn.cursor() calls to do it. Is there a more sophisticated way to do it?
conn = psycopg2.connect(database="plottest", user="postgres")
self.statusbar.showMessage("Database opened Sucessfully", 1000)
cur = conn.cursor()
cur1 = conn.cursor()
cur2 = conn.cursor()
cur.execute("SELECT id ,actual from \"%s\" " % date)
rows = cur.fetchall()
cur1.execute("SELECT qty from DAILY where date = \'%s\'" % date)
dailyqty = cur1.fetchone()
cur2.execute("SELECT qty from MONTHLY where month = \'%s\'" % month)
monthqty = cur2.fetchone()
Awoogah awoogah, SQL injection warning! Don't write code using string interpolation. What happens if someone calls your code with the "date" ');-- DROP TABLE DAILY;-- ?
Use bind parameters. Always.
The only exception is for dynamic identifiers, like in the case above where you seem to use a table named after the current date. In that case you must "double quote" them and double any contained double-quotes. In your case that means that date should be date.replace('"', '""') where you substitute it into the SQL.
Now, back to our regular programming.
Since you fetchall from each cursor you can just re-use it. You don't need new cursors each time.
You can also combine the daily and monthly stats if you want, with a UNION ALL. I fixed your capitalisation and parameter binding in the process:
cur.execute("""SELECT 1, qty FROM daily WHERE date = %s
UNION ALL
SELECT 2, qty FROM monthly WHERE month = %s
ORDER BY 1""",
(date, month))
Note that string interpolation isn't used, instead a 2-tuple of parameters is passed to psycopg2 to bind directly. There's no need for quotes around the parameters, psycopg2 adds them if needed.
This avoids a client-server round trip by bundling the two queries. The extra column and ORDER BY are technically needed so you can safely assume the first row is the daily result and the second is the monthly one. In practice PostgreSQL won't re-order them with UNION ALL, though.
You can combine
SELECT a1 FROM t1 WHERE b1 = 'v1';
and
SELECT a2 FROM t2 WHERE b2 = 'v2';
to a single statement like this:
SELECT t1.a1, t2.a2 FROM t1, t2
WHERE t1.b1 = 'v1' AND t2.b2 = 'v2';
provided that both queries return exactly one row.

Is there a more efficient / elegant way to write this code I have?

I'm wondering if anybody can help me out with any or all of this code below. I've made it work, but it seems inefficient to me and is probably quite a bit slower than optimal.
Some basic background on the necessity of this code in the first place:
I have a table of shipping records that does not include the corresponding invoice number. I've looked all through the tables and I continue to do so. In fact, only this morning I discovered that if a packing slip has been generated then I can link the shipping table to the packing slip table via that packing slip ID and grab the invoice number from there. Absent that link, however, I'm forced to guess. In most instances, that's not terribly difficult, because the invoice table has number, line and release that can match up. But when there are multiple shipments for number, line and release (for instance, when a line is partially shipped) then there can be multiple answers, only one of which is correct. I am partially helped by the presence of a column in the shipping table that states what the date sequence is for that number, line and release, but there are still circumstances where the process I use for "guessing" can be somewhat ambiguous.
What my procedure does is this. First, it creates a table of data that includes the invoice number if there was a pack slip to link it through.
Next, it dumps all of that data into a second table, this time using--only if the invoice was NULL in the first table--a "guess" about the invoice number based on partitioning all the shipping records by number, line, release, date sequence and date, and then comparing that to the same type of thing for the invoice table, and trying to line everything up by date.
Finally, it parses through that table and finds any last nulls and essentially matches them up with the first record of any invoice for that number, line and release.
Both guesses have added characters to show that they are, in fact, guesses.
IF OBJECT_ID('tempdb..#cosTable') IS NOT NULL
DROP TABLE #cosTable
DECLARE @cosTable2 TABLE (
ID INT IDENTITY
,co_num CoNumType
,co_line CoLineType
,co_release CoReleaseType
,date_seq DateSeqType
,ship_date DateType
,inv_num NVARCHAR(14)
)
DECLARE
@co_num_ck CoNumType
,@co_line_ck CoLineType
,@co_release_ck CoReleaseType
DECLARE @Counter1 INT = 0
SELECT cos.co_num, cos.co_line, cos.co_release, cos.date_seq, cos.ship_date, cos.qty_invoiced, pck.inv_num
INTO #cosTable
FROM co_ship cos
LEFT JOIN pckitem pck
ON cos.pack_num = pck.pack_num
AND cos.co_num = pck.co_num
AND cos.co_line = pck.co_line
AND cos.co_release = pck.co_release
;WITH cos_Order
AS(
SELECT co_num, co_line, co_release, qty_invoiced, date_seq, ship_date, ROW_NUMBER () OVER (PARTITION BY co_num, co_line, co_release ORDER BY ship_date) AS cosrow
FROM co_ship
WHERE qty_invoiced > 0
),
invi_Order
AS(
SELECT inv_num, co_num, co_line, co_release, ROW_NUMBER () OVER (PARTITION BY co_num, co_line, co_release ORDER BY RecordDate) AS invirow
FROM inv_item
WHERE qty_invoiced > 0
),
cos_invi
AS(
SELECT cosO.*, inviO.inv_num
FROM cos_Order cosO
LEFT JOIN invi_Order inviO
ON cosO.co_num = inviO.co_num AND cosO.co_line = inviO.co_line AND cosO.cosrow = inviO.invirow)
INSERT INTO @cosTable2
SELECT cosT.co_num, cosT.co_line, cosT.co_release, cosT.date_seq, cosT.ship_date, COALESCE(cosT.inv_num,'*'+cosi.inv_num) AS inv_num
FROM #cosTable cosT
LEFT JOIN cos_invi cosi
ON cosT.co_num = cosi.co_num
AND cosT.co_line = cosi.co_line
AND cosT.co_release = cosi.co_release
AND cosT.date_seq = cosi.date_seq
AND cosT.ship_date = cosi.ship_date
WHILE @Counter1 < (SELECT MAX(ID) FROM @cosTable2) BEGIN
SET @Counter1 += 1
SET @co_num_ck = (SELECT co_num FROM @cosTable2 WHERE ID = @Counter1)
SET @co_line_ck = (SELECT co_line FROM @cosTable2 WHERE ID = @Counter1)
SET @co_release_ck = (SELECT co_release FROM @cosTable2 WHERE ID = @Counter1)
IF EXISTS (SELECT * FROM @cosTable2 WHERE ID = @Counter1 AND inv_num IS NULL)
UPDATE @cosTable2
SET inv_num = '^' + (SELECT TOP 1 inv_num FROM @cosTable2 WHERE
@co_num_ck = co_num AND
@co_line_ck = co_line AND
@co_release_ck = co_release)
WHERE ID = @Counter1 AND inv_num IS NULL
END
SELECT * FROM @cosTable2
ORDER BY co_num, co_line, co_release, date_seq, ship_date
You're in a bad spot - as @craig.white and @HLGEM suggest, you've inherited something without sufficient constraints to make the data correct or safe...and now you have to "synthesize" it. I get that guesses are the best you can do, and you can, at least make your guesses reasonable performance-wise.
After that, you should squeal loudly to get some time to fix the db - to apply the constraints needed to prevent further crapification of the data.
Performance-wise, the while loop is a disaster. You'd be better off replacing that whole mess with a single update statement...something like:
update c0
set inv_num = '^' + c1.inv_num
from
@cosTable2 c0
left outer join
(
select
co_num,
co_line,
co_release,
inv_num
from
@cosTable2
where
inv_num is not null
group by
co_num,
co_line,
co_release,
inv_num
) as c1
on
c0.co_num = c1.co_num and
c0.co_line = c1.co_line and
c0.co_release = c1.co_release
where
c0.inv_num is null
...which does the same thing the loop does, only in a single statement.
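If you want to preserve the loop's TOP 1 pick exactly, the same set-based idea can also be written with a correlated subquery. A sketch, untested, and like the original, TOP 1 without an ORDER BY picks an arbitrary matching invoice:
update c0
set inv_num = '^' + (select top 1 c1.inv_num
                     from @cosTable2 c1
                     where c1.co_num = c0.co_num
                       and c1.co_line = c0.co_line
                       and c1.co_release = c0.co_release
                       and c1.inv_num is not null)
from @cosTable2 c0
where c0.inv_num is null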
It seems to me that you are trying very hard to solve a problem that should not exist. What you describe is an unfortunately common situation where a process has grown organically, without intent or specific direction, as the business has grown, which has made data extraction nearly impossible to automate. You very much need a set of policies and procedures. For a (very crude and simple) example:
1: An Order must exist before a packing slip can be generated.
2: A packing slip must exist before an invoice can be generated.
3: An invoice is created using data from the packing slip and order (what was requested, what was picked, what do we bill).
Again, this is a crude example just to illustrate the idea.
All of the data MUST be entered at the proper time or someone has not done their job.
It is not in the IT department's typical skill set to accurately and consistently provide management with good data when such data does not exist.