Converting String to Decimal Redshift SQL

I have this Redshift SQL query. I extracted a decimal number from the comment column using the REGEXP_SUBSTR function. I also need to convert it from a string to a number/decimal. Then, I need to subtract that number from the total.
This is my query
SELECT sc.comment,
sm.subtotal,
to_number(REGEXP_SUBSTR(sc.comment, '[0.00-9]+..[0.00-9]+', 1),'9999999D99')
FROM "sales_memo_comments" sc INNER JOIN "sales_memo" sm ON sc.foreign_id = sm.parent_id
I tried using the to_number function in Redshift SQL, but it's giving me the following: ERROR: invalid input syntax for type numeric: " "
This is the current output, before extracting the refund amount from the comment column:
comment
"SAR719.00 Refund transaction executed successfully, Refund Request ID:504081288877953603 \n , Authorization Code:095542 "
"AUD52.07 Refund transaction executed successfully, Refund Request ID:6J45695858A90833"
Canceled by : ron.johnd#company.co.us
refund amount is [MYR197.41]
"Please Ignore Order refunded by Refund Request ID:5002758809696048 , Authorization Code:2587759"
OMR37.83($98.23) Refund transaction executed successfully
This is the output after using the above SQL query with the REGEXP; I still get some anomalies:
comment
719
52.07
.co.
197.41
5.0027621
37.83($98.23
Two questions:
How do I edit the REGEXP to account for the anomalies seen above?
How do I convert the string REGEXP result to a numeric value so I can subtract it from another numeric column?
Any help would be appreciated.

Here is a way. You need to be able to test whether a string is numeric, and for that you need a UDF, so just run this once to define that function:
create or replace function isnumeric (aval VARCHAR(20000))
returns bool
IMMUTABLE
as $$
try:
    # float() accepts decimals such as '52.07'; int() would reject them
    x = float(aval)
except:
    return False
else:
    return True
$$ language plpythonu;
Then, you could change your code as follows:
SELECT sc.comment,
sm.subtotal,
to_number(
case when isnumeric(REGEXP_SUBSTR(sc.comment, '[0.00-9]+..[0.00-9]+', 1))
then REGEXP_SUBSTR(sc.comment, '[0.00-9]+..[0.00-9]+', 1)
else '0' end
,'9999999D99')
FROM "sales_memo_comments" sc INNER JOIN "sales_memo" sm ON sc.foreign_id = sm.parent_id

My approach would be to first add two columns: one for the string length and the other counting allowed characters. From this table, you could filter for only the rows where the two match (i.e. no non-allowed characters) and then just cast the remaining values to floats or decimals or whatever.
with temp as (
SELECT '719' as comment
UNION SELECT '52.07'
UNION SELECT '.co.'
UNION SELECT '197.41'
UNION SELECT '5.0027621'
UNION SELECT '37.83($98.23'
),
temp2 as (
SELECT *
,regexp_count(comment, '[0-9.]') as good_char_length
,len(comment) as str_length
FROM
temp
)
SELECT *
,comment::float as comment_as_float
FROM
temp2
WHERE
good_char_length = str_length
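Applied back to the Redshift question, the same guard can protect the cast before the subtraction. A sketch assuming the tables and join columns from the question, plus an assumed decimal(12,2) precision for the refund amount:
with extracted as (
SELECT sc.comment,
sm.subtotal,
REGEXP_SUBSTR(sc.comment, '[0-9]+[.][0-9]+') as refund_str
FROM sales_memo_comments sc
INNER JOIN sales_memo sm ON sc.foreign_id = sm.parent_id
)
SELECT comment,
subtotal,
subtotal - refund_str::decimal(12,2) as subtotal_less_refund
FROM extracted
WHERE len(refund_str) > 0
AND regexp_count(refund_str, '[0-9.]') = len(refund_str)
Rows where no amount was extracted (empty string) are filtered out by the length check, so the cast never sees bad input.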

Related

How to get ids of grouped by rows in postgresql and use result?

I have a table containing transactions with an amount. I want to create a batch of transactions so that the sum of amount of each 'group by' is negative.
My problem is to get all the ids of the rows covered by a 'group by' where each group is validated by a sum condition.
I found many solutions, but none of them work for me.
The best solution I found is to query the db a first time with the 'group by' and the sum, return the ids, and then query the db a second time with all of them.
Here is an example of what I would like (it doesn't work!):
SELECT * FROM transaction_table transaction
WHERE transaction.id IN (
select string_agg(grouped::character varying, ',' ) from (
SELECT array_agg(transaction2.id) as grouped FROM transaction_table transaction2
WHERE transaction2.c_scte='c'
AND (same conditions)
GROUP BY
transaction2.motto ,
transaction2.accountBnf ,
transaction2.payment ,
transaction2.accountClt
HAVING sum(transaction2.amount)<0
)
);
the result of the array_agg is like:
{39758,39759}
{39757,39756,39755,39743,39727,39713}
and the string_agg is :
{39758,39759},{39757,39756,39755,39743,39727,39713}
Now I just need to use them but I don't know how to...
Unfortunately, it doesn't work because of type casting:
ERROR: operator does not exist: integer = integer[]
Hint: No operator matches the given name and argument type(s). You might need to add explicit type casts.
Maybe you are looking for
SELECT id, motto, accountbnf, payment, accountclnt, amount
FROM (SELECT id, motto, accountbnf, payment, accountclnt, amount,
sum(amount)
OVER (PARTITION BY motto, accountbnf, payment, accountclnt)
AS group_total
FROM transaction_table) AS q
WHERE group_total < 0;
The inner SELECT adds an additional column using a window function that calculates the sum for each group, and the outer query removes all results where that sum is not negative.
Finally I found this option using the 'unnest' method. It works perfectly.
array_agg brings together all the ids in separate arrays
unnest flattens all of them
SELECT * FROM transaction_table transaction
WHERE transaction.id = ANY(
SELECT unnest(array_agg(transaction2.id)) as grouped FROM transaction_table transaction2
WHERE transaction2.c_scte='c'
AND (same conditions)
GROUP BY
transaction2.motto ,
transaction2.accountBnf ,
transaction2.payment ,
transaction2.accountClt
HAVING sum(transaction2.amount)<0
);
The problem with this solution is that hibernate doesn't take into account the array_agg method.
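If Hibernate is the blocker, a variant that avoids array_agg entirely may be easier to map. This is a sketch using a correlated EXISTS over the same grouping columns (keeping the same placeholder conditions as above):
SELECT * FROM transaction_table transaction
WHERE transaction.c_scte='c'
AND (same conditions)
AND EXISTS (
SELECT 1 FROM transaction_table transaction2
WHERE transaction2.c_scte='c'
AND (same conditions)
AND transaction2.motto = transaction.motto
AND transaction2.accountBnf = transaction.accountBnf
AND transaction2.payment = transaction.payment
AND transaction2.accountClt = transaction.accountClt
HAVING sum(transaction2.amount) < 0
);
Since every row inside the EXISTS is pinned to a single group by the correlation predicates, no GROUP BY is needed; the HAVING clause alone validates the group's sum.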

Complex match and Join PostgreSQL

I have the below 3 tables:
Opening transactions, closing transactions and another one with prices (or quotes).
Opening and closing are mirror images of each other. If one is BUY the other is SELL. They are matched by the same txn_id.
INSERT INTO opening_txns (txn_id,txn_timestamp,cust_txn_type,exch_txn_type,currency,amount) VALUES
('0001','2019-01-16 09:00:00.000','SELL','BUY','Euro',1000)
,('0002','2019-01-25 09:00:00.000','BUY','SELL','Euro',1000)
,('0003','2019-01-30 09:00:00.000','BUY','SELL','Euro',1000)
,('0004','2019-02-06 09:00:00.000','SELL','BUY','Euro',1000)
,('0005','2019-02-12 09:00:00.000','SELL','BUY','Euro',1000)
,('0006','2019-02-25 09:00:00.000','BUY','SELL','Euro',1000)
,('0007','2019-03-21 09:00:00.000','BUY','SELL','Euro',1000)
;
INSERT INTO closing_txns (txn_id,txn_timestamp,cust_txn_type,exch_txn_type,currency,amount) VALUES
('0001','2019-03-29 12:00:00.000','BUY','SELL','Euro',1000)
,('0002','2019-03-29 12:00:00.000','SELL','BUY','Euro',1000)
,('0003','2019-03-29 12:00:00.000','SELL','BUY','Euro',1000)
,('0004','2019-03-29 12:00:00.000','BUY','SELL','Euro',1000)
,('0005','2019-03-29 12:00:00.000','BUY','SELL','Euro',1000)
,('0006','2019-03-29 12:00:00.000','SELL','BUY','Euro',1000)
,('0007','2019-03-29 12:00:00.000','SELL','BUY','Euro',1000)
;
INSERT INTO bc_quotes (quote_timestamp,currency,unit,quote_type,"quote") VALUES
('2019-02-25 09:00:00.000','Euro',1,'SELL',1.1375)
,('2019-02-25 09:00:00.000','Euro',1,'BUY',1.1355)
,('2019-03-21 09:00:00.000','Euro',1,'SELL',1.1416)
,('2019-03-21 09:00:00.000','Euro',1,'BUY',1.1392)
,('2019-03-29 12:00:00.000','Euro',1,'BUY',1.1225)
,('2019-03-29 12:00:00.000','Euro',1,'SELL',1.1246)
;
I am looking for the below outcome:
txn_id
amount
sell_price (find which one of the opening or closing txns is a SELL cust_txn. Match the currency, timestamp and exch_txn_type of that transaction with the currency, timestamp and quote_type in the bc_quotes table and pick the quote)
buy_price (find which one of the opening or closing txns is a BUY cust_txn. Match the currency, timestamp and exch_txn_type with the currency, timestamp and quote_type in the bc_quotes table and pick the quote)
My answer assumes that the columns of opening_txns and closing_txns have the same types.
Please try the following and tell me if it works for you:
WITH txns AS (
SELECT
txn_id,
amount,
currency,
txn_timestamp,
cust_txn_type,
exch_txn_type
FROM opening_txns
UNION
SELECT
txn_id,
amount,
currency,
txn_timestamp,
cust_txn_type,
exch_txn_type
FROM closing_txns
)
SELECT
txn_id,
amount,
CASE WHEN cust_txn_type = 'SELL' THEN quote ELSE NULL END AS sell_price,
CASE WHEN cust_txn_type = 'BUY' THEN quote ELSE NULL END AS buy_price
FROM txns T
LEFT JOIN bc_quotes Q
ON (T.currency = Q.currency AND T.txn_timestamp = Q.quote_timestamp AND T.exch_txn_type = Q.quote_type);
Explanations:
txns is a common table expression to help clarify the query. Since both opening_txns and closing_txns share the same columns (note that cust_txn_type must be carried through for the CASE below, and the timestamp column is txn_timestamp per your schema), you can use UNION to merge the two result sets into one txns result set.
Then you can use LEFT JOIN to match the rows of txns to their respective quotes, using the conditions provided in the ON clause.
Lastly, you can use the conditional CASE expression in the SELECT to distinguish between 'SELL' and 'BUY' transactions; if the transaction is a 'BUY' (resp. 'SELL'), then the buy_price column will carry the quote and sell_price will be NULL.
The final result set has the following columns: txn_id, amount, sell_price and buy_price.
I hope this helps.
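If you would rather have a single row per transaction instead of one row per leg, the two legs can be collapsed with an aggregate. A sketch reusing the txns CTE from above (grouping by amount assumes it is identical on both legs, as in the sample data):
SELECT
txn_id,
amount,
MAX(CASE WHEN cust_txn_type = 'SELL' THEN quote END) AS sell_price,
MAX(CASE WHEN cust_txn_type = 'BUY' THEN quote END) AS buy_price
FROM txns T
LEFT JOIN bc_quotes Q
ON (T.currency = Q.currency AND T.txn_timestamp = Q.quote_timestamp AND T.exch_txn_type = Q.quote_type)
GROUP BY txn_id, amount;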

Prepare dynamic case statement using PostgreSQL 9.3

I have the following case statement that I need to prepare dynamically, as shown below:
Example:
I have the case statement:
case
when cola between '2001-01-01' and '2001-01-05' then 'G1'
when cola between '2001-01-10' and '2001-01-15' then 'G2'
when cola between '2001-01-20' and '2001-01-25' then 'G3'
when cola between '2001-02-01' and '2001-02-05' then 'G4'
when cola between '2001-02-10' and '2001-02-15' then 'G5'
else ''
end
Note: Now I want to create the case statement dynamically, because the dates and names are passed in as parameters and they may change.
Declare
dates varchar = '2001-01-01to2001-01-05,2001-01-10to2001-01-15,
2001-01-20to2001-01-25,2001-02-01to2001-02-05,
2001-02-10to2001-02-15';
names varchar = 'G1,G2,G3,G4,G5';
The values in the variables may change as per the requirements; they will be dynamic. So the case statement should be dynamic, without using a loop.
You may not need any function for this, just join to a mapping data-set:
with cola_map(low, high, value) as (
values(date '2001-01-01', date '2001-01-05', 'G1'),
('2001-01-10', '2001-01-15', 'G2'),
('2001-01-20', '2001-01-25', 'G3'),
('2001-02-01', '2001-02-05', 'G4'),
('2001-02-10', '2001-02-15', 'G5')
-- you can include as many rows, as you want
)
select table_name.*,
coalesce(cola_map.value, '') -- else branch from case expression
from table_name
left join cola_map on table_name.cola between cola_map.low and cola_map.high
If your date ranges could collide, you can use DISTINCT ON or GROUP BY to avoid row duplication.
Note: you can use a simple sub-select too, I used a CTE, because it's more readable.
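As for colliding ranges, this is a sketch of the DISTINCT ON variant; it assumes table_name has an id primary key (a hypothetical column here) and keeps, per row, the mapping entry with the lowest low bound:
select distinct on (table_name.id) table_name.*,
coalesce(cola_map.value, '')
from table_name
left join cola_map on table_name.cola between cola_map.low and cola_map.high
order by table_name.id, cola_map.low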
Edit: passing these data (as a single parameter) can be achieved by passing a multi-dimensional array (or an array of row-values, but that requires you to have a distinct, predefined composite type).
Passing arrays as parameters can depend on the actual client (& driver) you use, but in general, you can use the array's input representation:
-- sql
with cola_map(low, high, value) as (
select d.arr[i][1]::date, d.arr[i][2]::date, d.arr[i][3]
from (select ?::text[][] as arr) d,
-- unnest() would flatten the 2-D array into scalars, so walk its first dimension instead
generate_subscripts(d.arr, 1) i
)
select table_name.*,
coalesce(cola_map.value, '') -- else branch from case expression
from table_name
left join cola_map on table_name.cola between cola_map.low and cola_map.high
// client pseudo code
query = db.prepare(sql);
query.bind(1, "{{2001-01-10,2001-01-15,G2},{2001-01-20,2001-01-25,G3}}");
query.execute();
Passing each chunk of data separately is also possible with some clients (or with some abstractions), but this highly depends on the driver/orm/etc. you use.

TSQL split comma delimited string

I am trying to create a stored procedure that will split the user input from 3 text boxes on a webpage, each of which contains a comma-delimited string. We have a field called 'combined_name' in our table that we have to search for first and last name and any known errors or nicknames etc., such as @p1: 'grei,grie', @p2: 'joh,jon,j..', @p3: empty.
The reason for the third box is that after I get the basics set up, we will have 'does not contain', 'starts with', 'ends with' and 'IS' options to narrow our results further.
So I am looking to get all records that CONTAIN any combination of those. I originally wrote this in LINQ, but it didn't work, as you cannot query a list and a dataset. The dataset is too large (1.3 million records) to be put into a list, so I have to use a stored procedure, which is likely better anyway.
Will I have to use 2 SPs, one to split each field and one for the select query, or can this be done with one? What function do I use for CONTAINS in T-SQL? I tried using IN in a query but cannot figure out how it works with multiple parameters.
Please note that this will be an internal site with limited access, so worrying about SQL injection is not a priority.
I did attempt dynamic SQL but am not getting the correct results back:
CREATE PROCEDURE uspJudgments @fullName nvarchar(100) AS
EXEC('SELECT *
FROM new_judgment_system.dbo.defendants_ALL
WHERE combined_name IN (' + @fullName + ')')
GO
EXEC uspJudgments @fullName = '''grein'', ''grien'''
Even if this did retrieve the correct results, how would this be done with 3 parameters?
You may try using this to split the strings and obtain tables of values. Then, to get all the combinations, you can use a full join of these tables, and then do your select.
Here is the Table valued function I set up:
ALTER FUNCTION [dbo].[Split] (@sep char(1), @s varchar(8000))
RETURNS table
AS
RETURN (
WITH splitter_cte AS (
SELECT CHARINDEX(@sep, @s) as pos, 0 as lastPos
UNION ALL
SELECT CHARINDEX(@sep, @s, pos + 1), pos
FROM splitter_cte
WHERE pos > 0
)
SELECT SUBSTRING(@s, lastPos + 1,
case when pos = 0 then 8000
else pos - lastPos - 1 end) as OutputValues
FROM splitter_cte
)
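With the function in place, the three-box contains search can be expressed without dynamic SQL. A sketch (the parameter names are hypothetical, and an empty string skips that box):
CREATE PROCEDURE uspJudgments
@p1 nvarchar(4000), @p2 nvarchar(4000), @p3 nvarchar(4000)
AS
SELECT d.*
FROM new_judgment_system.dbo.defendants_ALL d
WHERE (@p1 = '' OR EXISTS (SELECT 1 FROM dbo.Split(',', @p1) s
WHERE d.combined_name LIKE '%' + s.OutputValues + '%'))
AND (@p2 = '' OR EXISTS (SELECT 1 FROM dbo.Split(',', @p2) s
WHERE d.combined_name LIKE '%' + s.OutputValues + '%'))
AND (@p3 = '' OR EXISTS (SELECT 1 FROM dbo.Split(',', @p3) s
WHERE d.combined_name LIKE '%' + s.OutputValues + '%'))
GO
Each EXISTS implements a CONTAINS-style match for any token in that box via LIKE with wildcards on both sides.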

Unexpected SQL results: string vs. direct SQL

Working SQL
The following code works as expected, returning two columns of data (a row number and a valid value):
sql_amounts := '
SELECT
row_number() OVER (ORDER BY taken)::integer,
avg( amount )::double precision
FROM
x_function( '|| id || ', 25 ) ca,
x_table m
WHERE
m.category_id = 1 AND
m.location_id = ca.id AND
extract( month from m.taken ) = 1 AND
extract( day from m.taken ) = 1
GROUP BY
m.taken
ORDER BY
m.taken';
FOR r, amount IN EXECUTE sql_amounts LOOP
SELECT array_append( v_row, r::integer ) INTO v_row;
SELECT array_append( v_amount, amount::double precision ) INTO v_amount;
END LOOP;
Non-Working SQL
The following code does not work as expected; the first column is a row number, the second column is NULL.
FOR r, amount IN
SELECT
row_number() OVER (ORDER BY taken)::integer,
avg( amount )::double precision
FROM
x_function( id, 25 ) ca,
x_table m
WHERE
m.category_id = 1 AND
m.location_id = ca.id AND
extract( month from m.taken ) = 1 AND
extract( day from m.taken ) = 1
GROUP BY
m.taken
ORDER BY
m.taken
LOOP
SELECT array_append( v_row, r::integer ) INTO v_row;
SELECT array_append( v_amount, amount::double precision ) INTO v_amount;
END LOOP;
Question
Why does the non-working code return a NULL value for the second column when the query itself returns two valid columns? (This question is mostly academic; if there is a way to express the query without resorting to wrapping it in a text string, that would be great to know.)
Full Code
http://pastebin.com/hgV8f8gL
Software
PostgreSQL 8.4
Thank you.
The two statements aren't strictly equivalent.
Assuming id = 4, the first one gets planned/prepared on each pass, and behaves like:
prepare dyn_stmt as '... x_function( 4, 25 ) ...'; execute dyn_stmt;
The other gets planned/prepared on the first pass only, and behaves more like:
prepare stc_stmt as '... x_function( $1, 25 ) ...'; execute stc_stmt(4);
(The loop will actually make it prepare a cursor for the above, but that's beside the point for our sake.)
A number of factors can make the two yield different results.
Search path changes before calling the procedure will be ignored by the second call, in particular if this makes x_table point to something different.
Constants of all kinds and calls to immutable functions are "hard-wired" in the second call's plan.
Consider this as an illustration of these side-effects:
deallocate all;
begin;
prepare good as select now();
prepare bad as select current_timestamp;
execute good; -- yields the current timestamp
execute bad; -- yields the current timestamp
commit;
execute good; -- yields the current timestamp
execute bad; -- yields the timestamp at which it was prepared
Why the two aren't returning the same results in your case would depend on the context (you only posted part of your pl/pgsql function, so it's hard to tell), but my guess is you're running into a variation of the above kind of problem.
From Tom Lane:
I think the problem is that you're assuming "amount" will refer to a table column of the query, when actually it's a local variable of the plpgsql function. The second interpretation will take precedence unless you qualify the column reference with the table's name/alias.
Note: PG 9.0 will throw an error by default when there is an ambiguity of this type.
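Following Tom Lane's diagnosis, qualifying the ambiguous column with the table alias should make the direct version behave like the string version. A sketch of the reworked loop (only the qualification changes):
FOR r, amount IN
SELECT
row_number() OVER (ORDER BY m.taken)::integer,
avg( m.amount )::double precision -- m.amount: qualified so the plpgsql variable "amount" no longer shadows the column
FROM
x_function( id, 25 ) ca,
x_table m
WHERE
m.category_id = 1 AND
m.location_id = ca.id AND
extract( month from m.taken ) = 1 AND
extract( day from m.taken ) = 1
GROUP BY
m.taken
ORDER BY
m.taken
LOOP
SELECT array_append( v_row, r::integer ) INTO v_row;
SELECT array_append( v_amount, amount::double precision ) INTO v_amount;
END LOOP;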