Recursive replace from a table of characters - tsql

In short, I am looking for a single recursive query that can perform multiple replaces over one string. I have a notion it can be done, but am failing to wrap my head around it.
Granted, I'd prefer the biz-layer of the application, or even the CLR, to do the replacing, but these are not options in this case.
More specifically, I want to replace the mess below, which is copied and pasted into 8 different stored procedures, with a TVF.
SET @temp = REPLACE(RTRIM(@target), '~', '-')
SET @temp = REPLACE(@temp, '''', '-')
SET @temp = REPLACE(@temp, '!', '-')
SET @temp = REPLACE(@temp, '@', '-')
SET @temp = REPLACE(@temp, '#', '-')
-- 23 additional lines redacted
SET @target = @temp
Here is where I've started:
-- I have a split string TVF called tvf_SplitString that takes a string
-- and a splitter, and returns a table with one row for each element.
-- EDIT: tvf_SplitString returns a two-column table: pos, element, of which
-- pos is simply the row_number of the element.
SELECT REPLACE('A~B!C@D@C!B~A', MM.ELEMENT, '-') TGT
FROM dbo.tvf_SplitString('~-''-!-@-#', '-') MM
Notice I've joined all the offending characters into a single string separated by '-' (knowing that '-' will never be one of the offending characters), which is then split. The result from this query looks like:
TGT
------------
A-B!C@D@C!B-A
A~B!C@D@C!B~A
A~B-C@D@C-B~A
A~B!C-D-C!B~A
A~B!C@D@C!B~A
So, the replace clearly works, but now I want it to be recursive so I can pull the top 1 and eventually come out with:
TGT
------------
A-B-C-D-C-B-A
Any ideas on how to accomplish this with one query?
EDIT: Well, actual recursion isn't necessary if there's another way. I'm pondering the use of a table of numbers here, too.

You can use this in a scalar function. I use it to remove all control characters from some external input.
SELECT @target = REPLACE(@target, invalidChar, '-')
FROM (VALUES ('~'),(''''),('!'),('@'),('#')) AS T(invalidChar)
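The query above only works if the assignment is evaluated once per row, so that each REPLACE sees the result of the previous one. In effect it is a left fold over the list of characters. A minimal sketch of that fold in Python, as a model of the logic only and not of T-SQL semantics (the name replace_all, the character list and the sample string are illustrative assumptions):

```python
from functools import reduce

# Characters to neutralise; this list mirrors the VALUES rows in the answer.
INVALID_CHARS = ["~", "'", "!", "@", "#"]


def replace_all(target: str, invalid_chars=INVALID_CHARS, substitute: str = "-") -> str:
    """Apply one replace per character, feeding each result into the next."""
    return reduce(lambda acc, ch: acc.replace(ch, substitute), invalid_chars, target)


print(replace_all("A~B!C@D@C!B~A"))  # A-B-C-D-C-B-A
```

The order of the characters does not matter here because every one of them maps to the same substitute.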

I figured it out. I failed to mention that the tvf_SplitString function returns a row number as "pos" (although a subquery assigning row_number could also have worked). With that fact, I could control the cross join between the recursive call and the split.
-- the cast to varchar(max) matches the output of the TVF, otherwise error.
-- The iteration counter is joined to the row number value from the split string
-- function to ensure each iteration only replaces on one character.
WITH XX AS (SELECT CAST('A~B!C@D@C!B~A' AS VARCHAR(MAX)) TGT, 1 RN
UNION ALL
SELECT REPLACE(XX.TGT, MM.ELEMENT, '-'), RN + 1 RN
FROM XX, dbo.tvf_SplitString('~-''-!-@-#', '-') MM
WHERE XX.RN = MM.pos)
SELECT TOP 1 XX.TGT
FROM XX
ORDER BY RN DESC
Still, I'm open to other suggestions.
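For anyone who wants to try the recursive pattern without a SQL Server instance, here is a rough, runnable sketch of the same idea. It makes several assumptions: SQLite (through Python's sqlite3 module) stands in for SQL Server, an inline VALUES table named mm replaces dbo.tvf_SplitString, and the function name replace_recursively is made up for the demo. The join on mm.pos = xx.rn is what limits each iteration to a single character.

```python
import sqlite3

# SQLite stands in for SQL Server here; the inline VALUES table plays the
# role of dbo.tvf_SplitString and supplies (pos, element) pairs.
QUERY = """
WITH RECURSIVE
  mm(pos, element) AS (
    VALUES (1, '~'), (2, ''''), (3, '!'), (4, '@'), (5, '#')
  ),
  xx(tgt, rn) AS (
    SELECT ?, 1
    UNION ALL
    SELECT replace(xx.tgt, mm.element, '-'), xx.rn + 1
    FROM xx JOIN mm ON mm.pos = xx.rn
  )
SELECT tgt FROM xx ORDER BY rn DESC LIMIT 1
"""


def replace_recursively(target: str) -> str:
    """Run the recursive CTE and return the row from the last iteration."""
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(QUERY, (target,)).fetchone()[0]
    finally:
        conn.close()


print(replace_recursively("A~B!C@D@C!B~A"))  # A-B-C-D-C-B-A
```

The recursion stops on its own once rn exceeds the highest pos, because the join then returns no rows.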

Related

PostgreSQL. How to concatenate two strings value without duplicates

I have two strings as below:
_var_1 text := '815 PAADLEY ROAD PL';
_var_2 text := 'PAADLEY ROAD PL';
_var_3 text;
I want to merge these two strings into one string and to remove duplicates:
_var_3 := _var_1 || _var_2;
As a result, the variable (_var_3) should contain only 815 PAADLEY ROAD PL, without duplicates.
Can you advise or help recommend any PostgreSQL feature?
I read the documentation and could not find the necessary string function to solve this problem... I am trying to use regexp_split_to_table but nothing is working.
I tried to use this method, but it's not what I need, and the words in the output are mixed up:
WITH ts AS (
SELECT
unnest(
string_to_array('815 PAADLEY ROAD PL PAADLEY ROAD PL', ' ')
) f
)
SELECT
f
FROM ts
GROUP BY f
-- f
-- 815
-- ROAD
-- PL
-- PAADLEY
I assume you want to treat the strings as word lists and concatenate them as if they were sets to be unioned, while retaining the order. This is basically done by the following SQL:
with splitted (val, input_number, word_number) as (
select v, 1, i
from unnest(regexp_split_to_array('815 PAADLEY 2 ROAD 3 PL',' ')) with ordinality as t(v,i)
union
select v, 2, i
from unnest(regexp_split_to_array('PAADLEY ROAD 4 PL',' ')) with ordinality as t(v,i)
), numbered as (
select val, input_number, word_number, row_number() over (partition by val order by input_number, word_number) as rn
from splitted
)
select string_agg(val,' ' order by input_number, word_number)
from numbered
where rn = 1
string_agg
815 PAADLEY 2 ROAD 3 PL 4
However, this is not the kind of task that can be solved in SQL in a smart and elegant way. Moreover, it is not clear from your specification what to do with duplicate words, or whether you want to process multiple input pairs (both requirements would be possible, though SQL is probably not the right tool). Please at least provide more sample inputs with expected outputs.
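To illustrate the point that this is easier outside SQL, here is a minimal sketch of the same "ordered union of word lists" in Python. It assumes words are separated by whitespace and that only the first occurrence of each word should survive; the function name merge_words is hypothetical.

```python
def merge_words(*inputs: str) -> str:
    """Concatenate word lists, keeping only the first occurrence of each word."""
    words = [word for text in inputs for word in text.split()]
    # dict preserves insertion order, so fromkeys() drops later duplicates
    # while keeping the position of the first one.
    return " ".join(dict.fromkeys(words))


print(merge_words("815 PAADLEY ROAD PL", "PAADLEY ROAD PL"))
# 815 PAADLEY ROAD PL
print(merge_words("815 PAADLEY 2 ROAD 3 PL", "PAADLEY ROAD 4 PL"))
# 815 PAADLEY 2 ROAD 3 PL 4
```

The second call uses the same inputs as the SQL above and produces the same string.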

How do i extract texts from string and save it as two columns and add character at the end for third column

I'm using T-SQL. I want to extract text from the string and save it as two columns and third one as
The following code is incomplete; it only gets rid of the PRD1T_ before FINAPP, and I am not sure how to handle the rest of the text.
Select substring(Table_name,
charindex('_',Table_name)+1,
Len(Table_name) - charindex('.',Table_name)) as Landing_Schema_Name
FROM [e].[Load_History_test]
When asking questions like this it is best to provide some sample data and expected results, as Ronen suggested. For SQL questions a really good way of doing this is with a temp table and sample data, like this:
CREATE TABLE #load_history_test (
table_name VARCHAR(100)
);
INSERT INTO #load_history_test
SELECT 'PRD1T_FINAPP.HOLCONTRACT'
UNION ALL
SELECT 'PRD1T_FINAPP.TOCCASE'
UNION ALL
SELECT 'PRD1T_FINAPP.TOCCASE';
So that provides something people can run and starts towards the criteria for a minimal, reproducible example, which is the secret to getting a good answer on StackOverflow. I have used SELECT with UNION ALL as Azure Synapse does not currently support the VALUES clause for multiple records.
For expected results, it's often good to display them in a table, something like this:
col1   col2    col3
-----  ------  -----------
PRD1T  FINAPP  HOLCONTRACT
PRD1T  FINAPP  TOCCASE
This way it's clear to people what you expect. It is not clear why you have two TOCCASE examples in your screenprint.
For your problem, there is more than one approach. You are along the right lines with CHARINDEX, SUBSTRING and LEFT, but things can start to look complicated. Therefore I tend to wrap up some of the complexity in a Common Table Expression (CTE); see below for an example. There is also a kind of 'trick' approach with a built-in SQL function called PARSENAME. This is designed to extract parts from the four-part object names common in SQL Server, e.g. <server-name>.<database-name>.<schema-name>.<object-name>. As long as your object names never have more than four parts, this approach will work for you. See the PARSENAME documentation for details. Below is a complete demo that runs end to end with a temp table to demonstrate the different principles:
IF OBJECT_ID('tempdb..#load_history_test') IS NOT NULL
DROP TABLE #load_history_test;
CREATE TABLE #load_history_test (
table_name VARCHAR(100)
);
INSERT INTO #load_history_test
SELECT 'PRD1T_FINAPP.HOLCONTRACT'
UNION ALL
SELECT 'PRD1T_FINAPP.TOCCASE'
UNION ALL
SELECT 'PRD1T_FINAPP.TOCCASE';
;WITH cte AS (
SELECT
table_name AS original_table_name,
CHARINDEX( '_', table_name ) underscorePos,
CHARINDEX( '.', table_name ) stopPos,
REPLACE( table_name, '_', '.' ) AS clean_table_name
FROM #load_history_test
)
SELECT
*,
PARSENAME( clean_table_name, 3 ) a,
PARSENAME( clean_table_name, 2 ) b,
PARSENAME( clean_table_name, 1 ) c,
LEFT( original_table_name, underscorePos - 1 ) getItemBeforeUnderscore,
SUBSTRING( original_table_name, underscorePos + 1, ( ( stopPos - 1 ) - underscorePos ) ) AS getItemAfterUnderscore,
SUBSTRING( original_table_name, stopPos + 1, 99 ) getItemAfterStop
FROM cte;
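The LEFT/SUBSTRING branch of the query boils down to "cut at the first underscore, then cut the remainder at the first dot". A small sketch of that logic in Python, purely as an illustration (the function name split_table_name is hypothetical, and it assumes the underscore comes before the dot, as in the sample data):

```python
def split_table_name(table_name: str) -> tuple[str, str, str]:
    """Split 'PREFIX_SCHEMA.TABLE' into (prefix, schema, table)."""
    prefix, _, rest = table_name.partition("_")  # text before the first underscore
    schema, _, table = rest.partition(".")       # split the remainder at the first dot
    return prefix, schema, table


print(split_table_name("PRD1T_FINAPP.HOLCONTRACT"))
# ('PRD1T', 'FINAPP', 'HOLCONTRACT')
```

One difference worth knowing: a table name that itself contains an underscore (say PRD1T_FINAPP.HOL_CONTRACT) stays intact with this cut-at-first-delimiter logic, whereas the REPLACE plus PARSENAME route would turn it into four parts and shift the results.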

Need help in parsing column value based on value in other column

I have two columns, COL1 and COL2. COL1 has values like 'Birds sitting on $1 and enjoying' and COL2 has values like 'the.location_value[/tree,\building]'.
I need to update a third column, COL3, with values like 'Birds sitting on /tree and enjoying',
i.e. $1 in the first column is replaced with /tree,
which is the first word in the comma-separated list within the square brackets [] in COL2, i.e. [/tree,\building].
I would like to know the most suitable combination of string functions in PostgreSQL to achieve this.
You first need to extract the first element from the comma-separated list. You can use split_part() for that, but before that you need to extract the actual list of values. This can be done using substring() with a regular expression:
substring(col2 from '\[(.*)\]')
will return /tree,\building
So the complete query would be:
select replace(col1, '$1', split_part(substring(col2 from '\[(.*)\]'), ',', 1))
from the_table;
Online example: http://rextester.com/CMFZMP1728
This one should work with any (int) number after $:
select t.*, c.col3
from t,
lateral (select string_agg(case
when o = 1 then s
else (string_to_array((select regexp_matches(t.col2, '\[(.*)\]'))[1], ','))[(select regexp_matches(s, '^\$(\d+)'))[1]::int] || substring(s from '^\$\d+(.*)')
end, '' order by o) col3
from regexp_split_to_table(t.col1, '(?=\$\d+)') with ordinality s(s, o)) c
http://rextester.com/OKZAG54145
Note: this is not the most efficient approach, though. It splits col2's values (in the square brackets) each time it replaces a $N.
Update: LATERAL and WITH ORDINALITY are not supported in older versions, but you could try a correlated subquery instead:
select t.*, (select array_to_string(array_agg(case
when s ~ E'^\\$(\\d+)'
then (string_to_array((select regexp_matches(t.col2, E'\\[(.*)\\]'))[1], ','))[(select regexp_matches(s, E'^\\$(\\d+)'))[1]::int] || substring(s from E'^\\$\\d+(.*)')
else s
end), '') col3
from regexp_split_to_table(t.col1, E'(?=\\$\\d+)') s) col3
from t
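The $N logic is easier to follow outside SQL. The sketch below models it in Python: extract the bracketed list once, then substitute every $N with the N-th item. The function name fill_placeholders is hypothetical, and leaving an out-of-range $N untouched is an assumption of this sketch, not something the queries above are claimed to do.

```python
import re


def fill_placeholders(template: str, source: str) -> str:
    """Replace each $N in template with the N-th item of the [...] list in source."""
    match = re.search(r"\[(.*)\]", source)
    if match is None:
        return template  # no bracketed list: leave the template untouched
    items = match.group(1).split(",")

    def lookup(m: re.Match) -> str:
        index = int(m.group(1))
        # $N is 1-based; keep the placeholder as-is if N is out of range.
        return items[index - 1] if 1 <= index <= len(items) else m.group(0)

    return re.sub(r"\$(\d+)", lookup, template)


print(fill_placeholders("Birds sitting on $1 and enjoying",
                        r"the.location_value[/tree,\building]"))
# Birds sitting on /tree and enjoying
```

Unlike the SQL version, the list is split only once per row, which avoids the repeated work mentioned in the note above.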

TSQL split comma delimited string

I am trying to create a stored procedure that will split the user input from 3 text boxes on a webpage, each of which contains a comma-delimited string. We have a field called 'combined_name' in our table that we have to search for first and last name and any known errors or nicknames etc., such as @p1: 'grei,grie' @p2: 'joh,jon,j..' @p3: is empty.
The third box is there because, once I get the basics set up, we will add "does not contain", "starts with", "ends with" and "is" options to narrow our results further.
So I am looking to get all records that CONTAIN any combination of those. I originally wrote this in LINQ but it didn't work as you cannot query a list and a dataset. The dataset is too large (1.3 million records) to be put into a list so I have to use a stored procedure which is likely better anyway.
Will I have to use two stored procedures, one to split each field and one for the select query, or can this be done with one? What function do I use for "contains" in T-SQL? I tried using IN in a query but cannot figure out how it works with multiple parameters.
Please note that this will be an internal site that has limited access so worrying about sql injection is not a priority.
I did attempt dynamic SQL but am not getting the correct results back:
CREATE PROCEDURE uspJudgments @fullName nvarchar(100) AS
EXEC('SELECT *
FROM new_judgment_system.dbo.defendants_ALL
WHERE combined_name IN (' + @fullName + ')')
GO
EXEC uspJudgments @fullName = '''grein'', ''grien'''
Even if this did retrieve the correct results how would this be done with 3 parameters?
You can use this to split each string and obtain a table of strings. Then, to get all the combinations, you can use a full join of these two tables, and then do your select.
Here is the Table valued function I set up:
ALTER FUNCTION [dbo].[Split] (@sep char(1), @s varchar(8000))
RETURNS table
AS
RETURN (
-- Each row holds the position of one separator and of the one before it.
WITH splitter_cte AS (
SELECT CHARINDEX(@sep, @s) as pos, 0 as lastPos
UNION ALL
SELECT CHARINDEX(@sep, @s, pos + 1), pos
FROM splitter_cte
WHERE pos > 0
)
-- pos = 0 marks the last element, so take the rest of the string.
SELECT SUBSTRING(@s, lastPos + 1,
case when pos = 0 then 80000
else pos - lastPos -1 end) as OutputValues
FROM splitter_cte
)
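The position bookkeeping in that CTE can be hard to read, so here is a line-by-line model of it in Python. It is only a sketch of the logic: the name split_like_cte is made up, and positions are kept 1-based with 0 meaning "not found" to mirror CHARINDEX.

```python
def split_like_cte(sep: str, s: str) -> list[str]:
    """Mimic the recursive CTE: track (pos, last_pos) and slice between them."""
    parts = []
    last_pos = 0
    pos = s.find(sep) + 1  # 1-based like CHARINDEX; 0 means "not found"
    while True:
        if pos == 0:
            # No further separator: take everything after the previous one.
            parts.append(s[last_pos:])
            break
        # SUBSTRING(@s, lastPos + 1, pos - lastPos - 1) in 0-based slicing.
        parts.append(s[last_pos:pos - 1])
        last_pos = pos
        pos = s.find(sep, pos) + 1  # CHARINDEX(@sep, @s, pos + 1)
    return parts


print(split_like_cte(",", "grei,grie"))   # ['grei', 'grie']
print(split_like_cte(",", "joh,jon,j"))   # ['joh', 'jon', 'j']
```

An empty input yields a single empty element, which is also what the anchor row of the CTE would produce.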

Unexpected SQL results: string vs. direct SQL

Working SQL
The following code works as expected, returning two columns of data (a row number and a valid value):
sql_amounts := '
SELECT
row_number() OVER (ORDER BY taken)::integer,
avg( amount )::double precision
FROM
x_function( '|| id || ', 25 ) ca,
x_table m
WHERE
m.category_id = 1 AND
m.location_id = ca.id AND
extract( month from m.taken ) = 1 AND
extract( day from m.taken ) = 1
GROUP BY
m.taken
ORDER BY
m.taken';
FOR r, amount IN EXECUTE sql_amounts LOOP
SELECT array_append( v_row, r::integer ) INTO v_row;
SELECT array_append( v_amount, amount::double precision ) INTO v_amount;
END LOOP;
Non-Working SQL
The following code does not work as expected; the first column is a row number, the second column is NULL.
FOR r, amount IN
SELECT
row_number() OVER (ORDER BY taken)::integer,
avg( amount )::double precision
FROM
x_function( id, 25 ) ca,
x_table m
WHERE
m.category_id = 1 AND
m.location_id = ca.id AND
extract( month from m.taken ) = 1 AND
extract( day from m.taken ) = 1
GROUP BY
m.taken
ORDER BY
m.taken
LOOP
SELECT array_append( v_row, r::integer ) INTO v_row;
SELECT array_append( v_amount, amount::double precision ) INTO v_amount;
END LOOP;
Question
Why does the non-working code return a NULL value for the second column when the query itself returns two valid columns? (This question is mostly academic; if there is a way to express the query without resorting to wrapping it in a text string, that would be great to know.)
Full Code
http://pastebin.com/hgV8f8gL
Software
PostgreSQL 8.4
Thank you.
The two statements aren't strictly equivalent.
Assuming id = 4, the first one gets planned/prepared on each pass, and behaves like:
prepare dyn_stmt as '... x_function( 4, 25 ) ...'; execute dyn_stmt;
The other gets planned/prepared on the first pass only, and behaves more like:
prepare stc_stmt as '... x_function( $1, 25 ) ...'; execute stc_stmt(4);
(The loop will actually make it prepare a cursor for the above, but that's beside the point for our sake.)
A number of factors can make the two yield different results.
Search path changes made before calling the procedure will be ignored by the second call, in particular if they make x_table point to something different.
Constants of all kinds and calls to immutable functions are "hard-wired" in the second call's plan.
Consider this as an illustration of these side-effects:
deallocate all;
begin;
prepare good as select now();
prepare bad as select current_timestamp;
execute good; -- yields the current timestamp
execute bad; -- yields the current timestamp
commit;
execute good; -- yields the current timestamp
execute bad; -- yields the timestamp at which it was prepared
Why the two aren't returning the same results in your case would depend on the context (you only posted part of your pl/pgsql function, so it's hard to tell), but my guess is you're running into a variation of the above kind of problem.
From Tom Lane:
I think the problem is that you're assuming "amount" will refer to a table column of the query, when actually it's a local variable of the plpgsql function. The second interpretation will take precedence unless you qualify the column reference with the table's name/alias.
Note: PG 9.0 will throw an error by default when there is an ambiguity of this type.