Need help in parsing column value based on value in other column - postgresql

I have two columns, COL1 and COL2. COL1 has a value like 'Birds sitting on $1 and enjoying' and COL2 has a value like 'the.location_value[/tree,\building]'.
I need to update a third column, COL3, with a value like 'Birds sitting on /tree and enjoying',
i.e. $1 in the first column is replaced with /tree,
which is the first word in the comma-separated list within the square brackets [] in COL2, i.e. [/tree,\building].
I would like to know the most suitable combination of string functions in PostgreSQL to achieve this.

You first need to extract the first element from the comma-separated list. You can use split_part() for that, but before that you need to extract the actual list of values. This can be done using substring() with a regular expression:
substring(col2 from '\[(.*)\]')
will return /tree,\building
So the complete query would be:
select replace(col1, '$1', split_part(substring(col2 from '\[(.*)\]'), ',', 1))
from the_table;
Online example: http://rextester.com/CMFZMP1728
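Since the question asks to fill a third column, the same expression can be used in an UPDATE. Below is a minimal sketch, assuming a table called the_table with text columns col1, col2 and col3 (the table definition and sample row are made up for illustration) and the default standard_conforming_strings = on:

```sql
-- Hypothetical table and data, mirroring the example in the question.
create temporary table the_table (col1 text, col2 text, col3 text);

insert into the_table (col1, col2)
values ('Birds sitting on $1 and enjoying', 'the.location_value[/tree,\building]');

-- Take the text between [ and ], keep the part before the first comma,
-- and put it in place of the literal $1.
update the_table
set col3 = replace(col1, '$1',
                   split_part(substring(col2 from '\[(.*)\]'), ',', 1));

select col3 from the_table;
-- Birds sitting on /tree and enjoying
```

Note that replace() matches the literal text $1, so it would also change the first two characters of something like $10.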

This one should work with any (int) number after $:
select t.*, c.col3
from t,
lateral (select string_agg(case
when o = 1 then s
else (string_to_array((select regexp_matches(t.col2, '\[(.*)\]'))[1], ','))[(select regexp_matches(s, '^\$(\d+)'))[1]::int] || substring(s from '^\$\d+(.*)')
end, '' order by o) col3
from regexp_split_to_table(t.col1, '(?=\$\d+)') with ordinality s(s, o)) c
http://rextester.com/OKZAG54145
Note: it is not the most efficient, though. It splits col2's values (in the square brackets) again for each $N it replaces.
Update: LATERAL and WITH ORDINALITY are not supported in older versions (LATERAL needs PostgreSQL 9.3, WITH ORDINALITY 9.4), but you could try a correlated subquery instead:
select t.*, (select array_to_string(array_agg(case
when s ~ E'^\\$(\\d+)'
then (string_to_array((select regexp_matches(t.col2, E'\\[(.*)\\]'))[1], ','))[(select regexp_matches(s, E'^\\$(\\d+)'))[1]::int] || substring(s from E'^\\$\\d+(.*)')
else s
end), '') col3
from regexp_split_to_table(t.col1, E'(?=\\$\\d+)') s) col3
from t

Related

PostgreSQL. How to concatenate two strings value without duplicates

I have two strings as below:
_var_1 text := '815 PAADLEY ROAD PL';
_var_2 text := 'PAADLEY ROAD PL';
_var_3 text;
I want to merge these two strings into one string and remove the duplicates:
_var_3 := _var_1 || _var_2;
As a result, the variable _var_3 should contain only 815 PAADLEY ROAD PL, without duplicates.
Can you recommend a PostgreSQL feature for this?
I read the documentation and could not find a string function that solves this problem. I tried regexp_split_to_table, but could not get it to work.
I also tried the following, but it is not what I need, and the words in the output are mixed up:
WITH ts AS (
SELECT
unnest(
string_to_array('815 PAADLEY ROAD PL PAADLEY ROAD PL', ' ')
) f
)
SELECT
f
FROM ts
GROUP BY f
-- f
-- 815
-- ROAD
-- PL
-- PAADLEY
I assume you want to treat the strings as word lists and concatenate them as if they were sets being unioned, while retaining the order. This can basically be done with the following SQL:
with splitted (val, input_number, word_number) as (
select v, 1, i
from unnest(regexp_split_to_array('815 PAADLEY 2 ROAD 3 PL',' ')) with ordinality as t(v,i)
union
select v, 2, i
from unnest(regexp_split_to_array('PAADLEY ROAD 4 PL',' ')) with ordinality as t(v,i)
), numbered as (
select val, input_number, word_number, row_number() over (partition by val order by input_number, word_number) as rn
from splitted
)
select string_agg(val,' ' order by input_number, word_number)
from numbered
where rn = 1
string_agg
815 PAADLEY 2 ROAD 3 PL 4
However, this is not the kind of task that can be solved in SQL in a smart and elegant way. Moreover, it is not clear from your specification what to do with duplicate words, or whether you want to process multiple input pairs (both requirements would be possible, though SQL is probably not the right tool). Please at least provide more sample inputs with expected outputs.
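Because the question works with variables rather than literals, the same logic could be wrapped in a function. This is only a sketch: the name merge_words and its parameters are made up, string_to_array is swapped in for regexp_split_to_array, and it needs PostgreSQL 9.4 or later for WITH ORDINALITY:

```sql
-- Hypothetical helper: union the words of a and b, keeping first-seen order.
create function merge_words(a text, b text) returns text
language sql immutable as $$
  with splitted (val, input_number, word_number) as (
    select v, 1, i
    from unnest(string_to_array(a, ' ')) with ordinality as t(v, i)
    union all
    select v, 2, i
    from unnest(string_to_array(b, ' ')) with ordinality as t(v, i)
  ), numbered as (
    -- rn = 1 marks the first occurrence of each word across both inputs
    select val, input_number, word_number,
           row_number() over (partition by val
                              order by input_number, word_number) as rn
    from splitted
  )
  select string_agg(val, ' ' order by input_number, word_number)
  from numbered
  where rn = 1
$$;

select merge_words('815 PAADLEY ROAD PL', 'PAADLEY ROAD PL');
-- 815 PAADLEY ROAD PL
```

In PL/pgSQL this could then be called as _var_3 := merge_words(_var_1, _var_2);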

PostgreSQL calculate prefix combinations after split

I have an input string of the form foo:bar:something:221. I'm looking for a way to generate a table with all prefixes of this string, like:
foo
foo:bar
foo:bar:something
foo:bar:something:221
I wrote the following query to split the string, but can't figure out where to go from there:
select unnest(string_to_array('foo:bar:something:221', ':'));
An option is to simulate a loop over all elements, then take the sub-array from the input for each element index:
with data(input) as (
values (string_to_array('foo:bar:something:221', ':'))
)
select array_to_string(input[1:g.idx], ':')
from data
cross join generate_series(1, cardinality(input)) as g(idx);
generate_series(1, cardinality(input)) generates as many rows as the array has elements, and the expression input[1:g.idx] takes the sub-array from the first element up to element idx. As the result is an array, I use array_to_string to re-create the representation with the : separator.
You can use string_agg as a window function. With ORDER BY, the default frame runs from the beginning of the partition to the current row:
SELECT string_agg(s, ':') OVER (ORDER BY n)
FROM unnest(string_to_array('foo:bar:something:221', ':')) WITH ORDINALITY AS u(s, n);
string_agg
-----------------------
foo
foo:bar
foo:bar:something
foo:bar:something:221
(4 rows)
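To make the frame visible, the same query can be written with the frame spelled out. This is a sketch of an equivalent formulation, not a change in behaviour: the default frame is RANGE-based, but because every n is distinct here, the ROWS frame below yields the same prefixes. The alias prefix and the final ORDER BY are additions for readability:

```sql
SELECT string_agg(s, ':') OVER (
         ORDER BY n
         -- explicit frame: everything from the first row up to the current one
         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS prefix
FROM unnest(string_to_array('foo:bar:something:221', ':'))
     WITH ORDINALITY AS u(s, n)
ORDER BY n;
```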

SQL query to break one row into multiple rows based on some delimiter

I have a table named 'test' which has the following structure:
 category | key                                            | value
----------+------------------------------------------------+---------
 name     | real_name:Brad,nick_name:Brady,name_type:small | NOVALUE
 other    | description                                    | cool
I want to break the key column into multiple rows on the , delimiter, and where value is equal to NOVALUE, the text after the : delimiter should become the value. So the output should look like:
 category | key         | value
----------+-------------+-------
 name     | real_name   | Brad
 name     | nick_name   | Brady
 name     | name_type   | small
 other    | description | cool
How do I write an SQL query for this? I am using PostgreSQL.
Any help? Thanks in advance.
You can use string_to_array and unnest to do this:
select ts.category,
split_part(key_value, ':', 1) as key,
split_part(key_value, ':', 2) as value
from test ts
cross join lateral unnest(string_to_array(ts.key, ',')) as t (key_value)
where ts.value = 'NOVALUE'
union all
select category,
key,
value
from test
where value <> 'NOVALUE';
SQLFiddle example: http://sqlfiddle.com/#!15/6f1e6/1
Alternatively, this can be done in a single query without the union:
select category,
split_part(key_value, ':', 1) as key,
case when value = 'NOVALUE' then split_part(key_value, ':', 2) else value end as value
from test
cross join lateral unnest(string_to_array(key, ',')) as t (key_value)
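To try this locally, here is a small setup that mirrors the table from the question. The column types are an assumption (the question does not state them), and the query is the single-statement variant with an explicit alias on the CASE expression:

```sql
-- Assumed definition: all three columns as text.
create temporary table test (category text, key text, value text);

insert into test values
  ('name',  'real_name:Brad,nick_name:Brady,name_type:small', 'NOVALUE'),
  ('other', 'description', 'cool');

select category,
       split_part(key_value, ':', 1) as key,
       -- only rows marked NOVALUE take their value from the key column
       case when value = 'NOVALUE'
            then split_part(key_value, ':', 2)
            else value
       end as value
from test
cross join lateral unnest(string_to_array(key, ',')) as t (key_value);
```

A key without a comma, such as description, becomes a one-element array, so that row passes through unchanged.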

Postgres query: array_to_string with empty values

I am trying to combine rows and concatenate two columns (name, vorname) in a Postgres query.
This works well like this:
SELECT nummer,
array_to_string(array_agg(name|| ', ' ||vorname), '\n') as name
FROM (
SELECT DISTINCT
nummer, name, vorname
FROM myTable
) AS m
GROUP BY nummer
ORDER BY nummer;
Unfortunately, if "vorname" is empty I get no result, although name has a value.
Is it possible to get this working:
array_to_string(array_agg(name|| ', ' ||vorname), '\n') as name
even if one column is empty?
Use coalesce to convert NULL values to something that you can concatenate:
array_to_string(array_agg(name|| ', ' ||coalesce(vorname, '<missing>')), '\n')
Also, you can concatenate strings directly without collecting them to an array by using the string_agg function.
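A sketch of the string_agg variant follows. The sample rows are made up, and two details differ from the original on purpose: E'\n' is used so the separator is a real newline (with standard_conforming_strings on, plain '\n' is a backslash followed by n), and an ORDER BY inside the aggregate fixes the order of the names:

```sql
-- Hypothetical data standing in for myTable.
with myTable (nummer, name, vorname) as (
  values (1, 'Meier',   'Anna'),
         (1, 'Schmidt', null),
         (2, 'Weber',   'Jan')
)
select nummer,
       string_agg(name || ', ' || coalesce(vorname, '<missing>'),
                  E'\n' order by name) as name
from (
  select distinct nummer, name, vorname
  from myTable
) as m
group by nummer
order by nummer;
```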
If you have 9.1 or later, you can use the third parameter of array_to_string, the null string:
select array_to_string(array_agg(name), ',', '<missing>') from bbb

Recursive replace from a table of characters

In short, I am looking for a single recursive query that can perform multiple replaces over one string. I have a notion it can be done, but am failing to wrap my head around it.
Granted, I'd prefer the biz-layer of the application, or even the CLR, to do the replacing, but these are not options in this case.
More specifically, I want to replace the below mess - which is C&P in 8 different stored procedures - with a TVF.
SET @temp = REPLACE(RTRIM(@target), '~', '-')
SET @temp = REPLACE(@temp, '''', '-')
SET @temp = REPLACE(@temp, '!', '-')
SET @temp = REPLACE(@temp, '@', '-')
SET @temp = REPLACE(@temp, '#', '-')
-- 23 additional lines redacted
SET @target = @temp
Here is where I've started:
-- I have a split string TVF called tvf_SplitString that takes a string
-- and a splitter, and returns a table with one row for each element.
-- EDIT: tvf_SplitString returns a two-column table: pos, element, of which
-- pos is simply the row_number of the element.
SELECT REPLACE('A~B!C@D@C!B~A', MM.ELEMENT, '-') TGT
FROM dbo.tvf_SplitString('~-''-!-@-#', '-') MM
Notice I've joined all the offending characters into a single string separated by '-' (knowing that '-' will never be one of the offending characters), which is then split. The result from this query looks like:
TGT
------------
A-B!C@D@C!B-A
A~B!C@D@C!B~A
A~B-C@D@C-B~A
A~B!C-D-C!B~A
A~B!C@D@C!B~A
So, the replace clearly works, but now I want it to be recursive so I can pull the top 1 and eventually come out with:
TGT
------------
A-B-C-D-C-B-A
Any ideas on how to accomplish this with one query?
EDIT: Well, actual recursion isn't necessary if there's another way. I'm pondering the use of a table of numbers here, too.
You can use this in a scalar function. I use it to remove all control characters from some external input.
SELECT @target = REPLACE(@target, invalidChar, '-')
FROM (VALUES ('~'),(''''),('!'),('@'),('#')) AS T(invalidChar)
I figured it out. I failed to mention that the tvf_SplitString function returns a row number as "pos" (although a subquery assigning row_number could also have worked). With that fact, I could control cross join between the recursive call and the split.
-- the cast to varchar(max) matches the output of the TVF, otherwise error.
-- The iteration counter is joined to the row number value from the split string
-- function to ensure each iteration only replaces on one character.
WITH XX AS (SELECT CAST('A~B!C@D@C!B~A' AS VARCHAR(MAX)) TGT, 1 RN
UNION ALL
SELECT REPLACE(XX.TGT, MM.ELEMENT, '-'), RN + 1 RN
FROM XX, dbo.tvf_SplitString('~-''-!-@-#', '-') MM
WHERE XX.RN = MM.pos)
SELECT TOP 1 XX.TGT
FROM XX
ORDER BY RN DESC
Still, I'm open to other suggestions.
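One other suggestion, which is a different technique from the recursive query above: on SQL Server 2017 or later, the built-in TRANSLATE function maps each character of one list to the character at the same position of a second list in a single call. A minimal sketch, assuming that version is available and using a sample input with the five characters listed so far:

```sql
-- T-SQL; TRANSLATE requires SQL Server 2017 or later.
DECLARE @target VARCHAR(MAX) = 'A~B!C@D@C!B~A';

-- Second argument: the offending characters ~ ' ! @ #
-- Third argument: one replacement per character, so both have length 5.
SELECT TRANSLATE(RTRIM(@target), '~''!@#', '-----') AS TGT;
-- A-B-C-D-C-B-A
```

The two character lists must have the same length, so the replacement string has to grow along with the list of offending characters.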