Basically, I want to remove the whitespace that exists after numbers
Input:
medication_title
CLORIDRATO DE VENLAFAXINA 75 MG
VIIBRYD 40 MG
KTRIZ UNO 0.6 U/G
Output:
medication_title                 medication_title2
CLORIDRATO DE VENLAFAXINA 75 MG  CLORIDRATO DE VENLAFAXINA 75MG
VIIBRYD 40 MG                    VIIBRYD 40MG
KTRIZ UNO 0.6 U/G                KTRIZ UNO 0.6U/G
Ideas?
We can use a regex replacement here:
SELECT
medication_title,
REGEXP_REPLACE(medication_title,
'\y(\d+(?:\.\d+)?) ([^[:space:]]*G)\y',
'\1\2') AS medication_title2
FROM yourTable;
Here is an explanation of the regex pattern:
\y              word boundary
(               match and capture in \1
\d+             a number
(?:\.\d+)?      followed by an optional decimal component
)               close capture group \1
' '             match a space (the one we want to remove)
(               match and capture in \2
[^[:space:]]*   zero or more leading non-whitespace characters
G               followed by "G"
)               close capture group \2
\y              another word boundary
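To check the pattern outside the database, here is a minimal Python sketch of the same idea. It is an approximation, not the Postgres query itself: Python's re module uses \b where Postgres uses \y, and \S stands in for [^[:space:]]. The function name join_number_and_unit is made up for this illustration. Note also that re.sub replaces every match, whereas Postgres REGEXP_REPLACE without the 'g' flag replaces only the first one.

```python
import re

# Python's re uses \b for a word boundary where Postgres uses \y,
# and \S is used here in place of [^[:space:]].
PATTERN = re.compile(r"\b(\d+(?:\.\d+)?) (\S*G)\b")


def join_number_and_unit(title: str) -> str:
    """Remove the single space between a number and a unit ending in 'G'.

    Hypothetical helper for illustration; replaces all matches in the string.
    """
    return PATTERN.sub(r"\1\2", title)


if __name__ == "__main__":
    for title in (
        "CLORIDRATO DE VENLAFAXINA 75 MG",
        "VIIBRYD 40 MG",
        "KTRIZ UNO 0.6 U/G",
    ):
        print(title, "->", join_number_and_unit(title))
```

Units that do not end in "G" (for example "ML") are left untouched by this pattern, just as in the SQL version.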
You can capture the sequences with a regular expression and then assemble them back as needed, as in regexp_replace(x, '([^0-9]*[0-9]) +([^0-9\.]+)', '\1\2').
For example:
select *, regexp_replace(x, '([^0-9]*[0-9]) +([^0-9\.]+)', '\1\2') as y
from (
select 'CLORIDRATO DE VENLAFAXINA 75 MG'
union all select 'VIIBRYD 40 MG'
union all select 'KTRIZ UNO 0.6 U/G '
) t (x)
Result:
x                                y
-------------------------------- ------------------------------
CLORIDRATO DE VENLAFAXINA 75 MG  CLORIDRATO DE VENLAFAXINA 75MG
VIIBRYD 40 MG                    VIIBRYD 40MG
KTRIZ UNO 0.6 U/G                KTRIZ UNO 0.6U/G
Related
I am trying to translate a query from Postgres to DuckDB that does the following: for a given string the query returns
All numbers
All pairs of consecutive tokens
The original Postgres queries are:
select (regexp_matches('34 121 adelaide st melbourne 3000', '[\d]+', 'g'))[1];
select (regexp_matches ( '34 121 adelaide st melbourne 3000', '[a-z0-9]+ [a-z0-9]+', 'g' ))[1] union select (regexp_matches ( regexp_replace ( '34 121 adelaide st melbourne 3000', '[a-z0-9]+', '' ), '[a-z0-9]+ [a-z0-9]+', 'g' ))[1];
For example, given the string '34 121 adelaide st melbourne 3000':
Return a table with row values 34, 121, 3000
Return a table with row values '34 121', '121 adelaide', 'adelaide st', 'st melbourne', 'melbourne 3000'
Using the regexp_extract function I can only return the first match. E.g.,
select regexp_extract('34 121 adelaide st melbourne 3000', '[\d]+');
produces '34' but none of the other digits.
Similarly select regexp_extract('34 121 adelaide st melbourne 3000', '[a-z0-9]+ [a-z0-9]+'); produces '34 121'
The second query I can re-write using a join to produce the correct results (although I would still prefer to do this in a simpler way).
Would anyone be able to assist?
I tried `select regexp_extract('34 121 adelaide st melbourne 3000', '[\d]+');`, which results in a table containing only '34' and none of the other numbers.
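To pin down what a translated query has to return, here is a small Python sketch that produces both expected results for the sample string. It is only a reference for the target output, not a DuckDB query; the function names all_numbers and consecutive_pairs are invented for this example. DuckDB's documentation also describes a regexp_extract_all function that returns every match as a list, which may be the missing piece, but whether it fits this exact case is an assumption to verify against your DuckDB version.

```python
import re


def all_numbers(text: str) -> list:
    """Return every run of digits, like the Postgres 'g' flag does."""
    return re.findall(r"\d+", text)


def consecutive_pairs(text: str) -> list:
    """Return every pair of neighbouring tokens, including overlapping pairs."""
    tokens = re.findall(r"[a-z0-9]+", text)
    # zip the token list against itself shifted by one position
    return [f"{left} {right}" for left, right in zip(tokens, tokens[1:])]


if __name__ == "__main__":
    sample = "34 121 adelaide st melbourne 3000"
    print(all_numbers(sample))
    print(consecutive_pairs(sample))
```

Pairing each token with its successor avoids the two-pass trick in the Postgres query, where the first token is stripped so that the second regexp_matches call picks up the pairs the first call skipped.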
I have a table in Postgres that has records like
ID  Address
1   862 N Longbranch Road Voorhees, NJ 08043
2   7300 Overlook, Ave Moncks Corner, SC 29461
3   76 SW Green Lake, Street Sterling, VA 20164
4   597 Wintergreen St Erlanger, KY 41018
So for searching a specific address my query is simple
select * from profile where address ilike '7300 Overlook, Ave Moncks Corner, SC%'
This is returning record 2
What I want is
select * from profile where address ilike '7300 Overlook Ave Moncks Corner SC%'
(Please note that commas are missing in second query)
Even if the string inside ilike doesn't contain a comma, record 2 should be returned.
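One way to read this requirement is: strip the commas from both sides, then do a case-insensitive prefix comparison. The Python sketch below illustrates that reading on the four sample rows; the names normalize and find_ids are hypothetical. In Postgres the same idea could presumably be expressed by applying replace(address, ',', '') to the column before the ilike, though that is an untested assumption here.

```python
# Sample rows copied from the question: (id, address)
PROFILES = [
    (1, "862 N Longbranch Road Voorhees, NJ 08043"),
    (2, "7300 Overlook, Ave Moncks Corner, SC 29461"),
    (3, "76 SW Green Lake, Street Sterling, VA 20164"),
    (4, "597 Wintergreen St Erlanger, KY 41018"),
]


def normalize(text: str) -> str:
    """Drop commas and lower-case, so the comparison ignores both."""
    return text.replace(",", "").lower()


def find_ids(prefix: str) -> list:
    """Return ids whose comma-free address starts with the comma-free prefix."""
    wanted = normalize(prefix)
    return [pid for pid, address in PROFILES if normalize(address).startswith(wanted)]


if __name__ == "__main__":
    print(find_ids("7300 Overlook Ave Moncks Corner SC"))
    print(find_ids("7300 Overlook, Ave Moncks Corner, SC"))
```

Both calls return the same record, because the commas are removed from the search string as well as from the stored address.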
I have two sql columns each with delimited data that I want to collate and combine into a single delimited column. The number of items in the column is variable for each row. However there will always be a matching number of items between the two columns of each row. For Example...
*******************************
ORIGINAL SQL TABLE
*******************************
value * unit
*******************************
4 ; 5 * mg ; kg
50 * mg
7.5 ; 325 * kg ; mg
100 ; 1.5 ; 50 * mg ; g ; mg
********************************
*********************************
DESIRED SQL RESULT
*********************************
value-unit
*********************************
4 mg; 5 kg
50 mg
7.5 kg; 325 mg
100 mg; 1.5 g; 50 mg
*********************************
How do I do this with T-SQL? I'm using SQL Server 2012
Using only common table expressions, we can get to the required results too, as below:-
First, let's set up the data
create table #original (
[value] varchar(250),
[unit] varchar(250)
)
insert into #original values
('4 ; 5','mg ; kg '),
('50','mg ' ),
('7.5 ; 325','kg ; mg ' ),
('100 ; 1.5 ; 50 ','mg ; g ; mg' )
Now, let's build the common table expressions:-
;with cte as (
select o.[value]+';' [value],o.[unit]+';' [unit],row_number() over (ORDER BY (Select 0)) [row] from #original o
),cte2 as (
select *
,1 [ValueStart],CHARINDEX(';',[value]) [ValueEnd]
,1 [UnitStart],CHARINDEX(';',[unit]) [UnitEnd]
from cte
),cte3 as (
select * from cte2
union all
select [value],[unit],[row]
,[ValueEnd]+1 [ValueStart],CHARINDEX(';',[value],[ValueEnd]+1) [ValueEnd]
,[UnitEnd]+1 [UnitStart],CHARINDEX(';',[unit],[UnitEnd]+1) [UnitEnd]
from cte3 where [UnitEnd]>0
),cte4 as (
select *,row_number() over (partition by [row] order by [ValueStart]) [subRow]
, rtrim(ltrim(substring([unit],[UnitStart],[UnitEnd]-[UnitStart]))) [subUnit]
, rtrim(ltrim(substring([value],[ValueStart],[ValueEnd]-[ValueStart]))) [subValue]
from cte3
where [UnitEnd]>0
),cte5 as (
select subRow,[row],[subValue],[subUnit],cast([subValue]+' '+[subUnit] as varchar(max)) [ValueUnit] from cte4 where subRow=1
union all
select cte4.subRow,cte4.[row],cte4.[subValue],cte4.[subUnit]
,cte5.[ValueUnit]+';'+ cte4.[subValue]+' '+cte4.[subUnit] [ValueUnit]
from cte4
inner join cte5 on (cte5.subRow+1)=cte4.subRow and cte5.[row]=cte4.[row]
),cte6 as (
select *,row_number() over (partition by [row] order by subRow desc) [selected] from cte5
)
select ValueUnit from cte6
where [selected]=1
order by [row]
Results will be as below:-
ValueUnit
============
4 mg;5 kg
50 mg
7.5 kg;325 mg
100 mg;1.5 g;50 mg
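The chain of CTEs above boils down to three steps: split both columns on ';', pair the items by position, and join the pairs back together. A minimal Python sketch of that logic follows, as a cross-check of the expected output rather than a T-SQL replacement. The function name combine is made up, and it joins with '; ' to match the desired result in the question (the query above joins with ';' only).

```python
def combine(values: str, units: str, sep: str = ";") -> str:
    """Pair the i-th value with the i-th unit and rejoin as one delimited string."""
    value_items = [item.strip() for item in values.split(sep)]
    unit_items = [item.strip() for item in units.split(sep)]
    # The question guarantees matching counts; fail loudly if that is violated.
    if len(value_items) != len(unit_items):
        raise ValueError("value and unit columns have different item counts")
    return "; ".join(f"{v} {u}" for v, u in zip(value_items, unit_items))


if __name__ == "__main__":
    rows = [
        ("4 ; 5", "mg ; kg "),
        ("50", "mg "),
        ("7.5 ; 325", "kg ; mg "),
        ("100 ; 1.5 ; 50 ", "mg ; g ; mg"),
    ]
    for values, units in rows:
        print(combine(values, units))
```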
STRING_SPLIT is the most current solution. If you're not on 2016 or later and cannot set your db compatibility to that version or later, here is a more old fashioned approach using xml:
I added an Identity to your original table to have something to order by -
SELECT splitNumbers.splitNumber,
splitValues.splitValue,
splitNumbers.splitNumber + ' ' + splitValues.splitValue AS combined
FROM
(
SELECT --numbers,
LTRIM(RTRIM(m.n.value('.[1]', 'varchar(8000)'))) AS splitNumber,
ROW_NUMBER() OVER (ORDER BY id) AS rn
FROM
(
SELECT id,
CAST('<XMLRoot><RowData>' + REPLACE(numbers, ' ; ', '</RowData><RowData>') + '</RowData></XMLRoot>' AS XML) AS xmlNumbers
FROM #x
) fee
CROSS APPLY xmlNumbers.nodes('/XMLRoot/RowData') m(n)
) splitNumbers
INNER JOIN
(
SELECT LTRIM(RTRIM(m.v.value('.[1]', 'varchar(8000)'))) AS splitValue,
ROW_NUMBER() OVER (ORDER BY id) AS rn
FROM
(
SELECT id,
CAST('<XMLRoot><RowData>' + REPLACE(units, ' ; ', '</RowData><RowData>') + '</RowData></XMLRoot>' AS XML) AS xmlUnits
FROM #x
) fee
CROSS APPLY xmlUnits.nodes('/XMLRoot/RowData') m(v)
) splitValues
ON splitNumbers.rn = splitValues.rn;
This gives the following results:
================================
splitNumber splitValue combined
4 mg 4 mg
5 kg 5 kg
50 mg 50 mg
7.5 kg 7.5 kg
325 mg 325 mg
100 mg 100 mg
1.5 g 1.5 g
50 mg 50 mg
After the second space, I need to fetch the values till the particular position in the string.
Source:
"8 115 MACKIE STREET VICTORIA PARK WA 6100 AU"
"6A CAMBOON ROAD MORLEY WA 6062 AU"
output:
"MACKIE STREET VICTORIA PARK"
"CAMBOON ROAD MORLEY"
I'm trying to split the street name and suburb from the unit #,street# present in the beginning and the state, postcode, country present in the end.
t=# with s(v) as (values('6A CAMBOON ROAD MORLEY WA 6062 AU'),('8 115 MACKIE STREET VICTORIA PARK WA 6100 A'))
, split as (select *,count(1) over (partition by v) from s, regexp_matches(v,'( [A-Z]+)','g') with ordinality t(m,o))
select distinct v,string_agg(m[1],'') over (partition by v) from split where o <= count-(3-1);
v | string_agg
---------------------------------------------+------------------------------
8 115 MACKIE STREET VICTORIA PARK WA 6100 A | MACKIE STREET VICTORIA PARK
6A CAMBOON ROAD MORLEY WA 6062 AU | CAMBOON ROAD MORLEY
(2 rows)
The postcode (like any token that does not match the mask [A-Z]+) is never captured, so I cut only two positions from the end rather than three (3-1, where the 1 is the postcode, known in advance to be skipped).
Also, I do not start from the second space, because that would contradict your desired result...
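The same logic can be sketched in Python to make the counting easier to follow: collect every space-prefixed run of capital letters, then drop the last two (state and country). This is an illustration of the approach above under the same assumptions, namely that unit and street numbers and the postcode never match [A-Z]+; the function name street_and_suburb is hypothetical.

```python
import re


def street_and_suburb(address: str, trailing_words: int = 2) -> str:
    """Keep the all-caps words, minus the trailing state and country codes."""
    # Tokens such as '8', '115', '6A' and '6100' never match ' [A-Z]+',
    # so only street, suburb, state and country words are captured.
    words = re.findall(r" ([A-Z]+)", address)
    keep = max(0, len(words) - trailing_words)
    return " ".join(words[:keep])


if __name__ == "__main__":
    for address in (
        "8 115 MACKIE STREET VICTORIA PARK WA 6100 AU",
        "6A CAMBOON ROAD MORLEY WA 6062 AU",
    ):
        print(street_and_suburb(address))
```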
My SQL function:
with recursive locpais as (
select l.id, l.nome, l.tipo tid, lp.pai
from loc l
left join locpai lp on lp.loc = l.id
where l.id = 12554
union
select l.id, l.nome, l.tipo tid, lp.pai
from loc l
left join locpai lp on lp.loc = l.id
join locpais p on (l.id = p.pai)
)
select * from locpais
gives me
12554 | PARNA Pico da Neblina | 9 | 1564
12554 | PARNA Pico da Neblina | 9 | 1547
1547 | São Gabriel da Cachoeira | 8 | 1400
1564 | Santa Isabel do Rio Negro | 8 | 1400
1400 | RIO NEGRO | 7 | 908
908 | NORTE AMAZONENSE | 6 | 234
234 | Amazonas | 5 | 229
229 | Norte | 4 | 30
30 | Brasil | 3 |
which is a hierarchy of places. "PARNA" stands for "National Park", and this one covers two cities: São Gabriel da Cachoeira and Santa Isabel do Rio Negro. Thus it's appearing twice.
If I change the last line to
select string_agg(nome,', ') from locpais
I get
"PARNA Pico da Neblina, PARNA Pico da Neblina, São Gabriel da
Cachoeira, Santa Isabel do Rio Negro, RIO NEGRO, NORTE AMAZONENSE,
Amazonas, Norte, Brasil"
Which is almost fine, except for the double "PARNA Pico da Neblina". So I tried:
select string_agg(distinct nome, ', ') from locpais
but now I get
"Amazonas, Brasil, Norte, NORTE AMAZONENSE, PARNA Pico da Neblina, RIO
NEGRO, Santa Isabel do Rio Negro, São Gabriel da Cachoeira"
Which is out of order. I'm trying to add an order by inside the string_agg, but couldn't make it work yet. The definitions of the tables were given here.
As you've found out, you cannot combine DISTINCT and ORDER BY if you don't order by the distinct expression first:
neither in aggregates:
If DISTINCT is specified in addition to an order_by_clause, then all the ORDER BY expressions must match regular arguments of the aggregate; that is, you cannot sort on an expression that is not included in the DISTINCT list.
nor in SELECT:
The DISTINCT ON expression(s) must match the leftmost ORDER BY expression(s).
However, you could use something like
array_to_string(arry_uniq_stable(array_agg(nome ORDER BY tid DESC)), ', ')
with the help of a function arry_uniq_stable that removes duplicates from an array without altering its order, like the example I gave in https://stackoverflow.com/a/42399297/5805552
Please take care to use an ORDER BY expression that actually gives you a deterministic result. With the example you have given, tid alone would not be enough, as there are duplicate values (8) with different nome.
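To illustrate what "removing duplicates without altering the order" means, here is a short Python sketch applied to the names from the question, already sorted by tid descending. It is not the implementation of the linked function, only a demonstration of the behaviour it is described as having; the name unique_stable is invented for this example.

```python
def unique_stable(items):
    """Drop repeated items while keeping the first occurrence of each in place."""
    # dict preserves insertion order, so the keys come back in first-seen order
    return list(dict.fromkeys(items))


if __name__ == "__main__":
    # Names in the order returned by the recursive query in the question
    nomes = [
        "PARNA Pico da Neblina",
        "PARNA Pico da Neblina",
        "São Gabriel da Cachoeira",
        "Santa Isabel do Rio Negro",
        "RIO NEGRO",
        "NORTE AMAZONENSE",
        "Amazonas",
        "Norte",
        "Brasil",
    ]
    print(", ".join(unique_stable(nomes)))
```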
select string_agg(nome, ', ' order by tid desc, nome)
from (
select distinct nome, tid
from locpais
) s