I have an amount field and a commission field from which I need to remove the comma (,), decimal point (.), dash (-), and percent sign (%).
I have tried replicate, format, replace, and stuff:
right ('000000000')
right('000000000') + rtrim(field), len#)
RTRIM(replicate('0', 9 - len(field)) + REPLACE(REPLACE(REPLACE(cast(field as varchar), ',', ''), '.',''), '-', ''))
RTRIM(replicate('0', 9 - len(t.Commission_Amount)) + REPLACE(REPLACE(REPLACE(cast(t.Commission_Amount as varchar(9)), ',', ''), '.',''), '-', ''))
but I never get the results that I want. When I use replace it removes the comma, dash, or %, but it cuts the field short and does not pad to the left with zeros. I know it's probably right in front of my face; I just need some clarity, please.
00-126.47 comes out as 0012647
0.00 comes out as 00000000
000126.47 comes out as 00012647
Try this:
-- create sample table
DECLARE @Table AS table (
    field varchar(15)
)
-- populate sample table
INSERT INTO @Table VALUES
('00-126.47'),
('0.00'),
('000126.47'),
('00033%2.422')
-- select
SELECT field AS [before],
RIGHT(REPLICATE('0', 9) +
REPLACE(
REPLACE(
REPLACE(
REPLACE(field, '-', '')
, '.', '')
, ',', '')
, '%', '')
, 9) As [After]
FROM @Table
results:
before After
--------------- ---------
00-126.47 000012647
0.00 000000000
000126.47 000012647
00033%2.422 000332422
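Applied to the Commission_Amount column from your earlier attempt, the same expression would look like this (a sketch; the varchar(20) cast is an assumption in case the column isn't already a string, and 9 is whatever padded width you need):
RIGHT(REPLICATE('0', 9) +
      REPLACE(REPLACE(REPLACE(REPLACE(CAST(t.Commission_Amount AS varchar(20)), '-', ''), '.', ''), ',', ''), '%', ''),
      9) AS Commission_Amount_Padded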
This solved my problem:
RIGHT(REPLICATE('0', 9) + REPLACE(REPLACE(REPLACE(REPLACE(field, '-', ''), '.', ''), ',', ''), '%', ''), 9) As [After]
I have a T-SQL query that looks like this:
select * from (
SELECT [Id],
       replace(ca.[AKey], '-', '') as [AKey1],
       rtrim(replace(replace(replace(lower([Name]), '#', ''), '(1.0)', ''), '(2.5)', '')) as [Name],
       [Key],
       dw.[AKey],
       replace(lower(trim([wName])), '#', '') as [wName]
FROM [dbo].[wTable] ca
FULL JOIN (select * from [dw].[wTable]) dw on
rtrim(left( replace(replace(replace(lower(dw.[wName]), '(1.0)', ''), '(2.5)', ''), '#', ''), 5))+'%'
like
rtrim(left( replace(replace(replace(lower(ca.[Name] ), '(1.0)', ''), '(2.5)', ''), '#', ''), 5))+'%'
and
right(rtrim(replace(replace(replace(lower(dw.[wName]), '(1.0)', ''), '(2.5)', ''), '#', '')), 2)
like
right(rtrim(replace(replace(replace(lower(ca.[Name] ), '(1.0)', ''), '(2.5)', ''), '#', '')), 2)
) tp
As you can see, during the JOIN, it's removing some fuzzy characters that may or may not exist, and it's checking to see if the first 5 characters in the wName column match with the first 5 characters in the Name column, then doing the same for the last 2 characters in the columns.
So essentially, it's matching on the first 5 characters AND last 2 characters.
What I'm trying to add is an additional column that will tell me if the resulting columns are an exact match or if they are fuzzy. In other words, if they are an exact match it should say 'True' or something like that, and if they are a fuzzy match I would ideally like it to tell me how far off they are. For example, how many characters do not match.
As JNevil mentioned, you could use Levenshtein distance. You can also use Damerau-Levenshtein or the Longest Common Substring, depending on how accurate you want to get and what your performance expectations are.
Below are two solutions. The first is a Levenshtein solution using a copy I grabbed from Phil Factor here. The Longest Common Substring solution uses my version of the Longest Common Substring, which is the fastest available for SQL Server (by far).
-- sample data
declare @t1 table (string1 varchar(100));
declare @t2 table (string2 varchar(100));
insert @t1 values ('abc'),('xxyz'),('1234'),('9923');
insert @t2 values ('abcd'),('xyz'),('2345'),('zzz');
-- Levenshtein
select string1, string2, Ld
from
(
select *, Ld = dbo.LEVENSHTEIN(t1.string1, t2.string2)
from @t1 t1
cross join @t2 t2
) compare
where ld <= 2;
-- Longest Common Substring
select string1, string2, lcss = item, lcssLen = itemlen, diff = mx.L-itemLen
from @t1 t1
cross join @t2 t2
cross apply dbo.lcssWindowAB(t1.string1, t2.string2, 20)
cross apply (values (IIF(len(string1) > len(string2), len(string1),len(string2)))) mx(L)
where mx.L-itemLen <= 2;
RESULTS
string1 string2 Ld
-------- -------- -----
abc abcd 1
xxyz xyz 1
1234 2345 2
string1 string2 lcss lcssLen diff
-------- -------- ----- ----------- -----------
abc abcd abc 3 1
xxyz xyz xyz 3 1
1234 2345 234 3 1
9923 2345 23 2 2
This does not answer your question but should get you started.
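To fold this back into your original join and flag each row, one option is an extra column along these lines (a sketch only; it assumes the dbo.LEVENSHTEIN function referenced above is installed and reuses the ca/dw aliases from your query; wrap the same REPLACE cleanup around both sides if the comparison should ignore those characters too):
-- 'Exact' when the names match outright, otherwise the Levenshtein distance
-- between them (roughly "how many characters are off")
CASE
    WHEN lower(ca.[Name]) = lower(dw.[wName]) THEN 'Exact'
    ELSE 'Fuzzy: ' + CAST(dbo.LEVENSHTEIN(lower(ca.[Name]), lower(dw.[wName])) AS varchar(10)) + ' off'
END AS MatchType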
P.S. The Levenshtein function I posted does have a small bug: it says the distance between "9923" and "2345" is 4, when the correct answer would be two. There are other Levenshtein functions out there, though.
I need to split a string as follows; when I try it with split_part, I have no luck:
select split_part('8 HAMPSHIRE RD',' ',2)
Expected output: HAMPSHIRE RD
A cheaper solution without a regular expression:
SELECT substring (
'8 HAMPSHIRE RD'
FROM position(' ' IN '8 HAMPSHIRE RD') + 1
);
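Note that position() returns 0 when the string contains no space, so the expression just falls back to returning the whole string:
SELECT substring (
    'HAMPSHIRE'
    FROM position(' ' IN 'HAMPSHIRE') + 1
);
-- HAMPSHIRE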
Use regexp_replace():
select regexp_replace('8 HAMPSHIRE RD', '.*?\s', '');
regexp_replace
----------------
HAMPSHIRE RD
(1 row)
An alternative solution using string manipulation functions:
with my_table(str) as (
values ('8 HAMPSHIRE RD')
)
select right(str, -strpos(str, ' '))
from my_table;
If you want to skip the first word only when it consists of digits, use \d (digit) instead of . (any char) and anchor the pattern to the start of the string:
select regexp_replace('8 HAMPSHIRE RD', '^\d+\s', '');
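A quick check with both kinds of input:
select regexp_replace('8 HAMPSHIRE RD', '^\d+\s', '');   -- HAMPSHIRE RD
select regexp_replace('HAMPSHIRE RD', '^\d+\s', '');     -- unchanged: HAMPSHIRE RD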
My records in the table are as follows:
id column1
1 'Record1'
2 ' Record2'
3 ' Record3a, Record3b'
4 'Record4a , Record4b, Record4c '
column1 type: text
pre-defined array= {record1,record2,record3a}
When I check the values against the pre-defined array using the && operator, most of the values are missed because of the unnecessary spaces around the delimiters.
Hence I first need to remove the spaces at the beginning or end (only) and then call string_to_array(), so that the result can be compared to my pre-defined array.
Use trim() to remove leading and trailing whitespace:
SELECT string_to_array(trim(both ' ' from regexp_replace(column1, '\s*,\s*', ',')), ',')
FROM yourTable
It works, but the 'g' flag should be added to regexp_replace() so that all of the whitespace around the commas is removed:
SELECT string_to_array(
    trim(both ' ' from regexp_replace(column1, '\s*,\s*', ',', 'g')), ',')
FROM yourTable
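Putting it together with the && check from the question (a sketch; the table name yourTable follows the examples above, and the lower() call is an assumption, since the sample values are mixed-case while the pre-defined array is lower-case):
SELECT id, column1
FROM yourTable
WHERE string_to_array(
          trim(both ' ' from regexp_replace(lower(column1), '\s*,\s*', ',', 'g')),
          ',') && ARRAY['record1','record2','record3a'];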
So I have a varchar string column, let's call it ID (samples below):
97.128.39.256.1460854333288493
25.365.49.12.13454154815132
346.45.156.354.1523425161233
I want to grab, like a LEFT in Excel, everything to the left of the 4th period. How do I create a dynamic string to find the fourth instance of a period?
I know substring is a start, but I'm not sure how to handle the dynamic length.
This is probably the easiest for someone else to read:
select split_part(i, '.', 1) || '.' ||
split_part(i, '.', 2) || '.' ||
split_part(i, '.', 3) || '.' ||
split_part(i, '.', 4)
from (select '97.128.39.256.1460854333288493' as i) as sub;
Or if you don't like split_part and prefer to use arrays:
select array_to_string((string_to_array(i, '.'))[1:4], '.')
from (select '97.128.39.256.1460854333288493' as i) as sub;
I think the array example is a bit harder to grasp at first glance but both work.
Updated answer based on revised question to also convert the Unix timestamp to a Greenplum timestamp:
select 'epoch'::timestamp + '1 second'::interval *
(split_part(i, '.', 5)::numeric/1000000) as event_time,
array_to_string((string_to_array(i, '.'))[1:4], '.') as ip_address
from (
select '97.128.39.256.1460854333288493' as i
) as sub;
You could also try this:
mydb=> select regexp_replace('97.128.39.256.1460854333288493', E'^((?:\\d+\\.){3}\\d+).+$', E'\\1');
regexp_replace
----------------
97.128.39.256
(1 row)
Time: 0.634 ms
with t (s) as ( values
('97.128.39.256.1460854333288493'),
('25.365.49.12.13454154815132'),
('346.45.156.354.1523425161233')
)
select a[1] || '.' || a[2] || '.' || a[3] || '.' || a[4]
from (
select regexp_split_to_array(s, '\.')
from t
) t (a)
;
?column?
----------------
97.128.39.256
25.365.49.12
346.45.156.354
I have a table in with the following layout:
CREATE TABLE dbo.tbl (
Ten_Ref VARCHAR(20) NOT NULL,
Benefit VARCHAR(20) NOT NULL
);
INSERT INTO dbo.tbl (Ten_Ref, Benefit)
VALUES ('1', 'HB'),
('1', 'WTC'),
('1', 'CB'),
('2', 'CB'),
('2', 'HB'),
('3', 'WTC');
I then run this code to perform a transform and concatenation (I need all the benefit information in one field):
with [pivot] as
(
SELECT Ten_Ref
,[HB] = (Select Benefit FROM tbl WHERE t.Ten_Ref = Ten_Ref and Benefit = 'HB')
,[CB] = (Select Benefit FROM tbl WHERE t.Ten_Ref = Ten_Ref and Benefit = 'CB')
,[WTC] = (Select Benefit FROM tbl WHERE t.Ten_Ref = Ten_Ref and Benefit = 'WTC')
/*Plus 7 more of these*/
FROM tbl as t
GROUP BY Ten_Ref
)
select p.ten_Ref
/*A concatenation to put them all in one field, only problem is you end up with loads of spare commas*/
,[String] = isnull (p.HB,'') + ',' + isnull (p.cb,'') + ',' + isnull (p.wtc,'')
from [pivot] as p
My problem is that not every Ten_Ref has all of the benefits attached.
Using this code, where there is a gap or NULL, I end up with loads of double commas, e.g. 'HB,,WTC'.
How can I get it down to only one comma, regardless of the number of benefits each tenancy has?
Are you looking for something like this?
SELECT A.Ten_Ref,
STUFF(CA.list,1,1,'') list
FROM tbl A
CROSS APPLY(
SELECT ',' + Benefit
FROM tbl B
WHERE A.Ten_Ref = B.Ten_Ref
ORDER BY Benefit
FOR XML PATH('')
) CA(list)
GROUP BY A.ten_ref,CA.list
Results:
Ten_Ref list
-------------------- ------------------
1 CB,HB,WTC
2 CB,HB
3 WTC
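One caveat with the FOR XML PATH trick: if a Benefit value ever contains characters such as & or <, they come back XML-escaped. A common refinement (shown here as a sketch of just the modified CROSS APPLY) is to add TYPE and .value():
CROSS APPLY(
    SELECT (
        SELECT ',' + Benefit
        FROM tbl B
        WHERE A.Ten_Ref = B.Ten_Ref
        ORDER BY Benefit
        FOR XML PATH(''), TYPE
    ).value('.', 'varchar(max)')
) CA(list)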
Or if you really want to use pivot and manually concatenate, you could do this:
SELECT Ten_Ref,
--pvt.*,
ISNULL(HB + ',','') + ISNULL(CB + ',','') + ISNULL(WTC + ',','') AS list
-- Benefit is duplicated in the derived table so one copy can feed MAX()
-- while the other serves as the pivot column
FROM (SELECT Ten_Ref, Benefit, Benefit AS B FROM tbl) src
PIVOT
(
MAX(B) FOR Benefit IN([HB],[CB],[WTC])
) pvt
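If you are on SQL Server 2017 or later, STRING_AGG avoids the manual comma handling entirely; a sketch against the sample table above:
SELECT Ten_Ref,
       STRING_AGG(Benefit, ',') WITHIN GROUP (ORDER BY Benefit) AS list
FROM dbo.tbl
GROUP BY Ten_Ref;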