T-SQL Pattern matching issue - tsql

I need to determine whether a given string is of the format 'abcd efg -4', i.e. '% -number'. I need to isolate the '4' and increment it to '5'.
The rest of the string can contain dates and times like so:
abcd efg - ghis asdjh - 07-07-2011 05-30-34 AM
this string, for instance, does NOT satisfy the pattern i.e. -[number]. For this string, the output from my SQL should be
abcd efg - ghis asdjh - 07-07-2011 05-30-34 AM -1
If the above is input, I should get:
abcd efg - ghis asdjh - 07-07-2011 05-30-34 AM -2
The number can be any number of digits i.e. so a string could be 'abcd efg -123', and my T-SQL would return 'abcd efg -124'
This T-SQL code is going to be embedded in a stored procedure. I know I could implement a .Net stored proc/function and use Regex to do this; however, there are various access issues I would have to get around in order to switch on the CLR on the SQL Server.
I have tried the following patterns:
'%[ ][-]%[0-9]' works for most cases, but put in an extra space somewhere and it fails
'%[ ][-]%[^a-z][^A-Z]%[0-9]' manages to skip '-4' (as shown in the above example), but fails in several other cases
'%[ ][-][^a-z][^A-Z]%[0-9]' again works in some cases and not in others...
This pattern ' -[number]' would always be at the end of the string, if it's not present the code would append it, as seen in the examples above.
I would like a pattern that works for ALL cases...

Interesting problem. You do realize that this is much more difficult than it really needs to be. If you properly normalized your table so that each column only contains one piece of information, you wouldn't have a problem at all. If it's possible, I would strongly encourage you to consider normalizing this data.
If you cannot normalize the data, then I would approach this backwards. You said the dash-number you are looking for would always appear at the end of the data. Why not reverse the string, parse it, and put it back together? By reversing the string, you will be looking for '[0-9]%[-]', which is a whole lot easier to find.
I put your test data in to a table variable so that I could test the code I've come up with. You can copy/paste this to a query window to see how it works.
Declare @Temp Table(Data VarChar(100))

Insert Into @Temp Values('abcd efg - ghis asdjh - 07-07-2011 05-30-34 AM')
Insert Into @Temp Values('abcd efg - ghis asdjh - 07-07-2011 05-30-34 AM -1')
Insert Into @Temp Values('abcd efg - ghis asdjh - 07-07-2011 05-30-34 AM -2')
Insert Into @Temp Values('abcd efg -123')

Select Case When PatIndex('[0-9]%[-]%', Reverse(Data)) = 1
            Then Left(Data, Len(Data)-CharIndex('-', Reverse(Data))) + '-' +
                 Convert(VarChar(20), 1+Convert(Int, Reverse(Left(Reverse(Data), CharIndex('-', Reverse(Data))-1))))
            Else Data + ' -1'
       End
From @Temp
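For comparison, here is the same end-of-string logic written in Python, purely as an illustration outside of T-SQL (Python has regular expressions, so the reverse trick isn't needed). Note it shares the T-SQL answer's limitation: a string that happens to end in '-<digits>' for some other reason (such as a bare date) would also get incremented.

```python
import re

def bump_suffix(s: str) -> str:
    """If the string ends with '-<digits>', increment that number;
    otherwise append ' -1', mirroring the T-SQL CASE above."""
    m = re.search(r'-(\d+)$', s)
    if m:
        return s[:m.start()] + '-' + str(int(m.group(1)) + 1)
    return s + ' -1'
```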

Related

How to remove special characters from a string in postgresql

I am trying to remove using REGEXP_REPLACE the following special characters: "[]{}
from the following text field: [{"x":"y","s":"G_1","cn":"C8"},{"cn":"M2","gn":"G_2","cn":"CA99"},{"c":"ME3","gn":"G_3","c":"CA00"}]
and replace them with nothing, not even a space.
*Needless to say, this is just an example string, and I need to find a consistent solution for similar but different strings.
I was trying to run the following: SELECT REGEXP_REPLACE('[{"x":"y","s":"G_1","cn":"C8"},{"cn":"M2","gn":"G_2","cn":"CA99"},{"c":"ME3","gn":"G_3","c":"CA00"}] ','[{[}]":]','')
But I received pretty much the same string back.
Thanks in advance!
You need to escape the special characters with a backslash (\), and to pass the 'g' flag so the replacement is repeated for every match; otherwise it stops at the first one:
SELECT REGEXP_REPLACE(
'[{"x":"y","s":"G_1","cn":"C8"},{"cn":"M2","gn":"G_2","cn":"CA99"},{"c":"ME3","gn":"G_3","c":"CA00"}] ',
'[{\[}\]":]',
'',
'g');
regexp_replace
--------------------------------------------------
xy,sG_1,cnC8,cnM2,gnG_2,cnCA99,cME3,gnG_3,cCA00
(1 row)
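If you want to sanity-check the character class outside the database, the same pattern can be run through Python's re.sub, which (unlike regexp_replace without 'g') replaces every match by default. This is just an illustration, not the Postgres engine:

```python
import re

raw = '[{"x":"y","s":"G_1","cn":"C8"},{"cn":"M2","gn":"G_2","cn":"CA99"},{"c":"ME3","gn":"G_3","c":"CA00"}]'

# Same character class as the SQL answer: { [ } ] " and :
# re.sub is global by default, matching regexp_replace's 'g' flag.
cleaned = re.sub(r'[{\[}\]":]', '', raw)
print(cleaned)  # xy,sG_1,cnC8,cnM2,gnG_2,cnCA99,cME3,gnG_3,cCA00
```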

PostgreSQL Trim excessive trailing zeroes: type numeric but expression is of type text

I'm trying to clean out excessive trailing zeros, I used the following query...
UPDATE _table_ SET _column_=trim(trailing '00' FROM '_column_');
...and I received the following error:
ERROR: column "_column_" is of type numeric but
expression is of type text.
I've played around with the quotes, since that's usually what it boils down to for text versus numeric, though without any luck.
The CREATE TABLE syntax:
CREATE TABLE _table_ (
id bigint NOT NULL,
x bigint,
y bigint,
_column_ numeric
);
You can cast the arguments from and the result back to numeric:
UPDATE _table_ SET _column_=trim(trailing '00' FROM _column_::text)::numeric;
Also note that you don't quote column names with single quotes as you did.
Postgres version 13 now comes with the trim_scale() function:
UPDATE _table_ SET _column_ = trim_scale(_column_);
trim takes string parameters, so _column_ has to be cast to a string (varchar for example). Then, the result of trim has to be cast back to numeric.
UPDATE _table_ SET _column_=trim(trailing '00' FROM _column_::varchar)::numeric;
Another (arguably more consistent) way to clean out the trailing zeroes from a NUMERIC field would be to use something like the following:
UPDATE _table_ SET _column_ = CAST(to_char(_column_, 'FM999999999990.999999') AS NUMERIC);
Note that you would have to modify the FM pattern to match the maximum expected precision and scale of your _column_ field. For more details on the FM pattern modifier and/or the to_char(..) function see the PostgreSQL docs here and here.
Edit: Also, see the following post on the gnumed-devel mailing list for a longer and more thorough explanation on this approach.
Be careful with all the answers here. Although this looks like a simple problem, it's not.
If you have pg 13 or higher, you should use trim_scale (there is an answer about that already). If not, here is my "Polyfill":
DO $x$
BEGIN
  IF NOT EXISTS (SELECT 1 FROM pg_proc WHERE proname = 'trim_scale') THEN
    CREATE FUNCTION trim_scale(numeric) RETURNS numeric AS $$
      SELECT CASE WHEN trim($1::text, '0')::numeric = $1
                  THEN trim($1::text, '0')::numeric
                  ELSE $1 END
    $$ LANGUAGE SQL;
  END IF;
END;
$x$;
And here is a query for testing the answers:
WITH test as (SELECT unnest(string_to_array('1|2.0|0030.00|4.123456000|300000','|'))::numeric _column_)
SELECT _column_ original,
trim(trailing '00' FROM _column_::text)::numeric accepted_answer,
CAST(to_char(_column_, 'FM999999999990.999') AS NUMERIC) another_fancy_one,
CASE WHEN trim(_column_::text, '0')::numeric = _column_ THEN trim(_column_::text, '0')::numeric ELSE _column_ END my FROM test;
Well... it looks like I'm mostly showing the flaws of the earlier answers, while I just can't come up with more test cases. Maybe you can write more, if you find some.
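The guard in the polyfill ("only keep the trimmed value if it still compares equal") is the part that saves inputs like 300000. Here is an illustrative Python sketch of that same idea, using rstrip so only trailing zeros are touched:

```python
from decimal import Decimal

def trim_scale(s: str) -> str:
    """Strip trailing zeros from the text form, but only keep the
    result when the numeric value is unchanged -- the same guard as
    the polyfill's CASE expression."""
    trimmed = s.rstrip('0').rstrip('.') if '.' in s else s
    return trimmed if Decimal(trimmed) == Decimal(s) else s
```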
I like short syntax instead of fancy SQL keywords, so I always go with :: over CAST, and a function call with comma-separated args over constructs like trim(trailing '00' FROM _column_). But that's personal taste only; you should check your company or team standards (and fight to change them XD)

TSQL Concatenation

I often need to concatenate fields in TSQL...
Two issues TSQL forces you to deal with when using the '+' operator are Data Type Precedence and NULL values.
With Data Type Precedence, the problem is conversion errors.
1) SELECT 1 + 'B' = Conversion ERROR
2) SELECT 1 + '1' = 2
3) SELECT '1' + '1' = '11'
In 2), the varchar '1' is implicitly converted to an int, and the math works. However, in 1), the int 1 is NOT implicitly converted to a varchar. This is where DTP is (IMO) getting in the way. Essentially, it favors Math functions over String functions.

I Wish :-) that DTP wasn't even a consideration in this case -- why wouldn't the '+' operator be configured so that the operation could favor success over specific data-types? I wouldn't mind if it still favored MATH over String functions when possible -- but why doesn't it favor String functions over Errors? (The only way to be successful in 1) is to treat it as a string function -- so it's not like there's any ambiguity there.) Somebody at Microsoft thought that throwing an error in 1) would be more valuable to the programmer than treating the '+' as a string function. Why? And why didn't they provide a way to override it? (Or did they... that's really the heart of my question.) SET STRING_PREFERENCE ON would have been nice! :-P
In order to deal with this, you have to do more work -- you have to explicitly convert the 1 to a varchar, using any number of different string functions -- typically CAST/CONVERT, but also many others (like LTRIM()) will work.
Conversions become work-intensive when you deal with table fields whose data-type you don't know. This might work:
SELECT 'Fall ' + ' (' + [Term] + ')' -- Output: Fall (2011)
But then again, it might not. It just depends on what data-type [Term] is. And to complicate things, the dba might change the datatype at some point without telling anyone (because it came as part of a big upgrade package once the vendor finally realized that only numbers are ever stored in the [Term] field, or for whatever reason).
So if you want to be a boy scout, you do this:
SELECT 'Fall ' + ' (' + LTRIM([Term]) + ')'
So now I'm running this LTRIM function every time, even though it might not be necessary, because I don't know the data-type of [Term] (OK -- I can look that up, but that's almost like work, and I don't like interruptions while I'm coding :-P *grump), and also, I don't know that the data-type will never change.
The second issue you have to confront with TSQL concatenation is how to deal with NULL values. For example, this yields NULL rather than 'B':
SELECT NULL + 'B'
So you need to do this:
SELECT 'Fall ' + ' (' + LTRIM(ISNULL([Term],'')) + ')'
What a pain -- I wish I could just do this:
SELECT 'Fall ' + ' (' + [Term] + ')'
So I'm wondering if there are any (TSQL) ways to avoid having to do explicit data-type conversions and null checks on every field where I have to ensure the '+' operator behaves itself as I need it to.
Thanks!
EDIT
@a1ex07 came up with a great answer for working around the NULL issue (SET CONCAT_NULL_YIELDS_NULL OFF), but as I looked into it, it appears to be problematic as far as forcing stored procedures to re-compile every time they're executed.
SQL Server 2012 does have the CONCAT function which addresses all the issues you raise.
A good summary of the functionality is provided here by SQL Menace
CONCAT takes a variable number of string arguments and concatenates
them into a single string. It requires a minimum of two input values;
otherwise, an error is raised. All arguments are implicitly converted
to string types and then concatenated. Null values are implicitly
converted to an empty string. If all the arguments are null, then an
empty string of type varchar(1) is returned. The implicit conversion
to strings follows the existing rules for data type conversions
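As a rough model of those semantics (not Microsoft's implementation, just an illustration), CONCAT behaves like this Python sketch, with None standing in for NULL:

```python
def sql_concat(*args) -> str:
    """Approximates SQL Server 2012's CONCAT: at least two arguments,
    every argument converted to a string, NULL (None here) becomes ''."""
    if len(args) < 2:
        raise ValueError('CONCAT requires at least two arguments')
    return ''.join('' if a is None else str(a) for a in args)
```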
UPDATE
You can use CONCAT_NULL_YIELDS_NULL to specify whether NULL + 'txt' results NULL.
Microsoft says that CONCAT_NULL_YIELDS_NULL will not work in future versions of SQL Server, but there is still an option to set it through the sp_dboption procedure. But it's probably better to use ISNULL, as you mentioned in the question.
Try this:
/* 1 */ SELECT cast(1 as varchar) + 'B'; /* = 1B */
/* or */ SELECT convert(varchar, 1) + 'B'; /* = 1B */
/* 2 */ SELECT cast(1 as varchar) + '1'; /* = 11 */
/* 3 */ SELECT '1' + '1'; /* = 11 */
/* or */ SELECT CONVERT(VARCHAR, '1') + CONVERT(VARCHAR, '1');
--NULL value:
DECLARE @A AS VARCHAR(10);
SET @A = NULL;
SELECT ISNULL(@A, '') + 1; /* = 1 (the empty string converts to int 0) */
There is no answer to this for 2005 or 2008. Concatenation without explicit conversions and null checks simply isn't possible.
It looks like the next version of SQL-Server will have a CONCAT function (thanks @Martin), which sounds like exactly what I'm looking for. The downside though is that it will probably be at least a handful of years before my institution decides to upgrade to that version, since they're pretty shy about being early adopters, especially when it comes to Microsoft.
There is a shortcut for the NULL checks right now (CONCAT_NULL_YIELDS_NULL -- thanks @a1ex07); however, using that has a pretty big penalty (it re-compiles the procedure every time it is executed), not to mention Microsoft isn't planning to support it in future versions of SQL-Server.

I need a T-SQL script that strips the numeric chars on the right

For example
Input: 0123BBB123456 Output: 123456
Input: ABC00123 Output: 00123
Input: 123AB0345 Output: 0345
In other words, the code should start stripping characters from the right and stop when a character that is not 0-9 is encountered.
I have to run this agains several millions of records, so I am looking for an efficient set based approach, not a cursor approach that performs substring functions in a loop for each record.
How about:
;with test(value) as (
select '0123BBB123456' union
select 'ABC00123' union
select '123AB0345' union
select '123'
)
select
value,
right(value, patindex('%[^0-9]%', reverse('?' + value)) - 1)
from test
0123BBB123456 123456
123 123
123AB0345 0345
ABC00123 00123
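The '?' guard is the subtle part: it guarantees PATINDEX finds a non-digit even when the whole value is numeric. Here is the same scan written out in Python, purely as an illustration of the trick:

```python
def numeric_suffix(s: str) -> str:
    """Mirrors RIGHT(value, PATINDEX('%[^0-9]%', REVERSE('?' + value)) - 1):
    reverse the guarded string, find the first non-digit, and keep that
    many trailing characters. The '?' guard guarantees a hit even for an
    all-digit input."""
    guarded = ('?' + s)[::-1]
    i = next(idx for idx, ch in enumerate(guarded) if not ch.isdigit())
    return s[len(s) - i:]
```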

PostgreSQL - how to check if my data contains a backslash

SELECT count(*) FROM table WHERE column ilike '%/%';
gives me the number of values containing "/"
How to do the same for "\"?
SELECT count(*)
FROM table
WHERE column ILIKE '%\\\\%';
Excerpt from the docs:
Note that the backslash already has a special meaning in string literals, so to write a pattern constant that contains a backslash you must write two backslashes in an SQL statement (assuming escape string syntax is used, see Section 4.1.2.1). Thus, writing a pattern that actually matches a literal backslash means writing four backslashes in the statement. You can avoid this by selecting a different escape character with ESCAPE; then a backslash is not special to LIKE anymore. (But it is still special to the string literal parser, so you still need two of them.)
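The "four backslashes" rule is easier to see with the two parsing layers laid out explicitly. Python's string literals and its re module stack up the same way as Postgres's E'' literals and LIKE, so this sketch mirrors the doubling (an analogy, not Postgres itself):

```python
import re

# One written layer of backslashes is eaten by the string-literal parser:
# four source backslashes produce a two-character pattern string, and the
# matcher consumes the remaining doubling to match one literal backslash.
text = 'c\\d'        # the three-character string  c \ d
pattern = '\\\\'     # two characters: a doubled backslash

assert len(text) == 3
assert len(pattern) == 2
assert re.search(pattern, text) is not None   # finds the backslash
assert re.search(pattern, 'a/b') is None      # forward slash doesn't match
```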
Better yet - don't use LIKE; just use the standard position function:
select count(*) from table where 0 < position( E'\\' in column );
I found on 12.5 I did not need an escape character
# select * from t;
x
-----
a/b
c\d
(2 rows)
# select count(*) from t where 0 < position('/' in x);
count
-------
1
(1 row)
# select count(*) from t where 0 < position('\' in x);
count
-------
1
(1 row)
whereas on 9.6 I did.
Bit strange but there you go.
Usefully,
position(E'/' in x)
worked on both versions.
You need to be careful - E'//' seems to work (i.e. it parses), but it searches for two forward slashes, not a backslash.
You need E'\\\\' because the backslash is special at two levels: the string-literal parser consumes one level of doubling, and LIKE itself treats \ as its default escape character, consuming the other. (The same doubling applies to the regex operator ~, e.g. ~ E'\\w' matches any string containing a word character.)
See the docs