Is it possible to match any character that is not ']' in PATINDEX? - tsql

I need to find the index of the first character that is not ]. Normally to match any character except X, you use the pattern [^X]. The problem is that [^]] simply closes the first bracket too early. The first part, [^], will match any character.
In the documentation for the LIKE operator, if you scroll down to the section "Using Wildcard Characters As Literals" it shows a table of methods to indicated literal characters like [ and ] inside a pattern. It makes no mention of using [ or ] inside double brackets. If the pattern is being used with the LIKE operator, you would use the ESCAPE clause. LIKE doesn't return an index and PATINDEX doesn't seem to have a parameter for an escape clause.
Is there no way to do this?
(This may seem arbitrary. To put some context around it, I need to match ] immediately followed by a character that is not ] in order to locate the end of a quoted identifier. ]] is the only character escape inside a quoted identifier.)

This isn't possible. The Connect item PATINDEX Missing ESCAPE Clause is closed as won't fix.
I'd probably use CLR and regular expressions.
A simple implementation might be
using System.Data.SqlTypes;
using System.Text.RegularExpressions;
public partial class UserDefinedFunctions
{
[Microsoft.SqlServer.Server.SqlFunction]
public static SqlInt32 PatIndexCLR(SqlString pattern, SqlString expression)
{
if (pattern.IsNull || expression.IsNull)
return new SqlInt32();
Match match = Regex.Match(expression.ToString(), pattern.ToString());
if (match.Success)
{
return new SqlInt32(match.Index + 1);
}
else
{
return new SqlInt32(0);
}
}
}
With example usage
SELECT [dbo].[PatIndexCLR] ( N'[^]]', N']]]]]]]]ABC[DEF');
If that is not an option a possible flaky workaround might be to substitute a character unlikely to be in the data without this special significance in the grammar.
WITH T(Value) AS
(
SELECT ']]]]]]]]ABC[DEF'
)
SELECT PATINDEX('%[^' + char(7) + ']%', REPLACE(Value,']', char(7)))
FROM T
(Returns 9)

Related

Is there a function to escape all regex-relevant characters?

The regex I'm using in my application is a combination of user-input and code. Because I don't want to restrict the user I would like to escape all regex-relevant characters like "+", brackets , slashes etc. from the entry.
Is there a function for that or at least an easy way to get all those characters in an array so that I can do something like this:
for regexChar in regexCharacterArray{
myCombinedRegex = myCombinedRegex.replaceOccurences(of: regexChar, with: "\\" + regexChar)
}
Yes, there is NSRegularExpression.escapedPattern(for:):
Returns a string by adding backslash escapes as necessary to protect any characters that would match as pattern metacharacters.
Example:
let escaped = NSRegularExpression.escapedPattern(for: "[*]+")
print(escaped) // \[\*]\+

Could I specify pattern match priority in lex code?

I've got a related thread in the site(My lex pattern doesn't work to match my input file, how to correct it?)
The problems I met, is about how "greedy" lex will do pattern match, e.g. I've got my lex file:
$ cat b.l
%{
#include<stdio.h>
%}
%%
"12" {printf("head\n");}
"34" {printf("tail\n");}
.* {printf("content\n");}
%%
What I wish to say is, when meet "12", print "head"; when meet "34", print "tail", otherwise print "content" for the longest match that doesn't contain either "12" or "34".
But the fact was, ".*" was a greedy match that whatever I input, it prints "content".
My requirement is, when I use
12sdf2dfsd3sd34
as input, the output should be
head
content
tail
So seems there're 2 possible ways:
1, To specify a match priority for ".*", it should work only when neither "12" and "34" works to match. Does lex support "priority"?
2, to change the 3rd expression, as to match any contiguous string that doesn't contain sub-string of "12", or "34". But how to write this regular expression?
Does (f)lex support priority?
(F)lex always produces the longest possible match. If more than one rule matches the same longest match, the first one is chosen, so in that case it supports priority. But it does not support priority for shorter matches, nor does it implement non-greedy matching.
How to match a string which does not contain one or more sequences?
You can, with some work, create a regular expression which matches a string not containing specified substrings, but it is not particularly easy and (f)lex does not provide a simple syntax for such regular expressions.
A simpler (but slightly less efficient) solution is to match the string in pieces. As a rough outline, you could do the following:
"12" { return HEAD; }
"34" { if (yyleng > 2) {
yyless(yyleng - 2);
return CONTENT;
}
else
return TAIL;
}
.|\n { yymore(); }
This could be made more efficient by matching multiple characters when there is not chance of skipping a delimiter; change the last rule to:
.|[^13]+ { yymore(); }
yymore() causes the current token to be retained, so that the next match appends to the current token rather than starting a new token. yyless(x) returns all but the first x characters to the input stream; in this case, that is used to cause the end delimiter 34 to be rescanned after the CONTENT token is identified.
(That assumes you actually want to tokenize the input stream, rather than just print a debugging message, which is why I called it an outline solution.)

why can the character's order in regex expression affect sed?

The tv.txt file is as following:
mms://live21.gztv.com/gztv_gz 广州台[可于Totem/VLC/MPlayer播放,记得把高宽比设置成4:3]
mms://live21.gztv.com/gztv_news 广州新闻台·直播广州(可于Totem/VLC/MPlayer播放,记得把高宽比设置成4:3)
mms://live21.gztv.com/gztv_kids 广州少儿台(可于Totem/VLC/MPlayer播放,记得把高宽比设置成4:3)
mms://live21.gztv.com/gztv_econ 广州经济台
I want to group it into three groups.
sed -r 's/([^ ]*)\s([^][()]*)((\(.+\))*|(\[.+\])*)/\3/' tv.txt
got the result:
[可于Totem/VLC/MPlayer播放,记得把高宽比设置成4:3]
(可于Totem/VLC/MPlayer播放,记得把高宽比设置成4:3)
(可于Totem/VLC/MPlayer播放,记得把高宽比设置成4:3)
When I write it into
sed -r 's/([^ ]*)\s([^][()]*)((\(.+\))*|(\[.+\])*)/\3/' tv.txt
It can't work.
The only difference is [^][()] and [^[]()]; neither of the [^\[\]()] ,escape characters can not make it run properly.
I want to know the reason.
The POSIX rules for getting ] into a character class are a little arcane, but they make sense when you think about it hard.
For a positive (non-negated) character class, the ] must be the first character:
[]and]
This recognizes any character a, n, d or ] as part of the character class.
For a negated character class, the ] must be the first character after the ^:
[^]and]
This recognizes any character except a, n, d or ] as part of the character class.
Otherwise, the first ] after the [ marks the end of the character class. Inside a character class, most of the normal regex special characters lose their special meaning, and others (notably - minus) acquire special meanings. (If you want a - in a character class, it has to be 'first' or last, where 'first' means 'after the optional ^ and only if ] is not present'.)
In your examples:
[^][()] — this is a negated character class that recognizes any character except [, ], ( or ), but
[^[]()] — this is a negated character class that recognizes any character except [, followed by whatever () symbolizes in the regex family you're using, and ] which represents itself.

Scala string pattern matching for mathematical symbols

I have the following code:
val z: String = tree.symbol.toString
z match {
case "method +" | "method -" | "method *" | "method ==" =>
println("no special op")
false
case "method /" | "method %" =>
println("we have the special div operation")
true
case _ =>
false
}
Is it possible to create a match for the primitive operations in Scala:
"method *".matches("(method) (+-*==)")
I know that the (+-*) signs are used as quantifiers. Is there a way to match them anyway?
Thanks from a avidly Scala scholar!
Sure.
val z: String = tree.symbol.toString
val noSpecialOp = "method (?:[-+*]|==)".r
val divOp = "method [/%]".r
z match {
case noSpecialOp() =>
println("no special op")
false
case divOp() =>
println("we have the special div operation")
true
case _ =>
false
}
Things to consider:
I choose to match against single characters using [abc] instead of (?:a|b|c).
Note that - has to be the first character when using [], or it will be interpreted as a range. Likewise, ^ cannot be the first character inside [], or it will be interpreted as negation.
I'm using (?:...) instead of (...) because I don't want to extract the contents. If I did want to extract the contents -- so I'd know what was the operator, for instance, then I'd use (...). However, I'd also have to change the matching to receive the extracted content, or it would fail the match.
It is important not to forget () on the matches -- like divOp(). If you forget them, a simple assignment is made (and Scala will complain about unreachable code).
And, as I said, if you are extracting something, then you need something inside those parenthesis. For instance, "method ([%/])".r would match divOp(op), but not divOp().
Much the same as in Java. To escape a character in a regular expression, you prefix the character with \. However, backslash is also the escape character in standard Java/Scala strings, so to pass it through to the regular expression processing you must again prefix it with a backslash. You end up with something like:
scala> "+".matches("\\+")
res1 : Boolean = true
As James Iry points out in the comment below, Scala also has support for 'raw strings', enclosed in three quotation marks: """Raw string in which I don't need to escape things like \!""" This allows you to avoid the second level of escaping, that imposed by Java/Scala strings. Note that you still need to escape any characters that are treated as special by the regular expression parser:
scala> "+".matches("""\+""")
res1 : Boolean = true
Escaping characters in Strings works like in Java.
If you have larger Strings which need a lot of escaping, consider Scala's """.
E. g. """String without needing to escape anything \n \d"""
If you put three """ around your regular expression you don't need to escape anything anymore.

RowFilter including [ character in search string

I fill a DataSet and allow the user to enter a search string. Instead of hitting the database again, I set the RowFilter to display the selected data. When the user enters a square bracket ( "[" ) I get an error "Error in Like Operator". I know there is a list of characters that need prefixed with "\" when they are used in a field name, but how do I prevent RowFilter from interpreting "[" as the beginning of a column name?
Note: I am using a dataset from SQL Server.
So, you are trying to filter using the LIKE clause, where you want the "[" or "]" characters to be interpreted as text to be searched ?
From Visual Studio help on the DataColumn.Expression Property :
"If a bracket is in the clause, the bracket characters should be escaped in brackets (for example [[] or []])."
So, you could use code like this :
DataTable dt = new DataTable("t1");
dt.Columns.Add("ID", typeof(int));
dt.Columns.Add("Description", typeof(string));
dt.Rows.Add(new object[] { 1, "pie"});
dt.Rows.Add(new object[] { 2, "cake [mud]" });
string part = "[mud]";
part = part.Replace("[", "\x01");
part = part.Replace("]", "[]]");
part = part.Replace("\x01", "[[]");
string filter = "Description LIKE '*" + part + "*'";
DataView dv = new DataView(dt, filter, null, DataViewRowState.CurrentRows);
MessageBox.Show("Num Rows selected : " + dv.Count.ToString());
Note that a HACK is used. The character \x01 (which I'm assuming won't be in the "part" variable initially), is used to temporarily replace left brackets. After the right brackets are escaped, the temporary "\x01" characters are replaced with the required escape sequence for the left bracket.