Lines that have three upper case characters in parenthesis like (LTI) - remember to escape certain characters :) - postgresql

In this assignment you will create a regular expression to retrieve a subset of the data from the purpose column of the taxdata table in the readonly database (access details below). Write a regular expression to retrieve the rows that meet the following criteria:
Lines that have three upper case characters in parentheses like (LTI) - remember to escape certain characters :)
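A minimal sketch of one pattern that fits the description, written in Python so it can be run without the database. The sample strings are made up and only stand in for the purpose column. Parentheses are regex metacharacters, which is why they are escaped with a backslash; in PostgreSQL the same pattern would presumably be used with the ~ operator.

```python
import re

# Literal "(", exactly three uppercase ASCII letters, literal ")".
THREE_UPPER_IN_PARENS = re.compile(r"\([A-Z]{3}\)")

# Hypothetical sample values (not taken from the real taxdata table).
samples = [
    "Support for learning tools (LTI) integration",
    "General fund (ab1)",
    "Outreach (LONG) programme",
    "No parentheses here LTI",
]

# Keep only the lines that contain the pattern somewhere.
matches = [s for s in samples if THREE_UPPER_IN_PARENS.search(s)]
print(matches)
```

Without the backslashes, ( and ) would form a capture group and the pattern would match any three uppercase letters, with or without parentheses around them.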

Related

T-SQL Pattern Match/Regex 0 or 1 occurrences of square bracket

I've been trying to match, using LIKE or PATINDEX where a string of format
[subdomain.domain.com].[dbo].[MyDatabase]
occurs in a SQL string
While the string [subdomain.domain.com] will always appear in this format
the [dbo] and/or [MyDatabase] may appear with or without the square brackets
I have tried the following
WHERE SQLString LIKE '%[[]subdomain.domain.com[]].[[]{0,1}dbo[]]{0,1}.[[]{0,1}MyDatabase[]]{0,1}%'
or
WHERE SQLString LIKE '%\[subdomain.domain.com\].\[{0,1}dbo\]{0,1}.\[{0,1}MyDatabase\]{0,1}%' ESCAPE '\'
I could write multiple OR conditions covering each possibility, but I would like to understand why the regex isn't working in this case.
The square bracket is a special character in the LIKE clause; it means "Any single character within the specified range (for example [a-z])." LIKE patterns are not regular expressions, so a quantifier such as {0,1} is treated as literal text rather than as "zero or one occurrence".
You could just use the % special character wherever there might be a square bracket in your text. Something like:
WHERE SQLString LIKE '%subdomain.domain.com%.%dbo%.%MyDatabase%'
Since that will match "Any string of zero or more characters", if there is a square bracket there, great, if not, it still matches.
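For comparison, here is a rough sketch of what the question was reaching for, using a real regex engine (Python's re module, not T-SQL). The function name mentions_target is made up for this illustration. The ? quantifier makes each bracket optional, which LIKE cannot express.

```python
import re

# "[subdomain.domain.com]" is always bracketed; the brackets around
# dbo and MyDatabase are each optional ("?" means zero or one).
OPTIONAL_BRACKETS = re.compile(
    r"\[subdomain\.domain\.com\]"
    r"\.\[?dbo\]?"
    r"\.\[?MyDatabase\]?"
)

def mentions_target(sql_string: str) -> bool:
    """Return True if the three-part name occurs anywhere in sql_string."""
    return OPTIONAL_BRACKETS.search(sql_string) is not None

print(mentions_target("SELECT * FROM [subdomain.domain.com].[dbo].[MyDatabase]"))
print(mentions_target("SELECT * FROM [subdomain.domain.com].dbo.MyDatabase"))
print(mentions_target("SELECT * FROM [subdomain.domain.com].[sys].[MyDatabase]"))
```

Note that this sketch also accepts unbalanced forms such as [dbo without a closing bracket. The % version above is looser still, since % matches any text between the parts.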

If there's even one non-English character in any entry in a column, I want to have 'TRUE' in another column

There are several entries in the column, eng characters with non english characters, eng characters numbers/symbols, non eng characters with numbers/symbols etc. If there's even one non-english character in any entry in the column, I want 'TRUE' in the adjacent column.
SELECT * 
FROM companies
WHERE name LIKE '%[a-z]%';
This code doesn't work.
You can achieve this using regular expressions. Here's a regular expression that will match all ASCII printable characters along with tab (\t), new-line/line-feed (\n), and carriage return (\r).
SELECT
*,
name ~ '[^\t\n\r\x20-\x7E]' AS has_bad_chars
FROM companies
Now this will match any character that's not A-Z, a-z, 0-9, , ., ;, :, ", ', /.
Working from the assumption that the adjacent column you mention is defined in the table as
has_non_english_char boolean then try
update companies
set has_non_english_char = name ~ '[^A-Za-z0-9_.,/$]';
Alternatively look into character classes or include additional characters in the above. Note: The regular expression should include the 'English' characters you want to allow.
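To try the idea outside the database, here is a small Python sketch that uses the same negated class as the first query above (tab, new-line, carriage return, and printable ASCII are allowed; anything else is flagged). The company names are invented for the example, and has_bad_chars is just a helper name borrowed from the column alias.

```python
import re

# Anything that is NOT tab, LF, CR, or printable ASCII (0x20-0x7E).
NOT_PRINTABLE_ASCII = re.compile(r"[^\t\n\r\x20-\x7E]")

def has_bad_chars(name: str) -> bool:
    """True if name contains at least one character outside the allowed set."""
    return NOT_PRINTABLE_ASCII.search(name) is not None

# Hypothetical sample data.
names = ["Acme Corp.", "Société Générale", "abc.com:;", "東京 Ltd"]
flags = {name: has_bad_chars(name) for name in names}
for name, flag in flags.items():
    print(name, flag)
```

The key point is the negation: the original LIKE '%[a-z]%' asks "is there at least one allowed character", whereas the requirement is "is there at least one character that is not allowed".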
Perhaps this is not the preferred SO protocol, but I think another answer may be better than just expanding a previous one. If not, community, please forgive me.
Run:
with companies as
( select * from (values ('abc.com:;'), ('abc.com - "Something')) as c(name))
SELECT 'My RE',name,name ~ '[^A-Za-z0-9&_`!@#$^&*()_+=\|\][{’\;:"<.,.}? -]' has_bad_chars, 'f' desired FROM companies
union
SELECT 'Your RE',name,name ~ '[^A-Za-z0-9&_`!@#$^&*()_+=\|][{’;:""<.,.}?-]', 'f' from companies
order by 2,1 desc;
Let's examine the RE themselves and see the difference:
My RE   [^A-Za-z0-9&_`!@#$^&*()_+=\|\][{’\;:""<.,.}? -]
Your RE [^A-Za-z0-9&_`!@#$^&*()_+=\|][{’;:""<.,.}?-]
                                   ^^^
Notice the difference at the indicated position. You have '|]' while I have '|\]'; I also have a space later on.
See the regular expression reference from @cpburnz earlier, in particular section "9.7.3.2. Bracket Expressions".
Your RE breaks down to '[^...][...]' which breaks the RE into looking for 2 distinct set of characters telling the RE engine to find 'any character not in the first bracketed expression' followed immediately by 'any character that is in the second bracketed expression'
The difference is that I escaped the right bracket ], removing its special meaning in the RE and making it just another character. This is the nature of REs: the exact individual characters can make all the difference. If you are going to do much of this type of stuff, study REs intently.
Good luck finding the exact RE you need. It's out there; you just need to work with it until you find it.
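The effect of the unescaped ] can be reproduced with a much shorter pair of patterns. This is only an illustrative sketch in Python's re module (not PostgreSQL), with the character classes cut down to a few characters, but the bracket behaviour it shows is the same.

```python
import re

# Unescaped "]" ends the first bracket expression early, so this is
# really two classes in a row: [^A-Za-z|] followed by [{;]
two_classes = re.compile(r"[^A-Za-z|][{;]")

# Escaped "\]" keeps everything inside one negated class.
one_class = re.compile(r"[^A-Za-z|\][{;]")

# "!" is a disallowed character, but nothing from [{;] follows it,
# so the two-class pattern fails to flag it.
print(bool(two_classes.search("abc!")))   # False
print(bool(one_class.search("abc!")))     # True

# The two-class pattern only fires when a disallowed character is
# immediately followed by "{" or ";".
print(bool(two_classes.search("abc!;")))  # True

# ";" is in the allowed set of the single class, so nothing is flagged.
print(bool(one_class.search("abc;")))     # False
```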

Regular expression in Swift to validate Cardholder name

I am looking for a regular expression to use in Swift to validate cardholder name for a credit card. I am looking for a regEx which:
Has minimum 2 and maximum of 26 characters
Accept dashes (-) and apostrophes (') only and no other special character
Capital and small alphabets and no numbers.
Should not start with a blank space.
I was using this
"^[^-\\s][\\p{L}\\-'\\s]{2,26}$"
but it only accepts dash (-) no apostrophe (')
Try this regex:
(?<! )[-a-zA-Z' ]{2,26}
see here
https://regex101.com/r/0UVvR1/1
Guessing from your description, this is what you are looking for:
^[\p{L}'-][\p{L}' -]{1,25}$
Demo
A few remarks:
you probably do not want to allow all possible white-space chars [\r\n\t\f\v ] but just spaces.
you have to adjust the allowed length of the second group if you add a first group that does not include space and dash (since that group contributes an additional character).
with \p{L} you allow any kind of letter from any language (which is good); otherwise use [a-zA-Z] if you just want to allow the regular (ASCII) alphabet.
PS: Do not forget to escape the pattern properly: "^[\\p{L}'][\\p{L}' -]{1,25}$"
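To check the length arithmetic (one leading character plus 1 to 25 more gives 2 to 26 in total), here is a rough Python sketch of the same structure. Python's built-in re module has no \p{L}, so [^\W\d_] is swapped in as an approximation of "any letter"; that substitution, the constant name and the helper is_valid_cardholder_name are assumptions of this example, not part of the Swift answer.

```python
import re

LETTER = r"[^\W\d_]"  # approximation of \p{L}: word chars minus digits and "_"

CARDHOLDER = re.compile(
    rf"(?:{LETTER}|['-])"         # first char: letter, apostrophe or dash (no space)
    rf"(?:{LETTER}|[' -]){{1,25}}"  # then 1-25 letters, apostrophes, spaces or dashes
)

def is_valid_cardholder_name(name: str) -> bool:
    """True if the whole string fits the 2-26 character rule set."""
    return CARDHOLDER.fullmatch(name) is not None

print(is_valid_cardholder_name("Anne-Marie O'Neil"))
print(is_valid_cardholder_name(" John"))
print(is_valid_cardholder_name("John3"))
```

fullmatch plays the role of the ^ and $ anchors in the pattern above.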

T-SQL Procedure parameter name

I have to create a procedure with the same parameter names as Excel columns. Some look like this: 'xxx/xxx' or 'xxx - xxx'. Is there any workaround to name parameters in a stored procedure like this?
Forward slash (/) or dash (-) are not allowed in variable names
According to http://msdn.microsoft.com/en-us/library/ms175874.aspx, that means that the allowed characters are:
Letters as defined in the Unicode Standard 3.2. The Unicode
definition of letters includes Latin characters from a through z,
from A through Z, and also letter characters from other languages.
Decimal numbers from either Basic Latin or other national scripts.
The at sign (@), dollar sign ($), number sign (#), or underscore (_).
Okay, first of all, why would you ever want to use special characters? That is like saying "I want to fry toast underwater with electricity, why can't I have an outlet to allow that?" Special characters denote special things, and as such many engines, not just SQL but most languages, will not allow reserved characters in variable names. The best you could do is declare the parameters with '_' in place of the special character and then replace it AFTER the object has been created, when echoing the names out. The placeholder name '@(something)' is really arbitrary and could be @X or @LookAtMe. Its type is important, because it forms a contract that must be fulfilled for execution, but the name is really just for hooking things up. Having said that, if you just must have these weird names echoed out, you could do something like this:
CREATE PROC pSimpleParam @My_Param INT
AS
SELECT @My_Param
GO
ALTER PROC pSimpleParam @My_Param INT
AS
SELECT
pr.name AS ParameterName
, REPLACE(pr.name, '_', '-') AS AlteredParameterName
FROM sys.procedures p
INNER JOIN sys.parameters pr ON pr.object_id = p.object_id
AND p.name = 'pSimpleParam'
GO
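Going the other way, from Excel headers to legal parameter names, could be sketched as below. This is a Python illustration, not T-SQL; the function name to_param_name and the choice to collapse every run of disallowed characters into a single underscore are assumptions of this example. It deliberately restricts names to ASCII letters, digits and underscore, which is narrower than what the rules quoted above allow.

```python
import re

def to_param_name(column_name: str) -> str:
    """Turn an Excel column header into a conservative T-SQL parameter name."""
    # Replace each run of characters outside [0-9A-Za-z_] with one underscore.
    cleaned = re.sub(r"[^0-9A-Za-z_]+", "_", column_name.strip())
    return "@" + cleaned

# Keep the original header next to the generated name so it can be echoed later.
headers = ["xxx/yyy", "aaa - bbb", "Plain"]
mapping = {header: to_param_name(header) for header in headers}
print(mapping)
```

Different headers can collapse to the same name (for example 'xxx/xxx' and 'xxx - xxx' both become @xxx_xxx), so a real script would need to detect and resolve such collisions.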

What's the difference between GC=Mark and GC=Punctuation in Unicode general categories?

I'm having trouble understanding some concepts. In the Unicode spec, there's a property called general category.
OK, I understand what letters (usual characters; GC=L), numbers (digits 0–9 and other characters that have numeric values; GC=N) and separators (dividers; GC=Z) each are. But it's really hard to distinguish between symbols (GC=S), punctuation (GC=P), and marks (GC=M).
I looked up a list of them, but I couldn't find a conceptual difference, and the document doesn't help me much. What's the difference between all these?
Marks aren't standalone characters, but are applied to another character. Non-spacing marks are displayed over the target character, spacing marks are displayed attached to the target character and enclosing marks are displayed surrounding the target character. For example here's an a in a box (the character "a" combined with the enclosing square character):
a⃞
Regarding punctuation versus symbols: As the text you linked explains, some edge cases are classified rather arbitrarily, but in principle the difference is that punctuation is used "to organize and delimit textual units" (i.e. to mark the end of a sentence, separate different parts of a sentence, separate the elements of an enumeration etc.) and symbols "to represent concepts" (like units for example or mathematical notations).
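The categories of individual characters can be inspected with Python's standard unicodedata module. The sketch below picks a handful of characters for illustration, assuming the boxed example above uses U+20DE (COMBINING ENCLOSING SQUARE); the helper name major_class is made up here. The first letter of the two-letter code is the major class (L, M, N, P, S, Z, C).

```python
import unicodedata

def major_class(ch: str) -> str:
    """Return the first letter of the general category, e.g. 'M' for marks."""
    return unicodedata.category(ch)[0]

examples = [
    ("a", "letter"),
    ("\u0301", "non-spacing mark (combining acute accent)"),
    ("\u20de", "enclosing mark (combining enclosing square)"),
    (".", "punctuation: ends a sentence"),
    (",", "punctuation: separates parts of a sentence"),
    ("+", "symbol: mathematical notation"),
    ("$", "symbol: currency"),
]

for ch, note in examples:
    # Print the code point, the full two-letter category, and the note.
    print(f"U+{ord(ch):04X}  {unicodedata.category(ch)}  {note}")
```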