I'm trying to check a string to make sure that it only contains lowercase and uppercase letters, the digits 0-9, underscores, dashes, spaces and periods.
The regular expression I've been using for letters, numbers, underscores and dashes works fine and is this:
"[^a-zA-Z0-9_-]"
I'm having difficulty adding the check for spaces and periods though.
I've tried:
"[^a-zA-Z0-9_- ]" (added a space after the dash)
"[^a-zA-Z0-9_-\s\.]" (trying to escape a white space character)
I've also tried putting the \s and \. outside of the main block and also in blocks of their own.
Thanks for any advice.
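A hyphen that stands for the literal character must be at the beginning or at the end of the character class (in a negated class, directly after the ^ or directly before the ]). Anywhere else it is read as a range operator.
Inside a character class the period is a normal character, so it doesn't need to be escaped.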
let pattern = "[^a-zA-Z0-9_. -]+"
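As a rough cross-check of the advice above, here is a minimal Python sketch (the helper names `is_valid` and `compiles` are made up for illustration). It assumes Python's `re` module, which happens to reject both of the attempted patterns as bad character ranges; other regex engines may accept them and silently match the wrong set instead.

```python
import re

# Allowed: ASCII letters, digits, underscore, period, space, hyphen.
# The hyphen is last in the class, so it is literal and not a range operator.
INVALID_CHARS = re.compile(r"[^a-zA-Z0-9_. -]")


def is_valid(text: str) -> bool:
    """Return True when text contains no character outside the allowed set."""
    return INVALID_CHARS.search(text) is None


def compiles(pattern: str) -> bool:
    """Report whether Python's re module accepts the pattern at all."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


print(is_valid("file_name-1.0 final"))
print(is_valid("bad/name"))
print(compiles(r"[^a-zA-Z0-9_- ]"))     # hyphen between '_' and ' ' forms a range
print(compiles(r"[^a-zA-Z0-9_-\s\.]"))  # hyphen between '_' and \s forms a range
```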
Be careful when adding characters that have a special meaning: the hyphen has to be escaped or placed last, and the caret must stay unescaped if it is meant to negate the class.
I think that this is what you are looking for:
"[^ a-zA-Z0-9_.\-]"
I want a character with a hyphen/dash on top of an underscore.
Is there such a character?
You can do it the other way around by adding a combining character U+0332 COMBINING LOW LINE (alias: NON-SPACING UNDERSCORE) after a U+002D HYPHEN-MINUS (or possibly U+2013 EN DASH) character:
-̲ <U+002D, U+0332>
Yes/Maybe.
See https://superuser.com/a/675743, which has some suggestions:
Use http://shapecatcher.com/ to draw and find a symbol.
Use a combining character modifier.
Using the shape catcher, I found “ニ” (Katakana letter ni).
And combining “_” with U+0305 COMBINING OVERLINE resulted in “_̅”. The site http://jkorpela.fi/fui.html8 was used to enter the characters.
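The combining-character approach from the answers above can be sketched in Python as follows. The helper `with_mark` is a hypothetical name, and how the result looks depends entirely on the font and renderer.

```python
import unicodedata

COMBINING_LOW_LINE = "\u0332"  # draws a line under the preceding character
COMBINING_OVERLINE = "\u0305"  # draws a line over the preceding character


def with_mark(base: str, mark: str) -> str:
    """Append a combining mark to a base character."""
    if unicodedata.combining(mark) == 0:
        raise ValueError(f"{mark!r} is not a combining character")
    return base + mark


hyphen_over_underline = with_mark("-", COMBINING_LOW_LINE)
underscore_with_overline = with_mark("_", COMBINING_OVERLINE)

# Each result is still two code points, even if it displays as one glyph.
for text in (hyphen_over_underline, underscore_with_overline):
    print(text, [unicodedata.name(ch) for ch in text])
```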
I found some code with a regex that is claimed to strip the text of any non-ASCII characters.
The code is written in Perl, and the part that does it is:
$sentence =~ tr/\000-\011\013-\014\016-\037\041-\055\173-\377//d;
I want to understand how this regex works, so I tried it in regexr. I found out that \000, \011, \013, \014, \016, \037, \041, \055, \173 and \377 denote individual characters such as NULL, TAB, VERTICAL TAB ... But I still do not get why the "-" symbols are used in the regex. Do they really mean "dash symbol", as shown in regexr, or something else? Is this regex really suited for deleting non-ASCII characters?
This isn't really a regex. The dash indicates a character range, like inside a regex character class [a-z].
The expression deletes some ASCII characters too (most control characters, the punctuation from ! through -, and { | } ~ DEL) and spares the non-ASCII characters above \377; the full ASCII range would simply be \000-\177.
To be explicit, the d flag says to delete any characters listed between the first pair of slashes that have no replacement in the second list, which is empty here. See the documentation for more.
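To see which characters the transliteration actually removes, here is a rough Python emulation. It is only a sketch: it assumes the Perl string holds decoded characters rather than raw bytes, and the function names are invented for this example.

```python
# Octal ranges copied from the tr/// search list.
DELETE_RANGES = [
    (0o000, 0o011),
    (0o013, 0o014),
    (0o016, 0o037),
    (0o041, 0o055),
    (0o173, 0o377),
]

# Mapping a code point to None makes str.translate delete it.
DELETE_TABLE = {
    code: None
    for low, high in DELETE_RANGES
    for code in range(low, high + 1)
}


def strip_like_tr(text: str) -> str:
    """Delete exactly the characters listed in the Perl tr///d expression."""
    return text.translate(DELETE_TABLE)


def strip_non_ascii(text: str) -> str:
    """Keep only code points in the ASCII range \\000-\\177."""
    return "".join(ch for ch in text if ord(ch) <= 0o177)


print(repr(strip_like_tr("a-b!c\td\ne")))   # ASCII '-', '!' and TAB are removed
print(repr(strip_like_tr("\u20ac5")))       # U+20AC is above \377 and survives
print(repr(strip_non_ascii("a-b\u20ac")))
```

The second helper shows what a plain "remove everything that is not ASCII" filter would look like for comparison.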
How do I rewrite the [a-zA-Z0-9!$* \t\r\n] pattern to match a hyphen along with the existing characters?
The hyphen is usually a normal character in regular expressions. Only if it’s in a character class and between two other characters does it take a special meaning.
Thus:
[-] matches a hyphen.
[abc-] matches a, b, c or a hyphen.
[-abc] matches a, b, c or a hyphen.
[ab-d] matches a, b, c or d (only here the hyphen denotes a character range).
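The four cases listed above can be checked quickly in Python. This is a small sketch using the standard `re` module; `matches` and `EXTENDED` are hypothetical names, and `EXTENDED` is the pattern from the question with a trailing, unescaped hyphen added.

```python
import re


def matches(char_class: str, ch: str) -> bool:
    """Return True when the single character ch is matched by char_class."""
    return re.fullmatch(char_class, ch) is not None


# Pattern from the question with a literal hyphen appended at the end.
EXTENDED = r"[a-zA-Z0-9!$* \t\r\n-]"

print(matches("[-]", "-"))
print(matches("[abc-]", "-"))
print(matches("[-abc]", "-"))
print(matches("[ab-d]", "c"), matches("[ab-d]", "-"))
print(matches(EXTENDED, "-"), matches(EXTENDED, "\n"), matches(EXTENDED, "#"))
```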
Escape the hyphen.
[a-zA-Z0-9!$* \t\r\n\-]
UPDATE:
Never mind this answer - you can add the hyphen to the group but you don't have to escape it. See Konrad Rudolph's answer instead which does a much better job of answering and explains why.
It’s less confusing to always use an escaped hyphen, so that it doesn't have to be positionally dependent. That’s a \- inside the bracketed character class.
But there’s something else to consider. Some of those enumerated characters should possibly be written differently. In some circumstances, they definitely should.
This comparison of regex flavors says that C♯ can use some of the simpler Unicode properties. If you’re dealing with Unicode, you should probably use the general category \p{L} for all possible letters, and maybe \p{Nd} for decimal numbers. Also, if you want to accommodate all that dash punctuation, not just HYPHEN-MINUS, you should use the \p{Pd} property. You might also want to write that sequence of whitespace characters simply as \s, assuming that’s not too general for you.
All together, that works out to a pattern of [\p{L}\p{Nd}\p{Pd}!$*] to match any one character from that set.
I’d likely use that anyway, even if I didn’t plan on dealing with the full Unicode set, because it’s a good habit to get into, and because these things often grow beyond their original parameters. Now when you lift it to use in other code, it will still work correctly. If you hard‐code all the characters, it won’t.
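To illustrate what those general categories cover, here is a hedged Python sketch. Python's built-in `re` module does not support \p{...}, so this looks up each character's category with `unicodedata` instead; the function name `in_class` is made up, and this is an approximation of the character class above rather than a regex.

```python
import unicodedata

EXTRA_CHARS = set("!$*")


def in_class(ch: str) -> bool:
    """Approximate [\\p{L}\\p{Nd}\\p{Pd}!$*] for a single character."""
    category = unicodedata.category(ch)
    return (
        category.startswith("L")     # any letter: Lu, Ll, Lt, Lm, Lo
        or category == "Nd"          # decimal digits in any script
        or category == "Pd"          # dash punctuation, incl. HYPHEN-MINUS
        or ch in EXTRA_CHARS
    )


for sample in ("a", "\u00e9", "7", "\u0663", "-", "\u2013", "$", "_", " "):
    print(repr(sample), unicodedata.category(sample), in_class(sample))
```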
[-a-z0-9]+, [a-z0-9-]+ and [a-z-0-9]+ are all equivalent. In most regex flavors, a hyphen that directly follows a completed range is treated as a literal character. For the same reason, [a-z0-9-+()]+ also allows a hyphen.
Use \p{Pd} to match any kind of dash punctuation. The '-' character (HYPHEN-MINUS) is just one kind of dash, which also happens to be a special character inside regex character classes.
Is this what you are after?
MatchCollection matches = Regex.Matches(mystring, "-");
I am looking for a regular expression to use in Swift to validate cardholder name for a credit card. I am looking for a regEx which:
Has minimum 2 and maximum of 26 characters
Accept dashes (-) and apostrophes (') only and no other special character
Capital and small alphabets and no numbers.
Should not start with a blank space.
I was using this
"^[^-\\s][\\p{L}\\-'\\s]{2,26}$"
but it only accepts dash (-) no apostrophe (')
Try this regex:
(?<! )[-a-zA-Z' ]{2,26}
See it here:
https://regex101.com/r/0UVvR1/1
Guessing from your description, this is what you are looking for:
^[\p{L}'][\p{L}' -]{1,25}$
A few remarks:
you probably do not want to allow all possible white-space chars [\r\n\t\f\v ] but just spaces.
you have to adjust the allowed length of the second group if you add a first group that does not include space and dash (since that group contributes an additional character).
with \p{L} you allow any kind of letter from any language (which is good); otherwise use [a-zA-Z] if you just want to allow the regular (ASCII) alphabet.
PS: Do not forget to escape the pattern properly: "^[\\p{L}'][\\p{L}' -]{1,25}$"
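For comparison, the same rule can be sketched in Python. This is only an approximation: Python's `re` module has no \p{L}, so the sketch substitutes `[^\W\d_]` (word characters minus digits and underscore), which is close to "any letter" but also admits a few numeric characters outside the Nd category. The names `LETTER`, `CARDHOLDER_NAME` and `is_valid_cardholder_name` are hypothetical.

```python
import re

# Stand-in for \p{L}: any word character that is not a digit or underscore.
LETTER = r"[^\W\d_]"

# First character: letter or apostrophe (no space, no dash).
# Then 1 to 25 more characters: letters, apostrophes, spaces or dashes.
CARDHOLDER_NAME = re.compile(rf"(?:{LETTER}|')(?:{LETTER}|[' -]){{1,25}}")


def is_valid_cardholder_name(name: str) -> bool:
    """Return True when the whole name satisfies the 2 to 26 character rule."""
    return CARDHOLDER_NAME.fullmatch(name) is not None


for candidate in ("O'Neil-Smith", "Jos\u00e9 \u00c1lvarez", " John", "John3", "J"):
    print(repr(candidate), is_valid_cardholder_name(candidate))
```

Using `fullmatch` plays the role of the ^ and $ anchors in the pattern above.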
Normally, for simple character strings, a leading backtick does the trick.
Example: `abc
However, if the string has some special characters, such as space, this will not work.
Example: `$"abc def"
Example: `$"BAT-3Kn.BK"
What are the rules when $"" is required?
Simple syntax for symbols can be used when the symbol consists of alphanumeric characters, dots (.), colons (:), and (non-leading) underscores (_). In addition, slashes (/) are allowed when there is a colon before it. Everything else requires the `$"" syntax.
The book 'Q for mortals', which is available online, has a section discussing datatypes. For symbols it states:
A symbol can include arbitrary text, including text that cannot be
directly entered from the console – e.g., embedded blanks and special
characters such as back-tick. You can manufacture a symbol from any
text by casting the corresponding list of char to a symbol. (You will
need to escape special characters into the string.) See §6.1.5 for
more on casting.
q)`$"A symbol with blanks and `"
`A symbol with blanks and `
The essential takeaway here is that converting a string to a symbol is required when special characters are involved. In the examples you have given both space " " and hyphen "-" are characters that cannot be directly placed into a symbol type.