I am trying to add an in-between step for deleting words backwards, which should stop at a capital letter (in the case of camelCase).
For this I thought to use the following to obtain the position of the first capital letter backwards:
(search-backward-regexp "[:upper:]")
If you run this when point is after that last parenthesis, point will end up here (marked with |):
(search-backward-regexp "[:upper|:]"), that is, right after the r.
How so?
(search-backward-regexp "[[:upper:]]")
On its own, [:upper:] is not the "upper" class yet,
but simply a character set that matches a single character, which has to be one of ":", "u", "p", "e" or "r".
Only the second pair of brackets, "[[:upper:]]", makes it search for the class.
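The same pitfall can be reproduced outside Emacs. The following is a minimal sketch in Python, meant as an analogy only: Python's re module has no POSIX classes, so it also reads [:upper:] as a plain set of the characters ":", "u", "p", "e", "r", and [A-Z] stands in for [[:upper:]] here. The helper name last_match_start is made up for this illustration.

```python
import re

def last_match_start(pattern, text):
    """Return the start index of the last match of pattern in text, or -1."""
    starts = [m.start() for m in re.finditer(pattern, text)]
    return starts[-1] if starts else -1

line = '(search-backward-regexp "[:upper:]")'

# "[:upper:]" is just the set {':', 'u', 'p', 'e', 'r'}, so the last hit is
# the second colon, which sits right after the "r" of "upper".
pos = last_match_start(r'[:upper:]', line)
print(pos, repr(line[pos]), repr(line[pos - 1]))

# A real "uppercase letter" class finds nothing: the line has no capitals.
print(last_match_start(r'[A-Z]', line))
```

The first search lands on the colon after "upper", which mirrors where point ends up in the question.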
Andreas' answer is correct. However, based on what you are trying to achieve, I would suggest you take a look at subword-mode (it comes bundled with Emacs, at least in modern versions).
I realize now that it has to do with the variable case-fold-search.
When set to t, it will ignore case in searches.
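The effect of case folding can be sketched in Python, assuming re.IGNORECASE as a rough stand-in for a non-nil case-fold-search. The helper last_capital_before and the sample word are hypothetical and only illustrate why a folded search stops at the nearest letter of any case.

```python
import re

def last_capital_before(text, point, fold_case=False):
    """Index of the last [A-Z] match before point, or -1 (hypothetical helper)."""
    flags = re.IGNORECASE if fold_case else 0
    hits = [m.start() for m in re.finditer(r'[A-Z]', text[:point], flags)]
    return hits[-1] if hits else -1

word = 'deleteWordBackwards'
end = len(word)

# Case-sensitive: stops at the "B" of "Backwards".
print(last_capital_before(word, end))

# Case-folded: every letter counts as a match, so it stops at the final "s".
print(last_capital_before(word, end, fold_case=True))
```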
Related
I want to search for a string which starts with a single lowercase letter followed by a capital letter, like aString or aSTring.
I tried the regular expression ^[a-z][A-Z]* in Eclipse file search with the Regular expression option ticked, but that isn't giving the desired result.
Use this \b[a-z][A-Z][a-zA-Z]* and enable Regular Expressions in the Find-Window.
The \b is a word boundary, which works better than the ^ you used, because it matches at the beginning of each "word" (a somewhat blurry term in this context) rather than only at the very start of the text being searched, as ^ does in your pattern.
Also, the pattern you used would only find strings that start with a lower case letter and continue with only upper case letters.
@mumpitz's answer works except that (as @Ab_sin rightly pointed out) he missed mentioning that the Case sensitive checkbox in the find window is also needed.
So that:
the regular expression pattern is: \b[a-z][A-Z]* (\b stands for word boundary)
And Regular Expression and Case sensitive checkboxes need to be checked
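To see what the anchor and the case-sensitivity setting each contribute, here is a small sketch using Python's re module. Eclipse uses Java regular expressions, so this is only an approximation, with re.IGNORECASE standing in for an unticked Case sensitive checkbox. The sample text is made up.

```python
import re

text = 'aString and aSTring but not about or AString'
pattern = r'\b[a-z][A-Z][a-zA-Z]*'

# Case-sensitive: only words with one lowercase letter followed by a capital.
print(re.findall(pattern, text))

# Case-insensitive: [a-z] and [A-Z] now match any letter, so every word of
# two or more letters is reported.
print(re.findall(pattern, text, re.IGNORECASE))

# The pattern from the question: anchored at the start, and it stops as soon
# as a lowercase letter follows the capitals.
print(re.findall(r'^[a-z][A-Z]*', text))
```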
How can I rewrite the [a-zA-Z0-9!$* \t\r\n] pattern to match a hyphen along with the existing characters?
The hyphen is usually a normal character in regular expressions. Only if it’s in a character class and between two other characters does it take a special meaning.
Thus:
[-] matches a hyphen.
[abc-] matches a, b, c or a hyphen.
[-abc] matches a, b, c or a hyphen.
[ab-d] matches a, b, c or d (only here the hyphen denotes a character range).
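The four cases above can be checked quickly. This sketch uses Python's re module, which follows the same positional rule for these particular classes; the helper accepted is a hypothetical name introduced just for the demonstration.

```python
import re

def accepted(char_class, candidates='abcd-'):
    """Return the candidate characters that the character class matches."""
    return ''.join(c for c in candidates if re.fullmatch(char_class, c))

# A hyphen at the start or end of the class is literal; between two
# characters it denotes a range.
for cls in ('[-]', '[abc-]', '[-abc]', '[ab-d]'):
    print(cls, '->', accepted(cls))
```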
Escape the hyphen.
[a-zA-Z0-9!$* \t\r\n\-]
UPDATE:
Never mind this answer - you can add the hyphen to the group but you don't have to escape it. See Konrad Rudolph's answer instead which does a much better job of answering and explains why.
It’s less confusing to always use an escaped hyphen, so that it doesn't have to be positionally dependent. That’s a \- inside the bracketed character class.
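As an illustration of the positional dependence, the sketch below compares three ways of adding the hyphen to the class from the question, again using Python's re as a stand-in for the asker's engine. The third variant is a deliberately wrong placement chosen for the example: between "$" and "*" the hyphen forms a range.

```python
import re

escaped  = r'[a-zA-Z0-9!$* \t\r\n\-]'   # escaped hyphen, position-independent
trailing = r'[a-zA-Z0-9!$* \t\r\n-]'    # unescaped, but last in the class
middle   = r'[a-zA-Z0-9!$-* \t\r\n]'    # "$-*" is parsed as a range here

for cls in (escaped, trailing, middle):
    matches_hyphen = bool(re.fullmatch(cls, '-'))
    matches_percent = bool(re.fullmatch(cls, '%'))
    print(cls, matches_hyphen, matches_percent)
```

The first two classes accept "-" and reject "%". The third does the opposite, because the range from "$" to "*" covers "%" but not the hyphen itself.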
But there’s something else to consider. Some of those enumerated characters should possibly be written differently. In some circumstances, they definitely should.
This comparison of regex flavors says that C♯ can use some of the simpler Unicode properties. If you’re dealing with Unicode, you should probably use the general category \p{L} for all possible letters, and maybe \p{Nd} for decimal numbers. Also, if you want to accommodate all that dash punctuation, not just HYPHEN-MINUS, you should use the \p{Pd} property. You might also want to write that sequence of whitespace characters simply as \s, assuming that’s not too general for you.
All together, that works out to a pattern of [\p{L}\p{Nd}\p{Pd}!$*] to match any one character from that set.
I’d likely use that anyway, even if I didn’t plan on dealing with the full Unicode set, because it’s a good habit to get into, and because these things often grow beyond their original parameters. Now when you lift it to use in other code, it will still work correctly. If you hard‐code all the characters, it won’t.
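What those general categories cover can be inspected without a regex engine. The sketch below is an approximation in Python, whose built-in re module does not support \p{...}, so it looks up each character's category with the standard unicodedata module instead. The function in_class is a hypothetical helper, not part of any regex API.

```python
import unicodedata

def in_class(ch):
    # Approximates the class [\p{L}\p{Nd}\p{Pd}!$*] for a single character:
    # any letter category (Lu, Ll, Lt, Lm, Lo), decimal digits, dash punctuation.
    category = unicodedata.category(ch)
    return category.startswith('L') or category in ('Nd', 'Pd') or ch in '!$*'

samples = ['a', '\u00e9', '7', '\u0663', '-', '\u2013', '!', '+', ' ']
for ch in samples:
    print(repr(ch), unicodedata.category(ch), in_class(ch))
```

The accented letter, the Arabic-Indic digit, and the en dash are all accepted alongside their ASCII counterparts, which a hard-coded [a-zA-Z0-9-] would miss.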
[-a-z0-9]+, [a-z0-9-]+ and also [a-z-0-9]+ are all the same. A hyphen between two ranges is treated as a literal symbol. The regex [a-z0-9-+()]+ also allows a hyphen.
Use "\p{Pd}" without quotes to match any type of hyphen. The '-' character is just one type of hyphen, which also happens to be a special character in Regex.
Is this what you are after?
MatchCollection matches = Regex.Matches(mystring, "-");
This simple example shows the issue I've run into, but I don't understand why...
I'm testing for the location of the first character that is either a lower or upper case letter, a single dash, or a period in a string parameter passed to me.
These two pattern matches appear to check the same thing, and yet run this code yourself and it will print a 0 then a 3:
PRINT PATINDEX ( '%[a-z,A-Z,-,.]%', '16-82')
PRINT PATINDEX ( '%[-,a-z,A-Z,.]%', '16-82')
I don't understand why it works only if the dash character is the first one we check for.
Is this a bug? Or working as designed and I missed something... I'm using SQL Server 2016, but I don't think that matters.
A dash within a character group may play either of the two roles:
It may denote the dash itself, like it does in the expression [-abc]
It may denote the "everything in between" operator, like it does in the expression [a-z].
In your particular example, the character group [a-z,A-Z,-,.] denotes the following:
Everything from a to z
Comma ,
Everything from A to Z
Everything from , to , (i.e. just the comma again).
Dot .
In fact, you probably wanted to write [-a-zA-Z.]
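The breakdown above can be reproduced with a small sketch. It uses Python's re module rather than SQL Server's LIKE syntax, on the assumption (true for these three classes) that both read the brackets the same way. patindex_like is a hypothetical helper that mimics PATINDEX's 1-based result.

```python
import re

def patindex_like(char_class, text):
    """1-based index of the first character matching char_class, or 0."""
    m = re.search(char_class, text)
    return m.start() + 1 if m else 0

# ",-," is a range from comma to comma, so the dash is never matched.
print(patindex_like(r'[a-z,A-Z,-,.]', '16-82'))

# A leading "-" is a literal dash.
print(patindex_like(r'[-,a-z,A-Z,.]', '16-82'))

# The suggested class, without the stray commas.
print(patindex_like(r'[-a-zA-Z.]', '16-82'))
```

This prints 0, 3 and 3, matching the two results reported in the question plus the corrected pattern.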
I used pg_trgm to check string matches and I am pretty happy with the results. But it is not perfectly the way I want it. I want searches like "poduto" to find "produtos" (the r is missing), and also "sofáa" to find "sofa". I am using PostgreSQL 9.6.
It does find "vermelho" when I type "vermelo" (the h is missing), and it does find "sofa" when I type "sof". It seems that only some letters in the middle can be left out, while a final letter can always be missing. I want to be able to omit any letter in the middle of the word, and also to be able to make "two mistakes", as in the case of sofáa and sofá (I used an accent and added one extra "a").
The solution is to lower pg_trgm.similarity_threshold (or pg_trgm.word_similarity_threshold if you are using <% or %>).
Then words with lower similarity will also be found.
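To get a feel for how the threshold interacts with typos, here is a simplified Python model of trigram similarity. It assumes single lowercase ASCII words, the padding described in the pg_trgm documentation (two spaces before, one after), and a score of shared trigrams over distinct trigrams. It is not PostgreSQL's implementation, and real columns holding several words or accented characters will score differently. The thresholds 0.5 and 0.3 are purely illustrative.

```python
def trigrams(word):
    """Trigram set of one word, padded with two leading and one trailing space."""
    padded = '  ' + word.lower() + ' '
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both words."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)

for typed, stored in (('vermelo', 'vermelho'), ('poduto', 'produtos')):
    score = similarity(typed, stored)
    # Illustrative thresholds only: a stricter one and a more lenient one.
    print(typed, stored, round(score, 3), score >= 0.5, score >= 0.3)
```

In this model a letter missing near the start of a word destroys more trigrams than one missing near the end, so "poduto" scores lower against "produtos" than "vermelo" does against "vermelho", and only the more lenient threshold lets both through.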
I desperately tried to find out what the symbol '\nquit' is... and I couldn't find any reference on the web.
What I tried to find is a complete list of all of those characters (\n, \p, \0, ...) but I couldn't find any.
cheers usche
Wikipedia has a list of C language escapes here.
As noted in my comment, I believe this represents the newline (linefeed) character \n followed by the word quit (which would be forced by the newline to the beginning of the next line of output). But in that case the string should be "double"-quoted rather than 'single'-quoted.
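The reading of \n as a single newline character followed by the word quit can be checked with a short sketch. It is written in Python, which shares the \n escape with C but, unlike C, accepts either quote style for strings, so it says nothing about the single-versus-double quote point above.

```python
s = "\nquit"

# One newline character plus the four letters of "quit".
print(len(s))
print(repr(s[0]), ord(s[0]))

# Splitting on line breaks shows "quit" starting on the next line.
print(s.splitlines())

# With the escape disabled (a raw string), the backslash and "n" stay separate.
raw = r"\nquit"
print(len(raw))
```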