Notepad++: how to use regex to find lines that have two capital letters in a row?

How can I use a regular expression to find strings containing two capital letters in a row?
^([A-Z\s]+)$

^.*[A-Z]{2}.*$ matches as follows
^ Beginning of the line
.* Any char for any number of times
[A-Z]{2} Two consecutive capital letters
.* Any char for any number of times
$ End of line
Find a live example here:
https://regex101.com/r/m2hPbh/1

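([A-Z][A-Z][a-z0-9]*) would find every place where 2 capital letters occur in a row, together with any lowercase letters or digits that follow them. Note that it does not include characters of the word that come before the two capitals.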
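As a rough check outside Notepad++, the line-level pattern can be tried in Python's re module, which uses the same syntax for these constructs. This is only a sketch: the sample text and the helper name lines_with_double_caps are made up for illustration.

```python
import re

# Pattern from the answer above: a whole line containing two consecutive capitals.
# re.MULTILINE makes ^ and $ match at the start and end of every line.
LINE_PATTERN = re.compile(r"^.*[A-Z]{2}.*$", re.MULTILINE)

def lines_with_double_caps(text):
    """Return every line of text that contains two capital letters in a row."""
    return LINE_PATTERN.findall(text)

sample = "hello World\nan ID number\nNASA launch\nno caps here"
print(lines_with_double_caps(sample))  # ['an ID number', 'NASA launch']
```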

Related

How can I check if number is out specified hexa range?

How can I write a regular expression that recognizes any expression of the form "\xdd", where dd represents a hexadecimal number out of the range 00-7F?
Regular expressions do not express numerical ranges, but sequences of characters in a character set. You have to express those ranges one character at a time.
So the hex digits are [0-9A-F] which describes the set of characters for one digit using the two ranges [0-9] and [A-F] (you'd also have to decide if lower case letters are permitted). For two digits you'd have to notice that the first digit is of a shorter range using only [0-7]. The combined result would be:
[0-7][0-9A-Fa-f]
Putting the other symbols in place we could get:
\\x[0-7][0-9A-Fa-f]
(Assuming \ is a meta-character that needs escaping).
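A minimal sketch of that pattern in Python, assuming the input is a string that contains a literal backslash followed by x and two hex digits. The helper name is_low_hex_escape is hypothetical.

```python
import re

# Backslash, 'x', a first digit 0-7, then any hex digit: covers \x00 through \x7F.
HEX_ESCAPE = re.compile(r"\\x[0-7][0-9A-Fa-f]")

def is_low_hex_escape(s):
    """Return True if s is exactly one backslash-x escape with digits in 00-7F."""
    return HEX_ESCAPE.fullmatch(s) is not None

print(is_low_hex_escape(r"\x7F"))  # True
print(is_low_hex_escape(r"\x80"))  # False
```

The first digit is restricted to [0-7] because every value from 00 to 7F has a leading digit in that set, while the second digit is unrestricted.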

Matlab read one digit at a time from text file

I have a file that contains byte values 0 or 1 that are formatted without any whitespace between, like 1010111101010010010101. I want to make a [1, 0, 1, ...] vector out of those, reading one digit at a time. How can I do that? I tried using fscanf(fileId,'%c') but I get ASCII codes instead of actual values. '%d' on the other hand reads the entire file as one number.
I also tried writing to file:
fprintf(file1,'%d ',matrix); % notice the space after %d
and reading
fscanf(file2,'%d');
but I get a Nx1 matrix and I want to keep it as 1xN.
I could transpose it to be horizontal, but I still need to add space between digits, and I don't want to do that if possible.
You can convert easily from ascii char code to integer format as follows:
text = fscanf(fileId,'%c') - '0' ;
Note that you will also pick up end-of-line characters this way if there are any.
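If you only have 0/1 in your file, using fileread will accomplish the same thing, and it likewise picks up EOL characters. fileread returns a row (1xN) character vector, so no transpose is needed to keep the result horizontal:
text = fileread('test.txt');
text = text - '0';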
You can also read the entire file with textread:
text = textread('test.txt','%s');
text = char(text) - '0' ;
Now lines are returned in a cell array with one row per line. char then converts the cell array to a regular char array. This will not capture EOL characters but char will append blank spaces (ascii code 32) if the lines are not all equal in length.
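Finally, you can also read line by line by looping and applying fgetl at each iteration until the function returns -1.
digits = [];
c = fgetl(fileId);
while ischar(c) % fgetl returns the number -1 at end of file
digits = [digits, c - '0']; % append this line's digits to the row vector
c = fgetl(fileId);
end
This avoids reading EOL characters and appending blank spaces, but you need to handle concatenating the data yourself, as done with digits above.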
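For comparison, the same subtract-the-code-of-'0' idea can be sketched in Python (not MATLAB). The helper name bits_from_text is made up, and filtering on '0' and '1' is an assumption about what the file may contain; it is what drops the end-of-line characters.

```python
def bits_from_text(text):
    """Convert a string of '0'/'1' characters to a list of ints, skipping anything else."""
    # ord(ch) - ord("0") maps the character '0' to 0 and '1' to 1,
    # the same trick as subtracting '0' from a char array in MATLAB.
    return [ord(ch) - ord("0") for ch in text if ch in "01"]

print(bits_from_text("1010\n0111\n"))  # [1, 0, 1, 0, 0, 1, 1, 1]
```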

What are the valid formats for numbers in MATLAB?

What are the valid formats for numbers in MATLAB? The following seem to be valid:
x=0;
x=0.;
x=0.0;
x=0e0;
x=0E0;
x=0000.00; % Trailing and leading zeros seem to be irrelevant
Are there other valid general number specifications? I can't find this in the documentation.
I believe this is the regex of floating-point number formats, valid in MATLAB:
^[-+]*([0-9]+|[0-9]*\.[0-9]+|[0-9]+\.[0-9]*)([eEdD][+-]?[0-9]+)?$
Compiled from here, and slightly modified for MATLAB:
added 'd' exponent character (as is common in FORTRAN, MATLAB's ancestor)
added uppercase exponent characters
added extra case in the required order before and after the decimal symbol
I'm pretty sure the locale can mess this up, e.g., the decimal separator . might be set to , as is common here in Europe. Oh well.
The regex in words:
string start, followed by
zero or more consecutive sign symbols, followed by
non-zero length string of consecutive digits, OR
possibly zero-length string of consecutive digits, followed by a dot, followed by non-zero length string of consecutive digits, OR
non-zero length string of consecutive digits, followed by a dot, followed by a possibly zero-length string of consecutive digits
optionally followed by the exponent part:
one of e, E, d or D, followed by
zero or one sign symbols, followed by
non-zero length string of consecutive digits
followed by string terminator
Note that this is for non-complex floating point values. For complex values, you'd have to
use the regex once for the real, once for the imaginary part
append [ij]{1} to the imaginary part (only lower case)
take care of spacing (\s*) and a [+-]{1} in between the two parts
take care of the fact that the imaginary part may appear alone, whereas the real part may not be followed by a trailing [+-]{1} unless an imaginary part comes after it.
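The floating-point pattern above can be exercised against the literals from the question with a short Python sketch. This only tests the regex itself, not MATLAB's parser, and the helper name looks_like_number is hypothetical.

```python
import re

# The regex from the answer, unchanged.
NUMBER = re.compile(
    r"^[-+]*([0-9]+|[0-9]*\.[0-9]+|[0-9]+\.[0-9]*)([eEdD][+-]?[0-9]+)?$"
)

def looks_like_number(s):
    """Return True if the whole string s matches the pattern."""
    return NUMBER.fullmatch(s) is not None

for literal in ["0", "0.", "0.0", "0e0", "0E0", "0000.00", ".5", "1d3", ".", "1e"]:
    print(repr(literal), looks_like_number(literal))
```

The first eight literals match; a lone dot and a bare exponent marker such as 1e do not, because each alternative requires at least one digit and the exponent part requires digits after the letter.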

MATLAB text input delimiters

I am trying to read a text file into MATLAB where the text file has been designed so that the columns are right-aligned, so that my columns look like,
  3   6     10.5
 13  12      9.5
104   5   200000
This has given me two situations that I'm not sure how to handle in MATLAB: the first is the whitespace before the first data value, and the other is the variable number of whitespace characters in each row, which seems to be beyond my knowledge of textscan. I'm tempted to use sed to reformat the text file, but I'm sure this is trivial to someone. Is there a way that I can use an arbitrary amount of whitespace as the delimiter (and have the line start with the delimiter)?
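Use regexp on every line, skipping any whitespace and capturing each run of non-whitespace characters:
M = regexp(str, '\s*(\S+)','tokens') % one token per field, still as text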
Use the load command:
l = load('C:\myFile.txt')
It will work as long as the file contains only numbers and every row has the same number of columns.
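The underlying idea, treating any run of whitespace as one delimiter and ignoring leading whitespace, can be sketched in Python (not MATLAB). The helper name parse_rows and the sample text are made up for illustration.

```python
def parse_rows(text):
    """Split each non-empty line on runs of whitespace and convert the fields to float."""
    # str.split() with no argument ignores leading/trailing whitespace
    # and treats consecutive whitespace characters as a single separator.
    return [[float(field) for field in line.split()]
            for line in text.splitlines() if line.strip()]

sample = "  3   6     10.5\n 13  12      9.5\n104   5   200000\n"
print(parse_rows(sample))
# [[3.0, 6.0, 10.5], [13.0, 12.0, 9.5], [104.0, 5.0, 200000.0]]
```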

Detecting Capital Letter Strings in a Larger String

Is there a nice and clean way to find strings of capital letters, 2-4 characters in length, within a larger string in MATLAB? For example, let's say I have a string...
stringy = 'I imagine I could FLY';
Is there a nice way to just extract the FLY portion of the string? Currently I'm using the upper() function to identify all the characters in the string that are upper case like this...
for count = 1:length(stringy)
    if upper(stringy(count))==stringy(count)
        isupper(count)=1;
    else
        isupper(count)=0;
    end
end
And then, I'm just going through the binary vector and identifying when there are 2-4 1's in a row. This method is working, but I'm wondering if there is a cleaner way to be doing this. Thanks!
You can use regular expressions for this. The regular expression [A-Z]{2,4} will search for 2-4 capital letters in a string.
The corresponding matlab function is called regexp.
regexp(string,pattern) returns the starting indices in string of every place where pattern matches.
For your pattern I have two suggestions:
\<[A-Z]{2,4}\>. This searches for whole words that consist of 2-4 capital letters (so it doesn't grab TOUCH below):
stringy = 'I imagine I could FLY and TOUCH THE SKY';
regexp(stringy,'\<[A-Z]{2,4}\>') % returns 19, 33, 37 ('FLY','THE','SKY')
(Edit: Matlab uses \< and \> for word boundaries not the standard \b).
If you have strings where case can be mixed within a word and you want to extract those, try (?<![A-Z])[A-Z]{2,4}(?![A-Z]) (which means "2-4 capital letters that aren't surrounded by capital letters"):
stringy = 'I image I could FLYandTouchTHEsky';
% returns 17 and 28 ('FLY', 'THE')
regexp(stringy,'(?<![A-Z])[A-Z]{2,4}(?![A-Z])')
% note '\<[A-Z]{2,4}\>' wouldn't match anything here since it looks for
% *whole words* that consist of 2-4 capital letters only.
% 'FLYandTouchTHEsky' doesn't satisfy this.
Pick the regex based on what behaviour you want to occur.
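For readers outside MATLAB, both behaviours can be sketched in Python. Python's re module does not use \< and \>, so the whole-word version is written with \b here; the lookaround version is unchanged. The constant names are made up for illustration.

```python
import re

# Whole words made of 2-4 capitals (Python spelling of the \< ... \> pattern).
WHOLE_WORD = re.compile(r"\b[A-Z]{2,4}\b")
# Runs of 2-4 capitals that are not part of a longer run of capitals.
RUN = re.compile(r"(?<![A-Z])[A-Z]{2,4}(?![A-Z])")

print(WHOLE_WORD.findall("I imagine I could FLY and TOUCH THE SKY"))  # ['FLY', 'THE', 'SKY']
print(RUN.findall("I image I could FLYandTouchTHEsky"))               # ['FLY', 'THE']
print(WHOLE_WORD.findall("I image I could FLYandTouchTHEsky"))        # []
```

Note that Python reports 0-based match positions (for example via re.finditer), whereas the MATLAB indices quoted above are 1-based.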