Regex matching white spaces and non-characters in Perl

I'm looking for a pattern that matches the following:
Whitespace at the start, followed by characters, then a decimal number like 3.2, and then symbols like $ and #.
For example: " bash-3.2#"
My code:
while(@wait = $t->waitfor('/^[\s]bash\-3\.2[.] $/i'))
How do I do this?
Thanks,
Sharath

Whitespace at start
^\s
followed by characters
\w+
and then a decimal number like 3.2
-?\d+\.\d+
and then followed by symbols like $ and #.
[\$\#]
So, something like this:
/^\s\w+-?\d+\.\d+[\$\#]/
I assumed that the characters are typical word characters and that the number could be negative.
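For comparison, here is a minimal sketch of the same pattern in Python's re module rather than Perl. The function name looks_like_prompt is made up for illustration, and the pattern keeps the assumptions stated above (word characters, optional minus sign).

```python
import re

# Same pieces as the Perl pattern above:
# ^\s        one whitespace character at the start
# \w+        one or more word characters
# -?\d+\.\d+ optional minus sign, then a decimal number such as 3.2
# [$#]       a trailing $ or #
PROMPT_RE = re.compile(r"^\s\w+-?\d+\.\d+[$#]")

def looks_like_prompt(line):
    """Return True if the line starts like ' bash-3.2#' (hypothetical helper)."""
    return PROMPT_RE.search(line) is not None

print(looks_like_prompt(" bash-3.2#"))  # True
print(looks_like_prompt("bash-3.2#"))   # False, no leading whitespace
```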

Related

Regex expression for detecting 2 consecutive words when first word starts with #

I wanted to know the regex expression that detects names starting with #. For example, in the sentence "Hi #Steve Rogers, how are you?", I want to extract #Steve Rogers using regex. I tried using Pattern.compile("#\\s*(\\w+)").matcher(text), but only "#Steve" gets detected. What else should I use?
Thanks
Try (#[\w\s]+)
It will only capture word characters and whitespace after the #.
See example at https://regex101.com/r/4Pv9bu/1
If you don't want to match a # sign that is followed only by a space, and a second word may follow the first one:
(?<!\S)#\w+(?:\h+\w+)?
Explanation
(?<!\S) Assert a whitespace boundary to the left
# Match literally
\w+ Match 1+ word characters
(?:\h+\w+)? Optionally match 1+ horizontal whitespace chars and 1+ word chars
In Java
String regex = "(?<!\\S)#\\w+(?:\\h+\\w+)?";
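A rough Python equivalent, as a sketch only. Python's re module has no \h, so [ \t] stands in for horizontal whitespace here; that is an approximation, not an exact match for Java's \h. The name find_names is made up for the example.

```python
import re

# (?<!\S)          no non-whitespace character directly to the left
# #\w+             a # followed by one or more word characters
# (?:[ \t]+\w+)?   optionally, spaces/tabs and a second word
NAME_RE = re.compile(r"(?<!\S)#\w+(?:[ \t]+\w+)?")

def find_names(text):
    """Return every #name (one or two words) found in text."""
    return NAME_RE.findall(text)

print(find_names("Hi #Steve Rogers, how are you?"))  # ['#Steve Rogers']
print(find_names("just a # sign"))                   # []
```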

PATINDEX does not recognize dot and comma

I have a column that should contain phone numbers but it contains whatever the user wanted. I need to create an update to remove all the characters after an invalid character.
To do this I am using a pattern in PATINDEX('%[^0-9+-/()" "]%', [MobilNr]), and it seemed to work until I hit numbers such as +1235, 36446, where to my surprise the result is 0 instead of 6. The result is also 0 if the number contains a dot.
Does PATINDEX ignore dot (".") and comma (",")? Are there other characters that PATINDEX will ignore?
It's not that PATINDEX ignores the comma and the dot; it's your pattern that creates the problem.
Inside brackets, the hyphen (-) has a special meaning: it is an operator that denotes an inclusive range. Just as 0-9 means all digits from 0 to 9, +-/ means all the characters from + to / (inclusive). The comma and the dot fall within this range, so they are treated as valid characters, and that is why you get this result.
Fixing the pattern is easy: move the hyphen to the end of the bracket expression, where it is treated as a literal character:
SELECT PATINDEX('%[^0-9/()" "+-]%', '+1235, 36446') -- Result: 6
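The range effect can be reproduced outside SQL Server. The sketch below uses Python's re module, whose bracket expressions treat an unescaped hyphen between two characters as a range in the same way. The helper patindex_like is made up for illustration and only mimics the 1-based result; it compares by code point, whereas SQL Server range ordering depends on the collation.

```python
import re

def patindex_like(char_class, text):
    """1-based position of the first character matching char_class, or 0."""
    m = re.search(char_class, text)
    return m.start() + 1 if m else 0

# Characters covered by the accidental range +-/ (by code point)
print([chr(c) for c in range(ord("+"), ord("/") + 1)])  # ['+', ',', '-', '.', '/']

broken = r'[^0-9+-/()" ]'  # +-/ is read as a range, so ',' and '.' count as valid
fixed = r'[^0-9/()" +-]'   # hyphen last, so it is a literal

print(patindex_like(broken, "+1235, 36446"))  # 0
print(patindex_like(fixed, "+1235, 36446"))   # 6
```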

Using sed to replace a number located between two other numbers

I need to replace a numeric value, that occurs in a specific line of a series of config files in a pattern like this:
string number_1 number_to_replace number_2
I want to obtain something like this:
string number_1 number_replaced number_2
The difficulties I encountered are:
number_1 or number_2 can be equal to number_to_replace, so a simple replacement is not possible.
number_1 and number_2 vary between config files so I don't know them in advance.
The closest attempt I got until now is:
echo "field 4 4 4" | sed 's/\s4\s/3/'
Which outputs:
field34 4
This is close. Since I want to replace the middle number, I added another "\s" to try to use the known fact that the line starts with a character:
echo "field 4 4 4" | sed 's/\s\s4\s/3/'
Which gives:
field 4 4 4
So, nothing is replaced this time. How can I proceed? A somewhat detailed explanation would be ideal, because my knowledge of replacing expressions that involve patterns is nearly zero.
Thanks.
You can do something like below, which matches your exact sequence of digits as in the example. You could replace 3 with any digit of your choice.
sed 's/\([0-9]\{1,\}\)[[:space:]]\([0-9]\{1,\}\)[[:space:]]\([0-9]\{1,\}\)/\1 3 \3/'
Notice that I've used the POSIX bracket expression [[:space:]] to match the whitespace character, which should be supported in any variant of sed you are using. Note that \s is supported only in the GNU variants.
The regex matches a number (one or more digits) followed by a whitespace character, then a second number and a whitespace character, and then a third number. The captured groups are numbered starting from \1. Since your intention is to replace the 2nd number, you keep \1 and \3 and put the value of your choice between them.
If the extra escapes make it hard to read, use the -E flag for extended regex support. I've used the default BRE version.
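The same capture-and-rebuild idea, sketched in Python for readers who want to try it outside sed. The function name replace_middle and the sample lines are made up; count=1 mirrors sed's behaviour of replacing only the first match when no g flag is given.

```python
import re

# Three numbers separated by single whitespace characters,
# each captured in its own group (like \1, \2, \3 in sed).
THREE_NUMBERS = re.compile(r"(\d+)\s(\d+)\s(\d+)")

def replace_middle(line, new_value):
    """Keep the first and third numbers, swap in new_value for the second."""
    return THREE_NUMBERS.sub(
        lambda m: f"{m.group(1)} {new_value} {m.group(3)}", line, count=1
    )

print(replace_middle("field 4 4 4", 3))     # field 4 3 4
print(replace_middle("speed 10 10 25", 7))  # speed 10 7 25
```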

Using sed to eliminate all lines that do not match the desired form

I have a single column csv that looks something like this:
KFIG
KUNV
K~LK
K7RT
3VGT
Some of the datapoints are garbled in transmission. I need to keep only the entries that begin with a capital letter, where each of the other three characters can be a capital letter OR a number. For example, in the list above I would have to delete K~LK and 3VGT.
I know that to delete all but capital letters I can write
sed -n '/[A-Z]\{4,\}/p'
I just want to adjust this so that the last three characters can be capital letters or numbers. Any help would be appreciated.
Just use:
sed -n '/[A-Z][A-Z0-9]\{3,\}/p'
However, if these identifiers are really all that there is in the file, I would propose the following command (it will assure that the whole line is matched, so it will reject for example identifiers more than 4 characters long):
sed -n '/^[A-Z][A-Z0-9]\{3\}$/p'
^ means "match zero-length string at the beginning of line";
\{3\} means "match exactly 3 occurrences of the previous atom", the previous atom being [A-Z0-9];
$ means "match zero-length string at the end of line".
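To check the anchored pattern against the sample data without sed, here is a small Python sketch. The name keep_valid is made up for the example; fullmatch plays the role of the ^ and $ anchors.

```python
import re

# One capital letter followed by exactly three capitals or digits.
VALID_ID = re.compile(r"[A-Z][A-Z0-9]{3}")

def keep_valid(lines):
    """Return only the entries that match the whole pattern."""
    return [line for line in lines if VALID_ID.fullmatch(line)]

print(keep_valid(["KFIG", "KUNV", "K~LK", "K7RT", "3VGT"]))
# ['KFIG', 'KUNV', 'K7RT']
```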

Perl pattern matching for a specific line

I'm new to Perl and Regex. I need to read a line in an Arabic text file with IBM864 encoding by using a regular expression specific to that line in file. The line structure is as follows:
16 whitespace characters, 4 Arabic characters, 36 whitespace characters, 3 digits, 2 whitespace characters, and the \n escape character.
Please advise.
Thank you.
Well, on the face of it, what you want is a regex like this:
/ ^ \s{16} \p{Arabic}{4} \s{36} \d{3} \s{2} $ /x
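For a quick experiment outside Perl, the same structure can be sketched in Python. Python's built-in re module does not support \p{Arabic}, so the character class below is a hand-written approximation using the main Arabic block and the two presentation-form blocks; it is an assumption, not an exact equivalent of the Perl script property. The sketch also assumes the line has already been decoded to a Unicode string.

```python
import re

# Approximation of \p{Arabic}: Arabic block plus Presentation Forms A and B.
ARABIC = "[\u0600-\u06FF\uFB50-\uFDFF\uFE70-\uFEFF]"

# 16 whitespace, 4 Arabic characters, 36 whitespace, 3 digits, 2 whitespace.
# $ also matches just before a trailing newline.
LINE_RE = re.compile(r"^\s{16}" + ARABIC + r"{4}\s{36}\d{3}\s{2}$")

def is_target_line(line):
    """Return True if the decoded line has the structure described above."""
    return LINE_RE.match(line) is not None

sample = " " * 16 + "\u0633\u0644\u0627\u0645" + " " * 36 + "123" + "  \n"
print(is_target_line(sample))  # True
```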