lexer rule, "anything but" for a multi-character string - lex

In flex, how can I define a lexer rule that matches anything but $$? For a single character, it is defined as:
[^\$]
For $$, is it equal to [^\$\$]?

No. A character class always matches exactly one character, so [^\$\$] is the same as [^\$]. You have the single character correct, now just double it: [^\$][^\$]. This accepts any two characters, neither of which is '$'. To also allow a single '$', add it as alternatives: [^\$][^\$]|\$[^\$]|[^\$]\$.
This ends up as a DFA anyway, so there are no efficiency concerns. If you need the pattern more than once, give it a name in the definitions section so you only write it once.
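As a quick sanity check of the three-way alternation, here is a minimal Python sketch. It uses Python's re module as a stand-in for flex (an assumption: the character-class syntax used here behaves the same in both), and the names NOT_DOLLAR_DOLLAR and accepted_pairs are made up for the illustration.

```python
import re
from itertools import product

# The alternation from the answer: any two characters except "$$".
# Inside a character class "$" is literal, so it needs no backslash here.
NOT_DOLLAR_DOLLAR = re.compile(r"[^$][^$]|\$[^$]|[^$]\$")

def accepted_pairs(alphabet):
    """Return every two-character string over `alphabet` that the pattern accepts."""
    pairs = ("".join(p) for p in product(alphabet, repeat=2))
    return {s for s in pairs if NOT_DOLLAR_DOLLAR.fullmatch(s)}

if __name__ == "__main__":
    # Every pair over the alphabet "$", "a", "b" is listed except "$$".
    print(sorted(accepted_pairs("$ab")))
```

Enumerating a small alphabet like this only tests two-character inputs; it says nothing about how the rule interacts with other rules in a real flex scanner.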

Not able to understand a command in perl

I need help understanding exactly what the command below does:
$abc{hier} =~ s#/tools.*/dfII/?.*##g;
$abc{hier} contains the path "/home/test1/test2/test3".
Can someone please explain what this command does? Thanks.
s/PATTERN/REPLACEMENT/ is Perl's substitution operator. It searches a string for text that matches the regex PATTERN and replaces it with REPLACEMENT.
By default, the substitution operator works on $_. To tell it to work on a different variable, you use the binding operator - =~.
The default delimiter used by the substitution operator is a slash (/) but you can change that to any other character. This is useful if your PATTERN or your REPLACEMENT contains a slash. In this case, the programmer has used # as the delimiter.
To recap:
$abc{hier} =~ s#PATTERN#REPLACEMENT#;
means "look for text in $abc{hier} that matches PATTERN and replace it with REPLACEMENT".
The substitution operator also has various options that change its behaviour. They are added by putting letters after the final delimiter. In this case we have a g. That means "make the substitution global" - or match and change all occurrences of PATTERN.
In your case, the REPLACEMENT string is empty (we have two # characters next to each other). So we're replacing the PATTERN with nothing - effectively deleting whatever matches PATTERN.
So now we have:
$abc{hier} =~ s#PATTERN##g;
And we know it means, "in the variable $abc{hier}, look for any string that matches PATTERN and replace it with nothing".
The last thing to look at is the PATTERN (or regular expression - "regex"). You can get the full definition of regexes in perldoc perlre. But to explain what we're using here:
/tools : is the fixed string "/tools"
.* : is zero or more of any character
/dfII : is the fixed string "/dfII"
/? : is an optional slash character
.* : is (again) zero or more of any character
So, basically, we're removing bits of a file path from a value that's stored in a hash.
This =~ means "Do a regex operation on that variable."
(Actually, as ikegami correctly reminds me, it is not necessarily only regex operations, because it could also be a transliteration.)
The operation in question is s#something#else#, which means replace the "something" with something "else".
The g at the end means "Do it for all occurrences of something."
Since the "else" is empty, the replacement has the effect of deleting.
The "something" is a definition according to regex syntax, roughly it means "Starting with '/tools' and later containing '/dfII', followed pretty much by anything until the end."
Note, the regex ends with /?.*. In detail, this means "a slash (/), or maybe not (?), and then absolutely anything (.) any number of times, including zero (*)". Strictly speaking, the "slash or not" part is unnecessary when it is followed by "anything, any number of times", because "anything" includes a slash and "any number of times" includes zero or one. I.e. the /? could be omitted without changing the behaviour.
(Thanks ikegami for confirming.)
$abc{hier} =~ s#/tools.*/dfII/?.*##g;
The above command uses a regular expression to strip trailing /tools.*/dfII and /tools.*/dfII/.* from the value of the hier member of the %abc hash.
It is pretty basic Perl except for the non-standard regular expression delimiters (# instead of the standard /). They avoid the need to escape / inside the regular expression (s/\/tools.*\/dfII\/?.*//g).
My personal preferred style would make it s{/tools.*/dfII/?.*}{}g.
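To see the effect on concrete values, here is a small Python sketch that applies the same pattern with re.sub, which replaces every match by default and so corresponds to the /g flag. The second path is a made-up example (the question only gives a path with no /tools in it), and strip_tools_suffix is a hypothetical helper name.

```python
import re

# Same pattern as in the Perl s#/tools.*/dfII/?.*##g.
PATTERN = re.compile(r"/tools.*/dfII/?.*")

def strip_tools_suffix(path):
    """Delete from the first '/tools' to the end of the line, provided '/dfII' follows it."""
    return PATTERN.sub("", path)

if __name__ == "__main__":
    # The path from the question has no '/tools', so nothing is removed.
    print(strip_tools_suffix("/home/test1/test2/test3"))
    # Hypothetical path that does match: everything from '/tools' on is removed.
    print(strip_tools_suffix("/home/user/tools/cadence/dfII/bin"))
```

With the path from the question the substitution is a no-op, which may be why the command looked puzzling in the first place.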

Get prev directory path in a variable in linux

I am trying to store the parent directory of a given directory in a variable in a Linux shell script, but I can't get it to work.
MN_CURR=/home/sshekhar/Desktop
MN_PREV=`$MN_CURR/..`
echo " Displayng $MN_PREV"
I am using CentOS. Can anyone please help?
Following on from my comment, when using POSIX shell, while the parameter expansions are limited compared to a more advanced shell such as bash, ksh, or zsh, POSIX shell does provide expansions to handle string length and substring removal.
In your case you want to remove the last component of the path (the suffix beginning with '/') leaving the parent directory. For that you can use:
MN_PREV=${MN_CURR%/*}
(which will remove all characters from the right -- up to and including the last '/')
The reference documentation for the POSIX shell parameter expansions can be found at POSIX Programmers Guide - 2.6.2 Parameter Expansion. The expansions concerning string length and substring removal are:
${#parameter}
String Length. The length in characters of the value of parameter shall be substituted. If parameter is '*' or '#', the result of the expansion is unspecified. If parameter is unset and set -u is in effect, the expansion shall fail.
${parameter%[word]}
Remove Smallest Suffix Pattern. The word shall be expanded to produce a pattern. The parameter expansion shall then result in parameter, with the smallest portion of the suffix matched by the pattern deleted. If present, word shall not begin with an unquoted '%'.
${parameter%%[word]}
Remove Largest Suffix Pattern. The word shall be expanded to produce a pattern. The parameter expansion shall then result in parameter, with the largest portion of the suffix matched by the pattern deleted.
${parameter#[word]}
Remove Smallest Prefix Pattern. The word shall be expanded to produce a pattern. The parameter expansion shall then result in parameter, with the smallest portion of the prefix matched by the pattern deleted. If present, word shall not begin with an unquoted '#'.
${parameter##[word]}
Remove Largest Prefix Pattern. The word shall be expanded to produce a pattern. The parameter expansion shall then result in parameter, with the largest portion of the prefix matched by the pattern deleted.
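For readers who want to check their understanding of the four removal forms, the following Python sketch emulates them for the two patterns used with paths, /* and */. It is not shell code and does not implement general glob patterns; the function names are invented for the illustration.

```python
def remove_smallest_suffix(value, sep="/"):
    """Emulate ${value%/*}: drop the last separator and what follows it."""
    head, found, _ = value.rpartition(sep)
    return head if found else value

def remove_largest_suffix(value, sep="/"):
    """Emulate ${value%%/*}: drop the first separator and what follows it."""
    head, _, _ = value.partition(sep)
    return head

def remove_smallest_prefix(value, sep="/"):
    """Emulate ${value#*/}: drop everything up to and including the first separator."""
    _, found, tail = value.partition(sep)
    return tail if found else value

def remove_largest_prefix(value, sep="/"):
    """Emulate ${value##*/}: drop everything up to and including the last separator."""
    _, _, tail = value.rpartition(sep)
    return tail

if __name__ == "__main__":
    mn_curr = "/home/sshekhar/Desktop"
    print(remove_smallest_suffix(mn_curr))  # the parent directory, as with ${MN_CURR%/*}
    print(remove_largest_prefix(mn_curr))   # the last component, as with ${MN_CURR##*/}
```

As in the shell, a value that contains no '/' is returned unchanged by all four functions.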

perl increment number in string with evaluation modifier

I'm trying to increment:
Text_1_string(0)
to
Text_1_string(1)
and so on.
Note that I only want to increment the number in the parenthesis.
I've used:
$name =~ s/\(([0-9]+)\)/$1 + 1/e;
but it turns out as:
Text_1_string1
and I don't understand why. The group captured is the number, it shouldn't replace the parentheses.
It replaces the whole pattern that it matched, not only what is also captured. So you do need to put back the parens:
$name =~ s/\(([0-9]+)\)/'('.($1 + 1).')'/e;
Since the replacement part is evaluated as code, it needs to be normal Perl code, thus the quotes and concatenation, and parentheses for precedence.
To add, there are patterns that need not be put back in the replacement part: lookahead and lookbehind assertions. Like common anchors, these are zero-width assertions, so they do not consume what they match -- you only "look":
$name =~ s/(?<=\() ([0-9]+) (?=\))/$1 + 1/xe;
The lookbehind can't be of variable length (like \w+); it takes only a fixed-length pattern.
The (?<=...) asserts that the (fixed-length) pattern in its parentheses (which do not capture!) must precede the number, while (?=...) asserts that the pattern in its parens must follow, for the whole pattern to match.
Often very useful is the lookbehind-type construct \K, which makes the engine keep in the string what it had matched up to that point (instead of "consuming" it); so it "drops" previous matches, much like the (?<=...) form:
$name =~ s/\(\K ([0-9]+) (?=\))/$1 + 1/xe;
This is also more efficient. While it is also termed a "lookbehind" in documentation, there are in fact distinct differences in behavior. See this post and comments. Thanks to ikegami for a comment.
All these are positive lookarounds; there are also negative ones, asserting that given patterns must not be there for the whole thing to match.
A bit of an overkill in this case but a true gift in some other cases.
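The difference between the three variants can be reproduced outside Perl. The sketch below uses Python's re.sub with a function as the replacement, which plays the role of the /e modifier; count=1 mirrors the absence of /g. The function names are hypothetical, and the \K form is left out because Python's standard re module does not support \K.

```python
import re

def increment_losing_parens(name):
    """The replacement covers the whole match, so the parentheses are lost."""
    return re.sub(r"\(([0-9]+)\)", lambda m: str(int(m.group(1)) + 1), name, count=1)

def increment_keeping_parens(name):
    """Put the parentheses back in the replacement."""
    return re.sub(r"\(([0-9]+)\)", lambda m: f"({int(m.group(1)) + 1})", name, count=1)

def increment_with_lookarounds(name):
    """Lookbehind and lookahead assert the parentheses without consuming them."""
    return re.sub(r"(?<=\()([0-9]+)(?=\))", lambda m: str(int(m.group(1)) + 1), name, count=1)

if __name__ == "__main__":
    for fn in (increment_losing_parens, increment_keeping_parens, increment_with_lookarounds):
        print(fn.__name__, fn("Text_1_string(0)"))
```

Only the first function reproduces the behaviour reported in the question; the other two leave the parentheses in place.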

RegexKitLite not matching square brackets

I'm trying to match usernames from a file. It's kind of like this:
username=asd123 password123
and so on.
I'm using the regular expression:
username=(.*) password
To get the username. But it doesn't match if the username is, say, and[ers] or similar. It won't match the brackets. Any solution for this?
I would probably use the regular expression:
username=([a-zA-Z0-9\[\]]+) password
Or something similar. Notes regarding this:
Escaping the brackets ensures you get a literal bracket.
The a-zA-Z0-9 spans match alphanumeric characters (as per your example, which was alphanumeric). So this would match any alphanumeric character or brackets.
The + modifier ensures that you match at least one character. The * (Kleene star) will allow zero repetitions, meaning you would accept an empty string as a valid username.
I don't know if RegexKitLite allows POSIX classes. If it does, you could use [:alnum:] in place of a-zA-Z0-9. The one I gave above should work if it doesn't, though.
Alternatively, I would disallow brackets in usernames. They're not really needed, IMO.
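Your regular expression is correct. Alternatively, you may try this one:
username=([][[:alpha:]]*) password
[][[:alpha:]] means that ], [ and [:alpha:] are contained within the brackets. Note that [:alpha:] covers letters only; use [:alnum:] if usernames can contain digits, as in asd123.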
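A rough way to compare the original pattern with the explicit character class is the Python sketch below. It runs on Python's re module, not on the engine RegexKitLite uses, so it only shows how the patterns themselves behave; the sample line and the function names are assumptions made for the example, and the POSIX class variant is omitted because Python's re does not support [:alpha:].

```python
import re

# Hypothetical sample line in the format described in the question.
LINE = "username=and[ers] password123"

def username_greedy(line):
    """Original pattern from the question: (.*) matches anything, brackets included."""
    m = re.search(r"username=(.*) password", line)
    return m.group(1) if m else None

def username_class(line):
    """Explicit class: one or more alphanumerics or square brackets."""
    m = re.search(r"username=([a-zA-Z0-9\[\]]+) password", line)
    return m.group(1) if m else None

if __name__ == "__main__":
    print(username_greedy(LINE))
    print(username_class(LINE))
```

Both patterns capture the bracketed username here, which supports the remark that the original expression is correct and suggests the problem lies elsewhere, for instance in how the pattern string is escaped in the calling code.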

Funky 'x' usage in perl

My usual 'x' usage was :
print("#" x 78, "\n");
Which repeats the string "#" 78 times. But recently I came across this code:
while (<>) { print if m{^a}x }
Which prints every line of input starting with an 'a'. I understand the regexp matching part (m{^a}), but I really don't see what that 'x' is doing here.
Any explanation would be appreciated.
It's a modifier for the regex. The x modifier tells perl to ignore whitespace and comments inside the regex.
In your example code it does not make a difference because there are no whitespace or comments in the regex.
The "x" in your first case is the repetition operator, which takes the string as its left argument and the number of times to repeat as its right argument. Perl 6 can replicate lists using the "xx" repetition operator.
Your second example uses the regular expression m{^a}x. While you may use many different types of delimiters, neophytes may like to use the familiar notation, which uses a forward slash: m/^a/x
The "x" in a regex is called a modifier or a flag and is but one of many optional flags that may be used. It is used to ignore whitespace in the regex pattern, but it also allows the use of normal comments inside. Because regex patterns can get really long and confusing, using whitespace and comments is very helpful.
Your example is very short (all it says is that the line starts with "a"), so you probably wouldn't need whitespace or comments, but you could if you wanted to.
Example:
m/^a # first letter is an 'a'
# <-- you can put more regex on this line because whitespace is ignored
# <-- and more here if you want
/x
In this use case, 'x' is a regex modifier which "Extends your pattern's legibility by permitting whitespace and comments", according to the Perl documentation. However, it seems redundant here.
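The same two meanings exist in other languages under different spellings. As a loose analogy rather than Perl code, this Python sketch shows string repetition with * and a commented pattern compiled with re.VERBOSE, which plays the role of the x modifier; RULE, STARTS_WITH_A and lines_starting_with_a are names invented for the example.

```python
import re

# Counterpart of "#" x 78 in Perl: string repetition uses * in Python.
RULE = "#" * 78

# Counterpart of m{^a}x: re.VERBOSE ignores whitespace and # comments in the pattern.
STARTS_WITH_A = re.compile(
    r"""
    ^a      # the line starts with an 'a'
    """,
    re.VERBOSE,
)

def lines_starting_with_a(lines):
    """Keep only the lines that match the commented pattern."""
    return [line for line in lines if STARTS_WITH_A.match(line)]

if __name__ == "__main__":
    print(RULE)
    print(lines_starting_with_a(["apple", "banana", "avocado"]))
```

Without the flag, the whitespace and the comment would be treated as literal parts of the pattern, which is the difference the modifier makes.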