Is a character in a variable a letter or a number?

I would like to find out how to tell whether a character in a variable is a letter or a number.
For example, suppose I use the code ABC123.
How would I find out whether a variable follows that pattern, so that an inputted code of DNM567 would print "correct"
but a code of DNM56T would print "incorrect"?
Many thanks

You can use regular expressions, or linearly scan the character array to ensure that no letter comes after a number.
More information about the question would be helpful.
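
The linear scan mentioned above could look roughly like this in Python. This is only a sketch: the function name matches_code and the assumption that a valid code is exactly three ASCII letters followed by three ASCII digits are mine, taken from the ABC123 / DNM567 examples in the question.

```python
import string


def matches_code(code, letters=3, digits=3):
    """Return True if code is `letters` ASCII letters followed by `digits` ASCII digits."""
    if len(code) != letters + digits:
        return False
    head, tail = code[:letters], code[letters:]
    # Every leading character must be a letter, every trailing one a digit.
    return (all(c in string.ascii_letters for c in head)
            and all(c in string.digits for c in tail))


for candidate in ("DNM567", "DNM56T"):
    print("correct" if matches_code(candidate) else "incorrect")
```

Checking membership in string.ascii_letters and string.digits keeps the test to plain ASCII; str.isalpha() and str.isdigit() would also accept non-ASCII letters and digits.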

You can use a regular expression:
// requires: using System.Text.RegularExpressions;
if (Regex.IsMatch(myString, "^[A-Za-z]{3}[0-9]{3}$"))
{
    // exactly three letters followed by exactly three digits
}
Edit: this is C#, but regular expressions are available in almost any language out there.


PATINDEX incorrect result when looking for dash character "-"

This simple example shows the issue I've run into, but I don't understand why...
I'm testing for the location of the first character that is either a lower or upper case letter, a single dash, or a period in a string parameter passed to me.
These two pattern matches appear to check the same thing, and yet run this code yourself and it will print a 0 then a 3:
PRINT PATINDEX ( '%[a-z,A-Z,-,.]%', '16-82')
PRINT PATINDEX ( '%[-,a-z,A-Z,.]%', '16-82')
I don't understand why it works only if the dash character is the first one we check for.
Is this a bug? Or working as designed and I missed something... I'm using SQL Server 2016, but I don't think that matters.
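A dash within a character group may play either of two roles:
It may denote the dash itself, like it does in the expression [-abc]
It may denote the "everything in between" operator, like it does in the expression [a-z].
In your particular example, the character group [a-z,A-Z,-,.] denotes the following:
Everything from a to z
Comma ,
Everything from A to Z
Everything from , to , (i.e. just the comma again, because ,-, is read as a range)
Dot .
So the dash itself is never part of the group, which is why the first pattern returns 0.
In fact, you probably wanted to write [-a-zA-Z.], with the dash first so that it is taken literally and without the commas, which are not separators inside a character group.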
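
The same reading of the dash can be reproduced outside SQL Server. The sketch below uses Python's re module as an analogy only: it is not T-SQL, and the helper first_index is a made-up stand-in that mimics PATINDEX's 1-based result (0 when nothing matches). Python's character classes happen to parse ,-, as a range in the same way.

```python
import re


def first_index(char_class, text):
    """Hypothetical PATINDEX look-alike: 1-based position of the first match, else 0."""
    match = re.search(char_class, text)
    return match.start() + 1 if match else 0


# ",-," is a range from comma to comma, so the dash is not in the class.
print(first_index(r"[a-z,A-Z,-,.]", "16-82"))
# A leading dash is literal, so the dash at position 3 is found.
print(first_index(r"[-,a-z,A-Z,.]", "16-82"))
# The suggested form: literal dash first, no commas.
print(first_index(r"[-a-zA-Z.]", "16-82"))
```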

Sed: use found string as a variable

I'm looking for a way to replace all instances of a form:
model->variable
with
models[variable][index]
where variable can be pretty much any combination of letters and numbers, probably defined like [0-9a-Z]{4,12}.
There are hundreds of such variables in the text. I need to know the exact form of the found string "variable" so that I can use it in the replacement. Is there a way to "remember" the string and use it later? Or is there any other method or software that could help in such a case?
Thanks in advance.
If you could convert "variable" to uppercase by the way, it would be awesome.
You can reuse part of the matched text in the replacement if you enclose that part of the pattern in \(...\). You then use \1 to insert whatever was captured by the first such pair of brackets.
A naïve solution to your problem would be this:
sed 's/model->\(.*\)/models[\1][index]/' file.txt
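
Since the question also asks for "any other method", here is the same capture-and-reuse idea as a small Python sketch, including the uppercase conversion. The function name rewrite and the character class [0-9A-Za-z]+ are my assumptions; the class is narrower than the greedy .* in the sed command above, so several occurrences on one line are handled separately.

```python
import re

# Capture the variable name that follows "model->".
PATTERN = re.compile(r"model->([0-9A-Za-z]+)")


def rewrite(text):
    """Replace model->name with models[NAME][index], uppercasing the captured name."""
    return PATTERN.sub(lambda m: f"models[{m.group(1).upper()}][index]", text)


print(rewrite("x = model->speed1 + model->mass;"))
```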

Can actions in Lex access individual regex groups?

Can actions in Lex access individual regex groups?
(NOTE: I'm guessing not, since the group characters - parentheses - are according to the documentation used to change precedence. But if so, do you recommend an alternative C/C++ scanner generator that can do this? I'm not really hot on writing my own lexical analyzer.)
Example:
Let's say I have this input: foo [tagName attribute="value"] bar and I want to extract the tag using Lex/Flex. I could certainly write this rule:
\[[a-z]+[[:space:]]+[a-z]+=\"[a-z]+\"\] printf("matched %s", yytext);
But let's say I would want to access certain parts of the string, e.g. the attribute but without having to parse yytext again (as the string has already been scanned it doesn't really make sense to scan part of it again). So something like this would be preferable (regex groups):
\[[a-z]+[[:space:]]+[a-z]+=\"([a-z]+)\"\] printf("matched attribute %s", $1);
You can split the rule up with start conditions. Something like this:
%{
#include <string.h>
char string_buf[100];
%}
%x VALUEPARSE ENDSTATE
%%
<INITIAL>\[[a-z]+[[:space:]]+[a-z]+=\" { BEGIN(VALUEPARSE); }
<VALUEPARSE>[a-z]+ { /* save the value text, always NUL-terminated */
    strncpy(string_buf, yytext, sizeof string_buf - 1);
    string_buf[sizeof string_buf - 1] = '\0';
    BEGIN(ENDSTATE); }
<ENDSTATE>\"\] { BEGIN(INITIAL); }
%%
As for an alternative to a C/C++ scanner generator: I use the Qt class QRegularExpression for this kind of thing; it makes it very easy to get at a regex group after a match.
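
To illustrate what "getting a regex group after a match" buys you, here is a sketch in Python rather than Qt or Lex. The group names tag, attr and value are my own, and I have widened [a-z]+ to [A-Za-z]+ so that the tagName in the question's sample input actually matches.

```python
import re

# Named groups give direct access to each part without rescanning the match.
TAG = re.compile(r'\[(?P<tag>[A-Za-z]+)\s+(?P<attr>[A-Za-z]+)="(?P<value>[A-Za-z]+)"\]')

match = TAG.search('foo [tagName attribute="value"] bar')
if match:
    print("matched attribute", match.group("attr"), "with value", match.group("value"))
```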
Certainly at least some forms of them do.
But the default lex/flex downloadable from sourceforge.net does not seem to list it in its documentation, and this example leaves the full string in yytext.
From IBM's LEX documentation for AIX:
(Expression)
Matches the expression in the parentheses.
The () (parentheses) operator is used for grouping and causes the expression within parentheses to be read into the yytext array. A group in parentheses can be used in place of any single character in any other pattern.
Example: (ab|cd+)?(ef)* matches such strings as abefef, efefef, cdef, or cddd; but not abc, abcd, or abcdef.
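
The IBM example can be checked quickly with any regex engine that shares this syntax. A small Python sketch (the helper name matches is mine; fullmatch is used because a Lex pattern has to account for the whole token):

```python
import re

PATTERN = re.compile(r"(ab|cd+)?(ef)*")


def matches(text):
    """True if the whole string is matched by (ab|cd+)?(ef)*."""
    return PATTERN.fullmatch(text) is not None


for sample in ("abefef", "efefef", "cdef", "cddd", "abc", "abcd", "abcdef"):
    print(sample, matches(sample))
```

Note that the parentheses here only group; nothing in the pattern makes the individual groups available to a Lex action.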

Regular Expression for number.(space), objective-c

I have an NSArray of lines (objective-c iphone), and I'm trying to find the line which starts with a number, followed by a dot and a space, but can have any number of spaces (including none) before it, and have any text following it eg:
1. random text
2. text random
3.
what regular expression would I use to get this? (I'm trying to learn it, and I needed the above expression anyway, so I thought I'd use it as an example)
With C#:
@"^ *[0-9]+\. "
It doesn't check for the presence of something after the ., so this is legal:
1.(space)
If you delete the @ and escape the \ it should work with other languages (it is pretty "down-to-earth" as regexes go)
I may suggest (Perl-compatible regexp):
^\s*\d+\.\s
At the beginning of a line:
Any number (0-n) of spaces
One or more digits
A dot
A space
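
To see that breakdown in action, here is a short sketch in Python rather than Objective-C (the helper name numbered_lines is made up). One caveat: in Python, \d on a str also matches non-ASCII digits; use [0-9] if only ASCII digits should count.

```python
import re

# Optional leading whitespace, one or more digits, a dot, then one whitespace character.
NUMBERED = re.compile(r"^\s*\d+\.\s")


def numbered_lines(lines):
    """Return the lines that start with a number followed by a dot and a space."""
    return [line for line in lines if NUMBERED.match(line)]


sample = ["1. random text", "   2. text random", "3.", "random 4. text", "5.no space"]
print(numbered_lines(sample))
```

A bare "3." with nothing after the dot is not matched, because the pattern requires a whitespace character after the dot.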
Something like
^\s*\d+\.
But it depends on the language.
/^\s*[0-9]+\.\s+/
would be my guess providing you don't have any space before the number

How should I handle digits from different sets of UNICODE digits in the same string?

I am writing a function that transliterates UNICODE digits into ASCII digits, and I am a bit stumped on what to do if the string contains digits from different sets of UNICODE digits. For example, given the string "\x{2463}\x{24F6}" ("④⓶"), should my function
return 42?
croak that the string contains mixed sets?
carp that the string contains mixed sets and return 42?
give the user an additional argument to specify one of the three above behaviours?
do something else?
Your current function appears to do #1.
I suggest that you should also write another function to do #4, but only when the requirement appears, and not before.
I'm sure Joel wrote about "premature implementation" in a blog article sometime recently, but I can't find it.
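
For what it's worth, option #4 does not take much code. The sketch below is in Python, not Perl, and is not the asker's function: transliterate_digits, digit_set and the on_mixed argument are all hypothetical names. It also assumes that a digit's "set" can be approximated by its Unicode character name with the last word removed (CIRCLED DIGIT FOUR becomes CIRCLED DIGIT), which is a heuristic rather than an official Unicode property.

```python
import unicodedata
import warnings


def digit_set(ch):
    """Heuristic (an assumption): the character name minus its last word names the set."""
    return unicodedata.name(ch).rsplit(" ", 1)[0]


def transliterate_digits(text, on_mixed="ignore"):
    """Map Unicode digits to ASCII; on_mixed is 'ignore', 'warn' or 'error'."""
    out = []
    sets_seen = set()
    for ch in text:
        value = unicodedata.digit(ch, None)
        if value is None:
            out.append(ch)  # not a digit: pass through unchanged
        else:
            out.append(str(value))
            sets_seen.add(digit_set(ch))
    if len(sets_seen) > 1 and on_mixed != "ignore":
        message = "mixed digit sets: " + ", ".join(sorted(sets_seen))
        if on_mixed == "error":
            raise ValueError(message)  # roughly the "croak" option
        warnings.warn(message)  # roughly the "carp" option
    return "".join(out)


print(transliterate_digits("\u2463\u24F6"))
```

The default of "ignore" corresponds to option #1, so callers who never pass the extra argument see the current behaviour.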
I'm not sure I see a problem.
You support numeric conversion from a range of scripts, which is to say, you are aware of the Unicode codepoints for their numeric characters.
If you find an unknown codepoint in your input data, it is an error.
It is up to you what you do in the event of an error; you may insert a space or underscore, or you may abort conversion. What you would do will depend on the environment in which your function executes; it is not something we can tell you.
My initial thought was #4; strictly based on the fact that I like options. However, I changed my mind, when I viewed your function.
The purpose of the function seems to be, simply, to get the resulting digits 0..9. Users may find it useful to send in mixed sets (a feature :) ). I'll use it.
If you ever have to handle input in bases greater than 10, you may end up having to treat many variants on the first 6 letters of the Latin alphabet ('ABCDEF') as digits in all their forms.