“Undefined definition” error in lex program - lex

I am writing a lex program. The objective is that I enter a string of the form Name#PhoneNumber, where the first letter of the name must be an uppercase letter.
letterMin ([a-z])
letterMaj ([A-Z])
Letter ({letterMaj}({letterMin})*)
Number ([0-9])
Chaine ({letter}#({Number})*)
%%
{Chaine} printf("enter your chaine");
.* printf("Lexical Error");
%%
int yywrap(){return 1;}
main ()
{
yylex ();
}

When it comes to symbol names, case is important: you defined Letter but refer to {letter} inside Chaine. You want something like this:
letterMin ([a-z])
letterMaj ([A-Z])
letter ({letterMaj}({letterMin})*)
Number ([0-9])
Chaine ({letter}#({Number})*)
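As a rough cross-check of what the corrected definitions accept, the same pattern can be sketched as a Python regular expression. This is only an illustration and not lex itself; the helper name matches_chaine is hypothetical.

```python
import re

# Mirrors the corrected lex definitions:
#   letter = one upper-case letter followed by zero or more lower-case letters
#   Chaine = letter, '#', then zero or more digits
CHAINE = re.compile(r"[A-Z][a-z]*#[0-9]*")


def matches_chaine(text: str) -> bool:
    """Return True if the whole string has the form Name#PhoneNumber."""
    return CHAINE.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["John#12345", "john#12345", "John12345"]:
        print(sample, matches_chaine(sample))
```

Note that, like the lex definition, the pattern allows zero digits after the #, so a string such as J# is also accepted.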

Related

How to check if a letter is upper/lower cased? - Flutter

The question is pretty self-explanatory. I want to check whether a certain letter is uppercase or lowercase. Could you give me an example of how to do that in Flutter/Dart?
You can use .toUpperCase() in a boolean expression:
bool isUppercased(String str) {
  return str == str.toUpperCase();
}
If you want to use regular expressions, here is how you could do:
bool isUpperCase(String letter) {
  // The check is meant for a single character only.
  assert(letter.length == 1);
  final regExp = RegExp('[A-Z]');
  return regExp.hasMatch(letter);
}
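The two approaches above do not behave identically, which a small Python sketch can illustrate. Comparing a string with its upper-cased form is also true for characters that have no case at all (digits, punctuation), while the [A-Z] pattern accepts only ASCII capital letters. The function names here are hypothetical and the code is Python rather than Dart.

```python
import re


def is_uppercased(text: str) -> bool:
    # Same idea as comparing against toUpperCase(): true whenever
    # upper-casing changes nothing, which includes digits and punctuation.
    return text == text.upper()


def is_ascii_upper_letter(letter: str) -> bool:
    # Same idea as the RegExp('[A-Z]') check, restricted to one character.
    if len(letter) != 1:
        raise ValueError("expected a single character")
    return re.fullmatch(r"[A-Z]", letter) is not None


if __name__ == "__main__":
    for ch in ["A", "a", "1"]:
        print(ch, is_uppercased(ch), is_ascii_upper_letter(ch))
```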
One solution that comes to mind is to check the character's ASCII code.
The ASCII codes of the lowercase letters a-z run from 97 to 122.
Similarly, the uppercase letters A-Z run from 65 to 90.
With this in mind, you can use string.codeUnitAt(index), which returns the code unit (the ASCII code for these characters), and then check which range it falls in to find out whether it is an uppercase or a lowercase letter.
Have a look at this example:
main() {
  String ch = 'Rose';
  print(' ASCII value of ${ch[0]} is ${ch.codeUnitAt(0)}');
  print(' ASCII value of ${ch[1]} is ${ch.codeUnitAt(1)}');
}
The output will be:
ASCII value of R is 82
ASCII value of o is 111
Now you can compare the value against these ranges with an if statement to find out which case the letter is.
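The range comparison itself might look like the following minimal sketch, written in Python rather than Dart. Here ord() plays the role of codeUnitAt(index), and the function name classify_letter is made up for the illustration.

```python
def classify_letter(ch: str) -> str:
    """Classify one character using the ASCII ranges 65-90 and 97-122."""
    code = ord(ch)  # plays the role of codeUnitAt(index)
    if 65 <= code <= 90:
        return "upper"
    if 97 <= code <= 122:
        return "lower"
    return "other"


if __name__ == "__main__":
    for ch in "Ro1":
        print(ch, ord(ch), classify_letter(ch))
```

This only covers the ASCII letters; accented or non-Latin letters fall into "other".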

How can I obtain only word without All Punctuation Marks when I read text file?

The text file abc.txt is an arbitrary article that has been scraped from the web. For example, it is as follows:
His name is "Donald" and he likes burger. On December 11, he married.
I want to extract only the words and numbers from the article above, with sentence-initial words converted to lower case and all periods, quotes, and other punctuation removed. For the example above, the result should be:
{his, name, is, Donald, and, he, likes, burger, on, December, 11, he, married}
My code is as follows:
filename = 'abc.txt';
fileID = fopen(filename,'r');
C = textscan(fileID,'%s','delimiter',{',','.',':',';','"',''''});
fclose(fileID);
Cstr = C{:};
Cstr = Cstr(~cellfun('isempty',Cstr));
Is there a simple way to extract only the alphabetic words and numbers, excluding all symbols?
Two steps are necessary as you want to convert certain words to lowercase.
First, regexprep lower-cases the first letter of any word that is at the start of the string or that follows a full stop and a space.
In the regexprep function, we use the following pattern:
(?<=^|\. )([A-Z])
to indicate that:
(?<=^|\. ) We want to assert that the letter of interest is preceded by either the start of the string (^) or (|) a full stop (\.) followed by a space. This type of construct is called a lookbehind.
([A-Z]) This part of the expression matches and captures (stores the match of) an upper-case letter (A-Z).
The ${lower($0)} component in the replacement is called a dynamic expression; it converts the matched text ($0, here the captured upper-case letter) to lower case. This syntax is specific to the MATLAB language.
Once the lower-case conversions have been made, regexp finds every run of letters and digits.
The pattern [a-zA-Z0-9]+ matches one or more lower-case letters, upper-case letters, or digits.
text = fileread('abc.txt')
data = {regexp(regexprep(text,'(?<=^|\. )([A-Z])','${lower($0)}'),'[a-zA-Z0-9]+','match')'}
>>data{1}
13×1 cell array
{'his' }
{'name' }
{'is' }
{'Donald' }
{'and' }
{'he' }
{'likes' }
{'burger' }
{'on' }
{'December'}
{'11' }
{'he' }
{'married' }
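For comparison, the same two-step idea can be sketched in Python. This is not a translation of the MATLAB call: Python's re module only supports fixed-width lookbehinds, so the sketch captures the prefix (start of string, or a full stop and a space) and puts it back in the replacement instead. The function name extract_words is hypothetical.

```python
import re


def extract_words(text: str) -> list:
    # Step 1: lower-case a capital letter at the start of the text
    # or right after a full stop and a space.
    lowered = re.sub(
        r"(^|\. )([A-Z])",
        lambda m: m.group(1) + m.group(2).lower(),
        text,
    )
    # Step 2: keep every run of ASCII letters and digits.
    return re.findall(r"[a-zA-Z0-9]+", lowered)


if __name__ == "__main__":
    sample = 'His name is "Donald" and he likes burger. On December 11, he married.'
    print(extract_words(sample))
```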

Calculating upper case letters, lower case letters, and other characters

Write a program that accepts a sentence as console input and calculates the number of upper case letters, lower case letters, and other characters.
Suppose the following input is supplied to the program:
Hello World;!#
Since this question sounds like a programming assignment, I've written this in a more wordy manner. This is standard Python 3, not Jes.
#! /usr/bin/env python3
import sys
upper_case_chars = 0
lower_case_chars = 0
total_chars = 0
found_eof = False
# Read character after character from stdin, processing it in turn
# Stop if an error is encountered, or End-Of-File happens.
while (not found_eof):
    try:
        letter = str(sys.stdin.read(1))
    except:
        # handle any I/O error somewhat cleanly
        break
    if (letter != ''):
        total_chars += 1
        if (letter >= 'A' and letter <= 'Z'):
            upper_case_chars += 1
        elif (letter >= 'a' and letter <= 'z'):
            lower_case_chars += 1
    else:
        found_eof = True
# write the results to the console
print("Upper-case Letters: %3u" % (upper_case_chars))
print("Lower-case Letters: %3u" % (lower_case_chars))
print("Other Letters: %3u" % (total_chars - (upper_case_chars + lower_case_chars)))
Note that you should modify the code to handle end-of-line characters yourself. Currently they're counted as "other". I've also left out handling of binary input; reading it as text will probably fail.
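The same counting logic can also be sketched as a function over a string that has already been read, which makes it easier to check against the sample input. This is a compact variant under the same ASCII-only assumption, not a replacement for the stdin loop; the name count_cases is hypothetical.

```python
def count_cases(sentence: str) -> tuple:
    """Return (upper, lower, other) counts using the ASCII letter ranges."""
    upper = sum(1 for ch in sentence if "A" <= ch <= "Z")
    lower = sum(1 for ch in sentence if "a" <= ch <= "z")
    # Everything that is not an ASCII letter counts as "other",
    # including spaces and any end-of-line characters.
    other = len(sentence) - upper - lower
    return upper, lower, other


if __name__ == "__main__":
    print(count_cases("Hello World;!#"))
```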

lex program to count the Number of Words

I made the following lex program to count the number of words in a text file. A 'word' for me is any string that starts with a letter and is followed by zero or more letters, digits, or underscores.
%{
int words;
%}
%%
[a-zA-Z][a-zA-Z0-9_]* {words++; printf("%s %d\n",yytext,words);}
. ;
%%
int main(int argc, char* argv[])
{
    if(argc == 2)
    {
        yyin = fopen(argv[1], "r");
        yylex();
        printf("No. of Words : %d\n",words);
        fclose(yyin);
    }
    else
        printf("Invalid No. of Arguments\n");
    return 0;
}
The problem is that for the text file below I get No. of Words : 13. Printing yytext shows that it takes 'manav' from '9manav' as a word, even though that does not match my definition of a word.
I also tried adding the rule [0-9][a-zA-Z0-9_]* ; to my code, but it still shows the same output. I want to know why this is happening and how to avoid it.
Textfile : -
the quick brown fox jumps right over the lazy dog cout for
9manav
-99-7-5 32 69 99 +1
First, manav matches your definition of a word perfectly. The 9 in front of it is matched by the . rule. Remember that white space is not special in lex.
You had the right idea by adding another rule [0-9][a-zA-Z0-9_]* ; but the rule set is ambiguous (there are several ways to match the input). lex resolves this by taking the longest match and, when two rules match the same length, the rule listed first. It's been a while since I worked with lex, but I think putting the new rule before the word rule should work.
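To make the matching behaviour concrete, here is a small Python simulation of a longest-match scanner with rule order as the tie-breaker. It is only a sketch of the idea, not how lex is implemented; the rule names and the functions tokenize and count_words are made up for the illustration, and the catch-all rule also matches newlines to keep the loop simple.

```python
import re

# Rules in priority order, mimicking a lex rule section. NUMBERISH is the
# extra rule from the question; OTHER is the catch-all '.' rule.
RULES = [
    ("NUMBERISH", re.compile(r"[0-9][a-zA-Z0-9_]*")),
    ("WORD", re.compile(r"[a-zA-Z][a-zA-Z0-9_]*")),
    ("OTHER", re.compile(r".", re.DOTALL)),
]


def tokenize(text, rules=RULES):
    """Yield (rule_name, lexeme) pairs: longest match first, then rule order.

    Assumes the rule list ends with a catch-all so every position matches.
    """
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = pattern.match(text, pos)
            # Strictly longer wins; on a tie the earlier rule is kept.
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        yield best_name, text[pos:pos + best_len]
        pos += best_len


def count_words(text, rules=RULES):
    return sum(1 for name, _ in tokenize(text, rules) if name == "WORD")


if __name__ == "__main__":
    sample = (
        "the quick brown fox jumps right over the lazy dog cout for\n"
        "9manav\n"
        "-99-7-5 32 69 99 +1\n"
    )
    print(count_words(sample))             # with the extra digit rule
    print(count_words(sample, RULES[1:]))  # without it, as in the question
```

Without the NUMBERISH rule the 9 is consumed by the catch-all and manav is then scanned as a word; with it, 9manav is consumed as a single lexeme.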

Unicode characters having asymmetric upper/lower case. Why?

Why do the following three characters not have symmetric toLower and toUpper results?
/**
* Written in the Scala programming language, typed into the Scala REPL.
* Results commented accordingly.
*/
/* Unicode Character 'LATIN CAPITAL LETTER SHARP S' (U+1E9E) */
'\u1e9e'.toHexString == "1e9e" // true
'\u1e9e'.toLower.toHexString == "df" // "df" == "df"
'\u1e9e'.toHexString == '\u1e9e'.toLower.toUpper.toHexString // "1e9e" != "df"
/* Unicode Character 'KELVIN SIGN' (U+212A) */
'\u212a'.toHexString == "212a" // "212a" == "212a"
'\u212a'.toLower.toHexString == "6b" // "6b" == "6b"
'\u212a'.toHexString == '\u212a'.toLower.toUpper.toHexString // "212a" != "4b"
/* Unicode Character 'LATIN CAPITAL LETTER I WITH DOT ABOVE' (U+0130) */
'\u0130'.toHexString == "130" // "130" == "130"
'\u0130'.toLower.toHexString == "69" // "69" == "69"
'\u0130'.toHexString == '\u0130'.toLower.toUpper.toHexString // "130" != "49"
For the first one, there is this explanation:
In the German language, the Sharp S ("ß" or U+00df) is a lowercase letter, and it capitalizes to the letters "SS".
In other words, U+1E9E lower-cases to U+00DF, but the upper-case of U+00DF is not U+1E9E.
For the second one, U+212A (KELVIN SIGN) lower-cases to U+006B (LATIN SMALL LETTER K). The upper-case of U+006B is U+004B (LATIN CAPITAL LETTER K). This one seems to make sense to me.
For the third case, U+0130 (LATIN CAPITAL LETTER I WITH DOT ABOVE) is a Turkish/Azerbaijani character that lower-cases to U+0069 (LATIN SMALL LETTER I). I would imagine that if you were somehow in a Turkish/Azerbaijani locale you'd get the proper upper-case version of U+0069, but that might not necessarily be universal.
Characters need not necessarily have symmetric upper- and lower-case transformations.
Edit: To respond to PhiLho's comment below, the Unicode 6.0 spec has this to say about U+212A (KELVIN SIGN):
Three letterlike symbols have been given canonical equivalence to regular letters: U+2126 OHM SIGN, U+212A KELVIN SIGN, and U+212B ANGSTROM SIGN. In all three instances, the regular letter should be used. If text is normalized according to Unicode Standard Annex #15, “Unicode Normalization Forms,” these three characters will be replaced by their regular equivalents.
In other words, you shouldn't really be using U+212A, you should be using U+004B (LATIN CAPITAL LETTER K) instead, and if you normalize your Unicode text, U+212A should be replaced with U+004B.
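The round trip and the normalization can be tried out with a short Python sketch. Note that this is Python, not Scala: Python's str.lower() and str.upper() apply full case mappings to strings, so some results differ from the per-character Scala calls in the question (for example, ß upper-cases to "SS" here). The helper name round_trip is hypothetical.

```python
import unicodedata


def round_trip(ch: str) -> str:
    """Lower-case, then upper-case again, using Python's str methods."""
    return ch.lower().upper()


if __name__ == "__main__":
    for ch in ["\u1e9e", "\u212a", "\u0130"]:
        back = round_trip(ch)
        # Show the code points of the result and whether it equals the input.
        print(unicodedata.name(ch), [hex(ord(c)) for c in back], back == ch)
    # NFC normalization replaces KELVIN SIGN with the regular letter K.
    print(unicodedata.normalize("NFC", "\u212a") == "K")
```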
May I refer to another post about Unicode and upper and lower case: Unicode-correct title case in Java.
It is a common mistake to think that every sign in a language has to be available in both upper and lower case!