Lex program rules not working - lex

%{
#include <stdio.h>
int sline=0,mline=0;
%}
%%
"/*"[a-zA-Z0-9 \t\n]*"*/" { mline++; }
"//".* { sline++; }
.|\n { fprintf(yyout,"%s",yytext); }
%%
int main(int argc,char *argv[])
{
if(argc!=3)
{
printf("Invalid number of arguments!\n");
return 1;
}
yyin=fopen(argv[1],"r");
yyout=fopen(argv[2],"w");
yylex();
printf("Single line comments = %d\nMultiline comments=%d\nTotal comments = %d\n",sline,mline,sline+mline);
return 0;
}
I am trying to make a Lex program which would count the number of comment lines (single-line comments and multi-line comments separately).
Using this code, I gave a .c file and a blank text file as input and output arguments.
When I have any special characters in a multi-line comment, it's not working for that comment and mline is not incremented.
How do I fix this problem?

Below is a nudge in the right direction. The main difference between your code and mine is that I use only two regular expressions: one for whitespace (WSPACE) and one for identifiers (IDENT). By identifiers I mean anything that you might want to comment out; this regex can obviously be expanded to include other characters and symbols. I also defined the three patterns that begin and end comments and associated them with tokens that could be passed to a syntax analyzer (but that's a whole new topic).
I also changed the way input is fed to the program. I find it cleaner to redirect input from a file and, if needed, redirect output to another file.
Here is an example of how you might use this program:
flex filename.l
g++ lex.yy.c -o lexer
./lexer < input.txt
If you need the output in another file, use this instead of the last command above:
./lexer < input.txt > output.txt
Note: the '.' (dot) pattern at the end of the rules is a catch-all for any single character (other than a newline) that no other rule matches.
There are many nuances to pattern matching using regex to match comment lines. For example, this would still match even if the comment line was part of a string.
Ex. " //This is a comment in a string! "
You will need to do a little more work to get past these nuances - like I said, this is a nudge in the right direction.
You can do something similar to this to accomplish your goal:
%{
#include <stdio.h>
int sline = 0;
int mline = 0;
#define T_SLINE 0001
#define T_BEGIN_MLINE 0002
#define T_END_MLINE 0003
#define T_UNKNOWN 0004
%}
WSPACE [ \t\r]+
IDENT [a-zA-Z0-9]
%%
"//" {
printf("TOKEN: T_SLINE LEXEME: %s\n", yytext);
sline++;
return T_SLINE;
}
"/*" {
printf("TOKEN: T_BEGIN_MLINE LEXEME: %s\n", yytext);
return T_BEGIN_MLINE;
}
"*/" {
printf("TOKEN: T_END_MLINE LEXEME: %s\n", yytext);
mline++;
return T_END_MLINE;
}
{IDENT} {/*Do nothing*/}
{WSPACE} { /*Do Nothing*/}
. {
printf("TOKEN: UNKNOWN LEXEME: %s\n", yytext);
return T_UNKNOWN;
}
%%
int yywrap(void) { return 1; }
int main(void) {
while ( yylex() );
printf("Single-line comments = %d\n Multi-line comments = %d\n Total comments = %d\n", sline, mline, (sline + mline));
return 0;
}
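To illustrate the string-literal caveat mentioned above, here is a minimal sketch in plain C rather than lex. It counts "//" comments while ignoring any "//" that sits inside a double-quoted string. The function name count_line_comments is made up for this example, and block comments and character literals are deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>

// Count "//" comments in src, ignoring any "//" inside a double-quoted
// string literal. Block comments and character literals are not handled.
int count_line_comments(const char *src)
{
    bool in_string = false;
    int count = 0;

    for (size_t i = 0; src[i] != '\0'; i++) {
        char c = src[i];
        if (in_string) {
            if (c == '\\' && src[i + 1] != '\0')
                i++;                      // skip the escaped character
            else if (c == '"')
                in_string = false;        // closing quote
        } else if (c == '"') {
            in_string = true;             // opening quote
        } else if (c == '/' && src[i + 1] == '/') {
            count++;
            // The comment runs to the end of the line; skip over it.
            while (src[i + 1] != '\0' && src[i + 1] != '\n')
                i++;
        }
    }
    return count;
}
```

In a lex specification the same idea is usually expressed by adding a rule that matches a whole string literal, so the "//" rule never sees its contents.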

The problem is your regex for multiline comments:
"/*"[a-zA-Z0-9 \t\n]*"*/"
This only matches multiline comments that ONLY contain letters, digits, spaces, tabs, and newlines. If the comment contains anything else it won't match. You want something like:
"/*"([^*]|"*"+[^*/])*"*"+"/"
This will match anything except a */ between the /* and */.
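If it helps to see what such a pattern has to track, here is a rough hand-written equivalent in plain C: a four-state automaton that follows the same structure (opening delimiter, body, run of stars, closing slash). The names count_block_comments and the state labels are invented for this sketch, and it ignores string literals and "//" comments.

```c
#include <stddef.h>

// States of a small hand-written automaton for block comments.
enum state { OUTSIDE, SAW_SLASH, IN_COMMENT, SAW_STAR };

// Count complete block comments in src. Illustrative only: string
// literals and line comments are not treated specially.
int count_block_comments(const char *src)
{
    enum state st = OUTSIDE;
    int count = 0;

    for (size_t i = 0; src[i] != '\0'; i++) {
        char c = src[i];
        switch (st) {
        case OUTSIDE:
            if (c == '/') st = SAW_SLASH;
            break;
        case SAW_SLASH:          // seen a slash, waiting for a star
            if (c == '*') st = IN_COMMENT;
            else if (c != '/') st = OUTSIDE;
            break;
        case IN_COMMENT:         // comment body: any character is allowed
            if (c == '*') st = SAW_STAR;
            break;
        case SAW_STAR:           // one or more stars seen inside the body
            if (c == '/') { count++; st = OUTSIDE; }
            else if (c != '*') st = IN_COMMENT;
            break;
        }
    }
    return count;
}
```

The SAW_STAR state is the reason the pattern needs the star-run alternative: a star inside the body only ends the comment if the next non-star character is a slash.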

Below is the full lex code to count the number of comment lines and executable lines.
%{
#include <stdio.h>
int cc=0,cl=0,el=0,flag=0;
%}
%x cmnt
%%
^[ \t]*"//".*\n {cc++;cl++;}
.+"//".*\n {cc++;cl++;el++;}
^[ \t]*"/*" {BEGIN cmnt;}
<cmnt>\n {cl++;}
<cmnt>.\n {cl++;}
<cmnt>"*/"\n {cl++;cc++;BEGIN 0;}
<cmnt>"*/" {cl++;cc++;BEGIN 0;}
.*"/*".*"*/".+\n {cc++;cl++;}
.+"/*".*"*/".*\n {cc++;cl++;el++;}
.+"/*" {BEGIN cmnt;}
.\n {el++;}
%%
int main(void)
{
yyin=fopen("abc.cpp","r");
yyout=fopen("abc.txt","w");
yylex();
fprintf(yyout,"Comment Count: %d \nCommented Lines: %d \nExecutable Lines: %d",cc,cl,el);
return 0;
}
int yywrap()
{
return 1;
}
The program takes a C++ source file, abc.cpp, as input and writes the output to the file abc.txt.


JSONata: words to lowerCamelCase

I have a string consisting of words and punctuation, such as "Accept data protection terms / conditions (German)". I need to normalize that to camelcase, removing punctuation.
My closest attempt so far fails to camelcase the words, I only manage to make them into kebab-case or snake_case:
$normalizeId := function($str) <s:s> {
$str.$lowercase()
.$replace(/\s+/, '-')
.$replace(/[^-a-zA-Z0-9]+/, '')
};
Anindya's answer works for your example input, but if (German) was not capitalized, it would result in the incorrect output:
"acceptDataProtectionTermsConditionsgerman"
This version would work and prevent that bug:
(
$normalizeId := function($str) <s:s> {
$str
/* normalize everything to lowercase */
.$lowercase()
/* replace any "punctuations" with a - */
.$replace(/[^-a-zA-Z0-9]+/, '-')
/* Find all letters with a dash in front,
strip the dash and uppercase the letter */
.$replace(/-(.)/, function($m) { $m.groups[0].$uppercase() })
/* Clean up any leftover dashes */
.$replace("-", '')
};
$normalizeId($$)
/* OUTPUT: "acceptDataProtectionTermsConditionsGerman" */
)
You should target the letters that have a space in front of them and capitalize them, using the regex /\s(.)/.
Here is my snippet (edited):
(
$upper := function($a) {
$a.groups[0].$uppercase()
};
$normalizeId := function($str) <s:s> {
$str.$lowercase()
.$replace(/[^-a-zA-Z0-9]+/, '-')
.$replace(/-(.)/, $upper)
.$replace(/-/, '')
};
$normalizeId("Accept data protection terms / conditions (German)");
)
/* OUTPUT: "acceptDataProtectionTermsConditionsGerman" */
Edit: Thanks @vitorbal. The "$lower" function in the earlier regex replacement was not necessary, and it did not handle the scenario you mentioned. Thanks for pointing that out. I have updated my snippet.
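For comparison, the same steps (lowercase everything, treat each run of non-alphanumeric characters as a separator, uppercase the first character after a separator) can be done in a single pass. The sketch below is in C rather than JSONata, and the function name to_lower_camel is invented for the example. One deliberate difference from the snippets above: if the input starts with punctuation, the first letter stays lowercase.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

// Write the lowerCamelCase form of src into dst (capacity dst_size,
// including the terminating NUL). Output that does not fit is truncated.
void to_lower_camel(const char *src, char *dst, size_t dst_size)
{
    size_t out = 0;
    bool upper_next = false;   // true after a run of non-alphanumerics

    if (dst_size == 0)
        return;

    for (size_t i = 0; src[i] != '\0' && out + 1 < dst_size; i++) {
        unsigned char c = (unsigned char)src[i];
        if (!isalnum(c)) {
            upper_next = true;             // separator: drop it
            continue;
        }
        // Capitalise only after a separator, and never the first character.
        if (upper_next && out > 0)
            dst[out++] = (char)toupper(c);
        else
            dst[out++] = (char)tolower(c);
        upper_next = false;
    }
    dst[out] = '\0';
}
```

Called on "Accept data protection terms / conditions (German)" with a large enough buffer, this produces "acceptDataProtectionTermsConditionsGerman", and the result is the same when "(German)" is written in lowercase.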

I'm getting an error from gcc on my lex code: "#endif without #if"

This is my first attempt at writing a compiler using flex and bison. I wrote what look to me like legal lex and yacc code, but when I run it through the compiler I get an error message. This seems to come from gcc, so it's something wrong with the code generated by flex.
ghlex.l:20:2: error: #endif without #if
.map return DOTMAP;
Can anybody tell me what's wrong with the .map pattern and/or action?
(In case it's not obvious or I miscoded it, that's supposed to match the token ".map")
Here's my lex/flex source code:
%{
#include <stdio.h>
#include <string.h>
#include "gh.tab.h"
%}
DIGIT [0-9]
STARTCHAR [_a-zA-Z]
WORDCHAR {DIGIT}|{STARTCHAR}
FILECHAR {WORDCHAR}|[-.+@#$%()]
FILECHAR1 {FILECHAR}|[/" ]
FILECHAR2 {FILECHAR}|[/' ]
/* special "start states" for matching
%s DESC FNAME1 FNAME2
%%
{DIGIT}+ yylval.number = atoi(yytext); return INT;
{STARTCHAR}{WORDCHAR}* yylval.string = strdup(yytext); return WORD;
FILECHAR$ yylval.string = strdup(yytext); return FILEPART;
'FILECHAR1+' yylval.string = strdup(yytext); return QUOTE;
"FILECHAR2+" yylval.string = strdup(yytext); return QUOTE;
\.map return DOTMAP;
\.m return DOTM;
\.r return DOTR;
\.c return DOTC;
\.d return DOTD;
\.t return DOTT;
\.o return DOTO;
\.u return DOTU;
\.v return DOTV;
, return COMMA;
\+ return PLUS;
- return MINUS;
\/ return SLASH;
; return SEMI;
\[ return LBRACKET;
\] return RBRACKET;
\%\$ return PCTDOL;
\%\/ return PCTSLASH;
\%t return PCTT;
\%@ return PCTAT;
\n /* ignore newlines */
[ \t] /* ignore whitespace */
\/\/ /* ignore c++-style comments */
<DESC>.* yylval.string = strdup(yytext); return STRING;
%%
The problem is your unterminated comment on line 12:
/* special "start states" for matching
Since this line begins with whitespace, it is copied verbatim into the lex.yy.c file, where it comments out several of the following lines generated by flex, including an #ifdef and the #line directives that would have let the compiler report better line numbers.
If you compile with -Wall (which you ALWAYS should), you'll get a warning: "/*" within comment before the error, which at least hints that the problem is related to comments (though this message too has an incorrect line number.)

Search for a string only inside a function definition

Is there any way to search for a string only inside a function definition?
Suppose there is a C source file, a.c, that defines several functions. I want the search to report a match only when the string appears inside the definition of a specific function (let's say do_something()). Is there any way to do such a search from the command prompt?
For example, for the following code:
#include <stdio.h>
void f(int n,
int j,
int k)
{
printf("name is is pankaj ");
printf("name is is kumar ");
printf("name is is mayank ");
}
int main()
{
printf("name is is pankaj ");
return 0;
}
For the above program, I want only the occurrence of pankaj in function f(); I don't want the one in main to be reported.
Please ignore any semantic or syntax errors in the program; my question is only about searching for a string in it.
Of course, try this:
$0 ~ fun {
count = 1
while (! ($0 ~ /{/))
getline
getline
}
count > 0 {
if ($0 ~ /{/)
count++
if ($0 ~ /}/)
count--
if ($0 ~ query)
print FILENAME ": l" FNR ". " $0
}
And invoke the script like this:
awk -v query="pankaj" -v fun="void f[(]" -f script.awk inputfile.java
Here query is the string to search for and fun is a regex that matches the function header.
The script counts { and } to detect when we leave the function, and prints each line inside it that matches the query.
Edit: you may want to extend the regex for counting brackets, perhaps an extra check to see if they aren't placed in comments is required (although you'd never do that).

Getting a string which ends with a string "lngt" in Lex

I am writing a lex script to tokenize C ASTs. I want to write a regex in lex to get a string that ends with a specific string "lngt" but does not include "lngt" in the final string returned by lex. So basically the string form would be (.*lngt), but I haven't been able to figure out how to do this in lex. Any advice/direction would be really helpful
Example: I have this line in my file:
#65 string_cst type: #71 strg: Reverse order of the given number is : %d lngt: 42
I want to retrieve the string after strg: and before lngt:, i.e. "Reverse order of the given number is : %d" (NOTE: this string could be composed of any characters).
Thanks.
This question needs an answer similar to the one I wrote here. It can be done by writing your own state machine in lex. It could also be done by writing some C code, as shown in the cited answer or in the other texts cited below.
If we assume that the string you want is always between "strg" and "lngt" then this is the same as any other non-symmetric string delimiters.
%x STRG LETTERL LN LNG LNGT
ws [ \t\r\n]+
%%
<INITIAL>"strg: " {
BEGIN(STRG);
}
<STRG>[^l]*l {
yymore();
BEGIN(LETTERL);
}
<LETTERL>n {
yymore();
BEGIN(LN);
}
<LN>g {
yymore();
BEGIN(LNG);
}
<LNG>t {
yymore();
BEGIN(LNGT);
}
<LNGT>":" {
printf("String is '%s'\n", yytext);
BEGIN(INITIAL);
}
<LETTERL>[^n] {
BEGIN(STRG);
yymore();
}
<LN>[^g] {
BEGIN(STRG);
yymore();
}
<LNG>[^t] {
BEGIN(STRG);
yymore();
}
<LNGT>[^:] {
BEGIN(STRG);
yymore();
}
<INITIAL>{ws} /* skip */ ;
<INITIAL>. /* skip anything not in the string */
%%
To quote my other answer:
There are suggested solutions on several university compiler courses. The one that explains it well is here (at Manchester). Which cites a couple of good books which also cover the problems:
J.Levine, T.Mason & D.Brown: Lex and Yacc (2nd ed.)
M.E.Lesk & E.Schmidt: Lex - A Lexical Analyzer Generator
The two techniques described are to use Start Conditions to explicitly specify the state machine, or manual input to read characters directly.
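As a rough illustration of the second technique (handling the characters yourself in C), the extraction can also be done on a whole line with strstr. The helper name extract_strg is invented for this sketch, and it assumes the first "lngt:" after "strg: " is the real delimiter, so a string that itself contains "lngt:" would be cut short.

```c
#include <stddef.h>
#include <string.h>

// Copy the text between "strg: " and the following "lngt:" into out.
// Returns 1 on success, 0 if either marker is missing or out is too small.
int extract_strg(const char *line, char *out, size_t out_size)
{
    static const char begin_mark[] = "strg: ";
    static const char end_mark[] = "lngt:";

    const char *start = strstr(line, begin_mark);
    if (start == NULL)
        return 0;
    start += sizeof begin_mark - 1;        // skip past the opening marker

    const char *end = strstr(start, end_mark);
    if (end == NULL)
        return 0;
    // Drop the whitespace that separates the string from "lngt:".
    while (end > start && (end[-1] == ' ' || end[-1] == '\t'))
        end--;

    size_t len = (size_t)(end - start);
    if (len + 1 > out_size)
        return 0;
    memcpy(out, start, len);
    out[len] = '\0';
    return 1;
}
```

Such a helper could be called from a lex action on a rule that matches the whole line, which avoids the chain of start conditions at the cost of scanning the line twice.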

ncurses and stdin blocking

I have stdin in a select() set and I want to take a string from stdin whenever the user types it and hits Enter.
But select is triggering stdin as ready to read before Enter is hit, and, in rare cases, before anything is typed at all. This hangs my program on getstr() until I hit Enter.
I tried setting nocbreak() and it's perfect really except that nothing gets echoed to the screen so I can't see what I'm typing. And setting echo() doesn't change that.
I also tried using timeout(0), but the results of that were even crazier and it didn't work.
What you need to do is to check if a character is available with the getch() function. If you use it in no-delay mode, the call will not block. Then you need to eat up the characters until you encounter a '\n', appending each char to the resulting string as you go.
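The accumulation step could look something like the sketch below. The struct line_buffer and the function line_buffer_feed are hypothetical names for this example, not part of ncurses; the ncurses side is only described in the note after the code.

```c
#include <stddef.h>

// Hypothetical helper: collects characters until a newline arrives.
struct line_buffer {
    char text[256];
    size_t len;
};

// Feed one character. Returns 1 when ch completes a line (text then holds
// the line without the newline, and the buffer restarts on the next call),
// otherwise 0. Characters beyond the capacity are dropped.
int line_buffer_feed(struct line_buffer *lb, int ch)
{
    if (ch == '\n') {
        lb->text[lb->len] = '\0';
        lb->len = 0;
        return 1;
    }
    if (lb->len + 1 < sizeof lb->text)
        lb->text[lb->len++] = (char)ch;
    return 0;
}
```

In the ncurses loop you would call nodelay(stdscr, TRUE) once, then on each pass call getch(); when it returns ERR no key is waiting, otherwise pass the value to line_buffer_feed and use lb.text once it returns 1. Function-key codes (the KEY_ values) would need to be filtered out before feeding.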
Alternatively, and this is the method I use, you can use the GNU readline library. It has support for non-blocking behavior, but the documentation for that part is not great.
Included here is a small example that you can use. It has a select loop, and uses the GNU readline library:
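#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>
#include <sys/select.h>
#include <unistd.h>
#include <readline/readline.h>
#include <readline/history.h>

static bool quit = false;

/* Called by readline once a complete line has been entered. */
static void rl_cb(char *line)
{
    if (line == NULL) {            /* EOF (Ctrl-D) */
        quit = true;
        return;
    }
    if (strlen(line) > 0)
        add_history(line);
    printf("You typed:\n%s\n", line);
    free(line);
}

int main(void)
{
    const char *prompt = "# ";
    rl_callback_handler_install(prompt, rl_cb);
    while (!quit) {
        fd_set fds;
        struct timeval to = { 0, 10000 };  /* reset each pass: select() may modify it */
        FD_ZERO(&fds);
        FD_SET(STDIN_FILENO, &fds);
        /* Only hand a character to readline when stdin is readable. */
        if (select(STDIN_FILENO + 1, &fds, NULL, NULL, &to) > 0)
            rl_callback_read_char();
    }
    rl_callback_handler_remove();
    return 0;
}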
Compile with:
gcc -Wall rl.c -lreadline