Flex: easy way to see if a line has any content? - lex

Among many rules in my Altair BASIC Flex file is this one:
[\n]
{
++num_lines;
++num_statements;
return '\n';
}
++num_statements; is not actually correct: in theory the line might be empty (due to bad data in the .BAS file, for instance) and thus not have any statements on it. So is there any way to know if there are any tokens before the \n since the last \n? I know you can do this with BEGIN() et al., but that seems like a LOT of work for a simple problem! Is there an easier way?

It's easy to match a blank line, although I'm not sure that's really what you're looking for.
The first pattern matches a line which only contains space and tab characters (adjust as necessary to match other whitespace). The second pattern matches the same whitespace when it's not at the beginning of a line. (Actually, it would match the whitespace anywhere, but at the beginning of a line, the first pattern wins.)
^[ \t]*\n ;
[ \t]*\n { ++num_statements; return '\n'; }
Instead of counting lines yourself, I suggest you use %option yylineno so flex will count them for you; the current line number is then available in the variable yylineno.
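To see the effect of the two rules without running flex, here is a rough Python sketch of the same logic. It is only an illustration, not flex code: the function name count_statement_lines and the sample program are made up for this example, and it assumes every line ends in \n.

```python
import re

# Mirrors the first flex rule: a line holding only spaces and tabs.
BLANK_LINE = re.compile(r"[ \t]*\n")


def count_statement_lines(source: str) -> int:
    """Count newline-terminated lines that contain something besides blanks."""
    num_statements = 0
    # Walk the input one newline-terminated line at a time.
    for match in re.finditer(r"[^\n]*\n", source):
        if BLANK_LINE.fullmatch(match.group()):
            continue  # whitespace-only line: ignored, like the first rule
        num_statements += 1  # line with content: counted, like the second rule
    return num_statements


# Hypothetical input: two real lines, one empty line, one whitespace-only line.
program = "10 PRINT 1\n\n  \t\n20 END\n"
print(count_statement_lines(program))
```

The point is the same as in the flex version: the blank-line case is handled by its own pattern, so no start conditions or extra state are needed.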

Related

Prevent newline in (.md) files

How do I prevent newlines in the readme.md files (GitHub)?
We can always write the whole thing in one line to prevent it. But is there an exclusive tag/option to prevent the same, especially for tags that create newlines (headings) like span in html?
Doesn't a space followed by a backslash do the concatenation you want? It does for me. That way I can break a paragraph into one sentence per line.

Matching specific characters in FLEX/LEX

Say for example I have strings like
The Undiscovered Country
Return of the Jedi
The Motion Picture
The Phantom Menace
Attack of the Clones
And I only want it to return the ones starting with "The". How do I match the first 3 characters specifically in a line?
You can match text starting only at the beginning of a line by putting a ^ at the beginning of the pattern. (The beginning of a line is either the beginning of the input, or the character immediately following a newline (\n) character.)
So the following will categorize lines into whole lines starting with The and all other whole lines:
^"The ".* { /* yytext is a line starting with The, not including the newline. */ }
.+ { /* yytext is a line not starting with The. */ }
\n ; /* Ignore newline characters */
Note: The pattern only matches lines which start with The (including the space). That's not the same as lines starting The (which would include These errors), and nor is it the same as lines starting with the word The (which might include something like The--only--way forward). In general, getting a lexical specification right means thinking carefully about all the corner cases, and deciding for each one what the desired outcome is.
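If you want to experiment with the corner cases before committing to a flex rule, the same anchored pattern can be tried in Python. This is just a sketch: Python's re module with re.MULTILINE stands in for flex's ^ handling, and the variable names are made up for the example.

```python
import re

titles = (
    "The Undiscovered Country\n"
    "Return of the Jedi\n"
    "The Motion Picture\n"
    "The Phantom Menace\n"
    "Attack of the Clones\n"
)

# With re.MULTILINE, ^ matches at the start of the input and after each newline.
# The pattern requires "The" followed by a space, then takes the rest of the line.
STARTS_WITH_THE = re.compile(r"^The .*", re.MULTILINE)

the_lines = STARTS_WITH_THE.findall(titles)
print(the_lines)

# Corner cases from the note above: neither line starts with "The" plus a space.
corner_cases = "These errors\nThe--only--way forward\n"
print(STARTS_WITH_THE.findall(corner_cases))
```

Changing the pattern to ^The\b.* would accept the second corner case, which is exactly the kind of decision the note is talking about.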

Adding a comment character in most simple possible way

I want to search a file for a specific string and then place a comment at the beginning of that string. But I need an answer that avoids regex, global changes, and all the other fancy stuff.
I wrote this line:
sed -i.bak '/PermitRootLogin no/# PermitRootLogin no/' ./sshd_config
but I get an error:
sed: -e expression #1, char 21: comments don't accept any addresses
I assume the issue is that I need to escape the # character, but I'm not finding any resources on how to do that, or even mentioning it. I've tried various combinations of putting ^ or \ or \^ in front of the # but I'm just not getting it right.
Please note I am intentionally repeating the text to be replaced. I would like the most simple possible solution to this question: how to replace "XYZ" with "# XYZ" in the most obvious possible way.
As indicated in the comments by @mlt, you could try adding an s at the beginning of your sed command. Straight from his comment:
s/PermitRootLogin....
I see that you said you're intentionally repeating the text to be replaced. If by that you mean you want it to be the same, maybe consider grouping your matched text. I understand you may have meant that you just want it hand-typed. Anyway, here is how to match the grouped text and add the comment character:
s/\(PermitRootLogin\)/# \1/
The escaped parens \( and \) indicate that the matched text should be considered a group (with sed -E you would write plain parens instead); the \1 indicates that you want to put that matched group there.
I hope this was helpful. Happy coding! Leave a comment if you have any questions.
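In case it helps to see the group-and-backreference idea outside of sed, here is a small Python sketch of the same substitution. It is not a replacement for the sed command: comment_out is a hypothetical helper, the sample config text is invented, and unlike a plain sed s command it replaces every occurrence, not just the first one on each line.

```python
import re


def comment_out(text: str, directive: str) -> str:
    """Prefix each occurrence of `directive` with '# ' using a capture group."""
    # re.escape keeps the directive literal; the parentheses make it group 1.
    pattern = "(" + re.escape(directive) + ")"
    # \1 in the replacement inserts whatever group 1 matched.
    return re.sub(pattern, r"# \1", text)


# Hypothetical file contents for illustration only.
config = "Port 22\nPermitRootLogin no\n"
print(comment_out(config, "PermitRootLogin no"), end="")
```

Note that Python treats plain parentheses as grouping, which is the behaviour sed only gives you with -E.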

Why does Github Flavored Markup only add newlines for lines that start with [\w\<]?

In our site (which is aimed at highly non-technical people), we let them use Markdown when sending emails. That way, they get nice things like bold, italic, etc. Being non-technical, however, they would never get past the “add two lines to make newlines actually work” quirk.
For that reason mainly, we are using a variant of Github Flavored Markdown.
We mainly borrowed this part:
# in very clear cases, let newlines become <br /> tags
text.gsub!(/^[\w\<][^\n]*\n+/) do |x|
x =~ /\n{2}/ ? x : (x.strip!; x << "  \n")
end
This works well, but in some cases it doesn’t add the new-lines, and I guess the key to that is the “in very clear cases” part of that comment.
If I interpret it correctly, this is only adding newlines for lines that start with either a word character or a ‘<’.
Does anyone know why that is? Particularly, why ‘<’?
What would be the harm in just adding the two spaces to essentially anything (lines starting with spaces, hyphens, anything)?
'<' character is used at the beginning of a line to quote messages. I guess that is the reason.
The other answer to this question is quite wrong. This has nothing to do with quoting, and the character for markdown quoting is >.
^[\w\<][^\n]*\n+
Let's break the above regex into parts:
^ anchors the match at the start of a line (in Ruby, ^ matches after every newline, not just at the start of the string).
[\w\<] matches a single word character or a literal <. Inside a character class, \< is just an escaped <, not a GNU-style word boundary.
[^\n]* matches any number of non-newline characters.
\n matches a newline.
+ is an ordinary "one or more" quantifier, so \n+ matches one or more consecutive newlines.
I believe, but am not 100% sure, that this simply is used to set x to a line of text. Then, the heavy work is done with the next line:
x =~ /\n{2}/ ? x : (x.strip!; x << "  \n")
This says "if x matches the regex \n{2} (that is, contains two consecutive line breaks), leave x as is. Otherwise, strip x and append two spaces and a newline character", which is how Markdown spells a hard line break.
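To check which lines the substitution actually touches, here is a rough Python port of the Ruby snippet. Treat it as a sketch: add_hard_breaks is a made-up name, the sample text is invented, and it assumes the appended string is two spaces plus a newline, as the question describes.

```python
import re

# Same shape as the Ruby regex: a line starting with a word character or '<',
# followed by the rest of the line and one or more newlines.
LINE = re.compile(r"^[\w<][^\n]*\n+", re.MULTILINE)


def add_hard_breaks(text: str) -> str:
    def fix(match: re.Match) -> str:
        chunk = match.group()
        if "\n\n" in chunk:
            return chunk  # already ends a paragraph: leave untouched
        # Single newline: strip it and add Markdown's two-space hard break.
        return chunk.strip() + "  \n"

    return LINE.sub(fix, text)


sample = "line one\nline two\n\n- item\nnext\n"
print(repr(add_hard_breaks(sample)))
```

In this sample the "- item" line is skipped because it starts with a hyphen, which is the behaviour the question is asking about.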

Regex to match lines between two expressions

I am sorting some data and want to 'cut' out some rubbish between two bits of useful information.
Eg:
Useful one
rubbish
rubbish //rubbish here is covered by [.*], but the number of lines can be any number 1 or above
rubbish
useful two
I have successfully matched the useful parts of my information, I just need to know how to match the rubbish stuff. The pattern is as follows: useful, new line (no content), new line (no content), rubbish, new line (no content), new line (no content), useful.
The important part of this is that the rubbish section can vary in number of lines, but always has at least one line. I'm not sure if I described this very well; any help is appreciated.
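The best way I know of is to use a pattern like this:
(exp1)(.+?)(exp2)
and then use only the first and third groups in the replacement or in your code:
$1 $3
where $x is the placeholder for group x. Because the rubbish spans several lines, . also has to match newlines, which in most regex engines means turning on single-line (dot-all) mode.
Leave a comment if you need more specific syntax.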
your regexp (rubbish\s+)(rubbish\s+)(rubbish)
Try a pattern like (useful\n\n\n(.*)\n\n\nuseful\n)+, capturing the rubbish in the inner parentheses. How you refine and apply this pattern depends on your needs and your code.
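As a concrete version of the (exp1)(.+?)(exp2) idea, here is a short Python sketch. The marker strings and sample text are taken from the question's example; everything else (variable names, joining the kept parts with a single newline) is an assumption made for illustration.

```python
import re

text = "Useful one\n\n\nrubbish\nrubbish\nrubbish\n\n\nuseful two\n"

# re.DOTALL lets '.' match newlines, so the lazy middle group can span lines.
BETWEEN = re.compile(r"(Useful one)(.+?)(useful two)", re.DOTALL)

# Group 2 holds the blank lines and the rubbish between the two markers.
rubbish = BETWEEN.search(text).group(2)

# Keep groups 1 and 3 and drop everything that was between them.
cleaned = BETWEEN.sub(r"\1\n\3", text)
print(repr(cleaned))
```

The lazy .+? matters when the input has several useful/rubbish/useful runs, since a greedy .+ would swallow everything up to the last marker.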