Using Reactive Extensions, how can I ignore a sequence of characters based on delimiters? - system.reactive

I have an app that uses Rx to receive data from a device on the serial port. So I have an IObservable<char> that I slice and dice into various strings. However, the device vendor added some debugging information that is enclosed in braces:
interesting stuff {debug stuff} interesting stuff
source ---a-b-c-{-d-e-b-u-g-}-d-e-f---|
          | | |               | | |
output ---a-b-c---------------d-e-f---|
I need to filter out (discard, ignore) the {debug stuff} from my character sequence. Is there a simple way to do that? Something like "when you see this character, ignore elements until you see this other character".
I looked at Until but that would terminate the sequence and I don't want that to happen...

This should do what you want, assuming no nested or unbalanced brackets.
source
    .Scan((prev, c) =>
    {
        if (prev == '{')
            return c == '}' ? c : '{';
        else
            return c;
    })
    .Where(c => c != '{' && c != '}')
It converts everything after the { into { until the }, then filters out all braces. The diagrammed output is:
source ---a-b-c-{-d-e-b-u-g-}-d-e-f---|
scan   ---a-b-c-{-{-{-{-{-{-}-d-e-f---|
          | | |               | | |
where  ---a-b-c---------------d-e-f---|
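To see the scan-then-filter idea outside of Rx, here is a minimal sketch in Python. It assumes itertools.accumulate as a stand-in for a seedless Scan (the first element passes through unchanged), and the function name strip_debug is made up for illustration. Like the answer above, it assumes no nested or unbalanced braces.

```python
from itertools import accumulate


def strip_debug(chars, open_ch="{", close_ch="}"):
    """Drop everything between open_ch and close_ch, delimiters included."""

    def step(prev, c):
        # While the previous output is the opening brace we are "inside":
        # keep emitting the opening brace until the closing one arrives.
        if prev == open_ch:
            return c if c == close_ch else open_ch
        return c

    # accumulate() plays the role of Scan; the generator plays the role of Where.
    return (c for c in accumulate(chars, step) if c not in (open_ch, close_ch))


if __name__ == "__main__":
    print("".join(strip_debug("abc{debug}def")))
```

The state lives entirely in the previously emitted character, which is why the delimiters themselves have to be filtered out in the second step.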

Related

operator precedence in bison

I have a flex-bison project in which I need to support a few string operators: the operator '^' reverses a string, and the operator [i] returns the character at index i of the string.
Example of the input and the expected output:
input: ^"abc"[0] ---> correct output: "c", my output: "a"
That's because I want to reverse the string first ("cba") and then take index 0 ("cba"[0] is "c").
I don't know how to express that precedence, so my code outputs "a": it first evaluates "abc"[0] --> "a" and then reverses it --> "a". This is what I currently have in my bison file:
%left STR MINI
%left '^'
substring:
    STR MINI { // THIS IS DONE FIRST, SUBSTRING
        $$ = substringFind($1,$2,$2,temp);
    }
    | '^' substring { // BUT I WANT THIS (REVERSING) TO BE FIRST
        $$ = reverseStrings($2,temp);
    }
    ;
How do I change that precedence? I don't really understand the precedence rules. It was easy with plus (+) and multiply (*), but I don't know how to apply it to these operators.
Any help?
You need separate productions, not alternates within the same production, something like:
string
    : substring
    ;
substring
    : reverse MINI { ... }
    | reverse
    ;
reverse
    : '^' reverse { ... }
    | STR
    ;
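The layering above (one production per precedence level, with the tighter-binding operator lower in the grammar) can be sketched as a small recursive-descent evaluator in Python. This is only an illustration under assumptions: MINI is taken to be an index token such as [0], and the names tokenize, parse_reverse, parse_substring and evaluate are hypothetical.

```python
import re

# One token per match: '^', a double-quoted string, or an index such as [0].
TOKEN = re.compile(r'\s*(\^|"[^"]*"|\[\d+\])')


def tokenize(text):
    text = text.rstrip()
    pos, tokens = 0, []
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected input at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_reverse(tokens):
    # reverse : '^' reverse | STR
    token = tokens.pop(0)
    if token == "^":
        return parse_reverse(tokens)[::-1]
    if token.startswith('"'):
        return token[1:-1]
    raise SyntaxError(f"expected '^' or a string, got {token!r}")


def parse_substring(tokens):
    # substring : reverse MINI | reverse
    value = parse_reverse(tokens)  # the reversal is fully evaluated first
    if tokens and tokens[0].startswith("["):
        return value[int(tokens.pop(0)[1:-1])]
    return value


def evaluate(text):
    tokens = tokenize(text)
    result = parse_substring(tokens)
    if tokens:
        raise SyntaxError(f"unexpected trailing token {tokens[0]!r}")
    return result
```

Because parse_substring calls parse_reverse before it looks for an index, ^"abc"[0] is evaluated as (^"abc")[0], which is the grouping the question asks for.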

Avoiding nested objects using ModelBuilderSemantics in Grako

If you take a look at the grammar below, you can see a primary rule, expression, which gets parsed into more specific expression types.
expression::Expression
=
or_ex:and_expr {'||' or_ex:and_expr}+
| andex:and_expr
;
and_expr::AndExpression
=
and_ex:sub_expr {'&&' and_ex:sub_expr}+
| subex:sub_expr
;
sub_expr::SubExpression
=
{'!!'}* '!(' not_ex:expression ')'
| {'!!'}* '(' sub_ex:expression ')'
| compex:comp_expr
;
comp_expr::CompareExpression
=
comp:identifier operator:('>=' | '<=' | '==' | '!=' | '>' | '<') comp:identifier
;
identifier::str
=
?/[a-zA-Z][A-Za-z0-9_]*/?
;
The parsing of the test_input, below, works as expected, but I would prefer to label the and_expr element in the expression rule with an '#' instead of 'andex'. My hope was that the parsed output would result in only a CompareExpression object which is inside a not_ex element in an Expression object.
!(a == b)
It seems that when using the '#' label on the and_expr element, there are no attributes shown in the Expression object! Is this a bug or intentional? Must I label all elements with names and not use the '#' label when using ModelBuilderSemantics?
Another issue I've been facing is that if a later rule, such as comp_expr, did not have an associated class name, its elements would appear in a dictionary when printed, but the dot notation accessor would fail with an AttributeError, i.e. "AttributeError: 'dict' object has no attribute 'comp'". Is there any way to use the dot notation accessor even when rules do not have class names associated with them?
Some of the criteria I use:
Not every rule must have an associated Node class.
Rules with a closure {} as main expression are good for returning a list.
Rules with a choice | as main expression are best returning whatever the successful option returns, even if this often requires factoring the option into its own rule.
Precedence is important.
Etc.
The idea is that the generated parse model should be easy to use, especially with walkers, with a minimum of if-else or isinstance().
This is how I would do your example:
start
    =
    expression $
    ;
expression
    =
    | or_expr
    | and_expr
    | sub_expr
    ;
or_expr::OrExpression
    =
    operands:'||'.{and_expr}+
    ;
and_expr::AndExpression
    =
    operands:'&&'.{sub_expr}+
    ;
sub_expr
    =
    | not_expr
    | comp_expr
    | atomic
    ;
not_expr::NotExpression
    =
    '!!' ~ sub_expr
    ;
comp_expr::CompareExpression
    =
    left:atomic operator:('>=' | '<=' | '==' | '!=' | '>' | '<') ~ right:atomic
    ;
atomic
    =
    | group_expr
    | identifier
    ;
group_expr::GroupExpression
    =
    '(' ~ expr:expression ')'
    ;
identifier::str
    =
    /[a-zA-Z][A-Za-z0-9_]*/
    ;
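To illustrate why a model shaped like this is convenient for walkers, here is a rough Python sketch that dispatches on the node's class name instead of using if-else or isinstance() chains. The dataclasses below are hypothetical stand-ins for the objects a model builder would produce, and Evaluator is not Grako's walker API, just plain Python with the same shape.

```python
import operator as op
from dataclasses import dataclass


@dataclass
class CompareExpression:
    left: str
    operator: str
    right: str


@dataclass
class AndExpression:
    operands: list


@dataclass
class OrExpression:
    operands: list


COMPARE = {">=": op.ge, "<=": op.le, "==": op.eq, "!=": op.ne, ">": op.gt, "<": op.lt}


class Evaluator:
    """Evaluates a model against env, a mapping from identifier to value."""

    def __init__(self, env):
        self.env = env

    def walk(self, node):
        # One method per node class, looked up by name.
        return getattr(self, "walk_" + type(node).__name__)(node)

    def walk_OrExpression(self, node):
        return any(self.walk(child) for child in node.operands)

    def walk_AndExpression(self, node):
        return all(self.walk(child) for child in node.operands)

    def walk_CompareExpression(self, node):
        compare = COMPARE[node.operator]
        return compare(self.env[node.left], self.env[node.right])
```

Since every node type exposes either operands or left/operator/right, each walk_ method stays a one-liner, which is the payoff of keeping the parse model flat.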

Extracting specific data from a string with regex using Powershell

I'm getting data like this back in PowerShell:
1)Open;#1
2)Open;#1;#Close;#2;#pending;#6
3)Closed;#5
But I want an output like this :
1)1 Open
2)
1 Open
2 Close
6 pending
3)
5 Closed
The code:
$lookupitem = $lookupList.Items
$CMRSItems = $list.Items | where {$_['ID'] -le 5}
$CMRSItems | ForEach-Object {
$realval = $_['EventType']
Write-Host "RefNumber: " $_['RefID']
Write-Host $realval
}
Any help would be appreciated as my powershell isn't that good.
Without regular expressions, you could do something like the following:
Ignore everything up to the first ')' character
Split the string on the ';' character
foreach pair of the split string
the state is the first part (ignore potentially leading '#')
the number is the second part (ignore leading '#')
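The split-based steps above can be sketched in Python as follows (the same logic carries over to PowerShell's -split operator). The function name parse_pairs is made up for illustration, and the sketch assumes the values always come in state/number pairs.

```python
def parse_pairs(line):
    """Turn '2)Open;#1;#Close;#2' into [('1', 'Open'), ('2', 'Close')]."""
    # Ignore everything up to and including the first ')'.
    _, _, body = line.partition(")")
    # Split on ';' and drop the leading '#' from each part.
    parts = [part.lstrip("#") for part in body.split(";") if part]
    # Even positions hold the state, odd positions hold the number.
    return [(number, state) for state, number in zip(parts[0::2], parts[1::2])]


if __name__ == "__main__":
    for number, state in parse_pairs("2)Open;#1;#Close;#2;#pending;#6"):
        print(number, state)
```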
Or you could do it using the .NET System.Text.RegularExpressions.Regex class with the following regular expression:
(?:#?(?<state>[a-zA-Z]+);#(?<number>\d+);?)
The Matches method returns a MatchCollection; each Match in it has a Groups collection containing two named groups, state and number.
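As a quick way to check the pattern, here is the same idea in Python. Two things differ from the .NET version and are assumptions of this sketch: Python spells named groups (?P<name>...), and the number is matched with \d+ so that multi-digit values are captured. The function name extract_pairs is hypothetical.

```python
import re

PAIR = re.compile(r"#?(?P<state>[A-Za-z]+);#(?P<number>\d+);?")


def extract_pairs(line):
    """Return (number, state) tuples for every state/number pair in line."""
    return [(m.group("number"), m.group("state")) for m in PAIR.finditer(line)]


if __name__ == "__main__":
    for number, state in extract_pairs("2)Open;#1;#Close;#2;#pending;#6"):
        print(number, state)
```

The leading "2)" needs no special handling here because the search simply skips characters that cannot start a match.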

xText Variable/Attribute Assignment

I built a grammar in xText to recognize formal expressions of a specific format
and to use the generated object tree in Java.
This is what it looks like:
grammar eu.gemtec.device.espa.texpr.Texpr with org.eclipse.xtext.common.Terminals
generate texpr "http://www.gemtec.eu/device/espa/texpr/Texpr"
Model:
(expressions+=AbstractExpression)*
;
AbstractExpression:
MatcherExpression | Assignment;
MatcherExpression:
TerminalMatcher ({Operation.left=current} operator='or' right= MatcherExpression)?
;
TerminalMatcher returns MatcherExpression:
'(' MatcherExpression ')' | {MatcherLiteral} value=Literal
;
Literal:
CharMatcher | ExactMatcher
;
CharMatcher:
type=('text'|'number'|'symbol'|'whitespace') ('(' cardinality=Cardinality ')')?
;
/* Cardinalities for CharMatcher */
Cardinality:
CardinalityMin | CardinalityMinMax | CardinalityMax| CardinalityExact
;
CardinalityMin: min=INT '->';
CardinalityMinMax: min=INT '->' max=INT;
CardinalityMax: '->' max=INT;
CardinalityExact: exact=INT;
ExactMatcher:
(ignoreCase='ignoreCase''(' expected=STRING ')') | expected=STRING
;
/* Variable assignment
*
* e.g. $myVar=number
* */
Assignment:
'$' name=ID '=' expression=MatcherExpression
;
Everything works fine except for the 'cardinality' assignment.
The Expressions look like this:
text number(3) - (an arbitrary amount of letters followed by exactly 3 numbers)
symbol number(2->) - (an arbitrary amount of special characters followed by at least 2 numbers)
whitespace number(->4) - (an arbitrary amount of whitespaces followed by a maximum of 4 numbers)
number(3->6) - (at least 3 numbers but not more than 6)
When I run Eclipse with this grammar (so that my language is recognized and has code completion and so on), everything I type is shown in the "Outline"-tab as a tree-structure as it should, except for the cardinality values.
When I add a cardinality statement to a CharMatcher, the little plus appears before it, but when I click on it it just disappears.
Can anyone tell me why this does not work?
I found the solution myself. I think the problem was that the parser could not decide which class to create at this point:
Cardinality:
CardinalityMin | CardinalityMinMax | CardinalityMax| CardinalityExact
;
CardinalityMin: min=INT '->';
CardinalityMinMax: min=INT '->' max=INT;
CardinalityMax: '->' max=INT;
CardinalityExact: exact=INT;
So I simplified the whole thing a little, it now looks like this:
Cardinality:
CardinalityMinMax | CardinalityExact
;
CardinalityMinMax: (min=INT '..' max=INT) | (min=INT '..') | ('..' max=INT);
CardinalityExact: exact=INT;
It is still not shown in the "Outline"-Tab, but I suppose that is a problem of the visualisation.
The generated classes now work as intended.
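For reference, the four shapes accepted by the simplified rule (exact, min.., ..max, min..max) can be told apart as in the following Python sketch. This is not Xtext-generated code; the Cardinality tuple and parse_cardinality are hypothetical names, and treating an exact value n as min = max = n is an assumption of the sketch.

```python
import re
from typing import NamedTuple, Optional


class Cardinality(NamedTuple):
    min_count: Optional[int]  # None means "no lower bound given"
    max_count: Optional[int]  # None means "no upper bound given"


RANGE = re.compile(r"^(\d+)?\.\.(\d+)?$")


def parse_cardinality(text):
    text = text.strip()
    if text.isdigit():
        # CardinalityExact: a single number.
        return Cardinality(int(text), int(text))
    match = RANGE.match(text)
    if match is None or match.group(1) is None and match.group(2) is None:
        raise ValueError(f"not a cardinality: {text!r}")
    low, high = match.group(1), match.group(2)
    return Cardinality(
        int(low) if low is not None else None,
        int(high) if high is not None else None,
    )
```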

Lisp grammar in yacc

I am trying to build a Lisp grammar. Easy, right? Apparently not.
I present these inputs and receive errors...
( 1 1)
23 23 23
ui ui
This is the grammar...
%%
sexpr: atom {printf("matched sexpr\n");}
| list
;
list: '(' members ')' {printf("matched list\n");}
| '('')' {printf("matched empty list\n");}
;
members: sexpr {printf("members 1\n");}
| sexpr members {printf("members 2\n");}
;
atom: ID {printf("ID\n");}
| NUM {printf("NUM\n");}
| STR {printf("STR\n");}
;
%%
As near as I can tell, I need a single non-terminal defined as a program, upon which the whole parse tree can hang. But I tried it and it didn't seem to work.
edit - this was my "top terminal" approach:
program: slist;
slist: slist sexpr | sexpr;
But it allows problems such as:
( 1 1
Edit2: The FLEX code is...
%{
#include <stdio.h>
#include "a.yacc.tab.h"
int linenumber;
extern int yylval;
%}
%%
\n { linenumber++; }
[0-9]+ { yylval = atoi(yytext); return NUM; }
\"[^\"\n]*\" { return STR; }
[a-zA-Z][a-zA-Z0-9]* { return ID; }
.
%%
An example of the over-matching...
(1 1 1)
NUM
matched sexpr
NUM
matched sexpr
NUM
matched sexpr
(1 1
NUM
matched sexpr
NUM
matched sexpr
What's the error here?
edit: The error was in the lexer.
Lisp grammar cannot be represented as a context-free grammar, and yacc cannot parse all Lisp code.
This is because of Lisp features such as read-time evaluation and the programmable reader. So, just to read arbitrary Lisp code, you need to have a full Lisp running. This is not some obscure, unused feature; it is actually used, e.g. by CL-INTERPOL and CL-SQL.
If the goal is to parse a subset of lisp, then the program text is a sequence of sexprs.
The error is really in the lexer. Your parentheses end up as the last "." in the lexer, and don't show up as parentheses in the parser.
Add rules like
\) { return RPAREN; }
\( { return LPAREN; }
to the lexer and change all occurrences of '(' and ')' to LPAREN and RPAREN respectively in the parser. (Also, you need to declare LPAREN and RPAREN with %token where you define your token list.)
Note: I'm not sure about the syntax, could be the backslashes are wrong.
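To make the lexer problem concrete, here is a small Python imitation of the token rules. It is only a sketch: Python's re tries alternatives in order rather than picking the longest match as flex does, which happens to give the same result for these patterns. The function name tokenize and the paren_rules flag are made up for illustration.

```python
import re


def tokenize(text, paren_rules):
    """Return the token names a parser would see for text."""
    spec = [
        ("NUM", r"[0-9]+"),
        ("STR", r'"[^"\n]*"'),
        ("ID", r"[a-zA-Z][a-zA-Z0-9]*"),
    ]
    if paren_rules:
        spec += [("LPAREN", r"\("), ("RPAREN", r"\)")]
    # Catch-all rule with no action: whatever reaches it is silently dropped.
    spec.append(("SKIP", r".|\n"))
    pattern = re.compile("|".join(f"(?P<{name}>{regex})" for name, regex in spec))
    return [m.lastgroup for m in pattern.finditer(text) if m.lastgroup != "SKIP"]


if __name__ == "__main__":
    print(tokenize("(1 1", paren_rules=False))
    print(tokenize("(1 1)", paren_rules=False))
    print(tokenize("(1 1)", paren_rules=True))
```

Without the parenthesis rules both inputs produce the same token stream, so the parser has no way to notice the missing ')'.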
You are correct in that you need to define a non-terminal. That would be defined as a set of sexpr. I'm not sure of the YACC syntax for that. I'm partial to ANTLR for parser generators and the syntax would be:
program: sexpr*
Indicating 0 or more sexpr.
Update with YACC syntax:
program : /* empty */
| program sexpr
;
Not in YACC, but it might be helpful anyway: here's a full grammar in ANTLR v3 that works for the cases you described (it excludes strings in the lexer because they are not important for this example, and it uses C# console output because that's what I tested it with):
program: (sexpr)*;
sexpr: list
| atom {Console.WriteLine("matched sexpr");}
;
list:
'('')' {Console.WriteLine("matched empty list");}
| '(' members ')' {Console.WriteLine("matched list");}
;
members: (sexpr)+ {Console.WriteLine("members 1");};
atom: Id {Console.WriteLine("ID");}
| Num {Console.WriteLine("NUM");}
;
Num: ( '0' .. '9')+;
Id: ('a' .. 'z' | 'A' .. 'Z')+;
Whitespace : ( ' ' | '\r' '\n' | '\n' | '\t' ) {Skip();};
This won't work exactly as is in YACC because YACC generates an LALR parser while ANTLR is a modified recursive descent. There is a C/C++ output target for ANTLR if you wanted to go that way.
Do you necessarily need a yacc/bison parser? A "reads a subset of lisp syntax" reader isn't that hard to implement in C (start with a read_sexpr function, dispatch to a read_list when you see a '(', that in turn builds a list of contained sexprs until a ')' is seen; otherwise, call a read_atom that collects an atom and returns it when it can no longer read atom-constituent characters).
However, if you want to be able to read arbitrary Common Lisp, you'll need to (at the worst) implement a Common Lisp, as CL can modify the reader at run-time (and even switch between different read-tables at run-time under program control; quite handy when you're wanting to load code written in another language or dialect of lisp).
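The read_sexpr / read_list / read_atom structure described above can be sketched like this. It is written in Python rather than C to keep it short, it handles only a small subset (integers, symbols, parentheses), and read_program and skip_whitespace are extra helper names introduced for the sketch.

```python
def skip_whitespace(text, pos):
    while pos < len(text) and text[pos].isspace():
        pos += 1
    return pos


def read_atom(text, pos):
    # Collect characters until one that cannot be part of an atom.
    start = pos
    while pos < len(text) and not text[pos].isspace() and text[pos] not in "()":
        pos += 1
    token = text[start:pos]
    return (int(token) if token.isdigit() else token), pos


def read_list(text, pos):
    # Called just after '(' was consumed; reads sexprs until the matching ')'.
    items = []
    while True:
        pos = skip_whitespace(text, pos)
        if pos >= len(text):
            raise SyntaxError("missing ')'")
        if text[pos] == ")":
            return items, pos + 1
        item, pos = read_sexpr(text, pos)
        items.append(item)


def read_sexpr(text, pos=0):
    pos = skip_whitespace(text, pos)
    if pos >= len(text):
        raise SyntaxError("unexpected end of input")
    if text[pos] == "(":
        return read_list(text, pos + 1)
    if text[pos] == ")":
        raise SyntaxError("unexpected ')'")
    return read_atom(text, pos)


def read_program(text):
    # A program is a sequence of zero or more sexprs.
    forms, pos = [], skip_whitespace(text, 0)
    while pos < len(text):
        form, pos = read_sexpr(text, pos)
        forms.append(form)
        pos = skip_whitespace(text, pos)
    return forms
```

Unlike the lexer in the question, this reader sees the parentheses itself, so an input such as "( 1 1" fails with a missing ')' error instead of being accepted.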
It's been a long time since I worked with YACC, but you do need a top-level non-terminal. Could you be more specific about "tried it" and "it didn't seem to work"? Or, for that matter, what the errors are?
I'd also suspect that YACC might be overkill for such a syntax-light language. Something simpler (like recursive descent) might work better.
You could try this grammar here.
I just tried it, and my "yacc lisp grammar" works fine:
%start exprs
exprs:
| exprs expr
/// if you prefer right recursion :
/// | expr exprs
;
list:
'(' exprs ')'
;
expr:
atom
| list
;
atom:
IDENTIFIER
| CONSTANT
| NIL
| '+'
| '-'
| '*'
| '^'
| '/'
;