Lexer rule (regex) for TeX equation - lex

In TeX, equations are defined in between $...$. How can I define the lexer rule for lex, for the instance of any number of any characters between two dollar signs?
I tried:
equation \$[^\$]*\$
without a success.

You can try using start conditions if you don't want the dollar signs to be included as part of the equation:
%x EQN
%%
\$ { BEGIN(EQN); } /* switch to EQN start condition upon seeing $ */
<EQN>{
\$ { BEGIN(INITIAL); } /* return to initial state upon seeing another $ */
[^\$]* { printf(yytext); } /* match everything that isn't a $ */
}
Alternately instead of using BEGIN(STATE) you can use yy_push_state() and yy_pop_state() if you have other states defined in your lexer.

Related

Lex priority label opcode priority

I am using lex / yacc to write an assembler
I have some opcodes for example
ORA [Oo][Rr][Aa]
AND [Aa][Nn][Dd]
EOR [Ee][Oo][Rr]
and rules
{ORA} { yylval.iValue = ora; return OPCODE; }
{AND} { yylval.iValue = and; return OPCODE; }
{EOR} { yylval.iValue = eor; return OPCODE; }
I also have rules for labels
[A-Za-z_][A-Za-z0-9_]*: { yylval.sIndex = AddSymbol(yytext); return SYMBOL; }
[A-Za-z_][A-Za-z0-9_]* { yylval.sIndex = AddSymbol(yytext); return SYMBOL; }
labels in the syntax can be
ldx #$FF
loop:
sta $5535,X
dex
bne loop
The problem is it will match the label instead of the opcodes.
The first label rule works because of the ':' but the second label rule takes presidence over the opcode rule.
Is there a way for me to get the second case to the label to work(the bne loop)?
Thanks in advance.
I am fairly new to lex.
Make sure that the opcode rules come before the catch-all identifier rule. If two rules both apply to the longest matched token, (f)lex generated scanners choose the first one in the source.
Definitions do not alter the priority of rules. What is important is the order of the rules themselves.
By the way, you might want to consider making : a token by itself, rather than merging both instances of loop (one a definition and the other one a use) into the same token type.
Including the colon in the token, as you do, prevents the user from putting whitespace between the label name and the :, but that might be your intent. And in some grammars, a two-token label definition causes the grammar to be LR(2) instead of LR(1).
In any case, you will almost certainly find it simpler if you mark the definition as a definition by giving it a different token type.

Correct use of tilde operator for input arguments

Function:
My MATLAB function has one output and several input arguments, most of which are optional, i.e.:
output=MyFunction(arg1,arg2,opt1,opt2,...,optN)
What I want to do:
I'd like to give only arg1, arg2 and the last optional input argument optN to the function. I used the tilde operator as follows:
output=MyFunction(str1,str2,~,~,...,true)
Undesired result:
That gives the following error message:
Error: Expression or statement is incorrect--possibly unbalanced (, {, or [.
The error points to the comma after the first tilde, but I don't know what to make of it to be honest.
Problem identification:
I use MATLAB 2013b, which supports the tilde operator.
According to MATLAB's documentation the above function call should work:
You can ignore any number of function inputs, in any position in the argument list. Separate consecutive tildes with a comma...
I guess there are a few workarounds, such as using '' or [] as inputs, but I'd really like to understand how to correctly use '~' because actually leaving inputs out allows me to use exist() when checking the input arguments of a function.
If you need any further info from me, please let me know.
Thank you very much!
The tilde is only for function declaration. Matlab's mlint recommends to replace unused arguments by ~. The result is a function declared like this function output = MyFunction(a, b, ~, c). This is a very bad practice.
Since you have a function where the parameters are optional, you must call the function with empty arguments output=MyFunction(str1,str2,[],[],...,true).
A better way to do it is to declare the function with the varargin argument and prepare your function for the different inputs:
function output = MyFunction(varargin)
if nargin == 1
% Do something for 1 input
elseif nargin == 2
% Do something for 3 inputs
elseif nargin == 3
% Do something for 3 inputs
else
error('incorrect number of input arguments')
end
It is even possible to declare your function as follows:
function output = MyFunction(arg1, arg2, varargin)
The declaration above will tell Matlab that you are expecting at least two parameters.
See the documentation of nargin here.
... and the documentation of varargin here
To have variable number of inputs, use varargin. Use it together with nargin.
Example:
function varlist2(X,Y,varargin)
fprintf('Total number of inputs = %d\n',nargin);
nVarargs = length(varargin);
fprintf('Inputs in varargin(%d):\n',nVarargs)
for k = 1:nVarargs
fprintf(' %d\n', varargin{k})
end

Algorithm to evaluate value of Boolean expression

I had programming interview which consisted of 3 interviewers, 45 min each.
While first two interviewers gave me 2-3 short coding questions (i.e reverse linked list, implement rand(7) using rand(5) etc ) third interviewer used whole timeslot for single question:
You are given string representing correctly formed and parenthesized
boolean expression consisting of characters T, F, &, |, !, (, ) an
spaces. T stands for True, F for False, & for logical AND, | for
logical OR, ! for negate. & has greater priority than |. Any of these
chars is followed by a space in input string. I was to evaluate value
of expression and print it (output should be T or F). Example: Input:
! ( T | F & F ) Output: F
I tried to implement variation of Shunting Yard algorithm to solve the problem (to turn input in postfix form, and then to evaluate postfix expression), but failed to code it properly in given timeframe, so I ended up explaining in pseudocode and words what I wanted.
My recruiter said that first two interviewers gave me "HIRE", while third interviewer gave me "NO HIRE", and since the final decision is "logical AND", he thanked me for my time.
My questions:
Do you think that this question is appropriate to code on whiteboard in approx. 40 mins? To me it seems to much code for such a short timeslot and dimensions of whiteboard.
Is there shorter approach than to use Shunting yard algorithm for this problem?
Well, once you have some experience with parsers postfix algorithm is quite simple.
1. From left to right evaluate for each char:
if its operand, push on the stack.
if its operator, pop A, then pop B then push B operand A onto the stack. Last item on the stack will be the result. If there's none or more than one means you're doing it wrong (assuming the postfix notation is valid).
Infix to postfix is quite simple as well. That being said I don't think it's an appropriate task for 40 minutes if You don't know the algorithms. Here is a boolean postfix evaluation method I wrote at some stage (uses Lambda as well):
public static boolean evaluateBool(String s)
{
Stack<Object> stack = new Stack<>();
StringBuilder expression =new StringBuilder(s);
expression.chars().forEach(ch->
{
if(ch=='0') stack.push(false);
else if(ch=='1') stack.push(true);
else if(ch=='A'||ch=='R'||ch=='X')
{
boolean op1 = (boolean) stack.pop();
boolean op2 = (boolean) stack.pop();
switch(ch)
{
case 'A' : stack.push(op2&&op1); break;
case 'R' : stack.push(op2||op1); break;
case 'X' : stack.push(op2^op1); break;
}//endSwitch
}else
if(ch=='N')
{
boolean op1 = (boolean) stack.pop();
stack.push(!op1);
}//endIF
});
return (boolean) stack.pop();
}
In your case to make it working (with that snippet) you would first have to parse the expression and replace special characters like "!","|","^" etc with something plain like letters or just use integer char value in your if cases.

Perl - Pattern contains new line

I am trying to grab all functions from a C file in a perl script.
Pattern example :
function return type
function name (function parameters)
{
So far I have: m/^(.*)\((.*)\)/
But this grabs functions inside as well, such as if statements, so I was hoping to match for the { as well since that would eliminate all internal functions but m/^(.*)\((.*)\)/\n\{/
doesn't work.
How do I match for the \n{ i.e the { in the next line, so that I can catch
add(int a, int b)
{
... but avoid, say
if(a = b)
Have a look at C::Scan over at CPAN
There are no asterisks you want to match in the C source. Therefore, remove the backslashes before asterisks in the pattern.
The following might be closer to what you want:
m/^(.*?\(.*?\))\s*\n{/m

Best way to create generic/method consistency for sort.data.frame?

I've finally decided to put the sort.data.frame method that's floating around the internet into an R package. It just gets requested too much to be left to an ad hoc method of distribution.
However, it's written with arguments that make it incompatible with the generic sort function:
sort(x,decreasing,...)
sort.data.frame(form,dat)
If I change sort.data.frame to take decreasing as an argument as in sort.data.frame(form,decreasing,dat) and discard decreasing, then it loses its simplicity because you'll always have to specify dat= and can't really use positional arguments. If I add it to the end as in sort.data.frame(form,dat,decreasing), then the order doesn't match with the generic function. If I hope that decreasing gets caught up in the dots `sort.data.frame(form,dat,...), then when using position-based matching I believe the generic function will assign the second position to decreasing and it will get discarded. What's the best way to harmonize these two functions?
The full function is:
# Sort a data frame
sort.data.frame <- function(form,dat){
# Author: Kevin Wright
# http://tolstoy.newcastle.edu.au/R/help/04/09/4300.html
# Some ideas from Andy Liaw
# http://tolstoy.newcastle.edu.au/R/help/04/07/1076.html
# Use + for ascending, - for decending.
# Sorting is left to right in the formula
# Useage is either of the following:
# sort.data.frame(~Block-Variety,Oats)
# sort.data.frame(Oats,~-Variety+Block)
# If dat is the formula, then switch form and dat
if(inherits(dat,"formula")){
f=dat
dat=form
form=f
}
if(form[[1]] != "~") {
stop("Formula must be one-sided.")
}
# Make the formula into character and remove spaces
formc <- as.character(form[2])
formc <- gsub(" ","",formc)
# If the first character is not + or -, add +
if(!is.element(substring(formc,1,1),c("+","-"))) {
formc <- paste("+",formc,sep="")
}
# Extract the variables from the formula
vars <- unlist(strsplit(formc, "[\\+\\-]"))
vars <- vars[vars!=""] # Remove spurious "" terms
# Build a list of arguments to pass to "order" function
calllist <- list()
pos=1 # Position of + or -
for(i in 1:length(vars)){
varsign <- substring(formc,pos,pos)
pos <- pos+1+nchar(vars[i])
if(is.factor(dat[,vars[i]])){
if(varsign=="-")
calllist[[i]] <- -rank(dat[,vars[i]])
else
calllist[[i]] <- rank(dat[,vars[i]])
}
else {
if(varsign=="-")
calllist[[i]] <- -dat[,vars[i]]
else
calllist[[i]] <- dat[,vars[i]]
}
}
dat[do.call("order",calllist),]
}
Example:
library(datasets)
sort.data.frame(~len+dose,ToothGrowth)
Use the arrange function in plyr. It allows you to individually pick which variables should be in ascending and descending order:
arrange(ToothGrowth, len, dose)
arrange(ToothGrowth, desc(len), dose)
arrange(ToothGrowth, len, desc(dose))
arrange(ToothGrowth, desc(len), desc(dose))
It also has an elegant implementation:
arrange <- function (df, ...) {
ord <- eval(substitute(order(...)), df, parent.frame())
unrowname(df[ord, ])
}
And desc is just an ordinary function:
desc <- function (x) -xtfrm(x)
Reading the help for xtfrm is highly recommended if you're writing this sort of function.
There are a few problems there. sort.data.frame needs to have the same arguments as the generic, so at a minimum it needs to be
sort.data.frame(x, decreasing = FALSE, ...) {
....
}
To have dispatch work, the first argument needs to be the object dispatched on. So I would start with:
sort.data.frame(x, decreasing = FALSE, formula = ~ ., ...) {
....
}
where x is your dat, formula is your form, and we provide a default for formula to include everything. (I haven't studied your code in detail to see exactly what form represents.)
Of course, you don't need to specify decreasing in the call, so:
sort(ToothGrowth, formula = ~ len + dose)
would be how to call the function using the above specifications.
Otherwise, if you don't want sort.data.frame to be an S3 generic, call it something else and then you are free to have whatever arguments you want.
I agree with #Gavin that x must come first. I'd put the decreasing parameter after the formula though - since it probably isn't used that much, and hardly ever as a positional argument.
The formula argument would be used much more and therefore should be the second argument. I also strongly agree with #Gavin that it should be called formula, and not form.
sort.data.frame(x, formula = ~ ., decreasing = FALSE, ...) {
...
}
You might want to extend the decreasing argument to allow a logical vector where each TRUE/FALSE value corresponds to one column in the formula:
d <- data.frame(A=1:10, B=10:1)
sort(d, ~ A+B, decreasing=c(A=TRUE, B=FALSE)) # sort by decreasing A, increasing B