operator precedence in bison - substring

I have a flex/bison project in which I need to support a few string operators: the operator '^' reverses a string, and the operator [i] returns the character at index i of the string.
Example of the expected input and output:
input: ^"abc"[0] ---> correct output: "c", my output: "a"
That's because I want to reverse the string first ("cba") and then take index 0 ("cba"[0] is "c").
I don't know how to set up that precedence, so my code outputs "a": it first takes "abc"[0] --> "a" and then reverses it --> "a". As of now I have this in my bison file:
%left STR MINI
%left '^'
substring:
STR MINI { //THIS IS DONE FIRST, SUBSTRING
$$ = substringFind($1,$2,$2,temp);
}
| '^' substring { //BUT I WANT THIS (REVERSING) TO BE FIRST
$$ = reverseStrings($2,temp);
}
;
How do I change that precedence? I don't really understand the precedence rules. It was easy with plus (+) and multiplication (*), but with these operators I don't know how to make it work.
Any help?

You need separate productions, one per precedence level, rather than alternatives within the same production. Something like:
string
: substring
;
substring
: reverse MINI { ... }
| reverse
;
reverse
: '^' reverse { ... }
| STR
;
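To see why the layered grammar gives the wanted result, here is a rough recursive-descent sketch in Python that mirrors the three productions above. It is not bison code: the tokenizer, the function names, and the assumption that MINI is an index written as [i] are all made up for illustration.

```python
import re

# Hypothetical tokenizer: '^', a double-quoted string (STR), or [digits] (MINI).
TOKEN = re.compile(r'\s*(?:(\^)|"([^"]*)"|\[(\d+)\])')

def tokenize(text):
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected input at offset {pos}")
        if m.group(1):
            tokens.append(("^", None))
        elif m.group(2) is not None:
            tokens.append(("STR", m.group(2)))
        else:
            tokens.append(("MINI", int(m.group(3))))
        pos = m.end()
    return tokens

def parse_reverse(tokens, i):
    # reverse : '^' reverse | STR
    if i >= len(tokens):
        raise SyntaxError("unexpected end of input")
    kind, value = tokens[i]
    if kind == "^":
        inner, i = parse_reverse(tokens, i + 1)
        return inner[::-1], i
    if kind == "STR":
        return value, i + 1
    raise SyntaxError(f"unexpected token {kind}")

def parse_substring(tokens, i):
    # substring : reverse MINI | reverse
    # The whole 'reverse' is reduced before MINI is applied to it.
    s, i = parse_reverse(tokens, i)
    if i < len(tokens) and tokens[i][0] == "MINI":
        return s[tokens[i][1]], i + 1
    return s, i

def evaluate(text):
    # string : substring
    tokens = tokenize(text)
    result, i = parse_substring(tokens, 0)
    if i != len(tokens):
        raise SyntaxError("trailing input")
    return result

print(evaluate('^"abc"[0]'))
```

Because the index is only applied in parse_substring, after parse_reverse has consumed every leading '^', the reversal always happens first.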


Array not recognized by powershell parser when other operators are involved

Assigning an array looks like this:
PS> $x = "a", "b"
PS> $x
a
b
Now, I wanted to add a 'root string' ("r") to every element, so I did this (actually I used a variable, but for the sake of simplicity let's just use a string here):
PS> $x = "r" + "a" , "r" + "b"
PS> $x
ra rb
Looking at the output, I didn't get the array that I expected, but a single string with a "space" (I checked: it's a 32 ascii char, so a space, not a tab or another character).
That is: the comma seems to be interpreted as a string join operator, which I couldn't find any reference to.
Even worse, I get the feeling that I don't understand how the parser works here. I had a look at about_Parsing; what I found does not seem to apply to this case.
Commas (,) introduce lists passed as arrays, except when the command
to be called is a native application, in which case they are
interpreted as part of the expandable string. Initial, consecutive or
trailing commas are not supported.
The first obvious fix that I came up with is the following:
PS> $x = ("r" + "a") , ("r" + "b")
PS> $x
ra
rb
Maybe there are others, and I am especially interested in the ones that reveal how the parser actually works. What I would most like to fix is my knowledge of the parsing rules.
To flesh out the helpful comments on the answer:
tl;dr
Due to operator precedence, your command is parsed as "r" + ("a" , "r") + "b", causing array "a", "r" to be implicitly stringified to verbatim a r, resulting in two string concatenation operations yielding a single string with verbatim content ra rb.
Using (...) is indeed the correct way to override operator precedence.
"r" + "a" , "r" + "b"
is an expression involving operators.
Expressions are parsed in expression mode, which contrasts with argument mode; the latter applies to commands, i.e. named units of functionality that are called with shell-typical syntax (whitespace-separated arguments, quotes around simple strings optional). Arguments (parameter values) in argument mode are parsed differently from operands in expression mode, as explained in the conceptual about_Parsing help topic. Your quote about , relates to argument mode, not expression mode.
The conceptual about_Operator_Precedence help topic describes the relative precedence among operators, from which you can glean that the array constructor operator (,) has higher precedence than the + operator.
Therefore, your expression is parsed as follows (using (...), the grouping operator, to make the implicit rules explicit):
"r" + ("a" , "r") + "b"
+ is polymorphic in PowerShell, and with a [string] instance as the LHS the RHS is coerced to a string too.
Therefore, array "a" , "r" is stringified, which uses PowerShell's custom array stringification, namely joining the (potentially stringified) array elements with a space.[1]
That is, the array stringifies to a string with verbatim content a r.
As an aside: The same stringification is applied in the context of string interpolation via expandable (double-quoted) strings ("..."); that is, "$("a", "r")" also yields verbatim a r.
Therefore, the above is equivalent to:
"r" + "a r" + "b"
which yields verbatim ra rb.
(...) is indeed the appropriate way to ensure the desired precedence:
("r" + "a"), ("r" + "b") # -> array 'ra', 'rb'
[1] Space is the default separator character. Technically, you can override it via the $OFS preference variable, though that is rarely used in practice.
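The parse described above can be imitated with a toy model in Python. This is only a sketch of the rules as stated in this answer, not PowerShell itself; ps_stringify and ps_plus are hypothetical helper names, and a Python list stands in for a PowerShell array.

```python
def ps_stringify(value, ofs=" "):
    # Arrays are stringified by joining their (stringified) elements with
    # the separator, which defaults to a single space.
    if isinstance(value, list):
        return ofs.join(ps_stringify(v, ofs) for v in value)
    return str(value)

def ps_plus(lhs, rhs):
    # The type of the left-hand side decides what + does.
    if isinstance(lhs, str):
        return lhs + ps_stringify(rhs)   # string concatenation
    if isinstance(lhs, list):
        return lhs + (rhs if isinstance(rhs, list) else [rhs])  # array append
    raise TypeError("this toy model only covers strings and arrays")

# "r" + "a" , "r" + "b" is parsed as "r" + ("a" , "r") + "b"
as_parsed = ps_plus(ps_plus("r", ["a", "r"]), "b")

# ("r" + "a") , ("r" + "b")
intended = [ps_plus("r", "a"), ps_plus("r", "b")]

print(repr(as_parsed))
print(intended)
```

The first result is the single string "ra rb"; the second is a two-element list, matching the parenthesized form.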
Another way to do it. The type of the first term controls what type of operation the plus performs. The first term here is an empty array. If you want the plus to do both kinds of operations, there's no getting around extra parentheses to change the operator precedence.
@() + 'ra' + 'rb'
ra
rb
Or more commonly:
'ra','rb' + 'rc'
ra
rb
rc

Parsing Infix Mathematical Expressions in Swift Using Regular Expressions

I would like to convert a string that is formatted as an infix mathematical expression to an array of tokens, using regular expressions. I'm very new to regular expressions, so forgive me if the answer to this question turns out to be too trivial.
For example:
"31+2--3*43.8/1%(1*2)" -> ["31", "+", "2", "-", "-3", "*", "43.8", "/", "1", "%", "(", "1", "*", "2", ")"]
I've already implemented a method that achieves this task, however, it consists of many lines of code and a few nested loops. I figured that when I define more operators/functions that may even consist of multiple characters, such as log or cos, it would be easier to edit a regex string rather than adding many more lines of code to my working function. Are regular expressions the right job for this, and if so, where am I going wrong? Or am I better off adding to my working parser?
I've already referred to the following SO posts:
How to split a string, but also keep the delimiters?
This one was very helpful, but I don't believe I'm using 'lookahead' correctly.
Validate mathematical expressions using regular expression?
The solution to the question above doesn't convert the string into an array of tokens. Rather, it checks to see if the given string is a valid mathematical expression.
My code is as follows:
func convertToInfixTokens(expression: String) -> [String]?
{
do
{
let pattern = "^(((?=[+-/*]))(-)?\\d+(\\.\\d+)?)*"
let regex = try NSRegularExpression(pattern: pattern)
let results = regex.matches(in: expression, range: NSRange(expression.startIndex..., in: expression))
return results.map
{
String(expression[Range($0.range, in: expression)!])
}
}
catch
{
return nil
}
}
When I do pass a valid infix expression to this function, it returns nil. Where am I going wrong with my regex string?
NOTE: I haven't even gotten to the point of trying to parse parentheses as individual tokens. I'm still figuring out why it won't work on this expression:
"-99+44+2+-3/3.2-6"
Any feedback is appreciated, thanks!
Your pattern does not work because ^ anchors it to the start of the string, and the (?=[+-/*]) positive lookahead then requires the next character to be an operator from the specified set, yet the only operator the pattern can consume is an optional -. So, with -99+44+2+-3/3.2-6, the first iteration matches -99, but when * tries to match the enclosed sequence a second time it sees +44, and -?\d+ cannot match it because nothing in the pattern consumes the +.
You may tokenize the expression using
let pattern = "(?<!\\d)-?\\d+(?:\\.\\d+)?|[-+*/%()]"
See the regex demo
Details
(?<!\d) - there should be no digit immediately to the left of the current position
-? - an optional -
\d+ - 1 or more digits
(?:\.\d+)? - an optional sequence of . and 1+ digits
| - or
[-+*/%()] - any one of the listed operator or parenthesis characters.
Output using your function:
Optional(["31", "+", "2", "-", "-3", "*", "43.8", "/", "1", "%", "(", "1", "*", "2", ")"])
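The same pattern can be tried outside Swift. Below is a minimal sketch using Python's re module, which supports the same lookbehind syntax; the function name tokenize is made up for illustration.

```python
import re

# Same pattern as above: a number (with an optional leading minus that is
# not preceded by a digit), or a single operator/parenthesis character.
TOKEN_PATTERN = re.compile(r"(?<!\d)-?\d+(?:\.\d+)?|[-+*/%()]")

def tokenize(expression):
    # findall returns every non-overlapping match, left to right.
    return TOKEN_PATTERN.findall(expression)

print(tokenize("31+2--3*43.8/1%(1*2)"))
print(tokenize("-99+44+2+-3/3.2-6"))
```

The lookbehind is what separates the binary minus in 2--3 from the sign of -3: directly after a digit, -?\d+ is not allowed to start, so that - falls through to the operator alternative.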

Is it possible to match any character that is not ']' in PATINDEX?

I need to find the index of the first character that is not ]. Normally to match any character except X, you use the pattern [^X]. The problem is that [^]] simply closes the first bracket too early. The first part, [^], will match any character.
In the documentation for the LIKE operator, if you scroll down to the section "Using Wildcard Characters As Literals", it shows a table of methods to indicate literal characters like [ and ] inside a pattern. It makes no mention of using [ or ] inside double brackets. If the pattern is being used with the LIKE operator, you would use the ESCAPE clause. LIKE doesn't return an index and PATINDEX doesn't seem to have a parameter for an escape clause.
Is there no way to do this?
(This may seem arbitrary. To put some context around it, I need to match ] immediately followed by a character that is not ] in order to locate the end of a quoted identifier. ]] is the only character escape inside a quoted identifier.)
This isn't possible. The Connect item PATINDEX Missing ESCAPE Clause is closed as won't fix.
I'd probably use CLR and regular expressions.
A simple implementation might be
using System.Data.SqlTypes;
using System.Text.RegularExpressions;
public partial class UserDefinedFunctions
{
[Microsoft.SqlServer.Server.SqlFunction]
public static SqlInt32 PatIndexCLR(SqlString pattern, SqlString expression)
{
if (pattern.IsNull || expression.IsNull)
return new SqlInt32();
Match match = Regex.Match(expression.ToString(), pattern.ToString());
if (match.Success)
{
return new SqlInt32(match.Index + 1);
}
else
{
return new SqlInt32(0);
}
}
}
With example usage
SELECT [dbo].[PatIndexCLR] ( N'[^]]', N']]]]]]]]ABC[DEF');
If that is not an option, a possible (flaky) workaround is to replace ] with a character that is unlikely to appear in the data and has no special meaning in the pattern grammar.
WITH T(Value) AS
(
SELECT ']]]]]]]]ABC[DEF'
)
SELECT PATINDEX('%[^' + char(7) + ']%', REPLACE(Value,']', char(7)))
FROM T
(Returns 9)
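Both approaches can be checked quickly outside SQL Server. The sketch below is in Python purely for illustration; the function names are hypothetical, and results are 1-based with 0 meaning "not found", as with PATINDEX.

```python
import re

def pat_index_regex(pattern, expression):
    # Mirrors the CLR function: 1-based index of the first match, else 0.
    match = re.search(pattern, expression)
    return match.start() + 1 if match else 0

def first_non_bracket_via_substitution(expression, placeholder="\x07"):
    # Mirrors the REPLACE workaround: swap ']' for a placeholder assumed
    # to be absent from the data, then look for the first other character.
    swapped = expression.replace("]", placeholder)
    for position, char in enumerate(swapped, start=1):
        if char != placeholder:
            return position
    return 0

sample = "]" * 8 + "ABC[DEF"
print(pat_index_regex(r"[^\]]", sample))
print(first_non_bracket_via_substitution(sample))
```

The substitution variant gives a wrong answer if the placeholder character does occur in the data, which is why it is described as flaky.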

Matlab: how to convert character array or string into a formatted output OR parse a string

Could someone please tell me how to convert character array into a formatted output using Matlab?
I am expecting data like this:
CHAR (1 x 29) : 0.050822999 3.141592979 ; (1)
OR
CELL (1 x 1) or string: '0.050822999 3.141592979 ; (1)'
I am looking for output like this:
d1 = 0.050822999; %double
d2 = 3.141592979; %double
index = 1; % integer
I tried transposing and then using str2num(Str'); but it returns a 0x0 double.
Any help would be appreciated.
Regards,
DK
You can use regexp to parse the string:
c = { '0.050822999 3.141592979 ; (1)' };
p = regexp( c{1}, '^(\d+\.\d+)\s(\d+\.\d+)\s*;\s*\((\d+)\)$', 'tokens', 'once' ); %//parse the input string
numbers = str2double(p); %// convert extracted strings to numerical values
Example result
ans =
0.050822999
3.141592979
1
Explaining the regexp pattern:
^ - pattern starts at the beginning of the input string
(\d+\.\d+) - parentheses ('()') enclosing this sub-pattern indicates it as a single token
\d+ matches one or more digits, then expecting \. a dot (notice the \, since . alone in regexp acts as a wildcard) and after the dot \d+ one or more digits are expected.
This token should correspond to the first number, e.g., 0.050822999
\s expecting a single space
(\d+\.\d+) - again, expecting another decimal fraction as the second token.
\s* - expecting white space (zero or more).
; - match the ; in the expression, but not as a token.
\s* - expecting white space (zero or more).
\( - expecting an open parenthesis, note the \ since parentheses in regexp are used to denote tokens.
(\d+) - expecting one or more digits as the third token, only integer numbers are expected here. no decimal point.
\) - expecting a closing parenthesis.
$ - pattern should reach the end of the input string.
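For readers who want to try the pattern outside Matlab, here is a rough equivalent using Python's re module. The pattern is unchanged; parse_record is a made-up name, and the conversion to float and int is an assumption based on the types requested in the question.

```python
import re

# Same pattern as in the regexp call above.
PATTERN = re.compile(r"^(\d+\.\d+)\s(\d+\.\d+)\s*;\s*\((\d+)\)$")

def parse_record(text):
    match = PATTERN.match(text)
    if match is None:
        raise ValueError(f"input does not have the expected layout: {text!r}")
    d1, d2, index = match.groups()   # the three tokens, as strings
    return float(d1), float(d2), int(index)

d1, d2, index = parse_record("0.050822999 3.141592979 ; (1)")
print(d1, d2, index)
```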
You can use something like this (if I understood you correctly)
function str_dump(var)
info = whos('var'); %// inspect only the input argument
disp([info.class ' ' mat2str(info.size) ' : ' var]);
end
This just shows information about the string. If you want to parse it and convert it to another Matlab structure, you will have to explain the requirement more carefully.
%// Input
a = [0.050822999 3.141592979];
n = 1;
%// Output
str = [num2str(a,'%0.9f ') ' ; (' num2str(n) ')']
Result:
str =
0.050822999 3.141592979 ; (1)

Comparing characters in Rebol 3

I am trying to compare characters to see if they match. I can't figure out why it doesn't work. I'm expecting true on the output, but I'm getting false.
character: "a"
word: "aardvark"
(first word) = character ; expecting true, getting false
In Rebol, "a" is not a character; it is a string.
A single unicode character is its own independent type, with its own literal syntax, e.g. #"a". For example, it can be converted back and forth from INTEGER! to get a code point, which the single-letter string "a" cannot:
>> to integer! #"a"
== 97
>> to integer! "a"
** Script error: cannot MAKE/TO integer! from: "a"
** Where: to
** Near: to integer! "a"
A string is not a series of one-character STRING!s, it's a series of CHAR!. So what you want is therefore:
character: #"a"
word: "aardvark"
(first word) = character ;-- true!
(Note: Interestingly, binary conversions of both a single character string and that character will be equivalent:
>> to binary! "μ"
== #{CEBC}
>> to binary! #"μ"
== #{CEBC}
...those are UTF-8 byte representations.)
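A loosely analogous pitfall exists in other languages. Python has no separate character type for text, so the sketch below uses bytes, where indexing yields an integer element, not a length-1 sequence. It is only an analogy to the CHAR!/STRING! distinction, not a statement about how Rebol works.

```python
word = b"aardvark"

one_byte_sequence = b"a"    # a sequence of length 1 (like the string "a")
single_element = ord("a")   # one element, the integer 97 (like #"a")

first = word[0]             # indexing bytes yields an int, not bytes

print(first == one_byte_sequence)
print(first == single_element)

# The UTF-8 encoding of the character shown above is the same two bytes.
utf8_mu = "μ".encode("utf-8")
print(utf8_mu.hex().upper())
```

The first comparison is False and the second is True, for the same reason as in the question: an element of a sequence is being compared with a one-element sequence.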
For cases like this, when things start to behave differently from what you expected, I recommend using tools like probe and type?. They will help you get a sense of what's going on, and you can try small pieces of code in the interactive Rebol console.
For instance:
>> character: "a"
>> word: "aardvark"
>> type? first word
== char!
>> type? character
== string!
So you can indeed see that the first element of word is the character #"a", while your character is the string! "a". (Although I agree with @HostileFork that, to a human, a string of length 1 and a character look the same.)
Other places you can test things are http://tryrebol.esperconsultancy.nl or in the chat room with RebolBot