Lex Parsing for exponent


I am trying to parse a file whose data looks like this:
size = [5e+09, 5e+09, 5e+09]
I have 'size OSQUARE NUMBER COMMA NUMBER COMMA NUMBER ESQUARE'
And NUMBER is defined in tokrules as
t_NUMBER = r'[-]?[0-9]*[\.]*[0-9]+([eE]-?[0-9]+)*'
But I get
Syntax error in input!
LexToken(ID,'e',6,113)
Illegal character '+'
Illegal character '+'
Illegal character '+'
What is wrong with my NUMBER definition?
I am using PLY (https://www.dabeaz.com/ply/).

The part of your rule which matches exponents is
([eE]-?[0-9]+)*
Clearly, that won't match a +. It should be:
([eE][-+]?[0-9]+)*
Also, it will match 0 or more exponents, which is not correct. It should match 0 or 1:
([eE][-+]?[0-9]+)?
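To see the difference concretely, here is a minimal sketch that compares the original and corrected patterns using Python's re module directly, outside of PLY. The names OLD_NUMBER, NEW_NUMBER and matches_whole are made up for this illustration, and it assumes the token rule behaves the same way once PLY compiles it.

```python
import re

# Pattern from the question: the exponent only allows an optional '-'.
OLD_NUMBER = r'[-]?[0-9]*[\.]*[0-9]+([eE]-?[0-9]+)*'
# Corrected pattern: optional '+' or '-' sign, and at most one exponent.
NEW_NUMBER = r'[-]?[0-9]*[\.]*[0-9]+([eE][-+]?[0-9]+)?'


def matches_whole(pattern: str, text: str) -> bool:
    """Return True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None


samples = ["5e+09", "5e-09", "5e09", "3.14", "1e5e5"]
# Map each sample to (old pattern matches, new pattern matches).
results = {
    s: (matches_whole(OLD_NUMBER, s), matches_whole(NEW_NUMBER, s))
    for s in samples
}

for text, (old_ok, new_ok) in results.items():
    print(f"{text!r}: old={old_ok} new={new_ok}")
```

The old pattern rejects "5e+09" as a single token, which is why the lexer fell back to an ID token for e and reported the + as illegal. It also accepts "1e5e5", which the corrected pattern rejects.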

Related

Regex expression in q to match specific integer range following string

Using q’s like function, how can we achieve the following match using a single regex string regstr?
q) ("foo7"; "foo8"; "foo9"; "foo10"; "foo11"; "foo12"; "foo13") like regstr
>>> 0111110b
That is, like regstr matches the foo-strings which end in the numbers 8,9,10,11,12.
Using regstr:"foo[8-12]" confuses the square brackets (how does it interpret this?) since 12 is not a single digit, while regstr:"foo[1[0-2]|[1-9]]" returns a type error, even without the foo-string complication.

As the other comments and answers mentioned, this can't be done using a single regex. An alternative is to construct the list of strings that you want to compare against:
q)str:("foo7";"foo8";"foo9";"foo10";"foo11";"foo12";"foo13")
q)match:{x in y,/:string z[0]+til 1+neg(-/)z}
q)match[str;"foo";8 12]
0111110b
If your eventual goal is to filter on the matching entries, you can replace in with inter:
q)match:{x inter y,/:string z[0]+til 1+neg(-/)z}
q)match[str;"foo";8 12]
"foo8"
"foo9"
"foo10"
"foo11"
"foo12"

A variation on Cillian’s method: test the prefix and numbers separately.
q)range:{x+til 1+y-x}.
q)s:"foo",/:string 82,range 7 13 / include "foo82" in tests
q)match:{min(x~/:;in[;string range y]')#'flip count[x]cut'z}
q)match["foo";8 12;] s
00111110b
Note how unary derived functions x~/: and in[;string range y]' are paired by #' to the split strings, then min used to AND the result:
q)flip 3 cut's
"foo" "foo" "foo" "foo" "foo" "foo" "foo" "foo"
"82" ,"7" ,"8" ,"9" "10" "11" "12" "13"
q)("foo"~/:;in[;string range 8 12]')#'flip 3 cut's
11111111b
00111110b
Compositions rock.

As the comments state, regex in kdb+ is extremely limited. If the number of trailing digits is known, as in the example above, then the following can be used to check multiple patterns:
q)str:("foo7"; "foo8"; "foo9"; "foo10"; "foo11"; "foo12"; "foo13"; "foo3x"; "foo123")
q)any str like/:("foo[0-9]";"foo[0-9][0-9]")
111111100b
Checking for a range like 8-12 is not currently possible within kdb+ regex. One possible workaround is to write a function that implements this logic. The function range below checks which strings in a list start with a given string and end with a number within the specified range.
range:{
/ checking for strings starting with string y
s:((c:count y)#'x)like y;
/ convert remainder of string to long, check if within range
d:("J"$c _'x)within z;
/ find strings satisfying both conditions
s&d
}
Example use:
q)range[str;"foo";8 12]
011111000b
q)str where range[str;"foo";8 12]
"foo8"
"foo9"
"foo10"
"foo11"
"foo12"
This could be made more efficient by checking the trailing digits only on the subset of strings starting with "foo".
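For comparison, here is a rough Python analogue of the same prefix-and-range check, next to what the match looks like in a regex engine that supports alternation (which q's like does not). This is only a sketch: in_range and RANGE_PATTERN are names invented for this illustration, not part of kdb+ or any library.

```python
import re

strs = ["foo7", "foo8", "foo9", "foo10", "foo11", "foo12", "foo13",
        "foo3x", "foo123"]


def in_range(strings, prefix, bounds):
    """Flag strings that start with prefix and end with a number in bounds."""
    lo, hi = bounds
    flags = []
    for s in strings:
        tail = s[len(prefix):]
        ok = (
            s.startswith(prefix)
            and tail.isascii()
            and tail.isdecimal()      # rejects "" and tails such as "3x"
            and lo <= int(tail) <= hi
        )
        flags.append(ok)
    return flags


# With alternation available, the range 8..12 fits in a single pattern.
RANGE_PATTERN = re.compile(r"foo(?:[89]|1[0-2])")

by_function = in_range(strs, "foo", (8, 12))
by_regex = [RANGE_PATTERN.fullmatch(s) is not None for s in strs]

print(by_function)
print(by_regex)
```

Both lists flag foo8 through foo12 only, mirroring the 011111000b result above.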

For your example you can pad, fill with a char, and then simple regex works fine:
("."^5$("foo7";"foo8";"foo9";"foo10";"foo11";"foo12";"foo13")) like "foo[1|8-9][.|0-2]"

How to check column that contain letter and number in Talend

My column values must contain 2 letters followed by 4 digits, like this: AV1234
How can I check this?

You can use SQL templates, as described in the Talend documentation, and check that the column contains the required letters and digits with a regular expression.
Use this: [a-zA-Z]{2}[0-9]{4}
If you want only uppercase letters, use this: [A-Z]{2}[0-9]{4}
[a-zA-Z] # Match a single character present in the list below
# A character in the range between “a” and “z”
# A character in the range between “A” and “Z”
{2} # Exactly 2 times
[0-9] # Match a single character in the range between “0” and “9”
{4} # Exactly 4 times

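Thank you for your answer! It works.
My routine code:
public static Boolean MyPattern(String str) {
    // Requires: import java.util.regex.Pattern; at the top of the routine class.
    // Two uppercase letters followed by exactly four digits, e.g. "AV1234".
    String stringPattern = "[A-Z]{2}[0-9]{4}";
    // Pattern.matches succeeds only if the entire string matches the pattern.
    boolean match = Pattern.matches(stringPattern, str);
    return match;
}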

String to Integer (atoi) [Leetcode] gave wrong answer?

String to Integer (atoi)
The problem is to implement atoi, which converts a string to an integer.
When the test input is " +0 123",
my code returns 123.
But why is the expected answer 0?
======================
And when the test input is " +0123",
my code returns 123,
and the expected answer is also 123.
So is the expected answer for the first case wrong?

I think this is the expected result, given the stated
requirements for atoi:
The function first discards as many whitespace characters as necessary until the first non-whitespace character is found. Then, starting from this character, takes an optional initial plus or minus sign followed by as many numerical digits as possible, and interprets them as a numerical value.
Your first test case has a space between two separate digit groups, and atoi only considers the first group, which is '0', and converts it into an integer.
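The quoted requirements can be sketched in a few lines of Python. This is an illustration of the rule only, not the asker's code or a reference solution: simple_atoi is a hypothetical name, and the sketch omits any overflow handling the full problem statement may require.

```python
def simple_atoi(text: str) -> int:
    """Parse a leading integer following the quoted atoi rules (no clamping)."""
    i, n = 0, len(text)

    # 1. Discard leading whitespace.
    while i < n and text[i].isspace():
        i += 1

    # 2. Take an optional plus or minus sign.
    sign = 1
    if i < n and text[i] in "+-":
        if text[i] == "-":
            sign = -1
        i += 1

    # 3. Take as many digits as possible, stopping at the first non-digit.
    start = i
    while i < n and text[i] in "0123456789":
        i += 1

    digits = text[start:i]
    # No digits after the optional sign means no valid conversion.
    return sign * int(digits) if digits else 0


print(simple_atoi(" +0 123"))  # the space after 0 ends the digit run
print(simple_atoi(" +0123"))   # one unbroken digit run
```

In the first call the scan stops at the space after 0, so only "0" is converted. In the second call "0123" is a single run of digits.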

I want to convert as below via preg_replace

I want to convert the string below using preg_replace.
What pattern should I use?
preg_replace($pattern, "$2/$1", "One001Two111Three");
result> Three/Two111/One001

You'd be better off using preg_split. It is much simpler than preg_replace, and it works with any number of elements:
$str = "One001Two111Three";
$res = implode('/', array_reverse(preg_split('/(?<=\d)(?=[A-Z])/', $str)));
echo $res,"\n";
output:
Three/Two111/One001
The regex /(?<=\d)(?=[A-Z])/ splits on the boundary between a digit and a capital letter, array_reverse reverses the order of the array returned by preg_split, and implode then joins the elements of the reversed array with a /.
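The same split, reverse and join idea carries over to other regex engines. As a sketch, here it is in Python. It assumes Python 3.7 or later, where re.split accepts patterns that match the empty string, and reverse_segments is a name chosen for this illustration.

```python
import re


def reverse_segments(text: str) -> str:
    """Split between a digit and a capital letter, then join in reverse order."""
    # Zero-width split point: preceded by a digit, followed by a capital letter.
    parts = re.split(r"(?<=\d)(?=[A-Z])", text)
    return "/".join(reversed(parts))


print(reverse_segments("One001Two111Three"))
```

Because the split point is zero-width, no characters are consumed and the approach does not depend on how many segments the input has.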

$string = "One001Two111Three";
$result = preg_replace('/^(.*?\d+)(.*?\d+)(.*?)$/im', '$3/$2/$1', $string );
echo $result;
RESULT: Three/Two111/One001
EXPLANATION:
^(.*?\d+)(.*?\d+)(.*?)$
-----------------------
Options: Case insensitive; Exact spacing; Dot doesn't match line breaks; ^$ match at line breaks; Greedy quantifiers; Regex syntax only
Assert position at the beginning of a line (at beginning of the string or after a line break character) (line feed) «^»
Match the regex below and capture its match into backreference number 1 «(.*?\d+)»
Match any single character that is NOT a line break character (line feed) «.*?»
Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match a single character that is a “digit” (any decimal number in any Unicode script) «\d+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the regex below and capture its match into backreference number 2 «(.*?\d+)»
Match any single character that is NOT a line break character (line feed) «.*?»
Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match a single character that is a “digit” (any decimal number in any Unicode script) «\d+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the regex below and capture its match into backreference number 3 «(.*?)»
Match any single character that is NOT a line break character (line feed) «.*?»
Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Assert position at the end of a line (at the end of the string or before a line break character) (line feed) «$»
$3/$2/$1
Insert the text that was last matched by capturing group number 3 «$3»
Insert the character “/” literally «/»
Insert the text that was last matched by capturing group number 2 «$2»
Insert the character “/” literally «/»
Insert the text that was last matched by capturing group number 1 «$1»

How is the full substring different from using .text()?

I'm failing to see how taking the full substring is different from just using .text()?
This is a snippet of a larger code set that I'm trying to understand but failing:
$(this).text().substring(0, ($(this).text().length - 1))
Substring takes a portion of the full text/string, but in this case it is taking the whole string, correct?

No. Here substring returns everything except the last character, that is, the characters at indices 0 through n-2 of a string of length n.
x = "hello";
>>> "hello"
x.substring(0, x.length - 1)
>>> "hell"
From the MDN documentation linked:
substring extracts characters from indexA up to but not including indexB. In particular:
If indexA equals indexB, substring returns an empty string.
If indexB is omitted, substring extracts characters to the end of the string.
If either argument is less than 0 or is NaN, it is treated as if it were 0.
If either argument is greater than stringName.length, it is treated as if it were stringName.length.
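The rules quoted above can be emulated in a few lines of Python, which may make the clamping behaviour easier to see. This is only a sketch: js_substring is a hypothetical helper, NaN handling is left out, and the argument swap when indexA is greater than indexB comes from the wider MDN description rather than the excerpt quoted here.

```python
def js_substring(text, index_a, index_b=None):
    """Emulate JavaScript's String.prototype.substring for integer arguments."""
    n = len(text)
    # An omitted indexB means "to the end of the string".
    if index_b is None:
        index_b = n
    # Clamp both indices into the range 0..n.
    a = min(max(index_a, 0), n)
    b = min(max(index_b, 0), n)
    # substring swaps its arguments when indexA is greater than indexB.
    if a > b:
        a, b = b, a
    return text[a:b]


x = "hello"
print(js_substring(x, 0, len(x) - 1))  # everything except the last character
print(js_substring(x, 1))              # indexB omitted
print(js_substring(x, -3, 99))         # both arguments clamped
```

For the snippet in the question, the call corresponds to the first line: it drops the last character of the element's text.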
