Lex Parsing for exponent - lex

I am trying to parse a file the data looks like
size = [5e+09, 5e+09, 5e+09]
I have 'size OSQUARE NUMBER COMMA NUMBER COMMA NUMBER ESQUARE'
And NUMBER is defined in tokrules as
t_NUMBER = r'[-]?[0-9]*[\.]*[0-9]+([eE]-?[0-9]+)*'
But I get
Syntax error in input!
LexToken(ID,'e',6,113)
Illegal character '+'
Illegal character '+'
Illegal character '+'
What is wrong with my NUMBER definition?
I am using https://www.dabeaz.com/ply/

The part of your rule which matches exponents is
([eE]-?[0-9]+)*
Clearly, that won't match a +. It should be:
([eE][-+]?[0-9]+)*
Also, it will match 0 or more exponents, which is not correct. It should match 0 or 1:
([eE][-+]?[0-9]+)?

Related

Regex expression in q to match specific integer range following string

Using q’s like function, how can we achieve the following match using a single regex string regstr?
q) ("foo7"; "foo8"; "foo9"; "foo10"; "foo11"; "foo12"; "foo13") like regstr
>>> 0111110b
That is, like regstr matches the foo-strings which end in the numbers 8,9,10,11,12.
Using regstr:"foo[8-12]" confuses the square brackets (how does it interpret this?) since 12 is not a single digit, while regstr:"foo[1[0-2]|[1-9]]" returns a type error, even without the foo-string complication.
As the other comments and answers mentioned, this can't be done using a single regex. Another alternative method is to construct the list of strings that you want to compare against:
q)str:("foo7";"foo8";"foo9";"foo10";"foo11";"foo12";"foo13")
q)match:{x in y,/:string z[0]+til 1+neg(-/)z}
q)match[str;"foo";8 12]
0111110b
If your eventual goal is to filter on the matching entries, you can replace in with inter:
q)match:{x inter y,/:string z[0]+til 1+neg(-/)z}
q)match[str;"foo";8 12]
"foo8"
"foo9"
"foo10"
"foo11"
"foo12"
A variation on Cillian’s method: test the prefix and numbers separately.
q)range:{x+til 1+y-x}.
q)s:"foo",/:string 82,range 7 13 / include "foo82" in tests
q)match:{min(x~/:;in[;string range y]')#'flip count[x]cut'z}
q)match["foo";8 12;] s
00111110b
Note how unary derived functions x~/: and in[;string range y]' are paired by #' to the split strings, then min used to AND the result:
q)flip 3 cut's
"foo" "foo" "foo" "foo" "foo" "foo" "foo" "foo"
"82" ,"7" ,"8" ,"9" "10" "11" "12" "13"
q)("foo"~/:;in[;string range 8 12]')#'flip 3 cut's
11111111b
00111110b
Compositions rock.
As the comments state, regex in kdb+ is extremely limited. If the number of trailing digits is known like in the example above then the following can be used to check multiple patterns
q)str:("foo7"; "foo8"; "foo9"; "foo10"; "foo11"; "foo12"; "foo13"; "foo3x"; "foo123")
q)any str like/:("foo[0-9]";"foo[0-9][0-9]")
111111100b
Checking for a range like 8-12 is not currently possible within kdb+ regex. One possible workaround is to write a function to implement this logic. The function range checks a list of strings start with a passed string and end with a number within the range specified.
range:{
/ checking for strings starting with string y
s:((c:count y)#'x)like y;
/ convert remainder of string to long, check if within range
d:("J"$c _'x)within z;
/ find strings satisfying both conditions
s&d
}
Example use:
q)range[str;"foo";8 12]
011111000b
q)str where range[str;"foo";8 12]
"foo8"
"foo9"
"foo10"
"foo11"
"foo12"
This could be made more efficient by checking the trailing digits only on the subset of strings starting with "foo".
For your example you can pad, fill with a char, and then simple regex works fine:
("."^5$("foo7";"foo8";"foo9";"foo10";"foo11";"foo12";"foo13")) like "foo[1|8-9][.|0-2]"

How to check column that contain letter and number in Talend

My columns must contains 2 letter and 4 number like this (AV1234)
How can i check this ?
You can use sql templates as mentioned in talend documentation here and you can check your column that contain letter and number using regular expressions.
Use this [a-zA-Z]{2}[0-9]{6}
Use this If you want only uppercase letters [A-Z]{2}[0-9]{6}
[a-zA-Z] # Match a single character present in the list below
# A character in the range between “a” and “z”
# A character in the range between “A” and “Z”
{2} # Exactly 2 times
[0-9] # Match a single character in the range between “0” and “9”
{6} # Exactly 6 times
Thank you for your answer ! it Works
My routine code:
public static Boolean MyPattern(String str) {
String stringPattern = "[A-Z]{2}[0-9]{4}";
boolean match = Pattern.matches(stringPattern, str);
return match ;
}

String to Integer (atoi) [Leetcode] gave wrong answer?

String to Integer (atoi)
This problem is implement atoi to convert a string to an integer.
When test input = " +0 123"
My code return = 123
But why expected answer = 0?
======================
And if test input = " +0123"
My code return = 123
Now expected answer = 123
So is that answer wrong?
I think this is expected result as it said
Requirements for atoi:
The function first discards as many whitespace characters as necessary until the first non-whitespace character is found. Then, starting from this character, takes an optional initial plus or minus sign followed by as many numerical digits as possible, and interprets them as a numerical value.
Your first test case has a space in between two different digit groups, and atoi only consider the first group which is '0' and convert into integer

I want to convert as below via preg_replace

I want to convert as below via preg_replace.
How can i know answer??
preg_replace($pattern, "$2/$1", "One001Two111Three");
result> Three/Two111/One001
You'd better use preg_split, it's much more simple than preg_replace and it works with any number of elements:
$str = "One001Two111Three";
$res = implode('/', array_reverse(preg_split('/(?<=\d)(?=[A-Z])/', $str)));
echo $res,"\n";
output:
Three/Two111/One001
The regex /(?<=\d)(?=[A-Z])/ splits on boundary between a digit and a capital letter, array_reverse reverse the order of the array given by preg_split, then the elements of reversed array are joined by implode with a /
$string = "One001Two111Three";
$result = preg_replace('/^(.*?\d+)(.*?\d+)(.*?)$/im', '$3/$2/$1', $string );
echo $result;
RESULT: Three/Two111/One001
DEMO
EXPLANATION:
^(.*?\d+)(.*?\d+)(.*?)$
-----------------------
Options: Case insensitive; Exact spacing; Dot doesn't match line breaks; ^$ match at line breaks; Greedy quantifiers; Regex syntax only
Assert position at the beginning of a line (at beginning of the string or after a line break character) (line feed) «^»
Match the regex below and capture its match into backreference number 1 «(.*?\d+)»
Match any single character that is NOT a line break character (line feed) «.*?»
Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match a single character that is a “digit” (any decimal number in any Unicode script) «\d+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the regex below and capture its match into backreference number 2 «(.*?\d+)»
Match any single character that is NOT a line break character (line feed) «.*?»
Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match a single character that is a “digit” (any decimal number in any Unicode script) «\d+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the regex below and capture its match into backreference number 3 «(.*?)»
Match any single character that is NOT a line break character (line feed) «.*?»
Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Assert position at the end of a line (at the end of the string or before a line break character) (line feed) «$»
$3/$2/$1
Insert the text that was last matched by capturing group number 3 «$3»
Insert the character “/” literally «/»
Insert the text that was last matched by capturing group number 2 «$2»
Insert the character “/” literally «/»
Insert the text that was last matched by capturing group number 1 «$1»

How is the full substring different from using .text()?

I'm failing to see how taking the full substring is different from just using .text()?
This is a snippet of a larger code set that I'm trying to understand but failing:
$(this).text().substring(0, ($(this).text().length - 1))
Substring takes a portion of the full text/string, but in this case it is taking the whole string, correct?
No, here substring is returning characters 0 to n-1 of an n length string.
x = "hello";
>>> "hello"
x.substring(0, x.length - 1)
>>> "hell"
From the MDN documentation linked:
substring extracts characters from indexA up to but not including indexB. In particular:
If indexA equals indexB, substring returns an empty string.
If indexB is omitted, substring extracts characters to the end of the string.
If either argument is less than 0 or is NaN, it is treated as if it were 0.
If either argument is greater than stringName.length, it is treated as if it were stringName.length.