regexp_extract returns only spaces - scala

I have this sample data to test the regexp_extract function.
message_txt="test 9341Come Products Preferred*TEST*TEST, the mfg SYSTEM, paid18.26 toward the"
message_txt="mfg of TR tt 100 test, paid $861.82 toward your "
message_txt="TEST 0.015% , paid $1119.00toward your "
I need to extract the numeric value between "paid" and "toward", i.e. 18.26, 861.82 and 1119.00. I run the statement below:
regexp_extract(col("message_txt"),"(?i)paid\\s+(.*?)\\s+(?i)toward",1)
...but I get only spaces.

I don't know regexp_extract(), but it looks to me like:
You don't want $ in your results, so you need to move that outside of the capture group.
There aren't always spaces before/after the target, so \\s+ needs to become the optional \\s*.
There's no point in having a 2nd (?i); the first one already applies to the rest of the pattern.
It's usually better to describe exactly what's permitted in the capture group.
Try something like: "(?i)paid\\s*\\$?([\\d.]+)\\s*toward"
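A minimal sketch of the suggested pattern, checked in plain Scala against the three sample messages instead of a Spark DataFrame. Spark's regexp_extract uses Java regex syntax, so the same pattern should carry over; the object and method names here are illustrative, not from the original.

```scala
import scala.util.matching.Regex

object PaidAmount {
  // The suggested pattern as a raw (triple-quoted) string, so each backslash is
  // written once; inside an ordinary "..." literal it would have to be doubled.
  val PaidToward: Regex = """(?i)paid\s*\$?([\d.]+)\s*toward""".r

  // The three sample messages from the question.
  val samples: Seq[String] = Seq(
    "test 9341Come Products Preferred*TEST*TEST, the mfg SYSTEM, paid18.26 toward the",
    "mfg of TR tt 100 test, paid $861.82 toward your ",
    "TEST 0.015% , paid $1119.00toward your "
  )

  // Group 1 holds only digits and dots, so the optional "$" stays outside it.
  def extractAmount(message: String): Option[String] =
    PaidToward.findFirstMatchIn(message).map(_.group(1))
}
```

In Spark the equivalent call would be regexp_extract(col("message_txt"), "(?i)paid\\s*\\$?([\\d.]+)\\s*toward", 1), which yields an empty string for rows where the pattern does not match.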

Related

Regular expression: retrieve one or more numbers after certain text

I'm trying to parse HTML code and extract some data from it using regular expressions. The website that provides the data has no API, and I want to show this data in an iOS app built with Swift. The HTML looks like this:
$(document).ready(function() {
var years = ['2020','2021','2022'];
var currentView = 0;
var amounts = [1269.2358,1456.557,1546.8768];
var balances = [3484626,3683646,3683070];
rest of the html code
What I'm trying to extract are the years, amounts and balances.
So I would like to have an array with the years, i.e. [2020,2021,2022], and the same for amounts and balances. In this example there are 3 years, but there could be more or fewer. I'm able to extract all the numbers, but then I'm unable to link them to the years, amounts or balances. See this example: https://regex101.com/r/WMwUji/1, using this pattern (\d|\[(\d|,\s*)*])
Any help would be really appreciated.
Firstly, I think there are some errors in your expression. To capture the whole number, you have to use \d+ (which matches one or more consecutive digits, e.g. 2020). If you need to include . as a decimal separator, the expression would then look like \d+\.\d+.
In addition, using a non-capturing group (?:) and a non-greedy match .*?, the regular expression that gives the desired result for years is:
(?:year.*?|',')(\d+)
This can also be adapted for the amounts field:
(?:amounts.*?|,)(\d+\.\d+)
You can try it here: https://regex101.com/r/QLcFQN/1
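A small Scala sketch (the names are illustrative) that applies the two patterns above with findAllMatchIn, which returns every non-overlapping match, so group 1 of each match is one number. The HTML is the snippet from the question, without the "rest of the html code" placeholder.

```scala
object HtmlArrays {
  val html: String =
    """$(document).ready(function() {
      |var years = ['2020','2021','2022'];
      |var currentView = 0;
      |var amounts = [1269.2358,1456.557,1546.8768];
      |var balances = [3484626,3683646,3683070];""".stripMargin

  // The answer's patterns: a non-capturing prefix, then one captured number.
  val yearPattern = """(?:year.*?|',')(\d+)""".r
  val amountPattern = """(?:amounts.*?|,)(\d+\.\d+)""".r

  // Each match continues where the previous one ended; group(1) is the number.
  def years(src: String): List[String] =
    yearPattern.findAllMatchIn(src).map(_.group(1)).toList

  def amounts(src: String): List[String] =
    amountPattern.findAllMatchIn(src).map(_.group(1)).toList
}
```

Because both lists come back in source order, the i-th year lines up with the i-th amount, which is what the question needs in order to link them.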
Edit: in the previous version, my proposed regex was non-functional and only captured the last match.
You can continue with this regex:
^var (years \= (?'year'.*)|balances \= (?'balances'.*)|amounts \= (?'amounts'.*));$
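It searches for lines with either years, balances or amounts entries and names the matches accordingly. Each named group captures the whole array literal, brackets included (e.g. ['2020','2021','2022']), so the values still have to be split afterwards.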
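Java-based engines, including Scala's, write named groups as (?<name>...) rather than (?'name'...), and ^/$ only match at each line with the MULTILINE flag. A sketch under those assumptions (object and helper names are hypothetical, and the outer group is made non-capturing) that captures each array literal, splits it, and zips the arrays by index so each year is linked to its amount and balance:

```scala
import java.util.regex.Pattern

object NamedArrays {
  val html: String =
    """$(document).ready(function() {
      |var years = ['2020','2021','2022'];
      |var currentView = 0;
      |var amounts = [1269.2358,1456.557,1546.8768];
      |var balances = [3484626,3683646,3683070];""".stripMargin

  val GroupNames: Seq[String] = Seq("years", "amounts", "balances")

  // The answer's regex adapted to Java named groups; MULTILINE makes ^ and $ work per line.
  val ArrayLine: Pattern = Pattern.compile(
    """^var (?:years = (?<years>.*)|balances = (?<balances>.*)|amounts = (?<amounts>.*));$""",
    Pattern.MULTILINE)

  // "['2020','2021']" -> List("2020", "2021"); "[1.5,2.5]" -> List("1.5", "2.5")
  def splitArray(literal: String): List[String] =
    literal.stripPrefix("[").stripSuffix("]").split(",").toList
      .map(_.trim.stripPrefix("'").stripSuffix("'"))

  // Maps each variable name to its values; group(name) is null when its alternative did not match.
  def parse(src: String): Map[String, List[String]] = {
    val m = ArrayLine.matcher(src)
    var found = Map.empty[String, List[String]]
    while (m.find()) {
      for (name <- GroupNames) {
        val g = m.group(name)
        if (g != null) found = found.updated(name, splitArray(g))
      }
    }
    found
  }

  // Links the i-th year to the i-th amount and the i-th balance.
  def rows(src: String): List[(String, String, String)] = {
    val arrays = parse(src)
    val years = arrays.getOrElse("years", Nil)
    val amounts = arrays.getOrElse("amounts", Nil)
    val balances = arrays.getOrElse("balances", Nil)
    years.zip(amounts).zip(balances).map { case ((y, a), b) => (y, a, b) }
  }
}
```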

Can I write a PCRE conditional that only needs the no-match part?

I am trying to create a regular expression for an SQL statement to determine whether a string contains a number. If the value is numeric, I want to add 1 to it; if it is not numeric, I want to return 1. More or less. Here is the SQL:
SELECT
field,
CASE
WHEN regexp_like(field, '^ *\d*\.?\d* *$') THEN dec(field) + 1
ELSE 1
END nextnumber
FROM mytable
This actually works, and returns something like this:
INVALID 1
00000 1
00001E 1
00379 380
00013 14
99904 99905
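For reference, a hedged Scala sketch of the same CASE logic: if the value matches the question's pattern, parse it and add 1, otherwise return 1. The function name is hypothetical, and it adds one guard the SQL does not have: the pattern also accepts a blank string or a lone ".", which BigDecimal cannot parse, so the sketch additionally requires at least one digit.

```scala
object NextNumber {
  // The question's pattern: optional spaces, digits with an optional decimal point, optional spaces.
  private val Numeric = """^ *\d*\.?\d* *$"""

  // Mirrors CASE WHEN regexp_like(field, ...) THEN dec(field) + 1 ELSE 1 END,
  // plus the extra "contains a digit" guard described above.
  def nextNumber(field: String): BigDecimal =
    if (field.matches(Numeric) && field.exists(_.isDigit)) BigDecimal(field.trim) + BigDecimal(1)
    else BigDecimal(1)
}
```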
But to push the envelope of understanding, what if I wanted to cover negative numbers, or numbers with a positive sign? The sign would have to immediately precede or follow the number, but not both, and I would not want to allow white space between the sign and the number.
I came up with a conditional expression that uses a capture group for a leading sign to decide whether a trailing sign is allowed, but it feels a little awkward because I don't really need a yes-pattern.
Here is the modified regex: ^ ([+-]?)*\d*\.?\d*(?(1) *|[+-]? *)$
This works on regex101.com, but for it to work I need something before the pipe, so I have to duplicate the trailing pattern in both the yes-pattern and the no-pattern.
All that background for this question: How can I avoid that duplication?
EDIT: DB2 for i uses International Components for Unicode (ICU) to provide regular expression processing. It turns out that this library does not support conditionals the way PCRE does, so I changed the tags on this question. The answer given by Wiktor Stribiżew provides a working alternative to the conditional by using a negative lookahead.
You do not have to duplicate the end pattern; just move it outside the conditional:
^ *([+-])?\d*\.?\d*(?(1)|[+-]?) *$
See the regex demo. Here the yes-part is empty, and the no-part holds the optional trailing sign [+-]?.
You may also solve it with a mere negative lookahead:
^ *([+-](?!.*[-+]))?\d*\.?\d*[+-]? *$
See another regex demo. Here, ([+-](?!.*[-+]))? optionally matches a + or - that is not followed, anywhere later in the string, by another + or -.
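Java's regex engine, like ICU, supports lookahead but not conditionals, so only the lookahead pattern can be tried in Scala. A sketch (the object name is illustrative) that checks the sign rules against a few inputs:

```scala
object SignedNumber {
  // Optional leading sign, allowed only if no other sign appears later;
  // then digits with an optional decimal point, then an optional trailing sign.
  private val SignOnOneSide = """^ *([+-](?!.*[-+]))?\d*\.?\d*[+-]? *$"""

  def isSignedNumber(s: String): Boolean = s.matches(SignOnOneSide)
}
```

Like the question's first pattern, this one also accepts a blank string (and a lone sign or decimal point), because every part of it is optional.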

In DB2 SQL RegEx, how can a conditional replacement be done without CASE WHEN END?

I have a DB2 v7r3 SQL SELECT statement with three instances of REGEXP_SUBSTR(), all with the same regex pattern string, each of which extracts one of three groups.
I'd like to change the first REGEXP_SUBSTR() to REGEXP_REPLACE() so that it does a conditional replacement when there's no match, inserting a default value similarly to the ELSE section of a CASE...END. But I can't make it work. I could easily use a CASE, but RegEx seems more compact and efficient.
For example, I have descriptions of food container sizes, in various states of completeness:
12X125
6X350
1X1500
1500ML
1000
The last two don't have the 'nnX' part at the beginning, in which case '1X' is assumed and needs to be inserted.
This is my current working pattern string:
^(?:(\d{1,3})(?:X))?((?:\d{1,4})(?:\.\d{1,3})?)(L|ML|PK|Z|)$
The groups returned are: quantity, size, and unit.
But only the first group needs the conditional replacement:
(?:(\d{1,3})(?:X))?
This RexEgg webpage describes the (?=...) operator, and it seems to be what I need, but I'm not sure. It's in the list of operators for my version of DB2, but I can't make it work. Frankly, it's a bit deeper than my regex knowledge, and I can't even make it work in my favorite online regex tester, Regex101.
So... does anyone have any ideas or suggestions? Thanks.
Try this (it replaces "digits not followed by X or another digit"):
with t(s) as (values
'12X125'
, '6X350'
, '1X1500'
, '1500'
, '1125'
)
select regexp_replace(s, '^([\d]+(?![X\d]))', '1X\1')
from t;
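The same replacement can be tried in Scala, with one syntax difference: Java/Scala replacement strings refer to group 1 as $1 rather than \1. A hedged sketch (names are illustrative) that inserts the default quantity and then applies the question's three-group pattern:

```scala
object ContainerSize {
  // The answer's pattern: leading digits not followed by X or another digit.
  private val NoQuantity = """^([\d]+(?![X\d]))""".r

  // The question's pattern: quantity, size, unit.
  private val SizePattern = """^(?:(\d{1,3})(?:X))?((?:\d{1,4})(?:\.\d{1,3})?)(L|ML|PK|Z|)$""".r

  // "1500ML" -> "1X1500ML"; values that already start with nnX are left alone.
  def withDefaultQuantity(s: String): String = NoQuantity.replaceFirstIn(s, "1X$1")

  // After normalisation the quantity group always participates in the match.
  def parse(s: String): Option[(String, String, String)] = withDefaultQuantity(s) match {
    case SizePattern(qty, size, unit) => Some((qty, size, unit))
    case _                            => None
  }
}
```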

Grafana Query challenge for $ and ^

I was reading some of the queries on a Grafana dashboard.
There is one query I don't quite understand:
sum (container_memory_working_set_bytes{pod_name=~"^$Pod$"}) / sum (machine_memory_bytes{kubernetes_io_hostname=~"^$Node$"}) * 100
I understand that $Pod is the variable (template) I created.
But I am not sure what the "^" and the second "$" in "^$Node$" mean.
Thank you for your help.
I know nothing about Grafana, but that definitely looks like a regular expression. If I'm right, $Pod and $Node are mere placeholders that will be substituted with their actual values at runtime, and the ^ and $ anchors mean that you want to match exactly that value. In other words, to match, a string has to start at the beginning of that value and end at its end, with nothing before or after it.
As an example, if $Pod gets substituted with, say, foo_pod, a string containing exactly foo_pod will match, but a string like foo_pod2 will not.
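To see the effect of the anchors, here is a small Scala sketch using Java's Matcher.find(), which succeeds if the pattern occurs anywhere in the input, so only ^ and $ force a whole-value match. The substitute helper is a hypothetical stand-in for Grafana's variable interpolation. If the data source is Prometheus, note that Prometheus already anchors =~ matchers at both ends, so there the explicit ^ and $ change nothing; the sketch shows generic regex behavior.

```scala
import java.util.regex.Pattern

object AnchorDemo {
  // Hypothetical stand-in for Grafana replacing the $Pod variable with its value.
  def substitute(template: String, pod: String): String = template.replace("$Pod", pod)

  // find() succeeds if the pattern occurs anywhere in the label value.
  def matchesLabel(pattern: String, label: String): Boolean =
    Pattern.compile(pattern).matcher(label).find()
}
```

With the anchors, "^foo_pod$" accepts foo_pod but rejects foo_pod2; without them, plain foo_pod is found inside foo_pod2 as well.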
Here you can learn more about Regular Expressions, specifically about the ^ and $ anchors.

List non-system triggers ending with "_BI"

I want to list the non-system triggers whose names end with "_BI" in a Firebird database,
but I get no result with this:
select * from rdb$triggers
where
rdb$trigger_source is not null
and (coalesce(rdb$system_flag,0) = 0)
and (rdb$trigger_source not starting with 'CHECK' )
and (rdb$trigger_name like '%BI')
With this condition instead, I get names ending in "_bi", "_BI0U" and "_BI0U":
and (rdb$trigger_name like '%BI%')
and with this one, I get no result at all:
and (rdb$trigger_name like '%#_BI')
Thanks in advance.
The problem is that the Firebird system tables use CHAR(31) for object names, which means the names are padded with spaces up to the declared length. As a result, like '%BI' will not yield results unless BI are the 30th and 31st characters.
There are several solutions.
For example, you can trim the name before checking:
trim(rdb$trigger_name) like '%BI'
or you can append a space and require that the name is followed by at least one space:
rdb$trigger_name || ' ' like '%BI %'
On a related note, if you want to check whether your trigger name ends in _BI, you should also include the underscore in your condition. And as an underscore in like is a single-character wildcard, you need to escape it:
trim(rdb$trigger_name) like '%\_BI' escape '\'
Alternatively, you could use a regular expression with similar to, as then you won't need to trim or otherwise mangle the left-hand side of the expression:
rdb$trigger_name similar to '%\_BI[[:SPACE:]]*' escape '\'
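The effect of the padding can be reproduced outside Firebird. A minimal Scala sketch, assuming a hypothetical asChar31 helper that right-pads a name with spaces to 31 characters the way a CHAR(31) column stores it, comparing a raw suffix check with the trimmed and the space-tolerant regex variants:

```scala
object PaddedNames {
  // Hypothetical helper mimicking a CHAR(31) column: right-pad with spaces to 31 chars.
  def asChar31(name: String): String = name + " " * (31 - name.length)

  // Like like '%\_BI' on the raw padded value: the trailing spaces make this false.
  // (endsWith is literal, so "_" needs no escaping here, unlike in like.)
  def rawEndsWithBI(stored: String): Boolean = stored.endsWith("_BI")

  // Trim first, as in trim(rdb$trigger_name) like '%\_BI' escape '\'.
  def trimmedEndsWithBI(stored: String): Boolean = stored.trim.endsWith("_BI")

  // A regex that tolerates the padding, analogous to similar to '%\_BI[[:SPACE:]]*'.
  def regexEndsWithBI(stored: String): Boolean = stored.matches(""".*_BI *""")
}
```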