Extracting words from a string in postgres SQL - postgresql

I am brand new to 'postgres' and trying to extract a value from a string. I'm trying to use a combination of regexp_substr and replace to implement the desired outcome.
UPDATED Example: I have a string "When Harry met Sally" (following the pattern, When X met Y). I'd like to extract the word Harry, which is X.
I am trying the syntax:
regexp_substr(REPLACE('When Harry met Sally', 'When ', ''),' met [^.]*');
but am receiving the error message:
ERROR: function regexp_substr(text, unknown) does not exist
Hint: No function matches the given name and argument types. You might need to add explicit type casts.
Can anyone help? I imagine that this is child's play for some pro out there.

Before using a function, check the documentation for its syntax... or whether it exists at all.
regexp_substr does not exist in PostgreSQL before version 15. What are you trying to do?
I'd like to extract the word Harry.
By which criteria? The second word? The X in "When X met Y"?
Without more info, it's impossible to answer.
postgres=# select regexp_replace('When Harry met Sally',
'When (.*) met.*',
'\1' );
regexp_replace
----------------
Harry
(1 row)
This is just an example. It could be wrong if, for example, you want to support other kinds of whitespace, treat letter case differently, allow some content before 'When', or...
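As a cross-check outside the database, the same capture-group idea can be sketched in Python. Python's `re` module is used here only as a stand-in for PostgreSQL's regex engine, and the two differ in details. The helper name `extract_x` is hypothetical, and the pattern is tightened slightly (anchors, non-greedy group) compared with the one in the regexp_replace call above.

```python
import re
from typing import Optional

# Capture whatever sits between "When " and " met ".
# Assumption: single spaces and exact letter case, as in the sample sentence.
PATTERN = re.compile(r"^When (.*?) met .*$")


def extract_x(sentence: str) -> Optional[str]:
    """Return X from 'When X met Y', or None if the sentence does not fit."""
    match = PATTERN.match(sentence)
    return match.group(1) if match else None


print(extract_x("When Harry met Sally"))
```

One difference worth noting: regexp_replace returns the input unchanged when the pattern does not match, whereas this sketch returns None, which makes a non-matching row easier to spot.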

Related

DB2: Need to extract string to the left of delimiter

I have a column that looks like this:
SBN:123456=1
SBN:1234=0
SBN:12345678=5
I need to extract everything left of the equal sign ('=') for every row. I attempted using SUBSTRING this way:
SELECT COLUMN1, SUBSTR(COLUMN2,1,LOCATE('=', COLUMN2)-1) AS STUFF FROM TABLE1;
Instead of extracting the text from the string, it gave me the error "The statement was not executed because a numeric argument of a scalar function is out of range." and I can't seem to figure out why. What am I doing wrong?
I'm using DB2 11.1.4.4 on AIX, just FYI.
I found the issue. There were some NULLs in the column that the query didn't like apparently. Got rid of those and it worked fine.
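A defensive version of the same idea, sketched in Python rather than DB2 SQL. It assumes the failure mode is a row where the delimiter cannot be located: LOCATE returns 0 when the search string is absent, so the computed length becomes -1, which is one way to get an out-of-range argument. The function name `left_of` is hypothetical.

```python
from typing import Optional


def left_of(value: Optional[str], delimiter: str = "=") -> Optional[str]:
    """Return the text before the first delimiter.

    Returns None for a missing value or when the delimiter is absent,
    instead of computing a negative length.
    """
    if value is None:
        return None
    head, separator, _tail = value.partition(delimiter)
    # partition() yields an empty separator when the delimiter is not found.
    return head if separator else None


for row in ["SBN:123456=1", "SBN:1234=0", None, "SBN:999"]:
    print(row, "->", left_of(row))
```

In SQL the equivalent guard would be a condition that skips or defaults rows where the delimiter position is not positive, rather than deleting those rows.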

Can I write a PCRE conditional that only needs the no-match part?

I am trying to create a regular expression for an SQL statement that determines whether a string contains a number. If the value is numeric, I want to add 1 to it. If the value is not numeric, I want to return 1. More or less. Here is the SQL:
SELECT
field,
CASE
WHEN regexp_like(field, '^ *\d*\.?\d* *$') THEN dec(field) + 1
ELSE 1
END nextnumber
FROM mytable
This actually works, and returns something like this:
INVALID 1
00000 1
00001E 1
00379 380
00013 14
99904 99905
But to push the envelope of understanding, what if I wanted to cover negative numbers, or those with a positive sign? The sign would have to immediately precede or follow the number, but not both, and I would not want to allow white space between the sign and the number.
I came up with a conditional expression with a capture group to capture the sign on the front of the number to determine if a sign was allowed on the end, but it seems a little awkward to handle given I don't really need a yes-pattern.
Here is the modified regex: ^ *([+-])?\d*\.?\d*(?(1) *|[+-]? *)$
This works at regex101.com, but in order for it to work I need to have something before the pipe, so I have to duplicate the next pattern in both the yes-pattern and the no-pattern.
All that background for this question: How can I avoid that duplication?
EDIT: DB2 for i uses International Components for Unicode (ICU) to provide regular expression processing. It turns out that this library does not support conditionals the way PCRE does, so I changed the tags on this question. The answer given by Wiktor Stribiżew provides a working alternative to the conditional by using a negative lookahead.
You do not have to duplicate the end pattern, just move it outside the conditional:
^ *([+-])?\d*\.?\d*(?(1)|[+-]?) *$
See the regex demo. So, the yes-part is empty, and the no-part has an optional pattern.
You may also solve it with a mere negative lookahead:
^ *([+-](?!.*[-+]))?\d*\.?\d*[+-]? *$
See another regex demo. Here, ([+-](?!.*[-+]))? optionally matches a + or - that is not followed, anywhere later in the string, by another + or -.
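To compare the two patterns side by side, here is a small sketch in Python, whose `re` module happens to accept both the conditional and the lookahead form. This says nothing about ICU, which per the question's edit lacks conditionals. The sample strings and the helper `accepts` are made up for illustration.

```python
import re

# Conditional form: a trailing sign is allowed only if group 1 did not match.
CONDITIONAL = re.compile(r"^ *([+-])?\d*\.?\d*(?(1)|[+-]?) *$")
# Lookahead form: a leading sign is taken only if no other sign follows it.
LOOKAHEAD = re.compile(r"^ *([+-](?!.*[-+]))?\d*\.?\d*[+-]? *$")

SAMPLES = ["00379", "-5", "5-", " +12.5 ", "-5-", "+ 5", "5 -", "00001E"]


def accepts(pattern: "re.Pattern[str]", text: str) -> bool:
    """True if the whole string is accepted by the pattern."""
    return pattern.match(text) is not None


for text in SAMPLES:
    print(repr(text), accepts(CONDITIONAL, text), accepts(LOOKAHEAD, text))
```

Because every part of both patterns is optional, an empty string or a lone sign is also accepted, so the numeric conversion that follows still needs its own guard.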

In DB2 SQL RegEx, how can a conditional replacement be done without CASE WHEN END..?

I have a DB2 v7r3 SQL SELECT statement with three instances of REGEXP_SUBSTR(), all with the same regex pattern string, each of which extracts one of three groups.
I'd like to change the first REGEXP_SUBSTR() to REGEXP_REPLACE() so that, when there is no match, it inserts a default value, much like the ELSE branch of a CASE...END. But I can't make it work. I could easily use a CASE, but it seems more compact and efficient to use a regex.
For example, I have descriptions of food container sizes, in various states of completeness:
12X125
6X350
1X1500
1500ML
1000
The last two don't have the 'nnX' part at the beginning, in which case '1X' is assumed and needs to be inserted.
This is my current working pattern string:
^(?:(\d{1,3})(?:X))?((?:\d{1,4})(?:\.\d{1,3})?)(L|ML|PK|Z|)$
The groups returned are: quantity, size, and unit.
But only the first group needs the conditional replacement:
(?:(\d{1,3})(?:X))?
This RexEgg webpage describes the (?=...) operator, and it seems to be what I need, but I'm not sure. It's in the list of operators for my version of DB2, but I can't make it work. Frankly, it's a bit deeper than my regex knowledge, and I can't even make it work in my favorite online regex tester, Regex101.
So...does anyone have any idea or suggestions..? Thanks.
Try this (replace "digits not followed by X_or_digit"):
with t(s) as (values
'12X125'
, '6X350'
, '1X1500'
, '1500'
, '1125'
)
select regexp_replace(s, '^([\d]+(?![X\d]))', '1X\1')
from t;
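The same replacement can be tried outside DB2 with a short Python sketch. Python's `re` is only a stand-in here and is not the ICU engine DB2 for i uses. The function name `with_default_quantity` is hypothetical.

```python
import re

# Leading digits that are NOT followed by "X" or another digit.
# The "another digit" part stops the engine from backtracking to a shorter
# run of digits (e.g. matching just "1" in "12X125").
PATTERN = re.compile(r"^(\d+(?![X\d]))")


def with_default_quantity(description: str) -> str:
    """Prefix '1X' when the description has no 'nnX' quantity part."""
    return PATTERN.sub(r"1X\1", description)


for item in ["12X125", "6X350", "1X1500", "1500ML", "1000"]:
    print(item, "->", with_default_quantity(item))
```

Descriptions that already start with a quantity are left untouched because the lookahead rejects every possible digit run before the X.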

Replace with undefined character in Postgres

I need to write an UPDATE script using the Replace() function of Postgres, but I don't know the exact string that I have to replace. I'd like to know if there is some way to do this with wildcards, similar to the LIKE operator.
My problem is that I have a table that contains some scripts, and at the end of each one there is a <signature> tag like this:
'SELECT SCRIPT WHATEVER.... <signature>782798e2a92c72b270t920b<signature>'
What I need to do is:
UPDATE table SET script = REPLACE(script,'<signature>%<signature>','<signature>1234ABCDEF567890<signature>')
Whatever the signature is, I need to replace it with a new one defined by me. I know that using '%' doesn't work; it was just to illustrate the effect I want to achieve. Is there any way to do this in Postgres 9.5?
with expr as (
select 'Hello <signature>thisismysig</signature>'::text as full_text,
'<signature>'::text as open,
'</signature>'::text as close
)
select
substring(full_text from
position(open in full_text) + char_length(open)
for
position(close in full_text) - char_length(open) - position(open in full_text)
)
from expr;
Note: the WITH part was added for ease of understanding (hopefully).
Use a POSIX regex to do the same thing as the other answer (but shorter):
select
substring('a bunch of other stuff <signature>mysig</signature>'
from '<signature>(.*?)</signature>')
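Both answers extract the current signature, while the question asks to replace it. The replacement step can be sketched in Python as follows. This is an illustration of the regex logic only, not tested PostgreSQL. In PostgreSQL the analogous tool would be regexp_replace with a similar pattern. The function name `replace_signature` is hypothetical, and the pattern accepts either `<signature>` or `</signature>` as the closing tag because the question and the answers differ on that point.

```python
import re

# Group 1: opening tag, group 2: closing tag (with or without the slash).
# ".*?" is non-greedy so the match stops at the first closing tag.
SIGNATURE = re.compile(r"(<signature>).*?(</?signature>)", re.DOTALL)


def replace_signature(script: str, new_signature: str) -> str:
    """Swap whatever sits between the signature tags for new_signature."""
    # A function replacement keeps backslashes in new_signature literal.
    return SIGNATURE.sub(
        lambda m: m.group(1) + new_signature + m.group(2), script
    )


old = "SELECT SCRIPT WHATEVER <signature>782798e2a92c72b270t920b<signature>"
print(replace_signature(old, "1234ABCDEF567890"))
```

A script without any signature tags passes through unchanged, which matches how a regex-based replace behaves when nothing matches.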

PostgreSQL regexp.replace all unwanted chars

I have registration codes in my PostgreSQL table which are written messy, like MU-321-AB, MU/321/AB, MU 321-AB and so forth...
I need to clean all of these up to get MU321AB.
For this I use the following expression:
SELECT DISTINCT regexp_replace(ccode, '([^A-Za-z0-9])', ''), ...
This expression works as expected in .NET but not in PostgreSQL, where it removes only the first occurrence of an unwanted character.
How should I modify the regular expression so that it replaces all unwanted characters in the string, leaving a clean code with only letters and numbers?
Use the global flag, but without any capture groups:
SELECT DISTINCT regexp_replace(ccode, '[^A-Za-z0-9]', '', 'g'), ...
Note that PostgreSQL's regexp_replace replaces only the first match unless the 'g' flag is given, whereas .NET's Regex.Replace replaces every match by default, which explains the difference you saw. Also, since you do not want anything extracted from the string (you just want to remove some characters), you do not need the capture group ().
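The first-match versus all-matches difference can be reproduced in a short Python sketch, where `count=1` plays the role of calling regexp_replace without the 'g' flag. The function names are hypothetical.

```python
import re

# Anything that is not an ASCII letter or digit.
UNWANTED = re.compile(r"[^A-Za-z0-9]")


def clean_code(code: str) -> str:
    """Remove every unwanted character (like the 'g' flag)."""
    return UNWANTED.sub("", code)


def clean_first_only(code: str) -> str:
    """Remove only the first unwanted character (like omitting 'g')."""
    return UNWANTED.sub("", code, count=1)


for raw in ["MU-321-AB", "MU/321/AB", "MU 321-AB"]:
    print(raw, "->", clean_first_only(raw), "vs", clean_code(raw))
```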