Postgresql: How to replace multiple substrings in a string and with regexp - postgresql

Postgresql 9.4: 1 column in 1 table is a string of text representing a route of flight for an aircraft.
The complete field consists of "Fixes" and "Routes" up to 80
characters in total length.
Routes and Fixes can be either 3 or 5 characters in length.
Routes and Fixes can have the same name.
There may be zero, one, or two Routes
Routes are followed by a single non-zero digit or a hash.
Routes and Fixes can be preceded or followed by a "+" or "*".
The field may contain CR/LF or double-triple spaces which should remain.
Each schema contains 6-20000s fields in this table
There are nearly 1800 Route names, but generally only 40-80 per schema
Examples:
"KIND ROCKY1 STL BUM OATHE CLASH5 KDEN"
"+MEARZ7 OKK+
KIND OKK FWA MIZAR3 KDTW"
"KIND OOM OOM5 WEGEE PXV J131 LIT BYP5 KDFW"
"KIND MEARZ# OKK ECK YEE YXI N171B VALIEE***EGSS"
The task is to clean up the lazy use of the hash instead of a digit and to update the Route versions (the trailing numbers). I.e. replace-in-place the Route with the correct digit rather than the # or what might be a wrong number. So every instance of "MEARZ7" or "MEARZ#" becomes "MEARZ9" and "OOM5" becomes "OOM6" but "OOM " stays "OOM ".
Currently I have been testing this:
UPDATE target SET detail =
CASE WHEN POSITION('CLASH' in detail) > 0
AND SUBSTRING(detail,POSITION('CLASH' in detail)+5,1) != ' '
THEN REGEXP_REPLACE (detail, 'CLASH.', 'CLASH5')
WHEN POSITION('MEARZ' in detail) > 0
AND SUBSTRING(detail,POSITION('MEARZ' in detail)+5,1) != ' '
THEN REGEXP_REPLACE (detail, 'MEARZ.', 'MEARZ9')
WHEN POSITION('OOM' in detail) > 0
AND SUBSTRING(detail,POSITION('OOM' in detail)+3,1) != ' '
THEN REGEXP_REPLACE (detail, 'OOM.', 'OOM6')
WHEN POSITION('ROCKY' in fsrtedtail) > 0
AND SUBSTRING(detail,POSITION('ROCKY' in detail)+5,1) != ' '
THEN REGEXP_REPLACE (detail, 'ROCKY.', 'ROCKY1')
ELSE detail END;
My logic was to:
Find the Route name.
Check if it's followed by a space.
If not, replace it with the correct Route+digit
I hadn't yet attempted to avoid "+" or "* ". I was thinking I could first replace the "#" with a number, then update the Route+digit as to not worry about the # and this would eliminate the need to look for the "+" or "* ". Then I could just look for a trailing space.
The second Route (in order of the WHEN statements) does not get updated so I guess am barking up the wrong tree.
They other big obstacle is there can be 80 or more Routes in a schema so if I have to nest a statement, it's gonna be huge.
I have tried array_to_string(array_replace(string_to_array( but it leaves behind double quotes, commas, and curly brackets so doesn't seem feasible.
At this point I'm thinking a function is the way to go, but I don't know where to start.

Related

How to change case of characters identified via pattern matching in PostgreSQL

I have several PostgreSQL tables with "comment" columns (data type = text) for which I am trying to standardize the use of upper and lowercase. Specifically, I'd like to change the case of comment strings from all-caps to capitalization of only the first character in each sentence (there are typically 1-3 sentences per comment). I standardized the number of spaces between sentences (to 1) with
update table
set comment = regexp_replace(comment, '( ){2,}',' ','g');
and set all characters in each string except the first to lower case with
update table
set comment = upper(left(comment, 1)) || lower(right(comment, -1))
Now, how do I change the case of the first character after each period to uppercase? I can select the relevant characters with
select regexp_matches('Testing. this. using. some. text.', '([.]\s\S)', 'g');
but haven't been able to figure out how to capitalize these. Also, I'm sure there is a better way to conduct these steps in a more integrative way, but this is my noob-ish attempt.
The following worked in my situation, in which comments are made up of one or more sentences and sentences are separated by a period and single space:
with source as (
select regexp_split_to_table('hello. world', '\.\s') sentence
)
select array_to_string(
array(
select upper(left(sentence, 1)) || right(sentence, -1) sentence from source
), '. '
) modified_comment;

Can I write a PCRE conditional that only needs the no-match part?

I am trying to create a regular expression to determine if a string contains a number for an SQL statement. If the value is numeric, then I want to add 1 to it. If the number is not numeric, I want to return a 1. More or less. Here is the SQL:
SELECT
field,
CASE
WHEN regexp_like(field, '^ *\d*\.?\d* *$') THEN dec(field) + 1
ELSE 1
END nextnumber
FROM mytable
This actually works, and returns something like this:
INVALID 1
00000 1
00001E 1
00379 380
00013 14
99904 99905
But to push the envelope of understanding, what if I wanted to cover negative numbers, or those with a positive sign. The sign would have to immediately precede or follow the number, but not both, and I would not want to allow white space between the sign and the number.
I came up with a conditional expression with a capture group to capture the sign on the front of the number to determine if a sign was allowed on the end, but it seems a little awkward to handle given I don't really need a yes-pattern.
Here is the modified regex: ^ ([+-]?)*\d*\.?\d*(?(1) *|[+-]? *)$
This works at regex101.com, but in order for it to work I need to have something before the pipe, so I have to duplicate the next pattern in both the yes-pattern and the no-pattern.
All that background for this question: How can I avoid that duplication?
EDIT: DB2 for i uses International Components for Unicode to provide regular expression processing. It turns out that this library does not support conditionals like PRCE, so I changed the tags on this question. The answer given by Wiktor Stribiżew provides a working alternative to the conditional by using a negative lookahead.
You do not have to duplicate the end pattern, just move it outside the conditional:
^ *([+-])?\d*\.?\d*(?(1)|[+-]?) *$
See the regex demo. So, the yes-part is empty, and the no-part has an optional pattern.
You may also solve it with a mere negative lookahead:
^ *([+-](?!.*[-+]))?\d*\.?\d*[+-]? *$
See another regex demo. Here, ([+-](?!.*[-+]))? matches (optionally) a + or - that are not followed with any 0+ char followed with another + or -.

db2 remove all non-alphanumeric, including non-printable, and special characters

This may sound like a duplicate, but existing solutions does not work.
I need to remove all non-alphanumerics from a varchar field. I'm using the following but it doesn't work in all cases (it works with diamond questionmark characters):
select TRANSLATE(FIELDNAME, '?',
TRANSLATE(FIELDNAME , '', 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'))
from TABLENAME
What it's doing is the inner translate parse all non-alphanumeric characters, then the outer translate replace them all with a '?'. This seems to work for replacement character�. However, it throws The second, third or fourth argument of the TRANSLATE scalar function is incorrect. which is expected according to IBM:
The TRANSLATE scalar function does not allow replacement of a character by another character which is encoded using a different number of bytes. The second and third arguments of the TRANSLATE scalar function must end with correctly formed characters.
Is there anyway to get around this?
Edit: #Paul Vernon's solution seems to be working:
· 6005308 ??6005308
–6009908 ?6009908
–6011177 ?6011177
��6011183�� ??6011183??
Try regexp_replace(c,'[^\w\d]','') or regexp_replace(c,'[^a-zA-Z\d]','')
E.g.
select regexp_replace(c,'[^a-zA-Z\d]','') from table(values('AB_- C$£abc�$123£')) t(c)
which returns
1
---------
ABCabc123
BTW Note that the allowed regular expression patterns are listed on this page Regular expression control characters
Outside of a set, the following must be preceded with a backslash to be treated as a literal
* ? + [ ( ) { } ^ $ | \ . /
Inside a set, the follow must be preceded with a backslash to be treated as a literal
Characters that must be quoted to be treated as literals are [ ] \
Characters that might need to be quoted, depending on the context are - &

How to keep the upper case and lower case letters in a column alias in the results in Redshift

In Redshift we are trying to give more meaningful aliases to the columns we are returning from the queries as we are importing the results into TABLEAU, the issue is that RedShift turns all the letter to lower case ones, i.e. from "Event Date" it then returns "event date", any idea on how to work this one out to keep the alias given?
I know I'm a bit late to the party but for anyone else looking, you can enable case sensitivity, so if you want to return a column with camel casing for example
SET enable_case_sensitive_identifier TO true;
Then in your query wrap what you want to return the column as in double quotes
SELECT column AS "thisName"
Or as per OP's example
SELECT a.event_date AS "Event Date"
https://docs.aws.amazon.com/redshift/latest/dg/r_enable_case_sensitive_identifier.html
Edit: To have this behaviour as default for the cluster you will need to create/update a parameter group in Configurations => Workload Management. You can't change the settings for the default parameter group. Note, you will need to reboot the cluster after applying the parameter group for the changes to take effect.
No, you cannot do this in Redshift. all columns are lowercase only.
You can enforce upper case only by using
set describe_field_name_in_uppercase to on;
Also see the examples here https://docs.aws.amazon.com/redshift/latest/dg/r_names.html you can see that the upper case characters are returned as lower case. and it says "identifiers are case-insensitive and are folded to lowercase in the database"
You can of course rename the column to include uppercase within Tableau.
I was going through AWS docs for redshift and looks like INTCAP function can solve your use case
For reference => https://docs.aws.amazon.com/redshift/latest/dg/r_INITCAP.html
Brief description (copied)
The INITCAP function makes the first letter of each word in a string uppercase, and any subsequent letters are made (or left) lowercase. Therefore, it is important to understand which characters (other than space characters) function as word separators. A word separator character is any non-alphanumeric character, including punctuation marks, symbols, and control characters. All of the following characters are word separators:
! " # $ % & ' ( ) * + , - . / : ; < = > ? # [ \ ] ^ _ ` { | } ~
And in your case you have declared field name as event_date which will convert to Event_Date.
And next you can use REPLACE function to replace underscore '_'
For reference => https://docs.aws.amazon.com/redshift/latest/dg/r_REPLACE.html
You need to put
set describe_field_name_in_uppercase to on;
in your Tableau's Initial SQL.

list trigger no system ending with "_BI"

I want to list the trigger no system ending with "_BI" in firebird database,
but no result with this
select * from rdb$triggers
where
rdb$trigger_source is not null
and (coalesce(rdb$system_flag,0) = 0)
and (rdb$trigger_source not starting with 'CHECK' )
and (rdb$trigger_name like '%BI')
but with this syntaxs it gives me a "_bi" and "_BI0U" and "_BI0U" ending result
and (rdb$trigger_name like '%BI%')
but with this syntaxs it gives me null result
and (rdb$trigger_name like '%#_BI')
thank you beforehand
The problem is that the Firebird system tables use CHAR(31) for object names, this means that they are padded with spaces up to the declared length. As a result, use of like '%BI') will not yield results, unless BI are the 30th and 31st character.
There are several solutions
For example you can trim the name before checking
trim(rdb$trigger_name) like '%BI'
or you can require that the name is followed by at least one space
rdb$trigger_name || ' ' like '%BI %'
On a related note, if you want to check if your trigger name ends in _BI, then you should also include the underscore in your condition. And as an underscore in like is a single character matcher, you need to escape it:
trim(rdb$trigger_name) like '%\_BI' escape '\'
Alternatively you could also try to use a regular expressions, as you won't need to trim or otherwise mangle the lefthand side of the expression:
rdb$trigger_name similar to '%\_BI[[:SPACE:]]*' escape '\'