Postgres query to get particular text from given text - postgresql

I have text like this in different rows in a column
xxxxxxxxxxx ab_88_2018 xxxxxx
ab_88_2018 xxxxxx
AB_88_2018 xxxxxx
ab_2018_88 XXXXXX
So I want only 88 out of the text into another column.
What can be the query?
Its not 88, but two numbers in that position

Is the 88 always a 2 digit number? If so, this is working for me for Postgres and Redshift and I believe gets you what you want:
SELECT
CASE
WHEN LOWER(column)
~ '.*[a-z]{2}\_[0-9]{2}\_[0-9]{4}.*' THEN SPLIT_PART(column,'_',2)
WHEN LOWER(column)
~ '[a-z]{2}\_[0-9]{4}\_[0-9]{2}.*' THEN LEFT(SPLIT_PART(column,'_',3),2)
END As get_two_digit_number
The ~ (tilde) is similar to LIKE but allows you to do pattern matching through regex. See regexr.com and paste your examples and the code between the '' to see what it's matching
SPLIT_PART is taking the string that matches the pattern, and then breaking it on a character of my choosing, here it's the '_'. The last number is which break to return
Using 'xxxxxxxxxxx ab_88_2018 xxxxxx' as an example, SPLIT_PART('xxxxxxxxxxx ab_88_2018 xxxxxx','',2) will return '88'as 88 is the second part after ''. If you entered 1 it would return everything before the '_'

Related

Postgresql: How to replace multiple substrings in a string and with regexp

Postgresql 9.4: 1 column in 1 table is a string of text representing a route of flight for an aircraft.
The complete field consists of "Fixes" and "Routes" up to 80
characters in total length.
Routes and Fixes can be either 3 or 5 characters in length.
Routes and Fixes can have the same name.
There may be zero, one, or two Routes
Routes are followed by a single non-zero digit or a hash.
Routes and Fixes can be preceded or followed by a "+" or "*".
The field may contain CR/LF or double-triple spaces which should remain.
Each schema contains 6-20000s fields in this table
There are nearly 1800 Route names, but generally only 40-80 per schema
Examples:
"KIND ROCKY1 STL BUM OATHE CLASH5 KDEN"
"+MEARZ7 OKK+
KIND OKK FWA MIZAR3 KDTW"
"KIND OOM OOM5 WEGEE PXV J131 LIT BYP5 KDFW"
"KIND MEARZ# OKK ECK YEE YXI N171B VALIEE***EGSS"
The task is to clean up the lazy use of the hash instead of a digit and to update the Route versions (the trailing numbers). I.e. replace-in-place the Route with the correct digit rather than the # or what might be a wrong number. So every instance of "MEARZ7" or "MEARZ#" becomes "MEARZ9" and "OOM5" becomes "OOM6" but "OOM " stays "OOM ".
Currently I have been testing this:
UPDATE target SET detail =
CASE WHEN POSITION('CLASH' in detail) > 0
AND SUBSTRING(detail,POSITION('CLASH' in detail)+5,1) != ' '
THEN REGEXP_REPLACE (detail, 'CLASH.', 'CLASH5')
WHEN POSITION('MEARZ' in detail) > 0
AND SUBSTRING(detail,POSITION('MEARZ' in detail)+5,1) != ' '
THEN REGEXP_REPLACE (detail, 'MEARZ.', 'MEARZ9')
WHEN POSITION('OOM' in detail) > 0
AND SUBSTRING(detail,POSITION('OOM' in detail)+3,1) != ' '
THEN REGEXP_REPLACE (detail, 'OOM.', 'OOM6')
WHEN POSITION('ROCKY' in fsrtedtail) > 0
AND SUBSTRING(detail,POSITION('ROCKY' in detail)+5,1) != ' '
THEN REGEXP_REPLACE (detail, 'ROCKY.', 'ROCKY1')
ELSE detail END;
My logic was to:
Find the Route name.
Check if it's followed by a space.
If not, replace it with the correct Route+digit
I hadn't yet attempted to avoid "+" or "* ". I was thinking I could first replace the "#" with a number, then update the Route+digit as to not worry about the # and this would eliminate the need to look for the "+" or "* ". Then I could just look for a trailing space.
The second Route (in order of the WHEN statements) does not get updated so I guess am barking up the wrong tree.
They other big obstacle is there can be 80 or more Routes in a schema so if I have to nest a statement, it's gonna be huge.
I have tried array_to_string(array_replace(string_to_array( but it leaves behind double quotes, commas, and curly brackets so doesn't seem feasible.
At this point I'm thinking a function is the way to go, but I don't know where to start.

Can I write a PCRE conditional that only needs the no-match part?

I am trying to create a regular expression to determine if a string contains a number for an SQL statement. If the value is numeric, then I want to add 1 to it. If the number is not numeric, I want to return a 1. More or less. Here is the SQL:
SELECT
field,
CASE
WHEN regexp_like(field, '^ *\d*\.?\d* *$') THEN dec(field) + 1
ELSE 1
END nextnumber
FROM mytable
This actually works, and returns something like this:
INVALID 1
00000 1
00001E 1
00379 380
00013 14
99904 99905
But to push the envelope of understanding, what if I wanted to cover negative numbers, or those with a positive sign. The sign would have to immediately precede or follow the number, but not both, and I would not want to allow white space between the sign and the number.
I came up with a conditional expression with a capture group to capture the sign on the front of the number to determine if a sign was allowed on the end, but it seems a little awkward to handle given I don't really need a yes-pattern.
Here is the modified regex: ^ ([+-]?)*\d*\.?\d*(?(1) *|[+-]? *)$
This works at regex101.com, but in order for it to work I need to have something before the pipe, so I have to duplicate the next pattern in both the yes-pattern and the no-pattern.
All that background for this question: How can I avoid that duplication?
EDIT: DB2 for i uses International Components for Unicode to provide regular expression processing. It turns out that this library does not support conditionals like PRCE, so I changed the tags on this question. The answer given by Wiktor Stribiżew provides a working alternative to the conditional by using a negative lookahead.
You do not have to duplicate the end pattern, just move it outside the conditional:
^ *([+-])?\d*\.?\d*(?(1)|[+-]?) *$
See the regex demo. So, the yes-part is empty, and the no-part has an optional pattern.
You may also solve it with a mere negative lookahead:
^ *([+-](?!.*[-+]))?\d*\.?\d*[+-]? *$
See another regex demo. Here, ([+-](?!.*[-+]))? matches (optionally) a + or - that are not followed with any 0+ char followed with another + or -.

How to keep the upper case and lower case letters in a column alias in the results in Redshift

In Redshift we are trying to give more meaningful aliases to the columns we are returning from the queries as we are importing the results into TABLEAU, the issue is that RedShift turns all the letter to lower case ones, i.e. from "Event Date" it then returns "event date", any idea on how to work this one out to keep the alias given?
I know I'm a bit late to the party but for anyone else looking, you can enable case sensitivity, so if you want to return a column with camel casing for example
SET enable_case_sensitive_identifier TO true;
Then in your query wrap what you want to return the column as in double quotes
SELECT column AS "thisName"
Or as per OP's example
SELECT a.event_date AS "Event Date"
https://docs.aws.amazon.com/redshift/latest/dg/r_enable_case_sensitive_identifier.html
Edit: To have this behaviour as default for the cluster you will need to create/update a parameter group in Configurations => Workload Management. You can't change the settings for the default parameter group. Note, you will need to reboot the cluster after applying the parameter group for the changes to take effect.
No, you cannot do this in Redshift. all columns are lowercase only.
You can enforce upper case only by using
set describe_field_name_in_uppercase to on;
Also see the examples here https://docs.aws.amazon.com/redshift/latest/dg/r_names.html you can see that the upper case characters are returned as lower case. and it says "identifiers are case-insensitive and are folded to lowercase in the database"
You can of course rename the column to include uppercase within Tableau.
I was going through AWS docs for redshift and looks like INTCAP function can solve your use case
For reference => https://docs.aws.amazon.com/redshift/latest/dg/r_INITCAP.html
Brief description (copied)
The INITCAP function makes the first letter of each word in a string uppercase, and any subsequent letters are made (or left) lowercase. Therefore, it is important to understand which characters (other than space characters) function as word separators. A word separator character is any non-alphanumeric character, including punctuation marks, symbols, and control characters. All of the following characters are word separators:
! " # $ % & ' ( ) * + , - . / : ; < = > ? # [ \ ] ^ _ ` { | } ~
And in your case you have declared field name as event_date which will convert to Event_Date.
And next you can use REPLACE function to replace underscore '_'
For reference => https://docs.aws.amazon.com/redshift/latest/dg/r_REPLACE.html
You need to put
set describe_field_name_in_uppercase to on;
in your Tableau's Initial SQL.

Detect Column containing special characters other than space - Postgresql

I need to find the values from a text column which have characters other than alphabets, numbers, and SPACE (It is a name column so having space is allowed).
I am trying this which is not working
select * from table where name ~ '[^a-z0-9 ]';
I have left a space between 9 and ]
The correct regular expression would be:
[^[:alnum:] ]
That will match any string that contains a character that is neither alphabetical nor numerical nor space.
Try ^[-a-z0-9 ]
I think you can use \\w instead of a-z0-9
so that looks like : [-\\w.]

Modelica combiTimeTable

I have a few questions regarding combitimeTables: I tired to import a txt file (3 columns: first time + 2 measured data)into a combitimeTable. - Does the txt file have to have the following header #1; double K(x,y) - Is it right, that the table name in combitimeTable have to have the same name than the variable after double (in my case K)? - I get errors if i try to connect 2 outputs of the table (column 1 and column2). Do I have to specify how many columns that I want to import?
And: Why do i have to use in the path "/" instead of "\" ?
Modelica Code:
Modelica.Blocks.Sources.CombiTimeTable combiTimeTable(
tableOnFile=true,
tableName="K",
fileName="D:/test.txt")
Thank you very much!
The standard text file format for CombiTables is:
#1
double K(4,3)
0 1 10
1 3 20
2 5 30
3 7 40
In this case note the "tableName" parameter I would set as a modifier to the CombiTable (or CombiTimeTable) is "K". And yes, the numbers in parenthesis indicate the dimensions of the data to the tool, so in this case 4 rows and 3 columns.
Regarding the path separator "/" or "\", the backslash character "\" which is the path separator in Windows where as the forward slash "/" is the path separator on Unix like systems (e.g. Linux). The issue is that in most libraries the backslash is used as an escape character. So for example "\n" indicates new line and "\t" indicates tab so if my file name string was "D:\nextfolder\table.txt", this would actually look something like:
D:
extfolder able.txt
Depending on your Modelica simulation tool however it might correct this. So if you used a file selection dialog box to choose your file, the tool should automatically switch the file separator character to the forward slash "/" and your text would look like:
combiTimeTable(
tableOnFile=true,
tableName="K",
fileName="D:/nextfolder/table.txt",
columns=2:3)
If you are getting errors in your connect statement, I would guess you might have forgotten the "columns" parameter. The default value for this parameter comes from the "table" parameter (which is empty by default because there are zero rows by two columns), not from the data in the file. So when you are reading data from a file you need to explicitly set this