how to remove certain alphanumeric characters from a specific column - postgresql

I have a column that displays certain data in the form of Y14H1050101P01T01.
I want to remove the first 4 alphanumeric characters and the last 6 alphanumeric characters so it reads 1050101 instead.
How do I go about doing this?

We can use a regex replacement here:
SELECT col, REGEXP_REPLACE(col, '^[A-Z0-9]{4}|[A-Z0-9]{6}$', '', 'g') AS col_out
FROM yourTable;
If you want the change to stick, then use an update:
UPDATE yourTable
SET col = REGEXP_REPLACE(col, '^[A-Z0-9]{4}|[A-Z0-9]{6}$', '', 'g')
WHERE col ~ '^[A-Z0-9]{4}.*[A-Z0-9]{6}$';
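To try the pattern outside the database, here is a minimal sketch in Python. It assumes Python's re module treats this simple pattern the same way PostgreSQL does; the function name strip_prefix_suffix is made up for illustration.

```python
import re

# Same pattern as the SQL above: 4 leading or 6 trailing characters from [A-Z0-9].
PATTERN = re.compile(r"^[A-Z0-9]{4}|[A-Z0-9]{6}$")


def strip_prefix_suffix(value: str) -> str:
    """Remove the first 4 and last 6 uppercase-alphanumeric characters."""
    # sub() replaces every match, which corresponds to the 'g' flag in REGEXP_REPLACE.
    return PATTERN.sub("", value)


print(strip_prefix_suffix("Y14H1050101P01T01"))  # 1050101
```

Note that the character class only covers uppercase letters and digits, so a lowercase value such as y14h1050101p01t01 is returned unchanged.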


How to find newline characters in Postgres

I have a couple of entries in a database table whose "names" data spans multiple lines.
I am trying to find the newline characters in it.
SELECT
id,
strpos ( NAME, E'\n' ) AS Position_of_substring
FROM
problems
WHERE
strpos ( NAME, E'\n' ) > 0;
But this only reports the first match, so it fails for data that has more than one newline character (\n).
Is there any way to find all n occurrences of "\n" in the names data?
regexp_matches emits one row per match, so counting those rows gives the number of newlines in each name:
SELECT
id,
strpos ( NAME, E'\n' ) AS Position_of_substring
FROM
problems p
WHERE
(select count(*) from regexp_matches(p.name,E'\n','g') ) = ?;
This query returns an array with the positions of every \n in each string. I am not sure whether this is the result you were expecting:
SELECT
    name,
    array_remove(                      -- 5
        (array_agg(sum))::int[],       -- 4
        length(name) + 1
    )
FROM (
    -- 3
    SELECT
        name,
        SUM(length(lines) + 1) OVER (PARTITION BY name ORDER BY row_number)
    FROM (
        -- 2
        SELECT
            *,
            row_number() OVER ()
        FROM (
            -- 1
            SELECT
                name,
                regexp_split_to_table(name, '\n') as lines
            FROM problems
        ) s
    ) s
) s
GROUP BY name
1. Split the string at the \n characters. Every split part is now one row in a temporary table.
2. Add a row number to preserve the order of the split parts.
3. Take the length of each split part. The (length + 1) gives the position of the \n relative to the start of that part. The SUM window function builds a running total within a group (your original text), which is why the order is relevant. For example: the first two parts of "abc\nde\nfgh" have the lengths 3 and 2, so the per-part values are 4 (abc = 3, + 1) and 3 (de = 2, + 1). The 3 of the second part is not a real index, but the running total gives the right indexes: 4 and 7.
4. Aggregate these results into an array.
5. The last entry of the aggregated array is always length(name) + 1. It comes from the final split part rather than from a \n, so array_remove drops it.
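The running-total idea from the steps above can be checked outside the database. The following is a rough Python sketch of the same logic, not a translation of the SQL; the function name newline_positions is hypothetical.

```python
from itertools import accumulate


def newline_positions(text):
    """Return the 1-based positions of every newline in text."""
    # Step 1: split at the newline characters.
    parts = text.split("\n")
    # Step 3: running total of (length + 1) over the parts, in order.
    running = list(accumulate(len(part) + 1 for part in parts))
    # Step 5: the last total is len(text) + 1, not a real newline, so drop it.
    return running[:-1]


print(newline_positions("abc\nde\nfgh"))  # [4, 7]
```

A string without any newline yields an empty list, because the only running total is the one that gets dropped.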
Changed problem in comments below:
Would like to replace \n with spaces. So I am thinking how above query
will look in the Update statement. – Pranav Unde
Replacing the \n with spaces is quite a different problem from getting the indexes of all occurrences of a special character. And it's much simpler:
UPDATE problems
SET name = trim(regexp_replace(name, E'\n', ' ', 'g'));
regexp_replace(..., 'g') finds all occurrences of \n and replaces them
trim() removes the spaces before and after the string if necessary (for example because there was a trailing \n, as in my example, which was replaced by a space in the previous step)

How to replace substring in Postgres

I want to replace substrings in PostgreSQL.
For example, the strings 'ABC_dog', 'dogABCcat', and 'dogABC' should become 'XYZ_dog', 'dogXYZcat', and 'dogXYZ'.
I tried:
UPDATE my_table SET name = regexp_replace( name , '.*ABC.*', '.*XYZ.*', 'g')
but it set the new names to '.*XYZ.*'
The simplest solution would be to use the replace() function:
UPDATE my_table SET name = replace(name , 'ABC', 'XYZ');
Keep in mind, though, that this will update every row in your table, whether or not it contains the substring. Unless most rows have the pattern you want to replace, you are better off testing for the offending substring first:
UPDATE my_table SET name = replace(name , 'ABC', 'XYZ')
WHERE position('ABC' in name) > 0;
The pattern '.*' matches everything, so '.*ABC.*' means: match everything before the ABC, the ABC itself, and everything after it, so effectively the whole string. The replacement is not a pattern, so '.*XYZ.*' is inserted literally.
Change the pattern to just ABC, as that is the part you want to replace. Also, remove the .* from the replacement.
UPDATE my_table SET name = regexp_replace( name , 'ABC', 'XYZ', 'g')
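The difference between the two patterns is easy to reproduce. Here is a small Python sketch, assuming Python's re module handles these simple patterns the same way PostgreSQL does; the variable names are only for illustration.

```python
import re

name = "dogABCcat"

# '.*ABC.*' matches the whole string, and the replacement text is used literally.
greedy = re.sub(r".*ABC.*", ".*XYZ.*", name)

# Matching only 'ABC' leaves the surrounding text untouched.
targeted = re.sub(r"ABC", "XYZ", name)

# A plain (non-regex) replace gives the same result for a fixed substring.
plain = name.replace("ABC", "XYZ")

print(greedy)    # .*XYZ.*
print(targeted)  # dogXYZcat
print(plain)     # dogXYZcat
```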

How to convert spaced number with comma fraction delimiter in PostgreSQL

I'm currently looking for a way to convert a number like:
699 937,57
Into
699937.57
I'm looking into something like
SELECT to_number(column, 'FM99G999G999') from mytable;
But this drops the decimal part.
It's probably easier to simply replace the comma with a dot and then convert that result to a number:
select to_number(replace(the_column, ',', '.'), 'FM999999.99')
from mytable
This however requires you to know the maximum number of digits before the decimal point. Another option would be to remove all whitespace from the string (after replacing the comma) and then cast that to a number:
select regexp_replace(replace(the_column, ',', '.'), '\s+', '', 'g')::numeric
from mytable;
Regular expressions are somewhat expensive, so the second solution is probably slower (but more robust) than the first one.
The following:
with mytable (the_column) as (
values ('699 937,57'), ('123,45'), ('456,789'), ('123 456 789,1234')
)
select regexp_replace(replace(the_column, ',', '.'), '\s+', '', 'g')::numeric
from mytable;
returns:
regexp_replace
--------------
699937.57
123.45
456.789
123456789.1234
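The same two-step cleanup (comma to dot, then strip whitespace) can be sketched in Python to test other inputs before running the query. This assumes Decimal is an acceptable stand-in for numeric; the function name parse_spaced_decimal is hypothetical.

```python
import re
from decimal import Decimal


def parse_spaced_decimal(text: str) -> Decimal:
    """Convert a string like '699 937,57' to an exact decimal number."""
    # Replace the decimal comma with a dot, as replace(the_column, ',', '.') does.
    with_dot = text.replace(",", ".")
    # Remove all whitespace, as regexp_replace(..., '\s+', '', 'g') does.
    compact = re.sub(r"\s+", "", with_dot)
    return Decimal(compact)


for sample in ["699 937,57", "123,45", "456,789", "123 456 789,1234"]:
    print(parse_spaced_decimal(sample))
```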

PostgreSQL convert a string with commas into an integer

I want to convert a column of type "character varying" that has integers with commas to a regular integer column.
I want to support numbers from '1' to '10,000,000'.
I've tried to use: to_number(fieldname, '999G999G999'), but it only works if the format matches the exact length of the string.
Is there a way to do this that supports from '1' to '10,000,000'?
select replace(fieldname,',','')::numeric ;
To do it the way you originally attempted, which is not advised:
select to_number( fieldname,
regexp_replace( replace(fieldname,',','G') , '[0-9]' ,'9','g')
);
The inner replace changes commas to G. The outer regexp_replace changes digits to 9. This does not account for decimal or negative numbers.
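To see which format mask that expression builds for a given value, here is a minimal Python sketch of the same two replacements. The function name to_number_mask is made up for illustration.

```python
import re


def to_number_mask(value: str) -> str:
    """Build a to_number format mask that matches the shape of value."""
    # Inner step: commas become the group separator marker G.
    with_group_markers = value.replace(",", "G")
    # Outer step: every digit becomes the digit placeholder 9.
    return re.sub(r"[0-9]", "9", with_group_markers)


print(to_number_mask("10,000,000"))  # 99G999G999
print(to_number_mask("1"))           # 9
```

Because the mask is derived from each value, it always has the same length as the value itself.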
You can just strip out the commas with the REPLACE() function:
CREATE TABLE Foo
(
Test NUMERIC
);
insert into Foo VALUES (REPLACE('1,234,567', ',', '')::numeric);
select * from Foo; -- Will show 1234567
You can replace the commas by an empty string as suggested, or you could use to_number with the FM prefix, so the query would look like this:
SELECT to_number(my_column, 'FM99G999G999')
One thing to take note of:
When using the function REPLACE("fieldName", ',', '') on a table, if there is a VIEW that uses the TABLE, the function will not work properly. You must drop the view to use it.

Test for numeric value?

The vendor data we load in our staging table is rather dirty. One column in particular captures number data but 40% of the time has garbage characters or random strings.
I have to create a report that filters out value ranges in that column. So, I tried playing with a combination of replace/translate like so
select replace(translate(upper(str),' ','all possible char'),' ','')
from table
but it fails whenever it encounters a char I did not code. Therefore, the report can never be automated.
JavaScript has the isNaN() function to determine whether a value is an illegal number (true if it is, false if not).
How can I do the same thing in DB2? Do you have any idea?
Thanks in advance.
A fairly reliable (but somewhat hackish) way is to compare the string to its upper- and lower-case self (numbers don't have different cases). As long as any letters in your data are Latin characters, you should be fine:
SELECT input, CASE
WHEN UPPER(input) = LOWER(input) THEN TO_NUMBER(input)
ELSE 0
END AS output
FROM source
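The case-comparison check only detects letters, so it is worth trying on a few sample values first. Below is a small Python sketch of the same comparison, assuming ASCII input; the function name passes_case_check is hypothetical.

```python
def passes_case_check(value: str) -> bool:
    """True when upper- and lower-casing the value gives the same string."""
    return value.upper() == value.lower()


print(passes_case_check("123"))    # True: digits have no case
print(passes_case_check("12a"))    # False: the letter differs by case
print(passes_case_check("12-34"))  # True: punctuation has no case either
print(passes_case_check(""))       # True: nothing to compare
```

Values made up of digits and punctuation, as well as empty strings, pass the comparison even though they are not valid numbers, so the conversion step can still fail on them.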
Another option would be to use the TRANSLATE function:
SELECT input,
CASE
WHEN TRANSLATE(CAST(input as CHAR(10)), '~~~~~~~~~~~~~', '0123456789-. ') = '~~~~~~~~~~' THEN CAST(input AS DECIMAL(12, 2))
ELSE 0
END AS num
FROM x
WITH x (stringval) AS
(
    VALUES ('x2'),(''),('2.2.'),('5-'),('-5-'),('--5'),('.5'),('2 2'),('0.5-'),(' 1 '),('2 '),('3.'),('-4.0')
)
SELECT stringval,
    CASE WHEN (
        -- Whitespace must not appear in the middle of a number
        -- (but trailing and/or leading whitespace is permitted)
        RTRIM(LTRIM( stringval )) NOT LIKE '% %'
        -- A number cannot start with a decimal point
        AND LTRIM( stringval ) NOT LIKE '.%'
        -- A negative decimal number must contain at least one digit between
        -- the negative sign and the decimal point
        AND LTRIM( stringval ) NOT LIKE '-.%'
        -- The negative sign may only appear at the beginning of the number
        AND LOCATE( '-', LTRIM(stringval)) IN ( 0, 1 )
        -- A number must contain at least one digit
        AND TRANSLATE( stringval, '0000000000', '123456789') LIKE '%0%'
        -- Allow up to one negative sign, followed by up to one decimal point
        AND REPLACE(
            TRANSLATE( RTRIM(LTRIM(stringval)), '000000000', '123456789'),
            '0', '') IN ('','-','.','-.')
    )
    THEN 'VALID'
    ELSE 'INVALID'
    END AS stringisvalidnumber
FROM x
;
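The rules in that query can be tried on more inputs outside the database. The following is a rough Python port of the same six conditions, written under the assumption that only the space character counts as whitespace; the names is_valid_number and DIGITS_TO_ZERO are made up for illustration, and this is a sketch rather than a guaranteed match for DB2's behavior.

```python
# Map the digits 1-9 to 0, like TRANSLATE(..., '000000000', '123456789').
DIGITS_TO_ZERO = str.maketrans("123456789", "000000000")


def is_valid_number(value: str) -> bool:
    left_trimmed = value.lstrip(" ")
    trimmed = left_trimmed.rstrip(" ")
    collapsed = trimmed.translate(DIGITS_TO_ZERO)
    return (
        # No whitespace in the middle of the number.
        " " not in trimmed
        # Must not start with a decimal point.
        and not left_trimmed.startswith(".")
        # A digit is required between the sign and the decimal point.
        and not left_trimmed.startswith("-.")
        # The first negative sign, if any, must be the first character.
        and left_trimmed.find("-") in (-1, 0)
        # At least one digit.
        and "0" in value.translate(DIGITS_TO_ZERO)
        # After removing digits, only one sign and one point may remain.
        and collapsed.replace("0", "") in ("", "-", ".", "-.")
    )


samples = ["x2", "", "2.2.", "5-", "-5-", "--5", ".5", "2 2",
           "0.5-", " 1 ", "2 ", "3.", "-4.0"]
valid = [s for s in samples if is_valid_number(s)]
print(valid)  # [' 1 ', '2 ', '3.', '-4.0']
```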
Check this out:
SELECT Mobile,
TRANSLATE(Mobile, '~~~~~~~~~~', '0123456789') AS FirstPass,
TRANSLATE(TRANSLATE(Mobile, '~~~~~~~~~~', '0123456789'), '', '~') AS Erroneous,
REPLACE(TRANSLATE(Mobile, '', TRANSLATE(TRANSLATE(Mobile, '~~~~~~~~~~', '0123456789'), '', '~')), ' ', '') AS Corrected
FROM Person WHERE Mobile <> '' FETCH FIRST 100 ROWS ONLY
The table is "Person" and the field that you want to check is "Mobile".
With a little more work, you can turn this into an UPDATE that fixes the entire table.