Postgresql: Remove hyphens and whitespaces

Postgresql: Remove hyphens and whitespaces - postgresql

I am currently working on DB data which contains whitespaces and hyphens. I searched over the net and found this Remove/replace special characters in column values? . I tried to follow the answer but I am still getting hyphens. I tried playing around with it, I can only remove the whitespace
conn_p = p.connect("dbname='p_test' user='postgres' password='postgres' host='localhost'")
conn_t = p.connect("dbname='t_mig1' user='postgres' password='postgres' host='localhost'")
cur_p = conn_p.cursor()
cur_t = conn_t.cursor()
cur_t.execute("SELECT CAST(REGEXP_REPLACE(studentnumber, ' ', '') as integer), firstname, middlename, lastname FROM sprofile")
rows = cur_t.fetchall()
for row in rows:
print "Inserting ", row[0], row[1], row[2], row[3]
cur_p.execute(""" INSERT INTO "a_recipient" (id, first_name, middle_name, last_name) VALUES ('%s', '%s', '%s', '%s') """ % (row[0], row[1], row[2], row[3]))
cur_p.commit()
cur_pl.close()
cur_t.close()
What I would like to achieve is if I got a studentnumber of 001-2012-1456, it will be displayed as 000120121456.

To wipe out all characters in a set efficiently use translate. It takes a set of characters to translate into another set of characters. If the other set is empty it deletes them.
test=> select translate('001-2012-145 6', '- ', '');
translate
-------------
00120121456
While translate is simpler and faster for this particular job, it's important to know how to use regexes for others. To do it with regexp_replace there's two changes you need to make.
First, you have to match the set of - and as [- ].
Then, you have to specify to replace all occurrences, otherwise it will stop after the first one. That's done with the g flag.
test=> select regexp_replace('001-2012-145 6', '[- ]', '', 'g');
regexp_replace
----------------
00120121456
Here's a tutorial on POSIX regular expressions and character sets.

Its very simple to use inbuilt translate function.
Example:
select translate('001-2012-145 6', '- ', '');
Output of above command :
00120121456

Related

Postgres: Retrieve first n words from column

I know that I can do a text search in Postgres with TextSearch and get some result with
select ts_headline('german',content, tq, 'MaxFragments=4, MinWords=5, MaxWords=12,
ShortWord=3, StartSel = <strong>, StopSel = </strong>') as highlight, ...
FROM to_tsquery('german', 'test') tq ...
Is there a similar way to apply to content the same limitations? i.e. to get directly up to 12 words from the column content.

You could use regular expressions:
SELECT (regexp_match(
regexp_replace(content, '[^\w\s]+', ' ', 'g'),
'^\s*((?:\w+\s+){9}\w+)'
))[1] FROM ...
That will first replace everything that is not a space or alphanumerical character with a space and then return the first 10 words.

How to find/replace weird whitespace in string

I find in my sql database string whit weird whitespace which cannot be replace like REPLACE(string, ' ', '') RTRIM and cant it even find with string = '% %'. This space is even transfered to new table when using SELECT string INTO
If i select this string in managment studio and copy that is seems is normal space and when everything is works but cant do nothing directly from database. What else can i do? Its some kind of error or can i try some special character for this?

First, you must identify the character.
You can do that by using a tally table (or a cte) and the Unicode function:
The following script will return a table with two columns: one contains a char and the other it's unicode value:
DECLARE #Str nvarchar(100) = N'This is a string containing 1 number and some words.';
with Tally(n) as
(
SELECT TOP(LEN(#str)) ROW_NUMBER() OVER(ORDER BY ##SPID)
FROM sys.objects a
--CROSS JOIN sys.objects b -- (unremark if there are not enough rows in the tally cte)
)
SELECT SUBSTRING(#str, n, 1) As TheChar,
UNICODE(SUBSTRING(#str, n, 1)) As TheCode
FROM Tally
WHERE n <= LEN(#str)
You can also add a condition to the where clause to only include "special" chars:
AND SUBSTRING(#str, n, 1) NOT LIKE '[a-zA-Z0-9]'
Then you can replace it using it's unicode value using nchar (I've used 32 in this example since it's unicode "regular" space:
SELECT REPLACE(#str, NCHAR(32), '|')
Result:
This|is|a|string|containing|1|number|and|some|words.

PostgreSQL return last n words

How to return last n words using Postgres.
I have tried using LEFT method.
SELECT DISTINCT LEFT(name, -4) FROM my_table;
but it return last 4 characters ,i want to return last 3 words.

demo:db<>fiddle
You can do this using a the SUBSTRING() function and regular expressions:
SELECT
SUBSTRING(name FROM '((\S+\s+){0,3}\S+$)')
FROM my_table
This has been explained here: How can I match the last two words in a sentence in PostgreSQL?
\S+ is a string of non-whitespace characters
\s+ is a string of whitespace characters (e.g. one space)
(\S+\s+){0,3} Zero to three words separated by a space
\S+$ one word at the end of the text.
-> creates 4 words (or less if there are no more).

One way is to use regexp_split_to_array() to split the string into the words it contains and then put a string back together using the last 3 words in that array.
SELECT coalesce(w.words[array_length(w.words, 1) - 2] || ' ', '')
|| coalesce(w.words[array_length(w.words, 1) - 1] || ' ', '')
|| coalesce(w.words[array_length(w.words, 1)], '')
FROM mytable t
CROSS JOIN LATERAL (SELECT regexp_split_to_array(t."name", ' ') words) w;
db<>fiddle

RIGHT() should do
SELECT RIGHT('MYCOLUMN', 4); -- returns LUMN
UPD
You can convert to array and then back to string
SELECT array_to_string(sentence[(array_length(sentence,1)-3):(array_length(sentence,1))],' ','*')
FROM
(
SELECT regexp_split_to_array('this is the one of the way to get the last four words of the string', E'\\s+') AS sentence
) foo;
DEMO HERE

Truncating leading zero from the string in postgresql

I'm trying to truncate leading zero from the address. example:
input
1 06TH ST
12 02ND AVE
123 001St CT
expected output
1 6TH ST
12 2ND AVE
123 1St CT
Here is what i have:
update table
set address = regexp_replace(address,'(0\d+(ST|ND|TH))','?????? need help here')
where address ~ '\s0\d+(ST|ND|TH)\s';
many thanks in advance

assuming that the address always has some number/letter address (1234, 1a, 33B) followed by a sequence of 1 or more spaces followed by the part you want to strip leading zeroes...
select substr(address, 1, strpos(address, ' ')) || ltrim(substr(address, strpos(address, ' ')), ' 0') from table;
or, to update the table:
update table set address = substr(address, 1, strpos(address, ' ')) || ltrim(substr(address, strpos(address, ' ')), ' 0');
-g

What you are looking for is the back references in the regular expressions:
UPDATE table
SET address = regexp_replace(address, '\m0+(\d+\w+)', '\1', 'g')
WHERE address ~ '\m0+(\d+\w+)'
Also:
\m used to match the beginning of a word (to avoid replacing inside words (f.ex. in 101Th)
0+ truncates all zeros (does not included in the capturing parenthesis)
\d+ used to capture the remaining numbers
\w+ used to capture the remaining word characters
a word caracter can be any alphanumeric character, and the underscore _.

TRIM not works with lines and tabs of a xpath in PostgreSQL?

With this query
SELECT trim(title) FROM (
SELECT
unnest( xpath('//p[#class="secTitle1"]', xmlText )::varchar[] ) AS title
FROM t1
) as t2
and XML input text with lines and spaces,
<root>
...
<p class="x">
text text
text text
</p><p> ...</p>
...
</root>
The trim() have no effect (!). It is a PostgreSQL bug? How to apply fn:normalize-space() with the XPath? I need something like "WHERE title is not null"? (Oracle is simpler...) How to do this simple query with PostreSQL?
Workaround
I need a well-configured build-in function, not a workaround... But I need to work and to show results, so I am using regular expression...
SELECT id, TRIM(regexp_replace(tit, E'[\\n\\r\\t ]+', ' ', 'g')) AS tit
FROM (
SELECT
id, -- xpath returns array of 1, 2, or more strings
unnest( xpath('//p[#class="secTitle1"]', texto )::VARCHAR[] ) AS tit
FROM t
) AS tmp
So, a "only simple space trim" is not friendly, not util (!).
EDIT after #mu comment
I try
SELECT id, TRIM(tit, E'\\n\\r\\t') AS tit
and
SELECT id, TRIM(tit, '\n\r\t') AS tit
both NOT WORKs.
QUESTION REMAINS:
there are no TRIM-option or postgresql configuration to say to TRIM work as it is required?
can I use normalize-space() at xpath? How?
I am using PostgreSQL 9.1, need to upgrade?

It works in 9.2, and it works on 8.4 too.
postgres=# select trim(unnest(string_to_array(e'\t\tHello\n\t\tHello\n\t\tHello', e'\n')), e'\t');
btrim
-------
Hello
Hello
Hello
(3 rows)
your regexp replace any char \n or \r or \t, but trim working with string "\n\r\t". It has different meaning than you expect.