How to use case-insensitive pattern matching with PostgreSQL and umlauts?

I'm trying to get PostgreSQL 8.4.3 to do case-insensitive pattern matching with its ~* operator when the strings contain non-ASCII characters like German umlauts. The database, the terminal, and everything else are configured to use UTF-8.
Here's the problem in a nutshell:
SELECT 'Ö' ~* 'ö'; -- false
There are other variants which do work:
SELECT 'Ö' ILIKE 'ö'; -- true
SELECT 'Ö' ~* '[Öö]'; -- true
SELECT LOWER('Ö') ~* 'ö'; -- true
None of these alternatives make me especially happy. ILIKE doesn't use regular expressions. [Öö] involves rewriting the search term. LOWER() is probably the best workaround, but I'd really like to get the ~* operator working like it's supposed to.
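For comparison, here is a small Python sketch (outside PostgreSQL, using only the standard re module) of the behavior the question expects: a case-insensitive regex match on a Unicode string treats 'Ö' and 'ö' as the same letter, just as the [Öö] and LOWER() workarounds do. It only illustrates the expected semantics and says nothing about how PostgreSQL implements ~*.

```python
import re

subject = "Ö"

# Analogue of: SELECT 'Ö' ~* 'ö';  (case-insensitive regular expression)
regex_ci = re.search("ö", subject, re.IGNORECASE) is not None

# Analogue of: SELECT 'Ö' ~* '[Öö]';  (both cases spelled out in a class)
char_class = re.search("[Öö]", subject) is not None

# Analogue of: SELECT LOWER('Ö') ~* 'ö';  (lower-case the subject first)
lowered = re.search("ö", subject.lower()) is not None

print(regex_ci, char_class, lowered)  # True True True
```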
Thanks in advance.

This is a bug in PostgreSQL versions prior to 9.0.
It's in the 9.0 changelog: http://www.postgresql.org/docs/9.0/static/release-9-0.html#AEN99075
Here is my test in 9.0 beta2 using Ubuntu:
SELECT 'Ö' ~* 'ö';
?column?
----------
t
(1 row)

I get true with this query:
SELECT 'Ö' ~* 'ö'; -- true
But I was using version 9.0beta2 on OS X 10.5.8 with these settings:
CREATE DATABASE test
WITH OWNER = postgres
ENCODING = 'UTF8'
TABLESPACE = pg_default
LC_COLLATE = 'nl_NL.UTF-8'
LC_CTYPE = 'nl_NL.UTF-8'
CONNECTION LIMIT = -1;
Edit: Same result on version 8.3.7. Looks like you have a problem with the encoding.

Related

How to get a "€" (U+20AC) character in a Postgres UTF8 client encoding?

I only found some strange results online where somebody tried select E'\x020AC', select E'\x020\x0AC' or select E'\x0AC\x020' but none worked.
So I had to search and read more carefully in the pg docs and found the solution:
select U&'\20AC' -- => "€"
select E'\u20AC' -- => "€"
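The attempts that failed mix up the code point with its bytes. A minimal Python sketch of the difference: U+20AC is stored in UTF-8 as three bytes, and a \x escape followed by more hex digits does not spell out the code point. In Python, "\x02" is read as one escape and "0AC" stays plain text; PostgreSQL's \x escape (one or two hex digits) consumes the same "02" here, which is why E'\x020AC' does not produce a euro sign either.

```python
# U+20AC is one code point; UTF-8 stores it as three bytes.
euro = "\u20ac"              # same idea as E'\u20AC' or U&'\20AC'
utf8_bytes = euro.encode("utf-8")
print(utf8_bytes.hex())      # e282ac

# "\x020AC" is NOT the euro sign: it is U+0002 followed by the text "0AC".
attempt = "\x020AC"
print(attempt == "\x02" + "0AC")  # True
```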

Postgres Queries Work on 9.0 but not on 9.2

I'm trying this query on my PostgreSQL 9.2 server:
SELECT ar.nome_defensor, count(*)
FROM sirdp.atividade_realizadas ar
INNER JOIN sirdp.naturezas n on n.id = ar.natureza_id
INNER JOIN sirdp.atividades at on at.id = n.atividade_id
WHERE ar.data_atividade between '01/08/2011' and '31/08/2014'
and ar.local_atuacao_defensor in ('1ª Vara de Acara\303\272')
group by ar.nome_defensor
order by ar.nome_defensor
It doesn't work, but on 9.0 it does.
I think it has something to do with the parameter 1ª Vara de Acara\303\272, because the problem is with accented words.
Both databases have this configuration:
ENCODING = 'UTF8'
TABLESPACE = pg_default
LC_COLLATE = 'pt_BR.UTF-8'
LC_CTYPE = 'pt_BR.UTF-8'
CONNECTION LIMIT = -1;
I think it has something to do with the parameter 1ª Vara de Acara\303\272, because the problem is with accented words.
Yes. PostgreSQL 9.0 and earlier had the configuration parameter standard_conforming_strings set to OFF by default, which means that this string literal:
'1ª Vara de Acara\303\272'
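had its backslash escapes processed: \303 and \272 are octal escapes for the bytes 0xC3 and 0xBA, which together encode ú in UTF-8, so the literal was interpreted as: 1ª Vara de Acaraú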
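A quick Python check of that arithmetic: the two octal escapes name the bytes 0xC3 and 0xBA, and that byte pair decoded as UTF-8 is the single character ú (U+00FA).

```python
# '\303\272' with escapes processed = two octal byte escapes.
raw = bytes([0o303, 0o272])
print(raw.hex())            # c3ba
print(raw.decode("utf-8"))  # ú
```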
Since PostgreSQL 9.1, standard_conforming_strings is ON by default, so the backslash is now interpreted as just a backslash. This is explained in the documentation:
standard_conforming_strings (boolean)
This controls whether ordinary string literals ('...') treat backslashes literally, as specified in the SQL standard. Beginning in PostgreSQL 9.1, the default is on (prior releases defaulted to off). Applications can check this parameter to determine how string literals will be processed. The presence of this parameter can also be taken as an indication that the escape string syntax (E'...') is supported. Escape string syntax (Section 4.1.2.2) should be used if an application desires backslashes to be treated as escape characters.
You can work around this in one of three ways:
Use 1ª Vara de Acaraú directly in the query. The ª character is already outside the US-ASCII range, so your method of submitting the query evidently supports accented characters.
Use the escape string syntax: E'1ª Vara de Acara\303\272'.
Leave the query unchanged but set standard_conforming_strings to OFF, switching back to the behavior of PostgreSQL 9.0 and earlier.
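To see why the IN clause stops matching, here is a simplified Python model of the two ways the server can read the literal. read_literal is a hypothetical helper, not a PostgreSQL API: with escapes processed (E'...', or standard_conforming_strings off) the \303\272 sequence becomes ú; without them (the 9.1+ default for plain '...') the backslashes stay in the string, so it no longer equals the stored value. Only three-digit octal escapes are modeled.

```python
import re

def read_literal(body: str, process_escapes: bool) -> str:
    """Hypothetical model of how the body of a string literal is read.

    process_escapes=True  ~ E'...' or standard_conforming_strings = off
    process_escapes=False ~ plain '...' with standard_conforming_strings = on
    Only three-digit octal escapes (\\000 to \\377) are handled.
    """
    if not process_escapes:
        return body  # backslashes are kept literally
    # Work on the UTF-8 bytes, replace each \ooo with the byte it names,
    # then decode as UTF-8 (the database encoding in the question).
    data = body.encode("utf-8")
    data = re.sub(rb"\\([0-3][0-7][0-7])",
                  lambda m: bytes([int(m.group(1), 8)]), data)
    return data.decode("utf-8")

stored = "1ª Vara de Acaraú"         # value stored in the column
query = r"1ª Vara de Acara\303\272"  # text written in the query

print(read_literal(query, process_escapes=True) == stored)   # True
print(read_literal(query, process_escapes=False) == stored)  # False
```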

PostgreSQL 9.0 replace function not working for one character

I'm working with PostgreSQL 9.0 and I have a table in which I need to replace a character with '' (nothing).
For that I'm using:
update species set engname = replace(engname, '', '');
Here species is the table and engname is the field (character varying).
The content of one of the rows is:
" -tellifer fÂÂrthii"
Even after running the query, the character is not replaced.
I have also tried:
update species set sciname = regexp_replace(sciname, '', '')
but the character does not get replaced.
My database is:
CREATE DATABASE myDB
WITH OWNER = Myadmin
ENCODING = 'SQL_ASCII'
TABLESPACE = pg_default
LC_COLLATE = 'C'
LC_CTYPE = 'C'
CONNECTION LIMIT = -1;
We are planning to move to UTF-8 encoding, but the conversion with iconv fails because of this character,
so I wanted to replace the character with..
Can anyone tell me how to remove that character?
This symbol can stand for many different characters, so you cannot use replace(). Your client application probably uses a different encoding than the database; the symbol is what gets displayed for broken (undecodable) encoding.
The solution is to use the correct encoding:
postgres=# select * from ff;
a
───────────────
žluťoučký kůň
(1 row)
postgres=# set client_encoding to 'latin2'; --setting wrong encoding
SET
postgres=# select * from ff; -- and you can see strange symbols
a
───────────────
�lu�ou�k� k�
(1 row)
postgres=# set client_encoding to 'utf8'; -- setting good encoding
SET
postgres=# select * from ff;
a
───────────────
žluťoučký kůň
(1 row)
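The same effect can be reproduced in Python, assuming a UTF-8 terminal as in the session above: the text is converted to LATIN2 (the wrongly set client encoding) but read back as UTF-8, so every accented letter turns into U+FFFD. Different letters collapse into the same symbol, which is why replace() aimed at that symbol cannot work on the stored data.

```python
text = "žluťoučký kůň"

# Server converts to the (wrong) client encoding LATIN2 ...
sent = text.encode("latin2")
# ... but the terminal decodes the bytes as UTF-8.
shown = sent.decode("utf-8", errors="replace")
print(shown)  # ASCII letters survive, accented ones become '\ufffd'

# Two different letters end up as the same replacement symbol.
z_shown = "ž".encode("latin2").decode("utf-8", errors="replace")
y_shown = "ý".encode("latin2").decode("utf-8", errors="replace")
print(z_shown == y_shown == "\ufffd")  # True

# With matching encodings the round trip is lossless.
print(text.encode("utf-8").decode("utf-8") == text)  # True
```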
Another solution is to replace national or special characters with related ASCII characters.
PostgreSQL 9.x has the unaccent contrib module for UTF-8, and for some 8-bit encodings there is the to_ascii() function.
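As a rough analogue of what unaccent does, this Python sketch decomposes each character (NFKD) and drops the combining marks. strip_accents is a hypothetical helper; it does not use PostgreSQL's unaccent rules file, so results can differ (for example, 'ß' and 'ł' have no decomposition and are left unchanged here).

```python
import unicodedata

def strip_accents(s: str) -> str:
    """Hypothetical helper: NFKD-decompose and drop combining marks.
    An approximation of unaccent(), not its actual rule set."""
    decomposed = unicodedata.normalize("NFKD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_accents("žluťoučký kůň"))  # zlutoucky kun
```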

postgres full text search like operator

What query should I use to search for text the way the LIKE operator does?
I'm asking about a full text search query, which is of the form:
SELECT * FROM eventlogging WHERE description_tsv @@ plainto_tsquery('mess');
I want results containing "message" as well, but right now it does not return anything.
If I read the description in the manual correctly, you'll need to use to_tsquery() instead of plainto_tsquery(), together with the :* wildcard, to allow prefix matching:
SELECT *
FROM eventlogging
WHERE description_tsv @@ to_tsquery('mess:*');
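A rough Python illustration of the idea behind 'mess:*': a lexeme matches when it starts with the prefix, whereas plain 'mess' needs an exact lexeme. prefix_match is hypothetical, and real full text search first normalizes words into lexemes (the english configuration can stem "message" to "messag"); here that step is simplified to lower-casing.

```python
def prefix_match(document: str, prefix: str) -> bool:
    """Hypothetical helper: does any (lower-cased) word start with prefix?"""
    lexemes = {word.lower() for word in document.split()}
    return any(lexeme.startswith(prefix) for lexeme in lexemes)

def exact_match(document: str, term: str) -> bool:
    """Hypothetical helper: does any (lower-cased) word equal term exactly?"""
    return term in {word.lower() for word in document.split()}

doc = "Error message received"
print(exact_match(doc, "mess"))   # False, like plainto_tsquery('mess')
print(prefix_match(doc, "mess"))  # True, like to_tsquery('mess:*')
```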
You can use LIKE for case-sensitive matches or ILIKE for case-insensitive ones. Use % as your wildcard.
This will match anything with 'SomeThing' in it (case-sensitive):
SELECT * FROM mytable WHERE name LIKE '%SomeThing%';
This will match anything ending with "ing" (case-insensitive):
SELECT * FROM mytable WHERE name ILIKE '%ing';
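To make the wildcard rules concrete, this Python sketch translates a LIKE pattern into an anchored regular expression: % becomes "any sequence", _ becomes "exactly one character", and ILIKE adds case-insensitivity. like_to_regex, like and ilike are hypothetical helpers; LIKE's backslash escape character is not modeled.

```python
import re

def like_to_regex(pattern: str, case_insensitive: bool = False) -> re.Pattern:
    """Hypothetical helper: translate a LIKE pattern into a Python regex.
    % matches any sequence, _ matches one character, everything else is
    literal. Escapes inside the LIKE pattern are not handled."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    flags = re.DOTALL | (re.IGNORECASE if case_insensitive else 0)
    return re.compile("".join(parts), flags)

def like(value: str, pattern: str) -> bool:
    # LIKE must match the whole value, hence fullmatch.
    return like_to_regex(pattern).fullmatch(value) is not None

def ilike(value: str, pattern: str) -> bool:
    return like_to_regex(pattern, case_insensitive=True).fullmatch(value) is not None

print(like("Do SomeThing now", "%SomeThing%"))  # True
print(like("Do something now", "%SomeThing%"))  # False
print(ilike("TESTING", "%ing"))                 # True
```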

How can I do an accent-insensitive search in Postgres 8.3.x with a UTF-8 database?

I tried select to_ascii('capo','LATIN1') and to_ascii('çapo','LATIN1'), and the results are different.
Look here.
CREATE FUNCTION to_ascii(bytea, name)
RETURNS text STRICT AS 'to_ascii_encname' LANGUAGE internal;
and then just use it like this:
SELECT to_ascii(convert_to('Übermeier', 'latin1'), 'latin1');
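The convert_to() step matters because of byte layout, as this Python sketch shows: in UTF-8, 'Ü' and 'ç' take two bytes each, while the Latin-1 form that convert_to(..., 'latin1') produces uses one byte per character. Reading UTF-8 bytes as if they were Latin-1 yields two unrelated characters, which is presumably why to_ascii('çapo','LATIN1') on UTF-8 text did not return 'capo'.

```python
utf8_prefix = "Übermeier".encode("utf-8")[:2]
latin1_prefix = "Übermeier".encode("latin-1")[:1]
print(utf8_prefix)    # b'\xc3\x9c'
print(latin1_prefix)  # b'\xdc'

# UTF-8 bytes misread as Latin-1: one letter becomes two characters.
misread = "çapo".encode("utf-8").decode("latin-1")
print(misread)  # Ã§apo
```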