Encoded values to plain text in PostgreSQL

I have encoded values in a table and want to select them as plain text without altering any database properties.
I have tried several approaches, but none gives the expected result.
Value - Er tręäńgt keinen Manteł
Expected - Er treangt keinen Mantel
SELECT 'Er tręäńgt keinen Manteł', 'Er tręäńgt keinen Manteł'::bytea;
SELECT 'Er tręäńgt keinen Manteł', convert_to('Er tręäńgt keinen Manteł', 'utf-8');
Any suggestion would be helpful.

You may use the unaccent function of the unaccent extension.
First create the extension (if it is not already there). It is included in the EDB installation package.
CREATE EXTENSION unaccent;
and then remove the diacritics
SELECT
'Er tręäńgt keinen Manteł' as accented,
unaccent('Er tręäńgt keinen Manteł') as unaccented;
         accented         |        unaccented
--------------------------+--------------------------
 Er tręäńgt keinen Manteł | Er treangt keinen Mantel
(1 row)
I assume that the original character set is UTF8.
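Applied to a table, the same call simply goes over a column. A minimal sketch, where the table messages and the column body are hypothetical names to replace with your own:
-- Hypothetical table and column names, shown for illustration only.
SELECT body           AS accented,
       unaccent(body) AS unaccented
FROM messages;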

Related

Translate greek characters in PostgreSQL full-text search

I would like to translate Greek characters to their common Latin equivalents for the purpose of full-text search.
Consider the following:
SELECT
to_tsvector('english', 'α-decay') @@ to_tsquery('α & decay') AS greek_greek,
to_tsvector('english', 'α-decay') @@ to_tsquery('a & decay') AS greek_latin_short,
to_tsvector('english', 'α-decay') @@ to_tsquery('alpha & decay') AS greek_latin_long;
greek_greek | greek_latin_short | greek_latin_long
-------------+-------------------+------------------
t | t | f
(1 row)
The long version does not match, but users expecting these symbols might type in alpha or beta instead of α and β. Is there a pre-defined dictionary which would automatically turn α into both 'a' and 'alpha'? If not, how can I make one? Or is there a better way altogether?
You'd have to use a synonym dictionary with a synonym file like:
α alpha
β beta
ɣ gamma
...
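To plug such a file into full-text search, you would create a synonym dictionary and map it into a text search configuration. A minimal sketch, assuming the lines above are saved as greek.syn in the server's tsearch_data directory (the names greek_syn and greek_english are made up):
-- Assumes $SHAREDIR/tsearch_data/greek.syn contains the synonym lines above.
CREATE TEXT SEARCH DICTIONARY greek_syn (
    TEMPLATE = synonym,
    SYNONYMS = greek
);
CREATE TEXT SEARCH CONFIGURATION greek_english (COPY = english);
ALTER TEXT SEARCH CONFIGURATION greek_english
    ALTER MAPPING FOR word, asciiword WITH greek_syn, english_stem;
-- Quick check: the dictionary should map α to alpha.
SELECT ts_lexize('greek_syn', 'α');  -- expected: {alpha}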

How can I ignore accented characters when string matching in SPARQL

I have no idea how to compare different labels without taking accents into account.
The following query doesn't return the place because "Ibáñez" has accents in the Spanish DBpedia, but it has different accents in my data source.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
SELECT DISTINCT ?iri
WHERE {
?iri rdfs:label ?label .
?label bif:contains "'Blasco Ibañez'" .
?iri ?location ?city .
FILTER (?location = <http://dbpedia.org/ontology/location> || ?location = <http://dbpedia.org/ontology/wikiPageWikiLink>) .
?city bif:contains "valencia"
} LIMIT 100
Is there a way not to take the accents into account?
The issue is the current configuration of the Spanish DBpedia endpoint. (You may find the query I used to check their configuration interesting.)
Their virtuoso.ini must be adjusted to include --
[I18N]
XAnyNormalization=3
-- as described in the documentation of the INI file, and as further discussed in the article about "normalization of UNICODE3 accented chars in free-text index and queries", as cited in comments by @StanislavKralin.
(Note -- as of this writing, there's a typo in the doc; the section about "WideFileNames = 1/2/3/0" should say it's about "XAnyNormalization = 1/2/3/0")

Replacing nonbreaking spaces (%A0) in Postgres

I've got some values in a varchar column that are separated by nonbreaking spaces (urlencoded %A0 instead of %20). I'm trying to replace them with spaces, but can't seem to get the syntax right:
select regexp_replace('hello world', E'\xa0', ' ')
What is the correct way to encode the character in a Postgres regexp_replace function? Or, is there a better way to do the replacement?
Replacing '\xa0' didn't work for me, possibly because my strings were in UTF-8 rather than Latin-1 or another encoding where the character is encoded directly as the byte A0. (U+00A0 is encoded as the bytes C2 A0 in UTF-8.)
I found it more practical to replace it as a code point (U+00A0) rather than as the encoded bytes (C2 A0 or A0):
select replace('456321 ', E'\u00a0', '') -- value is E'456321\u00a0'
This may help you
select replace('Hello world', '\xa0', '')
Ref: PostgreSQL documentation (current), Section 9.4, String Functions and Operators.
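An alternative that avoids escape-string syntax entirely is chr(): in a UTF-8 database, chr(160) yields the U+00A0 code point directly. A sketch with placeholder table and column names:
-- chr(160) is U+00A0 (no-break space) when the database encoding is UTF-8.
SELECT replace(some_column, chr(160), ' ') AS cleaned
FROM some_table;  -- hypothetical table/column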

Postgres Queries Work on 9.0 but not on 9.2

I'm trying this query on my PostgreSQL 9.2:
SELECT ar.nome_defensor, count(*)
FROM sirdp.atividade_realizadas ar
INNER JOIN sirdp.naturezas n on n.id = ar.natureza_id
INNER JOIN sirdp.atividades at on at.id = n.atividade_id
WHERE ar.data_atividade between '01/08/2011' and '31/08/2014'
and ar.local_atuacao_defensor in ('1ª Vara de Acara\303\272')
group by ar.nome_defensor
order by ar.nome_defensor
It doesn't work, but on 9.0 it does.
I think it has something to do with the parameter 1ª Vara de Acara\303\272, because the problem is with accented words.
Both database have this config:
ENCODING = 'UTF8'
TABLESPACE = pg_default
LC_COLLATE = 'pt_BR.UTF-8'
LC_CTYPE = 'pt_BR.UTF-8'
CONNECTION LIMIT = -1;
I think it has something to do with the parameter: 1ª Vara de Acara\303\272 because the problem is with accented words.
Yes. PostgreSQL 9.0 and before had the configuration parameter standard_conforming_strings set to OFF by default, which means that this string literal:
'1ª Vara de Acara\303\272'
was interpreted in the context of UTF-8 encoding as: 1ª Vara de Acaraú
Since PostgreSQL 9.1, standard_conforming_strings has defaulted to ON, so now a backslash is interpreted as just a backslash. This is explained in the documentation:
standard_conforming_strings (boolean)
This controls whether ordinary string literals ('...') treat backslashes literally, as specified in the SQL standard. Beginning in PostgreSQL 9.1, the default is on (prior releases defaulted to off). Applications can check this parameter to determine how string literals will be processed. The presence of this parameter can also be taken as an indication that the escape string syntax (E'...') is supported. Escape string syntax (Section 4.1.2.2) should be used if an application desires backslashes to be treated as escape characters.
You can work around this in one of the following ways:
Using 1ª Vara de Acaraú directly in the query. After all, the ª character is already outside the US-ASCII range, so your method of submitting the query already supports accented characters.
Using E'1ª Vara de Acara\303\272' in the query.
Not changing the query, but setting standard_conforming_strings to OFF to switch back to the behavior of PostgreSQL 9.0 and previous versions.
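A quick way to see the difference (assuming a UTF-8 server encoding) is to compare the two literal syntaxes side by side:
SHOW standard_conforming_strings;    -- on by default since 9.1
SELECT '1ª Vara de Acara\303\272';   -- backslashes are kept literally
SELECT E'1ª Vara de Acara\303\272';  -- octal escapes are decoded to 1ª Vara de Acaraú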

How to update a record with a literal percent sign (%) in PostgreSQL without saving it as "\%"

I need to update a record, which contains literal percent signs, using PostgreSQL in Railo. The query looks like
<cfquery>
update foo set bar = 'string with % in it %'
</cfquery>
It throws an error because ColdFusion interprets the % as a wildcard character. I can escape it using the following query:
<cfquery>
update foo set bar = 'string with escaped \% in it \%'
</cfquery>
However, the record now contains "\%" in the database and will be displayed on the page as "\%".
I found documentation with an example of escaping a percent sign in a SELECT, but it does not work for me: syntax error at or near "ESCAPE".
SELECT emp_discount
FROM Benefits
WHERE emp_discount LIKE '10\%'
ESCAPE '\';
Is there a better way to achieve the same goal? The underlying database is PostgreSQL. Thanks!
Query parameters escape special characters. Yet another reason to use them.
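At the SQL level the idea can be sketched with a prepared statement, where the value is passed as data and the % needs no escaping (reusing the foo/bar names from the question); in Railo/ColdFusion the corresponding mechanism is cfqueryparam:
-- The parameter is treated as plain data, so the % signs are stored as-is.
PREPARE upd(text) AS UPDATE foo SET bar = $1;
EXECUTE upd('string with % in it %');
DEALLOCATE upd;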