How can I transcode a UTF-8 string to Latin1 with PostgreSQL 13+?
I've read this SO thread but the functions convert(), convert_from() and convert_to() no longer exist starting from Postgres 13.
EDIT: the solution was given by Laurenz Albe, who pointed out that the functions still exist. It was only afterwards that I noticed:
Google made me land on the manual for 8.2, for which convert() has a different signature than in version 8.3+
I tried the 8.2 SQL code, which resulted in: ERROR: syntax error at or near "USING"
I couldn't find the function in the version 13 docs, because its documentation had been moved to the Binary String Functions and Operators page
So the correct SQL should have been:
SELECT convert('text_in_utf8', 'UTF8', 'LATIN1');
convert_from and convert_to still exist, but they cannot convert from text to text, because text is always a string in the database encoding. Strings in other encodings can only be stored as bytea.
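For illustration, a minimal round trip (a sketch assuming the database encoding is UTF8; 'Grüße' is just a stand-in value):

SELECT convert_to('Grüße', 'LATIN1');
-- returns bytea: \x4772fcdf65
SELECT convert_from('\x4772fcdf65'::bytea, 'LATIN1');
-- returns text: Grüße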
I cannot guide you any further, because you didn't tell us what problem you are trying to solve.
Related
When I try to write the following date-time value to Postgres using pgAdmin
2018-04-18 05:40:28
I get the following error. ERROR: invalid input syntax for type timestamp: "2018-04-18 05:40:28"
CONTEXT: COPY timestamp, line 1, column date: "2018-04-18 05:40:28"
I am trying to write the data using the timestamp format within Postgres.
Any pointers on where I am going wrong would be much appreciated.
Thank you.
My educated guess: you have a leading BOM (byte order mark) in the file that should be removed.
How do I remove the BOM from the beginning of a file?
Or some exotic whitespace or non-printing character that should be removed or replaced.
Trim trailing spaces with PostgreSQL
And the offending character (well, the BOM is not a "character", strictly speaking; it's just mistaken for one) was not copied into the question. That would explain the otherwise contradictory error message.
To test, copy the "2018-04-18 05:40:28" part from the error message and paste it in a pgAdmin SQL editor window (which you seem to be using) and test:
SELECT '"2018-04-18 05:40:28"' = '"2018-04-18 05:40:28"';
---------^ BOM here?
I added a leading BOM to demonstrate in the first string. Type the second string by hand to be sure it's plain ASCII. If you get false, we are on to something here.
But we cannot be sure: your question is confusing and essential information is missing. Don't use the basic type names timestamp and date as identifiers, for sanity's sake.
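If an invisible leading character does turn out to be the culprit, one way to strip a BOM server-side before the cast (a sketch, assuming a UTF8 database; U&'\FEFF' writes the BOM with Unicode escape syntax):

SELECT regexp_replace(U&'\FEFF2018-04-18 05:40:28', '^\uFEFF', '')::timestamp;
-- \uFEFF in the regexp matches the BOM code point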
When I run:
COPY con (date,kgs)
FROM 'H:Sir\\data\\reporting\\hi.rpt'
WITH DELIMITER ','
CSV HEADER
date AS 'Datum/Uhrzeit'
kgs AS 'Summe'
I get the error:
WARNING: nonstandard use of \\ in a string literal
LINE 2: FROM 'H:Sudhir\\Conair data\\TBreporting\\hi.txt'
^
HINT: Use the escape string syntax for backslashes, e.g., E'\\'.
I've been having this problem for quite a while. Help?
It's not an error; it's just a warning. It has nothing to do with the file content; it's related to a PostgreSQL setting and the COPY command syntax you're using.
You're using PostgreSQL after 8.1 with standard_conforming_strings turned off: either a version before 9.1 (where it defaulted to off) or a newer version with it turned off manually.
This causes backslashes in strings, like bob\ted, to be interpreted as escapes, so that string would become bob<tab>ed with a literal tab, as \t is the escape for a tab.
Interpreting strings like this is contrary to the SQL standard, which doesn't have C-style backslash escapes. Years ago the PostgreSQL team decided to switch to the SQL standard way of doing things. For backward compatibility reasons it was done in two stages:
First, add the standard_conforming_strings option to use the SQL-standard interpretation of strings, but have it default to off; issue warnings when the non-standard PostgreSQL string interpretation is used; and add the E'string' syntax so applications can explicitly request escape processing in strings.
A few releases later, turn standard_conforming_strings on by default, once people had updated their applications and fixed the warnings they produced. Supposedly.
The escape for \ is \\. So "doubling" the backslashes like you (or the tool you're using) have done is correct. PostgreSQL shows a warning because it doesn't know whether, when you wrote H:Sir\\data\\reporting\\hi.rpt, you meant literally H:Sir\\data\\reporting\\hi.rpt (as the SQL spec says) or H:Sir\data\reporting\hi.rpt (as PostgreSQL used to do, against the standard).
Thus there's nothing wrong with your query. If you want to get rid of the warning, either turn standard_conforming_strings on, or prefix the string with E to use the escape-string syntax explicitly.
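For example (a sketch using the path from the question; the date AS ... and kgs AS ... lines are left out, since COPY has no such clause):

-- Option 1: keep the doubled backslashes and mark the string as an escape string:
COPY con (date, kgs)
FROM E'H:Sir\\data\\reporting\\hi.rpt'
WITH (FORMAT csv, HEADER);

-- Option 2: with standard-conforming strings, backslashes are literal:
SET standard_conforming_strings = on;
COPY con (date, kgs)
FROM 'H:Sir\data\reporting\hi.rpt'
WITH (FORMAT csv, HEADER);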
2012-04-23 16:35:07 PDT WARNING: nonstandard use of \\ in a string literal at character 117
2012-04-23 16:35:07 PDT HINT: Use the escape string syntax for backslashes, e.g., E'\\'.
In recent PostgreSQL, strings are supposed to conform to the standard, which means no escaping -- by default. There are settings to control this:
standard_conforming_strings -- defaults to on since 9.1.
escape_string_warning -- also defaults to on.
Which means a string literal like 'a\nb' is parsed as 4 plain characters. If you want it to be parsed with escapes, there is the E'a\nb' syntax, which parses it as the 3 expected characters. See http://www.postgresql.org/docs/current/interactive/runtime-config-compatible.html for a proper explanation.
I suspect (if you are running PostgreSQL 9.1 or later) that you have standard_conforming_strings = off, possibly to let legacy queries written using escapes run correctly. The warning is still enabled, however, because it's warning you that you're using deprecated syntax.
The proper solution is to fix all your queries to use the E prefix, if you want to get rid of the warning. Assuming of course that the escapes are intentional -- if not, then setting standard_conforming_strings = on is correct.
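To see the difference in a session (assuming standard_conforming_strings = on, the default since 9.1):

SHOW standard_conforming_strings;
SELECT length('a\nb');   -- 4: the backslash is a literal character
SELECT length(E'a\nb');  -- 3: \n is parsed as a single newline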
You're inserting or updating with a value that contains an unescaped backslash, e.g. abc\def, when you should escape it like this: abc\\def.
Examine/debug your input data to find the problem text
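A sketch for locating such values after the fact (hypothetical table and column names; with standard_conforming_strings on, a literal backslash in a LIKE pattern is written \\):

SELECT * FROM mytable WHERE mycolumn LIKE '%\\%';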
I am getting the following exception:
Caused by: org.postgresql.util.PSQLException: ERROR: character 0xefbfbd of encoding "UTF8" has no equivalent in "WIN1252"
Is there a way to eradicate such characters, either via SQL or programmatically?
(SQL solution should be preferred).
I was thinking of connecting to the DB using WIN1252, but that would give the same problem.
I had a similar issue, and I solved it by setting the encoding to UTF8 with \encoding UTF8 in the client before attempting the INSERT INTO foo (SELECT * FROM bar WHERE x=y);. My client was using WIN1252 encoding but the database was in UTF8, hence the error.
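For clients other than psql, the same client_encoding parameter can be set with plain SQL:

SHOW client_encoding;           -- what the session currently uses
SET client_encoding TO 'UTF8';  -- the SQL-level equivalent of \encoding UTF8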
More info is available on the PostgreSQL wiki under Character Set Support (devel docs).
What do you do when you get this message? Do you import a file into Postgres? A common culprit is the BOM character, which Windows writes as the first character of a text file saved in UTF-8 encoding. It is an invisible, zero-width character, so you won't see it when opening the file in a text editor.
Try opening the file in, for example, Notepad, saving it with ANSI encoding, and adding (or replacing the similar) line set client_encoding to 'WIN1252' in your file.
Don't eradicate the characters; they're real and there for good reasons. Instead, eradicate Win1252.
I had a very similar issue. I had a linked server from SQL Server to a PostgreSQL database. Some data in the table I was selecting from with an OPENQUERY statement had characters with no equivalent in Win1252. The problem was that the System DSN entry (found under the ODBC Data Source Administrator) I had used for the connection was configured to use PostgreSQL ANSI(x64) rather than PostgreSQL Unicode(x64). Creating a new data source with Unicode support, creating a new linked server against it, and referencing the new linked server in the OPENQUERY resolved the issue for me. Happy days.
That is the byte sequence 0xEF, 0xBF, 0xBD: the UTF-8-encoded form of the Unicode replacement character U+FFFD. It typically appears where text was decoded with the wrong encoding at some earlier stage and the undecodable bytes were replaced.
In any case, your exception is due to this code point not having a mapping in the Win1252 code page. This will occur with most other non-Latin characters too, such as those used in Asian scripts.
Can you change the database encoding to be UTF8 instead of 1252? This will allow your columns to contain almost any character.
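To first locate the affected rows, a sketch with hypothetical table and column names, assuming a UTF8 database (U&'\FFFD' writes the replacement character U+FFFD with Unicode escape syntax):

SELECT * FROM billing WHERE position(U&'\FFFD' IN comments) > 0;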
I was able to get around it by using Postgres' substring function and selecting that instead:
select substring(comments from 1 for 200) from billing
The comment pointing out that the special character appeared at the start of each field was a great help in finally resolving this.
This problem appeared for us around 19/11/2016, with our old Access 97 app accessing a PostgreSQL 9.1 DB.
It was solved by changing the driver to UNICODE instead of ANSI (see plang's comment).
Here's what worked for me:
1. Enable ad-hoc queries in sp_configure.
2. Add an ODBC DSN for your linked PostgreSQL server.
3. Make sure you have both ANSI and Unicode (x64) drivers (try both).
4. Run a query like the one below (change the UID, server IP, DB name and password).
5. Keep the query in the last line in PostgreSQL format.
EXEC sp_configure 'show advanced options', 1
RECONFIGURE
GO
EXEC sp_configure 'ad hoc distributed queries', 1
RECONFIGURE
GO
SELECT * FROM OPENROWSET('MSDASQL',
'Driver=PostgreSQL Unicode(x64);
uid=loginid;
Server=1.2.3.41;
port=5432;
database=dbname;
pwd=password',
'select * FROM table_name limit 10;')
I faced this issue when my Windows 10 was using Mandarin Chinese as the default language. The problem occurred because I tried to import a database with UTF-8; checking via psql with \l showed that the collate and ctype were Mandarin Chinese.
The solution: reset the OS language back to US English and re-install PostgreSQL. Once the collate is back to UTF-8, you can set your OS language back again.
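To verify a database's collate and ctype without psql's \l, the same information can be read from pg_database:

SELECT datname, datcollate, datctype FROM pg_database;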
I wrote up the full context and solution here: https://www.yodiw.com/fix-utf8-encoding-win1252-cputf8-postgresql-windows-10/
In our organization, we handle GIS content in different file formats. I need to put these files into a PostGIS database, and that is done using ogr2ogr. The problem is, that the database is UTF8 encoded, and the files might have a different encoding.
I found descriptions of how to specify the encoding by adding an options parameter to ogr2ogr, but apparently it doesn't have any effect.
ogr2ogr -f PostgreSQL PG:"host=localhost user=username dbname=dbname \
password=password options='-c client_encoding=latin1'" sourcefile;
The error I receive is:
ERROR 1: ALTER TABLE "soer_vd" ADD COLUMN "målsætning" CHAR(10)
ERROR: invalid byte sequence for encoding "UTF8": 0xe56c73
HINT: This error can also happen if the byte sequence does not match the
encoding expected by the server, which is controlled by "client_encoding".
ERROR 1: ALTER TABLE "soer_vd" ADD COLUMN "påvirkning" CHAR(10)
ERROR: invalid byte sequence for encoding "UTF8": 0xe57669
HINT: This error can also happen if the byte sequence does not match the
encoding expected by the server, which is controlled by "client_encoding".
ERROR 1: INSERT command for new feature failed.
ERROR: invalid byte sequence for encoding "UTF8": 0xf8
HINT: This error can also happen if the byte sequence does not match the
encoding expected by the server, which is controlled by "client_encoding".
Currently, my source file is a shapefile, and I'm pretty sure it is Latin1 encoded.
What am I doing wrong here and can you help me?
Kind regards, Casper
Magnus is right and I will discuss the solution here.
I have seen the option to inform PostgreSQL about the character encoding, options='-c client_encoding=xxx', used in many places, but it does not seem to have any effect. If someone knows how this part works, feel free to elaborate.
Magnus suggested setting the environment variable PGCLIENTENCODING to LATIN1. According to a mailing list I queried, this can be done by modifying the call to ogr2ogr:
ogr2ogr --config PGCLIENTENCODING LATIN1 -f PostgreSQL
PG:"host=hostname user=username dbname=databasename password=password" inputfile
This didn't do anything for me. What worked for me was to run, before the call to ogr2ogr:
SET PGCLIENTENCODING=LATIN1
It would be great to hear more details from experienced users, and I hope this can help others :)
That does sound like it would set the client encoding to LATIN1. Exactly what error do you get?
Just in case ogr2ogr doesn't pass it along properly, you can also try setting the environment variable PGCLIENTENCODING to latin1.
I suggest you double-check that the files are actually LATIN1. Simply running file on one will give you a good idea, assuming the encoding is actually consistent within the file. You can also try sending it through iconv to convert it to either LATIN1 or UTF8.
You need to write your command line like this:
PGCLIENTENCODING=LATIN1 ogr2ogr -f PostgreSQL PG:"dbname=...
Currently, OGR from GDAL does not perform any recoding of character data during translation between vector formats. The team has prepared the RFC 23.1: Unicode support in OGR document, which discusses support for recoding in OGR drivers. RFC 23 was adopted and the core functionality was released in GDAL 1.6.0. However, most OGR drivers have not been updated, including the Shapefile driver.
For the time being, I would describe OGR as encoding agnostic and ignorant. That means OGR takes what it gets and sends it out without any processing. OGR uses the char type to manipulate textual data, which is fine for handling multi-byte encoded strings (like UTF-8): it's just a plain stream of bytes stored as an array of char elements.
It is advised that developers of OGR drivers return UTF-8 encoded strings of attribute values; however, this rule has not been widely adopted across OGR drivers, so this functionality is not end-user ready yet.
On Windows the command is
SET PGCLIENTENCODING=LATIN1
On Linux
export PGCLIENTENCODING=LATIN1
or
PGCLIENTENCODING=LATIN1
Moreover, this discussion helped me:
https://gis.stackexchange.com/questions/218443/ogr2ogr-encoding-on-windows-using-os4geo-shell-with-census-data
On Windows,
SET PGCLIENTENCODING=LATIN1 ogr2ogr...
did not give me any error, but ogr2ogr did not work. I needed to change the system variable (e.g. System --> Advanced system settings --> Environment variables --> New system variable), reboot the system, and then run
ogr2ogr...
I solved this problem using this command:
pg_restore --host localhost --port 5432 --username postgres --dbname {DBNAME} --schema public --verbose "{FILE_PATH to import}"
I don't know if this is the right solution, but it worked for me.
For some reason, I don't know why, I could not import tables with ÅÄÖ in them into the public schema.
When I created a new schema I could import the tables to the new schema.