PostgreSQL Russian dict for full-text search - postgresql

I tried to add a Russian dictionary for full-text search in a PostgreSQL db. I've downloaded the dict files, converted them to UTF-8, and tried to create a new dictionary:
$ iconv -f koi8-r -t utf-8 < ru_RU.aff > /opt/local/share/postgresql93/tsearch_data/russian.affix
$ iconv -f koi8-r -t utf-8 < ru_RU.dic > /opt/local/share/postgresql93/tsearch_data/russian.dict
CREATE TEXT SEARCH DICTIONARY russian_ispell (
TEMPLATE = ispell,
DictFile = russian,
AffFile = russian,
StopWords = russian
);
But I got an error:
ERROR: invalid byte sequence for encoding "UTF8": 0xd1
CONTEXT: line 341 of configuration file "/opt/local/share/postgresql93/tsearch_data/russian.affix": "SFX Y хаться шутся хаться"
Then I tried other Russian dicts, but the same error occurred. How can I handle this error? Thanks.

You can try executing the following command:
export LC_ALL=C
I think you have a locale issue. The command should be executed in the same shell session where you run the CREATE TEXT SEARCH DICTIONARY statement.
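If the locale export alone doesn't help, it's worth checking that the converted files really came out as valid UTF-8. A minimal sketch in Python (the helper name is mine, for illustration) that reports the first offending byte and the line it sits on:

```python
def first_invalid_utf8(data: bytes):
    """Return (offset, line, byte) of the first invalid UTF-8 byte, or None."""
    try:
        data.decode("utf-8")
        return None
    except UnicodeDecodeError as err:
        # err.start is the byte offset of the first undecodable byte
        line = data[:err.start].count(b"\n") + 1
        return err.start, line, data[err.start]
```

Run it on the raw bytes of russian.affix; if it reports an invalid byte, the iconv conversion did not cover the whole file (or the source wasn't really KOI8-R).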

Related

pg_dump and pg_restore can't deal with file names containing Arabic characters

I use Postgres 10 and the pg_dump/pg_restore binaries that come with it.
pg_dump and pg_restore give me an error when trying to dump/restore to a file or path that contains Arabic characters (I didn't test anything except English and Arabic).
Here's the exception when trying to restore from a file name with non-English (e.g. Arabic) characters:
.\pg_dump.exe --file "C:\א\TOC.DUMP" --host "localhost" --port "1111" --username "MyUserName" --verbose --format=c --blobs --compress "1" --schema "MySchema" "MyDBName"
System.Exception: pg_restore: [custom archiver] could not open input
file "C:\?\TOC.DUMP":
Invalid argument
The same exception occurs when the file path contains Arabic characters; everything works fine when the file name and path are English-only.
I searched here, on Google, and in the PostgreSQL documentation, and couldn't find anything related except the fact that pg_dump/pg_restore have no problem with scripts that contain Arabic and Hebrew text; nothing is mentioned about the encoding of the file name itself.
How did I solve it? I didn't.
I couldn't stop shipping my projects because of this issue, so as a temporary workaround I prevent the user from using Arabic characters with the code below. It's not a good way of doing things, I know.
// C#: reject a path if it changes after an Arabic (1256) encode /
// Windows-1252 decode round trip, i.e. it contains non-Latin characters.
BackUpPath = fileDialog.FileName;
var westernLatin = Encoding.GetEncoding(1252);
var arabic = Encoding.GetEncoding(1256);
var bytes = arabic.GetBytes(BackUpPath);
var result = westernLatin.GetString(bytes);
if (result != BackUpPath)
{
    // Inform the user to use Latin characters for the file name and path.
}
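The round trip above is, in effect, a test of whether the path fits in Windows-1252. A shorter equivalent check, sketched here in Python for illustration (the function name is mine):

```python
def fits_windows1252(path: str) -> bool:
    """True if every character of the path can be encoded as Windows-1252."""
    try:
        path.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False
```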
Appreciate any help, thanks.

How to fix "the octet sequence #(130) cannot be decoded." in pgloader

I'm trying to migrate a database from SQLite to PostgreSQL using pgloader.
My SQLite db is data.db, so I try this:
pgloader ./var/data.db postgres://***#ec2-54-83-50-174.compute-1.amazonaws.com:5432/mydb?sslmode=require
Output:
pgloader version 3.6.1
sb-impl::*default-external-format* :UTF-8
tmpdir: #P"/var/folders/65/x6spw10s4jgd3qkhdq96bk8c0000gn/T/"
KABOOM!
2019-04-11T19:22:47.022000+01:00 NOTICE Starting pgloader, log system is ready.
FATAL error: :UTF-8 stream decoding error on #<SB-SYS:FD-STREAM for "file /Users/mackbookpro/Desktop/dev/www/Beyti/var/data.db" {1005892A93}>: the octet sequence #(130) cannot be decoded.
Date/time: 2019-04-11-18:22
An unhandled error condition has been signalled: :UTF-8 stream decoding error on #<SB-SYS:FD-STREAM for "file /Users/mackbookpro/Desktop/dev/www/Beyti/var/data.db" {1005892A93}>: the octet sequence #(130) cannot be decoded.
Any idea about this problem? Thank you in advance.
This is a character-encoding issue.
The culprit "octet sequence #(130)" corresponded to "é" in my case, which was encoded as \x82.
iconv failed, so I replaced those corrupted \x82 bytes in the byte stream with \x65 (the ASCII "e"), and that got me past it:
xxd -p -c1 < bad_file | sed 's/82/65/' | xxd -r -p > good_new_file
(cheers to Natacha on irc freenode #gcu :) )
Edit: French text? Same problem with #(133), "à"; same solution, \x85 -> \x61.
Edit 2: A small generalization I just found: the "octet sequence" number pgloader reports is simply the decimal value of the offending byte. Anything above 127 is outside plain ASCII, so a byte from a legacy extended encoding triggers the error.
I just hit #(144)? That's \x90; replace it the same way. :)
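For what it's worth, \x82, \x85 and \x90 are exactly "é", "à" and "É" in the old IBM code page 437/850, which suggests the file was written by a DOS-era tool. Assuming that guess about the source encoding is right, transcoding the whole stream keeps the accents instead of flattening them to ASCII; a sketch in Python:

```python
def cp437_to_utf8(raw: bytes) -> bytes:
    # Decode the legacy bytes as code page 437, then re-encode as UTF-8:
    # \x82 -> é, \x85 -> à, \x90 -> É, and so on.
    return raw.decode("cp437").encode("utf-8")
```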

Convert Character Set from WIN1252 to UTF8 - Firebird 3

I'm facing problems trying to convert a Firebird 3 database with character set WIN1252 to UTF8.
I've performed the following procedure:
I extracted the DDL and definitions from the database and created the new database with the UTF8 character set and the UNICODE_CI_AI collation. The database structure was created correctly.
But when I then try to use FbCopy to copy the data from the WIN1252 database into the new UTF8 database, the process aborts with this error:
Message: isc_dsql_execute2 failed
SQL Message: -104
can not format message 13: 896 - message file C:\WINDOWS\SYSTEM32\firebird.msg not found
Engine Code: 335544849
Engine Message:
Malformed string
Enabling triggers ... done.
Before using the FbCopy tool, I also tried a backup and restore of the WIN1252 database with the following switches:
-FIX_FSS_D UTF8 -FIX_FSS_M UTF8
or
-FIX_FSS_D WIN1252 -FIX_FSS_M WIN1252
However, I still get the same error.
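For context: Firebird's "Malformed string" error generally means that some stored value contains bytes that are not valid in the character set being used, so every value has to survive a transcoding like the one sketched below in Python (Windows-1252 leaves the five byte values 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined):

```python
def win1252_to_utf8(raw: bytes) -> bytes:
    # Raises UnicodeDecodeError if raw contains one of the five byte
    # values that Windows-1252 leaves undefined.
    return raw.decode("cp1252").encode("utf-8")
```

Any row holding one of those undefined bytes (often pasted-in binary or data written under a different code page) would make the copy fail this way.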

PostgreSQL - COPY FROM - "unacceptable encoding" error

I get an error when trying to use COPY to load data from a CSV file with UCS-2 LE BOM encoding (as reported by Notepad++).
COPY pub.calls (............ )
FROM 'c:\IMPORT\calls.csv'
WITH
DELIMITER ','
HEADER
CSV
ENCODING 'UCS2';
The error is something like this:
SQL Error [22023]: The argument of the encoding parameter should be an acceptable encoding name.
UCS-2 gives the same error.
For the list of supported charsets, see:
https://www.postgresql.org/docs/current/static/multibyte.html
Or, in psql, type \encoding and press Tab twice to autocomplete the list:
postgres=# \encoding
BIG5 EUC_JP GB18030 ISO_8859_6 JOHAB LATIN1 LATIN3 LATIN6 LATIN9 SJIS UTF8 WIN1252 WIN1255 WIN1258
EUC_CN EUC_KR GBK ISO_8859_7 KOI8R LATIN10 LATIN4 LATIN7 MULE_INTERNAL SQL_ASCII WIN1250 WIN1253 WIN1256 WIN866
EUC_JIS_2004 EUC_TW ISO_8859_5 ISO_8859_8 KOI8U LATIN2 LATIN5 LATIN8 SHIFT_JIS_2004 UHC WIN1251 WIN1254 WIN1257 WIN874
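Since UCS-2/UTF-16 is not in that list, the practical route is to convert the file to UTF-8 first and then use ENCODING 'UTF8' in COPY. A minimal conversion sketch in Python (the function name and destination path are mine, not from the question):

```python
def utf16_to_utf8(src: str, dst: str) -> None:
    # Python's "utf-16" codec honors the byte-order mark, which matches
    # the "UCS-2 LE BOM" file that Notepad++ reported.
    with open(src, "r", encoding="utf-16") as fin, \
         open(dst, "w", encoding="utf-8", newline="") as fout:
        fout.write(fin.read())
```

For example, `utf16_to_utf8(r'c:\IMPORT\calls.csv', r'c:\IMPORT\calls-utf8.csv')`, then point COPY at the new file with ENCODING 'UTF8'.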

Importing csv file into postgres db using pgadmin with special characters

I'm importing data from a CSV file into a Postgres db using pgAdmin 4.
Everything is OK, but I get an issue when I try to import a file containing data like this:
“‘t Zand, Vlotbrug”
“Dussen, `t Middeltje”
As you can see, the data contains backtick (`) and apostrophe (') characters.
I also tried importing the file with UTF-8 encoding, but that didn't help.
Anyone knows how to solve this issue?
Updated
Structure:
stop_id,stop_code,stop_name,stop_lat,stop_lon,location_type,parent_station,stop_timezone,wheelchair_boarding,platform_code,zone_id
Data:
stoparea:123953,,"De Zande, 'Koelucht'",52.5184475,5.956368,1,,,0,,
Error:
ERROR: unterminated CSV quoted field
CONTEXT: COPY stops, line 69400: "stoparea:123953,,"De Zande, 'Koelucht'",52.5184475,5.956368,1,,,0,,
stoparea:120536,,"Poortvliet, Zu..."
Updated 2
Command:
"/Applications/pgAdmin 4.app/Contents/SharedSupport/psql" --command "
"\copy transit.stops (stop_id, stop_code, stop_name, stop_lat,
stop_lon, location_type, parent_station, stop_timezone,
wheelchair_boarding, platform_code, zone_id) FROM
'/Users/tvtan/Desktop/gtfs-nl/stops.txt' DELIMITER ',' CSV HEADER
QUOTE '\"' ESCAPE '''';""
From the command line, it looks like you have defined the ESCAPE character as a single quote, but the single quotes in your data are not escaped.
The default ESCAPE character is the same as the QUOTE character, so you don't need an ESCAPE clause here at all.
More information here
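To see why no ESCAPE clause is needed at all here: under the default CSV rules (QUOTE '"' with ESCAPE equal to QUOTE), an apostrophe inside a double-quoted field is plain data. A quick check of the failing line with Python's csv module, whose defaults match:

```python
import csv
import io

# The line from the question that COPY rejected under ESCAPE '''':
row = 'stoparea:123953,,"De Zande, \'Koelucht\'",52.5184475,5.956368,1,,,0,,'

# csv defaults: delimiter ',', quotechar '"', quotes escaped by doubling.
parsed = next(csv.reader(io.StringIO(row)))
print(parsed[2])  # -> De Zande, 'Koelucht'
```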