When I load data from PostgreSQL into Stata some of the data has unexpected characters appended. How can I avoid this?
Here is the Stata code I am using:
odbc query mydatabase, schema $odbc
odbc load, exec("SELECT * FROM my_table") $odbc allstring
Here is an example of the output I see:
198734/0 one/0/r April/0/0/0
893476/0 two/0/r May/0/0/0
324192/0 three/0/r June/0/0/0
In Postgres the data is:
198734 one April
893476 two May
324192 three June
I see this mostly in larger tables, and with fields of all data types in PostgreSQL. If I export the data to a CSV, there are no trailing characters.
The odbc.ini file I am using looks like this:
[ODBC Data Sources]
mydatabase = PostgreSQL
[mydatabase]
Debug = 1
CommLog = 1
ReadOnly = no
Driver = /usr/lib64/psqlodbcw.so
Servername = myserver
Servertype = postgres
FetchBufferSize = 99
Port = 5432
Database = mydatabase
[Default]
Driver = /usr/lib64/psqlodbcw.so
I am using unixODBC version 2.3.1, PostgreSQL version 9.4.9 with server encoding UTF8, and Stata version 14.1.
What is causing the unexpected characters in the data imported into Stata? I know that I can clean the data once it’s in Stata, but I would like to avoid this.
I was able to fix this by adding the line
set odbcdriver ansi
to the Stata code.
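For reference, here is the full load sequence with the fix in place, a sketch that simply adds the one line to the code from the question (the DSN, global, and table name are the ones used above):
* Tell Stata to treat the ODBC driver as ANSI; the default unicode
* setting is what produced the stray null bytes with this driver.
set odbcdriver ansi
odbc query mydatabase, schema $odbc
odbc load, exec("SELECT * FROM my_table") $odbc allstring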
Related
It is an age-old problem: I am getting data from MySQL (Latin1) into Postgres (UTF8) and getting the invalid byte error.
My setup for all solutions:
additional JDBC parameter for Postgres: "characterEncoding=utf8"
tDBRow_1: "SET NAMES 'utf8'"
And yes, I've checked Stack on the matter. So far nothing worked.
Options tried:
Only - "SET NAMES 'utf8'"
convert(cast(convert(data using latin1) as binary) using utf8) as data - in the source SQL query
CONVERT(CAST(data as BINARY) USING utf8) as data - in the source SQL query
CAST(CONVERT(data USING utf8) AS binary) - in the source SQL query
trim(both CHAR(0x00) from data) - in the source SQL query
row1.data.replace("\x00", " ") - in tMap
data.replace('\0', ' ') - in tJava
data.replaceAll("\0", "") - in tJava
What is left:
-change additional params in target to: noDatetimeStringSync=true&characterEncoding=utf8
-change additional params in target to: useOldUTF8Behavior=true
-change tDBRow_1 to SET CLIENT_ENCODING TO utf8
But I have run out of ideas at the moment, and so, it seems, has the Internet.
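One combination of the ideas above, stripping the 0x00 bytes on the MySQL side in the same query that fixes the encoding, would look roughly like this (a sketch only; my_table stands in for the real source table):
-- Sketch: re-encode from latin1 and strip the 0x00 bytes that
-- Postgres text columns reject (my_table is a placeholder).
SELECT REPLACE(
         CONVERT(CAST(CONVERT(data USING latin1) AS BINARY) USING utf8),
         CHAR(0),
         '') AS data
FROM my_table;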
I have created a binary field in Odoo-10 that is supposed to store a CSV file on the server. But when I check its table in Postgres, instead of getting binary data in that column, I am getting something like this:
<memory at 0x7f1539393648>
Where is my binary file getting stored exactly?
My Odoo version is 10.
I am also trying to migrate a table from OpenERP 6 to Odoo 10. The column that stores the CSV binary has valid data in the Postgres table for version 6, but when I migrate that table, the CSV binary column contains this "memory at 0x7f1539393648" again in the version-10 table.
Where am I making the mess? Help appreciated.
Binary data storage shifted out of the database and into normal storage on the filesystem by default around Odoo 7 or 8.
You can find the files under (from odoo/odoo/tools/appdirs.py):
Typical user data directories are:
Mac OS X: ~/Library/Application Support/<AppName>
Unix: ~/.local/share/<AppName> # or in $XDG_DATA_HOME, if defined
Win XP (not roaming): C:\Documents and Settings\<username>\Application Data\<AppAuthor>\<AppName>
Win XP (roaming): C:\Documents and Settings\<username>\Local Settings\Application Data\<AppAuthor>\<AppName>
Win 7 (not roaming): C:\Users\<username>\AppData\Local\<AppAuthor>\<AppName>
Win 7 (roaming): C:\Users\<username>\AppData\Roaming\<AppAuthor>\<AppName>
If you have set a value data_dir in your Odoo server config, the files can be found there.
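To track down a particular file, you can also look it up through Odoo's attachment table, a sketch that assumes the binary field is stored as an attachment:
-- Sketch: list stored attachments and their filestore paths;
-- store_fname is relative to <data_dir>/filestore/<database>/.
SELECT id, name, store_fname
FROM ir_attachment
WHERE store_fname IS NOT NULL
ORDER BY id DESC
LIMIT 20;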
A description of the issue is given at the link. It seems that it is a PostgreSQL bug only. To resolve this issue, there seems to be only a single workaround, which is to create a map of locales with the key as <Language>_<Country>.<CodePage> and the value as <Language>, <Country>.
For example:
English_United States.1252 = English, United States
...
The value of the --locale parameter is accepted in the format <Language>, <Country>, whereas the output of the command SHOW LC_COLLATE is in the format <Language>_<Country>.<CodePage>. So, during an upgrade, I will get the value from SHOW LC_COLLATE, look up the corresponding value in the map, and provide it during the PostgreSQL 9.5 installation.
How do I convert <Language>_<Country>.<CodePage> to the appropriate format for a successful installation?
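If the conversion really is just a string reshape, it can be done in a single Postgres expression instead of a hand-maintained map, a sketch that assumes the only underscore is the one between language and country:
-- Sketch: 'English_United States.1252' -> 'English, United States'.
-- split_part drops the code page; replace swaps the underscore.
SELECT replace(split_part('English_United States.1252', '.', 1),
               '_', ', ');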
I am using UTF8 as encoding for my Postgres 8.4.11 database:
CREATE DATABASE test
WITH OWNER = postgres
ENCODING = 'UTF8'
TABLESPACE = mydata
LC_COLLATE = 'de_DE.UTF-8'
LC_CTYPE = 'de_DE.UTF-8'
CONNECTION LIMIT = -1;
ALTER DATABASE test SET default_tablespace='mydata';
ALTER DATABASE test SET temp_tablespaces=mydata;
And the output of \l
test | postgres | UTF8 | de_DE.UTF-8 | de_DE.UTF-8 |
When I try to insert a German character:
create table x(a text);
insert into x values('ä,ß,ö');
ERROR: invalid byte sequence for encoding "UTF8": 0xe42cdf
HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
I am using PuTTY to connect. Any idea?
The key element is the client_encoding - the encoding the server expects from your client. It has to match what is actually sent. What do you get for show client_encoding? Is it UNICODE?
Read more in the chapter Automatic Character Set Conversion Between Server and Client of the manual.
If you are using psql as the client, you can set client_encoding with \encoding. Check the encoding your local system uses (on Linux, type locale in the shell) and set a matching client_encoding in psql. You can avoid such complications if you use the same locale on your system as you use as the encoding for your PostgreSQL server.
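For example, from within a session (a sketch; UTF8 matches the de_DE.UTF-8 locale above):
-- Check what the server currently expects from this client.
SHOW client_encoding;
-- Set it for the session; inside psql, \encoding UTF8 does the same.
SET client_encoding TO 'UTF8';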
If you use PuTTY (on Windows), make sure to set its "Translation" accordingly. Have a look at Settings: Window - Translation. It must match client_encoding. You can right-click in a running session and choose Change Settings. You can also save these settings with your saved sessions.
I am trying to update a column:
update t_references
set reference = 'Stöcker W, et al. Autoimmunity to Pancreatic Juice in Crohn’s Disease.
Results of an Autoantibody Screening in Patients With Chronic Inflammatory Bowel Disease. <i>Scand J Gastroenterol Suppl</i>. 1987;139:41-52.'
,index = 9
where reference_id = 161;
I got error:
The query could not be converted to the required encoding.
Please advise.
I had to log in to the machine and then run this from the command line.
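Roughly like this (a sketch; mydb and update_reference.sql are placeholders, and PGCLIENTENCODING pins the client encoding explicitly):
# Sketch: run the update from the shell with an explicit client
# encoding (mydb and update_reference.sql are placeholders).
PGCLIENTENCODING=UTF8 psql -d mydb -f update_reference.sql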
On experiencing the same error, I found I could recreate it by opening a .SQL file saved from PGAdmin in an editor (even Notepad) and then copying and pasting the contents into PGAdmin. When I opened the file directly with PGAdmin, there was no issue.
HTH