I'm trying to import a large dataset into Google Fusion Tables using the CSV import function. The data contains the Danish characters æ, ø and å. The original encoding of the data seems to be ANSI (or "windows-1252"). Data uploaded in that encoding is not displayed correctly. I've tried re-encoding the strings in most other relevant encodings (Encoding.(Unicode|ASCII|UTF8) etc.), but nothing seems to please Fusion Tables.
I'm using FileHelpers to generate the CSV and I have tried explicitly setting the encoding there too, but to no avail.
UTF-8 should work. I uploaded this table as a test: http://www.google.com/fusiontables/DataSource?dsrcid=276537
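If the generating tool cannot be persuaded to write UTF-8 directly, the file can be re-encoded afterwards. The following is a minimal sketch in Python (not the C#/FileHelpers setup from the question), assuming the source really is windows-1252; the function names and file paths are hypothetical.

```python
def reencode(raw, source="cp1252", target="utf-8"):
    """Decode bytes from the source encoding and encode them as the target."""
    # Raises UnicodeDecodeError if the data contains bytes that are
    # undefined in the source encoding, which is a hint the guess is wrong.
    return raw.decode(source).encode(target)


def convert_file(src_path, dst_path, source="cp1252", target="utf-8"):
    """Stream a text file line by line from one encoding to another."""
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding=source, newline="") as src, \
         open(dst_path, "w", encoding=target, newline="") as dst:
        for line in src:
            dst.write(line)


# The three Danish letters as windows-1252 bytes, converted to UTF-8.
print(reencode(b"\xe6\xf8\xe5").hex(" "))
```

Each of the three letters becomes a two-byte sequence in UTF-8, so a file that still contains single bytes such as 0xE6 has not actually been converted.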
I have an agent written in LotusScript (IBM Domino 9.0.1 on Windows 10) that reads records from a DB2 database and writes them to Notes documents. The table in DB2 (running on CentOS) contains international names in VARCHAR fields, such as "Łódź".
The DB2 database was created as UTF-8 (code page 1208), and Domino by its nature supports Unicode. Unfortunately, the value loaded into the Notes document is not "Łódź" as it should be, but "? Ód?".
How can I import special characters from DB2 into Domino NSF databases correctly?
Thank you
To import the table I used the following code taken from OpenNTF XSnippets:
https://openntf.org/XSnippets.nsf/snippet.xsp?id=db2-run-from-lotusscript-into-notes-form
Find where the code page conversion is happening. Alter the LotusScript to dump the hex of the received data for the column concerned to a file or a dialog box. If the hex codes differ from what is stored in the column, then it may be your Db2 client that is using the wrong code page. Are you aware of the DB2CODEPAGE environment variable for Windows? That might help if it is the Db2 client that is doing the code page conversion.
For example, setting the environment variable DB2CODEPAGE=1208 may help, although careful testing is required to ensure it does not cause other symptoms that are mentioned online.
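To know what to look for in the hex dump, it helps to see what the bytes should be. This is only an illustration in Python, not LotusScript, and the helper name hex_dump is hypothetical. It assumes the lossy step is a conversion to windows-1252, which has no Ł or ź; the actual client code page may differ.

```python
def hex_dump(text, encoding):
    """Return the bytes of text in the given encoding as spaced hex."""
    # errors="replace" substitutes "?" for characters the target
    # encoding cannot represent, instead of raising an exception.
    return text.encode(encoding, errors="replace").hex(" ")


name = "Łódź"
print(hex_dump(name, "utf-8"))   # bytes as stored in a UTF-8 database
print(hex_dump(name, "cp1252"))  # bytes after a lossy code page conversion
```

In UTF-8 the name is c5 81 c3 b3 64 c5 ba. After conversion to windows-1252 it becomes 3f f3 64 3f, where 3f is a literal question mark, which resembles the damaged value described in the question. If the dump on the Domino side already contains 3f bytes, the characters were lost before the data reached LotusScript.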
I'm looking for a good way to load fixed-width data into Postgres tables. I do this in SAS and Python, not in Postgres, and I guess there is no native method. The files are a few GB. The one approach I have seen, where you load each line as one large column and then parse it into tables, does not work on my file for some reason (possibly memory issues). I can use psycopg2, but because of memory issues I would rather not. Any ideas or tools that work? Does pgloader work well, or are there native methods?
http://www.postgresonline.com/journal/index.php?/archives/157-Import-fixed-width-data-into-PostgreSQL-with-just-PSQL.html
Thanks
There's no convenient built-in method to ingest fixed-width tabular data in PostgreSQL. I suggest using a tool like Pentaho Kettle or Talend Studio to do the data-loading, as they're good at consuming many different file formats. I don't remember if pg_bulkload supports fixed-width, but suspect not.
Alternatively, you can generally write a simple script with something like Python and the psycopg2 module, loading the fixed-width data row by row and sending that to PostgreSQL. psycopg2's support for the COPY command via copy_from makes this vastly more efficient. I didn't find a convenient fixed-width file reader for Python in a quick search, but I'm sure they're out there. You can use whatever language you like anyway: Perl's DBI and DBD::Pg do just as well, and there are plenty of fixed-width file reader modules for Perl.
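A rough sketch of that approach, assuming a made-up three-column layout and a hypothetical target table. The parser is a generator, so only one batch of rows is held in memory at a time. psycopg2 is imported lazily inside load_file so the parsing part can be tried without a database; LAYOUT, load_file, the DSN and the batch size are all illustrative names and values.

```python
import io
import itertools

# Hypothetical layout: (column name, start offset, end offset) per field.
LAYOUT = [("id", 0, 4), ("name", 4, 14), ("age", 14, 17)]


def escape(value):
    """Escape characters that are special in COPY's text format."""
    return (value.replace("\\", "\\\\")
                 .replace("\t", "\\t")
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def fixed_width_to_tsv(lines, layout=LAYOUT):
    """Yield one tab-separated line per non-empty fixed-width input line."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        fields = [line[start:end].strip() for _, start, end in layout]
        yield "\t".join(escape(f) for f in fields) + "\n"


def load_file(path, dsn, table, batch_size=50_000):
    """Stream a fixed-width file into PostgreSQL in batches via COPY."""
    import psycopg2  # third-party driver, only needed for the actual load

    columns = [name for name, _, _ in LAYOUT]
    with open(path, encoding="utf-8") as src, \
         psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        rows = fixed_width_to_tsv(src)
        while True:
            # Pull at most batch_size converted rows from the generator.
            batch = "".join(itertools.islice(rows, batch_size))
            if not batch:
                break
            cur.copy_from(io.StringIO(batch), table, columns=columns)
```

A call might look like load_file("data.txt", "dbname=test", "people"), with both arguments standing in for real values. Empty fields are sent as empty strings here, so non-text columns that may be blank would need extra handling.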
The Python pandas library has a function, pandas.read_fwf, which works well for this.
The data can be read in using Python and then written to the Postgres database.
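A small sketch of reading fixed-width data with pandas.read_fwf in chunks, so a multi-gigabyte file is never loaded in one piece. The column widths and names are invented for the example, and read_fixed_width is a hypothetical wrapper, not part of pandas.

```python
import io

import pandas as pd

# Hypothetical layout: a 4-character id, a 10-character name, a 3-character age.
WIDTHS = [4, 10, 3]
NAMES = ["id", "name", "age"]

SAMPLE = "0001Alice     030\n0002Bob       041\n"


def read_fixed_width(source, chunksize=100_000):
    """Return an iterator of DataFrames, each holding up to chunksize rows."""
    return pd.read_fwf(
        source,
        widths=WIDTHS,
        names=NAMES,
        header=None,    # the file has no header row
        dtype=str,      # keep leading zeros such as "0001"
        chunksize=chunksize,
    )


for chunk in read_fixed_width(io.StringIO(SAMPLE), chunksize=1):
    print(chunk.to_dict("records"))
```

Each chunk can then be written to the database before the next one is read, for instance with COPY or DataFrame.to_sql, depending on what is available in your setup.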
I have recently started using PostgreSQL for creating and updating SQL databases. Being rather new to this, I ran into the issue of selecting the correct encoding when creating a new database. UTF-8 (the default) did not work for me, as the data to be included is in various languages (English, Chinese, Japanese, Russian, etc.) and also includes symbolic characters.
Question: What is the right database encoding to satisfy my needs?
Any help is highly appreciated.
There are four different encoding settings at play here:
The server side encoding for the database
The client_encoding that the PostgreSQL client announces to the PostgreSQL server. The PostgreSQL server assumes that text coming from the client is in client_encoding and converts it to the server encoding.
The operating system default encoding. This is the default client_encoding set by psql if you don't provide a different one. Other client drivers might have different defaults; e.g., PgJDBC always uses utf-8.
The encoding of any files or text being sent via the client driver. This is usually the OS default encoding, but it might be a different one - for example, your OS might be set to use utf-8 by default, but you might be trying to COPY some CSV content that was saved as latin-1.
You almost always want the server encoding set to utf-8. It's the rest that you need to change depending on what's appropriate for your situation. You would have to give more detail (exact error messages, file contents, etc) to be able to get help with the details.
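The interaction between the fourth setting and client_encoding can be imitated without a database. The sketch below is plain Python, not PostgreSQL, and decode_as is a hypothetical helper. It shows two things: UTF-8 can represent all of the languages mentioned, and bytes saved in one encoding but declared as another either fail to decode or come out wrong.

```python
SAMPLES = {
    "English": "hello",
    "Chinese": "你好",
    "Japanese": "こんにちは",
    "Russian": "привет",
    "Symbols": "€ ≠ ✓",
}


def utf8_round_trips(text):
    """True if the text survives encoding to UTF-8 and decoding back."""
    return text.encode("utf-8").decode("utf-8") == text


def decode_as(raw, declared_encoding):
    """Decode bytes the way a receiver would, given the declared encoding."""
    try:
        return raw.decode(declared_encoding)
    except UnicodeDecodeError as exc:
        return f"error: {exc.reason}"


print(all(utf8_round_trips(t) for t in SAMPLES.values()))

# A file saved as latin-1 but announced as UTF-8 cannot be decoded.
latin1_bytes = "café".encode("latin-1")
print(decode_as(latin1_bytes, "latin-1"))
print(decode_as(latin1_bytes, "utf-8"))
```

The failing case is analogous to PostgreSQL rejecting input with an "invalid byte sequence" error: the fix is to declare the encoding the data is really in, not to change the server encoding.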
I had an application that used a Sybase ASA 8 database. However, the application is not working anymore and the vendor went out of business.
Therefore, I've been trying to extract the data from the database, which contains Arabic characters. When I connect to the database and display the contents, the Arabic characters do not display correctly; instead, the text looks something like ÇáÏãÇã, which is incorrect.
I tried to export the data to a text file. Same result. Tried to save the text file with UTF-8 encoding, but to no avail.
I have no idea what collation the tables are set to. Is there a way to export the data correctly, or convert it to the correct encoding?
The problem was solved by exporting the data from the database using "Windows-1252" encoding and then importing it into other applications with "Windows-1256" encoding.
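The same repair can be reproduced on the sample string from the question. This is a sketch in Python, with fix_mojibake as a hypothetical helper name: encoding the garbled text as windows-1252 recovers the original bytes, and decoding those bytes as windows-1256 gives the Arabic text.

```python
def fix_mojibake(text, shown_as="cp1252", actual="cp1256"):
    """Undo a wrong decoding: recover the raw bytes, then decode them correctly."""
    return text.encode(shown_as).decode(actual)


garbled = "ÇáÏãÇã"
print(fix_mojibake(garbled))
```

For the sample above this yields الدمام. The trick only works when the wrong decoding was lossless, that is, when every original byte mapped to some character in windows-1252.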
When you connect to the database, use the CHARSET=UTF-8 connection parameter. That will tell the server to convert the data to UTF-8 before sending it to the client application. Then you can save the data from the client to a file.
This, of course, is assuming that the data was saved with the correct character set to begin with. If it wasn't, you may be out of luck.
We are writing a testing framework from scratch using Perl. Each test case writes a log file, and we are planning to archive the log files created by each test case for reporting purposes.
We are using a PostgreSQL database for storing the results. How do I archive the text log files in the PostgreSQL database? I googled and found that the bytea datatype can be used to store files in binary format. If I do so, how do I retrieve the content back as text?
Any ideas will be appreciated.
If your log files are text files, then you should use the TEXT datatype to store them. If the log files are binary (or, perhaps, compressed text files), then you'd want to use BYTEA. In either case, you can INSERT and SELECT them just like any other column type when using DBI. If they're really large then you might want to play with the LongReadLen DBI parameter and read the DBI manual section on BLOBs.
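For the BYTEA case, getting text back is just a matter of reversing whatever was done before storing. Below is a minimal sketch in Python rather than Perl, assuming the logs are compressed with zlib before being stored; pack_log and unpack_log are hypothetical names, and the database calls themselves are left out.

```python
import zlib


def pack_log(text):
    """Turn log text into compressed bytes suitable for a BYTEA column."""
    return zlib.compress(text.encode("utf-8"))


def unpack_log(value):
    """Turn a BYTEA value back into text."""
    # Some drivers return a buffer-like object rather than bytes
    # (psycopg2 returns a memoryview), so normalise it first.
    return zlib.decompress(bytes(value)).decode("utf-8")


log_text = "test_login: PASS\ntest_logout: FAIL (timeout)\n"
stored = pack_log(log_text)
print(unpack_log(memoryview(stored)) == log_text)
```

If the logs are stored uncompressed in a TEXT column, none of this is needed: the value comes back from a SELECT as an ordinary string.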