Import CSV file (contains some non-UTF8 characters) in MongoDB

How can I import a CSV file that contains some non-UTF8 characters to MongoDB?
I tried the recommended import command:
mongoimport --db dbname --collection colname --type csv --headerline --file D:/fastfood.xls
Error message:
exception: Invalid UTF8 character detected
I would remove the invalid characters manually, but the data set is considerably large.
Tried Google with no success.
PS: mongo -v = 2.4.6
Thanks.
Edit:
BTW, I'm on Win7

On Linux you could use the iconv command, as suggested in How to remove non UTF-8 characters from text file:
iconv -f utf8 -t utf8 -c file.txt
I'm not familiar with MongoDB, so I have no insight on how to preserve the invalid characters during import.
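Since iconv writes to standard output, in practice you would redirect the cleaned text into a new file and import that instead. A minimal sketch, assuming the spreadsheet has already been saved as CSV (the filenames here are made up):
iconv -f utf-8 -t utf-8 -c fastfood.csv > fastfood_clean.csv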

For emacs users:
Open the CSV file in Emacs and change the encoding using ‘C-x C-m f’, choosing utf-8 as the coding system. For more information see ChangingEncodings

You're trying to import an .xls file as a CSV file. Save the file as CSV first, then try again.
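For example, after exporting the sheet as CSV, the original command should work against the .csv file (the filename is an assumption, since the question only shows the .xls):
mongoimport --db dbname --collection colname --type csv --headerline --file D:/fastfood.csv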

Related

CLI option to give an encoding format to 'mongoimport'

Does the mongoimport CLI command support only UTF-8 files?
Is there a way to specify an encoding so that it can accept non-UTF-8 files, without manually converting each file to UTF-8?
This is one way of doing it on Linux/Unix. You could use iconv to convert non-UTF-8 to UTF-8 and then run mongoimport on the converted file:
iconv -f ISO-8859-1 -t utf-8 myfile.csv > myfileutf8.csv
man iconv should give you more details about the options.
Also, Import CSV file (contains some non-UTF8 characters) in MongoDB discusses some options for Windows.
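If you would rather not keep a converted copy on disk, mongoimport reads from standard input when --file is omitted, so the two steps can be chained. A sketch only, with placeholder database and collection names:
iconv -f ISO-8859-1 -t utf-8 myfile.csv | mongoimport --db dbname --collection colname --type csv --headerline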

Weird key-name with mongoimport

I have a tab-separated values (TSV) file that I need to import into MongoDB.
I run:
mongoimport -d mydb -c blsItem --type tsv --file .\BLS_3.01.txt --fieldFile .\fieldnames-bls.txt
fieldnames-bls.txt contains all the keys, nicely separated, in a UTF-8 file:
blsKey
germanDescription
englishDescription
The result of the import is that every blsKey starts with gibberish:
{ "_id" : ObjectId("4eee82136e6ffebe9085debd"), "´╗┐blsKey" : "B100000", "germanDescription" : "Vollkornbrote", "englishDescription" : ""
But even Vim shows fieldnames-bls.txt as nice and clean.
What is going on?
It looks like a UTF-8 BOM. Convert your file to UTF-8 without a BOM, and that's it.
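Since the import is apparently being run on Windows, one way to do that is a short PowerShell snippet using the .NET file APIs (ReadAllText consumes the BOM while decoding, and UTF8Encoding($false) writes without one). Treat it as a sketch; the filename is taken from the question:
# read the field file (the BOM is stripped during decoding)
$text = [System.IO.File]::ReadAllText(".\fieldnames-bls.txt")
# write it back as UTF-8 without a BOM
$utf8NoBom = New-Object System.Text.UTF8Encoding($false)
[System.IO.File]::WriteAllText(".\fieldnames-bls.txt", $text, $utf8NoBom)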

Convert pipe delimited csv to tab delimited using batch script

I am trying to write a batch script that will query a Postgres database and output the results to a CSV file. Currently, it queries the database and saves the output as a pipe-delimited CSV.
I want the output to be tab-delimited rather than pipe-delimited, since I will eventually be importing the CSV into Access. Does anyone know how this can be achieved?
Current code:
cd C:\Program Files\PostgreSQL\9.1\bin
psql -c "SELECT * from jivedw_day;" -U postgres -A -o sample.csv cscanalytics
postgres = username
cscanalytics = database
You should be using COPY to dump CSV:
psql -c "copy jivedw_day to stdout csv delimiter E'\t'" -o sample.csv -U postgres -d cscanalytics
The delimiter E'\t' part will get you output with tabs instead of commas as the delimiter. There are other options as well; please see the documentation for further details.
Using -A as you are just dumps the usual interactive output to sample.csv, without the normal padding that makes the columns line up; that's why you're seeing the pipes:
-A
--no-align
Switches to unaligned output mode. (The default output mode is otherwise aligned.)
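Put back into the original batch script, the whole thing would look roughly like this (a sketch only; the paths, user, and database names are taken from the question):
cd "C:\Program Files\PostgreSQL\9.1\bin"
psql -c "copy jivedw_day to stdout csv delimiter E'\t'" -o sample.csv -U postgres -d cscanalytics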

MysqlDump from Powershell and Windows encoding

I'm doing an export from the command line on Windows with mysqldump:
& mysqldump -u root -p --default-character-set=utf8 -W -B dbname > C:\mysql_backup.sql
My database/tables are encoded with UTF-8 and I specify the same encoding when I do the dump. But when I open the file with Notepad++ or SciTE, I see an encoding of UTF-16 (UCS-2). If I don't convert the file to UTF-8 with iconv before running the import, I get an error.
It seems that MS-DOS / cmd.exe redirects with UTF-16 by default. Can I change this?
A side note: I use PowerShell to call mysqldump.
UPDATE: it seems that this occurs only when calling mysqldump from PowerShell. I have changed the command line above to the one I use in my PS script.
By default PowerShell represents text as Unicode (UTF-16), and when you redirect output to a file it saves it as Unicode as well. You can change the file format by using the Out-File cmdlet instead of the > operator, e.g.:
... | Out-File C:\mysql_backup.sql -Encoding UTF8
You may also need to give PowerShell a hint on how to interpret the UTF8 text coming from the dump utility. This blog post shows how to handle this scenario in the event the utility isn't outputting a proper UTF8 BOM.
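Applied to the command from the question, the whole pipeline would look something like this (a sketch using the same names as above; note that in Windows PowerShell, -Encoding UTF8 still writes a BOM, which the importing tool may or may not tolerate):
& mysqldump -u root -p --default-character-set=utf8 -W -B dbname | Out-File C:\mysql_backup.sql -Encoding UTF8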

How to export table as CSV with headings on Postgresql?

I'm trying to export a PostgreSQL table with headings to a CSV file via the command line; however, I can only get it to export to a CSV file without the headings.
My code looks as follows:
COPY products_273 to '/tmp/products_199.csv' delimiters',';
COPY products_273 TO '/tmp/products_199.csv' WITH (FORMAT CSV, HEADER);
as described in the manual.
From the psql command line:
\COPY my_table TO 'filename' CSV HEADER
Note: no semicolon at the end.
Instead of just a table name, you can also write a query to export only selected column data:
COPY (select id,name from tablename) TO 'filepath/aa.csv' DELIMITER ',' CSV HEADER;
The COPY form above needs admin (superuser) privileges; if you don't have them, use \copy instead:
\COPY (select id,name from tablename) TO 'filepath/aa.csv' DELIMITER ',' CSV HEADER;
When I don't have permission to write a file out from Postgres, I find that I can run the query from the command line:
psql -U user -d db_name -c "Copy (Select * From foo_table LIMIT 10) To STDOUT With CSV HEADER DELIMITER ',';" > foo_data.csv
This works
psql dbname -F , --no-align -c "SELECT * FROM TABLE"
The simplest way (using psql) seems to be to use the --csv flag:
psql --csv -c "SELECT * FROM products_273" > '/tmp/products_199.csv'
For version 9.5, which I use, it would be like this:
COPY products_273 TO '/tmp/products_199.csv' WITH (FORMAT CSV, HEADER);
This solution worked for me using \copy.
psql -h <host> -U <user> -d <dbname> -c "\copy <table_name> FROM '<path to csvfile/file.csv>' with (format csv,header true, delimiter ',');"
Here's how I got it working in PowerShell, using psql to connect to a Heroku PG database:
I had to first change the client encoding to utf8 like this: \encoding UTF8
Then I dumped the data to a CSV file like this:
\copy (SELECT * FROM my_table) TO C://wamp64/www/spider/chebi2/dump.csv CSV DELIMITER '~'
I used ~ as the delimiter because I don't like CSV files; I usually use TSV files, but it wouldn't let me add '\t' as the delimiter, so I used ~ because it's a rarely used character.
The COPY command isn't what is restricted. What is restricted is directing the output of TO anywhere except STDOUT. However, there is no restriction on specifying the output file via the \o command.
\o '/tmp/products_199.csv'
COPY products_273 TO STDOUT WITH (FORMAT CSV, HEADER);
copy (<any query whose data you want to export>) to '<absolute file path with name>' delimiter ',' csv header;
Using this, you can export data as well.
I am posting this answer because none of the other answers given here actually worked for me. I could not use COPY from within Postgres, because I did not have the correct permissions. So I chose "Export grid rows" and saved the output as UTF-8.
The psql version given by Brian also did not work for me, for a different reason. The reason it did not work is that apparently the Windows command prompt (I was using Windows) was meddling with the encoding on its own. I kept getting this error:
ERROR: character with byte sequence 0x81 in encoding "WIN1252" has no equivalent in encoding "UTF8"
The solution I ended up using was to write a short JDBC script (Java) which read the CSV file and issued insert statements directly into my Postgres table. This worked, but the command prompt also would have worked had it not been altering the encoding.
Try this:
"COPY products_273 FROM '\tmp\products_199.csv' DELIMITER ',' CSV HEADER"
In pgAdmin, highlight your query statement just like when you use F5 to execute, but press F9 instead; this will open the file browser so you can pick where to save your CSV.
If you are using Azure Data Studio, the instructions are here: Azure Data Studio: Save As CSV.
I know this isn't a universal solution, but most of the time you just want to grab the file by hand.