MongoDB - import csv - mongodb

I am importing a CSV from the command prompt using the mongoimport command.
Some of the description fields in my CSV contain blank lines in their values, which breaks the import into MongoDB. (Please note: when I view the same CSV in Excel, it looks perfect.)
When I run the mongoimport command, the following message appears on the command prompt:
"CSV file ends while inside quoted field".
It did not solve my problem either.
Here is an example that illustrates my CSV:
Input CSV format
Column1,Column2,Column3,Column4
Values:
Val1,Val2,Val3
Val1,"abcdsc \n \n \n some text",Val3
Please advise how I can proceed further.

Try this option from the mongoimport manual:
--ignoreBlanks
In csv and tsv exports, ignore empty fields. If not specified, mongoimport creates fields without values in imported documents.
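The "CSV file ends while inside quoted field" message suggests that somewhere in the file a double quote is opened and never closed, so it can help to locate that spot before importing. Below is a minimal Python sketch, assuming RFC 4180-style quoting (`"` as the quote character, `""` as an escaped quote). `find_unclosed_quote` is a hypothetical helper written for illustration; it is not part of mongoimport, and mongoimport's own parser may differ in details.

```python
def find_unclosed_quote(lines):
    """Return the 1-based line number where an unclosed quoted field starts.

    Returns None when every opening double quote has a matching close.
    Assumes RFC 4180-style quoting: '"' quotes a field, '""' escapes a quote.
    """
    in_quotes = False
    opened_at = None
    for lineno, line in enumerate(lines, start=1):
        i = 0
        while i < len(line):
            if line[i] == '"':
                # Inside a quoted field, a doubled quote is an escaped quote.
                if in_quotes and line[i + 1:i + 2] == '"':
                    i += 2
                    continue
                in_quotes = not in_quotes
                if in_quotes:
                    opened_at = lineno
            i += 1
    return opened_at if in_quotes else None


if __name__ == "__main__":
    # In-memory sample; for a real file, pass an open file object instead.
    sample = ['Column1,Column2,Column3\n', 'Val1,"abcdsc\n', '\n', 'some text,Val3\n']
    print(find_unclosed_quote(sample))
```

If the function reports a line number, the quoted field that starts on that line is the first place to look.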

Related

How to import a CSV file with a field having leading zero values into MongoDB?

MongoDB seems to have trouble importing this data.
I tried to import a CSV file whose values have leading zeros and are enclosed in double quotes, but once imported, the leading zeros are removed.
Sample CSV File
duns_number,company_name,type,country,original_company_name,search_key,cik,last_scraped_date
"194946757","Carbonite, Inc.",Public,United States,,,"0001340127",
"116670284",SolarWinds Corp,Public,United States,SolarWinds Corporation,,"0001739942",
Sample Import Script
mongoimport --type csv -d ds_dnb -c ds_company_masterlist --headerline --drop ""
Is there any other way of forcing it to just import the data as all text?

mongoimport: set type for all fields when importing CSV

I have multiple problems with importing a CSV with mongoimport that has a headerline.
Following is the case:
I have a big CSV file with the names of the fields in the first line.
I know you can use this line as the field names with --headerline.
I want all field types to be strings, but mongoimport infers the types automatically from what the values look like.
IDs such as 0001 will be turned into 1, which can have bad side effects.
Unfortunately, there is (as far as I know) no way of setting them all to string with a single option, only by naming each field and setting its type with
--columnsHaveTypes --fields "name.string(), ... "
When I did that, the next problem appeared.
The headerline (with all field names) got imported as values in a separate document.
So basically, my questions are:
Is there a way of setting all field types to string when using the --headerline option?
Alternatively, is there a way to ignore the first line?
I had this problem when uploading a CSV file with 41 million records into MongoDB.
./mongoimport -d testdb -c testcollection --type csv --columnsHaveTypes -f "RECEIVEDDATE.date(2006-01-02 15:04:05)" --file location/test.csv
As shown above, the '-f' (or '--fields') option lets us upload a file with data types. But when we use it on a file that contains a header line, mongoimport uploads the first row as well, i.e. the header row, which either leads to a 'cannot convert to datatype' error or imports the column names as data.
Unfortunately, we cannot use the '--headerline' option in place of '--fields' here.
Here are the solutions that I found for this problem.
1) Remove the header row and upload using the '--fields' option as in the command above. If you are in a Linux environment, you can use the command below to remove the first row (the header line) of the huge file. It took 2-3 minutes for me (depending on machine performance).
sed -i -e "1d" location/test.csv
2) Upload the file using the '--headerline' option, so that mongoimport imports it with its automatically detected data types. Then open the mongo shell, run 'use testdb', and run a JavaScript command that reads each record and converts the field to the specific data type. For a huge file this will take time.
I found this solution on Stack Overflow:
// Convert the RECEIVEDDATE value of every document into a Date
db.testcollection.find().forEach(function (x) {
  x.RECEIVEDDATE = new Date(x.RECEIVEDDATE);
  db.testcollection.save(x);
});
If you want to skip rows whose values do not fit the declared data type, use the option below (see the MongoDB documentation):
'--parseGrace skipRow'
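Before starting a very large typed import, it can be useful to see which rows would fail the date conversion. Below is a minimal Python sketch, assuming the layout `2006-01-02 15:04:05` used above corresponds to the strptime format `%Y-%m-%d %H:%M:%S`. `bad_date_rows` is a hypothetical helper for illustration; Python's parser is only an approximation of what mongoimport accepts.

```python
import csv
import io
from datetime import datetime

# Assumed strptime equivalent of the layout "2006-01-02 15:04:05".
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def bad_date_rows(rows, column, fmt=DATE_FORMAT):
    """Yield (row_number, value) for each row whose `column` does not parse.

    `rows` is a file object or iterable of lines that starts with a header
    line. Row numbers assume the header is line 1 and that no field spans
    several lines.
    """
    reader = csv.DictReader(rows)
    for number, row in enumerate(reader, start=2):
        value = row.get(column) or ""
        try:
            datetime.strptime(value, fmt)
        except ValueError:
            yield number, value


if __name__ == "__main__":
    sample = io.StringIO(
        "RECEIVEDDATE,x\n2020-01-02 03:04:05,a\nnot a date,b\n"
    )
    for number, value in bad_date_rows(sample, "RECEIVEDDATE"):
        print(number, repr(value))
```

For a real file, pass `open("location/test.csv", newline="")` instead of the in-memory sample.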
I found a solution that I am comfortable with.
Basically, I wanted to use mongoimport within my Clojure code to import a CSV file into the DB and do a lot of stuff with it automatically. Due to the problems mentioned above, I had to find a workaround to delete this wrong document.
I did the following to "solve" this problem:
To set the types as I wanted, I wrote a function that reads the first line, puts it in a vector, and then uses string concatenation to set these as fields.
Turning this: id,name,age,hometown,street
into this: id.string(),name.string(),age.string() etc
Then I used the values from the vector to identify the document with
{ name : "name",
  age : "age",
  etc : "etc" }
and then deleted it with a simple remove command.
Hope this helps anyone dealing with the same kind of problem.
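The approach above was written in Clojure, and that code is not shown. The same idea can be sketched in Python as follows, assuming a plain comma-separated header. `parse_header`, `fields_argument` and `header_filter` are hypothetical names used only for illustration.

```python
import csv
import io


def parse_header(header_line):
    """Split a CSV header line into a list of column names."""
    return next(csv.reader(io.StringIO(header_line)))


def fields_argument(header_line, type_name="string"):
    """Build a --fields value such as 'id.string(),name.string()'."""
    return ",".join(f"{name}.{type_name}()" for name in parse_header(header_line))


def header_filter(header_line):
    """Build a filter that matches the header row imported as a document.

    Every field of that document holds its own column name as a string.
    """
    return {name: name for name in parse_header(header_line)}


if __name__ == "__main__":
    header = "id,name,age,hometown,street"
    print(fields_argument(header))
    print(header_filter(header))
```

The first result is meant to be passed to `--columnsHaveTypes --fields`, and the second describes the one document that has to be deleted after the import.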
https://docs.mongodb.com/manual/reference/program/mongoimport/#example-csv-import-types reads:
MongoDB 3.4 added support for specifying field types. Specify field names and types in the form <colName>.<type>(<arg>) using --fields, --fieldFile, or --headerline.
So the first line of your CSV file should contain the names together with their types, e.g.:
name.string(), ...
and the mongoimport parameters
--columnsHaveTypes --headerline --file <filename.csv>
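Editing the header by hand is tedious when there are many columns. Below is a minimal Python sketch that copies a CSV stream and appends `.string()` to every header name, assuming the header contains no quoted commas. `add_types_to_header` is a hypothetical helper for illustration.

```python
import io


def add_types_to_header(src, dst, type_name="string"):
    """Copy CSV text from src to dst, adding a type suffix to each header name.

    Only the first line is changed; all data lines are copied untouched.
    Assumes the header is a plain comma-separated line without quoting.
    """
    header = src.readline().rstrip("\r\n")
    typed = ",".join(f"{col}.{type_name}()" for col in header.split(","))
    dst.write(typed + "\n")
    for line in src:
        dst.write(line)


if __name__ == "__main__":
    source = io.StringIO('duns_number,cik\n"194946757","0001340127"\n')
    target = io.StringIO()
    add_types_to_header(source, target)
    print(target.getvalue(), end="")
```

With real files, open the input and output with `open(..., newline="")` and import the rewritten file with the parameters shown above.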
As to the question of how to remove the first line, you can use pipes. mongoimport reads from STDIN if no --file option is passed. E.g.:
tail -n+2 <filename.csv> | mongoimport --columnsHaveTypes --fields "name.string(), ... "

PostgreSQL Include special characters in copy/import

This is a very quick question on using the postgresql copy feature to import a csv file.
If I have a row with data such as
random, 1689, rnd\\168
how do I include the special characters \ so that it appears in the db as
random
1689
rnd\\168
What about simply using the copy command?
copy my_table from 'full_csv_file_path' with CSV delimiter ',';
where the CSV file contains:
random,1689,rnd\\168

Semi-Colon Delimiter in Mongoimport

I have been trying to import several CSV files into MongoDB using the mongoimport tool. The thing is that, despite what the name says, in several countries CSV files are saved with semicolons instead of commas, which makes me unable to use the mongoimport tool properly.
There are some workarounds for this by changing the delimiter option in the region settings; however, for several reasons I don't have access to the machine that generates these CSV files, so I can't do that.
I was wondering, is there any way to import these CSV files using the mongo tools without having to write something that replaces all the semicolons in a file with commas? I find it pretty strange that mongo overlooks that semicolons are used in some countries.
mongoimport supports TSV, so you can replace ";" with "\t":
tr ";" "\t" < file.csv | mongoimport --type tsv ...
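One caveat with the `tr` one-liner: it also rewrites semicolons that sit inside quoted fields, and it does nothing about tabs already present in the data. Below is a minimal Python sketch that re-delimits with the standard csv module instead and writes ordinary comma-separated output for `--type csv`. `semicolon_to_comma_csv` is a hypothetical helper, and it assumes the input uses standard double-quote quoting.

```python
import csv
import io


def semicolon_to_comma_csv(src, dst):
    """Rewrite semicolon-delimited CSV from src as comma-delimited CSV on dst.

    Quoted fields are parsed properly, so a semicolon inside quotes stays
    part of the value, and values containing commas are quoted on output.
    """
    reader = csv.reader(src, delimiter=";")
    writer = csv.writer(dst, delimiter=",", lineterminator="\n")
    for row in reader:
        writer.writerow(row)


if __name__ == "__main__":
    source = io.StringIO('a;b;c\n1;"x;y";"p,q"\n', newline="")
    target = io.StringIO()
    semicolon_to_comma_csv(source, target)
    print(target.getvalue(), end="")
```

Run as a script over `sys.stdin` and `sys.stdout`, the output could be piped into `mongoimport --type csv --headerline ...` in the same way as the `tr` command.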
It looks like this is not supported. I can't find an option to specify a delimiter among the allowed arguments for 'mongoimport' on the documentation page http://docs.mongodb.org/manual/reference/program/mongoimport/#bin.mongoimport .
You can file a feature request on Jira if it's something you'd like to see supported.

Postgresql: CSV export with escaped linebreaks

I exported some data from a PostgreSQL database using (all) the instruction(s) posted here: Save PL/pgSQL output from PostgreSQL to a CSV file
But some exported fields contain newlines (line breaks), so I got a CSV file like:
header1;header2;header3
foobar;some value;other value
just another value;f*** value;value with
newline
nextvalue;nextvalue2;nextvalue3
How can I escape (or ignore) these newline character(s)?
Line breaks are supported in CSV if the fields that contain them are enclosed in double quotes.
So if you had this in the middle of the file:
just another value;f*** value;"value with
newline"
it will be taken as one record of three fields spread over two lines, and it will just work.
On the other hand, without the double quotes, it's an invalid CSV file (when it advertises 3 fields).
Although there's no formal specification for the CSV format, you may look at RFC 4180 for the rules that generally apply.
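The difference is easy to check with any CSV parser that follows these rules. Below is a minimal sketch using Python's standard csv module on in-memory data modelled on the example above; the sample strings and variable names are made up for illustration.

```python
import csv
import io

# Same data twice: once with the multi-line value quoted, once without.
QUOTED = (
    "header1;header2;header3\n"
    "foobar;some value;other value\n"
    'just another value;third value;"value with\nnewline"\n'
    "nextvalue;nextvalue2;nextvalue3\n"
)
UNQUOTED = QUOTED.replace('"', "")


def parse(text):
    """Parse semicolon-delimited CSV text into a list of rows."""
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=";"))


rows_quoted = parse(QUOTED)
rows_unquoted = parse(UNQUOTED)

# Quoted: the line break stays inside the third field of one record.
print(len(rows_quoted), rows_quoted[2])
# Unquoted: the same data falls apart into an extra, one-field record.
print(len(rows_unquoted), rows_unquoted[3])
```

With the quotes in place the parser sees four records of three fields each; without them it sees five records, one of which has only a single field.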