mongoimport with CSV: --headerline does not convert first row to fields - mongodb

When I try to import a .csv file into a (non-existent) MongoDB collection, the first line is not converted into field names.
Instead, I get a single field whose name is the entire header line, and each row's data is stored in that field as one value.
Example CSV:
product;type
Apple;Fruit
Pizza;Italian
Coffee;Drink
The command I use:
mongoimport -d db -c collection --type csv --headerline --file ./import.csv
The result I get for 1 row:
{
    "_id": ObjectId("56a89c5f3ea2a256f0da7acf"),
    "product;type": "Coffee;Drink"
}
Does anyone know what's wrong here?

CSV stands for comma-separated values: https://docs.mongodb.org/manual/reference/glossary/#term-csv
Not semicolon-separated ones. Preprocess your import.csv with something like
sed -i.bak "s/;/,/g" import.csv
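After the substitution, the original import command splits the header and rows correctly; for the last row, the resulting document should look roughly like this (a sketch; the ObjectId will of course differ):
{
    "_id": ObjectId("..."),
    "product": "Coffee",
    "type": "Drink"
}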

Related

Import from .tsv with provided index

I would like to import data into a MongoDB collection from a .tsv file, using the record _id as defined in my file.
How would I go about using the _id specified in my .tsv, telling MongoDB to use the provided _id rather than generating its own?
Example data set:
student firstName lastName
ab867499 example student
I want MongoDB to use the student column as _id rather than generating its own ObjectId as the key.
Here is what you can do:
mongoimport --db <your_db_name> --collection <your_collection_name> --type tsv --file <path_to_file> --fields _id,firstName,lastName
In this case you will want to make sure that the first line of your file does not contain the header row, or simply drop the document imported from the header row after the import (see the sketch below).
Also, make sure the file ends with a line break after the last line of data, since mongoimport will otherwise skip that last record.
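A minimal sketch for the sample above, assuming the data is saved as students.tsv and using placeholder database/collection names: strip the header with tail, then import with the explicit field list:
tail -n +2 students.tsv > students_noheader.tsv
mongoimport --db your_db_name --collection students --type tsv --file students_noheader.tsv --fields _id,firstName,lastName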

Comma in a field of a CSV file is causing mongoimport to read one field as multiple values

I am using this command to insert all the values from file_name.csv into a MongoDB database.
cat file_name.csv | awk -F',' 'BEGIN{OFS=","} {print $1,$2,$3,$8,$9,$10}' | mongoimport --type csv --db test --collection test2 --headerline
The problem is that one of the fields in file_name.csv contains a variable number of commas in its value. So when I try to fetch the required columns using '$1,$2,$3,$8,$9,$10', the field with embedded commas is split on those commas and read as multiple values.
Is there a way to read the complete value as one field rather than multiple values?
I would prefer an answer that uses mongoimport, thanks!
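One CSV-aware alternative (not part of the original thread): assuming csvkit is installed, its csvcut tool parses quoted fields properly, so embedded commas survive the column extraction:
csvcut -c 1,2,3,8,9,10 file_name.csv | mongoimport --type csv --db test --collection test2 --headerline
Note this only helps if the comma-containing field is quoted in the CSV; if it is not, the file is ambiguous and no parser can split it reliably.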

How do I use mongoexport to export all records in a collection to a CSV file

I am trying to export data to a CSV file but for some reason I am not getting any data in the CSV file.
I have a DB called "test" and a collection called "people". The contents of the people collection are (JSON export works!):
{"_id":{"$oid":"55937ce0c64ddad5023a9570"},"name":"Joe Bloggs","position":"CE"}
{"_id":{"$oid":"55937d57c64ddad5023a9571"},"name":"Jane Bloggs","position":"CE"}
{"_id":{"$oid":"55937d62c64ddad5023a9572"},"name":"Peter Smith","position":"CE"}
{"_id":{"$oid":"55937d78c64ddad5023a9573"},"name":"Sarah Smith","position":"STL"}
I am trying to export this data into a CSV file with the following command:
mongoexport --type=csv -d test -c people --fieldFile c:\dev\peopleFields.txt --out c:\dev\people.csv
When I run this command, the response is:
2015-07-01T14:56:36.787+0800 connected to: localhost
2015-07-01T14:56:36.787+0800 exported 4 records
The contents of peopleFields.txt is:
ID
Name
Position
And the resulting output to the people.csv file is:
ID,Name,Position
"","",""
"","",""
"","",""
"","",""
Could someone please explain to me what I am doing wrong?
What you are missing here is that the --fieldFile option is not a "mapping" but just a "list" of all the fields you want to export from the collection.
So to actually "match" the fields present in your collection, the content should be:
_id
name
position
Since the names you supplied do not match any fields, you get four lines (one per document) of blank output, with one empty value per field you specified.
The mongoexport utility itself will not "map" fields to alternate names. If you want names different from how they are stored in your collection, you will have to alter the output yourself.
The same goes for ObjectId values: each one is written out as its literal ObjectId(...) string.
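With the field file corrected to the names above, the same command should yield output along these lines (the exact ObjectId rendering can vary between mongoexport versions):
_id,name,position
ObjectId(55937ce0c64ddad5023a9570),Joe Bloggs,CE
ObjectId(55937d57c64ddad5023a9571),Jane Bloggs,CE
ObjectId(55937d62c64ddad5023a9572),Peter Smith,CE
ObjectId(55937d78c64ddad5023a9573),Sarah Smith,STL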
You can use the following command to export data to a CSV file:
mongoexport --db dbName --collection collectionName --type=csv --fields name,position --out fileName.csv
As per the documentation:
1) --fieldFile allows you to specify the fields to include in the export.
2) The file must have only one field per line, and the line(s) must end with the LF character (0x0A).
You are using different names (ID, Name, Position) in the text file than those in the collection (_id, name, position), so the exported fields are empty.

How to import CSV with a natural primary key into mongodb?

I have a large CSV file (100M) which I wish to import into MongoDB.
So I have set out to explore my options with a small sample CSV. The mongoimport command works fine:
mongoimport.exe -d mydb -c mycoll --type csv --file .\aaa.csv --headerline --stopOnError
but it creates the _id keys of type ObjectId. Now each record in the CSV contains a natural primary key, which I want to become the _id in mongo.
How do I do it for the import?
EDIT
The top two lines are:
id,aaa,bbb,ccc,ddd,eee,fff,ggg,hhh,iii,jjj,kkk,lll,mmm,nnn,ooo,ppp,qqq,rrr,sss,ttt,uuu,vvv,www,xxx,yyy,zzz,q11,q22,q33,q44,q55,q66,q77,q88
72184515,4522534,"xo xo","2011-08-01 00:00:00","here",4848,4185,100,"xa xa","oops","yep",39.0797,-94.4067,"aha","qw","er","ty","opo",39.1029,-94.3826,2.06146,2,"q",1,"w","e","r","t","y","a","s","d","r","12787",""
The id column should become the _id.
In the header line of your .csv file, simply change "id" to "_id".
When you use mongoimport, you may find that it is a little limiting because it only creates data types of strings or numbers. The official recommendation for importing data from CSV files is to write your own script that will create documents containing the correct format and data types to fit your application.
However, if your .csv file contains only strings and numbers, then changing the header line should suffice.
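If you would rather not edit the file by hand, a one-line sketch (assuming GNU sed is available) renames the column in place and re-runs the import:
sed -i.bak "1s/^id,/_id,/" aaa.csv
mongoimport.exe -d mydb -c mycoll --type csv --file .\aaa.csv --headerline --stopOnError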

Mongo: export all fields data from collection without specifying fields?

I have over 100 fields and I am looking for a way to export the entire collection in CSV format without specifying all the fields.
The command line asks me to provide the fields via:
-f [ --fields ] arg  comma separated list of field names e.g. -f name,age
Is there a way to get the entire collection, like using dump, but not in BSON format? I need CSV data.
Thank you
In bash you can create this "export-all-collections-to-csv.sh" and pass the database name as the only argument (feel free to reduce this to a single collection):
#!/bin/bash
# Export every collection in the given database to its own CSV file.
OIFS=$IFS;
IFS=",";
dbname=$1  # put "database name" here if you don't want to pass it as an argument
# Get a comma-separated list of collection names from the mongo shell.
collections=`mongo $dbname --eval "rs.slaveOk();db.getCollectionNames();" --quiet`;
collectionArray=($collections);
for ((i=0; i<${#collectionArray[@]}; ++i));
do
    # Collect the field names from one sample document of the collection.
    keys=`mongo $dbname --eval "rs.slaveOk();var keys = []; for(var key in db.${collectionArray[$i]}.findOne()) { keys.push(key); }; keys;" --quiet`;
    mongoexport --db $dbname --collection ${collectionArray[$i]} --fields "$keys" --csv --out $dbname.${collectionArray[$i]}.csv;
done
IFS=$OIFS;
Alternatively, you could create a file with the field names (may be easier for you):
--fieldFile arg  file with field names - 1 per line
In your case they might all be the same, but the reason you have to specify field names is that they can differ from document to document, whereas the field names in a CSV must be fixed.
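For a single collection you can generate that field file from a sample document and feed it straight to mongoexport; a sketch, with mydb and mycoll as placeholder names:
mongo mydb --eval "var keys = []; for (var k in db.mycoll.findOne()) keys.push(k); print(keys.join('\n'));" --quiet > fields.txt
mongoexport --db mydb --collection mycoll --fieldFile fields.txt --csv --out mydb.mycoll.csv
As with the script above, this only picks up the fields present in the first document it samples.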