I would like to import data into a MongoDB collection from a .tsv file, using the record _id as defined in my file.
How would I go about using the _id specified in my .tsv, telling MongoDB to use the provided _id rather than generating its own?
Example data set:
student firstName lastName
ab867499 example student
I want MongoDB to use the student column as _id rather than generate its own ObjectId as the key.
Here is what you can do:
mongoimport --db <your_db_name> --collection <your_collection_name> --type tsv --file <path_to_file> --fields _id,firstName,lastName
In this case you will want to make sure that the first line of your file is not the header row, or simply drop the document imported from the header row after the import.
Also, make sure your file ends with a line break after the last line of data, since mongoimport will otherwise skip the last record.
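Both points can be handled in a couple of lines before running the import. A minimal sketch (the file names and sample data are hypothetical, and the mongoimport step itself needs a running mongod, so it is shown as a comment):

```shell
# Hypothetical sample file matching the question's layout
# (written deliberately without a trailing newline).
printf 'student\tfirstName\tlastName\nab867499\texample\tstudent' > students.tsv

# Drop the header row, then guarantee the file ends with a line break
# so mongoimport does not skip the last record.
tail -n +2 students.tsv > students_noheader.tsv
printf '\n' >> students_noheader.tsv

# Import with the first column mapped to _id (requires a running mongod):
# mongoimport --db school --collection students --type tsv \
#     --file students_noheader.tsv --fields _id,firstName,lastName
```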
I want to clone a collection to a new collection, remove all the documents, and then import new documents from a csv file. When I do the copy using copyTo everything works fine. The datatypes are copied over from the source collection to the new collection. However, after I remove all the documents from the new collection and import from the csv, the datatypes are lost. The datatypes from my source csv are already setup to match what is in the source collection I copied from.
Is there a way to preserve the datatypes after removing all documents from a collection?
How can I copy the datatypes from my csv when importing? For example my date columns show as string.
A new collection doesn't have a fixed schema, so the documents added to it don't have to be similar unless you created the collection with the validator option. You can also add validation to an existing collection. See Document Validation in the MongoDB manual.
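For the date-as-string problem specifically, one option (assuming mongoimport 3.4 or newer) is the --columnsHaveTypes flag, which coerces CSV columns into specific BSON types at import time. A sketch with hypothetical database, collection, and file names:

```shell
# Hypothetical CSV without a header row; the second column is a date stored as text.
printf 'alice,2015-03-01\nbob,2016-07-15\n' > people.csv

# --columnsHaveTypes lets --fields carry a BSON type per column; the date()
# argument is a Go-style reference layout describing the input format
# (requires a running mongod):
# mongoimport --db mydb --collection people --type csv --file people.csv \
#     --columnsHaveTypes --fields 'name.string(),joined.date(2006-01-02)'
```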
As example:
I have a TSV file with 100 records of the form {id:"", name:"", age:""}.
I import it into a new collection in my database:
mongoimport -d myDB -c people --type tsv C:\Users\User1\Downloads\PgWxXsCHH5rtmpOt4BXqZA.tsv --headerline
I decided that each record should have a custom field, e.g. rank, so I add the field to each record:
db.people.update({},{$set:{rank:0}},false,true)
I then get a new TSV file with updated data, for example the same ids but new ages.
The question is: how can I update the same collection with the new data while preserving the custom field and its value? Also, if the TSV has new records that are not present in the collection, they should be added with the same custom field as the old records, but with an empty or 0 value.
Note that db.update is not a valid MongoDB shell command; updates go through a collection. The command to update a collection while setting a value for a new field is:
db.people.update({}, { $set: { rank: 0 } }, false, true)
You do not need to define the custom field for the new records unless you require it.
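One way to apply the second TSV without losing rank (assuming mongoimport 3.4 or newer) is merge mode, which updates matching documents field by field instead of replacing them, and inserts documents for ids it has not seen. A sketch with hypothetical data:

```shell
# Hypothetical updated TSV: same ids, new ages, one brand-new record, no header.
printf '1\tAlice\t31\n2\tBob\t29\n3\tCarol\t25\n' > people_v2.tsv

# --mode merge matches on _id (see --upsertFields) and only overwrites the
# fields present in the file, so the existing rank values survive
# (requires a running mongod):
# mongoimport -d myDB -c people --type tsv --file people_v2.tsv \
#     --fields _id,name,age --mode merge
```

Newly inserted records will not have a rank field; you can backfill them afterwards with db.people.update({rank: {$exists: false}}, {$set: {rank: 0}}, false, true).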
I have a csv file that I've imported into a Meteor project, and I've since updated the file (added a couple of columns with data) and would like to re-import it. If I just import it again, will it overwrite the first one? Or would I have two collections with the same name? What's the best way to do this?
If you re-import the file, mongoimport will insert new documents rather than update the existing ones.
So if your collection has a unique index on a field (by default _id is indexed and unique) and that field is a column in the csv file, then when you import again MongoDB will throw an error saying you have violated a unique constraint and stop; your old data is untouched.
If your collection doesn't have any other unique index and _id is not a column in the csv file, then after re-importing your collection will contain duplicate records: the old data plus the new data you just imported.
Either way, the result is not what you wanted.
You can't have 2 collections with the same name in the same database.
The easiest way: if your data is not important, just drop the collection and import again.
Otherwise you will have to update the documents in MongoDB (using the mongo console or by writing a script).
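If dropping is acceptable, mongoimport can do the drop and the re-import in one step via its --drop flag. A sketch (the database, collection, and file names are hypothetical):

```shell
# Hypothetical re-exported CSV including the two new columns.
printf '_id,name,score,grade\n1,alice,90,A\n' > updated.csv

# --drop removes the target collection before importing, so the updated CSV
# fully replaces the old documents instead of colliding with them
# (requires a running mongod):
# mongoimport -d meteor -c scores --type csv --file updated.csv --headerline --drop
```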
I have a large CSV file (100M), which I wish to import into mongodb.
So I set out to explore my options with a small sample CSV. The mongoimport command works fine:
mongoimport.exe -d mydb -c mycoll --type csv --file .\aaa.csv --headerline --stopOnError
but it creates the _id keys of type ObjectId. Now each record in the CSV contains a natural primary key, which I want to become the _id in mongo.
How do I do it for the import?
EDIT
The top two lines are:
id,aaa,bbb,ccc,ddd,eee,fff,ggg,hhh,iii,jjj,kkk,lll,mmm,nnn,ooo,ppp,qqq,rrr,sss,ttt,uuu,vvv,www,xxx,yyy,zzz,q11,q22,q33,q44,q55,q66,q77,q88
72184515,4522534,"xo xo","2011-08-01 00:00:00","here",4848,4185,100,"xa xa","oops","yep",39.0797,-94.4067,"aha","qw","er","ty","opo",39.1029,-94.3826,2.06146,2,"q",1,"w","e","r","t","y","a","s","d","r","12787",""
The id column should become the _id.
In the header line of your .csv file, simply change "id" to "_id".
When you use mongoimport, you may find that it is a little limiting because it only creates data types of strings or numbers. The official recommendation for importing data from CSV files is to write your own script that will create documents containing the correct format and data types to fit your application.
However, if your .csv file contains only strings and numbers, then changing the header line should suffice.
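The header edit can also be scripted rather than done by hand. A sketch using a shortened, hypothetical version of the file:

```shell
# Hypothetical two-line CSV with the same shape as the question's file.
printf 'id,aaa,bbb,ccc\n72184515,4522534,"xo xo","2011-08-01 00:00:00"\n' > aaa.csv

# Rewrite only the first field of the header line so mongoimport will use
# the natural key as _id; the data rows are untouched.
sed '1s/^id,/_id,/' aaa.csv > aaa_fixed.csv

head -n 1 aaa_fixed.csv
```

After this, the original import command run against aaa_fixed.csv with --headerline will take the first column as _id.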