Import from .tsv with provided index - mongodb

I would like to import data into a MongoDB collection from a .tsv file, using the record _id as defined in my file.
How do I tell MongoDB to use the _id specified in my .tsv rather than generating its own?
Example data set:
student firstName lastName
ab867499 example student
I want MongoDB to use the student column as the _id rather than generating its own ObjectId as the key.

Here is what you can do:
mongoimport --db <your_db_name> --collection <your_collection_name> --type tsv --file <path_to_file> --fields _id,firstName,lastName
In this case, make sure the first line of your file is not a header row; otherwise the header is imported as a document, which you would then have to delete after the import.
Also, make sure there is a line break at the end of the last line of data in your file, since mongoimport will otherwise skip this last record.
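If you would rather prepare the file with a script than edit it by hand, here is a minimal Python sketch of the two clean-up steps above (drop the header row, make sure the data ends with a line break). The function name prepare_tsv and the sample string are just illustrations, not part of mongoimport:

```python
def prepare_tsv(text):
    """Drop the header row and make sure the data ends with a newline."""
    lines = text.splitlines()
    body = lines[1:]  # skip the header row (student, firstName, lastName)
    if not body:
        return ""
    # Re-join the data rows and always terminate the last one.
    return "\n".join(body) + "\n"


# Sample taken from the question; note there is no trailing newline here.
sample = "student\tfirstName\tlastName\nab867499\texample\tstudent"
cleaned = prepare_tsv(sample)
print(repr(cleaned))
```

The cleaned text can then be written to a new file and passed to the mongoimport command above with --fields _id,firstName,lastName.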

Related

mongoimport .csv into existing collection and database

I have a database with a collection that already has documents in it. Now I'm trying to insert another .csv into the same database and collection. For example, one document in the db.collection has:
Name: Bob
Age: 25
and an entry from the CSV I'm trying to upload looks like this:
Name: Bob
Age: 27
How can I import the new csv without replacing any documents, just adding to the database so that both entries will be in the database.collection?
Assuming you're using mongoimport, take a look at the --mode option.
With --mode merge, mongoimport enables you to merge fields from a new
record with an existing document in the database. Documents that do
not match an existing document in the database are inserted as usual.

How to mongoexport with one field

I have a few fields in my collection in MongoDB.
I have tried exporting everything,
which looks like this:
{"_id":{"$oid":"5a5ef05dbe83813f55141a51"},"comments_data":{"id":"211","comments":{"paging":{"cursors":{"after":"WzZANVFV4TlRVME5qUXpPUT09","before":"WTI5dEF4TlRVNE1USTVNemczTXpZAMk56YzZANVFV4TlRBMU9ERTFNQT09"}},"data":[{"created_time":"2018-01-04T09:29:09+0000","message":"Super","from":{"name":"M Mun","id":"1112"},"id":"1111"},{"created_time":"2018-01-07T22:25:08+0000","message":"Happy bday..Godbless you...","from":{"name":"L1","id":"111"},"id":"1111"},{"created_time":"2018-01-10T00:22:00+0000","message":"Nelson ","from":{"name":"Boon C","id":"1111"},"id":"10111"},{"created_time":"2018-01-10T01:07:19+0000","message":"Thank to SingTel I like to","from":{"name":"Sarkar WI","id":"411653482605703"},"id":"10155812413346677_10155825869201677"}]}},"post_id":"28011986676_10155812413346677","post_message":"\"Usher in the New Year with deals and rewards that will surely perk you up, exclusively for Singtel customers. Find out more at singtel.com/rewards\"",
But now I want to export just a single field, the 'message' inside 'comments_data', from the collection.
I tried using this: mongoexport --db sDB --collection sTest --fields data.comments_data --out test88.json
but when I check my exported file, it contains only something like this:
{"_id":{"$oid":"5a5ef05dbe83813f55141a51"}}
which is not what I expected.
I just want something like "message":"Happy bday..Godbless you..."
When I query in the mongo shell with db.sTest.find({}, {comments_data:1, _id:0}), I get roughly what I want.
If this ...
db.sTest.find({}, {'comments_data.message':1, _id:0})
... selects the data you are interested in then the equivalent mongoexport command is:
mongoexport --db sDB --collection sTest --fields 'comments_data.message' --type csv --out test88.csv
Note: this uses --type csv because, according to the docs, with the JSON output format mongoexport exports all fields in the selected sub-document ...
For csv output formats, mongoexport includes only the specified field(s), and the specified field(s) can be a field within a sub-document.
For JSON output formats, mongoexport includes only the specified field(s) and the _id field, and if the specified field(s) is a field within a sub-document, the mongoexport includes the sub-document with all its fields, not just the specified field within the document.
If you must have JSON format and limit your output to a single field then I think you'll need to write the reduced documents to a separate collection and export that collection, as per this answer.
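Another option, if JSON is a must, is to export the full documents and reduce them afterwards with a small script. The sketch below is only an illustration: it assumes the default mongoexport JSON layout of one document per line, and it reads the nested path comments_data.comments.data[].message from the sample in the question. The function name extract_messages is made up for this example.

```python
import json


def extract_messages(export_line):
    """Return every comment message found in one exported JSON document."""
    doc = json.loads(export_line)
    # Walk the nested structure seen in the question's sample document.
    comments = doc.get("comments_data", {}).get("comments", {}).get("data", [])
    return [c["message"] for c in comments if "message" in c]


# A cut-down version of the exported document from the question.
line = json.dumps({
    "_id": {"$oid": "5a5ef05dbe83813f55141a51"},
    "comments_data": {
        "id": "211",
        "comments": {
            "data": [
                {"message": "Super", "id": "1111"},
                {"message": "Happy bday..Godbless you...", "id": "1111"},
            ]
        },
    },
})
messages = extract_messages(line)
print(messages)
```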

How to update collection with custom fields in mongoDB

For example:
I have a TSV file with 100 records of the form {id:"", name:"", age:""}.
I import it into a new collection in the database:
mongoimport -d myDB -c people --type tsv C:\Users\User1\Downloads\PgWxXsCHH5rtmpOt4BXqZA.tsv --headerline
I decided that each record should have a custom field, e.g. rank, so I add the field to each record:
db.people.update({},{$set:{rank:0}},false,true)
I then get a new TSV file with updated data, for example the same ids but new ages.
The question is: how can I update the same collection with the new data while preserving the custom field and its value? Also, if the TSV has new records that are not present in the collection, they should be added with the same custom fields as the old records, but with an empty or "0" value.
The command db.update is not a valid MongoDB command; update has to be called on a collection. The command to update a collection while setting a value for a new field is:
db.comments.update({}, { $set: { rank: 0 } }, false, true)
You do not need to define the custom fields for the new records unless you require them.
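If you do want new records to get the custom field with a default while existing records keep their current value, one possible approach (a sketch, not the only way) is an upsert per TSV row that combines $set for the imported fields with $setOnInsert for the defaults. The helper below only builds the filter and update documents as plain dicts; build_upsert and the sample row are hypothetical, and it assumes the id column from the TSV is what identifies a record.

```python
def build_upsert(record, defaults):
    """Return a (filter, update) pair for an upsert keyed on the record's id."""
    # Fields from the TSV are always overwritten with the new values.
    fields = {k: v for k, v in record.items() if k != "id"}
    filter_doc = {"id": record["id"]}
    # $setOnInsert only applies when the upsert creates a new document,
    # so an existing rank value is left untouched.
    update_doc = {"$set": fields, "$setOnInsert": dict(defaults)}
    return filter_doc, update_doc


# Hypothetical row from the updated TSV file.
flt, upd = build_upsert({"id": "1", "name": "Ann", "age": "31"}, {"rank": 0})
print(flt, upd)
```

With pymongo, each pair could then be passed to collection.update_one(flt, upd, upsert=True).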

How do I use mongoexport to export all records in a collection to a CSV file

I am trying to export data to a CSV file but for some reason I am not getting any data in the CSV file.
I have a DB called "test" and a collection called "people". The contents of the people collection are (the JSON export works!):
{"_id":{"$oid":"55937ce0c64ddad5023a9570"},"name":"Joe Bloggs","position":"CE"}
{"_id":{"$oid":"55937d57c64ddad5023a9571"},"name":"Jane Bloggs","position":"CE"}
{"_id":{"$oid":"55937d62c64ddad5023a9572"},"name":"Peter Smith","position":"CE"}
{"_id":{"$oid":"55937d78c64ddad5023a9573"},"name":"Sarah Smith","position":"STL"}
I am trying to export this data into a CSV file with the following command:
mongoexport --type=csv -d test -c people --fieldFile c:\dev\peopleFields.txt --out c:\dev\people.csv
When I run this command, the response is:
2015-07-01T14:56:36.787+0800 connected to: localhost
2015-07-01T14:56:36.787+0800 exported 4 records
The contents of peopleFields.txt is:
ID
Name
Position
And the resulting output to the people.csv file is:
ID,Name,Position
"","",""
"","",""
"","",""
"","",""
Could someone please explain to me what I am doing wrong?
What you are missing here is that the --fieldFile option is not a "mapping" but just a "list" of all the fields you want to export from the collection.
So to actually "match" fields present in your collection the content should be:
_id
name
position
Since the names you have do not match any fields, you get four lines (one per document) of blank output, with one empty value for each field you specified.
The mongoexport utility itself will not "map" to alternate names. If you want names that differ from how they are stored in your collection, you will have to alter the output yourself.
The same goes for the values: any ObjectId value will be output as that literal string.
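As an illustration of "altering the output yourself", here is a small Python sketch that renames the header row of an exported CSV after the fact. The function name rename_header, the mapping, and the sample row are assumptions made for this example; the data rows are passed through unchanged.

```python
import csv
import io


def rename_header(csv_text, mapping):
    """Rewrite the first row of a CSV using mapping; leave data rows as-is."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return csv_text
    # Replace each stored field name with its display name, if one is given.
    rows[0] = [mapping.get(name, name) for name in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


# Hypothetical export produced with the correct field names.
exported = "_id,name,position\n55937ce0c64ddad5023a9570,Joe Bloggs,CE\n"
renamed = rename_header(
    exported, {"_id": "ID", "name": "Name", "position": "Position"}
)
print(renamed)
```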
You can use the following command to export the data to a CSV file:
mongoexport --db dbName --collection collectionName --type=csv --fields name,position --out fileName.csv
As per the documentation:
1) The fieldFile allows you to specify the fields to include in the export.
2) The file must have only one field per line, and the line(s) must end with the LF character (0x0A).
The names in your text file (ID, Name, Position) differ from those in the collection (_id, name, position), so you are getting empty fields exported.
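Since the question is on Windows, where editors often save CRLF line endings, it may be easier to generate the field file with a script so every line ends in LF. This is just a sketch; write_field_file is a made-up helper and the temporary path is only there so the example can run anywhere.

```python
import os
import tempfile


def write_field_file(path, fields):
    """Write one field name per line, each terminated by a bare LF."""
    # newline="\n" stops Python from translating "\n" to CRLF on Windows.
    with open(path, "w", newline="\n", encoding="utf-8") as handle:
        for name in fields:
            handle.write(name + "\n")


# Field names as they are stored in the collection.
path = os.path.join(tempfile.mkdtemp(), "peopleFields.txt")
write_field_file(path, ["_id", "name", "position"])
```

The resulting file can then be passed to mongoexport with --fieldFile.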

How to import CSV with a natural primary key into mongodb?

I have a large CSV file (100M), which I wish to import into mongodb.
So, I have set out to explore my options with a small sample CSV. The mongoimport command works fine
mongoimport.exe -d mydb -c mycoll --type csv --file .\aaa.csv --headerline --stopOnError
but it creates the _id keys of type ObjectId. Now each record in the CSV contains a natural primary key, which I want to become the _id in mongo.
How do I do it for the import?
EDIT
The top two lines are:
id,aaa,bbb,ccc,ddd,eee,fff,ggg,hhh,iii,jjj,kkk,lll,mmm,nnn,ooo,ppp,qqq,rrr,sss,ttt,uuu,vvv,www,xxx,yyy,zzz,q11,q22,q33,q44,q55,q66,q77,q88
72184515,4522534,"xo xo","2011-08-01 00:00:00","here",4848,4185,100,"xa xa","oops","yep",39.0797,-94.4067,"aha","qw","er","ty","opo",39.1029,-94.3826,2.06146,2,"q",1,"w","e","r","t","y","a","s","d","r","12787",""
The id column should become the _id.
In the header line of your .csv file, simply change "id" to "_id".
When you use mongoimport, you may find that it is a little limiting because it only creates data types of strings or numbers. The official recommendation for importing data from CSV files is to write your own script that will create documents containing the correct format and data types to fit your application.
However, if your .csv file contains only strings and numbers, then changing the header line should suffice.
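For the "write your own script" route, a minimal Python sketch might look like the following. It maps the id column to _id and makes a best-effort conversion of each value (int, then float, otherwise the original string). The helper names convert and rows_to_documents are hypothetical, the sample uses only the first four columns from the question, and date handling is left out on purpose.

```python
import csv
import io


def convert(value):
    """Best-effort conversion: int, then float, else the original string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value


def rows_to_documents(csv_text, key_column="id"):
    """Turn CSV rows into dicts, using key_column as the _id field."""
    docs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        doc = {
            ("_id" if name == key_column else name): convert(value)
            for name, value in row.items()
        }
        docs.append(doc)
    return docs


# First four columns of the sample from the question.
sample = 'id,aaa,bbb,ccc\n72184515,4522534,"xo xo","2011-08-01 00:00:00"\n'
docs = rows_to_documents(sample)
print(docs)
```

The resulting list of dicts could then be inserted with a driver, for example pymongo's collection.insert_many(docs).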