mongoimport doesn't work well on big files - mongodb

I have a JSON file of around 4M lines, one JSON document per line. I tried to use:
mongoimport --db mydb --collection mycoll --file myfile.json
and what happened was weird. I got this error:
2018-12-29T17:00:50.424+0200 connected to: localhost
2018-12-29T17:00:50.483+0200 Failed: error processing document #1428: invalid character 'S' after object key:value pair
2018-12-29T17:00:50.483+0200 imported 1426 documents
First, I counted the documents in this collection in mongo and found 1000, not the 1426 reported above.
Second, I located a JSON line with the 'S' in it, which is just a string that looks like "name" : "Double 'S' Transport". I left only this line in the file, ran the import, and it worked fine.
Does anyone understand why this is happening? My suspicion is that mongoimport doesn't work on files that big...
Any help would be great :)
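The error message points at a parse problem in one document rather than at the file size, so one way to narrow it down is to check every line on its own. The following is a minimal sketch, assuming the file holds one JSON document per line and is UTF-8 encoded; the function name find_invalid_lines is made up for this example. It only reports what Python's JSON parser rejects, which may not match mongoimport's parser in every detail.

```python
import json


def find_invalid_lines(lines):
    """Return (line_number, error_message) for every line that is not valid JSON."""
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # skip blank lines
        try:
            json.loads(text)
        except json.JSONDecodeError as exc:
            problems.append((number, exc.msg))
    return problems


if __name__ == "__main__":
    import sys

    # Usage: python check_lines.py myfile.json
    with open(sys.argv[1], encoding="utf-8") as handle:
        for number, message in find_invalid_lines(handle):
            print(f"line {number}: {message}")
```

A line such as "name" : "Double 'S' Transport" passes this check, since single quotes are legal inside a JSON string. A line where the inner quotes are unescaped double quotes would be reported.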

Related

How can I import a .json file in mongodb on ubuntu?

I am new to MongoDB and I am trying to import .json files. I created a database in SQL Developer and exported three of my tables to 3 separate .json files that look like this:
{"results":[{"columns":[{"name":"CLUBID","type":"NUMBER"},{"name":"MANAGERID","type":"NUMBER"},{"name":"NAME","type":"VARCHAR2"},{"name":"CITY","type":"VARCHAR2"},{"name":"CREATION_DATE","type":"DATE"}],"items":
[
{"clubid":2001,"managerid":5376,"name":"FC KOPRITIS","city":"LAKONIA","creation_date":"03\/07\/99"}
,{"clubid":2002,"managerid":5377,"name":"FC NOE","city":"KITHERA","creation_date":"10\/11\/14"}
,{"clubid":2003,"managerid":5378,"name":"FC KRK","city":"MELOS","creation_date":"31\/01\/39"}
,{"clubid":2004,"managerid":5379,"name":"FC FOCUSRITE","city":"THERA","creation_date":"02\/02\/02"}
,{"clubid":2005,"managerid":5380,"name":"FC GHOST","city":"SERIFOS","creation_date":"05\/08\/64"}
,{"clubid":2006,"managerid":5431,"name":"FC ALITIS","city":"LIMNOS","creation_date":"22\/10\/45"}
,{"clubid":2007,"managerid":5432,"name":"FC VLOSPA","city":"MIKONOS","creation_date":"30\/08\/85"}
,{"clubid":2008,"managerid":5433,"name":"FC MADCLIP","city":"CAPITAL","creation_date":"01\/04\/01"}
,{"clubid":2009,"managerid":5436,"name":"FC SNIK","city":"ATHENS","creation_date":"18\/07\/98"}
,{"clubid":2010,"managerid":5435,"name":"FC YTM","city":"XANTHI","creation_date":"20\/04\/18"}
]}
I tried using mongoimport --jsonArray --file club.json but it didn't work.
I get errors like "unexpected EOF" or "no collection specified"
The following steps resulted in a successful import:
Get the desired records to import (cleanse the data):
{"clubid":2001,"managerid":5376,"name":"FC KOPRITIS","city":"LAKONIA","creation_date":"03\/07\/99"}
{"clubid":2002,"managerid":5377,"name":"FC NOE","city":"KITHERA","creation_date":"10\/11\/14"}
{"clubid":2003,"managerid":5378,"name":"FC KRK","city":"MELOS","creation_date":"31\/01\/39"}
{"clubid":2004,"managerid":5379,"name":"FC FOCUSRITE","city":"THERA","creation_date":"02\/02\/02"}
{"clubid":2005,"managerid":5380,"name":"FC GHOST","city":"SERIFOS","creation_date":"05\/08\/64"}
{"clubid":2006,"managerid":5431,"name":"FC ALITIS","city":"LIMNOS","creation_date":"22\/10\/45"}
{"clubid":2007,"managerid":5432,"name":"FC VLOSPA","city":"MIKONOS","creation_date":"30\/08\/85"}
{"clubid":2008,"managerid":5433,"name":"FC MADCLIP","city":"CAPITAL","creation_date":"01\/04\/01"}
{"clubid":2009,"managerid":5436,"name":"FC SNIK","city":"ATHENS","creation_date":"18\/07\/98"}
{"clubid":2010,"managerid":5435,"name":"FC YTM","city":"XANTHI","creation_date":"20\/04\/18"}
Verify your file location, name and path. My file is in a sub directory of my current working directory called testData and the name is json1.JSON.
Execute the import (the db and collection will be created if they don't exist):
mongoimport --db tst2 --collection so2 --file testData/json1.JSON
Results:
2020-04-06T21:39:10.177-0400 connected to: mongodb://localhost/
2020-04-06T21:39:10.179-0400 10 document(s) imported successfully. 0 document(s) failed to import.
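The cleansing step can also be scripted instead of done by hand. The following is a minimal sketch, assuming the export is a complete JSON document of the shape {"results": [{"columns": [...], "items": [...]}]}; as pasted in the question the file stops after the items array, so the closing brackets are an assumption. The function name items_to_json_lines is made up for this example.

```python
import json


def items_to_json_lines(export_text):
    """Turn a SQL Developer style export into one JSON document per line."""
    export = json.loads(export_text)
    lines = []
    for result in export["results"]:
        # Only the row data is kept; the column metadata is dropped.
        for item in result["items"]:
            lines.append(json.dumps(item, ensure_ascii=False))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    import sys

    # Usage: python extract_items.py club.json > json1.JSON
    with open(sys.argv[1], encoding="utf-8") as handle:
        sys.stdout.write(items_to_json_lines(handle.read()))
```

The output has one document per line, which is the layout mongoimport reads without --jsonArray.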

Unexpected identifier while doing Mongoimport of json file

mongoimport --db dbName --collection collectionName < /Users/pratikjoshi/Desktop/FileName.json --jsonArray
I run this command in shell scripts!
Here is my json file content
[
{
"trackingRecordId":5742294,
"longitude":77.126205,
"latitude":28.54711,
"batteryPerc":100,
"speed":0.13,
"createdOnDt":"2016-01-14 00:00:01"
},
{
"trackingRecordId":5742293,
"longitude":72.86727,
"latitude":19.112692,
"batteryPerc":51.82,
"speed":10,
"createdOnDt":"2016-01-13 23:59:59"
}
]
The error suggests that your JSON is corrupted.
Go to a JSON validator, validate your JSON file, and try again.
Hope it helps.
EDIT
Your JSON is completely valid, and your import command is also valid and works fine for me.

mongoimport tsv error - "invalid bson in object with unknown _id"

I'm trying to import data into a MongoDB collection using mongoimport. The file containing the data is saved as a .tsv file. (It has to be a .tsv file because it contains Unicode characters that are lost when I save it as .csv.)
I use the following command to import the data:
mongoimport --db millie_db --collection ref_datas --type tsv --headerline --file vowels.tsv
and get the following error:
error inserting documents: Client Error: bad object in message: invalid bson in object with unknown _id
imported 0 documents
Can anyone advise how I can find out what the problem is?
Many thanks,
The problem was the encoding of the input file. I converted it to UTF-8 using Notepad++, saved it, and all is good.
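The same conversion can be done with a short script when no editor is at hand. The following is a minimal sketch; the function names and the default source encoding are assumptions, and "utf-16" is only a guess that has to be replaced with the real encoding of the file.

```python
from pathlib import Path


def to_utf8(raw_bytes, source_encoding="utf-16"):
    """Decode bytes from source_encoding and re-encode them as UTF-8."""
    return raw_bytes.decode(source_encoding).encode("utf-8")


def convert_file_to_utf8(source_path, target_path, source_encoding="utf-16"):
    """Write a UTF-8 copy of source_path to target_path."""
    raw = Path(source_path).read_bytes()
    Path(target_path).write_bytes(to_utf8(raw, source_encoding))


if __name__ == "__main__":
    import sys

    # Usage: python to_utf8.py vowels.tsv vowels_utf8.tsv utf-16
    convert_file_to_utf8(sys.argv[1], sys.argv[2], sys.argv[3])
```

If the guessed source encoding is wrong, decode usually raises UnicodeDecodeError, which is a useful signal in itself.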

How to export JSON from MongoDB using Robo 3T

I am using Robo 3T (formerly RoboMongo) which I connect to a MongoDB. What I need to do is this: There is a collection in that MongoDB. I want to export the data from that collection so that I can save it into a file.
I used the interface to open the data from the collection as text and did a Ctrl + A and pasted into a text file. However, I found that not all data is copied and also that there were many comments in the text data which naturally breaks the JSON.
I am wondering if Robo 3T has an "Export As JSON" facility so that I can do a clean export.
Any pointers are appreciated!
A quick and dirty way: Just write your query as db.getCollection('collection').find({}).toArray() and right click Copy JSON. Paste the data in the editor of your choice.
You can use tojson to convert each record to JSON in a MongoDB shell script.
Run this script in RoboMongo:
var cursor = db.getCollection('foo').find({}, {});
while(cursor.hasNext()) {
print(tojson(cursor.next()))
}
This prints all results as a JSON-like array.
The result is not really JSON! Some types, such as dates and object IDs, are printed as JavaScript function calls, e.g., ISODate("2016-03-03T12:15:49.996Z").
Might not be very efficient for large result sets, but you can limit the query. Alternatively, you can use mongoexport.
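If the shell output has already been copied, the function-call notation can be rewritten into MongoDB Extended JSON, which uses {"$oid": ...} for object IDs and {"$date": ...} for dates. The following is a rough sketch using regular expressions; it only handles ObjectId(...) and ISODate(...), and it would also rewrite such text if it happened to appear inside a string value. The function name is made up for this example.

```python
import re

# 24 hex characters inside ObjectId("...")
_OBJECT_ID = re.compile(r'ObjectId\("([0-9a-fA-F]{24})"\)')
# Any quoted value inside ISODate("...")
_ISO_DATE = re.compile(r'ISODate\("([^"]+)"\)')


def shell_to_extended_json(text):
    """Replace ObjectId(...) and ISODate(...) calls with Extended JSON objects."""
    text = _OBJECT_ID.sub(r'{"$oid": "\1"}', text)
    text = _ISO_DATE.sub(r'{"$date": "\1"}', text)
    return text
```

For anything beyond a quick copy and paste, mongoexport remains the safer route because it writes Extended JSON directly.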
Robomongo's shell functionality will solve the problem. In my case I needed couple of columns as CSV format.
var cursor = db.getCollection('Member_details').find({Category: 'CUST'},{CustomerId :1,Name :1,_id:0})
while (cursor.hasNext()) {
var record = cursor.next();
print(record.CustomerId + "," + record.Name)
}
Output : -------
334, Harison
433, Rechard
453, Michel
533, Pal
Do you mean "export to file" as in a spreadsheet, like a .csv?
IMO this is the EASIEST way to do this in Robo 3T (formerly Robomongo):
In the top right of the Robo 3T GUI there is a "View Results in text mode" button; click it and copy everything.
Paste everything into this website: https://json-csv.com/
Click the download button and now you have it in a spreadsheet.
Hope this helps someone, as I wish Robo 3T had export capabilities.
There are a few MongoDB GUIs out there, some of them have built-in support for data exporting. You'll find a comprehensive list of MongoDB GUIs at http://mongodb-tools.com
You've asked about exporting the results of your query, and not about exporting entire collections. Give 3T MongoChef MongoDB GUI a try, this tool has support for your specific use case.
Don't run this command in the mongo shell. Enter it at a system command prompt, replacing the placeholders with your database name, collection name, and file name.
mongoexport --db (Database name) --collection (Collection Name) --out (File name).json
It works for me.
I don't think Robomongo has such a feature.
You are better off using the MongoDB tool mongoexport for a specific collection.
http://docs.mongodb.org/manual/reference/program/mongoexport/#export-in-json-format
But if you are looking for a backup solution, it is better to use
mongodump / mongorestore
If you want to use mongoimport, you'll want to export this way:
db.getCollection('tables')
.find({_id: 'q3hrnnoKu2mnCL7kE'})
.forEach(function(x){printjsononeline(x)});
Expanding on Anish's answer, I wanted something I can apply to any query to automatically output all fields vs. having to define them within the print statement. It can probably be simplified but this was something quick & dirty that works great:
var cursor = db.getCollection('foo').find({}, {bar: 1, baz: 1, created_at: 1, updated_at: 1}).sort({created_at: -1, updated_at: -1});
while (cursor.hasNext()) {
var record = cursor.next();
var output = "";
for (var i in record) {
output += record[i] + ",";
};
output = output.substring(0, output.length - 1);
print(output);
}
Using a robomongo shell script:
//on the same db
var cursor = db.collectionname.find();
while (cursor.hasNext()) {
var record = cursor.next();
db.new_collectionname.save(record);
}
Using mongodb's export and import command
You can add the --jsonArray parameter / flag to your mongoexport command, this exports the result as single json array.
Then just specify the --jsonArray flag again when importing.
Or remove the starting and ending array brackets [] in the file, then your modified & exported file will import with the mongoimport command without the --jsonArray flag.
More on Export here: https://docs.mongodb.org/manual/reference/program/mongoexport/#cmdoption--jsonArray
Import here:
https://docs.mongodb.org/manual/reference/program/mongoimport/#cmdoption--jsonArray
Solution:
mongoexport --db test --collection traffic --out traffic.json
Where:
database -> test
collection name -> traffic
output file name -> traffic.json
An extension to Florian Winter's answer for people looking to generate a ready-to-execute query.
drop and insertMany query using cursor:
{
// collection name
var collection_name = 'foo';
// query
var cursor = db.getCollection(collection_name).find({});
// drop collection and insert script
print('db.' + collection_name + '.drop();');
print('db.' + collection_name + '.insertMany([');
// print documents
while(cursor.hasNext()) {
print(tojson(cursor.next()));
if (cursor.hasNext()) // add trailing "," if not last item
print(',');
}
// end script
print(']);');
}
Its output will be like:
db.foo.drop();
db.foo.insertMany([
{
"_id" : ObjectId("abc"),
"name" : "foo"
}
,
{
"_id" : ObjectId("xyz"),
"name" : "bar"
}
]);
I had this same issue. Running a script in Robomongo (Robo 3T 1.1.1) doesn't allow copying the values either, and there was no export option.
The best way I found is to use mongoexport. If MongoDB is installed locally, you can use mongoexport to connect to a database on any server and extract the data.
To connect to data on a remote server and write a CSV output file, run the following mongoexport command on your command line:
mongoexport --host HOSTNAME --port PORT --username USERNAME --password "PASSWORD" --collection COLLECTION_NAME --db DATABASE_NAME --out OUTPUTFILE.csv --type=csv --fieldFile fields.txt
fieldFile: helps to extract the desired columns, ex:
contents of fields.txt can be just:
userId
to only extract values of the column 'userId'
Data on remote server, json output file:
mongoexport --host HOST_NAME --port PORT --username USERNAME --password "PASSWORD" --collection COLLECTION_NAME --db DATABASE_NAME --out OUTPUT.json
this extracts all fields into the json file
data on localhost (mongodb should be running on localhost)
mongoexport --db DATABASE_NAME --collection COLLECTION --out OUTPUT.json
Reference: https://docs.mongodb.com/manual/reference/program/mongoexport/#use
Simple solution:
tostrictjson(db.getCollection(collection_name).find({}))
Note:
Other solutions are fine but might cause errors during import when your collection has types like Date, ObjectId etc...
Happy Hacking :)
I export using MongoDB Compass; you can export to CSV or JSON.
In the Compass menu select Collection -> Export Collection. You can choose the fields to export and the output file, and you can specify a query beforehand.
Regards
Run your search.
Push the "view results in JSON mode" button.
Copy the result into Word.
Print the result from Word.

How to copy a collection from one database to another in MongoDB

Is there a simple way to do this?
The best way is to do a mongodump then mongorestore. You can select the collection via:
mongodump -d some_database -c some_collection
[Optionally, zip the dump (zip some_database.zip some_database/* -r) and scp it elsewhere]
Then restore it:
mongorestore -d some_other_db -c some_or_other_collection dump/some_collection.bson
Existing data in some_or_other_collection will be preserved. That way you can "append" a collection from one database to another.
Prior to version 2.4.3, you will also need to add back your indexes after you copy over your data. Starting with 2.4.3, this process is automatic, and you can disable it with --noIndexRestore.
At the moment there is no command in MongoDB that would do this. Please note the JIRA ticket with related feature request.
You could do something like:
db.<collection_name>.find().forEach(function(d){ db.getSiblingDB('<new_database>')['<collection_name>'].insert(d); });
Please note that with this, the two databases would need to share the same mongod for this to work.
Besides this, you can do a mongodump of a collection from one database and then mongorestore the collection to the other database.
Actually, there is a command to move a collection from one database to another. It's just not called "move" or "copy".
To copy a collection, you can clone it on the same database, then move the cloned collection.
To clone:
> use db1
switched to db db1
> db.source_collection.find().forEach(
function(x){
db.collection_copy.insert(x)
}
);
To move:
> use admin
switched to db admin
> db.runCommand(
{
renameCollection: 'db1.source_collection',
to : 'db2.target_collection'
}
);
The other answers are better for copying the collection, but this is especially useful if you're looking to move it.
I would abuse the connect function in the mongo CLI (see the mongo docs), which lets you open more than one connection.
If you want to copy the customer collection from test to test2 on the same server, first start the mongo shell:
use test
var db2 = connect('localhost:27017/test2')
Do a normal find and copy the first 20 records to test2:
db.customer.find().limit(20).forEach(function(p) { db2.customer.insert(p); });
or filter by some criteria
db.customer.find({"active": 1}).forEach(function(p) { db2.customer.insert(p); });
Just change localhost to an IP or hostname to connect to a remote server. I use this to copy test data to a test database for testing.
If between two remote mongod instances, use
{ cloneCollection: "<collection>", from: "<hostname>", query: { <query> }, copyIndexes: <true|false> }
See http://docs.mongodb.org/manual/reference/command/cloneCollection/
I'd usually do:
use sourcedatabase;
var docs=db.sourcetable.find();
use targetdatabase;
docs.forEach(function(doc) { db.targettable.insert(doc); });
for huge size collections, you can use Bulk.insert()
var bulk = db.getSiblingDB(dbName)[targetCollectionName].initializeUnorderedBulkOp();
db.getCollection(sourceCollectionName).find().forEach(function (d) {
bulk.insert(d);
});
bulk.execute();
This will save a lot of time.
In my case, I'm copying collection with 1219 documents: iter vs Bulk (67 secs vs 3 secs)
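The same batching idea can be sketched in Python. The helper below is a minimal illustration and not a pymongo API: copy_in_batches and its batch_size default are made up for this example, and insert_many stands for any callable that accepts a list of documents, for instance the insert_many method of a pymongo collection.

```python
from itertools import islice


def copy_in_batches(documents, insert_many, batch_size=1000):
    """Feed documents to insert_many in fixed-size batches; return the count copied."""
    iterator = iter(documents)
    copied = 0
    while True:
        # Pull at most batch_size documents without loading the whole cursor.
        batch = list(islice(iterator, batch_size))
        if not batch:
            break
        insert_many(batch)
        copied += len(batch)
    return copied


# Hypothetical usage with pymongo (database and collection names are placeholders):
#   from pymongo import MongoClient
#   client = MongoClient()
#   source = client["db1"]["source_collection"]
#   target = client["db2"]["target_collection"]
#   copy_in_batches(source.find(), target.insert_many)
```

Sending documents in batches keeps memory use bounded, unlike collecting the whole collection into a single array first.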
Unbelievable how many up-votes are given for agonizingly slow one-by-one copy of data.
As given in other answers the fastest solution should be mongodump / mongorestore. There is no need to save the dump to your local disk, you can pipe the dump directly into mongorestore:
mongodump --db=some_database --collection=some_collection --archive=- | mongorestore --nsFrom="some_database.some_collection" --nsTo="some_or_other_database.some_or_other_collection" --archive=-
In case you run a sharded cluster, the new collection is not sharded by default. All data is written initially to your primary shard. This may cause problems with disk space and put additional load to your cluster for balancing. Better pre-split your collection like this before you import the data:
sh.shardCollection("some_or_other_database.some_or_other_collection", { <shard_key>: 1 });
db.getSiblingDB("config").getCollection("chunks").aggregate([
   { $match: { ns: "some_database.some_collection" } },
   { $sort: { min: 1 } },
   { $skip: 1 }
], { allowDiskUse: true }).forEach(function (chunk) {
   sh.splitAt("some_or_other_database.some_or_other_collection", chunk.min)
})
There are different ways to do the collection copy. Note the copy can happen in the same database, different database, sharded database or mongod instances. Some of the tools can be efficient for large sized collection copying.
Aggregation with $merge:
Writes the results of the aggregation pipeline to a specified collection. The target can be in a different database and can be a sharded collection. Creates the collection if it does not exist; otherwise the results are merged into the existing collection. New in version 4.2.
Example: db.test.aggregate([ { $merge: { db: "newdb", coll: "newcoll" }} ])
Aggregation with $out:
Writes the results of the aggregation pipeline to a specified collection. Before MongoDB 4.4 the target must be in the same database; from 4.4 on, $out can also write to a different database. Creates a new collection or replaces an existing one.
Example: db.test.aggregate([ { $out: "newcoll" } ])
mongoexport and mongoimport:
These are command-line tools.
mongoexport produces a JSON or CSV export of collection data. The output from the export is used as the source for the destination collection using the mongoimport.
mongodump and mongorestore:
These are command-line tools.
mongodump utility is for creating a binary export of the contents of a database or a collection. The mongorestore program loads data from a binary database dump created by mongodump into the destination.
db.cloneCollection():
Copies a collection from a remote mongod instance to the current mongod instance.
Deprecated since version 4.2.
db.collection.copyTo():
Copies all documents from a collection into a new collection (within the same database).
Deprecated since version 3.0. Starting in version 4.2, this command is no longer valid.
NOTE: Unless stated otherwise, the above commands run from the mongo shell.
Reference: The MongoDB Manual.
You can also use a favorite programming language (e.g., Java) or environment (e.g., NodeJS) using appropriate driver software to write a program to perform the copy - this might involve using find and insert operations or another method. This find-insert can be performed from the mongo shell too.
You can also do the collection copy using GUI programs like MongoDB Compass.
You can use aggregation framework to resolve your issue
db.oldCollection.aggregate([{$out : "newCollection"}])
It should be noted that indexes from oldCollection are not copied to newCollection.
I know this question has been answered. However, I personally would not use @JasonMcCay's answer, because cursors stream and this could cause an infinite cursor loop if the collection is still being used. Instead I would use a snapshot():
http://www.mongodb.org/display/DOCS/How+to+do+Snapshotted+Queries+in+the+Mongo+Database
@ben's answer is also a good one and works well for hot backups of collections. In addition, mongorestore does not need to share the same mongod.
This might be just a special case, but for a collection of 100k documents with two random string fields (length is 15-20 chars), using a dumb mapreduce is almost twice as fast as find-insert/copyTo:
db.coll.mapReduce(function() { emit(this._id, this); }, function(k,vs) { return vs[0]; }, { out : "coll2" })
Using pymongo (both databases need to be on the same mongod), I did the following:
from pymongo import MongoClient
client = MongoClient()  # assumes a mongod on localhost:27017
db = client["<original database>"]
db2 = client["<database to be copied to>"]
cursor = db["<collection to copy from>"].find()
for data in cursor:
    db2["<new collection>"].insert_one(data)
If RAM is not an issue using insertMany is way faster than forEach loop.
var db1 = connect('<ip_1>:<port_1>/<db_name_1>')
var db2 = connect('<ip_2>:<port_2>/<db_name_2>')
var _list = db1.getCollection('collection_to_copy_from').find({})
db2.collection_to_copy_to.insertMany(_list.toArray())
This won't solve your problem but the mongodb shell has a copyTo method that copies a collection into another one in the same database:
db.mycoll.copyTo('my_other_collection');
It also translates from BSON to JSON, so mongodump/mongorestore are the best way to go, as others have said.
Many right answers here. I would go for mongodump and mongorestore in a piped fashion for a large collection:
mongodump --db fromDB --gzip --archive | mongorestore --drop --gzip --archive --nsFrom "fromDB.collectionName" --nsTo "toDB.collectionName"
If I just want a quick copy, this is slow but it works:
use fromDB
db.collectionName.find().forEach(function(x){
   db.getSiblingDB('toDB')['collectionName'].insert(x);
});
In case some heroku users stumble here and like me want to copy some data from staging database to the production database or vice versa here's how you do it very conveniently (N.B. I hope there's no typos in there, can't check it atm., I'll try confirm the validity of the code asap):
to_app="The name of the app you want to migrate data to"
from_app="The name of the app you want to migrate data from"
collection="the collection you want to copy"
mongohq_url=`heroku config:get --app "$to_app" MONGOHQ_URL`
parts=(`echo $mongohq_url | sed "s_mongodb://heroku:__" | sed "s_[@/]_ _g"`)
to_token=${parts[0]}; to_url=${parts[1]}; to_db=${parts[2]}
mongohq_url=`heroku config:get --app "$from_app" MONGOHQ_URL`
parts=(`echo $mongohq_url | sed "s_mongodb://heroku:__" | sed "s_[@/]_ _g"`)
from_token=${parts[0]}; from_url=${parts[1]}; from_db=${parts[2]}
mongodump -h "$from_url" -u heroku -d "$from_db" -p"$from_token" -c "$collection" -o col_dump
mongorestore -h "$to_url" -u heroku -d "$to_db" -p"$to_token" --dir col_dump/"$from_db"/"$collection".bson -c "$collection"
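The sed pipeline above splits a connection URL into a token, a host and a database name. The same split can be sketched in Python with the standard library, assuming the URL has the form mongodb://user:password@host:port/database; the function name and the example URL in the comment are hypothetical.

```python
from urllib.parse import urlsplit


def split_mongo_url(url):
    """Split a mongodb:// URL into user, password, host (with port) and database."""
    parts = urlsplit(url)
    host = parts.hostname
    if parts.port:
        host = f"{host}:{parts.port}"
    return {
        "user": parts.username,
        "password": parts.password,
        "host": host,
        "database": parts.path.lstrip("/"),
    }


# Example with a made-up URL:
#   split_mongo_url("mongodb://heroku:secret@host.example.com:10042/app123")
```

Using a URL parser avoids the quoting pitfalls of splitting on punctuation, though it does not handle multi-host replica set URLs.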
You can always use Robomongo. As of v0.8.3 there is a tool that can do this by right-clicking on the collection and selecting "Copy Collection to Database"
For details, see http://blog.robomongo.org/whats-new-in-robomongo-0-8-3/
This feature was removed in 0.8.5 due to its buggy nature so you will have to use 0.8.3 or 0.8.4 if you want to try it out.
Use "Studio 3T for MongoDB".
It has Export and Import tools, available by clicking on a database, on its collections, or on a specific collection.
Download link: https://studio3t.com/download/
The simplest way to import data from an existing MongoDB Atlas cluster is to use the mongodump and mongorestore commands.
To create the dump from the existing DB you can use:
mongodump --uri="<connection-uri>"
Other connection options can be looked up here: https://www.mongodb.com/docs/database-tools/mongodump/
After the dump has been created in a dump/ directory, you can import that data into your other DB like so:
mongorestore --uri="<connection-uri-of-other-db>" <dump-file-location>
Similarly for mongorestore, other connection options can be looked up here, along with commands to restore specific collections:
https://www.mongodb.com/docs/database-tools/mongorestore/
The dump file location will be inside the dump directory. There may be a subdirectory with the same name as the DB you dumped. For example, if you dumped the test DB, the dump file location would be dump/test.
In my case, I had to use a subset of attributes from the old collection in my new collection. So I ended up choosing those attributes while calling insert on the new collection.
db.<sourceColl>.find().forEach(function(doc) {
db.<newColl>.insert({
"new_field1":doc.field1,
"new_field2":doc.field2,
....
})
});
To copy a collection (myCollection1) from one database to another in MongoDB,
Server1:
myHost1.com
myDbUser1
myDbPasword1
myDb1
myCollection1
outputfile:
myfile.json
Server2:
myHost2.com
myDbUser2
myDbPasword2
myDb2
myCollection2
you can do this:
mongoexport --host myHost1.com --db myDb1 -u myDbUser1 -p myDbPasword1 --collection myCollection1 --out myfile.json
then:
mongoimport --host myHost2.com --db myDb2 -u myDbUser2 -p myDbPasword2 --collection myCollection2 --file myfile.json
Another case , using CSV file:
Server1:
myHost1.com
myDbUser1
myDbPasword1
myDb1
myCollection1
fields.txt
fieldName1
fieldName2
outputfile:
myfile.csv
Server2:
myHost2.com
myDbUser2
myDbPasword2
myDb2
myCollection2
you can do this:
mongoexport --host myHost1.com --db myDb1 -u myDbUser1 -p myDbPasword1 --collection myCollection1 --out myfile.csv --type=csv
Add column types to the header line of the CSV file (for example fieldName1.decimal(),fieldName2.string()) and then:
mongoimport --host myHost2.com --db myDb2 -u myDbUser2 -p myDbPasword2 --collection myCollection2 --file myfile.csv --type csv --headerline --columnsHaveTypes
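Editing the header by hand gets tedious for wide files, so the typed header can be generated. The following is a minimal sketch, assuming a plain comma-separated header with no quoted field names; the function name and its default_type parameter are made up for this example. The type suffixes follow the name.type() form that --columnsHaveTypes expects.

```python
def add_column_types(header_line, column_types, default_type="auto()"):
    """Append a mongoimport type suffix to every column name in a CSV header."""
    names = header_line.rstrip("\r\n").split(",")
    typed = [f"{name}.{column_types.get(name, default_type)}" for name in names]
    return ",".join(typed)


# Example:
#   add_column_types("fieldName1,fieldName2",
#                    {"fieldName1": "decimal()", "fieldName2": "string()"})
```

Columns missing from column_types fall back to auto(), which leaves type detection to mongoimport.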
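This can be done using Mongo's db.copyDatabase method, which copies a whole database rather than a single collection (it was deprecated in MongoDB 4.0 and removed in 4.2):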
db.copyDatabase(fromdb, todb, fromhost, username, password)
Reference: http://docs.mongodb.org/manual/reference/method/db.copyDatabase/