How to import a large JSON file into MongoDB?

I have a big JSON file (~300GB) composed of many dicts, and I am trying to import it into MongoDB. The method I tried was mongoimport, using this:
mongoimport --db <DB_NAME> --collection <COLLECTION_NAME> --file <FILE_DIRECTORY> --jsonArray --batchSize 1
but after some insertions it fails with an error like this: Failed: error processing document #89602: unexpected EOF. I have no idea why this happens.
Are there any other methods to make it work?
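One workaround, assuming the file is a single top-level JSON array: convert it to line-delimited JSON on the fly with jq's streaming parser, which avoids loading all 300GB into memory, and pipe the result into mongoimport without --jsonArray. A minimal sketch:
# Emit one array element per line (streaming, constant memory), then import:
jq -cn --stream 'fromstream(1|truncate_stream(inputs))' <FILE_DIRECTORY> | mongoimport --db <DB_NAME> --collection <COLLECTION_NAME>
An unexpected EOF partway through can also mean the file itself is truncated or malformed around document #89602, which importing line by line makes easier to pinpoint.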

Related

Mongodb exports fine but complains when importing with the same file the export creates

I see quite a few people have issues with exporting and importing a collection/database.
When I try to export a collection via:
mongodump --db database1 --collection collection1
This is perfectly fine. No errors.
But when I try to import the collection that I just exported:
mongoimport --db database1 --collection collection1 --file collection1.bson -vvvvv
I get:
Failed: error processing document #1: invalid character '\u008f' looking for beginning of value
I've tried to import/export different collections and I get:
Failed: error processing document #1: invalid character '¡' looking for beginning of value
Is there a simple way to fix this other than going through a binary-encoded JSON file to look for ¡ and \u008f? Why would Mongo allow it to be exported yet complain when trying to import it?
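The likely cause: the tools are mismatched. mongodump writes BSON, and mongoimport only reads JSON/CSV/TSV, so it chokes on the raw BSON bytes (hence the odd characters). The matching pairs are mongodump/mongorestore and mongoexport/mongoimport, e.g.:
# Restore the BSON dump with mongorestore instead of mongoimport:
mongorestore --db database1 --collection collection1 collection1.bson
# Or, if JSON is needed, export and import with the JSON pair:
mongoexport --db database1 --collection collection1 --out collection1.json
mongoimport --db database1 --collection collection1 --file collection1.json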

Bulk insert directory of JSONs into single mongodb collection in parallel

The following works for us, importing the single file our_file.json into our MongoDB collection:
mongoimport --uri "mongodb+srv://<username>:<password>@our-cluster.dwxnd.gcp.mongodb.net/dbname" --collection our_coll_name --drop --file /tmp/our_file.json
The following does not work, as we cannot point to a directory our_directory:
mongoimport --uri "mongodb+srv://<username>:<password>@our-cluster.dwxnd.gcp.mongodb.net/dbname" --collection our_coll_name --drop --file /tmp/our_directory
We predictably get the error:
Failed: error processing document #1: read /tmp/our_directory: is a directory
Is it possible to import all of the JSONs in our_directory into our collection using a single bash command? See speed test in my answer below - is it possible to parallelize, or use multi-threading, so that the mongoimport of the 103 files outperforms the mongoimport of the 1 file?
It looks like the following is working, and the import seems to happen at a decent speed:
cat /tmp/our_directory/*.json | mongoimport --uri "mongodb+srv://<username>:<password>@our-cluster.dwxnd.gcp.mongodb.net/dbname" --collection our_coll_name --drop
Edit: Speed Test (locally on a Mac)
It took 11 minutes to mongoimport 103 files with a combined size of ~1GB into our MongoDB collection. Importing a single 1GB file (rather than 103 files) also took roughly 11 minutes.
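To actually parallelize rather than serialize everything through cat, one option (a sketch, not benchmarked against this cluster) is to launch several mongoimport processes with xargs -P, one per file. Note that --drop must be left out of the per-file command, since each process would otherwise drop what the others had just inserted; drop the collection once up front if needed:
# Run up to 4 imports concurrently, one process per JSON file:
ls /tmp/our_directory/*.json | xargs -P 4 -I{} mongoimport --uri "mongodb+srv://<username>:<password>@our-cluster.dwxnd.gcp.mongodb.net/dbname" --collection our_coll_name --file {}
mongoimport also has a --numInsertionWorkers flag to parallelize inserts within a single process, which may be worth trying first. Either way, if the bottleneck is the network or server-side write throughput rather than parsing, parallelism may not beat the 11-minute baseline.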

I want to import the JSON documents only if they don't exist

I am using Mongo 3.4.
I want to import a JSON array file into mongod using a bash script, and I want to import each document only if it doesn't already exist. I tried --upsert but it does not work.
Is there any easy way to do it? Thanks
mongoimport --db dbName --collection collectionName --file fileName.json --jsonArray --upsert
mongoimport -d dbName -c collectionName jsonFile.json -vvvvv
Even though the output of mongoimport says that n objects were imported, the existing documents with the same data have not been overwritten.
If you use --upsert, it will update the existing documents.
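Assuming mongoimport 3.4+, where the deprecated --upsert flag was replaced by --mode, the default insert mode already does what's asked: documents whose _id already exists are reported as duplicate-key errors and skipped, so existing documents are never touched:
# Insert only; duplicates (matched on _id) are skipped with an error, not overwritten:
mongoimport --db dbName --collection collectionName --file fileName.json --jsonArray --mode insert
# By contrast, upsert mode replaces matching documents (matched on _id, or on the fields named with --upsertFields):
mongoimport --db dbName --collection collectionName --file fileName.json --jsonArray --mode upsert --upsertFields someUniqueField
This only behaves as "insert if missing" when the incoming documents carry their own _id (or a unique index exists on the match fields).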

mongoimport error: Unexpected identifier

When I try to import my JSON data file into my local instance of MongoDB, I get an error. The code that I am using is shown below.
> mongoimport --db cities --collection zips --type json --file C:/MongoDB/data/zips.json
This is the error that I get.
2014-11-29T20:27:33.803-0800 SyntaxError: Unexpected identifier
What seems to be the problem here?
I just found out that mongoimport is run from the terminal/command line (cmd), and NOT from within the mongo shell.
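In other words (same command as above, just run from the OS prompt): the leading > in the question is the mongo shell prompt, and the shell's JavaScript parser is what answered with SyntaxError: Unexpected identifier.
# From cmd/PowerShell/bash, NOT inside the mongo shell:
mongoimport --db cities --collection zips --type json --file C:/MongoDB/data/zips.json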

How to import a large JSON file into MongoDB using mongoimport?

I am trying to import a large JSON data set file into MongoDB using mongoimport.
mongoimport --db test --collection sam1 --file 1234.json --jsonArray
error:
2014-07-02T15:57:16.406+0530 error: object to insert too large
2014-07-02T15:57:16.406+0530 tried to import 1 objects
Please try adding this option: --batchSize 1
Like:
mongoimport --db test --collection sam1 --file 1234.json --batchSize 1
The data will be parsed and stored in the database batch by batch; with --batchSize 1, each insert carries a single document, which should keep each insert message under the size limit that triggered the error.