Bulk insert a directory of JSON files into a single MongoDB collection in parallel

The following works for us, importing the single file our_file.json into our MongoDB collection:
mongoimport --uri "mongodb+srv://<username>:<password>@our-cluster.dwxnd.gcp.mongodb.net/dbname" --collection our_coll_name --drop --file /tmp/our_file.json
The following does not work, because we cannot point --file at the directory our_directory:
mongoimport --uri "mongodb+srv://<username>:<password>@our-cluster.dwxnd.gcp.mongodb.net/dbname" --collection our_coll_name --drop --file /tmp/our_directory
As expected, we get the error:
Failed: error processing document #1: read /tmp/our_directory: is a directory
Is it possible to import all of the JSON files in our_directory into our collection using a single bash command? See the speed test in my answer below: is it possible to parallelize, or use multi-threading, so that the mongoimport of the 103 files outperforms the mongoimport of the single file?

It looks like this works:
cat /tmp/our_directory/*.json | mongoimport --uri "mongodb+srv://<username>:<password>@our-cluster.dwxnd.gcp.mongodb.net/dbname" --collection our_coll_name --drop
The import also seems to run at a decent speed.
Edit: Speed test (run locally on a Mac with these specs)
It took 11 minutes to mongoimport a total of 103 files with a combined size of ~1 GB into our MongoDB collection. We also tested the mongoimport speed with a single 1 GB file (rather than 103 files), and it took roughly 11 minutes as well.
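One possible way to parallelize the 103-file import is sketched below, assuming every file holds newline-delimited JSON and that the collection is dropped once beforehand (running --drop in each parallel process would let one process wipe another's inserts). It launches up to N mongoimport processes with xargs -P. The function name import_dir_parallel, the MONGOIMPORT override and the default of 4 jobs are illustrative assumptions, not part of the original answer. mongoimport also has its own --numInsertionWorkers option, which may be the simpler knob to try with the single cat pipeline; whether either approach beats the single-file run will likely depend on network and cluster throughput rather than on the client.

```shell
#!/usr/bin/env bash
# Hypothetical helper: import every *.json file in a directory into one
# collection, running up to $4 mongoimport processes at the same time.
# Assumes each file holds newline-delimited JSON documents.
import_dir_parallel() {
  local uri="$1" coll="$2" dir="$3" jobs="${4:-4}"
  # MONGOIMPORT lets you swap in another binary (e.g. for a dry run).
  local bin="${MONGOIMPORT:-mongoimport}"

  # Refuse to run on an empty directory (xargs could otherwise invoke
  # mongoimport once with a dangling --file).
  if ! find "$dir" -maxdepth 1 -type f -name '*.json' | grep -q .; then
    echo "no .json files in $dir" >&2
    return 1
  fi

  # No --drop here: each parallel process would otherwise drop the
  # collection and discard documents inserted by the others.
  # xargs appends one file path after --file for each invocation and
  # exits non-zero if any invocation fails.
  find "$dir" -maxdepth 1 -type f -name '*.json' -print0 |
    xargs -0 -n 1 -P "$jobs" "$bin" --uri "$uri" --collection "$coll" --file
}

# Example usage (drop the collection once, then import in parallel):
#   mongosh "$MONGO_URI" --quiet --eval 'db.our_coll_name.drop()'
#   import_dir_parallel "$MONGO_URI" our_coll_name /tmp/our_directory 8
```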

Related

Mongodb exports fine but complains when importing with the same file the export creates

I see quite a few people have issues exporting and importing a collection/database.
When I try to export a collection via:
mongodump --db database1 --collection collection1
This is perfectly fine. No errors.
But when I try to import the collection that I just exported:
mongoimport --db database1 --collection collection1 --file collection1.bson -vvvvv
I get:
Failed: error processing document #1: invalid character '\u008f' looking for beginning of value
I've tried to import/export different collections and I get:
Failed: error processing document #1: invalid character '¡' looking for beginning of value
Is there a simple way to fix this other than going through the binary-encoded JSON file looking for ¡ and \u008f? Why would mongo allow the collection to be exported yet complain when I try to import it?
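Errors like these are what mongoimport reports when it is handed binary data: mongodump writes BSON, while mongoimport reads JSON, CSV or TSV text (the formats mongoexport produces). The small sketch below assumes the file extension reflects how each file was produced; the helper name import_tool_for is hypothetical, and the commands in the comments show the matching tool pairs, with the .bson path following mongodump's default dump/<db>/<collection>.bson layout.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the tool that can read a given dump/export file.
#   mongodump   -> .bson (binary)        -> mongorestore
#   mongoexport -> .json/.csv/.tsv text  -> mongoimport
import_tool_for() {
  case "$1" in
    *.bson)             echo mongorestore ;;
    *.json|*.csv|*.tsv) echo mongoimport ;;
    *) echo "unrecognised file type: $1" >&2; return 1 ;;
  esac
}

# Matching round trips for the collection in the question:
#   mongodump    --db database1 --collection collection1
#   mongorestore --db database1 --collection collection1 dump/database1/collection1.bson
#
#   mongoexport --db database1 --collection collection1 --out collection1.json
#   mongoimport --db database1 --collection collection1 --file collection1.json
```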

mongoimport cannot find file in (public) GCS bucket

We have a newline delimited JSON file saved in a public bucket in GCS:
It shows as public to the internet. Hopefully one of the following three links reaches the JSON on your end:
https://storage.googleapis.com/cbb-staging/division_info
https://storage.cloud.google.com/cbb-staging/division_info
gs://cbb-staging/division_info
We are trying to import this JSON into our MongoDB cluster using mongoimport. Our MongoDB URI string is correct, however we are struggling to point to the file in GCS.
mongoimport --uri "mongodb+srv://UserName:Password@our-cluster.abcde.gcp.mongodb.net/dbname" --collection staging__text_export --drop --file https://storage.googleapis.com/cbb-staging/division_info
mongoimport --uri "mongodb+srv://UserName:Password@our-cluster.abcde.gcp.mongodb.net/dbname" --collection staging__text_export --drop --file https://storage.cloud.google.com/cbb-staging/division_info
mongoimport --uri "mongodb+srv://UserName:Password@our-cluster.abcde.gcp.mongodb.net/dbname" --collection staging__text_export --drop --file gs://cbb-staging/division_info
All three of these return a similar error:
Failed: open https://storage.cloud.google.com/cbb-staging/division_info.json: no such file or directory
We tried adding .json to the end of the file names and it did not help.
Is this possible to do?
Here's a screenshot from MongoDB Atlas Support confirming what Rajdeep has said in the comments.
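Since mongoimport only reads a local file or standard input, one workaround is to download the object first and import the local copy. A plain pipe such as curl -fsSL URL | mongoimport ... also works (mongoimport reads stdin when --file is omitted), but the sketch below downloads to a temporary file so that a failed download never reaches --drop. It assumes the object is publicly readable at its storage.googleapis.com URL (storage.cloud.google.com URLs are meant for signed-in browser downloads, and gs:// paths would need gsutil cat or gcloud storage cat instead of curl). The function name and the FETCH/MONGOIMPORT overrides are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: download a remote newline-delimited JSON object,
# then import it with mongoimport. FETCH defaults to curl.
import_from_url() {
  local uri="$1" coll="$2" url="$3"
  local fetch="${FETCH:-curl}"
  local bin="${MONGOIMPORT:-mongoimport}"
  local tmp
  tmp="$(mktemp)" || return 1

  # -f: fail on HTTP errors, -sS: quiet but still report errors, -L: follow redirects.
  # Download first so a failed fetch never triggers --drop on the collection.
  if ! "$fetch" -fsSL "$url" > "$tmp"; then
    rm -f "$tmp"
    return 1
  fi

  "$bin" --uri "$uri" --collection "$coll" --drop --file "$tmp"
  local status=$?
  rm -f "$tmp"
  return "$status"
}

# Example usage for the bucket in the question:
#   import_from_url "$MONGO_URI" staging__text_export \
#     https://storage.googleapis.com/cbb-staging/division_info
```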

How to import a large json file into Mongodb?

I have a big JSON file (~300 GB) made up of many dicts, and I am trying to import it into MongoDB. I tried mongoimport like this:
mongoimport --db <DB_NAME> --collection <COLLECTION_NAME> --file <FILE_DIRECTORY> --jsonArray --batchSize 1
but after some insertions it fails with an error like Failed: error processing document #89602: unexpected EOF. I have no idea why this happens.
Are there any other methods that would work?
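One thing worth checking: mongoimport's documentation describes --jsonArray as limited to imports of 16 MB or smaller, so a 300 GB array is far outside what that mode supports. A common workaround is to convert the array into newline-delimited JSON (one document per line), which mongoimport reads without --jsonArray. The sketch below assumes jq is installed (1.6 or later, for the one-argument truncate_stream); its --stream/fromstream idiom emits array elements one at a time instead of loading the whole file. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a top-level JSON array into newline-delimited
# JSON on stdout, one compact document per line, without loading the whole
# array into memory. Requires jq (1.6+ assumed).
array_to_ndjson() {
  # --stream turns the input into [path, value] events; truncate_stream
  # strips the array index so fromstream rebuilds each element separately.
  jq -cn --stream 'fromstream(1 | truncate_stream(inputs))' "$1"
}

# Example usage (no --jsonArray once each document is on its own line):
#   array_to_ndjson big.json |
#     mongoimport --db <DB_NAME> --collection <COLLECTION_NAME>
```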

Error running mongoImport from the mongo shell

I have a problem importing a json-file with MongoDB.
The file is in the current folder (it is listed when I execute ls()), but the method gives me this error:
mongoimport --db geo --collection points --file zips.json
Wed Mar 19 09:42:49.032 SyntaxError: Unexpected identifier
Can anyone tell me what I am doing wrong?
Greetings, Andre
OK, that was pretty stupid. You have to run mongoimport --db --coll... from the normal command prompt (cmd), not from the mongo shell. Then it works without problems.
You can also state the file type you are importing explicitly, although JSON is already mongoimport's default type:
mongoimport --db geo --collection points --file zips.json --type json

Getting an assertion error from the mongoexport command in MongoDB

I am getting an error after executing this command:
mongoexport --db records --collection source_list --csv --out C:\bcopy.csv
records is my database and source_list is my collection.
It displays this message:
assertion: 9998 you need to specify fields
I also tried specifying fields, but it gives me the same error.
What changes should I make to the command to get a backup of my collection, or is there another way to do so?
Here's a sample command that specifies the fields to export:
mongoexport -h 127.0.0.1 --port 27018 --db mydb --collection system.profile --csv --out profile.csv --fields ns,millis,numYield,nscanned
In my case --headerline helped. I had around 60 columns, so enumerating them with -f would have been quite cumbersome.
--headerline
If using "--type csv" or "--type tsv", use the first line as field names. Otherwise, mongoimport will import the first line as a distinct document.
It seems you should be using the -f parameter to choose the fields that will be exported to the CSV file. A bug has been reported for this case asking for a better explanation, since the error message is not informative enough:
https://jira.mongodb.org/browse/SERVER-4224
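For a collection with many fields, one way to avoid typing the list by hand is to read the field names from a sample document and pass them to --fields. The sketch below assumes mongosh is installed and that the first document is representative (fields that appear only in later documents are missed, and nested sub-fields are not listed individually). It uses the current --type=csv spelling instead of the older --csv flag. The function name and the MONGOSH/MONGOEXPORT overrides are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: export a collection to CSV, using the top-level
# field names of one sample document as the --fields list.
export_csv_all_fields() {
  local uri="$1" coll="$2" out="$3"
  local shell_bin="${MONGOSH:-mongosh}"
  local export_bin="${MONGOEXPORT:-mongoexport}"
  local fields

  # Comma-separated keys of the document returned by findOne().
  fields="$("$shell_bin" "$uri" --quiet --eval \
    "print(Object.keys(db.getCollection('$coll').findOne() || {}).join(','))")" || return 1
  if [ -z "$fields" ]; then
    echo "no documents found in $coll" >&2
    return 1
  fi

  "$export_bin" --uri "$uri" --collection "$coll" --type=csv \
    --fields "$fields" --out "$out"
}

# Example usage for the question's collection:
#   export_csv_all_fields "mongodb://localhost:27017/records" source_list bcopy.csv
```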