MongoDB import to different collections set by a field - mongodb

I have a file called data.json, extracted with mongoexport, with the following structure:
{"id":"63","name":"rcontent","table":"modules"}
{"id":"81","name":"choicegroup","table":"modules"}
{"id":"681","course":"1242","name":"Requeriments del curs","timemodified":"1388667164","table":"page"}
{"id":"682","course":"1242","name":"Guia d'estudi","timemodified":"1374183513","table":"page"}
What I need is to import this file into my local MongoDB with a command like mongoimport, or with pymongo, storing every line in the collection named after its table value.
For example, the collection modules would contain the documents
{"id":"63","name":"rcontent"} and {"id":"81","name":"choicegroup"}
I've tried mongoimport, but I haven't seen any option that allows this. Does anyone know a command or a method to do that?
Thank you

The basic steps for this using python are:
parse the data.json file to create python objects
extract the table key value pair from each document object
insert the remaining doc into a pymongo collection
Thankfully, pymongo makes this pretty straightforward, as below:
import json
from pymongo import MongoClient
client = MongoClient() # this will use default port and host
db = client['test-db'] # select the db to use
with open("data.json", "r") as json_f:
    for str_doc in json_f:
        doc = json.loads(str_doc)
        table = doc.pop("table")  # remove the 'table' key
        db[table].insert_one(doc)  # insert() is deprecated in favour of insert_one()
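If you want to sanity-check the parse-and-route step without a running server, the same logic can be isolated into a pure function (a sketch; the function name is mine) and exercised against the sample lines from the question:

```python
import json
from collections import defaultdict

def group_by_table(lines):
    """Group mongoexport JSON lines by their 'table' field,
    removing that field from each resulting document."""
    grouped = defaultdict(list)
    for line in lines:
        doc = json.loads(line)
        table = doc.pop("table")
        grouped[table].append(doc)
    return dict(grouped)
```

Each group can then be written in one round trip with `db[table].insert_many(docs)` instead of one insert per line.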

Related

Trouble querying specific text field in mongoDB using pymongo

I have stored around 120 text files in a mongoDB database through connecting my local instance to mongodb cloud. I used pymongo to automate the insertion of the contents of each text file into mongodb cloud. The collection of 120 documents looks like this:
'''
{ _id: ObjectId(....),
  "nameTextdoc.txt": "text_document",
  content: ['Each sentence stored in an array.', '...'] }
'''
I am trying to retrieve the nameTextdoc.txt field and content field by using:
'''
collections.find_one({'nameTextdoc.txt': 'text_doc'})
'''
in a python script using pymongo. For some reason I receive None when I run this query. However, when I run:
'''
collections.find_one({})
'''
I get the entire document.
I would like help writing a query that retrieves the whole document by the name of the text file. My key names contain periods, which may be the specific reason I cannot retrieve them. Any help would be much appreciated.
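Dots in field names are indeed the problem here: MongoDB's query language interprets a dot in a filter key as a nested-field path, so `find_one({'nameTextdoc.txt': ...})` looks for a `txt` field inside a `nameTextdoc` subdocument. One workaround (a sketch, not the only option) is to filter client-side after fetching; note it also assumes the stored value is "text_document", not "text_doc" as in the failing query:

```python
def find_by_dotted_key(docs, key, value):
    """Client-side filter for documents whose top-level key
    (which may itself contain dots) equals the given value."""
    return [d for d in docs if d.get(key) == value]
```

This would be called as `find_by_dotted_key(collections.find({}), 'nameTextdoc.txt', 'text_document')`. On MongoDB 5.0+ the `$getField` aggregation operator can match such fields server-side, but renaming the keys to avoid dots is usually the cleaner long-term fix.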

mongoimport .csv into existing collection and database

I have a database that contains a collection with documents in it already. Now I'm trying to import another .csv into the same database and collection. For example, one document in db.collection has:
Name: Bob
Age: 25
and an entry from the CSV I'm trying to upload looks like this:
Name: Bob
Age: 27
How can I import the new CSV without replacing any documents, just adding to the collection so that both entries end up in database.collection?
Assuming you're using mongoimport, take a look at the --mode option.
With --mode merge, mongoimport enables you to merge fields from a new
record with an existing document in the database. Documents that do
not match an existing document in the database are inserted as usual.
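For completeness, the two modes look like this on the command line (database, collection, and file names here are placeholders). Note that for the question as asked, where both Bob documents should survive, the default insert mode already does the right thing:

```shell
# Default mode (insert): new rows are simply appended, so both
# {Name: Bob, Age: 25} and {Name: Bob, Age: 27} end up in the collection.
mongoimport --db mydb --collection people --type csv --headerline --file new.csv

# Merge mode: rows matching an existing document on _id (or on the
# fields given to --upsertFields) have their fields merged into it
# instead of creating a second document.
mongoimport --db mydb --collection people --type csv --headerline \
  --mode merge --upsertFields Name --file new.csv
```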

Omit database collection when importing from MongoDB to ElasticSearch

I'm trying to omit a collection from being included when importing from a MongoDB database to ElasticSearch.
https://github.com/appbaseio/abc is being used with a transform file, which has the following code:
t.Source('source', source, '/.*/')
.Transform(omit(user_database_bugtest.users))
.Save('sink', sink, '/.*/');
Database: user_database_bugtest
Collection to be omitted: users
I assume this is what's not formatted correctly, unless I have to make other changes: user_database_bugtest.users
I used this transform file code:
t.Source("source", source, '/^testcollection$/').Save("sink", sink, "/.*/")

Determine whether collection in MongoDB exists in Python

I want to know whether a collection with a specific name exists in MongoDB. How can I achieve this programmatically in Python? Searching around, I found how to do it from the MongoDB shell, but nothing useful for doing the same in Python.
You can retrieve the list of collections and check whether yours exists, as suggested in the comment by @Alex, like this:
Method 1:
import pymongo
connection = pymongo.MongoClient('localhost', 27017) # Connect to mongodb
db = connection['test_db']
list_of_collections = db.list_collection_names() # Return a list of collections in 'test_db'
print("posts" in list_of_collections) # Check if collection "posts" exists in db (test_db)
Or, you can validate a collection with validate_collection() (documentation). This raises pymongo.errors.OperationFailure if the collection doesn't exist, so you can also catch that exception and handle it however you want.
Method 2:
import pymongo
connection = pymongo.MongoClient('localhost', 27017) # Connect to mongodb
db = connection['test_db']
try:
    db.validate_collection("random_collection_name")  # Try to validate a collection
except pymongo.errors.OperationFailure:  # If the collection doesn't exist
    print("This collection doesn't exist")

How to import CSV with a natural primary key into mongodb?

I have a large CSV file (100M), which I wish to import into mongodb.
So, I have set out to explore my options with a small sample CSV. The mongoimport command works fine
mongoimport.exe -d mydb -c mycoll --type csv --file .\aaa.csv --headerline --stopOnError
but it creates the _id keys of type ObjectId. Now each record in the CSV contains a natural primary key, which I want to become the _id in mongo.
How do I do it for the import?
EDIT
The top two lines are:
id,aaa,bbb,ccc,ddd,eee,fff,ggg,hhh,iii,jjj,kkk,lll,mmm,nnn,ooo,ppp,qqq,rrr,sss,ttt,uuu,vvv,www,xxx,yyy,zzz,q11,q22,q33,q44,q55,q66,q77,q88
72184515,4522534,"xo xo","2011-08-01 00:00:00","here",4848,4185,100,"xa xa","oops","yep",39.0797,-94.4067,"aha","qw","er","ty","opo",39.1029,-94.3826,2.06146,2,"q",1,"w","e","r","t","y","a","s","d","r","12787",""
The id column should become the _id.
In the header line of your .csv file, simply change "id" to "_id".
When you use mongoimport, you may find it a little limiting because it only creates string and number types. The official recommendation for importing data from CSV files is to write your own script that creates documents with the correct fields and data types for your application.
However, if your .csv file contains only strings and numbers, then changing the header file should suffice.
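A minimal sketch of such a script (the int/float coercion rule here is illustrative; a real importer should convert each column deliberately):

```python
import csv
import io

def csv_rows_to_docs(csv_text):
    """Turn CSV text (with a header line) into MongoDB-ready documents,
    renaming the 'id' column to '_id' and coercing numeric strings."""
    docs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        doc = {}
        for key, value in row.items():
            if key == "id":
                key = "_id"  # use the natural primary key as MongoDB's _id
            # Best-effort type conversion: int, then float, else keep string.
            try:
                value = int(value)
            except ValueError:
                try:
                    value = float(value)
                except ValueError:
                    pass
            doc[key] = value
        docs.append(doc)
    return docs
```

The resulting documents can then be bulk-loaded with `collection.insert_many(docs)`; because `_id` comes from the CSV, re-running the load fails on duplicate keys instead of silently creating a second copy under a fresh ObjectId.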