MongoDB Pymongo copy a whole collection to another

I have 10,000 documents in a collection and want to copy all of them to another collection.
I'm new to MongoDB, have been stuck on this for a while, and am looking for help.
I've tried:
for a in db.source_file.find():
    try:
        db.destination.insert(a)  # tried insert_one here too
    except:
        print('did not copy')
Nothing is copied, and it keeps printing "did not copy".
I have also tried this:
SOURCE = db.source_file
DESTINATION = db.destination
pipeline = [
    {"$match": {}},
    {"$out": "DESTINATION"},
]
SOURCE.aggregate(pipeline)
This didn't copy anything either.
The source collection definitely contains data: when I tried source_col.find_one(), it printed a document.
Any suggestions?

Have you tried using db.command() function?
Something like this should work.
db.command('eval', 'db.collection.copyTo("newcollection")', nolock=True)
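As a hedged alternative to server-side eval (as far as I know, the eval command and copyTo() were deprecated in MongoDB 3.0 and removed in 4.2): the pipeline in the question passes the literal string "DESTINATION" to $out, so the output would go to a collection with that name rather than to db.destination. Below is a minimal sketch of two ways around that, assuming ordinary PyMongo collection objects. The helper names copy_pipeline and copy_collection are hypothetical, not part of PyMongo.

```python
def copy_pipeline(destination_name):
    """Build an aggregation pipeline that writes every document to destination_name."""
    # $out expects the target collection *name* as a string,
    # not the name of a Python variable that holds the collection.
    return [{"$match": {}}, {"$out": destination_name}]


def copy_collection(source, destination, batch_size=1000):
    """Client-side copy: read from source and insert into destination in batches.

    Assumes both arguments behave like PyMongo collections
    (find() and insert_many()). Returns the number of documents copied.
    """
    batch, copied = [], 0
    for doc in source.find():
        batch.append(doc)
        if len(batch) >= batch_size:
            destination.insert_many(batch)
            copied += len(batch)
            batch = []
    if batch:  # flush the final, partially filled batch
        destination.insert_many(batch)
        copied += len(batch)
    return copied
```

With a live connection this would be used as db.source_file.aggregate(copy_pipeline(db.destination.name)) for a server-side copy, or copy_collection(db.source_file, db.destination) for a client-side one. Note that $out replaces the destination collection if it already exists.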

Related

MongoDB findOneAndReplace log if added as new document or replaced

I'm using Mongo's findOneAndReplace() with upsert = true and returnNewDocument = true, basically as a way to avoid inserting duplicates. But I want to get the _id of the newly inserted document (or of the old existing document) so it can be passed to a background processing task.
BUT I also want to log whether the document was Added-As-New or whether a Replacement took place.
I can't see any way to use findOneAndReplace() with these parameters and answer that question.
The only thing I can think of is to find and insert in two different requests, which seems a bit counter-productive.
P.S. I'm actually using PyMongo's find_one_and_replace(), but it seems identical to the JS Mongo function.
EDIT: edited for clarification.

Is it not possible to use the replace_one function? In Java I am able to use replaceOne, which returns an UpdateResult. That has a method for finding out whether a document was updated or not. I see replace_one in PyMongo and it should behave the same. See the PyMongo documentation and look for replace_one.
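The replace_one suggestion can be sketched roughly as follows. This is only a sketch under assumptions: it relies on the upserted_id attribute of PyMongo's UpdateResult (None when an existing document was replaced), the helper name replace_and_report is hypothetical, and it assumes the replacement document still matches find_query so the follow-up lookup can find it.

```python
def replace_and_report(collection, find_query, document_data):
    """Upsert document_data and report whether it was inserted as new.

    Returns a tuple (document_id, is_new).
    """
    result = collection.replace_one(find_query, document_data, upsert=True)
    if result.upserted_id is not None:
        # No document matched, so a new one was inserted.
        return result.upserted_id, True
    # An existing document was replaced. replace_one does not return its _id,
    # so look it up (assumes the replacement still matches find_query).
    existing = collection.find_one(find_query, {"_id": 1})
    return (existing["_id"] if existing else None), False
```

The replaced case still costs a second request, but the new-versus-replaced decision itself comes from the single replace_one call rather than from a separate check beforehand.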

The way I'm going to implement it for now (in Python):
import pymongo

def find_one_and_replace_log(collection, find_query, document_data, log=None):
    '''Behaves like find_one_and_replace(upsert=True,
    return_document=pymongo.ReturnDocument.AFTER) and records in `log`
    whether the document was newly inserted.
    '''
    # Check whether a matching document already exists.
    # This check and the replace below are two separate requests, not one atomic step.
    is_new = collection.find_one(find_query) is None
    new_or_replaced_document = collection.find_one_and_replace(
        find_query,
        document_data,
        upsert=True,
        return_document=pymongo.ReturnDocument.AFTER
    )
    # Avoid a mutable default argument: only write to a dict the caller passed in.
    if log is not None:
        log['new_document'] = is_new
    return new_or_replaced_document

Retrieve last document in a MongoDB using Pymongo and Flask

I'm working on a Raspberry Pi project that collects weather measurements and stores them in a Mongo database like this:
{
    "_id": {
        "$oid": "577975c874fece5775117209"
    },
    "timestamp": {
        "$date": "2016-07-03T20:30:00.995Z"
    },
    "temp_f": 68.9,
    "temp_c": 20.5,
    "humidity": 50,
    "pressure": 29.5
}
The data is going into the Mongo db just fine. Next, I'm trying to build a Flask-based dashboard that enables me to look at the recorded data. On one of the pages of the dashboard, I want to show the current recorded values, so what I need to do is pull out the last measurement and pass it to a flask template for rendering to the browser.
I found a post here that said I could use:
data = db.measurements.find().limit(1).sort({"$natural": -1})
but natural doesn't seem to be a valid option for the call to find.
This works:
measurement = mongo.db.measurements.find_one()
It pulls back one random measurement that I can then pass to the flask template, but how do I use sort to get the most recent one?
I tried:
measurement = mongo.db.measurements.find_one().sort([("timestamp", -1)])
but that generates an attribute error: AttributeError: 'dict' object has no attribute 'sort'
I've also tried:
cursor = mongo.db.measurements.find().limit(1).sort({"timestamp": -1})
but that doesn't work either.
I'm missing something here. Can someone give me a quick, complete fix for this?

It turns out PyMongo has a different format for sort. You can't pass in a JSON-style object (a dict); you have to use a list of (key, direction) tuples. Once I fixed that, it all worked:
cursor = mongo.db.measurements.find().sort([('timestamp', -1)]).limit(1)
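Since the Flask template needs a single document rather than a cursor, one option is to pass the same sort specification to find_one, which accepts the keyword arguments of find. A minimal sketch, assuming a PyMongo collection whose documents carry a timestamp field as in the question; the helper name latest_measurement is hypothetical.

```python
def latest_measurement(collection):
    """Return the document with the newest timestamp, or None if there are none."""
    # Sort descending (-1) on timestamp and let find_one return the first hit.
    return collection.find_one(sort=[("timestamp", -1)])
```

In the Flask view this would be called as latest_measurement(mongo.db.measurements) and the returned dict passed straight to the template.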

call custom python function on every document in a collection Mongo DB

I want to call a custom Python function on an existing attribute of every document in a collection and store the result as a new key-value pair in that same document. Is there any way to do that (each call is independent of the others)?
I noticed cursor.forEach, but can't it be done efficiently using just Python?
A simple example would be to split the string in text and store the number of words as a new attribute.
def split_count(text):
    # some complex preprocessing...
    return len(text.split())

# Need something like this...
db.collection.update_many({}, {'$set': {"split": split_count('$text') }}, upsert=True)
But it seems like setting a new attribute in a document based on the value of another attribute in the same document is not possible this way yet. This post is old but the issues seem to be still open.

I found a way to call any custom python function on a collection using parallel_scan in PyMongo.
import threading

def process_text(cursor):
    for row in cursor.batch_size(200):
        # Any complex preprocessing here...
        split_text = row['text'].split()
        db.collection.update_one(
            {'_id': row['_id']},
            {'$set': {'split_text': split_text,
                      'num_words': len(split_text)}},
            upsert=True)

def preprocess(num_threads=4):
    # Get up to 'num_threads' cursors.
    cursors = db.collection.parallel_scan(num_threads)
    threads = [threading.Thread(target=process_text, args=(cursor,)) for cursor in cursors]
    # Start every worker, then wait for all of them to finish.
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
This is not really faster than cursor.forEach (but not that slow either), but it helps me execute any arbitrarily complex python code and save the results from within Python itself.
Also if I have an array of ints in one of the attributes, doing cursor.forEach converts them to floats which I don't want. So I preferred this way.
But I would be glad to know if there're any better ways than this :)
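For comparison, here is a single-threaded sketch of the same idea that does not depend on parallel_scan (which, as far as I know, was removed in PyMongo 4.0). It assumes a PyMongo collection whose documents have a text field; the helper name add_word_counts is hypothetical, and split_count is the function from the question.

```python
def split_count(text):
    # Stand-in for arbitrarily complex preprocessing.
    return len(text.split())


def add_word_counts(collection, batch_size=200):
    """Store split_count(text) as a new 'split' field on every document.

    Returns the number of documents updated.
    """
    updated = 0
    # Fetch only the field we need; _id is included by default.
    cursor = collection.find({}, {"text": 1}).batch_size(batch_size)
    for row in cursor:
        if "text" not in row:
            continue  # skip documents without the source attribute
        collection.update_one(
            {"_id": row["_id"]},
            {"$set": {"split": split_count(row["text"])}},
        )
        updated += 1
    return updated
```

Each document still makes a round trip through the client, so this trades speed for the freedom to run any Python code on the values.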

It is quite unlikely that it will ever be efficient to do this kind of thing in python. This is because the document would have to make a round trip and go through the python function on the client machine.
In your example code, you are passing the result of a function to a mongodb update query, which won't work. You can't run any python code inside mongodb queries on the db server.
As the answer to your linked question suggests, this type of action has to be performed in the mongo shell, e.g.:
db.collection.find().snapshot().forEach(
    function (elem) {
        // Count the words in the text field of this document.
        var splitLength = elem.text.split(" ").length;
        db.collection.update(
            { _id: elem._id },
            { $set: { split: splitLength } }
        );
    }
);

Mongo find by regex: return only matching string

My application has the following stack:
Sinatra on Ruby -> MongoMapper -> MongoDB
The application puts several entries in the database. In order to crosslink to other pages, I've added some sort of syntax. e.g.:
Coffee is a black, caffeinated liquid made from beans. {Tea} is made from leaves. Both drinks are sometimes enjoyed with {milk}
In this example {Tea} will link to another DB entry about tea.
I'm trying to query MongoDB for all 'linked terms'. Usually in Ruby I would do something like this: /{([a-zA-Z0-9])+}/ where the () will return a matched string. In Mongo, however, I get the whole record.
How can I get Mongo to return only the matched parts of the record I'm looking for? For the example above it would return:
["Tea", "milk"]
I'm trying to avoid pulling the entire record into Ruby and processing it there.

I don't know if I understand.
db.yourColl.aggregate([
    {
        $match: {"yourKey": {$regex: '[a-zA-Z0-9]', "$options": "i"}}
    },
    {
        $group: {
            _id: null,
            tot: {$push: "$yourKey"}
        }
    }
])
If you don't want duplicates in tot, use $addToSet instead of $push.

The way I solved this problem was to use the string aggregation operators: $indexOfCP to find the starting and ending indices, and $substrCP to extract the string I wanted. Since a record can contain several of these {} terms, you need one projection to identify the indices in one shot and another projection to extract the words you need. Hope this helps.
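On newer servers there may be a shorter route: as far as I know, MongoDB 4.2 added the $regexFindAll aggregation operator, which returns every match together with its capture groups. The sketch below builds such a pipeline as a plain Python structure and also shows the equivalent client-side regex for checking the pattern. The field name text and the helper names are assumptions, and the pattern moves the + inside the capture group so that the whole term is captured rather than only its last character.

```python
import re

# Matches {Term} and captures the term between the braces.
LINK_PATTERN = r"\{([a-zA-Z0-9]+)\}"


def linked_terms_pipeline(field="text"):
    """Aggregation pipeline that projects only the captured terms of `field`."""
    return [
        {"$project": {
            "_id": 0,
            "terms": {
                "$map": {
                    # One entry per match, each with a `captures` array.
                    "input": {"$regexFindAll": {"input": "$" + field,
                                                "regex": LINK_PATTERN}},
                    "as": "m",
                    # Keep only the first capture group of every match.
                    "in": {"$arrayElemAt": ["$$m.captures", 0]},
                }
            },
        }}
    ]


def linked_terms(text):
    """Client-side equivalent, handy for testing the pattern."""
    return re.findall(LINK_PATTERN, text)
```

For the sample sentence in the question, linked_terms returns ['Tea', 'milk']; the pipeline would be passed to the collection's aggregate method.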

Am i correctly using indexes for this mongoDB?

I need some advice as to what I'm doing incorrectly.
My database is set up exactly like a file system consisting of folders and files.
It begins with a folder, which can have a practically unlimited number of subfolders and/or files.
{
    "name":"folder1",
    "uniqueID":"zzz0",
    "subcontents": [
        {"name":"subfolder1", "uniqueID":"zzz1"},
        {"name":"subfile1", "uniqueID":"zzz2"},
        {"name":"subfile2", "uniqueID":"zzz3"},
        {"name":"subfolder2", "subcontents": [...etc...], "uniqueID":"zzz4"}
    ]
}
Each folder/file document has a uniqueID so that I can reference it (the zzz# values above). My question is: can I write a MongoDB query that pulls out only a single document?
For example, would db.fileSystemCollection.find({"uniqueID":"zzz4"}) give me the following result? Do I have to use indexes to do this? I've been trying, but the query returns empty every time.
intended result ---> {"name":"subfolder2", "subcontents": [...etc...], "uniqueID":"zzz4"}
[EDIT]
Based on the responses below, I will consider an XML database instead of MongoDB. The JSON structure can't be rearranged to work with MongoDB (too much data).

The short answer is no, as stated by Chris.
Your embedded representation of a tree is really good for intuitive understanding (and implementation as well). But if you want to allow efficient searches on your tree using indexes in MongoDB, you might consider other ways of storing the tree. Several are listed at http://docs.mongodb.org/manual/tutorial/model-tree-structures/
Please keep in mind that every representation has its own pros and cons depending on your access patterns.
Since a filesystem-like structure will probably need to find all the subcontents of a given folder, you may use the child references pattern for this:
{
    "name":"folder1",
    "uniqueID":"zzz0",
    "subcontents": [
        "zzz1",
        "zzz2",
        "zzz3",
        "zzz4"
    ]
}
{
    "name":"subfolder1",
    "uniqueID":"zzz1"
}
...
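Converting an existing nested document into the child references layout above could be sketched like this. It is only an illustration that assumes the field names from the question (uniqueID, subcontents); the helper name flatten_tree is hypothetical.

```python
def flatten_tree(node):
    """Turn one nested folder document into a flat list of documents.

    Each returned document keeps its own fields, but its 'subcontents'
    holds only the uniqueID values of its direct children.
    """
    children = node.get("subcontents", [])
    flat = {key: value for key, value in node.items() if key != "subcontents"}
    if "subcontents" in node:
        flat["subcontents"] = [child["uniqueID"] for child in children]
    docs = [flat]
    for child in children:
        docs.extend(flatten_tree(child))  # recurse into subfolders
    return docs
```

Every folder and file then becomes its own top-level document, so a query on uniqueID (with an index on that field) can return exactly one of them.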

No; searching for {uniqueID: "zzz4"} will only get you documents whose top-level uniqueID matches.
What you probably want is to maintain an array on the document which lists all the unique IDs in that tree. So your document would be:
{
    "name":"folder1",
    "uniqueID":"zzz0",
    "idList": ["zzz0", "zzz1", "zzz2", "zzz3", "zzz4"],
    "subcontents": [
        {"name":"subfolder1", "uniqueID":"zzz1"},
        {"name":"subfile1", "uniqueID":"zzz2"},
        {"name":"subfile2", "uniqueID":"zzz3"},
        {"name":"subfolder2", "subcontents": [...etc...], "uniqueID":"zzz4"}
    ]
}
Then you can index that:
db.fileSystemCollection.ensureIndex({"idList": 1})
Then you can find on it:
db.fileSystemCollection.find({"idList": "zzz4"})
That'll return the top-level documents whose tree contains that ID.
As an aside, if you're trying to store files in Mongo, have you looked at GridFS?
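The idList array described in this answer has to be computed before each document is stored. A minimal sketch of that step, assuming the field names from the question; the helper names collect_ids and with_id_list are hypothetical.

```python
def collect_ids(node):
    """Return the uniqueID of node followed by those of all its descendants."""
    ids = [node["uniqueID"]]
    for child in node.get("subcontents", []):
        ids.extend(collect_ids(child))
    return ids


def with_id_list(root):
    """Return a shallow copy of the top-level document with idList added."""
    doc = dict(root)
    doc["idList"] = collect_ids(root)
    return doc
```

The list would need to be recomputed whenever a file or folder is added to or removed from the tree.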
