I'm working on a Raspberry Pi project that collects weather measurements and stores them in a Mongo database like this:
{
  "_id": { "$oid": "577975c874fece5775117209" },
  "timestamp": { "$date": "2016-07-03T20:30:00.995Z" },
  "temp_f": 68.9,
  "temp_c": 20.5,
  "humidity": 50,
  "pressure": 29.5
}
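For context, the Pi writes each reading with something like this (just a sketch; the client setup and database name are illustrative, not my actual code):
import datetime
from pymongo import MongoClient

client = MongoClient()      # assumes a local mongod on the default port
db = client["weather"]      # hypothetical database name

# Each reading is stored with a BSON date so it can be sorted later.
db.measurements.insert_one({
    "timestamp": datetime.datetime.utcnow(),
    "temp_f": 68.9,
    "temp_c": 20.5,
    "humidity": 50,
    "pressure": 29.5,
})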
The data is going into the Mongo db just fine. Next, I'm trying to build a Flask-based dashboard that lets me look at the recorded data. On one page of the dashboard I want to show the current recorded values, so I need to pull out the last measurement and pass it to a Flask template for rendering in the browser.
I found a post here that said I could use:
data = db.measurements.find().limit(1).sort({"$natural": -1})
but that sort argument doesn't seem to be valid for PyMongo's find.
This works:
measurement = mongo.db.measurements.find_one()
It pulls back one random measurement that I can then pass to the flask template, but how do I use sort to get the most recent one?
I tried:
measurement = mongo.db.measurements.find_one().sort([("timestamp", -1)])
but that fails with AttributeError: 'dict' object has no attribute 'sort', since find_one returns a plain dict rather than a cursor.
I've also tried:
cursor = mongo.db.measurements.find().limit(1).sort({"timestamp": -1})
but that doesn't work either.
I'm missing something here, can someone give me a quick, complete fix for this?
It turns out PyMongo has a different format for sort: you can't pass in a dict the way the mongo shell does, you have to pass a list of (key, direction) tuples. Once I fixed that, it all works:
cursor = mongo.db.measurements.find().sort([('timestamp', -1)]).limit(1)
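For completeness, find_one accepts the same sort spec directly, which avoids the cursor entirely. A sketch of the dashboard route (the route and template names are made up for illustration):
from flask import Flask, render_template
from flask_pymongo import PyMongo

app = Flask(__name__)
app.config["MONGO_URI"] = "mongodb://localhost:27017/weather"  # hypothetical URI
mongo = PyMongo(app)

@app.route("/current")
def current():
    # find_one forwards its keyword arguments to find, so the same
    # list-of-tuples sort spec works here too.
    measurement = mongo.db.measurements.find_one(sort=[("timestamp", -1)])
    return render_template("current.html", measurement=measurement)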
I am trying to perform aggregate queries using SumoLogic APIs as mentioned here.
Something like:
_view = <some_view> | where sourceCategory matches \"something\" | sum(field) by sourceCategory
This works just fine in the Sumo GUI. I get a field in the result called "_sum" which gives me the desired result.
However, the same doesn't work when I do it using the Sumo APIs. If I create a job with this body:
{
  "query": "_view = <some_view> | where sourceCategory matches \"something\" | sum(field) by sourceCategory",
  "from": "start_timestamp",
  "to": "end_timestamp",
  "timeZone": "some_timezone"
}
I call the "v1/search/jobs" POST method with the above body, then poll GET "v1/search/jobs/{job_id}" until the state is "DONE GATHERING RESULTS". Then I call "v1/search/jobs/{job_id}/messages". I was expecting to see aggregated values in the result, but instead I see something similar to:
{
  "fields": [
    {
      "name": "_messageid",
      "fieldType": "long",
      "keyField": false
    }, ...
  ],
  "messages": [
    {
      "map": {
        "_receipttime": "1359407350899",
        "_size": "549",
        "_sourcecategory": "service",
        "_sourceid": "1640",
        "the_field_i_mentioned": "not-aggregated-value",
        "_messagecount": "2044"
      }
    }, ...
  ]
}
Thanks for going through my question. Any advice / workarounds are appreciated. I don't really want to iterate manually through all the items and calculate the sum; I'd prefer to do it on the SumoLogic side itself. Thanks again!
Explanation
As in the user interface, in the API for log searches you get both the raw results (also referred to as messages) and the aggregate results (also referred to as records).
(Obviously, the latter are only returned if there's any aggregation in the query. In your case there is.)
Actual suggestion
Then I do "v1/search/jobs/{job_id}/messages"
Try /records instead.
See the docs for "Paging through the records found by a Search Job"
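A minimal Python sketch of the whole flow, assuming the requests library, a US deployment endpoint, and placeholder credentials (adjust all three for your account; the paths follow the Search Job API docs):
import time
import requests

API = "https://api.sumologic.com/api/v1"   # deployment-specific endpoint
session = requests.Session()               # the API is cookie-based: reuse one session
session.auth = ("<accessId>", "<accessKey>")

# 1. Create the search job.
job = session.post(f"{API}/search/jobs", json={
    "query": '_view = <some_view> | where sourceCategory matches "something" | sum(field) by sourceCategory',
    "from": "start_timestamp",
    "to": "end_timestamp",
    "timeZone": "some_timezone",
}).json()

# 2. Poll until the job has finished gathering results.
while True:
    status = session.get(f"{API}/search/jobs/{job['id']}").json()
    if status["state"] == "DONE GATHERING RESULTS":
        break
    time.sleep(2)

# 3. Fetch the aggregate rows from /records (not /messages).
records = session.get(f"{API}/search/jobs/{job['id']}/records",
                      params={"offset": 0, "limit": status["recordCount"]}).json()
for rec in records["records"]:
    print(rec["map"]["_sum"])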
Disclaimer: I am currently employed by Sumo Logic.
I have 10,000 documents in a collection and want to copy all of them to another collection.
I'm new to MongoDB, have been stuck here for a while, and am looking for help.
I've tried
for a in db.source_file.find():
    try:
        db.destination.insert(a)  # tried insert_one here too
    except:
        print('did not copy')
Nothing copies, and it keeps printing "did not copy".
I also have tried this
SOURCE = db.source_file
DESTINATION = db.destination
pipeline = [
    {"$match": {}},
    {"$out": "DESTINATION"},
]
SOURCE.aggregate(pipeline)
This didn't copy anything either
The source collection definitely contains data, because when I try source_col.find_one(), it prints out a document.
Any suggestions?
Have you tried using db.command() function?
Something like this should work on older MongoDB servers (the eval command was removed in MongoDB 4.2):
db.command('eval', 'db.collection.copyTo("newcollection")', nolock=True)
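Since eval is gone in MongoDB 4.2+ (and copyTo was deprecated before that), on a current server a plain PyMongo copy is safer. A sketch using the collection names from the question:
# Option 1: pull the documents through the client in one batch.
docs = list(db.source_file.find())
if docs:
    db.destination.insert_many(docs)

# Option 2: let the server do the copy with an aggregation $out stage.
# Note that $out takes the literal target collection name as a string,
# not the name of a Python variable.
db.source_file.aggregate([{"$match": {}}, {"$out": "destination"}])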
I want to call a custom python function on some existing attribute of every document in the entire collection and store the result as a new key-value pair in that (same) document. May I know if there's any way to do that (since each call is independent of others) ?
I noticed cursor.forEach but can't it be done just using python efficiently ?
A simple example would be to split the string in text and store the number of words as a new attribute.
def split_count(text):
    # some complex preprocessing...
    return len(text.split())

# Need something like this...
db.collection.update_many({}, {'$set': {'split': split_count('$text')}}, upsert=True)
But it seems like setting a new attribute in a document based on the value of another attribute in the same document is not possible this way yet. This post is old but the issues still seem to be open.
I found a way to call any custom python function on a collection using parallel_scan in PyMongo.
import threading

def process_text(cursor):
    for row in cursor.batch_size(200):
        # Any complex preprocessing here...
        split_text = row['text'].split()
        db.collection.update_one({'_id': row['_id']},
                                 {'$set': {'split_text': split_text,
                                           'num_words': len(split_text)}},
                                 upsert=True)

def preprocess(num_threads=4):
    # Get up to max 'num_threads' cursors.
    cursors = db.collection.parallel_scan(num_threads)
    threads = [threading.Thread(target=process_text, args=(cursor,))
               for cursor in cursors]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
This is not really faster than cursor.forEach (but not that slow either), but it lets me execute arbitrarily complex Python code and save the results from within Python itself.
Also, if I have an array of ints in one of the attributes, cursor.forEach converts them to floats, which I don't want. So I preferred this way.
But I would be glad to know if there're any better ways than this :)
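One improvement worth trying, sketched here on the assumption you're on PyMongo 3.x: batch the writes with bulk_write instead of issuing one update_one round trip per document:
from pymongo import UpdateOne

def process_text_bulk(cursor, batch_size=500):
    ops = []
    for row in cursor.batch_size(200):
        split_text = row['text'].split()
        ops.append(UpdateOne({'_id': row['_id']},
                             {'$set': {'split_text': split_text,
                                       'num_words': len(split_text)}},
                             upsert=True))
        if len(ops) >= batch_size:
            db.collection.bulk_write(ops, ordered=False)  # one round trip per batch
            ops = []
    if ops:
        db.collection.bulk_write(ops, ordered=False)      # flush the final partial batch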
It is quite unlikely that it will ever be efficient to do this kind of thing in Python, because every document would have to make a round trip to the client machine and go through the Python function there.
In your example code, you are passing the result of a function call to a MongoDB update query, which won't work: you can't run any Python code inside MongoDB queries on the db server.
As the answer to your linked question suggests, this type of action has to be performed in the mongo shell, e.g.:
db.collection.find().snapshot().forEach(
    function (elem) {
        var splitLength = elem.text.split(" ").length;
        db.collection.update(
            { _id: elem._id },
            { $set: { split: splitLength } }
        );
    }
);
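That said, if your server is MongoDB 4.2 or newer, an aggregation-pipeline update can compute one field from another entirely server-side, with no JavaScript and no per-document round trips. A sketch of the word-count example (assuming text is a space-separated string):
# Requires MongoDB 4.2+ and PyMongo 3.9+, where update_many accepts a pipeline.
db.collection.update_many(
    {'text': {'$type': 'string'}},   # skip documents without a usable text field
    [{'$set': {'split': {'$size': {'$split': ['$text', ' ']}}}}]
)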
Mongo has a nice feature that tells you when a document was created.
ObjectId("53027f0adb97425bbd0cce39").getTimestamp() = ISODate("2014-02-17T21:28:42Z")
How would I go about finding all documents that were created before, let's say, February 10th 2014? I searched around but it doesn't seem like this question comes up. Any help is appreciated! Thanks!
You mean something like this?
db.YOUR_COLLECTION.find({YOUR_DATE_FIELD: { "$lt": ISODate("2014-02-10") }})
I guess you have to do the same as JoJo recommended:
Convert the date to an ObjectId
Filter _id using $lt and the returned ObjectId
Using PyMongo you can do something like this:
import datetime
from bson.objectid import ObjectId

gen_time = datetime.datetime(2014, 2, 10)
dummy_id = ObjectId.from_datetime(gen_time)
result = collection.find({"_id": {"$lt": dummy_id}})
Reference: http://api.mongodb.org/python/1.7/api/pymongo/objectid.html
I have documents in MongoDB that I'm accessing with Mongoid, and they look like this:
{
  "first_name": "Clarke",
  "last_name": "Kent",
  "vault_info": {
    "container": "names",
    "created_at": "2013-12-09T23:18:07.963Z",
    "updated_at": "2013-12-09T23:18:07.963Z",
    "vault_id": "4dc08baa97"
  }
}
I want to be able to query for it using the for_js method like this:
Model.for_js('this.vault_info.vault_id=="5088de6f12"')
If there is a single document in the database this works. If there is more than one it gives this error:
"TypeError: Cannot read property 'vault_id' of undefined near '==\"5088de6f12\"' "
Any help would be appreciated.
Robert
It looks like the new documents you're adding don't have a vault_info property on them. That means that this.vault_info evaluates to undefined, and you can't access any property (including vault_id) of undefined.
If you don't have to use the for_js method, just use the where method like this:
Model.where('vault_info.vault_id' => '5088de6f12')
Mongo intelligently filters out all documents that lack the vault_info property, so you don't get any errors about accessing properties of undefined. JavaScript doesn't help you out like that, so if you absolutely have to use the for_js method, you'll have to check each step of the way. That'd look something like:
Model.for_js('this.vault_info && this.vault_info.vault_id=="5088de6f12"')
I'd encourage you to use the where method, though. The less JavaScript you send to MongoDB the better.