I've been trying to figure out how to create a CouchDB view that will let me query all the documents that have a start date greater than A and an end date less than B.
Is this possible in CouchDB or another noSQL document store? Should I scrap it and go back to SQL?
I'm simply trying to do the SQL equivalent of:
SELECT * FROM docs WHERE doc.start >= [start timestamp] AND doc.end < [end timestamp];
Just create a map like this:
function (doc) {emit(doc.timestamp, 1)}
then query the view with:
?descending=true&limit=10&include_docs=true // Get the latest 10 documents
The view will be sorted oldest to latest, so descending=true reverses that order.
If you want a specific range:
?startkey="1970-01-01T00:00:00Z"&endkey="1971-01-01T00:00:00Z"
would get you everything in 1970.
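For example, assuming a database named mydb and the map saved as a view named by_timestamp in a design document named docs (all three names are hypothetical), the full request could look like:
curl 'http://localhost:5984/mydb/_design/docs/_view/by_timestamp?startkey="1970-01-01T00:00:00Z"&endkey="1971-01-01T00:00:00Z"'
This works because ISO 8601 timestamps collate chronologically as plain strings.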
These should help:
http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views
http://wiki.apache.org/couchdb/HttpViewApi
http://wiki.apache.org/couchdb/View_collation
Use an array key in your map function:
function (doc) {
    var key = [doc.start, doc.end];
    emit(key, doc);
}
Then to get documents with a start date greater than 1970-01-01T00:00:00Z and an end date before 1971-01-01T00:00:00Z, use the query:
?startkey=["1970-01-01T00:00:00Z", ""]&endkey=["\ufff0", "1971-01-01T00:00:00Z"]
Note that CouchDB collates array keys element by element, so a range like this effectively bounds only the start date; documents whose end date falls outside the range can still be returned, and you may need to filter those out on the client.
I was looking for the same thing and stumbled upon this question. With CouchDB 2.0 or higher you can use Mango queries, which include greater-than and less-than operators.
A Mango query could look like:
{
    "selector": {
        "effectiveDate": {
            "$gte": "2000-04-29T00:00:00.000Z",
            "$lt": "2020-05-01T00:00:00.000Z"
        }
    }
}
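As a complete request this might look like the following (a sketch; the database name mydb is hypothetical):
curl -X POST 'http://localhost:5984/mydb/_find' \
    -H 'Content-Type: application/json' \
    -d '{"selector": {"effectiveDate": {"$gte": "2000-04-29T00:00:00.000Z", "$lt": "2020-05-01T00:00:00.000Z"}}}'
To avoid a full scan, you would normally also create a JSON index on effectiveDate first, e.g. by POSTing {"index": {"fields": ["effectiveDate"]}} to /mydb/_index.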
Use startkey and endkey. This way you can decide your date range at runtime without slowing down your query.
Let's say I have a collection with documents that look like this:
{
    _id: <someMongoId>,
    status: 'start',
    otherImportantData: 'Important!'
}
...and status can be 'start', 'middle', or 'end'.
I want to sort these documents (specifically in the aggregation framework) by the status field - but I don't want it in alphabetical order; I want to sort in the order start -> middle -> end.
I've been looking for some way to project the status field to a statusValue field that is numeric (and where I get to dictate the numbers each string maps to), although I'd be happy to look at any alternatives.
Let's say I could do that. I'd take status and map it as such:
start: 1
middle: 2
end: 3
<anything else>: 0
then I could do something like this in the aggregation pipeline:
{
$sort: { statusValue : 1 }
}
...but I'm not sure how to get those statuses mapped in the aggregation pipeline.
If there is no way to do it, at least I'll know to stop looking. What would be the next best way? Using the Map-Reduce features in MongoDB?
You can try the below aggregation in MongoDB 3.4.
Use $indexOfArray to locate the position of the status string in the list of values, $addFields to keep the computed index in an extra field on the document, and $sort to sort the documents:
[
    {"$addFields": {"statusValue": {"$indexOfArray": [["start", "middle", "end"], "$status"]}}},
    {"$sort": {"statusValue": 1}}
]
$indexOfArray returns -1 for anything not in the list, so unmatched statuses sort first, which matches the intent of mapping <anything else> to 0.
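On servers older than 3.4, where $addFields and $indexOfArray are unavailable, a sketch of the same idea using $project with nested $cond expressions (field names taken from the question) would be:
db.collection.aggregate([
    {"$project": {
        "status": 1,
        "otherImportantData": 1,
        // map each status string onto the rank given in the question
        "statusValue": {
            "$cond": [{"$eq": ["$status", "start"]}, 1,
            {"$cond": [{"$eq": ["$status", "middle"]}, 2,
            {"$cond": [{"$eq": ["$status", "end"]}, 3, 0]}]}]
        }
    }},
    {"$sort": {"statusValue": 1}}
])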
I have documents in a MongoDB collection, each with a timestamp (field name = expires).
I need to get all the documents that are between latest timestamp present in collection and latest timestamp-90 minutes.
For example,
Current clock time is 4pm.
The timestamp of the latest document in MongoDB is 2pm.
I need to get documents between 2pm and 12:30pm.
All the answers I found provide queries for documents that lie between the current clock time and 90 minutes before that (in this example, 2:30pm to 4pm).
I could do it in two queries: in the first, I get the latest timestamp from MongoDB, and then I issue a second query that matches documents between that timestamp and the timestamp 90 minutes earlier.
pipeline = []
sort = {
    "$sort": {
        "expires": -1
    }
}
limit = {
    "$limit": 1
}
pipeline.append(sort)
pipeline.append(limit)
And calculate:
end_time = (result['result'][0])['expires']
start_time = end_time - datetime.timedelta(minutes=90)
And the second query would be:
pipeline = []
match = {
    "$match": {
        "expires": {
            "$gt": start_time,
            "$lte": end_time,
            "$type": 18
        }
    }
}
pipeline.append(match)
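For completeness, the two round trips glued together might look like this (a sketch assuming pymongo with a hypothetical collection handle coll; the result['result'] shape matches what aggregate() returned on 2.4-era drivers):
import datetime

# First query: find the latest expires value in the collection.
result = coll.aggregate([
    {"$sort": {"expires": -1}},
    {"$limit": 1}
])
end_time = result['result'][0]['expires']
start_time = end_time - datetime.timedelta(minutes=90)

# Second query: fetch everything in the 90-minute window ending at end_time.
docs = coll.find({"expires": {"$gt": start_time, "$lte": end_time}})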
Is there a way to do this in a single query using the aggregation pipeline?
Please provide the link to the answer if posted already.
Thanks
Edit: I am using MongoDB 2.4
It is quite simple and can be done in a single aggregation pipeline, without expensive $sort or $unwind operations. One way of doing it would be:
$group all the records together, accumulating them in an array named result. In the subsequent stages we traverse this array and keep only the records we are interested in; for now it holds every record in the collection. This grouping is done in order to get the $max expires value for the entire collection.
$redact through the array, $$DESCENDing into those records whose expires field is $gte the maximum expires minus ($subtract) 90 minutes (5400000 ms).
$project the result array containing the matching records.
Pipeline, which you can easily plug in to your python code:
db.collection.aggregate([
{$group:{"_id":null,
"result":{$push:"$$ROOT"},
"maxTimeStamp":{$max:"$expires"}}},
{$redact:{$cond:[{$gte:[{$ifNull:["$expires","$maxTimeStamp"]},
{$subtract:["$maxTimeStamp",5400000]}]},
"$$DESCEND",
"$$PRUNE"]}},
{$project:{"result":1,"_id":0}}
])
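Plugged into pymongo, the same pipeline might look like this (a sketch; coll is a hypothetical collection handle, $redact and $$ROOT require MongoDB 2.6+, and on 2.x-era drivers aggregate() returns a dict with a 'result' key):
pipeline = [
    {"$group": {"_id": None,
                "result": {"$push": "$$ROOT"},
                "maxTimeStamp": {"$max": "$expires"}}},
    {"$redact": {"$cond": [{"$gte": [{"$ifNull": ["$expires", "$maxTimeStamp"]},
                                     {"$subtract": ["$maxTimeStamp", 5400000]}]},
                           "$$DESCEND",
                           "$$PRUNE"]}},
    {"$project": {"result": 1, "_id": 0}}
]
# The matched documents end up in the "result" array of the single output doc.
matching = coll.aggregate(pipeline)['result'][0]['result']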
For earlier versions, where the $redact stage is not available (it was introduced in MongoDB 2.6), you would need an alternative approach using $unwind and $project:
$unwind the result array.
$project a field, which holds a boolean value, to indicate if the expires field matches our criteria.
$match all the documents which are marked selectable.
The modified approach:
db.collection.aggregate([
{$group:{"_id":null,
"result":{$push:{"expires":"$expires"}},
"maxTimeStamp":{$max:"$expires"}}},
{$unwind:"$result"},
{$project:{"selectable":{$cond:[{$gte:["$result.expires",
{$subtract:["$maxTimeStamp",5400000]}]},
true,
false]},"result":1}},
{$match:{"selectable":true}},
{$project:{"result":1,"_id":0}}
])
I am using the below query on my MongoDB collection which is taking more than an hour to complete.
db.collection.find({language:"hi"}).sort({_id:-1}).skip(5000).limit(1)
I am trying to get the results in batches of 5000, to process documents with "hi" as the value of the language field in either ascending or descending order. So I am using this query, in which I skip the already-processed documents each time by incrementing the "skip" value.
The document count in this collection is just above 20 million.
An index on the field "language" is already created.
The MongoDB version I am using is 2.6.7.
Is there a more appropriate index for this query which can get the result faster?
When you want to sort descending, you should create a multi-field index which uses the field(s) you sort on as descending field(s). You do that by setting those field(s) to -1.
This index should greatly increase the performance of your sort:
db.collection.ensureIndex({ language: 1, _id: -1 });
When you also want to speed up the other case - retrieving sorted in ascending order - create a second index like this:
db.collection.ensureIndex({ language: 1, _id: 1 });
Keep in mind that when you do not sort your results, you receive them in natural order. Natural order is often insertion order, but there is no guarantee for that. There are various events which can cause the natural order to get messed up, so when you care about the order you should always sort explicitly. The only exception to this rule are capped collections which always maintain insertion order.
In order to efficiently "page" through results in the way that you want, it is better to use a "range query" and keep the last value you processed.
Your desired "sort key" here is _id, so that makes things simple.
First you want your index in the correct order, which is done with .createIndex() (the non-deprecated replacement for ensureIndex()):
db.collection.createIndex({ "language": 1, "_id": -1 })
Then you want to do some simple processing, from the start:
var lastId = null;
var cursor = db.collection.find({ "language": "hi" });
cursor.sort({ "_id": -1 }).limit(5000).forEach(function(doc) {
    // do something with your document, but always set the next line
    lastId = doc._id;
});
That's the first batch. Now when you move on to the next one:
var cursor = db.collection.find({ "language": "hi", "_id": { "$lt": lastId } });
cursor.sort({ "_id": -1 }).limit(5000).forEach(function(doc) {
    // do something with your document, but always set the next line
    lastId = doc._id;
});
So that the lastId value is always considered when making the selection. You store this between each batch, and continue on from the last one.
That is much more efficient than processing with .skip(), which regardless of the index will "still" need to "skip" through all data in the collection up to the skip point.
Using the $lt operator here "filters" all the results you already processed, so you can move along much more quickly.
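Put together, the whole scan might look like this (a mongo shell sketch; processDoc is a hypothetical placeholder for your per-document work):
var lastId = null;
while (true) {
    var query = { "language": "hi" };
    if (lastId !== null) {
        query["_id"] = { "$lt": lastId };    // resume below the last processed _id
    }
    var batch = db.collection.find(query).sort({ "_id": -1 }).limit(5000).toArray();
    if (batch.length === 0) break;           // nothing left to process
    batch.forEach(function(doc) {
        processDoc(doc);                     // hypothetical per-document work
        lastId = doc._id;
    });
}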
I want to do something like this:-
db.pub_pairs.find({status:'active'})
.sort( { if (pcat="abc") then 0 else 1 end })
.limit(10)
In other words, I want it to prefer records where the field pcat is "abc".
I also need it to be fast, but I can add indexes as necessary.
Any suggestions on how to do this?
Thanks
A decently efficient way of doing that would be to query the DB twice: once for documents having pcat equal to "abc", and once (if needed) for documents having a different pcat.
Something along these lines:
var data = db.pub_pairs.find({ status: 'active', pcat: 'abc' })
                       .limit(10).toArray();
if (data.length < 10) {
    data = data.concat(db.pub_pairs.find({ status: 'active', pcat: { $ne: 'abc' } })
                                   .limit(10 - data.length).toArray());
}
For this to perform well, I would suggest a compound index on {status: 1, pcat: 1}
Please note that, like any other "find" with MongoDB, you have to be prepared to retrieve the same document twice in the result set, notably here if the pcat field is concurrently modified by another client during execution.
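Alternatively, on MongoDB 3.4+ you could compute the preference as a sort key inside an aggregation pipeline and do it in one round trip, in the same spirit as the $indexOfArray answer above (a sketch):
db.pub_pairs.aggregate([
    { $match: { status: "active" } },
    // rank "abc" first, everything else after
    { $addFields: { pref: { $cond: [{ $eq: ["$pcat", "abc"] }, 0, 1] } } },
    { $sort: { pref: 1 } },
    { $limit: 10 }
])
The same { status: 1, pcat: 1 } index helps the $match stage, though the $sort on the computed field happens in memory.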
I have a MongoDB collection that looks something like this:
{
    _id: ObjectId('aabbccddeeff'),
    objectName: 'MyFirstObject',
    objectLength: 0xDEADBEEF,
    objectSource: 'Source1',
    accessCounter: {
        'firstLocationCode': 283,
        'secondLocationCode': 543,
        'thirdLocationCode': 564,
        'fourthLocationCode': 12
    }
}
...
Now, assuming that this is not the only record in the collection, and that most/all of the documents contain the accessCounter subdocument, how would I go about selecting the first x documents with the most accesses from a specific location?
A sample "query" will be something like:
"Select the first 10 documents From myCollection where the accessCounter.firstLocationCode are the highest"
So a sample result will be X documents where the accessCounter. will be the greatest is the database.
Thank your for taking the time to read my question.
No need for an aggregation, that is a basic query:
db.collection.find().sort({"accessCounter.firstLocationCode": -1}).limit(10)
In order to speed this up, you should first create a descending index on the field you sort by:
db.collection.ensureIndex({'accessCounter.firstLocationCode': -1})
If you want to run the same query for the other locations, create one such index per location field; an index on the whole accessCounter subdocument would not support sorting on its individual fields.
You can speed this up further, in case you only need the accessCounter value, by making this a so-called covered query: a query whose returned values come entirely from the index itself (note that covering queries on fields of embedded documents requires a recent MongoDB version). For example, with accessCounter.secondLocationCode indexed and querying for the top second locations, you should be able to do a covered query with:
db.collection.find({}, {_id: 0, "accessCounter.secondLocationCode": 1})
             .sort({"accessCounter.secondLocationCode": -1}).limit(10)
which translates to: "Get all documents ({}), don't return the _id field as you would by default (_id: 0), return only the accessCounter.secondLocationCode field (accessCounter.secondLocationCode: 1). Sort the returned values in descending order and give me the first ten."
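To check whether the query is actually covered, you can look at its plan (a sketch; the explain() output shape differs between server versions):
db.collection.find({}, {_id: 0, "accessCounter.secondLocationCode": 1})
             .sort({"accessCounter.secondLocationCode": -1}).limit(10)
             .explain()
In a covered plan no documents are fetched: old-style explain reports indexOnly: true, while the newer executionStats output shows totalDocsExamined: 0.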