I have a collection subscribers.
I want to get a segment of subscribers by applying sometimes complex filters in the query db.subscribers.find({ age: { $gt: 20 }, ...etc }), but I don't want to save the result, since that would be inefficient.
Instead, I would like to save only the filters applied in the query as a set of rules in the segments collection.
Is that a good approach and what would be an efficient way to do that?
Should I just save the query object itself as a document or define a more restrictive schema before saving?
Related
I want to have pagination and sampling.
I have a client that requests entities in a random order.
I want to implement a new feature - pagination.
The only way I can think of is to use match to ensure the entities to be served do not have ID's of the entities already served, but this sounds like an expensive operation.
Is there a way to randomise the order of entities in a collection, and save it in a way? So that I can use skip on it?
How about using a combination of $sample and $out?
db.collection.aggregate({
$sample: { size: 300 } // get 300 random documents
}, {
$out: "temp" // write results to a new "temp" collection
})
Indexing is possible while fetching some particular records, sorting, ordering etc but suppose a collection contains lot many documents and it is taking time to fetch them all and display. So how to make this query faster using indexing? Is it possible using indexing? If yes then is it the best way or is there any other way too?
EDIT 1
Since indexing can't be used in my case. What is the most efficient way to write a query to fetch millions of records?
Edit 2
This is my mongoose query function for fetching data from some collection. If this collection has millions of data then obviously it will affect performance so how will you use indexing in this case for good performance?
Info.statics.findAllInfo = function( callback) {
this.aggregate([{$project:{
"name":"$some_name",
"age":"$some_age",
"city":"$some_city",
"state":"$some_state",
"country":"$some_country",
"zipcode":"$some_zipcode",
"time":{ $dateToString: { format: "%Y-%m-%d %H:%M:%S ", date: "$some_time" } }
}},
{
$sort:{
_id:-1
}
}],callback);
};
I haven't tried lean() method yet due to some temporary issue. But still i would like to know whether it will help or not?
I have some simple transaction-style data in a flat format like the following:
{
'giver': 'Alexandra',
'receiver': 'Julie',
'amount': 20,
'year_given': 2015
}
There can be multiple entries for any 'giver' or 'receiver'.
I am mostly querying this data based on the giver field, and then split up by year. So, I would like to speed up these queries.
I'm fairly new to Mongo so I'm not sure which of the following methods would be the best course of action:
1. Restructure the data into the format:
{
'giver': 'Alexandra',
'transactions': {
'2015': [
{
'receiver': 'Julie',
'amount': 20
},
...
],
'2014': ...,
...
}
}
This makes me the most sense to me. We place all transactions into subdocuments of a person rather than having transactions all over the collection. It provides the data in the form I query it by the most, so it should be fast to query by 'giver' and then 'transactions.year'
I'm unsure if restructuring data like this is possible inside of mongo or if I should export it and modify it outside of mongo via some programming language.
2. Simply index by 'giver'
This doesn't quite match the way I'm querying this data (by 'giver' then 'year'), but it could be fast enough to do what I'm looking for. It's simple within mongo to do, and doesn't require restructuring of the data.
How should I go about adjusting my database to make my queries faster? And which way is the 'Mongo way'?
I have data across three collections and need to produce a data set which aggregates data from these collections, and filters by a date range.
The collections are:
db.games
{
_id : ObjectId,
startTime : MongoDateTime
}
db.entries
{
player_id : ObjectId, // refers to db.players['_id']
game_id : ObjectId // refers to db.games['_id']
}
db.players
{
_id : ObjectId,
screen_name,
email
}
I want to return a collection which is number of entries by player for games within a specified range. Where the output should look like:
output
{
player_id,
screen_name,
email,
sum_entries
}
I think I need to start by creating a collection of games within the date range, combined with all the entries and then aggregate over count of entries, and finally output collection with the player data, it's seems a lot of steps and I'm not sure how to go about this.
The reason why you have these problems is because you try to use MongoDB like a relational database, not like a document-oriented database. Normalizing your data over many collections is often counter-productive, because MongoDB can not perform any JOIN-operations. MongoDB works much better when you have nested documents which embed other objects in arrays instead of referencing them. A better way to organize that data in MongoDB would be to either have each game have an array of players which took part in it or to have an array in each player with the games they took part in. It's also not necessarily a mistake to have some redundant additional data in these arrays, like the names and not just the ID's.
But now you have the problem, so let's see how we can deal with it.
As I said, MongoDB doesn't do JOINs. There is no way to access data from more than one collection at a time.
One thing you can do is solving the problem programmatically. Create a program which fetches all players, then all entries for each player, and then the games referenced by the entries where startTimematches.
Another thing you could try is MapReduce. MapReduce can be used to append results to another collection. You could try to use one MapReduce job for each of the relevant collections into one and then query the resulting collection.
I've a collection named Events. Each Eventdocument have a collection of Participants as embbeded documents.
Now is my question.. is there a way to query an Event and get all Participants thats ex. Age > 18?
When you query a collection in MongoDB, by default it returns the entire document which matches the query. You could slice it and retrieve a single subdocument if you want.
If all you want is the Participants who are older than 18, it would probably be best to do one of two things:
Store them in a subdocument inside of the event document called "Over18" or something. Insert them into that document (and possibly the other if you want) and then when you query the collection, you can instruct the database to only return the "Over18" subdocument. The downside to this is that you store your participants in two different subdocuments and you will have to figure out their age before inserting. This may or may not be feasible depending on your application. If you need to be able to check on arbitrary ages (i.e. sometimes its 18 but sometimes its 21 or 25, etc) then this will not work.
Query the collection and retreive the Participants subdocument and then filter it in your application code. Despite what some people may believe, this isnt terrible because you dont want your database to be doing too much work all the time. Offloading the computations to your application could actually benefit your database because it now can spend more time querying and less time filtering. It leads to better scalability in the long run.
Short answer: no. I tried to do the same a couple of months back, but mongoDB does not support it (at least in version <= 1.8). The same question has been asked in their Google Group for sure. You can either store the participants as a separate collection or get the whole documents and then filter them on the client. Far from ideal, I know. I'm still trying to figure out the best way around this limitation.
For future reference: This will be possible in MongoDB 2.2 using the new aggregation framework, by aggregating like this:
db.events.aggregate(
{ $unwind: '$participants' },
{ $match: {'age': {$gte: 18}}},
{ $project: {participants: 1}
)
This will return a list of n documents where n is the number of participants > 18 where each entry looks like this (note that the "participants" array field now holds a single entry instead):
{
_id: objectIdOfTheEvent,
participants: { firstName: 'only one', lastName: 'participant'}
}
It could probably even be flattened on the server to return a list of participants. See the officcial documentation for more information.