I need to join two collections: the first holds profile info and the second holds computed amounts (with dates). I want to join these collections on a common id, but produce a result set of joined profile data with only the latest amount.
I can create a join that incorporates the amount data as a nested list. But what is the smartest way to sort this nested list and then drop every element that is not the latest?
So for example:
profile
{"pid": 123, "name": "xyz"}
amount
{"pid": "amount": 12.00, "date": date}
I can see two ways to do this: one with an unwind and one with a sort on the array. Is there any speed penalty to the unwind method?
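For reference, a minimal sketch of the join described above, assuming collections named profile and amount ($sortArray needs MongoDB 5.2+; on older servers the unwind route is the alternative):

db.profile.aggregate([
    // Join all matching amount documents into a nested list.
    {$lookup: {from: "amount", localField: "pid", foreignField: "pid", as: "amounts"}},
    // Sort-on-array approach: order the joined list by date descending
    // and keep only the first (latest) element.
    {$addFields: {latestAmount: {$arrayElemAt: [
        {$sortArray: {input: "$amounts", sortBy: {date: -1}}}, 0]}}},
    // Drop the full list, keeping the profile fields plus latestAmount.
    {$project: {amounts: 0}}
])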
One of my collections in Cloud Firestore has an array field where each item in the array contains three separate values (the groupMembership field).
I know I can't write a query to find documents that match one of the array values. Is there a way to write a query to find documents that match a specific value for all three items?
For instance, I want to find all users where the groupMembership array contains one object that is equal to groupId: X, groupName: Y, membershipStatus: active
You can pass the entire object in an array-contains clause.
So something like:
usersRef
    .whereField("groupMembership", arrayContains: [
        "groupId": "kmDT8OUOTCxSMIBf9yZC",
        "groupName": "Jon and Charles Group",
        "membershipStatus": "pending",
    ])
The array item must completely and exactly match the dictionary you pass in. If you want to filter on only some of the properties in each array element, you will have to create additional fields with just those properties. For example, it is quite common to have, say, a groupIds field with just the (unique) group ids, so that you can also filter on those (within the limits of Firestore's queries, of course).
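For illustration, a rough sketch of that pattern with the JavaScript SDK (the groupIds field and the membershipArray variable are assumptions; you would keep the field in sync whenever groupMembership changes):

// Hypothetical write path: mirror just the group ids alongside the array of objects.
usersRef.doc(userId).update({
    groupMembership: membershipArray,
    groupIds: [...new Set(membershipArray.map(m => m.groupId))] // unique group ids only
});

// Now a query can filter on the group id alone:
usersRef.where("groupIds", "array-contains", "kmDT8OUOTCxSMIBf9yZC");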
I am trying to take an extract from a huge MongoDB collection.
In particular, the collection contains 2.65TB of data (unzipped), i.e., 600GB (zipped). Each document has a deep hierarchy and a couple of arrays, and I want to extract some parts out of them. In this collection we have multiple documents for each customer id. Since I want to export the most active document for each customer, I need to group the records, take those with the maximum timestamp field, and perform some further processing on them.

I need some help in forming the query for the export. I have tried to sort the documents per customer id, but this could not be achieved in an acceptable time when combined with a 'match' construct (this is needed since it is a huge collection and we try to create the export in parts). Currently the query looks like this:
db.getCollection('CEM').aggregate([
    {'$match': {'LiveFeed.customer.profile.id': 'TCAYT2RY2PF93R93JVSUGU7D3'}},
    {'$project': {'LiveFeed.customer.profile.id': 1,
                  'LiveFeed.customer.profile.products.air.flights': 1,
                  'LiveFeed.context.timestamp': 1}},
    {'$sort': {'LiveFeed.customer.profile.id': 1, 'LiveFeed.context.timestamp': 1}},
    {'$group': {'_id': '$LiveFeed.customer.profile.id',
                'products': {'$last': '$LiveFeed.customer.profile.products.air.flights'}}},
    {'$unwind': '$products'},
    {'$unwind': '$products.sources'},
    {'$project': {'_id': 0,
                  'ceid': '$_id',
                  'coupon_no': {'$ifNull': ['$products.couponId.couponNumber', '']},
                  'ticket_no': {'$ifNull': ['$products.couponId.ticketId.number', '']},
                  'pnr_id': '$products.sources.id',
                  'departure_date': '$products.segment.departure.at',
                  'departure_airport': '$products.segment.departure.code',
                  'arrival_airport': '$products.segment.arrival.code',
                  'created_date': '$products.createdAt'}}
])
Any ideas/suggestions on how to improve this query would be very helpful indeed. Thanks in advance!
It is difficult to answer this without knowing the indexes on your collection. However, you can save some time by eliminating stage 3: the $sort is undone by the $group in stage 4. See "$group does not preserve order".
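If you do still need a deterministic "latest", note that the $match already pins a single customer id, so sorting on the timestamp alone would suffice; a sketch of that variant:

db.getCollection('CEM').aggregate([
    {'$match': {'LiveFeed.customer.profile.id': 'TCAYT2RY2PF93R93JVSUGU7D3'}},
    {'$project': {'LiveFeed.customer.profile.id': 1,
                  'LiveFeed.customer.profile.products.air.flights': 1,
                  'LiveFeed.context.timestamp': 1}},
    // The id is constant after the $match, so sort on the timestamp only.
    {'$sort': {'LiveFeed.context.timestamp': 1}},
    {'$group': {'_id': '$LiveFeed.customer.profile.id',
                'products': {'$last': '$LiveFeed.customer.profile.products.air.flights'}}},
    // ...rest of the pipeline unchanged
])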
Let's say I have a simple document structure like:
{
    "item": {
        "name": "Skittles",
        "category": "Candies & Snacks"
    }
}
On my search page, whenever a user searches for a product name, I want to offer filter options by category.
Since there can be many categories (around 50), I cannot display checkboxes for all of them in the sidebar beside the search results. I want to show only those that have products associated with them in the results. So if none of the products in the search results has a given category, that category option should not be shown.
Now, the item search by name itself is paginated; I only show 30 items per page, and we have tens of thousands of items in our database.
I could search and retrieve all items from all pages, then parse out the categories. But retrieving tens of thousands of items in one page would be really slow.
Is there a way to optimize this query?
You can use different approaches based on your workflow and see what works best in your situation. Some good candidates are:
Use distinct prior to running the query on the large dataset
Use the aggregation pipeline, as #Lucia suggested:
[{$group: { _id: "$item.category" }}]
Use another datastore (either Redis or Mongo itself) to store intelligence on categories
Finally, based on the approach you choose and the inflow of requests for filters, you may want to consider indexing some fields.
P.S. You're right about how aggregation works: unless you have a $match filter as the first stage, it will fetch all the documents and then apply the next stage.
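For instance, a rough sketch of that pipeline combined with the search filter (the regex match on item.name is a stand-in for whatever your actual search predicate is):

db.items.aggregate([
    // Stand-in for the search page's name filter.
    {$match: {"item.name": /skittles/i}},
    // One output document per distinct category present in the results.
    {$group: {_id: "$item.category"}}
])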
In my Meteor project, I have a leaderboard of sorts, where it shows players of every level on a chart, spread across every level in the game. For simplicity's sake, let's say there are levels 1-100. Currently, to avoid overloading Meteor, I just tell the server to send me every record newer than two weeks old, but that's not sufficient for an accurate leaderboard.
What I'm trying to do is show 50 records representing each level. So, if there are 100 records at level 1, 85 at level 2, 65 at level 3, and 45 at level 4, I want to show the latest 50 records from each level, making it so I would have [50, 50, 50, 45] records, respectively.
The data looks something like this:
{
    snapshotDate: new Date(),
    level: 1,
    hp: 5000,
    str: 100
}
I think this requires some MongoDB aggregation, but I couldn't quite figure out how to do it in one query. It would be trivial to do in two, though: select all records, group by level, sort each level by date, then take the last 50 records from each level. However, I would prefer to do it in one operation, if I could. Is it currently possible to do something like this?
Currently there is no way to pick the top n records of a group in the aggregation pipeline. There is an unresolved open ticket regarding this: https://jira.mongodb.org/browse/SERVER-9377.
There are two solutions to this:
Keep your document structure as it is now and aggregate, but grab the top n records and slice off the remaining records for each group on the client side.
Code:
var top_records = [];
db.collection.aggregate([
    // The sort operation needs to come before the $group,
    // because once the records are grouped by level,
    // there exists only one document per group.
    {$sort: {"snapshotDate": -1}},
    // Maintain all the records in an array in sorted order.
    {$group: {"_id": "$level", "recs": {$push: "$$ROOT"}}},
], {allowDiskUse: true}).forEach(function(level) {
    level.recs.splice(50); // Keep only the top 50 records.
    top_records.push(level);
})
Remember that this loads all the documents for each level and removes the unwanted records on the client side.
Alter your document structure to accomplish what you really need. If you always need only the top n records, keep them in sorted order in the root document. This is accomplished using a sorted, capped array.
Your document would look like this:
{
    level: 1,
    records: [{snapshotDate: 2, hp: 5000, str: 100},
              {snapshotDate: 1, hp: 5001, str: 101}]
}
where records is a capped array of size n whose subdocuments are always kept sorted in descending order of their snapshotDate.
To make the records array work that way, we always perform an update operation whenever we need to insert documents into it for any level.
db.collection.update({"level": 1},
    {$push: {
        records: {
            $each: [{snapshotDate: 1, hp: 5000, str: 100},
                    {snapshotDate: 2, hp: 5001, str: 101}],
            $sort: {"snapshotDate": -1},
            $slice: 50 // Always trim the array size to 50.
        }
    }}, {upsert: true})
What this does is keep the size of the records array at 50, re-sorting the records whenever new subdocuments are inserted at a level.
A simple find, db.collection.find({"level":{$in:[1,2,..]}}), would give you the top 50 records in order, for each selected level.
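And if you ever need fewer than the stored 50 for display, a $slice projection trims the array on the way out:

// Return only the newest 10 of the stored records per level.
db.collection.find({"level": {$in: [1, 2]}}, {records: {$slice: 10}})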
I've got a simple parent/child object stored as a document in MongoDB. Something simple like Order/OrderItems (an Order has an array of OrderItem).
What I'd like to do is query for a count of OrderItems that meet a set of criteria.
Example: In Order "999" find out how many order items had a quantity of 3.
db.collection.find( {OrderId:999, "OrderItems.QuantityOrdered":3} ).count();
The way this query works, it returns 1: if at least one OrderItem inside the array matches, the whole Order counts as matched, so what I get back is the count of matching Orders.
How can I query for how many OrderItems matched?
There's no direct way of returning this sort of count with an embedded document. With this sort of ad-hoc count, your best bet is to return the document and do the count from the application.
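For example, in the shell that application-side count might look like this (the collection name is an assumption):

// Fetch the order, then count the matching items in application code.
var order = db.orders.findOne({OrderId: 999});
var matched = order.OrderItems.filter(function(item) {
    return item.QuantityOrdered === 3;
}).length;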
If you want to perform this kind of count for a large number of orders, you could use map-reduce, which will output the results to a new collection.
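A rough sketch of that map-reduce route, with the collection and output names as assumptions:

db.orders.mapReduce(
    function() {
        // Count this order's items that match the criterion.
        var n = 0;
        this.OrderItems.forEach(function(item) {
            if (item.QuantityOrdered === 3) n += 1;
        });
        emit(this.OrderId, n);
    },
    function(key, values) {
        // Sum partial counts emitted for the same OrderId.
        return Array.sum(values);
    },
    {query: {OrderId: 999}, out: "order_item_counts"}
);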