How to multi-sort a MongoDB entry with dynamic keys, on two suboptions?

I'm trying to sort this in MongoDB with mongojs on a find():
{
    "songs": {
        "bNppHOYIgRE": {
            "id": "bNppHOYIgRE",
            "title": "Kygo - ID (Ultra Music Festival Anthem)",
            "votes": 1,
            "added": 1428514707,
            "guids": [
                "MzM3NTUx"
            ]
        },
        "izJzdDPH9yw": {
            "id": "izJzdDPH9yw",
            "title": "Benjamin Francis Leftwich - Atlas Hands (Samuraii Edit)",
            "votes": 1,
            "added": 1428514740,
            "guids": [
                "MzM3NTUx"
            ]
        },
        "Yifz3X_i-F8": {
            "id": "Yifz3X_i-F8",
            "title": "M83 - Wait (Kygo Remix)",
            "votes": 0,
            "added": 1428494338,
            "guids": []
        },
        "nDopn_p2wk4": {
            "id": "nDopn_p2wk4",
            "title": "Syn Cole - Miami 82 (Kygo Remix)",
            "votes": 0,
            "added": 1428494993,
            "guids": []
        }
    }
}
and I want to sort the keys in songs by votes ascending and added descending.
I have tried
db.collection(coll).find().sort({votes:1}, function(err, docs) {});
but that doesn't work.

If this is an operation that you're going to be doing often, I would strongly consider changing your schema. If you make songs an array instead of a map, then you can perform this query using aggregation.
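For reference, an array-based version of the same document might look something like this (a sketch reusing the fields from your example):

{
    "songs": [
        { "id": "bNppHOYIgRE", "title": "Kygo - ID (Ultra Music Festival Anthem)", "votes": 1, "added": 1428514707, "guids": ["MzM3NTUx"] },
        { "id": "Yifz3X_i-F8", "title": "M83 - Wait (Kygo Remix)", "votes": 0, "added": 1428494338, "guids": [] }
    ]
}

With that shape, the sort becomes a two-stage pipeline: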
db.coll.aggregate([{ "$unwind": "$songs" }, { "$sort": { "songs.votes": 1, "songs.added": -1 }}]);
And if you put each of these songs in a separate songs collection, then you could perform the query with a simple find() and sort().
db.songs.find().sort({ "votes": 1, "added": -1 });
With your current schema, however, all of this logic would need to live in your application, and it would get messy. A possible solution would be to fetch all of the documents and, while iterating through the cursor, iterate through the keys of each document's songs map, adding the subdocuments to an array. Once you have all of the subdocuments in the array, sort that array by votes and added.
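A minimal sketch of that client-side approach (assuming the mongojs setup from your question; coll is a placeholder for your collection name):

db.collection(coll).find(function (err, docs) {
    if (err) throw err;
    docs.forEach(function (doc) {
        // Pull the subdocuments out of the songs map into an array.
        var songs = Object.keys(doc.songs).map(function (key) {
            return doc.songs[key];
        });
        // Sort by votes ascending, then by added descending.
        songs.sort(function (a, b) {
            return a.votes - b.votes || b.added - a.added;
        });
        // songs is now in the desired order.
    });
});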
It is possible, but unnecessarily complex. And, of course, you wouldn't be able to take advantage of indexes, which would have an impact on your performance.
You already include the key inside the subdocument, so I would really recommend you reconsider your schema.

Related

Search and update in array of objects MongoDB

I have a collection in MongoDB containing the search history of a user, where each document is stored like:
"_id": "user1"
searchHistory: {
"product1": [
{
"timestamp": 1623482432,
"query": {
"query": "chocolate",
"qty": 2
}
},
{
"timestamp": 1623481234,
"query": {
"query": "lindor",
"qty": 4
}
},
],
"product2": [
{
"timestamp": 1623473622,
"query": {
"query": "table",
"qty": 1
}
},
{
"timestamp": 1623438232,
"query": {
"query": "ike",
"qty": 1
}
},
]
}
Here the _id of the document acts as a foreign key to the user document in another collection.
I have a backend running on Node.js, and this function is used to store a new search history entry in the record.
exports.updateUserSearchCount = function (userId, productId, searchDetails) {
    let addToSetData = {}
    let key = `searchHistory.${productId}`
    addToSetData[key] = { "timestamp": new Date().getTime(), "query": searchDetails }
    return client.db("mydb").collection("userSearchHistory").updateOne(
        { "_id": userId },
        { "$addToSet": addToSetData },
        { upsert: true }
    )
}
Now, I want to get the search history of a user based only on the query, using db.find().
I want something like this:
db.find({"_id": "user1", "searchHistory.somewildcard.query": "some query"})
I need a wildcard that replaces ".somewildcard." so the search covers all searched products.
I saw a suggestion that we should store the document like:
"_id": "user1"
searchHistory: [
{
"key": "product1",
"value": [
{
"timestamp": 1623482432,
"query": {
"query": "chocolate",
"qty": 2
}
}
]
}
]
However, if I store the document like this, then adding a search history entry to an existing document becomes a tedious and confusing task.
What should I do?
It's always a bad idea to save values as keys, for the exact reason you're facing: it heavily limits querying on that field. The obvious trade-off is that it makes updates much easier.
I personally recommend that you do not save these searches in nested form at all; this will cause scaling issues quite quickly. Even assuming these fields are indexed, you will start seeing performance issues once the arrays get too large (a few hundred searches).
So my personal recommendation is for you to save it in a new collection like so:
{
    "user_id": "1",
    "key": "product1",
    "timestamp": 1623482432,
    "query": {
        "query": "chocolate",
        "qty": 2
    }
}
Now querying a specific user, a specific product, or even a query substring is all easily supported by creating some basic indexes. An "update" in this case is just an insert of a new document, which is also much faster; see the sketch below.
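As a rough sketch (the collection name userSearches and the exact index choices are assumptions, not a definitive design):

// One-off index creation to support the common lookups.
db.userSearches.createIndex({ "user_id": 1, "timestamp": -1 })
db.userSearches.createIndex({ "key": 1 })
db.userSearches.createIndex({ "query.query": 1 })

// An "update" is now just an insert of a new search document.
db.userSearches.insertOne({
    "user_id": "1",
    "key": "product1",
    "timestamp": new Date().getTime(),
    "query": { "query": "chocolate", "qty": 2 }
})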
If you still prefer to keep the nested structure, then I recommend you switch to the structure you posted. As you mentioned, updates will become slightly more tedious, but you can still do them quite easily, using arrayFilters to update a specific element or just $push to add a new search.
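For example, appending a search to an existing product's array could look roughly like this (a sketch against the key/value structure above; upserting a product key that does not exist yet would need extra handling):

db.userSearchHistory.updateOne(
    { "_id": "user1" },
    { "$push": { "searchHistory.$[elem].value": {
        "timestamp": new Date().getTime(),
        "query": { "query": "lindor", "qty": 4 }
    } } },
    { "arrayFilters": [{ "elem.key": "product1" }] }
)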

Trying to fetch data from Nested MongoDB Database?

I am a beginner in MongoDB and stuck at one point. I am trying to fetch data from a nested array, but it is taking a long time (the data is around 50K documents) and the results are not accurate. Below is the schema structure:
{
    "_id": {
        "$oid": "6001df3312ac8b33c9d26b86"
    },
    "City": "Los Angeles",
    "State": "California",
    "Details": [
        {
            "Name": "Shawn",
            "age": "55",
            "Gender": "Male",
            "profession": " A science teacher with STEM",
            "inDate": "2021-01-15 23:12:17",
            "Cars": [
                "BMW", "Ford", "Opel"
            ],
            "language": "English"
        },
        {
            "Name": "Nicole",
            "age": "21",
            "Gender": "Female",
            "profession": "Law student",
            "inDate": "2021-01-16 13:45:00",
            "Cars": [
                "Opel"
            ],
            "language": "English"
        }
    ],
    "date": "2021-01-16"
}
Here I am trying to filter on date and Details.Cars, like:
db.getCollection('news').find({ "Details.Cars": "BMW", "date": "2021-01-16" })
but it returns the details of other people too who do not have the car "BMW". I am only trying to display the details of people like Shawn, who have "BMW" in their Cars array and match the date. Nicole and the rest should not appear, but that is not happening.
Any help is appreciated. :)
A combination of $match on the top-level fields and $filter on the array elements will do what you seek.
db.foo.aggregate([
    { $match: { "date": "2021-01-16" } },
    { $addFields: { "Details": { $filter: {
        input: "$Details",
        as: "zz",
        cond: { $in: ["BMW", "$$zz.Cars"] }
    }}}},
    { $match: { $expr: { $gt: [{ $size: "$Details" }, 0] } } }
]);
Notes:
$unwind is overly expensive for what is needed here and it likely means "reassembling" the data shape later.
We use $addFields where the new field to add (Details) already exists. This effectively means "overwrite in place" and is a common idiom when filtering an array.
The second $match eliminates docs where the date matches but not a single entry in Details.Cars is a BMW, i.e. the array has been filtered down to zero length. Sometimes you want to know about those docs, and if that is the case, do not add the final $match.
I recommend you look into using real dates i.e. ISODate instead of strings so that you can easily take advantage of MongoDB date math and date formatting functions.
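If you do migrate, a one-off conversion could look roughly like this (a sketch, assuming MongoDB 4.2+ pipeline updates and that the date strings parse cleanly with $toDate):

db.foo.updateMany(
    {},
    [{ "$set": { "date": { "$toDate": "$date" } } }]
)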
It is a common mistake to think that find({"nested.array": value}) will return only the matching nested object; actually, this query returns the whole document whose nested array contains the desired value.
So the query returns every document where the value "BMW" exists in the array Details.Cars, which is why Nicole is returned too.
To solve this problem:
To get multiple elements that match the criteria, you can do an aggregation using $unwind to split the array into separate documents, and then match by the criteria you want.
db.collection.aggregate([
    { "$match": { "Details.Cars": "BMW", "date": "2021-01-16" } },
    { "$unwind": "$Details" },
    { "$match": { "Details.Cars": "BMW" } }
])
This query first matches by the criteria so that $unwind does not run over the whole collection.
Then it uses $unwind to get one document per array element, and $match again to keep only the elements you want.
To get only one element (for example, if you match by _id and it's unique) you can use $elemMatch in this way:
db.collection.find({
    "Details.Cars": "BMW",
    "date": "2021-01-16"
},
{
    "Details": {
        "$elemMatch": {
            "Cars": "BMW"
        }
    }
})
You can use $elemMatch in either the query or the projection; the MongoDB docs cover both forms.
Using $elemMatch in the query, it looks like this:
db.collection.find({
    "Details": {
        "$elemMatch": {
            "Cars": "BMW"
        }
    },
    "date": "2021-01-16"
},
{
    "Details.$": 1
})
The result is the same. In the second case you are using the positional projection operator, which returns, as the docs say:
The first element that matches the query condition on the array.
That is, the first element where "Cars": "BMW".
You can choose the way you want.

Update Array Children Sorted Order

I have a collection containing objects with the following structure
{
    "dep_id": "some_id",
    "departament": "dep name",
    "employees": [{
        "name": "emp1",
        "age": 31
    }, {
        "name": "emp2",
        "age": 35
    }]
}
I would like to sort and save the array of employees for the object with id "some_id", by employees.age, descending. The best outcome would be to do this atomically using mongodb's query language. Is this possible?
If not, how can I rearrange the subdocuments without affecting the parent's other data or the data of the subdocuments? In case I have to download the data from the database and save back the sorted array of children, what would happen if something else performs an update to one of the children or children are added or removed in the meantime?
In the end, the data should be persisted to the database like this:
{
    "dep_id": "some_id",
    "departament": "dep name",
    "employees": [{
        "name": "emp2",
        "age": 35
    }, {
        "name": "emp1",
        "age": 31
    }]
}
The best way to do this is to actually apply the $sort modifier as you add items to the array. As you say in your comment, "My actual objects have a 'rank' and 'created_at'", which means you really should have asked that in your question instead of writing a "contrived" case (I don't know why people do that).
So for "sorting" by multiple properties, the following reference would adjust like this:
db.collection.update(
    { },
    { "$push": { "employees": { "$each": [], "$sort": { "rank": -1, "created_at": -1 } } } },
    { "multi": true }
)
But to update all the data you presently have "as is shown in the question", then you would sort on "age" with:
db.collection.update(
    { },
    { "$push": { "employees": { "$each": [], "$sort": { "age": -1 } } } },
    { "multi": true }
)
Which oddly uses $push to actually "modify" an array? Yes, it's true: the $each modifier says we are not actually adding anything new, yet the $sort modifier applies to the array in place and "re-orders" it.
Of course this would then explain how "new" updates to the array should be written in order to apply that $sort and ensure that the "largest age" is always "first" in the array:
db.collection.update(
    { "dep_id": "some_id" },
    { "$push": {
        "employees": {
            "$each": [{ "name": "emp3", "age": 32 }],
            "$sort": { "age": -1 }
        }
    }}
)
So what happens here is as you add the new entry to the array on update, the $sort modifier is applied and re-positions the new element between the two existing ones since that is where it would sort to.
This is a common pattern with MongoDB and is typically used in combination with the $slice modifier in order to keep arrays at a "maximum" length as new items are added, yet retain "ordered" results. And quite often "ranking" is the exact usage.
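For example, keeping only the ten "oldest" employees while preserving the order could look like this (a sketch combining the modifiers just mentioned; the new element and the cap of 10 are made up for illustration):

db.collection.update(
    { "dep_id": "some_id" },
    { "$push": {
        "employees": {
            "$each": [{ "name": "emp4", "age": 40 }],
            "$sort": { "age": -1 },
            "$slice": 10
        }
    }}
)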
So overall, you can actually "update" your existing data and re-order it with "one simple atomic statement". No looping or collection renaming required. Furthermore, you now have a simple atomic method to "update" the data and maintain that order as you add new array items, or remove them.
In order to get what you want you can use the following query:
db.collection.aggregate({
    $unwind: "$employees" // flatten the employees array
}, {
    $sort: {
        "employees.age": -1 // sort all documents by employee age (descending)
    }
}, {
    $group: { // restore the previous structure
        _id: "$_id",
        "dep_id": {
            $first: "$dep_id"
        },
        "departament": {
            $first: "$departament"
        },
        "employees": {
            $push: "$employees"
        }
    }
}, {
    $out: "output" // write everything out to a separate collection
})
After this step you would want to drop your source collection and rename the "output" collection to match your source collection's name.
This solution will, however, not deal with the concurrency issue. So you should remove write access from the collection first so nobody modifies it during the process and then restore it once you're done with the migration.
You could alternatively query all the data first, then sort the employees array on the client side, and then use either single update queries or (faster but more complicated) a bulk write operation with all the individual update calls to update the existing documents. Here, you could use the entire document that you initially read as the filter for the update operation. That way, if an individual update does not modify any document, you know straight away that some other change must have modified the document you read before. Those cases you would need to retry later (or straight away, until the update does modify a document).
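A rough sketch of that approach with the Node.js driver (the collection name departments and the surrounding setup are assumptions):

// docs were read earlier with find(); sort each employees array
// client-side and write it back only if the document is unchanged.
const ops = docs.map(doc => ({
    updateOne: {
        // Use the originally read array as part of the filter: if it
        // changed in the meantime, this update matches nothing.
        filter: { _id: doc._id, employees: doc.employees },
        update: { $set: {
            employees: [...doc.employees].sort((a, b) => b.age - a.age)
        } }
    }
}));

const result = await db.collection("departments").bulkWrite(ops);
// result.matchedCount < ops.length means some documents were modified
// concurrently and need to be re-read and retried.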

How to get (or aggregate) distinct keys of array in MongoDB

I'm trying to get MongoDB to aggregate for me over an array with different key-value pairs, without knowing the keys (just a simple sum would be OK).
Example docs:
{data: [{a: 3}, {b: 7}]}
{data: [{a: 5}, {c: 12}, {f: 25}]}
{data: [{f: 1}]}
{data: []}
So basically each doc (or its array, really) can have 0 or many entries, and I don't know the keys for those objects, but I want to sum and average the values over those keys.
Right now I'm just loading a bunch of docs and doing it myself in Node, but I'd like to offload that work to MongoDB.
I know I can unwind those first, but how to proceed from there? How to sum/avg/min/max the values if I don't know the keys?
If you do not know the keys or cannot make a reasonable educated guess, then you are basically stuck from going any further with the aggregation framework. You could supply "all of the keys" for consideration, but I suspect your actual data looks more like this:
{ "data": [{ "film": 10 }, { "televsion": 5 },{ "boardGames": 1 }] }
So there would be little point here in finding out all the "key names" and then throwing them at an aggregation statement.
For the record though, "this is why you do not structure your data storage like this". Information like "film" here should not be used as a "key" name, because it is useful "data" that could be searched upon and most importantly "indexed" in a database system.
So your data should really look like this:
{
    "data": [
        { "type": "film", "value": 10 },
        { "type": "television", "value": 5 },
        { "type": "boardGames", "value": 1 }
    ]
}
Then the aggregation statement is simple, as are many other things:
db.collection.aggregate([
    { "$unwind": "$data" },
    { "$group": {
        "_id": null,
        "sum": { "$sum": "$data.value" },
        "avg": { "$avg": "$data.value" }
    }}
])
But since the key names are constantly changing between documents and do not have a uniform structure, you need JavaScript processing on the server to traverse the keys, and that means mapReduce:
db.collection.mapReduce(
    function() {
        this.data.forEach(function(data) {
            Object.keys(data).forEach(function(key) {
                emit(null, data[key]); // emit the value regardless of key name
            });
        });
    },
    function(key, values) {
        return Array.sum(values); // just summing, for example
    },
    { "out": { "inline": 1 } }
)
And of course the JavaScript execution here will work much more slowly than the native coded operators available to the aggregation framework.
So this should be an abject lesson as to why you do not use "data" as "key names" when storing data in a database. The aggregation framework works with standard structures and is fast; falling back to JavaScript processing is more flexible, but the cost is mostly in speed and other features.

Mongo using indexes with sort

I'm trying to optimize a mongodb query. I have an index on from_account_id, to_account_id, and created_at. But the following query does a full collection scan.
{
    "ts": {
        "$date": "2012-03-18T20:29:27.038Z"
    },
    "op": "query",
    "ns": "heroku_app2281692.transactions",
    "query": {
        "$query": {
            "$or": [
                {
                    "from_account_id": {
                        "$oid": "4f55968921fcaf0001000005"
                    }
                },
                {
                    "to_account_id": {
                        "$oid": "4f55968921fcaf0001000005"
                    }
                }
            ]
        },
        "$orderby": {
            "created_at": -1
        }
    },
    "ntoreturn": 25,
    "nscanned": 2643718,
    "responseLength": 20,
    "millis": 10499,
    "client": "10.64.141.77",
    "user": "heroku_app2281692"
}
If I don't use the $or, and only query from_account_id or to_account_id with an order on it, it's fast.
What's the best way to get the desired effect? Should I be keeping account_ids (both from and to) in one field like an array? Or perhaps there is a better way. Thanks!
Unfortunately, as you have discovered, an $or clause can make life difficult for the optimizer.
So, to work around this you have a couple of options. Among them:
Divide your query into two and manually merge the results (see the sketch below).
Change your data model to allow efficient querying. For example, you might add a "referenced_accounts" field that is an array of all the accounts referenced in the transaction.
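A sketch of the first option in the shell (the account id is taken from your log; deduplicate by _id if a transaction can reference the same account on both sides):

// Run the two halves separately so each can use its own index,
// then merge and re-sort client-side.
var id = ObjectId("4f55968921fcaf0001000005");
var from = db.transactions.find({ from_account_id: id }).sort({ created_at: -1 }).limit(25).toArray();
var to = db.transactions.find({ to_account_id: id }).sort({ created_at: -1 }).limit(25).toArray();
var merged = from.concat(to)
    .sort(function (a, b) { return b.created_at - a.created_at; })
    .slice(0, 25);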