How to do querying, sorting and filtering on values of an object in MongoDB?

I am trying to find a way to query, sort and filter the values of an object (each of which is itself an object) in a Mongo document. The document structure is:
{
_id: '',
uid: '12345',
objects:{
dkey1: {
prop1: val1,
prop2: val2,
...
},
dkey2: {
prop1: val1,
prop2: val2,
...
},
dkey3: {
prop1: val1,
prop2: val2,
...
},
dkey4: {
prop1: val1,
prop2: val2,
...
}
...
}
}
The objects property can contain thousands of sub-objects with dynamic keys. They are hash-based unique keys. When I fetch these sub-objects, I don't want to return all of them. I want to query, sort and limit them as I could if they were separate documents. For example, if I say prop1 = val1 sort by prop2 limit 10, the query should return the first 10 sub-objects in objects whose prop1 is val1, sorted by prop2.
I think it cannot be done with a normal find, so I am trying the aggregation framework. In the first stage I will do a match on uid. Next? That is where I am confused. If objects were an array of objects instead of an object with dynamic keys, I could $unwind it and then filter on the inner properties (prop1, prop2...), sort and apply a limit in later stages. But the problem is that it is not an array of objects. If there were a way to convert the values of objects into an array of objects, it would be easier. I looked for one, but could not find a solution.
I know the structure is not good and that changing the schema would help me, but I am in a situation where I cannot change it now. Is there a way to convert the values of objects into an array of objects? Or is there a different way to achieve the same result with some other aggregation pipeline stages?

Why can't this be done with a normal find? According to the documentation, aggregation operations "group values from multiple documents together", and if I understood correctly that's not what you want to do here.
Try this:
db.objects.find({prop1: 'val1'}).sort({prop2: 1}).limit(10)
This was tested in mongo shell
objects would be your collection
the number 1 on sort means ascending order, -1 would be descending
and the number 10 on limit is the limit value of course
--- Edit ----
If you want to access properties of a document embedded in another document, use dot notation.
Example: db.objects.find({'nestedobj.prop1': 'val1'})
--- Edit 2 ---
Now I see I misunderstood your question. Sorry. The problem here is that I don't think there is an operator that will let you access any embedded document regardless of its key (I really don't know; I can look into it, but not right now).
But maybe I can help by pointing out that if you are going to use 'aggregate', '$match' is for filtering the results, so uid wouldn't be on that pipeline stage. MongoDB 2.4 supports JavaScript functions executed in the database, but the purpose of the aggregation framework, as I said before, is to map-reduce the documents, so this isn't the best-case scenario. I would be concerned about performance and about whether the database engine can accomplish what you want. But I think it should be tested before dismissing the idea.
Sorry again for misunderstanding your question, and I hope you can solve your problem. Let me know if you do, and how!
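For what it's worth, MongoDB 3.4.4 and later have the $objectToArray aggregation operator, which turns an embedded document into an array of {k, v} pairs and so makes the $unwind approach described in the question possible. The following is only a sketch under assumptions: the helper name buildSubObjectPipeline and the collection name mycoll are made up for illustration, and the field names are the ones from the question.

```javascript
// Hypothetical helper: builds an aggregation pipeline that treats the
// values of the `objects` sub-document as if they were separate documents.
// Assumes MongoDB 3.4.4+ (needed for $objectToArray).
function buildSubObjectPipeline(uid, prop1Value, limit) {
  return [
    // 1. Select the parent document first so later stages see little data.
    { $match: { uid: uid } },
    // 2. Convert { dkey1: {...}, dkey2: {...} } into
    //    [ { k: "dkey1", v: {...} }, { k: "dkey2", v: {...} } ].
    { $project: { _id: 0, entries: { $objectToArray: "$objects" } } },
    // 3. Emit one document per sub-object.
    { $unwind: "$entries" },
    // 4. Filter, sort and limit on the inner properties.
    { $match: { "entries.v.prop1": prop1Value } },
    { $sort: { "entries.v.prop2": 1 } },
    { $limit: limit },
    // 5. Return the dynamic key together with the sub-object itself.
    { $project: { key: "$entries.k", value: "$entries.v" } }
  ];
}

const pipeline = buildSubObjectPipeline("12345", "val1", 10);
// In the mongo shell (the collection name `mycoll` is an assumption):
//   db.mycoll.aggregate(pipeline)
```

Every sub-object of the matched document is still unwound before filtering, so with thousands of keys per document this is worth measuring before relying on it.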

Related

How to create a collection from a query in MongoDB?

I have an existing collection that I can run queries on. For further data processing, it would be handy to create some subset collections via a query.
I understand that I can use the aggregate function with $match and $expr to e.g. $group some values, and at the end use $out to get a new collection with the results.
Where I am stuck is that I don't want to $group anything; I just want to put the objects that $match finds into a new collection. And not the complete objects with all their values, just the keys I am selecting. Like when you db[collection].find({$match: {...}}, {"key1": 1, "key2": 0})
There I get the matching objects containing just key1: value1 but not key2: value2, which is also in the original collection.
How do I achieve that using aggregate without grouping anything? I read through the documentation and couldn't find any other stage that looks right.
As I mentioned in the comments, $project is the right operator to use to achieve this.
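A minimal sketch of such a pipeline, assuming hypothetical field names (status, key1) and a hypothetical target collection name (subset); the helper buildSubsetPipeline is made up for illustration. $match selects the documents, $project keeps only the listed fields, and $out writes the result to the target collection.

```javascript
// Hypothetical helper: $match -> $project -> $out, with no $group involved.
function buildSubsetPipeline(filter, keepFields, targetCollection) {
  const projection = {};
  for (const field of keepFields) {
    projection[field] = 1; // include only the listed fields (_id is kept by default)
  }
  return [
    { $match: filter },
    { $project: projection },
    // $out must be the last stage; it replaces the target collection if it exists.
    { $out: targetCollection }
  ];
}

const subsetPipeline = buildSubsetPipeline({ status: "active" }, ["key1"], "subset");
// In the mongo shell (collection names are assumptions):
//   db.source.aggregate(subsetPipeline)
```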

Custom MongoDB Object _id vs Compound index

So I need to create a lookup collection in MongoDB to verify uniqueness. The requirement is to check whether the same 2 values are being repeated or not. In SQL, I would do something like this
SELECT count(id) WHERE key1 = 'value1' AND key2 = 'value2'
If the above query returns a count then it means the combination is not unique. I have 2 solutions in mind but I am not sure which one is more scalable. There are 30M+ docs against which I need to create this mapping.
Solution1:
I create a collection of docs with compound index on key1 and key2
{
_id: <MongoID>,
key1: <value1>,
key2: <value2>
}
Solution2:
I write application logic to create custom _id by concatenating value1 and value2
{
_id: <value1>_<value2>
}
Personally, I feel the second one is more optimised as it only has a single index and the size of doc is also smaller. But I am not sure if it is a good practice to create my own _id indexes as they may not be completely random. What do you think?
Thanks in advance.
Update:
My database already has a lot of indexes which take up memory, so I want to keep index size as low as possible, especially for collections which are only used to verify uniqueness.
I would suggest Solution 1, i.e. use a compound index on two separate properties, key1 and key2:
db.yourCollection.ensureIndex( { "key1": 1, "key2": 1 }, { unique: true } )
You can easily search by an individual field if required, i.e. if you need to search only by key1 or key2, that is easy with separate fields. If you build _id from a combination of the keys, it will be hard to search by an individual field.
Document size is usually the least of your concerns when designing a MongoDB document.
If in the near future you need to change the key values of an existing document, it will be easy. Keep in mind whether you reference this document from documents in other collections.
In terms of scalability, the default _id index is sequential and easily shardable, and you can let MongoDB manage it.
If you search by those keys, the query will use the compound index; otherwise it will use whichever other indexes your search requires.
If document size still matters more to you than searchability, you can use a custom _id instead, but make it an embedded document like
{_id:{key1:<value1>,key2:<value2>}}
By this you can search specific _id.key1 too.
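A small sketch of that embedded-document _id, with made-up names (makeLookupDoc, and lookup as the collection). One caveat I'm sure of: equality on embedded documents is sensitive to field order, so the application should always build _id with the keys in the same order.

```javascript
// Hypothetical helper: always builds _id with key1 first, then key2,
// because { key1, key2 } and { key2, key1 } are different _id values.
function makeLookupDoc(value1, value2) {
  return { _id: { key1: value1, key2: value2 } };
}

const lookupDoc = makeLookupDoc("value1", "value2");
// In the mongo shell (the collection name `lookup` is an assumption):
//   db.lookup.insertOne(lookupDoc)             // a second insert of the same pair
//                                              // fails with a duplicate key error
//   db.lookup.find({ "_id.key1": "value1" })   // search on one part of the _id
```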
Update:
Yes, if document size concerns you more than maintainability. If you are sure the keys of a given document will not change in the future, or if they may change but the document is not referenced from other collections, then you can use a custom _id. Just use an embedded document for the keys rather than joining them with an underscore _. You can also add more keys later if you need to.
I think Solution 2 is more suitable for your requirement. It is absolutely OK to generate the _id value yourself in MongoDB. Many applications populate the _id value with a UUID. In your case, it makes sense to concatenate value1 and value2 for the _id value, assuming this collection is primarily used for verifying uniqueness (i.e. a kind of temporary table) or for lookups.
Solution 1 is expensive as it requires an additional index. Again, it depends on whether you are going to use this collection for verifying uniqueness alone or for some other use case as well.
Please note that if you go with the compound index, you need to create it as a unique index so that it rejects inserts with duplicate values.

How do I make a mongo query for something that is not in a subdocument array of heterodox size?

I have a mongodb collection full of 65k+ documents, each one with a property named site_histories. Its value is an array that might be empty, or might not be. If it is not empty, it will have one or more objects similar to this:
"site_histories" : "[{\"site_id\":\"129373\",\"accepted\":\"1\",\"rejected\":\"0\",\"pending\":\"0\",\"user_id\":\"12743\"}]"
I need to make a query that will look for every instance in the collection of a document that does not have a given user_id.
I'm pretty new to Mongo, so I started with a query that would find every document that does have the given user_id, which I was then planning to negate with "$ne", but even that didn't work. This is the query I was using that didn't work:
db.test.find({site_histories: { $elemMatch: {user_id: '12743\' }}})
So can anyone tell me why this query didn't work? And can anyone help me format a query that will do what I need the final query to do?
If your site_histories really is an array, it should be as simple as doing:
db.test.find({"site_histories.user_id": "12743"})
That looks in all the elements of the array.
However, I'm a bit scared of all those backslashes. If site_histories is actually a string, that won't work. It would mean that the schema is poorly designed, and you could maybe try with $regex instead.
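To sketch both cases: if site_histories is a real array, a $ne on the dotted path matches documents in which no element has that user_id (including documents with an empty array). If it is stored as a JSON string, one option is to decode it on the application side. The helper names below are made up for illustration, and the sample documents are shortened versions of the one in the question.

```javascript
// Case 1: site_histories is a real array. Build the filter for
// "no element of site_histories has this user_id".
function withoutUserFilter(userId) {
  return { "site_histories.user_id": { $ne: userId } };
}
// In the mongo shell: db.test.find(withoutUserFilter("12743"))

// Case 2: site_histories is a JSON-encoded string. Decode it in the
// application and test each element there.
function lacksUser(doc, userId) {
  const raw = doc.site_histories;
  const histories =
    typeof raw === "string" ? JSON.parse(raw || "[]") : (raw || []);
  return !histories.some((h) => h.user_id === userId);
}

const sampleDocs = [
  { site_histories: "[{\"site_id\":\"129373\",\"user_id\":\"12743\"}]" },
  { site_histories: "[]" }
];
const docsWithoutUser = sampleDocs.filter((d) => lacksUser(d, "12743"));
```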

mongodb: create a top-level index for a nested document instead of having to index each individual sublevel?

This question is about how I can use indexes in MongoDB to look something up in nested documents, without having to index each individual sublevel.
I have a collection "test" in MongoDB which basically goes something like this:
{
"_id" : ObjectId("50fdd7d71d41c82875a5b6c1"),
"othercol" : "bladiebla",
"scenario" : {
"1" : { [1,2,3] },
"2" : { [4,5,6] }
}}
Scenario has multiple keys, and each document can have any subset of the scenarios (i.e. from none to a subset to all). Also: scenario can't be an array because I need it as a dictionary in Python. I created an index on the "scenario" field.
My issue is that I want to select on the collection, filtering for documents that have a certain scenario. So this works fine functionally:
db.test.find({"scenario.1": {$exists: true}})
However, it won't use any index I've put on scenario. An index is used only if I put one on "scenario.1". But I can have thousands (or more) of scenarios (and the collection itself has 100.000s of records), so I would prefer not to!
So I tried alternatives:
db.test.find({"scenario": "1"})
This will use the index on scenario, but won't return results. Making scenario an array still gives the same index issue.
Is my question clear? Can anyone give a pointer on how I could achieve the best performance here?
P.s. I have seen this: How to Create a nested index in MongoDB? but that solution is not possible in my case (due to the amount of scenarios)
Putting an index on a subobject like scenario is useless in this case, as it would only be used when you're filtering on complete scenario objects rather than individual fields (think of it as a binary blob comparison).
You either need to add an index on each of your possible fields ("scenario.1", "scenario.2", etc.) or rework your schema to get rid of the dynamic keys by doing something like this:
{
"_id" : ObjectId("50fdd7d71d41c82875a5b6c1"),
"othercol" : "bladiebla",
"scenario" : [
{ id: "1", value: [1,2,3] },
{ id: "2", value: [4,5,6] }
]
}
Then you can add a single index to scenario.id to support the queries you need to perform.
I know you said you need scenario to be a dict and not an array, but I don't see how you have much choice.
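If the dictionary form is only needed on the application side, one option is to convert between the two shapes when reading and writing. A minimal sketch of that conversion follows (shown in JavaScript to match the shell examples; the function names are made up, and the same idea carries over to Python dicts):

```javascript
// Dictionary form { "1": [1,2,3] } -> array form [{ id: "1", value: [1,2,3] }]
function toScenarioArray(dict) {
  return Object.entries(dict).map(([id, value]) => ({ id, value }));
}

// Array form -> dictionary form, for code that wants key-based lookup.
function toScenarioDict(arr) {
  return Object.fromEntries(arr.map(({ id, value }) => [id, value]));
}

const stored = toScenarioArray({ "1": [1, 2, 3], "2": [4, 5, 6] });
const restored = toScenarioDict(stored);
// In the mongo shell, a single index then covers every scenario id:
//   db.test.createIndex({ "scenario.id": 1 })
//   db.test.find({ "scenario.id": "1" })
```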
Johnny HK's answer is well explained and should be used in the general case. I will just suggest a workaround if you have to have many scenarios and don't need complex querying. Instead of keeping the values under the scenario field, hold only the ids of the scenarios in that field, and hold the values in separate fields of the document, using the scenario id as part of each field's key.
Example:
{
"_id" : ObjectId("50fdd7d71d41c82875a5b6c1"),
"othercol" : "bladiebla",
"scenario" : [ "1", "2"],
"scenario_1": [1,2,3],
"scenario_2": [4,5,6]
}
With this schema you can use an index on scenario to find specific scenarios. But if you need to query for specific scenario values, you again need an index on each scenario value field, i.e. scenario_1, scenario_2, etc. If you need indexes for each field, then don't change your original schema; use sparse indexes for each nested field, which might help reduce the size of your indexes.

MongoDB multi indexing

Quick question: do you know if Mongo can create an index on the following data?
{
prices: {
price1: 0.90,
price2: 0.12,
price3: 0.13
}
}
So I would like to create an index like this: ensureIndex({prices:1}), but I am not sure whether it will include all the prices and their values inside.
The reason I need to lay the data out this way rather than using the standard array approach is that I would like to be able to sort based on which price version has been chosen, i.e. sort({prices.price:1}).
Any ideas?
cheers
OK, I think I understand now. You want to sort all documents in a collection by a given price version.
This won't work too well with subdocuments, since you can't do a positional sort like this:
find({price.version:1}).sort({price.$.version: 1})
As such, this would be better as a flat structure within the root document. In fact, in your example, placing it in a subdocument has no real use other than to group the data, which actually makes indexing harder.
So I would recommend you just move the fields to the root document. You can do something fancy with the aggregation framework here with @Philipp's answer, but it sounds like this query might be run extremely often and maybe on larger working sets.
As such, the new document structure would be:
{
price1: 0.90,
price2: 0.12,
price3: 0.13
}
This is how I do this kind of thing in both SQL and MongoDB, and it works well with a compound index on all the fields.
Also, since MongoDB is, of course, schemaless, you don't have to worry about migrating the database when you want to add new prices to your app.
When you do ensureIndex({prices:1}) on that data, you create an index over the exact combination of price1, price2 and price3. So the index is only usable when you search for an exact match of all three prices together (no additional prices in either the search or the potential result).
But you could search for individual prices if you structure your data like this:
{
prices: [
{version: 1, value: 0.99},
{version: 2, value: 0.12},
{version: 3, value: 0.13}
]
}
You could then add a compound index which includes prices.version, prices.value and a unique identifier from the parent document. That way you could quickly search for products by {prices.version:1} or {prices.version:2}.
See also the documentation on indexes in MongoDB.
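A rough sketch of what that could look like, assuming a hypothetical products collection. For brevity the index spec below covers only prices.version and prices.value and leaves out the parent identifier; priceFilter is a made-up helper. $elemMatch makes sure the version and the value condition apply to the same array element.

```javascript
// Compound index over two fields of the same array of subdocuments.
const priceIndexSpec = { "prices.version": 1, "prices.value": 1 };

// Hypothetical helper: products whose price for `version` is at most `maxValue`.
function priceFilter(version, maxValue) {
  return {
    prices: { $elemMatch: { version: version, value: { $lte: maxValue } } }
  };
}

const cheapInVersion2 = priceFilter(2, 0.5);
// In the mongo shell (the collection name `products` is an assumption):
//   db.products.createIndex(priceIndexSpec)
//   db.products.find(cheapInVersion2)
```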