I have a MongoDB collection like this:
{
"_id" : {
"processUuid" : "653d0937-2afc-4915-ad42-d2b69f344402",
"partnerId" : "p377"
},
"isComplete" : true,
"tasks" : {
"dbb361a7-4b73-4691-bde5-2b160346464f" : {
"sku" : "4060079000048",
"status" : "FAILED",
"errorList" : [....]
},
"790dbc6f-563d-4eb7-931c-3cc604563dc1" : {
"sku" : "4060079000130",
"status" : "SUCCESSFUL",
"errorList" : [....]
},
... more tasks
}
... more processes
}
I want to query a certain sku and couldnt find a way. I know i could write an aggregation pipeline to project the inner part of a task, but then i would lose the task identifier which i need in order to add some stuff to a task.
I know the document structure is weird, i would have done it different, but its given.
I tried what i found about querying nested documents and so on, but unfortunately didnt get it. Any hints are appreciated.
Related
document sample data followed like this,
{
"_id" : ObjectId("62317ae9d007af22f984c0b5"),
"productCategoryName" : "Product category 1",
"productCategoryDescription" : "Description about product category 1",
"productCategoryIcon" : "abcd.svg",
"status" : true,
"productCategoryUnits" : [
{
"unitId" : ObjectId("61fa5c1273a4aae8d89e13c9"),
"unitName" : "kilogram",
"unitSymbol" : "kg",
"_id" : ObjectId("622715a33c8239255df084e4")
}
],
"productCategorySizes" : [
{
"unitId" : ObjectId("61fa5c1273a4aae8d89e13c9"),
"unitName" : "kilogram",
"unitSize" : 10,
"unitSymbol" : "kg",
"_id" : ObjectId("622715a33c8239255df084e3")
}
],
"attributes" : [
{
"attributeId" : ObjectId("62136ed38a35a8b4e195ccf4"),
"attributeName" : "Country of Origin",
"attributeOptions" : [],
"isRequired" : true,
"_id" : ObjectId("622715ba3c8239255df084f8")
}
]
}
This collection has been indexed in "_id". without sub-documents execution time is reduced but all document fields are required.
db.getCollection('product_categories').find({})
This collection contains 30000 records and this query takes more than 30 seconds to execute. so how to solve this issue. Anybody ask me a better solution. Thanks.
Indexing and compound indexing will make it use cache instead of scanning document every time you query it. 30.000 documents is nothing to MongoDB, it can handle millions in a second. If these fields are populated in the process that's another heavy operation for the query.
See if your schema is efficiently structured or you're throttling your connection to the server. Other thing to consider is to project only the fields that you require, using aggregation pipeline.
Although the question is not very clear you can follow this article for some best practices.
First of all I'd to apologize for my english.
I have serious issue with performance on my query. Unfortunetly I'm pretty new to mongoDB. So i have collection test which looks similar to this
{
"_id" : ObjectId("1"),
[...]
"statusHistories" : [
{
"created" : ISODate("2016-03-15T14:59:11.597Z"),
"status" : "STAT1",
},
{
"created" : ISODate("2016-03-15T14:59:20.465Z"),
"status" : "STAT2",
},
{
"created" : ISODate("2016-03-15T14:51:11.000Z"),
"status" : "STAT3",
}
],
}
statusHistories is an array.
Daily there's more than 3000 records inserted into that collection.
What I want to achieve is to find all tests that have given status and they are beeten two dates. So I have prepared query like this:
db.getCollection('test').find({
'statusHistories' : {
$elemMatch : {
created : {
"$gte" : ISODate("2016-07-11 00:00:00.052Z"),
"$lte" : ISODate("2016-07-11 23:59:00.052Z")
},
'status' : 'STAT1'
}
}
})
It gives expected result. Unfortunetly it takes around 120 seconds to complete. Which is way too long. Surprisingly if I split this query into two seperate it takes way less:
db.getCollection('test').find({
'statusHistories' : {
$elemMatch : {
created : {
"$gte" : ISODate("2016-07-11 00:00:00.052Z"),
"$lte" : ISODate("2016-07-11 23:59:00.052Z")
}
}
}
})
db.getCollection('test').find({
'statusHistories' : {
$elemMatch : {
'status' : 'STAT1'
}
}
})
Both of them need less then a second in order to complete.
So what am I doing wrong with my original query? I need to take those records in one query but when I combine two elemMatch statements into one it takes ages. I tried to ensureIndex on statusHistories but it didn't work out. Any suggestion would be really helpfull.
I need get a specific object in array of array in MongoDB.
I need get only the task object = [_id = ObjectId("543429a2cb38b1d83c3ff2c2")].
My document (projects):
{
"_id" : ObjectId("543428c2cb38b1d83c3ff2bd"),
"name" : "new project",
"author" : ObjectId("5424ac37eb0ea85d4c921f8b"),
"members" : [
ObjectId("5424ac37eb0ea85d4c921f8b")
],
"US" : [
{
"_id" : ObjectId("5434297fcb38b1d83c3ff2c0"),
"name" : "Test Story",
"author" : ObjectId("5424ac37eb0ea85d4c921f8b"),
"tasks" : [
{
"_id" : ObjectId("54342987cb38b1d83c3ff2c1"),
"name" : "teste3",
"author" : ObjectId("5424ac37eb0ea85d4c921f8b")
},
{
"_id" : ObjectId("543429a2cb38b1d83c3ff2c2"),
"name" : "jklasdfa_XXX",
"author" : ObjectId("5424ac37eb0ea85d4c921f8b")
}
]
}
]
}
Result expected:
{
"_id" : ObjectId("543429a2cb38b1d83c3ff2c2"),
"name" : "jklasdfa_XXX",
"author" : ObjectId("5424ac37eb0ea85d4c921f8b")
}
But i not getting it.
I still testing with no success:
db.projects.find({
"US.tasks._id" : ObjectId("543429a2cb38b1d83c3ff2c2")
}, { "US.tasks.$" : 1 })
I tryed with $elemMatch too, but return nothing.
db.projects.find({
"US" : {
"tasks" : {
$elemMatch : {
"_id" : ObjectId("543429a2cb38b1d83c3ff2c2")
}
}
}
})
Can i get ONLY my result expected using find()? If not, what and how use?
Thanks!
You will need an aggregation for that:
db.projects.aggregate([{$unwind:"$US"},
{$unwind:"$US.tasks"},
{$match:{"US.tasks._id":ObjectId("543429a2cb38b1d83c3ff2c2")}},
{$project:{_id:0,"task":"$US.tasks"}}])
should return
{ task : {
"_id" : ObjectId("543429a2cb38b1d83c3ff2c2"),
"name" : "jklasdfa_XXX",
"author" : ObjectId("5424ac37eb0ea85d4c921f8b")
}
Explanation:
$unwind creates a new (virtual) document for each array element
$match is the query part of your find
$project is similar as to project part in find i.e. it specifies the fields you want to get in the results
You might want to add a second $match before the $unwind if you know the document you are searching (look at performance metrics).
Edit: added a second $unwind since US is an array.
Don't know what you are doing (so realy can't tell and just sugesting) but you might want to examine if your schema (and mongodb) is ideal for your task because the document looks just like denormalized relational data probably a relational database would be better for you.
I have the following items in my collection:
> db.test.find().pretty()
{ "_id" : ObjectId("532c471a90bc7707609a3d4f"), "name" : "Alice" }
{
"_id" : ObjectId("532c472490bc7707609a3d50"),
"name" : "Bob",
"partner_type1" : {
"status" : "rejected"
}
}
{
"_id" : ObjectId("532c473e90bc7707609a3d51"),
"name" : "Carol",
"partner_type2" : {
"status" : "accepted"
}
}
{
"_id" : ObjectId("532c475790bc7707609a3d52"),
"name" : "Dave",
"partner_type1" : {
"status" : "pending"
}
}
There are two partner types: partner_type1 and partner_type2. A user cannot be accepted partner in the both of types. But he can be a rejected partner in partner_type1 but accepted in the another, for example.
How can I build Mongo query that fetches the users that can become partners?
When your user can only be accepted in one partner-type, you should turn it around: Have a field accepted_as:"partner_type1" or accepted_as:"partner_type2". For people who aren't accepted yet, either have no such field or set it to null.
In both cases, your query to get any non-accepted will then be:
{
data.accepted_as: null
}
(null matches both non-existing fields as well as fields explicitly set to null)
For me the logical schema would be this:
"partner : {
"type": 1,
"status" : "rejected"
}
At least that keeps the paths consistent between documents.
So if you want to stay away from using mapReduce type methods to find out "which field" it is on, and otherwise use plain queries and the aggregation pipeline, then don't vary field paths on documents. If you alter the "data" then that is the most consistent form.
I've "users" collection with a "watchlists" field, which have many inner fields too, one of that is "arrangeable_values" (the second field within "watchlists").
I need to find for each user in "users" collection, each "arrangeable_values" within "watchlists".
How can I do that with mongodb shell ?
Here is an example of data model :
> db.users.findOne({'nickname': 'superj'})
{
"_id" : ObjectId("4f6c42f6018a590001000001"),
"nickname" : "superj",
"provider" : "github",
"user_hash" : null,
"watchlists" : [
{
"_id" : ObjectId("4f6c42f7018a590001000002"),
"arrangeable_values" : {
"description" : "My introduction presentation to node.js along with sample code at various stages of building a simple RESTful web service with journey, cradle, winston, optimist, and http-console.",
"tag" : "",
"html_url" : "https://github.com/indexzero/nodejs-intro"
},
"avatar_url" : "https://secure.gravatar.com/avatar/d43e8ea63b61e7669ded5b9d3c2e980f?d=https://a248.e.akamai.net/assets.github.com%2Fimages%2Fgravatars%2Fgravatar-140.png",
"created_at" : ISODate("2011-02-01T10:20:29Z"),
"description" : "My introduction presentation to node.js along with sample code at various stages of building a simple RESTful web service with journey, cradle, winston, optimist, and http-console.",
"fork_" : false,
"forks" : 13,
"html_url" : "https://github.com/indexzero/nodejs-intro",
"pushed_at" : ISODate("2011-09-12T17:54:58Z"),
"searchable_values" : [
"description:my",
"description:introduction",
"description:presentation",
"html_url:indexzero",
"html_url:nodejs",
"html_url:intro"
],
"tags_array" : [ ],
"watchers" : 75
},
{
"_id" : ObjectId("4f6c42f7018a590001000003"),
"arrangeable_values" : {
"description" : "A Backbone alternative idea",
"tag" : "",
"html_url" : "https://github.com/maccman/spine.todos"
},
"avatar_url" : "https://secure.gravatar.com/avatar/baf018e2cc4616e4776d323215c7136c?d=https://a248.e.akamai.net/assets.github.com%2Fimages%2Fgravatars%2Fgravatar-140.png",
"created_at" : ISODate("2011-03-18T11:03:42Z"),
"description" : "A Backbone alternative idea",
"fork_" : false,
"forks" : 31,
"html_url" : "https://github.com/maccman/spine.todos",
"pushed_at" : ISODate("2011-11-20T22:59:45Z"),
"searchable_values" : [
"description:a",
"description:backbone",
"description:alternative",
"description:idea",
"html_url:https",
"html_url:github",
"html_url:com",
"html_url:maccman",
"html_url:spine",
"html_url:todos"
],
"tags_array" : [ ],
"watchers" : 139
}
]
}
For the document above, the following find() query would extract both the "nickname" of the document, and its associated "arrangeable_values" (where the document is in the users collection):
db.users.find({}, { "nickname" : 1, "watchlists.arrangeable_values" : 1 })
The result you get for your single document example would be:
{ "_id" : ObjectId("4f6c42f6018a590001000001"), "nickname" : "superj",
"watchlists" : [
{ "arrangeable_values" : { "description" : "My introduction presentation to node.js along with sample code at various stages of building a simple RESTful web service with journey, cradle, winston, optimist, and http-console.", "tag" : "", "html_url" : "https://github.com/indexzero/nodejs-intro" } },
{ "arrangeable_values" : { "description" : "A Backbone alternative idea", "tag" : "", "html_url" : "https://github.com/maccman/spine.todos" } }
] }
MongoDB queries return entire documents. You are looking for a field inside an array inside of the document and this will break the find().
The problem here is that any basic find() query, will return all matching documents. The find() does have the option to only return specific fields. But that will not work with your array of sub-objects. You could returns watchlists, but not watchlist entries that match.
As it stands you have two options:
Write some client-side code that loops through the documents and does the filtering. Remember that the shell is effectively a javascript driver, so you can write code in there.
Use the new aggregation framework. This will have a learning curve, but it can effectively extract the sub-items you're looking for.