In my MongoDB database I have a collection named "recommendation.users" in which I store all the user data.
A document in the collection has the following form:
{
"_id" : ObjectId("542e67e07f724fc2af28ba75"),
"id" : "",
"email" : "luigi#gmail.com",
"tags" : [
{
"tag" : "Paper Goods:Liners - Baking Cups",
"weight" : 2,
"lastInsert" : 1412327492874
},
{
"tag" : "Vegetable:Carrots - Jumbo",
"weight" : 4,
"lastInsert" : 1412597883569
},
{
"tag" : "Paper Goods:Lialberto- Baking Cups",
"weight" : 1,
"lastInsert" : 1412327548205
},
{
"tag" : "Fish:Swordfish Loin Portions",
"weight" : 3,
"lastInsert" : 1412597939124
},
{
"tag" : "Vegetable:Carrots - alberto#gmail.com",
"weight" : 2,
"lastInsert" : 1412597939124
}
]
}
As the driver I'm using ReactiveMongo in Scala.
Now I'm writing a method that takes tags: List[String] and searches the collection for all the users whose "tags" array contains every value passed as a parameter.
In other words, I want to find all the documents where, for every element tags(i) of tags, the "tags" array contains a subdocument with "tag" = tags(i).
How can I do something like this with MongoDB and ReactiveMongo?
My method has the following signature:
def findUsers(tags: List[String]): Future[Option[List[User]]] = {
//search all the matching Users
//if exists at least one User return a Some(List(users))
//otherwise returns a None
}
For example, if I had the following two documents in my collection:
[
{
"_id" : ObjectId("542e67e07f724fc2af28ba75"),
"id" : "",
"email" : "luigi#gmail.com",
"tags" : [
{
"tag" : "Paper Goods:Liners - Baking Cups",
"weight" : 2,
"lastInsert" : 1412327492874
},
{
"tag" : "Vegetable:Carrots - Jumbo",
"weight" : 4,
"lastInsert" : 1412597883569
},
{
"tag" : "Paper Goods:Lialberto- Baking Cups",
"weight" : 1,
"lastInsert" : 1412327548205
},
{
"tag" : "Fish:Swordfish Loin Portions",
"weight" : 3,
"lastInsert" : 1412597939124
},
{
"tag" : "Vegetable:Carrots - alberto#gmail.com",
"weight" : 2,
"lastInsert" : 1412597939124
}
]
},
{
"_id" : ObjectId("542e67e07f724fc2af28ba75"),
"id" : "",
"email" : "alberto#gmail.com",
"tags" : [
{
"tag" : "Paper Goods:Lialberto- Baking Cups",
"weight" : 1,
"lastInsert" : 1412327548205
},
{
"tag" : "Fish:Swordfish Loin Portions",
"weight" : 3,
"lastInsert" : 1412597939124
},
{
"tag" : "Vegetable:Carrots - alberto#gmail.com",
"weight" : 2,
"lastInsert" : 1412597939124
}
]
}]
If my method is called with findUsers(List("Fish:Swordfish Loin Portions", "Vegetable:Carrots - alberto#gmail.com")), it has to return the first User.
I know it is possible to do this with a loop that checks whether each user has all the given tags, but that is too complicated and verbose. Is there an alternative?
How can I do that?
Use a query like this, with one clause per required tag:
db.test.find({$and:[{"tags.tag":"a"},{"tags.tag":"b"}]})
You can also simplify your life by using ReactiveMongo-Queries (https://github.com/sh1ng/ReactiveMongo-Queries):
val cursor = collection.find(on[Interests].and(_.eq(_.tags.tag, "Fish:Swordfish Loin Portions"), _.eq(_.tags.tag, "Vegetable:Carrots - alberto#gmail.com"))).cursor[Interests]
Of course, you can also write it in plain ReactiveMongo.
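To generalize the query above to an arbitrary list of tags, here is a minimal sketch in plain JavaScript (mongo shell syntax). The helper names allTagsFilter and allTagsFilterCompact are made up for illustration, and the $all variant is an addition of mine rather than part of the answer above; it should be equivalent to the $and form. An empty tag list would need separate handling, since MongoDB rejects an empty $and array.

```javascript
// Hypothetical helper: one {"tags.tag": value} clause per requested tag.
// $and requires every clause to match the same document.
function allTagsFilter(tags) {
  return { $and: tags.map((t) => ({ "tags.tag": t })) };
}

// Hypothetical helper: the same condition written with the $all operator.
function allTagsFilterCompact(tags) {
  return { "tags.tag": { $all: tags } };
}

const wanted = [
  "Fish:Swordfish Loin Portions",
  "Vegetable:Carrots - Jumbo",
];

// Print both filter documents so they can be inspected.
console.log(JSON.stringify(allTagsFilter(wanted)));
console.log(JSON.stringify(allTagsFilterCompact(wanted)));
```

Either object can be passed to find() in the shell, and the same document shape can be built as a BSON document in the Scala driver.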
I need to execute the following query:
db.S12_RU.find({"venue.raw":a,"title":/b|c|d|e/}).sort({"year":-1}).skip(X).limit(Y);
where X and Y are numbers.
The number of documents in my collection is:
208915369
Currently, this sort of query takes about 6 minutes to execute.
I have the following indexes:
[
{
"v" : 2,
"key" : {
"_id" : 1
},
"name" : "_id_"
},
{
"v" : 2,
"key" : {
"venue.raw" : 1
},
"name" : "venue.raw_1"
},
{
"v" : 2,
"key" : {
"venue.raw" : 1,
"title" : 1,
"year" : -1
},
"name" : "venue.raw_1_title_1_year_-1"
}
]
A standard document looks like this:
{ "_id" : ObjectId("5fc25fc091e3146fb10484af"), "id" : "1967181478", "title" : "Quality of Life of Swedish Women with Fibromyalgia Syndrome, Rheumatoid Arthritis or Systemic Lupus Erythematosus", "authors" : [ { "name" : "Carol S. Burckhardt", "id" : "2052326732" }, { "name" : "Birgitha Archenholtz", "id" : "2800742121" }, { "name" : "Kaisa Mannerkorpi", "id" : "240289002" }, { "name" : "Anders Bjelle", "id" : "2419758571" } ], "venue" : { "raw" : "Journal of Musculoskeletal Pain", "id" : "49327845" }, "year" : 1993, "n_citation" : 31, "page_start" : "199", "page_end" : "207", "doc_type" : "Journal", "publisher" : "Taylor & Francis", "volume" : "1", "issue" : "", "doi" : "10.1300/J094v01n03_20" }
Is there any way to make this query execute in a few seconds?
I'm trying to do a join in MongoDB, but I also need to filter on a condition and count the matching values in what comes back from the join.
I will explain.
Currently I have this simple join query which looks like this:
db.Sets.aggregate([
{
$lookup:
{
from: "ExecutionTasks",
localField: "identifier",
foreignField: "setIdentifier",
as: "execTask"
}
}
])
It returns the following results:
/* 1 */
{
"_id" : 1,
"name" : "Demo Set",
"identifier" : "demo-set",
"description" : "Demo Set",
"creator" : {
"id" : 1,
"name" : "admin"
},
"createdDate" : ISODate("2017-03-24T20:09:55.120Z"),
"updatedDate" : ISODate("2017-03-24T20:09:55.120Z"),
"execTask" : [
{
"_id" : 1,
"isActive" : 1,
"type" : "count",
"threshold" : {
"default" : "0",
"deviations" : []
},
"name" : "amishay",
"setIdentifier" : "demo-set",
"description" : "a",
"query" : {
"source" : 1,
"text" : "select * from t"
},
"creator" : {
"id" : 1,
"name" : "admin"
},
"createdDate" : ISODate("2017-03-27T20:03:22.275Z"),
"updatedDate" : ISODate("2017-03-27T20:03:22.275Z")
},
{
"_id" : 2,
"isActive" : 0,
"type" : "count",
"threshold" : {
"default" : "0",
"deviations" : []
},
"name" : "amishay2",
"setIdentifier" : "demo-set",
"description" : "test",
"query" : {
"source" : 1,
"text" : "select * from t"
},
"creator" : {
"id" : 1,
"name" : "admin"
},
"createdDate" : ISODate("2017-03-27T20:03:57.248Z"),
"updatedDate" : ISODate("2017-03-27T20:03:57.248Z")
}
]
}
What I would like to do is return only the length of the execTask array, counting only the elements whose isActive attribute equals 1.
So basically I want to get something like:
{
"_id" : 1,
"name" : "Demo Set",
"identifier" : "demo-set",
"description" : "Demo Set",
"creator" : {
"id" : 1,
"name" : "admin"
},
"createdDate" : ISODate("2017-03-24T20:09:55.120Z"),
"updatedDate" : ISODate("2017-03-24T20:09:55.120Z"),
"execTask" : 1
}
I checked numerous questions online, but I only saw examples that query attributes of the main collection, not attributes of the joined collection.
Thanks!
You can add an $addFields stage after $lookup. The stage below uses $filter to select the matching elements and $size to count them.
The $filter operator returns only the elements of the execTask array that satisfy the given condition.
A single $ prefix references a field of the input document (here "$execTask"), while $$ references a variable, such as the "result" variable that $filter binds to each array element.
The $size operator calculates the length of the filtered array.
$addFields overwrites the existing execTask field, replacing its value with the calculated size.
{
$addFields: {
"execTask": {
$size: {
$filter: {
input: "$execTask",
as: "result",
cond: {
$eq: ["$$result.isActive", 1]
}
}
}
}
}
}
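Putting the $lookup from the question and the stage above together, the whole pipeline might look like the sketch below. The countActive function is only a plain-JavaScript emulation of what the $addFields stage computes for one joined document; its name and the trimmed sample document are assumptions for illustration.

```javascript
// The two stages combined; in the shell this would be run as
// db.Sets.aggregate(pipeline).
const pipeline = [
  { $lookup: { from: "ExecutionTasks", localField: "identifier",
               foreignField: "setIdentifier", as: "execTask" } },
  { $addFields: { execTask: { $size: { $filter: {
      input: "$execTask", as: "result",
      cond: { $eq: ["$$result.isActive", 1] } } } } } },
];

// Local emulation (hypothetical helper): keep the active tasks,
// then replace the array with its length.
function countActive(doc) {
  const active = doc.execTask.filter((task) => task.isActive === 1);
  return { ...doc, execTask: active.length };
}

// Trimmed version of the joined document shown in the question.
const joined = {
  _id: 1,
  name: "Demo Set",
  execTask: [{ _id: 1, isActive: 1 }, { _id: 2, isActive: 0 }],
};

console.log(countActive(joined));
```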
I have the following document, and I need to search for multiple items in the embedded collection "items".
Here's an example of a single SKU:
db.sku.findOne()
{
"_id" : NumberLong(1192),
"description" : "Uploaded via CSV",
"items" : [
{
"_id" : NumberLong(2),
"category" : DBRef("category", NumberLong(1)),
"description" : "840 tag visual",
"name" : "840 Visual Mini Round",
"version" : NumberLong(0)
},
{
"_id" : NumberLong(7),
"category" : DBRef("category", NumberLong(2)),
"description" : "Maxi",
"name" : "Maxi",
"version" : NumberLong(0)
},
{
"_id" : NumberLong(11),
"category" : DBRef("category", NumberLong(3)),
"description" : "Button",
"name" : "Button",
"version" : NumberLong(0)
},
{
"_id" : NumberLong(16),
"category" : DBRef("category", NumberLong(4)),
"customizationFields" : [
{
"_class" : "CustomizationField",
"_id" : NumberLong(1),
"displayText" : "Custom Print 1",
"fieldName" : "customPrint1",
"listOrder" : 1,
"maxInputLength" : 12,
"required" : false,
"version" : NumberLong(0)
},
{
"_class" : "CustomizationField",
"_id" : NumberLong(2),
"displayText" : "Custom Print 2",
"fieldName" : "customPrint2",
"listOrder" : 2,
"maxInputLength" : 17,
"required" : false,
"version" : NumberLong(0)
}
],
"description" : "2 custom lines of farm print",
"name" : "Custom 2",
"version" : NumberLong(2)
},
{
"_id" : NumberLong(20),
"category" : DBRef("category", NumberLong(5)),
"description" : "Color Red",
"name" : "Red",
"version" : NumberLong(0)
}
],
"skuCode" : "NF-USDA-XC2/SM-BC-R",
"version" : 0,
"webCowOptions" : "840miniwithcust2"
}
The same item _id values repeat throughout the embedded collections. Each Sku is made up of multiple items; all combinations are unique, but one item will be part of many Skus.
I'm struggling with the query structure to get what I'm looking for.
Here are a few things I have tried:
db.sku.find({'items._id':2},{'items._id':7})
That one only returns items with the id of 7
db.sku.find({items:{$all:[{_id:5}]}})
That one doesn't return anything, but it came up when looking for solutions. I found out about it in the MongoDB manual.
Here's an example of an expected result:
sku:{ "_id" : NumberLong(1013),
"items" : [ { "_id" : NumberLong(5) },
{ "_id" : NumberLong(7) },
{ "_id" : NumberLong(12) },
{ "_id" : NumberLong(16) },
{ "_id" :NumberLong(2) } ] },
sku:
{ "_id" : NumberLong(1014),
"items" : [ { "_id" : NumberLong(5) },
{ "_id" : NumberLong(7) },
{ "_id" : NumberLong(2) },
{ "_id" : NumberLong(16) },
{ "_id" :NumberLong(24) } ] },
sku:
{ "_id" : NumberLong(1015),
"items" : [ { "_id" : NumberLong(5) },
{ "_id" : NumberLong(7) },
{ "_id" : NumberLong(12) },
{ "_id" : NumberLong(2) },
{ "_id" :NumberLong(5) } ] }
Each Sku that comes back has both an item with id:7 and an item with id:2, along with any other items it has.
To further clarify, my purpose is to determine how many remaining combinations exist after entering the first couple of items.
Basically a customer will start specifying items, and we'll weed it down to the remaining valid combinations. So Sku.items[0].id=5 can only be combined with items[1].id=7 or items[1].id=10 …. Then items[1].id=7 can only be combined with items[2].id=20 … and so forth
The goal was to simplify my rules for purchase, and drive it all from the Sku codes. I don't know if I dug a deeper hole instead.
Thank you,
As for extracting the SKUs with item IDs 2 and 7: if I recall correctly, you have to use $elemMatch:
db.sku.find({'items' :{ '$all' :[{ '$elemMatch':{ '_id' : 2 }},{'$elemMatch': { '_id' : 7 }}]}} )
This selects every sku that contains both an item with _id 2 and an item with _id 7.
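Since the customer keeps adding items, it may help to build that filter from a list of ids. The following is only a sketch: skuFilterForItems and hasAllItems are hypothetical helper names, and the second sample SKU (2000) is invented to show a non-match. The local predicate mirrors what the $all / $elemMatch filter checks.

```javascript
// Hypothetical helper: one $elemMatch clause per required item id.
function skuFilterForItems(itemIds) {
  return {
    items: { $all: itemIds.map((id) => ({ $elemMatch: { _id: id } })) },
  };
}

// Local emulation of the filter: every requested id must appear
// on at least one element of the sku's items array.
function hasAllItems(sku, itemIds) {
  return itemIds.every((id) => sku.items.some((item) => item._id === id));
}

const skus = [
  { _id: 1013, items: [{ _id: 5 }, { _id: 7 }, { _id: 12 }, { _id: 16 }, { _id: 2 }] },
  { _id: 2000, items: [{ _id: 5 }, { _id: 7 }] }, // invented sample without item 2
];

// Skus that are still valid after the customer picked items 2 and 7.
const remaining = skus.filter((sku) => hasAllItems(sku, [2, 7]));
console.log(remaining.map((sku) => sku._id));
console.log(JSON.stringify(skuFilterForItems([2, 7])));
```

The object returned by skuFilterForItems can be passed to db.sku.find() as is.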
You can also use an aggregation pipeline. Note that the array to unwind is "$items", a field of the sku document itself:
db.sku.aggregate([
{"$unwind": "$items"},
{"$group": {"_id": "$_id", "items": {"$addToSet":{"_id": "$items._id"}}}},
{"$match": {"items._id": {$all:[2,7]}}}
])
I've got a collection with documents using a schema something like this (some members redacted):
{
"_id" : ObjectId("539f41a95d1887b57ab78bea"),
"answers" : {
"ratings" : {
"positivity" : [
2,
3,
5
],
"activity" : [
4,
4,
3
]
}
},
"media" : [
ObjectId("537ea185df872bb71e4df270"),
ObjectId("537ea185df872bb71e4df275"),
ObjectId("537ea185df872bb71e4df272")
]
}
In this schema, the first, second, and third positivity ratings correspond to the first, second, and third entries in the media array, respectively. The same is true for the activity ratings. I need to calculate statistics for the positivity and activity ratings with respect to their associated media objects across all documents in the collection. Right now, I'm doing this with MapReduce. I'd like to, however, accomplish this with the Aggregation Pipeline.
Ideally, I'd like to $unwind the media, answers.ratings.positivity, and answers.ratings.activity arrays simultaneously so that I end up with, for example, the following three documents based on the previous example:
[
{
"_id" : ObjectId("539f41a95d1887b57ab78bea"),
"answers" : {
"ratings" : {
"positivity" : 2,
"activity" : 4
}
},
"media" : ObjectId("537ea185df872bb71e4df270")
},
{
"_id" : ObjectId("539f41a95d1887b57ab78bea"),
"answers" : {
"ratings" : {
"positivity" : 3,
"activity" : 4
}
},
"media" : ObjectId("537ea185df872bb71e4df275")
},
{
"_id" : ObjectId("539f41a95d1887b57ab78bea"),
"answers" : {
"ratings" : {
"positivity" : 5,
"activity" : 3
}
},
"media" : ObjectId("537ea185df872bb71e4df272")
}
]
Is there some way to accomplish this?
The current aggregation framework does not allow you to do this. Being able to unwind multiple arrays that are known to be the same size, creating a document for the ith value of each, would be a good feature.
If you want to use the aggregation framework you will need to change your schema a little. For example take the following document schema:
{
"_id" : ObjectId("539f41a95d1887b57ab78bea"),
"answers" : {
"ratings" : {
"positivity" : [
{k:1, v:2},
{k:2, v:3},
{k:3, v:5}
],
"activity" : [
{k:1, v:4},
{k:2, v:4},
{k:3, v:3}
]
}},
"media" : [
{k:1, v:ObjectId("537ea185df872bb71e4df270")},
{k:2, v:ObjectId("537ea185df872bb71e4df275")},
{k:3, v:ObjectId("537ea185df872bb71e4df272")}
]
}
By doing this you are essentially adding the index to the object inside the array. After this it's just a matter of unwinding all the arrays and matching on the key.
db.test.aggregate([{$unwind:"$media"},
{$unwind:"$answers.ratings.positivity"},
{$unwind:"$answers.ratings.activity"},
{$project:{"media":1, "answers.ratings.positivity":1,"answers.ratings.activity":1,
include:{$and:[
{$eq:["$media.k", "$answers.ratings.positivity.k"]},
{$eq:["$media.k", "$answers.ratings.activity.k"]}
]}}
},
{$match:{include:true}}])
And the output is:
[
{
"_id" : ObjectId("539f41a95d1887b57ab78bea"),
"answers" : {
"ratings" : {
"positivity" : {
"k" : 1,
"v" : 2
},
"activity" : {
"k" : 1,
"v" : 4
}
}
},
"media" : {
"k" : 1,
"v" : ObjectId("537ea185df872bb71e4df270")
},
"include" : true
},
{
"_id" : ObjectId("539f41a95d1887b57ab78bea"),
"answers" : {
"ratings" : {
"positivity" : {
"k" : 2,
"v" : 3
},
"activity" : {
"k" : 2,
"v" : 4
}
}
},
"media" : {
"k" : 2,
"v" : ObjectId("537ea185df872bb71e4df275")
},
"include" : true
},
{
"_id" : ObjectId("539f41a95d1887b57ab78bea"),
"answers" : {
"ratings" : {
"positivity" : {
"k" : 3,
"v" : 5
},
"activity" : {
"k" : 3,
"v" : 3
}
}
},
"media" : {
"k" : 3,
"v" : ObjectId("537ea185df872bb71e4df272")
},
"include" : true
}
]
Doing this creates a lot of intermediate documents and may be slower than your current MapReduce implementation; you would need to run tests to check this. Keep in mind that the work grows cubically with the array size: unwinding three arrays of length n produces n^3 intermediate documents per input document, of which only n survive the $match.
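To make that growth concrete, here is a plain-JavaScript emulation of the three $unwind stages followed by the key match. It is only an illustration of the logic, not what the server actually executes; unwindAndMatch is a made-up name and the ObjectIds are written as plain strings to keep the snippet self-contained.

```javascript
// Emulates $unwind on three keyed arrays followed by the $match on equal keys.
function unwindAndMatch(doc) {
  const { positivity, activity } = doc.answers.ratings;
  const rows = [];
  let examined = 0; // combinations produced by the three unwinds
  for (const m of doc.media) {
    for (const p of positivity) {
      for (const a of activity) {
        examined += 1;
        // Keep only the combination where all three keys agree.
        if (m.k === p.k && m.k === a.k) {
          rows.push({ media: m.v, positivity: p.v, activity: a.v });
        }
      }
    }
  }
  return { rows, examined };
}

const doc = {
  answers: { ratings: {
    positivity: [{ k: 1, v: 2 }, { k: 2, v: 3 }, { k: 3, v: 5 }],
    activity: [{ k: 1, v: 4 }, { k: 2, v: 4 }, { k: 3, v: 3 }],
  } },
  media: [
    { k: 1, v: "537ea185df872bb71e4df270" },
    { k: 2, v: "537ea185df872bb71e4df275" },
    { k: 3, v: "537ea185df872bb71e4df272" },
  ],
};

const { rows, examined } = unwindAndMatch(doc);
console.log(examined, rows.length); // 27 combinations examined, 3 rows kept
```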
Say we have the following collection of documents:
{ "_id" : ObjectId("50a69fa904c8310609600be3"), "id" : 100, "city" : "San Francisco", "friends" : [ { "id" : 1, "name" : "John" }, { "id" : 2, "name" : "Betty" }, { "id" : 3, "name" : "Harry" } ] }
{ "_id" : ObjectId("50a69fc104c8310609600be4"), "id" : 200, "city" : "Palo Alto", "friends" : [ { "id" : 1, "name" : "Carol" }, { "id" : 2, "name" : "Frank" }, { "id" : 3, "name" : "Norman" } ] }
{ "_id" : ObjectId("50a69fc304c8310609600be5"), "id" : 300, "city" : "Los Angeles", "friends" : [ { "id" : 1, "name" : "Fred" }, { "id" : 2, "name" : "Neal" }, { "id" : 3, "name" : "David" } ] }
.
.
.
Now let's say that Frank (Palo Alto, id=2) is no longer my friend, and I want to delete him from the collection. I thought the following might work, but it doesn't:
db.test.update({"city":"Palo Alto"},{"$pull":{"friends.name":"Frank"}})
I'd like to be able to do something like that: delete an object from an array inside a document of the collection. How do you do this?
You were close. The query should be like this:
db.test.update({"city":"Palo Alto"},{"$pull":{"friends":{"name":"Frank"}}});
$pull takes an object whose key names the array field ("friends"). The value {"name":"Frank"} is the condition that is applied to each element of the array to select the elements to remove.
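If it helps to see the semantics spelled out, here is a rough plain-JavaScript emulation of that $pull on a single document. The pullFrom helper is hypothetical and only handles simple equality conditions like the one in this answer; the real operator supports much more.

```javascript
// Emulates {$pull: {<field>: <condition>}} for equality-only conditions:
// every array element matching all key/value pairs of the condition is removed.
function pullFrom(doc, field, condition) {
  const matches = (el) =>
    Object.entries(condition).every(([key, value]) => el[key] === value);
  // Return a copy so the input document is left untouched.
  return { ...doc, [field]: doc[field].filter((el) => !matches(el)) };
}

const paloAlto = {
  id: 200,
  city: "Palo Alto",
  friends: [
    { id: 1, name: "Carol" },
    { id: 2, name: "Frank" },
    { id: 3, name: "Norman" },
  ],
};

const updated = pullFrom(paloAlto, "friends", { name: "Frank" });
console.log(updated.friends.map((f) => f.name)); // [ 'Carol', 'Norman' ]
```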