I have a database with about 50k candidate records like the example below:
[
{
"_id":{
"$oid":"5744eff20ca7832b5c7452321"
},
"name":"Candidate 1",
"characteristics":[
{
"name":"personal skills",
"info":[
"Great speaker",
"Very friendly",
"Born to be a leader"
]
},
{
"name":"education background",
"info":[
"Studied Mechanical Engineering",
"Best of his class 2001"
]
}
]
},
... thousands more objects with same structure
]
Given some personal skills, I would like to search for the best matches for that input:
Example of input:
["speaker", "leader"]
Expected output:
A list of candidates (whole objects), sorted from the best match to the worst.
I basically need to search only the field "personal skills".
What would be a good approach for this problem using MongoDB? Or is there another database that fits this problem better?
The query below uses regular expressions to return the records that match both "speaker" and "leader".
db.collection_name.find(
{ $and :
[
{"characteristics.info": /.*speaker.*/},
{"characteristics.info": /.*leader.*/}
]
}
)
For better performance we can create a text index, as shown below, but note that a collection can have only one text index.
db.collection_name.createIndex({"characteristics.info":"text"});
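However, a text index is only used by queries that contain the $text operator, for example db.collection_name.find({ $text: { $search: "speaker leader" } }); regular expression queries cannot use it. Running explain on the regex query confirms this: the winning plan is a COLLSCAN.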
db.collection_name.find({ $and: [{"characteristics.info": /.*speaker.*/}, {"characteristics.info": /.*leader.*/}]}).explain()
Mongo shell output with query plan explained
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "test.a",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
{
"characteristics.info" : {
"$regex" : ".*speaker.*"
}
},
{
"characteristics.info" : {
"$regex" : ".*leader.*"
}
}
]
},
"winningPlan" : {
"stage" : "COLLSCAN",
"filter" : {
"$and" : [
{
"characteristics.info" : {
"$regex" : ".*speaker.*"
}
},
{
"characteristics.info" : {
"$regex" : ".*leader.*"
}
}
]
},
"direction" : "forward"
},
"rejectedPlans" : [ ]
},
"serverInfo" : {
"host" : "PC369236",
"port" : 27017,
"version" : "3.6.1",
"gitVersion" : "025d4f4fe61efd1fb6f0005be20cb45a004093d1"
},
"ok" : 1
}
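Note also that the regex query only returns candidates that contain every term; it does not rank them. One way to get "best match first" is to score each candidate by how many of the requested skills appear in its "personal skills" entry and sort by that score. The sketch below shows that scoring rule in plain JavaScript over in-memory documents, assuming case-insensitive substring matching; rankCandidates is a hypothetical helper, not a MongoDB API. Inside MongoDB, a comparable ranking is usually obtained by sorting a $text query on { $meta: "textScore" }, although with a text index on characteristics.info that score also counts words from the other characteristics.

```javascript
// Hypothetical helper: score candidates by how many requested skills appear
// (case-insensitive substring) in their "personal skills" info lines.
function rankCandidates(candidates, skills) {
  const wanted = skills.map((s) => s.toLowerCase());
  return candidates
    .map((candidate) => {
      const personal = candidate.characteristics.find(
        (c) => c.name === "personal skills"
      );
      const info = personal ? personal.info.map((i) => i.toLowerCase()) : [];
      // One point per requested skill found in at least one info line.
      const score = wanted.filter((w) => info.some((line) => line.includes(w))).length;
      return { candidate, score };
    })
    .filter((entry) => entry.score > 0) // drop candidates with no match at all
    .sort((a, b) => b.score - a.score) // best match first
    .map((entry) => entry.candidate);
}

// Two trimmed-down candidates shaped like the question's data.
const candidates = [
  {
    name: "Candidate 2",
    characteristics: [{ name: "personal skills", info: ["Great speaker"] }],
  },
  {
    name: "Candidate 1",
    characteristics: [
      { name: "personal skills", info: ["Great speaker", "Very friendly", "Born to be a leader"] },
      { name: "education background", info: ["Studied Mechanical Engineering"] },
    ],
  },
];

const ranked = rankCandidates(candidates, ["speaker", "leader"]);
console.log(ranked.map((c) => c.name)); // [ 'Candidate 1', 'Candidate 2' ]
```

Only the "personal skills" entry is scored, so a term that appears only under "education background" does not count as a match.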
I have a collection with documents that can either look like this (option A):
{
"my_list": [
{ "id": "A", "other_data": 123 },
{ "id": "B", "other_data": 456 },
{ "id": "C", "other_data": 789 },
]
}
or like this (option B):
{
"my_list": {
"A": 123,
"B": 456,
"C": 789,
}
}
The question is: which one is more efficient for queries such as "fetch all documents that have id 'B' in my_list"?
Also, for the better option, how do you tell Mongo to create the relevant index?
Definitely the first one.
{
"my_list": [
{ "id": "A", "other_data": 123 },
{ "id": "B", "other_data": 456 },
{ "id": "C", "other_data": 789 },
]
}
MongoDB uses multikey indexes to index the content stored in arrays. If you index a field that holds an array value, MongoDB creates separate index entries for every element of the array. These multikey indexes allow queries to select documents that contain arrays by matching on element or elements of the arrays. MongoDB automatically determines whether to create a multikey index if the indexed field contains an array value; you do not need to explicitly specify the multikey type.
https://docs.mongodb.com/manual/indexes/#multikey-index
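For option A, create the index on the id field with db.collection.createIndex({ "my_list.id": 1 }); MongoDB marks it as multikey automatically. In the second option, my_list is an embedded document whose keys are the ids themselves, so you would need a separate single-field (or compound) index for every key, such as my_list.A and my_list.B.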
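To make the difference concrete, here is a plain-JavaScript sketch (in-memory arrays stand in for the collection; the helper names are made up for illustration) of the two query shapes: option A filters on one fixed path, { "my_list.id": "B" }, while option B has to test a path that contains the id itself, { "my_list.B": { $exists: true } }.

```javascript
// Option A: my_list is an array of { id, other_data } subdocuments.
const optionA = [
  { _id: 1, my_list: [{ id: "A", other_data: 123 }, { id: "B", other_data: 456 }] },
  { _id: 2, my_list: [{ id: "C", other_data: 789 }] },
];

// Option B: my_list is an object keyed by id.
const optionB = [
  { _id: 1, my_list: { A: 123, B: 456 } },
  { _id: 2, my_list: { C: 789 } },
];

// Mirrors { "my_list.id": id }: the same field path whatever the id is,
// which is why a single multikey index on "my_list.id" can serve it.
function hasIdOptionA(docs, id) {
  return docs.filter((d) => d.my_list.some((entry) => entry.id === id));
}

// Mirrors { ["my_list." + id]: { $exists: true } }: the id is part of the
// field path, so each distinct id would need its own index.
function hasIdOptionB(docs, id) {
  return docs.filter((d) => Object.prototype.hasOwnProperty.call(d.my_list, id));
}

console.log(hasIdOptionA(optionA, "B").map((d) => d._id)); // [ 1 ]
console.log(hasIdOptionB(optionB, "B").map((d) => d._id)); // [ 1 ]
```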
Transform arrays into key:value store
If you still need a key:value shape, MongoDB can transform the array into an object during aggregation, for example:
db.collection.aggregate([
{
$match: { "my_list.id" : "A" }
},
{
$project: {
my_list: {
$arrayToObject: {
$map: {
input: "$my_list",
in: {
k: "$$this.id",
v: "$$this.other_data"
}
}
}
}
}
}
])
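In plain JavaScript, the same transformation corresponds to building { k, v } pairs and passing them to Object.fromEntries. This is only an in-memory illustration of what the $map/$arrayToObject stage computes; toKeyValue is a hypothetical helper name.

```javascript
// Plain-JavaScript equivalent of the $project stage above:
// $map builds { k, v } pairs and $arrayToObject turns them into an object.
function toKeyValue(myList) {
  const pairs = myList.map((item) => ({ k: item.id, v: item.other_data })); // $map
  return Object.fromEntries(pairs.map(({ k, v }) => [k, v])); // $arrayToObject
}

const doc = {
  my_list: [
    { id: "A", other_data: 123 },
    { id: "B", other_data: 456 },
    { id: "C", other_data: 789 },
  ],
};

console.log(toKeyValue(doc.my_list)); // { A: 123, B: 456, C: 789 }
```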
The explain output below shows that the $match stage uses the multikey index my_list.id_1 (an IXSCAN stage), so the query is executed efficiently:
{
"stages" : [
{
"$cursor" : {
"query" : {
"my_list.id" : "A"
},
"fields" : {
"my_list" : 1,
"_id" : 1
},
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "test.collection",
"indexFilterSet" : false,
"parsedQuery" : {
"my_list.id" : {
"$eq" : "A"
}
},
"queryHash" : "599B2BF4",
"planCacheKey" : "48B2FCB0",
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"my_list.id" : 1.0
},
"indexName" : "my_list.id_1",
"isMultiKey" : true,
"multiKeyPaths" : {
"my_list.id" : [
"my_list"
]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"my_list.id" : [
"[\"A\", \"A\"]"
]
}
}
},
"rejectedPlans" : []
}
}
},
{
"$project" : {
"_id" : true,
"my_list" : {
"$arrayToObject" : [
{
"$map" : {
"input" : "$my_list",
"as" : "this",
"in" : {
"k" : "$$this.id",
"v" : "$$this.other_data"
}
}
}
]
}
}
}
],
"ok" : 1.0
}
Here is my question.
These are my sample records:
{
"_id" : ObjectId("5d9b69fae4757402b4b4ca0d"),
"status_changed_utc" : [
{
"status" : NumberInt(1),
"time" : ISODate("2019-05-20T23:03:10.000+0000")
},
{
"status" : NumberInt(2),
"time" : ISODate("2019-05-23T23:04:03.000+0000")
},
{
"status" : NumberInt(4),
"time" : ISODate("2019-05-23T23:05:06.000+0000")
},
{
"status" : NumberInt(5),
"time" : ISODate("2019-05-23T23:05:07.000+0000")
},
{
"status" : NumberInt(6),
"time" : ISODate("2019-05-23T23:05:09.000+0000")
}
],
"requested_completion_utc" : ISODate("2019-05-22T23:05:09.000+0000")
},
{
"_id" : ObjectId("5d9b69fae4757402b4b4ca1e"),
"status_changed_utc" : [
{
"status" : NumberInt(1),
"time" : ISODate("2019-06-20T23:03:10.000+0000")
},
{
"status" : NumberInt(2),
"time" : ISODate("2019-07-23T23:04:03.000+0000")
},
{
"status" : NumberInt(4),
"time" : ISODate("2019-07-23T23:05:06.000+0000")
},
{
"status" : NumberInt(5),
"time" : ISODate("2019-05-23T23:05:07.000+0000")
},
{
"status" : NumberInt(6),
"time" : ISODate("2019-07-23T23:05:09.000+0000")
}
],
"requested_completion_utc" : ISODate("2019-08-22T23:05:09.000+0000")
},
I want to find the records whose "requested_completion_utc" date is later than the "status_changed_utc.time" of the array element whose "status" is NumberInt(2).
In this example, I expect to get the second record.
Apart from $unwind, is there any other way to handle this?
Thanks
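If NumberInt(2) is always in the second position of the array, it is fairly easy. Note that a field path such as "$status_changed_utc.1" does not index into an array inside an aggregation expression, so use $arrayElemAt on the time values instead:
db.whatever.find({ $expr: { $gt: [ "$requested_completion_utc", { $arrayElemAt: [ "$status_changed_utc.time", 1 ] } ] } })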
The requirement is to find the records that meet the following condition; how can I write the query without using $unwind?
requested_completion_utc > status_changed_utc.time and status_changed_utc.status = 2, where status_changed_utc.time comes from the array element whose status_changed_utc.status is 2.
Eventually, we found the answer.
db.getCollection("test").aggregate(
// Pipeline
[
// Stage 1: keep documents that have at least one status-2 entry
// whose time is earlier than requested_completion_utc
{
$match: {
$expr: {
$gt: [
{
$size: {
$filter: {
"input": "$status_changed_utc",
"as": "doc",
"cond": {
$and: [
{
$eq: ["$$doc.status", 2]
},
{
$gt: ["$requested_completion_utc", "$$doc.time"]
}
]
}
}
}
},
0
]
}
}
}
]
);
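The filter condition can also be checked outside the database. The sketch below applies the same rule to the two sample records in plain JavaScript, assuming the ISODate values become JavaScript Date objects and the NumberInt values plain numbers (records trimmed to their status 1 and 2 entries); lateStatusTwo is a hypothetical name. Only the second record should survive, as the question expects.

```javascript
// In-memory check of the same rule as the $match/$expr/$filter stage:
// keep a record if at least one status_changed_utc entry has status 2 and a
// time earlier than requested_completion_utc.
function lateStatusTwo(records) {
  return records.filter((record) =>
    record.status_changed_utc.some(
      (entry) => entry.status === 2 && record.requested_completion_utc > entry.time
    )
  );
}

// The two sample records, trimmed to their status 1 and 2 entries.
const records = [
  {
    _id: "5d9b69fae4757402b4b4ca0d",
    status_changed_utc: [
      { status: 1, time: new Date("2019-05-20T23:03:10Z") },
      { status: 2, time: new Date("2019-05-23T23:04:03Z") },
    ],
    requested_completion_utc: new Date("2019-05-22T23:05:09Z"),
  },
  {
    _id: "5d9b69fae4757402b4b4ca1e",
    status_changed_utc: [
      { status: 1, time: new Date("2019-06-20T23:03:10Z") },
      { status: 2, time: new Date("2019-07-23T23:04:03Z") },
    ],
    requested_completion_utc: new Date("2019-08-22T23:05:09Z"),
  },
];

console.log(lateStatusTwo(records).map((r) => r._id)); // [ '5d9b69fae4757402b4b4ca1e' ]
```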
I am working on software that uses MongoDB as its database. I have a collection like this (this is just one document):
{
"_id" : ObjectId("5aef51e0af42ea1b70d0c4dc"),
"EndpointId" : "89799bcc-e86f-4c8a-b340-8b5ed53caf83",
"DateTime" : ISODate("2018-05-06T19:05:04.574Z"),
"Url" : "test",
"Tags" : [
{
"Uid" : "E2:02:00:18:DA:40",
"Type" : 1,
"DateTime" : ISODate("2018-05-06T19:05:04.574Z"),
"Sensors" : [
{
"Type" : 1,
"Value" : NumberDecimal("-98")
},
{
"Type" : 2,
"Value" : NumberDecimal("-65")
}
]
},
{
"Uid" : "12:3B:6A:1A:B7:F9",
"Type" : 1,
"DateTime" : ISODate("2018-05-06T19:05:04.574Z"),
"Sensors" : [
{
"Type" : 1,
"Value" : NumberDecimal("-95")
},
{
"Type" : 2,
"Value" : NumberDecimal("-59")
},
{
"Type" : 3,
"Value" : NumberDecimal("12.939770381907275")
}
]
}
]
}
and I want to run this query on it.
db.myCollection.aggregate([
{ $unwind: "$Tags" },
{
$match: {
$and: [
{
"Tags.DateTime": {
$gte: ISODate("2018-05-06T19:05:02Z"),
$lte: ISODate("2018-05-06T19:05:09Z"),
},
},
{ "Tags.Uid": { $in: ["C1:3D:CA:D4:45:11"] } },
],
},
},
{ $unwind: "$Tags.Sensors" },
{ $match: { "$Tags.Sensors.Type": { $in: [1, 2] } } },
{
$project: {
_id: 0,
EndpointId: "$EndpointId",
TagId: "$Tags.Uid",
Url: "$Url",
TagType: "$Tags.Type",
Date: "$Tags.DateTime",
SensorType: "$Tags.Sensors.Type",
Value: "$Tags.Sensors.Value",
},
},
])
The problem is that the second $match (the one that checks $Tags.Sensors.Type) doesn't work and doesn't affect the result of the query.
How can I solve that?
If this is not the right way, what is the right way to apply these conditions?
The $match stage accepts field names without a leading $ sign. You've done that correctly in your first $match stage but in the second one you write $Tags.Sensors.Type. Simply removing the leading $ sign should make your query work.
Mind you, the whole thing can be a bit simplified (and some beautification doesn't hurt, either):
You don't need to use $and in your example since it's assumed by default if you specify more than one criterion in a filter.
The $in that you use for the Tags.Uid filter can be replaced by a simple equality match ("Tags.Uid" : "C1:3D:CA:D4:45:11") because the list contains only one value; keep $in for Tags.Sensors.Type, which accepts two values.
In the $project stage, instead of (kind of) duplicating identical field names you can use the <field>: 1 syntax unless the order of the fields matters.
So the final query would be something like this.
db.myCollection.aggregate([
{
"$unwind" : "$Tags"
},
{
"$match" : {
"Tags.DateTime" : { "$gte" : ISODate("2018-05-06T19:05:02Z"), "$lte" : ISODate("2018-05-06T19:05:09Z") },
"Tags.Uid" : "C1:3D:CA:D4:45:11"
}
}, {
"$unwind" : "$Tags.Sensors"
}, {
"$match" : {
"Tags.Sensors.Type" : { "$in" : [1,2] }
}
},
{
"$project" : {
"_id" : 0,
"EndpointId" : 1,
"TagId" : "$Tags.Uid",
"Url" : 1,
"TagType" : "$Tags.Type",
"Date" : "$Tags.DateTime",
"SensorType" : "$Tags.Sensors.Type",
"Value" : "$Tags.Sensors.Value"
}
}])
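To see what each stage does, here is an in-memory sketch of the same pipeline in plain JavaScript: nested loops play the role of the two $unwind stages and if checks play the role of the $match stages. The document is the one from the question with NumberDecimal values replaced by plain numbers, and flattenReadings is a hypothetical helper. Note that the question's Uid "C1:3D:CA:D4:45:11" does not occur in that sample document, so with it the pipeline returns nothing; the usage below filters on "12:3B:6A:1A:B7:F9" instead.

```javascript
// In-memory walk-through: $unwind Tags, $match on date/Uid,
// $unwind Tags.Sensors, $match on sensor type, then $project a flat row.
function flattenReadings(docs, { from, to, uids, sensorTypes }) {
  const rows = [];
  for (const doc of docs) {
    for (const tag of doc.Tags) { // { $unwind: "$Tags" }
      if (tag.DateTime < from || tag.DateTime > to) continue; // $gte / $lte
      if (!uids.includes(tag.Uid)) continue; // "Tags.Uid" filter
      for (const sensor of tag.Sensors) { // { $unwind: "$Tags.Sensors" }
        if (!sensorTypes.includes(sensor.Type)) continue; // "Tags.Sensors.Type": { $in: [...] }
        rows.push({ // $project
          EndpointId: doc.EndpointId,
          TagId: tag.Uid,
          Url: doc.Url,
          TagType: tag.Type,
          Date: tag.DateTime,
          SensorType: sensor.Type,
          Value: sensor.Value,
        });
      }
    }
  }
  return rows;
}

const sample = {
  EndpointId: "89799bcc-e86f-4c8a-b340-8b5ed53caf83",
  DateTime: new Date("2018-05-06T19:05:04.574Z"),
  Url: "test",
  Tags: [
    {
      Uid: "E2:02:00:18:DA:40",
      Type: 1,
      DateTime: new Date("2018-05-06T19:05:04.574Z"),
      Sensors: [{ Type: 1, Value: -98 }, { Type: 2, Value: -65 }],
    },
    {
      Uid: "12:3B:6A:1A:B7:F9",
      Type: 1,
      DateTime: new Date("2018-05-06T19:05:04.574Z"),
      Sensors: [
        { Type: 1, Value: -95 },
        { Type: 2, Value: -59 },
        { Type: 3, Value: 12.939770381907275 },
      ],
    },
  ],
};

const window = {
  from: new Date("2018-05-06T19:05:02Z"),
  to: new Date("2018-05-06T19:05:09Z"),
  sensorTypes: [1, 2],
};

const rows = flattenReadings([sample], { ...window, uids: ["12:3B:6A:1A:B7:F9"] });
console.log(rows.map((r) => [r.SensorType, r.Value])); // [ [ 1, -95 ], [ 2, -59 ] ]
```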
Helpful people of StackOverflow!
I'm in the process of learning how to work with MongoDB, and am currently stuck with one particular problem.
I'm building a guitar tabs app, working only with an "artist" base document. All other data are subdocuments. Depending on the accessed functionality (e.g. search, list tabs by artist, view single tab), I aggregate and project my documents accordingly.
However, I can't get one projection to work as I want.
Given the following data:
{
"artist" : "Jeff Buckley",
"songs" : [
{
"name" : "Grace",
"tabs" : [
{
"version" : 1,
"tab" : "...",
"tuning" : "DADGBe"
},
{
"version" : 2,
"tab" : "...",
"tuning" : "DADGBe"
}
]
},
{
"name" : "Last Goodbye",
"tabs" : [
{
"version" : 1,
"tab" : "...",
"tuning" : "DGDGBD"
},
{
"version" : 2,
"tab" : "...",
"tuning" : "EADGBe"
}
]
}
]
}
I want to aggregate it the following way for a list view:
{
"artist" : "Jeff Buckley",
"tabs" : [
{
"song" : "Grace",
"version" : 1
},
{
"song" : "Grace",
"version" : 2
},
{
"song" : "Last Goodbye",
"version" : 1
},
{
"song" : "Last Goodbye",
"version" : 2
},
]
}
I tried it with the following projection:
db.tabs.aggregate(
[
{
$project : {
artist : 1,
"tabs.song" : "$songs.name",
"tabs.version" : "$songs.tabs.version"
}
}
]
)
But instead I got:
{
"artist" : "Jeff Buckley",
"tabs" : {
"version" : [[2,1],[2,1]],
"song" : ["Grace","Last Goodbye"]
}
}
Can anyone point me in the right direction?
Thanks!
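Your aggregation query is not correct: $project only reshapes the keys of each document, so it cannot flatten the nested songs and tabs arrays into one list. Unwind both arrays and group them back by artist instead: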
db.tabs.aggregate([
{ $unwind: "$songs" },
{ $unwind: "$songs.tabs" },
{ $group: {
_id: "$artist",
tabs: { $push: { song: "$songs.name", version: "$songs.tabs.version" } } } },
{ $project: {
tabs: "$tabs",
artist: "$_id",
_id: 0 } }
]).pretty()
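The same reshaping can be sketched in plain JavaScript to show what the two $unwind stages and the $group/$push stage produce. This is an in-memory illustration only (listTabs is a hypothetical helper); unlike the Map used here, $group does not guarantee the order of its output groups.

```javascript
// In-memory sketch of: $unwind songs -> $unwind songs.tabs -> $group by artist
// with $push -> $project renaming _id back to artist.
function listTabs(artistDocs) {
  const groups = new Map(); // artist -> pushed { song, version } entries
  for (const doc of artistDocs) {
    for (const song of doc.songs) { // { $unwind: "$songs" }
      for (const tab of song.tabs) { // { $unwind: "$songs.tabs" }
        if (!groups.has(doc.artist)) groups.set(doc.artist, []);
        groups.get(doc.artist).push({ song: song.name, version: tab.version }); // $push
      }
    }
  }
  // $project: { tabs: "$tabs", artist: "$_id", _id: 0 }
  return [...groups].map(([artist, tabs]) => ({ artist, tabs }));
}

const buckley = {
  artist: "Jeff Buckley",
  songs: [
    {
      name: "Grace",
      tabs: [
        { version: 1, tab: "...", tuning: "DADGBe" },
        { version: 2, tab: "...", tuning: "DADGBe" },
      ],
    },
    {
      name: "Last Goodbye",
      tabs: [
        { version: 1, tab: "...", tuning: "DGDGBD" },
        { version: 2, tab: "...", tuning: "EADGBe" },
      ],
    },
  ],
};

console.log(JSON.stringify(listTabs([buckley]), null, 2));
```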
I've got a collection with documents using a schema something like this (some members redacted):
{
"_id" : ObjectId("539f41a95d1887b57ab78bea"),
"answers" : {
"ratings" : {
"positivity" : [
2,
3,
5
],
"activity" : [
4,
4,
3
]
}
},
"media" : [
ObjectId("537ea185df872bb71e4df270"),
ObjectId("537ea185df872bb71e4df275"),
ObjectId("537ea185df872bb71e4df272")
]
}
In this schema, the first, second, and third positivity ratings correspond to the first, second, and third entries in the media array, respectively. The same is true for the activity ratings. I need to calculate statistics for the positivity and activity ratings with respect to their associated media objects across all documents in the collection. Right now, I'm doing this with MapReduce. I'd like to, however, accomplish this with the Aggregation Pipeline.
Ideally, I'd like to $unwind the media, answers.ratings.positivity, and answers.ratings.activity arrays simultaneously so that I end up with, for example, the following three documents based on the previous example:
[
{
"_id" : ObjectId("539f41a95d1887b57ab78bea"),
"answers" : {
"ratings" : {
"positivity" : 2,
"activity" : 4
}
},
"media" : ObjectId("537ea185df872bb71e4df270")
},
{
"_id" : ObjectId("539f41a95d1887b57ab78bea"),
"answers" : {
"ratings" : {
"positivity" : 3,
"activity" : 4
}
},
"media" : ObjectId("537ea185df872bb71e4df275")
},
{
"_id" : ObjectId("539f41a95d1887b57ab78bea"),
"answers" : {
"ratings" : {
"positivity" : 5,
"activity" : 3
}
},
"media" : ObjectId("537ea185df872bb71e4df272")
}
]
Is there some way to accomplish this?
At the time of this answer, the aggregation framework did not allow you to do this. Being able to unwind multiple arrays that are known to be the same size, creating one document for the ith value of each, would be a good feature.
If you want to use the aggregation framework you will need to change your schema a little. For example take the following document schema:
{
"_id" : ObjectId("539f41a95d1887b57ab78bea"),
"answers" : {
"ratings" : {
"positivity" : [
{k:1, v:2},
{k:2, v:3},
{k:3, v:5}
],
"activity" : [
{k:1, v:4},
{k:2, v:4},
{k:3, v:3}
],
}},
"media" : [
{k:1, v:ObjectId("537ea185df872bb71e4df270")},
{k:2, v:ObjectId("537ea185df872bb71e4df275")},
{k:3, v:ObjectId("537ea185df872bb71e4df272")}
]
}
By doing this you are essentially adding the index to the object inside the array. After this it's just a matter of unwinding all the arrays and matching on the key.
db.test.aggregate([{$unwind:"$media"},
{$unwind:"$answers.ratings.positivity"},
{$unwind:"$answers.ratings.activity"},
{$project:{"media":1, "answers.ratings.positivity":1,"answers.ratings.activity":1,
include:{$and:[
{$eq:["$media.k", "$answers.ratings.positivity.k"]},
{$eq:["$media.k", "$answers.ratings.activity.k"]}
]}}
},
{$match:{include:true}}])
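The sketch below reproduces this pipeline in plain JavaScript for the sample document (ObjectIds shown as hex strings; unwindAndMatch is a hypothetical helper). The three nested loops stand for the three $unwind stages, and the counter shows how many documents exist before the $match: 3 × 3 × 3 = 27, of which only 3 are kept.

```javascript
// In-memory sketch of the answer's pipeline: three nested loops stand in for
// the three $unwind stages (a cross product), and the if mirrors the
// $project/$match pair that keeps rows whose k values all agree.
function unwindAndMatch(doc) {
  const { positivity, activity } = doc.answers.ratings;
  let produced = 0; // documents that exist after the three $unwind stages
  const kept = [];
  for (const media of doc.media) {
    for (const pos of positivity) {
      for (const act of activity) {
        produced += 1;
        if (media.k === pos.k && media.k === act.k) {
          kept.push({ media: media.v, positivity: pos.v, activity: act.v });
        }
      }
    }
  }
  return { produced, kept };
}

const sample = {
  answers: {
    ratings: {
      positivity: [{ k: 1, v: 2 }, { k: 2, v: 3 }, { k: 3, v: 5 }],
      activity: [{ k: 1, v: 4 }, { k: 2, v: 4 }, { k: 3, v: 3 }],
    },
  },
  media: [
    { k: 1, v: "537ea185df872bb71e4df270" },
    { k: 2, v: "537ea185df872bb71e4df275" },
    { k: 3, v: "537ea185df872bb71e4df272" },
  ],
};

const { produced, kept } = unwindAndMatch(sample);
console.log(produced, kept.length); // 27 3
```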
And the output is:
[
{
"_id" : ObjectId("539f41a95d1887b57ab78bea"),
"answers" : {
"ratings" : {
"positivity" : {
"k" : 1,
"v" : 2
},
"activity" : {
"k" : 1,
"v" : 4
}
}
},
"media" : {
"k" : 1,
"v" : ObjectId("537ea185df872bb71e4df270")
},
"include" : true
},
{
"_id" : ObjectId("539f41a95d1887b57ab78bea"),
"answers" : {
"ratings" : {
"positivity" : {
"k" : 2,
"v" : 3
},
"activity" : {
"k" : 2,
"v" : 4
}
}
},
"media" : {
"k" : 2,
"v" : ObjectId("537ea185df872bb71e4df275")
},
"include" : true
},
{
"_id" : ObjectId("539f41a95d1887b57ab78bea"),
"answers" : {
"ratings" : {
"positivity" : {
"k" : 3,
"v" : 5
},
"activity" : {
"k" : 3,
"v" : 3
}
}
},
"media" : {
"k" : 3,
"v" : ObjectId("537ea185df872bb71e4df272")
},
"include" : true
}
]
Doing this creates a lot of extra document overhead and may be slower than your current MapReduce implementation, so you would need to run tests to check. Keep in mind that the work grows cubically with the array size: unwinding three arrays of n elements produces n³ documents per input document before the $match discards all but n of them.
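For comparison, the pairing the question actually asks for is a positional zip. Later MongoDB releases added the $zip expression and the includeArrayIndex option of $unwind, which can express this without changing the schema; the plain-JavaScript sketch below only illustrates the intended result on the original, unchanged schema, with ObjectIds as hex strings. zipRatings is a hypothetical helper, and, like $zip without useLongestLength, it stops at the shortest array.

```javascript
// Positional pairing of media[i] with positivity[i] and activity[i] on the
// original schema (plain arrays, no k/v wrappers). Stops at the shortest array.
function zipRatings(doc) {
  const { positivity, activity } = doc.answers.ratings;
  const n = Math.min(doc.media.length, positivity.length, activity.length);
  const out = [];
  for (let i = 0; i < n; i += 1) {
    out.push({
      _id: doc._id,
      answers: { ratings: { positivity: positivity[i], activity: activity[i] } },
      media: doc.media[i],
    });
  }
  return out;
}

const original = {
  _id: "539f41a95d1887b57ab78bea",
  answers: { ratings: { positivity: [2, 3, 5], activity: [4, 4, 3] } },
  media: ["537ea185df872bb71e4df270", "537ea185df872bb71e4df275", "537ea185df872bb71e4df272"],
};

for (const row of zipRatings(original)) {
  console.log(row.media, row.answers.ratings.positivity, row.answers.ratings.activity);
}
```

Each input document yields exactly n output documents here, instead of the n³ intermediate documents of the unwind-and-match approach.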