Mongolite Aggregate Query with Added Fields - mongodb

Problem
I have a collection hotelreviews_collection containing 1 million rows (documents) of reviews with various metadata. I would like to group by the Hotel_Name field, count the number of times this hotel has showed up, but also get the fields "lat", "lng" and "Average_Score" with my query. The three extra rows are the same for each Hotel_Name.
I am doing the queries in R using the mongolite library connected to a local MongoDB.
My Attempt
I have gotten to retrieving the Hotel_Names and counting their appearances using the code below, but cannot for the life of me get the other fields to work.
Current Code
overviewData <- M_CONNECTION$aggregate('[{"$group":{"_id":"$Hotel_Name", "count": {"$sum":1}, "average":{"$avg":"$distance"}}}]',
options = '{"allowDiskUse":true}')
I am completely lost on this, any and all help would be greatly appreciated.

I have solved my issue using the following code.
db.getCollection("hotelreviews_collection").aggregate(
[
{
"$group" : {
"_id" : {
"Hotel_Name" : "$Hotel_Name",
"lat" : "$lat",
"lng" : "$lng",
"Average_Score" : "$Average_Score"
},
"COUNT(Hotel_Name)" : {
"$sum" : NumberInt(1)
}
}
},
{
"$project" : {
"Hotel_Name" : "$_id.Hotel_Name",
"lat" : "$_id.lat",
"lng" : "$_id.lng",
"Average_Score" : "$_id.Average_Score",
"COUNT(Hotel_Name)" : "$COUNT(Hotel_Name)",
"_id" : NumberInt(0)
}
}
]
)

Related

(MongoDB) Aggregate (.out) moove values to wrong fields

I'm actually creating an autocomplete website bar using express js & mongodb 4.2.7, and using lat & lon to avoid using geocoding api.
Here is the format of my db:
{
"_id" : ObjectId("5eea03e9a7891b6d701df571"),
"id_ban_position" : "ban-position-4c882de48d894fc49ed9be76f7631d6d",
"id_ban_adresse" : "ban-housenumber-78a2651ce382476ca9c8c8fcbcbaa539",
"cle_interop" : "01001_A028_5307",
"id_ban_group" : "ban-group-37c7c3f2b61440b48bca7d8fad56055e",
"id_fantoir" : "01001A028",
"numero" : 5307,
"suffixe" : "",
"nom_voie" : "Lotissement les Lilas",
"code_postal" : 1400,
"nom_commune" : "L'Abergement-Clémenciat",
"code_insee" : 1001,
"nom_complementaire" : "",
"x" : 848436.205131189,
"y" : 6562595.33916435,
"lon" : 4.923279,
"lat" : 46.146903,
"typ_loc" : "parcel",
"source" : "dgfip",
"date_der_maj" : "2019-02-12"
}
as you can see, i got lot of fields that are not necessary for the purpose of my website, and most of all, i don't need to get one row for each number in a street, one unique name of street is enough. So i decided to supress duplicate street name ('nom_voie') and town name ('nom_commune') with aggregate as follow:
db.Addresses.aggregate(
[ {
$group: {
_id: {voie: "$nom_voie", commune: "$nom_commune"},
doc: {$first: "$$ROOT"}
}
},
{$replaceRoot: {newRoot: "$doc"}},
{$out: 'UniqueIds'}
],
{allowDiskUse: true }
);
The problem is that this utilisation mooved a lot of values from one field to an other and made my db absolutely unusable.
{
"_id" : ObjectId("5eea03e9a7891b6d701df571"),
"nom_voie" : "-",
"code_postal" : 1400,
"nom_commune" : "Lotissement les Lilas",
"nom_complementaire" : "",
"lon" : 6562595.33916435,
"lat" : 848436.205131189,
"field19" : "dgfip",
...
}
As you can see, the value of "nom_voie" is now in "nom_commune", "x" value is in "lat", and "y" value is in "lon" and finally the "nom_voie" value is now "-", and i got new fields replacing others ("field19" for "source")...
Am i using aggregate with a wrong option?
I got 47 Millions entry, is this creating some issues?
Thanks to all of you for your time, even if you just reading this!
EDIT :
After few tries and some search i found out that it is the aggregate function that create some problems, but i still don't understand why, if somebody got some hints, i just edited the doc so it's more understandable and readable!
(I thought before that it was a $unset problem in a function)

How to find and return the page that a result is on in MongoDB aggregation pipeline?

I have a mongo DB aggregation pipeline that performs the following steps:
Sorts a list of user stats objects by timestamp
Groups the results by user ID
Sorts by a specified stat's name
Pages the results via skip and limit stages
In plain English, this pipeline returns a page from a list of user stats sorted by a specified stat. Each user can have multiple stats object, so I group to return only the most recent stats object for each user.
In Mongo Shell, this looks like:
db.getCollection("stats").aggregate(
[
{ "$sort" : { "Timestamp" : -1.0 } },
{
"$group" : {
"_id" : "$UserId",
"UserId" : { "$last" : "$UserId" },
"StatsOverall" : { "$last" : "$StatsOverall" },
"Timestamp" : { "$last" : "$Timestamp" }
}
},
{ "$sort" : { "StatsOverall.Rank" : -1.0 } },
{ "$skip" : specifiedPageNumber },
{ "$limit" : specifiedNumResultsPerPage }
]
);
This works fine.
I now want to modify this query to be able to search the user by name, and get back the entire page that user is contained on. (This is for a leaderboard). So, if the user is on page 5 of the leaderboard, I want to return the entirety of page 5.
However, I'm having trouble seeing a solution that doesn't require me to either load all of the users in to memory and page them there (awful idea), or go back and forth to the database iterating through pages (almost as awful).
Is there some way I can modify my aggregation pipeline to do all this at the database level?
EDIT: As requested, added some sample data and the expected result.
Sample data looks something like this... I've omitted some fields that aren't relevant. The initial data is a collection of user's stats, where each user can have more than one object. My existing pipeline returns the 1 most recent stats object for each user sorted by a specified stat name.
{
"_id" : "5c611e71ab0ffc430410e0ba",
"UserId" : "5c611e71ab0ffc430410e0ba",
"StatsOverall" : {
"Rank" : NumberInt(1000),
"GamesLost" : NumberInt(30),
"GamesWon" : NumberInt(50)
}
"Timestamp" : "2019-02-10T21:35:06.599Z"
}
// ----------------------------------------------
{
"_id" : "5c6238658966ae5860795879",
"UserId" : "5c6238658966ae5860795879",
"StatsOverall" : {
"Rank" : NumberInt(413),
"GamesLost" : NumberInt(2),
"GamesWon" : NumberInt(141),
},
"Timestamp" : "2019-02-10T21:35:06.599Z"
}
// many objects like this
The expected result looks like this:
{
"_id" : "5c611e71ab0ffc430410e0ba",
"UserId" : "5c611e71ab0ffc430410e0ba",
"StatsOverall" : {
"Rank" : NumberInt(1000),
"GamesLost" : NumberInt(30),
"GamesWon" : NumberInt(50)
}
"Timestamp" : "2019-02-10T21:35:06.599Z"
}
It returns the exact same type of object, sorted the same way as the existing pipeline, however I want to return only the page the the user is on. In the example result, assume the page size is just 1 result per page. So, the result would contain the 1 page that the user with the given UserId is on. In my sample result, that ID would be 5c611e71ab0ffc430410e0ba.

Get specific object in array of array in MongoDB

I need get a specific object in array of array in MongoDB.
I need get only the task object = [_id = ObjectId("543429a2cb38b1d83c3ff2c2")].
My document (projects):
{
"_id" : ObjectId("543428c2cb38b1d83c3ff2bd"),
"name" : "new project",
"author" : ObjectId("5424ac37eb0ea85d4c921f8b"),
"members" : [
ObjectId("5424ac37eb0ea85d4c921f8b")
],
"US" : [
{
"_id" : ObjectId("5434297fcb38b1d83c3ff2c0"),
"name" : "Test Story",
"author" : ObjectId("5424ac37eb0ea85d4c921f8b"),
"tasks" : [
{
"_id" : ObjectId("54342987cb38b1d83c3ff2c1"),
"name" : "teste3",
"author" : ObjectId("5424ac37eb0ea85d4c921f8b")
},
{
"_id" : ObjectId("543429a2cb38b1d83c3ff2c2"),
"name" : "jklasdfa_XXX",
"author" : ObjectId("5424ac37eb0ea85d4c921f8b")
}
]
}
]
}
Result expected:
{
"_id" : ObjectId("543429a2cb38b1d83c3ff2c2"),
"name" : "jklasdfa_XXX",
"author" : ObjectId("5424ac37eb0ea85d4c921f8b")
}
But i not getting it.
I still testing with no success:
db.projects.find({
"US.tasks._id" : ObjectId("543429a2cb38b1d83c3ff2c2")
}, { "US.tasks.$" : 1 })
I tryed with $elemMatch too, but return nothing.
db.projects.find({
"US" : {
"tasks" : {
$elemMatch : {
"_id" : ObjectId("543429a2cb38b1d83c3ff2c2")
}
}
}
})
Can i get ONLY my result expected using find()? If not, what and how use?
Thanks!
You will need an aggregation for that:
db.projects.aggregate([{$unwind:"$US"},
{$unwind:"$US.tasks"},
{$match:{"US.tasks._id":ObjectId("543429a2cb38b1d83c3ff2c2")}},
{$project:{_id:0,"task":"$US.tasks"}}])
should return
{ task : {
"_id" : ObjectId("543429a2cb38b1d83c3ff2c2"),
"name" : "jklasdfa_XXX",
"author" : ObjectId("5424ac37eb0ea85d4c921f8b")
}
Explanation:
$unwind creates a new (virtual) document for each array element
$match is the query part of your find
$project is similar as to project part in find i.e. it specifies the fields you want to get in the results
You might want to add a second $match before the $unwind if you know the document you are searching (look at performance metrics).
Edit: added a second $unwind since US is an array.
Don't know what you are doing (so realy can't tell and just sugesting) but you might want to examine if your schema (and mongodb) is ideal for your task because the document looks just like denormalized relational data probably a relational database would be better for you.

Resolving MongoDB DBRef array using Mongo Native Query and working on the resolved documents

My MongoDB collection is made up of 2 main collections :
1) Maps
{
"_id" : ObjectId("542489232436657966204394"),
"fileName" : "importFile1.json",
"territories" : [
{
"$ref" : "territories",
"$id" : ObjectId("5424892224366579662042e9")
},
{
"$ref" : "territories",
"$id" : ObjectId("5424892224366579662042ea")
}
]
},
{
"_id" : ObjectId("542489262436657966204398"),
"fileName" : "importFile2.json",
"territories" : [
{
"$ref" : "territories",
"$id" : ObjectId("542489232436657966204395")
}
],
"uploadDate" : ISODate("2012-08-22T09:06:40.000Z")
}
2) Territories, which are referenced in "Map" objects :
{
"_id" : ObjectId("5424892224366579662042e9"),
"name" : "Afghanistan",
"area" : 653958
},
{
"_id" : ObjectId("5424892224366579662042ea"),
"name" : "Angola",
"area" : 1252651
},
{
"_id" : ObjectId("542489232436657966204395"),
"name" : "Unknown",
"area" : 0
}
My objective is to list every map with their cumulative area and number of territories. I am trying the following query :
db.maps.aggregate(
{'$unwind':'$territories'},
{'$group':{
'_id':'$fileName',
'numberOf': {'$sum': '$territories.name'},
'locatedArea':{'$sum':'$territories.area'}
}
})
However the results show 0 for each of these values :
{
"result" : [
{
"_id" : "importFile2.json",
"numberOf" : 0,
"locatedArea" : 0
},
{
"_id" : "importFile1.json",
"numberOf" : 0,
"locatedArea" : 0
}
],
"ok" : 1
}
I probably did something wrong when trying to access to the member variables of Territory (name and area), but I couldn't find an example of such a case in the Mongo doc. area is stored as an integer, and name as a string.
I probably did something wrong when trying to access to the member variables of Territory (name and area), but I couldn't find an example
of such a case in the Mongo doc. area is stored as an integer, and
name as a string.
Yes indeed, the field "territories" has an array of database references and not the actual documents. DBRefs are objects that contain information with which we can locate the actual documents.
In the above example, you can clearly see this, fire the below mongo query:
db.maps.find({"_id":ObjectId("542489232436657966204394")}).forEach(function(do
c){print(doc.territories[0]);})
it will print the DBRef object rather than the document itself:
o/p: DBRef("territories", ObjectId("5424892224366579662042e9"))
so, '$sum': '$territories.name','$sum': '$territories.area' would show you '0' since there are no fields such as name or area.
So you need to resolve this reference to a document before doing something like $territories.name
To achieve what you want, you can make use of the map() function, since aggregation nor Map-reduce support sub queries, and you already have a self-contained map document, with references to its territories.
Steps to achieve:
a) get each map
b) resolve the `DBRef`.
c) calculate the total area, and the number of territories.
d) make and return the desired structure.
Mongo shell script:
db.maps.find().map(function(doc) {
var territory_refs = doc.territories.map(function(terr_ref) {
refName = terr_ref.$ref;
return terr_ref.$id;
});
var areaSum = 0;
db.refName.find({
"_id" : {
$in : territory_refs
}
}).forEach(function(i) {
areaSum += i.area;
});
return {
"id" : doc.fileName,
"noOfTerritories" : territory_refs.length,
"areaSum" : areaSum
};
})
o/p:
[
{
"id" : "importFile1.json",
"noOfTerritories" : 2,
"areaSum" : 1906609
},
{
"id" : "importFile2.json",
"noOfTerritories" : 1,
"areaSum" : 0
}
]
Map-Reduce functions should not be and cannot be used to resolve DBRefs in the server side.
See what the documentation has to say:
The map function should not access the database for any reason.
The map function should be pure, or have no impact outside of the
function (i.e. side effects.)
The reduce function should not access the database, even to perform
read operations. The reduce function should not affect the outside
system.
Moreover, a reduce function even if used(which can never work anyway) will never be called for your problem, since a group w.r.t "fileName" or "ObjectId" would always have only one document, in your dataset.
MongoDB will not call the reduce function for a key that has only a
single value

How can I select a number of records per a specific field using mongodb?

I have a collection of documents in mongodb, each of which have a "group" field that refers to a group that owns the document. The documents look like this:
{
group: <objectID>
name: <string>
contents: <string>
date: <Date>
}
I'd like to construct a query which returns the most recent N documents for each group. For example, suppose there are 5 groups, each of which have 20 documents. I want to write a query which will return the top 3 for each group, which would return 15 documents, 3 from each group. Each group gets 3, even if another group has a 4th that's more recent.
In the SQL world, I believe this type of query is done with "partition by" and a counter. Is there such a thing in mongodb, short of doing N+1 separate queries for N groups?
You cannot do this using the aggregation framework yet - you can get the $max or top date value for each group but aggregation framework does not yet have a way to accumulate top N plus there is no way to push the entire document into the result set (only individual fields).
So you have to fall back on MapReduce. Here is something that would work, but I'm sure there are many variants (all require somehow sorting an array of objects based on a specific attribute, I borrowed my solution from one of the answers in this question.
Map function - outputs group name as a key and the entire rest of the document as the value - but it outputs it as a document containing an array because we will try to accumulate an array of results per group:
map = function () {
emit(this.name, {a:[this]});
}
The reduce function will accumulate all the documents belonging to the same group into one array (via concat). Note that if you optimize reduce to keep only the top five array elements by checking date then you won't need the finalize function, and you will use less memory during running mapreduce (it will also be faster).
reduce = function (key, values) {
result={a:[]};
values.forEach( function(v) {
result.a = v.a.concat(result.a);
} );
return result;
}
Since I'm keeping all values for each key, I need a finalize function to pull out only latest five elements per key.
final = function (key, value) {
Array.prototype.sortByProp = function(p){
return this.sort(function(a,b){
return (a[p] < b[p]) ? 1 : (a[p] > b[p]) ? -1 : 0;
});
}
value.a.sortByProp('date');
return value.a.slice(0,5);
}
Using a template document similar to one you provided, you run this by calling mapReduce command:
> db.top5.mapReduce(map, reduce, {finalize:final, out:{inline:1}})
{
"results" : [
{
"_id" : "group1",
"value" : [
{
"_id" : ObjectId("516f011fbfd3e39f184cfe13"),
"name" : "group1",
"date" : ISODate("2013-04-17T20:07:59.498Z"),
"contents" : 0.23778377776034176
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfe0e"),
"name" : "group1",
"date" : ISODate("2013-04-17T20:07:59.467Z"),
"contents" : 0.4434165076818317
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfe09"),
"name" : "group1",
"date" : ISODate("2013-04-17T20:07:59.436Z"),
"contents" : 0.5935856597498059
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfe04"),
"name" : "group1",
"date" : ISODate("2013-04-17T20:07:59.405Z"),
"contents" : 0.3912118375301361
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfdff"),
"name" : "group1",
"date" : ISODate("2013-04-17T20:07:59.372Z"),
"contents" : 0.221651989268139
}
]
},
{
"_id" : "group2",
"value" : [
{
"_id" : ObjectId("516f011fbfd3e39f184cfe14"),
"name" : "group2",
"date" : ISODate("2013-04-17T20:07:59.504Z"),
"contents" : 0.019611883210018277
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfe0f"),
"name" : "group2",
"date" : ISODate("2013-04-17T20:07:59.473Z"),
"contents" : 0.5670706110540777
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfe0a"),
"name" : "group2",
"date" : ISODate("2013-04-17T20:07:59.442Z"),
"contents" : 0.893193120136857
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfe05"),
"name" : "group2",
"date" : ISODate("2013-04-17T20:07:59.411Z"),
"contents" : 0.9496864483226091
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfe00"),
"name" : "group2",
"date" : ISODate("2013-04-17T20:07:59.378Z"),
"contents" : 0.013748752186074853
}
]
},
{
"_id" : "group3",
...
}
]
}
],
"timeMillis" : 15,
"counts" : {
"input" : 80,
"emit" : 80,
"reduce" : 5,
"output" : 5
},
"ok" : 1,
}
Each result has _id as group name and values as array of most recent five documents from the collection for that group name.
you need aggregation framework $group stage piped in a $limit stage...
you want also to $sort the records in some ways or else the limit will have undefined behaviour, the returned documents will be pseudo-random (the order used internally by mongo)
something like that:
db.collection.aggregate([{$group:...},{$sort:...},{$limit:...}])
here there is the documentation if you want to know more