I understand there are similar questions to this one on Stack Overflow, but the issue is that I cannot actually change the structure of the Mongo Schema. I have a document resembling the following:
{
"_id" : ObjectId("asdfghjkl"),
"recordId" : "0000_11111",
"__v" : 0,
"userid" : "0000",
"date" : ISODate("2017-08-07T07:34:19.505Z"),
"username" : "batman",
"countries" : {
"Philippines" : 1,
"Lebanon" : 1,
"Andorra" : 1,
"Vanuatu" : 1,
"China" : 2,
"Greenland" : 2,
"Denmark" : 1,
"Hong Kong" : 1
}
}
The list of countries is much larger than in the example above: imagine up to 100 countries, with counts summing to 500,000. I need an aggregation that takes the values under countries and sums them. I tried $group with $sum, but without success.
Any quick solutions for this?
Thanks!
AK
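One possible approach, sketched under the assumption that the server is MongoDB 3.4.4 or later (where $objectToArray is available): turn the countries sub-document into an array of key/value pairs and sum the values, without touching the schema. The output field name total is made up for illustration, and sumCountries is a hypothetical plain-JavaScript helper that mirrors the same logic for a single document.

```javascript
// Pipeline sketch: per-document total of all values under "countries".
// Assumes MongoDB 3.4.4+ ($objectToArray); "total" is an illustrative name.
const pipeline = [
  {
    $project: {
      username: 1,
      total: {
        $sum: {
          $map: {
            input: { $objectToArray: "$countries" }, // [{k: "China", v: 2}, ...]
            as: "c",
            in: "$$c.v"
          }
        }
      }
    }
  }
];
// In the mongo shell this would be run as: db.<collection>.aggregate(pipeline)

// Hypothetical client-side equivalent for one document.
function sumCountries(doc) {
  return Object.values(doc.countries || {}).reduce(function (acc, n) {
    return acc + n;
  }, 0);
}
```

To get one grand total across all documents, a further { $group: { _id: null, total: { $sum: "$total" } } } stage could be appended.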
BED_MAST: this is one of my collections. It contains WARD_ID, and I want to join it to my other collection, WARD_MAST, shown below.
{
"_id" : ObjectId("5e53c95a26b0e5ad0fb46376"),
"Bed_id" : "bd-10",
"WARD_ID" : "4",
"OCCUPIED" : "0",
"BED_TYPE" : "single AC"
}
{
"_id" : ObjectId("5e53c95a26b0e5ad0fb46377"),
"Bed_id" : "bd-11",
"WARD_ID" : "1",
"OCCUPIED" : "0",
"BED_TYPE" : "single Non AC"
}
WARD_MAST: this collection has ward_id, but when I run the $lookup I do not get any data back.
{
"_id" : ObjectId("5e53c95b26b0e5ad0fb46544"),
"patient_id" : null,
"ward_id" : 1,
"total_beds" : 55,
"ward_name" : "Ward 1"
}
{
"_id" : ObjectId("5e53c95d26b0e5ad0fb46545"),
"patient_id" : null,
"ward_id" : 2,
"total_beds" : 63,
"ward_name" : "Ward 2"
}
My query is:
db.BED_MAST.aggregate([{$lookup:{'from':"WARD_MAST",'localField':"WARD_ID",'foreignField':"ward_id",'as':"lookup_value"}}]).pretty()
Output (I have confirmed the data by running the equivalent query against MySQL, where it works fine):
{
"_id" : ObjectId("5e53c95b26b0e5ad0fb46388"),
"Bed_id" : "bd-28",
"WARD_ID" : "6",
"OCCUPIED" : "0",
"BED_TYPE" : "NICU",
"lookup_value" : [ ]
}
Only sample values are shown; it is not possible to post all the data. I know this has been asked a thousand times, but I have not been able to resolve it. I tried to solve it with $lookup, but lookup_value comes back empty. Is there anything I am missing?
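The problem is that WARD_ID in the BED_MAST collection holds string values, while ward_id in the WARD_MAST collection holds numbers. $lookup matches on equality, and the string "1" does not equal the number 1, so nothing is joined.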
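A minimal sketch of one way around the mismatch, assuming MongoDB 4.0 or later (where $toInt is available): convert WARD_ID to a number in a temporary field before the $lookup. The field name ward_id_num and the helper lookupWards are made up for illustration; the helper only imitates the equality match in plain JavaScript to show why the original join comes back empty.

```javascript
// Pipeline sketch: convert the string WARD_ID to an integer, then join on it.
// Assumes MongoDB 4.0+ ($toInt); "ward_id_num" is an illustrative field name.
const pipeline = [
  { $addFields: { ward_id_num: { $toInt: "$WARD_ID" } } },
  {
    $lookup: {
      from: "WARD_MAST",
      localField: "ward_id_num",
      foreignField: "ward_id",
      as: "lookup_value"
    }
  }
];
// In the mongo shell: db.BED_MAST.aggregate(pipeline).pretty()

// Hypothetical plain-JS imitation of the join, using strict equality
// to mimic the type-sensitive match.
function lookupWards(beds, wards, convertToNumber) {
  return beds.map(function (bed) {
    const key = convertToNumber ? parseInt(bed.WARD_ID, 10) : bed.WARD_ID;
    const matches = wards.filter(function (w) { return w.ward_id === key; });
    return Object.assign({}, bed, { lookup_value: matches });
  });
}
```

If converting on every query is too costly, storing both fields with the same type is the more durable fix.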
I have finished creating my Mongo database. It consists of two collections:
1. team
2. coach
Here are examples of the documents contained in these collections.
Here is a team document:
{
"_id" : "Mil.74",
"official_name" : "Associazione Calcio Milan S.p.A",
"common_name" : "Milan",
"country" : "Italy",
"started_by" : {
"day" : 16,
"month" : 12,
"year" : 1899
},
"stadium" : {
"name" : "Giuseppe Meazza",
"capacity" : 81277
},
"palmarès" : {
"Serie A" : 18,
"Serie B" : 2,
"Coppa Italia" : 5,
"Supercoppa Italiana" : 6,
"UEFA Champions League" : 7,
"UEFA Super Cup" : 5,
"Cup Winners cup" : 2,
"UEFA Intercontinental cup" : 4
},
"uniform" : "black and red"
}
This is a coach document:
{
"_id" : ObjectId("556cec3b9262ab4f14165fcd"),
"name" : "Carlo",
"surname" : "Ancelotti",
"age" : 55,
"date_Of_birth" : {
"day" : 10,
"month" : 6,
"year" : 1959
},
"place_Of_birth" : "Reggiolo",
"nationality" : "Italian",
"preferred_formation" : "4-2-3-1",
"coached_Team" : [
{
"team_id" : "RMa.103",
"in_charge" : {
"from" : "26/june/2013",
"to" : "25/may/2015"
},
"matches" : 119
},
{
"team_id" : "PSG.00",
"in_charge" : {
"from" : "30/dec/2011",
"to" : "24/june/2013"
},
"matches" : 77
},
{
"team_id" : "Che.11",
"in_charge" : {
"from" : "01/july/2009",
"to" : "22/may/2011"
},
"matches" : 109
},
{
"team_id" : "Mil.74",
"in_charge" : {
"from" : "07/nov/2001",
"to" : "31/may/2009"
},
"matches" : 420
}
]
}
As you can see, I used a normalized model: every coach has an array of coached teams.
I want to convert this Mongo database into a graph database, specifically Neo4j. My goal is to show that in a highly connected domain like this one, Neo4j performs better than Mongo. For example, the query "Find the palmarès of all teams coached by Carlo Ancelotti" requires two queries in Mongo, whereas in Neo4j it is enough to follow relationships.
I found this guide on the forum that uses Gremlin to convert a Mongo collection of documents into a Neo4j graph automatically. The problem is that the guide covers just one collection.
So, is it possible to generate the Neo4j graph automatically from my Mongo database (with two collections), or must I create the graph "by hand"?
Gremlin is a Domain Specific Language for working with graphs, but it is based on Groovy so you effectively have all the flexibility you want to really do whatever you want. In other words, what you can do with one MongoDB collection you can easily do with two (or however many collections you have). That was the point of the blog post referenced in one of the other answers:
http://thinkaurelius.com/2013/02/04/polyglot-persistence-and-query-with-gremlin/
Gremlin is a great language for transforming data into graph form, whatever its source format is. I would think that you would first load all of your teams as vertices then iterate through your coaches, creating coach vertices and edges to their related teams as you go.
I would also add that nothing is "automatic" about Gremlin. It's not as though you tell Gremlin that you have data in MongoDB and it turns it into a graph. You have to write Gremlin to tell it how you want your MongoDB data turned into a graph.
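To make the shape of that transformation concrete, here is a rough sketch in plain JavaScript rather than Gremlin. It only builds in-memory lists of vertices and edges from the two collections; the function name buildGraph, the edge label COACHED, and the vertex/edge object layout are all assumptions made for illustration, not part of any Gremlin or Neo4j API.

```javascript
// Hypothetical sketch: turn team and coach documents into vertex/edge lists.
// The same two passes (teams first, then coaches with their edges) would be
// written in Gremlin against the real graph.
function buildGraph(teams, coaches) {
  const vertices = [];
  const edges = [];
  const teamIds = new Set();

  // Pass 1: one vertex per team, keyed by the team's _id.
  teams.forEach(function (team) {
    vertices.push({ id: team._id, label: "team", properties: team });
    teamIds.add(team._id);
  });

  // Pass 2: one vertex per coach, plus one edge per coached team.
  coaches.forEach(function (coach) {
    const coachId = String(coach._id);
    const { coached_Team, ...props } = coach;
    vertices.push({ id: coachId, label: "coach", properties: props });
    (coached_Team || []).forEach(function (stint) {
      if (!teamIds.has(stint.team_id)) return; // team not loaded: skip the edge
      edges.push({
        from: coachId,
        to: stint.team_id,
        label: "COACHED",
        properties: { in_charge: stint.in_charge, matches: stint.matches }
      });
    });
  });

  return { vertices: vertices, edges: edges };
}
```

Skipping edges whose team is missing is a design choice of this sketch; a real import might instead create a placeholder team vertex.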
I have a collection of documents in mongodb, each of which has a "group" field that refers to a group that owns the document. The documents look like this:
{
group: <objectID>
name: <string>
contents: <string>
date: <Date>
}
I'd like to construct a query which returns the most recent N documents for each group. For example, suppose there are 5 groups, each of which has 20 documents. I want to write a query which will return the top 3 for each group, which would return 15 documents, 3 from each group. Each group gets 3, even if another group has a 4th that's more recent.
In the SQL world, I believe this type of query is done with "partition by" and a counter. Is there such a thing in mongodb, short of doing N+1 separate queries for N groups?
You cannot do this using the aggregation framework yet - you can get the $max or top date value for each group but aggregation framework does not yet have a way to accumulate top N plus there is no way to push the entire document into the result set (only individual fields).
So you have to fall back on MapReduce. Here is something that would work, but I'm sure there are many variants (all require somehow sorting an array of objects based on a specific attribute; I borrowed my solution from one of the answers in this question).
Map function - outputs group name as a key and the entire rest of the document as the value - but it outputs it as a document containing an array because we will try to accumulate an array of results per group:
map = function () {
emit(this.name, {a:[this]});
}
The reduce function will accumulate all the documents belonging to the same group into one array (via concat). Note that if you optimize reduce to keep only the top five array elements by checking date then you won't need the finalize function, and you will use less memory during running mapreduce (it will also be faster).
reduce = function (key, values) {
var result = {a:[]};
values.forEach( function(v) {
result.a = v.a.concat(result.a);
} );
return result;
}
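As a rough sketch of the optimization mentioned above (not the code used for the output shown in this answer), the reduce function could sort the merged array by date and keep only the five most recent elements each time it runs. The name reduceTopFive is made up for illustration, and the limit is declared inside the function because map-reduce functions should not depend on variables from the surrounding shell session.

```javascript
// Sketch of a reduce that keeps only the five most recent documents per key.
// Returns the same shape that map emits ({a: [...]}), so it stays safe when
// MongoDB feeds the output of one reduce call into another.
var reduceTopFive = function (key, values) {
  var limit = 5;
  var merged = [];
  values.forEach(function (v) {
    merged = merged.concat(v.a);
  });
  // Newest first; Date objects coerce to numbers in the subtraction.
  merged.sort(function (a, b) {
    return b.date - a.date;
  });
  return { a: merged.slice(0, limit) };
};
```

Note that reduce is not called for a key with a single emitted value, so results still arrive wrapped as {a: [...]} in every case.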
Since I'm keeping all values for each key, I need a finalize function to pull out only latest five elements per key.
final = function (key, value) {
Array.prototype.sortByProp = function(p){
return this.sort(function(a,b){
return (a[p] < b[p]) ? 1 : (a[p] > b[p]) ? -1 : 0;
});
}
value.a.sortByProp('date');
return value.a.slice(0,5);
}
Using a template document similar to one you provided, you run this by calling mapReduce command:
> db.top5.mapReduce(map, reduce, {finalize:final, out:{inline:1}})
{
"results" : [
{
"_id" : "group1",
"value" : [
{
"_id" : ObjectId("516f011fbfd3e39f184cfe13"),
"name" : "group1",
"date" : ISODate("2013-04-17T20:07:59.498Z"),
"contents" : 0.23778377776034176
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfe0e"),
"name" : "group1",
"date" : ISODate("2013-04-17T20:07:59.467Z"),
"contents" : 0.4434165076818317
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfe09"),
"name" : "group1",
"date" : ISODate("2013-04-17T20:07:59.436Z"),
"contents" : 0.5935856597498059
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfe04"),
"name" : "group1",
"date" : ISODate("2013-04-17T20:07:59.405Z"),
"contents" : 0.3912118375301361
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfdff"),
"name" : "group1",
"date" : ISODate("2013-04-17T20:07:59.372Z"),
"contents" : 0.221651989268139
}
]
},
{
"_id" : "group2",
"value" : [
{
"_id" : ObjectId("516f011fbfd3e39f184cfe14"),
"name" : "group2",
"date" : ISODate("2013-04-17T20:07:59.504Z"),
"contents" : 0.019611883210018277
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfe0f"),
"name" : "group2",
"date" : ISODate("2013-04-17T20:07:59.473Z"),
"contents" : 0.5670706110540777
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfe0a"),
"name" : "group2",
"date" : ISODate("2013-04-17T20:07:59.442Z"),
"contents" : 0.893193120136857
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfe05"),
"name" : "group2",
"date" : ISODate("2013-04-17T20:07:59.411Z"),
"contents" : 0.9496864483226091
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfe00"),
"name" : "group2",
"date" : ISODate("2013-04-17T20:07:59.378Z"),
"contents" : 0.013748752186074853
}
]
},
{
"_id" : "group3",
...
}
]
}
],
"timeMillis" : 15,
"counts" : {
"input" : 80,
"emit" : 80,
"reduce" : 5,
"output" : 5
},
"ok" : 1,
}
Each result has the group name as _id and, as value, an array of the five most recent documents from the collection for that group name.
You need the aggregation framework: a $group stage piped into a $limit stage.
You also want to $sort the records in some way, or else the limit will have undefined behaviour: the returned documents will be pseudo-random (in whatever order Mongo uses internally).
Something like this:
db.collection.aggregate([{$group:...},{$sort:...},{$limit:...}])
See the documentation if you want to know more.
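For the per-group case in the question, one possible pipeline shape is sketched here. It assumes a newer server than the one this thread was written against: MongoDB 3.2 or later, where the $slice aggregation operator and the $$ROOT variable are both available. The field names group and date come from the question; the function name topNPerGroup and the output field docs are made up for illustration.

```javascript
// Sketch: newest n documents per group via sort, push-all, then slice.
// Assumes MongoDB 3.2+; relies on $push receiving documents in sorted order.
function topNPerGroup(n) {
  return [
    { $sort: { group: 1, date: -1 } },
    { $group: { _id: "$group", docs: { $push: "$$ROOT" } } },
    { $project: { docs: { $slice: ["$docs", n] } } }
  ];
}
// In the mongo shell: db.collection.aggregate(topNPerGroup(3))
```

This holds every document of a group in memory before slicing, so it is only a reasonable fit when groups are modest in size.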
I'm preparing a descriptive "schema" (quelle horreur) for a MongoDB I've been working with.
I used the excellent variety.js to create a list of all keys and show coverage of each key. However, in cases where the values corresponding to the keys have a small set of values, I'd like to be able to list the entire set as "available values." In R, I'd be thinking of these as the "factors" for the categorical variable, i.e., gender : ["M", "F"].
I know I could just use R + RMongo, query each variable, and basically do the same procedure I would to create a histogram, but I'd like to know the proper Mongo.query()/javascript/Map,Reduce way to approach this. I understand the db.collection.aggregate() functions are designed for exactly this.
Before asking this, I referenced:
http://docs.mongodb.org/manual/reference/aggregation/
http://docs.mongodb.org/manual/reference/method/db.collection.distinct/
How to query for distinct results in mongodb with python?
Get a list of all unique tags in mongodb
http://cookbook.mongodb.org/patterns/count_tags/
But can't quite get the pipeline order right. So, for example, if I have documents like these:
{_id : 1, "key1" : "value1", "key2": "value3"}
{_id : 2, "key1" : "value2", "key2": "value3"}
I'd like to return something like:
{"key1" : ["value1", "value2"]}
{"key2" : ["value3"]}
Or better, with counts:
{"key1" : {"value1" : 1, "value2" : 1}}
{"key2" : {"value3" : 2}}
I recognize one problem with doing this will be any keys that have a wide range of different values (text fields, or continuous variables). Ideally, if there were more than x different possible values, it would be nice to truncate, say to no more than 20 unique values. If I find it's actually more, I'd query that variable directly.
Is this something like:
db.collection.aggregate(
{$limit: 20,
$group: {
_id: "$??varname",
count: {$sum: 1}
}})
First, how can I reference ??varname? for the name of each key?
I saw this link which had 95% of it:
Binning and tabulate (unique/count) in Mongo
with...
input data:
{ "_id" : 1, "age" : 22.34, "gender" : "f" }
{ "_id" : 2, "age" : 23.9, "gender" : "f" }
{ "_id" : 3, "age" : 27.4, "gender" : "f" }
{ "_id" : 4, "age" : 26.9, "gender" : "m" }
{ "_id" : 5, "age" : 26, "gender" : "m" }
This script:
db.collection.aggregate(
{$project: {gender:1}},
{$group: {
_id: "$gender",
count: {$sum: 1}
}})
Produces:
{"result" :
[
{"_id" : "m", "count" : 2},
{"_id" : "f", "count" : 3}
],
"ok" : 1
}
But what I don't understand is how could I do this generically for an unknown number/name of keys with a potentially large number of return values? This sample knows the key name is gender, and that the response set will be small (2 values).
If you already ran a script that outputs the names of all keys in the collection, you can generate your aggregation framework pipeline dynamically. What that means is either extending the variety.js type script or just writing your own.
Here is what it might look like in JS if passed an array called "keys" which has several non-"_id" named fields (I'm assuming top level fields and that you don't care about arrays, embedded documents, etc).
keys = ["key1", "key2"];
group = { "$group" : { "_id" : null } } ;
keys.forEach( function(f) {
group["$group"][f+"List"] = { "$addToSet" : "$" + f }; } );
db.collection.aggregate(group);
{
"result" : [
{
"_id" : null,
"key1List" : [
"value2",
"value1"
],
"key2List" : [
"value3"
]
}
],
"ok" : 1
}
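The same generate-it-dynamically idea can be extended to the counts and the 20-value cap asked about in the question. The sketch below builds one small pipeline per key, grouping on that key's value and counting occurrences; the function name countPipelines and its parameters are assumptions for illustration, and it again assumes top-level fields only.

```javascript
// Sketch: build one count-per-value pipeline for each key name.
// maxValues caps how many distinct values are returned per key.
function countPipelines(keys, maxValues) {
  var pipelines = {};
  keys.forEach(function (key) {
    pipelines[key] = [
      { $group: { _id: "$" + key, count: { $sum: 1 } } },
      { $sort: { count: -1 } },   // most frequent values first
      { $limit: maxValues }       // truncate keys with many distinct values
    ];
  });
  return pipelines;
}
// In the mongo shell, each pipeline would be run separately, e.g.:
//   var p = countPipelines(["key1", "key2"], 20);
//   Object.keys(p).forEach(function (k) { printjson(db.collection.aggregate(p[k])); });
```

Running one pipeline per key costs one pass over the collection per key, which trades speed for a bounded result size on high-cardinality fields.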
When I don't use pagination, everything works fine (I have only 3 records in this collection, so all of them are listed here):
db.suppliers.find({location: {$near: [-23.5968323, -46.6782386]}},{name:1,badge:1}).sort({badge:-1})
{ "_id" : ObjectId("4f33ff549112b9b84f000070"), "badge" : 3, "name" : "Dedetizadora Alvorada" }
{ "_id" : ObjectId("4f33ff019112b9b84f00005b"), "badge" : 2, "name" : "Sampex Desentupidora e Dedetizadora" }
{ "_id" : ObjectId("4f33feae9112b9b84f000046"), "badge" : 1, "name" : "Higitec Desentupimento e Dedetização" }
But when I try to paginate from the first to the second page, one record doesn't show up and one is repeated:
db.suppliers.find({location: {$near: [-23.5968323, -46.6782386]}},{name:1,badge:1}).sort({badge:-1}).skip(0).limit(2)
{ "_id" : ObjectId("4f33ff549112b9b84f000070"), "badge" : 3, "name" : "Dedetizadora Alvorada" }
{ "_id" : ObjectId("4f33feae9112b9b84f000046"), "badge" : 1, "name" : "Higitec Desentupimento e Dedetização" }
db.suppliers.find({location: {$near: [-23.5968323, -46.6782386]}},{name:1,badge:1}).sort({badge:-1}).skip(2).limit(2)
{ "_id" : ObjectId("4f33feae9112b9b84f000046"), "badge" : 1, "name" : "Higitec Desentupimento e Dedetização" }
Am I doing something wrong or is this some kind of bug?
edit:
Here is a workaround for this. Basically you shouldn't mix $near queries with sorting; use $within instead.
There is an open issue regarding the same problem. Please have a look and vote: "Geospatial result paging fails when sorting with additional keys".
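A minimal sketch of what the $within variant of the paginated query might look like. The radius is an assumption (the original $near query has no radius, so one has to be chosen), the helper name pageQuery is made up, and the _id tie-breaker in the sort is an addition of this sketch to keep page boundaries stable when badge values repeat. On newer servers the operator is spelled $geoWithin.

```javascript
// Sketch: build the pieces of a paginated, sorted $within query.
// "radius" is in the same units as the stored coordinates (an assumption).
function pageQuery(center, radius, page, pageSize) {
  return {
    filter: { location: { $within: { $center: [center, radius] } } },
    projection: { name: 1, badge: 1 },
    sort: { badge: -1, _id: 1 }, // _id breaks ties between equal badges
    skip: page * pageSize,
    limit: pageSize
  };
}
// In the mongo shell, for the second page of two results:
//   var q = pageQuery([-23.5968323, -46.6782386], 0.5, 1, 2);
//   db.suppliers.find(q.filter, q.projection).sort(q.sort).skip(q.skip).limit(q.limit)
```

Unlike $near, $within does not order results by distance, so the explicit sort fully determines the order that skip and limit operate on.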