Get a count of documents in an aggregation under specific requirement

In my collection, each document represents a user-generated quiz and includes an array field of tags, e.g. History, Science, Math, etc. I am trying to get a count of documents associated with each tag.
The aggregation below produces a list of unique tags that looks like this: {tags:["History", "Science", "Math"]}
db.quizzes.aggregate([
{$unwind: "$tags"},
{$group: {_id:null, tgs: {$addToSet: "$tags"}}},
{$project: {_id:0, tags: "$tgs"}},
])
However, can the above aggregation also get a count of the number of documents that contains each tag? For example if there were 3 History quizzes, 2 Science quizzes, and 5 Math quizzes, the result would look like this: {tags:[{tag: "History", count: 3}, {tag: "Science", count: 2}, {tag: "Math", count:5}]}
Thanks in advance for any tips.
Edited to include collection documents:
{
"_id" : ObjectId("57d8ccd573099cb013b462b5"),
"title" : "Presidential Trivia",
"quiz" : "[{\"question\":\"How many presidents were members of the Whig party?\",\"choices\":[\"Two\",\"Three\",\"Four\"],\"correct\":\"2\"},{\"question\":\"Who was the first president to be impeached?\",\"choices\":[\"Warren Harding\",\"Andrew Johnson\",\"Andrew Jackson\"],\"correct\":\"1\"},{\"question\":\"How many presidents died during their presidency?\",\"choices\":[\"Four\",\"Six\",\"Eight\"],\"correct\":\"2\"},{\"question\":\"How many presidents had no party affiliation?\",\"choices\":[\"One\",\"Two\",\"Three\"],\"correct\":\"0\"},{\"question\":\"Who was the only president to serve two non-consecutive terms, making him both the 22nd and 24th president?\",\"choices\":[\"John Quincy Adams\",\"Grover Cleveland\",\"Theodore Roosevelt\"],\"correct\":\"1\"}]",
"correctArray" : "[\"2\",\"1\",\"2\",\"0\",\"1\"]",
"author" : "jake2",
"createTime" : ISODate("2016-09-14T04:06:45.118Z"),
"likes" : 0,
"avgScore" : 0,
"plays" : 3,
"private" : "0",
"tags" : [
"US Presidents",
"American History",
"History"
]
}
{
"_id" : ObjectId("57d8d08973099cb013b462b6"),
"title" : "Finance Quiz",
"quiz" : "[{\"question\":\"Which of these involves the analysis of of a business's financial statements, often used in stock valuation?\",\"choices\":[\"Fundamental Analysis\",\"Technical Analysis\",\"P/E ratio\"],\"correct\":\"0\"},{\"question\":\"What was the name of the bond purchasing program started by the U.S. Federal Reserve in response to the 2008 financial crisis?\",\"choices\":[\"Stimulus Package\",\"Quantitative Easing\",\"Mercantilism\"],\"correct\":\"1\"},{\"question\":\"Which term describes a debt security issued by a government, company, or other entity?\",\"choices\":[\"Bond\",\"Stock\",\"Mutual fund\"],\"correct\":\"0\"},{\"question\":\"Which of these companies has the largest market capitalization (as of October 2015)?\",\"choices\":[\"Ford Motors\",\"Apple\",\"Bank of America\"],\"correct\":\"1\"},{\"question\":\"Which of these is a measure of the size of an economy?\",\"choices\":[\"Purchasing Power Index\",\"Unemployment Rate\",\"Gross Domestic Product\"],\"correct\":\"2\"}]",
"correctArray" : "[\"0\",\"1\",\"0\",\"1\",\"2\"]",
"author" : "jake2",
"createTime" : ISODate("2016-09-14T04:22:33.756Z"),
"tags" : [
"Finance"
],
"likes" : 0,
"avgScore" : 0,
"plays" : 10,
"private" : "0"
}
{
"_id" : ObjectId("57d8d24073099cb013b462b8"),
"title" : "Astronomy Pop Quiz",
"quiz" : "[{\"question\":\"Which of the following are currently (as of November 2015) used by scientists as observational evidence of the existence of dark matter?\",\"choices\":[\"Gravitational Lensing\",\"Specimens of dark matter collected by NASA\",\"Anomalies in planetary orbits\"],\"correct\":\"0\"},{\"question\":\"Which of these emits the most energy?\",\"choices\":[\"Stars\",\"Quasars\",\"Black Holes\"],\"correct\":\"1\"},{\"question\":\"What is it called when light or electromagnetic radiation from an object is increased in wavelength?\",\"choices\":[\"The Jupiter Effect\",\"Redshift\",\"The Observer's Differential\"],\"correct\":\"1\"},{\"question\":\"Who was the first human in space?\",\"choices\":[\"Yuri Gagarin\",\"Alan Shepard\",\"John Glenn\"],\"correct\":\"0\"},{\"question\":\"Which of these is the most dense?\",\"choices\":[\"The Sun\",\"A neutron star\",\"Earth\"],\"correct\":\"1\"}]",
"correctArray" : "[\"0\",\"1\",\"1\",\"0\",\"1\"]",
"author" : "Bertram",
"createTime" : ISODate("2016-09-14T04:29:52.636Z"),
"tags" : [
"Astronomy"
],
"likes" : 1,
"avgScore" : 0,
"plays" : 5,
"private" : "0"
}
{
"_id" : ObjectId("57d8d3c173099cb013b462ba"),
"title" : "Film Trivia",
"quiz" : "[{\"question\":\"Who directed The Godfather trilogy?\",\"choices\":[\"John Huston\",\"Francis Ford Coppola\",\"Martin Scorsese\"],\"correct\":\"1\"},{\"question\":\"What year was the first Ocscar awarded?\",\"choices\":[\"1923\",\"1927\",\"1932\"],\"correct\":\"1\"},{\"question\":\"As of 2010, this and Schindler's List (1993) are the only films to win Best Picture, Director and Screenplay at the Golden Globes, BAFTAs and the Oscars.\",\"choices\":[\"Rain Man\",\"Slumdog Millionaire\",\"Titanic\"],\"correct\":\"1\"},{\"question\":\"In Casablanca, why can't Rick return to America?\",\"choices\":[\"He is indebted to the mob.\",\"He was deported.\",\"No reason is given.\"],\"correct\":\"2\"},{\"question\":\"What was the highest-grossing Western of all time?\",\"choices\":[\"Django Unchained\",\"True Grit\",\"Dances with Wolves\"],\"correct\":\"2\"}]",
"correctArray" : "[\"1\",\"1\",\"1\",\"2\",\"2\"]",
"author" : "Pappy2",
"createTime" : ISODate("2016-09-14T04:36:17.950Z"),
"tags" : [
"Movies"
],
"likes" : 1,
"avgScore" : 0,
"plays" : 8,
"private" : "0"
}
{
"_id" : ObjectId("57ea7f67a58303f01a585e55"),
"title" : "US History Concepts",
"quiz" : "[{\"question\":\"\",\"choices\":[\"\",\"\",\"\"]}]",
"correctArray" : "[]",
"author" : "martha",
"createTime" : ISODate("2016-09-27T14:17:11.627Z"),
"tags" : [
"US History",
"History"
],
"likes" : 0,
"avgScore" : 0,
"plays" : 1,
"private" : "0"
}

You can try the following aggregation pipeline.
db.quizzes.aggregate([
{"$unwind":"$tags"},
{"$group":{"_id":"$tags", count:{$sum:1}}},
{"$project":{"_id":0, "tags":{"tag":"$_id","count":"$count"}}},
{"$group":{"_id":null, "tags":{"$push":"$tags"}}},
{"$project":{"_id":0, tags:1}}
])
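To make clear what each stage contributes, here is a plain-JavaScript sketch that mirrors the pipeline in memory. `$unwind` becomes the inner loop over tags, the first `$group` becomes a per-tag counter, and the final `$group` with `$push` collects the `{tag, count}` pairs into one array. It assumes the documents have already been loaded into an array, and `countTags` is a hypothetical name. It illustrates the logic only and is not a substitute for running the aggregation on the server.

```javascript
// Hypothetical helper that mirrors the aggregation pipeline above in memory,
// given an array of quiz documents that each have a `tags` array.
function countTags(quizzes) {
  const counts = new Map(); // tag -> number of occurrences ($group with $sum: 1)
  for (const quiz of quizzes) {
    // $unwind: one step per tag; a missing or empty tags array contributes nothing
    for (const tag of quiz.tags || []) {
      counts.set(tag, (counts.get(tag) || 0) + 1);
    }
  }
  // $project + second $group with $push: collect {tag, count} pairs into one array
  const tags = [];
  for (const [tag, count] of counts) {
    tags.push({ tag: tag, count: count });
  }
  return { tags: tags };
}

// Tags taken from the five sample documents in the question
const sample = [
  { title: "Presidential Trivia", tags: ["US Presidents", "American History", "History"] },
  { title: "Finance Quiz", tags: ["Finance"] },
  { title: "Astronomy Pop Quiz", tags: ["Astronomy"] },
  { title: "Film Trivia", tags: ["Movies"] },
  { title: "US History Concepts", tags: ["US History", "History"] },
];
console.log(JSON.stringify(countTags(sample)));
```

As long as a quiz never lists the same tag twice, each count equals the number of documents carrying that tag. The sketch lists tags in first-seen order, whereas the server makes no ordering promise for `$group` output.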

Related

MongoDB[4.2] $text search not returning expected results

We have an author collection that contains information about all the authors. We created a text index using the following:
db.getCollection('contributors').createIndex(
{
display_name:"text",
first_name: "text",
last_name: "text"
},
{
weights: {
display_name: 10,
first_name: 5,
last_name:5
},
name: "Contributor_FTS_Index"
}
)
Here is some sample data:
{
"_id" : ObjectId("5eac8232eb5aca201f104bfb"),
"firebrand_id" : 54529588,
"agents" : null,
"created" : ISODate("2020-05-01T20:10:26.762Z"),
"display_name" : "Grace Octavia",
"email" : null,
"estates" : null,
"first_name" : "Grace",
"item_type" : "Contributor",
"last_name" : "Octavia",
"phone" : null,
"role" : 1,
"short_bio" : "GRACE OCTAVIA is the author of unforgettable novels that deal with the trials and tribulations of love, friendship, and what it means to be true to yourself. Her second novel, His First Wife, graced the Essence® bestseller list and also won the Best African-American Fiction Award from RT Book Reviews. A native of Westbury, NY, she now resides in Atlanta, GA, where there is never any shortage of material on heartache and scandal. Grace earned a doctorate in English, Creative Writing at Georgia State University in Atlanta and currently teaches at Spelman College. Visit her online at GraceOctavia.net or follow her on Twitter #GraceOctavia2.",
"slug" : "grace-octavia",
"updated" : ISODate("2020-08-05T10:10:27.691Z"),
"deleted" : false
}
{
"_id" : ObjectId("5ada44aa2ad4b3e3d0ae3daf"),
"item_type" : "Contributor",
"role" : 1,
"short_bio" : "",
"firebrand_id" : 41529135,
"display_name" : "Grace Octavia",
"first_name" : "Grace",
"last_name" : "Octavia",
"slug" : "grace-octavia",
"updated" : ISODate("2020-09-22T16:19:57.319Z"),
"agents" : null,
"estates" : null,
"deleted" : false,
"email" : null,
"phone" : null
}
{
"_id" : ObjectId("58e6ee27afbe421347a11834"),
"item_type" : "Contributor",
"role" : 1,
"short_bio" : "Octavia E. Butler (1947–2006) was a bestselling and award-winning author, considered one of the best science fiction writers of her generation. She received both the Hugo and Nebula awards, and in 1995 became the first author of science fiction to receive a MacArthur Fellowship. She was also awarded the prestigious PEN Lifetime Achievement Award in 2000. Her first novel, <i>Patternmaster</i> (1976), was praised both for its imaginative vision and for Butler’s powerful prose, and spawned four prequels, beginning with <i>Mind of My Mind</i> (1977) and finishing with <i>Clay’s Ark</i> (1984).<br /><br /> Although the Patternist series established Butler among the science fiction elite, it was <i>Kindred</i> (1979), a story of a black woman who travels back in time to the antebellum South, that brought her mainstream success. In 1985, Butler won Nebula and Hugo awards for the novella “Bloodchild,” and in 1987 she published <i>Dawn</i>, the first novel of the Xenogenesis trilogy, about a race of aliens who visit earth to save humanity from itself. <i>Fledgling</i> (2005) was Butler’s final novel. She died at her home in 2006.",
"firebrand_id" : 11532005,
"display_name" : "Octavia E. Butler",
"first_name" : "Octavia",
"last_name" : "Butler",
"slug" : "octavia-e-butler",
"updated" : ISODate("2020-09-23T04:06:18.857Z"),
"image" : "https://s3.amazonaws.com/orim-book-contributors/11532005-book-contributor.jpg",
"agents" : [
{
"name" : "Heifetz, Merrilee",
"primaryemail" : "mheifetz#writershouse.com",
"primaryphone" : "212-685-2605"
}
],
"estates" : [
{
"name" : "Estate of Octavia E. Butler",
"primaryemail" : "",
"primaryphone" : ""
}
],
"deleted" : false,
"email" : null,
"phone" : null
}
When we run something like the following:
db.getCollection('contributors').find({ $text: { $search: "oct" }})
it doesn't return any documents. But if we search for
db.getCollection('contributors').find({ $text: { $search: "octavia" }})
it returns all the documents.
Our requirement is to return search results based on the search term the user is typing, so it can be a partial word such as oc, oct, or octav.

A popular way to do this type of search is to use $regex instead of $text, so try something like this:
db.contributors.find({
"$or": [
{
display_name: {
$regex: "oct",
$options: "i"
}
}
// add more field objects like the one above
]
});
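Because the search term comes straight from the user, it is worth escaping regex metacharacters before building the `$regex` filter. Otherwise input such as `o.t` or `(` changes the pattern or makes it invalid. Below is a minimal sketch, assuming the three field names from the question's text index. `escapeRegex` and `buildContributorFilter` are hypothetical helper names, and the returned object is what you would pass to `db.contributors.find(...)`.

```javascript
// Escape characters that have a special meaning in regular expressions,
// so the user's input is matched literally.
function escapeRegex(input) {
  return input.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build an $or filter doing a case-insensitive substring match on each field.
// The default field list is an assumption taken from the question's text index.
function buildContributorFilter(term, fields = ["display_name", "first_name", "last_name"]) {
  const pattern = escapeRegex(term);
  return {
    $or: fields.map(field => ({ [field]: { $regex: pattern, $options: "i" } })),
  };
}

console.log(JSON.stringify(buildContributorFilter("oct")));
```

In the shell this would be used as `db.contributors.find(buildContributorFilter(userInput))`.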

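You picked the wrong tool. MongoDB text search matches whole words, not fragments of words. Read more about the MongoDB text tokenizer at https://docs.mongodb.com/manual/core/index-text/#tokenization-delimiters
Indexing parts of words requires an n-gram tokenizer, which is available in full-featured text engines, e.g. those based on Apache Lucene: Elasticsearch, Solr, MongoDB Atlas Search, etc.
If your database is relatively small and weights are not essential, you can use a regular expression instead. Keep in mind that an unanchored, case-insensitive regex cannot use an index efficiently, which is why this only suits smaller collections: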
db.contributors.find({
"$or": [
{
display_name: {
$regex: "oct",
$options: "i"
}
},
{
first_name: {
$regex: "oct",
$options: "i"
}
},
{
last_name: {
$regex: "oct",
$options: "i"
}
}
]
})
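If the regex scan becomes too slow, one common workaround is to precompute word prefixes ("edge n-grams") into an extra array field. This is a different technique from the answers above. You index that field with an ordinary index and match the lowercased search term exactly. Below is a minimal sketch of generating the prefixes. The `name_prefixes` field name and the minimum length of 2 are assumptions for illustration.

```javascript
// Compute lowercase edge n-grams (prefixes) of every word in the given strings.
// Example: "Octavia" -> ["oc", "oct", "octa", "octav", "octavi", "octavia"].
function edgeNgrams(values, minLength = 2) {
  const prefixes = new Set();
  for (const value of values) {
    if (typeof value !== "string") continue; // skip null or missing fields
    for (const word of value.toLowerCase().split(/\s+/)) {
      for (let len = minLength; len <= word.length; len++) {
        prefixes.add(word.slice(0, len));
      }
    }
  }
  return [...prefixes];
}

const contributor = { display_name: "Octavia E. Butler", first_name: "Octavia", last_name: "Butler" };
// Hypothetical field to store on each document and cover with an ordinary index
contributor.name_prefixes = edgeNgrams([
  contributor.display_name,
  contributor.first_name,
  contributor.last_name,
]);
console.log(contributor.name_prefixes.includes("oct")); // true
```

With such a field in place, a query like `db.contributors.find({ name_prefixes: "oct" })` can use an index created with `db.contributors.createIndex({ name_prefixes: 1 })`. The trade-off is larger documents and a field that must be kept in sync on every write. Relevance weights are also lost.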

rename year field in $project in MongoDB

I am trying to rename my _id field in the $project stage, but I get an error message. The $match and $sort stages work fine. Here are the details:
db.complaints.aggregate([
{$match:{$text:{$search:"\"loan\""}}},
{$group:{"_id":{Year:{$substr: ["$received", 0, 4]}}, "loan":{$sum:1}}},
{$sort:{_id:-1}},
{$project:{_id:0, "Year":"_id.Year", "loan":1}}
])
Here is my schema:
> db.complaints.findOne()
{
"_id" : ObjectId("55e5990d991312e2c9b266e3"),
"complaintID" : 1388734,
"product" : "mortgage",
"subProduct" : "conventional adjustable mortgage (arm)",
"issue" : "loan servicing, payments, escrow account",
"subIssue" : "",
"state" : "va",
"ZIP" : 22204,
"submitted" : "web",
"received" : "2015-05-22",
"sent" : "2015-05-22",
"company" : "green tree servicing, llc",
"response" : "closed with explanation",
"timely" : "yes",
"disputed" : ""
}
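The likely cause is the last stage. `"Year": "_id.Year"` is missing the `$` prefix that marks a field path, so MongoDB does not read it as a reference to the grouped year. Below is a sketch of the corrected pipeline, kept as a plain array so it can be inspected before being passed to `db.complaints.aggregate(pipeline)`. Only the `$` was added, and the pipeline is untested against this dataset.

```javascript
// Same stages as in the question; only the $project reference changes.
const pipeline = [
  { $match: { $text: { $search: "\"loan\"" } } },
  // Group by the first four characters of the "received" date string (the year)
  { $group: { _id: { Year: { $substr: ["$received", 0, 4] } }, loan: { $sum: 1 } } },
  { $sort: { _id: -1 } },
  // "$_id.Year" (with the leading $) refers to the grouped value
  { $project: { _id: 0, Year: "$_id.Year", loan: 1 } },
];
console.log(JSON.stringify(pipeline[3]));
```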

Convert a MongoDB database with two collections into a Neo4j graph

I have finished creating my Mongo database. It is made up of two collections:
1. team
2. coach
Here is an example of the documents contained in these collections:
Here is a team document:
{
"_id" : "Mil.74",
"official_name" : "Associazione Calcio Milan S.p.A",
"common_name" : "Milan",
"country" : "Italy",
"started_by" : {
"day" : 16,
"month" : 12,
"year" : 1899
},
"stadium" : {
"name" : "Giuseppe Meazza",
"capacity" : 81277
},
"palmarès" : {
"Serie A" : 18,
"Serie B" : 2,
"Coppa Italia" : 5,
"Supercoppa Italiana" : 6,
"UEFA Champions League" : 7,
"UEFA Super Cup" : 5,
"Cup Winners cup" : 2,
"UEFA Intercontinental cup" : 4
},
"uniform" : "black and red"
}
This is a coach document:
{
"_id" : ObjectId("556cec3b9262ab4f14165fcd"),
"name" : "Carlo",
"surname" : "Ancelotti",
"age" : 55,
"date_Of_birth" : {
"day" : 10,
"month" : 6,
"year" : 1959
},
"place_Of_birth" : "Reggiolo",
"nationality" : "Italian",
"preferred_formation" : "4-2-3-1",
"coached_Team" : [
{
"team_id" : "RMa.103",
"in_charge" : {
"from" : "26/june/2013",
"to" : "25/may/2015"
},
"matches" : 119
},
{
"team_id" : "PSG.00",
"in_charge" : {
"from" : "30/dec/2011",
"to" : "24/june/2013"
},
"matches" : 77
},
{
"team_id" : "Che.11",
"in_charge" : {
"from" : "01/july/2009",
"to" : "22/may/2011"
},
"matches" : 109
},
{
"team_id" : "Mil.74",
"in_charge" : {
"from" : "07/nov/2001",
"to" : "31/may/2009"
},
"matches" : 420
}
]
}
As you can see, I used a normalized model: every coach has an array of coached teams.
I want to convert this Mongo database into a graph database, in particular Neo4j. My goal is to show that in highly connected domains like this one Neo4j performs better than Mongo. For example, the query "Find the palmarès of all teams coached by Carlo Ancelotti" requires two queries in Mongo, whereas in Neo4j it is enough to follow relationships.
I found a guide on the forum that uses Gremlin to convert a Mongo collection of documents into a Neo4j graph automatically. The problem is that the guide covers just one collection.
So, is it possible to automatically generate the Neo4j graph from my Mongo database (with two collections), or must I create the graph "by hand"?

Gremlin is a domain-specific language for working with graphs, but it is based on Groovy, so you effectively have the full flexibility of a general-purpose language. In other words, what you can do with one MongoDB collection you can just as easily do with two (or however many collections you have). That was the point of the blog post referenced in one of the other answers:
http://thinkaurelius.com/2013/02/04/polyglot-persistence-and-query-with-gremlin/
Gremlin is a great language for transforming data into graph form, whatever its source format is. I would think that you would first load all of your teams as vertices then iterate through your coaches, creating coach vertices and edges to their related teams as you go.
I would also add that nothing is "automatic" about Gremlin. It's not as though you tell Gremlin that you have data in MongoDB and it turns it into a graph. You have to write Gremlin to tell it how you want your MongoDB data turned into a graph.
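The order described above (teams first, then coaches and their edges) does not depend on Gremlin itself. Below is a sketch in plain JavaScript that turns the two collections, already read into arrays, into vertex and edge lists. The `COACHED` edge label and the output shape are assumptions. You would still need Gremlin, Cypher, or an import tool to write the result into Neo4j.

```javascript
// Build an in-memory graph from team and coach documents shaped like the question's.
function buildGraph(teams, coaches) {
  const vertices = new Map(); // vertex id -> { id, label, props }
  const edges = [];

  // 1. Every team becomes a vertex, keyed by its _id (e.g. "Mil.74").
  for (const team of teams) {
    vertices.set(team._id, { id: team._id, label: "team", props: team });
  }

  // 2. Every coach becomes a vertex, and each coached_Team entry becomes an edge.
  for (const coach of coaches) {
    const coachId = String(coach._id); // String() so ObjectId-like values work as map keys
    const { coached_Team = [], ...props } = coach;
    vertices.set(coachId, { id: coachId, label: "coach", props: props });
    for (const spell of coached_Team) {
      if (!vertices.has(spell.team_id)) continue; // team not present in the team collection
      edges.push({
        from: coachId,
        to: spell.team_id,
        label: "COACHED", // hypothetical relationship name
        props: { in_charge: spell.in_charge, matches: spell.matches },
      });
    }
  }
  return { vertices: [...vertices.values()], edges: edges };
}

// Trimmed-down sample data based on the question's documents
const teams = [{ _id: "Mil.74", common_name: "Milan", "palmarès": { "Serie A": 18 } }];
const coaches = [{
  _id: "556cec3b9262ab4f14165fcd",
  name: "Carlo",
  surname: "Ancelotti",
  coached_Team: [
    { team_id: "Mil.74", matches: 420 },
    { team_id: "RMa.103", matches: 119 }, // not in `teams`, so no edge is created
  ],
}];
const graph = buildGraph(teams, coaches);

// "Palmarès of all teams coached by Carlo Ancelotti": follow the coach's edges
const byId = new Map(graph.vertices.map(v => [v.id, v]));
const palmares = graph.edges
  .filter(e => e.from === "556cec3b9262ab4f14165fcd")
  .map(e => byId.get(e.to).props["palmarès"]);
console.log(JSON.stringify(palmares)); // [{"Serie A":18}]
```

Skipping edges whose team is missing is one possible design choice. Creating a placeholder team vertex instead would keep those relationships.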

Update a document (quickly) - MongoDB

Consider this MongoDB document:
{
"_id" : "RMa.103",
"official_name" : "Real Madrid Club de Fùtbol",
"country" : "Spain",
"started_by" : {
"day" : 6,
"month" : 3,
"year" : 1902
},
"stadium" : {
"name" : "Santiago Bernabeu",
"capacity" : 85454
},
"palmarès" : {
"La Liga" : 32,
"Copa del Rey" : 19,
"Supercopa de Espana" : 9,
"UEFA Champions League" : 10,
"UEFA Europa League" : 2,
"UEFA Super Cup" : 2,
"FIFA Club World cup" : 4
},
"uniform" : "white"
}
I forgot to insert an important piece of information about the team: the common name.
So, I updated the document:
[1] db.team.update({_id:"RMa.103"}, {$set:{common_name:"Real Madrid"}})
This way, the new field is added at the end of the document, but I want it right after official_name:
{
"_id" : "RMa.103",
"official_name" : "Real Madrid Club de Fùtbol",
"common_name" : "Real Madrid"
.......
.......
.......
}
I know that if I update the document with the following method, the common name of the team ends up in the right location:
db.team.update(
{_id:"RMa.103"}, {$set:{ "_id" : "RMa.103",
"official_name": "Real Madrid Club de Fùtbol",
common_name:"Real Madrid", "country" : "Spain",
"started_by" : { "day" : 6, "month" : 3, "year" : 1902 },
"stadium" : { "name" : "Santiago Bernabeu", "capacity" : 85454 },
"palmarès" : { "La Liga" : 32, "Copa del Rey" : 19, "Supercopa de Espana" : 9,
"UEFA Champions League" : 10, "UEFA Europa League" : 2, "UEFA Super Cup" : 2,
"FIFA Club World cup" : 4 }, "uniform" : "white", "common_name" : "Real Madrid" }})
I have to update a lot of documents, and this kind of operation is very tedious to do from the shell. Is there a faster way to do this update? For example, is it possible to change update [1] so that it does this?

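From the JSON documentation:
An object is an unordered set of name/value pairs.
So JSON itself gives no guarantee about key order. MongoDB stores documents as BSON, which does keep fields in the order they were written, but as you noticed, $set adds a new field at the end; the only way to choose its position is to rewrite the whole document.
It doesn't matter, though: any MongoDB driver will give you the value for a given key wherever that key sits in the document, so the key order is pointless.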
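If you still want a particular order for readability, the position can only be chosen by writing the whole document back, for example with a replacement update. Below is a minimal sketch of rebuilding a document with a new key placed right after an existing one. `insertAfter` is a hypothetical helper. It relies on JavaScript keeping insertion order for non-numeric string keys, while keys that look like integers (such as "1") are always enumerated first.

```javascript
// Return a copy of `doc` with `newKey: value` placed right after `afterKey`.
// If `afterKey` is absent, the new key is appended at the end.
function insertAfter(doc, afterKey, newKey, value) {
  const result = {};
  let inserted = false;
  for (const [key, current] of Object.entries(doc)) {
    if (key === newKey) continue; // drop any existing copy so the key appears once
    result[key] = current;
    if (key === afterKey) {
      result[newKey] = value;
      inserted = true;
    }
  }
  if (!inserted) result[newKey] = value;
  return result;
}

const team = { _id: "RMa.103", official_name: "Real Madrid Club de Fùtbol", country: "Spain" };
console.log(Object.keys(insertAfter(team, "official_name", "common_name", "Real Madrid")).join(", "));
// _id, official_name, common_name, country
```

In the shell, the rebuilt document would be written back with a replacement such as `db.team.replaceOne({_id: doc._id}, insertAfter(doc, "official_name", "common_name", name))`, for instance inside a `find().forEach(...)` loop. Where each team's common name comes from is up to you.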

No, not really. You should not need the fields in a specific order. When you query from an application, it won't care what order your values are in. For example:
db.collection("team").findOne({official_name:"Real Madrid Club de Fùtbol"}).then(function(result)
{
/*Gets the matching team (there should be only one)
and prints its common name*/
console.log(result.common_name);
});
If you are wondering why there is no convenient way to order fields for human eyes, remember that databases are not designed to be interacted with directly. They are meant to be used by an application that presents the data in a human-friendly manner.
I hope I answered your question sufficiently; if not, please comment below and I will explain more.

Mongodb save/upsert using C# drivers, continuous array adds and field updates to same doc

I need some ideas/tips for this. Here is a sample document I am storing:
{
"_id" : new BinData(0, "C3hBhRCZ5ZFizqbO1hxwrA=="),
"gId" : 237,
"name" : "WEATHER STATION",
"mId" : 341457,
"MAC" : "00:00:00:00:00:01",
"dt" : new Date("Fri, 24 Feb 2012 13:59:02 GMT -05:00"),
"hw" : [{
"tag" : "Weather Sensors",
"snrs" : [{
"_id" : NumberLong(7),
"sdn" : "Wind Speed"
}, {
"_id" : NumberLong(24),
"sdn" : "Wind Gust"
}, {
"_id" : NumberLong(28),
"sdn" : "Wind Direction"
}, {
"_id" : NumberLong(31),
"sdn" : "Rainfall Amount"
}, {
"_id" : NumberLong(33),
"sdn" : "Rainfall Peak Amount"
}, {
"_id" : NumberLong(38),
"sdn" : "Barometric Pressure"
}],
"_id" : 1
}]
}
What I am currently doing is using the C# driver and calling .Save() on my collection to get an upsert. However, what I want is a hybrid approach, I guess. Here are the distinct operations I need to be able to perform:
Upsert entire document if it does not exist
Update the dt field with a new timestamp if the document does exist
For the hw field, I need several things. If an hw entry's _id exists, update its tag field and handle its snrs field by either updating existing entries so the sdn value is updated, or adding entirely new entries when the _id does not exist.
Nothing should ever be removed from the hw array and nothing should ever be removed from the snrs array.
A standard upsert does not appear to give me what I am after, so I am looking for the best way to do this with as few round trips to the server as possible. I think some of the $ operators may be what I need here, but I would like some thoughts on how best to approach this.
The gist of what I am doing here is keeping an accumulating, historical document of snrs entries with the current value, while retaining any historical entries in the array even though they are no longer "alive", being reported, etc. This allows future reporting on things that no longer exist now but did at some point in the past. _id values are application-generated, globally unique across all documents, and never change after initial creation. For example, last week "Wind Speed" was being reported, but this week it is not. Its _id value, however, will not change if "Wind Speed" starts reporting again. Follow?
Clarifications or more detail can be provided if needed.
Thanks.

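You can do this by changing the structure of your document from embedded arrays to subdocuments keyed by the _id values. A $set on a dotted path such as "hw.1.snrs.2" creates that key if it is missing and overwrites it if it exists, without touching sibling keys, so a single upsert can add or update entries and never removes any.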
e.g.
{
"MAC" : "00:00:00:00:00:01",
"_id" : 1,
"dt" : ISODate("2012-02-24T18:59:02Z"),
"gId" : 237,
"hw" : {
"1" : {
"snrs" : {
"1" : "Wind Speed",
"2" : "Wind Gust"
},
"tag" : "Weather Sensors"
}
},
"mId" : 341457,
"name" : "WEATHER STATION 1"
}
I created the above document with the following upsert (the third argument, true, is the upsert flag):
db.foo.update(
{_id:1},
{
$set: {
"gId" : 237,
"name" : "WEATHER STATION 1",
"mId" : 341457,
"MAC" : "00:00:00:00:00:01",
"dt" : new Date("Fri, 24 Feb 2012 13:59:02 GMT -05:00"),
"hw.1.tag" : "Weather Sensors",
"hw.1.snrs.1" : "Wind Speed",
"hw.1.snrs.2" : "Wind Gust"
}
},
true
)
Now when I run
db.foo.update(
{_id:1},
{
$set: {
"dt" : new Date(),
"hw.2.snrs.1" : "Rainfall Amount"
}
},
true
)
I get
{
"MAC" : "00:00:00:00:00:01",
"_id" : 1,
"dt" : ISODate("2012-03-07T05:14:31.881Z"),
"gId" : 237,
"hw" : {
"1" : {
"snrs" : {
"1" : "Wind Speed",
"2" : "Wind Gust"
},
"tag" : "Weather Sensors"
},
"2" : {
"snrs" : {
"1" : "Rainfall Amount"
}
}
},
"mId" : 341457,
"name" : "WEATHER STATION 1"
}
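The incoming data in the question still arrives in the array shape, so the dotted-path `$set` document can be generated rather than written by hand. Below is a minimal sketch. `buildUpsertSet` is a hypothetical name, and the sketch assumes `hw[]._id` and `snrs[]._id` are the stable identifiers described in the question. It also assumes they are plain numbers or strings, because a shell `NumberLong` would need converting to a plain number before being concatenated into a path.

```javascript
// Convert a station document in the question's array shape into a $set
// document keyed by the stable _id values, as in the answer above.
function buildUpsertSet(station) {
  const set = {};
  // Top-level fields are copied only when present, so partial updates work too.
  for (const field of ["gId", "name", "mId", "MAC", "dt"]) {
    if (station[field] !== undefined) set[field] = station[field];
  }
  for (const hw of station.hw || []) {
    if (hw.tag !== undefined) set["hw." + hw._id + ".tag"] = hw.tag;
    for (const snr of hw.snrs || []) {
      // Only this sensor's key is written; other stored sensors are left alone.
      set["hw." + hw._id + ".snrs." + snr._id] = snr.sdn;
    }
  }
  return { $set: set };
}

const station = {
  gId: 237,
  name: "WEATHER STATION",
  mId: 341457,
  MAC: "00:00:00:00:00:01",
  dt: new Date(0),
  hw: [{
    _id: 1,
    tag: "Weather Sensors",
    snrs: [{ _id: 7, sdn: "Wind Speed" }, { _id: 24, sdn: "Wind Gust" }],
  }],
};
const update = buildUpsertSet(station);
console.log(JSON.stringify(update.$set));
```

Mirroring the answer, the result would be applied with `db.foo.update({_id: 1}, buildUpsertSet(station), true)`, which is one round trip per station.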
