MongoDB queries: find the total number of cities in the database

Hi everyone, I have a large data set that contains documents like these:
{ "_id" : "01011", "city" : "CHESTER", "loc" : [ -72.988761, 42.279421 ], "pop" : 1688, "state" : "MA" }
{ "_id" : "01012", "city" : "CHESTERFIELD", "loc" : [ -72.833309, 42.38167 ], "pop" : 177, "state" : "MA" }
{ "_id" : "01013", "city" : "CHICOPEE", "loc" : [ -72.607962, 42.162046 ], "pop" : 23396, "state" : "MA" }
{ "_id" : "01020", "city" : "CHICOPEE", "loc" : [ -72.576142, 42.176443 ], "pop" : 31495, "state" : "MA" }
I want to find the number of cities in this database using a MongoDB command. Note that the database may have more than one record for the same city, as in the example above.
I tried:
>db.zipcodes.distinct("city").count();
2015-04-25T15:57:45.446-0400 E QUERY warning: log line attempted (159k) over max size (10k), printing beginning and end ... TypeError: Object AGAWAM,BELCHERTOWN ***data*** has no method 'count'
but it didn't work for me. I also tried something like this:
>db.zipcodes.find({city:.*}).count();
2015-04-25T16:00:01.043-0400 E QUERY SyntaxError: Unexpected token .
That didn't work either, and even if it did, it would count duplicate cities. Any ideas?

Instead of doing
db.zipcodes.distinct("city").count();
do this:
db.zipcodes.distinct("city").length;
distinct returns a plain array rather than a cursor, which is why it has a length property but no count() method.
There is also the aggregate function, which may help you.
I have also found one example of aggregate related to your query.
If you want to add a condition, you could refer to the $gte and $lte aggregation operators.
See if that helps.
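To make the distinct-then-length idea concrete, here is a minimal Python sketch run against the four sample documents from the question, held in an in-memory list (loc omitted for brevity). The helper distinct_values is hypothetical and only mimics what distinct does; with pymongo, assuming a collection named zipcodes, the equivalent would be len(db.zipcodes.distinct("city")).

```python
# In-memory stand-in for the zipcodes collection (documents from the question).
zipcodes = [
    {"_id": "01011", "city": "CHESTER", "pop": 1688, "state": "MA"},
    {"_id": "01012", "city": "CHESTERFIELD", "pop": 177, "state": "MA"},
    {"_id": "01013", "city": "CHICOPEE", "pop": 23396, "state": "MA"},
    {"_id": "01020", "city": "CHICOPEE", "pop": 31495, "state": "MA"},
]


def distinct_values(documents, field):
    """Hypothetical helper: unique values of `field`, like distinct()."""
    return sorted({doc[field] for doc in documents if field in doc})


cities = distinct_values(zipcodes, "city")
city_count = len(cities)  # a list has a length, not a count() method
print(city_count)
```

CHICOPEE appears twice in the sample but is counted once, which is the behaviour the question asks for.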

You can also use the aggregation framework for this. The aggregation pipeline has two $group stages: the first groups the documents by city, and the second counts the documents emitted by the first, which gives the number of distinct cities:
db.collection.aggregate([
{
"$group": {
"_id": "$city"
}
},
{
"$group": {
"_id": 0,
"count": { "$sum": 1 }
}
}
]);
Output:
/* 1 */
{
"result" : [
{
"_id" : 0,
"count" : 3
}
],
"ok" : 1
}
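For anyone calling this from Python, here is a rough sketch of the same two-stage pipeline written as plain dicts, followed by an in-memory walk through what each stage produces for the sample data. The variable names stage1 and stage2 are made up for illustration, and the list comprehension only imitates the server; it is not how MongoDB evaluates the pipeline.

```python
# The two $group stages as Python dicts, in the shape pymongo's aggregate() takes.
pipeline = [
    {"$group": {"_id": "$city"}},
    {"$group": {"_id": 0, "count": {"$sum": 1}}},
]

# Cities of the four sample documents from the question.
docs = [
    {"city": "CHESTER"},
    {"city": "CHESTERFIELD"},
    {"city": "CHICOPEE"},
    {"city": "CHICOPEE"},
]

# Stage 1: one output document per distinct city.
stage1 = [{"_id": city} for city in sorted({d["city"] for d in docs})]

# Stage 2: every stage-1 document falls into the single group _id 0 and adds 1.
stage2 = [{"_id": 0, "count": sum(1 for _ in stage1)}]
print(stage2)
```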

Related

Mongodb accessing documents

I've the following db:
{ "_id" : 1, "results" : [ { "product" : "abc", "score" : 10 }, { "product" : "xyz", "score" : 5 } ] }
{ "_id" : 2, "results" : [ { "product" : "abc", "score" : 8 }, { "product" : "xyz", "score" : 7 } ] }
{ "_id" : 3, "results" : [ { "product" : "abc", "score" : 7 }, { "product" : "xyz", "score" : 8 } ] }
I want to show the first score for each _id. I tried the following:
db.students.find({},{"results.$":1})
But it doesn't seem to work. Any advice?
You can take advantage of the aggregation pipeline to solve this.
Use $project together with $arrayElemAt to pick the element at a given index of the array.
To extract the first element of results for each document, use the query below.
db.students.aggregate([ {$project: { scoredoc:{$arrayElemAt:["$results",0]}} } ]);
If you only want the score, without the product, use $results.score as shown below.
db.students.aggregate([ {$project: { scoredoc:{$arrayElemAt:["$results.score",0]}} } ]);
Here scoredoc holds the first element of the results array for each document.
Hope this helps!
Based on the description above, try executing the following query in the MongoDB shell:
db.students.find(
{results:
{$elemMatch:{score:{$exists:true}}}}, {'results.$':1}
)
According to the MongoDB documentation:
The positional $ operator limits the contents of an array from the
query results to contain only the first element matching the query
document.
Hence the positional $ operator is used in the projection of the query above to retrieve the first matching element, and with it the first score, of each document.
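As a sanity check on what the $arrayElemAt projection should return, here is a small Python sketch that applies the same "element at index 0" rule to the three sample documents in memory. The function first_scores is hypothetical; the pipeline list shows how the second aggregate query would look as pymongo input, assuming a collection named students.

```python
# The $arrayElemAt projection on results.score, written as pymongo would take it.
pipeline = [
    {"$project": {"scoredoc": {"$arrayElemAt": ["$results.score", 0]}}},
]

students = [
    {"_id": 1, "results": [{"product": "abc", "score": 10}, {"product": "xyz", "score": 5}]},
    {"_id": 2, "results": [{"product": "abc", "score": 8}, {"product": "xyz", "score": 7}]},
    {"_id": 3, "results": [{"product": "abc", "score": 7}, {"product": "xyz", "score": 8}]},
]


def first_scores(documents):
    """Hypothetical in-memory equivalent: first score of each document."""
    out = []
    for doc in documents:
        projected = {"_id": doc["_id"]}
        results = doc.get("results", [])
        if results:  # an empty array yields no scoredoc field at all
            projected["scoredoc"] = results[0]["score"]
        out.append(projected)
    return out


print(first_scores(students))
```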

allowDiskUse not working in pymongo

I have data stored in MongoDB in the following format.
{
"_id" : ObjectId("570b487fb5360dd1e5ef840c"),
"internal_id" : 1,
"created_at" : ISODate("2015-07-14T10:08:38.994Z"),
"updated_at" : ISODate("2016-01-10T00:35:19.748Z"),
"ad_account_id" : 1,
"updated_time" : "2013-08-05T04:48:49-0700",
"created_time" : "2013-08-05T04:46:35-0700",
"name" : "Sale1",
"daily": [
{"clicks": 5000, "date": "2015-04-16"},
{"clicks": 5100, "date": "2015-04-17"},
{"clicks": 5030, "date": "2015-04-20"}
],
"custom_tags" : {
"Event" : {
"name" : "Clicks"
},
"Objective" : {
"name" : "Sale"
},
"Image" : {
"name" : "43c3fe7b262cde5f476ed303e472c65a"
},
"Goal" : {
"name" : "10"
},
"Type" : {
"name" : "None"
},
"Call To Action" : {
"name" : "None"
},
"Landing Pages" : {
"name" : "www.google.com"
}
}
}
I am trying to group individual documents by internal_id to find the aggregate sum of clicks from say 2015-04-15 to 2015-04-21 using the aggregate method.
In pymongo, when I try to do an aggregate using just $project on internal_id, I get the results, but when I try to $project custom_tags fields, I get the following error:
OperationFailure: Exceeded memory limit for $group, but didn't allow external sort.
Pass allowDiskUse:true to opt in.
Following the answer here, I even changed my aggregate function to list(collection._get_collection().aggregate(mongo_query["pipeline"], allowDiskUse=True)). But this still keeps throwing the earlier error.
Take a look at this link:
Can't get allowDiskUse:True to work with pymongo
This works for me:
someSampleList= db.collectionName.aggregate(pipeline, allowDiskUse=True)
where
pipeline = [
{'$sort': {'sortField': 1}},
{'$group': {'_id': '$distinctField'}},
{'$limit': 20000}]
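Try this. Note that {allowDiskUse : true} is JavaScript syntax and is not valid Python; in Python the option is passed as the keyword argument allowDiskUse=True:
list(collection._get_collection().aggregate(mongo_query["pipeline"], allowDiskUse=True))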
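For reference, here is a minimal sketch of a pipeline that sums clicks per internal_id over a date range, which is what the question describes. The function names clicks_pipeline and sum_clicks are made up for illustration, and the pipeline is one possible shape, not the asker's actual mongo_query["pipeline"]. It assumes the daily dates are stored as "YYYY-MM-DD" strings, as in the sample, so they compare correctly as text. The in-memory function only checks the expected totals without a server.

```python
def clicks_pipeline(start, end):
    """Hypothetical pipeline: total clicks per internal_id between two dates."""
    return [
        {"$unwind": "$daily"},
        {"$match": {"daily.date": {"$gte": start, "$lte": end}}},
        {"$group": {"_id": "$internal_id", "clicks": {"$sum": "$daily.clicks"}}},
    ]


def sum_clicks(documents, start, end):
    """In-memory equivalent, used only to check the expected totals."""
    totals = {}
    for doc in documents:
        for day in doc.get("daily", []):
            if start <= day["date"] <= end:  # ISO date strings sort as text
                key = doc["internal_id"]
                totals[key] = totals.get(key, 0) + day["clicks"]
    return totals


# Trimmed copy of the sample document from the question.
campaigns = [
    {
        "internal_id": 1,
        "name": "Sale1",
        "daily": [
            {"clicks": 5000, "date": "2015-04-16"},
            {"clicks": 5100, "date": "2015-04-17"},
            {"clicks": 5030, "date": "2015-04-20"},
        ],
    }
]

print(sum_clicks(campaigns, "2015-04-15", "2015-04-21"))
# With pymongo the pipeline would be run as:
#   collection.aggregate(clicks_pipeline("2015-04-15", "2015-04-21"), allowDiskUse=True)
```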

MongoDB aggregate Query with select fields

I want to build a chat system and need to get the last message from each user using aggregation. My query is below, but it only returns the user's ID. Please help, thanks.
Database:
/* 1 */
{
"_id" : ObjectId("56937df0418a6afab248616d"),
"to" : ObjectId("56728051d4b426be03de18f2"),
"from" : ObjectId("568e402eaecfa53282f60d17"),
"msg" : "Hello!",
"cd" : ISODate("2016-01-11T10:03:28.139Z"),
"type" : "other",
"ir" : 0
}
/* 2 */
{
"_id" : ObjectId("56937e01418a6afab248616e"),
"to" : ObjectId("568e402eaecfa53282f60d17"),
"from" : ObjectId("56728051d4b426be03de18f2"),
"msg" : "Hi!",
"cd" : ISODate("2016-01-11T10:03:45.588Z"),
"type" : "other",
"ir" : 0
}
/* 3 */
{
"_id" : ObjectId("56937e45418a6afab248616f"),
"to" : ObjectId("56728051d4b426be03de18f2"),
"from" : ObjectId("568e402eaecfa53282f60d17"),
"msg" : "Shu che ela!",
"cd" : ISODate("2016-01-11T10:04:53.280Z"),
"type" : "other",
"ir" : 0
}
Query:
db.getCollection('chat_message').aggregate( [
{
$match: {
ir: 0,
$or : [
{"to" : ObjectId("56728051d4b426be03de18f2")}
]
}
},
{ $group: { _id: "$from" } },
])
I ran this query but did not get the result I want.
Required outcome:
/* 1 */
{
"result" : [
{
"_id" : ObjectId("568e402eaecfa53282f60d17"),
"msg" : "Shu che ela!"
}
],
"ok" : 1.0000000000000000
}
You are on the right track, but you are missing a few things about how $group uses _id.
In MongoDB, documents stored in a collection require a unique _id field that acts as a primary key.
When you run your aggregation, $group uses the from value as the _id of each output document. Two of your documents have the same from ObjectId, so they collapse into a single group, and since the stage defines no other fields, only _id is returned. Taking the first msg of that group would return only one message, either "msg" : "Hello!" or "msg" : "Shu che ela!", depending on which document comes first.
If you change your aggregation to {"$group":{"_id":"$_id","msg":{"$first":"$msg"}}}, you will get both documents.
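If the goal is one row per sender holding that sender's latest message, as in the required outcome, one option that differs from the grouping on _id suggested above is to sort by cd and take $last within each from group. The sketch below shows that pipeline as Python dicts and checks the idea in memory on the three sample messages. ObjectIds are replaced by plain strings here purely so the example runs without a driver, and last_message_per_sender is a hypothetical helper, not part of any library.

```python
from datetime import datetime

ME = "56728051d4b426be03de18f2"      # recipient id from the question, as a string
OTHER = "568e402eaecfa53282f60d17"   # sender id from the question, as a string

# Sort oldest to newest, then keep the last msg seen for each sender.
pipeline = [
    {"$match": {"ir": 0, "to": ME}},
    {"$sort": {"cd": 1}},
    {"$group": {"_id": "$from", "msg": {"$last": "$msg"}}},
]

messages = [
    {"to": ME, "from": OTHER, "msg": "Hello!", "cd": datetime(2016, 1, 11, 10, 3, 28), "ir": 0},
    {"to": OTHER, "from": ME, "msg": "Hi!", "cd": datetime(2016, 1, 11, 10, 3, 45), "ir": 0},
    {"to": ME, "from": OTHER, "msg": "Shu che ela!", "cd": datetime(2016, 1, 11, 10, 4, 53), "ir": 0},
]


def last_message_per_sender(docs, recipient):
    """Hypothetical in-memory equivalent of the pipeline above."""
    matched = [d for d in docs if d["ir"] == 0 and d["to"] == recipient]
    matched.sort(key=lambda d: d["cd"])
    latest = {}
    for d in matched:
        latest[d["from"]] = d["msg"]  # later messages overwrite earlier ones
    return [{"_id": sender, "msg": msg} for sender, msg in latest.items()]


print(last_message_per_sender(messages, ME))
```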

Find maximum date from multiple embedded documents

One of many documents in my collection is like below:
{ "_id" :123,
"a" :[
{ "_id" : 1,
"dt" : ISODate("2013-06-10T19:38:42Z")
},
{ "_id" : 2,
"dt" : ISODate("2013-02-10T19:38:42Z")
}
],
"b" :[
{ "_id" : 1,
"dt" : ISODate("2013-02-10T19:38:42Z")
},
{ "_id" : 2,
"dt" : ISODate("2013-23-10T19:38:42Z")
}
],
"c" :[
{ "_id" : 1,
"dt" : ISODate("2013-03-10T19:38:42Z")
},
{ "_id" : 2,
"dt" : ISODate("2013-13-10T19:38:42Z")
}
]
}
I want to find the maximum date for the whole document (a,b,c).
The solution I have right now is to loop through all root _id values and run a $match in the aggregation framework on each of a, b and c for every root document. This sounds very inefficient. Any better ideas?
Your question is very similar to this question. Check it out.
As with that solution, your issue can be handled with MongoDB's aggregation framework, using the $project and $cond operators to repeatedly flatten your documents while preserving a max value at each step.
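To pin down what the result should be, here is a small Python sketch that computes the maximum dt across a, b and c for one document in memory. The function max_date is hypothetical. The sample in the question contains month numbers 23 and 13, which are not valid dates, so the b and c values below are illustrative substitutes and not the asker's data. The pipeline shown is a different technique from the $project and $cond flattening described above: it assumes a MongoDB version where $max is allowed inside $project (3.2 or later, as far as I know).

```python
from datetime import datetime

# Assumed alternative: $max inside $project, one inner $max per array.
pipeline = [
    {"$project": {"maxDate": {"$max": [
        {"$max": "$a.dt"},
        {"$max": "$b.dt"},
        {"$max": "$c.dt"},
    ]}}},
]

doc = {
    "_id": 123,
    "a": [
        {"_id": 1, "dt": datetime(2013, 6, 10, 19, 38, 42)},
        {"_id": 2, "dt": datetime(2013, 2, 10, 19, 38, 42)},
    ],
    # Illustrative dates: the originals for b and c are not valid ISO dates.
    "b": [
        {"_id": 1, "dt": datetime(2013, 2, 10, 19, 38, 42)},
        {"_id": 2, "dt": datetime(2013, 10, 23, 19, 38, 42)},
    ],
    "c": [
        {"_id": 1, "dt": datetime(2013, 3, 10, 19, 38, 42)},
        {"_id": 2, "dt": datetime(2013, 10, 13, 19, 38, 42)},
    ],
}


def max_date(document, fields=("a", "b", "c")):
    """Hypothetical helper: latest dt over all listed embedded arrays."""
    return max(item["dt"] for f in fields for item in document.get(f, []))


print(max_date(doc))
```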

MongoDB: Query by size of array with a filtered value

In MongoDB, I have a collection ("users") with the following basic schema:
{
"_id" : ObjectId("50e5de00b623143995c5b739"),
"name" : "Jon",
"emails_sent" : [
{
"type" : "invite",
"sent_time" : ISODate("2013-04-21T21:11:50.999Z")
},
{
"type" : "invite",
"sent_time" : ISODate("2013-04-15T21:10:35.999Z")
},
{
"type" : "follow",
"sent_time" : ISODate("2013-04-21T21:11:50.999Z")
}
]
}
I'd like to query for users based on the $size of emails_sent of a certain "type" only, e.g. only count "invite" emails. Is there any way to achieve this sort of "filtered count" in a standard mongo query?
Many thanks.
db.users.aggregate([
{$unwind:'$emails_sent'},
{$match: {'emails_sent.type':'invite'}},
{$group : {_id : '$name', sumOfEmailsSent:{$sum:1} }}
]);
BTW you are missing a square bracket which closes the $emails_sent array.
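The pipeline above counts invite emails per user but does not yet filter on that count. A possible extension, sketched here in Python, adds a final $match on sumOfEmailsSent. The wanted count of 2, the second user "Ann" and the helper users_with_invites are all made up for illustration; the in-memory function only checks the logic on sample data.

```python
WANTED = 2  # hypothetical target number of invite emails

# Unwind, keep invites, count per user, then filter on the count.
pipeline = [
    {"$unwind": "$emails_sent"},
    {"$match": {"emails_sent.type": "invite"}},
    {"$group": {"_id": "$name", "sumOfEmailsSent": {"$sum": 1}}},
    {"$match": {"sumOfEmailsSent": WANTED}},
]

users = [
    {"name": "Jon", "emails_sent": [
        {"type": "invite"}, {"type": "invite"}, {"type": "follow"},
    ]},
    # Hypothetical second user, added so the filter has something to exclude.
    {"name": "Ann", "emails_sent": [{"type": "invite"}, {"type": "follow"}]},
]


def users_with_invites(documents, wanted):
    """In-memory equivalent: names whose invite count equals `wanted`."""
    counts = {}
    for doc in documents:
        n = sum(1 for e in doc.get("emails_sent", []) if e["type"] == "invite")
        if n:  # users with no invites never reach the $group stage
            counts[doc["name"]] = counts.get(doc["name"], 0) + n
    return sorted(name for name, n in counts.items() if n == wanted)


print(users_with_invites(users, WANTED))
```

Grouping on name follows the answer above; grouping on _id instead would avoid merging two users who share a name.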