I am using pipelines in pymongo to query a json file.
I have one list, "sixcities", containing the 6 'cities' with the 'highest count' of book shops, i.e. the most book shops. (contains 6 pymongo instances)
{'_id': 'city1', 'count': 84}
{'_id': 'city2', 'count': 65}
{'_id': 'city3', 'count': 61}
{'_id': 'city4', 'count': 59}
{'_id': 'city5', 'count': 84}
{'_id': 'city6', 'count': 64}
I have a second list, "travelcities", with the counts of travel book shops in each of the 'cities' (20+) in the json file. (contains 20+ pymongo instances)
{'_id': 'city1', 'count': 42}...etc
Please note: this list also holds cities that do not feature in the first list.
I would like to use these lists to calculate the ratio of travel book shops in each of the 6 highest-count cities.
The common key will be the city (stored in '_id'), as this appears in the documents of both lists,
i.e. list 2, city1: 42 divided by list 1, city1: 84 = a ratio of 0.5.
I am unsure how to do this in pymongo, as the information is in Mongo documents within a list.
I thought some kind of nested loop would work:
dict={}
for i in sixcities: #loop through the first list
    dict[i["_id"]]=i["count"]
    for i in travelcities: #loop through second list
        dict[i["_id"]]=i["count"]/(dict[i["_id"]]) #ratio
But I am getting the following result:
KeyError: 'city15'
This city does not appear in the first list as one of the 6 with the most bookshops, but it does appear in the second as containing a travel bookshop.
Any and all help is appreciated.
One of the problems in your code is that you are using the same variable 'i' in both the outer and the inner loop.
Consider this code, which searches the second list for each city in the first list and then computes the ratio:
dict={}
for i in sixcities: #loop through the first list
    dict[i["_id"]]=i["count"]
    for j in travelcities: #loop through second list
        if j["_id"] == i["_id"]:
            dict[i["_id"]]=j["count"]/(dict[i["_id"]]) #ratio
Do note that if a city does not exist in the second list, its entry keeps the count from the first list instead of a ratio. Handle this corner case in the way you want.
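A variant of the same idea, sketched under the assumption that both lists hold plain dicts with '_id' and 'count' keys as shown in the question: index the travel counts by city once, then look up each of the six cities. The helper name travel_ratios and the 'city15' sample count are made up for illustration, and treating a missing city as a ratio of 0 is only one possible way to handle the corner case.

```python
def travel_ratios(sixcities, travelcities):
    """Return {city: travel_count / total_count} for the cities in sixcities."""
    # Index the travel counts by city id so each lookup is a dict access.
    travel_by_city = {doc["_id"]: doc["count"] for doc in travelcities}
    ratios = {}
    for doc in sixcities:
        # Assumption: a city with no travel book shops counts as 0.
        travel_count = travel_by_city.get(doc["_id"], 0)
        ratios[doc["_id"]] = travel_count / doc["count"]
    return ratios


sixcities = [
    {"_id": "city1", "count": 84},
    {"_id": "city2", "count": 65},
]
travelcities = [
    {"_id": "city1", "count": 42},
    {"_id": "city15", "count": 7},  # hypothetical city that is not in sixcities
]

ratios = travel_ratios(sixcities, travelcities)
print(ratios)
```

Cities that appear only in the second list (such as city15) are simply never looked up, so the KeyError cannot occur.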
Related
I have another question regarding MongoDB queries. I already asked this question using only one collection, but it seems I am completely incapable of helping myself :(.
I have two collections that look like this, for example:
collection patients:
{"_id": 1,
"$firstname": 'Max',
"$lastname": 'Mustermann'},
{"_id": 2,
"$firstname": 'Marina',
"$lastname": 'Musterfrau'}
collection text:
{"_id": 123,
"$content": 'Max Mustermann leidet unter Kopfschmerzen'},
{"_id":456,
"$content": 'Patientin leidet unter Kopfschmerzen'}
Now I want to query for the $content in which the values of $firstname and $lastname from the patients collection occur, which would only give me the document for Max Mustermann.
The background is that I need to gather the text data from $content and check whether it contains sensitive data such as the name, the birthday or anything like that. If it does contain sensitive data, is there a way to anonymize those parts of the text? If not, I would just leave out the documents where the query is true, i.e. where the $content contains sensitive data.
Result should be like:
{"_id": 123,
"$content": 'Max Mustermann leidet unter Kopfschmerzen'}
Thank you very much for your help!
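A rough sketch of the matching and anonymizing step in plain Python, assuming both collections have already been loaded into lists of dicts with the field names shown above. The helper names mentions_patient and anonymize and the '[NAME]' placeholder are invented for illustration, and plain substring matching is a simplification that will miss spelling variants.

```python
patients = [
    {"_id": 1, "$firstname": "Max", "$lastname": "Mustermann"},
    {"_id": 2, "$firstname": "Marina", "$lastname": "Musterfrau"},
]
texts = [
    {"_id": 123, "$content": "Max Mustermann leidet unter Kopfschmerzen"},
    {"_id": 456, "$content": "Patientin leidet unter Kopfschmerzen"},
]


def mentions_patient(content, patients):
    """True if both names of at least one patient occur in the text."""
    return any(
        p["$firstname"] in content and p["$lastname"] in content
        for p in patients
    )


def anonymize(content, patients, placeholder="[NAME]"):
    """Replace every known first and last name with a placeholder."""
    for p in patients:
        for name in (p["$firstname"], p["$lastname"]):
            content = content.replace(name, placeholder)
    return content


# Documents whose text names a known patient.
flagged = [t for t in texts if mentions_patient(t["$content"], patients)]
print([t["_id"] for t in flagged])
print(anonymize(flagged[0]["$content"], patients))
```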
I'm trying to achieve something like this:
I have a collection of activities that belong to some user.
I want to get the distinct activity names ordered by 'added_time', so I used a 'group by' on the activity name and took the max value of 'added_time'.
I also want to sort them by 'added_time' and then get the whole document.
The only thing I have managed so far is to get the name that I grouped by and the 'added_time' property.
This is the query:
db.getCollection('user_activities').aggregate([
    {$match: {'type': 'food', 'user_id': '123'}},
    {$group: {'_id': '$name', 'added_time': {$max: '$added_time'}}},
    {$sort: {'added_time': -1}},
    {$project: {_id: 0, name: "$_id", count: 1, sum: 1, 'added_time': 1}}
])
Can someone help me get the whole document?
Thanks!
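One way to think about the desired result, sketched in plain Python rather than as an aggregation: sort by 'added_time' descending, then keep the first full document seen for each name. In the aggregation framework the same idea is commonly expressed by putting $sort before $group and keeping {$first: '$$ROOT'}. The sample documents, the 'calories' field and the helper name latest_per_name are invented for illustration.

```python
def latest_per_name(docs):
    """Keep the whole document with the largest added_time for each name."""
    latest = {}
    # Newest first, so the first document seen per name is the one to keep.
    for doc in sorted(docs, key=lambda d: d["added_time"], reverse=True):
        latest.setdefault(doc["name"], doc)
    # Dicts preserve insertion order, so the result is already newest-first.
    return list(latest.values())


# Hypothetical sample data; 'calories' stands in for the rest of the document.
activities = [
    {"name": "pizza", "added_time": 10, "calories": 800},
    {"name": "salad", "added_time": 30, "calories": 150},
    {"name": "pizza", "added_time": 20, "calories": 900},
]

result = latest_per_name(activities)
print(result)
```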
I have a query, and when I validate it I see that the count command returns a different result from the aggregate command.
I have an array of sub-documents like so:
{
...
wished: [{'game':'dayz','appid':'1234'}, {'game':'half-life','appid':'1234'}]
...
}
I am trying to count all games in the collection and return each name along with the number of times that game name was found.
If I run
db.user_info.count({'wished.game':'dayz'})
it returns 106 as the value and
db.user_info.aggregate([{'$unwind':'$wished'},{'$group':{'_id':'$wished.game','total':{'$sum':1}}},{'$sort':{'total':-1}}])
returns 110
I don't understand why my counts are different. The only thing I can think of is that it has to do with the data being in an array of sub-documents, as opposed to being in a plain array or directly in the document.
The $unwind statement will cause one user with multiple wished games to appear as several users. Imagine this data:
{
_id: 1,
wished: [{game:'a'}, {game:'b'}]
}
{
_id: 2,
wished: [{game:'a'}, {game:'c'}, {game:'a'}]
}
The count can NEVER be more than 2.
But with this same data, an $unwind will give you 5 different documents. Summing them up will then give you a:3, b:1, c:1.
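The difference can be reproduced in plain Python with the two sample documents above. This is only a sketch of the counting logic, not of how MongoDB executes either command.

```python
from collections import Counter

users = [
    {"_id": 1, "wished": [{"game": "a"}, {"game": "b"}]},
    {"_id": 2, "wished": [{"game": "a"}, {"game": "c"}, {"game": "a"}]},
]

# Like count({'wished.game': 'a'}): one hit per matching user document.
users_wishing_a = sum(
    1 for user in users if any(w["game"] == "a" for w in user["wished"])
)

# Like $unwind followed by $group with $sum: one hit per array element.
entries_per_game = Counter(
    w["game"] for user in users for w in user["wished"]
)

print(users_wishing_a)
print(dict(entries_per_game))
```

User 2 lists game 'a' twice, so the per-element total for 'a' is 3 while only 2 users match.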
I have a problem in MongoDB.
I want to create an aggregation whose result looks like this:
A 10
B 2
C 4
D 9
E 3
...
I have a column 'words' in my table and I want to group my records by the first character of that column.
I found a solution for SQL but not for Mongo.
I would be very grateful for your help.
You don't show what the docs in your collection look like, but you can use the aggregate collection method to do this:
// Group by the first letter of the 'words' field of each doc in the 'test'
// collection while generating a count of the docs in each group.
db.test.aggregate({$group: {_id: {$substr: ['$words', 0, 1]}, count: {$sum: 1}}})
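For comparison, the same grouping can be sketched client-side in Python, assuming the documents have already been fetched into a list and each has a string 'words' field. The sample documents are made up.

```python
from collections import Counter

# Hypothetical documents as they might come back from the 'test' collection.
docs = [
    {"words": "Apple"},
    {"words": "Avocado"},
    {"words": "Banana"},
    {"words": "Cherry"},
]

# Slice instead of indexing so an empty string yields '' rather than an error.
counts = Counter(doc["words"][:1] for doc in docs)

for letter, count in sorted(counts.items()):
    print(letter, count)
```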
How would I create a query to get both the current player's rank and the ranks of the surrounding players? For example, if I had a leaderboard collection with name and points:
{name: 'John', pts: 123}
If John were in 23rd place, I would want to show the names of the users in 22nd and 24th place as well.
I could query for a count of leaderboard items with pts greater than 123 to get John's rank, but how can I efficiently get the players ranked just above and just below the current player? Can I get items based on index position alone?
I suppose I can make 2 queries, first one to get the rank position of a user and then a skip/limit query, but that seems inefficient and doesn't seem to make efficient use of the index.
db.leaderboards.find({pts:{$gt:123}}).count();
-> 23
db.leaderboards.find().skip(21).limit(3)
The last query seems to scan across 24 records using its index. Is there a way I can reasonably do this with a range query or something more efficient? I can see this becoming an issue if the user is ranked very low, like 50,000th place.
You'll need to do three queries:
var john = db.players.findOne({name: 'John'})
var next_player = db.players.find(
    {_id: {$ne: john._id}, pts: {$gte: john.pts}}).sort({pts:1,name:1}).limit(-1)[0]
var previous_player = db.players.find(
    {_id: {$ne: john._id}, pts: {$lte: john.pts}}).sort({pts:-1,name:-1}).limit(-1)[0]
Create indexes on name and pts.
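The logic of the three queries can be sketched in plain Python over an in-memory list. This only mirrors the selection rules, not the index usage, and it assumes player names are unique (the queries above exclude John by _id instead). The sample players and the helper name neighbours are made up.

```python
def neighbours(players, name):
    """Return (player_above, player_below); either is None at the edges."""
    me = next(p for p in players if p["name"] == name)
    others = [p for p in players if p["name"] != name]
    # First result of an ascending (pts, name) sort over pts >= mine.
    above = min(
        (p for p in others if p["pts"] >= me["pts"]),
        key=lambda p: (p["pts"], p["name"]),
        default=None,
    )
    # First result of a descending (pts, name) sort over pts <= mine.
    below = max(
        (p for p in others if p["pts"] <= me["pts"]),
        key=lambda p: (p["pts"], p["name"]),
        default=None,
    )
    return above, below


players = [
    {"name": "Bob", "pts": 150},
    {"name": "Ann", "pts": 130},
    {"name": "John", "pts": 123},
    {"name": "Cid", "pts": 110},
    {"name": "Dee", "pts": 90},
]

above, below = neighbours(players, "John")
print(above, below)
```

As with the queries, a player tied with John on points satisfies both conditions, so ties may need an extra tie-breaking rule.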
A. Jesse Jiryu Davis's answer is fine, but I think there is a better option that uses a geo/2d index.
You could start by creating a 2d index on the pts field, then query for the N documents nearest to a given point or score. For example, if you want to fetch 10 documents with points near a score of, say, 123, you can do this:
db.players.find( { pts: { $near: [ 123, 123 ] } } ).limit(10)
The points probably need to be normalised to fit the 2d index coordinate range, but this should work.