pymongo query for case-insensitive distinct values - MongoDB

How can I get case-insensitive distinct values from a MongoDB collection? With the example below I can get the distinct values, but the comparison is case-sensitive.
Collection: location. Sample documents:
{ "_id" : ObjectId("542bc237e75e4a30c2e13b7e"),"place" : ["Hyderabad"]}
{ "_id" : ObjectId("542bc238e75e4a30c2e13b7f"),"place" : ["hyderabad"]}
Example:
from pymongo import MongoClient
client = MongoClient('mongodb://localhost:27017/')
db = client.india
collection = db.location
doc = collection.distinct("place")
print(doc)
# output: [[u'Hyderabad'], [u'hyderabad']]
But I need only one value, hyderabad, because the value is the same in both documents apart from case.

Why is place an array? I guess some of your real documents have more than one value in it? Start by unwinding the array, then use the $toLower string operator in an otherwise standard compute-distinct-values aggregation pipeline:
db.location.aggregate([
{ "$unwind" : "$place" },
{ "$group" : { "_id" : { "$toLower" : "$place" } } }
])
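The same pipeline can be sent from pymongo. The sketch below is a minimal illustration, not the answerer's code: `case_insensitive_distinct_pipeline` and `case_insensitive_distinct_local` are hypothetical helper names, and the local function only mirrors the pipeline in plain Python so the logic can be checked without a server. MongoDB documents `$toLower` as well-defined only for ASCII strings, so results may differ from Python's `str.lower()` for other characters.

```python
def case_insensitive_distinct_pipeline(field):
    """Build the $unwind + $group pipeline from the answer for an array field."""
    path = "$" + field
    return [
        {"$unwind": path},                        # one document per array element
        {"$group": {"_id": {"$toLower": path}}},  # group on the lower-cased value
    ]


def case_insensitive_distinct_local(docs, field):
    """Plain-Python equivalent of the pipeline, for checking results offline."""
    seen = set()
    for doc in docs:
        values = doc.get(field)
        if values is None:
            continue  # $unwind drops documents whose field is missing or null
        if not isinstance(values, list):
            values = [values]  # non-array values act as one-element arrays (MongoDB 3.2+)
        for value in values:
            seen.add(value.lower())
    return sorted(seen)


# With a live server (assumes pymongo and the `collection` from the question):
#   pipeline = case_insensitive_distinct_pipeline("place")
#   places = [d["_id"] for d in collection.aggregate(pipeline)]
```

Grouping on the lower-cased value means each spelling variant collapses into one group, so the group keys are exactly the case-insensitive distinct values.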

Related

How do I write a MongoDB shell query that will return the documents for all objects created after a specific date?
Documents look like this:
{
"_id" : ObjectId("59918c9014450171039b7e1f"),
"cont_id" : "59918c9014450171039b7e1d",
"systemdate" : ISODate("2017-07-25T00:09:00.567Z")
}
db.itemtable.count({"systemdate" : { $gte: ISODate("2017-07-25T00:00:00.000Z")}})
Returns - 15210
db.itemtable.count({'_id': {'$gt' : ObjectId("59918c9014450171039b7e1f")}})
Returns - 987652
Thanks!
Bharathi
db.itemtable.find({"systemdate" : { $gte: ISODate("2017-07-25T00:00:00.000Z")}}).count()
returns the number of matching documents.
If you want a cursor over those documents, use find on its own: db.itemtable.find({"systemdate" : { $gte: ISODate("2017-07-25T00:00:00.000Z")}})
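A pymongo version of the same filter, sketched under assumptions: the collection name `itemtable` comes from the question, while `created_after_filter` and `objectid_hex_for` are hypothetical helpers. The second helper illustrates the `_id`-based alternative from the question. An ObjectId's first four bytes are its creation time in seconds since the Unix epoch, so an ObjectId built from a date with the remaining bytes zeroed is less than or equal to every ObjectId generated at or after that second (pymongo's `ObjectId.from_datetime` builds the same kind of value). Note that the two counts in the question compare different things: the ObjectId used there encodes a creation time in mid-August 2017, not 2017-07-25, so the counts need not agree. In pymongo 3.7 and later, `count_documents` replaces the deprecated `count()`.

```python
from datetime import datetime, timezone


def created_after_filter(since):
    """Filter on the explicit date field, as in the answer."""
    return {"systemdate": {"$gte": since}}


def objectid_hex_for(since):
    """24-hex-digit ObjectId whose timestamp is `since` and other bytes are zero.

    Comparing `_id` against this value with $gte selects documents whose
    ObjectId was generated at or after `since` (second precision).
    """
    seconds = int(since.timestamp())
    return format(seconds, "08x") + "0" * 16


since = datetime(2017, 7, 25, tzinfo=timezone.utc)

# With pymongo (assumed installed) and a live server:
#   from bson import ObjectId
#   n = db.itemtable.count_documents(created_after_filter(since))
#   cursor = db.itemtable.find({"_id": {"$gte": ObjectId(objectid_hex_for(since))}})
```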

Using Mongodb _id field to query partially on one or more of the composite fields

I am using the _id field as a compound key for my documents, with two fields, as below:
{
"_id" : {
"timestamp" : ISODate("2016-08-25T05:43:00.000-19:30"),
"hostName" : "nj"
}
}
What I noticed is that I can only query if I use both fields together. If I use only one of them, no documents are returned.
db.getCollection('sales').find(
{
"_id" : {
"hostName" : "tryme"
}
}
)
db.getCollection('sales').find(
{
"_id" : {
"timestamp" : ISODate("2016-08-25T05:43:00.000-19:30")
}
}
)
Neither of the above queries returns any documents.
I am also unable to use the $gte/$lte operators on the date field:
db.getCollection('sales').find(
{
"_id" : {
"timestamp" : {
"$lte":ISODate("2016-08-25T04:51:00.000-19:30")
},
"hostName" : "tryme"
}
}
)
This also returns no documents.
The queries below do work, but explain() shows that they use a collection scan and the index is not used.
db.getCollection('sales').find(
{
"_id.timestamp" : ISODate("2016-08-25T04:51:00.000-19:30"),
"_id.hostName" : "tryme"
}
)
and
db.getCollection('sales').find(
{
"_id.timestamp" : {
"$gte": ISODate("2016-08-25T04:52:00.000-19:30")
},
"_id.hostName" : "tryme"
}
)
I am not sure I have understood how the _id field works.
Basically, I want to query on a subset of the composite key's fields, and also run range queries (between, greater than, less than, etc.) on the date field, while still using the index on the _id field.
Can someone please help me with this?
Thanks,
Sri
From the docs:
MongoDB uses the dot notation to access the elements of an array and
to access the fields of an embedded document.
Your first attempts don't work because passing an embedded document as the query value asks for an exact equality match on the whole embedded document (same fields, same values, same field order). Use dot notation instead.
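The dot-notation filters can be built from a partial key as in the sketch below (a minimal illustration; `id_subfield_filter` and `ID_SUBFIELD_INDEX` are hypothetical names). On the index question, which the answer does not cover: the default `_id` index indexes `_id` as one whole value, so as far as I know queries on subfields such as `"_id.hostName"` cannot use it, which matches the collection scan seen in explain(). A separate compound index on the subfields can serve them; the field order below (equality field first, range field second) is an assumption that fits the queries in the question.

```python
from datetime import datetime, timezone


def id_subfield_filter(**parts):
    """Turn a partial compound key into a dot-notation filter on _id.

    Each keyword becomes "_id.<name>"; values may be plain values or
    operator documents such as {"$gte": ..., "$lt": ...}.
    """
    return {"_id." + name: value for name, value in parts.items()}


# Keys for a compound index on the _id subfields (hypothetical choice of order).
ID_SUBFIELD_INDEX = [("_id.hostName", 1), ("_id.timestamp", 1)]

start = datetime(2016, 8, 25, 0, 0, tzinfo=timezone.utc)
end = datetime(2016, 8, 26, 0, 0, tzinfo=timezone.utc)

by_host = id_subfield_filter(hostName="tryme")
in_range = id_subfield_filter(hostName="tryme", timestamp={"$gte": start, "$lt": end})

# With pymongo (assumed) and a live server:
#   db.sales.create_index(ID_SUBFIELD_INDEX)
#   db.sales.find(in_range)
```

With this order, queries on hostName alone can use the index prefix; a query on timestamp alone would need its own index.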

How do I fetch records matching output of aggregate function in mongoDB?

I queried MongoDB with the aggregate function and got two fields in the output.
The result of my db.Collection.aggregate(...) looks like this:
{
"_id" : NumberLong(203440),
"date" : ISODate("2013-05-11T00:00:00Z")
}
{
"_id" : NumberLong(203520),
"date" :ISODate("2013-01-05T00:00:00Z")
}
{
"_id" : NumberLong(203970),
"date": ISODate("2013-01-11T00:00:00Z")
}
{
"_id" : NumberLong(203660),
"date" : ISODate("2013-01-11T00:00:00Z")
}
{
"_id" : NumberLong(203360),
"date" : ISODate("2013-01-11T00:00:00Z")
}
How do I get the documents in the collection that match both of these fields (in a single query)?
That is, if each document in my collection has the fields date, _id, x, y, z, a, b and c,
how do I fetch the documents whose date and _id equal a row of the aggregate result above?
In the aggregation command, adding the following to the $group stage helped me:
"allFields": {
"$first": "$$CURRENT"
}
In the result, each group then contains the entire (first) document in the "allFields" field.
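If you need every matching document rather than only the first document of each group, another option (not from the original answer) is to turn the aggregation output into one `$or` query. A minimal sketch; `matching_docs_filter` is a hypothetical helper and the sample rows are taken from the output above (NumberLong values become Python ints). With many result rows the filter grows large, so the `$$CURRENT` approach may be preferable.

```python
from datetime import datetime, timezone


def matching_docs_filter(agg_results):
    """Build one find() filter matching documents whose _id AND date equal
    some row of the aggregation output."""
    if not agg_results:
        return {"_id": {"$in": []}}  # $or must not be empty; this matches nothing
    return {"$or": [{"_id": row["_id"], "date": row["date"]} for row in agg_results]}


agg_results = [
    {"_id": 203440, "date": datetime(2013, 5, 11, tzinfo=timezone.utc)},
    {"_id": 203520, "date": datetime(2013, 1, 5, tzinfo=timezone.utc)},
]
query = matching_docs_filter(agg_results)

# With pymongo (assumed) and a live server:
#   docs = list(db.Collection.find(query))
```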

Mongo: querying for object inside array

I've tried using $in and $elemMatch to query for all documents whose object array contains a given member_id, but neither returns any data. Example queries:
db.events.find({"source_site":{"event_hosts":{$in:[{"member_id":12300113}]}}})
and
db.events.find({source_site:{event_hosts:{$elemMatch:{member_id:12300113}}}})
Sample data to query in Mongo:
{
"_id" : ObjectId("541890c2660a17aa1f7b7bd4"),
"source_site" : {
"event_hosts" : [
{
"member_id" : 12300113,
"member_name" : "Sal Corthen"
},
{
"member_id" : 139930702,
"member_name" : "Erin Morgen"
}
]
}
}
What am I doing wrong?
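Your queries pass an embedded document as the value of source_site, which asks for an exact match on the whole subdocument. To match a member_id inside the array directly, use dot notation: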
db.events.find({"source_site.event_hosts.member_id":12300113})
or using $in:
db.events.find({"source_site.event_hosts.member_id":{$in:[12300113]}})
or using $elemMatch:
db.events.find({"source_site.event_hosts":{$elemMatch:{"member_id":12300113}}})
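To see why the dot-notation query matches, here is a simplified plain-Python model of how a dotted path fans out over arrays. It is an illustration under assumptions (it ignores numeric array indexes, nested arrays and many other MongoDB matching rules), not MongoDB's actual matcher; `values_at_path` and `path_matches` are hypothetical names.

```python
def values_at_path(node, parts):
    """Yield every value reachable from `node` along the dotted path `parts`,
    descending into each element when an array is met along the way."""
    if not parts:
        yield node
        if isinstance(node, list):
            # an array at the end of the path also matches on its elements
            yield from node
        return
    if isinstance(node, list):
        for item in node:
            yield from values_at_path(item, parts)
    elif isinstance(node, dict) and parts[0] in node:
        yield from values_at_path(node[parts[0]], parts[1:])


def path_matches(doc, path, value):
    """Rough model of {path: value} equality matching."""
    return any(v == value for v in values_at_path(doc, path.split(".")))


event = {
    "source_site": {
        "event_hosts": [
            {"member_id": 12300113, "member_name": "Sal Corthen"},
            {"member_id": 139930702, "member_name": "Erin Morgen"},
        ]
    }
}
```

The original queries compare source_site itself with a whole subdocument, which never equals the stored one, while the dotted path reaches each member_id inside the array.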

Get "data from collection b not in collection a" in a MongoDB shell query

I have two MongoDB collections that share a common _id. Using the mongo shell, I want to find all documents in one collection that do not have a matching _id in the other collection.
Example:
> db.Test.insert({ "_id" : ObjectId("4f08a75f306b428fb9d8bb2e"), "foo" : 1 })
> db.Test.insert({ "_id" : ObjectId("4f08a766306b428fb9d8bb2f"), "foo" : 2 })
> db.Test.insert({ "_id" : ObjectId("4f08a767306b428fb9d8bb30"), "foo" : 3 })
> db.Test.insert({ "_id" : ObjectId("4f08a769306b428fb9d8bb31"), "foo" : 4 })
> db.Test.find()
{ "_id" : ObjectId("4f08a75f306b428fb9d8bb2e"), "foo" : 1 }
{ "_id" : ObjectId("4f08a766306b428fb9d8bb2f"), "foo" : 2 }
{ "_id" : ObjectId("4f08a767306b428fb9d8bb30"), "foo" : 3 }
{ "_id" : ObjectId("4f08a769306b428fb9d8bb31"), "foo" : 4 }
> db.Test2.insert({ "_id" : ObjectId("4f08a75f306b428fb9d8bb2e"), "bar" : 1 });
> db.Test2.insert({ "_id" : ObjectId("4f08a766306b428fb9d8bb2f"), "bar" : 2 });
> db.Test2.find()
{ "_id" : ObjectId("4f08a75f306b428fb9d8bb2e"), "bar" : 1 }
{ "_id" : ObjectId("4f08a766306b428fb9d8bb2f"), "bar" : 2 }
Now I want a query (or queries) that returns the two documents in Test whose _ids do not match any document in Test2:
{ "_id" : ObjectId("4f08a767306b428fb9d8bb30"), "foo" : 3 }
{ "_id" : ObjectId("4f08a769306b428fb9d8bb31"), "foo" : 4 }
I've tried various combinations of $not, $ne, $or and $in, but can't get the right combination and syntax. I also don't mind if db.Test2.find({}, {"_id": 1}) is run first and saved to a variable that is then used in a second query (though I can't get that to work either).
Update: Zachary's answer pointing to $nin answered the key part of the question. For example, this works:
> db.Test.find({"_id": {"$nin": [ObjectId("4f08a75f306b428fb9d8bb2e"), ObjectId("4f08a766306b428fb9d8bb2f")]}})
{ "_id" : ObjectId("4f08a767306b428fb9d8bb30"), "foo" : 3 }
{ "_id" : ObjectId("4f08a769306b428fb9d8bb31"), "foo" : 4 }
But (acknowledging that this does not scale, which is not an issue in this situation) I still can't combine the two queries in the shell. This is the closest I can get, which is obviously less than ideal:
vals = db.Test2.find({}, {"_id": 1}).toArray()
db.Test.find({"_id": {"$nin": [ObjectId(vals[0]._id), ObjectId(vals[1]._id)]}})
Is there a way to return just the values in the find command so that vals can be used directly as the array input to $nin?
In MongoDB 3.2 and later (the $lookup stage was added in 3.2), the following code seems to work:
db.collectionb.aggregate([
{
$lookup: {
from: "collectiona",
localField: "collectionb_fk",
foreignField: "collectiona_fk",
as: "matched_docs"
}
},
{
$match: {
"matched_docs": { $eq: [] }
}
}
]);
Based on this example: https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/#use-lookup-with-an-array
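Applied to the question's collections, the same anti-join could be sent from pymongo as sketched below. This is a sketch under assumptions: it joins `Test` to `Test2` on `_id`, and `anti_join_pipeline` and `anti_join_local` are hypothetical helpers; the local version only reproduces the expected result with a set difference so it can be checked without a server.

```python
def anti_join_pipeline(other, local_field="_id", foreign_field="_id"):
    """$lookup + $match stages keeping documents with no match in `other`."""
    return [
        {"$lookup": {
            "from": other,
            "localField": local_field,
            "foreignField": foreign_field,
            "as": "matched_docs",
        }},
        {"$match": {"matched_docs": {"$eq": []}}},  # keep only unmatched documents
    ]


def anti_join_local(docs, other_docs, key="_id"):
    """Plain-Python reference: documents whose key is absent from other_docs."""
    other_keys = {d[key] for d in other_docs}
    return [d for d in docs if d[key] not in other_keys]


# With pymongo (assumed) and a live server:
#   for doc in db.Test.aggregate(anti_join_pipeline("Test2")):
#       print(doc)
```

Appending a `{"$project": {"matched_docs": 0}}` stage would drop the always-empty `matched_docs` array from the output.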
To answer your follow-up: I'd use map().
Given this:
> b1 = {i: 1}
> db.b.save(b1)
> db.b.save({i: 2})
> db.a.save({_id: b1._id})
All you need is:
> vals = db.a.find({}, {_id: 1}).map(function(a){return a._id;})
> db.b.find({_id: {$nin: vals}})
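In pymongo the same two-step approach is a list comprehension followed by `$nin`. A minimal sketch, assuming `db` is a pymongo Database handle; `not_in_filter` is a hypothetical helper name.

```python
def not_in_filter(id_docs):
    """Collect the _id of each projected document and exclude them with $nin."""
    ids = [d["_id"] for d in id_docs]
    return {"_id": {"$nin": ids}}


# With pymongo (assumed) and a live server:
#   f = not_in_filter(db.Test2.find({}, {"_id": 1}))
#   for doc in db.Test.find(f):
#       print(doc)
```

As the other answer notes, this sends every _id from the second collection to the server in one query, so it only suits small collections.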
which returns
{ "_id" : ObjectId("4f08c60d6b5e49fa3f6b46c1"), "i" : 2 }
You will have to save the _ids from collection A so that you don't pull them again from collection B, but you can do that with $nin. See Advanced Queries for all of the MongoDB operators.
Your final query, using the example you gave, would look something like this:
db.Test.find({"_id": {"$nin": [ObjectId("4f08a75f306b428fb9d8bb2e"),
ObjectId("4f08a766306b428fb9d8bb2f")]}})
Note that this approach won't scale. If you need a solution that scales, you should be setting a flag in collections A and B indicating if the _id is in the other collection and then query off of that instead.
Update for the second part:
At the time of this answer, the second part was impossible: MongoDB did not support joins or any cross-collection querying in a single query (the $lookup aggregation stage, added in 3.2, later changed this). Querying one collection, saving the results and then querying the second was the only choice, unless you embed the data in the documents themselves, as I mentioned earlier.
I wrote a script that marks every document in the second collection that is referenced from the first collection, and then processed the unmarked documents in the second collection.
var first = db.firstCollection.aggregate([ {'$unwind':'$secondCollectionField'} ])
while (first.hasNext()) {
  var doc = first.next();
  // mark the referenced second-collection document with the first document's _id
  db.secondCollection.update( {_id:doc.secondCollectionField}, {$set:{firstCollectionField:doc._id}} );
}
// ...then process the second-collection documents that have no mark
db.secondCollection.find({"firstCollectionField":{$exists:false}})
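The marking script can also be driven from pymongo. The sketch below is an assumption-laden variant, not the author's script: `mark_referenced_ops` and `UNMARKED_FILTER` are hypothetical names, the field names come from the script above, and instead of `$unwind` it builds one `update_many` per first-collection document, using `$in` over the referenced array.

```python
def mark_referenced_ops(first_docs, ref_field="secondCollectionField",
                        mark_field="firstCollectionField"):
    """Return (filter, update) pairs that mark second-collection documents
    referenced from each first-collection document."""
    ops = []
    for doc in first_docs:
        refs = doc.get(ref_field)
        if refs is None:
            continue  # nothing referenced; mirrors $unwind dropping missing fields
        if not isinstance(refs, list):
            refs = [refs]
        ops.append(({"_id": {"$in": refs}}, {"$set": {mark_field: doc["_id"]}}))
    return ops


# Documents never marked by the updates above.
UNMARKED_FILTER = {"firstCollectionField": {"$exists": False}}

# With pymongo (assumed) and a live server:
#   for flt, upd in mark_referenced_ops(db.firstCollection.find()):
#       db.secondCollection.update_many(flt, upd)
#   unmarked = db.secondCollection.find(UNMARKED_FILTER)
```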