Casbah (Scala) Generating a query statement with nested conditions ($OR $AND) - mongodb

Given items: [{ prop1 : "a", prop2 : "b" , prop3 : "c",....},....]
and patterns:[{ prop1 : "a", prop2 : "Any"},...]
I want to create a query to find all patterns that match the given items.
The resulting query is of form:
((A or B or C) AND (D or E or F) AND (G or H or J))
or
((A or B or C) AND (D or E or F) AND (G or H or J))
or
((A or B or C) AND (D or E or F) AND (G or H or J))
....
I've tried to build it in DSL form, but I get an ambiguous implicit error on initialization (see the code below).
Can this notation be used? If not, how could I implement this with DBObject.Builder or MongoDBObject?
Thanks,
Eli
import com.mongodb.casbah.query.Imports._
/* test data */
val thing1 = Map[String,String]("thing_type" -> "PC", "os"-> "Windows", "vendor"-> "lenova")
val thing2 = Map[String,String]("thing_type" -> "Tablet", "os"-> "iOS", "vendor"-> "Apple")
val things_list = List(thing1, thing2)
/* end test data */
val atts_for_search = List("thing_type", "os", "vendor" )
var pattern_query = $or() // *causes a compilation error
things_list.foreach ( thing => {
var att_and_list = $and() // *causes a compilation error
atts_for_search.foreach ( att => {
att_and_list ++= $or(att $eq thing(att),att $exists false,att $eq "Any")
}) // foreach attribute
pattern_query ++= att_and_list
})

The $or and $and parts of the DSL require a Traversable and can't be initiated without one, hence the compile error.
If you delay creating the $and and $or parts of the query until you have built the traversable, it works:
import com.mongodb.casbah.query.Imports._
/* test data */
val thing1 = Map[String,String]("thing_type" -> "PC", "os"-> "Windows", "vendor"-> "lenova")
val thing2 = Map[String,String]("thing_type" -> "Tablet", "os"-> "iOS", "vendor"-> "Apple")
val things_list = List(thing1, thing2)
/* end test data */
val atts_for_search = List("thing_type", "os", "vendor" )
val ors = scala.collection.mutable.ListBuffer[DBObject]()
things_list.foreach ( thing => {
var ands = scala.collection.mutable.ListBuffer[DBObject]()
atts_for_search.foreach ( att => {
ands += $or(att $eq thing(att),att $exists false,att $eq "Any")
}) // foreach attribute
ors += $and(ands) // add this thing's $and clause to the outer list
})
val pattern_query = $or(ors)
This returns the following output:
{"$or": [{ "$and": [{ "$or": [ { "thing_type" : "PC"} , { "thing_type" : { "$exists" : false}} , { "thing_type" : "Any"}]} ,
{ "$or": [ { "os" : "Windows"} , { "os" : { "$exists" : false}} , { "os" : "Any"}]} ,
{ "$or": [ { "vendor" : "lenova"} , { "vendor" : { "$exists" : false}} , { "vendor" : "Any"}]}]} ,
{ "$and": [{ "$or": [ { "thing_type" : "Tablet"} , { "thing_type" : { "$exists" : false}} , { "thing_type" : "Any"}]} ,
{ "$or": [ { "os" : "iOS"} , { "os" : { "$exists" : false}} , { "os" : "Any"}]} ,
{ "$or": [ { "vendor" : "Apple"} , { "vendor" : { "$exists" : false}} , { "vendor" : "Any"}]}]}]}
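For comparison, here is the same "collect the inner clauses first, wrap them last" idea as a minimal plain-JavaScript sketch. It only builds the query document and does not use Casbah; the helper names attributeClause and buildPatternQuery are made up for illustration.

```javascript
// Test data mirroring the Scala example above.
const things = [
  { thing_type: "PC", os: "Windows", vendor: "lenova" },
  { thing_type: "Tablet", os: "iOS", vendor: "Apple" },
];
const attsForSearch = ["thing_type", "os", "vendor"];

// One $or per attribute: exact value, attribute missing, or the wildcard "Any".
// (attributeClause is a hypothetical helper, not part of any driver.)
function attributeClause(att, value) {
  return {
    $or: [{ [att]: value }, { [att]: { $exists: false } }, { [att]: "Any" }],
  };
}

// Build every $and from a finished list of clauses, then wrap all of them in $or.
function buildPatternQuery(thingsList, atts) {
  const ors = thingsList.map((thing) => ({
    $and: atts.map((att) => attributeClause(att, thing[att])),
  }));
  return { $or: ors };
}

const patternQuery = buildPatternQuery(things, attsForSearch);
console.log(JSON.stringify(patternQuery, null, 2));
```

Nothing is ever "appended" to an empty $or or $and here, which is the same reason the ListBuffer version compiles.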

Related

elastic4s geodistance sort query syntax

I am using elastic4s version 1.6.2 and need to write a query that searches for a given geo location, sorts the results by distance, and also returns the distance.
I can do this with a GET request in curl, but I am struggling to find the correct syntax and an example for geolocation sort.
case class Store(id: Int, name: String, number: Int, building: Option[String] = None, street: String, postCode: String,county: String, geoLocation: GeoLocation)
case class GeoLocation(lat: Double, lon: Double)
val createMappings = client.execute {
create index "stores" mappings (
"store" as(
"store" typed StringType,
"geoLocation" typed GeoPointType
)
)
}
def searchByGeoLocation(geoLocation: GeoLocation) = {
client.execute {
search in "stores" -> "store" postFilter {
geoDistance("geoLocation") point(geoLocation.lat, geoLocation.lon) distance(2, KILOMETERS)
}
}
}
Does anyone know how to add a sort by distance from a geo location and get the distance back?
The following curl command works as expected:
curl -XGET 'http://localhost:9200/stores/store/_search?pretty=true' -d '
{
"sort" : [
{
"_geo_distance" : {
"geoLocation" : {
"lat" : 52.0333793839746,
"lon" : -0.768937531935448
},
"order" : "asc",
"unit" : "km"
}
}
],
"query": {
"filtered" : {
"query" : {
"match_all" : {}
},
"filter" : {
"geo_distance" : {
"distance" : "20km",
"geoLocation" : {
"lat" : 52.0333793839746,
"lon" : -0.768937531935448
}
}
}
}
}
}'
I'm not an expert in elastic4s, but this query should be equivalent to your curl command:
val geoLocation = GeoLocation(52.0333793839746, -0.768937531935448)
search in "stores" -> "store" query {
filteredQuery filter {
geoDistance("geoLocation") point(geoLocation.lat, geoLocation.lon) distance(20, KILOMETERS)
}
} sort {
geoSort("geoLocation") point (geoLocation.lat, geoLocation.lon) order SortOrder.ASC
}
It prints the following query:
{
"query" : {
"filtered" : {
"query" : {
"match_all" : { }
},
"filter" : {
"geo_distance" : {
"geoLocation" : [ -0.768937531935448, 52.0333793839746 ],
"distance" : "20.0km"
}
}
}
},
"sort" : [ {
"_geo_distance" : {
"geoLocation" : [ {
"lat" : 52.0333793839746,
"lon" : -0.768937531935448
} ]
}
} ]
}
asc is the default sort order, so you can remove order SortOrder.ASC; I just wanted to be explicit in this example.

How to get last array element while Projection mongodb

I have the following document structure (these are dummy documents for illustration):
{
"id" : "p1245",
"Info" : [
{
"cloth_name" : "ABC",
"cloth_type" : "C"
},
{
"cloth_name" : "PQR",
"cloth_type" : "J"
},
{
"cloth_name" : "SAM",
"cloth_type" : "T"
}
]
},
{
"id" : "p124576",
"Info" : [
{
"cloth_name" : "HTC",
"cloth_type" : "C"
}
]
}
From these documents I want to project the "cloth_type", so I tried the following Java code:
DBObject fields = new BasicDBObject("id", 1);
fields.put("ClothType","$Info.cloth_type");
DBObject project = new BasicDBObject("$project", fields);
List<DBObject> pipeline = Arrays.asList(project);
AggregationOptions aggregationOptions = AggregationOptions.builder().batchSize(100).outputMode(AggregationOptions.OutputMode.CURSOR).allowDiskUse(true).build();
Cursor cursor = collection.aggregate(pipeline, aggregationOptions);
while (cursor.hasNext())
{
System.out.println(cursor.next());
}
(I don't want to use "$unwind" here.)
I get the following output:
{ "id" : "p1245" , "ClothType" : [ "C" , "J" , "T"]}
{ "id" : "p124576" , "ClothType" : [ "C"]}
If there are multiple "cloth_type" values for a single id, I want only the last cloth_type from the array.
For example, if the "ClothType" array is [ "C", "J", "T"], I want to project only [ "T"], i.e. the last element of the array.
Is there any way to achieve this without using "$unwind"?

nested pull with scala mongodb casbah

Let's say I have a simple object:
{
"id":"xyz",
"answers" : [{
"name" : "Yes"
}, {
"name" : "No"
}]
}
I want to remove the answer Yes from the array.
I'm trying something like this without much luck:
import com.mongodb.casbah.MongoCollection
val searchObject = MongoDBObject("id"->"xyz");
getCollection().update(searchObject,$pull( "answers" -> ( "name" -> "Yes")));
You need to declare ("name" -> "Yes") as a MongoDBObject. Look at what the plain tuple produces:
scala> $pull( "answers" -> ( "name" -> "Yes"))
res10: com.mongodb.casbah.query.Imports.DBObject = { "$pull" : { "answers" : [ "name" , "Yes"]}}
That is not what you want; you want to pull a subdocument:
scala> $pull ( "answers" -> MongoDBObject("name" -> "Yes") )
res11: com.mongodb.casbah.query.Imports.DBObject = { "$pull" : { "answers" : { "name" : "Yes"}}}
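To see why the subdocument form matters, here is a rough in-memory emulation of what { "$pull" : { "answers" : { "name" : "Yes"}}} does to the document. This is only a sketch in plain JavaScript: pullMatching is a made-up helper, and it assumes the condition is a simple equality on top-level fields of each array element, which is far less than the real operator supports.

```javascript
// Simplified emulation of {$pull: {<field>: <condition document>}}.
// Assumption: only equality on top-level fields of each element is handled.
function pullMatching(doc, field, condition) {
  const matches = (element) =>
    Object.keys(condition).every((key) => element[key] === condition[key]);
  // Keep every array element that does NOT match the condition document.
  return { ...doc, [field]: doc[field].filter((element) => !matches(element)) };
}

const original = { id: "xyz", answers: [{ name: "Yes" }, { name: "No" }] };
const updated = pullMatching(original, "answers", { name: "Yes" });
console.log(JSON.stringify(updated));
```

With the tuple version the condition is the array [ "name" , "Yes"] rather than a document, so no element of answers can match it.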

MongoDB find where key equals string from array

I am trying to find all of the documents in a collection whose given key equals one of the strings in an array.
Here's an example of the collection.
{
roomId : 'room1',
name : 'first'
},
{
roomId : 'room2',
name : 'second'
},
{
roomId : 'room3',
name : 'third'
}
And here's an example of the array to look through.
[ 'room2', 'room3' ]
What I thought would work is...
collection.find({ roomId : { $in : [ 'room2', 'room3' ]}}, function( e, r )
{
// r should return the second and third room
});
How can I achieve this?
One way this could be solved would be a for loop...
var roomIds = [ 'room2', 'room3' ];
for ( var i=0; i < roomIds.length; i++ )
{
collection.find({ roomId : roomIds[ i ]})
}
But this is not ideal...
What you posted should work - no looping required. The $in operator does the job:
> db.Room.insert({ "_id" : 1, name: 'first'});
> db.Room.insert({ "_id" : 2, name: 'second'});
> db.Room.insert({ "_id" : 3, name: 'third'});
> // test w/ int
> db.Room.find({ "_id" : { $in : [1, 2] }});
{ "_id" : 1, "name" : "first" }
{ "_id" : 2, "name" : "second" }
> // test w/ strings
> db.Room.find({ "name" : { $in : ['first', 'third'] }});
{ "_id" : 1, "name" : "first" }
{ "_id" : 3, "name" : "third" }
Isn't that what you expect?
Tested w/ MongoDB 2.1.1
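If it helps to picture what $in does, the sketch below applies the same membership test to an in-memory array in plain JavaScript. It is an illustration only: findIn is a hypothetical helper, not a driver call, and it ignores everything $in does beyond matching scalar values.

```javascript
// In-memory stand-in for the rooms collection from the question.
const rooms = [
  { roomId: "room1", name: "first" },
  { roomId: "room2", name: "second" },
  { roomId: "room3", name: "third" },
];

// Keep documents whose value for `key` is one of `values`,
// which is the scalar case of { key: { $in: values } }.
function findIn(docs, key, values) {
  return docs.filter((doc) => values.includes(doc[key]));
}

const found = findIn(rooms, "roomId", ["room2", "room3"]);
console.log(found.map((room) => room.name));
```

One query with $in replaces the whole loop of single-value finds.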

MongoDB group by Functionalities

In MySQL
select a,b,count(1) as cnt from list group by a, b having cnt > 2;
I need to run the same group by with a having condition in MongoDB, but I am getting the following error. Please share your input.
In MongoDB
> res = db.list.group({key:{a:true,b:true},
... reduce: function(obj,prev) {prev.count++;},
... initial: {count:0}}).limit(10);
Sat Jan 7 16:36:30 uncaught exception: group command failed: {
"errmsg" : "exception: group() can't handle more than 20000 unique keys",
"code" : 10043,
"ok" : 0
Once it has executed, we need to run the following on the result:
for (i in res) {if (res[i].count>2) printjson(res[i])};
Regards,
Kumaran
MongoDB's group is very limited in most cases. For instance:
- the result set is capped at a fixed number of unique keys (20000 according to the error above)
- it will not work in sharded environments
So it's better to use map-reduce. The query would look like this:
map = function() { emit({a:true,b:true},{count:1}); }
reduce = function(k, values) {
var result = {count: 0};
values.forEach(function(value) {
result.count += value.count;
});
return result;
}
and then
db.list.mapReduce(map,reduce,{out: { inline : 1}})
This is an untested version; let me know if it works.
EDIT:
The earlier map function was faulty, which is why you were not getting results. It should have been:
map = function () {
emit({a:this.a, b:this.b}, {count:1});
}
Test data:
> db.multi_group.insert({a:1,b:2})
> db.multi_group.insert({a:2,b:2})
> db.multi_group.insert({a:3,b:2})
> db.multi_group.insert({a:1,b:2})
> db.multi_group.insert({a:3,b:2})
> db.multi_group.insert({a:7,b:2})
> db.multi_group.mapReduce(map,reduce,{out: { inline : 1}})
{
"results" : [
{
"_id" : {
"a" : 1,
"b" : 2
},
"value" : {
"count" : 2
}
},
{
"_id" : {
"a" : 2,
"b" : 2
},
"value" : {
"count" : 1
}
},
{
"_id" : {
"a" : 3,
"b" : 2
},
"value" : {
"count" : 2
}
},
{
"_id" : {
"a" : 7,
"b" : 2
},
"value" : {
"count" : 1
}
}
],
"timeMillis" : 1,
"counts" : {
"input" : 6,
"emit" : 6,
"reduce" : 2,
"output" : 4
},
"ok" : 1,
}
EDIT2:
Complete solution, including applying the having condition (count >= 2):
map = function () {
emit({a:this.a, b:this.b}, {count:1,_id:this._id});
}
reduce = function(k, values) {
var result = {count: 0,_id:[]};
values.forEach(function(value) {
result.count += value.count;
result._id.push(value._id);
});
return result;
}
> db.multi_group.mapReduce(map,reduce,{out: { replace : "multi_result"}})
> db.multi_result.find({'value.count' : {$gte : 2}})
{ "_id" : { "a" : 1, "b" : 2 }, "value" : { "_id" : [ ObjectId("4f0adf2884025491024f994c"), ObjectId("4f0adf3284025491024f994f") ], "count" : 2 } }
{ "_id" : { "a" : 3, "b" : 2 }, "value" : { "_id" : [ ObjectId("4f0adf3084025491024f994e"), ObjectId("4f0adf3584025491024f9950") ], "count" : 2 } }
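As a sanity check on the logic, the same group-then-filter steps can be run in memory in plain JavaScript. This is only a sketch using the six test documents above; groupCounts is a made-up helper and nothing here talks to MongoDB.

```javascript
// The six test documents inserted into multi_group above.
const docs = [
  { a: 1, b: 2 }, { a: 2, b: 2 }, { a: 3, b: 2 },
  { a: 1, b: 2 }, { a: 3, b: 2 }, { a: 7, b: 2 },
];

// "map" emits a key per document, "reduce" sums the counts per key.
function groupCounts(documents) {
  const groups = new Map();
  for (const doc of documents) {
    const key = JSON.stringify({ a: doc.a, b: doc.b }); // composite group key
    groups.set(key, (groups.get(key) || 0) + 1);
  }
  return [...groups].map(([key, count]) => ({ _id: JSON.parse(key), count }));
}

// The "having" step is an ordinary filter over the grouped result.
const having = groupCounts(docs).filter((group) => group.count >= 2);
console.log(JSON.stringify(having));
```

The filter plays the role of the find on multi_result: grouping and the having condition are two separate passes.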
You should use MapReduce instead. Group has its limitations.
In future you'll be able to use the Aggregation Framework. But for now, use map/reduce.
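For readers on a MongoDB version that ships the Aggregation Framework, the SQL from the question could be expressed as a two-stage pipeline along these lines. Treat it as a sketch: the block below only builds and prints the pipeline so it can run without a server, and the field name cnt is taken from the SQL alias.

```javascript
// group by a, b with count(1) as cnt ... having cnt > 2
const pipeline = [
  // $group: one output document per distinct (a, b) pair, counting its members.
  { $group: { _id: { a: "$a", b: "$b" }, cnt: { $sum: 1 } } },
  // $match after $group behaves like SQL HAVING.
  { $match: { cnt: { $gt: 2 } } },
];

// In the mongo shell this would be passed as: db.list.aggregate(pipeline)
console.log(JSON.stringify(pipeline, null, 2));
```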
Depending on the number of groups, you might find a simpler and faster solution than group or MapReduce by using distinct:
var res = [];
// distinct() returns a plain array of values, so iterate it with forEach
db.list.distinct('a').forEach(function (a) {
db.list.distinct('b').forEach(function (b) {
var cnt = db.list.count({'a':a,'b':b});
if (cnt > 2)
res.push({ 'a': a, 'b' : b, 'cnt': cnt});
});
});
It will be faster if you have indexes on a and b:
db.list.ensureIndex({'a':1,'b':1})