mapReduce inline results with java mongodb driver 3.2 - mongodb

How can I get inline results from a mapReduce with the MongoDB Java driver 3.2?
With driver version 2.x I was doing:
DBCollection coll = client.getDB(dbName).getCollection(collName);
coll.mapReduce(map, reduce, null, OutputType.INLINE, query);
The new 3.x driver has two mapReduce() methods returning MapReduceIterable, which lacks a method to specify the INLINE output mode.
MongoCollection<Document> coll = client.getDatabase(dbName).getCollection(collName);
coll
    .mapReduce(map, reduce)
    .filter(query);

You can create the map-reduce command manually:
String mapFunction = ...
String reduceFunction = ...
BsonDocument command = new BsonDocument();
BsonJavaScript map = new BsonJavaScript(mapFunction);
BsonJavaScript red = new BsonJavaScript(reduceFunction);
BsonDocument query = new BsonDocument("someidentifier", new BsonString("somevalue"));
command.append("mapreduce", new BsonString("mySourceCollection"));
command.append("query", query);
command.append("map", map);
command.append("reduce", red);
command.append("out", new BsonDocument("inline", new BsonBoolean(true)));
Document result = mongoClient.getDatabase(database).runCommand(command);
I think this is extremely ugly, but it is the only working solution I found so far using 3.2. (... and would be very interested in a better variant, too... ;-))

I think I found it...
I had a deeper look into mongodb's Java driver source and it seems that the INLINE output feature is implicitly accessible:
The class MapReduceIterableImpl<TDocument, TResult> (MapReduceIterableImpl.java), which is the default implementation of the interface returned by mapReduce(),
holds a private boolean inline with initial value true.
The only place where this can ever be switched to false is the method collectionName(final String collectionName) for which the description is as follows:
Sets the collectionName for the output of the MapReduce
The default action is replace the collection if it exists, to change this use action(com.mongodb.client.model.MapReduceAction).
If you never call this method on the object instance after mapReduce(), it will remain true as initialized...meaning: if there is no output collection, it must be inline.
Later on, when you access your result with iterator(), first(), forEach(...), etc., the execute() method gets called internally, and it contains the magic if condition:
if (inline) {
MapReduceWithInlineResultsOperation<TResult> operation =
new MapReduceWithInlineResultsOperation<TResult>(namespace,
new BsonJavaScript(mapFunction),
new BsonJavaScript(reduceFunction),
codecRegistry.get(resultClass))
.filter(toBsonDocument(filter))
.limit(limit)
.maxTime(maxTimeMS, MILLISECONDS)
.jsMode(jsMode)
.scope(toBsonDocument(scope))
.sort(toBsonDocument(sort))
.verbose(verbose)
.readConcern(readConcern);
....
} else {
MapReduceToCollectionOperation operation =
new MapReduceToCollectionOperation(namespace, new BsonJavaScript(mapFunction), new BsonJavaScript(reduceFunction),
collectionName)
.filter(toBsonDocument(filter))
.limit(limit)
.maxTime(maxTimeMS, MILLISECONDS)
.jsMode(jsMode)
.scope(toBsonDocument(scope))
.sort(toBsonDocument(sort))
.verbose(verbose)
.action(action.getValue())
.nonAtomic(nonAtomic)
.sharded(sharded)
.databaseName(databaseName)
.bypassDocumentValidation(bypassDocumentValidation);
...so it instantiates MapReduceWithInlineResultsOperation when collectionName() has not been called.
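The reasoning above can be condensed into a toy model. This is only a sketch of the flag logic as described in this answer, not the driver's actual class: the name InlineFlagSketch and the method chosenOperation() are made up for illustration, and the sketch talks to no database.

```java
// Toy model of the flag logic described above; NOT the driver's real class.
class InlineFlagSketch {
    // Mirrors the private field that starts out as true.
    private boolean inline = true;
    private String collectionName;

    // Mirrors collectionName(...): the only place the flag is switched off.
    InlineFlagSketch collectionName(String name) {
        this.collectionName = name;
        this.inline = false;
        return this;
    }

    // Mirrors the branch in execute(): reports which operation would be built.
    String chosenOperation() {
        return inline
                ? "MapReduceWithInlineResultsOperation"
                : "MapReduceToCollectionOperation(" + collectionName + ")";
    }
}
```

If the real class behaves like this model, then calling mapReduce(map, reduce).filter(query) and iterating the result without ever calling collectionName() should take the inline branch.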
I had no chance to test it because my NetBeans hates me at the moment, but I think it is pretty clear.
What do you think, did I miss something?
Would be glad if I could help you shift the code to API 3.x, great project!

Applying the same Analyzer to Queries and Fields

I am trying to build a basic search for my API backend. Users pass arbitrary queries and the backend is supposed to return results (obviously). I would prefer a solution that works with a local index as well as Elasticsearch.
On my entity I defined an analyzer like this:
@AnalyzerDef(name = "ngram",
    tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
    filters = {
        @TokenFilterDef(factory = StandardFilterFactory.class),
        @TokenFilterDef(factory = LowerCaseFilterFactory.class),
        @TokenFilterDef(factory = StopFilterFactory.class),
        @TokenFilterDef(factory = NGramFilterFactory.class,
            params = {
                @Parameter(name = "minGramSize", value = "2"),
                @Parameter(name = "maxGramSize", value = "3") })
    }
)
For the query, I tried the following:
FullTextEntityManager fullTextEntityManager = Search.getFullTextEntityManager(this.entityManager);
Analyzer analyzer = fullTextEntityManager.getSearchFactory().getAnalyzer("ngram");
QueryParser queryParser = new MultiFieldQueryParser(ALL_FIELDS, analyzer);
queryParser.setDefaultOperator(QueryParser.AND_OPERATOR);
org.apache.lucene.search.Query query = queryParser.parse(queryString);
javax.persistence.Query persistenceQuery =
fullTextEntityManager.createFullTextQuery(query, MyEntity.class);
List<MyEntity> result = persistenceQuery.getResultList();
As far as I understand, I need to provide an analyzer for the query so that the search terms are "ngram-tokenized" and a match can be found. Before, I used SimpleAnalyzer and, as a result, the search only matched full words, which I think backs my theory (sorry, I am still learning this).
The above code gives me a NullPointerException:
java.lang.NullPointerException: null
at org.hibernate.search.engine.impl.ImmutableSearchFactory.getAnalyzer(ImmutableSearchFactory.java:370) ~[hibernate-search-engine-5.11.1.Final.jar:5.11.1.Final]
at org.hibernate.search.engine.impl.MutableSearchFactory.getAnalyzer(MutableSearchFactory.java:203) ~[hibernate-search-engine-5.11.1.Final.jar:5.11.1.Final]
at org.hibernate.search.impl.SearchFactoryImpl.getAnalyzer(SearchFactoryImpl.java:50) ~[hibernate-search-orm-5.11.1.Final.jar:5.11.1.Final]
in the line
Analyzer analyzer = fullTextEntityManager.getSearchFactory().getAnalyzer("ngram");
You cannot retrieve the analyzer from Hibernate Search when using the Elasticsearch integration, because in that case there is no analyzer locally: the analyzer only exists remotely, in the Elasticsearch cluster.
If you only need a subset of the query syntax, try out the "simple query string" query: it's a query that can be built using the DSL (so it will work the same with Lucene and Elasticsearch) and that provides the most common features (boolean queries, fuzziness, phrases, ...). For example:
Query luceneQuery = queryBuilder.simpleQueryString()
.onFields("name", "history", "description")
.matching("war + (peace | harmony)")
.createQuery();
The syntax is a bit different, but only because it's aiming at end users and tries to be simpler.
EDIT: If simple query strings are not an option, you can create an analyzer manually: this should work even when using the Elasticsearch integration.
org.apache.lucene.analysis.custom.CustomAnalyzer#builder() should be a good starting point. There are several examples in the javadoc of that class.
Make sure you only create the analyzer once and store it somewhere, e.g. in a static constant: creating an analyzer may be costly.

JMeter MongoDB Where Clause NOT Working with $gt $lt

I am getting the error below when I try to use greater than or less than in a MongoDB where clause. Any idea how to avoid this?
Response message: javax.script.ScriptException: groovy.lang.MissingPropertyException: No such property: gt for class: Script117
Below is my full Groovy code so far.
Everything works except the greater-than (or less-than) query filter:
import com.mongodb.DB;
import org.apache.jmeter.protocol.mongodb.config.MongoDBHolder;
import com.mongodb.MongoClient;
import com.mongodb.ServerAddress;
import com.mongodb.BasicDBObject;
import com.mongodb.DBObject;
import com.mongodb.DBCursor;
import com.mongodb.DBCollection;
import com.mongodb.*;
//======================================================================
MongoClient mongoClient = new MongoClient(new ServerAddress("${serverIP}", ${serverPort}));
DB db = mongoClient.getDB("${mongodb}");
DBCollection coll = db.getCollection("${collectionName}");
StringBuilder rs = new StringBuilder();
rs.append("Collection: \n");
BasicDBObject allQuery = new BasicDBObject();
BasicDBObject fields = new BasicDBObject();
fields.put("name", 1); //projected fields
fields.put("age", 1); //projected fields
fields.put("eyeColor", 1); //projected fields
fields.put("balance", 1); //projected fields
BasicDBObject whereQuery = new BasicDBObject();
//whereQuery.put("gender", "female");
whereQuery.put("age", new BasicDBObject("$gt", 30));
DBCursor cursor = coll.find(whereQuery, fields);
while( cursor.hasNext() ) {
DBObject obj = cursor.next();
rs.append(obj.toString());
rs.append("\n");
}
rs.toString();
//String variable at the LAST LINE of this Groovy script will be displayed in Results tab, even if no Display methods are invoked !!!
The exception in the error message, groovy.lang.MissingPropertyException, is documented as:
An exception occurred while a dynamic property dispatch fails with an
unknown property.
Note: the Missing*Exception classes are named for consistency and to avoid conflicts with JDK exceptions of the same name.
The attempt to insert "age" as the key and new BasicDBObject("$gt", 30) as the value causes the exception you get, because Groovy looks for a property called gt, which does not exist.
The put method is documented as: Sets a name/value pair in this object.
In your code you intend to append the search criteria, not to put them.
For your code, I would recommend replacing the put method with append.
This is your code:
BasicDBObject whereQuery = new BasicDBObject();
//whereQuery.put("gender", "female");
whereQuery.put("age", new BasicDBObject("$gt", 30));
Replace:
whereQuery.put("age", new BasicDBObject("$gt", 30));
with:
whereQuery.append("age", new BasicDBObject("$gt", 30));
Another suggestion you can use is the Criteria.where syntax,
which is much more readable.
def query = Query.query(Criteria.where('gender').is('female').and('age')
.gt(30))
collection.find(query, fields)
Another small issue: because your code is written in Groovy, you do not need to end each line with ;
Have you tried using single quotes so groovy doesn't think you're doing string templating?
whereQuery.put("age", new BasicDBObject('$gt', 30))
//                                      ^   ^ here
You should avoid:
Referring to JMeter variables as ${var}
Using $var structures
as they have special meaning in Groovy, so:
Replace ${serverIP} with vars.get("serverIP"), ${mongodb} with vars.get("mongodb"), etc.
Replace double quotes with single quotes, especially in the line where you use the $gt operator
(Optional) Drop the semicolons at the line ends; they are not necessary
References:
Groovy Differences with Java
How to Load Test MongoDB with JMeter

Having Difficulty Using MongoDb C# Driver's Sample()

I am trying to get some random items from the database using the Sample() method. I updated to the latest version of the driver and copied the using statements from the linked example. However, something isn't working, and I am hoping it's some simple mistake on my part. All the relevant information is in the image below:
Edit:
Greg, I read the aggregation docs here and the raw db method doc here, but I still don't get it. Last two lines are my attempts, I have never used aggregation before:
var mongoCollection = GetMongoCollection<BsonDocument>(collectionName);
long[] locationIds = new long[] { 1, 2, 3, 4, 5 };
var locationsArray = new BsonArray();
foreach (long l in locationIds) { locationsArray.Add(l); };
FilterDefinition<BsonDocument> sampleFilter = Builders<BsonDocument>.Filter.In("LocationId", locationsArray);
var findSomething = mongoCollection.Find(sampleFilter); //this works, the two attempts below don't.
var aggregateSomething = mongoCollection.Aggregate(sampleFilter).Sample(25);
var aggregateSomething2 = mongoCollection.Aggregate().Match(sampleFilter).Sample(25);
Sample is only available from an aggregation. You need to start with Aggregate, not Find. I believe it's also available in Linq.
UPDATE:
Looks like we don't have a method for it specifically. However, you can use the AppendStage method.
mongoCollection.Aggregate()
    .Match(sampleFilter)
    .AppendStage<BsonDocument>("{ $sample: { size: 3 } }");

What would be the opposite to hasFields?

I'm using logical deletes by adding a field deletedAt. If I want to get only the deleted documents, it would be something like r.table('clients').hasFields('deletedAt'). My method has a withDeleted parameter which determines whether deleted documents are excluded or not.
Finally, people in the #rethinkdb IRC channel suggested that I use the filter method, and that did the trick:
query = adapter.table(table).filter(filters)
if withDeleted
  query = query.filter (doc) ->
    return doc.hasFields 'deletedAt'
else
  query = query.filter (doc) ->
    return doc.hasFields('deletedAt').not()
query.run connection, (err, results) ->
  ...
My question is why do I have to use filter and not something like:
query = adapter.table(table).filter(filters)
query = if withDeleted then query.hasFields 'deletedAt' else query.hasFields('deletedAt').not()
...
or something like that.
Thanks in advance.
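The hasFields function can be called on both objects and sequences, but not cannot.
That is, hasFields works on a sequence as well as on a single object, whereas not() only makes sense on a single boolean value.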
This query:
query.hasFields('deletedAt')
Behaves the same as this one (on sequences of objects):
query.filter((doc) -> return doc.hasFields('deletedAt'))
However, this query:
query.hasFields('deletedAt').not()
Behaves like this:
query.filter((doc) -> return doc.hasFields('deletedAt')).not()
But that doesn't make sense. You want the not to be inside the filter, not after it. Like this:
query.filter((doc) -> return doc.hasFields('deletedAt').not())
One nice thing about RethinkDB is that, because of the way queries are built up in the host language, it's very easy to define new fluent syntax by just defining functions in your language. For example, if you wanted to have a lacksFields function you could define it in Python (sorry, I don't really know CoffeeScript) like so:
def lacks_fields(stream, *args):
    res = stream
    for arg in args:
        res = res.filter(lambda x: ~x.has_fields(arg))
    return res
Then you can use a nice fluent syntax like:
lacks_fields(stream, "foo", "bar", "buzz")
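The same idea, negating per document inside the filter rather than negating the filtered sequence, can be sketched with plain Java collections. No RethinkDB driver is involved here: LacksFieldsSketch and its lacksFields method are hypothetical names, documents are modelled as maps, and a field counts as present only when its value is non-null.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-collections analogue of lacks_fields; hypothetical names, no RethinkDB API.
class LacksFieldsSketch {
    // Keeps only the documents that have none of the given fields.
    static List<Map<String, Object>> lacksFields(List<Map<String, Object>> docs,
                                                 String... fields) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> doc : docs) {
            boolean hasAny = false;
            for (String field : fields) {
                // The negation is applied per document, i.e. "inside the filter".
                if (doc.get(field) != null) {
                    hasAny = true;
                }
            }
            if (!hasAny) {
                result.add(doc);
            }
        }
        return result;
    }
}
```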

Lucene.Net - weird behaviour in different servers

I was writing a search for one of our sites: (SITE A)
BooleanQuery booleanQuery = new BooleanQuery();
foreach (var field in fields)
{
QueryParser qp = new QueryParser(field, new StandardAnalyzer());
Query query = qp.Parse(search.ToLower() + "*");
if (field.Contains("Title")) { query.SetBoost((float)1.8); }
booleanQuery.Add(query, BooleanClause.Occur.SHOULD);
}
// CODE DIFFERENCE IS HERE
Query query2 = new TermQuery(new Term("StateProperties.IsActive", "True"));
booleanQuery.Add(query2, BooleanClause.Occur.MUST);
// END CODE DIFFERENCE
Lucene.Net.Search.TopScoreDocCollector collector = Lucene.Net.Search.TopScoreDocCollector.create(21, true);
searcher.Search(booleanQuery, collector);
hits = collector.TopDocs().scoreDocs;
This was working as expected.
Since we own a few sites, and they use the same skeleton,
I uploaded the search to another site (SITE B),
but the search stopped returning results.
After playing around a bit with the code, I managed to make it work like so (showing only the rewritten lines of code):
QueryParser qp2 = new QueryParser("StateProperties.IsActive", new StandardAnalyzer());
Query query2 = qp2.Parse("True");
booleanQuery.Add(query2, BooleanClause.Occur.MUST);
Does anyone know why this is happening?
I have checked the Lucene DLL version, and it's the same version on both sites (2.9.2.2).
Is the code I have written for SITE A wrong? Is the SITE B code wrong?
Is this my fault at all? Can the production server influence something like this?
Don't they have individual indexes on disk? If they have been indexed differently, they would also return different results. One thing that comes to mind is that case sensitivity may matter, because a TermQuery will look for an EXACT match, whereas the parser will tokenize/filter the search term according to the analyzer (and probably search for "true" instead of "True").
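To illustrate the point about exact versus analyzed matching, here is a minimal simulation in plain Java. It does not use Lucene at all: the set INDEXED_TERMS is a hypothetical stand-in for an index whose field values were lowercased at index time, and the two methods only imitate how a TermQuery and a parser with a lowercasing analyzer would treat the input.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Simulation only; class, field and method names are made up for illustration.
class TermVsParsedSketch {
    // Assumption: the indexing analyzer lowercased "True" before storing it.
    static final Set<String> INDEXED_TERMS = new HashSet<>();
    static {
        INDEXED_TERMS.add("true");
    }

    // Imitates a TermQuery: the raw term is compared as-is.
    static boolean exactTermMatch(String term) {
        return INDEXED_TERMS.contains(term);
    }

    // Imitates a parser with a lowercasing analyzer: the text is normalized first.
    static boolean analyzedMatch(String text) {
        return INDEXED_TERMS.contains(text.toLowerCase(Locale.ROOT));
    }
}
```

Under that assumption, exactTermMatch("True") misses while analyzedMatch("True") hits, which would match the behaviour seen on SITE B. If SITE A's index stored the field without analysis, the exact term "True" would be found there.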