Returning distinct values of nested objects with Spring Data from CosmosDB - spring-data

I've seen very basic CRUD operations supported by the Azure CosmosDB Spring Data project (see https://github.com/Azure/azure-sdk-for-java/tree/master/sdk/cosmos/azure-spring-data-cosmos-core), but no examples of advanced usage.
Given documents like this:
{
"someKey": "someValue",
"child": {
"someArray": ["one", "two"]
}
}
Is there a way to get the distinct values of child.someArray? This is possible via the SQL API and AsyncDocumentClient, but looking at the code, I suspect it is currently impossible via the Spring Data API. It also does not seem possible to specify custom queries using an annotation on a repository class method.
I tried something like this (simplified example), to no avail:
class Document{ Child child; }
class Child { String[] someArray; }
MyRepository {
Root findDistinctChildSomeArray();
String[] findDistinctChildSomeArray();
}
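If no query-based route is available through the repository, one fallback is to load the documents and compute the distinct values on the client. The following is only a minimal sketch under that assumption and is not part of the Spring Data Cosmos API: DistinctChildValues and distinctSomeArray are hypothetical names, and Document and Child mirror the simplified classes above. It only suits collections small enough to load into memory.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper: computes the distinct child.someArray values in memory.
class DistinctChildValues {
    static class Child {
        String[] someArray;
        Child(String... someArray) { this.someArray = someArray; }
    }

    static class Document {
        Child child;
        Document(Child child) { this.child = child; }
    }

    // Flattens every child.someArray and keeps each value once, in first-seen order.
    static Set<String> distinctSomeArray(Collection<Document> documents) {
        return documents.stream()
                .filter(d -> d.child != null && d.child.someArray != null)
                .flatMap(d -> Arrays.stream(d.child.someArray))
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    public static void main(String[] args) {
        List<Document> docs = Arrays.asList(
                new Document(new Child("one", "two")),
                new Document(new Child("two", "three")));
        System.out.println(distinctSomeArray(docs)); // [one, two, three]
    }
}
```

Documents without a child or without an array are skipped rather than treated as errors, which is a design choice of this sketch.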

Related

Mongo db Collection find returns from the first entry not from last from client side

I am using MongoDB and querying the database with some conditions, which works fine, but the results come back from the first entry to the last, whereas I want them to start from the entry most recently added to the collection:
TaggedMessages.find({taggedList:{$elemMatch:{tagName:tagObj.tagValue}}}).fetch()
Meteor uses a custom wrapped version of Mongo.Collection and Mongo.Cursor in order to support reactivity out of the box. It also abstracts the Mongo query API to make it easier to work with.
This is why the native way of accessing elements from the end is not working here.
On the server
In order to use $natural correctly with Meteor, you can use the hint property as an option (see the last property in the documentation) on the server:
const selector = {
taggedList:{ $elemMatch:{ tagName:tagObj.tagValue } }
}
const options = {
hint: { $natural : -1 }
}
TaggedMessages.find(selector, options).fetch()
Sidenote: If you ever need to access the "native" Mongo driver, you need to use rawCollection
On the client
On the client you have no real access to the Mongo driver, only to a seemingly similar API (provided by the minimongo package). There you won't have $natural available (maybe in the future), so you need to use sort with a descending order:
const selector = {
taggedList:{ $elemMatch:{ tagName:tagObj.tagValue } }
}
const options = {
sort: { createdAt: -1 }
}
TaggedMessages.find(selector, options).fetch()

CosmoDb with Robomongo cant see document id's?

Can anyone tell me why, when I use Data Explorer for Cosmos DB, I get the following:
{
"id": "d502b51a-e70a-40f1-9285-3861880b8d90",
"Version": 1,
...
}
But when I use Robomongo I get:
{
"Version" : 1,
...
}
minus the id?
Thanks
I tried to repro your scenario but it all worked correctly.
The Mongo document in Portal Data Explorer:
The Mongo document in Robo 3T:
They both have the id property.
Are you applying Projections on Robomongo / Robo 3T?
At the moment, Cosmos DB treats the SQL API and the Mongo API separately, and each has a different implementation: the SQL API uses JSON and the Mongo API uses BSON. You need to be clear about this when you create the document.
If you create the document with a BSON-based tool such as Robo 3T, you are going to get something like this:
{
"_id": {
"$oid": "5be0d98b9cdcce3c6ce0f6b8"
},
"name": "Name",
"id": "5be0d98b9cdcce3c6ce0f6b8",
...
}
Instead, if you create your document with a JSON-based tool like Data Explorer, you are going to get this:
{
"name": "Name",
"id": "6c5c05b4-dfce-32a5-0779-e30821e6c510",
...
}
As you can see, a BSON-based document needs _id, with $oid inside it, to work correctly, while a JSON-based document only requires id. So you need to add these properties when you save the document (see below) or open it with the right tool; as Matias Quaranta recommends, use Azure Storage Explorer or even Data Explorer to handle both protocols properly.
Also, if you create the document programmatically and want to use the BSON format, you need to add the $oid. For example, in .NET Core it is something like this:
public bool TryGetMemberSerializationInfo(string memberName, out BsonSerializationInfo serializationInfo)
{
    switch (memberName)
    {
        case "Id":
            serializationInfo = new BsonSerializationInfo("_id", new ObjectIdSerializer(), typeof(ObjectId));
            return true;
        case "Name":
            serializationInfo = new BsonSerializationInfo("name", new StringSerializer(), typeof(string));
            return true;
        default:
            serializationInfo = null;
            return false;
    }
}

Use allowDiskUse in criteria query with Grails and the MongoDB plugin?

In order to iterate over all the documents in a MongoDB (2.6.9) collection using Grails (2.5.0) and the MongoDB Plugin (3.0.2) I created a forEach like this:
class MyObjectService {
    def forEach(Closure func) {
        def criteria = MyObject.createCriteria()
        def ids = criteria.list { projections { id() } }
        ids.each { func(MyObject.get(it)) }
    }
}
Then I do this:
class AnalysisService {
    def myObjectService

    @Async
    def analyze() {
        MyObject.withStatelessSession {
            myObjectService.forEach { myObject ->
                doSomethingAwesome(myObject)
            }
        }
    }
}
This works great...until I hit a collection that is large (>500K documents) at which point a CommandFailureException is thrown because the size of the aggregation result is greater than 16MB.
Caused by CommandFailureException: { "serverUsed" : "foo.bar.com:27017" , "errmsg" : "exception: aggregation result exceeds maximum document size (16MB)" , "code" : 16389 , "ok" : 0.0}
In reading about this, I think that one way to handle this situation is to use the option allowDiskUse in the aggregation function that runs on the MongoDB side so that the 16MB memory limit won't apply and I can get a larger aggregation result.
How can I pass this option to my criteria query? I've been reading the docs and the Javadoc for the Grails MongoDB plugin, but I can't seem to find it. Is there another way to approach the generic problem (iterating over all members of a large collection of domain objects)?
This is not possible with the current implementation of MongoDB Grails plugin. https://github.com/grails/grails-data-mapping/blob/master/grails-datastore-gorm-mongodb/src/main/groovy/org/grails/datastore/mapping/mongo/query/MongoQuery.java#L957
If you look at the above line, then you will see that the default options are being used for building AggregationOptions instance so there is no method to provide an option.
But there is another, hackish way to do it using Groovy's metaclass. Let's do it :-)
Store the original method reference of builder() method before writing criteria in your service:
MetaMethod originalMethod = AggregationOptions.metaClass.static.getMetaMethod("builder", [] as Class[])
Then, replace the builder method to provide your implementation.
AggregationOptions.metaClass.static.builder = { ->
def builderInstance = new AggregationOptions.Builder()
builderInstance.allowDiskUse(true) // solution to your problem
return builderInstance
}
Now your service method can be called with the criteria query and should not result in the aggregation error you are getting, since we have now set the allowDiskUse property to true.
Then reset the original method so that it does not affect any other call (optional).
AggregationOptions.metaClass.static.addMetaMethod(originalMethod)
Hope this helps!
Apart from this, why are you pulling all the IDs in the forEach method and then re-fetching each instance with get()? You are wasting database queries, which will hurt performance. Also, if you follow the approach below, you don't need the changes above.
An example with the same: (UPDATED)
class MyObjectService {
    void forEach(Closure func) {
        List<MyObject> instanceList = MyObject.createCriteria().list {
            // Your criteria code
            eq("status", "ACTIVE") // an example
        }
        // Don't do any of this
        // println(instanceList)
        // println(instanceList.size())
        // *** explained below
        instanceList.each { myObjectInstance ->
            func(myObjectInstance)
        }
    }
}
(I'm not adding the code of AnalysisService since there is no change)
*** This is the main point. Whenever you write a criteria query on a domain class (without projections, and in Mongo), Grails/gmongo will not immediately fetch the records from the database after executing the criteria code unless you call methods such as toString(), size() or dump() on the result.
Now when you apply each to that instance list, you are not actually loading all instances into memory; instead you are iterating over a Mongo cursor behind the scenes. In MongoDB, a cursor pulls records from the database in batches, which is very memory safe. So you can safely call each directly on your criteria result; it will not blow up the JVM unless you call one of the methods that trigger loading all records from the database.
You can confirm this behaviour even in the code: https://github.com/grails/grails-data-mapping/blob/master/grails-datastore-gorm-mongodb/src/main/groovy/org/grails/datastore/mapping/mongo/query/MongoQuery.java#L1775
Whenever you write a criteria query without projections, you get an instance of MongoResultList, which has a method named initializeFully() that is called by toString() and some other methods. But you can see that MongoResultList implements an iterator, which in turn calls the MongoDB cursor methods to iterate over the large collection, which is, again, memory safe.
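To illustrate why iterating a cursor stays memory safe, here is a minimal sketch of an iterator that pulls rows in batches on demand. It is not the plugin's MongoResultList or the MongoDB driver's cursor; BatchedIterator, fetchBatch and batchesFetched are hypothetical names, and the (offset, limit) fetch function stands in for whatever the real cursor does behind the scenes.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.BiFunction;

// Hypothetical batched iterator: holds at most one batch in memory at a time.
class BatchedIterator<T> implements Iterator<T> {
    private final BiFunction<Integer, Integer, List<T>> fetchBatch; // (offset, limit) -> rows
    private final int batchSize;
    private int offset = 0;
    private boolean exhausted = false;
    private Iterator<T> current = Collections.emptyIterator();
    int batchesFetched = 0; // exposed only so the behaviour can be observed

    BatchedIterator(BiFunction<Integer, Integer, List<T>> fetchBatch, int batchSize) {
        this.fetchBatch = fetchBatch;
        this.batchSize = batchSize;
    }

    @Override
    public boolean hasNext() {
        // Fetch the next batch only when the current one is used up.
        while (!current.hasNext() && !exhausted) {
            List<T> batch = fetchBatch.apply(offset, batchSize);
            batchesFetched++;
            offset += batch.size();
            exhausted = batch.size() < batchSize; // a short batch means no more rows
            current = batch.iterator();
        }
        return current.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }
}
```

Calling something like size() on the full result would force every batch to be loaded at once, which is the situation the answer warns against.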
Hope this helps!

Grails projections ignoring sort order with MongoDB

How do you order the results of projections in a Grails criteria when using MongoDB?
It seems that sorting is ignored by MongoDB. The code below correctly returns a list of sorted book titles when run with the Grails default in-memory HSQLDB database. Switching over to MongoDB causes the sorting to be ignored.
BookController.groovy
class BookController {
    def library = [
        [author: "Jan", title: "HTML5"],
        [author: "Lee", title: "CSS3"],
        [author: "Sue", title: "JavaScript"]
    ]

    def titles() {
        library.each { if (!Book.findByTitle(it.title)) new Book(it).save() }
        def ids = Book.createCriteria().list() {
            projections { id() }
            order "title"
        }
        def titles = ids.collect { Book.get(it).title }
        render titles as JSON
    }
}
Result with default DB (correct):
["CSS3","HTML5","JavaScript"]
Result with MongoDB (wrong):
["HTML5","CSS3","JavaScript"]
Note that the above book example is just some trivial code to illustrate the problem. The real goal is to generate a list of domain IDs sorted by a field of the domain so that the domain can be iterated over in the desired order.
The actual domain I'm dealing with is too large to fit in memory. In other words, this would crash the application: Book.list().title.sort()
Below is additional background information.
Book.groovy
class Book {
String title
String author
static mapWith = "mongo"
}
BuildConfig.groovy
...
compile ":mongodb:1.3.1"
...
DataSource.groovy
...
grails {
mongo {
host = "localhost"
port = 27017
databaseName = "book-store"
}
}
The projections support has been rewritten to use the MongoDB aggregation framework in version 3.0 of the plugin, so the example here should work in 3.0 with or without ordering. See https://jira.grails.org/browse/GPMONGODB-305
Relevant commit https://github.com/grails/grails-data-mapping/commit/1d1155d8a9e29a25413effce081c21a36629137d

MongoDB DBRef handling inside GWT's RequestFactory function call

I have a question related to DBRef in MongoDB. Imagine this scenario:
Group{
...
"members" : [
{
"$ref" : "User",
"$id" : ObjectId("505857a4e4b5541060863061")
},
{
"$ref" : "User",
"$id" : ObjectId("50586411e4b0b31012363208")
},
{
"$ref" : "User",
"$id" : ObjectId("50574b9ce4b0b3106023305c")
},
]
...
}
So the given Group document has 3 User DBRefs. In the Java class Group, members is annotated with Morphia's @Reference:
public class Group {
    ...
    @Reference
    List<User> members;
    ...
}
Question: When calling the RequestFactory function getGroup().with("members"), will RequestFactory get all members in only 1 DB access?
Or will RequestFactory make 3 DB accesses, one for each DBRef in the Group document, in the scenario given above?
Thank you very much in advance.
RequestFactory itself doesn't access the DB. What it'll do here is:
call getMembers(), as it was requested by the client through .with("members")
for each entity proxy seen (either in the request or in the response), call its locator's isLive method or, if it has no Locator, call the entity's findXxx with its getId() (and check whether null is returned).
The first step depends entirely on Morphia's implementation:
because you didn't set lazy = true on your @Reference, it won't matter whether RequestFactory calls getMembers() or not; the members will always be loaded.
in any case (either eager or lazy fetching), it will lead to 4 Mongo queries (one to get the Group and another 3 for the members; I don't think Morphia tries to optimize the number of queries to only make 1 query to get all 3 members at a time)
The second step however depends entirely on your code.
Remember that RequestFactory wants you to have a single instance of an entity per HTTP request. As I understand it, Morphia has an EntityCache doing just that, but I suspect it could be bypassed by some methods/queries.
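The query count mentioned above (one lookup per referenced member, versus a single lookup for all of them) can be sketched with a counting in-memory store. This is an illustration only and does not use Morphia's API: CountingUserStore, findById and findByIds are hypothetical names, and the single batched lookup is an assumption about what an optimized fetch could look like. Fetching the Group itself adds one more query in either case.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a user collection that counts lookups.
class CountingUserStore {
    private final Map<String, String> usersById = new HashMap<>();
    int queryCount = 0;

    void put(String id, String name) {
        usersById.put(id, name);
    }

    // One "query" per id, like resolving each reference on its own.
    String findById(String id) {
        queryCount++;
        return usersById.get(id);
    }

    // One "query" for all ids, like a single batched lookup.
    List<String> findByIds(Collection<String> ids) {
        queryCount++;
        List<String> result = new ArrayList<>();
        for (String id : ids) {
            String name = usersById.get(id);
            if (name != null) {
                result.add(name);
            }
        }
        return result;
    }
}
```

With three references, calling findById for each one raises queryCount by 3, while a single findByIds call raises it by 1.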