How do I disable GridFS MD5 calculation in Spring Boot? - mongodb

Now that the md5 attribute of GridFS files collection is obsolete, drivers are not required to compute it, so I'd like to disable it to spare a few milliseconds maybe...
The MongoDB Java driver does provide an option disableMD5 in GridFSBucketImpl, but since I'm using Spring Boot's GridFsTemplate (spring-data-mongodb 2.1.2.RELEASE) I don't have direct access to it. GridFsTemplate has a method getGridFs() that returns a GridFSBucket configured for the current database and bucket name, but unfortunately this method is private so I can't override it.
So what are my options? Do I have to override all of GridFsTemplate? Did I miss a simple setting somewhere? Should I submit a feature request to Spring?
Update
Obviously GridFsTemplate is not meant to be extended (though all it would take is getGridFs and a couple fields to be protected) so I ended up creating my own CustomGridFsTemplate, which is an almost exact copy of GridFsTemplate except that I call GridFSBucket.withDisableMD5(true) in getGridFs.
I'm not very happy with that, but it works and I don't see a better option for now.
Update 2
I have submitted a Spring feature request, please vote for it! https://jira.spring.io/browse/DATAMONGO-2165

There's currently no better way. Looks like you filed a ticket to extend GridFsTemplate to allow the customizations.

Related

Spring data mongodb federation attempts -- how can I get interface methods to use a custom configured mongotemplate?

In my application, I need to be able to connect to any number of mongodb hosts, and any number of databases in any of those hosts to support at least this basic level of query federation. This is specified by configuration, so, for any given installation of our app, I cannot know ahead of time how many collections I will need to access. I based my attempt on configuration that I saw in this Baeldung article with some modifications to suit my requirements. My configuration looks something like this yaml:
datasources:
  - name: source1
    uri: mongodb://user1:pass1@127.0.0.1:27017
    fq_collection: db1.coll1
  - name: source2
    uri: mongodb://user1:pass1@192.168.0.100:27017
    fq_collection: db2.coll2
And, depending on the installation, there could be any number of datasources entries. So, in my @Configuration class, I can iterate through these entries, which are injected via configuration properties. But I want to create a MongoTemplate that I can set up for each of these, since I cannot rely on the default MongoTemplate. The solution that I have attempted is to create a repository interface, and then to create a custom impl that will accept the configured MongoTemplate. When I use this code to create each Repository instance with its template:
// MongoRepository is parameterized as <entity type, id type>.
public MongoRepository<Item, String> mongoCustomRepositories(MongoTemplate template) {
    // The custom fragment gets the template configured for this datasource.
    MyCustomMongoRepository customImpl = new MyCustomMongoRepositoryImpl(template);
    MongoRepositoryFactory repositoryFactory = new MongoRepositoryFactory(template);
    return repositoryFactory.getRepository(MyMongoRepository.class, customImpl);
}
And I call it from a @Bean method that returns the list of all of these repositories created from the config entries, so that I can inject the repositories into service classes.
UPDATE/EDIT: OK, I set the MongoDB profiling level to 2 in order to log the queries. It turns out that the queries are in fact being sent to MongoDB, but the collection names are not being set for the model. I can't believe that I forgot about this, but I did, so it was using the lower camel case model class name as the collection name, which guarantees that no documents are retrieved. I have default collection names, but the specific collection names are set in the configuration, as the example YAML shows. I have a couple of ideas, but if anyone has a suggestion about how to set these dynamically, that would help a lot.
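One possible direction, as a minimal sketch only: split each fq_collection value into its database and collection parts, and then pass the collection name explicitly to the MongoTemplate overloads that accept one (for example find(query, Item.class, collectionName)) instead of relying on the name derived from the model class. The DataSourceEntry class and its method names below are hypothetical and exist only for illustration; the three fields mirror the YAML entries above.

```java
import java.util.Objects;

/** Hypothetical holder for one "datasources" entry from the YAML configuration. */
final class DataSourceEntry {
    private final String name;
    private final String uri;
    private final String fqCollection;
    private final int firstDot;

    DataSourceEntry(String name, String uri, String fqCollection) {
        this.name = Objects.requireNonNull(name, "name");
        this.uri = Objects.requireNonNull(uri, "uri");
        this.fqCollection = Objects.requireNonNull(fqCollection, "fqCollection");
        // Database names cannot contain '.', collection names can,
        // so everything after the first dot is the collection name.
        this.firstDot = fqCollection.indexOf('.');
        if (firstDot <= 0 || firstDot == fqCollection.length() - 1) {
            throw new IllegalArgumentException(
                "Expected <database>.<collection> but got: " + fqCollection);
        }
    }

    String name() { return name; }

    String uri() { return uri; }

    /** Part before the first dot, e.g. "db1" for "db1.coll1". */
    String database() { return fqCollection.substring(0, firstDot); }

    /** Part after the first dot, e.g. "coll1" for "db1.coll1". */
    String collection() { return fqCollection.substring(firstDot + 1); }
}
```

With something like this, the custom repository implementation could keep the collection() value next to its MongoTemplate and hand it to every template call it makes.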
EDIT 2: I did a bunch of work and I have it almost working. You can see my work in my repo. However, in doing this, I uncovered a bug in spring-data-mongodb, and I filed an issue.

REST new ID with DDD Aggregate

This question seemed foolish to me at first sight, but then I realized that I don't have a proper answer yet, and interestingly I didn't find a good explanation of it in my searches either.
I'm new to Domain Driven Design concepts, so, even if the question is basic, feel free to add any considerations to it.
I'm designing a REST API to configure Server Instances, and I came up with an Aggregate called Instance that contains a List of Configurations. Only one specific Configuration will be active at a given time.
To add a Configuration, one would call the endpoint POST /instances/{id}/configurations with the desired configuration in the body. In response, if all is okay, the caller would receive an HTTP 204 with a Location header containing the new Configuration ID.
I'm planning to have only one Controller, InstanceController, which would call InstanceService, which in turn would manipulate the Instance Aggregate and then store it in the Repo.
Since the IDs are generated by the repository, if I call Instance.addConfiguration and then InstanceRepository.store, how would I get the ID of the newly created configuration? I mean, it's a List, so it's not as trivial as calling Instance.configuration.identity.
One option would be to implement a method in Instance such as getLastAddedConfiguration, but this seems really brittle.
What is the general approach in this situation?
the IDs are generated by the repository
You could remove this extra complexity. Since Configuration is an entity of the Instance aggregate, its Id only needs to be unique inside the aggregate, not across the whole application. Therefore, the easiest approach is for the Aggregate to assign the ConfigurationId in the Instance.addConfiguration method (the aggregate can easily ensure the uniqueness of the new Id). This method can return the new ConfigurationId (or the whole object with the Id if necessary).
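A minimal sketch of that idea, assuming a simple per-aggregate counter as the Id scheme. The counter, the String settings payload and the accessor names are illustrative assumptions, not something taken from the question:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Entity owned by the Instance aggregate. */
final class Configuration {
    private final int id;          // unique only within the owning Instance
    private final String settings; // hypothetical payload

    Configuration(int id, String settings) {
        this.id = id;
        this.settings = settings;
    }

    int id() { return id; }

    String settings() { return settings; }
}

/** Aggregate root: the only place where Configuration ids are handed out. */
final class Instance {
    private final List<Configuration> configurations = new ArrayList<>();
    // Assumed to be persisted together with the aggregate so ids are never reused.
    private int nextConfigurationId = 1;

    /** Adds a configuration and returns the id the aggregate assigned to it. */
    int addConfiguration(String settings) {
        int id = nextConfigurationId++;
        configurations.add(new Configuration(id, settings));
        return id;
    }

    List<Configuration> configurations() {
        return Collections.unmodifiableList(configurations);
    }
}
```

The service can then call addConfiguration, store the aggregate, and hand the returned id to the controller for the Location header, without reading anything back from the repository.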
What is the general approach in this situation?
I'm not sure about the general approach, but in my opinion, the sooner you create the Ids the better. For Aggregates, you'd create the Id before storing the aggregate (maybe a GUID); for entities, the Aggregate can create the Id at the moment it creates or adds the entity. This allows you to perform other actions (e.g. publishing an event) using these Ids without having to store the objects and read the Ids back from the DB, which would otherwise shape how you implement and use your repositories, and that is not ideal.

Is it possible to obtain the ServicePath when querying entities in Orion?

The question I'd like to ask was raised some time ago (FIWARE Orion: How to retrieve the servicePath of an entity?) but, as far as I can see, it never got a final answer.
In short, I'd like to retrieve the service path of entities when I run a GET query against /v2/entities that returns multiple results.
In our FIWARE instance, we strongly rely on the servicePath element to differentiate between entities with the same id. It is not a good design choice but, unfortunately, we cannot change it as many applications use that id convention at the moment.
There was an attempt three years ago to add a virtual field 'servicePath' to the query result (https://github.com/telefonicaid/fiware-orion/pull/2880) but the pull request was discarded because it didn't include test coverage for that feature and the final NGSIv2 spec didn't include that field.
Is there any plan to implement such a feature in the future? I guess the answer is no, which brings me to the next question: is there any other way to do it that does not involve creating subscriptions? (We found that the initial notification of a subscription does give you that info, but the notification is limited to 1000 results, which is too low for the number of entities we want to retrieve, and it does not allow pagination either.)
Thanks in advance for your responses.
A possible workaround is to use an attribute (provided by the context producer application) to keep the service path. In a way, this is the same idea as the builtin attribute proposed in PR #2880.
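As a rough sketch of that workaround, the context producer could copy the value it sends in the Fiware-ServicePath header into an ordinary attribute when it creates the entity. The attribute name servicePath, the base URL and the helper names below are assumptions for illustration, and the JSON is built by hand on the assumption that the values need no escaping. The request is only built here, not sent:

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class ServicePathAttributeExample {

    /** NGSIv2 entity body with the service path duplicated as a regular attribute. */
    static String entityJson(String id, String type, String servicePath) {
        // Assumes id, type and servicePath contain no characters that need JSON escaping.
        return "{"
            + "\"id\":\"" + id + "\","
            + "\"type\":\"" + type + "\","
            + "\"servicePath\":{\"type\":\"Text\",\"value\":\"" + servicePath + "\"}"
            + "}";
    }

    /** Builds a POST /v2/entities request carrying the same path in header and attribute. */
    static HttpRequest createEntityRequest(String orionBaseUrl, String id,
                                           String type, String servicePath) {
        return HttpRequest.newBuilder()
            .uri(URI.create(orionBaseUrl + "/v2/entities"))
            .header("Content-Type", "application/json")
            .header("Fiware-ServicePath", servicePath)
            .POST(HttpRequest.BodyPublishers.ofString(entityJson(id, type, servicePath)))
            .build();
    }
}
```

Entities created this way carry the path in every GET /v2/entities result like any other attribute, so normal pagination applies. The producer is responsible for keeping the attribute and the header in sync.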

TransactionalCacheManager and how to get TransactionalCache and clear

Is it possible to get the specific TransactionalCache for a transaction and invoke clear on it? I'm working with a Spring context, but it seems that the transactional cache manager lives outside of it.
It is not possible now (the current version is 3.4.6).
TransactionalCache is a private field of the TransactionalCacheManager, which is in turn a private field of the CachingExecutor.
The cache is only cleared during queries and updates, when the mapper configuration (flushCache attribute and query type) instructs MyBatis to do so.

Marklogic REST API search for latest document version

We need to restrict a MarkLogic search to the latest version of managed documents, using MarkLogic's REST API. We're using MarkLogic 6.
Using straight XQuery, you can use dls:documents-query() as an additional-query option (see "Is there any way to restrict marklogic search on specific version of the document").
But the REST api requires XML, not arbitrary xquery. You can turn ordinary cts queries into XML easily enough (execute <some-element>{cts:word-query("hello world")}</some-element> in QConsole).
If I try that with dls:documents-query() I get this:
<cts:properties-query xmlns:cts="http://marklogic.com/cts">
  <cts:registered-query>
    <cts:id>17524193535823153377</cts:id>
  </cts:registered-query>
</cts:properties-query>
Apart from being less than totally transparent... how safe is that number? We'll need to put it in our query options, so it's not something we can regenerate every time we need it. I've looked on two different installations here and the number's the same, but is it guaranteed to be the same, and will it ever change? On, for example, a MarkLogic upgrade?
Also, assuming the number is safe, will the registered-query always be there? The documentation says that registered queries may be cleared by the system at various times, but it's talking about user-defined registered queries, and I'm not sure how much of that applies to internal queries.
Is this even the right approach? If we can't do this we can always set up collections and restrict the search that way, but we'd rather use dls:documents-query if possible.
The number is a registered query id, and is deterministic. That is, it will be the same every time the query is registered. That behavior has been invariant across a couple of major releases, but is not guaranteed. And as you already know, the server can unregister a query at any time. If that happens, any query using that id will throw an XDMP-UNREGISTERED error. So it's best to regenerate the query when you need it, perhaps by calling dls:documents-query again. It's safest to do this in the same request as the subsequent search.
So I'd suggest extending the REST API with your own version of the search endpoint. Your new endpoint could add dls:documents-query to the input query. That way the registered query would be generated in the same request with the subsequent search. For ML6, http://docs.marklogic.com/6.0/guide/rest-dev/extensions explains how to do this.
The call to dls:documents-query() makes sure the query is actually registered (on the fly if necessary), but that won't work from the REST API. You could extend the REST API with a custom extension as suggested by Mike, but you could also use the following:
cts:properties-query(
  cts:and-not-query(
    cts:element-value-query(
      xs:QName("dls:latest"),
      "true",
      (),
      0
    ),
    cts:element-query(
      xs:QName("dls:version-id"),
      cts:and-query(())
    )
  )
)
That is the query that is registered by dls:documents-query(). It might not be future-proof though, so check it at each upgrade. You can find the definition of the function in /Modules/MarkLogic/dls.xqy
HTH!