Getting Spring Data MongoDB to recreate indexes - spring-data

I'd like to be able to purge the database of all data between integration test executions. My first thought was to use an org.springframework.test.context.support.AbstractTestExecutionListener
registered using the @TestExecutionListeners annotation to perform the necessary cleanup between tests.
In the afterTestMethod(TestContext testContext) method I tried getting the database from the test context and calling com.mongodb.DB.drop(). This worked OK, apart from the fact that it also destroys the indexes that Spring Data created automatically when it first bound my managed @Document objects.
For now I have fixed this by resorting to iterating through the collection names and calling remove as follows:
for (String collectionName : database.getCollectionNames()) {
    if (collectionIsNotASystemCollection(collectionName)) {
        // Clear the documents but keep the collection and its indexes intact
        database.getCollection(collectionName).remove(new BasicDBObject());
    }
}
This works and achieves the desired result, but it would be nice if there were a way I could simply drop the database and ask Spring Data to "rebind" and perform the same initialisation it did at startup to create all of the necessary indexes. That feels a bit cleaner and safer.
I tried playing around with the org.springframework.data.mongodb.core.mapping.MongoMappingContext but haven't yet managed to work out if there is a way to do what I want.
Can anyone offer any guidance?

See this ticket for an explanation of why it currently works the way it does and why working around this issue creates more problems than it solves.
Suppose you were working with Hibernate and triggered a call to delete the database: would you dream of assuming that the tables and all indexes magically reappear? If you drop a MongoDB database/collection, you remove all metadata associated with it. Thus, you need to set it up again the way you'd like it to work.
P.S.: I am not sure we did ourselves a favor by adding automatic index creation, as it of course raises the expectations you now have :). Feel free to comment on the ticket if you have suggestions for how this could be achieved without the downsides I outlined in my initial comment.
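If you do want to re-run the index setup yourself after dropping the database, a rough sketch like the following may help. It assumes a Spring Data MongoDB version that publicly exposes MongoPersistentEntityIndexResolver and IndexOperations (the resolver's constructor has varied across releases), and the IndexRebuilder class name is just for illustration:

// Sketch only: re-creates the @Document-driven indexes after a DB.drop().
// The resolver constructor and available types depend on your Spring Data
// MongoDB version; treat this as an outline rather than the supported API.
import org.springframework.data.mongodb.core.MongoTemplate;
import org.springframework.data.mongodb.core.index.IndexDefinition;
import org.springframework.data.mongodb.core.index.IndexOperations;
import org.springframework.data.mongodb.core.index.MongoPersistentEntityIndexResolver;
import org.springframework.data.mongodb.core.mapping.Document;
import org.springframework.data.mongodb.core.mapping.MongoMappingContext;
import org.springframework.data.mongodb.core.mapping.MongoPersistentEntity;

public final class IndexRebuilder {

    public static void recreateIndexes(MongoTemplate mongoTemplate, MongoMappingContext mappingContext) {
        MongoPersistentEntityIndexResolver resolver =
                new MongoPersistentEntityIndexResolver(mappingContext);

        for (MongoPersistentEntity<?> entity : mappingContext.getPersistentEntities()) {
            // Only rebuild indexes for classes Spring Data manages as documents.
            if (!entity.getType().isAnnotationPresent(Document.class)) {
                continue;
            }
            IndexOperations indexOps = mongoTemplate.indexOps(entity.getType());
            for (IndexDefinition index : resolver.resolveIndexFor(entity.getTypeInformation())) {
                indexOps.ensureIndex(index); // same effect as the startup binding
            }
        }
    }
}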

Related

Automatically updating related rows with DBIx::Class

In the context of a REST API, I've been using DBIx::Class to create related rows, e.g.:
POST /artist
{ "name":"Bob Marley", "cds":[{"title":"Exodus"}] }
That ultimately calls $artist->new($data)->insert(), which creates an Artist AND creates the related row(s) in the CD table. It then sends the resulting object back to the user (via DBIC::ResultClass::HashRefInflator), including the created primary keys and default values. The problem arises when the user makes changes to those objects and sends them back to the API again:
POST /artist/7
{ "name":"Robert Nesta Marley", "artistid":"7",
"cds":[{"title":"Exodus", "cdid":"1", "artistid":"7", "year":"1977"}] }
Now what? From what I can see from testing, DBIC::Row::update doesn't deal with changes in related rows, so in this case the name change would work, but the update to the CD's year would not. DBIC::ResultSet::update_or_create just calls DBIC::Row::update. So I went searching for alternatives, and they do seem to exist, e.g. DBIC::ResultSet::RecursiveUpdate, but it hasn't been updated in 4 years, and references to it seem to suggest it should/would be folded into DBIC. Did that happen?
Am I missing something easier?
Obviously, I can handle this particular situation, but I have many APIs to write and it would be easier to handle it generically for all of them. I'm certainly tempted to use RecursiveUpdate, but I'm wary due to its apparent abandonment.
Ribasushi promised to move the RecursiveUpdate feature into core if someone steps up, migrates its test suite to the DBIC test schema, and comes up with a sane API, because he didn't like how RU currently handles this.
Catalyst::Controller::DBIC::API uses RU under the hood, as does HTML::FormHandler::Model::DBIC; both have been in production here for years without any issues.
So currently RecursiveUpdate is the way to go.

DocPad and MongoDB

I'm looking for a way to use DocPad with MongoDB. Tried to google about that, but didn't find anything encouraging. Could you please give at least some hints?
Some parts of what I'm developing need to be persisted.
Thanks in advance.
Starting from version 6.55, released last month, DocPad creates a persistent database file in the root of the project called .docpad.db:
https://github.com/bevry/docpad/blob/master/HISTORY.md
I guess it's a first step towards the persistence behaviour you need; the documents could be parsed and inserted into a Mongo database because, behind the scenes, DocPad uses QueryEngine, which has an API similar to Mongo's:
https://github.com/bevry/query-engine
More work is on the way regarding your concern. Have a look at this discussion about the future architecture of DocPad, especially the importer/exporter decoupling:
https://github.com/bevry/docpad/issues/705
I've written some code that reads from MongoDB and returns an object that can be rendered into documents. I've also tried to write some code to provide the backend for basic editing of the database, but the regeneration after an update is not yet working (although it may be by the time you read this!). See https://github.com/simonh1000/docpad-plugin-mongo

q.setOrder(...) extremely slow in DataNucleus + MongoDB

I have the latest stable DataNucleus (3.0.1) with the MongoDB datastore and the JDO implementation.
The collection has around 1M documents.
The "id" field is indexed.
This code takes several minutes to execute:
Query q = pm.newQuery(CellMeasurement.class);
q.setOrdering("id descending");
q.setRange(0, count);
Collection<CellMeasurement> result = (Collection<CellMeasurement>)q.execute();
If I remove the q.setOrdering(...), everything is fine; for count=1000 it takes around a second to load.
It looks like DN does the reordering in memory. Does that make sense? MongoDB itself orders instantly by this indexed field, and the API supports ordering.
Any ideas? Thanks.
Looking in the log (for any application) would obviously reveal much; in this case, which query is actually performed. It would tell you easily enough that ordering is not currently implemented to execute in-datastore.
Obviously anybody can contribute to a codebase that has been open source since its first release. The compileOrdering method of org.datanucleus.store.mongodb.query.QueryToMongoDBMapper is where you would need to implement that; attach a patch to a JIRA issue when you're done. Thanks.
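As a stop-gap until in-datastore ordering is implemented, one option is to bypass JDO for this particular query and let MongoDB sort on the indexed field itself. A minimal sketch using the plain MongoDB Java driver; the collection name CELLMEASUREMENT and the field name "id" are assumptions about how CellMeasurement is mapped:

import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCursor;
import com.mongodb.DBObject;

public final class CellMeasurementQueries {

    // Returns a cursor over the newest `count` documents, sorted server-side
    // on the indexed "id" field (-1 = descending), so no in-memory reordering.
    public static DBCursor latestByIdDescending(DB database, int count) {
        DBObject sortById = new BasicDBObject("id", -1);
        return database.getCollection("CELLMEASUREMENT") // assumed collection name
                       .find()
                       .sort(sortById)
                       .limit(count);
    }
}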

Mapping to legacy MongoDB store

I'm attempting to write up a Yesod app as a replacement for a Ruby JSON service that uses MongoDB on the backend and I'm running into some snags.
The sql=foobar syntax in the models file does not seem to affect which collection Persistent.MongoDB uses. How can I change that?
Is there a way to easily configure MongoDB (preferably through the YAML file) to be explicitly read-only? I'd take more comfort deploying this knowing that there was no possible way the app could overwrite or damage production data.
Is there any way I can get Persistent.MongoDB to ignore fields it doesn't know about? This service only needs a fraction of the fields in the collection in question. In order to keep the code as simple as possible, I'd really like to map only the fields I care about and have Yesod ignore everything else. Instead, it complains that the fields don't match.
How does one go about defining instances for models, such as ToJSON? I'd like to customize how that JSON gets rendered, but I get the following error:
Handler/ProductStat.hs:8:10:
Illegal instance declaration for ToJSON Product'
(All instance types must be of the form (T t1 ... tn)
where T is not a synonym.
Use -XTypeSynonymInstances if you want to disable this.)
In the instance declaration for ToJSON Product'
1) It seems that sql= is not hooked up to Mongo. Since the SQL backends already do this, it shouldn't be difficult to add for Mongo.
2) You can change the function that runs the queries.
In persistent/persistent-mongoDB/Database/Persist there is a runPool function of PersistConfig. That gets used in yesod-defaults. We should probably change the loadConfig function to check a readOnly setting.
3) I am OK with changing the reorder function to allow ignoring unknown fields, although in the future (if MongoDB returns everything in order) that may have performance implications, so ideally you would list the ignored columns.
4) This shouldn't require changes to Persistent. Did you try turning on TypeSynonymInstances?
I have several other Yesod/Persistent priorities to attend to before these changes; please roll up your sleeves and let me know what help you need making them. I can change 2 & 3 myself fairly soon if you are committed to testing them.

Solr and custom update handler

I have a question about Solr and the possibility of implementing a customized update handler.
Basically, the scenario is this:
FIELD-A : my main field
FIELD-B and FIELD-C : 2 copyfield with source in A
After FIELD-A has its value stored, I need this value to be copied into FIELD-B and FIELD-C, then processed (let's say a substring is extracted) and stored in FIELD-B and FIELD-C before indexing time. I'm not using DIH.
Edit: I'm pushing my data via Nutch (forgot to mention that).
As far as I've understood, copyField triggers after indexing (but I'm not so sure about this).
I've already read through the wiki page, and I still don't understand a lot of things:
1) Is a custom update processor an alternative to my conditionalcopyfield, or do both have to exist in my Solr?
2) After creating my conditionalcopyfield JAR file, how do I declare it in my schema?
3) How do I have to modify my solrconfig.xml to use my updater?
4) If I'm going about this the wrong way, any suggestion is appreciated, preferably with some examples or well-documented links.
I've read a lot (Googling and the Lucene mailing list on Nabble), but there's not much documentation about this. I just need to create a custom updater for my two copy fields.
Thanks all in advance!
It's not really complicated. The following is an excellent link I came across for writing a custom Solr update handler.
http://knackforge.com/blog/selvam/integrating-solr-and-mahout-classifier
I tested it in my Solr and it works just fine!
If you are using Solr 4, or are planning to, http://wiki.apache.org/solr/ScriptUpdateProcessor could be an easier solution. Have fun!
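For reference, a custom update processor along these lines is essentially a small Java class plugged into the update chain. A hedged sketch, where the class name, the FIELD-A/B/C names and the substring rule are placeholders, and the chain still has to be declared in solrconfig.xml and referenced from the update handler Nutch posts to:

// Copies a processed value of FIELD-A into FIELD-B and FIELD-C at index time,
// before the document is written. Field names and the substring rule are
// illustrative only.
import java.io.IOException;

import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class ConditionalCopyFieldProcessorFactory extends UpdateRequestProcessorFactory {

    @Override
    public UpdateRequestProcessor getInstance(SolrQueryRequest req,
                                              SolrQueryResponse rsp,
                                              UpdateRequestProcessor next) {
        return new ConditionalCopyFieldProcessor(next);
    }

    static class ConditionalCopyFieldProcessor extends UpdateRequestProcessor {

        ConditionalCopyFieldProcessor(UpdateRequestProcessor next) {
            super(next);
        }

        @Override
        public void processAdd(AddUpdateCommand cmd) throws IOException {
            SolrInputDocument doc = cmd.getSolrInputDocument();
            Object raw = doc.getFieldValue("FIELD-A");
            if (raw != null) {
                String value = raw.toString();
                // Example processing step: keep only the first 10 characters.
                String processed = value.substring(0, Math.min(10, value.length()));
                doc.setField("FIELD-B", processed);
                doc.setField("FIELD-C", processed);
            }
            super.processAdd(cmd); // hand the document on to the rest of the chain
        }
    }
}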