DocPad and MongoDB

I'm looking for a way to use DocPad with MongoDB. I tried googling for it, but didn't find anything encouraging. Could you please give me at least some hints?
Some parts of what I'm developing need to be persisted.
Thanks in advance.

Starting from version 6.55, released last month, DocPad creates a persistent db file in the root of the project called .docpad.db:
https://github.com/bevry/docpad/blob/master/HISTORY.md
I guess it's a first step towards the persistent behaviour you need; the documents could be parsed and inserted into a Mongo database, because behind the scenes DocPad uses QueryEngine, which has an API similar to Mongo:
https://github.com/bevry/query-engine
More work is on the way regarding your concern. Have a look at this discussion that deals with the future architecture of DocPad, especially the importer/exporter decoupling:
https://github.com/bevry/docpad/issues/705

I've written some code that reads from MongoDB and returns an object that can be rendered into docs. I've also tried to write some code to provide the backend for basic editing of the database, but the regeneration after update is not yet working (although it may be by the time you read this!). See https://github.com/simonh1000/docpad-plugin-mongo

Related

Getting Spring Data MongoDB to recreate indexes

I'd like to be able to purge the database of all data between integration test executions. My first thought was to use an org.springframework.test.context.support.AbstractTestExecutionListener registered using the @TestExecutionListeners annotation to perform the necessary cleanup between tests.
In the afterTestMethod(TestContext testContext) method I tried getting the database from the test context and using the com.mongodb.DB.drop() method. This worked OK, apart from the fact that it also destroys the indexes that were automatically created by Spring Data when it first bound my managed @Document objects.
For now I have fixed this by resorting to iterating through the collection names and calling remove as follows:
for (String collectionName : database.getCollectionNames()) {
    if (collectionIsNotASystemCollection(collectionName)) {
        database.getCollection(collectionName).remove(new BasicDBObject());
    }
}
This works and achieves the desired result, but it'd be nice if there were a way I could simply drop the database and just ask Spring Data to "rebind" and perform the same initialisation it did at startup to create all of the necessary indexes. That feels a bit cleaner and safer...
I tried playing around with the org.springframework.data.mongodb.core.mapping.MongoMappingContext but haven't yet managed to work out if there is a way to do what I want.
Can anyone offer any guidance?
See this ticket for an explanation of why it currently works the way it does and why working around this issue creates more problems than it solves.
Suppose you were working with Hibernate and then triggered a call to delete the database; would you even dream of assuming that the tables and all indexes would magically reappear? If you drop a MongoDB database/collection, you remove all metadata associated with it. Thus, you need to set it up the way you'd like it to work.
P.S.: I am not sure we did ourselves a favor by adding automatic indexing support, as it of course triggers the expectations you now have :). Feel free to comment on the ticket if you have suggestions for how this could be achieved without the downsides I outlined in my initial comment.
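If you do go the manual route, one option is to re-declare the indexes yourself right after the drop. The following is only a sketch, assuming Spring Data MongoDB's MongoTemplate is available to the test and using an illustrative Widget document in place of your own @Document classes; depending on your Spring Data version, Index.on(...) takes either an Order or a Sort.Direction.

import org.springframework.data.domain.Sort;
import org.springframework.data.mongodb.core.MongoTemplate;
import org.springframework.data.mongodb.core.index.Index;
import org.springframework.data.mongodb.core.mapping.Document;

// Illustrative document type standing in for your own mapped classes.
@Document(collection = "widgets")
class Widget {
    String name;
}

public class IndexRecreator {

    private final MongoTemplate mongoTemplate;

    public IndexRecreator(MongoTemplate mongoTemplate) {
        this.mongoTemplate = mongoTemplate;
    }

    // Re-declare each index that the mapping annotations would normally create at startup.
    public void recreateIndexes() {
        mongoTemplate.indexOps(Widget.class)
                .ensureIndex(new Index().on("name", Sort.Direction.ASC));
    }
}

Calling something like recreateIndexes() from afterTestMethod would restore the indexes the drop removed, at the cost of keeping the index definitions in two places.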

MongoDB: why do we need getSisterDB?

While playing with the MongoDB console help, I found the db.getSisterDB() method.
I am curious what the purpose of this method is. Looking through the MongoDB documentation and a quick Google search did not yield satisfactory results.
Typing db.getSisterDb.help generates an error, while typing db.getSisterDB gives the following definition of this method:
function (name) {
    return this.getMongo().getDB(name);
}
which suggests that this is just a wrapper around getDB. My guess is that it is used to access databases in a replica set, but I would like to hear from someone who can give me a more thorough explanation.
In the shell, db is a reference to the current database. If you want to query against a different DB in the same mongod instance, the way to get a proper reference to it is to use this method (which has a more gender-neutral alias, getSiblingDB).
If you wanted to use the longer syntax, you could: db.getMongo().getDB(name) gets you the same thing as db.getSiblingDB(name) or db.getSisterDB(name), but the former is longer to type.
All of the above work the same way in standalone mongod as well as replica sets (and sharded clusters).
I'm going to add to the accepted answer because I did not find what I wanted as a first result.
getSiblingDB exists for scripting, where the use helper is not available.
getSiblingDB is the newer of the two identical methods, so use it; getSisterDB no longer appears in the documentation.
When used in the shell, getSiblingDB serves the purpose of getting a database handle without changing the db variable.

q.setOrder(...) extremely slow in DataNucleus + MongoDB

I have the latest stable DataNucleus (3.0.1) with the MongoDB datastore and JDO implementation.
The collection has around 1M documents.
The "id" field is indexed.
This code takes several minutes to execute:
Query q = pm.newQuery(CellMeasurement.class);
q.setOrdering("id descending");
q.setRange(0, count);
Collection<CellMeasurement> result = (Collection<CellMeasurement>)q.execute();
If I remove the q.setOrdering(...), everything is OK; for count=1000 it takes around a second to load.
It looks like DataNucleus does the reordering in memory; does that make any sense? MongoDB itself orders instantly by this indexed field, and the API supports ordering.
Any ideas? Thanks.
Looking in the log (for any application) would obviously reveal much; in this case, what query is actually performed. It would tell you easily enough that ordering is not currently implemented to execute in-datastore.
Obviously anybody can contribute to such a codebase, which has been open source since its first release. The org.datanucleus.store.mongodb.query.QueryToMongoDBMapper method "compileOrdering" is where you would need to implement that; then attach a patch to a JIRA issue when you're done. Thx
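Until that lands, a possible workaround is to bypass JDO for this particular query and let MongoDB perform the indexed sort itself. Below is a minimal sketch using the plain MongoDB Java driver (2.x API); the database and collection names ("measurements" and "cellmeasurement") are assumptions, so substitute whatever DataNucleus actually maps CellMeasurement to.

import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.Mongo;

public class DirectSortExample {
    public static void main(String[] args) throws Exception {
        // Connect with the plain 2.x MongoDB Java driver.
        Mongo mongo = new Mongo("localhost", 27017);
        DB db = mongo.getDB("measurements");                            // assumed database name
        DBCollection collection = db.getCollection("cellmeasurement");  // assumed collection name

        int count = 1000;
        // Sort on the indexed "id" field in the datastore instead of in memory.
        DBCursor cursor = collection.find()
                .sort(new BasicDBObject("id", -1))  // -1 = descending
                .limit(count);
        try {
            while (cursor.hasNext()) {
                System.out.println(cursor.next());
            }
        } finally {
            cursor.close();
        }
    }
}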

Solr and custom update handler

I have a question about Solr and the possibility of implementing a customized update handler.
Basically, the scenario is this:
FIELD-A: my main field
FIELD-B and FIELD-C: 2 copyFields with their source in FIELD-A
After FIELD-A has its value stored, I need this value to be copied into FIELD-B and FIELD-C, then processed (let's say extract a substring) and stored in FIELD-B and FIELD-C before indexing time. I'm not using DIH.
Edit: I'm pushing my data via Nutch (forgot to mention that).
As far as I've understood, copyFields trigger after indexing (but I'm not so sure about this).
I've already read through the wiki page and still I don't understand a lot of things:
1) Is a custom update processor an alternative to a conditional copyField, or do both have to exist in my Solr?
2) After creating my conditional copyField jar file, how do I declare it in my schema?
3) How do I have to modify my solrconfig.xml to use my updater?
4) If I'm choosing the wrong way, any suggestion is appreciated, better still if some examples or well-documented links are provided.
I've read a lot (googling and the Lucene mailing list on Nabble) but there's not much documentation about this. I just need to create a custom updater for my two copyFields.
Thanks all in advance!
It's not really complicated. The following is an excellent link I came across for writing a custom Solr update handler:
http://knackforge.com/blog/selvam/integrating-solr-and-mahout-classifier
I tested it in my Solr and it works just fine!
If you are using Solr 4 or planning to use it, http://wiki.apache.org/solr/ScriptUpdateProcessor could be an easier solution. Have fun!
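If you decide to write the processor in Java instead (questions 1-3), a rough sketch of a custom UpdateRequestProcessorFactory is below. The class name, field names, and substring logic are placeholders; you would still need to build it into a jar on Solr's classpath, declare the factory inside an updateRequestProcessorChain in solrconfig.xml, and point the update handler that Nutch posts to at that chain.

import java.io.IOException;

import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

// Hypothetical processor: copies a substring of FIELD-A into FIELD-B and FIELD-C
// before the document reaches the index. Field names are placeholders.
public class SubstringCopyProcessorFactory extends UpdateRequestProcessorFactory {

    @Override
    public UpdateRequestProcessor getInstance(SolrQueryRequest req,
                                              SolrQueryResponse rsp,
                                              UpdateRequestProcessor next) {
        return new UpdateRequestProcessor(next) {
            @Override
            public void processAdd(AddUpdateCommand cmd) throws IOException {
                SolrInputDocument doc = cmd.getSolrInputDocument();
                Object raw = doc.getFieldValue("FIELD-A");
                if (raw != null) {
                    String value = raw.toString();
                    // Example transformation: keep at most the first 10 characters.
                    String shortened = value.substring(0, Math.min(10, value.length()));
                    doc.setField("FIELD-B", shortened);
                    doc.setField("FIELD-C", shortened);
                }
                super.processAdd(cmd); // hand the document on to the rest of the chain
            }
        };
    }
}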

Using several database setups with Lucene.Net

Hi
I am developing a search function for a web application with Lucene.Net and NHibernate.Search. The application is used by a lot of companies but runs as a single service, using a different database for each company. Therefore I would need an index directory for each database rather than one directory for the entire application. Is there a way to achieve this in Lucene.Net?
I have also considered storing the indexes for each company in their respective databases but haven't found any satisfying components for this. I have read about Compass and JdbcDirectory for Java, but I need something for C# or NHibernate. Does anyone know if there is a port of JdbcDirectory or something similar for C#?
Hmm, it looks like you can't change anything at the session factory level using plain NHibernate.Search. You may need separate instances of the configuration, or maybe try something along the lines of Fluent NHibernate Search to ease the pain.
Piecing it together from the project's wiki, it appears you could do something like this to spin up separate session factories pointing to different databases / index directories:
Fluently.Configure()
    .Database(SQLiteConfiguration.Standard.InMemory())
    .Search(s => s.DefaultAnalyzer().Standard()
        .DirectoryProvider().FSDirectory()
        .IndexBase("~/Index")
        .IndexingStrategy().Event()
        .MappingClass<LibrarySearchMapping>())
    .BuildConfiguration()
    .BuildSessionFactory();
The "IndexBase" property and the connection are the parts you'll need to define per customer. Once you get the session factories set up you could resolve them using whatever strategy you use currently.
Hope it helps