Solr and custom update handler - plugins

I have a question about Solr and the possibility of implementing a customized update handler.
Basically, the scenario is this:
FIELD-A: my main field
FIELD-B and FIELD-C: two copyFields with their source in FIELD-A
After FIELD-A receives its value, I need that value to be copied to FIELD-B and FIELD-C, then processed (let's say a substring is extracted) and stored in FIELD-B and FIELD-C before indexing time. I'm not using DIH.
Edit: I'm pushing my data via Nutch (forgot to mention that).
As far as I've understood, copyField triggers after indexing (but I'm not so sure about this).
I've already read through the wiki page and I still don't understand a lot of things:
1) Is a custom UpdateProcessor an alternative to a conditional copyField, or do both have to exist in my Solr?
2) After creating my conditional copyField jar file, how do I declare it in my schema?
3) How do I have to modify my solrconfig.xml to use my updater?
4) If I'm choosing the wrong way, any suggestion is appreciated, ideally with some examples or well-documented links.
I've read a lot (Googling and the Lucene mailing list on Nabble), but there's not much documentation about this. I just need to create a custom updater for my two copyFields.
Thanks all in advance!

It's not really complicated. The following is an excellent link I came across on writing a custom Solr update handler:
http://knackforge.com/blog/selvam/integrating-solr-and-mahout-classifier
I tested it in my Solr and it works just fine!
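For the substring scenario described in the question, a custom UpdateRequestProcessor (rather than a full update handler) is usually all that's needed. Below is a minimal sketch of such a processor; the package, class names and the "first 10 characters" rule are made up for illustration, and the field names are the ones from the question:

package com.example.solr; // illustrative package name

import java.io.IOException;

import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class SubstringCopyProcessorFactory extends UpdateRequestProcessorFactory {

    @Override
    public UpdateRequestProcessor getInstance(SolrQueryRequest req,
                                              SolrQueryResponse rsp,
                                              UpdateRequestProcessor next) {
        return new SubstringCopyProcessor(next);
    }

    static class SubstringCopyProcessor extends UpdateRequestProcessor {

        SubstringCopyProcessor(UpdateRequestProcessor next) {
            super(next);
        }

        @Override
        public void processAdd(AddUpdateCommand cmd) throws IOException {
            SolrInputDocument doc = cmd.getSolrInputDocument();
            Object value = doc.getFieldValue("FIELD-A");
            if (value != null) {
                String text = value.toString();
                // example transformation: keep only the first 10 characters
                String part = text.length() > 10 ? text.substring(0, 10) : text;
                doc.setField("FIELD-B", part);
                doc.setField("FIELD-C", part);
            }
            // hand the (modified) document on to the rest of the chain for indexing
            super.processAdd(cmd);
        }
    }
}

Because the processor runs before the document is indexed, the copyField declarations for FIELD-B and FIELD-C are not needed if the processor fills those fields itself.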

If you are using Solr 4 or planning to use it, http://wiki.apache.org/solr/ScriptUpdateProcessor could be an easier solution. Have fun!
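Regarding questions 2 and 3 above: an update processor is not declared in schema.xml at all; it is registered in solrconfig.xml as part of an update request processor chain, and that chain is referenced from the update handler. A rough sketch (the chain name and factory class are the illustrative ones from the sketch above; the same wiring applies if you use the script processor instead):

<updateRequestProcessorChain name="substring-copy">
  <processor class="com.example.solr.SubstringCopyProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">substring-copy</str>
  </lst>
</requestHandler>

The exact handler class and parameter name vary between Solr versions, so check the update processor wiki page for the release you are on.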

Related

How to inject (dynamic?) Parameters in Tableau CustomSQL

I am currently trying to solve the following issue in Tableau:
In the end, I would like to have a Tableau dashboard where the user can select a Customer and then see that Customer's KPIs. Nothing spectacular so far.
To obtain a Customer's KPIs, there is a CustomSQL query with a parameter "CustomerName" (that returns the KPIs for that Customer).
Now the catch:
I don't want a hardcoded list of CustomerNames, as would be possible with Tableau parameters. Instead, the CustomerNames should be fetched from another data source. I did not find a way to "link" a parameter to a data source, and/or to inject anything other than static parameters into CustomSQL.
My question: is there really no solution for this, or am I just doing something wrong (I hope so)?
I found this workaround here https://www.interworks.com/de/blog/daustin/2015/12/17/dynamic-parameters-tableau that seems to work, but that looks like... a workaround.
Some background info:
I have to stick to using CustomSQL because:
It is not viable for me to calculate all KPIs for all CustomerNames and then filter in Tableau, since the amount of data is too big.
It is not viable to replace the CustomSQL with Tableau calculations and filters (already tried that; it ended up with Tableau pulling too much data instead of pushing the work to the database).
I cannot believe that Tableau does not offer a solution here, since I believe the use case is pretty common.
Do you have some input for me?
Thank you for your help in advance!
Kind Regards
Have you tried using the RAWSQL() functions together with stored functions on the database side? I found them pretty useful when I needed to load a single value from a dataset completely unrelated to the currently used data source.
For example, to run a stored function foo that accepts two dates and calculates the sum of something, the syntax would be something like:
RAWSQL_INT("your_db_schema.foo(%1, %2)", [startDateFieldTableau], [endDateFieldTableau])
Or you can access the database directly:
RAWSQL_INT("select sum(bar) from sales")
but this is a bit risky.
Drawbacks:
it relies on the current connection (you are creating a calculated field, after all)
it will not work with an extract (but you are using CustomSQL anyway, so I believe you are more into a live connection).

Getting Spring Data MongoDB to recreate indexes

I'd like to be able to purge the database of all data between integration test executions. My first thought was to use an org.springframework.test.context.support.AbstractTestExecutionListener
registered using the @TestExecutionListeners annotation to perform the necessary cleanup between tests.
In the afterTestMethod(TestContext testContext) method I tried getting the database from the test context and using the com.mongodb.DB.drop() method. This worked OK, apart from the fact that it also destroys the indexes that were automatically created by Spring Data when it first bound my managed @Document objects.
For now I have fixed this by resorting to iterating through the collection names and calling remove as follows:
for (String collectionName : database.getCollectionNames()) {
    if (collectionIsNotASystemCollection(collectionName)) {
        database.getCollection(collectionName).remove(new BasicDBObject());
    }
}
This works and achieves the desired result - but it'd be nice if there was a way I could simply drop the database and just ask Spring Data to "rebind" and perform the same initialisation that it did when it started up to create all of the necessary indexes. That feels a bit cleaner and safer...
I tried playing around with the org.springframework.data.mongodb.core.mapping.MongoMappingContext but haven't yet managed to work out if there is a way to do what I want.
Can anyone offer any guidance?
See this ticket for an explanation of why it currently works the way it does and why working around this issue creates more problems than it solves.
Suppose you were working with Hibernate and triggered a call to delete the database: would you even dream of assuming that the tables and all indexes would reappear magically? If you drop a MongoDB database/collection, you remove all metadata associated with it. Thus, you need to set it up again the way you'd like it to work.
P.S.: I am not sure we did ourselves a favor by adding automatic indexing support, as this of course triggers the expectations that you now have :). Feel free to comment on the ticket if you have suggestions for how this could be achieved without the downsides I outlined in my initial comment.
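For completeness, wiring the collection-clearing approach from the question into a test execution listener might look roughly like this. It is only a sketch against the older driver API used in the question (MongoTemplate.getDb() returning com.mongodb.DB), and the class name is made up; because collections are emptied rather than dropped, the indexes Spring Data created at startup stay in place:

import com.mongodb.BasicDBObject;
import com.mongodb.DB;

import org.springframework.data.mongodb.core.MongoTemplate;
import org.springframework.test.context.TestContext;
import org.springframework.test.context.support.AbstractTestExecutionListener;

public class MongoCleanupTestExecutionListener extends AbstractTestExecutionListener {

    @Override
    public void afterTestMethod(TestContext testContext) throws Exception {
        MongoTemplate mongoTemplate =
                testContext.getApplicationContext().getBean(MongoTemplate.class);
        DB database = mongoTemplate.getDb();

        for (String collectionName : database.getCollectionNames()) {
            // skip system.indexes, system.users, etc. so index metadata survives
            if (!collectionName.startsWith("system.")) {
                // remove all documents but keep the collection and its indexes
                database.getCollection(collectionName).remove(new BasicDBObject());
            }
        }
    }
}

The listener would then be registered on the test class via the @TestExecutionListeners annotation, alongside the default listeners.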

DocPad and MongoDB

I'm looking for a way to use DocPad with MongoDB. I tried googling about it, but didn't find anything encouraging. Could you please give me at least some hints?
Some parts of what I'm developing need to be persisted.
Thanks in advance.
Starting from version 6.55, released last month, DocPad creates a persistent db file in the root of the project called .docpad.db:
https://github.com/bevry/docpad/blob/master/HISTORY.md
I guess it's a first step towards the persistent behaviour you need; the documents may be parsed and inserted into a Mongo database, because behind the scenes DocPad uses QueryEngine, which has an API similar to Mongo:
https://github.com/bevry/query-engine
More work is on the way regarding your concern. Have a look at this discussion, which deals with the future architecture of DocPad, especially the importer/exporter decoupling:
https://github.com/bevry/docpad/issues/705
I've written some code that reads from MongoDB and returns an object that can be rendered into docs. I've also tried to write some code to provide the backend for basic editing of the database, but the regeneration after an update is not yet working (although it may be by the time you read this!). See https://github.com/simonh1000/docpad-plugin-mongo

Magento get language collection

There seems to be a way in Magento to get a language collection, namely via Mage::getSingleton('adminhtml/system_config_source_language'), which I would like to use. However, it results in an error in my version of Magento (both Enterprise 1.10 and Community 1.4), because it expects to get its data from a nonexistent table called core_language.
Has anyone found a good solution or alternative to this? Or maybe have used this and has a table dump for core_language?
Magento is built on Zend, so you can use
Zend_Locale::getTranslationList("language")
which returns an array of strings keyed by their abbreviation.
Hmm, I looked through the installation files and apparently the table was created initially but dropped as of version 0.7.5, so it's probably deprecated code. The class file doesn't mention this though, so it's quite obscure.

MongoDB: What's a good way to get a list of all unique tags?

What's the best way to keep track of unique tags for a collection of documents millions of items large? The normal way of doing tagging seems to be indexing multikeys. I will frequently need to get all the unique keys, though. I don't have access to MongoDB's new "distinct" command, either, since my driver, erlmongo, doesn't seem to implement it yet.
Even if your driver doesn't implement distinct, you can implement it yourself. In JavaScript (sorry, I don't know Erlang, but it should translate pretty directly) you can say:
result = db.$cmd.findOne({"distinct" : "collection_name", "key" : "tags"})
So, that is: you do a findOne on the "$cmd" collection of whatever database you're using. Pass it the collection name and the key you want to run distinct on.
If you ever need a command your driver doesn't provide a helper for, you can look at http://www.mongodb.org/display/DOCS/List+of+Database+Commands for a somewhat complete list of database commands.
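For comparison, the same "run the command yourself" idea with the MongoDB Java driver might look roughly like this (a sketch; the database name is a placeholder and the collection/field names are the ones from the JavaScript example above):

import java.util.List;

import com.mongodb.BasicDBObject;
import com.mongodb.CommandResult;
import com.mongodb.DB;
import com.mongodb.MongoClient;

public class DistinctTags {
    public static void main(String[] args) {
        DB db = new MongoClient().getDB("mydb"); // placeholder database name

        // issue the "distinct" command directly instead of relying on a driver helper
        CommandResult result = db.command(
                new BasicDBObject("distinct", "collection_name").append("key", "tags"));

        // the unique tag values come back under the "values" key of the result document
        List<?> tags = (List<?>) result.get("values");
        System.out.println(tags);
    }
}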
I know this is an old question, but I had the same issue and could not find a real solution in PHP for it.
So I came up with this:
http://snipplr.com/view/59334/list-of-keys-used-in-mongodb-collection/
John, you may find it useful to use Variety, an open source tool for analyzing a collection's schema: https://github.com/jamescropcho/variety
Perhaps you could run Variety every N hours in the background, and query the newly-created varietyResults database to retrieve a listing of unique keys which begin with a given string (i.e. are descendants of a specific parent).
Let me know if you have any questions, or need additional advice.
Good luck!