SOLR/Lucene: How would one go about extending the Scorer class such that it could then be wrapped in a Solr plugin?

The reason I am asking is that extending the Similarity class or using a function query is not enough for me. I plan to personalize user queries in terms of the users' preferences with respect to document fields. I need to update the score of the documents after the text-based scoring has been computed, using these preferences (which would have been cached by the Solr plugin). Any thoughts?

I'd write a custom function query; it fits your requirement of modifying the calculated score with a custom algorithm.
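For a rough idea of what that could look like, here is a minimal sketch of such a function-query plugin (against the Solr 5/6-era ValueSourceParser API), assuming the preferences are cached as a map from field value to boost. UserPrefValueSourceParser, UserPrefValueSource and PREFERENCES are illustrative names for this sketch, not an existing Solr API:

import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.queries.function.FunctionValues;
import org.apache.lucene.queries.function.ValueSource;
import org.apache.lucene.queries.function.docvalues.DoubleDocValues;
import org.apache.solr.search.FunctionQParser;
import org.apache.solr.search.SyntaxError;
import org.apache.solr.search.ValueSourceParser;

/**
 * Sketch: userPref(field) yields, per document, the cached preference
 * weight for that document's value of `field`.
 */
public class UserPrefValueSourceParser extends ValueSourceParser {

  // Hypothetical stand-in for the cache the question mentions
  // (field value -> boost); populate it from wherever preferences live.
  static final Map<String, Double> PREFERENCES = new ConcurrentHashMap<>();

  @Override
  public ValueSource parse(FunctionQParser fp) throws SyntaxError {
    return new UserPrefValueSource(fp.parseValueSource());
  }

  static class UserPrefValueSource extends ValueSource {
    final ValueSource field;

    UserPrefValueSource(ValueSource field) { this.field = field; }

    @Override
    public FunctionValues getValues(Map context, LeafReaderContext readerContext)
        throws IOException {
      final FunctionValues vals = field.getValues(context, readerContext);
      return new DoubleDocValues(this) {
        @Override
        public double doubleVal(int doc) {
          // Cached preference for this document's field value;
          // 1.0 means "no preference" and leaves the score unchanged.
          return PREFERENCES.getOrDefault(vals.strVal(doc), 1.0);
        }
      };
    }

    @Override
    public boolean equals(Object o) {
      return o instanceof UserPrefValueSource
          && field.equals(((UserPrefValueSource) o).field);
    }

    @Override
    public int hashCode() { return field.hashCode(); }

    @Override
    public String description() { return "userPref(" + field.description() + ")"; }
  }
}

You would register it in solrconfig.xml with <valueSourceParser name="userPref" class="com.example.UserPrefValueSourceParser"/> and multiply it into the text-based score, e.g. q={!boost b=userPref(category)}title:laptop.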

Are both client-side and server-side column filters possible in ag-Grid?

I want to know if we can do combination filtering in ag-Grid: some column filtering on the client and some on the server. Is that possible?
I was checking the Adaptable Tools website; they have built a similar feature with serverOptions (link below). I was trying to achieve something similar via the ag-Grid API. Can you please advise?
https://api.adaptabletools.com/interfaces/_src_adaptableoptions_searchoptions_.searchoptions.html
An update on this question, as I developed AdapTable, which the OP refers to in her question.
We DO enable and facilitate server-side searching, sorting and filtering while keeping ag-Grid in ClientSideRowModel mode, and many of our users take huge advantage of it.
You can learn more at:
https://docs.adaptabletools.com/docs/key-topics/server-functionality
However, note that this is suited to the use case where you have a few hundred thousand rows and want the best of both worlds; if you have millions of rows of data that need searching and filtering, then you should use ag-Grid's Server-Side or Infinite Row Models (both of which AdapTable fully supports, but in different ways from that mentioned in the OP).
Well, the Client-Side Row Model is the default: the grid loads all of the data in one go and can then perform filtering, sorting, grouping, pivoting and aggregation entirely in memory.
The Server-Side Row Model builds on the Infinite Row Model. In addition to lazy-loading the data as the user scrolls down, it also allows lazy-loading of grouped data with server-side grouping and aggregation. Advanced users will use Server-Side Row Model to do ad-hoc slice and dice of data with server-side aggregations.
Ideally, the developer should choose one of them. Also, ag-Grid doesn't provide any method to set the rowModelType programmatically.
So the simple answer is no, it can't be done easily.
But I think you can work around it by creating a second, hidden ag-Grid instance with rowModelType = 'clientSide'. Update the data in this second grid whenever the data changes in the first grid, switch between the grids (using hide/show logic) when the user wants to filter on the client side (you could provide a radio button for that), and copy the filter state, column state and other settings from the second grid to the first.

Dynamic iFind index without creating a class Index of %iFind.Index.Basic

I am trying to make a general purpose text search feature with %iFind.Index.Basic.
According to the iFind Search Tool documentation, an iFind Index must be created in a Class as below:
Class Aviation.TestSQLSrch Extends %Persistent [...]
{
...
Index NarrBasicIdx On (Narrative) As %iFind.Index.Basic(INDEXOPTION=0,
    LANGUAGE="en", LOWER=1);
...
}
But this applies to a single field in one class only.
If the iFind search feature needs to be used generally, then a lot of string fields have to be indexed, which is memory-consuming and impractical.
Is there any way to do iFind indexing dynamically, on demand, without the need to alter the class, while still being able to query with ##Class(%ResultSet)?
The documentation also mentions indexing a JSON object, but no example is given. Is this something I should explore further?
With iFind, you first need to create an index and build it before executing any query (nothing dynamic here, since it is index-based).
If you want something more generic, maybe you should look at some other text analytics options, such as NLP (Natural Language Processing).

Deploy Knowledge Studio dictionary pre-annotator to Natural Language Understanding

I'm getting started with Knowledge Studio and Natural Language Understanding.
I'm able to deploy a machine-learning model to Natural Language Understanding and use the API to query it.
I would like to know if there's a way to deploy only the pre-annotator.
I read from Knowledge Studio's documentation that
You can deploy or export a machine-learning annotator. A dictionary pre-annotator can only be used to pre-annotate documents within Watson Knowledge Studio.
Does exist a workaround to create a model that simply does the job of the pre-annotator, i.e. use dictionaries to find entities instead of the machine-learning model?
Does exist a workaround to create a model that simply does the job of the pre-annotator, i.e. use dictionaries to find entities instead of the machine-learning model?
You may need to explain better what you need.
WKS allows you to pre-annotate documents with dictionaries you upload. Once you have created an ML model, you can alternatively use that to annotate your training documents and then manually correct the results. As you continue, the amount of manual work will decrease after each model iteration.
The assumption is that you are creating a model with a reasonable number of examples. In your model results, you will want the mentions/relations to be outside, or close to outside, the gray area of the report.
The other interpretation of your request I took was you want to create a dictionary based model only. This is possible using the "Rule-Based Model" functionality. You would have to create the parsing rules but you just map what you want to find to the dictionary/rule.
Using this in production though is still limited. You should get a warning when you deploy these kinds of models.
It's slightly better than just a keyword search as you can map items to parts of speech.
One last point: the purpose of WKS is to create a machine-learning model which will do the work of discovering new terms you haven't seen before. The rule-based engine can only find what you explicitly tell it to find.
If all you want is just dictionary entries, then you can create a very simple string comparison solution, but you lose the linguistic features.
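To illustrate that last point, here is a minimal sketch (in Java, with made-up names) of such a string-comparison solution; it only finds literal dictionary terms, which is exactly what losing the linguistic features means:

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Minimal dictionary "annotator": finds exact dictionary terms in text. */
public class DictionaryAnnotator {

    private final Map<String, String> termToType; // e.g. "aspirin" -> "DRUG"

    public DictionaryAnnotator(Map<String, String> termToType) {
        this.termToType = termToType;
    }

    /** Returns mentions as "TYPE:surface@offset" strings. */
    public List<String> annotate(String text) {
        List<String> mentions = new ArrayList<>();
        for (Map.Entry<String, String> entry : termToType.entrySet()) {
            // Word boundaries so "art" does not match inside "party".
            Pattern p = Pattern.compile(
                "\\b" + Pattern.quote(entry.getKey()) + "\\b",
                Pattern.CASE_INSENSITIVE);
            Matcher m = p.matcher(text);
            while (m.find()) {
                mentions.add(entry.getValue() + ":" + m.group() + "@" + m.start());
            }
        }
        return mentions;
    }
}

For example, new DictionaryAnnotator(Map.of("aspirin", "DRUG")).annotate("Take aspirin twice daily") returns [DRUG:aspirin@5] (Map.of needs Java 9+).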

What is the best way to retrieve information in a graph through the has step?

I'm using Titan graph DB with the TinkerPop plugin. What is the best way to retrieve a vertex using the has step?
Assume employeeId is a unique attribute which has a unique index defined.
Is it through the label, i.e.
g.V().has(label,'employee').has('employeeId','emp123')
g.V().has('employee','employeeId','emp123')
or is it better to retrieve the vertex based on the unique property directly, i.e.
g.V().has('employeeId','emp123')
Which of the two is the quicker and better way?
First you have 2 options to create the index:
mgmt.buildIndex('byEmployeeId', Vertex.class).addKey(employeeId).buildCompositeIndex()
mgmt.buildIndex('byEmployeeId', Vertex.class).addKey(employeeId).indexOnly(employee).buildCompositeIndex()
For option 1 it doesn't really matter which query you're going to use. For option 2 it's mandatory to use g.V().has('employee','employeeId','emp123').
Note that g.V().hasLabel('employee').has('employeeId','emp123') will NOT select all employees first. Titan is smart enough to first apply those filter conditions that can leverage an index.
One more thing I want to point out: the whole point of indexOnly() is to allow sharing properties between different types of vertices. So instead of calling the property employeeId, you could call it uuid and also use it for employers, companies, etc.:
mgmt.buildIndex('employeeById', Vertex.class).addKey(uuid).indexOnly(employee).buildCompositeIndex()
mgmt.buildIndex('employerById', Vertex.class).addKey(uuid).indexOnly(employer).buildCompositeIndex()
mgmt.buildIndex('companyById', Vertex.class).addKey(uuid).indexOnly(company).buildCompositeIndex()
Your queries will then always have this pattern: g.V().has('<label>','<prop-key>','<prop-value>'). This is in fact the only way to go in DSE Graph, since we got completely rid of global indexes that span across all types of vertices. At first I really didn't like this decision, but meanwhile I have to agree that this is so much cleaner.
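For reference, a sketch of how that shared-key schema could be created against the Titan 1.0 management API; the statements run as-is in the Gremlin Console, and compile from Java with the imports shown (index and label names here match the snippets above):

import com.thinkaurelius.titan.core.PropertyKey;
import com.thinkaurelius.titan.core.VertexLabel;
import com.thinkaurelius.titan.core.schema.TitanManagement;
import org.apache.tinkerpop.gremlin.structure.Vertex;

// Given an open TitanGraph instance `graph`:
TitanManagement mgmt = graph.openManagement();

// One shared property key for all vertex types...
PropertyKey uuid = mgmt.makePropertyKey("uuid").dataType(String.class).make();

// ...and one composite index per label, scoped with indexOnly().
VertexLabel employee = mgmt.makeVertexLabel("employee").make();
VertexLabel employer = mgmt.makeVertexLabel("employer").make();

// .unique() could be added before buildCompositeIndex() to enforce uniqueness.
mgmt.buildIndex("employeeById", Vertex.class)
    .addKey(uuid).indexOnly(employee).buildCompositeIndex();
mgmt.buildIndex("employerById", Vertex.class)
    .addKey(uuid).indexOnly(employer).buildCompositeIndex();

mgmt.commit();

Lookups then follow the pattern described above, e.g. g.V().has('employee','uuid','emp123').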
The second option, g.V().has('employeeId','emp123'), is better, as long as the property employeeId has been indexed.
This is because each step in a Gremlin traversal acts as a filter. So when you say:
g.V().has(label,'employee').has('employeeId','emp123')
You first go to all the vertices with the label employee and then from the employee vertices you find emp123.
With g.V().has('employeeId','emp123') a composite index allows you to go directly to the correct vertex.
Edit:
As Daniel has pointed out in his answer, Titan is actually smart enough not to visit all employees and leverages the index immediately. So in this case there appears to be little difference between the traversals. I personally favour using direct global indices without labels (i.e. querying by the indexed property directly), but that is just a preference when using Titan; I like to keep steps and filters to a minimum.

How do I create a validator for a single collection?

I need to build a custom id validator that will apply to a single collection, whose id will always be pre-defined (won't need a generator).
In the docs about id generators, it's written:
Currently the configuration of the custom generator applies to every resources (buckets, groups, collections, records). This tiny limitation can easily be fixed, don’t hesitate to get in touch with us!
But there is nothing documented about id validation.
So, how do I implement an id validator that will apply to one collection only?
By default, cliquet uses a generator which accepts ids matching the regular expression r'^[a-zA-Z0-9][a-zA-Z0-9_-]*$' (all letters and numbers, plus underscore and "-").
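A quick illustration of what that default pattern accepts; the check is shown in Java here, where this particular regex behaves the same as in Python:

import java.util.regex.Pattern;

// The first character must be alphanumeric; "-" and "_" are allowed afterwards.
public class IdPatternDemo {
    private static final Pattern ID =
        Pattern.compile("^[a-zA-Z0-9][a-zA-Z0-9_-]*$");

    public static void main(String[] args) {
        System.out.println(ID.matcher("my-record_01").matches());           // true
        System.out.println(ID.matcher("_starts-with-underscore").matches()); // false
        System.out.println(ID.matcher("has spaces").matches());              // false
    }
}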
Before you choose to have a different ID validation mechanism, make sure you really need to.
Now, if that's not enough, you would need to select the proper validator depending on some configuration or already existing values, but this is not implemented in cliquet / kinto.
https://github.com/mozilla-services/cliquet/blob/master/cliquet/resource/__init__.py#L147 is probably a good place to look at / start from.