Is there a way to use Solr's streaming API with spring data solr?


I have a use case where I need to fetch the IDs of every document in my Solr collection. With SolrJ, I use the streaming API like this:
CloudSolrServer server = new CloudSolrServer("zkHost1:2181,zkHost2:2181,zkHost3:2181");
SolrQuery query = new SolrQuery("*:*");
server.queryAndStreamResponse(query, handler);
Here, handler is an instance of a class that implements StreamingResponseCallback, omitted from my code for brevity.
The Spring Data repository abstraction gives me the ability to search by pages and by cursors, but I can't seem to find a way to handle the streaming use case.
Is there a workaround?

SolrTemplate lets you access the underlying SolrClient in a callback style, so you can use that to work around the current limitation.
The result conversion using the MappingSolrConverter available via the SolrTemplate is broken at the moment (I need to check why), but you get the idea of how to do it.
solrTemplate.execute(new SolrCallback<Void>() {
    @Override
    public Void doInSolr(SolrClient solrClient) throws SolrServerException, IOException {
        SolrQuery sq = new SolrQuery("*:*");
        solrClient.queryAndStreamResponse("collection1", sq, new StreamingResponseCallback() {
            @Override
            public void streamSolrDocument(SolrDocument doc) {
                // the bean conversion fails atm
                // ExampleSolrBean bean = solrTemplate.getConverter().read(ExampleSolrBean.class, doc);
                System.out.println(doc);
            }
            @Override
            public void streamDocListInfo(long numFound, long start, Float maxScore) {
                // do something useful
            }
        });
        return null;
    }
});

Related

How to implement near-real time autocompletion using lucene?

Lucene offers different Autocompletion options:
org.apache.lucene.search.suggest.Lookup
I was using the AnalyzingSuggester, which is good but does not support changing data, i.e. when the index changes one needs to reindex everything.
Therefore I tried out the AnalyzingInfixSuggester. This has an add method and an update method but no remove.
Does someone know if it is possible to implement near-real-time suggestions with pure Lucene?

I do not know why this is not part of the public implementation. In the end I extended the AnalyzingInfixSuggester like this:
public class MyAnalyzingInfixSuggester extends AnalyzingInfixSuggester {
    public MyAnalyzingInfixSuggester(Directory dir, Analyzer analyzer) throws IOException {
        super(dir, analyzer);
    }
    public void remove(String text) throws IOException, NoSuchMethodException, InvocationTargetException, IllegalAccessException {
        // call method ensureOpen via reflection since it is private
        Method method = AnalyzingInfixSuggester.class.getDeclaredMethod("ensureOpen");
        method.setAccessible(true);
        method.invoke(this);
        Query query1 = new TermQuery(new Term(TEXT_FIELD_NAME, text.toLowerCase()));
        BooleanQuery booleanQuery = new BooleanQuery.Builder()
            .add(query1, BooleanClause.Occur.MUST)
            .build();
        writer.deleteDocuments(booleanQuery);
    }
}

MongoDB and Large Datasets when using a Repository pattern

Okay so at work we are developing a system using MVC C# & MongoDB. When first developing we decided it would probably be a good idea to follow the Repository pattern (what a pain in the ass!), here is the code to give an idea of what is currently implemented.
The MongoRepository class:
public class MongoRepository { }
public class MongoRepository<T> : MongoRepository, IRepository<T>
    where T : IEntity
{
    private MongoClient _client;
    private IMongoDatabase _database;
    private IMongoCollection<T> _collection;
    public string StoreName {
        get {
            return typeof(T).Name;
        }
    }
    public MongoRepository() {
        _client = new MongoClient(ConfigurationManager.AppSettings["MongoDatabaseURL"]);
        _database = _client.GetDatabase(ConfigurationManager.AppSettings["MongoDatabaseName"]);
        /* misc code here */
        Init();
    }
    public void Init() {
        _collection = _database.GetCollection<T>(StoreName);
    }
    public IQueryable<T> SearchFor() {
        return _collection.AsQueryable<T>();
    }
}
The IRepository interface class:
public interface IRepository { }
public interface IRepository<T> : IRepository
    where T : IEntity
{
    string StoreNamePrepend { get; set; }
    string StoreNameAppend { get; set; }
    IQueryable<T> SearchFor();
    /* misc code */
}
The repository is then instantiated using Ninject but without that it would look something like this (just to make this a simpler example):
MongoRepository<Client> clientCol = new MongoRepository<Client>();
Here is the code used for the search pages which is used to feed into a controller action which outputs JSON for a table with DataTables to read. Please note that the following uses DynamicLinq so that the linq can be built from string input:
tmpFinalList = clientCol
.SearchFor()
.OrderBy(tmpOrder) // tmpOrder = "ClientDescription DESC"
.Skip(Start) // Start = 99900
.Take(PageLength) // PageLength = 10
.ToList();
The problem shows up when the collection has a lot of records (99,905 to be exact). Everything works if the data in the sorted field is small: for example, our Key field is a 5-character fixed-length string, and I can Skip and Take fine when ordering by it. However, with something like ClientDescription, which can be much longer, I can sort and Take fine from the front of the query (i.e. page 1), but when I page to the end with Skip = 99900 and Take = 10 it gives the following memory error:
An exception of type 'MongoDB.Driver.MongoCommandException' occurred
in MongoDB.Driver.dll but was not handled in user code
Additional information: Command aggregate failed: exception: Sort
exceeded memory limit of 104857600 bytes, but did not opt in to
external sorting. Aborting operation. Pass allowDiskUse:true to opt
in..
Okay, so that is easy to understand, I guess. I have had a look online, and almost everything suggested is to use Aggregation and "allowDiskUse:true". However, since I use IQueryable in IRepository, I cannot start using IAggregateFluent<>, because I would then need to expose MongoDB-related classes to IRepository, which would go against IoC principles.
Is there any way to force IQueryable to use this, or does anyone know of a way for me to access IAggregateFluent without going against IoC principles?
One thing of interest to me is why the sort works for page 1 (Start = 0, Take = 10) but fails when I page to the end. Surely everything must be sorted for me to get the items in order for page 1, so shouldn't (Start = 99900, Take = 10) need the same amount of sorting, with MongoDB just sending me the last 5 or so records? Why doesn't this error happen in both cases?
ANSWER
Okay, so with the help of @craig-wilson, upgrading to the newest version of the MongoDB C# driver and changing the following in MongoRepository fixes the problem:
public IQueryable<T> SearchFor() {
    return _collection.AsQueryable<T>(new AggregateOptions { AllowDiskUse = true });
}
I was getting a System.MissingMethodException, but this was caused by other copies of the MongoDB driver that needed to be updated as well.

When creating the IQueryable from an IMongoCollection, you can pass in the AggregateOptions which allow you to set AllowDiskUse.
https://github.com/mongodb/mongo-csharp-driver/blob/master/src/MongoDB.Driver/IMongoCollectionExtensions.cs#L53
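On the side question of why page 1 sorts fine while the last page fails: a likely explanation, assuming the server folds the skip and limit into the sort so that it only has to retain the first skip + take documents, is that page 1 keeps 10 documents in memory while the last page keeps 99,910. The sketch below illustrates that idea in plain Java with a bounded heap. TopKPager is a hypothetical name, and this is an illustration of the principle, not the driver's or the server's actual implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration: a sorted page that retains only skip + take items.
final class TopKPager {
    // Returns the items at positions [skip, skip + take) in 'order',
    // while holding at most skip + take items in memory at any time.
    static <T> List<T> page(Iterable<T> items, Comparator<T> order, int skip, int take) {
        int keep = skip + take;
        if (keep <= 0) {
            return new ArrayList<>();
        }
        // Max-heap with respect to 'order': the head is the worst retained item.
        PriorityQueue<T> heap = new PriorityQueue<>(order.reversed());
        for (T item : items) {
            heap.add(item);
            if (heap.size() > keep) {
                heap.poll(); // evict the item that can no longer be on the page
            }
        }
        List<T> sorted = new ArrayList<>(heap);
        sorted.sort(order);
        if (sorted.size() <= skip) {
            return new ArrayList<>();
        }
        return new ArrayList<>(sorted.subList(skip, sorted.size()));
    }
}
```

With skip = 0 and take = 10 the heap never grows past 10 entries; with skip = 99900 it grows to 99,910, which is where long field values start to matter.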

Is it possible to place variables into a resource path within a sling servlet?

We are trying to provide a clean URI structure for external endpoints to pull json information from CQ5.
For example, if you want to fetch information about a particular user's history (assuming you have permissions etc.), ideally we would like the endpoint to be able to do the following:
/bin/api/user/abc123/phone/555-klondike-5/history.json
In the URI, we would specify /bin/api/user/{username}/phone/{phoneNumber}/history.json, so that it is very easy to leverage the dispatcher to invalidate cached changes without invalidating a broad swath of cached information.
We would like to use a Sling servlet to handle the request; however, I do not know how to put variables into the path.
It would be great if there were something like @PathParam from JAX-RS to add to the Sling path variable, but I suspect it's not available.
The other approach we had in mind was to use a selector to recognise when we are accessing the API, and thus return whatever we wanted from the path, but it would necessitate a single Sling servlet to handle all of the requests, so I am not happy about the approach, as it glues a lot of unrelated code together.
Any help with this would be appreciated.
UPDATE:
If we were to use an OptingServlet and put some logic inside the accepts function, we could stack a series of Sling servlets and make the acceptance decisions from the path with a regex.
Then during execution, the path itself can be parsed for the variables.
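A minimal sketch of the path-parsing half of that idea, using only the JDK. The class name UserHistoryPath and the exact pattern are assumptions made for illustration and are not part of Sling; inside an OptingServlet, the result of parse could back the accepts decision, and the two fields would supply the variables during execution.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper matching
// /bin/api/user/{username}/phone/{phoneNumber}/history.json
final class UserHistoryPath {
    private static final Pattern PATTERN = Pattern.compile(
        "^/bin/api/user/(?<username>[^/]+)/phone/(?<phoneNumber>[^/]+)/history\\.json$");

    final String username;
    final String phoneNumber;

    private UserHistoryPath(String username, String phoneNumber) {
        this.username = username;
        this.phoneNumber = phoneNumber;
    }

    // Returns the extracted variables, or empty if the path does not match.
    static Optional<UserHistoryPath> parse(String path) {
        if (path == null) {
            return Optional.empty();
        }
        Matcher m = PATTERN.matcher(path);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new UserHistoryPath(m.group("username"), m.group("phoneNumber")));
    }
}
```

Keeping one such pattern per servlet is what would let several servlets share the same path prefix without one class handling every request.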

If the data that you provide comes from the JCR repository, the best approach is to structure it exactly as you want the URLs to be; that's the recommended way of doing things with Sling.
If the data is external you can create a custom Sling ResourceProvider that you mount on the /bin/api/user path and acquires or generates the corresponding data based on the rest of the path.
The Sling test suite's PlanetsResourceProvider is a simple example of that, see http://svn.apache.org/repos/asf/sling/trunk/launchpad/test-services/src/main/java/org/apache/sling/launchpad/testservices/resourceprovider/
The Sling resources docs at https://sling.apache.org/documentation/the-sling-engine/resources.html document the general resource resolution mechanism.

It is now possible to integrate Jersey (JAX-RS) with CQ. We were able to create a primitive prototype that says "Hello" to the world.
https://github.com/hstaudacher/osgi-jax-rs-connector
With this we can use @PathParam to map the requests.
Thanks and Regards,
San

There is no direct way to create such dynamic paths. You could register a servlet under /bin/api/user.json and provide the rest of the path as a suffix:
/bin/api/user.json/abc123/phone/555-klondike-5/history
^                 ^
|                 |
servlet path      suffix starts here
then you could parse the suffix manually:
@SlingServlet(paths = "/bin/api/user", extensions = "json")
public class UserServlet extends SlingSafeMethodsServlet {
    public void doGet(SlingHttpServletRequest request, SlingHttpServletResponse response) {
        String suffix = request.getRequestPathInfo().getSuffix();
        String[] split = StringUtils.split(suffix, '/');
        // parse split path and check if the path is valid
        // if path is not valid, send 404:
        // response.sendError(HttpURLConnection.HTTP_NOT_FOUND);
    }
}
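The "parse split path and check if the path is valid" step could look like the sketch below. SuffixParser and the expected layout /{username}/phone/{phoneNumber}/history are assumptions made for illustration; it uses plain String.split instead of Commons Lang so that it runs without dependencies.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for validating a suffix such as
// /abc123/phone/555-klondike-5/history
final class SuffixParser {
    // Splits on '/' and drops empty segments; a null suffix yields an empty list.
    static List<String> segments(String suffix) {
        List<String> out = new ArrayList<>();
        if (suffix == null) {
            return out;
        }
        for (String part : suffix.split("/")) {
            if (!part.isEmpty()) {
                out.add(part);
            }
        }
        return out;
    }

    // Returns {username, phoneNumber}, or null when the layout is wrong,
    // in which case the servlet would answer with a 404.
    static String[] userAndPhone(String suffix) {
        List<String> s = segments(suffix);
        boolean valid = s.size() == 4
            && "phone".equals(s.get(1))
            && "history".equals(s.get(3));
        return valid ? new String[] { s.get(0), s.get(2) } : null;
    }
}
```

In doGet, a null result from userAndPhone(suffix) would be the signal to call response.sendError with HTTP_NOT_FOUND.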

The RESTful way to approach this would be to have the information stored in the structure that you want to use. i.e. /content/user/abc123/phone/555-klondike-5/history/ would contain all the history nodes for that path.
In that case, you can obtain an out-of-the-box JSON response by simply calling
/content/user/abc123/phone/555-klondike-5/history.json
Or if you need something in a specific json format you could use the sling resource resolution to use a custom json response.

Excited to share this! I've worked ~ a week solving this, finally have the best Answer.
First: Try to use Jersey
The osgi-jax-rs-connector suggested by kallada is best, but I couldn't get it working on Sling 8. I lost a full day trying, all I have to show for it are spooky class not found errors and dependency issues.
Solution: The ResourceProvider
Bertrand's link is for Sling 9 only, which isn't released. So here's how you do it in Sling 8 and older!
Two Files:
ResourceProvider
Servlet
The ResourceProvider
The purpose of this is only to listen to all requests at /service and then produce a "Resource" at that virtual path, which doesn't actually exist in the JCR.
@Component
@Service(value=ResourceProvider.class)
@Properties({
    @Property(name = ResourceProvider.ROOTS, value = "service/image"),
    @Property(name = ResourceProvider.OWNS_ROOTS, value = "true")
})
public class ImageResourceProvider implements ResourceProvider {
    @Override
    public Resource getResource(ResourceResolver resourceResolver, String path) {
        AbstractResource abstractResource;
        abstractResource = new AbstractResource() {
            @Override
            public String getResourceType() {
                return TypeServlet.RESOURCE_TYPE;
            }
            @Override
            public String getResourceSuperType() {
                return null;
            }
            @Override
            public String getPath() {
                return path;
            }
            @Override
            public ResourceResolver getResourceResolver() {
                return resourceResolver;
            }
            @Override
            public ResourceMetadata getResourceMetadata() {
                return new ResourceMetadata();
            }
        };
        return abstractResource;
    }
    @Override
    public Resource getResource(ResourceResolver resourceResolver, HttpServletRequest httpServletRequest, String path) {
        return getResource(resourceResolver, path);
    }
    @Override
    public Iterator<Resource> listChildren(Resource resource) {
        return null;
    }
}
The Servlet
Now you just write a servlet which handles any of the resources coming from that path - but this is accomplished by handling any resources with the resource type which is produced by the ResourceProvider listening at that path.
@SlingServlet(
    resourceTypes = TypeServlet.RESOURCE_TYPE,
    methods = {"GET", "POST"})
public class TypeServlet extends SlingAllMethodsServlet {
    static final String RESOURCE_TYPE = "mycompany/components/service/myservice";
    @Override
    protected void doGet(SlingHttpServletRequest request, SlingHttpServletResponse response) throws ServletException, IOException {
        final String[] pathParts = request.getResource().getPath().split("/");
        final String id = pathParts[pathParts.length - 1];
        response.setContentType("text/html");
        PrintWriter out = response.getWriter();
        try {
            out.print("<html><body>Hello, received this id: " + id + "</body></html>");
        } finally {
            out.close();
        }
    }
}
Obviously your servlet would do something much more clever, such as process the "path" String more intelligently and probably produce JSON.

smart gwt- how to use POJO as data source for combo-box/tab/labels

I have a Smart GWT Project where the data that is to be displayed on screen, is stored in a class that is shared by client and server.
I read some docs at the Smart GWT website where they have explained how to connect to XML or JSON data sources.
What I want to do is link my POJO with the Smart GWT widget.
The data is available client-side, so the server-side data communication component of Smart GWT (which is only available in the paid editions) is not needed.
What is the recommended way to go about implementing this? Are there any best practices while doing this? And am I correct in assuming that I can do the above with the Free edition of Smart GWT?

You must manually copy the POJO's fields into the attributes of a record; you cannot simply pass the object to the grid as a value. I did it like this:
greetingService
    .getUsersList(new AsyncCallback<ArrayList<UserForRPC>>() {
        public void onFailure(Throwable caught) {
        }
        public void onSuccess(ArrayList<UserForRPC> result) {
            ListGridRecord[] listUsers = new ListGridRecord[result.size()];
            int recordNum = 0;
            for (UserForRPC user : result) {
                ListGridRecord record = new ListGridRecord();
                record.setAttribute("id", user.getId());
                record.setAttribute("firstName", user.getFirstName());
                record.setAttribute("lastName", user.getLastName());
                record.setAttribute("login", user.getLogin());
                record.setAttribute("password", user.getPassword());
                record.setAttribute("email", user.getEmail());
                record.setAttribute("role", user.getRole());
                record.setAttribute("organization", user.getOrganization());
                listUsers[recordNum++] = record;
            }
            usersGrid.setData(listUsers);
        }
    });

GWT RequestFactory + CellTable

Does anyone know of an example of GWT's CellTable using RequestFactory where the table is editable? I would like to list objects in a table (each row is one object and each column is one property) and be able to easily add new objects and edit existing ones. I know of Google's DynaTableRf example, but that one doesn't edit.
I searched Google and Stack Overflow but wasn't able to find one. I got a bit confused with RF's context, and then people also mentioned some "driver".
To demonstrate where I currently arrived, I attach code for one column:
// Create name column.
Column<PersonProxy, String> nameColumn = new Column<PersonProxy, String>(
        new EditTextCell()) {
    @Override
    public String getValue(PersonProxy person) {
        String ret = person.getName();
        return ret != null ? ret : "";
    }
};
nameColumn.setFieldUpdater(new FieldUpdater<PersonProxy, String>() {
    @Override
    public void update(int index, PersonProxy object, String value) {
        PersonRequest req = FaceOrgFactory.getInstance().requestFactory().personRequest();
        PersonProxy eObject = req.edit(object);
        eObject.setName(value);
        req.persist().using(eObject).fire();
    }
});
and my code for data provider:
AsyncDataProvider<PersonProxy> personDataProvider = new AsyncDataProvider<PersonProxy>() {
    @Override
    protected void onRangeChanged(HasData<PersonProxy> display) {
        final Range range = display.getVisibleRange();
        fetch(range.getStart());
    }
};
personDataProvider.addDataDisplay(personTable);
...
private void fetch(final int start) {
    lastFetch = start;
    requestFactory.personRequest().getPeople(start, numRows).fire(new Receiver<List<PersonProxy>>() {
        @Override
        public void onSuccess(List<PersonProxy> response) {
            if (lastFetch != start) {
                return;
            }
            int responses = response.size();
            if (start >= (personTable.getRowCount() - numRows)) {
                PersonProxy newP = requestFactory.personRequest().create(PersonProxy.class);
                response.add(newP);
                responses++;
            }
            personTable.setRowData(start, response);
            personPager.setPageStart(start);
        }
    });
    requestFactory.personRequest().countPersons().fire(new Receiver<Integer>() {
        @Override
        public void onSuccess(Integer response) {
            personTable.setRowCount(response + 1, true);
        }
    });
}
I try to insert a new empty object as the last row, and when the user fills it in, I'd insert a new one after it. But the code is not working. It says that the user is "attempting" to edit an object previously edited by another RequestContext.
Dilemmas:
* am I creating too many context'es?
* how to properly insert new object into celltable, created on the client side?
* on fieldUpdater when I get an editable object - should I insert it back to table or forget about it?
Thanks for any help.

am I creating too many context'es?
Yes.
You should have one context per HTTP request (per fire()), and a context that is not fire()d is useless (only do that if you/the user change your/his mind and don't want to, e.g., save your/his changes).
You actually have only one context to remove here (see below).
Note that your approach of saving on each field change can lead to "race conditions", because a proxy can be edit()ed by at most one context at a time, and it remains attached to a context until the server responds (and once a context is fired, the proxy is frozen –read-only– also until the server responds).
(this is not true in all cases: when onConstraintViolation is called, the context and its proxies are unfrozen so you can "fix" the constraint violations and fire the context again; this should be safe because validation is done on the server-side before any service method is called).
how to properly insert new object into celltable, created on the client side?
Your code looks OK, except that you should create your proxy in the same context as the one you'll use to persist it.
on fieldUpdater when I get an editable object - should I insert it back to table or forget about it?
I'm not 100% certain but I think you should refresh the table (something like setRowData(index, Collections.singletonList(object)))
BTW, the driver people mention is probably the RequestFactoryEditorDriver from the Editor framework. It won't help you here (quite the contrary actually).
