How to implement near-real time autocompletion using lucene? - autocomplete

Lucene offers different Autocompletion options:
org.apache.lucene.search.suggest.Lookup
I was using the AnalyzingSuggester which is good but it does not support changing data, i.e. when the index changes one needs to reindex everything.
Therefore I tries out the AnalyzingInfixSuggester. This has and add method and an update method but no remove.
Does someone know if it is possible to implement near-real time suggestions with pure lucene?

I do not know why this is not part of the public implementation. At the end I extended the AnalyzingInfixSuggester like this:
public class MyAnalyzingInfixSuggester extends AnalyzingInfixSuggester {
public MyAnalyzingInfixSuggester(Directory dir, Analyzer analyzer) throws IOException {
super(dir, analyzer);
}
public void remove(String text) throws IOException, NoSuchMethodException, InvocationTargetException, IllegalAccessException {
// call method ensureOpen via reflection since it is private
Method method = AnalyzingInfixSuggester.class.getDeclaredMethod("ensureOpen");
method.setAccessible(true);
method.invoke(this);
Query query1 = new TermQuery(new Term(TEXT_FIELD_NAME, text.toLowerCase()));
BooleanQuery booleanQuery = new BooleanQuery.Builder()
.add(query1, BooleanClause.Occur.MUST)
.build();
writer.deleteDocuments(booleanQuery);
}
}

Related

#Transactional and serializable level problem

I have a problem with isolation levels in JPA. For example I have following code:
#Transactional(isolation = Isolation.SERIALIZABLE)
public void first() {
Obj obj = new Obj();
obj.setName("t");
objDAO.save(obj);
second();
}
#Transactional(propagation = Propagation.REQUIRES_NEW, isolation = Isolation.SERIALIZABLE)
public void second(){
List<Obj> objs = objDAO.findAll();
}
In my opinion the second method should not see uncomitted changes from method first. So new object with name "t" should not be visible till commit (but it is).
If I am wrong, then please, give me example in JPA where it won't be visible. Many thanks for any advice.
If your methods are inside one class it will not work because container will treat this as a single transaction. The container doesn't know that you want to create new transaction.
From the Spring reference:
Note: In proxy mode (which is the default), only 'external' method calls coming in through the proxy will be intercepted. This means that 'self-invocation', i.e. a method within the target object calling some other method of the target object, won't lead to an actual transaction at runtime even if the invoked method is marked with #Transactional!
If you want to call method second() in new transaction you can try this:
#Autowired
private ApplicationContext applicationContext;
#Transactional(isolation = Isolation.SERIALIZABLE)
public void first() {
Obj obj = new Obj();
obj.setName("t");
objDAO.save(obj);
applicationContext.getBean(getClass()).second();
}
#Transactional(propagation = Propagation.REQUIRES_NEW, isolation = Isolation.SERIALIZABLE)
public void second(){
List<Obj> objs = objDAO.findAll();
}

Specify connection string for a query with DbContextScope project

I am currently using Mehdi El Gueddari's DbContextScope project, I think by the book, and it's awesome. But I came across a problem I'm unsure how to solve today. I have a query that I need to execute using a different database login/user because it requires additional permissions. I can create another connection string in my web.config, but I'm not sure how to specify that for this query, I want to use this new connection string. Here is my usage:
In my logic layer:
private static IDbContextScopeFactory _dbContextFactory = new DbContextScopeFactory();
public static Guid GetFacilityID(string altID)
{
...
using (_dbContextFactory.CreateReadOnly())
{
entity = entities.GetFacilityID(altID)
}
}
That calls into my data layer which would look something like this:
private AmbientDbContextLocator _dbcLocator = new AmbientDbContextLocator();
protected CRMEntities DBContext
{
get
{
var dbContext = _dbcLocator.Get<CRMEntities>();
if (dbContext == null)
throw new InvalidOperationException("No ambient DbContext....");
return dbContext;
}
}
public virtual Guid GetFaciltyID(string altID)
{
return DBContext.Set<Facility>().Where(f => f.altID = altID).Select(f => f.ID).FirstOrDefault();
}
Currently my connection string is set in the default way:
public partial class CRMEntities : DbContext
{
public CRMEntities()
: base("name=CRMEntities")
{}
}
Is it possible for this specific query to use a different connection string and how?
I ended up modifying the source code in a way that feels slightly hacky, but is getting the job done for now. I created a new IAmbientDbContextLocator with a Get<TDbContext> method override that accepts a connection string:
public TDbContext Get<TDbContext>(string nameOrConnectionString) where TDbContext : DbContext
{
var ambientDbContextScope = DbContextScope.GetAmbientScope();
return ambientDbContextScope == null ? null : ambientDbContextScope.DbContexts.Get<TDbContext>(nameOrConnectionString);
}
Then I updated the DbContextCollection to pass this parameter to the DbContext's existing constructor overload. Last, I updated the DbContextCollection maintain a Dictionary<KeyValuePair<Type, string>, DbContext> instead of a Dictionary<Type, DbContext> as its cached _initializedDbContexts where the added string is the nameOrConnectionString param. So in other words, I updated it to cache unique DbContext type/connection string pairs.
Then I can get at the DbContext with the connection I need like this:
var dbContext = new CustomAmbientDbContextLocator().Get<CRMEntities>("name=CRMEntitiesAdmin");
Of course you'd have to be careful your code doesn't end up going through two different contexts/connection strings when it should be going through the same one. In my case I have them separated into two different data access class implementations.

MongoDB and Large Datasets when using a Repository pattern

Okay so at work we are developing a system using MVC C# & MongoDB. When first developing we decided it would probably be a good idea to follow the Repository pattern (what a pain in the ass!), here is the code to give an idea of what is currently implemented.
The MongoRepository class:
public class MongoRepository { }
public class MongoRepository<T> : MongoRepository, IRepository<T>
where T : IEntity
{
private MongoClient _client;
private IMongoDatabase _database;
private IMongoCollection<T> _collection;
public string StoreName {
get {
return typeof(T).Name;
}
}
}
public MongoRepository() {
_client = new MongoClient(ConfigurationManager.AppSettings["MongoDatabaseURL"]);
_database = _client.GetDatabase(ConfigurationManager.AppSettings["MongoDatabaseName"]);
/* misc code here */
Init();
}
public void Init() {
_collection = _database.GetCollection<T>(StoreName);
}
public IQueryable<T> SearchFor() {
return _collection.AsQueryable<T>();
}
}
The IRepository interface class:
public interface IRepository { }
public interface IRepository<T> : IRepository
where T : IEntity
{
string StoreNamePrepend { get; set; }
string StoreNameAppend { get; set; }
IQueryable<T> SearchFor();
/* misc code */
}
The repository is then instantiated using Ninject but without that it would look something like this (just to make this a simpler example):
MongoRepository<Client> clientCol = new MongoRepository<Client>();
Here is the code used for the search pages which is used to feed into a controller action which outputs JSON for a table with DataTables to read. Please note that the following uses DynamicLinq so that the linq can be built from string input:
tmpFinalList = clientCol
.SearchFor()
.OrderBy(tmpOrder) // tmpOrder = "ClientDescription DESC"
.Skip(Start) // Start = 99900
.Take(PageLength) // PageLength = 10
.ToList();
Now the problem is that if the collection has a lot of records (99,905 to be exact) everything works fine if the data in a field isn't very large for example our Key field is a 5 character fixed length string and I can Skip and Take fine using this query. However if it is something like ClientDescription can be much longer I can 'Sort' fine and 'Take' fine from the front of the query (i.e. Page 1) however when I page to the end with Skip = 99900 & Take = 10 it gives the following memory error:
An exception of type 'MongoDB.Driver.MongoCommandException' occurred
in MongoDB.Driver.dll but was not handled in user code
Additional information: Command aggregate failed: exception: Sort
exceeded memory limit of 104857600 bytes, but did not opt in to
external sorting. Aborting operation. Pass allowDiskUse:true to opt
in..
Okay so that is easy to understand I guess. I have had a look online and mostly everything that is suggested is to use Aggregation and "allowDiskUse:true" however since I use IQueryable in IRepository I cannot start using IAggregateFluent<> because you would then need to expose MongoDB related classes to IRepository which would go against IoC principals.
Is there any way to force IQueryable to use this or does anyone know of a way for me to access IAggregateFluent without going against IoC principals?
One thing of interest to me is why the sort works for page 1 (Start = 0, Take = 10) but then fails when I search to the end ... surely everything must be sorted for me to be able to get the items in order for Page 1 but shouldn't (Start = 99900, Take = 10) just need the same amount of 'sorting' and MongoDB should just send me the last 5 or so records. Why doesn't this error happen when both sorts are done?
ANSWER
Okay so with the help of #craig-wilson upgrading to the newest version of MongoDB C# drivers and changing the following in MongoRepository will fix the problem:
public IQueryable<T> SearchFor() {
return _collection.AsQueryable<T>(new AggregateOptions { AllowDiskUse = true });
}
I was getting a System.MissingMethodException but this was caused by other copies of the MongoDB drivers needing updated as well.
When creating the IQueryable from an IMongoCollection, you can pass in the AggregateOptions which allow you to set AllowDiskUse.
https://github.com/mongodb/mongo-csharp-driver/blob/master/src/MongoDB.Driver/IMongoCollectionExtensions.cs#L53

Is there a way to use Solr's streaming API with spring data solr?

I have a use case where I need to fetch the ids of my entire solr collection. For that, with solrj, I use the Streaming API like this :
CloudSolrServer server = new CloudSolrServer("zkHost1:2181,zkHost2:2181,zkHost3:2181");
SolrQuery query = new SolrQuery("*:*");
server.queryAndStreamResponse(tmpQuery, handler);
Where handler is a class that implements StreamingResponseCallback, ommited in my code for brevity.
Now, the Spring data repositories abstraction give me the ability to search by pages, by cursors, but I can't seem to find a way to handle the streaming use case.
Is there a workaround ?
SolrTemplate allows to access the underlying SolrClient in a callback style. So you could use that one to work around the current limitations.
The result conversion using the MappingSolrConverter available via the SolrTemplate is broken at the moment (I need to check why) - but you get the idea of how to do it.
solrTemplate.execute(new SolrCallback<Void>() {
#Override
public Void doInSolr(SolrClient solrClient) throws SolrServerException, IOException {
SolrQuery sq = new SolrQuery("*:*");
solrClient.queryAndStreamResponse("collection1", sq, new StreamingResponseCallback() {
#Override
public void streamSolrDocument(SolrDocument doc) {
// the bean conversion fails atm
// ExampleSolrBean bean = solrTemplate.getConverter().read(ExampleSolrBean.class, doc);
System.out.println(doc);
}
#Override
public void streamDocListInfo(long numFound, long start, Float maxScore) {
// do something useful
}
});
return null;
}
});

QueryBuider get parameters for Dao.queryRaw

I'm using QueryBuider to create raw query, but I need to fill parameters to raw query manually.
Properties 'from' and 'to' are filled two times. One in 'where' section of QueryBuider, and one in queryRaw method as parameters.
Method StatementBuilder.prepareStatementString() returns query string with "?" for substitution.
Is there any way to get these parameters directly from QueryBuider instance?
For example, imagine a new method in ormlite - StatementBuilder.getPreparedStatementParameters();
QueryBuilder<AccountableItemEntity, Long> accountableItemQb = accountableItemDao.queryBuilder();
QueryBuilder<AccountingEntryEntity, Long> accountingEntryQb = accountingEntryDao.queryBuilder();
accountingEntryQb.where().eq(
AccountingEntryEntity.ACCOUNTING_ENTRY_STATE_FIELD_NAME,
AccountingEntryStateEnum.CREATED);
accountingEntryQb.join(accountableItemQb);
QueryBuilder<AccountingTransactionEntity, Long> accountingTransactionQb =
accountingTransactionDao.queryBuilder();
accountingTransactionQb.selectRaw("ACCOUNTINGENTRYENTITY.TITLE, " +
"ACCOUNTINGENTRYENTITY.ACCOUNTABLE_ITEM_ID, " +
"SUM(ACCOUNTINGENTRYENTITY.COUNT), " +
"SUM(ACCOUNTINGENTRYENTITY.COUNT * CONVERT(ACCOUNTINGENTRYENTITY.PRICEAMOUNT,DECIMAL(20, 2)))");
accountingTransactionQb.join(accountingEntryQb);
accountingTransactionQb.where().eq(
AccountingTransactionEntity.ACCOUNTING_TRANSACTION_STATE_FIELD_NAME,
AccountingTransactionStateEnum.PRINTED)
.and().between(AccountingTransactionEntity.CREATE_TIME_FIELD_NAME, from, to);
accountingTransactionQb.groupByRaw(
"ACCOUNTINGENTRYENTITY.ACCOUNTABLE_ITEM_ID, ACCOUNTINGENTRYENTITY.TITLE");
String query = accountingTransactionQb.prepareStatementString();
accountingTransactionQb.prepare().getStatement();
Timestamp fromTimestamp = new Timestamp(from.getTime());
Timestamp toTimestamp = new Timestamp(to.getTime());
//TODO: get parameters from accountingTransactionQb
GenericRawResults<Object[]> genericRawResults =
accountingEntryDao.queryRaw(query, new DataType[] { DataType.STRING,
DataType.LONG, DataType.LONG, DataType.BIG_DECIMAL },
fromTimestamp.toString(), toTimestamp.toString());
Is there any way to get these parameters directly from QueryBuider instance?
Yes, there is a way. You need to subclass QueryBuilder and then you can use the appendStatementString(...) method. You provide the argList which then can be used to get the list of arguments.
protected void appendStatementString(StringBuilder sb,
List<ArgumentHolder> argList) throws SQLException {
appendStatementStart(sb, argList);
appendWhereStatement(sb, argList, true);
appendStatementEnd(sb, argList);
}
For example, imagine a new method in ormlite - StatementBuilder.getPreparedStatementParameters();
Good idea. I've made the following changes to the Github repo.
public StatementInfo prepareStatementInfo() throws SQLException {
List<ArgumentHolder> argList = new ArrayList<ArgumentHolder>();
String statement = buildStatementString(argList);
return new StatementInfo(statement, argList);
}
...
public static class StatementInfo {
private final String statement;
private final List<ArgumentHolder> argList;
...
The feature will be in version 4.46. You can build a release from current trunk if you don't want to wait for that release.