JPA Criteria API - possible to do a prefixed, tokenized search with wildcards? - jpa

We have a problem that at the moment we are not allowed to use ElasticSearch, so we need to implement a search function with MySQL. One desired feature is a prefixed, tokenized search, so a sentence like
"The quick brown fox jumped over the lazy dog"
should be found by a search for "jump". I think I would need to define a rule like (pseudocode):
(*)(beginning OR whitespace)(prefix)(*)
I assume it is possible to do that with JPA (Criteria API)? But what if we have two terms? All of them have to be combined by AND, e.g. the above rule should result in TRUE for both terms in at least one column. That means "jump fox" would result in a hit, but "jump rabbit" would not. Is that also possible with Criteria API?
Or do you know a better solution than Criteria API? I heard Hibernate can do LIKE queries more elegantly (with less code) but unfortunately we use EclipseLink.
Based on the answer below, here is my full solution. It is all in one method to keep it simple here ("simple JPA Criteria API" is an oxymoron, though). If anyone wants to use it, consider some refactoring.
public List<Customer> findMatching(String searchPhrase) {
    List<String> searchTokens = TextService.splitPhraseIntoNonEmptyTokens(searchPhrase);
    if (searchTokens.size() < 1 || searchTokens.size() > 5) { // early out and denial of service attack prevention
        return new ArrayList<>();
    }
    CriteriaBuilder criteriaBuilder = entityManager.getCriteriaBuilder();
    CriteriaQuery<Customer> criteriaQuery = criteriaBuilder.createQuery(Customer.class);
    Root<Customer> rootEntity = criteriaQuery.from(Customer.class);
    Predicate[] orClausesArr = new Predicate[searchTokens.size()];
    for (int i = 0; i < searchTokens.size(); i++) {
        // same normalization methods are used to create the indexed searchable data
        String assumingKeyword = TextService.normalizeKeyword(searchTokens.get(i));
        String assumingText = TextService.normalizeText(searchTokens.get(i));
        String assumingPhoneNumber = TextService.normalizePhoneNumber(searchTokens.get(i));
        String assumingKeywordInFirstToken = assumingKeyword + '%';
        String assumingTextInFirstToken = assumingText + '%';
        String assumingPhoneInFirstToken = assumingPhoneNumber + '%';
        String assumingTextInConsecutiveToken = "% " + assumingText + '%';
        Predicate query = criteriaBuilder.or(
            criteriaBuilder.like(rootEntity.get("normalizedCustomerNumber"), assumingKeywordInFirstToken),
            criteriaBuilder.like(rootEntity.get("normalizedPhone"), assumingPhoneInFirstToken),
            criteriaBuilder.like(rootEntity.get("normalizedFullName"), assumingTextInFirstToken),
            // looking for a prefix after a whitespace:
            criteriaBuilder.like(rootEntity.get("normalizedFullName"), assumingTextInConsecutiveToken)
        );
        orClausesArr[i] = query;
    }
    criteriaQuery = criteriaQuery
        .select(rootEntity) // you can also select only the display columns and ignore the normalized/search columns
        .where(criteriaBuilder.and(orClausesArr))
        .orderBy(
            criteriaBuilder.desc(rootEntity.get("customerUpdated")),
            criteriaBuilder.desc(rootEntity.get("customerCreated"))
        );
    try {
        return entityManager
            .createQuery(criteriaQuery)
            .setMaxResults(50)
            .getResultList();
    } catch (NoResultException nre) {
        return new ArrayList<>();
    }
}

The Criteria API is certainly not intended for this but it can be used to create LIKE predicates.
So for each search term and each column you want to search you would create something like the following:
column like :term + '%'
or column like '% ' + :term + '%'
or column like '%,' + :term + '%'
// repeat for all other punctuation marks and forms of whitespace you want to support.
This will create horribly inefficient queries!
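The pattern list described above can be generated mechanically. The following is a minimal sketch, not part of the original answer: the class `LikePatterns` and its method names are hypothetical, and it assumes backslash as the LIKE escape character (which could be passed as the third argument of `CriteriaBuilder.like`). It also assumes the delimiters themselves are not LIKE wildcards.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds the LIKE patterns for one search term.
final class LikePatterns {
    // Assumed escape character for LIKE wildcards inside the user's term.
    static final char ESCAPE = '\\';

    // Escapes %, _ and the escape character so user input is matched literally.
    static String escape(String term) {
        StringBuilder sb = new StringBuilder();
        for (char c : term.toCharArray()) {
            if (c == '%' || c == '_' || c == ESCAPE) {
                sb.append(ESCAPE);
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // One pattern for "term at the start of the column", plus one pattern
    // per delimiter for "term right after that delimiter, anywhere".
    static List<String> prefixPatterns(String term, String delimiters) {
        String escaped = escape(term);
        List<String> patterns = new ArrayList<>();
        patterns.add(escaped + "%");
        for (char d : delimiters.toCharArray()) {
            patterns.add("%" + d + escaped + "%");
        }
        return patterns;
    }
}
```

Each returned pattern would become one `like` predicate, OR-ed together per term and column. Note that every pattern starting with `%` prevents the database from using an ordinary index on that column.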
I see the following alternatives:
Use database specific features. Some databases have some text search capabilities.
If you can limit your application to one or few databases that might work.
Create your own index: Use a proper tokenizer to analyze the columns you want to search and put the resulting tokens in a separate table with backreferences to the original table.
Now search that for the terms you are looking for.
As long as you only do prefix searches, database indexes should be able to keep this reasonably efficient, and it is easier to maintain and more flexible than what you can obtain by using the Criteria API on its own.
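The matching rule behind the "own index" alternative can be sketched in memory. This is only an illustration of the semantics, under the assumption that tokens are lower-cased runs of letters and digits; the class `TokenPrefixSearch` is hypothetical. In a real setup the tokens would live in a separate, indexed table and each term would be looked up with a prefix condition such as `token LIKE 'jump%'`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical in-memory model of a tokenized prefix search.
final class TokenPrefixSearch {
    // Splits on anything that is not a letter or digit and lower-cases the result.
    static List<String> tokenize(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList());
    }

    // True if EVERY search term is a prefix of at least one token of the text.
    // An empty search phrase matches everything; callers may want to reject it.
    static boolean matches(String text, String searchPhrase) {
        List<String> tokens = tokenize(text);
        return tokenize(searchPhrase).stream()
                .allMatch(term -> tokens.stream().anyMatch(tok -> tok.startsWith(term)));
    }
}
```

With the sentence from the question, "jump fox" matches (prefixes of "jumped" and "fox"), while "jump rabbit" does not, and neither does "ump", because it is not a prefix of any token.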

Related

Doing subqueries in Mybatis, or query recursively the selected values

UPDATE:
I understood that the solution to my problem is to run subqueries, each of which applies a different filter to the reduced result set of the previous one. But I can't find a way to do that with MyBatis. Here is my query code:
List<IstanzaMetadato> res = null;
SqlSession sqlSession = ConnectionFactory.getSqlSessionFactory().openSession(true);
try {
IstanzaMetadatoMapper mapper = sqlSession.getMapper(IstanzaMetadatoMapper.class);
IstanzaMetadatoExample example = new IstanzaMetadatoExample();
Iterator<Map.Entry<Integer, String>> it = map.entrySet().iterator();
while (it.hasNext()) {
Map.Entry<Integer, String> entry = it.next();
example.createCriteria().andIdMetadatoEqualTo(entry.getKey()).andValoreEqualTo(entry.getValue());
}
example.setDistinct(true);
res = mapper.selectByExample(example);
I need to execute a new selectByExample inside the while loop, and each one has to query the previously selected results.
Is there a solution?
ORIGINAL QUESTION:
I have this table structure
I have to select rows from the table using different filters specified by the end user.
Each filter is a pair (id_metadato, valore); for example, id_metadato = 3 and valore = "pippo".
The user can specify 0-n filters from the web page by typing 0-n values into the search boxes, each of which is tied to an id_metadato.
Obviously, the more filters the user specifies, the more restrictive the final query becomes.
For example, if the user fills in only the first search box, the query has a single filter and returns all the rows that contain the pair (id_metadato, valore) specified by the user.
If the user fills in two search boxes, the query has two filters and returns all the rows that satisfy the first condition AND the second one, with the second applied after the "first subquery" is done.
I need to do this dynamically and as efficiently as possible. I can't simply add AND clauses to my query; each filter has to reduce the result set of the previous one.
I can't do 0-n subqueries (Select * from ... IN (select * from ....) ) efficiently.
Is there a more elegant way to do that? I'm reading tutorials on dynamic SQL queries with MyBatis, but I'm not sure that is the correct way. I'm still trying to figure out the logic of the solution; then I will try to implement it with MyBatis.
Thanks for the answers
MyBatis simplified this process of nesting subqueries a lot; it was sufficient to concatenate the filter criteria and to add
An excerpt of the code follows:
try {
    IstanzaMetadatoMapper mapper = sqlSession.getMapper(IstanzaMetadatoMapper.class);
    IstanzaMetadatoExample example = new IstanzaMetadatoExample();
    Iterator<Map.Entry<Integer, String>> it = map.entrySet().iterator();
    while (it.hasNext()) {
        Map.Entry<Integer, String> entry = it.next();
        if (listaIdUd.isEmpty()) {
            example.createCriteria().andIdMetadatoEqualTo(entry.getKey()).andValoreEqualTo(entry.getValue());
            example.setDistinct(true);
            listaIdUd = mapper.selectDynamicNested(example);
            continue;
        }
        example.clear();
        example.createCriteria().andIdMetadatoEqualTo(entry.getKey()).andValoreEqualTo(entry.getValue()).andIdUdIn(listaIdUd);
        example.setDistinct(true);
        listaIdUd = mapper.selectDynamicNested(example);
    }
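The narrowing logic of the excerpt above can be illustrated without MyBatis. The sketch below is an in-memory model only, under stated assumptions: the `Row` class and its fields are hypothetical stand-ins for the columns id_ud, id_metadato and valore, and the loop plays the role of the repeated mapper calls. Unlike the excerpt, it stops as soon as an intermediate result is empty, so a later filter cannot start over from the full table.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of applying (id_metadato, valore) filters one by one.
final class MetadataFilter {
    static final class Row {
        final int idUd;
        final int idMetadato;
        final String valore;

        Row(int idUd, int idMetadato, String valore) {
            this.idUd = idUd;
            this.idMetadato = idMetadato;
            this.valore = valore;
        }
    }

    // Returns the id_ud values that satisfy ALL filters; with no filters, all ids.
    static Set<Integer> matchingIds(List<Row> rows, Map<Integer, String> filters) {
        Set<Integer> result = null; // null means "no filter applied yet"
        for (Map.Entry<Integer, String> filter : filters.entrySet()) {
            Set<Integer> hits = new HashSet<>();
            for (Row row : rows) {
                boolean matchesFilter = row.idMetadato == filter.getKey()
                        && row.valore.equals(filter.getValue());
                boolean inPrevious = result == null || result.contains(row.idUd);
                if (matchesFilter && inPrevious) {
                    hits.add(row.idUd);
                }
            }
            result = hits;
            if (result.isEmpty()) {
                break; // nothing can match any more
            }
        }
        if (result == null) {
            result = new HashSet<>();
            for (Row row : rows) {
                result.add(row.idUd);
            }
        }
        return result;
    }
}
```

In the database, the `inPrevious` check corresponds to the `andIdUdIn(listaIdUd)` criterion used in the excerpt.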

How to implement search using Query Builder API for partial search text in CQ/AEM

I have a requirement to fetch search results based on partial text match. For example, if there is a node under products say "apple-iphone-6" and the user enters "iphone" text in the searchbox, I should still be able to fetch the result.
I tried the below query on querybuilder and it worked:
http://localhost:4502/bin/querybuilder.json?path=/etc/commerce/products&type=nt:unstructured&nodename=*iphone*
But how do I implement the *iphone* part programmatically? I am creating a query using the predicates as follows:
String searchTerm = "iphone";
map.put("path", "/etc/commerce/products");
map.put("type", "nt:unstructured");
map.put("nodename", searchTerm);
Query query = queryBuilder.createQuery(PredicateGroup.create(map), session);
SearchResult result = query.getResult();
But I do not get any results, because the node name (apple-iphone-6) does not exactly match the search term (iphone).
The same search works when I put * around the nodename value, as in the querybuilder URL above, which gives a partial-text match. What should I change in the code to get results based on partial node name matches?
You have already found the solution yourself: the NodenamePredicateEvaluator accepts wildcard arguments, so you need to surround the search term with wildcards, for example like this:
String searchTerm = "iphone";
...
map.put("nodename", "*" + searchTerm + "*");
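The wildcard wrapping can be kept in one place by building the predicate map in a small helper. This is a sketch only: the class `NodenameSearch` and its method are hypothetical, and it uses nothing but `java.util`, so the resulting map still has to be handed to `PredicateGroup.create(map)` as in the question.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper that prepares the QueryBuilder predicates for a partial node name match.
final class NodenameSearch {
    static Map<String, String> predicates(String path, String searchTerm) {
        Map<String, String> map = new HashMap<>();
        map.put("path", path);
        map.put("type", "nt:unstructured");
        // Surround the (trimmed) term with wildcards so "iphone" matches "apple-iphone-6".
        map.put("nodename", "*" + searchTerm.trim() + "*");
        return map;
    }
}
```

For the example from the question, `NodenameSearch.predicates("/etc/commerce/products", "iphone")` yields the same three predicates as the working querybuilder URL.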
In this case the "like" operation can be used.
Example: partial text search on jcr:title:
map.put("group.1_property", "fn:lower-case(@jcr:content/jcr:title)");
map.put("group.1_property.value", "%"+fulltextSearchTerm + "%");
map.put("group.1_property.operation", "like");
For just the node name, the answer posted above is correct, but if you want to search inside properties as well, then:
map.put("fulltext", "*" + searchTerm + "*");
map.put("fulltext.relPath","jcr:content");

Count in jpa without getting result [duplicate]

I like the idea of Named Queries in JPA for static queries I'm going to do, but I often want to get the count result for the query as well as a result list from some subset of the query. I'd rather not write two nearly identical NamedQueries. Ideally, what I'd like to have is something like:
@NamedQuery(name = "getAccounts", query = "SELECT a FROM Account a")
.
.
Query q = em.createNamedQuery("getAccounts");
List r = q.setFirstResult(s).setMaxResults(m).getResultList();
int count = q.getCount();
So let's say m is 10, s is 0 and there are 400 rows in Account. I would expect r to have a list of 10 items in it, but I'd want to know there are 400 rows total. I could write a second @NamedQuery:
@NamedQuery(name = "getAccountCount", query = "SELECT COUNT(a) FROM Account a")
but it seems a DRY violation to do that if I'm always just going to want the count. In this simple case it is easy to keep the two in sync, but if the query changes, it seems less than ideal that I have to update both @NamedQuery definitions to keep the values in line.
A common use case here would be fetching some subset of the items, but needing some way of indicating total count ("Displaying 1-10 of 400").
So the solution I ended up using was to create two @NamedQuery definitions, one for the result set and one for the count, but capturing the base query in a static string to maintain DRY and ensure that both queries remain consistent. So for the above, I'd have something like:
@NamedQuery(name = "getAccounts", query = "SELECT a" + accountQuery)
@NamedQuery(name = "getAccounts.count", query = "SELECT COUNT(a)" + accountQuery)
.
static final String accountQuery = " FROM Account a";
.
Query q = em.createNamedQuery("getAccounts");
List r = q.setFirstResult(s).setMaxResults(m).getResultList();
int count = ((Long)em.createNamedQuery("getAccounts.count").getSingleResult()).intValue();
Obviously, with this example, the query body is trivial and this is overkill. But with much more complex queries, you end up with a single definition of the query body and can ensure you have the two queries in sync. You also get the advantage that the queries are precompiled and at least with Eclipselink, you get validation at startup time instead of when you call the query.
By naming the two queries consistently, it is possible to wrap the code that runs both of them so that only the base name of the query has to be passed in.
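Such a wrapper could look roughly like the sketch below. It is not the original poster's code: the class `PagedQueries`, the `Page` holder and the two function parameters are hypothetical, and the JPA calls are abstracted behind `Function` arguments so the example compiles without a persistence provider. The only convention taken from the answer is that the count query is named after the base name plus ".count".

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical wrapper that runs a page query and its matching count query by base name.
final class PagedQueries {
    // Simple holder for one page of results plus the total row count.
    static final class Page<T> {
        final List<T> items;
        final long total;

        Page(List<T> items, long total) {
            this.items = items;
            this.total = total;
        }
    }

    // Naming convention from the answer: "<base>.count" is the count query.
    static String countQueryName(String baseName) {
        return baseName + ".count";
    }

    // runPageQuery and runCountQuery each receive a named-query name and execute it.
    static <T> Page<T> fetch(String baseName,
                             Function<String, List<T>> runPageQuery,
                             Function<String, Long> runCountQuery) {
        List<T> items = runPageQuery.apply(baseName);
        long total = runCountQuery.apply(countQueryName(baseName));
        return new Page<>(items, total);
    }
}
```

With JPA, the first function would typically wrap `em.createNamedQuery(name, Account.class).setFirstResult(s).setMaxResults(m).getResultList()` and the second `em.createNamedQuery(name, Long.class).getSingleResult()`, which is enough to render something like "Displaying 1-10 of 400".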
setFirstResult/setMaxResults do not return a subset of a result set: the query has not even been run when you call these methods. They affect the generated SELECT query that is executed when you call getResultList. If you want the total record count, you have to SELECT COUNT your entities in a separate query (typically before paginating).
For a complete example, check out Pagination of Data Sets in a Sample Application using JSF, Catalog Facade Stateless Session, and Java Persistence APIs.
Alternatively, you can use reflection to read the named query annotations, like this:
String getNamedQueryCode(Class<?> clazz, String namedQueryKey) {
    if (clazz == null) {
        return null;
    }
    // only queries wrapped in @NamedQueries are inspected here
    NamedQueries namedQueriesAnnotation = clazz.getAnnotation(NamedQueries.class);
    if (namedQueriesAnnotation != null) { // avoid a NullPointerException on unannotated classes
        for (NamedQuery namedQuery : namedQueriesAnnotation.value()) {
            if (namedQuery.name().equals(namedQueryKey)) {
                return namedQuery.query();
            }
        }
    }
    // not found on this class: fall back to a @MappedSuperclass parent, if any
    Class<?> parent = clazz.getSuperclass();
    if (parent != null && parent.getAnnotation(MappedSuperclass.class) != null) {
        return getNamedQueryCode(parent, namedQueryKey);
    }
    return null; // not found
}

NumericRangeQuery in NHibernate.Search

I am creating a search, where the user can both choose an interval and search on a term in the same go.
This is giving me trouble, however, since up until now I have only used the usual text query.
I am wondering how I am to go about using both a NumericRangeQuery and a regular term query. Usually I would use a query below:
var parser = new MultiFieldQueryParser(
new[] { "FromPrice", "ToPrice", "Description"}, new SimpleAnalyzer());
Query query = parser.Parse(searchQuery.ToString());
IFullTextSession session = Search.CreateFullTextSession(this.Session);
IQuery fullTextQuery = session.CreateFullTextQuery(query, new[] { typeof(MyObject) });
IList<MyObject> results = fullTextQuery.List<MyObject>();
But if I was to e.g. search the range FromPrice <-> ToPrice and also the description, how should I do this, since session.CreateFullTextQuery only takes one Query object?
You can create a single query that is a BooleanQuery combining all the conditions you want to be met.
For the ranges, here is a link to the syntax used by the QueryParser:
http://lucene.apache.org/core/old_versioned_docs/versions/2_9_2/queryparsersyntax.html#Range Searches

Tuple result Criteria API subquery

I am trying to use subqueries in an application I am writing using JPA 2.0 type-safe criteria API, with Hibernate 3.6.1.Final as my provider. I have no problem selecting primitive types (Long, MyEntity, etc.), but I want to select multiple columns.
Here's an example of something completely reasonable. Ignore the needless use of subquery -- it is simply meant as illustrative.
EntityManager em = getEntityManager();
CriteriaBuilder cb = em.getCriteriaBuilder();
CriteriaQuery<Tuple> cq = cb.createTupleQuery();
Subquery<Tuple> subQ = cq.subquery(Tuple.class);
Expression<Long> subqCount;
{
    Root<MyEntity> root = subQ.from(MyEntity.class);
    Path<MyEntity> filter = root.get(MyEntity.challenge);
    subqCount = cb.count(root);
    // How to select tuple?
    Selection<Tuple> tuple = cb.tuple(filter, subqCount);
    // !! Run-time exception here
    Expression<Tuple> tupleExpr = (Expression<Tuple>) tuple;
    // Not sure why I can't use multiSelect on a subQuery
    // #select only accepts Expression<Tuple>
    subQ.select(tupleExpr);
    subQ.groupBy(filter);
}
cq.multiselect(subqCount);
Although the compiler doesn't complain, I still get a run-time exception.
java.lang.ClassCastException: org.hibernate.ejb.criteria.expression.CompoundSelectionImpl cannot be cast to javax.persistence.criteria.Expression
Is this a bug in hibernate, or am I doing something wrong?
If you can't use multiselect on a subquery, then how can you perform a groupBy?
If you can't use groupBy on a subquery, why is it in the API?
I have the same problem.
I can only attempt to answer your last question by saying you can only really use sub queries to perform very simple queries like:
SELECT name FROM Pets WHERE Pets.ownerID in (
SELECT ID FROM owners WHERE owners.Country = "SOUTH AFRICA"
)
The other thing I wanted to say was how much this incident reminds me of xkcd #979.
I had a similar problem.
I had a specification, and I wanted to get the ids of the objects matching it.
My solution:
CriteriaBuilder criteriaBuilder = em.getCriteriaBuilder();
CriteriaQuery<Tuple> tupleCriteriaQuery = criteriaBuilder.createTupleQuery();
Root<Issue> root = tupleCriteriaQuery.from(Issue.class);
tupleCriteriaQuery = tupleCriteriaQuery.multiselect(root.get(IssueTable.COLUMN_ID));//select did not work.
tupleCriteriaQuery = tupleCriteriaQuery.where(issueFilter.toPredicate(root, tupleCriteriaQuery, criteriaBuilder));
List<Tuple> tupleResult = em.createQuery(tupleCriteriaQuery).getResultList();
First I select the columns (in my case I need only one column), and then I call the where method to apply my given specification.