elastic4s: score stays at 1 with rawQuery

We're using elastic4s with Elasticsearch 2.2.0. A number of queries are stored as JSON on disk and passed as rawQuery via the elastic4s driver. The scores in the result differ depending on whether the query is submitted via the command line or via the elastic4s driver: the driver always returns a score of 1 for all results, while the command-line execution yields two different scores (for different data types).
The code for elastic4s:
val searchResult = client.execute {
search in indexName types(product, company, orga, "User", "Workplace") rawQuery preparedQuery sourceInclude(preparedSourceField:_*) sort {sortDefintions:_*} start start limit limit
}.await
Note that I removed everything except rawQuery preparedQuery, and the score was still 1. The full query via the command line is quite long:
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "${search}",
"fields": [
"name",
"abbreviation",
"articleNumberManufacturer",
"productLine",
"productTitle^10",
"productSubtitle",
"productDescription",
"manufacturerRef.name",
"props"
]
}
}
],
"filter": [
{
"or": [
{
"bool": {
"must": [
{
"type": {
"value": "Product"
}
},
{
"term": {
"publishState": "published"
}
}
],
"must_not": [
{
"term": {
"productType": "MASTER"
}
},
{
"term": {
"deleted": true
}
}
]
}
}
]
}
]
}
}
}
Note that this is almost identical to preparedQuery, except that ${search} is replaced with the actual search string. The Elasticsearch REST client returns a score of 3.075806 for the matches.

elastic4s rawQuery wraps your rawQuery JSON in another query object.
It is as if you had queried for
{ "query": { "query": {
"bool": {
"must": [
{
"multi_match": {
"query": "${search}",
...
Just remove the wrapping "query" from your JSON and the response will show varying scores.
Alternatively, you can try extraSource instead of rawQuery, as described in the elastic4s documentation, although it didn't work for me at all:
ErrorMessage:
value extraSource is not a member of com.sksamuel.elastic4s.SearchDefinition
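The unwrapping step described above can also be done programmatically before the stored JSON is handed to rawQuery. The following is a minimal sketch in Python, not elastic4s code: the helper name unwrap_query and the shortened sample query are assumptions made for illustration only.

```python
import json

def unwrap_query(raw_json: str) -> str:
    """Strip a top-level "query" wrapper from a JSON query, if present (hypothetical helper)."""
    parsed = json.loads(raw_json)
    # Only unwrap when "query" is the sole top-level key; otherwise leave the JSON untouched.
    if isinstance(parsed, dict) and set(parsed) == {"query"}:
        parsed = parsed["query"]
    return json.dumps(parsed)

# Shortened, made-up version of the stored query from the question.
stored = '{"query": {"bool": {"must": [{"multi_match": {"query": "chair", "fields": ["name", "productTitle^10"]}}]}}}'
unwrapped = unwrap_query(stored)
print(unwrapped)
```

The result starts directly with the "bool" object, which is the shape rawQuery appears to expect according to the answer above.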

Related

NEST is not returning values for an exact search

I am trying to build a dynamic query using NEST, as shown below:
string product = "goldpgl";
string agencyid = "1123";
ISearchResponse <ProductModel> res = client().Search<ProductModel>(s => s
.Index("proddata")
.From(0)
.Size(100)
.Query(q =>
+q.Term(p => p.product, product) &&
+q.Term(p => p.agencyid, agencyid)));
If I pass the product value "GoldPGL" (N.B.: this is the real value in the index), I am not able to find the result.
However, if I pass the value in lowercase, like "goldpgl", it works.
It also does not work for values like "Gold - PGL" or "SOME OTHER LOAN".
My POCO is as follows:
public class ProductModel
{
public string product { get; set; }
public string agencyid { get; set; }
}
What is wrong, and how can I rectify this?
As you have not provided the mapping and search query, I am assuming it happens because you are using a term query rather than a match query.
Term queries are not analyzed, which means that whatever you enter in your search query is matched as-is against the tokens in the index. By default, all text fields in Elasticsearch use the standard analyzer, which converts tokens to lowercase. Hence GoldPGL doesn't match in your term query, while goldpgl does.
A match query, as explained in the official documentation, is analyzed: the same analyzer that was applied at index time is applied to the search term. Hence GoldPGL as well as goldpgl is converted to goldpgl, and both queries match the document. The same holds for Gold - PGL, which also matches; I verified this.
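To make the difference concrete, here is a toy model in Python of an inverted index with a term lookup and a match lookup. It is only a sketch: the analyze function is a rough stand-in for the standard analyzer (split on non-alphanumerics, lowercase), and all function names are made up for illustration rather than taken from Elasticsearch.

```python
import re

def analyze(text):
    """Rough stand-in for the standard analyzer: split on non-alphanumerics, lowercase."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]

def build_index(docs):
    """Map each indexed token to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for token in analyze(text):
            index.setdefault(token, set()).add(doc_id)
    return index

def term_query(index, value):
    # A term query looks the value up exactly as given, with no analysis.
    return index.get(value, set())

def match_query(index, text):
    # A match query analyzes the input first, then looks up each token (OR semantics).
    hits = set()
    for token in analyze(text):
        hits |= index.get(token, set())
    return hits

docs = {1: "GoldPGL", 2: "Gold - PGL"}
index = build_index(docs)
print(term_query(index, "GoldPGL"))      # no hit: the indexed token is "goldpgl"
print(term_query(index, "goldpgl"))      # hits document 1
print(match_query(index, "GoldPGL"))     # hits document 1
print(match_query(index, "Gold - PGL"))  # hits document 2
```

The term lookup fails for "GoldPGL" only because the index never contains a token with that casing; the match lookup succeeds because it lowercases the input the same way the index did.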
The Analyze API comes in very handy for troubleshooting these types of issues, where the search query doesn't match the indexed tokens. An example of how GOLDPGL would be analyzed is shown below:
POST /_analyze
{
"text": "GOLDPGL",
"analyzer" : "standard"
}
{ "token": "goldpgl" }
{
"text": "GOLD - PGL",
"analyzer" : "standard"
}
{
"token": "gold",
"start_offset": 0,
"end_offset": 4,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "pgl",
"start_offset": 7,
"end_offset": 10,
"type": "<ALPHANUM>",
"position": 1
}
I reproduced your issue and, as I am not familiar with NEST, I am showing your example using the REST API.
Index Def
POST /
{
"mappings": {
"properties": {
"product": {
"type": "text"
}
}
}
}
Index some documents
POST //_doc/1
{
"product": "GoldPGL"
}
Index 2nd doc
{
"product": "Gold - PGL"
}
Now a search using the term query (as shown in your example) doesn't return any result when GoldPGL is used:
{
"query": {
"term": {
"product": {
"value": "GoldPGL"
}
}
}
}
When goldpgl is used, it returns a result:
{
"query": {
"term": {
"product": {
"value": "goldpgl"
}
}
}
}
Result
"hits": [
{
"_index": "so-term-nest",
"_type": "_doc",
"_id": "1",
"_score": 0.8025915,
"_source": {
"product": "GoldPGL"
}
}
]
Solution (use match query)
{
"query": {
"match" : {
"product" : {
"query" : "GoldPGL"
}
}
}
}
and this returns results
"hits": [
{
"_index": "so-term-nest",
"_type": "_doc",
"_id": "1",
"_score": 0.8025915,
"_source": {
"product": "GoldPGL"
}
}
]

How to create query in Elastic4s

I'm implementing a query with the Elastic4s library, but I don't know how to implement the following JSON query for Elasticsearch.
{
"bool": {
"must": [
{
"match_all": {}
},
{
"keywordQuery": "hogehoge"
}
]
}
}
I don't know how to implement this part of the JSON query.
{
"keywordQuery": "hogehoge"
}
This is the code I have implemented so far.
boolQuery().must(Seq(matchAllQuery(), query("{keywordQuery: hogehoge}")))
and this is the output of the above code.
{
"bool": {
"must": [
{
"match_all": {}
},
{ "queryString": {
"query": "{keywordQuery": "hogehoge}"
}
}
]
}
}
I expect
{
"keywordQuery": "hogehoge"
}
but actually
{ "queryString": {
"query": "{keywordQuery": "hogehoge}"
}
}
Would you help me please?
I can't find a reference to keywordQuery in the Elasticsearch DSL documentation at https://www.elastic.co/guide/en/elasticsearch/reference/6.8/query-dsl.html or https://www.elastic.co/guide/en/elasticsearch/reference/master/query-dsl.html; maybe you need a term query?
(On Logstash indices, for example, 'text' fields have a non-analysed subfield called '.keyword', so when I do a "keyword query" I normally use termQuery("field.keyword", "value").)
I don't think you need to include matchAllQuery(), as it's implied that you start off with the full set of results, so you could drop the bool and simplify the query to:
{
"query": {
"term": {
"field.keyword": "value"
}
}
}
In Elastic4s this would be:
client.execute {
termQuery("field.keyword", "value")
}

Forbid usage of a specific index for the query

I have a mongodb collection with the following schema:
{
"description": "some arbitrary text",
"type": "TYPE", # there are a lot of different types
"status": "STATUS" # there are a few different statuses
}
I also have two indexes: for "type" and for "status".
Now I run a query:
db.obj.count({
type: { $in: ["SOME_TYPE"] },
status: { $ne: "SOME_STATUS" },
description: { $regex: /.*/ }
})
MongoDB chooses to use an index for "status", while "type" would be much better:
"query": {
"count": "obj",
"query": {
"description": Regex('.*', 2),
"status": {
"$ne": "SOME_STATUS"
},
"type": {
"$in": [
"SOME_TYPE"
]
}
}
},
"planSummary": "IXSCAN { status: 1 }"
I know I can use hint to specify an index to use, but I have different queries (which should use different indexes) and I can't annotate every one of them.
As far as I can see, a possible solution would be to forbid usage of "status" index for all queries that contain status: { $ne: "SOME_STATUS" } condition.
Is there a way to do it? Or maybe I want something weird and there is a better way?

Advanced elasticsearch query

I am using Laravel 4.2, MongoDB and Elasticsearch. Below is working code; I am trying to convert these advanced where queries to Elasticsearch queries:
$products = Product::where(function ($query) {
$query->where (function($subquery1){
$subquery1->where('status', '=', 'discontinued')->where('inventory', '>', 0);
});
$query->orWhere (function($subquery2){
$subquery2->where('status', '<>', 'discontinued');
});
})->get();
All I can get so far is a query that returns only discontinued products. The code below works, but it is not what I need:
$must = [
['bool' =>
['should' =>
['term' =>
['status' => 'discontinued']
]
]
]
];
Can you show me how I can achieve the same query I first described above, but in Elasticsearch? I want to return discontinued products with inventory, and also return products that are not discontinued.
The WHERE query you've described can be expressed in SQL like this:
... WHERE (status = 'discontinued' AND inventory > 0)
OR status <> 'discontinued'
In Elasticsearch Query DSL, this can be expressed like this:
{
"query": {
"filtered": {
"filter": {
"bool": {
"should": [
{
"bool": {
"must": [
{
"term": {
"status": "discontinued"
}
},
{
"range": {
"inventory": {
"gt": 0
}
}
}
]
}
},
{
"bool": {
"must_not": [
{
"term": {
"status": "discontinued"
}
}
]
}
}
]
}
}
}
}
}
Translating this query into PHP should now be straightforward. Give it a try.
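As a sanity check on the boolean logic, the two "should" clauses can be mirrored as a plain predicate. This is only a sketch in Python with made-up sample products; it does not talk to Elasticsearch and the function name matches is hypothetical.

```python
def matches(product):
    """Mirror of the bool filter: should = [must(status, inventory > 0), must_not(status)]."""
    discontinued = product["status"] == "discontinued"
    in_stock_discontinued = discontinued and product["inventory"] > 0  # first "should" clause
    not_discontinued = not discontinued                                # second "should" clause
    return in_stock_discontinued or not_discontinued

# Hypothetical sample data, one product per interesting case.
products = [
    {"name": "a", "status": "discontinued", "inventory": 3},
    {"name": "b", "status": "discontinued", "inventory": 0},
    {"name": "c", "status": "active", "inventory": 0},
]
selected = [p["name"] for p in products if matches(p)]
print(selected)
```

Only the discontinued product with no inventory is excluded, which is the behaviour the original Laravel query asks for.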

Are there restrictions on MongoDB collections property names?

I have a document structure which contains a property named shares, which is an array of objects.
Now I tried to match all documents where shares contains the matching _account string, using dot notation (shares._account).
It's not working, and it seems to be because of the _ character in front of the property name _account.
If I put the string to search for inside the name property of that object, everything works fine with dot notation.
Are there any limitations on property names?
I thought an _ was allowed, because the id also has one in MongoDB, and for me it's a kind of convention to declare bindings.
Example:
// Collection Item example
{
"_account": { "$oid" : "526fd2a571e1e13b4100000c" },
"_id": { "$oid" : "5279456932db6adb60000003" },
"name": "shared.jpg",
"path": "/upload/24795-4ui95s.jpg",
"preview": "/img/thumbs/24795-4ui95s.jpg",
"shared": false,
"shares": [
{
"name": "526fcb177675f27140000001",
"_account": "526fcb177675f27140000001"
},
{
"name": "tim",
"_account": "526fd29871e1e13b4100000b"
}
],
"tags": [
"grüngelb",
"farbe"
],
"type": "image/jpeg"
},
I tried to get the item with following query:
// Query example
{
"$or": [
{
"$and": [
{
"type": {
"$in": ["image/jpeg"]
}
}, {
"shares._account": "526fcb177675f27140000001" // Not working
//"shares.name": "526fcb177675f27140000001" // Working
}
]
}
]
}
Apart from the fact that $and can be omitted and $or is pointless, "image/jpeg" != "image/jpg":
db.foo.find({
"type": {"$in": ["image/jpeg"]},
"shares._account": "526fcb177675f27140000002"
})
Or, if you really want the old one:
db.foo.find({
"$or": [
{
"$and": [
{
"type": {
"$in": ["image/jpeg"]
}
}, {
"shares._account": "526fcb177675f27140000002"
}
]
}
]
})
Both will return the example document.
Your current query has some unnecessarily complicated constructs:
you don't need the $or and $and clauses ("and" is the default behaviour)
you are matching a single value using $in
The query won't match the sample document because your query doesn't match the data:
the type field you are looking for is "image/jpg" but your sample document has "image/jpeg"
the shares._account value you are looking for is "526fcb177675f27140000001" but your sample document doesn't include this.
A simplified query that should work is:
db.shares.find(
{
"type": "image/jpeg",
"shares._account": "526fcb177675f27140000002"
}
)
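To illustrate that a leading underscore is not special in dot notation, here is a simplified matcher in Python. It is a toy model written for this example, not MongoDB's actual matching code: the function name dot_match is made up, and it only handles equality on nested documents and arrays.

```python
def dot_match(doc, path, value):
    """Simplified dot-notation equality match that descends into arrays of subdocuments."""
    key, _, rest = path.partition(".")
    if isinstance(doc, list):
        # An array matches if any of its elements matches the remaining path.
        return any(dot_match(item, path, value) for item in doc)
    if not isinstance(doc, dict) or key not in doc:
        return False
    child = doc[key]
    if rest:
        return dot_match(child, rest, value)
    # Leaf: equal directly, or contained in an array of scalars.
    return child == value or (isinstance(child, list) and value in child)

# Trimmed copy of the sample item from the question.
item = {
    "type": "image/jpeg",
    "shares": [
        {"name": "526fcb177675f27140000001", "_account": "526fcb177675f27140000001"},
        {"name": "tim", "_account": "526fd29871e1e13b4100000b"},
    ],
}
print(dot_match(item, "shares._account", "526fcb177675f27140000001"))
print(dot_match(item, "shares._account", "526fcb177675f27140000002"))
```

In this model "shares._account" is resolved exactly like "shares.name"; whether a document matches depends only on the value being searched for, not on the underscore in the field name.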