Using unicode characters in Elasticsearch synonyms

I am trying to set up an Elasticsearch index that uses synonyms, and I have almost succeeded. My index configuration:
{
"index": {
"analysis": {
"analyzer": {
"syns": {
"filter": [
"standard",
"lowercase",
"syns_filter"
],
"type": "custom",
"tokenizer": "standard"
}
},
"filter": {
"syns_filter": {
"type": "synonym",
"synonyms": ["Киев , Kyiv", "jee,java"]
}
}
}
}
}
The only thing I could not solve is that it works for jee, where searches return the same results as for java, but it does not work for Kyiv.
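One possible explanation, offered as an assumption rather than a confirmed diagnosis: the lowercase filter runs before syns_filter, so the synonym filter only ever sees lowercased tokens such as kyiv, while the rule is written with capitals (Киев, Kyiv). If the rules are compared literally, jee,java matches and the capitalized rule never does. The Python sketch below only simulates that ordering; build_synonym_map and analyze are hypothetical helpers, not Elasticsearch APIs.

```python
def build_synonym_map(rules):
    """Map every term of a comma-separated rule to the full set of its synonyms."""
    mapping = {}
    for rule in rules:
        terms = [term.strip() for term in rule.split(",")]
        for term in terms:
            mapping[term] = set(terms)
    return mapping


def analyze(text, mapping):
    """Whitespace-split, lowercase, then expand synonyms (the order used in the question)."""
    expanded = []
    for token in text.split():
        token = token.lower()                       # "lowercase" filter runs first
        expanded.append(mapping.get(token, {token}))  # literal lookup, as assumed above
    return expanded


as_written = build_synonym_map(["Киев , Kyiv", "jee,java"])
lowercased = build_synonym_map(["киев , kyiv", "jee,java"])

print(analyze("JEE", as_written))    # the lowercase rule matches
print(analyze("Kyiv", as_written))   # the capitalized rule is never hit
print(analyze("Kyiv", lowercased))   # a lowercase rule would be hit
```

If this is the cause, writing the rule in lowercase would be the first thing to try. Whether synonym rules are normalized automatically depends on the Elasticsearch version, so treat this as a hypothesis to test.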

Highlight characters within words in Opensearch query

I have set up a custom analyser that uses an edge_ngram filter for a text field. I'm then trying to highlight only the characters a user types, but OpenSearch highlights the entire word, even if only a few characters have been typed.
For example, typing "Man" in the search bar results in the whole word "Manly" being highlighted: <em>Manly</em> Trail Running Tour. What I really want is <em>Man</em>ly Trail Running Tour.
This should be possible with the fvh highlighter type and chars as the boundary_scanner argument, per the docs: https://opensearch.org/docs/2.1/opensearch/search/highlight/#highlighting-options
Settings
"title_autocomplete": {
"type": "text",
"term_vector": "with_positions_offsets",
"analyzer": "autocomplete"
}
"analysis": {
"filter": {
"edge_ngram_filter": {
"type": "edge_ngram",
"min_gram": "3",
"max_gram": "20"
}
},
"analyzer": {
"autocomplete": {
"filter": [
"lowercase",
"edge_ngram_filter"
],
"type": "custom",
"tokenizer": "standard"
}
}
}
Query:
{
"track_total_hits": true,
"highlight": {
"type": "fvh",
"boundary_scanner": "chars",
"fields": {
"title_autocomplete": {}
}
},
"size": 6,
"query": {
"multi_match": {
"query": "Man",
"fields": [
"title_autocomplete^2"
]
}
}
}
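A possible reason, stated as an assumption and not verified against this cluster: an edge_ngram token filter emits every prefix of a word but may keep the start and end offsets of the original token, so an offset-based highlighter can only mark the whole word. An edge_ngram tokenizer, by contrast, would record offsets per gram. The Python sketch below simulates both cases; edge_ngrams and highlight are hypothetical helpers, not OpenSearch code.

```python
def edge_ngrams(text, min_gram=3, max_gram=20, per_gram_offsets=False):
    """Return (term, start, end) tuples for every edge n-gram of every word."""
    tokens = []
    cursor = 0
    for word in text.split():
        start = text.index(word, cursor)
        end = start + len(word)
        cursor = end
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            # Filter-like behaviour keeps the whole word's offsets;
            # tokenizer-like behaviour records the gram's own end offset.
            gram_end = start + n if per_gram_offsets else end
            tokens.append((word[:n].lower(), start, gram_end))
    return tokens


def highlight(text, query, tokens):
    """Wrap the span of the first token equal to the query in <em> tags."""
    for term, start, end in tokens:
        if term == query.lower():
            return text[:start] + "<em>" + text[start:end] + "</em>" + text[end:]
    return text


title = "Manly Trail Running Tour"
print(highlight(title, "Man", edge_ngrams(title)))
print(highlight(title, "Man", edge_ngrams(title, per_gram_offsets=True)))
```

If the assumption holds, moving the n-gram step from the filter into the tokenizer is one direction worth testing.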

Gentics Mesh - Multilanguage support - Cross language in a list of node - GraphQL query

Gentics Mesh Version : v1.5.1
Intro:
Let's suppose we have a schema A with a field of type list, list type node, and allowed schema B (see (1)).
A node of schema B has been created in language en (b1-EN) and in language de (b1-DE).
Another node of schema B has been created in language en only (b2-EN).
A node of schema A has been created in language de (a1-DE), and b1-DE and b2-EN have been added to its node list (Bs).
As a result, when the de language is selected in the Gentics Mesh CMS, node a1-DE shows a list of two nodes: b1-DE and b2-EN.
When the following GraphQL query is run:
{
node(path: "/a1-DE") {
... on A {
path
uuid
availableLanguages
fields {
Bs {
... on B {
path
fields {
id
}
}
}
}
}
}
}
The result is :
{
"data": {
"node": {
"path": "/a1-DE",
"uuid": "30dfd534cdee40dd8551e6322c6b1518",
"availableLanguages": [
"de"
],
"fields": {
"Bs": [
{
"path": "/b1-DE",
"fields": {
"id": "b1-DE"
}
},
{
"path": null,
"fields": null
}
]
}
}
}
}
Question:
Why does the result not show the b2-EN node in the list of nodes? Is the query wrong? What I would like to get is the default-language version of the node (b2-EN), because b2-DE does not exist yet. So the expected result is:
{
"data": {
"node": {
"path": "/a1-DE",
"uuid": "30dfd534cdee40dd8551e6322c6b1518",
"availableLanguages": [
"de"
],
"fields": {
"Bs": [
{
"path": "/b1-DE",
"fields": {
"id": "b1-DE"
}
},
{
"path": "/b2-EN",
"fields": {
"id": "b2-EN"
}
}
]
}
}
}
}
In the documentation (2):
The fallback to the configured default language will be applied if no other matching content found be found. Null will be returned if this also fails.
Can someone enlighten me ?
(1): Schema
{
"name": "A",
"container": false,
"autoPurge": false,
"displayField": "id",
"segmentField": "id",
"urlFields": [
"id"
],
"fields": [
{
"name": "Bs",
"type": "list",
"label": "Bs",
"required": false,
"listType": "node",
"allow": [
"B"
]
},
{
"name": "id",
"type": "string",
"label": "id",
"required": true
}
]
}
(2) https://getmesh.io/docs/graphql/#_multilanguage_support
There are some known issues and inconsistent behaviour when loading nodes via GraphQL. See this issue: https://github.com/gentics/mesh/issues/971
In your case, the nodes in the queried list are always loaded in the configured default language (set in mesh.yml), which seems to be de. This is why the English-only node yields no result.
Until this is fixed, you can work around the issue by loading all languages of the nodes in the list:
{
node(path: "/a1-DE") {
... on A {
path
uuid
availableLanguages
fields {
Bs {
... on B {
languages {
path
language
fields {
id
}
}
}
}
}
}
}
}
You will get the contents of all languages of the node list. This means that you will have to filter for the desired language in your code after receiving the response.
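The client-side filtering could look roughly like the Python sketch below. It assumes the response has the shape produced by the workaround query (a languages list per node, each entry with path, language and fields). The sample data and the helper names pick_language and resolve_list are hypothetical, as is the /b1-EN path.

```python
def pick_language(node, preferred, fallback):
    """Return the preferred language version of a node, else the fallback, else None."""
    versions = {v["language"]: v for v in (node.get("languages") or [])}
    return versions.get(preferred) or versions.get(fallback)


def resolve_list(response, field, preferred, fallback):
    """Pick one language version for every node in a node-list field."""
    nodes = response["data"]["node"]["fields"][field]
    return [pick_language(node, preferred, fallback) for node in nodes]


# Hypothetical response, modelled on the nodes described in the question.
sample = {
    "data": {"node": {"path": "/a1-DE", "fields": {"Bs": [
        {"languages": [
            {"path": "/b1-EN", "language": "en", "fields": {"id": "b1-EN"}},
            {"path": "/b1-DE", "language": "de", "fields": {"id": "b1-DE"}},
        ]},
        {"languages": [
            {"path": "/b2-EN", "language": "en", "fields": {"id": "b2-EN"}},
        ]},
    ]}}}
}

resolved = resolve_list(sample, "Bs", preferred="de", fallback="en")
paths = [version["path"] for version in resolved if version]
print(paths)  # ['/b1-DE', '/b2-EN']
```

Keeping the fallback language as an explicit argument means the code does not depend on what mesh.yml declares as the default.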

JSON Schema - can array / list validation be combined with anyOf?

I have a json document I'm trying to validate with this form:
...
"products": [{
"prop1": "foo",
"prop2": "bar"
}, {
"prop3": "hello",
"prop4": "world"
},
...
There are multiple different forms an object may take. My schema looks like this:
...
"definitions": {
"products": {
"type": "array",
"items": { "$ref": "#/definitions/Product" }
},
"Product": {
"type": "object",
"oneOf": [
{ "$ref": "#/definitions/Product_Type1" },
{ "$ref": "#/definitions/Product_Type2" },
...
]
},
"Product_Type1": {
"type": "object",
"properties": {
"prop1": { "type": "string" },
"prop2": { "type": "string" }
}
},
"Product_Type2": {
"type": "object",
"properties": {
"prop3": { "type": "string" },
"prop4": { "type": "string" }
}
}
...
On top of this, certain properties of the individual product objects in the array may be indirected via further use of anyOf or oneOf.
I'm running into issues in VSCode using the built-in schema validation, which throws errors for every item in the products array that doesn't match Product_Type1.
So it seems the validator latches onto the first oneOf entry it finds and won't validate against any of the other types.
I didn't find any limitations to the oneOf mechanism on json-schema.org, and there is no mention of it being used on the page that deals specifically with arrays: https://json-schema.org/understanding-json-schema/reference/array.html
Is what I'm attempting possible?
Your general approach is fine. Let's take a slightly simpler example to illustrate what's going wrong.
Given this schema
{
"oneOf": [
{ "properties": { "foo": { "type": "integer" } } },
{ "properties": { "bar": { "type": "integer" } } }
]
}
And this instance
{ "foo": 42 }
At first glance, this looks like it matches /oneOf/0 and not /oneOf/1. It actually matches both schemas, which violates the one-and-only-one constraint imposed by oneOf, so the oneOf fails.
Remember that every keyword in JSON Schema is a constraint. Anything that is not explicitly excluded by the schema is allowed. There is nothing in the /oneOf/1 schema that says a "foo" property is not allowed. Nor does it say that "bar" is required. It only says that if the instance has a property "bar", then it must be an integer.
To fix this, you will need required and maybe additionalProperties, depending on the situation. I show here how you would use additionalProperties, but I recommend you don't use it unless you need to, because it does have some problematic properties.
{
"oneOf": [
{
"properties": { "foo": { "type": "integer" } },
"required": ["foo"],
"additionalProperties": false
},
{
"properties": { "bar": { "type": "integer" } },
"required": ["bar"],
"additionalProperties": false
}
]
}
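To make the counting behind oneOf concrete, here is a small Python sketch. It is a hand-rolled check covering only the keywords used in this answer (properties with integer types, required, additionalProperties: false), not a real JSON Schema validator. The helpers matches and one_of_hits are hypothetical names.

```python
def matches(subschema, instance):
    """Check an object against the few keywords used in this answer."""
    props = subschema.get("properties", {})
    for name, rule in props.items():
        # "properties" only constrains a property if it is present.
        if name in instance and rule.get("type") == "integer":
            value = instance[name]
            if isinstance(value, bool) or not isinstance(value, int):
                return False
    for name in subschema.get("required", []):
        if name not in instance:
            return False
    if subschema.get("additionalProperties") is False:
        if any(name not in props for name in instance):
            return False
    return True


def one_of_hits(schema, instance):
    """Return the indexes of the oneOf branches the instance satisfies."""
    return [i for i, sub in enumerate(schema["oneOf"]) if matches(sub, instance)]


loose = {"oneOf": [
    {"properties": {"foo": {"type": "integer"}}},
    {"properties": {"bar": {"type": "integer"}}},
]}
with_required = {"oneOf": [
    {"properties": {"foo": {"type": "integer"}}, "required": ["foo"]},
    {"properties": {"bar": {"type": "integer"}}, "required": ["bar"]},
]}

instance = {"foo": 42}
print(one_of_hits(loose, instance))          # [0, 1] -> two matches, oneOf fails
print(one_of_hits(with_required, instance))  # [0]    -> exactly one match, oneOf passes
```

In this small case required alone is enough to make the branches mutually exclusive, which is why additionalProperties can often be left out.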

Elasticsearch ngram and PostgreSQL trigram search results do not match

I've created an index on Elasticsearch as shown below:
"settings" : {
"number_of_shards": 1,
"number_of_replicas": 0,
"analysis": {
"filter": {
"trigrams_filter": {
"type": "ngram",
"min_gram": 3,
"max_gram": 3
}
},
"analyzer": {
"trigrams": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"trigrams_filter"
]
}
}
}
},
"mappings": {
"issue": {
"properties": {
"description": {
"type": "string",
"analyzer": "trigrams"
}
}
}
}
My test items are below:
"alici onay verdi basarili satisiniz gerceklesti diyor ama hesabima para transferi gerceklesmemis"
"otomatik onay işlemi gecikmiş"
"************* nolu iade islemi urun kargoya verilmedi zamaninda iade islemlerinde urun erorr hata veriyor"
I've tested this index with the query below:
GET issue/_search
{
"query": {
"match": {
"description":{
"query": "otomatik onay istemi zamaninda gerceklesmemis"
}
}
}
}
And result:
{
....
"hits": {
....
"max_score": 2.3507352,
"hits": [
{
....
"_score": 2.3507352,
"_source": {
"issue_id": "*******",
"description": "alici onay verdi basarili satisiniz gerceklesti diyor ama hesabima para transferi gerceklesmemis"
}
}
]
}
}
But the same data in PostgreSQL, queried with the SQL below, gives a different result:
SELECT
public.tbl_issue_descriptions_big.description,
similarity(description, 'otomatik onay islemi zamaninda gerceklesmemis') AS sml
FROM
public.tbl_issue_descriptions_big
WHERE
description %'otomatik onay islemi zamaninda gerceklesmemis'
ORDER BY
sml DESC
LIMIT 10
Result is:
description | sml
======================================================|======
otomatik onay islemi gecikmis |0,351852
What causes this difference?
I don't know enough about Postgres to give a qualified answer there (it also depends on the documents that are indexed and on whether the scoring formulas are exactly the same, which I doubt), but Elasticsearch has an explain API and an explain parameter in the search request that help you find out why a certain document was scored the way it was.
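To illustrate how different the two scoring models can be, here is a rough Python sketch of a pg_trgm-style similarity: the text is lowercased and split into words, each word is padded with two spaces in front and one behind, and the score is the number of shared trigrams divided by the number of distinct trigrams in both strings. This is an approximation written from the documented behaviour, not the PostgreSQL implementation; the handling of non-ASCII letters in particular depends on the database locale. An Elasticsearch match query ranks by a relevance formula over the matching n-gram tokens instead, so the two rankings need not agree.

```python
import re


def trigrams(text):
    """Set of padded, lowercased trigrams, one word at a time."""
    grams = set()
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "   # two leading spaces, one trailing space
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


# Test item as listed in the question (with diacritics) and the SQL query string.
document = "otomatik onay işlemi gecikmiş"
query = "otomatik onay islemi zamaninda gerceklesmemis"

score = similarity(document, query)
print(round(score, 6))  # 0.351852
```

Under these assumptions the sketch reproduces the value returned by PostgreSQL, which suggests the stored row still contains ş and that each ş costs a few shared trigrams.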

How to create google datastore composite indices via REST API?

I am trying to change the order of my results, but I keep getting an error saying "You need an index to execute this query."
In my console, it doesn't say that any indices exist, but I set most of the indexed options to true.
I know that in Java I can create indices over multiple properties, each ascending or descending. How do I do this with the REST API?
Following the REST API docs for Google Datastore, my entities are created like this:
{
"mode": "TRANSACTIONAL",
"transaction": "Eb2wksWfYDjkGkkABRmGMQ_vKGijwNwm-tbxAbUPRt8N2RaUCynjSbGT7jFQw3pgaDCT7U0drs3RTPLSIN8TQikdqkdl7pLm2rkMqORmKlO_I_dp",
"mutation": {
"insertAutoId": [
{
"key": {
"path": [
{
"kind": "Attendance"
}
]
},
"properties": {
"section": {
"indexed": true,
"stringValue": "Venturers"
},
"date": {
"dateTimeValue": "2015-01-16T00:00:00+00:00",
"indexed": true
},
"attendee": {
"indexed": true,
"keyValue": {
"path": [
{
"id": "5659313586569216",
"kind": "Attendee"
}
]
}
},
"presence": {
"indexed": false,
"integerValue": 0
}
}
}
]
}
}
And I am trying to query like this:
{
"gqlQuery": {
"allowLiteral": true,
"queryString": "SELECT * FROM Attendance WHERE section = @section ORDER BY date ASC",
"nameArgs": [
{
"name": "section",
"value": {
"stringValue": "Venturers"
}
}
]
}
}
And I get this error:
{
"error": {
"errors": [
{
"domain": "global",
"reason": "FAILED_PRECONDITION",
"message": "no matching index found.",
"locationType": "header",
"location": "If-Match"
}
],
"code": 412,
"message": "no matching index found."
}
}
For future reference:
You can't create a composite index directly through the REST API. You have to go through PHP App Engine.
How to build datastore indexes (PHP GAE)