Elastic Search : Configuring icu_tokenizer for czech characters

Elastic Search : Configuring icu_tokenizer for czech characters - unicode

The icu_tokenizer in elasticsearch seems to break a word into segments when it encounters accented characters such as Č and also returns strange numeric tokes. Example
GET /_analyze?text=OBČERSTVENÍ&tokenizer=icu_tokenizer
returns
"tokens": [
{
"token": "OB",
"start_offset": 0,
"end_offset": 2,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "268",
"start_offset": 4,
"end_offset": 7,
"type": "<NUM>",
"position": 2
},
{
"token": "ERSTVEN",
"start_offset": 8,
"end_offset": 15,
"type": "<ALPHANUM>",
"position": 3
}
]
}
I don't know czech, but quick google suggests OBČERSTVENÍ is a single word. Is there way to configure elastic search to work properly for czech words?
I have tried using icu_noramlizer as below, but it didn't help
PUT /my_index_cz
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"char_filter": ["icu_normalizer"],
"tokenizer": "icu_tokenizer"
}
}
}
}
}
GET /my_index_cz/_analyze?text=OBČERSTVENÍ&analyzer=my_analyzer

The issue was I using elasticsearch sense plugin to query this and it was not encoding the data properly. It worked fine when I wrote a test using python client library.

Related

Highlight characters within words in Opensearch query

I have set up a custom analyser that uses an edge_ngram filter for a text field. I'm then trying to highlight the characters a user types but opensearch is highlighting the entire word, even if only a small number of characters have been typed.
E.g. Typing "Man" in the search bar will result in the word "Manly" being highlighted. <em>Manly</em> Trail Running Tour. What I really want is <em>Man</em>ly Trail Running Tour.
This should be possible with the fvh highlighting type and chars as the boundary_scanner argument per the docs https://opensearch.org/docs/2.1/opensearch/search/highlight/#highlighting-options
Settings
"title_autocomplete": {
"type": "text",
"term_vector": "with_positions_offsets",
"analyzer": "autocomplete"
}
"analysis": {
"filter": {
"edge_ngram_filter": {
"type": "edge_ngram",
"min_gram": "3",
"max_gram": "20"
}
},
"analyzer": {
"autocomplete": {
"filter": [
"lowercase",
"edge_ngram_filter"
],
"type": "custom",
"tokenizer": "standard"
}
}
}
Query:
{
"track_total_hits": true,
"highlight": {
"type": "fvh",
"boundary_scanner": "chars",
"fields": {
"title_autocomplete": {}
}
},
"size": 6,
"query": {
"multi_match": {
"query": "Man",
"fields": [
"title_autocomplete^2"
]
}
}
}

Loopback 3 query by Property of a embedded model

I'm using loopback 3 to build a backend with mongoDB.
So i have 2 models: Object and Attachment. Object have a relation Embeds2Many to Attachment.
Objects look like that in mongoDB
[
{
"fieldA": "valueA1",
"attachments": [
{
"id": 1,
"url": "abc.com/image1"
},
{
"id": 2,
"url": "abc.com/image2"
}
]
},
{
"fieldA": "valueA2",
"attachments": [
{
"id": 4,
"url": "abc.com/image4"
},
{
"id": 5,
"url": "abc.com/image5"
}
]
}
]
The question is: how can i get Objects with attachments.id=4 over the RestAPI?
I have try with the where and include filter. But it didn't work. It look like, that this function is not implemented in loopback3, right?

I have found the solution. It only works on Mongodb, Cloudant and Memory database.
{
"filter": {
"where": {
"attachments.id": 4
}
}
}

Why it needs to convert `{"$numberInt": "1"}` to `1` before restoring data with `mongorestore`?

I have some dumped BSON and JSON files from a MongoDB server running on Google Cloud Platform(GCP) and I want to restore the data into a new local server with version 4.0.3. However, I got errors showing that the indices cannot be restored. I had to convert {"$numberInt": "1"} in the JSON files to 1 to make the restoring process success. Why I need to take effort to fix the format of the dumped files. Is it due to the different versions between the source server and the target server or due to some things I did not do correctly?
I have googled and searched stack overflow, but I did not see any one discussed this problem. And the release note of MongoDB does not mention any changes related to this problem.
Here is the JSON example cannot accept by mongorestore with version 4.0.3
{
"options": {},
"indexes": [
{
"v": {
"$numberInt": "2"
},
"key": {
"_id": {
"$numberInt": "1"
}
},
"name": "_id_",
"ns": "demo.item"
},
{
"v": {
"$numberInt": "2"
},
"key": {
"itemId": {
"$numberDouble": "1.0"
}
},
"name": "itemId_1",
"ns": "demo.item"
}
],
"uuid": "8ce4755612da4d048b0fd38a793f2b55"
}
And this is the accepted one which is converted on my own.
{
"options": {},
"indexes": [
{
"v": 2,
"key": {
"_id": 1
},
"name": "_id_",
"ns": "demo.item"
},
{
"v": 2,
"key": {
"itemId": 1.0
},
"name": "itemId_1",
"ns": "demo.item"
}
],
"uuid": "8ce4755612da4d048b0fd38a793f2b55"
}
And here is the script I use to do the conversion.
Questions:
Why mongorestore does not accept the dumped file created by mongodump?
Is there any method for avoiding from modifying the dumped files manually?

You need to use mongorestore version 4.2+ that supports Extended JSON v2.0 (Canonical mode or Relaxed) format. See reference here.

Can not create new layer (featuretype) in GeoServer using REST API

So I just used 2 working days trying to figure this out. We are automatic rendering process for maps. All the data is given in SQL base and my job is to write "wrapper" so we can implement this in our in-house framework. I managed all but one needed requests.
That request is POST featuretype since this is a way of creating a layer that can later be rendered.
I have all requests saved in postman for pre-testing on example data given by geoserver itself. I can't even get response with status code 201 and always get 500 internal server error. This status is described as possible syntax error in sytax. But I actually just copied and pasted exampled and used geoserver provided data.
This is the requst: http://127.0.0.1:8080/geoserver/rest/workspaces/tiger/datastores/nyc/featuretypes
and its body:
{
"name": "poi",
"nativeName": "poi",
"namespace": {
"name": "tiger",
"href": "http://localhost:8080/geoserver/rest/namespaces/tiger.json"
},
"title": "Manhattan (NY) points of interest",
"abstract": "Points of interest in New York, New York (on Manhattan). One of the attributes contains the name of a file with a picture of the point of interest.",
"keywords": {
"string": [
"poi",
"Manhattan",
"DS_poi",
"points_of_interest",
"sampleKeyword\\#language=ab\\;",
"area of effect\\#language=bg\\;\\#vocabulary=technical\\;",
"Привет\\#language=ru\\;\\#vocabulary=friendly\\;"
]
},
"metadataLinks": {
"metadataLink": [
{
"type": "text/plain",
"metadataType": "FGDC",
"content": "www.google.com"
}
]
},
"dataLinks": {
"org.geoserver.catalog.impl.DataLinkInfoImpl": [
{
"type": "text/plain",
"content": "http://www.google.com"
}
]
},
"nativeCRS": "GEOGCS[\"WGS 84\", \n DATUM[\"World Geodetic System 1984\", \n SPHEROID[\"WGS 84\", 6378137.0, 298.257223563, AUTHORITY[\"EPSG\",\"7030\"]], \n AUTHORITY[\"EPSG\",\"6326\"]], \n PRIMEM[\"Greenwich\", 0.0, AUTHORITY[\"EPSG\",\"8901\"]], \n UNIT[\"degree\", 0.017453292519943295], \n AXIS[\"Geodetic longitude\", EAST], \n AXIS[\"Geodetic latitude\", NORTH], \n AUTHORITY[\"EPSG\",\"4326\"]]",
"srs": "EPSG:4326",
"nativeBoundingBox": {
"minx": -74.0118315772888,
"maxx": -74.00153046439813,
"miny": 40.70754683896324,
"maxy": 40.719885123828675,
"crs": "EPSG:4326"
},
"latLonBoundingBox": {
"minx": -74.0118315772888,
"maxx": -74.00857344353275,
"miny": 40.70754683896324,
"maxy": 40.711945649065406,
"crs": "EPSG:4326"
},
"projectionPolicy": "REPROJECT_TO_DECLARED",
"enabled": true,
"metadata": {
"entry": [
{
"#key": "kml.regionateStrategy",
"$": "external-sorting"
},
{
"#key": "kml.regionateFeatureLimit",
"$": "15"
},
{
"#key": "cacheAgeMax",
"$": "3000"
},
{
"#key": "cachingEnabled",
"$": "true"
},
{
"#key": "kml.regionateAttribute",
"$": "NAME"
},
{
"#key": "indexingEnabled",
"$": "false"
},
{
"#key": "dirName",
"$": "DS_poi_poi"
}
]
},
"store": {
"#class": "dataStore",
"name": "tiger:nyc",
"href": "http://localhost:8080/geoserver/rest/workspaces/tiger/datastores/nyc.json"
},
"cqlFilter": "INCLUDE",
"maxFeatures": 100,
"numDecimals": 6,
"responseSRS": {
"string": [
4326
]
},
"overridingServiceSRS": true,
"skipNumberMatched": true,
"circularArcPresent": true,
"linearizationTolerance": 10,
"attributes": {
"attribute": [
{
"name": "the_geom",
"minOccurs": 0,
"maxOccurs": 1,
"nillable": true,
"binding": "com.vividsolutions.jts.geom.Point"
},
{},
{},
{}
]
}
}
So it is example case and I can't get any useful response from the server. I get the code 500 with body name (the first item in json). Similarly I get same code with body FeatureTypeInfo when trying with xml body(first tag).
I already tried the request in new instance of geoserver in Docker (changed the port) and still no success.
I check if datastore, workspace is available and that layer "poi" doesn't yet exists.
Here are also some logs of request (similar for xml body):
2018-08-03 07:35:02,198 ERROR [geoserver.rest] -
com.thoughtworks.xstream.mapper.CannotResolveClassException: name at
com.thoughtworks.xstream.mapper.DefaultMapper.realClass(DefaultMapper.java:79)
at .....
Does anyone know the solution to this and got it working. I am using GeoServer 2.13.1

So i was still looking for the answer and using this post (https://gis.stackexchange.com/questions/12970/create-a-layer-in-geoserver-using-rest) got to the right content to POST featureType and hence creating a layer in GeoServer.
The documentation is off in REST API docs.
Using above link I found out that when using JSON there is a missing insertion in JSON. For API to work here we need to add:
{featureType:
name: "...",
nativeName: "...",
.
.
.}
So that it doesn't start with "name" attribute but it is contained in "featureType".
I didn't try that for XML also but I guess it could be similar.
Hope this helps someone out there struggling like I did.

Blaz is correct here, you need an outer object of FeatureType and then an inner object with your config. So;
{
"featureType": {
"name": "layer",
"nativeName": "poi",
"your config": "stuff"
}
I find though that using a post request I get very little if any response and its not obvious if the layer creation worked. But you can call http://IP:8080/geoserver/rest/layers.json to check if your new layer is there.

It costs me a lot of time to create FeatureTypes using REST API. Use Json like this really works:
{
"featureType": {
"name": "layer",
"nativeName": "poi"
"otherProperties...":"values..."
}
And use Json below to create Workspace:
{
"workspace": {
"name": "test_workspace"
}
}
The REST API is out of date now. That's disappointing. Is there anyone knows how to get the lastest REST API document?

Using unicode characters in Elasticsearch synonyms

I am trying to setup elasticsearch index using synonyms and almost succeeded it. My configuration of index:
{
"index": {
"analysis": {
"analyzer": {
"syns": {
"filter": [
"standard",
"lowercase",
"syns_filter"
],
"type": "custom",
"tokenizer": "standard"
}
},
"filter": {
"syns_filter": {
"type": "synonym",
"synonyms": ["Киев , Kyiv", "jee,java"],
}
}
}
}
}
Only thing I could not solve is that it worked for jee and searches result output same results as for java, but does not work for Kyiv.