How put index with mapping template to elastic search with elastic4s? - scala

I want create index with dynamic template and turn analyzing off for string fields. I have created query for elastic search, but how to translate it into elastic4s statments? (version elastic4s 1.3.x is preffered)
The statement is:
PUT /myIndex
{
"mappings": {
"myType": {
"dynamic_templates": [
{
"templateName": {
"match": "*",
"match_mapping_type": "string",
"mapping": {
"type": "string",
"index" : "not_analyzed",
"omit_norms" : true
}
}
}
]
}}}
P.S.
May be it is possible to create this index by executing this "raw" request, but I did not find how to do that with elastic4s 1.3.4 :(

Elastic4s (as of 1.5.4) supports dynamic templates when creating indexes. So you can do something like:
val req = create.index("my_index").mappings(
"my_type" templates (
template name "es" matching "*_es" matchMappingType "string" mapping {
field withType StringType analyzer SpanishLanguageAnalyzer
},
template name "en" matching "*" matchMappingType "string" mapping {
field withType StringType analyzer EnglishLanguageAnalyzer
}
)
)
So the equivalent of the example you posted would be:
create.index("my_index").mappings(
"my_type" templates (
template name "templateName" matching "*" matchMappingType "string" mapping {
field typed StringType index NotAnalyzed omitNorms true
}
)

Sometimes it's easier to manage your mapping in raw JSON. You can put the raw JSON on a file to make it possible to be updated without need to rebuild your application. If you want to use this raw JSON to create the index you can do something like this:
client.execute {
create index "myIndex" source rawMapping
}
where rawMapping is the string with your raw JSON content.

Related

How can I parse nested GenericRecord containing Union, Map, Array etc.?

I am having the genericRecord as well as the schema. I want to parse all the fields and append some text only to those values whose type is String.
Assume that schema is like this:
{
"type":"record",
"name":"MyRecord",
"fields":[
{
"name":"a",
"type":[
"null",
"int",
{
"type":"map",
"values":"string"
}
]
}
]
}
I don't want to write nested if else statements that I have already done and recursively called them. Is there any other efficient way ?

How do I use ADF copy activity with multiple rows in source?

I have source which is JSON array, sink is SQL server. When I use column mapping and see the code I can see mapping is done to first element of array so each run produces single record despite the fact that source has multiple records. How do I use copy activity to import ALL the rows?
"enableStaging": false,
"translator": {
"type": "TabularTranslator",
"schemaMapping": {
"['#odata.context']": "BuyerFinancing",
"['#odata.nextLink']": "PropertyCondition",
"value[0].AssociationFee": "AssociationFee",
"value[0].AssociationFeeFrequency": "AssociationFeeFrequency",
"value[0].AssociationName": "AssociationName",
Use * as the source field to indicate all elements in json format. For example, with json:
{
"results": [
{"field1": "valuea", "field2": "valueb"},
{"field1": "valuex", "field2": "valuey"}
]
}
and a database table with a column result to store the json. The mapping with results as the collection and * and the sub element will create two records with:
{"field1": "valuea", "field2": "valueb"}
{"field1": "valuex", "field2": "valuey"}
in the result field.
Copy Data Field Mapping
ADF support cross apply for json array. Please check the example in this doc. https://learn.microsoft.com/en-us/azure/data-factory/supported-file-formats-and-compression-codecs#jsonformat-example
For schema mapping: https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-schema-and-type-mapping#schema-mapping

index Elasticsearch document with existing "id" field

I have documents that I want to index into Elasticsearch with an existing unique "id" field.
I get an array of documents from a REST api endpoint ( eg.: http://some.url/api/products) in no particular order and if a document with the _id already exists in Elasticsearch it should update and reindex the document.
I want to create a new document if no document with the _id in Elasticsearch exists and then update a document, if it matches with an existing document in Elasticsearch.
This could be done with:
PUT products/product/un1qu3-1d-b718-105973677e95
{
"id": "un1qu3-1d-b718-105973677e95",
"state": "packaged"
}
The basic idea is to use the provided "id" field to create or update a document. Extraction of _id from document fields seems deprecated (link). But the indexing/ reindexing of documents with the "id" field can be done manually very easy with the kibana dev tools, with postman or a cURL request.
I want to achieve this (re-)indexing of documents that I receive over this api endpoint programmatically.
Is it possible to achieve this with logstash or a simple cronjob? Does Elasticsearch provide any functionality for this? Or do I need to write some custom backend to achieve this?
I thought of either:
1) index the document into Elasticsearch with the "id" field of my document or
2) find an Elasticsearch query that first searches for the document with the specific "id" field and then updates the document.
I was unable to find a solution for either way and have no clue how a good approach would look like.
Can anyone point me into the right direction on how to achieve this, suggest a better approach or provide a solution?
Any help much appreciated!
Update
I solved the problem with the help of the accepted answer. I used Logstash, the Http_poller input plugin, this article: https://www.elastic.co/blog/new-way-to-ingest-part-1 and this elastic.co question: https://discuss.elastic.co/t/upsert-with-logstash/59116
My output of logstash looks like this at the moment:
output {
elasticsearch {
index => "products"
document_type => "product"
pipeline => "rename_id"
document_id => "%{id}"
doc_as_upsert => true
action => "update"
}
Update 2
just for the sake of completeness I added the "rename_id" pipeline
{
"rename_id": {
"description": "_description",
"processors": [
{
"set": {
"field": "_id",
"value": "{{id}}"
}
}
]
}
}
It works this way!
Thanks alot!
Peter,
If I understand correctly, you want to ingest your documents into elastic search and will have some updates in future for these documents ?
If that's the case,
- Use your documents primary key as id for elastic documents.
- You can ingest entire document with updated values, elastic will replace the previous document with new one. given the primary key is same. Old document with same id will be deleted.
We use this approach for our search data.
you can use ingest pipelines to extract the id from the body and the _create endpoint to only create a document if it does not exist. Minor note: If you could specify the id on the client side indexing would be faster, as adding a pipeline adds a certain overhead.
PUT _ingest/pipeline/my_pipeline
{
"description": "_description",
"processors": [
{
"set": {
"field": "_id",
"value": "{{id}}"
}
}
]
}
PUT twitter/tweet/1?op_type=create&pipeline=my_pipeline
{
"foo" : "bar",
"id" : "123"
}
GET twitter/tweet/123
# this call will fail
PUT twitter/tweet/1?op_type=create&pipeline=my_pipeline
{
"foo" : "bar",
"id" : "123"
}
You can use script to UPSERT (update or insert) your document
PUT /products/product/un1qu3-1d-b718-105973677e95/_update
{
"script": {
"inline": "ctx._source.state = \"packaged\"",
"lang": "painless"
},
"upsert": {
"id": "un1qu3-1d-b718-105973677e95",
"state": "packaged"
}
}
Above query find the document with _id = "un1qu3-1d-b718-105973677e95"
if it is able to find any document then it will update state to "packaged" otherwise create a new document with field "id" and "state" (you can insert as many fields as you want).

does it possible contains multi format document in one lift-mongo-recorder model?

I attempt build a model able to insert different field(only one field is different).
also, the field should be custom format and I will be a embed object.
document form Assembla use case class define embed field, so I can't use base class(trait/abstract class) to define under my recorder model.
hope insert different format, such as
{
_id: ...,
same_field: "s1",
specify_field: {
format1: "..."
}
}
{
_id: ...,
same_field: "s1",
specify_field: [
formatX: "..",
formatY: ".."
]
}
How to construct my model?
thanks

MongoDB - Document with different type of value

I'm very new to MongoDB, i tell you sorry for this question but i have a problem to understand how to create a document that can contain a value with different "type:
My document can contain data like this:
// Example ONE
{
"customer" : "aCustomer",
"type": "TYPE_ONE",
"value": "Value here"
}
// Example TWO
{
"customer": "aCustomer",
"type": "TYPE_TWO",
"value": {
"parameter1": "value for parameter one",
"parameter2": "value for parameter two"
}
}
// Example THREE
{
"customer": "aCustomer",
"type": "TYPE_THREE",
"value": {
"anotherParameter": "another value",
{
"someParameter": "value for some parameter",
...
}
}
}
Customer field will be even present, the type can be different (TYPE_ONE, TYPE_TWO and so on), based on the TYPE the value can be a string, an object, an array etc.
Looking this example, i should create three kind of collections (one for type) or the same collection (for example, a collection named "measurements") can contain differend kind of value on the field "value" ?
Trying some insert in my DB instance i dont get any error (i'm able to insert object, string and array on property value), but i would like to know if is the correct way...
I come from RDBMS, i'm a bit confused right now.. thanks a lot for your support.
You can find the answer here https://docs.mongodb.com/drivers/use-cases/product-catalog
MongoDB's dynamic schema means that each need not conform to the same schema.