Does mongo-connector support adding fields before inserting into Elasticsearch? - mongodb

I have many documents in MongoDB, and mongo-connector inserts that data into Elasticsearch. Is there a way to add an extra field to each document before it is inserted into ES? Is there any way to do this in mongo-connector?
UPDATE
Based on your UPDATE 3, I created a mapping something like this. Is it correct?
PUT my_index2
{
  "mappings": {
    "my_type2": {
      "transform": {
        "script": {
          "inline": "if (ctx._source.geopoint.alt) ctx._source.geopoint.remove('alt')",
          "lang": "groovy"
        }
      },
      "properties": {
        "geopoint": {
          "type": "geo_point"
        }
      }
    }
  }
}
ERROR
This is the error I keep getting when I try to insert your mapping:
{
  "error": {
    "root_cause": [
      {
        "type": "script_parse_exception",
        "reason": "Value must be of type String: [script]"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "Failed to parse mapping [my_type2]: Value must be of type String: [script]",
    "caused_by": {
      "type": "script_parse_exception",
      "reason": "Value must be of type String: [script]"
    }
  },
  "status": 400
}
UPDATE 2
Now the mapping is accepted and the acknowledgement comes back as true. But when I try to insert the JSON data below, it throws an error.
PUT my_index2/my_type2/1
{
  "geopoint": {
    "lon": 48.845877,
    "lat": 8.821861,
    "alt": 0.0
  }
}
ERROR FOR UPDATE 2
{
  "error": {
    "root_cause": [
      {
        "type": "mapper_parsing_exception",
        "reason": "failed to parse"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "failed to parse",
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "failed to execute script",
      "caused_by": {
        "type": "script_exception",
        "reason": "scripts of type [inline], operation [mapping] and lang [groovy] are disabled"
      }
    }
  },
  "status": 400
}
ERROR 1 FOR UPDATE 2
After adding script.inline: true, I tried to insert the data again, but I get the following error.
{
  "error": {
    "root_cause": [
      {
        "type": "parse_exception",
        "reason": "field must be either [lat], [lon] or [geohash]"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "failed to parse",
    "caused_by": {
      "type": "parse_exception",
      "reason": "field must be either [lat], [lon] or [geohash]"
    }
  },
  "status": 400
}

mongo-connector aims at synchronizing a MongoDB database with another target system, such as ES, Solr or another MongoDB. Synchronizing means 1:1 replication, so there's no way that I know of for mongo-connector to enrich documents during replication (and that's not its intent either).
However, ES 5 will soon introduce ingest nodes, in which we'll be able to define processing pipelines that enrich documents before they get indexed.
UPDATE
There's probably a way by modifying the formatters.py file.
In transform_value I would add a case to handle Geopoint:
if isinstance(value, dict):
    return self.format_document(value)
elif isinstance(value, list):
    return [self.transform_value(v) for v in value]
# handle Geopoint class
elif isinstance(value, Geopoint):
    return self.format_document({'lat': value['lat'], 'lon': value['lon']})
...
UPDATE 2
Let's try another approach by modifying the transform_element function (on line 104):
def transform_element(self, key, value):
    try:
        # add these next two lines
        if key == 'GeoPoint':
            value = {'lat': value['lat'], 'lon': value['lon']}
        # do not modify the initial code below
        new_value = self.transform_value(value)
        yield key, new_value
    except ValueError as e:
        LOG.warn("Invalid value for key: %s as %s"
                 % (key, str(e)))
UPDATE 3
Another thing you might try is to add a transform. The reason I've not mentioned it before is that it was deprecated in ES 2.0; in ES 5.0 you'll have ingest nodes instead and you'll be able to take care of this at ingest time using a remove processor.
You can define your mapping like this:
PUT my_index2
{
  "mappings": {
    "my_type2": {
      "transform": {
        "script": "ctx._source.geopoint.remove('alt'); ctx._source.geopoint.remove('valid')"
      },
      "properties": {
        "geopoint": {
          "type": "geo_point"
        }
      }
    }
  }
}
Note: make sure to enable dynamic scripting by adding script.inline: true to elasticsearch.yml and restarting your ES node.
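That is, in elasticsearch.yml:
# enable inline (dynamic) scripts, needed for the mapping transform above
script.inline: true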
What is going to happen is that the alt field will still be visible in the stored _source but it will not be indexed, and hence, no error should occur.
With ES 5, you'd simply create a pipeline with a remove processor, like this:
PUT _ingest/pipeline/geo-pipeline
{
  "description" : "remove unsupported altitude field",
  "processors" : [
    {
      "remove" : {
        "field": "geopoint.alt"
      }
    }
  ]
}
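Documents are then indexed through the pipeline by adding the pipeline parameter to the index request, so the alt field is stripped before indexing. For instance, with the same document as in your UPDATE 2:
PUT my_index2/my_type2/1?pipeline=geo-pipeline
{
  "geopoint": {
    "lon": 48.845877,
    "lat": 8.821861,
    "alt": 0.0
  }
}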

Related

Validation fails but error messages missing

I'm attempting to validate a JSON file against a specific schema using this code:
string data = File.ReadAllText("../../../testFiles/create.json");
string schemaText = File.ReadAllText("../../../schemas/request-payload.schema.json");
var serializer = new JsonSerializer();
var json = JsonValue.Parse(data);
var schema = serializer.Deserialize<JsonSchema>(JsonValue.Parse(schemaText));
var result = schema.Validate(json);
Assert.IsTrue(result.IsValid);
The assertion fails because result.IsValid is false (which is correct; there is an intentional error in my JSON), but there is no indication of where the error is happening.
My schema does have sub-schemas in the definitions section. Could that have anything to do with it? Do I need to set some property to see that error information?
Update: Added schema and test JSON
My original schema was several hundred lines long, but I pared it down to a subset which still has the problem. Here is the schema:
{
  "$schema": "https://json-schema.org/draft/2019-09/schema#",
  "$id": "request-payload.schema.json",
  "type": "object",
  "propertyNames": { "enum": ["template"] },
  "required": ["template"],
  "properties": {
    "isPrivate": { "type": "boolean" },
    "template": {
      "type": "string",
      "enum": ["TemplateA", "TemplateB"]
    }
  },
  "oneOf": [
    {
      "if": {
        "properties": { "template": { "const": "TemplateB" } }
      },
      "then": { "required": ["isPrivate"] }
    }
  ]
}
And here is a test JSON object:
{
  "template": "TemplateA"
}
The above JSON validates fine. Switch the value to TemplateB and the JSON fails validation (because isPrivate is missing and it is required for TemplateB), but the result doesn't contain any information about why it failed.
The code I'm using to run the validation test is listed above.
The issue is likely that you haven't set the output format. The default format is flag, which means you'll only get a true/false indication of whether the value passed validation.
To get more details, you'll need to use a different format setting. You can do this via the schema options.
For example:
JsonSchemaOptions.OutputFormat = SchemaValidationOutputFormat.Detailed;
The available options are listed in the library's documentation.

GraphQL query result for object that does not exist

I have a GraphQL query that calls a REST service to get the return object. The query contains an Id parameter that is then passed to the service. However, the REST service can respond with HTTP status 404 Not Found if an object with that Id does not exist. That seems like the right response.
How do you model a Not Found response in GraphQL?
Is there a way to inform the GQL caller that something does not exist?
Update
Some options I am considering:
Return null
Change the GraphQL query to return a list of objects and return an empty list if nothing is found
Return some kind of error object with an error code
But it is unclear whether there is a recommended practice in GraphQL API design.
You might treat it as an error and handle it accordingly.
I recommend checking the GraphQL spec, specifically the paragraph about error handling.
I hope it contains exactly what you are looking for.
Basically, you should return whatever data you can, and inform the client about potential problems in the "errors" field.
The example from the documentation:
Request:
{
  hero(episode: $episode) {
    name
    heroFriends: friends {
      id
      name
    }
  }
}
Response:
{
  "errors": [
    {
      "message": "Name for character with ID 1002 could not be fetched.",
      "locations": [ { "line": 6, "column": 7 } ],
      "path": [ "hero", "heroFriends", 1, "name" ]
    }
  ],
  "data": {
    "hero": {
      "name": "R2-D2",
      "heroFriends": [
        {
          "id": "1000",
          "name": "Luke Skywalker"
        },
        {
          "id": "1002",
          "name": null
        },
        {
          "id": "1003",
          "name": "Leia Organa"
        }
      ]
    }
  }
}
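If you control the server, this pattern is straightforward to produce. Here is a minimal sketch in Python using graphene (the library choice and the in-memory HEROES dict, standing in for your REST call, are my assumptions):
import graphene
from graphql import GraphQLError

# Hypothetical stand-in for the REST service; ID "1002" is missing (the 404 case).
HEROES = {"1000": "Luke Skywalker", "1003": "Leia Organa"}

class Hero(graphene.ObjectType):
    id = graphene.ID()
    name = graphene.String()

class Query(graphene.ObjectType):
    hero = graphene.Field(Hero, id=graphene.ID(required=True))

    def resolve_hero(root, info, id):
        name = HEROES.get(id)
        if name is None:
            # The REST 404 becomes a GraphQL error: "hero" is null in "data"
            # and this message shows up in the top-level "errors" array.
            raise GraphQLError("Character with ID %s could not be fetched." % id)
        return Hero(id=id, name=name)

schema = graphene.Schema(query=Query)
result = schema.execute('{ hero(id: "1002") { name } }')
print(result.data)    # {'hero': None}
print(result.errors)  # [GraphQLError('Character with ID 1002 could not be fetched.', ...)]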

SharePoint OData filter is not valid

Using this SharePoint OData query:
https://{{siteUrl}}/_api/web/lists('{{ListGuid}}')/items?$top=10&$select=Title,CartNum
I get the following in my results:
{
  "d": {
    "results": [
      {
        "__metadata": {
          "id": "32165487-6548-6548-6548-32165498765432",
          "uri": "<edited>/_api/Web/Lists(guid'12345678-1234-1234-1234-123456789abc')/Items(1)",
          "etag": "\"183\"",
          "type": "SP.Data.JobsItem"
        },
        "Title": "SomeCart",
        "CartNum": "11047975"
      }
    ]
  }
}
But if I add a $filter option:
https://{{siteUrl}}/_api/web/lists('{{ListGuid}}')/items?$top=10&$select=Title,CartNum&$filter=CartNum endswith '11047975'
I get this:
{
  "error": {
    "code": "-1, Microsoft.SharePoint.Client.InvalidClientQueryException",
    "message": {
      "lang": "en-US",
      "value": "The expression \"CartNum endswith '11047975'\" is not valid."
    }
  }
}
If I change it to:
https://{{siteUrl}}/_api/web/lists('{{ListGuid}}')/items?$top=10&$select=Title,CartNum&$filter=CartNum eq '11047975'
I get this:
{
  "error": {
    "code": "-2147024860, Microsoft.SharePoint.SPQueryThrottledException",
    "message": {
      "lang": "en-US",
      "value": "The attempted operation is prohibited because it exceeds the list view threshold enforced by the administrator."
    }
  }
}
What am I doing wrong?
I'm using this MS doc and this OData doc as a reference for the $filter, which clearly state:
Operator  Description  Example
Logical Operators:
Eq        Equal        /Suppliers?$filter=Address/City eq 'Redmond'
There is no endswith operator in the official documentation.
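If you need a substring match, the SharePoint REST API does support the startswith and substringof functions, for example:
https://{{siteUrl}}/_api/web/lists('{{ListGuid}}')/items?$top=10&$select=Title,CartNum&$filter=startswith(CartNum,'11047975')
Note that substringof cannot be answered from a column index, so on a large list it will still run into the throttling limit described below.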
When your list has more than 5,000 items, you can't filter on a non-indexed column because of the throttle limit in SharePoint; you need to add the column as an indexed column before you can filter on it.
https://collab365.community/working-with-large-lists/

How to create google datastore composite indices via REST API?

I am trying to change the order of my results, but I keep getting an error saying "You need an index to execute this query."
My console doesn't show that any indices exist, but I set most of the indexed options to true.
I know that in Java I can create indices spanning multiple properties, either ascending or descending. How do I do this with the REST API?
Following the REST API docs for Google Datastore, my entities are created like this:
{
  "mode": "TRANSACTIONAL",
  "transaction": "Eb2wksWfYDjkGkkABRmGMQ_vKGijwNwm-tbxAbUPRt8N2RaUCynjSbGT7jFQw3pgaDCT7U0drs3RTPLSIN8TQikdqkdl7pLm2rkMqORmKlO_I_dp",
  "mutation": {
    "insertAutoId": [
      {
        "key": {
          "path": [
            {
              "kind": "Attendance"
            }
          ]
        },
        "properties": {
          "section": {
            "indexed": true,
            "stringValue": "Venturers"
          },
          "date": {
            "dateTimeValue": "2015-01-16T00:00:00+00:00",
            "indexed": true
          },
          "attendee": {
            "indexed": true,
            "keyValue": {
              "path": [
                {
                  "id": "5659313586569216",
                  "kind": "Attendee"
                }
              ]
            }
          },
          "presence": {
            "indexed": false,
            "integerValue": 0
          }
        }
      }
    ]
  }
}
And I am trying to query like this:
{
  "gqlQuery": {
    "allowLiteral": true,
    "queryString": "SELECT * FROM Attendance WHERE section = #section ORDER BY date ASC",
    "nameArgs": [
      {
        "name": "section",
        "value": {
          "stringValue": "Venturers"
        }
      }
    ]
  }
}
And I get this error:
{
  "error": {
    "errors": [
      {
        "domain": "global",
        "reason": "FAILED_PRECONDITION",
        "message": "no matching index found.",
        "locationType": "header",
        "location": "If-Match"
      }
    ],
    "code": 412,
    "message": "no matching index found."
  }
}
For future reference:
You can't create a composite index directly through the REST API; you must deploy an index configuration through App Engine (the linked guide uses PHP).
How to build datastore indexes (PHP GAE)
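For the query above (an equality filter on section plus ORDER BY date), the index.yaml deployed with the App Engine SDK would look something like this (a sketch; the kind and property names are taken from your entities):
indexes:
- kind: Attendance
  properties:
  - name: section
  - name: date
    direction: asc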

GoodData "Create Report Definition" API Call giving 500 Internal Server Error

I'm trying to create a report definition using the GoodData REST API. I use the following endpoint for the REST call:
"/gdc/md/{project-id}/obj"
When I invoke the API call with the following payload, in which the project ID and user ID are valid, it fails with response code 500.
{
  "reportDefinition": {
    "content": {
      "filters": [],
      "format": "grid",
      "grid": {
        "rows": [],
        "columns": [
          "metricGroup"
        ],
        "sort": {
          "columns": [],
          "rows": []
        },
        "columnWidths": [],
        "metrics": [
          {
            "uri": "/gdc/md/qy48iv4flikdlcwpwioizuip74wt8nb5/obj/63f3cecd2a8d3ce2ec9378381c8f39e3",
            "alias": ""
          }
        ]
      }
    },
    "meta": {
      "title": "Sample report definition",
      "summary": "This is a sample report",
      "tags": "",
      "deprecated": 0,
      "category": "samplecategory"
    }
  }
}
{
  "error": {
    "message": "Internal server error. Please fill in bug report with request_id='lp78FL5S1IPMqB2n'"
  }
}
I'm certain that the project_id and the user_id are valid. Is this an error in the API?
Thank you in advance.
Apart from the metric URI, which looks weird (a hash instead of a numeric ID), I was able to dig up an error in our logs that says: "Category is not equal to tag structure".
In your example you have its value set to "samplecategory". The "category" property defines what type of object you are creating; if you are creating a report definition, it should have the value "reportDefinition".
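So the meta section of the payload above should become:
"meta": {
  "title": "Sample report definition",
  "summary": "This is a sample report",
  "tags": "",
  "deprecated": 0,
  "category": "reportDefinition"
}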
Last time I worked with the GoodData API, metrics had numeric IDs, so that seems the most likely culprit. Where did you get "/gdc/md/qy48iv4flikdlcwpwioizuip74wt8nb5/obj/63f3cecd2a8d3ce2ec9378381c8f39e3" from, especially the "63f3cecd2a8d3ce2ec9378381c8f39e3" part?