I am indexing a data stream into Elasticsearch and I cannot figure out how to normalize the incoming data so that it indexes without error. I have a mapping type "getdatavalues" which is a meta-data query. This meta-data query can return very different-looking responses, but I'm not seeing the difference. The error I get:
{"index":{"_index":"ens_event-2016.03.11","_type":"getdatavalues","_id":"865800029798177_2016_03_11_03_18_12_100037","status":400,"error":"MapperParsingException[object mapping for [getdatavalues] tried to parse field [output] as object, but got EOF, has a concrete value been provided to it?]"}}
when performing:
curl -XPUT 'http://192.168.99.100:80/es/ens_event-2016.03.11/getdatavalues/865800029798177_2016_03_11_03_18_12_100037' -d '{
"type": "getDataValues",
"input": {
"deviceID": {
"IMEI": "865800029798177",
"serial-number": "64180258"
},
"handle": 644,
"exprCode": "200000010300140000080001005f00a700000000000000",
"noRollHandle": "478669308-578452",
"transactionID": 290
},
"timestamp": "2016-03-11T03:18:12.000Z",
"handle": 644,
"output": {
"noRollPubSessHandle": "478669308-578740",
"publishSessHandle": 1195,
"status": true,
"matchFilter": {
"prefix": "publicExpr.operatorDefined.commercialIdentifier.FoodSvcs.Restaurant.\"A&C Kabul Curry\".\"Rooster Street\"",
"argValues": {
"event": "InternationalEvent",
"hasEvent": "anyEvent"
}
},
"transactionID": 290,
"validFor": 50
}
}'
Here's what Elasticsearch has for the mapping:
"getdatavalues" : {
"dynamic_templates" : [ {
"strings" : {
"mapping" : {
"index" : "not_analyzed",
"type" : "string"
},
"match_mapping_type" : "string"
}
} ],
"properties" : {
"handle" : {
"type" : "long"
},
"input" : {
"properties" : {
"deviceID" : {
"properties" : {
"IMEI" : {
"type" : "string",
"index" : "not_analyzed"
},
"serial-number" : {
"type" : "string",
"index" : "not_analyzed"
}
}
},
"exprCode" : {
"type" : "string",
"index" : "not_analyzed"
},
"handle" : {
"type" : "long"
},
"noRollHandle" : {
"type" : "string",
"index" : "not_analyzed"
},
"serviceVersion" : {
"type" : "string",
"index" : "not_analyzed"
},
"transactionID" : {
"type" : "long"
}
}
},
"output" : {
"properties" : {
"matchFilter" : {
"properties" : {
"argValues" : {
"properties" : {
"Interests" : {
"type" : "object"
},
"MerchantId" : {
"type" : "string",
"index" : "not_analyzed"
},
"Queue" : {
"type" : "string",
"index" : "not_analyzed"
},
"Vibe" : {
"type" : "string",
"index" : "not_analyzed"
},
"event" : {
"properties" : {
"event" : {
"type" : "string",
"index" : "not_analyzed"
},
"hasEvent" : {
"type" : "string",
"index" : "not_analyzed"
}
}
},
"hasEvent" : {
"type" : "string",
"index" : "not_analyzed"
},
"interests" : {
"type" : "string",
"index" : "not_analyzed"
}
}
},
"prefix" : {
"type" : "string",
"index" : "not_analyzed"
},
"transactionID" : {
"type" : "long"
},
"validFor" : {
"type" : "long"
}
}
},
"noRollPubSessHandle" : {
"type" : "string",
"index" : "not_analyzed"
},
"publishSessHandle" : {
"type" : "long"
},
"status" : {
"type" : "boolean"
},
"transactionID" : {
"type" : "long"
},
"validFor" : {
"type" : "long"
}
}
},
"timestamp" : {
"type" : "date",
"format" : "dateOptionalTime"
},
"type" : {
"type" : "string",
"index" : "not_analyzed"
}
}
},
Looks like the argValues object doesn't quite agree with your mapping:
"argValues": {
"event": "InternationalEvent",
"hasEvent": "anyEvent"
}
Either this:
"argValues": {
"event": {
"event": "InternationalEvent"
},
"hasEvent": "anyEvent"
}
Or this:
"argValues": {
"event": {
"event": "InternationalEvent"
"hasEvent": "anyEvent"
},
}
Would both seem to be valid.
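If you can't change the producer, one option is to normalize each payload before indexing so that argValues.event is always an object, matching the first form above. Below is a minimal sketch in Python using the requests library and the URL from your curl command; the wrapping logic is an assumption based on the mapping shown, not an official recipe.
import requests

ES_URL = "http://192.168.99.100:80/es/ens_event-2016.03.11/getdatavalues"

def normalize_getdatavalues(doc):
    # The mapping defines argValues.event as an object with 'event' and
    # 'hasEvent' sub-fields, so wrap a bare string into that shape.
    arg_values = (doc.get("output", {})
                     .get("matchFilter", {})
                     .get("argValues"))
    if isinstance(arg_values, dict) and isinstance(arg_values.get("event"), str):
        arg_values["event"] = {"event": arg_values["event"]}
    return doc

def index_event(doc_id, doc):
    # Same PUT as the failing curl command, but with the normalized body.
    resp = requests.put(ES_URL + "/" + doc_id, json=normalize_getdatavalues(doc))
    resp.raise_for_status()
    return resp.json()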
I want to pull an object from com_address where "add_id":111:
{
"_id" : ObjectId("5f7b6b4e327e2111883909f3"),
"FirstName" : "abc",
"LastName" : "abc",
"DateOfBirth" : "05/09/2020",
"gender" : "M",
"address" : {
"country" : "string",
"state" : "string",
"city" : [
{
"id" : 15,
"type" : "string",
"com_address" : [
{
"add_id" : 113,
"street" : "string",
"house_no" : "string",
"landmark" : "string"
},
{
"add_id" : 114,
"street" : "string",
"house_no" : "string",
"landmark" : "string"
}
]
},
{
"id" : 16,
"type" : "string",
"com_address" : [
{
"add_id" : 110,
"street" : "string",
"house_no" : "string",
"landmark" : "string"
},
{
"add_id" : 111,
"street" : "string",
"house_no" : "string",
"landmark" : "string"
}
]
}
]
}
}
This is my query:
db.getCollection('student').update({"_id" : ObjectId("5f7b6b4e327e2111883909f3")},{$pull:{"address.city":{"com_address.add_id":111}}})
By doing this, the entire city object with id 16 gets deleted instead of just the matching object being pulled from com_address.
How to pull an object from com_address with add_id?
Try this, which uses the $[] (all-positional) operator:
db.getCollection('student').update({
"_id" : ObjectId("5f7b6b4e327e2111883909f3")
},
{
$pull: { "address.city.$[].com_address": { "add_id": 111 }}
})
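Note that $[] applies the $pull to the com_address array of every element in city. If you only want to touch a specific city (say the one with id 16), a filtered positional operator with arrayFilters should also work (MongoDB 3.6+). A sketch using PyMongo, where the connection details and the c identifier are placeholders:
from pymongo import MongoClient
from bson import ObjectId

client = MongoClient("mongodb://localhost:27017")  # assumed connection string
students = client["test"]["student"]

# Pull add_id 111 only from the city element whose id is 16.
students.update_one(
    {"_id": ObjectId("5f7b6b4e327e2111883909f3")},
    {"$pull": {"address.city.$[c].com_address": {"add_id": 111}}},
    array_filters=[{"c.id": 16}],
)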
I have set up Druid and was able to run the tutorial at: Tutorial: Loading a file. I was also able to execute native JSON queries and get the results as described at http://druid.io/docs/latest/tutorials/tutorial-query.html. The Druid setup is working fine.
I now want to ingest additional data from a Java program into this datasource. Is it possible to send data into Druid using Tranquility from a Java program, for a datasource created using batch load?
I tried the example program at : https://github.com/druid-io/tranquility/blob/master/core/src/test/java/com/metamx/tranquility/example/JavaExample.java
But this program just keeps running and doesn't show any output. How can Druid be set up to accept data using the Tranquility core APIs?
Following are the ingestion spec and the Tranquility config file:
wikipedia-index.json
{
"type" : "index",
"spec" : {
"dataSchema" : {
"dataSource" : "wikipedia",
"parser" : {
"type" : "string",
"parseSpec" : {
"format" : "json",
"dimensionsSpec" : {
"dimensions" : [
"channel",
"cityName",
"comment",
"countryIsoCode",
"countryName",
"isAnonymous",
"isMinor",
"isNew",
"isRobot",
"isUnpatrolled",
"metroCode",
"namespace",
"page",
"regionIsoCode",
"regionName",
"user",
{ "name": "added", "type": "long" },
{ "name": "deleted", "type": "long" },
{ "name": "delta", "type": "long" }
]
},
"timestampSpec": {
"column": "time",
"format": "iso"
}
}
},
"metricsSpec" : [],
"granularitySpec" : {
"type" : "uniform",
"segmentGranularity" : "day",
"queryGranularity" : "none",
"intervals" : ["2015-09-12/2015-09-13"],
"rollup" : false
}
},
"ioConfig" : {
"type" : "index",
"firehose" : {
"type" : "local",
"baseDir" : "quickstart/",
"filter" : "wikiticker-2015-09-12-sampled.json.gz"
},
"appendToExisting" : false
},
"tuningConfig" : {
"type" : "index",
"targetPartitionSize" : 5000000,
"maxRowsInMemory" : 25000,
"forceExtendableShardSpecs" : true
}
}
}
example.json (tranquility config):
{
"dataSources" : [
{
"spec" : {
"dataSchema" : {
"dataSource" : "wikipedia",
"metricsSpec" : [
{ "type" : "count", "name" : "count" }
],
"granularitySpec" : {
"segmentGranularity" : "hour",
"queryGranularity" : "none",
"type" : "uniform"
},
"parser" : {
"type" : "string",
"parseSpec" : {
"format" : "json",
"timestampSpec" : { "column": "time", "format": "iso" },
"dimensionsSpec" : {
"dimensions" : ["channel",
"cityName",
"comment",
"countryIsoCode",
"countryName",
"isAnonymous",
"isMinor",
"isNew",
"isRobot",
"isUnpatrolled",
"metroCode",
"namespace",
"page",
"regionIsoCode",
"regionName",
"user",
{ "name": "added", "type": "long" },
{ "name": "deleted", "type": "long" },
{ "name": "delta", "type": "long" }]
}
}
}
},
"tuningConfig" : {
"type" : "realtime",
"windowPeriod" : "PT10M",
"intermediatePersistPeriod" : "PT10M",
"maxRowsInMemory" : "100000"
}
},
"properties" : {
"task.partitions" : "1",
"task.replicants" : "1"
}
}
],
"properties" : {
"zookeeper.connect" : "localhost"
}
}
I did not find any example of setting up a datasource on Druid that continuously accepts data from a Java program. I don't want to use Kafka. Any pointers on this would be greatly appreciated.
You need to create the data files with the additional data first and then run the ingestion task with the new fields. You can't edit the same record in Druid; it overwrites it with a new record.
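For the batch route, once the new data file is in place you can submit another indexing task (same shape as wikipedia-index.json, pointing at the new file, with appendToExisting set to true if you want to add to the existing segments) to the Overlord's task endpoint. A rough sketch in Python, assuming the quickstart layout with the Overlord listening on localhost:8090:
import json
import requests

OVERLORD = "http://localhost:8090"  # assumed quickstart Overlord address

def submit_index_task(spec_path):
    # POST an ingestion spec to the Overlord and return the task id.
    with open(spec_path) as f:
        spec = json.load(f)
    resp = requests.post(OVERLORD + "/druid/indexer/v1/task", json=spec)
    resp.raise_for_status()
    return resp.json()["task"]

# Example: re-run the wikipedia spec against a new data file
# print(submit_index_task("wikipedia-index.json"))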
I am a newbie in Druid, trying to load a very simple dataset in JSON format into Druid. The data contains just one dimension, one metric and a timestamp. I have successfully loaded data into Druid for a different dataset, but somehow I am getting errors for this one.
This is my index file :
{
"type" : "index",
"spec" : {
"dataSchema" : {
"dataSource" : "datatemplate",
"parser" : {
"type" : "string",
"parseSpec" : {
"format" : "json",
"dimensionsSpec" : {
"dimensions" : [
"Loc"
]
},
"timestampSpec" : {
"format" : "auto",
"column" : "Timestamp"
}
}
},
"metricsSpec" : [{"name" : "Qty","type" : "doubleSum","fieldName" : "Qty"}],
"granularitySpec" : {
"type" : "uniform",
"segmentGranularity" : "day",
"queryGranularity" : "none",
"intervals" : ["2016-01-01T00:00:00Z/2030-06-30T00:00:00Z"],
"rollup" : true
}
},
"ioConfig" : {
"type" : "index",
"firehose" : {
"type" : "local",
"baseDir" : "datatemplate/",
"filter" : "datatemplate.json"
},
"appendToExisting" : false
},
"tuningConfig" : {
"type" : "index",
"targetPartitionSize" : 10000000,
"maxRowsInMemory" : 40000,
"forceExtendableShardSpecs" : true
}
}
}
Also here is my dataset in JSON format:
{"Loc": "A", "Qty": "1", "Timestamp": "2017-12-01T00:00:00Z"}
{"Loc": "A", "Qty": "1", "Timestamp": "2017-12-01T00:00:00Z"}
{"Loc": "B", "Qty": "2", "Timestamp": "2017-12-01T00:00:00Z"}
{"Loc": "B", "Qty": "1", "Timestamp": "2017-12-01T00:00:00Z"}
I need to call an API to load data on an ongoing basis. The API returns different properties for each event. When I create the swagger file it has the properties from the sample return, but in the long run more properties can be added by the source system, and those will not be in the swagger file.
Is there any way to recreate the swagger file dynamically, with the additional properties, before each data load?
The swagger file is generated by Informatica Cloud based on a sample return while testing the connection.
The properties list has a different number of entries depending on the event type.
swagger file:
{"swagger" : "2.0",
"info" : {
"description" : null,
"version" : "1.0.0",
"title" : null,
"termsOfService" : null,
"contact" : null,
"license" : null
},
"host" : "<host>.com",
"basePath" : "/api",
"schemes" : [ "https" ],
"paths" : {
"/2.0" : {
"post" : {
"tags" : [ "events" ],
"summary" : null,
"description" : null,
"operationId" : "events",
"produces" : [ "application/json" ],
"consumes" : [ "application/json" ],
"parameters" : [ {
"name" : "script",
"in" : "query",
"description" : null,
"required" : false,
"type" : "string"
}, {
"name" : "Authorization",
"in" : "header",
"description" : null,
"required" : false,
"type" : "string"
} ],
"responses" : {
"200" : {
"description" : "successful operation",
"schema" : {
"$ref" : "#/definitions/events"
}
}
}
}
}
},
"definitions" : {
"events##properties" : {
"properties" : {
"$app_build_number" : {
"type" : "string"
},
"$app_version_string" : {
"type" : "string"
},
"$carrier" : {
"type" : "string"
},
"$lib_version" : {
"type" : "string"
},
"$manufacturer" : {
"type" : "string"
},
"$model" : {
"type" : "string"
},
"$os" : {
"type" : "string"
},
"$os_version" : {
"type" : "string"
},
"$radio" : {
"type" : "string"
},
"$region" : {
"type" : "string"
},
"$screen_height" : {
"type" : "number",
"format" : "int32"
},
"$screen_width" : {
"type" : "number",
"format" : "int32"
},
"Home Step Enabled" : {
"type" : "string"
},
"Number Of Lifetime Logins" : {
"type" : "number",
"format" : "int32"
},
"Sessions" : {
"type" : "number",
"format" : "int32"
},
"mp_country_code" : {
"type" : "string"
},
"mp_lib" : {
"type" : "string"
}
}
},
"events" : {
"properties" : {
"name" : {
"type" : "string"
},
"distinct_id" : {
"type" : "string"
},
"labels" : {
"type" : "string"
},
"time" : {
"type" : "number",
"format" : "int64"
},
"sampling_factor" : {
"type" : "number",
"format" : "int32"
},
"dataset" : {
"type" : "string"
},
"properties" : {
"$ref" : "#/definitions/events##properties"
}
}
}
}
}
sample return:
"name": "Session",
"distinct_id": "1234567890",
"labels": [],
"time": 1520072505000,
"sampling_factor": 1,
"dataset": "$event_data_set",
"properties": {
"$app_build_number": "900",
"$app_version_string": "1.9",
"$carrier": "AT&T",
"$lib_version": "2.0.1",
"$manufacturer": "Apple",
"$model": "iPhone10,6",
"$os": "iOS",
"$os_version": "11.2.6",
"$radio": "LTE",
"$region": "Florida",
"$screen_height": 667,
"$screen_width": 375,
"Number Of Lifetime Logins": 2,
"Session Length": "00h:00m:08s",
"Sessions": 43,
"mp_country_code": "US",
"mp_lib": "swift"
}
}
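I'm not aware of Informatica Cloud regenerating the swagger file by itself, but if you are able to supply an updated file before each load, one workaround is to rebuild the events##properties definition from a recent sample event. A rough Python sketch; the file name, the entry point, and the type-inference rules are all assumptions:
import json

def swagger_type(value):
    # Map a sample JSON value to a swagger property definition.
    if isinstance(value, bool):
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "number", "format": "int64"}
    if isinstance(value, float):
        return {"type": "number", "format": "double"}
    return {"type": "string"}

def refresh_event_properties(swagger_path, sample_event):
    # Add any properties seen in sample_event that the swagger file
    # does not know about yet, then write the file back.
    with open(swagger_path) as f:
        swagger = json.load(f)
    props = swagger["definitions"]["events##properties"]["properties"]
    for name, value in sample_event.get("properties", {}).items():
        if name not in props:
            props[name] = swagger_type(value)
    with open(swagger_path, "w") as f:
        json.dump(swagger, f, indent=2)

# refresh_event_properties("events-swagger.json", latest_sample_event)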
To troubleshoot this error I have tried Elasticsearch 2.x and 5.x, but it doesn't work in either of them.
I have lots of logs saved in my Elasticsearch instance. They have a field called timestamp whose format is "YYYY-MM-dd HH-mm-ss.SSS" (for example, "2017-11-02 00:00:00.000"). When I send the following query via Postman:
{
"query": {
"range": {
"timestamp": {
"gte": "2017-10-21 00:00:00.000",
"lte": "2017-10-27 00:00:00.000"
}
}
}
}
I receive nothing, even though there are more than 500 logs in that range. What am I doing wrong?
EDIT:
My index (loganalyzer):
{
"loganalyzer" : {
"aliases" : { },
"mappings" : {
"logs" : {
"properties" : {
"entireLog" : {
"type" : "string"
},
"formattedMessage" : {
"type" : "string"
},
"id" : {
"type" : "string"
},
"level" : {
"type" : "string"
},
"loggerName" : {
"type" : "string"
},
"testNo" : {
"type" : "string"
},
"threadName" : {
"type" : "string"
},
"timestamp" : {
"type" : "string"
}
}
}
},
"settings" : {
"index" : {
"refresh_interval" : "1s",
"number_of_shards" : "5",
"creation_date" : "1507415366223",
"store" : {
"type" : "fs"
},
"number_of_replicas" : "1",
"uuid" : "9w3QQQc0S0K0NcKtOERtTw",
"version" : {
"created" : "2040699"
}
}
},
"warmers" : { }
}
}
What I receive when sending the request:
{
"took": 429,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
And status 200 (OK).
Your edit with the mappings points to the problem. The reason you aren't getting any results is that Elasticsearch is evaluating the "range" against the string you provide and the values of the field in your index, both of which are treated as plain strings rather than dates.
"timestamp" : {
"type" : "string"
}
Here's the elastic documentation on that mapping type
You need to apply a date mapping to that field before indexing, or reindex to a new index that has that mapping applied prior to ingestion.
Here is what the mapping request could look like, conforming to your timestamp format:
PUT loganalyzer
{
"mappings": {
"logs": {
"properties": {
"timestamp": {
"type": "date",
"format": "YYYY-MM-dd HH-mm-ss.SSS"
}
}
}
}
}
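Since the mapping of an existing field can't be changed in place, the usual approach is to create a new index with the date mapping and copy the documents over, for example with the _reindex API (available in 5.x and in later 2.x releases; otherwise re-ingest from the source). A minimal sketch with Python and requests, assuming Elasticsearch at localhost:9200 and a hypothetical target index named loganalyzer_v2:
import requests

ES = "http://localhost:9200"  # assumed address

# 1. Create the new index with timestamp mapped as a date.
mapping = {
    "mappings": {
        "logs": {
            "properties": {
                "timestamp": {
                    "type": "date",
                    "format": "YYYY-MM-dd HH-mm-ss.SSS"
                }
            }
        }
    }
}
requests.put(ES + "/loganalyzer_v2", json=mapping).raise_for_status()

# 2. Copy the existing documents into the new index.
reindex_body = {
    "source": {"index": "loganalyzer"},
    "dest": {"index": "loganalyzer_v2"}
}
requests.post(ES + "/_reindex", json=reindex_body).raise_for_status()
Once that finishes, the range query from the question should return hits when run against loganalyzer_v2.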