Druid count differs when we run the same query on daily and raw data - druid

When I run a query against the ABS data source in Druid, I get a certain count, but that count differs when the same query is run against the ABS_DAILY data source, even though we build ABS_DAILY from ABS.
{
"queryType" : "groupBy",
"dataSource" : "ABS",
"granularity" : "all",
"intervals" : [ "2018-07-12T00:00:00.000Z/2018-07-13T00:00:00.000Z" ],
"descending" : "false",
"aggregations" : [ {
"type" : "count",
"name" : "COUNT",
"fieldName" : "COUNT"
} ],
"postAggregations" : [ ],
"dimensions" : [ "event_id" ]
}
Below is the JSON used to submit the daily job to Druid, which creates segments for ABS_DAILY for a specific interval:
{
"spec": {
"ioConfig": {
"firehose": {
"dataSource": "ABS",
"interval": "2018-07-12T00:00:00.000Z/2018-07-13T00:00:00.000Z",
"metrics": null,
"dimensions": null,
"type": "ingestSegment"
},
"type": "index"
},
"dataSchema": {
"granularitySpec": {
"queryGranularity": "day",
"intervals": [
"2018-07-12T00:00:00.000Z/2018-07-13T00:00:00.000Z"
],
"segmentGranularity": "day",
"type": "uniform"
},
"dataSource": "ABS_DAILY",
"metricsSpec": [],
"parser": {
"parseSpec": {
"timestampSpec": {
"column": "server_timestamp",
"format": "dd MMMM, yyyy (HH:mm:ss)"
},
"dimensionsSpec": {
"dimensionExclusions": [
"server_timestamp"
],
"dimensions": []
},
"format": "json"
},
"type": "string"
}
}
},
"type": "index"
}
I queried ABS_DAILY with the query below, and it returns a different count than ABS, which it should not.
{
"queryType" : "groupBy",
"dataSource" : "ERS_DAILY",
"granularity" : "all",
"intervals" : [ "2018-07-12T00:00:00.000Z/2018-07-13T00:00:00.000Z" ],
"descending" : "false",
"aggregations" : [ {
"type" : "count",
"name" : "COUNT",
"fieldName" : "COUNT"
} ],
"postAggregations" : [ ],
"dimensions" : [ "event_id" ]
}

You are counting rows of the daily aggregates. Rollup in the ABS_DAILY ingestion collapses the raw events into one row per day and dimension combination, so a count aggregator over ABS_DAILY counts those rolled-up rows rather than the original events. To total the pre-aggregated counts you need to sum the count column instead (note the aggregator type; a metricsSpec sketch follows the query):
{
"queryType" : "groupBy",
"dataSource" : "ERS_DAILY",
"granularity" : "all",
"intervals" : [ "2018-07-12T00:00:00.000Z/2018-07-13T00:00:00.000Z" ],
"descending" : "false",
"aggregations" : [ {
"type" : "longSum",
"name" : "COUNT",
"fieldName" : "COUNT"
} ],
"postAggregations" : [ ],
"dimensions" : [ "event_id" ]
}
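For the longSum above to have something to sum, ABS_DAILY needs a count metric, and the ingestion spec shown earlier has an empty metricsSpec, so no COUNT column is created. A minimal sketch of the metricsSpec to add to that spec, assuming each raw ABS row re-ingested into ABS_DAILY should contribute 1 (if ABS already carries a rolled-up COUNT metric, a longSum aggregator over that field would be the equivalent):
"metricsSpec": [
  { "type": "count", "name": "COUNT" }
]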

How to send data to Druid using the Tranquility core API?

I have set up Druid and was able to run the tutorial at "Tutorial: Loading a file". I was also able to execute native JSON queries and get the results as described at http://druid.io/docs/latest/tutorials/tutorial-query.html. The Druid setup is working fine.
I now want to ingest additional data into this datasource from a Java program. Is it possible to send data into Druid using Tranquility from a Java program for a datasource created using batch load?
I tried the example program at https://github.com/druid-io/tranquility/blob/master/core/src/test/java/com/metamx/tranquility/example/JavaExample.java
But this program just keeps running and doesn't show any output. How can Druid be set up to accept data using the Tranquility core APIs?
Following are the ingestion spec and the Tranquility config file:
wikipedia-index.json
{
"type" : "index",
"spec" : {
"dataSchema" : {
"dataSource" : "wikipedia",
"parser" : {
"type" : "string",
"parseSpec" : {
"format" : "json",
"dimensionsSpec" : {
"dimensions" : [
"channel",
"cityName",
"comment",
"countryIsoCode",
"countryName",
"isAnonymous",
"isMinor",
"isNew",
"isRobot",
"isUnpatrolled",
"metroCode",
"namespace",
"page",
"regionIsoCode",
"regionName",
"user",
{ "name": "added", "type": "long" },
{ "name": "deleted", "type": "long" },
{ "name": "delta", "type": "long" }
]
},
"timestampSpec": {
"column": "time",
"format": "iso"
}
}
},
"metricsSpec" : [],
"granularitySpec" : {
"type" : "uniform",
"segmentGranularity" : "day",
"queryGranularity" : "none",
"intervals" : ["2015-09-12/2015-09-13"],
"rollup" : false
}
},
"ioConfig" : {
"type" : "index",
"firehose" : {
"type" : "local",
"baseDir" : "quickstart/",
"filter" : "wikiticker-2015-09-12-sampled.json.gz"
},
"appendToExisting" : false
},
"tuningConfig" : {
"type" : "index",
"targetPartitionSize" : 5000000,
"maxRowsInMemory" : 25000,
"forceExtendableShardSpecs" : true
}
}
}
example.json (tranquility config):
{
"dataSources" : [
{
"spec" : {
"dataSchema" : {
"dataSource" : "wikipedia",
"metricsSpec" : [
{ "type" : "count", "name" : "count" }
],
"granularitySpec" : {
"segmentGranularity" : "hour",
"queryGranularity" : "none",
"type" : "uniform"
},
"parser" : {
"type" : "string",
"parseSpec" : {
"format" : "json",
"timestampSpec" : { "column": "time", "format": "iso" },
"dimensionsSpec" : {
"dimensions" : ["channel",
"cityName",
"comment",
"countryIsoCode",
"countryName",
"isAnonymous",
"isMinor",
"isNew",
"isRobot",
"isUnpatrolled",
"metroCode",
"namespace",
"page",
"regionIsoCode",
"regionName",
"user",
{ "name": "added", "type": "long" },
{ "name": "deleted", "type": "long" },
{ "name": "delta", "type": "long" }]
}
}
}
},
"tuningConfig" : {
"type" : "realtime",
"windowPeriod" : "PT10M",
"intermediatePersistPeriod" : "PT10M",
"maxRowsInMemory" : "100000"
}
},
"properties" : {
"task.partitions" : "1",
"task.replicants" : "1"
}
}
],
"properties" : {
"zookeeper.connect" : "localhost"
}
}
I did not find any example of setting up a datasource on Druid that continuously accepts data from a Java program. I don't want to use Kafka. Any pointers on this would be greatly appreciated.
You need to create the data files containing the additional data first and then run the ingestion task with the new fields. You can't edit an existing record in Druid; re-ingesting it overwrites it with a new record. A sketch of an append-style re-index follows.
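If the goal is to add more rows to the batch-loaded wikipedia datasource, one option that follows directly from the spec above is to point the local firehose at a file containing the new events and re-run the index task with appendToExisting set to true. A minimal sketch of the changed ioConfig (the file name new-wikipedia-events.json is a placeholder for your exported data file):
"ioConfig" : {
  "type" : "index",
  "firehose" : {
    "type" : "local",
    "baseDir" : "quickstart/",
    "filter" : "new-wikipedia-events.json"
  },
  "appendToExisting" : true
}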

Retrieve multiple queried elements in an object array in MongoDB collection

I want to find all zone data with "AHU":"C". First, I query without projection and get these documents:
> db.buildings.find({"zone.AHU": "C"}).pretty()
{
"_id" : ObjectId("5aba4460a042dc4a2fdf26cd"),
"name" : "Test Street",
"coordinate" : [
12,
31
],
"yearlyEnergyCost" : 1444,
"zone" : [
{
"name" : "AHU-C-Z2",
"_id" : ObjectId("5aba4460a042dc4a2fdf26ce"),
"AHU" : "C",
"precooling" : [ ],
"subZone" : [ ]
},
{
"name" : "AHU-D-Z1",
"_id" : ObjectId("5abc7528100730697163a3ab"),
"AHU" : "D",
"precooling" : [ ],
"subZone" : [ ]
},
{
"name" : "AHU-C-Z1",
"AHU" : "C",
"_id" : ObjectId("5ac09c898249affa03506eff"),
"precooling" : [ ],
"subZone" : [ ]
},
{
"name" : "AHU-C-Z3",
"AHU" : "C",
"_id" : ObjectId("5ac09c898249affa03506efe"),
"precooling" : [ ],
"subZone" : [ ]
}
],
"__v" : 2
}
However, when I use $elemMatch, it only returns the first zone element with "AHU":"C"
> db.buildings.find({"zone.AHU": "C"}, {_id: 0, zone: {$elemMatch: {AHU: "C"}}}).pretty()
{
"zone" : [
{
"name" : "AHU-C-Z2",
"_id" : ObjectId("5aba4460a042dc4a2fdf26ce"),
"AHU" : "C",
"precooling" : [ ],
"subZone" : [ ]
}
]
}
From the docs, I realised that $elemMatch (projection) only retrieves the first matching element, but how can I retrieve all corresponding elements (AHU-C-Z1, AHU-C-Z2, AHU-C-Z3)? Thanks.
This is the collection:
{
"_id":{
"$oid":"5aa65bc96996e045104116e7"
},
"name":"Talker Street",
"coordinate":[
11.82,
-9.26
],
"yearlyEnergyCost":100,
"zone":[
{
"name":"AHU-B-Z1",
"_id":{
"$oid":"5aa65bc96996e045104116e8"
},
"precooling":[
{
"_id":{
"$oid":"5aa73a7d2f991a657fd52c7e"
},
"resultPrecool":{
"$oid":"5aa73a7d2f991a657fd52b5d"
},
"dateRun":{
"$date":"2018-03-14T00:00:00.000+0000"
},
"lastUpdated":{
"$date":"2018-03-13T02:41:02.086+0000"
}
}
]
},
{
"name":"AHU-B-Z2",
"_id":{
"$oid":"5aa9f1f8131e6412c17d71d3"
},
"precooling":[
]
},
{
"name":"AHU-B-Z3",
"_id":{
"$oid":"5aa9f1f8131e6412c17d71d2"
},
"precooling":[
]
}
],
"__v":19
}{
"_id":{
"$oid":"5aba4460a042dc4a2fdf26cd"
},
"name":"Test Street",
"coordinate":[
12,
31
],
"yearlyEnergyCost":1444,
"zone":[
{
"name":"AHU-C-Z2",
"_id":{
"$oid":"5aba4460a042dc4a2fdf26ce"
},
"AHU":"C",
"precooling":[
],
"subZone":[
]
},
{
"name":"AHU-D-Z1",
"_id":{
"$oid":"5abc7528100730697163a3ab"
},
"AHU":"D",
"precooling":[
],
"subZone":[
]
},
{
"name":"AHU-C-Z1",
"AHU":"C",
"_id":{
"$oid":"5ac09c898249affa03506eff"
},
"precooling":[
],
"subZone":[
]
},
{
"name":"AHU-C-Z3",
"AHU":"C",
"_id":{
"$oid":"5ac09c898249affa03506efe"
},
"precooling":[
],
"subZone":[
]
}
],
"__v":2
}{
"_id":{
"$oid":"5aba46c41c8d5e4b52462aea"
},
"name":"123123",
"coordinate":[
12,
31
],
"yearlyEnergyCost":12321,
"zone":[
{
"name":"123423",
"_id":{
"$oid":"5aba46c41c8d5e4b52462aeb"
},
"precooling":[
],
"subZone":[
]
}
],
"__v":0
}
You can use the $redact operator:
db.buildings.aggregate([
{$match:{"zone.AHU":{$exists:true}}},
{$redact:{
$cond:{
if:{$or:[{$eq:["$AHU","C"]},{$not: "$AHU"}]},
then:"$$DESCEND",
else:"$$PRUNE"
}
}}
])
Remember that {$not: "$AHU"} is important to include so that the top-level document is not excluded. Without it, the top-level document (which itself has no AHU field) would be pruned, and the embedded documents along with it.
Output:
{
"_id" : ObjectId("5aba4460a042dc4a2fdf26cd"),
"name" : "Test Street",
"coordinate" : [
12,
31
],
"yearlyEnergyCost" : 1444,
"zone" : [
{
"name" : "AHU-C-Z2",
"_id" : ObjectId("5aba4460a042dc4a2fdf26ce"),
"AHU" : "C",
"precooling" : [],
"subZone" : []
},
{
"name" : "AHU-C-Z1",
"AHU" : "C",
"_id" : ObjectId("5ac09c898249affa03506eff"),
"precooling" : [],
"subZone" : []
},
{
"name" : "AHU-C-Z3",
"AHU" : "C",
"_id" : ObjectId("5ac09c898249affa03506efe"),
"precooling" : [],
"subZone" : []
}
],
"__v" : 2
}
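On MongoDB 3.2+, an aggregation with $filter is another way to keep every matching array element. A minimal sketch against the same buildings collection (only name is projected alongside zone here to keep the example short):
db.buildings.aggregate([
  { $match: { "zone.AHU": "C" } },
  { $project: {
      name: 1,
      zone: {
        $filter: { input: "$zone", as: "z", cond: { $eq: [ "$$z.AHU", "C" ] } }
      }
  } }
])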

How to get data from MongoDB like a HashMap (key-value pair)

Here are the records in my MongoDB:
{
"_id": "5a65a047992e3c2572f74102",
"_class": "com.vuelogix.location.model.LocationModel",
"type": "Feature",
"properties": {
"address": "Purna to Loha Rd, Maharashtra 431511, India",
"device_id": 23613,
"last_updated": "2018-01-22T08:26:47.237Z"
},
"geometry": {
"_class": "com.vuelogix.location.model.geojson.geometry.Point",
"coordinates": [77.065659, 19.145168],
"type": "Point"
}
},
{
"_id": "5a65ae1e992e3c2572f74114",
"_class": "com.vuelogix.location.model.LocationModel",
"type": "Feature",
"properties": {
"address": "Taranagar - Churu Rd, Chalkoi Baneerotan, Rajasthan 331001, India",
"device_id": 23658,
"last_updated": "2018-01-22T09:25:50.893Z"
},
"geometry": {
"_class": "com.vuelogix.location.model.geojson.geometry.Point",
"coordinates": [74.956284, 28.497661],
"type": "Point"
}
}
I want to get the result as key-value pairs: the key should be "properties.device_id" and the value the entire record, like this:
[23613] => {
"_id": "5a65a047992e3c2572f74102",
"_class": "com.vuelogix.location.model.LocationModel",
"type": "Feature",
"properties": {
"address": "Purna to Loha Rd, Maharashtra 431511, India",
"device_id": 23613,
"last_updated": "2018-01-22T08:26:47.237Z"
},
"geometry": {
"_class": "com.vuelogix.location.model.geojson.geometry.Point",
"coordinates": [77.065659, 19.145168],
"type": "Point"
}
}
[23658] => {
"_id": "5a65ae1e992e3c2572f74114",
"_class": "com.vuelogix.location.model.LocationModel",
"type": "Feature",
"properties": {
"address": "Taranagar - Churu Rd, Chalkoi Baneerotan, Rajasthan 331001, India",
"device_id": 23658,
"last_updated": "2018-01-22T09:25:50.893Z"
},
"geometry": {
"_class": "com.vuelogix.location.model.geojson.geometry.Point",
"coordinates": [74.956284, 28.497661],
"type": "Point"
}
}
Is there any way to get a result like this without iterating through records?
Use the $addFields pipeline stage to create a new field, say root, that is an array holding a single document with two fields, k and v, where:
The k field contains the field name.
The v field contains the value of the field.
In your case k should be the device_id field. Since this is a double, you need a hack to convert it to a string so that it can be used as a key later. So your initial pipeline looks as follows:
db.collection.aggregate([
{
"$addFields": {
"root": [
{
"k": { "$substr": [ "$properties.device_id", 0, -1 ] },
"v": "$$ROOT"
}
]
}
}
])
which will return the following documents
/* 1 */
{
"_id" : "5a65a047992e3c2572f74102",
"_class" : "com.vuelogix.location.model.LocationModel",
"type" : "Feature",
"properties" : {
"address" : "Purna to Loha Rd, Maharashtra 431511, India",
"device_id" : 23613.0,
"last_updated" : "2018-01-22T08:26:47.237Z"
},
"geometry" : {
"_class" : "com.vuelogix.location.model.geojson.geometry.Point",
"coordinates" : [
77.065659,
19.145168
],
"type" : "Point"
},
"root" : [
{
"k" : "23613",
"v" : {
"_id" : "5a65a047992e3c2572f74102",
"_class" : "com.vuelogix.location.model.LocationModel",
"type" : "Feature",
"properties" : {
"address" : "Purna to Loha Rd, Maharashtra 431511, India",
"device_id" : 23613.0,
"last_updated" : "2018-01-22T08:26:47.237Z"
},
"geometry" : {
"_class" : "com.vuelogix.location.model.geojson.geometry.Point",
"coordinates" : [
77.065659,
19.145168
],
"type" : "Point"
}
}
}
]
}
/* 2 */
{
"_id" : "5a65ae1e992e3c2572f74114",
"_class" : "com.vuelogix.location.model.LocationModel",
"type" : "Feature",
"properties" : {
"address" : "Taranagar - Churu Rd, Chalkoi Baneerotan, Rajasthan 331001, India",
"device_id" : 23658.0,
"last_updated" : "2018-01-22T09:25:50.893Z"
},
"geometry" : {
"_class" : "com.vuelogix.location.model.geojson.geometry.Point",
"coordinates" : [
74.956284,
28.497661
],
"type" : "Point"
},
"root" : [
{
"k" : "23658",
"v" : {
"_id" : "5a65ae1e992e3c2572f74114",
"_class" : "com.vuelogix.location.model.LocationModel",
"type" : "Feature",
"properties" : {
"address" : "Taranagar - Churu Rd, Chalkoi Baneerotan, Rajasthan 331001, India",
"device_id" : 23658.0,
"last_updated" : "2018-01-22T09:25:50.893Z"
},
"geometry" : {
"_class" : "com.vuelogix.location.model.geojson.geometry.Point",
"coordinates" : [
74.956284,
28.497661
],
"type" : "Point"
}
}
}
]
}
From here you would want to leverage the $arrayToObject operator so that you convert the newly added root to an object with device_id as the key:
db.collection.aggregate([
{
"$addFields": {
"root": [
{
"k": { "$substr": [ "$properties.device_id", 0, -1 ] },
"v": "$$ROOT"
}
]
}
},
{
"$addFields": {
"root": {
"$arrayToObject": "$root"
}
}
}
])
which outputs:
/* 1 */
{
"_id" : "5a65a047992e3c2572f74102",
"_class" : "com.vuelogix.location.model.LocationModel",
"type" : "Feature",
"properties" : {
"address" : "Purna to Loha Rd, Maharashtra 431511, India",
"device_id" : 23613.0,
"last_updated" : "2018-01-22T08:26:47.237Z"
},
"geometry" : {
"_class" : "com.vuelogix.location.model.geojson.geometry.Point",
"coordinates" : [
77.065659,
19.145168
],
"type" : "Point"
},
"root" : {
"23613" : {
"_id" : "5a65a047992e3c2572f74102",
"_class" : "com.vuelogix.location.model.LocationModel",
"type" : "Feature",
"properties" : {
"address" : "Purna to Loha Rd, Maharashtra 431511, India",
"device_id" : 23613.0,
"last_updated" : "2018-01-22T08:26:47.237Z"
},
"geometry" : {
"_class" : "com.vuelogix.location.model.geojson.geometry.Point",
"coordinates" : [
77.065659,
19.145168
],
"type" : "Point"
}
}
}
}
/* 2 */
{
"_id" : "5a65ae1e992e3c2572f74114",
"_class" : "com.vuelogix.location.model.LocationModel",
"type" : "Feature",
"properties" : {
"address" : "Taranagar - Churu Rd, Chalkoi Baneerotan, Rajasthan 331001, India",
"device_id" : 23658.0,
"last_updated" : "2018-01-22T09:25:50.893Z"
},
"geometry" : {
"_class" : "com.vuelogix.location.model.geojson.geometry.Point",
"coordinates" : [
74.956284,
28.497661
],
"type" : "Point"
},
"root" : {
"23658" : {
"_id" : "5a65ae1e992e3c2572f74114",
"_class" : "com.vuelogix.location.model.LocationModel",
"type" : "Feature",
"properties" : {
"address" : "Taranagar - Churu Rd, Chalkoi Baneerotan, Rajasthan 331001, India",
"device_id" : 23658.0,
"last_updated" : "2018-01-22T09:25:50.893Z"
},
"geometry" : {
"_class" : "com.vuelogix.location.model.geojson.geometry.Point",
"coordinates" : [
74.956284,
28.497661
],
"type" : "Point"
}
}
}
}
The last step in the pipeline is to use the $replaceRoot pipeline stage to get your desired output:
db.collection.aggregate([
{
"$addFields": {
"root": [
{
"k": { "$substr": [ "$properties.device_id", 0, -1 ] },
"v": "$$ROOT"
}
]
}
},
{
"$addFields": {
"root": {
"$arrayToObject": "$root"
}
}
},
{ "$replaceRoot" : { "newRoot": "$root" } }
])
Output
/* 1 */
{
"23613" : {
"_id" : "5a65a047992e3c2572f74102",
"_class" : "com.vuelogix.location.model.LocationModel",
"type" : "Feature",
"properties" : {
"address" : "Purna to Loha Rd, Maharashtra 431511, India",
"device_id" : 23613.0,
"last_updated" : "2018-01-22T08:26:47.237Z"
},
"geometry" : {
"_class" : "com.vuelogix.location.model.geojson.geometry.Point",
"coordinates" : [
77.065659,
19.145168
],
"type" : "Point"
}
}
}
/* 2 */
{
"23658" : {
"_id" : "5a65ae1e992e3c2572f74114",
"_class" : "com.vuelogix.location.model.LocationModel",
"type" : "Feature",
"properties" : {
"address" : "Taranagar - Churu Rd, Chalkoi Baneerotan, Rajasthan 331001, India",
"device_id" : 23658.0,
"last_updated" : "2018-01-22T09:25:50.893Z"
},
"geometry" : {
"_class" : "com.vuelogix.location.model.geojson.geometry.Point",
"coordinates" : [
74.956284,
28.497661
],
"type" : "Point"
}
}
}
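If you instead want every device merged into a single key-value document, a possible extension is to push the k/v pairs from all records into one array before converting it. This is a sketch, assuming MongoDB 3.4.4+ for $arrayToObject and a combined result small enough to fit in one 16 MB document:
db.collection.aggregate([
  { "$group": {
      "_id": null,
      "pairs": {
        "$push": {
          "k": { "$substr": [ "$properties.device_id", 0, -1 ] },
          "v": "$$ROOT"
        }
      }
  } },
  { "$replaceRoot": { "newRoot": { "$arrayToObject": "$pairs" } } }
])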

Adding an index to the MongoDB collection for the given query with $and

I have a collection with the structure below. It contains some duplicate contents header names and values, but I need to fetch the exact document.
{
"_id": ObjectId("573ebc7bbf50112d55c0b763"),
"topic": "AAA",
"contents": [{
"headerName": "Start Year",
"value": 1995
}, {
"headerName": "Program",
"value": "AAA"
}]
}, {
"_id": ObjectId("573ebc7bbf50112d55c0b763"),
"topic": "BBB",
"contents": [{
"headerName": "Start Year",
"value": 1989
}, {
"headerName": "Program",
"value": "BBB"
}, {
"headerName": "Likes",
"value": 51
}]
}, {
"_id": ObjectId("573ebc7bbf50112d55c0b763"),
"topic": "BBB",
"contents": [{
"headerName": "Start Year",
"value": 1989
}, {
"headerName": "Program",
"value": "BBB"
}]
}
I need to fetch a single document using the query below. How can I add an index for this?
db.collections.find({
"$and": [{
"topic": "BBB"
}, {
"contents": [{
"headerName": "Start Year",
"value": 1989
}, {
"headerName": "Program",
"value": "BBB"
}]
}, {
"contents": {
"$size": 2
}
}]
})
Create a compound index on the two fields:
db.collection.createIndex( { topic:1, contents: 1 } )
Note that order matters in MongoDB compound indexes, as with any database. With "topic" first, Mongo can jump straight to the section of the index for the matching topic value and then do a bounded scan on contents.
After that change your query to this:
db.collection.find({
"topic" : "BBB",
"contents.headerName": { "$in": [ "Start Year", "Program" ] },
"contents.value": { "$in": [ 1989, "BBB" ] },
"contents": { "$size": 2 }
})
Output:
{
"_id" : ObjectId("573ee720b986a3b71e1e517b"),
"topic" : "BBB",
"contents" : [
{
"headerName" : "Start Year",
"value" : 1989
},
{
"headerName" : "Program",
"value" : "BBB"
}
]
}
To see how the index performs, run explain() on the query:
db.collection.find({
"topic" : "BBB",
"contents.headerName": { "$in": [ "Start Year", "Program" ] },
"contents.value": { "$in": [ 1989, "BBB" ] },
"contents": { "$size": 2 }
}).explain()
Output:
/* 1 */
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "test.collection",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
{
"contents" : {
"$size" : 2
}
},
{
"topic" : {
"$eq" : "BBB"
}
},
{
"contents.headerName" : {
"$in" : [
"Program",
"Start Year"
]
}
},
{
"contents.value" : {
"$in" : [
1989,
"BBB"
]
}
}
]
},
"winningPlan" : {
"stage" : "KEEP_MUTATIONS",
"inputStage" : {
"stage" : "FETCH",
"filter" : {
"$and" : [
{
"contents" : {
"$size" : 2
}
},
{
"contents.headerName" : {
"$in" : [
"Program",
"Start Year"
]
}
},
{
"contents.value" : {
"$in" : [
1989,
"BBB"
]
}
}
]
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"topic" : 1,
"contents" : 1
},
"indexName" : "topic_1_contents_1",
"isMultiKey" : true,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"topic" : [
"[\"BBB\", \"BBB\"]"
],
"contents" : [
"[MinKey, MaxKey]"
]
}
}
}
},
"rejectedPlans" : []
}
}
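In the winning plan the bounds on contents are [MinKey, MaxKey], i.e. the index narrows only by topic and the contents conditions are applied in the FETCH filter. If the embedded fields are the selective part of the query, a possible refinement (a sketch, not required for the query above to work) is to index them directly:
db.collection.createIndex( { topic: 1, "contents.headerName": 1, "contents.value": 1 } )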

Unexpected results from Elasticsearch

I have some documents stored in ES (by Logstash), and the results when querying ES do not look right:
The first query (see the queries and the results below) is supposed to return only documents that do not contain the region field.
Furthermore, based on the result of the first query there obviously is a document that contains the region field; however, the result of the second query, which should (at least) return a document with region=IN, contains no documents.
Is something wrong with my queries?
How can I investigate where the problem is? (The ES logs do not have anything related to these queries)
Here is the query:
curl -X GET 'http://localhost:9200/logstash*/_search?pretty' -d '{
"query" : {
"match_all" : {}
},
filter : {
"and" : [
{ "term" : { "type" : "xsys" } },
{ "missing" : { "field" : "region" } }
]
}, size: 2
}'
And the result:
{
"took" : 40,
"timed_out" : false,
"_shards" : {
"total" : 90,
"successful" : 90,
"failed" : 0
},
"hits" : {
"total" : 5747,
"max_score" : 1.0,
"hits" : [ {
"_index" : "logstash-2013.09.28",
"_type" : "logs",
"_id" : "UMrz9bwKQgCq__TwBT0WmQ",
"_score" : 1.0,
"_source" : {
.....
"type":"xsys",
....
"region":"IN",
}
}, { ....
} ]
}
}
Furthermore, the result for the following query:
curl -X GET 'http://localhost:9200/logstash*/_search?pretty' -d '{
"query" : { "match_all" : {} },
filter : { "term" : { "region" : "IN" } },
size: 1
}'
is:
{
"took" : 55,
"timed_out" : false,
"_shards" : {
"total" : 90,
"successful" : 90,
"failed" : 0
},
"hits" : {
"total" : 0,
"max_score" : null,
"hits" : [ ]
}
The following mapping is used:
curl -XPUT http://localhost:9200/_template/logstash_per_index -d '
{
"template": "logstash*",
"settings": {
"index.query.default_field": "message",
"index.cache.field.type": "soft",
"index.store.compress.stored": true
},
"mappings": {
"_default_": {
"_all": { "enabled": false },
"properties": {
"message": { "type": "string", "index": "analyzed" },
"#version": { "type": "string", "index": "not_analyzed" },
"#timestamp": { "type": "date", "index": "not_analyzed" },
"type": { "type": "string", "index": "not_analyzed" },
....
"region": { "type": "string", "index": "not_analyzed" },
...
}
}
}
}'
Mapping (what ES has returned - curl -XGET 'http://localhost:9200/logstash-2013.09.28/_mapping'):
{
"logstash-2013.09.28":{
"logs":{
"_all":{
"enabled":false
},
"properties":{
"#timestamp":{
"type":"date",
"format":"dateOptionalTime"
},
"#version":{
"type":"string",
"index":"not_analyzed",
"omit_norms":true,
"index_options":"docs"
},
"message":{
"type":"string"
},
"region":{
"type":"string"
},
"type":{
"type":"string",
"index":"not_analyzed",
"omit_norms":true,
"index_options":"docs"
}
}
},
"_default_":{
"_all":{
"enabled":false
},
"properties":{
"#timestamp":{
"type":"date",
"format":"dateOptionalTime"
},
"#version":{
"type":"string",
"index":"not_analyzed",
"omit_norms":true,
"index_options":"docs"
},
"message":{
"type":"string"
},
"type":{
"type":"string",
"index":"not_analyzed",
"omit_norms":true,
"index_options":"docs"
}
}
}
}
}